CN115269163A - Resource allocation method and device for computer graph, computer equipment and storage medium - Google Patents

Resource allocation method and device for operator graph, computer equipment and storage medium

Info

Publication number
CN115269163A
Authority
CN
China
Prior art keywords
operator graph
operator
core
time period
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110474902.9A
Other languages
Chinese (zh)
Inventor
吴欣洋
祝夭龙
李涵
胡川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lynxi Technology Co Ltd
Original Assignee
Beijing Lynxi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Lynxi Technology Co Ltd
Priority to CN202110474902.9A
Priority to PCT/CN2021/114217 (WO2022042519A1)
Publication of CN115269163A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention discloses a resource allocation method and device for an operator graph, computer equipment and a storage medium. The method comprises the following steps: acquiring, from an operator graph set to be run in a many-core system, demanded operator graphs configured with operation requirement information; dividing an operation cycle into a plurality of operation time periods, and allocating operation time periods to the demanded operator graphs according to the operation requirement information of each demanded operator graph; and determining, according to a resource allocation type and the operation requirement information of each demanded operator graph, target core resources for running each demanded operator graph in each operation time period, wherein the operation data of each demanded operator graph in its allocated operation time period meets the operation requirement information within the operation cycle. The embodiment of the invention can reasonably configure the core resources of the many-core system.

Description

Resource allocation method and device for operator graph, computer equipment and storage medium
Technical Field
The embodiment of the invention relates to the field of artificial intelligence, in particular to a resource allocation method and device of an operator graph, computer equipment and a storage medium.
Background
In recent years, with the rapid development of artificial intelligence applications and technologies, demands on computing power and power efficiency have kept increasing. In the related art, core resources of a many-core system are allocated unreasonably.
Disclosure of Invention
The embodiment of the invention provides a resource allocation method and device of an operator graph, computer equipment and a storage medium, which can reasonably allocate core resources of a many-core system.
In a first aspect, an embodiment of the present invention provides a resource allocation method for an operator graph, applied to a many-core system that includes allocatable core resources. The method includes:
acquiring, from an operator graph set to be run in the many-core system, demanded operator graphs configured with operation requirement information;
dividing an operation cycle into a plurality of operation time periods, and allocating operation time periods to the demanded operator graphs according to the operation requirement information of each demanded operator graph;
determining, according to a resource allocation type and the operation requirement information of each demanded operator graph, target core resources for running each demanded operator graph in each operation time period;
wherein the operation data of each demanded operator graph in its allocated operation time period meets the operation requirement information within the operation cycle.
In a second aspect, an embodiment of the present invention further provides a resource allocation apparatus for an operator graph, configured in a many-core system that includes allocatable core resources. The resource allocation apparatus includes:
a demanded operator graph acquisition module, used for acquiring, from an operator graph set to be run in the many-core system, demanded operator graphs configured with operation requirement information;
an operation time period allocation module, used for dividing an operation cycle into a plurality of operation time periods and allocating operation time periods to the demanded operator graphs according to the operation requirement information of each demanded operator graph;
a target core resource allocation module, used for determining, according to a resource allocation type and the operation requirement information of each demanded operator graph, target core resources for running each demanded operator graph in each operation time period; wherein the operation data of each demanded operator graph in its allocated operation time period meets the operation requirement information within the operation cycle.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program that is stored in the memory and is executable on the processor, where when the processor executes the computer program, the resource allocation method for an operator graph according to any one of the embodiments of the present invention is implemented.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the resource allocation method for an operator graph according to any one of the embodiments of the present invention.
The embodiment of the invention allocates an operation time period to each demanded operator graph and determines the target core resources of the demanded operator graph in that operation time period. This solves the problem in the related art that allocating full resources to every operator graph wastes core resources. It provides a way of allocating both time resources and core resources: demanded operator graphs can be run in a time-sharing manner and given core resources suited to their operation requirement information. The resources required by the operator graphs are thus allocated reasonably and in a targeted way, which improves resource utilization and reduces resource waste.
Drawings
FIG. 1 is a flowchart of a resource allocation method for an operator graph in a first embodiment of the present invention;
FIG. 2 is a flowchart of a resource allocation method for an operator graph in a second embodiment of the present invention;
FIG. 3 is a flowchart of a resource allocation method for an operator graph in a third embodiment of the present invention;
FIG. 4 is a flowchart of a resource allocation method for an operator graph in a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a resource allocation apparatus for an operator graph in a fifth embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a computer device in a sixth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the embodiments described herein are illustrative only and are not limiting upon the present invention. It should be further noted that, for the convenience of description, only some structures related to the present invention are shown in the drawings, not all of them.
Example one
Fig. 1 is a flowchart of a resource allocation method for an operator graph in a first embodiment of the present invention. This embodiment is applicable to allocating resources to an operator graph in a way that fits the operation requirement information of the operator graph. The method may be executed by the resource allocation device for an operator graph provided in the embodiments of the present invention; the device may be implemented in software and/or hardware and may generally be integrated into a computer device. Any embodiment of the invention is applied to a many-core system that includes allocatable core resources. As shown in fig. 1, the method of the present embodiment includes steps S110 to S130, described below.
Allocatable core resources may refer to cores dedicated to being allocated to operator graphs. A core is the smallest unit that can be scheduled independently and has complete computing capability; each core has its own storage resources and computing resources. Besides the allocatable core resources, the resources of the computer device also include core resources allocated to programs other than operator graphs, so that other programs can run. The computer device that executes the resource allocation method includes a many-core system, which is used to run multiple operator graphs simultaneously. A many-core system is a set of cores made up of a large number of cores (hundreds, up to thousands in the future) of various types, connected in a preset manner and offering high-performance parallel processing capability.
S110, obtaining a demanded operator graph configured with operation demand information in an operator graph set to be operated in the many-core system.
An operator graph set may refer to the set of operator graphs run in the many-core system. The operation requirement information may refer to the operating performance that the corresponding operator graph needs to achieve, and may be configured by a user. Illustratively, the operation requirement information includes a minimum operating speed and/or a minimum operating accuracy of the operator graph configured with it. A demanded operator graph is an operator graph in the operator graph set that is configured with operation requirement information. The operation data of a demanded operator graph matching its operation requirement information means that, when the demanded operator graph is run using the allocated operation time period and the allocated resources, the operation data obtained meets the operation requirement information. An operator graph that is not configured with operation requirement information has no performance requirement and can be run with any non-zero amount of core resources.
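The requirement check described above can be sketched as follows. This is a minimal illustration only: the class name OperationRequirement, the function satisfies, the field names, and the units are all assumptions introduced here, not names from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperationRequirement:
    """Hypothetical container for user-configured requirement information."""
    min_speed: Optional[float] = None      # assumed unit, e.g. inferences per second
    min_accuracy: Optional[float] = None   # assumed unit, e.g. a fraction in [0, 1]


def satisfies(req: Optional[OperationRequirement], speed: float, accuracy: float) -> bool:
    """Return True when measured operation data meets every configured minimum."""
    if req is None:
        # No requirement configured: the operator graph has no performance constraint.
        return True
    if req.min_speed is not None and speed < req.min_speed:
        return False
    if req.min_accuracy is not None and accuracy < req.min_accuracy:
        return False
    return True


# A demanded operator graph with only a speed minimum configured.
req = OperationRequirement(min_speed=30.0)
ok = satisfies(req, speed=32.5, accuracy=0.91)
too_slow = satisfies(req, speed=24.0, accuracy=0.91)
```

Passing None for the requirement models an operator graph without operation requirement information, which the text says may run on any non-zero amount of core resources.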
The operator graph set comprises at least one operator graph. An operator graph comprises at least one operator; when it comprises at least two operators, the output of one operator serves as the input of the adjacent subsequent operator in the operator graph. An operator may be an operation such as convolution, addition, subtraction, multiplication, division, matrix addition or matrix multiplication. An operator graph is used to realize a specific function and may refer to a high-performance computing algorithm, including but not limited to an AI algorithm, a machine learning algorithm, or a general-purpose scientific computing algorithm; for example, the operator graph is a deep learning model, or a neural network. There may be multiple operator graphs to be allocated resources, and any two of them may be independent or may have a dependency relationship, where a dependency relationship between two operator graphs means that the output of the first operator graph serves as the input of the second operator graph. An operator graph needs resources when it runs: core resources are allocated to the operator graph so that it can run, compute on the input data and produce output data, thereby realizing the specific function.
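As a minimal sketch of the chaining rule above (the output of one operator is the input of the next), an operator graph can be modeled as an ordered list of functions. The scalar operators and the name run_operator_graph are assumptions made for illustration; real operators such as convolution act on tensors.

```python
from functools import reduce
from typing import Callable, Sequence

Operator = Callable[[float], float]


def run_operator_graph(operators: Sequence[Operator], x: float) -> float:
    """Feed x through the operators in order; each output becomes the next input."""
    return reduce(lambda value, op: op(value), operators, x)


# Hypothetical three-operator graph: add 1, multiply by 2, subtract 3.
graph = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
result = run_operator_graph(graph, 5.0)  # ((5 + 1) * 2) - 3

# Two dependent operator graphs: the output of the first is the input of the second.
second_graph = [lambda v: v / 3]
chained = run_operator_graph(second_graph, result)
```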
Optionally, the operator graph includes: a neural network model, or at least one network included in a neural network model.
Wherein the operator graph may be a whole or a part of the neural network model. It is to be understood that the operator graph is used to implement a specific function in its entirety, or to implement some of the functions in a specific function. For example, the operator graph may be a neural network model or a network set formed by at least one network in a neural network model.
In some alternative embodiments, the operator graph may comprise a model formed by an image detection network and a speech recognition network, or the operator graph may comprise only an image detection network, or the operator graph may comprise only a speech recognition network. As another example, the operator graph may comprise a model formed by an image detection network and an object recognition network, or the operator graph may comprise only an image detection network, or the operator graph may comprise only an object recognition network.
By configuring the operator graph into the whole model or a part of network included by the model, the application scene of the operator graph executable by the chip can be enriched, the service mode of the operator graph can be enriched, and the utilization rate of core resources can be improved.
S120, dividing the operation cycle into a plurality of operation time periods, and allocating operation time periods to the demanded operator graphs according to the operation requirement information of each demanded operator graph.
One operation cycle is the time the many-core system needs to execute, once, the computation tasks of all operator graphs in the operator graph set. The many-core system repeats the operation cycle continuously. An operation time period is a part of the operation cycle, and its duration is shorter than the operation cycle. The operation time period allocated to a demanded operator graph is the time period during which that demanded operator graph runs in the many-core system. The demanded operator graph is loaded into the many-core system when its operation time period begins and is removed from the many-core system when that operation time period ends. The operation time periods may be divided according to the number of demanded operator graphs. For example, with 8 demanded operator graphs, the operation cycle may be divided into 4 operation time periods of equal duration, each running two demanded operator graphs. The operation time periods allocated to different demanded operator graphs may be the same or different.
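The 8-graph example above can be sketched as an equal split of the cycle. The function name assign_equal_periods, the graph identifiers, and the cycle length of 1.0 (arbitrary time units) are assumptions for illustration; the patent does not prescribe this exact procedure.

```python
from typing import List, Sequence, Tuple

Period = Tuple[float, float, List[str]]  # (start, end, graphs that run in the period)


def assign_equal_periods(graph_ids: Sequence[str], cycle: float,
                         graphs_per_period: int) -> List[Period]:
    """Split the cycle into equal periods and place a fixed number of graphs in each."""
    if graphs_per_period <= 0 or not graph_ids:
        raise ValueError("need at least one graph and a positive group size")
    n_periods = -(-len(graph_ids) // graphs_per_period)  # ceiling division
    duration = cycle / n_periods
    schedule = []
    for i in range(n_periods):
        start = i * duration
        members = list(graph_ids[i * graphs_per_period:(i + 1) * graphs_per_period])
        schedule.append((start, start + duration, members))
    return schedule


graph_ids = [f"g{i}" for i in range(8)]
schedule = assign_equal_periods(graph_ids, cycle=1.0, graphs_per_period=2)
```

With 8 graphs and 2 graphs per period, the sketch yields 4 periods of a quarter cycle each, matching the example in the text.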
In some optional embodiments, the on-chip storage of the many-core system cannot hold the configuration information of all operator graphs in the operator graph set at once. The operator graphs then need to be placed into the many-core system in batches so that they run in a time-sharing manner, and the time period during which an operator graph runs in the many-core system is its operation time period. The on-chip storage may be the allocatable storage resources in the many-core system, or the memory of the chip used to store data associated with the operator graph set. When the on-chip storage is smaller than the space occupied by the operator graph set, the chip cannot store the whole operator graph set at the same time. In that case, the configuration information of some of the operator graphs is loaded first; after those operator graphs finish running and produce their output, their configuration information is removed from the on-chip storage and the configuration information of the remaining operator graphs is loaded so that they can run. Computation continues in this way until the final result is obtained.
The operation cycle may be divided into operation time periods as follows. A time period shorter than the operation cycle is first divided off as the current operation time period. Each demanded operator graph is placed into the many-core system and run for that operation time period, and its operation data is collected. If the operation data meets the operation requirement information, the operation time period is determined to be the operation time period of that demanded operator graph; otherwise, the operation time period is determined not to match that demanded operator graph. After every demanded operator graph to be allocated has been checked against the current operation time period, the next operation time period is divided from the remainder of the operation cycle according to the number of demanded operator graphs still to be allocated, and the check is repeated for it. This continues until all demanded operator graphs have been allocated, at which point the division of operation time periods is complete. A demanded operator graph to be allocated is a demanded operator graph that has not yet been allocated an operation time period when the current operation time period is divided.
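The batch loading described above can be sketched with a simple greedy grouping by configuration size. The function name batch_by_capacity, the sizes, and the capacity are hypothetical; the patent does not specify how batches are formed, so the greedy rule is an assumption.

```python
from typing import Dict, List


def batch_by_capacity(config_sizes: Dict[str, int], capacity: int) -> List[List[str]]:
    """Greedily group operator graphs so each batch fits in on-chip storage."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, size in config_sizes.items():
        if size > capacity:
            raise ValueError(f"{name} alone exceeds on-chip storage")
        if used + size > capacity:
            # Current batch is full: it runs, is unloaded, and a new batch starts.
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches


# Hypothetical configuration sizes (arbitrary units) and on-chip capacity.
sizes = {"a": 4, "b": 3, "c": 5, "d": 2}
batches = batch_by_capacity(sizes, capacity=8)
```

Each inner list corresponds to one set of operator graphs resident on the chip at the same time; the batches are then loaded and run one after another.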
S130, determining target core resources for running each demanded operator graph in each operation time period according to the resource allocation type and the operation requirement information of each demanded operator graph; wherein the operation data of each demanded operator graph in its allocated operation time period meets the operation requirement information within the operation cycle.
The target core resources are the core resources allocated to a demanded operator graph in its operation time period; the many-core system runs the demanded operator graph on those target core resources. The resource allocation type is the basis for allocating core resources to operator graphs and determines the resource allocation strategy for the operator graph set. Different resource allocation types correspond to different core resource allocation methods. Optionally, the different operator graphs in the operator graph set share the same resource allocation type, that is, they use the same core resource allocation method.
Optionally, the resource allocation types include: a high performance type, a power saving type, or a balanced type. The evaluation of the resources can be divided into three categories, which mainly correspond to the power consumption problem of the AI chip, and because the power consumption problem is related to the throughput, the evaluation can be regarded as a combination rule of three kinds of keywords with different priorities, so that the resource allocation type can be divided into a high performance type, a balance type or an energy saving type. The power consumption of a chip depends mainly on data handling, core computation and scheduling of resources. The power consumption of the three parts is different under different conditions, for example, under the same other conditions, the larger the data transmission amount per unit time is, the larger the power consumption is; under the same other conditions, the more cores are used for calculation in unit time, the larger the power consumption is; under the same other conditions, the more resource scheduling instructions are issued per unit time, the greater the power consumption of the scheduler.
The embodiment of the invention allocates an operation time period to each demanded operator graph and determines the target core resources of the demanded operator graph in that operation time period. This solves the problem in the related art that allocating full resources to every operator graph wastes core resources. It provides a way of allocating both time resources and core resources: demanded operator graphs can be run in a time-sharing manner and given core resources suited to their operation requirement information. The resources required by the operator graphs are thus allocated reasonably and in a targeted way, which improves resource utilization and reduces resource waste.
Example two
Fig. 2 is a flowchart of a resource allocation method for an operator graph in a second embodiment of the present invention. This embodiment builds on the foregoing embodiment. Dividing the operation cycle into a plurality of operation time periods and allocating operation time periods to the demanded operator graphs according to their operation requirement information may be implemented as: alternately executing the operation of dividing an operation time period within the operation cycle and the operation of determining, according to the operation requirement information of each demanded operator graph currently to be allocated, each target demanded operator graph that runs in the divided operation time period, until matching operation time periods have been allocated within the operation cycle to all demanded operator graphs. Optionally, the method of this embodiment may include:
S210, acquiring, from an operator graph set to be run in the many-core system, demanded operator graphs configured with operation requirement information.
For details not exhaustively described in this embodiment, refer to the preceding description.
S220, alternately executing the operation of dividing an operation time period within the operation cycle and the operation of determining, according to the operation requirement information of each demanded operator graph currently to be allocated, each target demanded operator graph that runs in the divided operation time period, until matching operation time periods have been allocated within the operation cycle to all demanded operator graphs.
The operation time periods are determined one by one. Each time an operation time period is divided, it is detected whether each demanded operator graph to be allocated is a target demanded operator graph, that is, whether the operation data of that demanded operator graph in the operation time period meets its configured operation requirement information. The operation of dividing an operation time period and the operation of determining the target demanded operator graphs in that operation time period are thus executed alternately.
A target demanded operator graph is a demanded operator graph whose operation data, when run in the divided operation time period, meets its configured operation requirement information. The target demanded operator graph matches the currently divided operation time period.
Optionally, after an operation time period is divided within the operation cycle, it is detected, for each demanded operator graph to be allocated, whether the operation data obtained by running it in that operation time period meets its operation requirement information. If so, the demanded operator graph is determined to be a target demanded operator graph matching the operation time period; if not, the demanded operator graph is determined not to match the operation time period, that is, it is not a target demanded operator graph for that operation time period. After all demanded operator graphs have been checked against the operation time period, the next operation time period is divided, and it is detected whether the operation data obtained by running each remaining demanded operator graph to be allocated in that operation time period meets its operation requirement information. These steps repeat until every demanded operator graph has been allocated a matching operation time period.
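The alternation between dividing a period and matching graphs to it can be sketched as the loop below. It is a simplified illustration under stated assumptions: every period has the same fixed length, and the callback meets_requirement is a hypothetical stand-in for running a graph in the period, collecting its operation data and comparing it with the requirement information. All names are introduced here, not taken from the patent.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def allocate_periods(pending: Iterable[str], cycle: float, period_len: float,
                     meets_requirement: Callable[[str, float, float], bool]
                     ) -> Tuple[Dict[str, Tuple[float, float]], List[str]]:
    """Alternately carve out a period and match the still-pending graphs to it."""
    assignments: Dict[str, Tuple[float, float]] = {}
    remaining = list(pending)
    start = 0.0
    while remaining and start < cycle:
        end = min(start + period_len, cycle)          # divide one operation time period
        matched = [g for g in remaining if meets_requirement(g, start, end)]
        for g in matched:                             # target graphs for this period
            assignments[g] = (start, end)
        remaining = [g for g in remaining if g not in matched]
        start = end
    return assignments, remaining                     # leftovers could not be placed


# Hypothetical rule: a graph matches any period starting at or after its threshold.
earliest_ok = {"g0": 0.0, "g1": 0.25, "g2": 0.0}
assignments, unplaced = allocate_periods(
    ["g0", "g1", "g2"], cycle=1.0, period_len=0.25,
    meets_requirement=lambda g, start, end: start >= earliest_ok[g],
)
```

The sketch returns any graphs that could not be matched before the cycle ran out, a case the text does not address; how to handle it is left open here.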
S230, determining target core resources for running each demanded operator graph in each operation time period according to the resource allocation type and the operation requirement information of each demanded operator graph; wherein the operation data of each demanded operator graph in its allocated operation time period meets the operation requirement information within the operation cycle.
By alternately executing the operation of dividing an operation time period and the operation of determining the target demanded operator graphs in that operation time period, the embodiment of the invention can divide operation time periods within the operation cycle quickly and accurately. For each operation time period, the matching target demanded operator graphs are determined in turn from the demanded operator graphs to be allocated. This reduces the chance of missing a demanded operator graph, improves the accuracy of operation time period allocation and the utilization of the operation time periods, and lets each demanded operator graph meet its configured operation requirement information, so that the resources of the many-core system are configured reasonably and their utilization is improved.
Optionally, the determining, according to the operation demand information of each demand operator graph to be currently allocated, each target demand operator graph operating in the divided operation time period includes: and determining each target demand operator graph operating in the divided operation time period according to the intra-segment demand information obtained after converting the operation demand information of the current demand operator graph to be allocated into the divided operation time period and the operation data of the current demand operator graph to be allocated in the divided operation time period.
The intra-segment requirement information is used to judge whether the divided operation time period matches a demanded operator graph currently to be allocated, that is, whether that demanded operator graph is a target demanded operator graph running in the divided operation time period. The intra-segment requirement information corresponds to the operation requirement information: the operation requirement information is the requirement over an operation cycle, and the intra-segment requirement information is the requirement within the divided operation time period. The intra-segment requirement information may be determined from the operation cycle, the duration of the divided operation time period, and the operation requirement information. Illustratively, the value of the intra-segment requirement information is the ratio of the operation cycle to the duration of the divided operation time period, multiplied by the value of the operation requirement information.
In some optional embodiments, the operation requirement information of a demanded operator graph currently to be allocated is a requirement that must be met over the whole operation cycle, while the demanded operator graph runs only in its allocated operation time period and not in the other operation time periods. The operation data produced in the allocated operation time period must therefore also cover the requirement for the operation time periods in which the graph does not run. Accordingly, the operation requirement information can be converted to the divided operation time period, that is, mapped to intra-segment requirement information. The operation data of the demanded operator graph is then compared with the intra-segment requirement information; detecting whether the operation data meets the intra-segment requirement information amounts to detecting whether it meets the operation requirement information within the operation cycle.
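The illustrative conversion above (ratio of cycle to period duration, multiplied by the requirement value) can be written out as a short sketch. The function name and the numbers are hypothetical; the requirement is assumed to be a rate-like minimum such as an operating speed.

```python
def intra_segment_requirement(cycle: float, segment_duration: float,
                              requirement: float) -> float:
    """Scale a per-cycle requirement to the period in which the graph actually runs."""
    if not 0 < segment_duration <= cycle:
        raise ValueError("segment duration must be positive and no longer than the cycle")
    return (cycle / segment_duration) * requirement


# Hypothetical case: a minimum speed of 30 units over the whole cycle, with the
# graph running only in one quarter of the cycle.
needed_in_segment = intra_segment_requirement(cycle=1.0, segment_duration=0.25,
                                              requirement=30.0)
meets = 125.0 >= needed_in_segment  # measured speed inside the segment (assumed value)
```

Because the graph is idle for three quarters of the cycle, it has to run four times as fast inside its period for the cycle-wide minimum to hold.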
The operation requirement information in the operation period is converted into the segment requirement information in the operation time period, and whether the operation data of the current demanded operator graph to be distributed in the operation time period meets the segment requirement information is detected according to the segment requirement information to judge whether the current demanded operator graph to be distributed is matched with the operation time period, so that whether the operation data of the current demanded operator graph to be distributed in the operation time period meets the operation requirement information in the operation period can be accurately judged, the target demanded operator graph matched with the operation time period is accurately determined, the distribution accuracy of the operation time period is improved, and the operation requirement information configured by each demanded operator graph can be accurately met.
Optionally, dividing an operation time period within the operation cycle includes: dividing a first operation time period within the operation cycle according to the time starting point of the operation cycle and a preset duration; or grouping the demanded operator graphs currently to be allocated and determining the number of groups formed; determining a remaining duration according to the operation cycle and the durations of the operation time periods already divided; dividing the remaining duration by the number of groups to obtain a division duration; and dividing an operation time period within the operation cycle according to the end point of the chronologically last divided operation time period and the division duration.
The time starting point of the operation cycle is taken as the starting point of the first operation time period, and the preset duration is taken as its duration; the first operation time period is thereby determined within the operation cycle. The preset duration may be determined according to the operation requirement information of each first operator graph, according to an average operation time period of first operator graphs obtained through experimental statistics, or by calculating the ratio of the operation cycle to the number of groups formed from the first operator graphs currently to be allocated.
Each operation time period other than the first is determined by the number of groups, the remaining duration, the division duration, and the end point of the chronologically last divided operation time period.
The number of groups is used to divide the remaining duration. It may be calculated as the ratio of the number of first operator graphs currently to be allocated to a preset number of first operator graphs per group. Illustratively, if each group contains 2 first operator graphs and 8 first operator graphs are currently to be allocated, the number of groups is 8/2 = 4.
The difference between the operation cycle and the sum of the durations of the operation time periods already divided is calculated and determined as the remaining duration. The ratio of the remaining duration to the number of groups is determined as the division duration. Continuing the previous example, if the number of groups is 4 and the remaining duration is 8 minutes, the division duration is 8/4 = 2 minutes.
The end point of the chronologically last operation time period among those already divided is taken as the starting point of the next operation time period, and the next operation time period (any operation time period other than the first) is determined within the operation cycle according to the division duration.
With this way of dividing the operation cycle, the duration of each operation time period can be flexibly adjusted according to the number of first operator graphs currently to be allocated, and the operation time period of each first operator graph can be finely controlled. The operation cycle of the many-core system is thus reasonably configured, the utilization and flexibility of time resources are improved, and waste of many-core system resources is reduced.
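The division procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `next_period`, its parameters, the representation of a period as a `(start, end)` pair measured from the start of the cycle, and the rounding up of the group count when the graphs do not divide evenly are all introduced here for illustration only.

```python
import math

def next_period(cycle, periods, pending_count, group_size, preset_duration):
    """Return (start, end) of the next operation time period, or None.

    cycle           -- total length of the operation cycle
    periods         -- list of (start, end) periods already divided, in time order
    pending_count   -- number of operator graphs still to be allocated
    group_size      -- preset number of operator graphs per group
    preset_duration -- preset duration of the first operation time period
    """
    if not periods:
        # First period: starts at the cycle start and lasts the preset duration.
        return (0.0, min(preset_duration, cycle))
    used = sum(end - start for start, end in periods)
    remaining = cycle - used                        # remaining duration
    if remaining <= 0 or pending_count <= 0:
        return None
    # Assumption: round up when pending_count is not a multiple of group_size.
    groups = math.ceil(pending_count / group_size)
    division_duration = remaining / groups
    start = periods[-1][1]                          # end of the last divided period
    return (start, start + division_duration)
```

With a cycle of 10 minutes, a preset duration of 2 minutes, 8 pending first operator graphs and 2 graphs per group, the first call yields the period (0, 2) and the second yields (2, 4), which agrees with the division duration of 8/4 = 2 minutes in the example above.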
Optionally, one operation cycle includes a plurality of operation time periods, and at least two operation time periods have different durations.
The operation cycle can be divided into a plurality of operation time periods so that operator graphs time-division multiplex the many-core system. Optionally, among the determined operation time periods, a chronologically earlier operation time period is longer than a chronologically later one. Optionally, the first operation time period has the longest duration, and the subsequent operation time periods, divided according to the number of remaining operator graphs, become progressively shorter.
By configuring one operation cycle to include a plurality of operation time periods, at least two of which have different durations, the operation time periods can be flexibly adjusted to suit the operation configuration requirements of the demanded operator graphs.
Optionally, determining each target demanded operator graph that operates in the divided operation time period, according to the intra-segment requirement information obtained by converting the operation requirement information of the demanded operator graphs currently to be allocated to the divided operation time period and according to the operation data of those demanded operator graphs in the divided operation time period, includes: determining a current operator graph among the demanded operator graphs currently to be allocated; determining the intra-segment requirement information of the current operator graph in the divided operation time period according to the operation requirement information of the current operator graph, the duration of the divided operation time period, and the operation cycle; acquiring the operation data of the current operator graph in the divided operation time period; when it is determined, according to the intra-segment requirement information and the operation data of the current operator graph, that the current operator graph satisfies the performance condition, determining the current operator graph as a target demanded operator graph; and returning to the operation of determining a current operator graph among the demanded operator graphs currently to be allocated, until the allocation end condition is satisfied.
One demanded operator graph is selected from the demanded operator graphs currently to be allocated and determined as the current operator graph. For example, the current operator graph may be determined according to the priority of each demanded operator graph. The priority may depend on whether the demanded operator graph has a timing relationship with other operator graphs and, if so, whether any demanded operator graph in that timing relationship has already been allocated an operation time period. From high to low, the priority order may be: demanded operator graphs to be allocated that have a timing relationship with a demanded operator graph already allocated an operation time period; other demanded operator graphs to be allocated that have a timing relationship; and demanded operator graphs to be allocated that have no timing relationship. Two demanded operator graphs with a timing relationship depend on each other through their input data and/or output data.
The ratio of the value of the operation requirement information of the current operator graph to the value of its intra-segment requirement information in the divided operation time period corresponds to the ratio of the operation cycle to the duration of the divided operation time period; illustratively, the former ratio is the reciprocal of the latter. The operation data describes the operation performance of the current operator graph in the currently processed operation time period. Optionally, the current operator graph is run with the remaining core resources of the currently processed operation time period to obtain a measurement of its operation performance. It should be noted that the core resources finally allocated to the third operator graph may differ from the remaining core resources; optionally, they are less than or equal to the remaining core resources. The remaining core resources are used to determine whether running the third operator graph at the maximum remaining capacity of the currently processed operation time period can satisfy its operation requirement information; if even the maximum capacity cannot, the currently processed operation time period is determined to be unable to satisfy the operation requirement information of the third operator graph. The performance condition is used to judge whether the operation data of the current operator graph satisfies the intra-segment requirement information. The allocation end condition is used to judge whether all demanded operator graphs to be allocated have been judged.
For example, the allocation end condition is that a matching result with the operation time period has been determined for every demanded operator graph currently to be allocated, so that no demanded operator graph currently to be allocated remains.
By judging the demanded operator graphs currently to be allocated one by one, in a loop, as to whether each is a target demanded operator graph operating in the operation time period, the matching relationship between each demanded operator graph and the operation time periods is determined. An operation time period is thus accurately allocated to each demanded operator graph and no demanded operator graph is omitted. Because each demanded operator graph is also checked against its performance requirement, operation time periods are allocated only where the operation requirement is satisfied, which improves the flexibility and control precision of operation time period allocation.
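The loop described above, including the three-level priority used to pick the current operator graph, can be sketched in Python. This is only an illustrative sketch: the names `priority` and `match_period`, the representation of timing relationships as `(earlier, later)` pairs, and the `meets_condition` callback standing in for the performance check are assumptions made here and do not appear in the original text.

```python
def timing_neighbors(graph, timing_edges):
    """Operator graphs that share a timing relationship with `graph`."""
    return ({b for a, b in timing_edges if a == graph}
            | {a for a, b in timing_edges if b == graph})

def priority(graph, timing_edges, allocated):
    """0: related to an already allocated graph; 1: has a timing relationship; 2: none."""
    neighbors = timing_neighbors(graph, timing_edges)
    if neighbors & allocated:
        return 0
    return 1 if neighbors else 2

def match_period(pending, timing_edges, allocated, meets_condition):
    """Return the target demanded operator graphs for one operation time period.

    pending         -- names of demanded operator graphs still to be allocated
    allocated       -- names of graphs that already have an operation time period
    meets_condition -- hypothetical callback: True if the graph satisfies the
                       performance condition in this operation time period
    """
    allocated = set(allocated)
    unverified = list(pending)
    targets = []
    while unverified:                    # allocation end condition: nothing left to judge
        current = min(unverified, key=lambda g: priority(g, timing_edges, allocated))
        unverified.remove(current)
        if meets_condition(current):
            targets.append(current)
            allocated.add(current)
    return targets
```

The priority is recomputed on every pass because allocating one operator graph can raise the priority of the graphs that share a timing relationship with it.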
Optionally, the two demanded operator graphs having the time sequence relationship are allocated to the same or adjacent operation time periods.
The timing relationship includes a serial timing relationship and/or a parallel timing relationship. At least two operator graphs have a serial timing relationship when their data dependence has a chronological order; for example, if the input data of a first operator graph is the output data of a second operator graph, the two operator graphs have a serial timing relationship. At least two operator graphs have a parallel timing relationship when they depend on the same data at the same time; for example, if one piece of input data is fed into two operator graphs separately for computation, the two operator graphs have a parallel timing relationship. For operator graphs with a timing relationship, data flows from front to back along the serial timing relationship, and where a parallel timing relationship is encountered, the data is handed to all operator graphs in that parallel timing relationship at the same time.
If at least two operator graphs with a time sequence relationship operate in different operation time periods, intermediate data of a previous operation time period need to be reserved until a next operation time period or more operation time periods, which may cause a large amount of intermediate data to be stored, occupy excessive storage resources, and cause storage resource waste. Therefore, the operator graphs with the time sequence relation are placed in the same operation time period or two adjacent operation time periods of the time sequence respectively, the storage time of the intermediate data can be reduced, the storage resources of the intermediate data can be released in time, and the utilization rate of the storage resources is improved.
It should be noted that the serial timing relationship of two operator graphs corresponds to the chronological order of their operation time periods: the chronologically earlier operator graph operates in the earlier operation time period, and the chronologically later operator graph operates in the later operation time period.
By placing the operator graphs with the time sequence relationship in the same operation time period or respectively placing the operator graphs in two adjacent operation time periods of the time sequence, the storage time of the intermediate data can be reduced, the storage resources of the intermediate data can be released in time, and the utilization rate of the storage resources can be improved.
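The placement constraint for serially related operator graphs can be checked with a few lines of Python. This is a sketch under assumptions: the function name `placement_is_valid`, the integer indexing of operation time periods in chronological order, and the `(producer, consumer)` edge representation are introduced here for illustration.

```python
def placement_is_valid(period_index, serial_edges):
    """Check that every serially related pair sits in the same or adjacent periods.

    period_index -- maps operator graph name to the index of its operation
                    time period (0 is the chronologically first period)
    serial_edges -- (producer, consumer) pairs: the consumer's input is the
                    producer's output
    """
    for producer, consumer in serial_edges:
        gap = period_index[consumer] - period_index[producer]
        # gap < 0: the consumer would run before its input exists.
        # gap > 1: intermediate data would be held across a whole period.
        if gap < 0 or gap > 1:
            return False
    return True
```

A gap of 0 or 1 keeps the intermediate data alive for at most one period boundary, which is the storage saving the paragraph above describes.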
Optionally, the operation requirement information includes an operation required speed, and the intra-segment requirement information includes an intra-segment required speed; determining the intra-segment requirement information of the current operator graph in the divided operation time period includes: calculating the ratio of the operation cycle to the duration of the divided operation time period; and calculating the product of the operation required speed and the ratio, and determining the product as the intra-segment required speed.
The operation data is the average speed of the current operator graph in the divided operation time period, the operation requirement information is the operation required speed, and the intra-segment requirement information is the intra-segment required speed. The average speed of the current operator graph is the speed at which the operator graph computes output data from received random data. Operation requirement information can be divided into operation requirement information for a single picture and operation requirement information for a plurality of pictures; the delay cannot be ignored in the former but can be ignored in the latter. The operation requirement information in this scheme is operation requirement information over a period of time, not for a single picture. The ratio of the intra-segment required speed to the operation required speed equals the ratio of the operation cycle to the duration of the divided operation time period.
By configuring the content of the operation data and the corresponding calculation method, the operation performance of a demanded operator graph can be accurately determined, and operation time periods are allocated to operator graphs in the direction of satisfying their performance requirements. Diversified performance requirements can thus be met, the operation mode of operator graphs can be flexibly configured, and the application scenarios of operator graphs are enriched.
Optionally, the determining that the current operator graph satisfies the performance condition includes: and if the running speed of the current operator graph is determined to be greater than or equal to the required speed in the segment, determining that the current operator graph meets the performance condition.
If the operation speed of the current operator graph is greater than or equal to the intra-segment required speed, the current operator graph satisfies the performance condition; that is, its operation data in the operation time period satisfies the operation requirement information, and the current operator graph is a target demanded operator graph operating in the operation time period. If the operation speed of the current operator graph is less than the intra-segment required speed, the current operator graph does not satisfy the performance condition; that is, its operation data in the operation time period does not satisfy the operation requirement information, and the current operator graph is not a target demanded operator graph operating in the operation time period.
By comparing the actual operation speed of the current operator graph with the intra-segment required speed to detect whether the current operator graph matches the operation time period, the operation cycle of each operator graph can be finely controlled, time-division multiplexed core resources can be flexibly configured, and the utilization of core resources is improved.
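The speed conversion and the performance condition can be written out directly. The following Python sketch uses hypothetical function and parameter names chosen here; speeds and durations are plain numbers in whatever consistent units the system uses, which the original text does not specify.

```python
def intra_segment_required_speed(required_speed, cycle, duration):
    """Scale the cycle-wide required speed to one operation time period.

    The operator graph only runs for `duration` out of every `cycle`, so it
    must run cycle/duration times faster while it is active.
    """
    return required_speed * (cycle / duration)

def meets_performance_condition(measured_speed, required_speed, cycle, duration):
    """True if the measured average speed in the period reaches the scaled requirement."""
    return measured_speed >= intra_segment_required_speed(required_speed, cycle, duration)
```

For instance, an operation required speed of 30 over a cycle of 10 time units becomes an intra-segment required speed of 150 in an operation time period of 2 time units, since 30 × (10/2) = 150; a measured speed of 150 or more then satisfies the performance condition.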
EXAMPLE III
Fig. 3 is a flowchart of a resource allocation method for an operator graph in the third embodiment of the present invention. This embodiment builds on the above embodiments and refines the step of determining, according to the operation requirement information of each demanded operator graph currently to be allocated, each target demanded operator graph operating in the divided operation time period, as follows: determining whether the demanded operator graphs currently to be allocated include a target operator graph group, wherein the target operator graph group includes a plurality of first operator graphs, each first operator graph is configured with operation requirement information, and a timing relationship exists among the plurality of first operator graphs in the target operator graph group; and, when it is determined that the target operator graph group exists, determining the operation time period in which each first operator graph in the target operator graph group is to operate in the many-core system according to the operation requirement information of each first operator graph in the target operator graph group. Optionally, the method of this embodiment may include:
S310, obtaining the demanded operator graphs configured with operation requirement information from the set of operator graphs to be operated in the many-core system.
For details not exhaustively described in this embodiment, refer to the foregoing description.
S320, alternately executing the operation of dividing an operation time period in the operation cycle; and
S330, determining whether the demanded operator graphs currently to be allocated include a target operator graph group, wherein the target operator graph group includes a plurality of first operator graphs, each first operator graph is configured with operation requirement information, and a timing relationship exists among the plurality of first operator graphs in the target operator graph group.
Each first operator graph in the target operator graph group has a timing relationship with at least one other first operator graph in the group, and each first operator graph is configured with operation requirement information. In some optional embodiments, the target operator graph group may be formed from a plurality of first operator graphs that are each configured with operation requirement information and that have a timing relationship with one another.
S340, when it is determined that the target operator graph group exists, determining, according to the operation requirement information of each first operator graph in the target operator graph group, the operation time period in which each first operator graph in the target operator graph group is to operate in the many-core system, until a matching operation time period within the operation cycle has been allocated to every demanded operator graph.
Each first operator graph in the target operator graph group is a demanded operator graph. The operation time period to be operated in is determined preferentially, and in sequence, for the first operator graphs in each target operator graph group. When the first operator graphs in every target operator graph group cannot operate in the currently processed operation time period, it is verified whether the demanded operator graphs other than the first operator graphs in the target operator graph groups are target demanded operator graphs for that operation time period.
S350, determining target core resources for operating each demanded operator graph in each operation time period according to the resource allocation type and the operation requirement information of each demanded operator graph, wherein the operation data of each demanded operator graph in its allocated operation time period satisfies the operation requirement information over the operation cycle.
Optionally, the determining, according to the resource allocation type and the operation demand information of each demanded operator graph, a target core resource for operating the demanded operator graph in each operation time period includes: and determining target core resources for operating each first operator graph in the target operator graph group in each operation time period according to the resource allocation type and the operation requirement information of each first operator graph in the target operator graph group.
In this embodiment, when a demanded operator graph is configured with an operation requirement and has a timing relationship with other demanded operator graphs, the operation time period in which each demanded operator graph is to operate is determined, and the target core resources for operating each demanded operator graph in each operation time period are determined. Demanded operator graphs can thus be placed in different time periods to operate. This provides a way of allocating both time resources and core resources, so that the resources required by operator graphs are reasonably and specifically configured, resource utilization is improved, and resource waste is reduced.
Optionally, the determining, according to the operation requirement information of each first operator graph in the target operator graph group, an operation time period in which each first operator graph in the target operator graph group is to be operated in the many-core system includes: aiming at the current processing running time period, selecting a current operator graph from each first operator graph to be distributed currently in the target operator graph group; if the operating data of the current operator graph in the currently processed operating time period meets the performance condition, determining the currently processed operating time period as the operating time period to be operated of the current operator graph in the many-core system; returning to execute the operation of selecting the current operator graph from the first operator graphs to be distributed currently in the target operator graph group until the current operator graph is determined not to meet the performance condition; and if the current operator graph is determined not to meet the performance condition and/or the residual core resources except the target core resources are determined not to exist, finishing the verification of the target operator graph group aiming at the current processing running time period.
The operation time period may be divided in the operation cycle, and when one operation time period is obtained by the division, the operation time period obtained by the division may be determined as the operation time period of the current processing. Each first operator graph to be currently allocated in the target operator graph group refers to a first operator graph except for the first operator graphs operated in the divided operation time period (including the operation time period of the current processing). Selecting the current operator graph may refer to selecting a first operator graph with the highest priority from among the first operator graphs to be currently allocated. In the currently processed target operator graph group, a time sequence relationship exists between a first operator graph (belonging to the currently processed target operator graph group) to which an operation time period is allocated and a first operator graph (belonging to the currently processed target operator graph group) to be currently allocated. Therefore, the current operator graph can be sequentially selected according to the time sequence among the first operator graphs in the target operator graph group.
The performance condition is used to judge whether the operation data of the current operator graph satisfies the configured operation requirement information. If the operation data of the current operator graph satisfies the performance condition, the current operator graph can satisfy its configured operation requirement information in the currently processed operation time period and can therefore be placed in that operation time period to operate. If the current operator graph does not satisfy the performance condition, it cannot satisfy its configured operation requirement information in the currently processed operation time period and therefore cannot be placed in that operation time period. Whether or not the current operator graph satisfies the performance condition, the next current operator graph is selected once the judgment is finished. For the currently processed operation time period, the first operator graphs are verified and allocated resources in sequence, so when a first operator graph with higher priority cannot satisfy its configured operation requirement information, the first operator graphs with lower priority are not verified or allocated resources. The absence of remaining core resources other than the target core resources indicates that the unallocated core resources in the operation time period are insufficient to operate the remaining first operator graphs; in that case, the currently processed operation time period cannot accommodate further first operator graphs.
If the current operator graph does not satisfy the performance condition, and/or it is determined that no remaining core resources other than the target core resources exist, the remaining first operator graphs cannot be verified, so the verification of the first operator graphs in the target operator graph group for the currently processed operation time period ends. When the current operator graph does not satisfy the performance condition, verification of the first operator graphs in the next target operator graph group may begin for the currently processed operation time period; when it is determined that no remaining core resources other than the target core resources exist, verification of every target operator graph group for the currently processed operation time period ends.
By selecting first operator graphs from the target operator graph group in sequence for the currently processed operation time period and detecting whether each can satisfy the performance condition, as many first operator graphs as possible are placed while their operation requirement information is respected, so that resources are reasonably configured and the resource utilization of the many-core system is improved.
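The verification of one target operator graph group against the currently processed operation time period can be sketched as an early-terminating loop. The sketch below is an assumption-laden illustration: `verify_group` and the `cores_needed` callback are hypothetical names, core resources are modeled as a simple integer count, and the callback is assumed to return `None` when the operator graph cannot satisfy the performance condition even with all free cores.

```python
def verify_group(ordered_graphs, free_cores, cores_needed):
    """Place first operator graphs of one group into the current period.

    ordered_graphs -- first operator graphs of the group, earliest in timing order first
    free_cores     -- number of cores not yet allocated in this period
    cores_needed   -- hypothetical callback (name, free_cores) -> int or None;
                      None means the performance condition cannot be met
    Returns (placed, free_cores) where placed maps graph name -> cores allocated.
    """
    placed = {}
    for name in ordered_graphs:
        if free_cores <= 0:
            break                 # no remaining core resources: stop verifying
        need = cores_needed(name, free_cores)
        if need is None:
            break                 # an earlier graph failed: later graphs are not verified
        placed[name] = need
        free_cores -= need
    return placed, free_cores
```

Stopping at the first failure, rather than skipping it, reflects the rule that a later first operator graph is not verified once an earlier one in the same group cannot be placed.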
Optionally, selecting a current operator graph from the first operator graphs currently to be allocated in the target operator graph group includes: according to the timing relationship of the first operator graphs currently to be allocated in the target operator graph group, selecting in sequence the chronologically earliest first operator graph and determining it as the current operator graph.
In a serial timing relationship, the output of one first operator graph is the input of another first operator graph; the chronologically earlier first operator graph is the one that outputs the result. If two first operator graphs depend on the same data, either of the two may be taken as the chronologically earlier first operator graph.
It should be noted that determining that the current operator graph does not satisfy the performance condition includes determining that the chronologically earlier first operator graph does not satisfy the performance condition. In some optional embodiments, among the first operator graphs currently to be allocated, a chronologically later first operator graph depends on the output result of a chronologically earlier first operator graph. If the later first operator graph operated in the currently processed operation time period while the earlier one did not, the later first operator graph could not obtain the output result it depends on and could not operate correctly. Thus, when it is determined that the chronologically earlier first operator graph does not satisfy the performance condition, the verification of first operator graphs for the currently processed operation time period may end. In addition, in a parallel timing relationship, there are at least two chronologically earliest first operator graphs and, correspondingly, at least two current operator graphs.
By selecting the chronologically earlier first operator graph and allocating to it preferentially, the chronologically later first operator graph can obtain the output result of the earlier one, which improves the correctness of first operator graph operation and the operation stability of the many-core system.
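Selecting the chronologically earliest first operator graphs amounts to finding the pending graphs that have no pending producer. A short Python sketch follows; the function name `earliest_pending` and the `(producer, consumer)` edge representation are assumptions made for illustration.

```python
def earliest_pending(pending, serial_edges):
    """Return the pending first operator graphs that come first in timing order.

    pending      -- names of first operator graphs still to be allocated
    serial_edges -- (producer, consumer) pairs: the consumer's input is the
                    producer's output
    A graph is blocked while one of its producers is itself still pending.
    """
    pending_set = set(pending)
    blocked = {consumer for producer, consumer in serial_edges
               if producer in pending_set and consumer in pending_set}
    return [g for g in pending if g not in blocked]
```

With edges A→B, A→C and B→D, and A already allocated, the pending graphs B, C and D yield B and C as current operator graphs. B and C depend on the same output of A, so this is the parallel case in which at least two current operator graphs exist.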
Optionally, the method further includes: for any operation time period, when determining that the residual core resources except the target core resources exist, determining a second operator graph from the demanded operator graphs so as to determine the core resources for operating the second operator graph according to the residual core resources, wherein the second operator graph does not have a time sequence relation with any operator graph.
The sum of the remaining core resources and the at least one target core resource is the core resources allocable by the many-core system. A target core resource is the core resource allocated to a first operator graph that can operate in the operation time period. The number of target core resources equals the number of such first operator graphs. The existence of remaining core resources other than the target core resources indicates that the first operator graphs able to operate in the operation time period have all been determined, and that the remaining core resources are insufficient for any further first operator graph to satisfy its configured operation requirement information. The second operator graph is different from the first operator graph; it may be an operator graph that has no timing relationship with any other operator graph and that is configured with operation requirement information.
In some optional embodiments, the first operator graphs able to operate in the operation time period are determined preferentially; when the remaining first operator graphs cannot operate in the operation time period and remaining core resources exist in it, the second operator graphs able to operate in the operation time period are determined. In matching operation time periods to operator graphs, the first operator graph has higher priority than the second operator graph. If no remaining core resources exist in the operation time period, second operator graphs able to operate in it are not detected.
When remaining core resources exist, core resources are allocated to the second operator graph and the second operator graph is determined as an operator graph running in the operation time period. This provides a resource allocation mode that assigns operation time periods to operator graphs according to their priority, which improves allocation flexibility and meets diversified allocation requirements.
It should be noted that, for the currently processed operation time period, after every target operator graph group has been verified for that period, if at least one second operator graph exists, a current operator graph is selected from the second operator graphs currently to be allocated. If the operation data of the current operator graph in the currently processed operation time period meets the performance condition, the currently processed operation time period is determined as the operation time period in which the current operator graph is to run in the many-core system. The operation of selecting a current operator graph from the second operator graphs currently to be allocated is then executed again, until every such second operator graph has been verified and/or no remaining core resources other than the target core resources exist, at which point verification of the demanded operator graphs for the currently processed operation time period ends.
Optionally, the method further includes: for any operation time period, when determining that remaining core resources other than the target core resources exist, determining a no-demand operator graph from the operator graph set, and determining the core resources for running the no-demand operator graph according to the remaining core resources, wherein the no-demand operator graph is not configured with operation requirement information.
Since a no-demand operator graph has no operation requirement information, it can run in the currently processed operation time period as long as remaining core resources exist in that period, that is, as long as the number of remaining core resources is at least one. When no no-demand operator graph exists, or the current remaining core resources are empty, no no-demand operator graph runs in the currently processed operation time period.
By judging one by one, according to the requirement information, whether each operator graph runs in the currently processed operation time period, the operation time period of each operator graph can be finely controlled and the largest possible number of operator graphs can run in the currently processed operation time period. The operation time periods of the many-core system are thus reasonably configured, the utilization and flexibility of time resources are improved, and waste of many-core system resources is reduced.
Optionally, the determining a no-demand operator graph from the operator graph set includes: when at least one third operator graph exists, determining the third operator graph as the no-demand operator graph, wherein the third operator graph is not configured with operation requirement information and has a time sequence relation with at least one operator graph; and when no third operator graph exists, or when after every third operator graph has been judged the current remaining core resources are not empty and at least one fourth operator graph exists, determining the fourth operator graph as the no-demand operator graph, wherein the fourth operator graph is not configured with operation requirement information and has no time sequence relation with any operator graph.
To use resources effectively and release them in time, operator graphs that have a time sequence relation may optionally be placed in the same or adjacent operation time periods. For the currently processed operation time period, among the no-demand operator graphs, the third operator graphs, which have time sequence relations, are verified first; after every third operator graph has been verified, the fourth operator graphs, which have no time sequence relation, are verified.
Multiple third operator graphs that have time sequence relations with one another form a no-demand time-sequence operator graph group, and a current operator graph group is selected from these groups. For the currently processed operation time period, a current operator graph is selected from the third operator graphs currently to be allocated in the current operator graph group. If the operation data of the current operator graph in the currently processed operation time period meets the performance condition, the currently processed operation time period is determined as the operation time period in which the current operator graph is to run in the many-core system. The operation of selecting a current operator graph from the third operator graphs currently to be allocated in the no-demand time-sequence operator graph group is then executed again, until the current operator graph is determined not to meet the performance condition. If the current operator graph does not meet the performance condition and/or no remaining core resources other than the target core resources exist, verification of the current operator graph group for the currently processed operation time period ends, and the step of selecting a current operator graph group is executed again, until every no-demand time-sequence operator graph group has been verified for the currently processed operation time period.
After every no-demand time-sequence operator graph group has been verified for the currently processed operation time period, a current operator graph is selected from the fourth operator graphs. If the operation data of the current operator graph in the currently processed operation time period meets the performance condition, the currently processed operation time period is determined as the operation time period in which the current operator graph is to run in the many-core system. The operation of selecting a current operator graph from the fourth operator graphs is then executed again, until all fourth operator graphs have been verified.
By configuring the verification order of the no-demand operator graphs, whether each no-demand operator graph runs in the currently processed operation time period can be judged one by one according to the requirement information and the time sequence relations. The operation time period of each no-demand operator graph can thus be finely controlled, the operation time periods of the many-core system are reasonably configured, the utilization and flexibility of time resources are improved, and waste of many-core system resources is reduced.
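The verification order described above can be sketched as a simple ranking. This is only an illustrative sketch: the `OperatorGraph` class, its field names and the two functions are hypothetical, and the sketch assumes that demanded operator graphs are always checked before no-demand ones within an operation time period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorGraph:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    has_requirement: bool  # configured with operation requirement information
    has_timing: bool       # has a time sequence relation with another graph


def verification_rank(graph: OperatorGraph) -> int:
    """Return 0 for a first, 1 for a second, 2 for a third and
    3 for a fourth operator graph, following the text's definitions."""
    if graph.has_requirement:
        return 0 if graph.has_timing else 1
    return 2 if graph.has_timing else 3


def verification_order(graphs):
    # sorted() is stable, so graphs of the same kind keep their input order.
    return sorted(graphs, key=verification_rank)


graphs = [
    OperatorGraph("d", has_requirement=False, has_timing=False),
    OperatorGraph("b", has_requirement=True, has_timing=False),
    OperatorGraph("c", has_requirement=False, has_timing=True),
    OperatorGraph("a", has_requirement=True, has_timing=True),
]
ordered_names = [g.name for g in verification_order(graphs)]
```

The ranking only fixes the order in which graphs are considered; whether a graph is actually placed in the period still depends on the performance condition and on the remaining core resources.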
Optionally, the resource allocation method of the operator graph further includes: upon determining that the target operator graph group does not exist, determining whether the demanded operator graph comprises a second operator graph; and when at least one second operator graph exists, determining the operation time period of each second operator graph to be operated in the many-core system and the target core resource for operating each second operator graph in each operation time period according to the operation requirement information of each second operator graph.
When no target operator graph group (a group of first operator graphs) exists, whether a second operator graph exists is judged first. When a second operator graph exists, the operation time period in which each second operator graph is to run in the many-core system, and the target core resource for running each second operator graph in each operation time period, are determined according to the operation requirement information.
By handling the case in which no target operator graph group exists and allocating operation time periods and core resources to the remaining operator graphs, the application scenarios of operator graph resource allocation are enriched, different application requirements can be accommodated, diversified resource allocation requirements are met, and resource allocation becomes more flexible.
Example four
Fig. 4 is a flowchart of a resource allocation method for an operator graph in a fourth embodiment of the present invention. Based on the foregoing embodiments, this embodiment refines the step of determining, according to a resource allocation type and the operation requirement information of each demanded operator graph, the target core resource used for running each demanded operator graph in each operation time period, as follows: determining the remaining core resources of the currently processed operation time period according to the core resources allocable by the many-core system and the core resources already allocated in the currently processed operation time period; and, within the remaining core resources of the currently processed operation time period, allocating target core resources to the current operator graph meeting the performance condition according to the resource allocation type. Optionally, the method of this embodiment may include:
S410, acquiring the demanded operator graphs configured with operation requirement information in an operator graph set to be run in the many-core system.
For details not exhaustively described in this embodiment, refer to the description of the foregoing embodiments.
S420, alternately executing the operation of dividing an operation time period in the operation cycle; and
S430, in each current demanded operator graph to be distributed, determining whether each current demanded operator graph to be distributed comprises a target operator graph group, wherein the target operator graph group comprises a plurality of first operator graphs, each first operator graph is configured with operation demand information, and a time sequence relation exists among the plurality of first operator graphs in the target operator graph group.
S440, aiming at the current processing running time period, selecting the current operator graph from the first operator graphs to be distributed currently in the target operator graph group.
S450, if the operation data of the current operator graph in the current operation time period meets the performance condition, determining the current operation time period as the operation time period to be operated of the current operator graph in the many-core system.
S460, returning to execute the operation of selecting the current operator graph from the first operator graphs to be distributed currently in the target operator graph group until the current operator graph is determined not to meet the performance condition.
S470, if the current operator graph is determined not to meet the performance condition and/or the residual core resources except the target core resources are determined not to exist, ending the operation of verifying the target operator graph group aiming at the currently processed operation time period until the matched operation time period is distributed for all the demanded operator graphs in the operation cycle.
S480, determining the residual core resources in the current processing operation time period according to the core resources which can be distributed by the many-core system and the core resources which are distributed in the current processing operation time period.
The allocable core resources are the cores of the many-core system that can be allocated to operator graph computation. The core resources already allocated in the currently processed operation time period are the sum of the target core resources of the first operator graphs that have been determined to run in that period. The remaining core resources are the core resources that can be allocated to a first operator graph newly assigned to the currently processed operation time period. The remaining core resources equal the allocable core resources minus the core resources already allocated in the currently processed operation time period.
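The relation between allocable, allocated and remaining core resources can be written as a small helper. This is a minimal sketch in which core resources are represented as plain core counts; the function name and the error handling are assumptions for illustration.

```python
def remaining_cores(allocable: int, allocated_targets: list[int]) -> int:
    """Remaining cores = allocable cores minus the sum of the target core
    resources already assigned in the currently processed time period."""
    allocated = sum(allocated_targets)
    if allocated > allocable:
        # Assumption: over-allocation is treated as a caller error.
        raise ValueError("allocated cores exceed allocable cores")
    return allocable - allocated


# Example: 16 allocable cores, two first operator graphs using 4 and 6 cores.
left = remaining_cores(16, [4, 6])
```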
S490, in the remaining core resources of the currently processed operation time period, allocating target core resources to the current operator graph meeting the performance condition according to the resource allocation type; wherein, within the operation cycle, the operation data of each demanded operator graph in its allocated operation time period meets the operation requirement information.
At this time, the current operator graph satisfying the performance condition is a new first operator graph operating in the operating time period of the current process.
The embodiment of the invention can accurately allocate resources to the newly allocated first operator graph in the current processing operation time period by calculating the residual core resources and allocating the core resources to the current operator graph meeting the performance condition, thereby providing an accurate, flexible and controllable core resource allocation mode and accurately allocating time resources and core resources to the first operator graph.
Optionally, the resource allocation type includes a high-performance type, an energy-saving type or a balanced type.
The high-performance resource allocation mode pursues maximum chip computing performance: throughput, time sequence, scheduling of serial-parallel relations and the like are all configured so as to make maximum use of the effective computing power of the allocated cores. Performance here can be expressed as chip computing performance. The energy-saving resource allocation mode keeps energy consumption as low as possible at a given performance level. Because frequent data movement causes power loss, data movement is reduced as far as possible: the related scheduling instructions, core computation, throughput, latency, time sequence, serial-parallel relations and the like are all configured to reduce the amount of data moved within a period of time, so that power consumption is reduced under a given performance condition. The balanced type is an intermediate type between the two and aims to obtain an optimal energy efficiency ratio under given conditions. It considers the above factors comprehensively and weighs the core computing power of the allocation mode against the power consumption of data movement scheduling to obtain the optimal energy efficiency ratio, where the energy efficiency ratio is the ratio of performance to power consumption.
When the on-chip storage is greater than or equal to the space occupied by the operator graph set, the data associated with the operator graph set fits in on-chip storage. In this case all operator graphs can form one operator graph group, which reduces the impact of data movement on chip computing performance, and any of the high-performance, energy-saving or balanced resource allocation modes can be used.
When the on-chip storage is smaller than the space occupied by the operator graph set, after the operator graphs in an operator graph group finish running, the remaining data must be loaded into on-chip storage to run the remaining operator graphs, which requires data movement. This does not match the energy-saving resource allocation mode, so the energy-saving mode is not configured when the on-chip storage is smaller than the space occupied by the operator graph set.
By configuring the resource allocation types and the application scenarios suited to each type, the resource allocation mode can be configured flexibly, adapted to the scenarios in which the chip runs operator graphs, and the rationality of resource allocation is improved.
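The applicability rule for the three allocation types can be sketched as follows. This is an illustrative sketch only: the function name and the string labels are hypothetical, and it assumes that only the energy-saving type is ruled out when the operator graph set does not fit in on-chip storage, since the text excludes no other type in that case.

```python
def allowed_allocation_types(on_chip_storage: int, occupied_space: int) -> tuple[str, ...]:
    """Return the allocation types that may be configured, given the on-chip
    storage size and the space occupied by the operator graph set
    (both in the same unit, for example bytes)."""
    if on_chip_storage >= occupied_space:
        # Everything fits on chip: all three types are possible.
        return ("high_performance", "energy_saving", "balanced")
    # Data must be moved in and out, which conflicts with the energy-saving type.
    return ("high_performance", "balanced")


fits = allowed_allocation_types(on_chip_storage=64, occupied_space=48)
spills = allowed_allocation_types(on_chip_storage=32, occupied_space=48)
```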
Optionally, the resource allocation types include: a high-performance type; the allocating, in the remaining core resources of the currently processed operation time period, a target core resource for the current operator graph meeting the performance condition according to the resource allocation type includes: determining, by a bisection method, within the remaining core resources of the currently processed operation time period, the target core resource that meets the fastest-processing-speed condition for the current operator graph meeting the performance condition.
The bisection method (dichotomy) repeatedly halves the resource range until it is narrowed to a single core resource at which the operator graph being run meets the fastest-processing-speed condition. The fastest-processing-speed condition is used to determine the target core resource with the best performance: when the operator graph is run with that number of core resources, it is processed faster than with any other number of core resources. In other words, the condition may be that the chip processes the operator graph fastest when the operator graph is run with the target core resource.
For example, several candidate core resources may be determined within the core resources: if the core resources comprise 4 cores, the candidate core counts may be 1, 2, 3 and 4. A target core resource meeting the fastest-processing-speed condition is one for which the chip processes the operator graph faster than with any of the other candidates.
By determining the core resource meeting the condition of the fastest processing speed as the target core resource, the core resource meeting the high-performance operation requirement can be accurately determined, the maximization of the chip computing performance is realized, and the high-performance type resource allocation mode is met.
Optionally, the determining, by a bisection method, a target core resource that satisfies a condition that a processing speed of the current operator graph is fastest includes: determining a core number range according to the current residual core resources and preset minimum core resources, and calculating a median of the core number range, wherein the current residual core resources, the minimum core resources and the median are integers; respectively adopting the current residual core resources, the minimum core resources and the median to operate the current operator graph, eliminating the quantity with the slowest processing speed, and determining the two residual quantities as target quantities; and re-determining the median and the target number according to the two target numbers until the median does not exist in the core number range, and determining the number with the highest processing speed in the current core number range as the target core resource meeting the condition of the highest processing speed of the current operator graph.
The current remaining core resources are the core resources that can currently be allocated to the current operator graph. In some optional embodiments, the current remaining core resources are the maximum number of core resources that can currently be allocated to the current operator graph. The minimum core resources are the minimum number of core resources that can currently be allocated to the current operator graph, for example 1 core. When the current remaining core resources equal the minimum core resources, that single quantity is the one with the fastest processing speed and is therefore the target core resource of the current operator graph. Otherwise the current remaining core resources are greater than the minimum core resources, and the two determine the core number range as its endpoints: the current remaining core resources are the upper limit of the range used in the bisection, and the minimum core resources are its lower limit. The median of the core number range is the median between the current remaining core resources and the minimum core resources.
The current operator graph is run with each of the three quantities of core resources, the quantity with the slowest processing speed is eliminated, and the two remaining, faster quantities are determined as the target quantities. The core number range is then re-determined from the two target quantities: the larger of the two becomes the upper endpoint of the re-determined range and the smaller becomes its lower endpoint. Optionally, the target quantities are the current remaining core resources and the median, or the minimum core resources and the median, so the re-determined range is smaller than the range determined by the current remaining core resources and the minimum core resources, which narrows the core number range. When no integer median exists within the core number range, no integer lies between the two endpoints of the current range, that is, the two endpoints are adjacent integers. The endpoint with the faster processing speed is then determined as the target core resource.
For example, the resource range corresponding to the core resources may be the range of core counts from 1 (or 0) to the core count corresponding to the core resources. Because a core is the smallest allocable unit and cannot be split further, the core count can only take integer values. Narrowing the resource range by bisection and determining the quantity with the fastest processing speed as the target core resource may proceed as follows. A second quantity, such as 0 or 1, is preset, and the core count corresponding to the core resources is set as the first quantity. The median of the core number range determined by the first and second quantities is taken as the third quantity. The operator graph group is run with the first, second and third quantities respectively, and the two quantities with the fastest processing speed are selected to re-determine the resource range, whose two ends become the new first quantity (upper limit) and second quantity (lower limit). The median of the re-determined range is taken as the new third quantity, the operator graph group is run again with the three quantities, the two fastest quantities are selected, the range is re-determined again, and so on, until the resource range contains at most two quantities. The faster of these is taken as the target core resource meeting the fastest-processing-speed condition of the current operator graph. At this point, the faster of the at most two quantities is also the quantity with the fastest processing speed in the initial resource range corresponding to the core resources.
Illustratively, 1 core, 1/2 of the total cores (rounded up if not an integer) and all cores are first allocated separately, and the speed at which the operator graph group is processed is determined for each. If, for example, 1/2 of the total cores and all cores are both faster than 1 core, a quantity between 1/2 of the total cores and all cores is taken next, for example 3/4 of the total cores, and its processing speed is determined, and so on, until the core count that processes the operator graphs fastest is determined.
In addition, regardless of whether the on-chip storage can hold the data associated with the operator graph set, the target core resource with the maximum processing speed can be determined by the bisection method.
The bisection method avoids the cost of running the operator graph group with every possible core count one by one, so the target core resource can be determined quickly, which reduces the cost and improves the efficiency of determining the target core resource.
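The three-point bisection described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: `run_time` is a hypothetical callback that returns the measured processing time of the current operator graph for a given core count, and the search only finds the true optimum if processing time has a single minimum over the core number range. The case in which the median is the slowest of the three quantities is not covered by the text; the sketch then drops the slower endpoint so that the range always shrinks.

```python
from typing import Callable


def fastest_core_count(run_time: Callable[[int], float],
                       min_cores: int, max_cores: int) -> int:
    """Narrow [min_cores, max_cores] by repeatedly measuring both endpoints
    and the median, and discarding the slowest of the three."""
    if min_cores > max_cores:
        raise ValueError("min_cores must not exceed max_cores")

    cache: dict[int, float] = {}

    def timed(n: int) -> float:
        # Measure each core count at most once.
        if n not in cache:
            cache[n] = run_time(n)
        return cache[n]

    lo, hi = min_cores, max_cores
    while hi - lo > 1:                      # an integer median still exists
        mid = (lo + hi + 1) // 2            # median, rounded up
        times = {n: timed(n) for n in (lo, mid, hi)}
        slowest = max(times, key=times.get)
        if slowest == mid:
            # Fallback (assumption): drop the slower endpoint instead.
            slowest = lo if times[lo] >= times[hi] else hi
        if slowest == lo:
            lo = mid
        else:
            hi = mid
    # Endpoints are equal or adjacent: pick the faster one.
    return lo if timed(lo) <= timed(hi) else hi


# Hypothetical timing model with its minimum at 5 cores.
best = fastest_core_count(lambda n: (n - 5) ** 2 + 10, min_cores=1, max_cores=8)
```

Caching the measurements matters in practice, because each call to `run_time` stands for actually running the operator graph on the chip.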
Optionally, the resource allocation types include: an energy-saving type; the allocating, in the remaining core resources of the currently processed operation time period, a target core resource for the current operator graph meeting the performance condition according to the resource allocation type includes: calculating the minimum sub-core resource corresponding to the current operator graph meeting the performance condition; accumulating resources at least once on the corresponding minimum sub-core resource to obtain at least one candidate sub-core resource, wherein each candidate sub-core resource is less than or equal to the remaining core resources of the currently processed operation time period; and screening out, from the candidate sub-core resources, the candidate sub-core resource that meets the fastest-processing-time-reduction condition, and determining it as the target core resource of the current operator graph meeting the performance condition.
In the energy-saving resource allocation mode, the number of data movements is reduced as far as possible; in this case all operator graphs to be run fit in on-chip storage. The minimum sub-core resource is the minimum core resource that can be allocated to the current operator graph while still meeting the operation requirement of the operator graph.
Illustratively, the minimum sub-core resource of the current operator graph is determined as follows: according to the space occupied by the operator graph set (such as the total configuration data size) and the core storage size for the current operator graph, the quotient of the occupied space divided by that storage size is calculated, and the smallest integer greater than or equal to the quotient is determined as the minimum sub-core resource corresponding to the current operator graph.
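The calculation above is a ceiling division. A minimal sketch, assuming that both sizes are integers in the same unit and that the storage size is a per-core figure; the function and parameter names are hypothetical.

```python
def min_sub_core_resource(occupied_space: int, core_storage_size: int) -> int:
    """Smallest integer greater than or equal to
    occupied_space / core_storage_size, using integer arithmetic only."""
    if core_storage_size <= 0:
        raise ValueError("core_storage_size must be positive")
    # -(-a // b) is ceiling division for integers and avoids float rounding.
    return -(-occupied_space // core_storage_size)


# Example: 10 units of configuration data, 4 units of storage per core.
cores_needed = min_sub_core_resource(10, 4)
```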
A candidate (alternative) sub-core resource is a number of cores that is greater than the minimum sub-core resource and less than or equal to the remaining core resources. Starting from the minimum sub-core resource, the same set quantity of resources may be added a different number of times to obtain different candidate sub-core resources: adding the set quantity once gives one candidate, adding it twice gives another, and so on. Different candidate sub-core resources have different core counts. For example, if the minimum sub-core resource is 3 cores, adding 1 core at a time gives 4, 5, 6 and 7 cores, all of which are candidate sub-core resources.
The candidate meeting the fastest-processing-time-reduction condition is the core resource at which the time the chip takes to process the current operator graph decreases fastest. For example, a curve relating the number of cores to the processing time may be established, and the number of cores at which the processing time decreases fastest on that curve is determined as the target core resource.
In some optional embodiments, in the energy-saving resource allocation mode, whether the total configuration of all operator graphs in the operator graph set fits in on-chip storage is judged from the total configuration data size of the operator graph set and the size of the on-chip storage, and the target core resource of the energy-saving mode is determined only when it fits. The target core resource determined for the energy-saving type is the minimum number of cores that ensures the configuration information of the operator graph set is not split.
By determining the candidate sub-core resource that meets the fastest-processing-time-reduction condition as the target core resource, the core resource meeting the energy-saving operation requirement can be determined accurately, chip power consumption is minimized, and the energy-saving resource allocation mode is satisfied.
Optionally, the accumulating resources at least once on the corresponding minimum sub-core resource to obtain at least one candidate sub-core resource, screening out the candidate sub-core resource that meets the fastest-processing-time-reduction condition from the candidate sub-core resources, and determining it as the target core resource of the current operator graph meeting the performance condition, includes: accumulating a unit resource on the minimum sub-core resource to obtain a candidate sub-core resource; running the current operator graph with the candidate sub-core resource and with the minimum sub-core resource respectively, and acquiring the matched processing times; calculating the difference between the processing time matched with the candidate sub-core resource and the processing time matched with the minimum sub-core resource, and determining the difference as the processing time reduction value matched with the candidate sub-core resource; continuing to accumulate the unit resource on the minimum sub-core resource to obtain a next candidate sub-core resource, and calculating the processing time reduction value matched with the next candidate sub-core resource, until the difference between the processing time reduction values matched with two adjacent candidate sub-core resources is smaller than a set threshold; and determining the larger of the two adjacent candidate sub-core resources as the candidate sub-core resource that meets the fastest-processing-time-reduction condition.
Each accumulation may be increased by a set number of units, e.g., 10% or 20% of the core resources, etc. The processing time for matching the core resource may be time required for computing and setting the input data by using the core resource to run the current operator map. Wherein the processing time reduction value is used to evaluate the extent to which the processing time is reduced as the core resources increase. The processing time reduction value may be a negative value, representing an increase in processing time as core resources increase.
Continuing to accumulate resources on the minimum sub-core resource means continuing to accumulate on top of the resources already accumulated. That is, the successively obtained candidate sub-core resources increase gradually, and of two candidate sub-core resources adjacent in generation order, the later one is larger than the earlier one. When the difference between the processing time reduction values matched with two candidate sub-core resources adjacent in generation order is smaller than the set threshold, this indicates that the processing time has been sharply reduced as the core resources increased, that is, the degree of reduction of the processing time is large. The larger candidate sub-core resource, which is the later-generated of the two adjacent candidate sub-core resources, may be determined as the candidate sub-core resource satisfying the condition that the processing time is reduced fastest.
Resources are accumulated multiple times on the minimum sub-core resource to obtain a plurality of candidate sub-core resources. During accumulation, the candidate sub-core resource at which the matched processing time is sharply reduced is identified and determined as the candidate sub-core resource satisfying the condition that the processing time is reduced fastest. This avoids the cost of running the operator graph with every candidate sub-core resource one by one, so the target core resource is determined quickly, at lower cost and with higher efficiency.
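The accumulation procedure described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function name `pick_energy_saving_cores`, its parameters, and the `processing_time` callback (which stands in for actually running the operator graph) are hypothetical and do not appear in the source. The reduction value is taken as the processing time at the minimum resource minus the processing time at the candidate, so that a negative value means the processing time increased. If the remaining core resources run out before the threshold test is met, the sketch simply returns the last candidate tried, which is an assumption.

```python
from typing import Callable, Optional


def pick_energy_saving_cores(
    min_cores: int,
    remaining_cores: int,
    unit: int,
    processing_time: Callable[[int], float],
    threshold: float,
) -> int:
    """Accumulate `unit` cores at a time on top of `min_cores` and stop at the
    first pair of adjacent candidates whose reduction values differ by less
    than `threshold`; return the larger candidate of that pair."""
    base_time = processing_time(min_cores)
    prev_reduction: Optional[float] = None
    chosen = min_cores
    candidate = min_cores + unit
    # Candidates may not exceed the remaining core resources of the period.
    while candidate <= remaining_cores:
        # Reduction value relative to the minimum sub-core resource.
        reduction = base_time - processing_time(candidate)
        chosen = candidate
        if prev_reduction is not None and reduction - prev_reduction < threshold:
            break  # adjacent reduction values are close enough: stop here
        prev_reduction = reduction
        candidate += unit
    return chosen
```

For example, with a hypothetical processing time of `100.0 / cores`, a minimum of 2 cores, a unit of 2 cores, 16 remaining cores and a threshold of 5.0, the candidates 4, 6 and 8 give reduction values of 25, about 33.3 and 37.5, so the loop stops at 8.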
Optionally, the resource allocation types include a balanced type; allocating, in the remaining core resources of the currently processed operation time period, a target core resource for the current operator graph satisfying the performance condition according to the resource allocation type includes: determining, in the remaining core resources of the currently processed operation time period, at least one sub-core resource corresponding to the current operator graph satisfying the performance condition, wherein the number of each sub-core resource is less than or equal to the number of the remaining core resources of the currently processed operation time period; calculating the energy consumption ratio matched with each sub-core resource according to the following formula:
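$$\text{energy consumption ratio} = \frac{\text{processing time of the core resource} \times \text{calculation speed of the core resource}}{\text{power consumption of transporting data} + \text{power consumption of computation by the core resource}}$$ (formula image BDA0003047030360000211)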
and determining the sub-core resource matched with the maximum energy consumption ratio as the target core resource of the current operator graph satisfying the performance condition.
The balanced type refers to a resource allocation mode that gives the best computing performance per unit power consumption. The number of cores in each sub-core resource is less than or equal to the number of cores in the remaining core resources, and different sub-core resources have different numbers of cores. Determining the target core resource may be: finding the target core resource that maximizes the energy consumption ratio of the current operator graph. The variable in the formula is the core resource: the energy consumption ratio of the current operator graph is calculated for each of the different core resources, and the core resource with the maximum energy consumption ratio is taken as the target core resource of the current operator graph.
The energy consumption ratio may represent the computing performance per unit power consumption. The product of the processing time of the core resource and the calculation speed of the core resource is used to evaluate the amount of calculation, and the sum of the power consumption of transporting data and the power consumption of computation by the core resource is used to evaluate the power consumption. In the product of processing time and calculation speed, the calculation speed of the core resource is equal to the product of the main frequency of a core and the sub-core resource; the power consumption of transporting data is calculated from the space occupied by the operator graph group, the bandwidth, and the power consumption required to transport data per unit time, for example, the occupied space is divided by the bandwidth to obtain the transport time, and the product of the transport time and the power consumption required for transport per unit time is determined as the power consumption of transporting data; the power consumption of computation by the core resource is calculated from the amount of calculation, the sub-core resource, and the power consumption required for calculation per unit time. The power consumption required for transport per unit time and the power consumption required for calculation per unit time depend on the hardware of the many-core system and can be determined through experiments.
Therefore, the sub-core resource is the only variable in the energy consumption ratio formula. Inputting a plurality of sub-core resources into the formula yields a plurality of energy consumption ratios; the maximum energy consumption ratio is selected from them, and the sub-core resource that produced it is determined as the target core resource.
By determining the sub-core resource with the largest energy consumption ratio as the target core resource, the core resource meeting the balance requirement can be accurately determined, the calculation performance maximization of the chip under the unit power consumption is realized, and thus the balance type resource allocation mode is met.
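The balanced-type selection can be illustrated with the short Python sketch below. It is a sketch under assumptions, not the formula of the source: all names (`energy_ratio`, `pick_balanced_cores`, and every parameter) are hypothetical, the calculation speed is assumed to be the core main frequency multiplied by the number of cores, and the computation energy is assumed to be processing time multiplied by the number of cores and by a per-core power figure, since the source does not give that expression explicitly.

```python
from typing import Callable, Iterable


def energy_ratio(
    cores: int,
    time_s: float,
    core_freq_hz: float,
    data_bytes: float,
    bandwidth_bytes_per_s: float,
    move_power_w: float,
    compute_power_w_per_core: float,
) -> float:
    """Amount of calculation divided by total power consumption."""
    # Amount of calculation: processing time x calculation speed (assumed freq x cores).
    work = time_s * (core_freq_hz * cores)
    # Transport energy: (occupied space / bandwidth) x transport power per unit time.
    move_energy = (data_bytes / bandwidth_bytes_per_s) * move_power_w
    # Computation energy: assumed form, not taken from the source.
    compute_energy = time_s * cores * compute_power_w_per_core
    return work / (move_energy + compute_energy)


def pick_balanced_cores(
    candidates: Iterable[int],
    processing_time: Callable[[int], float],
    **hardware: float,
) -> int:
    """Return the candidate core count with the largest energy consumption ratio."""
    return max(
        candidates,
        key=lambda c: energy_ratio(c, processing_time(c), **hardware),
    )
```

The hardware figures passed as keyword arguments correspond to the quantities that, according to the text, depend on the many-core hardware and would be measured experimentally.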
Example five
Fig. 5 is a schematic diagram of a resource allocation apparatus for an operator graph in the fifth embodiment of the present invention. This embodiment is a corresponding apparatus for implementing the resource allocation method of the operator graph provided in the above embodiments of the present invention; the apparatus may be implemented in software and/or hardware, may generally be integrated into a computer device, and may be located in a many-core system. The apparatus includes:
A demanded operator graph acquisition module 510, configured to acquire a demanded operator graph configured with operation demand information from an operator graph set to be operated in a many-core system;
an operation time period allocation module 520, configured to divide an operation cycle into multiple operation time periods, and allocate an operation time period to each demanded operator graph according to the operation demand information of each demanded operator graph;
a target core resource allocation module 530, configured to determine, according to a resource allocation type and the operation demand information of each demanded operator graph, a target core resource for running each demanded operator graph in each operation time period; wherein the operation data of each demanded operator graph in its allocated operation time period meets the operation demand information within the operation cycle.
The embodiment of the present invention allocates an operation time period to each demanded operator graph and determines the target core resource of each demanded operator graph in that operation time period. This solves the problem in the related art that core resources are wasted because every operator graph is allocated the full resources. It provides a way of allocating both time resources and core resources: demanded operator graphs can be executed in a time-shared manner and given core resources adapted to their operation demand information, so that the resources required by the operator graphs are allocated reasonably and in a targeted way, resource utilization is improved, and resource waste is reduced.
Further, the operation time period allocation module 520 includes: a cyclic allocation time period unit, configured to alternately execute the operation of dividing an operation time period in the operation cycle and the operation of determining, according to the operation demand information of each demanded operator graph currently to be allocated, each target demand operator graph that runs in the divided operation time period, until all demanded operator graphs have been allocated a matched operation time period in the operation cycle.
Further, the cyclic allocation time period unit includes: an operation demand information conversion subunit, configured to determine each target demand operator graph that runs in the divided operation time period according to the intra-segment demand information obtained by converting the operation demand information of each demanded operator graph currently to be allocated into the divided operation time period, and the operation data of that demanded operator graph in the divided operation time period.
Further, the operation time period allocation module 520 includes: an operation time period dividing unit, configured to: divide a first operation time period in the operation cycle according to the time starting point of the operation cycle and a preset duration; or group the demanded operator graphs currently to be allocated and determine the number of groups formed; determine the remaining duration according to the operation cycle and the durations of the operation time periods already divided; divide the remaining duration according to the number of groups to obtain a division duration; and divide an operation time period in the operation cycle according to the end point of the last operation time period in time order in the operation cycle and the division duration.
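As an illustration of the two dividing rules, the following Python sketch computes the next operation time period. It rests on assumptions that are not stated in the source: the operation cycle starts at time 0, the first rule is applied when no period has been divided yet, and the remaining duration is split evenly across the groups. The function name `next_period` and its parameters are hypothetical.

```python
from typing import List, Tuple

Period = Tuple[float, float]  # (start, end) within the operation cycle


def next_period(
    cycle_len: float,
    periods: List[Period],
    preset_len: float,
    group_count: int,
) -> Period:
    """Return the next operation time period to divide in the cycle."""
    if not periods:
        # First period: from the start of the cycle, with the preset duration.
        return (0.0, preset_len)
    # Remaining duration = cycle length minus the durations already divided.
    used = sum(end - start for start, end in periods)
    remaining = cycle_len - used
    # Division duration: remaining duration split evenly over the groups (assumed).
    length = remaining / group_count
    # The new period starts at the end point of the last divided period.
    last_end = periods[-1][1]
    return (last_end, last_end + length)
```

With a cycle of 100 time units, one divided period of 20 units and four groups still to be allocated, the next period would span 20 to 40.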
Further, the cyclic allocation time period unit includes: a target operator graph group determining subunit, configured to: determine whether the demanded operator graphs currently to be allocated include a target operator graph group, wherein the target operator graph group includes a plurality of first operator graphs, each first operator graph is configured with operation demand information, and a time sequence relationship exists among the plurality of first operator graphs in the target operator graph group; and, when it is determined that the target operator graph group exists, determine, according to the operation demand information of each first operator graph in the target operator graph group, the operation time period in which each first operator graph in the target operator graph group is to be run in the many-core system.
Further, the target operator graph group determination subunit may be configured to: aiming at the current processing running time period, selecting a current operator graph from each first operator graph to be distributed currently in the target operator graph group; if the operation data of the current operator graph in the current operation time period meets the performance condition, determining the current operation time period as the operation time period of the current operator graph to be operated in the many-core system; returning to execute the operation of selecting the current operator graph from the first operator graphs to be distributed currently in the target operator graph group until the current operator graph is determined not to meet the performance condition; and if the current operator graph is determined not to meet the performance condition and/or the residual core resources except the target core resources are determined not to exist, finishing the verification of the target operator graph group aiming at the current processing running time period.
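The verification loop for one operation time period can be sketched in Python as follows. This is a simplified illustration under assumptions: the function name `verify_group_for_period` and the `allocate` callback are hypothetical, the callback is assumed to return the number of cores granted to a graph when its performance condition can be met with the remaining cores and `None` otherwise, and the first operator graphs are assumed to be supplied in their time sequence order.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple


def verify_group_for_period(
    graphs_in_order: Sequence[Hashable],
    allocate: Callable[[Hashable, int], Optional[int]],
    total_cores: int,
) -> Tuple[Dict[Hashable, int], List[Hashable], int]:
    """Assign first operator graphs to the current period until one fails the
    performance condition or no core resources remain."""
    pending = deque(graphs_in_order)
    remaining = total_cores
    assigned: Dict[Hashable, int] = {}
    while pending and remaining > 0:
        graph = pending[0]                  # select the current operator graph
        cores = allocate(graph, remaining)  # None: performance condition not met
        if cores is None:
            break                           # end verification for this period
        assigned[graph] = cores             # graph runs in the current period
        remaining -= cores
        pending.popleft()
    # Graphs left in `pending` wait for a later operation time period.
    return assigned, list(pending), remaining
```

Any cores left over after the loop correspond to the remaining core resources that the following paragraphs hand to operator graphs without a time sequence relationship or without operation demand information.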
Further, the apparatus is further configured to: for any operation time period, when it is determined that remaining core resources other than the target core resources exist, determine a second operator graph from the demanded operator graphs, and determine the core resources for running the second operator graph according to the remaining core resources, wherein the second operator graph has no time sequence relationship with any operator graph.
Further, the apparatus further comprises: a non-demand operator graph resource allocation module, configured to, for any operation time period, when it is determined that remaining core resources other than the target core resources exist, determine a non-demand operator graph from the operator graph set, and determine the core resources for running the non-demand operator graph according to the remaining core resources, wherein the non-demand operator graph is not configured with operation demand information.
Further, the non-demand operator graph resource allocation module includes: a non-demand or non-time-sequence operator graph allocation unit, configured to: when at least one third operator graph exists, determine the third operator graph as a non-demand operator graph, wherein the third operator graph is not configured with operation demand information and has a time sequence relationship with at least one operator graph; and when no third operator graph exists, or after the determination for each third operator graph is finished, if the current remaining core resources are not empty and at least one fourth operator graph exists, determine the fourth operator graph as a non-demand operator graph, wherein the fourth operator graph is not configured with operation demand information and has no time sequence relationship with any operator graph.
Further, the target core resource allocation module 530 includes: a residual core determining unit, configured to determine a residual core resource in the currently processed operating time period according to the core resource allocable by the many-core system and the core resource allocated in the currently processed operating time period; and in the residual core resources of the currently processed operation time period, distributing target core resources for the current operator graph meeting the performance condition according to the resource distribution type.
Further, the resource allocation types include: a high performance type; the residual core determining unit comprises a high-performance allocation subunit, and is used for determining a target core resource meeting the condition of the fastest processing speed aiming at the current operator graph meeting the performance condition through a dichotomy in the residual core resources of the current processing operation time period.
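One way to picture a dichotomy search for the fastest core count is the Python sketch below. It is an assumption-laden illustration rather than the procedure of the source: the function name `fastest_core_count` and the `processing_time` callback are hypothetical, and the search is only correct if the processing time first strictly decreases and then strictly increases as cores are added (or is strictly monotonic over the range).

```python
from typing import Callable


def fastest_core_count(
    lo: int,
    hi: int,
    processing_time: Callable[[int], float],
) -> int:
    """Binary search over [lo, hi] for the core count with the shortest
    processing time, assuming the time curve has a single minimum."""
    while lo < hi:
        mid = (lo + hi) // 2
        if processing_time(mid) > processing_time(mid + 1):
            lo = mid + 1  # still improving: the minimum lies to the right
        else:
            hi = mid      # no longer improving: the minimum is at mid or left
    return lo
```

Here `hi` would be bounded by the remaining core resources of the currently processed operation time period, so the search needs only a logarithmic number of trial runs.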
Further, the resource allocation types include: an energy saving type; the residual core determining unit comprises an energy-saving distribution subunit and is used for calculating the minimum sub-core resource corresponding to the current operator graph meeting the performance condition; accumulating at least one resource in the corresponding minimum sub-core resource to obtain at least one alternative sub-core resource, wherein the alternative sub-core resource is less than or equal to the residual core resource of the currently processed operation time period; and screening the alternative sub-core resources meeting the condition that the processing time is reduced fastest from all the alternative sub-core resources, and determining the alternative sub-core resources as the target core resources of the current operator graph meeting the performance condition.
Further, the resource allocation types include: a balanced type; the remaining core determining unit includes a balanced allocation subunit, configured to determine, in the remaining core resources of the currently processed operation time period, at least one sub-core resource corresponding to the current operator graph satisfying the performance condition, wherein the number of each sub-core resource is less than or equal to the number of the remaining core resources of the currently processed operation time period; and calculate the energy consumption ratio matched with each sub-core resource according to the following formula:
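$$\text{energy consumption ratio} = \frac{\text{processing time of the core resource} \times \text{calculation speed of the core resource}}{\text{power consumption of transporting data} + \text{power consumption of computation by the core resource}}$$ (formula image BDA0003047030360000241)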
and determine the sub-core resource matched with the maximum energy consumption ratio as the target core resource of the current operator graph satisfying the performance condition.
The resource allocation apparatus for an operator graph can execute the resource allocation method of the operator graph provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example six
Fig. 6 is a schematic structural diagram of a computer device according to a sixth embodiment of the present invention. FIG. 6 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in FIG. 6 is only an example and should not impose any limitations on the functionality or scope of use of embodiments of the present invention.
As shown in FIG. 6, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16. The computer device 12 may be a device that is attached to a bus.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, such architectures can include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an enhanced ISA bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard disk drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from and writing to a removable, non-volatile optical disk (e.g., a Compact Disc Read-Only Memory (CD-ROM), Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
Program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which or some combination of which may comprise an implementation of a network environment. Program modules 42 optionally perform the functions and/or methodologies of the described embodiments of the invention.
Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an Input/Output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN) or a Wide Area Network (WAN)) via network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18. It should be understood that although not shown in FIG. 6, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes programs stored in the system memory 28 to perform various functional applications and data processing, for example, to implement the resource allocation method for an operator graph provided by any of the embodiments of the present invention.
Example seven
The seventh embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the resource allocation method for an operator graph provided in any of the embodiments of the present invention.
That is, the program, when executed by the processor, implements: acquiring a demanded operator graph configured with operation demand information from an operator graph set to be run in a many-core system; dividing an operation cycle into a plurality of operation time periods, and allocating an operation time period to each demanded operator graph according to the operation demand information of each demanded operator graph; and determining the target core resource for running each demanded operator graph in each operation time period according to the resource allocation type and the operation demand information of each demanded operator graph; wherein the operation data of each demanded operator graph in its allocated operation time period meets the operation demand information within the operation cycle.
Computer storage media for embodiments of the present invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a RAM, a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a LAN or a WAN, or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will appreciate that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions will now be apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in some detail by the above embodiments, the invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the invention, and the scope of the invention is determined by the scope of the appended claims.

Claims (16)

1. A resource allocation method of an operator graph is applied to a many-core system, wherein the many-core system comprises allocable core resources, and the method comprises the following steps:
acquiring a demanded operator graph configured with operation demand information in an operator graph set to be operated in a many-core system;
dividing an operation cycle into a plurality of operation time periods, and distributing the operation time periods for each demanded operator graph according to the operation demand information of each demanded operator graph;
determining target core resources for operating each demanded operator graph in each operation time period according to the resource allocation type and the operation demand information of each demanded operator graph;
wherein the operation data of each demanded operator graph in its allocated operation time period meets the operation demand information within the operation cycle.
2. The method of claim 1, wherein the dividing the operation cycle into a plurality of operation time segments and allocating an operation time segment to each demanded operator graph according to the operation demand information of each demanded operator graph comprises:
alternately executing the operation of dividing an operation time period in the operation cycle and the operation of determining, according to the operation demand information of each demanded operator graph currently to be allocated, each target demand operator graph that runs in the divided operation time period, until all demanded operator graphs have been allocated a matched operation time period in the operation cycle.
3. The method according to claim 2, wherein the determining the target demand operator graphs operating in the divided operation time period according to the operation demand information of the demand operator graphs to be currently distributed comprises:
and determining each target demand operator graph operating in the divided operation time period according to the intra-segment demand information obtained after converting the operation demand information of the current demand operator graph to be allocated into the divided operation time period and the operation data of the current demand operator graph to be allocated in the divided operation time period.
4. The method of claim 2, wherein dividing an operating period within an operating cycle comprises:
in the operation period, dividing a first operation time period in the operation period according to a time starting point and a preset duration of the operation period; or alternatively
grouping the demanded operator graphs currently to be allocated, and determining the number of groups formed;
determining the remaining time length according to the operation period and the time length corresponding to each divided operation time period;
dividing the residual time length according to the group number to obtain divided time lengths;
and dividing an operation time period in the operation cycle according to the end point of the operation time period at the last of the time sequence in the operation cycle and the time division length.
5. The method according to claim 2, wherein the determining the target demand operator graphs operating in the divided operation time period according to the operation demand information of the demand operator graphs to be currently distributed comprises:
determining whether the demanded operator graphs currently to be allocated include a target operator graph group, wherein the target operator graph group comprises a plurality of first operator graphs, each first operator graph is configured with operation demand information, and a time sequence relationship exists among the plurality of first operator graphs in the target operator graph group;
when the existence of the target operator graph group is determined, determining the operation time period of each first operator graph in the target operator graph group to be operated in the many-core system according to the operation requirement information of each first operator graph in the target operator graph group.
6. The method of claim 5, wherein the determining, according to the operation requirement information of each first operator graph in the target operator graph group, an operation time period in which each first operator graph in the target operator graph group is to be operated in the many-core system comprises:
aiming at the current processing running time period, selecting a current operator graph from each first operator graph to be distributed currently in the target operator graph group;
if the operation data of the current operator graph in the current operation time period meets the performance condition, determining the current operation time period as the operation time period of the current operator graph to be operated in the many-core system;
returning to execute the operation of selecting the current operator graph from the first operator graphs to be distributed currently in the target operator graph group until the current operator graph is determined not to meet the performance condition;
and if the current operator graph is determined not to meet the performance condition and/or the residual core resources except the target core resources are determined not to exist, finishing the verification of the target operator graph group aiming at the current processing running time period.
7. The method of claim 5, further comprising:
for any operation time period, when determining that the residual core resources except the target core resources exist, determining a second operator graph from the demanded operator graphs, and determining the core resources for operating the second operator graph according to the residual core resources, wherein the second operator graph does not have a time sequence relation with any operator graph.
8. The method of claim 1, further comprising:
for any operation time period, when it is determined that remaining core resources exist other than the target core resources, determining a no-demand operator graph from the operator graph set, and determining, according to the remaining core resources, the core resources for operating the no-demand operator graph, wherein the no-demand operator graph is not configured with operation requirement information.
9. The method of claim 8, wherein the determining a no-demand operator graph from the set of operator graphs comprises:
when at least one third operator graph exists, determining the third operator graph as the no-demand operator graph, wherein the third operator graph is not configured with operation requirement information and has a time sequence relation with at least one operator graph;
and when no third operator graph exists, or after every third operator graph has been evaluated, and the current remaining core resources are not empty and at least one fourth operator graph exists, determining the fourth operator graph as the no-demand operator graph, wherein the fourth operator graph is not configured with operation requirement information and has no time sequence relation with any operator graph.
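The ordering in claims 8 and 9 (graphs without requirement information that have a time sequence relation are considered first, then those without one) can be sketched as below. This is a minimal sketch under stated assumptions: the `OperatorGraph` record, its `min_cores` field, and the rule that a graph is skipped when it does not fit are all hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OperatorGraph:
    name: str
    has_requirement: bool      # configured with operation requirement information?
    has_timing_relation: bool  # time sequence relation with at least one graph?
    min_cores: int = 1         # hypothetical: cores needed to run the graph


def pick_no_demand_graphs(
    graphs: List[OperatorGraph], remaining_cores: int
) -> Tuple[List[Tuple[str, int]], int]:
    """Fill leftover cores with no-demand graphs, 'third' before 'fourth'."""
    no_demand = [g for g in graphs if not g.has_requirement]
    third = [g for g in no_demand if g.has_timing_relation]
    fourth = [g for g in no_demand if not g.has_timing_relation]
    chosen: List[Tuple[str, int]] = []
    for g in third + fourth:
        if remaining_cores <= 0:
            break  # nothing left to hand out
        if g.min_cores <= remaining_cores:  # assumption: skip graphs that do not fit
            chosen.append((g.name, g.min_cores))
            remaining_cores -= g.min_cores
    return chosen, remaining_cores


graphs = [
    OperatorGraph("A", True, True, 2),    # demanded graph, ignored here
    OperatorGraph("B", False, True, 2),   # third operator graph
    OperatorGraph("C", False, False, 1),  # fourth operator graph
    OperatorGraph("D", False, True, 5),   # third operator graph, too large
]
chosen, left = pick_no_demand_graphs(graphs, remaining_cores=4)
print(chosen, left)
```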
10. The method of claim 6, wherein the determining, according to the resource allocation type and the operation requirement information of each demanded operator graph, the target core resource for operating each demanded operator graph in each operation time period comprises:
determining the remaining core resources of the operation time period currently being processed according to the allocable core resources of the many-core system and the core resources already allocated in that operation time period;
and allocating, from the remaining core resources of the operation time period currently being processed, a target core resource to the current operator graph meeting the performance condition according to the resource allocation type.
11. The method of claim 10, wherein the resource allocation type comprises: a high performance type;
the allocating, from the remaining core resources of the operation time period currently being processed, a target core resource to the current operator graph meeting the performance condition according to the resource allocation type comprises:
determining, by bisection within the remaining core resources of the operation time period currently being processed, the target core resource that meets the fastest-processing-speed condition for the current operator graph meeting the performance condition.
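One way to read the bisection ("dichotomy") search of claim 11 is as a binary search over the number of cores. The sketch below assumes, beyond what the claim states, that processing time first strictly decreases and then stops decreasing as cores are added (for example because of communication overhead); `fastest_core_count` and `time_for` are hypothetical names.

```python
from typing import Callable


def fastest_core_count(time_for: Callable[[int], float], max_cores: int) -> int:
    """Binary-search the core count in [1, max_cores] with the shortest time.

    Assumes time_for(n) strictly decreases up to some n and does not
    decrease afterwards (a unimodal cost curve).
    """
    if max_cores < 1:
        raise ValueError("no remaining core resources")
    lo, hi = 1, max_cores
    while lo < hi:
        mid = (lo + hi) // 2
        if time_for(mid + 1) < time_for(mid):
            lo = mid + 1  # still getting faster: the optimum lies to the right
        else:
            hi = mid      # no further speed-up: the optimum is mid or to the left
    return lo


# Hypothetical cost model: 100 units of work plus 2 units of overhead per core.
def model(n: int) -> float:
    return 100 / n + 2 * n


print(fastest_core_count(model, 16), fastest_core_count(model, 4))
```

When the remaining core resources are fewer than the unconstrained optimum, the search simply returns all remaining cores, as the second call shows.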
12. The method of claim 10, wherein the resource allocation type comprises: an energy saving type;
the allocating, from the remaining core resources of the operation time period currently being processed, a target core resource to the current operator graph meeting the performance condition according to the resource allocation type comprises:
calculating the minimum sub-core resource corresponding to the current operator graph meeting the performance condition;
adding at least one resource to the corresponding minimum sub-core resource to obtain at least one candidate sub-core resource, wherein each candidate sub-core resource is less than or equal to the remaining core resources of the operation time period currently being processed;
and selecting, from all the candidate sub-core resources, the candidate sub-core resource that meets the condition of the fastest reduction in processing time, and determining it as the target core resource of the current operator graph meeting the performance condition.
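The energy-saving selection of claim 12 can be illustrated with the sketch below. The claim does not define how "fastest reduction in processing time" is measured; this sketch assumes it means the largest time saved per core added on top of the minimum, and it falls back to the minimum when no candidate saves any time. The function name, `time_for`, and the lookup table are hypothetical.

```python
from typing import Callable


def energy_saving_core_count(
    time_for: Callable[[int], float], min_cores: int, remaining_cores: int
) -> int:
    """Pick the candidate core count with the steepest time reduction."""
    if min_cores > remaining_cores:
        raise ValueError("minimum sub-core resource exceeds remaining cores")
    # Candidates: the minimum plus at least one extra core, capped by what is left.
    candidates = range(min_cores + 1, remaining_cores + 1)
    if not candidates:
        return min_cores  # nothing can be added
    base = time_for(min_cores)

    def rate(n: int) -> float:
        # Assumed metric: processing time saved per core added beyond the minimum.
        return (base - time_for(n)) / (n - min_cores)

    best = max(candidates, key=rate)  # ties resolve to the smallest core count
    return best if rate(best) > 0 else min_cores


# Hypothetical measured processing times, indexed by core count.
times = {2: 10.0, 3: 10.0, 4: 4.0, 5: 4.0}
print(energy_saving_core_count(times.__getitem__, min_cores=2, remaining_cores=5))
```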
13. The method of claim 10, wherein the resource allocation type comprises: a balanced type;
the allocating, from the remaining core resources of the operation time period currently being processed, a target core resource to the current operator graph meeting the performance condition according to the resource allocation type comprises:
determining, within the remaining core resources of the operation time period currently being processed, at least one sub-core resource corresponding to the current operator graph meeting the performance condition, wherein the quantity of each sub-core resource is less than or equal to the quantity of the remaining core resources of the operation time period currently being processed;
calculating the energy consumption ratio matched with each sub-core resource according to the following formula:
[Formula image FDA0003047030350000031: energy consumption ratio, not reproduced in this text]
and determining the sub-core resource matched with the maximum energy consumption ratio as the target core resource of the current operator graph meeting the performance condition.
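The final selection step of claim 13 amounts to taking the candidate with the largest energy consumption ratio. Because the formula itself is only available as an image in this text, the sketch below does not attempt to reproduce it: the ratio is supplied by the caller through the hypothetical parameter `energy_ratio`, and the numeric table is made-up illustration data.

```python
from typing import Callable, Iterable


def balanced_core_count(
    candidates: Iterable[int],
    remaining_cores: int,
    energy_ratio: Callable[[int], float],
) -> int:
    """Return the feasible core count whose energy consumption ratio is largest."""
    # Keep only sub-core resources that fit into the remaining core resources.
    feasible = [n for n in candidates if 0 < n <= remaining_cores]
    if not feasible:
        raise ValueError("no sub-core resource fits the remaining cores")
    return max(feasible, key=energy_ratio)


# Made-up ratios per core count, standing in for the formula in the claim.
ratios = {1: 0.5, 2: 0.9, 4: 0.7, 8: 1.2}
print(balanced_core_count(ratios, remaining_cores=4, energy_ratio=ratios.__getitem__))
```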
14. An apparatus for resource allocation of operator graphs, configured for a many-core system, the many-core system including allocable core resources, the apparatus comprising:
a demanded operator graph acquisition module, configured to acquire, from an operator graph set to be operated in the many-core system, the demanded operator graphs configured with operation requirement information;
an operation time period allocation module, configured to divide an operation cycle into a plurality of operation time periods and to allocate the operation time periods to the demanded operator graphs according to the operation requirement information of each demanded operator graph;
a target core resource allocation module, configured to determine, according to the resource allocation type and the operation requirement information of each demanded operator graph, the target core resources for operating each demanded operator graph in each operation time period, wherein the operation data of each demanded operator graph in its allocated operation time period meets the operation requirement information within the operation cycle.
15. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method for resource allocation of an operator graph according to any of claims 1-13 when executing the program.
16. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method for resource allocation of an operator graph according to any one of claims 1-13.
CN202110474902.9A 2020-08-27 2021-04-29 Resource allocation method and device for computer graph, computer equipment and storage medium Pending CN115269163A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110474902.9A CN115269163A (en) 2021-04-29 2021-04-29 Resource allocation method and device for computer graph, computer equipment and storage medium
PCT/CN2021/114217 WO2022042519A1 (en) 2020-08-27 2021-08-24 Resource allocation method and apparatus, and computer device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110474902.9A CN115269163A (en) 2021-04-29 2021-04-29 Resource allocation method and device for computer graph, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115269163A true CN115269163A (en) 2022-11-01

Family

ID=83744802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110474902.9A Pending CN115269163A (en) 2020-08-27 2021-04-29 Resource allocation method and device for computer graph, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115269163A (en)

Similar Documents

Publication Publication Date Title
CN111427681B (en) Real-time task matching scheduling system and method based on resource monitoring in edge computing
WO2016078008A1 (en) Method and apparatus for scheduling data flow task
US20150295970A1 (en) Method and device for augmenting and releasing capacity of computing resources in real-time stream computing system
CN110389816B (en) Method, apparatus and computer readable medium for resource scheduling
CN112068957B (en) Resource allocation method, device, computer equipment and storage medium
CN109886859B (en) Data processing method, system, electronic device and computer readable storage medium
CN110413412B (en) GPU (graphics processing Unit) cluster resource allocation method and device
CN109918182B (en) Multi-GPU task scheduling method under virtualization technology
CN110300959B (en) Method, system, device, apparatus and medium for dynamic runtime task management
CN111104211A (en) Task dependency based computation offload method, system, device and medium
US9471387B2 (en) Scheduling in job execution
CN111142938A (en) Task processing method and task processing device of heterogeneous chip and electronic equipment
CN112130966A (en) Task scheduling method and system
CN105488134A (en) Big data processing method and big data processing device
CN108170861B (en) Distributed database system collaborative optimization method based on dynamic programming
CN116708451A (en) Edge cloud cooperative scheduling method and system
CN116069480B (en) Processor and computing device
CN115269163A (en) Resource allocation method and device for computer graph, computer equipment and storage medium
CN114283046B (en) Point cloud file registration method and device based on ICP (iterative closest point) algorithm and storage medium
CN112988363B (en) Resource scheduling method, device, server and storage medium
CN115904510A (en) Multi-operand instruction processing method, graphics processor and storage medium
CN110825502A (en) Neural network processor and task scheduling method for neural network processor
Wang et al. On scheduling algorithms for mapreduce jobs in heterogeneous clouds with budget constraints
CN115269165A (en) Operator graph resource allocation method and device, computer equipment and storage medium
CN115269164A (en) Resource allocation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination