WO2022042519A1 - Resource allocation method and apparatus, computer device, and computer-readable storage medium - Google Patents

Resource allocation method and apparatus, computer device, and computer-readable storage medium

Info

Publication number
WO2022042519A1
WO2022042519A1 PCT/CN2021/114217 CN2021114217W WO2022042519A1 WO 2022042519 A1 WO2022042519 A1 WO 2022042519A1 CN 2021114217 W CN2021114217 W CN 2021114217W WO 2022042519 A1 WO2022042519 A1 WO 2022042519A1
Authority
WO
WIPO (PCT)
Prior art keywords
operator
measured
resource
subgraph
resources
Prior art date
Application number
PCT/CN2021/114217
Inventor
吴欣洋
李涵
丁瑞强
孟凡辉
戚海涛
冯开革
陈锐
李康
祝夭龙
胡川
Original Assignee
北京灵汐科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202010879892.2A (CN112068957B)
Priority claimed from CN202110476735.1A (CN115269165A)
Priority claimed from CN202110474902.9A (CN115269163A)
Priority claimed from CN202110476902.2A (CN115269166A)
Priority claimed from CN202110475134.9A (CN115269164A)
Application filed by 北京灵汐科技有限公司
Publication of WO2022042519A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • Embodiments of the present invention relate to the field of artificial intelligence, and in particular, to a resource allocation method and apparatus, computer equipment, and a computer-readable storage medium.
  • an artificial intelligence (AI) chip can be a many-core system.
  • the many-core system includes multiple independently schedulable processing cores, and multiple processing cores can cooperate to process a task (such as an operator graph).
  • the allocation of system resources is important.
  • Embodiments of the present invention provide a resource allocation method and apparatus, computer equipment, and a computer-readable storage medium, which can reasonably configure the resources of the many-core system to improve the resource utilization rate of the many-core system.
  • an embodiment of the present invention provides a resource allocation method, which includes: determining a plurality of operator graphs to which resources are to be allocated, and determining that at least part of the plurality of operator graphs are measured operator graphs; running the measured operator graphs with test resources, and determining the target resource of each measured operator graph according to its running information when running under the test resources; wherein the target resource of an operator graph is the resource occupied by the operator graph when it runs in a many-core system.
  • the resources include running resources; and determining the target resource of the measured operator graph includes: determining a test resource that meets the condition of the resource allocation type as the target running resource of the measured operator graph.
  • running the measured operator graph with the test resources, and determining the target resource of the measured operator graph according to its running information when running under the test resources, includes: running the measured operator graph with each of multiple levels of test resources, and, according to the running information of the measured operator graph under each level of test resources, determining the level of test resources that meets the condition of the resource allocation type as the target running resource of the measured operator graph; for any two levels of test resources, the running resources of the higher level are larger than the running resources of the lower level.
  • the resource allocation type includes a high-performance type; the test resource that meets the condition of the high-performance type is: among all levels of test resources, the level that gives the measured operator graph the fastest processing speed.
  • the resource allocation type includes an energy-saving type; the difference in the amount of running resources between any two adjacent levels of test resources is equal; the test resource that meets the condition of the energy-saving type is: the lowest level among all levels of test resources whose time reduction value is greater than a preset threshold; wherein the time reduction value of a level of test resources is the reduction in the time taken to process predetermined data when the measured operator graph is run with that level, relative to the time taken when it is run with the test resources one level lower.
  • the resource allocation type includes a balanced type; the test resource that meets the condition of the balanced type is: among all levels of test resources, the level with the largest energy consumption ratio; wherein the energy consumption ratio of a level of test resources is the amount of data that can be processed per unit of energy consumed when the measured operator graph is run with that level.
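  • The three selection rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's implementation: the `Measurement` record, the field names, the type labels, and the choice to return `None` when no level passes the energy-saving test are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Hypothetical running information recorded for one test-resource level."""
    level: int         # higher level = more running resources
    elapsed_s: float   # time taken to process the predetermined data
    data_units: float  # amount of data processed during the run
    energy_j: float    # energy consumed during the run


def pick_level(measurements, alloc_type, threshold_s=0.0):
    """Return the level that meets the condition of the given allocation type."""
    ms = sorted(measurements, key=lambda m: m.level)
    if alloc_type == "high_performance":
        # Level with the fastest processing speed.
        return max(ms, key=lambda m: m.data_units / m.elapsed_s).level
    if alloc_type == "balanced":
        # Level with the most data processed per unit of energy.
        return max(ms, key=lambda m: m.data_units / m.energy_j).level
    if alloc_type == "energy_saving":
        # Lowest level whose time reduction, relative to the level directly
        # below it, exceeds the preset threshold.
        for lower, current in zip(ms, ms[1:]):
            if lower.elapsed_s - current.elapsed_s > threshold_s:
                return current.level
        return None  # assumption: the source does not say what happens here
    raise ValueError(f"unknown allocation type: {alloc_type}")


runs = [
    Measurement(level=1, elapsed_s=10.0, data_units=100.0, energy_j=50.0),
    Measurement(level=2, elapsed_s=6.0, data_units=100.0, energy_j=60.0),
    Measurement(level=3, elapsed_s=5.0, data_units=100.0, energy_j=90.0),
]
```

  • With these made-up numbers, the high-performance rule picks level 3, the balanced rule picks level 1, and the energy-saving rule with a 2-second threshold picks level 2.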
  • the resource allocation type includes any one of a high-performance type, an energy-saving type, and a balanced type; if the total amount of configuration data of all the measured operator graphs is greater than the on-chip storage space of the many-core system, the resource allocation type includes any one of the high-performance type and the balanced type.
  • the test resource that meets the condition of the high-performance type is: among all levels of test resources, the level that gives the measured operator graph the fastest processing speed; the test resource that meets the condition of the energy-saving type is: the lowest level among all levels of test resources whose time reduction value is greater than the preset threshold, wherein the time reduction value of a level of test resources is the reduction in the time taken to process predetermined data when the measured operator graph is run with that level, relative to the time taken when it is run with the test resources one level lower.
  • the test resource that meets the condition of the balanced type is: among all levels of test resources, the level with the largest energy consumption ratio; wherein the energy consumption ratio of a level of test resources is the amount of data that can be processed per unit of energy consumed when the measured operator graph is run with that level.
  • the resources include time resources; determining that at least part of the plurality of operator graphs are measured operator graphs includes: determining a plurality of measured operator graph groups, each measured operator graph group including at least one measured operator graph; running the measured operator graph with the test resources, and determining the target resource of the measured operator graph according to its running information when running under the test resources, includes: loading the measured operator graph groups into the many-core system in a time-sharing manner for running, and determining the target time resource of each measured operator graph group according to the total running information of all the measured operator graph groups.
  • loading the measured operator graph groups into the many-core system in a time-sharing manner for running includes: loading the measured operator graph groups into the many-core system in a time-sharing manner so that each runs for an equal time period.
  • determining the target time resource of each measured operator graph group according to the total running information of all the measured operator graph groups includes: according to the running information of all the measured operator graph groups, determining the time resource that enables the total running information of all the measured operator graph groups to meet the demand information as the target time resource of each measured operator graph group.
  • the demand information includes processing speed demand information.
  • the determining the target time resource of each measured operator sub-graph group includes: determining the running time proportion of each measured operator sub-graph group in each predetermined operation cycle.
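  • As a rough illustration of turning a processing-speed demand into a running-time proportion per operation cycle, the sketch below assumes a simple linear model that the source does not state: a group that achieves a measured rate while it holds the system achieves that rate multiplied by its time share overall, and loading and switching overhead are ignored. The function name and the rate dictionaries are hypothetical.

```python
def time_shares(measured_rate, required_rate):
    """Return the fraction of each operation cycle given to each group.

    measured_rate: data processed per second while the group is running.
    required_rate: data per second the group must sustain overall.
    Assumes effective rate = share * measured rate (no switching overhead).
    """
    shares = {
        group: required_rate[group] / measured_rate[group]
        for group in measured_rate
    }
    total = sum(shares.values())
    if total > 1.0:
        # The demands cannot all be met within one cycle under this model.
        raise ValueError(f"demands need {total:.2f} cycles of time per cycle")
    return shares


shares = time_shares(
    measured_rate={"group_a": 100.0, "group_b": 50.0},
    required_rate={"group_a": 25.0, "group_b": 25.0},
)
```

  • Here `group_a` needs a quarter of each cycle and `group_b` needs half; the remaining quarter is left unassigned in this sketch.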
  • the resources include running resources; determining that at least part of the plurality of operator graphs are measured operator graphs includes: if the plurality of operator graphs include a first operator graph having demand information, determining at least that the first operator graph is a measured operator graph; running the measured operator graph with the test resources, and determining the target resource of the measured operator graph according to its running information when running under the test resources, includes: if the first operator graph exists, running the first operator graph with the test resources, and, according to the running information of the first operator graph when running under the test resources, determining the running resource that enables the running information of the first operator graph to meet the demand information as the target running resource of the first operator graph.
  • running the first operator graph with the test resources, and determining the running resource that enables the running information of the first operator graph to meet the demand information as the target running resource of the first operator graph, includes: if the first operator graphs include a first operator graph with a time sequence relationship and a first operator graph without a time sequence relationship, then: running the first operator graph with a time sequence relationship with the test resources, and, according to its running information when running under the test resources, determining the running resource that enables its running information to meet the demand information as the target running resource of the first operator graph with a time sequence relationship; and running the first operator graph without a time sequence relationship with the test resources, and, according to its running information when running under the test resources, determining the running resource that enables its running information to meet the demand information as the target running resource of the first operator graph without a time sequence relationship.
  • the method further includes: if the plurality of operator graphs include a second operator graph without demand information, determining the remaining running resources, other than the target running resources of the first operator graph, as the target running resources of the second operator graph.
  • determining at least that the third operator graph is a measured operator graph further includes: determining a plurality of measured operator graph groups, each measured operator graph group including at least one third operator graph, wherein any two third operator graphs that have a time sequence relationship with each other are located in the same measured operator graph group or in adjacent measured operator graph groups; running the third operator graph with the test resources, and determining the target running resource of the third operator graph according to its running information when running under the test resources, includes: running each measured operator graph group with the test resources, and determining the target running resource of each measured operator graph group according to its running information when running under the test resources.
  • the method further includes: if the plurality of operator graphs include a fourth operator graph without a time sequence relationship, determining the remaining running resources, other than the target running resources of the third operator graph, as the target running resources of the fourth operator graph.
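  • Both "remaining resources" rules above reduce to the same set arithmetic: whatever the measured graphs do not occupy goes to the graph that was not measured. A minimal sketch, assuming running resources are modelled as sets of processing-core IDs (the core numbering and graph names are hypothetical):

```python
def remaining_cores(all_cores, measured_targets):
    """Return the cores left after the measured graphs take their targets.

    all_cores: set of core IDs in the many-core system.
    measured_targets: mapping of measured graph name -> set of core IDs
        chosen as that graph's target running resource.
    """
    used = set()
    for graph, cores in measured_targets.items():
        if not cores <= all_cores:
            raise ValueError(f"{graph} targets cores outside the system")
        if used & cores:
            raise ValueError(f"{graph} overlaps another graph's cores")
        used |= cores
    return all_cores - used


system_cores = set(range(8))
leftover = remaining_cores(
    system_cores,
    {"first_graph_a": {0, 1, 2}, "first_graph_b": {3, 4}},
)
```

  • In this example cores 5, 6, and 7 would become the target running resource of the unmeasured graph.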
  • the resources include running resources and time resources; determining that at least part of the plurality of operator graphs are measured operator graphs includes: if the plurality of operator graphs include a third operator graph having a time sequence relationship, determining at least that the third operator graph is a measured operator graph, and determining a plurality of measured operator graph groups, each measured operator graph group including at least one third operator graph; running the measured operator graph with the test resources, and determining the target resource of the measured operator graph according to its running information when running under the test resources, includes: loading the measured operator graph groups into the many-core system in a time-sharing manner for running, determining the target time resource of each measured operator graph group according to the running information of each measured operator graph group, and determining the test resource that meets the condition of the resource allocation type as the target running resource of the measured operator graph group.
  • the running the measured operator subgraph using the test resource includes: loading the measured operator subgraph into a many-core system, and running the measured operator subgraph using the test resource in the many-core system.
  • running the measured operator graph with the test resources, and determining the target resource of the measured operator graph according to its running information when running under the test resources, includes: running all the measured operator graphs with the test resources, and determining the target resources of all the measured operator graphs according to the running information of all the measured operator graphs when running under the test resources.
  • determining that at least part of the plurality of operator graphs are measured operator graphs includes: determining a plurality of measured operator graph groups, each measured operator graph group including at least one measured operator graph; running the measured operator graph with the test resources, and determining the target resource of the measured operator graph according to its running information when running under the test resources, includes: running each measured operator graph group with the test resources, and determining the target resource of each measured operator graph group according to its running information when running under the test resources.
  • the total amount of configuration data of the measured operator subgraphs in each of the measured operator subgraph groups is less than or equal to the on-chip storage space of the many-core system.
  • the determining the target resource of the measured operator subgraph includes: determining that the resource that can make the operation information of the measured operator subgraph meet the demand information is the target resource of the measured operator subgraph.
  • determining that the resource that can make the running information of the measured operator graph meet the demand information is the target resource of the measured operator graph includes: determining the minimum resource that can make the running information of the measured operator graph meet the demand information as the target resource of the measured operator graph.
  • determining that the resource that can make the running information of the measured operator graph meet the demand information is the target resource of the measured operator graph includes: if no resource can be determined that makes the running information of the measured operator graph meet the demand information, issuing a prompt.
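  • The "minimum resource that meets the demand, otherwise prompt" rule can be sketched as a scan over candidate resources from smallest to largest. The sketch below is illustrative only: `run` stands in for actually running the measured operator graph under a test resource, and the latency model in the example is invented.

```python
def minimum_resource(candidates, run, meets_demand):
    """Return the smallest candidate whose running information meets demand.

    candidates: iterable of comparable resource amounts (e.g. core counts).
    run: callable(resource) -> running information (hypothetical stand-in
        for running the measured operator graph under that test resource).
    meets_demand: callable(running_info) -> bool.
    Returns None when no candidate meets the demand, so the caller can
    issue a prompt.
    """
    for resource in sorted(candidates):
        if meets_demand(run(resource)):
            return resource
    return None


def fake_run(cores):
    # Invented model: latency in milliseconds shrinks with more cores.
    return 1000.0 / cores


target = minimum_resource([4, 8, 16, 32], fake_run, lambda ms: ms <= 100.0)
unreachable = minimum_resource([4, 8, 16, 32], fake_run, lambda ms: ms <= 10.0)
if unreachable is None:
    print("prompt: no test resource meets the demand information")
```

  • With this model, 16 cores is the smallest candidate that brings latency to 100 ms or below, and no candidate reaches 10 ms, so the prompt is printed.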
  • an embodiment of the present invention provides a resource allocation apparatus, which includes: an operator graph determination module, configured to determine a plurality of operator graphs to which resources are to be allocated, and to determine that at least part of the plurality of operator graphs are measured operator graphs; and a resource determination module, configured to run the measured operator graph with test resources, and to determine the target resource of the measured operator graph according to its running information when running under the test resources; wherein the target resource of an operator graph is the resource occupied by the operator graph when it runs in the many-core system.
  • an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when executing the computer program, the processor implements any one of the resource allocation methods of the embodiments of the present invention.
  • an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements any one of the resource allocation methods of the embodiments of the present invention.
  • embodiments of the present invention first determine the running state (running information) of the operator graph when it actually runs under different test resources, so that the actual effect of running the operator graph with various resources (test resources) is clearly known, and then determine, according to this effect, the actual resources (target resources) allocated to each operator graph in the many-core system, so that the resources of the many-core system are allocated reasonably and resource utilization is improved.
  • FIG. 1 is a schematic flowchart of a resource allocation method according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of another resource allocation method provided by an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of another resource allocation method provided by an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of another resource allocation method provided by an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of another resource allocation method provided by an embodiment of the present invention.
  • FIG. 6 is a structural block diagram of a resource allocation apparatus according to an embodiment of the present invention.
  • FIG. 7 is a structural block diagram of a computer device according to an embodiment of the present invention.
  • FIG. 8 is a structural block diagram of a computer-readable storage medium provided by an embodiment of the present invention.
  • an embodiment of the present invention provides a resource allocation method.
  • the resource allocation method in the embodiment of the present invention may be executed by a corresponding resource allocation apparatus, and the apparatus may be implemented in software and/or hardware, and may generally be integrated with computer equipment or the like.
  • the resource allocation method of the embodiment of the present invention is used to allocate target resources to each operator graph that needs to run in the many-core system, so that when each operator graph runs in the many-core system, it actually occupies its own target resource.
  • the above resource allocation may be performed when the many-core system runs the operator graph (ie, dynamic allocation), or may be performed in advance (pre-allocation) when the operator graph is compiled.
  • a many-core (Many Core) system refers to a processing core collection system composed of a large number of processing cores connected together in a preset manner.
  • each processing core is the smallest unit that can be independently scheduled and has independent computing capabilities, that is, each processing core has its own independent storage resources and computing resources, so that required operations can be performed independently.
  • processing cores in the many-core system can be connected to each other through routing (for example, the routing can be in the form of a bus, an on-chip network, etc.), so that information exchange can be realized between any two processing cores.
  • multiple processing cores in the many-core system can cooperate with each other to perform certain operations.
  • the many-core system may also include some other units, such as a scheduler that controls each processing core, and an on-chip storage space that can be accessed by each processing core.
  • the many-core system may be a multi-core chip, a combination of multiple single-core chips, or a combination of multiple multi-core chips.
  • each operator graph includes at least one operator (or operation, such as convolution, addition, subtraction, multiplication and division, matrix addition and multiplication, etc.), and when the operator graph includes at least two operators, different operators can have a certain relationship, such as the output of one operator serving as the input of the adjacent following operator. Different operators may also be unrelated.
  • the operator graph may include a plurality of parallel branches, and each branch may include at least one operator. Different parallel branches may be unrelated to each other.
  • for example, the operator graph includes two operators, and the two operators are parallel. Different parallel branches can also have the same input.
  • the output of at least one operator may be used as the input of all parallel branches, or the output of at least one operator may be used as the input of some branches, and the inputs of different branches may be the outputs of different operators.
  • the present disclosure does not limit the number of operators included in the operator graph and the relationship between the operators.
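  • To make the structure concrete, an operator graph with a shared input and two parallel branches can be written as a small dependency map. The operator names and the dictionary layout below are hypothetical and are not taken from the patent; `graphlib.TopologicalSorter` is from the Python standard library (3.9+) and is used only to show one valid execution order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical operator graph: each key is an operator, each value is the
# set of operators whose outputs it consumes.
operator_graph = {
    "conv": set(),
    "relu_a": {"conv"},           # branch A, fed by conv
    "pool_b": {"conv"},           # branch B, fed by the same conv output
    "add": {"relu_a", "pool_b"},  # merges the two parallel branches
}


def execution_order(graph):
    """Return one order in which every operator follows its inputs."""
    return list(TopologicalSorter(graph).static_order())


def branch_heads(graph, producer):
    """Return the operators that directly consume the producer's output."""
    return sorted(op for op, inputs in graph.items() if producer in inputs)


order = execution_order(operator_graph)
heads = branch_heads(operator_graph, "conv")
```

  • The two branch heads share `conv` as their input and have no dependency on each other, which is the kind of parallelism the text describes.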
  • the operator graph as a whole is an algorithm used to achieve a relatively complete specific function, including but not limited to artificial intelligence (AI) algorithms, machine learning algorithms, general scientific computing algorithms, and so on.
  • the neural network is a deep learning model (machine learning algorithm) that can achieve certain functions, such as image detection neural network, speech detection neural network, image recognition (such as recognizing people, cars, etc.) neural network.
  • the division of the neural network may be different according to its functions.
  • if the image recognition neural network recognizes the image detected by the image detection neural network, they can be regarded as two neural networks working in succession, or as a single image "detection + recognition" neural network.
  • the operator graph may be a model, a part of a model, or the whole or part of multiple models.
  • an operator graph can be "divided (folded)" into multiple operator graphs, and multiple operator graphs can also be “merged” into one operator graph.
  • the application scenarios of the many-core system can be enriched, the business model of the operator graph can be enriched, and the resource utilization of the many-core system can be improved at the same time.
  • the resources of the many-core system may include operating resources, or hardware resources of the many-core system, such as the processing cores, threads, and on-chip storage space of the system.
  • the processing cores, threads, and on-chip storage space in the target running resources can only be occupied by the operator graph.
  • the resources of the many-core system may also include time resources; the target time resource allocated to an operator graph specifies when the operator graph is processed by the many-core system. That is, multiple operator graphs can be processed in the many-core system by time-division multiplexing: the many-core system loads and processes one or more operator graphs, transports the resulting data out or temporarily stores it in the on-chip storage space, then loads the configuration data of one or more other operator graphs into the many-core system and processes the newly loaded operator graphs. It should be understood that each operator graph processed by the many-core system at a given time should also be allocated some running resources (target running resources) in the many-core system.
  • the same resource cannot be allocated to multiple different operator graphs at the same time; that is, under the same time resource (at the same time), any running resource can be the target resource of at most one operator graph, but the same running resource can be allocated to different operator graphs at different times.
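  • The exclusivity rule can be checked mechanically once an allocation is written down. The sketch below assumes each allocation is a record with a graph name, a set of core IDs, and a half-open time slot `[start, end)`; this record layout is invented for illustration.

```python
from itertools import combinations


def find_conflicts(allocations):
    """Return (graph, graph, shared cores) for every violating pair.

    Two allocations conflict when they share at least one core and their
    half-open time slots [start, end) overlap.
    """
    conflicts = []
    for a, b in combinations(allocations, 2):
        shared = a["cores"] & b["cores"]
        overlap = a["start"] < b["end"] and b["start"] < a["end"]
        if shared and overlap:
            conflicts.append((a["graph"], b["graph"], sorted(shared)))
    return conflicts


plan = [
    {"graph": "g1", "cores": {0, 1}, "start": 0, "end": 10},
    {"graph": "g2", "cores": {1, 2}, "start": 10, "end": 20},  # reuses core 1 later
    {"graph": "g3", "cores": {1}, "start": 5, "end": 15},      # overlaps both
]
conflicts = find_conflicts(plan)
```

  • `g1` and `g2` both use core 1 but at different times, which is allowed; `g3` overlaps each of them in time on core 1, so both pairs are reported.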
  • the resource allocation method according to the embodiment of the present invention includes the following S001 to S002.
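  • S001: Determine a plurality of operator graphs to which resources are to be allocated, and determine that at least part of the plurality of operator graphs are measured operator graphs.
  • S002: Run the measured operator graph with test resources, and determine the target resource of the measured operator graph according to its running information when running under the test resources; wherein the target resource of an operator graph is the resource occupied by the operator graph when it runs in the many-core system.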
  • operation information refers to the parameters of various performances actually shown in the actual operation process of the operator graph, such as processing speed, energy consumption, data handling capacity, and the like.
  • certain resources (test resources) are used to actually run the measured operator graph, for example by letting the measured operator graph process some data (such as randomly generated images, voice, etc.), and the actual running information of the measured operator graph under these test resources determines which resources (target resources) should be allocated to each measured operator graph, that is, how each operator graph should run in the many-core system.
  • test resources and target resources should be part of the resources of the many-core system, and should not exceed the resources of the many-core system.
  • there are various specific ways of "running the measured operator graph with the test resources" and of "determining the target resource according to the running information".
  • the measured operator graph can be run under multiple different test resources, and, according to its running information under each test resource, one of the test resources can be determined as the target resource, or a target resource can be calculated from the test resources.
  • the multiple different test resources can be multiple predetermined resources, or gradually increased or decreased resources, or the next test resource can be determined according to the running information under the previous test resource, and so on.
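  • One way to "determine the next test resource from the running information under the previous one" is a bisection over the resource amount. This is a swapped-in, generic technique rather than anything the source prescribes, and it relies on an assumption the source does not state: that adding resources never makes the running information worse. `run` and the latency model are hypothetical.

```python
def bisect_min_cores(lo, hi, run, meets_demand):
    """Find the smallest core count in [lo, hi] that meets the demand.

    Each probe's result decides the next test resource. Assumes that if a
    core count meets the demand, every larger count does too.
    Returns None if even `hi` cores do not meet the demand.
    """
    if not meets_demand(run(hi)):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if meets_demand(run(mid)):
            hi = mid      # mid is enough; try fewer cores next
        else:
            lo = mid + 1  # mid is not enough; try more cores next
    return lo


probed = []


def fake_run(cores):
    # Invented model: latency in milliseconds shrinks with more cores.
    probed.append(cores)
    return 1000.0 / cores


best = bisect_min_cores(1, 64, fake_run, lambda ms: ms <= 100.0)
```

  • In this example the search settles on 10 cores after seven test runs, instead of trying all 64 amounts one by one.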
  • the operator graphs may also include "non-measured operator graphs"; these "non-measured operator graphs" can also be allocated corresponding target resources, but not by running them under test resources. For example, the resources remaining after removing the target resources of the measured operator graphs may be allocated to the "non-measured operator graphs".
  • embodiments of the present invention first determine the running state (running information) of the operator graph when it actually runs under different test resources, so that the actual effect of running the operator graph with various resources (test resources) is clearly known, and then determine, according to this effect, the actual resources (target resources) allocated to each operator graph in the many-core system, so that the resources of the many-core system are allocated reasonably and resource utilization is improved.
  • using the test resource to run the measured operator subgraph (S002) includes: loading the measured operator subgraph into the many-core system, and using the test resource to run the measured operator subgraph in the many-core system.
  • the measured operator graph may be actually loaded (mapped) into the many-core system, with some resources in the many-core system allocated as test resources, so as to determine its running information when it actually runs in the many-core system.
  • it is also feasible to use other test systems to "simulate" the running information of the measured operator graph under the test resources in the many-core system, as long as the running state in the test system is guaranteed to be the same as that in the many-core system.
  • using the test resources to run the measured operator graph, and determining the target resource of the measured operator graph according to its running information when running under the test resources, includes: using the test resources to run all the measured operator graphs, and determining the target resources of all the measured operator graphs according to the running information of all the measured operator graphs when they run under the test resources.
  • all the determined measured operator subgraphs may be run with test resources at one time "simultaneously" (of course, each measured operator subgraph is actually under a part of all test resources) run), thereby "together” to determine the target resources of all the measured operator subgraphs (of course, the actual target resources of each measured operator subgraph should also be specifically determined).
  • determining that at least part of the operator graphs in the plurality of operator graphs are measured operator graphs (S001) includes: determining a plurality of measured operator graph groups, and each measured operator graph group includes at least one measured operator graph. subgraph.
  • the test resource is used to run the measured operator subgraph, and according to the operation information of the measured operator subgraph when running under the test resource, determining the target resource of the measured operator subgraph (S002) includes: using the test The resource runs each measured operator sub-graph group respectively, and determines the target resource of the measured operator sub-graph group according to the operation information of each measured operator sub-graph group when running under the test resource.
  • the measured operator subgraphs may also be divided into multiple groups (each group includes one or more measured operator subgraphs), and each group is run separately under its own test resources (that is, the test resources of different measured operator subgraph groups are different), so that the target resources of the groups are determined in turn by traversing them "group by group" (or "one by one" if each group has only one measured operator subgraph). If a group has multiple measured operator subgraphs, the target resource of each measured operator subgraph in the group must of course still be determined individually.
  • the total amount of configuration data of the measured operator subgraphs in each measured operator subgraph group is less than or equal to the on-chip storage space of the many-core system.
  • the total data volume of the configuration data of all the measured operator subgraphs in any measured operator subgraph group should not exceed the on-chip storage space of the many-core system.
  • the on-chip storage space refers to the storage space inside the many-core system, and it varies from one many-core system to another. By connecting chips in parallel, the on-chip storage space of the many-core system can be increased and its computing power improved.
  • each operator graph has certain configuration data (such as connection weights, membrane potentials, emission thresholds, convolution kernels, etc.), and only after these configuration data are stored in the on-chip storage space of the many-core system can the system perform operations and processing on the corresponding operator graph.
  • when the configuration data of the operator graphs exceeds the on-chip storage space, these operator graphs should be processed serially in a "time division multiplexing" manner, or an operator graph (or a group of operator graphs) should be "folded (split)" into multiple operator graphs (or multiple groups of operator graphs), which are then run with the test resources in a time-sharing manner to determine their target resources respectively.
  • in this way, the many-core system can "put down" each measured operator subgraph group, so that the running resources in the many-core system can be used to process all the measured operator subgraphs in each measured operator subgraph group at the same time.
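  • the grouping constraint above can be illustrated with a minimal sketch. It assumes a simple greedy, order-preserving packing; the function name `group_by_storage`, the byte sizes, and the packing strategy are hypothetical and are not prescribed by the embodiment.

```python
from typing import Dict, List


def group_by_storage(config_sizes: Dict[str, int], on_chip_bytes: int) -> List[List[str]]:
    """Greedily pack subgraphs into groups whose total configuration
    data does not exceed the on-chip storage space (assumed strategy)."""
    groups: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, size in config_sizes.items():
        if size > on_chip_bytes:
            # A single subgraph that does not fit would have to be
            # "folded (split)" first; this sketch does not do that.
            raise ValueError(f"{name} must be split before grouping")
        if used + size > on_chip_bytes:
            # Close the current group and start a new one.
            groups.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        groups.append(current)
    return groups


sizes = {"a": 40, "b": 50, "c": 30, "d": 60}  # hypothetical sizes
print(group_by_storage(sizes, on_chip_bytes=100))
```

  • each resulting group fits in on-chip storage and can be loaded as a whole, while different groups are run in a time-sharing manner.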
  • determining the target resource of the measured operator subgraph includes: determining a resource that can make the operation information of the measured operator subgraph meet the demand information as the target resource of the measured operator subgraph.
  • “requirement information” may be preset, that is, the performance that the user expects the many-core system to achieve when running the operator graph, such as the desired processing speed, energy consumption, and the like.
  • determining that the resource that can make the operation information of the measured operator subgraph meet the demand information is the target resource of the measured operator subgraph includes: determining that the minimum resource that can make the operation information of the measured operator subgraph meet the demand information is the target resource of the measured operator subgraph.
  • among the resources that can meet the demand information, the "least" one, that is, the resource that "just" satisfies the demand information, can be selected as the actual target resource, so as to save resources as much as possible and improve resource utilization.
  • the amount of test resources can be gradually increased or decreased, and different test resources are used to run the operator graph, so as to determine the test resources that "just meet” the demand information as the target resources.
  • determining that the resource that can make the operation information of the measured operator subgraph meet the demand information is the target resource of the measured operator subgraph includes: if no resource that can make the operation information of the measured operator subgraph meet the demand information can be determined, issuing a prompt.
  • the user can perform further corresponding operations. For example, the user may allow corresponding reduction of demand information, or determine that the number of operator graphs to be allocated resources can be reduced, etc., so as to re-execute the method of the embodiment of the present invention according to the adjusted situation to realize resource allocation.
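  • the search for the "just enough" resource, including the prompt when nothing suffices, might look like the following sketch. The names `find_min_resource`, `run`, and `meets_demand`, as well as the linear speed model in the usage example, are hypothetical stand-ins for actually running the measured operator subgraph on the many-core system.

```python
from typing import Callable, Iterable, Optional


def find_min_resource(
    levels: Iterable[int],
    run: Callable[[int], float],
    meets_demand: Callable[[float], bool],
) -> Optional[int]:
    """Try resource amounts in increasing order and return the first
    (smallest) one whose running information meets the demand."""
    for amount in sorted(levels):
        info = run(amount)  # run the subgraph under this test resource
        if meets_demand(info):
            return amount
    return None  # no tested resource meets the demand


def fake_speed(cores: int) -> float:
    # Hypothetical measurement: frames per second for a core count.
    return 12.0 * cores


target = find_min_resource(range(1, 9), fake_speed, lambda fps: fps >= 30.0)
if target is None:
    print("Prompt: no resource meets the demand information")
else:
    print("target resource:", target, "cores")
```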
  • the resources include running resources; determining the target resource of the measured operator subgraph (S002) includes: determining the test resource that meets the conditions of the resource allocation type as the target running resource of the measured operator subgraph.
  • each resource allocation type (or "allocation mode") has certain conditions, so that under a given resource allocation type, the test resource that meets the conditions of that type can be used as the target running resource of the measured operator subgraph.
  • using the test resource to run the measured operator subgraph, and determining the target resource of the measured operator subgraph according to the operation information of the measured operator subgraph when running under the test resource (S002) includes: using test resources of multiple levels to run the measured operator subgraph respectively, and determining, according to the operation information of the measured operator subgraph when it runs under the test resources of the various levels, the one level of test resource that meets the conditions of the resource allocation type as the target running resource of the measured operator subgraph.
  • the amount of running resources of the higher-level test resources is greater than the amount of running resources of the lower-level test resources.
  • the measured operator subgraphs may be run under test resources of multiple different "levels", and one level of test resource is selected as the corresponding target running resource according to the operation information of the measured operator subgraphs under the test resources of the various levels.
  • the "level" of the test resources is divided by the "running resource amount" therein, that is, the higher the level of the test resource, the more the running resource amount is included.
  • running resources may specifically include processing cores, threads, on-chip storage space, etc.
  • the more processing cores, threads, and on-chip storage space a test resource includes (a difference in any one of these items is sufficient), the higher its "level".
  • the difference in the amount of operating resources included in the test resources of two adjacent "levels” can be preset according to needs, not necessarily the minimum difference in the amount of operating resources that can exist in theory.
  • for example, the numbers of processing cores included in the test resources of two adjacent "levels" may differ by only one, or may differ by another predetermined number (e.g., 10% of the total number of processing cores).
  • the operating resources include processing cores.
  • each processing core may be directly allocated to a corresponding operator graph as a target running resource, that is, each operator graph can be processed by one or more corresponding processing cores.
  • test resources of different levels may be different numbers of processing cores, for example, each additional processing core may be an increase of one level of test resources.
  • the specific content of the running resources is not limited to processing cores.
  • the running resources may also include the number of threads, the amount of on-chip storage space, etc., that is, the differences in these resources can also be regarded as different levels of test resources.
  • the resource allocation type includes a high-performance type; the test resource that meets the conditions of the high-performance type is: among the test resources of all levels, the one level of test resource that gives the measured operator subgraph the fastest processing speed.
  • the resource allocation type includes a high-performance type. In this resource allocation type (allocation mode), the main purpose is to optimize the performance of the operator graph when it runs in the many-core system, so the test resource with the fastest processing speed (that is, the largest amount of data that the operator graph can process per unit time) should be selected as the target running resource.
  • the amount of running resources and the processing speed of the operator graph are not necessarily positively correlated: if the amount of running resources (such as the number of processing cores) corresponding to an operator graph is too large, the operator graph may be excessively "scattered" across multiple processing cores during mapping, which leads to a decrease in processing speed.
  • the "dichotomy" (bisection) may be used to determine the target running resource. For example, the measured operator graph is first run with one processing core, half of the processing cores, and all of the processing cores, from which it can be determined between which two of these numbers the target running resource (that is, the number of processing cores with the fastest processing speed) lies.
  • the next test resource can then be the midpoint of those two numbers, for example three-quarters of the number of processing cores if the target lies between half and all of them; and so on, until the number of processing cores with the fastest processing speed is determined as the target running resource.
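  • a minimal sketch of such an interval-halving search follows. It is a variant rather than the exact procedure described: it assumes the processing speed first rises and then falls as cores are added (unimodal), and it decides which half to keep by comparing two neighbouring core counts. The name `fastest_core_count` and the `speed` callback are hypothetical.

```python
from typing import Callable


def fastest_core_count(total_cores: int, speed: Callable[[int], float]) -> int:
    """Halve the candidate interval [lo, hi] until one core count is left.

    Assumption: speed(c) strictly increases up to a single peak and
    strictly decreases afterwards, so comparing speed(mid) with
    speed(mid + 1) tells on which side of mid the peak lies.
    """
    lo, hi = 1, total_cores
    while lo < hi:
        mid = (lo + hi) // 2
        if speed(mid) < speed(mid + 1):
            lo = mid + 1  # still climbing: the peak is to the right
        else:
            hi = mid      # flat or falling: the peak is at mid or left
    return lo


def measured_speed(cores: int) -> float:
    # Hypothetical measurement with its peak at 6 cores.
    return -float((cores - 6) ** 2)


print(fastest_core_count(16, measured_speed))
```

  • each call to `speed` stands for one run of the measured operator subgraph, so the number of runs grows only logarithmically with the number of processing cores.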
  • the resource allocation type includes an energy-saving type; the differences in the amount of running resources between any two adjacent levels of test resources are equal; the test resource that meets the conditions of the energy-saving type is: the lowest-level test resource among the test resources of all levels whose time reduction value is greater than a preset threshold.
  • the time reduction value of a test resource is the amount by which the time taken to process predetermined data when running the measured operator subgraph with the test resource of this level is reduced, compared with the time taken to process the predetermined data when running the measured operator subgraph with the test resource one level lower.
  • the resource allocation type includes an energy-saving type, and in this resource allocation type (allocation mode), the main purpose is to minimize the "energy consumption" when the operator graph is run, on the premise of satisfying basic performance (such as a basic processing speed).
  • the energy-saving type mainly considers reducing data handling, so that scheduling instructions, processing core operations, throughput, delay, timing, serial-parallel relationships, etc. are all inclined to reduce the amount of data handling, in order to reduce energy consumption while meeting a certain level of performance.
  • the resource amount of the test resource can be gradually increased from the minimum resource amount until a test resource that greatly improves the performance is found, which is used as the target running resource. For example, you can first run the operator graph with the minimum number of processing cores that are theoretically feasible (calculated according to the calculation amount of the operator graph), and record the time it takes to process predetermined data (such as randomly generated input data), and then add a certain amount of time.
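  • the energy-saving condition (the lowest level whose time reduction value exceeds a preset threshold) can be sketched as follows. The function name `energy_saving_level`, the list layout, and the sample timings are hypothetical; the levels are assumed to be spaced by equal amounts of running resources, as stated above.

```python
from typing import List, Optional


def energy_saving_level(times: List[float], threshold: float) -> Optional[int]:
    """times[k] is the time taken to process the predetermined data at
    level k (level 0 is the minimum feasible resource amount).

    Returns the lowest level whose time reduction value, relative to
    the level just below it, is greater than the threshold, or None if
    no level qualifies.
    """
    for level in range(1, len(times)):
        reduction = times[level - 1] - times[level]
        if reduction > threshold:
            return level
    return None


# Hypothetical timings in milliseconds for five levels.
print(energy_saving_level([100.0, 95.0, 70.0, 60.0, 58.0], threshold=20.0))
```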
  • the resource allocation type includes a balanced type; the test resource that meets the conditions of the balanced type is: the one level of test resource with the largest energy consumption ratio among the test resources of all levels.
  • the energy consumption ratio of the test resource is the amount of data that can be processed by consuming unit energy consumption when running the measured sub-graph with the test resource at this level.
  • the resource allocation type includes a balanced type, and this resource allocation type (allocation mode) can be regarded as an "intermediate mode" or "integrated mode" between the above high-performance type and energy-saving type; its purpose is to balance performance (such as processing speed) against energy consumption to obtain the optimal energy consumption ratio.
  • the energy consumption ratio is the amount of data that the operator graph can process per unit of energy consumed.
  • the energy consumption ratio under a certain test resource can be calculated by the following formula:
  • the running time of the processing core refers to the time it takes for the processing core to process some data
  • the operation frequency of the processing core refers to the number of operations performed by the processing core in a unit time, such as the main frequency of the processing core multiplied by the number of processing cores
  • the energy consumption of the transport configuration and the energy consumption of the processing core calculation are the two main forms of energy consumption. It should be understood that the processing cores at this time refer to all processing cores in the current test resource.
  • there are various specific ways to determine the energy consumption ratio of the test resources of the various levels. For example, the operator graph can be run under the test resources of each level for the same length of time, and the energy consumption under each level recorded, in order to calculate the energy consumption ratio of the test resources of each level.
  • for example, it is possible to use the test resources of the various levels to run all the measured operator subgraphs and determine their target running resources at the same time, or to run each measured operator subgraph (or each group) with its corresponding test resources of the various levels, so as to determine the target running resources of each measured operator subgraph (or each group) one by one.
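  • the formula referred to above is not reproduced in this text. The sketch below therefore assumes one plausible reading that is consistent with the term definitions: energy consumption ratio = (running time × operation frequency) / (energy consumption of transporting the configuration + energy consumption of processing core computation). This formula, the class `LevelMeasurement`, and all sample numbers are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LevelMeasurement:
    cores: int                 # processing cores in this test resource
    run_time_s: float          # running time of the processing cores
    ops_per_s: float           # operation frequency of all cores together
    transport_energy_j: float  # energy spent transporting configuration
    compute_energy_j: float    # energy spent on core computation


def energy_ratio(m: LevelMeasurement) -> float:
    """Assumed formula: operations performed per unit of energy."""
    return (m.run_time_s * m.ops_per_s) / (m.transport_energy_j + m.compute_energy_j)


def balanced_level(levels: List[LevelMeasurement]) -> LevelMeasurement:
    """Pick the level with the largest energy consumption ratio."""
    return max(levels, key=energy_ratio)


levels = [  # hypothetical measurements, each taken over the same 1 s run
    LevelMeasurement(2, 1.0, 2e9, 1.0, 3.0),
    LevelMeasurement(4, 1.0, 4e9, 1.5, 4.5),
    LevelMeasurement(8, 1.0, 8e9, 4.0, 12.0),
]
print(balanced_level(levels).cores)
```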
  • when the total amount of configuration data of all the measured operator subgraphs does not exceed the on-chip storage space of the many-core system, the resource allocation type includes any one of a high-performance type, an energy-saving type, and a balanced type.
  • when the total amount of configuration data of all the measured operator subgraphs exceeds the on-chip storage space of the many-core system, the resource allocation type includes either the high-performance type or the balanced type.
  • that is, when the many-core system "can put down" all the measured operator subgraphs, any one of the high-performance type, the energy-saving type, and the balanced type can be selected as the current allocation mode.
  • when the many-core system "cannot put down" the measured operator subgraphs, the optional resource allocation types include only the high-performance type and the balanced type. This is because the above time division multiplexing method must then be adopted, which inevitably involves a large amount of data handling, so the energy consumption cannot be effectively reduced and the energy-saving type cannot be used.
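  • the dependence of the selectable allocation modes on the on-chip storage space can be summarised in a small sketch. The function name `allowed_types` and the string labels are hypothetical.

```python
from typing import Set


def allowed_types(total_config_bytes: int, on_chip_bytes: int) -> Set[str]:
    """Return the resource allocation types that may be selected."""
    if total_config_bytes <= on_chip_bytes:
        # The many-core system "can put down" all measured subgraphs.
        return {"high_performance", "energy_saving", "balanced"}
    # Time division multiplexing is required, which implies heavy data
    # handling, so the energy-saving type is excluded.
    return {"high_performance", "balanced"}


print(sorted(allowed_types(total_config_bytes=80, on_chip_bytes=100)))
print(sorted(allowed_types(total_config_bytes=120, on_chip_bytes=100)))
```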
  • the resources include time resources.
  • determining at least part of the operator graphs in the plurality of operator graphs to be the measured operator graphs ( S001 ) includes the following S101 .
  • S101 Determine a plurality of measured operator subgraph groups, and each measured operator subgraph group includes at least one measured operator subgraph.
  • use the test resource to run the measured operator subgraph, and determine the target resource of the measured operator subgraph according to the operation information of the measured operator subgraph when running under the test resource ( S002 ) includes the following S102 .
  • each group includes some measured operator subgraphs.
  • the measured operator subgraphs in the same group can be "put down" in the many-core system together, so they can run in the many-core system at the same time, while different groups need to use the many-core system by "time division multiplexing": the many-core system first loads and processes one measured operator subgraph group for a period of time, then removes that group, loads and processes the next measured operator subgraph group for a period of time, and so on.
  • the running time (target time resource) of each measured operator subgraph group can be determined according to the overall running state (total operation information) of all the measured operator subgraph groups, so as to optimize their overall performance.
  • when each measured operator subgraph group is running, any method in the embodiments of the present invention (such as the allocation modes above) may also be used to allocate corresponding target running resources to it according to its operation information, and more specifically to determine the target running resources of each measured operator subgraph in the group, which will not be described in detail here.
  • determining the target time resource of each measured operator subgraph group according to the total operation information of all the measured operator subgraph groups (S102) includes: determining, according to the total operation information of all the measured operator subgraph groups, that the time resource that makes the total operation information of the measured operator subgraph groups reach the demand information is the target time resource of each measured operator subgraph group.
  • the target time resource can also be allocated to each measured operator subgraph group by making the total operation information of the measured operator subgraph groups meet the demand information. For example, the running time of each measured operator subgraph group may be adjusted so that the total processing speed (e.g., frame rate) of all the measured operator subgraph groups reaches the expected processing speed (e.g., expected frame rate).
  • the demand information includes processing speed demand information.
  • the demand information may be a desired processing speed (eg, an expected frame rate), so that the total operation information is also a corresponding overall processing speed.
  • the processing speed can be specifically represented by the "frame rate", that is, when multiple pieces of data (such as multi-frame images) are continuously input, the number of pieces of data (such as the number of image frames) that the measured operator subgraph groups can process per unit time.
  • loading each measured operator subgraph group into the many-core system in a time-sharing manner for running (S102) includes: loading each measured operator subgraph group into the many-core system in a time-sharing manner and running each of them for an equal time.
  • each measured operator subgraph group may be run for an "equal" time, that is, during the "time division multiplexing" process, each measured operator subgraph group is run "isochronously" for the same length of time, and the total running information under this "isochronous" running mode is obtained. The subsequent process of determining the target time resources then amounts to determining, through analysis, whether the running time of each measured operator subgraph group should be "extended" or "shortened" to improve the overall running state (total operation information).
  • determining the target time resources of each measured operator sub-graph group (S102) includes: determining the running time proportion of each measured operator sub-graph group in each predetermined operation cycle.
  • the target time resource of each measured operator subgraph group need not be an absolute time period; it may instead be the relative proportion of the running time of each measured operator subgraph group in each running cycle.
  • each measured operator subgraph group should run in turn in a predetermined order, each for a predetermined period of time, so as to ensure that in any continuous period of time the running times of the measured operator subgraph groups conform to a specific proportional relationship.
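  • one way to turn measured rates into running time proportions is sketched below under a deliberately simple model that the embodiment does not prescribe: every input frame must pass through every group, group i processes `rates[i]` frames per second while it is loaded, and loading and removal overheads are ignored. Under this assumption the shares that equalise the effective rates are proportional to 1/rate. The name `time_shares` and the sample rates are hypothetical.

```python
from typing import List, Tuple


def time_shares(rates: List[float]) -> Tuple[List[float], float]:
    """Return (share of each cycle per group, overall frame rate).

    With share p_i, group i handles p_i * rates[i] frames per second on
    average; choosing p_i proportional to 1 / rates[i] makes all groups
    equally fast, which maximises the slowest one.
    """
    inverse = [1.0 / r for r in rates]
    total = sum(inverse)
    shares = [x / total for x in inverse]
    return shares, 1.0 / total


# Hypothetical rates measured while each group ran for an equal time.
shares, overall_fps = time_shares([60.0, 30.0, 20.0])
demand_fps = 8.0  # hypothetical expected frame rate
print(shares, overall_fps, overall_fps >= demand_fps)
```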
  • the resources include operational resources.
  • determining at least part of the operator graphs in the plurality of operator graphs to be the measured operator graphs ( S001 ) includes the following S201 .
  • S201: If the multiple operator graphs include a first operator graph with demand information, at least determine that the first operator graph is a measured operator graph.
  • using the test resource to run the measured operator subgraph, and determining the target resource of the measured operator subgraph according to the operation information of the measured operator subgraph when running under the test resource (S002) includes the following S202.
  • S202: Run the first operator graph with the test resource, and according to the running information of the first operator graph when running under the test resource, determine that the running resource that can make the running information of the first operator graph meet the demand information is the target running resource of the first operator graph.
  • at least some of the operator graphs may be preset with corresponding demand information (for example, an expected processing speed); such an operator graph is called a first operator graph.
  • the operator graphs with demand information are often more important key operator graphs. Therefore, it is more beneficial to improve the overall performance of all operator graphs by giving priority to satisfying the demand information of these first operator graphs.
  • the first operator graph is preferentially run with test resources, and target running resources that can meet the demand information are allocated to the first operator graph according to the running information. .
  • the above target operating resources may further be the minimum operating resources that can satisfy the demand information, which will not be described in detail here.
  • the above specific manners for determining the target running resource of the first operator graph are various.
  • for example, all the first operator graphs can be run simultaneously to determine the target running resources of all the first operator graphs at the same time, or the first operator graphs can be run (traversed) one by one (or group by group) to determine the target running resources of the different first operator graphs in turn; for another example, any one of the above allocation modes can be used to determine the target running resource, which will not be described in detail here.
  • running the first operator graph with the test resource, and determining, according to the operation information of the first operator graph when running under the test resource, that the running resource that can make the operation information of the first operator graph meet the demand information is the target running resource of the first operator graph (S202) includes the following S2021 to S2023.
  • the set threshold such as the speed threshold
  • the test resources may start from a larger amount of running resources (such as all the currently remaining running resources) and gradually decrease; once the running information under a certain test resource is equal to or slightly higher than the demand information, this test resource can be considered the smallest test resource that can satisfy the demand information, and is therefore the target running resource.
  • the specific ways of reducing the test resources are also various, and can be determined according to the specific operation resource type.
  • the method of "reducing test resources" may include: calculating a first number ratio between the number of processing cores included in the test resource and the number of processing cores that have not yet been allocated, and calculating a second number ratio between the number of threads included in the test resource and the number of threads that have not yet been allocated; if the first number ratio is greater than or equal to the second number ratio, the number of processing cores included in the test resource is reduced, and if the first number ratio is less than the second number ratio, the number of threads included in the test resource is reduced.
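  • the ratio rule and the gradual reduction can be sketched together as follows. The step size of one, the lower bound of one core and one thread, the names `reduce_test_resource` and `shrink_to_demand`, and the linear speed model in the example are all hypothetical choices made for illustration.

```python
from typing import Callable, Optional, Tuple


def reduce_test_resource(cores: int, threads: int,
                         free_cores: int, free_threads: int) -> Tuple[int, int]:
    """Apply one reduction step according to the two number ratios."""
    first_ratio = cores / free_cores        # cores in test / unallocated cores
    second_ratio = threads / free_threads   # threads in test / unallocated threads
    if first_ratio >= second_ratio:
        return max(cores - 1, 1), threads   # reduce processing cores
    return cores, max(threads - 1, 1)       # reduce threads


def shrink_to_demand(free_cores: int, free_threads: int,
                     run: Callable[[int, int], float],
                     demand: float) -> Optional[Tuple[int, int]]:
    """Start from all unallocated resources and shrink until the next
    step would drop the running information below the demand."""
    cores, threads = free_cores, free_threads
    if run(cores, threads) < demand:
        return None  # even all remaining resources are insufficient
    while True:
        nxt = reduce_test_resource(cores, threads, free_cores, free_threads)
        if nxt == (cores, threads) or run(*nxt) < demand:
            return cores, threads
        cores, threads = nxt


def fake_run(cores: int, threads: int) -> float:
    return 10.0 * cores + 2.0 * threads  # hypothetical processing speed


print(shrink_to_demand(8, 16, fake_run, demand=60.0))
```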
  • running the first operator graph with the test resource, and determining, according to the running information of the first operator graph when running under the test resource, that the running resource that can make the running information of the first operator graph meet the demand information is the target running resource of the first operator graph (S202) includes: if the first operator graphs include a first operator graph with a time sequence relationship and a first operator graph without a time sequence relationship, performing the following S2024 and S2025.
  • S2024: Use the test resource to run the first operator graph with a time sequence relationship, and according to the operation information of the first operator graph with a time sequence relationship when running under the test resource, determine that the running resource that makes this operation information reach the demand information is the target running resource of the first operator graph with a time sequence relationship.
  • S2025: Use the test resource to run the first operator graph without a time sequence relationship, and according to the operation information of the first operator graph without a time sequence relationship when running under the test resource, determine that the running resource that makes this operation information reach the demand information is the target running resource of the first operator graph without a time sequence relationship.
  • the first operator graph with demand information it can be further divided into two categories according to whether it has a time sequence relationship.
  • timing relationship is the sequence relationship of running time that different operator graphs must satisfy when running.
  • timing relationships may include “serial relationships” and “parallel relationships.”
  • an operator graph with a time sequence relationship is one that must have a time sequence relationship with at least one other operator graph, such as the above serial relationship or parallel relationship, or a more complex, indirect "multi-layer" time sequence relationship composed of serial and parallel relationships.
  • the fact that an operator graph has no time sequence relationship means that the operator graph is relatively independent and has no time sequence relationship with any other operator graph.
  • a serial relationship means that the operation of one operator graph must use the output (calculation result) of another operator graph as its input (or part of its input); the later operator graph can therefore run only after the earlier operator graph has finished, that is, the two operator graphs are "serial".
  • parallel relationship means that multiple operator graphs need to process some related data, and their processing results need to be used together, so these operator graphs should run at the same time, that is, "parallel”.
  • for example, if an image recognition neural network is used to perform image recognition on an image detected by an image detection neural network, the image detection neural network is the earlier operator graph in the serial relationship and the image recognition neural network is the later operator graph in the serial relationship.
  • a network is a graph of two parallel operators.
  • the target running resources can be allocated first to the first operator graphs that have a time sequence relationship, so as to ensure that these operator graphs are assigned better target resources (e.g., without having to lower their demand information), thus improving the overall performance of all operator graphs.
  • step S203 is further included.
  • S203: If the multiple operator graphs include a second operator graph without demand information, determine the remaining running resources, other than the target running resources of the first operator graph, as the target running resources of the second operator graph.
  • all the operator graphs may also include operator graphs without demand information (second operator graphs). Although these second operator graphs have no explicit demand information, they also need to run in the many-core system, so certain target running resources must also be allocated to them.
  • after the target running resources allocated to all the first operator graphs are determined, these allocated target running resources can be excluded, and the remaining unallocated running resources in the many-core system (i.e., the remaining running resources) are used as the target running resources allocated to the second operator graphs.
  • the remaining running resources may be evenly allocated to each second operator graph, or the remaining running resources may be allocated to each second operator graph according to theoretical estimation.
  • the second operator graph can also be used as a "measured operator graph", that is, after each first operator graph has been run with test resources and its target running resource determined, other test resources continue to be used to run the second operator graphs, so as to determine the target running resource of each second operator graph according to its operation information.
  • the test resources and target running resources corresponding to the second operator graph should be within the range of the remaining running resources after removing the target resources allocated to the first operator graphs.
  • the second operator graphs can also be divided into multiple "groups", each including one or more second operator graphs (the number of second operator graphs in each group can be determined according to the amount of remaining running resources), and each group of second operator graphs can occupy all the remaining running resources, but the groups are run by "time division multiplexing", that is, the second operator graphs of different groups occupy different target time resources.
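  • the simplest of the options above, evenly allocating the remaining running resources to the second operator graphs, can be sketched as follows. Only processing cores are modelled; the function name `split_remaining` and the graph names are hypothetical.

```python
from typing import Dict, List


def split_remaining(total_cores: int,
                    first_targets: Dict[str, int],
                    second_graphs: List[str]) -> Dict[str, int]:
    """Evenly share the cores left after the first operator graphs."""
    if not second_graphs:
        return {}
    remaining = total_cores - sum(first_targets.values())
    if remaining < len(second_graphs):
        # Not every second operator graph can get a core at once; the
        # time-division grouping described above would be needed.
        raise ValueError("remaining cores insufficient for simultaneous running")
    base, extra = divmod(remaining, len(second_graphs))
    # The first `extra` graphs receive one additional core each.
    return {g: base + (1 if i < extra else 0) for i, g in enumerate(second_graphs)}


first = {"detect": 10, "recognize": 9}  # hypothetical target running resources
print(split_remaining(32, first, ["a", "b", "c"]))
```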
  • the resources include operational resources.
  • determining at least part of the operator graphs in the plurality of operator graphs to be the measured operator graphs ( S001 ) includes the following S301 .
  • S301: If the multiple operator graphs include a third operator graph with a time sequence relationship, at least determine that the third operator graph is a measured operator graph.
  • use the test resource to run the measured operator subgraph, and determine the target resource of the measured operator subgraph according to the running information of the measured operator subgraph when running under the test resource ( S002 ) includes the following S302 .
  • Among the operator graphs, at least some may have a time sequence relationship with one another; these are called third operator graphs.
  • The overall processing speed of multiple operator graphs with a time sequence relationship is essentially determined by the processing speed of the "slowest" operator graph: as long as one such operator graph runs slowly, the actual running speed of all the related operator graphs is slowed down. Therefore, prioritizing the allocation of target running resources to these third operator graphs (e.g., to ensure that they can meet their demand information) is more conducive to improving the overall performance of all operator graphs.
  • Accordingly, the third operator graph is preferentially run with test resources, and target running resources are allocated to it according to its operation information (for example, target running resources that satisfy its demand information).
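  • As a rough illustration of why the slowest operator graph bounds the whole chain, the sketch below models operator graphs with a time sequence relationship as pipeline stages. The function name and the sample rates are hypothetical and are not taken from the embodiments.

```python
def pipeline_throughput(stage_rates):
    """Steady-state throughput of a chain of dependent stages.

    stage_rates: items per second each operator graph could process alone.
    In steady state the chain cannot move faster than its slowest stage.
    """
    if not stage_rates:
        raise ValueError("at least one stage is required")
    return min(stage_rates)


# Hypothetical rates for three operator graphs that feed one another.
rates = [120.0, 45.0, 300.0]
bottleneck = pipeline_throughput(rates)
```

  • Under this simplified model, giving more running resources to any stage other than the slowest one does not raise the overall rate, which is the motivation for measuring and allocating the third operator graphs first.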
  • At least determining that the third operator graph is a measured operator graph further includes: determining a plurality of measured operator graph groups, each including at least one third operator graph, where any two third operator graphs that have a time sequence relationship with each other are located in the same measured operator graph group or in adjacent measured operator graph groups.
  • Running the third operator graph with the test resource, and determining the target running resource of the third operator graph according to its operation information when running under the test resource, includes: running each measured operator graph group separately with the test resource, and determining the target running resource of each measured operator graph group according to the operation information of that group when running under the test resource.
  • Among the third operator graphs, a time sequence relationship does not necessarily exist between "any two" of them; at the same time, the many-core system may not be able to "hold" all the third operator graphs at once.
  • Therefore, the third operator graphs can also be grouped, so that third operator graphs that actually have a time sequence relationship with each other are in the same group or in adjacent groups, and all the third operator graphs in each group can be "held" by the many-core system at once. The many-core system can then run each group of third operator graphs separately and determine the target running resources of each group (which, of course, requires specifically determining the target running resource of each third operator graph in the group).
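  • The grouping constraint can be sketched for the simplest case, a linear chain in which each third operator graph feeds the next. The greedy packing below, the per-graph core counts, and the capacity of 8 cores are all assumptions made for illustration; the embodiments do not prescribe a particular grouping algorithm.

```python
def group_chain(core_demands, total_cores):
    """Greedily pack a linear chain of operator graphs into consecutive groups.

    core_demands: cores needed by each graph, listed in time sequence order.
    total_cores: cores available in the many-core system (assumed capacity).
    Returns a list of groups, each a list of graph indices.
    """
    groups, current, used = [], [], 0
    for index, cores in enumerate(core_demands):
        if cores > total_cores:
            raise ValueError(f"graph {index} does not fit in the system")
        if used + cores > total_cores:
            # The current group is full: close it and start a new one.
            groups.append(current)
            current, used = [], 0
        current.append(index)
        used += cores
    if current:
        groups.append(current)
    return groups


def neighbours_stay_close(groups):
    """Check that consecutive graphs share a group or sit in adjacent groups."""
    group_of = {g: i for i, members in enumerate(groups) for g in members}
    return all(group_of[k + 1] - group_of[k] in (0, 1)
               for k in range(len(group_of) - 1))


groups = group_chain([4, 3, 6, 2, 5], total_cores=8)
```

  • Because each group is a contiguous segment of the chain, two graphs with a direct time sequence relationship always land in the same or adjacent groups. A general dependency graph would need a different grouping strategy.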
  • After the target running resource of the third operator graph is determined (S302), the following S303 is further included.
  • If the multiple operator graphs include a fourth operator graph without a time sequence relationship, the remaining running resources other than the target running resources of the third operator graph are determined to be the target running resources of the fourth operator graph.
  • The operator graphs may also include operator graphs without a time sequence relationship (fourth operator graphs). Although these operator graphs have no time sequence relationship, they also need to run in the many-core system, so certain target running resources also need to be allocated to them.
  • Once the target running resources allocated to all the third operator graphs are determined, these allocated resources can be excluded, and the running resources in the many-core system that remain unallocated (i.e., the remaining running resources) can be used as the target running resources allocated to the fourth operator graphs.
  • the remaining running resources may be evenly allocated to each fourth operator graph, or the remaining running resources allocated to each fourth operator graph may be determined according to theoretical estimation.
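  • A minimal sketch of the even split, assuming that running resources are counted in whole processing cores and that every fourth operator graph needs at least one core. The graph names and core counts are hypothetical.

```python
def split_remaining_evenly(total_cores, allocated_cores, graph_names):
    """Divide the cores left after earlier allocations among the given graphs.

    allocated_cores: mapping of already measured graphs to their target cores.
    graph_names: graphs that share whatever is left.
    """
    if not graph_names:
        return {}
    remaining = total_cores - sum(allocated_cores.values())
    if remaining < len(graph_names):
        raise ValueError("not enough remaining cores for one core per graph")
    base, extra = divmod(remaining, len(graph_names))
    # The first `extra` graphs receive one additional core each.
    return {name: base + (1 if i < extra else 0)
            for i, name in enumerate(graph_names)}


third_graph_cores = {"g3_a": 20, "g3_b": 14}
allocation = split_remaining_evenly(
    total_cores=64,
    allocated_cores=third_graph_cores,
    graph_names=["g4_a", "g4_b", "g4_c", "g4_d"],
)
```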
  • The fourth operator graph can also be used as a "measured operator graph"; that is, after the target running resources of each third operator graph are determined, the fourth operator graphs are run with other test resources, and the target running resource of each fourth operator graph is determined according to its operation information.
  • the test resources and target running resources corresponding to the fourth operator graph should be within the range of the remaining running resources after removing the target resources allocated to the third operator graphs.
  • The fourth operator graphs can also be divided into multiple "groups", each including one or more fourth operator graphs (the number in each group can be determined according to the remaining running resources). Each group can occupy all the remaining running resources, but the groups run in a "time-division multiplexed" manner; that is, the fourth operator graphs of different groups occupy different target time resources.
  • the resources include operational resources and time resources.
  • determining at least part of the operator graphs in the plurality of operator graphs to be the measured operator graphs (S001) includes the following S401.
  • If the multiple operator graphs include a third operator graph with a time sequence relationship, at least the third operator graph is determined to be a measured operator graph, and a plurality of measured operator graph groups are determined, each including at least one third operator graph.
  • use the test resource to run the measured operator subgraph, and determine the target resource of the measured operator subgraph according to the running information of the measured operator subgraph when running under the test resource ( S002 ) includes the following S402 .
  • Load each measured operator graph group into the many-core system to run in a time-sharing manner, determine the target time resource of each measured operator graph group according to the operation information of each group, and determine the test resource that meets the conditions of the resource allocation type as the target running resource of each measured operator graph group.
  • The third operator graphs can first be divided into a plurality of measured operator graph groups, and each group is run in a time-sharing manner, so as to determine the target time resource of each measured operator graph group separately.
  • The target running resource of each measured operator graph group within its target time resource can be determined according to the currently selected resource allocation type (allocation mode); more specifically, the target running resource of each third operator graph in the group within that target time resource is determined.
  • The target running resource determined for each measured operator graph group can make the operation information of that group meet the corresponding demand information, and the target time resources determined for the groups can make the total operation information of all measured operator graph groups satisfy the total demand information; this is not described in detail here.
  • an embodiment of the present invention provides a resource allocation device, which is a corresponding device for implementing the resource allocation method provided by the above-mentioned embodiments of the present invention.
  • The device can be implemented in software and/or hardware, and can generally be integrated into a computer device.
  • FIG. 6 is a schematic diagram of a resource allocation apparatus 40 in an embodiment of the present invention.
  • a resource allocation apparatus 40 provided by an embodiment of the present invention includes an operator graph determination module 410 and a resource determination module 420 .
  • the operator graph determining module 410 is configured to determine a plurality of operator graphs of resources to be allocated, and determine that at least part of the operator graphs in the plurality of operator graphs are measured operator graphs.
  • the resource determination module 420 is configured to run the measured operator graph with the test resource, and to determine the target resource of the measured operator graph according to its operation information when running under the test resource; wherein the target resource of an operator graph is the resource that the operator graph occupies when running in the many-core system.
  • the resources include running resources; determining the target resource of the measured operator subgraph includes: determining the test resource that meets the conditions of the resource allocation type as the target running resource of the measured operator subgraph.
  • running the measured operator graph with the test resource, and determining the target resource of the measured operator graph according to its operation information when running under the test resource, includes: running the measured operator graph separately with multiple levels of test resources, and, according to the operation information of the measured operator graph when running under each level of test resources, determining the level of test resources that meets the conditions of the resource allocation type as the target running resource of the measured operator graph; wherein, of any two levels of test resources, the higher level has a greater amount of running resources than the lower level.
  • the resource allocation type includes a high-performance type; the test resource that meets the conditions of the high-performance type is: among all levels of test resources, the level that gives the measured operator graph the fastest processing speed.
  • the resource allocation type includes an energy-saving type; the difference in the amount of running resources between any two adjacent levels of test resources is equal; the test resource that meets the conditions of the energy-saving type is: the lowest-level test resource among all levels of test resources whose time reduction value is greater than a preset threshold; wherein the time reduction value of a test resource is the reduction in the time taken to process predetermined data when the measured operator graph is run with that level of test resources, relative to the time taken when it is run with test resources one level lower.
  • the resource allocation type includes a balanced type; the test resource that meets the conditions of the balanced type is: among all levels of test resources, the level with the largest energy consumption ratio; wherein the energy consumption ratio of a test resource is the amount of data that can be processed per unit of energy consumed when the measured operator graph is run with that level of test resources.
  • if the total amount of configuration data of all the measured operator graphs is less than or equal to the on-chip storage space of the many-core system, the resource allocation type includes any one of a high-performance type, an energy-saving type, and a balanced type;
  • if the total amount of configuration data of all the measured operator graphs is larger than the on-chip storage space of the many-core system, the resource allocation type includes either the high-performance type or the balanced type;
  • the test resource that meets the conditions of the high-performance type is: among all levels of test resources, the level that gives the measured operator graph the fastest processing speed;
  • the test resource that meets the conditions of the energy-saving type is: the lowest-level test resource among all levels of test resources whose time reduction value is greater than the preset threshold;
  • the time reduction value of a test resource is the reduction in the time taken to process predetermined data when the measured operator graph is run with that level of test resources, relative to the time taken when it is run with test resources one level lower;
  • the test resource that meets the conditions of the balanced type is: among all levels of test resources, the level with the largest energy consumption ratio, where the energy consumption ratio is the amount of data that can be processed per unit of energy consumed when the measured operator graph is run with that level of test resources.
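  • The three selection rules can be sketched as follows. The record type, its field names, and the sample measurements are hypothetical; the fallback to the lowest level when no level exceeds the threshold is likewise an assumption, since the text does not say what happens in that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelResult:
    cores: int        # running resources used at this test level
    seconds: float    # time taken to process the predetermined data
    joules: float     # energy consumed while processing it
    items: int        # amount of predetermined data processed


def pick_high_performance(levels):
    """Level with the fastest processing speed (items per second)."""
    return max(levels, key=lambda r: r.items / r.seconds)


def pick_balanced(levels):
    """Level that processes the most data per unit of energy."""
    return max(levels, key=lambda r: r.items / r.joules)


def pick_energy_saving(levels, threshold_seconds):
    """Lowest level whose time reduction over the level below exceeds the threshold.

    Assumption: the lowest level has no level below it, so it is returned
    only as a fallback when no other level qualifies.
    """
    ordered = sorted(levels, key=lambda r: r.cores)
    for lower, higher in zip(ordered, ordered[1:]):
        if lower.seconds - higher.seconds > threshold_seconds:
            return higher
    return ordered[0]


# Hypothetical measurements; adjacent levels differ by an equal 8 cores.
levels = [
    LevelResult(cores=8, seconds=10.0, joules=40.0, items=1000),
    LevelResult(cores=16, seconds=6.0, joules=50.0, items=1000),
    LevelResult(cores=24, seconds=5.0, joules=70.0, items=1000),
    LevelResult(cores=32, seconds=4.8, joules=95.0, items=1000),
]
```

  • With these sample numbers the high-performance rule picks 32 cores, the balanced rule picks 8 cores, and the energy-saving rule with a 2-second threshold picks 16 cores, which shows how the same measurements lead to different target running resources under different allocation types.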
  • the resources include time resources; determining that at least part of the operator graphs in the multiple operator graphs are measured operator graphs includes: determining multiple measured operator graph groups, each including at least one measured operator graph; running the measured operator graph with test resources, and determining its target resource according to its operation information when running under the test resources, includes: loading each measured operator graph group into the many-core system to run in a time-sharing manner, and determining the target time resource of each measured operator graph group according to the total operation information of all the measured operator graph groups.
  • the time-sharing loading each measured operator sub-graph group into the many-core system for running includes: time-sharing loading each measured operator sub-graph group into the many-core system for an equal period of time.
  • determining the target time resource of each measured operator graph group according to the total operation information of all the measured operator graph groups includes: according to the operation information of all the measured operator graph groups, determining the time resources that enable the total operation information of all the measured operator graph groups to meet the demand information as the target time resources of the measured operator graph groups.
  • the demand information includes processing speed demand information.
  • determining the target time resource of each measured operator sub-graph group includes: determining the running time proportion of each measured operator sub-graph group in each predetermined operation cycle.
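  • One possible way to turn equal-slice measurements into running-time proportions is sketched below. It assumes a specific model that the text does not state: every group must process the same number of data items in each operation cycle, so a group's share of the cycle is proportional to its measured time per item. The function name and sample figures are hypothetical.

```python
def time_shares(items_per_slice, slice_seconds, demanded_rate):
    """Derive per-group time shares from equal-length trial slices.

    items_per_slice: items each group processed during its trial slice.
    slice_seconds: length of every trial slice.
    demanded_rate: required overall processing speed in items per second.
    Returns (shares, overall_rate, meets_demand).
    """
    if any(n <= 0 for n in items_per_slice):
        raise ValueError("every group must process at least one item")
    # Seconds each group needs per item while it holds the system.
    per_item = [slice_seconds / n for n in items_per_slice]
    total = sum(per_item)
    shares = [t / total for t in per_item]
    # One item passes through all groups in `total` seconds of system time.
    overall_rate = 1.0 / total
    return shares, overall_rate, overall_rate >= demanded_rate


shares, overall_rate, meets_demand = time_shares(
    items_per_slice=[200, 100, 400], slice_seconds=1.0, demanded_rate=50.0
)
```

  • In this example the slowest group receives the largest share of each cycle, and the resulting overall speed is compared against the processing speed demand.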
  • the resources include running resources; determining that at least part of the operator graphs in the multiple operator graphs are measured operator graphs includes: if the multiple operator graphs include a first operator graph with demand information, at least determining the first operator graph to be a measured operator graph; running the measured operator graph with the test resource, and determining the target resource of the measured operator graph according to its operation information when running under the test resource, includes: if the first operator graph exists, running the first operator graph with the test resource, and, according to the operation information of the first operator graph when running under the test resource, determining the running resource that enables the operation information of the first operator graph to meet the demand information as the target running resource of the first operator graph.
  • running the first operator graph with the test resource, and, according to its operation information when running under the test resource, determining the running resource that enables the operation information of the first operator graph to meet the demand information as the target running resource of the first operator graph, includes: if the first operator graphs include first operator graphs with a time sequence relationship and first operator graphs without a time sequence relationship, then: running the first operator graphs with a time sequence relationship with the test resource, and, according to their operation information when running under the test resource, determining the running resource that enables their operation information to meet the demand information as their target running resource; and running the first operator graphs without a time sequence relationship with the test resource, and, according to their operation information when running under the test resource, determining the running resource that enables their operation information to meet the demand information as their target running resource.
  • after determining the running resource that enables the operation information of the first operator graph to meet the demand information as the target running resource of the first operator graph, the method further includes: if the plurality of operator graphs include a second operator graph without demand information, determining the remaining running resources other than the target running resources of the first operator graph as the target running resources of the second operator graph.
  • the resources include running resources; determining that at least part of the operator graphs in the multiple operator graphs are measured operator graphs includes: if the multiple operator graphs include a third operator graph having a time sequence relationship, at least determining the third operator graph to be a measured operator graph; running the measured operator graph with the test resource, and determining the target resource of the measured operator graph according to its operation information when running under the test resource, includes: running the third operator graph with the test resource, and determining the target running resource of the third operator graph according to its operation information when running under the test resource.
  • at least determining the third operator graph to be a measured operator graph further includes: determining a plurality of measured operator graph groups, each including at least one third operator graph, where any two third operator graphs that have a time sequence relationship with each other are located in the same measured operator graph group or in adjacent measured operator graph groups; running the third operator graph with the test resource, and determining the target running resource of the third operator graph according to its operation information when running under the test resource, includes: running each measured operator graph group separately with the test resource, and determining the target running resource of each measured operator graph group according to the operation information of that group when running under the test resource.
  • after determining the target running resource of the third operator graph, the method further includes: if the plurality of operator graphs include a fourth operator graph that has no time sequence relationship, determining the remaining running resources other than the target running resources of the third operator graph as the target running resources of the fourth operator graph.
  • the resources include running resources and time resources; determining that at least part of the operator graphs in the plurality of operator graphs are measured operator graphs includes: if the plurality of operator graphs include a third operator graph having a time sequence relationship, at least determining the third operator graph to be a measured operator graph, and determining a plurality of measured operator graph groups, each including at least one third operator graph; running the measured operator graph with the test resources, and determining the target resource of the measured operator graph according to its operation information when running under the test resource, includes: loading each measured operator graph group into the many-core system to run in a time-sharing manner, determining the target time resource of each measured operator graph group according to the operation information of each group, and determining the test resource that meets the conditions of the resource allocation type as the target running resource of each measured operator graph group.
  • running the measured operator subgraph with the test resource includes: loading the measured operator subgraph into the many-core system, and running the measured operator subgraph with the test resource in the many-core system.
  • using the test resource to run the measured operator subgraph, and determining the target resource of the measured operator subgraph according to the running information of the measured operator subgraph when running under the test resource includes: running all the measured operator subgraphs with the test resource, Determine the target resources of all measured operator subgraphs according to the running information of all measured operator subgraphs when they run under the test resource.
  • determining that at least part of the operator graphs in the plurality of operator graphs are measured operator graphs includes: determining a plurality of measured operator graph groups, each including at least one measured operator graph; running the measured operator graph with the test resource, and determining the target resource of the measured operator graph according to its operation information when running under the test resource, includes: running each measured operator graph group with the test resource, and determining the target resource of each measured operator graph group according to the operation information of that group when running under the test resource.
  • the total amount of configuration data of the measured operator subgraphs in each measured operator subgraph group is less than or equal to the on-chip storage space of the many-core system.
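  • The storage constraint on each group can be checked with a few lines. The group contents, configuration sizes, and the 8 MiB on-chip capacity below are made-up values used only to show the comparison.

```python
MIB = 1024 * 1024


def oversized_groups(groups, config_bytes, on_chip_bytes):
    """Return indices of groups whose configuration data exceeds on-chip storage.

    groups: list of groups, each a list of operator graph names.
    config_bytes: mapping of graph name to its configuration data size.
    """
    return [i for i, members in enumerate(groups)
            if sum(config_bytes[name] for name in members) > on_chip_bytes]


config_sizes = {"a": 3 * MIB, "b": 4 * MIB, "c": 9 * MIB}
too_big = oversized_groups([["a", "b"], ["c"]], config_sizes, on_chip_bytes=8 * MIB)
```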
  • determining the target resource of the measured operator subgraph includes: determining the resource that enables the operation information of the measured operator subgraph to meet the demand information as the target resource of the measured operator subgraph.
  • determining that the resource that can make the operation information of the measured operator subgraph meet the demand information is the target resource of the measured operator subgraph includes: determining that the minimum resource that can make the operation information of the measured operator subgraph meet the demand information is: The target resource of the measured operator subgraph.
  • determining that the resource that can make the operation information of the measured operator subgraph meet the demand information is the target resource of the measured operator subgraph includes: if the resource that can make the operation information of the measured operator subgraph meet the demand information cannot be determined, A prompt is issued.
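  • A minimal sketch of choosing the smallest resource amount that meets the demand, and of issuing a prompt when none does. It assumes that the demand is a processing speed and that a measurement is available for each candidate amount; the lookup table stands in for actual test runs and its numbers are invented.

```python
def minimal_resource_meeting_demand(candidate_cores, measure_rate, demanded_rate):
    """Return the smallest candidate whose measured rate meets the demand.

    candidate_cores: resource amounts to try.
    measure_rate: callable returning the measured items per second for an amount.
    Returns None when no candidate reaches the demanded rate.
    """
    for cores in sorted(candidate_cores):
        if measure_rate(cores) >= demanded_rate:
            return cores
    return None


# Stand-in for real test runs: cores -> measured items per second.
measured = {4: 60.0, 8: 85.0, 16: 120.0, 32: 170.0}

chosen = minimal_resource_meeting_demand(measured, measured.get, 100.0)
unreachable = minimal_resource_meeting_demand(measured, measured.get, 500.0)
if unreachable is None:
    # Corresponds to the prompt issued when the demand cannot be met.
    print("No tested resource amount meets the demand information.")
```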
  • an embodiment of the present invention provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, any one of the resource allocation methods of the embodiments of the present invention is implemented.
  • FIG. 7 is a block diagram of an exemplary computer device 12 provided by an embodiment of the present invention for implementing the method implemented by the present invention.
  • the computer device 12 shown in FIG. 7 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present invention.
  • computer device 12 takes the form of a general-purpose computing device.
  • Components of computer device 12 may include, but are not limited to, one or more processors or processing units 16 , system memory 28 , and a bus 18 connecting various system components including system memory 28 and processing unit 16 .
  • Computer device 12 may be a device attached to the bus.
  • Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
  • These architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • Computer device 12 typically includes a variety of computer-readable storage media. These media can be any available media that can be accessed by computer device 12, including both volatile and nonvolatile media, removable and non-removable media.
  • System memory 28 may include computer-readable storage media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32 .
  • Computer device 12 may further include other removable/non-removable, volatile/non-volatile computer-readable storage media.
  • storage system 34 may be used to read and write to non-removable, non-volatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard drive”).
  • A disk drive may be provided for reading from and writing to removable non-volatile magnetic disks (e.g., "floppy disks") and removable non-volatile optical disks (e.g., compact disc read-only memory).
  • System memory 28 may include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of various embodiments of the present invention.
  • A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
  • Program modules 42 generally perform the functions and/or methods of the described embodiments of the present invention.
  • Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any device (e.g., network card, modem, etc.) that enables computer device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22.
  • Computer device 12 can also communicate with one or more networks (such as a local area network (LAN) or a wide area network (WAN)) through the network adapter 20. As shown in the figure, the network adapter 20 communicates with other modules of computer device 12 via the bus 18.
  • an embodiment of the present invention provides a computer-readable storage medium 50 on which a computer program is stored, and when the computer program is executed by a processor, implements any resource allocation method in the embodiment of the present invention.
  • the computer storage medium in the embodiments of the present invention may adopt any combination of one or more computer-readable mediums.
  • The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples (non-exhaustive list) of computer readable storage media include: electrical connections having one or more wires, portable computer disks, hard disks, RAM, Read Only Memory (ROM), erasable Erasable Programmable Read Only Memory (EPROM), flash memory, optical fiber, portable CD-ROM, optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a propagated data signal in baseband or as part of a carrier wave, with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable storage medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical fiber cable, radio frequency (RF), etc., or any suitable combination of the foregoing.
  • Computer program code for carrying out operations of the present invention may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a LAN or WAN, or may be connected to an external computer (eg, through an Internet connection using an Internet service provider).

Abstract

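The disclosed resource allocation method includes: determining a plurality of operator graphs to which resources are to be allocated, and determining at least some of the plurality of operator graphs to be measured operator graphs; running the measured operator graphs with test resources, and determining the target resources of the measured operator graphs according to their operation information when running under the test resources; wherein the target resource of an operator graph is the resource that the operator graph occupies when running in a many-core system.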

Description

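Resource allocation method and apparatus, computer device, and computer-readable storage medium
Technical Field
Embodiments of the present invention relate to the field of artificial intelligence, and in particular to a resource allocation method and apparatus, a computer device, and a computer-readable storage medium.
Background
In recent years, with the rapid development of artificial intelligence applications and technologies, the demands on computing power and energy consumption have risen steadily, and running AI algorithms on dedicated artificial intelligence (AI) chips has become the trend of the future.
An AI chip may be a many-core system. A many-core system includes multiple independently schedulable processing cores, and multiple processing cores can cooperate to process one task (such as an operator graph), so it is important to allocate the resources of the many-core system to tasks reasonably.
However, in the related art, the resource allocation of many-core systems is unreasonable and resource utilization is low.
Summary
Embodiments of the present invention provide a resource allocation method and apparatus, a computer device, and a computer-readable storage medium, which can reasonably configure the resources of a many-core system so as to improve its resource utilization.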
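In a first aspect, an embodiment of the present invention provides a resource allocation method, including: determining a plurality of operator graphs to which resources are to be allocated, and determining at least some of the plurality of operator graphs to be measured operator graphs; running the measured operator graphs with test resources, and determining the target resources of the measured operator graphs according to their operation information when running under the test resources; wherein the target resource of an operator graph is the resource that the operator graph occupies when running in a many-core system.
In some embodiments, the resources include running resources; determining the target resource of the measured operator graph includes: determining the test resource that meets the conditions of the resource allocation type as the target running resource of the measured operator graph.
In some embodiments, running the measured operator graph with test resources, and determining the target resource of the measured operator graph according to its operation information when running under the test resources, includes: running the measured operator graph separately with multiple levels of test resources, and, according to the operation information of the measured operator graph when running under each level of test resources, determining the level of test resources that meets the conditions of the resource allocation type as the target running resource of the measured operator graph; wherein, of any two levels of test resources, the higher level has a greater amount of running resources than the lower level.
In some embodiments, the resource allocation type includes a high-performance type; the test resource that meets the conditions of the high-performance type is: among all levels of test resources, the level that gives the measured operator graph the fastest processing speed.
In some embodiments, the resource allocation type includes an energy-saving type; the difference in the amount of running resources between any two adjacent levels of test resources is equal; the test resource that meets the conditions of the energy-saving type is: the lowest-level test resource among all levels of test resources whose time reduction value is greater than a preset threshold; wherein the time reduction value of a test resource is the reduction in the time taken to process predetermined data when the measured operator graph is run with that level of test resources, relative to the time taken when it is run with test resources one level lower.
In some embodiments, the resource allocation type includes a balanced type; the test resource that meets the conditions of the balanced type is: among all levels of test resources, the level with the largest energy consumption ratio; wherein the energy consumption ratio of a test resource is the amount of data that can be processed per unit of energy consumed when the measured operator graph is run with that level of test resources.
In some embodiments, if the total amount of configuration data of all the measured operator graphs is less than or equal to the on-chip storage space of the many-core system, the resource allocation type includes any one of a high-performance type, an energy-saving type, and a balanced type; if the total amount of configuration data of all the measured operator graphs is larger than the on-chip storage space of the many-core system, the resource allocation type includes either the high-performance type or the balanced type; the test resource that meets the conditions of the high-performance type is: among all levels of test resources, the level that gives the measured operator graph the fastest processing speed; the test resource that meets the conditions of the energy-saving type is: the lowest-level test resource among all levels of test resources whose time reduction value is greater than a preset threshold; wherein the time reduction value of a test resource is the reduction in the time taken to process predetermined data when the measured operator graph is run with that level of test resources, relative to the time taken when it is run with test resources one level lower; the test resource that meets the conditions of the balanced type is: among all levels of test resources, the level with the largest energy consumption ratio; wherein the energy consumption ratio of a test resource is the amount of data that can be processed per unit of energy consumed when the measured operator graph is run with that level of test resources.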
在一些实施例中,所述资源包括时间资源;所述确定多个算子图中的至少部分算子图为实测算子图包括:确定多个实测算子图组,每个实测算子图组包括至少一个实测算子图;所述用测试资源运行所述实测算子图,根据所述实测算子图在测试资源下运行时的运行信息,确定所述实测算子图的目标资源包括:分时的将每个所述实测算子图组加载至众核系统中运行,根据所有所述实测算子图组的总运行信息,确定各实测算子图组的目标时间资源。
在一些实施例中,所述分时的将每个所述实测算子图组加载至众核系统中运行包括:分时的将每个所述实测算子图组加载至众核系统中运行相等的时间段。
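In some embodiments, determining the target time resources of each measured operator graph group according to the total running information of all the measured operator graph groups includes: according to the running information of all the measured operator graph groups, determining the time resources that enable the total running information of all the measured operator graph groups to reach the requirement information as the target time resources of each measured operator graph group.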
在一些实施例中,所述需求信息包括处理速度需求信息。
在一些实施例中,所述确定各实测算子图组的目标时间资源包括:确定在每个预定的运行周期中各实测算子图组的运行时间占比。
在一些实施例中,所述资源包括运行资源;所述确定多个算子图中的至少部分算子图为实测算子图包括:若多个算子图包括具有需求信息的第一算子图,至少确定所述第一算子图为实测算子图;所述用测试资源运行所述实测算子图,根据所述实测算子图在测试资源下运行时的运行信息,确定所述实测算子图的目标资源包括:若存在所述第一算子图,用测试资源运行所述第一算子图,根据所述第一算子图在测试资源下运行时的运行信息,确定能使所述第一算子图的运行信息达到需求信息的运行资源为第一算子图的目标运行资源。
在一些实施例中,所述用测试资源运行所述第一算子图,根据所述第一算子图在测试资源下运行时的运行信息,确定能使所述第一算子图的运行信息达到需求信息的运行资源为第一算子图的目标运行资源包括:若所述第一算子图中包括有时序关系的第一算子图和无时序关系的第一算子图,则:用测试资源运行有时序关系的第一算子图,根据有时序关系的第一算子图在测试资源下运行时的运行信息,确定能使有时序关系的第一算子图的运行信息达到需求信息的运行资源为有时序关系的第一算子图的目标运行资源;用测试资源运行无时序关系的第一算子图,根据无时序关系的第一算子图在测试资源下运行时的运行信息,确定能使无时序关系的第一算子图的运行信息达到需求信息的运行资源为无时序关系的第一算子图的目标运行资源。
在一些实施例中,在所述确定能使所述第一算子图的运行信息达到需求信息的运行资源为第一算子图的目标运行资源后,还包括:若多个算子图包括无需求信息的第二算子图,确定除所述第一算子图的目标运行资源外的剩余运行资源,为所述第二算子图的目标运行资源。
在一些实施例中,所述资源包括运行资源;所述确定多个算子图中的至少部分算子图为实测算子图包括:若多个算子图包括具有时序关系的第三算子图,至少确定所述第三算子图为实测算子图;所述用测试资源运行所述实测算子图,根据所述实测算子图在测试资源下运行时的运行信息,确定所述实测算子图的目标资源包括:用测试资源运行所述第三算子图,根据所述第三算子图在测试资源下运行时的运行信息,确定所述第三算子图的目标运行资源。
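In some embodiments, determining at least the third operator graphs as measured operator graphs further includes: determining multiple measured operator graph groups, each including at least one third operator graph, where any two third operator graphs that have a timing relationship with each other are located in the same measured operator graph group or in adjacent measured operator graph groups; and running the third operator graphs with test resources and determining their target running resources according to their running information under the test resources includes: running each measured operator graph group separately with test resources, and determining the target running resources of each measured operator graph group according to the running information of that group under the test resources.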
在一些实施例中,在所述确定所述第三算子图的目标运行资源后,还包括:若多个算子图包括无时序关系的第四算子图,确定除所述第三算子图的目标运行资源外的剩余运行资源,为所述第四算子图的目标运行资源。
在一些实施例中,所述资源包括运行资源和时间资源;所述确定多个算子图中的至少部分算子图为实测算子图包括:若多个算子图包括具有时序关系的第三算子图,至少确定所述第三算子图为实测算子图,并确定多个实测算子图组,每个实测算子图组包括至少一个第三算子图;所述用测试资源运行所述实测算子图,根据所述实测算子图在测试资源下运行时的运行信息,确定所述实测算子图的目标资源包括:分时的将每个所述实测算子图组加载至众核系统中运行,根据各所述实测算子图组的运行信息,确定各实测算子图组的目标时间资源,并确定符合资源分配类型的条件的测试资源为各所述实测算子图组的目标运行资源。
在一些实施例中,所述用测试资源运行所述实测算子图包括:将所述实测算子图加载至众核系统中,在众核系统中用测试资源运行所述实测算子图。
在一些实施例中,所述用测试资源运行所述实测算子图,根据所述实测算子图在测试资源下运行时的运行信息,确定所述实测算子图的目标资源包括:用测试资源运行所有所述实测算子图,根据所有所述实测算子图在测试资源下运行时的运行信息,确定所有所述实测算子图的目标资源。
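In some embodiments, determining at least some of the plurality of operator graphs as measured operator graphs includes: determining multiple measured operator graph groups, each including at least one measured operator graph; and running the measured operator graphs with test resources and determining their target resources according to their running information under the test resources includes: running each measured operator graph group separately with test resources, and determining the target resources of each measured operator graph group according to the running information of that group under the test resources.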
在一些实施例中,每个所述实测算子图组中的实测算子图的配置数据总量小于或等于众核系统的片上存储空间。
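In some embodiments, determining the target resources of the measured operator graph includes: determining resources that enable the running information of the measured operator graph to reach the requirement information as the target resources of the measured operator graph.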
在一些实施例中,所述确定能使所述实测算子图的运行信息达到需求信息的资源为实测算子图的目标资源包括:确定能使所述实测算子图的运行信息达到需求信息的、最少的资源为实测算子图的目标资源。
在一些实施例中,所述确定能使所述实测算子图的运行信息达到需求信息的资源为实测算子图的目标资源包括:若无法确定出能使所述实测算子图的运行信息达到需求信息的资源,则发出提示。
第二方面,本发明实施例提供一种资源分配装置,其包括:算子图确定模块,用于确定多个待分配资源的算子图,确定多个算子图中的至少部分算子图为实测算子图;资源确定模块,用于用测试资源运行所述实测算子图,根据所述实测算子图在测试资源下运行时的运行信息,确定所述实测算子图的目标资源;其中,所述算子图的目标资源为算子图在众核系统中运行时占用的资源。
第三方面,本发明实施例提供一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现本发明实施例任意一种资源分配方法。
第四方面,本发明实施例提供一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现本发明实施例任意一种资源分配方法。
本发明实施例中,先确定算子图在不同测试资源下实际运行时的运行状态(运行信息),从而可明确获知用各种资源(测试资源)运行算子图时的实际效果,进而根据该效果确定出众核系统中分配给各算子图的实际资源(目标资源),从而其可合理配置众核系统的资源,提高资源利用率。
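Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a resource allocation method provided by an embodiment of the present invention.
FIG. 2 is a schematic flowchart of another resource allocation method provided by an embodiment of the present invention.
FIG. 3 is a schematic flowchart of another resource allocation method provided by an embodiment of the present invention.
FIG. 4 is a schematic flowchart of another resource allocation method provided by an embodiment of the present invention.
FIG. 5 is a schematic flowchart of another resource allocation method provided by an embodiment of the present invention.
FIG. 6 is a structural block diagram of a resource allocation apparatus provided by an embodiment of the present invention.
FIG. 7 is a structural block diagram of a computer device provided by an embodiment of the present invention.
FIG. 8 is a structural block diagram of a computer-readable storage medium provided by an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.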
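In a first aspect, an embodiment of the present invention provides a resource allocation method.
The resource allocation method of the embodiments of the present invention may be performed by a corresponding resource allocation apparatus, which may be implemented in software and/or hardware and may generally be integrated in a computer device or the like.
The resource allocation method of the embodiments of the present invention is used to allocate target resources to each operator graph that needs to run in a many-core system, so that when the operator graphs run in the many-core system, each actually occupies its own target resources.
The above resource allocation may be performed while the many-core system is running the operator graphs (i.e., dynamic allocation), or in advance when the operator graphs are compiled (pre-allocation).
In the embodiments of the present invention, a many-core system refers to a set of processing cores formed by connecting a large number of processing cores together in a preset manner. Each processing core (Core) is the smallest unit that can be independently scheduled and has independent computing capability; that is, each processing core has its own independent storage and computing resources and can therefore perform the required operations independently.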
而众核系统的不同处理核间可通过路由(例如路由可为总线、片上网络等形式)相互连接,从而任意两个处理核间可实现信息交互。由此,众核系统中的多个处理核,可相互协同进行一定的运算。除处理核和路由外,众核系统中还可包括一些其它的单元,如控制各处理核的调度器,可供各处理核访问的片上存储空间等。
其中,众核系统的具体形式是多样的,如众核系统可为一个多核芯片,或为多个单核芯片的组合,或为多个多核芯片的组合等。
本发明实施例中,每个算子图包括至少一个算子(或者说运算,如卷积、加减乘除、矩阵加乘等),算子图包括至少两个算子时,不同算子间可以具有一定的关系,如前一算子的输出作为相邻后一算子的输入等。不同算子间还可以不相关,例如,算子图可以至少包括并行的多个分支,每个分支可以包括至少一个算子。其中,并行的不同分支可以互不关联,例如,算子图包括两个算子,两个算子是并行的。其中,并行的不同分支还可以具有相同的输入。例如,可以是至少一个算子的输出作为并行的所有分支的输入,还可以是至少一个算子的输出作为部分分支的输入,不同分支的输入可以为不同算子的输出。本公开对算子图包括的算子的数量、算子间的关系均不作限制。
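Through the combined action of its internal operators, an operator graph as a whole is an algorithm that implements a relatively complete specific function, including but not limited to artificial intelligence (AI) algorithms, machine learning algorithms, and general scientific computing algorithms.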
其中,神经网络是能实现一定功能的深度学习模型(机器学习算法),例如图像检测神经网络、语音检测神经网络、图像识别(如识别人、车等)神经网络等。
应当理解,神经网络的划分可根据其实现功能的不同而不同,例如,若图像识别神经网络对图像检测神经网络检测到的图像进行识别,则它们可视为两个连续工作的神经网络,也可视为一个图像“检测+识别”的神经网络。
相应的,算子图的界定方式也是多样的,如算子图可为一个模型,或为模型的一部分,或为多个模型的整体或一部分。由此,一个算子图可被“划分(折叠)”为多个算子图,而多个算子图也可“合并”为一个算子图。通过将算子图配置为模型或者模型的一部分,可丰富众核系统的应用场景,以及丰富算子图的业务模式,同时可以提高众核系统的资源利用率。
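Therefore, when the many-core system is to process multiple operator graphs, different resources need to be allocated to different operator graphs; in other words, each operator graph needs to be "mapped" onto different resources of the many-core system. That is, when the many-core system is running, each operator graph in it should use (occupy) a determined portion of the resources of the many-core system without overlap.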
其中,众核系统的资源可包括运行资源,或者说是众核系统的硬件资源,例如是系统的处理核、线程、片上存储空间等,为算子图分配目标运行资源后,当算子图在众核系统中运行时,其目标运行资源中的处理核、线程、片上存储空间仅能被该算子图占用。
众核系统的资源还可包括时间资源,为算子图分配的目标时间资源相当于规定算子图应在什么时间被众核系统处理。也就是说,多个算子图可“时分复用”的在众核系统中处理,即,众核系统在一个时间段内先加载并处理一个或多个算子图,而在处理完成后,将其得到的数据(结果数据)搬运出去或暂存在片上存储空间中,再将后续的一个或多个其它算子图的数据(配置数据)搬运(加载)至众核系统中,并对新加载的算子图进行处理。应当理解,在同一时刻被众核系统处理的每个算子图,也都应被分配众核系统中的一些运行资源(目标运行资源)。
其中,应当理解,同样的资源不能被同时分配给多个不同算子图,即,任意的运行资源,在相同的时间资源下(相同时间),最多只能作为一个算子图的目标资源;但相同的运行资源可在不同时间分配给不同的算子图。
当然,若部分运行资源在某些时间是“空闲”的,不被分配给任何算子图,也是可行的。
参照图1,本发明实施例的资源分配方法包括以下S001至S002。
S001、确定多个待分配资源的算子图,确定多个算子图中的至少部分算子图为实测算子图。
首先,确定有哪些算子图要在众核系统中运行,例如可对要在众核系统中运行的所有运算(如模型)进行划分,而得到多个算子图。
之后,确定这些算子图中的至少一部分为需要进行后续工作的实测算子图。
S002、用测试资源运行实测算子图,根据实测算子图在测试资源下运行时的运行信息,确定实测算子图的目标资源。
其中,算子图的目标资源为算子图在众核系统中运行时占用的资源。
其中,“运行信息”是指在算子图的实际运算过程中所实际表现出的各种性能的参数,如处理速度、能耗、数据搬运量等。
同一个算子图在同一个众核系统中运行时,若其占用的资源不同,则其在运行中的实际性能(运行信息)也会有不同,且这种不同是无法通过理论准确预测的,即,只有将算子图实际在一定资源(测试资源)下运行,才可确定其相应的运行信息。
本发明实施例中,先用一定的资源(测试资源)实际运行以上实测算子图,例如让实测算子图处理一些数据(如随机产生的图像、语音等),并根据实测算子图在这些测试资源下运行时的实际运行信息,确定应为各实测算子图分配哪些资源(目标资源),也就是确定各算子图应如何在众核系统中运行。
应当理解,以上测试资源和目标资源,都应是众核系统的资源的一部分,而不能超出众核系统的资源。
其中,“用测试资源运行实测算子图”的具体方式,以及“根据运行信息确定目标资源”的具体方式都是多样的。例如,可以是在多个不同的测试资源下运行实测算子图,并根据实测算子图在各测试资源下的运行信息,确定其中一个测试资源为目标资源,或者是用各测试资源计算出目标资源。
其中,以上“多个不同的测试资源”的具体形式也是多样的,例如,多个不同的测试资源可为多个预定的资源,或是逐渐增加/减少的资源,或是根据前一个测试资源下的运行信息确定下一个测试资源等。
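It should be understood that, in addition to the measured operator graphs for which target resources are determined, the operator graphs may also include "non-measured operator graphs". Corresponding target resources can also be allocated to these, but not by running them under test resources; for example, the "non-measured operator graphs" may be allocated the resources that remain after the target resources of the measured operator graphs are removed.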
本发明实施例中,先确定算子图在不同测试资源下实际运行时的运行状态(运行信息),从而可明确获知用各种资源(测试资源)运行算子图时的实际效果,进而根据该效果确定出众核系统中分配给各算子图的实际资源(目标资源),从而其可合理配置众核系统的资源,提高资源利用率。
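The measure-then-allocate idea of S001 and S002 can be sketched as follows. This is only an illustrative outline under assumptions: the names `profile`, `pick_target`, and `fake_run` are hypothetical, test resources are reduced to a core count, and `fake_run` stands in for actually running the graph on the many-core system.

```python
from typing import Callable, Dict, Iterable, List, Tuple

RunInfo = Dict[str, float]  # running information, e.g. {"speed": ..., "energy": ...}

def profile(graph: str,
            test_resources: Iterable[int],
            run: Callable[[str, int], RunInfo]) -> List[Tuple[int, RunInfo]]:
    """Run one measured operator graph under each candidate test resource."""
    return [(r, run(graph, r)) for r in test_resources]

def pick_target(profiles: List[Tuple[int, RunInfo]],
                score: Callable[[RunInfo], float]) -> int:
    """Return the test resource whose running information scores best."""
    best_resource, _ = max(profiles, key=lambda p: score(p[1]))
    return best_resource

# Hypothetical stand-in for real execution on the many-core system.
def fake_run(graph: str, cores: int) -> RunInfo:
    return {"speed": 100.0 * cores / (cores + 4), "energy": 2.0 * cores}

profiles = profile("detector", [1, 2, 4, 8], fake_run)
target = pick_target(profiles, score=lambda info: info["speed"])
```

The scoring function is where a particular embodiment (requirement information, allocation type) would plug in; here it simply prefers the highest measured speed.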
在一些实施例中,用测试资源运行实测算子图(S002)包括:将实测算子图加载至众核系统中,在众核系统中用测试资源运行实测算子图。
作为本发明实施例的一种方式,可以是将实测算子图实际加载(映射)至众核系统中,并分配给其众核系统中的部分资源作为测试资源,以确定其在众核系统中实际运行时的运行信息。
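Of course, it is also feasible to use another test system to "simulate" the running information of the measured operator graph under the test resources of the many-core system, as long as the running state in that test system is the same as when running in the many-core system.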
在一些实施例中,用测试资源运行实测算子图,根据实测算子图在测试资源下运行时的运行信息,确定实测算子图的目标资源(S002)包括:用测试资源运行所有实测算子图,根据所有实测算子图在测试资源下运行时的运行信息,确定所有实测算子图的目标资源。
作为本发明实施例的一种方式,可以是将所有确定的实测算子图,一次性“同时”用测试资源运行(当然,其中每个实测算子图实际都在全部测试资源的一部分资源下运行),从而“一起”确定所有实测算子图的目标资源(当然实际也要具体确定出每个实测算子图的目标资源)。
在一些实施例中,确定多个算子图中的至少部分算子图为实测算子图(S001)包括:确定多个实测算子图组,每个实测算子图组包括至少一个实测算子图。
作为本发明实施例的另一种方式,用测试资源运行实测算子图,根据实测算子图在测试资源下运行时的运行信息,确定实测算子图的目标资源(S002)包括:用测试资源分别运行各实测算子图组,根据每个实测算子图组在测试资源下运行时的运行信息,确定该实测算子图组的目标资源。
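As another way of implementing the embodiments of the present invention, the measured operator graphs may be divided into multiple groups (each including one or more measured operator graphs), and each group is run under its own test resources (i.e., different measured operator graph groups have different test resources), so that the target resources of each group (each graph) are determined in turn by traversing "group by group" (or one by one, if each group has only one measured operator graph); if a group has multiple measured operator graphs, the target resources of each measured operator graph in the group must of course also be determined. For example, one measured operator graph group may be loaded into the many-core system and run to determine its target resources, then the next group is loaded and run to determine its target resources, and so on, until the target resources of all groups are determined.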
在一些实施例中,每个实测算子图组中的实测算子图的配置数据总量小于或等于众核系统的片上存储空间。
作为本发明实施例的一种方式,任意一个实测算子图组中所有的实测算子图的配置数据的总数据量应不超过众核系统的片上存储空间。其中,片上存储空间是指众核系统内部的存储空间,其随着众核系统的不同而不同,通过芯片的并联等可增大众核系统的片上存储空间,并提高众核系统的计算能力。
同时,每个算子图都有一定的配置数据(如连接权重、膜电位、发放阈值、卷积核等),而这些配置数据需要被存储至众核系统的片上存储空间,之后,众核系统才能对相应的算子图进行运算和处理。
而如果众核系统要同时处理的算子图的配置数据的总数据量,超过了众核系统片上存储空间的容量,则这些算子图就无法被同时加载和处理,故每个众核系统同时能“放下”的算子图总数是有限的,或者说,众核系统可能“放不下”过多的算子图。
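If the many-core system "cannot hold" the operator graphs, they should be processed in a time-shared (serial) manner by "time-division multiplexing", or one operator graph (or one group of operator graphs) should be "folded (split)" into multiple operator graphs (or multiple groups of operator graphs), which are then run with test resources in a time-shared manner and have their target resources determined separately.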
因此,对以上划分出的每个实测算子图组,可以都是众核系统“能放下”的,以便用众核系统中的运行资源可同时处理每个实测算子图组中的全部实测算子图。
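One simple way to form groups that the many-core system "can hold" is to pack graphs in order until the on-chip storage would overflow. The sketch below is an assumption for illustration (the text does not prescribe a packing strategy); `group_by_capacity` and the example sizes are hypothetical.

```python
from typing import List, Sequence, Tuple

def group_by_capacity(config_sizes: Sequence[Tuple[str, int]],
                      on_chip_capacity: int) -> List[List[str]]:
    """Pack graphs, in order, into groups whose total configuration data
    does not exceed the on-chip storage space."""
    groups: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, size in config_sizes:
        if size > on_chip_capacity:
            # A single graph that does not fit would have to be split first.
            raise ValueError(f"{name} alone exceeds the on-chip storage space")
        if used + size > on_chip_capacity:
            groups.append(current)   # close the current group
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        groups.append(current)
    return groups

example_groups = group_by_capacity(
    [("a", 40), ("b", 50), ("c", 30), ("d", 60)], on_chip_capacity=100)
```

Keeping the original order is a deliberate simplification; a real grouping would also have to respect any timing relationships between graphs.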
在一些实施例中,确定实测算子图的目标资源(S002)包括:确定能使实测算子图的运行信息达到需求信息的资源为实测算子图的目标资源。
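As one way of implementing the embodiments of the present invention, "requirement information" may be preset, i.e., the performance the user expects the many-core system to achieve when running the operator graphs, such as a desired processing speed or energy consumption. In this case, "reaching the requirement information" can be used as the goal when determining the target resources of the measured operator graph; that is, it is ensured that running the corresponding measured operator graph with the determined target resources lets it reach the requirement information.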
其中,应当理解,如果没有“需求信息”,而只是按照“尽量优化”实测算子图的运行信息的方式,确定实测算子图的目标资源,也是可行的。
在一些实施例中,确定能使实测算子图的运行信息达到需求信息的资源为实测算子图的目标资源包括:确定能使实测算子图的运行信息达到需求信息的、最少的资源为实测算子图的目标资源。
作为本发明实施例的一种方式,当要求目标资源能使实测算子图的运行信息达到需求信息时,可尽量选择“最少”的资源为目标资源,即,以“刚好能满足”需求信息的资源作为实际的目标资源,以达到尽量节约资源,提高资源利用率的目的。
其中,选出符合以上需求信息的要求的目标资源的具体方式是多样的。例如,可以是逐渐增加或减少测试资源的资源量,并分别用各不同测试资源运行算子图,从而确定出其中“刚好能满足”需求信息的测试资源作为目标资源。
在一些实施例中,确定能使实测算子图的运行信息达到需求信息的资源为实测算子图的目标资源包括:若无法确定出能使实测算子图的运行信息达到需求信息的资源,则发出提示。
在一些情况下,可能无论如何,也无法找到能使实测算子图达到需求信息的资源分配方式(如众核系统的总资源量不足),此时,则可向用户发出提示(如显示提示文字,或发出提示语音等),“告知”用户当前无法完成满足需求信息的资源分配。从而,用户可进行进一步的相应操作。例如,用户可允许相应降低需求信息,或者是确定可减少要分配资源的算子图的数量等,以根据调整后的情况重新执行本发明实施例的方法,实现资源分配。
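Choosing the fewest resources that "just satisfy" the requirement, and prompting the user when nothing satisfies it, can be sketched as below. The function name, the core-count representation of resources, and the message text are assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional

def smallest_sufficient(levels: Iterable[int],
                        measure_speed: Callable[[int], float],
                        required_speed: float) -> Optional[int]:
    """Try test resources from smallest to largest and return the first
    amount whose measured speed reaches the requirement, or None."""
    for amount in sorted(levels):
        if measure_speed(amount) >= required_speed:
            return amount
    return None

def allocate_or_prompt(levels, measure_speed, required_speed) -> Optional[int]:
    target = smallest_sufficient(levels, measure_speed, required_speed)
    if target is None:
        # Hypothetical prompt; the text only says a prompt is issued.
        print("No resource allocation can meet the requirement information.")
    return target

# Hypothetical measurement: speed grows linearly with the core count.
chosen = allocate_or_prompt(range(1, 9), lambda cores: 10.0 * cores, 35.0)
```

Scanning upward finds the smallest sufficient amount even when speed is not monotonic in the resource amount, at the cost of more test runs than a search from the top.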
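Since the total computing capability of the many-core system is fixed, the above situation in which the requirement information cannot be met can be avoided as long as the total computing capability is known and reasonable requirement information is set accordingly.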
在一些实施例中,资源包括运行资源;确定实测算子图的目标资源(S002)包括:确定符合资源分配类型的条件的测试资源为实测算子图的目标运行资源。
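As one way of implementing the embodiments of the present invention, different "resource allocation types" may be set, each with certain conditions, so that under a given resource allocation type (or "allocation mode"), the test resources that meet the requirements of the current resource allocation type can be used as the target running resources of the measured operator graph.
In some embodiments, running the measured operator graph with test resources and determining the target resources of the measured operator graph according to its running information under the test resources (S002) includes: running the measured operator graph with each of multiple levels of test resources, and, according to the running information of the measured operator graph under each level of test resources, determining the level of test resources that meets the condition of the resource allocation type as the target running resources of the measured operator graph.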
其中,任意两级测试资源中,较高级测试资源的运行资源量大于较低级测试资源的运行资源量。
作为本发明实施例的一种方式,可以是在多“级”不同的测试资源下分别运行实测算子图,并根据实测算子图在各级测试资源下的运行信息,选择其中一级测试资源作为相应的目标运行资源。
本发明实施例中,测试资源的“级”是以其中的“运行资源量”划分的,即,越高级的测试资源,包括的运行资源量越多。例如,运行资源具体可包括处理核、线程、片上存储空间等,相应的,测试资源包括的处理核个数越多、线程个数越多、片上存储空间量越大(只要以上一项有差别即可),则其相应的“级”越高。
其中,应当理解,相邻两“级”测试资源所包括运行资源量的差可根据需要预先设定,而不一定是理论上可存在的最小运行资源量差距,例如,相邻两“级”测试资源包括的处理核个数可只相差一个,也可相差其它预定个数(如相差总处理核个数的10%)。
在一些实施例中,运行资源包括处理核。
作为本发明实施例的一种方式,可以直接将各处理核作为目标运行资源分给相应的算子图,即,每个算子图可由其对应的一个或多个处理核处理。
由此,不同级的测试资源可以是不同个数的处理核,例如,可为每增加一个处理核就是测试资源增加一级。
当然,运行资源的具体内容不限于处理核,例如,运行资源还可包括线程个数、片上存储空间量等,即这些资源的不同也可视为不同级的测试资源。
在一些实施例中,资源分配类型包括高性能类型;符合高性能类型的条件的测试资源为:所有级测试资源中,使实测算子图的处理速度最快的一级测试资源。
作为本发明实施例的一种具体方式,资源分配类型包括高性能类型,在这种资源分配类型(分配模式)下,主要目的是使算子图在众核系统中运行时的性能达到最佳,故应选用处理速度(即算子图在单位时间内能处理的数据量)最快的测试资源作为目标运行资源。
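The amount of running resources and the processing speed of an operator graph are not necessarily positively correlated: if an operator graph is given too many running resources (e.g., too many processing cores), it may be excessively "scattered" across many processing cores during mapping, which instead lowers the processing speed.
There are various specific ways to find the test resources that make the measured operator graph fastest. As an example, the target running resources may be determined by "bisection". For example, the measured operator graph is first run with one processing core, half of the processing cores, and all processing cores; it can then be determined that the target running resources (i.e., the number of processing cores with the fastest processing speed) lie between whichever of the two end values (one core, all cores) gives the faster processing speed and the middle value (half of the cores). For example, if all cores give a faster processing speed than one core, it can be determined that the fastest core count lies between all cores and half of the cores, so the next test resources can be the midpoint of the two (bisection), i.e., three quarters of the cores; and so on, until the number of processing cores with the fastest processing speed is determined as the target running resources.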
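The bisection described above can be sketched as follows. This is a heuristic reading of the text, not a guaranteed optimum: it assumes the speed curve has a single peak, caches every measurement, and returns the fastest core count actually measured. `fastest_core_count` and `example_speed` are hypothetical names.

```python
from typing import Callable, Dict

def fastest_core_count(total_cores: int,
                       measure_speed: Callable[[int], float]) -> int:
    """Narrow the interval [lo, hi] toward the faster end, halving it each time."""
    cache: Dict[int, float] = {}

    def speed(n: int) -> float:
        if n not in cache:
            cache[n] = measure_speed(n)   # one real test run per core count
        return cache[n]

    lo, hi = 1, total_cores
    speed(lo)
    speed(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        speed(mid)
        if speed(hi) >= speed(lo):
            lo = mid      # faster end is hi: keep [mid, hi]
        else:
            hi = mid      # faster end is lo: keep [lo, mid]
    # Return the best core count among everything that was measured.
    return max(cache, key=cache.get)

# Hypothetical speed curve peaking at 12 of 16 cores.
def example_speed(cores: int) -> float:
    return 200.0 - (cores - 12) ** 2

best = fastest_core_count(16, example_speed)
```

When the curve has several peaks, measuring every level and taking the maximum is the safer choice; the bisection only saves test runs.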
在一些实施例中,资源分配类型包括节能类型;任意两相邻级测试资源中的运行资源量的差相等;符合节能类型的条件的测试资源为:时间降低值大于预设阈值的所有级测试资源中的最低级测试资源。
其中,测试资源的时间降低值为用该级测试资源运行实测算子图处理预定数据的耗时,相对用比该级测试资源低一级的测试资源运行实测算子图处理预定数据的耗时的减少量。
作为本发明实施例的另一种具体方式,资源分配类型包括节能类型,在该种资源分配类型(分配模式)下,主要目的是在满足基本性能(如基本的处理速度)的前提下,尽量使算子图运行时的“能耗”最低。
通常而言,单位时间内数据搬运量(如时分复用时导致的数据搬运)越大能耗越大,单位时间内进行处理的处理核个数越多能耗越大,单位时间内调度器发出的调度指令越多能耗越大。由于算子图的基本运算是必须进行的,故节能类型主要考虑减少数据搬运,使调度指令、处理核运算、吞吐量、延时、时序、串并行关系等都偏向降低数据搬运量的方向,以在满足一定的性能的条件下降低能耗。
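There are various specific ways to find the above lowest level of test resources. As an example, the amount of test resources may be increased gradually from the minimum amount until test resources that greatly improve performance are found, and these are taken as the target running resources. For example, the operator graph may first be run with the theoretically feasible minimum number of processing cores (calculated from the computation amount of the operator graph), and the time it takes to process predetermined data (e.g., randomly generated input data) recorded; then a certain number of processing cores is added (e.g., 10% of the total), the operator graph is run again, and the time to process the same predetermined data is recorded; and so on, until after some increase in processing cores the time taken suddenly drops significantly (that is, the processing speed suddenly rises significantly), which indicates that the time reduction value of the test resources at that point is greater than the preset threshold, so the number of processing cores after this increase can be used as the target running resources.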
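The energy-saving selection rule can be sketched as a scan over equally spaced levels that stops at the first level whose time reduction value exceeds the threshold. The names and the example timings below are hypothetical.

```python
from typing import Callable, Optional, Sequence

def energy_saving_level(levels: Sequence[int],
                        measure_time: Callable[[int], float],
                        threshold: float) -> Optional[int]:
    """levels: ascending, equally spaced amounts of running resources.
    Returns the lowest level whose time reduction (relative to the level
    just below it) is greater than the threshold, or None if there is none."""
    previous_time = measure_time(levels[0])
    for level in levels[1:]:
        current_time = measure_time(level)
        time_reduction = previous_time - current_time
        if time_reduction > threshold:
            return level
        previous_time = current_time
    return None

# Hypothetical timings (seconds to process the same predetermined data).
timings = {2: 100.0, 4: 95.0, 6: 60.0, 8: 55.0}
level = energy_saving_level([2, 4, 6, 8], timings.__getitem__, threshold=20.0)
```

The lowest level has no lower neighbour, so under this reading it has no time reduction value and cannot itself be selected.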
在一些实施例中,资源分配类型包括均衡类型;符合均衡类型的条件的测试资源为:所有级测试资源中,能耗比最大的一级测试资源。
其中,测试资源的能耗比为用该级测试资源运行实测算子图时消耗单位能耗所能处理的数据量。
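As another specific way of implementing the embodiments of the present invention, the resource allocation types include a balanced type. This resource allocation type (allocation mode) can be regarded as an "intermediate mode" or "combined mode" between the high-performance type and the energy-saving type above; its purpose is to balance performance (such as processing speed) against energy consumption so as to obtain the best energy efficiency ratio.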
其中,能耗比为算子图每消耗单位的能量所能处理的数据量。例如,一定测试资源下的能耗比可通过以下公式计算:
[Formula image PCTCN2021114217-appb-000001: not reproduced in this text]
其中,处理核运行时间是指处理核处理一些数据所花费的时间,处理核运算频率是指在单位时间内处理核能进行的运算的次数,例如为处理核的主频乘以处理核个数,而搬运配置的能耗和处理核计算的能耗为主要的两种能耗形式。应当理解,此时的处理核均是指当前测试资源中的所有处理核。
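The formula itself is an image that did not survive in this text. One reading that is consistent with the four quantities named above, offered only as an assumption and not as the applicant's formula, takes the number of operations performed (run time multiplied by operation frequency) as a proxy for the data processed and divides it by the two energy terms:

```latex
\[
\text{energy efficiency ratio}
  \approx
  \frac{T_{\text{run}} \times f_{\text{op}}}
       {E_{\text{move}} + E_{\text{compute}}}
\]
% T_run     : processing-core run time
% f_op      : processing-core operation frequency (e.g. clock frequency times number of cores)
% E_move    : energy spent moving configuration data
% E_compute : energy spent on processing-core computation
```

The symbols are introduced here for illustration; the original figure may arrange these terms differently.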
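There are various specific ways to determine the energy efficiency ratio of each level of test resources. For example, the operator graph may be run for the same length of time under each level of test resources, and the energy consumption at each level recorded, so as to calculate the energy efficiency ratio of each level of test resources.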
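Selecting the balanced-type level from recorded measurements can be sketched as below. It assumes that, for each level, both the amount of data processed and the energy consumed during the same run time have been recorded; the names and numbers are hypothetical.

```python
from typing import Dict, Tuple

def balanced_level(measurements: Dict[int, Tuple[float, float]]
                   ) -> Tuple[int, Dict[int, float]]:
    """measurements maps level -> (data processed, energy consumed).
    Returns the level with the largest data-per-unit-energy ratio."""
    ratios = {level: data / energy
              for level, (data, energy) in measurements.items()}
    best_level = max(ratios, key=ratios.get)
    return best_level, ratios

# Hypothetical measurements taken over equal run times.
best_level, ratios = balanced_level({
    2: (100.0, 10.0),   # ratio 10
    4: (180.0, 15.0),   # ratio 12
    8: (240.0, 30.0),   # ratio 8
})
```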
应当理解,以上各资源分配类型对应的确定目标资源的具体方式并不限于以上的例子。例如,可以是先分别用所有级的测试资源都运行算子图,并根据所有级测试资源的运行结果,选出符合相应条件的一级测试资源作为目标运行资源。
应当理解,以上根据不同的资源分配类型的确定相应目标运行资源的方式,与各种具体的用测试资源运行实测算子图的方式是可相互组合、兼容的。
例如,可以是用各级测试资源分别运行所有的实测算子图并同时确定它们的目标运行资源,也可以是对每个(每组)实测算子图,分别用其对应的各级测试资源运行,以逐一确定每组实测算子图对应的目标运行资源。
在一些实施例中,若所有实测算子图的配置数据总量小于或等于众核系统的片上存储空间,资源分配类型包括高性能类型、节能类型、均衡类型中的任意一种。
若所有实测算子图的配置数据总量大于众核系统的片上存储空间,资源分配类型包括高性能类型、均衡类型中的任意一种。
作为本发明实施例的一种方式,当所有实测算子图的配置数据总量不超过众核系统的片上存储空间时(即众核系统“能放下”所有实测算子图时),则可选择高性能类型、节能类型、均衡类型中的任意一种作为当前的分配模式。
而当所有实测算子图的配置数据总量超过众核系统的片上存储空间时(即众核系统“放不下”所有实测算子图时),则可选的资源分配类型只包括高性能类型和均衡类型,而没有“节能类型”,这是因为,若众核系统“放不下”实测算子图,则必须采用以上时分复用的方式,从而也必然涉及大量的数据搬运,能耗不可能有效降低,故也就无法采用节能类型。
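The rule for which allocation types are selectable reduces to a single comparison between the total configuration data and the on-chip storage space. A minimal sketch, with hypothetical names:

```python
from enum import Enum
from typing import Iterable, Set

class AllocationType(Enum):
    HIGH_PERFORMANCE = "high_performance"
    ENERGY_SAVING = "energy_saving"
    BALANCED = "balanced"

def allowed_types(config_sizes: Iterable[int],
                  on_chip_capacity: int) -> Set[AllocationType]:
    """All three types are selectable when every measured operator graph fits
    on chip at once; otherwise time-division multiplexing is unavoidable and
    the energy-saving type is excluded."""
    if sum(config_sizes) <= on_chip_capacity:
        return set(AllocationType)
    return {AllocationType.HIGH_PERFORMANCE, AllocationType.BALANCED}
```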
在一些实施例中,资源包括时间资源。
参照图2,确定多个算子图中的至少部分算子图为实测算子图(S001)包括以下S101。
S101、确定多个实测算子图组,每个实测算子图组包括至少一个实测算子图。
参照图2,用测试资源运行实测算子图,根据实测算子图在测试资源下运行时的运行信息,确定实测算子图的目标资源(S002)包括以下S102。
S102、分时的将每个实测算子图组加载至众核系统中运行,根据所有实测算子图组的总运行信息,确定各实测算子图组的目标时间资源。
作为本发明实施例的一种方式,若所有实测算子图的配置数据总量超过众核系统的片上存储空间时(即众核系统“放不下”实测算子图),则需要将其分为多个实测算子图组(包括将一个算子图“拆分”为多个实测算子图组),每组包括部分实测算子图。
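Since the measured operator graphs in the same group are ones the many-core system "can hold", they can run in the many-core system at the same time, whereas the measured operator graphs of different groups need "time-division multiplexing": the many-core system first loads and processes one measured operator graph group for a period of time, then moves that group out and loads and processes the next group for a period of time, and so on.
After all measured operator graph groups have been run by "time-division multiplexing", the time each group should run (its target time resources) can be determined from the overall running state of all groups (the total running information), so as to optimize their overall running state.
It should be understood that, besides determining the target time resources of each measured operator graph group, while each group is running, corresponding target running resources can also be allocated to it according to its running information in any manner of the embodiments of the present invention (such as one of the allocation modes above), and more specifically the target running resources of each measured operator graph in the group can be determined; this is not described in detail here.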
在一些实施例中,根据所有实测算子图组的总运行信息,确定各实测算子图组的目标时间资源(S102)包括:根据所有实测算子图组的运行信息,确定能使所有实测算子图组的总运行信息达到需求信息的时间资源为各实测算子图组的目标时间资源。
作为本发明实施例的一种方式,也可用让各算子图的总运行信息达到需求信息为目标,为各实测算子图组分配目标时间资源。例如可以是,调整各实测算子图组的运行时间,以使所有实测算子图组的总处理速度(如帧率)达到预期处理速度(如预期帧率)。
在一些实施例中,需求信息包括处理速度需求信息。
作为本发明实施例的一种方式,需求信息可为希望达到的处理速度(如预期帧率),从而总运行信息也是相应的整体处理速度。
其中,处理速度具体可由“帧率”代表,即在连续输入多个数据(如多帧图像)的情况下,各实测算子图组在单位时间能处理的数据的个数(如图像的帧数)。
在一些实施例中,分时的将每个实测算子图组加载至众核系统中运行(S102)包括:分时的将每个实测算子图组加载至众核系统中运行相等的时间段。
作为本发明实施例的一种方式,可以是先对每个实测算子图组用“相等”的时间进行运行,即在“时分复用”的运行各实测算子图组的过程中,各实测算子图组是相等的时间段内“等时”运行的,并获取该“等时”运行方式下的总运行信息。从而,在后续确定每个实测算子图组的目标时间资源的过程中,相当于通过分析,确定各实测算子图组的运行时间应当“延长”或“缩短”,以改善其整体的运行状态(总运行信息)。
在一些实施例中,确定各实测算子图组的目标时间资源(S102)包括:确定在每个预定的运行周期中各实测算子图组的运行时间占比。
作为本发明实施例的一种方式,以上各实测算子图组的目标时间资源,可以不是绝对的时间段,而是在每个运行周期中,各实测算子图组的运行时间的相对比例。
即,在众核系统的工作过程中,各实测算子图组应按照预定的次序轮流运行,且其中每个实测算子图组都运行一个预定的时长,从而在任意一段连续的确定时间长度(运行周期)中,各实测算子图组的运行时间均符合特定的比例关系。
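How per-group time shares might be derived from an equal-time trial run can be sketched under a simple model. The model is an assumption, not taken from the text: every input item must pass through every group, group i processes `rates[i]` items per second while it is loaded, and load/unload overhead is ignored. Under that model the overall rate is the minimum of rate times share across groups, which is maximized when each share is inversely proportional to the group's rate.

```python
from typing import List, Sequence, Tuple

def time_shares(rates: Sequence[float]) -> Tuple[List[float], float]:
    """Return (share of each running cycle per group, resulting overall rate)."""
    inverse = [1.0 / r for r in rates]
    total = sum(inverse)
    shares = [x / total for x in inverse]   # slower groups get more time
    overall_rate = 1.0 / total              # equals rates[i] * shares[i] for every i
    return shares, overall_rate

# Hypothetical rates measured during the equal-time run (frames per second).
shares, overall = time_shares([100.0, 50.0])
meets_requirement = overall >= 30.0         # hypothetical expected frame rate
```

If `meets_requirement` were false under this model, no choice of shares could reach the expected frame rate, which corresponds to the case where a prompt is issued.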
在一些实施例中,资源包括运行资源。
参照图3,确定多个算子图中的至少部分算子图为实测算子图(S001)包括以下S201。
S201、若多个算子图包括具有需求信息的第一算子图,至少确定第一算子图为实测算子图。
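Referring to FIG. 3, running the measured operator graphs with test resources and determining the target resources of the measured operator graphs according to their running information under the test resources (S002) includes the following S202.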
S202、若存在第一算子图,用测试资源运行第一算子图,根据第一算子图在测试资源下运行时的运行信息,确定能使第一算子图的运行信息达到需求信息的运行资源为第一算子图的目标运行资源。
在所有算子图中,可能有至少部分算子图预先设定对应的有需求信息(例如预期的处理速度),其称为第一算子图。而具有需求信息的算子图往往是比较重要的关键算子图,因此,优先满足这些第一算子图的需求信息,更有利于改善所有算子图的整体性能。
本发明实施例中,若存在有需求信息的第一算子图时,则优先用测试资源运行第一算子图,并根据运行信息为第一算子图分配能满足需求信息的目标运行资源。
其中,以上目标运行资源,进一步可以是能满足需求信息的、最少的运行资源,在此不再详细描述。
应当理解,以上确定第一算子图的目标运行资源的具体方式是多样的。例如,可以是同时运行所有第一算子图以同时确定所有第一算子图的目标运行资源,也可以是逐一(组)运行(遍历)各第一算子图,以分别依次确定不同第一算子图的目标运行资源;再如,其中确定目标运行资源可采用以上任意一种分配模式的方式,在此不再详细描述。
作为本发明实施例的一种方式,用测试资源运行第一算子图,根据第一算子图在测试资源下运行时的运行信息,确定能使第一算子图的运行信息达到需求信息的运行资源为第一算子图的目标运行资源(S202)包括以下S2021至S2023。
S2021、用测试资源运行第一算子图,并获取第一算子图的运行信息(如处理速度)。
S2022、若运行信息与需求信息(如期望处理速度)之间的差值大于设定阈值(如速度阈值),则减少测试资源,并返回步骤S2021。
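S2023. If the difference between the running information and the requirement information (e.g., expected processing speed) is less than or equal to the set threshold, determine the current test resources as the target running resources.
As one way of implementing the embodiments of the present invention, the test resources may start from a relatively large amount of running resources (e.g., all currently remaining running resources) and be reduced gradually until the running information under some test resources equals or slightly exceeds the requirement information; these test resources can then be regarded as the smallest test resources that can meet the requirement information, and are therefore taken as the target running resources.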
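Steps S2021 to S2023 form a loop that can be sketched as below. Two details are assumptions added for safety rather than taken from the text: the resource amount is a single number reduced by a fixed `step`, and if a reduction makes the speed fall below the requirement, the last amount that still met it is returned.

```python
from typing import Callable, Optional

def shrink_until_close(start: int, step: int,
                       measure_speed: Callable[[int], float],
                       required_speed: float,
                       threshold: float) -> Optional[int]:
    resources = start
    last_ok: Optional[int] = None
    while resources > 0:
        speed = measure_speed(resources)              # S2021: run and measure
        if speed < required_speed:
            return last_ok                            # overshot: keep previous amount
        last_ok = resources
        if speed - required_speed <= threshold:       # S2023: close enough
            return resources
        resources -= step                             # S2022: reduce and retry
    return last_ok

# Hypothetical measurement: speed is 10 units per core.
target_cores = shrink_until_close(16, 1, lambda c: 10.0 * c,
                                  required_speed=75.0, threshold=5.0)
```

A return value of `None` means even the starting amount could not meet the requirement.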
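There are also various specific ways to reduce the test resources, and they may be chosen according to the specific type of running resources. For example, if the running resources include processing cores and threads (a thread is the smallest logical unit that the operating system can schedule for computation; it is contained within a process and is the actual unit of execution in the process), "reducing the test resources" may include: calculating a first ratio between the number of processing cores included in the test resources and the number of processing cores not yet allocated, and calculating a second ratio between the number of threads included in the test resources and the number of threads not yet allocated; if the first ratio is greater than or equal to the second ratio, reducing the number of processing cores included in the test resources, and if the first ratio is less than the second ratio, reducing the number of threads included in the test resources.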
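The ratio comparison above can be sketched in a few lines. Reducing by exactly one core or one thread per step, and the requirement that the unallocated counts are non-zero, are assumptions made for this illustration.

```python
from typing import Tuple

def reduce_test_resources(test_cores: int, test_threads: int,
                          free_cores: int, free_threads: int) -> Tuple[int, int]:
    """Return the (cores, threads) of the next, smaller test resource.
    free_cores and free_threads are the not-yet-allocated counts (assumed > 0)."""
    first_ratio = test_cores / free_cores
    second_ratio = test_threads / free_threads
    if first_ratio >= second_ratio:
        return test_cores - 1, test_threads     # cores are the scarcer resource
    return test_cores, test_threads - 1         # threads are the scarcer resource
```

For example, with 8 test cores out of 10 free cores and 16 test threads out of 40 free threads, the ratios are 0.8 and 0.4, so the core count is reduced.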
在一些实施例中,用测试资源运行第一算子图,根据第一算子图在测试资源下运行时的运行信息,确定能使第一算子图的运行信息达到需求信息的运行资源为第一算子图的目标运行资源(S202)包括:若第一算子图中包括有时序关系的第一算子图和无时序关系的第一算子图,则进行以下S2024、S2025。
S2024、用测试资源运行有时序关系的第一算子图,根据有时序关系的第一算子图在测试资源下运行时的运行信息,确定能使有时序关系的第一算子图的运行信息达到需求信息的运行资源为有时序关系的第一算子图的目标运行资源。
S2025、用测试资源运行无时序关系的第一算子图,根据无时序关系的第一算子图在测试资源下运行时的运行信息,确定能使无时序关系的第一算子图的运行信息达到需求信息的运行资源为无时序关系的第一算子图的目标运行资源。
对于有需求信息的第一算子图,还可按照其是否有时序关系再分为两类。
其中,“时序关系”是不同算子图在运行时,所必须满足的运行时间的顺序关系。例如,时序关系可包括“串行关系”和“并行关系”。
本发明实施例中,具有时序关系的算子图是指,该算子图必须与至少一个其它的算子图之间具有时序关系,例如是以上串行关系、并行关系,或者是由“多层”串行关系、并行关系组成的更复杂的间接时序关系。而一个算子图没有时序关系是指,该算子图是相对独立的,其与其它任意一个算子图之间均无时序关系。
其中,“串行关系”是指一个算子图的运算必须用到另一个算子图的输出(计算结果)作为输入(或部分输入),从而,只有当在前算子图的运算完成后,在后算子图才可能开始运行,即,该两个算子图是“串行”的。而“并行关系”是指,多个算子图都要处理一些相关的数据,而它们的处理结果要再被一起利用,从而这些算子图应当同时运行的,也就是“并行”的。
例如,若要用图像识别神经网络对图像检测神经网络检测到的图像进行图像识别,则,图像识别神经网络就是串行关系中的在后算子图,而图像检测神经网络就是串行关系中的在前算子图。
再如,若是要同时用图像检测神经网络和语音检测神经网络进行检测,只有在二者都检测到对象(图像、语音)时才触发后续的工作,则该图像检测神经网络和该语音检测神经网络就是两个并行关系的算子图。
显然,多个有时序关系的算子图要整体运行结束,必然是要求其中“最后一个”算子图运行结束,即,多个有时序关系的算子图的整体处理速度,本质上决定于其中“最慢的”算子图的处理速度。为此,每个有时序关系的算子图的处理速度,实际可能影响多个与其相关的算子图的处理速度。
在对多个第一算子图分配目标运行资源时,可“优先”为其中有时序关系的第一算子图分配目标运行资源,以尽量保证有时序关系的第一算子图可分到更好的目标资源(例如不必降低需求信息),从而提高所有算子图的整体性能。
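The priority implied here (graphs with both requirement information and timing relationships first, then other graphs with requirement information, then the rest) can be expressed as a stable sort. The `Graph` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Graph:
    name: str
    has_requirement: bool   # a "first operator graph" if True
    has_timing: bool        # has a timing relationship with another graph

def allocation_order(graphs: List[Graph]) -> List[Graph]:
    def rank(g: Graph) -> int:
        if g.has_requirement and g.has_timing:
            return 0
        if g.has_requirement:
            return 1
        return 2
    # sorted() is stable, so graphs of equal rank keep their input order.
    return sorted(graphs, key=rank)

ordered = allocation_order([
    Graph("logger", False, False),
    Graph("classifier", True, False),
    Graph("detector", True, True),
])
```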
在一些实施例中,参照图3,在确定能使第一算子图的运行信息达到需求信息的运行资源为第一算子图的目标运行资源(S202)后,还包括以下S203。
S203、若多个算子图包括无需求信息的第二算子图,确定除第一算子图的目标运行资源外的剩余运行资源,为第二算子图的目标运行资源。
本发明实施例中,所有的算子图中,可能还包括无需求信息的算子图(第二算子图),这些第二算子图虽然没有明确的需求信息,但也需要在众核系统中运行,故也需要为其分配一定的目标运行资源。
因此,可在确定出分配给所有第一算子图的目标运行资源后,将这些已分配目标运行资源排除,而以众核系统中剩余的还未分配的运行资源(即剩余运行资源),作为分配给第二算子图的目标运行资源。
当然,为第二算子图分配目标运行资源的具体方式是多样的。
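For example, the remaining running resources may be divided evenly among the second operator graphs, or allocated to the second operator graphs according to theoretical estimates.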
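The even split of the remaining running resources can be sketched as below, assuming the resources are whole processing cores. Giving the leftover cores to the first graphs in the list is an arbitrary choice made for this illustration.

```python
from typing import Dict, Sequence

def split_evenly(remaining_cores: int, graph_names: Sequence[str]) -> Dict[str, int]:
    """Divide the remaining cores among the second operator graphs."""
    base, extra = divmod(remaining_cores, len(graph_names))
    return {name: base + (1 if i < extra else 0)
            for i, name in enumerate(graph_names)}

allocation = split_evenly(10, ["a", "b", "c"])
```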
再如,第二算子图也可作为“实测算子图”,即,可在用测试资源运行各第一算子图并确定各第一算子图的目标运行资源后,继续用其它的测试资源运行第二算子图,以根据第二算子图的运行信息,确定各第二算子图的目标运行资源。当然,此时对应第二算子图的测试资源和目标运行资源,都应在除去分配给各第一算子图的目标资源后的剩余运行资源的范围内。
再如,也可将第二算子图也分为多个“组”,每组包括一个或多个第二算子图(每组中的第二算子图数量可根据剩余运行资源量确定),而每组第二算子图均可占用所有的剩余运行资源,但各组第二算子图“时分复用”的运行,即不同组的第二算子图占据不同的目标时间资源。
在一些实施例中,资源包括运行资源。
参照图4,确定多个算子图中的至少部分算子图为实测算子图(S001)包括以下S301。
S301、若多个算子图包括具有时序关系的第三算子图,至少确定第三算子图为实测算子图。
参照图4,用测试资源运行实测算子图,根据实测算子图在测试资源下运行时的运行信息,确定实测算子图的目标资源(S002)包括以下S302。
S302、用测试资源运行第三算子图,根据第三算子图在测试资源下运行时的运行信息,确定第三算子图的目标运行资源。
在所有算子图中,可能有至少部分算子图是具有时序关系的,其称为第三算子图,而如前,具有时序关系的多个算子图的整体处理速度,本质上决定于其中“最慢的”算子图的处理速度,故只要有一个具有时序关系的算子图运行很慢,就会拖慢多个相关算子图的实际运行速度。因此,优先为这些具有时序关系的第三算子图分配目标运行资源(如保证其能满足需求信息),更有利于改善所有算子图的整体性能。
本发明实施例中,若存在有时序关系的第三算子图时,则优先用测试资源运行第三算子图,并根据其运行信息为其分配目标运行资源(如满足其需求信息的目标运行资源)。
在一些实施例中,至少确定第三算子图为实测算子图(S301)还包括:确定多个实测算子图组,每个实测算子图组包括至少一个第三算子图,任意两个之间有时序关系的第三算子图位于同一实测算子图组或相邻实测算子图组。
用测试资源运行第三算子图,根据第三算子图在测试资源下运行时的运行信息,确定第三算子图的目标运行资源(S302)包括:用测试资源分别运行各实测算子图组,根据每个实测算子图组在测试资源下运行时的运行信息,确定该实测算子图组的目标运行资源。
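Among all third operator graphs having timing relationships, it is not necessarily the case that "any two" of them have a timing relationship with each other; also, the many-core system may not be able to "hold" all of the third operator graphs. Therefore, the third operator graphs may further be grouped so that third operator graphs that do have a timing relationship with each other are in the same group or in adjacent groups, and all third operator graphs of each group are ones the many-core system "can hold". The many-core system can then run the groups separately and determine the target running resources of each group (and, of course, of each third operator graph within a group).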
在一些实施例中,参照图4,在确定第三算子图的目标运行资源(S302)后,还包括以下S303。
S303、若多个算子图包括无时序关系的第四算子图,确定除第三算子图的目标运行资源外的剩余运行资源,为第四算子图的目标运行资源。
其中,所有的算子图中,可能还包括无时序关系的算子图(第四算子图),这些算子图虽然没有时序关系,但也需要在众核系统中运行,故也需要为其分配一定的目标运行资源。
因此,可在确定出分配给所有第三算子图的目标运行资源后,将这些已分配的目标运行资源排除,而以众核系统中剩余的未分配的运行资源(即剩余运行资源),作为分配给第四算子图的目标运行资源。
当然,为第四算子图分配目标运行资源的具体方式是多样的。
例如,可以是将剩余运行资源均匀分配给各第四算子图,或是根据理论估算确定分配给各第四算子图的剩余运行资源。
再如,第四算子图也可作为“实测算子图”,即,可在确定各第三算子图的目标运行资源后,继续用测试资源运行第四算子图,以根据第四算子图的运行信息,确定各第四算子图的目标运行资源。当然,此时对应第四算子图的测试资源和目标运行资源,都应在除去分配给各第三算子图的目标资源后的剩余运行资源的范围内。
再如,也可将第四算子图也分为多个“组”,每组包括一个或多个第四算子图(每组中的第四算子图数量可根据剩余运行资源量确定),而每组均可占用所有的剩余运行资源,但各组第四算子图“时分复用”的运行,即不同组的第四算子图占据不同的目标时间资源。
在一些实施例中,资源包括运行资源和时间资源。
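Referring to FIG. 5, determining at least some of the plurality of operator graphs as measured operator graphs (S001) includes the following S401.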
S401、若多个算子图包括具有时序关系的第三算子图,至少确定第三算子图为实测算子图,并确定多个实测算子图组,每个实测算子图组包括至少一个第三算子图。
参照图5,用测试资源运行实测算子图,根据实测算子图在测试资源下运行时的运行信息,确定实测算子图的目标资源(S002)包括以下S402。
S402、分时的将每个实测算子图组加载至众核系统中运行,根据各实测算子图组的运行信息,确定各实测算子图组的目标时间资源,并确定符合资源分配类型的条件的测试资源为各实测算子图组的目标运行资源。
作为本发明实施例的一种方式,可以是“综合运用”本发明实施例的多种方式,实现资源分配。
具体的,在算子图包括有时序关系的第三算子图时,可先将第三算子图分为多个实测算子图组,并以分时运行各实测算子图组,以分别确定每个实测算子图组的目标时间资源。而在每个实测算子图组的运行过程中,可根据当前选择的资源分配类型(分配模式),确定该实测算子图组在该目标时间资源内的目标运行资源,更具体是确定实测算子图组中每个第三算子图在该目标时间资源内的目标运行资源。
应当理解,以上实施例还可与本发明实施例的其它任意方式结合使用。例如,其中为每个实测算子图组确定的目标运行资源,可使该实测算子图组的运行信息满足对应的需求信息;而为各实测算子图组确定的目标时间资源,可使所有实测算子图组的总运行信息满足总的需求信息,在此不再详细描述。
应当理解,以上实施例中,确定目标运行资源和目标时间资源的具体运算方式也是多样的,在此不再详细描述。
即以上实施例只是本发明的一个具体例子,而不是对本发明保护范围的限定。
第二方面,本发明实施例提供一种资源分配装置,其是实现本发明上述实施例提供的资源分配方法的相应装置,该装置可采用软件和/或硬件的方式实现,并一般可集成计算机设备中等。
图6为本发明实施例中的一种资源分配装置40的示意图。
参照图6,本发明实施例提供的一种资源分配装置40包括算子图确定模块410和资源确定模块420。
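The operator graph determination module 410 is configured to determine a plurality of operator graphs to which resources are to be allocated, and to determine at least some of the plurality of operator graphs as measured operator graphs.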
资源确定模块420,用于用测试资源运行实测算子图,根据实测算子图在测试资源下运行时的运行信息,确定实测算子图的目标资源;其中,算子图的目标资源为算子图在众核系统中运行时占用的资源。
在一些实施例中,资源包括运行资源;确定实测算子图的目标资源包括:确定符合资源分配类型的条件的测试资源为实测算子图的目标运行资源。
在一些实施例中,用测试资源运行实测算子图,根据实测算子图在测试资源下运行时的运行信息,确定实测算子图的目标资源包括:用多级测试资源分别运行实测算子图,根据实测算子图在各级测试资源下运行时的运行信息,确定符合资源分配类型的条件的一级测试资源为实测算子图的目标运行资源;其中,任意两级测试资源中,较高级测试资源的运行资源量大于较低级测试资源的运行资源量。
在一些实施例中,资源分配类型包括高性能类型;符合高性能类型的条件的测试资源为:所有级测试资源中,使实测算子图的处理速度最快的一级测试资源。
在一些实施例中,资源分配类型包括节能类型;任意两相邻级测试资源中的运行资源量的差相等;符合节能类型的条件的测试资源为:时间降低值大于预设阈值的所有级测试资源中的最低级测试资源;其中,测试资源的时间降低值为用该级测试资源运行实测算子图处理预定数据的耗时,相对用比该级测试资源低一级的测试资源运行实测算子图处理预定数据的耗时的减少量。
在一些实施例中,资源分配类型包括均衡类型;符合均衡类型的条件的测试资源为:所有级测试资源中,能耗比最大的一级测试资源;其中,测试资源的能耗比为用该级测试资源运行实测算子图时消耗单位能耗所能处理的数据量。
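In some embodiments, if the total amount of configuration data of all measured operator graphs is less than or equal to the on-chip storage space of the many-core system, the resource allocation type includes any one of the high-performance type, the energy-saving type, and the balanced type; if the total amount of configuration data of all measured operator graphs is greater than the on-chip storage space of the many-core system, the resource allocation type includes either the high-performance type or the balanced type; the test resources meeting the condition of the high-performance type are: among all levels of test resources, the level that gives the measured operator graph the fastest processing speed; the test resources meeting the condition of the energy-saving type are: the lowest level among all levels of test resources whose time reduction value is greater than a preset threshold, where the time reduction value of a level of test resources is the reduction in the time taken to process predetermined data when the measured operator graph is run with that level, relative to the time taken when it is run with the test resources one level lower; the test resources meeting the condition of the balanced type are: among all levels of test resources, the level with the largest energy efficiency ratio, where the energy efficiency ratio of a level of test resources is the amount of data that can be processed per unit of energy consumed when the measured operator graph is run with that level.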
在一些实施例中,资源包括时间资源;确定多个算子图中的至少部分算子图为实测算子图包括:确定多个实测算子图组,每个实测算子图组包括至少一个实测算子图;用测试资源运行实测算子图,根据实测算子图在测试资源下运行时的运行信息,确定实测算子图的目标资源包括:分时的将每个实测算子图组加载至众核系统中运行,根据所有实测算子图组的总运行信息,确定各实测算子图组的目标时间资源。
在一些实施例中,分时的将每个实测算子图组加载至众核系统中运行包括:分时的将每个实测算子图组加载至众核系统中运行相等的时间段。
在一些实施例中,根据所有实测算子图组的总运行信息,确定各实测算子图组的目标时间资源包括:根据所有实测算子图组的运行信息,确定能使所有实测算子图组的总运行信息达到需求信息的时间资源为各实测算子图组的目标时间资源。
在一些实施例中,需求信息包括处理速度需求信息。
在一些实施例中,确定各实测算子图组的目标时间资源包括:确定在每个预定的运行周期中各实测算子图组的运行时间占比。
在一些实施例中,资源包括运行资源;确定多个算子图中的至少部分算子图为实测算子图包括:若多个算子图包括具有需求信息的第一算子图,至少确定第一算子图为实测算子图;用测试资源运行实测算子图,根据实测算子图在测试资源下运行时的运行信息,确定实测算子图的目标资源包括:若存在第一算子图,用测试资源运行第一算子图,根据第一算子图在测试资源下运行时的运行信息,确定能使第一算子图的运行信息达到需求信息的运行资源为第一算子图的目标运行资源。
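In some embodiments, running the first operator graphs with test resources and, according to their running information under the test resources, determining the running resources that enable the running information of the first operator graphs to reach the requirement information as their target running resources includes: if the first operator graphs include first operator graphs with a timing relationship and first operator graphs without a timing relationship, then: running the first operator graphs with a timing relationship with test resources and, according to their running information under the test resources, determining the running resources that enable their running information to reach the requirement information as their target running resources; and running the first operator graphs without a timing relationship with test resources and, according to their running information under the test resources, determining the running resources that enable their running information to reach the requirement information as their target running resources.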
在一些实施例中,在确定能使第一算子图的运行信息达到需求信息的运行资源为第一算子图的目标运行资源后,还包括:若多个算子图包括无需求信息的第二算子图,确定除第一算子图的目标运行资源外的剩余运行资源,为第二算子图的目标运行资源。
在一些实施例中,资源包括运行资源;确定多个算子图中的至少部分算子图为实测算子图包括:若多个算子图包括具有时序关系的第三算子图,至少确定第三算子图为实测算子图;用测试资源运行实测算子图,根据实测算子图在测试资源下运行时的运行信息,确定实测算子图的目标资源包括:用测试资源运行第三算子图,根据第三算子图在测试资源下运行时的运行信息,确定第三算子图的目标运行资源。
在一些实施例中,至少确定第三算子图为实测算子图还包括:确定多个实测算子图组,每个实测算子图组包括至少一个第三算子图,任意两个之间有时序关系的第三算子图位于同一实测算子图组或相邻实测算子图组;用测试资源运行第三算子图,根据第三算子图在测试资源下运行时的运行信息,确定第三算子图的目标运行资源包括:用测试资源分别运行各实测算子图组,根据每个实测算子图组在测试资源下运行时的运行信息,确定该实测算子图组的目标运行资源。
在一些实施例中,在确定第三算子图的目标运行资源后,还包括:若多个算子图包括无时序关系的第四算子图,确定除第三算子图的目标运行资源外的剩余运行资源,为第四算子图的目标运行资源。
在一些实施例中,资源包括运行资源和时间资源;确定多个算子图中的至少部分算子图为实测算子图包括:若多个算子图包括具有时序关系的第三算子图,至少确定第三算子图为实测算子图,并确定多个实测算子图组,每个实测算子图组包括至少一个第三算子图;用测试资源运行实测算子图,根据实测算子图在测试资源下运行时的运行信息,确定实测算子图的目标资源包括:分时的将每个实测算子图组加载至众核系统中运行,根据各实测算子图组的运行信息,确定各实测算子图组的目标时间资源,并确定符合资源分配类型的条件的测试资源为各实测算子图组的目标运行资源。
在一些实施例中,用测试资源运行实测算子图包括:将实测算子图加载至众核系统中,在众核系统中用测试资源运行实测算子图。
在一些实施例中,用测试资源运行实测算子图,根据实测算子图在测试资源下运行时的运行信息,确定实测算子图的目标资源包括:用测试资源运行所有实测算子图,根据所有实测算子图在测试资源下运行时的运行信息,确定所有实测算子图的目标资源。
在一些实施例中,确定多个算子图中的至少部分算子图为实测算子图包括:确定多个实测算子图组,每个实测算子图组包括至少一个实测算子图;用测试资源运行实测算子图,根据实测算子图在测试资源下运行时的运行信息,确定实测算子图的目标资源包括:用测试资源分别运行各实测算子图组,根据每个实测算子图组在测试资源下运行时的运行信息,确定该实测算子图组的目标资源。
在一些实施例中,每个实测算子图组中的实测算子图的配置数据总量小于或等于众核系统的片上存储空间。
在一些实施例中,确定实测算子图的目标资源包括:确定能使实测算子图的运行信息达到需求信息的资源为实测算子图的目标资源。
在一些实施例中,确定能使实测算子图的运行信息达到需求信息的资源为实测算子图的目标资源包括:确定能使实测算子图的运行信息达到需求信息的、最少的资源为实测算子图的目标资源。
在一些实施例中,确定能使实测算子图的运行信息达到需求信息的资源为实测算子图的目标资源包括:若无法确定出能使实测算子图的运行信息达到需求信息的资源,则发出提示。
第三方面,本发明实施例提供一种计算机设备,其包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,处理器执行计算机程序时实现本发明实施例任意一种资源分配方法。
图7为本发明实施例提供的一种用来实现本发明实施的方法的示例性计算机设备12的框图。图7显示的计算机设备12仅仅是一个示例,不应对本发明实施例的功能和使用范围带来任何限制。
如图7所示,计算机设备12以通用计算设备的形式表现。计算机设备12的组件可以包括但不限于:一个或者多个处理器或者处理单元16,系统存储器28,连接不同系统组件(包括系统存储器28和处理单元16)的总线18。计算机设备12可以是挂接在总线上的设备。
总线18表示几类总线结构中的一种或多种,包括存储器总线或者存储器控制器,外围总线,图形加速端口,处理器或者使用多种总线结构中的任意总线结构的局域总线。举例来说,这些体系结构包括但不限于工业标准体系结构(Industry Standard Architecture,ISA)总线,微通道体系结构(Micro Channel Architecture,MCA)总线,增强型ISA总线、视频电子标准协会(Video Electronics Standards Association,VESA)局域总线以及外围组件互连(Peripheral Component Interconnect,PCI)总线。
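The computer device 12 typically includes a variety of computer-readable storage media. These media may be any available media that can be accessed by the computer device 12, including volatile and non-volatile media, and removable and non-removable media.
The system memory 28 may include computer-readable storage media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer-readable storage media. By way of example only, the storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard disk drive"). Although not shown in FIG. 7, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a compact disc read-only memory (CD-ROM), a digital video disc read-only memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The system memory 28 may include at least one program product having a set of (e.g., at least one) program modules configured to perform the functions of the embodiments of the present invention.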
具有一组(至少一个)程序模块42的程序/实用工具40,可以存储在例如系统存储器28中,这样的程序模块42包括——但不限于——操作系统、一个或者多个应用程序、其它程序模块以及程序数据,这些示例中的每一个或某种组合中可能包括网络环境的实现。程序模块42通常执行本发明所描述的实施例中的功能和/或方法。
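The computer device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (e.g., a network card, a modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. In addition, the computer device 12 may communicate with one or more networks (e.g., a local area network (LAN) or a wide area network (WAN)) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. It should be understood that, although not shown in FIG. 7, other hardware and/or software modules may be used in conjunction with the computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, redundant arrays of inexpensive disks (RAID) systems, tape drives, and data backup storage systems.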
第四方面,参照图8,本发明实施例提供一种计算机可读存储介质50,其上存储有计算机程序,该计算机程序被处理器执行时实现本发明实施例任意一种资源分配方法。
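The computer storage medium of the embodiments of the present invention may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, RAM, read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.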
计算机可读存储介质上包含的程序代码可以用任何适当的介质传输,包括——但不限于——无线、电线、光缆、无线电频率(Radio Frequency,RF)等等,或者上述的任意合适的组合。
可以以一种或多种程序设计语言或其组合来编写用于执行本发明操作的计算机程序代码,所述程序设计语言包括面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括LAN或WAN——连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
注意,上述仅为本发明的较佳实施例及所运用技术原理。本领域技术人员会理解,本发明不限于这里所述的特定实施例,对本领域技术人员来说能够进行各种明显的变化、重新调整和替代而不会脱离本发明的保护范围。因此,虽然通过以上实施例对本发明进行了较为详细的说明,但是本发明不仅仅限于以上实施例,在不脱离本发明构思的情况下,还可以包括更多其他等效实施例,而本发明的范围由所附的权利要求范围决定。

Claims (29)

  1. 一种资源分配方法,其特征在于,所述方法包括:
    确定多个待分配资源的算子图,确定多个算子图中的至少部分算子图为实测算子图;
    用测试资源运行所述实测算子图,根据所述实测算子图在测试资源下运行时的运行信息,确定所述实测算子图的目标资源;其中,所述算子图的目标资源为算子图在众核系统中运行时占用的资源。
  2. 根据权利要求1所述的方法,其特征在于,所述资源包括运行资源;所述确定所述实测算子图的目标资源包括:
    确定符合资源分配类型的条件的测试资源为所述实测算子图的目标运行资源。
  3. 根据权利要求2所述的方法,其特征在于,所述用测试资源运行所述实测算子图,根据所述实测算子图在测试资源下运行时的运行信息,确定所述实测算子图的目标资源包括:
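    running the measured operator graph with each of multiple levels of test resources, and, according to the running information of the measured operator graph under each level of test resources, determining the level of test resources that meets the condition of the resource allocation type as the target running resources of the measured operator graph; wherein, for any two levels of test resources, the higher level has a larger amount of running resources than the lower level.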
  4. 根据权利要求3所述的方法,其特征在于,所述资源分配类型包括高性能类型;
    符合所述高性能类型的条件的测试资源为:所有级测试资源中,使所述实测算子图的处理速度最快的一级测试资源。
  5. 根据权利要求3所述的方法,其特征在于,所述资源分配类型包括节能类型;任意两相邻级测试资源中的运行资源量的差相等;
    符合所述节能类型的条件的测试资源为:时间降低值大于预设阈值的所有级测试资源中的最低级测试资源;其中,所述测试资源的时间降低值为用该级测试资源运行实测算子图处理预定数据的耗时,相对用比该级测试资源低一级的测试资源运行实测算子图处理预定数据的耗时的减少量。
  6. 根据权利要求3所述的方法,其特征在于,所述资源分配类型包括均衡类型;
    符合所述均衡类型的条件的测试资源为:所有级测试资源中,能耗比最大的一级测试资源;其中,所述测试资源的能耗比为用该级测试资源运行实测算子图时消耗单位能耗所能处理的数据量。
  7. 根据权利要求3所述的方法,其特征在于,
    若所有所述实测算子图的配置数据总量小于或等于众核系统的片上存储空间,所述资源分配类型包括高性能类型、节能类型、均衡类型中的任意一种;
    若所有所述实测算子图的配置数据总量大于众核系统的片上存储空间,所述资源分配类型包括高性能类型、均衡类型中的任意一种;
    符合所述高性能类型的条件的测试资源为:所有级测试资源中,使所述实测算子图的处理速度最快的一级测试资源;
    符合所述节能类型的条件的测试资源为:时间降低值大于预设阈值的所有级测试资源中的最低级测试资源;其中,所述测试资源的时间降低值为用该级测试资源运行实测算子图处理预定数据的耗时,相对用比该级测试资源低一级的测试资源运行实测算子图处理预定数据的耗时的减少量;
    符合所述均衡类型的条件的测试资源为:所有级测试资源中,能耗比最大的一级测试资源;其中,所述测试资源的能耗比为用该级测试资源运行实测算子图时消耗单位能耗所能处理的数据量。
  8. 根据权利要求1所述的方法,其特征在于,所述资源包括时间资源;
    所述确定多个算子图中的至少部分算子图为实测算子图包括:确定多个实测算子图组,每个实测算子图组包括至少一个实测算子图;
    所述用测试资源运行所述实测算子图,根据所述实测算子图在测试资源下运行时的运行信息,确定所述实测算子图的目标资源包括:分时的将每个所述实测算子图组加载至众核系统中运行,根据所有所述实测算子图组的总运行信息,确定各实测算子图组的目标时间资源。
  9. 根据权利要求8所述的方法,其特征在于,所述分时的将每个所述实测算子图组加载至众核系统中运行包括:
    分时的将每个所述实测算子图组加载至众核系统中运行相等的时间段。
  10. 根据权利要求8所述的方法,其特征在于,所述根据所有所述实测算子图组的总运行信息,确定各实测算子图组的目标时间资源包括:
    根据所有所述实测算子图组的运行信息,确定能使所有所述实测算子图组的总运行信息达到需求信息的时间资源为各实测算子图组的目标时间资源。
  11. 根据权利要求10所述的方法,其特征在于,
    所述需求信息包括处理速度需求信息。
  12. 根据权利要求8所述的方法,其特征在于,所述确定各实测算子图组的目标时间资源包括:
    确定在每个预定的运行周期中各实测算子图组的运行时间占比。
  13. 根据权利要求1所述的方法,其特征在于,所述资源包括运行资源;
    所述确定多个算子图中的至少部分算子图为实测算子图包括:若多个算子图包括具有需求信息的第一算子图,至少确定所述第一算子图为实测算子图;
    所述用测试资源运行所述实测算子图,根据所述实测算子图在测试资源下运行时的运行信息,确定所述实测算子图的目标资源包括:若存在所述第一算子图,用测试资源运行所述第一算子图,根据所述第一算子图在测试资源下运行时的运行信息,确定能使所述第一算子图的运行信息达到需求信息的运行资源为第一算子图的目标运行资源。
  14. 根据权利要求13所述的方法,其特征在于,所述用测试资源运行所述第一算子图,根据所述第一算子图在测试资源下运行时的运行信息,确定能使所述第一算子图的运行信息达到需求信息的运行资源为第一算子图的目标运行资源包括:
    若所述第一算子图中包括有时序关系的第一算子图和无时序关系的第一算子图,则:用测试资源运行有时序关系的第一算子图,根据有时序关系的第一算子图在测试资源下运行时的运行信息,确定能使有时序关系的第一算子图的运行信息达到需求信息的运行资源为有时序关系的第一算子图的目标运行资源;用测试资源运行无时序关系的第一算子图,根据无时序关系的第一算子图在测试资源下运行时的运行信息,确定能使无时序关系的第一算子图的运行信息达到需求信息的运行资源为无时序关系的第一算子图的目标运行资源。
  15. 根据权利要求13所述的方法,其特征在于,在所述确定能使所述第一算子图的运行信息达到需求信息的运行资源为第一算子图的目标运行资源后,还包括:
    若多个算子图包括无需求信息的第二算子图,确定除所述第一算子图的目标运行资源外的剩余运行资源,为所述第二算子图的目标运行资源。
  16. 根据权利要求1所述的方法,其特征在于,所述资源包括运行资源;
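    the determining at least some of the plurality of operator graphs as measured operator graphs comprises: if the plurality of operator graphs include third operator graphs having a timing relationship, determining at least the third operator graphs as measured operator graphs;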
    所述用测试资源运行所述实测算子图,根据所述实测算子图在测试资源下运行时的运行信息,确定所述实测算子图的目标资源包括:用测试资源运行所述第三算子图,根据所述第三算子图在测试资源下运行时的运行信息,确定所述第三算子图的目标运行资源。
  17. 根据权利要求16所述的方法,其特征在于,
    所述至少确定所述第三算子图为实测算子图还包括:确定多个实测算子图组,每个所述实测算子图组包括至少一个第三算子图,任意两个之间有时序关系的第三算子图位于同一实测算子图组或相邻实测算子图组;
    所述用测试资源运行所述第三算子图,根据所述第三算子图在测试资源下运行时的运行信息,确定所述第三算子图的目标运行资源包括:用测试资源分别运行各所述实测算子图组,根据每个实测算子图组在测试资源下运行时的运行信息,确定该实测算子图组的目标运行资源。
  18. 根据权利要求16所述的方法,其特征在于,在所述确定所述第三算子图的目标运行资源后,还包括:
    若多个算子图包括无时序关系的第四算子图,确定除所述第三算子图的目标运行资源外的剩余运行资源,为所述第四算子图的目标运行资源。
  19. 根据权利要求1所述的方法,其特征在于,所述资源包括运行资源和时间资源;
    所述确定多个算子图中的至少部分算子图为实测算子图包括:若多个算子图包括具有时序关系的第三算子图,至少确定所述第三算子图为实测算子图,并确定多个实测算子图组,每个实测算子图组包括至少一个第三算子图;
    所述用测试资源运行所述实测算子图,根据所述实测算子图在测试资源下运行时的运行信息,确定所述实测算子图的目标资源包括:分时的将每个所述实测算子图组加载至众核系统中运行,根据各所述实测算子图组的运行信息,确定各实测算子图组的目标时间资源,并确定符合资源分配类型的条件的测试资源为各所述实测算子图组的目标运行资源。
  20. 根据权利要求1所述的方法,其特征在于,所述用测试资源运行所述实测算子图包括:
    将所述实测算子图加载至众核系统中,在众核系统中用测试资源运行所述实测算子图。
  21. 根据权利要求1所述的方法,其特征在于,所述用测试资源运行所述实测算子图,根据所述实测算子图在测试资源下运行时的运行信息,确定所述实测算子图的目标资源包括:
    用测试资源运行所有所述实测算子图,根据所有所述实测算子图在测试资源下运行时的运行信息,确定所有所述实测算子图的目标资源。
  22. 根据权利要求1所述的方法,其特征在于,
    所述确定多个算子图中的至少部分算子图为实测算子图包括:确定多个实测算子图组,每个实测算子图组包括至少一个实测算子图;
    所述用测试资源运行所述实测算子图,根据所述实测算子图在测试资源下运行时的运行信息,确定所述实测算子图的目标资源包括:用测试资源分别运行各所述实测算子图组,根据每个实测算子图组在测试资源下运行时的运行信息,确定该实测算子图组的目标资源。
  23. 根据权利要求22所述的方法,其特征在于,
    每个所述实测算子图组中的实测算子图的配置数据总量小于或等于众核系统的片上存储空间。
  24. 根据权利要求1所述的方法,其特征在于,所述确定所述实测算子图的目标资源包括:
    确定能使所述实测算子图的运行信息达到需求信息的资源为实测算子图的目标资源。
  25. 根据权利要求24所述的方法,其特征在于,所述确定能使所述实测算子图的运行信息达到需求信息的资源为实测算子图的目标资源包括:
    确定能使所述实测算子图的运行信息达到需求信息的、最少的资源为实测算子图的目标资源。
  26. 根据权利要求24所述的方法,其特征在于,所述确定能使所述实测算子图的运行信息达到需求信息的资源为实测算子图的目标资源包括:
    若无法确定出能使所述实测算子图的运行信息达到需求信息的资源,则发出提示。
  27. 一种资源分配装置,其特征在于,包括:
    算子图确定模块,用于确定多个待分配资源的算子图,确定多个算子图中的至少部分算子图为实测算子图;
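    a resource determination module, configured to run the measured operator graphs with test resources and to determine the target resources of the measured operator graphs according to running information of the measured operator graphs when running under the test resources; wherein the target resources of an operator graph are the resources occupied by the operator graph when it runs in a many-core system.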
  28. 一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,其特征在于,所述处理器执行所述计算机程序时实现如权利要求1至26中任一所述的资源分配方法。
  29. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,该计算机程序被处理器执行时实现如权利要求1至26中任一所述的资源分配方法。
PCT/CN2021/114217 2020-08-27 2021-08-24 资源分配方法和装置、计算机设备、计算机可读存储介质 WO2022042519A1 (zh)

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
CN202010879892.2A CN112068957B (zh) 2020-08-27 2020-08-27 资源分配方法、装置、计算机设备及存储介质
CN202010879892.2 2020-08-27
CN202110476735.1A CN115269165A (zh) 2021-04-29 2021-04-29 算子图资源分配方法、装置、计算机设备及存储介质
CN202110474902.9A CN115269163A (zh) 2021-04-29 2021-04-29 算子图的资源分配方法、装置、计算机设备及存储介质
CN202110475134.9 2021-04-29
CN202110476902.2 2021-04-29
CN202110476735.1 2021-04-29
CN202110474902.9 2021-04-29
CN202110476902.2A CN115269166A (zh) 2021-04-29 2021-04-29 算子图的时间分配方法、装置、计算机设备及存储介质
CN202110475134.9A CN115269164A (zh) 2021-04-29 2021-04-29 资源分配方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022042519A1 true WO2022042519A1 (zh) 2022-03-03

Family

ID=80352690

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/114217 WO2022042519A1 (zh) 2020-08-27 2021-08-24 资源分配方法和装置、计算机设备、计算机可读存储介质

Country Status (1)

Country Link
WO (1) WO2022042519A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030216951A1 (en) * 2002-05-02 2003-11-20 Roman Ginis Automating resource management for distributed business processes
CN102508717A (zh) * 2011-11-17 2012-06-20 大唐移动通信设备有限公司 一种应用于多核处理器的内存调度方法及装置
CN104657219A (zh) * 2015-02-27 2015-05-27 西安交通大学 一种用于异构众核系统下的应用程序线程数动态调整方法
CN106293931A (zh) * 2015-06-23 2017-01-04 北京神州泰岳软件股份有限公司 一种分配服务器资源的方法和装置
CN109558248A (zh) * 2018-12-11 2019-04-02 中国海洋大学 一种用于确定面向海洋模式计算的资源分配参数的方法及系统
CN110515739A (zh) * 2019-10-23 2019-11-29 上海燧原智能科技有限公司 深度学习神经网络模型负载计算方法、装置、设备及介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHEN YANG; QI DEYU; ZHOU NAQIN; WANG XINYANG: "A Virtual Core Resource Distribution Algorithm for Many-Core Processor System-on-Chip", HUANAN LIGONG DAXUE XUEBAO - JOURNAL OF SOUTH CHINA UNIVERSITY OF TECHNOLOGY, GUANGZHOU, CH, vol. 46, no. 1, 31 January 2018 (2018-01-31), XP055903313, ISSN: 1000-565X, DOI: 10.3969/j.issn.1000-565X.2018.01.015 *

Similar Documents

Publication Publication Date Title
US11354563B2 (en) Configurable and programmable sliding window based memory access in a neural network processor
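CN110704360B (zh) Graph computing optimization method based on heterogeneous FPGA data flow
KR20190055610A (ko) Neural network system for single processing of a common operation group of neural network models, application processor including the same, and operation method of the neural network system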
Quan et al. A scenario-based run-time task mapping algorithm for mpsocs
US11609792B2 (en) Maximizing resource utilization of neural network computing system
CN112068957B (zh) Resource allocation method, apparatus, computer device, and storage medium
CN111190735B (zh) Linux-based on-chip CPU/GPU pipelined computing method and computer system
EP3920026A1 (en) Scheduler, method of operating the same, and accelerator apparatus including the same
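CN111176792A (zh) Resource scheduling method, apparatus, and related device
TW202109285A (zh) Methods and apparatus to enable out-of-order pipelined execution of static mapping of a workload
WO2021232769A1 (zh) Data storage method and data processing apparatus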
US20200005127A1 (en) System And Method Of Input Alignment For Efficient Vector Operations In An Artificial Neural Network
CN116954929B (zh) Dynamic GPU scheduling method and system with real-time migration
CN114386560A (zh) Data processing method and device
US10684834B2 (en) Method and apparatus for detecting inter-instruction data dependency
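CN113553103B (zh) Multi-core parallel scheduling method based on a CPU+GPU heterogeneous processing platform
WO2022042519A1 (zh) Resource allocation method and apparatus, computer device, and computer-readable storage medium
CN116680063B (zh) Task scheduling method, apparatus, computing system, electronic device, and storage medium
CN111756802B (zh) Method and system for scheduling data-flow tasks on a NUMA platform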
US20230143270A1 (en) Apparatus and method with scheduling
CN115712506A (zh) Resource allocation method and accelerator
CN112130977B (zh) Task scheduling method, apparatus, device, and medium
US11494238B2 (en) Run-time neural network re-allocation across heterogeneous processors
US10338837B1 (en) Dynamic mapping of applications on NVRAM/DRAM hybrid memory
KR102592330B1 (ko) Method for processing an OpenCL kernel and computing device for performing the same

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21860354; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 21860354; Country of ref document: EP; Kind code of ref document: A1)