CN113254200A - Resource arrangement method and intelligent agent - Google Patents

Resource arrangement method and intelligent agent

Info

Publication number
CN113254200A
Authority
CN
China
Prior art keywords
resource
local environment
information
state information
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110520783.6A
Other languages
Chinese (zh)
Other versions
CN113254200B (en)
Inventor
刘晶
徐雷
毋涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd
Priority to CN202110520783.6A
Publication of CN113254200A
Application granted
Publication of CN113254200B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request

Abstract

The invention discloses a resource orchestration method and an agent, and relates to the field of computer technology. The scheme includes the following steps: acquiring global reward information and local environment state information, where the global reward information is obtained based on a preset global environment, the global environment corresponds to one or more agents, and the local environment state information is obtained from the local environment corresponding to the current agent; updating an orchestration policy according to the global reward information and the local environment state information; and, when a first resource orchestration request is received, orchestrating the resources in the local environment based on the first resource orchestration request, the local environment state information, and the updated orchestration policy. Because the orchestration policy is updated from the global reward information together with the local environment state information, the correlation between the orchestration policies of different agents is reduced and the orchestration policy can be updated promptly as the environment changes, yielding a more reasonable and accurate orchestration policy and improving resource utilization.

Description

Resource arrangement method and intelligent agent
Technical Field
The invention relates to the field of computer technology, and in particular to a resource orchestration method and an agent.
Background
An agent is one of the important concepts in the field of artificial intelligence. It refers to a computing entity that can act continuously and autonomously in a given environment and that has characteristics such as residency, reactivity, sociality, and proactivity. In practical applications, a policy may be preset for the agent, and the agent performs corresponding actions based on the preset policy. However, the preset policy is usually relatively fixed, and multiple agents that update their policies based on the same environment and the same reward are correlated with one another, so the agents cannot update their policies reasonably and accurately as the environment changes.
Disclosure of Invention
Therefore, the present invention provides a resource orchestration method and an agent, aiming to solve the problem that an agent cannot reasonably and accurately update its policy as the environment changes.
In order to achieve the above object, a first aspect of the present invention provides a resource arranging method, including:
acquiring global reward information and local environment state information, wherein the global reward information is information acquired based on a preset global environment, the global environment corresponds to one or more agents, and the local environment state information is information acquired according to a local environment corresponding to a current agent;
updating an arrangement strategy according to the global reward information and the local environment state information;
in the event a first resource orchestration request is received, orchestrating resources in the local environment based on the first resource orchestration request, the local environment state information, and the updated orchestration policy.
Further, before obtaining the global reward information and the local environment status information, the method further includes:
receiving a second resource scheduling request sent by the user terminal;
and arranging the resources in the historical local environment corresponding to the second resource arranging request according to the second resource arranging request, the historical local environment state information corresponding to the second resource arranging request and the historical arranging strategy corresponding to the second resource arranging request.
Further, the obtaining of the global reward information and the local environment status information includes:
and acquiring the global reward information and the local environment state information according to the resources in the historical local environment corresponding to the second resource arranging request after arrangement.
Further, the first resource orchestration request and the second resource orchestration request comprise a resource orchestration type and a resource demand.
Further, the global reward information is generated by batch processing the historical local environment state information and the local environment state information corresponding to the second resource orchestration requests of all agents in the global environment to obtain a resource balance rate and a request acceptance rate of the global environment, and then computing the reward from the resource balance rate and the request acceptance rate according to a preset reward mechanism.
Further, the orchestration policy comprises an action policy;
the updating of the orchestration policy according to the global reward information and the local environment state information includes:
and inputting the global reward information and the local environment state information into a preset action strategy prediction model so that the action strategy prediction model executes action strategy prediction operation and outputs an updated action strategy.
Further, the action policy includes one or more of a path deployment sub-policy and a routing sub-policy.
Further, said orchestrating resources in the local environment based on the first resource orchestration request, the local environment state information, and the updated orchestration policy if the first resource orchestration request is received comprises:
and sending the first resource arranging request, the local environment state information and the updated arranging strategy to a resource manager, and arranging the resources in the local environment by the resource manager.
Further, the resource manager is configured to configure and orchestrate various types of resources in the local environment.
In order to achieve the above object, a second aspect of the present invention provides an agent comprising:
a first obtaining module, configured to obtain global reward information and local environment state information, wherein the global reward information is obtained based on a preset global environment, and the global environment corresponds to one or more agents;
the second acquisition module is used for acquiring local environment state information, wherein the local environment state information is acquired according to a local environment corresponding to the current agent;
the updating module is used for updating the arranging strategy according to the global reward information and the local environment state information;
and the arranging module is used for arranging the resources in the local environment based on the first resource arranging request, the local environment state information and the updated arranging strategy under the condition of receiving the first resource arranging request.
The invention has the following advantages:
according to the resource arranging method provided by the invention, each intelligent agent updates the arranging strategy according to the global reward information and the local environment state information, so that the correlation of the arranging strategies among the intelligent agents can be reduced, the arranging strategy can be updated in time according to the environment change, a more reasonable and accurate arranging strategy can be obtained, when a resource arranging request is received, the resource in the local environment corresponding to each intelligent agent is arranged according to the resource arranging request, the local environment state information and the updated arranging strategy, and the utilization rate of the resource is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
Fig. 1 is a flowchart of a resource orchestration method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a resource orchestration method according to a second embodiment of the present invention;
Fig. 3 is a block diagram of an agent according to a third embodiment of the present invention;
Fig. 4 is a block diagram of a resource orchestration system according to a fourth embodiment of the present invention;
Fig. 5 is a block diagram of a resource orchestration system according to a fifth embodiment of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail with reference to the accompanying drawings. It should be understood that the detailed description and specific examples are given by way of illustration and explanation only and are not intended to limit the present invention.
A first aspect of the present application provides a resource orchestration method. Fig. 1 is a flowchart of a resource orchestration method according to a first embodiment of the present application. As shown in fig. 1, the resource orchestration method includes the following steps:
step S101, obtaining global reward information and local environment state information.
The global reward information is information obtained based on a preset global environment, the global environment corresponds to one or more intelligent agents, and the local environment state information is information obtained according to a local environment corresponding to the current intelligent agent.
In some embodiments, the global reward information is generated by batch processing the local environment states of all agents in the global environment to obtain a resource balance rate and a request acceptance rate of the global environment, and then computing the reward from the resource balance rate and the request acceptance rate according to a preset reward mechanism. The batch processing may be executed after a preset time period elapses, or after the number of processed resource orchestration requests reaches a preset threshold; those skilled in the art may set this flexibly according to actual requirements, and the present application is not limited in this respect.
For example, if, after the agent performs resource orchestration, both the resource balance rate and the request acceptance rate calculated from the global environment improve, the global reward information should be positive feedback. The global reward information can be a signed numerical value, where the sign indicates whether the feedback is positive or negative and the magnitude indicates the degree of positive or negative feedback.
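As an illustration only, the following sketch shows one way such a signed global reward could be derived from the two global metrics; the weighting, the comparison against the previous batch, and the helper names (resource_balance_rate, request_acceptance_rate, global_reward) are assumptions made for this example and are not specified by the patent.

    from typing import Sequence

    def resource_balance_rate(utilizations: Sequence[float]) -> float:
        """Assumed metric: 1 minus the average spread of per-resource utilization,
        so evenly loaded resources give a value close to 1."""
        if not utilizations:
            return 1.0
        mean = sum(utilizations) / len(utilizations)
        spread = sum(abs(u - mean) for u in utilizations) / len(utilizations)
        return max(0.0, 1.0 - spread)

    def request_acceptance_rate(accepted: int, received: int) -> float:
        """Fraction of resource orchestration requests that could be served."""
        return accepted / received if received else 1.0

    def global_reward(balance: float, acceptance: float,
                      prev_balance: float, prev_acceptance: float,
                      w_balance: float = 0.5, w_acceptance: float = 0.5) -> float:
        """Signed reward: positive when the weighted metrics improved over the
        previous batch, negative when they degraded; magnitude reflects how much."""
        current = w_balance * balance + w_acceptance * acceptance
        previous = w_balance * prev_balance + w_acceptance * prev_acceptance
        return current - previous

    # Example: both metrics improved after orchestration, so the feedback is positive.
    r = global_reward(balance=0.82, acceptance=0.95, prev_balance=0.70, prev_acceptance=0.90)
    print(r)  # 0.085, i.e. positive feedback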
In some embodiments, the agent obtains the local environment state information through a device with an information acquisition function; the local environment state information includes, for example, the resource types, resource occupancy, and resource idle amounts in the local environment.
It should be noted that the above local environment state information is only an example, and may be specifically set according to actual needs, and other non-described local environment state information is also within the protection scope of the present application, and is not described herein again.
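Purely as an illustration, such local environment state information could be represented by a simple structure like the following; the field names are assumptions for the example and are not terms defined by the present application.

    from dataclasses import dataclass

    @dataclass
    class LocalEnvironmentState:
        """Assumed shape of the state an agent collects from its local environment."""
        resource_type: str       # e.g. "compute", "storage", "bandwidth"
        occupied_amount: float   # amount of the resource currently in use
        idle_amount: float       # amount of the resource still available

    local_state = [
        LocalEnvironmentState("compute", occupied_amount=48.0, idle_amount=16.0),
        LocalEnvironmentState("bandwidth", occupied_amount=3.2, idle_amount=6.8),
    ]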
It should be further noted that the preset reward mechanism itself is also an iterative update mechanism, so that more reasonable and accurate global reward information can be obtained, and accordingly, the update of the orchestration strategy is more reasonable and accurate.
And step S102, updating the arrangement strategy according to the global reward information and the local environment state information.
The arrangement strategy refers to a strategy for arranging various resources in a local environment.
In some embodiments, the orchestration policy comprises an action policy comprising one or more of a path deployment sub-policy and a routing sub-policy. Accordingly, the actions of the agent include path deployment and routing configuration.
In some embodiments, the global reward information and the local environment state information are input into a preset action policy prediction model, so that the action policy prediction model performs an action policy prediction operation and outputs an updated action policy. The action policy prediction model is a model constructed based on the Actor-Critic algorithm. The Actor-Critic algorithm combines two types of reinforcement learning algorithms, the policy-based Policy Gradient and the value-based Q-Learning: the Actor selects an action according to a probability distribution, the Critic scores the action chosen by the Actor, and the Actor then adjusts the action probabilities according to the Critic's score. On this basis, the Actor-Critic algorithm can effectively handle the selection of continuous actions and also supports single-step updates.
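The patent does not give an implementation, but a minimal single-step Actor-Critic update of the kind described above could look like the following sketch; PyTorch, the network sizes, the learning rate, and the discrete action space are all assumptions made here for illustration.

    import torch
    import torch.nn as nn
    from torch.distributions import Categorical

    class ActorCritic(nn.Module):
        """Actor outputs a probability distribution over actions; Critic outputs a state value."""
        def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
            super().__init__()
            self.shared = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
            self.actor = nn.Linear(hidden, n_actions)   # action logits
            self.critic = nn.Linear(hidden, 1)          # state value V(s)

        def forward(self, state: torch.Tensor):
            h = self.shared(state)
            return Categorical(logits=self.actor(h)), self.critic(h).squeeze(-1)

    def single_step_update(model, optimizer, state, action, reward, next_state, gamma=0.99):
        """One-step TD update: the Critic's TD error scores the Actor's action,
        and the Actor shifts probability toward or away from that action."""
        dist, value = model(state)
        with torch.no_grad():
            _, next_value = model(next_state)
            td_target = reward + gamma * next_value
        td_error = td_target - value                       # Critic's evaluation of the action
        actor_loss = -dist.log_prob(action) * td_error.detach()
        critic_loss = td_error.pow(2)
        optimizer.zero_grad()
        (actor_loss + critic_loss).backward()
        optimizer.step()

    # Example with made-up dimensions: 6 state features, 4 candidate orchestration actions
    # (e.g. path deployment and routing configuration choices).
    model = ActorCritic(state_dim=6, n_actions=4)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    s = torch.randn(6)
    dist, _ = model(s)
    a = dist.sample()
    single_step_update(model, optimizer, s, a, reward=torch.tensor(0.085), next_state=torch.randn(6))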
Step S103, in the case of receiving the first resource arranging request, arranging the resource in the local environment based on the first resource arranging request, the local environment state information and the updated arranging strategy.
In some embodiments, the global environment includes various types of servers, storage devices, network devices, and so on, and the resources of these devices are virtualized into various types of virtual resources. The global environment corresponds to a plurality of agents, the actions of each agent include VNF (Virtual Network Function) multipath deployment and traffic flow routing, and the corresponding action policy includes a path policy and a routing policy.
Assume that the first resource orchestration request comprises a resource orchestration type and a resource demand. After receiving the first resource arranging request, the agent arranges various virtual resources in the local environment corresponding to the agent according to the resource arranging type and the resource demand amount corresponding to the first resource arranging request, the current local environment state information of the agent and the updated arranging strategy.
In some embodiments, the first resource orchestration request, the local environment state information, and the updated orchestration policy are sent to a resource manager, and the resource manager orchestrates the resources in the local environment. The resource manager is used to configure and orchestrate various types of resources in the local environment. For example, the resource manager includes an SFC (Service Function Chaining) orchestrator, an SDN (Software Defined Networking) controller, an edge computing orchestrator, an NFV (Network Functions Virtualization) orchestrator, and the like.
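As a sketch only, the hand-off to a resource manager could be organized along the following lines; the ResourceManager interface, the handle_request helper, and the reuse of the ActorCritic model from the earlier sketch are assumptions, not interfaces defined by the patent.

    from typing import Protocol, Sequence
    import torch

    class ResourceManager(Protocol):
        """Assumed interface for whatever applies the decision locally, e.g. an adapter
        around an SFC orchestrator, SDN controller, or NFV orchestrator."""
        def orchestrate(self, request: dict, local_state: dict, action: int) -> None: ...

    def handle_request(model, resource_manager: ResourceManager, request: dict,
                       state_vector: Sequence[float], local_state: dict) -> int:
        """On receiving a resource orchestration request, choose an action with the
        updated policy and hand the request, state, and decision to the resource manager."""
        dist, _ = model(torch.as_tensor(state_vector, dtype=torch.float32))
        action = int(dist.sample())
        resource_manager.orchestrate(request, local_state, action)
        return action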
It should be noted that, after receiving the first resource orchestration request, the agent may also receive other resource orchestration requests, and in the case of receiving other resource orchestration requests, the agent iteratively orchestrates resources according to the current resource orchestration method and updates the orchestration policy.
According to the resource orchestration method provided by this embodiment, each agent updates its orchestration policy according to the global reward information and the local environment state information. This reduces the correlation between the orchestration policies of different agents and allows each policy to be updated promptly as the environment changes, yielding a more reasonable and accurate orchestration policy. When a resource orchestration request is received, the resources in the local environment corresponding to each agent are orchestrated according to the resource orchestration request, the local environment state information, and the updated orchestration policy, which improves resource utilization.
Fig. 2 is a flowchart of a resource orchestration method according to a second embodiment of the present application. As shown in fig. 2, the resource arranging method includes the following steps:
step S201, receiving a second resource scheduling request sent by the user terminal.
Updating the orchestration policy is an iterative process for the agent. Each time the agent receives a resource orchestration request, it performs resource orchestration according to the current orchestration policy; the local environment state information then changes and new global reward information is generated accordingly. Based on this, the agent updates the orchestration policy according to the new global reward information and the changed local environment state information to obtain a new orchestration policy, and this process is repeated until a preset stopping condition is met. The stopping condition may be, for example, that the number of iterations reaches a preset iteration threshold; in practical applications the stopping condition can be set flexibly according to requirements, and the present application is not limited in this respect.
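A minimal sketch of this iterate-orchestrate-update loop could look like the following; it assumes the ActorCritic model and single_step_update helper from the earlier sketch, and the env.state(), env.apply(), and reward_feed.latest_global_reward() calls are placeholders for the local environment and the shared global reward source.

    import torch

    def run_agent(env, model, optimizer, requests, reward_feed, max_iterations=1000):
        """Iteratively orchestrate: act on each incoming request with the current policy,
        observe the changed local state and the new global reward, update the policy,
        and stop once the preset iteration threshold is reached."""
        for i, request in enumerate(requests):
            if i >= max_iterations:          # preset stopping condition
                break
            state = torch.as_tensor(env.state(), dtype=torch.float32)
            dist, _ = model(state)
            action = dist.sample()
            env.apply(request, int(action))  # orchestration changes the local environment
            next_state = torch.as_tensor(env.state(), dtype=torch.float32)
            reward = torch.tensor(reward_feed.latest_global_reward())  # shared global reward
            single_step_update(model, optimizer, state, action, reward, next_state)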
In some embodiments, the agent receives a second resource orchestration request before obtaining the global reward information and the local environment state information corresponding to the first resource orchestration request, wherein the second resource orchestration request comprises a resource orchestration type and a resource demand.
Step S202, arranging the resources in the historical local environment corresponding to the second resource arranging request according to the second resource arranging request, the historical local environment state information corresponding to the second resource arranging request and the historical arranging strategy corresponding to the second resource arranging request.
The historical local environment state information corresponding to the second resource orchestration request refers to the local environment state information of the current agent before the agent executes the first resource orchestration request (the second resource orchestration request precedes the first resource orchestration request in time, so its local environment state information is historical relative to the first resource orchestration request). Similarly, the historical orchestration policy corresponding to the second resource orchestration request refers to the orchestration policy of the current agent before the first resource orchestration request is executed.
The method for arranging the resources in the historical local environment corresponding to the second resource arrangement request by the agent according to the second resource arrangement request, the historical local environment state information corresponding to the second resource arrangement request, and the historical arrangement policy corresponding to the second resource arrangement request is similar to that in step S103 in the first embodiment of the present application, and details are not repeated here.
Step S203, obtaining the global reward information and the local environment state information.
And step S204, updating the arranging strategy according to the global reward information and the local environment state information.
Step S205, in case of receiving the first resource orchestration request, orchestrating resources in the local environment based on the first resource orchestration request, the local environment state information, and the updated orchestration policy.
Steps S203 to S205 in this embodiment are the same as steps S101 to S103 in the first embodiment of the present application, and are not described herein again.
The steps of the above methods are divided only for clarity of description; in implementation they may be combined into one step, or a step may be split into multiple steps, and all such variants are within the protection scope of this patent as long as they contain the same logical relationship. Adding insignificant modifications to the algorithms or processes, or introducing insignificant designs, without changing the core design of the algorithms or processes, is also within the protection scope of the patent.
A second aspect of the present application provides an agent. Fig. 3 is a block diagram of an agent according to a third embodiment of the present application. As shown in fig. 3, the agent includes: a first obtaining module 301, a second obtaining module 302, an updating module 303 and an orchestration module 304.
The first obtaining module 301 is configured to obtain global reward information and local environment status information.
The global reward information is information obtained based on a preset global environment, the global environment corresponds to one or more intelligent agents, and the local environment state information is information obtained according to a local environment corresponding to the current intelligent agent.
In some embodiments, the global reward information is generated by batch processing the local environment states of all agents in the global environment to obtain a resource balance rate and a request acceptance rate of the global environment, and then computing the reward from the resource balance rate and the request acceptance rate according to a preset reward mechanism. The batch processing may be executed after a preset time period elapses, or after the number of processed resource orchestration requests reaches a preset threshold; those skilled in the art may set this flexibly according to actual requirements, and the present application is not limited in this respect.
For example, if, after the agent performs resource orchestration, both the resource balance rate and the request acceptance rate calculated from the global environment improve, the global reward information should be positive feedback. The global reward information can be a signed numerical value, where the sign indicates whether the feedback is positive or negative and the magnitude indicates the degree of positive or negative feedback.
It should be noted that the preset reward mechanism itself is also an iterative update mechanism, so that more reasonable and accurate global reward information can be obtained, and accordingly, the update of the orchestration strategy is more reasonable and accurate.
A second obtaining module 302, configured to obtain the local environment status information.
In some embodiments, the agent obtains the local environment state information through the second obtaining module 302 with an information collecting function, where the local environment state information includes information of resource type, resource occupancy amount, resource idle amount, and the like in the local environment.
It should be noted that the above local environment state information is only an example, and may be specifically set according to actual needs, and other non-described local environment state information is also within the protection scope of the present application, and is not described herein again.
And the updating module 303 is configured to update the orchestration policy according to the global reward information and the local environment state information.
The arrangement strategy refers to a strategy for arranging various resources in a local environment.
In some embodiments, the orchestration policy comprises an action policy comprising one or more of a path deployment sub-policy and a routing sub-policy. Accordingly, the actions of the agent include path deployment and routing configuration.
In some embodiments, the updating module 303 updates the orchestration policy according to the global reward information and the local environment state information, including:
and inputting the global reward information and the local environment state information into a preset action strategy prediction model so that the action strategy prediction model executes action strategy prediction operation and outputs an updated action strategy. The action strategy prediction model is a model constructed based on an Actor-Critic algorithm.
The orchestration module 304 is configured to, upon receiving the first resource orchestration request, orchestrate resources in the local environment based on the first resource orchestration request, the local environment state information, and the updated orchestration policy.
In some embodiments, the global environment includes various types of servers, storage devices, network devices, etc., and the resources of these devices are virtualized into various types of virtual resources. The global environment corresponds to a plurality of agents, the action of each agent comprises VNF multi-path deployment and traffic flow routing, and the corresponding action strategy comprises a path strategy and a routing strategy.
Assume that the first resource orchestration request comprises a resource orchestration type and a resource demand. After receiving the first resource arranging request, the agent arranges various virtual resources in the local environment corresponding to the agent according to the resource arranging type and the resource demand amount corresponding to the first resource arranging request, the current local environment state information of the agent and the updated arranging strategy.
In some embodiments, the first resource orchestration request, the local environment state information, and the updated orchestration policy are sent to a resource manager, and the resource manager orchestrates the resources in the local environment. The resource manager is used to configure and orchestrate various types of resources in the local environment. For example, the resource managers include SFC orchestrators, SDN controllers, edge computing orchestrators, NFV orchestrators, and the like.
It should be noted that, after receiving the first resource orchestration request, the agent may also receive other resource orchestration requests, and in the case of receiving other resource orchestration requests, the agent iteratively orchestrates resources according to the current resource orchestration method and updates the orchestration policy.
The agent provided by this embodiment updates its orchestration policy according to the global reward information and the local environment state information. This reduces the correlation between the orchestration policies of different agents and allows each policy to be updated promptly as the environment changes, yielding a more reasonable and accurate orchestration policy. When a resource orchestration request is received, the resources in the local environment corresponding to each agent are orchestrated according to the resource orchestration request, the local environment state information, and the updated orchestration policy, which improves resource utilization.
A third aspect of the present application provides a resource orchestration system. Fig. 4 is a block diagram of a resource arrangement system according to a fourth embodiment of the present application. As shown in FIG. 4, the resource orchestration system 400 comprises: agent 410, reward module 420, environment 430, and resource manager 440.
The environment 430 includes n resources, from the first resource 431 to the nth resource 43n, where n is an integer greater than or equal to 1. The agent 410 includes a preset and updatable orchestration policy 411 and a preset executable action 412. The reward module 420 is configured to determine a request acceptance rate 421 and a resource balance rate 422 according to the environment state information acquired from the environment 430 and a preset reward mechanism, and to determine global reward information according to the request acceptance rate 421 and the resource balance rate 422, so that the agent 410 can refer to the global reward information when updating the orchestration policy. The resource manager 440 is configured to, after receiving a resource orchestration request, execute the orchestration of each resource in the environment 430 according to the orchestration policy 411 of the agent 410, the local environment state information, and the specific request content in the resource orchestration request.
It should be noted that the first resource 431 to the nth resource 43n may be resources of the same type or of different types; this embodiment is merely exemplary, other resource distributions are also within the scope of the present application, and the present application is not limited in this respect.
It should be further noted that the reward information output by the reward module 420 is global in nature: the local environment state information corresponding to all agents in the global environment (only one agent is shown in fig. 4 as an example; other agents are not shown) is batch processed to obtain the resource balance rate and the request acceptance rate of the global environment, and the global reward information is then generated from the resource balance rate and the request acceptance rate according to the preset reward mechanism.
Fig. 5 is a block diagram of a resource orchestration system according to a fifth embodiment of the present application. As shown in fig. 5, the resource orchestration system mainly includes: an Actor-Critic network model 500; m agents, such as a first agent 511, a second agent 521, and an m-th agent 5m1, where m is an integer greater than or equal to 1; a first local environment 512, a second local environment 522, and an m-th local environment 5m2 corresponding to the agents; and an experience pool formed by a first experience 513, a second experience 523, and an m-th experience 5m3. The Actor-Critic network model 500 is a network model constructed based on the Actor-Critic algorithm, and it can obtain global reward information from the set of local environment state information of all agents, so that each agent can refer to the global reward information when updating its orchestration policy.
Each agent corresponds to its own local environment and operates asynchronously based on that local environment, and the set of local environments of all agents constitutes the global environment. Taking the first agent 511 as an example, the first agent 511 obtains first local environment state information S1_t from the first local environment 512, and upon receiving a resource orchestration request, outputs an action a1_t to the first local environment 512 according to the first local environment state information S1_t, the global reward information r_i, and the current orchestration policy. Here i denotes the number of updates of the global reward information, and t denotes the number of changes of the local environment state information of the first agent 511; t is related to the number of resource orchestration requests received by the first agent 511. The second agent 521 through the m-th agent 5m1 have the same functions and use the same resource orchestration method as the first agent 511, which are not described again here (j denotes the number of changes of the second local environment state information of the second agent 521, and k denotes the number of changes of the m-th local environment state information of the m-th agent 5m1).
For example, in the initial state, the first local environment state information of the first agent 511 is S1_0. After receiving a resource orchestration request, the first agent 511 executes action a1_0 according to the content of the resource orchestration request (such as the resource type and resource quantity), S1_0, and an initial orchestration policy, where the initial orchestration policy may be a preset policy. After the resource orchestration operation is performed, the local environment state of the first agent 511 changes from S1_0 to S1_1, and the first agent 511 updates the orchestration policy according to S1_1 and the current global reward information r_1. After receiving a new resource orchestration request, the first agent 511 iteratively performs the resource orchestration and the orchestration policy update according to the method described above.
It should be noted that the global reward information r_i is obtained from the experiences of all agents in the global environment. Specifically, after each agent performs resource orchestration several times, its local environment state information, action information, and global reward information constitute that agent's experience. For example, the first agent 511 corresponds to the first experience 513, the second agent corresponds to the second experience 523, and the m-th agent corresponds to the m-th experience 5m3; these m sets of experiences form the experience pool. When the preset batch processing condition is reached, the Actor-Critic network model 500 batch processes the experiences in the experience pool, obtains new global reward information, and issues the new global reward information to each agent. Each agent then updates its current orchestration policy according to the new global reward information and performs resource orchestration based on the updated orchestration policy.
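As an illustrative sketch only (the pool structure, the batch condition, and the reuse of the global_reward helper from the earlier example are assumptions), the shared experience pool and the periodic batch processing might be organized as follows:

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Experience:
        """One agent's record: local states, actions taken, and the global reward it last received."""
        states: List[list] = field(default_factory=list)
        actions: List[int] = field(default_factory=list)
        last_reward: float = 0.0

    class ExperiencePool:
        """Collects the experiences of all m agents and, once a batch condition is met,
        recomputes the shared global reward and pushes it back to every agent."""
        def __init__(self, num_agents: int, batch_size: int = 32):
            self.experiences = [Experience() for _ in range(num_agents)]
            self.batch_size = batch_size
            self._pending = 0
            self._prev_balance, self._prev_acceptance = 1.0, 1.0

        def record(self, agent_id: int, state: list, action: int) -> None:
            exp = self.experiences[agent_id]
            exp.states.append(state)
            exp.actions.append(action)
            self._pending += 1

        def maybe_batch_process(self, balance: float, acceptance: float) -> Optional[float]:
            """When enough orchestration steps have accumulated, compute a new global reward
            from the global metrics and distribute it to all agents."""
            if self._pending < self.batch_size:
                return None
            reward = global_reward(balance, acceptance, self._prev_balance, self._prev_acceptance)
            self._prev_balance, self._prev_acceptance = balance, acceptance
            for exp in self.experiences:
                exp.last_reward = reward
            self._pending = 0
            return reward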
In this embodiment, a plurality of agents run asynchronously, the Actor-Critic network model generates global reward information based on the global environment information, and each agent updates its local orchestration policy according to the shared global reward information while interacting with its local environment, so that the correlation between the orchestration policies of different agents is weakened and more reasonable and accurate orchestration policies can be obtained.
It should be noted that each module referred to in this embodiment is a logical module; in practical applications, a logical unit may be one physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, elements that are not closely related to solving the technical problem addressed by the present invention are not introduced in this embodiment, but this does not mean that no other elements exist in this embodiment.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (10)

1. A method of resource orchestration, comprising:
acquiring global reward information and local environment state information, wherein the global reward information is information acquired based on a preset global environment, the global environment corresponds to one or more agents, and the local environment state information is information acquired according to a local environment corresponding to a current agent;
updating an arrangement strategy according to the global reward information and the local environment state information;
in the event a first resource orchestration request is received, orchestrating resources in the local environment based on the first resource orchestration request, the local environment state information, and the updated orchestration policy.
2. The method of claim 1, wherein prior to obtaining the global reward information and the local environment status information, further comprising:
receiving a second resource scheduling request sent by the user terminal;
and arranging the resources in the historical local environment corresponding to the second resource arranging request according to the second resource arranging request, the historical local environment state information corresponding to the second resource arranging request and the historical arranging strategy corresponding to the second resource arranging request.
3. The method of claim 2, wherein obtaining global reward information and local environment status information comprises:
and acquiring the global reward information and the local environment state information according to the resources in the historical local environment corresponding to the second resource arranging request after arrangement.
4. The resource orchestration method according to claim 2, wherein the first resource orchestration request and the second resource orchestration request comprise a resource orchestration type and a resource demand.
5. The resource orchestration method according to claim 2 or 3, wherein the global reward information is generated by batch processing the historical local environment state information and the local environment state information corresponding to the second resource orchestration requests of all agents in the global environment to obtain a resource balance rate and a request acceptance rate of the global environment, and then computing the reward from the resource balance rate and the request acceptance rate according to a preset reward mechanism.
6. The resource orchestration method according to claim 1, wherein the orchestration policy comprises an action policy;
the updating of the orchestration policy according to the global reward information and the local environment state information includes:
and inputting the global reward information and the local environment state information into a preset action strategy prediction model so that the action strategy prediction model executes action strategy prediction operation and outputs an updated action strategy.
7. The method of claim 6, wherein the action policy comprises one or more of a path deployment sub-policy and a routing sub-policy.
8. The resource orchestration method according to claim 1, wherein orchestrating resources in the local environment based on the first resource orchestration request, the local environment state information, and an updated orchestration policy if the first resource orchestration request is received comprises:
and sending the first resource arranging request, the local environment state information and the updated arranging strategy to a resource manager, and arranging the resources in the local environment by the resource manager.
9. The method of claim 8, wherein the resource manager is configured to configure and orchestrate various types of resources in the local environment.
10. An agent, comprising:
a first obtaining module, configured to obtain global reward information and local environment state information, wherein the global reward information is obtained based on a preset global environment, and the global environment corresponds to one or more agents;
the second acquisition module is used for acquiring local environment state information, wherein the local environment state information is acquired according to a local environment corresponding to the current agent;
the updating module is used for updating the arranging strategy according to the global reward information and the local environment state information;
and the arranging module is used for arranging the resources in the local environment based on the first resource arranging request, the local environment state information and the updated arranging strategy under the condition of receiving the first resource arranging request.
CN202110520783.6A 2021-05-13 2021-05-13 Resource arrangement method and intelligent agent Active CN113254200B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110520783.6A CN113254200B (en) 2021-05-13 2021-05-13 Resource arrangement method and intelligent agent

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110520783.6A CN113254200B (en) 2021-05-13 2021-05-13 Resource arrangement method and intelligent agent

Publications (2)

Publication Number Publication Date
CN113254200A true CN113254200A (en) 2021-08-13
CN113254200B CN113254200B (en) 2023-06-09

Family

ID=77181535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110520783.6A Active CN113254200B (en) 2021-05-13 2021-05-13 Resource arrangement method and intelligent agent

Country Status (1)

Country Link
CN (1) CN113254200B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117499491A (en) * 2023-12-27 2024-02-02 杭州海康威视数字技术股份有限公司 Internet of things service arrangement method and device based on double-agent deep reinforcement learning

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018212918A1 (en) * 2017-05-18 2018-11-22 Microsoft Technology Licensing, Llc Hybrid reward architecture for reinforcement learning
CN110418416A (en) * 2019-07-26 2019-11-05 东南大学 Resource allocation methods based on multiple agent intensified learning in mobile edge calculations system
US20190347371A1 (en) * 2018-05-09 2019-11-14 Volvo Car Corporation Method and system for orchestrating multi-party services using semi-cooperative nash equilibrium based on artificial intelligence, neural network models,reinforcement learning and finite-state automata
CN110648049A (en) * 2019-08-21 2020-01-03 北京大学 Multi-agent-based resource allocation method and system
CN110852448A (en) * 2019-11-15 2020-02-28 中山大学 Cooperative intelligent agent learning method based on multi-intelligent agent reinforcement learning
US20200090074A1 (en) * 2018-09-14 2020-03-19 Honda Motor Co., Ltd. System and method for multi-agent reinforcement learning in a multi-agent environment
CN111585811A (en) * 2020-05-06 2020-08-25 郑州大学 Virtual optical network mapping method based on multi-agent deep reinforcement learning
CN112001585A (en) * 2020-07-14 2020-11-27 北京百度网讯科技有限公司 Multi-agent decision method and device, electronic equipment and storage medium
CN112584347A (en) * 2020-09-28 2021-03-30 西南电子技术研究所(中国电子科技集团公司第十研究所) UAV heterogeneous network multi-dimensional resource dynamic management method

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018212918A1 (en) * 2017-05-18 2018-11-22 Microsoft Technology Licensing, Llc Hybrid reward architecture for reinforcement learning
US20190347371A1 (en) * 2018-05-09 2019-11-14 Volvo Car Corporation Method and system for orchestrating multi-party services using semi-cooperative nash equilibrium based on artificial intelligence, neural network models,reinforcement learning and finite-state automata
US20200090074A1 (en) * 2018-09-14 2020-03-19 Honda Motor Co., Ltd. System and method for multi-agent reinforcement learning in a multi-agent environment
CN110418416A (en) * 2019-07-26 2019-11-05 东南大学 Resource allocation methods based on multiple agent intensified learning in mobile edge calculations system
CN110648049A (en) * 2019-08-21 2020-01-03 北京大学 Multi-agent-based resource allocation method and system
CN110852448A (en) * 2019-11-15 2020-02-28 中山大学 Cooperative intelligent agent learning method based on multi-intelligent agent reinforcement learning
CN111585811A (en) * 2020-05-06 2020-08-25 郑州大学 Virtual optical network mapping method based on multi-agent deep reinforcement learning
CN112001585A (en) * 2020-07-14 2020-11-27 北京百度网讯科技有限公司 Multi-agent decision method and device, electronic equipment and storage medium
CN112584347A (en) * 2020-09-28 2021-03-30 西南电子技术研究所(中国电子科技集团公司第十研究所) UAV heterogeneous network multi-dimensional resource dynamic management method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DENG JINSHENG et al.: "Intelligent Agents Using Competitive Bidding Mechanism in Storage Grids", 2008 Fourth International Conference on Semantics, Knowledge and Grid, pages 404-407
NAJAMUL DIN et al.: "Mobility-Aware Resource Allocation in Multi-Access Edge Computing Using Deep Reinforcement Learning", 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), pages 202-209
张鸿 et al.: "User Association and Resource Allocation Strategy for Load Balancing in a Converged Time- and Wavelength-Division Multiplexed Passive Optical Network and Cloud Radio Access Network Architecture", Journal of Electronics & Information Technology, vol. 43, no. 9, pages 2672-2679
陆瑾洋: "Research and Implementation of Resource Allocation Mechanisms in Social Networks", China Master's Theses Full-text Database, Information Science and Technology, no. 3, pages 139-120

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117499491A (en) * 2023-12-27 2024-02-02 杭州海康威视数字技术股份有限公司 Internet of things service arrangement method and device based on double-agent deep reinforcement learning
CN117499491B (en) * 2023-12-27 2024-03-26 杭州海康威视数字技术股份有限公司 Internet of things service arrangement method and device based on double-agent deep reinforcement learning

Also Published As

Publication number Publication date
CN113254200B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
Liu et al. A hierarchical framework of cloud resource allocation and power management using deep reinforcement learning
Ananthanarayanan et al. GRASS: Trimming stragglers in approximation analytics
US10108458B2 (en) System and method for scheduling jobs in distributed datacenters
WO2017176333A1 (en) Batching inputs to a machine learning model
CN109117252B (en) Method and system for task processing based on container and container cluster management system
JP2023055853A (en) Learned dynamically controlling combined system
CN111143039B (en) Scheduling method and device of virtual machine and computer storage medium
CN112114973A (en) Data processing method and device
CN111738446A (en) Scheduling method, device, equipment and medium of deep learning inference engine
CN113448728B (en) Cloud resource scheduling method, device, equipment and storage medium
US10755175B2 (en) Early generation of individuals to accelerate genetic algorithms
Maruf et al. Extending resources for avoiding overloads of mixed‐criticality tasks in cyber‐physical systems
CN116467061A (en) Task execution method and device, storage medium and electronic equipment
Tchernykh et al. Mitigating uncertainty in developing and applying scientific applications in an integrated computing environment
CN113254200A (en) Resource arrangement method and intelligent agent
CN116302448B (en) Task scheduling method and system
CN113407343A (en) Service processing method, device and equipment based on resource allocation
CN113535346B (en) Method, device, equipment and computer storage medium for adjusting thread number
CN111049900B (en) Internet of things flow calculation scheduling method and device and electronic equipment
CN110795075B (en) Data processing method and device for software programming
CN113177632A (en) Model training method, device and equipment based on pipeline parallelism
CN112306670A (en) Server cluster optimization method under Docker virtualization scene
CN116523030B (en) Method and device for training resources by dynamic scheduling model
US11934870B2 (en) Method for scheduling a set of computing tasks in a supercomputer
US11941421B1 (en) Evaluating and scaling a collection of isolated execution environments at a particular geographic location

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant