CN115955402A - Service function chain determining method, device, equipment, medium and product - Google Patents

Service function chain determining method, device, equipment, medium and product

Info

Publication number
CN115955402A
CN115955402A (application CN202310239063.1A)
Authority
CN
China
Prior art keywords
virtual network
physical node
determining
arrangement
service function
Prior art date
Legal status
Granted
Application number
CN202310239063.1A
Other languages
Chinese (zh)
Other versions
CN115955402B (en)
Inventor
尚晶
肖智文
武智晖
郭志伟
陈卓
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Information Technology Co Ltd
Priority claimed from CN202310239063.1A
Publication of CN115955402A
Application granted
Publication of CN115955402B
Legal status: Active
Anticipated expiration


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a method, a device, equipment, a medium and a product for determining a service function chain. The method comprises: obtaining a plurality of virtual network functions and a plurality of physical nodes, wherein the plurality of virtual network functions have a first preset execution sequence and each virtual network function is implemented on one physical node; determining a plurality of return values corresponding to a plurality of arrangement strategies, wherein each arrangement strategy is used for indicating the physical node corresponding to each virtual network function, and the return values are used for representing the execution success rate of each physical node in the arrangement strategies for implementing the virtual network function; and taking the arrangement strategy corresponding to the maximum return value as the service function chain arrangement of the plurality of virtual network functions. The method and the device can improve the reliability of service function chain arrangement.

Description

Service function chain determining method, device, equipment, medium and product
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, a medium, and a product for determining a service function chain.
Background
With the rapid development of big data and artificial intelligence, the reasonable processing, mining and application of large amounts of geographically distributed data resources are key to exploiting the value of data and enabling various industries. Because of the real-time and reliability requirements of data processing, the computing mode has changed from centralized processing to distributed collaborative computing: service providers deploy edge data centers at multiple locations and, based on Network Function Virtualization (NFV), schedule large-scale data processing tasks to geographically dispersed data centers for processing, so that more efficient and intelligent network services can be provided near user locations. Network service functions are mainly carried by Service Function Chains (SFCs); an SFC is an ordered sequence of Virtualized Network Functions (VNFs) through which traffic passes. Service Function Chain Orchestration (SFC Orchestration) means that, for a given set of known SFCs and on the premise that certain constraints are met, the VNFs are deployed in a reasonable manner so as to optimize service performance or economy.
SFC orchestration can be learned from the interaction between the environment and an agent through a reinforcement learning model; however, a single data center does not have enough experience in orchestrating different types of services in different network environments and needs to rely on the service processing experience of multiple data centers. The service data of different data centers, however, are governed by different companies, and it is difficult to collect them for unified training.
That is, the current SFC orchestration has a problem of low reliability.
Disclosure of Invention
The method, the device, the equipment, the medium and the product for determining the service function chain can improve the reliability of arranging the service function chain.
In a first aspect, an embodiment of the present application provides a method for determining a service function chain, where the method includes:
the method comprises the steps of obtaining a plurality of virtual network functions and a plurality of physical nodes, wherein the virtual network functions have a first preset execution sequence, and each virtual network function is realized on one physical node;
determining a plurality of return values corresponding to a plurality of arrangement strategies, wherein each arrangement strategy is used for indicating a physical node corresponding to each virtual network function, and the return values are used for representing the execution success rate of each physical node in the arrangement strategies for realizing the virtual network function;
and using the arrangement strategy corresponding to the maximum return value as the service function chain arrangement of the plurality of virtual network functions.
In an embodiment, the determining a plurality of return values corresponding to a plurality of arrangement strategies further includes:
and determining a plurality of return values corresponding to the plurality of arrangement strategies by using a trained federated reinforcement learning model, wherein the reinforcement learning model comprises a plurality of training samples, each training sample comprises the total amount of target resources of a plurality of first physical nodes, and a plurality of arrangement strategies corresponding to a plurality of sample virtual network functions, and each sample virtual network function is realized on one first physical node.
In an embodiment, before determining, by using the trained federal reinforcement learning model, a plurality of return values corresponding to the plurality of orchestration strategies, the method further includes:
obtaining a plurality of training samples, wherein the virtual network functions of the training samples have a second preset execution sequence;
determining a target physical node for each sample virtual network function from the plurality of first physical nodes according to the second preset execution sequence;
determining a corresponding pre-estimated return value according to the target physical node;
and updating a return value table of a training sample in the initial federal reinforcement learning model according to the estimated return value to obtain the trained federal reinforcement learning model, wherein the return value table comprises the maximum return value of the virtual network function arrangement strategy of each sample.
In an embodiment, before determining the corresponding pre-estimated return value according to the target physical node, the method further includes:
judging whether the total amount of the residual resources of the target physical node is greater than or equal to the resources required by the sample virtual network function corresponding to the target physical node;
and determining a corresponding pre-estimated return value according to the target physical node under the condition that the total amount of the residual resources of the target physical node is greater than or equal to the resources required by the sample virtual network function corresponding to the target physical node.
In an embodiment, the updating the return value table of the training samples in the initial federated reinforcement learning model according to the estimated return value to obtain the trained federated reinforcement learning model includes:
performing aggregation processing on the updated return value table corresponding to each training sample to obtain an aggregated return table;
distributing the aggregated return table back to each training sample, so that each training sample determines a new return value table corresponding to the training sample according to the aggregated return table;
and obtaining the trained federated reinforcement learning model under the condition that the error between the new return value table and the updated return value table is smaller than a preset error threshold value.
In an embodiment, the performing aggregation processing on the updated return value table corresponding to each training sample to obtain an aggregated return table includes:
determining confidence coefficients according to the updated return value table of each training sample, wherein the confidence coefficients are used for representing the trust degrees of the return value tables;
and determining the aggregated return table according to the confidence coefficients.
In a second aspect, the present application provides a service function chain determining apparatus, comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a plurality of virtual network functions and a plurality of physical nodes, the virtual network functions have a preset execution sequence, and each virtual network function is realized on one physical node;
the virtual network scheduling system comprises a first determining module, a second determining module and a scheduling module, wherein the first determining module is used for determining a plurality of return values corresponding to a plurality of scheduling strategies, each scheduling strategy is used for indicating a physical node corresponding to each virtual network function, and the return values are used for representing the execution success rate of each physical node in the scheduling strategies for realizing the virtual network function;
and the second determining module is used for taking the scheduling strategy corresponding to the maximum return value as the service function chain scheduling of the plurality of virtual network functions.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements a service function chain determination method as in any one of the embodiments of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which computer program instructions are stored, and the computer program instructions, when executed by a processor, implement the service function chain determination method as in any one of the embodiments of the first aspect.
In a fifth aspect, the present application provides a computer program product, and when executed by a processor of an electronic device, the instructions of the computer program product cause the electronic device to execute a service function chain determination method implemented in any one of the embodiments of the first aspect.
In a method, an apparatus, a device, a medium, and a product for determining a service function chain provided in an embodiment of the present application, a plurality of virtual network functions and a plurality of physical nodes are obtained, where the plurality of virtual network functions have a first preset execution order, and each virtual network function is implemented on one physical node; determining a plurality of return values corresponding to a plurality of arrangement strategies, wherein each arrangement strategy is used for indicating a physical node corresponding to each virtual network function, and the return values are used for representing the execution success rate of each physical node in the arrangement strategies for realizing the virtual network function; and using the arrangement strategy corresponding to the maximum return value as the service function chain arrangement of the plurality of virtual network functions. By the method, the return values of different arrangement strategies of the virtual network functions in the corresponding physical nodes can be obtained, and the arrangement strategy with the maximum return value is selected as the optimal arrangement strategy of the virtual network functions, so that the virtual network functions can obtain the service function chain according to the optimal arrangement strategy, the probability of successful execution of the virtual network functions in the corresponding physical nodes can be ensured to be high, and the reliability of arrangement of the service function chain is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart illustrating a service function chain determination method according to an embodiment of the present application;
FIG. 2 is a schematic workflow diagram of a federated learning model provided in one embodiment of the present application;
FIG. 3 is a schematic flow chart illustrating migration and calibration of a federated learning model provided in an embodiment of the present application;
FIG. 4 is a diagram illustrating simulation results of a simulation experiment according to an embodiment of the present application;
FIG. 5 is a schematic view of a simulation experiment according to an embodiment of the present application;
FIG. 6 is a diagram of simulation results of another simulation experiment provided in one embodiment of the present application;
FIG. 7 is a graph of simulation results of another simulation experiment provided in one embodiment of the present application;
FIG. 8 is a graph of simulation results of another simulation experiment provided in one embodiment of the present application;
FIG. 9 is a graph of simulation results of another simulation experiment provided in one embodiment of the present application;
fig. 10 is a schematic structural diagram of a service function chain determining apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments of the present disclosure may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
With the rapid development of big data and artificial intelligence, reasonable processing, mining and application of a large amount of geographically distributed data resources are the key for exerting data values and enabling various industries. Due to the real-time and reliability requirements of data processing, a computing mode is changed from a centralized processing mode to a distributed collaborative computing mode, a service provider deploys edge data centers at a plurality of positions, large-scale data processing tasks are scheduled to geographically dispersed data centers for processing based on a Network Function Virtualization (NFV), and more efficient and intelligent network services can be provided at positions close to users.
In the prior art, service function chains are mainly arranged in the following way: (1) acquiring a historical Network state according to a Software Defined Network (SDN) controller; the network state comprises service function chain request flow information generated in the Internet of things network supporting mobile edge computing and corresponding network resource state information; setting deep reinforcement learning parameters and initializing the weight of a neural network; training a neural network according to an experience sample generated by interaction of an agent and the environment; and for the service function chain request flow acquired in real time, determining and deploying the placement and routing path of the virtualized network function meeting the requirement of the service function chain request flow by using the trained neural network and adopting a heuristic algorithm. (2) Resources required by virtual network functions are regarded as variable parameters, the relation among request arrival rate, computing resources and processing delay is analyzed on the basis of queuing theory and elastic resource allocation, energy consumption and network delay are used as optimization indexes, and a two-stage heuristic arranging strategy is provided. In the first stage, a series of mapping relations between VNFs and servers are obtained by utilizing a greedy deployment strategy; and in the second stage, on the basis of the solution set in the previous stage, the nonlinear constraint optimization problem is further solved, and the optimal configuration of the computing resources is obtained.
The above schemes can perform well for their respective optimization targets in a single problem environment, but when the environment changes and service data are limited, it is difficult for them to make high-quality resource scheduling decisions. Moreover, the service function chain arrangement optimization targets of these schemes do not consider the reliability problem. Due to the objective existence of software bugs and other factors, virtual nodes built on reliable physical nodes may themselves fail. Efficient and intelligent network services have requirements not only on the timeliness of services but also on their reliability; once a problem occurs in a service, serious harm results. Existing research typically addresses the reliability problem through a backup strategy, but this incurs additional redundant device overhead. That is, in the prior art, the optimization indexes of the service function chain (reliability and arrangement overhead) are not considered comprehensively.
In order to solve the problem of the prior art, embodiments of the present application provide a method, an apparatus, a device, a medium, and a product for determining a service function chain. First, a service function chain determination method provided in the embodiment of the present application is described below.
Before the embodiments of the present invention are described in further detail, the terms and expressions mentioned in the embodiments of the present invention are explained; the following explanations apply to these terms and expressions.
Network Function Virtualization (NFV)
In October 2012, on the initiative of multiple telecommunications operators such as Deutsche Telekom, the European Telecommunications Standards Institute (ETSI) established the NFV ISG and defined network function virtualization. Network function virtualization aims at carrying traditional network functions on standardized servers, which may be located in data centers, at network nodes or on user terminals, by developing standard IT virtualization technology. Network functions are implemented in software that can run on a range of industry-standard server hardware and can be instantiated at various locations in the network as needed, without the need to install new equipment. In October 2013, ETSI supplemented the concept of network function virtualization and proposed the network function virtualization architecture.
The VNF layer is mainly composed of two parts: a Virtualized Network Function (VNF) and a virtual Element Management System (EMS). The VNF represents a software implementation of network functions running on the infrastructure layer, while the EMS is used to manage the specific functions of the individual VNF and its characteristics. A VNF is an entity corresponding to a specific network function node, which is capable of providing different network functions in software, without being affected by hardware resources.
The MANO layer mainly includes three parts, namely the NFV Orchestrator (NFVO), the VNF Manager (VNFM) and the Virtualized Infrastructure Manager (VIM), and MANO can provide an efficient and convenient management platform. The overall management of network services, VNFs and resources is handled by the NFVO; the VNFM is responsible for managing the resources, life cycles and the like of the VNFs; the VIM is responsible for the management and monitoring of infrastructure layer resources.
The bearer of the service in NFV is the service function chain. The service function chain consists of an ordered sequence of virtualized network functions, i.e. a user's service request is split into different sub-services and handed over to a group of related VNFs for processing. The deployed VNFs are logically independent of each other, as VNFs can be instantiated on any NFV-capable physical node in the physical network. Therefore, by dynamically sequencing and placing a plurality of VNFs according to a specific traffic forwarding graph, a service function chain capable of providing a specific network service can be integrated.
Fig. 1 is a flowchart illustrating a service function chain determination method according to an embodiment of the present application. As shown in fig. 1, the method may specifically include the following steps:
s101, acquiring a plurality of virtual network functions and a plurality of physical nodes, wherein the virtual network functions have a first preset execution sequence, and each virtual network function is realized on one physical node.
Optionally, in the embodiment of the present application, the service in the distributed cloud-edge collaborative computing scenario has a high requirement on reliability. Meanwhile, with the intelligent evolution of the network, the types and the number of services in the network are greatly increased. The method can be applied to processing large-scale cross-domain data, wherein the intensive application for processing the large-scale cross-domain data comprises a plurality of service function blocks, such as data acquisition, encryption and decryption, data storage, data transmission, analysis and processing and the like, each service function block can be abstracted into a plurality of Virtual Network Functions (VNFs), and the VNFs are linked according to the dependency relationship to form a service function chain. In the embodiment of the application, a large-scale cross-domain data-intensive application service function chain can be constructed into a virtual network.
Optionally, in a possible implementation manner of the present application, the virtual network is denoted G_v = (N_v, L_v), where G_v is the virtual network formed by the virtual node set N_v and the virtual link set L_v. A virtual node f ∈ N_v is any virtual node in the virtual node set and represents one VNF on the service function chain. A virtual link z ∈ L_v is any virtual link in the virtual link set; the virtual links form the logical series relationships between the VNFs according to the dependency relationships between the VNFs.
Optionally, in a possible implementation manner of the present application, the data center may allocate corresponding physical resources for the data-intensive application, and carry respective service function blocks of the cross-domain data-intensive application, that is, each virtual network function is implemented on one physical node, and it is easily understood that, in this embodiment of the present application, multiple VNFs may be executed in one physical node. The physical server and the communication link of the data center are constructed into a physical network, wherein the physical server is the physical node of the application. It is easily understood that, in the embodiment of the present application, the number of physical nodes needs to be equal to or greater than the number of VNFs.
Optionally, in one possible implementation of the present application, the physical network is denoted G_p = (N_p, L_p), where G_p is the physical network formed by the physical node set N_p and the physical link set L_p. A physical node v ∈ N_p is any physical node in the physical node set and represents a physical server of a data center that carries a virtual network function; the physical node resources may include storage resources and computing resources. A physical link e ∈ L_p is any physical link in the physical link set and is an actual communication link between physical nodes; the physical link resources include communication resources.
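For illustration only, the virtual network G_v and physical network G_p described above could be represented as in the following sketch; the class and field names, and the example values, are assumptions made for this example and are not part of the present application.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class VirtualNetwork:
    """G_v = (N_v, L_v): a service function chain as an ordered set of VNFs."""
    vnfs: List[str]                       # virtual nodes f, in the first preset execution order
    demands: Dict[str, float]             # resources required by each VNF (storage/compute, abstracted)
    links: List[Tuple[str, str]]          # virtual links z, the series relationships between VNFs

@dataclass
class PhysicalNetwork:
    """G_p = (N_p, L_p): data-center servers and the communication links between them."""
    nodes: List[str]                      # physical nodes v
    capacity: Dict[str, float]            # total resources of each physical node
    reliability: Dict[str, float]         # reliability r_v of each physical node
    links: Dict[Tuple[str, str], float]   # physical links e with their capacities

# Example: a three-VNF chain (collect -> encrypt -> store) over four physical servers.
sfc = VirtualNetwork(
    vnfs=["collect", "encrypt", "store"],
    demands={"collect": 2.0, "encrypt": 3.0, "store": 1.5},
    links=[("collect", "encrypt"), ("encrypt", "store")],
)
dc = PhysicalNetwork(
    nodes=["s1", "s2", "s3", "s4"],
    capacity={"s1": 4.0, "s2": 6.0, "s3": 3.0, "s4": 5.0},
    reliability={"s1": 0.99, "s2": 0.95, "s3": 0.98, "s4": 0.97},
    links={("s1", "s2"): 10.0, ("s2", "s3"): 10.0, ("s3", "s4"): 10.0, ("s1", "s4"): 10.0},
)
```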
S102, determining a plurality of return values corresponding to a plurality of arrangement strategies, wherein each arrangement strategy is used for indicating the physical node corresponding to each virtual network function, and the return values are used for representing the execution success rate of each physical node in the arrangement strategies for implementing the virtual network function.
Optionally, in this embodiment of the present application, in a cross-domain data center scenario, a scheduling problem of a service function chain may be expressed as: the VNF in the service function chain needs to find an optimal scheduling policy, so that the service function chain is mapped to physical resources one by one, and the mapped service function chain can have higher reliability and lower running cost. To describe the service function chain orchestration problem, the present invention defines the mapping relationship as follows.
Optionally, for a service function chain s ∈ S (S is the set of service function chains and s is any chain in the set), there is a resource mapping g_s for s. Here g_s is the mapping of s and consists of two parts: g_s^N denotes the mapping from the virtual node set N_v to the physical node set N_p, and g_s^L denotes the mapping from the virtual link set L_v to the physical link set L_p. g_s^N(f, v) denotes the action of arranging virtual node f onto physical node v, and, similarly, g_s^L(z, e) denotes the action of arranging virtual link z onto physical link e.
Optionally, during service function chain orchestration, the orchestration process may be broken down into the sequential placement of VNFs together with the connection of links. The impact of each VNF placement on the service function chain as a whole is related to the previously placed VNFs. Therefore, when the service chain is arranged, the influence of each VNF arrangement on the whole service function chain, that is, the size of the return value, needs to be calculated, so that the return values corresponding to the arrangement policies can be compared. It is easy to understand that, in the embodiment of the present application, reliability and operating-cost issues need to be considered when the service function chain is arranged; therefore, in addition to representing the execution success rate of each physical node for implementing the virtual network function, the return value in the embodiment of the present application may also be used to represent the operating cost of each physical node for implementing the virtual network function. In these alternative embodiments, the return values of the arrangement policies are compared, so that the reliability and the operating cost of the service function chain are considered comprehensively, with joint optimization of reliability and arrangement overhead as the target, and the arranged service function chain can have higher reliability and lower operating cost.
And S103, taking the arrangement strategy corresponding to the maximum return value as a service function chain arrangement of the plurality of virtual network functions.
Optionally, in this embodiment of the present application, by comparing the return values of different arrangement strategies of multiple virtual network functions in corresponding physical nodes, and selecting an arrangement strategy with the largest return value as an optimal arrangement strategy of the virtual network function, the problem of step-by-step arrangement of service function chains is solved, so that the virtual network function can obtain a corresponding optimal service function chain according to the optimal arrangement strategy, thereby ensuring that the probability of successful execution of the virtual network function in the corresponding physical node is higher, and thus improving the reliability of arrangement of the service function chains.
In the service function chain determining method provided in the embodiment of the present application, a plurality of virtual network functions and a plurality of physical nodes are obtained, where the virtual network functions have a first preset execution sequence, and each virtual network function is implemented on one physical node; determining a plurality of return values corresponding to a plurality of arrangement strategies, wherein each arrangement strategy is used for indicating a physical node corresponding to each virtual network function, and the return values are used for representing the execution success rate of each physical node in the arrangement strategies for realizing the virtual network function; and taking the scheduling strategy corresponding to the maximum return value as service function chain scheduling of the plurality of virtual network functions. By the method, the return values of different arrangement strategies of the virtual network functions in the corresponding physical nodes can be obtained, and the arrangement strategy with the maximum return value is selected as the optimal arrangement strategy of the virtual network functions, so that the virtual network functions can obtain the corresponding optimal service function chain according to the optimal arrangement strategy, the probability that the virtual network functions are successfully executed in the corresponding physical nodes can be guaranteed to be high, and the reliability of service function chain arrangement is improved.
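To make the selection in S103 concrete, the following is a minimal sketch of choosing the arrangement strategy with the largest return value; the data layout (a strategy as a tuple of physical nodes, one per VNF) is an assumption for illustration.

```python
def best_orchestration(return_values: dict):
    """S103: among the candidate arrangement strategies, keep the one whose
    return value is largest and use it as the service function chain arrangement."""
    return max(return_values, key=return_values.get)

# Example: three candidate strategies, each mapping the ordered VNFs to physical nodes.
candidates = {
    ("s1", "s2", "s3"): 0.72,   # strategy -> return value estimated as in S102
    ("s2", "s2", "s4"): 0.81,
    ("s1", "s4", "s3"): 0.64,
}
print(best_orchestration(candidates))   # ('s2', 's2', 's4')
```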
In an embodiment, the step 102 may specifically perform the following steps:
s201, determining a plurality of return values corresponding to the plurality of arrangement strategies by using a trained federal reinforcement learning model, wherein the reinforcement learning model comprises a plurality of training samples, each training sample comprises the total amount of target resources of a plurality of first physical nodes and the arrangement strategies corresponding to a plurality of sample virtual network functions, and each sample virtual network function is realized on one first physical node.
Reinforcement learning is a machine learning paradigm that requires no prior knowledge: through interaction with the environment, the agent obtains feedback on its current action, uses this feedback to determine the next operation, and corrects its own behavior through repeated iterations of this process.
Reinforcement learning differs from supervised or unsupervised learning in that it does not directly depend on external experience; it learns from its own exploration. In each exploration step the external environment evaluates the agent's action, and the agent continuously corrects its behavior policy based on the positive or negative feedback. If a behavior receives positive feedback, the agent will prefer to choose that behavior in later decisions; if a behavior receives negative feedback, the agent will try to avoid it.
Optionally, in the embodiment of the present application, the service function chain arrangement decision is learned from the interaction between the environment and the agent through a reinforcement learning model. In the cross-data-center service orchestration scenario of the present application, a single data center has insufficient experience in orchestrating different types of services in different network environments and needs to rely on the service processing experience of multiple data centers. However, the service data of different data centers may be governed by different companies, and it is difficult to collect them for unified training. In a distributed cloud-edge collaborative computing scene, geographically distributed data centers act as decision-making subjects that make arrangement decisions for computing tasks; different decision-making subjects hold limited service data, collecting and exchanging service data between data centers wastes a large amount of bandwidth resources, and service data of some data centers are difficult to share due to privacy concerns. Therefore, experience sharing is realized through the secure fusion of federated learning, and a federated reinforcement learning model across multiple data centers is designed, so that the arrangement strategy of the service function chain is more reliable and accurate.
The federated reinforcement learning model is realized through the following basic steps. First, the initialized training model is downloaded to each training end (that is, each training sample); each training end updates the model parameters according to its own data, and different training environments produce different model parameters. Then, the updated model parameters are sent to the cloud for integration, and the integrated model parameters are used as the initial parameters of the next model update. The above training steps are repeated until the model converges, producing the federated reinforcement learning model. The federated reinforcement learning model requires that all training data share the same feature dimensions, and what is transmitted is the parameters of the trained model rather than local data.
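The basic steps above could look roughly like the following sketch; the Q-table representation as a NumPy array, the helper name local_q_update and the convergence tolerance are illustrative assumptions rather than the implementation of the present application.

```python
import numpy as np

def federated_q_training(local_envs, n_states, n_actions, rounds=50):
    """Sketch of the federated reinforcement learning loop: every training end
    updates a local Q table, the cloud integrates the tables, and the
    integrated table seeds the next round of local updates."""
    global_q = np.zeros((n_states, n_actions))            # initialised model downloaded by every end
    for _ in range(rounds):
        local_tables = []
        for env in local_envs:                            # each data center trains on its own data
            q = env.local_q_update(global_q.copy())       # local Q-learning episodes (assumed method)
            local_tables.append(q)
        new_global = np.mean(local_tables, axis=0)        # plain averaging stands in for secure aggregation
        if np.linalg.norm(new_global - global_q) < 1e-3:  # stop once the model stops changing
            return new_global
        global_q = new_global                             # integrated parameters seed the next update
    return global_q
```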
Optionally, in a possible implementation manner of the present application, in order to solve the service function chain arrangement problem by using the federal reinforcement learning model, a reinforcement learning model may be set in each training sample, and the service function chains in the respective databases in the respective training samples are arranged by the reinforcement learning model, in this embodiment of the present application, each training sample may be a data center in a different place, or a data center belonging to a different company in the same place.
Optionally, in the embodiment of the present application, before training the reinforcement learning model of each training sample, several key elements of the reinforcement learning model in each training sample, namely the state (State), action (Action), reward (Reward) and objective, need to be defined and analyzed. The goal of data center service function chain arrangement in the present invention is to improve service reliability and reduce service cost, so the reliability of the service and the cost factors need to be considered when designing the reward function.
The state: S_t denotes the service function chain arrangement state at time t. The state consists of: the physical resource occupancy of each server (i.e., each first physical node in the training sample) at time t, where the resources may include storage resources and computing resources; the VNF (i.e., the sample virtual network function in the training sample) to be placed in the current arrangement phase; and the VNFs that have already been placed before the current arrangement phase.
The action: A_t denotes the service function chain arrangement action at time t, i.e., the VNF placement policy at time t, which indicates whether each VNF is placed on a physical node and whether the series relationship between VNFs (i.e., each virtual link) is mapped to a physical link. The placement indicator for a physical node or physical link takes values in {0, 1}, where 1 indicates that the VNF or virtual link is placed on that physical node or link and 0 indicates that it is not. The action at time t also contains the policy set that maps all virtual links of the service function chain onto physical links.
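A minimal illustration of how the state S_t and action A_t defined above might be encoded in practice; the tuple layout and helper names are assumptions made only for this sketch.

```python
from typing import Dict, List, Tuple

# State S_t (assumed encoding): resource occupancy of every server, the VNF to
# be placed in the current phase, and the VNFs that have already been placed.
State = Tuple[Tuple[float, ...], str, Tuple[str, ...]]

def make_state(occupancy: Dict[str, float], vnf_to_place: str,
               placed: List[str]) -> State:
    """Pack the arrangement status at time t into a hashable state key."""
    servers = sorted(occupancy)          # fix an order so identical states compare equal
    return (tuple(occupancy[s] for s in servers), vnf_to_place, tuple(placed))

def placement_indicator(chosen_node: str, node: str) -> int:
    """Action A_t as a {0, 1} indicator: 1 if the current VNF is placed on the node."""
    return 1 if chosen_node == node else 0
```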
Optionally, in the embodiment of the present application, when the service function chain is arranged, the arrangement process may be split into the sequential placement of the VNFs together with the connection of the links. The impact of each VNF placement on the service function chain as a whole is related to the previously placed VNFs. The reinforcement learning model can obtain the optimal decision by calculating the return of each action; therefore, in these alternative embodiments, the reinforcement learning model is suitable for solving the step-by-step arrangement problem of the service function chain, so that the arranged service function chain can have higher reliability and lower operating cost.
In an embodiment, before the step 201, the method specifically includes the following steps:
s301, obtaining the training samples, wherein the virtual network functions of the training samples have a second preset execution sequence;
s302, determining target physical nodes for each sample virtual network function from the plurality of first physical nodes according to the second preset execution sequence.
Optionally, in the embodiment of the present application, the reinforcement learning model used in the present disclosure is Q-learning.
Q-learning is one of the most important reinforcement learning methods, learning from rewards or penalties of behaviors, rather than learning from given training samples, is an incremental dynamic programming process that searches for optimal strategies step by step. Q-learning provides the agent the ability to choose the best strategy by evaluating the Q value that represents the overall outcome of a series of actions. At each step of the interaction, the agent receives a reinforcing signal that is a reward or penalty for the selected action in the environment. Then updating the current state and Q-table (Q-table includes all Q values); the agent continues to select the next action through certain policies. Through continuous loop iteration, the agent finds the optimal strategy.
In Q-learning, the purpose of finding the optimal strategy is realized by updating the Q table. The Q table is a mapping table between state-action and estimated future rewards.
Optionally, in one possible implementation manner of the present application, the reinforcement learning model may obtain the arrangement state of the server resources and the service function chain at time t, thereby determining the arrangement state S_t at time t.
Optionally, in this embodiment of the application, a corresponding orchestration action may be performed according to the state at time t, and a VNF to be scheduled next and a first physical node corresponding to the VNF are determined. It is to be understood that the first physical node corresponding to the next orchestrated VNF may be any one of a plurality of first physical nodes.
S303, determining a corresponding pre-estimated return value according to the target physical node;
optionally, in the embodiment of the present application, after the corresponding orchestration action is made, the unreported value thereof may be estimated, wherein the unreported value is mainly determined by the reward function.
Optionally, in the embodiment of the present application, in order to define the reward function, the resource constraints existing in service function chain placement need to be studied. For a physical node v, the total amount of resources required by the virtual network functions placed on that physical node cannot exceed the total amount of service resources that the physical node can provide. Equation (1) expresses the resource constraint on a physical node, where C_v(t) denotes the total amount of resources of physical node v at time t and d(f) denotes the resources required by virtual node f:

Σ_{f placed on v} d(f) ≤ C_v(t)    (1)

For a physical link e, the total amount of resources required by the service function chains placed on that physical link cannot exceed the total amount of service resources that the physical link can provide. Equation (2) expresses the resource constraint on a physical link, where C_e(t) denotes the total amount of resources of physical link e at time t and d(z) denotes the resources required by virtual link z:

Σ_{z mapped to e} d(z) ≤ C_e(t)    (2)
Optionally, in the embodiment of the present application, in order to simplify the analysis of the arrangement problem, resource backup is not considered. In this case, one virtual node (a VNF to be placed) can be mapped onto only one physical node; likewise, one virtual link (a series relationship between VNFs) can be mapped onto only one physical link, i.e., equations (3) and (4):

Σ_{v ∈ N_p} g_s^N(f, v) = 1, for every virtual node f    (3)

Σ_{e ∈ L_p} g_s^L(z, e) = 1, for every virtual link z    (4)
Since the purpose of the orchestration is to improve the reliability of the service and reduce the cost, the design of the reward function needs to define the service reliability and the cost benefit of service function chain deployment in addition to considering the resource constraints.
The cost c_s of service arrangement is defined in equation (5); it consists of the cost of the physical node resources and the cost of the physical links:

c_s = Σ_{x ∈ N_p} k_x · d_x + Σ_{z ∈ L_p} k_z · d_z    (5)

where k_x denotes the cost of a unit resource on physical node x, d_x the amount of node resources occupied by the chain on x, k_z the cost of a unit resource on physical link z, and d_z the amount of link resources occupied on z.
The reliability r_s of the service is defined in equation (6); it consists of the reliability of the physical nodes and the reliability of the physical links used by the chain:

r_s = Π_{v used by s} r_v · Π_{e used by s} r_e    (6)

where r_v denotes the reliability of physical node v and r_e denotes the reliability of physical link e.
The reward function R is defined in equation (7), where β is an adjustable coefficient that controls the relative importance of the two factors, service reliability and arrangement cost. The design of the reward function comprehensively considers the cost and the reliability of data center service arrangement as well as the total resource constraints during arrangement:

R = β · r_s − (1 − β) · c_s, when the resource constraints (1) to (4) are satisfied    (7)
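Under the definitions above, the reward computation might be sketched as follows. Equation (7) is read here as a β-weighted trade-off between reliability and cost; this reading, the helper names and the default β and penalty values are assumptions rather than the definitive formula of the present application.

```python
def chain_cost(node_cost, node_usage, link_cost, link_usage):
    """Equation (5): cost of the node resources plus cost of the link resources."""
    return (sum(node_cost[x] * node_usage[x] for x in node_usage)
            + sum(link_cost[z] * link_usage[z] for z in link_usage))

def chain_reliability(node_rel, used_nodes, link_rel, used_links):
    """Equation (6): product of the reliabilities of the nodes and links used."""
    r = 1.0
    for v in used_nodes:
        r *= node_rel[v]
    for e in used_links:
        r *= link_rel[e]
    return r

def reward(reliability, cost, feasible, beta=0.7, penalty=-1.0):
    """Equation (7), read as a beta-weighted trade-off between reliability and
    cost, with a penalty when the resource constraints (1)-(4) are violated."""
    if not feasible:
        return penalty
    return beta * reliability - (1.0 - beta) * cost
```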
S304, updating a return value table of training samples in the initial federated reinforcement learning model according to the estimated return value to obtain the trained federated reinforcement learning model, wherein the return value table comprises the maximum return value of the virtual network function arrangement strategy of the samples at each time.
Optionally, in the embodiment of the present application, in this problem scenario, the goal of the reinforcement learning model is to find the optimal policy π, i.e., the policy whose actions A_t obtain the greatest cumulative return O_t starting from the initial state S_t, as shown in equation (8):

O_t = Σ_{k ≥ 0} γ^k · R_{t+k}    (8)

where γ is the discount factor and R is the reward value.
Optionally, in the embodiment of the present application, the reinforcement learning model used is Q-learning. In Q-learning, the optimal policy is found by updating the Q table (that is, the return value table; it is referred to as the Q table below), which finally records the maximum return value of each state. The Q table is a mapping between state-action pairs and estimated future rewards. The reinforcement learning model obtains the arrangement state of the server resources and the service function chain at time t, makes a corresponding arrangement action for that state, and estimates the future return of that state, thereby updating the Q value (that is, the return value) corresponding to the state-action pair.
The update procedure of the Q value is shown in equation (9):

Q(s, a) ← Q(s, a) + α · [r + γ · max_{a'} Q(s', a') − Q(s, a)]    (9)

where α is the learning factor, γ is the discount factor, a' denotes the behavior under policy π in the next state s', and r is the immediate reward obtained after performing action a.

Assume that the current server resource and service function chain arrangement state is s. If the currently selected action is a, the currently selected action needs to satisfy equation (10):

a = argmax_{a'} Q(s, a')    (10)
To guarantee the generalization of Q-learning and to avoid the result being trapped in a local optimum, the ε-greedy approach is usually used for action selection: with a small probability ε, a selectable action is chosen uniformly at random for exploration, and with probability 1 − ε the currently best action is selected according to equation (10).
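A compact sketch of the Q-value update of equation (9) together with the ε-greedy selection described above; the dictionary-backed Q table and the default hyper-parameters are illustrative assumptions.

```python
import random
from collections import defaultdict

N_ACTIONS = 4                                     # e.g. the number of candidate physical nodes

# Q table: maps an arrangement state to one estimated return value per action.
Q = defaultdict(lambda: [0.0] * N_ACTIONS)

def epsilon_greedy(state, epsilon=0.1):
    """With probability epsilon explore a random action; otherwise take argmax_a Q(s, a)."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    row = Q[state]
    return max(range(N_ACTIONS), key=lambda a: row[a])

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Equation (9): Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[next_state])
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
```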
In these optional embodiments, a large-scale cross-domain data-intensive application is abstracted into a service function chain based on network function virtualization technology, which improves network capacity; a service function chain arrangement strategy based on federated reinforcement learning is established with joint optimization of reliability and arrangement overhead as the target, realizing reliability-oriented service arrangement decision optimization in different environments. When the reward function is designed, resource constraints, service reliability and the cost of service function chain deployment are considered comprehensively, so that the trained reinforcement learning model takes all three factors into account, and the obtained optimal arrangement strategy of the virtual network functions gives the arranged service function chain higher reliability and lower operating cost.
In an embodiment, before the step 303, the method may specifically perform the following steps:
s401, judging whether the total amount of the residual resources of the target physical node is larger than or equal to the resources required by the sample virtual network function corresponding to the target physical node;
s402, under the condition that the total amount of the residual resources of the target physical node is larger than or equal to the resources required by the sample virtual network function corresponding to the target physical node, determining a corresponding pre-estimated return value according to the target physical node.
Optionally, in the embodiment of the present application, the resource constraints existing in service function chain placement need to be considered. For a physical node v, the total amount of resources required by the virtual network functions placed on that node cannot exceed the total amount of service resources it can provide; for a physical link e, the total amount of resources required by the service function chains placed on that link cannot exceed the total amount of service resources it can provide. Therefore, in the embodiment of the present application, the return value is calculated only when the total amount of resources required by the next virtual network function is less than or equal to the total amount of remaining resources of the candidate first physical node; otherwise, that node is not selected.
In these optional embodiments, when the reward function is designed, the problem of resource constraint is considered, so that the physical node selected each time by the virtual network function is ensured to be more accurate, and the accuracy of the arrangement strategy selected by the reinforcement learning model is improved.
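The check of steps S401 and S402 could be sketched as below; the function name and dictionary layout are assumptions for illustration.

```python
def feasible_nodes(remaining: dict, demand: float) -> list:
    """Steps S401/S402: a return value is only estimated for target physical nodes
    whose remaining resources can still accommodate the VNF's resource demand."""
    return [node for node, capacity in remaining.items() if capacity >= demand]

# Example: only s2 and s4 still have enough headroom for a demand of 3.0 units.
print(feasible_nodes({"s1": 1.0, "s2": 4.5, "s3": 2.0, "s4": 3.0}, 3.0))   # ['s2', 's4']
```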
In an embodiment, the step 304 may specifically perform the following steps:
s501, the updated return value table corresponding to each training sample is subjected to aggregation processing to obtain an aggregated return report.
Optionally, in the application embodiment, each data center is trained locally to obtain its respective reinforcement learning model, since the model used in the present disclosure is Q-learning, the model parameters are stored in a private Q table, that is, a maximum return value table, and the Q table is a mapping of state-action and reward obtaining in reinforcement learning. Secondly, using a homomorphic encryption technology to encrypt and transmit Q table information obtained from different data centers (training environments), and then sending the encrypted Q table information to a cloud control center (aggregation server).
As shown in Fig. 2, Fig. 2 is the specific workflow of federated learning in the present application. In order to ensure that the training data in the different data centers are not leaked, the federated learning framework needs to encrypt the reinforcement learning Q table obtained by training, so as to ensure the security and reliability of the transmission process and the privacy of the training data.
The federated learning model used in the present application adopts homomorphic encryption to ensure model security. Homomorphic encryption is a classical encryption algorithm; homomorphism refers to a mapping from one algebraic structure to an algebraic structure of the same type that keeps all relevant structure unchanged. Because the result obtained by performing a specific operation on homomorphically encrypted ciphertext and then decrypting it is consistent with the result obtained by decrypting the ciphertext first and then performing the same operation, homomorphic encryption can effectively ensure the confidentiality of data operations in the fusion stage of the federated model and is well suited to the federated learning scenario. Compared with secure multi-party computation, homomorphic encryption requires less data interaction, so the communication overhead is lower and the model fusion efficiency can be improved in the federated learning scenario in which multiple parties of the cross-domain data centers participate.
The process of model transmission using homomorphic encryption is roughly as follows: first, the aggregation server generates a homomorphic public-private key pair and distributes the public key to each data center participating in federated learning; second, each data center transmits its calculation result to the aggregation server in the form of homomorphic ciphertext; then the aggregation server performs the summary calculation and decrypts the result; finally, the aggregation server returns the decrypted result to each data center, and each data center updates the parameters of its reinforcement-learning-based service function chain arrangement model according to the result.
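As one possible illustration of this flow, the sketch below uses the third-party python-paillier (phe) package as an example of an additively homomorphic scheme; the patent does not prescribe a particular library or key size, so these choices are assumptions.

```python
# pip install phe  (python-paillier, an additively homomorphic encryption library)
from phe import paillier

# 1. The aggregation server generates the key pair and distributes the public key.
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# 2. Each data center encrypts its Q-table entries and uploads only ciphertexts.
q_center_a = [5.0, 4.0, 5.0, 6.0]
q_center_b = [3.0, 1.0, 10.0, 2.0]
enc_a = [public_key.encrypt(v) for v in q_center_a]
enc_b = [public_key.encrypt(v) for v in q_center_b]

# 3. The server sums the ciphertexts without seeing the plaintext values,
#    then decrypts the aggregate and averages it (the summary calculation).
enc_sum = [a + b for a, b in zip(enc_a, enc_b)]
aggregated = [private_key.decrypt(c) / 2 for c in enc_sum]

# 4. The aggregated return table is sent back to every data center.
print(aggregated)   # [4.0, 2.5, 7.5, 4.0]
```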
Optionally, in a possible implementation manner of the present application, after obtaining the maximum return value table of each training sample, the aggregation server decrypts the model parameters and performs a secure aggregation operation on them to obtain the federated model parameters, that is, the federated Q table (i.e., the aggregated return table).
Optionally, in one possible implementation of the present application, which placement action the federated model selects in the current state depends on the confidence values of the different locally trained Q tables. For example, in a certain state, the Q table trained in environment A evaluates the Q values of the next candidate actions as (5, 4, 5, 6, 5), while the Q table in environment B evaluates them as (3, 1, 10, 2, 2, 1, 1, 2, 10, 1); the Q values obtained in environment B are considered more reliable because they show larger differences, so the action corresponding to the Q table of environment B is selected in that state.
S502, distributing the aggregated return table back to each training sample, so that each training sample determines a new return value table corresponding to it according to the aggregated return table.
Optionally, in the embodiment of the application, the aggregation server encrypts the generated federated learning model (i.e., the aggregated return table) and sends it back to each data center (i.e., each training sample) for updating the model; after the agent in each data center obtains the federated learning model, it fuses the local model parameters (i.e., the original return value table) with the federated model parameters (i.e., the aggregated return table), so that decision-making experience can be shared among multiple training environments. In addition to model sharing in the training environments, the federated learning model can also be applied to a newly added data center service deployment scenario (a new environment), where it serves as the basis for model training in the new environment and participates in subsequent model training.
And S503, under the condition that the error between the new return value table and the updated return value table is smaller than a preset error threshold value, obtaining a trained federal reinforcement learning model.
Optionally, in these alternative embodiments, after generating the federal learning model, the aggregation server may send the model parameters, i.e., the aggregated report back, to different data centers. Each data center updates the original maximum return value table of the local model by using the formula (11), so that the local model can learn more features.
$Q_{new} = \alpha \cdot Q_{old} + (1 - \alpha) \cdot Q_{fl}$  (11)
where $Q_{old}$ denotes the original Q table, i.e., the original maximum return value table, $Q_{new}$ denotes the newly obtained Q table, i.e., the new maximum return value table, $\alpha$ denotes the weight of the original Q table, and $Q_{fl}$ is the aggregated report back.
Optionally, after generating the new local model, the data center may resend the new Q table to the aggregation server for aggregation, generating a new federal learning model, that is, a new aggregated report back. This process is then iterated until, in every training sample, the error between the Q table of the new model and the Q table of the original model is smaller than a set error threshold δ; the error is calculated as shown in formula (12), using the Euclidean distance between the matrices.
$\lVert Q_{new} - Q_{old} \rVert_{2} = \sqrt{\sum_{i,j}\left(Q_{new}(i,j) - Q_{old}(i,j)\right)^{2}} < \delta$  (12)
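The following sketch illustrates the local fusion of formula (11) and the convergence test of formula (12); the weight alpha, the threshold delta, the table sizes, and the random Q tables are placeholder assumptions.

# Sketch of local model fusion (formula (11)) and the Euclidean-distance
# convergence test (formula (12)); alpha, delta, and table sizes are placeholders.
import numpy as np

def fuse_q_tables(q_old, q_fl, alpha=0.7):
    """Formula (11): blend the original local Q table with the federal one."""
    return alpha * q_old + (1.0 - alpha) * q_fl

def has_converged(q_new, q_old, delta=1e-3):
    """Formula (12): Euclidean distance between successive Q tables below delta."""
    return np.linalg.norm(q_new - q_old) < delta

q_local = np.random.rand(6, 4)       # k states x m actions (placeholder sizes)
q_federal = np.random.rand(6, 4)     # aggregated report back from the server

while True:
    q_updated = fuse_q_tables(q_local, q_federal)
    if has_converged(q_updated, q_local):
        break
    q_local = q_updated
    # In the full procedure the data center would retrain locally here and
    # upload the new Q table, receiving a fresh aggregated report back.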
In these optional embodiments, the service function chain arrangement mechanism based on federal reinforcement learning deploys a reinforcement learning model locally in each cross-domain data center as the training basis, fuses the training parameters of multiple data centers to obtain a federal model, and issues the federal model to the training environments and the new environment for model correction. This solves the problem of the interaction overhead of raw service data, reduces data transmission, saves bandwidth, and reduces the packet loss rate.
In these optional embodiments, the federal reinforcement learning model is provided to solve the problem of cooperative decision-making for cross-domain data center service function chain arrangement in a distributed cloud-edge collaboration scenario, so that training subjects in different environments can benefit from the group's training experience and reach an optimal arrangement decision oriented toward improved reliability and reduced cost. First, the service function chain arrangement problem is defined and its mathematical model is established; second, a reinforcement learning service arrangement model and a federal reinforcement learning model that take network service reliability as the optimization target are established, and the arrangement method provided by the application can achieve arrangement with reliability guarantees and resource overhead control.
In an embodiment, step S501 may specifically include the following steps:
S601, determining a confidence according to the updated return value table of each training sample, where the confidence is used to represent the degree of trust in each return value table;
S602, determining the aggregated report back according to the confidence.
Optionally, in the embodiment of the present application, there are many ways to define the confidence; common choices include variance and information entropy. Variance, however, neglects the influence of extreme values, so its range of application is limited. By contrast, information entropy is better suited to describing the uncertainty of information. Therefore, the confidence value is defined here with reference to the way information entropy is computed.
The confidence value $c_{xj}$ of data center j (i.e., each training sample) in state x is defined as shown in formula (13).
$c_{xj} = -\sum_{i=1}^{m} \frac{reward_{ij}}{\sum_{k=1}^{m} reward_{kj}} \ln \frac{reward_{ij}}{\sum_{k=1}^{m} reward_{kj}}$  (13)
where m is the action dimension (i.e., the number of selectable physical nodes) and $reward_{ij}$ denotes a return value, the entry indexed by ij in the return value table.
Based on the above calculation of the confidence value, the confidence $w_{xj}$ is defined as in formula (14).
$w_{xj} = \frac{c_{xj}}{\sum_{j'} c_{xj'}}$  (14)
Assuming that k states exist in the reinforcement learning environment and the Q table is a k × m dimensional matrix, the federal model generation formula is shown in formula (15).
$Q_{fl}(x,\cdot) = \sum_{j} w_{xj}\, Q_{j}(x,\cdot), \quad x = 1,\dots,k$  (15)
where $Q_{j}$ is the Q table of the data center (i.e., training sample) j participating in training, and $Q_{fl}$ is the aggregated report back.
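The sketch below illustrates one way formulas (13)-(15) could be realized, under the assumption that the confidence value is the information entropy of the per-state normalized return values and that the weights are their per-state normalization; positive Q-table entries, the table sizes, and the helper names are illustrative assumptions.

# Sketch of the confidence-weighted generation of the federal Q table
# (formulas (13)-(15)), assuming an entropy-based confidence value computed
# per state and per participating data center.
import numpy as np

def confidence_values(q_tables):
    """c[x, j]: entropy-based confidence value of data center j in state x (formula (13))."""
    k = q_tables[0].shape[0]
    c = np.zeros((k, len(q_tables)))
    for j, q in enumerate(q_tables):
        p = q / q.sum(axis=1, keepdims=True)     # normalise the m return values per state
        c[:, j] = -(p * np.log(p)).sum(axis=1)
    return c

def federal_q_table(q_tables):
    """Q_fl(x, .) = sum_j w[x, j] * Q_j(x, .) with normalised confidences (formulas (14)-(15))."""
    c = confidence_values(q_tables)
    w = c / c.sum(axis=1, keepdims=True)          # formula (14)
    stacked = np.stack(q_tables)                  # shape (n_envs, k, m)
    return np.einsum('xj,jxm->xm', w, stacked)    # formula (15)

q_tables = [np.random.rand(6, 4) + 0.1 for _ in range(3)]   # positive entries assumed
q_fl = federal_q_table(q_tables)
print(q_fl.shape)                                 # (6, 4): one federal Q row per state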
Optionally, in the embodiment of the present application, in order to ensure the adaptability of the federal model in a new environment, model migration and correction need to be performed on the obtained federal learning model. FIG. 3 shows the basic flow of federal learning model migration and correction. First, the reinforcement learning models obtained from the different data centers are fused to generate a federal model. Second, the federal model is transmitted to the original data centers (training environments) and the new data center (new environment); in a training environment the local model is fused with the federal model in the manner shown in formula (11), while in the new environment the federal model is used as a pre-trained model and reinforcement learning training is continued until the model converges. In actual use, the training and fusion steps are usually repeated several times.
Optionally, in a possible implementation of the present application, when the federal learning model is used as a pre-training model for reinforcement learning training in a new environment, the convergence rate of reinforcement learning is increased because the experience of earlier training is already available. Fig. 4 shows the training process of the model in the new environment, with the number of convergence iterations on the abscissa and the return value on the ordinate; taking the coordinate (0, 1600) as a reference, curve B below it represents the return of each training round when the reinforcement learning model is retrained from scratch in the new environment, and curve A above it represents the training effect when the federal model is used as the pre-training model. As can be seen from the figure, with pre-training the training result becomes stable after about 10,000 training iterations, whereas retraining from scratch takes about 800,000 iterations.
In the optional embodiments, a reinforcement learning model is locally deployed in the cross-domain data center as a training basis, training parameters of multiple data centers are further fused to obtain a federal model, and the federal model is issued to a training environment and a new environment for model correction, so that the problem of interactive overhead of original service data is solved, data transmission is reduced, bandwidth is saved, and packet loss rate is reduced.
Optionally, in order to verify the service arrangement effect of the federal reinforcement learning model provided by the invention, a suitable simulation environment and arrangement task need to be designed. The simulation uses 3 different environments corresponding to different data centers: 2 are training environments with prior arrangement experience, and 1 is a new environment without arrangement experience. The simulation scenario is shown in fig. 5. The training processes of the 3 environments are independent, and model information is exchanged only after the federal reinforcement learning model is generated.
The physical network in each environment comprises 20 nodes, each with its own reliability and VNF load. Because the physical distances between nodes differ, the cost of placing a VNF is proportional to the distance between the node selected this time and the node on which the previous VNF was placed; the service function chain to be orchestrated consists of 4 VNFs in series.
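A compact sketch of how such a simulation environment might be specified is given below; apart from the stated parameters (20 nodes, a chain of 4 VNFs, cost proportional to the distance from the previously selected node), all concrete values and helper names are illustrative assumptions.

# Illustrative specification of one simulated environment: 20 physical nodes
# with individual reliability and VNF load, placement cost proportional to the
# distance from the previously chosen node, and a chain of 4 VNFs to place.
import numpy as np

rng = np.random.default_rng(0)

N_NODES, CHAIN_LEN = 20, 4
node_reliability = rng.uniform(0.90, 0.999, size=N_NODES)   # per-node reliability
node_load = rng.uniform(0.1, 0.9, size=N_NODES)             # current VNF load on each node
positions = rng.uniform(0, 100, size=(N_NODES, 2))          # node coordinates

def placement_cost(prev_node, node):
    """Cost of placing the next VNF: proportional to inter-node distance."""
    return float(np.linalg.norm(positions[prev_node] - positions[node]))

def evaluate_orchestration(nodes):
    """Reliability and cost of one candidate placement of the 4-VNF chain."""
    reliability = float(np.prod(node_reliability[nodes]))
    cost = sum(placement_cost(a, b) for a, b in zip(nodes[:-1], nodes[1:]))
    return reliability, cost

print(evaluate_orchestration([2, 7, 7, 15]))   # example placement of the 4 VNFs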
First, the 2 training environments are trained independently with reinforcement learning, simulating the process of accumulating experience in the early stage. After the reinforcement learning models in the training environments converge, the training environments are considered to have a certain amount of arrangement experience, at which point the trained models are fused through the federated learning framework to generate a federal model. The generated federal model is then transmitted to the 2 training environments and the 1 new environment, and model fusion is carried out in each environment.
The effectiveness of the federal learning model provided by the proposal is verified by comparing the arrangement effects of the federal learning model and an independently trained reinforcement learning model in a new environment and a training environment.
In the simulation experiments, the training effects of a retrained reinforcement learning model and of the federal learning model are compared, and the performance of the arrangement model in terms of service reliability and resource overhead is evaluated in both the new environment and the training environments. The comparison indices used are the reliability of the service function chain and the resource cost of service orchestration. The simulation results obtained over a number of model training runs are shown in figs. 6-9.
Fig. 6 and fig. 7 compare the learning effect, in the training environment, of the service function chain arrangement model based on federal reinforcement learning proposed in the present invention with that of a prior-art arrangement model based on reinforcement learning. Fig. 6 shows the simulation results for service reliability, with the number of convergence iterations on the abscissa and the service reliability on the ordinate; taking the coordinate (0, 0.834) as a reference, curve A above it is the simulation result of the federal learning model of the present application, and curve B below it is the simulation result of the reinforcement learning model Q-learning. Fig. 7 shows the simulation results for resource cost, with the number of convergence iterations on the abscissa and the cost (i.e., resource overhead) on the ordinate; taking the coordinate (0, 31) as a reference, curve B above it is the simulation result of the Q-learning model, and curve A below it is the simulation result of the federal learning model of the present application. The simulation results show that, in terms of service reliability, the method provided by the invention reaches a high reliability at the beginning of training and remains stable, whereas the Q-learning model needs about 400,000 training iterations to reach a comparably high reliability value; in terms of resource overhead, the method provided by the invention consumes less cost, and within the limited training the reinforcement learning model still cannot reach the level achieved by the federal model. The method provided by the invention, when training and making decisions in a training environment, therefore achieves higher service reliability and consumes fewer resources.
Fig. 8 and fig. 9 compare the learning effect, in the new environment, of the service function chain arrangement model based on federal reinforcement learning proposed in the present invention with that of a prior-art arrangement model based on reinforcement learning. Fig. 8 shows the simulation results for service reliability, with the number of convergence iterations on the abscissa and the service reliability on the ordinate; taking the coordinate (0, 0.870) as a reference, curve A above it is the simulation result of the federal learning model of the present application, and curve B below it is the simulation result of the reinforcement learning model Q-learning. Fig. 9 shows the simulation results for resource cost, with the number of convergence iterations on the abscissa and the cost (i.e., resource overhead) on the ordinate; taking the coordinate (0, 17) as a reference, curve B above it is the simulation result of the Q-learning model, and curve A below it is the simulation result of the federal learning model of the present application. The simulation results show that, in terms of service reliability, the method provided by the invention reaches the convergence state quickly and keeps the service reliability at a high value throughout training; in terms of the resource overhead of the service function chain, the federal model also converges quickly and consumes less cost. The method provided by the invention, when training and making decisions in a new environment, therefore achieves higher service reliability and consumes fewer resources.
Next, the training effect of the federal reinforcement learning proposed by the present invention is compared with other existing methods.
TABLE 1 comparison of the effects of the algorithms
As shown in table 1, the selected comparison methods are a reliability-first greedy algorithm and a cost-first greedy algorithm, which are commonly used when guaranteeing reliability or controlling service arrangement cost. A service function chain consisting of 4 VNFs is selected in the new training environment for algorithm simulation, and the reliability and cost of the target service function chain are compared.
As can be seen from the comparison of algorithm effects in Table 1, the method provided by the invention provides guarantees in terms of both reliability and cost. Compared with the reliability-first greedy algorithm, the federal reinforcement learning method consumes less cost; compared with the cost-first greedy algorithm, it improves service reliability.
Combining the above analysis, the service function chain determining method based on federal reinforcement learning provided by the invention can effectively guarantee service reliability and reduce the resource cost of the service, while achieving fast and effective real-time arrangement decisions in a multi-data-center environment.
In these alternative embodiments, the service orchestration scenario focuses on cross-data-center arrangement and is applicable to a variety of big data service applications, for example large-scale cross-domain data-intensive applications such as communication range cards, whose service functions (signaling data acquisition, encryption and decryption, storage, transmission, analysis and processing, etc.) need to be chained and deployed on different servers. The invention aims to design a service function chain arrangement strategy and obtain an arrangement scheme that improves service reliability and reduces service cost. However, in geographically distributed data centers the resource conditions and service requirements of servers differ, underlying data interaction between data centers incurs a large amount of communication resource overhead, and service data cannot be shared among some data centers because of privacy concerns; consequently, a reinforcement learning arrangement model trained on a single data center is not universal and cannot obtain the best arrangement effect and arrangement speed across multiple data centers. The service function chain arrangement strategy based on federal reinforcement learning designed by the invention overcomes the problem of sharing business data for cross-domain data center arrangement.
Fig. 10 is a schematic structural diagram of a service function chain determination device according to another embodiment of the present application, and only a part related to the embodiment of the present application is shown for convenience of description.
Referring to fig. 10, the service function chain determining apparatus may include:
an obtaining module 1001, configured to obtain a plurality of virtual network functions and a plurality of physical nodes, where the plurality of virtual network functions have a first preset execution order, and each virtual network function is implemented on one physical node;
a first determining module 1002, configured to determine multiple reporting values corresponding to multiple orchestration policies, where each orchestration policy is used to indicate a physical node corresponding to each virtual network function, and the reporting value is used to characterize an execution success rate of each physical node in the orchestration policy to implement the virtual network function;
a second determining module 1003, configured to use the orchestration policy corresponding to the maximum reward value as a service function chain orchestration of the multiple virtual network functions.
In an embodiment, the service function chain determining apparatus may further include:
and the third determination module is used for determining a plurality of report values corresponding to the plurality of arrangement strategies by using a trained federated reinforcement learning model, wherein the reinforcement learning model comprises a plurality of training samples, each training sample comprises the total amount of target resources of a plurality of first physical nodes and the arrangement strategies corresponding to a plurality of sample virtual network functions, and each sample virtual network function is realized on one first physical node.
In one embodiment, the federated reinforcement learning model includes:
the second acquisition module is used for acquiring the plurality of training samples, and the sample virtual network functions of the plurality of training samples have a second preset execution sequence;
a fourth determining module, configured to determine, according to the second preset execution order, a target physical node for each sample virtual network function from the multiple first physical nodes;
a fifth determining module, configured to determine, according to the target physical node, a corresponding pre-estimated return value;
and the first updating module is used for updating a return value table of training samples in an initial federated reinforcement learning model according to the estimated return value to obtain the federated reinforcement learning model, wherein the return value table comprises the maximum return value of the virtual network function arrangement strategy of each sample.
In an embodiment, the service function chain determining apparatus may further include:
the first judgment module is used for judging whether the total amount of the residual resources of the target physical node is greater than or equal to the resources required by the sample virtual network function corresponding to the target physical node;
a sixth determining module, configured to determine, according to the target physical node, a corresponding pre-estimated return value when a total amount of remaining resources of the target physical node is greater than or equal to resources required by a sample virtual network function corresponding to the target physical node.
In an embodiment, the service function chain determining apparatus may further include:
the first aggregation module is used for aggregating the updated return value table corresponding to each training sample to obtain an aggregated return report table;
the distribution module is used for distributing the aggregated report back to each training sample so that each training sample can determine a new return value table corresponding to each training sample according to the aggregated report back;
and the seventh determining module is used for obtaining the trained federal reinforcement learning model under the condition that the error between the new return value table and the updated return value table is smaller than a preset error threshold value.
In an embodiment, the service function chain determining apparatus may further include:
an eighth determining module, configured to determine a confidence according to the updated return value table of each training sample, where the confidence is used to represent the degree of trust in each return value table;
and the ninth determining module is used for determining the aggregated return report according to the confidence coefficient.
It should be noted that the contents of information interaction, execution processes, and the like between the above-mentioned devices/units are based on the same concept as the method embodiments of the present application; these are devices corresponding to the above-mentioned service function chain determining method, and all implementations in the above method embodiments are applicable to the device embodiment. For their specific functions and technical effects, reference may be made to the method embodiment section, which is not repeated here.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Fig. 11 shows a hardware structure diagram of an electronic device provided in an embodiment of the present application.
The device may include a processor 1101 and a memory 1102 in which program instructions are stored.
The steps in any of the various method embodiments described above are implemented when the program is executed by the processor 1101.
Illustratively, the programs may be divided into one or more modules/units, which are stored in the memory 1102 and executed by the processor 1101 to complete the application. One or more modules/units may be a series of program instruction segments capable of performing certain functions and describing the execution of programs on the device.
Specifically, the processor 1101 may include a central processing unit (CPU), or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
Memory 1102 may include mass storage for data or instructions. By way of example and not limitation, memory 1102 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, a universal serial bus (USB) drive, or a combination of two or more of these. Memory 1102 may include removable or non-removable (or fixed) media, where appropriate. Memory 1102 may be internal or external to the electronic device, where appropriate. In a particular embodiment, the memory 1102 is a non-volatile solid-state memory.
The memory may include read-only memory (ROM), random access memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, and electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions, and when the software is executed (e.g., by one or more processors), it is operable to perform the operations described with reference to the methods according to aspects of the disclosure.
The processor 1101 reads and executes the program instructions stored in the memory 1102 to implement any one of the methods in the above-described embodiments.
In one example, the electronic device can also include a communication interface 1103 and a bus 1110. The processor 1101, the memory 1102, and the communication interface 1103 are connected via a bus 1110 to complete communication therebetween.
The communication interface 1103 is mainly used for implementing communication between modules, apparatuses, units and/or devices in this embodiment.
Bus 1110 includes hardware, software, or both, coupling the components of the electronic device to one another. By way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a low pin count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), another suitable bus, or a combination of two or more of these. Bus 1110 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
In addition, the embodiments of the present application may provide a storage medium to implement the method in the embodiments. The storage medium having stored thereon program instructions; which when executed by a processor implements any of the methods of the above embodiments.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the foregoing method embodiment, and the same technical effect can be achieved.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.
Embodiments of the present application provide a computer program product, where the program product is stored in a storage medium, and the program product is executed by at least one processor to implement the processes of the foregoing method embodiments, and can achieve the same technical effects, and in order to avoid repetition, details are not described here again.
It is to be understood that the present application is not limited to the particular arrangements and instrumentalities described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions, or change the order between the steps, after comprehending the spirit of the present application.
The functional blocks shown in the above structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, electronic circuits, application-specific integrated circuits (ASICs), suitable firmware, plug-ins, function cards, and so on. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The programs or code segments can be stored in a machine-readable medium or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and so forth. The code segments may be downloaded via a computer network such as the Internet or an intranet.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As will be apparent to those skilled in the art, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims (10)

1. A method for service function chain determination, the method comprising:
the method comprises the steps of obtaining a plurality of virtual network functions and a plurality of physical nodes, wherein the virtual network functions have a first preset execution sequence, and each virtual network function is realized on one physical node;
determining a plurality of return values corresponding to a plurality of arrangement strategies, wherein each arrangement strategy is used for indicating a physical node corresponding to each virtual network function, and the return values are used for representing the execution success rate of each physical node in the arrangement strategies for realizing the virtual network function;
and using the arrangement strategy corresponding to the maximum return value as the service function chain arrangement of the plurality of virtual network functions.
2. The method of claim 1, wherein determining a plurality of reward values for a plurality of orchestration policies comprises:
and determining a plurality of return values corresponding to the plurality of arrangement strategies by using a trained federal reinforcement learning model, wherein the reinforcement learning model comprises a plurality of training samples, each training sample comprises the total amount of target resources of a plurality of first physical nodes and the arrangement strategies corresponding to a plurality of sample virtual network functions, and each sample virtual network function is realized on one first physical node.
3. The method of claim 2, wherein prior to determining the plurality of reward values corresponding to the plurality of orchestration strategies using the trained federated reinforcement learning model, the method further comprises:
obtaining a plurality of training samples, wherein the sample virtual network functions of the training samples have a second preset execution sequence;
determining a target physical node for each sample virtual network function from the plurality of first physical nodes according to the second preset execution sequence;
determining a corresponding pre-estimated return value according to the target physical node;
and updating a return value table of a training sample in the initial federal reinforcement learning model according to the estimated return value to obtain the trained federal reinforcement learning model, wherein the return value table comprises the maximum return value of the virtual network function arrangement strategy of each sample.
4. The method of claim 3, wherein prior to said determining a corresponding pre-estimated return value from said target physical node, said method further comprises:
judging whether the total amount of the residual resources of the target physical node is greater than or equal to the resources required by the sample virtual network function corresponding to the target physical node;
and under the condition that the total amount of the residual resources of the target physical node is greater than or equal to the resources required by the sample virtual network function corresponding to the target physical node, determining a corresponding pre-estimated return value according to the target physical node.
5. The method according to claim 3, wherein the updating the report value table of the training samples in the initial federated reinforcement learning model according to the estimated report value to obtain the trained federated reinforcement learning model comprises:
performing aggregation processing on the updated return value table corresponding to each training sample to obtain an aggregated return report;
distributing the aggregated report back to each training sample so that each training sample determines a new return value table corresponding to each training sample according to the aggregated report back;
and under the condition that the error of the new return value table and the updated return value table is smaller than a preset error threshold value, obtaining the trained federal reinforcement learning model.
6. The method according to claim 5, wherein the aggregating the updated report back value table corresponding to each training sample to obtain an aggregated report back table comprises:
determining confidence degrees according to the updated return value table of each training sample, wherein the confidence degrees are used for representing the trust degrees of the return value tables;
and determining the aggregated report back according to the confidence.
7. A service function chain determining apparatus applied to a first device, the apparatus comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a plurality of virtual network functions and a plurality of physical nodes, the virtual network functions have a first preset execution sequence, and each virtual network function is realized on one physical node;
the system comprises a first determining module, a second determining module and a judging module, wherein the first determining module is used for determining a plurality of return values corresponding to a plurality of arrangement strategies, each arrangement strategy is used for indicating a physical node corresponding to each virtual network function, and the return values are used for representing the execution success rate of each physical node in the arrangement strategies for realizing the virtual network function;
and the second determining module is used for taking the scheduling strategy corresponding to the maximum return value as the service function chain scheduling of the plurality of virtual network functions.
8. An electronic device, characterized in that the device comprises: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the service function chain determination method of any of claims 1-6.
9. A computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the service function chain determination method according to any one of claims 1 to 6.
10. A computer program product, characterized in that instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform the service function chain determination method according to any of claims 1-6.
CN202310239063.1A 2023-03-14 2023-03-14 Service function chain determining method, device, equipment, medium and product Active CN115955402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310239063.1A CN115955402B (en) 2023-03-14 2023-03-14 Service function chain determining method, device, equipment, medium and product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310239063.1A CN115955402B (en) 2023-03-14 2023-03-14 Service function chain determining method, device, equipment, medium and product

Publications (2)

Publication Number Publication Date
CN115955402A true CN115955402A (en) 2023-04-11
CN115955402B CN115955402B (en) 2023-08-01

Family

ID=87282822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310239063.1A Active CN115955402B (en) 2023-03-14 2023-03-14 Service function chain determining method, device, equipment, medium and product

Country Status (1)

Country Link
CN (1) CN115955402B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160294643A1 (en) * 2015-04-03 2016-10-06 Electronics And Telecommunications Research Institute System and method for service orchestration in distributed cloud environment
US20170288971A1 (en) * 2016-04-04 2017-10-05 Brocade Communications Systems, Inc. Constraint-Based Virtual Network Function Placement
CN112083933A (en) * 2020-08-27 2020-12-15 重庆大学 Service function chain deployment method based on reinforcement learning
CN112637032A (en) * 2020-11-30 2021-04-09 中国联合网络通信集团有限公司 Service function chain deployment method and device
CN113918277A (en) * 2021-09-18 2022-01-11 浙江工业大学 Data center-oriented service function chain optimization arrangement method and system
CN114827284A (en) * 2022-04-21 2022-07-29 中国电子技术标准化研究院 Service function chain arrangement method and device in industrial Internet of things and federal learning system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
曹睿: "基于可靠性的服务功能链部署策略研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》, no. 02, pages 136 - 844 *

Also Published As

Publication number Publication date
CN115955402B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
Ghobaei-Arani et al. A cost-efficient IoT service placement approach using whale optimization algorithm in fog computing environment
Qiao et al. Trustworthy edge storage orchestration in intelligent transportation systems using reinforcement learning
US20220114475A1 (en) Methods and systems for decentralized federated learning
Kaur et al. A systematic review on task scheduling in Fog computing: Taxonomy, tools, challenges, and future directions
Li et al. Performance-aware cost-effective resource provisioning for future grid IoT-cloud system
Salimian et al. Toward an autonomic approach for Internet of Things service placement using gray wolf optimization in the fog computing environment
Khezri et al. Deep reinforcement learning for dynamic reliability aware NFV-based service provisioning
Ng et al. A survey of coded distributed computing
Ceselli et al. Optimized assignment patterns in Mobile Edge Cloud networks
Basu et al. A reverse path-flow mechanism for latency aware controller placement in vsdn enabled 5g network
Yang et al. Edge computing in the dark: Leveraging contextual-combinatorial bandit and coded computing
Buyukates et al. Gradient coding with dynamic clustering for straggler-tolerant distributed learning
Zhou et al. Blockchain-based trustworthy service caching and task offloading for intelligent edge computing
Xu et al. A blockchain-based federated learning scheme for data sharing in industrial internet of things
Sana et al. Improved particle swarm optimization based on blockchain mechanism for flexible job shop problem
Ren et al. A memetic algorithm for cooperative complex task offloading in heterogeneous vehicular networks
Aliyu et al. Management of cloud resources and social change in a multi-tier environment: a novel finite automata using ant colony optimization with spanning tree
Mahmoudi et al. OPTIMAL ENERGY CONSUMPTION AND COST PERFORMANCE SOLUTION WITH DELAY CONSTRAINTS ON FOG COMPUTING.
Akhtarshenas et al. Federated Learning: A Cutting-Edge Survey of the Latest Advancements and Applications
CN115955402B (en) Service function chain determining method, device, equipment, medium and product
Asheralieva et al. Efficient dynamic distributed resource slicing in 6g multi-access edge computing networks with online admm and message passing graph neural networks
Asheralieva et al. Ultra-reliable low-latency slicing in space-air-ground multi-access edge computing networks for next-generation internet of things and mobile applications
Lei et al. A heuristic services binding algorithm to improve fault-tolerance in microservice based edge computing architecture
Goudarzi et al. A mathematical optimization model using red deer algorithm for resource discovery in CloudIoT
Zhu The Construction of University Network Education System Based on Mobile Edge Computing in the Era of Big Data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant