CN115955402B - Service function chain determining method, device, equipment, medium and product - Google Patents


Info

Publication number
CN115955402B
Authority
CN
China
Prior art keywords
virtual network
training
sample
reinforcement learning
federal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310239063.1A
Other languages
Chinese (zh)
Other versions
CN115955402A (en)
Inventor
尚晶
肖智文
武智晖
郭志伟
陈卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Information Technology Co Ltd
Priority to CN202310239063.1A
Publication of CN115955402A
Application granted
Publication of CN115955402B
Legal status: Active


Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a service function chain determining method, a device, equipment, a medium and a product, wherein the method comprises the following steps: acquiring a plurality of virtual network functions and a plurality of physical nodes, wherein the plurality of virtual network functions have a first preset execution sequence, and each virtual network function is realized on one physical node; determining a plurality of return values corresponding to a plurality of arrangement strategies, wherein each arrangement strategy is used for indicating a physical node corresponding to each virtual network function, and the return values are used for representing the execution success rate of each physical node in the arrangement strategy for realizing the virtual network function; and arranging the arrangement strategy corresponding to the maximum return value as a service function chain of a plurality of virtual network functions. The embodiment of the application can improve the reliability of service function chain arrangement.

Description

Service function chain determining method, device, equipment, medium and product
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, a medium, and a product for determining a service function chain.
Background
With the vigorous development of big data and artificial intelligence, the reasonable processing, mining and application of large amounts of geographically distributed data resources are key to realizing the value of data and enabling various industries. Due to the real-time and reliability requirements of data processing, the computing mode has changed from a centralized processing mode to a distributed collaborative computing mode: service providers deploy edge data centers at multiple locations, and large-scale data processing tasks are scheduled to geographically dispersed data centers for processing based on network function virtualization (Network Function Virtualization, NFV) technologies, so that more efficient and intelligent network services can be provided near the user locations. The network service functions are mainly carried by a service function chain (Service Function Chain, SFC), which refers to an ordered sequence of virtualized network functions (Virtual Network Function, VNF) through which traffic passes. Service function chain orchestration (SFC orchestration) refers to deploying VNFs in a reasonable manner to achieve service performance or economic optimization for a given set of known SFCs under certain constraints.
SFC orchestration can learn from the interaction between the environment and an agent through a reinforcement learning model, but a single data center lacks experience in orchestrating different types of services in different network environments and needs to rely on the business processing experience of multiple data centers. However, the business data of different data centers are governed by different companies and are difficult to collect for unified training.
That is, the current SFC layout has a problem of low reliability.
Disclosure of Invention
The service function chain determining method, device, equipment, medium and product can improve the reliability of service function chain arrangement.
In a first aspect, an embodiment of the present application provides a service function chain determining method, where the method includes:
acquiring a plurality of virtual network functions and a plurality of physical nodes, wherein the plurality of virtual network functions have a first preset execution sequence, and each virtual network function is realized on one physical node;
determining a plurality of return values corresponding to a plurality of arrangement strategies, wherein each arrangement strategy is used for indicating physical nodes corresponding to each virtual network function, and the return values are used for representing the success rate of executing the virtual network functions by each physical node in the arrangement strategy;
And arranging the arrangement strategy corresponding to the maximum return value as a service function chain of the plurality of virtual network functions.
In an embodiment, the determining a plurality of return values corresponding to the plurality of arrangement policies further includes:
and determining a plurality of return values corresponding to the plurality of arrangement strategies by using a trained federal reinforcement learning model, wherein the reinforcement learning model comprises a plurality of training samples, each training sample comprises the total amount of target resources of a plurality of first physical nodes, and the arrangement strategies corresponding to a plurality of sample virtual network functions, and each sample virtual network function is realized on one first physical node.
In an embodiment, before the determining, by using the trained federal reinforcement learning model, a plurality of return values corresponding to the plurality of scheduling policies, the method further includes:
acquiring the plurality of training samples, wherein the sample virtual network functions of the plurality of training samples have a second preset execution sequence;
determining target physical nodes for each sample virtual network function from the plurality of first physical nodes according to the second preset execution sequence;
determining a corresponding estimated return value according to the target physical node;
And updating a return value table of training samples in the initial federal reinforcement learning model according to the estimated return value to obtain a trained federal reinforcement learning model, wherein the return value table comprises the maximum return value of each sample virtual network function arrangement strategy.
In one embodiment, before the determining the corresponding estimated return value according to the target physical node, the method further includes:
judging whether the total amount of the residual resources of the target physical node is larger than or equal to the resources required by the sample virtual network function corresponding to the target physical node;
and under the condition that the total amount of the residual resources of the target physical node is greater than or equal to the resources required by the sample virtual network function corresponding to the target physical node, determining a corresponding estimated return value according to the target physical node.
In an embodiment, updating the report value table of the training sample in the initial federal reinforcement learning model according to the estimated report value, and obtaining the trained federal reinforcement learning model includes:
carrying out aggregation treatment on the updated return value table corresponding to each training sample to obtain an aggregated return table;
distributing the aggregated return report to each training sample so that each training sample can determine a new return value table corresponding to each training sample according to the aggregated return report;
And under the condition that the errors of the new return value table and the updated return value table are smaller than a preset error threshold value, obtaining the trained federal reinforcement learning model.
In an embodiment, the aggregating processing is performed on the updated report value table corresponding to each training sample to obtain an aggregated report table, and the method further includes:
according to the updated return value table of each training sample, determining a confidence coefficient, wherein the confidence coefficient is used for representing the trust degree of each return value table;
and determining the aggregated return report according to the confidence level.
In a second aspect, the present application provides a service function chain determining apparatus, the apparatus comprising:
the system comprises an acquisition module, a storage module and a control module, wherein the acquisition module is used for acquiring a plurality of virtual network functions and a plurality of physical nodes, the plurality of virtual network functions have a preset execution sequence, and each virtual network function is realized on one physical node;
the first determining module is used for determining a plurality of return values corresponding to a plurality of arrangement strategies, each arrangement strategy is used for indicating a physical node corresponding to each virtual network function, and the return values are used for representing the execution success rate of each physical node in the arrangement strategy for realizing the virtual network function;
And the second determining module is used for arranging the arrangement strategy corresponding to the maximum return value as a service function chain of the plurality of virtual network functions.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory storing computer program instructions;
the processor when executing the computer program instructions implements the service function chain determination method as in any one of the embodiments of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium, on which computer program instructions are stored, which when executed by a processor implement a service function chain determination method as in any one of the embodiments of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product, where instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform a service function chain determination method implementing any one of the embodiments of the first aspect described above.
In the method, the device, the equipment, the medium and the product for determining the service function chain provided by the embodiment of the application, a plurality of virtual network functions and a plurality of physical nodes are obtained, wherein the plurality of virtual network functions have a first preset execution sequence, and each virtual network function is realized on one physical node; determining a plurality of return values corresponding to a plurality of arrangement strategies, wherein each arrangement strategy is used for indicating physical nodes corresponding to each virtual network function, and the return values are used for representing the success rate of executing the virtual network functions by each physical node in the arrangement strategy; and arranging the arrangement strategy corresponding to the maximum return value as a service function chain of the plurality of virtual network functions. By the method, the return values of different arrangement strategies of the virtual network functions in the corresponding physical nodes can be obtained, and the arrangement strategy with the largest return value is selected as the arrangement strategy with the optimal virtual network function, so that the virtual network function can obtain the service function chain according to the optimal arrangement strategy, the probability of successful execution of the virtual network function in the corresponding physical nodes can be ensured to be higher, and the arrangement reliability of the service function chain is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described, and it is possible for a person skilled in the art to obtain other drawings according to these drawings without inventive effort.
FIG. 1 is a flow chart of a method for determining a service function chain according to one embodiment of the present application;
FIG. 2 is a schematic workflow diagram of a federal learning model provided in one embodiment of the present application;
FIG. 3 is a schematic diagram of a basic flow of federal learning model migration and correction provided in one embodiment of the present application;
FIG. 4 is a diagram of simulation results of a simulation experiment provided in one embodiment of the present application;
FIG. 5 is a schematic illustration of a simulation experiment provided in one embodiment of the present application;
FIG. 6 is a graph of simulation results of another simulation experiment provided in one embodiment of the present application;
FIG. 7 is a graph of simulation results of another simulation experiment provided in one embodiment of the present application;
FIG. 8 is a graph of simulation results of another simulation experiment provided in one embodiment of the present application;
FIG. 9 is a graph of simulation results of another simulation experiment provided in one embodiment of the present application;
Fig. 10 is a schematic structural diagram of a service function chain determining apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the disclosure.
It should be noted that in this document, relational terms such as "first" and "second" and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
With the vigorous development of big data and artificial intelligence, the reasonable processing, mining and application of a large number of geographically distributed data resources are key to playing data value and enabling various industries. Because of the real-time and reliability requirements of data processing, the computing mode is changed from a centralized processing mode to a distributed collaborative computing mode, service providers deploy edge data centers at multiple locations, and large-scale data processing tasks are scheduled to geographically dispersed data centers for processing based on Network Function Virtualization (NFV), so that more efficient and intelligent network services can be provided at locations close to users.
In the prior art, service function chains are mainly organized by: (1) acquiring historical network states according to a software defined network (Software Defined Network, SDN) controller; the network state comprises service function chain request flow information generated in the Internet of things network supporting mobile edge computing and corresponding network resource state information; setting deep reinforcement learning parameters and initializing weights of a neural network; training a neural network according to an experience sample generated by interaction of the intelligent agent and the environment; and for the service function chain request flow acquired in real time, determining and deploying the placement and routing paths of the virtualized network functions meeting the service function chain request flow requirements by using a trained neural network and adopting a heuristic algorithm. (2) The resources required by the virtual network function are regarded as variable parameters, the relation among the request arrival rate, the calculation resources and the processing delay is analyzed based on queuing theory and elastic resource allocation, and the energy consumption and the network delay are used as optimization indexes, so that a two-stage heuristic arrangement strategy is provided. In the first stage, a greedy deployment strategy is utilized to obtain a series of mapping relations between the VNF and the server; and the second stage is used for further solving the nonlinear constraint optimization problem on the basis of the solution set in the previous stage to obtain the optimal configuration of the computing resources.
The scheme can better perform according to different optimization targets in a single problem environment, but when the environment changes and service data are limited, high-quality resource scheduling decisions are difficult to make. Moreover, the service function chain arrangement optimization objective of the scheme does not consider the reliability problem. Virtual nodes built on reliable physical nodes may themselves fail due to the objective existence of software defects and other factors. The high-efficiency intelligent network service has a high requirement on timeliness and reliability of the service, and once the service is in a problem, serious harm is caused to production and life. Existing research has typically addressed reliability issues through backup strategies, but this introduces additional redundant device overhead. That is, none of the prior art has integrated consideration of the optimization metrics (reliability and orchestration overhead) of the service function chain.
In order to solve the problems in the prior art, the embodiment of the application provides a service function chain determining method, a device, equipment, a medium and a product. The following first describes a service function chain determination method provided in the embodiment of the present application.
Before describing embodiments of the present invention in further detail, the terms and terminology involved in the embodiments of the present invention are explained; these explanations apply throughout the following description.
Network Function Virtualization (NFV)
In October 2012, the European Telecommunications Standards Institute (ETSI) established the NFV ISG organization under the initiative of several telecommunications operators, e.g., Deutsche Telekom, to define network function virtualization. Network function virtualization aims at carrying traditional network services on standardized servers, which may be located in data centers, network nodes or user terminals, by developing standard IT virtualization technologies. The network functions are implemented in software that can run on a range of industry-standard server hardware and can be instantiated at various locations in the network as needed, without the need to deploy new devices. In October 2013, ETSI refined the concept of network function virtualization and proposed an architecture for network function virtualization.
The VNF layer is mainly composed of two parts: a Virtualized Network Function (VNF) and a virtual element management system (Element Management System, EMS). The VNF represents a software implementation of network functions running on the infrastructure layer, while the EMS is used to manage specific functions of the individual VNFs and their characteristics. A VNF is an entity corresponding to a specific network function node that is capable of providing different network functions in the form of software without being affected by hardware resources.
The management and orchestration layer (MANO layer) mainly includes 3 parts: the NFV Orchestrator (NFVO), the VNF Manager (VNFM) and the Virtualized Infrastructure Manager (VIM); MANO can provide an efficient and convenient management platform. The NFVO is responsible for the overall management of network traffic, VNFs and resources; the VNFM is responsible for the relevant management of the resources, life cycle and the like of the VNFs; and the VIM is responsible for the management and monitoring of infrastructure layer resources.
The carrier of services in NFV is the service function chain. The service function chain consists of an ordered sequence of virtualized network functions, i.e. the user's service requests are split into different sub-services and handed over to a group of relevant VNFs for processing. Since VNFs can be instantiated on any NFV-capable physical node in the physical network, the VNFs deployed are logically independent of each other. Therefore, as long as the VNFs are dynamically ordered and placed according to a specific traffic forwarding graph, the VNFs can be integrated into a service function chain capable of providing specific network services.
Fig. 1 is a schematic flow chart of a service function chain determining method according to an embodiment of the present application. As shown in fig. 1, the method specifically may include the following steps:
S101, acquiring a plurality of virtual network functions and a plurality of physical nodes, wherein the plurality of virtual network functions have a first preset execution sequence, and each virtual network function is realized on one physical node.
Optionally, in the embodiment of the present application, services in the distributed cloud-edge collaborative computing scenario have a high requirement on reliability. Meanwhile, with the intelligent evolution of the network, the types and number of services in the network have greatly increased. The method can be applied to processing large-scale cross-domain data, where a large-scale cross-domain data-intensive application comprises a plurality of service function blocks, such as data acquisition, encryption and decryption, data storage, data transmission, and analysis and processing. Each service function block can be abstracted into a plurality of virtual network functions (VNFs), and the VNFs are linked according to their dependency relationships to form a service function chain. In the embodiment of the application, a large-scale cross-domain data-intensive application service function chain can be constructed as a virtual network.
Optionally, in one possible implementation of the present application, the virtual network is denoted Gv, where Gv is a virtual network composed of a virtual node set and a virtual link set. f is any virtual node in the virtual node set, and a virtual node represents one VNF on the service function chain. z is any virtual link in the virtual link set, and the virtual links form the logical tandem relationships between VNFs according to the dependency relationships between the VNFs.
Alternatively, in one possible implementation of the present application, the data center may allocate corresponding physical resources for the data-intensive application, and carry the respective service function blocks of the cross-domain data-intensive application, that is, each virtual network function is implemented on one physical node, it is easy to understand that in the embodiment of the present application, multiple VNFs may be executed in one physical node. The invention constructs the physical server and the communication link of the data center as a physical network, wherein the physical server is the physical node of the application. It is easy to understand that in the embodiment of the present application, the number of physical nodes needs to be equal to or greater than the number of VNFs.
Optionally, in a possible embodiment of the present application, the physical network is denoted Gp, where Gp is a physical network composed of a physical node set and a physical link set. v is any physical node in the physical node set; the physical nodes represent the physical servers in each data center that carry the virtual network functions, and the physical node resources may include storage resources and computing resources. e is any physical link in the physical link set; the physical links are the actual communication links between physical nodes, and the resources of a physical link include communication resources.
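For illustration only, the following is a minimal Python sketch of how the virtual network Gv and the physical network Gp described above could be represented; the attribute names (such as cpu, storage, bandwidth) are illustrative assumptions and do not appear in the original text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class VirtualNode:              # one VNF on the service function chain
    name: str
    cpu_demand: float           # computing resources required (illustrative)
    storage_demand: float       # storage resources required (illustrative)

@dataclass
class VirtualNetwork:           # Gv: virtual node set plus virtual link set
    nodes: List[VirtualNode]
    links: List[Tuple[int, int]]        # (i, j): tandem relation, VNF i feeds VNF j

@dataclass
class PhysicalNode:             # a physical server of a data center
    name: str
    cpu: float
    storage: float
    reliability: float          # r_v in the reliability model below
    unit_cost: float            # cost of a unit resource on this node

@dataclass
class PhysicalNetwork:          # Gp: physical node set plus physical link set
    nodes: List[PhysicalNode]
    links: Dict[Tuple[int, int], dict]  # (u, v) -> {"bandwidth": ..., "reliability": ..., "unit_cost": ...}

# Example: a three-VNF chain (collect -> encrypt -> store) to be placed on a physical network
gv = VirtualNetwork(
    nodes=[VirtualNode("collect", 2, 1), VirtualNode("encrypt", 4, 1), VirtualNode("store", 1, 8)],
    links=[(0, 1), (1, 2)],
)
```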
S102, determining a plurality of return values corresponding to a plurality of arrangement strategies, wherein each arrangement strategy is used for indicating a physical node corresponding to each virtual network function, and the return values are used for representing the execution success rate of each physical node in the arrangement strategy for realizing the virtual network function.
Alternatively, in the embodiment of the present application, in a cross-domain data center scenario, the arrangement problem of the service function chain may be expressed as: the VNF in the service function chain needs to find an optimal orchestration policy to implement one-to-one mapping of the service function chain to the physical resource, and the mapped service function chain can have higher reliability and lower running cost. In order to describe the service function chain arrangement problem, the present invention defines the mapping relationship as follows.
Optionally, for a service function chain s (S is the set of service function chains, and s is any service function chain in the set), there is a resource mapping g_s for s. The mapping g_s consists of two parts: a node mapping that maps the virtual node set to the physical node set, and a link mapping that maps the virtual link set to the physical link set. The node mapping represents the behavior of arranging a virtual node f onto a physical node v; similarly, the link mapping represents the behavior of arranging a virtual link z onto a physical link e.
Alternatively, at the time of service function chain orchestration, the orchestration process may be split into sequential placement of VNFs, accompanied by the connection of links. Each VNF placement has an impact on the service function chain as a whole with respect to the VNF of the previous placement. Therefore, in the process of arranging the service chain, the influence of the arrangement of the VNF on the whole service function chain, namely the magnitude of the return value, needs to be calculated, so that the return value corresponding to each arranging strategy can be compared by calculating the return value of each VNF arranging strategy. It is easy to understand that in the embodiment of the present application, reliability and operation cost issues need to be considered when the service function chain is arranged, so that the return value in the embodiment of the present application may be used to characterize the operation cost of each physical node for implementing the virtual network function in addition to the execution success rate of each physical node for implementing the virtual network function. In these alternative embodiments, the return value of each scheduling policy is compared, so that the reliability and the running cost of the service function chain are comprehensively considered, and the joint optimization reliability and the scheduling expense are targeted, so that the scheduled service function chain can have higher reliability and lower running cost.
And S103, arranging an arrangement strategy corresponding to the maximum return value as a service function chain of the plurality of virtual network functions.
Optionally, in the embodiment of the present application, by comparing the return values of different arrangement policies of multiple virtual network functions in corresponding physical nodes, an arrangement policy with the largest return value is selected as the arrangement policy with the optimal virtual network function, so as to solve the step-by-step arrangement problem of the service function chain, thereby enabling the virtual network function to obtain the corresponding optimal service function chain according to the optimal arrangement policy, and further ensuring that the probability of successful execution of the virtual network function in the corresponding physical node is higher, and thereby improving the reliability of service function chain arrangement.
In the method for determining the service function chain provided by the embodiment of the application, a plurality of virtual network functions and a plurality of physical nodes are obtained, wherein the plurality of virtual network functions have a first preset execution sequence, and each virtual network function is realized on one physical node; determining a plurality of return values corresponding to a plurality of arrangement strategies, wherein each arrangement strategy is used for indicating physical nodes corresponding to each virtual network function, and the return values are used for representing the success rate of executing the virtual network functions by each physical node in the arrangement strategy; and arranging the arrangement strategy corresponding to the maximum return value as a service function chain of the plurality of virtual network functions. By the method, the return values of different arrangement strategies of the virtual network functions in the corresponding physical nodes can be obtained, and the arrangement strategy with the largest return value is selected as the arrangement strategy with the optimal virtual network function, so that the virtual network function can obtain the corresponding optimal service function chain according to the optimal arrangement strategy, the probability of successful execution of the virtual network function in the corresponding physical nodes can be ensured to be higher, and the reliability of arrangement of the service function chain is improved.
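As an illustrative sketch of steps S101 to S103, assuming the return value of each candidate arrangement strategy has already been determined (for example by the federal reinforcement learning model described below), the selection of the arrangement strategy with the maximum return value could look as follows; the function and variable names are hypothetical.

```python
from typing import Dict, Tuple

# An arrangement strategy maps the i-th VNF of the chain to a physical node index.
Strategy = Tuple[int, ...]

def choose_service_function_chain(return_values: Dict[Strategy, float]) -> Strategy:
    """S103: take the arrangement strategy with the maximum return value."""
    if not return_values:
        raise ValueError("no candidate arrangement strategies")
    return max(return_values, key=return_values.get)

# Usage: three candidate placements of a 3-VNF chain onto physical nodes 0..2
candidates = {(0, 1, 2): 0.82, (0, 0, 2): 0.74, (2, 1, 0): 0.91}
best = choose_service_function_chain(candidates)    # -> (2, 1, 0)
```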
In one embodiment, the step 102 may specifically be performed as follows:
s201, determining a plurality of return values corresponding to the plurality of arrangement strategies by using a trained federal reinforcement learning model, wherein the reinforcement learning model comprises a plurality of training samples, each training sample comprises the total amount of target resources of a plurality of first physical nodes, and the arrangement strategies corresponding to a plurality of sample virtual network functions, and each sample virtual network function is implemented on one first physical node.
Reinforcement learning is a new machine learning paradigm. Reinforcement learning does not require prior knowledge; it obtains the environment's feedback on the agent's current action through interaction with the environment, uses the feedback to determine the next operation, and corrects its own behavior through an iterative process.
Unlike supervised learning or unsupervised learning, reinforcement learning does not directly depend on external experience, but rather learns according to own exploration, in which the external environment evaluates actions of the agent in each exploration, and the agent continuously corrects its own behavior strategy in positive or negative feedback. If a positive feedback is obtained for an action, the agent will prefer to choose the action in the decision, and if a negative feedback is obtained for an action, the agent will try to avoid in the decision.
Optionally, in an embodiment of the present application, the service function chaining decision is learned from interactions of the environment and the agent through a reinforcement learning model. In the present proposal cross-data center service orchestration scenario, a single data center is inexperienced in orchestrating different types of services in different network environments, requiring the reliance on business process experience of multiple data centers. However, business data from different data centers may be administered by different companies, and it is difficult to collect unified training. In a distributed cloud-edge collaborative computing scene, geographically distributed data centers are used as decision-making bodies to make arrangement decisions for computing tasks, different decision-making bodies hold limited service data, the service data collection and interaction among the data centers waste a large amount of bandwidth resources, and meanwhile, the service data among part of the data centers are difficult to share due to privacy problems. Therefore, the invention designs a federal reinforcement learning model of a multi-data center by using secure fusion of federal learning to realize experience sharing, so that the arrangement strategy of the obtained service function chain is more reliable and accurate.
The federal reinforcement learning model is implemented by the following basic steps: firstly, the initialized training model is downloaded to each training end (i.e., each training sample), and each training end updates the model parameters with its own data, so that different model parameters are produced in different training environments. The updated model parameters are then sent to the cloud for integration, and the integrated model parameters are used as the initial parameters for the next model update. The training steps are repeated until the model converges, generating the federal reinforcement learning model. The federal reinforcement learning model is characterized in that all training data share the same feature dimension, and what is transmitted is the trained model parameters rather than the local data.
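These basic steps can be outlined in a Python sketch as follows; this is an assumption-laden outline only, where train_locally is a placeholder for each training end's local Q-learning update and the plain average stands in for the confidence-weighted aggregation detailed later (formula (15)).

```python
import numpy as np

def federated_training(training_ends, n_rounds=10, n_states=8, n_actions=4):
    """Outline of the federal loop: download the current model, update it locally,
    upload the updated parameters, aggregate them, and repeat until convergence."""
    global_q = np.zeros((n_states, n_actions))            # initialized training model
    for _ in range(n_rounds):
        local_tables = []
        for end in training_ends:
            q = global_q.copy()                           # model downloaded to the training end
            q = end.train_locally(q)                      # placeholder: local Q-learning update
            local_tables.append(q)
        # plain average as a stand-in for the confidence-weighted aggregation of formula (15)
        global_q = np.mean(local_tables, axis=0)
    return global_q
```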
Optionally, in one possible implementation manner of the present application, in order to solve the service function chain arrangement problem by using the federal reinforcement learning model, a reinforcement learning model may be set in each training sample, and the service function chains in the respective databases in the respective training samples are arranged by using the reinforcement learning model.
Optionally, in the embodiment of the present application, before training the reinforcement learning model of each training sample, several key parts of the reinforcement learning model in each training sample need to be defined and analyzed: state, action, reward, and goal. The goal of data center service function chain arrangement in the invention is to improve service reliability and reduce service cost, so service reliability and cost factors need to be considered when designing the return function.
The state: S_t represents the service function chain arrangement state at time t. It includes the physical resource occupancy of the servers (i.e., the first physical nodes in the training sample) at time t, where the resources may include storage resources and computing resources; the VNF to be placed at the current orchestration stage (i.e., the sample virtual network function in the training sample); and the VNFs that have been placed before the current orchestration stage.
The action: A_t represents the service function chain orchestration action at time t, i.e., the VNF placement policy at time t, which indicates whether a service is placed on a physical node and whether a tandem relation between services is mapped onto a physical link. The placement decision in the service function chain takes values in {0, 1}, where 1 indicates placement on a physical node or link and 0 indicates no placement. The action also includes the policy set that maps all virtual links on the service function chain to physical links.
Alternatively, in the embodiment of the present application, when the service function chain is orchestrated, the orchestration procedure may be split into sequential placement of VNFs, accompanied by the connection of the links. Each VNF placement has an impact on the service function chain as a whole with respect to the VNF of the previous placement. The reinforcement learning model is capable of obtaining optimal decisions by calculating the return of each action, so that in these alternative embodiments, the reinforcement learning model is suitable for solving the step-by-step arrangement problem of the service function chain, so that the arranged service function chain can have higher reliability and lower running cost.
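A possible encoding of the state S_t and action A_t described above is sketched below in Python; the field names are illustrative assumptions, not notation from the original.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class OrchestrationState:
    """S_t: server resource occupancy, the VNF to place now, and the VNFs already placed."""
    node_free_resources: Tuple[float, ...]   # remaining capacity of each first physical node
    current_vnf: int                          # sample virtual network function to be placed
    placed_vnfs: Tuple[int, ...]              # VNFs placed before the current orchestration stage

# A_t: choose which physical node hosts the current VNF; the mapping of the tandem
# relation onto a physical link follows from the placement of consecutive VNFs.
Action = int                                  # index of the chosen physical node
```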
In one embodiment, before the step 201, the method specifically includes the following steps:
s301, acquiring a plurality of training samples, wherein the sample virtual network functions of the plurality of training samples have a second preset execution sequence;
s302, determining target physical nodes for each sample virtual network function from the plurality of first physical nodes according to the second preset execution sequence.
Alternatively, in the embodiments of the present application, the reinforcement learning model used in the present proposal is Q-learning.
Q-learning is one of the most important reinforcement learning methods, which learns from rewards or penalties of behavior, rather than from given training samples, and is an incremental dynamic programming process that progressively searches for optimal strategies. Q-learning provides the ability for an agent to choose the best strategy by evaluating the Q value that represents the overall outcome of a series of actions. At each step of the interaction, the agent receives an enhanced signal that is a reward or penalty for the selected action in the environment. Then updating the current state and Q-table (Q-table includes all Q values); the agent continues to select the next action by a certain policy. Through continuous loop iteration, the agent finds out the optimal strategy.
In Q-learning, the purpose of finding the optimal strategy is realized through updating the Q table. The Q-table is a mapping table between state-actions and estimated future rewards.
Alternatively, in one possible implementation of the present application, the reinforcement learning model may obtain the server resource and service function chain arrangement state at time t, thereby determining the arrangement state S_t at time t.
Optionally, in the embodiment of the present application, a corresponding orchestration action may be made according to the state at the time t, and a VNF of the next orchestration and a first physical node corresponding to the VNF are determined. It is easily understood that the first physical node corresponding to the next orchestrated VNF may be any one of a plurality of first physical nodes.
S303, determining a corresponding estimated return value according to the target physical node;
alternatively, in embodiments of the present application, after the corresponding orchestration action is made, its future return value may be estimated, where the future return value is determined primarily by the return function.
Alternatively, in the embodiments of the present application, in order to define the reward function, the resource constraints present in service function chain placement need to be studied. For a physical node v, the total amount of resources required by the virtual network service functions placed on the node cannot exceed the total amount of service resources that the node is capable of providing. The following formula (1) represents the resource constraint on a physical node, where $C_v(t)$ indicates the total amount of resources of physical node v at time t, $c_f$ is the amount of resources required by virtual node f, and $x_{f,v} \in \{0,1\}$ indicates whether f is placed on v:

$\sum_{f} x_{f,v} \cdot c_f \le C_v(t)$    (1)

For a physical link e, the total amount of resources required by the service function links placed on that physical link cannot exceed the total amount of service resources that the link is capable of providing. Formula (2) represents the resource constraint on a physical link, where $C_e(t)$ indicates the total amount of resources of physical link e at time t, $c_z$ is the amount of resources required by virtual link z, and $y_{z,e} \in \{0,1\}$ indicates whether z is mapped onto e:

$\sum_{z} y_{z,e} \cdot c_z \le C_e(t)$    (2)

Optionally, in the embodiments of the present application, backup of resources is not considered in order to simplify the analysis of the orchestration problem. In this case, for one virtual node (a VNF to be placed), only one physical node can be mapped; meanwhile, for one virtual link (a tandem relation between VNFs), only one physical link can be mapped, namely the following formulas (3) and (4).

$\sum_{v} x_{f,v} = 1, \quad \forall f$    (3)

$\sum_{e} y_{z,e} = 1, \quad \forall z$    (4)
Since the purpose of orchestration is to improve service reliability while reducing costs, the design of the payback function requires definition of service reliability and service function chain deployment cost benefits in addition to resource constraints.
The cost of service orchestration, $\mathrm{cost}_s$, is defined as shown in formula (5). The cost is made up of two parts: the cost of the physical node resources and the cost of the physical links.
(5)
where $k_x$ represents the cost of a unit resource on physical node x, $k_z$ represents the cost of a unit resource on physical link z, and z is any one physical link.
The reliability of the service, $r_s$, is defined as shown in formula (6). The reliability of the service consists of two parts, namely the reliability of the physical nodes and the reliability of the physical links.

$r_s = \prod_{v} r_v \cdot \prod_{e} r_e$    (6)

where $r_e$ represents the reliability of physical link e and $r_v$ represents the reliability of physical node v, the products being taken over the physical nodes and physical links onto which service s is mapped.
The definition of the return function R is shown in a formula (7), wherein beta represents an adjustment coefficient and controls the importance degree of two influencing factors of service reliability and arrangement cost. The design of the reward function comprehensively considers the cost, reliability and total resource constraint in the arrangement of the data center service.
(7)
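Since the exact expressions of formulas (5) and (7) are not reproduced in this text, the following Python sketch only illustrates the idea described above: a linear cost, a product-form reliability and a β-weighted trade-off between the two are assumptions, and all function and parameter names are hypothetical.

```python
def orchestration_cost(placement, vnf_demand, node_cost, link_cost):
    """cost_s: node-resource cost plus the cost of each physical link used between
    consecutive placements (a linear form is assumed; consecutive VNFs are assumed
    to sit on distinct nodes connected by a known physical link)."""
    c = sum(node_cost[v] * vnf_demand[f] for f, v in enumerate(placement))
    c += sum(link_cost[(placement[i], placement[i + 1])] for i in range(len(placement) - 1))
    return c

def service_reliability(placement, node_rel, link_rel):
    """r_s: product of the reliabilities of the physical nodes and links used (formula (6))."""
    r = 1.0
    for v in placement:
        r *= node_rel[v]
    for i in range(len(placement) - 1):
        r *= link_rel[(placement[i], placement[i + 1])]
    return r

def return_value(placement, beta, vnf_demand, node_cost, link_cost, node_rel, link_rel):
    """Assumed weighted form of the return function (7): beta trades reliability against cost."""
    return (beta * service_reliability(placement, node_rel, link_rel)
            - (1 - beta) * orchestration_cost(placement, vnf_demand, node_cost, link_cost))
```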
And S304, updating a return value table of training samples in the initial federal reinforcement learning model according to the estimated return value to obtain the trained federal reinforcement learning model, wherein the return value table comprises the maximum return value of each sample virtual network function arrangement strategy.
Optionally, in an embodiment of the present application, in this problem scenario, the goal of the reinforcement learning model is to find an optimal strategy π, where π is the optimal strategy, A_t is the action under the optimal strategy, and O_t is the return value of the optimal strategy, so that the maximum return can be obtained from an initial state S_t, as shown in equation (8).

$O_t = \mathbb{E}\left[ \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k} \,\middle|\, S_t, \pi \right]$    (8)

where γ is the discount factor and r is the return value.
Alternatively, in embodiments of the present application, the reinforcement learning model used is Q-learning. In Q-learning, the purpose of finding the optimal strategy is achieved by updating the Q table (i.e., the return value table, described below as the Q table); that is, the table recording the maximum return value of each state is finally found. The Q table is a mapping table between state-action pairs and estimated future rewards. The reinforcement learning model acquires the server resource and service function chain arrangement state at time t, makes a corresponding arrangement action for that state, and estimates its future return, so as to update the Q value (namely the return value) corresponding to the state-action pair. The update procedure of the Q value is shown in formula (9).

$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ r + \gamma \max_{a'} Q(S_{t+1}, a') - Q(S_t, A_t) \right]$    (9)

where $\alpha$ is the learning factor, $\gamma$ is the discount factor, $a'$ represents a behavior under policy π, and r is the immediate reward obtained after performing action a.
Assume that the current server resource and service function chain arrangement state is S_t and the currently selected action is a; then the currently selected action needs to satisfy equation (10).

$a = \arg\max_{a'} Q(S_t, a')$    (10)
In order to guarantee the generalization ability of Q-learning and avoid the result being trapped in a local optimum, the ε-greedy approach is generally used for action selection. This mechanism uniformly and randomly selects an optional action to explore with a small probability ε, and selects the current optimal action according to equation (10) with probability 1-ε.
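A tabular Q-learning sketch of the update rule (9) and the ε-greedy selection described above is given below; the helper names and default parameter values are hypothetical.

```python
import random
from collections import defaultdict

q_table = defaultdict(float)        # (state, action) -> Q value

def epsilon_greedy(state, n_actions, epsilon=0.1):
    """Explore uniformly with probability epsilon; otherwise pick argmax_a Q(state, a), per formula (10)."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q_table[(state, a)])

def q_update(state, action, reward, next_state, n_actions, alpha=0.1, gamma=0.9):
    """Tabular Q-learning update of formula (9)."""
    best_next = max(q_table[(next_state, a)] for a in range(n_actions))
    q_table[(state, action)] += alpha * (reward + gamma * best_next - q_table[(state, action)])
```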
In these alternative embodiments, a large-scale cross-domain data-intensive application is abstracted into a service function chain based on network function virtualization technology, which improves network capability. Aiming at jointly optimizing reliability and arrangement overhead, a service function chain arrangement strategy based on federal reinforcement learning is established, realizing reliability-guarantee-oriented service arrangement decision optimization in different environments. When setting the return function, the resource constraints, the service reliability and the service function chain deployment cost benefit are comprehensively considered, so that the trained reinforcement learning model can output an optimal arrangement strategy of the virtual network functions that takes these three factors into account, and the arranged service function chain can have higher reliability and lower running cost.
In one embodiment, before the step 303, the method may specifically perform the following steps:
s401, judging whether the total amount of the residual resources of the target physical node is larger than or equal to resources required by a sample virtual network function corresponding to the target physical node;
s402, determining a corresponding estimated return value according to the target physical node under the condition that the total amount of the residual resources of the target physical node is greater than or equal to the resources required by the sample virtual network function corresponding to the target physical node.
Alternatively, in the embodiments of the present application, the resource constraints present in service function chain placement need to be studied. For a physical node v, the total amount of resources required for virtual network service functions placed on that physical node cannot exceed the total amount of service resources that the physical node is capable of providing. For physical link e, the total amount of resources required for the service function link placed on that physical link cannot exceed the total amount of service resources that the physical link is capable of providing. Therefore, in the embodiment of the present application, the calculation of the return value can be performed only if the total amount of resources required by the next virtual network function is less than or equal to the total amount of resources of the first physical node to be selected, otherwise, the selection is not performed.
In these alternative embodiments, when designing the return function, the problem of resource constraint is considered, so that the physical node selected by the virtual network function each time is ensured to be more accurate, and the accuracy of the arrangement strategy selected by the reinforcement learning model is improved.
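A minimal sketch of the feasibility check of S401/S402 follows; resources are described by illustrative keys such as cpu and storage, which are assumptions.

```python
def feasible(node_remaining: dict, vnf_demand: dict) -> bool:
    """S401: the target node's remaining resources must cover what the
    sample virtual network function requires."""
    return all(node_remaining.get(k, 0.0) >= need for k, need in vnf_demand.items())

# S402: estimate a return value only for feasible target nodes; infeasible nodes are skipped.
remaining = {0: {"cpu": 4, "storage": 8}, 1: {"cpu": 1, "storage": 2}}
demand = {"cpu": 2, "storage": 4}
candidates = [v for v in remaining if feasible(remaining[v], demand)]   # -> [0]
```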
In one embodiment, the step 304 may specifically be performed as follows:
s501, carrying out aggregation processing on the updated return value table corresponding to each training sample to obtain an aggregated return table.
Optionally, in the embodiment of the application, each data center first trains locally to obtain its own reinforcement learning model. Because the model used in the present disclosure is Q-learning, the model parameters are stored in a private Q table, i.e., the maximum return value table, where the Q table is a mapping of state-action pairs to rewards in reinforcement learning. Next, the Q table information obtained in the different data centers (training environments) is encrypted using homomorphic encryption techniques, and the encrypted Q table information is then transmitted to the cloud control center (aggregation server).
As shown in fig. 2, fig. 2 is the specific workflow of federal learning in the present application. In order to ensure that the training data in different data centers is not leaked, the federal learning framework needs to encrypt the reinforcement learning Q table obtained by training, so as to ensure the security and reliability of the transmission process and the privacy of the training data.
In the federal learning model used in this proposal, a homomorphic encryption method is adopted to ensure the security of the model. Homomorphic encryption is a classical encryption algorithm. Homomorphism refers to a mapping from one algebraic structure to an algebraic structure of the same type that keeps all relevant structure unchanged. The result obtained by decrypting a homomorphically encrypted ciphertext after a specific operation is consistent with the result obtained by first decrypting the ciphertext and then carrying out the specific operation, so the confidentiality of data operations in the federation model fusion stage can be effectively guaranteed, meeting the requirements of federal learning scenarios. Compared with secure multiparty computation, the homomorphic encryption method involves less data interaction, so the communication cost is lower, and the model fusion efficiency can be improved in the federal learning scenario requiring the participation of multiple cross-domain data centers.
The process of model transmission using homomorphic encryption is generally as follows: firstly, the aggregation server generates a homomorphic encryption public-private key pair and distributes the public key to all data centers participating in federal learning; secondly, each data center transmits its calculation result to the aggregation server in the form of homomorphic ciphertext; the aggregation server performs the summarization calculation and decrypts the result; finally, the aggregation server returns the decrypted result to each data center, and each data center updates its own reinforcement-learning-based service function chain orchestration model parameters according to the result.
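The key-distribution, encrypted-upload and aggregate-then-decrypt flow described above might be sketched as follows; the use of the third-party python-paillier (phe) package is an assumption made for illustration, as the patent does not name a concrete homomorphic encryption implementation.

```python
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()     # generated by the aggregation server

def encrypt_q_values(q_flat):
    """Data center side: encrypt each Q value with the distributed public key."""
    return [public_key.encrypt(x) for x in q_flat]

def aggregate_encrypted(tables, weights):
    """Aggregation server side: weighted sum computed directly on the ciphertexts."""
    agg = []
    for i in range(len(tables[0])):
        acc = tables[0][i] * weights[0]
        for t, w in zip(tables[1:], weights[1:]):
            acc = acc + t[i] * w
        agg.append(acc)
    return agg

# Two data centers upload encrypted four-entry Q tables; the server aggregates and decrypts.
c1 = encrypt_q_values([1.0, 2.0, 3.0, 4.0])
c2 = encrypt_q_values([2.0, 0.0, 1.0, 5.0])
aggregated = aggregate_encrypted([c1, c2], weights=[0.5, 0.5])
q_fl = [private_key.decrypt(x) for x in aggregated]                 # -> [1.5, 1.0, 2.0, 4.5]
```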
Optionally, in one possible implementation manner of the present application, after obtaining the maximum report value table of each training sample, the aggregation server performs decryption and secure aggregation operation of the model parameters, so as to obtain federal model parameters, i.e. a Q table after federation (i.e. an aggregated report table).
Alternatively, in one possible implementation of the present application, which placement action is selected in the current state for the federal model depends on the confidence values of the different private networks. For example, in a certain state, the Q-table trained in the environment a evaluates the Q-value of the current different actions to (5,4,4,5,6,5,5,6,5,5), and the Q-value of the Q-table in the environment B evaluates the Q-value of the Q-table to (3, 1, 10,2,2,1,1,2, 10, 1), at this time, we consider that the Q-table obtained in the environment B is more reliable because it shows a larger variability, so that the action corresponding to the Q-table in the environment B is selected in that state.
S502, distributing the aggregated return report to each training sample, so that each training sample can determine a new return value table corresponding to each training sample according to the aggregated return report.
Optionally, in the embodiment of the present application, the aggregation server sends the generated federal learning model (i.e. the aggregated return report) back to each data center (i.e. each training sample) after encryption processing to update the model, and after the agent in each data center obtains the federal learning model, fuses the local model parameters (i.e. the original report value table) and the federal model parameters (i.e. the aggregated return report), so that the decision experience can be shared among multiple training environments. Besides model sharing in a training environment, the federal learning model can also be applied to a newly added data center service deployment scene (new environment) to serve as a basis for model training in the new environment to participate in subsequent model training.
S503, obtaining a trained federal reinforcement learning model under the condition that the errors of the new report value table and the updated report value table are smaller than a preset error threshold value.
Optionally, in these alternative embodiments, after generating the federal learning model, the aggregation server may send the model parameters, i.e., the aggregated return report, to a different data center. Each data center uses formula (11) to update the original maximum return value table of the local model, so that the local model can learn more characteristics.
$Q_{new} = \alpha \cdot Q_{old} + (1 - \alpha) \cdot Q_{fl}$    (11)

where Q_old represents the original Q table, namely the original maximum return value table, Q_new represents the newly obtained Q table, i.e., the new maximum return value table, α represents the weight of the original Q table, and Q_fl is the aggregated return table.
Optionally, after the new local model is generated, the data center will resend the new Q table to the aggregation server for aggregation, and a new federal learning model is generated, that is, a new aggregated return table is generated. The above process is then iterated until the Q-table of the new model and the Q-table of the original model in each training sample have errors less than the set error threshold δ, which is calculated as shown in equation (12), we use Euclidean distance to calculate the error between the matrices.
e = ‖ Q_new − Q_old ‖_2 = sqrt( Σ_(x,i) ( Q_new(x,i) − Q_old(x,i) )^2 )    (12)
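As an illustrative sketch only, the snippet below implements the local fusion step of formula (11) and the Euclidean-distance convergence test of formula (12) with NumPy; the 2x3 toy Q tables, the weight α = 0.5 and the threshold δ = 0.1 are assumptions chosen for the example.

```python
# Sketch of the local fusion step (formula (11)) and the Euclidean-distance
# convergence test (formula (12)). The 2x3 toy Q tables, alpha = 0.5 and
# delta = 0.1 are illustrative assumptions only.
import numpy as np

alpha = 0.5   # weight of the original local Q table
delta = 0.1   # preset error threshold

q_old = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])   # original local Q table
q_fl = np.array([[1.2, 2.1, 2.9],
                 [4.3, 4.8, 6.2]])    # aggregated (federal) Q table

# Formula (11): weighted fusion of local and federal parameters.
q_new = alpha * q_old + (1.0 - alpha) * q_fl

# Formula (12): Euclidean (Frobenius) distance between the old and new Q tables.
error = np.linalg.norm(q_new - q_old)

converged = error < delta
print(q_new, error, converged)
```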
In these optional embodiments, a service function chain arrangement mechanism based on federal reinforcement learning is used: the reinforcement learning model is deployed locally in each cross-domain data center as the training basis, the training parameters of the multiple data centers are fused to obtain a federal model, and the federal model is issued to the training environments and to new environments for model correction. This avoids the overhead of exchanging raw service data, reduces data transmission, saves bandwidth, and lowers the packet loss rate.
In these optional embodiments, the present application proposes a federal reinforcement learning model for the collaborative service function chain decision problem of cross-domain data centers in a distributed cloud-edge collaboration scenario, so that agents trained in different environments can share their training experience and reach decisions that improve reliability and reduce cost. First, the service function chain arrangement problem is defined and a mathematical model of it is established; second, a reinforcement learning service arrangement model and a federal reinforcement learning model that take network service reliability as the optimization target are established, and the arrangement method provided by the present application can achieve arrangement with reliability guarantees and resource overhead control.
In an embodiment, step S501 may specifically be performed as follows:
S601, determining a confidence coefficient according to the updated return value table of each training sample, wherein the confidence coefficient is used for representing the degree of trust placed in each return value table;
S602, determining the aggregated return table according to the confidence coefficient.
Alternatively, in the embodiment of the present application, there are many ways to define the confidence coefficient, such as variance and information entropy. The variance, however, ignores the influence of extreme values and therefore has a limited range of use, whereas information entropy is better suited to describing the uncertainty of information. The confidence value is therefore defined with reference to the way information entropy is calculated.
The confidence value c_xj of data center j (i.e. of each training sample) in state x is defined as shown in formula (13).
c_xj = 1 − H_xj / ln m,  with H_xj = − Σ_(i=1..m) p_ij · ln p_ij  and  p_ij = reward_ij / Σ_(i=1..m) reward_ij    (13)
wherein m is the action dimension (i.e. the number of optional physical nodes), reward_ij is the return value of action i in the return value table of data center j for the given state, and p_ij is the proportion of that return value within the row.
Based on the calculated confidence values, the confidence weight w_xj is defined as shown in formula (14).
w_xj = c_xj / Σ_(j') c_xj'    (14)
wherein the sum runs over all data centers participating in the training.
Assuming that k states exist in the reinforcement learning environment, and the Q table is a matrix in k×m dimensions, the federal model generation formula is shown in formula (15).
Q_fl(x, ·) = Σ_j w_xj · Q_j(x, ·),  for each state x    (15)
wherein Q_j is the Q table of data center j participating in the training (i.e. of the training sample), and Q_fl is the aggregated return table.
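A compact sketch of this confidence-weighted aggregation (formulas (13)-(15)) is given below; the normalised-entropy form c = 1 − H/ln(m), the assumption of non-negative return values and the toy two-center Q tables are illustrative assumptions about the exact definitions.

```python
# Sketch of the confidence-weighted aggregation of formulas (13)-(15). The exact
# entropy normalisation c = 1 - H/ln(m), the non-negative return values and the toy
# k=2-state, m=3-action Q tables are assumptions made for illustration only.
import numpy as np

def confidence(q_row):
    """Entropy-based confidence of one state's return-value row (formula (13))."""
    p = np.asarray(q_row, dtype=float)
    p = p / p.sum()                        # normalise the (non-negative) returns
    h = -(p * np.log(p + 1e-12)).sum()     # information entropy of the row
    return 1.0 - h / np.log(len(p))        # peaked row (low entropy) -> high confidence

def aggregate(q_tables):
    """Confidence-weighted federal Q table over all data centers (formulas (14)-(15))."""
    q_tables = [np.asarray(q, dtype=float) for q in q_tables]
    k, m = q_tables[0].shape
    q_fl = np.zeros((k, m))
    for x in range(k):                                          # per reinforcement-learning state
        c = np.array([confidence(q[x]) for q in q_tables])      # formula (13)
        w = c / c.sum()                                         # formula (14): normalised weights
        q_fl[x] = sum(w_j * q[x] for w_j, q in zip(w, q_tables))    # formula (15)
    return q_fl

# Toy Q tables from two data centers (2 states x 3 candidate physical nodes).
q_a = [[5.0, 5.0, 6.0], [4.0, 5.0, 5.0]]
q_b = [[1.0, 1.0, 10.0], [2.0, 9.0, 1.0]]
print(aggregate([q_a, q_b]))   # rows are dominated by the more "decisive" table
```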
Optionally, in the embodiment of the present application, in order to ensure the adaptability of the federal model in a new environment, model migration and correction need to be performed on the obtained federal learning model. Fig. 3 illustrates the basic flow of federal learning model migration and correction. First, the reinforcement learning models obtained by the different data centers are fused to generate a federal model. Second, the federal model is transmitted to the original data centers (the training environments) and to a new data center (the new environment); in a training environment the local model is fused with the federal model in the manner shown in formula (11), while in the new environment the federal model is used as a pre-trained model and reinforcement learning training continues until the model converges. In actual use, the training and fusion steps described above are typically repeated several times.
Optionally, in one possible implementation of the present application, when the federal learning model is used as a pre-training model for reinforcement learning training in a new environment, the experience accumulated in earlier training accelerates the convergence of reinforcement learning. Fig. 4 shows the training process of the model in the new environment: the abscissa is the number of training iterations and the ordinate is the return value; relative to the reference point (0, 1600), the lower curve B represents the return obtained in each training round when the reinforcement learning model is retrained from scratch in the new environment, and the upper curve A represents the training effect when the federal model is used as the pre-training model. As can be seen from the figure, with pre-training the result becomes stable after about 10,000 training iterations, whereas retraining from scratch converges only after about 800,000 iterations.
In these optional embodiments, the reinforcement learning model is deployed locally in each cross-domain data center as the training basis, the training parameters of the multiple data centers are fused to obtain a federal model, and the federal model is issued to the training environments and to new environments for model correction, which avoids the overhead of exchanging raw service data, reduces data transmission, saves bandwidth, and lowers the packet loss rate.
Optionally, in order to verify the service arrangement effect of the federal reinforcement learning model proposed by the present invention, a suitable simulation environment and arrangement task need to be designed. Three different environments are used in the simulation, corresponding to different data centers: two are training environments with prior arrangement experience, and one is a new environment without arrangement experience. The simulation scenario is shown in fig. 5. The training processes of the three environments are mutually independent, and model information is exchanged only after the federal reinforcement learning model has been generated.
The physical network in each environment comprises 20 nodes, each with its own reliability and VNF load. Because the physical distances between nodes differ, the cost of placing a VNF is proportional to the distance between the node selected for it and the node on which the previous VNF was placed. The service function chain to be arranged consists of 4 VNFs in series.
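The snippet below shows one possible way to parameterise such a simulated environment; the value ranges, the random two-dimensional node layout and the reliability/cost evaluation are illustrative assumptions rather than the exact simulation settings.

```python
# Illustrative parameterisation of one simulation environment as described above:
# 20 physical nodes with individual reliability and VNF load, a 4-VNF chain, and a
# placement cost proportional to the distance from the previously placed VNF's node.
# All value ranges and the random 2-D layout are assumptions for illustration.
import random

random.seed(0)
NUM_NODES = 20
CHAIN_LENGTH = 4

nodes = [{
    "id": i,
    "reliability": random.uniform(0.90, 0.999),  # per-node execution reliability
    "load": random.uniform(0.1, 0.9),            # current VNF load on the node
    "pos": (random.uniform(0, 100), random.uniform(0, 100)),  # 2-D coordinates
} for i in range(NUM_NODES)]

def distance(a, b):
    (x1, y1), (x2, y2) = a["pos"], b["pos"]
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5

def evaluate(placement):
    """Reliability and distance-based cost of placing the chain on the given node ids."""
    assert len(placement) == CHAIN_LENGTH
    chain = [nodes[i] for i in placement]
    reliability = 1.0
    for n in chain:
        reliability *= n["reliability"]
    cost = sum(distance(chain[i - 1], chain[i]) for i in range(1, len(chain)))
    return reliability, cost

print(evaluate([0, 3, 7, 12]))   # chain reliability and its distance-based cost
```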
First, the two training environments undergo independent reinforcement learning training, simulating the earlier process of acquiring experience. After the reinforcement learning models in the training environments converge, the training environments are considered to have a certain amount of arrangement experience; at this point the trained models are fused through the federal learning framework to generate a federal model. The federal model generated in this way is then transmitted to the two training environments and the one new environment respectively, and model fusion is performed in each environment.
We verify the effectiveness of the federal learning model proposed in this application by comparing the arrangement effect of the federal learning model with that of an independently trained reinforcement learning model in both the new environment and the training environments.
In the simulation experiments, we compared the training effect of the retrained reinforcement learning model with that of the federal learning model, and compared the performance of the arrangement models in terms of service reliability and resource overhead in the new environment and the training environments respectively. The comparison indices used in the simulation are the reliability of the service function chain and the resource cost of the service arrangement. The simulation results obtained over a number of model training runs are shown in figs. 6-9.
Fig. 6 and fig. 7 compare, in a training environment, the learning effect of the federal reinforcement learning service function chain arrangement model proposed by the present invention with that of a prior-art reinforcement learning service function chain arrangement model. Fig. 6 shows the simulation results for service reliability: the abscissa is the number of training iterations and the ordinate is the service reliability; relative to the reference point (0, 0.834), the upper curve A is the result of the federal learning model of the present application and the lower curve B is the result of the reinforcement learning model (Q-learning). Fig. 7 shows the simulation results for resource overhead: the abscissa is the number of training iterations and the ordinate is the cost (i.e. resource overhead); relative to the reference point (0, 31), the upper curve B is the result of the reinforcement learning model (Q-learning) and the lower curve A is the result of the federal learning model of the present application. The simulation results show that, in terms of service reliability, the proposed method achieves high reliability and remains stable from the beginning of training, whereas the reinforcement learning model (Q-learning) needs about 400,000 training iterations to reach a comparably high reliability; in terms of resource overhead, the proposed method consumes less cost, and within the limited training the reinforcement learning model still cannot reach the level achieved by the federal model. Training and making decisions in a training environment, the proposed method therefore achieves higher service reliability while consuming fewer resources.
Figs. 8 and 9 compare, in the new environment, the learning effect of the federal reinforcement learning service function chain arrangement model proposed by the present invention with that of a prior-art reinforcement learning service function chain arrangement model. Fig. 8 shows the simulation results for service reliability: the abscissa is the number of training iterations and the ordinate is the service reliability; relative to the reference point (0, 0.870), the upper curve A is the result of the federal learning model of the present application and the lower curve B is the result of the reinforcement learning model (Q-learning). Fig. 9 shows the simulation results for resource overhead: the abscissa is the number of training iterations and the ordinate is the cost (i.e. resource overhead); relative to the reference point (0, 17), the upper curve B is the result of the reinforcement learning model (Q-learning) and the lower curve A is the result of the federal learning model of the present application. The simulation results show that, in terms of service reliability, the proposed method quickly reaches a convergence state and keeps the service reliability at a high value throughout training; in terms of the resource overhead of the service function chain, the federal model likewise converges quickly and consumes less cost. Training and making decisions in a new environment, the proposed method therefore achieves higher service reliability while consuming fewer resources.
Next, the training effect of the federal reinforcement learning proposed by the present invention is compared with other existing methods.
Table 1 algorithm effect comparison
As shown in Table 1, the comparison methods selected are a reliability-first greedy algorithm and a cost-first greedy algorithm, which are commonly used for guaranteeing reliability or controlling service arrangement cost respectively. A service function chain consisting of 4 VNFs is selected from the new training environment for the algorithm simulation, and the reliability and cost of the target service function chain are compared.
As can be seen from the algorithm comparison in Table 1, the method provided by the invention takes both reliability and cost into account: compared with the reliability-first greedy algorithm, the federal reinforcement learning method consumes less cost, and compared with the cost-first greedy algorithm, it achieves higher service reliability.
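For context, the two greedy baselines can be sketched as follows; the node data, the fixed starting node and the scoring rules are assumptions for illustration and are not the exact baseline implementations used in the comparison.

```python
# Illustrative reconstructions of the two greedy baselines compared in Table 1: a
# reliability-first greedy that always takes the most reliable unused node, and a
# cost-first greedy that always takes the node nearest to the previous placement.
# Node data and the fixed starting node are assumptions; not the exact baselines.
import random

random.seed(0)
nodes = [{"id": i,
          "reliability": random.uniform(0.90, 0.999),
          "pos": (random.uniform(0, 100), random.uniform(0, 100))}
         for i in range(20)]

def distance(a, b):
    (x1, y1), (x2, y2) = a["pos"], b["pos"]
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5

def greedy_chain(chain_length, score):
    """Place a chain by repeatedly picking the best remaining node under `score`."""
    placed = [nodes[0]]                  # assumed fixed starting node
    remaining = nodes[1:]
    for _ in range(chain_length - 1):
        best = max(remaining, key=lambda n: score(placed[-1], n))
        placed.append(best)
        remaining.remove(best)
    return [n["id"] for n in placed]

# Reliability-first: ignore distance, maximise node reliability.
reliability_first = greedy_chain(4, score=lambda prev, n: n["reliability"])
# Cost-first: ignore reliability, minimise the distance to the previous node.
cost_first = greedy_chain(4, score=lambda prev, n: -distance(prev, n))
print(reliability_first, cost_first)
```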
Taking the above analysis together, the service function chain determining method based on federal reinforcement learning provided by the present application can effectively guarantee the reliability of the service and reduce its resource cost, while also supporting fast and effective real-time arrangement decisions in a multi-data-center environment.
These alternative embodiments focus on service arrangement scenarios that span data centers and are applicable to various big-data service applications, such as large-scale cross-domain data-intensive applications like the communication itinerary card, in which service functions including signaling data acquisition, encryption and decryption, storage, transmission and analysis processing need to be chained and deployed on different servers. The aim of the invention is to design a service function chain arrangement strategy that yields an arrangement scheme which improves service reliability and reduces service cost. However, geographically distributed data centers differ in server resource conditions and service requirements, the underlying data interaction between different data centers incurs a great deal of communication resource overhead, service data cannot be shared among some data centers for privacy reasons, and a reinforcement learning arrangement model trained in a single data center lacks generality and cannot achieve the best arrangement effect and arrangement speed across multiple data centers. The service function chain arrangement strategy based on federal reinforcement learning designed by the invention overcomes the data sharing problem of arrangement service data across cross-domain data centers; the technical solution provided by the invention achieves service arrangement with reliability guarantees and resource overhead control in cross-domain data center service arrangement scenarios, effectively increases the speed of arrangement decisions, and ensures the real-time performance of cross-domain data scheduling.
Fig. 10 is a schematic structural view of a service function chain determining apparatus according to another embodiment of the present application, and only a portion related to the embodiment of the present application is shown for convenience of explanation.
Referring to fig. 10, the service function chain determining apparatus may include:
an obtaining module 1001, configured to obtain a plurality of virtual network functions and a plurality of physical nodes, where the plurality of virtual network functions have a first preset execution sequence, and each virtual network function is implemented on one physical node;
a first determining module 1002, configured to determine a plurality of return values corresponding to a plurality of arrangement policies, where each arrangement policy is configured to indicate a physical node corresponding to each virtual network function, and the return values are configured to characterize an execution success rate of each physical node in the arrangement policy to implement the virtual network function;
a second determining module 1003, configured to schedule the scheduling policy corresponding to the maximum return value as a service function chain of the plurality of virtual network functions.
In an embodiment, the service function chain determining apparatus may further include:
and the third determining module is used for determining a plurality of return values corresponding to the plurality of arrangement strategies by utilizing a trained federal reinforcement learning model, wherein the reinforcement learning model comprises a plurality of training samples, each training sample comprises the total amount of target resources of a plurality of first physical nodes, and the arrangement strategy corresponding to a plurality of sample virtual network functions, and each sample virtual network function is realized on one first physical node.
In one embodiment, the federal reinforcement learning model includes:
the second acquisition module is used for acquiring the plurality of training samples, and the sample virtual network functions of the plurality of training samples have a second preset execution sequence;
a fourth determining module, configured to determine a target physical node for each sample virtual network function from the plurality of first physical nodes according to the second preset execution order;
a fifth determining module, configured to determine a corresponding estimated return value according to the target physical node;
and the first updating module is used for updating a return value table of training samples in the initial federal reinforcement learning model according to the estimated return value to obtain the federal reinforcement learning model, wherein the return value table comprises the maximum return value of each sample virtual network function arrangement strategy.
In an embodiment, the service function chain determining apparatus may further include:
the first judging module is used for judging whether the total amount of the residual resources of the target physical node is larger than or equal to the resources required by the sample virtual network function corresponding to the target physical node;
and a sixth determining module, configured to determine, according to the target physical node, a corresponding estimated return value when the total amount of remaining resources of the target physical node is greater than or equal to the resources required by the sample virtual network function corresponding to the target physical node; a minimal illustrative sketch of this feasibility-gated update follows below.
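The sketch assumes a standard tabular Q-learning update and a simple reliability-minus-cost reward; the learning rate, discount factor and reward shaping are illustrative assumptions, not the exact update used by the embodiments.

```python
# Minimal sketch of the feasibility-gated value update described by the two modules
# above: an estimated return is produced only when the target node still has enough
# resources for the sample virtual network function. The standard Q-learning update,
# the learning rate, the discount factor and the reward shaping are assumptions.
import numpy as np

LEARNING_RATE = 0.1
DISCOUNT = 0.9

def estimated_return(node, vnf_demand, q_row_next):
    """Estimated return for a feasible placement; None if the node lacks resources."""
    if node["remaining_resources"] < vnf_demand:
        return None                                  # infeasible placement, no update
    immediate = node["reliability"] - node["cost"]   # assumed reward shaping
    return immediate + DISCOUNT * max(q_row_next)

def update_q(q_table, state, action, node, vnf_demand):
    ret = estimated_return(node, vnf_demand, q_table[state + 1])
    if ret is not None:
        q_table[state, action] += LEARNING_RATE * (ret - q_table[state, action])
    return q_table

q = np.zeros((5, 4))   # 4 placement steps plus a terminal row, 4 candidate nodes
node = {"remaining_resources": 8.0, "reliability": 0.95, "cost": 0.2}
print(update_q(q, state=0, action=2, node=node, vnf_demand=5.0))
```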
In an embodiment, the service function chain determining apparatus may further include:
the first aggregation module is used for carrying out aggregation treatment on the updated return value table corresponding to each training sample so as to obtain an aggregated return table;
the distribution module is used for distributing the aggregated return report to each training sample so that each training sample can determine a new return value table corresponding to each training sample according to the aggregated return report;
and a seventh determining module, configured to obtain a trained federal reinforcement learning model when the error between the new report value table and the updated report value table is less than a preset error threshold.
In an embodiment, the service function chain determining apparatus may further include:
the eighth determining module is used for determining a confidence coefficient according to the updated return value table of each training sample, wherein the confidence coefficient is used for representing the degree of trust placed in each return value table;
and a ninth determining module, configured to determine the aggregate return report according to the confidence level.
It should be noted that, since the above devices/units are based on the same concept as the method embodiments of the present application, they are the devices corresponding to the service function chain determining method; all implementation manners in the above method embodiments are applicable to the device embodiment, and their specific functions and technical effects can be found in the method embodiment section and are not repeated herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Fig. 11 shows a schematic hardware structure of an electronic device according to an embodiment of the present application.
The device may include a processor 1101 and a memory 1102 storing program instructions.
The steps of any of the various method embodiments described above are implemented when a program is executed by the processor 1101.
For example, the program may be partitioned into one or more modules/units, which are stored in the memory 1102 and executed by the processor 1101 to complete the present application. One or more of the modules/units may be a series of program instruction segments capable of performing specific functions to describe the execution of the program in the device.
In particular, the processor 1101 may comprise a Central Processing Unit (CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or may be configured to implement one or more integrated circuits of embodiments of the present application.
Memory 1102 may include mass storage for data or instructions. By way of example, and not limitation, memory 1102 may comprise a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a universal serial bus (USB) drive, or a combination of two or more of the foregoing. Memory 1102 may include removable or non-removable (or fixed) media, where appropriate. Memory 1102 may be internal or external to the electronic device, where appropriate. In a particular embodiment, the memory 1102 is a non-volatile solid-state memory.
The memory may include Read Only Memory (ROM), random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors) it is operable to perform the operations described with reference to methods in accordance with aspects of the present disclosure.
The processor 1101 implements any of the methods of the above embodiments by reading and executing program instructions stored in the memory 1102.
In one example, the electronic device may also include a communication interface 1103 and a bus 1110. The processor 1101, the memory 1102, and the communication interface 1103 are connected to each other through a bus 1110 and perform communication with each other.
The communication interface 1103 is mainly used for implementing communication between each module, device, unit and/or apparatus in the embodiments of the present application.
Bus 1110 includes hardware, software, or both, coupling the components of the electronic device to each other. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or another suitable bus, or a combination of two or more of the above. Bus 1110 can include one or more buses, where appropriate. Although the embodiments of the present application describe and illustrate a particular bus, the present application contemplates any suitable bus or interconnect.
In addition, in combination with the method in the above embodiment, the embodiment of the application may be implemented by providing a storage medium. The storage medium has program instructions stored thereon; the program instructions, when executed by a processor, implement any of the methods of the embodiments described above.
An embodiment of the application further provides a chip. The chip includes a processor and a communication interface coupled to the processor; the processor is configured to run a program or instructions to implement the processes of the above method embodiments and achieve the same technical effects, which are not repeated here in order to avoid repetition.
It should be understood that the chip referred to in the embodiments of the present application may also be called a system-level chip, a chip system, or a system-on-chip, etc.
The embodiments of the present application provide a computer program product, which is stored in a storage medium, and the program product is executed by at least one processor to implement the respective processes of the above method embodiments, and achieve the same technical effects, and are not repeated herein.
It should be clear that the present application is not limited to the particular arrangements and processes described above and illustrated in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions, or change the order between steps, after appreciating the spirit of the present application.
The functional blocks shown in the above block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, electronic circuits, application specific integrated circuits (ASICs), suitable firmware, plug-ins, function cards, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the Internet or an intranet.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be different from the order in the embodiments, or several steps may be performed simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to being, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware which performs the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In the foregoing, only the specific embodiments of the present application are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, which are intended to be included in the scope of the present application.

Claims (8)

1. A method of determining a service function chain, the method comprising:
acquiring a plurality of virtual network functions and a plurality of physical nodes, wherein the plurality of virtual network functions have a first preset execution sequence, and each virtual network function is realized on one physical node;
determining a plurality of return values corresponding to a plurality of arrangement strategies, wherein each arrangement strategy is used for indicating physical nodes corresponding to each virtual network function, and the return values are used for representing the success rate of executing the virtual network functions by each physical node in the arrangement strategy;
wherein the determining the plurality of return values corresponding to the plurality of arrangement strategies includes:
determining a plurality of return values corresponding to the plurality of arrangement strategies by using a trained federal reinforcement learning model, wherein the reinforcement learning model comprises a plurality of training samples, each training sample comprises the total amount of target resources of a plurality of first physical nodes, and the arrangement strategy corresponding to a plurality of sample virtual network functions, and each sample virtual network function is realized on one first physical node;
the federal reinforcement learning model is provided with an encrypted public key and a private key corresponding to the public key, wherein the public key is used for encrypting training results of all training samples, and the private key is used for decrypting the encrypted training results sent by all training samples to the federal reinforcement learning model;
arranging an arrangement strategy corresponding to the maximum return value as a service function chain of the plurality of virtual network functions;
The training process of the federal reinforcement learning model is as follows:
obtaining a maximum return value table corresponding to each first sample according to the total amount of target resources of each first physical node in the first samples and an arrangement strategy corresponding to the virtual network function of the samples, wherein the first samples are any at least two training samples in the training samples;
carrying out parameter fusion on the maximum return value table corresponding to each first sample to obtain a first federal model;
and training the first federal model by using a second sample until the first federal model converges to obtain the trained federal reinforcement learning model, wherein the second sample is a training sample which is not subjected to parameter fusion in the plurality of training samples.
2. The method of claim 1, wherein prior to the determining a plurality of return values for the plurality of orchestration strategies using the trained federal reinforcement learning model, the method further comprises:
acquiring the plurality of training samples, wherein the sample virtual network functions of the plurality of training samples have a second preset execution sequence;
determining target physical nodes for each sample virtual network function from the plurality of first physical nodes according to the second preset execution sequence;
Determining a corresponding estimated return value according to the target physical node;
and updating a return value table of training samples in the initial federal reinforcement learning model according to the estimated return value to obtain a trained federal reinforcement learning model, wherein the return value table comprises the maximum return value of each sample virtual network function arrangement strategy.
3. The method of claim 2, wherein prior to said determining a corresponding predicted return value from said target physical node, the method further comprises:
judging whether the total amount of the residual resources of the target physical node is larger than or equal to the resources required by the sample virtual network function corresponding to the target physical node;
and under the condition that the total amount of the residual resources of the target physical node is greater than or equal to the resources required by the sample virtual network function corresponding to the target physical node, determining a corresponding estimated return value according to the target physical node.
4. The method of claim 2, wherein updating the report value table of the training samples in the initial federal reinforcement learning model according to the estimated report value to obtain the trained federal reinforcement learning model comprises:
Carrying out aggregation treatment on the updated return value table corresponding to each training sample to obtain an aggregated return table;
distributing the aggregated return report to each training sample so that each training sample can determine a new return value table corresponding to each training sample according to the aggregated return report;
and under the condition that the errors of the new return value table and the updated return value table are smaller than a preset error threshold value, obtaining the trained federal reinforcement learning model.
5. The method of claim 4, wherein aggregating the updated report value table corresponding to each training sample to obtain an aggregated report table, comprises:
according to the updated return value table of each training sample, determining the confidence coefficient, wherein the confidence coefficient is used for representing the trust degree of each return value table;
and determining the aggregated return report according to the confidence level.
6. A service function chain determination apparatus, the apparatus comprising:
the system comprises an acquisition module, a storage module and a control module, wherein the acquisition module is used for acquiring a plurality of virtual network functions and a plurality of physical nodes, the plurality of virtual network functions have a first preset execution sequence, and each virtual network function is realized on one physical node;
The first determining module is used for determining a plurality of return values corresponding to a plurality of arrangement strategies, each arrangement strategy is used for indicating a physical node corresponding to each virtual network function, and the return values are used for representing the execution success rate of each physical node in the arrangement strategy for realizing the virtual network function;
a third determining module, configured to determine a plurality of return values corresponding to the plurality of arrangement strategies by using a trained federal reinforcement learning model, where the reinforcement learning model includes a plurality of training samples, each training sample includes a total amount of target resources of a plurality of first physical nodes, and an arrangement strategy corresponding to a plurality of sample virtual network functions, and each sample virtual network function is implemented on one of the first physical nodes;
the federal reinforcement learning model is provided with an encrypted public key and a private key corresponding to the public key, wherein the public key is used for encrypting training results of all training samples, and the private key is used for decrypting the encrypted training results sent by all training samples to the federal reinforcement learning model;
the second determining module is used for arranging an arrangement strategy corresponding to the maximum return value as a service function chain of the plurality of virtual network functions;
The training process of the federal reinforcement learning model is as follows:
obtaining a maximum return value table corresponding to each first sample according to the total amount of target resources of each first physical node in the first samples and an arrangement strategy corresponding to the virtual network function of the samples, wherein the first samples are any at least two training samples in the training samples;
carrying out parameter fusion on the maximum return value table corresponding to each first sample to obtain a first federal model;
and training the first federal model by using a second sample until the first federal model converges to obtain the trained federal reinforcement learning model, wherein the second sample is a training sample which is not subjected to parameter fusion in the plurality of training samples.
7. An electronic device, the device comprising: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the service function chain determination method of any one of claims 1-5.
8. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon computer program instructions, which when executed by a processor, implement the service function chain determination method according to any of claims 1-5.
CN202310239063.1A 2023-03-14 2023-03-14 Service function chain determining method, device, equipment, medium and product Active CN115955402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310239063.1A CN115955402B (en) 2023-03-14 2023-03-14 Service function chain determining method, device, equipment, medium and product

Publications (2)

Publication Number Publication Date
CN115955402A CN115955402A (en) 2023-04-11
CN115955402B true CN115955402B (en) 2023-08-01

Family

ID=87282822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310239063.1A Active CN115955402B (en) 2023-03-14 2023-03-14 Service function chain determining method, device, equipment, medium and product

Country Status (1)

Country Link
CN (1) CN115955402B (en)

Also Published As

Publication number Publication date
CN115955402A (en) 2023-04-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant