CN115396495B - Fault handling method for factory micro-service system in SDN-FOG environment - Google Patents
- Publication number
- CN115396495B CN202211005627.7A
- Authority
- CN
- China
- Prior art keywords
- fault
- workflow
- node
- micro
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
- H04L41/0823—Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
- H04L43/0817—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0876—Network utilisation, e.g. volume of load or congestion level
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- Telephonic Communication Services (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a fault handling method for a factory micro-service system in an SDN-FOG environment and relates to the technical field of the industrial Internet of Things. It develops a fault handling scheme for the micro-service system of a cyber-physical smart factory in an SDN-FOG environment. The method establishes a system model that matches actual factory operation scenarios, builds a fault processing architecture for a micro-service system based on the SDN-FOG environment, formulates an integer programming problem that simultaneously considers network resource constraints, workflow response time, load, energy consumption, and fault coping, designs the logic of an application-layer fault coping program based on that problem, and obtains the optimal response strategy for two adaptation-related optimization problems using the Gurobi solver. For scenarios with strict requirements on fault handling response time, a suboptimal heuristic algorithm is designed to simplify the procedure and address the scalability of the proposed optimization problem; the resulting solution is adaptive and suited to the networks of real factory scenarios.
Description
Technical Field
The invention relates to the technical field of industrial Internet of things, in particular to a fault handling method of a factory micro-service system in an SDN-FOG environment.
Background
In recent years, Industry 4.0 has been regarded as an important driving force of the new industrial revolution, and the industrial Internet of Things (IIoT) has attracted a great deal of attention because it can serve a variety of real factory application scenarios, such as remote adaptation and configuration, intelligent operation and maintenance, and cooperative control of devices. IIoT connects physical entities that have computing power on the basis of Internet technology standards, and supports production with information and data technology through industrial data modeling, management, and analysis. IIoT generally has stringent quality of service (QoS) requirements, such as very short response times, load balancing, energy consumption, and reliability. Because cloud computing servers are typically far from field devices, the resulting delay makes sufficiently short response times hard to achieve, and the fog computing paradigm has emerged to support applications with stringent response time requirements. Fog computing deploys computing resources closer to end users: edge fog devices can quickly compute and analyze data locally and transmit the relevant on-demand data streams from the locations where events occur to the core platform, improving overall network efficiency. To improve the management efficiency of edge fog computing devices, Software Defined Networking (SDN) technology may be introduced. SDN is a paradigm in which the network is programmed by SDN controllers, enabling programmable route optimization, and it remains scalable and flexible when new devices are added to the infrastructure. Furthermore, the SDN architecture allows the network controller to monitor the network and computing devices and to collect performance information from the infrastructure. Using multiple coordinated SDN controllers in the fog infrastructure of the industrial Internet of Things can improve the QoS of the overall infrastructure.
After an SDN-based fog infrastructure is deployed in a factory, the factory can be transformed into a cyber-physical smart factory by means of IIoT. Digitizing the factory environment helps realize new modes of automation and enables more flexible and safer operation.
In an SDN-FOG environment, factory operation may be implemented with a micro-service architecture, a promising architecture that breaks monolithic software into a set of loosely coupled, containerized micro-services, supports distributed deployment, and links the micro-services into multiple micro-service chains to serve requests. This can significantly shorten the time needed to change the system and apply the changes to the production environment, reduce development and maintenance costs, and increase flexibility. Currently, the optimization problems related to micro-service development, deployment, extension, and maintenance in SDN-FOG environments fall into three main categories. The first is the decentralized computation distribution problem, which studies, in a fog computing environment, which nodes the micro-services use and which part of the application each node should host. This problem generally takes average response time as the optimization target and assumes that the network is a static entity providing a specific delay; the SDN devices collect information from the infrastructure and then minimize network delay through techniques such as route optimization. However, optimization is generally considered only when the architecture is designed, and micro-service redeployment under fault conditions is rarely considered.
The second is the optimal controller placement problem. The delay between an SDN switch and its controller affects the delay between any two devices in an SDN architecture. This problem generally assumes that traffic cannot change in response to the delay achieved by the network, and studies the relationship between controller placement, network topology, and the way traffic is steered in the network in order to optimize control delay and data transmission delay. Again, optimization is generally considered only when the architecture is designed, and real-time control of traffic under fault conditions is rarely considered. The third is the path optimization problem, which generally treats system response time, a combination of execution time and transmission delay, as the QoS index to be optimized. Such work typically formulates a multi-objective optimization problem that jointly considers service execution time and transmission delay in order to arrange the most suitable paths for the workflows, but it addresses route optimization during normal operation and does not consider rerouting under fault conditions.
The application of micro-services in industrial scenarios has attracted interest in the research community. However, as noted above, most existing research on applying micro-services to smart factories focuses on general architecture principles and optimized deployment schemes. Only a few works consider fault coping schemes under a micro-service architecture, and their scope is generally narrow, providing only limited forms of adaptation (such as self-repair and location adaptation at runtime). These studies combine the micro-service architecture with adaptive systems, which are designed to monitor their own behavior and alter their configuration at runtime in order to maintain and enhance the quality attributes of the micro-service architecture under uncertain operating conditions (e.g., workload changes and risk of failure). When studying the application of micro-services in real factories, it is now necessary to consider dynamic network topology changes, workflow changes, fault prevention, and fault handling in addition to the key factors of execution time and delay.
Designing an adaptive system that enables automatic fault handling involves making design decisions about the environment and the system itself while observing the network environment, and then selecting an appropriate adaptation mechanism. In a micro-service architecture based on the SDN-FOG environment, the design space for adaptive decisions is more complex because of the large number of runtime components and their independence and highly dynamic nature. The primary challenge is how to develop monitoring and adaptation mechanisms that cope with the diversity of quality attributes of a micro-service architecture. The main optimization points that may be considered include: 1) response time; 2) node reliability; 3) network overhead; 4) load balancing (running micro-services on only one or a few nodes reduces service execution performance, so performance degradation due to increased node load generally needs to be considered in a fog computing scenario).
The characteristics of the micro-service architecture, namely independent and frequent deployment, the need for a high degree of automation, and a complex operating architecture, may drive further research and development of adaptive systems. An adaptive system, in turn, provides a control-oriented view, and research on fault coping schemes is an important part of designing one. Current work on adaptation schemes for micro-service architectures mainly follows two approaches. The first draws on a large body of theoretical and practical results to design a dynamic programming algorithm that selects the best adaptation strategy for each micro-service. While the system runs, information in the network is continuously collected to determine the current working state of the controlled object; according to the application's settings, the adaptive control rule is triggered when a set threshold is reached, so that the structure or parameters of the system are adjusted in real time and the system always operates automatically in an optimal or suboptimal state. The second learns new adaptation strategies from past adaptation results by means of reinforcement learning, improving the quality attributes of the whole micro-service system and performing strategy reasoning and multi-objective optimization under uncertainty in order to satisfy multiple, possibly conflicting QoS requirements. From the perspective of the solution model, reinforcement learning is applied to adaptation scheme design for micro-service architectures because of its significant advantages in solving strategic decision problems.
From the above introduction and analysis of the background, it can be seen that, for a cyber-physical smart factory in an SDN-FOG environment, designing a fault handling scheme for the micro-service system mainly faces three difficulties: 1) The proposed scheme must not only handle faults but also satisfy basic QoS indexes, such as response time, energy consumption, and load, when studying the route optimization stage of fault recovery. 2) Prior work on micro-service systems in industrial settings rarely considers the design of fault handling schemes, and fault handling for micro-service systems involves more than one problem, such as fault prevention and rerouting cost, as many of which as possible should be taken into account. 3) The number of workflows, fog nodes, and micro-services required in real industrial scenarios affects the proposed algorithm, so the designed scheme needs to be flexible in real scenarios and the algorithm should respond quickly to workflows that change dynamically over time.
It can be seen that research on fog computing, SDN, and micro-services has been relatively independent; research that combines the three to study the development, deployment, management, and extension of micro-services in an SDN-FOG environment is limited, and research on fault handling in this environment is scarcer still. Prior work on micro-service systems in industrial scenarios has thoroughly studied optimized deployment schemes but rarely considers adaptive design. Only a few works address the specific challenges of developing adaptation schemes for micro-service systems, and although fault handling is a key part of adaptive design, related research is also scarce. Existing solutions for fault handling in micro-service architectures generally have a narrow scope, focusing on handling faults and scheduling new workflows without adequately accounting for fault prevention and rerouting cost. The quality attributes of a micro-service system are diverse, yet the design of fault coping schemes lacks comprehensive consideration of response time, node reliability, network overhead, and load.
Accordingly, those skilled in the art are working to develop a fault handling method for a factory micro-service system in an SDN-FOG environment: building a fault processing architecture for a micro-service system based on the SDN-FOG environment, formulating an integer programming problem that simultaneously considers network resource constraints, workflow response time, load, energy consumption, and fault coping, and obtaining the optimal response strategy with the Gurobi solver. For scenarios with strict requirements on fault handling response time, a suboptimal heuristic algorithm is designed to simplify the procedure and address the scalability of the optimization problem, yielding an adaptive solution suited to the networks of real factory scenarios.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, the present invention aims to develop a fault handling scheme for the micro-service system of a cyber-physical smart factory in an SDN-FOG environment. The method establishes a system model that matches actual factory operation scenarios, builds a fault processing architecture for a micro-service system based on the SDN-FOG environment, formulates an integer programming problem that simultaneously considers network resource constraints, workflow response time, load, energy consumption, and fault coping, designs the logic of an application-layer fault coping program based on that problem, and obtains the optimal response strategy for two adaptation-related optimization problems using the Gurobi solver. For scenarios with strict requirements on fault handling response time, a suboptimal heuristic algorithm is designed to simplify the procedure and address the scalability of the proposed optimization problem; the resulting solution is adaptive and suited to the networks of real factory scenarios.
In order to achieve the above object, the present invention provides a fault handling method for a factory micro service system in an SDN-FOG environment, comprising the following steps:
step 1, establishing a micro-service system fault processing architecture based on an SDN-FOG environment;
step 2, modeling a system, describing the relationship among infrastructure, a transmission network and fog computing equipment;
step 3, constructing an integer programming problem and a fault coping program;
step 4, solving the optimal coping strategy by using a solver;
step 5, using a heuristic algorithm for fault handling in scenarios with particularly strict requirements on fault response time.
Further, the micro-service system fault handling architecture of the step 1 based on the SDN-FOG environment comprises an application layer, a control layer and a base layer.
Further, the application layer includes fault handling programs, timers, and functionality programs.
Further, the control layer includes various types of components required for the fault handling scheme.
Further, the base layer deploys switches supporting SDN control, and the switches have different fault probabilities in different time slots; a central master controller is connected to the switches to summarize network topology and traffic information, and dynamically programs and configures the switches through a southbound interface; each switch is connected to one fog node, and the pair is treated as a single node; each fog node corresponds to a set of industrial Internet of Things devices and fog servers.
Further, the step 1 includes the following steps:
step 1.1, sensors connected to factory field devices continuously acquire information, and the sensor data is preprocessed and sent to the connected switch supporting SDN control; the network monitoring component collects node information and link information, covering field devices, fog nodes and switches, from the switches and sends the information to the application layer so that network traffic is continuously monitored;
step 1.2, the fault detection assembly judges whether the transmission network has faults according to the information collected by the network monitoring assembly; if the switch or the fog node or the link is detected to be faulty, the network monitoring component sends the current new network topology and the micro service set supported by each fog node to a fault coping program;
step 1.3, the fault coping program computes in real time and reallocates resources to the workflows that pass through the faulty node or link; the node configuration component allocates resources to the workflows by dispatching forwarding tables, and applies the fault response decisions made by the fault coping program in the application layer to the switches and the fog nodes;
step 1.4, refreshing a timer after configuration is completed, reporting a fault type and converting to manual processing if micro services in part of workflow cannot be completed due to faults of part of nodes or links;
step 1.5, activating a fault prevention component after a complete period of the timer, actively reconstructing the network periodically, optimizing a selected path of the workflow, improving the overall service quality and reducing the fault probability.
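Steps 1.2 to 1.4 can be illustrated with a minimal Python sketch. It is not the patented fault coping program: the names `Workflow`, `detect_faults` and `handle_faults`, the report format, and the rule of picking the first surviving node that supports a micro-service are all assumptions made for illustration, and link faults and resource limits are ignored.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    path: list       # node hosting each micro-service, in order (assumed encoding)
    services: list   # micro-services the workflow needs, in order


def detect_faults(topology, reports):
    """Step 1.2 (sketch): nodes whose health report is False are treated as failed."""
    return {n for n, healthy in reports.items() if not healthy and n in topology}


def handle_faults(topology, supported, workflows, failed):
    """Steps 1.3-1.4 (sketch): reassign affected workflows to surviving nodes.

    Returns the names of workflows that cannot be completed and would be
    escalated to manual processing.
    """
    alive = [n for n in sorted(topology) if n not in failed]
    manual = []
    for wf in workflows:
        if not failed.intersection(wf.path):
            continue  # workflow does not pass through a failed node
        new_path = []
        for svc in wf.services:
            candidates = [n for n in alive if svc in supported[n]]
            if not candidates:
                new_path = None  # no surviving node supports this micro-service
                break
            new_path.append(candidates[0])
        if new_path is None:
            manual.append(wf.name)
        else:
            wf.path = new_path
    return manual
```

In the method itself the reassignment is decided by the integer program or the heuristic algorithm rather than by a first-fit rule; the sketch only shows how detection, reallocation and escalation fit together.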
Further, the step 2 includes the steps of:
step 2.1, modeling a transmission network;
step 2.2, modeling the fog node and the supported micro-service set;
step 2.3, modeling workflow;
step 2.4, modeling the service quality index.
Further, the step 3 includes the following steps:
step 3.1, defining decision variables;
step 3.2, constructing resource constraint;
step 3.3, constructing an optimization target related to the decision variable;
step 3.4, constructing decision variable related constraints;
step 3.5, constructing an objective function.
Further, the step 3.5 of constructing the objective function includes a problem of fault recovery of the network and a problem of periodic reconstruction of the network.
Further, the step 4 solver includes a commercial solver Gurobi.
In a preferred embodiment of the present invention, a micro-service system fault processing architecture suitable for a cyber-physical smart factory is provided, which uses SDN and fog computing technologies and gives a description of the fault recovery and fault prevention problems. The corresponding optimization problems take the form of integer programs, and the Gurobi commercial solver can be used to obtain an optimal response strategy.
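To make the integer programming form concrete, the following Python sketch solves a toy assignment problem by exhaustive enumeration. It is a stand-in for illustration only: brute force replaces the Gurobi solver, and the function `solve_assignment`, its parameters, the one-node-per-workflow decision variable and the weighted-sum objective are assumptions, not the formulation of the invention.

```python
from itertools import product


def solve_assignment(demand, capacity, delay, energy, w_time=1.0, w_energy=1.0):
    """Enumerate every assignment of workflows to nodes; return the cheapest feasible one.

    demand[k]    : processing units required by workflow k
    capacity[i]  : processing units available on node i
    delay[k][i]  : response time if workflow k runs on node i
    energy[k][i] : energy consumed if workflow k runs on node i
    """
    n_nodes = len(capacity)
    best_cost, best_plan = float("inf"), None
    # plan[k] = i means "workflow k is assigned to node i" (integer decision variable)
    for plan in product(range(n_nodes), repeat=len(demand)):
        load = [0.0] * n_nodes
        for k, i in enumerate(plan):
            load[i] += demand[k]
        # Resource constraint: no node may exceed its capacity.
        if any(load[i] > capacity[i] for i in range(n_nodes)):
            continue
        # Objective: weighted sum of response time and energy consumption.
        cost = sum(w_time * delay[k][i] + w_energy * energy[k][i]
                   for k, i in enumerate(plan))
        if cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_plan, best_cost
```

Enumeration grows exponentially with the number of workflows, which is the scalability concern that motivates both a dedicated solver and the suboptimal heuristic for time-critical cases.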
The proposed method considers the maximum utilization and the fault probability of links and nodes, and optimizes the load of network devices and the probability of network faults. Because the traffic demand in the network and the fault probability of nodes or links change over time, the method takes fault prevention into account and dynamically reallocates resources at a small rerouting cost within a predefined time period, so that the network remains in an optimal state.
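The two preventive checks and the rerouting cost described above might be expressed as follows. This is a hedged sketch: the helper names `link_ok` and `rerouting_cost`, the threshold `p_max`, and the definition of rerouting cost as the number of newly used links are assumptions chosen for illustration and are not taken from the patent.

```python
def link_ok(flow, capacity, fail_prob, max_utilization, p_max):
    """Return True if a link respects the utilization cap and the failure-probability limit.

    max_utilization plays the role of a maximum link utilization (a fraction of capacity);
    p_max is a hypothetical upper bound on the acceptable failure probability.
    """
    return flow <= max_utilization * capacity and fail_prob <= p_max


def rerouting_cost(old_path, new_path):
    """Assumed cost model: count the links used by the new path but not by the old one."""
    old_links = set(zip(old_path, old_path[1:]))
    new_links = set(zip(new_path, new_path[1:]))
    return len(new_links - old_links)
```

A periodic reconstruction step could then prefer, among the candidate paths that pass `link_ok` on every link, the one with the smallest `rerouting_cost`.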
The proposed method can optimize network load and energy consumption to a certain extent, guarantee the required quality of service level, and reconfigure the network in real time when a node or link fails; it also considers the diversity of the quality attributes of the micro-service system, and the weights of these attributes can be set according to actual conditions.
The proposed suboptimal heuristic method can quickly respond to a workflow which dynamically changes with time, and the solution is an adaptive method and is applicable to a network of actual factory scenes.
In actual operation deployment, the microservices may be logically grouped according to business requirements and quality requirements, which corresponds to the set of microservices supported by each fog node, and then may be collectively managed in a hierarchical or distributed structure with separate, application-aware servers.
Compared with the prior art, the invention has the following obvious substantial characteristics and obvious advantages:
1. The method integrates SDN, fog computing and micro-service technology, designs a micro-service system fault processing architecture based on the SDN-FOG environment while guaranteeing common service quality indexes, models actual factory operation scenarios, and gives descriptions of fault handling and fault prevention that are used to design the fault coping program.
2. The diversity of the quality attributes of the micro-service system is taken into account: response time, node reliability, energy consumption overhead and load are jointly considered in the design of the fault coping scheme, and the weights of these attributes can be set according to actual conditions.
3. The scheme considers the maximum utilization of links and nodes to prevent faults caused by overload, considers the fault probability of links and nodes to prevent faults by limiting that probability, and considers the cost of changing network flows during rerouting.
4. In actual operation, the microservices may be logically grouped according to business and quality requirements, corresponding to the set of microservices supported by each fog node, and then may be collectively managed in a hierarchical or distributed structure with separate, application-aware servers.
The conception, specific structure, and technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, features, and effects of the present invention.
Drawings
FIG. 1 is a micro-service system failure handling architecture based on SDN-FOG environment in accordance with a preferred embodiment of the present invention;
FIG. 2 is a detailed structure of a base layer of a preferred embodiment of the present invention;
FIG. 3 is a flow of micro-service system failure handling based on SDN-FOG environment in accordance with a preferred embodiment of the present invention;
FIG. 4 is a flowchart of a heuristic algorithm of a preferred embodiment of the present invention.
Detailed Description
The following description of the preferred embodiments of the present invention refers to the accompanying drawings, which make the technical contents thereof more clear and easy to understand. The present invention may be embodied in many different forms of embodiments and the scope of the present invention is not limited to only the embodiments described herein.
In the drawings, like structural elements are referred to by like reference numerals and components having similar structure or function are referred to by like reference numerals. The dimensions and thickness of each component shown in the drawings are arbitrarily shown, and the present invention is not limited to the dimensions and thickness of each component. The thickness of the components is exaggerated in some places in the drawings for clarity of illustration.
The invention provides a fault coping method of an intelligent factory micro-service system in an SDN-FOG environment, which comprises the following steps:
Step one: for a cyber-physical smart factory, a micro-service system fault processing architecture based on the SDN-FOG environment, as shown in fig. 1, is provided, which mainly operates through the following steps:
S1, fog infrastructure and industrial Internet of Things devices are deployed in the factory; sensors connected to factory field devices continuously collect information, and the sensor data is preprocessed and then sent to the connected switch supporting SDN control; the network monitoring component gathers node information and related link information, covering field devices, fog nodes, and switches, from the switches and sends the information to the application layer so that network traffic is continuously monitored.
S2, the fault detection component judges whether the transmission network has a fault according to the information collected by the network monitoring component. If a fault at a switch, fog node or link is detected, the network monitoring component sends the current new network topology and the set of micro-services supported by each fog node to the fault coping program.
S3, the fault coping program computes in real time and reallocates resources to the workflows that pass through the faulty node or link. The node configuration component allocates resources to the workflows by dispatching forwarding tables, and applies the fault response decisions made by the fault coping program in the application layer to the switches and the fog nodes.
S4, refreshing the timer after the configuration is completed, and if the micro-service in part of the workflow cannot be completed because of the fault of part of nodes or links, reporting the fault type and converting to manual processing.
S5, the fault prevention component is activated after a complete period of the timer and actively reconstructs the network periodically, optimizing the selected paths of the workflows on the premise of low rerouting cost, improving the overall service quality and reducing the fault probability.
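The interplay between the timer refresh in S4 and the periodic trigger in S5 can be sketched as below. The class name `PreventionTimer`, the discrete `tick` interface and the unit of time are hypothetical; the patent does not specify how the timer is implemented.

```python
class PreventionTimer:
    """Sketch of a timer that triggers fault prevention after one full quiet period."""

    def __init__(self, period):
        self.period = period   # length of one complete timer period (arbitrary units)
        self.elapsed = 0

    def refresh(self):
        """Called after a fault-driven reconfiguration completes (S4)."""
        self.elapsed = 0

    def tick(self, dt=1):
        """Advance time; return True when a full period has passed since the last refresh (S5)."""
        self.elapsed += dt
        if self.elapsed >= self.period:
            self.elapsed = 0
            return True
        return False
```

Under these assumptions, a controller loop would call `refresh()` whenever fault recovery reconfigures the network and run the periodic reconstruction whenever `tick()` returns True.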
Specifically, as shown in fig. 1, the micro-service system fault handling architecture based on the SDN-FOG environment has three layers: 1) the application layer, which includes the fault coping program, the timer, and other functional programs; 2) the control layer, which includes the components required by the fault coping scheme; 3) the base layer, in which a plurality of switches supporting SDN control are deployed. The switch nodes have different failure probabilities in each time slot. The central master controller of the plant is connected to the switches to aggregate network topology and traffic information and to configure the switches dynamically and programmatically through a southbound interface. As shown in fig. 2, each switch in the architecture is connected to one fog node, and the pair is treated as a single node; each fog node corresponds to a group of industrial Internet of Things devices and a fog server. In this plant scenario, a number of functional applications are deployed: the status of the plant equipment is collected by the relevant sensors and processed in the fog server, which then issues the corresponding commands so that the smart factory is continuously monitored and managed. These applications are designed with a micro-service architecture and therefore consist of independent services, each performing a specific type of processing, whose functions can be requested individually or combined into workflows.
As shown in the right half of fig. 1, the base layer can be divided into three parts: the physical part comprises the physical equipment of the plant; the network part comprises the switches supporting SDN and the master controller; the computing part comprises the industrial Internet of Things devices and the fog servers.
Step two: the base layer is modeled in detail to describe the relationships among the infrastructure, the transmission network, and the fog computing devices. This mainly comprises the following steps:
S1, modeling the transmission network: the network infrastructure is denoted $G = \{N, L\}$, where $N$ is the set of nodes and $L$ is the set of links connecting different switches. A node $i \in N$ is a tuple $i = \langle p_i, r_i \rangle$, where $p_i$ is the computational processing power of the fog node (units/s), one unit representing one micro-cycle, and $r_i$ is the RAM of the node (Mb). A link $l_{ij} \in L$ is a tuple $l_{ij} = \langle d_{ij}, c_{ij} \rangle$, where $d_{ij}$ is the delay of the link (ms) and $c_{ij}$ is the maximum capacity of the link (Mb/s); node $i$ is the source of the link and node $j$ is its destination. To avoid overload, $\mu_1$ denotes the maximum link utilization and $\mu_2$ the maximum node utilization. For fault prevention, $P_i(t)$ denotes the failure probability of node $i$ and $P_{ij}(t)$ the failure probability of link $l_{ij}$. The network topology can then be represented by a matrix $C_{N \times N}$ and the propagation delay of the links by a matrix $D_{N \times N}$.
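As an illustration of this network model, the following sketch builds the two matrices from node and link tuples. The three-node topology and all numeric values are invented, and it is an assumption here that $C$ stores the link capacity $c_{ij}$ (0 where no link exists) and that $D$ stores the link delay $d_{ij}$ (infinity where no link exists).

```python
import math

# Hypothetical three-node plant network; all values are invented for illustration.
# Node i = <p_i, r_i>: processing power (units/s) and RAM (Mb).
nodes = {0: (2000.0, 512.0), 1: (1500.0, 256.0), 2: (3000.0, 1024.0)}
# Directed link l_ij = <d_ij, c_ij>: delay (ms) and maximum capacity (Mb/s).
links = {(0, 1): (2.0, 100.0), (1, 2): (3.0, 50.0), (0, 2): (7.0, 100.0)}


def build_matrices(nodes, links):
    """Return (C, D): capacity and delay matrices indexed by node id 0..n-1."""
    n = len(nodes)
    C = [[0.0] * n for _ in range(n)]       # assumed: 0 means "no link"
    D = [[math.inf] * n for _ in range(n)]  # assumed: inf means "no link"
    for i in range(n):
        D[i][i] = 0.0                       # no delay from a node to itself
    for (i, j), (delay, capacity) in links.items():
        C[i][j] = capacity
        D[i][j] = delay
    return C, D


C, D = build_matrices(nodes, links)
```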
S2, modeling the fog nodes and the supported micro-service sets: consider $X$ different micro-services. A micro-service $x \in X$ can be represented by a tuple $x = \langle pc_x, r_x \rangle$, where $pc_x$ is the processing power that micro-service $x$ requires per unit of traffic (measured in the number of micro-cycles required) and $r_x$ is the amount of RAM (Mb) that micro-service $x$ requires. A matrix $NF_{N \times X}$ represents the set of micro-services supported by each node: $NF_{(i,x)} = 1$ means that node $i$ supports micro-service $x$. In an actual deployment, the micro-services can be grouped logically according to service requirements and quality requirements, and each group corresponds to the micro-service set supported by a fog node.
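A small sketch of how the support matrix might be queried. The micro-service names, their values, and the matrix contents are invented. The processing-time formula (required micro-cycles divided by node power) is an assumption that follows from the stated units, not a formula quoted from the text.

```python
# Hypothetical micro-service catalogue: name -> (pc_x, r_x),
# i.e. micro-cycles per unit of traffic and RAM in Mb.
microservices = {"filter": (40.0, 64.0), "detect": (120.0, 128.0), "actuate": (10.0, 32.0)}
service_index = {name: k for k, name in enumerate(microservices)}

# NF[i][x] == 1 means node i supports micro-service x (invented values).
NF = [
    [1, 0, 1],  # node 0: filter, actuate
    [1, 1, 0],  # node 1: filter, detect
    [0, 1, 1],  # node 2: detect, actuate
]


def supporting_nodes(NF, x):
    """Return the ids of all nodes that support micro-service index x."""
    return [i for i, row in enumerate(NF) if row[x] == 1]


def processing_time_ms(pc_x, flow, p_i):
    """Assumed model: (micro-cycles per unit * traffic units) / (units per second)."""
    return 1000.0 * pc_x * flow / p_i
```

For example, `supporting_nodes(NF, service_index["detect"])` lists the nodes that could host the hypothetical `detect` service.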
S3, modeling the workflows: under this architecture only loop-free routing is considered, i.e. no node or link is used twice in the routing of a workflow. $F$ is the set of workflows. Each workflow executes a particular function and chains between 1 and $|Y|$ micro-services, depending on how many micro-services the function requires, with $|Y| < X$. A workflow $f \in F$ is an ordered tuple $f = \{m_1, m_2, \ldots, m_{|f|-1}, m_{|f|}\}$, in which each element $m_a$ has exactly the same format and value as one micro-service $x$ in $X$, i.e. $m_a \in X$, and $|f| \le |Y|$. $C_f(t)$ denotes the traffic of workflow $f$ in time slot $t$. To execute a workflow, data must flow from the node that initiates the workflow to the node that executes $m_1$, from there to the node that executes $m_2$, and so on. After the last functional micro-service $m_{|f|-1}$ is completed, $m_{|f|}$ indicates that the data is to be returned to the plant or delivered to the control layer. The micro-services requested by each workflow are represented by a matrix $R_{F \times X}$, whose entry for the pair $(f, x)$ indicates that workflow $f \in F$ requests micro-service $x \in X$ in time slot $t$. The starting node of workflow $f$ is denoted $s_f$; after execution, the results of workflow $f$ are delivered to node $d_f$, where the information is aggregated and either acted back on the factory equipment or uploaded to the control layer.
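The workflow model can be illustrated with a short sketch that derives the request matrix from ordered micro-service chains. The function names and the integer encoding of micro-services are hypothetical. The validity check assumes, in line with the constraints described later, that a workflow does not request the same micro-service twice.

```python
def request_matrix(workflows, num_services):
    """Build R where R[f][x] == 1 if workflow f requests micro-service x."""
    R = [[0] * num_services for _ in workflows]
    for f, chain in enumerate(workflows):
        for x in chain:
            R[f][x] = 1
    return R


def is_valid_workflow(chain, max_len):
    """A chain is valid if it has 1..max_len services and no repeated service."""
    return 1 <= len(chain) <= max_len and len(set(chain)) == len(chain)


# Two hypothetical workflows over three micro-services (indices 0, 1, 2).
workflows = [[0, 1], [2]]
R = request_matrix(workflows, 3)
```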
S4, modeling the quality-of-service indices: 1) Each fog node has two modes: working and idle. If no micro-service is activated on a fog node, the fog node is in idle mode. $e_i$ denotes the energy consumption of node $i$ in working mode; in idle mode the node consumes only a fraction of that energy, $e_0 + e_{idle}$ (mainly the operating energy $e_0$ of the switch plus the static energy $e_{idle}$ of the fog node). The current state of the fog nodes is given by $S_{1 \times N}(t)$, with entries in $\{0, 1\}$. Let $E(t)$ be the energy consumption of the network in time slot $t$. 2) Each workflow $f$ has a maximum processing delay that it can tolerate and a maximum allowable propagation delay. Let $T(t)$ be the response time of the network for all workflows in time slot $t$. 3) The link failure probability and the node failure probability along the route of workflow $f$ in the network should be smaller than the predefined thresholds $M_l$ and $M_n$, respectively. 4) The cost of network rerouting in time slot $t$ is measured by the number of routes that must be changed under the new network configuration compared with the original network configuration.
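Two of these indices lend themselves to a short sketch. The energy function assumes that a working node contributes $e_i$ and an idle node contributes $e_0 + e_{idle}$, which is one plausible reading of the description rather than the patent's equation. The rerouting cost counts the workflows whose route differs between two configurations. All names and values are illustrative.

```python
def network_energy(states, e_work, e0, e_idle):
    """Assumed form of E(t): working nodes cost e_i, idle nodes cost e0 + e_idle.

    states: list of 0/1 flags (1 = working), e_work: list of e_i per node.
    """
    return sum(e if s == 1 else e0 + e_idle for s, e in zip(states, e_work))


def reroute_cost(old_routes, new_routes):
    """Number of workflows whose route changed between two configurations."""
    return sum(1 for f, route in old_routes.items() if new_routes.get(f) != route)


energy = network_energy([1, 0, 1], [10.0, 12.0, 8.0], e0=1.0, e_idle=0.5)
changed = reroute_cost({"f1": [0, 1, 2], "f2": [0, 2]},
                       {"f1": [0, 2], "f2": [0, 2]})
```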
Step three: an integer programming problem is constructed to describe the fault coping program. This mainly comprises the following steps:
S1, defining the 0-1 decision variables: 1) A matrix represents the assignment of nodes and micro-services to workflows in time slot $t$; its entry indicates that micro-service $m_a$ of workflow $f$ is executed on node $i$ in time slot $t$. 2) A matrix represents the network link resources allocated to the workflows; one entry indicates that micro-service $m_a$ of workflow $f$ is routed over link $l_{ij}$ in time slot $t$, and another indicates that workflow $f$ passes through link $l_{ij}$ in time slot $t$. 3) The current state of the fog nodes is given by $S_{1 \times N}(t)$, with entries in $\{0, 1\}$.
S2, constructing the resource constraints: constraint (1) is the link capacity constraint between nodes;
constraint (2) bounds the propagation delay of each workflow $f$ over the links;
constraint (3) is the failure probability constraint of the links serving workflow $f$;
constraint (4) is the RAM capacity constraint of the nodes providing the micro-services;
constraint (5) bounds the processing delay of each workflow in the nodes;
constraint (6) is the failure probability constraint of the nodes serving workflow $f$.
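The link-side constraints (1)-(3) can be illustrated with a feasibility check for one candidate path. This is a sketch under stated assumptions, since the exact formulas are not reproduced in this text: capacity is limited to $\mu_1 c_{ij}$, the propagation delay is summed along the path, and the failure threshold $M_l$ is applied to each link individually. The function name and parameters are hypothetical.

```python
def link_path_feasible(path, links, flow, used, mu1, max_prop_delay, link_fail, M_l):
    """Check a node path against assumed forms of constraints (1)-(3).

    links:     {(i, j): (delay_ms, capacity)}
    used:      {(i, j): traffic already routed over the link}
    link_fail: {(i, j): failure probability of the link}
    """
    total_delay = 0.0
    for i, j in zip(path, path[1:]):
        delay, capacity = links[(i, j)]
        # (1) capacity: existing load plus the new flow within mu1 * capacity.
        if used.get((i, j), 0.0) + flow > mu1 * capacity:
            return False
        # (3) failure probability: assumed to be a per-link threshold.
        if link_fail[(i, j)] >= M_l:
            return False
        total_delay += delay
    # (2) propagation delay: the summed link delays must fit the workflow limit.
    return total_delay <= max_prop_delay


demo_links = {(0, 1): (2.0, 100.0), (1, 2): (3.0, 50.0)}
demo_fail = {(0, 1): 0.01, (1, 2): 0.02}
ok = link_path_feasible([0, 1, 2], demo_links, 10.0, {(1, 2): 30.0},
                        0.9, 6.0, demo_fail, 0.05)
```

The node-side constraints (4)-(6) could be checked in the same style against RAM, processing delay, and the threshold $M_n$.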
S3, constructing the optimization targets related to the decision variables: equation (7) calculates the energy consumption $E(t)$ of the network in time slot $t$;
equation (8) calculates the overall response time $T(t)$ of the network for the workflows in time slot $t$;
equation (9) calculates the cost $Cost(t)$ of network rerouting in time slot $t$.
S4, constructing the constraints related to the decision variables: constraint (10) states that, apart from the node that starts a workflow and the node to which it is delivered after completion, every intermediate node has both incoming and outgoing traffic;
constraint (11) specifies that only loop-free routing is considered under this architecture;
constraint (12) states that the requested micro-service is executed when the workflow reaches the node;
constraint (13) ensures that each requested micro-service is executed on a node that supports that micro-service;
constraint (14) ensures that a workflow does not request the same micro-service more than once;
constraint (15) ensures that micro-services are executed only on nodes through which the workflow passes;
constraint (16) states that a node is in the working state whenever it provides a micro-service to a workflow, where $S_i(t) \in \{0, 1\}$.
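Several of these structural constraints can be checked directly on a candidate solution. The sketch below validates one workflow's path and placement against the verbal statements of constraints (11)-(15) and derives the node states implied by constraint (16). The data layout (a node list for the path, a dictionary for the placement) and the requirement that services appear in chain order along the path are assumptions made for illustration.

```python
def check_assignment(path, chain, placement, NF):
    """Validate one workflow against the verbal constraints (11)-(15).

    path:      list of node ids visited in order
    chain:     ordered list of micro-service indices
    placement: {micro-service index: node id that executes it}
    NF:        NF[i][x] == 1 if node i supports micro-service x
    """
    if len(set(path)) != len(path):        # (11) loop-free routing
        return False
    if len(set(chain)) != len(chain):      # (14) no repeated micro-service
        return False
    position = {node: k for k, node in enumerate(path)}
    last = -1
    for x in chain:
        node = placement.get(x)
        if node is None or node not in position:  # (12), (15)
            return False
        if NF[node][x] != 1:                      # (13) node must support x
            return False
        if position[node] < last:                 # assumed: chain order follows the path
            return False
        last = position[node]
    return True


def working_states(placement, num_nodes):
    """(16): a node is working (1) if it executes at least one micro-service."""
    active = set(placement.values())
    return [1 if i in active else 0 for i in range(num_nodes)]


NF_demo = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]
valid = check_assignment([0, 1, 2], [0, 1, 2], {0: 0, 1: 1, 2: 2}, NF_demo)
```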
S5, constructing the objective functions: two adaptation-related problems are considered in this framework, whose main purpose is fault handling.
1) The fault recovery problem of the network: objective function 1 gives priority to the cost of rerouting network traffic and then optimizes the overall response time of the workflows in the network, aiming at a fast response, where $\alpha_1 > \alpha_2$.
$\min\ \alpha_1 Cost(t) + \alpha_2 T(t)$
2) The periodic reconfiguration problem of the network: after one complete timer period, the network as a whole is optimized periodically. Objective function 2 mainly optimizes the service execution time and the node energy consumption so that the network reaches an optimal state, while keeping the rerouting cost from becoming too high, where $\beta_1 > \beta_2 > \beta_3$.
$\min\ \beta_1 T(t) + \beta_2 E(t) + \beta_3 Cost(t)$
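The effect of the two weightings can be seen by scoring the same candidate configurations under both objectives. The weights and the two candidates below are invented values chosen only to satisfy $\alpha_1 > \alpha_2$ and $\beta_1 > \beta_2 > \beta_3$; the text does not specify any particular weights.

```python
def objective_recovery(cost, T, alpha1=10.0, alpha2=1.0):
    """Objective 1: rerouting cost first, response time second (alpha1 > alpha2)."""
    return alpha1 * cost + alpha2 * T


def objective_reconfig(T, E, cost, beta1=3.0, beta2=2.0, beta3=1.0):
    """Objective 2: response time, then energy, then rerouting cost."""
    return beta1 * T + beta2 * E + beta3 * cost


# Two hypothetical feasible configurations for the same time slot.
candidates = {
    "A": {"cost": 1, "T": 30.0, "E": 50.0},  # few route changes, slower
    "B": {"cost": 3, "T": 18.0, "E": 40.0},  # more route changes, faster
}

best_recovery = min(
    candidates,
    key=lambda k: objective_recovery(candidates[k]["cost"], candidates[k]["T"]),
)
best_reconfig = min(
    candidates,
    key=lambda k: objective_reconfig(candidates[k]["T"], candidates[k]["E"],
                                     candidates[k]["cost"]),
)
```

With these numbers, the recovery objective prefers configuration A (score 40 against 48) because it changes fewer routes, while the periodic reconfiguration objective prefers B (score 137 against 191) because response time and energy dominate.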
Step four: a solver is used to obtain the optimal coping strategy. From step three, the expression of optimization problem 1 is:
$\min\ \alpha_1 Cost(t) + \alpha_2 T(t)$
s.t. (1)-(16)
The expression of optimization problem 2 is:
$\min\ \beta_1 T(t) + \beta_2 E(t) + \beta_3 Cost(t)$
s.t. (1)-(16)
Both problems are integer programming problems and can be fed directly into the commercial solver Gurobi to obtain a numerical solution; the corresponding complete fault coping flow is shown in fig. 3. Optimization problem 2 can be solved directly with Gurobi. For optimization problem 1, if the requirement on fault response time is particularly strict, the suboptimal heuristic algorithm described below can be used instead.
Step five: because the computational complexity of the optimization problems under this architecture is relatively high, a suboptimal heuristic method is provided, as shown in fig. 4. It is suited to fault handling scenarios with strict requirements on fault response time and can respond quickly to network traffic that changes over time. The solution is adaptive and suitable for the network of a real factory. The specific steps are as follows:
S1, the fault detection component activates the fault coping program. When optimization problem 1 is solved, the inputs of the algorithm are the set $F_{effected}$ of workflows that pass through the faulty node or link and the current network topology $G = \{N, L\}$ (if a node or link fails, the network topology and the micro-service sets supported by the nodes may change). Resources are then allocated to each affected workflow $f_i \in F_{effected}$ in turn.
S2, according to the link capacity constraint (1) and the transmission delay constraint (2), starting from the current node CN that executes $m_a \in f_i$, delete all links whose available capacity or transmission delay cannot meet the workflow requirements, and then obtain a list K of paths from the current node CN to the other nodes that support the next required micro-service $m_{a+1}$.
S3, according to the node capacity constraint (4) and the processing delay constraint (5), delete from the list K the nodes whose available capacity or processing delay cannot meet the workflow requirements, and then select from the list K the new node NN that has the shortest execution time and satisfies the failure probability constraint.
S4, append the path to the new node NN to the current path, and delete the current node from the network topology to prevent the workflow from looping. Repeat S1-S4 until workflow $f_i$ is completed, then go to S5.
S5, output the path of workflow $f_i$, with which the workflow is completed with the optimal execution time, and continue to allocate resources for the next affected workflow $f_{i+1}$. Repeat S1-S5 until all affected workflows have been processed, which completes the fault handling.
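The greedy loop of S2-S4 for a single affected workflow can be sketched as follows. This is a simplified illustration, not the patented implementation. It assumes Dijkstra's algorithm over link delays to build the list K, a processing time of $pc_x \cdot flow / p_i$, and a per-node failure threshold. It omits the link failure constraint (3) and the final delivery to $d_f$, and it removes every node already on the path (not only the current node) so that the route stays loop-free. All names and data are hypothetical.

```python
import heapq


def shortest_paths(adj, source):
    """Dijkstra over link delays; returns {node: (delay_ms, path)}."""
    best = {source: (0.0, [source])}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u][0]:
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if v not in best or nd < best[v][0]:
                best[v] = (nd, best[u][1] + [v])
                heapq.heappush(heap, (nd, v))
    return best


def reroute_workflow(chain, flow, start, links, nodes, services, NF, limits):
    """Greedy placement for one workflow; returns (path, placement, time_ms) or None.

    links: {(i, j): (delay_ms, free_capacity)}
    nodes: {i: (power_units_per_s, free_ram, failure_probability)}
    services: {x: (pc_x, r_x)};  NF: {i: set of supported micro-services}
    limits: {"delay": ms, "proc": ms per micro-service, "M_n": node threshold}
    """
    removed, path, placement = set(), [start], {}
    current, used_delay, total = start, 0.0, 0.0
    for x in chain:
        pc_x, r_x = services[x]
        # S2: keep only links that can carry the flow within the delay budget.
        adj = {}
        for (i, j), (delay, cap) in links.items():
            if i in removed or j in removed:
                continue
            if cap >= flow and used_delay + delay <= limits["delay"]:
                adj.setdefault(i, {})[j] = delay
        # S3: filter the candidate nodes and keep the shortest execution time.
        best = None
        for node, (d, seg) in shortest_paths(adj, current).items():
            if node == current or x not in NF[node]:
                continue
            power, ram, p_fail = nodes[node]
            proc = 1000.0 * pc_x * flow / power
            if ram < r_x or proc > limits["proc"] or p_fail >= limits["M_n"]:
                continue
            if used_delay + d > limits["delay"]:
                continue
            if best is None or d + proc < best[0]:
                best = (d + proc, node, d, seg)
        if best is None:
            return None  # no feasible node: hand over to manual processing
        step_time, new_node, d, seg = best
        # S4: extend the path and drop visited nodes so the route cannot loop.
        removed.update(seg[:-1])
        path += seg[1:]
        placement[x] = new_node
        current, used_delay, total = new_node, used_delay + d, total + step_time
    return path, placement, total


demo_nodes = {0: (2000.0, 512.0, 0.01), 1: (1000.0, 256.0, 0.01),
              2: (2000.0, 256.0, 0.01), 3: (2000.0, 256.0, 0.5),
              4: (2000.0, 256.0, 0.01)}
demo_links = {(0, 1): (2.0, 100.0), (0, 2): (5.0, 100.0), (1, 2): (1.0, 100.0),
              (0, 3): (1.0, 100.0), (2, 1): (1.0, 100.0), (2, 4): (2.0, 100.0)}
demo_services = {"a": (100.0, 64.0), "b": (100.0, 64.0)}
demo_NF = {0: set(), 1: {"a", "b"}, 2: {"a", "b"}, 3: {"a"}, 4: {"b"}}
demo_limits = {"delay": 20.0, "proc": 100.0, "M_n": 0.1}
result = reroute_workflow(["a", "b"], 1.0, 0, demo_links, demo_nodes,
                          demo_services, demo_NF, demo_limits)
```

In this toy network the sketch places `a` on node 2 (reached through node 1, which is closer but slower to process) and `b` on node 4; node 3 is skipped because its failure probability exceeds the threshold. A caller handling several affected workflows would also need to update the free capacities between calls.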
The foregoing describes in detail preferred embodiments of the present invention. It should be understood that numerous modifications and variations can be made in accordance with the concepts of the invention without requiring creative effort by one of ordinary skill in the art. Therefore, all technical solutions which can be obtained by logic analysis, reasoning or limited experiments based on the prior art by the person skilled in the art according to the inventive concept shall be within the scope of protection defined by the claims.
Claims (1)
1. A fault handling method for a factory micro-service system in an SDN-FOG environment, characterized in that
the method comprises the following steps:
step 1, establishing a micro-service system fault handling architecture based on an SDN-FOG environment;
step 2, modeling the system to describe the relationships among the infrastructure, the transmission network and the fog computing devices;
step 3, constructing an integer programming problem with two optimization targets: 'fault handling' through fault recovery and 'fault prevention' through periodic reconfiguration;
step 4, using the Gurobi solver to obtain the optimal solution of the 'fault prevention' problem;
step 5, using a heuristic algorithm for the 'fault handling' problem;
the micro-service system fault processing architecture based on the SDN-FOG environment comprises an application layer, a control layer and a base layer;
the application layer comprises a fault processing program, a timer and a functional program;
the control layer comprises various components required by a fault coping scheme, including a network monitoring component, a fault detection component, a node configuration component and a fault prevention component;
the base layer deploys the switches supporting SDN control, the switches have different failure probabilities in each time slot, and the central master controller is connected to the switches to aggregate network topology and traffic information and configures the switches dynamically and programmatically through a southbound interface; each switch is connected to one fog node, and the pair is treated as one node; each fog node corresponds to a group of industrial Internet of Things devices and a fog server;
the step 1 comprises the following steps:
step 1.1, continuously acquiring information by a sensor connected to a factory field device, preprocessing sensor data, and transmitting the sensor data to a connected switch supporting SDN control; the network monitoring component collects node information and link information including field devices, foggy nodes and switches from the switch and sends the information to the application layer to continuously monitor network traffic;
step 1.2, the fault detection assembly judges whether the transmission network has faults according to the information collected by the network monitoring assembly; if the switch or the fog node or the link is detected to be faulty, the network monitoring component sends the current new network topology and the micro service set supported by each fog node to a fault coping program;
step 1.3, the fault coping program computes in real time and reallocates resources to the workflows passing through the faulty node or link; the node configuration component allocates resources to the workflows by dispatching forwarding tables, and applies the fault response decision made by the fault coping program in the application layer to the switches and the fog nodes;
step 1.4, refreshing the timer after the configuration is completed; if a micro-service in some workflow cannot be completed because some nodes or links have failed, reporting the fault type and handing over to manual processing;
step 1.5, activating a fault prevention component after a complete period of a timer, actively reconstructing a network periodically, optimizing a selected path of a workflow, improving the overall service quality and reducing the fault probability;
the step 2 comprises the following steps:
step 2.1, modeling a transmission network: establishing a network topology matrix and a propagation delay matrix of a link;
step 2.2, modeling the fog node and the supported micro service set: representing the micro-service set supported by each node by using a matrix, logically grouping the micro-services according to the service requirement and the quality requirement, and corresponding to the micro-service set supported by each fog node;
step 2.3, modeling the workflows: $F$ is the set of workflows, each workflow executes a particular function, and a workflow $f \in F$ is an ordered tuple $f = \{m_1, m_2, \ldots, m_{|f|-1}, m_{|f|}\}$; to execute a workflow, data must flow from the node that initiates the workflow to the node that executes micro-service $m_1$, from there to the node that executes $m_2$, and so on; after the last functional micro-service $m_{|f|-1}$ is completed, $m_{|f|}$ indicates that the data is to be returned to the plant or delivered to the control layer;
step 2.4, modeling the service quality index: the energy consumption of the network, the response time of the workflow, the predefined threshold of the link failure probability, the predefined threshold of the node failure probability, the rerouting cost;
the step 3 comprises the following steps:
step 3.1, defining the decision variables, including a workflow matrix, the micro-services executed for the workflows, and a network link resource allocation matrix;
step 3.2, constructing the resource constraints, including the link capacity constraint between nodes, the transmission delay constraint of each workflow over the links, the failure probability constraint of the links serving the workflow, the RAM capacity constraint of the nodes providing micro-services, the processing delay constraint of each workflow in the nodes, and the failure probability constraint of the nodes serving the workflow;
step 3.3, constructing the optimization targets related to the decision variables, including energy consumption, overall response time, and rerouting cost;
step 3.4, constructing the constraints related to the decision variables: the requested micro-service is executed when the workflow reaches the node; a workflow does not request the same micro-service multiple times; micro-services are executed only on nodes through which the workflow passes; and a node is in the working state when it provides a micro-service to a workflow;
step 3.5, constructing the objective functions, covering the fault recovery problem of the network and the periodic reconfiguration problem of the network;
the heuristic algorithm comprises the following steps:
S1, the fault detection component activates the fault coping program; when the fault recovery problem is solved, the inputs of the algorithm are the attributes of the set of workflows passing through the faulty node or link and the current network topology, and resources are then allocated to each affected workflow in turn;
S2, according to the link capacity constraint and the transmission delay constraint, starting from the node currently executing, deleting all links whose available capacity or transmission delay cannot meet the workflow requirements, and then obtaining a list of paths from the current node to the other nodes that support the currently required micro-service;
S3, according to the node capacity constraint and the processing delay constraint, deleting from the list the nodes whose available capacity or processing delay cannot meet the workflow requirements, and then selecting the new node that has the shortest execution time and satisfies the failure probability constraint;
S4, adding the path to the new node to the current path, and deleting the current node from the network topology to prevent the workflow from looping; repeating S1-S4 until all micro-services in the workflow are completed, and then going to S5;
S5, outputting the path of the workflow, with which the workflow is completed with the optimal execution time, and continuing to allocate resources for the next affected workflow; repeating S1-S5 until all affected workflows are processed, thereby completing the fault handling.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211005627.7A CN115396495B (en) | 2022-08-22 | 2022-08-22 | Fault handling method for factory micro-service system in SDN-FOG environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115396495A (en) | 2022-11-25 |
CN115396495B (en) | 2023-12-12 |
Family
ID=84120126
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202211005627.7A | Active | CN115396495B (en) | 2022-08-22 | 2022-08-22 | Fault handling method for factory micro-service system in SDN-FOG environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115396495B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN115396495A (en) | 2022-11-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||