WO2014060465A1 - Control system and method for supervisory control and data acquisition - Google Patents

Control system and method for supervisory control and data acquisition

Info

Publication number
WO2014060465A1
WO2014060465A1 (PCT/EP2013/071608)
Authority
WO
WIPO (PCT)
Prior art keywords
server application
scada
instance
cloud
scada server
Application number
PCT/EP2013/071608
Other languages
French (fr)
Inventor
Stuart Goose
Jonathan Kirsch
Dong Wei
Original Assignee
Siemens Aktiengesellschaft
Application filed by Siemens Aktiengesellschaft
Publication of WO2014060465A1 publication Critical patent/WO2014060465A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 1/00 Arrangements for detecting or preventing errors in the information received
    • H04L 1/22 Arrangements for detecting or preventing errors in the information received using redundant apparatus to increase reliability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00 Routing or path finding of packets in data switching networks
    • H04L 45/12 Shortest path evaluation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L 69/40 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 2219/00 Program-control systems
    • G05B 2219/30 Nc systems
    • G05B 2219/32 Operator till task planning
    • G05B 2219/32406 Distributed scada
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 1/00 Arrangements for detecting or preventing errors in the information received
    • H04L 2001/0092 Error control systems characterised by the topology of the transmission link

Definitions

  • the invention relates to a control system, particularly a supervisory control and data acquisition system, and a method for supervisory control and data acquisition.
  • "SCADA": supervisory control and data acquisition.
  • a component in a system is considered as faulty once its behaviour is no longer consistent with its specification.
  • a component exhibits a Byzantine failure when its behaviour is arbitrary and malicious, possibly involving collusion with other faulty components. If the component has a so-called fail-stop failure, the component changes to a state that permits other components to detect that a failure has occurred and then stops.
  • Byzantine failures of components in the system can be most disruptive for the respective system.
  • a supervisory control and data acquisition, SCADA, system comprises a first instance of a SCADA server application instantiated on a first network component of a first cloud; and at least one second instance of a SCADA server application instantiated on a first network component of a second cloud.
  • the first cloud and the second cloud may be operated by different cloud service providers or with independent data centres.
  • the SCADA system may further comprise a fault-tolerant replication engine being configured to logically synchronize the first instance of the SCADA server application and the second instance of the SCADA server application. This enhances the performance and the reliability of the system since the instances are always guaranteed to be commensurate in state.
  • the SCADA system may comprise a third instance of a SCADA server application instantiated on a second network component of the first cloud; a fourth instance of a SCADA server application instantiated on a third network component of the first cloud; and a fifth instance of a SCADA server application instantiated on a fourth network component of the first cloud, wherein the first, third, fourth and fifth instances of the SCADA server application are replicas generated and synchronized according to a fault-tolerant replication protocol.
  • four instances is the minimum number that may guarantee a Byzantine fault-tolerant system against corruption of a single application instance of the Primary system.
  • the SCADA system may further comprise a sixth instance of a SCADA server application instantiated on a second network component of the second cloud; a seventh instance of a SCADA server application instantiated on a third network component of the second cloud; and an eighth instance of a SCADA server application instantiated on a fourth network component of the second cloud, wherein the second, sixth, seventh and eighth instances of the SCADA server application are replicas generated and synchronized according to a fault-tolerant replication protocol.
  • four instances is the minimum number that may guarantee a Byzantine fault-tolerant system against corruption of a single application instance.
  • the provision of further instances in the Hot Standby system further enhances the reliability and fail-safeness of the system.
  • the first and second instances of the SCADA server application may be replicas generated and synchronized according to a fault-tolerant replication protocol. This advantageously allows the system to be fail-safe against failures occurring within the components of the cloud.
  • the replication protocol may be a Byzantine fault-tolerant replication protocol. Such a protocol further enhances resistance of the system against malicious attacks from outside.
  • the SCADA system may further comprise a third instance of a SCADA server application instantiated on a first network component of a third cloud; and a fourth instance of a SCADA server application instantiated on a first network component of a fourth cloud, wherein the first, second, third and fourth instances of the SCADA server application are replicas generated and synchronized according to the fault-tolerant replication protocol.
  • four instances is the minimum number that may guarantee a Byzantine fault-tolerant system against corruption of a single application instance of the Primary system.
  • a method for supervisory control and data acquisition, SCADA, comprises instantiating a first instance of a SCADA server application on a first network component of a first cloud; and instantiating at least one second instance of a SCADA server application on a first network component of a second cloud.
  • the first cloud and the second cloud may be operated by different cloud service providers or with independent data centres.
  • the method may further comprise instantiating a third instance of a SCADA server application on a second network component of the first cloud; instantiating a fourth instance of a SCADA server application on a third network component of the first cloud; instantiating a fifth instance of a SCADA server application on a fourth network component of the first cloud; and synchronizing the first, third, fourth and fifth instances of the SCADA server application according to a fault-tolerant replication protocol.
  • four instances is the minimum number that may guarantee a Byzantine fault-tolerant system against corruption of a single application instance of the Primary system.
  • the method may further comprise instantiating a sixth instance of a SCADA server application on a second network component of the second cloud; instantiating a seventh instance of a SCADA server application on a third network component of the second cloud; instantiating an eighth instance of a SCADA server application on a fourth network component of the second cloud; and synchronizing the second, sixth, seventh and eighth instances of the SCADA server application according to a fault-tolerant replication protocol.
  • four instances is the minimum number that may guarantee a Byzantine fault-tolerant system against corruption of a single application instance.
  • the provision of further instances in the Hot Standby system further enhances the reliability and fail-safeness of the system.
  • a supervisory control and data acquisition, SCADA, system comprises at least one instance of a SCADA server application instantiated on a network component of a cloud; a plurality of Remote Terminal Units, RTUs, or Programmable Logic Controllers, PLCs; and an overlay network over the Internet, wherein the at least one instance of the SCADA server application communicates with the plurality of RTUs or PLCs over network nodes of the overlay network.
  • communication between the at least one instance of a SCADA server application and the plurality of RTUs or PLCs may be subject to a hop-by-hop packet recovery. Hop-by-hop packet recovery advantageously facilitates early recovery from packet losses and leads to a reduced latency in the communication.
  • the plurality of RTUs or PLCs may be configured to send communication messages to the at least one instance of a SCADA server application via a multicasting protocol.
  • Multicasting has the great advantage that the number of messages needed to be generated by the sending party is vastly reduced, thus leading to decreased communication costs and reduced communication resources necessary in the respective network components.
  • the routing decisions for the communication between the at least one instance of a SCADA server application and the plurality of RTUs or PLCs may be based on a performance metric of the overlay network. This facilitates flexible communication for reduced latency and fast reaction times to safety-critical situations in the network in an advantageous fashion.
  • the performance metric of the overlay network may comprise one or more of latency, packet loss rate, message priority, bandwidth costs or predefined policy-based routing.
  • a method for supervisory control and data acquisition, SCADA, comprises instantiating at least one instance of a SCADA server application on a network component of a cloud; connecting a plurality of Remote Terminal Units, RTUs, or Programmable Logic Controllers, PLCs, with the at least one instance of a SCADA server application via the Internet; and setting up an overlay network over the Internet, wherein the at least one instance of the SCADA server application communicates with the plurality of RTUs or PLCs over network nodes of the overlay network.
  • communication between the at least one instance of a SCADA server application and the plurality of RTUs or PLCs may be subject to a hop-by-hop packet recovery.
  • Hop-by-hop packet recovery advantageously facilitates early recovery from packet losses and leads to a reduced latency in the communication.
  • the plurality of RTUs or PLCs may be configured to send communication messages to the at least one instance of a SCADA server application via a multicasting protocol.
  • Multicasting has the great advantage that the number of messages needed to be generated by the sending party is vastly reduced, thus leading to decreased communication costs and reduced communication resources necessary in the respective network components.
  • the routing decisions for the communication between the at least one instance of a SCADA server application and the plurality of RTUs or PLCs may be based on a performance metric of the overlay network. This facilitates flexible communication for reduced latency and fast reaction times to safety-critical situations in the network in an advantageous fashion.
  • the performance metric of the overlay network may comprise one or more of latency, packet loss rate, message priority, bandwidth costs or predefined policy-based routing.
  • FIG. 1 shows a network environment of a SCADA system according to an embodiment of the present invention.
  • Fig. 2 shows a network environment of a SCADA system according to a further embodiment of the present invention.
  • Fig. 3 shows a network environment of a SCADA system according to a further embodiment of the present invention.
  • Fig. 4 shows an overlay network environment according to a further embodiment of the present invention.
  • Fig. 5 shows a network environment of a SCADA system util- izing an overlay network according to yet another embodiment of the present invention.
  • Fig. 6 schematically illustrates a method for supervisory control and data acquisition according to yet another embodiment of the present invention.
  • Fig. 7 schematically illustrates a method for supervisory control and data acquisition according to yet another embodiment of the present invention.
  • like reference numerals denote like or functionally like components, unless indicated otherwise.
  • the Internet in the sense of the present disclosure comprises a global system of interconnected computer networks which utilize commonly agreed upon communication protocols of the standardized Internet Protocol Suite.
  • the Internet is thus a network of networks consisting of a plurality of private, academic, public, commercial, and government networks of local to global scope. Those networks are connected and interconnected by copper wires, fiber-optic cables, wireless connections, and other data transmission means, carrying a vast amount of information, resources and services, and supporting services such as inter alia - but not limited to - email, video conferencing, voice-over-IP, online chat, file transfer, file sharing and media streaming.
  • a cloud within the sense of the present disclosure comprises a shared pool or distributed network of configurable computing resources which are connected through a real-time communication network, for example the Internet.
  • the computing resources may for example include network devices, routers, servers, storage devices, application devices, and services such as storage space, processing capability, memory space or network bandwidth.
  • the resources can be rapidly provisioned and released with minimal management effort or service provider interaction.
  • a cloud enables ubiquitous, convenient, and on-demand network access to users utilizing resources of the cloud for applications. Different clouds are under the responsibility of different cloud service providers.
  • the computing resources of a cloud of a cloud service provider are pooled to serve multiple users using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to user demand.
  • a user of a cloud generally lacks control or knowledge over the exact physical location, distribution and attribution of the provided resources.
  • according to a service level agreement, users of a cloud may be allowed to specify certain resource allocation constraints at a higher level of abstraction.
  • Supervisory Control and Data Acquisition, abbreviated SCADA, within the meaning of the present disclosure refers to an industrial control system, electric grid control system or computer system used in conjunction with monitoring and controlling one or more distributed processes.
  • SCADA systems within the meaning of the present disclosure may refer to systems which coordinate monitoring, surveillance and management of sites or complexes of systems spread out over large areas, in which control actions at the remote premises may be performed automatically by Remote Terminal Units, RTUs, or Programmable Logic Controllers, PLCs.
  • SCADA systems may form the backbone of vital services in the energy domain, such as electricity generation, transmission and distribution.
  • SCADA systems within the sense of this application may in some embodiments involve a separation between so-called Primary systems and so-called Hot Standby systems. Hot Standby systems are redundant systems running simultaneously with the identical Primary system.
  • upon failure of the Primary system, the Hot Standby system provides immediate backup without which the entire system would fail. The switchover may happen automatically, for example upon some agreed upon procedure of error or failure detection.
  • the Hot Standby system at least temporarily replaces the Primary system in full functionality, thereby significantly reducing the time required for a failed system to return to normal operation.
  • a Hot Standby system will have all configuration, state and operational data being mirrored and/or synchronized with the Primary system at any given point in time.
  • the combination of a Primary system with a Hot Standby increases fault tolerance, by having the Primary system control the overall system and interact with the clients, whereas the Hot Standby system receives the same inputs as the Primary system and executes them in the same manner. While the output of the Hot Standby system may typically be suppressed, the Hot Standby system monitors the Primary system and performs a switchover operation upon determining that the Primary is in a state of failure.
  • SCADA systems within the meaning of the present disclosure may alternatively, or additionally, involve SCADA applications or instances of such applications which are replicas of each other, i.e. exact copies of each other that are able to communicate with each other to ensure synchronization of state so that in the event of a partial failure of one or more of the replicas the surviving replicas may continue to provide functionality.
  • the replicas are logically synchronized with one another.
  • One approach for logically synchronizing replicated SCADA systems is employing state machine replication protocols which are designed to ensure safety, i.e. replica consistency, and liveness, i.e. eventual progress, under different fault and synchronization assumptions.
  • for example, under a benign fault-tolerant replication protocol, safety is guaranteed in all executions in which replicas fail only by crashing.
  • under such benign fault-tolerant replication protocols, liveness is ensured during executions in which a majority of the replicas can communicate with one another in a timely manner.
  • in a benign fault-tolerant replication regime, the clients are able to act on a reply when it is received from one or more replicas.
  • in a Byzantine fault-tolerant replication regime, by contrast, a client is forced to wait for reception of at least f+1 identical responses before acting on them, in order to make sure that the content of the reply has been sent by at least one correct replica.
  • State machine replication is an established technique for improving the availability of software applications in a distributed system.
  • in a distributed system, one or several services can be provided by servers. Each service can be provided by one or more servers, and clients invoke a service by making corresponding requests for it.
  • state machine replication protocols can be used to coordinate client interactions with those replicas.
  • a state machine approach implements fault-tolerant services by replicating software for servers or server applications and coordinating client interactions with the server or server application replicas.
  • Fig. 1 shows a network environment 100 of a SCADA system.
  • the network environment 100 may comprise at least two separate clouds 10 and 20, each of the clouds 10 and 20 comprising a plurality of computing resources, i.e. cloud components.
  • the clouds 10 and 20 may generally be managed and serviced by different cloud service providers which may be independent from each other. It may also be possible to have the same cloud service provider for the clouds 10 and 20, if the clouds run with independent data centres. In general, it is desirable to have a separation or causal independence of possible failure causes for the clouds 10 and 20, i.e. if one of the clouds 10 and 20 fails for some cause, the other cloud is very unlikely to fail due to the same cause.
  • the first cloud 10 may include a SCADA server 15 running an instance of a SCADA server application 16.
  • SCADA server application 16 may be executed as one or more application software modules that work together to provide monitoring and control functionality of the SCADA system.
  • the SCADA server application 16 may maintain real-time communication with one or more RTUs or PLCs 30 having a client application 31.
  • the RTUs or PLCs 30 may more generally be contemplated as remote clients within the SCADA system, their number being in principle not limited.
  • Each RTU or PLC 30 may communicate with and aggregate data from one or more local sensors installed within the system being controlled at respective local elements that are involved with implementing an industrial process or critical service.
  • the SCADA server 15 may, via its SCADA server application 16, periodically poll the RTUs or PLCs 30 to gather and process the sensor data, i.e. in a pull messaging mode.
  • the SCADA server 15 may further also use the gathered sensor data to make decisions as to the control of the system and may also issue supervisory control commands to the RTUs or PLCs 30 for the control of their respective local elements.
  • the RTUs or PLCs 30 may also be configured to send messages to the SCADA server 15 without the SCADA server 15 first initiating the communication, i.e. in a push messaging mode.
  • the RTUs or PLCs 30 may send updated sensor data without first being polled by the SCADA server 15.
  • the SCADA server 15 may be accessed and programmed by a user using a human machine interface, HMI, workstation (not shown in Fig. 1).
  • the HMI workstation may periodically query the SCADA server 15 to graphically display the state of each of the RTUs or PLCs 30 for the user.
  • the SCADA server 15 may also push information to the HMI workstation without the HMI workstation first requesting the information.
  • the SCADA server 15 may communicate with the RTUs or PLCs 30 across a first network connection 3, for example an Internet connection, a local area network, LAN, connection, or a wide area network, WAN, connection.
  • the SCADA server 15 of Fig. 1 may be implemented as a Primary system, with a Hot Standby system implemented in the second cloud 20.
  • the Hot Standby system may include a backup SCADA server 25 running a second instance of a SCADA server application 26.
  • the second SCADA server application 26 may be executed as one or more application software modules that work together to provide monitoring and control functionality of the SCADA system in case the first cloud 10 hosting the Primary system fails entirely.
  • the SCADA server application 26 may take over and maintain real-time communication with the RTUs or PLCs 30 via a second network connection 4, for example an Internet connection, a local area network, LAN, connection, or a wide area network, WAN, connection.
  • the instances 16 and 26 of the SCADA server application may be run within virtual machines on the respective SCADA server 15, 25.
  • Each of the SCADA servers 15 and 25 may thus be quickly replaceable by one or more of network components 11, 12, 13 and 21, 22, 23, respectively, of a resource pool 14 or 24 of backup components. This leverages the redundancy and fault tolerance capabilities of each respective cloud 10 and 20. If, for example, the Primary SCADA server 15 fails physically, the first instance 16 of the SCADA server application may be rapidly re-instantiated on a different physical machine taken from the resource pool 14 of network components 11, 12, 13 of the first cloud 10.
  • the second instance 26 of the SCADA server application may be rapidly re-instantiated on a different physical machine taken from the resource pool 24 of network components 21, 22, 23 of the second cloud 20.
  • the clouds 10 and 20 themselves are interconnected by network connections, generally denoted as cloud connection 1, which may be Internet, LAN, WAN or similar connections.
  • the SCADA server 15 and the SCADA server 25 are logically synchronized to each other by means of a synchronization protocol, generally denoted as reference numeral 2 in Fig. 1.
  • the synchronization protocol 2 may be an asynchronous replication protocol, i.e. the Primary system does not necessarily wait for receipt of an acknowledgement message from the Hot Standby system before taking observable action.
  • the system of Fig. 1 has the advantage of low latency, since the logical synchronization itself does not incur major latency.
  • the SCADA system of Fig. 1 is designed to withstand benign, i.e. non-Byzantine, faults, for example crashes and network failures.
  • the SCADA system of Fig. 1 may, for purposes of logical synchronization, employ a fault-tolerant replication engine being configured to logically synchronize the first instance 16 and the second instance 26.
  • Fig. 2 schematically shows a network environment 200 of a SCADA system according to a further embodiment.
  • the network environment 200 of Fig. 2 differs from the network environment 100 of Fig. 1 mainly in that instead of instantiating a single SCADA server application 16 on a SCADA server 15 in the first cloud 10, a set of SCADA server application replicas 16 may be employed, including, for example, a first SCADA server application replica 16a, a second SCADA server application replica 16b, a third SCADA server application replica 16c, and a fourth SCADA server application replica 16d on network components 11, 12, 13 and 15 of the first cloud 10, respectively.
  • a set of SCADA server application replicas 26 may be employed, including, for example, a first SCADA server application replica 26a, a second SCADA server application replica 26b, a third SCADA server application replica 26c, and a fourth SCADA server application replica 26d on network components 21, 22, 23 and 25 of the second cloud 20, respectively.
  • Each of the replicas 16a, 16b, 16c, 16d and 26a, 26b, 26c and 26d may be connected to each other via network links within the first or second cloud 10 or 20, respectively.
  • in Fig. 2, only four replicas 16a, 16b, 16c, 16d and 26a, 26b, 26c and 26d are illustrated - it is to be understood that the actual number of replicas is not limited to four and may be any other number. The number of actually deployed replicas may be chosen depending on the intended fault-tolerance level.
  • the implementation of a Byzantine fault-tolerant state machine replication protocol for the set of replicas 16a, 16b, 16c, 16d and 26a, 26b, 26c and 26d may require the use of 3f+1 replicas, f being the maximum threshold number of compromised or Byzantine-faulty replicas to be tolerated.
  • Each of the set of SCADA server application replicas 16a, 16b, 16c, 16d, and 26a, 26b, 26c, 26d is subject to a separate state machine replication protocol, i.e. the SCADA server application replicas 16a, 16b, 16c, 16d are replicated and synchronized with each other under the regime of a first state machine replication protocol, whereas the SCADA server application replicas 26a, 26b, 26c, 26d are replicated and synchronized with each other under the regime of a second state machine replication protocol.
  • the first and second state machine replication protocols may in principle be the same; however, the states are not synchronized between the two separate state machine replication protocols.
  • it may also be possible to employ first and second state machine replication protocols which differ in their level of fault tolerance.
  • the first state machine replication protocol may be a benign fault-tolerant replication protocol,
  • the second state machine replication protocol may be a Byzantine fault-tolerant replication protocol.
  • alternatively, the second state machine replication protocol may be a benign fault-tolerant replication protocol,
  • and the first state machine replication protocol may be a Byzantine fault-tolerant replication protocol.
  • this architecture allows for the Primary and Hot Standby systems to be designed intrusion tolerant, i.e. guaranteed to operate correctly even if part of the system is compromised and under control of a malicious attacker.
  • the Primary system is a logical Primary system 17 made up from the subset of replicas 16a to 16d,
  • the Hot Standby system is a logical Hot Standby system 27 made up from the subset of replicas 26a to 26d. Since the Primary system 17 and the Hot Standby system 27 are replicated separately in the respective first and second clouds 10 and 20, independently from each other, all time critical messages and notifications of the replication protocols are localized within each cloud 10, 20, i.e. message delays in intercloud communication are avoided and the increase in overall latency remains tolerable.
  • the logical synchronization between the Primary system 17 and the Hot Standby system 27 is asynchronous by definition and does not impair system responsiveness.
  • as a Byzantine fault-tolerant replication protocol, for example, the Prime protocol may be chosen.
  • Prime was the first intrusion-tolerant replication protocol to guarantee correct operation of a system under malicious attack.
  • Prime achieves replication via the state machine approach: Each replica starts in the same initial state, and the replicas exchange messages to agree on the order in which to execute any event that might cause the application to change its state.
  • the state machine approach assumes that the state transition resulting from applying each state update is deterministic. Therefore, by beginning in the same initial state and applying any events in the same order, non-compromised replicas will be ensured to proceed through exactly the same sequence of states, thus remaining consistent with one another.
  • Fig. 3 schematically illustrates a network environment 300 with a plurality of clouds 10, 20 and 40. Similar to the network environment 100 of Fig. 1, the network environment 300 may comprise at least two separate clouds, for example three clouds 10, 20 and 40, each of the clouds 10, 20 and 40 comprising a plurality of computing resources, i.e. cloud components.
  • Each of the clouds 10, 20 and 40 may include a SCADA server 15, 25 and 45 running an instance of a SCADA server application 56.
  • SCADA server application 56 may be executed as one or more application software modules that work together to provide monitoring and control functionality of the SCADA system.
  • the SCADA server application 56 may maintain real-time communication with one or more RTUs or PLCs 30 having a client application 31.
  • the RTUs or PLCs 30 may more generally be contemplated as remote clients within the SCADA system, their number being in principle not limited.
  • the instances 56 of the SCADA server application are instantiated as replicas, each of which replicas 56 may be connected to each other via intercloud network links 1 between the clouds 10, 20 and 40.
  • in Fig. 3, only three replicas 56 are illustrated - it is to be understood that the actual number of replicas is not limited to three and may be any other number. The number of actually deployed replicas may be chosen depending on the intended fault-tolerance level.
  • the implementation of a Byzantine fault-tolerant state machine replication protocol for the set of replicas 56 may require the use of 3f+1 replicas, f being the maximum threshold number of compromised or Byzantine-faulty replicas to be tolerated.
  • the SCADA server application replicas 56 form an intercloud set of replicas under the regime of a state machine replication protocol, i.e. the SCADA server application replicas 56 are replicated and synchronized with each other under the regime of a common state machine replication protocol.
  • the state machine replication protocol may be a benign fault-tolerant replication protocol or a Byzantine fault-tolerant replication protocol, such as the aforementioned Prime protocol.
  • the approach of Fig. 3 uses a single set of replicas 56. Since the replicas 56 are deployed across multiple clouds 10, 20, 40, all replication messages according to the replication protocol will be inter-cloud messages, i.e. they will cross cloud boundaries. Consequently, there will be a trade-off between increased fault tolerance and increased bandwidth costs and latency.
  • the network environment 300 of Fig. 3 might be most useful in private cloud deployments in which a single cloud service provider may minimize intercloud latency by strategically aligning the localization of the respective cloud data centers for the clouds 10, 20 and 40. Moreover, if a single cloud service provider controls the entire network environment 300, the bandwidth costs would not increase either.
  • Fig. 4 schematically shows an overlay network ON overlaid over a common network such as the Internet.
  • An overlay network ON is a logical or virtual network built above the underlying network, such as the Internet.
  • the overlay nodes are connected by virtual links V which consist of one or more physical links A in the underlying network.
  • the overlay nodes or daemons D act as routers on an application level, forwarding packets from one overlay hop to the next until they reach their destination.
  • Fig. 5 schematically illustrates a network environment 400 employing such an overlay network for carrying messages between a supervisory control and data acquisition, SCADA, system, which comprises at least one instance 16, 26 of a SCADA server application instantiated on a network component 15, 25 of a respective cloud 10, 20, and a plurality of RTUs or PLCs 30.
  • the overlay network 60 comprises a plurality of overlay nodes or daemons 61, 62, 63 of which nodes 61 are entry nodes for the connection to the RTUs or PLCs 30, of which nodes 62 are intermediate nodes and of which nodes 63 are entry nodes for the clouds 10, 20.
  • the instances 16, 26 of the SCADA server application communicate with the plurality of RTUs or PLCs 30 over the network nodes 61, 62, 63 of the overlay network 60.
  • Communication between the instances 16, 26 and the RTUs or PLCs 30 may be subject to a hop-by-hop packet recovery, i.e. overlay nodes may recover lost packets in a hop-by-hop fashion, as opposed to an end-to-end recovery mechanism (a minimal sketch of this mechanism is given after this list).
  • the number of packets being recovered locally may thus be dramatically increased, leading to significant mitigation of the effects of packet losses on average packet delay and end-to-end reliability.
  • the RTUs or PLCs 30 may be configured to send communication messages to the instances 16, 26 of a SCADA server application via a multicasting protocol, for example an IP multicast protocol.
  • Multicasting means that the instances 16, 26 or replicas of a SCADA server application are pooled in a multicast group under a multicast address.
  • Any RTU or PLC 30 sending a communication message to the SCADA system may send the message only once to the multicast group, and the overlay nodes 61, 62, 63 route packets carrying the message to the current members of the multicast group in one go (a sketch of this send-once pattern is given after this list).
  • Multicasting is increasingly useful, the more replicas are pooled in a multicast group.
  • in a SCADA system employing a Primary system and a Hot Standby system, each subset of replicas of the respective Primary or Hot Standby system may be pooled in a separate multicast group.
  • the number of individual messages that RTUs or PLCs 30 need to send out is drastically reduced.
  • Similar simplifications can be achieved in the communication between a Primary system and a Hot Standby system, where synchronization messages between the Primary system and the Hot Standby system need only be sent by one replica of one of the systems to the multicast group of the respective other system.
  • the routing decisions for the communication between the instances 16, 26 of the SCADA server application and the RTUs or PLCs 30 may be based on a performance metric of the overlay network 60, such as latency, packet loss rate, message priority, bandwidth costs or predefined policy-based routing (a sketch of such metric-based overlay routing is given after this list). Particularly policy-based routing may take into account the economics of using intracloud over intercloud communication according to the bandwidth costs involved in each case.
  • the environment 400 of Fig. 5 is only exemplary, and similar or analogous topologies may be created by combining the technical features of the overlay network 60 in Fig. 5 with the cloud architectures of the network environments 200 and 300 in Figs. 2 and 3 as well.
  • Fig. 6 schematically shows a method Ml for supervisory control and data acquisition, SCADA.
  • the method Ml may be employed in any of the SCADA systems as shown in conjunction with the Figs. 1 to 5.
  • At M11, a first instance of a SCADA server application on a first network component of a first cloud is instantiated.
  • At M12, at least one second instance of a SCADA server application is instantiated on a first network component of a second cloud, the first cloud and the second cloud being operated by different cloud service providers.
  • the first, third, fourth and fifth instances of the SCADA server application may be synchronized according to a fault-tolerant replication protocol.
  • At M17, a sixth instance of a SCADA server application may be instantiated on a second network component of the second cloud,
  • at M18, a seventh instance of a SCADA server application on a third network component of the second cloud,
  • and at M19, an eighth instance of a SCADA server application on a fourth network component of the second cloud.
  • Those second, sixth, seventh and eighth instances of the SCADA server application may at M110 be synchronized according to a fault-tolerant replication protocol.
  • Fig. 7 schematically shows a method M2 for supervisory control and data acquisition, SCADA.
  • the method M2 may be employed in any SCADA system as shown in conjunction with the Fig. 5, i.e. a SCADA system employing communication over an overlay network connecting the SCADA server applications with the RTUs or PLCs.
  • At M21 at least one instance of a SCADA server application on a network component of a cloud is instantiated.
  • At M22, a plurality of Remote Terminal Units, RTUs, or Programmable Logic Controllers, PLCs, is connected with the at least one instance of a SCADA server application via the Internet.
  • At M23, an overlay network is set up over the Internet, wherein the at least one instance of the SCADA server application communicates with the plurality of RTUs or PLCs over network nodes of the overlay network.
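The following sketches illustrate, by way of example only, three of the communication mechanisms referenced in the list above; all class, function and parameter names are hypothetical and not part of the disclosure. First, hop-by-hop packet recovery: each overlay node buffers a forwarded packet until the next hop acknowledges it, so a loss is repaired on the single virtual link where it occurred rather than end-to-end.

```python
import time

RETRANSMIT_AFTER = 0.05  # assumed per-hop ACK timeout in seconds

class OverlayNode:
    """Minimal sketch of an overlay daemon performing hop-by-hop recovery."""

    def __init__(self, transport):
        self.transport = transport  # assumed object exposing send(next_hop, packet)
        self.unacked = {}           # seq -> (next_hop, packet, last_send_time)

    def forward(self, seq, packet, next_hop):
        # Forward and remember the packet until the next hop confirms receipt.
        self.transport.send(next_hop, packet)
        self.unacked[seq] = (next_hop, packet, time.monotonic())

    def on_ack(self, seq):
        # The next hop has the packet; recovering it further is now its job.
        self.unacked.pop(seq, None)

    def tick(self):
        # Retransmit locally instead of relying on end-to-end recovery.
        now = time.monotonic()
        for seq, (hop, pkt, sent) in list(self.unacked.items()):
            if now - sent > RETRANSMIT_AFTER:
                self.transport.send(hop, pkt)
                self.unacked[seq] = (hop, pkt, now)
```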
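Second, the send-once multicast pattern: an RTU or PLC transmits a reading a single time to a group address, and the network delivers it to every pooled replica. Standard UDP/IP multicast is used here merely as a stand-in for the overlay's own multicast; the group address, port and payload are arbitrary examples.

```python
import socket

REPLICA_GROUP = ("239.1.2.3", 5007)  # hypothetical multicast group of the replicas

def send_reading(payload: bytes) -> None:
    """Send one datagram that reaches all current members of the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
    sock.sendto(payload, REPLICA_GROUP)  # one send reaches all group members
    sock.close()

send_reading(b"rtu-17 pressure=4.2bar")
```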
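Third, metric-based overlay routing: each virtual link carries a measured latency and loss rate, a composite cost is derived from them, and a cheapest path is computed over the overlay graph. The cost weighting and the topology below are invented for illustration.

```python
import heapq

def link_cost(latency_ms: float, loss_rate: float, priority: float = 1.0) -> float:
    # Penalize lossy links heavily; higher-priority traffic shrinks the cost.
    return (latency_ms + 1000.0 * loss_rate) / priority

def best_path(graph, src, dst):
    """Dijkstra over the overlay; graph maps node -> [(neighbor, cost), ...]."""
    queue, visited = [(0.0, src, [src])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nbr, c in graph.get(node, []):
            if nbr not in visited:
                heapq.heappush(queue, (cost + c, nbr, path + [nbr]))
    return None

overlay = {
    "rtu_entry": [("mid_1", link_cost(20, 0.01)), ("mid_2", link_cost(35, 0.0))],
    "mid_1": [("cloud_entry", link_cost(15, 0.02))],
    "mid_2": [("cloud_entry", link_cost(10, 0.0))],
}
print(best_path(overlay, "rtu_entry", "cloud_entry"))
# ['rtu_entry', 'mid_2', 'cloud_entry'] - the lower-loss route wins
```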

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Hardware Redundancy (AREA)

Abstract

The present disclosure relates to a supervisory control and data acquisition system, SCADA, having a first instance of a SCADA server application instantiated on a first network component of a first cloud, and at least one second instance of a SCADA server application instantiated on a first network component of a second cloud.

Description

CONTROL SYSTEM AND METHOD FOR SUPERVISORY CONTROL AND DATA ACQUISITION
FIELD OF THE INVENTION
The invention relates to a control system, particularly a supervisory control and data acquisition system, and a method for supervisory control and data acquisition.
BACKGROUND
Important aspects of designing supervisory control and data acquisition (SCADA) systems are guaranteeing availability, fault tolerance, reliability of service and quality of service. Ensuring continuous availability typically requires the ability of a SCADA system to withstand, deal with and overcome various types of faults potentially arising in large distributed systems, among which may be so-called benign or non-Byzantine faults as well as malicious or so-called Byzantine faults. The document WO 2013/049299 A1 for example discloses systems and methods for resisting malicious code from tampering with or otherwise exploiting a SCADA system.
A component in a system is considered as faulty once its behaviour is no longer consistent with its specification. A component exhibits a Byzantine failure when its behaviour is arbitrary and malicious, possibly involving collusion with other faulty components. If the component has a so-called fail-stop failure, the component changes to a state that permits other components to detect that a failure has occurred and then stops. Naturally, Byzantine failures of components in the system can be most disruptive for the respective system.
SUMMARY OF THE INVENTION
Accordingly, it is an object of the present invention to provide solutions for improving the resilience against failures and thus the availability of SCADA systems.
This object is achieved by a SCADA system having the features of claim 1, a method for supervisory control and data acquisition having the features of claim 8, a SCADA system having the features of claim 11, and a method for supervisory control and data acquisition having the features of claim 16.
According to a first aspect of the present invention, a supervisory control and data acquisition system, SCADA, comprises a first instance of a SCADA server application instantiated on a first network component of a first cloud; and at least one second instance of a SCADA server application instantiated on a first network component of a second cloud. The first cloud and the second cloud may be operated by different cloud service providers or with independent data centres.
In a possible embodiment of the first aspect, the SCADA system may further comprise a fault-tolerant replication engine being configured to logically synchronize the first instance of the SCADA server application and the second instance of the SCADA server application. This enhances the performance and the reliability of the system since the instances are always guaranteed to be commensurate in state.
In a further possible embodiment of the first aspect, the SCADA system may comprise a third instance of a SCADA server application instantiated on a second network component of the first cloud; a fourth instance of a SCADA server application instantiated on a third network component of the first cloud; and a fifth instance of a SCADA server application instantiated on a fourth network component of the first cloud, wherein the first, third, fourth and fifth instances of the SCADA server application are replicas generated and synchronized according to a fault-tolerant replication protocol. Four instances is the minimum number that may guarantee a Byzantine fault-tolerant system against corruption of a single application instance of the Primary system.
In yet another possible embodiment of the first aspect, the SCADA system may further comprise a sixth instance of a SCADA server application instantiated on a second network component of the second cloud; a seventh instance of a SCADA server application instantiated on a third network component of the second cloud; and an eighth instance of a SCADA server application instantiated on a fourth network component of the second cloud, wherein the second, sixth, seventh and eighth instances of the SCADA server application are replicas generated and synchronized according to a fault-tolerant replication protocol. Four instances is the minimum number that may guarantee a Byzantine fault-tolerant system against corruption of a single application instance. The provision of further instances in the Hot Standby system further enhances the reliability and fail-safeness of the system.
In another possible embodiment of the first aspect, the first and second instances of the SCADA server application may be replicas generated and synchronized according to a fault-tolerant replication protocol. This advantageously allows the system to be fail-safe against failures occurring within the components of the cloud. In yet another possible embodiment of the first aspect, the replication protocol may be a Byzantine fault-tolerant replication protocol. Such a protocol further enhances resistance of the system against malicious attacks from outside.
In a further possible embodiment of the first aspect, the SCADA system may further comprise a third instance of a SCADA server application instantiated on a first network component of a third cloud; and a fourth instance of a SCADA server application instantiated on a first network component of a fourth cloud, wherein the first, second, third and fourth instances of the SCADA server application are replicas generated and synchronized according to the fault-tolerant replication protocol. Four instances is the minimum number that may guarantee a Byzantine fault-tolerant system against corruption of a single application instance of the Primary system. According to a second aspect of the present invention, a method for supervisory control and data acquisition, SCADA, comprises instantiating a first instance of a SCADA server application on a first network component of a first cloud; and instantiating at least one second instance of a SCADA server application on a first network component of a second cloud. The first cloud and the second cloud may be operated by different cloud service providers or with independent data centres. In a possible embodiment of the second aspect, the method may further comprise instantiating a third instance of a SCADA server application on a second network component of the first cloud; instantiating a fourth instance of a SCADA server application on a third network component of the first cloud; instantiating a fifth instance of a SCADA server application on a fourth network component of the first cloud; and synchronizing the first, third, fourth and fifth instances of the SCADA server application according to a fault-tolerant replication protocol. Four instances is the minimum number that may guarantee a Byzantine fault-tolerant system against corruption of a single application instance of the Primary system. In a further possible embodiment of the second aspect, the method may further comprise instantiating a sixth instance of a SCADA server application on a second network component of the second cloud; instantiating a seventh instance of a SCADA server application on a third network component of the second cloud; instantiating an eighth instance of a SCADA server application on a fourth network component of the second cloud; and synchronizing the second, sixth, seventh and eighth instances of the SCADA server application according to a fault-tolerant replication protocol. Four instances is the minimum number that may guarantee a Byzantine fault-tolerant system against corruption of a single application instance. The provision of further instances in the Hot Standby system further enhances the reliability and fail-safeness of the system.
According to a third aspect of the present invention, a supervisory control and data acquisition, SCADA, system comprises at least one instance of a SCADA server application instantiated on a network component of a cloud; a plurality of Remote Terminal Units, RTUs, or Programmable Logic Controllers, PLCs; and an overlay network over the Internet, wherein the at least one instance of the SCADA server application communicates with the plurality of RTUs or PLCs over network nodes of the overlay network. In a possible embodiment of the third aspect, communication between the at least one instance of a SCADA server application and the plurality of RTUs or PLCs may be subject to a hop-by-hop packet recovery. Hop-by-hop packet recovery advantageously facilitates early recovery from packet losses and leads to a reduced latency in the communication.
In another possible embodiment of the third aspect, the plurality of RTUs or PLCs may be configured to send communication messages to the at least one instance of a SCADA server application via a multicasting protocol. Multicasting has the great advantage that the number of messages needed to be generated by the sending party is vastly reduced, thus leading to decreased communication costs and reduced communication resources necessary in the respective network components.
In yet another possible embodiment of the third aspect, the routing decisions for the communication between the at least one instance of a SCADA server application and the plurality of RTUs or PLCs may be based on a performance metric of the overlay network. This facilitates flexible communication for reduced latency and fast reaction times to safety-critical situations in the network in an advantageous fashion.
In a further possible embodiment of the third aspect, the performance metric of the overlay network may comprise one or more of latency, packet loss rate, message priority, bandwidth costs or predefined policy-based routing.
According to a fourth aspect of the present invention, a method for supervisory control and data acquisition, SCADA, comprises instantiating at least one instance of a SCADA server application on a network component of a cloud; connecting a plurality of Remote Terminal Units, RTUs, or Programmable Logic Controllers, PLCs, with the at least one instance of a SCADA server application via the Internet; and setting up an overlay network over the Internet, wherein the at least one instance of the SCADA server application communicates with the plurality of RTUs or PLCs over network nodes of the overlay network.
In a possible embodiment of the fourth aspect, communication between the at least one instance of a SCADA server application and the plurality of RTUs or PLCs may be subject to a hop-by-hop packet recovery. Hop-by-hop packet recovery advantageously facilitates early recovery from packet losses and leads to a reduced latency in the communication.
In another possible embodiment of the fourth aspect, the plurality of RTUs or PLCs may be configured to send communication messages to the at least one instance of a SCADA server application via a multicasting protocol. Multicasting has the great advantage that the number of messages needed to be generated by the sending party is vastly reduced, thus leading to decreased communication costs and reduced communication resources necessary in the respective network components. In yet another possible embodiment of the fourth aspect, the routing decisions for the communication between the at least one instance of a SCADA server application and the plurality of RTUs or PLCs may be based on a performance metric of the overlay network. This facilitates flexible communication for reduced latency and fast reaction times to safety-critical situations in the network in an advantageous fashion.
In a further possible embodiment of the fourth aspect, the performance metric of the overlay network may comprise one or more of latency, packet loss rate, message priority, bandwidth costs or predefined policy-based routing.
BRIEF DESCRIPTION OF THE DRAWINGS
In the following, possible embodiments of different aspects of the present invention are described in more detail with reference to the enclosed figures. Fig. 1 shows a network environment of a SCADA system according to an embodiment of the present invention.
Fig. 2 shows a network environment of a SCADA system according to a further embodiment of the present invention. Fig. 3 shows a network environment of a SCADA system according to a further embodiment of the present invention. Fig. 4 shows an overlay network environment according to a further embodiment of the present invention.
Fig. 5 shows a network environment of a SCADA system utilizing an overlay network according to yet another embodiment of the present invention.
Fig. 6 schematically illustrates a method for supervisory control and data acquisition according to yet another embodiment of the present invention.
Fig. 7 schematically illustrates a method for supervisory control and data acquisition according to yet another embodiment of the present invention. In the figures, like reference numerals denote like or functionally like components, unless indicated otherwise. Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific embodiments shown and described without departing from the scope of the present invention. Generally, this application is intended to cover any adaptations or variations of the specific embodiments discussed herein.
DETAILED DESCRIPTION OF EMBODIMENTS
The Internet in the sense of the present disclosure comprises a global system of interconnected computer networks which utilize commonly agreed upon communication protocols of the standardized Internet Protocol Suite. The Internet is thus a network of networks consisting of a plurality of private, academic, public, commercial, and government networks of local to global scope. Those networks are connected and interconnected by copper wires, fiber-optic cables, wireless connections, and other data transmission means, carrying a vast amount of information, resources and services, and supporting services such as inter alia - but not limited to - email, video conferencing, voice-over-IP, online chat, file transfer, file sharing and media streaming.
A cloud within the sense of the present disclosure comprises a shared pool or distributed network of configurable computing resources which are connected through a real-time communication network, for example the Internet. The computing resources may for example include network devices, routers, servers, storage devices, application devices, and services such as storage space, processing capability, memory space or network bandwidth. The resources can be rapidly provisioned and released with minimal management effort or service provider interaction. A cloud enables ubiquitous, convenient, and on-demand network access to users utilizing resources of the cloud for applications. Different clouds are under the responsibility of different cloud service providers.
The computing resources of a cloud of a cloud service provider are pooled to serve multiple users using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to user demand. A user of a cloud generally lacks control or knowledge over the exact physical location, distribution and attribution of the provided resources. In some cases, according to a service level agreement, users of a cloud may be allowed to specify certain resource allocation constraints at a higher level of abstraction. Supervisory Control and Data Acquisition, abbreviated
"SCADA", within the meaning of the present disclosure refers to an industrial control system, electric grid control system or computer system used in conjunction with monitoring and controlling one or more distributed processes. SCADA systems within the meaning of the present disclosure may refer to systems which coordinate monitoring, surveillance and management of sites or complexes of systems spread out over large areas, in which control actions at the remote premises may be performed automatically by Remote Terminal Units, RTUs, or Programmable Logic Controllers, PLCs. SCADA systems may form the backbone of vital services in the energy domain, such as electricity generation, transmission and distribution. SCADA systems within the sense of this application may in some embodiments involve a separation between so-called Primary systems and so-called Hot Standby systems. Hot Standby systems are redundant systems running simultaneously with the identical Primary system. Upon failure of the Primary system, the Hot Standby systems provides immediate backup without which the entire system would fail. The switchover may happen automatically, for example upon some agreed upon procedure of error or failure detection. The Hot Standby system at least temporarily replaces the Primary system in full functionality, thereby significantly reducing the time required for a failed system to return to normal operation. A Hot Standby system will have all configuration, state and operational data being mirrored and/or synchronized with the Primary sys- tern at any given point in time.
The combination of a Primary system with a Hot Standby increases fault tolerance: the Primary system controls the overall system and interacts with the clients, while the Hot Standby system receives the same inputs as the Primary system and executes them in the same manner. While the output of the Hot Standby system may typically be suppressed, the Hot Standby system monitors the Primary system and performs a switchover operation upon determining that the Primary is in a state of failure.
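Purely by way of illustration, the switchover logic described above may be sketched as follows. The `poll_primary_heartbeat` and `activate_outputs` hooks and the timeout value are assumptions of this sketch, not features of the disclosed system, which may use any agreed upon failure-detection procedure.

```python
import time

HEARTBEAT_TIMEOUT = 2.0  # seconds; assumed failure-detection threshold

class HotStandby:
    """Minimal sketch of a Hot Standby monitoring the Primary's heartbeat."""

    def __init__(self, poll_primary_heartbeat, activate_outputs):
        self.poll_primary_heartbeat = poll_primary_heartbeat  # hypothetical hook
        self.activate_outputs = activate_outputs              # hypothetical hook
        self.is_active = False

    def run(self):
        last_seen = time.monotonic()
        while not self.is_active:
            if self.poll_primary_heartbeat():
                last_seen = time.monotonic()
            elif time.monotonic() - last_seen > HEARTBEAT_TIMEOUT:
                # Primary presumed failed: switch over and un-suppress outputs.
                self.is_active = True
                self.activate_outputs()
            time.sleep(0.1)
```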
SCADA systems within the meaning of the present disclosure may alternatively, or additionally, involve SCADA applications or instances of such applications which are replicas of each other, i.e. exact copies of each other that are able to communicate with each other to ensure synchronization of state so that in the event of a partial failure of one or more of the replicas the surviving replicas may continue to provide functionality. The replicas are logically synchronized with one another.
One approach to logically synchronizing replicated SCADA systems is to employ state machine replication protocols, which are designed to ensure safety, i.e. replica consistency, and liveness, i.e. eventual progress, under different fault and synchronization assumptions. For example, under a benign fault-tolerant replication protocol, safety is guaranteed in all executions in which replicas fail only by crashing. Under such benign fault-tolerant replication protocols, liveness is ensured during executions in which a majority of said replicas can communicate with one another in a timely manner.
Under a Byzantine fault-tolerant replication protocol, safety is guaranteed in all executions in which no more than a threshold number f out of 3f+1 replicas are Byzantine - liveness is guaranteed in executions in which at least 2f+1 correct, i.e. non-Byzantine, replicas can communicate with one another.
In a benign fault-tolerant replication regime, the clients are able to act on a reply when it is received from one or more replicas. In contrast, in a Byzantine fault-tolerant replication regime, a client is forced to wait for reception of at least f+1 identical responses before acting on them, in order to make sure that the content of the reply has been sent by at least one correct replica.
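The quorum check a client may perform under the Byzantine regime can be sketched as follows; the reply encoding and the helper name are illustrative assumptions.

```python
from collections import Counter

def act_on_replies(replies, f):
    """Return a reply once at least f+1 identical responses have been
    received, or None if no reply has reached the quorum yet. With f+1
    matching replies, at least one must come from a correct replica."""
    if not replies:
        return None
    reply, count = Counter(replies).most_common(1)[0]
    return reply if count >= f + 1 else None

# Example: with f = 1, two matching replies suffice.
assert act_on_replies(["v=42", "v=42", "v=99"], f=1) == "v=42"
assert act_on_replies(["v=42", "v=99"], f=1) is None
```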
State machine replication is an established technique for improving the availability of software applications in a distributed system. In a distributed system, one or several services can be provided by servers; each service can be provided by one or more servers, and clients invoke a service by making corresponding requests for it. If replicas of a server or server application are executed on separate processors of the distributed information system, state machine replication protocols can be used to coordinate client interactions with those replicas. A state machine approach implements fault-tolerant services by replicating software for servers or server applications and coordinating client interactions with the server or server application replicas.
Fig. 1 shows a network environment 100 of a SCADA system. The network environment 100 may comprise at least two separate clouds 10 and 20, each of the clouds 10 and 20 comprising a plurality of computing resources, i.e. cloud components. The clouds 10 and 20 may generally be managed and serviced by different cloud service providers which may be independent from each other. It may also be possible to have the same cloud service provider for the clouds 10 and 20, if the clouds run with independent data centres. In general, it is desirable to have a separation or causal independence of possible failure causes for the clouds 10 and 20, i.e. if one of the clouds 10 and 20 fails for some cause, the other cloud is very unlikely to fail due to the same cause.
The first cloud 10 may include a SCADA server 15 running an instance of a SCADA server application 16. For example, SCADA server application 16 may be executed as one or more application software modules that work together to provide monitoring and control functionality of the SCADA system. The SCADA server application 16 may maintain real-time communication with one or more RTUs or PLCs 30 having a client application 31. The RTUs or PLCs 30 may more generally be contemplated as remote clients within the SCADA system, their number being in principle not limited.
Each RTU or PLC 30 may communicate with and aggregate data from one or more local sensors installed within the system being controlled at respective local elements that are involved with implementing an industrial process or critical service. The SCADA server 15 may, via its SCADA server application 16, periodically poll the RTUs or PLCs 30 to gather and process the sensor data, i.e. in a pull messaging mode. The SCADA server 15 may further also use the gathered sensor data to make decisions as to the control of the system and may also issue supervisory control commands to the RTUs or PLCs 30 for the control of their respective local elements. The RTUs or PLCs 30 may also be configured to send messages to the SCADA server 15 without the SCADA server 15 first initiating the communication, i.e. in a push messaging mode. For example, the RTUs or PLCs 30 may send updated sensor data without first being polled by the SCADA server 15.
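The pull-mode interaction described above may be sketched as follows; the `read_sensors` and `write_command` methods and the polling period are hypothetical placeholders rather than an actual RTU/PLC interface.

```python
import time

POLL_INTERVAL = 1.0  # seconds; assumed polling period

def run_polling(rtus, decide):
    """Pull-mode polling loop of a SCADA server (illustrative only).

    Each element of `rtus` is assumed to expose hypothetical
    `read_sensors()` and `write_command(cmd)` methods; `decide`
    maps gathered sensor data to an optional supervisory command."""
    while True:
        for rtu in rtus:
            data = rtu.read_sensors()       # pull: the server initiates the exchange
            command = decide(data)          # control decision based on sensor data
            if command is not None:
                rtu.write_command(command)  # supervisory control of the local element
        time.sleep(POLL_INTERVAL)
```

In push mode, by contrast, the same `decide` step would be triggered by unsolicited messages arriving from the RTUs or PLCs.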
The SCADA server 15 may be accessed and programmed by a user using a human machine interface, HMI, workstation (not shown in Fig. 1). The HMI workstation may periodically query the SCADA server 15 to graphically display the state of each of the RTUs or PLCs 30 for the user. The SCADA server 15 may also push information to the HMI workstation without the HMI workstation first requesting the information.
The SCADA server 15 may communicate with the RTUs or PLCs 30 across a first network connection 3, for example an Internet connection, a local area network, LAN, connection, or a wide area network, WAN, connection.

The SCADA server 15 of Fig. 1 may be implemented as a Primary system, with a Hot Standby system implemented in the second cloud 20. The Hot Standby system may include a backup SCADA server 25 running a second instance of a SCADA server application 26. For example, the second SCADA server application 26 may be executed as one or more application software modules that work together to provide monitoring and control functionality of the SCADA system in case the first cloud 10 hosting the Primary system fails entirely. Upon such failure of the first cloud 10 and thus the Primary system, the SCADA server application 26 may take over and maintain real-time communication with the RTUs or PLCs 30 via a second network connection 4, for example an Internet connection, a local area network, LAN, connection, or a wide area network, WAN, connection.
The instances 16 and 26 of the SCADA server application may be run within virtual machines on the respective SCADA server 15, 25. Each of the SCADA servers 15 and 25 may thus be quickly replaceable by one or more of network components 11, 12, 13 and 21, 22, 23, respectively, of a resource pool 14 or 24 of backup components. This leverages the redundancy and fault tolerance capabilities of each respective cloud 10 and 20. If, for example, the Primary SCADA server 15 fails physically, the first instance 16 of the SCADA server application may be rapidly re-instantiated on a different physical machine taken from the resource pool 14 of network components 11, 12, 13 of the first cloud 10. Similarly, if the Hot Standby SCADA server 25 fails physically, the second instance 26 of the SCADA server application may be rapidly re-instantiated on a different physical machine taken from the resource pool 24 of network components 21, 22, 23 of the second cloud 20. The clouds 10 and 20 themselves are interconnected by network connections, generally denoted as cloud connection 1, which may be Internet, LAN, WAN or similar connections. The SCADA server 15 and the SCADA server 25 are logically synchronized to each other by means of a synchronization protocol, generally denoted by reference numeral 2 in Fig. 1. The synchronization protocol 2 may be an asynchronous replication protocol, i.e. the Primary system does not necessarily wait for receipt of an acknowledgement message from the Hot Standby system before taking observable action. Thus, the system of Fig. 1 has the advantage of exhibiting low latency, since the logical synchronization does not itself incur major latency. The SCADA system of Fig. 1 is designed to withstand benign, i.e. non-Byzantine, faults, for example crashes and network failures. The SCADA system of Fig. 1 may, for purposes of logical synchronization, employ a fault-tolerant replication engine configured to logically synchronize the first instance 16 and the second instance 26.
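The non-blocking character of such asynchronous replication can be illustrated by the following sketch, in which the Primary enqueues state updates and proceeds without awaiting acknowledgements; the `send_to_standby` transport hook is an assumed interface.

```python
import queue
import threading

class AsyncReplicator:
    """Sketch of asynchronous Primary-to-Hot-Standby replication.

    The Primary enqueues state updates and continues immediately; a
    background thread ships them to the standby, so no acknowledgement
    is awaited before the Primary takes observable action."""

    def __init__(self, send_to_standby):
        self._updates = queue.Queue()
        self._send = send_to_standby  # hypothetical transport hook
        threading.Thread(target=self._drain, daemon=True).start()

    def replicate(self, update):
        self._updates.put(update)  # returns immediately: low added latency

    def _drain(self):
        while True:
            self._send(self._updates.get())
```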
Fig. 2 schematically shows a network environment 200 of a SCADA system. The network environment 200 of Fig. 2 differs from the network environment 100 of Fig. 1 mainly in that, instead of instantiating a single SCADA server application 16 on a SCADA server 15 in the first cloud 10, a set of SCADA server application replicas 16 may be employed, including, for example, a first SCADA server application replica 16a, a second SCADA server application replica 16b, a third SCADA server application replica 16c, and a fourth SCADA server application replica 16d on network components 11, 12, 13 and 15 of the first cloud 10, respectively. In a similar manner, instead of instantiating a single SCADA server application 26 on a SCADA server 25 in the second cloud 20, a set of SCADA server application replicas 26 may be employed, including, for example, a first SCADA server application replica 26a, a second SCADA server application replica 26b, a third SCADA server application replica 26c, and a fourth SCADA server application replica 26d on network components 21, 22, 23 and 25 of the second cloud 20, respectively.
Each of the replicas 16a, 16b, 16c, 16d and 26a, 26b, 26c and 26d may be connected to each other via network links within the first or second cloud 10 or 20, respectively. In the example of Fig. 2, only four replicas per cloud are illustrated - it is to be understood that the actual number of replicas is not limited to four and may be any other number. The number of actually deployed replicas may be chosen based on the intended fault-tolerance level. For example, the implementation of a Byzantine fault-tolerant state machine replication protocol for the set of replicas 16a, 16b, 16c, 16d and 26a, 26b, 26c and 26d may require the use of 3f+1 replicas, f being the maximum threshold number of compromised or Byzantine-faulty replicas to be tolerated.
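The relation between the tolerated fault threshold f and the required number of replicas can be captured in a one-line helper; the function name is illustrative only.

```python
def replicas_required(f, byzantine=True):
    """Replicas needed to tolerate f simultaneous replica faults:
    3f+1 under a Byzantine fault model, 2f+1 (a majority) under a
    benign crash-fault model."""
    return 3 * f + 1 if byzantine else 2 * f + 1

# Tolerating one compromised replica yields the four-replica sets of Fig. 2:
assert replicas_required(1) == 4
```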
Each of the sets of SCADA server application replicas 16a, 16b, 16c, 16d and 26a, 26b, 26c, 26d is subject to a separate state machine replication protocol, i.e. the SCADA server application replicas 16a, 16b, 16c, 16d are replicated and synchronized with each other under the regime of a first state machine replication protocol, whereas the SCADA server application replicas 26a, 26b, 26c, 26d are replicated and synchronized with each other under the regime of a second state machine replication protocol. The first and second state machine replication protocols may in principle be the same protocol; the states, however, are not synchronized across the two protocol instances. It may also be possible to employ first and second state machine replication protocols which differ in their level of fault tolerance. For example, the first state machine replication protocol may be a benign fault-tolerant replication protocol, while the second state machine replication protocol may be a Byzantine fault-tolerant replication protocol, or vice versa.
This allows the Primary and Hot Standby systems to be designed intrusion tolerant, i.e. guaranteed to operate correctly even if part of the system is compromised and under the control of a malicious attacker. In that case, the Primary system is a logical Primary system 17 made up of the subset of replicas 16a to 16d, whereas the Hot Standby system is a logical Hot Standby system 27 made up of the subset of replicas 26a to 26d. Since the Primary system 17 and the Hot Standby system 27 are replicated separately in the respective first and second clouds 10 and 20, independently from each other, all time-critical messages and notifications of the replication protocols are localized within each cloud 10, 20, i.e. the increase in overall latency is tolerable because message delays in intercloud communication are avoided. As in Fig. 1, the logical synchronization between the Primary system 17 and the Hot Standby system 27 is asynchronous and therefore does not impair system responsiveness.
The Prime protocol may be chosen as the first and second state machine replication protocols. Prime was the first intrusion-tolerant replication protocol to guarantee correct operation of a system while under attack. Prime achieves replication via the state machine approach: each replica starts in the same initial state, and the replicas exchange messages to agree on the order in which to execute any event that might cause the application to change its state. The state machine approach assumes that the state transition resulting from applying each state update is deterministic. Therefore, by beginning in the same initial state and applying all events in the same order, non-compromised replicas are ensured to proceed through exactly the same sequence of states, thus remaining consistent with one another.
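A minimal sketch of this state machine approach, assuming only a deterministic `apply` function and an externally agreed sequence order (the agreement step itself, which Prime provides, is omitted), may look as follows.

```python
class ReplicatedStateMachine:
    """Sketch of the state machine approach used by protocols such as Prime.

    Every replica starts from the same initial state and applies updates
    strictly in the agreed-upon sequence order; because `apply` is
    deterministic, non-compromised replicas traverse identical states."""

    def __init__(self, initial_state, apply):
        self.state = initial_state
        self.apply = apply       # must be deterministic
        self.next_seq = 0
        self.pending = {}        # out-of-order updates, keyed by sequence number

    def deliver(self, seq, update):
        self.pending[seq] = update
        while self.next_seq in self.pending:  # execute only in sequence order
            self.state = self.apply(self.state, self.pending.pop(self.next_seq))
            self.next_seq += 1
```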
Fig. 3 schematically illustrates a network environment 300 with a plurality of clouds 10, 20 and 40. Similar to the network environment 100 of Fig. 1, the network environment 300 may comprise at least two separate clouds, for example three clouds 10, 20 and 40, each of the clouds 10, 20 and 40 comprising a plurality of computing resources, i.e. cloud components. Each of the clouds 10, 20 and 40 may include a SCADA server 15, 25 and 45 running an instance of a SCADA server application 56. For example, SCADA server application 56 may be executed as one or more application software modules that work together to provide monitoring and control functionality of the SCADA system. The SCADA server application 56 may maintain real-time communication with one or more RTUs or PLCs 30 having a client application 31. The RTUs or PLCs 30 may more generally be contemplated as remote clients within the SCADA system, their number being in principle not limited.
The instances 56 of the SCADA server application are instantiated as replicas, each of which may be connected to the others via intercloud network links 1 between the clouds 10, 20 and 40. In the example of Fig. 3, only three replicas 56 are illustrated - it is to be understood that the actual number of replicas is not limited to three and may be any other number. The number of actually deployed replicas may be chosen based on the intended fault-tolerance level. For example, the implementation of a Byzantine fault-tolerant state machine replication protocol for the set of replicas 56 may require the use of 3f+1 replicas, f being the maximum threshold number of compromised or Byzantine-faulty replicas to be tolerated.
The SCADA server application replicas 56 form an intercloud set of replicas under the regime of a state machine replication protocol, i.e. the SCADA server application replicas 56 are replicated and synchronized with each other under the regime of a common state machine replication protocol. For example, the state machine replication protocol may be a benign fault-tolerant replication protocol or a Byzantine fault-tolerant replication protocol, such as the aforementioned Prime protocol.
Instead of using a Primary / Hot Standby approach as in the case of Figs. 1 and 2, the approach of Fig. 3 uses a single set of replicas 56. Since the replicas 56 are deployed across multiple clouds 10, 20, 40, all replication messages according to the replication protocol will be inter-cloud messages, i.e. they will cross cloud boundaries. Consequently, there will be a trade-off between increased fault tolerance and increased bandwidth costs and latency. The network environment 300 of Fig. 3 might be most useful in private cloud deployments in which a single cloud service provider may minimize intercloud latency by strategically aligning the locations of the respective cloud data centers for the clouds 10, 20 and 40. Moreover, if a single cloud service provider controls the entire network environment 300, the bandwidth costs would not increase either.
Fig. 4 schematically shows an overlay network ON overlaid over a common network such as the Internet. An overlay network ON is a logical or virtual network built above the underlying network, such as the Internet. A subset of all nodes N of the underlying network, the so-called daemons D, appears in the overlay network ON as overlay nodes. The overlay nodes are connected by virtual links V which consist of one or more physical links A in the underlying network. The overlay nodes or daemons D act as routers on an application level, forwarding packets from one overlay hop to the next until they reach their destination.
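Such application-level forwarding can be sketched as follows; the packet layout, the node identifier and the `routing_table` structure are illustrative assumptions.

```python
MY_NODE_ID = "daemon-63"  # hypothetical identifier of this overlay node

def forward(packet, routing_table, deliver_locally, send):
    """Application-level forwarding at an overlay daemon (illustrative).

    `routing_table` maps destination overlay nodes to the next overlay hop;
    `send` transmits over a virtual link, which may span several physical
    links of the underlying network."""
    if packet["dst"] == MY_NODE_ID:
        deliver_locally(packet)                  # this daemon is the destination
    else:
        next_hop = routing_table[packet["dst"]]  # one overlay hop closer
        send(next_hop, packet)
```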
Fig. 5 schematically illustrates a network environment 400 employing such an overlay network for carrying messages between a supervisory control and data acquisition, SCADA, system, which comprises at least one instance 16, 26 of a SCADA server application instantiated on a network component 15, 25 of a respective cloud 10, 20, and a plurality of RTUs or PLCs 30. The overlay network 60 comprises a plurality of overlay nodes or daemons 61, 62, 63, of which nodes 61 are entry nodes for the connection to the RTUs or PLCs 30, nodes 62 are intermediate nodes, and nodes 63 are entry nodes for the clouds 10, 20. The instances 16, 26 of the SCADA server application communicate with the plurality of RTUs or PLCs 30 over the network nodes 61, 62, 63 of the overlay network 60.
Communication between the instances 16, 26 and the RTUs or PLCs 30 may be subject to hop-by-hop packet recovery, i.e. overlay nodes may recover lost packets in a hop-by-hop fashion (as opposed to an end-to-end recovery mechanism). The number of packets recovered locally may thus be dramatically increased, leading to significant mitigation of the effects of packet losses on average packet delay and end-to-end reliability.
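A sketch of such per-hop recovery, assuming simple per-link `send` and `ack_received` hooks and an arbitrary timeout value, is given below.

```python
import time

RETRANSMIT_TIMEOUT = 0.05  # seconds; assumed per-hop retransmission timeout

def send_over_hop(packet, seq, send, ack_received):
    """Hop-by-hop recovery: the sending overlay node retransmits until the
    *next hop* acknowledges, instead of leaving loss recovery to the end
    hosts. `send` and `ack_received` are hypothetical per-link hooks."""
    while True:
        send(packet, seq)
        deadline = time.monotonic() + RETRANSMIT_TIMEOUT
        while time.monotonic() < deadline:
            if ack_received(seq):
                return  # any loss was repaired locally, on this hop alone
            time.sleep(0.005)
        # timeout expired: retransmit on this hop rather than end-to-end
```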
The RTUs or PLCs 30 may be configured to send communication messages to the instances 16, 26 of a SCADA server application via a multicasting protocol, for example an IP multicast protocol. Multicasting means that the instances 16, 26 or replicas of a SCADA server application are pooled in a multicast group under a multicast address. Any RTU or PLC 30 sending a communication message to the SCADA system may send the message only once to the multicast group and the overlay nodes 61, 62, 63 route packets carrying the message to the current members of the multicast group in one go.
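Using standard IP multicast, the sending side of such an RTU or PLC may look like the following minimal sketch; the group address, port and payload are arbitrary illustrative values.

```python
import socket

# Hypothetical multicast group pooling the SCADA server application replicas.
GROUP, PORT = "239.1.1.1", 5000

def send_to_replicas(message: bytes):
    """An RTU/PLC sends one message; the network delivers it to all
    current members of the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 8)
    sock.sendto(message, (GROUP, PORT))
    sock.close()

send_to_replicas(b"sensor-update: pressure=4.2bar")
```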
Multicasting becomes more useful as more replicas are pooled in a multicast group. For a SCADA system employing a Primary system and a Hot Standby system, each subset of replicas of the respective Primary / Hot Standby system may be pooled in a separate multicast group. Thus, the number of individual messages that RTUs or PLCs 30 need to send out is drastically reduced. Similar simplifications can be achieved in the communication between a Primary system and a Hot Standby system, where synchronization messages between the Primary system and the Hot Standby system need only be sent by one replica of one of the systems to the multicast group of the respective other system.
The routing decisions for the communication between the instances 16, 26 of the SCADA server application and the RTUs or PLCs 30 may be based on a performance metric of the overlay network 60, such as latency, packet loss rate, message priority, bandwidth costs or predefined policy-based routing. Policy-based routing in particular may take into account the economics of preferring intracloud over intercloud communication, according to the bandwidth costs involved in each case.
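A routing decision combining several of these metrics may be sketched as follows; the metric names and weightings are illustrative assumptions, not prescribed values.

```python
def choose_next_hop(candidates, weights=None):
    """Pick the next overlay hop by a weighted performance metric.

    `candidates` maps next-hop ids to measured metrics, e.g.
    {"node62": {"latency_ms": 30, "loss_rate": 0.01, "cost": 1.0}}.
    Policy-based routing could, for instance, weight `cost` heavily
    to prefer intracloud links over intercloud ones."""
    weights = weights or {"latency_ms": 1.0, "loss_rate": 500.0, "cost": 10.0}

    def score(metrics):
        return sum(weights[k] * metrics[k] for k in weights)

    return min(candidates, key=lambda hop: score(candidates[hop]))
```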
Failures of Internet Service Providers (ISPs) may equally be taken into account for the routing decisions, so that each overlay node 61, 62, 63 may connect to multiple ISPs. Of course, the environment 400 of Fig. 5 is only exemplary, and analogous topologies may be created by combining the technical features of the overlay network 60 in Fig. 5 with the cloud architectures of the network environments 200 and 300 in Figs. 2 and 3 as well.
Fig. 6 schematically shows a method M1 for supervisory control and data acquisition, SCADA. The method M1 may be employed in any of the SCADA systems shown in conjunction with Figs. 1 to 5. At M11, a first instance of a SCADA server application is instantiated on a first network component of a first cloud. At M12, at least one second instance of a SCADA server application is instantiated on a first network component of a second cloud, the first cloud and the second cloud being operated by different cloud service providers.
Optionally, it may be possible to instantiate at M13 a third instance of a SCADA server application on a second network component of the first cloud, at M14 a fourth instance of a SCADA server application on a third network component of the first cloud, and at M15 a fifth instance of a SCADA server application on a fourth network component of the first cloud. At M16, the first, third, fourth and fifth instances of the SCADA server application may be synchronized according to a fault-tolerant replication protocol.
Further, it may be possible to instantiate at M17 a sixth instance of a SCADA server application on a second network component of the second cloud, at M18 a seventh instance of a SCADA server application on a third network component of the second cloud, and at M19 an eighth instance of a SCADA server application on a fourth network component of the second cloud. Those second, sixth, seventh and eighth instances of the SCADA server application may at M110 be synchronized according to a fault-tolerant replication protocol.
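The overall flow of method M1 can be summarized in the following sketch; `cloud_a`, `cloud_b`, their `instantiate` method and the `synchronize` helper are hypothetical interfaces standing in for provider-specific provisioning and for setting up the fault-tolerant replication protocol, not an actual cloud SDK.

```python
def deploy_scada(cloud_a, cloud_b, synchronize, replicas_per_cloud=4):
    """Sketch of method M1 over two independently operated clouds."""
    primary = [cloud_a.instantiate("scada-server-app")   # steps M11, M13-M15
               for _ in range(replicas_per_cloud)]
    standby = [cloud_b.instantiate("scada-server-app")   # steps M12, M17-M19
               for _ in range(replicas_per_cloud)]
    synchronize(primary)   # M16: replication protocol within the first cloud
    synchronize(standby)   # M110: replication protocol within the second cloud
    return primary, standby
```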
Fig. 7 schematically shows a method M2 for supervisory control and data acquisition, SCADA. The method M2 may be employed in any SCADA system as shown in conjunction with Fig. 5, i.e. a SCADA system employing communication over an overlay network connecting the SCADA server applications with the RTUs or PLCs.
At M21, at least one instance of a SCADA server application is instantiated on a network component of a cloud. At M22, a plurality of Remote Terminal Units, RTUs, or Programmable Logic Controllers, PLCs, is connected with the at least one instance of a SCADA server application via the Internet. Finally, at M23, an overlay network over the Internet is set up, wherein the at least one instance of the SCADA server application communicates with the plurality of RTUs or PLCs over network nodes of the overlay network.
In the foregoing detailed description, various features are grouped together in one or more examples with the purpose of streamlining the disclosure. It is to be understood that the above description is intended to be illustrative, and not restrictive. It is intended to cover all alternatives, modifications and equivalents. Many other examples will be apparent to one skilled in the art upon reviewing the above specification.
The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. In the appended claims and throughout the specification, the terms "including" and "in which" are used as the plain-English equivalents of the terms "comprising" and "wherein", respectively. Furthermore, "a" or "one" does not exclude a plurality in the present case.

Claims

1. A supervisory control and data acquisition system, SCADA, comprising:
a first instance (16; 16a; 56) of a SCADA server application instantiated on a first network component (15) of a first cloud (10); and
at least one second instance (26; 26a; 56) of a SCADA server application instantiated on a first network component (25) of a second cloud (20).
2. The SCADA system of claim 1, further comprising:
a fault-tolerant replication engine being configured to logically synchronize the first instance (16; 16a; 56) of the SCADA server application and the second instance (26; 26a; 56) of the SCADA server application.
3. The SCADA system of claim 1 or 2, further comprising:
a third instance (16d) of a SCADA server application instantiated on a second network component (11) of the first cloud (10);
a fourth instance (16c) of a SCADA server application instantiated on a third network component (12) of the first cloud (10); and
a fifth instance (16b) of a SCADA server application instantiated on a fourth network component (13) of the first cloud (10),
wherein the first, third, fourth and fifth instances (16a, 16b, 16c, 16d) of the SCADA server application are replicas generated and synchronized according to a fault-tolerant replication protocol.
4. The SCADA system of one of the claims 1 to 3, further comprising:
a sixth instance (26d) of a SCADA server application instantiated on a second network component (21) of the second cloud (20);
a seventh instance (26c) of a SCADA server application instantiated on a third network component (22) of the second cloud (20); and
an eighth instance (26b) of a SCADA server application instantiated on a fourth network component (23) of the second cloud (20),
wherein the second, sixth, seventh and eighth instances (26a, 26b, 26c, 26d) of the SCADA server application are replicas generated and synchronized according to a fault-tolerant replication protocol.
5. The SCADA system of claim 1,
wherein the first and second instances (56) of the SCADA server application are replicas generated and synchronized according to a fault-tolerant replication protocol.
6. The SCADA system of one of the claims 3 to 5,
wherein the replication protocol is a Byzantine fault-tolerant replication protocol.
7. The SCADA system of claim 5, further comprising:
a third instance (56) of a SCADA server application instantiated on a first network component (41) of a third cloud (40); and
a fourth instance (56) of a SCADA server application instantiated on a first network component of a fourth cloud,
wherein the first, second, third and fourth instances (56) of the SCADA server application are replicas generated and synchronized according to the fault-tolerant replication protocol.

8. The SCADA system of one of the claims 1 to 7, wherein the first cloud (10) and the second cloud (20) are operated by different cloud service providers or with independent data centres.
9. A method (M1) for supervisory control and data acquisition, SCADA, comprising:
instantiating (M11) a first instance (16; 16a; 56) of a SCADA server application on a first network component (15) of a first cloud (10); and
instantiating (M12) at least one second instance (26; 26a; 56) of a SCADA server application on a first network component (25) of a second cloud (20), the first cloud (10) and the second cloud (20) being operated by different cloud service providers.
10. The method according to claim 9, further comprising:
instantiating (M13) a third instance (16d) of a SCADA server application on a second network component (11) of the first cloud (10) ;
instantiating (M14) a fourth instance (16c) of a SCADA server application on a third network component (12) of the first cloud (10) ;
instantiating (M15) a fifth instance (16b) of a SCADA server application on a fourth network component (13) of the first cloud (10) ; and
synchronizing (M16) the first, third, fourth and fifth instances (16a, 16b, 16c, 16d) of the SCADA server application according to a fault-tolerant replication protocol.
11. The method of claim 10, further comprising:
instantiating (M17) a sixth instance (26d) of a SCADA server application on a second network component (21) of the second cloud (20) ;
instantiating (M18) a seventh instance (26c) of a SCADA server application on a third network component (22) of the second cloud (20) ;
instantiating (M19) an eighth instance (26b) of a SCADA server application on a fourth network component (23) of the second cloud (20) ; and
synchronizing (M110) the second, sixth, seventh and eighth instances (26a, 26b, 26c, 26d) of the SCADA server application according to a fault-tolerant replication protocol.
12. A supervisory control and data acquisition, SCADA, system, comprising:
at least one instance (16) of a SCADA server application instantiated on a network component (15) of a cloud (10);
a plurality of Remote Terminal Units, RTUs, or Programmable Logic Controllers, PLCs, (30); and
an overlay network (60) over the Internet,
wherein the at least one instance (16) of the SCADA server application communicates with the plurality of RTUs or PLCs (30) over network nodes (61, 62, 63) of the overlay network (60).
13. The SCADA system of claim 12, wherein the communication between the at least one instance (16) of a SCADA server application and the plurality of RTUs or PLCs (30) is subject to a hop-by-hop packet recovery.
14. The SCADA system of one of the claims 12 and 13, wherein the plurality of RTUs or PLCs (30) are configured to send communication messages to the at least one instance (16) of a SCADA server application via a multicasting protocol.
15. The SCADA system of one of the claims 12 to 14, wherein the routing decisions for the communication between the at least one instance (16) of a SCADA server application and the plurality of RTUs or PLCs (30) are based on a performance metric of the overlay network (60).
16. The SCADA system of claim 15, wherein the performance metric of the overlay network (60) comprises one or more of latency, packet loss rate, message priority, bandwidth costs or predefined policy-based routing.
17. A method (M2) for supervisory control and data acquisition, SCADA, comprising:
instantiating (M21) at least one instance (16) of a SCADA server application on a network component (15) of a cloud (10);
connecting (M22) a plurality of Remote Terminal Units, RTUs, or Programmable Logic Controllers, PLCs, (30) with the at least one instance (16) of a SCADA server application via the Internet; and
setting up (M23) an overlay network (60) over the Internet, wherein the at least one instance (16) of the SCADA server application communicates with the plurality of RTUs or PLCs (30) over network nodes (61, 62, 63) of the overlay network (60).
18. The method of claim 17, wherein the communication between the at least one instance (16) of a SCADA server application and the plurality of RTUs or PLCs (30) is subject to a hop-by-hop packet recovery.
19. The method of one of the claims 17 and 18, wherein the plurality of RTUs or PLCs (30) are configured to send communication messages to the at least one instance (16) of a SCADA server application via a multicasting protocol.
20. The method of one of the claims 17 to 19, wherein the routing decisions for the communication between the at least one instance (16) of a SCADA server application and the plurality of RTUs or PLCs (30) are based on a performance metric of the overlay network (60).
21. The method of claim 20, wherein the performance metric of the overlay network (60) comprises one or more of latency, packet loss rate, message priority, bandwidth costs or predefined policy-based routing.
PCT/EP2013/071608 2012-10-19 2013-10-16 Control system and method for supervisory control and data acquisition WO2014060465A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261715984P 2012-10-19 2012-10-19
US61/715,984 2012-10-19

Publications (1)

Publication Number Publication Date
WO2014060465A1 true WO2014060465A1 (en) 2014-04-24

Family

ID=49447538

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2013/071608 WO2014060465A1 (en) 2012-10-19 2013-10-16 Control system and method for supervisory control and data acquisition

Country Status (1)

Country Link
WO (1) WO2014060465A1 (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2293164A1 (en) * 2009-08-31 2011-03-09 ABB Research Ltd. Cloud computing for a process control and monitoring system
US20130053986A1 (en) * 2011-08-23 2013-02-28 Siemens Corporation Byzantine fault-tolerant scada system
WO2013049299A1 (en) 2011-09-27 2013-04-04 PCTEL Secure LLC Enhanced security scada systems and methods

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
DANIEL GERMANUS ET AL: "Increasing the Resilience of Critical SCADA Systems Using Peer-to-Peer Overlays", 23 June 2010, ARCHITECTING CRITICAL SYSTEMS, SPRINGER BERLIN HEIDELBERG, BERLIN, HEIDELBERG, PAGE(S) 161 - 178, ISBN: 978-3-642-13555-2, XP019144014 *
FRANCISCO PÉREZ ET AL: "APPLICATION OF MULTICAST TRANSPORT PROTOCOLS IN TELECONTROL", 1 September 2002 (2002-09-01), XP055096761, Retrieved from the Internet <URL:http://personal.us.es/jluque/Congresos/2002%20CIGRE%20Paris.pdf> [retrieved on 20140116] *
GROBLER P L: "SIMON-A flexible data communications network for SCADA applications", AFRICON '92 PROCEEDINGS., 3RD AFRICON CONFERENCE EZULWINI VALLEY, SWAZILAND 22-24 SEPT. 1992, NEW YORK, NY, USA,IEEE, 22 September 1992 (1992-09-22), pages 580 - 583, XP010247981, ISBN: 978-0-7803-0835-0, DOI: 10.1109/AFRCON.1992.624551 *
NICO SAPUTRO ET AL: "A survey of routing protocols for smart grid communications", COMPUTER NETWORKS, ELSEVIER SCIENCE PUBLISHERS B.V., AMSTERDAM, NL, vol. 56, no. 11, 15 March 2012 (2012-03-15), pages 2742 - 2771, XP028501836, ISSN: 1389-1286, [retrieved on 20120420], DOI: 10.1016/J.COMNET.2012.03.027 *
PEASE M ET AL: "REACHING AGREEMENT IN THE PRESENCE OF FAULTS", JOURNAL OF THE ASSOCIATION FOR COMPUTING MACHINERY, ACM, NEW YORK, NY, US, vol. 27, no. 2, 1 April 1980 (1980-04-01), pages 228 - 234, XP000575155, ISSN: 0004-5411, DOI: 10.1145/322186.322188 *
WENBING ZHAO ET AL: "Byzantine Fault Tolerance for Electric Power Grid Monitoring and Control", EMBEDDED SOFTWARE AND SYSTEMS, 2008. ICESS '08. INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 29 July 2008 (2008-07-29), pages 129 - 135, XP031303441, ISBN: 978-0-7695-3287-5 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104079637A (en) * 2014-06-20 2014-10-01 中国石油天然气集团公司 Resource scheduling method and system
CN104079637B (en) * 2014-06-20 2018-10-16 中国石油天然气集团公司 A kind of resource regulating method and system
EP2988183B1 (en) 2014-08-14 2020-04-01 Siemens Aktiengesellschaft System for observing and/or controlling a plant
US9845789B2 (en) 2014-10-23 2017-12-19 General Electric Company System and method for monitoring and controlling wind turbines within a wind farm
WO2018071067A3 (en) * 2016-06-22 2018-05-31 The Johns Hopkins University Network-attack-resilient intrusion-tolerant scada architecture
CN112468586A (en) * 2020-11-26 2021-03-09 许继集团有限公司 Method and system for collecting information of electric field monitoring single machine
CN115167317A (en) * 2022-08-04 2022-10-11 中国核动力研究设计院 Maintenance method and system for nuclear power plant safety level DCS and storage medium
CN115167317B (en) * 2022-08-04 2024-05-14 中国核动力研究设计院 Maintenance method, system and storage medium for security level DCS of nuclear power plant


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 13779554
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 13779554
Country of ref document: EP
Kind code of ref document: A1