WO2017025203A1

WO2017025203A1 - Managing lifecycle of a software container

Info

Publication number: WO2017025203A1
Application number: PCT/EP2016/055532
Authority: WO
Inventors: Daniel ESPLING; Jonas Lundberg; Nicklas Sandgren; Johan Kristiansson
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 2015-08-13
Filing date: 2016-03-15
Publication date: 2017-02-16

Abstract

It is presented a method for managing a lifecycle of another software container. The method is performed by a first software container of a server. The method comprises the steps of: reading a deployment configuration in a distributed peer-to-peer repository, the deployment configuration relating to an application to which the first software container belongs; finding, in the deployment configuration, a second identity referring to a second software container being directly subsequent to the first software container; checking a status of the second software container; and triggering deployment of a new instance of the second software container having the second identity, when no properly executing second software container is found.

Description

MANAGING LIFECYCLE OF A SOFTWARE CONTAINER

TECHNICAL FIELD

The invention relates to a method, a server, a computer program and a computer program product for managing a lifecycle of a software container. BACKGROUND

Recently, microservices have become a popular architecture to build modern Web services. By breaking down a complex monolithic application into small independent services, it becomes possible to develop services that are more resilient to error and more scalable. For example, if a particular microservice would fail, it would not affect the entire service. However, if a component part of a monolithic service would fail, the entire service would have to be restarted. Also, the only way to scale a monolithic service is to duplicate the whole monolith by adding more instances of it. In a microservice based architecture on the other hand, only the services that need to be scaled need to be duplicated.

Software containers are commonly used to implement microservice-based architectures and make sure services can run independently of each other. In contrast to virtual machines, software containers are more lightweight and can instantly be started, similar to standard Unix processes, assuming the server has all images required to start the container. Another advantage is that software containers provide a reliable execution environment allowing developers to develop and test their services locally on their machine and then upload the image to a cloud platform and still be sure the containers behave similarly as running them locally. Docker is an example of a container runtime that has recently gained popularity. By allowing container images to be stacked in a so-called union file system, container images can more efficiently be distributed.

However, when adopting a microservice based container architecture, designers and developers still need to figure out how to manage the life cycle of individual components, for example when to start a new service, when to scale it, how to handle fault recovery and so on.

SUMMARY

It is an object to provide a more efficient and robust way in which software containers of an application are started and/or terminated.

According to a first aspect, it is provided a method performed a first software container of a server for managing a lifecycle of another software container. The method comprises the steps of: reading a deployment configuration in a distributed peer-to-peer repository, the deployment configuration relating to an application to which the first software container belongs; finding, in the deployment configuration, a second identity referring to a second software container being directly subsequent to the first software container; checking a status of the second software container; and triggering deployment of a new instance of the second software container having the second identity, when no operational second software container is found. In this way, the first software container triggers the deployment of the second software container, in line with the deployment configuration. When each software container performs this method, additional software containers will be deployed until the structure of software containers in the deployment configuration is fulfilled. Additionally, if a software container happens to fail, it is redeployed by the preceding (in terms of the deployment configuration) container. This creates a robust system of software containers without the need for central control.

The method may further comprise the step of: triggering termination of the first software container when the deployment configuration indicates that the first software container should be terminated. In this way, a modified system resulting in a removed software container can be achieved by simply modifying the deployment configuration.

The method may further comprise the step of: triggering termination of the first software container when there is another software container having the same identity as the first software container and a predetermined contention resolution algorithm results in that the first software container should be terminated. This prevents duplicate software containers from executing side- by-side. In the step of triggering deployment, parameters for deploying the second software container may be retrieved from the deployment configuration. This is an efficient way of providing any initial parameters for when the second software container is to be deployed.

The method may further comprise the step of: writing an operational status indicator for the first software container in the distributed peer-to-peer repository when the first software container is operational. This is a way to signal that the first software container is operational, i.e. that it has not failed in which case it should be redeployed.

The step of writing the operational status indicator may be repeated, in which case the operational status indicator expires after a period of time unless renewed. Hence, if the operational status indicator has expired, the first software container has failed and should be redeployed by the preceding software container.

The step of checking the status of the second software container may comprise communicating with the second software container. By

communicating with the second software container, great flexibility in terms of using any suitable operational test is provided.

The step of checking the status of the second software container may comprise testing functionality of the second software container. In this way, the first software container can test any suitable (typically critical) function and redeploy the second software container if the result is not satisfactory.

The step of checking the status of the second software container may comprise reading an operational status indicator for the second software container in the peer-to-peer repository. This provides a robust way of distributing operational status indicators, eliminating the need of any central communication node.

According to a second aspect, it is provided a server configured to manage, in a first software container, a lifecycle of another software container. The server comprises: a processor; and a memory storing instructions that, when executed by the processor, cause the server to: read a deployment

configuration in a distributed peer-to-peer repository, the deployment configuration relating to an application to which the first software container belongs; find, in the deployment configuration, a second identity referring to a second software container being directly subsequent to the first software container; check a status of the second software container; and trigger deployment of a new instance of the second software container having the second identity, when no operational second software container is found.

The server may further comprise instructions that, when executed by the processor, cause the server to trigger termination the first software container when the deployment configuration indicates that the first software container should be terminated.

The server may further comprise instructions that, when executed by the processor, cause the server to trigger termination the first software container when there is another software container having the same identity as the first software container and a predetermined contention resolution algorithm results in that the first software container should be terminated.

The instructions to trigger deployment may comprise instructions that, when executed by the processor, cause the server to retrieve parameters for deploying the second software container from the deployment configuration.

The server may further comprise instructions that, when executed by the processor, cause the server to write an operational status indicator for the first software container in the distributed peer-to-peer repository when the first software container is operational. The server may further comprise instructions that, when executed by the processor, cause the server to repeat the instructions to write the operational status indicator is repeated, in which case the operational status indicator expires after a period of time unless renewed. The instructions to check the status of the second software container may comprise instructions that, when executed by the processor, cause the server to communicate with the second software container.

The instructions to check the status of the second software container may comprise instructions that, when executed by the processor, cause the server to test functionality of the second software container.

The instructions to check the status of the second software container may comprise instructions that, when executed by the processor, cause the server to read an operational status indicator for the second software container in the peer-to-peer repository. According to a third aspect, it is provided a server comprising: means for reading a deployment configuration in a distributed peer-to-peer repository, the deployment configuration relating to an application to which a first software container belongs, the first software container executing in the server; means for finding, in the deployment configuration, a second identity referring to a second software container being directly subsequent to the first software container; means for checking a status of the second software container; and means for triggering deployment of a new instance of the second software container having the second identity, when no operational second software container is found. According to a fourth aspect, it is provided a computer program for managing, in a first software container, a lifecycle of another software container. The computer program comprises computer program code which, when run on a server causes the server to: read a deployment configuration in a distributed peer-to-peer repository, the deployment configuration relating to an application to which the first software container belongs; find, in the deployment configuration, a second identity referring to a second software container being directly subsequent to the first software container; check a status of the second software container; and trigger deployment of a new instance of the second software container having the second identity, when no operational second software container is found.

According to a fifth aspect, it is provided a computer program product comprising a computer program according to the fourth aspect and a computer readable means on which the computer program is stored. The computer readable means can be non-transitory. Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the element, apparatus, component, means, step, etc." are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is now described, by way of example, with reference to the accompanying drawings, in which: Fig 1 is a schematic drawing illustrating an environment in which

embodiments presented herein can be applied;

Fig 2 is a schematic drawing illustrating a server shown in Fig 1;

Figs 3A-C are schematic diagrams illustrating the deployment of a software container in an environment corresponding to that of Fig 1; Figs 4A-C are schematic diagrams illustrating the termination of a software container in an environment corresponding to that of Fig 1; Fig 5 is a schematic diagram illustrating the situation with several software containers having the same identity in an environment corresponding to that of Fig I;

Figs 6A-B are flow charts illustrating embodiments of methods for managing lifecycle of another software container;

Fig 7 is a schematic diagram illustrating components of the servers of Fig l;

Fig 8 is a schematic diagram showing functional modules of the server of Fig 7 according to one embodiment; and

Fig 9 shows one example of a computer program product comprising computer readable means.

DETAILED DESCRIPTION

The invention will now be described more fully hereinafter with reference to the accompanying drawings, in which certain embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. Like numbers refer to like elements throughout the description. Fig 1 is a schematic drawing illustrating an environment in which

embodiments presented herein can be applied.

There is here a number of servers 4a-h forming part of a set 8 of servers. While there are eight servers 4a-h shown here, the set 8 of servers can contain any suitable number of servers. Each one of the servers 4a-h can execute software containers 2a-h when required, as described in more detail below. Each server 4a-h can execute zero, one or more software containers in parallel. For instance, the software containers 2a-h can be containers running on a Docker platform. The software containers 2a-h are distributed as images being files (images are here not to be confused with

illustrations/photographs). Moreover, a method called process injection is used to transparently add additional processes into the container. This makes it possible to inject management processes into the software containers 2a-h. As is explained in more detail below, the management processes here are used to manage the lifecycle for other software containers as well as for the software container to which the management process belongs. A deployment initiator is used when new a new set of software containers is to be deployed. Using the embodiments presented herein, in contrast to the prior art, it is sufficient for the deployment initiator to deploy a single software container to get the deployment process started. Hence, the deployment initiator only needs to deploy at least one software container and a deployment configuration. Remaining software containers are then deployed without further management by the software containers

themselves.

The set of servers 8 is organised in a decentralized peer-to-peer network, which can be implemented on an underlying network, such as an IP (Internet Protocol) network.

To provide the decentralized solution, this can be based on a Distributed Hash Table (DHT) algorithm, such as Kademlia, Chord or Pastry.

Fig 2 is a schematic drawing illustrating a server 4 shown in Fig 1. The server 4 can be any of the servers 4a-h shown in Fig 1. The server 4 comprises one or more software containers 2. Each software container is an instance of an image and contains, apart from its operative software, a deployment agent 11 and a termination agent 12. The deployment agent 11 and the termination agent 12 are injected processes and do not need to be part of the image for the software container. A peer-to-peer repository 10 is implemented e.g. using DHT as described above. Now an embodiment of the peer-to-peer network for the servers will be described based on Bitverse. Bitverse is a framework to build decentralized peer-to-peer application. Bitverse is based on the Kademlia DHT algorithm and provides a messaging API (Application Programming Interface) and a key value store API. In both APIs, self-generated SHA (Secure Hash

Algorithm)-i or SHA-2 strings are used to identify node end-points and data objects. Bitverse is thus a DHT implementation of a decentralised repository, i.e. a peer-to-peer repository.

Bitverse consists of two different types of nodes, super nodes and edge nodes, Edge nodes are connected using web sockets to a super node and thus form a star topology. An edge node can either run as a library in a web browser client or directly in a server component.

Super nodes communicate using UDP (User Datagram Protocol). Messages are routed using a routing table provides by Kademlia. The routing table consists of 160 buckets where each bucket contains a limited list of contacts (typically 20) discovered by the super node. XOR (Exclusive OR) is used to determine the distance between two different nodes in the network and which bucket a contact should be stored in. This makes it possible to implement an iterative protocol where a node calculates which contacts in its routing table that has the shortest distance to a particular string, or the SHA- 1 of the string to be more specific. These contacts are then queried until there are no more nodes to query. This makes it possible to find the nodes in the whole network that has the shortest distance to a particular string. Like all DHTs, this procedure is very efficient, typically 0(log n) messages need to be sent, where is n is the number of nodes in the network and O denotes order.

By calculating a SHA-i of the key it becomes possible to calculate which nodes in the network a particular key-value pair should be stored, and thus implement a distributed (peer-to-peer) data repository where data is replicated in the network. In Bitverse, the value is represented as an unordered set, which makes it possible to store multiple values associated with a specific key. This special key-value(s) pair is called a SetMap. Below is an example of SetMap storing five values, identified by the key "hello".

"hello" => {value2, valuei, values, valuer values}

The SetMap provides a simple API, for example setmap = bitverse.getSetMapC'hello") setMapAddValue("value6")

It is also possible to set a time-to-live (TTL) on a value stored in a SetMap, e.g. setMapAddValueTTLC'mykey", "value , 2) In this case, value7 is automatically purged after 2 seconds. The TTL mechanism can optionally be combined with a tracking mechanism where a client is notified when a value is added or removed to implement service discovery. If a client stops adding a specific value with a TTL, the value will automatically be purged when the TTL expired, thus causing a tracking event to other clients which can then take appropriate actions. This is one mechanism which can be used by the lifecycle mechanism described below.

The SetMap is also used internally in Bitverse to provide a messaging service. When an edge node connects to a super node, the super node will store the IP number and UDP port of the super node as a value in a SetMap, where the self-generated SHA-i identity of the edge node is used as a key. This makes it possible for an edge node to send messages to any other edge node in the network, by allowing the super nodes to use the information stored in the SetMap to set up tunnels between different super nodes.

Below is an example of a messaging API call to send a message to a remote edge node, where 4cd7ei8o4bad56sds2a4i6df5fgi5e bboisee6f is the address (as an SHA-i value) of the remote edge node. bitverse.sendM essage("4cdzei 8ο αά563ά32α4ΐ 6df5fgi5eft>boi3ee6f', "hello")

The message is sent to edge node's local super node, which will tunnel the message to a foreign super node where the remote edge node is connected, assuming the remote edge is not directly connected to the local super node. In this case, the message can be send directly without using a tunnel link.

Now embodiments will be disclosed with reference to Figs ι and 2.

Embodiments presented herein allow an application with multiple software containers to scale (grow) and self-repair using software containers as building blocks, without a central management function. The embodiments presented herein comprise of two lifecycle processes that are injected into the software containers. As the proposed solution is completely decentralized, each individual software container is responsible for executing both lifecycle processes independently of each other. The first lifecycle process is called software container deployment, performed by the deployment agent 11 and is responsible for starting new software containers according a cyclic data structure called a deployment

requirements document. The process is somewhat similar to biological cell division in the sense that a software container creates another software container. However, in reality, the container deployment works more like a dispatcher, e.g. asking an external platform to spin off a new software container.

The second mechanism is called software container termination, and is somewhat similar to the programmable cell death mechanism existing in living cells. This mechanism is performed by the termination agent 12. The software container termination process is responsible for deciding if a software container should self-destruct and then make sure it is destroyed in a controlled way, for example waiting until active connections have been closed before terminating the software container. In this way, the deployment and termination processes oppose each other where one process tries to create, whereas the other process tries to terminate. However, the overall goal of both processes is to make sure that the current state of the system converges to a pre-defined condition defined by a deployment requirements document, and eventually reaching equilibrium. The deployment requirements document is compiled by software developers or an automatic deployment system and can be changed whenever

appropriate. Once the deployment requirements document has been modified, the software container deployment process automatically deploys missing software containers and the termination process terminates superfluous software containers, as explained in more detail below. As these changes can take time to perform and do not happen instantly, it can take some time before the system stabilizes. During this stabilization period, there could either be too few or too many software containers running. However, by designing an application in such a way that redundancy is a de facto mode of operation, it is not a critical failure of if an incorrect number of software containers are running. For example, by running web servers behind a high availability proxy or clustering a database over multiple software containers, running an incorrect number of software containers will only result in temporary decreased performance or temporary over-utilization of resources. Instead of using a centralized controller, or a leader election algorithm to dynamically choose a controller, each software container is given a dedicated responsibility to manage one or several other software container(s) of the application and make sure it is up and running. This is achieved by

organizing the deployment requirements document as a cyclic ring, where a node in the ring is responsible for managing the clockwise (or alternatively counter-clockwise) next node in the ring, thus creating circular dependencies.

Although not the only method to deploy an application, it would be possible to deploy an entire application just by deploying a single software container, which will spin of the next software container in the ring, which will then spin of a third software container, and so on. The recursion will end when all software containers specified in the deployment requirements document are up and running. Likewise, the only way to crash an application would be if all software containers crash at the same time, which makes the proposed embodiments highly resilient to failures.

Each node in the deployment requirements document can be a key-value pair in a peer-to-peer repository (e.g. DHT) representing a particular microservice or software container, where the key is the identity of the deployment requirements document node and the value contains configuration options needed to deploy the corresponding software container. The value also contains the ID (identity) of the clockwise next node in the ring. Note that the deployment requirements document is an overlay abstraction stored in a DHT and exists independently if the software containers are running or not.

Before deploying a software container image to a cloud platform, the image is implanted with two agents, the deployment agent 11 and the termination agent 12. These agents are configured so that they can access the deployment requirements document.

The deployment agent 11 is responsible for managing the clockwise next node in the deployment requirements document. It checks the health (i.e.

operational status) of the clockwise next node by reading data related to it, which can be published by the termination agent 12 of that node. As previously described, if the deployment agent 11 detects that a software container is not running or malfunctioning (i.e. is not operational), it then deploys a new software container according to the configuration option stored in the clockwise next node in the deployment requirements document.

Looking at Fig 1, let us assume a situation where on the first software container 2a and the eighth software container 2h are running, but the deployment requirement document specifies that eight software containers should form part of the application (see below how this can be specified). This means that the deployment agent associated with the first software container, e.g. ID=i, will lookup node with ID=2 in the DHT and then spin off (i.e. deploy) the associated software container. The recursion ends when all eight software containers are running.

The proposed embodiments make it possible to manage scaling and fault tolerance of applications containing several software containers by injecting a lifecycle mechanism into the software containers. In this way, it becomes possible create a decentralized peer-to-peer network of software containers, where the software containers manage themselves from the inside using service choreography rather than relying on centralized orchestration tools.

This approach has several benefits: Firstly, this results in minimal dependency to the underlying platform.

Developers are not dependent or vendor-locked to a particular platform. In fact, it becomes possible to mix several platforms at the same time, for exampling using a Docker Swarm and Mesos even running on different datacenters. This could also open for an interesting idea called Liquid software containers, where application components can seamlessly move between execution environments, including end-user devices such as laptops or smart phones, or telecommunication equipment such as radio base stations, to create a ubiquitous cloud execution environment.

Secondly, there is no single point of failure. As all software containers are more or less acting as a controller (its own orchestration tool), the system can recover from failures as long as not all software containers are terminated at the same time.

Thirdly, excellent scalability is provided. As a node is only responsible for exactly one other node, in contrast to having a centralized component responsible for managing a large amount of software containers, the system can be scaled to support a vast number of software containers.

Now the embodiments will be described in some more detail. In short, the lifecycle management of the software container comprises the following steps. The first two steps are preparation steps and typically done before an application is deployed or upgraded. This is performed for one application which may comprise software containers of different types, here three types. The same process can be performed for more applications.

1. All needed software container images are implanted with a deployment agent and a termination agent using an implantation tool.

2. Developers or operators specify which implanted software container images should be deployed and how many instances of them should run as well environmental variables for each software container image in a deployment requirements document, e.g. as a JSON (JavaScript Object Notation) file (or any other suitable configuration file), such as the following:

[

{image url : http : //<registry>/implanted/containerA, instances: 2},

{image url: http : //<registry>/implanted/containerB, instances: 2},

{image url: http : //<registry>/implanted/containerC, instances: 10},

]

Each entry represents a different application with a respective image.

The deployment requirements document is compiled to a deployment configuration. Alternatively, the deployment configuration can indirectly be derived in runtime from the deployment requirements document, e.g. using indices in the deployment requirements document. The purpose of the deployment configuration is to specify how the

deployment requirements document should be composed by specifying how to assign unique IDs to the nodes and how to link the nodes to a ring structure. It also contains information (software container images and environmental variables) how to deploy the software containers

corresponding to each node in the deployment requirements document.

The JSON object below is an example of a deployment configuration. [

{id: 1, next id: 2, image url :

http : / /<registry>/implanted/containerA} ,

{id: 2, next id: 3, image url:

http : / /<registry>/implanted/containerA} ,

{id: 3, next id: 4, image url:

http : //<registry>/implanted/containerB } ,

{id: 4, next id: 5, image url:

http : //<registry>/implanted/containerB } ,

{id: 5, next id: 6, image url:

http : //<registry>/implanted/containerC } ,

{id: 14, next id: 1, image url:

http : //<registry>/example/containerC }

]

Note that the next_id of the last object in the deployment configuration points back to node 1 to thereby create a ring structure.

3. The next step is to deploy the software containers specified in the deployment configuration. By specifying a node ID (or index) in the deployment configuration, an external deployment initiator 7 or a

deployment agent 11 then deploys the software container to a platform (Container Runtime). Note that at least one software container need to be deployed externally to bootstrap the application, which will cause a chain reaction repeating this step (Step 3) until the entire application is deployed and operating correctly.

After deploying the software container, the deployment agent or the External Deployment Tool assigns an ID (according to the deployment configuration) to the deployment agent and the termination agent running in the new software container. 4. Each deployment agent 11 then uses the deployment configuration to find out the ID of the deployment requirements document node it is responsible for. Alternatively, the deployment agent could directly be assigned the node ID (next_id) it is responsible for in addition to its own ID. By looking up the next_id in a DHT-based overlay peer-to-peer network, e.g. Bitverse, a deployment agent 11 can obtain health information of that node published by its associated termination agent 12. If it is not running it will deploy the missing software container and configure the corresponding deployment and termination agent, as described in Step 3.

The termination agent 12 is responsible for checking the health of a software container and ultimately determining if it should terminate. Various methods can be used for health checking. Healthy is herein to be construed as operational, i.e. not in a faulty or failed state. One simple method is simply to assume the application process in the software container is operational (i.e. operating correctly) as long as the implanted termination agent 12 is running. By letting the termination agent 12 periodically publish a token to a Bitverse SetMap using a TTL, as described above, a remote responsible deployment agent 11 can retrieve the token from the SetMap. If the termination agent 12 stops publishing the token, e.g. due to a failure of the software container 2 holding the termination agent, the token will automatically be purged, thus causing the responsible deployment agent 11 to receive an event and then redeploy the software container. An alternative implementation would be to allow the deployment agent 11 to send a ping message to the termination agent 12. If it does not respond, the software container is presumed dead, i.e. non-operational.

The following pseudo code illustrates how a deployment agent interacts with a remote termination agent running in a remote software container it is responsible for. Note that the deployment configuration is also stored in Bitverse. termination agent pseudo code :

thisNode = bitverse . getSetMap (mylD)

// this deployment requirements document node

for {

thisNode .AddValueTTL ("healthStatus", "ok", 10)

// TTL set to 10 seconds

sleep (5) // sleep 5 seconds l8

} deployment agent pseudo code

nextNode = bitverse . getSetMap (nextID)

responsibilities =

parse (bitverse . getSetMap ( "heal thResonsibilityDoc") ) for {

healthStatus := nextNode . GetValue ("healthStatus") if healthStatus != "ok" {

// we need to repair the next node

// find out which container image to deploy

containerlmage = responsibilities [nextID] . imageUrl

// deploy the container and configure its deployment and // termination agents

deploymentTool . deploy (containerlmage , nextID)

sleep (5) // sleep 5 seconds, and repeat health check again }

An alternative implementation would be to let the deployment agent track the healthStatus value in the SetMap. In this case, it will automatically be notified if the ok value is purged from the Bitverse network.

It would also be possible to implement more advanced health checks, e.g. requiring the termination agent 12 to successfully run a test (e.g. testing if the application process is responding on a TCP port) before publishing the health token in Bitverse (or answering a ping request).

If the termination agent detects that the application process is not working correctly, it needs to take appropriate actions such as logging the event and gracefully terminating the software container. Terminating the software container can either be done by terminating the application process inside the software container, or asking an external tool to terminate the software container. If the application has been scaled down and less software containers instances are required, the software container could become superfluous, consequently requiring the termination agent to terminate the software container. In this case, the termination agent could be programmed to starve the software container before terminating it. Modification of the deployment requirements document to scale up or down an application is further described below.

Figs 3A-C are schematic diagrams illustrating the deployment of a software container in an environment corresponding to that of Fig 1. To modify an application composition, e.g. scale up or down an application, developers or operators needs to modify the deployment requirements document, resulting in a new deployment configuration. Affected nodes then need to be assigned new responsibilities.

For instance, let us look at a scenario where a ninth software container 21 is to be deployed between the second software container 2b and the third software container.

Looking first to Fig 3A, the ninth software container 21 is added to the deployment requirements document. Its next_id is set to ID = 3.

The deployment agent of the second software container 2b (ID = 2) is assigned responsibility for the ninth software container 21 (ID=9).

Looking now to Fig 3B, the deployment agent of the second software container deploys the ninth software container to comply with the new deployment configuration. The deployment agent of the ninth software container 21 is assigned responsibility for the third software container 2c (ID = 3)·

This results in the situation of Fig 3C, where the second software container 2b is no longer responsible for the third software container 2c. Hence, to scale up, all affected running deployment agents need to be made aware of about their new next_id links, which will cause them to deploy the new missing software container (s). This can be done explicitly or by the deployment agents and the termination agents tracking the deployment configuration stored in Bitverse

Figs 4A-C are schematic diagrams illustrating the termination of a software container in an environment corresponding to that of Fig 1. In this scenario, the third software container 2c is to be terminated.

Looking first to Fig 4A, the third software container 2c (ID=3) is removed from the deployment configuration.

Looking now to Fig 4B, the deployment configuration is also modified such that the second software container 2b (ID = 2) is assigned responsibility for the fourth software container 2d (IS =4), i.e. the next_id is set to 4 for ID=2.

The termination agent of the third software container 2c detects it absence from the deployment configuration and triggers a shutdown, resulting in termination.

Hence, to scale down, the affected node is removed from the deployment configuration and the previous node is made aware of its new responsibility before the removed software container is stopped, otherwise it will just restart the stopped software container.

Fig 5 is a schematic diagram illustrating the situation with several software containers having the same identity in an environment corresponding to that of Fig 1.

If both an external tool deployment tool (e.g. the deployment initiator 7) and deployment agents 11 are allowed to deploy new software containers, it is possible that multiple agents of software containers are assigned exactly the same ID. Creation of redundant deployment and termination agents can also occur if a deployment agent is given responsibility of multiple nodes, which can for example happen when scaling up an application, as the node 2b (ID=2) depicted in Figs 3A-C. It could also make sense to assign a

deployment agent multiple responsibilities to prevent the deployment requirements document ring from becoming disconnected (another solution would be regularly traverse all nodes in the ring to check if it is intact). One approach to this problem would be to prevent creation of redundant agents in the first place. However, a drawback of this approach is that a distributed lock mechanism or a consensus algorithm needs to be

implemented, which can be complex to implement and require significant amount of signalling traffic between involved nodes. Another approach would simply accept that redundant agents can be created, and then terminate them using the termination mechanism. After all, the effect of inconsistency in the deployment requirements document would only result in too many software containers running temporarily.

See for example in Fig 5, where there are three instances 2c, 2c' and 2c" of the third software container.

To detect redundant software containers created due to inconsistency, the termination agents 12 need to be able to discover other agents with the same ID. This can for example be implemented by allowing multiple IDs to be stored in next_id field, assuming the next_id field is accessible in Bitverse. Such a solution would also require each node to be able to look up its parent (counter clockwise) node in the ring, e.g. by introducing bidirectional links in the deployment requirements document.

When a termination agent 12 discover that multiple software containers are running with the same ID, they need to collectively decide which one should survive and which should be terminated. This can be determined using a predetermined contention resolution algorithm. Termination agents that lose the contention resolution then trigger the termination of their respective software containers. As each software container needs to be granted access to the platform to deploy and stop software containers, it could become possible for an intruder to hack an available software container and then introduce malicious software containers, overload the system, or simply stop software containers to thereby cause service disruption. To prevent this from happening, the underlying platform could be configured to only deploy software container images specified in the deployment requirements document. The deployment requirements document could also contain hashes (e.g. SHA-i) of the software container images that could be checked for validity by the

underlying platform before deploying a software container, thus preventing unauthorized manipulation of software container images. Note that the deployment requirements document and the deployment configuration can also be protected with public-key cryptography and access control so that deployment and termination agents can only read and not modify the content.

While a deployment policy, as described above, prevents deployment of non- approved software container images, it would still be possible to hack a software container and stop other software containers. As an additional security precaution, the platform could be configured to reject external un- deployment requests. In this case, the only way to stop a software container would be to stop it from the inside of the software container by letting the termination agent terminate all running processes, thus causing the software container to terminate, e.g. in response to a modified deployment

requirements document as illustrated in Figs 4A-C and described above. Figs 6A-B are flow charts illustrating embodiments of methods for managing lifecycle of another software container. The method is performed in a software container, e.g. as an injected process as explained above. Each software container (e.g. in the embodiment of Fig 1) can perform this method in parallel. In a read config (configuration) step 40, a deployment configuration is read in a distributed peer-to-peer repository 10. The deployment configuration relates to an application to which the first software container belongs.

In a find next step 42, a second identity, referring to a second software container being directly subsequent to the first software container, is found in the deployment configuration. The deployment configuration can e.g. be in the form of the deployment requirements document described above.

In a check status step 44, a status of the second software container is checked. In one embodiment, this comprises communicating with the second software container, to thereby detect if it is operational (healthy) or not. In one embodiment, this comprises testing functionality of the second software container, e.g. by invoking a test routine of the second software container and receiving a result, to thereby detect if is operational or not.

Alternatively or additionally, this step comprises reading an operational status indicator for the second software container in the peer-to-peer repository. When the second software container has written an operational status indicator indicating an operable state, this allows the first software container to use the peer-to-peer repository to easily detect this.

In a conditional next active step 45, it is checked whether an operational (i.e. properly executing, healthy) second software container is found in step 44. If this is the case, the method ends, otherwise, the method proceeds to a trigger deployment step 46.

In the trigger deployment step 46, a deployment of a new instance of the second software container having the second identity is triggered. Optionally, parameters for deploying the second software container are retrieved from the deployment configuration obtained in the read config step 40.

The deployment can be effected by the software container, the platform within the server for the software container or an external entity. Looking now to Fig 6B, only new or modified steps compared to the method illustrated by the flow chart of Fig 6A will be described.

Here, after the read config step 40, the method splits in three parts. The left string (steps 42, 44 45, 46) is performed by a deployment agent (see reference numeral 11 of Fig 2), the right string (steps 47, 49, 48, 52) is performed by a termination agent (see reference numeral 12 of Fig 2). The write status indicator step 50 can be performed by a separate agent or may e.g. form part of the termination agent 12. The read config step 40 may be performed by a separate agent or may e.g. form part of the deployment agent 11 and/or the termination agent 12.

In the optional write status indicator step 50, an operational status indicator for the first software indicator is written in the distributed peer-to-peer repository (10) when the first software container is operational. In other words, the operational status indicator written to the repository indicates that the software container executing this step is operational, i.e. healthy.

The write status indicator step 50 may be repeated, optionally after a delay. The operational status indicator may expire after a period of time unless renewed, e.g. using the TTL mechanism explained above. In this way, when a software container fails, there is no renewal of the operational status indicator and another software container can determine that the software container is not operational.

In a conditional termination indicated step 47, it is determined whether the deployment configuration (obtained in the read config step 40) indicates that the first software container should be terminated (see e.g. Figs 4A-C and corresponding text above). If this is the case, the method proceeds to a trigger self-destruct step 48, 52. Otherwise, the method proceeds to a conditional conflict and contention lost step 49.

In the conditional conflict and contention lost step 49, it is determined whether there is another software container having the same identity as the first software container (see e.g. Fig 5) and a predetermined contention resolution algorithm results in that the first software container should be terminated. If this is the case, the method proceeds to the trigger self- destruct step 48, 52. Otherwise, the method returns to the read config step 40. In the trigger self-destruct step 48, 52 the first software container triggers the termination itself, i.e. a self-destruct is effected. For instance, this can include waiting until all active client connections have been closed before terminating the software container to achieve a graceful termination.

Here, after the trigger deployment step 46, the method returns to the read config step 40.

Fig 7 is a schematic diagram illustrating components of the each one of the servers 4a-h of Fig 1, here represented by a single server 4. A processor 70 is provided using any combination of one or more of a suitable central processing unit (CPU), multiprocessor, microcontroller, digital signal processor (DSP), application specific integrated circuit etc., capable of executing software instructions 77 stored in a memory 75, which can thus be a computer program product. The processor 70 can be configured to execute the method described with reference to Figs 6A-B above.

The memory 75 can be any combination of read and write memory (RAM) and read only memory (ROM). The memory 75 also comprises persistent storage, which, for example, can be any single one or combination of magnetic memory, optical memory, solid state memory or even remotely mounted memory.

A data memory 76 is also provided for reading and/or storing data during execution of software instructions in the processor 70. The data memory 76 can be any combination of read and write memory (RAM) and read only memory (ROM). The server 4 further comprises an I/O interface 72 for communicating with other external entities. Optionally, the I/O interface 72 also includes a user interface.

Other components of the server 4 are omitted in order not to obscure the concepts presented herein.

Fig 8 is a schematic diagram showing functional modules of the server 4 of Fig 7 according to one embodiment. The modules are implemented using software instructions such as a computer program executing in the server 4. The modules correspond to the steps in the methods illustrated in Figs 6A and 6B.

A reader 80 corresponds to step 40. A finder 81 corresponds to step 42. A checker 82 corresponds to steps 44, 45, 47, and 49. A deployer 83

corresponds to step 46. A writer 84 corresponds to step 50. A self-destructor 85 corresponds to steps 46, 48. Fig 9 shows one example of a computer program product comprising computer readable means. On this computer readable means a computer program 91 can be stored, which computer program can cause a processor to execute a method according to embodiments described herein. In this example, the computer program product is an optical disc, such as a CD (compact disc) or a DVD (digital versatile disc) or a Blu-Ray disc. As explained above, the computer program product could also be embodied in a memory of a device, such as the computer program product 77 of Fig 7. While the computer program 91 is here schematically shown as a track on the depicted optical disk, the computer program can be stored in any way which is suitable for the computer program product, such as a removable solid state memory, e.g. a Universal Serial Bus (USB) drive.

The invention has mainly been described above with reference to a few embodiments. However, as is readily appreciated by a person skilled in the art, other embodiments than the ones disclosed above are equally possible within the scope of the invention, as defined by the appended patent claims.

Claims

1. A method performed a first software container of a server (4) for managing a lifecycle of another software container, the method comprising the steps of:

reading (40) a deployment configuration in a distributed peer-to-peer repository (10), the deployment configuration relating to an application to which the first software container belongs;

finding (42), in the deployment configuration, a second identity referring to a second software container being directly subsequent to the first software container;

checking (44) a status of the second software container; and

triggering (46) deployment of a new instance of the second software container having the second identity, when no operational second software container is found.

2. The method according to claim 1, further comprising the step of:

triggering (48) termination of the first software container when the deployment configuration indicates that the first software container should be terminated.

3. The method according to claim 1 or 2, further comprising the step of: triggering (52) termination of the first software container when there is another software container having the same identity as the first software container and a predetermined contention resolution algorithm results in that the first software container should be terminated.

4. The method according to any one of the preceding claims, wherein in the step of triggering (46) deployment, parameters for deploying the second software container are retrieved from the deployment configuration.

5. The method according to any one of the preceding claims, further comprising the step of:

writing (50) an operational status indicator for the first software container in the distributed peer-to-peer repository (10) when the first software container is operational.

6. The method according to claim 5, wherein the step of writing (48) the operational status indicator is repeated and wherein the operational status indicator expires after a period of time unless renewed.

7. The method according to any one of the preceding claims, wherein the step of checking (44) the status of the second software container comprises communicating with the second software container.

8. The method according to any one of the preceding claims, wherein the step of checking (44) the status of the second software container comprises testing functionality of the second software container.

9. The method according to any one of the preceding claims, wherein the step of checking (44) the status of the second software container comprises reading an operational status indicator for the second software container in the peer-to-peer repository.

10. A server (4) configured to manage, in a first software container, a lifecycle of another software container, the server comprising:

a processor (70); and

a memory (74) storing instructions (76) that, when executed by the processor, cause the server (4) to:

read a deployment configuration in a distributed peer-to-peer repository (10), the deployment configuration relating to an application to which the first software container belongs;

find, in the deployment configuration, a second identity referring to a second software container being directly subsequent to the first software container;

check a status of the second software container; and

trigger deployment of a new instance of the second software container having the second identity, when no operational second software container is found.

11. The server (4) according to claim 10, further comprising instructions (76) that, when executed by the processor, cause the server (4) to trigger termination the first software container when the deployment configuration indicates that the first software container should be terminated.

12. The server (4) according to claim 10 or 11, further comprising

instructions (76) that, when executed by the processor, cause the server (4) to trigger termination the first software container when there is another software container having the same identity as the first software container and a predetermined contention resolution algorithm results in that the first software container should be terminated.

13. The server (4) according to any one of claims 10 to 12, wherein the instructions to trigger deployment comprise instructions (76) that, when executed by the processor, cause the server (4) to retrieve parameters for deploying the second software container from the deployment configuration.

14. The server (4) according to any one of claims 10 to 13, further

comprising instructions (76) that, when executed by the processor, cause the server (4) to write an operational status indicator for the first software container in the distributed peer-to-peer repository (10) when the first software container is operational.

15. The server (4) according to claim 14, further comprising instructions (76) that, when executed by the processor, cause the server (4) to repeat the instructions to write the operational status indicator is repeated and wherein the operational status indicator expires after a period of time unless renewed.

16. The server (4) according to any one of claims 10 to 15, wherein the instructions to check the status of the second software container comprise instructions (76) that, when executed by the processor, cause the server (4) to communicate with the second software container.

17. The server (4) according to any one of claims 10 to 16, wherein the instructions to check the status of the second software container comprise instructions (76) that, when executed by the processor, cause the server (4) to test functionality of the second software container.

18. The server (4) according to any one of claims 10 to 17, wherein the instructions to check the status of the second software container comprise instructions (76) that, when executed by the processor, cause the server (4) to read an operational status indicator for the second software container in the peer-to-peer repository.

19. A server (4) comprising:

means for reading a deployment configuration in a distributed peer-to- peer repository (10), the deployment configuration relating to an application to which a first software container belongs, the first software container executing in the server;

means for finding, in the deployment configuration, a second identity referring to a second software container being directly subsequent to the first software container;

means for checking a status of the second software container; and means for triggering deployment of a new instance of the second software container having the second identity, when no operational second software container is found.

20. A computer program (91) for managing, in a first software container, a lifecycle of another software container the computer program comprising computer program code which, when run on a server (4) causes the server (4) to:

check a status of the second software container; and

trigger deployment of a new instance of the second software container having the second identity, when no operational second software container found.

21. A computer program product (90) comprising a computer program according to claim 20 and a computer readable means on which the computer program is stored.