WO2013022411A1 - Coordinating software deployment - Google Patents

Coordinating software deployment

Info

Publication number
WO2013022411A1
Authority
WO
WIPO (PCT)
Prior art keywords
production change
change request
worker
production
configuration data
Prior art date
Application number
PCT/UA2011/000073
Other languages
French (fr)
Inventor
David J. HIXSON
Andreas HALTER
Didier FRICK
Bohdan Vlasyuk
Pim B. PELT
Original Assignee
Google Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google Inc. filed Critical Google Inc.
Priority to PCT/UA2011/000073 priority Critical patent/WO2013022411A1/en
Publication of WO2013022411A1 publication Critical patent/WO2013022411A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling

Definitions

  • a distributed system may have instances of one or more software components implemented at multiple nodes within the system. Further, each node may implement different combinations of components depending on, for example, the services provided by the node or functions performed by that portion of the system. In some cases, one or more components may be called or reused by modules or other components, for example, to avoid having to define a new component each time a particular task must be performed.
  • the terms module, component, and sub-component, as used herein, each refer to a unit of software that, when executed by one or more processors, performs a task.
  • a module may depend on one or more components to accomplish a task, and a component may depend on one or more subcomponents or other components to accomplish a task.
  • any production change to a component may affect the operation of a module or other components that depend on the modified component.
  • any production change to a sub-component may affect the operation of the component that depends on the sub-component, and any module or other component that depends on the component.
  • a software deployment system is provided to coordinate and manage the dispatch of production changes across many different software components and services in a distributed system.
  • the system presents a universal interface layer for submitting production change requests and interfaces with disparate back-end tools for carrying out the deployment of the production changes. In this way, the update or reconfiguration of interrelated software components across distributed systems can be carried out reliably and without redeveloping the back-end tools.
  • one aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving production change requests by a deployment manager, each production change request identifying a component module to be modified by a corresponding production change and configuration data specifying a workflow to be executed in implementing the production change and one or more component dependencies; scheduling a first production change request and a second production change request in accordance with scheduling requirements specified by the corresponding configuration data; and assigning the first production change request and the second production change request to different worker modules, wherein the worker modules are configured to deploy the corresponding production change in accordance with the specified workflow using different software configuration tools.
  • Other embodiments of this aspect include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices.
  • the worker modules can be configured to execute on different computers. Assigning the first production change request and the second production change request to different worker modules can include assigning the first production change request to a first worker module, and assigning the second production change request to a second worker module, such that the first and second worker modules operate in parallel to deploy the corresponding production change. Scheduling the first production change request can include determining whether component dependencies specified by the corresponding configuration data have been modified. Scheduling the first production change request can include determining whether a conflict exists between the one or more component dependencies specified by the configuration data associated with the first production change request and component dependencies associated with a dispatched production change request.
  • the workflow specified by the configuration data associated with the first production change request can include a canary push phase whereby the production change is deployed to a first system, such that only target components within the first system are modified, in order to detect unexpected deviations from the normal behavior when exposed to production traffic.
  • the workflow specified by the configuration data associated with the first production change request can include a series of independent phases to be performed by the corresponding worker module in deploying the corresponding production change.
  • the actions or operations can include receiving and/or exchanging phase state information associated with a current phase being performed by the corresponding worker module, and reporting the phase state information to a user via one or more communication channels.
  • the system may facilitate scheduling and deployment of production changes across a distributed production system using existing back-end tools. Further, the system may improve the reliability and stability of the production system by automatically resolving software dependencies, conflicts, and scheduling priorities between co-pending production change requests prior to deployment.
  • the system may safeguard the production system by conditioning full deployment of a production change on the results of partial/test deployments.
  • the system may facilitate the detection of unexpected changes in the performance characteristics of the production system.
  • the system may enable more rapid parallel rollouts of non-conflicting changes without an increase in risk to the stability of the production system.
  • the system may provide for detailed logging which enables troubleshooting in the event that a problem is detected and for auditing purposes.
  • FIG. 1 illustrates an example software deployment system for coordinating and managing the dispatch of production changes across different software components and services in a distributed system.
  • FIG. 2 is a flow diagram of an example multiple phase deployment plan.
  • FIG. 3 is a flow diagram of an example software deployment technique.
  • FIG. 4 is a block diagram of an example software deployment system. Like reference numbers and designations in the various drawings indicate like elements.
  • Software deployment system 100 includes a deployment manager 102, one or more workers 104, and a user interface 106.
  • Deployment manager 102 and workers 104 may be implemented, for example, as one or more processes, services, daemons, or a combination thereof.
  • user interface 106 includes a command line interface, a web interface, or both.
  • User interface 106 accepts production change requests identifying a target component to receive the production change, as well as other related information, as detailed further below.
  • a production change is a change introduced to a component currently running in production and can include a binary replacement of the component, a flag/variable setting associated with the component, a change in the component's configuration, or an update/change to the data served by the component.
  • Manager 102 maintains a persistent queue of production change requests in DB 108.
  • manager 102 also stores the production change request and related information locally, for example, in an in-memory queue associated with manager 102, to limit round trips to DB 108 for in-flight changes.
  • the production change requests stored in DB 108 are retrieved and used by manager 102 to resume deployment of the production changes.
  • DB 108 includes a state table indexed by the submission time of the production change request and the user responsible for the request (e.g., by a string concatenation: production-change-submission-time+owner). Each entry in the state table stores information associated with the corresponding production change, including, for example, production change information, state information, configuration information, and a worker identifier ("worker id") if assigned.
  • the state information provides an indication of the deployment progress for the corresponding production change, including the status of the completed phases.
  • the worker identifier identifies the worker tasked with deploying the production change.
  • DB 108 may include more or less information, different information, or different combinations of information.
  • Production change information includes information used in implementing the production change, including, for example, a component identifier, component dependency information, and configuration information.
  • the component identifier identifies one or more target components 120 in one or more system nodes 130 to receive the production change. In some cases, the component identifier identifies different components having a common characteristic or set of characteristics to be affected by the production change.
  • the component dependency information identifies the components ("dependencies") on which the target components depend and, in some examples, identifies an expected state or version of the dependencies to ensure the dependencies have not been affected by other production change requests.
  • a network 140, for example, the Internet, a privately managed network, or other method of interconnecting systems, connects system nodes 130 and the software deployment system 100.
  • the configuration information specifies a workflow that includes pre-deployment steps, a deployment plan, and post-deployment steps, as well as corresponding information, including, for example, scheduling parameters.
  • the workflow is specified by the user using user interface 106 or can be provided through other means.
  • the configuration information is defined in a configuration file.
  • the pre-deployment steps include, for example, timing checks, ownership checks, presubmit checks, and locking checks.
  • the locking checks include obtaining exclusive access to the component or other/dependent components to be modified by modifying a semaphore variable or other access restriction technique.
  • Manager 102 performs the pre-deployment steps to schedule the dispatch of the production change requests based on the specified scheduling parameters.
  • the scheduling parameters include locking requirements, delayed scheduling information, and priority information.
  • the scheduling parameters also include punitive scheduling information derived, for example, based on prior unsuccessful attempts to deploy the production change.
  • the locking requirements are specified as a component lock list.
  • the component lock list identifies components and subcomponents affected by the production change request, or that should be locked for other safety reasons.
  • the specified components are locked when the production change is dispatched to prevent a conflict with an overlapping production change request.
  • manager 102 resolves all component dependencies for a single production change, for example, by verifying that the current state or version information associated with the dependencies matches that specified with the production change information. In addition, manager 102 confirms that the production change is clear of conflicts with existing locks. The production change is ready for dispatch when each of the scheduling parameters/requirements is met.
  • the assignment of the production change to a worker 104 is based on worker availability.
  • Worker availability is determined, for example, based on a registered, dynamic pool of workers and an available worker list maintained by manager 102.
  • manager 102 communicates with worker 104 via a remote procedure call (“RPC") server.
  • each worker communicates with the RPC server and binds to an available RPC port.
  • worker 104 registers with manager 102 to obtain a worker id.
  • Worker 104 then creates an ephemeral lock file identifying the worker_id, the host, the RPC port, and a time stamp.
  • the lock file is part of a reliable distributed locking mechanism that presents itself as a network file system. Worker 104 tries to maintain the existence of this file while it's running, thus maintaining its registration with manager 102.
  • In the event that manager 102 restarts, for example, if a crash occurs, it will attempt to reacquire the worker locks based on the assignment information retrieved from DB 108. If manager 102 fails to acquire a lock, the worker is deemed to be alive. If it acquires a worker lock, it treats the deployed production change request as unfinished and reschedules it.
  • a worker may die/crash while deploying the production change and restart. If the worker dies during a reentrant phase, the work can be rescheduled. Manager 102 is able to detect worker restarts by determining that the previously assigned worker id is no longer available for assignment during worker registration. Since there is a race condition between registration and worker restarts, the worker_id is verified before being assigned to a new production change request.
  • Manager 102 periodically scans a worker state table or map to find unassigned workers or dead workers. In some examples, manager 102 retrieves a list of assignments identifying a dispatched production change request and the corresponding worker identifier. Manager 102 then determines whether each of the worker ids in the assignment list is currently registered with the manager. If no match is found, the worker is considered dead and disassociated from the production change request. In addition, the assignment list is updated, the state information associated with the production change request is reset to IN QUEUE status, and the production change request is returned to the queue for rescheduling. Registered workers that are not associated with a production change request in the assignment list are considered available for assignment and included in an "available_workers" list.
  • manager 102 determines whether a production change request in its queue is ready for dispatch, and if so, communicates the production change information to the worker. The worker id of the worker is then associated with the production change, thereby updating the assignment information, and the worker id is removed from the available_workers list.
  • the workflow also includes a deployment plan that specifies a series of independent phases or rounds to be performed by worker 104 after the production change has been dispatched. Each phase includes a specific update or verification task to be performed prior to the start of the next phase in the workflow.
  • FIG. 2 illustrates an example deployment plan including multiple phases.
  • the phases include a system check phase, a global capacity and load balancing check phase, a canary push phase, an idle phase, a post canary check phase, a main push phase, and a post main check phase.
  • the first phase, or system check phase includes identifying systems including one or more target components matching the component identifier (202). Affected systems may be identified, for example, based on a system-component database indexed by the component identifier or information provided in the production change request.
  • the global capacity and load balancing check phase includes ordering the identified systems (204), for example, based on the work load being handled by the system.
  • the identified systems are sorted based on the user traffic such that systems having a greater quantity of requests or queries per second ("qps") are first in the order.
  • the canary push phase includes deploying the production change to a first system (206), for example, the system having the highest amount of traffic, such that only target components within the first system are modified.
  • the mechanism by which the production change is implemented or deployed is implementation dependent and can include multiple, different mechanisms, including, for example, the use of configuration management tools such as Bcfg2, DACS, LCFG, and various others. In this way, the update or reconfiguration of each of the components in a system can be carried out reliably and without redeveloping existing back-end tools, scripts, or libraries previously used to deploy production changes.
  • the idle phase includes entering into a sleep state to allow the modified components to operate in production (208).
  • the sleep state is maintained until a triggering event occurs. Triggering events may include, for example, the expiration of a predetermined amount of time, the processing of a threshold amount of requests, and the occurrence of a fatal error.
  • the post canary check phase includes determining the health of the modified components (210), for example, by determining whether an error has occurred during the deployment or as a consequence of it.
  • the main push phase includes deploying the production change to the other identified systems (212) and is followed by a post main check phase (214) similar to the post canary check phase described above.
  • a successful deployment is followed by the execution of post deployment steps.
  • the post deployment steps may include submitting deployment information to the SCM system (216).
  • An SCM system manages a master file repository, or depot, containing every revision of every file under the system's control. The information may be submitted, for example, by issuing a command to create an entry in a change log maintained by the SCM system.
  • worker 104 communicates updated state information associated with the production change request to manager 102 for storage in DB 108 and for reporting.
  • the state information provides an indication of the deployment progress for the corresponding production change, including the status of the completed phases.
  • the state information maintained by manager 102 is the sole communication channel between phases and phase dependence on external information is deprecated. Storing the state information in persistent storage, for example DB 108, enables restarting worker 104 in the event of a crash, a stall or other interruptions.
  • the state information is stored as a dictionary of arbitrary structure represented as a JSON-encoded string.
  • If manager 102 replies with an error, worker 104 resends the updated state information indefinitely, until it is successfully received. If the updated state information indicates an error was encountered during the current phase, worker 104 enters a wait loop and awaits further instructions. In some implementations, system 100 notifies interested parties by sending out status and error notifications over multiple channels, including, for example, email and XMPP. A user will then decide whether the production change request should be cancelled, retried, rolled back, scheduled on a different worker, or modified to skip a phase. In some examples, if the updated state information indicates a fatal error, the production change request is automatically rolled back by worker 104. In some cases, manager 102 maintains a full history of state information, and thus, enables the deployment process to revert to an earlier phase in the workflow in response to input provided by a user.
  • the next phase begins by retrieving the updated state information from manager 102 and storing it in a memory associated with the current phase to reduce the risk of data tampering.
  • Phases may be defined in executable software code or scripts, for example.
  • each defined phase includes a "dry run" mode, which performs as many checks as possible without making any changes to the target components. Dry run mode can be used for testing or to provide immediate user feedback about a production change affecting a target component that is currently locked, for example.
  • After successful deployment of the production change, manager 102 performs the post-deployment steps, as described above.
  • system 100 also includes a logging server 110, a natural language processing server 112, and a key-value data store service 114.
  • Logging server 110 multiplexes notifications to multiple channels as described above.
  • Natural language processor 112 is used to extract dates specified in natural language and convert them into machine readable format, for example, in specifying the scheduling parameters.
  • Key-value data store service 114 maintains one or more key-value data stores which provide a persistent, ordered immutable map from keys to values, where both keys and values are arbitrary byte strings. Key-value data store service 114 enables manager 102 or users to access statistics/metrics related to completed production change deployments and is used for punitive scheduling decisions.
  • a success rate based on previous deployment attempts may be associated with users or teams of users such that production change requests are scheduled in accordance with the requesting user's success rate.
  • the success rate may be calculated, for example, based on a rolling average of the number of production change requests that result in roll-back. This ensures an increase in the net rate of change to the system while motivating users/teams to improve the quality and stability of each production change.
  • FIG. 3 illustrates an example software deployment technique performed by a data processing apparatus.
  • production change requests are first received by a deployment manager (302), for example from multiple users.
  • Each production change request identifies a component module to be modified by a corresponding production change and configuration data specifying a workflow to be executed in implementing the production change and one or more component dependencies.
  • the production change requests are then scheduled in accordance with scheduling requirements (304) specified by the corresponding configuration data.
  • the deployment manager determines whether component dependencies specified by the corresponding configuration data have been modified, for example, by another production change request such that proceeding with the deployment of the current production change request may result in system instabilities.
  • the deployment manager determines whether a conflict exists between the one or more component dependencies specified by the configuration data associated with a queued production change request and component dependencies associated with a dispatched production change request, for example, to ensure component locks do not interfere with the deployed production change request.
  • production change requests are assigned to different worker modules and dispatched (306).
  • the worker modules are configured to deploy the corresponding production change, in accordance with the specified workflow (including, for example, a canary push phase), using different software configuration tools.
  • the different software configuration tools may be preexisting tools previously used to implement production change requests to particular nodes or subsystems within the production system.
  • the worker modules execute on different computers in the production system.
  • phase state information associated with the current phase being performed by the corresponding worker module is submitted by the worker modules and received by the deployment manager, and, if necessary, the deployment manager reports the phase state information to a user via one or more communication channels (308).
  • FIG. 4 is a schematic diagram of an example system configured to coordinate and manage the dispatch of production changes across many different software components and services in a distributed system.
  • the system generally consists of a server 402.
  • the server 402 is optionally connected to one or more user or client computers 490 through a network 480.
  • the server 402 consists of one or more data processing apparatus. While only one data processing apparatus is shown in FIG. 4, multiple data processing apparatus can be used.
  • the server 402 includes various modules, e.g. executable software programs, including a deployment manager 404, one or more workers 405, a logging server 406, a natural language processing server 408, and a key-value data store service 410.
  • Each module runs as part of the operating system on the server 402, runs as an application on the server 402, or runs as part of the operating system and part of an application on the server 402, for instance.
  • the software modules can be distributed on one or more data processing apparatus connected by one or more networks or other suitable communication mediums.
  • the server 402 also includes hardware or firmware devices including one or more processors 412, one or more additional devices 414, a computer readable medium 416, a communication interface 418, and one or more user interface devices 420.
  • Each processor 412 is capable of processing instructions for execution within the server 402. In some implementations, the processor 412 is a single or multi-threaded processor.
  • Each processor 412 is capable of processing instructions stored on the computer readable medium 416 or on a storage device such as one of the additional devices 414.
  • the server 402 uses its communication interface 418 to communicate with one or more computers 490, for example, over a network 480.
  • Examples of user interface devices 420 include a display, a camera, a speaker, a microphone, a tactile feedback device, a keyboard, and a mouse.
  • the server 402 can store instructions that implement operations associated with the modules described above, for example, on the computer readable medium 416 or one or more additional devices 414, for example, one or more of a floppy disk device, a hard disk device, an optical disk device, or a tape device.
  • Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
  • Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on computer storage medium for execution by, or to control the operation of, data processing apparatus.
  • the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine- generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
  • a "computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them.
  • a computer storage medium is not a propagated signal
  • a computer storage medium can be a source or destination of computer program instructions encoded in an artificially-generated propagated signal.
  • the computer storage medium can also be, or be included in, one or more separate physical components or media, e.g., multiple CDs, disks, or other storage devices.
  • the operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
  • data processing apparatus encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or multiple ones, or combinations, of the foregoing
  • the apparatus can include special purpose logic circuitry, e.g., an FPGA, or field programmable gate array, or an ASIC, or application-specific integrated circuit.
  • the apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them.
  • the apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.
  • a computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment.
  • a computer program may, but need not, correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code.
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • the processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA or an ASIC.
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
  • a processor will receive instructions and data from a read-only memory or a random access memory or both.
  • the essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
  • a computer need not have such devices.
  • a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.
  • Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending documents to a web browser on a user's client device in response to requests received from the web browser.
  • Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network.
  • Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), an internetwork, e.g., the Internet, and peer-to-peer networks, e.g., ad hoc peer-to-peer networks.
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • a server transmits data, e.g., an HTML document, to a client device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device.
  • Data generated at the client device e.g., a result of the user interaction, can be received from the client device at the server.

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for coordinating software deployment. In one aspect, methods include the actions of receiving production change requests by a deployment manager, each production change request identifying a component module to be modified by a corresponding production change and configuration data specifying a workflow to be executed in implementing the production change and one or more component dependencies; scheduling a first production change request and a second production change request in accordance with scheduling requirements specified by the corresponding configuration data; and assigning the first production change request and the second production change request to one or more worker modules, wherein the one or more worker modules are configured to deploy the corresponding production change in accordance with the specified workflow using different software configuration tools.

Description

COORDINATING SOFTWARE DEPLOYMENT
BACKGROUND
This specification relates to coordinating software deployment in distributed systems. A distributed system may have instances of one or more software components implemented at multiple nodes within the system. Further, each node may implement different combinations of components depending on, for example, the services provided by the node or functions performed by that portion of the system. In some cases, one or more components may be called or reused by modules or other components, for example, to avoid having to define a new component each time a particular task must be performed. In general, the terms, module, component, and sub-component, as used herein, each refer to a unit of software that when executed by one or more processors performs a task. A module may depend on one or more components to accomplish a task, and a component may depend on one or more subcomponents or other components to accomplish a task. As such, any production change to a component may affect the operation of a module or other components that depend on the modified component. Similarly, any production change to a sub-component may affect the operation of the component that depends on the sub-component, and any module or other component that depends on the component.
In large or complex systems, teams of engineers are typically tasked with maintaining or updating the various units of software within the system. In some cases, particular components may be modified by different teams. Conflicts between different production changes may arise, for example, when independent changes affecting multiple inter-related components or sub-components are implemented concurrently. Further, the tools required to update certain components may be incompatible with other components due to having been created by different teams using different systems and during different time periods. Converting or redeveloping one or more of the required tools may increase the risk of a software bug and is likely to require a significant investment of resources, particularly in large scale distributed systems.
SUMMARY
A software deployment system is provided to coordinate and manage the dispatch of production changes across many different software components and services in a distributed system. The system presents a universal interface layer for submitting production change requests and interfaces with disparate back-end tools for carrying out the deployment of the production changes. In this way, the update or reconfiguration of interrelated software components across distributed systems can be carried out reliably and without redeveloping the back-end tools.
In general, one aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving production change requests by a deployment manager, each production change request identifying a component module to be modified by a corresponding production change and configuration data specifying a workflow to be executed in implementing the production change and one or more component dependencies; scheduling a first production change request and a second production change request in accordance with scheduling requirements specified by the corresponding configuration data; and assigning the first production change request and the second production change request to different worker modules, wherein the worker modules are configured to deploy the corresponding production change in accordance with the specified workflow using different software configuration tools. Other embodiments of this aspect include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices.
These and other embodiments can each optionally include one or more of the following features. The worker modules can be configured to execute on different computers. Assigning the first production change request and the second production change request to different worker modules can include assigning the first production change request to a first worker module, and assigning the second production change request to a second worker module, such that the first and second worker modules operate in parallel to deploy the corresponding production change. Scheduling the first production change request can include determining whether component dependencies specified by the corresponding configuration data have been modified. Scheduling the first production change request can include determining whether a conflict exists between the one or more component dependencies specified by the configuration data associated with the first production change request and component dependencies associated with a dispatched production change request. The workflow specified by the configuration data associated with the first production change request can include a canary push phase whereby the production change is deployed to a first system, such that only target components within the first system are modified, in order to detect unexpected deviations from the normal behavior when exposed to production traffic. The workflow specified by the configuration data associated with the first production change request can include a series of independent phases to be performed by the corresponding worker module in deploying the corresponding production change. The actions or operations can include receiving and/or exchanging phase state information associated with a current phase being performed by the corresponding worker module, and reporting the phase state information to a user via one or more communication channels.
Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. The system may facilitate scheduling and deployment of production changes across a distributed production system using existing back-end tools. Further, the system may improve the reliability and stability of the production system by automatically resolving software dependencies, conflicts, and scheduling priorities between co-pending production change requests prior to deployment. The system may safeguard the production system by conditioning full deployment of a production change on the results of partial/test deployments. The system may facilitate the detection of unexpected changes in the performance characteristics of the production system. The system may enable more rapid parallel rollouts of non-conflicting changes without an increase in risk to the stability of the production system. The system may provide for detailed logging which enables troubleshooting in the event that a problem is detected and for auditing purposes.
The details of one or more embodiments of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 illustrates an example software deployment system for coordinating and managing the dispatch of production changes across different software components and services in a distributed system.
FIG. 2 is a flow diagram of an example multiple phase deployment plan.
FIG. 3 is a flow diagram of an example software deployment technique.
FIG. 4 is a block diagram of an example software deployment system. Like reference numbers and designations in the various drawings indicate like elements.
DETAILED DESCRIPTION
In the description below, for the purposes of explanation, specific examples related to coordinating software deployment in a distributed or grid computing environment have been set forth in order to provide a thorough understanding of the implementations of the subject matter described in this specification. It is appreciated that the implementations described herein can be utilized in other capacities as well and need not be limited to software deployment in distributed systems. For example, one or more of the aspects described below may be used in reliably managing updates to software support files or software applications on personal computing devices.
Software deployment system 100 includes a deployment manager 102, one or more workers 104, and a user interface 106. Deployment manager 102 and workers 104 may be implemented, for example, as one or more processes, services, daemons, or a combination thereof. In some examples, user interface 106 includes a command line interface, a web interface, or both. User interface 106 accepts production change requests identifying a target component to receive the production change, as well as other related information, as detailed further below. A production change is a change introduced to a component currently running in production and can include a binary replacement of the component, a flag/variable setting associated with the component, a change in the component's configuration, or an update/change to the data served by the component.
Manager 102 maintains a persistent queue of production change requests in DB 108. In some implementations, manager 102 also stores the production change request and related information locally, for example, in an in-memory queue associated with manager 102, to limit round trips to DB 108 for in-flight changes. Upon startup, the production change requests stored in DB 108 are retrieved and used by manager 102 to resume deployment of the production changes.
In some implementations, DB 108 includes a state table indexed by the submission time of the production change request and the user responsible for the request (e.g., by a string concatenation: production-change-submission-time+owner). Each entry in the state table stores information associated with the corresponding production change, including, for example, production change information, state information, configuration information, and a worker identifier ("worker id") if assigned. The state information provides an indication of the deployment progress for the corresponding production change, including the status of the completed phases. The worker identifier identifies the worker tasked with deploying the production change. In general, DB 108 may include more or less information, different information, or different combinations of information.
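For illustration only, the following Python sketch shows one way such a state table entry might be keyed and structured, with the key built from the submission time and the owner as described above. The field names and values are hypothetical and are not drawn from the specification.

```python
import json
import time

def state_table_key(submission_time, owner):
    """Build the state-table key by concatenating submission time and owner."""
    return f"{submission_time}+{owner}"

# A hypothetical state-table entry for one production change request.
entry = {
    "production_change": {
        "component_id": "frontend/searchserver",   # target component(s)
        "dependencies": {"libquery": "r1024"},      # expected dependency versions
    },
    "state": {"status": "IN_QUEUE", "completed_phases": []},
    "configuration": {"priority": 3},
    "worker_id": None,                              # set once a worker is assigned
}

key = state_table_key(int(time.time()), "alice")
state_table = {key: entry}
print(json.dumps(state_table, indent=2))
```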
Production change information includes information used in implementing the production change, including, for example, a component identifier, component dependency information, and configuration information. The component identifier identifies one or more target components 120 in one or more system nodes 130 to receive the production change. In some cases, the component identifier identifies different components having a common characteristic or set of characteristics to be affected by the production change. The component dependency information identifies the components ("dependencies") on which the target components depend and, in some examples, identifies an expected state or version of the dependencies to ensure the dependencies have not been affected by other production change requests. A network 140, for example, the Internet, a privately managed network, or other method of interconnecting systems, connects system nodes 130 and the software deployment system 100.
The configuration information specifies a workflow that includes pre-deployment steps, a deployment plan, and post-deployment steps, as well as corresponding information, including, for example, scheduling parameters. The workflow is specified by the user using user interface 106 or can be provided through other means. For example, in some implementations, the configuration information is defined in a configuration file.
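As a hypothetical illustration of such configuration data, the sketch below expresses a workflow definition as a Python dictionary. Every field name and value is an assumption chosen for readability rather than a format defined by the specification.

```python
# A hypothetical workflow configuration; field names are illustrative only.
production_change_config = {
    "component": "frontend/searchserver",
    "dependencies": {"libquery": "r1024"},
    "pre_deployment": {
        "timing_check": "after 2011-08-15 02:00",
        "ownership_check": True,
        "presubmit_checks": ["unit_tests", "lint"],
        "lock_list": ["frontend/searchserver", "libquery"],
    },
    "deployment_plan": [
        "system_check",
        "capacity_and_load_balancing_check",
        "canary_push",
        "idle",
        "post_canary_check",
        "main_push",
        "post_main_check",
    ],
    "post_deployment": ["submit_to_scm"],
    "scheduling": {"priority": 3, "delay_until": None},
}
```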
The pre-deployment steps include, for example, timing checks, ownership checks, presubmit checks, and locking checks. In some implementations, the locking checks include obtaining exclusive access to the component or other/dependent components to be modified by modifying a semaphore variable or other access restriction technique. Manager 102 performs the pre-deployment steps to schedule the dispatch of the production change requests based on the specified scheduling parameters. The scheduling parameters include locking requirements, delayed scheduling information, and priority information. In some implementations, the scheduling parameters also include punitive scheduling information derived, for example, based on prior unsuccessful attempts to deploy the production change. In some examples, the locking requirements are specified as a component lock list. The component lock list identifies components and subcomponents affected by the production change request, or that should be locked for other safety reasons. The specified components are locked when the production change is dispatched to prevent a conflict with an overlapping production change request. In some cases, it may be necessary or desirable to schedule the deployment of multiple production changes to occur in parallel. For example, in some cases, simultaneous changes to different components may be necessary due to a codependency between the components. Further, by scheduling simultaneous or overlapping deployments of non-conflicting changes, the speed at which production changes are implemented can be increased.
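One plausible way to realize the component lock list is a simple registry that refuses to lock a production change whose lock list overlaps with locks already held by a dispatched request. The sketch below is illustrative only; the class and method names are assumptions, not the specification's own mechanism.

```python
class LockRegistry:
    """Tracks which components are locked by dispatched production changes."""

    def __init__(self):
        self._locks = {}  # component id -> production change request id

    def conflicts(self, lock_list):
        """Return the components in lock_list already locked by another request."""
        return [c for c in lock_list if c in self._locks]

    def acquire(self, request_id, lock_list):
        """Lock every component in lock_list, or fail without side effects."""
        if self.conflicts(lock_list):
            return False
        for component in lock_list:
            self._locks[component] = request_id
        return True

    def release(self, request_id):
        """Release all locks held by a finished or cancelled request."""
        self._locks = {c: r for c, r in self._locks.items() if r != request_id}

registry = LockRegistry()
assert registry.acquire("pcr-1", ["frontend/searchserver", "libquery"])
assert not registry.acquire("pcr-2", ["libquery"])  # overlapping change is blocked
```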
On every queue scan, manager 102 resolves all component dependencies for a single production change, for example, by verifying that the current state or version information associated with the dependencies matches that specified with the production change information. In addition, manager 102 confirms that the production change is clear of conflicts with existing locks. The production change is ready for dispatch when each of the scheduling parameters/requirements is met.
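The readiness test performed on each queue scan might be expressed roughly as follows, building on the hypothetical configuration structure and lock registry sketched above; this is a simplified sketch, not the actual dispatch logic.

```python
def ready_for_dispatch(config, current_versions, lock_registry, now):
    """Return True when a queued production change meets its scheduling requirements."""
    # Dependencies must still be at the state/version recorded with the request.
    for dep, expected in config["dependencies"].items():
        if current_versions.get(dep) != expected:
            return False

    # The change must not conflict with locks held by already dispatched requests.
    if lock_registry.conflicts(config["pre_deployment"]["lock_list"]):
        return False

    # Delayed scheduling: do not dispatch before the requested time, if any.
    delay_until = config["scheduling"].get("delay_until")
    if delay_until is not None and now < delay_until:
        return False

    return True
```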
The assignment of the production change to a worker 104 is based on worker availability. Worker availability is determined, for example, based on a registered, dynamic pool of workers and an available worker list maintained by manager 102.
In some instances, manager 102 communicates with worker 104 via a remote procedure call ("RPC") server. Upon start up or resume, each worker communicates with the RPC server and binds to an available RPC port. After binding to the RPC port, worker 104 registers with manager 102 to obtain a worker id. Worker 104 then creates an ephemeral lock file identifying the worker_id, the host, the RPC port, and a time stamp. The lock file is part of a reliable distributed locking mechanism that presents itself as a network file system. Worker 104 tries to maintain the existence of this file while it's running, thus maintaining its registration with manager 102. In the event that manager 102 restarts, for example, if a crash occurs, it will attempt to reacquire the worker locks based on the assignment information retrieved from DB 108. If manager 102 fails to acquire a lock, the worker is deemed to be alive. If it acquires a worker lock, it treats the deployed production change request as unfinished and reschedules it.
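The registration and lock-file behavior could look roughly like the following sketch, which substitutes an ordinary local file for the network-file-system-backed lock file and a stub object for manager 102's registration interface; all class, method, and field names are illustrative assumptions.

```python
import json
import os
import socket
import threading
import time

class ManagerStub:
    """Minimal stand-in for manager 102's registration interface."""
    def __init__(self):
        self._next_id = 0
    def register_worker(self):
        self._next_id += 1
        return self._next_id

class Worker:
    """Registers with the manager and maintains an ephemeral lock file."""
    def __init__(self, manager, lock_dir, rpc_port):
        self.rpc_port = rpc_port
        self.worker_id = manager.register_worker()           # obtain a worker id
        self.lock_path = os.path.join(lock_dir, f"worker-{self.worker_id}.lock")
        self._stop = threading.Event()

    def _write_lock_file(self):
        record = {"worker_id": self.worker_id, "host": socket.gethostname(),
                  "rpc_port": self.rpc_port, "timestamp": time.time()}
        with open(self.lock_path, "w") as f:
            json.dump(record, f)

    def maintain_registration(self, interval=5.0):
        """Recreate the lock file whenever it disappears, while the worker runs."""
        while not self._stop.is_set():
            if not os.path.exists(self.lock_path):
                self._write_lock_file()
            self._stop.wait(interval)

    def stop(self):
        self._stop.set()
```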
In some cases, a worker may die/crash while deploying the production change and restart. If the worker dies during a reentrant phase, the work can be rescheduled. Manager 102 is able to detect worker restarts by determining that the previously assigned worker id is no longer available for assignment during worker registration. Since there is a race condition between registration and worker restarts, the worker_id is verified before being assigned to a new production change request.
Manager 102 periodically scans a worker state table or map to find unassigned workers or dead workers. In some examples, manager 102 retrieves a list of assignments identifying a dispatched production change request and the corresponding worker identifier. Manager 102 then determines whether each of the worker ids in the assignment list is currently registered with the manager. If no match is found, the worker is considered dead and disassociated from the production change request. In addition, the assignment list is updated, the state information associated with the production change request is reset to IN QUEUE status, and the production change request is returned to the queue for rescheduling. Registered workers that are not associated with a production change request in the assignment list are considered available for assignment and included in an "available_workers" list.
Once an available worker is found, manager 102 determines whether a production change request in its queue is ready for dispatch, and if so, communicates the production change information to the worker. The worker id of the worker is then associated with the production change, thereby updating the assignment information, and the worker id is removed from the available_workers list.
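A minimal sketch of this dispatch step is shown below; the data structures and the send_to_worker helper are hypothetical stand-ins for the RPC channel described above.

```python
def dispatch_ready_requests(queue, available_workers, assignments, is_ready):
    """Assign each ready production change request to an available worker."""
    for request_id, request in list(queue.items()):
        if not available_workers:
            break                                   # no idle workers left
        if not is_ready(request):
            continue                                # scheduling requirements not met
        worker_id = available_workers.pop()         # remove worker from available list
        assignments[request_id] = worker_id         # record the new assignment
        send_to_worker(worker_id, request)          # hand the change over the RPC channel

def send_to_worker(worker_id, request):
    """Hypothetical stand-in for the manager-to-worker RPC call."""
    print(f"dispatching request to worker {worker_id}: {request}")
```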
As described above, the workflow also includes a deployment plan that specifies a series of independent phases or rounds to be performed by worker 104 after the production change has been dispatched. Each phase includes a specific update or verification task to be performed prior to the start of the next phase in the workflow.
FIG. 2 illustrates an example deployment plan including multiple phases. The phases include a system check phase, a global capacity and load balancing check phase, a canary push phase, an idle phase, a post canary check phase, a main push phase, and a post main check phase. As shown, the first phase, or system check phase, includes identifying systems including one or more target components matching the component identifier (202). Affected systems may be identified, for example, based on a system-component database indexed by the component identifier or information provided in the production change request. The global capacity and load balancing check phase includes ordering the identified systems (204), for example, based on the work load being handled by the system. In some implementations, the identified systems are sorted based on the user traffic such that systems having a greater quantity of requests or queries per second ("qps") are first in the order. The canary push phase includes deploying the production change to a first system (206), for example, the system having the highest amount of traffic, such that only target components within the first system are modified. The mechanism by which the production change is implemented or deployed is implementation dependent and can include multiple, different mechanisms, including, for example, the use of configuration management tools such as Bcfg2, DACS, LCFG, and various others. In this way, the update or reconfiguration of each of the components in a system can be carried out reliably and without redeveloping existing back-end tools, scripts, or libraries previously used to deploy production changes.
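For example, the ordering and canary selection performed in phases 204 and 206 might be approximated by the following sketch, where the per-system qps figures are invented for illustration.

```python
def order_systems_by_traffic(systems):
    """Sort identified systems so the highest-qps system comes first."""
    return sorted(systems, key=lambda s: s["qps"], reverse=True)

systems = [
    {"name": "cluster-a", "qps": 1200},
    {"name": "cluster-b", "qps": 4800},
    {"name": "cluster-c", "qps": 300},
]

ordered = order_systems_by_traffic(systems)
canary_system, remaining_systems = ordered[0], ordered[1:]
print("canary push target:", canary_system["name"])               # cluster-b
print("main push targets:", [s["name"] for s in remaining_systems])
```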
The idle phase includes entering into a sleep state to allow the modified components to operate in production (208). In some examples, the sleep state is maintained until a triggering event occurs. Triggering events may include, for example, the expiration of a predetermined amount of time, the processing of a threshold amount of requests, and the occurrence of a fatal error. The post canary check phase includes determining the health of the modified components (210), for example, by determining whether an error has occurred during the deployment or as a consequence of it. The main push phase includes deploying the production change to the other identified systems (212) and is followed by a post main check phase (214) similar to the post canary check phase described above.
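The idle phase's triggering events could be modeled roughly as below; the request_count and error_flag callbacks are hypothetical stand-ins for whatever monitoring the production system actually provides.

```python
import time

def idle_phase(request_count, error_flag, max_wait_s=600, request_threshold=10000):
    """Sleep until a triggering event: timeout, request threshold, or fatal error."""
    deadline = time.time() + max_wait_s
    while time.time() < deadline:
        if error_flag():                              # fatal error observed on the canary
            return "fatal_error"
        if request_count() >= request_threshold:      # enough production traffic handled
            return "threshold_reached"
        time.sleep(5)
    return "timeout"                                  # predetermined amount of time expired
```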
In some implementations, a successful deployment is followed by the execution of post deployment steps. For example, in a production system equipped with a software configuration management ("SCM") system, the post deployment steps may include submitting deployment information to the SCM system (216). An SCM system manages a master file repository, or depot, containing every revision of every file under the system's control. The information may be submitted, for example, by issuing a command to create an entry in a change log maintained by the SCM system.
At the completion of each phase, worker 104 communicates updated state information associated with the production change request to manager 102 for storage in DB 108 and for reporting. As explained above, the state information provides an indication of the deployment progress for the corresponding production change, including the status of the completed phases. In some implementations, the state information maintained by manager 102 is the sole communication channel between phases and phase dependence on external information is deprecated. Storing the state information in persistent storage, for example DB 108, enables restarting worker 104 in the event of a crash, a stall or other interruptions. In some implementations, the state information is stored as a dictionary of arbitrary structure represented as a JSON-encoded string.
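For example, the state dictionary exchanged between a worker and the manager might resemble the following; the keys shown are illustrative only.

```python
import json

# Illustrative phase state dictionary; the actual keys are implementation specific.
state = {
    "request_id": "pcr-1234",
    "phase": "canary_push",
    "status": "completed",
    "completed_phases": ["system_check", "capacity_check"],
    "error": None,
}

encoded = json.dumps(state)      # JSON-encoded string persisted by the manager (e.g., in DB 108)
restored = json.loads(encoded)   # decoded again when a worker resumes or starts the next phase
```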
If manager 102 replies with an error, worker 104 resends the updated state information indefinitely, until it is successfully received. If the updated state information indicates an error was encountered during the current phase, worker 104 enters a wait loop and awaits further instructions. In some implementations, system 100 notifies interested parties by sending out status and error notifications over multiple channels, including, for example, email and XMPP. A user then decides whether the production change request should be cancelled, retried, rolled back, scheduled on a different worker, or modified to skip a phase. In some examples, if the updated state information indicates a fatal error, the production change request is automatically rolled back by worker 104. In some cases, manager 102 maintains a full history of state information and thus enables the deployment process to revert to an earlier phase in the workflow in response to input provided by a user.
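A minimal sketch of the resend loop is shown below, assuming a send callable that raises on an error reply from the manager.

```python
import time

def report_state(send, state, retry_delay_s=5):
    """Resend updated state information until the manager acknowledges receipt."""
    while True:
        try:
            send(state)                # e.g., an RPC to the manager; raises if an error is returned
            return
        except Exception:
            time.sleep(retry_delay_s)  # keep retrying indefinitely until successfully received
```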
If the updated state information indicates successful completion, worker 104 resets itself and automatically proceeds onto the next phase. In some cases, the next phase begins by retrieving the updated state information from manager 102 and storing it in a memory associated with the current phase to reduce the risk of data tampering.
Phases may be defined in executable software code or scripts, for example. In some examples, each defined phase includes a "dry run" mode, which performs as many checks as possible without making any changes to the target components. Dry run mode can be used for testing or to provide immediate user feedback about a production change affecting a target component that is currently locked, for example. After successful deployment of the production change, manager 102 performs the post-deployment steps, as described above.
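By way of example only, a phase supporting a dry run mode might be structured as follows; the helper callables are assumptions made for the sketch.

```python
def canary_push_phase(targets, apply_change, check_lock, dry_run=False):
    """Perform as many checks as possible; modify targets only when dry_run is False."""
    for target in targets:
        if check_lock(target):
            print(f"{target} is currently locked")       # immediate feedback, no change made
            continue
        if dry_run:
            print(f"dry run: would deploy to {target}")
            continue
        apply_change(target)                             # actual modification of the target component
```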
Referring again to FIG. 1, in some implementations, system 100 also includes a logging server 110, a natural language processing server 112, and a key-value data store service 114. Logging server 110 multiplexes notifications to multiple channels as described above. Natural language processing server 112 is used to extract dates specified in natural language and convert them into a machine-readable format, for example, in specifying the scheduling parameters. Key-value data store service 114 maintains one or more key-value data stores which provide a persistent, ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings. Key-value data store service 114 gives manager 102 or users access to statistics/metrics related to completed production change deployments and is used for punitive scheduling decisions. For example, a success rate based on previous deployment attempts may be associated with users or teams of users such that production change requests are scheduled in accordance with the requesting user's success rate. The success rate may be calculated, for example, based on a rolling average of the number of production change requests that result in roll-back. This can increase the net rate of change to the system while motivating users/teams to improve the quality and stability of each production change.
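An illustrative sketch of tracking a rolling success rate per requesting user or team is given below; the window size and the way the rate feeds into scheduling are assumptions.

```python
from collections import deque

class SuccessRateTracker:
    """Rolling success rate per user/team, usable to bias the scheduling of their requests."""

    def __init__(self, window=20):
        self.history = {}                    # user -> recent outcomes (1 = success, 0 = rolled back)
        self.window = window

    def record(self, user, rolled_back):
        outcomes = self.history.setdefault(user, deque(maxlen=self.window))
        outcomes.append(0 if rolled_back else 1)

    def success_rate(self, user):
        outcomes = self.history.get(user)
        return sum(outcomes) / len(outcomes) if outcomes else 1.0

# Requests from users with a higher success rate could, for example, be scheduled first:
# pending.sort(key=lambda r: tracker.success_rate(r["user"]), reverse=True)
```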
FIG. 3 illustrates an example software deployment technique performed by a data processing device. As shown, production change requests are first received by a deployment manager (302), for example, from multiple users. Each production change request identifies a component module to be modified by a corresponding production change and configuration data specifying a workflow to be executed in implementing the production change and one or more component dependencies. The production change requests are then scheduled in accordance with scheduling requirements (304) specified by the corresponding configuration data. In some examples, the deployment manager determines whether component dependencies specified by the corresponding configuration data have been modified, for example, by another production change request, such that proceeding with the deployment of the current production change request may result in system instabilities. In some examples, the deployment manager determines whether a conflict exists between the one or more component dependencies specified by the configuration data associated with a queued production change request and component dependencies associated with a dispatched production change request, for example, to ensure component locks do not interfere with the deployed production change request.
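The conflict check between a queued and a dispatched production change request might, for example, compare their component dependencies; the request structure below is an assumption made for the sketch.

```python
def has_conflict(queued_request, dispatched_requests):
    """Return True if the queued request shares a component dependency with a dispatched request."""
    queued_deps = set(queued_request["dependencies"])
    for dispatched in dispatched_requests:
        if queued_deps & set(dispatched["dependencies"]):
            return True                     # e.g., a component lock could interfere
    return False

# Example:
# has_conflict({"dependencies": ["auth-lib"]},
#              [{"dependencies": ["auth-lib", "ui"]}])   # -> True
```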
When the deployment manager determines one or more production change requests can be dispatched without creating a conflict, the production change requests are assigned to different worker modules and dispatched (306). The worker modules are configured to deploy the corresponding production change, using different software configuration tools, in accordance with the specified workflow, which may include, for example, a canary push phase. Advantageously, the different software configuration tools may be preexisting tools previously used to implement production change requests to particular nodes or subsystems within the production system. In some implementations, the worker modules execute on different computers in the production system. As the worker modules progress through their respective workflows, phase state information associated with the current phase being performed by the corresponding worker module is submitted by the worker modules and received by the deployment manager, and, if necessary, the deployment manager reports the phase state information to a user via one or more communication channels (308).
FIG. 4 is a schematic diagram of an example system configured to coordinate and manage the dispatch of production changes across many different software components and services in a distributed system. The system generally consists of a server 402. The server 402 is optionally connected to one or more user or client computers 490 through a network 480. The server 402 consists of one or more data processing apparatus. While only one data processing apparatus is shown in FIG. 4, multiple data processing apparatus can be used. The server 402 includes various modules, e.g., executable software programs, including a deployment manager 404, one or more workers 405, a logging server 406, a natural language processing server 408, and a key-value data store service 410.
Each module runs as part of the operating system on the server 402, runs as an application on the server 402, or runs as part of the operating system and part of an application on the server 402, for instance. Although several software modules are illustrated, there may be fewer or more software modules. Moreover, the software modules can be distributed on one or more data processing apparatus connected by one or more networks or other suitable communication mediums.
The server 402 also includes hardware or firmware devices including one or more processors 412, one or more additional devices 414, a computer readable medium 416, a communication interface 418, and one or more user interface devices 420. Each processor 412 is capable of processing instructions for execution within the server 402. In some implementations, the processor 412 is a single or multi-threaded processor. Each processor 412 is capable of processing instructions stored on the computer readable medium 416 or on a storage device such as one of the additional devices 414. The server 402 uses its communication interface 418 to communicate with one or more computers 490, for example, over a network 480. Examples of user interface devices 420 include a display, a camera, a speaker, a microphone, a tactile feedback device, a keyboard, and a mouse. The server 402 can store instructions that implement operations associated with the modules described above, for example, on the computer readable medium 416 or one or more additional devices 414, for example, one or more of a floppy disk device, a hard disk device, an optical disk device, or a tape device.

Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a computer storage medium for execution by, or to control the operation of, data processing apparatus. Alternatively or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially-generated propagated signal. The computer storage medium can also be, or be included in, one or more separate physical components or media, e.g., multiple CDs, disks, or other storage devices.
The operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
The term "data processing apparatus" encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or multiple ones, or combinations, of the foregoing The apparatus can include special purpose logic circuitry, e.g., an FPGA, or field programmable gate array, or an ASIC, or application-specific integrated circuit. The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures. A computer program, also known as a program, software, software application, script, or code, can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA or an ASIC.
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few. Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending documents to a web browser on a user's client device in response to requests received from the web browser.
Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network ("LAN") and a wide area network ("WAN"), an internetwork, e.g., the Internet, and peer-to-peer networks, e.g., ad hoc peer-to-peer networks.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data, e.g., an HTML document, to a client device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device. Data generated at the client device, e.g., a result of the user interaction, can be received from the client device at the server.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

Claims

What is claimed is:
1. A method performed by data processing apparatus, the method comprising: receiving production change requests by a deployment manager, each production change request identifying a component module to be modified by a corresponding production change and configuration data specifying a workflow to be executed in implementing the production change and one or more component dependencies;
scheduling a first production change request and a second production change request in accordance with scheduling requirements specified by the corresponding configuration data; and assigning the first production change request and the second production change request to different worker modules, wherein the worker modules are configured to deploy the corresponding production change in accordance with the specified workflow using different software configuration tools.
2. The method of claim 1, wherein the worker modules are configured to execute on different computers.
3. The method of claim 1, wherein assigning the first production change request and the second production change request to different worker modules comprises:
assigning the first production change request to a first worker module; and
assigning the second production change request to a second worker module;
wherein the first and second worker modules operate in parallel to deploy the corresponding production change.
4. The method of claim 1, wherein scheduling the first production change request comprises determining whether component dependencies specified by the corresponding configuration data have been modified.
5. The method of claim 1, wherein scheduling the first production change request comprises determining whether a conflict exists between the one or more component dependencies specified by the configuration data associated with the first production change request and component dependencies associated with a dispatched production change request.
6. The method of claim 1, wherein the workflow specified by the configuration data associated with the first production change request comprises a canary push phase.
7. The method of claim 1, wherein the workflow specified by the configuration data associated with the first production change request comprises a series of independent phases to be performed by the corresponding worker module in deploying the corresponding production change.
8. The method of claim 7, further comprising receiving phase state information associated with a current phase being performed by the corresponding worker module, and reporting the phase state information to a user via one or more communication channels.
9. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform operations comprising:
receiving production change requests by a deployment manager, each production change request identifying a component module to be modified by a corresponding production change and configuration data specifying a workflow to be executed in implementing the production change and one or more component dependencies;
scheduling a first production change request and a second production change request in accordance with scheduling requirements specified by the corresponding configuration data; and assigning the first production change request and the second production change request to different worker modules, wherein the worker modules are configured to deploy the corresponding production change in accordance with the specified workflow using different software configuration tools.
10. The computer storage medium of claim 9, wherein the worker modules are configured to execute on different computers.
11. The computer storage medium of claim 9, wherein assigning the first production change request and the second production change request to different worker modules comprises:
assigning the first production change request to a first worker module; and
assigning the second production change request to a second worker module;
wherein the first and second worker modules operate in parallel to deploy the corresponding production change.
12. The computer storage medium of claim 9, wherein scheduling the first production change request comprises determining whether component dependencies specified by the corresponding configuration data have been modified.
13. The computer storage medium of claim 9, wherein scheduling the first production change request comprises determining whether a conflict exists between the one or more component dependencies specified by the configuration data associated with the first production change request and component dependencies associated with a dispatched production change request.
14. The computer storage medium of claim 9, wherein the workflow specified by the configuration data associated with the first production change request comprises a canary push phase.
15. The computer storage medium of claim 9, wherein the workflow specified by the configuration data associated with the first production change request comprises a series of independent phases to be performed by the corresponding worker module in deploying the corresponding production change.
16. The computer storage medium of claim 15, the operations further comprising receiving phase state information associated with a current phase being performed by the corresponding worker module, and reporting the phase state information to a user via one or more communication channels.
17. A system comprising:
one or more computers operable to perform operations comprising:
receiving production change requests by a deployment manager, each production change request identifying a component module to be modified by a corresponding production change and configuration data specifying a workflow to be executed in implementing the production change and one or more component dependencies;
scheduling a first production change request and a second production change request in accordance with scheduling requirements specified by the corresponding configuration data; and assigning the first production change request and the second production change request to different worker modules, wherein the worker modules are configured to deploy the corresponding production change in accordance with the specified workflow using different software configuration tools.
18. The system of claim 17, wherein the worker modules are configured to execute on different computers.
19. The system of claim 17, wherein assigning the first production change request and the second production change request to different worker modules comprises:
assigning the first production change request to a first worker module; and
assigning the second production change request to a second worker module; wherein the first and second worker modules operate in parallel to deploy the corresponding production change.
20. The system of claim 17, wherein scheduling the first production change request comprises determining whether component dependencies specified by the corresponding configuration data have been modified.
21. The system of claim 17, wherein scheduling the first production change request comprises determining whether a conflict exists between the one or more component dependencies specified by the configuration data associated with the first production change request and component dependencies associated with a dispatched production change request.
22. The system of claim 17, wherein the workflow specified by the configuration data associated with the first production change request comprises a canary push phase.
23. The system of claim 17, wherein the workflow specified by the configuration data associated with the first production change request comprises a series of independent phases to be performed by the corresponding worker module in deploying the corresponding production change.
24. The system of claim 23, the operations further comprising receiving phase state information associated with a current phase being performed by the corresponding worker module, and reporting the phase state information to a user via one or more communication channels.
PCT/UA2011/000073 2011-08-10 2011-08-10 Coordinating software deployment WO2013022411A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/UA2011/000073 WO2013022411A1 (en) 2011-08-10 2011-08-10 Coordinating software deployment

Publications (1)

Publication Number Publication Date
WO2013022411A1 true WO2013022411A1 (en) 2013-02-14

Family

ID=45023868

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/UA2011/000073 WO2013022411A1 (en) 2011-08-10 2011-08-10 Coordinating software deployment

Country Status (1)

Country Link
WO (1) WO2013022411A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003107147A2 (en) * 2002-06-17 2003-12-24 Marimba, Inc. Method and system for automatically updating multiple servers
US20070005769A1 (en) * 2005-06-30 2007-01-04 Microsoft Corporation Solution deployment in a server farm
EP2189900A1 (en) * 2008-11-25 2010-05-26 Fisher-Rosemount Systems, Inc. Software deployment manager integration within a process control system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427624A (en) * 2017-02-13 2018-08-21 阿里巴巴集团控股有限公司 A kind of recognition methods of system stability risk and equipment
CN108427624B (en) * 2017-02-13 2021-03-02 创新先进技术有限公司 System stability risk identification method and device
US20220330013A1 (en) * 2021-04-13 2022-10-13 Bank Of Montreal Managing configurations of mobile devices across mobility configuration environments

Similar Documents

Publication Publication Date Title
US20230119331A1 (en) Techniques for utilizing directed acyclic graphs for deployment instructions
US8332443B2 (en) Masterless distributed batch scheduling engine
US8370802B2 (en) Specifying an order for changing an operational state of software application components
RU2429529C2 (en) Dynamic configuration, allocation and deployment of computer systems
US9485151B2 (en) Centralized system management on endpoints of a distributed data processing system
US9940598B2 (en) Apparatus and method for controlling execution workflows
US11055180B2 (en) Backup management of software environments in a distributed network environment
US8959518B2 (en) Window-based scheduling using a key-value data store
JP2011123881A (en) Performing workflow having a set of dependency-related predefined activities on a plurality of task servers
US9009725B2 (en) System of growth and automated migration
CN113569987A (en) Model training method and device
US10754705B2 (en) Managing metadata hierarch for a distributed processing system with depth-limited hierarchy subscription
US20180082228A1 (en) Digital project management office
CN113778486A (en) Containerization processing method, device, medium and equipment for code pipeline
US9595014B1 (en) System and method for executing workflow instance and modifying same during execution
US7979870B1 (en) Method and system for locating objects in a distributed computing environment
Tang et al. Application centric lifecycle framework in cloud
US20200310828A1 (en) Method, function manager and arrangement for handling function calls
US20230289234A1 (en) Computing environment pooling
US9934268B2 (en) Providing consistent tenant experiences for multi-tenant databases
WO2013022411A1 (en) Coordinating software deployment
US10713085B2 (en) Asynchronous sequential processing execution
CN113791876A (en) System, method and apparatus for processing tasks
US11907364B2 (en) Managing incompliances identified at instances of cloud applications
US20230176839A1 (en) Automatic management of applications in a containerized environment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11787754

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11787754

Country of ref document: EP

Kind code of ref document: A1