CN112764789A

CN112764789A - Distributed software upgrading method and node

Info

Publication number: CN112764789A
Application number: CN201911077613.4A
Authority: CN
Inventors: 赵�怡; 陆尧; 徐京京
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Suzhou Software Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Suzhou Software Technology Co Ltd
Priority date: 2019-11-06
Filing date: 2019-11-06
Publication date: 2021-05-07

Abstract

The invention discloses a distributed software upgrading method and a node. The method comprises: updating, at a first node, the version of a first software package based on a patch and/or an installation package to obtain an updated first software package, wherein the first software package is a first part of the distributed software; if running the updated first software package on the first node depends on a second node, detecting, by the first node, the second node, wherein the second node is a node that runs or has installed a second part of the distributed software, and the first part of the software is the same as, or at least partially different from, the second part of the software; and if the detection result indicates that the second node is in a preset running state, determining that the distributed software has been upgraded successfully when the first node can run the updated first software package.

Description

Distributed software upgrading method and node

Technical Field

The invention relates to the field of edge computing and distributed cloud, and in particular to a distributed software upgrading method and node.

Background

Edge computing can be viewed as a form of distributed cloud computing. Unlike conventional cloud computing, which is deployed in centralized data center machine rooms, edge computing is characterized by a very large number of nodes that are widely dispersed geographically. Because of these characteristics, upgrading and updating become very difficult once software needs to be updated or existing defects need to be fixed.

Disclosure of Invention

In view of the above, the present invention mainly aims to provide a distributed software upgrading method and node.

In order to achieve the purpose, the technical scheme of the invention is realized as follows:

A distributed software upgrading method, the method comprising: updating, at a first node, the version of a first software package based on a patch and/or an installation package to obtain an updated first software package, wherein the first software package is a first part of the distributed software;

if running the updated first software package on the first node depends on a second node, detecting, by the first node, the second node, wherein the second node is a node that runs or has installed a second part of the distributed software, and the first part of the software is the same as, or at least partially different from, the second part of the software;

and if the detection result indicates that the second node is in a preset running state, determining that the distributed software has been upgraded successfully when the first node can run the updated first software package.

In the above scheme, the method further comprises: receiving, by the first node, the patch and/or the installation package distributed by the main node, the patch and/or the installation package being used at least for updating the first software package.

In the above scheme, the method further comprises: if running the updated first software package on the first node does not depend on the second node, determining that the distributed software has been upgraded successfully when the first node can run the updated first software package.

In the above scheme, the detecting of the second node by the first node comprises: detecting, by the first node at a preset detection time interval, the running state of the second part of the distributed software installed on the second node.

In the above scheme, the method further comprises: if the first node has detected the second node M times, the results of all M detections indicate that the second node is not in the preset running state, and the total duration of the M detections equals the preset detection duration, sending, by the first node, a distributed software upgrade failure message to the main node, wherein M is a non-zero natural number.

In the above scheme, the method further comprises: if the first node has detected the second node N consecutive times and the results of all N detections indicate that the second node is not in the preset running state, sending, by the first node, a distributed software upgrade failure message to the main node, wherein N is a preset threshold on the number of detections and is a non-zero natural number.
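The two failure criteria above (a preset detection duration and a preset number of consecutive detections) can be illustrated with a minimal Python sketch. The function name wait_for_dependency, its parameters, and the returned strings are hypothetical and do not appear in the original disclosure; elapsed time is tracked by summing the detection intervals instead of reading a clock, which is a simplification.

```python
import time
from typing import Callable


def wait_for_dependency(
    probe: Callable[[], bool],
    interval_s: float,
    timeout_s: float,
    max_failures: int,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `probe` at a fixed interval (hypothetical helper).

    Returns "ready" when the probe succeeds, "max_failures" after
    `max_failures` consecutive failed detections, or "timeout" when the
    next wait would exceed the preset detection duration.
    """
    elapsed = 0.0   # sum of the intervals waited so far
    failures = 0    # consecutive failed detections
    while True:
        if probe():
            return "ready"
        failures += 1
        if failures >= max_failures:
            return "max_failures"
        if elapsed + interval_s > timeout_s:
            return "timeout"
        sleep(interval_s)
        elapsed += interval_s
```

In either failure case the first node would send the distributed software upgrade failure message to the main node; that call is left out here because the message format is not given in the text.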

In the above scheme, the method further comprises one of the following:

when the distributed software upgrade of the first node succeeds, if the first node receives a rollback command sent by the main node, the first node downgrades the updated first software package to the original version;

when the update of the first software package on the first node fails, if the first node receives a rollback command sent by the main node, the first node stops updating the first software package;

when the first software package on the first node is updated successfully but the updated first software package cannot be run, if the first node receives a rollback command sent by the main node, the first node downgrades the updated first software package to the original version.
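The three rollback cases above amount to a mapping from the state of the first node to the action it takes on receiving a rollback command. A minimal Python sketch follows; the enum members and action strings are hypothetical names chosen for illustration only.

```python
from enum import Enum


class NodeState(Enum):
    """State of the first node when the rollback command arrives (hypothetical names)."""
    UPGRADE_SUCCEEDED = "upgrade_succeeded"
    UPDATE_FAILED = "update_failed"
    UPDATED_BUT_NOT_RUNNABLE = "updated_but_not_runnable"


def rollback_action(state: NodeState) -> str:
    """Return the action the first node takes for a rollback command."""
    if state is NodeState.UPDATE_FAILED:
        # The package was never updated, so the node only stops updating.
        return "stop_update"
    # In both remaining cases the package was updated and must be downgraded.
    return "downgrade_to_original"
```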

A first node for distributed software upgrading, the first node comprising:

a version updating module, configured to update the version of a first software package based on a patch and/or an installation package to obtain an updated first software package, wherein the first software package is a first part of the distributed software;

a state detection module, configured to detect a second node if running the updated first software package depends on the second node, wherein the second node is a node that runs or has installed a second part of the distributed software, and the first part of the software is the same as, or at least partially different from, the second part of the software;

and an upgrade success determining module, configured to determine, if the detection result indicates that the second node is in a preset running state, that the distributed software has been upgraded successfully when the updated first software package can be run.

In the above scheme, the first node further comprises a receiving module;

the receiving module is configured to receive the patch and/or the installation package distributed by the main node, the patch and/or the installation package being used at least for updating the first software package.

In the above scheme, the upgrade success determining module is further configured to determine, if running the updated first software package does not depend on the second node, that the distributed software has been upgraded successfully when the updated first software package can be run.

In the above scheme, the state detection module is further configured to detect, at a preset detection time interval, the running state of the second part of the distributed software installed on the second node.

In the above scheme, the state detection module is further configured to send a distributed software upgrade failure message to the main node if the second node has been detected M times, the results of all M detections indicate that the second node is not in the preset running state, and the total duration of the M detections equals the preset detection duration, wherein M is a non-zero natural number.

In the above scheme, the state detection module is further configured to send a distributed software upgrade failure message to the main node if the second node has been detected N consecutive times and the results of all N detections indicate that the second node is not in the preset running state, wherein N is a preset threshold on the number of detections and is a non-zero natural number.

In the above scheme, the version updating module is further configured to perform at least one of the following:

when the distributed software upgrade succeeds, if a rollback command sent by the main node is received, downgrading the updated first software package to the original version;

when the update of the first software package fails, if a rollback command sent by the main node is received, stopping the update of the first software package;

and when the first software package on the first node is updated successfully but the updated first software package cannot be run, if a rollback command sent by the main node is received, downgrading the updated first software package to the original version.

In the distributed software upgrading method and node provided by the embodiments of the present invention, when running the updated first software package on the first node depends on a second node, the first node detects whether the second node is in a preset running state; if it is, and the first node can run the updated first software package, the distributed software upgrade is determined to be successful. Because the first node actively detects whether the node it depends on has reached the preset running state, there is no need to set a uniform fixed waiting period for that node to reach the state before other operations are executed. This saves waiting time, allows the updated first software package on the first node to be brought into operation more efficiently, and improves the efficiency of the distributed software upgrade.

Drawings

Fig. 1 is a first schematic flow chart illustrating an implementation process of a distributed software upgrading method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of an implementation flow of a distributed software upgrading method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a distributed software upgrade environment architecture according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a first node according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a first node according to an embodiment of the present invention.

Detailed Description

The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

The related art mainly provides the following three methods for distributed software upgrading:

Document 1. Method for upgrading software

The invention provides a software upgrading method. The upgrading platform that is built is independent of the software and can be applied independently to different products or projects. In use, a platform client automatically searches for a patch server and acquires a patch package from it; the platform automatically verifies the validity of the patch, supports resumable (breakpoint) transmission, and carries out the patch upgrading operation after downloading and verification are finished. Within the platform, patches are downloaded automatically or manually according to defined tasks, and upgrades are likewise performed automatically or manually according to defined tasks. The stated advantages are that a description language is used to describe and parse the patch package and the patch undergoes general-purpose compression, so that transmission is fast, safe and reliable, network data congestion and similar situations can be avoided, and problems occurring during patch downloading and installation can be found in time; problems such as software updating, version management of different clients, and automatic downloading and updating are solved, and various software upgrading requirements on different platforms can be well met.

Document 2. Distributed software patch updating method and system

The invention discloses a distributed software patch updating method and system. The method comprises the following steps: the patch management server sends a first program patch file to the first target application server, remotely controls the first target application server to execute the first program patch file, and carries out patch updating on the first program module; the patch management server also sends a data patch file to a database server, and a database system of the distributed software is installed on the database server; and the database server carries out patch updating on the database system according to the data patch file. In the patch updating process of the distributed software, the patch management server actively pushes the program patch files and the data patch files of the distributed software to the target application server and the database server, so that the program module on the target application server can be updated and the database system on the database server can be updated uniformly, and the upgrading and repairing of the distributed software are realized.

Document 3. Software upgrading method for distributed storage system based on RPM package

The embodiment of the invention discloses a software upgrading method for a distributed storage system based on RPM packages, which comprises the following steps: acquiring state information of the distributed storage system; judging whether the state information is normal; if so, stopping the key services of the distributed storage system, otherwise stopping the upgrade; backing up files of the distributed storage system to generate backup files; packing the software upgrade package and the upgrade script and sending them to the other nodes; and running the software upgrade package on each node to perform the upgrade. In this embodiment, the state information of the whole distributed storage system is judged by running the upgrade script on the main node. When the state information shows that the system state is normal, the key services in the system are closed so that the related services stop, and the files of the system are backed up. The main node then sends the upgrade script and the upgrade package to the other nodes, where they are run simultaneously, so that all nodes are upgraded at the same time and the upgrading efficiency is improved.

However, documents 1 and 2 do not consider the case where remote distributed cluster services depend on one another; such services cannot be upgraded simply, and the dependencies between cluster services must be taken into account. Document 3 proposes upgrading the main node first and then the other nodes, which handles some master/backup service upgrades to a certain extent, but it is not comprehensive: many distributed services require a long restart wait and the judgment of special conditions before cluster synchronization can be performed.

In view of the above problem, an embodiment of the present invention provides a distributed software upgrading method. Fig. 1 is a schematic diagram of an implementation flow of the method, which specifically comprises:

S101, updating, at the first node, the version of the first software package based on the patch and/or the installation package to obtain an updated first software package, wherein the first software package is a first part of the distributed software.

Before the distributed software upgrade, the main node distributes a full upgrade package input by a user to the first node. The full upgrade package comprises at least the patches and/or installation packages and a description file written in YAML or XML. After the first node receives the full upgrade package, it updates the version of the first software package using the patch and/or installation package that corresponds to the software already present on the first node. When the first node needs to be patched, a trial run (dry run) is first performed on the node, and the patch is formally applied only after the trial run succeeds.

After the first node finishes installing its patch and/or installation package, the version of the first software package on the node is updated; for example, a database whose previous version was v1.0 is updated to v2.0. The first software package is a first part of the distributed software.

S102, if running the updated first software package on the first node depends on a second node, the first node detects the second node. The second node is a node that runs or has installed a second part of the distributed software; the first part of the software is the same as, or at least partially different from, the second part of the software.

Distributed cloud computing involves numerous nodes, and some of them depend on one another, for example the nodes within a cluster, or several nodes that do not form a cluster. In the non-cluster case, the software installed on each node may implement only part of a function, while the partial functions of the several nodes combine to implement a complete function.

In such situations, if running the updated first software package on the first node depends on the second node, the first node detects the running state of the second node using the pre-detection command preset in the blocking wait condition of the description file in the full upgrade package it received.

The second part of software on the second node is the same as, or at least partially different from, the first part of software on the first node.

S103, if the detection result indicates that the second node is in a preset running state, determining that the distributed software has been upgraded successfully when the first node can run the updated first software package.

When the running state of the second software package installed or run on the second node matches the detection result preset in the blocking wait condition of the first node, the second node is in the preset running state. The first node then executes the association operations for the installed first software package; an association operation may be restarting a service, opening a firewall, or executing a special operation. After the first node executes the association operations, the software modules and configuration files on the first node have been modified and the updated first software package can run. At this point, the distributed software upgrade of the first node has completed successfully.
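Steps S101 to S103 can be summarized in a short Python sketch. This is only an illustration under stated assumptions: the function upgrade_first_node and its callback parameters are hypothetical, and the actual update, detection, and run checks are passed in as callables because the text does not fix how they are implemented.

```python
from typing import Callable, Sequence


def upgrade_first_node(
    apply_update: Callable[[], None],
    dependencies: Sequence[str],
    is_ready: Callable[[str], bool],
    can_run: Callable[[], bool],
) -> bool:
    """Return True when the upgrade of the first node is deemed successful."""
    # S101: update the first software package from the patch / installation package.
    apply_update()
    # S102: only when the updated package depends on other nodes are they detected.
    if dependencies and not all(is_ready(host) for host in dependencies):
        return False
    # S103: success requires that the updated package can actually run.
    return can_run()
```

When the dependency list is empty, the result depends only on whether the updated package can run, which corresponds to the case in which the first node does not depend on a second node.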

In the distributed software upgrading method provided by the embodiment of the present invention, when running the updated first software package on the first node depends on a second node, the first node detects whether the second node is in a preset running state; if it is, and the first node can run the updated first software package, the distributed software upgrade is determined to be successful. Because the first node actively detects whether the node it depends on has reached the preset running state, there is no need to set a uniform fixed waiting period for that node to reach the state before other operations are executed. This saves waiting time, allows the updated first software package on the first node to be brought into operation more efficiently, and improves the efficiency of the distributed software upgrade.

An embodiment of the present invention provides a distributed software upgrading method, as shown in Fig. 2, which is a schematic diagram of an implementation flow of the method.

The method further comprises: receiving, by the first node, the patch and/or the installation package distributed by the main node, the patch and/or the installation package being used at least for updating the first software package. Specifically:

Operation and maintenance upgrade personnel input a full upgrade package into the automatic upgrade software main program on the local main node, together with the user name of the person executing the upgrade. The full upgrade package may be in a compressed format such as zip. It contains all the sub upgrade packages of this upgrade and a description file with a fixed name; the description file may be in YAML or XML format and describes the sub upgrade packages and the upgrade method adopted for each of them.

The description file specifically contains the following information: the version (version) of the full upgrade package; the number of files (file_count), counting all patches and software upgrade packages in the compressed package other than the description file; and the upgrade information of the sub software upgrade packages (the patches list), which may contain multiple entries.

Each sub-software upgrade package information includes:

file name of the software package (file_name);

file type of the software package (file_type), of which there are two: patch file and installation package. For a patch file, the entry should contain the commit ID (commit_id) in the code base from which the patch file was generated and a commit history list (commit_history); commit_history records previously applied patch versions, with a description of each fix and the related code modifications. Common code bases are svn and git, and the ID is the commit ID in that code base. For an installation package, the entry should contain the version of the software package (version);

md5 check value of the patch file or software package (md5);

classification of the software package (package_type), including kernel, driver module, common software, configuration file, and so on; the classification indicates which of these categories the sub software package upgrades;

upgrade method (upgrade_upgrade), of which there are two: applying a patch (patch) or updating an installation package (update);

upgrade node address (upgrade_host), identifying the node on which the sub software package is to be upgraded;

association operations (dependency_ops): user-defined operations, which may include restarting a given service, restarting associated services, opening a firewall, executing special operations, and so on; after the association operations are executed, the software package updated on the node can run;

blocking wait condition (wait_for): some operations require checking, before they are executed, whether other nodes satisfy the relevant preconditions. For example, when a database cluster is upgraded and restarted, a slave node (slave) must wait for the master node (master) to restart successfully before synchronizing with it; the operation in this case is an association operation. The blocking wait condition parameters comprise: whether to enable blocking wait (wait_for), the list of target nodes to detect (wait_for_hosts), the pre-detection command of the blocking wait (pre_check), the result comparison method of the blocking wait (check_op), the expected result content of the blocking wait (check_value), the interval between blocking detections (wait_for_interval), and the blocking detection duration (wait_for_timeout);

the full upgrade package comprises an update directory, which contains all the sub software package patch files or sub software upgrade installation packages.

For details, see upgrade.yaml code example 1 in the full upgrade package:

In upgrade.yaml code example 1, each bracketed entry under the patches field is a sub software upgrade package, which may be a patch (patch) or an installation package (update); each node determines from the upgrade_host field of an entry whether that sub software package is intended for itself.
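The following Python sketch is not the upgrade.yaml listing referred to above. It is a hypothetical description, expressed as a Python dictionary with invented values, that uses a subset of the field names listed earlier to show how a node might select its own entries by upgrade_host. The helper packages_for_host, the host addresses, the file names, and the assumption that upgrade_host holds a single address are all illustrative.

```python
from typing import Dict, List

# Hypothetical description content; values are invented for illustration.
DESCRIPTION: Dict = {
    "version": "2.0",
    "file_count": 2,
    "patches": [
        {
            "file_name": "db-fix.patch",
            "file_type": "patch",
            "commit_id": "abc123",
            "package_type": "common software",
            "upgrade_host": "10.0.0.1",
        },
        {
            "file_name": "db-server.rpm",
            "file_type": "update",
            "version": "2.0",
            "package_type": "common software",
            "upgrade_host": "10.0.0.2",
        },
    ],
}


def packages_for_host(description: Dict, host: str) -> List[str]:
    """Return the file names of the entries whose upgrade_host matches `host`."""
    return [
        entry["file_name"]
        for entry in description["patches"]
        if entry["upgrade_host"] == host
    ]
```

For instance, packages_for_host(DESCRIPTION, "10.0.0.2") selects only the installation package entry, and a node whose address is not listed selects nothing.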

Fig. 3 is a schematic diagram of the distributed software upgrade environment architecture. After operation and maintenance upgrade personnel input a full upgrade package into the automatic upgrade software main program, the main program copies the full upgrade package over the Internet to the agent upgrade software on each edge computing node. Each edge computing node is provided with: a client of the upgrade software, namely the agent upgrade software program; a code version information database, which is used to maintain information such as the upgrade state; and a software backup, which is used to back up old versions of the sub software packages.

In Fig. 3, the upgrade software server, i.e., the automatic upgrade software main program, communicates with each agent upgrade software program through an API built on the reliable transport protocol TCP, and the API connection is encrypted with SSL.

The first node can be regarded as any edge node. The agent upgrade software program on the first node receives the full upgrade package sent by the automatic upgrade software main program on the main node, decompresses it, and determines from the upgrade_host field which patches and/or installation packages apply to the first node, so as to update the old first software package on the first node.

Taking the first node as an example, the specific steps may be as shown in Fig. 2:

S201, running the agent upgrade software.

S202, the first node reads the file count field (file_count) in the description file and checks whether the total number of entries in the software list under the patches field equals the value of the file_count field (see upgrade.yaml code example 1 in the full upgrade package). If not, the agent upgrade software reports an error to the automatic upgrade software main program on the main node and the software upgrade process ends; if so, step S203 is executed.

S203, reading the patches list and checking whether all the file names (file_name) exist in the update directory. If not, the agent upgrade software reports an error to the automatic upgrade software main program on the main node and the software upgrade process ends; if so, step S204 is executed.

S204, reading the md5 field (see upgrade.yaml code example 1), calculating the check value of each file, and checking whether it is correct. If not, the agent upgrade software reports an error to the automatic upgrade software main program on the main node and the software upgrade process ends; if it is correct, step S205 is executed.

S205, if the sub software package type (file_type) is patch, reading the local code version database to check the version information; if the check fails, the agent upgrade software reports an error to the automatic upgrade software main program on the main node and the software upgrade process ends; otherwise, S2051 is executed. If the sub software package type (file_type) is installation package (taking an RPM software package as an example), S2052 is executed.

S2051, simulating the real environment on the first node and performing a trial-run patching operation (dry run); if the operation succeeds, S206 is entered.

S2052, checking whether an installation package with the same name is already installed locally. If the installed version is higher than the version to be upgraded to, the agent upgrade software reports an error and the software upgrade process ends; if the version is acceptable, S206 is entered.

S206, when the sub software package is a patch, the formal patching operation is performed. After the patch is applied, the upgrade software automatically records the commit ID (commit_id), the user name of the person executing the upgrade, the operation time, and the related information of the patch in the code version information database, and backs up the patch to a specific directory on the first node. When the sub software package is an installation package, the upgrade software automatically runs it, records the version (version) of the updated sub software package, the user name of the person executing the upgrade, the operation time, and other related information in the code version database, and backs up the old version of the sub software package to the specific directory on the first node. In this way the upgrade history can be inspected effectively, and problems and responsibility can be traced when something goes wrong.

S207, after the patches and/or installation packages in the sub software packages have been installed, executing the association operations in the order given in the description file, namely:

when there are at least two association operations, they have a corresponding execution order, and the first node executes each association operation in turn according to that order.

S208, executing the association operations according to the blocking wait condition in the description file of the full upgrade package received by the first node; when the distributed software upgrade of the first node succeeds or fails, updating the code version database, and on success recording that the first node was upgraded successfully together with the version information of the corresponding software package. When all edge nodes in the distributed software upgrade, like the first node, have been upgraded successfully, the whole distributed software upgrade is successful.
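The pre-installation checks in steps S202 to S204 and the version check in S2052 can be sketched in Python as follows. The function names, the error strings, and the dictionary layout of the description are assumptions made for illustration. The version comparison splits purely numeric dotted versions into integers, which is a deliberate simplification and not the comparison algorithm used by RPM itself.

```python
import hashlib
from pathlib import Path
from typing import Optional, Tuple


def validate_package(description: dict, update_dir: Path) -> Optional[str]:
    """Return None if S202 to S204 pass, otherwise a short error message."""
    entries = description["patches"]
    # S202: the number of entries must equal file_count.
    if len(entries) != description["file_count"]:
        return "file_count mismatch"
    # S203: every listed file must exist in the update directory.
    for entry in entries:
        if not (update_dir / entry["file_name"]).is_file():
            return "missing file: " + entry["file_name"]
    # S204: the md5 of each file must match the value in the description.
    for entry in entries:
        data = (update_dir / entry["file_name"]).read_bytes()
        if hashlib.md5(data).hexdigest() != entry["md5"].lower():
            return "md5 mismatch: " + entry["file_name"]
    return None


def _as_tuple(version: str) -> Tuple[int, ...]:
    # Simplification: assumes versions such as "1.10.2" with numeric parts only.
    return tuple(int(part) for part in version.split("."))


def may_install(installed: Optional[str], candidate: str) -> bool:
    """S2052: refuse only when the installed version is higher than the candidate."""
    if installed is None:  # no package with the same name is installed
        return True
    return _as_tuple(installed) <= _as_tuple(candidate)
```

A non-None result from validate_package, or a False result from may_install, corresponds to the points at which the agent upgrade software reports an error and the upgrade process ends.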

When the first node fails to upgrade, it notifies the automatic upgrade software program on the main node of the failure through its agent upgrade software program. The main node then sends a rollback message to all edge nodes. For edge nodes that are still upgrading, the upgrade is stopped; operation and maintenance upgrade personnel obtain the old patches and/or installation packages and run them on each such node, so that the patches and/or installation packages on the node are rolled back to their original versions. For nodes that have already been upgraded successfully, the personnel likewise obtain the old patches and/or installation packages and run them on those nodes to downgrade the patches and/or installation packages.

After the rollback operation is executed, the version information of the corresponding patches and/or installation packages in the code version database still needs to be updated: if the whole distributed software upgrade succeeded, a success record is made, and if it failed, a corresponding failure record is made. Information such as the person executing the operation and the operation time is recorded as well, so that the upgrade history can conveniently be reviewed later or problems and responsibility investigated. The upgrade process then ends.

With respect to step S207 above, the process of performing the association operation is described in detail in steps S102 and S103 of the workflow diagram shown in fig. 1.

For the above case, see the description file upgrade.

When the blocking wait condition (wait_for) under the association operation (dependency_ops) in the description file received by the first node is True, it must be determined whether the running state of the second node meets the precondition for the first node to execute the association operation. In this case the second node is a node listed in the target check node list (wait_for_hosts), and when there are two or more second nodes, the running state of every one of them must meet the precondition.

Specifically, one way of determining whether the running state of the second node meets the precondition for the first node to execute the association operation is described in the first example of the upgrade.yaml code: the second node may first be pinged, and a pre-detection command (pre_check), for example systemctl status http, is then executed against the second node to detect its current running state.

As an example, the result comparison method (check_op) of the blocking wait preset by the first node in the yaml code is include, and the result check content (check_value) of the blocking wait is running. This can be understood as follows: when the returned running state of the second node contains the string running, the running state of the patch and/or installation package on the second node is considered to satisfy the precondition for the first node to execute the association operation, that is, it satisfies the running state preset in the blocking wait condition.

Similarly, the result comparison method (check_op) of the preset blocking wait may be set to "exception", and the result check content (check_value) of the blocking wait may be set to "error". This can be understood as follows: when the returned running state of the second node does not contain the string error, the running state of the patch and/or installation package on the second node is considered to satisfy the precondition for the first node to execute the association operation, that is, it satisfies the running state preset in the blocking wait condition.
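The two comparison methods can be illustrated with a short sketch. It assumes that check_op takes the literal values `include` and `exception` as named in the text, and that the pre-detection command's output is available as a plain string; the function name `state_satisfied` is hypothetical.

```python
def state_satisfied(output: str, check_op: str, check_value: str) -> bool:
    """Compare pre_check output against check_value using check_op.

    'include'   -> satisfied when check_value appears in the output.
    'exception' -> satisfied when check_value does NOT appear in the output.
    Both literals follow the wording of the text; the real description file
    may spell them differently.
    """
    if check_op == "include":
        return check_value in output
    if check_op == "exception":
        return check_value not in output
    raise ValueError(f"unsupported check_op: {check_op}")
```

For instance, an output line such as `Active: active (running)` satisfies `include` / `running` and also satisfies `exception` / `error`.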

If the first node performs M detections on the second node, the results of all M detections indicate that the second node is not in the preset operation state, and the total duration of the M detections is equal to the preset detection duration, the first node sends a message of failure in distributed software upgrading to the main node; wherein M is a non-0 natural number.

It can be understood that the running state of the second node may not meet the precondition for the first node to execute the association operation at the beginning, in which case the first node needs to wait for the second node to meet the precondition and therefore detects the running state of the second node multiple times. When the M-th detection has been performed and the cumulative detection time equals the preset detection duration, the first node sends a message that the distributed software upgrade has failed to the master node.

Specifically, the first node performs M detections on the second node, and the time interval of each detection is equal, that is:

and the first node detects the running state of a second part of software in the distributed software installed in the second node according to a preset detection time interval.

As an example, the blocking detection time interval (wait_for_interval) set in the above upgrade.yaml file means that the first node periodically detects the running state of the patch and/or installation package on the second node at this interval, for example every 10 seconds or 10 milliseconds. When the first detection does not satisfy the precondition for the first node to perform the association operation, the pre-detection command pre_check is executed against the second node again after 10 seconds or 10 milliseconds. As long as the total detection time has not reached the blocking detection duration (wait_for_timeout), which is 100 seconds or 100 milliseconds in the above code example, detection of the second node continues.

When the first node detects the second node for the 10th time, if the second node is still not in the preset running state and the total elapsed time is 100 seconds or 100 milliseconds, that is, equal to the preset total duration, the first node sends a message that the distributed software upgrade has failed to the main node.
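The interval-and-timeout behaviour can be sketched as a polling loop. This is an illustration under stated assumptions: `probe` is a hypothetical callable that runs the pre-detection command and returns whether the preset running state was found, each detection is counted as consuming one interval, and `sleep` is injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_for_state(probe: Callable[[], bool],
                   interval: float,
                   timeout: float,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until probe() succeeds or the accumulated time reaches timeout.

    Returns True as soon as the second node is in the preset state, and
    False once the time budget (wait_for_timeout) is used up, at which point
    the caller would report the upgrade failure to the main node.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    elapsed = 0.0
    while True:
        if probe():
            return True
        elapsed += interval          # assumption: one detection costs one interval
        if elapsed >= timeout:
            return False
        sleep(interval)
```

With an interval of 10 and a timeout of 100, a probe that never succeeds is called ten times before the function returns False, which matches the 10th-detection example in the text.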

It should be noted that when there are two or more second nodes and all of them need to be detected in the same way, the detection result is determined to be in the preset operation state only when all of the second nodes are in the preset operation state; if any one of the second nodes is not in the preset operation state during a detection, the detection result for that detection is determined to be not in the preset operation state.
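The all-nodes rule for a single detection round can be written compactly. The names `round_ready` and `probe_host` are hypothetical; `probe_host` stands in for whatever runs the pre-detection command against one host.

```python
from typing import Callable, Iterable


def round_ready(hosts: Iterable[str],
                probe_host: Callable[[str], bool]) -> bool:
    """One detection round passes only if every listed second node passes.

    all() stops at the first host that is not in the preset state, so the
    remaining hosts are not probed in that round.
    """
    return all(probe_host(host) for host in hosts)
```

A round over two hosts where only one is ready therefore counts as a failed detection, and the next round would probe again after the configured interval.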

If the running of the first software package updated by the first node does not depend on the second node, determining that the distributed software is successfully upgraded when the first node can run the updated first software package.

When no blocking wait condition (wait_for) exists under the association operation (dependency_ops) in the description file received by the first node, the first node does not need to judge whether the second node is in the preset running state when executing the association operation. The first node executes the association operation directly; once the association operation has been executed successfully, the updated first software package on the first node can run, and it can then also be determined that the distributed software on the first node has been upgraded successfully.
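The decision between blocking and running directly can be sketched as below. The flat dictionary layout, with `wait_for` and `wait_for_hosts` as sibling keys of one dependency_ops entry, is an assumption made for illustration; the actual structure of the description file is not reproduced here.

```python
from typing import List


def hosts_to_wait_for(dependency_op: dict) -> List[str]:
    """Return the second nodes to check before running an associated operation.

    An empty list means there is no blocking wait condition (wait_for is
    absent or not True), so the operation can be executed directly.
    """
    if not dependency_op.get("wait_for"):
        return []
    return list(dependency_op.get("wait_for_hosts", []))
```

An entry without `wait_for` yields an empty list, which corresponds to the case where the first node executes the association operation without checking any second node.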

If the first node continuously detects the second node for N times and the second node is not in the preset running state, the first node sends a distributed software upgrading failure message to a main node; wherein N is a non-0 natural number.

It can be understood that here the blocking detection duration (wait_for_timeout) in the blocking wait condition is replaced by a detection count threshold (wait_for_check_counts): the first node detects the running state of the second node according to the pre-detection command preset in the description file, and if the second node is still not in the preset running state after N consecutive detections, the first node sends a distributed software upgrade failure message to the master node.
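The count-based variant differs from the duration-based one only in its stopping rule. A minimal sketch, assuming a hypothetical `probe` callable and an injectable `sleep`, with `max_checks` playing the role of wait_for_check_counts:

```python
import time
from typing import Callable


def wait_for_state_by_count(probe: Callable[[], bool],
                            max_checks: int,
                            interval: float,
                            sleep: Callable[[float], None] = time.sleep) -> bool:
    """Probe up to max_checks times; give up after N consecutive failures.

    Returns False when all N detections fail, at which point the caller
    would send the upgrade failure message to the master node.
    """
    if max_checks < 1:
        raise ValueError("max_checks must be a non-zero natural number")
    for attempt in range(1, max_checks + 1):
        if probe():
            return True
        if attempt < max_checks:
            sleep(interval)          # no wait is needed after the final attempt
    return False
```

Counting attempts rather than elapsed time makes the failure point independent of how long each pre-detection command takes to return.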

The distributed software upgrading method further comprises one of the following steps:

Because a distributed software upgrade involves numerous nodes, when any node fails to upgrade, or an abnormal condition such as a power failure occurs during the upgrade, the automatic upgrade software of the main node receives the distributed software upgrade failure message and sends a rollback command to notify each edge node; the operation and maintenance upgrade personnel at the first node can then perform the rollback operation. In this way the whole upgrade process can be ended promptly, other nodes do not waste time continuing to upgrade because they are unaware of the failure, and resources are saved. The specific operations are as follows:

If the distributed software upgrade of the first node has succeeded, the first node, on receiving the rollback command, downgrades the updated first software package to the original version, for example reducing the database version from v2.0 to v1.0.

When the update of the first software package of the first node has failed, that is, the database is still at version v1.0, and the first node receives the rollback command sent by the main node, the first node no longer installs the new version of the database, so that the database remains at v1.0.

When the first software package of the first node has been successfully updated from v1.0 to v2.0 but the first node cannot yet run the updated database, and the first node receives the rollback command sent by the master node, the first node stops the operations that would enable the first software package to run (for example, it stops associated operations such as installing OpenStack), downgrades the updated first software package to the original version v1.0, and runs the pre-update first software package on the first node again.
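The three rollback cases can be summarised as a small dispatch on the node's state at the moment the rollback command arrives. The state names and action strings below are hypothetical labels chosen for this sketch; they are not identifiers from the described system.

```python
from enum import Enum
from typing import List


class UpgradeState(Enum):
    UPDATE_FAILED = "update_failed"                # package still at original version
    UPDATED_NOT_RUNNABLE = "updated_not_runnable"  # new version installed, not yet runnable
    UPGRADED = "upgraded"                          # new version installed and running


def rollback_actions(state: UpgradeState) -> List[str]:
    """Map the node's state on receiving a rollback command to its actions."""
    if state is UpgradeState.UPDATE_FAILED:
        # Nothing to downgrade; just stop trying to install the new version.
        return ["cancel_pending_install"]
    if state is UpgradeState.UPDATED_NOT_RUNNABLE:
        return ["stop_associated_operations",
                "downgrade_to_original",
                "run_original_package"]
    return ["downgrade_to_original"]
```

Keeping the mapping in one place makes it easy to see that only a node whose package version actually changed performs a downgrade.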

An embodiment of the present invention provides a first node for distributed software upgrading. Fig. 4 is a schematic structural diagram of the first node, specifically:

a first node 10, the first node comprising:

the version updating module 11 is configured to update the version of the first software package based on the patch and/or the installation package to obtain an updated first software package; wherein the first software package is a first part of software in distributed software.

Before the distributed software upgrade, the main node distributes a full-scale software upgrade package input by a user to the first node, wherein the full-scale software upgrade package at least comprises patches and/or installation packages and a description file written in yaml or xml. After the first node receives the full software upgrade package, the version update module 11 is used to perform version update on the first software package for the patch and/or installation package corresponding to the original software on the first node. When the first node needs to be patched, firstly performing trial operation on the node, and then formally patching after the trial operation is successful.

A state detection module 12, configured to detect a second node if the operation of the updated first software package depends on the second node; the second node is a node for running or installing a second part of software in the distributed software; the first portion of software is the same as or at least partially different from the second portion of software.

In this situation, if the operation of the updated first software package of the first node depends on the second node, the first node uses the state detection module 12 to detect the running state of the second node according to the pre-detection command preset in the blocking wait condition of the description file in the full upgrade package received by the first node.

And an upgrade success determining module 13, configured to determine that the distributed software is upgraded successfully when the updated first software package can be run if the detection result indicates that the second node is in the preset running state.

When the running state of the second software package installed or run by the second node meets a preset detection result in the blocking waiting condition of the first node, the second node is in a preset running state; the first node executes the association operation for the installed first software package, where the association operation may be to restart a service, open a firewall, or execute a special operation, and after the first node executes the association operation, the software module and the configuration file on the node may be modified, so that the updated first software package can run. And at this point, the distributed software upgrade of the first node is completed successfully.

The first node further comprises: a receiving module 14; the receiving module is configured to receive the patch and/or the installation package distributed by the main node, where the patch and/or the installation package is at least used for updating the first software package.

Specifically, as shown in fig. 5, the automatic upgrade software program on the master node distributes the software upgrade package input by the operation and maintenance upgrade personnel to each edge node, and in the present invention, the first node is also an edge node, and the first node receives the patch and/or the installation package distributed by the master node through the receiving module 14.

The upgrade success determining module 13 is further configured to determine that the distributed software upgrade is successful when the updated first software package can be run, if the running of the updated first software package does not depend on the second node.

In this case, it can be understood that the first node executes the association operation directly, without regard to any blocking wait condition.

The state detection module 12 is further configured to detect an operation state of a second part of software in the distributed software installed in the second node according to a preset detection time interval.

That is, the first node uses the state detection module 12 to periodically detect the operation state of the second node according to the detection time interval preset in the blocking wait condition.

The state detection module 12 is further configured to send a message of failure in upgrading distributed software to the master node if M times of detections are performed on the second node, and detection results of the M times of detections all indicate that the second node is not in the preset operation state, and a total time duration of the M times of detections is equal to a preset detection time duration; wherein M is a non-0 natural number.

The running state of the second node may not meet the precondition for the first node to execute the association operation at the beginning, in which case the first node needs to wait for the second node to meet the precondition and therefore detects the running state of the second node multiple times. When the M-th detection has been performed and the total detection time equals the preset detection duration, the first node sends a message that the distributed software upgrade has failed to the master node.

Specifically, the first node performs M detections on the second node, and the time interval of each detection is equal.

The state detection module 12 is further configured to send a distributed software upgrade failure message to the master node if N times of detections are continuously performed on the second node and detection results of the N times of detections indicate that the second node is not in the preset operation state; wherein, N is a preset threshold value of detection times and is a non-0 natural number.

The version update module 11 is further configured to at least one of:

and when the first software package is successfully updated and the updated first software package cannot be operated, if a rollback command sent by the main node is received, downgrading the updated first software package to the original version.

In the first node provided in the embodiment of the present invention, when the operation of the updated first software package of the first node depends on the second node, the first node detects whether the second node is in a preset running state; if it is, then when the first node can run the updated first software package, it is determined that the distributed software is successfully upgraded. Because the first node actively detects whether the node it depends on is in the preset running state, there is no need to set a uniform waiting period for the depended-on node to reach that state before the first node executes other operations. This saves waiting time, improves the efficiency with which the updated first software package on the first node becomes runnable, and thereby improves the efficiency of upgrading the distributed software.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A distributed software upgrade method, the method comprising:

updating the version of the first software package based on the patch and/or the installation package at the first node to obtain an updated first software package; wherein the first software package is a first part of distributed software;

2. The method of claim 1, further comprising:

and the first node receives the patch and/or the installation package distributed by the main node and at least used for updating the first software package.

3. The method of claim 1, further comprising:

4. The method of claim 1, wherein detecting the second node by the first node comprises:

5. The method of claim 1, further comprising:

if the first node performs M-time detection on the second node, the detection results of the M-time detection indicate that the second node is not in the preset operation state, and the total time length of the M-time detection is equal to the preset detection time length, the first node sends a message of failure in distributed software upgrading to the main node; wherein M is a non-0 natural number.

6. The method of claim 1, further comprising:

if the first node continuously detects the second node for N times and the detection results of the N times of detection indicate that the second node is not in the preset operation state, the first node sends a distributed software upgrading failure message to a main node; wherein, N is a preset threshold value of detection times and is a non-0 natural number.

7. The method of claim 5 or 6, further comprising one of:

8. A first node, characterized in that the first node comprises:

9. The first node of claim 8, wherein the first node further comprises: a receiving module;

10. The first node of claim 8,

the module for determining successful upgrade is further configured to determine that the distributed software upgrade is successful when the updated first software package can be run if the running of the updated first software package does not depend on the second node.

11. The first node of claim 8,

the state detection module is further configured to detect an operating state of a second part of software in the distributed software installed in the second node according to a preset detection time interval.

12. The first node of claim 8,

the state detection module is further configured to send a message of failure in distributed software upgrade to the master node if M times of detections are performed on the second node, and detection results of the M times of detections all indicate that the second node is not in the preset operation state, and a total time length of the M times of detections is equal to a preset detection time length; wherein M is a non-0 natural number.

13. The first node of claim 8,

the state detection module is further configured to send a distributed software upgrade failure message to the master node if the second node is continuously detected for N times and the detection results of the N times of detection all indicate that the second node is not in the preset operation state; wherein, N is a preset threshold value of detection times and is a non-0 natural number.

14. The first node of claim 12 or 13, wherein the version update module is further configured to at least one of: