CN107040406B

CN107040406B - End cloud cooperative computing system and fault-tolerant method thereof

Info

Publication number: CN107040406B
Application number: CN201710148783.1A
Authority: CN
Inventors: 沈玉龙; 张宇恒; 司旭; 保积元; 王思怡; 阮金清
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2017-03-14
Filing date: 2017-03-14
Publication date: 2020-08-11
Anticipated expiration: 2037-03-14
Also published as: CN107040406A

Abstract

The invention discloses an end-cloud cooperative computing system and a fault-tolerant method thereof. It exploits the fact that the cloud is more capable than the terminals to implement two secure fault-tolerance strategies at the cloud side, cloud backup and cloud recovery, which safeguard task safety and the correctness of end-cloud cooperative computing. The invention also provides an abnormal terminal node detection function, so that a cloud node can replace, in real time, any terminal node that does not meet the system's computing standard. This keeps the whole service system running stably. In addition, terminals that have some computing capacity and have passed security authentication to access the system are added to the computing resource pool to form a terminal cluster. The terminal cluster is highly scalable and extends the scalability of the cloud. Together with the service nodes in the static cloud, it completes tasks that a single cloud platform would otherwise compute alone. This optimizes the traditional cloud platform, effectively reduces the load on the cloud, and achieves load balancing.

Description

End cloud cooperative computing system and fault-tolerant method thereof

Technical Field

The invention relates to the technical field of cloud computing, and in particular to an end-cloud cooperative computing system and a fault-tolerant method thereof.

Background

Cloud computing is a model for adding, using and delivering Internet-based services. It typically involves providing dynamically scalable and often virtualized resources over the Internet. "Cloud" is a metaphor for the network and the Internet. Telecommunications networks were traditionally drawn as clouds, and the image was later also used to represent the Internet and its underlying infrastructure. Cloud computing can give users access to computing power on the order of 10 trillion operations per second. That is enough to simulate nuclear explosions and to forecast climate change and market trends. Users can access the data center from a computer, notebook, mobile phone or similar device and run computations according to their own needs.

Cloud computing is the product of the development and fusion of traditional computer and network technologies. These include distributed computing, parallel computing, utility computing, network storage, virtualization, load balancing and hot-standby redundancy. Cloud computing has many definitions. The most widely accepted one at present is that of the US National Institute of Standards and Technology (NIST): cloud computing is a pay-per-use model that provides available, convenient, on-demand network access to a configurable shared pool of computing resources (networks, servers, storage, applications and services). These resources can be provisioned rapidly with minimal management effort or interaction with the service provider.

A cloud computing platform is also called a cloud platform. Cloud platforms fall into three classes:

- storage-oriented cloud platforms, which mainly perform data storage;
- computing-oriented cloud platforms, which mainly perform data processing;
- comprehensive cloud computing platforms, which handle both computing and data processing.

An end-cloud cooperative platform differs from a traditional cloud platform in one respect: mobile terminals that offer some computing capacity are added to the resource pool, and they complete computing tasks jointly with the cloud. This makes effective use of idle resources and answers current calls for low-carbon, green technology.

Most terminals are mobile devices, and most of them join the end-cloud cooperative architecture over wireless networks. Their network stability and quality therefore differ from those of the computing nodes on the cloud platform. Terminals also have relatively limited endurance and are easily affected by many external environmental factors. This greatly increases the likelihood of anomalies or failures.

Disclosure of Invention

In view of the above defects of the prior art, the technical problem to be solved by the present invention is to provide an end-cloud cooperative computing system and a fault-tolerance method thereof. The method combines two strategies, cloud backup and cloud recovery, into a secure end-cloud cooperative fault-tolerance method, and thereby ensures high reliability of the architecture.

To achieve the above object, the present invention provides an end-cloud cooperative computing system, characterized in that the system consists of four modules: a task management server module, an end-cloud cooperation server module, a static cloud service node module and a mobile terminal computing service node module, wherein:

the task management server module: this module acquires tasks submitted by users, packages and processes them, and sends them to the end-cloud cooperation server module;

the end-cloud cooperation server module: this module receives tasks sent by the task management server, formulates a task scheduling strategy, and cooperatively manages the static cloud and mobile terminal resources;

the static cloud service node module: this module consists of static cloud servers. It executes the computing tasks sent by the end-cloud cooperation server, sets checkpoints for tasks destined for mobile terminals, and backs these tasks up in the cloud so that they are not lost if computation on a mobile terminal fails;

the mobile terminal computing service node module: this module consists of various hardware terminals and computes the tasks sent by the end-cloud cooperation server.

Furthermore, the end-cloud cooperation server module is also provided with an automatic discovery module. The end-cloud cooperation server uses active detection to discover available resources promptly. It can dynamically extend the cloud platform with mobile terminal resources and assign to the static cloud and to the mobile terminals the tasks suited to each. It also communicates with both in real time to ensure normal operation of the fault-tolerant system.
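The active-detection idea can be pictured with a minimal Python sketch. It assumes a hypothetical `probe` callable that returns True when a terminal answers, for example a heartbeat or ping. The patent does not specify the probe. `ResourcePool`, `discover` and the node names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class ResourcePool:
    """Hypothetical registry of terminals kept by the end-cloud cooperation server."""
    probe: Callable[[str], bool]  # assumed: True if the terminal answers an active probe
    available: Dict[str, bool] = field(default_factory=dict)

    def discover(self, candidates: Iterable[str]) -> List[str]:
        """Probe every candidate; return the terminals that have just become available."""
        newly_available = []
        for node in candidates:
            alive = self.probe(node)
            if alive and not self.available.get(node, False):
                newly_available.append(node)
            self.available[node] = alive  # unreachable terminals are marked unavailable
        return newly_available

    def alive_nodes(self) -> List[str]:
        return sorted(node for node, ok in self.available.items() if ok)


# Usage: two terminals answer the first probe round, and one drops out later.
reachable = {"phone-1", "pad-2"}
pool = ResourcePool(probe=lambda node: node in reachable)
print(pool.discover(["phone-1", "pad-2", "phone-3"]))  # ['phone-1', 'pad-2']
reachable.discard("pad-2")
pool.discover(["phone-1", "pad-2"])
print(pool.alive_nodes())  # ['phone-1']
```

The probe is passed in as a function, so the registry logic does not depend on any particular transport and can be exercised without a network.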

Furthermore, the static cloud service node module is also provided with a dynamic monitoring module. Working with the end-cloud cooperation server, this module monitors in real time whether the checkpoint calculation results of terminal resources are being uploaded, which ensures operation of the fault-tolerant system.
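Below is a minimal sketch of the dynamic monitoring idea. It rests on two assumptions: the monitor receives the current time explicitly, and a single configurable timeout applies to all tasks. The patent only says that uploads are monitored in real time. `CheckpointMonitor` and its methods are hypothetical names.

```python
from typing import Dict, List


class CheckpointMonitor:
    """Hypothetical dynamic monitor on the static cloud. It records when the latest
    checkpoint result of each task arrived and reports tasks whose next result is
    overdue. Times are passed in explicitly so the logic is easy to test."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                 # assumed single timeout for all tasks
        self.last_seen: Dict[str, float] = {}

    def start(self, task_id: str, now: float) -> None:
        self.last_seen[task_id] = now          # task dispatched to a mobile terminal

    def on_checkpoint_result(self, task_id: str, now: float) -> None:
        self.last_seen[task_id] = now          # result relayed by the cooperation server

    def finish(self, task_id: str) -> None:
        self.last_seen.pop(task_id, None)      # task done, stop monitoring it

    def overdue(self, now: float) -> List[str]:
        return sorted(t for t, seen in self.last_seen.items() if now - seen > self.timeout)


monitor = CheckpointMonitor(timeout=10.0)
monitor.start("t1", now=0.0)
monitor.start("t2", now=0.0)
monitor.on_checkpoint_result("t1", now=8.0)
print(monitor.overdue(now=12.0))  # ['t2']
```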

Furthermore, the mobile terminal computing service node module is also provided with a log storage and uploading module. It stores the calculation result of each checkpoint and uploads it to the end-cloud cooperation server on time.
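One reading of the log storage and uploading module is "write locally first, then upload, and keep anything not yet acknowledged". The sketch below assumes a JSON-lines log and an `upload` callable that returns True on acknowledgement. Both are illustrative choices, not details from the patent, and `CheckpointLog` is a hypothetical name.

```python
import io
import json
from typing import Callable, List, TextIO


class CheckpointLog:
    """Hypothetical log storage and uploading module on a mobile terminal."""

    def __init__(self, log_file: TextIO, upload: Callable[[dict], bool]) -> None:
        self.log_file = log_file
        self.upload = upload           # assumed: returns True when the server acknowledges
        self.pending: List[dict] = []  # stored locally but not yet acknowledged

    def record(self, task_id: str, checkpoint: int, state: dict) -> None:
        entry = {"task": task_id, "checkpoint": checkpoint, "state": state}
        self.log_file.write(json.dumps(entry) + "\n")  # persist locally first
        self.log_file.flush()
        self.pending.append(entry)
        self.flush_uploads()

    def flush_uploads(self) -> int:
        """Upload pending entries in order, stopping at the first failure."""
        sent = 0
        while self.pending and self.upload(self.pending[0]):
            self.pending.pop(0)
            sent += 1
        return sent


sent_entries = []
server = {"online": False}

def upload(entry: dict) -> bool:
    if server["online"]:
        sent_entries.append(entry)
    return server["online"]

log = CheckpointLog(io.StringIO(), upload)
log.record("t1", 1, {"partial_sum": 6})    # stored locally, upload fails
server["online"] = True
log.record("t1", 2, {"partial_sum": 21})   # both entries uploaded in order
print([e["checkpoint"] for e in sent_entries])  # [1, 2]
```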

Furthermore, the mobile terminals are Android-based tablets (pads) and mobile phones of any brand.

A fault tolerance method of an end cloud cooperative computing system is characterized by comprising the following specific steps:

step one: after a user issues a task, the task management server receives it, sorts all tasks, and sends them to the end-cloud cooperation server;

step two: after receiving the tasks, the end-cloud cooperation server schedules and dispatches them;

step three: after receiving a task, the static cloud sets checkpoints for it, makes a static cloud backup of the checkpointed task, and then sends the task to the end-cloud cooperation server;

step four: the end-cloud cooperation server sends the task processed by the static cloud to the mobile device;

step five: after receiving the task, the mobile terminal computes it.
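The five steps can be traced with a toy Python sketch. The `Task` fields (`priority`, `mobile_suitable`, `checkpointed`) are assumptions made for illustration. So is the rule that tasks unsuited to mobile terminals stay on the static cloud. The computation in step five is not simulated.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    task_id: str
    priority: int              # assumed: smaller number = higher priority
    mobile_suitable: bool      # assumed outcome of the classification in step two
    checkpointed: bool = False


def run_pipeline(user_tasks: List[Task], cloud_backup: Dict[str, Task]) -> Dict[str, List[str]]:
    """Toy walk through steps one to four; returns where each task is computed."""
    # Step one: the task management server collects and sorts the tasks.
    # sorted() is stable, so equal priorities keep their arrival order.
    queue = sorted(user_tasks, key=lambda t: t.priority)
    placement: Dict[str, List[str]] = {"static_cloud": [], "mobile": []}
    for task in queue:
        # Step two: the cooperation server decides where the task should run.
        if not task.mobile_suitable:
            placement["static_cloud"].append(task.task_id)
            continue
        # Step three: the static cloud sets checkpoints and keeps a backup.
        task.checkpointed = True
        cloud_backup[task.task_id] = task
        # Step four: the cooperation server forwards the task to a mobile device,
        # where step five (the computation itself) would take place.
        placement["mobile"].append(task.task_id)
    return placement


backup: Dict[str, Task] = {}
tasks = [Task("a", 2, True), Task("b", 1, False), Task("c", 1, True)]
print(run_pipeline(tasks, backup))  # {'static_cloud': ['b'], 'mobile': ['c', 'a']}
```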

Further, step two is specifically: the tasks are classified and the end-cloud allocation module assigns them priorities. The task queue is ordered by priority and on a first-come-first-served basis, and a task allocation strategy is formulated. Tasks suitable for computation on front-end (mobile) devices are sent to the cloud first.
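The queue ordering in step two can be sketched with Python's `heapq`. This assumes that a smaller number means higher priority and that arrival order breaks ties among equal priorities. The patent names both criteria but not how they combine. `TaskQueue` and the task names are hypothetical.

```python
import heapq
import itertools
from typing import List, Tuple


class TaskQueue:
    """Priority queue with first-come-first-served tie-breaking.
    Assumption: a smaller priority number means a more urgent task."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._arrival = itertools.count()  # increasing arrival counter breaks ties

    def push(self, task_id: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._arrival), task_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


q = TaskQueue()
for task_id, priority in [("video-encode", 2), ("image-filter", 1), ("matrix-mul", 1)]:
    q.push(task_id, priority)
print([q.pop() for _ in range(len(q))])  # ['image-filter', 'matrix-mul', 'video-encode']
```

Because the arrival counter is unique, heap entries never fall through to comparing task IDs. Tasks of equal priority therefore come out strictly in submission order.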

Further, step five is specifically: during processing, if the computation succeeds, the result is uploaded to the cooperation server. The cooperation server in turn uploads it to the cloud, and the cloud then deletes its backup of the task. If the computation fails, the cooperation server executes the fault-tolerant method and returns the task to the cloud for execution.
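Below is a minimal sketch of the bookkeeping in step five. It assumes the static cloud keeps task backups in a dictionary keyed by a hypothetical task ID. It also assumes the cooperation server reports each outcome as a success flag plus an optional result. `StaticCloudBackups` is an illustrative name.

```python
from typing import Any, Dict, List, Optional


class StaticCloudBackups:
    """Hypothetical static-cloud bookkeeping for step five."""

    def __init__(self) -> None:
        self.backups: Dict[str, Any] = {}   # task ID -> backed-up task
        self.results: Dict[str, Any] = {}   # task ID -> result from a mobile terminal
        self.to_execute: List[str] = []     # tasks returned to the cloud after a failure

    def back_up(self, task_id: str, task: Any) -> None:
        self.backups[task_id] = task

    def on_outcome(self, task_id: str, success: bool, result: Optional[Any] = None) -> None:
        """Outcome relayed by the cooperation server."""
        if success:
            self.results[task_id] = result
            del self.backups[task_id]        # the backup is no longer needed
        else:
            self.to_execute.append(task_id)  # backup kept; the cloud runs the task itself


cloud = StaticCloudBackups()
cloud.back_up("t1", {"units": [1, 2, 3]})
cloud.back_up("t2", {"units": [4, 5, 6]})
cloud.on_outcome("t1", success=True, result=6)
cloud.on_outcome("t2", success=False)
print(sorted(cloud.backups), cloud.to_execute)  # ['t2'] ['t2']
```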

The beneficial effects of the invention are as follows:

The invention exploits the fact that the cloud is more capable than the terminals to implement two secure fault-tolerance strategies at the cloud side, cloud backup and cloud recovery. Together they ensure task safety and the correctness of end-cloud cooperative computing. The invention also provides an abnormal terminal node detection function, so that a cloud node can replace, in real time, any terminal node that does not meet the system's computing standard. This keeps the whole service system running stably. In addition, terminals that have some computing capacity and have passed security authentication to access the system are added to the computing resource pool to form a terminal cluster. The terminal cluster is highly scalable and extends the scalability of the cloud. Together with the service nodes in the static cloud, it completes tasks that a single cloud platform would otherwise compute alone. This optimizes the traditional cloud platform, effectively reduces the load on the cloud, and achieves load balancing.

The concept, specific structure and technical effects of the present invention are further described below with reference to the accompanying drawings, so that its objects, features and effects can be fully understood.

Drawings

Fig. 1 is a structural block diagram of an end cloud cooperative computing system according to the present invention.

Fig. 2 is a flowchart of task computing in the end-cloud cooperative system with the fault-tolerant mechanism according to the present invention.

Fig. 3 is a flowchart of the fault tolerance method of the end-cloud cooperative computing architecture according to the present invention.

Detailed Description

As shown in Fig. 1, the end-cloud cooperative computing system consists of four modules: a task management server module, an end-cloud cooperation server module, a static cloud service node module and a mobile terminal computing service node module.

In this embodiment, the end-cloud cooperation server module is also provided with an automatic discovery module. The end-cloud cooperation server uses active detection to discover available resources promptly. It dynamically extends the cloud platform with mobile terminal resources and assigns to the static cloud and to the mobile terminals the tasks suited to each. It also communicates with both in real time to ensure normal operation of the fault-tolerant system.

In this embodiment, the static cloud service node module is also provided with a dynamic monitoring module. Working with the end-cloud cooperation server, this module monitors in real time whether the checkpoint calculation results of terminal resources are being uploaded, which ensures operation of the fault-tolerant system.

In this embodiment, the mobile terminal computing service node module is also provided with a log storage and uploading module, which uploads the calculation result of each checkpoint to the end-cloud cooperation server on time.

In this embodiment, the mobile terminals are Android-based tablets (pads) and mobile phones of any brand.

As shown in Fig. 2, the fault tolerance method of the end-cloud cooperative computing system comprises the following steps:

step one: after a user issues a task, the task management server receives it, sorts all tasks, and sends them to the end-cloud cooperation server;

step two: after receiving the tasks, the end-cloud cooperation server schedules and dispatches them;

step three: after receiving a task, the cloud sets checkpoints for it, makes a static cloud backup of the checkpointed task, and then sends the task to the end-cloud cooperation server;

step four: the end-cloud cooperation server sends the task processed by the cloud to the mobile device;

step five: after receiving the task, the mobile terminal computes it.

In this embodiment, step two is specifically: the tasks are classified and the end-cloud allocation module assigns them priorities. The task queue is ordered by priority and on a first-come-first-served basis, and a task allocation strategy is formulated. Tasks suitable for computation on front-end (mobile) devices are sent to the cloud first.

In this embodiment, step five is specifically: during processing, if the computation succeeds, the result is uploaded to the cooperation server. The cooperation server in turn uploads it to the cloud, and the cloud then deletes its backup of the task. If the computation fails, the cooperation server executes the fault-tolerant method and returns the task to the cloud for execution.

Example one

With reference to Figs. 1 and 2, this example describes in detail the execution flow of the end-cloud cooperative computing architecture with the fault tolerance method of the present invention. The flow comprises the following steps:

step 1: the task management server of the end-cloud cooperative system sends a task received from a user to the end-cloud cooperation server;

step 2: the end-cloud cooperation server receives the task from the task management server and allocates it. Allocation is mainly based on the task's priority and on first-in-first-out order, and the task is sent to the cloud or to a mobile terminal for computation according to its category;

step 3: each task that step 2 classified as suitable for mobile computation is sent to the static cloud. The static cloud estimates a deadline T for the task, sets checkpoints for it, makes a static cloud backup of it, and sends it to the cooperation server;

step 4: the cooperation server sends the checkpointed task to a mobile terminal;

step 5: the mobile terminal executes the task, and the fault-tolerance mechanism assesses the task during computation. As long as the static cloud keeps receiving, through the cooperation server, the result of each of the mobile terminal's checkpoints, the mobile terminal continues executing the task until it completes;

step 6: if, during execution, the static cloud does not receive a checkpoint result from the mobile terminal through the cooperation server, it requests the mobile terminal to roll its computation back to an available checkpoint and recompute. If more than 3 retries have been made and the static cloud still has not received the result, cloud recovery is executed: the task is unloaded from the mobile terminal and recovered to the static cloud. The static cloud then continues the computation from the result of the available checkpoint;

step 7: end.
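Steps 5 and 6 rely on one property: resuming from a checkpoint gives the same answer as an uninterrupted run. The toy sketch below checks this for a task that sums a list of work units. The task, the checkpoint format `(next index, partial sum)` and the failure injection through `fail_at` are all illustrative assumptions. A terminal that keeps failing at the same point exhausts its 3 retries, and the static cloud then resumes from the last checkpoint it received.

```python
from typing import List, Optional, Tuple

Checkpoint = Tuple[int, int]  # (index of the next work unit, partial sum so far)


def run_with_checkpoints(units: List[int], interval: int,
                         start: Checkpoint = (0, 0),
                         fail_at: Optional[int] = None) -> Tuple[Optional[int], List[Checkpoint]]:
    """Sum `units` from checkpoint `start`, saving a checkpoint every `interval` units.
    `fail_at` simulates a terminal failure just before that unit is processed.
    Returns (result or None on failure, checkpoints saved during this run)."""
    index, acc = start
    saved: List[Checkpoint] = []
    while index < len(units):
        if index == fail_at:
            return None, saved
        acc += units[index]
        index += 1
        if index % interval == 0 and index < len(units):
            saved.append((index, acc))  # assumed to reach the static cloud via the server
    return acc, saved


def execute(units: List[int], interval: int, mobile_fail_at: Optional[int] = None,
            max_retries: int = 3) -> Tuple[int, str]:
    """Run on the mobile terminal; on failure roll back to the last checkpoint the
    static cloud received and retry, up to `max_retries` times; then fall back to
    cloud recovery. Returns (result, where the task finished)."""
    checkpoint: Checkpoint = (0, 0)
    for _ in range(1 + max_retries):  # first run plus the retries
        result, saved = run_with_checkpoints(units, interval, checkpoint, mobile_fail_at)
        if saved:
            checkpoint = saved[-1]
        if result is not None:
            return result, "mobile"
    # Cloud recovery: the static cloud resumes from the last available checkpoint.
    result, _ = run_with_checkpoints(units, interval, checkpoint)
    return result, "static_cloud"


units = list(range(1, 11))                   # toy task: 1 + 2 + ... + 10
print(execute(units, interval=3))            # (55, 'mobile')
print(execute(units, 3, mobile_fail_at=7))   # (55, 'static_cloud')
```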

Example two

With reference to Fig. 3, this example describes in detail the fault tolerance method based on the end-cloud cooperative computing architecture provided by the present invention, as follows:

Let the total workload of a task be $M$. Let $a$ be the number of checkpoints inserted at equal intervals into a task that must tolerate $e$ errors. The checkpoint interval $N$ is then

$$N = \frac{M}{a+1}.$$

The $e$ errors are of three types, so that $e = e_1 + e_2 + e_3$:

- $e_1$ errors occur while a checkpoint is being saved;
- $e_2$ errors occur while a checkpoint is being restored;
- $e_3$ errors occur during effective computation.

Write $R_c$ for the time to restore one checkpoint and $S_c$ for the time to save one checkpoint. The maximum time needed to return to normal operation after each type of error is:

$$t_1 = N + R_c + S_c, \qquad t_2 = R_c, \qquad t_3 = N + R_c.$$

When a task is executed, its total response time is the sum of three parts: the estimated error-free execution time $T_{\text{exec}}$, the time to save $a$ checkpoints, and the recovery time of each error:

$$T_{\text{resp}} = T_{\text{exec}} + a S_c + e_1 (N + R_c + S_c) + e_2 R_c + e_3 (N + R_c).$$

The worst case arises when only $e_1$-type errors occur, because that type has the longest maximum recovery time. The total response time must be less than the deadline $T$ precomputed by the system. This gives the condition from which the number of checkpoints $a$ is finally calculated:

$$T_{\text{exec}} + a S_c + e\left(\frac{M}{a+1} + R_c + S_c\right) < T.$$
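The checkpoint-count condition can be evaluated numerically. This sketch assumes that the error-free execution time $T_{\text{exec}}$ equals the workload $M$ expressed in time units, which is consistent with $N = M/(a+1)$ being a time interval. Several values of $a$ may satisfy the deadline, so the sketch picks the one with the smallest worst-case response time. That selection rule is an assumption; the patent does not state it. The function names are hypothetical.

```python
from typing import Optional


def worst_case_response(a: int, M: float, S_c: float, R_c: float, e: int) -> float:
    """Worst-case total response time when all e errors are of type e1 (failure while
    saving a checkpoint). Assumes the error-free execution time equals M."""
    N = M / (a + 1)                          # equal checkpoint interval
    return M + a * S_c + e * (N + R_c + S_c)


def choose_checkpoints(M: float, S_c: float, R_c: float, e: int, T: float,
                       max_a: int = 1000) -> Optional[int]:
    """Return the checkpoint count a in 1..max_a that minimises the worst-case
    response time, or None if even that value does not meet the deadline T."""
    best = min(range(1, max_a + 1), key=lambda a: worst_case_response(a, M, S_c, R_c, e))
    return best if worst_case_response(best, M, S_c, R_c, e) < T else None


print(choose_checkpoints(M=100, S_c=1, R_c=2, e=2, T=140))  # 13
print(choose_checkpoints(M=100, S_c=1, R_c=2, e=2, T=130))  # None
```

Ignoring integrality, setting the derivative of $aS_c + eM/(a+1)$ to zero shows that the bound is smallest near $a + 1 = \sqrt{eM/S_c}$. For the values above this gives $a \approx 13.1$, which matches the search result.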

After the checkpoints are set, the task is backed up on the static cloud. The checkpointed task is sent to the end-cloud cooperation server, which forwards it to a mobile terminal for computation. Each time the mobile terminal reaches a checkpoint, it backs up its current calculation result once. The result is stored in a local log file and uploaded to the cooperation server, which in turn uploads it to the static cloud.

At each checkpoint, the static cloud waits for at most the checkpoint interval time plus the task transmission time. If no response arrives within this time, the static cloud instructs the cooperation server to have the mobile terminal retry. The mobile terminal rolls its computation back to an available checkpoint and recomputes from there, with a maximum of 3 retries. Cloud recovery is executed in either of two cases: the end-cloud cooperation server still receives no response after 3 retries, or the total number of mobile terminal errors exceeds half the number of checkpoints. The task is then unloaded from the mobile terminal, and the static cloud continues the computation from the data of the task's available checkpoints. If a retry does receive a response, the mobile terminal continues executing the task. Once the task is complete, the static cloud deletes the task backup it stored earlier.
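The retry and recovery rules above can be condensed into a small decision function. The sketch assumes the caller keeps two counts: retries already requested at the current checkpoint, and total errors so far, including the current one. The names `decide`, `wait_budget` and `Action` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue on the mobile terminal"
    RETRY = "roll back to an available checkpoint and recompute"
    CLOUD_RECOVERY = "unload the task and resume it on the static cloud"


def wait_budget(interval_time: float, transmission_time: float) -> float:
    """Longest time the static cloud waits for one checkpoint result."""
    return interval_time + transmission_time


def decide(responded: bool, retries: int, total_errors: int,
           num_checkpoints: int, max_retries: int = 3) -> Action:
    """Action taken when a checkpoint result is due.
    retries: retries already requested for this checkpoint (assumed bookkeeping).
    total_errors: all errors the mobile terminal has had on this task so far."""
    if total_errors > num_checkpoints / 2:
        return Action.CLOUD_RECOVERY          # too unreliable overall
    if responded:
        return Action.CONTINUE
    if retries >= max_retries:
        return Action.CLOUD_RECOVERY          # no response after the last allowed retry
    return Action.RETRY


print(wait_budget(interval_time=8.0, transmission_time=1.5))                  # 9.5
print(decide(responded=False, retries=0, total_errors=1, num_checkpoints=10))  # Action.RETRY
print(decide(responded=False, retries=3, total_errors=4, num_checkpoints=10))  # Action.CLOUD_RECOVERY
print(decide(responded=True, retries=1, total_errors=6, num_checkpoints=10))   # Action.CLOUD_RECOVERY
```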

The preferred embodiments of the invention have been described in detail above. It should be understood that those skilled in the art could devise numerous modifications and variations in light of the present teachings without departing from the inventive concept. Therefore, any technical solution that those skilled in the art can obtain from the prior art through logical analysis, reasoning or limited experimentation, according to the concept of the present invention, shall fall within the scope of protection defined by the claims.

Claims

1. An end-cloud cooperative computing system, characterized in that the system consists of four modules: a task management server module, an end-cloud cooperation server module, a static cloud service node module and a mobile terminal computing service node module, wherein:

the mobile terminal computing service node module: this module consists of various mobile terminals and computes the tasks sent by the end-cloud cooperation server;

the end-cloud cooperation server module is also provided with an automatic discovery module. The end-cloud cooperation server uses active detection to discover available resources promptly. It can dynamically extend the cloud platform with mobile terminal resources and assign to the static cloud and the mobile terminal resources the tasks suited to each. It also communicates with them in real time to ensure normal operation of the fault-tolerant system.

2. The end-cloud cooperative computing system of claim 1, wherein the static cloud service node module is also provided with a dynamic monitoring module. Working with the end-cloud cooperation server, the dynamic monitoring module monitors in real time whether the checkpoint calculation results of terminal resources are being uploaded, which ensures operation of the fault-tolerant system.

3. The end-cloud cooperative computing system of claim 1, wherein the mobile terminal computing service node module is also provided with a log storage and uploading module, which uploads the calculation result of each checkpoint to the end-cloud cooperation server on time.

4. The end-cloud cooperative computing system of claim 1, wherein the mobile terminals are Android-based tablets (pads) and mobile phones of any brand.

5. A fault tolerance method applying the end-cloud cooperative computing system according to any one of claims 1 to 4, characterized by comprising the following specific steps:

6. The fault tolerance method for the end-cloud cooperative computing system according to claim 5, wherein step two is specifically: the tasks are classified and the end-cloud allocation module assigns them priorities. The task queue is ordered by priority and on a first-come-first-served basis, and a task allocation strategy is formulated. Tasks suitable for computation on front-end (mobile) devices are sent to the cloud first.

7. The fault tolerance method for the end-cloud cooperative computing system according to claim 5, wherein step five is specifically: during processing, if the computation succeeds, the result is uploaded to the cooperation server. The cooperation server in turn uploads it to the cloud, and the cloud then deletes its backup of the task. If the computation fails, the cooperation server executes the fault-tolerant method and returns the task to the cloud for execution.