CN105323271A

CN105323271A - Cloud computing system, and processing method and apparatus thereof

Info

Publication number: CN105323271A
Application number: CN201410289531.7A
Authority: CN
Inventors: 莫嫣; 高洪; 韩银俊
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2014-06-24
Filing date: 2014-06-24
Publication date: 2016-02-10
Anticipated expiration: 2034-06-24
Also published as: WO2015196692A1; CN105323271B

Abstract

The present invention provides a cloud computing system and a processing method and apparatus for the cloud computing system. The processing method comprises: receiving an operation request from a client to the cloud computing system; acquiring, according to the operation request, the identifier of the data to be operated on in the cloud computing system; looking up, according to a node disk state report of the cloud computing system, each disk in each node of the cloud computing system that stores the data corresponding to the data identifier, together with the state of each such disk; and performing the corresponding operation according to the states of the disks, in the nodes of the cloud computing system, that store the data corresponding to the data identifier. The node disk state report includes the state of each disk in each node of the cloud computing system and the data identifiers corresponding to the data stored on each disk. The invention improves the system's tolerance to disk faults.

Description

Cloud computing system, and processing method and apparatus thereof

Technical field

The present invention relates to the field of cloud computing technology, and in particular to a cloud computing system and a processing method and apparatus for the cloud computing system.

Background

At present, cloud computing is the product of the convergence of traditional computing and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, and load balancing. It aims to integrate multiple relatively low-cost computing entities, over a network, into a system with powerful computing capability. Distributed caching is one area within cloud computing; its role is to provide distributed storage for massive amounts of data together with high-speed read/write access.

A distributed cache system consists of a number of interconnected server nodes and clients. The server nodes are responsible for storing data, and clients can write, read, update, and delete data on the servers. In general, data is not kept on a single server node (hereinafter "node") only; copies of the same data are kept on several nodes as mutual backups. The most common storage mode is active-standby: one node acts as the master node (master) and the others act as slave nodes (slave), with the master's identity determined by election or another algorithm. To keep the flow simple, data updates generally occur on the master, and slaves obtain data from the master to synchronize; data access may read from the master or from a slave, depending on the consistency policy of that access.

In a distributed cache system, such data storage schemes are generally classified by NRW according to their consistency and availability requirements, where N is the number of copies of the data, R is the number of data copies obtained in one data access request, and W is the minimum number of nodes that must participate in one data update request (that is, the number of nodes on which the update must complete).
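As a minimal sketch (the class and method names are illustrative assumptions, not part of the patent), an NRW policy can be modeled as three integers together with the W + R > N condition that the first application scenario states as the consistency requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NRWPolicy:
    """Replication policy in NRW terms (names are illustrative)."""

    n: int  # N: number of copies kept for each data item
    r: int  # R: copies one data access request must obtain
    w: int  # W: minimum nodes on which a data update must complete

    def __post_init__(self) -> None:
        # Quorum sizes cannot exceed the number of copies that exist.
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("R and W must each lie between 1 and N")

    def is_consistent(self) -> bool:
        """W + R > N: every read quorum overlaps every write quorum."""
        return self.w + self.r > self.n


policy = NRWPolicy(n=3, r=2, w=2)  # the commonly used 3/2/2 setting
print(policy.is_consistent())  # True, since 2 + 2 > 3
```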

When a distributed cache system implements persistence, the data distributed to a server is kept on that server's disks. In practice, if a disk fails, the server can no longer provide read/write service. Because a distributed cache system keeps multiple copies of data, the system can still provide normal read/write service through the copies on other nodes, as long as those other servers are in a normal state.

Suppose a node of the distributed cache system has several disks mounted and only one or a few of them are damaged for some reason, so that the server can no longer provide service normally. As described above, the cluster as a whole remains available because the other servers are healthy. If, during this period, another server suffers the same situation and that node also stops providing service, the number of copies may no longer satisfy the NRW policy, and the distributed cache cluster becomes completely unable to serve. Typically, under the commonly used NRW setting of 3/2/2, if two nodes fail and only one node remains normal, neither reads nor writes can meet the minimum requirement of operating on two copies.
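The following sketch (hypothetical function name) restates the conventional whole-node failure model described above: once a node with a damaged disk is treated as entirely failed, two such nodes in a 3/2/2 cluster leave each data item with one reachable copy, which satisfies neither R nor W.

```python
def quorum_reachable(live_copies: int, r: int, w: int) -> tuple[bool, bool]:
    """Return (reads possible, writes possible) for one data item."""
    return live_copies >= r, live_copies >= w


# Whole-node model with NRW = 3/2/2: a node with any failed disk counts as down.
# Two such nodes leave every data item with a single reachable copy.
print(quorum_reachable(live_copies=1, r=2, w=2))  # (False, False)
print(quorum_reachable(live_copies=2, r=2, w=2))  # (True, True)
```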

Summary of the invention

The technical problem to be solved by the present invention is to provide a cloud computing system and a processing method and apparatus for it that improve the system's tolerance to disk failures.

To solve the above technical problem, embodiments of the present invention provide the following technical solutions:

In one aspect, a processing method for a cloud computing system is provided, comprising:

receiving an operation request from a client to the cloud computing system;

acquiring, according to the operation request, the identifier of the data to be operated on in the cloud computing system;

looking up, according to the node disk state report of the cloud computing system, each disk in each node of the cloud computing system that stores the data corresponding to the data identifier, and the state of each such disk, wherein the node disk state report comprises the state of each disk in each node of the cloud computing system and the data identifiers corresponding to the data stored on each disk; and

performing the corresponding operation according to the state of each disk, in each node of the cloud computing system, that stores the data corresponding to the data identifier.
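A minimal sketch of the lookup step, assuming the node disk state report is held as a flat list of per-disk entries. The `DiskEntry` and `find_disks` names and the field layout are hypothetical; the patent only specifies that the report carries each disk's state and the identifiers of the data stored on it. The sample entries for node A follow the Fig. 4 example given in the embodiment, and node B's entry follows the second application scenario.

```python
from dataclasses import dataclass
from enum import Enum


class DiskState(Enum):
    NORMAL = "normal"
    FAULT = "fault"


@dataclass(frozen=True)
class DiskEntry:
    """One entry of a node disk state report (field names are hypothetical)."""

    node: str
    disk: str
    data_ids: frozenset[str]  # identifiers of the data stored on this disk
    state: DiskState


def find_disks(report: list[DiskEntry], data_id: str) -> list[DiskEntry]:
    """Return every disk, on any node, that stores data_id, together with its state."""
    return [entry for entry in report if data_id in entry.data_ids]


# Node A's entries follow the Fig. 4 example; node B's disk I holds copy 1.
report = [
    DiskEntry("A", "I", frozenset({"copy 1"}), DiskState.FAULT),
    DiskEntry("A", "II", frozenset({"copy 2"}), DiskState.NORMAL),
    DiskEntry("A", "III", frozenset({"copy 3"}), DiskState.NORMAL),
    DiskEntry("B", "I", frozenset({"copy 1"}), DiskState.NORMAL),
]
for entry in find_disks(report, "copy 1"):
    print(entry.node, entry.disk, entry.state.value)
# A I fault
# B I normal
```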

The step of performing the corresponding operation according to the state of each disk comprises:

when the operation request is an update request: if the number of disks in the cloud computing system that store the data and are in the normal state is greater than or equal to the predetermined minimum number of nodes participating in a data update request of the cloud computing system, responding to the update request; otherwise, rejecting the update request; or

when the operation request is a data access request: if the number of disks in the cloud computing system that store the data and are in the normal state is greater than or equal to the predetermined number of data copies obtained by a data access request of the cloud computing system, responding to the data access request; otherwise, rejecting the data access request.
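The admission rule above reduces to counting the normal disks that hold the data item and comparing that count against W (updates) or R (accesses). A sketch with a hypothetical function name, using plain strings for disk states:

```python
def admit(request_type: str, disk_states: list[str], r: int, w: int) -> bool:
    """Decide whether to respond to or reject a request for one data item.

    disk_states holds the state ("normal" or "fault") of every disk storing the item.
    """
    normal = sum(1 for state in disk_states if state == "normal")
    if request_type == "update":
        return normal >= w  # W: minimum nodes participating in an update
    if request_type == "access":
        return normal >= r  # R: copies obtained by one access request
    raise ValueError(f"unknown request type: {request_type!r}")


# NRW = 3/2/2
print(admit("update", ["fault", "normal", "normal"], r=2, w=2))  # True
print(admit("access", ["fault", "fault", "normal"], r=2, w=2))  # False
```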

The step of responding to the update request, when the number of disks in the cloud computing system that store the data and are in the normal state is greater than or equal to the predetermined minimum number of nodes participating in a data update request, comprises:

when the operation request is an update request and the disk of the master node storing the data is in the normal state, the master node of the cloud computing system updates the data on the master node's disk holding the data; the slave nodes of the cloud computing system obtain the data to be synchronized from the master node, and each slave node updates the data on its own disk holding the data;

when the operation request is an update request and the disk of the master node storing the data has failed, a first slave node of the cloud computing system updates the data on the first slave node's disk holding the data; a second slave node of the cloud computing system obtains the data to be synchronized from the first slave node, and the second slave node updates the data on its own disk holding the data; here, the disks of the first and second slave nodes that store the data are in the normal state.
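A sketch of how the node that applies an update first might be chosen under the two cases above. It assumes the admission check on the number of normal disks has already passed; the function name and return shape are hypothetical. The first usage line mirrors the Fig. 4 case, where the master of copy 1 is on node B and node A's disk has failed.

```python
def plan_update(master: str, slaves: list[str], normal: set[str]) -> tuple[str, list[str]]:
    """Return (node that applies the update first, nodes that then sync from it).

    master/slaves: nodes holding the data item.
    normal: nodes whose disk holding the item is in the normal state.
    """
    healthy_slaves = [s for s in slaves if s in normal]
    if master in normal:
        # Master disk normal: the master updates, healthy slaves sync from it.
        return master, healthy_slaves
    if not healthy_slaves:
        raise RuntimeError("no copy of the data is on a normal disk")
    # Master disk failed: the first healthy slave updates, the others sync from it.
    return healthy_slaves[0], healthy_slaves[1:]


print(plan_update("B", ["A", "C"], normal={"B", "C"}))  # ('B', ['C'])
print(plan_update("A", ["B", "C"], normal={"B", "C"}))  # ('B', ['C'])
```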

The step of responding to the data access request, when the number of disks in the cloud computing system that store the data and are in the normal state is greater than or equal to the predetermined number of data copies obtained by a data access request, comprises:

when the operation request is a data access request and the disk of the master node storing the data is in the normal state, obtaining a first copy of the data from the master node's disk holding the data, and obtaining a second copy of the data from the disk holding the data on at least one slave node of the cloud computing system; selecting the latest version from among the first copy and the second copy; and sending the latest version to the client; here, the disk of the at least one slave node that stores the data is in the normal state;

when the operation request is a data access request and the disk of the master node storing the data has failed, obtaining a second copy of the data from the disk holding the data on at least one slave node of the cloud computing system; selecting the latest version from among the at least one second copy; and sending the latest version to the client; here, the disk of the at least one slave node that stores the data is in the normal state.
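A sketch of the read path, assuming each copy carries a version number that orders updates (the patent says only that the latest version is selected, without specifying how versions are compared). Names are hypothetical; copies are listed master first, so when the master's disk is normal the master copy is among the R copies read.

```python
from typing import NamedTuple


class Copy(NamedTuple):
    node: str
    version: int  # assumed monotonically increasing version number
    value: bytes


def read_latest(copies: list[Copy], normal: set[str], r: int) -> bytes:
    """Collect R copies from nodes whose disk is normal and return the newest value.

    copies is ordered master first, then slaves.
    """
    usable = [c for c in copies if c.node in normal][:r]
    if len(usable) < r:
        raise RuntimeError("fewer than R copies on normal disks; request rejected")
    return max(usable, key=lambda c: c.version).value


copies = [Copy("A", 3, b"v3"), Copy("B", 4, b"v4"), Copy("C", 4, b"v4")]
print(read_latest(copies, normal={"A", "B", "C"}, r=2))  # b'v4' (A and B read)
print(read_latest(copies, normal={"B", "C"}, r=2))       # b'v4' (master A failed)
```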

Before the step of receiving the operation request from the client, the method further comprises:

obtaining the node disk state report of the cloud computing system from the nodes.

In another aspect, a processing apparatus for a cloud computing system is provided, comprising:

a first receiving unit, configured to receive an operation request from a client to the cloud computing system;

an acquiring unit, configured to acquire, according to the operation request, the identifier of the data to be operated on in the cloud computing system;

a search unit, configured to look up, according to the node disk state report of the cloud computing system, each disk in each node of the cloud computing system that stores the data corresponding to the data identifier, and the state of each such disk, wherein the node disk state report comprises the state of each disk in each node of the cloud computing system and the data identifiers corresponding to the data stored on each disk; and

an operating unit, configured to perform the corresponding operation according to the state of each disk, in each node of the cloud computing system, that stores the data corresponding to the data identifier.

The operating unit comprises:

a first response subunit, configured to respond to an update request when the number of disks in the cloud computing system that store the data and are in the normal state is greater than or equal to the predetermined minimum number of nodes participating in a data update request of the cloud computing system;

a first rejection subunit, configured to reject the update request when the number of disks in the cloud computing system that store the data and are in the normal state is less than the predetermined minimum number of nodes participating in a data update request of the cloud computing system;

a second response subunit, configured to respond to a data access request when the number of disks in the cloud computing system that store the data and are in the normal state is greater than or equal to the predetermined number of data copies obtained by a data access request of the cloud computing system; and

a second rejection subunit, configured to reject the data access request when the number of disks in the cloud computing system that store the data and are in the normal state is less than the predetermined number of data copies obtained by a data access request of the cloud computing system.

The apparatus further comprises:

a second receiving unit, configured to receive the node disk state report of the cloud computing system from the nodes.

In yet another aspect, a cloud computing system is provided, comprising: a client, a processing apparatus, nodes, and the disks corresponding to the nodes;

the processing apparatus is configured to: receive an operation request from the client to the cloud computing system; acquire, according to the operation request, the identifier of the data to be operated on in the cloud computing system; look up, according to the node disk state report of the cloud computing system, each disk in each node of the cloud computing system that stores the data corresponding to the data identifier, and the state of each such disk, wherein the node disk state report comprises the state of each disk in each node of the cloud computing system and the data identifiers corresponding to the data stored on each disk; and perform the corresponding operation according to the state of each disk, in each node of the cloud computing system, that stores the data corresponding to the data identifier;

the nodes are configured to send node disk state reports to the processing apparatus.

The above technical solutions of the present invention have the following beneficial effects:

For a distributed cache system in which disks fail, the present invention makes full use of the resources that remain available, assembling the copies needed to meet consistency and availability requirements. This maximizes system availability and improves the system's tolerance to faults.

Brief description of the drawings

Fig. 1 is a flow diagram of the processing method for a cloud computing system according to the present invention;

Fig. 2 is a structural diagram of the processing apparatus for a cloud computing system according to the present invention;

Fig. 3 is a structural diagram of a cloud computing system according to the present invention;

Fig. 4 and Fig. 5 are structural diagrams of an application scenario of a cloud computing system according to the present invention.

Detailed description of the embodiments

To make the technical problem to be solved, the technical solutions, and the advantages of the present invention clearer, they are described in detail below with reference to the accompanying drawings and specific embodiments.

As shown in Fig. 1, the processing method for a cloud computing system according to the present invention comprises:

Step 11: receive an operation request from a client to the cloud computing system. The operation request may be, for example, a data update request or a data access request.

Step 12: acquire, according to the operation request, the identifier of the data to be operated on in the cloud computing system. For example, if the operation request updates copy 1 in Fig. 4, then copy 1 is the data identifier.

Step 13: look up, according to the node disk state report of the cloud computing system, each disk in each node of the cloud computing system that stores the data corresponding to the data identifier, and the state of each such disk. The node disk state report comprises the state of each disk in each node of the cloud computing system and the data identifiers corresponding to the data stored on each disk. A disk's state is either normal or fault; in Fig. 4, the disk state report of node A is: (node A: disk I, copy 1, fault; disk II, copy 2, normal; disk III, copy 3, normal).

Step 14: perform the corresponding operation according to the state of each disk, in each node of the cloud computing system, that stores the data corresponding to the data identifier.

Before step 11, the method further comprises:

Step 10: obtain the node disk state report of the cloud computing system from the nodes. A node sends a report when it detects that a disk storing data has failed; alternatively, it sends a report on request.
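The two reporting triggers in step 10 (push on disk failure, pull on request) might look like this on the node side. This is a sketch only: the `NodeReporter` class is hypothetical, and the `outbox` list stands in for whatever transport actually delivers reports to the processing apparatus.

```python
class NodeReporter:
    """Node-side bookkeeping for node disk state reports (hypothetical design)."""

    def __init__(self, node: str, disks: dict[str, set[str]]) -> None:
        self.node = node
        self.disks = disks                                # disk -> data ids stored on it
        self.state = {disk: "normal" for disk in disks}
        self.outbox: list[list[tuple]] = []               # stand-in for the transport

    def report(self) -> list[tuple]:
        """One (node, disk, sorted data ids, state) entry per disk."""
        return [(self.node, disk, sorted(ids), self.state[disk])
                for disk, ids in self.disks.items()]

    def on_disk_fault(self, disk: str) -> None:
        """Push path: send a report as soon as a disk is found to have failed."""
        if self.state[disk] != "fault":
            self.state[disk] = "fault"
            self.outbox.append(self.report())

    def on_request(self) -> list[tuple]:
        """Pull path: answer an explicit request from the processing apparatus."""
        return self.report()


node_a = NodeReporter("A", {"I": {"copy 1"}, "II": {"copy 2"}, "III": {"copy 3"}})
node_a.on_disk_fault("I")
print(node_a.outbox[0][0])  # ('A', 'I', ['copy 1'], 'fault')
```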

Step 14 specifically comprises the following:

When the operation request is a data access request and the disk of the master node storing the data is in the normal state, a first copy of the data is obtained from the master node's disk holding the data, and a second copy of the data is obtained from the disk holding the data on at least one slave node of the cloud computing system (two or three slave nodes may also be used, set according to actual conditions); the latest version is selected from among the first copy and the second copy and sent to the client; here, the disk of the slave node that stores the data is in the normal state.

For example, Fig. 5 shows a distributed cache storage system made up of 3 nodes, in which each data item has three copies and data is updated and accessed in 3/2/2 mode. The number of copies that a read request must access, as specified by the cloud computing system, is 2. When one of the disks holding copies of a data item fails, update and data access requests for that item can still be served; when two such disks fail, the requests cannot be served.

With the present invention, when node disks fail, even when disks on several nodes fail at the same time, the system can guarantee consistency and availability as long as the copies on the remaining available disks in the cluster still satisfy the NRW policy. Service for all data may even remain unaffected; the system does not become completely unable to serve, and service is provided to the greatest extent possible.

Of course, continuing to provide service while some disks have failed raises the question of recovering data once those disks are restored. This can be handled by the data recovery function of the distributed cache, that is, by obtaining copies of the data from other nodes to repair the disk.
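A minimal sketch of that repair, assuming each stored copy is a (version, value) pair and that the identifiers of the data on the failed disk were recorded when it failed. The function name and data shapes are illustrative; the patent leaves the mechanism to the distributed cache's existing data recovery function.

```python
def rebuild_disk(
    data_ids: list[str],
    peer_stores: list[dict[str, tuple[int, bytes]]],
) -> dict[str, tuple[int, bytes]]:
    """Rebuild a replaced disk from the copies held by other nodes.

    data_ids: identifiers recorded for the failed disk.
    peer_stores: one mapping per peer node, data id -> (version, value).
    """
    rebuilt: dict[str, tuple[int, bytes]] = {}
    for data_id in data_ids:
        candidates = [store[data_id] for store in peer_stores if data_id in store]
        if not candidates:
            raise RuntimeError(f"no surviving copy of {data_id!r}")
        rebuilt[data_id] = max(candidates, key=lambda entry: entry[0])  # newest version wins
    return rebuilt


node_b = {"copy 1": (5, b"new")}
node_c = {"copy 1": (4, b"old")}
print(rebuild_disk(["copy 1"], [node_b, node_c]))  # {'copy 1': (5, b'new')}
```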

As shown in Fig. 2, the processing apparatus for a cloud computing system according to the present invention comprises:

a first receiving unit 21, configured to receive an operation request from a client to the cloud computing system;

an acquiring unit 22, configured to acquire, according to the operation request, the identifier of the data to be operated on in the cloud computing system;

a search unit 23, configured to look up, according to the node disk state report of the cloud computing system, each disk in each node of the cloud computing system that stores the data corresponding to the data identifier, and the state of each such disk, wherein the node disk state report comprises the state of each disk in each node of the cloud computing system and the data identifiers corresponding to the data stored on each disk; and

an operating unit 24, configured to perform the corresponding operation according to the state of each disk, in each node of the cloud computing system, that stores the data corresponding to the data identifier.

The operating unit 24 comprises:

The apparatus further comprises:

a second receiving unit 25, configured to receive the node disk state report of the cloud computing system from the nodes.

As shown in Fig. 3, a cloud computing system according to the present invention comprises: a client 31, a processing apparatus 32, nodes 33, and the disks 34 corresponding to the nodes 33;

the processing apparatus 32 is configured to: receive an operation request from the client 31 to the cloud computing system; acquire, according to the operation request, the identifier of the data to be operated on in the cloud computing system; look up, according to the node disk state report of the cloud computing system, each disk in each node 33 that stores the data corresponding to the data identifier, and the state of each such disk 34, wherein the node disk state report comprises the state of each disk in each node 33 of the cloud computing system and the data identifiers corresponding to the data stored on each disk; and perform the corresponding operation according to the state of each disk 34, in each node of the cloud computing system, that stores the data corresponding to the data identifier;

the nodes 33 are configured to send node disk state reports to the processing apparatus 32.

Two application scenarios of the present invention are described below.

The first application scenario describes how availability is achieved in a cloud computing distributed cache system with multiple disk paths when disks fail.

Preconditions: the client is connected to multiple server nodes in the distributed cache system; the server nodes are connected to one another and operate normally. Each server has several disks for data persistence, and different data shards are persisted on different disks. The number of data copies is N, the number of copies a read request accesses is R, and the minimum number of copies that a write request must successfully update is W. The maximum single-request fault tolerance of the system is O (meaning that requests on O nodes are allowed to fail; for a single point of failure, O = 1, with O < W). Consistency requires W + R > N.

Step A: normally, all disks on every node work properly, and each data item has N copies in the system. When the client initiates a data update request, the master updates the data on the disk holding it; the slaves synchronize the data from the master and update it on their own disks. Once the update has completed successfully on W nodes, an update-success message is returned to the client.

When the client initiates a data access request, the request is processed by the master/slaves; after copies of the data have been obtained from the disks holding it on R nodes, the latest of these R copies is returned to the client.

Step B: when node A starts, it finds that a certain disk has failed and cannot be accessed while its other disks remain normal; or, while node A is running, repeated accesses to a certain disk fail and the disk is judged to have failed. Node A is not switched to the node-failure state; instead it continues to provide read/write service, and at the same time records the failed disk and the identifiers of the data copies stored on it.
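The detection rule in step B can be sketched as a per-disk failure counter. The threshold of 3 and the choice to count consecutive failures are assumptions (the text says only that accesses fail "repeatedly"), and the class name is hypothetical. The point illustrated is that the node isolates the disk and records its data identifiers instead of declaring itself failed.

```python
class DiskMonitor:
    """Per-node disk fault tracking: a bad disk is isolated, the node keeps serving."""

    def __init__(self, contents: dict[str, set[str]], threshold: int = 3) -> None:
        self.contents = contents            # disk -> identifiers of data stored on it
        self.threshold = threshold          # assumed number of consecutive failures
        self.consecutive: dict[str, int] = {}
        self.faulty: set[str] = set()

    def disk_unreachable_at_startup(self, disk: str) -> None:
        self.faulty.add(disk)

    def record_access(self, disk: str, ok: bool) -> None:
        if ok:
            self.consecutive[disk] = 0      # a success resets the failure streak
            return
        self.consecutive[disk] = self.consecutive.get(disk, 0) + 1
        if self.consecutive[disk] >= self.threshold:
            self.faulty.add(disk)           # disk judged failed; node is not marked failed

    def serves(self, disk: str) -> bool:
        return disk not in self.faulty

    def failed_data_ids(self) -> set[str]:
        """Identifiers recorded for the data copies on failed disks."""
        return set().union(*(self.contents[d] for d in self.faulty))


monitor = DiskMonitor({"I": {"copy 1"}, "II": {"copy 2"}, "III": {"copy 3"}})
for _ in range(3):
    monitor.record_access("I", ok=False)
print(monitor.serves("I"), monitor.serves("II"))  # False True
print(monitor.failed_data_ids())                   # {'copy 1'}
```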

Step C: when the client initiates a data update request and the data happens to reside on the failed disk of node A from step B, node A directly returns failure when the data is updated on the nodes; once the update has completed successfully on W nodes (not including node A), an update-success message is returned to the client.

When the client initiates a data access request for such data, node A directly returns failure; the request is processed by the master/slaves, and after copies have been obtained from the disks holding the data on R nodes (not including node A), the latest of these R copies is returned to the client.
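Step C's write path can be sketched as sending the update to every replica and counting confirmations: node A's direct failure simply contributes no acknowledgement. The function is hypothetical and sequential for clarity. It attempts every replica rather than stopping at W, as a design choice so that the remaining healthy copies are also brought up to date.

```python
from typing import Callable


def quorum_update(replica_nodes: list[str], apply_update: Callable[[str], bool], w: int) -> bool:
    """Send an update to every replica; report success if at least W replicas confirm.

    apply_update(node) returns False when that node answers with a direct
    failure, for example because the data sits on one of its failed disks.
    """
    acks = 0
    for node in replica_nodes:  # sequential here; a real system would fan out in parallel
        if apply_update(node):
            acks += 1
    return acks >= w


failed = {("A", "copy 1")}  # node A's failed disk holds copy 1


def apply_update(node: str) -> bool:
    return (node, "copy 1") not in failed


print(quorum_update(["A", "B", "C"], apply_update, w=2))  # True: B and C confirm
```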

Step D: when the client initiates a data update or access request and the data does not reside on the failed disk of node A from step B, the request is processed as in step A.

Step E: while node B is running, repeated accesses to a certain disk fail and the disk is judged to have failed. Node B is not switched to the node-failure state; instead it continues to provide read/write service, and at the same time records the failed disk and the identifiers of the data copies stored on it.

Assume that the copies stored on the failed disk of node B do not overlap with those stored on the failed disk of node A. Proceed to the next step.

Step F: when the client initiates a data update request and the data happens to reside on the failed disk of node B from step E (and therefore, by the above assumption, not on the failed disk of node A from step B), node B directly returns failure when the data is updated on the nodes; once the update has completed successfully on W nodes (not including node B), an update-success message is returned to the client.

When the client initiates a data access request for such data, node B directly returns failure; the request is processed by the master/slaves, and after copies have been obtained from the disks holding the data on R nodes (not including node B), the latest of these R copies is returned to the client.

Step G: when the client initiates a data update or access request and the data happens to reside on the failed disk of node A from step B (and therefore, by the above assumption, not on the failed disk of node B from step E), the request is processed as in step C, and the data is updated and accessed normally.

The present invention thus provides a method for improving the availability of a distributed cache system when multiple disks fail, which enhances system availability without weakening consistency and thereby improves the application experience.

The second application scenario is described below with reference to Fig. 4 and Fig. 5.

Specifically, it describes in detail how availability is achieved in a master-slave storage system in 3/2/2 mode, both when disks fail on a single node and when disks fail on multiple nodes at the same time.

The distributed cache system consists of server nodes and clients. For a given data item, one master node (master) processes the client's update and access requests, and several slave nodes synchronize the master's data and receive the client's data access requests (slaves do not process data update requests).

Environment: a distributed cache storage system made up of 3 nodes, in which each data item has three copies and data is updated and accessed in 3/2/2 mode (N = 3, R = 2, W = 2).

The procedure comprises the following steps:

Step 1: in the initial, normal phase, the system accepts client requests. Assume the data is copy 1 (corresponding to the data identifier above), located on disk I of node A, disk I of node B, and disk III of node C. For simplicity, assume that copy 1 on node B is the master and the copies on the other two nodes are slaves; that copy 2 on node A is the master and the copies on the other two nodes are slaves; and that copy 3 on node A is the master and the copies on the other two nodes are slaves.

Step 2: when the client initiates a data update request, the master on node B updates copy 1 on its disk I; the slaves synchronize the data from the master and update it on their own disks. Once the update has completed successfully on W = 2 nodes, an update-success message is returned to the client. Because all disks are normal, all copies are in fact updated successfully. When the client initiates a data access request, all three nodes process the request; once copies have been obtained from the disks holding the data on R = 2 nodes, the result is returned to the client, and in fact the copies on all nodes are read successfully.

Step 3: as shown in Fig. 4, assume that disk I on node A is damaged, making its copy 1 unavailable. When the client's update request targets copy 1 on node A, the master on node B updates copy 1 on its disk I, and the slave on node C synchronizes the data from the master and updates its copy on disk III; once the update has completed successfully on W = 2 nodes, an update-success message is returned to the client.

When the client's data access request targets copy 1 on node A, node A directly returns failure, and the result is returned to the client after the data has been obtained from copy 1 on node B and node C (satisfying R = 2).

Step 4: in the situation of step 3, when the client's update or access request targets copy 2 or copy 3 on node A, the copies on all three nodes are available, so the request is processed as in step 2.

Step 5: as shown in Fig. 5, disk II on node B is also damaged, making node B's copy 3 unavailable. When the client's update or access request targets copy 1 on node A, the copies on node B and node C are both available and the NRW policy is satisfied, so the request is processed as in step 3.

Step 6: in the situation of step 5, when the client's update or access request targets copy 2 on node A, copy 2 is available on all three nodes, so the request is processed as in step 2.

Step 7: in the situation of step 5, when the client's update request targets copy 3 on node A, copy 3 on node B is damaged while copy 3 on node C is available. The master on node A updates copy 3 on its disk III, and the slave on node C synchronizes the data from the master and updates copy 3 on its disk II; once the update has completed successfully on W = 2 nodes, an update-success message is returned to the client.

When the client's data access request targets copy 3 on node A, node B directly returns failure, and the result is returned to the client after the data has been obtained from copy 3 on node A and node C (satisfying R = 2).

As the above shows, even when both node A and node B have disk failures, the distributed cache cluster can still provide read/write service for all data, as long as the damaged disks do not hold copies of the same data.
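The Fig. 4/Fig. 5 situation can be checked mechanically by counting, for each data item, the copies that do not sit on a failed disk. The placement of copy 2 on nodes B and C is not stated in the text and is assumed here, as are the names used. The second call shows the case where two failed disks hold copies of the same data item: only that item loses service.

```python
# Copy placement in Figs. 4 and 5: data id -> {node: disk}.
# Copy 2's disks on nodes B and C are NOT given in the text; they are assumed.
LAYOUT = {
    "copy 1": {"A": "I", "B": "I", "C": "III"},
    "copy 2": {"A": "II", "B": "III", "C": "I"},  # B/C placement assumed
    "copy 3": {"A": "III", "B": "II", "C": "II"},
}
R, W = 2, 2  # NRW = 3/2/2


def serviceable(failed_disks: set[tuple[str, str]]) -> dict[str, bool]:
    """For each data id, can it still be both read and updated?"""
    result = {}
    for data_id, placement in LAYOUT.items():
        live = sum(1 for node, disk in placement.items() if (node, disk) not in failed_disks)
        result[data_id] = live >= max(R, W)
    return result


# Fig. 5: disk I on node A and disk II on node B have failed.
print(serviceable({("A", "I"), ("B", "II")}))
# {'copy 1': True, 'copy 2': True, 'copy 3': True}

# Failed disks holding copies of the same data: only that data is affected.
print(serviceable({("A", "I"), ("B", "I")}))
# {'copy 1': False, 'copy 2': True, 'copy 3': True}
```

Under the whole-node model from the background section, nodes A and B would both count as down in the Fig. 5 case, leaving every item with one copy and no service at all.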

In the above application scenario there are two faulty nodes, but in each only some disks have failed. In the favorable case where the damaged disks do not hold copies of the same data, the available disks across the whole system still hold at least two copies of every data item, so all services can be provided normally. Even if the damaged disks do hold copies of the same data, the data on the other, available disks still satisfies consistency and availability and can be served; only the data whose copies were damaged simultaneously cannot be read or written.

The beneficial effects of the present invention are as follows:

For a distributed cache system in which disks fail, the present invention makes full use of the resources that remain available, assembling the copies needed to meet consistency and availability requirements, thereby maximizing availability and improving fault tolerance. That is, for distributed cache systems in the cloud computing field, it provides a disk and data management mechanism that, even when some disks of a node fail, still uses the data on the available disks as far as possible and keeps the ability to provide service, so that the server side can provide consistent and available storage service with fewer disk or data resources.

The above are preferred embodiments of the present invention. It should be noted that those skilled in the art may make several improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A processing method for a cloud computing system, characterized by comprising:

receiving an operation request from a client to the cloud computing system;

2. The method according to claim 1, characterized in that the step of performing the corresponding operation according to the state of each disk comprises:

3. The method according to claim 2, characterized in that the step of responding to the update request, when the number of disks in the cloud computing system that store the data and are in the normal state is greater than or equal to the predetermined minimum number of nodes participating in a data update request of the cloud computing system, comprises:

4. The method according to claim 2, characterized in that the step of responding to the data access request, when the number of disks in the cloud computing system that store the data and are in the normal state is greater than or equal to the predetermined number of data copies obtained by a data access request of the cloud computing system, comprises:

5. The method according to claim 1, characterized in that, before the step of receiving the operation request from the client, the method further comprises:

6. A processing apparatus for a cloud computing system, characterized by comprising:

7. The apparatus according to claim 6, characterized in that the operating unit comprises:

8. The apparatus according to claim 6, characterized by further comprising:

9. A cloud computing system, characterized by comprising: a client, a processing apparatus, nodes, and the disks corresponding to the nodes;

the processing apparatus is configured to: receive an operation request from the client to the cloud computing system; acquire, according to the operation request, the identifier of the data to be operated on in the cloud computing system; look up, according to the node disk state report of the cloud computing system, each disk in each node of the cloud computing system that stores the data corresponding to the data identifier, and the state of each such disk, wherein the node disk state report comprises the state of each disk in each node of the cloud computing system and the data identifiers corresponding to the data stored on each disk; and perform the corresponding operation according to the state of each disk, in each node of the cloud computing system, that stores the data corresponding to the data identifier.

10. The system according to claim 9, characterized in that the nodes send node disk state reports to the processing apparatus.