CN106339279B - Service recovery method and device - Google Patents

Service recovery method and device

Info

Publication number
CN106339279B
CN106339279B (application CN201610720759.6A)
Authority
CN
China
Prior art keywords
area
log data
server
index table
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610720759.6A
Other languages
Chinese (zh)
Other versions
CN106339279A (en)
Inventor
杜鑫
陆强
黄哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Information Technologies Co Ltd
Original Assignee
New H3C Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Technologies Co Ltd filed Critical New H3C Technologies Co Ltd
Priority to CN201610720759.6A priority Critical patent/CN106339279B/en
Publication of CN106339279A publication Critical patent/CN106339279A/en
Application granted granted Critical
Publication of CN106339279B publication Critical patent/CN106339279B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

Embodiments of the invention provide a service recovery method and apparatus applied to an HBase system. The method includes: after sensing that a first area server has failed, searching a stored index table and obtaining, from the index table, the first area corresponding to the first area server and the first log data; storing the first log data corresponding to the first area; and allocating the first area to a normally running second area server, so that when the second area server loads the first area, service recovery is performed according to the first log data corresponding to the first area. With the embodiments of the invention, after an area server fails, the area corresponding to the failed area server and the log data corresponding to that area can be looked up directly in the index table, without log splitting, which improves the efficiency of service recovery after an area server failure.

Description

Service recovery method and device
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a service recovery method and apparatus.
Background
HBase (Hadoop Database) is a column-oriented, scalable, real-time read-write distributed storage database. It can process complex tasks in parallel and in a distributed manner, and has high processing performance and reliability.
HBase runs on top of the Hadoop Distributed File System (HDFS) 120. Its general structure is shown in FIG. 1 and includes: a host (Master) 101, region servers (RegionServer) 102, a coordination server (Zookeeper) 103, and a client 104.
The client 104 is the interface through which a user accesses HBase.
The Master 101 is connected to each RegionServer 102 and is responsible for managing the respective RegionServers 102.
The RegionServers 102 are the core modules of HBase. According to the Regions allocated to it, each RegionServer 102 receives the user's read-write requests to the database through the client 104, responds to those requests, and reads data from and writes data to the distributed file system 120.
Zookeeper 103 coordinates data sharing and access among the network elements in the cluster, stores the location of every RegionServer, and monitors the state of each RegionServer in real time. In addition, Zookeeper 103 stores other information required for HBase to operate.
Specifically, each RegionServer 102 manages multiple HRegion objects. Each HRegion corresponds to one Region of a database table (Table). In HBase, a Region is allocated to only one RegionServer.
The working principle of the RegionServer 202 is shown in FIG. 2, which illustrates a case where one RegionServer 202 is allocated three HRegions 203.
The RegionServer 202 receives a user's read-write request through the client 201, determines which Region the data in the request belongs to, and submits the request to the corresponding HRegion 203 object. The HRegion 203 object writes the data of the request into the corresponding memory 204 and further reads and writes the data in the distributed file system, thereby completing the operation corresponding to the request. In this process, the HRegion 203 object generates log data for the read-write request and stores it into the HLog. One RegionServer 202 corresponds to one HLog, in which the logs of all the HRegions 203 managed by that RegionServer 202 are stored. The log data in the HLog is stored in the order in which it was generated, so the log data of the different Regions is mixed together.
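For illustration only (simplified, hypothetical types, not HBase's actual WAL classes), the interleaving described above can be pictured as follows: one RegionServer's HLog appends the records of all of its Regions in arrival order, so the entries of different Regions end up mixed together.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for one WAL/HLog record; not HBase's actual WAL entry class.
record LogEntry(String region, long sequenceId, String payload) {}

public class HLogSketch {
    public static void main(String[] args) {
        // One HLog per RegionServer: entries of Region1, Region2, Region3
        // are appended in the order the writes arrive, so they end up interleaved.
        List<LogEntry> hlog = new ArrayList<>();
        hlog.add(new LogEntry("Region1", 1, "put row-a"));
        hlog.add(new LogEntry("Region3", 2, "put row-x"));
        hlog.add(new LogEntry("Region1", 3, "delete row-b"));
        hlog.add(new LogEntry("Region2", 4, "put row-m"));

        // Recovering a single Region therefore requires scanning the whole file.
        hlog.forEach(e -> System.out.println(e.region() + " seq=" + e.sequenceId()));
    }
}
```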
In a distributed system, errors and downtime cannot be avoided. Once a RegionServer exits unexpectedly due to a fault, the data in its memory is lost and the service is interrupted.
To recover the interrupted service, the Master assigns the Regions of the failed RegionServer to a new RegionServer. The new RegionServer must obtain the log data corresponding to each Region of the failed RegionServer before the service can be recovered for each of those Regions.
Because the log data of the different Regions is mixed together in the HLog, the HLog must first be split so that the log data corresponding to each Region is separated out.
Currently, there are two ways to perform log splitting:
First, the Master senses the fault of a RegionServer through Zookeeper, reads all the log data in the HLog corresponding to the failed RegionServer, splits that log data, and stores the log data of each Region separately; the new RegionServer then obtains, from the stored log data, the log data of the Regions allocated to it.
Second, the Master senses the fault of a RegionServer through Zookeeper, generates multiple log-splitting tasks for all the log data in the HLog corresponding to the failed RegionServer, and distributes these tasks to different RegionServers, which complete them in parallel.
Although both modes can complete the log split, in the first mode all the processing is done by the Master; because the volume of log data is huge, this takes a long time, overloads the I/O and the memory, and yields low processing efficiency. In the second mode, the processing occupies a large amount of computing resources on the different RegionServers, which consumes resources and also yields low processing efficiency.
Therefore, in the prior art, because log splitting is required and its processing efficiency is low, the efficiency of service recovery is also low.
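To make the cost of that step concrete, the prior-art log split amounts to scanning the whole mixed HLog of the failed RegionServer and regrouping every entry by Region, roughly as in the sketch below (simplified, hypothetical types; an illustration of the idea rather than HBase's actual WALSplitter implementation).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified WAL record used only for illustration.
record WalRecord(String region, long sequenceId, String payload) {}

public class LogSplitSketch {
    // Prior-art style split: the entire mixed log must be scanned once and
    // regrouped per Region before any Region can be reassigned and replayed.
    static Map<String, List<WalRecord>> splitByRegion(List<WalRecord> hlog) {
        Map<String, List<WalRecord>> perRegion = new LinkedHashMap<>();
        for (WalRecord r : hlog) {
            perRegion.computeIfAbsent(r.region(), k -> new ArrayList<>()).add(r);
        }
        return perRegion;
    }

    public static void main(String[] args) {
        List<WalRecord> hlog = List.of(
                new WalRecord("Region1", 1, "put row-a"),
                new WalRecord("Region2", 2, "put row-m"),
                new WalRecord("Region1", 3, "delete row-b"));
        splitByRegion(hlog).forEach((region, entries) ->
                System.out.println(region + " -> " + entries.size() + " entries"));
    }
}
```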
Disclosure of Invention
Embodiments of the invention aim to provide a service recovery method and apparatus to improve the efficiency of service recovery after an area server fails. The specific technical solutions are as follows:
In one aspect, an embodiment of the invention discloses a service recovery method applied to an HBase system, comprising the following steps:
after sensing that a first area server fails, searching a stored index table, and acquiring a first area corresponding to the first area server and first log data in the index table; the index table is used for recording the corresponding relation among the area server, the area and the log data;
storing first log data corresponding to the first area;
and distributing the first area to a second area server which normally runs, so that when the second area server loads the first area, service recovery is carried out according to the stored first log data corresponding to the first area.
Preferably, the index table further includes status information, and the status information is used to indicate an operation status of the area server;
the allocating the first zone to a second zone server that operates normally includes:
selecting a normally operating regional server according to the state information in the index table;
and taking the selected area server as the second area server, and allocating the first area to the second area server.
Preferably, the method further comprises: before the first area server fails, first log data generated in each read-write operation executed for the first area is recorded in the index table.
Preferably, after the first zone is allocated to a second zone server which normally operates, the method further includes:
recording second log data in the index table, so that the index table records the corresponding relation between the second area server and the first area as well as the second log data; the second log data includes first log data and log data generated by the second area server in read-write operation performed on the first area.
Preferably, the method further comprises:
after the fault recovery of the first regional server is sensed, searching a stored index table, and obtaining a first region distributed to the second regional server and second log data in the index table;
storing second log data corresponding to the first area;
and allocating the first area back to the first area server, so that when the first area server loads the first area, service switching back is carried out according to second log data corresponding to the first area.
Preferably, the number of the first areas corresponding to the first area server is multiple;
the step of storing the first log data corresponding to the first area is as follows: storing first log data corresponding to each first area;
the step of allocating the first area to a second area server which normally operates is as follows: and respectively allocating each first area to one or more second area servers which normally run, so that when each second area server loads the allocated first area, service recovery is carried out according to the stored first log data corresponding to the first area.
In another aspect, an embodiment of the invention discloses a service recovery apparatus applied to an HBase system, comprising:
the first table look-up unit is used for looking up a stored index table after sensing that a first area server fails, and acquiring a first area corresponding to the first area server and first log data in the index table; the index table is used for recording the corresponding relation among the area server, the area and the log data;
a first storage unit, configured to store first log data corresponding to the first area;
and the first allocation unit is used for allocating the first area to a second area server which normally runs, so that when the second area server loads the first area, service recovery is carried out according to the stored first log data corresponding to the first area.
Preferably, the apparatus further comprises: a state information unit, configured to set state information indicating the operation state of the area server;
the first distribution unit includes:
the selecting subunit is used for selecting the area server which normally runs according to the state information in the index table;
and an allocation subunit, configured to use the selected area server as the second area server, and allocate the first area to the second area server.
Preferably, the apparatus further comprises:
and a first log recording unit configured to record, in the index table, first log data generated for each read/write operation performed on the first area before the first area server fails.
Preferably, the apparatus further comprises:
a second log recording unit, configured to record second log data in the index table, so that the index table records a corresponding relationship between the second area server and the first area, and the second log data; the second log data includes first log data and log data generated by the second area server in read-write operation performed on the first area.
Preferably, the apparatus further comprises:
the second table look-up unit is used for looking up the stored index table after sensing the fault recovery of the first area server, and acquiring a first area and second log data which are distributed to the second area server in the index table;
the second storage unit is used for storing second log data corresponding to the first area;
a second allocation unit, configured to allocate the first area back to the first area server, so that when the first area server loads the first area, the service switch-back is performed according to the second log data corresponding to the first area.
As can be seen from the foregoing technical solutions, in the service recovery method and apparatus provided by the embodiments of the invention, the index table records the correspondence among the area server, the area, and the log data. That is, the log data corresponding to each area has already been recorded in the index table, so that after an area server fails, the area corresponding to the failed area server and the log data corresponding to that area can be looked up directly in the index table, without log splitting, thereby improving the efficiency of service recovery after an area server failure. Of course, it is not necessary for any product or method embodying the invention to achieve all of the above advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of a prior art Hbase system;
FIG. 2 is a schematic diagram of the working principle of a prior art RegionServer;
fig. 3 is a schematic flow chart of a service recovery method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the working principle of a RegionServer to which a service recovery method according to an embodiment of the present invention is applied;
fig. 5 is another schematic flow chart of a service recovery method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a service recovery apparatus according to an embodiment of the present invention;
fig. 7 is another schematic structural diagram of a service restoration apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a service recovery method and a service recovery device, which are used for improving the efficiency of service recovery after a regional server fails.
It should be noted that, in the embodiments of the present invention, an index table is preset in the HBase system, and every normally operating region server (RegionServer) writes the log data corresponding to each Region allocated to it into this index table. In the method and apparatus for recovering a service after a RegionServer fault, the Master in the HBase system preferably performs the processing based on the index table, thereby realizing the service recovery.
Referring to fig. 3, fig. 3 is a schematic flow chart of a service restoration method provided in an embodiment of the present invention, and the method is applied to an Hbase system, and the method includes:
s301: after sensing that a first RegionServer fails, searching a stored index table, and obtaining a first Region corresponding to the first RegionServer and first log data in the index table; the index table is used for recording the corresponding relation among the RegionServer, the Region and the log data;
specifically, the failure of the first RegionServer may be understood as a failure of one RegionServer in the Hbase system, or a failure of multiple regionservers. The first Region corresponding to the first Region server may be one or more regions. When one or more regions servers in the Hbase system simultaneously fail, the Master senses the failed Region or regions servers through the zookeeper and searches and acquires the Region or regions corresponding to the failed Region or regions servers in the stored index table. In the index table, each RegionServer corresponds to one or more Regions, and each RegionServer also corresponds to log data corresponding to the first RegionS.
S302: storing first log data corresponding to the first Region;
s303: and distributing the first Region to a second Region server which normally runs, so that when the second Region server loads the first Region, service recovery is carried out according to the stored first log data corresponding to the first Region.
Specifically, when one or more RegionServers in the HBase system fail, the Regions corresponding to those RegionServers in the index table are allocated to normally operating RegionServers in the HBase system. It should be noted that, if the number of Regions corresponding to the failed RegionServer in the index table is small, all of those Regions may be allocated to a single normally operating RegionServer; if the number is large, the Regions may be distributed among multiple normally operating RegionServers, so that the load on the RegionServers does not become uneven.
In practical applications, the index table may further include state information indicating the operating state of each RegionServer. For example, when a RegionServer operates normally, the Master sets its state information to an activated state, which may be represented in the index table as "Active"; when a RegionServer fails, the Master sets its state information to a locked state, which may be represented in the index table as "Block". That is, the state information of each normally operating RegionServer is set to "Active", and the state information of each failed RegionServer is set to "Block". The Master can therefore identify the failed RegionServer from the state information.
In this case, step S303 may include: searching the index table, obtaining the currently activated RegionServers, determining one or more of them as the second RegionServer(s) (that is, the RegionServers that are allowed to take over the first Region), and allocating the first Region of the failed RegionServer to the determined RegionServer(s). Here, the one or more RegionServers may be selected according to the load of the normally operating RegionServers; for example, if a RegionServer has fewer corresponding Regions in the index table, the first Region is allocated to that RegionServer as the second RegionServer.
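A minimal sketch of this selection step, assuming a simplified view of the index table (the class and field names below are hypothetical), is to filter on the state information and prefer the activated RegionServer that currently holds the fewest Regions.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class SelectSecondServerSketch {
    record ServerInfo(String name, boolean active, int regionCount) {}

    // Pick a normally running ("Active") RegionServer; among those, the one
    // with the fewest Regions, so that reassignment keeps the load balanced.
    static Optional<String> selectSecondServer(List<ServerInfo> servers) {
        return servers.stream()
                .filter(ServerInfo::active)
                .min(Comparator.comparingInt(ServerInfo::regionCount))
                .map(ServerInfo::name);
    }

    public static void main(String[] args) {
        List<ServerInfo> servers = List.of(
                new ServerInfo("RegionServer1", false, 2),   // failed -> Block
                new ServerInfo("RegionServer2", true, 2),
                new ServerInfo("RegionServer3", true, 3));
        System.out.println(selectSecondServer(servers).orElse("none"));
    }
}
```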
In addition, in step S302, after the log data corresponding to the one or more Regions of the failed RegionServer is acquired from the index table, that log data may be stored in a designated location associated with the first Region, where the first Region is one Region or multiple Regions.
In this way, in step S303, when the second RegionServer loads the allocated first Region, it obtains the first log data corresponding to that first Region from the designated location associated with the Region, and performs service recovery.
In the embodiment of the present invention, after the first Region is allocated to the second Region server which normally operates, first log data corresponding to the first Region may be added to the index table, where the first log data corresponds to the second Region server and the first Region.
In an implementation manner of the embodiment of the present invention, before the first RegionServer fails, the first log data generated by each read-write operation executed for the first Region is recorded in the index table. It can be understood that when no RegionServer in the HBase system has failed, that is, when all RegionServers are operating normally, each RegionServer generates corresponding log data for every read-write operation performed on one of its Regions and writes that log data into the index table.
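In other words, during normal operation every read-write operation is mirrored into the index table under the RegionServer and Region that served it; a minimal sketch of this recording step follows (hypothetical names such as recordLog, not an HBase API).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NormalOperationLoggingSketch {
    // index table: RegionServer -> (Region -> log data for that Region)
    static final Map<String, Map<String, List<String>>> INDEX_TABLE = new LinkedHashMap<>();

    // Called for every read-write operation a RegionServer performs on one of
    // its Regions; the generated log data is written straight into the index
    // table, already keyed by Region.
    static void recordLog(String regionServer, String region, String logData) {
        INDEX_TABLE.computeIfAbsent(regionServer, s -> new LinkedHashMap<>())
                   .computeIfAbsent(region, r -> new ArrayList<>())
                   .add(logData);
    }

    public static void main(String[] args) {
        recordLog("RegionServer1", "Region1", "Log1: put row-a");
        recordLog("RegionServer1", "Region2", "Log2: put row-m");
        System.out.println(INDEX_TABLE);
    }
}
```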
Referring to FIG. 4, FIG. 4 is a schematic diagram of the working principle of a RegionServer to which the service recovery method provided by an embodiment of the present invention is applied. FIG. 4 shows a case where six Regions (Region1, Region2, Region3, Region4, Region5, and Region6) are allocated to three RegionServers (RegionServer1, RegionServer2, and RegionServer3), with two Regions allocated to each RegionServer.
In this embodiment, the Hbase system further stores an index table, specifically, the index table may be stored in any designated location that can be associated with the Master, the RegionServer, and the Region, and the index table is a table established according to a corresponding relationship among the RegionServer, the Region, and the log data.
For example, specifically, before the RegionServer1 fails, first log data is generated at each read-write operation performed on the Region1 and the Region2 of the RegionServer1 and added to the index table, and the first log data corresponds to the RegionServer1, the Region1, and the Region 2.
For the case illustrated in FIG. 4, the index table before RegionServer1 fails is shown in Table 1, where the Log entries in the table are log data; specifically, Log1 and Log2 are the first log data, which correspond to RegionServer1.
Table 1 (filed as a drawing and not reproduced here): the index table before RegionServer1 fails, listing each RegionServer together with its Regions, the log data of each Region (Log1 for Region1 and Log2 for Region2 of RegionServer1), and its state information.
As shown in Table 2: when the Master senses that RegionServer1 has failed, it sets the state information of RegionServer1 to "Block", allocates Region1 and Region2 of RegionServer1 in the index table to the normally operating RegionServer2 and RegionServer3 respectively, and at the same time adds the first log data corresponding to Region1 and Region2 to the entries of RegionServer2 and RegionServer3 respectively. The Log entries in the table are log data; specifically, Log1 and Log2 are the first log data, i.e. the log data corresponding to RegionServer1. It should be noted that, in this embodiment, the first RegionServer is RegionServer1, the second RegionServers are RegionServer2 and RegionServer3, and the first Regions are Region1 and Region2.
Table 2 (filed as a drawing and not reproduced here): the index table after RegionServer1 fails, with RegionServer1 marked as "Block", Region1 reassigned to RegionServer2 and Region2 reassigned to RegionServer3, each together with its corresponding log data.
As shown in Table 2, when RegionServer1 fails, its state becomes "Block". The Master stores Log1 and Log2 (i.e. the first log data) corresponding to Region1 and Region2 of RegionServer1, and allocates Region1 and Region2, together with the first log data written for them, to RegionServer2 and RegionServer3 respectively (load balancing is considered here, so Region1 is allocated to RegionServer2 and Region2 to RegionServer3; that is, RegionServer2 and RegionServer3 are the second RegionServers in this embodiment). After Region1 and Region2 are allocated, RegionServer2 and RegionServer3 load them and continue to execute the service by reading and writing the data in Region1 and Region2, thereby realizing service recovery. Further, RegionServer2 and RegionServer3 generate log data Log1' and Log2' for the read-write operations they subsequently perform on Region1 and Region2 and record it in the index table, so that in the index table Region1 corresponds to RegionServer2 and Log1+Log1', and Region2 corresponds to RegionServer3 and Log2+Log2'. At this time, the index table records the corresponding relationship between the second RegionServer and the first Region, and the second log data, where the second log data includes the first log data and the log data generated by the second RegionServer during the read-write operations performed on the first Region (i.e. the second log data is Log1+Log1' and Log2+Log2').
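Putting the steps of Table 2 together, the failover path can be sketched as below (a simplified illustration under the same assumptions as the earlier sketches; names such as handleFailure are hypothetical): mark the failed RegionServer as "Block", read its Regions and first log data directly from the index table, and re-record them under the selected second RegionServers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FailoverSketch {
    // Region -> (owning RegionServer, accumulated log data)
    record RegionEntry(String server, List<String> logs) {}

    static final Map<String, RegionEntry> INDEX = new LinkedHashMap<>();
    static final Map<String, String> SERVER_STATE = new LinkedHashMap<>();

    // On failure of 'failed': look up its Regions and first log data directly in
    // the index table (no log split), then hand each Region and its logs to a
    // normally running second RegionServer.
    static void handleFailure(String failed, List<String> secondServers) {
        SERVER_STATE.put(failed, "Block");
        int i = 0;
        for (Map.Entry<String, RegionEntry> e : new LinkedHashMap<>(INDEX).entrySet()) {
            if (!e.getValue().server().equals(failed)) continue;
            String target = secondServers.get(i++ % secondServers.size());
            // The second server replays these logs when it loads the Region,
            // then keeps appending its own log data (Log + Log').
            INDEX.put(e.getKey(), new RegionEntry(target, new ArrayList<>(e.getValue().logs())));
        }
    }

    public static void main(String[] args) {
        SERVER_STATE.put("RegionServer1", "Active");
        INDEX.put("Region1", new RegionEntry("RegionServer1", List.of("Log1")));
        INDEX.put("Region2", new RegionEntry("RegionServer1", List.of("Log2")));
        handleFailure("RegionServer1", List.of("RegionServer2", "RegionServer3"));
        System.out.println(SERVER_STATE + " " + INDEX);
    }
}
```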
By applying this embodiment, when a RegionServer in the HBase system fails, the Region corresponding to the failed RegionServer can be obtained directly from the stored index table and allocated to a normally operating RegionServer in the HBase system. Therefore, in the embodiments of the invention, no log splitting is needed when a RegionServer fails: the Region corresponding to the failed RegionServer is obtained directly from the index table and allocated to a normally running RegionServer, service recovery is completed, and the efficiency of service recovery after a RegionServer failure is improved.
Referring to FIG. 5, FIG. 5 is another schematic flow chart of a service recovery method provided in an embodiment of the present invention, applied to an HBase system. Compared with the embodiment shown in FIG. 4, this embodiment adds a service switch-back step. In the method:
steps S501 to S503 are similar to steps S301 to S303 shown in fig. 3, and therefore, are not described again here.
The procedure for the service switch back from step S504 is as follows:
s504: after the fault recovery of the first RegionServer is sensed, searching a stored index table, and obtaining a first Region distributed to the second RegionServer and second log data in the index table;
specifically, after the failure of the first regionServer is recovered, the Master can know that the failure of the first regionServer is recovered through zookeeper perception, search the stored index table, and simultaneously obtain second log data corresponding to the first Region allocated to the second regionServer. Here, the first log data and the second log data are different, and the second log data includes the first log data and log data generated at the time of a read-write operation performed by a second RegionServer with respect to the first Region.
S505: storing second log data corresponding to the first Region;
The obtained second log data corresponding to the first Region is stored to a specified location associated with the first Region. Specifically, the designated location may be "retrieved.
S506: allocating the first Region back to the first RegionServer, so that when the first RegionServer loads the first Region, the service switch-back is performed according to the second log data corresponding to the first Region;
specifically, after the failure of the first regionServer is recovered, when the first regionServer loads the first Region corresponding to the regionServer, the second log data corresponding to the first Region allocated to the second Region are obtained from the stored second log data corresponding to the first Region, and the service switching is performed. It can be understood that after the failure of the first regionServer is recovered, when the first regionServer loads the first Region of the regionServer, the first regionServer obtains the second log data corresponding to the first Region from the specified position associated with the first Region to perform the service switching back.
In practical applications, after the first Region is allocated back to the first RegionServer together with the corresponding second log data, the method further includes: setting the state information of the first RegionServer in the index table to the activated state, the activated state indicating that the first RegionServer is running normally. In addition, whenever a RegionServer in the HBase system fails, the method of the invention can be used to recover the service.
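The switch-back after the first RegionServer recovers mirrors the failover above; a minimal sketch under the same assumptions (handleRecovery and the surrounding names are hypothetical) follows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FailbackSketch {
    // Region -> (current RegionServer, second log data = first logs + new logs)
    record RegionEntry(String server, List<String> logs) {}

    static final Map<String, RegionEntry> INDEX = new LinkedHashMap<>();
    static final Map<String, String> SERVER_STATE = new LinkedHashMap<>();

    // On recovery of 'recovered': take the Regions it originally owned (here
    // passed in explicitly), read their second log data from the index table,
    // allocate the Regions back, and set the server's state to Active again.
    static void handleRecovery(String recovered, List<String> originalRegions) {
        for (String region : originalRegions) {
            RegionEntry current = INDEX.get(region);
            if (current == null) continue;
            // The recovered server replays the second log data (e.g. Log1 + Log1')
            // when it loads the Region, so the service is switched back intact.
            INDEX.put(region, new RegionEntry(recovered, new ArrayList<>(current.logs())));
        }
        SERVER_STATE.put(recovered, "Active");
    }

    public static void main(String[] args) {
        SERVER_STATE.put("RegionServer1", "Block");
        INDEX.put("Region1", new RegionEntry("RegionServer2", List.of("Log1", "Log1'")));
        INDEX.put("Region2", new RegionEntry("RegionServer3", List.of("Log2", "Log2'")));
        handleRecovery("RegionServer1", List.of("Region1", "Region2"));
        System.out.println(SERVER_STATE + " " + INDEX);
    }
}
```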
By applying the above embodiments, after a RegionServer fails, the stored index table can be searched and the Region corresponding to that RegionServer, together with the log data corresponding to the Region, obtained directly; after the RegionServer recovers, the second log data is migrated back to that RegionServer's log record in the index table, so that the service is switched back to the original RegionServer. The embodiments of the invention thus omit the log-splitting process, improving not only the efficiency of service recovery after a RegionServer failure but also the efficiency of the service switch-back after the failure is recovered.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a service restoration apparatus according to an embodiment of the present invention, applied to an Hbase system, where the apparatus includes:
a first table look-up unit 601, configured to, after sensing that a first RegionServer fails, look up a stored index table, and obtain, in the index table, a first Region corresponding to the first RegionServer and first log data; the index table is used for recording the corresponding relation among the RegionServer, the Region and the log data;
a first storage unit 602, configured to store first log data corresponding to the first Region;
the first allocating unit 603 is configured to allocate the first Region to a second Region server that operates normally, so that when the second Region server loads the first Region, service recovery is performed according to stored first log data corresponding to the first Region.
In an implementation manner of the embodiment of the present invention, the apparatus further includes: a state information unit, configured to set state information indicating the running state of a RegionServer; for example, the state information of a normally operating RegionServer is set to the activated state, and the state information of a failed RegionServer is set to the locked state.
In an implementation manner of the embodiment of the present invention, the first allocating unit 603 includes:
the first table look-up subunit is used for selecting a normally running RegionServer according to the state information in the index table;
specifically, the index table is searched to obtain the region Server in the current activation state;
and the first allocation subunit is used for taking the selected Region server as a second Region server and allocating the first Region to the second Region server.
In an implementation manner of the embodiment of the present invention, the apparatus further includes: and the first log recording unit is used for recording first log data generated by each read-write operation executed aiming at the first Region into the index table before the first Region Server fails.
In an implementation manner of the embodiment of the present invention, the apparatus further includes: a second log recording unit, configured to record second log data in the index table, so that the index table records the corresponding relationship between the second RegionServer and the first Region, and the second log data; the second log data includes the first log data and the log data generated by the second RegionServer during the read-write operations performed on the first Region.
By applying the embodiments, when the Region server fails, log segmentation is not needed, the Region corresponding to the failed Region server is directly obtained in the index table and distributed to the normally running Region server, so that service recovery is completed, and the efficiency of service recovery after the Region server fails is improved.
Referring to fig. 7, fig. 7 is a schematic structural diagram of another service restoration apparatus provided in an embodiment of the present invention, and the apparatus is applied to an Hbase system, and the apparatus includes:
a first table look-up unit 701, configured to, after sensing that a first RegionServer fails, look up a stored index table, and obtain, in the index table, a first Region corresponding to the first RegionServer and first log data; the index table is used for recording the corresponding relation among the RegionServer, the Region and the log data;
a first storage unit 702, configured to store first log data corresponding to the first Region;
the first allocating unit 703 is configured to allocate the first Region to a second Region server that operates normally, so that when the second Region server loads the first Region, service recovery is performed according to the stored first log data corresponding to the first Region.
It should be noted that the first lookup table unit 701, the first storage unit 702, and the first allocation unit 703 in this embodiment may be similar to the first lookup table unit 601, the first storage unit 602, and the first allocation unit 603 in the embodiment shown in fig. 6, respectively, and therefore, description thereof is omitted here.
A second table look-up unit 704, configured to, after sensing that the first RegionServer failure is recovered, look up a stored index table, and obtain, in the index table, a first Region allocated to the second RegionServer and second log data;
a second storage unit 705, configured to store second log data corresponding to the first Region;
the second allocating unit 706 is configured to allocate the first Region back to the first Region server, so that when the first Region server loads the first Region, service cutback is performed according to second log data corresponding to the first Region.
In an implementation manner of the embodiment of the present invention, the index table further includes state information, so the apparatus further includes: the state setting unit is used for setting the state information of the normally running RegionServer into an activated state and setting the state information of the faulted RegionServer into a locked state;
in an implementation manner of the embodiment of the present invention, the first allocating unit 703 includes:
the selecting subunit is used for selecting the normally running RegionServer according to the state information in the index table;
and the distribution subunit is used for taking the selected Region server as a second Region server and distributing the first Region to the second Region server.
In an implementation manner of the embodiment of the present invention, the apparatus further includes: and the first log recording unit is used for recording first log data generated by each read-write operation executed aiming at the first Region into the index table before the first Region Server fails.
In an implementation manner of the embodiment of the present invention, the apparatus further includes: a second log recording unit, configured to record second log data in the index table, so that the index table records the corresponding relationship between the second RegionServer and the first Region, and the second log data; the second log data includes the first log data and the log data generated by the second RegionServer during the read-write operations performed on the first Region.
By applying the above embodiments, after the RegionServer fails, the stored index table can be searched, the Region corresponding to the RegionServer and the log data corresponding to the Region are directly obtained, the RegionServer is recovered, and the second log data is distributed back to the recovered RegionServer, so that the service is switched back to the original RegionServer. Therefore, the embodiment of the invention omits the process of log segmentation, not only improves the efficiency of service recovery after the fault of the RegionServer, but also improves the efficiency of service switching after the fault of the RegionServer is recovered.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (7)

1. A service restoration method is applied to an Hbase system, and comprises the following steps:
after sensing that a first area server fails, searching a stored index table, and acquiring a first area corresponding to the first area server and first log data in the index table; the index table is used for recording the corresponding relation among the area server, the area and the log data;
storing first log data corresponding to the first area;
distributing the first area to a second area server which normally runs, so that when the second area server loads the first area, service recovery is carried out according to stored first log data corresponding to the first area;
after assigning the first zone to a second zone server that is operating normally, the method further comprises:
recording second log data in the index table, so that the index table records the corresponding relation between the second area server and the first area as well as the second log data; the second log data comprises first log data and log data generated by the second area server during read-write operation executed on the first area;
the method further comprises the following steps:
after the fault recovery of the first regional server is sensed, searching a stored index table, and obtaining a first region distributed to the second regional server and second log data in the index table;
storing second log data corresponding to the first area;
and allocating the first area back to the first area server, so that when the first area server loads the first area, service switching back is carried out according to second log data corresponding to the first area.
2. The method of claim 1, wherein the index table further comprises status information, and the status information is used for indicating an operation status of the area server;
the allocating the first zone to a second zone server that operates normally includes:
selecting a normally operating regional server according to the state information in the index table;
and taking the selected area server as the second area server, and allocating the first area to the second area server.
3. The method of claim 1, further comprising: before the first area server fails, first log data generated in each read-write operation executed for the first area is recorded in the index table.
4. The method according to claim 1, wherein the first area server corresponds to a plurality of first areas;
the step of storing the first log data corresponding to the first area is as follows: storing first log data corresponding to each first area;
the step of allocating the first area to a second area server which normally operates is as follows: and respectively allocating each first area to one or more second area servers which normally run, so that when each second area server loads the allocated first area, service recovery is carried out according to the stored first log data corresponding to the first area.
5. A service restoration device, applied to an Hbase system, includes:
the first table look-up unit is used for looking up a stored index table after sensing that a first area server fails, and acquiring a first area corresponding to the first area server and first log data in the index table; the index table is used for recording the corresponding relation among the area server, the area and the log data;
a first storage unit, configured to store first log data corresponding to the first area;
the first allocation unit is used for allocating the first area to a second area server which normally runs, so that when the second area server loads the first area, service recovery is carried out according to stored first log data corresponding to the first area;
the device further comprises:
a second log recording unit, configured to record second log data in the index table, so that the index table records a corresponding relationship between the second area server and the first area, and the second log data; the second log data comprises first log data and log data generated by the second area server during read-write operation executed on the first area;
the device further comprises:
the second table look-up unit is used for looking up the stored index table after sensing the fault recovery of the first area server, and acquiring a first area and second log data which are distributed to the second area server in the index table;
the second storage unit is used for storing second log data corresponding to the first area;
a second allocation unit, configured to allocate the first area back to the first area server, so that when the first area server loads the first area, a service switch-back is performed based on the second log data corresponding to the first area.
6. The apparatus of claim 5, further comprising: a state information unit, configured to set state information indicating the operation state of the area server;
the first distribution unit includes:
the selecting subunit is used for selecting the area server which normally runs according to the state information in the index table;
and an allocation subunit, configured to use the selected area server as the second area server, and allocate the first area to the second area server.
7. The apparatus of claim 5, further comprising:
and a first log recording unit configured to record, in the index table, first log data generated for each read/write operation performed on the first area before the first area server fails.
CN201610720759.6A 2016-08-24 2016-08-24 Service recovery method and device Active CN106339279B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610720759.6A CN106339279B (en) 2016-08-24 2016-08-24 Service recovery method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610720759.6A CN106339279B (en) 2016-08-24 2016-08-24 Service recovery method and device

Publications (2)

Publication Number Publication Date
CN106339279A CN106339279A (en) 2017-01-18
CN106339279B true CN106339279B (en) 2021-10-12

Family

ID=57824872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610720759.6A Active CN106339279B (en) 2016-08-24 2016-08-24 Service recovery method and device

Country Status (1)

Country Link
CN (1) CN106339279B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117312B (en) * 2018-08-23 2022-03-01 北京小米智能科技有限公司 Data recovery method and device
CN111628893B (en) * 2020-05-27 2022-07-12 北京星辰天合科技股份有限公司 Fault processing method and device of distributed storage system and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7634679B2 (en) * 2005-11-30 2009-12-15 Microsoft Corporation Remote location failover server application
CN104424283A (en) * 2013-08-30 2015-03-18 阿里巴巴集团控股有限公司 Data migration system and data migration method
CN104636218A (en) * 2013-11-15 2015-05-20 腾讯科技(深圳)有限公司 Data recovery method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4415610B2 (en) * 2003-08-26 2010-02-17 株式会社日立製作所 System switching method, replica creation method, and disk device
JP4615344B2 (en) * 2005-03-24 2011-01-19 株式会社日立製作所 Data processing system and database management method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7634679B2 (en) * 2005-11-30 2009-12-15 Microsoft Corporation Remote location failover server application
CN104424283A (en) * 2013-08-30 2015-03-18 阿里巴巴集团控股有限公司 Data migration system and data migration method
CN104636218A (en) * 2013-11-15 2015-05-20 腾讯科技(深圳)有限公司 Data recovery method and device

Also Published As

Publication number Publication date
CN106339279A (en) 2017-01-18

Similar Documents

Publication Publication Date Title
US11586673B2 (en) Data writing and reading method and apparatus, and cloud storage system
CN108287669B (en) Date storage method, device and storage medium
US11243706B2 (en) Fragment management method and fragment management apparatus
US10838829B2 (en) Method and apparatus for loading data from a mirror server and a non-transitory computer readable storage medium
CN105630418A (en) Data storage method and device
US20180069944A1 (en) Automatic data replica manager in distributed caching and data processing systems
US9854037B2 (en) Identifying workload and sizing of buffers for the purpose of volume replication
US11073986B2 (en) Memory data versioning
EP3786802B1 (en) Method and device for failover in hbase system
CN112181736A (en) Distributed storage system and configuration method thereof
CN110633046A (en) Storage method and device of distributed system, storage equipment and storage medium
US11010072B2 (en) Data storage, distribution, reconstruction and recovery methods and devices, and data processing system
CN103150225B (en) Disk full abnormity fault tolerance method of object parallel storage system based on application level agent
CN106339279B (en) Service recovery method and device
US8621260B1 (en) Site-level sub-cluster dependencies
CN106970830B (en) Storage control method of distributed virtual machine and virtual machine
CN109032762B (en) Virtual machine backtracking method and related equipment
CN107340974B (en) Virtual disk migration method and virtual disk migration device
EP3264254B1 (en) System and method for a simulation of a block storage system on an object storage system
CN105068896A (en) Data processing method and device based on RAID backup
US20190050455A1 (en) Adaptive page rendering for a data management system
CN108769123B (en) Data system and data processing method
CN113687935A (en) Cloud native storage scheduling mode based on super-fusion design
CN107168646B (en) Distributed data storage control method and server
CN113485644A (en) IO data storage method and server

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
CB02 Change of applicant information

Address after: 310052, No. 466 Changhe Road, Binjiang District, Zhejiang, China

Applicant after: NEW H3C TECHNOLOGIES Co.,Ltd.

Address before: 310053, No. 310 Liuhe Road, Hangzhou Science and Technology Industrial Park, High-tech Industrial Development Zone, Zhejiang Province

Applicant before: HANGZHOU H3C TECHNOLOGIES Co.,Ltd.

SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230529

Address after: 310052 11th Floor, 466 Changhe Road, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: H3C INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 310052, No. 466 Changhe Road, Binjiang District, Hangzhou, Zhejiang Province

Patentee before: NEW H3C TECHNOLOGIES Co.,Ltd.