Background
HBase (Hadoop Database) is a non-relational, scalable, real-time read-write distributed storage database. It can process complex tasks in parallel and in a distributed manner, and offers high processing performance and reliability.
HBase is built on the Hadoop Distributed File System (HDFS) 120. Its general structure, shown in fig. 1, includes: a host (Master) 101, a Region server (RegionServer) 102, a coordination server (Zookeeper) 103, and a client 104.
The client 104 provides the interface through which a user accesses HBase.
The Master 101 is connected to each RegionServer 102 and is responsible for managing the respective RegionServers 102.
The RegionServers 102 are the core modules of HBase. Each RegionServer 102 receives, through the client 104, a user's read-write requests to the database for the Regions (Region) allocated to it, responds to those requests, and reads and writes the data in the distributed file system 120.
The Zookeeper 103 coordinates data sharing and access among the network elements in the cluster, stores the location of every RegionServer, and monitors the state of each RegionServer in real time. In addition, the Zookeeper 103 stores other information required for HBase to operate.
Specifically, each RegionServer 102 manages multiple HRegion objects. Each HRegion corresponds to one Region of a database table (Table). In HBase, one Region is allocated to only one RegionServer.
The operating principle of the RegionServer 202 is shown in fig. 2, which illustrates a case where one RegionServer 202 is allocated three HRegions 203.
The RegionServer 202 receives a user's read-write request through the client 201, determines which Region the data in the request belongs to, and submits the request to the corresponding HRegion 203 object. The HRegion 203 object writes the data of the request into the corresponding memory 204 and further reads and writes the data in the distributed file system, thereby completing the operation corresponding to the request. In this process, the HRegion 203 object generates log data for the read-write request and stores the log data into the HLog. One RegionServer 202 corresponds to one HLog, in which the logs of all the HRegions 203 managed by that RegionServer 202 are stored. The log data in the HLog are stored in the order in which they were generated, so the log data of the different Regions are mixed together.
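The write path and shared HLog described above can be sketched in a few lines of Python (a simplified model for illustration only; the class and field names here are invented, not HBase's actual Java API):

```python
import time

class HLogEntry:
    """One write-ahead-log record; entries from all Regions of a
    RegionServer are appended to the same HLog in generation order."""
    def __init__(self, region, key, value):
        self.region = region
        self.key = key
        self.value = value
        self.ts = time.time()

class RegionServer:
    def __init__(self, regions):
        self.memstore = {r: {} for r in regions}  # per-Region in-memory store
        self.hlog = []                            # single HLog shared by all Regions

    def write(self, region, key, value):
        # Log first, then apply the write to the Region's memstore.
        self.hlog.append(HLogEntry(region, key, value))
        self.memstore[region][key] = value

rs = RegionServer(["Region1", "Region2"])
rs.write("Region1", "row1", "a")
rs.write("Region2", "row9", "b")
rs.write("Region1", "row2", "c")
# The HLog interleaves entries of different Regions in arrival order.
print([e.region for e in rs.hlog])  # ['Region1', 'Region2', 'Region1']
```

The final print shows why recovery is awkward: the one HLog holds entries of every Region in time order, so a per-Region replay first requires separating them.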
In a distributed system environment, system errors and downtime cannot be avoided. Once a RegionServer exits unexpectedly due to a fault, the data in its memory is lost, causing a service interruption.
To recover the interrupted service, the Master assigns the Regions of the failed RegionServer to a new RegionServer. The new RegionServer must obtain the log data corresponding to each of these Regions before the service of each Region can be recovered.
Because the log data of the different Regions are mixed together in the HLog, the log data in the HLog must first be split to obtain the log data corresponding to each Region.
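The splitting step can be illustrated as follows (a minimal sketch using a hypothetical `(region, payload)` entry format; real WAL entries are binary records):

```python
# Each HLog entry is modelled as a (region, payload) pair, stored in
# generation order with entries of different Regions mixed together.
mixed_hlog = [
    ("Region1", "put row1=a"),
    ("Region2", "put row9=b"),
    ("Region1", "put row2=c"),
    ("Region2", "put row8=d"),
]

def split_hlog(hlog):
    """Group the mixed log into one per-Region log, preserving order."""
    per_region = {}
    for region, payload in hlog:
        per_region.setdefault(region, []).append(payload)
    return per_region

print(split_hlog(mixed_hlog))
# {'Region1': ['put row1=a', 'put row2=c'], 'Region2': ['put row9=b', 'put row8=d']}
```

In practice the HLog is huge, which is exactly why the two splitting modes below are slow: this grouping pass must touch every record.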
Currently, there are two ways to perform log splitting:
firstly, the Master senses through the Zookeeper that a RegionServer has failed, reads all the log data in the HLog corresponding to the failed RegionServer, splits the log data, and stores the log data of each Region separately; the new RegionServer then obtains, from the stored log data, the log data of the Regions allocated to it.
Secondly, the Master senses through the Zookeeper that a RegionServer has failed, generates a plurality of log-splitting tasks for all the log data in the HLog corresponding to the failed RegionServer, and distributes the tasks to different RegionServers, which complete them in parallel.
Although both modes can complete log splitting, in the first mode all processing is performed by the Master; because the volume of log data is huge, this takes a long time, overloads the Master's I/O and memory, and results in low processing efficiency. In the second mode, the processing occupies a large amount of computing resources on the different RegionServers, which consumes resources and likewise yields low processing efficiency.
Therefore, in the prior art, because log splitting is required and its processing efficiency is low, the efficiency of service recovery is not high.
Disclosure of Invention
The embodiments of the present invention aim to provide a service recovery method and apparatus so as to improve the efficiency of service recovery after a Region server fails. The specific technical solutions are as follows:
on one hand, the embodiment of the invention discloses a service recovery method, which is applied to an Hbase system and comprises the following steps:
after sensing that a first area server fails, searching a stored index table, and acquiring a first area corresponding to the first area server and first log data in the index table; the index table is used for recording the corresponding relation among the area server, the area and the log data;
storing first log data corresponding to the first area;
and distributing the first area to a second area server which normally runs, so that when the second area server loads the first area, service recovery is carried out according to the stored first log data corresponding to the first area.
Preferably, the index table further includes status information, and the status information is used to indicate an operation status of the area server;
the allocating the first zone to a second zone server that operates normally includes:
selecting a normally operating regional server according to the state information in the index table;
and taking the selected area server as a second area server, and allocating the first area to the second area server.
Preferably, the method further comprises: before the first area server fails, first log data generated in each read-write operation executed for the first area is recorded in the index table.
Preferably, after the first zone is allocated to a second zone server which normally operates, the method further includes:
recording second log data in the index table, so that the index table records the corresponding relation between the second area server and the first area as well as the second log data; the second log data includes first log data and log data generated by the second area server in read-write operation performed on the first area.
Preferably, the method further comprises:
after the fault recovery of the first regional server is sensed, searching a stored index table, and obtaining a first region distributed to the second regional server and second log data in the index table;
storing second log data corresponding to the first area;
and allocating the first area back to the first area server, so that when the first area server loads the first area, service switching back is carried out according to second log data corresponding to the first area.
Preferably, the number of the first areas corresponding to the first area server is multiple;
the step of storing the first log data corresponding to the first area is as follows: storing first log data corresponding to each first area;
the step of allocating the first area to a second area server which normally operates is as follows: and respectively allocating each first area to one or more second area servers which normally run, so that when each second area server loads the allocated first area, service recovery is carried out according to the stored first log data corresponding to the first area.
On the other hand, the embodiment of the invention discloses a service recovery device, which is applied to an Hbase system and comprises the following components:
the first table look-up unit is used for looking up a stored index table after sensing that a first area server fails, and acquiring a first area corresponding to the first area server and first log data in the index table; the index table is used for recording the corresponding relation among the area server, the area and the log data;
a first storage unit, configured to store first log data corresponding to the first area;
and the first allocation unit is used for allocating the first area to a second area server which normally runs, so that when the second area server loads the first area, service recovery is carried out according to the stored first log data corresponding to the first area.
Preferably, the apparatus further comprises: a state information unit, configured to set state information representing the operating state of the area server;
the first distribution unit includes:
the selecting subunit is used for selecting the area server which normally runs according to the state information in the index table;
and an allocation subunit, configured to use the selected area server as a second area server, and allocate the first area to the second area server.
Preferably, the apparatus further comprises:
and a first log recording unit configured to record, in the index table, first log data generated for each read/write operation performed on the first area before the first area server fails.
Preferably, the apparatus further comprises:
a second log recording unit, configured to record second log data in the index table, so that the index table records a corresponding relationship between the second area server and the first area, and the second log data; the second log data includes first log data and log data generated by the second area server in read-write operation performed on the first area.
Preferably, the apparatus further comprises:
the second table look-up unit is used for looking up the stored index table after sensing the fault recovery of the first area server, and acquiring a first area and second log data which are distributed to the second area server in the index table;
the second storage unit is used for storing second log data corresponding to the first area;
a second allocation unit, configured to allocate the first area back to the first area server, so that when the first area server loads the first area, service switch-back is performed according to the second log data corresponding to the first area.
As can be seen from the above technical solutions, in the service recovery method and apparatus provided by the embodiments of the present invention, the index table records the correspondence among the area server, the area, and the log data. That is to say, the log data corresponding to each area has already been recorded in the index table, so that after an area server fails, the areas corresponding to the failed area server and the log data corresponding to those areas can be looked up directly in the index table without performing log splitting, thereby improving the efficiency of service recovery after an area server fails. Of course, it is not necessary for any product or method practicing the invention to achieve all of the above-described advantages at the same time.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a service recovery method and a service recovery device, which are used for improving the efficiency of service recovery after a regional server fails.
It should be noted that, in the embodiments of the present invention, an index table is preset in the Hbase system, and each normally operating Region server (RegionServer) uniformly writes the log data corresponding to the Regions (Region) allocated to it into the index table. In the embodiments, service recovery after a RegionServer failure is preferably handled by the Master in the Hbase system on the basis of this index table.
Referring to fig. 3, fig. 3 is a schematic flow chart of a service restoration method provided in an embodiment of the present invention, and the method is applied to an Hbase system, and the method includes:
s301: after sensing that a first RegionServer fails, searching a stored index table, and obtaining a first Region corresponding to the first RegionServer and first log data in the index table; the index table is used for recording the corresponding relation among the RegionServer, the Region and the log data;
specifically, the failure of the first RegionServer may be understood as a failure of one RegionServer in the Hbase system, or a failure of multiple regionservers. The first Region corresponding to the first Region server may be one or more regions. When one or more regions servers in the Hbase system simultaneously fail, the Master senses the failed Region or regions servers through the zookeeper and searches and acquires the Region or regions corresponding to the failed Region or regions servers in the stored index table. In the index table, each RegionServer corresponds to one or more Regions, and each RegionServer also corresponds to log data corresponding to the first RegionS.
S302: storing first log data corresponding to the first Region;
s303: and distributing the first Region to a second Region server which normally runs, so that when the second Region server loads the first Region, service recovery is carried out according to the stored first log data corresponding to the first Region.
Specifically, when one or more RegionServers in the Hbase system fail, the Regions corresponding to those RegionServers in the index table are allocated to the normally operating RegionServers in the Hbase system. It should be noted that, if the number of Regions corresponding to the failed RegionServer in the index table is small, all of those Regions may be allocated to one normally operating RegionServer; if the number is large, the Regions may be distributed among several normally operating RegionServers, so as to prevent uneven load sharing among the RegionServers.
In practical applications, the index table may further include state information indicating the operating state of each RegionServer. For example, when a RegionServer operates normally, the Master sets its state information to an activated state, which may be represented in the index table as "Active"; when a RegionServer fails, the Master sets its state information to a locked state, which may be represented in the index table as "Block". That is, the state information of each normally operating RegionServer is set to "Active", and that of each failed RegionServer is set to "Block". In this way, the Master can sense a failed RegionServer from the state information.
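The state column described here can be modelled minimally (the "Active"/"Block" strings follow the embodiment; the dictionary layout and helper names are illustrative):

```python
# State information of the index table: "Active" marks a healthy
# RegionServer, "Block" marks a failed one.
index_state = {"RS1": "Active", "RS2": "Active", "RS3": "Active"}

def mark_failed(rs):
    """Called by the Master when Zookeeper reports a failure."""
    index_state[rs] = "Block"

def failed_servers():
    """The Master senses failed servers from the state column alone."""
    return [rs for rs, state in index_state.items() if state == "Block"]

mark_failed("RS1")
print(failed_servers())  # ['RS1']
```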
In this case, step S303 may include: searching the index table to obtain the currently activated RegionServers, determining one or more of them as second RegionServers (that is, the RegionServers allowed to take over the first Region), and allocating the first Regions of the failed RegionServer to the determined RegionServers. Here, the one or more RegionServers may be selected according to the load condition of the normally operating RegionServers; for example, if a second RegionServer corresponds to fewer Regions in the index table, a first Region is preferentially allocated to it.
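The load-based selection described above might look like the following (choosing the Active server currently holding the fewest Regions is one plausible reading of the load condition; the exact policy and data layout are assumptions):

```python
# Combined view of the index table: per-server state plus held Regions.
index_table = {
    "RS2": {"state": "Active", "regions": ["Region3", "Region4"]},
    "RS3": {"state": "Active", "regions": ["Region5"]},
    "RS1": {"state": "Block",  "regions": []},
}

def pick_target(index_table):
    """Pick the Active RegionServer with the fewest Regions as the
    second RegionServer that will take over a first Region."""
    active = [rs for rs, entry in index_table.items()
              if entry["state"] == "Active"]
    return min(active, key=lambda rs: len(index_table[rs]["regions"]))

print(pick_target(index_table))  # 'RS3'
```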
In addition, in step S302, after the log data corresponding to the one or more Regions of the failed RegionServer is obtained from the index table, the log data may be stored at a designated location associated with each first Region, where the first Region is one or more Regions.
In this way, in step S303, when the second RegionServer loads an allocated first Region, it obtains the first log data corresponding to that Region from the designated location associated with the Region, and performs service recovery.
In the embodiment of the present invention, after the first Region is allocated to the second Region server which normally operates, first log data corresponding to the first Region may be added to the index table, where the first log data corresponds to the second Region server and the first Region.
In an implementation manner of the embodiment of the present invention, before the first RegionServer fails, first log data generated at each read-write operation executed for the first Region is recorded in the index table. It can be understood that, when no RegionServer fails in the Hbase system, that is, all regionservers are in a normal operating state, the RegionServer generates corresponding log data for each read-write operation performed by a Region, and writes the log data into the index table.
Referring to fig. 4, fig. 4 is a schematic diagram of the operating principle of a RegionServer to which the service recovery method of an embodiment of the present invention is applied. Fig. 4 shows a case where six Regions (Region1, Region2, Region3, Region4, Region5, and Region6) are allocated to three RegionServers (RegionServer1, RegionServer2, and RegionServer3), with two Regions allocated to each RegionServer.
In this embodiment, the Hbase system further stores an index table, specifically, the index table may be stored in any designated location that can be associated with the Master, the RegionServer, and the Region, and the index table is a table established according to a corresponding relationship among the RegionServer, the Region, and the log data.
For example, before the RegionServer1 fails, first log data is generated at each read-write operation performed on Region1 and Region2 of RegionServer1 and is added to the index table; this first log data corresponds to RegionServer1 and to Region1 and Region2.
For the case illustrated in fig. 4, the index table before RegionServer1 fails is shown in Table 1, where Log denotes log data; Log1 and Log2 are the first log data and correspond to RegionServer1.
Table 1
As shown in Table 2: when the Master senses that RegionServer1 has failed, it sets the state information of RegionServer1 to "Block", allocates Region1 and Region2 of RegionServer1 in the index table to the normally operating RegionServer2 and RegionServer3 respectively, and at the same time adds the first log data corresponding to Region1 and Region2 to RegionServer2 and RegionServer3 respectively. Log in the table denotes log data; specifically, Log1 and Log2 are the first log data, namely the log data corresponding to RegionServer1. It should be noted that, in this embodiment, the first RegionServer is RegionServer1, the second RegionServers include RegionServer2 and RegionServer3, and the first Regions include Region1 and Region2.
Table 2
As shown in Table 2, when RegionServer1 fails, its state becomes Block. The Master writes Log1 and Log2 (i.e. the first log data) into the index table against Region1 and Region2 respectively, and allocates Region1 and Region2, with the first log data attached, to RegionServer2 and RegionServer3 respectively (load balancing is considered here, so Region1 goes to RegionServer2 and Region2 to RegionServer3; that is, RegionServer2 and RegionServer3 are the second RegionServers in this embodiment). After the allocation, RegionServer2 and RegionServer3 continue to execute the services of Region1 and Region2 by reading and writing their data, thereby realizing service recovery. Furthermore, RegionServer2 and RegionServer3 generate logs Log1' and Log2' for the subsequent operations and record them into the index table, so that Region1 corresponds to RegionServer2 and Log1 + Log1', and Region2 corresponds to RegionServer3 and Log2 + Log2'. At this time, the index table records the correspondence between the second RegionServers and the first Regions and the second log data, where the second log data includes the first log data and the log data generated by the second RegionServers during the read-write operations performed on the first Regions (i.e. the second log data is Log1 + Log1' and Log2 + Log2').
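The accumulation of second log data (e.g. Log1 + Log1') can be sketched as follows (list concatenation stands in for appending WAL records; the representation is an illustrative assumption):

```python
# Per-Region log column of the index table, right after failover:
# each first Region still carries only its first log data.
index_logs = {"Region1": ["Log1"], "Region2": ["Log2"]}

def record_new_ops(region, new_logs):
    """Append the takeover server's new logs to the first log data,
    producing the second log data for this Region."""
    index_logs[region] = index_logs[region] + new_logs

record_new_ops("Region1", ["Log1'"])  # generated by RegionServer2
record_new_ops("Region2", ["Log2'"])  # generated by RegionServer3
print(index_logs["Region1"])  # ["Log1", "Log1'"]
```

After these updates the index table holds exactly the second log data the switch-back step will later need.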
By applying this embodiment, when a RegionServer in the Hbase system fails, the Regions corresponding to the failed RegionServer can be obtained directly from the stored index table and allocated to the normally operating RegionServers in the Hbase system. Therefore, in the embodiment of the present invention, no log splitting is needed when a RegionServer fails: the Regions corresponding to the failed RegionServer are obtained directly from the index table and distributed to the normally running RegionServers, completing service recovery and improving the efficiency of service recovery after a RegionServer failure.
Referring to fig. 5, fig. 5 is another schematic flow chart of a service recovery method provided in an embodiment of the present invention, applied to an Hbase system. Compared with the embodiment shown in fig. 3, this embodiment adds a service switch-back step. In the method:
steps S501 to S503 are similar to steps S301 to S303 shown in fig. 3, and therefore, are not described again here.
The service switch-back procedure starts from step S504, as follows:
s504: after the fault recovery of the first RegionServer is sensed, searching a stored index table, and obtaining a first Region distributed to the second RegionServer and second log data in the index table;
specifically, after the failure of the first regionServer is recovered, the Master can know that the failure of the first regionServer is recovered through zookeeper perception, search the stored index table, and simultaneously obtain second log data corresponding to the first Region allocated to the second regionServer. Here, the first log data and the second log data are different, and the second log data includes the first log data and log data generated at the time of a read-write operation performed by a second RegionServer with respect to the first Region.
S505: storing second log data corresponding to the first Region;
and storing the obtained second log data corresponding to the first Region to a specified position associated with the first Region. Specifically, the designated location may be "retrieved.
S506: allocating the first Region back to the first RegionServer, so that when the first RegionServer loads the first Region, service switch-back is performed according to the second log data corresponding to the first Region;
specifically, after the failure of the first regionServer is recovered, when the first regionServer loads the first Region corresponding to the regionServer, the second log data corresponding to the first Region allocated to the second Region are obtained from the stored second log data corresponding to the first Region, and the service switching is performed. It can be understood that after the failure of the first regionServer is recovered, when the first regionServer loads the first Region of the regionServer, the first regionServer obtains the second log data corresponding to the first Region from the specified position associated with the first Region to perform the service switching back.
In practical applications, after the first Region is allocated back to the first RegionServer, the method further includes: setting the state information of the first RegionServer in the index table to the activated state, which indicates that the first RegionServer operates normally. In addition, whenever a RegionServer in the Hbase system fails again, the method of the present invention can be used to recover the service.
By applying the above embodiment, after a RegionServer fails, the stored index table can be searched and the Regions corresponding to that RegionServer, together with the log data corresponding to those Regions, obtained directly; after the RegionServer recovers, the second log data is migrated back to that RegionServer's log records in the index table, so that the service is switched back to the original RegionServer. Therefore, the embodiment of the present invention omits the log-splitting process, improving both the efficiency of service recovery after a RegionServer fails and the efficiency of service switch-back after the failure is recovered.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a service restoration apparatus according to an embodiment of the present invention, applied to an Hbase system, where the apparatus includes:
a first table look-up unit 601, configured to, after sensing that a first RegionServer fails, look up a stored index table, and obtain, in the index table, a first Region corresponding to the first RegionServer and first log data; the index table is used for recording the corresponding relation among the RegionServer, the Region and the log data;
a first storage unit 602, configured to store first log data corresponding to the first Region;
the first allocating unit 603 is configured to allocate the first Region to a second Region server that operates normally, so that when the second Region server loads the first Region, service recovery is performed according to stored first log data corresponding to the first Region.
In an implementation manner of the embodiment of the present invention, the apparatus further includes: a state information unit, configured to set state information representing the operating state of the RegionServer; for example, the state information of a normally operating RegionServer is set to the activated state, and that of a failed RegionServer is set to the locked state.
In an implementation manner of the embodiment of the present invention, the first allocating unit 603 includes:
the first table look-up subunit is used for selecting a normally running RegionServer according to the state information in the index table;
specifically, the index table is searched to obtain the region Server in the current activation state;
and the first allocation subunit is used for taking the selected Region server as a second Region server and allocating the first Region to the second Region server.
In an implementation manner of the embodiment of the present invention, the apparatus further includes: and the first log recording unit is used for recording first log data generated by each read-write operation executed aiming at the first Region into the index table before the first Region Server fails.
In an implementation manner of the embodiment of the present invention, the apparatus further includes: a second log recording unit, configured to record second log data in the index table, so that the index table records the correspondence between the second RegionServer and the first Region, and the second log data; the second log data includes the first log data and the log data generated by the second RegionServer during the read-write operations performed on the first Region.
By applying the embodiments, when the Region server fails, log segmentation is not needed, the Region corresponding to the failed Region server is directly obtained in the index table and distributed to the normally running Region server, so that service recovery is completed, and the efficiency of service recovery after the Region server fails is improved.
Referring to fig. 7, fig. 7 is a schematic structural diagram of another service restoration apparatus provided in an embodiment of the present invention, and the apparatus is applied to an Hbase system, and the apparatus includes:
a first table look-up unit 701, configured to, after sensing that a first RegionServer fails, look up a stored index table, and obtain, in the index table, a first Region corresponding to the first RegionServer and first log data; the index table is used for recording the corresponding relation among the RegionServer, the Region and the log data;
a first storage unit 702, configured to store first log data corresponding to the first Region;
the first allocating unit 703 is configured to allocate the first Region to a second Region server that operates normally, so that when the second Region server loads the first Region, service recovery is performed according to the stored first log data corresponding to the first Region.
It should be noted that the first lookup table unit 701, the first storage unit 702, and the first allocation unit 703 in this embodiment may be similar to the first lookup table unit 601, the first storage unit 602, and the first allocation unit 603 in the embodiment shown in fig. 6, respectively, and therefore, description thereof is omitted here.
A second table look-up unit 704, configured to, after sensing that the first RegionServer failure is recovered, look up a stored index table, and obtain, in the index table, a first Region allocated to the second RegionServer and second log data;
a second storage unit 705, configured to store second log data corresponding to the first Region;
the second allocating unit 706 is configured to allocate the first Region back to the first Region server, so that when the first Region server loads the first Region, service cutback is performed according to second log data corresponding to the first Region.
In an implementation manner of the embodiment of the present invention, the index table further includes state information, so the apparatus further includes: the state setting unit is used for setting the state information of the normally running RegionServer into an activated state and setting the state information of the faulted RegionServer into a locked state;
in an implementation manner of the embodiment of the present invention, the first allocating unit 703 includes:
the selecting subunit is used for selecting the normally running RegionServer according to the state information in the index table;
and the distribution subunit is used for taking the selected Region server as a second Region server and distributing the first Region to the second Region server.
In an implementation manner of the embodiment of the present invention, the apparatus further includes: and the first log recording unit is used for recording first log data generated by each read-write operation executed aiming at the first Region into the index table before the first Region Server fails.
In an implementation manner of the embodiment of the present invention, the apparatus further includes: a second log recording unit, configured to record second log data in the index table, so that the index table records the correspondence between the second RegionServer and the first Region, and the second log data; the second log data includes the first log data and the log data generated by the second RegionServer during the read-write operations performed on the first Region.
By applying the above embodiments, after the RegionServer fails, the stored index table can be searched, the Region corresponding to the RegionServer and the log data corresponding to the Region are directly obtained, the RegionServer is recovered, and the second log data is distributed back to the recovered RegionServer, so that the service is switched back to the original RegionServer. Therefore, the embodiment of the invention omits the process of log segmentation, not only improves the efficiency of service recovery after the fault of the RegionServer, but also improves the efficiency of service switching after the fault of the RegionServer is recovered.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.