CN107273231A

CN107273231A - Method and device for detecting and handling hard disk hang faults in a distributed storage system

Info

Publication number: CN107273231A
Application number: CN201610212740.0A
Authority: CN
Inventors: 王勇; 赵树起; 朱家稷; 董乘宇
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2016-04-07
Filing date: 2016-04-07
Publication date: 2017-10-20
Also published as: WO2017173927A1; TW201737111A

Abstract

This application discloses a method and device for detecting and handling hard disk hang faults in a distributed storage system. Whether a target hard disk has a hang fault is judged by detecting the execution time of the access requests corresponding to that disk, so that a hang fault can be found in time. After a hang fault is found, on the one hand the faulty disk is marked with a state so that it is not accessed again; on the other hand the system resources occupied by the faulty disk are cleaned up so that other processes can be allocated those resources, which reduces the adverse effects of the hang fault and limits the damage. The detection and handling scheme provided by this application therefore does not depend on a detection tool supplied by the hard disk vendor, does not require new hardware on the hard disk, and does not require human intervention; it is simple to implement and does not affect the production or usage cost of the hard disk.

Description

Method and device for detecting and handling hard disk hang faults in a distributed storage system

Technical field

The present invention relates to the field of computer technology, and in particular to a method and device for detecting and handling hard disk hang faults in a distributed storage system.

Background

A distributed storage system is a storage system built on top of local file systems that spreads data across multiple hard disks. In such a system, a fault can occur anywhere on the link from the local file system down to the internals of each hard disk. One such fault is the hard disk hang fault, in which the disk cannot respond to normal operations and none of the input/output operations on the disk can terminate because the whole link returns no reply. If the hung disk is not handled properly, the entire access process may become unresponsive, so that all data managed by that process becomes inaccessible, front-end request latency rises, system load increases, and data availability drops. Detecting hard disk hang faults in time and reducing their impact is therefore a key issue in guaranteeing the performance of a distributed storage system.

Existing methods for handling hard disk hang faults mainly include the following four: (1) using a tool provided by the hard disk vendor to send an offline command to the disk; the disk stops working after receiving the command, so that accesses to the disk can return and the hang state ends; (2) stopping the disk with a hardware switch, typically an extra component added to an existing disk that pulls down the disk's voltage directly and powers the disk off, which ends the hang state; (3) restarting the machine, after which the disk state is reset, although this only possibly improves the hang state; (4) restarting the process directly, so that the new process can avoid using the hung disk.

However, all of these methods have drawbacks, including reliance on extra auxiliary tools and reduced availability of system resources. Specifically, method (1) depends on a vendor-provided tool and does not work when the disk cannot receive the offline command, so its success rate in practice is low. Method (2) requires new hardware (the hardware switch) on the disk, which raises the cost of developing and maintaining the disk and narrows its scope of application. Method (3) introduces human intervention; while the machine restarts, the availability of both the machine and the storage system is reduced, and the restart itself may fail. Even if the restart succeeds, the storage system must still be able to avoid using the hung disk, which places high demands on the storage system. In method (4), the original process cannot release its memory because some of its threads are hung, so system memory usage stays high and the resources available to the system are reduced even after the restart. A hang-fault handling method with a high success rate, a wide scope of application, and little impact on system availability is therefore urgently needed.

Summary of the invention

The first technical problem to be solved by the application is to detect hard disk hang faults in a distributed storage system automatically, without relying on auxiliary tools. To this end, the application provides a method and device for detecting hard disk hang faults in a distributed storage system.

In a first aspect, the application provides a method for detecting hard disk hang faults in a distributed storage system, including:

Detecting the execution time of each access request corresponding to a target hard disk;

Judging whether there is a lagging request, that is, a request whose execution time is greater than the corresponding preset threshold;

If a lagging request exists, determining that the target hard disk has a hang fault.

With reference to the first aspect, in a first feasible embodiment of the first aspect, the fault detection method further includes:

Creating an IO thread group corresponding to the target hard disk;

Reading and processing each access request corresponding to the target hard disk through the IO thread group, so as to complete read and write operations on the target hard disk.

With reference to the first aspect or its first feasible embodiment, in a second feasible embodiment of the first aspect, detecting the execution time of each access request corresponding to the target hard disk includes:

Detecting the execution time of the access request at the head of the input queue of the target hard disk.

In a second aspect, the application provides a device for detecting hard disk hang faults in a distributed storage system, including:

A detection unit, configured to detect the execution time of each access request corresponding to a target hard disk;

A comparing unit, configured to judge whether there is a lagging request whose execution time is greater than the corresponding preset threshold, and, if a lagging request exists, to determine that the target hard disk has a hang fault.

With reference to the second aspect, in a first feasible embodiment of the second aspect, the fault detection device further includes:

A process management unit, configured to create an IO thread group corresponding to the target hard disk, and to read and process each access request corresponding to the target hard disk through the IO thread group, so as to complete read and write operations on the target hard disk.

With reference to the second aspect or its first feasible embodiment, in a second feasible embodiment of the second aspect, in order to detect the execution time of each access request corresponding to the target hard disk, the detection unit is specifically configured to:

Detect the execution time of the access request at the head of the input queue of the target hard disk.

As the above technical solution shows, the embodiments of the application judge whether a target hard disk has a hang fault by detecting the execution time of its access requests, so that the hang fault can be found in time. This detection approach does not depend on a detection tool supplied by the hard disk vendor, requires no new hardware on the disk and no human intervention, is simple to implement, and does not affect the production or usage cost of the hard disk.

The second technical problem to be solved by the application is to handle hard disk hang faults in a distributed storage system automatically, without relying on auxiliary tools. To this end, the application provides a method and device for handling hard disk hang faults in a distributed storage system.

In a third aspect, the application provides a method for handling hard disk hang faults in a distributed storage system, including:

When a target hard disk has a hang fault, marking the state of the target hard disk as the hang-fault state;

Cleaning up the system resources occupied by the hung management process corresponding to the target hard disk, so that a new management process for managing the target hard disk can be started.

With reference to the third aspect, in a first feasible embodiment of the third aspect, cleaning up the system resources occupied by the hung management process corresponding to the target hard disk includes:

Requesting new memory, and performing the following two steps using the new memory, so as to clear the memory resources occupied by the hung management process;

Finding all memory segments occupied by the hung process;

Releasing the memory mapping corresponding to each memory segment.

With reference to the third aspect or its first feasible embodiment, in a second feasible embodiment of the third aspect, the fault handling method further includes:

Before cleaning up the system resources occupied by the hung management process corresponding to the target hard disk, ejecting each access request cached in the input queue of the target hard disk and returning fault information for the target hard disk.

With reference to the third aspect or its first feasible embodiment, in a third feasible embodiment of the third aspect, the fault handling method further includes:

After each start of the management process of the target hard disk, determining the state of the target hard disk;

If the state of the target hard disk is the hang-fault state, forbidding access to the target hard disk.

With reference to the third aspect or its first feasible embodiment, in a fourth feasible embodiment of the third aspect, the fault handling method further includes:

Saving the hang-fault state of the target hard disk to a normal hard disk.

In a fourth aspect, the application provides a device for handling hard disk hang faults in a distributed storage system, including:

A state management unit, configured to mark the state of a target hard disk as the hang-fault state when the target hard disk has a hang fault;

A resource cleanup unit, configured to clean up the system resources occupied by the hung management process corresponding to the target hard disk, so that a new management process for managing the target hard disk can be started.

With reference to the fourth aspect, in a first feasible embodiment of the fourth aspect, in order to clean up the system resources occupied by the hung management process of the target hard disk, the resource cleanup unit is specifically configured to request new memory and perform the following two steps using the new memory, so as to clear the memory resources occupied by the hung management process: find all memory segments occupied by the hung process, and release the memory mapping corresponding to each memory segment.

With reference to the fourth aspect or its first feasible embodiment, in a second feasible embodiment of the fourth aspect, the fault handling device further includes:

A request cleanup unit, configured to eject each access request cached in the input queue of the target hard disk and to return fault information for the target hard disk.

With reference to the fourth aspect or its first feasible embodiment, in a third feasible embodiment of the fourth aspect, the fault handling device further includes:

An availability supervision unit, configured to determine the state of the target hard disk after each start of the management process of the target hard disk, and to forbid access to the target hard disk when its state is the hang-fault state.

With reference to the fourth aspect or its first feasible embodiment, in a fourth feasible embodiment of the fourth aspect, the state management unit is further configured to save the hang-fault state of the target hard disk to a normal hard disk.

As the above technical solution shows, after a target hard disk is found to have a hang fault, the embodiments of the application on the one hand use a state mark to prevent the faulty disk from being accessed again, and on the other hand clean up the system resources occupied by the faulty disk so that other processes can be allocated those resources. This reduces the adverse effects the hang fault may cause and limits the damage. The hang-fault handling scheme provided by the embodiments of the application therefore does not depend on a detection tool supplied by the hard disk vendor, requires no new hardware on the disk and no human intervention, is simple to implement, and does not affect the production or usage cost of the hard disk.

It should be understood that the general description above and the detailed description below are only exemplary and explanatory, and do not limit the application.

Brief description of the drawings

To describe the technical solutions of the embodiments of the application or of the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.

Fig. 1 is a flow chart of a method for detecting hard disk hang faults in a distributed storage system according to an exemplary embodiment of the application.

Fig. 2 is a schematic diagram of the access request handling flow in a data storage node of a distributed storage system according to an exemplary embodiment of the application.

Fig. 3 is a flow chart of a method for handling hard disk hang faults in a distributed storage system according to an exemplary embodiment of the application.

Fig. 4 is a flow chart of another method for handling hard disk hang faults in a distributed storage system according to an exemplary embodiment of the application.

Fig. 5 is a timing diagram of the method for detecting and handling hard disk hang faults in a distributed storage system according to an exemplary embodiment of the application.

Fig. 6 is a structural block diagram of a device for detecting hard disk hang faults in a distributed storage system according to an exemplary embodiment of the application.

Fig. 7 is a structural block diagram of a device for handling hard disk hang faults in a distributed storage system according to an exemplary embodiment of the application.

Fig. 8 is a structural block diagram of another device for handling hard disk hang faults in a distributed storage system according to an exemplary embodiment of the application.

Detailed description

Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the application. Rather, they are only examples of devices and methods consistent with some aspects of the application as detailed in the appended claims.

Numerous specific details are mentioned in the following detailed description to provide a thorough understanding of the application, but a person skilled in the art should understand that the application can be practiced without these details. In other embodiments, well-known methods, processes, components, and circuits are not described in detail, so as not to obscure the embodiments unnecessarily.

Fig. 1 is a schematic flow chart of a method for detecting hard disk hang faults in a distributed storage system according to an exemplary embodiment of the application. As shown in Fig. 1, the detection method includes:

S101, detect the execution time of each access request corresponding to the target hard disk.

The access requests may specifically include read requests (Output) that read data from the target hard disk and write requests (Input) that write data to or modify data on the target hard disk. They are uniformly scheduled and executed by the management process of the target hard disk.

S102, judge whether there is a lagging request whose execution time is greater than the corresponding preset threshold.

S103, if a lagging request exists, determine that the target hard disk has a hang fault.

In practice, whatever causes a hard disk to hang (for example hardware damage or read/write overload), the direct symptoms at least include that an access request being executed against the disk does not finish for a long time. In view of this, the embodiments of the application judge whether a target hard disk has a hang fault by detecting the execution time of its access requests, so that the hang fault can be found in time and handled promptly. Moreover, the detection method provided by the embodiments of the application can be configured to run automatically inside the management process of the target hard disk. It does not depend on a detection tool supplied by the hard disk vendor, requires no new hardware on the disk and no human intervention, is simple to implement, and does not affect the production or usage cost of the hard disk.
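As an illustration only, steps S101 to S103 could be sketched in Python as follows. The class `InFlightRequest`, the function names, and the use of a monotonic clock are assumptions made for this sketch; the application does not prescribe any particular data structure or threshold value.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class InFlightRequest:
    """Hypothetical record of one access request being executed on a disk."""
    request_id: str
    started_at: float   # monotonic timestamp at which execution began
    threshold_s: float  # preset threshold that applies to this request


def find_lagging_request(requests: Iterable[InFlightRequest],
                         now: Optional[float] = None) -> Optional[InFlightRequest]:
    """S101/S102: return the first request whose execution time exceeds its threshold."""
    now = time.monotonic() if now is None else now
    for req in requests:
        if now - req.started_at > req.threshold_s:
            return req
    return None


def disk_is_hung(requests: Iterable[InFlightRequest],
                 now: Optional[float] = None) -> bool:
    """S103: the disk is judged to have a hang fault if any lagging request exists."""
    return find_lagging_request(requests, now) is not None
```

Passing `now` explicitly keeps the check deterministic in tests; in a running management process the default monotonic clock would be used instead.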

In one feasible embodiment of the application, the above method for detecting hard disk hang faults in a distributed storage system may further include the following steps:

S104, set up a corresponding IO thread group for the target hard disk.

S105, read and process each access request corresponding to the target hard disk through the IO thread group, so as to complete read and write operations on the target hard disk.

To make it easier to manage the hard disk and provide read/write (I/O) service, the embodiments of the application set up a dedicated IO thread group for the target hard disk. That is, a group of IO threads is created within the management process of the target hard disk; the threads share the system resources of the management process and are used only to process access requests for that disk. In the prior art, a single group of IO threads serves the read/write operations of all hard disks at once, or user threads serve the read/write operations directly. By giving each hard disk its own IO thread group, the embodiments of the application avoid the situation in which a non-disk fault or a fault on one disk causes the IO threads to hang and thereby affects the read/write operations of all disks. In addition, the threads in an IO thread group execute different access requests in parallel. On the one hand this improves request processing efficiency, that is, the read/write speed of the disk; on the other hand, when one thread hangs while executing an access request, the other threads are unaffected and continue to process other access requests.
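The isolation provided by per-disk IO thread groups can be illustrated with the following minimal Python sketch. The class `DiskIOThreadGroup` and its `submit` method are hypothetical names, and the single shared request queue per group is a simplification chosen for brevity; it is not a statement of how the application arranges its queues.

```python
import queue
import threading


class DiskIOThreadGroup:
    """A dedicated group of IO threads that serves exactly one disk."""

    def __init__(self, disk_id, num_threads=2):
        self.disk_id = disk_id
        self.requests = queue.Queue()
        self.threads = [
            threading.Thread(target=self._worker,
                             name=f"{disk_id}-io-{i}", daemon=True)
            for i in range(num_threads)
        ]
        for thread in self.threads:
            thread.start()

    def submit(self, operation):
        """Queue a callable for this disk; return (done_event, result_holder)."""
        done = threading.Event()
        result = {}
        self.requests.put((operation, done, result))
        return done, result

    def _worker(self):
        # Each worker only ever takes requests addressed to its own disk,
        # so a request that never returns cannot block another disk's group.
        while True:
            operation, done, result = self.requests.get()
            result["value"] = operation()
            done.set()
```

If every thread of one group is blocked on a request that never returns, requests submitted to the group of a different disk are still served, which is the property the paragraph above describes.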

Accordingly, based on the IO thread group, detecting the execution time of each access request corresponding to the target hard disk in step S101 may specifically be detecting the time the IO thread group takes to execute each access request.

Further, in another feasible embodiment of the application, the detection method may also include: setting up a corresponding input queue for each target disk.

Fig. 2 is a schematic diagram of how access requests (IO requests) are handled in any data storage node Y in the embodiments of the application. For a disk X in data storage node Y, a group of IO threads is set up; to distinguish them, they are numbered T1 to Tn in Fig. 2. Correspondingly, each IO thread is given one input queue, namely the n IO queues numbered Q1 to Qn in Fig. 2, in one-to-one correspondence with the IO threads. After data storage node Y receives an IO request from a client, it first processes the request to determine which part of the data on which disk the request accesses, and puts different IO requests for the same data on the same disk into the same IO queue. Access to the same data is thus serialized, which prevents two IO requests from accessing the same data at the same time. In addition, data storage node Y may also have an IO thread (T0) that is not bound to any disk, with a corresponding IO queue (Q0), for operations that concern node Y as a whole. A complete distributed storage system may include many data storage nodes in parallel with data storage node Y, and the IO request handling flow of each of them can follow the flow shown in Fig. 2.
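One possible way to route requests so that the same data on the same disk always lands in the same IO queue is sketched below in Python. Choosing the queue by a CRC32 hash of a data key is an assumption of this sketch; the application only requires that requests for the same data share a queue and does not say how the queue is chosen. `NodeRouter` and `data_key` are hypothetical names.

```python
import zlib
from collections import deque


class NodeRouter:
    """Routes IO requests of one data storage node to per-disk IO queues."""

    def __init__(self, disk_ids, queues_per_disk):
        # One list of FIFO queues (Q1..Qn in Fig. 2) for every disk on the node.
        self.queues = {
            disk_id: [deque() for _ in range(queues_per_disk)]
            for disk_id in disk_ids
        }

    def route(self, disk_id, data_key, request):
        """Append the request to the queue selected by the data key; return its index."""
        disk_queues = self.queues[disk_id]
        # A stable hash means the same data key always maps to the same queue,
        # so accesses to the same data are serialized by a single IO thread.
        index = zlib.crc32(data_key.encode("utf-8")) % len(disk_queues)
        disk_queues[index].append(request)
        return index
```

Because each queue is drained by exactly one IO thread, two requests for the same data can never execute concurrently, while requests for different data may still proceed in parallel on other queues.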

Accordingly, detecting the execution time of each access request corresponding to the target hard disk in step S101 may specifically be: detecting the execution time of the access request at the head of the input queue of the target hard disk.

The input queue may be set up as a first-in, first-out (First Input First Output, FIFO) queue. Taking IO queue Q1 in Fig. 2 as an example, access requests are stored into Q1 in time order, and the earlier a request enters Q1, the closer it is to the head of Q1. The corresponding IO thread T1 reads one access request from the head of Q1 at a time and executes it, completing the corresponding disk operation H1. Each time T1 reads the request at the head of the queue, it starts timing the execution of that request and continues until the request finishes. If the timer reaches the preset threshold and the request has still not finished, the execution time of the request has exceeded the preset threshold; the request can then be judged to be a lagging request, the corresponding IO thread T1 is hung, and disk X can in turn be judged to have a hang fault.

As can be seen, the embodiments of the application execute access requests in the order in which they leave the input queue of the target hard disk and time their execution, so the execution time of each access request can be obtained accurately. Lagging requests are thus found in time and the hang fault is identified, which lays the foundation for handling the hung disk promptly.
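The timing of the head-of-queue request described above might look like the following Python sketch. `TimedIOQueue` and its method names are hypothetical, and the bookkeeping is reduced to a single start timestamp per queue, which is sufficient because each queue is served by one IO thread.

```python
import time
from collections import deque


class TimedIOQueue:
    """FIFO input queue that times the request currently being executed."""

    def __init__(self, threshold_s):
        self.threshold_s = threshold_s
        self.pending = deque()
        self.current = None      # request taken from the head and now executing
        self.started_at = None   # monotonic time at which it was taken

    def enqueue(self, request):
        self.pending.append(request)

    def start_next(self, now=None):
        """The IO thread takes the head request and starts its timer."""
        self.current = self.pending.popleft()
        self.started_at = time.monotonic() if now is None else now
        return self.current

    def finish_current(self):
        """The IO thread reports that the current request has completed."""
        self.current = None
        self.started_at = None

    def head_is_lagging(self, now=None):
        """True if the executing request has run longer than the threshold."""
        if self.current is None:
            return False
        now = time.monotonic() if now is None else now
        return now - self.started_at > self.threshold_s
```

A detection thread can call `head_is_lagging()` on every queue of a disk; a single `True` is enough to judge that the IO thread, and therefore the disk, is hung.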

The embodiments of the application also provide a method for handling hard disk hang faults in a distributed storage system. Fig. 3 shows a flow chart of this handling method.

As shown in Fig. 3, the handling method includes the following steps:

S201, when a target hard disk has a hang fault, mark the state of the target hard disk as the hang-fault state.

S203, clean up the system resources occupied by the hung management process corresponding to the target hard disk.

Based on the detection method above or on other feasible detection methods, once a hard disk is judged to have a hang fault, the handling method provided by this embodiment can be executed. Specifically, step S201 is in effect a disk state management operation that marks the disk having the hang fault with the hang-fault state. Once a disk has been marked as being in the hang-fault state, it is not allowed to be re-marked as normal, so that the hang fault does not occur again. Step S203 is in effect a resource cleanup operation for the hung disk. As described in the detection method embodiments, when the target hard disk has a hang fault there must be at least one lagging request, which means the management process of the target hard disk is hung; the system resources occupied by the hung management process are therefore cleaned up, for example by closing the file handles it has opened.

Cleaning up the system resources in step S203 has two effects. On the one hand, the system resources occupied by the hung management process can be reallocated and requested by other processes. On the other hand, once the resources occupied by the hung process have been cleaned up, the hung management process exits automatically, which ends its hang state; a new management process can then be created and started to manage the target hard disk.

As can be seen, the hang-fault handling method provided by the embodiments of the application on the one hand uses a state mark to prevent the faulty disk from being accessed again, and on the other hand cleans up the system resources occupied by the faulty disk so that other processes can be allocated those resources. This reduces the adverse effects the hang fault may cause and limits the damage. Moreover, the handling method does not depend on a detection tool supplied by the hard disk vendor, requires no new hardware on the disk and no human intervention, is simple to implement, and does not affect the production or usage cost of the hard disk.

In one feasible embodiment of the application, after the state of the target hard disk is marked as the hang-fault state in step S201, the following step may also be performed: saving the hang-fault state to a normal hard disk.

The normal hard disk may be the system disk of the whole distributed storage system, or another hard disk that has a communication connection with the target hard disk. Synchronizing the state directly on a hard disk in this way ensures that the hung disk is not used again even if it temporarily becomes available, so that the hang fault does not recur.
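A minimal Python sketch of marking a disk as hung and saving that state on a normal disk is given below. The JSON file format, the class `DiskStateStore`, and the write-then-rename persistence are all assumptions of this sketch; the application only states that the hang-fault state is saved to a normal hard disk and that a hung disk may not be re-marked as normal.

```python
import json
import os
import tempfile

NORMAL, HUNG = "normal", "hung"


class DiskStateStore:
    """Keeps disk states in a file that lives on a normal (healthy) disk."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)

    def _save(self, states):
        # Write to a temporary file first, then rename it over the old file,
        # so a crash never leaves a half-written state file behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(states, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)

    def get(self, disk_id):
        return self._load().get(disk_id, NORMAL)

    def mark_hung(self, disk_id):
        states = self._load()
        states[disk_id] = HUNG
        self._save(states)

    def mark_normal(self, disk_id):
        # The transition is one-way: a hung disk is never re-marked as normal.
        if self.get(disk_id) == HUNG:
            raise ValueError(f"{disk_id} is hung and cannot be re-marked as normal")
        states = self._load()
        states[disk_id] = NORMAL
        self._save(states)
```

Because the state is read back from the file rather than from process memory, a freshly started management process sees the hung mark even though the process that wrote it has exited.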

Further, in one feasible embodiment of the application, cleaning up the system resources occupied by the hung management process corresponding to the target hard disk in step S203 may specifically include the following steps:

S2031, request new memory, and perform the following steps S2032 and S2033 using the new memory.

The application performs the specific cleanup steps using the new memory, rather than in the memory originally allocated to the management process of the target hard disk.

S2032, find all memory segments occupied by the hung process.

The memory space allocated to the management process of the target hard disk usually consists of multiple memory segments; for a complete cleanup, all of them must be found. Specifically, under the Linux operating system the memory segments can be obtained from the file /proc/self/smaps.

S2033, release the memory mapping corresponding to each memory segment.

When system resources are allocated to the management process, a mapping is typically established by an mmap operation between a system resource (such as a file) and a memory segment. Accordingly, when the occupied memory is cleaned up, the memory mapping of each segment can be released with munmap.

As can be seen, steps S2032 and S2033 in effect use the new memory to clear the memory resources occupied by the hung management process. The operation needs no extra hardware tools or human intervention and is simple to carry out. Compared with performing the operation directly in the memory originally allocated to the management process of the target hard disk, it also prevents the thread that performs the cleanup from hanging along with the management process.
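The lookup in step S2032 can be illustrated by parsing the text of /proc/self/smaps, as in the Python sketch below. Only the lookup is shown: actually calling munmap on each segment (step S2033) is deliberately left out, because unmapping the live segments of a Python interpreter from inside itself would crash it. The function name and the returned (start, length) tuples are assumptions of this sketch.

```python
import re
from typing import List, Tuple

# A segment header in smaps looks like:
#   00400000-0040b000 r-xp 00000000 08:01 1234   /bin/cat
# The per-segment detail lines that follow ("Size:", "Rss:", ...) do not match.
_SEGMENT_HEADER = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s")


def parse_segments(smaps_text: str) -> List[Tuple[int, int]]:
    """Return (start_address, length_in_bytes) for every segment in smaps text."""
    segments = []
    for line in smaps_text.splitlines():
        match = _SEGMENT_HEADER.match(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            segments.append((start, end - start))
    return segments
```

On Linux the input would be obtained with `open("/proc/self/smaps").read()`; each resulting pair corresponds to the address and length arguments that a native cleanup routine would pass to munmap.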

Referring to Fig. 4, in another feasible embodiment of the application, the above method for handling hard disk hang faults in a distributed storage system further includes the following step:

S202, eject each access request cached in the input queue of the target hard disk, and return fault information for the target hard disk.

Because the management process of the target hard disk is hung, the requests cached in its input queue (that is, requests not yet processed) cannot be processed either. This embodiment ejects these access requests and returns fault information for the target hard disk to the user, such as "hard disk error", so that the users concerned do not keep waiting for responses to the unprocessed requests and do not send further access requests to the target hard disk.
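Step S202 could be sketched in Python as follows. The exception class `DiskFaultError`, the `reply` callback, and the exact wording of the error text are assumptions made for illustration.

```python
from collections import deque


class DiskFaultError(Exception):
    """Fault information returned to callers whose requests were ejected."""


def eject_pending(pending: deque, disk_id: str, reply) -> int:
    """Remove every cached request from the queue and answer it with an error.

    `reply(request, error)` is a caller-supplied callback that delivers the
    fault information to whoever issued the request. Returns the number of
    requests that were ejected.
    """
    ejected = 0
    while pending:
        request = pending.popleft()
        reply(request, DiskFaultError(f"hard disk error: {disk_id}"))
        ejected += 1
    return ejected
```

Answering the queued requests before the resources of the hung process are cleaned up means callers get a definite failure instead of waiting on a queue that will never be served again.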

Still referring to Fig. 4, in another feasible embodiment of the application, the handling method further includes:

S204, after each start of the management process of the target hard disk, determine the state of the target hard disk, and forbid access to the target hard disk when its state is the hang-fault state.

Step S204 supervises the availability of the target hard disk. It can be performed when a new hard disk is enabled, to start the hang-fault detection steps for the target hard disk and supervise its availability in real time. It can also be performed after step S203, that is, after the management process of the faulty disk is restarted: because the faulty disk was marked as being in the hang-fault state in step S201, step S204 can refuse all access requests for the faulty disk, which prevents the disk from being accessed again and hanging the process again.
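The startup check of step S204 might be expressed as in the following Python sketch, in which a newly started management process reads the recorded disk states once and refuses any access to disks marked as hung. `ManagementProcess`, `DiskUnavailableError`, and the string state values are hypothetical.

```python
class DiskUnavailableError(Exception):
    """Raised when a request targets a disk that is in the hang-fault state."""


class ManagementProcess:
    """Stand-in for a newly started management process of a storage node."""

    def __init__(self, disk_states):
        # S204: determine the state of every disk at each start and remember
        # which ones must not be accessed.
        self.disabled = {
            disk_id for disk_id, state in disk_states.items() if state == "hung"
        }

    def access(self, disk_id, operation):
        """Run the operation unless the disk is marked as hung."""
        if disk_id in self.disabled:
            raise DiskUnavailableError(f"{disk_id} is in the hang-fault state")
        return operation()
```

The `disk_states` mapping would come from wherever the hang-fault state was persisted, so the refusal survives restarts of the management process.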

In addition, Fig. 5 shows the hang-fault detection and handling flow of the embodiments of the application in the form of a timing diagram. Referring to Fig. 5, after the management process of a data storage node starts, a Hang disk detection thread is created and started, which periodically checks whether any disk in the data storage node has a hang fault (a Hang disk). For each IO thread of a disk, the detection thread checks whether there is a request that has not returned an execution result for a long time (that is, a lagging request). If some IO thread of disk X has a lagging request, disk X is hung; a Hang disk cleanup thread is then started to clean up the resources occupied by the management process of the whole data storage node, such as memory and functional dependencies (Functional Dependency, FD), and the state of disk X is recorded on the system disk as the hang-fault state (Hang state). The current management process is then restarted to obtain a new management process. After the new management process starts, it first identifies the state of each disk in the storage node so as to disable (ignore) the disks marked as being in the Hang state.

From the description of the method embodiments above, those skilled in the art will clearly understand that the present application can be implemented by software together with the necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The software product is stored in a storage medium and includes a number of instructions that cause a distributed storage system to perform all or part of the steps of the methods described in the embodiments of the present application. The storage medium includes various media capable of storing data and program code, such as read-only memory (ROM), random access memory (RAM), magnetic disks, and optical discs.

Fig. 6 is a structural block diagram of a hang-fault detection device for hard disks in a distributed storage system according to an exemplary embodiment of the present application. As shown in Fig. 6, the detection device includes a detection unit 101 and a comparing unit 102.

The detection unit 101 is configured to detect the execution time of each access request corresponding to the target hard disk.

The comparing unit 102 is configured to judge whether there is a lagging request, that is, a request whose execution time exceeds the corresponding preset threshold, and, if such a lagging request exists, to determine that the target hard disk has a hang fault.
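The comparison performed by the comparing unit 102 can be sketched as below. The description says each request is compared against its "corresponding" preset threshold but does not say how thresholds are assigned; this sketch assumes, purely for illustration, one threshold per request type. The names `RequestTiming`, `lagging_requests`, and `has_hang_fault`, and the request types `"read"` and `"write"`, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class RequestTiming:
    request_id: int
    kind: str               # hypothetical request type, e.g. "read" or "write"
    elapsed_seconds: float  # execution time measured by the detection unit


def lagging_requests(
    timings: Iterable[RequestTiming], thresholds: Dict[str, float]
) -> List[RequestTiming]:
    """Return the requests whose execution time exceeds their threshold."""
    return [t for t in timings if t.elapsed_seconds > thresholds[t.kind]]


def has_hang_fault(
    timings: Iterable[RequestTiming], thresholds: Dict[str, float]
) -> bool:
    """The disk is judged hung as soon as one lagging request exists."""
    return len(lagging_requests(timings, thresholds)) > 0
```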

As can be seen from the above technical solution, the embodiments of the present application determine whether a target hard disk has a hang fault by detecting the execution time of the access requests corresponding to that disk, so that a hang fault can be found promptly and handled in time. Moreover, the embodiments do not rely on detection tools provided by the hard disk vendor, do not require new hardware to be added to the hard disk, and do not require human intervention. They are simple to implement and do not affect the production or usage cost of the hard disk.

In one feasible embodiment of the present application, the above detection device may further include a process management unit. The process management unit is configured to create an IO thread group corresponding to the target hard disk, and to read and process each access request corresponding to the target hard disk through that IO thread group, so as to complete read and write operations on the target hard disk.

In another feasible embodiment of the present application, the detection unit 101 of the above detection device may be specifically configured to detect the execution time of the access request at the head of the input queue of the target hard disk.

That is, when access requests are cached in the input queue of the target hard disk according to the FIFO rule, the management process of the target hard disk (more specifically, the IO thread group described above) reads access requests only from the head of the input queue and begins executing them. Therefore, when the access request at the head of the queue is read, timing of its execution starts and continues until the access request finishes executing. If the timer reaches the preset threshold and the access request has still not finished, the execution time of the access request has exceeded the preset threshold. The access request can then be determined to be a lagging request, and the corresponding target hard disk can be determined to have a hang fault.
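The head-of-queue timing described above can be sketched as follows. This is an illustrative model under stated assumptions: the class name `HeadTimer` and its methods are hypothetical, and the current time is passed in explicitly so that the behaviour is easy to follow (a real implementation would more likely read a monotonic clock).

```python
from collections import deque


class HeadTimer:
    """Times only the request most recently read from the head of a FIFO queue."""

    def __init__(self, threshold_seconds: float) -> None:
        self.queue = deque()          # input queue of the target hard disk
        self.threshold = threshold_seconds
        self.current = None           # request currently being executed
        self.started_at = None        # time at which its execution began

    def submit(self, request) -> None:
        self.queue.append(request)

    def start_next(self, now: float):
        """Read the request at the head of the queue and start timing it."""
        self.current = self.queue.popleft()
        self.started_at = now
        return self.current

    def finish(self) -> None:
        """The current request finished executing; stop timing."""
        self.current = None
        self.started_at = None

    def is_lagging(self, now: float) -> bool:
        """True once the timer reaches the threshold with the request unfinished."""
        return self.current is not None and now - self.started_at >= self.threshold
```

Because requests leave the queue strictly in FIFO order, timing only the request taken from the head is enough to notice a hang: no later request can complete while the head request is stuck.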

Fig. 7 is a structural block diagram of a hang-fault handling device for hard disks in a distributed storage system according to an exemplary embodiment of the present application. As shown in Fig. 7, the handling device includes a state management unit 201 and a resource cleaning unit 203.

The state management unit 201 is configured to, when a hard disk has a hang fault, mark the state of the target hard disk that has the hang fault as the hang-fault state.

The resource cleaning unit 203 is configured to clean up the system resources occupied by the hung management process corresponding to the target hard disk, so that a new management process for managing the target hard disk can be started.

As can be seen from the above technical solution, the hang-fault handling device provided by the embodiments of the present application, on the one hand, uses state marking to prevent the faulty hard disk from being accessed again and, on the other hand, cleans up the system resources occupied because of the faulty hard disk so that they can be reallocated to other processes. This reduces the adverse effects that a hard disk hang fault may cause and limits the damage. Moreover, the handling device does not rely on detection tools provided by the hard disk vendor, does not require new hardware to be added to the hard disk, and does not require human intervention. It is simple to implement and does not affect the production or usage cost of the hard disk.

In one feasible embodiment of the present application, after marking the state of the target hard disk as the hang-fault state, the state management unit 201 may also save that hang-fault state to a normal hard disk.

By synchronizing state between different hard disks, this embodiment ensures that a hung hard disk cannot be used again even if it temporarily becomes available, thereby preventing the hang fault from recurring.
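Saving the hang-fault state to a normal hard disk can be sketched as below. The description does not specify a storage format, so this example assumes a small JSON file kept in a directory on a healthy disk; the file name `disk_states.json`, the `"HANG"` state string, and the function names are hypothetical.

```python
import json
import os
from pathlib import Path

STATE_FILE = "disk_states.json"  # hypothetical file name


def load_states(healthy_disk_dir) -> dict:
    """Read the recorded disk states; an absent file means no disk is marked."""
    path = Path(healthy_disk_dir) / STATE_FILE
    if not path.exists():
        return {}
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


def mark_hang(healthy_disk_dir, disk_id: str) -> None:
    """Record that disk_id is in the hang-fault state on a healthy disk."""
    states = load_states(healthy_disk_dir)
    states[disk_id] = "HANG"
    path = Path(healthy_disk_dir) / STATE_FILE
    tmp = path.with_suffix(".tmp")
    # Write to a temporary file first, then rename it over the old file,
    # so that a crash mid-write does not leave a truncated state file.
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(states, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)
```

Writing the record to a disk other than the hung one matters because a write to the hung disk would itself never return.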

In one feasible embodiment of the present application, to clean up the system resources occupied by the hung management process of the target hard disk, the resource cleaning unit 203 is specifically configured to request new memory and, through the new memory, perform the following two operations to clear the memory resources occupied by the hung management process: look up all memory segments occupied by the hung process, and release the memory mapping corresponding to each memory segment.

Referring to Fig. 8, in another feasible embodiment of the present application, the above fault handling device may further include a request cleaning unit 202. The request cleaning unit 202 is configured to eject each access request cached in the input queue of the target hard disk and return the fault information of the target hard disk.
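The behaviour of the request cleaning unit 202 can be sketched as follows. The shape of the fault information is not given in the description, so the response dictionary and the name `drain_with_fault` are assumptions made for the example.

```python
from collections import deque
from typing import Deque, Dict, List


def drain_with_fault(input_queue: Deque[int], disk_id: str) -> List[Dict]:
    """Eject every cached request and answer it with the disk's fault information.

    Requests are ejected in FIFO order and the queue is left empty, so no
    caller keeps waiting on a disk that will never respond.
    """
    responses = []
    while input_queue:
        request_id = input_queue.popleft()
        responses.append(
            {
                "request_id": request_id,
                "ok": False,
                "error": f"disk {disk_id} is in the hang-fault state",
            }
        )
    return responses
```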

Still referring to Fig. 8, the above fault handling device may further include an availability supervision unit 204. The availability supervision unit 204 is configured to determine the state of the target hard disk each time the management process of the target hard disk is started and, when the state of the target hard disk is the hang-fault state, to forbid access to the target hard disk.

The availability supervision unit thus supervises the availability of the target hard disk in real time and, when the target hard disk has a hang fault, rejects all access requests directed at the faulty hard disk, so that the disk is not accessed again and does not cause the process to hang again.
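The check performed by the availability supervision unit 204 can be sketched as below. This is an illustration only: the class `AvailabilitySupervisor`, the exception `DiskUnavailableError`, and the `"HANG"` state string are hypothetical names, and the recorded states are assumed to be available as a dictionary when the management process starts.

```python
from typing import Dict, Iterable, List


class DiskUnavailableError(Exception):
    """Raised when a request targets a disk marked as hung."""


class AvailabilitySupervisor:
    def __init__(self, recorded_states: Dict[str, str]) -> None:
        # Determined once, each time the management process starts.
        self._disabled = {
            disk for disk, state in recorded_states.items() if state == "HANG"
        }

    def check_access(self, disk_id: str) -> None:
        """Reject any access request directed at a disabled disk."""
        if disk_id in self._disabled:
            raise DiskUnavailableError(f"disk {disk_id} is in the hang-fault state")

    def usable_disks(self, all_disks: Iterable[str]) -> List[str]:
        """Disks the new management process may manage; hung disks are ignored."""
        return [d for d in all_disks if d not in self._disabled]
```

Rejecting the request before it reaches an IO thread is what keeps the new management process from blocking on the same disk again.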

The embodiments in this specification are described progressively. For parts that are identical or similar between embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device and system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.

The above is only a specific implementation of the present application, provided to enable those skilled in the art to understand or implement it. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A hang-fault detection method for hard disks in a distributed storage system, characterized by comprising:

2. The detection method according to claim 1, characterized by further comprising:

creating an IO thread group corresponding to the target hard disk;

3. The detection method according to claim 1 or 2, characterized in that detecting the execution time of each access request corresponding to the target hard disk comprises:

4. A hang-fault handling method for hard disks in a distributed storage system, characterized by comprising:

5. The fault handling method according to claim 4, characterized in that cleaning up the system resources occupied by the hung management process corresponding to the target hard disk comprises:

6. The fault handling method according to claim 4 or 5, wherein, before cleaning up the system resources occupied by the hung management process corresponding to the target hard disk, the method further comprises:

ejecting each access request cached in the input queue of the target hard disk, and returning the fault information of the target hard disk.

7. The fault handling method according to claim 4 or 5, characterized by further comprising:

8. The fault handling method according to claim 4 or 5, characterized by further comprising:

9. A hang-fault detection device for hard disks in a distributed storage system, characterized by comprising:

10. The fault detection device according to claim 9, characterized by further comprising:

11. The fault detection device according to claim 9 or 10, characterized in that, to detect the execution time of each access request corresponding to the target hard disk, the detection unit is specifically configured to:

12. A hang-fault handling device for hard disks in a distributed storage system, characterized by comprising:

13. The fault handling device according to claim 12, characterized in that, to clean up the system resources occupied by the hung management process of the target hard disk, the resource cleaning unit is specifically configured to:

request new memory and, through the new memory, perform the following two operations to clear the memory resources occupied by the hung management process: look up all memory segments occupied by the hung process, and release the memory mapping corresponding to each memory segment.

14. The fault handling device according to claim 12 or 13, characterized by further comprising:

15. The fault handling device according to claim 12 or 13, characterized by further comprising:

16. The fault handling device according to claim 12 or 13, characterized in that the state management unit is further configured to: