CN107273231A - Hard disk hang fault detection and handling methods and devices for a distributed storage system
- Publication number
- CN107273231A (application number CN201610212740.0A)
- Authority
- CN
- China
- Prior art keywords
- hard disk
- target hard
- target
- failure
- tangled
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0709—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present application discloses methods and devices for detecting and handling hard disk hang faults in a distributed storage system. Whether a target hard disk has a hang fault is judged by detecting the execution time of the access requests corresponding to that disk, so that a hang fault of the target hard disk can be found in time. After a hang fault is found, on the one hand the faulty hard disk is marked with a state that prevents it from being accessed again; on the other hand the system resources occupied by the faulty hard disk are cleaned up so that other processes can be allocated those resources, which reduces the adverse effects a hang fault may cause and limits the damage. The hang fault detection and handling scheme provided by the present application therefore does not rely on a detection tool provided by the hard disk vendor, does not require new hardware to be added to the hard disk, and does not require human intervention; it is simple to implement and does not affect the production or usage cost of the hard disk.
Description
Technical field
The present application relates to the field of computer technology, and in particular to methods and devices for detecting and handling hard disk hang faults in a distributed storage system.
Background art
A distributed storage system is a storage system built on top of local file systems; it spreads data across multiple hard disks. In a distributed storage system, a fault can occur anywhere on the link from the local file system down to the inside of each hard disk. One such fault is the hard disk hang fault, in which the hard disk cannot respond to normal operations and all input/output operations on that disk can neither complete nor be aborted, because the link as a whole never replies. If a hung hard disk is not handled in time, the entire process accessing it may stop responding, so that none of the data managed by that process can be accessed, front-end request latency rises, system load increases, and data availability drops. Detecting hard disk hang faults in time and reducing their impact is therefore a key issue in ensuring the performance of a distributed storage system.
Existing methods for handling hard disk hang faults mainly include the following four: (1) using a tool provided by the hard disk vendor to send an offline command to the hard disk; the hard disk stops working after receiving the command, so that pending accesses to it can return and the hang state ends; (2) stopping the hard disk with a hardware switch, typically a component added to an existing hard disk that pulls down the disk's voltage directly and powers it off, thereby ending the hang state; (3) rebooting the machine, which resets the disk state but only possibly improves the hang state; (4) restarting the process directly, so that the new process can avoid using the hung hard disk.
However, each of these methods has drawbacks, such as depending on additional auxiliary tools or reducing the availability of system resources. Specifically, method (1) depends on a tool provided by the hard disk vendor and does not work when the hard disk cannot receive the offline command, so its success rate in practice is low. Method (2) requires new hardware (the hardware switch) to be added to the hard disk, which raises the cost of developing and maintaining the disk and narrows its range of application. Method (3) introduces human intervention; while the machine reboots, the availability of both the machine and the storage system is reduced, the reboot itself may fail, and even after a successful reboot the storage system must be able to avoid using the hung hard disk, which places high demands on the storage system. In method (4), the original process cannot release its memory because it has hung threads, so system memory usage stays high and the resources available to the system are reduced even after the restart. A hard disk hang fault handling method with a high success rate, a wide range of application, and little impact on system availability is therefore urgently needed.
Summary of the invention
The first technical problem to be solved by the present application is to detect hard disk hang faults in a distributed storage system automatically, without relying on auxiliary tools. To this end, the present application provides a hard disk hang fault detection method and device for a distributed storage system.
A first aspect of the present application provides a hard disk hang fault detection method for a distributed storage system, including:
detecting the execution time of each access request corresponding to a target hard disk;
judging whether there is a time-lag request, that is, an access request whose execution time exceeds the corresponding preset threshold;
if there is a time-lag request, determining that the target hard disk has a hang fault.
With reference to the first aspect, in a first feasible implementation of the first aspect, the fault detection method further includes:
creating an IO thread group corresponding to the target hard disk;
reading and processing each access request corresponding to the target hard disk through the IO thread group, so as to complete read/write operations on the target hard disk.
With reference to the first aspect or its first feasible implementation, in a second feasible implementation of the first aspect, detecting the execution time of each access request corresponding to the target hard disk includes:
detecting the execution time of the access request at the head of the input queue of the target hard disk.
A second aspect of the present application provides a hard disk hang fault detection device for a distributed storage system, including:
a detection unit, configured to detect the execution time of each access request corresponding to a target hard disk;
a comparing unit, configured to judge whether there is a time-lag request, that is, an access request whose execution time exceeds the corresponding preset threshold, and, if there is a time-lag request, to determine that the target hard disk has a hang fault.
With reference to the second aspect, in a first feasible implementation of the second aspect, the fault detection device further includes:
a process management unit, configured to create an IO thread group corresponding to the target hard disk, and to read and process each access request corresponding to the target hard disk through the IO thread group, so as to complete read/write operations on the target hard disk.
With reference to the second aspect or its first feasible implementation, in a second feasible implementation of the second aspect, to detect the execution time of each access request corresponding to the target hard disk, the detection unit is specifically configured to:
detect the execution time of the access request at the head of the input queue of the target hard disk.
As can be seen from the above technical solutions, the embodiments of the present application judge whether a target hard disk has a hang fault by detecting the execution time of the access requests corresponding to that disk, so a hang fault of the target hard disk can be found in time. This hang fault detection approach does not rely on a detection tool provided by the hard disk vendor, does not require new hardware to be added to the hard disk, and does not require human intervention; it is simple to implement and does not affect the production or usage cost of the hard disk.
The second technical problem to be solved by the present application is to handle hard disk hang faults in a distributed storage system automatically, without relying on auxiliary tools. To this end, the present application provides a hard disk hang fault handling method and device for a distributed storage system.
A third aspect of the present application provides a hard disk hang fault handling method for a distributed storage system, including:
when a hang fault occurs on a target hard disk, marking the state of the target hard disk as the hang fault state;
cleaning up the system resources occupied by the hung management process corresponding to the target hard disk, so that a new management process for managing the target hard disk can be started.
With reference to the third aspect, in a first feasible implementation of the third aspect, cleaning up the system resources occupied by the hung management process corresponding to the target hard disk includes:
allocating new memory, and performing the following two steps in the new memory to clear the memory resources occupied by the hung management process:
finding all memory segments occupied by the hung process;
releasing the memory mapping corresponding to each memory segment.
With reference to the third aspect or its first feasible implementation, in a second feasible implementation of the third aspect, the fault handling method further includes:
before cleaning up the system resources occupied by the hung management process corresponding to the target hard disk, ejecting each access request cached in the input queue of the target hard disk, and returning fault information for the target hard disk.
With reference to the third aspect or its first feasible implementation, in a third feasible implementation of the third aspect, the fault handling method further includes:
each time the management process of the target hard disk is started, determining the state of the target hard disk;
if the state of the target hard disk is the hang fault state, forbidding access to the target hard disk.
With reference to the third aspect or its first feasible implementation, in a fourth feasible implementation of the third aspect, the fault handling method further includes:
saving the hang fault state of the target hard disk to a normal hard disk.
A fourth aspect of the present application provides a hard disk hang fault handling device for a distributed storage system, including:
a state management unit, configured to mark the state of a target hard disk as the hang fault state when a hang fault occurs on the target hard disk;
a resource cleanup unit, configured to clean up the system resources occupied by the hung management process corresponding to the target hard disk, so that a new management process for managing the target hard disk can be started.
With reference to the fourth aspect, in a first feasible implementation of the fourth aspect, to clean up the system resources occupied by the hung management process of the target hard disk, the resource cleanup unit is specifically configured to allocate new memory and perform the following two steps in the new memory to clear the memory resources occupied by the hung management process: finding all memory segments occupied by the hung process, and releasing the memory mapping corresponding to each memory segment.
With reference to the fourth aspect or its first feasible implementation, in a second feasible implementation of the fourth aspect, the fault handling device further includes:
a request cleanup unit, configured to eject each access request cached in the input queue of the target hard disk and return fault information for the target hard disk.
With reference to the fourth aspect or its first feasible implementation, in a third feasible implementation of the fourth aspect, the fault handling device further includes:
an availability supervision unit, configured to determine the state of the target hard disk each time the management process of the target hard disk is started, and to forbid access to the target hard disk when its state is the hang fault state.
With reference to the fourth aspect or its first feasible implementation, in a fourth feasible implementation of the fourth aspect, the state management unit is further configured to save the hang fault state of the target hard disk to a normal hard disk.
As can be seen from the above technical solutions, after a hang fault is found on a target hard disk, the embodiments of the present application prevent the faulty hard disk from being accessed again by marking its state, and clean up the system resources occupied by the faulty hard disk so that other processes can be allocated those resources, which reduces the adverse effects a hang fault may cause and limits the damage. The hang fault handling scheme provided by the embodiments therefore does not rely on a detection tool provided by the hard disk vendor, does not require new hardware to be added to the hard disk, and does not require human intervention; it is simple to implement and does not affect the production or usage cost of the hard disk.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only and do not limit the present application.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of a hard disk hang fault detection method for a distributed storage system according to an exemplary embodiment of the present application.
Fig. 2 is a schematic diagram of the access request processing flow in a data storage node of the distributed storage system according to an exemplary embodiment of the present application.
Fig. 3 is a flowchart of a hard disk hang fault handling method for a distributed storage system according to an exemplary embodiment of the present application.
Fig. 4 is a flowchart of another hard disk hang fault handling method for a distributed storage system according to an exemplary embodiment of the present application.
Fig. 5 is a sequence diagram of the hard disk hang fault detection and handling method for a distributed storage system according to an exemplary embodiment of the present application.
Fig. 6 is a structural block diagram of a hard disk hang fault detection device for a distributed storage system according to an exemplary embodiment of the present application.
Fig. 7 is a structural block diagram of a hard disk hang fault handling device for a distributed storage system according to an exemplary embodiment of the present application.
Fig. 8 is a structural block diagram of another hard disk hang fault handling device for a distributed storage system according to an exemplary embodiment of the present application.
Detailed description of the embodiments
Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of devices and methods consistent with some aspects of the present application as detailed in the appended claims.
Numerous specific details are set forth in the following detailed description to provide a thorough understanding of the present application, but those skilled in the art will understand that the application can be practiced without these details. In other embodiments, well-known methods, processes, components, and circuits are not described in detail, so as not to obscure the embodiments unnecessarily.
Fig. 1 is a schematic flowchart of a hard disk hang fault detection method for a distributed storage system according to an exemplary embodiment of the present application. As shown in Fig. 1, the detection method includes:
S101: detect the execution time of each access request corresponding to the target hard disk.
The access requests may include read requests (Output) that read data from the target hard disk and write requests (Input) that write or modify data on the target hard disk; they are uniformly scheduled and executed by the management process of the target hard disk.
S102: judge whether there is a time-lag request, that is, an access request whose execution time exceeds the corresponding preset threshold.
S103: if there is a time-lag request, determine that the target hard disk has a hang fault.
In practice, whatever causes a hard disk to hang (hardware damage, read/write overload, and so on), the direct symptom at least includes that the access requests being executed on the hard disk do not finish for a long time. In view of this, the embodiments of the present application judge whether the target hard disk has a hang fault by detecting the execution time of its access requests, so that a hang fault can be found and handled in time. Moreover, the hang fault detection method provided by the embodiments can be configured to run automatically within the management process of the target hard disk; it does not rely on a detection tool provided by the hard disk vendor, does not require new hardware to be added to the hard disk, and does not require human intervention. It is simple to implement and does not affect the production or usage cost of the hard disk.
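Steps S101 to S103 can be sketched roughly as follows. This is only an illustration under assumptions: the `InFlightRequest` type, its field names, and the two helper functions are hypothetical and do not come from the application, which does not prescribe any data structure or threshold value.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class InFlightRequest:
    """Hypothetical record of an access request that is currently executing."""
    request_id: int
    started_at: float          # monotonic timestamp taken when execution began
    threshold_seconds: float   # preset threshold for this request (assumed value)


def find_time_lag_request(
    requests: Iterable[InFlightRequest], now: Optional[float] = None
) -> Optional[InFlightRequest]:
    """S101/S102: return the first request whose execution time exceeds its threshold."""
    now = time.monotonic() if now is None else now
    for req in requests:
        if now - req.started_at > req.threshold_seconds:
            return req
    return None


def disk_is_hung(requests: Iterable[InFlightRequest], now: Optional[float] = None) -> bool:
    """S103: the target disk is judged to be hung if any time-lag request exists."""
    return find_time_lag_request(requests, now) is not None
```

The `now` parameter exists only so the check can be exercised with fixed timestamps; a monitor running inside the management process would normally leave it unset.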
In a feasible embodiment of the present application, the hard disk hang fault detection method may further include the following steps:
S104: set up a corresponding IO thread group for the target hard disk.
S105: read and process each access request corresponding to the target hard disk through the IO thread group, so as to complete read/write operations on the target hard disk.
To make it easier to manage the hard disk and provide read/write (I/O) service for it, the embodiment sets up a dedicated IO thread group for the target hard disk; that is, a group of IO threads is created within the management process of the target hard disk. These threads share the system resources of the management process and are used only to handle access requests to that target hard disk. In the prior art, a single group of IO threads serves the read/write operations of all hard disks at once, or user threads serve hard disk read/write operations directly. By contrast, the embodiment sets up a separate IO thread group for each hard disk, which avoids the situation where a fault that is not a hard disk fault, or a fault of one hard disk, hangs the IO threads and thereby affects read/write operations on all hard disks. In addition, the threads in an IO thread group execute different access requests in parallel. This improves the efficiency of access request processing, i.e. the read/write speed of the hard disk, and it also means that when one thread hangs while executing an access request, the other threads can continue to process other access requests unaffected.
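The isolation that per-disk IO thread groups provide can be illustrated with a small sketch. It assumes one standard-library thread pool per disk; the class name `DiskIOThreadGroups`, the disk identifiers, and the pool size are hypothetical choices for illustration, not details from the application.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Iterable


class DiskIOThreadGroups:
    """Hypothetical helper: one dedicated group of IO threads per disk."""

    def __init__(self, disk_ids: Iterable[str], threads_per_disk: int = 2) -> None:
        self._pools: Dict[str, ThreadPoolExecutor] = {
            disk_id: ThreadPoolExecutor(
                max_workers=threads_per_disk,
                thread_name_prefix=f"io-{disk_id}",
            )
            for disk_id in disk_ids
        }

    def submit(self, disk_id: str, operation: Callable, *args) -> Future:
        """Run a read/write operation on the thread group bound to its disk."""
        return self._pools[disk_id].submit(operation, *args)

    def shutdown(self) -> None:
        """Wait for outstanding operations and stop every thread group."""
        for pool in self._pools.values():
            pool.shutdown(wait=True)
```

Because each disk has its own threads, an operation that blocks on one disk occupies only that disk's group; requests submitted for other disks still run.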
Accordingly, based on the IO thread group, detecting the execution time of each access request corresponding to the target hard disk in step S101 may specifically be detecting the time the IO thread group takes to execute each access request.
Further, in another feasible embodiment of the present application, the detection method may also include: setting up a corresponding input queue for each target disk.
Fig. 2 is a schematic flowchart of access request (I/O request) processing in any data storage node Y of the embodiment. For a disk X in data storage node Y, a group of IO threads is set up; for ease of distinction they are numbered T1 to Tn in Fig. 2. Correspondingly, each IO thread is given its own input queue, namely the n IO queues numbered Q1 to Qn in Fig. 2, which correspond one-to-one with the IO threads. After data storage node Y receives an I/O request from a client, it first processes the request to determine which part of the data on which disk the request accesses, and puts different I/O requests for the same data on the same disk into the same IO queue. Access to the same data is thus serialized, which prevents two I/O requests from accessing the same data at the same time. In addition, data storage node Y may also have an IO thread (T0) that is not bound to any disk, with a corresponding IO queue (Q0), for operations that concern node Y as a whole. A complete distributed storage system may include many data storage nodes in parallel with data storage node Y, and each of them may use the I/O request processing flow shown in Fig. 2.
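One way to place all requests for the same data into the same IO queue is to derive the queue index from a stable hash of a data key. The sketch below assumes this; the application does not say how the mapping is computed, and `pick_queue_index`, `DiskRequestRouter`, and the use of CRC32 are illustrative assumptions.

```python
import queue
import zlib
from typing import Any, List


def pick_queue_index(data_key: str, queue_count: int) -> int:
    """Map a data key to a stable queue index (assumed scheme: CRC32 modulo n)."""
    return zlib.crc32(data_key.encode("utf-8")) % queue_count


class DiskRequestRouter:
    """Hypothetical router for one disk with queues Q1..Qn."""

    def __init__(self, queue_count: int) -> None:
        self.queues: List[queue.Queue] = [queue.Queue() for _ in range(queue_count)]

    def enqueue(self, data_key: str, request: Any) -> int:
        """Put the request into the queue that owns data_key; return its index."""
        index = pick_queue_index(data_key, len(self.queues))
        self.queues[index].put(request)
        return index
```

Since each queue is drained by exactly one IO thread, two requests that share a data key are executed one after the other rather than concurrently.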
Accordingly, detecting the execution time of each access request corresponding to the target hard disk in step S101 may specifically be: detecting the execution time of the access request at the head of the input queue of the target hard disk.
The input queue may be set up as a first-in-first-out (First Input First Output, FIFO) queue. Taking IO queue Q1 in Fig. 2 as an example, access requests are stored in Q1 in order of arrival, and the earlier a request entered Q1, the closer it is to the head of Q1. The corresponding IO thread T1 reads one access request from the head of Q1 each time and executes it, completing the corresponding disk operation H1. Each time T1 reads the access request at the head of the queue, it starts timing the execution of that request and keeps timing until the request finishes. If the timer reaches the preset threshold and the access request has still not finished, the execution time of the request has exceeded the preset threshold; the request can then be judged to be a time-lag request, the corresponding IO thread T1 is hung, and disk X can in turn be judged to have a hang fault.
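The head-of-queue timing described above might look like the following sketch. The class `TimedIOQueue` and its method names are hypothetical; the injectable `clock` argument is an assumption added so the timing logic can be checked without waiting in real time.

```python
import collections
import threading
import time
from typing import Any, Callable, Optional


class TimedIOQueue:
    """Hypothetical FIFO queue that records when the head request began executing."""

    def __init__(self, threshold_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lock = threading.Lock()
        self._pending: collections.deque = collections.deque()
        self._threshold = threshold_seconds
        self._clock = clock
        self._started_at: Optional[float] = None  # None while nothing is executing

    def put(self, request: Any) -> None:
        with self._lock:
            self._pending.append(request)

    def begin_next(self) -> Any:
        """IO thread: take the request at the head of the queue and start timing it.

        Raises IndexError if the queue is empty.
        """
        with self._lock:
            request = self._pending.popleft()
            self._started_at = self._clock()
            return request

    def finish_current(self) -> None:
        """IO thread: the request taken by begin_next() has completed."""
        with self._lock:
            self._started_at = None

    def head_is_time_lag(self) -> bool:
        """Monitor: has the executing request run longer than the threshold?"""
        with self._lock:
            if self._started_at is None:
                return False
            return self._clock() - self._started_at > self._threshold
```

In this arrangement the IO thread calls `begin_next()` and `finish_current()` around each disk operation, while a separate monitor thread polls `head_is_time_lag()`; a hung IO thread never reaches `finish_current()`, so the monitor eventually sees the threshold exceeded.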
As can be seen, based on the input queue of the target hard disk, the embodiment executes access requests in dequeue order and times their execution, so the execution time of each access request can be obtained accurately. Time-lag requests are thus found in time and the hard disk hang fault is determined, which lays the foundation for handling the hung hard disk promptly.
The embodiments of the present application also provide a hard disk hang fault handling method for a distributed storage system; Fig. 3 shows a flowchart of this method.
As shown in Fig. 3, the hang fault handling method includes the following steps:
S201: when a hang fault occurs on a target hard disk, mark the state of the target hard disk as the hang fault state.
S203: clean up the system resources occupied by the hung management process corresponding to the target hard disk.
When a hard disk is judged to have a hang fault, whether by the hang fault detection method described above or by another feasible detection method, the handling method of this embodiment can then be executed. Specifically, step S201 is in effect a disk state management operation: it marks the hard disk that has the hang fault as being in the hang fault state. Once a hard disk has been marked as being in the hang fault state, it is not allowed to be re-marked as normal, which prevents the hang fault from occurring again. Step S203 is in effect a resource cleanup operation for the hung hard disk. As described in the detection embodiments above, when a hang fault occurs on the target hard disk there must be at least one time-lag request, which means the management process of the target hard disk is hung. Step S203 cleans up the system resources occupied by this hung management process, for example by closing the file handles it has opened.
Cleaning up system resources in step S203 has two effects. First, the system resources occupied by the hung management process can be reallocated and requested by other processes. Second, once the system resources occupied by the hung process have been cleaned up, the hung management process exits automatically, i.e. its hang state is released, and a new management process can then be created and started to manage the target hard disk.
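The one-way state marking of step S201 can be sketched as a small state table in which the hang fault state is sticky. The names `DiskState`, `DiskStateTable`, and `HungDiskError` are hypothetical and serve only to illustrate the rule that a disk marked as hung may not be re-marked as normal.

```python
import enum
from typing import Dict


class DiskState(enum.Enum):
    NORMAL = "normal"
    HUNG = "hung"


class HungDiskError(Exception):
    """Raised when code tries to reset a disk that is marked as hung."""


class DiskStateTable:
    """Hypothetical in-memory table of disk states with a sticky HUNG state."""

    def __init__(self) -> None:
        self._states: Dict[str, DiskState] = {}

    def get(self, disk_id: str) -> DiskState:
        # Disks that were never marked are treated as normal.
        return self._states.get(disk_id, DiskState.NORMAL)

    def mark_hung(self, disk_id: str) -> None:
        """S201: record that the disk has a hang fault."""
        self._states[disk_id] = DiskState.HUNG

    def mark_normal(self, disk_id: str) -> None:
        """Refuse to move a disk out of the hang fault state."""
        if self.get(disk_id) is DiskState.HUNG:
            raise HungDiskError(f"{disk_id} is marked hung and cannot be reset")
        self._states[disk_id] = DiskState.NORMAL
```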
As can be seen, the hard disk hang fault handling method provided by the embodiments prevents the faulty hard disk from being accessed again by marking its state, and cleans up the system resources occupied by the faulty hard disk so that other processes can be allocated those resources, which reduces the adverse effects a hang fault may cause and limits the damage. Moreover, the handling method does not rely on a detection tool provided by the hard disk vendor, does not require new hardware to be added to the hard disk, and does not require human intervention; it is simple to implement and does not affect the production or usage cost of the hard disk.
In a feasible embodiment of the present application, after the state of the target hard disk is marked as the hang fault state in step S201, the following step may also be performed: saving the hang fault state to a normal hard disk.
The normal hard disk may be the system disk of the whole distributed storage system, or another hard disk that is communicatively connected to the target hard disk. This synchronizes the state directly between hard disks and ensures that a hung hard disk is not used again even if it temporarily becomes available, thereby preventing the hang fault from occurring again.
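Saving the hang fault state to a normal hard disk could be done with a small state file, as in the sketch below. The JSON format, the file location, and the function names `load_hung_disks` and `save_hung_state` are assumptions for illustration; the application does not specify how the state is stored.

```python
import json
import os
import tempfile
from typing import Set


def load_hung_disks(state_path: str) -> Set[str]:
    """Read the set of disks recorded as hung; empty if no state file exists yet."""
    try:
        with open(state_path) as f:
            return set(json.load(f))
    except FileNotFoundError:
        return set()


def save_hung_state(state_path: str, disk_id: str) -> None:
    """Record disk_id as hung in a JSON file that lives on a healthy disk."""
    hung = load_hung_disks(state_path)
    hung.add(disk_id)
    directory = os.path.dirname(os.path.abspath(state_path))
    # Write to a temporary file first so a crash never leaves a half-written state file.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump(sorted(hung), tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp_path, state_path)  # atomic when both paths are on one filesystem
```

Keeping the file on a disk other than the hung one matters here: a write to the hung disk itself would be likely to block in the same way as the original request.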
Further, in a feasible embodiment of the present application, cleaning up the system resources occupied by the hung management process corresponding to the target hard disk in step S203 may specifically include the following steps:
S2031: allocate new memory, and perform steps S2032 and S2033 below in that new memory.
The present application performs the specific cleanup steps in the new memory, rather than in the memory originally allocated to the management process of the target hard disk.
S2032: find all memory segments occupied by the hung process.
The memory space allocated to the management process of the target hard disk usually consists of multiple memory segments, and all of them must be found for the cleanup to be complete. Specifically, under the Linux operating system the memory segments can be obtained from the file /proc/self/smaps.
S2033: release the memory mapping corresponding to each memory segment.
When system resources are allocated to the management process, an mmap operation is typically used to establish a mapping between a system resource (such as a file) and a memory segment. Accordingly, when cleaning up the memory usage, munmap can be used to release the memory mapping of each memory segment.
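Step S2032 can be illustrated by parsing smaps-format text into (start address, length) pairs, which are the two arguments a later munmap call would need for each segment. The sketch below only lists segments and deliberately does not unmap anything, because unmapping the memory of a running Python process would crash it; the function names are hypothetical.

```python
import re
from typing import List, Tuple

# A mapping header in smaps looks like:
#   00400000-00452000 r-xp 00000000 08:02 173521   /usr/bin/example
# The detail lines that follow it (Size:, Rss:, VmFlags:, ...) do not match.
_SEGMENT_HEADER = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s+[rwxsp-]{4}\s")


def parse_memory_segments(smaps_text: str) -> List[Tuple[int, int]]:
    """Return (start_address, length_in_bytes) for every mapping in smaps text."""
    segments = []
    for line in smaps_text.splitlines():
        match = _SEGMENT_HEADER.match(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            segments.append((start, end - start))
    return segments


def read_own_segments(path: str = "/proc/self/smaps") -> List[Tuple[int, int]]:
    """Linux only: list the memory segments of the calling process."""
    with open(path) as f:
        return parse_memory_segments(f.read())
```

A native cleanup routine would walk this list and call munmap(start, length) on each entry, skipping the newly allocated region it is itself running from.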
It can be seen that, above-mentioned steps S2032 and S2033 actual is to perform cleaning by new internal memory to be tangled shared by managing process
Memory source operation, the implementation procedure of the operation also without extra hardware toolses and human intervention, it is simple easily
OK;And relative to the operation is directly performed in the internal memory that the managing process of the target hard disk was allocated originally, can avoid
The thread for performing the cleanup step is also tangled with managing process.
Referring to Fig. 4, in another feasible embodiment of the present application, the above method for handling a hard disk hang fault in a distributed storage system further comprises the following step:
S202: pop each access request cached in the input queue of the target hard disk, and return fault information for the target hard disk.
Because the management process of the target hard disk is hung, the requests cached in the input queue of the target hard disk (that is, requests that have not yet been processed) cannot be processed either. This embodiment pops these access requests and returns fault information for the target hard disk, such as "hard disk error", to the user, so that the user neither keeps waiting for responses to the unprocessed requests nor sends further access requests to the target hard disk.
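Step S202 can be sketched as draining a FIFO queue and answering every pending request with the fault message. This is a minimal illustration under assumptions: the function drain_input_queue and the reply callback are hypothetical names, and how a reply actually reaches the user is left to the caller.

```python
from collections import deque
from typing import Callable, Deque


def drain_input_queue(
    queue: Deque[str],
    reply: Callable[[str, str], None],
    fault_message: str = "hard disk error",
) -> int:
    """Pop every cached request in FIFO order and answer it with the fault message.

    Returns the number of requests that were popped.
    """
    drained = 0
    while queue:
        request = queue.popleft()      # oldest cached request first
        reply(request, fault_message)  # tell the requester the disk has failed
        drained += 1
    return drained
```

After the call the queue is empty, so no requester is left waiting on a request that the hung management process can never serve.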
Still referring to Fig. 4, in another feasible embodiment of the present application, the above handling method further comprises:
S204: each time the management process of the target hard disk is started, determine the state of the target hard disk, and forbid access to the target hard disk when its state is the hang fault state.
Step S204 supervises the availability of the target hard disk. It can be performed when a new hard disk is put into use, to start the hang fault detection steps for the target hard disk and thereby supervise its availability in real time. Step S204 can also be performed after step S203, that is, after the management process of the faulty hard disk is restarted: because the faulty hard disk was already marked as being in the hang fault state in step S201, step S204 can reject all access requests directed at that hard disk, preventing the faulty hard disk from being accessed again and hanging the process again.
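The check in step S204 can be sketched as a small access gate built from the recorded disk states when the management process starts. The class and constant names below (AccessGate, DiskUnavailableError, HANG_STATE) and the dictionary representation of the recorded states are assumptions made for illustration only.

```python
from typing import Dict, List

HANG_STATE = "hang"  # hypothetical label for the hang fault state


class DiskUnavailableError(Exception):
    """Raised when a request targets a disk marked as hung."""


class AccessGate:
    """Built once at startup from the recorded state of each disk."""

    def __init__(self, recorded_states: Dict[str, str]) -> None:
        self._blocked = {
            disk for disk, state in recorded_states.items() if state == HANG_STATE
        }

    def usable(self, disks: List[str]) -> List[str]:
        """Return the disks the new management process may serve."""
        return [disk for disk in disks if disk not in self._blocked]

    def check(self, disk: str) -> None:
        """Reject any access request aimed at a disk in the hang fault state."""
        if disk in self._blocked:
            raise DiskUnavailableError(f"{disk} is in the hang fault state")
```

A disk with no recorded state is treated as usable in this sketch; a stricter policy would be an equally valid design choice.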
In addition, Fig. 5 illustrates, in the form of a sequence diagram, the hard disk hang fault detection and handling flow of the distributed storage system described in the embodiments of the present application. Referring to Fig. 5, after the management process of a data storage node starts, a hung-disk detection thread is created and started accordingly; it periodically checks whether any disk in the data storage node has a hang fault (a hung disk). The detection performed by this thread is as follows: for each IO thread of a disk, check whether there is a request that has not returned an execution result for a long time (a time-lag request). If some IO thread of disk X has a time-lag request, disk X is hung. A hung-disk cleanup thread is then started to clean up the resources occupied by the management process of the entire data storage node, such as memory and file descriptors (FD), and the state of disk X is recorded on the system disk as the hang fault state (Hang state). The current management process is then restarted to obtain a new management process. After the new management process starts, it first identifies the state of each disk in the storage node, so as to disable (ignore) any disk marked as being in the Hang state.
From the description of the method embodiments above, those skilled in the art will clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a distributed storage system to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing data and program code, such as read-only memory (ROM), random access memory (RAM), magnetic disks, and optical discs.
Fig. 6 is a structural block diagram of a hard disk hang fault detection device for a distributed storage system according to an exemplary embodiment of the present application. As shown in Fig. 6, the detection device includes a detection unit 101 and a comparing unit 102.
The detection unit 101 is configured to detect the execution time of each access request corresponding to the target hard disk.
The comparing unit 102 is configured to determine whether there is a time-lag request whose execution time is greater than the corresponding preset threshold and, if such a time-lag request exists, to determine that the target hard disk has a hang fault.
As the above technical solution shows, the embodiments of the present application determine whether the target hard disk has a hang fault by detecting the execution time of the access requests corresponding to the target hard disk, so that a hang fault of the target hard disk can be discovered, and then handled, in time. Moreover, the embodiments of the present application do not rely on detection tools provided by the hard disk vendor, do not require new hardware to be added to the hard disk, and do not require human intervention; they are simple to carry out and do not affect the production or usage cost of the hard disk.
In one feasible embodiment of the present application, the above detection device may further include a process management unit. The process management unit is configured to create an IO thread group corresponding to the target hard disk, and to read and process, through the IO thread group, each access request corresponding to the target hard disk, so as to complete read and write operations on the target hard disk.
In another feasible embodiment of the present application, the detection unit 101 of the above detection device may be specifically configured to detect the execution time of the access request at the head of the input queue of the target hard disk.
That is, the access requests of the target hard disk are cached in its input queue according to the FIFO rule, and the management process of the target hard disk (more specifically, the IO thread group described above) reads access requests only from the head of the input queue and begins executing them. Therefore, when the access request at the head of the queue is read, a timer is started for the execution time of that request and runs until the request finishes executing. If the timer reaches the preset threshold and the request has still not finished, the execution time of the request has exceeded the preset threshold; the request can then be determined to be a time-lag request, and the corresponding target hard disk can be determined to have a hang fault.
As can be seen, the embodiments of the present application, based on the input queue of the target hard disk, execute the access requests in dequeue order and time their execution, so that the execution time of each access request is obtained accurately. Time-lag requests are thus discovered in time and the hard disk hang fault is identified, which lays the foundation for promptly handling a hard disk that has a hang fault.
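The head-of-queue timing described above can be sketched with a small timer object. This is an illustrative sketch, not the application's implementation: the class name HeadRequestTimer, its method names, and the choice to pass the current time in explicitly (for example from time.monotonic()) are all assumptions. The comparison uses "greater than the threshold", following the wording of the comparing unit.

```python
from typing import Optional


class HeadRequestTimer:
    """Tracks how long the request read from the head of the queue has been running."""

    def __init__(self, threshold_seconds: float) -> None:
        self.threshold_seconds = threshold_seconds
        self._started_at: Optional[float] = None

    def start(self, now: float) -> None:
        """Call when an IO thread reads the head request and begins executing it."""
        self._started_at = now

    def finish(self) -> None:
        """Call when the request has finished executing."""
        self._started_at = None

    def is_time_lag(self, now: float) -> bool:
        """True if a request is still running and has exceeded the threshold."""
        if self._started_at is None:
            return False
        return now - self._started_at > self.threshold_seconds
```

A detection thread would call is_time_lag periodically for each IO thread's timer and treat the disk as hung as soon as any call returns True.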
Fig. 7 is a structural block diagram of a hard disk hang fault handling device for a distributed storage system according to an exemplary embodiment of the present application. As shown in Fig. 7, the handling device includes a state management unit 201 and a resource cleanup unit 203.
The state management unit 201 is configured to, when a hard disk has a hang fault, mark the state of the target hard disk that has the hang fault as the hang fault state.
The resource cleanup unit 203 is configured to clean up the system resources occupied by the hung management process corresponding to the target hard disk, so that a new management process for managing the target hard disk can be started.
As the above technical solution shows, the hard disk hang fault handling device provided by the embodiments of the present application, on the one hand, uses the state mark to prevent the faulty hard disk from being accessed again and, on the other hand, cleans up the system resources occupied by the faulty hard disk so that they can be reallocated to and used by other processes. This reduces the adverse effects that a hard disk hang fault may cause and limits the loss. Moreover, the above handling device does not rely on detection tools provided by the hard disk vendor, does not require new hardware to be added to the hard disk, and does not require human intervention; it is simple to carry out and does not affect the production or usage cost of the hard disk.
In one feasible embodiment of the present application, after marking the state of the target hard disk as the hang fault state, the state management unit 201 may also save that hang fault state to the normal hard disks.
By synchronizing the state across different hard disks, this embodiment ensures that the hung hard disk cannot be used again even if it temporarily becomes available, thereby preventing the hang fault from recurring.
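Saving the hang fault state to the normal hard disks can be sketched as writing the same marker into a small state file on each healthy disk. Everything concrete here is an assumption for illustration: the file name disk_states.json, the JSON layout, and the function names save_hang_state and is_marked_hung do not come from the application.

```python
import json
from pathlib import Path
from typing import List

STATE_FILE_NAME = "disk_states.json"  # hypothetical per-disk state file


def save_hang_state(hung_disk: str, healthy_mounts: List[Path]) -> None:
    """Record the hung disk's state on every healthy disk."""
    for mount in healthy_mounts:
        state_file = Path(mount) / STATE_FILE_NAME
        states = json.loads(state_file.read_text()) if state_file.exists() else {}
        states[hung_disk] = "hang"
        state_file.write_text(json.dumps(states))


def is_marked_hung(disk: str, healthy_mounts: List[Path]) -> bool:
    """True if any healthy disk carries a hang mark for the given disk."""
    for mount in healthy_mounts:
        state_file = Path(mount) / STATE_FILE_NAME
        if state_file.exists():
            if json.loads(state_file.read_text()).get(disk) == "hang":
                return True
    return False
```

Because the mark is stored on several disks, a restarted management process can still see it even if one of those disks later becomes unavailable.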
In one feasible embodiment of the present application, to clean up the system resources occupied by the hung management process of the target hard disk, the resource cleanup unit 203 is specifically configured to allocate new memory and to perform the following two operations from that new memory, so as to clear the memory resources occupied by the hung management process: look up all memory segments occupied by the hung process, and release the memory mapping corresponding to each memory segment.
Referring to Fig. 8, in another feasible embodiment of the present application, the above fault handling device may further include a request cleanup unit 202. The request cleanup unit 202 is configured to pop each access request cached in the input queue of the target hard disk and to return fault information for the target hard disk.
Still referring to Fig. 8, the above fault handling device may further include an availability supervision unit 204. The availability supervision unit 204 is configured to determine the state of the target hard disk each time the management process of the target hard disk is started, and to forbid access to the target hard disk when its state is the hang fault state.
As can be seen, the availability supervision unit makes it possible to supervise the availability of the target hard disk in real time and, when the target hard disk has a hang fault, to reject all access requests directed at the faulty hard disk, preventing it from being accessed again and hanging the process again.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar between embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, see the corresponding parts of the description of the method embodiments.
The above is only a specific embodiment of the present application, provided to enable those skilled in the art to understand or implement the application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (16)
1. A hard disk hang fault detection method for a distributed storage system, characterized by comprising:
detecting the execution time of each access request corresponding to a target hard disk;
determining whether there is a time-lag request whose execution time is greater than a corresponding preset threshold;
if the time-lag request exists, determining that the target hard disk has a hang fault.
2. The detection method according to claim 1, characterized by further comprising:
creating an IO thread group corresponding to the target hard disk;
reading and processing, through the IO thread group, each access request corresponding to the target hard disk, so as to complete read and write operations on the target hard disk.
3. The detection method according to claim 1 or 2, characterized in that detecting the execution time of each access request corresponding to the target hard disk comprises:
detecting the execution time of the access request at the head of the input queue of the target hard disk.
4. A hard disk hang fault handling method for a distributed storage system, characterized by comprising:
when a target hard disk has a hang fault, marking the state of the target hard disk as a hang fault state;
cleaning up the system resources occupied by the hung management process corresponding to the target hard disk, so as to start a new management process for managing the target hard disk.
5. The fault handling method according to claim 4, characterized in that cleaning up the system resources occupied by the hung management process corresponding to the target hard disk comprises:
allocating new memory, and performing the following two operations from the new memory so as to clear the memory resources occupied by the hung management process;
looking up all memory segments occupied by the hung process;
releasing the memory mapping corresponding to each memory segment.
6. The fault handling method according to claim 4 or 5, further comprising, before cleaning up the system resources occupied by the hung management process corresponding to the target hard disk:
popping each access request cached in the input queue of the target hard disk, and returning fault information for the target hard disk.
7. The fault handling method according to claim 4 or 5, characterized by further comprising:
each time the management process of the target hard disk is started, determining the state of the target hard disk;
if the state of the target hard disk is the hang fault state, forbidding access to the target hard disk.
8. The fault handling method according to claim 4 or 5, characterized by further comprising:
saving the hang fault state of the target hard disk to the normal hard disks.
9. A hard disk hang fault detection device for a distributed storage system, characterized by comprising:
a detection unit, configured to detect the execution time of each access request corresponding to a target hard disk;
a comparing unit, configured to determine whether there is a time-lag request whose execution time is greater than a corresponding preset threshold and, if the time-lag request exists, to determine that the target hard disk has a hang fault.
10. The fault detection device according to claim 9, characterized by further comprising:
a process management unit, configured to create an IO thread group corresponding to the target hard disk, and to read and process, through the IO thread group, each access request corresponding to the target hard disk, so as to complete read and write operations on the target hard disk.
11. The fault detection device according to claim 9 or 10, characterized in that, to detect the execution time of each access request corresponding to the target hard disk, the detection unit is specifically configured to:
detect the execution time of the access request at the head of the input queue of the target hard disk.
12. A hard disk hang fault handling device for a distributed storage system, characterized by comprising:
a state management unit, configured to mark the state of a target hard disk as a hang fault state when the target hard disk has a hang fault;
a resource cleanup unit, configured to clean up the system resources occupied by the hung management process corresponding to the target hard disk, so as to start a new management process for managing the target hard disk.
13. The fault handling device according to claim 12, characterized in that, to clean up the system resources occupied by the hung management process of the target hard disk, the resource cleanup unit is specifically configured to:
allocate new memory, and perform the following two operations from the new memory so as to clear the memory resources occupied by the hung management process: look up all memory segments occupied by the hung process, and release the memory mapping corresponding to each memory segment.
14. The fault handling device according to claim 12 or 13, characterized by further comprising:
a request cleanup unit, configured to pop each access request cached in the input queue of the target hard disk and to return fault information for the target hard disk.
15. The fault handling device according to claim 12 or 13, characterized by further comprising:
an availability supervision unit, configured to determine the state of the target hard disk each time the management process of the target hard disk is started, and to forbid access to the target hard disk when the state of the target hard disk is the hang fault state.
16. The fault handling device according to claim 12 or 13, characterized in that the state management unit is further configured to:
save the hang fault state of the target hard disk to the normal hard disks.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610212740.0A CN107273231A (en) | 2016-04-07 | 2016-04-07 | Distributed memory system hard disk tangles fault detect, processing method and processing device |
TW106107797A TW201737111A (en) | 2016-04-07 | 2017-03-09 | Method and device for detecting and processing hard disk hanging fault in distributed storage system |
PCT/CN2017/077995 WO2017173927A1 (en) | 2016-04-07 | 2017-03-24 | Method and device for detecting and processing hard disk hanging fault in distributed storage system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610212740.0A CN107273231A (en) | 2016-04-07 | 2016-04-07 | Distributed memory system hard disk tangles fault detect, processing method and processing device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107273231A true CN107273231A (en) | 2017-10-20 |
Family
ID=60000846
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610212740.0A Pending CN107273231A (en) | 2016-04-07 | 2016-04-07 | Distributed memory system hard disk tangles fault detect, processing method and processing device |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN107273231A (en) |
TW (1) | TW201737111A (en) |
WO (1) | WO2017173927A1 (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108170375A (en) * | 2017-12-21 | 2018-06-15 | 创新科存储技术有限公司 | Transfinite guard method and device in a kind of distributed memory system |
CN108762913A (en) * | 2018-03-23 | 2018-11-06 | 阿里巴巴集团控股有限公司 | service processing method and device |
CN108776579A (en) * | 2018-06-19 | 2018-11-09 | 郑州云海信息技术有限公司 | A kind of distributed storage cluster expansion method, device, equipment and storage medium |
CN108932113A (en) * | 2018-06-28 | 2018-12-04 | 郑州云海信息技术有限公司 | A kind of disk management method, device, equipment and readable storage medium storing program for executing |
CN110688193A (en) * | 2018-07-04 | 2020-01-14 | 阿里巴巴集团控股有限公司 | Disk processing method and device |
CN110750213A (en) * | 2019-09-09 | 2020-02-04 | 华为技术有限公司 | Hard disk management method and device |
CN110795276A (en) * | 2018-08-01 | 2020-02-14 | 阿里巴巴集团控股有限公司 | Storage medium repairing method, computer equipment and storage medium |
CN110837428A (en) * | 2018-08-16 | 2020-02-25 | 杭州海康威视系统技术有限公司 | Storage device management method and device |
CN111897684A (en) * | 2020-07-15 | 2020-11-06 | 中国工商银行股份有限公司 | Disk fault simulation test method and device and electronic equipment |
WO2024082834A1 (en) * | 2022-10-18 | 2024-04-25 | 苏州元脑智能科技有限公司 | Disk arbitration area detection method and apparatus, device, and nonvolatile readable storage medium |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109739702A (en) * | 2018-12-18 | 2019-05-10 | 曙光信息产业股份有限公司 | Hard disk automated detection method |
CN109669828B (en) * | 2018-12-21 | 2021-11-26 | 郑州云海信息技术有限公司 | Hard disk detection method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6324490B1 (en) * | 1999-01-25 | 2001-11-27 | J&L Fiber Services, Inc. | Monitoring system and method for a fiber processing apparatus |
US20020001152A1 (en) * | 2000-06-29 | 2002-01-03 | Ikuko Iida | Disk controller for detecting hang-up of disk storage system |
CN101127233A (en) * | 2007-09-25 | 2008-02-20 | Ut斯达康通讯有限公司 | Hard disc error detection and fault-tolerant method in stream media uses |
CN101650669A (en) * | 2008-08-14 | 2010-02-17 | 英业达股份有限公司 | Method for executing disk read-write under multi-thread |
CN104734979A (en) * | 2015-04-07 | 2015-06-24 | 北京极科极客科技有限公司 | Control method for storage device externally connected with router |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7000154B1 (en) * | 2001-11-28 | 2006-02-14 | Intel Corporation | System and method for fault detection and recovery |
CN101296135A (en) * | 2008-06-27 | 2008-10-29 | 中兴通讯股份有限公司 | Fault information processing method and device |
CN103383689A (en) * | 2012-05-03 | 2013-11-06 | 阿里巴巴集团控股有限公司 | Service process fault detection method, device and service node |
CN103488544B (en) * | 2013-09-26 | 2016-08-17 | 华为技术有限公司 | Detect the treating method and apparatus of slow dish |
CN103761180A (en) * | 2014-01-11 | 2014-04-30 | 浪潮电子信息产业股份有限公司 | Method for preventing and detecting disk faults during cluster storage |
CN104461865A (en) * | 2014-11-04 | 2015-03-25 | 哈尔滨工业大学 | Cloud environment distributed file system reliability test suite |
- 2016-04-07 CN CN201610212740.0A patent/CN107273231A/en active Pending
- 2017-03-09 TW TW106107797A patent/TW201737111A/en unknown
- 2017-03-24 WO PCT/CN2017/077995 patent/WO2017173927A1/en active Application Filing
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108170375B (en) * | 2017-12-21 | 2020-12-18 | 创新科技术有限公司 | Overrun protection method and device in distributed storage system |
CN108170375A (en) * | 2017-12-21 | 2018-06-15 | 创新科存储技术有限公司 | Transfinite guard method and device in a kind of distributed memory system |
CN108762913A (en) * | 2018-03-23 | 2018-11-06 | 阿里巴巴集团控股有限公司 | service processing method and device |
CN108776579A (en) * | 2018-06-19 | 2018-11-09 | 郑州云海信息技术有限公司 | A kind of distributed storage cluster expansion method, device, equipment and storage medium |
CN108776579B (en) * | 2018-06-19 | 2021-10-15 | 郑州云海信息技术有限公司 | Distributed storage cluster capacity expansion method, device, equipment and storage medium |
CN108932113A (en) * | 2018-06-28 | 2018-12-04 | 郑州云海信息技术有限公司 | A kind of disk management method, device, equipment and readable storage medium storing program for executing |
CN110688193B (en) * | 2018-07-04 | 2023-05-09 | 阿里巴巴集团控股有限公司 | Disk processing method and device |
CN110688193A (en) * | 2018-07-04 | 2020-01-14 | 阿里巴巴集团控股有限公司 | Disk processing method and device |
CN110795276A (en) * | 2018-08-01 | 2020-02-14 | 阿里巴巴集团控股有限公司 | Storage medium repairing method, computer equipment and storage medium |
CN110837428A (en) * | 2018-08-16 | 2020-02-25 | 杭州海康威视系统技术有限公司 | Storage device management method and device |
CN110837428B (en) * | 2018-08-16 | 2023-09-19 | 杭州海康威视系统技术有限公司 | Storage device management method and device |
WO2021047234A1 (en) * | 2019-09-09 | 2021-03-18 | 华为技术有限公司 | Hard disk management method and apparatus |
CN110750213A (en) * | 2019-09-09 | 2020-02-04 | 华为技术有限公司 | Hard disk management method and device |
CN111897684A (en) * | 2020-07-15 | 2020-11-06 | 中国工商银行股份有限公司 | Disk fault simulation test method and device and electronic equipment |
CN111897684B (en) * | 2020-07-15 | 2023-08-15 | 中国工商银行股份有限公司 | Method and device for simulating and testing disk faults and electronic equipment |
WO2024082834A1 (en) * | 2022-10-18 | 2024-04-25 | 苏州元脑智能科技有限公司 | Disk arbitration area detection method and apparatus, device, and nonvolatile readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2017173927A1 (en) | 2017-10-12 |
TW201737111A (en) | 2017-10-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107273231A (en) | Distributed memory system hard disk tangles fault detect, processing method and processing device | |
CN105431862B (en) | For the key rotation of Memory Controller | |
US8365009B2 (en) | Controlled automatic healing of data-center services | |
US8862833B2 (en) | Selection of storage containers for thin-partitioned data storage based on criteria | |
CN109542645A (en) | A kind of method, apparatus, electronic equipment and storage medium calling service | |
CN108334396A (en) | The creation method and device of a kind of data processing method and device, resource group | |
CN107391268A (en) | service request processing method and device | |
CN106233269A (en) | Fine granulation bandwidth supply in Memory Controller | |
US20150058865A1 (en) | Management of bottlenecks in database systems | |
CN103226598A (en) | Method and device for accessing database and data base management system | |
CN109614276A (en) | Fault handling method, device, distributed memory system and storage medium | |
CN106598801A (en) | Coroutine monitoring method and apparatus | |
CN106484330A (en) | A kind of hybrid magnetic disc individual-layer data optimization method and device | |
CN110580195B (en) | Memory allocation method and device based on memory hot plug | |
CN102063338A (en) | Method and device for requesting exclusive resource | |
TWI759708B (en) | Method and apparatus for concurrently executing transactions in a blockchain and computer-readable storage medium and computing device | |
CN108196940A (en) | Delete the method and relevant device of container | |
CN109669822A (en) | The creation method and computer readable storage medium of electronic device, spare memory pool | |
CN107203451B (en) | Method and apparatus for handling failures in a storage system | |
CN107368324A (en) | A kind of component upgrade methods, devices and systems | |
CN102880467A (en) | Method for verifying Cache coherence protocol and multi-core processor system | |
US20090187614A1 (en) | Managing Dynamically Allocated Memory in a Computer System | |
CN112711462A (en) | Cloud platform virtual CPU hot binding method and device and computer readable storage medium | |
CN104734896A (en) | Method and system for acquiring running situations of service sub-systems | |
JP6651836B2 (en) | Information processing apparatus, shared memory management method, and shared memory management program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 1245441; Country of ref document: HK |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171020 |