CN105975358A

CN105975358A - Fault-tolerant method and system based on SCSI equipment

Info

Publication number: CN105975358A
Application number: CN201610284196.0A
Authority: CN
Inventors: 花瑞; 文刘飞
Original assignee: Shenzhen Sandstone Data Technology Co Ltd
Current assignee: Shenzhen Sandstone Data Technology Co Ltd
Priority date: 2016-05-03
Filing date: 2016-05-03
Publication date: 2016-09-28
Anticipated expiration: 2036-05-03
Also published as: CN105975358B

Abstract

The invention provides a fault-tolerance method and system based on SCSI devices. The method comprises the following steps: an upper-layer application system issues an access request to a SCSI device through the generic block device layer; the generic block device layer receives and processes the access request and returns a processing result to the upper-layer application system; the upper-layer application system receives the processing result and, after determining that the request has failed, constructs a SCSI read/write command, re-initiates access to the SCSI device through the SG character device, and receives the response returned by the SCSI device; and error handling is performed according to the response returned by the SCSI device. With this method and system, errors of the underlying SCSI device can be perceived accurately and the utilization rate of hard disks is improved.

Description

A fault-tolerance method and system based on SCSI devices

Technical field

The present invention relates to the field of Internet technology, and in particular to a fault-tolerance method and system based on SCSI devices.

Background art

With the development of network and communication technology, applications such as cloud storage, communication networks, and the Internet need to store large amounts of data and to access the stored data without interruption. This calls for storage devices that offer a large cache, high data throughput, and low cost, and that use the Small Computer System Interface (SCSI). Common SCSI devices include hard disks, CD-ROM drives, DVD drives, and tape controllers. Because SCSI storage devices must be accessed continuously, SCSI access errors inevitably occur. Only by identifying and handling these errors promptly and accurately can data safety be ensured and the stability and reliability of the service be maintained.

The SCSI bus protocol defines error information codes that are returned with a command response to indicate why a command failed or what abnormal state the device is in. In the Linux operating system, the SCSI driver obtains and processes this SCSI error code information. The SCSI driver has a layered architecture with three layers: upper, middle, and lower. Error codes are handled in the middle and upper layers, mainly in one of two ways: retrying, or directly notifying the upper-layer application that the I/O command was not executed successfully.

Under the Linux operating system, an application can access a SCSI device from user space in the following ways: (1) through the file access interface provided by a file system; (2) through a raw device, where the upper-layer application accesses the SCSI device directly through the POSIX interface provided by the operating system; (3) through SCSI pass-through, where the upper-layer application in user space directly accesses the SG character device provided by Linux. In the third mode the application can send CDB commands directly to the SCSI device; through this interface the user can both perform SCSI management operations, such as querying basic device description information, and send read and write data commands.

Referring to Fig. 1, access through a file system or raw device passes through the kernel's generic block layer. When an I/O returns from the lower layer, a first callback notifies the middle layer. For commands that succeeded, the SCSI middle layer invokes a second callback to notify the block layer. Commands that the SCSI middle layer decides to retry are added back to the block request queue and processed again. Commands that the SCSI middle layer decides to return directly, or that have exceeded the allowed number of retries, are returned to the device access layer by the second callback; there the error code is converted into a block-layer error such as EIO and returned to the user-mode application. The user therefore cannot tell what specific fault occurred on the SCSI device.

With SCSI pass-through access, the user-mode application constructs CDB read/write command words according to SCSI, issues requests directly to the SG character device through ioctl, and receives the returned information. I/O requests in this mode are synchronous and reach the SCSI layer directly, without passing through the kernel generic block layer or its algorithms. Although read/write performance in this mode is not the highest, every command has a corresponding SCSI return code, from which the current state of the SCSI device hardware can be understood clearly.
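As an illustration of what a per-command SCSI return code carries, the following minimal sketch extracts the sense key, additional sense code (ASC), and qualifier (ASCQ) from a sense buffer. The function and type names are hypothetical; the byte offsets follow the standard fixed-format (response codes 0x70/0x71) and descriptor-format (0x72/0x73) sense data layouts.

```python
from typing import NamedTuple


class Sense(NamedTuple):
    sense_key: int
    asc: int
    ascq: int


def parse_sense(buf: bytes) -> Sense:
    """Extract sense key / ASC / ASCQ from fixed or descriptor sense data."""
    if not buf:
        raise ValueError("empty sense buffer")
    response_code = buf[0] & 0x7F
    if response_code in (0x70, 0x71):
        # Fixed format: sense key in byte 2, ASC in byte 12, ASCQ in byte 13.
        if len(buf) < 14:
            raise ValueError("fixed-format sense data too short")
        return Sense(buf[2] & 0x0F, buf[12], buf[13])
    if response_code in (0x72, 0x73):
        # Descriptor format: sense key in byte 1, ASC in byte 2, ASCQ in byte 3.
        if len(buf) < 4:
            raise ValueError("descriptor-format sense data too short")
        return Sense(buf[1] & 0x0F, buf[2], buf[3])
    raise ValueError(f"unknown sense response code 0x{response_code:02x}")
```

A caller on the pass-through path would feed this the sense buffer returned with a failed command; on the block-layer path no such buffer reaches user space, which is the limitation described above.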

Distributed storage systems in the prior art generally access the data on the final hard disks through a file system interface or by internally calling a raw device interface. I/O access to a SCSI hard disk through either interface passes through the generic block layer, so the error code of the SCSI hard disk cannot be obtained and the specific type of a SCSI access error cannot be known. When a hard disk reports an error, the error cannot be tolerated: the faulty hard disk can only be kicked out of the cluster and data reconstruction started, or an alarm raised so that the user checks the health of the hard disk manually.

The patent with publication number CN103543960A provides a data storage method in which SCSI commands are submitted directly to the low-level driver, bypassing the generic block layer after entering kernel mode and avoiding the BIO interface. In that scheme the SCSI device is accessed through the AIO interface, and the specific error type of the hard disk can be determined directly when a hard disk error is encountered. It has the following shortcomings, however: 1. the kernel must be modified, and bypassing the generic block layer involves a large amount of work; 2. the I/O scheduling algorithms and caching mechanism of the generic block layer are mature and stable and improve, to some extent, the performance with which applications access SCSI devices, and the scheme cannot exploit these advantages; 3. the AIO interface can only be used for raw device access, so mainstream file systems such as Ext4 and XFS cannot be used, which is a major limitation.

The patent with publication number CN103220162A provides an HDFS-based SCSI fault-tolerance optimization method and device. By modifying the kernel SCSI layer so that it is aware of the HDFS replication policy, processing takes place as soon as the SCSI middle layer receives the first callback from the lower layer, which can improve I/O efficiency and reduce the hardware failure rate. However, it requires modifying the SCSI layer of the system kernel, which is costly. Hard disk errors are low-probability events; transforming the entire kernel SCSI layer and making the kernel aware of the upper-layer HDFS replication policy merely to improve I/O efficiency under such events adds substantial cost.

The patent with publication number CN102222033A provides a method and device for saving small computer system interface access errors. It saves SCSI error information promptly so that the application system can obtain SCSI error information quickly and accurately from the saved information and thereby quickly determine the fault type of the storage device. However, because it judges on the basis of saved SCSI error information, misjudgments occur easily, and it cannot judge accurately for error information that has not been saved.

Summary of the invention

In view of the above problems, the purpose of the present invention is to provide a fault-tolerance method and system based on SCSI devices that can accurately perceive errors of the underlying SCSI device and improve hard disk utilization.

The specific technical solution provided by the present invention is as follows:

A fault-tolerance method based on SCSI devices, comprising:

an upper-layer application system issues an access request to a SCSI device through the generic block device layer;

the generic block device layer receives and processes the access request, and returns the processing result to the upper-layer application system;

the upper-layer application system receives the processing result and, after determining that the request has failed, constructs a SCSI read/write command, re-initiates access to the SCSI device through the SG character device, and receives the response returned by the SCSI device;

error handling is performed according to the response returned by the SCSI device.
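The four steps above can be sketched as a small control-flow function. This is only an illustrative outline under stated assumptions: `block_io`, `sg_retry`, and `handle_error` are hypothetical callables standing in for the block-layer request, the SG pass-through retry, and the error handling step; none of these names come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SgResponse:
    ok: bool                          # True if the SCSI command succeeded
    sense_key: Optional[int] = None   # set when the command failed


def fault_tolerant_access(
    block_io: Callable[[], bool],
    sg_retry: Callable[[], SgResponse],
    handle_error: Callable[[SgResponse], str],
) -> str:
    """Try the generic block layer first; fall back to SG only on failure."""
    if block_io():                    # steps 1-2: normal path succeeded
        return "ok"
    resp = sg_retry()                 # step 3: re-issue through the SG device
    if resp.ok:
        return "ok-after-sg-retry"
    return handle_error(resp)         # step 4: act on the SCSI response
```

The point of this shape is that the SG path is touched only after a failure, so the normal I/O path keeps the scheduling and caching of the generic block layer.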

The present invention also provides a fault-tolerant system based on SCSI devices, comprising:

a generic block device module, configured to receive and process the access request of the upper-layer application system;

a judgment module, configured to determine from the processing result whether the request succeeded;

a first processing module, configured to construct a SCSI read/write command after the request is determined to have failed;

an SG character device module, configured to re-initiate access to the SCSI device and receive the response returned by the SCSI device;

a second processing module, configured to perform error handling according to the response returned by the SCSI device and to return the request result to the upper-layer application system.

Further, performing error handling according to the response returned by the SCSI device comprises:

if the response returned by the SCSI device indicates that the command executed successfully, returning success for this access request to the upper-layer application system;

if the response returned by the SCSI device indicates that the command failed, determining the error type of the response returned by the SCSI device and performing repair according to the error type.

Further, performing repair according to the error type comprises:

if the error type of the response returned by the SCSI device is a read error that can be repaired by overwriting, the upper-layer application system obtains redundant data and then performs an overwrite operation to carry out the repair;

if the error type of the response returned by the SCSI device is a write error that can be repaired by bad-block remapping, the upper-layer application system redirects the write operation to a reserved area and marks the data block as damaged to carry out the repair;

if the error type of the response returned by the SCSI device is any other error, the specific error code is returned to the upper-layer application system and a fault-tolerance failure event is recorded.
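The three-way decision above can be sketched as a lookup on the sense data. The patent does not say which SCSI codes count as overwrite-repairable or remap-repairable, so the mapping below is purely an assumption for illustration: sense key MEDIUM ERROR (0x3) with ASC 0x11 (unrecovered read error) is treated as overwrite-repairable, and MEDIUM ERROR with ASC 0x0C (write error) as remap-repairable. All names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    SUCCESS = "success"
    OVERWRITE_REPAIR = "overwrite_repair"   # read error fixed by rewriting data
    REMAP_REPAIR = "remap_repair"           # write error fixed by remapping
    REPORT_ERROR = "report_error"           # anything else: return the code


MEDIUM_ERROR = 0x3
ASC_UNRECOVERED_READ_ERROR = 0x11
ASC_WRITE_ERROR = 0x0C


def choose_action(is_read: bool, command_ok: bool,
                  sense_key: int = 0, asc: int = 0) -> Action:
    """Map an SG response to a repair action (assumed code mapping)."""
    if command_ok:
        return Action.SUCCESS
    if sense_key == MEDIUM_ERROR:
        if is_read and asc == ASC_UNRECOVERED_READ_ERROR:
            return Action.OVERWRITE_REPAIR
        if not is_read and asc == ASC_WRITE_ERROR:
            return Action.REMAP_REPAIR
    return Action.REPORT_ERROR
```

A real deployment would likely tune this table per drive model; the structure, not the specific codes, is what the method prescribes.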

Further, constructing the SCSI read/write command after the request is determined to have failed comprises:

if the upper-layer application system accesses the generic block device interface directly through a raw device interface, building the SCSI read/write command from the generic block device address of this access;

if the upper-layer application system accesses the generic block device interface indirectly through a file system, calling the corresponding file system tool or interface to obtain the generic block device address of this access, and then building the SCSI read/write command.
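Once the block device address is known, building the command amounts to packing a CDB. The sketch below is one possible way to do it, not the patent's own code: it packs a READ(10)/WRITE(10) CDB when the address fits and a READ(16)/WRITE(16) CDB otherwise. It assumes the byte offset is relative to the start of the SCSI device (not a partition) and that the logical block size is known; the function name and the 512-byte default are assumptions.

```python
import struct

READ_10, WRITE_10 = 0x28, 0x2A
READ_16, WRITE_16 = 0x88, 0x8A


def build_rw_cdb(byte_offset: int, byte_length: int, is_write: bool,
                 block_size: int = 512) -> bytes:
    """Pack a SCSI read/write CDB for a block-aligned byte range."""
    if byte_length <= 0 or byte_offset < 0:
        raise ValueError("invalid range")
    if byte_offset % block_size or byte_length % block_size:
        raise ValueError("range must be aligned to the logical block size")
    lba = byte_offset // block_size
    blocks = byte_length // block_size
    if lba <= 0xFFFFFFFF and blocks <= 0xFFFF:
        # 10-byte CDB: opcode, flags, LBA(4), group, length(2), control
        opcode = WRITE_10 if is_write else READ_10
        return struct.pack(">BBIBHB", opcode, 0, lba, 0, blocks, 0)
    if blocks > 0xFFFFFFFF:
        raise ValueError("transfer too large for a single command")
    # 16-byte CDB: opcode, flags, LBA(8), length(4), group, control
    opcode = WRITE_16 if is_write else READ_16
    return struct.pack(">BBQIBB", opcode, 0, lba, blocks, 0, 0)
```

For example, `build_rw_cdb(4096, 4096, False)` yields a READ(10) for 8 blocks starting at LBA 8.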

Brief description of the drawings

Embodiments of the present invention are further described below with reference to the drawings, in which:

Fig. 1 is a structural diagram of the three request access modes for SCSI devices in the prior art;

Fig. 2 is a flow chart of the fault-tolerance method based on SCSI devices of the present invention;

Fig. 3 is a module diagram of the fault-tolerant system based on SCSI devices of the present invention;

Fig. 4 is a block diagram of the user-mode software RAID of specific embodiment one;

Fig. 5 is the read error handling flow chart of specific embodiment one;

Fig. 6 is the write error handling flow chart of specific embodiment one;

Fig. 7 is the read error handling flow chart of specific embodiment two;

Fig. 8 is the write error handling flow chart of specific embodiment two.

Detailed description of the invention

The present invention is described in further detail below with reference to the drawings and specific embodiments.

The present invention provides a fault-tolerance method based on SCSI devices which, without modifying the SCSI layer of the operating system kernel, enables user-mode storage software (including but not limited to software RAID and distributed storage system software) to perceive errors of the underlying SCSI device and to complete error repair through a parity or replication strategy, or to use the errors as health-status information for evaluating the state of the SCSI device, thereby improving hard disk utilization.

The specific implementation is as follows: when the upper-layer application system accesses a SCSI device through a file system or raw device and the application programming interface layer returns an I/O error (for example EIO), then, after confirming that the SCSI device is present, the system issues a SCSI pass-through request to the failing address through the SG device of the Linux operating system and receives the response returned by the SCSI device. This response contains the SCSI error code, and the upper layer performs the corresponding fault-tolerant handling according to the error code type.
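A pass-through read of this kind can be sketched from user space with the Linux `SG_IO` ioctl. The following is a minimal illustration rather than the patent's implementation: the structure mirrors `sg_io_hdr` from `<scsi/sg.h>`, `SG_IO` (0x2285) and `SG_DXFER_FROM_DEV` (-3) are the kernel's values, while the helper names, the 32-byte sense buffer, and the 20-second timeout are assumptions. Running `sg_read` requires a real SG or block device and sufficient privileges.

```python
import ctypes
import fcntl
import os

SG_IO = 0x2285            # ioctl request number from <scsi/sg.h>
SG_DXFER_FROM_DEV = -3    # data flows from device to host


class SgIoHdr(ctypes.Structure):
    """Mirror of struct sg_io_hdr."""
    _fields_ = [
        ("interface_id", ctypes.c_int), ("dxfer_direction", ctypes.c_int),
        ("cmd_len", ctypes.c_ubyte), ("mx_sb_len", ctypes.c_ubyte),
        ("iovec_count", ctypes.c_ushort), ("dxfer_len", ctypes.c_uint),
        ("dxferp", ctypes.c_void_p), ("cmdp", ctypes.c_void_p),
        ("sbp", ctypes.c_void_p), ("timeout", ctypes.c_uint),
        ("flags", ctypes.c_uint), ("pack_id", ctypes.c_int),
        ("usr_ptr", ctypes.c_void_p), ("status", ctypes.c_ubyte),
        ("masked_status", ctypes.c_ubyte), ("msg_status", ctypes.c_ubyte),
        ("sb_len_wr", ctypes.c_ubyte), ("host_status", ctypes.c_ushort),
        ("driver_status", ctypes.c_ushort), ("resid", ctypes.c_int),
        ("duration", ctypes.c_uint), ("info", ctypes.c_uint),
    ]


def make_read_hdr(cmd, data, sense, timeout_ms=20000):
    """Fill an SG_IO header for a device-to-host transfer."""
    return SgIoHdr(
        interface_id=ord("S"), dxfer_direction=SG_DXFER_FROM_DEV,
        cmd_len=len(cmd), mx_sb_len=len(sense), dxfer_len=len(data),
        dxferp=ctypes.addressof(data), cmdp=ctypes.addressof(cmd),
        sbp=ctypes.addressof(sense), timeout=timeout_ms)


def sg_read(sg_path: str, cdb: bytes, length: int):
    """Send a read CDB; return (ok, data, sense_bytes)."""
    cmd = (ctypes.c_char * len(cdb)).from_buffer_copy(cdb)
    data = ctypes.create_string_buffer(length)
    sense = ctypes.create_string_buffer(32)
    hdr = make_read_hdr(cmd, data, sense)
    fd = os.open(sg_path, os.O_RDWR)
    try:
        fcntl.ioctl(fd, SG_IO, hdr)   # the kernel fills in the status fields
    finally:
        os.close(fd)
    ok = hdr.status == 0 and hdr.host_status == 0 and hdr.driver_status == 0
    return ok, data.raw, sense.raw[:hdr.sb_len_wr]
```

When `ok` is false, the returned sense bytes carry the SCSI error code that the block-layer path had collapsed into EIO.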

Referring to Fig. 2, the specific steps include:

The generic block device layer receives and processes the access request, and returns the processing result to the upper-layer application system. The upper-layer application system receives the processing result and, after determining that the request has failed, constructs a SCSI read/write command, re-initiates access to the SCSI device through the SG character device, and receives the response returned by the SCSI device.

When error handling is performed according to the response returned by the SCSI device: if the response indicates that the command executed successfully, success for this access request is returned to the upper-layer application system; if the response indicates that the command failed, the error type of the response is determined and repair is performed according to the error type.

Performing repair according to the error type further comprises: if the error type of the response returned by the SCSI device is a read error that can be repaired by overwriting, the upper-layer application system obtains redundant data and then performs an overwrite operation to carry out the repair; if the error type is a write error that can be repaired by bad-block remapping, the upper-layer application system redirects the write operation to a reserved area (this area may be a region inside the SCSI device marked out in advance by the upper-layer application system, or space on another non-volatile storage medium) and marks the data block as damaged to carry out the repair; if the error type is any other error, the specific error code is returned to the upper-layer application system and a fault-tolerance failure event is recorded.

When the SCSI read/write command is constructed after the request is determined to have failed: if the upper-layer application system accesses the generic block device interface directly through a raw device interface, the SCSI read/write command is built from the generic block device address of this access; if the upper-layer application system accesses the generic block device interface indirectly through a file system, the corresponding file system tool or interface is called to obtain the generic block device address of this access, and the SCSI read/write command is then built.

Corresponding to the fault-tolerance method based on SCSI devices, the present invention also provides a fault-tolerant system based on SCSI devices. Referring to Fig. 3, it includes:

a second processing module, configured to perform error handling according to the response returned by the SCSI device.

Specific embodiment one:

The fault-tolerance method based on SCSI devices of the present invention can be used to implement a Linux/Unix user-mode software RAID. A traditional software RAID module usually works in kernel mode, such as the mdraid module that ships with the Linux kernel. User-mode software RAID means that both the RAID functions and the hard disk fault-tolerant handling are implemented in user mode, providing applications with a read/write API that offers data redundancy and high performance. A block diagram of one user-mode software RAID implementation is shown in Fig. 4. The RAID function module issues I/O requests to the hard disk through the block device interface, via the kernel generic block layer, and receives the responses; depending on performance needs it may use the Linux AIO interface, sync I/O, or another I/O engine to read and write the hard disk raw device. When the RAID function's I/O path encounters an I/O error, the SCSI hard disk fault-tolerance module retries through the SG device interface, perceives the hard disk's SCSI error code, and then performs the corresponding fault-tolerant handling.

When the fault-tolerance method based on SCSI devices of the present invention is used to implement a Linux/Unix user-mode software RAID, the specific steps are as follows:

(1) When the RAID function encounters a read error on the I/O path of a hard disk, referring to Fig. 5, the fault-tolerant handling flow is as follows:

The hard disk fault-tolerance module sends a SCSI read command to the SG device of the failing hard disk. The address parameter of the command is the physical address returned when the access through the block device interface failed. If the command succeeds, the retried read has succeeded and the fault-tolerance flow ends.

If the SCSI read command fails, it is determined whether the SCSI error code is a SCSI read error type that can be repaired by overwriting. If not, the SCSI error code is returned to the RAID function module and the anomaly count of the hard disk is incremented by 1. The count serves as a criterion for judging hard disk health and is used when hard disk errors are tallied asynchronously; when the errors reach a threshold, the disk is kicked out.

If the error code type returned by the SCSI read command can be repaired by overwriting, the RAID function module is notified to compute, through the RAID algorithm, the data content of this hard disk's part of the stripe, and an overwrite repair is then performed by sending a SCSI write command to the SG device. If the command succeeds, the repair has succeeded and the fault-tolerance flow ends. If the SCSI write command fails, failure of the fault-tolerance flow is reported and the anomaly count of the hard disk is incremented by 1. The count serves as a criterion for judging hard disk health and is used when hard disk errors are tallied asynchronously; when the errors reach a threshold, the disk is kicked out.
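The step in which the RAID function module recomputes the failing disk's data can be illustrated for the simplest case. The patent does not fix a RAID level, so the sketch below assumes single-parity XOR (as in RAID 5): the lost chunk is the XOR of all surviving chunks in the stripe, parity included. The function name is hypothetical.

```python
from typing import Sequence


def reconstruct_chunk(surviving_chunks: Sequence[bytes]) -> bytes:
    """Rebuild one lost stripe chunk as the XOR of all surviving chunks."""
    if not surviving_chunks:
        raise ValueError("need at least one surviving chunk")
    size = len(surviving_chunks[0])
    if any(len(chunk) != size for chunk in surviving_chunks):
        raise ValueError("all chunks in a stripe must have the same size")
    out = bytearray(size)
    for chunk in surviving_chunks:
        for i, value in enumerate(chunk):
            out[i] ^= value
    return bytes(out)
```

The rebuilt bytes would then be the payload of the SCSI write command sent to the SG device for the overwrite repair.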

(2) When the RAID function encounters a write error on the I/O path of a hard disk, referring to Fig. 6, the fault-tolerant handling flow is as follows:

The hard disk fault-tolerance module sends a SCSI write command to the SG device of the failing hard disk. If the command succeeds, the retried write has succeeded and the fault-tolerance flow ends.

If the SCSI write command fails, it is determined whether the SCSI error code is a SCSI write error type that can be repaired by bad-block remapping. If not, the SCSI error code is returned upward and the anomaly count of the hard disk is incremented by 1. The count serves as a criterion for judging hard disk health and is used when hard disk errors are tallied asynchronously; when the errors reach a threshold, the disk is kicked out.

If the error code type returned by the SCSI write command can be repaired by bad-block remapping, the mapping mechanism is started and the write request is redirected to a reserved area on this hard disk or to another non-volatile storage medium; subsequent read and write requests to this bad block address are all redirected to the new mapped area. If the mapping succeeds, the repair has succeeded and the fault-tolerance flow ends. If the mapping repair fails, fault-tolerance failure is returned and the anomaly count of the hard disk is incremented by 1. The count serves as a criterion for judging hard disk health and is used when hard disk errors are tallied asynchronously; when the errors reach a threshold, the disk is kicked out.
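The anomaly count and threshold that recur in both flows can be sketched as a small per-disk counter. This is an illustrative simplification: the class name, the disk identifiers, and the threshold value are assumptions, and the check here is done synchronously on each failure, whereas the text describes the tally as asynchronous.

```python
from collections import Counter


class DiskHealth:
    """Per-disk anomaly counter with a kick-out threshold."""

    def __init__(self, kick_threshold: int):
        if kick_threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.kick_threshold = kick_threshold
        self.errors = Counter()

    def record_failure(self, disk_id: str) -> bool:
        """Add 1 to the disk's count; return True once it should be kicked."""
        self.errors[disk_id] += 1
        return self.errors[disk_id] >= self.kick_threshold
```

With a threshold of 3, for instance, the first two unrepairable errors on a disk are tolerated and only the third triggers the kick-out.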

Specific embodiment two:

In a distributed storage system, after an I/O request for a logical volume is issued from the client into the system, it passes through data routing services such as the metadata management module and distributed RAID, which split the requested I/O and dispatch it to the underlying storage units (the OSD module in Ceph; one storage unit is responsible for the data read/write service of one physical hard disk). In the end, all I/O requests are issued by the individual storage units to their local hard disks. When a storage unit issues a request to a local SCSI hard disk, it generally does so through a file system interface or a raw device interface, and therefore cannot identify the specific SCSI error code of the hard disk.

With the fault-tolerance method based on SCSI devices of the present invention, when a local storage unit encounters an error while accessing a SCSI hard disk through a file system interface or raw device interface, it can rebuild the request as a SCSI command and issue a SCSI pass-through request to the SG device. If the request succeeds, this is equivalent to a successful retry; if the request fails, the SCSI error code can be obtained and further fault-tolerant handling performed according to it. In this way, fault-tolerant handling of SCSI hard disks can be achieved merely by modifying the user-mode program of the storage unit, without modifying the kernel of the server operating system on which the distributed storage software is deployed or any other upper-layer software module.

When the fault-tolerance method based on SCSI devices of the present invention is used in a distributed storage system, the specific steps are as follows:

(1) When a disk read operation of a storage unit of the distributed storage system encounters an error, referring to Fig. 7, the processing steps are as follows:

The storage unit sends a read request to the block device. The read request interface may be an interface provided by a file system or a raw device access interface. If the read fails, the fault-tolerant handling flow is entered.

If the interface is a file system interface, that is, the read request address is an offset within a file in the file system, a file system tool is used to look up the hard disk physical address corresponding to that offset. If it is a raw device access interface, the access address is already the hard disk physical address and no conversion is needed.

The storage unit sends a SCSI read command to the SG device of the hard disk, with the hard disk physical address as the address parameter of the command. If the command succeeds, the retried read has succeeded and the fault-tolerance flow ends.

If the SCSI read command fails, it is determined whether the SCSI error code is a SCSI read error type that can be repaired by overwriting. If not, the SCSI error code is returned upward and the anomaly count of the hard disk is incremented by 1. The count serves as a criterion for judging hard disk health and is available for later analysis and statistics of hard disk errors.

If the error code returned by the SCSI read command can be repaired by overwriting, the data is requested from the storage unit that holds a replica of it (assumed to be a second storage unit), or the data content of this hard disk is computed from the parity data held by other storage units, and an overwrite repair is performed by sending a SCSI write command to the SG device. If the command succeeds, the repair has succeeded and the fault-tolerance flow ends. If the SCSI write command fails, failure is returned upward and the anomaly count of the hard disk is incremented by 1. The count serves as a criterion for judging hard disk health and is available for later analysis and statistics of hard disk errors.
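The address conversion step for file system access in the flow above can be sketched as a lookup over a file's extent list. The patent only says that a file system tool or interface is used; here it is assumed that the caller has already obtained the extents (on Linux, for example, through the `FS_IOC_FIEMAP` ioctl or a tool built on it), and the `Extent` type and function name are hypothetical. The physical offsets of such extents are usually relative to the file system's block device, so a partition start offset may still need to be added before addressing the whole disk through SG.

```python
from typing import NamedTuple, Optional, Sequence


class Extent(NamedTuple):
    logical: int    # byte offset within the file
    physical: int   # byte offset on the underlying block device
    length: int     # extent length in bytes


def file_offset_to_physical(extents: Sequence[Extent],
                            offset: int) -> Optional[int]:
    """Translate a file offset to a device offset; None if unmapped (a hole)."""
    for extent in extents:
        if extent.logical <= offset < extent.logical + extent.length:
            return extent.physical + (offset - extent.logical)
    return None
```

A request that spans several extents would have to be split at extent boundaries and each piece translated and retried separately.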

(2) When a disk write operation of a storage unit of the distributed storage system encounters an error, referring to Fig. 8, the processing steps are as follows:

The storage unit sends a write request to the block device. The write request interface may be an interface provided by a file system or a raw device access interface. If the write fails, the fault-tolerant handling flow is entered.

If the interface is a file system interface, that is, the write request address is an offset within a file in the file system, a file system tool is used to look up the hard disk physical address corresponding to that offset. If it is a raw device access interface, the access address is already the hard disk physical address and no conversion is needed.

The storage unit sends a SCSI write command to the SG device of the hard disk, with the hard disk physical address as the address parameter of the command. If the command succeeds, the retried write has succeeded and the fault-tolerance flow ends.

If the SCSI write command fails, it is determined whether the SCSI error code is a SCSI write error type that can be repaired by bad-block remapping. If not, the SCSI error code is returned upward and the anomaly count of the hard disk is incremented by 1. The count serves as a criterion for judging hard disk health and is available for later analysis and statistics of hard disk errors.

If the error code returned by the SCSI write command can be repaired by bad-block remapping, the mapping mechanism is started and the write request is redirected to a reserved area on the hard disk; subsequent read and write requests to this bad block address are all redirected to the new mapped area. If the mapping succeeds, the repair has succeeded and the fault-tolerance flow ends. If the mapping repair fails, failure is returned upward and the anomaly count of the hard disk is incremented by 1. The count serves as a criterion for judging hard disk health and is available for later analysis and statistics of hard disk errors.
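The bad-block remapping described above can be sketched as a table kept by the storage unit. This is a minimal in-memory illustration under assumptions: the class name, the one-block granularity, and the layout of the reserved area are hypothetical, and a real implementation would also have to persist the table so that redirection survives a restart.

```python
class BadBlockMap:
    """Redirect damaged LBAs to spare blocks in a reserved area."""

    def __init__(self, reserved_start_lba: int, reserved_blocks: int):
        self._next = reserved_start_lba
        self._end = reserved_start_lba + reserved_blocks
        self._map = {}

    def remap(self, bad_lba: int) -> int:
        """Mark bad_lba as damaged and return the spare LBA that replaces it."""
        if bad_lba in self._map:
            return self._map[bad_lba]
        if self._next >= self._end:
            raise RuntimeError("reserved area exhausted")  # mapping repair fails
        self._map[bad_lba] = self._next
        self._next += 1
        return self._map[bad_lba]

    def resolve(self, lba: int) -> int:
        """Return the LBA to actually use for a later read or write."""
        return self._map.get(lba, lba)
```

Every later request would pass its address through `resolve` first, which is how subsequent reads and writes to the bad block reach the new mapped area.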

The specific embodiments of the present invention described above do not limit the scope of protection of the present invention. Any other corresponding changes and variations made according to the technical concept of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims

1. A fault-tolerance method based on SCSI devices, characterized by comprising:

2. The fault-tolerance method based on SCSI devices according to claim 1, characterized in that performing error handling according to the response returned by the SCSI device further comprises:

3. The fault-tolerance method based on SCSI devices according to claim 2, characterized in that performing repair according to the error type further comprises:

4. The fault-tolerance method based on SCSI devices according to claim 1 or 2, characterized in that constructing the SCSI read/write command after the request is determined to have failed further comprises:

5. A fault-tolerant system based on SCSI devices, characterized by comprising:

6. The fault-tolerant system based on SCSI devices according to claim 5, characterized in that the upper-layer application system accesses the generic block device module through a raw device interface or a file system.

7. The fault-tolerant system based on SCSI devices according to claim 5, characterized in that, when the first processing module constructs the SCSI read/write command, specifically:

if the upper-layer application system accesses the generic block device module interface directly through a raw device interface, the SCSI read/write command is built from the generic block device address of this access;

if the upper-layer application system accesses the generic block device module interface indirectly through a file system, the corresponding file system tool or interface is called to obtain the generic block device address of this access, and the SCSI read/write command is then built.

8. The fault-tolerant system based on SCSI devices according to claim 5, characterized in that the second processing module performs error handling according to the response returned by the SCSI device, specifically: