CN105975358A - Fault-tolerant method and system based on SCSI equipment - Google Patents

Publication number
CN105975358A
Authority
CN
China
Prior art keywords
scsi
error
scsi device
response
application system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610284196.0A
Other languages
Chinese (zh)
Other versions
CN105975358B (en)
Inventor
花瑞
文刘飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sandstone Data Technology Co Ltd
Original Assignee
Shenzhen Sandstone Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sandstone Data Technology Co Ltd filed Critical Shenzhen Sandstone Data Technology Co Ltd
Priority to CN201610284196.0A priority Critical patent/CN105975358B/en
Publication of CN105975358A publication Critical patent/CN105975358A/en
Application granted granted Critical
Publication of CN105975358B publication Critical patent/CN105975358B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0689Disk arrays, e.g. RAID, JBOD

Abstract

The invention provides a fault-tolerant method and system based on SCSI devices. The method comprises the following steps: an upper-layer application system issues an access request to a SCSI device through the generic block device layer; the generic block device layer receives and processes the access request and returns the processing result to the upper-layer application system; the upper-layer application system receives the processing result and, after determining that the request failed, constructs a SCSI read/write command, re-initiates access to the SCSI device through the SG character device, and receives the response returned by the SCSI device; and error handling is performed according to the response returned by the SCSI device. With this method and system, errors of the underlying SCSI device can be perceived accurately and hard disk utilization is improved.

Description

A fault-tolerant method and system based on SCSI devices
Technical field
The present invention relates to the field of Internet technology, and in particular to a fault-tolerant method and system based on SCSI devices.
Background art
With the development of network and communication technology, applications such as cloud storage, communication networks, and the Internet need to store large amounts of data and to access the stored data without interruption. This calls for Small Computer System Interface (SCSI) storage devices that offer large caches, high data throughput, and low cost. Common SCSI devices include hard disks, CD-ROM drives, DVD drives, and tape controllers. Because SCSI storage devices are accessed continuously, SCSI access errors are unavoidable. Only by identifying and handling these errors promptly and accurately can data safety be ensured and the stability and reliability of services be maintained.
The SCSI bus protocol defines error information codes that are returned with the command response and indicate why a command failed or what abnormal state the device is in. In the Linux operating system, the SCSI driver obtains and processes this SCSI error code information. The SCSI driver has a layered architecture with three layers: upper, middle, and lower. Error codes are handled in the middle and upper layers, mainly in one of two ways: retrying, or directly notifying the upper-layer application that the I/O command was not executed successfully.
Under the Linux operating system, an application can access a SCSI device from user space in the following ways: (1) through the file access interface provided by a file system; (2) through a raw device, where the upper-layer application directly uses the POSIX interface provided by the operating system to access the SCSI device; (3) through SCSI pass-through, where the upper-layer application in user space directly accesses the SG character device provided by Linux. In the third case the application can send CDB commands directly to the SCSI device. Through this interface the user can perform SCSI management operations, such as querying basic device description information, and can also send read/write data commands.
Referring to Fig. 1, access through a file system or a raw device passes through the kernel's generic block layer. When an I/O returns from the lower layer, a first-level callback notifies the middle layer. For commands that were processed successfully, the SCSI middle layer invokes a second-level callback to notify the block layer. Commands that the SCSI middle layer judges to need a retry are added back to the block request queue and processed again. For commands that the SCSI middle layer decides to return directly, or that exceed the allowed number of retries, the error code is converted into a block-layer error such as EIO once the second-level callback returns to the block device layer, and that error is returned to the user-space application. The user therefore cannot tell which specific fault occurred on the SCSI device.
With SCSI pass-through access, the user-space application constructs a CDB read/write command according to the SCSI standard, issues the request directly to the SG character device via ioctl, and receives the returned information. I/O requests of this kind are synchronous and reach the SCSI layer directly, without passing through the kernel generic block layer and its algorithms. Although read/write performance is not the highest, each command has a corresponding SCSI return code, from which the current state of the SCSI device hardware can be clearly understood.
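As an illustration of the CDB construction mentioned above, the following minimal Python sketch packs READ/WRITE(10) and READ/WRITE(16) command blocks. The function name build_rw_cdb and the rule for choosing between the 10-byte and 16-byte forms are assumptions made for this example and are not taken from the patent. Actually submitting the CDB would additionally require an SG_IO ioctl on the SG device node, which is not shown here.

```python
import struct

# Standard SCSI block command opcodes.
READ_10, WRITE_10 = 0x28, 0x2A
READ_16, WRITE_16 = 0x88, 0x8A


def build_rw_cdb(lba: int, blocks: int, write: bool = False) -> bytes:
    """Build a READ or WRITE CDB (hypothetical helper for illustration).

    The 10-byte form is used when the LBA fits in 32 bits and the
    transfer length fits in 16 bits; otherwise the 16-byte form is used.
    """
    if lba < 0 or blocks <= 0:
        raise ValueError("lba must be >= 0 and blocks must be > 0")
    if lba <= 0xFFFFFFFF and blocks <= 0xFFFF:
        opcode = WRITE_10 if write else READ_10
        # opcode, flags, 32-bit LBA, group number, 16-bit length, control
        return struct.pack(">BBIBHB", opcode, 0, lba, 0, blocks, 0)
    opcode = WRITE_16 if write else READ_16
    # opcode, flags, 64-bit LBA, 32-bit length, group number, control
    return struct.pack(">BBQIBB", opcode, 0, lba, blocks, 0, 0)
```

All multi-byte fields in a CDB are big-endian, which is why the format strings start with ">".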
Prior-art distributed storage systems generally access the data on the final hard disk through a file system interface or by internally calling a raw device interface. I/O access to a SCSI hard disk through either interface passes through the generic block layer, so the SCSI error code of the hard disk cannot be obtained and the specific type of the SCSI access error cannot be known. When a hard disk fails, the error cannot be tolerated: the failed disk can only be ejected from the cluster and data reconstruction started, or an alarm raised so that the user manually checks the health of the disk.
Patent publication CN103543960A provides a data storage method that submits SCSI commands directly to the underlying driver, bypassing the generic block layer after entering kernel mode and avoiding the BIO interface. This scheme accesses the SCSI device through the AIO interface and can determine the specific hard disk error type directly when a disk error occurs. It has the following drawbacks: 1. the kernel must be modified, and bypassing the generic block layer requires considerable work; 2. the I/O scheduling algorithms and caching mechanism of the generic block layer are mature and stable and improve the performance of application access to SCSI devices to some extent, but this scheme cannot take advantage of them; 3. the AIO interface can only be used for raw device access, so mainstream file systems such as Ext4/XFS cannot be used, which is a significant limitation.
Patent publication CN103220162A provides an HDFS-based SCSI fault-tolerance optimization method and device. By modifying the kernel SCSI layer so that it is aware of the HDFS replication policy, handling starts as soon as the SCSI middle layer receives the first callback from the lower layer, which can improve I/O efficiency and reduce the hardware fault rate. However, it requires modifying the SCSI layer of the system kernel, which is costly. Hard disk errors are low-probability events, and reworking the whole kernel SCSI layer so that the kernel is aware of the upper-layer HDFS replication policy, merely to improve I/O efficiency under such events, adds substantial cost.
Patent publication CN102222033A provides a method and device for saving Small Computer System Interface access errors. It can save SCSI error information in time, so that the application system can obtain the SCSI error information quickly and accurately from the saved information and quickly determine the fault type of the storage device. However, because it judges on the basis of saved SCSI error information, misjudgment can easily occur, and errors whose information was not saved cannot be judged accurately.
Summary of the invention
In view of the above problems, the purpose of the present invention is to provide a fault-tolerant method and system based on SCSI devices that can accurately perceive errors of the underlying SCSI device and improve hard disk utilization.
The specific technical solution provided by the present invention is as follows:
A fault-tolerant method based on SCSI devices, comprising:
An upper-layer application system issues an access request to a SCSI device through the generic block device layer;
The generic block device layer receives and processes the access request and returns the processing result to the upper-layer application system;
The upper-layer application system receives the processing result and, after determining that the request failed, constructs a SCSI read/write command, re-initiates access to the SCSI device through the SG character device, and receives the response returned by the SCSI device;
Error handling is performed according to the response returned by the SCSI device.
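The four steps above can be sketched as a small control-flow skeleton. This is only an illustrative outline under stated assumptions: SgResponse, read_with_fallback, and the three callables (block_read, sg_read, handle_error) are hypothetical names introduced here, and treating only EIO as the trigger for the SG retry is a simplification.

```python
import errno
from dataclasses import dataclass


@dataclass
class SgResponse:
    """Hypothetical container for the result of an SG pass-through command."""
    ok: bool
    sense: bytes = b""
    data: bytes = b""


def read_with_fallback(block_read, sg_read, handle_error, offset, length):
    """Try the normal block-layer path first, then retry through SG.

    block_read(offset, length) -> bytes, raising OSError on failure
    sg_read(offset, length)    -> SgResponse
    handle_error(response)     -> result of error handling
    """
    try:
        # Steps 1 and 2: access through the generic block layer.
        return block_read(offset, length)
    except OSError as exc:
        if exc.errno != errno.EIO:
            raise  # not a device I/O error, so no SG retry is attempted
    # Step 3: re-issue the access through the SG character device.
    response = sg_read(offset, length)
    if response.ok:
        return response.data
    # Step 4: error handling based on the SCSI response.
    return handle_error(response)
```

Passing the three operations in as callables keeps the skeleton independent of how the block-layer and SG paths are actually implemented.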
The present invention also provides a fault-tolerant system based on SCSI devices, comprising:
A generic block device module, for receiving and processing access requests from the upper-layer application system;
A judging module, for judging from the processing result whether the request succeeded;
A first processing module, for constructing a SCSI read/write command after the request is determined to have failed;
An SG character device module, for re-initiating access to the SCSI device and receiving the response returned by the SCSI device;
A second processing module, for performing error handling according to the response returned by the SCSI device and returning the request result to the upper-layer application system.
Further, performing error handling according to the response returned by the SCSI device further comprises:
If the response returned by the SCSI device indicates that the command executed successfully, reporting to the upper-layer application system that this access request succeeded;
If the response returned by the SCSI device indicates that the command failed, determining the error type of the response and performing repair according to the error type.
Further, performing repair according to the error type further comprises:
If the error type of the response returned by the SCSI device is a read error that can be repaired by overwriting, the upper-layer application system obtains redundant data and then performs an overwrite operation to complete the repair;
If the error type of the response returned by the SCSI device is a write error that can be repaired by bad-block mapping, the upper-layer application system redirects the write operation to a reserved area and marks the data block as damaged to complete the repair;
If the error type of the response returned by the SCSI device is any other error, the specific error code is returned to the upper-layer application system and a fault-tolerance failure event is recorded.
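The three-way classification above could be driven by the sense data that accompanies a failed command. The sketch below parses fixed-format and descriptor-format sense data and maps it to a repair strategy. The patent does not say which codes fall into which category, so the mapping used here (MEDIUM ERROR with additional sense code 0x11 for overwrite-repairable reads, and 0x0C for remappable writes) is purely an assumption for illustration, as are the names Repair, parse_sense, and classify.

```python
from enum import Enum

MEDIUM_ERROR = 0x03           # SCSI sense key
ASC_UNRECOVERED_READ = 0x11   # additional sense code
ASC_WRITE_ERROR = 0x0C        # additional sense code


class Repair(Enum):
    OVERWRITE = "overwrite"   # read error, rewrite from redundant data
    REMAP = "remap"           # write error, redirect to reserved area
    NONE = "none"             # anything else: report and record failure


def parse_sense(sense: bytes):
    """Return (sense_key, asc, ascq) from SCSI sense data."""
    if not sense:
        raise ValueError("empty sense data")
    code = sense[0] & 0x7F
    if code in (0x70, 0x71):            # fixed format
        if len(sense) < 14:
            raise ValueError("fixed-format sense data too short")
        return sense[2] & 0x0F, sense[12], sense[13]
    if code in (0x72, 0x73):            # descriptor format
        if len(sense) < 4:
            raise ValueError("descriptor-format sense data too short")
        return sense[1] & 0x0F, sense[2], sense[3]
    raise ValueError("unknown sense response code")


def classify(sense: bytes, is_write: bool) -> Repair:
    """Assumed mapping from sense data to a repair strategy."""
    key, asc, _ascq = parse_sense(sense)
    if key == MEDIUM_ERROR and not is_write and asc == ASC_UNRECOVERED_READ:
        return Repair.OVERWRITE
    if key == MEDIUM_ERROR and is_write and asc == ASC_WRITE_ERROR:
        return Repair.REMAP
    return Repair.NONE
```

A production classifier would likely cover more sense key and code combinations and would be tuned to the drives in use.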
Further, constructing the SCSI read/write command after determining that the request failed further comprises:
If the upper-layer application system accesses the generic block device interface directly through the raw device interface, the SCSI read/write command is built from the generic block device address of this access;
If the upper-layer application system accesses the generic block device interface indirectly through a file system, the corresponding file system tool or interface is called to obtain the generic block device address of this access, and the SCSI read/write command is built from that address.
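The address translation described above can be illustrated with simple arithmetic. In this sketch, the extent list is assumed to have been obtained beforehand from a file system tool or interface (for example an extent-mapping query), and the function names, the tuple layout of an extent, and the 512-byte default block size are all assumptions made for the example.

```python
def file_offset_to_device_offset(extents, file_offset):
    """Map a file offset to a byte offset on the block device.

    extents is an iterable of (logical_start, physical_start, length)
    tuples in bytes, assumed to come from a file system mapping query.
    """
    for logical, physical, length in extents:
        if logical <= file_offset < logical + length:
            return physical + (file_offset - logical)
    raise LookupError("file offset is not covered by any extent")


def byte_range_to_lba(offset, length, block_size=512, partition_start_lba=0):
    """Convert a byte range on a block device to (start LBA, block count).

    The range is widened to whole logical blocks, and the partition's
    starting LBA is added so the address is relative to the whole disk.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return partition_start_lba + first, last - first + 1
```

For raw device access only the second function is needed, since the access address is already a device offset.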
Brief description of the drawings
Embodiments of the present invention are further described below with reference to the drawings, in which:
Fig. 1 is a structural diagram of the three prior-art ways of requesting access to a SCSI device;
Fig. 2 is a flowchart of the fault-tolerant method based on SCSI devices of the present invention;
Fig. 3 is a module diagram of the fault-tolerant system based on SCSI devices of the present invention;
Fig. 4 is a block diagram of the user-space software RAID of Embodiment 1 of the present invention;
Fig. 5 is the read-error handling flowchart of Embodiment 1 of the present invention;
Fig. 6 is the write-error handling flowchart of Embodiment 1 of the present invention;
Fig. 7 is the read-error handling flowchart of Embodiment 2 of the present invention;
Fig. 8 is the write-error handling flowchart of Embodiment 2 of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and specific embodiments.
The present invention provides a fault-tolerant method based on SCSI devices. Without modifying the SCSI layer of the operating system kernel, it lets user-space storage software (including but not limited to software RAID and distributed storage system software) perceive errors of the underlying SCSI device and either complete the repair through a parity or replication policy, or use the errors as health information for evaluating the state of the device, thereby improving hard disk utilization.
The specific implementation is as follows: when the upper-layer application system accesses a SCSI device through a file system or a raw device and the application programming interface layer returns an I/O error (such as EIO), then, after confirming that the SCSI device is present, it issues a SCSI pass-through request for the failing address through the SG device of the Linux operating system and receives the response returned by the SCSI device. This response contains the SCSI error code, and the upper layer performs the corresponding fault-tolerant handling according to the error code type.
Referring to Fig. 2, the specific steps are:
An upper-layer application system issues an access request to a SCSI device through the generic block device layer;
The generic block device layer receives and processes the access request and returns the processing result to the upper-layer application system;
The upper-layer application system receives the processing result and, after determining that the request failed, constructs a SCSI read/write command, re-initiates access to the SCSI device through the SG character device, and receives the response returned by the SCSI device;
Error handling is performed according to the response returned by the SCSI device.
When error handling is performed according to the response returned by the SCSI device: if the response indicates that the command executed successfully, the upper-layer application system is told that this access request succeeded; if the response indicates that the command failed, the error type of the response is determined and repair is performed according to the error type.
Performing repair according to the error type further comprises: if the error type of the response returned by the SCSI device is a read error that can be repaired by overwriting, the upper-layer application system obtains redundant data and then performs an overwrite operation to complete the repair; if the error type is a write error that can be repaired by bad-block mapping, the upper-layer application system redirects the write operation to a reserved area (this may be a region inside the SCSI device set aside in advance by the upper-layer application system, or space on another non-volatile storage medium) and marks the data block as damaged to complete the repair; if the error type is any other error, the specific error code is returned to the upper-layer application system and a fault-tolerance failure event is recorded.
When the SCSI read/write command is constructed after the request is determined to have failed: if the upper-layer application system accesses the generic block device interface directly through the raw device interface, the SCSI read/write command is built from the generic block device address of this access; if the upper-layer application system accesses the generic block device interface indirectly through a file system, the corresponding file system tool or interface is called to obtain the generic block device address of this access, and the SCSI read/write command is built from that address.
Corresponding to the fault-tolerant method based on SCSI devices, the present invention also provides a fault-tolerant system based on SCSI devices. Referring to Fig. 3, it comprises:
A generic block device module, for receiving and processing access requests from the upper-layer application system;
A judging module, for judging from the processing result whether the request succeeded;
A first processing module, for constructing a SCSI read/write command after the request is determined to have failed;
An SG character device module, for re-initiating access to the SCSI device and receiving the response returned by the SCSI device;
A second processing module, for performing error handling according to the response returned by the SCSI device.
Embodiment 1:
The fault-tolerant method based on SCSI devices of the present invention can be used to implement a Linux/Unix user-space software RAID. Traditional software RAID modules generally run in kernel mode, such as the mdraid module shipped with the Linux kernel. User-space software RAID means that both the RAID function and the fault-tolerant handling of hard disks are implemented in user space, providing applications with a read/write API that offers data redundancy and high performance. A block diagram of a user-space software RAID implementation is shown in Fig. 4. The RAID function module issues I/O requests to the hard disks through the block device interface, via the kernel generic block layer, and receives the responses. Depending on performance needs, it can use the AIO interface of the Linux operating system, synchronous I/O, or another I/O engine to read and write the raw disk devices. When the RAID I/O path encounters an I/O error, the SCSI hard disk fault-tolerance module retries through the SG device interface, perceives the SCSI error code of the disk, and then performs the corresponding fault-tolerant handling.
The specific steps for using the fault-tolerant method based on SCSI devices of the present invention to implement a Linux/Unix user-space software RAID are as follows:
(1) When the RAID function encounters a read error on the I/O path of a hard disk, referring to Fig. 5, the fault-tolerant handling flow is as follows:
The hard disk fault-tolerance module sends a SCSI read command to the SG device of the failing disk. The address parameter of the command is the physical address at which the block device interface access failed. If the command succeeds, the retried read has succeeded and this fault-tolerant flow ends.
If the SCSI read command fails, the module determines whether the SCSI error code is a read error type that can be repaired by overwriting. If not, it returns the SCSI error code to the RAID function module and increments the anomaly counter of this disk by 1. The counter serves as a criterion for judging disk health: hard disk errors are tallied asynchronously, and when the count reaches a threshold the disk is ejected from the array.
If the error code type returned by the SCSI read command can be repaired by overwriting, the RAID function module is notified to compute the data content of this disk's stripe using the RAID algorithm, and the overwrite repair is then performed by sending a SCSI write command to the SG device. If the command succeeds, the repair has succeeded and this fault-tolerant flow ends. If the SCSI write command fails, the fault-tolerant flow is reported as failed and the anomaly counter of this disk is incremented by 1, again as a criterion for judging disk health: errors are tallied asynchronously, and when the count reaches a threshold the disk is ejected from the array.
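As an example of how the RAID function module might recompute the unreadable data, the sketch below rebuilds a missing chunk by XOR, which applies to single-parity layouts such as RAID 5. The patent does not name a RAID level, so the use of XOR parity here is an assumption, and reconstruct_chunk is a hypothetical helper name.

```python
from functools import reduce
from operator import xor


def reconstruct_chunk(surviving_chunks):
    """Rebuild the missing chunk of a single-parity stripe by XOR.

    surviving_chunks holds every other chunk of the stripe, data and
    parity alike, each as a bytes object of the same length.
    """
    if not surviving_chunks:
        raise ValueError("at least one surviving chunk is required")
    size = len(surviving_chunks[0])
    if any(len(chunk) != size for chunk in surviving_chunks):
        raise ValueError("all chunks must have the same length")
    # XOR the bytes at each position across all surviving chunks.
    return bytes(reduce(xor, column) for column in zip(*surviving_chunks))
```

The rebuilt bytes would then be sent to the failing address with a SCSI write command through the SG device, which gives the drive a chance to rewrite or internally reallocate the sector.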
(2) When the RAID function encounters a write error on the I/O path of a hard disk, referring to Fig. 6, the fault-tolerant handling flow is as follows:
The hard disk fault-tolerance module sends a SCSI write command to the SG device of the failing disk. If the command succeeds, the retried write has succeeded and this fault-tolerant flow ends.
If the SCSI write command fails, the module determines whether the SCSI error code is a write error type that can be repaired by bad-block mapping. If not, it returns the SCSI error code upward and increments the anomaly counter of this disk by 1. The counter serves as a criterion for judging disk health: hard disk errors are tallied asynchronously, and when the count reaches a threshold the disk is ejected from the array.
If the error code type returned by the SCSI write command can be repaired by bad-block mapping, the mapping mechanism is started: the write request is redirected to a reserved region on this disk or to another non-volatile storage medium, and all subsequent read/write requests for this bad block address are redirected to the new mapped region. If the mapping succeeds, the repair has succeeded and this fault-tolerant flow ends. If the mapping repair fails, fault-tolerance failure is returned and the anomaly counter of this disk is incremented by 1, again as a criterion for judging disk health: errors are tallied asynchronously, and when the count reaches a threshold the disk is ejected from the array.
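The mapping mechanism described above could be kept as a simple lookup table in the user-space software. The following sketch is an in-memory illustration only: the class name BadBlockMap, its methods, and the allocation order are assumptions, and a real implementation would also need to persist the table so that the mapping survives a restart.

```python
class BadBlockMap:
    """Redirect damaged block addresses to a reserved area (sketch)."""

    def __init__(self, reserved_start: int, reserved_count: int):
        # Spare block addresses set aside in advance by the application.
        self._free = list(range(reserved_start, reserved_start + reserved_count))
        self._map = {}

    def remap(self, bad_lba: int) -> int:
        """Mark bad_lba as damaged and return its replacement address."""
        if bad_lba in self._map:
            return self._map[bad_lba]
        if not self._free:
            raise RuntimeError("reserved area exhausted")
        replacement = self._free.pop(0)
        self._map[bad_lba] = replacement
        return replacement

    def resolve(self, lba: int) -> int:
        """Return the address that later reads and writes should use."""
        return self._map.get(lba, lba)
```

Every subsequent read or write would pass its address through resolve before a command is built, so requests for a damaged block land in the reserved area.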
Embodiment 2:
In a distributed storage system, after an I/O request for a logical volume is sent from the client into the system, it passes through data routing services such as the metadata management module and distributed RAID, and the I/O is split and dispatched to the underlying storage units (OSD modules in Ceph; one storage unit is responsible for the read/write service of one physical hard disk). In other words, it is ultimately the storage units that each issue I/O requests to their local hard disks. When a storage unit issues a request to a local SCSI hard disk, it generally does so through the file system interface or the raw device interface, and therefore cannot identify the specific SCSI error code of the disk.
With the fault-tolerant method based on SCSI devices of the present invention, when a local storage unit encounters an error while accessing a SCSI hard disk through the file system interface or the raw device interface, it constructs a SCSI command from the request and issues a SCSI pass-through request to the SG device. If the request succeeds, this is equivalent to a successful retry. If the request fails, the SCSI error code can be obtained and further fault-tolerant handling performed according to it. In this way, fault-tolerant handling of SCSI hard disks can be achieved by modifying only the user-space program of the storage unit, without modifying the operating system kernel of the servers on which the distributed storage software is deployed or any other upper-layer software module.
The specific steps for using the fault-tolerant method based on SCSI devices of the present invention in a distributed storage system are as follows:
(1) When a disk read operation of a storage unit of the distributed storage system encounters an error, referring to Fig. 7, the handling steps are as follows:
The storage unit sends a read request to the block device. The read request interface may be an interface provided by the file system or the raw device access interface. If the read fails, the fault-tolerant handling flow is entered.
For the file system interface, the read request address is an offset within a file in the file system, so a file system tool is used to look up the hard disk physical address corresponding to this offset. For the raw device access interface, the access address is already the hard disk physical address and no conversion is needed.
The storage unit sends a SCSI read command to the SG device of the hard disk, with the hard disk physical address as the address parameter of the command. If the command succeeds, the retried read has succeeded and this fault-tolerant flow ends.
If the SCSI read command fails, the storage unit determines whether the SCSI error code is a read error type that can be repaired by overwriting. If not, it returns the SCSI error code upward and increments the anomaly counter of this disk by 1. The counter serves as a criterion for judging disk health and is available for later analysis and statistics of hard disk errors.
If the error code returned by the SCSI read command can be repaired by overwriting, the storage unit requests the data from the storage unit that holds a replica of this data (call it the second storage unit), or computes the data content of this disk from the parity data held by other storage units, and performs the overwrite repair by sending a SCSI write command to the SG device. If the command succeeds, the repair has succeeded and this fault-tolerant flow ends. If the SCSI write command fails, failure is returned upward and the anomaly counter of this disk is incremented by 1, as a criterion for judging disk health that is available for later analysis and statistics of hard disk errors.
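The replica-based repair in the last step can be outlined as follows. This is a sketch under assumptions: repair_from_replicas, the replica reader callables, and the sg_write callable are hypothetical names, and trying replicas in order until one answers is one possible policy that the patent does not prescribe.

```python
def repair_from_replicas(replica_readers, sg_write, lba, blocks):
    """Fetch good data from a replica and overwrite the failing blocks.

    replica_readers: callables read(lba, blocks) -> bytes, one per
                     storage unit holding a copy; each may raise OSError.
    sg_write:        callable write(lba, data) -> bool that issues the
                     SCSI write through the SG device.
    Returns True when the overwrite succeeded, False otherwise.
    """
    for read_replica in replica_readers:
        try:
            data = read_replica(lba, blocks)
        except OSError:
            continue  # this replica is unavailable, try the next one
        return bool(sg_write(lba, data))
    return False  # no replica could supply the data
```

A False result corresponds to the failure branch above, where failure is returned upward and the anomaly counter of the disk is incremented.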
(2) When a disk write operation by a storage unit of the distributed storage system encounters an error, refer to Fig. 8. The processing steps are as follows:
The storage unit sends a write request to the block device. The write request interface may be one provided by the file system or a raw device access interface. If the write fails, the fault-tolerant processing flow is entered.
If the file system interface is used, that is, the write request address is an offset within a file in the file system, a file system tool is used to look up the physical disk address corresponding to that offset. If the raw device access interface is used, the access address is already the physical disk address and no conversion is needed.
The storage unit sends a SCSI write command to the SG device of the hard disk, with the physical disk address as the address parameter of the command. If the command succeeds, the retried write has succeeded and this fault-tolerant flow ends.
If the SCSI write command fails, the storage unit determines whether the SCSI error code belongs to a write error type that can be repaired by bad block mapping. If it does not, the SCSI error code is returned to the upper layer and the abnormal count of this hard disk is incremented by 1. The count serves as a criterion for judging the health of the hard disk and is available for later analysis and statistics of hard disk errors.
If the error code returned by the SCSI write command can be repaired by bad block mapping, the mapping mechanism is started: the write request is redirected to a reserved region of the hard disk, and all subsequent read and write requests to the bad block address are redirected to the new mapped region. If the mapping succeeds, the repair has succeeded and this fault-tolerant flow ends. If the mapping repair fails, failure is returned to the upper layer and the abnormal count of this hard disk is incremented by 1. The count serves as a criterion for judging the health of the hard disk and is available for later analysis and statistics of hard disk errors.
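The write-side flow above can be sketched in the same spirit. This is an illustration under stated assumptions rather than the patented implementation: `write_block` is a hypothetical callable standing in for a SCSI write sent through the SG device, the `RemapTable` layout and the spare block addresses are invented for the example, and the sense codes treated as repairable by bad block mapping are an assumption because the text does not list them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumption: MEDIUM ERROR (sense key 0x3) with ASC 0x0C (write error) counts
# as "repairable by bad block mapping"; the source lists no codes.
REMAP_REPAIRABLE = {(0x03, 0x0C)}

# Hypothetical transport: write_block(lba, data) -> (ok, sense_key, asc)
WriteFn = Callable[[int, bytes], Tuple[bool, int, int]]


@dataclass
class RemapTable:
    spare: List[int]                                   # free blocks in the reserved region
    mapping: Dict[int, int] = field(default_factory=dict)

    def resolve(self, lba: int) -> int:
        """Return the block that currently backs this address."""
        return self.mapping.get(lba, lba)


@dataclass
class DiskHealth:
    abnormal_count: int = 0  # incremented on every error that is not repaired


def tolerate_write_error(write_block: WriteFn, table: RemapTable,
                         health: DiskHealth, lba: int, data: bytes) -> int:
    """Retry a failed write and, if allowed, redirect it to the reserved region.

    Returns the block address that finally holds the data.
    """
    target = table.resolve(lba)
    ok, key, asc = write_block(target, data)
    if ok:
        return target  # the retried write succeeded
    if (key, asc) in REMAP_REPAIRABLE and table.spare:
        candidate = table.spare.pop(0)
        ok, _, _ = write_block(candidate, data)
        if ok:
            table.mapping[lba] = candidate  # later reads and writes follow the mapping
            return candidate
    health.abnormal_count += 1
    raise OSError(f"write to block {lba} failed: key={key:#x} asc={asc:#x}")
```

Recording the mapping only after the redirected write succeeds keeps the table free of entries that point at blocks never written.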
The specific embodiments of the present invention described above do not limit the scope of protection of the invention. Any other corresponding changes and modifications made according to the technical concept of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims (8)

1. A fault-tolerant method based on a SCSI device, characterized by comprising:
an upper-layer application system issuing an access request to the SCSI device through a generic block device layer;
the generic block device layer receiving and processing the access request, and returning the processing result to the upper-layer application system;
the upper-layer application system receiving the processing result and, after determining that the request has failed, constructing a SCSI read/write command, then re-initiating access to the SCSI device through an SG character device and receiving the response returned by the SCSI device;
performing error handling according to the response returned by the SCSI device.
2. The fault-tolerant method based on a SCSI device according to claim 1, characterized in that performing error handling according to the response returned by the SCSI device further comprises:
if the response returned by the SCSI device indicates that the command executed successfully, returning to the upper-layer application system that this access request succeeded;
if the response returned by the SCSI device indicates that the command failed, determining the error type of the response returned by the SCSI device and performing a repair according to the error type.
3. The fault-tolerant method based on a SCSI device according to claim 2, characterized in that performing a repair according to the error type further comprises:
if the error type of the response returned by the SCSI device is a read error that can be repaired by overwriting, the upper-layer application system obtaining redundant data and then performing an overwrite operation to carry out the repair;
if the error type of the response returned by the SCSI device is a write error that can be repaired by bad block mapping, the upper-layer application system redirecting the write operation to a reserved area and marking the data block as damaged to carry out the repair;
if the error type of the response returned by the SCSI device is any other error, returning the specific error code to the upper-layer application system and recording a fault-tolerance failure event.
4. The fault-tolerant method based on a SCSI device according to claim 1 or 2, characterized in that constructing a SCSI read/write command after determining that the processing has failed further comprises:
if the upper-layer application system accesses the generic block device interface directly through a raw device interface, constructing the SCSI read/write command based on the generic block device address of this access;
if the upper-layer application system accesses the generic block device interface indirectly through a file system, calling the corresponding file system tool or interface to obtain the generic block device address of this access, and constructing the SCSI read/write command.
5. A fault-tolerant system based on a SCSI device, characterized by comprising:
a generic block device module, configured to receive and process an access request from an upper-layer application system;
a judging module, configured to judge, according to the processing result, whether the request succeeded;
a first processing module, configured to construct a SCSI read/write command after it is determined that the request has failed;
an SG character device module, configured to re-initiate access to the SCSI device and to receive the response returned by the SCSI device;
a second processing module, configured to perform error handling according to the response returned by the SCSI device and to return the request result to the upper-layer application system.
6. The fault-tolerant system based on a SCSI device according to claim 5, characterized in that the upper-layer application system accesses the generic block device module through a raw device interface or a file system.
7. The fault-tolerant system based on a SCSI device according to claim 5, characterized in that, when the first processing module constructs the SCSI read/write command, specifically:
if the upper-layer application system accesses the generic block device module interface directly through a raw device interface, the SCSI read/write command is constructed based on the generic block device address of this access;
if the upper-layer application system accesses the generic block device module interface indirectly through a file system, the corresponding file system tool or interface is called to obtain the generic block device address of this access, and the SCSI read/write command is constructed.
8. The fault-tolerant system based on a SCSI device according to claim 5, characterized in that the second processing module performs error handling according to the response returned by the SCSI device, specifically:
if the response returned by the SCSI device indicates that the command executed successfully, returning to the upper-layer application system that this access request succeeded;
if the response returned by the SCSI device indicates that the command failed, determining the error type of the response returned by the SCSI device and performing a repair according to the error type.
CN201610284196.0A 2016-05-03 2016-05-03 Fault-tolerant method and system based on SCSI equipment Active CN105975358B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610284196.0A CN105975358B (en) 2016-05-03 2016-05-03 Fault-tolerant method and system based on SCSI equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610284196.0A CN105975358B (en) 2016-05-03 2016-05-03 Fault-tolerant method and system based on SCSI equipment

Publications (2)

Publication Number Publication Date
CN105975358A (en) 2016-09-28
CN105975358B (en) 2019-02-26

Family

ID=56994819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610284196.0A Active CN105975358B (en) 2016-05-03 2016-05-03 Fault-tolerant method and system based on SCSI equipment

Country Status (1)

Country Link
CN (1) CN105975358B (en)

Also Published As

Publication number Publication date
CN105975358B (en) 2019-02-26

Legal Events

Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right
    Denomination of invention: Fault-tolerant method and system based on SCSI equipment
    Effective date of registration: 20191101
    Granted publication date: 20190226
    Pledgee: Shenzhen SME financing Company limited by guarantee
    Pledgor: SHENZHEN SANDSTONE DATA TECHNOLOGY Co.,Ltd.
    Registration number: Y2019990000452
PC01 Cancellation of the registration of the contract for pledge of patent right
    Date of cancellation: 20201215
    Granted publication date: 20190226
    Pledgee: Shenzhen SME financing Company limited by guarantee
    Pledgor: SHENZHEN SANDSTONE DATA TECHNOLOGY Co.,Ltd.
    Registration number: Y2019990000452
PE01 Entry into force of the registration of the contract for pledge of patent right
    Denomination of invention: A fault tolerant method and system based on SCSI device
    Effective date of registration: 20220629
    Granted publication date: 20190226
    Pledgee: Shenzhen small and medium sized small loan Co.,Ltd.
    Pledgor: SHENZHEN SANDSTONE DATA TECHNOLOGY Co.,Ltd.
    Registration number: Y2022440020119
PC01 Cancellation of the registration of the contract for pledge of patent right
    Date of cancellation: 20230821
    Granted publication date: 20190226
    Pledgee: Shenzhen small and medium sized small loan Co.,Ltd.
    Pledgor: SHENZHEN SANDSTONE DATA TECHNOLOGY Co.,Ltd.
    Registration number: Y2022440020119