CN107479823B - Data verification method and device in random read-write file test - Google Patents

Info

Publication number
CN107479823B
Authority
CN
China
Prior art keywords
data
check code
code set
read
write
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610398736.8A
Other languages
Chinese (zh)
Other versions
CN107479823A (en
Inventor
张彪
田磊磊
闫卫斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201610398736.8A priority Critical patent/CN107479823B/en
Publication of CN107479823A publication Critical patent/CN107479823A/en
Application granted granted Critical
Publication of CN107479823B publication Critical patent/CN107479823B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1048 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices using arrangements adapted for a specific error detection or correction feature
    • G06F11/1056 Updating check bits on partial write, i.e. read/modify/write
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data verification method and device for random read-write file testing. The method comprises the following steps: storing each data block of a random read-write file in a distributed storage system as multiple copies, where the multiple copies of each data block correspond to one check code set and the check code set comprises at least two integers representing the value range of the check code; randomly selecting a data block into which data needs to be written, writing the data into the multiple copies of the data block, and updating the corresponding check code set; and, when a data block is randomly selected for reading, reading the data from one copy of the data block and verifying the read data using the check code set. Because the check code set reflects how the data stored in each data block changes, the data of each data block can be verified during random read-write file testing of a distributed storage system.

Description

Data verification method and device in random read-write file test
Technical Field
The application belongs to the technical field of computers, and particularly relates to a data verification method and device in random read-write file testing.
Background
A random read-write file in a distributed storage system is characterized by multiple copies and by random reads and writes. How to verify data when testing the random read-write files of a distributed storage system is a key technology for ensuring the correctness of that data.
Existing data verification schemes target the append-only file (Append Only File). The check code of the written data is stored in a volatile or nonvolatile storage medium as required. When data is read, its check code is calculated and compared with the corresponding stored check code: if the two are consistent the check succeeds, otherwise it fails.
A verification scheme that applies only to append-only files does not suit a random read-write file. Any position in a random read-write file can be rewritten, so the corresponding check code can change at any time. In addition, a write operation in a distributed storage system can fail for various reasons, which can leave the copies of a data block inconsistent with one another, so several outcomes are possible. The purpose of data verification when testing the random read-write files of a distributed storage system is to verify the correctness of the data. Under the existing verification method, read data that is inconsistent with the check code of the most recently written data is considered wrong. In a random read-write file, however, for a copy whose write failed, reading a value that was once written should still be considered correct; the data should be considered incorrect only when a value that was never written is read. Therefore, the conventional verification method cannot accurately verify the correctness of random read-write file data.
Disclosure of Invention
In view of this, the present application provides a data verification method and apparatus in a random read-write file test, so as to solve the technical problem in the prior art that data verification cannot be performed when a random read-write file of a distributed storage system is tested.
In order to solve the technical problem, the application discloses a data verification method in a random read-write file test, which comprises the following steps: storing each data block of the random read-write file in a distributed storage system in a mode of multiple copies; the multiple copies of each data block correspond to a check code set; the check code set comprises at least two integer data representing the value range of the check code; randomly selecting a data block in which data needs to be written, writing the data into a plurality of copies of the data block and updating a corresponding check code set; and when the data is randomly selected to be read from the data block, reading the data from one copy of the data block and checking the read data by utilizing the check code set.
In order to solve the above technical problem, the present application further discloses a data verification apparatus in a random read-write file test, including: the storage module is used for storing each data block of the random read-write file in a distributed storage system in a mode of multiple copies; the multiple copies of each data block correspond to a check code set; the check code set comprises at least two integer data representing the value range of the check code; the writing module is used for randomly selecting a data block in which data needs to be written, writing the data into the multiple copies of the data block and updating the corresponding check code set; and the verification module is used for reading data from one copy of the data block and verifying the read data by utilizing the verification code set when the data is randomly selected to be read from the data block.
Compared with the prior art, the application can obtain the following technical effects: a check code set representing a value range reflects the data that may be read from a data block of a random read-write file; the data written into the data block is synchronized into the check code set, so that the set reflects how the data stored in the data block changes; and the data of each data block can be verified during random read-write file testing of a distributed storage system.
Of course, it is not necessary for any one product to achieve all of the above-described technical effects simultaneously.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of an architecture of a distributed storage system according to an embodiment of the present application;
FIG. 2 is a flowchart of a data verification method in a random read-write file test according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a corresponding relationship between a data block and a check code set according to an embodiment of the present application;
FIG. 4 is a flowchart of data writing according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating the update of check codes when data is successfully written according to an embodiment of the present application;
FIG. 6 is a flowchart of data writing, including the judgment of whether the writing is successful, according to an embodiment of the present application;
FIG. 7 is a schematic diagram illustrating the update of check codes when data writing fails according to an embodiment of the present application;
FIG. 8 is a flowchart of data verification according to an embodiment of the present application;
FIG. 9 is a schematic diagram illustrating the update of check codes when data reading is successful according to an embodiment of the present application;
FIG. 10 is a block diagram of a data read/write apparatus according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in detail with reference to the drawings and examples, so that how to implement technical means to solve technical problems and achieve technical effects of the present application can be fully understood and implemented.
The embodiment of the application is applied to a distributed storage system whose topology is shown in FIG. 1. The distributed storage system generally comprises a metadata server (MetaServer), a file server cluster (FileServer), and a client (Client) supporting highly concurrent access. The system can keep adding file servers to cope with later growth in scale. Generally, a Linux system is used as the underlying server and runs a MetaServer process and a FileServer process. The metadata server, the file server cluster, and the client interact through the TCP/IP protocol. Several different FileServer processes, distinguished by FileServer ID and TCP port, can run simultaneously on one Linux system, and each file server manages one storage space. The metadata server and the file server can run on the same Linux system, but for the sake of the performance of the distributed storage system they generally run on different Linux systems.
The metadata server and the file server cluster keep heartbeat connection so as to monitor and manage the state of each file server cluster; the metadata server also provides a management monitoring interface, and a management machine is used for managing and monitoring the distributed storage system; the client interface provides the function of accessing the distributed storage system by the client, and the client can directly access the metadata server to obtain the file attribute and other operations, or access the file server cluster through the metadata server to perform the read-write operation of the file.
For a read operation or write operation from a client, the metadata server first looks up the corresponding file path and file identifier. After receiving the file path and file identifier returned by the metadata server, the client forwards the read operation or write operation to the file server cluster corresponding to the file path, and a file processing interface of the file server cluster receives and executes the read or write instruction.
The distributed storage system may provide a random read-write interface, for example an interface for opening a random access file, which returns a random read-write class for that file. When the client calls the random read-write interface, it requests the random read-write stream of the corresponding random read-write file according to the input file name and obtains a random read-write operation handle for the file, so that the random read-write file can be randomly read and written through pointer operations.
The random read and write files are stored in a file server cluster, which may include one or more storage devices. For a random access file, different portions of the file may be stored on disks of one or more storage devices. When the random read-write file of the file server cluster is tested, the data is verified through the verification code sets corresponding to different parts of the random read-write file.
In the distributed storage system in the embodiment of the present application, the data length of the random access file is an integer multiple of the length of one data block, for example, the length of the data block is 4 KB. During random reading and writing, the read and written data displacement can be evenly divided by the length of the data block, so that a random reading and writing file can be understood as being divided into a plurality of data blocks by taking the data block as the minimum unit. And each data block holds multiple copies simultaneously in the distributed storage system.
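The block alignment described above can be sketched as follows. This is only an illustrative sketch: the 4 KB block length comes from the example in the text, while the function name `block_indices` and the choice to raise `ValueError` on a misaligned request are assumptions made for the example.

```python
BLOCK_SIZE = 4 * 1024  # 4 KB data block, as in the example above


def block_indices(offset, length):
    """Return the indices of the data blocks covered by an aligned request.

    Hypothetical helper: offset and length are in bytes and must both be
    integer multiples of the data block length.
    """
    if length <= 0 or offset % BLOCK_SIZE or length % BLOCK_SIZE:
        raise ValueError("offset and length must be multiples of the block size")
    first = offset // BLOCK_SIZE
    return list(range(first, first + length // BLOCK_SIZE))


# A request starting at byte 8192 and covering 8192 bytes touches blocks 2 and 3.
print(block_indices(8192, 8192))
```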
For any data block of the random read-write file, reading and writing are mutually exclusive at any given moment: a data block that is being written is not read, and vice versa. However, multiple mutually disjoint data blocks of the same file may be read or written simultaneously. For a random read-write file, multiple mutually disjoint ranges (Range) can be selected for simultaneous reading or writing, each range containing one or more contiguous data blocks. After the reading or writing of a certain range is finished, another range that does not intersect any range currently being read or written is randomly selected, and the next read or write starts. This ensures that, at any time, the ranges being read or written in the random read-write file are mutually disjoint and independent of one another.
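One possible way to pick a new range that does not intersect the ranges already in flight is sketched below. The retry-based strategy, the half-open `(start, end)` representation in block indices, and the names `overlaps` and `pick_disjoint_range` are assumptions for illustration; the text does not prescribe how the range is chosen.

```python
import random


def overlaps(a, b):
    """True if two half-open block ranges (start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def pick_disjoint_range(total_blocks, active, max_len=4, rng=random, attempts=1000):
    """Randomly pick a range of 1..max_len contiguous blocks disjoint from `active`.

    Returns None if no free range is found within the given number of attempts.
    """
    for _ in range(attempts):
        start = rng.randrange(total_blocks)
        end = min(total_blocks, start + rng.randint(1, max_len))
        candidate = (start, end)
        if not any(overlaps(candidate, r) for r in active):
            return candidate
    return None


active = [(0, 4), (10, 12)]          # ranges currently being read or written
chosen = pick_disjoint_range(16, active, rng=random.Random(0))
print(chosen)
```

A caller would add the chosen range to `active` before starting the operation and remove it when the operation finishes.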
The data reading and writing method provided by the embodiment of the application is suitable for a client in a distributed storage system, and as shown in fig. 2, the method includes the following steps.
S10, storing each data block of the random read-write file in a distributed storage system in a mode of multiple copies; the multiple copies of each data block correspond to a check code set; the check code set comprises at least two integer data which represent the value range of the check code.
The data blocks are independent of each other, and the contents of the data blocks are also irrelevant, so that corresponding check code sets need to be stored for multiple copies of each data block, and the basic form is shown in fig. 3. And the file server cluster correspondingly stores a plurality of copies of the data block and the check code set corresponding to the copies in the disk.
FIG. 3 shows three consecutive data blocks in a random read-write file, each corresponding to a check code set; in practice, the multiple copies of each data block in the distributed storage system share one check code set. The check code set may include at least two integers and may list all currently valid values, for example (3, 4, 5). In that case the check code set represents a discrete value range, and the integers are the discrete values in that range. As the number of valid values in the check code set grows, too much storage space is occupied and the calculation during updating becomes complicated. For example, on a 64-bit computer device each integer is represented by 64 bits, and every 8 bits form one byte (Byte), so an integer occupies 64/8 = 8 bytes. Each valid integer value in the check code set therefore occupies 8 bytes, and each check code set needs more space to cope with the case where many valid values exist.
Therefore, as a compromise between storage space and computation cost, the check code set preferably includes only two integers, which respectively represent the minimum value and the maximum value of the value range represented by the check code set. For example, if the check code set is [3, 5], the value range is 3 to 5 and the valid values are 3, 4, and 5. Taking a 64-bit storage device as an example, a check code set composed of two integers occupies 16 bytes, so the storage ratio of a data block (4 KB) to its check code set is 4096:16 = 256:1.
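The two-integer form of the check code set and the storage ratio above can be illustrated with a short sketch. The class name `CheckCodeRange` and its methods are hypothetical names introduced for this example; only the [minimum, maximum] representation, the 8-byte integer size, and the 4 KB block length come from the text.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4 * 1024   # 4 KB data block
INT_SIZE = 8            # one 64-bit integer occupies 8 bytes


@dataclass
class CheckCodeRange:
    """Check code set stored as the closed interval [minimum, maximum]."""
    minimum: int
    maximum: int

    def valid_values(self):
        # Every integer inside the closed interval counts as a valid value.
        return list(range(self.minimum, self.maximum + 1))

    def storage_bytes(self):
        # Two integers, no matter how many valid values the range covers.
        return 2 * INT_SIZE


codes = CheckCodeRange(3, 5)
print(codes.valid_values())                  # valid values of [3, 5]
print(BLOCK_SIZE // codes.storage_bytes())   # data block to check code set ratio
```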
S11, randomly selecting a data block into which data needs to be written, writing the data into the multiple copies of the data block, and updating the corresponding check code set.
S12, when a data block is randomly selected for reading, reading the data from one copy of the data block and verifying the read data using the check code set.
A check code set representing a value range reflects the data that may be read from a data block of a random read-write file. The data written into the data block is synchronized into the check code set, so that the set reflects how the data stored in the data block changes, and the data of each data block can be verified during random read-write file testing of a distributed storage system.
Each read-write operation may trigger the update of the check code set, and synchronize the data of the data block with the check code set to reflect the change condition of the data stored in the data block, and can check each data block in the random read-write file during data check, where the start-stop position of the check is an integer multiple of the size of one data block (e.g., 4 KB).
After a data read or write command is executed, the data of the corresponding data block is verified; the data read-write method may therefore further include step S12.
S12, after the read or write operation is completed, verifying the data of the data block according to the updated check code set.
The following describes the case of data writing and the case of data reading, respectively.
When writing data to the random access file, as shown in fig. 4, the following steps are included.
S20, randomly selecting a data block needing to be written in from the random read-write file;
S21, before the write operation is executed on the multiple copies corresponding to the data block, determining the sum of the maximum value of the check code set and a preset step length as the write value of the current write operation, and updating the maximum value of the check code set to the write value.
Firstly, the range to be written in the random read-write file is determined according to the data write instruction, and each data block in the range is further subdivided into a plurality of sub-blocks according to the storage space occupied by one integer. For example, if the data block length is 4 KB and each integer occupies 8 bytes, the data block is divided into 512 sub-blocks, each of size 4 KB / 512 = 8 bytes. The sum of the maximum value of the check code set and the preset step length is determined as the write value of the current write operation, the write value is filled into each sub-block of the data block, and the maximum value of the check code set is updated to the write value.
When the check code set includes two integer data, the preset step size is 1 in order to represent a continuous value range. If the preset step length is larger than 1, a continuous value range cannot be represented, for example, 3 is written for the first time, 5 is written for the second time, the set of the check codes may be updated to [3,5], the value range is 3 to 5, but 4 does not belong to one of the written values, if invalid data 4 is read, the check is still determined to be passed, and the check result is inaccurate.
And when the check code set comprises more than two integer data, the preset step length is greater than or equal to 1. Since the value range represented by the check code set is a discrete value range, the preset step length may be 1 or greater than 1.
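The reason a step length larger than 1 breaks the two-integer form, but not the discrete form, can be shown with a small example. The values 3 and 5 come from the text; the helper name `in_interval` is an assumption made for illustration.

```python
def in_interval(value, bounds):
    """Check against a two-integer check code set [minimum, maximum]."""
    return bounds[0] <= value <= bounds[1]


# Writing 3 and then 5 (step length 2) yields these two representations.
interval_form = (3, 5)    # two integers: minimum and maximum
discrete_form = {3, 5}    # more than two integers allowed: every written value listed

# The value 4 was never written, yet the interval form accepts it.
print(in_interval(4, interval_form))   # True: invalid data would pass the check
print(4 in discrete_form)              # False: the discrete form rejects it
```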
For example, in the data writing scenario shown in FIG. 5, the three rectangles with a value of 3 on the left represent 3 copies of a data block in the distributed storage system, all with the initial value 3. The initial value 3 means that the integer currently written in each sub-block of the copy is 3. The lower rectangle represents the check code set [3, 3] corresponding to the multiple copies of the data block; for this form of check code set the preset step length is 1. Before writing starts, the current maximum value of the check code set is 3, so the current write value is determined to be 4 and the check code set is updated from [3, 3] to [3, 4].
The integer written into the data block this time is thus recorded by the maximum value of the check code set, which facilitates subsequent data verification.
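Step S21 and the filling of the sub-blocks can be sketched as below. This is a minimal sketch under stated assumptions: the check code set is held as a two-element list `[minimum, maximum]`, integers are packed as little-endian signed 64-bit values with the standard `struct` module, and the function name `prepare_write` is hypothetical.

```python
import struct

BLOCK_SIZE = 4 * 1024   # 4 KB data block
INT_SIZE = 8            # each sub-block holds one 8-byte integer
STEP = 1                # preset step length for the two-integer form


def prepare_write(check_codes):
    """Compute the write value, bump the maximum, and build the block payload.

    check_codes is a two-element list [minimum, maximum] and is updated in place.
    """
    write_value = check_codes[1] + STEP
    check_codes[1] = write_value                 # update the maximum before writing
    sub_blocks = BLOCK_SIZE // INT_SIZE          # 512 sub-blocks per data block
    payload = struct.pack(f"<{sub_blocks}q", *([write_value] * sub_blocks))
    return write_value, payload


codes = [3, 3]
value, payload = prepare_write(codes)
print(value, codes, len(payload))
```

The same payload would then be sent to every copy of the data block.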
In a distributed storage system, a write-once success does not necessarily require that all copies of a data block be written successfully, depending on the different policies and configurations of the distributed storage system. As shown in the right rectangle of fig. 5, one copy may not be successfully written due to network timeout, but the client still determines that the writing is successful at this time because the other 2 copies are successfully written. Therefore, whether the number of copies returning the write-in success status message in the plurality of copies is greater than or equal to a second preset threshold is judged; when the number of the copies returning the writing success status message is larger than or equal to a second preset threshold, judging that the writing is successful; and when the number of the copies returning the write-in success status message is less than a second preset threshold, judging that the write-in fails.
Whether the written value is successfully written into an individual copy is judged as follows: it is determined whether the proportion of the sub-blocks of that copy into which the written value was successfully written is greater than or equal to a first preset threshold. When the proportion is greater than or equal to the first preset threshold, the copy returns a write success status message; when the proportion is smaller than the first preset threshold, the copy returns a write failure status message.
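The two thresholds described above can be sketched as two small predicates. The concrete threshold values (1.0 for the sub-block proportion and 2 of 3 copies) and the function names are assumptions chosen for the example; the text only states that both thresholds are preset.

```python
def copy_write_succeeded(sub_block_results, first_threshold):
    """A copy reports success when enough of its sub-blocks were written."""
    written = sum(1 for ok in sub_block_results if ok)
    return written / len(sub_block_results) >= first_threshold


def write_succeeded(copy_statuses, second_threshold):
    """The write succeeds when enough copies report success."""
    return sum(1 for ok in copy_statuses if ok) >= second_threshold


# Three copies of a 512-sub-block data block; the third copy is only partly written.
copies = [
    [True] * 512,
    [True] * 512,
    [True] * 100 + [False] * 412,
]
statuses = [copy_write_succeeded(c, first_threshold=1.0) for c in copies]
print(statuses)
print(write_succeeded(statuses, second_threshold=2))
```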
Therefore, for each data writing command, it needs to determine whether the data writing to the data block is successful, as shown in fig. 6, the data writing method may further include the following steps.
S22, executing the current write operation on the multiple copies corresponding to the data block and judging whether the writing is successful. When the writing is successful, step S23 is executed; when the writing fails, step S24 is executed.
S23, updating the minimum value of the check code set to the written value.
For example, in the scenario shown in FIG. 5, when the write is successful, the minimum value 3 of the check code set is updated to the written value 4, and the check code set becomes [4, 4].
During subsequent data verification for the data block, the current value read from the data block is verified: the content of each sub-block of the data block can only be 4, and any other value is considered a data error. In this example one copy may still hold the value 3. If 3 is read during verification, the policy of the distributed storage system needs to be adjusted: because this write has been judged successful, the distributed storage system itself must ensure that only successfully written copies are read.
S24, the minimum value of the check code set is kept unchanged.
And in the data verification process aiming at the data block, verifying the current value read from the data block and the value range of the check code set, and if the current value read from the data block is not in the value range of the check code set, representing the data error of the data block.
In the data writing scenario shown in FIG. 7, the initial values of the three copies of the data block are all 3, and the check code set is updated from [3, 3] to [3, 4] before writing starts. Because 2 copies are not successfully written, the write is judged to have failed; on a write failure the minimum value of the check code set is kept unchanged. The contents of the three copies are now 4, 3, and 3, respectively. Before the second write, the check code set is updated to [3, 5]; the second write also fails, so the minimum value is again left unchanged and the check code set remains [3, 5].
During a subsequent data check for the data block, a data check is performed on the current value read from the data block, and the content valid value of the data block includes 3, 4, and 5. In a distributed storage system, a write failure is not necessarily a true failure, because there is a possibility that some copies are already successfully written on a disk in the process of writing, but due to network failure or other reasons, a client cannot receive a response of its write success status message, and the client considers the write failure. Therefore, in the actually read data, the content of each block of the data block may be 3, 4, 5, but only 3, 4, 5, and if other values occur, it is considered as a data error.
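Steps S21 to S24 can be put together in a small simulation that reproduces the scenarios of FIG. 5 and FIG. 7. This is a simplified sketch: each copy is modelled as a single integer, a `True` entry in `copy_acks` means that copy both stored the value and acknowledged it, the threshold of 2 copies is an assumed configuration, and the function name `write_block` is hypothetical.

```python
def write_block(check_codes, copies, copy_acks, quorum=2, step=1):
    """Simulate one write to a data block with a [minimum, maximum] check code set."""
    value = check_codes[1] + step
    check_codes[1] = value                  # S21: raise the maximum before writing
    for i, ack in enumerate(copy_acks):
        if ack:
            copies[i] = value               # this copy stored the new value
    success = sum(copy_acks) >= quorum      # S22: enough copies acknowledged?
    if success:
        check_codes[0] = value              # S23: raise the minimum on success
    return success                          # S24: minimum unchanged on failure


# FIG. 5: two of three copies succeed, so the write is judged successful.
codes_ok, copies_ok = [3, 3], [3, 3, 3]
print(write_block(codes_ok, copies_ok, [True, True, False]), codes_ok, copies_ok)

# FIG. 7: two consecutive failed writes leave the range [3, 5].
codes_fail, copies_fail = [3, 3], [3, 3, 3]
write_block(codes_fail, copies_fail, [True, False, False])
write_block(codes_fail, copies_fail, [False, False, True])
print(codes_fail, copies_fail)
```

In a real system a copy may store the value without the client ever receiving the acknowledgement, which is exactly why the range keeps all values written since the last confirmed one.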
When reading data from a random access file of a file server cluster, as shown in fig. 8, the following steps are included.
S30, reading data from one copy of the data block;
S31, judging whether the read data belongs to the value range of the check code set;
S32, when the read data belongs to the value range of the check code set, judging that the read data passes the check, and updating the minimum value of the check code set to the read data;
S33, when the read data does not belong to the value range of the check code set, judging that the read data fails the check, and keeping the check code set unchanged.
When the data read by one read operation is successful, data verification is performed. Fig. 9 is an example when the data read operation is continued with the scenario shown in fig. 7. Suppose that due to write failure, the three copies of the data block are respectively 4, 3, and 5, and the corresponding check code set of the data block is [3,5], data check is performed after successful read. And for a data block in the read range, if the data block is 4KB, dividing the data block into 512 sub-blocks, and for each sub-block, an 8-byte integer value is obtained, comparing the value of each sub-block with the check code set, if the value of each sub-block is consistent and is positioned in a closed interval defined by the check code set, the check is passed, and if the value of each sub-block is consistent and is out of the closed interval defined by the check code set, the check is not passed. That is, if the read value belongs to one of 3, 4, 5, the check is passed, and if the read value is other than 3, 4, 5, the check is not passed. If the values of the sub-blocks are inconsistent, the reading fails and the verification is judged to fail.
After the reading is successful and the verification is passed, the check code set is updated according to the read value: the minimum value of the check code set is updated to the value read from the data block. For example, if 4 is read, the minimum value of the check code set is updated from 3 to 4, and the updated check code set is [4, 5]. The distributed storage system does not allow version rollback. When a read operation succeeds, the data it read is a confirmed version, so data read later cannot be smaller than that version; that is, if 3 is read later, the verification is judged to have failed.
When verification fails, a developer needs to locate the bug in the code of the distributed storage system and handle it as a high-priority event so that the corresponding problem is solved. The size of the data block may be the same as one sector of a disk of the distributed storage system, which makes it possible to determine which sector of the disk has a problem with the random read-write file.
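The read-side check (steps S30 to S33) together with the minimum-value update can be sketched as follows. The sketch assumes the check code set is a two-element list `[minimum, maximum]` and that a data block is a sequence of little-endian signed 64-bit integers; the function name `verify_read` is hypothetical.

```python
import struct

INT_SIZE = 8  # each sub-block holds one 8-byte integer


def verify_read(check_codes, block_bytes):
    """Verify one data block read from a single copy; update the minimum on success."""
    count = len(block_bytes) // INT_SIZE
    values = struct.unpack(f"<{count}q", block_bytes)
    if not values:
        return False
    first = values[0]
    if any(v != first for v in values):
        return False                      # sub-blocks disagree: check fails
    if not (check_codes[0] <= first <= check_codes[1]):
        return False                      # S33: outside the range, set unchanged
    check_codes[0] = first                # S32: the read value becomes the new minimum
    return True


def make_block(value, sub_blocks=512):
    return struct.pack(f"<{sub_blocks}q", *([value] * sub_blocks))


codes = [3, 5]
print(verify_read(codes, make_block(4)), codes)   # passes and raises the minimum
print(verify_read(codes, make_block(3)), codes)   # fails: 3 is now below the minimum
```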
The following are apparatus embodiments of the present application for performing method embodiments of the present application.
Fig. 10 shows a data verification apparatus in a random read/write file test according to an embodiment of the present application, including: a storage module 40, configured to store each data block of the random read-write file in a distributed storage system as multiple copies, where the multiple copies of each data block correspond to one check code set, and the check code set comprises at least two integer data representing the value range of the check code; a write module 41, configured to randomly select a data block into which data needs to be written, write data into the multiple copies of the data block, and update the corresponding check code set; and a verification module 42, configured to, when a read from the data block is randomly selected, read data from one copy of the data block and check the read data using the check code set.
In one embodiment, the check code set includes two integer data and represents a continuous value range, and the two integer data are respectively the minimum value and the maximum value of the value range; or
the check code set comprises more than two integer data and represents a discrete value range, and the more than two integer data are the discrete values in the value range.
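The two forms of the check code set can be handled by a single membership test, sketched below in Python. The function name `in_value_range` is hypothetical, and treating a two-element set as a closed interval while treating a longer set as an explicit list of allowed values is an interpretation of the text above.

```python
from typing import Sequence


def in_value_range(check_set: Sequence[int], value: int) -> bool:
    """Return True if value falls in the range represented by the check code set."""
    if len(check_set) < 2:
        raise ValueError("a check code set needs at least two integers")
    if len(check_set) == 2:
        # Continuous form: the two integers are the minimum and the maximum.
        low, high = check_set
        return low <= value <= high
    # Discrete form: only the listed integers are valid.
    return value in check_set
```

For example, 4 is accepted by the continuous set [3, 5] but rejected by the discrete set [3, 5, 8].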
In one embodiment, the write module includes:
a selecting submodule, configured to randomly select, from the random read-write file, the data block into which data needs to be written;
a first updating submodule, configured to determine, before a write operation is performed on the multiple copies corresponding to the data block, the sum of the maximum value of the check code set and a preset step size as the write value of the current write operation, and to update the maximum value of the check code set to the write value;
a writing submodule, configured to perform the current write operation on the multiple copies corresponding to the data block and to judge whether the write is successful;
a second updating submodule, configured to update the minimum value of the check code set to the write value when the write is judged to be successful;
and a first processing submodule, configured to keep the check code set unchanged when the write is judged to have failed.
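The write path performed by these submodules can be sketched in Python as follows. The function name `write_with_check` and the `do_write` callback, which stands in for the actual multi-copy write and returns whether it was judged successful, are assumptions made for the example.

```python
from typing import Callable, List


def write_with_check(
    check_set: List[int],
    do_write: Callable[[int], bool],
    step: int = 1,
) -> bool:
    """Update [min, max] around one write; do_write is a hypothetical storage call."""
    write_value = check_set[1] + step
    # Raise the maximum before writing: from now on this value may appear on a copy.
    check_set[1] = write_value
    succeeded = do_write(write_value)
    if succeeded:
        # A successful write confirms the new version, so the minimum catches up.
        check_set[0] = write_value
    # On failure nothing further changes: [old min, new max] stays in place.
    return succeeded
```

Starting from [3, 3], two failed writes leave the set at [3, 5], which is the situation assumed in the Fig. 9 example, and a following successful write moves it to [6, 6].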
In one embodiment, when the set of check codes comprises two integer data, the preset step size is equal to 1,
and when the check code set comprises more than two integer data, the preset step length is greater than or equal to 1.
In one embodiment, the write submodule includes:
a writing unit, configured to subdivide each copy into a plurality of sub-blocks according to the storage space occupied by one integer data, and to write the write value into each sub-block.
In one embodiment, the write submodule further comprises:
a first judging unit, configured to judge whether the proportion of the sub-blocks of one copy into which the write value was successfully written is greater than or equal to a first preset threshold;
and a return unit, configured to return a write-success status message for the copy when that proportion is greater than or equal to the first preset threshold.
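The per-copy decision can be sketched in Python as below. The function name `copy_write_succeeded`, the list of per-sub-block booleans, and the default threshold of 1.0 are assumptions; the text does not state a value for the first preset threshold.

```python
from typing import Sequence


def copy_write_succeeded(sub_block_results: Sequence[bool], first_threshold: float = 1.0) -> bool:
    """Return True if the share of successfully written sub-blocks reaches the threshold."""
    if not sub_block_results:
        return False
    written = sum(1 for ok in sub_block_results if ok)
    return written / len(sub_block_results) >= first_threshold
```

With 512 sub-blocks and a threshold of 1.0, a single failed sub-block means the copy does not return a write-success status message.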
In one embodiment, the write submodule includes:
a second judging unit, configured to judge whether a number of copies returning a write-success status message in the multiple copies is greater than or equal to a second preset threshold;
a first determining unit, configured to determine that the write is successful when the number of copies returning a write-success status message is greater than or equal to the second preset threshold;
and a second determining unit, configured to determine that the write has failed when the number of copies returning a write-success status message is less than the second preset threshold.
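The decision across copies amounts to counting acknowledgements, as in the Python sketch below. The function name `write_succeeded` is hypothetical, and the threshold of 2 out of 3 copies used in the usage note is an assumed value, since the text leaves the second preset threshold open.

```python
from typing import Sequence


def write_succeeded(copy_acks: Sequence[bool], second_threshold: int) -> bool:
    """Return True if enough copies returned a write-success status message."""
    return sum(1 for ack in copy_acks if ack) >= second_threshold
```

For three copies and a threshold of 2, acknowledgements from two copies make the write successful, while a single acknowledgement makes it a failed write.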
In one embodiment, the verification module comprises:
a read submodule, configured to read data from one copy of the data block;
a judging submodule, configured to judge whether the read data belongs to the value range of the check code set;
a third updating submodule, configured to determine that the read data passes verification when the read data belongs to the value range of the check code set, and to update the minimum value of the check code set to the read data;
and a second processing submodule, configured to determine that the read data fails verification when the read data does not belong to the value range of the check code set, and to keep the check code set unchanged.
In one embodiment, the read submodule includes:
a reading unit, configured to subdivide the copy into a plurality of sub-blocks according to the storage space occupied by one integer data, and to read the integer data written in each of the sub-blocks;
a third judging unit, configured to judge whether the integer data read from all the sub-blocks are the same;
a third determining unit, configured to determine that the read is successful when the integer data read from all the sub-blocks are the same;
and a fourth determining unit, configured to determine that the read has failed and the verification has failed when the integer data read from the sub-blocks are not all the same.
In addition, in the embodiments of the present application, each functional module may be implemented by a hardware processor.
An embodiment of the present application further provides a data verification apparatus in a random read/write file test, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to:
storing each data block of the random read-write file in a distributed storage system in a mode of multiple copies; the multiple copies of each data block correspond to a check code set; the check code set comprises at least two integer data representing the value range of the check code;
randomly selecting a data block in which data needs to be written, writing the data into a plurality of copies of the data block and updating a corresponding check code set;
and when the data is randomly selected to be read from the data block, reading the data from one copy of the data block and checking the read data by utilizing the check code set.
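The three steps above can be put together in a small simulation, sketched below in Python. Everything in it is an assumption made for illustration: the in-memory `Block` class standing in for a distributed store, the step size of 1, three copies with a success threshold of 2, and the injected per-copy failure rate. It is a toy model, not the system described in the application.

```python
import random
from typing import List


class Block:
    """One data block: the value held by each copy and its [min, max] check code set."""

    def __init__(self, copy_count: int = 3) -> None:
        self.copies: List[int] = [0] * copy_count
        self.check: List[int] = [0, 0]


def do_write(block: Block, rng: random.Random, fail_rate: float, threshold: int) -> None:
    value = block.check[1] + 1          # write value = maximum + step (step = 1)
    block.check[1] = value              # raise the maximum before writing
    acks = 0
    for i in range(len(block.copies)):
        if rng.random() >= fail_rate:   # this copy accepted the write
            block.copies[i] = value
            acks += 1
    if acks >= threshold:               # write judged successful
        block.check[0] = value


def do_read(block: Block, rng: random.Random) -> bool:
    value = rng.choice(block.copies)    # read from one copy
    low, high = block.check
    if low <= value <= high:
        block.check[0] = value          # confirmed version; later reads may not go below it
        return True
    return False


def run_test(blocks: int, operations: int, fail_rate: float, seed: int = 0) -> int:
    """Run random reads and writes; return the number of reads that failed verification."""
    rng = random.Random(seed)
    store = [Block() for _ in range(blocks)]
    failures = 0
    for _ in range(operations):
        block = rng.choice(store)
        if rng.random() < 0.5:
            do_write(block, rng, fail_rate, threshold=2)
        elif not do_read(block, rng):
            failures += 1
    return failures
```

With `fail_rate=0.0` every copy always holds the latest value, so no read fails verification. With a non-zero failure rate this toy store deliberately leaves stale copies readable, so failed verifications can appear; they stand in for the kind of defect the test is meant to expose.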
In one embodiment, the check code set includes two integer data and represents a continuous value range, and the two integer data are respectively the minimum value and the maximum value of the value range; or
the check code set comprises more than two integer data and represents a discrete value range, and the more than two integer data are the discrete values in the value range.
In one embodiment, the randomly selecting a data block to which data needs to be written, writing data to multiple copies of the data block and updating a corresponding check code set includes:
randomly selecting a data block needing to be written with data from the random read-write file,
before writing operation is executed on a plurality of copies corresponding to the data block, determining the sum of the maximum value of the check code set and a preset step length as a written value of the current writing operation, and updating the maximum value of the check code set to be the written value;
executing the current writing operation on the plurality of copies corresponding to the data block and judging whether the writing is successful or not;
when the writing is judged to be successful, updating the minimum value of the check code set to the written value;
and when the writing fails, keeping the check code set unchanged.
In one embodiment, when the set of check codes comprises two integer data, the preset step size is equal to 1,
and when the check code set comprises more than two integer data, the preset step length is greater than or equal to 1.
In one embodiment, the performing the write operation on the multiple copies corresponding to the data block includes:
and subdividing each copy into a plurality of sub-blocks according to the storage space occupied by one integer data, and writing the written value into each sub-block.
In one embodiment, the performing the write operation on the multiple copies corresponding to the data block further includes:
judging whether the proportion of the write-in values successfully written in a plurality of sub-blocks of one copy is larger than or equal to a first preset threshold or not;
and when the proportion of the write-in values successfully written into the plurality of sub-blocks is greater than or equal to a first preset threshold, returning the copy write-in success status message.
In one embodiment, determining whether the write was successful comprises:
judging whether the number of copies returning the write-in success status message in the plurality of copies is greater than or equal to a second preset threshold or not;
when the number of the copies returning the writing success status message is larger than or equal to a second preset threshold, judging that the writing is successful;
and when the number of the copies returning the write-in success status message is less than a second preset threshold, judging that the write-in fails.
In one embodiment, when randomly selecting to read data from the data block, reading data from one copy of the data block and checking the read data with the set of check codes comprises:
reading data from one copy of the data block;
judging whether the read data belongs to the value range of the check code set or not;
when the read data belongs to the value range of the check code set, judging that the read data passes the check, and updating the minimum value of the check code set into the read data;
and when the read data does not belong to the value range of the check code set, judging that the read data fails the check, and keeping the check code set unchanged.
In one embodiment, reading data from a copy of the data block comprises:
subdividing the copy into a plurality of sub-blocks according to a storage space occupied by the integer data, and respectively reading the written integer data in all the sub-blocks;
judging whether the integer data respectively read from all the sub-blocks are the same or not;
when the integer data read from all the sub-blocks are the same, judging that the reading is successful;
and when the integer data read from the sub-blocks are not all the same, judging that the reading fails and the verification fails.
In one embodiment, the storage space of the data block is the same size as a sector of the distributed storage system disk.
In addition, an embodiment of the present application provides a non-transitory computer-readable storage medium including instructions, such as a memory including instructions, which are executable by a processor of an apparatus to perform the data reading and writing method, the data writing method, and the data reading method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
As used in the specification and in the claims, certain terms are used to refer to particular components. As one skilled in the art will appreciate, manufacturers may refer to a component by different names. This specification and the claims do not intend to distinguish between components that differ in name but not in function. In the following description and in the claims, the terms "include" and "comprise" are used in an open-ended fashion, and thus should be interpreted to mean "include, but not limited to." "Substantially" means within an acceptable error range, within which a person skilled in the art can solve the technical problem and substantially achieve the technical effect. Furthermore, the term "coupled" is intended to encompass any direct or indirect electrical coupling. Thus, if a first device couples to a second device, that connection may be through a direct electrical coupling or through an indirect electrical coupling via other devices and couplings. The description which follows is a preferred embodiment of the present application, but is made for the purpose of illustrating the general principles of the application and not for the purpose of limiting the scope of the application. The protection scope of the present application shall be subject to the definitions of the appended claims.
It is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a commodity or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such commodity or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the commodity or system that includes the element.
The foregoing description shows and describes several preferred embodiments of the present application. As stated above, however, it is to be understood that the application is not limited to the forms disclosed herein; these forms are not to be construed as excluding other embodiments, and the application is capable of use in various other combinations, modifications, and environments and is capable of changes within the scope of the inventive concept expressed herein, in accordance with the above teachings or the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the application shall fall within the protection of the appended claims.

Claims (19)

1. A data verification method in random read-write file test is characterized by comprising the following steps:
storing each data block of the random read-write file in a distributed storage system in a mode of multiple copies; the multiple copies of each data block correspond to a check code set; the check code set comprises at least two integer data representing the value range of the check code;
randomly selecting a data block in which data needs to be written, writing the data into a plurality of copies of the data block and updating a corresponding check code set;
and when the data is randomly selected to be read from the data block, reading the data from one copy of the data block and checking the read data by utilizing the check code set.
2. The method of claim 1,
the check code set comprises two integer data and represents a continuous value range, and the two integer data are respectively the minimum value and the maximum value of the value range; or
the check code set comprises more than two integer data and represents a discrete value range, and the more than two integer data are the discrete values in the value range.
3. The method of claim 1 or 2, wherein the randomly selecting a data block to which data needs to be written, writing data to multiple copies of the data block and updating a corresponding set of check codes comprises:
randomly selecting a data block needing to be written with data from the random read-write file,
before writing operation is executed on a plurality of copies corresponding to the data block, determining the sum of the maximum value of the check code set and a preset step length as a written value of the current writing operation, and updating the maximum value of the check code set to be the written value;
executing the current writing operation on the plurality of copies corresponding to the data block and judging whether the writing is successful or not;
when the writing is judged to be successful, updating the minimum value of the check code set to the written value;
and when the writing fails, keeping the check code set unchanged.
4. The method of claim 3,
when the check code set comprises two integer data, the preset step length is equal to 1,
and when the check code set comprises more than two integer data, the preset step length is greater than or equal to 1.
5. The method according to claim 3, wherein performing the current write operation on the plurality of copies of the data block comprises:
and subdividing each copy into a plurality of sub-blocks according to the storage space occupied by one integer data, and writing the written value into each sub-block.
6. The method of claim 5, wherein performing the write operation on the plurality of copies of the data block further comprises:
judging whether the proportion of the write-in values successfully written in a plurality of sub-blocks of one copy is larger than or equal to a first preset threshold or not;
and when the proportion of the write-in values successfully written into the plurality of sub-blocks is greater than or equal to a first preset threshold, returning the copy write-in success status message.
7. The method of claim 3, wherein determining whether the write was successful comprises:
judging whether the number of copies returning the write-in success status message in the plurality of copies is greater than or equal to a second preset threshold or not;
when the number of the copies returning the writing success status message is larger than or equal to a second preset threshold, judging that the writing is successful;
and when the number of the copies returning the write-in success status message is less than a second preset threshold, judging that the write-in fails.
8. The method of claim 1, wherein when randomly selecting to read data from the data block, reading data from one copy of the data block and checking the read data with the set of check codes comprises:
reading data from one copy of the data block;
judging whether the read data belongs to the value range of the check code set or not;
when the read data belongs to the value range of the check code set, judging that the read data passes the check, and updating the minimum value of the check code set into the read data;
and when the read data does not belong to the value range of the check code set, judging that the read data fails the check, and keeping the check code set unchanged.
9. The method of claim 1, wherein reading data from one copy of the data block comprises:
subdividing the copy into a plurality of sub-blocks according to a storage space occupied by the integer data, and respectively reading the written integer data in all the sub-blocks;
judging whether the integer data respectively read from all the sub-blocks are the same or not;
when the integer data read from all the sub-blocks are the same, judging that the reading is successful;
and when the integer data read from the sub-blocks are not all the same, judging that the reading fails and the verification fails.
10. The method of claim 1, wherein the storage space of the data block is the same size as a sector of the disk of the distributed storage system.
11. A data verifying device in random read-write file testing is characterized by comprising:
the storage module is used for storing each data block of the random read-write file in a distributed storage system in a mode of multiple copies; the multiple copies of each data block correspond to a check code set; the check code set comprises at least two integer data representing the value range of the check code;
the writing module is used for randomly selecting a data block in which data needs to be written, writing the data into the multiple copies of the data block and updating the corresponding check code set;
and the verification module is used for reading data from one copy of the data block and verifying the read data by utilizing the verification code set when the data is randomly selected to be read from the data block.
12. The apparatus of claim 11,
the check code set comprises two integer data and represents a continuous value range, and the two integer data are respectively the minimum value and the maximum value of the value range; or
the check code set comprises more than two integer data and represents a discrete value range, and the more than two integer data are the discrete values in the value range.
13. The apparatus of claim 11 or 12, wherein the write module comprises:
a selecting submodule for randomly selecting the data block to be written from the random read-write file,
a first updating submodule, configured to determine, before performing a write operation on multiple copies corresponding to the data block, a sum of a maximum value of the check code set and a preset step size as a write value of the write operation of this time, and update the maximum value of the check code set to the write value;
the writing submodule is used for executing the writing operation of this time on the plurality of copies corresponding to the data block and judging whether the writing is successful;
the second updating submodule is used for updating the minimum value of the check code set to the written value when the writing is judged to be successful;
and the first processing submodule is used for keeping the check code set unchanged when the write failure is judged.
14. The apparatus of claim 13,
when the check code set comprises two integer data, the preset step length is equal to 1,
and when the check code set comprises more than two integer data, the preset step length is greater than or equal to 1.
15. The apparatus of claim 13, wherein the write submodule comprises:
and the writing unit is used for subdividing each copy into a plurality of sub-blocks according to the storage space occupied by the integer data and writing the written value into each sub-block.
16. The apparatus of claim 15, wherein the write submodule further comprises:
the first judgment unit is used for judging whether the proportion of the write-in values successfully written in a plurality of sub-blocks of one copy is larger than or equal to a first preset threshold or not;
and the return unit is used for returning the copy write-in success status message when the proportion of the write-in values successfully written in the plurality of sub-blocks is greater than or equal to a first preset threshold.
17. The apparatus of claim 13, wherein the write submodule comprises:
a second judging unit, configured to judge whether a number of copies returning a write-success status message in the multiple copies is greater than or equal to a second preset threshold;
the first determining unit is used for judging that the writing is successful when the number of the copies of the returned writing success status message is greater than or equal to a second preset threshold;
and the second determining unit is used for judging that the writing fails when the number of the copies of the returned writing success status message is less than a second preset threshold.
18. The apparatus of claim 11, wherein the verification module comprises:
a read submodule for reading data from a copy of the data block;
the judging submodule is used for judging whether the read data belongs to the value range of the check code set or not;
a third updating submodule, configured to determine that the read data passes verification when the read data belongs to the value range of the check code set, and update the minimum value of the check code set to the read data;
and the second processing submodule is used for judging that the read data fails verification when the read data does not belong to the value range of the check code set, and for keeping the check code set unchanged.
19. The apparatus of claim 18, wherein the read submodule comprises:
the reading unit is used for subdividing the copy into a plurality of sub-blocks according to the storage space occupied by the integer data and respectively reading the written integer data in all the sub-blocks;
a third judging unit configured to judge whether the integer data read from all the sub-blocks are the same;
a third determining unit, configured to determine that the reading is successful when the integer data read from all the sub-blocks are the same;
and the fourth determining unit is used for judging that the reading fails and the verification fails when the integer data read from the sub-blocks are not all the same.
CN201610398736.8A 2016-06-07 2016-06-07 Data verification method and device in random read-write file test Active CN107479823B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610398736.8A CN107479823B (en) 2016-06-07 2016-06-07 Data verification method and device in random read-write file test

Publications (2)

Publication Number Publication Date
CN107479823A CN107479823A (en) 2017-12-15
CN107479823B true CN107479823B (en) 2020-07-21

Family

ID=60593302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610398736.8A Active CN107479823B (en) 2016-06-07 2016-06-07 Data verification method and device in random read-write file test

Country Status (1)

Country Link
CN (1) CN107479823B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108363640B (en) * 2018-03-01 2020-10-30 深圳市道通智能航空技术有限公司 Data verification method and device and computer readable storage medium
CN111831297B (en) * 2019-04-17 2021-10-26 中兴通讯股份有限公司 Zero-difference upgrading method and device
CN110795407B (en) * 2019-10-14 2022-06-10 华东计算技术研究所(中国电子科技集团公司第三十二研究所) File random writing method and system suitable for distributed file system
CN110888779B (en) * 2019-11-18 2023-07-07 上海新炬网络信息技术股份有限公司 File system read-only judging method based on analog writing
CN112148523B (en) * 2020-09-11 2023-10-31 武汉华中数控股份有限公司 Verification method and device for data files in embedded system
CN112306410B (en) * 2020-10-29 2022-09-30 珠海格力电器股份有限公司 Data processing method and device for electric energy meter, storage medium and electric energy meter

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1728099A (en) * 2004-06-18 2006-02-01 微软公司 Efficient changing of replica sets in distributed fault-tolerant computing system
CN101630282A (en) * 2009-07-29 2010-01-20 国网电力科学研究院 Data backup method based on Erasure coding and copying technology
CN101950264A (en) * 2010-10-28 2011-01-19 冠捷显示科技(厦门)有限公司 Method for recovering display data by shortcut key
CN104052576A (en) * 2014-06-07 2014-09-17 华中科技大学 Data recovery method based on error correcting codes in cloud storage

Also Published As

Publication number Publication date
CN107479823A (en) 2017-12-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant