WO2020001287A1 - Data verification method and apparatus, and storage medium - Google Patents

Data verification method and apparatus, and storage medium

Info

Publication number
WO2020001287A1
Authority
WO
WIPO (PCT)
Prior art keywords
data block
verification
storage node
data
verified
Prior art date
Application number
PCT/CN2019/091227
Other languages
English (en)
French (fr)
Inventor
宋平凡
谷跃胜
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Priority to JP2020572791A (patent JP7442466B2)
Priority to EP19825019.3A (patent EP3817255A4)
Publication of WO2020001287A1
Priority to US17/133,426 (patent US11537304B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/004Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056Systems characterized by the type of code used
    • H04L1/0061Error detection codes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1004Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/64Protecting data integrity, e.g. using checksums, certificates or signatures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0619Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065Replication mechanisms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659Command handling arrangements, e.g. command buffers, queues, command scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system

Definitions

  • the present invention relates to the field of computers, and in particular, to a data verification method and device, and a storage medium.
  • Solution 1: The verification program sends a verification request for the specified data block to the storage nodes where all the replicas are located; each storage node reads the data of the entire replica, uses a check algorithm such as CRC32 or MD5 to calculate the check code of the whole replica, and returns it to the verification program.
  • the verification program checks whether the check codes returned by the storage nodes are the same. If they differ, it determines that the data of the multiple replicas is inconsistent.
  • Solution 2: The verification program performs secondary slicing on the data block; the verification program selects an unverified slice and sends a verification request for the specified slice of the specified data block to the storage nodes where all replicas are located; each storage node reads the data of the specified slice of its replica, uses a check algorithm such as CRC32 or MD5 to calculate the check code of the slice data, and returns it to the verification program; the verification program checks whether the check codes returned by the storage nodes are the same, and if they differ, adds the slice to a retry queue and retries multiple times. If the check codes returned by the storage nodes still differ after the maximum number of retries is exceeded, it determines that the data of the multiple replicas is inconsistent.
  • the first solution is mainly used for verifying read-only data. If the user is continuously updating the data, then because of network transmission delay, some of the storage nodes holding replicas of a data block may have received a write request while others have not yet received it, so a short-lived inconsistency is normal. Verifying the data at such a moment concludes that the replicas are inconsistent, which may be inaccurate and causes false positives. In addition, this solution reads the data of the entire replica for each verification request, which consumes a large amount of disk bandwidth on the storage node and causes large performance glitches for the user's normal read requests.
  • Embodiments of the present invention provide a data verification method and device, and a storage medium, to solve at least the technical problem that data verification in the related art affects the user's front-end read-write performance.
  • a data verification method includes: determining a data block to be verified among a plurality of data blocks corresponding to a predetermined file in a distributed storage system, wherein the storage nodes where the data block to be verified is located include storage nodes that meet a load balancing policy in the distributed storage system; and verifying the data block to be verified.
  • a data verification apparatus includes: a determining module, configured to determine a data block to be verified among a plurality of data blocks corresponding to a predetermined file in a distributed storage system, wherein the storage nodes where the data block to be verified is located include storage nodes that meet a load balancing policy in the distributed storage system; and a verification module, configured to verify the data block to be verified.
  • a storage medium is also provided, where the storage medium stores program instructions, wherein when the program instructions run, the device where the storage medium is located is controlled to execute the data verification method of any one of the above.
  • a processor is further provided.
  • the processor is configured to run a program, and when the program runs, the data verification method of any one of the foregoing is performed.
  • in the embodiments of the present invention, the storage nodes where the data block to be verified is located include storage nodes that meet a load balancing policy in the distributed storage system.
  • because the storage nodes where the determined data block to be verified is located meet the load balancing policy in the distributed storage system, verifying the data block avoids concentrating a large number of verification requests on a few storage nodes, which would overload those storage nodes.
  • FIG. 1 shows a block diagram of a hardware structure of a computer terminal (or mobile device) for implementing a data verification method
  • FIG. 3 is a schematic flowchart of processing a single file according to an optional embodiment of the present application.
  • FIG. 4 is a structural block diagram of a data verification device according to an embodiment of the present invention.
  • FIG. 5 is a structural block diagram of a computer terminal according to an embodiment of the present invention.
  • Distributed storage system: a system consisting of several nodes that provides storage services externally as a whole. It generally uses distributed algorithms to provide high availability and high performance, and can tolerate the failure of some nodes; for example, it can include master control nodes and storage nodes.
  • Master control node: a node that manages metadata in a distributed storage system, also called a master node.
  • Storage node: a node that stores application data in a distributed storage system. It usually consists of several storage media.
  • Data block: in a distributed storage system, to keep the data distribution balanced, a file is divided into several blocks of a fixed size. Each block is called a data block and corresponds to a contiguous piece of the original file.
  • Replica: in a distributed storage system, for the sake of data safety, multiple copies of a data block are stored on different storage nodes. Each such copy is called a replica.
  • the verification process provided in this application is: after the list of files in the distributed storage system is obtained, a verification task is created for each file and added to a thread pool, and all verification tasks are processed concurrently with a high degree of concurrency.
  • the data verification scheme affects the read-write performance of the user's front-end during the data verification process.
  • the embodiments of the present application provide corresponding solutions, which are described in detail below.
  • an embodiment of a data verification method is also provided. It should be noted that the steps shown in the flowchart of the accompanying drawings can be executed in a computer system, such as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order than here.
  • the storage node reads the data block and calculates the check code according to the verification request. A distributed storage system contains multiple storage nodes, and data verification occupies a certain amount of disk and network bandwidth on the storage node being verified. This application therefore load-balances across storage nodes to prevent multiple concurrently running verification tasks from concentrating on a few storage nodes and overloading them, which effectively reduces contention between different verification tasks and greatly improves the overall verification speed of the cluster in the distributed storage system.
  • global flow control is used to ensure that the number of verification requests on each storage node per unit time stays within a certain range, so that the additional resource overhead that data verification imposes on each storage node per unit time stays within an acceptable range. That is, for a single storage node, global flow control limits the total number of verification requests sent by all verification tasks to an acceptable range, which effectively avoids overloading a single storage node. It should be noted that without load balancing, most verification tasks might access the same storage node at the same time; when global flow control is then applied to that node, all of those verification tasks slow down, and so does the overall verification speed of the cluster. With load balancing, multiple verification tasks do not pile up on one storage node, so contention between different verification tasks is reduced and the overall verification speed of the cluster is effectively improved.
  • this application retries with back-off to avoid too many verification requests affecting normal reads and writes of the range being verified. By using one or more of the above methods, the disk and network resources that highly concurrent data verification occupies on the storage nodes are reduced. This effectively reduces the impact of data verification on the user's front-end reads and writes while still completing verification tasks quickly, so that data verification can run normally without the user noticing.
  • FIG. 1 shows a hardware block diagram of a computer terminal (or mobile device) for implementing a data verification method.
  • the computer terminal 10 may include one or more processors 102 (shown as 102a, 102b, ..., 102n in the figure; the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission module 106 for communication functions.
  • the computer terminal 10 can also include: a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which can be included as one of the I/O interface ports), a network interface, a power supply, and/or a camera.
  • FIG. 1 is only schematic, and it does not limit the structure of the electronic device.
  • the computer terminal 10 may further include more or fewer components than those shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
  • the one or more processors 102 and/or other data processing circuits described above may generally be referred to herein as "data processing circuits."
  • the data processing circuit may be fully or partially embodied as software, hardware, firmware, or any other combination.
  • the data processing circuit may be a single independent processing module, or may be wholly or partially incorporated into any of the other elements in the computer terminal 10 (or mobile device).
  • the data processing circuit acts as a kind of processor control (for example, selection of a variable resistance termination path connected to an interface).
  • the memory 104 may be used to store software programs and modules of application software, such as the program instructions/data storage device corresponding to the data verification method in the embodiment of the present invention.
  • the processor 102 runs the software programs and modules stored in the memory 104, thereby performing various functional applications and data processing, that is, implementing the data verification method described above.
  • the memory 104 may include a high-speed random access memory, and may further include a non-volatile memory, such as one or more magnetic storage devices, a flash memory, or other non-volatile solid-state memory.
  • the memory 104 may further include memory remotely disposed with respect to the processor 102, and these remote memories may be connected to the computer terminal 10 through a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the transmission device 106 is used for receiving or transmitting data via a network.
  • a specific example of the network described above may include a wireless network provided by a communication provider of the computer terminal 10.
  • the transmission device 106 includes a network adapter (NIC), which can be connected to other network equipment through a base station so as to communicate with the Internet.
  • the transmission device 106 may be a radio frequency (RF) module, which is used to communicate with the Internet in a wireless manner.
  • the display may be, for example, a touch screen liquid crystal display (LCD), which may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
  • FIG. 2 is a flowchart of a data verification method according to Embodiment 1 of the present application; as shown in FIG. 2, the method includes:
  • Step S202: determining a data block to be verified among a plurality of data blocks corresponding to a predetermined file in the distributed storage system, wherein the storage nodes where the data block to be verified is located include storage nodes that meet a load balancing policy in the distributed storage system;
  • the above load balancing policy can be used to balance the verification tasks across the storage nodes that store data blocks in the distributed storage system. For example, if a storage node has a large number of verification tasks (the number of verification tasks reaches a certain threshold) or its verification tasks are relatively complex (the complexity of the verification tasks is greater than a certain threshold), the load on that storage node may be considered too high. When a data block needs to be verified and its corresponding storage node is found to be overloaded, the data block is not verified at this time; when the verification tasks on the corresponding storage node are few or relatively simple, the data block can be verified. In this way, the verification tasks of the storage nodes in the distributed storage system are balanced and overloading of some storage nodes is avoided.
  • a storage node that meets the load balancing policy in the above distributed storage system can be considered to be, among the multiple storage nodes in the distributed storage system, one with fewer verification tasks (the number of verification tasks is less than a certain threshold) or with relatively simple verification tasks (the complexity of the verification tasks is less than a certain threshold), but is not limited to this.
  • the above step S202 may be expressed as: selecting a first data block from the plurality of data blocks as the data block to be verified, where the first data block is a data block for which, among the storage nodes where its replicas are located, the number of first storage nodes reaches a predetermined number, and a first storage node is a storage node whose number of verification tasks has not reached the upper limit of verification tasks. That is, among the storage nodes where the replicas of the determined data block to be verified are located, the number of storage nodes with fewer verification tasks reaches the predetermined number.
  • the above predetermined number may be preset according to requirements. For example, it may be the number of all replicas of the data block; that is, there are fewer verification tasks on the storage nodes where all replicas of the data block to be verified are located.
  • a first data block may be selected from the plurality of data blocks as the data block to be verified according to a first mapping table, where the first mapping table stores the correspondence between the address of a storage node and the number of concurrent verification tasks accessing that storage node.
  • data blocks can be stored in a first-in-first-out (FIFO) queue in advance.
  • take as an example the case where the upper limit of verification tasks (the maximum number of verification tasks) of a storage node is M and the first mapping table is Table A, where the key in Table A represents the address of a storage node and the value corresponding to the key represents the number of concurrent verification tasks accessing that storage node.
  • the above step S202 can then be expressed as: after a data block is fetched from the FIFO queue, the address information of the replicas of the fetched data block is obtained; for the address information of each replica, the keys in Table A are searched for that address. If it exists, it is determined whether the number of verification tasks on the storage node corresponding to that address has reached M. When the number of verification tasks on every storage node where the replicas of the data block are located has not reached M, the value corresponding to each of those storage nodes in Table A is increased by 1, and the data block is taken as the data block to be verified. When the number of verification tasks on at least one of the storage nodes where the replicas of the data block are located has reached M, the data block is inserted at the tail of the FIFO queue, a new data block is fetched, and the above lookup and determination are repeated until a fetched data block conforms to the load balancing policy (that is, the number of verification tasks on every storage node where its replicas are located has not reached M).
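The selection against Table A described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the names `pick_block`, `release_block`, `table_a`, and `replica_nodes` are hypothetical, an address missing from Table A is assumed to mean zero tasks, and the decrement on completion is an assumption the text does not spell out.

```python
from collections import deque

def pick_block(queue, table_a, replica_nodes, m):
    """Return the first queued block whose replica nodes are all below m tasks.

    queue: deque of block names (the FIFO queue).
    table_a: dict, storage node address -> concurrent verification tasks.
    replica_nodes: dict, block name -> addresses of the nodes holding replicas.
    Returns None if no queued block currently satisfies the policy.
    """
    # One pass over the queue at most, so the caller can wait instead of spinning.
    for _ in range(len(queue)):
        block = queue.popleft()
        nodes = replica_nodes[block]
        # Assumption: a node absent from Table A has zero verification tasks.
        if all(table_a.get(node, 0) < m for node in nodes):
            for node in nodes:
                table_a[node] = table_a.get(node, 0) + 1
            return block
        queue.append(block)  # some node is at the limit: re-insert at the tail
    return None

def release_block(table_a, nodes):
    """Assumed counterpart: decrement the counters when a block is finished."""
    for node in nodes:
        table_a[node] -= 1
```

The text repeats the lookup until a block qualifies; the sketch stops after one pass over the queue and returns `None`, leaving it to the caller to wait until another task releases a node.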
  • Step S204 verify the data block to be verified.
  • in Embodiment 1 of the present application, after a data block to be verified is determined among a plurality of data blocks included in a predetermined file in a distributed storage system, the data block to be verified is verified, wherein the storage nodes where the determined data block to be verified is located meet the load balancing policy in the distributed storage system. In other words, the data block to be verified is determined based on the load balancing policy.
  • step S204 may be performed as follows: performing secondary slicing on the data block to be verified to obtain data block slices, and verifying the data block slices.
  • this application provides an optional embodiment in which verifying a data block slice can be expressed as: judging whether the number of verification requests of the storage node where the data block slice is located has not reached the upper limit of verification requests; and when the judgment result is yes, verifying the data block slice.
  • the upper limit of the number of verification requests may be the maximum number of verification requests allowed to be sent to the storage node in a unit time, but is not limited thereto.
  • the above upper limit of the verification request may be set in advance based on characteristics of the storage node, or may be set in accordance with experience, but is not limited thereto.
  • in this way, the number of verification requests on a storage node per unit time can be kept within a certain range, which ensures that the additional resource overhead that data verification imposes on the storage node per unit time stays within an acceptable range.
  • the number of verification requests of the storage node where the data block slice is located is reset to zero every predetermined time period.
  • the unit of the above unit time may be any time, such as 1s, 3s, 5s, 50ms, etc., but is not limited thereto.
  • the above predetermined time period may be considered as a unit time, but is not limited to this.
  • whether the number of verification requests of the storage node where the data block slice is located has not reached the upper limit of verification requests can be determined based on a second mapping table, wherein the second mapping table stores the correspondence between the address of a storage node and the number of verification requests to that storage node.
  • the address of the storage node can be used as the key in the second mapping table, and the number of verification requests accessing the storage node can be used as the value corresponding to that key, but it is not limited to this.
  • the following takes as an example the case where the second mapping table is Table B, the key in Table B represents the address of a storage node, the value corresponding to the key represents the number of verification requests accessing that storage node, and the upper limit of verification requests (the maximum number of verification requests) is N.
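A minimal sketch of the Table B check, under stated assumptions: the class name `GlobalFlowControl`, the method `try_acquire`, and the injectable `clock` are hypothetical, and the counters are cleared once per `period` as one reading of "reset every predetermined time period". The sketch is single-threaded; concurrent verification tasks would need a lock around `try_acquire`.

```python
import time

class GlobalFlowControl:
    """Per-node request counters (Table B) cleared once per period."""

    def __init__(self, n, period, clock=time.monotonic):
        self.n = n                # upper limit N of requests per node per period
        self.period = period      # length of the unit time, in seconds
        self.clock = clock        # injectable clock (assumption, eases testing)
        self.table_b = {}         # storage node address -> requests this period
        self.window_start = clock()

    def try_acquire(self, nodes):
        """Count one request on every node in `nodes`, or refuse for all of them."""
        now = self.clock()
        if now - self.window_start >= self.period:
            self.table_b.clear()  # reset the counters for the new period
            self.window_start = now
        if any(self.table_b.get(node, 0) >= self.n for node in nodes):
            return False          # at least one node has reached N
        for node in nodes:
            self.table_b[node] = self.table_b.get(node, 0) + 1
        return True
```

A caller that receives `False` would wait until the next period before sending the verification request for that slice.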
  • the method may further include: retrying the verification of the data block slice in a backoff retry manner, wherein the backoff retry manner means retrying the verification of the data block slice after a delay.
  • retrying the verification of a data block slice in a backoff retry manner may be expressed as follows: during the retries, the time interval of the current retry is determined from the time interval of the previous retry, and the time interval of the current retry is greater than the time interval of the previous retry.
  • determining the time interval of the current retry from the time interval of the previous retry can be expressed as: a predetermined value is obtained by applying a recursive function to the previous time interval, and the smaller of the predetermined value and the upper limit of the retry interval is taken as the time interval of the current retry.
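The interval rule above can be illustrated with a short sketch. The recursive function is not specified in the text; doubling (`factor=2.0`) is an assumption chosen for illustration, and the function names are hypothetical.

```python
def next_interval(previous, upper_limit, factor=2.0):
    """Interval for the current retry: min(f(previous), upper_limit).

    Assumption: f multiplies the previous interval by `factor`.
    """
    return min(previous * factor, upper_limit)

def backoff_intervals(initial, upper_limit, max_retries, factor=2.0):
    """List the delays used before each of `max_retries` retries."""
    intervals = []
    interval = initial
    for _ in range(max_retries):
        intervals.append(interval)
        interval = next_interval(interval, upper_limit, factor)
    return intervals
```

For example, `backoff_intervals(1.0, 10.0, 5)` yields delays of 1, 2, 4, 8 and 10 seconds. Once the upper limit is reached, the interval stays at the limit rather than continuing to grow.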
  • FIG. 3 is a schematic diagram of a single file processing flow provided according to an optional embodiment of the present application. As shown in FIG. 3, the single file processing flow includes:
  • Step 301: Obtain the data block list of a file from the master control node.
  • Step 302: Determine whether there are unprocessed data blocks in the data block list. If the determination result is yes, step 303 is performed; if the determination result is no, the process ends;
  • Step 303: Take a data block that complies with the load balancing policy.
  • Step 304: Perform secondary slicing on the data block according to a fixed size;
  • Step 305: Determine whether there are unprocessed slices. If the determination result is yes, step 306 is performed; if the determination result is no, step 302 is performed;
  • Step 306: Take a slice and request the check code of the slice from the storage nodes where the replicas are located (all requests need to pass through global flow control);
  • Step 307: Determine whether the check codes are inconsistent. If yes, step 308 is performed; if no, step 305 is performed;
  • Step 308: Back off, retry, and confirm.
  • Each verification task first obtains, through the master node, all the data blocks of the file and the address information of all copies of those data blocks, and puts the data blocks into a FIFO queue. Following the storage-node load balancing principle, a suitable data block is selected from the queue each time and split into slices (secondary slicing), and all slices are then checked for consistency one by one; during this process every verification request is subject to global flow control. When all slices of a data block have been verified, the next data block is selected and processing continues; a single verification task ends once all data blocks have been verified.
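The per-file loop described above can be sketched roughly as follows. This is a simplified illustration, not the patented implementation: load balancing and global flow control are left out, `fetch_codes` is a hypothetical callback that returns one check code per copy of a slice, and the doubling delay and retry limit are assumptions.

```python
import time


def verify_file(blocks, slice_size, fetch_codes,
                max_retries=3, first_delay=1.0, sleep=time.sleep):
    """Return the slices of one file that are confirmed inconsistent.

    blocks      -- iterable of (block_name, block_length) pairs
    fetch_codes -- hypothetical callback: (name, offset, length) -> list of
                   check codes, one per copy
    """
    inconsistent = []
    for name, length in blocks:
        # Secondary slicing: cut the block into fixed-size pieces.
        for offset in range(0, length, slice_size):
            piece = (name, offset, min(slice_size, length - offset))
            if _confirmed_mismatch(piece, fetch_codes, max_retries,
                                   first_delay, sleep):
                inconsistent.append(piece)   # a real checker would raise an alarm
    return inconsistent


def _confirmed_mismatch(piece, fetch_codes, max_retries, delay, sleep):
    """Compare check codes, retrying with a growing delay on mismatch."""
    for attempt in range(max_retries + 1):
        if len(set(fetch_codes(piece))) <= 1:
            return False                     # all copies agree
        if attempt < max_retries:
            sleep(delay)                     # back off before asking again
            delay *= 2                       # assumed recurrence: doubling
    return True
```

Passing `sleep` as a parameter is a design choice made here so the loop can be exercised without real delays.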
  • To verify a slice, a verification request is first sent to the storage nodes where the copies are located.
  • The request includes the data block name, the offset of the slice within the data block, and the length of the slice.
  • Each storage node reads the data of the specified slice of its copy according to the verification request, computes the check code, and returns the check code of the slice. The check codes of the slices from the different copies are then compared. If they differ, the result is confirmed by backoff retry. For slices that are finally confirmed to be inconsistent, an alarm is raised.
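As a small illustration of the request and comparison just described, the sketch below models a verification request and the check-code comparison in one process. CRC32 is an assumption here (this passage does not name the algorithm), the copies are modeled as in-memory byte strings rather than remote storage nodes, and `VerifyRequest`, `slice_check_code`, and `replicas_consistent` are hypothetical names.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifyRequest:
    block_name: str   # name of the data block
    offset: int       # offset of the slice within the data block
    length: int       # length of the slice


def slice_check_code(replica: bytes, request: VerifyRequest) -> int:
    """What a storage node would do: read the slice and compute its code."""
    data = replica[request.offset:request.offset + request.length]
    return zlib.crc32(data)                  # assumed algorithm: CRC32


def replicas_consistent(replicas, request: VerifyRequest) -> bool:
    """What the verification program would do: compare the returned codes."""
    codes = {slice_check_code(replica, request) for replica in replicas}
    return len(codes) <= 1
```

Only the bytes inside the requested slice influence the result, which is why a difference elsewhere in the block is not reported for this slice.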
  • Strategies such as storage-node load balancing, global flow control of verification requests, and backoff retry together reduce the impact of verification on front-end reads and writes.
  • Load balancing across storage nodes works as follows in this optional solution: set the maximum number M of tasks allowed to access the same storage node simultaneously (equivalent to the verification task upper limit above), and establish a global mapping table A (equivalent to the first mapping table above), in which each key is the address of a storage node and the corresponding value is the number of concurrent tasks that will access that storage node.
  • Each time a data block is fetched from the FIFO queue, the addresses of the data block's copies are looked up in Table A to see whether, on the storage node holding any of the copies, the number of concurrent verification tasks accessing that node has already reached M. If none of the storage nodes holding a copy of the data block has reached M, the value for each of those storage nodes in Table A is increased by one and the data block is processed; otherwise, the data block is re-inserted at the end of the queue, a new data block is fetched, and the load balancing check is repeated until a data block that conforms to the balancing policy is fetched.
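The Table A check can be sketched as below. Several details are assumptions made for illustration: the function gives up after one full pass over the queue instead of looping indefinitely, `release_block` (decrementing the counters when a block is finished) is not described in the text, and in a multi-threaded checker Table A would need a lock. All names are hypothetical.

```python
from collections import deque


def pick_block(queue, replica_nodes, table_a, max_tasks):
    """Take the first queued block whose replica nodes are all below M.

    queue         -- deque of block names (the FIFO queue)
    replica_nodes -- dict: block name -> list of storage node addresses
    table_a       -- dict: node address -> number of concurrent tasks
    max_tasks     -- M, the per-node limit on concurrent verification tasks
    Returns None if no queued block currently satisfies the policy.
    """
    for _ in range(len(queue)):              # at most one pass over the queue
        block = queue.popleft()
        nodes = replica_nodes[block]
        if all(table_a.get(node, 0) < max_tasks for node in nodes):
            for node in nodes:
                table_a[node] = table_a.get(node, 0) + 1
            return block
        queue.append(block)                  # busy node: move block to the tail
    return None


def release_block(block, replica_nodes, table_a):
    """Assumed counterpart: give the slots back once the block is verified."""
    for node in replica_nodes[block]:
        table_a[node] -= 1
```

A caller that receives `None` would typically wait briefly and try again, which keeps the "repeat until a suitable block is found" behavior without spinning.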
  • The multiple verification tasks of the verification program pass through global flow control when sending verification requests.
  • For a single storage node, global flow control limits the total number of verification requests sent by all verification tasks to an acceptable range, which effectively avoids overloading a single storage node.
  • Global flow control ensures that the number of verification requests the verification program issues to each storage node per unit time stays within a certain range, so that the additional resource overhead caused by verification stays within an acceptable range.
  • Specifically: set the maximum number N of verification requests allowed to be sent to a single storage node per unit time (equivalent to the verification request upper limit above), and establish a global mapping table B (equivalent to the second mapping table), in which each key is the address of a storage node and the corresponding value is the number of verification requests that all verification tasks have sent to that storage node within a certain time range. At short regular intervals (for example, every 1 s or every 1 min) (equivalent to the predetermined time period above), the counts for all storage nodes in Table B are reset to zero and counting starts again. This interval is called a time slice.
  • Before sending a verification request to a storage node, the verification program uses the node's address to look up in Table B whether the number of verification requests sent to that node within the current time slice has reached the upper limit N. If N has not been reached, the request is sent and the count for that node in Table B is increased by one; otherwise, the verification task waits for a while and queries Table B again, and only sends the request once the flow control condition is met.
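A minimal sketch of the Table B mechanism follows. It makes some assumptions that the text does not state: the counters are reset lazily on the first query after a time slice has elapsed (rather than by a separate timer), the clock is injectable so the behavior can be checked without waiting, and the class and method names are hypothetical.

```python
import threading
import time


class GlobalFlowControl:
    """Per-node request counter that is reset once per time slice (Table B)."""

    def __init__(self, max_requests, time_slice, clock=time.monotonic):
        self.max_requests = max_requests     # N: limit per node per time slice
        self.time_slice = time_slice         # length of one time slice, seconds
        self.clock = clock                   # injectable clock (assumption)
        self.table_b = {}                    # node address -> requests sent
        self.slice_start = clock()
        self.lock = threading.Lock()         # Table B is shared by all tasks

    def try_acquire(self, node):
        """Count one request against `node`; False if N is already reached."""
        with self.lock:
            now = self.clock()
            if now - self.slice_start >= self.time_slice:
                self.table_b.clear()         # new time slice: reset all counts
                self.slice_start = now
            sent = self.table_b.get(node, 0)
            if sent >= self.max_requests:
                return False
            self.table_b[node] = sent + 1
            return True

    def acquire(self, node, poll_interval=0.01):
        """Wait and re-query Table B until the request may be sent."""
        while not self.try_acquire(node):
            time.sleep(poll_interval)
```

A verification task would call `acquire(node_address)` immediately before sending each request; the limit applies per node, so a saturated node does not block requests to other nodes.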
  • M, N, and Z are all integers greater than zero, but are not limited thereto.
  • The above method may be executed by the terminal shown in FIG. 1 above, or by a program such as a verification program.
  • The verification program may be located on the terminal or on a third-party device other than the terminal, but is not limited to these.
  • The method according to the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform. It can also be implemented in hardware, but in many cases the former is the better implementation.
  • The technical solution of the present invention, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods of the embodiments of the present invention.
  • FIG. 4 is a structural block diagram of a data verification apparatus according to an embodiment of the present invention.
  • the device includes:
  • a determining module 42, configured to determine a data block to be verified among a plurality of data blocks corresponding to a predetermined file in the distributed storage system, where the storage node where the data block to be verified is located includes a storage node in the distributed storage system that meets the load balancing policy;
  • The load balancing policy can be used to balance verification tasks across the storage nodes that store data blocks in the distributed storage system. For example, if a storage node has a large number of verification tasks (the number reaches a certain threshold) or its verification tasks are relatively complex (their complexity exceeds a certain threshold), the load on that storage node may be considered too high. When a data block needs to be verified and its storage node is found to be in this state, the data block is not verified for the time being; when the storage node for the data block has fewer or simpler verification tasks, the data block can be verified. In this way, the verification tasks of the storage nodes in the distributed storage system are balanced and overloading of some storage nodes is avoided.
  • A storage node that meets the load balancing policy may be regarded as a storage node, among the multiple storage nodes in the distributed storage system, that has fewer verification tasks (the number of verification tasks is less than a certain threshold) or relatively simple verification tasks (the complexity of the verification tasks is less than a certain threshold), but is not limited to this.
  • The determining module 42 is further configured to select a first data block from the plurality of data blocks as the data block to be verified, where the first data block is a data block for which the number of first storage nodes among the storage nodes holding its copies reaches a predetermined number, and a first storage node is a storage node whose verification tasks have not reached the verification task upper limit. That is, among the storage nodes holding the copies of the determined data block to be verified, the number of storage nodes with fewer verification tasks reaches the predetermined number.
  • The predetermined number may be preset as required. For example, it may be the number of all copies of the data block; in that case, every storage node holding a copy of the data block to be verified has fewer verification tasks.
  • The determining module 42 may select the first data block from the plurality of data blocks as the data block to be verified according to the first mapping table, where the first mapping table stores the correspondence between the address of a storage node and the number of concurrent verification tasks that access the storage node.
  • a verification module 44, connected to the determining module 42 and configured to verify the data block to be verified.
  • With the above device, after the determining module 42 determines the data block to be verified, the verification module 44 verifies it. The storage node holding the determined data block is a storage node in the distributed storage system that meets the load balancing policy, so the data block to be verified is in effect determined based on the load balancing policy.
  • The verification module 44 includes: a processing unit, configured to perform secondary slicing on the data block to be verified to obtain data block slices; and a verification unit, connected to the processing unit and configured to verify the data block slices.
  • This application provides an optional embodiment in which the verification unit is further configured to determine whether the number of verification requests of the storage node where the data block slice is located has not reached the verification request upper limit and, if so, to verify the data block slice.
  • the upper limit of the number of verification requests may be the maximum number of verification requests allowed to be sent to the storage node in a unit time, but is not limited thereto.
  • the above upper limit of the verification request may be set in advance based on characteristics of the storage node, or may be set in accordance with experience, but is not limited thereto.
  • In this way, the number of verification requests on the storage node per unit time is kept within a certain range, which ensures that the additional resource overhead caused by data verification on the storage node per unit time stays within an acceptable range.
  • The number of verification requests of the storage node where the data block slice is located is reset to zero once every predetermined time period.
  • The unit time may be any length of time, such as 1 s, 3 s, 5 s, or 50 ms, but is not limited thereto.
  • the above predetermined time period may be considered as a unit time, but is not limited to this.
  • The verification unit may determine, based on the second mapping table, whether the number of verification requests of the storage node where the data block slice is located has not reached the verification request upper limit, where the second mapping table stores the correspondence between the address of a storage node and the number of verification requests that access the storage node.
  • The address of the storage node can be used as the key in the second mapping table, and the number of verification requests that access the storage node can be used as the value for that key, but the table is not limited to this.
  • The device may further include a retry module, connected to the verification unit and configured to re-verify the data block slice in a backoff-retry manner, where backoff retry means retrying the verification of the data block slice after a delay.
  • The interval before the current retry is determined by the interval before the previous retry, and the current interval is greater than the previous one.
  • Determining the current retry interval from the previous one may work as follows: a recurrence function is applied to the previous interval to obtain a predetermined value, and the smaller of that value and the upper limit of the retry interval is used as the current retry interval.
  • The determining module 42 and the verification module 44 correspond to steps S202 to S204 in Embodiment 1.
  • The two modules share the examples and application scenarios of the corresponding steps, but are not limited to what is disclosed in Embodiment 1. It should be noted that, as part of the device, the above modules can run in the computer terminal 10 provided in Embodiment 1.
  • An embodiment of the present invention may provide a computer terminal, and the computer terminal may be any computer terminal device in a computer terminal group.
  • the computer terminal described above may also be replaced with a terminal device such as a mobile terminal.
  • the computer terminal may be located in at least one network device among multiple network devices in a computer network.
  • The computer terminal may execute the program code of the following steps in the data verification method of an application program: determining a data block to be verified among a plurality of data blocks included in a predetermined file in a distributed storage system, where the storage node where the data block to be verified is located is a storage node in the distributed storage system that meets the load balancing policy; and verifying the data block to be verified.
  • FIG. 5 is a structural block diagram of a computer terminal according to an embodiment of the present invention.
  • The computer terminal 5 may include one or more processors 52 (only one is shown in the figure), a memory 54, and a transmission device 56.
  • The memory 54 may be used to store software programs and modules, such as the program instructions/modules corresponding to the data verification method and device in the embodiments of the present invention.
  • The processor 52 runs the software programs and modules stored in the memory 54 to execute various functional applications and data processing, that is, to implement the above data verification method.
  • the memory 54 may include a high-speed random access memory, and may further include a non-volatile memory, such as one or more magnetic storage devices, a flash memory, or other non-volatile solid-state memory.
  • the memory 54 may further include a memory remotely provided with respect to the processor, and these remote memories may be connected to the terminal 5 through a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • The processor 52 may call, through the transmission device, the information and the application program stored in the memory to perform the following steps: determining a data block to be verified among a plurality of data blocks corresponding to a predetermined file in the distributed storage system, where the storage node where the data block to be verified is located includes a storage node in the distributed storage system that meets the load balancing policy; and verifying the data block to be verified.
  • The processor may further execute the program code of the following steps: selecting a first data block from the plurality of data blocks as the data block to be verified, where the first data block is a data block for which the number of first storage nodes among the storage nodes holding its copies reaches a predetermined number, and a first storage node is a storage node whose verification tasks have not reached the verification task upper limit.
  • The processor may further execute the program code of the following steps: selecting the first data block from the plurality of data blocks as the data block to be verified according to the first mapping table, where the first mapping table stores the correspondence between the address of a storage node and the number of concurrent verification tasks that access the storage node.
  • The processor may further execute the program code of the following steps: performing secondary slicing on the data block to be verified to obtain data block slices; and verifying the data block slices.
  • The processor may further execute the program code of the following steps: determining whether the number of verification requests of the storage node where the data block slice is located has not reached the verification request upper limit; and if so, verifying the data block slice.
  • The processor may further execute the program code of the following steps: determining, according to the second mapping table, whether the number of verification requests of the storage node where the data block slice is located has not reached the verification request upper limit, where the second mapping table stores the correspondence between the address of a storage node and the number of verification requests that access the storage node.
  • The processor may further execute the program code of the following steps: after verifying the data block slice, re-verifying the data block slice in a backoff-retry manner, where backoff retry means retrying the data block slice after a delay.
  • With Embodiment 1 of the present application, after a data block to be verified is determined among the plurality of data blocks included in a predetermined file in the distributed storage system, the data block is verified. The storage node holding the determined data block is a storage node in the distributed storage system that meets the load balancing policy, so the data block to be verified is in effect determined based on the load balancing policy.
  • The structure shown in FIG. 5 is only schematic; the computer terminal may also be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD.
  • FIG. 5 does not limit the structure of the electronic device.
  • The computer terminal 5 may further include more or fewer components (such as a network interface or a display device) than those shown in FIG. 5, or have a configuration different from that shown in FIG. 5.
  • The storage medium may include: a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.
  • An embodiment of the present invention also provides a storage medium.
  • the foregoing storage medium may be used to store program code executed by the data verification method provided in the foregoing Embodiment 1.
  • the foregoing storage medium may be located in any computer terminal in a computer terminal group in a computer network, or in any mobile terminal in a mobile terminal group.
  • The storage medium is configured to store program code for performing the following steps: determining a data block to be verified among a plurality of data blocks corresponding to a predetermined file in the distributed storage system, where the storage node where the data block to be verified is located includes a storage node in the distributed storage system that meets the load balancing policy; and verifying the data block to be verified.
  • The storage medium is further configured to store program code for performing the following steps: selecting a first data block from the plurality of data blocks as the data block to be verified, where the first data block is a data block for which the number of first storage nodes among the storage nodes holding its copies reaches a predetermined number, and a first storage node is a storage node whose verification tasks have not reached the verification task upper limit.
  • The storage medium is further configured to store program code for performing the following steps: selecting the first data block from the plurality of data blocks as the data block to be verified according to the first mapping table, where the first mapping table stores the correspondence between the address of a storage node and the number of concurrent verification tasks that access the storage node.
  • The storage medium is further configured to store program code for performing the following steps: performing secondary slicing on the data block to be verified to obtain data block slices; and verifying the data block slices.
  • The storage medium is further configured to store program code for performing the following steps: determining whether the number of verification requests of the storage node where the data block slice is located has not reached the verification request upper limit; and if so, verifying the data block slice.
  • The storage medium is further configured to store program code for performing the following steps: determining, according to the second mapping table, whether the number of verification requests of the storage node where the data block slice is located has not reached the verification request upper limit, where the second mapping table stores the correspondence between the address of a storage node and the number of verification requests that access the storage node.
  • The storage medium is further configured to store program code for performing the following steps: after verifying the data block slice, re-verifying the data block slice in a backoff-retry manner, where backoff retry means retrying the data block slice after a delay.
  • This embodiment provides a processor, where the processor is used to run a program, and the program, when run, executes the program code of the data verification method provided in Embodiment 1.
  • The processor may be located in any computer terminal in a computer terminal group in a computer network, or in any mobile terminal in a mobile terminal group.
  • The processor is configured to execute program code for performing the following steps: determining a data block to be verified among a plurality of data blocks corresponding to a predetermined file in the distributed storage system, where the storage node where the data block to be verified is located includes a storage node in the distributed storage system that meets the load balancing policy; and verifying the data block to be verified.
  • The processor is configured to execute program code for performing the following steps: selecting a first data block from the plurality of data blocks as the data block to be verified, where the first data block is a data block for which the number of first storage nodes among the storage nodes holding its copies reaches a predetermined number, and a first storage node is a storage node whose verification tasks have not reached the verification task upper limit.
  • The processor is configured to execute program code for performing the following steps: selecting the first data block from the plurality of data blocks as the data block to be verified according to the first mapping table, where the first mapping table stores the correspondence between the address of a storage node and the number of concurrent verification tasks that access the storage node.
  • The processor is configured to execute program code for performing the following steps: performing secondary slicing on the data block to be verified to obtain data block slices; and verifying the data block slices.
  • The processor is configured to execute program code for performing the following steps: determining whether the number of verification requests of the storage node where the data block slice is located has not reached the verification request upper limit; and if so, verifying the data block slice.
  • The processor is configured to execute program code for performing the following steps: determining, according to the second mapping table, whether the number of verification requests of the storage node where the data block slice is located has not reached the verification request upper limit, where the second mapping table stores the correspondence between the address of a storage node and the number of verification requests that access the storage node.
  • The processor is configured to execute program code for performing the following steps: after verifying the data block slice, re-verifying the data block slice in a backoff-retry manner, where backoff retry means retrying the data block slice after a delay.
  • sequence numbers of the foregoing embodiments of the present invention are merely for description, and do not represent the superiority or inferiority of the embodiments.
  • the disclosed technical content can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • The division into units is only a division by logical function.
  • Multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional unit.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • The technical solution of the present invention, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product, which is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • The foregoing storage media include USB flash drives, read-only memory (ROM), random access memory (RAM), removable hard disks, magnetic disks, optical discs, and other media that can store program code.

Abstract

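The present invention discloses a data verification method and device, and a storage medium. The method includes: determining a data block to be verified among a plurality of data blocks corresponding to a predetermined file in a distributed storage system, where the storage node where the data block to be verified is located includes a storage node in the distributed storage system that meets a load balancing policy; and verifying the data block to be verified. The present invention solves the technical problem in the related art that the data verification process affects the read and write performance of the user front end.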

Description

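Data verification method and device, and storage medium
This application claims priority to Chinese patent application No. 201810687959.5, filed on June 28, 2018 and entitled "Data verification method and device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of computers, and in particular to a data verification method and device, and a storage medium.
Background
In a large distributed storage system, the input-output (IO) link is long and the number of nodes is large, so the probability of data corruption on a single node rises sharply. It is therefore necessary to run consistency checks on the data blocks stored in the system, checking whether the multiple copies of a data block are consistent with each other. However, data verification itself consumes additional network and storage-node bandwidth in the distributed storage system, and users of distributed storage systems usually have high demands for low latency and high throughput. A data verification solution is therefore needed that completes data consistency verification quickly while minimizing the impact on user front-end reads and writes.
The related art mainly offers the following two data verification schemes. Scheme 1: the verification program sends a verification request for a specified data block to the storage nodes where all copies are located; each storage node reads the data of the entire copy, uses a verification algorithm such as CRC32 or MD5 to compute a check code for the copy as a whole, and returns it to the verification program; the verification program compares the check codes returned by the storage nodes and, if they differ, determines that the copies are inconsistent. Scheme 2: the verification program performs secondary slicing on the data block; the verification program selects an unchecked slice and sends a verification request for the specified slice of the specified data block to the storage nodes where all copies are located; each storage node reads the data of the specified slice of its copy, uses a verification algorithm such as CRC32 or MD5 to compute the check code of the slice data, and returns it to the verification program; the verification program compares the check codes returned by the storage nodes and, if they differ, adds the slice to a retry queue and retries multiple times. If the check codes returned by the storage nodes still differ after the maximum number of retries is exceeded, the copies are determined to be inconsistent.
Scheme 1 is mainly used for verifying read-only data. If the user is continuously updating data, then because of transmission delay on the network, some of the storage nodes holding the copies of a data block may have received a write request while others have not yet, so a short-lived inconsistency is normal. Verifying the data at this moment yields a conclusion of inconsistency, which may be inaccurate and so produce false alarms. In addition, every verification request in this scheme reads the data of the entire copy, which takes up a large share of the storage node's disk bandwidth and causes large latency spikes for users' normal read requests.
Scheme 2, through secondary slicing and retries, can reduce the read latency spikes that verification requests cause for users and can reduce false alarms caused by dynamically changing data. However, data verification still occupies some disk and network bandwidth on the storage node being verified, so performance still degrades when a user's read and write requests need to access a storage node that is being verified.
Therefore, the related art has the problem that the data verification process affects user front-end reads and writes and degrades performance.
No effective solution to the above problem has yet been proposed.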
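Summary of the Invention
Embodiments of the present invention provide a data verification method and device, and a storage medium, to solve at least the technical problem in the related art that the data verification process affects the read and write performance of the user front end.
According to one aspect of the embodiments of the present invention, a data verification method is provided, including: determining a data block to be verified among a plurality of data blocks corresponding to a predetermined file in a distributed storage system, where the storage node where the data block to be verified is located includes a storage node in the distributed storage system that meets a load balancing policy; and verifying the data block to be verified.
According to another aspect of the embodiments of the present invention, a data verification device is further provided, including: a determining module, configured to determine a data block to be verified among a plurality of data blocks corresponding to a predetermined file in a distributed storage system, where the storage node where the data block to be verified is located includes a storage node in the distributed storage system that meets a load balancing policy; and a verification module, configured to verify the data block to be verified.
According to another aspect of the embodiments of the present invention, a storage medium is further provided. The storage medium stores program instructions, and when the program instructions run, they control the device where the storage medium is located to execute any one of the above data verification methods.
According to another aspect of the embodiments of the present invention, a processor is further provided. The processor is used to run a program, and the program, when run, executes any one of the above data verification methods.
In the embodiments of the present invention, the storage node of the data block to be verified includes a storage node in the distributed storage system that meets the load balancing policy. Because the data block to be verified is determined in this way, verifying it does not concentrate a large number of verification requests on a few storage nodes and overload them. This effectively reduces contention among verification tasks and considerably increases the overall verification speed of the cluster in the distributed storage system. It reduces the disk and network resources occupied on storage nodes and reduces the impact of data verification on user front-end reads and writes, thereby achieving the technical effect of improved read and write performance and solving the technical problem in the related art that the data verification process affects the read and write performance of the user front end.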
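Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present invention and form a part of this application. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
FIG. 1 shows a hardware structure block diagram of a computer terminal (or mobile device) for implementing the data verification method;
FIG. 2 is a flowchart of the data verification method provided according to Embodiment 1 of this application;
FIG. 3 is a schematic diagram of the processing flow for a single file provided according to an optional embodiment of this application;
FIG. 4 is a structural block diagram of a data verification device provided according to an embodiment of the present invention;
FIG. 5 is a structural block diagram of a computer terminal according to an embodiment of the present invention.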
具体实施方式
为了使本技术领域的人员更好地理解本发明方案,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分的实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都应当属于本发明保护的范围。
需要说明的是,本发明的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本发明的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
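First, some of the terms that appear in the description of the embodiments of this application are explained as follows:
Distributed storage system: a system made up of several nodes that provides storage services as a whole. It usually relies on distributed algorithms to provide high availability and high performance and can tolerate the failure of some nodes. For example, it may include a master control node and storage nodes.
Master control node: a node that manages metadata in a distributed storage system, also called the Master node.
Storage node: a node that stores application data in a distributed storage system, generally made up of several storage media.
Data block: for balanced data distribution, a distributed storage system splits a file into several blocks of a fixed size. Each block is called a data block and corresponds to a contiguous segment of the original file.
Replica: for data safety, a distributed storage system makes multiple copies of a data block and stores them on different storage nodes. Each copy is called a replica.
Because a distributed storage system has many nodes, many files, and a large volume of data, verifying all files in the system efficiently and quickly requires concurrency. The verification process provided by this application is as follows: after the list of files in the distributed storage system is obtained, a verification task is created for each file and added to a thread pool, and all verification tasks are processed concurrently with a relatively high degree of concurrency N.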
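The one-task-per-file arrangement described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used by the application: `verify_file` is a hypothetical placeholder for the per-file verification task, and the `concurrency` parameter stands in for the degree of concurrency N.

```python
from concurrent.futures import ThreadPoolExecutor


def verify_file(file_name):
    # Hypothetical placeholder for one verification task. A real task would
    # fetch the file's data block list and verify each block.
    return (file_name, True)


def verify_all_files(file_list, concurrency=8):
    # One verification task per file, processed concurrently by a thread pool.
    # pool.map returns results in the same order as file_list.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(verify_file, file_list))
```

Calling `verify_all_files(["file-a", "file-b"], concurrency=2)` returns one result per file in input order.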
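In the related art, data verification affects the read/write performance of the user front end. The embodiments of this application provide a corresponding solution to this problem, described in detail below.
Embodiment 1
According to an embodiment of the present invention, a method embodiment of data verification is also provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
During data verification, a storage node reads a data block and computes a check code in response to a verification request. A distributed storage system contains multiple storage nodes, and verification occupies some disk and network bandwidth on the storage node being verified. This application therefore load-balances across storage nodes to prevent multiple concurrently running verification tasks from concentrating on a few storage nodes and overloading them, which effectively reduces contention between verification tasks and considerably improves the overall verification speed of the cluster in the distributed storage system. In addition, when multiple verification tasks send verification requests, global flow control keeps the number of verification requests sent to each storage node per unit time within a certain range, so that the extra resource overhead that verification imposes on each storage node per unit time stays within an acceptable range. In other words, for a single storage node, global flow control limits the total number of verification requests sent by all verification tasks to an acceptable range, which effectively avoids overloading that node. It should be noted that without load balancing, most verification tasks might access the same storage node at the same time; if global flow control were then applied to that node, all verification tasks would slow down, and so would the overall verification speed of the cluster. With load balancing, verification tasks no longer pile up on one storage node, so contention between tasks decreases and the overall verification speed of the cluster improves.
Furthermore, this application retries with backoff, to avoid too many verification requests affecting normal reads and writes of the range being verified. Using one or more of the above measures reduces the disk and network resources that highly concurrent data verification consumes on storage nodes. Verification tasks still complete quickly while the impact on user front-end reads and writes is effectively reduced, so data verification can run routinely without users noticing.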
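Based on the above ideas, the method embodiment provided in Embodiment 1 of this application may be executed in a mobile terminal, a computer terminal, or a similar computing device. FIG. 1 shows a block diagram of the hardware structure of a computer terminal (or mobile device) for implementing the data verification method. As shown in FIG. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown in the figure as 102a, 102b, ..., 102n; a processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission apparatus 106 for communication. It may also include a display, an input/output (I/O) interface, a universal serial bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. A person of ordinary skill in the art will understand that the structure shown in FIG. 1 is only illustrative and does not limit the structure of the electronic device. For example, the computer terminal 10 may include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
It should be noted that the one or more processors 102 and/or other data processing circuits may be referred to generally in this document as a "data processing circuit". The data processing circuit may be embodied wholly or partly as software, hardware, firmware, or any combination of these. In addition, the data processing circuit may be a single independent processing module, or may be incorporated wholly or partly into any of the other elements of the computer terminal 10 (or mobile device). As referred to in the embodiments of this application, the data processing circuit acts as a kind of processor control (for example, selection of a variable-resistance terminal path connected to an interface).
The memory 104 may be used to store software programs and modules of application software, such as the program instructions/data storage device corresponding to the data verification method in the embodiments of the present invention. By running the software programs and modules stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the data verification method described above. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the computer terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations of these.
The transmission apparatus 106 is used to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the computer terminal 10. In one example, the transmission apparatus 106 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station and thereby communicate with the Internet. In another example, the transmission apparatus 106 may be a radio frequency (RF) module, which communicates with the Internet wirelessly.
The display may be, for example, a touchscreen liquid crystal display (LCD), which enables a user to interact with the user interface of the computer terminal 10 (or mobile device).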
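In the above operating environment, this application provides the data verification method shown in FIG. 2. FIG. 2 is a flowchart of the data verification method provided according to Embodiment 1 of this application. As shown in FIG. 2, the method includes:
Step S202: determine a data block to be verified among multiple data blocks corresponding to a predetermined file in a distributed storage system, where the storage nodes on which the data block to be verified resides include storage nodes in the distributed storage system that satisfy a load balancing policy;
It should be noted that the load balancing policy may be used to balance verification tasks across the storage nodes that store data blocks in the distributed storage system. For example, if a storage node has many verification tasks (the number of verification tasks reaches a threshold) or its verification tasks are complex (the complexity of the verification tasks exceeds a threshold), the storage node may be considered overloaded. When a data block needs to be verified and is found to correspond to such a storage node, the data block is not verified at that time; when the storage node corresponding to a data block has few verification tasks or simple ones, the data block may be verified. In this way, verification tasks are balanced across the storage nodes of the distributed storage system and no storage node becomes overloaded.
It should be noted that a storage node satisfying the load balancing policy may be regarded as a storage node, among the multiple storage nodes in the distributed storage system, that has few verification tasks (the number of verification tasks is below a threshold) or simple verification tasks (the complexity of the verification tasks is below a threshold), but is not limited to this.
For data safety, a data block may be copied into multiple replicas, each stored on a different storage node, so each replica corresponds to one storage node. Accordingly, in an optional embodiment of this application, step S202 may be implemented as: selecting a first data block from the multiple data blocks as the data block to be verified, where the first data block is a data block for which the number of first storage nodes among the storage nodes holding its replicas reaches a predetermined number, and a first storage node is a storage node whose verification tasks have not reached the verification task upper limit. That is, among the storage nodes holding the replicas of the determined data block, the number of storage nodes with few verification tasks reaches the predetermined number.
It should be noted that the predetermined number may be preset as needed. In an optional embodiment of the present invention, the predetermined number may be the number of all replicas of the data block; that is, every storage node holding a replica of the data block to be verified has few verification tasks.
In an optional embodiment of this application, the first data block may be selected from the multiple data blocks as the data block to be verified according to a first mapping table, where the first mapping table stores the correspondence between the address of a storage node and the number of concurrent verification tasks that access that storage node.
It should be noted that the data blocks may be stored in advance in a first-in-first-out (FIFO) queue. The following example takes the verification task upper limit (maximum number of verification tasks) of a storage node as M and the first mapping table as table A, where a key in table A is the address of a storage node and the corresponding value is the number of concurrent verification tasks that access that storage node. In this example, step S202 may be implemented as follows. After a data block is taken from the FIFO queue, the address information of its replicas is used to query table A: for the address of each replica, the key column or row of table A is searched for a matching entry and, if one exists, it is determined whether the number of verification tasks on the corresponding storage node has reached M. If the number of verification tasks has not reached M on any of the storage nodes holding the replicas of the data block, the value for each of those storage nodes in table A is incremented by 1 and the data block becomes the data block to be verified. If the number of verification tasks has reached M on at least one of those storage nodes, the data block is inserted at the tail of the FIFO queue, another data block is taken, and the query is repeated until a data block is found that satisfies the balancing policy (the number of verification tasks has not reached M on any storage node holding its replicas).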
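The selection against table A can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the application's implementation: the function name `pick_block`, the representation of a queue entry as `(block_name, replica_node_addresses)`, and the explicit lock guarding the shared table are all assumptions. The text loops until a qualifying block is found; the sketch instead makes a single pass over the queue and returns `None` if no block qualifies, so the caller can decide how long to wait before trying again.

```python
import threading
from collections import deque


def pick_block(queue, table_a, max_tasks, lock):
    """Return the first queued block whose replica nodes are all below max_tasks.

    queue:     deque of (block_name, [replica node addresses]) in FIFO order
    table_a:   dict mapping node address -> concurrent verification task count
    max_tasks: the per-node upper limit M
    lock:      guards table_a, which is shared by all verification tasks
    """
    for _ in range(len(queue)):  # at most one full pass over the queue
        name, replica_nodes = queue.popleft()
        with lock:
            if all(table_a.get(node, 0) < max_tasks for node in replica_nodes):
                # Every replica node has spare capacity: count this task
                # against each of them and select the block.
                for node in replica_nodes:
                    table_a[node] = table_a.get(node, 0) + 1
                return name, replica_nodes
        # At least one replica node is at the limit: requeue at the tail.
        queue.append((name, replica_nodes))
    return None
```

The source text does not say when a node's count in table A is decreased; a complete program would presumably decrement it once a task finishes with the block, which this sketch leaves out.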
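Step S204: verify the data block to be verified.
In the solution disclosed in Embodiment 1 of this application, after the data block to be verified is determined among the multiple data blocks included in a predetermined file in the distributed storage system, the data block to be verified is verified. The storage nodes on which the determined data block resides are storage nodes in the distributed storage system that satisfy the load balancing policy, so the solution determines the data block to be verified based on the load balancing policy.
It is easy to see that, because the storage nodes on which the data block to be verified resides are storage nodes in the distributed storage system that satisfy the load balancing policy, the data block to be verified is in effect determined based on the load balancing policy. With the solution provided by the embodiments of this application, verification of the data block therefore avoids concentrating a large number of verification requests on a few storage nodes and overloading them, effectively reduces contention between different verification tasks, and considerably improves the overall verification speed of the cluster in the distributed storage system. It reduces the disk and network resources consumed on storage nodes and lessens the impact of data verification on user front-end reads and writes, thereby improving read/write performance and solving the technical problem in the related art that data verification affects the read/write performance of the user front end.
To reduce false alarms caused by dynamically changing data and to reduce the transmitted traffic, in an optional embodiment of the present invention, step S204 may be implemented as: performing secondary slicing on the data block to be verified to obtain data block slices; and verifying the data block slices.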
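As a small illustration of secondary slicing, the Python sketch below splits a data block into fixed-size slices described by offset and length. The function name `slice_block` and the representation of a slice as an `(offset, length)` pair are assumptions made for the example.

```python
def slice_block(block_length, slice_size):
    """Split a block of block_length bytes into (offset, length) slices.

    Every slice has slice_size bytes except possibly the last one.
    """
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    return [
        (offset, min(slice_size, block_length - offset))
        for offset in range(0, block_length, slice_size)
    ]
```

For example, `slice_block(10, 4)` yields `[(0, 4), (4, 4), (8, 2)]`: two full slices and a shorter final slice.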
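It should be noted that, to keep the extra resource overhead that data verification imposes on a storage node within an acceptable range, this application provides an optional embodiment in which verifying a data block slice may be implemented as: determining whether the number of verification requests for the storage node holding the data block slice has not reached the upper limit on verification requests; and, if so, verifying the data block slice. Keeping the number of verification requests on the storage node holding the data block slice within a certain range ensures that the extra resource overhead of data verification on that node stays acceptable. This further reduces disk and network resource consumption on storage nodes and lessens the impact of data verification on user front-end reads and writes, improving read/write performance and better solving the technical problem in the related art that data verification affects the read/write performance of the user front end.
It should be noted that the upper limit on verification requests may be the maximum number of verification requests allowed to be sent to a storage node per unit time, but is not limited to this. The upper limit may be preset based on the characteristics of the storage node or set from experience, but is not limited to this.
To further reduce disk and network resource consumption on storage nodes, to keep the number of verification requests on a storage node per unit time within a certain range, and so to keep the extra resource overhead of data verification on the storage node per unit time acceptable, in an optional embodiment of the present invention the number of verification requests for the storage node holding the data block slice is reset to zero within a predetermined time period.
It should be noted that the unit time may be any length of time, such as 1 s, 3 s, 5 s, or 50 ms, but is not limited to these. Optionally, the predetermined time period may be regarded as one unit time, but is not limited to this.
In an optional embodiment of the present invention, whether the number of verification requests for the storage node holding the data block slice has not reached the upper limit may be determined based on a second mapping table, where the second mapping table stores the correspondence between the address of a storage node and the number of verification requests that have accessed that storage node.
It should be noted that the address of the storage node may serve as the key in the second mapping table, and the number of verification requests that have accessed that storage node may serve as the corresponding value, but this is not limiting.
The following example takes the second mapping table as table B, where a key in table B is the address of a storage node and the corresponding value is the number of verification requests that have accessed that storage node, and takes the upper limit on verification requests (maximum number of verification requests) as N. Before a verification request is sent to a storage node, the address of that storage node is used to query table B to determine whether the number of verification requests already sent to that node in the current time period has reached N. If it has not reached N, the verification request is sent and the value for that storage node's address in table B is incremented by 1; if it has reached N, the sender waits for a period of time and then queries table B again, sending the request only once the flow control condition is met.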
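It should be noted that after a data block slice is verified, the check codes may turn out to be inconsistent. In that case it is highly likely that a user is modifying the range corresponding to the data block slice and the data is changing dynamically. Therefore, in the above embodiment of this application, after the data block slice is verified, the method may further include: retrying verification of the data block slice with backoff, where retrying with backoff means retrying verification of the data block slice after a delay.
In an optional embodiment of this application, retrying verification of the data block slice with backoff may be implemented as follows: during the verification retries, the interval of the current retry is determined by the interval of the previous retry, and the interval of the current retry is greater than the interval of the previous retry.
Retrying verification of the data block slice with backoff avoids too many retry requests affecting normal reads and writes of the range corresponding to the slice, and lengthens the time over which the slice is observed. This further reduces disk and network resource consumption on storage nodes and lessens the impact of data verification on user front-end reads and writes, improving read/write performance and better solving the technical problem in the related art that data verification affects the read/write performance of the user front end.
It should be noted that determining the interval of the current retry from the interval of the previous retry may be implemented as: a recurrence function is applied to the previous interval to obtain a predetermined value, and the smaller of that value and the upper limit of the retry interval is the interval of the current retry.
It should be noted that, because a distributed storage system has many nodes, many files, and a large volume of data, in order to verify all files in the system efficiently and quickly, this solution obtains the list of files in the distributed storage system, creates a verification task for each file, adds the tasks to a thread pool, and processes all verification tasks concurrently with a relatively high degree of concurrency. FIG. 3 is a schematic diagram of the processing flow for a single file provided according to an optional embodiment of this application. As shown in FIG. 3, the processing flow for a single file includes: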
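Step 301: obtain the list of data blocks of the file from the master control node;
Step 302: determine whether there are unprocessed data blocks in the data block list; if so, perform step 303; if not, end;
Step 303: take a data block that satisfies the load balancing policy;
Step 304: perform secondary slicing on the data block at a fixed size;
Step 305: determine whether there are unprocessed slices; if so, perform step 306; if not, perform step 302;
Step 306: take a slice and request the check code of that slice from the storage nodes holding the replicas (all requests pass through a global flow control);
Step 307: determine whether the check codes are inconsistent; if so, perform step 308; if not, perform step 305;
Step 308: retry with backoff to confirm.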
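Within each verification task, all data blocks of the file and the address information of all replicas of those data blocks are first obtained from the master control node, and the data blocks are placed in a FIFO queue. Each time, a suitable data block is selected from the queue according to the principle of load balancing across storage nodes and sliced a second time, and all of its slices are then checked for consistency in turn, with global flow control applied to the verification requests. When all slices of a data block have been verified, the next data block is selected and processed, until all data blocks have been verified and the verification task ends.
To verify a single slice, a verification request is first sent to the storage nodes holding the replicas. The request contains the data block name, the offset of the slice within the data block, and the length of the slice. Each storage node reads the data of the specified slice of its replica according to the request, computes the check code, and returns it. The check codes of the slice from the different replicas are compared; if they differ, the result is confirmed by retrying with backoff, and an alarm is raised for any slice finally confirmed to be inconsistent.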
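The per-slice comparison can be illustrated with the Python sketch below. It is a local simulation under stated assumptions: replicas are modeled as in-memory `bytes` objects rather than data held on remote storage nodes, CRC32 (via the standard `zlib.crc32`) is chosen only because the background section mentions CRC32/MD5 as example algorithms, and the function names are hypothetical.

```python
import zlib


def slice_checksum(replica_data, offset, length):
    # What a storage node would compute for a request carrying
    # (block name, offset, length): a check code over the requested slice only.
    return zlib.crc32(replica_data[offset:offset + length])


def replicas_consistent(replicas, offset, length):
    # Compare the check codes returned for the same slice of every replica.
    checksums = {slice_checksum(data, offset, length) for data in replicas}
    return len(checksums) <= 1
```

A `False` result would not immediately be reported as an inconsistency; according to the text it only triggers retries with backoff, and an alarm is raised if the mismatch persists.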
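That is, the above embodiments of this application reduce the impact on front-end reads and writes mainly through a combination of storage node load balancing, global flow control of verification requests, and retry with backoff. Each is described in detail below.
(1) Storage node load balancing
To prevent multiple concurrently running verification tasks from concentrating a large number of verification requests on a few storage nodes and overloading them, this optional solution load-balances across storage nodes: it sets a maximum number M of tasks allowed to access the same storage node at the same time (corresponding to the verification task upper limit above) and builds a global mapping table A (corresponding to the first mapping table above), in which a key is the address of a storage node and the corresponding value is the number of concurrent tasks that will access that storage node. This effectively reduces contention between verification tasks and considerably improves the overall verification speed of the cluster in the distributed storage system.
Within a verification task, each time a data block is taken from the FIFO queue, the addresses of its replicas are first used to query table A to determine whether, for any storage node holding a replica, the number of verification tasks that may concurrently access that node has reached M. If the number of verification tasks has not reached M on any storage node holding a replica of the data block, the value for each of those storage nodes in table A is incremented by one and the data block is processed. Otherwise, the data block is inserted at the tail of the queue, another data block is taken, and the load balancing check is repeated until a data block that satisfies the balancing policy is obtained.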
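(2) Global flow control of verification requests
In this optional solution, the verification tasks of the verification program pass through a global flow control when sending verification requests. For a single storage node, global flow control limits the total number of verification requests sent by all verification tasks to an acceptable range, which effectively avoids overloading that node. Global flow control ensures that the number of verification requests the verification program sends to each storage node per unit time stays within a certain range, and therefore that the extra resource overhead of data verification on each storage node per unit time stays acceptable. A maximum number N of verification requests allowed to be sent to a single storage node per unit time is set (corresponding to the upper limit on verification requests above), and a global mapping table B is built (corresponding to the second mapping table), in which a key is the address of a storage node and the corresponding value is the number of verification requests that all verification tasks have sent to that storage node within a given time range. At the end of each short time period (for example, 1 s or 1 min; corresponding to the predetermined time period above), the counts for all storage nodes in table B are reset to zero and counting starts again. This time period is called a time slice.
Within a verification task, before sending a verification request to a storage node, the verification program first uses the address of that storage node to query table B to determine whether the number of verification requests already sent to the node in the current time slice has reached the upper limit N. If it has not reached N, the request is sent and the count of sent requests for that storage node in table B is incremented by one. Otherwise, the verification task waits for a period of time and queries table B again, sending the request only once the flow control condition is met.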
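The time-slice counter kept in table B can be sketched in Python as follows. This is a minimal illustration, not the application's implementation: the class name `GlobalFlowControl`, the `try_acquire` method, the injectable `clock`, and the lock are assumptions introduced for the example. Counts are cleared lazily on the first query after a time slice has elapsed, which is one possible way to realize the periodic reset.

```python
import threading
import time


class GlobalFlowControl:
    """Per-node request counter (table B) that is reset every time slice."""

    def __init__(self, max_requests, time_slice, clock=time.monotonic):
        self.max_requests = max_requests  # upper limit N per node per slice
        self.time_slice = time_slice      # slice length in seconds
        self.clock = clock                # injectable so tests can fake time
        self.table_b = {}                 # node address -> requests sent
        self.slice_start = clock()
        self.lock = threading.Lock()      # table B is shared by all tasks

    def try_acquire(self, node):
        """Return True and count the request if the node is below the limit."""
        with self.lock:
            now = self.clock()
            if now - self.slice_start >= self.time_slice:
                # A new time slice has begun: reset all counts.
                self.table_b.clear()
                self.slice_start = now
            sent = self.table_b.get(node, 0)
            if sent >= self.max_requests:
                return False
            self.table_b[node] = sent + 1
            return True
```

A verification task would call `try_acquire(node)` before each request and, on `False`, sleep briefly and query again, matching the wait-and-requery behavior described above.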
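(3) Retry with backoff
When a slice with inconsistent check codes is found, it is highly likely that a user is modifying the range corresponding to the slice and the data is changing dynamically, so this preferred solution retries. To avoid too many retry requests affecting normal reads and writes of that range, and to allow a longer time window in which to keep observing the slice, the retries use backoff: a maximum number of retries Z, an initial retry interval T_0, an upper limit T_m, and a recurrence function F(x) are set. Let the interval of the n-th retry be T_n; then T_n = F(T_{n-1}), with T_n >= T_{n-1}.
During retries, the interval of the first retry is T_1 = T_0. For each subsequent retry, the interval T_n is obtained by first applying the recurrence function F(x) to the previous interval T_{n-1} (corresponding to the interval of the previous retry above) to compute a temporary value T_x, and then taking the smaller of T_x and T_m as the actual interval of this retry (corresponding to the interval of the current retry above). Retrying ends when the slice passes the consistency check after a retry or when the number of retries reaches Z.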
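The interval schedule described above can be written out as a short Python sketch. The function name `retry_intervals` is an assumption, and the doubling function used in the usage note is only one example of a recurrence function F(x); the text does not prescribe a particular F.

```python
def retry_intervals(t0, tm, f, max_retries):
    """Return the waiting intervals T_1..T_Z for up to max_retries retries.

    t0: initial interval T_0, used as the first retry interval
    tm: upper limit T_m on any retry interval
    f:  recurrence function F(x) applied to the previous interval
    """
    intervals = []
    current = t0
    for n in range(max_retries):
        if n > 0:
            # T_x = F(T_{n-1}); the actual interval is min(T_x, T_m).
            current = min(f(current), tm)
        intervals.append(current)
    return intervals
```

With `t0=1`, `tm=10`, `f=lambda x: 2 * x`, and `max_retries=6`, the schedule is `[1, 2, 4, 8, 10, 10]`: the interval doubles until it is capped at the upper limit. In a real verifier the loop would stop early as soon as a retry finds the slice consistent.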
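It should be noted that M, N, and Z are all integers greater than zero, but are not limited to this.
It should be noted that the above method may be executed by the terminal shown in FIG. 1 or by a program such as a verification program. The verification program may be located on that terminal or on a third-party device other than the terminal, but is not limited to this.
It should be noted that, for brevity, the foregoing method embodiments are described as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the order of actions described, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
From the description of the above implementations, those skilled in the art can clearly understand that the method according to the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods of the embodiments of the present invention.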
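Embodiment 2
According to an embodiment of the present invention, an apparatus for implementing the above data verification method is also provided. FIG. 4 is a structural block diagram of a data verification apparatus provided according to an embodiment of the present invention. As shown in FIG. 4, the apparatus includes:
a determining module 42, configured to determine a data block to be verified among multiple data blocks corresponding to a predetermined file in a distributed storage system, where the storage nodes on which the data block to be verified resides include storage nodes in the distributed storage system that satisfy a load balancing policy;
It should be noted that the load balancing policy may be used to balance verification tasks across the storage nodes that store data blocks in the distributed storage system. For example, if a storage node has many verification tasks (the number of verification tasks reaches a threshold) or its verification tasks are complex (the complexity of the verification tasks exceeds a threshold), the storage node may be considered overloaded. When a data block needs to be verified and is found to correspond to such a storage node, the data block is not verified at that time; when the storage node corresponding to a data block has few verification tasks or simple ones, the data block may be verified. In this way, verification tasks are balanced across the storage nodes of the distributed storage system and no storage node becomes overloaded.
It should be noted that a storage node satisfying the load balancing policy may be regarded as a storage node, among the multiple storage nodes in the distributed storage system, that has few verification tasks (the number of verification tasks is below a threshold) or simple verification tasks (the complexity of the verification tasks is below a threshold), but is not limited to this.
For data safety, a data block may be copied into multiple replicas, each stored on a different storage node, so each replica corresponds to one storage node. Accordingly, in an optional embodiment of this application, the determining module 42 is further configured to select a first data block from the multiple data blocks as the data block to be verified, where the first data block is a data block for which the number of first storage nodes among the storage nodes holding its replicas reaches a predetermined number, and a first storage node is a storage node whose verification tasks have not reached the verification task upper limit. That is, among the storage nodes holding the replicas of the determined data block, the number of storage nodes with few verification tasks reaches the predetermined number.
It should be noted that the predetermined number may be preset as needed. In an optional embodiment of the present invention, the predetermined number may be the number of all replicas of the data block; that is, every storage node holding a replica of the data block to be verified has few verification tasks.
In an optional embodiment of this application, the determining module 42 may select the first data block from the multiple data blocks as the data block to be verified according to a first mapping table, where the first mapping table stores the correspondence between the address of a storage node and the number of concurrent verification tasks that access that storage node.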
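a verification module 44, connected to the determining module 42 and configured to verify the data block to be verified.
In the solution disclosed in Embodiment 2 of this application, after the determining module 42 determines the data block to be verified among the multiple data blocks included in a predetermined file in the distributed storage system, the verification module 44 verifies the data block to be verified. The storage nodes on which the determined data block resides are storage nodes in the distributed storage system that satisfy the load balancing policy, so the solution determines the data block to be verified based on the load balancing policy.
It is easy to see that, because the storage nodes on which the data block to be verified resides are storage nodes in the distributed storage system that satisfy the load balancing policy, the data block to be verified is in effect determined based on the load balancing policy. With the solution provided by the embodiments of this application, verification of the data block therefore avoids concentrating a large number of verification requests on a few storage nodes and overloading them, effectively reduces contention between different verification tasks, and considerably improves the overall verification speed of the cluster in the distributed storage system. It reduces the disk and network resources consumed on storage nodes and lessens the impact of data verification on user front-end reads and writes, thereby improving read/write performance and solving the technical problem in the related art that data verification affects the read/write performance of the user front end.
To reduce the transmitted traffic, in an optional embodiment of the present invention, the verification module 44 includes: a processing unit, configured to perform secondary slicing on the data block to be verified to obtain data block slices; and a verification unit, connected to the processing unit and configured to verify the data block slices.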
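It should be noted that, to keep the extra resource overhead that data verification imposes on a storage node within an acceptable range, this application provides an optional embodiment in which the verification unit is further configured to determine whether the number of verification requests for the storage node holding the data block slice has not reached the upper limit on verification requests and, if so, to verify the data block slice. Keeping the number of verification requests on the storage node holding the data block slice within a certain range ensures that the extra resource overhead of data verification on that node stays acceptable. This further reduces disk and network resource consumption on storage nodes and lessens the impact of data verification on user front-end reads and writes, improving read/write performance and better solving the technical problem in the related art that data verification affects the read/write performance of the user front end.
It should be noted that the upper limit on verification requests may be the maximum number of verification requests allowed to be sent to a storage node per unit time, but is not limited to this. The upper limit may be preset based on the characteristics of the storage node or set from experience, but is not limited to this.
To further reduce disk and network resource consumption on storage nodes, to keep the number of verification requests on a storage node per unit time within a certain range, and so to keep the extra resource overhead of data verification on the storage node per unit time acceptable, in an optional embodiment of the present invention the number of verification requests for the storage node holding the data block slice is reset to zero within a predetermined time period.
It should be noted that the unit time may be any length of time, such as 1 s, 3 s, 5 s, or 50 ms, but is not limited to these. Optionally, the predetermined time period may be regarded as one unit time, but is not limited to this.
In an optional embodiment of the present invention, the verification unit may determine, based on a second mapping table, whether the number of verification requests for the storage node holding the data block slice has not reached the upper limit, where the second mapping table stores the correspondence between the address of a storage node and the number of verification requests that have accessed that storage node.
It should be noted that the address of the storage node may serve as the key in the second mapping table, and the number of verification requests that have accessed that storage node may serve as the corresponding value, but this is not limiting.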
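It should be noted that after a data block slice is verified, the check codes may turn out to be inconsistent. In that case it is highly likely that a user is modifying the range corresponding to the data block slice and the data is changing dynamically. Therefore, in the above embodiment of this application, the apparatus may further include: a retry module, connected to the verification unit and configured to retry verification of the data block slice with backoff, where retrying with backoff means retrying verification of the data block slice after a delay.
In an optional embodiment of this application, during the verification retries, the interval of the current retry is determined by the interval of the previous retry, and the interval of the current retry is greater than the interval of the previous retry.
Retrying verification of the data block slice with backoff avoids too many retry requests affecting normal reads and writes of the range corresponding to the slice, and lengthens the time over which the slice is observed. This further reduces disk and network resource consumption on storage nodes and lessens the impact of data verification on user front-end reads and writes, improving read/write performance and better solving the technical problem in the related art that data verification affects the read/write performance of the user front end.
It should be noted that determining the interval of the current retry from the interval of the previous retry may be implemented as: a recurrence function is applied to the previous interval to obtain a predetermined value, and the smaller of that value and the upper limit of the retry interval is the interval of the current retry.
It should be noted here that the determining module 42 and the verification module 44 correspond to steps S202 to S204 in Embodiment 1. The examples and application scenarios implemented by the two modules are the same as those of the corresponding steps, but are not limited to the content disclosed in Embodiment 1. It should be noted that the above modules, as part of the apparatus, may run in the computer terminal 10 provided in Embodiment 1.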
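Embodiment 3
Embodiments of the present invention may provide a computer terminal, which may be any computer terminal device in a group of computer terminals. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one of multiple network devices of a computer network.
In this embodiment, the computer terminal may execute program code for the following steps of the data verification method of an application: determining a data block to be verified among multiple data blocks included in a predetermined file in a distributed storage system, where the storage nodes on which the data block to be verified resides are storage nodes in the distributed storage system that satisfy a load balancing policy; and verifying the data block to be verified.
Optionally, FIG. 5 is a structural block diagram of a computer terminal according to an embodiment of the present invention. As shown in FIG. 5, the computer terminal 5 may include one or more processors 52 (only one is shown in the figure), a memory 54, and a transmission apparatus 56.
The memory 54 may be used to store software programs and modules, such as the program instructions/modules corresponding to the data verification method and apparatus in the embodiments of the present invention. By running the software programs and modules stored in the memory 54, the processor 52 executes various functional applications and data processing, that is, implements the data verification method described above. The memory 54 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 54 may further include memory located remotely from the processor, and such remote memory may be connected to the terminal 5 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations of these.
The processor 52 may call, through the transmission apparatus, the information and application programs stored in the memory to perform the following steps: determining a data block to be verified among multiple data blocks corresponding to a predetermined file in a distributed storage system, where the storage nodes on which the data block to be verified resides include storage nodes in the distributed storage system that satisfy a load balancing policy; and verifying the data block to be verified.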
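Optionally, the processor may also execute program code for the following steps: selecting a first data block from the multiple data blocks as the data block to be verified, where the first data block is a data block for which the number of first storage nodes among the storage nodes holding its replicas reaches a predetermined number, and a first storage node is a storage node whose verification tasks have not reached the verification task upper limit.
Optionally, the processor may also execute program code for the following steps: selecting, according to a first mapping table, a first data block from the multiple data blocks as the data block to be verified, where the first mapping table stores the correspondence between the address of a storage node and the number of concurrent verification tasks that access that storage node.
Optionally, the processor may also execute program code for the following steps: performing secondary slicing on the data block to be verified to obtain data block slices; and verifying the data block slices.
Optionally, the processor may also execute program code for the following steps: determining whether the number of verification requests for the storage node holding a data block slice has not reached the upper limit on verification requests; and, if so, verifying the data block slice.
Optionally, the processor may also execute program code for the following steps: determining, according to a second mapping table, whether the number of verification requests for the storage node holding a data block slice has not reached the upper limit on verification requests, where the second mapping table stores the correspondence between the address of a storage node and the number of verification requests that have accessed that storage node.
Optionally, the processor may also execute program code for the following steps: after the data block slice is verified, retrying verification of the data block slice with backoff, where retrying with backoff means retrying verification of the data block slice after a delay.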
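In the solution disclosed in Embodiment 1 of this application, after the data block to be verified is determined among the multiple data blocks included in a predetermined file in the distributed storage system, the data block to be verified is verified. The storage nodes on which the determined data block resides are storage nodes in the distributed storage system that satisfy the load balancing policy, so the solution determines the data block to be verified based on the load balancing policy.
It is easy to see that, because the storage nodes on which the data block to be verified resides are storage nodes in the distributed storage system that satisfy the load balancing policy, the data block to be verified is in effect determined based on the load balancing policy. With the solution provided by the embodiments of this application, verification of the data block therefore avoids concentrating a large number of verification requests on a few storage nodes and overloading them, effectively reduces contention between different verification tasks, and considerably improves the overall verification speed of the cluster in the distributed storage system. It reduces the disk and network resources consumed on storage nodes and lessens the impact of data verification on user front-end reads and writes, thereby improving read/write performance and solving the technical problem in the related art that data verification affects the read/write performance of the user front end.
A person of ordinary skill in the art will understand that the structure shown in FIG. 5 is only illustrative. The computer terminal may also be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID), or a PAD. FIG. 5 does not limit the structure of the electronic device. For example, the computer terminal 5 may include more or fewer components than shown in FIG. 5 (such as a network interface or a display device), or have a configuration different from that shown in FIG. 5.
A person of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing hardware related to a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.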
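Embodiment 4
An embodiment of the present invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be used to store the program code executed by the data verification method provided in Embodiment 1.
Optionally, in this embodiment, the storage medium may be located in any computer terminal in a group of computer terminals in a computer network, or in any mobile terminal in a group of mobile terminals.
Optionally, in this embodiment, the storage medium is configured to store program code for executing the following steps: determining a data block to be verified among multiple data blocks corresponding to a predetermined file in a distributed storage system, where the storage nodes on which the data block to be verified resides include storage nodes in the distributed storage system that satisfy a load balancing policy; and verifying the data block to be verified.
Optionally, the storage medium is further configured to store program code for executing the following steps: selecting a first data block from the multiple data blocks as the data block to be verified, where the first data block is a data block for which the number of first storage nodes among the storage nodes holding its replicas reaches a predetermined number, and a first storage node is a storage node whose verification tasks have not reached the verification task upper limit.
Optionally, the storage medium is further configured to store program code for executing the following steps: selecting, according to a first mapping table, a first data block from the multiple data blocks as the data block to be verified, where the first mapping table stores the correspondence between the address of a storage node and the number of concurrent verification tasks that access that storage node.
Optionally, the storage medium is further configured to store program code for executing the following steps: performing secondary slicing on the data block to be verified to obtain data block slices; and verifying the data block slices.
Optionally, the storage medium is further configured to store program code for executing the following steps: determining whether the number of verification requests for the storage node holding a data block slice has not reached the upper limit on verification requests; and, if so, verifying the data block slice.
Optionally, the storage medium is further configured to store program code for executing the following steps: determining, according to a second mapping table, whether the number of verification requests for the storage node holding a data block slice has not reached the upper limit on verification requests, where the second mapping table stores the correspondence between the address of a storage node and the number of verification requests that have accessed that storage node.
Optionally, the storage medium is further configured to store program code for executing the following steps: after the data block slice is verified, retrying verification of the data block slice with backoff, where retrying with backoff means retrying verification of the data block slice after a delay.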
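Embodiment 5
This embodiment provides a processor. The processor is used to run a program, and the program, when running, executes the program code executed by the data verification method provided in Embodiment 1.
Optionally, in this embodiment, the processor may be located in any computer terminal in a group of computer terminals in a computer network, or in any mobile terminal in a group of mobile terminals.
Optionally, in this embodiment, the processor is configured to execute program code for the following steps: determining a data block to be verified among multiple data blocks corresponding to a predetermined file in a distributed storage system, where the storage nodes on which the data block to be verified resides include storage nodes in the distributed storage system that satisfy a load balancing policy; and verifying the data block to be verified.
Optionally, in this embodiment, the processor is configured to execute program code for the following steps: selecting a first data block from the multiple data blocks as the data block to be verified, where the first data block is a data block for which the number of first storage nodes among the storage nodes holding its replicas reaches a predetermined number, and a first storage node is a storage node whose verification tasks have not reached the verification task upper limit.
Optionally, in this embodiment, the processor is configured to execute program code for the following steps: selecting, according to a first mapping table, a first data block from the multiple data blocks as the data block to be verified, where the first mapping table stores the correspondence between the address of a storage node and the number of concurrent verification tasks that access that storage node.
Optionally, in this embodiment, the processor is configured to execute program code for the following steps: performing secondary slicing on the data block to be verified to obtain data block slices; and verifying the data block slices.
Optionally, in this embodiment, the processor is configured to execute program code for the following steps: determining whether the number of verification requests for the storage node holding a data block slice has not reached the upper limit on verification requests; and, if so, verifying the data block slice.
Optionally, in this embodiment, the processor is configured to execute program code for the following steps: determining, according to a second mapping table, whether the number of verification requests for the storage node holding a data block slice has not reached the upper limit on verification requests, where the second mapping table stores the correspondence between the address of a storage node and the number of verification requests that have accessed that storage node.
Optionally, in this embodiment, the processor is configured to execute program code for the following steps: after the data block slice is verified, retrying verification of the data block slice with backoff, where retrying with backoff means retrying verification of the data block slice after a delay.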
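It should be noted that, for brevity, the foregoing method embodiments are described as a series of combined actions. Those skilled in the art should understand, however, that this application is not limited by the order of actions described, because according to this application some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not detailed in one embodiment, refer to the related descriptions in other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The apparatus embodiments described above are only illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected as needed to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The above are only preferred implementations of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.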

Claims (13)

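  1. A data verification method, comprising:
    determining a data block to be verified among multiple data blocks corresponding to a predetermined file in a distributed storage system, wherein the storage nodes on which the data block to be verified resides include storage nodes in the distributed storage system that satisfy a load balancing policy;
    verifying the data block to be verified.
  2. The method according to claim 1, wherein determining the data block to be verified among the multiple data blocks corresponding to the predetermined file in the distributed storage system comprises:
    selecting a first data block from the multiple data blocks as the data block to be verified, wherein the first data block is a data block for which the number of first storage nodes among the storage nodes holding its replicas reaches a predetermined number, and a first storage node is a storage node whose verification tasks have not reached a verification task upper limit.
  3. The method according to claim 2, wherein the predetermined number is the number of all replicas of the data block.
  4. The method according to claim 2 or 3, wherein the first data block is selected from the multiple data blocks as the data block to be verified according to a first mapping table, and the first mapping table stores the correspondence between the address of a storage node and the number of concurrent verification tasks that access that storage node.
  5. The method according to claim 1, wherein verifying the data block to be verified comprises:
    performing secondary slicing on the data block to be verified to obtain data block slices;
    verifying the data block slices.
  6. The method according to claim 5, wherein verifying a data block slice comprises:
    determining whether the number of verification requests for the storage node holding the data block slice has not reached an upper limit on verification requests;
    if so, verifying the data block slice.
  7. The method according to claim 6, wherein the number of verification requests for the storage node holding the data block slice is reset to zero within a predetermined time period.
  8. The method according to claim 6 or 7, wherein whether the number of verification requests for the storage node holding the data block slice has not reached the upper limit on verification requests is determined according to a second mapping table, and the second mapping table stores the correspondence between the address of a storage node and the number of verification requests that have accessed that storage node.
  9. The method according to claim 5, further comprising, after verifying the data block slice:
    retrying verification of the data block slice with backoff, wherein retrying with backoff means retrying verification of the data block slice after a delay.
  10. The method according to claim 9, wherein retrying verification of the data block slice with backoff comprises:
    during the verification retries, determining the interval of the current retry from the interval of the previous retry, wherein the interval of the current retry is greater than the interval of the previous retry.
  11. A data verification apparatus, comprising:
    a determining module, configured to determine a data block to be verified among multiple data blocks corresponding to a predetermined file in a distributed storage system, wherein the storage nodes on which the data block to be verified resides include storage nodes in the distributed storage system that satisfy a load balancing policy;
    a verification module, configured to verify the data block to be verified.
  12. The apparatus according to claim 11, wherein the determining module is configured to select a first data block from the multiple data blocks as the data block to be verified, wherein the first data block is a data block for which the number of first storage nodes among the storage nodes holding its replicas reaches a predetermined number, and a first storage node is a storage node whose verification tasks have not reached a verification task upper limit.
  13. A storage medium storing program instructions, wherein, when the program instructions run, the device on which the storage medium is located is controlled to execute the data verification method according to any one of claims 1 to 10.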
PCT/CN2019/091227 2018-06-28 2019-06-14 Data verification method and apparatus, and storage medium WO2020001287A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2020572791A JP7442466B2 (ja) 2018-06-28 2019-06-14 Data verification method and apparatus, and storage medium
EP19825019.3A EP3817255A4 (en) 2018-06-28 2019-06-14 DATA CHECK METHOD AND DEVICE AND STORAGE MEDIUM
US17/133,426 US11537304B2 (en) 2018-06-28 2020-12-23 Data verification method and apparatus, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810687959.5A CN110659151B (zh) 2018-06-28 2018-06-28 Data verification method and apparatus, and storage medium
CN201810687959.5 2018-06-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/133,426 Continuation US11537304B2 (en) 2018-06-28 2020-12-23 Data verification method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2020001287A1 true WO2020001287A1 (zh) 2020-01-02

Family

ID=68986038

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/091227 WO2020001287A1 (zh) 2018-06-28 2019-06-14 Data verification method and apparatus, and storage medium

Country Status (5)

Country Link
US (1) US11537304B2 (zh)
EP (1) EP3817255A4 (zh)
JP (1) JP7442466B2 (zh)
CN (1) CN110659151B (zh)
WO (1) WO2020001287A1 (zh)

Also Published As

Publication number Publication date
US11537304B2 (en) 2022-12-27
US20210117093A1 (en) 2021-04-22
CN110659151A (zh) 2020-01-07
JP7442466B2 (ja) 2024-03-04
CN110659151B (zh) 2023-05-02
EP3817255A4 (en) 2022-03-30
EP3817255A1 (en) 2021-05-05
JP2021529392A (ja) 2021-10-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19825019

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020572791

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2019825019

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2019825019

Country of ref document: EP

Effective date: 20210128