WO2023241350A1 - Data processing method and device, data access end, and storage medium - Google Patents

Data processing method and device, data access end, and storage medium

Info

Publication number
WO2023241350A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
node
target
space
stored
Prior art date
Application number
PCT/CN2023/097128
Other languages
French (fr)
Chinese (zh)
Inventor
赵旭东
易曌平
Original Assignee
重庆紫光华山智安科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 重庆紫光华山智安科技有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • This application relates to the field of distributed storage, specifically, to a data processing method, device, data access terminal and storage medium.
  • Distributed storage means that the data access end divides the data to be stored into multiple data strips, encodes the data strips with an erasure coding algorithm to obtain redundant verification data, and then stores the data strips and the verification data on multiple storage nodes. When some data fails to be stored, it can be restored from the data that was stored successfully.
  • Bitmaps are generally used to record the storage locations of data strips, and read and write operations on data strips are performed by looking up these storage locations. Bitmaps are usually recorded in metadata and stored together with the data strips in each storage node. When any storage node is powered off abnormally, the metadata may be stored successfully at that node while the data strip fails to be stored. In that case, because the data version numbers in the metadata of this node and of the other nodes are consistent, the data access end will not restore the data strips that failed to be stored at the node, and data consistency cannot be guaranteed.
  • Embodiments of the present application provide a data processing method, device, data access terminal and storage medium that avoid the situation in which the data version numbers in the storage nodes are consistent although a data strip failed to be stored, so that the data access end can promptly recover the failed data strips and ensure data consistency.
  • This application provides a data processing method, which is applied to the data access terminal in a distributed storage system.
  • The distributed storage system also includes a plurality of storage nodes, and each storage node is provided with a number.
  • Each storage node is communicatively connected with the data access terminal, and the method includes:
  • For each data block, determine the target node from the multiple storage nodes according to the sequence number of the data block, and send the data block and the data version number of the data to be stored to the target node, so that the target node stores the data block and the data version number in the first space and the second space of the target node respectively, where the sequence number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space.
  • Before processing the data to be stored to obtain multiple data blocks, the method further includes:
  • The step of processing the data to be stored to obtain multiple data blocks includes:
  • Every preset number of the data strips is formed into an original data block to obtain multiple original data blocks;
  • Erasure coding is performed on the plurality of original data blocks to obtain a plurality of verification data blocks.
  • The plurality of data blocks include the plurality of original data blocks and the plurality of verification data blocks.
  • The method further includes:
  • The data read request includes the writing node order of the data to be read.
  • The data to be read is generated according to the preset mapping relationship and all the target data blocks in response to the data read request.
  • The method further includes:
  • The multiple storage nodes are divided into normal nodes and abnormal nodes according to each target data version number, where the target data version numbers corresponding to all normal nodes are consistent, and the target data version number corresponding to each abnormal node is inconsistent with the target data version number corresponding to all normal nodes;
  • The data to be read is generated according to the preset mapping relationship and all the target data blocks in response to the data read request.
  • The step of generating the data to be read according to the preset mapping relationship and all the target data blocks includes:
  • The method further includes:
  • For each abnormal node, use the target data block corresponding to the abnormal node to overwrite the content of the target area of the abnormal node.
  • This application provides a data processing device applied to a data access terminal in a distributed storage system.
  • The distributed storage system also includes a plurality of storage nodes, and each storage node is provided with a number.
  • Each storage node is communicatively connected with the data access terminal, and the device includes:
  • a receiving module configured to receive a data write request, where the data write request includes data to be stored;
  • a processing module configured to process the data to be stored to obtain multiple data blocks, each of which is assigned a sequence number;
  • a sending module configured to determine, for each data block, a target node from the plurality of storage nodes according to the sequence number of the data block, and to send the data block together with the data version number of the data to be stored to the target node, so that the target node stores the data block and the data version number in the first space and the second space of the target node respectively, where the sequence number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space.
  • The present application provides a data access terminal, including a memory and a processor.
  • The memory stores a computer program.
  • When the processor executes the computer program, it implements the data processing method of any one of the preceding embodiments.
  • The present application provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the data processing method as described in any one of the preceding embodiments is implemented.
  • First, a data write request is received, the data write request including data to be stored; the data to be stored is then processed to obtain multiple data blocks, each of which is assigned a sequence number; then, for each data block, the target node is determined from multiple storage nodes according to the sequence number of the data block, and the data block is sent to the target node together with the data version number of the data to be stored, so that the target node stores the data block and the data version number in the first space and the second space of the target node respectively, where the sequence number of the data block and the number of the target node satisfy the preset mapping relationship, and the first space is located before the second space.
  • Since the embodiment of the present application sends each data block to the target node together with the data version number of the data to be stored, the target node stores the data block and the data version number in its first space and second space respectively, and the first space of the target node is located before the second space, the situation in which the data version numbers in the storage nodes are consistent although a data block failed to be stored is avoided, so that the data access end can promptly recover the failed data blocks and ensure data consistency.
  • Figure 1 is a schematic structural diagram of a distributed storage system provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of the distributed storage process provided by the embodiment of the present application.
  • Figure 3 is a schematic flow chart of the data processing method provided by the embodiment of the present application.
  • Figure 4 is a schematic diagram of the data write request response process provided by the embodiment of the present application.
  • FIG. 5 is a schematic flow chart of the implementation of step S102 provided by the embodiment of the present application.
  • Figure 6 is another schematic flow chart of the data processing method provided by the embodiment of the present application.
  • Figure 7 is a schematic diagram of the data read request response process provided by the embodiment of the present application.
  • Figure 8 is another schematic diagram of the data read request response process provided by the embodiment of the present application.
  • Figure 9 is an example of the data write request response process provided by the embodiment of the present application.
  • Figure 10 is an example of the data read request response process provided by the embodiment of the present application.
  • Figure 11 is another example of the data read request response process provided by the embodiment of the present application.
  • Figure 12 is a schematic structural block diagram of a data access terminal provided by an embodiment of the present application.
  • Figure 13 is a functional unit block diagram of the data processing device provided by the embodiment of the present application.
  • Reference numerals: 300 - data access terminal; 310 - memory; 320 - processor; 400 - data processing device; 401 - receiving module; 402 - processing module; 403 - sending module.
  • Current industry approaches include allocating location information to data for data reading and writing, but this approach causes data fragmentation, and random reading and writing of data leads to poor storage system performance; or changing the metadata of the data.
  • The disadvantage of this approach is that the copies occupy extra space, and checksum repair is time-consuming and consumes system performance; or, when reading and writing data, determining whether the data version numbers of the copies of the data are consistent and, if they are not, selecting the most complete copy for replacement.
  • This approach cannot accurately determine whether the data version numbers of the copies are consistent when a node is offline or a data abnormality occurs, which leads to data inconsistency; erasure coding technology and multi-copy technology can also be optimized to improve distributed data storage performance, but when the amount of data is large, algorithm optimization cannot bring a large performance improvement.
  • The existing distributed storage technology has not yet solved the problem of how to ensure data consistency while improving the read and write performance of the distributed storage system.
  • Embodiments of the present application provide a data processing method, which is introduced in detail below.
  • Figure 1 is a schematic structural diagram of a distributed storage system provided by an embodiment of the present application.
  • The distributed storage system includes a data access terminal and multiple storage nodes.
  • The data access terminal communicates with each storage node.
  • The data access end can interact with upper-layer applications or external hosts and receive data write requests sent by them. As shown in Figure 2, the data access end divides the data into n original data blocks and m verification data blocks, generates a data version number for each original data block or verification data block, and finally writes each original data block or verification data block and the corresponding data version number to the respective storage node at the same time. When a data read request is made, the data access end reads the data version numbers of the data blocks from the storage nodes for comparison. If the data version numbers of the data blocks are consistent, the data blocks are read in response to the data read request.
  • The data access end can be a server, a personal computer (hereinafter referred to as PC), a laptop, etc.
  • The data access end can also be one or more program modules on a device, or a virtual machine, container or client running on a device. It can also be a cluster composed of multiple devices; for example, it can be a collective name for multiple program modules distributed on multiple devices.
  • The storage node can store original data blocks and/or verification data blocks from the data access terminal.
  • The storage node can be a server, PC, laptop, etc.
  • The storage node can be a physical storage node or a logical storage node obtained by dividing the physical storage node.
  • Figure 3 shows a flow of the data processing method provided by the embodiment of the present application.
  • The data processing method includes steps S101 to S103, and the execution subject is the data access terminal in Figure 1.
  • The data write request includes data to be stored, and the data write request may be sent to the data access terminal by an upper-layer application or an external host.
  • S102 Process the data to be stored to obtain multiple data blocks.
  • After receiving the data write request, the data access end divides the data to be stored, of length L, into n original data blocks, and then generates m verification data blocks according to the erasure ratio. Each of the n original data blocks is assigned a sequence number in the range [1, n].
  • Each of the m check data blocks is also assigned a sequence number in the range [1, m] (see Figure 4).
  • For each data block, determine the target node from multiple storage nodes according to the sequence number of the data block, and send the data block and the data version number of the data to be stored to the target node, so that the target node stores the data block and the data version number in the first space and the second space of the target node respectively.
  • The sequence number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space.
  • Storage nodes are divided into data nodes and check nodes, which are used to store original data blocks and check data blocks respectively.
  • The total number of data nodes is n.
  • The total number of check nodes is m.
  • Each data node is set with a number in the range [1, n].
  • Each check node is also set with a number in the range [1, m].
  • The preset mapping relationship includes the correspondence between the sequence numbers of the n original data blocks and the numbers of the n data nodes, and the correspondence between the sequence numbers of the m check data blocks and the numbers of the m check nodes.
  • For each original data block or check data block, the target node is determined from the n data nodes or the m check nodes according to its sequence number and the preset mapping relationship, and then the original data block or check data block is sent to the target node together with the data version number of the data to be stored, which is generated based on the current timestamp.
  • For example, the target node of original data block 1 is data node 1, and the target node of verification data block m is verification node m (see Figure 4).
  • For each data node or check node, its disk space is divided into a first space and a second space, and the first space is located before the second space.
  • When the data block and the data version number are written, the data block must be placed on the disk before the data version number. This avoids the situation where the data version number is stored successfully but the data block fails to be stored, so that the data access end can promptly restore data blocks that failed to be stored and ensure data consistency.
  • The beneficial effect of the above method provided by the embodiment of the present application is that each data block is sent to the target node together with the data version number of the data to be stored, the target node stores the data block and the data version number in its first space and second space respectively, and the first space in the target node is located before the second space. This avoids the situation where the data version numbers in the storage nodes are consistent although a data block failed to be stored, so that the data access end can promptly restore data blocks that failed to be stored and ensure data consistency.
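  • The write ordering described above can be illustrated with a minimal sketch. It is not the application's implementation: the `StorageNode` class, the `slot` index, the `crash_before` switch and the identity mapping in `target_node` are all hypothetical names and assumptions introduced here, and the disk is modelled as in-memory dictionaries. The point it shows is that when the data block is always written before the data version number, an abnormal power-off leaves the version number missing rather than falsely up to date.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StorageNode:
    """Hypothetical in-memory stand-in for one storage node's disk."""
    number: int
    first_space: Dict[int, bytes] = field(default_factory=dict)   # slot -> data block
    second_space: Dict[int, int] = field(default_factory=dict)    # slot -> data version number

    def store(self, slot: int, block: bytes, version: int,
              crash_before: Optional[str] = None) -> None:
        # crash_before simulates an abnormal power-off at a chosen point.
        if crash_before == "block":
            raise IOError("power lost before the data block was written")
        self.first_space[slot] = block       # 1. the data block lands first
        if crash_before == "version":
            raise IOError("power lost before the version number was written")
        self.second_space[slot] = version    # 2. the version number lands last


def target_node(sequence_number: int, nodes: Dict[int, StorageNode]) -> StorageNode:
    # Assumed preset mapping: the block with sequence number i goes to node number i.
    return nodes[sequence_number]


nodes = {i: StorageNode(i) for i in (1, 2)}
blocks = {1: b"block-1", 2: b"block-2"}
version = 164961834

target_node(1, nodes).store(0, blocks[1], version)
try:
    target_node(2, nodes).store(0, blocks[2], version, crash_before="block")
except IOError:
    pass  # node 2 lost power; neither the block nor the version number was stored

# The failed node has no matching version number, so the mismatch is detectable.
versions = {n.number: n.second_space.get(0) for n in nodes.values()}
```

  • In this sketch a crash at either point leaves slot 0 of the second space empty on the affected node, so a later comparison of version numbers across nodes exposes the failed write.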
  • The embodiment of the present application also provides an implementation in which the data access terminal generates the data version number, which is introduced in detail below.
  • Case 1: If there is one data write request, use the current timestamp as the data version number of the data to be stored.
  • The system timestamp of the current distributed storage system can be used directly as the data version number of the data to be stored in the data write request.
  • Case 2: If there are multiple data write requests, perform multiple auto-increment operations on the current timestamp and, in order of the reception time of the data write requests, use the result of each auto-increment operation as the data version number of the data to be stored in one data write request.
  • The system timestamp of the current distributed storage system is used as the initial value for multiple auto-increment operations.
  • The number of auto-increment operations equals the total number of data write requests. In order of the reception time of the data write requests, the result of each auto-increment operation is used as the data version number of the data to be stored in one data write request, thereby obtaining the data version number of the data to be stored in each data write request.
  • The above data version number generation method provided by the embodiment of the present application enables unified management of data version numbers and ensures data consistency through the data version number.
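  • The two cases can be sketched as follows. The function name `assign_version_numbers`, its parameters, and the use of whole seconds from `time.time()` as the system timestamp are assumptions made for illustration; the application does not specify the timestamp resolution or an interface.

```python
import time
from typing import List, Optional


def assign_version_numbers(receive_times: List[float],
                           now: Optional[int] = None) -> List[int]:
    """Return one data version number per write request, in input order.

    receive_times holds the reception time of each pending write request;
    now is the system timestamp used as the starting value (assumed to be
    an integer; defaults to the current time in whole seconds).
    """
    base = int(time.time()) if now is None else now
    if len(receive_times) == 1:
        return [base]  # Case 1: the timestamp itself is the version number
    # Case 2: one auto-increment per request, earliest-received request first.
    order = sorted(range(len(receive_times)), key=lambda i: receive_times[i])
    versions = [0] * len(receive_times)
    for step, idx in enumerate(order, start=1):
        versions[idx] = base + step
    return versions
```

  • For three requests received at times 3.0, 1.0 and 2.0 with a starting timestamp of 100, the request received first gets 101, the next 102 and the last 103, so version numbers follow reception order.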
  • Step S102 is introduced in detail below.
  • Step S102 includes sub-steps S102-1 to S102-3.
  • S102-1 Divide the data to be stored into multiple data strips according to the preset length.
  • The length of each data strip is the preset length. As shown in Figure 4, the preset length is x, and the data to be stored, of length L, is divided into L/x data strips.
  • S102-2 Combine every preset number of data strips into an original data block to obtain multiple original data blocks.
  • The preset number is determined by the preset length, the number of data nodes, and the length of the data to be stored. As shown in Figure 4, the length of the data to be stored is L, the preset length is x, and the number of data nodes is n, so the preset number is L/(nx). Following the cutting order of the data strips, every L/(nx) data strips form one original data block, giving n original data blocks in total.
  • S102-3 Perform erasure coding on multiple original data blocks to obtain multiple verification data blocks.
  • Each verification data block includes L/(nx) data strips, and the length of each data strip is x.
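  • Sub-steps S102-1 to S102-3 can be sketched as below. The helper names are hypothetical, the sketch assumes L is an exact multiple of n·x, and a single XOR parity block is used as a stand-in for the erasure code with m = 1; the application does not name the erasure coding algorithm it uses.

```python
from functools import reduce
from operator import xor
from typing import List


def split_into_strips(data: bytes, x: int) -> List[bytes]:
    """S102-1: cut data of length L into L/x strips of length x."""
    if len(data) % x != 0:
        raise ValueError("data length L must be a multiple of the strip length x")
    return [data[i:i + x] for i in range(0, len(data), x)]


def group_into_blocks(strips: List[bytes], n: int) -> List[bytes]:
    """S102-2: every L/(n*x) consecutive strips form one original data block."""
    if not strips or len(strips) % n != 0:
        raise ValueError("strip count must be a positive multiple of n")
    per_block = len(strips) // n
    return [b"".join(strips[i:i + per_block])
            for i in range(0, len(strips), per_block)]


def xor_parity(blocks: List[bytes]) -> bytes:
    """S102-3 stand-in: one check block as the byte-wise XOR of all blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Sizes taken from the worked example in the text: 256K of data, 4K strips, n = 2.
data = bytes(i * i % 251 for i in range(256 * 1024))
strips = split_into_strips(data, 4 * 1024)
blocks = group_into_blocks(strips, 2)
check_block = xor_parity(blocks)
```

  • With these sizes the sketch produces 64 strips and 2 original data blocks of 32 strips each, which matches the figures quoted for the 2:1 example later in the description.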
  • Each storage node is allocated a fixed and contiguous first space and second space for the data block and the data version number of the data to be stored.
  • Reading and writing data in a data block thus changes from random reading and writing to sequential reading and writing, which improves the read and write performance of the distributed storage system.
  • FIG. 6 shows another flow of the data processing method provided by the embodiment of the present application.
  • The data processing method includes steps S201 to S207.
  • The data read request includes the order of writing nodes of the data to be read.
  • The data to be read includes the original data blocks that the data access end stored in multiple data nodes by processing a data write request.
  • the order of writing nodes refers to the original data.
  • Suppose the data access terminal processed data write request 1, data write request 2, ..., data write request k in order of reception time. The first space of data node 1 then holds, written in sequence, original data block 1 corresponding to data write request 1, original data block 1 corresponding to data write request 2, ..., and original data block 1 corresponding to data write request k. The second space of data node 1 holds, written in sequence, the data version number of the data to be stored in data write request 1, the data version number of the data to be stored in data write request 2, ..., and the data version number of the data to be stored in data write request k. Likewise, the first space of data node n holds, written in sequence, original data block n corresponding to data write request 1, original data block n corresponding to data write request 2, ..., and original data block n corresponding to data write request k, and the second space of data node n holds, written in sequence, the data version numbers of the data to be stored in data write request 1, data write request 2, ..., and data write request k. If the writing node order of the data to be read is 2, the data to be read is composed of the second original data block in each of data node 1 to data node n.
  • S201 Read the target data version number from the second space of each storage node according to the order of writing nodes.
  • Multiple data version numbers are stored in the second space of each storage node, and each data version number occupies the same amount of space.
  • The target area in the second space is determined according to the order of writing nodes and the space occupied by a data version number, and the content read from the target area is used as the target data version number.
  • For example, the target area in the second space of each storage node is calculated to be the 24th B to the 32nd B, and the content read from the target area in the second space of each storage node is used as the target data version number corresponding to that storage node.
  • If the target data version numbers read from the second space of each storage node are consistent, the data blocks used to form the data to be read were stored successfully at every storage node.
  • Multiple data blocks are stored in the first space of each storage node, and each data block occupies the same amount of space.
  • The target area in the first space is determined according to the order of writing nodes and the space occupied by a data block. The content read from the target area is used as the target data block, and all target data blocks are then combined to obtain the data to be read.
  • For example, the target area within the first space of each storage node is calculated to be the 384Kth to the 512Kth, and the content read from the target area in the first space of each storage node is used as the target data block corresponding to that storage node.
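  • Because every entry in a space has the same size, the target area follows from simple arithmetic. The sketch below is an illustration only: the function name is hypothetical, and the inputs (a 1-based writing node order of 4, a version number of 8 B and a data block of 128K) are assumptions chosen because they reproduce the two ranges quoted above; the text does not state them in this passage.

```python
from typing import Tuple


def target_area(write_order: int, item_size: int) -> Tuple[int, int]:
    """Return (start, end) byte offsets of the target area, end exclusive.

    write_order is the 1-based position of the write within the space;
    item_size is the fixed size of one entry (version number or data block).
    """
    if write_order < 1 or item_size < 1:
        raise ValueError("write_order and item_size must be positive")
    start = (write_order - 1) * item_size
    return start, start + item_size


K = 1024
version_area = target_area(4, 8)        # area inside the second space
block_area = target_area(4, 128 * K)    # area inside the first space
```

  • Under these assumed inputs `version_area` spans bytes 24 to 32 and `block_area` spans 384K to 512K. Since the offsets depend only on the write order, no per-block location metadata needs to be looked up before reading.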
  • S205 Read the target data block from the first space of each normal node according to the order of writing nodes.
  • As in step S203, multiple data blocks are stored in the first space of each normal node, and each data block occupies the same amount of space.
  • The target area in the first space is determined according to the order of writing nodes and the space occupied by a data block, and the content read from the target area is used as the target data block.
  • the target data blocks corresponding to all normal nodes are restored.
  • S207 Generate data to be read according to the preset mapping relationship and all target data blocks to respond to the data read request.
  • The target data blocks read from the first space of all data nodes can be used to generate the data to be read.
  • The target data blocks of all data nodes (which may be normal nodes or abnormal nodes) can be used to generate the data to be read.
  • Step S207 is as follows:
  • The target data block corresponding to the abnormal node is used to overwrite the content of the target area of the abnormal node.
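  • The division into normal and abnormal nodes used in the read flow can be sketched as follows. The text only states that all normal nodes share one target data version number and that each abnormal node differs from it; taking the most common version number as the reference is an assumption made here, as are the function name and the use of `None` for a node whose version number could not be read.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def classify_nodes(
    target_versions: Dict[int, Optional[int]]
) -> Tuple[List[int], List[int]]:
    """Split node numbers into (normal, abnormal) by target data version number.

    target_versions maps a node number to the version number read from its
    second space, or None if nothing could be read.
    """
    counts = Counter(v for v in target_versions.values() if v is not None)
    if not counts:
        raise ValueError("no node returned a version number")
    # Assumption: the version number held by the most nodes is the reference.
    reference, _ = counts.most_common(1)[0]
    normal = sorted(n for n, v in target_versions.items() if v == reference)
    abnormal = sorted(n for n, v in target_versions.items() if v != reference)
    return normal, abnormal
```

  • For example, `classify_nodes({1: 7, 2: 6, 3: 7})` treats nodes 1 and 3 as normal and node 2 as abnormal; the target data block of node 2 would then be restored from the normal nodes and written back to its target area.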
  • For illustration, the embodiment of this application assumes that the number of storage nodes in the distributed storage system is 3 (2 data nodes and 1 check node) and that the erasure ratio is 2:1.
  • The data access end receives the data write request sent by the upper-layer application or external host, and splits the data to be stored in the data write request, with a total length of 256K, into 2 original data blocks and 1 check data block according to the erasure ratio. Each original data block or check data block includes 32 data strips, and the length of each data strip is 4K.
  • The data version number of the data to be stored is 164961834.
  • The original data block with sequence number 1 and the data version number are written into the first space and the second space of the data node numbered 1 respectively, the original data block with sequence number 2 and the data version number are written into the first space and the second space of the data node numbered 2 respectively, and the check data block with sequence number 1 and the data version number are written into the first space and the second space of the check node numbered 1 respectively, in response to the data write request.
  • Each data node consists of multiple first spaces of 64MB and second spaces of 128KB. Multiple data write requests store multiple original data blocks and the corresponding data version numbers contiguously on the disk of the data node. Similarly, each check node also consists of multiple first spaces of 64MB and second spaces of 128KB, and multiple data write requests store multiple check data blocks and the corresponding data version numbers contiguously on the disk of the check node.
  • The data access end receives a data read request sent by an upper-layer application or an external host. According to the order of writing nodes of the data to be read in the data read request, it reads the target data version numbers from the second spaces of the 2 data nodes and the 1 check node and compares them.
  • According to the order of writing nodes of the data to be read, the target data blocks are read from the first spaces of the 2 data nodes, and all the target data blocks are then combined according to the preset mapping relationship and the numbers of the data nodes to obtain the data to be read, in response to the data read request.
  • The data access end first reads the target data blocks from the first spaces of data node 1 and check node 1 according to the order of writing nodes of the data to be read, and recovers the target data block corresponding to data node 2 through erasure calculation. After completing the recovery, the data access end combines the target data blocks corresponding to data node 1 and data node 2 into the data to be read according to the preset mapping relationship and the numbers of the data nodes, to respond to the data read request.
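  • The recovery path of this 2:1 example can be sketched with toy data. XOR parity is used here as an assumed stand-in for the erasure calculation, which the text does not specify, and the four-byte blocks and variable names are illustrative only.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(p ^ q for p, q in zip(a, b))


# Write path: two original data blocks and one check block (assumed XOR parity).
block_1 = b"\x01\x02\x03\x04"          # stored on data node 1
block_2 = b"\x10\x20\x30\x40"          # stored on data node 2
check_1 = xor_bytes(block_1, block_2)  # stored on check node 1

# Read path when data node 2 is abnormal: only data node 1 and check node 1
# are read, and the missing block is rebuilt from them.
recovered_2 = xor_bytes(block_1, check_1)

# Blocks are combined in node-number order to form the data to be read.
data_to_read = block_1 + recovered_2
```

  • After recovery, `recovered_2` would also be written back to the target area of data node 2, as described for abnormal nodes earlier in the description.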
  • the data access terminal 300 may include a memory 310 and a processor 320.
  • The processor 320 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling program execution of the data processing method provided by the above method embodiments.
  • The memory 310 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium capable of carrying or storing desired program code in the form of instructions or data structures and accessible by a computer, without limitation.
  • The memory 310 may exist independently and be connected to the processor 320 through a communication bus. The memory 310 may also be integrated with the processor 320. The memory 310 is used to store machine-executable instructions for executing the solution of the present application. The processor 320 is configured to execute the machine-executable instructions stored in the memory 310 to implement the above method embodiments.
  • Embodiments of the present application also provide a computer-readable storage medium containing a computer program.
  • When executed, the computer program can be used to perform the relevant operations in the data processing method provided by the above method embodiments.
  • FIG. 13 is a functional unit block diagram of the data processing device 400 provided by an embodiment of the present application.
  • the data processing device 400 is applied to the data access terminal 300 and may include a receiving module 401, a processing module 402, and a sending module 403.
  • the receiving module 401, the processing module 402, and the sending module 403 can all be stored in the memory or computer-readable storage medium in the form of software.
  • The basic principles and technical effects of the data processing device 400 provided in the embodiments of the present application are the same as those of the above embodiments. For brevity, they are not repeated here.
  • the receiving module 401 is used to receive a data write request, where the data write request includes data to be stored.
  • the processing module 402 is used to process the data to be stored to obtain multiple data blocks, each of which is assigned a sequence number.
  • The sending module 403 is used to determine, for each data block, the target node from multiple storage nodes according to the sequence number of the data block, and to send the data block and the data version number of the data to be stored to the target node, so that the target node stores the data block and the data version number in the first space and the second space of the target node, respectively, where the sequence number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space.
  • The processing module 402 is also configured to use the current timestamp as the data version number of the data to be stored if there is one data write request; and, if there are multiple data write requests, to perform multiple auto-increment operations on the current timestamp and, in order of the reception time of each data write request, use the result of each auto-increment operation as the data version number of the data to be stored in one data write request.
  • The processing module 402 is specifically configured to divide the data to be stored into multiple data strips according to a preset length; form every preset number of data strips into an original data block to obtain multiple original data blocks; and perform erasure coding on the multiple original data blocks to obtain multiple verification data blocks.
  • the multiple data blocks include multiple original data blocks and multiple verification data blocks.
  • The receiving module 401 is also configured to receive a data read request, which includes the writing node order of the data to be read; the processing module 402 is also configured to read the target data version number from the second space of each storage node according to the writing node order; if all target data version numbers are consistent, read the target data block from the first space of each storage node according to the writing node order; and generate the data to be read according to the preset mapping relationship and all the target data blocks, in response to the data read request.
  • The processing module 402 is also configured to, if there are inconsistent target data version numbers, divide the multiple storage nodes into normal nodes and abnormal nodes according to each target data version number, where the target data version numbers corresponding to all normal nodes are consistent, and the target data version number corresponding to each abnormal node is inconsistent with the target data version number corresponding to the normal nodes; read the target data block from the first space of each normal node according to the writing node order; recover the target data block corresponding to each abnormal node from the target data blocks corresponding to the normal nodes; and generate the data to be read according to the preset mapping relationship and all the target data blocks, in response to the data read request.
  • The processing module 402 is specifically configured to assign a sequence number to each target data block according to the preset mapping relationship and the number of each storage node, and to sort all target data blocks according to the sequence number of each target data block to obtain the data to be read.
  • The processing module 402 is also used to determine the target area in the first space of each abnormal node according to the writing node order and the preset size, and, for each abnormal node, to overwrite the contents of the target area of the abnormal node with the target data block corresponding to that abnormal node.
  • Embodiments of the present application provide a data processing method, device, data access terminal and storage medium.
  • First, a data write request is received, the data write request including data to be stored; then, the data to be stored is processed to obtain multiple data blocks, each of which is assigned a sequence number; next, for each data block, the target node is determined from multiple storage nodes according to the sequence number of the data block, and the data block and the data version number of the data to be stored are sent to the target node, so that the target node stores the data block and the data version number in the first space and the second space of the target node, respectively, where the sequence number of the data block and the number of the target node satisfy the preset mapping relationship, and the first space is located before the second space.
  • Because the embodiments of the present application send each data block to the target node together with the data version number of the data to be stored, so that the target node stores the data block and the data version number in its first space and second space, respectively, and the first space of the target node is located before the second space, the situation in which the data version numbers in all storage nodes are consistent while a data block has failed to be stored is avoided, so that the data access end can promptly recover the data block that failed to be stored and ensure data consistency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide a data processing method and device, a data access end, and a storage medium, relating to the field of distributed storage. The method comprises: first receiving a data write request (S101), the data write request comprising data to be stored; then processing said data to obtain a plurality of data blocks (S102), each data block being assigned a sequence number; and next, for each data block, determining a target node from a plurality of storage nodes according to the sequence number of the data block, and sending the data block and a data version number of said data to the target node, such that the target node stores the data block and the data version number into a first space and a second space of the target node, respectively (S103), wherein the sequence number of each data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space. This avoids the situation in which the data version numbers in the storage nodes are consistent but the storage of a data block has failed, so that the data access end can promptly recover the data block that failed to be stored, ensuring data consistency.

Description

Data processing method, device, data access end, and storage medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202210692395.0, entitled "Data processing method, device, data access end, and storage medium", filed with the China National Intellectual Property Administration on June 17, 2022, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of distributed storage, and in particular to a data processing method, device, data access end, and storage medium.
Background
In the era of big data, with the explosive growth of massive data, distributed storage is increasingly widely used. In distributed storage, the data access end divides the data to be stored into multiple data strips, encodes the data strips with an erasure coding algorithm to obtain redundant verification data, and then stores the data strips and the verification data on multiple storage nodes. When some data fails to be stored, it can be recovered from the data that was stored successfully.
In existing distributed storage technology, a bitmap is generally used to record the storage locations of data strips, and data strips are read and written by looking up their storage locations. The bitmap is usually recorded in metadata and stored to each storage node together with the data strips. When any storage node loses power abnormally, the metadata may be stored successfully at that node while the data strip fails to be stored. In this case, because the data version number in the metadata of that node is consistent with that of the other nodes, the data access end will not recover the data strip that failed to be stored at that node, and data consistency cannot be guaranteed.
Summary of the invention
To overcome the deficiencies of the prior art, embodiments of the present application provide a data processing method, device, data access end, and storage medium, which can avoid the situation in which the data version numbers in all storage nodes are consistent while a data strip has failed to be stored, so that the data access end can promptly recover the data strip that failed to be stored, thereby ensuring data consistency.
The embodiments of the present application can be implemented as follows:
In a first aspect, the present application provides a data processing method applied to a data access end in a distributed storage system. The distributed storage system further includes multiple storage nodes, each storage node is assigned a number and is communicatively connected to the data access end, and the method includes:
receiving a data write request, the data write request including data to be stored;
processing the data to be stored to obtain multiple data blocks, each data block being assigned a sequence number;
for each data block, determining a target node from the multiple storage nodes according to the sequence number of the data block, and sending the data block and the data version number of the data to be stored to the target node, so that the target node stores the data block and the data version number in the first space and the second space of the target node, respectively, where the sequence number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space.
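The write path of the first aspect can be illustrated with a minimal in-memory sketch. It is not the claimed implementation: the `StorageNode` class, the `target_node_number` helper, and the identity mapping between a block's sequence number and a node's number are all assumptions made for illustration, and the two "spaces" are modeled as plain Python lists rather than disk regions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StorageNode:
    """Hypothetical storage node with a first space (blocks) and a second space (versions)."""
    number: int
    first_space: List[bytes] = field(default_factory=list)
    second_space: List[int] = field(default_factory=list)

    def store(self, block: bytes, version: int) -> None:
        # The block goes to the first space before the version number goes to
        # the second space, mirroring "the first space is located before the
        # second space".
        self.first_space.append(block)
        self.second_space.append(version)


def target_node_number(sequence_number: int) -> int:
    """Assumed preset mapping: block k is stored on node k."""
    return sequence_number


def write_blocks(blocks: Dict[int, bytes], version: int,
                 nodes: Dict[int, StorageNode]) -> List[int]:
    """Send each block and the shared version number to its target node.

    Returns the node numbers in the order they were written.
    """
    write_order = []
    for sequence_number in sorted(blocks):
        node = nodes[target_node_number(sequence_number)]
        node.store(blocks[sequence_number], version)
        write_order.append(node.number)
    return write_order
```

In this sketch, every block of one write request carries the same version number, so a later read can compare the second spaces of all nodes to detect a node whose write did not complete.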
In an optional implementation, before the step of processing the data to be stored to obtain multiple data blocks, the method further includes:
if there is one data write request, using the current timestamp as the data version number of the data to be stored;
if there are multiple data write requests, performing multiple auto-increment operations on the current timestamp and, in order of the reception time of each data write request, using the result of each auto-increment operation as the data version number of the data to be stored in one of the data write requests.
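The version-number rule above can be sketched as follows. The function name `assign_version_numbers`, the nanosecond time unit, and the increment step of 1 are assumptions; the text specifies only that the current timestamp is used for a single request and that successive auto-increment results are used, in order of reception, for multiple requests.

```python
import time
from typing import List, Optional


def assign_version_numbers(request_count: int,
                           now: Optional[int] = None) -> List[int]:
    """Return one data version number per write request, in reception order.

    `now` lets a caller inject the current timestamp (handy for testing);
    otherwise the wall clock in nanoseconds is used (an assumed unit).
    """
    if request_count < 1:
        raise ValueError("at least one data write request is required")
    timestamp = time.time_ns() if now is None else now
    if request_count == 1:
        # A single request uses the current timestamp directly.
        return [timestamp]
    # Multiple requests: the result of the i-th auto-increment is the version
    # number of the i-th request received.
    versions = []
    for _ in range(request_count):
        timestamp += 1
        versions.append(timestamp)
    return versions
```

Under these assumptions, requests batched at the same instant still receive distinct, strictly increasing version numbers.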
In an optional implementation, the step of processing the data to be stored to obtain multiple data blocks includes:
dividing the data to be stored into multiple data strips according to a preset length;
forming every preset number of data strips into an original data block to obtain multiple original data blocks;
performing erasure coding on the multiple original data blocks to obtain multiple verification data blocks, the multiple data blocks including the multiple original data blocks and the multiple verification data blocks.
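The three steps above can be sketched as below. The text does not name a specific erasure code, so this example substitutes a single XOR parity block as a simple stand-in (it tolerates the loss of one block); a production system would more likely use a code such as Reed-Solomon. The function names and the zero-byte padding of the last strip are likewise assumptions.

```python
from typing import List


def split_into_strips(data: bytes, strip_length: int) -> List[bytes]:
    """Cut the data into strips of a preset length, zero-padding the last one."""
    strips = [data[i:i + strip_length] for i in range(0, len(data), strip_length)]
    if strips and len(strips[-1]) < strip_length:
        strips[-1] = strips[-1].ljust(strip_length, b"\x00")
    return strips


def group_strips(strips: List[bytes], strips_per_block: int) -> List[bytes]:
    """Join every `strips_per_block` consecutive strips into one original data block."""
    return [b"".join(strips[i:i + strips_per_block])
            for i in range(0, len(strips), strips_per_block)]


def xor_parity(blocks: List[bytes]) -> bytes:
    """Compute one verification block as the byte-wise XOR of all blocks."""
    parity = bytearray(max(len(block) for block in blocks))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)
```

With equal-length blocks, XOR-ing the parity block with all surviving original blocks reproduces the missing one, which is the property the read path relies on when a node is found to be abnormal.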
In an optional implementation, the method further includes:
receiving a data read request, the data read request including the writing node order of the data to be read;
reading the target data version number from the second space of each storage node according to the writing node order;
if all the target data version numbers are consistent, reading the target data block from the first space of each storage node according to the writing node order;
generating the data to be read according to the preset mapping relationship and all the target data blocks, in response to the data read request.
In an optional implementation, the method further includes:
if there are inconsistent target data version numbers, dividing the multiple storage nodes into normal nodes and abnormal nodes according to each target data version number, where the target data version numbers corresponding to all the normal nodes are consistent, and the target data version number corresponding to each abnormal node is inconsistent with the target data version number corresponding to the normal nodes;
reading the target data block from the first space of each normal node according to the writing node order;
recovering the target data block corresponding to each abnormal node from the target data block corresponding to each normal node;
generating the data to be read according to the preset mapping relationship and all the target data blocks, in response to the data read request.
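The two read paths above (all version numbers consistent, or some inconsistent) can be sketched together. Several parts are assumptions rather than details from the text: the node labels such as "D1" and "P1", the rule that the version number held by the most nodes identifies the normal nodes, and the use of single XOR parity as a stand-in for the erasure calculation, which limits this sketch to one abnormal node.

```python
from collections import Counter
from typing import Dict, List, Tuple


def partition_nodes(versions: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Split nodes into (normal, abnormal) by comparing target data version numbers.

    Assumption: the version number held by the most nodes is the correct one.
    """
    majority_version, _ = Counter(versions.values()).most_common(1)[0]
    normal = [node for node, v in versions.items() if v == majority_version]
    abnormal = [node for node, v in versions.items() if v != majority_version]
    return normal, abnormal


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (stand-in for erasure decoding)."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def read_target_blocks(versions: Dict[str, int],
                       first_space: Dict[str, bytes]) -> Dict[str, bytes]:
    """Read blocks from normal nodes and rebuild the block of an abnormal node."""
    normal, abnormal = partition_nodes(versions)
    if len(abnormal) > 1:
        raise ValueError("single XOR parity can rebuild at most one block")
    blocks = {node: first_space[node] for node in normal}
    for node in abnormal:
        # The stale content of the abnormal node is never read; its block is
        # recomputed from the blocks of the normal nodes.
        blocks[node] = xor_blocks(list(blocks.values()))
    return blocks
```

The returned dictionary still contains the verification block; only the blocks of data nodes would be combined into the data to be read.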
In an optional implementation, the step of generating the data to be read according to the preset mapping relationship and all the target data blocks includes:
assigning a sequence number to each target data block according to the preset mapping relationship and the number of each storage node;
sorting all the target data blocks according to the sequence number of each target data block to obtain the data to be read.
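The reassembly step can be sketched as follows. The `assemble` function and its default identity mapping from node number to sequence number are assumptions; the text only requires that the mapping be the same preset relationship used when the blocks were written.

```python
from typing import Callable, Dict


def assemble(blocks_by_node: Dict[int, bytes],
             node_to_sequence: Callable[[int], int] = lambda number: number) -> bytes:
    """Order target data blocks by their sequence numbers and concatenate them.

    `blocks_by_node` maps a data node's number to the block read from it;
    `node_to_sequence` is the assumed inverse of the preset mapping.
    """
    numbered = sorted(
        (node_to_sequence(node_number), block)
        for node_number, block in blocks_by_node.items()
    )
    return b"".join(block for _, block in numbered)
```

For example, `assemble({2: b"world", 1: b"hello "})` yields `b"hello world"` regardless of the order in which the nodes responded.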
In an optional implementation, the method further includes:
determining the target area in the first space of each abnormal node according to the writing node order and a preset size;
for each abnormal node, overwriting the contents of the target area of the abnormal node with the target data block corresponding to that abnormal node.
In a second aspect, the present application provides a data processing device applied to a data access end in a distributed storage system. The distributed storage system further includes multiple storage nodes, each storage node is assigned a number and is communicatively connected to the data access end, and the device includes:
a receiving module, configured to receive a data write request, the data write request including data to be stored;
a processing module, configured to process the data to be stored to obtain multiple data blocks, each data block being assigned a sequence number;
a sending module, configured to, for each data block, determine a target node from the multiple storage nodes according to the sequence number of the data block, and send the data block and the data version number of the data to be stored to the target node, so that the target node stores the data block and the data version number in the first space and the second space of the target node, respectively, where the sequence number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space.
In a third aspect, the present application provides a data access end including a memory and a processor. The memory stores a computer program, and when the processor executes the computer program, the data processing method according to any one of the preceding implementations is implemented.
In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the data processing method according to any one of the preceding implementations is implemented.
Compared with the prior art, in the data processing method, device, data access end, and storage medium provided by the embodiments of the present application, a data write request including data to be stored is first received; the data to be stored is then processed to obtain multiple data blocks, each of which is assigned a sequence number; next, for each data block, a target node is determined from multiple storage nodes according to the sequence number of the data block, and the data block and the data version number of the data to be stored are sent to the target node, so that the target node stores the data block and the data version number in the first space and the second space of the target node, respectively, where the sequence number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space. Because each data block is sent to the target node together with the data version number of the data to be stored, so that the target node stores the data block and the data version number in its first space and second space, respectively, and the first space of the target node is located before the second space, the situation in which the data version numbers in all storage nodes are consistent while a data block has failed to be stored is avoided, so that the data access end can promptly recover the data block that failed to be stored and ensure data consistency.
Description of the drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and therefore should not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Figure 1 is a schematic structural diagram of a distributed storage system provided by an embodiment of the present application;
Figure 2 is a schematic diagram of a distributed storage process provided by an embodiment of the present application;
Figure 3 is a schematic flow chart of a data processing method provided by an embodiment of the present application;
Figure 4 is a schematic diagram of a data write request response process provided by an embodiment of the present application;
Figure 5 is a schematic flow chart of an implementation of step S102 provided by an embodiment of the present application;
Figure 6 is another schematic flow chart of the data processing method provided by an embodiment of the present application;
Figure 7 is a schematic diagram of a data read request response process provided by an embodiment of the present application;
Figure 8 is another schematic diagram of the data read request response process provided by an embodiment of the present application;
Figure 9 is an example of the data write request response process provided by an embodiment of the present application;
Figure 10 is an example of the data read request response process provided by an embodiment of the present application;
Figure 11 is another example of the data read request response process provided by an embodiment of the present application;
Figure 12 is a schematic structural block diagram of a data access end provided by an embodiment of the present application;
Figure 13 is a functional unit block diagram of a data processing device provided by an embodiment of the present application.
Reference numerals: 300 - data access end; 310 - memory; 320 - processor; 400 - data processing device; 401 - receiving module; 402 - processing module; 403 - sending module.
Detailed description
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. The components of the embodiments of the present application generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations.
Accordingly, the following detailed description of the embodiments of the present application provided in the drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.
In addition, the terms "first", "second", and the like, if present, are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
It should be noted that the features in the embodiments of the present application can be combined with each other provided there is no conflict.
In the era of big data, with the explosive growth of massive data, distributed storage is increasingly widely used. Distributed storage requires the storage system to offer high availability and high performance. However, when traditional file systems are applied to distributed storage, most of them suffer from poor read and write performance and low reliability, and a large number of nodes also makes data consistency much harder to guarantee.
Current industry approaches include the following. One approach allocates location information to data for reading and writing, but this fragments the data, and random reads and writes degrade storage system performance. Another stores the metadata of the data on multiple storage nodes with replicas, and verifies and repairs the metadata to ensure data consistency; its drawbacks are that the replicas occupy extra space and that verification and repair are time-consuming and consume system performance. A third checks, when reading or writing data, whether the data version numbers of all replicas are consistent and, if not, selects the most complete replica as the replacement; however, when a node is offline or its data is abnormal, this approach cannot accurately determine whether the version numbers of the replicas are consistent, which leads to data inconsistency. A fourth optimizes erasure coding and multi-replica techniques to improve distributed storage performance, but when the amount of data is large, algorithmic optimization yields only limited performance savings.
In other words, existing distributed storage technology has not yet solved the problem of improving the read and write performance of a distributed storage system while ensuring data consistency.
In view of this, embodiments of the present application provide a data processing method, which is described in detail below.
Please refer to Figure 1, which is a schematic structural diagram of the distributed storage system provided by an embodiment of the present application. The distributed storage system includes a data access end and multiple storage nodes. The data access end is communicatively connected to each storage node.
The data access end can interact with an upper-layer application or an external host and receive data write requests sent by them. As shown in Figure 2, the data access end divides the data into n original data blocks and m verification data blocks according to an erasure coding algorithm, generates a data version number for each original data block or verification data block, and then writes each original data block or verification data block, together with the corresponding data version number, to the storage nodes. When a data read request is made, the data access end reads the data version numbers of the data blocks from the storage nodes and compares them. If the data version numbers are consistent, the data blocks are read in response to the data read request. If the data version numbers are inconsistent, the data access end first recovers the data from the original data blocks and verification data blocks through erasure calculation and then reads it. The data access end may be a server, a personal computer (PC), a laptop, or the like; it may also be one or more program modules on a device, or a virtual machine or container running on a device; it may also be a cluster composed of multiple devices, for example a collective name for multiple program modules distributed across multiple devices.
A storage node can store original data blocks and/or verification data blocks from the data access end. A storage node may be a server, a PC, a laptop, or the like. A storage node may be a physical storage node or a logical storage node obtained by partitioning a physical storage node.
Please refer to Figure 3, which shows a flow of the data processing method provided by an embodiment of the present application. The data processing method includes steps S101 to S103 and is executed by the data access end in Figure 1.
S101, receive a data write request.
The data write request includes the data to be stored, and may be sent to the data access end by an upper-layer application or an external host.
S102, process the data to be stored to obtain multiple data blocks.
After receiving the data write request, the data access end divides the data to be stored, of length L, into n original data blocks and then generates m check data blocks according to the erasure ratio. Each of the n original data blocks is assigned a sequence number in the range [1, n], and each of the m check data blocks is assigned a sequence number in the range [1, m] (see Figure 4).
S103, for each data block, determine a target node from the multiple storage nodes according to the sequence number of the data block, and send the data block and the data version number of the data to be stored to the target node, so that the target node stores the data block and the data version number in the first space and the second space of the target node, respectively.
The sequence number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space. As shown in Figure 4, the storage nodes are divided into data nodes and check nodes, which store original data blocks and check data blocks respectively. There are n data nodes and m check nodes. Each data node is assigned a number in the range [1, n], and each check node is assigned a number in the range [1, m].
The preset mapping relationship includes the correspondence between the sequence numbers of the n original data blocks and the numbers of the n data nodes, and the correspondence between the sequence numbers of the m check data blocks and the numbers of the m check nodes. For each original data block or check data block, the target node is determined from the n data nodes or the m check nodes according to the block's sequence number and the preset mapping relationship. The block is then sent to the target node together with the data version number of the data to be stored, which is generated from the current timestamp. For example, the target node of original data block 1 is data node 1, and the target node of check data block m is check node m (see Figure 4).
The disk space of every data node or check node is divided into a first space and a second space, with the first space located before the second space. When a data block and a data version number are written, the data block must reach the disk before the data version number. This avoids the situation in which the data version number is stored successfully but the data block is not, so the data access end can promptly recover a data block whose storage failed and keep the data consistent.
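The dispatch described in step S103 can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation: the `StorageNode` class, the `dispatch` function, and the identity mapping from sequence number to node number are assumptions made for the example (the text only says the mapping is preset, and gives block 1 to node 1 as an example).

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical in-memory stand-in for a data node or a check node."""
    number: int
    first_space: list = field(default_factory=list)   # holds data blocks
    second_space: list = field(default_factory=list)  # holds data version numbers

    def store(self, block: bytes, version: int) -> None:
        # The block is recorded first; the version number is recorded only
        # afterwards, mirroring "the data block must reach the disk first".
        self.first_space.append(block)
        self.second_space.append(version)


def dispatch(blocks, nodes, version):
    """Send the block with sequence number i to the node numbered i.

    The identity mapping is an assumption for illustration only.
    """
    by_number = {node.number: node for node in nodes}
    for seq, block in enumerate(blocks, start=1):
        by_number[seq].store(block, version)
```

The same function would be called once for the original data blocks against the data nodes and once for the check data blocks against the check nodes, since the two groups are numbered independently.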
The beneficial effect of the above method is as follows. Each data block is sent to the target node together with the data version number of the data to be stored, the target node stores the data block and the data version number in its first space and second space respectively, and the first space is located before the second space. This avoids the situation in which the data version numbers in the storage nodes are consistent although a data block failed to be stored, so the data access end can promptly recover the data block whose storage failed and keep the data consistent.
In existing methods, the data version number is generated by each storage node. When a storage node loses power abnormally, besides the case in which the data version number is stored successfully but the data block is not, the data version number may also be stored incorrectly while the data block is stored successfully. In that case the data access end recovers data blocks that were in fact stored successfully, which wastes system performance. For this reason, the embodiment of the present application also provides an implementation in which the data access end generates the data version number before step S102 is executed, as described in detail below.
In this embodiment of the present application, the data access end may encounter the following two cases when generating the data version number of the data to be stored.
Case 1: if there is one data write request, the current timestamp is used as the data version number of the data to be stored.
When the data access end receives a single data write request, the current system timestamp of the distributed storage system can be used directly as the data version number of the data to be stored in that request.
Case 2: if there are multiple data write requests, multiple auto-increment operations are performed on the current timestamp, and, in the order in which the data write requests were received, the result of each auto-increment operation is used as the data version number of the data to be stored in one data write request.
When the data access end receives multiple data write requests, it takes the current system timestamp of the distributed storage system as the initial value and performs auto-increment operations on it. The number of auto-increment operations equals the total number of data write requests. In the order in which the data write requests were received, the result of each auto-increment operation is used as the data version number of the data to be stored in one data write request, so that every data write request obtains a data version number.
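The two cases can be sketched as a small helper. The function name `assign_versions`, the use of whole seconds from `time.time()`, and the increment step of 1 are assumptions for illustration. The text specifies only that the timestamp is the initial value and that one auto-increment is performed per request.

```python
import time


def assign_versions(request_count, now=None):
    """Return one data version number per write request, in order of receipt.

    `now` lets a caller inject the system timestamp; by default the current
    time in whole seconds is used (an assumption, the unit is not specified).
    """
    base = int(time.time()) if now is None else now
    if request_count == 1:
        # Case 1: a single request uses the timestamp itself.
        return [base]
    # Case 2: one auto-increment per request, starting from the timestamp.
    return [base + i for i in range(1, request_count + 1)]
```

For example, with an injected timestamp of 100, one request receives 100, while three requests received together receive 101, 102 and 103.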
This way of generating data version numbers allows them to be managed centrally, and the data version number is then used to keep the data consistent.
Step S102 is described in detail below.
Please refer to Figure 5, which shows a flow of an implementation of step S102 provided by an embodiment of the present application. Step S102 includes sub-steps S102-1 to S102-3.
S102-1, divide the data to be stored into multiple data strips according to a preset length.
Every data strip has the preset length. As shown in Figure 4, with a preset length of x, data to be stored of length L is divided into L/x data strips.
S102-2, group every preset number of data strips into one original data block to obtain multiple original data blocks.
The preset number is determined by the preset length, the number of data nodes, and the length of the data to be stored. As shown in Figure 4, if the length of the data to be stored is L, the preset length is x, and the number of data nodes is n, then the preset number is L/(nx). That is, following the order in which the data strips were cut, every L/(nx) data strips form one original data block, giving n original data blocks in total.
S102-3, perform erasure coding on the multiple original data blocks to obtain multiple check data blocks.
As shown in Figure 4, an erasure operation is performed on the n original data blocks to obtain m check data blocks. Each check data block likewise includes L/(nx) data strips, and each data strip has length x.
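Sub-steps S102-1 to S102-3 can be sketched as follows. The text does not name a specific erasure code, so this example substitutes a single XOR parity block (m = 1) purely as a stand-in; a real deployment with m > 1 would need a code such as Reed-Solomon. The function names and the requirement that L be a multiple of n·x are assumptions for the example.

```python
def split_into_blocks(data: bytes, strip_len: int, n: int):
    """Cut data into strips of strip_len and group them into n original blocks."""
    if len(data) % (n * strip_len) != 0:
        raise ValueError("length must be a multiple of n * strip_len in this sketch")
    # S102-1: L / x strips, in cutting order.
    strips = [data[i:i + strip_len] for i in range(0, len(data), strip_len)]
    # S102-2: every L / (n * x) consecutive strips form one original block.
    per_block = len(data) // (n * strip_len)
    return [b"".join(strips[i * per_block:(i + 1) * per_block]) for i in range(n)]


def xor_parity(blocks):
    """S102-3 stand-in: one check block as the byte-wise XOR of all blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)
```

Because the strips are grouped in cutting order, each original block here is a contiguous slice of the input, and the check block has the same length as each original block.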
As shown in Figure 4, each storage node allocates a fixed, contiguous first space and second space for the data blocks and for the data version numbers of the data to be stored. Reads and writes within a data block thereby change from random to sequential, which improves the read and write performance of the distributed storage system.
Having described how the data access end processes a data write request, the following describes in detail how it processes a data read request.
Please refer to Figure 6, which shows another flow of the data processing method provided by an embodiment of the present application. The data processing method includes steps S201 to S207.
S201, receive a data read request.
The data read request includes the write node order of the data to be read. The data to be read consists of the original data blocks that the data access end stored to multiple data nodes while processing a data write request, and the write node order is the positional order of an original data block within the first space of a storage node.
As shown in Figure 4, the data access end processes data write request 1, data write request 2, ..., data write request k in the order in which they were received. The first space of data node 1 is written, in sequence, with original data block 1 of data write request 1, original data block 1 of data write request 2, ..., and original data block 1 of data write request k. Likewise, the second space of data node 1 is written, in sequence, with the data version number of the data to be stored in data write request 1, in data write request 2, ..., and in data write request k. Similarly, the first space of data node n is written, in sequence, with original data block n of data write request 1, original data block n of data write request 2, ..., and original data block n of data write request k, and the second space of data node n is written, in sequence, with the data version number of the data to be stored in data write request 1, in data write request 2, ..., and in data write request k. If the write node order of the data to be read is 2, the data to be read consists of the second original data block in each of data node 1 through data node n.
S202, read the target data version number from the second space of each storage node according to the write node order.
The second space of each storage node stores multiple data version numbers, and every data version number occupies the same amount of space. For each storage node, the target area in the second space is determined from the write node order and the size of a data version number, and the content read from the target area is taken as the target data version number.
For example, if the write node order is 4 and each data version number occupies 8 B, the target area within the second space of each storage node is calculated to be from the 24th byte to the 32nd byte. The content read from that target area in the second space of each storage node is taken as the target data version number of that storage node.
S203, if all target data version numbers are consistent, read the target data block from the first space of each storage node according to the write node order.
When the target data version numbers read from the second space of every storage node are consistent, the data blocks that make up the data to be read were stored successfully at every storage node. Likewise, the first space of each storage node stores multiple data blocks, and every data block occupies the same amount of space. For each storage node, the target area in the first space is determined from the write node order and the size of a data block, the content read from the target area is taken as the target data block, and all target data blocks are then combined to obtain the data to be read.
For example, if the write node order is 4 and each data block occupies 128 K, the target area within the first space of each storage node is calculated to be from the 384th K to the 512th K. The content read from that target area in the first space of each storage node is taken as the target data block of that storage node.
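Both offset examples above (8 B version numbers and 128 K data blocks) follow the same arithmetic, which can be written as one helper. The function name `target_region` and the half-open `(start, end)` return convention are assumptions for illustration.

```python
def target_region(write_order: int, item_size: int):
    """Return the (start, end) byte offsets of the write_order-th fixed-size item.

    write_order is 1-based, matching the text; end is exclusive.
    """
    if write_order < 1:
        raise ValueError("write_order starts at 1")
    start = (write_order - 1) * item_size
    return start, start + item_size
```

With a write node order of 4, `target_region(4, 8)` gives `(24, 32)` for the second space and `target_region(4, 128 * 1024)` gives the 384 K to 512 K range for the first space, matching the two examples in the text.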
S204, if there are inconsistent target data version numbers, divide the multiple storage nodes into normal nodes and abnormal nodes according to each target data version number.
When the target data version numbers read from the second space of the storage nodes are not all consistent, the data blocks that make up the data to be read failed to be stored at some storage nodes. The storage nodes are then divided into two classes, normal nodes and abnormal nodes, according to each target data version number. At a normal node the data block that makes up the data to be read was stored successfully, and at an abnormal node it was not. Accordingly, the target data version numbers of all normal nodes are consistent with one another, and the target data version number of each abnormal node differs from that of the normal nodes.
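The classification in step S204 can be sketched as below. The text does not state how the reference version number is chosen, so this example assumes the most common version number identifies the normal nodes; that majority rule, the function name, and the string node labels are all assumptions, and ties are not handled in any meaningful way.

```python
from collections import Counter


def classify_nodes(versions: dict):
    """Split nodes into (normal, abnormal) by their target data version number.

    versions maps a node label to the version number read from its second space.
    """
    # Assumption: the version held by the most nodes is the correct one.
    reference, _ = Counter(versions.values()).most_common(1)[0]
    normal = sorted(node for node, v in versions.items() if v == reference)
    abnormal = sorted(node for node, v in versions.items() if v != reference)
    return normal, abnormal
```

Under this rule, a node whose version number write did not complete ends up alone in the abnormal list, provided the other nodes agree with one another.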
S205, read the target data block from the first space of each normal node according to the write node order.
As in step S203 above, the first space of each normal node stores multiple data blocks, and every data block occupies the same amount of space. For each normal node, the target area in the first space is determined from the write node order and the size of a data block, and the content read from the target area is taken as the target data block.
S206, recover the target data block corresponding to each abnormal node from the target data blocks corresponding to the normal nodes.
The target data block corresponding to each abnormal node is recovered by performing erasure calculation on the target data blocks corresponding to all normal nodes.
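As a minimal illustration of the recovery in step S206, the sketch below assumes the single-parity XOR scheme (m = 1), which is not specified in the text and tolerates only one abnormal node. With it, the missing block is the XOR of all surviving blocks. The function name is hypothetical.

```python
def recover_block(surviving_blocks):
    """Rebuild the single missing block as the byte-wise XOR of the survivors.

    Valid only for a single XOR parity block; other erasure codes need
    their own decoding procedure.
    """
    recovered = bytearray(len(surviving_blocks[0]))
    for block in surviving_blocks:
        for i, byte in enumerate(block):
            recovered[i] ^= byte
    return bytes(recovered)
```

For instance, if the check block was computed as the XOR of data block 1 and data block 2, passing data block 1 and the check block returns data block 2.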
S207, generate the data to be read according to the preset mapping relationship and all target data blocks, to respond to the data read request.
As shown in Figure 7, when the target data version numbers read from the second space of every storage node are consistent, the data to be read can be generated from all the target data blocks read from the first spaces of the data nodes. As shown in Figure 8, when the target data version numbers read from the second space of the storage nodes are not all consistent, the data to be read can be generated from the target data blocks corresponding to all data nodes, each of which may be a normal node or an abnormal node.
Step S207 is implemented as follows:
First, a sequence number is assigned to each target data block according to the preset mapping relationship and the number of each storage node;
Then, all target data blocks are sorted by sequence number to obtain the data to be read.
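The two steps of S207 can be sketched as follows, again assuming the identity mapping between data node number and block sequence number (the text only says the mapping is preset). The function name and the dictionary input are assumptions for the example.

```python
def assemble(blocks_by_node_number: dict) -> bytes:
    """Concatenate target data blocks in ascending sequence-number order.

    blocks_by_node_number maps a data node number to its target data block;
    under the assumed identity mapping the node number is the sequence number.
    """
    return b"".join(
        blocks_by_node_number[number] for number in sorted(blocks_by_node_number)
    )
```

Only blocks from data nodes are passed in. Blocks from check nodes are used for recovery but are not part of the data to be read.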
Because the data block that makes up the data to be read failed to be stored at an abnormal node, the recovered target data block corresponding to that abnormal node also needs to be rewritten to it so that later reads succeed. The detailed implementation is as follows:
First, the target area in the first space of each abnormal node is determined according to the write node order and a preset size;
Then, for each abnormal node, the content of the node's target area is overwritten with the target data block corresponding to that node.
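The rewrite can be sketched with a `bytearray` standing in for the first space of an abnormal node. The function name is hypothetical, and the preset size is taken to be the length of the recovered block, which is an assumption for the example.

```python
def overwrite_region(first_space: bytearray, write_order: int, block: bytes) -> None:
    """Overwrite the write_order-th fixed-size slot of first_space with block."""
    start = (write_order - 1) * len(block)
    end = start + len(block)
    if write_order < 1 or end > len(first_space):
        raise ValueError("target area lies outside the first space")
    # Equal-length slice assignment replaces the slot in place.
    first_space[start:end] = block
```

Only the slot selected by the write node order changes; the blocks written by other requests on the same node are left untouched.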
To describe the foregoing data processing method more clearly, the embodiment of the present application takes as an example a distributed storage system with 3 storage nodes (2 data nodes and 1 check node) and an erasure ratio of 2:1.
As shown in Figure 9, the data access end receives a data write request sent by an upper-layer application or an external host and splits the data to be stored in the request, with a total length of 256K, into 2 original data blocks and 1 check data block according to the erasure ratio. Each original data block or check data block includes 32 data strips, and each data strip has a length of 4K. From the current timestamp of the distributed storage system, the data version number of the data to be stored is obtained as 164961834.
According to the preset mapping relationship, the original data block with sequence number 1 and the data version number are written to the first space and the second space, respectively, of the data node numbered 1; the original data block with sequence number 2 and the data version number are written to the first space and the second space of the data node numbered 2; and the check data block with sequence number 1 and the data version number are written to the first space and the second space of the check node numbered 1, to respond to the data write request.
Each data node consists of multiple first spaces of 64 MB and second spaces of 128 KB, and multiple data write requests store their original data blocks and the corresponding data version numbers contiguously on the disk of the data node. Likewise, each check node consists of multiple first spaces of 64 MB and second spaces of 128 KB, and multiple data write requests store their check data blocks and the corresponding data version numbers contiguously on the disk of the check node.
The data access end receives a data read request sent by an upper-layer application or an external host, reads the target data version numbers from the second spaces of the 2 data nodes and the 1 check node according to the write node order of the data to be read carried in the request, and compares them.
As shown in Figure 10, if the target data version numbers read from the second spaces of the 2 data nodes and the 1 check node are consistent and the erasure ratio is satisfied, the data access end reads the target data blocks from the first spaces of the 2 data nodes according to the write node order of the data to be read, and then combines all target data blocks according to the preset mapping relationship and the numbers of the data nodes to obtain the data to be read, to respond to the data read request.
As shown in Figure 11, if the target data version number read from the second space of data node 1 is inconsistent with the one read from the second space of data node 2 but consistent with the one read from the second space of check node 1, then, provided the erasure ratio is satisfied, the data access end first reads the target data blocks from the first spaces of data node 1 and check node 1 according to the write node order of the data to be read, and recovers the target data block corresponding to data node 2 through erasure calculation. After the recovery is complete, the data access end combines the target data blocks corresponding to data node 1 and data node 2 into the data to be read according to the preset mapping relationship and the numbers of the data nodes, to respond to the data read request.
Further, an embodiment of the present application also provides a schematic structural block diagram of a data access end 300. Referring to Figure 12, the data access end 300 may include a memory 310 and a processor 320.
The processor 320 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of a program of the data processing method provided by the above method embodiments.
The memory 310 may be a ROM or another type of static storage device capable of storing static information and instructions, a RAM or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited to these. The memory 310 may exist independently and be connected to the processor 320 through a communication bus. The memory 310 may also be integrated with the processor 320. The memory 310 is used to store machine-executable instructions for executing the solution of the present application. The processor 320 is used to execute the machine-executable instructions stored in the memory 310 to implement the above method embodiments.
An embodiment of the present application also provides a computer-readable storage medium containing a computer program. When executed, the computer program can be used to perform the relevant operations of the data processing method provided by the above method embodiments.
Please refer to Figure 13, which is a functional unit block diagram of a data processing device 400 provided by an embodiment of the present application. The data processing device 400 is applied to the data access end 300 and may include a receiving module 401, a processing module 402, and a sending module 403. The receiving module 401, the processing module 402, and the sending module 403 can each be stored in software form in a memory or a computer-readable storage medium. It should be noted that the basic principles and technical effects of the data processing device 400 are the same as those of the above embodiments. For brevity, for anything not mentioned in this part, refer to the corresponding content of the above embodiments.
接收模块401,用于接收数据写请求,数据写请求包括待存储数据。The receiving module 401 is used to receive a data write request, where the data write request includes data to be stored.
处理模块402,用于对待存储数据进行处理,得到多个数据块,每个数据块均分配有序号。The processing module 402 is used to process the data to be stored to obtain multiple data blocks, each of which is assigned a sequence number.
发送模块403,用于针对每个数据块,根据数据块的序号,从多个存储节点中确定出目标节点,并将数据块与待存储数据的数据版本号发送给目标节点,以使目标节点将数据块和数据版本号分别存储至目标节点的第一空间和第二空间,其中,数据块的序号与目标节点的编号满足预设映射关系,第一空间位于第二空间之前。The sending module 403 is used to determine the target node from multiple storage nodes according to the sequence number of the data block for each data block, and send the data block and the data version number of the data to be stored to the target node, so that the target node The data block and the data version number are respectively stored in the first space and the second space of the target node, where the serial number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space.
在一种实现方式中,处理模块402还用于若数据写请求为一个,则将当前时间戳作为待存储数据的数据版本号;若数据写请求为多个,则对当前时间戳进行多次自增运算,并按照每个数据写请求的接收时间的先后,将每次自增运算的结果作为一个数据写请求中的待存储数据的数据版本号。 In one implementation, the processing module 402 is also configured to use the current timestamp as the data version number of the data to be stored if there is one data write request; if there are multiple data write requests, perform multiple times on the current timestamp. Auto-increment operation, and according to the reception time of each data write request, the result of each auto-increment operation is used as the data version number of the data to be stored in a data write request.
在一种实现方式中,处理模块402具体用于按照预设长度将待存储数据切分为多个数据条带;将每预设数量个数据条带组成一个原始数据块,得到多个校验数据块;对多个原始数据块进行纠删编码,得到多个校验数据块,多个数据块包括多个原始数据块和多个校验数据块。In one implementation, the processing module 402 is specifically configured to divide the data to be stored into multiple data strips according to a preset length; each preset number of data strips is composed into an original data block to obtain multiple checksums. Data block; perform erasure coding on multiple original data blocks to obtain multiple verification data blocks. The multiple data blocks include multiple original data blocks and multiple verification data blocks.
在一种实现方式中,接收模块401还用于接收数据读请求,数据读请求包括待读取数据的写入节点顺序;处理模块402还用于根据写入节点顺序,从每个存储节点的第二空间中读取目标数据版本号;若所有目标数据版本号均一致,则根据写入节点顺序,从每个存储节点的第一空间中读取目标数据块;根据预设映射关系和所有目标数据块,生成待读取数据,以响应数据读请求。In one implementation, the receiving module 401 is also configured to receive a data read request, which includes the writing node sequence of the data to be read; the processing module 402 is also configured to obtain the data from each storage node according to the writing node sequence. Read the target data version number from the second space; if all target data version numbers are consistent, read the target data block from the first space of each storage node according to the order of writing nodes; according to the preset mapping relationship and all The target data block generates data to be read in response to the data read request.
In one implementation, the processing module 402 is further configured to: if inconsistent target data version numbers exist, divide the multiple storage nodes into normal nodes and abnormal nodes according to each target data version number, where the target data version numbers corresponding to all normal nodes are consistent and the target data version number corresponding to each abnormal node is inconsistent with that of the normal nodes; read a target data block from the first space of each normal node according to the write node order; recover the target data block corresponding to each abnormal node from the target data blocks corresponding to the normal nodes; and generate the data to be read according to the preset mapping relationship and all target data blocks, in response to the data read request.
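The split into normal and abnormal nodes, followed by recovery, might look like the sketch below. Two points are assumptions rather than statements from the source: the "normal" version is taken to be the most common version among the nodes, and recovery uses a single XOR parity block, so it can rebuild only one abnormal node's block.

```python
from collections import Counter


def classify_nodes(versions: dict[int, int]) -> tuple[list[int], list[int]]:
    """Split node numbers into (normal, abnormal) by target data version number.

    `versions` maps node number -> version read from that node's second space.
    Assumption: the most frequent version is the correct one.
    """
    normal_version, _ = Counter(versions.values()).most_common(1)[0]
    normal = sorted(n for n, v in versions.items() if v == normal_version)
    abnormal = sorted(n for n, v in versions.items() if v != normal_version)
    return normal, abnormal


def recover_block(normal_blocks: list[bytes]) -> bytes:
    """Rebuild the single missing block as the XOR of all surviving blocks.

    Valid only under the assumed XOR-parity scheme with one abnormal node.
    """
    recovered = bytearray(len(normal_blocks[0]))
    for block in normal_blocks:
        for i, byte in enumerate(block):
            recovered[i] ^= byte
    return bytes(recovered)
```

For example, with blocks `b"\x01\x02"` and `b"\x03\x04"` and their parity `b"\x02\x06"`, losing the second block and XOR-ing the first with the parity yields `b"\x03\x04"` again.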
In one implementation, the processing module 402 is specifically configured to: assign a sequence number to each target data block according to the preset mapping relationship and the number of each storage node; and sort all target data blocks by their sequence numbers to obtain the data to be read.
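The reassembly step can be illustrated with a short sketch. The preset mapping relationship is modeled here as a plain dictionary from node number to block sequence number, and check blocks are assumed to carry the highest sequence numbers so they can be dropped after sorting; neither detail is specified in the source.

```python
def assemble(
    blocks_by_node: dict[int, bytes],
    node_to_seq: dict[int, int],
    num_original: int,
) -> bytes:
    """Order target data blocks by sequence number and join the original ones.

    `node_to_seq` is a hypothetical stand-in for the preset mapping
    relationship between storage node numbers and block sequence numbers.
    """
    ordered = sorted(blocks_by_node.items(), key=lambda item: node_to_seq[item[0]])
    # Assumption: check blocks sort after the original blocks and are not
    # part of the data returned to the reader.
    return b"".join(block for _, block in ordered[:num_original])
```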
In one implementation, the processing module 402 is further configured to: determine a target area in the first space of each abnormal node according to the write node order and a preset size; and, for each abnormal node, overwrite the content of the target area of that abnormal node with the target data block corresponding to that abnormal node.
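One way to picture the repair step is shown below. It assumes that the write node order is a zero-based index of the write on that node and that the target area starts at `write_order * block_size`; the source only says the area is determined from the write node order and a preset size, so this layout is hypothetical.

```python
def target_region(write_order: int, block_size: int) -> tuple[int, int]:
    """Return the (start, end) byte offsets of the target area.

    Assumption: blocks are laid out back to back in the first space.
    """
    start = write_order * block_size
    return start, start + block_size


def repair(first_space: bytearray, write_order: int, block_size: int,
           recovered: bytes) -> None:
    """Overwrite the abnormal node's target area with the recovered block."""
    if len(recovered) != block_size:
        raise ValueError("recovered block must match the preset size")
    start, end = target_region(write_order, block_size)
    if end > len(first_space):
        raise ValueError("target area lies outside the first space")
    first_space[start:end] = recovered
```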
The embodiments of the present application provide a data processing method and device, a data access end, and a storage medium. First, a data write request that includes the data to be stored is received. The data to be stored is then processed to obtain multiple data blocks, each of which is assigned a sequence number. Next, for each data block, a target node is determined from the multiple storage nodes according to the sequence number of the data block, and the data block and the data version number of the data to be stored are sent to the target node, so that the target node stores the data block and the data version number in its first space and second space respectively, where the sequence number of the data block and the number of the target node satisfy a preset mapping relationship and the first space is located before the second space. Because each data block is sent to the target node together with the data version number, and because the first space precedes the second space in the target node, the situation in which the data version numbers on all storage nodes are consistent although a data block failed to be stored is avoided. The data access end can therefore recover a failed data block in time, which ensures data consistency.
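The write path summarized above can be sketched with an in-memory model. Everything named here is hypothetical: the `StorageNode` class, the modulo mapping from block sequence number to node number, and the `fail_block_write` flag used to simulate a failed block write. The point illustrated is only the ordering: the block is written before the version number, so a failed block write leaves a stale version that a later read can detect.

```python
class StorageNode:
    """In-memory stand-in for a storage node with two spaces."""

    def __init__(self, number: int) -> None:
        self.number = number
        self.first_space: dict[int, bytes] = {}   # write order -> data block
        self.second_space: dict[int, int] = {}    # write order -> version number

    def store(self, write_order: int, block: bytes, version: int,
              fail_block_write: bool = False) -> None:
        # The data block is written first; the version number is written
        # only after the block write has succeeded.
        if fail_block_write:
            raise OSError("simulated block write failure")
        self.first_space[write_order] = block
        self.second_space[write_order] = version


def write_blocks(nodes: list[StorageNode], blocks: list[bytes],
                 version: int, write_order: int) -> None:
    """Send each block and the shared version number to its target node."""
    for seq, block in enumerate(blocks):
        # Assumed preset mapping: node number = sequence number mod node count.
        target = nodes[seq % len(nodes)]
        target.store(write_order, block, version)
```

If `store` raises before the block is written, the node's second space still holds the old version (or none), so that node shows up as abnormal when versions are compared at read time.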
The above are merely specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (10)

  1. A data processing method, characterized in that it is applied to a data access end in a distributed storage system, the distributed storage system further comprising a plurality of storage nodes, each storage node being provided with a number and being communicatively connected to the data access end, the method comprising:
    receiving a data write request, the data write request including data to be stored;
    processing the data to be stored to obtain a plurality of data blocks, each data block being assigned a sequence number;
    for each data block, determining a target node from the plurality of storage nodes according to the sequence number of the data block, and sending the data block and a data version number of the data to be stored to the target node, so that the target node stores the data block and the data version number in a first space and a second space of the target node respectively, wherein the sequence number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space.
  2. The method according to claim 1, characterized in that, before the step of processing the data to be stored to obtain a plurality of data blocks, the method further comprises:
    if there is one data write request, using a current timestamp as the data version number of the data to be stored;
    if there are a plurality of data write requests, performing a plurality of auto-increment operations on the current timestamp and, in order of the reception time of each data write request, using the result of each auto-increment operation as the data version number of the data to be stored in one of the data write requests.
  3. The method according to claim 1, characterized in that the step of processing the data to be stored to obtain a plurality of data blocks comprises:
    splitting the data to be stored into a plurality of data stripes according to a preset length;
    grouping every preset number of the data stripes into one original data block to obtain a plurality of original data blocks;
    performing erasure coding on the plurality of original data blocks to obtain a plurality of check data blocks, the plurality of data blocks including the plurality of original data blocks and the plurality of check data blocks.
  4. The method according to claim 1, characterized in that the method further comprises:
    receiving a data read request, the data read request including a write node order of data to be read;
    reading a target data version number from the second space of each storage node according to the write node order;
    if all the target data version numbers are consistent, reading a target data block from the first space of each storage node according to the write node order;
    generating the data to be read according to the preset mapping relationship and all the target data blocks, in response to the data read request.
  5. The method according to claim 4, characterized in that the method further comprises:
    if inconsistent target data version numbers exist, dividing the plurality of storage nodes into normal nodes and abnormal nodes according to each target data version number, wherein the target data version numbers corresponding to all the normal nodes are consistent, and the target data version number corresponding to each abnormal node is inconsistent with the target data version numbers corresponding to all the normal nodes;
    reading a target data block from the first space of each normal node according to the write node order;
    recovering the target data block corresponding to each abnormal node according to the target data block corresponding to each normal node;
    generating the data to be read according to the preset mapping relationship and all the target data blocks, in response to the data read request.
  6. The method according to claim 4 or 5, characterized in that the step of generating the data to be read according to the preset mapping relationship and all the target data blocks comprises:
    assigning a sequence number to each target data block according to the preset mapping relationship and the number of each storage node;
    sorting all the target data blocks by the sequence number of each target data block to obtain the data to be read.
  7. The method according to claim 5, characterized in that the method further comprises:
    determining a target area in the first space of each abnormal node according to the write node order and a preset size;
    for each abnormal node, overwriting the content of the target area of the abnormal node with the target data block corresponding to the abnormal node.
  8. A data processing device, characterized in that it is applied to a data access end in a distributed storage system, the distributed storage system further comprising a plurality of storage nodes, each storage node being provided with a number and being communicatively connected to the data access end, the device comprising:
    a receiving module, configured to receive a data write request, the data write request including data to be stored;
    a processing module, configured to process the data to be stored to obtain a plurality of data blocks, each data block being assigned a sequence number;
    a sending module, configured to, for each data block, determine a target node from the plurality of storage nodes according to the sequence number of the data block, and send the data block and a data version number of the data to be stored to the target node, so that the target node stores the data block and the data version number in a first space and a second space of the target node respectively, wherein the sequence number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space.
  9. A data access end, characterized in that it comprises a memory and a processor, the memory storing a computer program, and the processor, when executing the computer program, implementing the data processing method according to any one of claims 1 to 7.
  10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the data processing method according to any one of claims 1 to 7.
PCT/CN2023/097128 2022-06-17 2023-05-30 Data processing method and device, data access end, and storage medium WO2023241350A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210692395.0 2022-06-17
CN202210692395.0A CN114968668A (en) 2022-06-17 2022-06-17 Data processing method and device, data access terminal and storage medium

Publications (1)

Publication Number Publication Date
WO2023241350A1 (en) 2023-12-21

Family

ID=82963496

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/097128 WO2023241350A1 (en) 2022-06-17 2023-05-30 Data processing method and device, data access end, and storage medium

Country Status (2)

Country Link
CN (1) CN114968668A (en)
WO (1) WO2023241350A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114968668A (en) * 2022-06-17 2022-08-30 重庆紫光华山智安科技有限公司 Data processing method and device, data access terminal and storage medium
CN115454959B (en) * 2022-11-08 2023-01-24 中国民用航空飞行学院 Meteorological data verification method and system during aviation flight planning
CN116301670B (en) * 2023-05-25 2023-09-05 极限数据(北京)科技有限公司 Data partitioning method and data processing method

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140136802A1 (en) * 2012-11-09 2014-05-15 International Business Machines Corporation Accessing data in a storage system
CN109558086A (en) * 2018-12-03 2019-04-02 浪潮电子信息产业股份有限公司 A kind of method for reading data, system and associated component
CN109684338A (en) * 2018-11-20 2019-04-26 深圳花儿数据技术有限公司 A kind of data-updating method of storage system
US20190347160A1 (en) * 2016-11-16 2019-11-14 Beijing Sankuai Online Technology Co., Ltd Erasure code-based partial write-in
CN111857603A (en) * 2020-07-31 2020-10-30 重庆紫光华山智安科技有限公司 Data processing method and related device
CN112988683A (en) * 2021-02-07 2021-06-18 北京金山云网络技术有限公司 Data processing method and device, electronic equipment and storage medium
CN113297203A (en) * 2020-07-15 2021-08-24 阿里巴巴集团控股有限公司 Data query and write-in method and device, computer storage medium and electronic equipment
CN113590041A (en) * 2021-07-29 2021-11-02 杭州宏杉科技股份有限公司 Data protection storage method, device and equipment
US20220129346A1 (en) * 2019-09-09 2022-04-28 Huawei Technologies Co., Ltd. Data processing method and apparatus in storage system, and storage system
CN114415976A (en) * 2022-03-28 2022-04-29 深圳市杉岩数据技术有限公司 Distributed data storage system and method
CN114968668A (en) * 2022-06-17 2022-08-30 重庆紫光华山智安科技有限公司 Data processing method and device, data access terminal and storage medium

Also Published As

Publication number Publication date
CN114968668A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
WO2023241350A1 (en) Data processing method and device, data access end, and storage medium
US10365983B1 (en) Repairing raid systems at per-stripe granularity
US8131969B2 (en) Updating system configuration information
US8996611B2 (en) Parallel serialization of request processing
US8712976B1 (en) Managing deduplication density
US8108595B2 (en) Storage apparatus and method of managing data storage area
US8583607B1 (en) Managing deduplication density
EP3617867B1 (en) Fragment management method and fragment management apparatus
CN104750565B (en) NAND bad block processing method and NAND flash memory equipment
CN112597069A (en) Storage system, host system, and method of operating storage system
US20180267856A1 (en) Distributed storage system, data storage method, and software program
CN109344094B (en) Address mapping relation feedback method, device and equipment and readable storage medium
US10489289B1 (en) Physical media aware spacially coupled journaling and trim
US7809908B2 (en) Disk snapshot acquisition method
CN110018783B (en) Data storage method, device and system
US11640244B2 (en) Intelligent block deallocation verification
US20190199794A1 (en) Efficient replication of changes to a byte-addressable persistent memory over a network
US11481132B2 (en) Removing stale hints from a deduplication data store of a storage system
CN109240943B (en) Address mapping relation feedback method, device and equipment and readable storage medium
CN110569000A (en) Host RAID (redundant array of independent disk) management method and device based on solid state disk array
CN112748865A (en) Method, electronic device and computer program product for storage management
CN114327292B (en) File management method, system, electronic device and storage medium
CN105068896A (en) Data processing method and device based on RAID backup
KR20230088215A (en) Distributed storage system
US11467777B1 (en) Method and system for storing data in portable storage devices

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23822920

Country of ref document: EP

Kind code of ref document: A1