WO2019137322A1 - Data processing method, apparatus, and computing device - Google Patents

Data processing method, apparatus, and computing device

Info

Publication number
WO2019137322A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
minimum
additional
data unit
unit
Prior art date
Application number
PCT/CN2019/070581
Other languages
English (en)
French (fr)
Inventor
吴冬政
董乘宇
刘金鑫
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Priority to EP19738606.3A (EP3739472A4)
Publication of WO2019137322A1
Priority to US16/923,999 (US11294592B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/1805Append-only file systems, e.g. using logs or journals to store data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1004Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1415Saving, restoring, recovering or retrying at system level
    • G06F11/1435Saving, restoring, recovering or retrying at system level using file system or storage system metadata
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0619Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0643Management of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656Data buffering arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659Command handling arrangements, e.g. command buffers, queues, command scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0674Disk device
    • G06F3/0676Magnetic disk device
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485Task life-cycle, e.g. stopping, restarting, resuming execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data

Definitions

  • the embodiments of the present invention relate to the field of computer technologies, and in particular, to a data processing method, apparatus, and computing device.
  • Append write, referred to below as additional write, is a common data write method in storage systems, in which newly written data is appended after the data that has already been written.
  • In practice, write operations are often performed as additional writes. For example, if the data file to be written is large, it is usually written to disk through multiple write operations, each of which can be performed as an additional write.
  • In the prior art, one data write method writes the additional data to a new location on the disk each time and establishes an index relationship between the data and its storage location, so that the data can be located through the index relationship.
  • the embodiment of the present application provides a data processing method, device, and computing device, which are used to solve the technical problem of inconvenient operation and low efficiency in the prior art.
  • a data processing method including:
  • the minimum unfilled data unit corresponding to the additional data is cached.
  • a data processing method including:
  • reading and splicing the valid data in the at least one target minimum data unit from the storage device;
  • the valid data written in each minimum data unit includes at least part of the additional data of one additional write request, or at least part of the additional data of one additional write request together with at least part of the additional data of the next additional write request.
  • a data processing method including:
  • based on the valid data length after the penultimate write operation and the valid data length after the last write operation, both recorded in the metadata description area of the minimum data unit, and on whether the last written data and at least part of the data in the next minimum data unit belong to the same additional write request, restoring the data file to a recovery length that ends at the data end position of an additional write request, so as to maintain atomicity;
  • the minimum data unit includes a data storage area and a metadata description area located at the tail of the data storage area; the data storage area is configured to store valid data, and the metadata description area is configured to store metadata of the valid data;
  • the valid data includes at least part of the additional data of one additional write request, or at least part of the additional data of one additional write request together with at least part of the additional data of the next additional write request.
  • a data processing apparatus including:
  • a cache lookup module configured to look up the cached minimum data unit for the additional data in one additional write request;
  • a data organization module configured to sequentially write at least part of the additional data into the cached minimum data unit to obtain a first minimum data unit to be stored, and to write the unwritten data in the additional data into at least one minimum data unit to obtain at least one second minimum data unit to be stored;
  • a data writing module configured to write the first minimum data unit to be stored to the storage device, and sequentially write the at least one second minimum data unit to be stored into the storage device;
  • a data cache module configured to cache an unfilled minimum data unit corresponding to the additional data.
  • a data processing apparatus including:
  • a request receiving module configured to receive a read data request
  • a calculating module configured to calculate, according to a first fixed length of the minimum data unit, at least one target minimum data unit corresponding to the read data request;
  • a data acquisition module configured to read and splice the valid data in the at least one target minimum data unit from the storage device;
  • the valid data written in each minimum data unit includes at least part of the additional data of one additional write request, or at least part of the additional data of one additional write request together with at least part of the additional data of the next additional write request.
  • a data processing apparatus including:
  • a fault detection module for detecting a data recovery instruction
  • a data recovery module configured to, based on the valid data length after the penultimate write operation and the valid data length after the last write operation, both recorded in the metadata description area of the minimum data unit, and on whether the last written data and at least part of the data in the next minimum data unit belong to the same additional write request, restore the data file to a recovery length that ends at the data end position of an additional write request, so as to maintain atomicity;
  • the minimum data unit includes a data storage area and a metadata description area located at the tail of the data storage area; the data storage area is configured to store valid data, and the metadata description area is configured to store metadata of the valid data;
  • the valid data includes at least part of the additional data of one additional write request, or at least part of the additional data of one additional write request together with at least part of the additional data of the next additional write request.
  • a computing device including a storage component and a processing component, is provided in an embodiment of the present application.
  • the storage component is configured to store one or more computer instructions, wherein the one or more computer instructions are for execution by the processing component;
  • the processing component is used to:
  • the minimum unfilled data unit corresponding to the additional data is cached.
  • a computing device including a storage component and a processing component, is provided in an embodiment of the present application.
  • the storage component is configured to store one or more computer instructions, wherein the one or more computer instructions are for execution by the processing component;
  • the processing component is used to:
  • reading and splicing the valid data in the at least one target minimum data unit from the storage device;
  • the valid data written in each minimum data unit includes at least part of the additional data of one additional write request, or at least part of the additional data of one additional write request together with at least part of the additional data of the next additional write request.
  • a computing device including a storage component and a processing component, is provided in an embodiment of the present application.
  • the storage component is configured to store one or more computer instructions, wherein the one or more computer instructions are for execution by the processing component;
  • the processing component is used to:
  • based on the valid data length after the penultimate write operation and the valid data length after the last write operation, both recorded in the metadata description area of the minimum data unit, and on whether the last written data and at least part of the data in the next minimum data unit belong to the same additional write request, restoring the data file to a recovery length that ends at the data end position of an additional write request, so as to maintain atomicity;
  • the minimum data unit includes a data storage area and a metadata description area located at the tail of the data storage area; the data storage area is configured to store valid data, and the metadata description area is configured to store metadata of the valid data;
  • the valid data includes at least part of the additional data of one additional write request, or at least part of the additional data of one additional write request together with at least part of the additional data of the next additional write request.
  • In the embodiments of the present application, for the additional data in one additional write request, the cached minimum data unit may first be looked up. At least part of the additional data is then sequentially written into the cached minimum data unit to obtain a first minimum data unit to be stored, and the unwritten data in the additional data is written into at least one minimum data unit to obtain at least one second minimum data unit to be stored. The first minimum data unit to be stored is then written to the storage device by overwriting, and the at least one second minimum data unit to be stored is sequentially written to the storage device, which completes the operation of appending the data to the storage device. The unfilled minimum data unit corresponding to the additional data continues to be cached, so that the next additional write request can be handled in the same way.
  • Because data is organized in minimum data units of fixed length, no index relationship needs to be established or maintained; the storage location of the data can be located by calculation, which improves the convenience of operation.
  • The minimum data units are written to the storage device in a sequential write manner, which ensures data integrity and the efficiency of the write operation.
  • FIG. 1 is a flow chart showing an embodiment of a data processing method provided by the present application.
  • FIG. 2 is a flow chart showing still another embodiment of a data processing method provided by the present application.
  • FIG. 3 is a schematic structural diagram of a minimum data unit in the embodiment of the present application.
  • FIG. 4 is a flow chart showing still another embodiment of a data processing method provided by the present application.
  • FIG. 5 is a flow chart showing still another embodiment of a data processing method provided by the present application.
  • FIG. 6 is a flow chart showing still another embodiment of a data processing method provided by the present application.
  • FIG. 7 is a schematic structural diagram of an embodiment of a data processing apparatus provided by the present application.
  • FIG. 8 is a schematic structural diagram of an embodiment of a computing device provided by the present application.
  • FIG. 9 is a schematic structural diagram of still another embodiment of a data processing apparatus provided by the present application.
  • FIG. 10 is a schematic structural diagram of still another embodiment of a computing device provided by the present application.
  • FIG. 11 is a schematic structural diagram of still another embodiment of a data processing apparatus provided by the present application.
  • FIG. 12 is a schematic structural diagram of still another embodiment of a computing device provided by the present application.
  • the technical solution of the embodiment of the present application is applied to a storage system, particularly a distributed storage system.
  • Additional write (append write): a data write mode in which newly written data is appended after the data already written. An operation that writes data in this mode is also called an additional write operation, and writing in this mode is also simply called additional writing.
  • Overwrite write: a data write mode in which the currently written data overwrites the corresponding data already written. An operation that writes data in this mode is also called an overwrite write operation, and writing in this mode is also simply called overwriting.
  • Sequential write: a data write mode in which the positions of multiple write operations are contiguous. An operation that writes data in this mode is also called a sequential write operation, and writing in this mode is also simply called sequential writing.
  • Metadata: data that describes data attributes, such as data length and data status.
  • Additional write request: a write request that asks for data to be written by additional write and carries the additional data to be written.
  • Request atomicity: an additional write request either succeeds entirely or fails entirely.
  • Storage device: a hardware device used to store data in a storage system, into which the data is ultimately written; it may be a storage medium such as a magnetic disk.
  • Minimum data unit: a data storage structure defined in the embodiments of the present application. The additional data in one additional write request is split according to the minimum data unit, converting the additional data into a data organization form of at least one minimum data unit. The minimum data unit is the smallest unit for writing data to or reading data from the storage device and has a fixed length, defined in the embodiments of the present application as the first fixed length. The first fixed length may be equal to the sector length of a physical sector in the storage device or a multiple of the sector length, for example 4K (kilobytes) or a multiple of 4K.
  • the inventors have proposed a technical solution of the present application through a series of studies.
  • In the embodiments of the present application, for the additional data in one additional write request, the cached minimum data unit may first be looked up. At least part of the additional data is then sequentially written into the cached minimum data unit to obtain a first minimum data unit to be stored, and the unwritten data in the additional data is written into at least one minimum data unit to obtain at least one second minimum data unit to be stored. The first minimum data unit to be stored is written to the storage device by overwriting, and the at least one second minimum data unit to be stored is sequentially written to the storage device, which completes the operation of appending the data to the storage device. The unfilled minimum data unit corresponding to the additional data continues to be cached, so that the next additional write request can be handled in the same way. Because the minimum data unit has a fixed length, no index relationship needs to be established or maintained, and the storage location of the data can be located by calculation, which improves the convenience of operation. The minimum data units are written to the storage device in a sequential write manner, thereby ensuring data integrity and write efficiency.
  • FIG. 1 is a flowchart of an embodiment of a data processing method according to an embodiment of the present disclosure, where the method may include the following steps:
  • the additional write request is initiated for a data file, and the data file is written to the storage device by multiple write operations.
  • the additional data in each additional write request may be part of the data in the data file, and the length of the additional data in each additional write request may be different.
  • Sequentially writing at least part of the data into the cached minimum data unit may mean writing that data starting at the position where the previous data write ended in the cached minimum data unit.
  • step 103 can be directly executed.
  • the unwritten data in the additional data refers to data that is not written in any of the smallest data units. If there is no unwritten data in the additional data, step 104 and step 105 can be performed.
  • The first minimum data unit to be stored is written to the storage device first, and the at least one second minimum data unit to be stored is then sequentially written to the storage device.
  • Sequentially writing the at least one second minimum data unit to be stored to the storage device means writing it starting at the position in the storage device where the write of the first minimum data unit to be stored ended. Because the first minimum data unit to be stored is written by overwriting, that position may also be regarded as the position where the write operation of the previous additional write request ended in the storage device.
  • The at least one second minimum data unit to be stored may also be written directly to the storage device, that is, written starting at the position in the storage device where the write operation of the previous additional write request ended.
  • The minimum data unit is a data storage structure, and the additional data is split according to the minimum data unit. That is, at least part of the additional data is written into the cached minimum data unit and the unwritten data is written into at least one minimum data unit; in other words, the additional data is converted, according to the data storage structure of the minimum data unit, into a data organization form of at least one minimum data unit, each of which includes at least part of the additional data of one additional write request.
  • Writing the first minimum data unit to be stored to the storage device and sequentially writing the at least one second minimum data unit to be stored to the storage device refers to storing the minimum data units in the storage device.
  • The at least part of the additional data is selected starting from the start position of the additional data. Writing the unwritten data into at least one minimum data unit means that, starting from the beginning of the unwritten data, data of the second fixed length is selected each time to generate one minimum data unit, until the remaining unwritten data is shorter than the second fixed length; a minimum data unit is also generated to store that remaining data, except that this minimum data unit is not full.
  • The length of the minimum data unit is fixed. For ease of description, this length is referred to as the "first fixed length".
  • A minimum data unit can hold additional data of the second fixed length. An unfilled minimum data unit is one in which the written data is shorter than the second fixed length; a full minimum data unit is one in which the written data equals the second fixed length.
  • The minimum data unit stored in the cache is an unfilled one. If the cache holds a minimum data unit, the additional data is written into it first until the cached minimum data unit is full, and the remaining unwritten data is then sequentially written into at least one minimum data unit.
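  • The splitting rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the application: the function name `split_additional_data` and its parameters are hypothetical, and `capacity` stands for the second fixed length.

```python
def split_additional_data(data: bytes, cached: bytes, capacity: int):
    """Split additional data across the cached unfilled unit and new units.

    `cached` is the valid data already held by the cached unfilled unit
    (empty if nothing is cached); `capacity` is the second fixed length,
    i.e. how much valid data one minimum data unit can hold.
    Returns (payload of the first unit or None, payloads of the later units).
    """
    if len(cached) >= capacity:
        raise ValueError("a cached unit must be unfilled")
    first = None
    pos = 0
    if cached:
        # Fill the cached unit first, continuing where the last write ended.
        room = capacity - len(cached)
        first = cached + data[:room]
        pos = min(room, len(data))
    # Cut the remaining data into capacity-sized pieces; the last piece may
    # be shorter, which makes it the next unfilled unit to cache.
    rest = [data[i:i + capacity] for i in range(pos, len(data), capacity)]
    return first, rest
```

  • In this sketch, whichever returned payload comes last and is shorter than `capacity` is the unfilled unit that would remain in the cache for the next additional write request.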
  • The first minimum data unit to be stored is written to the storage device by overwriting, so that it replaces the unfilled minimum data unit written to the storage device by the previous request. Because the first minimum data unit to be stored already contains the additional data written by the previous additional write request, writing it by overwriting still preserves data integrity. Writing the additional data into the cached minimum data unit first also keeps the data in the minimum data units contiguous, so that continuous and complete data can be obtained when the data is read.
  • Data is written to the storage device in the form of minimum data units of fixed length, so the storage location of the data can be determined by calculation alone, and no index relationship between the data and its storage location needs to be maintained, which improves the convenience of operation.
  • In this embodiment the minimum data units are written to the storage device sequentially rather than randomly, which improves the efficiency of the write operation, while the partial overwrite ensures the integrity and continuity of the data.
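  • The write path can be illustrated with a small in-memory model. The sketch below is an assumption-laden illustration rather than the method of the application: the class `AppendWriter`, the helper `pack_unit`, the 4096-byte unit length, and a metadata area that holds only a 4-byte valid length are all hypothetical choices, and any seekable binary file object stands in for the storage device.

```python
import io
import struct

UNIT_LEN = 4096                  # first fixed length (assumed: 4K)
META_LEN = 4                     # third fixed length (assumed: one uint32)
DATA_LEN = UNIT_LEN - META_LEN   # second fixed length


def pack_unit(valid: bytes) -> bytes:
    """Pad valid data to DATA_LEN and append its length as tail metadata."""
    return valid.ljust(DATA_LEN, b"\0") + struct.pack("<I", len(valid))


class AppendWriter:
    def __init__(self, device):
        self.device = device     # seekable binary file-like object
        self.full_units = 0      # number of completely filled units on the device
        self.cached = b""        # valid data of the cached unfilled unit

    def append(self, data: bytes) -> None:
        # Continue from the cached unfilled unit, then cut into units.
        buf = self.cached + data
        units = [buf[i:i + DATA_LEN] for i in range(0, len(buf), DATA_LEN)]
        # Seeking to the start of the unfilled unit makes the first write an
        # overwrite; the remaining units follow it sequentially.
        self.device.seek(self.full_units * UNIT_LEN)
        for valid in units:
            self.device.write(pack_unit(valid))
        # Keep the last unit cached only if it is not full.
        if units and len(units[-1]) < DATA_LEN:
            self.full_units += len(units) - 1
            self.cached = units[-1]
        else:
            self.full_units += len(units)
            self.cached = b""
```

  • For example, with `dev = io.BytesIO()` and `w = AppendWriter(dev)`, calling `w.append(b"a" * 10)` writes one unfilled unit; a following `w.append(b"b" * DATA_LEN)` overwrites that unit with a full one and writes a second, unfilled unit directly after it.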
  • The embodiment shown in FIG. 1 describes the data processing method mainly from the perspective of the data writing process, by which data is written into the storage device. When the data is read from the storage device, its storage location can be located by calculation. FIG. 2 is a flowchart of another embodiment of a data processing method provided by the embodiments of the present application, and the method may include the following steps:
  • the read data request can be sent by the requesting end.
  • the request start location corresponding to the read data request and the request offset may be first determined.
  • the request start position and the request offset may be carried in each read data request.
  • the request offset may refer to the data length of the requested object.
  • the start boundary of any one of the minimum data units can be located, based on the request offset and the first fixed length of the minimum data unit, that is, the number of minimum data units that need to be read can be calculated, and thus the request is combined
  • the starting position and the request offset and the first fixed length, ie, at least one target minimum data unit can be calculated.
  • the starting position according to the request can be located to the starting boundary of a certain minimum data unit, and can be positioned to a certain minimum data unit according to the requested reading length. End the border.
  • The valid data in the at least one target minimum data unit may be read from the storage device, and the valid data of the at least one target minimum data unit is spliced in storage order to form a larger piece of data, which is fed back to the requesting end.
  • the index relationship between the data and the storage location does not need to be established, and the storage location of the data can be located by calculation, thereby improving the convenience of operation.
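  • The calculation described above can be sketched as follows. This is a minimal illustration, not the method as claimed: it assumes the request start position is a logical byte offset into the valid data of the data file, that data storage areas are filled contiguously, and that the sizes `DATA_LEN` and `META_LEN` and the names `UnitSlice` and `locate_units` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sizes chosen only for illustration; the source does not state them.
DATA_LEN = 4000                  # second fixed length: data storage area
META_LEN = 96                    # third fixed length: metadata description area
UNIT_LEN = DATA_LEN + META_LEN   # first fixed length: whole minimum data unit


@dataclass(frozen=True)
class UnitSlice:
    unit_index: int     # which minimum data unit to read
    device_offset: int  # byte offset of that unit on the storage device
    start: int          # first wanted byte inside the data storage area
    end: int            # one past the last wanted byte


def locate_units(request_start: int, request_length: int) -> list[UnitSlice]:
    """Map a logical read range onto the minimum data units that hold it."""
    if request_start < 0 or request_length <= 0:
        return []
    request_end = request_start + request_length
    first = request_start // DATA_LEN
    last = (request_end - 1) // DATA_LEN
    slices = []
    for index in range(first, last + 1):
        unit_base = index * DATA_LEN  # logical offset of this unit's first data byte
        start = max(request_start, unit_base) - unit_base
        end = min(request_end, unit_base + DATA_LEN) - unit_base
        slices.append(UnitSlice(index, index * UNIT_LEN, start, end))
    return slices
```

  • Under these assumptions, no lookup table is consulted: the unit index and the device offset follow from integer division and multiplication alone.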
  • The minimum data unit may include a data storage area and a metadata description area located at the tail of the data storage area; the additional data is written into the data storage area, and the metadata of the additional data is written into the metadata description area.
  • the metadata is generated based on the received additional data, and can be used to indicate related information such as attributes of the additional data.
  • The minimum data unit has a fixed length, and the data storage area and the metadata description area are also fixed in length.
  • The additional data written into the data storage area is the valid data of the minimum data unit.
  • The length of the minimum data unit is a first fixed length, the length of the data storage area is a second fixed length, and the length of the metadata description area is a third fixed length, where the first fixed length is equal to the sum of the second fixed length and the third fixed length.
  • each of the minimum data units 300 is composed of a data storage area 301 and a metadata description area 302, and the metadata description area 302 is located at the end of the data storage area 301.
  • The data storage area 301 stores valid data, that is, the data requested to be written, as shown in the shaded portion in FIG. 3; the metadata description area 302 is used to store the metadata of the valid data.
  • Sequentially writing at least part of the additional data to the cached minimum data unit to obtain a first minimum data unit to be stored, and writing the unwritten data in the additional data to at least one minimum data unit to obtain at least one second minimum data unit to be stored, may include:
  • the additional data and its metadata are encapsulated together to form an independent data format, and the write of the additional data and the metadata can be realized by one write operation, which can reduce the number of writes and improve the efficiency of the write operation.
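  • As a rough illustration of this encapsulation, the sketch below builds one fixed-length unit from valid data and an opaque metadata blob and stores it with a single write call. The sizes, the zero padding of both areas, and the names `build_unit` and `device` are assumptions for illustration only; an in-memory `io.BytesIO` stands in for the storage device.

```python
import io

# Hypothetical sizes; the source only says that both areas have fixed lengths.
DATA_LEN = 16  # data storage area
META_LEN = 8   # metadata description area


def build_unit(valid_data: bytes, metadata: bytes) -> bytes:
    """Encapsulate valid data and its metadata into one fixed-length unit."""
    if len(valid_data) > DATA_LEN or len(metadata) > META_LEN:
        raise ValueError("data or metadata does not fit in one minimum data unit")
    data_area = valid_data.ljust(DATA_LEN, b"\x00")  # pad to the fixed length
    meta_area = metadata.ljust(META_LEN, b"\x00")    # metadata sits at the tail
    return data_area + meta_area


device = io.BytesIO()                   # stand-in for the storage device
unit = build_unit(b"hello", b"\x05")    # the metadata content here is a placeholder
written = device.write(unit)            # data and metadata go out in one write
```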
  • reading and splicing the valid data in the at least one target minimum data unit from the storage device may include:
  • data verification may be performed on valid data in each target minimum data unit to verify whether the data is complete and correct.
  • the valid data in the data storage area in at least one target minimum data unit is spliced.
  • The metadata in the metadata description area may include a data checksum of the valid data; that is, a data checksum is calculated based on the data written in the data storage area, and the data checksum is stored as metadata in the metadata description area.
  • the data checksum can be obtained by using a CRC (Cyclic Redundancy Check) algorithm, which is the same as the prior art, and is not described here.
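  • A minimal sketch of such a checksum, assuming the CRC-32 variant provided by Python's standard `zlib` module (the source does not say which CRC polynomial or width is used); the function names are hypothetical.

```python
import zlib


def data_checksum(valid_data: bytes) -> int:
    """CRC-32 of the valid data in a data storage area (assumed CRC variant)."""
    return zlib.crc32(valid_data) & 0xFFFFFFFF


def verify(valid_data: bytes, stored_checksum: int) -> bool:
    """Return True if the data still matches the checksum kept in the metadata."""
    return data_checksum(valid_data) == stored_checksum
```

  • On read, only the valid bytes of the data storage area (not the padding) would be passed to `verify`, using the valid data length recorded in the metadata.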
  • reading and splicing the valid data in the data storage area of the at least one target minimum data unit from the storage device may include:
  • The additional data of one additional write request may be divided into at least one minimum data unit and written into the storage device.
  • Data recovery needs to ensure the atomicity of requests: an additional write request either succeeds entirely or fails entirely. That is, the at least one minimum data unit corresponding to one additional write request is either all written successfully or all fail to be written, so that after a write operation failure causes the process to restart, the data can be restored to the boundary of an additional write request, which guarantees the atomicity of the additional write request.
  • The metadata may include at least the valid data length after the penultimate write operation, the valid data length after the last write operation, and whether the last written data and at least part of the data in the next minimum data unit belong to the same additional write request;
  • The metadata may also include the data checksum calculated after the penultimate write operation and the data checksum calculated after the last write operation.
  • Of course, the metadata may also include some other attribute information related to the service, which is the same as the prior art and will not be described herein.
  • each metadata description area may include at least the following fields:
  • prevSize: the valid data length in the minimum data unit after the penultimate write operation.
  • prevCrc: the data checksum of the valid data in the minimum data unit after the penultimate write operation, that is, the data checksum of the data stored in the data storage area at that point.
  • currSize: the valid data length in the minimum data unit after the last write operation.
  • currCrc: the data checksum of the valid data in the minimum data unit after the last write operation, that is, the data checksum of the data stored in the data storage area at that point.
  • hasMore: whether the last data written in the minimum data unit and at least part of the data in the next minimum data unit belong to the same additional write request.
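  • The five fields above could be serialized into the metadata description area as in the following sketch. The field widths, the little-endian byte order, and the names `UnitMetadata` and `_META` are assumptions made for illustration; the source names the fields but does not specify their encoding.

```python
import struct
from dataclasses import dataclass

# Hypothetical encoding: four unsigned 32-bit integers and one boolean, little-endian.
_META = struct.Struct("<IIII?")


@dataclass
class UnitMetadata:
    prev_size: int   # prevSize: valid data length after the penultimate write
    prev_crc: int    # prevCrc: checksum of the valid data after the penultimate write
    curr_size: int   # currSize: valid data length after the last write
    curr_crc: int    # currCrc: checksum of the valid data after the last write
    has_more: bool   # hasMore: last write continues into the next unit

    def pack(self) -> bytes:
        """Serialize the fields for the metadata description area."""
        return _META.pack(self.prev_size, self.prev_crc,
                          self.curr_size, self.curr_crc, self.has_more)

    @classmethod
    def unpack(cls, raw: bytes) -> "UnitMetadata":
        """Parse the fields back from the start of the metadata description area."""
        return cls(*_META.unpack(raw[:_META.size]))
```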
  • If a data recovery instruction is detected, data recovery is required. That is, based on at least the valid data length after the penultimate write operation in each minimum data unit, the valid data length after the last write operation, and whether the last written data and at least part of the data in the next minimum data unit belong to the same additional write request, the data recovery length of the data file is restored to the data end position of an additional write request to maintain atomicity.
  • The field recorded in the metadata description area that indicates whether the last written data and at least part of the data in the next minimum data unit belong to the same additional write request makes it possible to determine whether two adjacent additional write requests correspond to the same minimum data unit, so that the data end boundary of any additional write request can be identified during data recovery. The data file can then be restored, based on the valid data length after the penultimate write operation or the valid data length after the last write operation, to the data end boundary of an additional write request, which guarantees the atomicity of the request.
  • In a flowchart of still another embodiment of a data processing method according to an embodiment of the present application, the method may include the following steps:
  • the data recovery command may be automatically triggered when detecting a write operation failure, and may of course be manually triggered.
  • Initialize the valid data length of the written data of the data file to zero, and initialize the scan position to the start position of the data file in the storage device.
  • step 407 is not limited to the operation sequence of the embodiment, and may be performed after or before or simultaneously with any of steps 401 to 406.
  • Step 410: Verify whether the check of the data after the penultimate write operation in the current minimum data unit succeeds; if yes, perform step 411; if no, perform step 415.
  • The metadata description area may store the data checksum after the penultimate write operation, so whether the data after the penultimate write operation checks successfully can be determined based on that data checksum.
  • Step 412: Verify whether the check of the data after the last write operation succeeds; if yes, perform step 413; if no, perform step 415.
  • The metadata description area can store the data checksum after the last write operation, so whether the data after the last write operation checks successfully can be determined based on that data checksum.
  • Step 413: Detect whether the current minimum data unit and the next minimum data unit belong to the same additional write request; if yes, return to step 409 to continue execution; if no, perform step 414.
  • If the current minimum data unit belongs to the same additional write request as the next minimum data unit, it indicates that the last data written in the current minimum data unit is not at the data end position of an additional write request, so scanning needs to continue in order to find the request boundary.
  • If the data check after the last write operation of any minimum data unit fails while the data check after the penultimate write operation succeeds, it indicates that the penultimate write operation and the last write operation do not belong to the same additional write request. In this case, since the data check after the last write operation fails, the valid data length of the written data, updated based on the valid data length after the penultimate write operation, can be taken as the recovery length of the data file.
  • Data recovery performed in this way ensures the atomicity of requests, so that the data file can be restored to the data end position of an additional write request. Therefore, the technical solution of the embodiment of the present application can ensure not only write efficiency and ease of writing, but also atomicity.
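  • The scan can be sketched as below. This is one possible reading of the recovery flow, not the exact step sequence of the flowchart: it assumes CRC-32 checksums, a logical valid-data length counted in data-storage-area bytes, and that a unit is only overwritten after it was cached as the unfilled tail of a request (so a non-zero prevSize marks a request boundary). `Unit`, `make_unit`, `recovery_length`, and `DATA_LEN` are hypothetical names.

```python
import zlib
from dataclasses import dataclass

DATA_LEN = 8  # hypothetical length of the data storage area


@dataclass
class Unit:
    data: bytes      # data storage area (valid bytes followed by padding)
    prev_size: int
    prev_crc: int
    curr_size: int
    curr_crc: int
    has_more: bool


def crc(data: bytes) -> int:
    return zlib.crc32(data) & 0xFFFFFFFF


def make_unit(valid: bytes, prev_size: int, has_more: bool) -> Unit:
    """Helper for the demo: build a consistent unit from its valid data."""
    return Unit(valid.ljust(DATA_LEN, b"\x00"), prev_size, crc(valid[:prev_size]),
                len(valid), crc(valid), has_more)


def recovery_length(units: list[Unit]) -> int:
    """Return the valid data length at the last confirmed request boundary."""
    committed = 0  # initialized to zero, scanning from the start of the file
    for index, unit in enumerate(units):
        base = index * DATA_LEN
        # Check the data as it stood after the penultimate write.
        if crc(unit.data[:unit.prev_size]) != unit.prev_crc:
            break
        if unit.prev_size > 0:
            committed = base + unit.prev_size  # an earlier request ended here
        # Check the data as it stands after the last write.
        if crc(unit.data[:unit.curr_size]) != unit.curr_crc:
            break
        if not unit.has_more:
            committed = base + unit.curr_size  # the last write ends a request
        # Otherwise the request continues in the next unit: keep scanning.
    return committed
```

  • For example, with a 10-byte request followed by a 9-byte request, a failed check in the third unit rolls the file back to 10 bytes, the end of the first request, rather than to the partially written second request.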
  • If the data check after the last write operation in any minimum data unit fails, the last written data may be deleted and the minimum data unit may be cached, so that additional write operations can continue to be performed.
  • the writing the unwritten data in the additional data to the data storage area in the at least one minimum data unit may include:
  • If the data storage area of any minimum data unit is not full, the data storage area is filled up with a preset character after the tail of the data written in it.
  • Sequentially writing at least part of the additional data into the data storage area in the cached minimum data unit includes:
  • the preset character may be a character 0 or a null character.
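  • The organization of additional data into units, including the padding, might look like the sketch below. It assumes the cache holds only the valid bytes of the unfilled unit, uses the null byte as the preset character, and treats an empty cache as "start a fresh unit"; `organize`, `DATA_LEN`, and `PAD` are hypothetical names.

```python
DATA_LEN = 8    # hypothetical length of the data storage area
PAD = b"\x00"   # preset character (assumed to be the null character)


def _fill(valid: bytes) -> bytes:
    """Pad the tail of the written data up to the fixed data-area length."""
    return valid.ljust(DATA_LEN, PAD)


def organize(cached: bytes, additional: bytes) -> tuple[bytes, list[bytes], bytes]:
    """Split additional data across the cached unit and new units.

    Returns (data area of the first unit to be stored,
             data areas of the second units to be stored,
             valid bytes of the unfilled unit to cache next).
    """
    room = DATA_LEN - len(cached)
    first_valid = cached + additional[:room]  # top up the cached unit first
    rest = additional[room:]                  # data not yet written
    chunks = [rest[i:i + DATA_LEN] for i in range(0, len(rest), DATA_LEN)]
    last_valid = chunks[-1] if chunks else first_valid
    new_cached = last_valid if len(last_valid) < DATA_LEN else b""
    return _fill(first_valid), [_fill(c) for c in chunks], new_cached
```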
  • the writing of the first minimum data unit to be stored to the storage device may include:
  • Each additional write request carries a write start position, and may also include a write data length and the like. Therefore, based on the write start position and the first fixed length of the minimum data unit, the position of the previous minimum data unit can be found and used as the position to be written, and the first minimum data unit to be stored is written to the storage device by overwriting at that position.
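  • A small sketch of this overwrite, under the assumption that the write start position is a logical offset into the valid data and that units are laid out back to back on the device. The sizes and the names `position_to_write` and `write_unit_at` are hypothetical, and `io.BytesIO` stands in for the storage device.

```python
import io

DATA_LEN = 8                     # hypothetical data storage area length
META_LEN = 4                     # hypothetical metadata description area length
UNIT_LEN = DATA_LEN + META_LEN   # first fixed length


def position_to_write(write_start: int) -> int:
    """Device offset of the unit that contains logical offset write_start."""
    return (write_start // DATA_LEN) * UNIT_LEN


def write_unit_at(device: io.BytesIO, write_start: int, unit: bytes) -> None:
    """Overwrite the previously stored, unfilled unit with its updated version."""
    device.seek(position_to_write(write_start))
    device.write(unit)


device = io.BytesIO()
device.write(b"A" * UNIT_LEN + b"B" * UNIT_LEN)  # two units from an earlier request
write_unit_at(device, 10, b"C" * UNIT_LEN)       # next request starts at logical 10
```

  • Here logical offset 10 falls inside the second unit, so that unit is overwritten in place and the file does not grow until further units are appended.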
  • FIG. 5 is a flowchart of still another embodiment of a data processing method according to an embodiment of the present disclosure. This embodiment describes a data acquisition process, and the method may include the following steps:
  • The valid data written in each of the minimum data units includes at least part of the additional data in one additional write request, or at least part of the additional data in one additional write request and at least part of the additional data in the next additional write request.
  • the calculating, according to the first fixed length of the minimum data unit, the at least one target minimum data unit corresponding to the read data request comprises:
  • The minimum data unit may include a data storage area and a metadata description area located at the tail of the data storage area; the data storage area is configured to store valid data; the metadata description area is configured to store metadata of the valid data;
  • the reading and splicing the valid data in the at least one target minimum data unit from the storage device includes:
  • the metadata may further include a data checksum after the penultimate write operation and a data checksum after the last write operation;
  • Verification can then be based on the data checksum after the penultimate write operation and the data checksum after the last write operation in each of the minimum data units.
  • FIG. 6 is a flowchart of still another embodiment of a data processing method according to an embodiment of the present disclosure. This embodiment is mainly described from a data recovery process, and the method may include the following steps:
  • The minimum data unit includes a data storage area and a metadata description area located at the tail of the data storage area; the data storage area is configured to store valid data, and the metadata description area is configured to store the metadata of the valid data. The valid data includes at least part of the additional data in one additional write request, or at least part of the additional data in one additional write request and at least part of the additional data in the next additional write request.
  • Restoring the data recovery length of the data file to the data end position of any additional write request to maintain atomicity, based on the valid data length after the penultimate write operation recorded in the metadata description area of the minimum data unit, the valid data length after the last write operation, and whether the last written data and at least part of the data in the next minimum data unit belong to the same additional write request, may include:
  • Initializing the valid data length of the written data of the data file to zero, and initializing the scan position to the start position of the data file in the storage device;
  • Taking the valid data length of the currently written data as the data recovery length of the data file.
  • The metadata description area may store the data checksum after the penultimate write operation, so whether the data after the penultimate write operation is verified successfully may be determined based on that data checksum.
  • the metadata description area may store the data checksum after the last write operation, so it may be determined whether the data after the last write operation is successfully verified based on the data checksum after the last write operation.
  • the additional data is written into the storage device together with the metadata of the additional data, which can reduce the number of write operations and ensure the efficiency of the write operation.
  • the additional data is reorganized according to the fixed data format of the minimum data unit, and the index relationship is not maintained.
  • the data position can be located by calculation, and the data reading is realized, thereby ensuring the convenience of the writing operation.
  • the data end boundary of the additional write request can be identified, and the atomicity of the request at the time of data recovery is ensured.
  • FIG. 7 is a schematic structural diagram of an embodiment of a data processing apparatus according to an embodiment of the present disclosure, where the apparatus may include:
  • The cache search module 701 is configured to search for the cached minimum data unit for the additional data in one additional write request;
  • The data organization module 702 is configured to sequentially write at least part of the data in the additional data to the cached minimum data unit to obtain a first minimum data unit to be stored, and to write the unwritten data in the additional data to at least one minimum data unit to obtain at least one second minimum data unit to be stored;
  • a data writing module 703 configured to write the first minimum data unit to be stored to the storage device, and sequentially write the at least one second minimum data unit to be stored into the storage device;
  • the data cache module 704 is configured to cache an unfilled minimum data unit corresponding to the additional data.
  • the data processing apparatus can further include:
  • a request receiving module configured to receive a read data request
  • a calculating module configured to calculate, according to a first fixed length of the minimum data unit, at least one target minimum data unit corresponding to the read data request;
  • the calculating module may be specifically configured to determine a request start location corresponding to the read data request and a request offset;
  • the minimum data unit includes a data storage area and a metadata description area located at a tail of the data storage area; the metadata description area is for storing metadata;
  • the data organization module can be specifically used for:
  • The metadata may include at least the valid data length after the penultimate write operation, the valid data length after the last write operation, and whether the last written data and at least part of the data in the next minimum data unit belong to the same additional write request;
  • the apparatus can also include:
  • a fault detection module for detecting a data recovery instruction
  • A data recovery module configured to restore the data recovery length of the data file to the data end position of any additional write request to maintain atomicity, based on the valid data length after the penultimate write operation recorded in the metadata description area of the minimum data unit, the valid data length after the last write operation, and whether the last written data and at least part of the data in the next minimum data unit belong to the same additional write request.
  • the data recovery module may be specifically configured to:
  • Initialize the valid data length of the written data of the data file to zero, and initialize the scan position to the start position of the data file in the storage device;
  • Take the valid data length of the currently written data as the data recovery length of the data file.
  • the metadata further includes a data checksum after the penultimate write operation and a data checksum after the last write operation;
  • When verifying the data after the penultimate write operation in the current minimum data unit, the data recovery module may specifically verify that data based on the data checksum after the penultimate write operation in the current minimum data unit;
  • When verifying the data after the last write operation in the current minimum data unit, the data recovery module may specifically verify that data based on the data checksum after the last write operation in the current minimum data unit.
  • the apparatus further comprises:
  • The cache triggering module is configured to, if the data check after the last write operation in any minimum data unit fails, delete the last written data and then cache that minimum data unit.
  • When writing the unwritten data in the additional data into the data storage area in the at least one minimum data unit, the data organization module may specifically write the unwritten data into the data storage area of the at least one minimum data unit; if the data storage area of any minimum data unit is not full, the data storage area is filled up with a preset character after the tail of the data written in it;
  • When sequentially writing at least part of the data in the additional data to the data storage area in the cached minimum data unit, the data writing module may specifically write at least part of the additional data into the cache.
  • When writing the first minimum data unit to be stored to the storage device, the data writing module may specifically determine the position to be written based on the write start position of the additional write request and the first fixed length of the minimum data unit;
  • the data processing apparatus of the embodiment shown in FIG. 7 can be implemented as a computing device.
  • The computing device may be deployed as a data storage node in a distributed storage system or the like. A data storage node is a node in a distributed storage system that is responsible for processing write data requests, read data requests, and the like; the distributed storage system is composed of a plurality of data storage nodes.
  • the computing device can include a storage component 801 and a processing component 802;
  • the storage component 801 stores one or more computer instructions for the processing component 802 to invoke execution.
  • the processing component 802 is configured to:
  • Cache the unfilled minimum data unit corresponding to the additional data.
  • The unfilled minimum data unit corresponding to the additional data may be cached in the storage component 801.
  • The storage device may be an external storage medium of the computing device; of course, it may also refer to the storage component 801.
  • the processing component may further perform the data processing method described in any of the embodiments of FIG. 1 to FIG.
  • the processing component 802 can include one or more processors to execute computer instructions to perform all or part of the steps described above.
  • The processing component can also be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), or field programmable gate arrays (FPGAs).
  • Storage component 801 is configured to store various types of data to support operation at the computing device.
  • The storage component can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable programmable read only memory (EPROM), programmable read only memory (PROM), read only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • Of course, the computing device may also include other components, such as input/output interfaces, communication components, and the like.
  • the input/output interface provides an interface between the processing component and the peripheral interface module, and the peripheral interface module may be an output device, an input device, or the like.
  • the communication component is configured to facilitate wired or wireless communication between the computing device and other devices, such as communication with the requesting end, and the like.
  • the embodiment of the present application further provides a computer readable storage medium, which stores a computer program, and when the computer program is executed by a computer, the data processing method of any of the embodiments shown in FIG. 1 to FIG. 3 can be implemented.
  • FIG. 9 is a schematic structural diagram of still another embodiment of a data processing apparatus according to an embodiment of the present disclosure, where the apparatus may include:
  • the request receiving module 901 is configured to receive a read data request.
  • the calculating module 902 is configured to calculate, according to the first fixed length of the minimum data unit, at least one target minimum data unit corresponding to the read data request;
  • A data obtaining module 903 configured to read and splice the valid data in the at least one target minimum data unit from the storage device;
  • the data obtained after splicing can be fed back to the requesting end.
  • The valid data written in each of the minimum data units includes at least part of the additional data in one additional write request, or at least part of the additional data in one additional write request and at least part of the additional data in the next additional write request.
  • the computing module can be specifically configured to:
  • the minimum data unit includes a data storage area and a metadata description area located at a tail of the data storage area; the data storage area is for storing valid data; and the metadata description area is for storing Metadata of the valid data;
  • the reading and splicing the valid data in the at least one target minimum data unit from the storage device includes:
  • the data processing apparatus of the embodiment shown in FIG. 9 can be implemented as a computing device in which a data storage node or the like in a distributed storage system can be deployed.
  • the computing device can include a storage component 1001 and a processing component 1002;
  • The storage component 1001 stores one or more computer instructions, wherein the one or more computer instructions are for execution by the processing component 1002.
  • the processing component 1002 is configured to:
  • Reading and splicing valid data in the at least one target minimum data unit from the storage device;
  • The valid data written in each of the minimum data units includes at least part of the additional data in one additional write request, or at least part of the additional data in one additional write request and at least part of the additional data in the next additional write request.
  • the processing component 1002 can include one or more processors to execute computer instructions to perform all or part of the steps described above.
  • The processing component can also be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), or field programmable gate arrays (FPGAs).
  • Storage component 1001 is configured to store various types of data to support operation at the computing device.
  • the memory can be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), and erasable programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), Magnetic Memory, Flash Memory, Disk or Optical Disk.
  • Of course, the computing device may also include other components, such as input/output interfaces, communication components, and the like.
  • the input/output interface provides an interface between the processing component and the peripheral interface module, and the peripheral interface module may be an output device, an input device, or the like.
  • the communication component is configured to facilitate wired or wireless communication between the computing device and other devices, such as communication with the requesting end, and the like.
  • the embodiment of the present application further provides a computer readable storage medium, which stores a computer program, and when the computer program is executed by a computer, the data processing method of any of the embodiments shown in FIG. 4 can be implemented.
  • the computing device shown in FIG. 10 and the computing device shown in FIG. 8 may be the same computing device.
  • FIG. 11 is a schematic structural diagram of still another embodiment of a data processing apparatus according to an embodiment of the present disclosure, where the apparatus may include:
  • the fault detection module 1101 is configured to detect a data recovery instruction.
  • The data recovery module 1102 is configured to restore the data recovery length of the data file to the data end position of any additional write request to maintain atomicity, based on the valid data length after the penultimate write operation recorded in the metadata description area of the minimum data unit, the valid data length after the last write operation, and whether the last written data and at least part of the data in the next minimum data unit belong to the same additional write request;
  • The minimum data unit includes a data storage area and a metadata description area located at the tail of the data storage area; the data storage area is configured to store valid data, and the metadata description area is configured to store the metadata of the valid data. The valid data includes at least part of the additional data in one additional write request, or at least part of the additional data in one additional write request and at least part of the additional data in the next additional write request.
  • the data recovery module can be specifically configured to:
  • Take the valid data length of the currently written data as the data recovery length of the data file.
  • the data processing apparatus shown in FIG. 11 can be implemented as a computing device in which a data storage node or the like in a distributed storage system can be deployed.
  • the computing device can include a storage component 1201 and a processing component 1202;
  • The storage component 1201 stores one or more computer instructions, wherein the one or more computer instructions are for execution by the processing component 1202.
  • the processing component 1202 is configured to:
  • Based on the valid data length after the penultimate write operation recorded in the metadata description area of the minimum data unit, the valid data length after the last write operation, and whether the last written data and at least part of the data in the next minimum data unit belong to the same additional write request, restore the data recovery length of the data file to the data end position of any additional write request to maintain atomicity;
  • The minimum data unit includes a data storage area and a metadata description area located at the tail of the data storage area; the data storage area is configured to store valid data, and the metadata description area is configured to store the metadata of the valid data. The valid data includes at least part of the additional data in one additional write request, or at least part of the additional data in one additional write request and at least part of the additional data in the next additional write request.
  • the processing component 1202 can include one or more processors to execute computer instructions to perform all or part of the steps described above.
  • The processing component can also be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), or field programmable gate arrays (FPGAs).
  • Storage component 1201 is configured to store various types of data to support operation at the computing device.
  • the memory can be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), and erasable programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), Magnetic Memory, Flash Memory, Disk or Optical Disk.
  • the computing device may of course also include other components, such as input/output interfaces and communication components.
  • the input/output interface provides an interface between the processing component and the peripheral interface module, and the peripheral interface module may be an output device, an input device, or the like.
  • the communication component is configured to facilitate wired or wireless communication between the computing device and other devices, such as communication with the requesting end, and the like.
  • the embodiments of this application further provide a computer-readable storage medium that stores a computer program; when the computer program is executed by a computer, the data processing method of the embodiment shown in FIG. 6 can be implemented.
  • optionally, the computing devices shown in FIG. 12, FIG. 10, and FIG. 8 may be the same computing device.
  • the append data is written into the storage device together with its metadata, which reduces the number of write operations and keeps writes efficient.
  • the append data is reorganized according to the fixed data format of the minimum data unit, so no index relationship needs to be maintained.
  • the position of data can be found by calculation when reading, which keeps the operation convenient.
  • by using overwrite operations and recording whether two consecutive append write requests share the same minimum data unit, the data end boundary of each append write request can be identified, which ensures request atomicity during data recovery.
  • the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of an embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

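A data processing method, apparatus, and computing device. For the append data in an append write request, a cached minimum data unit is looked up (101); at least part of the append data is sequentially written into the cached minimum data unit to obtain a first to-be-stored minimum data unit (102), and the unwritten portion of the append data is written into at least one minimum data unit to obtain at least one second to-be-stored minimum data unit (103); the first to-be-stored minimum data unit is overwritten into a storage device (104), and the at least one second to-be-stored minimum data unit is sequentially written into the storage device (105); and the partially filled minimum data unit corresponding to the append data is cached (106). This ensures data integrity and improves both operational convenience and write efficiency.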

Description

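Data processing method, apparatus, and computing device
This application claims priority to Chinese Patent Application No. 201810020395.X, filed on January 9, 2018 and entitled "Data processing method, apparatus, and computing device", which is incorporated herein by reference in its entirety.
Technical Field
The embodiments of this application relate to the field of computer technology, and in particular to a data processing method, apparatus, and computing device.
Background
Append writing is a common way of writing data in storage systems: newly written data is appended after data that has already been written. It is used in many scenarios. For example, when a data file to be written is very large, it usually takes multiple write operations to write the whole file to disk, and these writes can be performed as append writes.
To guarantee the integrity of data written by append writes, one existing approach writes each piece of append data to a new location on disk and builds an index that maps the data to its storage location, so that the data can later be located through the index.
However, this approach requires an additional index to be built and maintained, which is inconvenient and inefficient.
Summary
The embodiments of this application provide a data processing method, apparatus, and computing device to solve the technical problems of inconvenient operation and low efficiency in the prior art.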
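In a first aspect, an embodiment of this application provides a data processing method, including:
looking up a cached minimum data unit for the append data in an append write request;
sequentially writing at least part of the append data into the cached minimum data unit to obtain a first to-be-stored minimum data unit, and writing the unwritten portion of the append data into at least one minimum data unit to obtain at least one second to-be-stored minimum data unit;
overwriting the first to-be-stored minimum data unit into a storage device, and sequentially writing the at least one second to-be-stored minimum data unit into the storage device;
caching the partially filled minimum data unit corresponding to the append data.
In a second aspect, an embodiment of this application provides a data processing method, including:
receiving a read data request;
calculating, based on a first fixed length of the minimum data unit, at least one target minimum data unit corresponding to the read data request;
reading, from a storage device, and splicing the valid data in the at least one target minimum data unit;
wherein the valid data written into each minimum data unit includes at least part of the append data of one append write request, or at least part of the append data of one append write request together with at least part of the append data of the next append write request.
In a third aspect, an embodiment of this application provides a data processing method, including:
detecting a data recovery instruction;
based on the valid data length after the penultimate write operation, the valid data length after the last write operation, and whether the last-written data and at least part of the data in the next minimum data unit belong to the same append write request, as recorded in the metadata description area of the minimum data unit, restoring the data file to a data recovery length that ends at the data end position of an append write request, so as to maintain atomicity;
wherein the minimum data unit includes a data storage area and a metadata description area located at the tail of the data storage area; the data storage area is used to store valid data; the metadata description area is used to store metadata of the valid data; and the valid data includes at least part of the append data of one append write request, or at least part of the append data of one append write request together with at least part of the append data of the next append write request.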
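In a fourth aspect, an embodiment of this application provides a data processing apparatus, including:
a cache lookup module, configured to look up a cached minimum data unit for the append data in an append write request;
a data organization module, configured to sequentially write at least part of the append data into the cached minimum data unit to obtain a first to-be-stored minimum data unit, and to write the unwritten portion of the append data into at least one minimum data unit to obtain at least one second to-be-stored minimum data unit;
a data writing module, configured to overwrite the first to-be-stored minimum data unit into a storage device, and to sequentially write the at least one second to-be-stored minimum data unit into the storage device;
a data caching module, configured to cache the partially filled minimum data unit corresponding to the append data.
In a fifth aspect, an embodiment of this application provides a data processing apparatus, including:
a request receiving module, configured to receive a read data request;
a calculation module, configured to calculate, based on a first fixed length of the minimum data unit, at least one target minimum data unit corresponding to the read data request;
a data acquisition module, configured to read, from a storage device, and splice the valid data in the at least one target minimum data unit;
wherein the valid data written into each minimum data unit includes at least part of the append data of one append write request, or at least part of the append data of one append write request together with at least part of the append data of the next append write request.
In a sixth aspect, an embodiment of this application provides a data processing apparatus, including:
a fault detection module, configured to detect a data recovery instruction;
a data recovery module, configured to restore, based on the valid data length after the penultimate write operation, the valid data length after the last write operation, and whether the last-written data and at least part of the data in the next minimum data unit belong to the same append write request, as recorded in the metadata description area of the minimum data unit, the data file to a data recovery length that ends at the data end position of an append write request, so as to maintain atomicity;
wherein the minimum data unit includes a data storage area and a metadata description area located at the tail of the data storage area; the data storage area is used to store valid data; the metadata description area is used to store metadata of the valid data; and the valid data includes at least part of the append data of one append write request, or at least part of the append data of one append write request together with at least part of the append data of the next append write request.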
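In a seventh aspect, an embodiment of this application provides a computing device, including a storage component and a processing component,
wherein the storage component is used to store one or more computer instructions, and the one or more computer instructions are invoked and executed by the processing component;
the processing component is configured to:
look up a cached minimum data unit for the append data in an append write request;
sequentially write at least part of the append data into the cached minimum data unit to obtain a first to-be-stored minimum data unit, and write the unwritten portion of the append data into at least one minimum data unit to obtain at least one second to-be-stored minimum data unit;
overwrite the first to-be-stored minimum data unit into a storage device, and sequentially write the at least one second to-be-stored minimum data unit into the storage device;
cache the partially filled minimum data unit corresponding to the append data.
In an eighth aspect, an embodiment of this application provides a computing device, including a storage component and a processing component,
wherein the storage component is used to store one or more computer instructions, and the one or more computer instructions are invoked and executed by the processing component;
the processing component is configured to:
receive a read data request;
calculate, based on a first fixed length of the minimum data unit, at least one target minimum data unit corresponding to the read data request;
read, from a storage device, and splice the valid data in the at least one target minimum data unit;
wherein the valid data written into each minimum data unit includes at least part of the append data of one append write request, or at least part of the append data of one append write request together with at least part of the append data of the next append write request.
In a ninth aspect, an embodiment of this application provides a computing device, including a storage component and a processing component,
wherein the storage component is used to store one or more computer instructions, and the one or more computer instructions are invoked and executed by the processing component;
the processing component is configured to:
detect a data recovery instruction;
based on the valid data length after the penultimate write operation, the valid data length after the last write operation, and whether the last-written data and at least part of the data in the next minimum data unit belong to the same append write request, as recorded in the metadata description area of the minimum data unit, restore the data file to a data recovery length that ends at the data end position of an append write request, so as to maintain atomicity;
wherein the minimum data unit includes a data storage area and a metadata description area located at the tail of the data storage area; the data storage area is used to store valid data; the metadata description area is used to store metadata of the valid data; and the valid data includes at least part of the append data of one append write request, or at least part of the append data of one append write request together with at least part of the append data of the next append write request.
In the embodiments of this application, for the append data in an append write request, a cached minimum data unit is first looked up. At least part of the append data is then sequentially written into the cached minimum data unit to obtain a first to-be-stored minimum data unit, and the unwritten portion of the append data is written into at least one minimum data unit to obtain at least one second to-be-stored minimum data unit. The first to-be-stored minimum data unit is then overwritten into the storage device and the at least one second to-be-stored minimum data unit is sequentially written into the storage device, which completes the writing of the append data. The partially filled minimum data unit corresponding to the append data remains cached, so the next append write request can proceed in the same way. Because the minimum data unit has a fixed length, no additional index needs to be built or maintained: the storage location of data can be found by calculation, which improves convenience. Because minimum data units are written to the storage device sequentially, data integrity and write efficiency are ensured.
These and other aspects of this application will be easier to understand from the description of the following embodiments.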
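Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of one embodiment of a data processing method provided by this application;
FIG. 2 is a flowchart of another embodiment of a data processing method provided by this application;
FIG. 3 is a schematic structural diagram of a minimum data unit in an embodiment of this application;
FIG. 4 is a flowchart of another embodiment of a data processing method provided by this application;
FIG. 5 is a flowchart of another embodiment of a data processing method provided by this application;
FIG. 6 is a flowchart of another embodiment of a data processing method provided by this application;
FIG. 7 is a schematic structural diagram of one embodiment of a data processing apparatus provided by this application;
FIG. 8 is a schematic structural diagram of one embodiment of a computing device provided by this application;
FIG. 9 is a schematic structural diagram of another embodiment of a data processing apparatus provided by this application;
FIG. 10 is a schematic structural diagram of another embodiment of a computing device provided by this application;
FIG. 11 is a schematic structural diagram of another embodiment of a data processing apparatus provided by this application;
FIG. 12 is a schematic structural diagram of another embodiment of a computing device provided by this application.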
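Detailed Description
To help those skilled in the art better understand the solutions of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings.
Some of the flows described in the specification, the claims, and the drawings contain multiple operations that appear in a particular order. It should be clearly understood that these operations may be performed out of the order in which they appear here, or in parallel. Operation numbers such as 101 and 102 merely distinguish the operations from one another and do not imply any execution order. These flows may also include more or fewer operations, which may be performed sequentially or in parallel. Note that terms such as "first" and "second" are used here to distinguish different messages, devices, modules, and so on; they do not indicate a sequence, nor do they require that the "first" and "second" items be of different types.
The technical solutions of the embodiments of this application are applied in storage systems, in particular distributed storage systems.
For ease of understanding, the technical terms that may appear in the embodiments of this application are explained first:
Append write: a way of writing data in which newly written data is appended after data that has already been written. An operation that writes data in this way is called an append write operation, and writing in this way is called append writing.
Overwrite: a way of writing data in which the data currently being written replaces the corresponding data that was written earlier. An operation that writes data in this way is called an overwrite operation, and writing in this way is called overwriting.
Sequential write: a way of writing data in which the positions of successive write operations are contiguous. An operation that writes data in this way is called a sequential write operation, and writing in this way is called sequential writing.
Metadata: data that describes the attributes of data, such as data length and data state.
Append write request: a write request triggered to write data by append writing; it carries the append data to be written.
Request atomicity: an append write request either succeeds entirely or fails entirely.
Storage device: a hardware device in a storage system used to store data. Data is ultimately written into the storage device, which may be a storage medium such as a disk.
Minimum data unit: a data storage structure defined in the embodiments of this application. The append data in an append write request is split by minimum data unit, which converts the append data into an organization of at least one minimum data unit. The minimum data unit is the smallest unit in which data is written to or read from the storage device, and its length is fixed. For convenience this length is called the first fixed length. It may equal the sector length of a physical sector in the storage device or a multiple of the sector length, for example 4K (K stands for kilobyte) or a multiple of 4K.
In many scenarios where data is written by append writing, data integrity must be guaranteed, but the storage device itself cannot guarantee the integrity of a single write that spans multiple sectors. One current approach writes each piece of append data to a new location in the storage device and locates data through an index. Although this guarantees data integrity and prevents data corruption, it requires an additional index to be built and maintained, which is inconvenient and makes writes less efficient.
To keep writes efficient, guarantee data integrity, and make the operation more convenient, the inventors arrived at the technical solution of this application after a series of studies. In the embodiments of this application, for the append data in an append write request, a cached minimum data unit is first looked up. At least part of the append data is then sequentially written into the cached minimum data unit to obtain a first to-be-stored minimum data unit, and the unwritten portion of the append data is written into at least one minimum data unit to obtain at least one second to-be-stored minimum data unit. The first to-be-stored minimum data unit is then overwritten into the storage device and the at least one second to-be-stored minimum data unit is sequentially written into the storage device, which completes the writing of the append data. The partially filled minimum data unit corresponding to the append data remains cached, so the next append write request can proceed in the same way. Because the minimum data unit has a fixed length, no additional index needs to be built or maintained: the storage location of data can be found by calculation, which improves convenience. Because minimum data units are written to the storage device sequentially, data integrity and write efficiency are ensured.
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.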
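FIG. 1 is a flowchart of one embodiment of a data processing method provided by the embodiments of this application. The method may include the following steps:
101: For the append data in an append write request, look up a cached minimum data unit.
The append write request is issued against a data file, and the data file is written into the storage device through multiple append write operations. The append data in each append write request may be part of the data file, and the length of the append data may differ from one request to the next.
102: Sequentially write at least part of the append data into the cached minimum data unit to obtain a first to-be-stored minimum data unit.
Here, sequentially writing at least part of the data into the cached minimum data unit may mean writing that data into the cached minimum data unit immediately after the end position of the previous write operation.
103: Write the unwritten portion of the append data into at least one minimum data unit to obtain at least one second to-be-stored minimum data unit.
If there is no cached minimum data unit, step 103 may be performed directly.
The unwritten portion of the append data is the data that has not yet been written into any minimum data unit. Once no unwritten data remains, steps 104 and 105 can be performed.
104: Overwrite the first to-be-stored minimum data unit into the storage device.
105: Sequentially write the at least one second to-be-stored minimum data unit into the storage device.
If a first to-be-stored minimum data unit exists, it is first overwritten into the storage device, and then the at least one second to-be-stored minimum data unit is sequentially written into the storage device. Sequential writing here means writing the at least one second to-be-stored minimum data unit immediately after the position in the storage device where the write of the first to-be-stored minimum data unit ended. Because the first to-be-stored minimum data unit is written by overwriting, that end position may also be the position in the storage device where the write of the previous append write request ended.
If no first to-be-stored minimum data unit exists, the at least one second to-be-stored minimum data unit is sequentially written into the storage device directly, that is, immediately after the position in the storage device where the write of the previous append write request ended.
106: Cache the partially filled minimum data unit corresponding to the append data.
The minimum data unit is a data storage structure, and the append data is split according to it. Writing at least part of the append data into a minimum data unit, and writing the unwritten portion of the append data into at least one minimum data unit, as described in the embodiments of this application, therefore means converting the append data according to the data storage structure of the minimum data unit, so that the append data is organized as at least one minimum data unit. Each minimum data unit includes at least part of the append data of one append write request.
Overwriting the first to-be-stored minimum data unit into the storage device and sequentially writing the at least one second to-be-stored minimum data unit into the storage device refer to storing the minimum data units in the storage device, which is how the append data is stored.
The at least part of the append data is taken starting from the beginning of the append data. Writing the unwritten data into at least one minimum data unit means that, starting from the beginning of the unwritten data, data of a second fixed length is taken each time to generate one minimum data unit, until the remaining unwritten data is shorter than the second fixed length. A minimum data unit is generated for that remainder as well, except that this unit is not full.
The length of the minimum data unit is fixed; to distinguish it, it is called the "first fixed length". A minimum data unit can hold append data of the second fixed length. A partially filled minimum data unit is one in which less than the second fixed length of data has been written; a full minimum data unit is one in which the data written equals the second fixed length.
It follows that a minimum data unit held in the cache is a partially filled one. If the cache holds a minimum data unit, the append data is first sequentially written into the cached minimum data unit until that unit is full, and the remaining unwritten data is then written into at least one further minimum data unit in turn.
Because the cached minimum data unit was already written into the storage device by the previous append write operation, the first to-be-stored minimum data unit is written into the storage device by overwriting, replacing the partially filled minimum data unit written last time. Since the first to-be-stored minimum data unit already contains the append data of the previous append write request, overwriting still preserves data integrity. Writing append data into the cached minimum data unit first also keeps the data within minimum data units contiguous, so that contiguous and complete data is obtained when reading.
In this embodiment, data is written into the storage device in fixed-length minimum data units, so the storage location of data can be determined by calculation alone when reading, without maintaining an additional index from data to storage location, which improves convenience. Minimum data units are written into the storage device sequentially rather than randomly, which improves write efficiency, and the partial overwrite operation preserves the integrity and contiguity of the data.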
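Steps 101 to 106 can be sketched as follows. This is a minimal illustration rather than the implementation described in this application: the class name `AppendWriter`, the in-memory list standing in for the storage device, and the tiny `DATA_LEN` value are all assumptions made for the example, and the metadata description area is left out.

```python
from typing import List, Optional

DATA_LEN = 4  # hypothetical "second fixed length": payload bytes per unit (tiny for the demo)


class AppendWriter:
    """Toy model of steps 101-106; metadata is omitted for brevity."""

    def __init__(self) -> None:
        self.device: List[bytes] = []        # stands in for the storage device
        self.cached: Optional[bytes] = None  # the partially filled unit, if any

    def append(self, data: bytes) -> None:
        rest = data
        last: Optional[bytes] = None
        # 101/102: fill the cached unit first, continuing where the last write ended.
        if self.cached is not None:
            room = DATA_LEN - len(self.cached)
            first = self.cached + rest[:room]
            rest = rest[room:]
            # 104: overwrite the unit that the previous request left partially filled.
            self.device[-1] = first
            last = first
        # 103/105: cut the unwritten data into new units and write them sequentially.
        for i in range(0, len(rest), DATA_LEN):
            last = rest[i:i + DATA_LEN]
            self.device.append(last)
        # 106: keep the tail unit in the cache only if it is not full.
        self.cached = last if last is not None and len(last) < DATA_LEN else None
```

In this sketch only the last unit on the device is ever rewritten; every other unit is written exactly once and in order, which is the property the embodiment relies on.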
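The data processing method of the embodiment shown in FIG. 1 is described mainly from the perspective of the write flow. Once data has been written into the storage device according to that method, the storage location of data can be found by calculation when reading from the storage device. FIG. 2 is a flowchart of another embodiment of a data processing method provided by the embodiments of this application. The method may include the following steps:
201: For the append data in an append write request, look up a cached minimum data unit.
202: Sequentially write at least part of the append data into the cached minimum data unit to obtain a first to-be-stored minimum data unit.
203: Write the unwritten portion of the append data into at least one minimum data unit to obtain at least one second to-be-stored minimum data unit.
204: Overwrite the first to-be-stored minimum data unit into the storage device.
205: Sequentially write the at least one second to-be-stored minimum data unit into the storage device.
206: Cache the partially filled minimum data unit corresponding to the append data.
For steps 201 to 206, see steps 101 to 106 in the embodiment above; they are not repeated here.
207: Receive a read data request.
The read data request may be sent by a requesting end.
208: Based on the first fixed length of the minimum data unit, calculate at least one target minimum data unit corresponding to the read data request.
Optionally, the request start position and request offset corresponding to the read data request may be determined first.
Each read data request may carry the request start position and the request offset.
The request offset may be the length of the data requested to be read.
The request start position locates the start boundary of a minimum data unit, and the request offset together with the first fixed length of the minimum data unit gives the number of minimum data units to read. Combining the request start position, the request offset, and the first fixed length therefore yields the at least one target minimum data unit.
Suppose the data file starts at position 0K (K stands for kilobyte) in the storage device and that append writes to the file begin at that position. If the request start position of a read data request is 8K, the first fixed length is 4K, and the requested read length is 12K, then 12K/4K = 3 target minimum data units must be read starting from the 8K position, and the request end position is 8K + 12K = 20K.
Because the minimum data unit is the smallest unit for reading and writing data in the storage device, the request start position locates the start boundary of some minimum data unit and the requested read length locates the end boundary of some minimum data unit.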
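The arithmetic of the 8K/12K example can be written out as a short function. This is a sketch under stated assumptions: the function name `target_units` is hypothetical, positions are taken to be measured from the start of the data file, and ceiling-style rounding is used so that the function also behaves sensibly for unaligned requests, which the text does not discuss.

```python
KIB = 1024


def target_units(start: int, length: int, unit_len: int = 4 * KIB) -> range:
    """Return the indices of the minimum data units covered by a read request.

    Assumes positions are measured from the start of the data file and that
    `length` is greater than zero.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    first = start // unit_len
    last = (start + length - 1) // unit_len
    return range(first, last + 1)


units = target_units(start=8 * KIB, length=12 * KIB)
print(list(units), len(units))  # [2, 3, 4] 3
```

No lookup table is involved: the unit indices follow from integer division alone, which is the point of using a fixed unit length.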
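209: Read, from the storage device, and splice the valid data in the at least one target minimum data unit.
After the at least one target minimum data unit is determined, the valid data in the at least one target minimum data unit can be read from the storage device, spliced in storage order into one larger piece of data, and returned to the requesting end.
In this embodiment, no index from data to storage location needs to be built; the storage location of data can be found by calculation, which improves convenience.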
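In some embodiments, the minimum data unit may include a data storage area and a metadata description area located at the tail of the data storage area. The append data is written into the data storage area, and the metadata of the append data forms the metadata description area. The metadata is generated from the received append data and may represent attributes and other related information of the append data.
The length of the minimum data unit is fixed, and so are the lengths of its data storage area and metadata description area. The append data written into the data storage area is the valid data of the minimum data unit. The length of the minimum data unit is the first fixed length, the length of the data storage area is the second fixed length, and the length of the metadata description area is the third fixed length; the first fixed length equals the sum of the second and third fixed lengths.
As shown in the schematic structural diagram of FIG. 3, each minimum data unit 300 consists of a data storage area 301 and a metadata description area 302, with the metadata description area 302 located at the tail of the data storage area 301.
The data storage area 301 stores the valid data, that is, the data requested to be written, shown as the shaded part in FIG. 3; the metadata description area 302 stores the metadata of the valid data.
Therefore, in some embodiments, appending at least part of the append data into the cached minimum data unit to obtain a first to-be-stored minimum data unit, and writing the unwritten portion of the append data into at least one minimum data unit to obtain at least one second to-be-stored minimum data unit, may include:
appending at least part of the append data into the data storage area of the cached minimum data unit, and modifying the metadata description area of the cached minimum data unit based on that data, to obtain the first to-be-stored minimum data unit;
writing the unwritten portion of the append data into the data storage area of at least one minimum data unit, and generating the metadata description area corresponding to each data storage area based on the data written into it, to obtain the at least one second to-be-stored minimum data unit.
In this embodiment, the append data and its metadata are packaged together into a self-contained data format, so a single write operation writes both the append data and the metadata, which reduces the number of writes and improves write efficiency.
Optionally, if the minimum data unit includes a data storage area and a metadata description area, then in some embodiments reading and splicing the valid data in the at least one target minimum data unit from the storage device may include:
reading, from the storage device, and splicing the valid data in the data storage area of the at least one target minimum data unit.
In addition, when the valid data in the data storage area of the at least one target minimum data unit is read, the valid data in each target minimum data unit may be verified to check whether the data is complete and correct;
if verification passes, the valid data in the data storage areas of the at least one target minimum data unit is then spliced.
To facilitate data verification, the metadata in the metadata description area may optionally include a data checksum of the valid data; that is, a data checksum is calculated from the data written into the data storage area and stored as metadata in the metadata description area.
The data checksum may be calculated with a CRC (cyclic redundancy check) algorithm, as in the prior art, which is not described further here.
Reading and splicing the valid data in the data storage area of the at least one target minimum data unit from the storage device may therefore include:
reading, from the storage device, the valid data in the data storage area of the at least one target minimum data unit;
verifying the valid data read from the data storage area of each target minimum data unit against the data checksum in the metadata description area of that target minimum data unit;
if the valid data in the data storage areas of all of the at least one target minimum data unit passes verification, splicing the valid data in the data storage areas of the at least one target minimum data unit.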
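The read, verify, and splice sequence can be illustrated as below. The sketch assumes CRC-32 as computed by Python's standard `zlib.crc32`; the text only says "a CRC algorithm", so the specific polynomial is an assumption, as are the names `Unit`, `make_unit`, and `read_and_splice`.

```python
import zlib
from typing import List, NamedTuple


class Unit(NamedTuple):
    data: bytes    # contents of the data storage area (valid data only)
    checksum: int  # data checksum stored in the metadata description area


def make_unit(data: bytes) -> Unit:
    """Build a unit whose stored checksum matches its data."""
    return Unit(data, zlib.crc32(data))


def read_and_splice(units: List[Unit]) -> bytes:
    """Verify every target unit, then splice the valid data in storage order."""
    for index, unit in enumerate(units):
        if zlib.crc32(unit.data) != unit.checksum:
            raise ValueError(f"checksum mismatch in unit {index}")
    return b"".join(unit.data for unit in units)
```

Nothing is returned to the caller unless every target unit passes, which matches the "all units verified, then splice" order described above.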
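In addition, the append data of one append write request may be divided into one or more minimum data units when written into the storage device. During a write operation, write failures such as an abnormal process crash or a power loss and reboot may occur. After the process restarts, data recovery is required, and recovery must preserve request atomicity: an append write request either succeeds entirely or fails entirely, meaning that the minimum data units corresponding to one append write request are either all written successfully or all fail. In this way, after a write failure causes the process to restart, the data can be restored to the boundary of some append write request, which preserves the atomicity of append write requests.
To ensure request atomicity, in some embodiments the metadata may include the valid data length after the penultimate write operation, the valid data length after the last write operation, and whether the last-written data and at least part of the data in the next minimum data unit belong to the same append write request;
in addition, to facilitate data verification, the metadata may also include the data checksum calculated after the penultimate write operation and the data checksum calculated after the last write operation.
The metadata will of course also include other service-related attribute information, as in the prior art, which is not described further here.
That is, each metadata description area may include at least the following fields:
prevSize: the valid data length in the minimum data unit after the penultimate write operation;
prevCrc: the data checksum of the valid data in the minimum data unit after the penultimate write operation, that is, the checksum of the data stored in the data storage area at that point;
currSize: the valid data length in the minimum data unit after the last write operation;
currCrc: the data checksum of the valid data in the minimum data unit after the last write operation, that is, the checksum of the data stored in the data storage area at that point;
hasMore: whether the last-written data in the minimum data unit and at least part of the data in the next minimum data unit belong to the same append write request.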
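One possible on-disk layout for these five fields is sketched below. The field widths (four 32-bit integers plus one flag byte, padded to 20 bytes), the little-endian byte order, the 4096-byte unit length, and the use of `zlib.crc32` are all assumptions chosen for illustration; the text fixes only the field names and the fact that the metadata sits at the tail of the unit.

```python
import struct
import zlib

UNIT_LEN = 4096                   # first fixed length (whole unit), assumed 4K
META = struct.Struct("<IIIIB3x")  # prevSize, prevCrc, currSize, currCrc, hasMore, padding
DATA_LEN = UNIT_LEN - META.size   # second fixed length (data storage area)


def pack_unit(data: bytes, prev_size: int, has_more: bool) -> bytes:
    """Build one fixed-length unit: zero-padded data area followed by metadata."""
    if not prev_size <= len(data) <= DATA_LEN:
        raise ValueError("invalid sizes")
    meta = META.pack(prev_size, zlib.crc32(data[:prev_size]),
                     len(data), zlib.crc32(data), int(has_more))
    return data.ljust(DATA_LEN, b"\x00") + meta


def unpack_meta(unit: bytes) -> dict:
    """Decode the metadata description area at the tail of a unit."""
    prev_size, prev_crc, curr_size, curr_crc, has_more = META.unpack(unit[DATA_LEN:])
    return {"prevSize": prev_size, "prevCrc": prev_crc,
            "currSize": curr_size, "currCrc": curr_crc, "hasMore": bool(has_more)}
```

Because data and metadata are packed into one buffer of length `UNIT_LEN`, a single device write persists both, which is the efficiency argument made for the combined format.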
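Therefore, in some embodiments, when a data recovery instruction is detected and data recovery is needed, the data file can be restored to a data recovery length that ends at the data end position of an append write request, so as to maintain atomicity, based at least on each minimum data unit's valid data length after the penultimate write operation, its valid data length after the last write operation, and whether its last-written data and at least part of the data in the next minimum data unit belong to the same append write request.
The field in the metadata description area that records whether the last-written data and at least part of the data in the next minimum data unit belong to the same append write request makes it possible to determine whether two adjacent append write requests share the same minimum data unit. During data recovery this guarantees that the data end boundary of any append write request can be identified. Then, based on the valid data length after the penultimate write operation or the valid data length after the last write operation, the data file can be restored to the data end boundary of an append write request, which ensures request atomicity.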
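Specifically, FIG. 4 is a flowchart of another embodiment of a data processing method provided by the embodiments of this application. The method may include the following steps:
401: For the append data in an append write request, look up a cached minimum data unit.
402: Sequentially write at least part of the append data into the cached minimum data unit to obtain a first to-be-stored minimum data unit.
403: Write the unwritten portion of the append data into at least one minimum data unit to obtain at least one second to-be-stored minimum data unit.
404: Overwrite the first to-be-stored minimum data unit into the storage device.
405: Sequentially write the at least one second to-be-stored minimum data unit into the storage device.
406: Cache the partially filled minimum data unit corresponding to the append data.
For steps 401 to 406, see steps 101 to 106 in the embodiment above; they are not repeated here.
407: Detect a data recovery instruction.
The data recovery instruction may be triggered automatically when a write failure is detected, or it may be triggered manually.
408: Initialize the valid data length of the written data of the data file to zero and the initial scan position to the start position of the data file in the storage device.
Since a write failure can occur at any moment during a write operation, step 407 is not restricted to the order shown in this embodiment; it may be performed after, before, or concurrently with any of steps 401 to 406.
409: Scan the next minimum data unit.
410: Check whether the data after the penultimate write operation in the current minimum data unit passes verification; if so, go to step 411; otherwise, go to step 415.
The metadata description area may store the data checksum after the penultimate write operation, so whether the data after the penultimate write operation passes verification can be determined from that checksum.
411: Update the valid data length of the written data based on the valid data length after the penultimate write operation.
412: Check whether the data after the last write operation passes verification; if so, go to step 413; otherwise, go to step 415.
The metadata description area may store the data checksum after the last write operation, so whether the data after the last write operation passes verification can be determined from that checksum.
413: Check whether the current minimum data unit and the next minimum data unit belong to the same append write request; if so, return to step 409; otherwise, go to step 414.
If the current minimum data unit and the next minimum data unit belong to the same append write request, the last-written data in the current minimum data unit is not the data end position of an append write request, so scanning must continue to find the request boundary.
414: Update the valid data length of the written data based on the valid data length after the last write operation, and return to step 409.
415: Use the current valid data length of the written data as the data recovery length of the data file.
If, for any minimum data unit, verification of the data after the last write operation fails while verification of the data after the penultimate write operation succeeds, the penultimate write operation and the last write operation do not belong to the same append write request. In that case, the valid data length of the written data, as updated from the valid data length after the penultimate write operation, can be used as the recovery length of the data file.
In this embodiment, if a write failure occurs, data recovery preserves request atomicity, so that the data file can be restored to the data end position of an append write request. The technical solution of the embodiments of this application therefore not only keeps writes efficient and convenient but also ensures request atomicity.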
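The scan in steps 408 to 415 can be sketched as a single loop. Several points here are interpretations rather than statements from the text: the recovery length is counted in bytes of valid data, with unit `index` contributing an offset of `index * DATA_LEN`; step 411 is applied only when prevSize is non-zero, on the assumption that a zero prevSize means the unit was written once and therefore marks no earlier request boundary; and CRC-32 from `zlib` stands in for the checksum. The names `Unit` and `recover_length` are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import List

DATA_LEN = 8  # hypothetical payload capacity per unit (tiny for the demo)


@dataclass
class Unit:
    data: bytes     # data storage area, valid bytes only
    prev_size: int  # prevSize
    prev_crc: int   # prevCrc
    curr_size: int  # currSize
    curr_crc: int   # currCrc
    has_more: bool  # hasMore


def recover_length(units: List[Unit]) -> int:
    valid_len = 0                                # step 408
    for index, unit in enumerate(units):         # step 409
        base = index * DATA_LEN
        # step 410: check the state left by the penultimate write
        if zlib.crc32(unit.data[:unit.prev_size]) != unit.prev_crc:
            break                                # step 415
        # step 411 (assumption: only a non-zero prevSize marks a request boundary)
        if unit.prev_size > 0:
            valid_len = base + unit.prev_size
        # step 412: check the state left by the last write
        if zlib.crc32(unit.data[:unit.curr_size]) != unit.curr_crc:
            break                                # step 415
        # steps 413/414: advance the boundary only when the request ends in this unit
        if not unit.has_more:
            valid_len = base + unit.curr_size
    return valid_len
```

With this reading, a request that spans several units only moves the recovery length forward once its final unit verifies, so a failure partway through leaves the file at the end of the previous request.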
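In addition, in some embodiments, if verification of the data after the last write operation fails for any minimum data unit, the last-written data may be deleted and that minimum data unit then cached, so that append write operations can continue.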
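In one or more of the above embodiments, writing the unwritten portion of the append data into the data storage area of at least one minimum data unit may include:
writing the unwritten portion of the append data into the data storage area of at least one minimum data unit;
if the data storage area of any minimum data unit is not full, filling the rest of that data storage area with a preset character after the end of the data written into it;
and sequentially writing at least part of the append data into the data storage area of the cached minimum data unit includes:
writing at least part of the append data into the positions occupied by the preset character in the data storage area of the cached minimum data unit, replacing the preset character.
Optionally, the preset character may be the character 0, a null character, or the like.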
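The padding and replacement steps can be sketched as two small helpers. The zero byte used as the preset character follows the text's suggestion; the names `pad` and `append_into`, the tiny `DATA_LEN`, and the choice to track the used length separately are assumptions of this sketch.

```python
from typing import Tuple

DATA_LEN = 8     # hypothetical capacity of the data storage area
FILL = b"\x00"   # preset character; the text suggests 0 or a null character


def pad(data: bytes) -> bytes:
    """Fill the unused tail of a partially written data area with FILL."""
    if len(data) > DATA_LEN:
        raise ValueError("data does not fit in one unit")
    return data + FILL * (DATA_LEN - len(data))


def append_into(area: bytes, used: int, more: bytes) -> Tuple[bytes, int, bytes]:
    """Replace FILL bytes after `used` with new data; return (area, used, leftover)."""
    room = DATA_LEN - used
    taken, leftover = more[:room], more[room:]
    new_area = pad(area[:used] + taken)
    return new_area, used + len(taken), leftover
```

The used length is passed in explicitly because valid data may itself contain zero bytes; in the unit format this role is played by the valid data length recorded in the metadata rather than by scanning for the fill character.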
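In one or more of the above embodiments, overwriting the first to-be-stored minimum data unit into the storage device may include:
determining a to-be-written position based on the write start position of the append write request and the first fixed length of the minimum data unit;
overwriting the first to-be-stored minimum data unit into the storage device based on the to-be-written position.
Each append write request carries a write start position and may also include the length of the data to write, among other things. Based on the write start position and the first fixed length of the minimum data unit, the position of the previous minimum data unit can be found and used as the to-be-written position, so that the first to-be-stored minimum data unit overwrites the previous minimum data unit when written into the storage device.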
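FIG. 5 is a flowchart of another embodiment of a data processing method provided by the embodiments of this application. This embodiment is described from the perspective of the data retrieval flow. The method may include the following steps:
501: Receive a read data request.
502: Based on the first fixed length of the minimum data unit, calculate at least one target minimum data unit corresponding to the read data request.
503: Read, from the storage device, and splice the valid data in the at least one target minimum data unit.
The valid data written into each minimum data unit includes at least part of the append data of one append write request, or at least part of the append data of one append write request together with at least part of the append data of the next append write request.
For how each minimum data unit is generated and written, see the embodiment shown in FIG. 1 above; this is not repeated here.
Optionally, in some embodiments, calculating the at least one target minimum data unit corresponding to the read data request based on the first fixed length of the minimum data unit includes:
determining the request start position and request offset corresponding to the read data request;
calculating the at least one target minimum data unit corresponding to the read data request based on the first fixed length of the minimum data unit, the request start position, and the request offset.
Optionally, in some embodiments, the minimum data unit may include a data storage area and a metadata description area located at the tail of the data storage area; the data storage area is used to store valid data; and the metadata description area is used to store metadata of the valid data;
reading and splicing the valid data in the at least one target minimum data unit from the storage device includes:
reading, from the storage device, and splicing the valid data in the data storage area of the at least one target minimum data unit.
The metadata may also include the data checksum after the penultimate write operation and the data checksum after the last write operation;
therefore, after the valid data in the data storage area of the at least one target minimum data unit is read from the storage device, data verification can be performed for each minimum data unit against its data checksum after the penultimate write operation and its data checksum after the last write operation. If both checksums verify successfully for every minimum data unit, the valid data in the data storage areas of the at least one target minimum data unit is then spliced.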
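FIG. 6 is a flowchart of another embodiment of a data processing method provided by the embodiments of this application. This embodiment is described mainly from the perspective of the data recovery flow. The method may include the following steps:
601: Detect a data recovery instruction.
602: Based on the valid data length after the penultimate write operation, the valid data length after the last write operation, and whether the last-written data and at least part of the data in the next minimum data unit belong to the same append write request, as recorded in the metadata description area of the minimum data unit, restore the data file to a data recovery length that ends at the data end position of an append write request, so as to maintain atomicity;
wherein the minimum data unit includes a data storage area and a metadata description area located at the tail of the data storage area; the data storage area is used to store valid data; the metadata description area is used to store metadata of the valid data; and the valid data includes at least part of the append data of one append write request, or at least part of the append data of one append write request together with at least part of the append data of the next append write request.
For how minimum data units are generated and written, see the write flow described in the embodiments above; this is not repeated here.
Optionally, restoring the data file in this way, based on the valid data length after the penultimate write operation, the valid data length after the last write operation, and whether the last-written data and at least part of the data in the next minimum data unit belong to the same append write request, may include:
initializing the valid data length of the written data of the data file to zero and the initial scan position to the start position of the data file in the storage device;
scanning the next minimum data unit;
verifying the data after the penultimate write operation in the current minimum data unit;
if the data after the penultimate write operation in the current minimum data unit passes verification, updating the valid data length of the written data based on the valid data length after the penultimate write operation in the current minimum data unit;
verifying the data after the last write operation in the current minimum data unit;
if the data after the last write operation in the current minimum data unit passes verification, checking whether the current minimum data unit and the next minimum data unit belong to the same append write request;
if so, returning to the step of scanning the next minimum data unit;
if not, updating the valid data length of the written data based on the valid data length after the last write operation in the current minimum data unit, and returning to the step of scanning the next minimum data unit;
if, for any minimum data unit, verification of the data after the penultimate write operation fails or verification of the data after the last write operation fails, using the current valid data length of the written data as the data recovery length of the data file.
Optionally, the metadata description area may store the data checksum after the penultimate write operation, so whether the data after the penultimate write operation passes verification can be determined from that checksum.
Optionally, the metadata description area may store the data checksum after the last write operation, so whether the data after the last write operation passes verification can be determined from that checksum.
With the technical solution of the embodiments of this application, the append data is written into the storage device together with its metadata, which reduces the number of write operations and keeps writes efficient. The append data is reorganized according to the fixed data format of the minimum data unit, so no index needs to be maintained; the position of data can be found by calculation when reading, which keeps the operation convenient. In addition, by using overwrite operations and recording whether two consecutive append write requests share the same minimum data unit, the data end boundary of each append write request can be identified, which ensures request atomicity during data recovery.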
图7为本申请实施例提供的一种数据处理装置一个实施例的结构示意图,该装置可以包括:
缓存查找模块701,用于针对一次追加写请求中的追加数据,查找缓存的最小数据单元;
数据组织模块702,用于将所述追加数据中的至少部分数据顺序写入所述缓存的最小数据单元以获得第一待存储的最小数据单元,以及将所述追加数据中的未写入数据写入至少一个最小数据单元以获得至少一个第二待存储的最小数据单元;
数据写入模块703,用于将所述第一待存储的最小数据单元覆盖写入存储设备,以及将所述至少一个第二待存储的最小数据单元顺序写入所述存储设备;
数据缓存模块704,用于缓存所述追加数据对应的未写满的最小数据单元。
在某些实施例中,该数据处理装置还可以包括:
请求接收模块,用于接收读数据请求;
计算模块,用于基于最小数据单元的第一固定长度,计算所述读数据请求对应的至少一个目标最小数据单元;
可选地,该计算模块可以具体用于确定所述读数据请求对应的请求开始位置以及请求偏移量;
基于所述最小数据单元的第一固定长度、所述请求开始位置以及所述请求偏移量,计算所述读取请求对应的至少一个目标最小数据单元。
在某些实施例中,所述最小数据单元包括数据存储区以及位于所述数据存储区尾部的元数据描述区;所述元数据描述区用于存储元数据;
所述数据组织模块可以具体用于:
将所述追加数据中的至少部分数据顺序写入所述缓存的最小数据单元中的数据存储区,并基于所述至少部分数据修改所述缓存的最小数据单元中的元数据描述区,以获得第一待存储的最小数据单元;
将所述追加数据中的未写入数据写入至少一个最小数据单元中的数据存储区,并基于每一个数据存储区中写入的数据生成每一个数据存储区对应的元数据描述区,以获得至少一个第二待存储的最小数据单元。
其中,所述元数据可以至少包括倒数第二次写操作之后的有效数据长度、最后一次 写操作之后的有效数据长度、以及最后一次写入的数据与下一个最小数据单元中的至少部分数据是否属于同一个追加写请求;
在某些实施例中,该装置还可以包括:
故障检测模块,用于检测数据恢复指令;
数据恢复模块,用于基于最小数据单元的元数据描述区的倒数第二次写操作之后的有效数据长度、最后一次写操作之后的有效数据长度、以及最后一次写入的数据与下一个最小数据单元中的至少部分数据是否属于同一个追加写请求,恢复数据文件的数据恢复长度至任一追加写请求的数据结束位置以保持原子性。
可选地,该数据恢复模块可以具体用于:
初始化数据文件的已写入数据的有效数据长度为零以及初始扫描位置为所述存储设备中所述数据文件的起始位置;
扫描下一个最小数据单元;
校验当前最小数据单元中倒数第二次写操作之后的数据;
如果所述当前最小数据单元中倒数第二次写操作之后的数据校验通过,基于所述当最小数据单元中倒数第二次写操作之后的有效数据长度,更新所述已写入数据的有效数据长度;
校验所述当前最小数据单元中最后一次写操作之后的数据;
如果所述当前最小数据单元中最后一次写操作之后的数据校验通过,检测所述当前最小数据单元与下一个最小数据单元是否属于同一个追加写请求;
如果是,返回所述扫描下一个最小数据单元的步骤继续执行;
如果否,基于所述当前最小数据单元中最后一次写操作之后的有效数据长度,更新所述已写入数据的有效数据长度,并返回所述扫描下一个最小数据单元的步骤继续执行;
如果任一个最小数据单元中倒数第二次写操作之后的数据校验失败或者最后一次写操作之后的数据校验失败,将当前已写入有效数据长度作为所述数据文件的数据恢复长度。
在某些实施例中,所述元数据还包括倒数第二次写操作之后的数据校验和以及最后一次写操作之后的数据校验和;
所述数据恢复模块校验所述当前最小数据单元中倒数第二次写操作之后的数据可以具体是基于所述当前最小数据单元中倒数第二次写操作之后的数据校验和,校验倒数第二次写操作之后的数据;
所述数据恢复模块校验所述当前最小数据单元中最后一次写操作之后的数据可以具体是基于所述当前最小数据单元中最后一次写操作之后的数据校验和,校验倒数最后一次写操作之后的数据。
在某些实施例中,该装置还以包括:
缓存触发模块,用于如果任一个最小数据单元中最后一次写操作之后的数据校验失败,将最后一次写入的数据删除之后缓存所述任一个最小数据单元。
可选地,在某些实施例中,所述数据组织模块将所述追加数据中的未写入数据写入至少一个最小数据单元中的数据存储区可以具体是将所述追加数据中的未写入数据写入至少一个最小数据单元中数据存储区;如果任一个最小数据单元的数据存储区未写满数据,在所述任一个最小数据单元的数据存储区中写入的数据的尾部利用预设字符填满所述数据存储区;
所述数据写入模块将所述追加数据中的至少部分数据顺序写入所述缓存的最小数据单元中的数据存储区可以具体是将所述追加数据中的至少部分数据写入所述缓存的最小数据单元的数据存储区中预设字符所在位置,以替换所述预设字符。
在某些实施例中,所述数据写入模块将所述第一待存储的最小数据单元覆盖写入存储设备可以具体是基于所述追加写请求的写入起始位置以及所述最小数据单元的第一固定长度,确定待写入位置;
基于所述待写入位置,将所述第一待存储的最小数据单元覆盖写入所述存储设备。
在一个可能的设计中,图7所示实施例的数据处理装置可以实现为一计算设备,在分布式存储系统中,该计算设备可以部署分布式存储系统中的数据存储节点等,数据存储节点即为分布式存储系统中负责处理写数据请求或者读数据请求等的节点,分布式存储系统由多个数据存储节点构成。
如图8所示,该计算设备可以包括存储组件801以及处理组件802;
存储组件801存储一条或多条计算机指令,其中,所述一条或多条计算机指令供所述处理组件802调用执行。
所述处理组件802用于:
针对一次追加写请求中的追加数据,查找缓存的最小数据单元;
将所述追加数据中的至少部分数据顺序写入所述缓存的最小数据单元以获得第一待存储的最小数据单元,以及将所述追加数据中的未写入数据写入至少一个最小数据单元 以获得至少一个第二待存储的最小数据单元;
将所述第一待存储的最小数据单元覆盖写入存储设备,以及将所述至少一个第二待存储的最小数据单元顺序写入所述存储设备;
缓存所述追加数据对应的未写满的最小数据单元。
其中,可以是在存储组件801中缓存所述追加数据对应的未写满的最小数据单元。
该存储设备可以为该计算设备的外部存储介质,当然在某些实现场景中,也可以即是指该存储组件901。
可选地,还处理组件还可以执行图1~图4任一实施例中所述的数据处理方法。
其中,处理组件802可以包括一个或多个处理器来执行计算机指令,以完成上述的方法中的全部或部分步骤。当然处理组件也可以为一个或多个应用专用集成电路(ASIC)、数字信号处理器(DSP)、数字信号处理设备(DSPD)、可编程逻辑器件(PLD)、现场可编程门阵列(FPGA)、控制器、微控制器、微处理器或其他电子元件实现,用于执行上述方法。
存储组件801被配置为存储各种类型的数据以支持在计算设备的操作。存储组件可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。
当然,计算设备必然还可以包括其他部件,例如输入/输出接口、通信组件等。
输入/输出接口为处理组件和外围接口模块之间提供接口,上述外围接口模块可以是输出设备、输入设备等。
通信组件被配置为便于计算设备和其他设备之间有线或无线方式的通信,例如和请求端的通信等。
本申请实施例还提供了一种计算机可读存储介质,存储有计算机程序,所述计算机程序被计算机执行时可以实现上述图1~图3所示任一实施例的数据处理方法。
图9为本申请实施例提供的一种数据处理装置又一个实施例的结构示意图,该装置可以包括:
请求接收模块901,用于接收读数据请求;
计算模块902,用于基于最小数据单元的第一固定长度,计算所述读数据请求对应 的至少一个目标最小数据单元;
数据获取模块903,用于从存储设备中,读取并拼接所述至少一个目标最小数据单元中的有效数据;
拼接之后获得的数据即可以反馈给请求端。
其中,每一个最小数据单元中写入的有效数据至少包括一次追加写请求中的至少部分追加数据或者一次追加写请求中的至少部分追加数据以及下一次追加写请求中的至少部分追加数据。
在某些实施例,所述计算模块可以具体用于:
确定所述读数据请求对应的请求开始位置以及请求偏移量;
基于所述最小数据单元的第一固定长度、所述请求开始位置以及所述请求偏移量,计算所述读取请求对应的至少一个目标最小数据单元。
在某些实施例中,所述最小数据单元包括数据存储区以及位于所述数据存储区尾部的元数据描述区;所述数据存储区用于存储有效数据;所述元数据描述区用于存储所述有效数据的元数据;
所述从存储设备中,读取并拼接所述至少一个目标最小数据单元中的有效数据包括:
从存储设备中,读取并拼接所述至少一个目标最小数据单元的数据存储区中的有效数据。
在一个可能的设计中,图9所示实施例的数据处理装置可以实现为一计算设备,在分布式存储系统中,该计算设备可以部署分布式存储系统中的数据存储节点等。
如图10所示,该计算设备可以包括存储组件1001以及处理组件1002;
存储组件1001一条或多条计算机指令,其中,所述一条或多条计算机指令供所述处理组件1002调用执行。
所述处理组件1002用于:
接收读数据请求;
基于最小数据单元的第一固定长度,计算所述读数据请求对应的至少一个目标最小数据单元;
从存储设备中,读取并拼接所述至少一个目标最小数据单元中的有效数据;
其中,每一个最小数据单元中写入的有效数据至少包括一次追加写请求中的至少部分追加数据或者一次追加写请求中的至少部分追加数据以及下一次追加写请求中的至少 部分追加数据。
其中,处理组件1002可以包括一个或多个处理器来执行计算机指令,以完成上述的方法中的全部或部分步骤。当然处理组件也可以为一个或多个应用专用集成电路(ASIC)、数字信号处理器(DSP)、数字信号处理设备(DSPD)、可编程逻辑器件(PLD)、现场可编程门阵列(FPGA)、控制器、微控制器、微处理器或其他电子元件实现,用于执行上述方法。
存储组件1001被配置为存储各种类型的数据以支持在计算设备的操作。存储器可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。
当然,计算设备必然还可以包括其他部件,例如输入/输出接口、通信组件等。
输入/输出接口为处理组件和外围接口模块之间提供接口,上述外围接口模块可以是输出设备、输入设备等。
通信组件被配置为便于计算设备和其他设备之间有线或无线方式的通信,例如和请求端的通信等。
本申请实施例还提供了一种计算机可读存储介质,存储有计算机程序,所述计算机程序被计算机执行时可以实现上述图4所示任一实施例的数据处理方法。
可选地,图10所示计算设备与8所示计算设备可以为同一个计算设备。
图11为本申请实施例提供的一种数据处理装置又一个实施例的结构示意图,该装置可以包括:
故障检测模块1101,用于检测数据恢复指令;
数据恢复模块1102,用于基于最小数据单元的元数据描述区的倒数第二次写操作之后的有效数据长度、最后一次写操作之后的有效数据长度、以及最后一次写入的数据与下一个最小数据单元中的至少部分数据是否属于同一个追加写请求,恢复数据文件的数据恢复长度至任一追加写请求的数据结束位置以保持原子性;
其中,所述最小数据单元包括数据存储区以及位于所述数据存储区尾部的元数据描述区;所述数据存储区用于存储有效数据;所述元数据描述区用于存储所述有效数据的元数据;所述有效数据至少包括一次追加写请求中的至少部分追加数据或者一次追加写 请求中的至少部分追加数据以及下一次追加写请求中的至少部分追加数据。
在某些实施例中,该数据恢复模块可以具体用于:
扫描下一个最小数据单元;
校验当前最小数据单元中倒数第二次写操作之后的数据;
如果所述当前最小数据单元中倒数第二次写操作之后的数据校验通过,基于所述当最小数据单元中倒数第二次写操作之后的有效数据长度,更新所述已写入数据的有效数据长度;
校验所述当前最小数据单元中最后一次写操作之后的数据;
如果所述当前最小数据单元中最后一次写操作之后的数据校验通过,检测所述当前最小数据单元与下一个最小数据单元是否属于同一个追加写请求;
如果是,返回执行所述扫描下一个最小数据单元;
如果否,基于所述当前最小数据单元中最后一次写操作之后的有效数据长度,更新所述已写入数据的有效数据长度,并返回执行所述扫描下一个最小数据单元;
如果任一个最小数据单元中倒数第二次写操作之后的数据校验失败或者最后一次写操作之后的数据校验失败,将当前已写入有效数据长度作为所述数据文件的数据恢复长度。
在一个可能的设计中,图11所示的数据处理装置可以实现为一计算设备,在分布式存储系统中,该计算设备可以部署分布式存储系统中的数据存储节点等。
如图12所示,该计算设备可以包括存储组件1201以及处理组件1202;
存储组件1201一条或多条计算机指令,其中,所述一条或多条计算机指令供所述处理组件1202调用执行。
所述处理组件1202用于:
检测数据恢复指令;
基于最小数据单元的元数据描述区的倒数第二次写操作之后的有效数据长度、最后一次写操作之后的有效数据长度、以及最后一次写入的数据与下一个最小数据单元中的至少部分数据是否属于同一个追加写请求,恢复数据文件的数据恢复长度至任一追加写请求的数据结束位置以保持原子性;
其中,所述最小数据单元包括数据存储区以及位于所述数据存储区尾部的元数据描述区;所述数据存储区用于存储有效数据;所述元数据描述区用于存储所述有效数据的 元数据;所述有效数据至少包括一次追加写请求中的至少部分追加数据或者一次追加写请求中的至少部分追加数据以及下一次追加写请求中的至少部分追加数据。
其中,处理组件1202可以包括一个或多个处理器来执行计算机指令,以完成上述的方法中的全部或部分步骤。当然处理组件也可以为一个或多个应用专用集成电路(ASIC)、数字信号处理器(DSP)、数字信号处理设备(DSPD)、可编程逻辑器件(PLD)、现场可编程门阵列(FPGA)、控制器、微控制器、微处理器或其他电子元件实现,用于执行上述方法。
存储组件1201被配置为存储各种类型的数据以支持在计算设备的操作。存储器可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。
当然,计算设备必然还可以包括其他部件,例如输入/输出接口、通信组件等。
输入/输出接口为处理组件和外围接口模块之间提供接口,上述外围接口模块可以是输出设备、输入设备等。
通信组件被配置为便于计算设备和其他设备之间有线或无线方式的通信,例如和请求端的通信等。
本申请实施例还提供了一种计算机可读存储介质,存储有计算机程序,所述计算机程序被计算机执行时可以实现上述图5所示实施例的数据处理方法。
可选地,图12所示计算设备与图10所示计算设备以及图8可以为同一个计算设备。
通过本申请实施例的技术方案,将追加数据与追加数据的元数据一起写入存储设备中,可以减少写操作次数,保证写操作效率。且将追加数据按照最小数据单元的固定数据格式重新组织,无需维护索引关系,通过计算即可以定位数据位置实现数据读取,保证了写操作的便捷性。且本申请实施例中,通过覆盖写操作,并记录前后两次追加写请求是否对应同一个最小数据单元,可以识别追加写请求的数据结束边界,保证了数据恢复时的请求原子性。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元, 即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性的劳动的情况下,即可以理解并实施。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到各实施方式可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件。基于这样的理解,上述技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品可以存储在计算机可读存储介质中,如ROM/RAM、磁碟、光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行各个实施例或者实施例的某些部分所述的方法。
最后应说明的是:以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (21)

  1. 一种数据处理方法,其特征在于,包括:
    针对一次追加写请求中的追加数据,查找缓存的最小数据单元;
    将所述追加数据中的至少部分数据顺序写入所述缓存的最小数据单元以获得第一待存储的最小数据单元,以及将所述追加数据中的未写入数据写入至少一个最小数据单元以获得至少一个第二待存储的最小数据单元;
    将所述第一待存储的最小数据单元覆盖写入存储设备,以及将所述至少一个第二待存储的最小数据单元顺序写入所述存储设备;
    缓存所述追加数据对应的未写满的最小数据单元。
  2. 根据权利要求1所述的方法,其特征在于,还包括:
    接收读数据请求;
    基于所述最小数据单元的第一固定长度,计算所述读数据请求对应的至少一个目标最小数据单元;
    从存储设备中,读取并拼接所述至少一个目标最小数据单元中的有效数据。
  3. 根据权利要求2所述的方法,其特征在于,所述基于所述最小数据单元的第一固定长度,计算所述读数据请求对应的至少一个目标最小数据单元包括:
    确定所述读数据请求对应的请求开始位置以及请求偏移量;
    基于所述最小数据单元的第一固定长度、所述请求开始位置以及所述请求偏移量,计算所述读取请求对应的至少一个目标最小数据单元。
  4. 根据权利要求1所述的方法,其特征在于,所述最小数据单元包括数据存储区以及位于所述数据存储区尾部的元数据描述区;所述元数据描述区用于存储元数据;
    所述将所述追加数据中的至少部分数据顺序写入所述缓存的最小数据单元以获得第一待存储的最小数据单元,以及将所述追加数据中的未写入数据写入至少一个最小数据单元以获得至少一个第二待存储的最小数据单元包括:
    将所述追加数据中的至少部分数据顺序写入所述缓存的最小数据单元中的数据存储区,并基于所述至少部分数据修改所述缓存的最小数据单元中的元数据描述区,以获得第一待存储的最小数据单元;
    将所述追加数据中的未写入数据写入至少一个最小数据单元中的数据存储区,并基于每一个数据存储区中写入的数据生成每一个数据存储区对应的元数据描述区,以获得至少一个第二待存储的最小数据单元。
  5. 根据权利要求4所述的方法,其特征在于,所述元数据至少包括倒数第二次写操作之后的有效数据长度、最后一次写操作之后的有效数据长度、以及最后一次写入的数据与下一个最小数据单元中的至少部分数据是否属于同一个追加写请求;
    所述方法还包括:
    检测数据恢复指令;
    基于每一个最小数据单元中倒数第二次写操作之后的有效数据长度、最后一次写操作之后的有效数据长度、以及最后一次写入的数据与下一个最小数据单元中的至少部分数据是否属于同一个追加写请求,恢复数据文件的数据恢复长度至任一追加写请求的数据结束位置以保持原子性。
  6. 根据权利要求4所述的方法,其特征在于,所述基于每一个最小数据单元中倒数第二次写操作之后的有效数据长度、最后一次写操作之后的有效数据长度、以及最后一次写入的数据与下一个最小数据单元中的至少部分数据是否属于同一个追加写请求,恢复数据文件的数据恢复长度至任一追加写请求的数据结束位置以保持原子性包括:
    初始化数据文件的已写入数据的有效数据长度为零以及初始扫描位置为所述存储设备中所述数据文件的起始位置;
    扫描下一个最小数据单元;
    校验当前最小数据单元中倒数第二次写操作之后的数据;
    如果所述当前最小数据单元中倒数第二次写操作之后的数据校验通过,基于所述当前最小数据单元中倒数第二次写操作之后的有效数据长度,更新所述已写入数据的有效数据长度;
    校验所述当前最小数据单元中最后一次写操作之后的数据;
    如果所述当前最小数据单元中最后一次写操作之后的数据校验通过,检测所述当前最小数据单元与下一个最小数据单元是否属于同一个追加写请求;
    如果是,返回所述扫描下一个最小数据单元的步骤继续执行;
    如果否,基于所述当前最小数据单元中最后一次写操作之后的有效数据长度,更新所述已写入数据的有效数据长度,并返回所述扫描下一个最小数据单元的步骤继续执行;
    如果任一个最小数据单元中倒数第二次写操作之后的数据校验失败或者最后一次写操作之后的数据校验失败,将当前已写入数据的有效数据长度作为所述数据文件的数据恢复长度。
  7. 根据权利要求6所述的方法,其特征在于,所述元数据还包括倒数第二次写操作之后的数据校验和以及最后一次写操作之后的数据校验和;
    所述校验所述当前最小数据单元中倒数第二次写操作之后的数据包括:
    基于所述当前最小数据单元中倒数第二次写操作之后的数据校验和,校验倒数第二次写操作之后的数据;
    所述校验所述当前最小数据单元中最后一次写操作之后的数据包括:
    基于所述当前最小数据单元中最后一次写操作之后的数据校验和,校验倒数最后一次写操作之后的数据。
  8. 根据权利要求6所述的方法,其特征在于,还包括:
    如果任一个最小数据单元中最后一次写操作之后的数据校验失败,将最后一次写入的数据删除之后缓存所述任一个最小数据单元。
  9. 根据权利要求3所述的方法,其特征在于,所述将所述追加数据中的未写入数据写入至少一个最小数据单元中的数据存储区包括:
    将所述追加数据中的未写入数据依次写入至少一个最小数据单元中数据存储区;
    如果任一个最小数据单元的数据存储区未写满数据,在所述任一个最小数据单元的数据存储区中写入的数据的尾部利用预设字符填满所述数据存储区;
    所述将所述追加数据中的至少部分数据顺序写入所述缓存的最小数据单元中的数据存储区包括:
    将所述追加数据中的至少部分数据写入所述缓存的最小数据单元的数据存储区中预设字符所在位置,以替换所述预设字符。
  10. 根据权利要求1所述的方法,其特征在于,所述将所述第一待存储的最小数据单元覆盖写入存储设备包括:
    基于所述追加写请求的写入起始位置以及所述最小数据单元的第一固定长度,确定待写入位置;
    基于所述待写入位置,将所述第一待存储的最小数据单元覆盖写入所述存储设备。
  11. 一种数据处理方法,其特征在于,包括:
    接收读数据请求;
    基于最小数据单元的第一固定长度,计算所述读数据请求对应的至少一个目标最小数据单元;
    从存储设备中,读取并拼接所述至少一个目标最小数据单元中的有效数据;
    其中,每一个最小数据单元中写入的有效数据至少包括一次追加写请求中的至少部分追加数据或者一次追加写请求中的至少部分追加数据以及下一次追加写请求中的至少部分追加数据。
  12. 根据权利要求11所述的方法,其特征在于,所述基于最小数据单元的第一固定长度,计算所述读数据请求对应的至少一个目标最小数据单元包括:
    确定所述读数据请求对应的请求开始位置以及请求偏移量;
    基于所述最小数据单元的第一固定长度、所述请求开始位置以及所述请求偏移量,计算所述读取请求对应的至少一个目标最小数据单元。
  13. 根据权利要求11所述的方法,其特征在于,所述最小数据单元包括数据存储区以及位于所述数据存储区尾部的元数据描述区;所述数据存储区用于存储有效数据;所述元数据描述区用于存储所述有效数据的元数据;
    所述从存储设备中,读取并拼接所述至少一个目标最小数据单元中的有效数据包括:
    从存储设备中,读取并拼接所述至少一个目标最小数据单元的数据存储区中的有效数据。
  14. 一种数据处理方法,其特征在于,包括:
    检测数据恢复指令;
    基于最小数据单元的元数据描述区的倒数第二次写操作之后的有效数据长度、最后一次写操作之后的有效数据长度、以及最后一次写入的数据与下一个最小数据单元中的至少部分数据是否属于同一个追加写请求,恢复数据文件的数据恢复长度至任一追加写请求的数据结束位置以保持原子性;
    其中,所述最小数据单元包括数据存储区以及位于所述数据存储区尾部的元数据描述区;所述数据存储区用于存储有效数据;所述元数据描述区用于存储所述有效数据的元数据;所述有效数据至少包括一次追加写请求中的至少部分追加数据或者一次追加写请求中的至少部分追加数据以及下一次追加写请求中的至少部分追加数据。
  15. 根据权利要求14所述的方法,其特征在于,所述恢复步骤包括:
    初始化数据文件的已写入数据的有效数据长度为零以及初始扫描位置为所述存储设备中所述数据文件的起始位置;
    扫描下一个最小数据单元;
    校验当前最小数据单元中倒数第二次写操作之后的数据;
    如果所述当前最小数据单元中倒数第二次写操作之后的数据校验通过,基于所述当前最小数据单元中倒数第二次写操作之后的有效数据长度,更新所述已写入数据的有效数据长度;
    校验所述当前最小数据单元中最后一次写操作之后的数据;
    如果所述当前最小数据单元中最后一次写操作之后的数据校验通过,检测所述当前最小数据单元与下一个最小数据单元是否属于同一个追加写请求;
    如果是,返回执行所述扫描下一个最小数据单元;
    如果否,基于所述当前最小数据单元中最后一次写操作之后的有效数据长度,更新所述已写入数据的有效数据长度,并返回执行所述扫描下一个最小数据单元;
    如果任一个最小数据单元中倒数第二次写操作之后的数据校验失败或者最后一次写操作之后的数据校验失败,将当前已写入有效数据长度作为所述数据文件的数据恢复长度。
  16. 一种数据处理装置,其特征在于,包括:
    缓存查找模块,用于针对一次追加写请求中的追加数据,查找缓存的最小数据单元;
    数据组织模块,用于将所述追加数据中的至少部分数据顺序写入所述缓存的最小数据单元以获得第一待存储的最小数据单元,以及将所述追加数据中的未写入数据写入至少一个最小数据单元以获得至少一个第二待存储的最小数据单元;
    数据写入模块,用于将所述第一待存储的最小数据单元覆盖写入存储设备,以及将所述至少一个第二待存储的最小数据单元顺序写入所述存储设备;
    数据缓存模块,用于缓存所述追加数据对应的未写满的最小数据单元。
  17. 一种数据处理装置,其特征在于,包括:
    请求接收模块,用于接收读数据请求;
    计算模块,用于基于最小数据单元的第一固定长度,计算所述读数据请求对应的至少一个目标最小数据单元;
    数据获取模块,用于从存储设备中,读取并拼接所述至少一个目标最小数据单元中的有效数据;
    其中,每一个最小数据单元中写入的有效数据至少包括一次追加写请求中的至少部分追加数据或者一次追加写请求中的至少部分追加数据以及下一次追加写请求中的至少部分追加数据。
  18. 一种数据处理装置,其特征在于,包括:
    故障检测模块,用于检测数据恢复指令;
    数据恢复模块,用于基于最小数据单元的元数据描述区的倒数第二次写操作之后的有效数据长度、最后一次写操作之后的有效数据长度、以及最后一次写入的数据与下一个最小数据单元中的至少部分数据是否属于同一个追加写请求,恢复数据文件的数据恢复长度至任一追加写请求的数据结束位置以保持原子性;
    其中,所述最小数据单元包括数据存储区以及位于所述数据存储区尾部的元数据描述区;所述数据存储区用于存储有效数据;所述元数据描述区用于存储所述有效数据的元数据;所述有效数据至少包括一次追加写请求中的至少部分追加数据或者一次追加写请求中的至少部分追加数据以及下一次追加写请求中的至少部分追加数据。
  19. 一种计算设备,其特征在于,包括存储组件以及处理组件,
    所述存储组件用于存储一条或多条计算机指令,其中,所述一条或多条计算机指令供所述处理组件调用执行;
    所述处理组件用于:
    针对一次追加写请求中的追加数据,查找缓存的最小数据单元;
    将所述追加数据中的至少部分数据顺序写入所述缓存的最小数据单元以获得第一待存储的最小数据单元,以及将所述追加数据中的未写入数据写入至少一个最小数据单元以获得至少一个第二待存储的最小数据单元;
    将所述第一待存储的最小数据单元覆盖写入存储设备,以及将所述至少一个第二待存储的最小数据单元顺序写入所述存储设备;
    缓存所述追加数据对应的未写满的最小数据单元。
  20. 一种计算设备,其特征在于,包括存储组件以及处理组件,
    所述存储组件用于存储一条或多条计算机指令,其中,所述一条或多条计算机指令供所述处理组件调用执行;
    所述处理组件用于:
    接收读数据请求;
    基于最小数据单元的第一固定长度,计算所述读数据请求对应的至少一个目标最小数据单元;
    从存储设备中,读取并拼接所述至少一个目标最小数据单元中的有效数据;
    其中,每一个最小数据单元中写入的有效数据至少包括一次追加写请求中的至少部分追加数据或者一次追加写请求中的至少部分追加数据以及下一次追加写请求中的至 少部分追加数据。
  21. 一种计算设备,其特征在于,包括存储组件以及处理组件,
    所述存储组件用于存储一条或多条计算机指令,其中,所述一条或多条计算机指令供所述处理组件调用执行;
    所述处理组件用于:
    检测数据恢复指令;
    基于最小数据单元的元数据描述区的倒数第二次写操作之后的有效数据长度、最后一次写操作之后的有效数据长度、以及最后一次写入的数据与下一个最小数据单元中的至少部分数据是否属于同一个追加写请求,恢复数据文件的数据恢复长度至任一追加写请求的数据结束位置以保持原子性;
    其中,所述最小数据单元包括数据存储区以及位于所述数据存储区尾部的元数据描述区;所述数据存储区用于存储有效数据;所述元数据描述区用于存储所述有效数据的元数据;所述有效数据至少包括一次追加写请求中的至少部分追加数据或者一次追加写请求中的至少部分追加数据以及下一次追加写请求中的至少部分追加数据。
PCT/CN2019/070581 2018-01-09 2019-01-07 数据处理方法、装置及计算设备 WO2019137322A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19738606.3A EP3739472A4 (en) 2018-01-09 2019-01-07 METHOD AND DEVICE FOR DATA PROCESSING AND COMPUTER DEVICE
US16/923,999 US11294592B2 (en) 2018-01-09 2020-07-08 Method and device for data processing, and computer device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810020395.X 2018-01-09
CN201810020395.XA CN110018784B (zh) 2018-01-09 2018-01-09 Data processing method, apparatus, and computing device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/923,999 Continuation US11294592B2 (en) 2018-01-09 2020-07-08 Method and device for data processing, and computer device

Publications (1)

Publication Number Publication Date
WO2019137322A1 true WO2019137322A1 (zh) 2019-07-18

Family

ID=67187847

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/070581 WO2019137322A1 (zh) 2018-01-09 2019-01-07 Data processing method, apparatus, and computing device

Country Status (4)

Country Link
US (1) US11294592B2 (zh)
EP (1) EP3739472A4 (zh)
CN (1) CN110018784B (zh)
WO (1) WO2019137322A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115908678B (zh) * 2023-02-25 2023-05-30 深圳市益玩网络科技有限公司 Skeleton model rendering method and apparatus, electronic device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
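CN101241420A (zh) * 2008-03-20 2008-08-13 杭州华三通信技术有限公司 Method and storage device for improving the storage efficiency of data with non-contiguous write addresses
CN102799679A (zh) * 2012-07-24 2012-11-28 河海大学 Hadoop-based system and method for updating massive spatial data indexes
CN103257831A (zh) * 2012-02-20 2013-08-21 深圳市腾讯计算机系统有限公司 Read/write control method for a memory and corresponding memory
CN105335098A (zh) * 2015-09-25 2016-02-17 华中科技大学 Method for improving log file system performance based on storage-class memory
CN106445405A (zh) * 2015-08-13 2017-02-22 北京忆恒创源科技有限公司 Data access method and apparatus for flash storage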

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6343352B1 (en) 1997-10-10 2002-01-29 Rambus Inc. Method and apparatus for two step memory write operations
JP2005293774A (ja) 2004-04-02 2005-10-20 Hitachi Global Storage Technologies Netherlands Bv Control method for disk device
US7519788B2 (en) 2004-06-04 2009-04-14 Micron Technology, Inc. System and method for an asynchronous data buffer having buffer write and read pointers
JP2006072435A (ja) 2004-08-31 2006-03-16 Hitachi Ltd Storage system and data recording method
US7743217B2 (en) 2005-06-29 2010-06-22 Stmicroelectronics S.A. Cache consistency in a multiprocessor system with shared memory
US8250316B2 (en) 2006-06-06 2012-08-21 Seagate Technology Llc Write caching random data and sequential data simultaneously
US7596643B2 (en) 2007-02-07 2009-09-29 Siliconsystems, Inc. Storage subsystem with configurable buffer
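JP4977583B2 (ja) 2007-11-22 2012-07-18 株式会社日立製作所 Storage control device and control method for storage control device
JP2010267164A (ja) 2009-05-15 2010-11-25 Toshiba Storage Device Corp Storage device, data transfer control device, data transfer method, and data transfer program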
US8370683B1 (en) 2009-07-31 2013-02-05 Western Digital Technologies, Inc. System and method to reduce write splice failures
DE112012002622B4 (de) * 2011-06-24 2017-01-26 International Business Machines Corporation Recording unit for linear recording that performs optimal writing upon receiving a series of commands including mixed read and write commands, and method and program for executing the same
CN102364474B (zh) * 2011-11-17 2014-08-20 中国科学院计算技术研究所 Metadata storage system and management method for a cluster file system
US10684986B2 (en) 2013-08-28 2020-06-16 Biosense Webster (Israel) Ltd. Double buffering with atomic transactions for the persistent storage of real-time data flows
US9760281B2 (en) 2015-03-27 2017-09-12 Intel Corporation Sequential write stream management
US10592150B2 (en) * 2016-02-15 2020-03-17 Hitachi, Ltd. Storage apparatus
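CN105843775B (zh) * 2016-04-06 2018-12-04 中国科学院计算技术研究所 On-chip data partitioning read-write method, system, and apparatus thereof
CN111949605A (zh) * 2019-05-15 2020-11-17 伊姆西Ip控股有限责任公司 Method, device, and computer program product for implementing a file system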

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3739472A4

Also Published As

Publication number Publication date
CN110018784A (zh) 2019-07-16
US11294592B2 (en) 2022-04-05
CN110018784B (zh) 2023-01-10
EP3739472A4 (en) 2021-10-06
EP3739472A1 (en) 2020-11-18
US20200341680A1 (en) 2020-10-29

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 19738606; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2019738606; Country of ref document: EP; Effective date: 20200810