WO2023207132A1 - Data storage method and apparatus, and device and medium - Google Patents
- Publication number
- WO2023207132A1 (application PCT/CN2022/138693)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- data block
- stored
- storage
- name
Classifications
- Shared hierarchy: G (Physics) > G06 (Computing; calculating or counting) > G06F (Electric digital data processing) > G06F3/00 (Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements) > G06F3/06 (Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers) > G06F3/0601 (Interfaces specially adapted for storage systems)
- G06F3/0602 (Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect) > G06F3/0608: Saving storage space on storage systems
- G06F3/0602 > G06F3/061: Improving I/O performance
- G06F3/0602 > G06F3/0625: Power saving in storage systems
- G06F3/0628 (Interfaces specially adapted for storage systems making use of a particular technique) > G06F3/0638 (Organizing or formatting or addressing of data) > G06F3/064: Management of blocks
- G06F3/0628 > G06F3/0638 > G06F3/0643: Management of files
- G06F3/0668 (Interfaces specially adapted for storage systems adopting a particular infrastructure) > G06F3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
Definitions
- The present invention relates to the field of distributed storage, and in particular to a data storage method, apparatus, device, and medium.
- Distributed storage systems store data dispersedly across multiple independent devices. They generally adopt a scalable architecture, use multiple storage servers to share the storage load, and use location servers to locate stored information. This improves the system's reliability, availability, and access efficiency, and makes the system easy to expand.
- Related terms: disks, SSDs (solid state drives), key-value database software
- Current mainstream solid-state storage media generally use iSCSI (Internet Small Computer System Interface) and NVMe (Non-Volatile Memory Express) interfaces.
- The application provides a data storage method applied to a distributed storage system, including:
- The preset name determination method is either a method based on the storage scenario or a method based on a hash algorithm.
- The key-value information corresponding to the data to be stored is stored in the solid-state drive through the drive's key-value storage interface.
- Before the data name of the data to be stored is determined using the preset name determination method, the method further includes:
- A remainder operation is performed with the segmentation offset and the segmentation length to obtain a result for each data block, and that result is taken as the data block sequence number of the data block.
- Determining the data name of the data to be stored using the preset name determination method includes:
- Constructing the key-value information corresponding to the data to be stored, and storing it in the solid state drive through the drive's key-value storage interface, includes:
- The key-value information corresponding to each data block is stored in the solid-state drive through the drive's key-value storage interface.
- Using the data block serial number and the current storage scenario to determine the data name of the data to be stored includes:
- In response to the current storage scenario being file storage or object storage, determining the file number corresponding to the data to be stored and using the file number and the data block serial number to determine the data block name of the data block; or, in response to the current storage scenario being block storage, determining the logical unit number corresponding to the data to be stored and using the logical unit number and the data block serial number to determine the data block name of the data block.
- The data storage method further includes:
- Recording the mapping relationship between each data block serial number and data block name to form a mapping relationship list between data block serial numbers and data block names.
- Before the data to be stored is determined, the method further includes:
- Determining the data to be stored includes:
- A second aspect of this application provides a data storage device applied to a distributed storage system, including:
- A data name determination module, configured to determine the data to be stored and to determine its data name using a preset name determination method;
- where the preset name determination method is a method based on the storage scenario or a method based on a hash algorithm;
- A key-value information construction module, configured to construct the key-value information corresponding to the data to be stored by using the data name as the key and the data to be stored as the value;
- An information storage module, configured to store the key-value information corresponding to the data to be stored in the solid state drive through the drive's key-value storage interface.
- A third aspect of this application provides an electronic device, including:
- One or more memories, configured to store computer-readable instructions; and
- One or more processors, configured to execute the computer-readable instructions to implement the foregoing data storage method.
- A fourth aspect of the present application provides a non-volatile computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, implement the steps of the foregoing data storage method.
- Figure 1 is a flow chart of a data storage method provided in one or more embodiments of the present application.
- Figure 2 is an overall architecture diagram of data storage provided in one or more embodiments of the present application.
- Figure 3 is an overall architecture diagram of traditional distributed data storage.
- Figure 4 is a flow chart of a specific data storage method provided in one or more embodiments of the present application.
- Figure 5 is a flow chart of a specific data storage method provided in one or more embodiments of the present application.
- Figure 6 is a schematic structural diagram of a data storage device provided in one or more embodiments of the present application.
- Figure 7 is a structural diagram of an electronic device provided in one or more embodiments of the present application.
- An embodiment of the present invention discloses a data storage method, which is applied to a distributed storage system. See Figure 1.
- The method includes:
- Step S11: Determine the data to be stored, and determine its data name using a preset name determination method.
- The preset name determination method is either a method based on the storage scenario or a method based on a hash algorithm.
- Before the data to be stored is determined, the method may further include: obtaining the data to be stored and determining a target object storage device corresponding to it; and writing the data to be stored into the target object storage device. Correspondingly, determining the data to be stored includes retrieving the data to be stored from the target object storage device.
- That is, the data to be stored is data previously written into an object storage device (OSD).
- Specifically, the data to be stored may be taken from a preset data pool and stored in the corresponding target object storage device.
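The text does not say how the target object storage device is chosen. The following is a minimal sketch under a loudly stated assumption: the hypothetical `pick_target_osd` places data by the CRC-32 of a hypothetical `object_name` modulo the number of OSDs, purely so the write-then-retrieve sequence can be shown end to end. The OSD class is likewise illustrative.

```python
import zlib
from typing import Dict, List


class ObjectStorageDevice:
    """Hypothetical OSD that holds written data until it is retrieved
    for naming and storage on the KV-SSD."""

    def __init__(self, osd_id: int) -> None:
        self.osd_id = osd_id
        self.pending: Dict[str, bytes] = {}


def pick_target_osd(object_name: str,
                    osds: List[ObjectStorageDevice]) -> ObjectStorageDevice:
    # Assumed placement rule (not from the text): CRC-32 of the name modulo
    # the OSD count, so the same name always maps to the same OSD.
    return osds[zlib.crc32(object_name.encode("utf-8")) % len(osds)]


def write_then_retrieve(object_name: str, data: bytes,
                        osds: List[ObjectStorageDevice]) -> bytes:
    osd = pick_target_osd(object_name, osds)
    osd.pending[object_name] = data       # write into the target OSD
    return osd.pending.pop(object_name)   # later: retrieve it as the data to be stored
```

Any deterministic placement would do here; a real system would use its own placement algorithm.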
- Step S12: Construct the key-value information corresponding to the data to be stored, using the data name as the key and the data to be stored as the value.
- Traditional distributed storage spends storage-node hardware resources such as CPU and memory on database software that records where data is stored. This method avoids that: it directly determines the data name (the data ID) of the data to be stored, uses the data name as the key and the data itself as the value, and thereby greatly shortens the distributed-storage software stack and reduces IO (input/output) latency, improving system performance.
- Step S13: Store the key-value information corresponding to the data to be stored in the solid state drive through the drive's key-value storage interface.
- The solid state drive in this embodiment provides a key-value storage interface and is hereinafter referred to as a KV-SSD.
- The solid state drive in this embodiment may also be SSD hardware that directly exposes a key-value interface to external callers, as proposed in the NVMe 2.0 specification.
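Steps S12 and S13 reduce to building one (key, value) pair and issuing one write. The sketch below assumes a hypothetical `InMemoryKvSsd` class whose dict stands in for the drive's key-value storage interface; it is not a real device API, and naming (step S11) is taken as already done by the caller.

```python
from typing import Dict, Optional


class InMemoryKvSsd:
    """Hypothetical stand-in for a KV-SSD's key-value storage interface.
    In the described design the drive manages its own metadata, so the
    storage node keeps no index of where data lives."""

    def __init__(self) -> None:
        self._kv: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._kv[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._kv.get(key)


def store(ssd: InMemoryKvSsd, data_name: str, data: bytes) -> None:
    # Step S12: the data name becomes the key, the data itself the value.
    key, value = data_name.encode("utf-8"), data
    # Step S13: write the pair straight through the key-value interface.
    ssd.put(key, value)


ssd = InMemoryKvSsd()
store(ssd, "obj-1", b"hello")
print(ssd.get(b"obj-1"))  # b'hello'
```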
- Figure 2 is an overall architecture diagram of the data storage proposed by this application. It shows that, after data is received from outside, the data can be stored in a data pool and written to the corresponding object storage device, and is finally written through the storage layer into a preset solid-state drive with a key-value storage interface. The data name (data ID) is used directly as the key and the data itself directly as the value, which completes the construction of the key-value information.
- Figure 3 is the overall architecture diagram of traditional distributed data storage.
- In traditional distributed storage, key-value information is stored by database software (RocksDB in Figure 3).
- In this application, the KV-SSD is used directly in place of the database software to store key-value information.
- Metadata management is completed inside the KV-SSD, that is, in hardware, which effectively reduces storage-node resource consumption and software-layer complexity.
- The database software used in the traditional distributed storage method is no longer needed to store key-value information. Any data whose key and value satisfy the specification of the preset key-value storage interface is suitable for the method of the present invention.
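To make the contrast between Figure 3 and the proposed path concrete, here is an illustrative model, not either system's real code. `TraditionalNode` uses a dict as a stand-in for RocksDB that maps each name to an (offset, length) on a bytearray standing in for a raw device; `KvSsdNode` issues a single put, with the dict standing in for the drive. Both classes are hypothetical and the traditional model is append-only, never reclaiming space.

```python
from typing import Dict, Tuple


class TraditionalNode:
    """Figure 3 style: the node runs database software to record where
    each piece of data was written on the device."""

    def __init__(self) -> None:
        self.index: Dict[str, Tuple[int, int]] = {}  # name -> (offset, length)
        self.device = bytearray()                    # stand-in raw device

    def write(self, name: str, data: bytes) -> None:
        offset = len(self.device)               # allocate space (append-only)
        self.device += data                     # write the data bytes
        self.index[name] = (offset, len(data))  # record location metadata

    def read(self, name: str) -> bytes:
        offset, length = self.index[name]       # metadata lookup on the node
        return bytes(self.device[offset:offset + length])


class KvSsdNode:
    """Proposed style: one put per item; location metadata is managed
    inside the drive (a dict stands in for it here)."""

    def __init__(self) -> None:
        self.drive: Dict[str, bytes] = {}

    def write(self, name: str, data: bytes) -> None:
        self.drive[name] = data

    def read(self, name: str) -> bytes:
        return self.drive[name]
```

The point of the comparison is the missing `index` on the KV-SSD side: the node no longer allocates space or maintains a location table.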
- The data to be stored is first determined, and its data name is determined using a preset name determination method.
- The preset name determination method is either a method based on the storage scenario or a method based on a hash algorithm.
- The storage node uses the data name as the key and the data as the value, and stores them directly in a solid-state drive with a key-value storage interface.
- The storage node does not need to manage metadata; metadata management is performed by the KV-SSD's main control CPU. This reduces the number of software layers and the resource consumption of storage nodes, lowers system complexity and overhead, and efficiently improves system performance.
- Figure 4 is a flow chart of a specific data storage method provided by an embodiment of the present application. As shown in Figure 4, the method includes:
- Step S21: Determine the data to be stored, and obtain the segmentation length through the preset segmentation length acquisition interface.
- In this embodiment, the data to be stored is segmented and the resulting segments are processed individually. The segmentation length must therefore first be obtained through the preset segmentation length acquisition interface and then used to split the data. In a specific implementation, the segmentation length may be, for example, 2 MB or 4 MB.
- Step S22: Segment the data to be stored using the segmentation length to obtain the data blocks corresponding to the data to be stored, and determine the segmentation offset corresponding to each data block.
- The data to be stored is split using the segmentation length obtained through the preset interface, yielding the individual data blocks. During segmentation, the segmentation offset (offset) of each data block is also determined.
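Steps S21 and S22 can be sketched as fixed-length splitting that records each block's offset. `get_segment_length` is a hypothetical stand-in for the preset segmentation length acquisition interface, and the 4 MiB default is an assumption chosen from the example lengths in the text.

```python
from typing import List, Tuple

# Assumed default, picked from the text's example lengths (2 MB or 4 MB).
DEFAULT_SEGMENT_LENGTH = 4 * 1024 * 1024


def get_segment_length() -> int:
    """Hypothetical stand-in for the preset segmentation length
    acquisition interface (step S21)."""
    return DEFAULT_SEGMENT_LENGTH


def segment(data: bytes, length: int) -> List[Tuple[int, bytes]]:
    """Step S22: split data into consecutive blocks of `length` bytes
    (the last block may be shorter) and return (offset, block) pairs."""
    if length <= 0:
        raise ValueError("segmentation length must be positive")
    return [(off, data[off:off + length]) for off in range(0, len(data), length)]


print(segment(b"abcdefghij", 4))  # [(0, b'abcd'), (4, b'efgh'), (8, b'ij')]
```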
- Step S23: Perform a remainder operation with the segmentation offset and the segmentation length to obtain a result for each data block, and take that result as the data block sequence number of the data block.
- The data block sequence number may also be denoted data index.
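Read literally as `offset % length`, the remainder in step S23 is 0 for every block-aligned offset produced in step S22, so it could not tell blocks apart. The sketch below therefore assumes (an interpretation, not something the text states) that the sequence number is the quotient `offset // length`, which is the 0-based position of the block.

```python
def block_sequence_number(offset: int, length: int) -> int:
    """Assumed reading of step S23: block index = offset // length.
    Offsets from fixed-length segmentation are multiples of `length`."""
    if length <= 0 or offset < 0 or offset % length != 0:
        raise ValueError("offset must be a non-negative multiple of length")
    return offset // length


offsets = [0, 4, 8]
print([block_sequence_number(o, 4) for o in offsets])  # [0, 1, 2]
print([o % 4 for o in offsets])                         # [0, 0, 0] (literal remainder)
```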
- Step S24: Determine the current storage scenario, and use the data block serial number and the current storage scenario to determine the data block name of each data block.
- Specifically, if the current storage scenario is file storage or object storage, the file number corresponding to the data to be stored is determined, and the file number and the data block serial number are used to determine the data block name of the data block; if the current storage scenario is block storage, the logical unit number corresponding to the data to be stored is determined, and the logical unit number and the data block serial number are used to determine the data block name of the data block.
- Different storage scenarios thus use different naming inputs: the file number (inode number) for file or object storage, and the logical unit number (LUN ID) for block storage, each combined with the data block serial number.
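A sketch of step S24 follows. The text only says the name is derived from the inode number or LUN ID together with the block serial number; the `ino…`/`lun…` prefixes and the `prefix.hex-index` layout are hypothetical choices made here for illustration.

```python
from enum import Enum


class Scenario(Enum):
    FILE = "file"
    OBJECT = "object"
    BLOCK = "block"


def block_name(scenario: Scenario, owner_id: int, index: int) -> str:
    """owner_id is the inode number (file/object storage) or the LUN ID
    (block storage); index is the data block serial number.
    The name format is hypothetical."""
    if scenario in (Scenario.FILE, Scenario.OBJECT):
        prefix = f"ino{owner_id:x}"
    elif scenario is Scenario.BLOCK:
        prefix = f"lun{owner_id:x}"
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    # Fixed-width hex index keeps every name of one owner the same length.
    return f"{prefix}.{index:016x}"


print(block_name(Scenario.FILE, 0x1234, 2))  # ino1234.0000000000000002
print(block_name(Scenario.BLOCK, 7, 255))    # lun7.00000000000000ff
```

The distinct prefixes are a design assumption meant to keep a file's block names from colliding with a LUN's if both share one drive. Computing a name is a string format, which is consistent with the text's point that this method needs little computation.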
- Step S25: Construct the key-value information corresponding to each data block, using the data block name as the key and the data in the data block as the value.
- For more specific processing details of step S25, refer to the corresponding content disclosed in the foregoing embodiments; they are not repeated here.
- Step S26: Store the key-value information corresponding to each data block in the solid state drive through the drive's key-value storage interface.
- For more specific processing details of step S26, refer to the corresponding content disclosed in the foregoing embodiments; they are not repeated here.
- In this embodiment, the data to be stored is segmented into data blocks, and the storage-scenario-based method is used to determine the data block name of each data block.
- Key-value information is then constructed with the data block name as the key and the data block data as the value, and stored in a solid-state drive with a key-value storage interface.
- The storage-scenario-based method proposed in this embodiment is simple and requires little computation, which reduces system overhead and complexity.
- Figure 5 is a flow chart of a specific data storage method provided by an embodiment of the present application. As shown in Figure 5, the method includes:
- Step S31: Determine the data to be stored, and obtain the segmentation length through the preset segmentation length acquisition interface.
- For more specific processing details of step S31, refer to the corresponding content disclosed in the foregoing embodiments; they are not repeated here.
- Step S32: Segment the data to be stored using the segmentation length to obtain the data blocks corresponding to the data to be stored.
- For more specific processing details of step S32, refer to the corresponding content disclosed in the foregoing embodiments; they are not repeated here.
- Step S33: Calculate the hash value corresponding to each data block using a preset hash algorithm.
- That is, the hash value of each data block is calculated using the hash-algorithm-based determination method.
- Hash algorithms include, but are not limited to, MD5 (Message Digest Algorithm 5) and SHA (Secure Hash Algorithm).
- Step S34: Use the hash value as the data block name of the data block.
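Steps S33 and S34 can be sketched with Python's standard `hashlib`. Using the hex digest as the name, and defaulting to SHA-256, are assumptions; the text names MD5 and SHA without fixing a variant or an encoding.

```python
import hashlib


def hash_block_name(block: bytes, algorithm: str = "sha256") -> str:
    """Steps S33-S34 sketch: the hex digest of the block's content is its
    data block name. `algorithm` is any name hashlib.new accepts."""
    return hashlib.new(algorithm, block).hexdigest()


print(hash_block_name(b""))         # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
print(hash_block_name(b"", "md5"))  # d41d8cd98f00b204e9800998ecf8427e
```

Because the name alone decides whether two blocks count as identical, a collision-resistant hash matters here. MD5 has known practical collisions, which is why this sketch defaults to SHA-256.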
- The data storage method may further include: binding the data block serial number of each data block to its data block name to form a mapping relationship; and recording the mapping relationships between data block serial numbers and data block names to form a mapping relationship list.
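Content-derived names no longer encode a block's position, so the mapping list is what lets a reader find block *i* again. This minimal sketch assumes the blocks arrive in offset order, so their list position equals the serial number. The `(serial, name)` tuple layout is a hypothetical choice.

```python
import hashlib
from typing import List, Tuple


def build_mapping_list(blocks: List[bytes]) -> List[Tuple[int, str]]:
    """Bind each block's serial number (its position, assuming offset
    order) to its SHA-256-based name and record the pairs in order."""
    return [(i, hashlib.sha256(b).hexdigest()) for i, b in enumerate(blocks)]


mapping = build_mapping_list([b"a", b"b", b"a"])
# Blocks 0 and 2 have identical content, so they share one name.
print(mapping[0][1] == mapping[2][1])  # True
```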
- This method additionally requires recording the correspondence between the data block serial number (data index) and the data block name (data ID), so it involves more computation than the storage-scenario-based method. Its advantage is that data blocks with identical content yield the same data ID; since only one copy of identical data is stored on the disk, the method provides deduplication.
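The deduplication effect follows from keying by content hash: a second identical block maps to a key that already exists, so nothing new is written. In this sketch a dict stands in for the KV-SSD (an assumption), and the function returns the number of bytes actually written so the saving is visible.

```python
import hashlib
from typing import Dict, List


def store_with_dedup(drive: Dict[str, bytes], blocks: List[bytes]) -> int:
    """Store blocks under SHA-256 content names; identical blocks share a
    key and are written once. Returns the number of bytes written."""
    written = 0
    for block in blocks:
        name = hashlib.sha256(block).hexdigest()
        if name not in drive:  # identical content is already stored under this key
            drive[name] = block
            written += len(block)
    return written


drive: Dict[str, bytes] = {}
print(store_with_dedup(drive, [b"x" * 4, b"y" * 4, b"x" * 4]))  # 8
print(len(drive))                                               # 2
```

Deleting a shared block safely would also need reference counting, which the text does not describe and this sketch omits.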
- Step S35: Construct the key-value information corresponding to each data block, using the data block name as the key and the data in the data block as the value.
- For more specific processing details of step S35, refer to the corresponding content disclosed in the foregoing embodiments; they are not repeated here.
- Step S36: Store the key-value information corresponding to each data block in the solid state drive through the drive's key-value storage interface.
- For more specific processing details of step S36, refer to the corresponding content disclosed in the foregoing embodiments; they are not repeated here.
- In this embodiment, the data to be stored is segmented into data blocks, and the hash-algorithm-based method is used to determine the data block name of each data block.
- Key-value information is then constructed with the data block name as the key and the data block data as the value, and stored in a solid-state drive with a key-value storage interface.
- Compared with the storage-scenario-based method, the hash-algorithm-based method of this embodiment requires more computation, but it provides deduplication, because blocks with identical content receive the same data ID and are stored only once.
- An embodiment of the present application further discloses a data storage device, which may specifically include:
- A data name determination module 11, configured to determine the data to be stored and to determine its data name using a preset name determination method;
- where the preset name determination method is a method based on the storage scenario or a method based on a hash algorithm;
- A key-value information construction module 12, configured to construct the key-value information corresponding to the data to be stored by using the data name as the key and the data to be stored as the value;
- An information storage module 13, configured to store the key-value information corresponding to the data to be stored in the solid state drive through the drive's key-value storage interface.
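The three modules compose into a simple pipeline. This sketch is illustrative only: the class names mirror modules 11, 12, and 13, the naming rule is injected as a callable (either method from the text could be plugged in), and a dict stands in for the KV-SSD. None of these names come from the source.

```python
from typing import Callable, Dict, Tuple

NameFn = Callable[[bytes], str]  # hypothetical naming-rule signature


class DataNameDeterminationModule:  # module 11
    def __init__(self, name_fn: NameFn) -> None:
        self.name_fn = name_fn  # storage-scenario-based or hash-based rule

    def determine(self, data: bytes) -> str:
        return self.name_fn(data)


class KeyValueConstructionModule:  # module 12
    @staticmethod
    def build(name: str, data: bytes) -> Tuple[bytes, bytes]:
        return name.encode("utf-8"), data  # data name -> key, data -> value


class InformationStorageModule:  # module 13
    def __init__(self, drive: Dict[bytes, bytes]) -> None:
        self.drive = drive  # dict standing in for the KV-SSD interface

    def store(self, kv: Tuple[bytes, bytes]) -> None:
        key, value = kv
        self.drive[key] = value


class DataStorageDevice:
    def __init__(self, name_fn: NameFn, drive: Dict[bytes, bytes]) -> None:
        self.m11 = DataNameDeterminationModule(name_fn)
        self.m12 = KeyValueConstructionModule()
        self.m13 = InformationStorageModule(drive)

    def save(self, data: bytes) -> str:
        name = self.m11.determine(data)
        self.m13.store(self.m12.build(name, data))
        return name
```

Injecting the naming rule keeps modules 12 and 13 identical for both embodiments, which matches how the text reuses the same construction and storage units in each.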
- This application first determines the data to be stored and uses a preset name determination method to determine its data name.
- The preset name determination method is a method based on the storage scenario or a method based on a hash algorithm.
- Key-value information corresponding to the data to be stored is then constructed with the data name as the key and the data to be stored as the value, and is stored in the solid state drive through the drive's key-value storage interface.
- Because the storage node uses the data name as the key and the data as the value and stores them directly in a solid-state drive with a key-value storage interface, the storage node does not need to manage metadata. This reduces the number of software layers and the resource consumption of the storage node, lowers system complexity and overhead, and efficiently improves system performance.
- The data storage device further includes:
- A segmentation length acquisition module, configured to obtain the segmentation length through the preset segmentation length acquisition interface;
- A data segmentation module, configured to segment the data to be stored using the segmentation length to obtain the data blocks corresponding to the data to be stored, and to determine the segmentation offset corresponding to each data block;
- A data block sequence number determination module, configured to perform a remainder operation with the segmentation offset and the segmentation length to obtain a result for each data block, and to take that result as the data block sequence number of the data block.
- The data name determination module 11 includes:
- A scenario determination unit, configured to determine the current storage scenario;
- A first data block name determination unit, configured to determine the data block name of each data block using the data block serial number and the current storage scenario.
- The key-value information construction module 12 and the information storage module 13 include:
- A first key-value information construction unit, configured to construct the key-value information corresponding to each data block by using the data block name as the key and the data in the data block as the value;
- A first information storage unit, configured to store the key-value information corresponding to each data block in the solid state drive through the drive's key-value storage interface.
- The first data block name determination unit includes:
- A first scenario naming unit, configured to, if the current storage scenario is file storage or object storage, determine the file number corresponding to the data to be stored and use the file number and the data block serial number to determine the data block name of the data block;
- A second scenario naming unit, configured to, if the current storage scenario is block storage, determine the logical unit number corresponding to the data to be stored and use the logical unit number and the data block serial number to determine the data block name of the data block.
- Alternatively, the data name determination module 11 includes:
- A hash value determination unit, configured to calculate the hash value corresponding to each data block using a preset hash algorithm;
- A second data block name determination unit, configured to use the hash value as the data block name of the data block.
- The key-value information construction module 12 and the information storage module 13 include:
- A first key-value information construction unit, configured to construct the key-value information corresponding to each data block by using the data block name as the key and the data in the data block as the value;
- A first information storage unit, configured to store the key-value information corresponding to each data block in the solid state drive through the drive's key-value storage interface.
- The data storage device further includes:
- A mapping relationship determination unit, configured to bind the data block sequence number of each data block to its data block name to form a mapping relationship;
- A mapping list determination unit, configured to record the mapping relationships between data block serial numbers and data block names to form a mapping relationship list.
- The data storage device further includes:
- An object storage device determination unit, configured to obtain the data to be stored and determine the target object storage device corresponding to it;
- A data writing unit, configured to write the data to be stored into the target object storage device.
- The data name determination module 11 includes:
- A data extraction unit, configured to retrieve the data to be stored from the target object storage device.
- Figure 7 is a structural diagram of an electronic device 20 according to an exemplary embodiment.
- The content of the figure should not be regarded as any limitation on the scope of use of the present application.
- the electronic device 20 may specifically include: one or more processors 21, one or more memories 22, a power supply 23, a display screen 24, an input and output interface 25, a communication interface 26 and a communication bus 27.
- the memory 22 is used for storage, and the computer-readable instructions are loaded and executed by one or more processors 21 to implement the relevant steps in the data storage method disclosed in any of the foregoing embodiments.
- the electronic device 20 in this embodiment may specifically be an electronic computer.
- the power supply 23 is used to provide working voltage for each hardware device on the electronic device 20; the communication interface 26 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows can be applicable Any communication protocol of the technical solution of this application is not specifically limited here; the input and output interface 25 is used to obtain external input data or output data to the external world, and its specific interface type can be selected according to specific application needs. Here No specific limitation is made.
- the memory 22, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like.
- the resources stored on it may include an operating system 221, computer-readable instructions 222, and so on, and they may be stored temporarily or permanently.
- the operating system 221 is used to manage and control each hardware device of the electronic device 20 and the computer-readable instructions 222, and may be Windows, Unix, Linux, or the like.
- the computer-readable instructions 222 may further include computer-readable instructions for completing other specific tasks.
- this application also discloses a non-volatile computer-readable storage medium.
- the non-volatile computer-readable storage medium mentioned here includes random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, magnetic disks, optical disks, or any other form of storage medium known in the technical field.
- the computer-readable instructions implement the aforementioned disclosed data storage method when executed by one or more processors. Regarding the specific steps of this method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be described again here.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Storage Device Security (AREA)
- Management Or Editing Of Information On Record Carriers (AREA)
Abstract
The present application relates to the field of distributed storage, and provides a data storage method and apparatus, a device, and a medium. The method comprises: determining data to be stored, and determining a data ID of the data by using a preset ID determination method, where the preset ID determination method is either a storage-scenario-based method or a hash-algorithm-based method; constructing key-value information corresponding to the data by taking the data ID as the key and the data as the value; and storing the key-value information in a solid state drive through the key-value storage interface of the solid state drive.
Description
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202210462119.5, filed with the China Patent Office on April 28, 2022 and entitled "A data storage method, apparatus, device and medium", the entire contents of which are incorporated herein by reference.
The present invention relates to the field of distributed storage, and in particular to a data storage method, apparatus, device, and medium.
A distributed storage system stores data across multiple independent devices. It generally adopts a scalable architecture, using multiple storage servers to share the storage load and location servers to locate stored information. This not only improves the reliability, availability, and access efficiency of the system, but also makes the system easy to scale out.
In today's mainstream distributed storage systems, hard disks and SSDs (Solid State Drives) are generally used as local storage media. When data is written to the media, the data that describes it, namely metadata, generally has to be organized by key-value database software (key-value DB, such as RocksDB). Meanwhile, mainstream solid-state storage media generally expose iSCSI (Internet Small Computer System Interface) or NVMe (Non-Volatile Memory Express) interfaces. However, the inventor realized that in storage software systems, key-value storage is widely used as a storage backend because of its simple, general-purpose interface. As a result, a current storage system that needs a key-value store must pass through conversions across multiple software layers, which leads to many software layers, a complex system, and considerable resource overhead.
It can be seen from the above that, when a key-value storage system is used, how to avoid the high system complexity, high system overhead, and numerous software layers caused by traditional data storage approaches is a problem to be solved in this field.
Summary of the Invention
The present application provides a data storage method applied to a distributed storage system, including:
determining data to be stored, and determining a data name of the data to be stored by using a preset name determination method, where the preset name determination method is a storage-scenario-based determination method or a hash-algorithm-based determination method;
constructing key-value information corresponding to the data to be stored by taking the data name as the key and the data to be stored as the value; and
storing the key-value information corresponding to the data to be stored in the solid state drive through the key-value storage interface of the solid state drive.
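The three steps above can be sketched as follows. This is a minimal illustration, not the application's implementation: `InMemoryKVSSD` is a hypothetical stand-in for a drive's key-value interface (a real KV-SSD would be reached through its own driver or SDK), and its `store`/`retrieve` method names are assumptions. The point shown is that the data name goes straight in as the key and the data as the value, with no database layer in between.

```python
from typing import Dict


class InMemoryKVSSD:
    """Hypothetical stand-in for a solid state drive's key-value interface."""

    def __init__(self) -> None:
        self._kv: Dict[bytes, bytes] = {}

    def store(self, key: bytes, value: bytes) -> None:
        # A real KV-SSD would persist the pair inside the drive itself.
        self._kv[key] = value

    def retrieve(self, key: bytes) -> bytes:
        return self._kv[key]


def store_data(device: InMemoryKVSSD, data_name: str, data: bytes) -> None:
    """Store data under its (already determined) data name."""
    # Step 2: the data name is the key, the data itself is the value.
    key = data_name.encode("utf-8")
    value = data
    # Step 3: hand the pair directly to the drive's key-value interface;
    # no key-value database such as RocksDB sits in between.
    device.store(key, value)
```

For example, `store_data(ssd, "ino42:0", b"hello")` followed by `ssd.retrieve(b"ino42:0")` returns `b"hello"`; the name `"ino42:0"` is only an illustrative data name.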
In some embodiments, before the data name of the data to be stored is determined by the preset name determination method, the method further includes:
obtaining a segmentation length through a preset segmentation-length acquisition interface;
segmenting the data to be stored by the segmentation length to obtain the data blocks corresponding to the data to be stored, and determining the segmentation offset corresponding to each data block; and
taking the remainder of each segmentation offset with respect to the segmentation length to obtain a remainder result for each data block, and determining the remainder result of each data block as its data block sequence number.
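The segmentation steps above can be sketched as below. `get_chunk_size` is a hypothetical stand-in for the preset segmentation-length acquisition interface. One interpretive assumption is made and labeled in the code: the sequence number is computed as the integer quotient `offset // chunk_size`, because a literal `offset % chunk_size` is 0 for every block that starts on a multiple of the segmentation length and so could not tell blocks apart.

```python
from typing import List, Tuple


def get_chunk_size() -> int:
    """Hypothetical stand-in for the preset segmentation-length acquisition
    interface; 4 MB (taken here as 4 MiB) is one of the example lengths."""
    return 4 * 1024 * 1024


def split_into_blocks(data: bytes, chunk_size: int) -> List[Tuple[int, int, bytes]]:
    """Split data into fixed-length blocks.

    Returns (offset, block_index, block_bytes) for each block; the last block
    may be shorter than chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    blocks = []
    for offset in range(0, len(data), chunk_size):
        block = data[offset:offset + chunk_size]
        # Assumption: the sequence number is the integer quotient of the offset
        # and the segmentation length. A literal offset % chunk_size would be 0
        # for every block, because each block starts on a multiple of chunk_size.
        block_index = offset // chunk_size
        blocks.append((offset, block_index, block))
    return blocks
```

With a toy segmentation length of 4 bytes, `split_into_blocks(b"abcdefghij", 4)` yields `[(0, 0, b"abcd"), (4, 1, b"efgh"), (8, 2, b"ij")]`.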
In some embodiments, determining the data name of the data to be stored by using the preset name determination method includes:
determining the current storage scenario; and
determining the data block name of each data block by using the data block sequence number and the current storage scenario;
correspondingly, constructing the key-value information corresponding to the data to be stored by taking the data name as the key and the data to be stored as the value, and storing that key-value information in the solid state drive through the key-value storage interface of the solid state drive, includes:
constructing the key-value information corresponding to each data block by taking the data block name as the key and the data in the data block as the value; and
storing the key-value information corresponding to each data block in the solid state drive through the key-value storage interface of the solid state drive.
In some embodiments, determining the data name of the data to be stored by using the data block sequence number and the current storage scenario includes:
in response to the current storage scenario being file storage or object storage, determining the file number corresponding to the data to be stored, and determining the data block name of each data block by using the file number and the data block sequence number; or, in response to the current storage scenario being block storage, determining the logical unit number corresponding to the data to be stored, and determining the data block name of each data block by using the logical unit number and the data block sequence number.
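A minimal sketch of the storage-scenario-based naming described above, under stated assumptions: `owner_id` is the file number (inode number) for file or object storage and the logical unit number for block storage; the "+" in the description is read as concatenation, with a `:` separator so that, say, (1, 23) and (12, 3) do not collide; and the `ino`/`lun` prefixes are a hypothetical addition that keeps an inode and a LUN with the same number apart if both share one drive.

```python
from enum import Enum


class Scenario(Enum):
    FILE = "file"
    OBJECT = "object"
    BLOCK = "block"


def block_name(scenario: Scenario, owner_id: int, block_index: int) -> str:
    """Build a data block name from the owner identifier and the block index.

    owner_id is the inode number for file/object storage, or the LUN ID for
    block storage.
    """
    if scenario in (Scenario.FILE, Scenario.OBJECT):
        prefix = "ino"  # hypothetical prefix for file numbers
    elif scenario is Scenario.BLOCK:
        prefix = "lun"  # hypothetical prefix for logical unit numbers
    else:
        raise ValueError(f"unsupported scenario: {scenario}")
    # Assumption: "+" means concatenation; the separator keeps names unambiguous.
    return f"{prefix}{owner_id}:{block_index}"
```

For instance, block 0 of LUN 7 is named `"lun7:0"`, while block 23 of inode 1 is named `"ino1:23"`. Naming needs no lookup table, which is why this method is cheap to compute.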
In some embodiments, determining the data name of the data to be stored by using the preset name determination method includes:
calculating a hash value corresponding to each data block by using a preset hash algorithm; and
using the hash value as the data block name of the data block;
correspondingly, constructing the key-value information corresponding to the data to be stored by taking the data name as the key and the data to be stored as the value, and storing that key-value information in the solid state drive through the key-value storage interface of the solid state drive, includes:
constructing the key-value information corresponding to each data block by taking the data block name as the key and the data in the data block as the value; and
storing the key-value information corresponding to each data block in the solid state drive through the key-value storage interface of the solid state drive.
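The hash-based naming above reduces to hashing each block's content and using the digest as its name. The sketch below uses Python's standard `hashlib`; choosing SHA-256 as the default and the hex encoding of the digest are assumptions, and MD5 can be selected by name since the description lists it as an option.

```python
import hashlib


def hash_block_name(block: bytes, algorithm: str = "sha256") -> str:
    """Name a data block by the hex digest of its content.

    The default algorithm (SHA-256) and the hex encoding are illustrative
    choices; any algorithm name accepted by hashlib.new can be passed.
    """
    return hashlib.new(algorithm, block).hexdigest()
```

Because the name depends only on the content, two blocks with identical bytes always receive the same name.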
In some embodiments, the above data storage method further includes:
binding the data block sequence number of any data block to its data block name to form a mapping relationship; and
recording the mapping relationships between data block sequence numbers and data block names to form a mapping list of data block sequence numbers and data block names.
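The mapping list above is what allows data to be read back in order once block names no longer encode position (as with hash-based names). A minimal sketch, assuming the list is kept as an in-memory dictionary from sequence number to block name and that the key-value store is a plain dictionary; both representations, and the helper names `bind` and `read_back`, are hypothetical.

```python
from typing import Dict


def bind(mapping: Dict[int, str], block_index: int, block_name: str) -> None:
    """Record the binding between a block's sequence number and its name."""
    mapping[block_index] = block_name


def read_back(mapping: Dict[int, str], kv: Dict[str, bytes]) -> bytes:
    """Reassemble the original data by fetching blocks in sequence-number order."""
    return b"".join(kv[mapping[i]] for i in sorted(mapping))
```

Bindings may be recorded in any order; `read_back` sorts by sequence number before joining the block contents.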
In some embodiments, before the data to be stored is determined, the method further includes:
obtaining the data to be stored, and determining the target object storage device corresponding to the data to be stored; and
writing the data to be stored into the target object storage device;
correspondingly, determining the data to be stored includes:
extracting the data to be stored from the target object storage device.
A second aspect of the present application provides a data storage apparatus applied to a distributed storage system, including:
a data name determination module, configured to determine data to be stored and determine the data name of the data to be stored by using a preset name determination method, where the preset name determination method is a storage-scenario-based determination method or a hash-algorithm-based determination method;
a key-value information construction module, configured to construct key-value information corresponding to the data to be stored by taking the data name as the key and the data to be stored as the value; and
an information storage module, configured to store the key-value information corresponding to the data to be stored in the solid state drive through the key-value storage interface of the solid state drive.
A third aspect of the present application provides an electronic device, including:
one or more memories, configured to store computer-readable instructions; and
one or more processors, configured to execute the computer-readable instructions to implement the foregoing data storage method.
A fourth aspect of the present application provides a non-volatile computer-readable storage medium for storing computer-readable instructions, where the computer-readable instructions, when executed by one or more processors, implement the steps of the foregoing data storage method.
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Figure 1 is a flowchart of a data storage method provided in one or more embodiments of the present application;
Figure 2 is an overall architecture diagram of data storage provided in one or more embodiments of the present application;
Figure 3 is an overall architecture diagram of traditional distributed data storage;
Figure 4 is a flowchart of a specific data storage method provided in one or more embodiments of the present application;
Figure 5 is a flowchart of another specific data storage method provided in one or more embodiments of the present application;
Figure 6 is a schematic structural diagram of a data storage apparatus provided in one or more embodiments of the present application;
Figure 7 is a structural diagram of an electronic device provided in one or more embodiments of the present application.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In the prior art, when a key-value storage system is needed, database software has to run on the storage node, and data storage is completed through conversions across multiple software layers; the whole process involves many software layers, a complex system, and large resource overhead. The method proposed in this application no longer stores key-value information through software such as a database. Instead, it takes the data name as the key and the data as the value and stores them directly on a solid state drive that has a key-value storage interface, which reduces system complexity, the number of software layers, and system overhead.
An embodiment of the present invention discloses a data storage method applied to a distributed storage system. Referring to Figure 1, the method includes:
Step S11: Determine the data to be stored, and determine its data name by using a preset name determination method; the preset name determination method is a storage-scenario-based determination method or a hash-algorithm-based determination method.
In this embodiment, before the data to be stored is determined, the method may further include: obtaining the data to be stored and determining the target object storage device corresponding to it; and writing the data to be stored into the target object storage device. Correspondingly, determining the data to be stored includes extracting the data to be stored from the target object storage device. In this embodiment, the data to be stored is data previously written into an object storage device (OSD). In a specific implementation, the data to be stored may be data taken from a preset data pool and placed in the corresponding target object storage device.
Step S12: Construct key-value information corresponding to the data to be stored by taking the data name as the key and the data to be stored as the value.
In this embodiment, the method avoids the traditional distributed-storage approach of consuming the storage node's hardware resources, such as CPU and memory, and using database software to record where data is stored. Instead, once the data name (data ID) of the data to be stored is determined, the data name is taken directly as the key and the data as the value. This greatly shortens the software stack of distributed storage and reduces IO (Input/Output) latency, thereby improving system performance.
Step S13: Store the key-value information corresponding to the data to be stored in the solid state drive through the key-value storage interface of the solid state drive.
It can be understood that the solid state drive in this embodiment is a solid state drive with a key-value storage interface, hereinafter referred to as a KV-SSD. In a specific implementation, the solid state drive may also be SSD hardware that directly exposes a key-value interface as proposed in the NVMe 2.0 specification.
Figure 2 is an overall architecture diagram of data storage proposed by the present application. It shows that, after data is exchanged with the outside, the data can be placed in a data pool and written into the corresponding object storage device, and then written by the storage layer into a preset solid state drive with a key-value storage interface. In this process, the data name (data ID) can be used directly as the key and the data itself directly as the value, completing the construction of the key-value information.
Figure 3 is an overall architecture diagram of traditional distributed data storage. Comparing Figure 2 with Figure 3 shows that traditional distributed data storage uses database software (RocksDB in the figure) to process metadata. The present application does not use such database software; instead, a KV-SSD stores the key-value information directly in place of the database software, and metadata management is completed inside the KV-SSD, that is, inside the hardware, which effectively reduces the resource consumption of storage nodes and the complexity of the software layers.
It should be noted that this embodiment no longer uses database software, as traditional distributed storage methods do, to store key-value information. The method of the present invention is applicable as long as the key and value corresponding to the data to be stored meet the preset key-value storage interface specification.
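The applicability condition above (key and value must satisfy the key-value interface specification) can be checked before each write. In this sketch the size limits are placeholders chosen for illustration only; they are not taken from any specification, and the real limits would come from the drive's key-value interface. `fits_kv_spec` is a hypothetical helper name.

```python
# Placeholder limits for illustration only; they are NOT taken from any
# specification. Substitute the limits reported by the actual drive.
DEFAULT_MAX_KEY_BYTES = 255
DEFAULT_MAX_VALUE_BYTES = 4 * 1024 * 1024


def fits_kv_spec(key: bytes, value: bytes,
                 max_key_bytes: int = DEFAULT_MAX_KEY_BYTES,
                 max_value_bytes: int = DEFAULT_MAX_VALUE_BYTES) -> bool:
    """Return True if the key/value pair satisfies the assumed size limits.

    An empty key is rejected; the value may be empty.
    """
    return 0 < len(key) <= max_key_bytes and len(value) <= max_value_bytes
```

A check like this matters most when choosing the naming method and segmentation length: the value limit must be at least the segmentation length, and a hex-encoded hash name (64 characters for SHA-256) only fits if the drive's key limit allows it.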
In this embodiment, the data to be stored is first determined and its data name is determined by a preset name determination method, which is a storage-scenario-based method or a hash-algorithm-based method; key-value information is constructed by taking the data name as the key and the data to be stored as the value; and the key-value information is stored in the solid state drive through its key-value storage interface. In this way, the solution avoids consuming storage-node hardware resources such as CPU and memory as traditional distributed storage does, and does not need database software to store data. Instead, the data name is used as the key and the data as the value, and both are stored directly in a solid state drive with a key-value storage interface. The storage node does not need to manage metadata; metadata management is handed over to the KV-SSD's controller CPU. This reduces the number of software layers and the resource consumption of storage nodes, lowers system complexity and overhead, and efficiently improves system performance.
Figure 4 is a flowchart of a specific data storage method provided by an embodiment of the present application. Referring to Figure 4, the method includes:
Step S21: Determine the data to be stored, and obtain the segmentation length through a preset segmentation-length acquisition interface.
In a specific implementation, after the data to be stored is determined, it may be segmented and the resulting pieces processed. In this case, the segmentation length is first obtained through the preset segmentation-length acquisition interface and then used to segment the data. In a specific implementation, the segmentation length may be, for example, 2MB or 4MB.
Step S22: Segment the data to be stored by the segmentation length to obtain the data blocks corresponding to the data to be stored, and determine the segmentation offset corresponding to each data block.
In this step, the data to be stored is segmented using the segmentation length obtained through the preset segmentation-length acquisition interface, yielding the individual data blocks. The segmentation offset (offset) of each data block is also determined during segmentation.
Step S23: Take the remainder of each segmentation offset with respect to the segmentation length to obtain a remainder result for each data block, and determine each remainder result as the data block sequence number of the corresponding data block.
This step determines the data block sequence number of each data block; the sequence number may also be denoted data index. Specifically, the remainder of the segmentation offset with respect to the segmentation length may be taken; that is, when the segmentation length is 4MB, the data block sequence number is data index = offset % 4MB.
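As a worked check of the formula above, take 4MB as $2^{22} = 4194304$ bytes (an assumption about the unit) and let block $k$ start at offset $o_k = kL$ with $L = 4194304$. Read literally, the remainder is the same for every such block, while the integer quotient counts the blocks; this suggests the quotient is the intended sequence number, although that is an interpretation rather than something the text states.

```latex
o_k \bmod L = 0 \quad \text{for every } k, \qquad
\left\lfloor \frac{o_k}{L} \right\rfloor = k .

\text{Example } (k = 2):\quad
8388608 \bmod 4194304 = 0, \qquad
\left\lfloor \frac{8388608}{4194304} \right\rfloor = 2 .
```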
Step S24: Determine the current storage scenario, and determine the data block name of each data block by using the data block sequence number and the current storage scenario.
In this embodiment, determining the data name of the data to be stored by using the data block sequence number and the current storage scenario may include: if the current storage scenario is file storage or object storage, determining the file number corresponding to the data to be stored and determining the data block name of each data block from the file number and the data block sequence number; if the current storage scenario is block storage, determining the logical unit number corresponding to the data to be stored and determining the data block name from the logical unit number and the data block sequence number. That is, different data block naming methods can be used in different storage scenarios. For file storage or object storage, the file number (inode number) and the data block sequence number determine the data block name; in a specific implementation, the data block name can be written as data ID = inodenumber + offset % 4MB. Because the file number uniquely corresponds to the data to be stored, data IDs obtained this way can distinguish different files on the KV-SSD.
For block storage, the logical unit number (LUN ID) and the data block sequence number determine the data block name; in a specific implementation, the data block name can be written as data ID = LUN ID + offset % 4MB. Because the logical unit number uniquely corresponds to the data to be stored, data IDs obtained this way can distinguish data belonging to different volumes on the KV-SSD.
Step S25: Construct the key-value information corresponding to each data block by taking the data block name as the key and the data in the data block as the value.
For other, more specific processing of step S25, refer to the corresponding content disclosed in the foregoing embodiments; details are not repeated here.
Step S26: Store the key-value information corresponding to each data block in the solid state drive through the key-value storage interface of the solid state drive.
For other, more specific processing of step S26, refer to the corresponding content disclosed in the foregoing embodiments; details are not repeated here.
In this embodiment, after the data to be stored is obtained, it is segmented into data blocks, and the storage-scenario-based method is used to determine the name of each data block. Key-value information is then constructed with the data block name as the key and the data block data as the value, and stored in a solid state drive with a key-value storage interface. The storage-scenario-based method proposed in this embodiment is simple and computationally light, which reduces system operating overhead and complexity.
Figure 5 is a flowchart of another specific data storage method provided by an embodiment of the present application. Referring to Figure 5, the method includes:
Step S31: Determine the data to be stored, and obtain the segmentation length through a preset segmentation-length acquisition interface.
For other, more specific processing of step S31, refer to the corresponding content disclosed in the foregoing embodiments; details are not repeated here.
Step S32: Segment the data to be stored by the segmentation length to obtain the data blocks corresponding to the data to be stored.
For other, more specific processing of step S32, refer to the corresponding content disclosed in the foregoing embodiments; details are not repeated here.
Step S33: Calculate the hash value corresponding to each data block by using a preset hash algorithm.
In this embodiment, the hash value corresponding to each data block can be calculated by the hash-algorithm-based determination method. Hash algorithms include, but are not limited to, MD5 (Message-Digest Algorithm 5) and SHA (Secure Hash Algorithm).
Step S34: Use the hash value as the data block name of the data block.
In this embodiment, the data storage method may further include: binding the data block sequence number of any data block to its data block name to form a mapping relationship; and recording the mapping relationships between data block sequence numbers and data block names to form a mapping list. It should be noted that this method additionally requires maintaining the correspondence between the data block sequence number (data index) and the data block name (data ID), which increases the amount of computation compared with the storage-scenario-based method above. Its advantage, however, is that data blocks with identical content yield the same data ID, and because identical data is stored only once on the disk, the method provides deduplication.
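The deduplication effect described above can be illustrated with a small sketch. Assumptions: the key-value store is modeled as a plain dictionary standing in for the KV-SSD, SHA-256 is the chosen hash, the sequence number is taken as `offset // chunk_size`, and `store_deduplicated` is a hypothetical helper. Blocks with identical content map to the same key, so the store holds one copy, while the mapping list still records every sequence number.

```python
import hashlib
from typing import Dict


def store_deduplicated(data: bytes, chunk_size: int,
                       kv: Dict[str, bytes]) -> Dict[int, str]:
    """Segment data, name blocks by content hash, and store each name once.

    Returns the mapping list from sequence number to block name.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    mapping: Dict[int, str] = {}
    for offset in range(0, len(data), chunk_size):
        block = data[offset:offset + chunk_size]
        name = hashlib.sha256(block).hexdigest()
        # Identical content produces the same key, so only one copy is kept.
        kv.setdefault(name, block)
        # Assumption: sequence number = offset // chunk_size.
        mapping[offset // chunk_size] = name
    return mapping
```

For example, storing `b"AAAA" * 3 + b"BBBB"` with a 4-byte segmentation length produces four mapping entries but only two stored blocks; the original data is recovered by joining `kv[mapping[i]]` in sequence-number order.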
Step S35: Construct the key-value information of the data block by taking the data block name as the key and the data block data as the value.
For further details of step S35, refer to the corresponding content disclosed in the foregoing embodiments; they are not repeated here.
Step S36: Store the key-value information of the data block in the solid-state drive through the key-value storage interface of the solid-state drive.
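To make steps S35 and S36 concrete, the sketch below models the solid-state drive's key-value storage interface with a hypothetical in-memory `KVSSD` class exposing `put` and `get`. Real key-value SSDs expose their own interfaces (for example, devices implementing the NVMe Key Value command set), so these method names and the example keys are assumptions, not a real driver API.

```python
from typing import Dict, Optional

class KVSSD:
    """Hypothetical in-memory stand-in for a key-value SSD's put/get interface."""
    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)

def store_blocks(ssd: KVSSD, named_blocks: Dict[str, bytes]) -> None:
    """S35: key = data block name, value = block data; S36: put() each pair."""
    for name, data in named_blocks.items():
        ssd.put(name, data)

ssd = KVSSD()
store_blocks(ssd, {"1001_0": b"first", "1001_1": b"second"})
print(ssd.get("1001_1"))  # b'second'
```

Note that no separate metadata table or database is involved: the block name alone locates the data on the drive.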
For further details of step S36, refer to the corresponding content disclosed in the foregoing embodiments; they are not repeated here.
In this embodiment, after the data to be stored is obtained, it is segmented into data blocks, and the hash-algorithm-based determination method is used to determine the name of each data block. Key-value information is then constructed with the data block name as the key and the data block data as the value, and is stored in a solid-state drive that provides a key-value storage interface. Compared with the storage-scenario-based determination method described above, the hash-algorithm-based method requires more computation, but data blocks with identical content yield the same data ID; because identical data is stored on the disk only once, the method provides deduplication.
Referring to FIG. 6, an embodiment of the present application discloses a data storage apparatus, which may specifically include:
a data name determination module 11, configured to determine the data to be stored and to determine its data name using a preset name determination method, where the preset name determination method is either the storage-scenario-based determination method or the hash-algorithm-based determination method;
a key-value information construction module 12, configured to construct the key-value information of the data to be stored by taking the data name as the key and the data to be stored as the value; and
an information storage module 13, configured to store the key-value information of the data to be stored in the solid-state drive through the key-value storage interface of the solid-state drive.
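The three modules can be wired together as in the following sketch. The class and method names are hypothetical, the key-value SSD is modeled as a plain dictionary, and the name determination method is passed in as a function so that either the storage-scenario-based or the hash-based method could be plugged in; here an MD5 digest is used.

```python
import hashlib
from typing import Callable, Dict, Tuple

class DataStorageDevice:
    """Sketch of modules 11-13; the KV SSD is modeled as a plain dict (assumption)."""
    def __init__(self, name_method: Callable[[bytes], str]) -> None:
        self.name_method = name_method      # preset name determination method
        self.ssd: Dict[str, bytes] = {}     # stand-in for the KV storage interface

    def determine_name(self, data: bytes) -> str:           # module 11
        return self.name_method(data)

    def build_kv(self, data: bytes) -> Tuple[str, bytes]:   # module 12
        return self.determine_name(data), data

    def store(self, data: bytes) -> str:                     # module 13
        key, value = self.build_kv(data)
        self.ssd[key] = value
        return key

device = DataStorageDevice(lambda d: hashlib.md5(d).hexdigest())
key = device.store(b"abc")
print(key)  # 900150983cd24fb0d6963f7d28e17f72
```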
In the present application, the data to be stored is first determined, and its data name is determined using a preset name determination method, which is either the storage-scenario-based or the hash-algorithm-based determination method. Key-value information is then constructed with the data name as the key and the data to be stored as the value, and is stored in the solid-state drive through the drive's key-value storage interface. Unlike traditional distributed storage, this scheme does not consume the CPU, memory, and other hardware resources of the storage nodes for this purpose, and it does not require database software to store the data; instead, the data is written directly, keyed by its name, to a solid-state drive with a key-value storage interface. Because the storage nodes do not need to manage metadata, the software stack has fewer layers, resource consumption on the storage nodes is reduced, and system complexity and overhead are lowered, thereby improving system performance.
In some specific embodiments, the data storage apparatus further includes:
a segmentation length acquisition module, configured to obtain the segmentation length through a preset segmentation length acquisition interface;
a data segmentation module, configured to segment the data to be stored using the segmentation length to obtain the data blocks of the data to be stored, and to determine the segmentation offset of each data block; and
a data block sequence number determination module, configured to perform, for each data block, a remainder operation between its segmentation offset and the segmentation length, and to use the resulting remainder as the sequence number of that data block.
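A minimal sketch of the segmentation step, with hypothetical names. The text derives each sequence number from a remainder operation between the segmentation offset and the segmentation length D; however, for blocks cut at multiples of D, `offset % D` is 0 for every block, so this sketch instead assumes the sequence number is the quotient `offset // D`, which numbers the blocks 0, 1, 2 and so on. Treat that choice as an interpretation, not the stated formula.

```python
from typing import List, Tuple

def split_blocks(data: bytes, d: int) -> List[Tuple[int, int, bytes]]:
    """Split data into blocks of length d (the last block may be shorter).
    Returns (segmentation offset, sequence number, block) tuples."""
    if d <= 0:
        raise ValueError("segmentation length must be positive")
    blocks = []
    for offset in range(0, len(data), d):
        # Assumption: index = offset // d. A plain remainder (offset % d) would be 0
        # for every block-aligned offset and could not distinguish the blocks.
        blocks.append((offset, offset // d, data[offset:offset + d]))
    return blocks

for offset, index, block in split_blocks(b"abcdefghij", 4):
    print(offset, index, block)
# 0 0 b'abcd'
# 4 1 b'efgh'
# 8 2 b'ij'
```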
In some specific embodiments, the data name determination module 11 includes:
a scenario determination unit, configured to determine the current storage scenario;
a first data block name determination unit, configured to determine the data block name of each data block using its data block sequence number and the current storage scenario;
correspondingly, the key-value information construction module 12 and the information storage module 13 include:
a first key-value information construction unit, configured to construct the key-value information of the data block by taking the data block name as the key and the data block data as the value; and
a first information storage unit, configured to store the key-value information of the data block in the solid-state drive through the key-value storage interface of the solid-state drive.
In some specific embodiments, the first data block name determination unit includes:
a first scenario naming unit, configured to, if the current storage scenario is file storage or object storage, determine the file number corresponding to the data to be stored and determine the data block name of the data block using the file number and the data block sequence number; and
a second scenario naming unit, configured to, if the current storage scenario is block storage, determine the logical unit number (LUN) corresponding to the data to be stored and determine the data block name of the data block using the logical unit number and the data block sequence number.
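The naming units only state that the name is derived from the file (inode) number or the LUN together with the block sequence number. The sketch below assumes a hypothetical string format `inode<number>_<index>` or `lun<id>_<index>`, with hypothetical scenario labels `"file"`, `"object"` and `"block"`. Joining the parts with a separator, rather than adding the numbers arithmetically, keeps names unique: under plain addition, inode 1 with block 2 and inode 2 with block 1 would both produce 3.

```python
def scene_block_name(scene: str, owner_id: int, block_index: int) -> str:
    """Hypothetical naming: '<inode number or LUN ID>_<block sequence number>'."""
    if scene in ("file", "object"):
        prefix = f"inode{owner_id}"      # file or object storage: file (inode) number
    elif scene == "block":
        prefix = f"lun{owner_id}"        # block storage: logical unit number
    else:
        raise ValueError(f"unknown storage scenario: {scene}")
    return f"{prefix}_{block_index}"

print(scene_block_name("file", 1001, 2))  # inode1001_2
print(scene_block_name("block", 7, 0))    # lun7_0
```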
In some specific embodiments, the data name determination module 11 includes:
a hash value determination unit, configured to calculate the hash value of the data block using a preset hash algorithm;
a second data block name determination unit, configured to use the hash value as the data block name of the data block;
correspondingly, the key-value information construction module 12 and the information storage module 13 include:
a first key-value information construction unit, configured to construct the key-value information of the data block by taking the data block name as the key and the data block data as the value; and
a first information storage unit, configured to store the key-value information of the data block in the solid-state drive through the key-value storage interface of the solid-state drive.
In some specific embodiments, the data storage apparatus further includes:
a mapping relationship determination unit, configured to bind the data block sequence number of each data block to its data block name to form a mapping relationship; and
a mapping list determination unit, configured to record the mapping relationships between data block sequence numbers and data block names to form a mapping list.
In some specific embodiments, the data storage apparatus further includes:
an object storage device determination unit, configured to obtain the data to be stored and determine the target object storage device corresponding to it;
a data writing unit, configured to write the data to be stored into the target object storage device;
correspondingly, the data name determination module 11 includes:
a data extraction unit, configured to extract the data to be stored from the target object storage device.
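The source does not say how the target object storage device is chosen. Purely as an illustration, the sketch below assumes simple hash-modulo placement over a list of hypothetical `ObjectStorageDevice` objects (production systems such as Ceph use more elaborate placement, e.g. CRUSH). It then models the data writing unit and the data extraction unit as a put into, and a pop from, that device's pending buffer.

```python
import hashlib
from typing import Dict, List

class ObjectStorageDevice:
    """Hypothetical OSD that holds incoming objects before naming and KV storage."""
    def __init__(self, osd_id: int) -> None:
        self.osd_id = osd_id
        self.pending: Dict[str, bytes] = {}

def pick_target_osd(object_name: str, osds: List[ObjectStorageDevice]) -> ObjectStorageDevice:
    # Assumption: hash-modulo placement; the source does not specify the selection rule.
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    return osds[int.from_bytes(digest[:8], "big") % len(osds)]

osds = [ObjectStorageDevice(i) for i in range(3)]
target = pick_target_osd("photo.jpg", osds)   # object storage device determination unit
target.pending["photo.jpg"] = b"payload"      # data writing unit
data = target.pending.pop("photo.jpg")        # data extraction unit
print(data)  # b'payload'
```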
Further, an embodiment of the present application also discloses an electronic device. FIG. 7 is a structural diagram of an electronic device 20 according to an exemplary embodiment; the content of the figure should not be construed as any limitation on the scope of use of the present application.
FIG. 7 is a schematic structural diagram of an electronic device 20 provided by an embodiment of the present application. The electronic device 20 may specifically include one or more processors 21, one or more memories 22, a power supply 23, a display screen 24, an input/output interface 25, a communication interface 26, and a communication bus 27. The memory 22 stores computer-readable instructions, which are loaded and executed by the one or more processors 21 to implement the relevant steps of the data storage method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in this embodiment may specifically be an electronic computer.
In this embodiment, the power supply 23 provides the operating voltage for each hardware device of the electronic device 20. The communication interface 26 establishes a data transmission channel between the electronic device 20 and external devices; it may follow any communication protocol applicable to the technical solution of the present application, which is not specifically limited here. The input/output interface 25 obtains input data from, or outputs data to, the outside; its specific interface type may be selected according to application needs and is not specifically limited here.
In addition, the memory 22, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like. The resources stored on it may include an operating system 221, computer-readable instructions 222, and so on, stored either temporarily or permanently.
The operating system 221 manages and controls the hardware devices of the electronic device 20 and the computer-readable instructions 222, and may be Windows, Unix, Linux, or the like. Besides the computer-readable instructions for performing the data storage method executed by the electronic device 20 disclosed in any of the foregoing embodiments, the computer-readable instructions 222 may further include computer-readable instructions for performing other specific tasks.
Further, the present application also discloses a non-volatile computer-readable storage medium, which includes random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, magnetic disks, optical discs, or any other form of storage medium known in the technical field. When executed by one or more processors, the computer-readable instructions stored on it implement the data storage method disclosed above. For the specific steps of the method, refer to the corresponding content disclosed in the foregoing embodiments; they are not repeated here.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be cross-referenced. Because the apparatus disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for related details, refer to the description of the method. Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the disclosed embodiments can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the particular application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
The steps of the methods or algorithms described in connection with the disclosed embodiments may be implemented directly in hardware, in software modules executed by one or more processors, or in a combination of both. The software modules may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
Finally, it should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between such entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
The data storage method, apparatus, device, and storage medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help understand the method and core idea of the present invention. Meanwhile, those of ordinary skill in the art may make changes to the specific implementations and scope of application based on the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (20)
- A data storage method applied to a distributed storage system, comprising: determining data to be stored, and determining a data name of the data to be stored using a preset name determination method, wherein the preset name determination method is a storage-scenario-based determination method or a hash-algorithm-based determination method; constructing key-value information corresponding to the data to be stored by taking the data name as a key and the data to be stored as a value; and storing the key-value information corresponding to the data to be stored in a solid-state drive through a key-value storage interface of the solid-state drive.
- The data storage method according to claim 1, wherein before the data to be stored is determined, the method further comprises: obtaining the data to be stored and determining a target object storage device corresponding to the data to be stored; and writing the data to be stored into the target object storage device.
- The data storage method according to claim 2, wherein after the data to be stored is written into the target object storage device, the method further comprises: extracting the data to be stored from the target object storage device.
- The data storage method according to claim 1, wherein before the data name of the data to be stored is determined using the preset name determination method, the method further comprises: obtaining a segmentation length through a preset segmentation length acquisition interface; segmenting the data to be stored using the segmentation length to obtain the data blocks corresponding to the data to be stored, and determining a segmentation offset corresponding to each data block; and performing, for each data block, a remainder operation between its segmentation offset and the segmentation length to obtain a remainder result, and determining the remainder result of each data block as the data block sequence number of that data block.
- The data storage method according to claim 4, wherein performing, for each data block, the remainder operation between its segmentation offset and the segmentation length to obtain the remainder result, and determining the remainder result of each data block as the data block sequence number of that data block, comprises: calculating the data block sequence number based on the following formula: data index = offset % D; wherein data index is the data block sequence number, offset is the segmentation offset corresponding to the data block, D is the segmentation length, and % denotes the remainder (modulo) operation.
- The data storage method according to claim 4, wherein determining the data name of the data to be stored using the preset name determination method comprises: determining a current storage scenario; and determining the data block name of each data block using the data block sequence number and the current storage scenario; correspondingly, constructing the key-value information corresponding to the data to be stored by taking the data name as a key and the data to be stored as a value, and storing the key-value information corresponding to the data to be stored in the solid-state drive through the key-value storage interface of the solid-state drive, comprises: constructing the key-value information corresponding to the data block by taking the data block name as a key and the data block data in the data block as a value; and storing the key-value information corresponding to the data block in the solid-state drive through the key-value storage interface of the solid-state drive.
- The data storage method according to claim 6, wherein determining the data name of the data to be stored using the data block sequence number and the current storage scenario comprises: in response to the current storage scenario being file storage or object storage, determining a file number corresponding to the data to be stored, and determining the data block name of the data block using the file number and the data block sequence number; and in response to the current storage scenario being block storage, determining a logical unit number corresponding to the data to be stored, and determining the data block name of the data block using the logical unit number and the data block sequence number.
- The data storage method according to claim 6, wherein, in response to the current storage scenario being file storage or object storage, determining the file number corresponding to the data to be stored and determining the data block name of the data block using the file number and the data block sequence number comprises: determining the file number corresponding to the data to be stored, and calculating the data block name of each data block from the file number and the data block sequence number based on the following formula: data ID = inodenumber + offset % D; wherein data ID is the data block name, inodenumber is the file (inode) number, offset is the segmentation offset corresponding to the data block, D is the segmentation length, and % denotes the remainder (modulo) operation.
- The data storage method according to claim 6, wherein, in response to the current storage scenario being block storage, determining the logical unit number corresponding to the data to be stored and determining the data block name of the data block using the logical unit number and the data block sequence number comprises calculating the data block name based on the following formula: data ID = LUN ID + offset % D; wherein data ID is the data block name, offset is the segmentation offset corresponding to the data block, D is the segmentation length, LUN ID is the logical unit number, and % denotes the remainder (modulo) operation.
- The data storage method according to claim 4, wherein determining the data name of the data to be stored using the preset name determination method comprises: calculating a hash value corresponding to the data block using a preset hash algorithm; and using the hash value as the data block name of the data block; correspondingly, constructing the key-value information corresponding to the data to be stored by taking the data name as a key and the data to be stored as a value, and storing the key-value information corresponding to the data to be stored in the solid-state drive through the key-value storage interface of the solid-state drive, comprises: constructing the key-value information corresponding to the data block by taking the data block name as a key and the data block data in the data block as a value; and storing the key-value information corresponding to the data block in the solid-state drive through the key-value storage interface of the solid-state drive.
- The data storage method according to claim 10, wherein the preset hash algorithm comprises a message digest algorithm and a secure hash algorithm.
- The data storage method according to claim 10, wherein using the hash value as the data block name of the data block comprises: binding the data block sequence number corresponding to any data block to its data block name to obtain a target mapping relationship; and recording the target mapping relationship to obtain a mapping list between data block sequence numbers and data block names.
- The data storage method according to claim 12, wherein recording the target mapping relationship to obtain the mapping list between data block sequence numbers and data block names comprises: recording the target mapping relationship; calculating a target correspondence between the data block sequence number and the data block name according to the target mapping relationship; and obtaining the mapping list between data block sequence numbers and data block names according to the target correspondence.
- The data storage method according to claim 12, wherein after the target mapping relationship is recorded to obtain the mapping list between data block sequence numbers and data block names, the method further comprises: in response to duplicate data occurring, deleting the redundant duplicate data.
- The data storage method according to claim 12, wherein after the target mapping relationship is recorded to obtain the mapping list between data block sequence numbers and data block names, the method further comprises: in response to duplicate data occurring, retaining one copy of the target data for storage in the solid-state drive.
- The data storage method according to claim 10, further comprising: binding the data block sequence number corresponding to any data block to its data block name to form a mapping relationship; and recording the mapping relationship between the data block sequence number and the data block name to form a mapping list between data block sequence numbers and data block names.
- 根据权利要求1至16任一项所述的数据存储方法,其特征在于,所述确定待存储数据之前,还包括:The data storage method according to any one of claims 1 to 16, characterized in that before determining the data to be stored, it further includes:获取待存储数据,并确定与所述待存储数据对应的目标对象存储设备;及Obtain the data to be stored and determine the target object storage device corresponding to the data to be stored; and将所述待存储数据写入所述目标对象存储设备中;Write the data to be stored into the target object storage device;相应的,所述确定待存储数据,包括:Correspondingly, the determination of data to be stored includes:从所述目标对象存储设备中提取所述待存储数据。Extract the data to be stored from the target object storage device.
- 一种数据存储装置,其特征在于,应用于分布式存储系统,包括:A data storage device, characterized in that it is applied to a distributed storage system, including:数据名称确定模块,用于确定待存储数据,并利用预设名称确定方法确定所述待存储数据的数据名称;所述预设名称确定方法为基于存储场景的确定方法或基于哈希算法的确定方法;A data name determination module, used to determine the data to be stored, and determine the data name of the data to be stored using a preset name determination method; the preset name determination method is a determination method based on storage scenarios or a determination based on a hash algorithm method;键值信息构建模块,用于通过将所述数据名称确定为键以及将所述待存储数据确定为值的方式,构建所述待存储数据对应的键值信息;及A key value information construction module, configured to construct key value information corresponding to the data to be stored by determining the data name as a key and the data to be stored as a value; and信息存储模块,用于通过固态硬盘中的键值存储接口将所述待存储数据对应的所述键值信息存储至所述固态硬盘中。An information storage module is configured to store the key value information corresponding to the data to be stored in the solid state drive through a key value storage interface in the solid state drive.
- 一种电子设备,其特征在于,包括一个或多个处理器和一个或多个存储器;其中,所述一个或多个处理器执行所述一个或多个存储器中保存的计算机可读指令时实现如权利要求1至17任一项所述的数据存储方法。An electronic device, characterized in that it includes one or more processors and one or more memories; wherein, when the one or more processors execute computer-readable instructions stored in the one or more memories, the implementation The data storage method according to any one of claims 1 to 17.
- A non-volatile computer-readable storage medium, characterized in that it is configured to store computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, implement the data storage method according to any one of claims 1 to 17.
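The apparatus of claim 18, together with the object-storage pre-step of claim 17, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patentee's implementation. `KVSSD` is a hypothetical in-memory stand-in for a drive's key-value storage interface; a real KV-SSD would receive store/retrieve commands through its driver. The `<scenario>/<object>/<index>` naming format, the use of SHA-256 for the hash-based method, and the hash-modulo OSD placement rule are assumptions. Every class and function name is illustrative.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


class KVSSD:
    """Hypothetical stand-in for a solid state drive's key-value interface.

    A plain dict models the device; a real KV-SSD would be driven
    through its own store/retrieve commands instead.
    """

    def __init__(self) -> None:
        self._kv: dict[bytes, bytes] = {}

    def kv_store(self, key: bytes, value: bytes) -> None:
        self._kv[key] = value

    def kv_retrieve(self, key: bytes) -> bytes:
        return self._kv[key]


def name_by_scenario(scenario: str, object_id: str, index: int) -> str:
    # Assumed scenario-based naming scheme: "<scenario>/<object>/<index>".
    return f"{scenario}/{object_id}/{index}"


def name_by_hash(data: bytes) -> str:
    # Hash-based naming (SHA-256 assumed): the content digest is the name.
    return hashlib.sha256(data).hexdigest()


def build_kv(name: str, data: bytes) -> tuple[bytes, bytes]:
    # Key-value construction: data name -> key, data to be stored -> value.
    return name.encode("utf-8"), data


def store_to_ssd(ssd: KVSSD, data: bytes, scenario: Optional[str] = None,
                 object_id: str = "", index: int = 0) -> str:
    """Name the data, build its key-value pair, write it via the KV interface."""
    if scenario is not None:
        name = name_by_scenario(scenario, object_id, index)
    else:
        name = name_by_hash(data)
    key, value = build_kv(name, data)
    ssd.kv_store(key, value)
    return name


@dataclass
class ObjectStorageDevice:
    """Hypothetical OSD that receives data before it is moved to the SSD."""
    osd_id: int
    pending: list[bytes] = field(default_factory=list)


def pick_target_osd(object_id: str,
                    osds: list[ObjectStorageDevice]) -> ObjectStorageDevice:
    # Assumed placement rule: hash of the object ID modulo the OSD count.
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return osds[int.from_bytes(digest[:8], "big") % len(osds)]


def ingest(object_id: str, data: bytes,
           osds: list[ObjectStorageDevice]) -> ObjectStorageDevice:
    # Claim 17 pre-step: obtain the data and write it into the target OSD.
    target = pick_target_osd(object_id, osds)
    target.pending.append(data)
    return target


def drain(osd: ObjectStorageDevice, ssd: KVSSD) -> list[str]:
    # "Determining the data to be stored" = extracting it from the OSD,
    # then naming it by hash and storing it on the SSD as a key-value pair.
    names = []
    while osd.pending:
        names.append(store_to_ssd(ssd, osd.pending.pop(0)))
    return names


ssd = KVSSD()
osds = [ObjectStorageDevice(i) for i in range(3)]
target = ingest("obj-1", b"hello", osds)
stored_names = drain(target, ssd)
assert ssd.kv_retrieve(stored_names[0].encode("utf-8")) == b"hello"
```

In this sketch, hash-based naming makes the key depend only on content, so writing identical data twice yields the same key and a single stored copy. Scenario-based naming instead lets the caller encode context, such as the owning object and block index, into the key.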
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210462119.5A CN114579061B (en) | 2022-04-28 | 2022-04-28 | Data storage method, device, equipment and medium |
CN202210462119.5 | 2022-04-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023207132A1 true WO2023207132A1 (en) | 2023-11-02 |
Family ID: 81778039
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/138693 WO2023207132A1 (en) | 2022-04-28 | 2022-12-13 | Data storage method and apparatus, and device and medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114579061B (en) |
WO (1) | WO2023207132A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114579061B (en) * | 2022-04-28 | 2022-07-29 | 苏州浪潮智能科技有限公司 | Data storage method, device, equipment and medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130173853A1 (en) * | 2011-09-26 | 2013-07-04 | Nec Laboratories America, Inc. | Memory-efficient caching methods and systems |
CN112434015A (en) * | 2020-12-08 | 2021-03-02 | 新华三大数据技术有限公司 | Data storage method and device, electronic equipment and medium |
CN113051221A (en) * | 2021-03-31 | 2021-06-29 | 网易(杭州)网络有限公司 | Data storage method, device, medium, equipment and distributed file system |
CN114356921A (en) * | 2021-12-28 | 2022-04-15 | 中国农业银行股份有限公司 | Data processing method, device, server and storage medium |
CN114579061A (en) * | 2022-04-28 | 2022-06-03 | 苏州浪潮智能科技有限公司 | Data storage method, device, equipment and medium |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102722449B (en) * | 2012-05-24 | 2015-01-21 | 中国科学院计算技术研究所 | Key-Value local storage method and system based on solid state disk (SSD) |
US9934264B2 (en) * | 2015-06-02 | 2018-04-03 | Netapp, Inc. | Technique for reducing metadata stored in a memory of a node |
CN105912687B (en) * | 2016-04-19 | 2019-05-24 | 江苏物联网研究发展中心 | Magnanimity distributed data base storage unit |
US11644992B2 (en) * | 2016-11-23 | 2023-05-09 | Samsung Electronics Co., Ltd. | Storage system performing data deduplication, method of operating storage system, and method of operating data processing system |
US10769064B1 (en) * | 2017-12-20 | 2020-09-08 | Pliops Ltd | Method for retrieving key value pairs and groups of key value pairs |
CN111831208B (en) * | 2019-04-16 | 2023-04-14 | 中移(苏州)软件技术有限公司 | Information processing method and device, terminal equipment and storage medium |
KR102714982B1 (en) * | 2019-07-05 | 2024-10-10 | 삼성전자주식회사 | Storage device storing data based on key-value and operating method of the same |
CN111538461B (en) * | 2020-04-21 | 2023-04-07 | 招商局金融科技有限公司 | Data reading and writing method and device based on solid state disk cache and storage medium |
CN112214468B (en) * | 2020-10-18 | 2023-01-06 | 苏州浪潮智能科技有限公司 | Small file acceleration method, device, equipment and medium for distributed storage system |
CN113609090B (en) * | 2021-08-06 | 2024-06-18 | 杭州网易云音乐科技有限公司 | Data storage method and device, computer readable storage medium and electronic equipment |
CN113608699A (en) * | 2021-08-09 | 2021-11-05 | 北京金山云网络技术有限公司 | Data writing method and device and electronic equipment |
CN113806300B (en) * | 2021-09-23 | 2023-08-01 | 北京百度网讯科技有限公司 | Data storage method, system, device, equipment and storage medium |
2022
- 2022-04-28 CN CN202210462119.5A patent/CN114579061B/en active Active
- 2022-12-13 WO PCT/CN2022/138693 patent/WO2023207132A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
CN114579061B (en) | 2022-07-29 |
CN114579061A (en) | 2022-06-03 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US10698773B2 (en) | Replicating a source data set to a target data store | |
US8347050B2 (en) | Append-based shared persistent storage | |
US9274716B2 (en) | Systems and methods for hierarchical reference counting via sibling trees | |
EP3816783B1 (en) | Method and device for data migration | |
US8769225B2 (en) | Optimization of data migration between storage mediums | |
EP4030273A1 (en) | Data storage method and device | |
US8135918B1 (en) | Data de-duplication for iSCSI | |
US11221989B2 (en) | Tape image reclaim in hierarchical storage systems | |
US9116851B2 (en) | System and method for virtual tape library over S3 | |
US8478933B2 (en) | Systems and methods for performing deduplicated data processing on tape | |
CN111177143B (en) | Key value data storage method and device, storage medium and electronic equipment | |
EP4087212A1 (en) | Method and apparatus for cloning file system | |
CN113806300B (en) | Data storage method, system, device, equipment and storage medium | |
WO2023056728A1 (en) | Method and apparatus for reconstructing redundant arrays of independent drives, and device and medium | |
US12026069B2 (en) | Data storage volume recovery management | |
WO2023207132A1 (en) | Data storage method and apparatus, and device and medium | |
CN114063883B (en) | Data storage method, electronic device and computer program product | |
US10831624B2 (en) | Synchronizing data writes | |
CN112347044A (en) | Object storage optimization method based on SPDK | |
US11720551B1 (en) | Method and system for streaming data from portable storage devices | |
EP4160422A1 (en) | Method for using intermediate device to process data, computer system, and intermediate device | |
US20230251800A1 (en) | Modified file storage in hierarchical storage systems | |
WO2024051252A1 (en) | Data processing method and apparatus | |
US20240103984A1 (en) | Leveraging backup process metadata for data recovery optimization | |
WO2022242371A1 (en) | Secure data erase for tape storage |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22939942; Country of ref document: EP; Kind code of ref document: A1 |