WO2014089828A1 - Method for accessing a storage device, and storage device - Google Patents


Info

Publication number
WO2014089828A1
Authority
WO
WIPO (PCT)
Prior art keywords
key
physical block
block address
target key
storage device
Prior art date
Application number
PCT/CN2012/086667
Other languages
English (en)
French (fr)
Inventor
雷晓松
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201280002763.9A priority Critical patent/CN104054071A/zh
Priority to PCT/CN2012/086667 priority patent/WO2014089828A1/zh
Publication of WO2014089828A1 publication Critical patent/WO2014089828A1/zh


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • G06F3/0611Improving I/O performance in relation to response time
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0615Address space extension
    • G06F12/063Address space extension for I/O modules, e.g. memory mapped I/O
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1081Address translation for peripheral access to main memory, e.g. direct memory access [DMA]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0688Non-volatile semiconductor memory arrays

Definitions

  • The present invention relates to the field of data storage and management, and more particularly to a method for accessing a storage device and to a storage device.

Background

  • Storage devices are used to store data and programs in a computer system. As technology has developed, storage devices have diversified into various forms according to their storage media, including, for example, the hard disk drive (HDD) and non-volatile memory (NVM, Non-Volatile Memory). NVM can be further divided into the solid state drive (SSD, Solid State Disk), spin-transfer torque random-access memory (STT-RAM), ferroelectric random-access memory (FeRAM), phase change memory (PCM), resistive random-access memory (RRAM), and so on.
  • A central processing unit (CPU) usually generates the read and write commands, and access requires many complicated mechanisms, including but not limited to a complex storage block allocation mechanism, a complex buffer cache design, and a way to recover after a file system crash. Multiple address translations are also required to locate the physical space of the accessed storage device. This increases complexity, reduces the speed at which the computer accesses the storage device, and lowers the performance of the computer system. Therefore, there is a need for a method that can reduce the number of address translations when accessing a storage device.

Summary of the invention

  • The embodiments of the present invention provide a method for accessing a storage device, and a storage device, so as to solve the problem of an excessive number of address translations when accessing the storage device.
  • The first aspect provides a method for accessing a storage device, including: the storage device receiving an interface command including a target key; converting the target key to obtain a physical block address corresponding to the target key; and executing the interface command according to the acquired physical block address.
  • In a possible implementation manner, the method further includes: the storage device pre-storing a mapping table of keys and physical block addresses. Converting the target key to obtain the physical block address corresponding to the target key is then specifically: querying the mapping table according to the target key to obtain the physical block address corresponding to the target key.
  • In a possible implementation manner, the mapping table of keys and physical block addresses pre-stored by the storage device is a hash index table, which includes hash index numbers and a mapping list associated with each hash index number, where the mapping list includes a mapping relationship between at least one key and a physical block address. Converting the target key to obtain the physical block address corresponding to the target key is then specifically: hashing the target key to obtain the hash index number of the target key, and querying, in the mapping list associated with that hash index number in the hash index table, the mapping relationship between the at least one key and the physical block address, to obtain the physical block address corresponding to the target key.
  • In a possible implementation manner, the mapping table of keys and physical block addresses pre-stored by the storage device is a static mapping table, which includes keys and the physical block addresses corresponding to the keys. Converting the target key to obtain the physical block address corresponding to the target key is then specifically: querying the static mapping table according to the target key to obtain the physical block address corresponding to the target key.
  • In a possible implementation manner, the mapping table of keys and physical block addresses pre-stored by the storage device is a multi-way search tree, which includes a key-based multi-way search subtree and a mapping relationship between a key and a physical block address associated with each node in the subtree. Converting the target key to obtain the physical block address corresponding to the target key is then specifically: searching the multi-way search subtree for the node corresponding to the target key, and determining the physical block address corresponding to the target key according to the mapping relationship between the key and the physical block address associated with that node.
  • In a possible implementation manner, the physical block address corresponding to a key in the mapping table is obtained by hashing the key to obtain a hash value and taking the hash value modulo the upper limit of the physical block address; or by randomly searching for a free physical block and taking the physical block address of that free block as the physical block address corresponding to the key.
  • In a sixth possible implementation manner, converting the target key is specifically: performing a hash operation on the target key to obtain a hash value, and taking the hash value modulo the upper limit of the physical block address, thereby obtaining the physical block address corresponding to the target key; or randomly searching for a free physical block and taking the physical block address of that free block as the physical block address corresponding to the target key.
  • The interface command includes one of the following interface commands: Put(String Key, String Version optional, String Value); or Write(String Key, String Version optional, String Value); or Get(String Key, String Version optional); or Read(String Key, String Version optional); or Delete(String Key, String Version optional); or TRIM(String Key, String Version optional).
  • The interface command supported by the storage device is carried in one or more of the following protocols: the NVM Express protocol, the SCSI (Small Computer System Interface) protocol, the ATA (Advanced Technology Attachment) protocol, the SOP (SCSI over PCI-E, SCSI protocol messages carried over a PCI-E channel) protocol, the HTTP/REST (Hypertext Transfer Protocol/Representational State Transfer) protocol, the HTTP/SOAP (Hypertext Transfer Protocol/Simple Object Access Protocol) protocol, and the RPC (Remote Procedure Call) protocol.
  • A second aspect provides a storage device, including an interface, a controller, and a storage medium. The interface receives an interface command including a target key and sends the command to the controller; the controller converts the target key to obtain a physical block address corresponding to the target key, and executes the interface command on the storage medium according to the acquired physical block address.
  • In a possible implementation manner, the storage device further includes a memory, and the storage medium is further configured to store a mapping table of keys and physical block addresses. The controller is specifically configured to read the mapping table from the storage medium into the memory, query the mapping table in the memory according to the target key, and obtain the physical block address corresponding to the target key.
  • In a possible implementation manner, the storage device further includes a memory, and the memory is used to store a mapping table of keys and physical block addresses. The controller is specifically configured to query the mapping table in the memory according to the target key and obtain the physical block address corresponding to the target key.
  • In a possible implementation manner, the controller is specifically configured to perform a hash operation on the target key to obtain a hash value and take the hash value modulo the upper limit of the physical block address, thereby obtaining the physical block address corresponding to the target key; or to randomly search for a free physical block and take the physical block address of that free block as the physical block address corresponding to the target key.
  • the interface command received by the controller includes one or more of the following interface commands: Put ( String Key, String Version optional, String Value ); or Write ( String Key, String Version optional, String Value ); or Get ( String Key, String Version optional ); or Read ( String Key, String Version optional ); or Delete ( String Key, String Version Optional ); or TRIM ( String Key, String Version Optional ).
  • The storage device is one of the following devices: a mechanical hard disk (HDD), a solid state drive (SSD), a spin-transfer torque random-access memory (STT-RAM), a ferroelectric memory (FeRAM), a phase change memory (PCM), or a resistive random access memory (RRAM).
  • The physical channel supported by the storage device includes one or more of the following interfaces: a SAS (Serial Attached SCSI) interface, a SATA (Serial ATA) interface, a PCI-E (Peripheral Component Interconnect Express) interface, an Infiniband interface, an Ethernet interface, and a USB (Universal Serial Bus) interface.
  • The interface command supported by the storage device is carried in one or more of the following protocols: NVM Express protocol, SCSI protocol, ATA protocol, SOP protocol, HTTP/REST protocol, HTTP/SOAP protocol, and RPC protocol.
  • The foregoing technical solution uses a key-value storage manner: the storage device receives an interface command including a target key, converts the target key to obtain the physical block address corresponding to the target key, and then executes the interface command according to the obtained physical block address. This reduces the number of address translations and increases the speed at which the storage device is accessed in the computer system, thereby improving the overall performance of the computer system.
  • FIGS. 1A to 1D are diagrams of a method of accessing a storage device by key-value in the prior art.
  • FIG. 3 is a schematic flowchart of a method for accessing a storage device according to an embodiment of the present invention.
  • FIG. 4 is a schematic flow chart of a method for accessing a storage device according to another embodiment of the present invention.
  • FIGS. 5A to 5C are schematic diagrams of mapping relationships between a key and a physical block address established in embodiments of the present invention.
  • FIGS. 6A and 6B are schematic block diagrams of a storage device according to an embodiment of the present invention.

Detailed description
  • Cloud storage is a new concept extended and developed from the concept of cloud computing. It refers to a system that uses application software, through functions such as cluster applications, grid technology, or distributed file systems, to make a large number of different types of storage devices in a network work together to provide data storage and service access functions.
  • When a cloud computing system needs to be configured with a large number of storage devices, it becomes a cloud storage system; cloud storage is therefore a cloud computing system with data storage and management at its core.
  • NoSQL databases can be divided into Key-Value databases, Column Family databases, document databases, and graph databases according to different data models. The first three are based on a key-value organization data model.
  • the key-value data model is characterized in that, with the primary key as the center, the data entity is uniquely identified by the key (Key), and the system treats the value (Value) as a black box, without explanation, and there is no relationship between the data fields.
  • NoSQL is widely used in many fields such as the Internet, unstructured data backup, and search, especially cloud computing systems.
  • accessing the storage device includes the methods illustrated in Figures 1A through 1D.
  • accessing a database means accessing a storage device.
  • FIGS. 1A to 1D are diagrams of a method of accessing a storage device by key-value in the prior art.
  • A traditional solid state disk (SSD) externally provides a traditional block interface that is accessed by logical block address (LBA, Logic Block Address). An application specifies a logical block address on the SSD, that is, an LBA, usually by calling a conventional file system (such as Linux Ext3 or Ext4) through a POSIX (Portable Operating System Interface) interface.
  • When writing, given the LBA, the SSD controller looks for free blocks in the media space, writes the data, and refreshes the mapping relationship between the LBA space and the physical block address (PBA) space maintained by the controller.
  • Referring to FIG. 1C, a collection of physical addresses is called a physical space, also called a storage space.
  • The application accesses the interface by using the file access mode 11 or the block access mode 12. The content that the user needs to access is acquired at the specified LBA, or the file system allocates an LBA as the logical address of the content.
  • The NVM controller maintains the mapping relationship between the LBA and the PBA.
  • NVM media consist of multiple particles (chips) that can be accessed concurrently, which is a great advantage for I/O, but the file system imposes a heavy cost on the whole access process. If the device is accessed directly through block access, a large mapping between the key (Key) space and LBAs must also be maintained in the data management software layer, which requires an extra layer of translation and increases the complexity of the access process. Both reduce the overall access speed and the performance of the computer system.
  • FIG. 2 shows another method 20 of accessing a storage device by key-value in the prior art.
  • The values (Value) corresponding to different keys (names) in the namespace (name space) are distributed across the distributed storage nodes according to the key (Key).
  • The input <Key, Version optional, Value> can also be stored in a certain key-value data organization.
  • Several forms of local storage engine organize and store the <Key, Version optional, Value> input in different ways. The local storage engine is data management middleware that takes the key as input and finally converts it into an address format that the storage device can understand.
  • the above system usually applies for a large file, for example, 100 GB, and stores and reads data of the local storage medium in a large file according to a customized data organization.
  • This approach is similar to the "block access method" mentioned in Access Mode 12 of the prior art.
  • NVM read and write and compare operation commands are defined.
  • the Read (Read) and Write (Write) commands are used, and the required parameters include the LBA.
  • the data management layer developed by the user on the local storage node maintains a huge key (key) mapping with the LBA; in the NVM, the mapping relationship between the LBA and the PBA is performed by the NVM controller.
  • the embodiment of the invention provides a method for accessing a storage device, which can reduce the complexity when accessing the storage device.
  • The storage device receives an interface command that includes a target key, converts the target key to directly obtain the physical block address corresponding to the target key, and executes the interface command based on that physical block address.
  • Converting the target key to directly obtain the corresponding physical block address may be implemented as in FIG. 3 below, that is, a mapping table of keys and physical block addresses is stored in the storage device in advance and queried based on the target key to obtain the physical block address corresponding to the target key; or as in FIG. 4 below, that is, the physical block address corresponding to the target key is obtained by directly performing a calculation on the target key. See the description of the embodiments below for details.
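The difference between the prior-art path and the path proposed here can be sketched roughly as follows. This is only an illustration under stated assumptions: the three dictionaries, their contents, and the function names are hypothetical and do not come from the patent text.

```python
# Hypothetical tables used only to illustrate the number of translation steps.
host_key_to_lba = {"photo.jpg": 42}      # prior art: kept by host-side data management software
controller_lba_to_pba = {42: 7}          # prior art: kept by the NVM controller
device_key_to_pba = {"photo.jpg": 7}     # proposed: kept inside the storage device


def prior_art_resolve(key: str) -> int:
    """Two translations: Key -> LBA on the host, then LBA -> PBA in the controller."""
    lba = host_key_to_lba[key]
    return controller_lba_to_pba[lba]


def direct_resolve(key: str) -> int:
    """One translation: the device converts the Key straight to a PBA."""
    return device_key_to_pba[key]
```

Both functions reach the same physical block; the second simply does so with a single lookup performed inside the device.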
  • FIG. 3 is a schematic flow chart of a method 30 for accessing a storage device according to an embodiment of the present invention, including the following content.
  • the storage device receives an interface command including a target key (Key).
  • Interface commands include the usual read, write, delete, and other operational commands.
  • interface commands can be defined in the following format.
  • The write command format is: Put(String Key, String Version optional, String Value) or Write(String Key, String Version optional, String Value). The write command includes the target key (Key), optional version (Version) information, and the value (Value) to be written.
  • The read command format is: Get(String Key, String Version optional) or Read(String Key, String Version optional). The read command includes the target key (Key) and optional version (Version) information.
  • The delete command format is: Delete(String Key, String Version optional) or Trim(String Key, String Version optional). The delete command includes the target key (Key) and optional version (Version) information.
  • the comparison command can also be supported, and the format is: Compare(String Key, String Version optional, String Value). The value in the storage space specified by the key (Key) and the optional version (Version) is taken out and compared with the value (Value) parameter carried by the comparison command.
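As a rough illustration of the command set described above, the following sketch models the Put/Get/Delete/Compare commands as methods on a toy in-memory device. It is a minimal sketch, not the patent's implementation: the class name `KeyValueDevice` is hypothetical, a Python dictionary stands in for the physical medium, and the optional version is moved to the end of each signature because Python requires optional parameters last.

```python
from typing import Dict, Optional, Tuple


class KeyValueDevice:
    """Toy model of a device that accepts key-addressed interface commands."""

    def __init__(self) -> None:
        # (key, version) -> value; a dict stands in for the storage medium.
        self._store: Dict[Tuple[str, Optional[str]], str] = {}

    def put(self, key: str, value: str, version: Optional[str] = None) -> None:
        """Put/Write: store the value under the key and optional version."""
        self._store[(key, version)] = value

    def get(self, key: str, version: Optional[str] = None) -> Optional[str]:
        """Get/Read: return the stored value, or None if nothing is stored."""
        return self._store.get((key, version))

    def delete(self, key: str, version: Optional[str] = None) -> bool:
        """Delete/Trim: remove the entry; report whether it existed."""
        return self._store.pop((key, version), None) is not None

    def compare(self, key: str, value: str, version: Optional[str] = None) -> bool:
        """Compare: check the stored value against the value carried by the command."""
        return self._store.get((key, version)) == value
```

The caller never supplies an LBA; the key (with the optional version) is the only address input.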
  • Commands defined by the NVMe standard (Non-Volatile Memory Express, an interface protocol between a PCI-E SSD and a host, and the related standards organization), such as Write Uncorrectable, can also be supported; the address space used by the above commands is the address space specified by the key (Key) as the logical address input.
  • All LBA-related commands defined by the SCSI (small computer system interface) standard and by SOP (SCSI over PCI-E, carrying SCSI protocol messages on PCI-E channels) can also be supported, with the LBA space modified to the space specified by the Key.
  • In this way, the read and write commands defined in the existing standards, which take the LBA as the address input and rely on the NVM controller for address mapping, are changed to take the key (Key) as the address input, thereby avoiding the overhead of the file access mode or the block access mode.
  • the storage device in the embodiment of the present invention may have one or more of the following interfaces, such as a traditional hard disk interface, an Infiniband interface, an Ethernet interface, a PCI-E (peripheral component interconnected express) interface, and a USB interface.
  • Traditional hard disk interfaces can include SAS interfaces, SATA interfaces, and the like.
  • The physical channel can be a SAS interface, a SATA interface, a PCI-E interface, an Infiniband interface, or a USB interface; the command set protocol carried over it can be a standard defined by the NVM Express standards organization, a subset of the SCSI command set defined by T10, the ATA protocol, the SOP protocol, and so on.
  • A mapping table of keys (Key) and physical block addresses (PBA) may be pre-stored in the storage device, and the mapping table is queried based on the target key to obtain the physical block address corresponding to the target key.
  • FIG. 5A to FIG. 5C respectively illustrate a method for establishing a mapping relationship between a key (Key) and a physical block address (PBA) according to an embodiment of the present invention.
  • A hash (Hash) operation, for example the secure hash algorithm SHA-1, is performed on the key (Key), and the result of the hash operation is taken modulo the upper limit of the PBA address to obtain a PBA block number.
  • Hashing transforms an input of any length (also known as a pre-image) into a fixed-length output by means of a hashing algorithm; the output is the hash value. This conversion is a compression mapping: the hash value is usually much shorter than the input, and different inputs may hash to the same output, so the input cannot be uniquely determined from the hash value. That is, the mapping of keys (Key) to hash values can be a many-to-one relationship.
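The hash-then-modulo step can be sketched as below. This is a minimal sketch under assumptions: `key_to_pba` is a hypothetical name, and `pba_count` interprets the "upper limit of the PBA address" as the number of addressable physical blocks. SHA-1 is used only because the text names it as an example.

```python
import hashlib


def key_to_pba(key: str, pba_count: int) -> int:
    """Hash the key with SHA-1 and reduce the digest modulo the number of blocks."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()   # fixed-length 20-byte output
    return int.from_bytes(digest, "big") % pba_count      # block number in [0, pba_count)
```

Because the digest is reduced to a small range, distinct keys can land on the same block number, which is the many-to-one behaviour noted above.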
  • the specific instructions are as follows:
  • A hash index table is maintained in the storage device. The hash index table is divided into rows, and each row, referred to as a hash index row, includes the hash index number (HashIndex) of the row and a mapping list (Map_List). The hash index number (HashIndex) is the hash value obtained after a key is hashed. The mapping list (Map_List) may include one or more <Key, PBA> mapping relationships; hashing the Key in each of these mapping relationships yields the hash index number of the row, and each mapping relationship indicates that the value corresponding to the Key is placed in the storage unit pointed to by the PBA.
  • The process of establishing the hash index table is as follows. For any Key, such as Key2, a hash calculation is first performed on Key2 to obtain the hash index number HashIndexm corresponding to Key2; HashIndexm corresponds to the m-th hash index row of the hash index table. The physical block address (PBA) corresponding to Key2 is then obtained. It may be obtained by taking the hash index number HashIndexm modulo the upper limit of the PBA address, the result of the modulo operation being the PBA corresponding to Key2, such as PBA6; or by randomly finding a free physical block in the entire PBA space and using it as the PBA corresponding to Key2, such as PBA6.
  • The mapping relationship between Key2 and PBA6 is inserted as a node into the Map_List (here, Map_Listm) corresponding to the hash index number (here, HashIndexm), where it is recorded as <Key2, PBA6>, indicating that the value corresponding to Key2 is placed in the storage unit pointed to by PBA6. The above operation is performed on each Key, and the hash index number of each Key and its mapping relationship <Key, PBA> are stored in the hash index table in this manner.
  • For example, when another key, Key_N, also hashes to HashIndexm, a new node is inserted in the Map_Listm corresponding to the hash index row HashIndexm, and the mapping relationship <Key_N, PBA_N> is recorded in the node, indicating that the value corresponding to Key_N is placed in the storage unit pointed to by PBA_N. Therefore, in each hash index row, the mapping list Map_List may contain one or more <Key, PBA> mappings corresponding to the same hash index number (HashIndex).
  • Taking the hash index number modulo the upper limit of the PBA address may cause collisions; that is, different hash index numbers may yield the same result after the modulo operation. To resolve a collision, another hash calculation can be performed on Hash(Key) before taking the modulo again, that is, Hash(Hash(Key)) % (PBA address upper limit), and this is repeated until a free PBA block is obtained.
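The rehash-on-collision idea can be sketched as a loop that keeps hashing the previous digest until the resulting block number is free. This is a sketch under assumptions, not the patent's exact procedure: `find_free_pba`, the `used` set of occupied block numbers, and the `max_attempts` bound are all hypothetical additions, with the bound there only so the loop cannot run forever when no block is free.

```python
import hashlib
from typing import Set


def find_free_pba(key: str, pba_count: int, used: Set[int], max_attempts: int = 64) -> int:
    """Rehash the digest until (digest % pba_count) names a block that is not in use."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    for _ in range(max_attempts):
        pba = int.from_bytes(digest, "big") % pba_count
        if pba not in used:
            return pba
        # Collision: hash the previous hash again, i.e. Hash(Hash(Key)), and retry.
        digest = hashlib.sha1(digest).digest()
    raise RuntimeError("no free physical block found within the attempt limit")
```

A real controller would also need a fallback, such as the random search for a free block that the text mentions, when rehashing keeps colliding.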
  • Each <Key, PBA> record in the Map_List of the hash index row corresponding to the hash index number is followed by a pointer to the next <Key, PBA> record in the Map_List, so that the multiple mappings in the Map_List are chained into a linked list by pointers.
  • Based on the hash index table, after receiving an interface command the storage device performs a hash operation on the target key in the interface command to obtain the hash index number of the target key, and then queries, in the mapping list associated with that hash index number in the hash index table, the mapping relationship between the at least one key and the physical block address, thereby acquiring the physical block address corresponding to the target key.
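The hash index table and its lookup can be sketched as follows. Several choices here are assumptions made for illustration: the class name `HashIndexTable` is hypothetical, the hash value is reduced modulo a fixed number of rows so the table stays bounded (the text uses the hash value itself as the index number), and a Python list stands in for the pointer-linked Map_List.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


class HashIndexTable:
    """Rows keyed by hash index number; each row holds a list of (Key, PBA) pairs."""

    def __init__(self, rows: int) -> None:
        self._rows = rows
        self._table: Dict[int, List[Tuple[str, int]]] = {}

    def _index(self, key: str) -> int:
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % self._rows

    def insert(self, key: str, pba: int) -> None:
        """Record <Key, PBA> in the mapping list of the key's hash index row."""
        map_list = self._table.setdefault(self._index(key), [])
        for i, (existing, _) in enumerate(map_list):
            if existing == key:
                map_list[i] = (key, pba)   # overwrite an existing mapping
                return
        map_list.append((key, pba))

    def lookup(self, key: str) -> Optional[int]:
        """Hash the target key, then scan that row's mapping list for the key."""
        for existing, pba in self._table.get(self._index(key), []):
            if existing == key:
                return pba
        return None
```

With `rows=1` every key falls into the same row, which makes it easy to see several <Key, PBA> pairs sharing one hash index number.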
  • The hash index table, used as the mapping table of keys and PBAs in this embodiment of the present invention, may be generated at runtime and kept in the memory.
  • The mapping relationship may also be persisted to a designated area of the storage medium, which becomes an index area.
  • The hardware characteristics of NVM limit the number of erase or read/write operations it can sustain. Scattering data effectively into different PBA spaces by hashing meets the goal of wear leveling and so increases NVM life. Moreover, because of the way keys are hashed, the randomness and concurrency of the NVM particles can be fully utilized to improve concurrency.
  • The hardware characteristics of the HDD favour sequential reads. Hashing effectively scatters data into different PBA spaces. Although this breaks the sequential-read characteristic of the hard disk, the value storage area can be stored across tracks, taking into account the difference in access speed between the inner and outer parts of the disk, so as to make comprehensive use of hard disk performance. In a large distributed environment, it can balance the load across accesses to multiple hard disks and achieve a global performance balance.
  • mapping relationship between the key (Key) and the PBA may also be established in a static mapping manner.
  • For any Key, the PBA corresponding to the Key is determined. As described in the foregoing embodiment, the Key may be hashed to obtain a hash value, which is then taken modulo the upper limit of the PBA address to obtain the PBA corresponding to the Key; or a free physical block may be randomly found in the PBA space and used as the PBA corresponding to the Key. In this embodiment of the present invention, assume that the PBA corresponding to Key1 is PBA1; the mapping relationship <Key1, PBA1> is recorded in the index record. If the value corresponding to a new Key needs to be written later, the same process can be used.
  • the storage device queries the static mapping table according to the target key to obtain a physical block address corresponding to the target key.
  • A mapping relationship between a key (Key) and a PBA may also be established by using a multi-way search tree.
  • The multi-way search tree may include a key-based multi-way search subtree and a mapping relationship between a key and a physical block address associated with each node in the multi-way search subtree.
  • The multi-way search subtree can be implemented by using a B+Tree.
  • the B+Tree technique keeps the keys in a stable sorted order and gives key insertion and modification a relatively stable logarithmic time complexity.
  • each node in a B+Tree can contain a large number of keys and branches, depending on the actual situation; this reduces the depth of the tree, which means that finding an element requires reading only a few nodes from a storage device, such as an external disk, into memory, so the data being looked up is reached quickly.
  • the B+Tree method also allows data to be hashed into the NVM address space to increase NVM lifetime.
  • first, assume that each node allocated by the multi-way search tree has room for 4 units. Assume that the first 4 insertions are 2, 10, 15, and 20, corresponding to 4 keys (Key). The 5th insertion is 3, but the 4 units of the allocated node have already been used up, so a new node, which also contains 4 units, is allocated. Because 3 is larger than 2 and smaller than 10, it is inserted into the node of the subtree located between 2 and 10.
  • subsequent insertions proceed in a similar way, gradually building up a hierarchical multi-way tree structure.
  • the PBA corresponding to the key may be obtained in the manner described in the foregoing embodiment: the key may be hashed and the resulting hash value taken modulo the upper limit of the PBA address to obtain the PBA corresponding to the key, or a free physical block may be found at random in the PBA space and used as the PBA corresponding to the key. The key of each unit in the B+Tree is then associated with the mapping relationship <Key, PBA> corresponding to that key, as shown in the list on the right side of FIG. 5C.
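  • for brevity, FIG. 5C illustrates only a two-level tree structure, but the embodiment of the present invention is not limited to this.
  • in the end, every key in the node space of the B+Tree has a one-to-one correspondence with a PBA address in the PBA space; that is, the mapping relationship between the key and the PBA of the embodiment of the present invention is generated by the above method.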
  • based on the multi-way search tree, after receiving an interface command the storage device searches its multi-path search subtree for the node corresponding to the target key, and determines the physical block address corresponding to the target key according to the mapping relationship between the key and the physical block address associated with that node.
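The lookup step can be illustrated with a simplified multi-way search tree in which every key stored in a node carries its own PBA. This is a sketch under assumptions, not a full B+Tree: node allocation and splitting are left out, and the `Node` layout, the `find_pba` helper, and the PBA numbers are hypothetical.

```python
from bisect import bisect_left


class Node:
    """One node of a simplified multi-way search tree (hypothetical layout)."""

    def __init__(self, keys, pbas, children=None):
        self.keys = keys    # sorted keys held by this node
        self.pbas = pbas    # pbas[i] is the PBA mapped to keys[i]
        # children[i] covers keys between keys[i-1] and keys[i]
        self.children = children or [None] * (len(keys) + 1)


def find_pba(node, target_key):
    """Walk down the tree and return the PBA mapped to target_key, or None."""
    while node is not None:
        i = bisect_left(node.keys, target_key)
        if i < len(node.keys) and node.keys[i] == target_key:
            return node.pbas[i]
        node = node.children[i]
    return None


# Tree from the insertion example: root holds 2, 10, 15, 20; key 3 sits in
# the child between 2 and 10. The PBA values are made up for illustration.
child = Node([3], [7])
root = Node([2, 10, 15, 20], [4, 9, 1, 6], [None, child, None, None, None])
```

With this tree, `find_pba(root, 3)` descends into the child between 2 and 10, while `find_pba(root, 15)` is answered at the root.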
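  • for example, according to the physical block address (PBA) obtained by the query: for a write command, the data is written to the storage unit pointed to by the PBA corresponding to the key; for a read command, the data stored on the storage unit pointed to by the PBA corresponding to the key is returned; for a delete command, the data stored on the storage unit pointed to by the PBA corresponding to the key is released.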
  • the method of the embodiment of the present invention further includes returning an execution result for the interface command: for a write command, information indicating whether the write operation succeeded; for a read command, the data stored on the storage unit pointed to by the physical block address (PBA) corresponding to the key (Key); for a delete command, information indicating whether the delete operation succeeded.
  • in the embodiment of the present invention, data is stored in a key-value manner: the storage device receives the interface command including the target key, converts the target key to obtain the PBA corresponding to the target key, and then executes the interface command according to the acquired PBA.
  • the embodiment of the present invention avoids the multi-step conversion Key <-> file <-> LBA <-> PBA, or Key <-> LBA <-> PBA; such multi-step conversion requires multiple interactions among the processor, the file system, and the storage device.
  • the corresponding PBA is obtained directly from the Key, which reduces the number of address translations, reduces the complexity of the management software that maintains address translation, reduces the performance loss of the computer system, and increases the speed of accessing the storage device in the computer system, thereby improving the overall performance of the computer system.
  • FIG. 4 is a schematic flowchart of a method 40 for accessing a storage device according to another embodiment of the present invention.
  • in this embodiment, the mapping table of keys and physical block addresses described in the foregoing embodiment need not be stored in the storage device in advance.
  • instead, after an interface command containing the target key is received, the physical block address (PBA) corresponding to the target key is obtained by direct calculation from the target key. The method specifically includes the following content.
  • the storage device receives an interface command including a target key.
  • Interface commands include the usual read, write, delete, and other operational commands.
  • interface commands can be defined in the following format.
  • the write command format is: Put(String Key, String Version optional, String Value) or Write(String Key, String Version optional, String Value).
  • the write command includes the target key (Key), optional version (Version) information, and the value to be written (Value).
  • the read command format is: Get(String Key, String Version optional) or Read(String Key, String Version optional), and the read command includes a target key (Key) and an optional version (Version) information.
  • the format of the delete command is: Delete(String Key, String Version optional) or Trim(String Key, String Version optional).
  • the delete command contains the target key (Key) and optional version (Version) information.
  • the comparison command can also be supported, and the format is: Compare(String Key, String Version optional, String Value). The value in the storage space specified by the key (Key) and the optional version (Version) is taken out and compared with the value (Value) parameter carried by the comparison command.
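The four command shapes above (write, read, delete, compare) can be modeled roughly as follows. This is a minimal sketch under assumptions: the `Command` dataclass, the `execute` function, and the dictionary standing in for the device's storage space are hypothetical names introduced for illustration; the text defines only the command signatures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Command:
    op: str                          # "put", "get", "delete", or "compare"
    key: str                         # target key (Key)
    version: Optional[str] = None    # optional version (Version)
    value: Optional[str] = None      # value (Value), used by put and compare


def execute(store, cmd):
    """Run one command against `store`, a dict standing in for the storage space."""
    slot = (cmd.key, cmd.version)    # space designated by key and optional version
    if cmd.op == "put":
        store[slot] = cmd.value
        return True
    if cmd.op == "get":
        return store.get(slot)
    if cmd.op == "delete":
        existed = slot in store
        store.pop(slot, None)
        return existed
    if cmd.op == "compare":
        # Fetch the stored value and compare it with the carried Value parameter.
        return slot in store and store[slot] == cmd.value
    raise ValueError("unknown command: " + cmd.op)
```

Keeping the version inside the slot identity means two versions of the same key occupy different slots, which is one plausible reading of "the storage space specified by the key and the optional version".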
  • optionally, the Dataset Management and Flush commands and the Write Uncorrectable command defined by NVMe (Non-Volatile Memory Express, the interface protocol between a PCI-E SSD and the host, and the related standards organization) are also supported.
  • the address space used by the above commands is the address space specified based on the key (Key) as the logical address input.
  • in addition, all LBA-related commands defined by the SCSI (Small Computer System Interface) standard and by SOP (SCSI over PCI-E, which carries SCSI protocol messages over a PCI-E channel) can also be supported, with the LBA space changed to a space specified based on the Key.
  • on the NVM interface, this changes the approach of the existing standards, in which read, write, and other commands take an LBA as the address input and the NVM controller performs the address mapping; instead, the key (Key) is used as the address input, which avoids the overhead of the file access mode or the block access mode.
  • the storage device in the embodiment of the present invention may have one or more of the following interfaces, such as a traditional hard disk interface, an Infiniband interface, an Ethernet interface, a PCI-E (Peripheral Component Interconnect Express) interface, and a USB interface.
  • Traditional hard disk interfaces can include SAS interfaces, SATA interfaces, and the like.
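  • depending on the specific entity: for an NVM, the physical channel can be a SAS interface, a SATA interface, a PCI-E interface, an Infiniband interface, a USB interface, and so on; the command set protocol carried on it can be a standard defined by the NVM Express standards organization, a subset of the SCSI command set defined by T10, the ATA protocol, the SOP protocol, and so on.
  • for a mechanical hard disk and hard disks extended from it, the physical channel can be a SAS interface, a SATA interface, an Infiniband interface, an Ethernet interface, a PCI-E interface, a USB interface, and so on; the command set protocol carried on it is the SCSI command set defined by T10, of which a separate subset may be used, or the ATA protocol, and so on.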
  • after the interface command including the target key is received, the physical block address (PBA) corresponding to the target key is obtained by direct calculation from the target key.
  • for a write command, data is written to the storage unit pointed to by the physical block address (PBA) corresponding to the key; for a read command, the data stored on the storage unit pointed to by the PBA corresponding to the key is returned; for a delete command, the data stored on the storage unit pointed to by the PBA corresponding to the key is released.
  • the method of the embodiment of the present invention further includes returning an execution result for the interface command: for a write command, information indicating whether the write operation succeeded; for a read command, the data stored on the storage unit pointed to by the physical block address (PBA) corresponding to the key (Key); for a delete command, information indicating whether the delete operation succeeded.
  • the storage device receives an interface command including a target key, and then performs calculation according to the target key to obtain a physical block address (PBA) corresponding to the target key, and then executes an interface command according to the acquired PBA.
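The direct calculation can be sketched as a hash of the target key reduced modulo the upper limit of the PBA address. SHA-1 is used because the description names it as one possible hash algorithm; the function name `key_to_pba` and the big-endian integer conversion are assumptions made for this illustration, and collisions between different keys are not handled here.

```python
import hashlib


def key_to_pba(target_key: str, pba_upper_limit: int) -> int:
    """Compute a PBA directly from the key: hash it, then take the modulus."""
    digest = hashlib.sha1(target_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % pba_upper_limit
```

Because the result depends only on the key and the address limit, no mapping table has to be stored or queried; the same key always resolves to the same block.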
  • the embodiment of the present invention avoids the multi-step conversion Key <-> file <-> LBA <-> PBA, or Key <-> LBA <-> PBA; such multi-step conversion requires multiple interactions among the processor, the file system, and the storage device.
  • the corresponding PBA is obtained directly from the Key, which reduces the number of address translations, reduces the complexity of the management software that maintains address translation, reduces the performance loss of the computer system, and increases the speed of accessing the storage device in the computer system, thereby improving the overall performance of the computer system.
  • FIG. 6A is a schematic block diagram of a storage device 60 in accordance with an embodiment of the present invention.
  • the storage device 60 includes an interface 61, a controller 62, and a storage medium 63.
  • the interface 61 receives an interface command including the target key and transmits it to the controller 62.
  • the controller 62 converts the target key to obtain a physical block address corresponding to the target key; and executes the interface command on the storage medium according to the acquired physical block address.
  • FIG. 6B is a schematic block diagram of a storage device 60 in accordance with another embodiment of the present invention.
  • the storage device provided by the embodiment of the present invention further includes a memory 64.
  • the mapping table of keys and physical block addresses may be stored in the memory 64 in advance; after receiving the interface command from the interface 61, the controller 62 queries the mapping table in the memory 64 according to the target key carried in the interface command to obtain the physical block address corresponding to the target key.
  • in order to prevent the mapping table from being lost when the memory 64 loses power, the mapping table may also be stored in a designated area of the storage medium 63; the controller 62 reads the mapping table stored in the storage medium 63 into the memory 64 and queries it according to the target key carried in the interface command to obtain the physical block address corresponding to the target key.
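Persisting the mapping table to a designated area and reading it back into memory could look like the sketch below. The JSON encoding and the function names `persist_mapping` and `load_mapping` are assumptions chosen for illustration; the text does not specify an on-medium format for the table.

```python
import json


def persist_mapping(mapping: dict) -> bytes:
    """Serialize the in-memory key -> PBA table for a designated area of the medium."""
    return json.dumps(mapping, sort_keys=True).encode("utf-8")


def load_mapping(index_area: bytes) -> dict:
    """Read the stored table back into memory, for example after a power loss."""
    return json.loads(index_area.decode("utf-8"))
```

After `load_mapping`, a lookup such as `table.get(target_key)` runs entirely in memory, as in the embodiment where the table was held in memory from the start.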
  • for the structure of the mapping table, refer to FIG. 5A, 5B, or 5C and the corresponding embodiments; details are not described herein again.
  • the storage medium 63 is configured to store data, and the controller 62 performs the interface command on the storage medium 63 according to the obtained physical block address.
  • for a write command, the controller 62 instructs that the data carried in the write command be written to the storage unit in the storage medium 63 pointed to by the physical block address corresponding to the target key;
  • for a read command, the controller 62 instructs that the stored data be read from the storage unit in the storage medium 63 pointed to by the physical block address corresponding to the target key;
  • for a delete command, the controller 62 instructs that the data stored on the storage unit in the storage medium 63 pointed to by the physical block address corresponding to the target key be released.
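The controller behavior for the three commands can be sketched with a list standing in for the storage medium 63 and a dictionary standing in for the mapping table in the memory 64. The `StorageDevice` class and its first-free-block allocation are assumptions made for brevity; the description obtains the PBA by hash-and-modulus or by picking a free block at random.

```python
class StorageDevice:
    """Hypothetical model: a controller executing key-addressed commands."""

    def __init__(self, block_count):
        self.medium = [None] * block_count    # one slot per physical block
        self.mapping = {}                     # key -> PBA table held in memory
        self.free = list(range(block_count))  # unassigned physical blocks

    def put(self, key, value):
        """Write command: returns whether the write succeeded."""
        if key not in self.mapping:
            if not self.free:
                return False
            self.mapping[key] = self.free.pop(0)
        self.medium[self.mapping[key]] = value
        return True

    def get(self, key):
        """Read command: returns the data stored at the key's PBA, or None."""
        pba = self.mapping.get(key)
        return None if pba is None else self.medium[pba]

    def delete(self, key):
        """Delete command: releases the block and reports success."""
        pba = self.mapping.pop(key, None)
        if pba is None:
            return False
        self.medium[pba] = None
        self.free.append(pba)
        return True
```

Each method returns the kind of execution result the description lists: a success flag for write and delete, and the stored data for read.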
  • the method of the embodiment of the present invention further includes returning an execution result for the interface command: for a write command, information indicating whether the write operation succeeded; for a read command, the data stored on the storage unit pointed to by the physical block address (PBA) corresponding to the key (Key); for a delete command, information indicating whether the delete operation succeeded.
  • the controller 62 may be an SoC (System on Chip), an ASIC (Application Specific Integrated Circuit), a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), a general purpose processor, and so on.
  • the storage device stores data in a key-value manner: the storage device receives an interface command including a target key, converts the target key to obtain the physical block address corresponding to the target key, and then executes the interface command according to the acquired physical block address.
  • the embodiment of the present invention avoids the multi-step conversion Key <-> file <-> LBA <-> PBA, or Key <-> LBA <-> PBA; such multi-step conversion requires multiple interactions among the processor, the file system, and the storage device.
  • the corresponding PBA is obtained directly from the Key, which reduces the number of address translations, reduces the complexity of the management software that maintains address translation, reduces the performance loss of the computer system, and increases the speed at which the computer system accesses the storage device. This improves the overall performance of the computer system.
  • as a different embodiment, the mapping table of keys and physical block addresses need not be stored in advance in the storage medium of the storage device; instead, after receiving the interface command including the target key, the controller 62 performs a calculation directly on the target key to obtain the physical block address (PBA) corresponding to the target key.
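  • the interface command received by the controller 62 may include one or more of the following interface commands: Put(String Key, String Version optional, String Value); Write(String Key, String Version optional, String Value); Get(String Key, String Version optional); Read(String Key, String Version optional); Delete(String Key, String Version optional); TRIM(String Key, String Version optional).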
  • the storage device 60 can be combined with different storage media by the controller 62 to form a variety of storage devices.
  • the storage medium may be a disk medium or a solid state electronic storage medium or the like.
  • the storage device 60 may be a mechanical hard disk (HDD), a solid state drive (SSD), a spin-transfer torque random-access memory (STT-RAM), a ferroelectric memory (FeRAM), a phase change memory (PCM), a resistive random-access memory (RRAM), and so on.
  • the physical channel that the storage device 60 can support includes one or more of the following interfaces: a SAS interface, a SATA interface, a PCI-E interface, an Infiniband interface, an Ethernet interface, and a USB interface.
  • the interface commands that the storage device 60 can support are carried in one or more of the following protocols: NVM Express protocol, SCSI protocol, ATA protocol, SOP protocol, HTTP/REST protocol, HTTP/SOAP protocol, and RPC protocol.
  • the storage device 60 implements the method 30 or the method 40; it reduces the number of address translations either by querying the stored mapping relationship between keys and physical block addresses or by performing a calculation directly on the target key, thereby improving the speed of accessing the storage device in the computer system.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a division by logical function.
  • in actual implementation there may be other ways of dividing; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be electrical, mechanical or otherwise.
  • the units described as separate components may or may not be physically separate.
  • the components displayed as units may or may not be physical units; that is, they may be located in one place or may be distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions may be stored in a computer readable storage device if implemented in the form of a software functional unit and sold or used as a standalone product.
  • the technical solution of the present invention in essence, or the part that contributes to the prior art, or part of the technical solution, may be embodied in the form of a software product; the software product is stored in a storage medium and includes several instructions.
  • the instructions are used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

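A method for accessing a storage device, and a storage device, are provided to solve the problem of too many address translations when a storage device is accessed. The method includes: the storage device receives an interface command that includes a target key; converts the target key to obtain the physical block address corresponding to the target key; and executes the interface command according to the obtained physical block address. The storage device includes an interface, a controller, and a storage medium. The interface receives the interface command that includes the target key and sends it to the controller; the controller converts the target key to obtain the physical block address corresponding to the target key; and the controller executes the interface command on the storage medium according to the obtained physical block address. This technical solution reduces the number of address translations and increases the speed of accessing the storage device in a computer system, thereby improving overall computer system performance.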

Description

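Method for accessing a storage device, and storage device
Technical Field
The present invention relates to the field of data storage and management, and more specifically, to a method for accessing a storage device and to a storage device.
Background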
Storage devices are used to store data and programs in a computer system. With the development of technology, storage devices have gradually expanded into many forms. Classified by storage medium, they include, for example, mechanical hard disks (HDD, Hard Disk Drive) and non-volatile memory (NVM, Non-Volatile Memory). NVM can be further divided into solid state disks (SSD, Solid State Disk), spin-transfer torque random-access memory (STT-RAM), ferroelectric random-access memory (FeRAM), phase change memory (PCM), resistive random-access memory (RRAM), and so on.
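Because the data workloads handled by computer systems are increasingly heavy, important indicators for evaluating a computer system include at least the speed of accessing the storage device and the level of computer system performance. In the prior art, a read or write command generated by the central processing unit (CPU, Central Processing Unit) usually has to pass through many complex mechanisms, including but not limited to a complex storage block allocation mechanism, a complex buffer cache design, and making the file system recoverable after a crash. At least several address translations are also needed to locate the physical space of the storage device being accessed. This increases complexity, reduces the speed at which the computer accesses the storage device, and lowers computer system performance. A method that reduces the number of address translations when accessing a storage device is therefore needed.
Summary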
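In view of this, embodiments of the present invention provide a method for accessing a storage device, and a storage device, to solve the problem of too many address translations when a storage device is accessed.
According to a first aspect, a method for accessing a storage device is provided, including: the storage device receives an interface command that includes a target key; converts the target key to obtain the physical block address corresponding to the target key; and executes the interface command according to the obtained physical block address.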
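In a first possible implementation, the method further includes: the storage device stores in advance a mapping table of keys and physical block addresses. Converting the target key to obtain the physical block address corresponding to the target key is then specifically: querying the mapping table according to the target key to obtain the physical block address corresponding to the target key.
With reference to the first possible implementation of the first aspect, in a second possible implementation, the mapping table of keys and physical block addresses stored in advance by the storage device is a hash index table, which includes hash index numbers and a mapping list associated with each hash index number, where the mapping list contains at least one mapping between a key and a physical block address. Converting the target key to obtain the physical block address corresponding to the target key is then specifically: performing a hash operation on the target key to obtain the hash index number of the target key, and querying the at least one mapping between a key and a physical block address in the mapping list that is associated with that hash index number in the hash index table, to obtain the physical block address corresponding to the target key.
With reference to the first possible implementation of the first aspect, in a third possible implementation, the mapping table of keys and physical block addresses stored in advance by the storage device is a static mapping table, which contains keys and the physical block addresses corresponding to the keys. Converting the target key to obtain the physical block address corresponding to the target key is then specifically: querying the static mapping table according to the target key to obtain the physical block address corresponding to the target key.
With reference to the first possible implementation of the first aspect, in a fourth possible implementation, the mapping table of keys and physical block addresses stored in advance by the storage device is a multi-way search tree, which includes a key-based multi-path search subtree and, for each node in the multi-path search subtree, an associated mapping between a key and a physical block address. Converting the target key to obtain the physical block address corresponding to the target key is then specifically: searching the multi-path search subtree for the node corresponding to the target key, and determining the physical block address corresponding to the target key according to the mapping between the key and the physical block address associated with that node.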
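With reference to the first aspect or the foregoing possible implementations of the first aspect, in a fifth possible implementation, a hash operation is performed on a key to obtain a hash value, and the hash value is taken modulo the upper limit of the physical block address to obtain the physical block address corresponding to the key; or a free physical block is found at random, and the physical block address of that free physical block is used as the physical block address corresponding to the key.
With reference to the first aspect or the foregoing possible implementations of the first aspect, in a sixth possible implementation, a hash operation is performed on the target key to obtain a hash value, and the hash value is taken modulo the upper limit of the physical block address to obtain the physical block address corresponding to the target key; or a free physical block is found at random, and the physical block address of that free physical block is used as the physical block address corresponding to the target key.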
With reference to the first aspect or the foregoing possible implementations of the first aspect, in a seventh possible implementation, the interface command includes one of the following interface commands: Put(String Key, String Version optional, String Value); or Write(String Key, String Version optional, String Value); or Get(String Key, String Version optional); or Read(String Key, String Version optional); or Delete(String Key, String Version optional); or TRIM(String Key, String Version optional).
With reference to the first aspect or the foregoing possible implementations of the first aspect, in an eighth possible implementation, the interface commands supported by the storage device are carried in one or more of the following protocols: the NVM Express protocol, the SCSI (Small Computer System Interface) protocol, the ATA (Advanced Technology Attachment) protocol, the SOP (SCSI over PCI-E, which carries SCSI protocol messages over a PCI-E channel) protocol, the HTTP/REST (Hypertext Transfer Protocol / Representational State Transfer) protocol, the HTTP/SOAP (Hypertext Transfer Protocol / Simple Object Access Protocol) protocol, and the RPC (Remote Procedure Call) protocol.
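According to a second aspect, a storage device is provided, including an interface, a controller, and a storage medium. The interface receives an interface command that includes a target key and sends it to the controller; the controller converts the target key to obtain the physical block address corresponding to the target key; and the controller executes the interface command on the storage medium according to the obtained physical block address.
In a first possible implementation, the storage device further includes a memory, and the storage medium is further configured to store a mapping table of keys and physical block addresses. The controller is then specifically configured to read the mapping table of keys and physical block addresses from the storage medium into the memory, and to query the mapping table in the memory according to the target key to obtain the physical block address corresponding to the target key.
With reference to the implementation of the second aspect, in a second possible implementation, the storage device further includes a memory, and the memory is configured to store a mapping table of keys and physical block addresses. The controller is then specifically configured to query the mapping table in the memory according to the target key to obtain the physical block address corresponding to the target key.
With reference to the implementation of the second aspect, in a third possible implementation, the controller is specifically configured to perform a hash operation on the target key to obtain a hash value and to take the hash value modulo the upper limit of the physical block address to obtain the physical block address corresponding to the target key; or to find a free physical block at random and use the physical block address of that free physical block as the physical block address corresponding to the target key.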
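With reference to the second aspect or the foregoing possible implementations of the second aspect, in a fourth possible implementation, the interface command received by the controller includes one or more of the following interface commands: Put(String Key, String Version optional, String Value); or Write(String Key, String Version optional, String Value); or Get(String Key, String Version optional); or Read(String Key, String Version optional); or Delete(String Key, String Version optional); or TRIM(String Key, String Version optional).
With reference to the second aspect or the foregoing possible implementations of the second aspect, in a fifth possible implementation, the storage device is one of the following devices: a mechanical hard disk (HDD), a solid state disk (SSD), a spin-transfer torque random-access memory (STT-RAM), a ferroelectric memory (FeRAM), a phase change memory (PCM), or a resistive random-access memory (RRAM).
With reference to the second aspect or the foregoing possible implementations of the second aspect, in a sixth possible implementation, the physical channel supported by the storage device includes one or more of the following interfaces: a SAS (Serial Attached SCSI) interface, a SATA (Serial ATA) interface, a PCI-E (Peripheral Component Interconnect Express) interface, an Infiniband interface, an Ethernet interface, and a USB (Universal Serial Bus) interface.
With reference to the second aspect or the foregoing possible implementations of the second aspect, in a seventh possible implementation, the interface commands supported by the storage device are carried in one or more of the following protocols: the NVM Express protocol, the SCSI protocol, the ATA protocol, the SOP protocol, the HTTP/REST protocol, the HTTP/SOAP protocol, and the RPC protocol.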
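With the above technical solution, using key-value storage, the storage device receives an interface command that includes a target key, converts the target key to obtain the physical block address corresponding to the target key, and then executes the interface command according to the obtained physical block address. This reduces the number of address translations and increases the speed of accessing the storage device in a computer system, thereby improving overall computer system performance.
Brief Description of the Drawings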
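To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1A to FIG. 1D show a prior-art method for accessing a storage device by key-value.
FIG. 2 shows another prior-art method for accessing a storage device by key-value.
FIG. 3 is a schematic flowchart of a method for accessing a storage device according to an embodiment of the present invention.
FIG. 4 is a schematic flowchart of a method for accessing a storage device according to another embodiment of the present invention.
FIG. 5A to FIG. 5C are each a schematic diagram of a mapping between keys and physical block addresses established according to an embodiment of the present invention.
FIG. 6A to FIG. 6B are schematic block diagrams of a storage device according to embodiments of the present invention.
Detailed Description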
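The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.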
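Cloud storage is a new concept extended and developed from the concept of cloud computing. It refers to a system that, through functions such as cluster applications, grid technology, or distributed file systems, uses application software to bring together a large number of storage devices of different types in a network so that they work cooperatively and jointly provide data storage and service access functions to the outside. When the core of what a cloud computing system computes and processes is the storage and management of large amounts of data, the cloud computing system needs to be equipped with a large number of storage devices, and it then becomes a cloud storage system. Cloud storage is therefore a cloud computing system whose core is data storage and management.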
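Recently, a large number of open-source and closed-source non-relational (NoSQL) databases have appeared. The concept of NoSQL refers to a database design pattern that is non-relational and distributed and that may or may not provide ACID semantics. ACID is an abbreviation for the four basic properties required for database transactions to execute correctly: atomicity, consistency, isolation, and durability. By data model, NoSQL databases can be divided into key-value databases, column family databases, document databases, and graph databases. The first three all organize data on a key-value model. The key-value data model is centered on the primary key: a data entity is uniquely identified by its key (Key), the system treats the value (Value) as a black box and does not interpret it, and there is no association between data fields. Because of this property, NoSQL is currently widely used in many fields such as the Internet, unstructured data backup, and search, and especially in cloud computing systems.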
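In the prior art, when a computer system organizes data on a key-value basis, methods for accessing the storage device include the method shown in FIG. 1A to FIG. 1D. In this context, accessing the database means accessing the storage device.
FIG. 1A to FIG. 1D show a prior-art method for accessing a storage device by key-value.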
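As shown in FIG. 1A and FIG. 1B, a traditional solid state disk (SSD) exposes to the upper layer a traditional block interface access address (LBA, Logic Block Address). An application specifies a logical block address (LBA) on the SSD, usually by calling the interface of a traditional file system (for example, Ext3 or Ext4 on Linux) and accessing it through the standard POSIX (Portable Operating System Interface) file access interface. On a read, the LBA is provided, the flash translation layer (FTL, Flash Translation Layer) of the SSD translates the address to obtain the physical address (PBA) in the NAND flash memory or other medium, and the data block is read out. On a write, the LBA is provided, and the controller of the SSD finds a free block in the medium space, writes the data, and updates the mapping it maintains between the LBA space and the PBA space. For the mapping between the LBA space and the PBA space, see FIG. 1C. The set of physical addresses is called the physical space, also called the storage space.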
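Referring to FIG. 1D, the application accesses the interface through the file access mode 11 or the block access mode 12. The content the user needs to access is fetched at a specified LBA, or the file system allocates an LBA as the logical address at which the content is stored. The NVM controller maintains the mapping between LBA and PBA.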
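In the above prior art, because traditional file systems are designed for SAS (Serial Attached SCSI) storage and SATA (Serial Advanced Technology Attachment) storage, they provide many complex internal mechanisms, including but not limited to a complex storage block allocation mechanism, a complex cache design, and making the file system recoverable after a crash. In key-value storage, an input key is converted into a file identifier and then, through a complex file layer, into an LBA. Inside the NVM, the controller then converts the LBA into a physical space address within the NVM, that is, a PBA. This two-layer conversion is complex. In addition, the file system takes into account that hard disks perform well on sequential access, so it schedules, merges, and queues input/output (I/O) accesses. An NVM medium, however, consists of multiple dies that can be accessed concurrently, which gives it a great advantage for I/O, so the file system imposes a heavy cost on the whole access path. If the device is accessed directly through the block access mode, the data management software layer still has to maintain a huge mapping between the key space and LBAs, which requires a translation layer and increases the complexity of the access path. All of this lowers the access speed that the computer system as a whole presents externally and lowers computer system performance.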
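FIG. 2 shows another prior-art method 20 for accessing a storage device by key-value.
In a distributed storage system that supports key-value, the values (Value) corresponding to different keys (Key) in a name space are distributed across the distributed storage nodes according to the key. On a local storage node, an input <Key, Version optional, Value> can likewise be stored in some key-value data organization. Local storage engines of several forms each organize and store the input <Key, Version optional, Value> differently. A local storage engine is data management middleware that takes a key as input and ultimately converts it into an address format that the storage device can understand.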
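Such systems usually allocate one large file, for example 100 GB, and store and read data on the local storage medium inside that file using a self-defined data organization. This approach is similar to the "block access mode" referred to as access mode 12 in the previous prior art.
Meanwhile, the NVM Express standard defines NVM operation commands such as read, write, and compare. Taking the Read and Write commands as examples, the required parameters include an LBA. In this approach, a data management layer developed by the user on the local storage node maintains a huge key-to-LBA mapping, and inside the NVM the NVM controller performs the LBA-to-PBA mapping. These conversions have a cost: they reduce the speed of accessing the storage device or lower computer system performance.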
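An embodiment of the present invention provides a method for accessing a storage device that can reduce the complexity of accessing the storage device. In this embodiment, the storage device receives an interface command that contains a target key, converts the target key to directly obtain the physical block address corresponding to the target key, and executes the interface command based on that physical block address. Converting the target key to directly obtain the corresponding physical block address can be implemented as in the embodiment of FIG. 3 below, in which a mapping table of keys and physical block addresses is stored in the storage device in advance and is queried with the target key to obtain the corresponding physical block address; or as in the embodiment of FIG. 4 below, in which a calculation is performed directly on the target key to obtain the corresponding physical block address. See the description of the embodiments below for details.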
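FIG. 3 is a schematic flowchart of a method 30 for accessing a storage device according to an embodiment of the present invention, which includes the following.
S31: The storage device receives an interface command that includes a target key (Key).
Interface commands include the usual read, write, delete, and other operation commands. For example, the interface commands can be defined in the following formats.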
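The write command format is: Put(String Key, String Version optional, String Value) or Write(String Key, String Version optional, String Value). The write command contains the target key (Key), optional version (Version) information, and the value (Value) to be written.
The read command format is: Get(String Key, String Version optional) or Read(String Key, String Version optional). The read command contains the target key (Key) and optional version (Version) information.
The delete command format is: Delete(String Key, String Version optional) or Trim(String Key, String Version optional). The delete command contains the target key (Key) and optional version (Version) information.
Optionally, a compare command can also be supported, in the format: Compare(String Key, String Version optional, String Value). The value in the storage space designated by the key (Key) and the optional version (Version) is fetched and compared with the value (Value) parameter carried by the compare command.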
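Optionally, the Dataset Management and Flush commands and the Write Uncorrectable command defined by NVMe (Non-Volatile Memory Express, the interface protocol between a PCI-E SSD and the host, and the related standards organization) are also supported. The address space used by all of these commands is the address space designated with the key (Key) as the logical address input.
In addition, all LBA-related commands defined by the SCSI (Small Computer System Interface) standard and by SOP (SCSI over PCI-E, which carries SCSI protocol messages over a PCI-E channel) can also be supported, with the LBA space changed to a space designated based on the Key.
On the NVM interface, this changes the approach of existing standards, in which read, write, and other commands take an LBA as the address input and the NVM controller performs the address mapping. Instead, the key (Key) is used as the address input, which avoids the overhead of the file access mode or the block access mode.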
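The storage device in the embodiments of the present invention can have one or more of the following interfaces: for example, a traditional hard disk interface, an Infiniband interface, an Ethernet interface, a PCI-E (Peripheral Component Interconnect Express) interface, and a USB interface. Traditional hard disk interfaces can include a SAS interface, a SATA interface, and so on.
Depending on the specific entity: for an NVM, the physical channel can be a SAS interface, a SATA interface, a PCI-E interface, an Infiniband interface, a USB interface, and so on; the command set protocol carried on it can be a standard defined by the NVM Express standards organization, a subset of the SCSI command set defined by T10, the ATA protocol, the SOP protocol, and so on.
For a mechanical hard disk and hard disks extended from it, the physical channel can be a SAS interface, a SATA interface, an Infiniband interface, an Ethernet interface, a PCI-E interface, a USB interface, and so on; the command set protocol carried on it is the SCSI command set defined by T10, of which a separate subset may be used, or the ATA protocol, and so on.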
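S32: The target key is converted to obtain the physical block address corresponding to the target key.
In this embodiment of the present invention, a mapping table of keys (Key) and physical block addresses (PBA) can be stored in the storage device in advance, and the mapping table is queried based on the target key to obtain the physical block address corresponding to the target key.
The mapping table can be built in several ways. FIG. 5A to FIG. 5C each show a method, provided by an embodiment of the present invention, for establishing the mapping between keys (Key) and physical block addresses (PBA). As shown in FIG. 5A, a hash operation is performed on the key (Key), for example using the secure hash algorithm SHA-1, and the result of the hash operation is taken modulo the upper limit of the PBA address to obtain the PBA block number. A hash operation transforms an input of arbitrary length (also called the pre-image) into an output of fixed length through a hash algorithm; that output is the hash value. This transformation is a compressing mapping: the hash value is usually much shorter than the input, and different inputs may hash to the same output, so the input cannot be uniquely determined from the hash value. In other words, the mapping from keys (Key) to hash values can be many-to-one. The details are as follows: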
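In this embodiment of the present invention, a hash index table is maintained in the storage device. The hash index table is recorded row by row. Each row, called a hash index row for short, includes the hash index number (HashIndex) of the row and a mapping list (Map_List). The hash index number (HashIndex) is the hash value obtained by hashing some Key. The mapping list (Map_List) can contain one or more <Key, PBA> mappings, where the hash value obtained by hashing each Key in those mappings is that hash index number; each mapping indicates that the Value corresponding to the Key is placed in the storage unit pointed to by the PBA.
The hash index table is built as follows. For any Key, for example Key2, Key2 is first hashed to obtain its hash index number HashIndexm, which corresponds to the m-th hash index row of the table. The physical block address (PBA) corresponding to Key2 is then obtained. In this embodiment, this can be done by taking HashIndexm, the hash index number obtained by hashing Key2, modulo the upper limit of the PBA address, with the result of the modulo operation being the PBA corresponding to Key2, for example PBA6; or by finding a free physical block at random in the whole PBA space and using it as the PBA corresponding to Key2, for example PBA6. The mapping between Key2 and PBA6 is inserted as a node into the Map_List (here Map_Listm) corresponding to that hash index (here HashIndexm), where it is recorded as <Key2, PBA6>, indicating that the Value corresponding to Key2 is placed in the storage unit pointed to by PBA6. This is done for every Key: the hash index number of each Key and its mapping <Key, PBA> are determined and stored in the hash index table as described.
If, during this process, hashing some Key yields a hash index number that has already appeared, that is, a hash collision occurs, a new node is inserted into the Map_List corresponding to that hash index number to record the mapping <Key, PBA> for that Key. For example, if hashing Key_N also yields HashIndexm (the same hash index number as Key2) and the PBA corresponding to Key_N, obtained in the same way as above, is PBA_N, then a new node is inserted into Map_Listm of hash index row HashIndexm, recording the mapping <Key_N, PBA_N>, which indicates that the Value corresponding to Key_N is placed in the storage unit pointed to by PBA_N. It follows that in each hash index row, the mapping list Map_List corresponding to a given hash index number (HashIndex) can contain one or more <Key, PBA> mappings.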
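Note that taking the hash index number modulo the upper limit of the PBA address may produce collisions, that is, different hash index numbers may yield the same remainder. To avoid such a collision, the result of Hash(Key) can be hashed once more and then taken modulo the upper limit of the PBA address, that is, Hash(Hash(Key)) % (upper limit of the PBA address), repeating until a free PBA block is obtained.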
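The rehash-until-free rule can be sketched as follows. SHA-1 is used because the description names it as an example; the function name `allocate_pba`, the `used` set that tracks occupied blocks, and the `max_rounds` bound are assumptions added so that the sketch always terminates, whereas the text simply repeats until a free block is found.

```python
import hashlib


def allocate_pba(key: str, used: set, pba_upper_limit: int, max_rounds: int = 64) -> int:
    """Hash the key, take the modulus, and rehash while the PBA is occupied."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    for _ in range(max_rounds):
        pba = int.from_bytes(digest, "big") % pba_upper_limit
        if pba not in used:
            used.add(pba)
            return pba
        # Collision: apply the hash once more, i.e. Hash(Hash(Key)), and retry.
        digest = hashlib.sha1(digest).digest()
    raise RuntimeError("no free PBA found within max_rounds")
```

A read path that computes the PBA from the key alone would have to repeat the same probe sequence, so a scheme like this is usually paired with a stored <Key, PBA> record.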
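Optionally, as another implementation, for the case where several different Keys have the same hash index number (HashIndex), a pointer is maintained after each <Key, PBA> record in the Map_List of the hash index row for that hash index number, pointing to the next <Key, PBA> in the Map_List. The pointers chain the mappings in the Map_List into a linked list.
Based on the hash index table described above, after receiving an interface command the storage device performs a hash operation on the target key in the interface command to obtain the hash index number of the target key, and then queries the at least one mapping between a key and a physical block address in the mapping list associated with that hash index number in the hash index table, to obtain the physical block address corresponding to the target key.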
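The hash index table and its lookup can be sketched as below. This is an illustration under assumptions: the class name `HashIndexTable` is hypothetical, each Map_List is a Python list rather than a pointer-linked list, and the hash value is reduced modulo the number of rows so that the table stays finite, which the description does not require.

```python
import hashlib


class HashIndexTable:
    """Hypothetical sketch: rows indexed by HashIndex, each holding a Map_List."""

    def __init__(self, rows: int):
        self.rows = [[] for _ in range(rows)]   # rows[m] is Map_List m

    def _hash_index(self, key: str) -> int:
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % len(self.rows)

    def insert(self, key: str, pba: int) -> None:
        """Record <key, pba>; colliding keys share one Map_List."""
        map_list = self.rows[self._hash_index(key)]
        for i, (k, _) in enumerate(map_list):
            if k == key:
                map_list[i] = (key, pba)        # key already present: update it
                return
        map_list.append((key, pba))

    def lookup(self, target_key: str):
        """Hash the target key, then scan its Map_List for the matching key."""
        for k, pba in self.rows[self._hash_index(target_key)]:
            if k == target_key:
                return pba
        return None
```

Comparing the full key inside the Map_List is what resolves a collision: two keys may share a HashIndex, but each still maps to its own PBA.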
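In this embodiment of the present invention, the above hash index table, serving as the mapping table of keys and PBAs, can be generated in memory at run time and stored in memory. Optionally, as a different embodiment, to prevent the mapping from being lost when the memory loses power, the mapping can also be persisted to a designated area of the storage medium; that designated area is called the index area.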
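When the storage device is an NVM, the hardware characteristics of the NVM limit the number of erase or read/write operations it can endure. Hashing effectively scatters data across different PBA spaces, which meets the goal of wear leveling and can extend NVM lifetime. Moreover, because keys (Key) are hashed, the good random-access and concurrency characteristics of the NVM dies can be fully exploited to increase concurrency.
When the storage device is an HDD, the hardware characteristics of the HDD mean that data is best read sequentially. Hashing effectively scatters data across different PBA spaces. Although this breaks the sequential-read pattern of the hard disk, the value (Value) storage area can be placed across tracks, taking into account the difference in access speed between the inner and outer tracks of the platter, to make comprehensive use of hard disk performance. In a large distributed environment, this balances the access load across multiple hard disks and has a global performance-balancing effect.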
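As shown in FIG. 5B, the mapping between keys (Key) and PBAs can also be established as a static mapping table.
For any Key, the PBA corresponding to the Key is determined. In the manner described in the foregoing embodiment, the Key can be hashed and the resulting hash value taken modulo the upper limit of the PBA address to obtain the PBA corresponding to the Key, or a free physical block can be found at random in the PBA space and used as the PBA corresponding to the Key. In this embodiment, assume that the PBA corresponding to Key1 is PBA1. The mapping <Key1, PBA1> is recorded in the index record. If the Value corresponding to a new Key needs to be written later, the same process is used.
Based on the static mapping table described above, after receiving an interface command the storage device queries the static mapping table according to the target key to obtain the physical block address corresponding to the target key.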
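As shown in FIG. 5C, in this embodiment of the present invention the mapping between keys (Key) and PBAs can also be established with a multi-way search tree. The multi-way search tree can include a key-based multi-path search subtree and, for each node in the multi-path search subtree, an associated mapping between a key and a physical block address. The multi-path search subtree can be implemented as a B+Tree. The B+Tree technique keeps the keys in a stable sorted order and gives key insertion and modification a relatively stable logarithmic time complexity. Each node in a B+Tree can contain a large number of keys and branches, depending on the actual situation; this reduces the depth of the tree, which means that finding an element requires reading only a few nodes from a storage device, such as an external disk, into memory, so the data being looked up is reached quickly. The B+Tree approach can also scatter data well across the NVM address space, which extends NVM lifetime.
The process of establishing the mapping between keys (Key) and PBAs with a multi-way search tree (B+Tree) in this embodiment is described in detail below: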
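First assume that each node allocated by the multiway search tree has space for 4 units. Suppose the first four insertions are 2, 10, 15 and 20, corresponding to 4 keys (Key). The fifth insertion is 3, but the 4 units of the allocated node are already used, so a new node, also containing 4 units, is allocated. Because 3 is greater than 2 and less than 10, it is inserted into the node of the subtree located between 2 and 10.
Subsequent insertions proceed similarly, gradually building up a hierarchical multiway tree. For each Key, a unit is requested in the Key space of the B+Tree and the Key is recorded in it. At the same time, the PBA corresponding to the Key is obtained as described in the above embodiment: by hashing the Key and taking the hash value modulo the upper limit of the PBA address range, or by picking a free physical block at random from the PBA space, and so on. The Key in each unit of the B+Tree is then associated with the corresponding <Key, PBA> mapping, as shown in the list on the right of FIG. 5C.
For brevity, FIG. 5C illustrates only a two-level tree, but the embodiments of the present invention are not limited to this. In the end, every key in the node space of the B+Tree has a one-to-one correspondence with a PBA address in the PBA space; that is, the key-to-PBA mapping of this embodiment is generated by the above method.
Based on the above multiway search tree, after receiving an interface command the storage device searches the multipath search subtree for the node corresponding to the target key, and determines the physical block address corresponding to the target key from the key-to-PBA mapping associated with that node.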
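The insertion example above (2, 10, 15, 20, then 3) and the subsequent lookup can be sketched as below. This is a deliberately simplified multiway search tree, not a full B+Tree: it never splits or rebalances nodes, and a full node simply hands a new key to the child that covers the gap the key falls into. All names are hypothetical.

```python
from bisect import bisect_left
from typing import Optional

NODE_CAPACITY = 4  # units per node, as in the example above


class Node:
    def __init__(self) -> None:
        self.keys = []      # sorted keys stored in this node
        self.pbas = []      # pbas[i] is the PBA mapped to keys[i]
        self.children = {}  # slot i covers keys between keys[i-1] and keys[i]


def insert(node: Node, key: int, pba: int) -> None:
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        node.pbas[i] = pba  # existing key: update its mapping
    elif len(node.keys) < NODE_CAPACITY:
        node.keys.insert(i, key)
        node.pbas.insert(i, pba)
    else:
        # Node is full: descend into (or allocate) the child for this gap.
        insert(node.children.setdefault(i, Node()), key, pba)


def lookup(node: Optional[Node], key: int) -> Optional[int]:
    while node is not None:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.pbas[i]
        node = node.children.get(i)
    return None
```

After inserting 2, 10, 15 and 20, the root is full, so key 3 lands in the child at slot 1, which is the subtree between 2 and 10.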
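Those skilled in the art will understand that the ways of establishing the mapping between keys (Key) and PBAs are not limited to those described above.
S33: Execute the interface command according to the obtained physical block address.
For example, using the physical block address (PBA) obtained by the query: for a write command, data is written to the storage unit pointed to by the PBA corresponding to the key (Key); for a read command, the data stored in the storage unit pointed to by the PBA corresponding to the key (Key) is returned; for a delete command, the data stored in the storage unit pointed to by the PBA corresponding to the key (Key) is released.
Further, the method of this embodiment also includes returning the execution result of the interface command: for a write command, information on whether the write operation succeeded; for a read command, the data stored in the storage unit pointed to by the PBA corresponding to the key (Key); for a delete command, information on whether the delete operation succeeded.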
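Step S33 and the returned results can be sketched as a single dispatch function. This is an illustration only: the storage medium is modeled as a dictionary from PBA to bytes, the command names are lower-case strings chosen for the sketch, and the `(success, data)` return shape is an assumption.

```python
from typing import Optional


def execute(command: str, pba: int, media: dict,
            value: Optional[bytes] = None) -> tuple:
    """Run one interface command against the storage unit addressed by pba.

    Returns (success, data); data is only set for a successful read.
    """
    if command == "put":
        if value is None:
            return False, None
        media[pba] = value  # write the data to the addressed unit
        return True, None
    if command == "get":
        data = media.get(pba)
        return data is not None, data
    if command == "delete":
        # Release the unit; report failure if nothing was stored there.
        return media.pop(pba, None) is not None, None
    return False, None
```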
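In this embodiment, using key-value storage, the storage device receives an interface command that includes a target key, translates the target key to obtain the PBA corresponding to the target key, and then executes the interface command according to the obtained PBA. This avoids the multiple translations Key <-> file <-> LBA <-> PBA, or Key <-> LBA <-> PBA, which require multiple interactions among the processor, the file system and the storage device. By obtaining the PBA directly from the Key, the embodiment reduces the number of address translations, lowers the complexity of the management software that maintains the address translation, reduces the performance overhead of the computer system, and increases the speed at which the computer system accesses the storage device, thereby improving overall system performance.
FIG. 4 is a schematic flowchart of a method 40 for accessing a storage device according to another embodiment of the present invention. In this embodiment, the key-to-PBA mapping table described in the above embodiment need not be stored in the storage device in advance; instead, after an interface command containing a target key is received, the physical block address (PBA) corresponding to the target key is obtained by computing directly on the target key. The method includes the following.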
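S41: The storage device receives an interface command that includes a target key.
Interface commands include the usual read, write and delete operation commands. For example, the interface commands may be defined in the following formats.
The write command format is: Put(String Key, String Version optional, String Value) or Write(String Key, String Version optional, String Value). The write command contains the target key (Key), optional version (Version) information, and the value (Value) to be written.
The read command format is: Get(String Key, String Version optional) or Read(String Key, String Version optional). The read command contains the target key (Key) and optional version (Version) information.
The delete command format is: Delete(String Key, String Version optional) or Trim(String Key, String Version optional). The delete command contains the target key (Key) and optional version (Version) information.
Optionally, a compare command may also be supported, in the format: Compare(String Key, String Version optional, String Value). The value in the storage space specified by the key (Key) and the optional version (Version) is fetched and compared with the value (Value) parameter carried in the compare command.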
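In addition, optionally, the Dataset Management, Flush and Write Uncorrectable commands defined by NVMe (Non-Volatile Memory Express, the interface protocol between PCI-E SSDs and hosts, and the associated standards organization) are also supported; the address space used by these commands is the address space specified by using the key (Key) as the logical address input.
Furthermore, all LBA-related commands defined by the SCSI (Small Computer System Interface) standard and by SOP (SCSI over PCI-E, which carries SCSI protocol messages over a PCI-E channel) may also be supported, with the LBA space replaced by the space specified by Key.
On the NVM interface, this changes the approach of existing standards, in which read, write and other commands take an LBA as the address input and the NVM controller performs the address mapping. The key (Key) is used as the address input instead, which avoids the overhead of file access or block access.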
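The Put, Get, Delete and Compare command formats described above can be modeled by a small in-memory stand-in. This is a sketch under assumptions: the class name is hypothetical, the optional Version is treated as part of the lookup key, and `version` is moved to the end of each signature because Python requires defaulted parameters last, so the argument order differs from the formats in the text.

```python
from typing import Optional


class KeyValueInterface:
    """In-memory stand-in for a device exposing the key-based commands."""

    def __init__(self) -> None:
        self._store = {}  # (key, version) -> value

    def put(self, key: str, value: str, version: Optional[str] = None) -> bool:
        self._store[(key, version)] = value
        return True

    def get(self, key: str, version: Optional[str] = None) -> Optional[str]:
        return self._store.get((key, version))

    def delete(self, key: str, version: Optional[str] = None) -> bool:
        return self._store.pop((key, version), None) is not None

    def compare(self, key: str, value: str,
                version: Optional[str] = None) -> bool:
        # Fetch the value stored for (key, version) and compare it with value.
        return self._store.get((key, version)) == value
```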
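The storage device in this embodiment may have one or more of the following interfaces: a traditional hard disk interface, an Infiniband interface, an Ethernet interface, a PCI-E (Peripheral Component Interconnect Express) interface, a USB interface, and so on. Traditional hard disk interfaces may include SAS interfaces, SATA interfaces and the like.
Depending on the specific entity: for NVM, the physical channel may be a SAS, SATA, PCI-E, Infiniband or USB interface, among others; the command set protocol on top of it may be the standard defined by the NVM Express standards organization, a subset of the SCSI command set defined by T10, the ATA protocol, the SOP protocol, and so on.
For mechanical hard disks and their extensions, the physical channel may be a SAS, SATA, Infiniband, Ethernet, PCI-E or USB interface, among others; the command set protocol on top of it is the SCSI command set defined by T10, of which a separate subset may be carved out for use, or the ATA protocol, and so on.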
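S42: Determine the physical block address (PBA) according to the target key.
In this embodiment, the key-to-PBA mapping table described in the above embodiment need not be stored in the storage device in advance; instead, after an interface command containing a target key is received, the physical block address (PBA) corresponding to the target key is obtained by computing directly on the target key. For details of this process, refer to the ways of establishing the mapping tables of FIGS. 5A-5C described in the above embodiment; they are not repeated here.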
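Computing the PBA directly from the target key, without a stored table, can be sketched with the hash-then-modulo option described earlier. CRC32 is an assumed stand-in for the hash function, and `key_to_pba` and `pba_limit` are hypothetical names.

```python
import zlib


def key_to_pba(key: str, pba_limit: int) -> int:
    """Map a key straight to a PBA: hash the key, then reduce modulo the limit."""
    if pba_limit <= 0:
        raise ValueError("pba_limit must be positive")
    return zlib.crc32(key.encode("utf-8")) % pba_limit
```

Because the result depends only on the key and the address limit, the same key always resolves to the same PBA, which is what allows the lookup table to be dropped; collisions between different keys still have to be handled separately.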
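S43: Execute the interface command according to the determined physical block address.
For example, for a write command, data is written to the storage unit pointed to by the physical block address (PBA) corresponding to the key (Key); for a read command, the data stored in the storage unit pointed to by the PBA corresponding to the key (Key) is returned; for a delete command, the data stored in the storage unit pointed to by the PBA corresponding to the key (Key) is released.
Further, the method of this embodiment also includes returning the execution result of the interface command: for a write command, information on whether the write operation succeeded; for a read command, the data stored in the storage unit pointed to by the PBA corresponding to the key (Key); for a delete command, information on whether the delete operation succeeded.
In this embodiment, using key-value storage, the storage device receives an interface command that includes a target key, computes on the target key to obtain the corresponding physical block address (PBA), and then executes the interface command according to the obtained PBA. This avoids the multiple translations Key <-> file <-> LBA <-> PBA, or Key <-> LBA <-> PBA, which require multiple interactions among the processor, the file system and the storage device. By obtaining the PBA directly from the Key, the embodiment reduces the number of address translations, lowers the complexity of the management software that maintains the address translation, reduces the performance overhead of the computer system, and increases the speed at which the computer system accesses the storage device, thereby improving overall system performance.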
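FIG. 6A is a schematic block diagram of a storage device 60 according to an embodiment of the present invention. The storage device 60 includes an interface 61, a controller 62 and a storage medium 63.
The interface 61 receives an interface command that includes a target key and sends it to the controller 62.
The controller 62 translates the target key to obtain the physical block address corresponding to the target key, and executes the interface command on the storage medium according to the obtained physical block address.
FIG. 6B is a schematic block diagram of a storage device 60 according to another embodiment of the present invention. The storage device provided in this embodiment further includes a memory 64, in which a mapping table of keys to physical block addresses may be stored in advance. After receiving an interface command from the interface 61, the controller 62 queries the mapping table in the memory 64 with the target key carried in the interface command, thereby obtaining the physical block address corresponding to the target key.
Optionally, as a different embodiment, to prevent the mapping table from being lost when the memory 64 loses power, the mapping table may also be stored in a designated region of the storage medium 63. The controller 62 may then read the mapping table stored in the storage medium 63 into the memory 64 and query the mapping table in the memory 64 with the target key carried in the interface command, thereby obtaining the physical block address corresponding to the target key.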
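Persisting the mapping table to a designated region of the storage medium and reading it back into memory can be sketched as a serialization round trip. The JSON encoding and the function names are assumptions made for illustration; the source does not specify an on-media format.

```python
import json


def persist_map(mapping: dict) -> bytes:
    """Serialize the key -> PBA table for storage in the designated region."""
    return json.dumps(mapping, sort_keys=True).encode("utf-8")


def load_map(index_area: bytes) -> dict:
    """Rebuild the in-memory table from the bytes read back from the medium."""
    raw = json.loads(index_area.decode("utf-8"))
    return {str(key): int(pba) for key, pba in raw.items()}
```

After a power loss, the controller would call something like `load_map` once at start-up and then serve lookups from memory as before.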
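For the structure of the mapping table, refer to FIG. 5A, 5B or 5C and the corresponding description in the embodiments; details are not repeated here.
The storage medium 63 is used to store data. The controller 62 executes the interface command on the storage medium 63 according to the obtained physical block address, in one of the following ways:
For a write command, the controller 62 instructs that the data carried in the write command be written to the storage unit in the storage medium 63 pointed to by the physical block address corresponding to the target key. For a read command, the controller 62 instructs that the data stored in the storage unit in the storage medium 63 pointed to by the physical block address corresponding to the target key be read. For a delete command, the controller 62 instructs that the data stored in the storage unit in the storage medium 63 pointed to by the physical block address corresponding to the target key be released. Further, the method of this embodiment also includes returning the execution result of the interface command: for a write command, information on whether the write operation succeeded; for a read command, the data stored in the storage unit pointed to by the physical block address (PBA) corresponding to the key (Key); for a delete command, information on whether the delete operation succeeded.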
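The controller 62 may be an SoC (System on Chip), an ASIC (Application Specific Integrated Circuit), a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), a general-purpose processor, and so on.
The storage device provided in this embodiment stores data using key-value storage: it receives an interface command that includes a target key, translates the target key to obtain the corresponding physical block address, and then executes the interface command according to the obtained physical block address. This avoids the multiple translations Key <-> file <-> LBA <-> PBA, or Key <-> LBA <-> PBA, which require multiple interactions among the processor, the file system and the storage device. By obtaining the PBA directly from the Key, the embodiment reduces the number of address translations, lowers the complexity of the management software that maintains the address translation, reduces the performance overhead of the computer system, and increases the speed at which the computer system accesses the storage device, thereby improving overall system performance.
Optionally, as a different embodiment, the mapping table of keys to physical block addresses need not be stored in advance in the storage medium 63 or the memory 64 of the storage device; instead, after an interface command containing a target key is received, the controller 62 obtains the physical block address (PBA) corresponding to the target key by computing directly on the target key. For details of this process, refer to the ways of establishing the mapping tables of FIGS. 5A-5C described in the above embodiment; they are not repeated here.
Optionally, as a different embodiment, the interface command received by the controller 62 may include one of the following interface commands.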
Put(String Key, String Version optional, String Value); or
Write(String Key, String Version optional, String Value); or
Get(String Key, String Version optional); or
Read(String Key, String Version optional); or
Delete(String Key, String Version optional); or
TRIM(String Key, String Version optional).
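As different implementations, the storage device 60 may combine the controller 62 with different storage media to form various kinds of storage devices. The storage medium may be a magnetic disk medium, a solid-state electronic storage medium, and so on. For example, the storage device 60 may be a mechanical hard disk drive (HDD), a solid-state drive (SSD), a spin-transfer torque random access memory (STT-RAM), a ferroelectric memory (FeRAM), a phase-change memory (PCM), a resistive random access memory (RRAM), and so on.
As a possible implementation, the physical channels that the storage device 60 may support include one or more of the following interfaces: SAS interface, SATA interface, PCI-E interface, Infiniband interface, Ethernet interface and USB interface.
As a possible implementation, the interface commands that the storage device 60 may support are carried over one or more of the following protocols: NVM Express protocol, SCSI protocol, ATA protocol, SOP protocol, HTTP/REST protocol, HTTP/SOAP protocol and RPC protocol.
The storage device 60 implements method 40 or method 30. By using a key-to-PBA mapping that is either looked up in a stored table or obtained by computing directly on the target key, it reduces the number of address translations and thereby increases the speed at which the computer system accesses the storage device.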
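Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses or units, and may be electrical, mechanical or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage device. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited to them. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be subject to the protection scope of the claims.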

Claims

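1. A method for accessing a storage device, characterized by comprising:
receiving, by the storage device, an interface command that includes a target key;
translating the target key to obtain a physical block address corresponding to the target key; and
executing the interface command according to the obtained physical block address.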
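2. The method according to claim 1, characterized in that the storage device stores in advance a mapping table of keys to physical block addresses, and the translating the target key to obtain the physical block address corresponding to the target key is specifically:
querying the mapping table according to the target key to obtain the physical block address corresponding to the target key.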
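3. The method according to claim 2, characterized in that the mapping table of keys to physical block addresses stored in advance by the storage device is a hash index table comprising hash index numbers and mapping lists associated with the hash index numbers, each mapping list containing at least one mapping of a key to a physical block address;
and the translating the target key to obtain the physical block address corresponding to the target key is specifically:
hashing the target key to obtain the hash index number of the target key, and searching the at least one mapping of a key to a physical block address in the mapping list of the hash index table associated with the hash index number of the target key, to obtain the physical block address corresponding to the target key.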
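4. The method according to claim 2, characterized in that the mapping table of keys to physical block addresses stored in advance by the storage device is a static mapping table containing keys and the physical block addresses corresponding to the keys;
and the translating the target key to obtain the physical block address corresponding to the target key is specifically:
querying the static mapping table according to the target key to obtain the physical block address corresponding to the target key.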
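5. The method according to claim 2, characterized in that the mapping table of keys to physical block addresses stored in advance by the storage device is a multiway search tree comprising a key-based multipath search subtree and, for each node in the multipath search subtree, an associated mapping of a key to a physical block address;
and the translating the target key to obtain the physical block address corresponding to the target key is specifically:
searching the multipath search subtree for the node corresponding to the target key, and determining the physical block address corresponding to the target key according to the mapping of a key to a physical block address associated with that node.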
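6. The method according to any one of claims 3 to 5, characterized in that the mapping between a key and a physical block address in the mapping table is established in the following manner:
hashing the key to obtain a hash value, and taking the hash value modulo the upper limit of the physical block address range to obtain the physical block address corresponding to the key; or
randomly finding a free physical block, and using the physical block address of the free physical block as the physical block address corresponding to the key.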
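7. The method according to claim 1, characterized in that the translating the target key to obtain the physical block address corresponding to the target key is specifically:
hashing the target key to obtain a hash value, and taking the hash value modulo the upper limit of the physical block address range to obtain the physical block address corresponding to the target key; or
randomly finding a free physical block, and using the physical block address of the free physical block as the physical block address corresponding to the target key.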
8. The method according to any one of claims 1 to 7, characterized in that the interface command includes one of the following interface commands:
Put(String Key, String Version optional, String Value); or
Write(String Key, String Version optional, String Value); or
Get(String Key, String Version optional); or
Read(String Key, String Version optional); or
Delete(String Key, String Version optional); or
TRIM(String Key, String Version optional).
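9. The method according to any one of claims 1 to 7, characterized in that the interface commands supported by the storage device are carried over one or more of the following protocols:
NVM Express protocol, SCSI protocol, ATA protocol, SOP protocol, HTTP/REST protocol, HTTP/SOAP protocol and RPC protocol.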
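10. A storage device, characterized by comprising an interface, a controller and a storage medium, wherein:
the interface receives an interface command that includes a target key and sends it to the controller;
the controller translates the target key to obtain a physical block address corresponding to the target key; and
the controller executes the interface command on the storage medium according to the obtained physical block address.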
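11. The storage device according to claim 10, characterized in that the storage device further comprises a memory, and the storage medium is further used to store a mapping table of keys to physical block addresses;
and the controller is specifically configured to read the mapping table of keys to physical block addresses from the storage medium into the memory, and to query the mapping table in the memory according to the target key to obtain the physical block address corresponding to the target key.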
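12. The storage device according to claim 10, characterized in that the storage device further comprises a memory, the memory being used to store a mapping table of keys to physical block addresses;
and the controller is specifically configured to query the mapping table in the memory according to the target key to obtain the physical block address corresponding to the target key.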
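13. The storage device according to claim 10, characterized in that:
the controller is specifically configured to hash the target key to obtain a hash value and take the hash value modulo the upper limit of the physical block address range to obtain the physical block address corresponding to the target key; or to randomly find a free physical block and use the physical block address of the free physical block as the physical block address corresponding to the target key.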
14. The storage device according to any one of claims 10 to 13, characterized in that the interface commands received by the controller include one or more of the following interface commands:
Put(String Key, String Version optional, String Value); or
Write(String Key, String Version optional, String Value); or
Get(String Key, String Version optional); or
Read(String Key, String Version optional); or
Delete(String Key, String Version optional); or
TRIM(String Key, String Version optional).
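15. The storage device according to any one of claims 10 to 13, characterized in that the storage device is one of the following devices:
a mechanical hard disk drive (HDD), a solid-state drive (SSD), a spin-transfer torque random access memory (STT-RAM), a ferroelectric memory (FeRAM), a phase-change memory (PCM), or a resistive random access memory (RRAM).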
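16. The storage device according to any one of claims 10 to 13, characterized in that the physical interfaces supported by the storage device include one or more of the following interfaces:
SAS interface, SATA interface, PCI-E interface, Infiniband interface, Ethernet interface and USB interface.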
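17. The storage device according to any one of claims 10 to 13, characterized in that the interface commands supported by the storage device are carried over one or more of the following protocols:
NVM Express protocol, SCSI protocol, ATA protocol, SOP protocol, HTTP/REST protocol, HTTP/SOAP protocol and RPC protocol.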
PCT/CN2012/086667 2012-12-14 2012-12-14 访问存储设备的方法和存储设备 WO2014089828A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201280002763.9A CN104054071A (zh) 2012-12-14 2012-12-14 访问存储设备的方法和存储设备
PCT/CN2012/086667 WO2014089828A1 (zh) 2012-12-14 2012-12-14 访问存储设备的方法和存储设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/086667 WO2014089828A1 (zh) 2012-12-14 2012-12-14 访问存储设备的方法和存储设备

Publications (1)

Publication Number Publication Date
WO2014089828A1 true WO2014089828A1 (zh) 2014-06-19

Family

ID=50933732

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/086667 WO2014089828A1 (zh) 2012-12-14 2012-12-14 访问存储设备的方法和存储设备

Country Status (2)

Country Link
CN (1) CN104054071A (zh)
WO (1) WO2014089828A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107209644A (zh) * 2015-12-28 2017-09-26 华为技术有限公司 一种数据处理方法以及NVMe存储器
CN108614671A (zh) * 2016-12-12 2018-10-02 北京忆恒创源科技有限公司 基于命名空间的键-数据访问方法与固态存储设备
CN108614668A (zh) * 2016-12-12 2018-10-02 北京忆恒创源科技有限公司 基于kv模型的数据访问方法与固态存储设备
CN108614669A (zh) * 2016-12-12 2018-10-02 北京忆恒创源科技有限公司 解决哈希冲突的键-数据访问方法与固态存储设备
CN111752480A (zh) * 2016-03-24 2020-10-09 华为技术有限公司 一种数据写方法、数据读方法及相关设备、系统
CN111813345A (zh) * 2020-07-17 2020-10-23 济南浪潮数据技术有限公司 一种数据传输方法、装置、服务器及可读存储介质

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110018966A (zh) * 2018-01-09 2019-07-16 阿里巴巴集团控股有限公司 一种存储器、存储系统、主机及数据操作、垃圾回收方法
KR102641521B1 (ko) * 2018-02-22 2024-02-28 삼성전자주식회사 키-밸류 스토리지 장치 및 이의 동작 방법
CN112506814B (zh) * 2020-11-17 2024-03-22 合肥康芯威存储技术有限公司 一种存储器及其控制方法与存储系统
CN113485948B (zh) * 2021-06-29 2023-11-14 成都忆芯科技有限公司 Nvm坏块管理方法与控制部件
CN114465770A (zh) * 2021-12-29 2022-05-10 天翼云科技有限公司 数据处理方法及相关装置
CN114265958A (zh) * 2022-03-01 2022-04-01 南京得瑞芯存科技有限公司 Kv ssd的映射管理方法、装置及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1139489A (zh) * 1993-11-02 1997-01-01 帕拉科姆有限公司 加速计算机数据库事务处理的装置
CN101901270A (zh) * 2010-08-05 2010-12-01 上海酷吧信息技术有限公司 一种支持海量存储的内存数据库方法
CN102043852A (zh) * 2010-12-22 2011-05-04 东北大学 一种基于路径信息的可扩展标记语言祖先后代索引方法
CN102521228A (zh) * 2011-11-01 2012-06-27 浙江省电力试验研究院 一种线性数据表键值映射方法
CN102737127A (zh) * 2012-06-20 2012-10-17 厦门聚海源物联网络技术有限公司 一种海量数据存储方法

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3260971A4 (en) * 2015-12-28 2018-05-02 Huawei Technologies Co., Ltd. Data processing method and nvme storage
US11467975B2 (en) 2015-12-28 2022-10-11 Huawei Technologies Co., Ltd. Data processing method and NVMe storage device
CN107209644A (zh) * 2015-12-28 2017-09-26 华为技术有限公司 一种数据处理方法以及NVMe存储器
EP3916536A1 (en) * 2015-12-28 2021-12-01 Huawei Technologies Co., Ltd. Data processing method and nvme storage device
CN107209644B (zh) * 2015-12-28 2020-04-28 华为技术有限公司 一种数据处理方法以及NVMe存储器
US10705974B2 (en) 2015-12-28 2020-07-07 Huawei Technologies Co., Ltd. Data processing method and NVME storage device
CN111427517A (zh) * 2015-12-28 2020-07-17 华为技术有限公司 一种数据处理方法以及NVMe存储器
CN111752480A (zh) * 2016-03-24 2020-10-09 华为技术有限公司 一种数据写方法、数据读方法及相关设备、系统
CN108614668A (zh) * 2016-12-12 2018-10-02 北京忆恒创源科技有限公司 基于kv模型的数据访问方法与固态存储设备
CN108614669A (zh) * 2016-12-12 2018-10-02 北京忆恒创源科技有限公司 解决哈希冲突的键-数据访问方法与固态存储设备
CN108614671A (zh) * 2016-12-12 2018-10-02 北京忆恒创源科技有限公司 基于命名空间的键-数据访问方法与固态存储设备
CN108614669B (zh) * 2016-12-12 2023-02-17 北京忆恒创源科技股份有限公司 解决哈希冲突的键-数据访问方法与固态存储设备
CN108614671B (zh) * 2016-12-12 2023-02-28 北京忆恒创源科技股份有限公司 基于命名空间的键-数据访问方法与固态存储设备
CN111813345A (zh) * 2020-07-17 2020-10-23 济南浪潮数据技术有限公司 一种数据传输方法、装置、服务器及可读存储介质

Also Published As

Publication number Publication date
CN104054071A (zh) 2014-09-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12890005

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12890005

Country of ref document: EP

Kind code of ref document: A1