CN105117415B - An optimized SSD data update method - Google Patents

An optimized SSD data update method

Info

Publication number
CN105117415B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510458844.5A
Other languages
Chinese (zh)
Other versions
CN105117415A (en)
Inventor
段章峰
伍卫国
崔金华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN201510458844.5A
Publication of CN105117415A
Application granted
Publication of CN105117415B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1847 File system types specifically adapted to static storage, e.g. adapted to flash memory or SSD
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An optimized SSD data update method for string data stored on an SSD. Two data structures, a Kd-tree and a B-tree, are used in combination, which preserves lookup efficiency while reducing the space occupied by the index structure. When string data is updated, set intersection and union properties are used so that only the differing values of identical keys in two data blocks are updated, while all other data is written back unchanged. By using an LSM method based on a line-segment B-tree, random updates to string data on the SSD are converted into sequential writes, which effectively avoids the SSD write-amplification problem, speeds up SSD writes, and improves the efficiency of database operations.

Description

An optimized SSD data update method
Technical field
The invention belongs to the field of computer technology, and in particular relates to an optimized SSD data update method.
Background
When a NoSQL database is designed, the data held in memory must be stored persistently. Using an SSD (solid-state drive) as the persistent store speeds up reading and writing of data and improves overall system performance. Data is therefore written back to the SSD, forming a two-level memory-SSD storage architecture that offers applications a database system with larger capacity and, compared with traditional mechanical disk storage, faster access. To access data stored on the SSD quickly, the data must be indexed. However, traditional index structures such as B-trees and B+ trees generate a large number of random I/O requests. Writing this random I/O to the SSD without preprocessing severely degrades SSD performance, because when an upper-layer application updates data on the SSD, the write amplification inherent to SSDs increases write latency.
O'Neil et al., drawing on the ideas of log-structured file systems, proposed the Log-Structured Merge (LSM) method. It exploits the append-only nature of log writes and combines it with the B-tree data structure, sacrificing some read performance for a large gain in write performance and striking a better balance between reads and writes. The LSM method provides a deferred update mechanism that merges many small random updates into large sequential updates, improving storage bandwidth utilization. These characteristics allow it to perform well in database systems on a two-level memory-SSD architecture where writes outnumber reads.
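The deferred update mechanism described above can be illustrated with a minimal sketch. The class name `WriteBuffer`, its `capacity` parameter and the in-memory `flushed_runs` list are assumptions made for illustration; the patent does not define them, and a real system would write each run to storage.

```python
class WriteBuffer:
    """Hypothetical in-memory buffer that absorbs small random updates."""

    def __init__(self, capacity):
        self.capacity = capacity   # distinct keys held before a flush (assumed policy)
        self.pending = {}          # key -> latest value seen so far
        self.flushed_runs = []     # each run is one sorted batch, standing in for a sequential write

    def put(self, key, value):
        self.pending[key] = value  # repeated updates to one key collapse into a single entry
        if len(self.pending) >= self.capacity:
            self.flush()

    def flush(self):
        if self.pending:
            # One sorted batch replaces many scattered small writes.
            self.flushed_runs.append(sorted(self.pending.items()))
            self.pending = {}
```

Four `put` calls on three distinct keys with `capacity=3` produce a single sorted run, which is the sense in which random updates become one large sequential update.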
However, new problems arise when the LSM method is applied directly to SSD-based NoSQL database systems. The LSM method was designed for integer keys. In a key-value database whose keys are strings, an index built on string keys must be converted before the index structure is written back to the SSD, because pointer-type data cannot be stored directly on the SSD. In addition, because strings vary in form and length, they cannot be represented in pre-allocated space, and building an index entry for each key individually enlarges the B-tree index structure that the LSM method keeps on the SSD. Accessing data in this situation lengthens the access path and increases access latency. To address these problems, the present invention proposes an improved index structure for string data that speeds up access to string data on the SSD.
Summary of the invention
To overcome the above shortcomings of the prior art, the object of the present invention is to provide an optimized SSD data update method that improves data access efficiency.
To achieve this object, the technical solution adopted by the present invention is:
An optimized SSD data update method comprising the following steps:
Step 1: build the line-segment B-tree structure:
In the in-memory database, first use the shared-prefix information of the string data to group the strings that share the same prefix into a string interval; then insert this interval into the line-segment B-tree using the B-tree insertion algorithm. The logical view of the whole data structure is thus a B-tree, but what it stores is key-string interval information;
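The grouping of strings into intervals can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the function name `build_intervals` is hypothetical, and using a fixed prefix length (`prefix_len`) to decide which strings share a prefix is an assumption, since the text does not say how the shared prefix is chosen.

```python
from itertools import groupby


def build_intervals(keys, prefix_len=3):
    """Group keys that share their first `prefix_len` characters.

    Returns a list of (low, high, members) tuples, one per prefix,
    ordered by key. The fixed prefix length is an illustrative assumption.
    """
    ordered = sorted(set(keys))   # sorting makes keys with the same prefix contiguous
    intervals = []
    for _, group in groupby(ordered, key=lambda k: k[:prefix_len]):
        members = list(group)
        intervals.append((members[0], members[-1], members))
    return intervals
```

Each resulting `(low, high, members)` tuple plays the role of one entry that would be inserted into the B-tree, so the tree holds one entry per interval rather than one per key.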
Step 2: perform data updates using the pLSM method. The merge process of the pLSM method across multiple components follows the same steps as for two components. The two-component merge process is as follows:
1) Read unmerged leaf-node data from the memory-resident component C0 into the merge block;
2) Read unmerged leaf-node data from the disk-resident component C1 into the merge block;
3) Merge-sort the data in the merge block; when equal keys are encountered, treat the data from the memory-resident C0 as the latest and apply the update;
4) Repeat steps 1), 2) and 3); when the merge block is full, append it to disk, then continue reading unmerged leaf-node data from the memory-resident C0 and the disk-resident C1;
5) When all leaf nodes of the memory-resident C0 and the disk-resident C1 have been merged, one merge pass is complete, and the update operations held in C0 have been integrated into disk;
Through these merge steps, an update to data on the SSD is first written to the in-memory structure and then written back to the SSD by level-by-level merging. During write-back, the data on the SSD is not modified in place; instead, the merge process generates the updated data and appends it to a new file, and the old file is deleted once the merge completes;
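The two-component merge in steps 1) to 5) can be sketched as below. This is a simplified illustration: the function name `merge_components`, the use of sorted Python lists in place of leaf nodes, and the `block_size` parameter are assumptions, and the returned list of blocks stands in for blocks appended to a new file.

```python
def merge_components(c0, c1, block_size=2):
    """Merge sorted (key, value) lists c0 (memory) and c1 (disk).

    Returns the list of full or final merge blocks, in the order they
    would be appended to the new file. On equal keys the c0 entry wins.
    """
    out_blocks, block = [], []
    i = j = 0

    def emit(item):
        nonlocal block
        block.append(item)
        if len(block) == block_size:   # merge block is full: append it and start a new one
            out_blocks.append(block)
            block = []

    while i < len(c0) or j < len(c1):
        if j == len(c1) or (i < len(c0) and c0[i][0] < c1[j][0]):
            emit(c0[i])
            i += 1
        elif i == len(c0) or c1[j][0] < c0[i][0]:
            emit(c1[j])
            j += 1
        else:                          # equal keys: keep the newer C0 entry, skip the C1 entry
            emit(c0[i])
            i += 1
            j += 1
    if block:                          # last, partially filled block
        out_blocks.append(block)
    return out_blocks
```

Because both inputs are consumed in key order and the output is only ever appended, no existing block on the SSD is rewritten in place.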
Step 3: index the data on the SSD using the line-segment B-tree structure:
After the pLSM method has run, multiple data files are stored on the SSD, so a lookup first searches the memory-resident C0 structure in memory; if the data is not found there, it searches the disk structures level by level, from the disk-resident C1, C2 through Ck, until the data is found;
During the search, if the same data item appears in several structures at once, the pLSM method ensures that every read retrieves the latest, correct data; if the required data is not found in memory, it is read from disk;
When the index structure on the SSD is searched, key comparison becomes a test of whether the key being sought falls within an interval. If an interval containing the key is found, its offset in the data file is read, the data file is accessed at that offset, and the value for the key is obtained. The entire interval fetched in that access is cached in memory, forming a buffer, so that later accesses to data in the same interval are served directly from memory. The buffer is organized as a doubly linked list, and rarely used buffered data is evicted by the LRU algorithm.
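The interval test that replaces key comparison can be sketched as follows. A flat sorted list stands in for the B-tree nodes here, which is a simplification; the function name `find_interval` and the `(low, high, offset)` tuple layout are assumptions for illustration.

```python
from bisect import bisect_right


def find_interval(intervals, key):
    """Return the (low, high, offset) interval containing `key`, or None.

    `intervals` must be sorted by low bound and non-overlapping.
    """
    lows = [low for low, _, _ in intervals]
    pos = bisect_right(lows, key) - 1   # last interval whose low bound is <= key
    if pos >= 0 and key <= intervals[pos][1]:
        return intervals[pos]
    return None
```

The binary search over low bounds is what gives a logarithmic lookup; the returned `offset` is the position a caller would seek to in the data file.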
The beneficial effects of the invention are:
The present invention implements an optimized LSM method, the pLSM method, which provides an update strategy for string data stored on an SSD. On the one hand, with the pLSM method, random updates to files on the SSD are converted into sequential append writes, avoiding the SSD write-amplification problem. On the other hand, the line-segment B-tree structure provides an index for the data on the SSD, which avoids a full traversal during lookup, reduces lookup time complexity from O(N) to O(log N), and improves data access efficiency.
Description of the drawings
Fig. 1 is a schematic diagram of the two-component LSM structure.
Fig. 2 is a schematic diagram of the LSM merge process.
Detailed description
An optimized SSD data update method comprising the following steps:
Step 1: build the line-segment B-tree structure:
In the in-memory database, first use the shared-prefix information of the string data to group the strings that share the same prefix into a string interval; then insert this interval into the line-segment B-tree using the B-tree insertion algorithm. The logical view of the whole data structure is thus a B-tree, but what it stores is key-string interval information;
Step 2: perform data updates using the pLSM method. The merge process of the pLSM method across multiple components follows the same steps as for two components. With reference to Fig. 1, the two-component merge process is as follows:
1) Read unmerged leaf-node data from the memory-resident component C0 into the merge block;
2) Read unmerged leaf-node data from the disk-resident component C1 into the merge block;
3) Merge-sort the data in the merge block; when equal keys are encountered, treat the data from the memory-resident C0 as the latest and apply the update. For example, if the entry with key 001 in the memory-resident C0 is marked as deleted while the data for key 001 in C1 is 12345, then during the merge sort in the merge block the C0 entry is taken as the latest, the update is applied, and the data for key 001 is deleted;
4) Repeat steps 1), 2) and 3); when the merge block is full, append it to disk, then continue reading unmerged leaf-node data from the memory-resident C0 and the disk-resident C1;
5) When all leaf nodes of the memory-resident C0 and the disk-resident C1 have been merged, one merge pass is complete, and the update operations held in C0 have been integrated into disk, as shown in Fig. 2;
The steps above describe the merge process of the two-component pLSM algorithm; the merge process across multiple components is the same;
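The deletion example in step 3), where key 001 is marked as deleted in C0 and holds 12345 in C1, can be reproduced with a small sketch. The `TOMBSTONE` sentinel and the function name `apply_updates` are hypothetical; the text only says the entry is "marked as deleted" and does not specify how.

```python
TOMBSTONE = object()   # hypothetical marker meaning "this key was deleted"


def apply_updates(c0, c1):
    """Merge dict c0 (newer, in memory) over dict c1 (older, on disk).

    Keys present in both take the c0 value; keys whose newest value is
    the tombstone are dropped from the result.
    """
    merged = {}
    for key in sorted(set(c0) | set(c1)):             # union of the keys of both components
        value = c0[key] if key in c0 else c1[key]     # C0 wins on equal keys
        if value is not TOMBSTONE:
            merged[key] = value
    return merged
```

With `c0 = {"001": TOMBSTONE}` and `c1 = {"001": "12345"}`, the merged result no longer contains key 001, matching the example in the text.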
Through these merge steps, an update to data on the SSD is first written to the in-memory structure and then written back to the SSD by level-by-level merging. During write-back, the data on the SSD is not modified in place; instead, the merge process generates the updated data and appends it to a new file, and the old file is deleted once the merge completes. In this way, the SSD write-amplification problem is effectively avoided and write-back efficiency is improved;
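The write-back policy of appending to a new file and then deleting the old one can be sketched as below. The function name `rewrite_data_file`, the tab-separated line format and the explicit `os.fsync` call are illustrative assumptions; the patent does not describe the file format or any durability step.

```python
import os


def rewrite_data_file(old_path, new_path, merged_items):
    """Append merged (key, value) pairs to a new file, then remove the old file.

    The old file is never modified; it is deleted only after the new
    file has been fully written.
    """
    with open(new_path, "a", encoding="utf-8") as out:   # append-only writes
        for key, value in merged_items:
            out.write(f"{key}\t{value}\n")
        out.flush()
        os.fsync(out.fileno())   # assumed safeguard: persist the new file before deleting the old one
    os.remove(old_path)
    return new_path
```

Deleting the old file only after the new one is complete means a reader always has one consistent file to fall back on during the merge.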
Step 3: index the data on the SSD using the line-segment B-tree structure:
After the pLSM method has run, multiple data files are stored on the SSD, so a lookup first searches the memory-resident C0 structure in memory; if the data is not found there, it searches the disk structures level by level, from the disk-resident C1, C2 through Ck, until the data is found.
During the search, the same data item may appear in several structures at once, and the pLSM algorithm ensures that every read retrieves the latest, correct data. This is because reads in the pLSM tree rest on the following assumption: the newest data always resides in the lower-level storage structure. That is, if the same key exists in both Ck and Ck-1, the lookup returns the data in Ck-1. This property is also reflected in the data insertion process. During a lookup, if the required data is not found in memory, it must be read from disk, which adds to the time cost of the lookup.
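The level-by-level search and the "lower level is newer" assumption can be sketched as follows. Plain dictionaries stand in for the components C0 through Ck, and the function name `lookup` is hypothetical.

```python
def lookup(components, key):
    """Search C0, C1, ..., Ck in order and return (level, value) for the first hit.

    Because lower-numbered components hold newer data, the first hit is
    the latest version. Returns None if the key is in no component.
    """
    for level, component in enumerate(components):
        if key in component:
            return level, component[key]
    return None
```

For `components = [{"a": "v2"}, {"a": "v1", "b": "x"}]`, looking up `"a"` stops at level 0 and never sees the stale `"v1"` held in level 1.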
Searching the index structure on the SSD is similar to searching an ordinary B-tree, except that key comparison becomes a test of whether the key being sought falls within an interval, as shown in the drawings. If an interval containing the key is found, its offset in the data file is read, the data file is accessed at that offset, and the value for the key is obtained. Because accessing the SSD is slower than accessing memory, and following the principle of locality of reference, the entire interval fetched in an access is cached in memory, forming a buffer. Later accesses to data in the same interval are then served directly from memory, reducing the number of SSD accesses. In the present invention the buffer is organized as a doubly linked list, and rarely used buffered data is evicted by the LRU algorithm.
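The interval buffer with LRU eviction can be sketched as below. Python's `collections.OrderedDict` is used here in place of a hand-written doubly linked list, which is a substitution made for brevity; the class name `IntervalCache` and its `capacity` parameter are assumptions.

```python
from collections import OrderedDict


class IntervalCache:
    """LRU cache holding whole intervals, keyed by an interval identifier."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # least recently used first, most recently used last

    def get(self, interval_id):
        if interval_id not in self.entries:
            return None                          # miss: the caller reads the interval from the SSD
        self.entries.move_to_end(interval_id)    # mark as most recently used
        return self.entries[interval_id]

    def put(self, interval_id, data):
        self.entries[interval_id] = data
        self.entries.move_to_end(interval_id)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)     # evict the least recently used interval
```

A caller would try `get` first and, on a miss, read the interval from the data file and `put` it, so repeated accesses within one interval touch the SSD only once.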

Claims (1)

1. An optimized SSD data update method, characterized by comprising the following steps:
Step 1: build the line-segment B-tree structure:
In the in-memory database, first use the shared-prefix information of the string data to group the strings that share the same prefix into a string interval; then insert this interval into the line-segment B-tree using the B-tree insertion algorithm; the logical view of the whole data structure is thus a B-tree, and the nodes of the B-tree store key-string interval information;
Step 2: perform data updates using the pLSM method, wherein the merge process of the pLSM method across multiple components follows the same steps as for two components, and the two-component merge process is as follows:
1) read unmerged leaf-node data from the memory-resident component C0 into the merge block;
2) read unmerged leaf-node data from the disk-resident component C1 into the merge block;
3) merge-sort the data in the merge block; when equal keys are encountered, treat the data from the memory-resident C0 as the latest and apply the update;
4) repeat steps 1), 2) and 3); when the merge block is full, append it to disk, then continue reading unmerged leaf-node data from the memory-resident C0 and the disk-resident C1;
5) when all leaf nodes of the memory-resident C0 and the disk-resident C1 have been merged, one merge pass is complete, and the update operations held in C0 have been integrated into disk;
through these merge steps, an update to data on the SSD is first written to the in-memory structure and then written back to the SSD by level-by-level merging; during write-back, the data on the SSD is not modified in place; instead, the merge process generates the updated data and appends it to a new file, and the old file is deleted once the merge completes;
Step 3: index the data on the SSD using the line-segment B-tree structure:
after the pLSM method has run, multiple data files are stored on the SSD, so a lookup first searches the memory-resident C0 structure in memory; if the data is not found there, it searches the disk structures level by level, from the disk-resident C1, C2 through Ck, until the data is found;
during the search, if the same data item appears in several structures at once, the pLSM method ensures that every read retrieves the latest, correct data; if the required data is not found in memory, it is read from disk;
when the index structure on the SSD is searched, key comparison becomes a test of whether the key being sought falls within an interval; if an interval containing the key is found, its offset in the data file is read, the data file is accessed at that offset, and the value for the key is obtained; the entire interval fetched in that access is cached in memory, forming a buffer, so that later accesses to data in the same interval are served directly from memory; the buffer is organized as a doubly linked list, and rarely used buffered data is evicted by the LRU algorithm.
CN201510458844.5A 2015-07-30 2015-07-30 An optimized SSD data update method Active CN105117415B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510458844.5A CN105117415B (en) 2015-07-30 2015-07-30 An optimized SSD data update method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510458844.5A CN105117415B (en) 2015-07-30 2015-07-30 An optimized SSD data update method

Publications (2)

Publication Number Publication Date
CN105117415A CN105117415A (en) 2015-12-02
CN105117415B true CN105117415B (en) 2018-07-03

Family

ID=54665405

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510458844.5A Active CN105117415B (en) 2015-07-30 2015-07-30 An optimized SSD data update method

Country Status (1)

Country Link
CN (1) CN105117415B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106227677B (en) * 2016-07-20 2018-11-20 浪潮电子信息产业股份有限公司 Method for managing variable-length cache metadata
CN106708442B (en) * 2016-12-30 2020-02-14 硬石科技(武汉)有限公司 Mass data storage method simultaneously adapting to read-write characteristics of magnetic disk and solid state disk
CN108319602B (en) * 2017-01-17 2020-10-16 阿里巴巴(中国)有限公司 Database management method and database system
CN108319625B (en) * 2017-01-17 2019-10-25 广州市动景计算机科技有限公司 File merging method and apparatus
WO2018133762A1 (en) * 2017-01-17 2018-07-26 广州市动景计算机科技有限公司 File merging method and apparatus
US10725983B2 (en) * 2017-12-29 2020-07-28 Huawei Technologies Co., Ltd. Systems and methods for database management using append-only storage devices
CN110851434B (en) * 2018-07-27 2023-07-18 阿里巴巴集团控股有限公司 Data storage method, device and equipment
CN109213445A (en) * 2018-08-23 2019-01-15 郑州云海信息技术有限公司 Management method, management system and related apparatus for storage system metadata
CN109407985B (en) * 2018-10-15 2022-02-18 郑州云海信息技术有限公司 Data management method and related device
CN109271570A (en) * 2018-10-30 2019-01-25 郑州云海信息技术有限公司 Method for metadata management and query
CN110502457B (en) * 2019-08-23 2022-02-18 北京浪潮数据技术有限公司 Metadata storage method and device
CN111104403B (en) * 2019-11-30 2022-06-07 北京浪潮数据技术有限公司 LSM tree data processing method, system, equipment and computer medium
CN111831622A (en) * 2020-03-31 2020-10-27 北京嘀嘀无限科技发展有限公司 Data index generation method and device, electronic equipment and readable storage medium
CN112487095B (en) * 2020-12-09 2023-03-28 浪潮云信息技术股份公司 Method for optimizing transaction data storage of distributed database
CN113094372A (en) 2021-04-16 2021-07-09 三星(中国)半导体有限公司 Data access method, data access control device and data access system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102722449A (en) * 2012-05-24 2012-10-10 中国科学院计算技术研究所 Key-Value local storage method and system based on solid state disk (SSD)
CN104461384A (en) * 2014-11-28 2015-03-25 华为技术有限公司 Data write-in method and storage device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9311252B2 (en) * 2013-08-26 2016-04-12 Globalfoundries Inc. Hierarchical storage for LSM-based NoSQL stores

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
An efficient design and implementation of LSM-tree based key-value store on open-channel SSD; Peng Wang et al.; 《Proceedings of the Ninth European Conference on Computer Systems》; 20141231; pp. 1-14 *
pLSM: A Highly Efficient LSM-Tree Index Supporting Real-Time Big Data Analysis; Jin Wang et al.; 《2013 IEEE 37th Annual Computer Software and Applications Conference》; 20131231; pp. 240-245 *
Implementation of an efficient Key-Value persistent cache system; 罗军 et al.; 《计算机工程》 (Computer Engineering); 20140331; Vol. 40 (No. 3); pp. 33-38 *

Also Published As

Publication number Publication date
CN105117415A (en) 2015-12-02

Similar Documents

Publication Publication Date Title
CN105117415B (en) An optimized SSD data update method
CN110083601B (en) Key value storage system-oriented index tree construction method and system
US9672235B2 (en) Method and system for dynamically partitioning very large database indices on write-once tables
CN109299113B (en) Range query method with storage-aware mixed index
US10019382B2 (en) Secondary data structures for storage class memory (scm) enables main-memory databases
WO2020186549A1 (en) Metadata management method, system and medium
Ahn et al. ForestDB: A fast key-value storage system for variable-length string keys
CN110347852B (en) File system embedded with transverse expansion key value storage system and file management method
US20130297613A1 (en) Indexing based on key ranges
CN103229164B (en) Data access method and device
CN104484471B (en) Implementation method of a high-performance data storage engine
US20160357673A1 (en) Method of maintaining data consistency
KR20190019805A (en) Method and device for storing data object, and computer readable storage medium having a computer program using the same
CN112000846B (en) Method for grouping LSM tree indexes based on GPU
US10521117B2 (en) Unified table delta dictionary memory size and load time optimization
US10289709B2 (en) Interleaved storage of dictionary blocks in a page chain
Lv et al. Log-compact R-tree: an efficient spatial index for SSD
US20110153580A1 (en) Index Page Split Avoidance With Mass Insert Processing
JP7345482B2 (en) Maintaining shards in KV store with dynamic key range
Petrov Algorithms behind modern storage systems: Different uses for read-optimized b-trees and write-optimized lsm-trees
CN110515897B (en) Method and system for optimizing reading performance of LSM storage system
WO2015129109A1 (en) Index management device
CN116382588A (en) LSM-Tree storage engine read amplification problem optimization method based on learning index
US10417215B2 (en) Data storage over immutable and mutable data stages
US20240028560A1 (en) Directory management method and system for file system based on cuckoo hash and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant