CN105117415B - An optimized SSD data update method - Google Patents
An optimized SSD data update method
- Publication number
- CN105117415B (application CN201510458844.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2365—Ensuring data consistency and integrity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1847—File system types specifically adapted to static storage, e.g. adapted to flash memory or SSD
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An optimized SSD data update method for character-type (string) data stored on an SSD. The method combines two data structures, a segment tree and a B-tree, to reduce the space occupied by the index structure while preserving query efficiency. When string data is updated, set intersection and union properties are used to update the differing values of identical keys in two data blocks, and all other data is written back unchanged. By using an LSM method based on the line segment B-tree, random updates to string data on the SSD are converted into sequential writes, which effectively avoids the SSD write amplification problem, speeds up SSD writes, and improves the efficiency of database operations.
Description
Technical field
The invention belongs to the field of computer technology and specifically relates to an optimized SSD data update method.
Background art
When designing a NoSQL database, the data in memory must be persisted to durable storage. Using an SSD (solid-state drive) as the persistent storage speeds up data reads and writes and improves overall system performance. Data is therefore written back to the SSD, forming a two-level memory-SSD storage architecture that gives applications a database system with larger capacity and faster access than traditional mechanical disk storage. To access data stored on the SSD quickly, an index must be built over the data. However, traditional index structures such as the B-tree and B+ tree generate a large number of random I/O requests. Writing a large volume of random I/O requests to the SSD without any processing seriously degrades SSD performance, because whenever an upper-layer application updates data on the SSD, the write amplification inherent to SSDs increases the latency of write operations.
O'Neil et al. proposed the Log-Structured Merge (LSM) method, drawing on the ideas of log-structured file systems. Exploiting the continuous append-only writing of log information and combining it with the B-tree data structure, the method sacrifices some read performance in exchange for greatly improved write performance, achieving a better balance between reads and writes. The LSM method provides a deferred update mechanism that merges random, small updates into large sequential update operations, improving storage bandwidth utilization. These characteristics allow the LSM method to perform well in database systems that use a two-level memory-SSD storage architecture and in which writes outnumber reads.
However, new problems arise when the LSM method is applied directly to an SSD-based NoSQL database system. The LSM method was designed for integer keys. In a key-value database whose keys are strings, an index built on string keys must be converted before the index structure is written back to the SSD, because pointer-type data cannot be stored directly on the SSD. In addition, because strings vary in form and length, they cannot be represented in pre-allocated space, and building a separate index entry for every key increases the amount of data in the B-tree index structure that the LSM method keeps on the SSD. Accessing data under these conditions lengthens the access path and increases data access latency. To address these problems, the present invention proposes an improved index structure for string data that speeds up access to string data on the SSD.
Summary of the invention
To overcome the shortcomings of the prior art described above, the purpose of the present invention is to provide an optimized SSD data update method that improves data access efficiency.
To achieve this purpose, the present invention adopts the following technical solution:
An optimized SSD data update method, comprising the following steps:
Step 1: build the line segment B-tree structure.
In the in-memory database, the shared-prefix information of the string data is first used to group strings sharing the same prefix into a string interval. This interval data is then inserted into the line segment B-tree using the B-tree insertion algorithm. The logical view of the resulting data structure is a B-tree, but what it stores is key string interval information.
Step 2: complete data update operations using the pLSM method. The merge process of the pLSM method across multiple components follows the same steps as the merge process for two components, which is as follows:
1) Read the unmerged leaf node data from the memory-resident component C0 into the merge block.
2) Read the unmerged leaf node data from the disk-resident component C1 into the merge block.
3) Merge-sort the data in the merge block. When equal keys are encountered, the data in the memory-resident C0 is treated as the latest and the update is applied.
4) Repeat steps 1), 2), and 3). When the merge block is full, append it to disk, then read the next unmerged leaf node data from the memory-resident C0 and the disk-resident C1.
5) Once all leaf nodes of the memory-resident C0 and the disk-resident C1 have been merged, one merge pass is complete, and the update data in C0 has been integrated into the disk.
Through these merge steps, updates to data on the SSD are first written to the in-memory structure and then written back to the SSD using a level-by-level merge strategy. During write-back, the data on the SSD is not modified in place. Instead, the merge process generates the updated data and appends it to a new file, and the old file is deleted after the merge completes.
Step 3: index the data on the SSD using the line segment B-tree structure.
After the pLSM method has run, multiple data files are stored on the SSD. A lookup therefore first searches the memory-resident C0 structure in memory. If the key is not found there, the lookup searches the disk structures level by level, from the disk-resident C1 and C2 up to Ck, until the key is found.
During the search, if the same data field appears in several structures at once, the pLSM method guarantees that every read returns the newest, correct data. If the required data is not found in memory, it is read from disk.
When searching the index structure on the SSD, key comparison becomes a test of whether the data being sought falls within an interval. If interval information containing the sought data is found, its offset in the data file is read, the data file is accessed at that offset, and the value for the sought key is obtained. The entire interval data touched by one access is cached in memory, forming a buffer. When a later access falls within the same interval, it is served directly from memory. The buffer is organized as a doubly linked list, and rarely used cached data is evicted by the LRU algorithm.
The beneficial effects of the present invention are as follows:
The present invention implements an optimized LSM method, the pLSM method, which provides an update strategy for string data stored on an SSD. On the one hand, the pLSM method converts random data updates to files on the SSD into sequential append writes, avoiding the SSD write amplification problem. On the other hand, the line segment B-tree structure indexes the data on the SSD, which avoids a full traversal during lookups and reduces lookup time complexity from O(N) to O(log N), improving data access efficiency.
Brief description of the drawings
Fig. 1 is a schematic diagram of the two-component LSM structure.
Fig. 2 is a schematic diagram of the LSM merge process.
Detailed description of the embodiments
An optimized SSD data update method, comprising the following steps:
Step 1: build the line segment B-tree structure.
In the in-memory database, the shared-prefix information of the string data is first used to group strings sharing the same prefix into a string interval. This interval data is then inserted into the line segment B-tree using the B-tree insertion algorithm. The logical view of the resulting data structure is a B-tree, but what it stores is key string interval information.
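The prefix grouping in the first step could be sketched roughly as follows. This is only an illustration under stated assumptions: the patent does not say how the shared prefix is chosen, so the fixed `prefix_len` parameter, the function name `build_intervals`, and the `(low, high, members)` tuple layout are all hypothetical, and a sorted Python list stands in for the B-tree that would hold the intervals.

```python
from itertools import groupby


def build_intervals(keys, prefix_len=3):
    """Group string keys that share a prefix into string intervals.

    prefix_len is an assumed, fixed prefix length (not from the patent).
    Returns a list of (low, high, members) tuples ordered by low key.
    """
    ordered = sorted(set(keys))
    intervals = []
    # groupby only merges adjacent items, so the keys must be sorted first.
    for _prefix, group in groupby(ordered, key=lambda k: k[:prefix_len]):
        members = list(group)
        intervals.append((members[0], members[-1], members))
    return intervals


intervals = build_intervals(
    ["user:ann", "user:bob", "item:42", "item:7", "log:x"], prefix_len=4
)
```

Each tuple covers one string interval, so an index built over the `low` keys needs one entry per interval rather than one entry per key.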
Step 2: complete data update operations using the pLSM method. The merge process of the pLSM method across multiple components follows the same steps as the merge process for two components. With reference to Fig. 1, the two-component merge process is as follows:
1) Read the unmerged leaf node data from the memory-resident component C0 into the merge block.
2) Read the unmerged leaf node data from the disk-resident component C1 into the merge block.
3) Merge-sort the data in the merge block. When equal keys are encountered, the data in the memory-resident C0 is treated as the latest and the update is applied. For example, suppose the entry with key 001 in the memory-resident C0 is marked as deleted, while the data for key 001 in C1 is 12345. During the merge sort in the merge block, the data in C0 is the latest, so the data for key 001 is deleted.
4) Repeat steps 1), 2), and 3). When the merge block is full, append it to disk, then read the next unmerged leaf node data from the memory-resident C0 and the disk-resident C1.
5) Once all leaf nodes of the memory-resident C0 and the disk-resident C1 have been merged, one merge pass is complete, and the update data in C0 has been integrated into the disk, as shown in Fig. 2.
The steps above describe the merge process of the pLSM algorithm for two components; the merge process across multiple components is the same.
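The two-component merge can be illustrated with a small sketch. It is a simplified streaming variant written under assumptions, not the patented implementation: both components are modeled as sorted lists of `(key, value)` pairs, the `TOMBSTONE` deletion marker and the `block_size` parameter are hypothetical, and each yielded block represents one merge block that would be appended to disk.

```python
TOMBSTONE = object()  # hypothetical marker for "this key was deleted in C0"


def merge_components(c0, c1, block_size=2):
    """Merge sorted (key, value) lists c0 (memory) and c1 (disk).

    Yields merge blocks of at most block_size records. On equal keys the
    c0 record wins; keys whose latest record is a tombstone are dropped.
    """
    block = []
    i = j = 0
    while i < len(c0) or j < len(c1):
        if j >= len(c1) or (i < len(c0) and c0[i][0] <= c1[j][0]):
            key, value = c0[i]
            if j < len(c1) and c1[j][0] == key:
                j += 1  # skip the stale disk record with the same key
            i += 1
        else:
            key, value = c1[j]
            j += 1
        if value is not TOMBSTONE:
            block.append((key, value))
        if len(block) == block_size:
            yield block  # a full merge block, ready to be appended to disk
            block = []
    if block:
        yield block


c0 = [("001", TOMBSTONE), ("003", "new")]
c1 = [("001", "12345"), ("002", "b"), ("003", "old"), ("004", "d")]
blocks = list(merge_components(c0, c1))
```

With this input, key 001 disappears because C0 marks it as deleted, and key 003 takes the value from C0, which mirrors the example in step 3).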
Through these merge steps, updates to data on the SSD are first written to the in-memory structure and then written back to the SSD using a level-by-level merge strategy. During write-back, the data on the SSD is not modified in place. Instead, the merge process generates the updated data and appends it to a new file, and the old file is deleted after the merge completes. In this way, the SSD write amplification problem is effectively avoided and write-back efficiency is improved.
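The write-back strategy described above (append to a new file, then delete the old one) might look like the following sketch. The tab-separated record format, the file names, and the `write_back` helper are assumptions made for illustration; the patent does not specify an on-disk layout.

```python
import os
import tempfile


def write_back(old_path, merged_records, new_path):
    """Append merged records to a new file, then delete the old file.

    The old file is never modified in place; it is only removed once the
    new file has been fully written.
    """
    with open(new_path, "a", encoding="utf-8") as out:
        for key, value in merged_records:
            out.write(f"{key}\t{value}\n")
    if os.path.exists(old_path):
        os.remove(old_path)
    return new_path


workdir = tempfile.mkdtemp()
old_file = os.path.join(workdir, "c1_old.dat")
new_file = os.path.join(workdir, "c1_new.dat")
with open(old_file, "w", encoding="utf-8") as f:
    f.write("001\t12345\n")

write_back(old_file, [("002", "b"), ("003", "new")], new_file)
with open(new_file, encoding="utf-8") as f:
    contents = f.read()
```

Because every write is a sequential append to a fresh file, the sketch never issues a small in-place update, which is the access pattern the method is trying to avoid on an SSD.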
Step 3: index the data on the SSD using the line segment B-tree structure.
After the pLSM method has run, multiple data files are stored on the SSD. A lookup therefore first searches the memory-resident C0 structure in memory. If the key is not found there, the lookup searches the disk structures level by level, from the disk-resident C1 and C2 up to Ck, until the key is found.
During the search, the same data field may appear in several structures at once, and the pLSM algorithm guarantees that every read returns the newest, correct data. This is because reads in the pLSM tree rely on the following assumption: the newest data always resides in the lower-level storage structure. That is, if the same key exists in both Ck and Ck-1, the lookup returns the data in Ck-1. This property is also reflected in the data insertion process. If the required data is not found in memory during a lookup, it must be read from disk, which increases the time overhead of the lookup.
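The level-by-level lookup can be sketched as below. Plain dictionaries stand in for the components C0 through Ck, which is an assumption made to keep the example short; in the method itself each component is an index structure, and C1 onward reside on the SSD.

```python
def lookup(key, components):
    """Search components in order C0, C1, ..., Ck and return the first hit.

    Returns (level, value), or None if no component holds the key. Because
    lower levels hold newer data, the first hit is the newest version.
    """
    for level, component in enumerate(components):
        if key in component:
            return level, component[key]
    return None


c0 = {"003": "newest"}                   # memory-resident component
c1 = {"001": "a", "003": "older"}        # disk-resident components
c2 = {"003": "oldest", "009": "z"}
components = [c0, c1, c2]
```

For example, `lookup("003", components)` stops at C0 and never touches the older copies in C1 and C2, while `lookup("009", components)` has to walk down to C2.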
Searching the index structure on the SSD is similar to searching an ordinary B-tree, except that key comparison becomes a test of whether the data being sought falls within an interval, as shown in the drawings. If interval information containing the sought data is found, its offset in the data file is read, the data file is accessed at that offset, and the value for the sought key is obtained. Because accessing the SSD is slower than accessing memory, and in keeping with the principle of locality of reference, the entire interval data touched by one access is cached in memory, forming a buffer. When a later access falls within the same interval, it is served directly from memory, which reduces the number of SSD accesses. In the present invention the buffer is organized as a doubly linked list, and rarely used cached data is evicted by the LRU algorithm.
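The interval test and the LRU buffer could be combined as in the following sketch. Several things here are assumptions rather than details from the patent: a sorted list searched with `bisect` stands in for the line segment B-tree, `collections.OrderedDict` stands in for the doubly linked list that orders the buffer, and the `IntervalIndex` class, its `capacity` parameter, and the `load_interval` callback (which would read one interval from the data file at a given offset) are hypothetical names.

```python
from bisect import bisect_right
from collections import OrderedDict


class IntervalIndex:
    def __init__(self, intervals, load_interval, capacity=2):
        # intervals: list of (low, high, offset) tuples sorted by low key
        self.intervals = intervals
        self.lows = [low for low, _high, _offset in intervals]
        self.load_interval = load_interval  # reads one interval from "disk"
        self.capacity = capacity
        self.cache = OrderedDict()          # offset -> {key: value}
        self.disk_reads = 0

    def find_interval(self, key):
        """Return the file offset of the interval containing key, or None."""
        pos = bisect_right(self.lows, key) - 1
        if pos >= 0:
            low, high, offset = self.intervals[pos]
            if low <= key <= high:
                return offset
        return None

    def get(self, key):
        offset = self.find_interval(key)
        if offset is None:
            return None
        if offset in self.cache:
            self.cache.move_to_end(offset)      # mark as most recently used
        else:
            self.disk_reads += 1
            self.cache[offset] = self.load_interval(offset)
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        return self.cache[offset].get(key)


fake_disk = {
    0: {"item:42": "v1", "item:7": "v2"},
    100: {"user:ann": "v3", "user:bob": "v4"},
}
index = IntervalIndex(
    [("item:42", "item:7", 0), ("user:ann", "user:bob", 100)],
    fake_disk.__getitem__,
)
first = index.get("user:bob")
second = index.get("user:ann")
```

The second call falls in the same interval as the first, so it is answered from the buffer and `disk_reads` stays at 1.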
Claims (1)
1. An optimized SSD data update method, characterized by comprising the following steps:
Step 1, building the line segment B-tree structure:
in the in-memory database, the shared-prefix information of the string data is first used to group strings sharing the same prefix into a string interval; this interval data is then inserted into the line segment B-tree using the B-tree insertion algorithm; the logical view of the resulting data structure is a B-tree whose nodes store key string interval information;
Step 2, completing data update operations using the pLSM method, wherein the merge process of the pLSM method across multiple components follows the same steps as the merge process for two components, and the two-component merge process is as follows:
1) reading the unmerged leaf node data from the memory-resident component C0 into the merge block;
2) reading the unmerged leaf node data from the disk-resident component C1 into the merge block;
3) merge-sorting the data in the merge block, and, when equal keys are encountered, treating the data in the memory-resident C0 as the latest and applying the update;
4) repeating steps 1), 2), and 3), and, when the merge block is full, appending it to disk and then reading the next unmerged leaf node data from the memory-resident C0 and the disk-resident C1;
5) once all leaf nodes of the memory-resident C0 and the disk-resident C1 have been merged, one merge pass is complete and the update data in the memory-resident C0 has been integrated into the disk;
through these merge steps, updates to data on the SSD are first written to the in-memory structure and then written back to the SSD using a level-by-level merge strategy; during write-back, the data on the SSD is not modified in place, but the merge process generates the updated data and appends it to a new file, and the old file is deleted after the merge completes;
Step 3, indexing the data on the SSD using the line segment B-tree structure:
after the pLSM method has run, multiple data files are stored on the SSD, so a lookup first searches the memory-resident C0 structure in memory and, if the key is not found there, searches the disk structures level by level, from the disk-resident C1 and C2 up to Ck, until the key is found;
during the search, if the same data field appears in several structures at once, the pLSM method guarantees that every read returns the newest, correct data, and, if the required data is not found in memory, it is read from disk;
when searching the index structure on the SSD, key comparison becomes a test of whether the data being sought falls within an interval; if interval information containing the sought data is found, its offset in the data file is read, the data file is accessed at that offset, and the value for the sought key is obtained; the entire interval data touched by one access is cached in memory, forming a buffer, and, when a later access falls within the same interval, it is served directly from memory; the buffer is organized as a doubly linked list, and rarely used cached data is evicted by the LRU algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510458844.5A CN105117415B (en) | 2015-07-30 | 2015-07-30 | An optimized SSD data update method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510458844.5A CN105117415B (en) | 2015-07-30 | 2015-07-30 | An optimized SSD data update method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105117415A CN105117415A (en) | 2015-12-02 |
CN105117415B true CN105117415B (en) | 2018-07-03 |
Family
ID=54665405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510458844.5A Active CN105117415B (en) | 2015-07-30 | 2015-07-30 | An optimized SSD data update method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105117415B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106227677B (en) * | 2016-07-20 | 2018-11-20 | 浪潮电子信息产业股份有限公司 | A kind of method of elongated cache metadata management |
CN106708442B (en) * | 2016-12-30 | 2020-02-14 | 硬石科技(武汉)有限公司 | Mass data storage method simultaneously adapting to read-write characteristics of magnetic disk and solid state disk |
CN108319602B (en) * | 2017-01-17 | 2020-10-16 | 阿里巴巴(中国)有限公司 | Database management method and database system |
CN108319625B (en) * | 2017-01-17 | 2019-10-25 | 广州市动景计算机科技有限公司 | File mergences method and apparatus |
WO2018133762A1 (en) * | 2017-01-17 | 2018-07-26 | 广州市动景计算机科技有限公司 | File merging method and apparatus |
US10725983B2 (en) * | 2017-12-29 | 2020-07-28 | Huawei Technologies Co., Ltd. | Systems and methods for database management using append-only storage devices |
CN110851434B (en) * | 2018-07-27 | 2023-07-18 | 阿里巴巴集团控股有限公司 | Data storage method, device and equipment |
CN109213445A (en) * | 2018-08-23 | 2019-01-15 | 郑州云海信息技术有限公司 | A kind of management method, management system and the relevant apparatus of storage system metadata |
CN109407985B (en) * | 2018-10-15 | 2022-02-18 | 郑州云海信息技术有限公司 | Data management method and related device |
CN109271570A (en) * | 2018-10-30 | 2019-01-25 | 郑州云海信息技术有限公司 | A kind of method of metadata management inquiry |
CN110502457B (en) * | 2019-08-23 | 2022-02-18 | 北京浪潮数据技术有限公司 | Metadata storage method and device |
CN111104403B (en) * | 2019-11-30 | 2022-06-07 | 北京浪潮数据技术有限公司 | LSM tree data processing method, system, equipment and computer medium |
CN111831622A (en) * | 2020-03-31 | 2020-10-27 | 北京嘀嘀无限科技发展有限公司 | Data index generation method and device, electronic equipment and readable storage medium |
CN112487095B (en) * | 2020-12-09 | 2023-03-28 | 浪潮云信息技术股份公司 | Method for optimizing transaction data storage of distributed database |
CN113094372A (en) | 2021-04-16 | 2021-07-09 | 三星(中国)半导体有限公司 | Data access method, data access control device and data access system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102722449A (en) * | 2012-05-24 | 2012-10-10 | 中国科学院计算技术研究所 | Key-Value local storage method and system based on solid state disk (SSD) |
CN104461384A (en) * | 2014-11-28 | 2015-03-25 | 华为技术有限公司 | Data write-in method and storage device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9311252B2 (en) * | 2013-08-26 | 2016-04-12 | Globalfoundries Inc. | Hierarchical storage for LSM-based NoSQL stores |
Non-Patent Citations (3)
Title |
---|
An efficient design and implementation of LSM-tree based key-value store on open-channel SSD; Peng Wang et al.; Proceedings of the Ninth European Conference on Computer Systems; 20141231; pp. 1-14 * |
pLSM: A Highly Efficient LSM-Tree Index Supporting Real-Time Big Data Analysis; Jin Wang et al.; 2013 IEEE 37th Annual Computer Software and Applications Conference; 20131231; pp. 240-245 * |
Implementation of an efficient key-value persistent cache system; Luo Jun et al.; Computer Engineering (计算机工程); 20140331; Vol. 40, No. 3; pp. 33-38 * |
Also Published As
Publication number | Publication date |
---|---|
CN105117415A (en) | 2015-12-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105117415B (en) | An optimized SSD data update method | |
CN110083601B (en) | Key value storage system-oriented index tree construction method and system | |
US9672235B2 (en) | Method and system for dynamically partitioning very large database indices on write-once tables | |
CN109299113B (en) | Range query method with storage-aware mixed index | |
US10019382B2 (en) | Secondary data structures for storage class memory (scm) enables main-memory databases | |
WO2020186549A1 (en) | Metadata management method, system and medium | |
Ahn et al. | ForestDB: A fast key-value storage system for variable-length string keys | |
CN110347852B (en) | File system embedded with transverse expansion key value storage system and file management method | |
US20130297613A1 (en) | Indexing based on key ranges | |
CN103229164B (en) | Data access method and device | |
CN104484471B (en) | A kind of implementation method of high-performance data storage engines | |
US20160357673A1 (en) | Method of maintaining data consistency | |
KR20190019805A (en) | Method and device for storing data object, and computer readable storage medium having a computer program using the same | |
CN112000846B (en) | Method for grouping LSM tree indexes based on GPU | |
US10521117B2 (en) | Unified table delta dictionary memory size and load time optimization | |
US10289709B2 (en) | Interleaved storage of dictionary blocks in a page chain | |
Lv et al. | Log-compact R-tree: an efficient spatial index for SSD | |
US20110153580A1 (en) | Index Page Split Avoidance With Mass Insert Processing | |
JP7345482B2 (en) | Maintaining shards in KV store with dynamic key range | |
Petrov | Algorithms behind modern storage systems: Different uses for read-optimized b-trees and write-optimized lsm-trees | |
CN110515897B (en) | Method and system for optimizing reading performance of LSM storage system | |
WO2015129109A1 (en) | Index management device | |
CN116382588A (en) | LSM-Tree storage engine read amplification problem optimization method based on learning index | |
US10417215B2 (en) | Data storage over immutable and mutable data stages | |
US20240028560A1 (en) | Directory management method and system for file system based on cuckoo hash and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||