CN105117415B

CN105117415B - An optimized SSD data update method

Info

Publication number: CN105117415B
Application number: CN201510458844.5A
Authority: CN
Inventors: 段章峰; 伍卫国; 崔金华
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2015-07-30
Filing date: 2015-07-30
Publication date: 2018-07-03
Anticipated expiration: 2035-07-30
Also published as: CN105117415A

Abstract

An optimized SSD data update method for string-type data stored on an SSD. A segment tree and a B-tree are used in combination, which preserves search efficiency while reducing the space occupied by the index structure. When string data is updated, the intersection and union properties of sets are used: where two data blocks hold different values for the same key, the value is updated, and all other data is written back unchanged. By using an LSM method based on a line segment B-tree, random updates to string data on the SSD are converted into sequential writes, which effectively avoids the SSD write amplification problem, speeds up SSD writes, and improves the efficiency of database operations.

Description

An optimized SSD data update method

Technical field

The invention belongs to the field of computer technology, and in particular relates to an optimized SSD data update method.

Background art

When a NoSQL database is designed, the data held in memory must be written to persistent storage. Using an SSD (solid-state drive) as the persistent store speeds up data reads and writes and improves overall system performance. Data is therefore written back to the SSD, forming a two-level memory-SSD storage architecture that gives applications a database system with larger capacity and, compared with traditional mechanical disk storage, faster access. To access data stored on the SSD quickly, an index must be built over the data. However, traditional index structures such as the B-tree and B+ tree generate a large number of random I/O requests. Writing this random I/O to the SSD without any processing severely degrades SSD performance, because when an upper-layer application updates data on the SSD, the write amplification inherent to SSDs increases the latency of write operations.

O'Neil et al. proposed the Log-Structured Merge (LSM) method, drawing on the ideas of log-structured file systems. Exploiting the continuous, append-only way in which log data is written, and combining it with the B-tree data structure, the method sacrifices some read performance in exchange for a large gain in write performance, achieving a better balance between reads and writes. The LSM method provides a deferred update mechanism that merges small random updates into large sequential updates, which improves storage bandwidth utilization. These characteristics make the LSM method perform well in database systems that have a two-level memory-SSD storage architecture and in which writes outnumber reads.

However, new problems arise when the LSM method is applied directly to an SSD-based NoSQL database system. The LSM method was designed for integer keys. In a key-value database whose keys are strings, if an index is built over the string keys, the index structure must be converted before it is written back to the SSD, because pointer-type data cannot be stored directly on the SSD. In addition, because strings vary in form and length, they cannot be represented in pre-allocated space, and building a separate index entry for each key increases the amount of B-tree index data that the LSM method keeps on the SSD. Accessing data under these conditions lengthens the access path and increases access latency. To address these problems, the invention proposes an improved index structure for string-type data that speeds up access to string-type data on the SSD.

Summary of the invention

To overcome the above shortcomings of the prior art, the object of the invention is to provide an optimized SSD data update method that improves data access efficiency.

To achieve this object, the invention adopts the following technical solution:

An optimized SSD data update method, comprising the following steps:

First step: build the line segment B-tree structure:

In the in-memory database, the shared prefix information of the string data is first used to group strings that share the same prefix into a string interval. This interval data is then inserted into the line segment B-tree using the B-tree insertion algorithm. As a result, the logical view of the whole data structure is a B-tree, but what it stores is the interval information of the keyword strings.
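The grouping step can be illustrated with a minimal Python sketch. It assumes a fixed prefix length (the text does not say how the shared prefix is chosen), and the function name `build_intervals` and the dictionary fields are hypothetical. A sorted list stands in for the leaf level of the line segment B-tree; no actual B-tree insertion is shown.

```python
from itertools import groupby


def build_intervals(keys, prefix_len=4):
    """Group string keys that share a prefix into intervals.

    Assumption: the prefix is simply the first `prefix_len` characters.
    Each interval records its prefix and its lowest and highest key, which
    is the information an index node would store instead of single keys.
    """
    intervals = []
    for prefix, group in groupby(sorted(keys), key=lambda k: k[:prefix_len]):
        members = list(group)
        intervals.append(
            {"prefix": prefix, "low": members[0], "high": members[-1], "keys": members}
        )
    return intervals


intervals = build_intervals(
    ["user:alice", "user:bob", "order:17", "order:42", "item:9"]
)
for interval in intervals:
    print(interval["prefix"], interval["low"], interval["high"])
```

Five keys collapse into three index entries here, which is the space saving the interval representation is meant to provide.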

Second step: complete the data update operation using the pLSM method. The merge process of the pLSM method across multiple components follows the same steps as the merge process between two components, which is as follows:

1) read unmerged leaf node data from the memory-resident component C₀ and place it in the merge block;

2) read unmerged leaf node data from the disk-resident component C₁ and place it in the merge block;

3) merge-sort the data in the merge block; if equal keys are encountered, the data in the memory-resident component C₀ is taken as the latest data and the update is applied;

4) repeat steps 1), 2) and 3); when the merge block is full, append it to disk, then resume reading unmerged leaf node data from the memory-resident component C₀ and the disk-resident component C₁;

5) once all leaf nodes of the memory-resident component C₀ and the disk-resident component C₁ have been merged, one merge pass is complete, and the update data in C₀ has been merged onto disk.

Through the above merge steps, updates to data on the SSD are first written to the in-memory structure and then written back to the SSD using a level-by-level merge strategy. During write-back, the data on the SSD is not modified in place; instead, the merge process generates the updated data and appends it to a new file, and the old file is deleted once the merge is complete.
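The two-component merge can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: each component is modelled as a sorted list of (key, value) pairs, `TOMBSTONE` is a hypothetical deletion marker, and the "file" is a Python list that collects flushed merge blocks in append order.

```python
TOMBSTONE = object()  # hypothetical marker for entries deleted in C0


def merge_components(c0, c1, block_capacity=2):
    """Merge sorted (key, value) lists c0 (memory) and c1 (disk).

    Returns the merge blocks in the order they would be appended to the
    new file. On equal keys the C0 entry wins because it is the latest.
    """
    flushed = []
    block = []
    i = j = 0

    def emit(key, value):
        nonlocal block
        if value is not TOMBSTONE:  # deleted keys are dropped from the output
            block.append((key, value))
        if len(block) == block_capacity:  # merge block full: append to disk
            flushed.append(block)
            block = []

    while i < len(c0) or j < len(c1):
        if j == len(c1) or (i < len(c0) and c0[i][0] < c1[j][0]):
            emit(*c0[i])
            i += 1
        elif i == len(c0) or c1[j][0] < c0[i][0]:
            emit(*c1[j])
            j += 1
        else:  # equal keys: take the C0 entry and skip the stale C1 entry
            emit(*c0[i])
            i += 1
            j += 1
    if block:  # flush the final, partially filled merge block
        flushed.append(block)
    return flushed


c0 = [("a", 1), ("c", TOMBSTONE)]
c1 = [("a", 0), ("b", 2), ("c", 3), ("d", 4)]
blocks = merge_components(c0, c1)
print(blocks)
```

Every block is written once, in key order, so the output file is produced purely by sequential appends.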

Third step: use the line segment B-tree structure to index the data on the SSD:

After the pLSM method has been applied, the SSD holds multiple data files. A lookup therefore first searches the memory-resident structure C₀; if the data is not found there, the on-disk structures are searched level by level, from the disk-resident C₁ and C₂ up to C_k, until the data is found.

During a lookup, if the same data item appears in several structures at once, the pLSM method ensures that every read returns the latest, correct data. If the required data is not found in memory, it is read from disk.
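The level-by-level lookup order can be sketched in a few lines of Python. Plain dictionaries stand in for the components, and the function name `lookup` is hypothetical; the only point illustrated is that searching C₀ first and stopping at the first hit returns the newest version of a key.

```python
def lookup(key, components):
    """Search C0 first, then C1..Ck; return (value, level) of the first hit.

    Assumption: components[0] is the memory-resident C0 and later entries
    are progressively older disk-resident components.
    """
    for level, component in enumerate(components):
        if key in component:
            return component[key], level
    return None, None


c0 = {"k1": "new"}
c1 = {"k1": "old", "k2": "v2"}
c2 = {"k3": "v3"}

print(lookup("k1", [c0, c1, c2]))  # the C0 version shadows the stale C1 one
print(lookup("k3", [c0, c1, c2]))
print(lookup("missing", [c0, c1, c2]))
```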

When searching the index structure on the SSD, key comparison is replaced by a test of whether the data being sought lies within an interval. If the interval information containing the sought data is found, its offset in the data file is read, the data file is accessed at that file offset, and the value corresponding to the sought key is obtained. The entire interval data touched by an access is cached in memory, forming a buffer; when the next access to data in the same interval arrives, the operation is carried out directly in memory. The buffer is organized as a doubly linked list, and infrequently used buffered data is evicted using the LRU method.
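The interval containment test can be illustrated as below. This sketch assumes non-overlapping intervals sorted by their lower bound, and it uses a binary search over a flat list in place of a B-tree descent; the index entries, offsets, and the name `find_offset` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical index entries: (low_key, high_key, file_offset),
# sorted by low_key and assumed not to overlap.
index = [
    ("item:0", "item:9", 0),
    ("order:00", "order:99", 4096),
    ("user:a", "user:z", 8192),
]
lows = [entry[0] for entry in index]


def find_offset(key):
    """Return the file offset of the interval containing key, or None."""
    pos = bisect_right(lows, key) - 1  # last interval whose low bound <= key
    if pos >= 0:
        low, high, offset = index[pos]
        if low <= key <= high:  # containment test replaces key equality
            return offset
    return None


print(find_offset("order:42"))
print(find_offset("paper"))
```

A caller would then seek to the returned offset in the data file and read the whole interval, which is what makes caching at interval granularity natural.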

The beneficial effects of the invention are:

The invention implements an optimized LSM method, the pLSM method, which provides an update strategy for string-type data stored on an SSD. On the one hand, the pLSM method converts random updates to files on the SSD into sequential file appends, avoiding the SSD write amplification problem. On the other hand, the line segment B-tree structure indexes the data on the SSD, which avoids a full traversal during lookup, reduces lookup time complexity from O(N) to O(log N), and improves data access efficiency.

Description of the drawings

Fig. 1 is a schematic diagram of the two-component LSM structure.

Fig. 2 is a schematic diagram of the merge process of the LSM method.

Detailed description of the embodiments

First step: build the line segment B-tree structure:

Second step: complete the data update operation using the pLSM method. The merge process of the pLSM method across multiple components follows the same steps as the merge process between two components. With reference to Fig. 1, the merge process between two components is as follows:

3) merge-sort the data in the merge block; if equal keys are encountered, the data in the memory-resident component C₀ is taken as the latest data and the update is applied. For example, if the entry with key 001 in C₀ is marked as deleted while the data for key 001 in C₁ is 12345, then during the merge sort in the merge block the data for key 001 is deleted, because the data in C₀ is the latest;

5) once all leaf nodes of the memory-resident component C₀ and the disk-resident component C₁ have been merged, one merge pass is complete, and the update data in C₀ has been merged onto disk, as shown in Fig. 2.

The above steps are the merge process of the two-component pLSM algorithm; the merge process across multiple components is the same.

Through the above merge steps, updates to data on the SSD are first written to the in-memory structure and then written back to the SSD using a level-by-level merge strategy. During write-back, the data on the SSD is not modified in place; instead, the merge process generates the updated data and appends it to a new file, and the old file is deleted once the merge is complete. In this way the SSD write amplification problem is effectively avoided and write-back efficiency is improved.
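The write-back policy (append to a new file, then delete the old one) can be sketched with ordinary file operations. The tab-separated record format, the `.new` suffix, and the function name `rewrite_component` are assumptions made only for this illustration; the sample data reuses the key 001 example, where the merge has already dropped the deleted key.

```python
import os
import tempfile


def rewrite_component(old_path, merged_records):
    """Append merged records to a new file, then delete the old file.

    The old file is never modified in place; every write is a sequential
    append to the new file.
    """
    new_path = old_path + ".new"  # hypothetical naming scheme
    with open(new_path, "a", encoding="utf-8") as new_file:
        for key, value in merged_records:
            new_file.write(f"{key}\t{value}\n")
    os.remove(old_path)
    return new_path


workdir = tempfile.mkdtemp()
old_path = os.path.join(workdir, "c1.dat")
with open(old_path, "w", encoding="utf-8") as f:
    f.write("001\t12345\n002\t67890\n")

# Key 001 is marked deleted in C0, so the merge output no longer contains it.
new_path = rewrite_component(old_path, [("002", "67890"), ("003", "abc")])
with open(new_path, encoding="utf-8") as f:
    print(f.read())
```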

After the pLSM method has been applied, the SSD holds multiple data files. A lookup therefore first searches the memory-resident structure C₀; if the data is not found there, the on-disk structures are searched level by level, from the disk-resident C₁ and C₂ up to C_k, until the data is found.

During a lookup, the same data item may appear in several structures at once, and the pLSM algorithm ensures that every read returns the latest, correct data. This is because reads from the pLSM tree rely on the following assumption: the latest data always resides in the lower-level storage structure; that is, if the same key exists in both C_k and C_{k-1}, the lookup returns the data in C_{k-1}. This property is also reflected in the data insertion process. If the required data is not found in memory, it must be read from disk, which adds to the time cost of the lookup.

Searching the index structure on the SSD is similar to searching an ordinary B-tree, except that key comparison is replaced by a test of whether the data being sought lies within an interval, as shown schematically in the drawings. If the interval information containing the sought data is found, its offset in the data file is read, the data file is accessed at that file offset, and the value corresponding to the sought key is obtained. Because accessing the SSD is slower than accessing memory, and in keeping with the principle of locality of reference, the entire interval data touched by an access is cached in memory, forming a buffer. When the next access to data in the same interval arrives, the operation is carried out directly in memory, reducing the number of SSD accesses. In the invention, the buffer is organized as a doubly linked list, and infrequently used buffered data is evicted by the LRU algorithm.
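A buffer of this kind can be sketched as a dictionary plus a doubly linked list ordered by recency. The class names `Node` and `IntervalCache`, the use of an interval identifier as cache key, and the requirement that capacity be at least 1 are assumptions of this sketch, not details taken from the text.

```python
class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None


class IntervalCache:
    """LRU buffer: dict for lookup, doubly linked list for recency order.

    Assumes capacity >= 1.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.nodes = {}
        self.head = Node(None, None)  # sentinel on the most recently used side
        self.tail = Node(None, None)  # sentinel on the least recently used side
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, interval_id):
        node = self.nodes.get(interval_id)
        if node is None:
            return None  # miss: the caller would read the interval from the SSD
        self._unlink(node)
        self._push_front(node)  # mark as most recently used
        return node.value

    def put(self, interval_id, data):
        if interval_id in self.nodes:
            self._unlink(self.nodes.pop(interval_id))
        elif len(self.nodes) >= self.capacity:
            victim = self.tail.prev  # least recently used interval
            self._unlink(victim)
            del self.nodes[victim.key]
        node = Node(interval_id, data)
        self.nodes[interval_id] = node
        self._push_front(node)


cache = IntervalCache(capacity=2)
cache.put("item", ["item:9"])
cache.put("order", ["order:17", "order:42"])
cache.get("item")                 # "item" becomes most recently used
cache.put("user", ["user:bob"])   # evicts "order", the least recently used
```

The linked list makes both the move-to-front on a hit and the eviction at the tail constant-time operations.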

Claims

1. An optimized SSD data update method, characterized by comprising the following steps:

First step: build the line segment B-tree structure:

In the in-memory database, the shared prefix information of the string data is first used to group strings that share the same prefix into a string interval; this interval data is then inserted into the line segment B-tree using the B-tree insertion algorithm; as a result, the logical view of the whole data structure is a B-tree whose nodes store the interval information of the keyword strings;

Second step: complete the data update operation using the pLSM method, wherein the merge process of the pLSM method across multiple components follows the same steps as the merge process between two components, which is as follows:

3) merge-sort the data in the merge block; if equal keys are encountered, the data in the memory-resident component C₀ is taken as the latest data and the update is applied;

4) repeat steps 1), 2) and 3); when the merge block is full, append it to disk, then resume reading unmerged leaf node data from the memory-resident component C₀ and the disk-resident component C₁;

5) once all leaf nodes of the memory-resident component C₀ and the disk-resident component C₁ have been merged, one merge pass is complete, and the update data in C₀ has been merged onto disk;

Through the above merge steps, updates to data on the SSD are first written to the in-memory structure and then written back to the SSD using a level-by-level merge strategy; during write-back, the data on the SSD is not modified in place; instead, the merge process generates the updated data and appends it to a new file, and the old file is deleted once the merge is complete;

After the pLSM method has been applied, the SSD holds multiple data files; a lookup therefore first searches the memory-resident structure C₀; if the data is not found there, the on-disk structures are searched level by level, from the disk-resident C₁ and C₂ up to C_k, until the data is found;

During a lookup, if the same data item appears in several structures at once, the pLSM method ensures that every read returns the latest, correct data; if the required data is not found in memory, it is read from disk;

When searching the index structure on the SSD, key comparison is replaced by a test of whether the data being sought lies within an interval; if the interval information containing the sought data is found, its offset in the data file is read, the data file is accessed at that file offset, and the value corresponding to the sought key is obtained; the entire interval data touched by an access is cached in memory, forming a buffer; when the next access to data in the same interval arrives, the operation is carried out directly in memory; the buffer is organized as a doubly linked list, and infrequently used buffered data is evicted using the LRU method.