CN104573089A

CN104573089A - Incremental snapshot method in NewSQL database

Info

Publication number: CN104573089A
Application number: CN201510046499.4A
Authority: CN
Inventors: 董小社; 王龙翔; 张兴军; 魏晓林
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2015-01-29
Filing date: 2015-01-29
Publication date: 2015-04-29

Abstract

The invention discloses an incremental snapshot method for a NewSQL database that allows snapshot data to be written quickly. The method comprises the following steps. First, a snapshot engine serializes the data in the current in-memory database, divides it into a plurality of data blocks, and writes the data blocks sequentially into a ring buffer; after the snapshot begins, a data deduplication engine continuously reads data blocks from the ring buffer and computes the hash value of each block. Second, a hash fingerprint index table is searched, data blocks whose hash values already appear in the table are discarded as duplicates, and the hash value of every data block is written into a summary file. With this method, snapshot data can be written quickly, and growth in the disk space consumed by snapshot data is avoided.

Description

Incremental snapshot method in a NewSQL database

Technical field

The invention belongs to the field of database technology and relates to a snapshot method, in particular to an incremental snapshot method in a NewSQL database.

Background

NewSQL database: a NewSQL database is a class of advanced database system that offers scalability similar to that of NoSQL databases while still supporting traditional database features such as ACID transactions and the SQL language. The main application scenario for NewSQL databases is online transaction processing (OLTP). However, existing snapshot methods write duplicate snapshot data repeatedly; this repeated writing increases the write time of snapshot data and causes the disk space consumed by snapshot data to grow.

Summary of the invention

The object of the invention is to overcome the above shortcomings of the prior art by providing an incremental snapshot method in a NewSQL database that writes snapshot data quickly and avoids growth in the disk space consumed by snapshot data.

To achieve the above object, the incremental snapshot method in a NewSQL database of the present invention comprises the following steps:

1) The snapshot engine serializes the data in the current in-memory database, divides the serialized data into a number of data blocks, and writes the data blocks into a ring buffer one after another. After the snapshot starts, the data deduplication engine continuously reads data blocks from the ring buffer and computes the hash value of each data block;

2) The hash fingerprint index table is searched to determine, for each data block obtained in step 1), whether its hash value is present in the table. If the hash value of a data block is present in the table, the block is marked as a duplicate data block and its hash value is written into the summary file. If the hash value of a data block is not present in the table, the block is marked as a new data block: the new block is written into the data block storage engine, its hash value and its offset address in the data file are written into the hash fingerprint index table, and its hash value is written into the summary file.
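Steps 1) and 2) can be sketched as a single deduplication loop. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `incremental_snapshot` is hypothetical, a Python dict stands in for the hash fingerprint index table, a `bytearray` stands in for the data block storage engine's data file, and a list stands in for the summary file. The index records each block's length next to its offset; the patent lists only the offset, so the length is an added assumption that lets variable-size blocks be read back. The ring buffer between the two engines is omitted; the function simply consumes an iterable of already-chunked blocks.

```python
import hashlib


def incremental_snapshot(blocks, index, data_file, summary):
    """Deduplicate one snapshot's data blocks (steps 1 and 2, simplified).

    blocks:    iterable of bytes objects, already serialized and chunked
    index:     dict standing in for the hash fingerprint index table,
               mapping SHA-1 hex digest -> (offset, length) in data_file
               (storing the length is an assumption, not from the patent)
    data_file: bytearray standing in for the block storage engine's data file
    summary:   list standing in for this snapshot's summary file
    Returns the number of new (non-duplicate) blocks written.
    """
    new_blocks = 0
    for block in blocks:
        digest = hashlib.sha1(block).hexdigest()
        if digest not in index:
            # New data block: append it to the data file and index its offset.
            index[digest] = (len(data_file), len(block))
            data_file.extend(block)
            new_blocks += 1
        # Duplicate blocks are not stored again; every block's hash is
        # appended to the summary so the snapshot can be rebuilt in order.
        summary.append(digest)
    return new_blocks
```

Called first with `[b"aaaa", b"bbbb"]` and then with `[b"aaaa", b"cccc"]` against the same index and data file, the sketch writes two blocks and then one, leaving 12 bytes in the data file while each summary still lists two hashes.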

Let the head pointer of the ring buffer be head, the tail pointer be tail, and the length of the ring buffer be length. Then:

When head equals tail, the ring buffer is empty; when head does not equal tail, the ring buffer is non-empty;

Check whether (head + 1) % length equals tail: if it does, the ring buffer is full; if it does not, the ring buffer is not full;

The data deduplication engine reads data blocks from the ring buffer with a get method: it reads the data in the memory location pointed to by tail and then advances the tail pointer as tail = (tail + 1) % length;

Data blocks are written into the ring buffer with a put method: each data block is written into the memory location pointed to by head, and the head pointer is then advanced as head = (head + 1) % length.
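The head/tail rules above describe a classic ring buffer with one slot left unused to distinguish "full" from "empty". The sketch below follows those rules; the class name `RingBuffer` and the choice to return `False`/`None` instead of blocking when the buffer is full or empty are assumptions, since the text does not say how the snapshot engine and deduplication engine wait for each other. A real two-thread setup would also need synchronization (for example a lock and condition variable), which is omitted here.

```python
class RingBuffer:
    """Fixed-size ring buffer following the head/tail rules described above.

    head: next slot put() writes; tail: next slot get() reads.
    Empty when head == tail; full when (head + 1) % length == tail,
    so at most length - 1 blocks are held at once.
    """

    def __init__(self, length):
        self.length = length
        self.slots = [None] * length  # allocated once and reused
        self.head = 0
        self.tail = 0

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        return (self.head + 1) % self.length == self.tail

    def put(self, block):
        """Write block at head and advance head; return False if full."""
        if self.is_full():
            return False
        self.slots[self.head] = block
        self.head = (self.head + 1) % self.length
        return True

    def get(self):
        """Read the block at tail and advance tail; return None if empty."""
        if self.is_empty():
            return None
        block = self.slots[self.tail]
        self.slots[self.tail] = None  # drop the reference to the consumed block
        self.tail = (self.tail + 1) % self.length
        return block
```

Because of the full test, a buffer created with `length = 4` holds at most three blocks; a fourth `put` returns `False` until `get` frees a slot. Allocating the slot array once is what lets the method avoid repeated memory allocation and release during a snapshot.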

The serialized data is divided into data blocks by either fixed-size chunking or variable-size chunking;
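The two chunking options can be contrasted in a short sketch. Fixed-size chunking is straightforward. The text does not say which variable-size method is used, so the sketch swaps in gear-hash content-defined chunking, one common technique: cut points depend on the bytes themselves, so an insertion early in the data shifts only nearby boundaries instead of every later block. The gear table, `mask`, `min_size`, and `max_size` values are hypothetical parameters chosen for illustration.

```python
import random


def fixed_chunks(data, size):
    """Fixed-size chunking: cut every `size` bytes (last chunk may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


# Hypothetical gear table: 256 pseudo-random 32-bit values, seeded for repeatability.
_rng = random.Random(42)
_GEAR = [_rng.getrandbits(32) for _ in range(256)]


def variable_chunks(data, mask=0x3F, min_size=16, max_size=256):
    """Content-defined chunking with a gear rolling hash (assumed technique).

    A boundary is declared where the low bits selected by `mask` are all zero,
    subject to minimum and maximum chunk sizes.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shifting left drops the influence of bytes older than 32 positions.
        h = ((h << 1) + _GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Both functions return chunks that concatenate back to the input; the variable-size version trades a little CPU per byte for boundaries that survive insertions, which tends to raise the duplicate hit rate between successive snapshots.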

The hash fingerprint index table stores the hash value of each data block together with the block's offset address in the data file;
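One way to persist such an index table is as fixed-width binary records. The layout below is purely hypothetical: a 20-byte SHA-1 digest, an unsigned 64-bit offset, and an unsigned 32-bit block length (the length field is an assumption added so that variable-size blocks can be read back), packed little-endian with Python's `struct` module.

```python
import struct

# Hypothetical record layout: 20-byte digest, u64 offset, u32 length (32 bytes).
ENTRY = struct.Struct("<20sQI")


def pack_index(index):
    """Serialize {digest_bytes: (offset, length)} into a flat byte string."""
    return b"".join(ENTRY.pack(d, off, ln) for d, (off, ln) in index.items())


def unpack_index(raw):
    """Rebuild the in-memory dict from packed records."""
    return {d: (off, ln) for d, off, ln in ENTRY.iter_unpack(raw)}
```

With the `<` prefix `struct` adds no padding, so every record is exactly 32 bytes and the table file can be scanned or memory-mapped record by record.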

The hash value of each data block is computed with the SHA-1 hash function.
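Computing a block's SHA-1 fingerprint is a one-liner with Python's standard `hashlib` module; the wrapper name `block_fingerprint` is illustrative only. A SHA-1 digest is 20 bytes, or 40 hexadecimal characters.

```python
import hashlib


def block_fingerprint(block: bytes) -> str:
    """Return the SHA-1 fingerprint of a data block as 40 hex characters."""
    return hashlib.sha1(block).hexdigest()
```

For example, `block_fingerprint(b"abc")` returns `"a9993e364706816aba3e25717850c26c9cd0d89d"`, the standard FIPS 180 test vector. Identical blocks always yield identical fingerprints, which is what allows a fingerprint match to be treated as a duplicate without comparing block contents. Practical SHA-1 collisions were demonstrated in 2017, so a system that must resist deliberately crafted input could substitute `hashlib.sha256` without other structural changes.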

When data recovery is needed, the snapshot checkpoint to be restored is selected and the corresponding summary file is read. Hash values are then read from the summary file one by one; for each hash value, the corresponding offset address is looked up in the hash fingerprint index table, and the data block is read from the data block storage engine according to that offset address and hash value and reloaded into memory.
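The recovery procedure can be sketched as a loop over the summary file's hashes. As before, this is an illustration under assumptions rather than the patented code: `restore` is a hypothetical name, the index maps each hex digest to `(offset, length)` (the length is assumed), and the data file is an in-memory bytes-like object. The text says blocks are read "according to offset address and hash value"; here the hash is used to verify each block after reading, which is an interpretation rather than something the source states.

```python
import hashlib


def restore(summary, index, data_file):
    """Rebuild one checkpoint's serialized data from its summary file.

    summary:   list of SHA-1 hex digests in block order
    index:     dict mapping digest -> (offset, length) (length is assumed)
    data_file: bytes-like object standing in for the block storage engine
    """
    out = bytearray()
    for digest in summary:
        offset, length = index[digest]  # look up the offset in the index table
        block = bytes(data_file[offset:offset + length])
        # Assumed integrity check: the block read back must match its hash.
        if hashlib.sha1(block).hexdigest() != digest:
            raise ValueError(f"corrupt block at offset {offset}")
        out.extend(block)
    return bytes(out)  # the caller deserializes this back into memory
```

Because duplicate blocks share one stored copy, a summary such as `[h(aaaa), h(bbbb), h(aaaa)]` is rebuilt from only 8 stored bytes into the original 12-byte sequence.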

The present invention has the following beneficial effects:

When the incremental snapshot method in a NewSQL database of the present invention is used, the data in the in-memory database is first serialized and divided into a number of data blocks, which are written into the ring buffer one after another. The data deduplication engine then reads each data block from the ring buffer, computes its hash value, and searches the hash fingerprint index table with that value so that duplicate data blocks are discarded. This avoids writing snapshot data repeatedly, reduces the write time of snapshot data, and prevents growth in the disk space consumed by snapshot data. In addition, staging the data blocks in the ring buffer avoids the performance cost of repeatedly allocating and releasing memory, which reduces snapshot time.

Brief description of the drawings

Fig. 1 is a schematic diagram of the present invention;

Fig. 2 is a flow chart of the present invention;

Fig. 3 is a flow chart of data recovery in the present invention.

Embodiment

The present invention is described in further detail below with reference to the accompanying drawings:

Referring to Fig. 1 and Fig. 2, the incremental snapshot method in a NewSQL database of the present invention comprises the following steps:

It should be noted that, if the head pointer of the ring buffer is head, the tail pointer is tail, and the length of the ring buffer is length, then: when head equals tail, the ring buffer is empty, and when head does not equal tail, it is non-empty; when (head + 1) % length equals tail, the ring buffer is full, and otherwise it is not full; the data deduplication engine reads a data block from the ring buffer with the get method by reading the data in the memory location pointed to by tail and then setting tail = (tail + 1) % length; and the put method writes a data block into the ring buffer by writing it into the memory location pointed to by head and then setting head = (head + 1) % length.

The serialized data is divided into data blocks by fixed-size or variable-size chunking; the hash fingerprint index table stores the hash value of each data block together with the block's offset address in the data file; and the hash value of each data block is computed with the SHA-1 hash function.

Referring to Fig. 3, when data recovery is needed, the snapshot checkpoint to be restored is selected and the corresponding summary file is read. Hash values are then read from the summary file one by one; for each hash value, the corresponding offset address is looked up in the hash fingerprint index table, and the data block is read from the data block storage engine according to that offset address and hash value and reloaded into memory.

Claims

1. An incremental snapshot method in a NewSQL database, characterized by comprising the following steps:

2. The incremental snapshot method in a NewSQL database according to claim 1, characterized in that, letting the head pointer of the ring buffer be head, the tail pointer be tail, and the length of the ring buffer be length, then:

3. The incremental snapshot method in a NewSQL database according to claim 1, characterized in that the serialized data is divided into data blocks by fixed-size chunking or variable-size chunking.

4. The incremental snapshot method in a NewSQL database according to claim 1, characterized in that the hash fingerprint index table stores the hash value of each data block and the block's offset address in the data file.

5. The incremental snapshot method in a NewSQL database according to claim 1, characterized in that the hash value of each data block is computed with the SHA-1 hash function.

6. The incremental snapshot method in a NewSQL database according to claim 1, characterized in that, when data recovery is needed, the snapshot checkpoint to be restored is selected and the corresponding summary file is read; hash values are then read from the summary file one by one, and for each hash value the corresponding offset address is looked up in the hash fingerprint index table, after which the data block is read from the data block storage engine according to that offset address and hash value and reloaded into memory.