CN104573089A - Incremental snapshot method in NewSQL database - Google Patents

Incremental snapshot method in NewSQL database

Info

Publication number
CN104573089A
Authority
CN
China
Prior art keywords
hash value
data block
data
snapshot
tail
Prior art date
Legal status
Pending
Application number
CN201510046499.4A
Other languages
Chinese (zh)
Inventor
董小社
王龙翔
张兴军
魏晓林
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date
2015-01-29
Filing date
2015-01-29
Publication date
2015-04-29
Application filed by Xian Jiaotong University
Priority to CN201510046499.4A
Publication of CN104573089A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1737 Details of further file system functions for reducing power consumption or coping with limited storage space, e.g. in mobile devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables

Abstract

The invention discloses an incremental snapshot method in a NewSQL database that allows snapshot data to be written quickly. First, after a snapshot starts, a snapshot engine serializes the data in the current in-memory database, divides it into a number of data blocks, and writes the blocks one after another into a ring buffer, while a deduplication engine continuously reads blocks from the ring buffer and computes the hash value of each block. Second, the hash fingerprint index table is searched, blocks whose hash values are already present in the table are discarded as duplicates, and the hash value of every block is written to a summary file. The method writes snapshot data quickly and prevents the disk space consumed by snapshot data from growing.

Description

Incremental snapshot method in a NewSQL database
Technical field
The invention belongs to the field of database technology and relates to a snapshot method, specifically an incremental snapshot method in a NewSQL database.
Background
NewSQL databases are a class of advanced database systems that offer scalability comparable to NoSQL databases while still supporting traditional database features such as ACID transactions and SQL. Their main application scenario is online transaction processing (OLTP). Existing snapshot methods, however, write the same snapshot data repeatedly. These repeated writes increase the time needed to write snapshot data and make the external storage space consumed by snapshot data keep growing.
Summary of the invention
The object of the invention is to overcome the above shortcoming of the prior art by providing an incremental snapshot method in a NewSQL database that writes snapshot data quickly and prevents the external storage space consumed by snapshot data from growing.
To achieve this object, the incremental snapshot method in a NewSQL database of the present invention comprises the following steps:
1) A snapshot engine serializes the data in the current in-memory database, divides the serialized data into a number of data blocks, and writes the blocks one after another into a ring buffer. After the snapshot starts, a deduplication engine continuously reads data blocks from the ring buffer and computes the hash value of each block;
2) The hash fingerprint index table is searched to determine, for each data block obtained in step 1), whether its hash value is already present in the table. If a block's hash value is present, the block is marked as a duplicate data block and only its hash value is written to the summary file. If a block's hash value is not present, the block is marked as a new data block: it is written to the data block storage engine, its hash value and its offset address in the data file are written to the hash fingerprint index table, and its hash value is written to the summary file.
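Steps 1) and 2) can be sketched in Python as below. This is a minimal, single-threaded illustration under stated assumptions, not the patented implementation: serialization is taken as already done (the input is a `bytes` object), blocks are cut at a hypothetical fixed size `BLOCK_SIZE`, a plain `dict` stands in for the hash fingerprint index table, and `io.BytesIO` objects stand in for the data file of the data block storage engine and for the summary file. The names `split_fixed` and `take_snapshot` are illustrative only.

```python
import hashlib
import io

BLOCK_SIZE = 4096  # hypothetical fixed block size; the patent does not specify one


def split_fixed(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split serialized data into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def take_snapshot(serialized: bytes,
                  index: dict[bytes, int],
                  data_file: io.BytesIO,
                  block_size: int = BLOCK_SIZE) -> bytes:
    """Deduplicate one snapshot and return the contents of its summary file.

    `index` maps a SHA-1 digest to the block's offset in `data_file`. It is
    shared across snapshots, so blocks stored by an earlier snapshot are
    never written again.
    """
    summary = io.BytesIO()
    for block in split_fixed(serialized, block_size):
        digest = hashlib.sha1(block).digest()   # 20-byte fingerprint
        if digest not in index:                 # new data block
            data_file.seek(0, io.SEEK_END)
            index[digest] = data_file.tell()    # record its offset
            data_file.write(block)              # store the block once
        # duplicate or new, the digest always goes into the summary file
        summary.write(digest)
    return summary.getvalue()
```

Because the index is shared across snapshots, a second snapshot that differs from the first in only one block appends just that block to the data file, while its summary file still lists one 20-byte digest per block.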
Let the head pointer of the ring buffer be head, the tail pointer be tail, and the length of the ring buffer be length. Then:
When head equals tail, the ring buffer is empty; when head is not equal to tail, the ring buffer is non-empty;
When (head+1) % length equals tail, the ring buffer is full; when (head+1) % length is not equal to tail, the ring buffer is not full;
The deduplication engine reads a data block from the ring buffer with the get method: it reads the data in the memory cell pointed to by tail and then advances the tail pointer as tail = (tail+1) % length;
Data blocks are written into the ring buffer with the put method: each block is written into the memory cell pointed to by head, after which the head pointer is advanced as head = (head+1) % length.
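The head/tail rules above describe the classic ring buffer that keeps one slot unused. A Python sketch follows; the class name `RingBuffer` and the choice to return `False`/`None` instead of blocking when the buffer is full or empty are illustrative assumptions, not details from the patent.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class RingBuffer(Generic[T]):
    """Ring buffer following the head/tail rules described above.

    One slot is always left unused so that "empty" (head == tail) and
    "full" ((head + 1) % length == tail) can be told apart; a buffer of
    `length` slots therefore holds at most length - 1 blocks.
    """

    def __init__(self, length: int) -> None:
        if length < 2:
            raise ValueError("length must be at least 2")
        self.length = length
        self.slots: list[Optional[T]] = [None] * length  # allocated once
        self.head = 0  # next slot to write (put)
        self.tail = 0  # next slot to read (get)

    def is_empty(self) -> bool:
        return self.head == self.tail

    def is_full(self) -> bool:
        return (self.head + 1) % self.length == self.tail

    def put(self, block: T) -> bool:
        """Write a block at head, then advance head; return False if full."""
        if self.is_full():
            return False
        self.slots[self.head] = block
        self.head = (self.head + 1) % self.length
        return True

    def get(self) -> Optional[T]:
        """Read the block at tail, then advance tail; return None if empty."""
        if self.is_empty():
            return None
        block = self.slots[self.tail]
        self.slots[self.tail] = None  # drop the reference to the consumed block
        self.tail = (self.tail + 1) % self.length
        return block
```

Leaving one slot empty is what lets head == tail mean "empty" without a separate element counter; the cost is that a buffer of `length` slots holds at most `length - 1` blocks.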
Adopt the method for fixed block or the method for variable partitioned blocks that serialized data are divided into data block;
Hash value is searched in fingerprint index table and is preserved hash value corresponding to data block and this data block offset address in the data file;
Hash value corresponding to each data block is calculated by SHA-1 hash function.
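The two partitioning options and the SHA-1 fingerprint can be illustrated as follows. The patent does not say which variable-size algorithm it uses; this sketch assumes content-defined chunking with a gear-style rolling hash, and the table seed and the `mask`, `min_size` and `max_size` values are hypothetical parameters chosen only for illustration.

```python
import hashlib
import random

# Hypothetical gear table: 256 pseudo-random 32-bit values, seeded so that
# chunk boundaries are reproducible from run to run.
_rng = random.Random(2015)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def fixed_chunks(data: bytes, size: int) -> list[bytes]:
    """Fixed-size partitioning: cut every `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes, mask: int = 0x3FF,
                    min_size: int = 256, max_size: int = 8192) -> list[bytes]:
    """Variable-size (content-defined) partitioning with a gear rolling hash.

    A boundary is placed where the bits of the rolling hash selected by
    `mask` are all zero, so boundaries follow the content rather than fixed
    offsets. `min_size` and `max_size` bound the resulting chunk lengths.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def fingerprint(block: bytes) -> bytes:
    """SHA-1 fingerprint of a data block (always 20 bytes)."""
    return hashlib.sha1(block).digest()
```

With fixed-size blocks, inserting a single byte shifts every later boundary, so most later blocks get new fingerprints. Content-defined boundaries are chosen from the data itself, so after an insertion they usually fall back into step and later blocks keep their fingerprints and still deduplicate.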
When data needs to be restored, the snapshot checkpoint to return to is selected and the corresponding summary file is read. Hash values are then read from the summary file one at a time. For each hash value, the corresponding offset address is looked up in the hash fingerprint index table, and the data block is read from the data block storage engine using that offset address and hash value and reloaded into memory.
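The recovery procedure can be sketched as below, assuming the summary file is a plain concatenation of 20-byte SHA-1 digests and that the index records a block length alongside the offset. The patent mentions only the offset; the length is an added assumption needed once blocks can vary in size. Using the hash value to verify each block after reading it is one interpretation of "according to offset address and hash value". The function name `restore_snapshot` is hypothetical.

```python
import hashlib
import io

DIGEST_SIZE = 20  # SHA-1 digest length in bytes


def restore_snapshot(summary: bytes,
                     index: dict[bytes, tuple[int, int]],
                     data_file: io.BytesIO) -> bytes:
    """Rebuild the serialized data of one snapshot from its summary file.

    `index` maps a digest to (offset, length) in `data_file`.
    """
    if len(summary) % DIGEST_SIZE:
        raise ValueError("summary file is not a whole number of digests")
    out = bytearray()
    for pos in range(0, len(summary), DIGEST_SIZE):
        digest = summary[pos:pos + DIGEST_SIZE]      # next hash value
        offset, length = index[digest]               # look up its offset
        data_file.seek(offset)
        block = data_file.read(length)               # read the stored block
        if hashlib.sha1(block).digest() != digest:   # verify against the hash
            raise ValueError(f"corrupt block at offset {offset}")
        out += block
    return bytes(out)
```

The result is the serialized in-memory data, ready to be deserialized and reloaded; a duplicate block that appears several times in the summary file is simply read several times from the same offset.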
The present invention has the following beneficial effects:
When the incremental snapshot method in a NewSQL database of the present invention is used, the data in the in-memory database is first serialized and divided into data blocks, which are written one after another into a ring buffer. The deduplication engine then reads each data block from the ring buffer, computes its hash value, and uses that hash value to search the hash fingerprint index table so that duplicate data blocks are discarded. This avoids writing the same snapshot data repeatedly, shortens the time needed to write snapshot data, and keeps the external storage space consumed by snapshot data from growing. In addition, because data blocks are first written into the ring buffer, the performance cost of repeatedly allocating and releasing memory is avoided, which further reduces snapshot time.
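To illustrate how the ring buffer decouples the two engines, here is a hedged Python sketch in which a snapshot-engine thread and a deduplication-engine thread share one preallocated ring buffer. The `threading.Condition`-based blocking, the `None` end-of-snapshot marker, and all names are assumptions made for the example; in CPython it shows the structure of the pipeline rather than its performance.

```python
import hashlib
import threading


class BlockingRingBuffer:
    """Preallocated ring buffer shared by one producer and one consumer.

    Uses the same head/tail rules as the method, plus a condition variable
    so the producer waits while the buffer is full and the consumer waits
    while it is empty.
    """

    def __init__(self, length: int) -> None:
        self.slots = [None] * length  # allocated once, reused for every block
        self.length = length
        self.head = self.tail = 0
        self.cond = threading.Condition()

    def put(self, block) -> None:
        with self.cond:
            while (self.head + 1) % self.length == self.tail:  # full
                self.cond.wait()
            self.slots[self.head] = block
            self.head = (self.head + 1) % self.length
            self.cond.notify_all()

    def get(self):
        with self.cond:
            while self.head == self.tail:  # empty
                self.cond.wait()
            block = self.slots[self.tail]
            self.slots[self.tail] = None
            self.tail = (self.tail + 1) % self.length
            self.cond.notify_all()
            return block


def run_snapshot(blocks: list[bytes], length: int = 8) -> list[bytes]:
    """Snapshot-engine thread puts blocks; dedup-engine thread hashes them."""
    ring = BlockingRingBuffer(length)
    digests: list[bytes] = []
    END = None  # hypothetical end-of-snapshot marker

    def snapshot_engine() -> None:
        for block in blocks:
            ring.put(block)
        ring.put(END)

    def dedup_engine() -> None:
        while (block := ring.get()) is not END:
            digests.append(hashlib.sha1(block).digest())

    threads = [threading.Thread(target=snapshot_engine),
               threading.Thread(target=dedup_engine)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return digests
```

Because there is exactly one producer and one consumer, blocks leave the buffer in the order they entered, so the digests come out in snapshot order even though the two engines run concurrently.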
Brief description of the drawings
Fig. 1 is a schematic diagram of the present invention;
Fig. 2 is a flow chart of the present invention;
Fig. 3 is a flow chart of data recovery in the present invention.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings:
Referring to Fig. 1 and Fig. 2, the incremental snapshot method in a NewSQL database of the present invention comprises the following steps:
1) A snapshot engine serializes the data in the current in-memory database, divides the serialized data into a number of data blocks, and writes the blocks one after another into a ring buffer. After the snapshot starts, a deduplication engine continuously reads data blocks from the ring buffer and computes the hash value of each block;
2) The hash fingerprint index table is searched to determine, for each data block obtained in step 1), whether its hash value is already present in the table. If a block's hash value is present, the block is marked as a duplicate data block and only its hash value is written to the summary file. If a block's hash value is not present, the block is marked as a new data block: it is written to the data block storage engine, its hash value and its offset address in the data file are written to the hash fingerprint index table, and its hash value is written to the summary file.
It should be noted that, with the head pointer of the ring buffer denoted head, the tail pointer denoted tail, and the length of the ring buffer denoted length: when head equals tail, the ring buffer is empty, and when head is not equal to tail, it is non-empty; when (head+1) % length equals tail, the ring buffer is full, and otherwise it is not full; the deduplication engine reads a data block from the ring buffer with the get method, reading the data in the memory cell pointed to by tail and then advancing the tail pointer as tail = (tail+1) % length; data blocks are written into the ring buffer with the put method, each block being written into the memory cell pointed to by head, after which the head pointer is advanced as head = (head+1) % length.
Adopt the method for fixed block or the method for variable partitioned blocks that serialized data are divided into data block; Hash value is searched in fingerprint index table and is preserved hash value corresponding to data block and this data block offset address in the data file; Hash value corresponding to each data block is calculated by SHA-1 hash function.
Referring to Fig. 3, when data needs to be restored, the snapshot checkpoint to return to is selected and the corresponding summary file is read. Hash values are then read from the summary file one at a time. For each hash value, the corresponding offset address is looked up in the hash fingerprint index table, and the data block is read from the data block storage engine using that offset address and hash value and reloaded into memory.

Claims (6)

1. An incremental snapshot method in a NewSQL database, characterized by comprising the following steps:
1) a snapshot engine serializes the data in the current in-memory database, divides the serialized data into a number of data blocks, and writes the data blocks one after another into a ring buffer; after the snapshot starts, a deduplication engine continuously reads data blocks from the ring buffer and computes the hash value corresponding to each data block;
2) the hash fingerprint index table is searched to determine, for each data block obtained in step 1), whether its hash value is present in the table; when the hash value of a data block is present in the hash fingerprint index table, the data block is marked as a duplicate data block and its hash value is written to the summary file; when the hash value of a data block is not present in the hash fingerprint index table, the data block is marked as a new data block, the new data block is written to the data block storage engine, the hash value of the new data block and the offset address of the new data block in the data file are written to the hash fingerprint index table, and the hash value of the new data block is written to the summary file.
2. The incremental snapshot method in a NewSQL database according to claim 1, characterized in that, with the head pointer of the ring buffer denoted head, the tail pointer denoted tail, and the length of the ring buffer denoted length:
when head equals tail, the ring buffer is empty, and when head is not equal to tail, the ring buffer is non-empty;
when (head+1) % length equals tail, the ring buffer is full, and when (head+1) % length is not equal to tail, the ring buffer is not full;
the deduplication engine reads a data block from the ring buffer with the get method, reading the data in the memory cell pointed to by tail and then advancing the tail pointer as tail = (tail+1) % length;
data blocks are written into the ring buffer with the put method, each data block being written into the memory cell pointed to by head, after which the head pointer is advanced as head = (head+1) % length.
3. The incremental snapshot method in a NewSQL database according to claim 1, characterized in that the serialized data is divided into data blocks using fixed-size chunking or variable-size chunking.
4. The incremental snapshot method in a NewSQL database according to claim 1, characterized in that the hash fingerprint index table stores the hash value of each data block and the offset address of that data block in the data file.
5. The incremental snapshot method in a NewSQL database according to claim 1, characterized in that the hash value of each data block is computed with the SHA-1 hash function.
6. The incremental snapshot method in a NewSQL database according to claim 1, characterized in that, when data needs to be restored, the snapshot checkpoint to return to is selected, the corresponding summary file is read according to that snapshot checkpoint, and hash values are read from the summary file one at a time; for each hash value read, the corresponding offset address is looked up in the hash fingerprint index table, and the data block is then read from the data block storage engine according to the offset address and the hash value and reloaded into memory.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510046499.4A CN104573089A (en) 2015-01-29 2015-01-29 Incremental snapshot method in NewSQL database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510046499.4A CN104573089A (en) 2015-01-29 2015-01-29 Incremental snapshot method in NewSQL database

Publications (1)

Publication Number Publication Date
CN104573089A 2015-04-29

Family

ID=53089151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510046499.4A Pending CN104573089A (en) 2015-01-29 2015-01-29 Incremental snapshot method in NewSQL database

Country Status (1)

Country Link
CN (1) CN104573089A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070255758A1 (en) * 2006-04-28 2007-11-01 Ling Zheng System and method for sampling based elimination of duplicate data
CN101582076A (en) * 2009-06-24 2009-11-18 浪潮电子信息产业股份有限公司 Data de-duplication method based on data base
CN102684827A (en) * 2012-03-02 2012-09-19 华为技术有限公司 Data processing method and data processing equipment
CN104077380A (en) * 2014-06-26 2014-10-01 深圳信息职业技术学院 Method and device for deleting duplicated data and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
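韩明峰 (Han Mingfeng): "Analysis and Implementation of Read/Write Operations on Ring Buffers" (环形缓冲区读写操作的分析与实现), Microcontrollers & Embedded Systems (单片机与嵌入式系统应用) *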

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108369487A (en) * 2015-11-25 2018-08-03 华睿泰科技有限责任公司 System and method for taking snapshots in a deduplicated virtual file system
CN108369487B (en) * 2015-11-25 2021-05-04 华睿泰科技有限责任公司 System and method for taking snapshots in a deduplicated virtual file system
CN109101369A (en) * 2018-08-21 2018-12-28 郑州云海信息技术有限公司 Sustainable protection method, system and device for business host data
CN109522160A (en) * 2018-11-29 2019-03-26 上海英方软件股份有限公司 Method and system for file directory comparison backup by saving file information digests
CN111125012A (en) * 2019-12-22 2020-05-08 浪潮(北京)电子信息产业有限公司 Snapshot generation method, device and equipment and readable storage medium
CN114943021A (en) * 2022-07-20 2022-08-26 之江实验室 TB-level incremental data screening method and device
US11789639B1 (en) 2022-07-20 2023-10-17 Zhejiang Lab Method and apparatus for screening TB-scale incremental data

Similar Documents

Publication Publication Date Title
US9851917B2 (en) Method for de-duplicating data and apparatus therefor
CN104573089A (en) Incremental snapshot method in NewSQL database
CN102663090B (en) Method and device for inquiry metadata
CN101777016B (en) Snapshot storage and data recovery method of continuous data protection system
CN103488709B (en) Index establishing method and system, and search method and system
CN101777017B (en) Rapid recovery method of continuous data protection system
CN103324699B (en) Rapid data de-duplication method adapting to large market demand
CN102467572B (en) Data block inquiring method for supporting data de-duplication program
CN104090987A (en) Historical data storage and indexing method
CN107135662B (en) Differential data backup method, storage system and differential data backup device
CN101882141A (en) Method and system for implementing repeated data deletion
CN103559027A (en) Design method of separate-storage type key-value storage system
CN102799598A (en) Data recovery method for deleting repeated data
CN101582076A (en) Data de-duplication method based on data base
CN103229164B (en) Data access method and device
GB2476536A (en) Modified B+ tree to map logical addresses to physical addresses in NAND flash memory
JP2016157441A (en) System and method for copy on write on ssd
CN103152430B (en) Cloud storage method for reducing the storage space occupied by data
KR20160084370A (en) Controller, flash memory apparatus, method for identifying data block stability, and method for storing data in flash memory apparatus
CN102722450B (en) Storage method for redundancy deletion block device based on location-sensitive hash
CN103488727A (en) Two-dimensional time-series data storage and query method based on periodic logs
CN103810246A (en) Index building method and device and index query method and device
CN103473258A (en) Cloud storage file system
CN106372002B (en) Data storage method and read-recovery method
CN107506466A (en) Small-file storage method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150429