CN104573089A - Incremental snapshot method in NewSQL database - Google Patents
Incremental snapshot method in NewSQL database
- Publication number
- CN104573089A (application number CN201510046499.4A)
- Authority
- CN
- China
- Prior art keywords
- hash value
- data block
- data
- snapshot
- tail
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/1737—Details of further file system functions for reducing power consumption or coping with limited storage space, e.g. in mobile devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2255—Hash tables
Abstract
The invention discloses an incremental snapshot method in a NewSQL database that writes snapshot data quickly. First, a snapshot engine serializes the data in the current in-memory database, divides it into a plurality of data blocks, and writes the data blocks sequentially into a ring buffer; once the snapshot has begun, a deduplication engine continuously reads the data blocks from the ring buffer and calculates the hash value of each data block. Second, each hash value is looked up in a fingerprint index table; data blocks whose hash values are already present in the table are discarded as duplicates, and the hash value of every data block is written to a digest file. The method allows snapshot data to be written quickly and avoids growth in the disk space consumed by snapshot data.
Description
Technical field
The invention belongs to the field of database technology and relates to a snapshot method, in particular to an incremental snapshot method in a NewSQL database.
Background art
A NewSQL database is a class of advanced database system that offers scalability comparable to that of NoSQL databases while still supporting features of traditional databases such as ACID transactions and the SQL language. The main application scenario of NewSQL databases is online transaction processing (OLTP). Existing snapshot methods, however, write the same snapshot data repeatedly; this repeated writing increases the time needed to write snapshot data and causes the external storage space consumed by snapshot data to grow.
Summary of the invention
The object of the invention is to overcome the above shortcomings of the prior art by providing an incremental snapshot method in a NewSQL database that writes snapshot data quickly and avoids growth in the external storage space consumed by snapshot data.
To achieve the above object, the incremental snapshot method in a NewSQL database according to the invention comprises the following steps:
1) A snapshot engine serializes the data in the current in-memory database, divides it into a number of data blocks, and writes each data block into a ring buffer in turn. After the snapshot starts, a deduplication engine continuously reads data blocks from the ring buffer and calculates the hash value of each data block;
2) The fingerprint index table is searched to determine, for each data block, whether the hash value obtained in step 1) is already present in the table. When the hash value of a data block is present in the fingerprint index table, the data block is marked as a duplicate data block and its hash value is written to the digest file. When the hash value of a data block is not present in the fingerprint index table, the data block is marked as a new data block and written to the block storage engine; at the same time, the hash value of the new data block and the offset address of the block in the data file are written to the fingerprint index table, and the hash value of the new data block is written to the digest file.
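Step 2) can be illustrated with a minimal Python sketch. It is not the patent's implementation: the class name `DedupStore`, the in-memory `bytearray` standing in for the data file, and the dictionary standing in for the fingerprint index table are all assumptions made for illustration. The index also records each block's length alongside its offset, which the patent does not mention but which a reader needs if blocks can differ in size.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DedupStore:
    """Toy stand-ins for the block storage engine and the fingerprint index table."""
    data_file: bytearray = field(default_factory=bytearray)  # hypothetical block store
    index: dict = field(default_factory=dict)                # hash -> (offset, length)

    def snapshot(self, blocks):
        """Process the blocks of one snapshot and return its digest (list of hashes)."""
        digest = []
        for block in blocks:
            h = hashlib.sha1(block).hexdigest()
            if h not in self.index:
                # New block: append it to the data file and record where it lives.
                self.index[h] = (len(self.data_file), len(block))
                self.data_file.extend(block)
            # Duplicate or new, the hash is always written to the digest.
            digest.append(h)
        return digest
```

With this sketch, a second snapshot that shares blocks with the first only appends the blocks it has not seen before, while its digest still lists every block in order.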
Let the head pointer of the ring buffer be head, the tail pointer be tail, and the length of the ring buffer be length. Then:
When head equals tail, the ring buffer is empty; when head is not equal to tail, the ring buffer is non-empty;
When (head+1)%length equals tail, the ring buffer is full; when (head+1)%length is not equal to tail, the ring buffer is not full;
The deduplication engine reads data blocks from the ring buffer with the Get method: it reads the data in the memory unit pointed to by tail and then updates the tail pointer to tail=(tail+1)%length;
Data blocks are written into the ring buffer with the put method: each data block is written into the memory unit pointed to by head, and the head pointer is then updated to head=(head+1)%length.
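The pointer rules above can be sketched as a small Python class. This is a single-threaded illustration only; the class name `RingBuffer` and the choice to return `False` or `None` instead of blocking are assumptions, and a real snapshot engine running alongside a deduplication engine would need synchronization that is not shown here.

```python
class RingBuffer:
    """Circular buffer using the head/tail rules described in the text."""

    def __init__(self, length):
        if length < 2:
            raise ValueError("length must be at least 2")
        self.length = length
        self.slots = [None] * length
        self.head = 0  # index of the next slot to write
        self.tail = 0  # index of the next slot to read

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        return (self.head + 1) % self.length == self.tail

    def put(self, block):
        """Write a block at head, then advance head. Returns False when full."""
        if self.is_full():
            return False
        self.slots[self.head] = block
        self.head = (self.head + 1) % self.length
        return True

    def get(self):
        """Read the block at tail, then advance tail. Returns None when empty."""
        if self.is_empty():
            return None
        block = self.slots[self.tail]
        self.slots[self.tail] = None
        self.tail = (self.tail + 1) % self.length
        return block
```

Because the full test is (head+1)%length == tail, one slot always stays unused, so a buffer of the given length holds at most length-1 blocks. That is the price of distinguishing "full" from "empty" without a separate counter.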
Adopt the method for fixed block or the method for variable partitioned blocks that serialized data are divided into data block;
Hash value is searched in fingerprint index table and is preserved hash value corresponding to data block and this data block offset address in the data file;
Hash value corresponding to each data block is calculated by SHA-1 hash function.
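As a rough illustration of the fixed-size option and the SHA-1 fingerprint, the following Python sketch uses the standard-library `hashlib` module. The function names `fixed_size_blocks` and `fingerprint` and the block size used in the example are hypothetical; the patent does not state a block size.

```python
import hashlib


def fixed_size_blocks(data: bytes, block_size: int):
    """Yield consecutive blocks of block_size bytes; the last one may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(data), block_size):
        yield data[start:start + block_size]


def fingerprint(block: bytes) -> str:
    """Return the SHA-1 hash of a block as a 40-character hex string."""
    return hashlib.sha1(block).hexdigest()
```

For example, `list(fixed_size_blocks(b"abcdefghij", 4))` gives three blocks, `b"abcd"`, `b"efgh"` and `b"ij"`, each of which would then be fingerprinted separately.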
When data recovery is required, the snapshot checkpoint to restore is selected and the corresponding digest file is read. Hash values are then read from the digest file one by one; for each hash value, the corresponding offset address is looked up in the fingerprint index table, the data block is read from the block storage engine according to that offset address and hash value, and the block is reloaded into memory.
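The recovery path can be sketched in Python as follows. The function name `restore`, the `(offset, length)` index entries, and the integrity check that re-hashes each block are assumptions added for illustration; the patent only says that the block is read according to the offset address and hash value.

```python
import hashlib


def restore(digest, index, data_file: bytes) -> bytes:
    """Rebuild serialized snapshot data from a digest (ordered list of hashes).

    index maps hash -> (offset, length) in data_file (hypothetical layout).
    """
    out = bytearray()
    for h in digest:
        offset, length = index[h]
        block = data_file[offset:offset + length]
        # Assumed sanity check: the stored block must still match its hash.
        if hashlib.sha1(block).hexdigest() != h:
            raise ValueError(f"block at offset {offset} does not match hash {h}")
        out += block
    return bytes(out)
```

Since the digest lists a hash for every block, including duplicates, replaying it in order reproduces the original serialized stream even though each distinct block is stored only once.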
The invention has the following beneficial effects:
In use, the incremental snapshot method in a NewSQL database according to the invention first serializes the data in the in-memory database and divides it into a number of data blocks, then writes each data block into the ring buffer in turn. The deduplication engine reads each data block from the ring buffer, calculates its hash value, looks the hash value up in the fingerprint index table, and discards duplicate data blocks. This avoids writing snapshot data repeatedly, reduces the time needed to write snapshot data, and avoids growth in the external storage space consumed by snapshot data. In addition, writing the data blocks into a ring buffer first avoids the performance overhead of repeatedly allocating and freeing memory, which shortens the snapshot time.
Brief description of the drawings
Fig. 1 is a schematic diagram of the invention;
Fig. 2 is a flowchart of the invention;
Fig. 3 is a flowchart of data recovery in the invention.
Embodiment
The invention is described in further detail below with reference to the accompanying drawings:
With reference to Fig. 1 and Fig. 2, the incremental snapshot method in a NewSQL database according to the invention comprises the following steps:
1) A snapshot engine serializes the data in the current in-memory database, divides it into a number of data blocks, and writes each data block into a ring buffer in turn. After the snapshot starts, a deduplication engine continuously reads data blocks from the ring buffer and calculates the hash value of each data block;
2) The fingerprint index table is searched to determine, for each data block, whether the hash value obtained in step 1) is already present in the table. When the hash value of a data block is present in the fingerprint index table, the data block is marked as a duplicate data block and its hash value is written to the digest file. When the hash value of a data block is not present in the fingerprint index table, the data block is marked as a new data block and written to the block storage engine; at the same time, the hash value of the new data block and the offset address of the block in the data file are written to the fingerprint index table, and the hash value of the new data block is written to the digest file.
It should be noted that, with the head pointer of the ring buffer denoted head, the tail pointer denoted tail, and the length of the ring buffer denoted length: when head equals tail, the ring buffer is empty, and when head is not equal to tail, the ring buffer is non-empty; when (head+1)%length equals tail, the ring buffer is full, and when (head+1)%length is not equal to tail, the ring buffer is not full. The deduplication engine reads data blocks from the ring buffer with the Get method: it reads the data in the memory unit pointed to by tail and then updates the tail pointer to tail=(tail+1)%length. Data blocks are written into the ring buffer with the put method: each data block is written into the memory unit pointed to by head, and the head pointer is then updated to head=(head+1)%length.
Adopt the method for fixed block or the method for variable partitioned blocks that serialized data are divided into data block; Hash value is searched in fingerprint index table and is preserved hash value corresponding to data block and this data block offset address in the data file; Hash value corresponding to each data block is calculated by SHA-1 hash function.
With reference to Fig. 3, when data recovery is required, the snapshot checkpoint to restore is selected and the corresponding digest file is read. Hash values are then read from the digest file one by one; for each hash value, the corresponding offset address is looked up in the fingerprint index table, the data block is read from the block storage engine according to that offset address and hash value, and the block is reloaded into memory.
Claims (6)
1. An incremental snapshot method in a NewSQL database, characterized in that it comprises the following steps:
1) a snapshot engine serializes the data in the current in-memory database, divides it into a number of data blocks, and writes each data block into a ring buffer in turn; after the snapshot starts, a deduplication engine continuously reads data blocks from the ring buffer and calculates the hash value of each data block;
2) the fingerprint index table is searched to determine, for each data block, whether the hash value obtained in step 1) is already present in the table; when the hash value of a data block is present in the fingerprint index table, the data block is marked as a duplicate data block and its hash value is written to the digest file; when the hash value of a data block is not present in the fingerprint index table, the data block is marked as a new data block and written to the block storage engine, the hash value of the new data block and the offset address of the block in the data file are written to the fingerprint index table at the same time, and the hash value of the new data block is written to the digest file.
2. The incremental snapshot method in a NewSQL database according to claim 1, characterized in that, with the head pointer of the ring buffer denoted head, the tail pointer denoted tail, and the length of the ring buffer denoted length:
when head equals tail, the ring buffer is empty, and when head is not equal to tail, the ring buffer is non-empty;
when (head+1)%length equals tail, the ring buffer is full, and when (head+1)%length is not equal to tail, the ring buffer is not full;
the deduplication engine reads data blocks from the ring buffer with the Get method, reading the data in the memory unit pointed to by tail and then updating the tail pointer to tail=(tail+1)%length;
data blocks are written into the ring buffer with the put method, each data block being written into the memory unit pointed to by head and the head pointer then being updated to head=(head+1)%length.
3. The incremental snapshot method in a NewSQL database according to claim 1, characterized in that serialized data is divided into data blocks using either fixed-size chunking or variable-size chunking.
4. The incremental snapshot method in a NewSQL database according to claim 1, characterized in that the fingerprint index table stores the hash value of each data block and the offset address of that data block in the data file.
5. The incremental snapshot method in a NewSQL database according to claim 1, characterized in that the hash value of each data block is calculated with the SHA-1 hash function.
6. The incremental snapshot method in a NewSQL database according to claim 1, characterized in that, when data recovery is required, the snapshot checkpoint to restore is selected and the corresponding digest file is read; hash values are then read from the digest file one by one; for each hash value, the corresponding offset address is looked up in the fingerprint index table, the data block is read from the block storage engine according to that offset address and hash value, and the block is reloaded into memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510046499.4A CN104573089A (en) | 2015-01-29 | 2015-01-29 | Incremental snapshot method in NewSQL database |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510046499.4A CN104573089A (en) | 2015-01-29 | 2015-01-29 | Incremental snapshot method in NewSQL database |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104573089A true CN104573089A (en) | 2015-04-29 |
Family
ID=53089151
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510046499.4A Pending CN104573089A (en) | 2015-01-29 | 2015-01-29 | Incremental snapshot method in NewSQL database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104573089A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108369487A (en) * | 2015-11-25 | 2018-08-03 | 华睿泰科技有限责任公司 | System and method for shooting snapshot in duplicate removal Virtual File System |
CN109101369A (en) * | 2018-08-21 | 2018-12-28 | 郑州云海信息技术有限公司 | A kind of sustainable protection method, system and device of business host data |
CN109522160A (en) * | 2018-11-29 | 2019-03-26 | 上海英方软件股份有限公司 | Compare backup method and system by saving the file information abstract progress file directory |
CN111125012A (en) * | 2019-12-22 | 2020-05-08 | 浪潮(北京)电子信息产业有限公司 | Snapshot generation method, device and equipment and readable storage medium |
CN114943021A (en) * | 2022-07-20 | 2022-08-26 | 之江实验室 | TB-level incremental data screening method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070255758A1 (en) * | 2006-04-28 | 2007-11-01 | Ling Zheng | System and method for sampling based elimination of duplicate data |
CN101582076A (en) * | 2009-06-24 | 2009-11-18 | 浪潮电子信息产业股份有限公司 | Data de-duplication method based on data base |
CN102684827A (en) * | 2012-03-02 | 2012-09-19 | 华为技术有限公司 | Data processing method and data processing equipment |
CN104077380A (en) * | 2014-06-26 | 2014-10-01 | 深圳信息职业技术学院 | Method and device for deleting duplicated data and system |
-
2015
- 2015-01-29 CN CN201510046499.4A patent/CN104573089A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070255758A1 (en) * | 2006-04-28 | 2007-11-01 | Ling Zheng | System and method for sampling based elimination of duplicate data |
CN101582076A (en) * | 2009-06-24 | 2009-11-18 | 浪潮电子信息产业股份有限公司 | Data de-duplication method based on data base |
CN102684827A (en) * | 2012-03-02 | 2012-09-19 | 华为技术有限公司 | Data processing method and data processing equipment |
CN104077380A (en) * | 2014-06-26 | 2014-10-01 | 深圳信息职业技术学院 | Method and device for deleting duplicated data and system |
Non-Patent Citations (1)
Title |
---|
韩明峰: "环形缓冲区读写操作的分析与实现", 《单片机与嵌入式系统应用》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108369487A (en) * | 2015-11-25 | 2018-08-03 | 华睿泰科技有限责任公司 | System and method for shooting snapshot in duplicate removal Virtual File System |
CN108369487B (en) * | 2015-11-25 | 2021-05-04 | 华睿泰科技有限责任公司 | System and method for taking snapshots in a deduplicated virtual file system |
CN109101369A (en) * | 2018-08-21 | 2018-12-28 | 郑州云海信息技术有限公司 | A kind of sustainable protection method, system and device of business host data |
CN109522160A (en) * | 2018-11-29 | 2019-03-26 | 上海英方软件股份有限公司 | Compare backup method and system by saving the file information abstract progress file directory |
CN111125012A (en) * | 2019-12-22 | 2020-05-08 | 浪潮(北京)电子信息产业有限公司 | Snapshot generation method, device and equipment and readable storage medium |
CN114943021A (en) * | 2022-07-20 | 2022-08-26 | 之江实验室 | TB-level incremental data screening method and device |
US11789639B1 (en) | 2022-07-20 | 2023-10-17 | Zhejiang Lab | Method and apparatus for screening TB-scale incremental data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9851917B2 (en) | Method for de-duplicating data and apparatus therefor | |
CN104573089A (en) | Incremental snapshot method in NewSQL database | |
CN102663090B (en) | Method and device for inquiry metadata | |
CN101777016B (en) | Snapshot storage and data recovery method of continuous data protection system | |
CN103488709B (en) | A kind of index establishing method and system, search method and system | |
CN101777017B (en) | Rapid recovery method of continuous data protection system | |
CN103324699B (en) | A kind of rapid data de-duplication method adapting to large market demand | |
CN102467572B (en) | Data block inquiring method for supporting data de-duplication program | |
CN104090987A (en) | Historical data storage and indexing method | |
CN107135662B (en) | Differential data backup method, storage system and differential data backup device | |
CN101882141A (en) | Method and system for implementing repeated data deletion | |
CN103559027A (en) | Design method of separate-storage type key-value storage system | |
CN102799598A (en) | Data recovery method for deleting repeated data | |
CN101582076A (en) | Data de-duplication method based on data base | |
CN103229164B (en) | Data access method and device | |
GB2476536A (en) | Modified B+ tree to map logical addresses to physical addresses in NAND flash memory | |
JP2016157441A (en) | System and method for copy on write on ssd | |
CN103152430B (en) | A kind of reduce the cloud storage method that data take up room | |
KR20160084370A (en) | Controller, flash memory apparatus, method for identifying data block stability, and method for storing data in flash memory apparatus | |
CN102722450B (en) | Storage method for redundancy deletion block device based on location-sensitive hash | |
CN103488727A (en) | Two-dimensional time-series data storage and query method based on periodic logs | |
CN103810246A (en) | Index building method and device and index query method and device | |
CN103473258A (en) | Cloud storage file system | |
CN106372002B (en) | A kind of date storage method and read restoring method | |
CN107506466A (en) | A kind of small documents storage method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20150429 |