CN104239542B - System and method for capturing data from an open-source distributed database - Google Patents


Info

Publication number
CN104239542B
Authority
CN
China
Prior art keywords
file
saving records
data
region server
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410488046.2A
Other languages
Chinese (zh)
Other versions
CN104239542A (en)
Inventor
孙志云
郭美思
吴楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201410488046.2A
Publication of CN104239542A
Application granted
Publication of CN104239542B

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/273 Asynchronous replication or reconciliation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/183 Provision of network file services by network file servers, e.g. by using NFS, CIFS
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The present invention discloses a system and method for capturing data from an open-source distributed database. The method includes: when a region server starts saving records, flushing the data in its memory cache to files of the distributed file system; then creating, for every distributed file system file to be recorded, a reference file that links to it; and storing the reference files in a folder marked as belonging to that server. With this way of saving records, users can capture data from the open-source distributed database at different stages and can therefore safely use the recorded data of the open-source distributed database.

Description

A system and method for capturing data from an open-source distributed database
Technical Field
The present invention relates to database technology for distributed computer cluster systems, and in particular to a system and method for capturing data from an open-source distributed database in a distributed cluster system.
Background
With the explosive growth of data volume, the computing and storage capacity of a single computer falls far short of data storage and processing requirements. Distributed computer architectures have therefore attracted users' attention and approval. In a distributed architecture, many inexpensive computers can be built into a distributed cluster system, so that each machine runs its own tasks while user requests are handled in parallel. Distributed cluster systems are characterized by high performance, high reliability, high scalability and low cost. HBase is an open-source distributed database for distributed cluster systems. With HBase, a large-scale structured storage cluster can be built on inexpensive servers; it offers very high data throughput and good scalability, can handle both structured and unstructured data, and supplements the distributed file system (HDFS) with real-time random reads and writes. Capturing HBase data periodically is therefore very important.
In a traditional database, a method for capturing data must take transaction consistency into account. The database guarantees transaction consistency through logging: an operation is marked complete only after all of its transactions have been committed. If an error occurs during this process, the transactions in the current system can be rolled back using the log. When capturing data from a traditional database, it must therefore be ensured that the saved records of each database are consistent with the source database. Saved records of captured data are typically stored using copy-on-write.
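The copy-on-write approach mentioned above can be illustrated with a minimal sketch. It is not taken from the patent or from any particular database; the class `CowTable` and its methods are hypothetical names chosen for illustration. The idea shown is that saving a record copies nothing up front, and an old value is preserved only at the moment it is about to be overwritten.

```python
_MISSING = object()  # sentinel: the key did not exist when the record was saved


class CowTable:
    """Toy key-value table whose saved records share data until a write occurs."""

    def __init__(self):
        self._live = {}        # current contents
        self._snapshots = []   # one overlay of preserved old values per saved record

    def snapshot(self):
        # Saving a record copies nothing; it only opens an empty overlay.
        self._snapshots.append({})
        return len(self._snapshots) - 1

    def put(self, key, value):
        # Before overwriting, preserve the old value for every saved record
        # that has not yet kept its own copy of this key.
        for overlay in self._snapshots:
            if key not in overlay:
                overlay[key] = self._live.get(key, _MISSING)
        self._live[key] = value

    def read_snapshot(self, snap_id, key):
        # Prefer the preserved value; otherwise the live value is still unchanged.
        overlay = self._snapshots[snap_id]
        value = overlay[key] if key in overlay else self._live.get(key, _MISSING)
        return None if value is _MISSING else value
```

A saved record taken before a write keeps returning the old value, while the live table moves on.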
In the HBase distributed database, merge operations and delete operations on an HBase table change its data. In some applications, users need the data of an HBase table as it was at each stage. The framework in which HBase stores data is shown in Fig. 1. A region server (HRegionServer) internally manages a series of region (HRegion) objects; each HRegion corresponds to one region of an HBase table and is made up of multiple stores (HStore). Each HStore corresponds to the storage of one column family (ColumnFamily) of the HBase table; each ColumnFamily is in effect a centralized storage unit. The HStore is the core of HBase storage and consists of two parts: a cache (MemStore) and files (StoreFile). The MemStore is a memory cache: data written by users goes into the MemStore first, and when the MemStore is full it is flushed into a StoreFile, which forms an HFile of the underlying HDFS. HBase therefore needs to save records of both the MemStore and the HFile files when it periodically captures data. Because the HFile files in HDFS are managed by region servers, these records must be saved in a distributed manner across the region servers. The communication framework between the master service (Master) and the RegionServers in HBase is shown in Fig. 2.
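The write path described above (writes land in the MemStore and are flushed to a file once it fills up) can be sketched as a toy in-memory model. This is not HBase code: the class `Store`, the entry-count threshold and the tuple-based "files" are assumptions made purely for illustration.

```python
class Store:
    """Toy model of one HStore: an in-memory MemStore plus flushed files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical limit, counted in entries
        self.memstore = {}                      # recent writes, held in memory
        self.store_files = []                   # immutable flushed files

    def put(self, row, value):
        self.memstore[row] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the MemStore out as one sorted, immutable file, then clear it.
        if self.memstore:
            self.store_files.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, row):
        # Newest data wins: check memory first, then files from newest to oldest.
        if row in self.memstore:
            return self.memstore[row]
        for store_file in reversed(self.store_files):
            for key, value in store_file:
                if key == row:
                    return value
        return None
```

The sketch makes the motivation visible: at any moment part of the data sits only in `memstore`, so a record that covers the files alone would miss recent writes.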
Existing HBase technology offers no concrete way to capture HBase data periodically, so the HBase distributed database cannot meet users' need to use the data of an HBase table as it stood at different stages. To meet this need, a method for periodically capturing HBase data is required, so that users can reliably use the data in HBase tables at different stages.
Summary of the Invention
The technical problem to be solved by the invention is to provide a system and method for capturing data from an open-source distributed database, so that users can reliably use the data in HBase tables at different stages.
To solve the above technical problem, the invention provides a method for capturing data from an open-source distributed database, including:
When a region server starts saving records, it flushes the data in its memory cache to files of the distributed file system, then creates, for every distributed file system file to be recorded, a reference file that links to it, and stores the reference files in a folder marked as belonging to this server.
Further, the method also includes:
After receiving a request to save records, the region server notifies the master service module;
According to the request, the master service module notifies the corresponding region servers of the name of the information record to be saved, and creates a corresponding folder for each of them;
On successfully completing the record-saving operation, the region server returns the folder containing the reference files to the master service module;
After receiving the folder returned by the region server, the master service module stores it in the folder of the corresponding region server.
Further, the request to save records received by the region server is either a request sent by a user or a request sent automatically and periodically by the open-source distributed database system.
Further, when the open-source distributed database is HBase, the record-saving workflow performed by each region server specifically includes:
Creating a folder that marks this region server; this folder is the folder referred to above;
Determining whether this region server meets the condition for saving records, that is, checking whether any memory cache holds data; if so, flushing the data in the memory cache into a store file and then storing the data of the store file in a file of the distributed file system;
For every distributed file system file to be recorded, creating in the created folder a corresponding reference file that links to that file;
Returning to the master service module a success notice carrying the folder that contains the reference files.
To solve the above technical problem, the invention also provides a system for capturing data from an open-source distributed database, including multiple region servers, wherein:
The region server is configured, when it starts saving records, to flush the data in its memory cache to files of the distributed file system, then to create, for every distributed file system file to be recorded, a reference file that links to it, and to store the reference files in a folder marked as belonging to this server.
Further, the system also includes a master service module connected to the multiple region servers through a distributed reliable coordination component, wherein:
The region server notifies the master service module after receiving a request to save records and, on successfully completing the record-saving operation, returns the folder containing the reference files to the master service module;
The master service module is configured to notify the corresponding region servers, according to the request, of the name of the information record to be saved, and to create a corresponding record-saving folder for each of them; after receiving the folder returned by a region server, it stores the reference files therein in the folder of the corresponding region server.
Further, the request to save records received by the region server is either a request sent by a user or a request sent automatically and periodically by the open-source distributed database system.
In the present invention, part of the data in the open-source distributed database is held in the cache of a region, and part is persisted in the distributed file system in the form of distributed file system files; the record-saving operation is therefore carried out on each of these two parts. In this way, users can capture data from the open-source distributed database at different stages and can therefore safely use the recorded data of the open-source distributed database.
Brief Description of the Drawings
Fig. 1 is a framework diagram of data storage in existing HBase;
Fig. 2 is a framework diagram of communication between the master service and the region servers in existing HBase;
Fig. 3 is a flowchart of a region server saving records in an embodiment of the method of the present invention for capturing data from an open-source distributed database, taking HBase as an example.
Detailed Description
The technical solution of the present invention is set out in detail below with reference to the accompanying drawings and preferred embodiments. It should be understood that the embodiments given below serve only to illustrate and explain the present invention and do not limit its technical solution.
The present invention uses HBase as the embodiment to describe the method for capturing data from an open-source distributed database. Data in HBase is stored partly in the persistent HFile files of HDFS and partly in the MemStore memory cache of the region server, so capturing HBase data periodically requires saving records in both places, HFile and MemStore. Records are saved by first flushing the data in the MemStore into HFiles and then performing the record-saving operation on the HFiles. When saving records of the persistent HFile files, one reference file pointing to each HFile is created. Data in HBase is captured periodically in this way.
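The role of a reference file can be illustrated with a small sketch on a local temporary directory. The patent does not specify the format of a reference file; here it is assumed, for illustration only, to be a tiny text file holding the path of the file it points to. The helper names `create_reference` and `read_through_reference`, the `.ref` suffix and the directory layout are all hypothetical, and a local file system stands in for HDFS.

```python
import tempfile
from pathlib import Path


def create_reference(hfile: Path, folder: Path) -> Path:
    """Create a small reference file in `folder` that points at `hfile`."""
    folder.mkdir(parents=True, exist_ok=True)
    ref = folder / (hfile.name + ".ref")   # hypothetical naming scheme
    ref.write_text(str(hfile.resolve()))   # store only the target path, not the data
    return ref


def read_through_reference(ref: Path) -> bytes:
    """Follow a reference file back to the data it points at."""
    return Path(ref.read_text()).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    hfile = root / "data" / "hfile-0001"
    hfile.parent.mkdir()
    hfile.write_bytes(b"row1=a\nrow2=b\n")

    # Save a record of the HFile by creating a reference in the server's folder.
    ref = create_reference(hfile, root / "saved" / "regionserver-1")
    ref_size = ref.stat().st_size
    data_via_ref = read_through_reference(ref)
```

Because only a pointer is written, the cost of saving a record under this assumption depends on the number of files rather than on the amount of data they hold.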
An embodiment of the method of the present invention for capturing data from an open-source distributed database includes the following steps:
When a region server starts saving records, it flushes the data in its memory cache to files of the distributed file system, then creates, for every distributed file system file to be recorded, a reference file that links to it, and stores the reference files in a folder marked as belonging to this server.
The above method embodiment also includes:
After receiving a request to save records, the region server notifies the master service module;
According to the request, the master service module notifies the corresponding region servers of the name of the information record to be saved, and creates a corresponding record-saving folder for each of them;
On successfully completing the record-saving operation, the region server returns the folder containing the reference files to the master service module;
After receiving the folder from the region server, the master service module stores it in the folder of the corresponding region server.
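The exchange between the master service module and the region servers described above can be sketched as follows. This is a simplified, single-process illustration: direct method calls stand in for messages that would travel through the coordination component, lists of strings stand in for folders and reference files, and the names `Master`, `RegionServer`, `handle_request` and `save_record` are hypothetical.

```python
class RegionServer:
    def __init__(self, name, hfiles):
        self.name = name
        self.hfiles = list(hfiles)   # names of the files this server manages

    def save_record(self, record_name, folder):
        # Fill the folder prepared by the master with one reference per file.
        for hfile in self.hfiles:
            folder.append(f"{record_name}/{self.name}/{hfile}.ref")
        return folder                # returned to the master on success


class Master:
    def __init__(self, servers):
        self.servers = servers
        self.saved = {}              # record name -> {server name: folder}

    def handle_request(self, record_name):
        result = {}
        for server in self.servers:
            folder = []              # folder created for this region server
            # Tell the server the record name and collect the folder it returns.
            result[server.name] = server.save_record(record_name, folder)
        self.saved[record_name] = result
        return result
```

In this sketch the master ends up holding, per record name, one folder of references for each region server, which mirrors the bookkeeping the steps above describe.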
In the above method embodiment, the request to save records received by the region server is either a request sent by a user or a request sent automatically and periodically by the open-source distributed database system.
In the above method embodiment, if the open-source distributed database is HBase, the record-saving workflow performed by each region server is shown in Fig. 3 and includes the following steps:
110: Create a folder that marks this region server;
This folder is the folder referred to above.
120: Determine whether this region server meets the condition for saving records; if so, perform step 140, otherwise perform step 130;
Here, the condition for saving records is that none of the memory caches (MemStore) holds data.
130: Flush the data in the MemStore into a store file (StoreFile), then store the data of the StoreFile in a file (HFile) of the distributed file system;
140: For every HFile to be recorded, create in the created folder a corresponding reference file that links to that HFile;
150: Return to the master service module a success notice carrying the folder that contains the reference files.
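Steps 110 to 150 can be traced in a short in-memory sketch. It assumes plain dictionaries in place of MemStores, HFiles and folders, and the function name `save_record`, the flushed-file naming and the `ref->` notation are hypothetical; it shows only the control flow of the flowchart, not HBase's implementation.

```python
def save_record(server_name, memstores, hfiles):
    """Run steps 110-150 for one region server.

    memstores: dict mapping store name -> dict of unflushed rows
    hfiles:    dict mapping HFile name -> dict of persisted rows
    """
    # Step 110: create the folder marked as belonging to this server.
    folder = {"owner": server_name, "references": []}

    # Step 120: the condition holds when every MemStore is empty.
    condition_met = all(len(rows) == 0 for rows in memstores.values())

    # Step 130: otherwise flush each non-empty MemStore into a new HFile.
    if not condition_met:
        for store, rows in memstores.items():
            if rows:
                hfiles[f"{store}-flush-{len(hfiles)}"] = dict(rows)
                rows.clear()

    # Step 140: create one reference per HFile inside the folder.
    for name in sorted(hfiles):
        folder["references"].append(f"ref->{name}")

    # Step 150: return the success notice carrying the folder.
    return {"status": "success", "folder": folder}
```

Flushing before creating references is what lets the references alone describe the full state: after step 130 no data remains only in memory.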
Corresponding to the above method embodiment, the present invention also provides a system for capturing data from an open-source distributed database, including multiple region servers, wherein:
The region server is configured, when it starts saving records, to flush the data in its memory cache to files of the distributed file system, then to create, for every distributed file system file to be recorded, a reference file that links to it, and to store the reference files in a folder marked as belonging to this server.
The above system embodiment also includes a master service module connected to the multiple region servers through a distributed reliable coordination component, wherein:
The region server notifies the master service module after receiving a request to save records and, on successfully completing the record-saving operation, returns the folder containing the reference files to the master service module;
The master service module is configured to notify the corresponding region servers, according to the request, of the name of the information record to be saved, and to create a corresponding record-saving folder for each of them; after receiving the folder from a region server, it stores the reference files therein in the folder of the corresponding region server.
In the above system embodiment, the request to save records received by the region server is either a request sent by a user or a request sent automatically and periodically by the open-source distributed database system.

Claims (6)

1. A method for capturing data from an open-source distributed database, comprising:
When a region server starts saving records, flushing the data in the memory cache to files of the distributed file system, then creating, for every distributed file system file to be recorded, a reference file that links to it, and storing the reference files in a folder marked as belonging to this server;
The open-source distributed database is HBase, and the record-saving workflow performed by each region server specifically comprises:
Creating a folder that marks this region server, the folder being said folder;
Determining whether this region server meets the condition for saving records, that is, checking whether any memory cache holds data, and if so, flushing the data in the memory cache into a store file and then storing the data of the store file in a file of the distributed file system;
For every file of the distributed file system to be recorded, creating in the created folder a corresponding reference file that links to the file of the distributed file system;
Returning to the master service module a success notice carrying the folder that contains the reference files.
2. The method according to claim 1, characterized in that it further comprises:
After receiving the request to save records, the region server notifies the master service module;
According to the request, the master service module notifies the corresponding region servers of the name of the information record to be saved, and creates a corresponding folder for each of them;
On successfully completing the record-saving operation, the region server returns the folder containing the reference files to the master service module;
After receiving the folder returned by the region server, the master service module stores it in the folder of the corresponding region server.
3. The method according to claim 2, characterized in that
The request to save records received by the region server is either a request sent by a user or a request sent automatically and periodically by the open-source distributed database system.
4. A system for capturing data from an open-source distributed database, comprising multiple region servers, wherein:
The region server is configured, when it starts saving records, to flush the data in the memory cache to files of the distributed file system, then to create, for every distributed file system file to be recorded, a reference file that links to it, and to store the reference files in a folder marked as belonging to this server;
The open-source distributed database is HBase;
The region server is specifically configured to: create a folder that marks this region server, the folder being said folder; determine whether this region server meets the condition for saving records, that is, check whether any memory cache holds data, and if so, flush the data in the memory cache into a store file and then store the data of the store file in a file of the distributed file system; for every file of the distributed file system to be recorded, create in the created folder a corresponding reference file that links to the file of the distributed file system; and return to the master service module a success notice carrying the folder that contains the reference files.
5. The system according to claim 4, characterized in that it further comprises a master service module connected to the multiple region servers through a distributed reliable coordination component, wherein:
The region server notifies the master service module after receiving a request to save records and, on successfully completing the record-saving operation, returns the folder containing the reference files to the master service module;
The master service module is configured to notify the corresponding region servers, according to the request, of the name of the information record to be saved, and to create a corresponding record-saving folder for each of them; after receiving the folder returned by the region server, it stores the reference files therein in the folder of the corresponding region server.
6. The system according to claim 5, characterized in that the request to save records received by the region server is either a request sent by a user or a request sent automatically and periodically by the open-source distributed database system.
CN201410488046.2A 2014-09-22 2014-09-22 System and method for capturing data from an open-source distributed database Active CN104239542B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410488046.2A CN104239542B (en) 2014-09-22 2014-09-22 System and method for capturing data from an open-source distributed database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410488046.2A CN104239542B (en) 2014-09-22 2014-09-22 System and method for capturing data from an open-source distributed database

Publications (2)

Publication Number Publication Date
CN104239542A CN104239542A (en) 2014-12-24
CN104239542B true CN104239542B (en) 2017-11-17

Family

ID=52227601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410488046.2A Active CN104239542B (en) 2014-09-22 2014-09-22 System and method for capturing data from an open-source distributed database

Country Status (1)

Country Link
CN (1) CN104239542B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113312383A (en) * 2021-06-01 2021-08-27 拉卡拉支付股份有限公司 Data query method, data query device, electronic equipment, storage medium and program product

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617211A (en) * 2013-11-20 2014-03-05 浪潮电子信息产业股份有限公司 HBase loaded data importing method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9128949B2 (en) * 2012-01-18 2015-09-08 Cloudera, Inc. Memory allocation buffer for reduction of heap fragmentation

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617211A (en) * 2013-11-20 2014-03-05 浪潮电子信息产业股份有限公司 HBase loaded data importing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
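Design and Implementation of Snapshots for the Distributed Database HBase (分布式数据库HBase快照的设计与实现); 李崇欣; China Master's Theses Full-text Database, Information Science and Technology Series (《中国优秀硕士学位论文全文数据库信息科技辑》); 20110715 (No. 7); pp. I138-360 *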

Also Published As

Publication number Publication date
CN104239542A (en) 2014-12-24


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant