CN104239542B - System and method for capturing data from an open-source distributed database - Google Patents
- Publication number: CN104239542B (application CN201410488046.2A; also published as CN104239542A)
- Authority: CN (China)
- Prior art keywords: file, keeping records, data, region server, server
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/273—Asynchronous replication or reconciliation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
- G06F16/1824—Distributed file systems implemented using Network-attached Storage [NAS] architecture
- G06F16/183—Provision of network file services by network file servers, e.g. by using NFS, CIFS
Abstract
The present invention discloses a system and method for capturing data from an open-source distributed database. The method includes: when a region server starts keeping records, flushing the data in the memory cache into files of the distributed file system; then, for every distributed file system file for which records are to be kept, creating a reference file that links to it; and storing the reference files in a file set marked as belonging to this server. With this way of keeping records, users can capture data from the open-source distributed database at different stages and can therefore safely use the data recorded from the database.
Description
Technical field
The present invention relates to database technology for distributed computer cluster systems, and in particular to a system and method for capturing data from an open-source distributed database in a distributed cluster system.
Background art
As data volumes grow explosively, the computing power and storage capacity of a single computer fall far short of what data storage and processing require. Distributed computer architectures have therefore attracted users' attention and approval. In a distributed architecture, many inexpensive computers can be built into a distributed cluster system, so that each machine runs its own tasks while user requests are also handled in parallel. Distributed cluster systems are characterized by high performance, high reliability, high scalability, and low cost. HBase is an open-source distributed database for distributed cluster systems. With HBase, a large-scale structured storage cluster can be built on inexpensive servers; it offers very high data throughput and good structural scalability, can handle structured and unstructured data at the same time, and compensates for the shortcomings of the distributed file system (HDFS) by providing real-time random reads and writes. Capturing HBase data periodically is therefore very important.
In a traditional database, a method for capturing data must take transaction consistency into account. The database guarantees transaction consistency through logging: completion can be marked only after all transactions have been committed. If an error occurs during this process, the transactions in the current system can be rolled back by means of the log. Therefore, when data in a traditional database is captured, the kept record in each database is guaranteed to be consistent with the source database. The kept record of the captured data is typically preserved using copy-on-write.
In an HBase distributed database, merge operations and delete operations on an HBase table change the data. In some applications, users need to use the data of an HBase table as it was at each stage. The framework for storing data in HBase is shown in Fig. 1. A region server (HRegionServer) internally manages a series of region (HRegion) objects; each HRegion corresponds to one region (Region) of an HBase table and is made up of multiple stores (HStore). Each HStore corresponds to the storage of one column family (ColumnFamily) of the HBase table; each ColumnFamily is in effect a centralized storage unit. The HStore is the core of HBase storage and consists of two parts: a cache (MemStore) and files (StoreFile). The MemStore is a memory cache: data written by users is first placed in the MemStore, and when the MemStore is full it is flushed into a StoreFile, forming an HFile on the underlying HDFS. Therefore, when HBase data is captured periodically, records must be kept for both the MemStore and the HFiles. Because the HFiles on HDFS are managed by the region servers, these records of the region servers must be preserved in a distributed manner. The communication framework between the main service (Master) and the RegionServers in HBase is shown in Fig. 2.
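The storage hierarchy described above can be illustrated with a minimal Python sketch. It is a toy model and not HBase's actual implementation: the class names mirror the terms in the text, while the `flush_threshold` parameter (counted in cells rather than bytes) and the dictionary-based files are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HStore:
    """One column family: an in-memory MemStore plus flushed StoreFiles."""
    flush_threshold: int = 3  # hypothetical limit, counted in cells
    memstore: Dict[str, str] = field(default_factory=dict)
    store_files: List[Dict[str, str]] = field(default_factory=list)

    def put(self, key: str, value: str) -> None:
        # Data written by the user is placed in the MemStore first.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # A full MemStore is written out as a StoreFile (an HFile on HDFS).
        if self.memstore:
            self.store_files.append(dict(self.memstore))
            self.memstore.clear()


@dataclass
class HRegion:
    """One region of a table: one HStore per column family."""
    stores: Dict[str, HStore] = field(default_factory=dict)


@dataclass
class HRegionServer:
    """A region server managing a series of HRegion objects."""
    regions: Dict[str, HRegion] = field(default_factory=dict)
```

In this model, data that is still in `memstore` is invisible to anything that looks only at `store_files`, which is why both parts have to be covered when data is captured.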
Because existing HBase technology offers no concrete approach or method for capturing HBase data periodically, an HBase distributed database cannot meet users' requirement to use the data of an HBase table as it was at different stages. To meet users' needs, a method for periodically capturing HBase data is required, so that users can use the data in HBase tables with confidence at different stages.
Summary of the invention
The technical problem to be solved by the present invention is to provide a system and method for capturing data from an open-source distributed database, so that users can use the data in HBase tables with confidence at different stages.
To solve the above technical problem, the present invention provides a method for capturing data from an open-source distributed database, including:
When a region server starts keeping records, it flushes the data in the memory cache into files of the distributed file system, then creates, for every distributed file system file for which records are to be kept, a reference file linking to that file, and stores the reference files in a file set marked as belonging to this server.
Further, the method also includes:
After receiving a request for keeping records, the region server notifies the primary service module;
According to the request, the primary service module notifies the corresponding region servers of the name of the information record for which records are to be kept, and creates a corresponding file set for each corresponding region server;
When a region server successfully completes the record-keeping operation, it returns the file set holding the reference files to the primary service module;
After receiving the file set returned by a region server, the primary service module stores it in the file set of the corresponding region server.
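The exchange between the region servers and the primary service module can be sketched as follows. This is a hedged, in-process illustration only: the classes `RegionServer` and `PrimaryService`, the method names, and the `.ref` naming of reference entries are all hypothetical, and the real components would communicate over the network rather than by direct calls.

```python
from typing import Dict, List


class RegionServer:
    """Hypothetical stand-in for a region server holding some HFile names."""

    def __init__(self, name: str, hfiles: List[str]) -> None:
        self.name = name
        self.hfiles = hfiles

    def keep_records(self, record_name: str) -> List[str]:
        # Build one reference entry per file and return them as a file set.
        return [f"{record_name}/{self.name}/{h}.ref" for h in self.hfiles]


class PrimaryService:
    """Hypothetical stand-in for the primary service module."""

    def __init__(self, servers: List[RegionServer]) -> None:
        self.servers = {s.name: s for s in servers}
        # record name -> region server name -> stored reference entries
        self.file_sets: Dict[str, Dict[str, List[str]]] = {}

    def on_request(self, record_name: str, server_names: List[str]) -> None:
        # Called after a region server has reported a record-keeping request.
        sets = self.file_sets.setdefault(record_name, {})
        for name in server_names:
            sets[name] = []  # create the file set for this region server
            returned = self.servers[name].keep_records(record_name)  # notify it
            sets[name].extend(returned)  # store the file set it returned
```

Keeping one file set per region server means each server's references can be stored as soon as that server reports success, independently of the others.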
Further, the request for keeping records received by the region server is either a request sent by a user or a request sent automatically and periodically by the open-source distributed database system.
Further, where the open-source distributed database is HBase, the workflow by which each region server keeps records specifically includes:
Creating a folder marked for this region server, the folder being the file set;
Judging whether this region server meets the condition for keeping records, that is, checking whether any memory cache holds data; if so, flushing the data in the memory cache into store files and then storing the store-file data in files of the distributed file system;
For every distributed file system file for which records are to be kept, creating in the created folder a corresponding reference file that can link to that distributed file system file;
Returning to the primary service module a success notification carrying the folder that holds the reference files.
To solve the above technical problem, the present invention also provides a system for capturing data from an open-source distributed database, including multiple region servers, wherein:
The region server is configured to, when it starts keeping records, flush the data in the memory cache into files of the distributed file system, then create, for every distributed file system file for which records are to be kept, a reference file linking to that file, and store the reference files in a file set marked as belonging to this server.
Further, the system also includes a primary service module connected to the multiple region servers through a distributed reliable coordination component, wherein:
The region server notifies the primary service module after receiving a request for keeping records, and, on successfully completing the record-keeping operation, returns the file set holding the reference files to the primary service module;
The primary service module is configured to notify the corresponding region servers, according to the request, of the name of the information record for which records are to be kept, and to create a corresponding record-keeping file set for each corresponding region server; and, after receiving the file set returned by a region server, to store the reference files in it into the file set of the corresponding region server.
Further, the request for keeping records received by the region server is either a request sent by a user or a request sent automatically and periodically by the open-source distributed database system.
In the present invention, part of the data in the open-source distributed database is stored in the region's cache, and part is persisted in the distributed file system in the form of distributed file system files; the record-keeping operation is therefore carried out separately on these two parts of the data. In this way, users can capture data from the open-source distributed database at different stages and can therefore safely use the data recorded from the database.
Brief description of the drawings
Fig. 1 is a diagram of the framework for storing data in existing HBase;
Fig. 2 is a diagram of the communication framework between the main service and the region servers in existing HBase;
Fig. 3 is a flowchart of a region server keeping records in an embodiment of the method of the present invention for capturing data from an open-source distributed database, taking HBase as the example.
Detailed description of the embodiments
The technical solution of the present invention is described in detail below with reference to the accompanying drawings and preferred embodiments. It should be understood that the embodiments listed below serve only to illustrate and explain the present invention and do not limit its technical solution.
The present invention uses HBase as the embodiment to describe the method for capturing data from an open-source distributed database. Because data in HBase is stored both in persistent HFiles on HDFS and in the MemStore memory cache of the region servers, periodically capturing HBase data requires keeping records in two places, the HFiles and the MemStore. Records are kept by first flushing the data in MemStore memory into an HFile and then performing the record-keeping operation on the HFiles. When records are kept for a persistent HFile, a reference file pointing to that file is created for each HFile. In this way the data in HBase is captured periodically.
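The idea of a reference file that points to an HFile, rather than copying it, can be sketched on an ordinary local file system. This is only an illustration under stated assumptions: the function names, the `.ref` suffix, and storing the target path as the reference file's text are all hypothetical choices; the patent does not specify the format of a reference file, and a real deployment would use HDFS rather than local paths.

```python
import tempfile
from pathlib import Path


def create_reference(hfile: Path, file_set_dir: Path) -> Path:
    """Create a small reference file in file_set_dir that points at hfile."""
    file_set_dir.mkdir(parents=True, exist_ok=True)
    ref = file_set_dir / f"{hfile.name}.ref"  # ".ref" suffix is a made-up convention
    ref.write_text(str(hfile), encoding="utf-8")
    return ref


def resolve_reference(ref: Path) -> Path:
    """Follow a reference file back to the file it links to."""
    return Path(ref.read_text(encoding="utf-8"))


def demo() -> bool:
    """Create one fake HFile, reference it, and read the data back via the reference."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        hfile = root / "hfile-0001"
        hfile.write_text("row1=a\n", encoding="utf-8")
        ref = create_reference(hfile, root / "file_set" / "region-server-1")
        # The data is not copied; the reference only records where it lives.
        return resolve_reference(ref).read_text(encoding="utf-8") == "row1=a\n"
```

Because only a path is written, creating a reference costs far less than copying the file; the sketch does not address what happens if the referenced file is later deleted or merged.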
The method embodiment of the present invention for capturing data from an open-source distributed database includes the following steps:
When a region server starts keeping records, it flushes the data in the memory cache into files of the distributed file system, then creates, for every distributed file system file for which records are to be kept, a reference file linking to that file, and stores the reference files in a file set marked as belonging to this server.
The above method embodiment also includes:
After receiving a request for keeping records, the region server notifies the primary service module;
According to the request, the primary service module notifies the corresponding region servers of the name of the information record for which records are to be kept, and creates a corresponding record-keeping file set for each corresponding region server;
When a region server successfully completes the record-keeping operation, it returns the file set holding the reference files to the primary service module;
After receiving the file set from a region server, the primary service module stores it in the file set of the corresponding region server.
In the above method embodiment, the request for keeping records received by the region server is either a request sent by a user or a request sent automatically and periodically by the open-source distributed database system.
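The two sources of a record-keeping request, a user and an automatic periodic sender, can be sketched as below. The class `RecordKeepingTrigger`, its `tick` method, and the interval parameter are hypothetical names chosen for illustration; the patent does not describe how the periodic request is scheduled, so this is an assumed design in which the caller supplies the current time.

```python
from typing import Callable, Optional


class RecordKeepingTrigger:
    """Issues record-keeping requests on demand or at a fixed interval."""

    def __init__(self, interval_seconds: float,
                 send_request: Callable[[str], None]) -> None:
        self.interval = interval_seconds
        self.send_request = send_request
        self.last_sent: Optional[float] = None  # time of the last automatic request

    def user_request(self) -> None:
        # A request sent explicitly by a user.
        self.send_request("user")

    def tick(self, now: float) -> bool:
        """Call regularly with the current time; sends a request when one is due."""
        if self.last_sent is None or now - self.last_sent >= self.interval:
            self.last_sent = now
            self.send_request("periodic")
            return True
        return False
```

Passing the time in as an argument keeps the sketch deterministic; a real system would more likely drive `tick` from a timer or scheduler.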
In the above method embodiment, if the open-source distributed database is HBase, the workflow by which each region server keeps records is shown in Fig. 3 and includes the following steps:
110: Create a folder marked for this region server;
This folder is the file set described above.
120: Judge whether this region server meets the condition for keeping records; if so, perform step 140, otherwise perform step 130;
Here, the condition for keeping records is that none of the memory caches (MemStore) holds any data.
130: Flush the data in the MemStore into a store file (StoreFile), then store the StoreFile data in a file (HFile) of the distributed file system;
140: For every HFile for which records are to be kept, create in the created folder a corresponding reference file that can link to that HFile;
150: Return to the primary service module a success notification carrying the folder that holds the reference files.
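Steps 110 to 150 can be sketched as a single function over in-memory stand-ins. This is a simplified illustration, not the patented implementation: MemStores and HFiles are modeled as plain dictionaries, the generated HFile names and the `ref:` prefix are hypothetical, and the "notification" is just the returned dictionary.

```python
from typing import Dict, List


def keep_records(server_name: str,
                 memstores: Dict[str, Dict[str, str]],
                 hfiles: Dict[str, Dict[str, str]]) -> Dict[str, object]:
    """Run the record-keeping workflow for one region server."""
    # Step 110: create the folder (file set) marked for this region server.
    folder: List[str] = []

    # Step 120: the condition is met when no MemStore holds any data.
    condition_met = not any(memstores.values())

    if not condition_met:
        # Step 130: flush each non-empty MemStore into a new HFile.
        for store_name, cells in memstores.items():
            if cells:
                hfiles[f"{store_name}-{len(hfiles)}"] = dict(cells)
                cells.clear()

    # Step 140: create one reference per HFile and place it in the folder.
    for hfile_name in sorted(hfiles):
        folder.append(f"ref:{hfile_name}")

    # Step 150: success notification carrying the folder of references.
    return {"server": server_name, "status": "success", "folder": folder}
```

After the call, every MemStore passed in is empty and every HFile, old or newly flushed, has exactly one reference in the returned folder.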
Corresponding to the above method embodiment, the present invention also provides a system for capturing data from an open-source distributed database, including multiple region servers, wherein:
The region server is configured to, when it starts keeping records, flush the data in the memory cache into files in the distributed file system, then create, for every file in the distributed file system for which records are to be kept, a reference file linking to that file, and store the reference files in a file set marked as belonging to this server.
The above system embodiment also includes a primary service module connected to the multiple region servers through a distributed reliable coordination component, wherein:
The region server notifies the primary service module after receiving a request for keeping records, and, on successfully completing the record-keeping operation, returns the file set holding the reference files to the primary service module;
The primary service module is configured to notify the corresponding region servers, according to the request, of the name of the information record for which records are to be kept, and to create a corresponding record-keeping file set for each corresponding region server; and, after receiving the file set from a region server, to store the reference files in it into the file set of the corresponding region server.
In the above system embodiment, the request for keeping records received by the region server is either a request sent by a user or a request sent automatically and periodically by the open-source distributed database system.
Claims (6)
1. A method for capturing data from an open-source distributed database, including:
When a region server starts keeping records, flushing the data in the memory cache into files of the distributed file system, then creating, for every distributed file system file for which records are to be kept, a reference file linking to that file, and storing the reference files in a file set marked as belonging to this server;
The open-source distributed database is HBase, and the workflow by which each region server keeps records specifically includes:
Creating a folder marked for this region server, the folder being the file set;
Judging whether this region server meets the condition for keeping records, that is, checking whether any memory cache holds data, and if so, flushing the data in the memory cache into store files and then storing the store-file data in files of the distributed file system;
For every distributed file system file for which records are to be kept, creating in the created folder a corresponding reference file that can link to that distributed file system file;
Returning to the primary service module a success notification carrying the folder that holds the reference files.
2. The method according to claim 1, characterized in that it also includes:
The region server notifies the primary service module after receiving the request for keeping records;
According to the request, the primary service module notifies the corresponding region servers of the name of the information record for which records are to be kept, and creates a corresponding file set for each corresponding region server;
When the region server successfully completes the record-keeping operation, it returns the file set holding the reference files to the primary service module;
After receiving the file set returned by the region server, the primary service module stores it in the file set of the corresponding region server.
3. The method according to claim 2, characterized in that:
The request for keeping records received by the region server is either a request sent by a user or a request sent automatically and periodically by the open-source distributed database system.
4. A system for capturing data from an open-source distributed database, including multiple region servers, wherein:
The region server is configured to, when it starts keeping records, flush the data in the memory cache into files of the distributed file system, then create, for every distributed file system file for which records are to be kept, a reference file linking to that file, and store the reference files in a file set marked as belonging to this server;
The open-source distributed database is HBase;
The region server is specifically configured to: create a folder marked for this region server, the folder being the file set; judge whether this region server meets the condition for keeping records, that is, check whether any memory cache holds data, and if so, flush the data in the memory cache into store files and then store the store-file data in files of the distributed file system; for every distributed file system file for which records are to be kept, create in the created folder a corresponding reference file that can link to that distributed file system file; and return to the primary service module a success notification carrying the folder that holds the reference files.
5. The system according to claim 4, characterized in that it also includes a primary service module connected to the multiple region servers through a distributed reliable coordination component, wherein:
The region server notifies the primary service module after receiving a request for keeping records, and, on successfully completing the record-keeping operation, returns the file set holding the reference files to the primary service module;
The primary service module is configured to notify the corresponding region servers, according to the request, of the name of the information record for which records are to be kept, and to create a corresponding record-keeping file set for each corresponding region server; and, after receiving the file set returned by the region server, to store the reference files in it into the file set of the corresponding region server.
6. The system according to claim 5, characterized in that the request for keeping records received by the region server is either a request sent by a user or a request sent automatically and periodically by the open-source distributed database system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410488046.2A CN104239542B (en) | 2014-09-22 | 2014-09-22 | System and method for capturing data from an open-source distributed database |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410488046.2A CN104239542B (en) | 2014-09-22 | 2014-09-22 | System and method for capturing data from an open-source distributed database |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104239542A (en) | 2014-12-24 |
CN104239542B (en) | 2017-11-17 |
Family
ID=52227601
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410488046.2A Active CN104239542B (en) | System and method for capturing data from an open-source distributed database | 2014-09-22 | 2014-09-22 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104239542B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113312383B (en) * | 2021-06-01 | 2024-08-20 | Lakala Payment Co., Ltd. | Data query method, device, electronic equipment, storage medium and program product |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103617211A (en) * | 2013-11-20 | 2014-03-05 | Inspur Electronic Information Industry Co., Ltd. | HBase loaded data importing method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9128949B2 (en) * | 2012-01-18 | 2015-09-08 | Cloudera, Inc. | Memory allocation buffer for reduction of heap fragmentation |
Non-Patent Citations (1)
Title |
---|
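Design and Implementation of Snapshots for the Distributed Database HBase; Li Chongxin; China Master's Theses Full-text Database, Information Science and Technology; 2011-07-15 (No. 7); pp. I138-360 * |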
Also Published As
Publication number | Publication date |
---|---|
CN104239542A (en) | 2014-12-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||