CN106776795A

CN106776795A - Data writing method and device based on an HBase database

Info

Publication number: CN106776795A
Application number: CN201611047256.3A
Authority: CN
Inventors: 黄健文; 王刚
Original assignee: Individual
Current assignee: Individual
Priority date: 2016-11-23
Filing date: 2016-11-23
Publication date: 2017-05-31
Anticipated expiration: 2036-11-23
Also published as: CN106776795B

Abstract

The invention discloses a data writing method and device based on an HBase database. The method includes: using the obtained thread identification code and the generated row primary key list as reference data; writing the data records stored in the cache, the row primary key values corresponding to the stored data records, the stored thread identification code, and the stored row primary key list into the distributed file system of the database; after the write completes, taking the thread identification code and row primary key list stored in the cache as data to be compared; and comparing the reference data with the data to be compared. If the comparison shows that a data record was not written into the distributed file system of the database, the file to be written is written into the database again, which ensures the integrity of the stored data. Moreover, compared with recording every write in a log file, the invention performs the comparison using row primary key lists, which consumes very few system resources and therefore does not affect data storage efficiency.

Description

Data writing method and device based on an HBase database

Technical field

The invention belongs to the field of data storage, and in particular relates to a data writing method and device based on an HBase database.

Background art

Current cloud storage systems mostly adopt distributed storage, spreading data across many independent devices. On the one hand, this improves database performance and the efficiency of reading data. On the other hand, because of the distributed storage architecture, a storage device failure only affects access to the local data and does not paralyze the whole database, which increases the safety and reliability of the data. The Hadoop database (HBase, Hadoop Database) is one such distributed storage system. Although an HBase database can keep a storage device failure from affecting access to all of the data in the database, it cannot prevent failures during the data writing process, and such failures make it impossible to find the target data through a query index.

In the prior art, write-ahead logging (WAL) is the standard method for ensuring data integrity. When the database is corrupted, it is recovered from the log that WAL stored in advance. This log must record every storage operation, so it occupies a large amount of the system's storage resources as well as its I/O resources. Once the volume of stored data grows, the efficiency of data storage is bound to drop.

Summary of the invention

The present invention provides a data writing method and device based on an HBase database, intended to solve the problem in the prior art that the pre-stored log occupies a large amount of system resources and thereby reduces data storage efficiency.

The data writing method based on an HBase database provided by the present invention includes: obtaining, from a thread, the data records corresponding to a file to be written, the row primary key values corresponding to the data records, and the identification code of the thread, generating a row primary key list containing the correspondence between the data records and the row primary key values, and using the obtained thread identification code and the generated row primary key list as reference data; writing the obtained data records, the row primary key values corresponding to the obtained data records, the obtained thread identification code, and the generated row primary key list into the cache of the database; writing the data records stored in the cache, the row primary key values corresponding to the stored data records, the stored thread identification code, and the stored row primary key list into the distributed file system of the database, and, after the write completes, taking the thread identification code and row primary key list stored in the cache as data to be compared; and comparing the reference data with the data to be compared, and, if the comparison shows that a data record was not written into the distributed file system of the database, writing the file to be written into the database again.

The data writing device based on an HBase database provided by the present invention includes: an acquisition module, configured to obtain, from a thread, the data records corresponding to a file to be written, the row primary key values corresponding to the data records, and the identification code of the thread, to generate a row primary key list containing the correspondence between the data records and the row primary key values, and to use the obtained thread identification code and the generated row primary key list as reference data; and a processing module, configured to write the obtained data records, the row primary key values corresponding to the obtained data records, the obtained thread identification code, and the generated row primary key list into the cache of the database. The processing module is further configured to write the data records stored in the cache, the row primary key values corresponding to the stored data records, the stored thread identification code, and the stored row primary key list into the distributed file system of the database, and, after the write completes, to take the thread identification code and row primary key list stored in the cache as data to be compared. The processing module is further configured to compare the reference data with the data to be compared, and, if the comparison shows that a data record was not written into the distributed file system of the database, to write the file to be written into the database again.

With the data writing method and device based on an HBase database provided by the present invention, the data records corresponding to the file to be written, the row primary key values corresponding to the data records, and the thread identification code are obtained from a thread, and a row primary key list containing the correspondence between the data records and the row primary key values is generated; the obtained thread identification code and the generated row primary key list serve as reference data. The obtained data records, their row primary key values, the thread identification code, and the row primary key list are written into the cache of the database. The data records, row primary key values, thread identification code, and row primary key list stored in the cache are then written into the distributed file system of the database, and once the write completes, the thread identification code and row primary key list stored in the cache are taken as data to be compared. The reference data is compared with the data to be compared, and if the comparison shows that a data record was not written into the distributed file system of the database, the file to be written is written into the database again. In this way, each write is followed by a comparison that determines whether all of the data reached the database, which ensures the integrity of the stored data. Moreover, compared with recording every write in a log file, comparing row primary key lists consumes very few system resources and therefore does not affect data storage efficiency.

Brief description of the drawings

To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention.

Fig. 1 is a schematic flowchart of the data writing method based on an HBase database provided by the first embodiment of the invention;

Fig. 2 is a schematic flowchart of the data writing method based on an HBase database provided by the second embodiment of the invention;

Fig. 3 is a schematic structural diagram of the data writing device based on an HBase database provided by the third embodiment of the invention;

Fig. 4 is a schematic structural diagram of the data writing device based on an HBase database provided by the fourth embodiment of the invention.

Detailed description of the embodiments

To make the purpose, features, and advantages of the invention more apparent and understandable, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person skilled in the art on the basis of these embodiments without creative work fall within the scope of protection of the invention.

Referring to Fig. 1, Fig. 1 is a schematic flowchart of the data writing method based on an HBase database provided by the first embodiment of the invention. The method can be applied to a terminal with data processing capability, such as a computer. The method shown in Fig. 1 mainly includes the following steps:

S101: Obtain, from a thread, the data records corresponding to the file to be written, the row primary key values corresponding to the data records, and the identification code of the thread, generate a row primary key list containing the correspondence between the data records and the row primary key values, and use the obtained thread identification code and the generated row primary key list as reference data.

An HBase database includes multiple threads, which are used for distribution and scheduling. Each thread corresponds to one thread identification code, that is, the thread's identity number (ID, identification). In practice, a file to be written can be divided into multiple data records. Most of the data records in a file to be written can be assigned to a single thread, but they may also be distributed across multiple threads. Each data record corresponds to one row primary key value (rowkey). The row primary key list contains the correspondence between the multiple data records and their row primary key values, and the row primary key list corresponds to the ID of the thread from which it was obtained.

Optionally, the row primary key list may also be associated with the file to be written, and the correspondence between the row primary key list and the file to be written may be stored; at the same time, the thread is informed to store the correspondence between the row primary key list and the file to be written.
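Step S101 can be illustrated with a minimal sketch. It is not the patent's implementation: the names `ReferenceData` and `build_reference_data` are hypothetical, the "correspondence" is modeled as (record index, row primary key value) pairs, and the row primary key value is simply taken from the first field of each record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceData:
    thread_id: str        # thread identification code
    row_key_list: tuple   # (record_index, row_key) pairs, an assumed layout


def build_reference_data(thread_id, records, row_key_of):
    """Build the reference data for one thread.

    records: data records of the file to be written that the thread holds.
    row_key_of: callable returning the row primary key value of a record.
    """
    pairs = tuple(
        (index, row_key_of(record)) for index, record in enumerate(records)
    )
    return ReferenceData(thread_id=thread_id, row_key_list=pairs)


# Hypothetical usage: the first comma-separated field acts as the row key.
reference = build_reference_data(
    "thread-7",
    [b"alice,30", b"bob,25"],
    row_key_of=lambda record: record.split(b",")[0].decode(),
)
```

Because the reference data holds only the thread ID and the key list, not the records themselves, it stays small relative to the data being written.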

S102: Write the obtained data records, the row primary key values corresponding to the obtained data records, the obtained thread identification code, and the generated row primary key list into the cache of the database.

The cache of an HBase database is the MemStore. In practice, the obtained data records, the row primary key values corresponding to the obtained data records, the obtained thread identification code, and the generated row primary key list are written into the in-memory MemStore.

S103: Write the data records stored in the cache, the row primary key values corresponding to the stored data records, the stored thread identification code, and the stored row primary key list into the distributed file system of the database, and, after the write completes, take the thread identification code and row primary key list stored in the cache as data to be compared.

Here, the data stored in the cache in S102 is written to the distributed file system, that is, to an HFile of the HBase database.
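Steps S102 and S103 can be sketched with plain in-memory structures. `ToyMemStore` below is a hypothetical stand-in, not the real HBase MemStore API, and the "HFile" is just a dictionary; the sketch only shows which pieces of data move where.

```python
class ToyMemStore:
    """In-memory stand-in for the cache (not the real HBase MemStore)."""

    def __init__(self):
        self.thread_id = None
        self.rows = {}           # row primary key value -> data record
        self.row_key_list = []   # row primary key list stored with the rows

    def put(self, thread_id, rows, row_key_list):
        """S102: store records, row keys, thread ID and key list in the cache."""
        self.thread_id = thread_id
        self.rows.update(rows)
        self.row_key_list = list(row_key_list)

    def flush(self, hfile):
        """S103: copy cached rows to the toy HFile.

        Returns the data to be compared: the thread ID and the row key
        list as they are stored in the cache after the write completes.
        """
        hfile.update(self.rows)
        return self.thread_id, list(self.row_key_list)


memstore = ToyMemStore()
hfile = {}
memstore.put("thread-7", {"alice": b"alice,30", "bob": b"bob,25"}, ["alice", "bob"])
to_compare = memstore.flush(hfile)
```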

S104: Compare the reference data with the data to be compared, and, if the comparison shows that a data record was not written into the distributed file system of the database, write the file to be written into the database again.

The purpose of the comparison is to ensure the integrity of the data written into the distributed file system. If the comparison shows that a data record was not written into the distributed file system of the database, the file to be written is written into the database again; that is, steps S101 to S104 are re-executed.
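The write, compare, and rewrite cycle of S101 to S104 can be sketched as a loop. The function name and the `write_once` callback are hypothetical, and the `max_attempts` bound is an assumption added so the sketch always terminates; the source simply repeats the steps until nothing is missing.

```python
def write_until_complete(reference_row_keys, write_once, max_attempts=3):
    """Repeat the write until the comparison finds no missing row keys.

    write_once: callable standing in for S101-S103; it returns the row
    primary key list found in the data to be compared after the write.
    Returns the number of attempts that were needed.
    """
    missing = set()
    for attempt in range(1, max_attempts + 1):
        written_row_keys = write_once()
        missing = set(reference_row_keys) - set(written_row_keys)
        if not missing:
            return attempt
    raise RuntimeError(f"row keys still missing: {sorted(missing)}")


# Hypothetical flaky writer: the first attempt loses one record.
results = iter([["a"], ["a", "b"]])
attempts = write_until_complete(["a", "b"], lambda: next(results))
```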

In this embodiment of the invention, the data records corresponding to the file to be written, the row primary key values corresponding to the data records, and the thread identification code are obtained from a thread, and a row primary key list containing the correspondence between the data records and the row primary key values is generated; the obtained thread identification code and the generated row primary key list serve as reference data. The obtained data records, their row primary key values, the thread identification code, and the row primary key list are written into the cache of the database. The data records, row primary key values, thread identification code, and row primary key list stored in the cache are then written into the distributed file system of the database, and once the write completes, the thread identification code and row primary key list stored in the cache are taken as data to be compared. The reference data is compared with the data to be compared, and if the comparison shows that a data record was not written into the distributed file system of the database, the file to be written is written into the database again. In this way, each write is followed by a comparison that determines whether all of the data reached the database, which ensures the integrity of the stored data. Moreover, compared with recording every write in a log file, comparing row primary key lists consumes very few system resources and therefore does not affect data storage efficiency.

Referring to Fig. 2, Fig. 2 is a schematic flowchart of the data writing method based on an HBase database provided by the second embodiment of the invention. The method can be applied to a terminal with data processing capability, such as a computer. The method shown in Fig. 2 mainly includes the following steps:

S201: Send the data records in the file to be written to a thread, so that the thread generates a corresponding row primary key value for each data record it receives.

Each data record corresponds to one row primary key value. In practice, a file to be written can be divided into multiple data records. Most of the data records in a file to be written can be assigned to a single thread, but they may also be distributed across multiple threads. In practice, the thread generates the rowkey values with a hash algorithm.
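A minimal sketch of hash-based rowkey generation follows. The source only says that a hash algorithm is used; the choice of SHA-256 over the record's bytes and the function names are assumptions made for illustration.

```python
import hashlib


def make_row_key(record: bytes) -> str:
    """Derive a row primary key value from a record (SHA-256 is assumed)."""
    return hashlib.sha256(record).hexdigest()


def assign_row_keys(records):
    """Return a mapping of row primary key value -> data record."""
    return {make_row_key(record): record for record in records}


rows = assign_row_keys([b"alice,30", b"bob,25"])
```

A content-derived key is deterministic, so the same record always maps to the same rowkey no matter how many times it is submitted.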

S202: Obtain, from the thread, the data records corresponding to the file to be written, the row primary key values corresponding to the data records, and the identification code of the thread, generate a row primary key list containing the correspondence between the data records and the row primary key values, and use the obtained thread identification code and the generated row primary key list as reference data.

The row primary key list contains the correspondence between the multiple data records and their row primary key values, and the row primary key list corresponds to the ID of the thread from which it was obtained.

Optionally, the thread may also be informed to store the correspondence between the row primary key list and the file to be written.

S203: Write the obtained data records, the row primary key values corresponding to the obtained data records, the obtained thread identification code, and the generated row primary key list into the cache of the database.

S204: Write the data records stored in the cache, the row primary key values corresponding to the stored data records, the stored thread identification code, and the stored row primary key list into the distributed file system of the database, and, after the write completes, take the thread identification code and row primary key list stored in the cache as data to be compared.

Here, the data stored in the cache in S203 is written to the distributed file system, that is, to an HFile of the HBase database. In practice, the data records stored in the cache, the row primary key values corresponding to the stored data records, the stored thread identification code, and the stored row primary key list are first transferred to the HStore of the HBase database, and are then written into the HFile by way of a commit. After the HFile has been written, the thread identification code and row primary key list in the cache are stored, as the data to be compared, on the server where the cache resides.

S205: Compare the reference data with the data to be compared, and, if the comparison shows that a data record was not written into the distributed file system of the database, write the file to be written into the database again.

The purpose of the comparison is to ensure the integrity of the data written into the distributed file system. If the comparison shows that a data record was not written into the distributed file system of the database, the file to be written is written into the database again; that is, steps S201 to S205 are re-executed.

Optionally, comparing the reference data with the data to be compared specifically includes:

determining whether the thread identification code in the data to be compared is consistent with the thread identification code in the reference data;

if they are consistent, comparing the row primary key list in the data to be compared with the row primary key list in the reference data;

if the row primary key list in the data to be compared is completely consistent with the row primary key list in the reference data, the comparison result is that no data record is missing from the distributed file system of the database;

if the row primary key list in the data to be compared is inconsistent with the row primary key list in the reference data, the comparison result is that a data record was not written into the distributed file system of the database.

In other words, it is first determined whether the thread IDs in the reference data and in the data to be compared are consistent; only when the thread IDs are consistent are the row primary key lists in the reference data and in the data to be compared checked for consistency.
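The two-stage comparison described above can be sketched as follows. Both inputs are modeled as hypothetical `(thread_id, row_key_list)` tuples. The source does not say what happens when the thread IDs differ, so raising an error in that case is an assumption, as is treating "completely consistent" as element-by-element list equality.

```python
def compare(reference, to_compare):
    """Return True when no data record is missing, False otherwise.

    Both arguments are (thread_id, row_key_list) tuples.
    """
    ref_thread_id, ref_row_keys = reference
    cmp_thread_id, cmp_row_keys = to_compare
    # Stage 1: the thread identification codes must match.
    if ref_thread_id != cmp_thread_id:
        # Behavior for mismatched IDs is not specified in the source.
        raise ValueError("thread identification codes do not match")
    # Stage 2: the row primary key lists must be completely consistent.
    return list(ref_row_keys) == list(cmp_row_keys)


complete = compare(("thread-7", ["a", "b"]), ("thread-7", ["a", "b"]))
incomplete = compare(("thread-7", ["a", "b"]), ("thread-7", ["a"]))
```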

Optionally, writing the file to be written into the database again when the comparison shows that a data record was not written into the distributed file system of the database specifically includes:

if the comparison result is that a data record was not written into the distributed file system of the database, searching the row primary key list of the reference data for the row primary key values that are missing from the row primary key list of the data to be compared;

according to the thread identification code in the reference data or in the data to be compared, obtaining from the thread the row primary key list that contains the missing row primary key value and the file to be written that corresponds to the obtained row primary key list, wherein the thread stores the correspondence between row primary key lists and files to be written;

writing the data records corresponding to the file to be written into the database again, and comparing the reference data with the data to be compared again, until the comparison result is that no data record is missing from the distributed file system of the database.

That is, the row primary key list containing the missing row primary key value is determined first, the thread ID is then found through that row primary key list, and finally the correspondence between row primary key lists and files to be written stored in the thread is used to find, from the row primary key list, the file to be written that needs to be rewritten. Steps S201 to S205 are then re-executed until the comparison result is that no data record is missing from the distributed file system of the database. This avoids data loss and ensures that the data is stored completely in the database.
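The lookup of missing row primary key values and of the file that must be rewritten can be sketched as below. The function names are hypothetical, and the correspondence kept by the thread is modeled as a list of `(row_key_list, file_name)` pairs, which is an assumption about its shape.

```python
def find_missing_row_keys(reference_row_keys, compared_row_keys):
    """Row keys present in the reference data but absent after the write."""
    present = set(compared_row_keys)
    return [key for key in reference_row_keys if key not in present]


def files_needing_rewrite(missing_row_keys, thread_mapping):
    """Find the files to be written whose key list contains a missing key.

    thread_mapping: (row_key_list, file_name) pairs stored in the thread.
    """
    missing = set(missing_row_keys)
    files = []
    for row_key_list, file_name in thread_mapping:
        if missing.intersection(row_key_list) and file_name not in files:
            files.append(file_name)
    return files


missing = find_missing_row_keys(["a", "b", "c"], ["a", "c"])
mapping = [(["a", "b", "c"], "orders.csv"), (["x"], "users.csv")]
to_rewrite = files_needing_rewrite(missing, mapping)
```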

It should be noted that when the same file to be written is rewritten, the rowkey values generated for it are identical, which also means that the row primary key list is identical. This avoids storing the same data record more than once.
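The effect described in this note can be shown with a short sketch, assuming content-derived rowkeys (SHA-256 here, which the source does not specify) and a dictionary standing in for the stored table: rewriting the same file overwrites rows in place instead of adding duplicates.

```python
import hashlib


def write_file(store, records):
    """Write each record under a rowkey derived from its content."""
    for record in records:
        row_key = hashlib.sha256(record).hexdigest()
        store[row_key] = record   # same rowkey -> same row, no duplicate
    return store


store = {}
write_file(store, [b"alice,30", b"bob,25"])
write_file(store, [b"alice,30", b"bob,25"])   # the rewrite after a failure
row_count = len(store)
```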

Optionally, after the reference data is compared with the data to be compared, the method further includes:

if the comparison result is that no data record is missing from the distributed file system of the database, deleting the reference data and the data to be compared;

according to the thread identification code in the reference data or in the data to be compared, sending a deletion prompt message to the thread, the deletion prompt message prompting the thread to delete the correspondence, stored in the thread, between the row primary key list and the file to be written.

If the comparison result is that no data record is missing from the distributed file system of the database, the reference data and the data to be compared are deleted, and the thread is informed to delete the correspondence it stores between the row primary key list and the file to be written. This frees part of the storage space and optimizes the use of system resources.
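The cleanup step can be sketched as below. All names are hypothetical: `comparison_state` holds the reference data and the data to be compared, and `thread_mappings` stands in for the correspondences stored in each thread; removing the entry models the effect of the deletion prompt message.

```python
def cleanup(complete, comparison_state, thread_mappings, thread_id):
    """Release bookkeeping once the comparison shows nothing is missing."""
    if not complete:
        return False   # keep everything; a rewrite is still needed
    comparison_state.pop("reference", None)
    comparison_state.pop("to_compare", None)
    # Stands in for the deletion prompt message sent to the thread.
    thread_mappings.pop(thread_id, None)
    return True


state = {"reference": ("thread-7", ["a"]), "to_compare": ("thread-7", ["a"])}
mappings = {"thread-7": [(["a"], "orders.csv")]}
cleaned = cleanup(True, state, mappings, "thread-7")
```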

In this embodiment of the invention, the data records in the file to be written are sent to a thread, so that the thread generates a corresponding row primary key value for each data record it receives. The data records corresponding to the file to be written, the row primary key values corresponding to the data records, and the thread identification code are obtained from the thread, and a row primary key list containing the correspondence between the data records and the row primary key values is generated; the obtained thread identification code and the generated row primary key list serve as reference data. The obtained data records, their row primary key values, the thread identification code, and the row primary key list are written into the cache of the database. The data records, row primary key values, thread identification code, and row primary key list stored in the cache are then written into the distributed file system of the database, and once the write completes, the thread identification code and row primary key list stored in the cache are taken as data to be compared. The reference data is compared with the data to be compared, and if the comparison shows that a data record was not written into the distributed file system of the database, the file to be written is written into the database again. In this way, each write is followed by a comparison that determines whether all of the data reached the database, which ensures the integrity of the stored data. Moreover, compared with recording every write in a log file, comparing row primary key lists consumes very few system resources and therefore does not affect data storage efficiency.

Referring to Fig. 3, Fig. 3 is a schematic structural diagram of the data writing device based on an HBase database provided by the third embodiment of the invention. For ease of description, only the parts related to this embodiment are shown. The device illustrated in Fig. 3 may be the executing entity of the data writing method based on an HBase database provided by the embodiments shown in Fig. 1 and Fig. 2. The device mainly includes an acquisition module 301 and a processing module 302. Each functional module is described in detail below:

The acquisition module 301 is configured to obtain, from a thread, the data records corresponding to the file to be written, the row primary key values corresponding to the data records, and the identification code of the thread, to generate a row primary key list containing the correspondence between the data records and the row primary key values, and to use the obtained thread identification code and the generated row primary key list as reference data;

The processing module 302 is configured to write the obtained data records, the row primary key values corresponding to the obtained data records, the obtained thread identification code, and the generated row primary key list into the cache of the database;

The processing module 302 is further configured to write the data records stored in the cache, the row primary key values corresponding to the stored data records, the stored thread identification code, and the stored row primary key list into the distributed file system of the database, and, after the write completes, to take the thread identification code and row primary key list stored in the cache as data to be compared;

The processing module 302 is further configured to compare the reference data with the data to be compared, and, if the comparison shows that a data record was not written into the distributed file system of the database, to write the file to be written into the database again.

Each thread corresponds to one thread identification code, that is, the thread's ID. A file to be written can be divided into multiple data records. Most of the data records in a file to be written can be assigned to a single thread, but they may also be distributed across multiple threads. The row primary key list contains the correspondence between the multiple data records and their row primary key values, and the row primary key list corresponds to the ID of the thread from which it was obtained. Optionally, the row primary key list may also be associated with the file to be written, and the correspondence between the row primary key list and the file to be written may be stored; at the same time, the thread is informed to store that correspondence. The cache of an HBase database is the MemStore.

The purpose of the comparison is to ensure the integrity of the data written into the distributed file system. If the comparison shows that a data record was not written into the distributed file system of the database, the processing module 302 writes the file to be written into the database again.
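The division of work between the two modules of Fig. 3 can be sketched with two small classes. The class and method names are hypothetical, and the cache and the distributed file system are plain dictionaries; this only illustrates which module owns which step, not how a real device would be built.

```python
class AcquisitionModule:
    """Stands in for acquisition module 301."""

    def acquire(self, thread_id, rows):
        """rows: row primary key value -> data record held by the thread."""
        reference = (thread_id, list(rows))   # thread ID + row key list
        return rows, reference


class ProcessingModule:
    """Stands in for processing module 302."""

    def __init__(self):
        self.cache = {}   # toy cache
        self.dfs = {}     # toy distributed file system

    def write(self, thread_id, rows, row_key_list):
        self.cache = {"thread_id": thread_id, "rows": dict(rows),
                      "row_key_list": list(row_key_list)}
        self.dfs.update(self.cache["rows"])
        # Data to be compared: taken from the cache after the write.
        return self.cache["thread_id"], self.cache["row_key_list"]

    def is_complete(self, reference, to_compare):
        return reference == to_compare


acquisition, processing = AcquisitionModule(), ProcessingModule()
rows, reference = acquisition.acquire("thread-7", {"a": b"1", "b": b"2"})
to_compare = processing.write(reference[0], rows, reference[1])
ok = processing.is_complete(reference, to_compare)
```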

For details not covered in this embodiment, refer to the description of the embodiment shown in Fig. 1, which is not repeated here.

It should be noted that, in the implementation of the data writing device based on an HBase database illustrated in Fig. 3 above, the division into functional modules is only an example. In practice, the functions described above may be allocated to different functional modules as needed, for example to suit the configuration requirements of the corresponding hardware or for convenience of software implementation; that is, the internal structure of the data writing device based on an HBase database may be divided into different functional modules to complete all or part of the functions described above. Moreover, in practice, a functional module in this embodiment may be implemented by corresponding hardware, or by corresponding hardware executing corresponding software. The principle described above applies to every embodiment provided in this specification and is not repeated below.

In this embodiment of the invention, the acquisition module 301 obtains, from a thread, the data records corresponding to the file to be written, the row primary key values corresponding to the data records, and the thread identification code, generates a row primary key list containing the correspondence between the data records and the row primary key values, and uses the obtained thread identification code and the generated row primary key list as reference data. The processing module 302 writes the obtained data records, their row primary key values, the thread identification code, and the row primary key list into the cache of the database. The processing module 302 then writes the data records, row primary key values, thread identification code, and row primary key list stored in the cache into the distributed file system of the database, and once the write completes, takes the thread identification code and row primary key list stored in the cache as data to be compared. The processing module 302 compares the reference data with the data to be compared, and if the comparison shows that a data record was not written into the distributed file system of the database, writes the file to be written into the database again. In this way, each write is followed by a comparison that determines whether all of the data reached the database, which ensures the integrity of the stored data. Moreover, compared with recording every write in a log file, comparing row primary key lists consumes very few system resources and therefore does not affect data storage efficiency.

Referring to Fig. 4, Fig. 4 is a schematic structural diagram of the data writing device based on an HBase database provided by the fourth embodiment of the invention. For ease of description, only the parts related to this embodiment are shown. The device illustrated in Fig. 4 may be the executing entity of the data writing method based on an HBase database provided by the embodiments shown in Fig. 1 and Fig. 2. The device mainly includes a sending module 401, an acquisition module 402, a processing module 403, a deletion module 404, and a prompt module 405. The processing module 403 includes a comparison submodule 4031, and further includes a lookup submodule 4032, an acquisition submodule 4033, and a reset submodule 4034. Each functional module is described in detail below:

Sending module 401, configured to send the data records in the file to be written to a thread, so that the thread generates a corresponding row primary key value for each data record it receives.

Acquisition module 402, configured to obtain from the thread the data records corresponding to the file to be written, the row primary key value corresponding to each data record, and the identification code of the thread, and to generate a row primary key list containing the correspondence between the data records and the row primary key values, with the obtained thread identification code and the generated row primary key list serving as reference data;

Processing module 403, configured to write the obtained data records, the row primary key values corresponding to the obtained data records, the obtained thread identification code and the generated row primary key list into the cache memory of the database;

The processing module 403 is further configured to write the data records stored in the cache memory, the row primary key values corresponding to the stored data records, the stored thread identification code and the stored row primary key list into the distributed file system of the database and, after the write completes, to take the thread identification code and row primary key list stored in the cache memory as data to be compared;

The processing module 403 is further configured to compare the reference data with the data to be compared and, if the comparison result is that some data record has not been written into the distributed file system of the database, to write the file to be written into the database again.

One data record corresponds to one row primary key value. In practice, a file to be written can be split into multiple data records; most of the data records of one file may be assigned to a single thread, but they may also be distributed across multiple threads. In practice, the thread generates the row primary key (rowkey) values by means of a hash algorithm. The row primary key list contains the correspondences between multiple data records and their row primary key values, and the row primary key list corresponds to the ID of the thread from which it was obtained.
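The relationship between data records, row primary key values and the row primary key list can be illustrated with a minimal Python sketch. The embodiment only states that a hash algorithm is used; the choice of MD5, the function names `make_row_key` and `build_row_key_list`, and the use of the Python thread identifier as the thread identification code are assumptions made purely for illustration.

```python
import hashlib
import threading


def make_row_key(record: bytes) -> str:
    """Derive a row key from the record bytes (MD5 is an illustrative choice only)."""
    return hashlib.md5(record).hexdigest()


def build_row_key_list(records):
    """Return (thread_id, row_key_list).

    Each list entry pairs a row key with the record it was generated from,
    which models the correspondence kept in the row primary key list.
    """
    thread_id = threading.get_ident()  # stands in for the thread identification code
    row_key_list = [(make_row_key(record), record) for record in records]
    return thread_id, row_key_list
```

Hashing only the record content gives identical records identical keys, so a real scheme would need an additional distinguishing input to keep one row key per record; the sketch leaves that out.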

Optionally, the sending module 401 is further configured to instruct the thread to store the correspondence between the row primary key list and the file to be written.

Optionally, the processing module 403 includes: a comparison submodule 4031;

Comparison submodule 4031, configured to judge whether the thread identification code in the data to be compared is consistent with the thread identification code in the reference data;

The comparison submodule 4031 is further configured, if they are consistent, to compare the row primary key list in the data to be compared with the row primary key list in the reference data;

The comparison submodule 4031 is further configured to determine, if the row primary key list in the data to be compared is completely consistent with the row primary key list in the reference data, that the comparison result is that no data record remains unwritten to the distributed file system of the database;

The comparison submodule 4031 is further configured to determine, if the row primary key list in the data to be compared is inconsistent with the row primary key list in the reference data, that the comparison result is that some data record has not been written into the distributed file system of the database.
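The two-stage check performed by the comparison submodule can be sketched as follows. This is a minimal sketch under stated assumptions: the `Snapshot` type and the function `has_unwritten_records` are hypothetical names, row keys are compared without regard to order, and raising an error when the thread identification codes differ is an assumption, since the embodiment does not say what happens in that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Thread identification code plus the row keys of one row primary key list."""
    thread_id: int
    row_keys: tuple


def has_unwritten_records(reference: Snapshot, compared: Snapshot) -> bool:
    """Return True when the row key lists differ, i.e. some record was not written."""
    # Stage 1: both snapshots must refer to the same thread.
    if reference.thread_id != compared.thread_id:
        raise ValueError("snapshots belong to different threads")  # assumed handling
    # Stage 2: compare the row key lists; a mismatch means unwritten records exist.
    return sorted(reference.row_keys) != sorted(compared.row_keys)
```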

Optionally, the processing module 403 further includes: a search submodule 4032, an acquisition submodule 4033, a reset submodule 4034, a deletion submodule 4035 and a prompt submodule 4036;

Search submodule 4032, configured, if the comparison result is that some data record has not been written into the distributed file system of the database, to search the row primary key list of the reference data for the row primary key values missing from the row primary key list of the data to be compared;

Acquisition submodule 4033, configured to obtain from the thread, according to the thread identification code in the reference data or in the data to be compared, the row primary key list in which the missing row primary key value is located and the file to be written corresponding to the obtained row primary key list, where the thread stores the correspondence between row primary key lists and files to be written;

Reset submodule 4034, configured to write the data records corresponding to the file to be written into the database again and to compare the reference data with the data to be compared again, until the comparison result is that no data record remains unwritten to the distributed file system of the database.

Deletion submodule 4035, configured to delete the reference data and the data to be compared if the comparison result is that no data record remains unwritten to the distributed file system of the database;

Prompt submodule 4036, configured to send a deletion prompt message to the thread according to the thread identification code in the reference data or in the data to be compared, the deletion prompt message prompting the thread to delete the stored correspondence between the row primary key list and the file to be written.
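The search and rewrite steps carried out by submodules 4032 to 4034 can be sketched as below. The sketch makes several assumptions that are not part of the embodiment: a plain dictionary stands in for the database, `file_records` maps each row key of one file to its record, and the `max_attempts` bound replaces the open-ended "until consistent" loop so that the example always terminates.

```python
def missing_row_keys(reference_keys, compared_keys):
    """Row keys present in the reference list but absent from the compared list."""
    seen = set(compared_keys)
    return [key for key in reference_keys if key not in seen]


def rewrite_until_complete(file_records, store, max_attempts=3):
    """Re-write a file's records into `store` until no row key is missing.

    file_records: dict mapping row key -> record for one file to be written.
    store: dict standing in for the database (an assumption of this sketch).
    Returns True once the comparison finds nothing missing.
    """
    reference_keys = list(file_records)
    for _ in range(max_attempts):
        if not missing_row_keys(reference_keys, store):
            return True
        store.update(file_records)  # write the whole file's records again
    return not missing_row_keys(reference_keys, store)
```

Once the function returns True, the reference data and the data to be compared are no longer needed, which corresponds to the point at which the deletion and prompt submodules act.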

For details not covered in this embodiment, refer to the description of the embodiments shown in Fig. 1 and Fig. 2 above; they are not repeated here.

In this embodiment of the present invention, the sending module 401 sends the data records in the file to be written to a thread, so that the thread generates a corresponding row primary key value for each data record it receives. The acquisition module 402 obtains from the thread the data records corresponding to the file to be written, the row primary key value corresponding to each data record, and the identification code of the thread, and generates a row primary key list containing the correspondence between the data records and the row primary key values; the obtained thread identification code and the generated row primary key list serve as reference data. The processing module 403 writes the obtained data records, their corresponding row primary key values, the obtained thread identification code and the generated row primary key list into the cache memory of the database, then writes the data records stored in the cache memory, their corresponding row primary key values, the stored thread identification code and the stored row primary key list into the distributed file system of the database and, once the write completes, takes the thread identification code and row primary key list stored in the cache memory as data to be compared. It then compares the reference data with the data to be compared; if the comparison result is that some data record has not been written into the distributed file system of the database, the file to be written is written into the database again. In this way, after each write a comparison determines whether all of the data has been written into the database, which ensures the integrity of data storage. Moreover, compared with recording by means of a log file, the present invention compares row primary key lists, which occupies very few system resources and therefore does not affect data storage efficiency.
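The overall flow summarized above (reference data, cache write, flush to the distributed file system, comparison, rewrite) can be sketched end to end. Everything here is a simplified in-memory model rather than the HBase client API: the `FakeDatabase` class, its `drop_once` fault injection and the `max_attempts` limit are hypothetical and exist only so that the retry path can be exercised.

```python
class FakeDatabase:
    """In-memory stand-in: `cache` models the cache memory, `dfs` the distributed file system."""

    def __init__(self, drop_once=()):
        self.cache = {}
        self.dfs = {}
        self._drop = set(drop_once)  # row keys lost on the first cache write only

    def put_to_cache(self, thread_id, records_by_key):
        kept = {k: v for k, v in records_by_key.items() if k not in self._drop}
        self._drop.clear()
        self.cache[thread_id] = kept

    def flush(self, thread_id):
        self.dfs.update(self.cache[thread_id])

    def cached_row_keys(self, thread_id):
        return sorted(self.cache[thread_id])


def write_file(db, thread_id, records_by_key, max_attempts=3):
    """Write one file's records, verify by row key list, rewrite on mismatch.

    Returns the number of attempts that were needed.
    """
    reference = (thread_id, sorted(records_by_key))  # reference data
    for attempt in range(1, max_attempts + 1):
        db.put_to_cache(thread_id, records_by_key)
        db.flush(thread_id)
        compared = (thread_id, db.cached_row_keys(thread_id))  # data to be compared
        if compared == reference:
            return attempt
    raise RuntimeError("records still missing after retries")
```

Only thread identifiers and row keys are compared, never record contents, which reflects the stated reason the check is cheap relative to log-based recording.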

In the several embodiments provided in this application, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into modules is only a division by logical function, and other divisions are possible in an actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection between devices or modules through some interfaces, and may be electrical, mechanical or of another form.

The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in each embodiment of the present invention may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.

If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc.

It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions, but those skilled in the art should know that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.

In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the relevant descriptions of the other embodiments.

The above is a description of the data writing method and device based on an Hbase database provided by the present invention. Those skilled in the art may, following the ideas of the embodiments of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A data writing method based on an Hbase database, characterized by comprising:

obtaining from a thread the data records corresponding to a file to be written, the row primary key value corresponding to each data record, and the identification code of the thread, and generating a row primary key list containing the correspondence between the data records and the row primary key values, with the obtained thread identification code and the generated row primary key list serving as reference data;

writing the obtained data records, the row primary key values corresponding to the obtained data records, the obtained thread identification code and the generated row primary key list into the cache memory of the database;

writing the data records stored in the cache memory, the row primary key values corresponding to the stored data records, the stored thread identification code and the stored row primary key list into the distributed file system of the database and, after the write completes, taking the thread identification code and row primary key list stored in the cache memory as data to be compared;

comparing the reference data with the data to be compared and, if the comparison result is that some data record has not been written into the distributed file system of the database, writing the file to be written into the database again.

2. The method according to claim 1, characterized in that, before obtaining from the thread the data records corresponding to the file to be written, the row primary key value corresponding to each data record, and the identification code of the thread, the method further comprises:

sending the data records in the file to be written to the thread, so that the thread generates a corresponding row primary key value for each data record it receives.

3. The method according to claim 1, characterized in that comparing the reference data with the data to be compared comprises:

if the row primary key list in the data to be compared is completely consistent with the row primary key list in the reference data, the comparison result is that no data record remains unwritten to the distributed file system of the database;

if the row primary key list in the data to be compared is inconsistent with the row primary key list in the reference data, the comparison result is that some data record has not been written into the distributed file system of the database.

4. The method according to claim 3, characterized in that, if the comparison result is that some data record has not been written into the distributed file system of the database, writing the file to be written into the database again comprises:

if the comparison result is that some data record has not been written into the distributed file system of the database, searching the row primary key list of the reference data for the row primary key values missing from the row primary key list of the data to be compared;

obtaining from the thread, according to the thread identification code in the reference data or in the data to be compared, the row primary key list in which the missing row primary key value is located and the file to be written corresponding to the obtained row primary key list, where the thread stores the correspondence between row primary key lists and files to be written;

5. The method according to claim 3, characterized in that, after comparing the reference data with the data to be compared, the method further comprises:

if the comparison result is that no data record remains unwritten to the distributed file system of the database, deleting the reference data and the data to be compared;

sending a deletion prompt message to the thread according to the thread identification code in the reference data or in the data to be compared, the deletion prompt message prompting the thread to delete the stored correspondence between the row primary key list and the file to be written.

6. A data writing device based on an Hbase database, characterized in that the device comprises:

an acquisition module, configured to obtain from a thread the data records corresponding to a file to be written, the row primary key value corresponding to each data record, and the identification code of the thread, and to generate a row primary key list containing the correspondence between the data records and the row primary key values, with the obtained thread identification code and the generated row primary key list serving as reference data;

a processing module, configured to write the obtained data records, the row primary key values corresponding to the obtained data records, the obtained thread identification code and the generated row primary key list into the cache memory of the database;

the processing module being further configured to write the data records stored in the cache memory, the row primary key values corresponding to the stored data records, the stored thread identification code and the stored row primary key list into the distributed file system of the database and, after the write completes, to take the thread identification code and row primary key list stored in the cache memory as data to be compared;

the processing module being further configured to compare the reference data with the data to be compared and, if the comparison result is that some data record has not been written into the distributed file system of the database, to write the file to be written into the database again.

7. The device according to claim 6, characterized in that the device further comprises:

a sending module, configured to send the data records in the file to be written to the thread, so that the thread generates a corresponding row primary key value for each data record it receives.

8. The device according to claim 6, characterized in that the processing module comprises:

a comparison submodule, configured to judge whether the thread identification code in the data to be compared is consistent with the thread identification code in the reference data;

the comparison submodule being further configured, if they are consistent, to compare the row primary key list in the data to be compared with the row primary key list in the reference data;

the comparison submodule being further configured to determine, if the row primary key list in the data to be compared is completely consistent with the row primary key list in the reference data, that the comparison result is that no data record remains unwritten to the distributed file system of the database;

the comparison submodule being further configured to determine, if the row primary key list in the data to be compared is inconsistent with the row primary key list in the reference data, that the comparison result is that some data record has not been written into the distributed file system of the database.

9. The device according to claim 8, characterized in that the processing module further comprises:

a search submodule, configured, if the comparison result is that some data record has not been written into the distributed file system of the database, to search the row primary key list of the reference data for the row primary key values missing from the row primary key list of the data to be compared;

an acquisition submodule, configured to obtain from the thread, according to the thread identification code in the reference data or in the data to be compared, the row primary key list in which the missing row primary key value is located and the file to be written corresponding to the obtained row primary key list, where the thread stores the correspondence between row primary key lists and files to be written;

a reset submodule, configured to write the data records corresponding to the file to be written into the database again and to compare the reference data with the data to be compared again, until the comparison result is that no data record remains unwritten to the distributed file system of the database.

10. The device according to claim 8, characterized in that the processing module further comprises:

a deletion submodule, configured to delete the reference data and the data to be compared if the comparison result is that no data record remains unwritten to the distributed file system of the database;

a prompt submodule, configured to send a deletion prompt message to the thread according to the thread identification code in the reference data or in the data to be compared, the deletion prompt message prompting the thread to delete the stored correspondence between the row primary key list and the file to be written.