CN106776795A - Data writing method and device based on HBase database - Google Patents

Data writing method and device based on HBase database

Info

Publication number
CN106776795A
Authority
CN
China
Prior art keywords
data
thread
row
primary keys
compared
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611047256.3A
Other languages
Chinese (zh)
Other versions
CN106776795B (en)
Inventor
黄健文
王刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201611047256.3A
Publication of CN106776795A
Application granted
Publication of CN106776795B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Abstract

The invention discloses a data writing method and device based on an HBase database. The method includes: taking the obtained thread identification code and the generated row key list as reference data; writing the data records stored in the cache, the row key values corresponding to the stored data records, the stored thread identification code and the stored row key list into the distributed file system of the database; after the write completes, taking the thread identification code and row key list stored in the cache as data to be compared; and comparing the reference data with the data to be compared. If the comparison shows that a data record was not written into the distributed file system of the database, the file to be written is written into the database again, which ensures the integrity of the stored data. Compared with recording writes in a log file, the invention performs the check by comparing row key lists, which occupies very few system resources and therefore does not affect data storage efficiency.

Description

Data writing method and device based on HBase database
Technical field
The invention belongs to the field of data storage, and in particular relates to a data writing method and device based on an HBase database.
Background art
Most current cloud storage systems adopt distributed storage, dispersing data across many independent devices. On the one hand, this improves database performance and the efficiency of reading data. On the other hand, owing to the distributed storage architecture, when a storage device fails only access to the local data is affected and the whole database is not paralyzed, which increases the security and reliability of the data. The Hadoop database (HBase, Hadoop Database) is one such distributed storage system. Although an HBase database ensures that a storage device failure does not affect access to all the data in the database, it cannot prevent failures during data writing, and such failures make the target data impossible to retrieve by query.
In the prior art, write-ahead logging (WAL) is the standard method of ensuring data integrity. If the database is corrupted, it is recovered from the log stored in advance by the WAL. That log must record every storage operation, so it occupies a large amount of the system's storage and also consumes the system's I/O resources. Once the volume of stored data grows, data storage efficiency inevitably drops.
Summary of the invention
The invention provides a data writing method and device based on an HBase database, intended to solve the problem in the prior art that the log stored in advance occupies a large amount of system resources and thereby reduces data storage efficiency.
The data writing method based on an HBase database provided by the invention includes: obtaining, from a thread, the data records corresponding to a file to be written, the row key value corresponding to each data record, and the identification code of the thread; generating a row key list that contains the correspondence between the data records and the row key values; taking the obtained thread identification code and the generated row key list as reference data; writing the obtained data records, their corresponding row key values, the obtained thread identification code and the generated row key list into the cache of the database; writing the data records, corresponding row key values, thread identification code and row key list stored in the cache into the distributed file system of the database and, after the write completes, taking the thread identification code and row key list stored in the cache as data to be compared; and comparing the reference data with the data to be compared and, if the comparison shows that a data record was not written into the distributed file system of the database, writing the file to be written into the database again.
The data writing device based on an HBase database provided by the invention includes an acquisition module and a processing module. The acquisition module is configured to obtain, from a thread, the data records corresponding to a file to be written, the row key value corresponding to each data record, and the identification code of the thread; to generate a row key list that contains the correspondence between the data records and the row key values; and to take the obtained thread identification code and the generated row key list as reference data. The processing module is configured to write the obtained data records, their corresponding row key values, the obtained thread identification code and the generated row key list into the cache of the database. The processing module is further configured to write the data records, corresponding row key values, thread identification code and row key list stored in the cache into the distributed file system of the database and, after the write completes, to take the thread identification code and row key list stored in the cache as data to be compared. The processing module is further configured to compare the reference data with the data to be compared and, if the comparison shows that a data record was not written into the distributed file system of the database, to write the file to be written into the database again.
With the data writing method and device based on an HBase database provided by the invention, the data records corresponding to a file to be written, the row key value corresponding to each data record, and the thread's identification code are obtained from a thread; a row key list containing the correspondence between the data records and the row key values is generated; and the obtained thread identification code and generated row key list are taken as reference data. The obtained data records, their row key values, the thread identification code and the row key list are written into the cache of the database, and the contents of the cache are then written into the distributed file system of the database. After the write completes, the thread identification code and row key list stored in the cache are taken as data to be compared, and the reference data is compared with them. If the comparison shows that a data record was not written into the distributed file system of the database, the file to be written is written into the database again. In this way, each write is followed by a comparison that determines whether all the data reached the database, which ensures the integrity of the stored data. Compared with recording writes in a log file, comparing row key lists occupies very few system resources and therefore does not affect data storage efficiency.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention.
Fig. 1 is a schematic flowchart of the data writing method based on an HBase database provided by the first embodiment of the invention;
Fig. 2 is a schematic flowchart of the data writing method based on an HBase database provided by the second embodiment of the invention;
Fig. 3 is a schematic structural diagram of the data writing device based on an HBase database provided by the third embodiment of the invention;
Fig. 4 is a schematic structural diagram of the data writing device based on an HBase database provided by the fourth embodiment of the invention.
Detailed description of the embodiments
To make the objects, features and advantages of the invention more apparent and understandable, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art on the basis of the embodiments of the invention without creative effort fall within the scope of protection of the invention.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of the data writing method based on an HBase database provided by the first embodiment of the invention. The method can be applied in a terminal with a data processing function, such as a computer. The data writing method based on an HBase database shown in Fig. 1 mainly includes the following steps:
S101: obtain, from a thread, the data records corresponding to a file to be written, the row key value corresponding to each data record, and the identification code of the thread; generate a row key list that contains the correspondence between the data records and the row key values; and take the obtained thread identification code and the generated row key list as reference data.
An HBase database includes multiple threads, which are used for allocation and scheduling. Each thread corresponds to one thread identification code, that is, the thread's ID (identification). In practice, a file to be written can be split into multiple data records. Most of the data records of a file to be written are assigned to a single thread, but they may also be assigned to multiple threads. Each data record corresponds to one row key value (rowkey). The row key list contains the correspondence between the multiple data records and their row key values, and the row key list corresponds to the ID of the thread from which it was obtained.
Optionally, the row key list can also be associated with the file to be written: the correspondence between the row key list and the file to be written is stored, and the thread is notified to store that correspondence as well.
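As an illustration of the reference data built in S101, the following is a minimal in-memory sketch rather than the patented implementation. The names `ReferenceData` and `build_reference_data`, the string types, and the sample values are assumptions introduced here; the patent only states that the reference data consists of a thread identification code and a row key list relating data records to row key values.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ReferenceData:
    """Reference data of S101: a thread ID plus that thread's row key list."""
    thread_id: str
    # Row key list, modelled here as: row key value -> data record.
    row_key_list: Dict[str, str]


def build_reference_data(thread_id: str,
                         keyed_records: Iterable[Tuple[str, str]]) -> ReferenceData:
    """Build the row key list from (row key, record) pairs obtained from a thread."""
    return ReferenceData(thread_id=thread_id, row_key_list=dict(keyed_records))


# Hypothetical sample: two records of one file, both assigned to thread "t-1".
reference = build_reference_data("t-1", [("rk-001", "alice,30"), ("rk-002", "bob,41")])
print(reference.thread_id, list(reference.row_key_list))
```

Keeping only the thread ID and the key list, instead of a full operation log, is what the text relies on when it describes the check as cheap.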
S102: write the obtained data records, their corresponding row key values, the obtained thread identification code and the generated row key list into the cache of the database.
The cache of an HBase database is the MemStore. In practice, the obtained data records, their corresponding row key values, the obtained thread identification code and the generated row key list are written into the in-memory MemStore.
S103: write the data records, corresponding row key values, thread identification code and row key list stored in the cache into the distributed file system of the database and, after the write completes, take the thread identification code and row key list stored in the cache as data to be compared.
That is, the data stored in the cache in S102 is written to the distributed file system, namely the HFile of the HBase database.
S104: compare the reference data with the data to be compared; if the comparison shows that a data record was not written into the distributed file system of the database, write the file to be written into the database again.
The comparison ensures the integrity of the data written to the distributed file system. If the comparison shows that a data record was not written into the distributed file system of the database, the file to be written is written into the database again, that is, steps S101 to S104 are re-executed.
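Steps S101 to S104 can be sketched as a write, verify and retry loop. This is a simplified simulation under stated assumptions, not HBase code: the cache and the file system are plain dictionaries, `flush` is a hypothetical callback that returns what was actually persisted, the data to be compared is assumed to reflect that persisted state, and `max_attempts` is an added safety bound (the text simply repeats the steps until nothing is missing).

```python
from typing import Callable, Dict, Tuple

Records = Dict[str, str]  # row key value -> data record


def write_with_verification(thread_id: str,
                            records: Records,
                            flush: Callable[[Records, int], Records],
                            max_attempts: int = 3) -> Tuple[int, Records]:
    """Repeat S101-S104 until the row key lists match or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        # S101: thread ID plus row key list form the reference data.
        reference = (thread_id, sorted(records))
        # S102: stage everything in the cache (the MemStore in the text).
        cache = {"thread_id": thread_id, "records": dict(records)}
        # S103: flush the cache; `flush` returns the records that were persisted.
        persisted = flush(cache["records"], attempt)
        to_compare = (cache["thread_id"], sorted(persisted))
        # S104: identical thread ID and row key list means nothing was lost.
        if reference == to_compare:
            return attempt, persisted
    raise RuntimeError(f"records still missing after {max_attempts} attempts")


def flaky_flush(cached: Records, attempt: int) -> Records:
    """Hypothetical flush that loses the last record on the first attempt only."""
    items = list(cached.items())
    return dict(items[:-1]) if attempt == 1 else dict(items)


attempts, stored = write_with_verification("t-1", {"rk-a": "r1", "rk-b": "r2"}, flaky_flush)
print(attempts, stored)
```

In this simulated run the first flush drops one record, the comparison detects the mismatch, and the second attempt succeeds.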
In this embodiment of the invention, the data records corresponding to a file to be written, the row key value corresponding to each data record, and the thread's identification code are obtained from a thread; a row key list containing the correspondence between the data records and the row key values is generated; and the obtained thread identification code and generated row key list are taken as reference data. The obtained data records, their row key values, the thread identification code and the row key list are written into the cache of the database, and the contents of the cache are then written into the distributed file system of the database. After the write completes, the thread identification code and row key list stored in the cache are taken as data to be compared, and the reference data is compared with them. If the comparison shows that a data record was not written into the distributed file system of the database, the file to be written is written into the database again. In this way, each write is followed by a comparison that determines whether all the data reached the database, which ensures the integrity of the stored data. Compared with recording writes in a log file, comparing row key lists occupies very few system resources and therefore does not affect data storage efficiency.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of the data writing method based on an HBase database provided by the second embodiment of the invention. The method can be applied in a terminal with a data processing function, such as a computer. The data writing method based on an HBase database shown in Fig. 2 mainly includes the following steps:
S201: send the data records in the file to be written to a thread, so that the thread generates a corresponding row key value for each data record it receives.
Each data record corresponds to one row key value. In practice, a file to be written can be split into multiple data records. Most of the data records of a file to be written are assigned to a single thread, but they may also be assigned to multiple threads. In practice, the thread generates the rowkey values using a hash algorithm.
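The row key generation in S201 can be sketched as follows. The patent only says that a hash algorithm is used; the choice of MD5, the hex encoding, and the names `make_row_key` and `key_records` are assumptions made for this illustration.

```python
import hashlib
from typing import List, Tuple


def make_row_key(record: str) -> str:
    """Derive a row key value from the record content (MD5 is an assumed choice)."""
    return hashlib.md5(record.encode("utf-8")).hexdigest()


def key_records(records: List[str]) -> List[Tuple[str, str]]:
    """Pair each data record received by a thread with its row key value."""
    return [(make_row_key(record), record) for record in records]


# Hypothetical sample records from one file to be written.
keyed = key_records(["alice,30", "bob,41"])
for row_key, record in keyed:
    print(row_key, record)
```

Because the key depends only on the record content, hashing the same record again yields the same row key value.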
S202: obtain, from the thread, the data records corresponding to the file to be written, the row key value corresponding to each data record, and the identification code of the thread; generate a row key list that contains the correspondence between the data records and the row key values; and take the obtained thread identification code and the generated row key list as reference data.
The row key list contains the correspondence between the multiple data records and their row key values, and the row key list corresponds to the ID of the thread from which it was obtained.
Optionally, the thread can also be notified to store the correspondence between the row key list and the file to be written.
S203: write the obtained data records, their corresponding row key values, the obtained thread identification code and the generated row key list into the cache of the database.
The cache of an HBase database is the MemStore. In practice, the obtained data records, their corresponding row key values, the obtained thread identification code and the generated row key list are written into the in-memory MemStore.
S204: write the data records, corresponding row key values, thread identification code and row key list stored in the cache into the distributed file system of the database and, after the write completes, take the thread identification code and row key list stored in the cache as data to be compared.
That is, the data stored in the cache in S203 is written to the distributed file system, namely the HFile of the HBase database. In practice, the data records, corresponding row key values, thread identification code and row key list stored in the cache are first passed to the HStore of the HBase database and then written into the HFile by means of a commit. After the HFile has been written, the thread identification code and row key list in the cache are stored, as the data to be compared, on the server where the cache resides.
S205: compare the reference data with the data to be compared; if the comparison shows that a data record was not written into the distributed file system of the database, write the file to be written into the database again.
The comparison ensures the integrity of the data written to the distributed file system. If the comparison shows that a data record was not written into the distributed file system of the database, the file to be written is written into the database again, that is, steps S201 to S205 are re-executed.
Optionally, comparing the reference data with the data to be compared specifically includes:
determining whether the thread identification code in the data to be compared is consistent with the thread identification code in the reference data;
if they are consistent, comparing the row key list in the data to be compared with the row key list in the reference data;
if the row key list in the data to be compared is completely consistent with the row key list in the reference data, the comparison result is that no data record is missing from the distributed file system of the database;
if the row key list in the data to be compared is inconsistent with the row key list in the reference data, the comparison result is that a data record was not written into the distributed file system of the database.
That is, it is first determined whether the thread IDs in the reference data and in the data to be compared are consistent; only if the thread IDs are consistent are the row key lists in the reference data and in the data to be compared checked for consistency.
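The two-stage comparison described above can be sketched as a small function. The names `Comparison` and `compare` are assumptions, as is the `THREAD_MISMATCH` outcome: the patent does not say what happens when the thread IDs differ, so this sketch only reports that case separately instead of guessing at a recovery step.

```python
from enum import Enum
from typing import List, Tuple

Snapshot = Tuple[str, List[str]]  # (thread ID, row key list)


class Comparison(Enum):
    COMPLETE = "no data record is missing"
    MISSING_RECORDS = "a data record was not written"
    THREAD_MISMATCH = "thread IDs differ"  # outcome not covered by the patent


def compare(reference: Snapshot, to_compare: Snapshot) -> Comparison:
    """Compare thread IDs first, then the row key lists."""
    ref_thread, ref_keys = reference
    cmp_thread, cmp_keys = to_compare
    if ref_thread != cmp_thread:
        return Comparison.THREAD_MISMATCH
    if list(ref_keys) == list(cmp_keys):
        return Comparison.COMPLETE
    return Comparison.MISSING_RECORDS


# Hypothetical snapshots for thread "t-1".
ok = compare(("t-1", ["rk-a", "rk-b"]), ("t-1", ["rk-a", "rk-b"]))
lost = compare(("t-1", ["rk-a", "rk-b"]), ("t-1", ["rk-a"]))
print(ok, lost)
```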
Optionally, writing the file to be written into the database again when the comparison shows that a data record was not written into the distributed file system of the database specifically includes:
if the comparison result is that a data record was not written into the distributed file system of the database, searching the row key list of the reference data for the row key values that are missing from the row key list of the data to be compared;
according to the thread identification code in the reference data or in the data to be compared, obtaining from the thread the row key list that contains the missing row key value and the file to be written that corresponds to that row key list, where the thread stores the correspondence between row key lists and files to be written;
writing the data records corresponding to that file to be written into the database again, and comparing the reference data with the data to be compared again, until the comparison result is that no data record is missing from the distributed file system of the database.
That is, the row key list containing the missing row key value is determined first; the thread ID is then found through that row key list; finally, using the correspondence between row key lists and files to be written stored in the thread, the file to be written that needs to be re-written is found from the row key list. Steps S201 to S205 are then re-executed until the comparison result is that no data record is missing from the distributed file system of the database. This avoids data loss and ensures that the data is stored completely in the database.
Note that when the same file to be written is written again, the rowkey values generated for it are the same, which also means that the row key list is the same. This avoids storing identical data records twice.
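The lookup of missing row key values and of the file that has to be re-written can be sketched as follows. The function names and the shape of `thread_mappings` (file name to row key list, as kept by one thread) are assumptions for illustration; the patent only states that the thread stores the correspondence between row key lists and files to be written.

```python
from typing import Dict, List


def find_missing_row_keys(reference_keys: List[str], compared_keys: List[str]) -> List[str]:
    """Row key values present in the reference data but absent after the write."""
    present = set(compared_keys)
    return [key for key in reference_keys if key not in present]


def files_to_rewrite(missing_keys: List[str],
                     thread_mappings: Dict[str, List[str]]) -> List[str]:
    """Files whose row key list contains at least one missing row key value."""
    missing = set(missing_keys)
    return [name for name, keys in thread_mappings.items() if missing & set(keys)]


# Hypothetical state: the thread handled two files, and one key of b.csv was lost.
mappings = {"a.csv": ["rk-1", "rk-2"], "b.csv": ["rk-3", "rk-4"]}
missing = find_missing_row_keys(["rk-1", "rk-2", "rk-3", "rk-4"], ["rk-1", "rk-2", "rk-3"])
print(missing, files_to_rewrite(missing, mappings))
```

Only the file that owns a missing key is selected for re-writing, so files that were stored completely are left alone.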
Optionally, after the reference data is compared with the data to be compared, the method further includes:
if the comparison result is that no data record is missing from the distributed file system of the database, deleting the reference data and the data to be compared;
according to the thread identification code in the reference data or in the data to be compared, sending a deletion prompt message to the thread, where the deletion prompt message prompts the thread to delete the correspondence, stored in the thread, between the row key list and the file to be written.
If the comparison result is that no data record is missing from the distributed file system of the database, the reference data and the data to be compared are deleted, and the thread is notified to delete the stored correspondence between the row key list and the file to be written. This releases part of the storage space and optimizes the use of system resources.
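The cleanup after a successful comparison can be sketched with plain dictionaries. Everything here is an assumed model: `snapshots` stands for wherever the reference data and data to be compared are kept, `thread_mappings` stands for the correspondence stored in each thread, and removing an entry from `thread_mappings` stands in for the deletion prompt message.

```python
from typing import Dict, List, Tuple

# thread ID -> (reference row key list, row key list to be compared)
Snapshots = Dict[str, Tuple[List[str], List[str]]]
# thread ID -> {file name -> row key list}
ThreadMappings = Dict[str, Dict[str, List[str]]]


def release_after_success(complete: bool, thread_id: str, file_name: str,
                          snapshots: Snapshots, thread_mappings: ThreadMappings) -> bool:
    """Free the comparison data and the thread's mapping once nothing is missing."""
    if not complete:
        return False  # keep everything: it is still needed for the re-write
    snapshots.pop(thread_id, None)
    thread_mappings.get(thread_id, {}).pop(file_name, None)
    return True


# Hypothetical state after a successful write of a.csv by thread "t-1".
snapshots = {"t-1": (["rk-1"], ["rk-1"])}
thread_mappings = {"t-1": {"a.csv": ["rk-1"]}}
released = release_after_success(True, "t-1", "a.csv", snapshots, thread_mappings)
print(released, snapshots, thread_mappings)
```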
In this embodiment of the invention, the data records in the file to be written are sent to a thread, so that the thread generates a corresponding row key value for each data record it receives. The data records corresponding to the file to be written, the row key value corresponding to each data record, and the thread's identification code are obtained from the thread; a row key list containing the correspondence between the data records and the row key values is generated; and the obtained thread identification code and generated row key list are taken as reference data. The obtained data records, their row key values, the thread identification code and the row key list are written into the cache of the database, and the contents of the cache are then written into the distributed file system of the database. After the write completes, the thread identification code and row key list stored in the cache are taken as data to be compared, and the reference data is compared with them. If the comparison shows that a data record was not written into the distributed file system of the database, the file to be written is written into the database again. In this way, each write is followed by a comparison that determines whether all the data reached the database, which ensures the integrity of the stored data. Compared with recording writes in a log file, comparing row key lists occupies very few system resources and therefore does not affect data storage efficiency.
Fig. 3 is referred to, Fig. 3 is the data transfer apparatus based on Hbase databases that third embodiment of the invention is provided Structural representation, for convenience of description, illustrate only the part related to the embodiment of the present invention.Fig. 3 examples based on Hbase numbers Can be that the data based on Hbase databases that earlier figures 1 and embodiment illustrated in fig. 2 are provided write according to the data transfer apparatus in storehouse The executive agent of method.The data transfer apparatus based on Hbase databases of Fig. 3 examples, mainly include:The He of acquisition module 301 Processing module 302.Each functional module describes in detail as follows above:
Acquisition module 301, it is corresponding for obtaining the corresponding data record of file to be written, the data record from thread The identification code of row Major key and the thread, and generation includes the corresponding relation between the data record and the row Major key Row list of primary keys, while using the row list of primary keys of the identification code of the thread of the acquisition and the generation as reference data;
Processing module 302, for by the data record of the acquisition, the corresponding row Major key of the data record of the acquisition, should In cache in the identification code of the thread of acquisition and the row list of primary keys write into Databasce of the generation;
Processing module 302, be additionally operable to will in the cache store data record, the data record of the storage it is corresponding Row Major key, the identification code of thread of storage and the row list of primary keys of storage write the distributed file system in the database In, and after the completion of write-in, the identification code and row list of primary keys of thread in the cache are will be stored in as number to be compared According to;
Processing module 302, is additionally operable to compare the reference data with the data to be compared, if comparison result is presence In thering is data record not write the distributed file system of the database, then the file to be written is write into the database again In.
One thread corresponds to an identification code for thread, the i.e. ID of thread.One file to be written is segmented into multiple Data record, the most of data records in a file to be written can be assigned in a thread, but are also assigned to many Possibility in individual thread.Comprising the corresponding relation between multiple data records and the row Major key in the row list of primary keys, its In the row list of primary keys with obtain thread ID it is corresponding.Alternatively, can also be by the row list of primary keys and the text to be written Part it is corresponding, and the row list of primary keys and the corresponding relation of the file to be written are stored, and inform that thread is deposited simultaneously Store up the corresponding relation of the row list of primary keys and the file to be written.The cache of Hbase databases is MemStore.
The process of comparison is in order to ensure the integrality of the data being written in distributed file system.If comparison result is There are during data record do not write the distributed file system of the database, then processing module 302 is again by the text to be written Part is write in the database.
The present embodiment details not to the greatest extent, refers to the description of foregoing embodiment illustrated in fig. 1, and here is omitted.
It should be noted that in the implementation method of the data transfer apparatus based on Hbase databases of figure 3 above example, The division of each functional module is merely illustrative of, in practical application can as needed, such as the configuration requirement of corresponding hardware or The convenient consideration of the realization of person's software, and above-mentioned functions distribution is completed by different functional modules, Hbase data will be based on The internal structure of the data transfer apparatus in storehouse is divided into different functional modules, to complete all or part of work(described above Energy.And, in practical application, the corresponding functional module in the present embodiment can be realized by corresponding hardware, it is also possible to by Corresponding hardware performs corresponding software and completes.Each embodiment that this specification is provided can all apply foregoing description principle, with Under repeat no more.
In this embodiment of the invention, acquisition module 301 obtains from a thread the data records of the file to be written, the row key value of each data record, and the thread's identification code, and generates a row key list containing the correspondences between the data records and the row key values; the obtained thread identification code and the generated row key list serve as reference data. Processing module 302 writes the obtained data records, their row key values, the thread identification code, and the row key list to the cache memory of the database. Processing module 302 then writes the data records, row key values, thread identification code, and row key list stored in the cache memory to the distributed file system of the database and, once the write completes, takes the thread identification code and row key list stored in the cache memory as the data to be compared. Processing module 302 compares the reference data with the data to be compared; if the comparison result indicates that some data record has not been written to the distributed file system of the database, the file to be written is written to the database again. In this way, after every write, the comparison determines whether all data has been written to the database, which ensures the integrity of the stored data. Moreover, compared with recording writes in log files, comparing row key lists occupies very few system resources and therefore does not affect data storage efficiency.
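The write, flush, compare, and rewrite cycle summarized above can be sketched as follows. This is a minimal in-memory illustration in Python, not the patented implementation: plain dictionaries stand in for the cache memory and the distributed file system, all function names are hypothetical, the `drop` parameter is an invented way to simulate records lost during the flush, and the sketch assumes that the data to be compared lists only the row keys that actually reached the file system, which the text does not spell out.

```python
def build_reference(thread_id, keyed_records):
    """Reference data: the thread ID plus the row key list built before writing."""
    return thread_id, sorted(keyed_records)


def write_file(keyed_records, thread_id, filesystem, drop=frozenset()):
    """Write records through a cache into `filesystem`; return the data to compare.

    `drop` (hypothetical) names row keys that are lost during the flush.
    """
    cache = dict(keyed_records)                 # step 1: write to the cache
    for key, record in cache.items():           # step 2: flush cache to the file system
        if key not in drop:
            filesystem[key] = record
    persisted = sorted(k for k in cache if k in filesystem)
    return thread_id, persisted                 # assumed content of the data to compare


def write_with_check(keyed_records, thread_id, filesystem, drop=frozenset()):
    """Write, compare against the reference data, and rewrite until they match."""
    reference = build_reference(thread_id, keyed_records)
    attempts = 1
    to_compare = write_file(keyed_records, thread_id, filesystem, drop)
    while to_compare != reference:
        attempts += 1
        # Rewrite the whole file; the simulated fault is not repeated.
        to_compare = write_file(keyed_records, thread_id, filesystem)
    return attempts
```

With `drop={"rk2"}` the first flush loses one record, the comparison fails, and a second write restores it; without a fault a single write suffices.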
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of the HBase-based data writing device provided by the fourth embodiment of the invention; for ease of description, only the parts related to this embodiment are shown. The device illustrated in Fig. 4 may be the executing entity of the HBase-based data writing method provided by the embodiments shown in Fig. 1 and Fig. 2 above. It mainly includes: sending module 401, acquisition module 402, processing module 403, deletion module 404, and prompt module 405, where processing module 403 includes comparison submodule 4031 and further includes lookup submodule 4032, acquisition submodule 4033, and rewrite submodule 4034. Each functional module is described in detail as follows:
Sending module 401 is configured to send the data records in the file to be written to a thread, so that the thread generates a corresponding row key value for each data record it receives.
Acquisition module 402 is configured to obtain from the thread the data records of the file to be written, the row key value of each data record, and the thread's identification code, and to generate a row key list containing the correspondences between the data records and the row key values, with the obtained thread identification code and the generated row key list serving as reference data;
Processing module 403 is configured to write the obtained data records, their row key values, the obtained thread identification code, and the generated row key list to the cache memory of the database;
Processing module 403 is further configured to write the data records, row key values, thread identification code, and row key list stored in the cache memory to the distributed file system of the database and, once the write completes, to take the thread identification code and row key list stored in the cache memory as the data to be compared;
Processing module 403 is further configured to compare the reference data with the data to be compared and, if the comparison result indicates that some data record has not been written to the distributed file system of the database, to write the file to be written to the database again.
Each data record corresponds to one row key value. In practice, a file to be written can be split into multiple data records; most of the data records in one file to be written may be assigned to a single thread, but they may also be distributed across multiple threads. In practice, the thread generates the rowkey values using a hash algorithm. The row key list contains the correspondences between the data records and their row key values, and the row key list corresponds to the ID of the thread from which it was obtained.
Optionally, sending module 401 is further configured to notify the thread to store the correspondence between the row key list and the file to be written.
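A minimal sketch of how a thread might derive row key values and assemble the row key list. The choice of MD5 over the record content and the names `row_key` and `build_row_key_list` are assumptions for illustration; the text says only that a hash algorithm is used.

```python
import hashlib


def row_key(record):
    """Hash the record content into a hex row key (MD5 is an assumed choice)."""
    return hashlib.md5(record.encode("utf-8")).hexdigest()


def build_row_key_list(thread_id, records):
    """Pair the thread ID with the row key -> data record correspondences."""
    return {
        "thread_id": thread_id,
        "row_keys": {row_key(r): r for r in records},
    }
```

Because the key depends only on the record content, the same record always maps to the same row key regardless of which thread handles it.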
Optionally, processing module 403 includes comparison submodule 4031;
Comparison submodule 4031 is configured to determine whether the thread identification code in the data to be compared is consistent with the thread identification code in the reference data;
Comparison submodule 4031 is further configured to, if they are consistent, compare the row key list in the data to be compared with the row key list in the reference data;
Comparison submodule 4031 is further configured to, if the row key list in the data to be compared is fully consistent with the row key list in the reference data, produce the comparison result that no data record is missing from the distributed file system of the database;
Comparison submodule 4031 is further configured to, if the row key list in the data to be compared is inconsistent with the row key list in the reference data, produce the comparison result that some data record has not been written to the distributed file system of the database.
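The comparison logic described above might look like the sketch below. The function name and the `(thread_id, row_key_list)` tuple layout are hypothetical. Two points are assumptions rather than statements from the text: mismatched thread IDs are treated as an error (the text does not say what happens in that case), and the lists are compared without regard to order.

```python
def no_record_missing(reference, to_compare):
    """Return True if every row key in the reference data was written.

    Each argument is a (thread_id, row_key_list) pair.
    """
    ref_thread, ref_keys = reference
    cmp_thread, cmp_keys = to_compare
    # Step 1: the two data sets must belong to the same thread.
    if ref_thread != cmp_thread:
        raise ValueError("thread IDs differ; the data sets are not comparable")
    # Step 2: the row key lists must match exactly (order ignored here).
    return sorted(ref_keys) == sorted(cmp_keys)
```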
Optionally, processing module 403 further includes: lookup submodule 4032, acquisition submodule 4033, rewrite submodule 4034, deletion submodule 4035, and prompt submodule 4036;
Lookup submodule 4032 is configured to, if the comparison result indicates that some data record has not been written to the distributed file system of the database, look up in the row key list of the reference data the row key values that are missing from the row key list of the data to be compared;
Acquisition submodule 4033 is configured to obtain from the thread, according to the thread identification code in the reference data or in the data to be compared, the row key list that contains the missing row key value and the file to be written that corresponds to that row key list, where the thread stores the correspondence between row key lists and files to be written;
Rewrite submodule 4034 is configured to write the data records of that file to be written to the database again and to compare the reference data with the data to be compared again, until the comparison result indicates that no data record is missing from the distributed file system of the database.
Deletion submodule 4035 is configured to, if the comparison result indicates that no data record is missing from the distributed file system of the database, delete the reference data and the data to be compared;
Prompt submodule 4036 is configured to send a deletion prompt message to the thread according to the thread identification code in the reference data or in the data to be compared; the deletion prompt message prompts the thread to delete the correspondence it stores between the row key list and the file to be written.
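The lookup and acquisition steps can be illustrated with the following sketch. The names and the shape of `thread_mapping` (file name mapped to the row key list stored by the thread) are hypothetical; the text states only that the thread stores the correspondence between row key lists and files to be written.

```python
def missing_row_keys(reference_keys, compared_keys):
    """Row keys present in the reference list but absent from the compared list."""
    present = set(compared_keys)
    return [k for k in reference_keys if k not in present]


def files_to_rewrite(missing, thread_mapping):
    """Find the files whose row key list contains at least one missing key.

    thread_mapping: file name -> row key list held by the thread (assumed shape).
    """
    missing = set(missing)
    return [name for name, keys in thread_mapping.items() if missing & set(keys)]
```

For example, if the reference list is `["a", "b", "c"]` and the compared list is `["a", "c"]`, the missing key is `"b"`, and only the file whose list contains `"b"` is selected for rewriting.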
It should be noted that when the same file to be written is written again, the rowkey values generated for it are identical, which also means the row key list is identical; this avoids storing the same data record twice.
For details not covered in this embodiment, refer to the description of the embodiments shown in Fig. 1 and Fig. 2 above; they are not repeated here.
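Why identical row keys prevent duplicates can be seen in a small sketch. A Python dictionary stands in for the table here, and `put_all` is a hypothetical helper, not an HBase API: writing a record under a key that already exists replaces the entry instead of adding a second one.

```python
def put_all(table, keyed_records):
    """Store each record under its row key; an existing key is overwritten."""
    for key, record in keyed_records.items():
        table[key] = record


table = {}
batch = {"rk1": "alice,30", "rk2": "bob,25"}
put_all(table, batch)   # first write
put_all(table, batch)   # rewrite of the same file with the same row keys
print(len(table))       # 2
```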
In this embodiment of the invention, sending module 401 sends the data records in the file to be written to a thread, so that the thread generates a corresponding row key value for each data record it receives. Acquisition module 402 obtains from the thread the data records of the file to be written, the row key value of each data record, and the thread's identification code, and generates a row key list containing the correspondences between the data records and the row key values; the obtained thread identification code and the generated row key list serve as reference data. Processing module 403 writes the obtained data records, their row key values, the thread identification code, and the row key list to the cache memory of the database, then writes the data records, row key values, thread identification code, and row key list stored in the cache memory to the distributed file system of the database. Once the write completes, it takes the thread identification code and row key list stored in the cache memory as the data to be compared and compares the reference data with the data to be compared; if the comparison result indicates that some data record has not been written to the distributed file system of the database, the file to be written is written to the database again. In this way, after every write, the comparison determines whether all data has been written to the database, which ensures the integrity of the stored data. Moreover, compared with recording writes in log files, comparing row key lists occupies very few system resources and therefore does not affect data storage efficiency.
In the embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into modules is only a division by logical function, and other divisions are possible in an actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication links shown or discussed may be indirect coupling or communication links between devices or modules through some interfaces, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the invention may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module.
If the integrated module is implemented as a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods of the embodiments of the invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions, but those skilled in the art should understand that the invention is not limited by the described order of actions, because according to the invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by the invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
The above is a description of the HBase-based data writing method and device provided by the invention. Those skilled in the art may, following the ideas of the embodiments of the invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. A data writing method based on an HBase database, characterized by comprising:
obtaining from a thread the data records of a file to be written, the row key value of each data record, and the identification code of the thread, and generating a row key list containing the correspondences between the data records and the row key values, with the obtained thread identification code and the generated row key list serving as reference data;
writing the obtained data records, the row key values of the obtained data records, the obtained thread identification code, and the generated row key list to the cache memory of the database;
writing the data records, the row key values of the stored data records, the thread identification code, and the row key list stored in the cache memory to the distributed file system of the database and, after the write completes, taking the thread identification code and row key list stored in the cache memory as data to be compared;
comparing the reference data with the data to be compared and, if the comparison result indicates that some data record has not been written to the distributed file system of the database, writing the file to be written to the database again.
2. The method according to claim 1, characterized in that, before obtaining from the thread the data records of the file to be written, the row key value of each data record, and the identification code of the thread, the method further comprises:
sending the data records in the file to be written to the thread, so that the thread generates a corresponding row key value for each data record it receives.
3. The method according to claim 1, characterized in that comparing the reference data with the data to be compared comprises:
determining whether the thread identification code in the data to be compared is consistent with the thread identification code in the reference data;
if they are consistent, comparing the row key list in the data to be compared with the row key list in the reference data;
if the row key list in the data to be compared is fully consistent with the row key list in the reference data, the comparison result is that no data record is missing from the distributed file system of the database;
if the row key list in the data to be compared is inconsistent with the row key list in the reference data, the comparison result is that some data record has not been written to the distributed file system of the database.
4. The method according to claim 3, characterized in that, if the comparison result indicates that some data record has not been written to the distributed file system of the database, writing the file to be written to the database again comprises:
if the comparison result indicates that some data record has not been written to the distributed file system of the database, looking up in the row key list of the reference data the row key values that are missing from the row key list of the data to be compared;
obtaining from the thread, according to the thread identification code in the reference data or in the data to be compared, the row key list that contains the missing row key value and the file to be written that corresponds to that row key list, wherein the thread stores the correspondence between row key lists and files to be written;
writing the data records of that file to be written to the database again, and comparing the reference data with the data to be compared again, until the comparison result is that no data record is missing from the distributed file system of the database.
5. The method according to claim 3, characterized in that, after comparing the reference data with the data to be compared, the method further comprises:
if the comparison result is that no data record is missing from the distributed file system of the database, deleting the reference data and the data to be compared;
sending a deletion prompt message to the thread according to the thread identification code in the reference data or in the data to be compared, the deletion prompt message prompting the thread to delete the correspondence it stores between the row key list and the file to be written.
6. A data writing device based on an HBase database, characterized in that the device comprises:
an acquisition module, configured to obtain from a thread the data records of a file to be written, the row key value of each data record, and the identification code of the thread, and to generate a row key list containing the correspondences between the data records and the row key values, with the obtained thread identification code and the generated row key list serving as reference data;
a processing module, configured to write the obtained data records, the row key values of the obtained data records, the obtained thread identification code, and the generated row key list to the cache memory of the database;
the processing module being further configured to write the data records, the row key values of the stored data records, the thread identification code, and the row key list stored in the cache memory to the distributed file system of the database and, after the write completes, to take the thread identification code and row key list stored in the cache memory as data to be compared;
the processing module being further configured to compare the reference data with the data to be compared and, if the comparison result indicates that some data record has not been written to the distributed file system of the database, to write the file to be written to the database again.
7. The device according to claim 6, characterized in that the device further comprises:
a sending module, configured to send the data records in the file to be written to the thread, so that the thread generates a corresponding row key value for each data record it receives.
8. The device according to claim 6, characterized in that the processing module comprises:
a comparison submodule, configured to determine whether the thread identification code in the data to be compared is consistent with the thread identification code in the reference data;
the comparison submodule being further configured to, if they are consistent, compare the row key list in the data to be compared with the row key list in the reference data;
the comparison submodule being further configured to, if the row key list in the data to be compared is fully consistent with the row key list in the reference data, produce the comparison result that no data record is missing from the distributed file system of the database;
the comparison submodule being further configured to, if the row key list in the data to be compared is inconsistent with the row key list in the reference data, produce the comparison result that some data record has not been written to the distributed file system of the database.
9. The device according to claim 8, characterized in that the processing module further comprises:
a lookup submodule, configured to, if the comparison result indicates that some data record has not been written to the distributed file system of the database, look up in the row key list of the reference data the row key values that are missing from the row key list of the data to be compared;
an acquisition submodule, configured to obtain from the thread, according to the thread identification code in the reference data or in the data to be compared, the row key list that contains the missing row key value and the file to be written that corresponds to that row key list, wherein the thread stores the correspondence between row key lists and files to be written;
a rewrite submodule, configured to write the data records of that file to be written to the database again and to compare the reference data with the data to be compared again, until the comparison result is that no data record is missing from the distributed file system of the database.
10. The device according to claim 8, characterized in that the processing module further comprises:
a deletion submodule, configured to, if the comparison result is that no data record is missing from the distributed file system of the database, delete the reference data and the data to be compared;
a prompt submodule, configured to send a deletion prompt message to the thread according to the thread identification code in the reference data or in the data to be compared, the deletion prompt message prompting the thread to delete the correspondence it stores between the row key list and the file to be written.
CN201611047256.3A 2016-11-23 2016-11-23 Data writing method and device based on Hbase database Expired - Fee Related CN106776795B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611047256.3A CN106776795B (en) 2016-11-23 2016-11-23 Data writing method and device based on Hbase database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611047256.3A CN106776795B (en) 2016-11-23 2016-11-23 Data writing method and device based on Hbase database

Publications (2)

Publication Number Publication Date
CN106776795A true CN106776795A (en) 2017-05-31
CN106776795B CN106776795B (en) 2020-05-12

Family

ID=58974335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611047256.3A Expired - Fee Related CN106776795B (en) 2016-11-23 2016-11-23 Data writing method and device based on Hbase database

Country Status (1)

Country Link
CN (1) CN106776795B (en)


Also Published As

Publication number Publication date
CN106776795B (en) 2020-05-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200512

Termination date: 20211123