CN108667929A - Method for synchronizing data to Elasticsearch based on an HBase coprocessor

Info

Publication number: CN108667929A
Application number: CN201810432287.3A
Authority: CN
Inventors: 赵圣杰; 张霞; 肖雪; 胡清
Original assignee: Inspur Software Group Co Ltd
Current assignee: Inspur Software Group Co Ltd
Priority date: 2018-05-08
Filing date: 2018-05-08
Publication date: 2018-10-16

Abstract

The invention discloses a method for synchronizing data to Elasticsearch based on an HBase coprocessor. The method configures a coprocessor for a table created in HBase and sets the coprocessor attribute on the corresponding HBase table. When the coprocessor takes effect, it connects to Elasticsearch through a client, initializes the HBase coprocessor parameters, and creates an Elasticsearch index. The HBase coprocessor then calls the relevant methods to synchronize HBase data to the corresponding Elasticsearch index. When data is written to, updated in, or deleted from HBase, the invention synchronizes the changed data to Elasticsearch in real time for storage, and uses Elasticsearch to provide flexible querying and statistics over the data.

Description

A method for synchronizing data to Elasticsearch based on an HBase coprocessor

Technical field

The present invention relates to the technical field of distributed data synchronization, and in particular to a method for synchronizing data to Elasticsearch based on an HBase coprocessor.

Background art

As data volumes grow, efficient storage and querying of distributed data become increasingly important. HBase is an unstructured storage database that runs on Hadoop, while Elasticsearch is an efficient search engine for distributed systems that provides data storage and efficient querying. Fairly mature methods for storage and querying based on HBase and Elasticsearch already exist, but each has its own advantages and disadvantages:

1. MapReduce scheme

MapReduce is a programming framework for data processing. Using distributed processing, MapReduce can synchronize batches of HBase data to Elasticsearch offline. However, MapReduce must scan the HBase table before the data can be synchronized to Elasticsearch, so every insert, delete, update, or query operation on HBase requires running a MapReduce job to synchronize. The approach is neither flexible enough nor real-time enough.

2. HBase secondary index scheme

When HBase creates a table, an index table must be created on the same region server, in one-to-one correspondence with the main table. After a row is inserted into the main table, a coprocessor writes the index columns to the index table. To keep the main table and the index table on the same region server, automatic and manual splitting of the index table is disabled, and a split of the index table can be triggered only when the main table splits. When the main table splits, the index table is split along with its corresponding data, and the leading part of the row keys in the second daughter region of the index table is rewritten to the row key of the corresponding primary key. HBase secondary indexing requires a deep understanding of HBase's internal mechanisms and secondary development, which hinders functional decoupling.

Summary of the invention

The technical problem to be solved by the present invention is as follows: in view of the above problems, the present invention provides a method for synchronizing data to Elasticsearch based on an HBase coprocessor.

The technical solution adopted by the present invention is:

A method for synchronizing data to Elasticsearch based on an HBase coprocessor. The method configures a coprocessor for a table created in HBase and assigns the coprocessor attribute to the corresponding HBase table. When the coprocessor takes effect, it connects to Elasticsearch through a client, initializes the HBase coprocessor parameters, and creates an Elasticsearch index. The HBase coprocessor then calls the relevant methods to synchronize HBase data to the corresponding Elasticsearch index, and Elasticsearch is used to provide multi-condition queries over the data.
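The flow above can be illustrated with a toy in-memory model. This is only a sketch: it does not use the HBase or Elasticsearch APIs, and every class, method, and index name in it (`FakeTable`, `SyncObserver`, `FakeSearchIndex`, `user_index`) is hypothetical. It shows the shape of the idea: a hook runs once when the coprocessor takes effect, and further hooks run after each put and delete.

```python
class FakeSearchIndex:
    """Stands in for one Elasticsearch index: documents keyed by id."""

    def __init__(self):
        self.docs = {}


class SyncObserver:
    """Stands in for the coprocessor attached to a single table."""

    def __init__(self, cluster, index_name):
        self.cluster = cluster          # dict: index name -> FakeSearchIndex
        self.index_name = index_name
        self.index = None

    def start(self):
        # Runs when the coprocessor takes effect: "connect" and create the index.
        self.index = self.cluster.setdefault(self.index_name, FakeSearchIndex())

    def post_put(self, row_key, written_columns):
        # Runs after a write or update: merge the written columns into the document.
        self.index.docs.setdefault(row_key, {}).update(written_columns)

    def post_delete(self, row_key):
        # Runs after a delete: drop the document that has the same key.
        self.index.docs.pop(row_key, None)


class FakeTable:
    """Stands in for an HBase table that notifies its observer after each change."""

    def __init__(self, observer):
        self.rows = {}
        self.observer = observer
        observer.start()

    def put(self, row_key, columns):
        self.rows.setdefault(row_key, {}).update(columns)
        self.observer.post_put(row_key, columns)

    def delete(self, row_key):
        self.rows.pop(row_key, None)
        self.observer.post_delete(row_key)


cluster = {}
table = FakeTable(SyncObserver(cluster, "user_index"))
table.put("row1", {"name": "Ann"})
table.put("row1", {"city": "Jinan"})   # an update to an existing row
table.put("row2", {"name": "Bob"})
table.delete("row2")
print(cluster["user_index"].docs)
```

After these four operations the table and the index hold the same single row, which is the property the method aims for.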

The HBase coprocessor parameter configuration includes:

Configuring the class associated with the HBase coprocessor, including the Elasticsearch cluster name, cluster IP, index name, and index type information, and establishing the association with Elasticsearch.
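One way such parameters could reach the coprocessor is through the table-level coprocessor attribute, whose value in HBase is a pipe-delimited string of the form `jar path|class name|priority|key=value,...`. The sketch below parses such a string, assuming that format. The jar path, the class name `com.example.EsSyncObserver`, and the argument keys `es_cluster`, `es_host`, `es_index`, and `es_type` are hypothetical; the text names the values to configure, not the keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncConfig:
    cluster_name: str
    host: str
    index_name: str
    index_type: str


def parse_coprocessor_attribute(spec):
    """Split 'jar|class|priority|k1=v1,k2=v2' into its four parts."""
    jar_path, class_name, priority, raw_args = spec.split("|", 3)
    args = dict(pair.split("=", 1) for pair in raw_args.split(",") if pair)
    # The priority field may be left empty in the attribute value.
    return jar_path, class_name, int(priority) if priority else None, args


def build_config(args):
    # Hypothetical key names; fail early if any of the four values is missing.
    required = ("es_cluster", "es_host", "es_index", "es_type")
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError("missing coprocessor arguments: " + ", ".join(missing))
    return SyncConfig(args["es_cluster"], args["es_host"],
                      args["es_index"], args["es_type"])


spec = ("hdfs:///coprocessors/sync.jar|com.example.EsSyncObserver|1001|"
        "es_cluster=demo,es_host=10.0.0.1,es_index=user_index,es_type=user")
jar_path, class_name, priority, args = parse_coprocessor_attribute(spec)
config = build_config(args)
print(class_name, priority, config)
```

Validating the arguments up front means a misconfigured table fails when the coprocessor loads rather than on the first write.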

By calling the start() method, the HBase coprocessor obtains the Elasticsearch client connection parameters, sets cluster.name and sets transport.type to netty3, creates an Elasticsearch client connection instance, and creates the corresponding Elasticsearch index for HBase.
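The start() step can be modelled as follows. This is a minimal sketch, not the Elasticsearch client API: `FakeCluster` and the function names are hypothetical, and a plain dictionary stands in for the client settings object. Only the two setting keys, `cluster.name` and `transport.type`, come from the text. The sketch assumes start() may run more than once for the same table (for example, once per region), so index creation is written to be idempotent.

```python
def build_client_settings(cluster_name):
    # The two settings named in the text; the dict stands in for a settings object.
    return {"cluster.name": cluster_name, "transport.type": "netty3"}


class FakeCluster:
    """Hypothetical stand-in for an Elasticsearch cluster reachable by a client."""

    def __init__(self, name):
        self.name = name
        self.indices = set()


def start(cluster, cluster_name, index_name):
    """Model of start(): build settings, 'connect', create the index if absent."""
    settings = build_client_settings(cluster_name)
    if settings["cluster.name"] != cluster.name:
        # Assumed behaviour: a client configured for another cluster is rejected.
        raise ConnectionError("cluster name mismatch: " + cluster_name)
    created = index_name not in cluster.indices
    cluster.indices.add(index_name)
    return settings, created


demo = FakeCluster("demo")
settings, created_first = start(demo, "demo", "user_index")
_, created_second = start(demo, "demo", "user_index")
print(settings, created_first, created_second)
```

The second call finds the index already present and leaves it untouched.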

Writing and updating HBase data are handled by calling the postPut method of the HBase coprocessor.

The postPut method proceeds as follows: the method first calls the Elasticsearch connection client, then obtains the row data written to HBase, and writes the data written to HBase synchronously into Elasticsearch.
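The conversion performed inside postPut can be sketched as a mapping from a written row to a document. The sketch below is an illustration under stated assumptions, not the patented implementation: cells are modelled as `(family, qualifier, value)` byte triples, values are assumed to be UTF-8 text, field names are assumed to be `family_qualifier`, and the row key is assumed to serve as the document id. A plain dictionary stands in for the index.

```python
import json


def put_to_document(row_key, cells):
    """Turn one written row into (document id, JSON-ready dict).

    cells is a list of (family, qualifier, value) byte triples, a simplified
    stand-in for the cells carried by a put.
    """
    doc_id = row_key.decode("utf-8")
    source = {}
    for family, qualifier, value in cells:
        field = family.decode("utf-8") + "_" + qualifier.decode("utf-8")
        source[field] = value.decode("utf-8")
    return doc_id, source


def upsert(index, doc_id, source):
    # Merge into any existing document so that an update to one column
    # does not erase fields written earlier.
    index.setdefault(doc_id, {}).update(source)


index = {}
doc_id, source = put_to_document(b"user#42", [(b"info", b"name", b"Ann")])
upsert(index, doc_id, source)
doc_id, source = put_to_document(b"user#42", [(b"info", b"city", b"Jinan")])
upsert(index, doc_id, source)
print(json.dumps(index, sort_keys=True))
```

Because a write and an update both arrive through the same hook, merging by document id lets one code path cover both cases.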

Deleting HBase data is handled by calling the postDelete method of the coprocessor. The method obtains the primary key of the data, calls the Elasticsearch connection client, and synchronously deletes the data from Elasticsearch according to the primary key.
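Deletion by primary key can be sketched in the same toy style. The function names are hypothetical, a dictionary stands in for the index, and the document id is assumed to be the UTF-8 decoded row key, which must match whatever id was used when the row was indexed.

```python
def document_id(row_key):
    # Must match the id used when the row was indexed (assumed: UTF-8 row key).
    return row_key.decode("utf-8")


def post_delete(index, row_key):
    """Remove the document for row_key; return True if one was removed."""
    return index.pop(document_id(row_key), None) is not None


index = {"user#42": {"info_name": "Ann"}, "user#43": {"info_name": "Bob"}}
removed_first = post_delete(index, b"user#43")
removed_again = post_delete(index, b"user#43")   # a repeated delete is a no-op
print(removed_first, removed_again, sorted(index))
```

Treating a delete of a missing document as a no-op keeps the hook safe to retry.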

The Elasticsearch client connection parameters include the cluster name, the host name, and the TCP port number.

The method makes the HBase coprocessor take effect by making the corresponding HBase table take effect.

The beneficial effects of the present invention are:

When data is written to, updated in, or deleted from HBase, the present invention synchronizes the changed data to Elasticsearch in real time for storage, and uses Elasticsearch to provide flexible querying and statistics over the data.

Description of the drawings

Fig. 1 is a schematic diagram of the data synchronization framework of the present invention.

Detailed description

The present invention is further described below through specific embodiments, with reference to the accompanying drawings:

Embodiment 1

As shown in Fig. 1, a method for synchronizing data to Elasticsearch based on an HBase coprocessor. The method configures a coprocessor for a table created in HBase and assigns the coprocessor attribute to the corresponding HBase table. When the coprocessor takes effect, it connects to Elasticsearch through a client, initializes the HBase coprocessor parameters, and creates an Elasticsearch index. The HBase coprocessor then calls the relevant methods to synchronize HBase data to the corresponding Elasticsearch index, and Elasticsearch is used to provide multi-condition queries over the data.

Embodiment 2

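On the basis of Embodiment 1, the HBase coprocessor parameter configuration in this embodiment includes: configuring the class associated with the HBase coprocessor, including the Elasticsearch cluster name, cluster IP, index name, and index type information, and establishing the association with Elasticsearch.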

Embodiment 3

On the basis of Embodiment 1 or 2, in this embodiment the HBase coprocessor, by calling the start() method, obtains the Elasticsearch client connection parameters, sets cluster.name and sets transport.type to netty3, creates an Elasticsearch client connection instance, and creates the corresponding Elasticsearch index for HBase.

Embodiment 4

On the basis of Embodiment 3, in this embodiment writing and updating HBase data are handled by calling the postPut method of the HBase coprocessor.

Embodiment 5

On the basis of Embodiment 4, the postPut method in this embodiment proceeds as follows: the method first calls the Elasticsearch connection client, then obtains the row data written to HBase, and writes the data written to HBase synchronously into Elasticsearch.

Embodiment 6

On the basis of Embodiment 3, in this embodiment deleting HBase data is handled by calling the postDelete method of the coprocessor. The method obtains the primary key of the data, calls the Elasticsearch connection client, and synchronously deletes the data from Elasticsearch according to the primary key.

Embodiment 7

On the basis of Embodiment 3, the Elasticsearch client connection parameters in this embodiment include the cluster name, the host name, and the TCP port number.

Embodiment 8

On the basis of Embodiment 1, the method in this embodiment makes the HBase coprocessor take effect by making the corresponding HBase table take effect.

The above embodiments merely illustrate the present invention and do not limit it. A person of ordinary skill in the relevant technical field may make various changes and modifications without departing from the spirit and scope of the present invention. All equivalent technical solutions therefore also fall within the scope of the present invention, and the scope of patent protection of the present invention shall be defined by the claims.

Claims

1. A method for synchronizing data to Elasticsearch based on an HBase coprocessor, characterized in that the method configures a coprocessor for a table created in HBase and assigns the coprocessor attribute to the corresponding HBase table; when the coprocessor takes effect, it connects to Elasticsearch through a client, initializes the HBase coprocessor parameters, and creates an Elasticsearch index; the HBase coprocessor then calls the relevant methods to synchronize HBase data to the corresponding Elasticsearch index.

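2. The method for synchronizing data to Elasticsearch based on an HBase coprocessor according to claim 1, characterized in that the HBase coprocessor parameter configuration includes: configuring the class associated with the HBase coprocessor, including the Elasticsearch cluster name, cluster IP, index name, and index type information, and establishing the association with Elasticsearch.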

3. The method for synchronizing data to Elasticsearch based on an HBase coprocessor according to claim 1 or 2, characterized in that the HBase coprocessor, by calling the start() method, obtains the Elasticsearch client connection parameters, sets cluster.name and sets transport.type to netty3, creates an Elasticsearch client connection instance, and creates the corresponding Elasticsearch index for HBase.

4. The method for synchronizing data to Elasticsearch based on an HBase coprocessor according to claim 3, characterized in that writing and updating HBase data are handled by calling the postPut method of the HBase coprocessor.

5. The method for synchronizing data to Elasticsearch based on an HBase coprocessor according to claim 4, characterized in that the postPut method proceeds as follows: the method first calls the Elasticsearch connection client, then obtains the row data written to HBase, and writes the data written to HBase synchronously into Elasticsearch.

6. The method for synchronizing data to Elasticsearch based on an HBase coprocessor according to claim 3, characterized in that deleting HBase data is handled by calling the postDelete method of the coprocessor; the method obtains the primary key of the data, calls the Elasticsearch connection client, and synchronously deletes the data from Elasticsearch according to the primary key.

7. The method for synchronizing data to Elasticsearch based on an HBase coprocessor according to claim 3, characterized in that the Elasticsearch client connection parameters include the cluster name, the host name, and the TCP port number.

8. The method for synchronizing data to Elasticsearch based on an HBase coprocessor according to claim 1, characterized in that the method makes the HBase coprocessor take effect by making the corresponding HBase table take effect.