CN110633271A

CN110633271A - Data extraction method from Hbase to mysql based on json

Info

Publication number: CN110633271A
Application number: CN201911137971.XA
Authority: CN
Inventors: 何静; 汤同伟; 霍荣
Original assignee: Unicloud Nanjing Digital Technology Co Ltd
Current assignee: Unicloud Nanjing Digital Technology Co Ltd
Priority date: 2019-11-20
Filing date: 2019-11-20
Publication date: 2019-12-31

Abstract

The invention discloses a data extraction method from Hbase to mysql based on json, and relates to the technical field of data processing. The method comprises the steps of configuring HBase data source information, mysql target data source information, an extraction strategy, a cleaning rule, a conversion rule and a field mapping relation on a visual interface; executing a query on a visual interface to generate a json data set; the JCleanprocessor analyzes dirty data and invalid data in the json data set and deletes the dirty data and the invalid data; the JTransformprocessor converts data in the json data set according to different data types; json data converted by the MultiPutHBaseJSON processor is stored in the mysql database. According to the invention, through visual page configuration, the convenient and fast efficacy transfer from the Hbase data source to the mysql target data source is realized.

Description

Data extraction method from Hbase to mysql based on json

Technical Field

The invention belongs to the technical field of data processing, and particularly relates to a json-based data extraction method from Hbase to mysql.

Background

Hbase is a distributed, column-oriented open-source database, which is a database suitable for unstructured data storage. Meanwhile, the Hbase realizes a database system with high reliability, high performance, column storage, scalability and real-time reading and writing. But due to the inherent limitations of a single RowKey, it was determined that it was not possible to efficiently support multi-conditional queries. This results in that Hbase is not so convenient to use when the service scenario is complicated, multi-table association is required and multi-conditional queries are required. Meanwhile, in many business systems, many data are stored in a relational database, such as mysql. Thus, when data synchronization is required, data transfers across data sources can occur.

Referring to fig. 8, a common technical scheme for transferring data from the Hbase database to the mysql database is to export the data in the Hbase to the csv file or txt file, and then import the data in the csv file or txt file to the mysql database. However, this method has the following disadvantages:

a. reading data from Hbase requires command line operation, and has high requirement on professional knowledge of operators;

b. when writing data to mysql, since the amount of data stored in Hbase may be large, there may be various performance problems at the time of introduction.

c. The user experience is poor due to the fact that the visual operation interface is not available.

The invention provides a data extraction method from Hbase to mysql based on json, which is used for solving the following problems:

1) through a visual configuration mode, a user configures Hbase data source information, mysql target data source information, a cleaning rule, a conversion rule, a field mapping relation and data loading which are required to be extracted according to the actual requirement of the user; while the data is transferred in json format to effect the transfer of data from Hbase to mysql library.

2) Due to the existence of column families in Hbase, a plurality of columns exist under one column family; whereas mysql is a relational database with no column families; the technical scheme of the invention also solves the problem of how to take out the required column from the column family of Hbase and store the value of the column into mysql. .

Disclosure of Invention

The invention aims to provide a json-based method for extracting data from Hbase to mysql, which realizes convenient and rapid effect transfer from an Hbase data source to a mysql target data source through visual page configuration.

In order to solve the technical problems, the invention is realized by the following technical scheme:

the invention relates to a data extraction method from Hbase to mysql based on json, which comprises the following steps:

s00: configuring HBase data source information, mysql target data source information, an extraction strategy, a cleaning rule, a conversion rule and a field mapping relation on a visual interface;

the configuration of the HBase data source information further comprises the steps of selecting the HBase data source, the table name and the column family or the column information to be extracted;

s01: executing a query on a visual interface to generate a json data set;

s02: the JCleanprocessor analyzes dirty data and invalid data in the json data set and deletes the dirty data and the invalid data;

s03: the JTransformprocessor converts data in the json data set according to different data types;

s04: json data converted by the MultiPutHBaseJSON processor is stored in the mysql database.

Preferably, S00 specifically includes the following:

reading data by adopting a MultiGetHBase processor according to the specified HBase data source information and the extraction strategy;

the extraction strategy comprises incremental extraction and full extraction; when the increment extraction is executed, extracting data changed after the last task is finished; when the full amount extraction is performed, all data is extracted.

Preferably, the MultiGetHBase processor separates the extracted json data set into json data groups.

Preferably, the method further comprises monitoring the data extraction task by using a JMonitorProcessor processor.

Preferably, the MultiPutHBaseJSON processor writes json data into the mysql database in batches.

The invention has the following beneficial effects:

1. according to the invention, through visual page configuration, the convenient and fast efficacy transfer from the Hbase data source to the mysql target data source is realized;

2. in the data extraction process, the data is transmitted in a json format, so that the problem can be timely positioned through the generated json data and the data can be conveniently modified;

3. the invention reduces the requirement of professional knowledge of operators through interface configuration operation, is convenient to operate and improves the efficiency.

Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flow chart of a json-based method for extracting Hbase-mysql data according to the invention;

FIG. 2 is a schematic diagram of a create task interface in the present invention;

FIG. 3 is a schematic diagram of an interface for configuring an Hbase data source, an extraction field and an extraction strategy according to the present invention;

FIG. 4 is a schematic diagram of an interface for configuring mysql data sources and table field information in the present invention;

FIG. 5 is a schematic diagram of an interface for configuring cleaning rules according to the present invention;

FIG. 6 is a schematic interface diagram of the mapping relationship between the Hbase table field and the mysql table field established in the present invention;

FIG. 7 is a schematic diagram of an interface for configuring a loading policy and a time scheduling policy according to the present invention;

FIG. 8 is a diagram illustrating a conventional scheme for transferring data from the Hbase database to the mysql database in the background art of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The implementation method comprises the following steps:

referring to fig. 1, the present invention is a json-based method for extracting data from Hbase to mysql, comprising the following steps:

the configuration of the HBase data source information further comprises the steps of selecting the HBase data source, a table name and column family or column information to be extracted; reading data by adopting a MultiGetHBase processor according to the specified HBase data source information and the extraction strategy;

configuring Hbase database information, and connecting the Hbase database information to the Hbase database; since there may be multiple column families in the same rowkey data in the Hbase database, there may be multiple columns under a column family, so there is multidimensional data, but the mysql database stores data in one-dimensional form. At this time, the mapping of data needs to be processed; through a visual configuration mode, data with different dimensions can be definitely specified to be transferred to different storage places, and therefore association of source data and target data is achieved.

The format of the source data is as follows:

ROW COLUMN+CELL

001column=info1:age, timestamp=1536739425819, value=18

001column=info1:name, timestamp=1536542197314, value=tom

001column=info2:chinese, timestamp=1536542217612, value=89

001column=info2:math, timestamp=1536739462180, value=67

002column=info1:age, timestamp=1536739588914, value=34

002column=info1:name, timestamp=1536739081007, value=James

002column=info1:sex, timestamp=1539335801990, value=male

002column=info2:chinese, timestamp=1536739111620, value=78

002 column=info2:math, timestamp=1536739480289, value=76

003column=info1:age, timestamp=1537155958877, value=37

004 column=info1:name, timestamp=1537947449018, value=jack

referring to fig. 2-7, a specific configuration process is shown:

wherein, as shown in fig. 2, firstly, a task name is established; as shown in fig. 3, when the Hbase data source, the extracted field and the extraction strategy are selected, the Hbase data source is automatically synchronized to the right frame after clicking the field in the left frame; as shown in fig. 4, mysql data source and table field information are configured; as shown in fig. 5, a cleaning rule is configured; as shown in fig. 6, the conversion rule is configured, and meanwhile, the mapping relationship between the Hbase table field and the mysql table field is also automatically established; as shown in fig. 7, a loading policy and a time scheduling policy are configured, the loading policy may select full deletion and incremental deletion, and the time scheduling policy may select interval time or cron expression, so as to complete creation and configuration tasks;

s01: executing a query on a visual interface to generate a json data set;

specifically, the extraction strategy comprises incremental extraction and full extraction; when the incremental extraction is selected, the system records the time point of the last extraction completion, and each task only extracts data changed after the last task is completed; when the full extraction is selected, the data in the table are extracted completely. In order to prevent the problem of memory overflow caused by overlarge data volume, the processor can automatically group json data, and finally, a complete json data set is divided into a plurality of small json data groups, so that the transfer is facilitated, and the performance is improved; in order to realize reasonable grouping of json, data in a table needs to be sorted firstly, then extracted in segments, the size of each segment can be manually configured by a user (the default is that 1 ten thousand pieces of data are in a group), and finally the extracted data is converted into json character strings.

because of the existence of dirty data and invalid data, the JCleanprocessor automatically analyzes all data, finds out data which does not meet the requirements, and deletes the data from the json result set;

the specific process is as follows: performing cleaning rule check on all data in batch according to the json result set generated by the last processor, for example, if the cleaning rule is to judge whether the age is greater than 18 years, finding age information from the json result set, then comparing the age information with the age of 18 years, if so, retaining the data, and if not, deleting the data from the whole json result set;

s04: saving json data converted by the MultiPutHBaseJSON processor into a mysql database;

specifically, a MultiPutHBaseJSON processor writes json data into a mysql database in batches; the MultiPutHBaseJSON processor writes a plurality of pieces of data into mysql at one time by adopting a batch processing technology, batch submission is achieved, and the writing times of a database are reduced; meanwhile, data exception processing is also carried out, when storage fails, data rollback is achieved, and all error data can be recorded.

The method comprises the following steps of monitoring a data extraction task by adopting a JMonitorProcessor processor; the jmonitor processor completely records the execution condition of the whole extraction task, including the start time and the end time of the task, the amount of extracted data, the amount of cleaned data, the amount of converted data, the amount of warehoused data, the amount of error data and other information.

It should be noted that, in the above system embodiment, each included unit is only divided according to functional logic, but is not limited to the above division as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.

In addition, it is understood by those skilled in the art that all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing associated hardware, and the corresponding program may be stored in a computer-readable storage medium.

The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims

1. A data extraction method from Hbase to mysql based on json is characterized by comprising the following steps:

s01: executing a query on a visual interface to generate a json data set;

2. The method for extracting data from hbsase to mysql based on json as claimed in claim 1, wherein S00 specifically includes the following:

3. The json-based Hbase-to-mysql data extraction method of claim 3, wherein the MultiGetHBase processor separates the extracted json dataset into json data sets.

4. The json-based data extraction method from Hbase to mysql as claimed in claim 1, further comprising monitoring the data extraction task using a JMonitorProcessor processor.

5. The json-based Hbase-to-mysql data extraction method of claim 1, wherein the MultiPutHBaseJSON processor writes json data to the mysql database in batches.