CN110633271A - Data extraction method from Hbase to mysql based on json - Google Patents

Data extraction method from Hbase to mysql based on json Download PDF

Info

Publication number
CN110633271A
CN110633271A CN201911137971.XA CN201911137971A CN110633271A CN 110633271 A CN110633271 A CN 110633271A CN 201911137971 A CN201911137971 A CN 201911137971A CN 110633271 A CN110633271 A CN 110633271A
Authority
CN
China
Prior art keywords
data
json
hbase
mysql
extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911137971.XA
Other languages
Chinese (zh)
Inventor
何静
汤同伟
霍荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unicloud Nanjing Digital Technology Co Ltd
Original Assignee
Unicloud Nanjing Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unicloud Nanjing Digital Technology Co Ltd filed Critical Unicloud Nanjing Digital Technology Co Ltd
Priority to CN201911137971.XA priority Critical patent/CN110633271A/en
Publication of CN110633271A publication Critical patent/CN110633271A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/0482Interaction with lists of selectable items, e.g. menus

Abstract

The invention discloses a data extraction method from Hbase to mysql based on json, and relates to the technical field of data processing. The method comprises the steps of configuring HBase data source information, mysql target data source information, an extraction strategy, a cleaning rule, a conversion rule and a field mapping relation on a visual interface; executing a query on a visual interface to generate a json data set; the JCleanprocessor analyzes dirty data and invalid data in the json data set and deletes the dirty data and the invalid data; the JTransformprocessor converts data in the json data set according to different data types; json data converted by the MultiPutHBaseJSON processor is stored in the mysql database. According to the invention, through visual page configuration, the convenient and fast efficacy transfer from the Hbase data source to the mysql target data source is realized.

Description

Data extraction method from Hbase to mysql based on json
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a json-based data extraction method from Hbase to mysql.
Background
Hbase is a distributed, column-oriented open-source database, which is a database suitable for unstructured data storage. Meanwhile, the Hbase realizes a database system with high reliability, high performance, column storage, scalability and real-time reading and writing. But due to the inherent limitations of a single RowKey, it was determined that it was not possible to efficiently support multi-conditional queries. This results in that Hbase is not so convenient to use when the service scenario is complicated, multi-table association is required and multi-conditional queries are required. Meanwhile, in many business systems, many data are stored in a relational database, such as mysql. Thus, when data synchronization is required, data transfers across data sources can occur.
Referring to fig. 8, a common technical scheme for transferring data from the Hbase database to the mysql database is to export the data in the Hbase to the csv file or txt file, and then import the data in the csv file or txt file to the mysql database. However, this method has the following disadvantages:
a. reading data from Hbase requires command line operation, and has high requirement on professional knowledge of operators;
b. when writing data to mysql, since the amount of data stored in Hbase may be large, there may be various performance problems at the time of introduction.
c. The user experience is poor due to the fact that the visual operation interface is not available.
The invention provides a data extraction method from Hbase to mysql based on json, which is used for solving the following problems:
1) through a visual configuration mode, a user configures Hbase data source information, mysql target data source information, a cleaning rule, a conversion rule, a field mapping relation and data loading which are required to be extracted according to the actual requirement of the user; while the data is transferred in json format to effect the transfer of data from Hbase to mysql library.
2) Due to the existence of column families in Hbase, a plurality of columns exist under one column family; whereas mysql is a relational database with no column families; the technical scheme of the invention also solves the problem of how to take out the required column from the column family of Hbase and store the value of the column into mysql. .
Disclosure of Invention
The invention aims to provide a json-based method for extracting data from Hbase to mysql, which realizes convenient and rapid effect transfer from an Hbase data source to a mysql target data source through visual page configuration.
In order to solve the technical problems, the invention is realized by the following technical scheme:
the invention relates to a data extraction method from Hbase to mysql based on json, which comprises the following steps:
s00: configuring HBase data source information, mysql target data source information, an extraction strategy, a cleaning rule, a conversion rule and a field mapping relation on a visual interface;
the configuration of the HBase data source information further comprises the steps of selecting the HBase data source, the table name and the column family or the column information to be extracted;
s01: executing a query on a visual interface to generate a json data set;
s02: the JCleanprocessor analyzes dirty data and invalid data in the json data set and deletes the dirty data and the invalid data;
s03: the JTransformprocessor converts data in the json data set according to different data types;
s04: json data converted by the MultiPutHBaseJSON processor is stored in the mysql database.
Preferably, S00 specifically includes the following:
reading data by adopting a MultiGetHBase processor according to the specified HBase data source information and the extraction strategy;
the extraction strategy comprises incremental extraction and full extraction; when the increment extraction is executed, extracting data changed after the last task is finished; when the full amount extraction is performed, all data is extracted.
Preferably, the MultiGetHBase processor separates the extracted json data set into json data groups.
Preferably, the method further comprises monitoring the data extraction task by using a JMonitorProcessor processor.
Preferably, the MultiPutHBaseJSON processor writes json data into the mysql database in batches.
The invention has the following beneficial effects:
1. according to the invention, through visual page configuration, the convenient and fast efficacy transfer from the Hbase data source to the mysql target data source is realized;
2. in the data extraction process, the data is transmitted in a json format, so that the problem can be timely positioned through the generated json data and the data can be conveniently modified;
3. the invention reduces the requirement of professional knowledge of operators through interface configuration operation, is convenient to operate and improves the efficiency.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of a json-based method for extracting Hbase-mysql data according to the invention;
FIG. 2 is a schematic diagram of a create task interface in the present invention;
FIG. 3 is a schematic diagram of an interface for configuring an Hbase data source, an extraction field and an extraction strategy according to the present invention;
FIG. 4 is a schematic diagram of an interface for configuring mysql data sources and table field information in the present invention;
FIG. 5 is a schematic diagram of an interface for configuring cleaning rules according to the present invention;
FIG. 6 is a schematic interface diagram of the mapping relationship between the Hbase table field and the mysql table field established in the present invention;
FIG. 7 is a schematic diagram of an interface for configuring a loading policy and a time scheduling policy according to the present invention;
FIG. 8 is a diagram illustrating a conventional scheme for transferring data from the Hbase database to the mysql database in the background art of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The implementation method comprises the following steps:
referring to fig. 1, the present invention is a json-based method for extracting data from Hbase to mysql, comprising the following steps:
s00: configuring HBase data source information, mysql target data source information, an extraction strategy, a cleaning rule, a conversion rule and a field mapping relation on a visual interface;
the configuration of the HBase data source information further comprises the steps of selecting the HBase data source, a table name and column family or column information to be extracted; reading data by adopting a MultiGetHBase processor according to the specified HBase data source information and the extraction strategy;
configuring Hbase database information, and connecting the Hbase database information to the Hbase database; since there may be multiple column families in the same rowkey data in the Hbase database, there may be multiple columns under a column family, so there is multidimensional data, but the mysql database stores data in one-dimensional form. At this time, the mapping of data needs to be processed; through a visual configuration mode, data with different dimensions can be definitely specified to be transferred to different storage places, and therefore association of source data and target data is achieved.
The format of the source data is as follows:
ROW COLUMN+CELL
001column=info1:age, timestamp=1536739425819, value=18
001column=info1:name, timestamp=1536542197314, value=tom
001column=info2:chinese, timestamp=1536542217612, value=89
001column=info2:math, timestamp=1536739462180, value=67
002column=info1:age, timestamp=1536739588914, value=34
002column=info1:name, timestamp=1536739081007, value=James
002column=info1:sex, timestamp=1539335801990, value=male
002column=info2:chinese, timestamp=1536739111620, value=78
002 column=info2:math, timestamp=1536739480289, value=76
003column=info1:age, timestamp=1537155958877, value=37
004 column=info1:name, timestamp=1537947449018, value=jack
referring to fig. 2-7, a specific configuration process is shown:
wherein, as shown in fig. 2, firstly, a task name is established; as shown in fig. 3, when the Hbase data source, the extracted field and the extraction strategy are selected, the Hbase data source is automatically synchronized to the right frame after clicking the field in the left frame; as shown in fig. 4, mysql data source and table field information are configured; as shown in fig. 5, a cleaning rule is configured; as shown in fig. 6, the conversion rule is configured, and meanwhile, the mapping relationship between the Hbase table field and the mysql table field is also automatically established; as shown in fig. 7, a loading policy and a time scheduling policy are configured, the loading policy may select full deletion and incremental deletion, and the time scheduling policy may select interval time or cron expression, so as to complete creation and configuration tasks;
s01: executing a query on a visual interface to generate a json data set;
specifically, the extraction strategy comprises incremental extraction and full extraction; when the incremental extraction is selected, the system records the time point of the last extraction completion, and each task only extracts data changed after the last task is completed; when the full extraction is selected, the data in the table are extracted completely. In order to prevent the problem of memory overflow caused by overlarge data volume, the processor can automatically group json data, and finally, a complete json data set is divided into a plurality of small json data groups, so that the transfer is facilitated, and the performance is improved; in order to realize reasonable grouping of json, data in a table needs to be sorted firstly, then extracted in segments, the size of each segment can be manually configured by a user (the default is that 1 ten thousand pieces of data are in a group), and finally the extracted data is converted into json character strings.
S02: the JCleanprocessor analyzes dirty data and invalid data in the json data set and deletes the dirty data and the invalid data;
because of the existence of dirty data and invalid data, the JCleanprocessor automatically analyzes all data, finds out data which does not meet the requirements, and deletes the data from the json result set;
the specific process is as follows: performing cleaning rule check on all data in batch according to the json result set generated by the last processor, for example, if the cleaning rule is to judge whether the age is greater than 18 years, finding age information from the json result set, then comparing the age information with the age of 18 years, if so, retaining the data, and if not, deleting the data from the whole json result set;
s03: the JTransformprocessor converts data in the json data set according to different data types;
s04: saving json data converted by the MultiPutHBaseJSON processor into a mysql database;
specifically, a MultiPutHBaseJSON processor writes json data into a mysql database in batches; the MultiPutHBaseJSON processor writes a plurality of pieces of data into mysql at one time by adopting a batch processing technology, batch submission is achieved, and the writing times of a database are reduced; meanwhile, data exception processing is also carried out, when storage fails, data rollback is achieved, and all error data can be recorded.
The method comprises the following steps of monitoring a data extraction task by adopting a JMonitorProcessor processor; the jmonitor processor completely records the execution condition of the whole extraction task, including the start time and the end time of the task, the amount of extracted data, the amount of cleaned data, the amount of converted data, the amount of warehoused data, the amount of error data and other information.
It should be noted that, in the above system embodiment, each included unit is only divided according to functional logic, but is not limited to the above division as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
In addition, it is understood by those skilled in the art that all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing associated hardware, and the corresponding program may be stored in a computer-readable storage medium.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (5)

1. A data extraction method from Hbase to mysql based on json is characterized by comprising the following steps:
s00: configuring HBase data source information, mysql target data source information, an extraction strategy, a cleaning rule, a conversion rule and a field mapping relation on a visual interface;
the configuration of the HBase data source information further comprises the steps of selecting the HBase data source, the table name and the column family or the column information to be extracted;
s01: executing a query on a visual interface to generate a json data set;
s02: the JCleanprocessor analyzes dirty data and invalid data in the json data set and deletes the dirty data and the invalid data;
s03: the JTransformprocessor converts data in the json data set according to different data types;
s04: json data converted by the MultiPutHBaseJSON processor is stored in the mysql database.
2. The method for extracting data from hbsase to mysql based on json as claimed in claim 1, wherein S00 specifically includes the following:
reading data by adopting a MultiGetHBase processor according to the specified HBase data source information and the extraction strategy;
the extraction strategy comprises incremental extraction and full extraction; when the increment extraction is executed, extracting data changed after the last task is finished; when the full amount extraction is performed, all data is extracted.
3. The json-based Hbase-to-mysql data extraction method of claim 3, wherein the MultiGetHBase processor separates the extracted json dataset into json data sets.
4. The json-based data extraction method from Hbase to mysql as claimed in claim 1, further comprising monitoring the data extraction task using a JMonitorProcessor processor.
5. The json-based Hbase-to-mysql data extraction method of claim 1, wherein the MultiPutHBaseJSON processor writes json data to the mysql database in batches.
CN201911137971.XA 2019-11-20 2019-11-20 Data extraction method from Hbase to mysql based on json Pending CN110633271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911137971.XA CN110633271A (en) 2019-11-20 2019-11-20 Data extraction method from Hbase to mysql based on json

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911137971.XA CN110633271A (en) 2019-11-20 2019-11-20 Data extraction method from Hbase to mysql based on json

Publications (1)

Publication Number Publication Date
CN110633271A true CN110633271A (en) 2019-12-31

Family

ID=68979468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911137971.XA Pending CN110633271A (en) 2019-11-20 2019-11-20 Data extraction method from Hbase to mysql based on json

Country Status (1)

Country Link
CN (1) CN110633271A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694824A (en) * 2020-05-25 2020-09-22 智强通达科技(北京)有限公司 Method for mapping and cleaning oil data chain

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069551A (en) * 2019-04-25 2019-07-30 江南大学 Medical Devices O&M information excavating analysis system and its application method based on Spark
CN110196876A (en) * 2019-06-05 2019-09-03 浪潮软件股份有限公司 A method of it is isolated tool based on web administration and scheduling Kettle

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069551A (en) * 2019-04-25 2019-07-30 江南大学 Medical Devices O&M information excavating analysis system and its application method based on Spark
CN110196876A (en) * 2019-06-05 2019-09-03 浪潮软件股份有限公司 A method of it is isolated tool based on web administration and scheduling Kettle

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
丿青辞: "从Hbase导入数据至Mysql", 《博客园:HTTPS://WWW.CNBLOGS.COM/AUTUMNNIGHT/P/10938709.HTML》 *
崔有文 等: "基于KETTLE的数据集成研究", 《计算机技术与发展》 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694824A (en) * 2020-05-25 2020-09-22 智强通达科技(北京)有限公司 Method for mapping and cleaning oil data chain

Similar Documents

Publication Publication Date Title
US11023448B2 (en) Data scrubbing method and apparatus, and computer readable storage medium
CN111046034B (en) Method and system for managing memory data and maintaining data in memory
CN102129458B (en) Method and device for storing relational database
US10452491B2 (en) Scalable log partitioning system
US8924373B2 (en) Query plans with parameter markers in place of object identifiers
CN112286941B (en) Big data synchronization method and device based on Binlog + HBase + Hive
US9619492B2 (en) Data migration
CN107301214B (en) Data migration method and device in HIVE and terminal equipment
CN109376196B (en) Method and device for batch synchronization of redo logs
CN103593440B (en) The reading/writing method and device of journal file
CN111324610A (en) Data synchronization method and device
US10671586B2 (en) Optimal sort key compression and index rebuilding
AU2018290753B2 (en) Systems and methods of creation and deletion of tenants within a database
WO2019161645A1 (en) Shell-based data table extraction method, terminal, device, and storage medium
JP2016539449A (en) Database implementation method
US20180150536A1 (en) Instance-based distributed data recovery method and apparatus
CN110245134B (en) Increment synchronization method applied to search service
CN105900093A (en) Keyvalue database data table updating method and data table updating device
CN108121564A (en) A kind of json data Fast synchronization and the method and system device of control
CN101833511A (en) Data management method, device and system
CN110633271A (en) Data extraction method from Hbase to mysql based on json
CN113760902A (en) Data splitting method, device, equipment, medium and program product
CN113094442A (en) Full data synchronization method, device, equipment and medium
CN116010345A (en) Method, device and equipment for realizing table service scheme of flow batch integrated data lake
JP2015130165A (en) Automated compilation of graph input for hypergraph solver

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191231

RJ01 Rejection of invention patent application after publication