CN110209680A - Data-updating method, device and electronic device based on Hive external table - Google Patents
- Publication number
- CN110209680A (application number CN201910340498.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- new data
- updated
- hive
- tables
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2379—Updates performed during online database operations; commit processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Abstract
The present invention provides a data updating method, apparatus and electronic device based on Hive external tables. The method comprises: extracting an update data table from a target business party, wherein the table name of the update data table conforms to a preset table-name update rule; uploading the update data table to a target directory of the HDFS file system; searching the Hive database, according to the table-name update rule, for the data table to be updated that matches the table name of the update data table, and performing a drop-table operation on it, wherein the data table to be updated is an external table created in the Hive database in advance whose table name conforms to the table-name update rule; and performing a create-external-table operation in the Hive database for the update data table, wherein the external table path of the create-external-table operation is the path of the target directory. The invention solves the technical problem that prior-art data update synchronization methods take a long time to run.
Description
Technical field
The present invention relates to the field of database technology, and in particular to a data updating method, apparatus and electronic device based on Hive external tables.
Background
When a Hive database performs data updates, internal (managed) tables are normally used. If the business party changes a table's structure and the change is not applied to the Hive database in time, then once the latest data has been extracted from the business party, the Hive database parses the data incorrectly during the update and the columns become misaligned. For example, suppose the data stored in the Hive database is:
b    | c
test | test
test | Test
and the latest table on the business side is:
a | b    | c
1 | test | test
2 | test | Test
Then, when the Hive database parses the latest table from the business party, the columns are misaligned, as follows:
b | c
1 | test
2 | Test
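The misalignment can be reproduced with a minimal Python sketch. It assumes, for illustration only, that fields are mapped onto the registered columns purely by position and that surplus trailing fields are ignored; `parse_positionally` and the variable names are hypothetical and not part of any Hive API.

```python
def parse_positionally(column_names, fields):
    """Map fields onto columns by position; surplus trailing fields are dropped."""
    return dict(zip(column_names, fields))


# Table structure still registered in the Hive database (column "a" is missing).
stale_columns = ["b", "c"]

# First row of the latest source table, whose structure is now (a, b, c).
source_row = ["1", "test", "test"]

parsed = parse_positionally(stale_columns, source_row)
print(parsed)  # {'b': '1', 'c': 'test'}
```

The value that belongs to column a lands in column b, and each later value shifts one column to the left.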
If the table structure in the Hive database is modified instead, then because structure operations and data operations on an internal table are not separated, modifying the table structure also causes the data in the corresponding column to be modified, which adds steps to the operation.
The existing solution is to check whether the table structure on the business side matches that of the corresponding table in the Hive database. If they differ, the table currently stored in the Hive database is dropped (dropping an internal table in Hive also deletes its data files), the table structure is synchronized, and then the data is synchronized. This procedure is cumbersome, and when the data volume is large it takes a relatively long time.
No effective solution has yet been proposed for the technical problem that prior-art data update synchronization methods take a long time to run.
Summary of the invention
Embodiments of the present invention provide a data updating method, apparatus and electronic device based on Hive external tables, so as to at least solve the technical problem that prior-art data update synchronization methods take a long time to run.
According to one embodiment of the present invention, a data updating method based on Hive external tables is provided, comprising: extracting an update data table from a target business party, wherein the table name of the update data table conforms to a preset table-name update rule; uploading the update data table to a target directory of the HDFS file system; searching the Hive database, according to the table-name update rule, for the data table to be updated that matches the table name of the update data table, and performing a drop-table operation on the data table to be updated, wherein the data table to be updated is an external table created in the Hive database in advance whose table name conforms to the table-name update rule; and performing a create-external-table operation in the Hive database for the update data table, wherein the external table path of the create-external-table operation is the path of the target directory.
Further, uploading the update data table to the target directory of the HDFS file system comprises: creating multiple partition directories in the target directory according to a preset partitioning scheme; and uploading the update data tables to their corresponding partition directories. Performing the create-external-table operation in the Hive database comprises: parsing the partition directories contained in the target directory; and creating multiple partitioned external tables in the Hive database according to the parsing result, wherein the multiple partitioned external tables correspond one-to-one to the multiple partition directories.
Further, after the drop-table operation is performed in the Hive database on the data table to be updated that corresponds to the update data table, the method further comprises: deleting the data files of the data table to be updated from the HDFS file system.
Further, uploading the update data table to the target directory of the HDFS file system comprises: uploading the update data table to an SFTP server; and uploading the update data table from the SFTP server to the corresponding target directory of the HDFS file system.
Further, extracting the update data table from the target business party comprises: extracting data from the database of the target business party using the Kettle tool; and storing the extracted data in the update data table.
In the technical solution provided by any of the above method embodiments, the external-table storage structure is used: during an update, the update data table is first stored in the HDFS file system, so that only the metadata of the external table in the Hive database needs to be deleted, and a new external table pointing to the update data table in the HDFS file system is then created in the Hive database. This greatly reduces the time taken by data synchronization, simplifies the data update procedure, and solves the technical problem in the related art that data update synchronization methods take a long time to run.
According to another embodiment of the present invention, a data updating apparatus based on Hive external tables is provided, comprising: an extraction module, configured to extract an update data table from a target business party, wherein the table name of the update data table conforms to a preset table-name update rule; an upload module, configured to upload the update data table to a target directory of the HDFS file system; a first deletion module, configured to search the Hive database, according to the table-name update rule, for the data table to be updated that matches the table name of the update data table, and to perform a drop-table operation on the data table to be updated, wherein the data table to be updated is an external table created in the Hive database in advance whose table name conforms to the table-name update rule; and a creation module, configured to perform a create-external-table operation in the Hive database for the update data table, wherein the external table path of the create-external-table operation is the path of the target directory.
Further, the upload module comprises: a first creation unit, configured to create multiple partition directories in the target directory according to a preset partitioning scheme; and a first upload unit, configured to upload the update data tables to their corresponding partition directories. The creation module comprises: a parsing unit, configured to parse the partition directories contained in the target directory; and a second creation unit, configured to create multiple partitioned external tables in the Hive database according to the parsing result, wherein the multiple partitioned external tables correspond one-to-one to the multiple partition directories.
Further, the apparatus also comprises: a second deletion module, configured to, after the drop-table operation is performed on the data table to be updated, delete the data files that are stored in the HDFS file system and correspond to the data table to be updated.
Further, the upload module comprises: a second upload unit, configured to upload the update data table to an SFTP server; and a third upload unit, configured to upload the update data table from the SFTP server to the corresponding target directory of the HDFS file system.
Further, the extraction module comprises: an extraction unit, configured to extract data from the database of the target business party using the Kettle tool; and a storage unit, configured to store the extracted data in the update data table.
In the technical solution provided by any of the above apparatus embodiments, the external-table storage structure is used: during an update, the update data table is first stored in the HDFS file system, so that only the metadata of the external table in the Hive database needs to be deleted, and a new external table pointing to the update data table in the HDFS file system is then created in the Hive database. This greatly reduces the time taken by data synchronization, simplifies the data update procedure, and solves the technical problem in the related art that data update synchronization methods take a long time to run.
According to yet another embodiment of the present invention, a storage medium is also provided. A computer program is stored in the storage medium, and the computer program is configured to perform, when run, the steps of any of the above method embodiments and to achieve the technical effects of the corresponding method embodiment.
According to yet another embodiment of the present invention, an electronic device is also provided, comprising a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps of any of the above method embodiments and to achieve the technical effects of the corresponding method embodiment.
With the present invention, the external-table storage structure is used: during an update, the update data table is first stored in the HDFS file system, so that only the metadata of the external table in the Hive database needs to be deleted, and a new external table pointing to the update data table in the HDFS file system is then created in the Hive database. This greatly reduces the time taken by data synchronization, simplifies the data update procedure, and solves the technical problem in the related art that data update synchronization methods take a long time to run.
Brief description of the drawings
The drawings described here provide a further understanding of the present invention and form a part of this application. The illustrative embodiments of the present invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a data updating method based on Hive external tables according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a data updating apparatus based on Hive external tables according to an embodiment of the present invention;
Fig. 3 is a hardware block diagram of a computer device according to an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of this application. Where no conflict arises, the embodiments of this application and the features in them may be combined with one another. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of this application without creative effort shall fall within the scope of protection of this application.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings of this application are used to distinguish similar objects and do not describe a particular order or sequence. It should be understood that terms used in this way are interchangeable where appropriate, so that the embodiments of this application described here can be implemented in orders other than those illustrated or described here. In addition, the terms "comprise" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
Embodiment 1
This embodiment provides a data updating method based on Hive external tables, which can run on a computer device, server, mobile terminal, handheld terminal or similar computing device. Running on different computing devices changes only the entity that executes the solution; those skilled in the art will appreciate that running on different computing devices produces the same technical effect.
The data updating method based on Hive external tables provided in this embodiment is used to synchronize data updates to the Hive database. Specifically, the data tables in the Hive database use the external-table storage structure. During an update, the update data table is first stored in the HDFS file system, so that only the metadata of the external table in the Hive database needs to be deleted, and a new external table pointing to the update data table in the HDFS file system is then created in the Hive database. This greatly reduces the time taken by data synchronization and simplifies the data update procedure.
As shown in Fig. 1, the data updating method based on Hive external tables provided in this embodiment includes the following steps:
Step 101: extract an update data table from the target business party, wherein the table name of the update data table conforms to a preset table-name update rule.
The business party is the data source for the data updating method of this embodiment. Its data can be of any type and may change over time, for example a bank's customer information. The method can perform updates according to the actual needs of the target business party; the purpose of an update is to extract the data to be updated from the data source (the target business party) and apply it to the Hive database. An update may be triggered on a fixed schedule or in response to a request from the target business party; the embodiments of the present invention do not specifically limit this. When a data table needs to be updated, execution of the method steps provided in the embodiments of the present invention begins.
The update table can be extracted with a data extraction tool, for example an Extract-Transform-Load (ETL) tool of the kind used in data warehousing. An ETL tool can extract data from a data source, transform it, and load it into a target. In the ETL tool, a data extraction task can be created that extracts the data of the target business party and writes it to a file in a preset format, yielding the update data table.
The table name of the update data table is assigned according to the preset table-name update rule. For example, tables can be named using the format "xxbank_customerdata_[update date]", where the leading field "xxbank_customerdata_" is constant and the update date is the current date, so that data tables updated on different dates can be distinguished by name. Correspondingly, the data table to be updated that corresponds to an update data table can also be found using the table-name update rule.
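The naming rule can be sketched in Python as follows. The prefix comes from the example above; the eight-digit yyyymmdd date format is an assumption based on the example name that appears later in the text, and the function names are hypothetical.

```python
from datetime import date

TABLE_PREFIX = "xxbank_customerdata_"  # constant leading field from the example


def update_table_name(update_date, prefix=TABLE_PREFIX):
    """Build the table name for an update performed on update_date."""
    return f"{prefix}{update_date:%Y%m%d}"


def follows_rule(table_name, prefix=TABLE_PREFIX):
    """Check that a name is the prefix followed by an eight-digit date."""
    suffix = table_name[len(prefix):]
    return (
        table_name.startswith(prefix)
        and len(suffix) == 8
        and suffix.isdecimal()
    )


name = update_table_name(date(2019, 1, 24))
print(name)                # xxbank_customerdata_20190124
print(follows_rule(name))  # True
```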
Optionally, the Kettle tool can be chosen to carry out the data extraction task. Kettle is an open-source ETL tool written in pure Java that runs on Windows, Linux and Unix; a Job in the Kettle tool set describes the specific content of a data operation. A Job in the ETL tool can therefore be used to extract the update data from the target business party and write it into the update data table. Correspondingly, extracting the update data table from the target business party may include: extracting data from the database of the target business party using the Kettle tool, and storing the extracted data in the update data table.
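The extract-and-store step can be illustrated without Kettle. The following Python sketch is a stand-in, not a Kettle Job: it assumes the source is reachable through a DB-API connection (an in-memory SQLite database is used here purely for demonstration) and that the update data table is a tab-delimited text file. The table, column and file names are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile


def extract_to_file(conn, query, out_path, delimiter="\t"):
    """Run query on the source connection and write the rows to a delimited file.

    Returns the column names and the number of rows written.
    """
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
        for row in cursor:
            writer.writerow(row)
            rows_written += 1
    return columns, rows_written


# Stand-in source database with the (a, b, c) structure from the background example.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (a INTEGER, b TEXT, c TEXT)")
source.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "test", "test"), (2, "test", "Test")],
)

out_path = os.path.join(tempfile.mkdtemp(), "xxbank_customerdata_20190124.txt")
columns, count = extract_to_file(source, "SELECT a, b, c FROM customer ORDER BY a", out_path)
print(columns, count)  # ['a', 'b', 'c'] 2
```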
Step 102: upload the update data table to the target directory of the HDFS file system.
After the update data table has been extracted from the target business party, it is uploaded to the target directory of the HDFS (Hadoop Distributed File System) file system that corresponds to the Hive database. A table must be uploaded to the HDFS file system before it can be loaded into the Hive database.
It should be noted that the target directory can be any specified directory on HDFS except "/user/hive/warehouse". That is, the update data table can be uploaded to any pre-specified directory of the HDFS file system other than "/user/hive/warehouse". This is because "/user/hive/warehouse" is the default location in the HDFS file system for storing internal table data and is not used for storing external table data.
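The directory restriction can be checked before any upload is attempted. The Python sketch below only builds argument lists for the standard Hadoop shell commands `hdfs dfs -mkdir -p` and `hdfs dfs -put -f`; it does not run them. The function names and example paths are hypothetical, and treating subdirectories of the warehouse directory as invalid is an assumption that goes slightly beyond the text.

```python
import posixpath

HIVE_WAREHOUSE_DIR = "/user/hive/warehouse"  # default internal-table location named in the text


def is_valid_target_dir(path, warehouse=HIVE_WAREHOUSE_DIR):
    """Accept absolute HDFS paths that are neither the warehouse directory nor inside it."""
    normalized = posixpath.normpath(path)
    if not normalized.startswith("/"):
        return False
    return normalized != warehouse and not normalized.startswith(warehouse + "/")


def build_upload_commands(local_file, target_dir):
    """Return the commands that create the target directory and upload one file."""
    if not is_valid_target_dir(target_dir):
        raise ValueError(f"target directory not allowed: {target_dir!r}")
    return [
        ["hdfs", "dfs", "-mkdir", "-p", target_dir],
        ["hdfs", "dfs", "-put", "-f", local_file, target_dir],
    ]


print(is_valid_target_dir("/data/external/customerdata"))    # True
print(is_valid_target_dir("/user/hive/warehouse/customer"))  # False
```

The argument lists could be passed to `subprocess.run` on a host where the Hadoop client is installed.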
Step 103: search the Hive database, according to the table-name update rule, for the data table to be updated that matches the table name of the update data table, and perform a drop-table operation on the data table to be updated.
The embodiments of the present invention are used to update data tables in the Hive database: because the business party's data changes, the corresponding data in the Hive database must be updated with it. In this embodiment, the business party's data tables are all stored in the Hive database as external tables, so the data table from the previous update also exists in the Hive database as an external table. To update the data in the Hive database, the data table to be updated that corresponds to the update data table is deleted from the Hive database; that is, a drop-table operation (DROP) is performed to delete the business party's external table that was stored in the Hive database by the previous update. The data table to be updated is an external table created in the Hive database in advance.
The data table to be updated that matches the table name of the update data table can be found using the table-name update rule. For example, suppose every update names its table using the format "xxbank_customerdata_[update date]" and the table for the current update is named "yyyy_customerdata_20190124". The Hive database is then searched for a data table whose name contains the field "yyyy_customerdata"; once found, that table is determined to be the data table to be updated that corresponds to the update data table.
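The lookup can be sketched in Python as a prefix match over the table names already present in the Hive database (for example, the output of SHOW TABLES). The helper names are hypothetical, and requiring the eight-digit date suffix is an assumption that is stricter than the field match described above.

```python
import re


def name_prefix(update_table_name):
    """Return the constant part of a name that ends in an eight-digit date."""
    match = re.fullmatch(r"(.+_)\d{8}", update_table_name)
    if match is None:
        raise ValueError(f"name does not follow the rule: {update_table_name!r}")
    return match.group(1)


def find_tables_to_update(update_table_name, hive_table_names):
    """Select existing tables that share the prefix and end in an eight-digit date."""
    pattern = re.compile(re.escape(name_prefix(update_table_name)) + r"\d{8}")
    return [name for name in hive_table_names if pattern.fullmatch(name)]


def drop_statements(table_names):
    """Build one DROP TABLE statement per matched table."""
    return [f"DROP TABLE IF EXISTS `{name}`" for name in table_names]


existing = ["yyyy_customerdata_20190123", "yyyy_orders_20190123", "yyyy_customerdata_tmp"]
matched = find_tables_to_update("yyyy_customerdata_20190124", existing)
print(drop_statements(matched))  # ['DROP TABLE IF EXISTS `yyyy_customerdata_20190123`']
```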
With external tables, performing a drop-external-table operation in the Hive database deletes only the metadata that describes the external table in the Hive database. Metadata is data that describes other data; it can be thought of as an electronic catalog that describes the attributes of other data, such as where that data is stored. Therefore, after the metadata in the Hive database is deleted, the data files in the HDFS file system that the metadata describes are not deleted and remain in the HDFS file system. As a result, the drop-external-table operation completes quickly, which allows the subsequent create-external-table operation to proceed promptly.
Optionally, because dropping an external table in the Hive database does not delete the data files in the HDFS file system, the data table to be updated can be deleted from the HDFS file system after the drop-external-table operation has been performed in the Hive database. This prevents the data table to be updated that still resides in the HDFS file system from occupying storage space, and prevents a later update from mistakenly storing an update data table in the directory of the earlier data table; it thereby completely removes the data files of the data table to be updated.
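The optional cleanup can be sketched as a guarded command builder. The Python code below only constructs the argument list for the standard `hdfs dfs -rm -r` command (with the optional `-skipTrash` flag); the guard conditions, function name and paths are assumptions added for illustration.

```python
import posixpath


def build_cleanup_command(old_data_dir, new_target_dir, skip_trash=False):
    """Build the command that removes the data files of the dropped external table.

    Refuses paths that would also remove the freshly uploaded update data.
    """
    old = posixpath.normpath(old_data_dir)
    new = posixpath.normpath(new_target_dir)
    if not old.startswith("/") or old == "/":
        raise ValueError(f"refusing to delete {old_data_dir!r}")
    if new == old or new.startswith(old + "/"):
        raise ValueError("old data directory contains the new target directory")
    command = ["hdfs", "dfs", "-rm", "-r"]
    if skip_trash:
        command.append("-skipTrash")
    command.append(old)
    return command


print(build_cleanup_command("/data/external/customerdata_20190123",
                            "/data/external/customerdata_20190124"))
# ['hdfs', 'dfs', '-rm', '-r', '/data/external/customerdata_20190123']
```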
Step 104: perform a create-external-table operation in the Hive database for the update data table.
After the drop-external-table operation has been performed on the data table to be updated in the Hive database, a create-external-table operation is performed in the Hive database. The external table path of the create-external-table operation is the path of the target directory; that is, metadata describing the external table is created in the Hive database.
When an external table is created in the Hive database, the EXTERNAL keyword of the CREATE TABLE statement (CREATE EXTERNAL TABLE) can be used. An external table is a table whose data is not stored inside the Hive database: the data is an operating-system file in the HDFS file system. By creating metadata that describes the external table in the Hive database, the Hive database can perform a limited set of (read-only) operations on it, so that an operating-system file can be treated as a read-only database table and its data accessed as if it were stored in an ordinary database table.
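A create-external-table statement whose LOCATION is the target directory might be generated as in the following Python sketch. The column list, the tab delimiter, the TEXTFILE storage format and the function name are assumptions for illustration; the text does not specify the file format of the update data table.

```python
def create_external_table_ddl(table, columns, location, delimiter="\\t"):
    """Build a CREATE EXTERNAL TABLE statement that points at an HDFS directory.

    columns is a list of (name, hive_type) pairs.
    """
    column_sql = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n  {column_sql}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


ddl = create_external_table_ddl(
    "xxbank_customerdata_20190124",
    [("a", "INT"), ("b", "STRING"), ("c", "STRING")],
    "/data/external/customerdata",
)
print(ddl)
```

Because only metadata is written, the statement does not move or copy the files already uploaded to the target directory.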
For example, queries and joins can be run against an external table. Access to an external table is done through SQL statements, without first loading the external table's data into the Hive database, and parallel operation is supported.
Optionally, the update data can be stored in the Hive database in the form of partitioned external tables. The partitioning scheme can be defined in advance when the Hive database is created, and partitions can be based on a field of each data record. For example, data can be partitioned by day: data from the same day is stored under the same partition directory of the target directory, and data from different days is stored under the partition directories of the target directory that correspond to their respective days.
In line with the partitioning scheme predefined in the Hive database, when the update data is uploaded to the target directory of the HDFS file system, multiple partition directories are also created in the target directory according to the Hive database's preset partitioning scheme, and the update data tables are then uploaded to their corresponding partition directories. For example, during extraction, data entries whose date field is "2018-12-13" are written into the update data table that is to be uploaded to the partition directory for December 13, 2018; once all data for a data table has been extracted and written, the table is uploaded to its corresponding partition directory.
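Routing records to per-day partition directories can be sketched as follows. The `dt=value` directory naming follows the common Hive convention and is an assumption, as are the field name `update_date`, the function names and the target path.

```python
from collections import defaultdict
import posixpath


def partition_dir(target_dir, day, key="dt"):
    """Return the partition directory for one day under the target directory."""
    return posixpath.join(target_dir, f"{key}={day}")


def group_by_partition(rows, date_field, target_dir):
    """Group records by the partition directory that their date field maps to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[partition_dir(target_dir, row[date_field])].append(row)
    return dict(grouped)


rows = [
    {"id": 1, "update_date": "2018-12-13"},
    {"id": 2, "update_date": "2018-12-13"},
    {"id": 3, "update_date": "2018-12-14"},
]
grouped = group_by_partition(rows, "update_date", "/data/external/customerdata")
print(sorted(grouped))
# ['/data/external/customerdata/dt=2018-12-13', '/data/external/customerdata/dt=2018-12-14']
```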
Correspondingly, when partitioned external tables are created in the Hive database, the partition directories contained in the target directory of the HDFS file system are parsed, and the corresponding partitioned external tables (metadata) are created in the Hive database according to the parsing result. It should be noted that the multiple partitioned external tables (metadata) in the Hive database correspond one-to-one to the multiple partition directories.
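One possible reading of this step is sketched below in Python: each partition directory is registered as one partition of a single partitioned external table through Hive's ALTER TABLE ... ADD PARTITION statement. Modeling the one-to-one correspondence this way is an assumption, as are the `dt=value` naming, the function names and the paths.

```python
import posixpath


def parse_partition_dirs(paths, key="dt"):
    """Return sorted (value, path) pairs for directories named key=value."""
    partitions = []
    for path in paths:
        clean = path.rstrip("/")
        name, sep, value = posixpath.basename(clean).partition("=")
        if sep and name == key and value:
            partitions.append((value, clean))
    return sorted(partitions)


def add_partition_statements(table, partitions, key="dt"):
    """Build one ADD PARTITION statement per parsed partition directory."""
    return [
        f"ALTER TABLE `{table}` ADD IF NOT EXISTS "
        f"PARTITION ({key}='{value}') LOCATION '{location}'"
        for value, location in partitions
    ]


listing = [
    "/data/external/customerdata/dt=2018-12-14/",
    "/data/external/customerdata/dt=2018-12-13",
    "/data/external/customerdata/_tmp",
]
partitions = parse_partition_dirs(listing)
for statement in add_partition_statements("xxbank_customerdata", partitions):
    print(statement)
```

Directories that do not match the `key=value` pattern, such as the `_tmp` entry above, are skipped.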
Optionally, when partitioned external tables are created, the system can be configured so that as soon as the update data table for any partition directory in the HDFS file system has finished uploading, the corresponding partitioned external table is created in the Hive database and pointed at that partition directory, without waiting for the update data tables of all partition directories to finish uploading.
Optionally, because there may be multiple update data tables, the upload can be configured so that each update data table is uploaded to the HDFS file system as soon as it has been extracted from the target business party, without waiting for all update data tables to be extracted. Alternatively, to lower the probability of the connection to the HDFS file system being dropped and to reduce the number of connections to it, the update data tables can be divided into multiple groups according to the concurrent connection capacity of the HDFS file system, each group containing multiple tables; once all tables in a group have been extracted, the group is uploaded to the HDFS file system.
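The grouping alternative amounts to splitting the list of update data tables into fixed-size batches. A minimal Python sketch follows; how the group size is derived from the concurrent connection capacity is left open in the text, so it is passed in as a parameter here, and the file names are hypothetical.

```python
def group_tables(table_files, group_size):
    """Split the update data tables into consecutive groups of at most group_size."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [
        table_files[start:start + group_size]
        for start in range(0, len(table_files), group_size)
    ]


files = ["t1.txt", "t2.txt", "t3.txt", "t4.txt", "t5.txt"]
print(group_tables(files, 2))  # [['t1.txt', 't2.txt'], ['t3.txt', 't4.txt'], ['t5.txt']]
```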
As an optional implementation, uploading the update data table to the target directory of the HDFS file system can proceed as follows: the update data of the target business party is first extracted and written into a local update data table; the update data table is then uploaded to an SFTP server; and finally the update data table is uploaded from the SFTP server to the corresponding target directory of the HDFS file system.
It should be noted that when the update data table is uploaded from the SFTP server to the HDFS file system, any of the optional implementations described above for the step of uploading the update data table to the target directory of the HDFS file system may be used, to achieve the technical effects of the corresponding implementation.
Corresponding to the above optional implementation, a concrete example of the data import flow using external tables is: Kettle generates a file -> SFTP server -> HDFS file system. That is, first, the Kettle tool extracts the update data from the target business party and generates the update data table. Second, the update data table is uploaded to the SFTP server. Finally, the update data table is uploaded to the HDFS file system.
By comparison, when data is imported using the internal-table storage structure, after the data table has been uploaded to the HDFS file system, a data connection must also be established in the Hive database and the update data table imported into the data warehouse location of the Hive database, which makes the import procedure cumbersome.
In addition, with the internal-table storage structure, after importing the data the Hive database must compare the data structure before the update (the structure to be updated) with the data structure after the update, to determine whether the original internal table needs to be dropped and the update data imported again; otherwise, updating directly causes the column misalignment problem.
By contrast, when update data is imported in the form of an external table, the update data is stored in the HDFS file system. In the Hive library, the metadata describing the data table that has not yet been updated can be deleted directly, and new metadata describing the external table can then be created. This greatly simplifies the operation of updating a data table, optimizes the overall management process, simplifies the repair of data problems, and improves data update efficiency.
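The metadata-only refresh described above can be sketched as two HiveQL statements: drop the old table definition, then create an external table whose location is the target directory. The sketch below only generates the statement text. The database name, table name, column list, and the comma-delimited text format are assumptions made for illustration and are not specified in the source.

```python
from typing import List, Tuple


def refresh_external_table_sql(
    database: str,
    table: str,
    columns: List[Tuple[str, str]],
    location: str,
    delimiter: str = ",",
) -> List[str]:
    """Build the DROP and CREATE EXTERNAL TABLE statements for one refresh."""
    qualified = f"`{database}`.`{table}`"
    column_defs = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    # Dropping an external table removes only its metadata in Hive.
    drop_stmt = f"DROP TABLE IF EXISTS {qualified}"
    # The new table definition points at the directory that already holds the data.
    create_stmt = (
        f"CREATE EXTERNAL TABLE {qualified} (\n  {column_defs}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )
    return [drop_stmt, create_stmt]


# Hypothetical schema and path, used only for illustration.
statements = refresh_external_table_sql(
    "ods", "orders", [("id", "BIGINT"), ("amount", "DOUBLE")], "/warehouse/ext/orders"
)
for statement in statements:
    print(statement + ";")
```

Because neither statement moves data, the cost of the refresh is independent of the size of the files in the target directory.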
It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one herein.
From the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the method described in each embodiment of the present invention.
Embodiment 2
This embodiment further provides a data update apparatus based on a Hive external table. The apparatus is used to implement Embodiment 1 and its preferred embodiments. For terms or implementations not described in detail in this embodiment, refer to the related description in Embodiment 1; what has already been described is not repeated.
As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 2 is a schematic diagram of a data update apparatus based on a Hive external table according to an embodiment of the present invention. As shown in Fig. 2, the apparatus includes an extraction module 10, an upload module 20, a first deletion module 30, and a creation module 40.
The extraction module is configured to extract an update data table from the target business side, where the table name of the update data table conforms to a preset table name update rule. The upload module is configured to upload the update data table to a target directory of the HDFS file system. The first deletion module is configured to look up, in the Hive library and according to the table name update rule, the to-be-updated data table that matches the table name of the update data table, and to execute a delete-table operation on the to-be-updated data table, where the to-be-updated data table is an external table that was created in advance in the Hive library and whose table name conforms to the table name update rule. The creation module is configured to execute, in the Hive library, a create-external-table operation for the update data table, where the external table path of the create-external-table operation is the path of the target directory.
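The lookup performed by the first deletion module can be illustrated with a small sketch. The table name update rule itself is not spelled out in this passage, so the rule used here, a base name followed by an eight-digit date suffix, is purely an assumption, as are the function and table names.

```python
import re
from typing import Iterable, List

# Assumed rule for illustration only: "<base>_<yyyymmdd>", e.g. "orders_20190425".
NAME_RULE = re.compile(r"^(?P<base>[a-z][a-z0-9_]*)_(?P<date>\d{8})$")


def find_tables_to_update(update_table: str, existing_tables: Iterable[str]) -> List[str]:
    """Return existing tables that follow the rule and share the update table's base name."""
    update_match = NAME_RULE.match(update_table)
    if update_match is None:
        raise ValueError(f"{update_table!r} does not follow the assumed naming rule")
    base = update_match.group("base")
    matches = []
    for name in existing_tables:
        existing_match = NAME_RULE.match(name)
        # Tables that do not follow the rule are never candidates for deletion.
        if existing_match is not None and existing_match.group("base") == base:
            matches.append(name)
    return matches


existing = ["orders_20190418", "orders_detail_20190418", "users_20190418", "tmp"]
to_drop = find_tables_to_update("orders_20190425", existing)
print(to_drop)
```

Restricting candidates to names that satisfy the rule keeps the delete-table operation from touching tables that were not created by this update process.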
Optionally, the upload module includes: a first creating unit, configured to create multiple partition directories in the target directory according to a preset partitioning mode; and a first uploading unit, configured to upload the update data tables into the corresponding partition directories respectively. The creation module includes: a parsing unit, configured to parse the partition directories contained in the target directory; and a second creating unit, configured to create multiple external partition tables in the Hive library according to the parsing result, where the multiple external partition tables correspond one-to-one to the multiple partition directories.
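One way to picture the parse-then-create step is shown below. This is a sketch under stated assumptions: partition directories are assumed to be named `key=value` (the usual Hive layout), and each directory is registered as one partition of a single partitioned external table through `ALTER TABLE ... ADD PARTITION`, which is one possible reading of the one-to-one correspondence described above. The table name and paths are hypothetical.

```python
from typing import Iterable, List


def add_partition_statements(
    table: str, target_dir: str, partition_dirs: Iterable[str]
) -> List[str]:
    """Build one ADD PARTITION statement per partition directory under target_dir."""
    base = target_dir.rstrip("/")
    statements = []
    for directory in partition_dirs:
        # Parse "key=value" directory names such as "dt=2019-04-25".
        key, separator, value = directory.partition("=")
        if not separator or not key or not value:
            raise ValueError(f"{directory!r} is not a key=value partition directory")
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({key}='{value}') "
            f"LOCATION '{base}/{directory}'"
        )
    return statements


# Hypothetical table and directories, used only for illustration.
partition_sql = add_partition_statements(
    "ods.orders", "/warehouse/ext/orders/", ["dt=2019-04-24", "dt=2019-04-25"]
)
for statement in partition_sql:
    print(statement + ";")
```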
Optionally, the apparatus further includes a second deletion module, configured to, after the delete-table operation is executed on the to-be-updated data table, execute a delete operation on the data files that are stored in the HDFS file system and correspond to the to-be-updated data table.
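Because dropping an external table removes only its metadata, the old data files remain in HDFS until they are deleted explicitly. The following sketch builds the corresponding `hdfs dfs -rm -r` command. The guard that restricts deletion to paths under an allowed root is an added safety assumption rather than something the source describes, and the paths are hypothetical.

```python
import posixpath
from typing import List


def build_delete_command(data_path: str, allowed_root: str) -> List[str]:
    """Return the HDFS command that removes the old data files of a dropped table."""
    path = posixpath.normpath(data_path)
    root = posixpath.normpath(allowed_root)
    # Refuse to delete the root itself or anything outside it.
    if not path.startswith(root + "/"):
        raise ValueError(f"{data_path!r} is not inside {allowed_root!r}")
    return ["hdfs", "dfs", "-rm", "-r", path]


# Hypothetical paths, used only for illustration.
delete_command = build_delete_command(
    "/warehouse/ext/orders_20190418/", "/warehouse/ext"
)
print(" ".join(delete_command))
```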
Optionally, the upload module includes: a second uploading unit, configured to upload the update data table to an SFTP server; and a third uploading unit, configured to upload the update data table from the SFTP server to the corresponding target directory of the HDFS file system.
Optionally, the extraction module includes: an extracting unit, configured to extract data from the database of the target business side using the Kettle tool; and a storage unit, configured to store the extracted data in the update data table.
By using the external table storage structure, the embodiment of the present invention first stores the update data table in the HDFS file system when updating, so that the metadata of the external table in the Hive library can be deleted directly when a table is updated, and a new external table pointing to the update data table in the HDFS file system is then created in the Hive library. This greatly reduces the time consumed by data synchronization, simplifies the data update procedure, and solves the technical problem in the related art that data update and synchronization methods take a long time to run.
It should be noted that each of the above modules may be implemented by software or hardware. For the latter, this may be done in, but is not limited to, the following ways: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described can be executed in an order different from the one herein. Alternatively, they can each be fabricated as an individual integrated circuit module, or multiple modules or steps among them can be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Embodiment 3
An embodiment of the present invention further provides a storage medium in which a computer program is stored, where the computer program is configured to execute, when run, the steps in any of the above method embodiments.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media that can store a computer program, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Embodiment 4
An embodiment of the present invention further provides an electronic device, including a memory and a processor, where a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the processor and the input/output device is connected to the processor. Taking a computer device as an example of the electronic device, Fig. 3 is a hardware block diagram of a computer device according to an embodiment of the present invention. As shown in Fig. 3, the mobile terminal may include one or more processors 302 (only one is shown in Fig. 3; the processor 302 may include, but is not limited to, a processing unit such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 304 for storing data. Optionally, the mobile terminal may further include a transmission device 306 for communication functions and an input/output device 308. Those of ordinary skill in the art will understand that the structure shown in Fig. 3 is only illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal may include more or fewer components than shown in Fig. 3, or have a configuration different from that shown in Fig. 3.
The memory 304 may be used to store computer programs, for example software programs and modules of application software, such as the computer program corresponding to the image recognition method in the embodiments of the present invention. By running the computer program stored in the memory 304, the processor 302 executes various functional applications and data processing, that is, implements the above method. The memory 304 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 304 may further include memory located remotely from the processor 302, and such remote memory may be connected to the mobile terminal through a network. Examples of such a network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 306 is used to receive or send data via a network. A specific example of the network may include a wireless network provided by the communication provider of the mobile terminal. In one example, the transmission device 306 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 306 may be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement, or the like made within the principle of the present invention shall fall within the protection scope of the present invention.
Claims (8)
1. A data update method based on a Hive external table, characterized by comprising:
extracting an update data table from a target business side, wherein the table name of the update data table conforms to a preset table name update rule;
uploading the update data table to a target directory of an HDFS file system;
looking up, in a Hive library and according to the table name update rule, a to-be-updated data table that matches the table name of the update data table, and executing a delete-table operation on the to-be-updated data table, wherein the to-be-updated data table is an external table that was created in advance in the Hive library and whose table name conforms to the table name update rule;
executing, in the Hive library, a create-external-table operation for the update data table, wherein the external table path of the create-external-table operation is the path of the target directory.
2. The method according to claim 1, characterized in that:
uploading the update data table to the target directory of the HDFS file system comprises: creating multiple partition directories in the target directory according to a preset partitioning mode; and uploading the update data tables into the corresponding partition directories respectively;
executing the create-external-table operation in the Hive library comprises: parsing the partition directories contained in the target directory; and creating multiple external partition tables in the Hive library according to the parsing result, wherein the multiple external partition tables correspond one-to-one to the multiple partition directories.
3. The method according to claim 1, characterized in that, after the delete-table operation is executed on the to-be-updated data table, the method further comprises:
executing a delete operation on the data files that are stored in the HDFS file system and correspond to the to-be-updated data table.
4. The method according to claim 1, characterized in that uploading the update data table to the target directory of the HDFS file system comprises:
uploading the update data table to an SFTP server;
uploading the update data table from the SFTP server to the corresponding target directory of the HDFS file system.
5. The method according to claim 1, characterized in that extracting the update data table from the target business side comprises:
extracting data from a database of the target business side using the Kettle tool;
storing the extracted data in the update data table.
6. A data update apparatus based on a Hive external table, characterized by comprising:
an extraction module, configured to extract an update data table from a target business side, wherein the table name of the update data table conforms to a preset table name update rule;
an upload module, configured to upload the update data table to a target directory of an HDFS file system;
a first deletion module, configured to look up, in a Hive library and according to the table name update rule, a to-be-updated data table that matches the table name of the update data table, and to execute a delete-table operation on the to-be-updated data table, wherein the to-be-updated data table is an external table that was created in advance in the Hive library and whose table name conforms to the table name update rule;
a creation module, configured to execute, in the Hive library, a create-external-table operation for the update data table, wherein the external table path of the create-external-table operation is the path of the target directory.
7. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to execute, when run, the method according to any one of claims 1 to 5.
8. An electronic device, comprising a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to run the computer program to execute the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910340498.9A CN110209680A (en) | 2019-04-25 | 2019-04-25 | Data-updating method, device and electronic device based on Hive external table |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110209680A true CN110209680A (en) | 2019-09-06 |
Family
ID=67786494
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910340498.9A Pending CN110209680A (en) | 2019-04-25 | 2019-04-25 | Data-updating method, device and electronic device based on Hive external table |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110209680A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140074790A1 (en) * | 2012-09-12 | 2014-03-13 | International Business Machines Corporation | Using a metadata image of a file system and archive instance to restore data objects in the file system |
CN107967316A (en) * | 2017-11-22 | 2018-04-27 | 平安科技(深圳)有限公司 | A kind of method of data synchronization, equipment and computer-readable recording medium |
CN108241724A (en) * | 2017-05-11 | 2018-07-03 | 新华三大数据技术有限公司 | A kind of metadata management method and device |
CN108255909A (en) * | 2017-07-27 | 2018-07-06 | 平安科技(深圳)有限公司 | Tables of data backup method and server based on oracle database |
Non-Patent Citations (1)
Title |
---|
刘如九;张振山;柴天佑;: "一种通用的多数据库间数据抽取方法及应用", 北京交通大学学报, no. 04, 15 August 2008 (2008-08-15) * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111241203A (en) * | 2020-02-10 | 2020-06-05 | 江苏满运软件科技有限公司 | Hive data warehouse synchronization method, system, equipment and storage medium |
CN111984659A (en) * | 2020-07-28 | 2020-11-24 | 招联消费金融有限公司 | Data updating method and device, computer equipment and storage medium |
CN111984659B (en) * | 2020-07-28 | 2023-07-21 | 招联消费金融有限公司 | Data updating method, device, computer equipment and storage medium |
CN113254535A (en) * | 2021-06-08 | 2021-08-13 | 成都新潮传媒集团有限公司 | Method and device for synchronizing data from mongodb to mysql and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||