CN110209680A - Data-updating method, device and electronic device based on Hive external table - Google Patents

Data-updating method, device and electronic device based on Hive external table

Info

Publication number
CN110209680A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910340498.9A
Other languages
Chinese (zh)
Inventor
周之浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Smart Technology Co Ltd
Priority to CN201910340498.9A
Publication of CN110209680A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2379 Updates performed during online database operations; commit processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a data-updating method, device and electronic device based on Hive external tables. The method comprises: extracting an update data table from a target business side, wherein the table name of the update data table conforms to a preset table-name update rule; uploading the update data table to a target directory of the HDFS file system; searching the Hive database, according to the table-name update rule, for the to-be-updated data table whose table name matches that of the update data table, and executing a drop-table operation on the to-be-updated data table, wherein the to-be-updated data table is an external table that was created in advance in the Hive database and whose table name conforms to the table-name update rule; and executing a create-external-table operation in the Hive database for the update data table, wherein the external table path of the create-external-table operation is the path of the target directory. The invention solves the technical problem that prior-art data update and synchronization methods take a long time to run.

Description

Data-updating method, device and electronic device based on Hive external table
Technical field
The present invention relates to the field of database technology, and in particular to a data-updating method, device and electronic device based on Hive external tables.
Background
When a Hive database performs a data update, it usually operates on internal tables. If the business side has changed a table structure and the change has not been propagated to the Hive database in time, then after the latest data is extracted from the business side, parsing it during the Hive update causes the columns to become misaligned. For example, suppose the data stored in the Hive database is as follows:
b      c
test   test
test   Test
and the latest table on the business side is:
a   b      c
1   test   test
2   test   Test
Then, when the Hive database parses the latest table from the business side, the columns become misaligned, as follows:
b   c
1   test
2   Test
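The misalignment can be reproduced with a minimal sketch. It assumes the extracted rows are whitespace-delimited text that is bound to columns purely by position; `parse_row` and the schema lists are hypothetical names used only for illustration, not part of the patent.

```python
def parse_row(line, columns):
    """Split a whitespace-delimited row and bind the values to columns by position."""
    values = line.split()
    # zip() stops at the shorter sequence, so surplus trailing values are dropped.
    return dict(zip(columns, values))


stale_schema = ["b", "c"]        # structure still registered in the Hive database
source_schema = ["a", "b", "c"]  # structure after the business side added column a

row = "1 test test"
misaligned = parse_row(row, stale_schema)   # value of column a lands in column b
aligned = parse_row(row, source_schema)     # every value lands in its own column
```

With the stale schema, the value that belongs to column a is read as column b and every later value shifts one column to the left.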
If the table structure in the Hive database is modified instead, then because structure operations and data operations on an internal table are not separated, modifying the table structure also requires modifying the corresponding column data, which adds steps to the operation.
The existing solution is to check whether the table structure on the business side is consistent with that of the corresponding table in the Hive database. If they are found to be inconsistent, the table currently stored in the Hive database is dropped (dropping an internal table in the Hive database also deletes its data files), the table structure is synchronized, and the data is then synchronized. This procedure is cumbersome and, when the data volume is large, relatively time-consuming.
No effective solution has yet been proposed for the technical problem that prior-art data update and synchronization methods take a long time to run.
Summary of the invention
Embodiments of the present invention provide a data-updating method, device and electronic device based on Hive external tables, so as to at least solve the technical problem that prior-art data update and synchronization methods take a long time to run.
According to one embodiment of the present invention, a data-updating method based on Hive external tables is provided, comprising: extracting an update data table from a target business side, wherein the table name of the update data table conforms to a preset table-name update rule; uploading the update data table to a target directory of the HDFS file system; searching the Hive database, according to the table-name update rule, for the to-be-updated data table whose table name matches that of the update data table, and executing a drop-table operation on the to-be-updated data table, wherein the to-be-updated data table is an external table that was created in advance in the Hive database and whose table name conforms to the table-name update rule; and executing a create-external-table operation in the Hive database for the update data table, wherein the external table path of the create-external-table operation is the path of the target directory.
Further, uploading the update data table to the target directory of the HDFS file system comprises: creating multiple partition directories in the target directory according to a preset partitioning scheme; and uploading the update data tables to their corresponding partition directories. Executing the create-external-table operation in the Hive database comprises: parsing the partition directories contained in the target directory; and creating multiple partitioned external tables in the Hive database according to the parsing result, wherein the partitioned external tables correspond one-to-one with the partition directories.
Further, after the drop-table operation is executed in the Hive database on the to-be-updated data table corresponding to the update data table, the method further comprises: deleting the data files of the to-be-updated data table from the HDFS file system.
Further, uploading the update data table to the target directory of the HDFS file system comprises: uploading the update data table to an SFTP server; and uploading the update data table from the SFTP server to the corresponding target directory of the HDFS file system.
Further, extracting the update data table from the target business side comprises: extracting data from the database of the target business side using the Kettle tool; and storing the extracted data in the update data table.
In the technical solution provided by any of the above method embodiments, data is stored using external tables. During an update, the update data table is first stored in the HDFS file system, so that the metadata of the external table in the Hive database can simply be deleted, and a new external table pointing to the update data table in the HDFS file system is then created in the Hive database. This greatly reduces the time taken by data synchronization and simplifies the update procedure, solving the technical problem that related-art data update and synchronization methods take a long time to run.
According to another embodiment of the present invention, a data update device based on Hive external tables is provided, comprising: an extraction module for extracting an update data table from a target business side, wherein the table name of the update data table conforms to a preset table-name update rule; an upload module for uploading the update data table to a target directory of the HDFS file system; a first deletion module for searching the Hive database, according to the table-name update rule, for the to-be-updated data table whose table name matches that of the update data table, and executing a drop-table operation on the to-be-updated data table, wherein the to-be-updated data table is an external table that was created in advance in the Hive database and whose table name conforms to the table-name update rule; and a creation module for executing a create-external-table operation in the Hive database for the update data table, wherein the external table path of the create-external-table operation is the path of the target directory.
Further, the upload module comprises: a first creation unit for creating multiple partition directories in the target directory according to a preset partitioning scheme; and a first upload unit for uploading the update data tables to their corresponding partition directories. The creation module comprises: a parsing unit for parsing the partition directories contained in the target directory; and a second creation unit for creating multiple partitioned external tables in the Hive database according to the parsing result, wherein the partitioned external tables correspond one-to-one with the partition directories.
Further, the device also comprises: a second deletion module for deleting, after the drop-table operation has been executed on the to-be-updated data table, the data files that are stored in the HDFS file system and correspond to the to-be-updated data table.
Further, the upload module comprises: a second upload unit for uploading the update data table to an SFTP server; and a third upload unit for uploading the update data table from the SFTP server to the corresponding target directory of the HDFS file system.
Further, the extraction module comprises: an extraction unit for extracting data from the database of the target business side using the Kettle tool; and a storage unit for storing the extracted data in the update data table.
In the technical solution provided by any of the above device embodiments, data is stored using external tables. During an update, the update data table is first stored in the HDFS file system, so that the metadata of the external table in the Hive database can simply be deleted, and a new external table pointing to the update data table in the HDFS file system is then created in the Hive database. This greatly reduces the time taken by data synchronization and simplifies the update procedure, solving the technical problem that related-art data update and synchronization methods take a long time to run.
According to yet another embodiment of the present invention, a storage medium is also provided. A computer program is stored in the storage medium, wherein the computer program is configured to execute, when run, the steps of any of the above method embodiments, and can achieve the technical effects of the corresponding method embodiment.
According to yet another embodiment of the present invention, an electronic device is also provided, comprising a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps of any of the above method embodiments, and can achieve the technical effects of the corresponding method embodiment.
With the present invention, data is stored using external tables. During an update, the update data table is first stored in the HDFS file system, so that the metadata of the external table in the Hive database can simply be deleted, and a new external table pointing to the update data table in the HDFS file system is then created in the Hive database. This greatly reduces the time taken by data synchronization and simplifies the update procedure, solving the technical problem that related-art data update and synchronization methods take a long time to run.
Brief description of the drawings
The drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the data-updating method based on Hive external tables according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the data update device based on Hive external tables according to an embodiment of the present invention;
Fig. 3 is a hardware block diagram of a computer device according to an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in those embodiments. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. Where no conflict arises, the embodiments in this application and the features in the embodiments can be combined with one another. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort shall fall within the scope of protection of this application.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings of this application are used to distinguish similar objects and not to describe a particular order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the application described here can be implemented in orders other than those illustrated or described here. In addition, the terms "comprise" and "have", and any variations of them, are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
Embodiment 1
This embodiment provides a data-updating method based on Hive external tables, which can run on a computer device, server, mobile terminal, handheld terminal or similar computing device. Running on different computing devices changes only the executing entity of the solution; those skilled in the art will appreciate that running on different computing devices produces the same technical effect.
The data-updating method based on Hive external tables provided by this embodiment is used to synchronize data updates into the Hive database. Specifically, the data tables in the Hive database are stored as external tables. During an update, the update data table is first stored in the HDFS file system, so that the metadata of the external table in the Hive database can simply be deleted, and a new external table pointing to the update data table in the HDFS file system is then created in the Hive database. This greatly reduces the time taken by data synchronization and simplifies the update procedure.
As shown in Fig. 1, the data-updating method based on Hive external tables provided by this embodiment includes the following steps:
Step 101: extract an update data table from the target business side, wherein the table name of the update data table conforms to a preset table-name update rule.
The business side is the data source for the method of this embodiment. It may hold any type of data, and that data may change over time, for example a bank's customer information. The method provided by this embodiment can perform updates according to the actual needs of the target business side; the purpose of an update is to extract the data to be updated from the data source (the target business side) and update it into the Hive database. An update can be triggered on a fixed schedule or in response to a request from the target business side; the embodiments of the present invention place no specific limit on this. When a data table needs to be updated, the method steps provided by the embodiments of the present invention are started.
The update table can be extracted with a data extraction tool, for example an Extract-Transform-Load (ETL) tool of the kind used in data warehousing. An ETL tool extracts data from a data source, transforms it, and loads it into a target. In an ETL tool, a task that performs data extraction can be created to extract the data from the target business side and write it to a file in a preset format, yielding the update data table.
The update data table is named according to the preset table-name update rule. For example, table names can follow the format "xxbank_customerdata_[update date]", where the leading field "xxbank_customerdata_" is constant and "[update date]" is determined by the current date, so that data tables updated on different dates can be told apart by name. Correspondingly, the to-be-updated data table that corresponds to the update data table can also be found based on the table-name update rule.
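A table-name update rule of this kind might be sketched as follows. The sketch assumes the date is rendered as eight digits (YYYYMMDD); the source does not fix a date format, and the function names are hypothetical.

```python
import re
from datetime import date

# Assumed date format: eight digits, e.g. 20190124.
DATE_SUFFIX = re.compile(r"\d{8}")


def build_table_name(prefix, update_date):
    """Join the constant prefix with the update date."""
    return f"{prefix}{update_date:%Y%m%d}"


def conforms_to_rule(table_name, prefix):
    """Check that a name is the constant prefix followed by an eight-digit date."""
    if not table_name.startswith(prefix):
        return False
    return DATE_SUFFIX.fullmatch(table_name[len(prefix):]) is not None
```

For example, `build_table_name("xxbank_customerdata_", date(2019, 1, 24))` yields `xxbank_customerdata_20190124`, and the same prefix can later be used to recognize tables created by earlier updates.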
Optionally, the Kettle tool can be chosen to perform the data extraction task. Kettle is an open-source ETL tool written in pure Java that runs on Windows, Linux and Unix, and a Job in the Kettle tool set can describe the specific content of a data operation. A Job in the ETL tool can therefore be used to extract the update data from the target business side and write it into the update data table. Correspondingly, extracting the update data table from the target business side may comprise: extracting data from the database of the target business side using the Kettle tool, and storing the extracted data in the update data table.
Step 102: upload the update data table to the target directory of the HDFS file system.
After the update data table has been extracted from the target business side, it is uploaded to the target directory of the HDFS (Hadoop Distributed File System) file system that corresponds to the Hive database. Before a table is loaded into the Hive database, it must first be uploaded to the HDFS file system.
It should be noted that the target directory can be any specified directory on HDFS except the directory "/user/hive/warehouse". That is, the update data table can be uploaded to any pre-specified directory of the HDFS file system other than "/user/hive/warehouse". This is because "/user/hive/warehouse" is the default location in the HDFS file system for storing internal table data and is not to be used for storing external table data.
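The restriction on the target directory can be checked before uploading. The sketch below is illustrative only: it assumes absolute HDFS paths and also rejects subdirectories of the warehouse directory, which is an assumption beyond what the text states. `is_valid_target_dir` is a hypothetical helper.

```python
import posixpath

# Default location of internal (managed) table data, as named in the text.
HIVE_WAREHOUSE_DIR = "/user/hive/warehouse"


def is_valid_target_dir(path):
    """Accept absolute HDFS paths that lie outside the Hive warehouse directory."""
    normalized = posixpath.normpath(path)  # resolves "..", ".", trailing slashes
    if not normalized.startswith("/"):
        return False
    inside_warehouse = (
        normalized == HIVE_WAREHOUSE_DIR
        or normalized.startswith(HIVE_WAREHOUSE_DIR + "/")
    )
    return not inside_warehouse
```

A deployment whose warehouse directory has been moved from the default would need to substitute its own value for `HIVE_WAREHOUSE_DIR`.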
Step 103: search the Hive database, according to the table-name update rule, for the to-be-updated data table whose table name matches that of the update data table, and execute a drop-table operation on the to-be-updated data table.
The embodiments of the present invention are used to update the data tables in the Hive database: because the data on the business side can change, the corresponding data in the Hive database must be updated as well. In this embodiment, the business side's data tables are all stored in the Hive database as external tables, so the data table written by the previous update also exists in the Hive database as an external table. To update the data in the Hive database, the to-be-updated data table corresponding to the update data table is deleted from the Hive database; that is, a drop-table operation (drop) is executed to delete the external table of the business side that was stored previously. The to-be-updated data table is an external table created in advance in the Hive database.
When searching for the to-be-updated data table that matches the table name of the update data table, the match can be made using the table-name update rule. For example, if every data update names its table in the format "xxbank_customerdata_[update date]" and the table updated this time is named "yyyy_customerdata_20190124", then the Hive database is searched for a data table whose name contains the field "yyyy_customerdata"; once found, it is identified as the to-be-updated data table corresponding to the update data table.
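The lookup can be sketched as a prefix match over the table names already present in the Hive database. The list of existing names is assumed to have been fetched beforehand (for instance with a `SHOW TABLES` query); the eight-digit date suffix and both function names are assumptions made for illustration.

```python
import re

# Assumed rule: constant prefix ending in "_" followed by an eight-digit date.
RULE = re.compile(r"(?P<prefix>.+_)(?P<date>\d{8})")


def constant_prefix(table_name):
    """Return the constant part of a rule-conforming table name."""
    match = RULE.fullmatch(table_name)
    if match is None:
        raise ValueError(f"{table_name!r} does not follow the table-name rule")
    return match.group("prefix")


def find_tables_to_update(update_table, existing_tables):
    """Pick the existing tables that share the update table's constant prefix."""
    prefix = constant_prefix(update_table)
    return [name for name in existing_tables if name.startswith(prefix)]
```

Matching on the prefix including its trailing underscore avoids picking up unrelated tables whose names merely begin with the same characters.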
With external tables, executing a drop-external-table operation in the Hive database deletes only the metadata in the Hive database that describes the external table. Metadata is data that describes other data; it can be regarded as a kind of electronic catalogue that describes the attributes of other data, such as where that data is stored. Therefore, after the metadata in the Hive database is deleted, the data files in the HDFS file system that the metadata describes are not deleted and still exist in the HDFS file system. As a result, the drop-external-table operation responds quickly, so that the subsequent create-external-table operation can be processed promptly.
Optionally, because executing the drop-external-table operation in the Hive database does not delete the data files in the HDFS file system, the to-be-updated data table is also deleted from the HDFS file system after the drop operation. This prevents the to-be-updated data table that remains in the HDFS file system from occupying storage space, and prevents a later update from mistakenly storing an update data table in the earlier table's directory; it achieves complete deletion of the data files of the to-be-updated data table.
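The two deletions could be planned as below. This is a sketch that only builds the commands and does not run them; it assumes the HiveQL `DROP TABLE IF EXISTS` statement and the `hdfs dfs -rm -r` shell command, and `plan_drop` is a hypothetical helper.

```python
import shlex


def plan_drop(table_name, data_dir, purge_files=False):
    """List the steps that remove an external table and, optionally, its files."""
    # Dropping an external table removes only its metadata from Hive.
    steps = [("hive", f"DROP TABLE IF EXISTS `{table_name}`")]
    if purge_files:
        # The data files stay in HDFS unless they are removed explicitly.
        steps.append(("shell", f"hdfs dfs -rm -r {shlex.quote(data_dir)}"))
    return steps
```

Keeping the file deletion optional mirrors the text: the metadata drop is always performed, while removing the old files is an additional, optional step.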
Step 104: execute a create-external-table operation in the Hive database for the update data table.
After the drop-table operation has been executed on the to-be-updated data table in the Hive database, a create-external-table operation is executed in the Hive database; that is, metadata describing the external table is created in the Hive database. The external table path of the create-external-table operation is the path of the target directory.
When an external table is created in the Hive database, it can be created using the EXTERNAL clause of the CREATE TABLE statement. An external table is a table whose data is not stored in the Hive database itself; its data is a file in the HDFS file system. By creating metadata in the Hive database that describes the external table, the Hive database can perform some limited (read-only) operations on it, treating the file as a read-only database table and accessing the data as if it were stored in an ordinary database table.
For example, query and join operations can be performed on an external table. Access to an external table can be done through SQL statements without first loading its data into the Hive database, and parallel operations are supported.
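The create-external-table operation of step 104 might be expressed as a generated HiveQL statement. The sketch assumes comma-delimited text files and hypothetical column names and types; the source does not specify the file format or the schema.

```python
def create_external_table_sql(table_name, columns, location, delimiter=","):
    """Build a CREATE EXTERNAL TABLE statement whose LOCATION is the target directory."""
    column_list = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table_name}` (\n"
        f"  {column_list}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Hypothetical schema and target directory, for illustration only.
statement = create_external_table_sql(
    "yyyy_customerdata_20190124",
    [("a", "INT"), ("b", "STRING"), ("c", "STRING")],
    "/data/updates/customerdata",
)
```

Because the statement only records where the files live, no data is copied when it is executed; queries read the files in the target directory directly.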
Optionally, the update data can be stored in the Hive database in the form of partitioned external tables. The partitioning scheme can be defined in advance when the Hive database is created, and partitions can be based on a field of each data record. For example, with partitioning by day, data from the same day is stored under the same partition directory of the target directory, and data from different days is stored under the partition directories for their respective days.
Corresponding to the partitioning scheme predefined in the Hive database, when the update data is uploaded to the target directory of the HDFS file system, multiple partition directories are also created in the target directory according to the Hive database's preset partitioning scheme, and the update data tables are then uploaded to their corresponding partition directories. For example, during extraction, a data record whose date field is "2018-12-13" is written to the update data table destined for the partition directory of December 13, 2018; once all data for a table has been extracted and written, the table is uploaded to the corresponding partition directory.
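Routing records to per-day partition directories can be sketched as a grouping step. The `dt=[value]` directory naming follows the common Hive key=value convention and is an assumption, as are the field name `date` and the function names.

```python
import posixpath
from collections import defaultdict


def partition_dir(target_dir, day, key="dt"):
    """Path of the partition directory for one day, e.g. <target>/dt=2018-12-13."""
    return posixpath.join(target_dir, f"{key}={day}")


def route_records(records, target_dir, date_field="date"):
    """Group records by the partition directory their date field maps to."""
    buckets = defaultdict(list)
    for record in records:
        buckets[partition_dir(target_dir, record[date_field])].append(record)
    return dict(buckets)
```

Each resulting bucket corresponds to one file set that would be uploaded to its own partition directory.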
Correspondingly, when partitioned external tables are created in the Hive database, the partition directories contained in the target directory of the HDFS file system are parsed, and the corresponding partitioned external tables (metadata) are created in the Hive database according to the parsing result. It should be noted that the partitioned external tables (metadata) in the Hive database correspond one-to-one with the partition directories.
Optionally, the creation of partitioned external tables can be configured so that, as soon as the update data table has been uploaded to any partition directory in the HDFS file system, the corresponding partitioned external table is created in the Hive database and pointed at that partition directory, without waiting for the update data tables of all partition directories to finish uploading.
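One possible reading of the one-to-one correspondence in Hive terms is to register one partition per partition directory. The sketch below assumes `dt=[value]` directory names and the HiveQL `ALTER TABLE ... ADD PARTITION ... LOCATION` statement; it is an interpretation for illustration, not the patent's stated implementation.

```python
import posixpath


def parse_partition_dirs(dir_names, key="dt"):
    """Extract partition values from directory names of the form key=value."""
    marker = key + "="
    return sorted(name[len(marker):] for name in dir_names if name.startswith(marker))


def add_partition_sql(table_name, target_dir, value, key="dt"):
    """Build the statement that points one partition at its directory."""
    location = posixpath.join(target_dir, f"{key}={value}")
    return (
        f"ALTER TABLE `{table_name}` ADD IF NOT EXISTS "
        f"PARTITION ({key}='{value}') LOCATION '{location}'"
    )
```

Because each statement concerns a single directory, a partition can be registered as soon as its own upload completes, independently of the others.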
Optionally, because there may be multiple update data tables, the upload can be configured so that each update data table is uploaded to the HDFS file system as soon as it has been extracted from the target business side, without waiting for all update data tables to be extracted. Alternatively, to reduce the probability of disconnection from the HDFS file system and the number of connections to it, the update data tables can be divided into groups according to the concurrent connection capacity of the HDFS file system, each group containing multiple tables; once all tables in a group have been extracted, they are uploaded to the HDFS file system together.
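The grouped upload can be sketched as simple batching. How the group size is derived from the concurrent connection capacity is not specified in the text, so it is passed in as a parameter here; `group_tables` is a hypothetical helper.

```python
def group_tables(tables, group_size):
    """Split the list of update data tables into consecutive groups."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [tables[i:i + group_size] for i in range(0, len(tables), group_size)]
```

A group size of 1 reproduces the first option in the text, where each table is uploaded as soon as it has been extracted.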
As an optional implementation, when the update data table is uploaded to the target directory of the HDFS file system, the specific operation can be as follows: first, the update data of the target business side is extracted and written into a local update data table; the update data table is then uploaded to an SFTP server; and finally the update data table is uploaded from the SFTP server to the corresponding target directory of the HDFS file system.
It should be noted that, when uploading the update data table from the SFTP server to the HDFS file system, any of the optional implementations described above for the step of uploading the update data table to the target directory of the HDFS file system can be used, achieving the technical effects of the corresponding implementation.
Corresponding to the above optional implementation, a specific example of the data import flow using external tables is: Kettle generates file -> SFTP server -> HDFS file system. That is, first, the Kettle tool extracts the update data from the target business side and generates the update data table. Second, the update data table is uploaded to the SFTP server. Finally, the update data table is uploaded to the HDFS file system.
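The three-stage flow can be sketched as a small orchestration function. The stages are injected as callables because the source gives no concrete Kettle, SFTP or HDFS invocations; every name below is hypothetical.

```python
def run_import_flow(table_name, target_dir, extract, push_to_sftp, push_to_hdfs):
    """Run the stages in order: Kettle file -> SFTP server -> HDFS file system."""
    local_path = extract(table_name)                  # stage 1: produce the local file
    staged_path = push_to_sftp(local_path)            # stage 2: stage it on the SFTP server
    hdfs_path = push_to_hdfs(staged_path, target_dir) # stage 3: land it in the target directory
    return hdfs_path
```

Injecting the stages keeps the ordering logic testable without a running Kettle installation, SFTP server or Hadoop cluster.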
By comparison, when data is imported using internal tables, after a data table has been uploaded to the HDFS file system, a data connection must also be established to the Hive database and the update data table imported into the Hive database's data warehouse location; the import process is cumbersome.
In addition, with internal tables, after the data has been imported the Hive database must compare the data structure before the update with the data structure after the update to decide whether the original internal table must be dropped and the update data imported again; otherwise, a direct update causes column misalignment.
With external tables, by contrast, the update data is stored in the HDFS file system when it is imported. In the Hive database, the metadata describing the not-yet-updated data table can simply be deleted and new metadata describing the external table created. This greatly simplifies the procedure for updating a data table, optimizing the overall management process, simplifying the repair of data problems and improving the efficiency of data updates.
It should be noted that the steps shown in the flowchart of the drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described can be executed in a different order.
From the description of the above embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, computer, server, network device or the like) to execute the methods described in the embodiments of the present invention.
Embodiment 2
This embodiment further provides a data update apparatus based on a Hive external table. The apparatus is used to implement Embodiment 1 above and its preferred implementations. For terms or implementations not described in detail in this embodiment, refer to the related description in Embodiment 1; what has already been described is not repeated.
As used below, the term "module" refers to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiment is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 2 is a schematic diagram of a data update apparatus based on a Hive external table according to an embodiment of the present invention. As shown in Fig. 2, the apparatus includes an extraction module 10, an upload module 20, a first deletion module 30, and a creation module 40.
The extraction module is configured to extract an update data table from a target business side, where the table name of the update data table conforms to a preset table-name update rule. The upload module is configured to upload the update data table to a target directory of the HDFS file system. The first deletion module is configured to search the Hive database, according to the table-name update rule, for a to-be-updated data table whose name matches the table name of the update data table, and to perform a drop-table operation on the to-be-updated data table, where the to-be-updated data table is an external table that was created in the Hive database in advance and whose table name conforms to the table-name update rule. The creation module is configured to perform a create-external-table operation in the Hive database for the update data table, where the external table path of the create-external-table operation is the path of the target directory.
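The lookup performed by the first deletion module can be illustrated with a small sketch. The text states only that the table-name update rule is preset and does not say what it looks like, so the rule used here (an `ext_` prefix followed by lowercase letters, digits, and underscores) and the function name are purely hypothetical. Matching is done case-insensitively because Hive table names are case-insensitive.

```python
import re
from typing import Iterable, Optional

# Hypothetical table-name update rule; the patent does not specify one.
NAME_RULE = re.compile(r"^ext_[a-z][a-z0-9_]*$")


def find_table_to_update(update_table: str, hive_tables: Iterable[str]) -> Optional[str]:
    """Return the existing Hive table that the update table replaces, if any."""
    wanted = update_table.lower()
    if not NAME_RULE.match(wanted):
        raise ValueError(f"{update_table!r} does not follow the table-name rule")
    for name in hive_tables:
        # Hive table names are case-insensitive, so compare in lower case.
        if name.lower() == wanted:
            return name
    return None
```

A return value of `None` would mean that no table has to be dropped before the external table is created.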
Optionally, the upload module includes: a first creation unit configured to create multiple partition directories in the target directory according to a preset partitioning scheme; and a first upload unit configured to upload the update data table into the corresponding partition directories. The creation module includes: a parsing unit configured to parse the partition directories contained in the target directory; and a second creation unit configured to create multiple external partition tables in the Hive database according to the parsing result, where the external partition tables correspond one-to-one to the partition directories.
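One possible reading of the partitioned variant is sketched below. The text speaks of multiple external partition tables that correspond one-to-one to the partition directories; this sketch assumes instead a single partitioned external table with one partition per directory, which is a common Hive layout but is not confirmed by the patent. The `key=value` directory naming, the function names, and the example paths are likewise assumptions.

```python
import re
from typing import Iterable, List

# Assumed directory naming convention: <partition key>=<partition value>.
PARTITION_DIR = re.compile(r"^(?P<key>[A-Za-z_][A-Za-z0-9_]*)=(?P<value>[^/]+)$")


def partition_dirs(target_dir: str, key: str, values: Iterable[str]) -> List[str]:
    """Build one partition directory path per partition value."""
    base = target_dir.rstrip("/")
    return [f"{base}/{key}={value}" for value in values]


def add_partition_statements(table: str, dirs: Iterable[str]) -> List[str]:
    """Parse partition directories and emit one ADD PARTITION statement each."""
    statements = []
    for path in dirs:
        leaf = path.rstrip("/").rsplit("/", 1)[-1]
        match = PARTITION_DIR.match(leaf)
        if match is None:
            continue  # Skip anything that is not a partition directory.
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION ({match['key']}='{match['value']}') LOCATION '{path}'"
        )
    return statements
```

The first function corresponds to the first creation unit, and the second to the parsing unit together with the second creation unit.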
Optionally, the apparatus further includes a second deletion module configured to, after the drop-table operation has been performed on the to-be-updated data table, delete the data files that are stored in the HDFS file system and correspond to the to-be-updated data table.
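Because dropping an external table normally leaves its files in place, the second deletion module has to remove them explicitly. A minimal sketch follows, assuming the cleanup is done with the standard `hdfs dfs -rm -r` shell command. The function name and the safety check against a hypothetical `warehouse_root` are illustrative additions, not part of the patent.

```python
import posixpath
from typing import List


def hdfs_delete_command(data_dir: str, warehouse_root: str) -> List[str]:
    """Build the argument list that deletes a dropped table's data directory.

    The command is only built, not executed. `warehouse_root` is a
    hypothetical guard: paths outside it, or equal to it, are refused.
    """
    path = posixpath.normpath(data_dir)
    root = posixpath.normpath(warehouse_root)
    if path == root or not path.startswith(root + "/"):
        raise ValueError(f"refusing to delete {data_dir!r}: outside {warehouse_root!r}")
    return ["hdfs", "dfs", "-rm", "-r", path]
```

The resulting list can be passed to `subprocess.run` on a host where the Hadoop client is installed.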
Optionally, the upload module includes: a second upload unit configured to upload the update data table to an SFTP server; and a third upload unit configured to upload the update data table from the SFTP server to the corresponding target directory of the HDFS file system.
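The two-hop upload can be sketched as follows. The patent does not say which SFTP or HDFS clients are used, so both transfers are passed in as callables; `staged_upload`, `sftp_put`, and `sftp_to_hdfs` are hypothetical names introduced only for this illustration.

```python
from typing import Callable


def staged_upload(
    local_file: str,
    staging_path: str,
    target_dir: str,
    sftp_put: Callable[[str, str], None],
    sftp_to_hdfs: Callable[[str, str], None],
) -> str:
    """Upload a file to the SFTP staging area, then on to the HDFS target directory.

    Returns the final HDFS path of the uploaded file.
    """
    # Hop 1: extracted file -> SFTP server (second upload unit).
    sftp_put(local_file, staging_path)
    # Hop 2: SFTP server -> HDFS target directory (third upload unit).
    file_name = staging_path.rsplit("/", 1)[-1]
    hdfs_path = f"{target_dir.rstrip('/')}/{file_name}"
    sftp_to_hdfs(staging_path, hdfs_path)
    return hdfs_path
```

Keeping the transfers as injected callables leaves the choice of client libraries open and makes the ordering of the two hops easy to test.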
Optionally, the extraction module includes: an extraction unit configured to extract data from the database of the target business side using the Kettle tool; and a storage unit configured to store the extracted data into the update data table.
In the embodiment of the present invention, by using the storage structure of an external table, the update data table is first stored in the HDFS file system at update time. This allows the metadata of the external table in the Hive database to be deleted directly when a table is updated, after which a new external table pointing to the update data table in the HDFS file system is created in the Hive database. This greatly reduces the time taken by data synchronization, simplifies the data update procedure, and solves the technical problem in the related art that data update and synchronization take a long time.
It should be noted that each of the above modules can be implemented by software or hardware. In the latter case, this can be done in the following ways, without limitation: the above modules are all located in the same processor; or the above modules are located, in any combination, in different processors.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described can be executed in an order different from the one given here. Alternatively, they can each be fabricated as an individual integrated circuit module, or several of the modules or steps can be fabricated as a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
Embodiment 3
An embodiment of the present invention further provides a storage medium in which a computer program is stored, where the computer program is configured to execute, when run, the steps in any of the above method embodiments.
Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium capable of storing a computer program.
Embodiment 4
An embodiment of the present invention further provides an electronic device including a memory and a processor, where a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the processor and the input/output device is connected to the processor. Taking a computer device as an example of the electronic device, Fig. 3 is a hardware block diagram of a computer device according to an embodiment of the present invention. As shown in Fig. 3, the mobile terminal may include one or more processors 302 (only one is shown in Fig. 3; the processor 302 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 304 for storing data. Optionally, the mobile terminal may further include a transmission device 306 for communication functions and an input/output device 308. Those of ordinary skill in the art will understand that the structure shown in Fig. 3 is only illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal may include more or fewer components than shown in Fig. 3, or have a configuration different from that shown in Fig. 3.
The memory 304 may be used to store computer programs, for example software programs and modules of application software, such as the computer program corresponding to the image recognition method in the embodiment of the present invention. By running the computer program stored in the memory 304, the processor 302 executes various functional applications and data processing, that is, implements the method described above. The memory 304 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 304 may further include memory located remotely from the processor 302, and such remote memory may be connected to the mobile terminal through a network. Examples of the network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 306 is used to receive or send data via a network. Specific examples of the network may include a wireless network provided by the communication provider of the mobile terminal. In one example, the transmission device 306 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 306 may be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement, or the like made within the principle of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A data update method based on a Hive external table, characterized by comprising:
extracting an update data table from a target business side, wherein the table name of the update data table conforms to a preset table-name update rule;
uploading the update data table to a target directory of an HDFS file system;
searching a Hive database, according to the table-name update rule, for a to-be-updated data table matching the table name of the update data table, and performing a drop-table operation on the to-be-updated data table, wherein the to-be-updated data table is an external table that was created in the Hive database in advance and whose table name conforms to the table-name update rule; and
performing a create-external-table operation in the Hive database for the update data table, wherein the external table path of the create-external-table operation is the path of the target directory.
2. The method according to claim 1, characterized in that:
uploading the update data table to the target directory of the HDFS file system comprises: creating multiple partition directories in the target directory according to a preset partitioning scheme; and uploading the update data table into the corresponding partition directories; and
performing the create-external-table operation in the Hive database comprises: parsing the partition directories contained in the target directory; and creating multiple external partition tables in the Hive database according to the parsing result, wherein the multiple external partition tables correspond one-to-one to the multiple partition directories.
3. The method according to claim 1, characterized in that, after the drop-table operation is performed on the to-be-updated data table, the method further comprises:
performing a delete operation on the data files that are stored in the HDFS file system and correspond to the to-be-updated data table.
4. The method according to claim 1, characterized in that uploading the update data table to the target directory of the HDFS file system comprises:
uploading the update data table to an SFTP server; and
uploading the update data table from the SFTP server to the corresponding target directory of the HDFS file system.
5. The method according to claim 1, characterized in that extracting the update data table from the target business side comprises:
extracting data from the database of the target business side using the Kettle tool; and
storing the extracted data into the update data table.
6. A data update apparatus based on a Hive external table, characterized by comprising:
an extraction module configured to extract an update data table from a target business side, wherein the table name of the update data table conforms to a preset table-name update rule;
an upload module configured to upload the update data table to a target directory of an HDFS file system;
a first deletion module configured to search a Hive database, according to the table-name update rule, for a to-be-updated data table matching the table name of the update data table, and to perform a drop-table operation on the to-be-updated data table, wherein the to-be-updated data table is an external table that was created in the Hive database in advance and whose table name conforms to the table-name update rule; and
a creation module configured to perform a create-external-table operation in the Hive database for the update data table, wherein the external table path of the create-external-table operation is the path of the target directory.
7. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to execute, when run, the method according to any one of claims 1 to 5.
8. An electronic device comprising a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to run the computer program to execute the method according to any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910340498.9A CN110209680A (en) 2019-04-25 2019-04-25 Data-updating method, device and electronic device based on Hive external table

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910340498.9A CN110209680A (en) 2019-04-25 2019-04-25 Data-updating method, device and electronic device based on Hive external table

Publications (1)

Publication Number Publication Date
CN110209680A true CN110209680A (en) 2019-09-06

Family

ID=67786494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910340498.9A Pending CN110209680A (en) 2019-04-25 2019-04-25 Data-updating method, device and electronic device based on Hive external table

Country Status (1)

Country Link
CN (1) CN110209680A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination