CN107679251A

CN107679251A - Spoon-based universal database extraction method for big data environments

Info

Publication number: CN107679251A
Application number: CN201711064474.2A
Authority: CN
Inventors: 曹亮
Original assignee: Chengdu University of Information Technology
Current assignee: Chengdu University of Information Technology
Priority date: 2017-11-02
Filing date: 2017-11-02
Publication date: 2018-02-09

Abstract

The invention discloses a Spoon-based universal database extraction method for big data environments. It combines the Spoon graphical interface provided by the Kettle tool with a Java-based template technology (Velocity), so that users can set parameters flexibly and extract relevant data from different databases quickly and efficiently, laying a foundation for big data analysis.

Description

Spoon-based universal database extraction method for big data environments

Technical field

The invention belongs to the field of computer technology, and specifically relates to a Spoon-based universal database extraction method for big data environments.

Background technology

In the information age, massive amounts of data are produced every day, such as shopping records from e-commerce websites, ticket purchase transactions from train ticketing platforms, chat messages from instant messaging tools, and information collected at industrial sites. To facilitate management and lookup, these data are stored in corresponding databases. When enterprises, factories, research institutes and similar organizations handle their massive, fast-growing and diverse information assets, they find that traditional database management software cannot manage them efficiently; a new processing model is needed to extract valuable information from the massive data.

Big data technology emerged in response. Its strategic significance lies in mining massive data that contains useful information in order to extract what is valuable. Big data requires special technologies, such as massively parallel processing (MPP) databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.

In a big data environment, massive data is stored in different databases, and these databases often differ considerably in data structure. Extracting relevant data from different databases quickly and efficiently, while converting between the data structures of those databases, is critical to data mining.

Summary of the invention

To solve the above problems, the invention provides a Spoon-based universal database extraction method for big data environments, characterized in that the method comprises the following steps:

1) Create the execution flow chart with the Spoon modeling tool;

2) Configure the parameters of the execution flow chart;

3) Run tests in the Spoon tool;

4) Generate the executable template file;

5) Set up an environment in which Spoon and Velocity can run from Java;

6) Design the generateVM method;

7) Design the executeSpoon method;

8) Design the entry method called from Java.

Further, the parameters in step 2) include the table-input database connection information, the table-output database connection information, the table-output database table, the table-output field mapping configuration, and the paging and loop settings for large data volumes.

Further, the tests in step 3) include transformation verification and impact analysis.

Further, in step 4), parameters are set dynamically in the XML according to the rules of the Velocity template technology.

Further, the parameters include ${source_name} and ${source_server} in the source database connection together with the field attributes ${source_colum1} and ${source_colum2}, and ${target_name} and ${target_server} in the target database connection together with the field attributes ${target_colum1} and ${target_colum2}; the field attributes of the source database must correspond one-to-one with those of the target database.

Further, the specific setup requirement of step 5) is that, in a Windows environment, a JRE (Java Runtime Environment) of version 1.5 or later is installed, and in a Unix-like environment, the shell scripts are made executable by entering commands.

Further, step 6) specifically comprises the following steps:

a) Define the Properties class required by the Velocity framework;

b) Define the VelocityEngine class and initialize it with the configured Properties;

c) Obtain the Template object for the file through the getTemplate method provided by the VelocityEngine class;

d) Through the Template's merge method, replace the variables set in the template file with the supplied parameter information;

e) Return the file (XML) that, after variable replacement, can be executed in Spoon.

Further, in step b), VelocityEngine creates a new Velocity instance each time, so that each Velocity engine has its own instance and they do not interfere with one another.

Brief description of the drawings

Fig. 1 is a detailed flow chart of the database extraction method.

Embodiment

For a clearer understanding of the technical features, purpose and effects of the invention, embodiments of the invention are now described with reference to the drawings.

A Spoon-based universal database extraction method for big data environments, where the universal databases are common relational databases such as Oracle, MySQL, SQL Server, DB2, PostgreSQL and HBase, comprises the following steps:

1) Create the execution flow chart with the Spoon modeling tool

ETTL (Extract-Transform-Transport-Load) describes the process by which data travels from a source to a destination through extraction (extract), transformation (transform), transport (transport) and loading (load). Kettle is an open-source ETTL tool from abroad, written in pure Java, that runs on multiple operating systems such as Windows, Linux and Unix and extracts data efficiently and stably. Kettle provides the graphical user interface Spoon, which lets users create an execution flow chart conveniently and quickly and produces a transformation script file; the transformation script file performs the basic conversion of the data. Pan, a tool included in Kettle, is a data transformation engine that can read, compute on and write data, and allows the ETTL transformations designed in Spoon to be run in batch (for example, using a time scheduler). Pan is a background program with no graphical interface.

2) Configure the parameters of the execution flow chart

A transformation involves the following interrelated concepts: Value, Row, Input Stream, Hop and Note. A Value is part of a Row and can hold any type of data, such as strings, floating-point numbers, integers, dates and Boolean values. Multiple Values processed together form a Row, which serves as a single unit of input. The Rows entering one step make up its Input Stream. A Hop is the graphical representation of the data flow between two steps; a Hop always represents the output of one step and the input of another. A Note is descriptive text that can be added to the transformation.

A transformation extracts data from the source database and, after the corresponding conversion, places the converted data into the target database. Because relational databases store information in the form of descriptive tables, the following parameters must be configured in the transformation's Values for the transformation to complete successfully:

a) Table-input database connection information;

b) Table-output database connection information;

c) Table-output database table;

d) Table-output field mapping configuration;

e) Paging and loop settings for large data volumes.
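The five parameter groups above can be pictured as a single JavaBean-style object. The following is a minimal sketch only: the class name ExtractionParams, its fields and the pageCount helper are hypothetical and do not come from the patent, which does not specify how paging is computed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for the step 2) parameters; every name here is illustrative. */
final class ExtractionParams {
    final String sourceConnection;   // a) table-input database connection
    final String targetConnection;   // b) table-output database connection
    final String targetTable;        // c) table-output database table
    // d) field mapping: source field name -> target field name, in insertion order
    final Map<String, String> fieldMapping = new LinkedHashMap<String, String>();
    final int pageSize;              // e) rows handled per loop iteration

    ExtractionParams(String sourceConnection, String targetConnection,
                     String targetTable, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.sourceConnection = sourceConnection;
        this.targetConnection = targetConnection;
        this.targetTable = targetTable;
        this.pageSize = pageSize;
    }

    /** Adds one source-to-target field pair, keeping the mapping one-to-one by key. */
    ExtractionParams map(String sourceField, String targetField) {
        fieldMapping.put(sourceField, targetField);
        return this;
    }

    /** Number of loop iterations needed to cover totalRows (assumed paging scheme). */
    long pageCount(long totalRows) {
        return (totalRows + pageSize - 1) / pageSize;
    }
}
```

A caller would build one such object per extraction job, for example `new ExtractionParams("src", "dst", "orders", 1000).map("id", "order_id")`, and pass it on to the template step.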

3) Run tests in the Spoon tool

In the graphical interface, transformation verification and impact analysis are run by clicking the corresponding icons. Transformation verification checks each step to ensure that the whole conversion proceeds in the designed order. Impact analysis predicts the effect the transformation may have on the database.

4) Generate the executable template file

The generation process comprises the following steps:

a) Export the configuration file (XML) that passed the test;

b) Following the rules of the template technology (Velocity), dynamically set in the XML the parameters that were entered for testing (mainly the parameter information related to the source database, target database, field mapping, paging and so on). That is, use Velocity's variable syntax to replace the test parameters, such as ${source_name} and ${source_server} in the source database connection together with the field attributes ${source_colum1} and ${source_colum2}, and ${target_name} and ${target_server} in the target database connection together with the field attributes ${target_colum1} and ${target_colum2}; the field attributes of the source database must correspond one-to-one with those of the target database;

c) Save the resulting XML file as a VM file executable by the template technology.
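Step b) can be illustrated with a small sketch that swaps the concrete values used during testing for Velocity-style variables. The class TemplateMaker, the sample XML fragment and the sample test values are hypothetical; the fragment is a simplification and not the full Kettle file layout, and plain string replacement is an assumption of this sketch rather than the patent's procedure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TemplateMaker {
    /**
     * Replaces each concrete test value (map key) with a ${variable} reference
     * (map value). Replacement is literal, so test values should be distinctive
     * enough not to match unrelated parts of the XML.
     */
    static String toTemplate(String testedXml, Map<String, String> valueToVariable) {
        String result = testedXml;
        for (Map.Entry<String, String> e : valueToVariable.entrySet()) {
            result = result.replace(e.getKey(), "${" + e.getValue() + "}");
        }
        return result;
    }

    public static void main(String[] args) {
        // Simplified, illustrative fragment of an exported configuration.
        String testedXml =
            "<connection><name>test_src</name><server>10.0.0.1</server></connection>";
        Map<String, String> replacements = new LinkedHashMap<String, String>();
        replacements.put("test_src", "source_name");
        replacements.put("10.0.0.1", "source_server");
        System.out.println(toTemplate(testedXml, replacements));
    }
}
```

The returned text is what step c) would save as the VM file.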

5) Set up an environment in which Spoon and Velocity can run from Java

In a Windows environment, a JRE (Java Runtime Environment) of version 1.5 or later must be installed. In a Unix-like environment (such as Solaris, Linux or macOS), the user must make the shell scripts executable by entering the following commands:

cd Kettle

chmod +x *.sh
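The same two requirements can also be checked from Java before any extraction is attempted. This is a minimal sketch under stated assumptions: the class EnvCheck and its method names are hypothetical, and the Kettle directory is assumed to be named as in the commands above.

```java
import java.io.File;

final class EnvCheck {
    /** True if a version string such as "1.5", "1.8" or "11" denotes Java 1.5 or later. */
    static boolean isJava5OrLater(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        if (major > 1) {
            return true; // "9", "11", "17", ...
        }
        return major == 1 && parts.length > 1 && Integer.parseInt(parts[1]) >= 5;
    }

    /** Rough Java counterpart of "chmod +x *.sh"; returns how many scripts were changed. */
    static int makeScriptsExecutable(File kettleDir) {
        File[] files = kettleDir.listFiles();
        if (files == null) {
            return 0; // directory missing or unreadable
        }
        int count = 0;
        for (File f : files) {
            // setExecutable(true, false) grants execute permission to all users
            if (f.isFile() && f.getName().endsWith(".sh") && f.setExecutable(true, false)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.specification.version");
        System.out.println("Java version OK: " + isJava5OrLater(version));
        System.out.println("Scripts made executable: " + makeScriptsExecutable(new File("Kettle")));
    }
}
```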

Note that Velocity also depends on several other jar packages, which are found in the build/lib directory of the distribution. If the binary distribution was downloaded, the other dependency packages must be downloaded separately. They are: Jakarta Commons Collections (required); Jakarta Avalon Logkit (optional, but strongly recommended so that log information can be output); Jakarta ORO (optional, needed only when using the org.apache.velocity.convert.WebMacro template conversion tool).

6) Design the generateVM method

The parameters of this method are the dynamic parameter information (a JavaBean) to be substituted into the VM file, and the directory and file name of the Velocity-executable template file generated in step 4). The method uses Velocity to replace the parameters in the VM file, producing the XML file that Spoon executes, and stores the generated XML file in a temporary directory. The specific design process is as follows:

a) Define the Properties class required by the Velocity framework. Properties can carry many kinds of configuration, including template encoding, foreach, set, include, parse, template loader, macro, resource manager, parser pool and pluggable introspector settings. Here, as actually needed, it holds information such as the path of the file to be replaced;

b) Define the VelocityEngine class and initialize it with the configured Properties. The org.apache.velocity.app package contains two classes, Velocity and VelocityEngine, whose methods are essentially the same. The difference is that the Velocity class initializes a singleton Velocity instance, which multiple classes can share so that they access the same data, whereas VelocityEngine creates a new Velocity instance each time, so that each Velocity engine has its own instance and they do not interfere with one another. To keep the Velocity engines isolated, the VelocityEngine class is used here and initialized with the Properties;
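The overall flow of generateVM (read the template, substitute the variables, write the XML to a temporary location) can be sketched without the Velocity jar on the classpath. In the sketch below a regular expression stands in for Velocity's template merge, so this is not the Velocity API; the class name GenerateVmSketch and the property keys template.dir and template.name are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GenerateVmSketch {
    // Matches Velocity-style references such as ${source_name}
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Stand-in for the template merge: replaces every ${name} with its value. */
    static String render(String template, Map<String, String> params) {
        Matcher m = VARIABLE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for variable " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    /**
     * Reads the template located via props, fills it in, and writes the result
     * to a temporary XML file whose path is returned.
     */
    static Path generateVM(Properties props, Map<String, String> params) {
        try {
            Path template = Paths.get(props.getProperty("template.dir"),
                                      props.getProperty("template.name"));
            String text = new String(Files.readAllBytes(template), StandardCharsets.UTF_8);
            Path xml = Files.createTempFile("spoon-", ".xml");
            Files.write(xml, render(text, params).getBytes(StandardCharsets.UTF_8));
            return xml;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

One deliberate difference from Velocity: by default Velocity leaves an undefined reference in the output unchanged, whereas this sketch fails fast so that a missing parameter is noticed before the XML reaches Spoon.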

7) Design the executeSpoon method

The parameter of this method is the path of the temporary file generated in step 6); the file is executed by calling the API that Spoon provides.

8) Design the entry method called from Java

Configure the parameter information required by the method in step 6) and complete the data access operation.
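The patent does not show the API call itself. As a stand-in that needs no Kettle classes on the classpath, the following sketch launches Pan, the background engine described in step 1), as an external process. The class and method names, the location of pan.sh and the -file= option syntax are assumptions that should be checked against the installed Kettle version.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class ExecuteSpoonSketch {
    /** Builds the command line that runs one transformation file with Pan. */
    static List<String> panCommand(String kettleDir, String xmlPath) {
        String script = new File(kettleDir, "pan.sh").getPath();
        return Arrays.asList("sh", script, "-file=" + xmlPath);
    }

    /** Runs the transformation and returns the exit code of the Pan process. */
    static int executeSpoon(String kettleDir, String xmlPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(panCommand(kettleDir, xmlPath))
                .inheritIO() // show Pan's log output on this program's console
                .start();
        return process.waitFor();
    }
}
```

Keeping the command construction separate from the process launch makes it possible to inspect or log the command without running Kettle.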

The above discloses only preferred embodiments of the invention and does not limit its scope of rights; equivalent variations made according to the claims of the invention remain within the scope covered by the invention.

Claims

1. A Spoon-based universal database extraction method for big data environments, characterized in that the method comprises the following steps:

1) Create the execution flow chart with the Spoon modeling tool;

2) Configure the parameters of the execution flow chart;

3) Run tests in the Spoon tool;

4) Generate the executable template file;

5) Set up an environment in which Spoon and Velocity can run from Java;

6) Design the generateVM method;

7) Design the executeSpoon method;

8) Design the entry method called from Java.

2. The method according to claim 1, characterized in that the parameters in step 2) include the table-input database connection information, the table-output database connection information, the table-output database table, the table-output field mapping configuration, and the paging and loop settings for large data volumes.

3. The method according to claim 1, characterized in that the tests in step 3) include transformation verification and impact analysis.

4. The method according to claim 1, characterized in that, in step 4), parameters are set dynamically in the XML according to the rules of the Velocity template technology.

5. The method according to claim 4, characterized in that the parameters include ${source_name} and ${source_server} in the source database connection together with the field attributes ${source_colum1} and ${source_colum2}, and ${target_name} and ${target_server} in the target database connection together with the field attributes ${target_colum1} and ${target_colum2}; the field attributes of the source database must correspond one-to-one with those of the target database.

6. The method according to claim 1, characterized in that the specific setup requirement of step 5) is that, in a Windows environment, a JRE (Java Runtime Environment) of version 1.5 or later is installed, and in a Unix-like environment, the shell scripts are made executable by entering commands.

7. The method according to claim 1, characterized in that step 6) specifically comprises the following steps:

a) Define the Properties class required by the Velocity framework;

8. The method according to claim 7, characterized in that, in step b), VelocityEngine creates a new Velocity instance each time, so that each Velocity engine has its own instance and they do not interfere with one another.