CN107679251A - Universal Database abstracting method based on Spoon under big data environment - Google Patents

Info

Publication number
CN107679251A
CN107679251A CN201711064474.2A CN201711064474A CN107679251A CN 107679251 A CN107679251 A CN 107679251A CN 201711064474 A CN201711064474 A CN 201711064474A CN 107679251 A CN107679251 A CN 107679251A
Authority
CN
China
Prior art keywords
spoon
database
velocity
target
source
Legal status
Pending
Application number
CN201711064474.2A
Other languages
Chinese (zh)
Inventor
曹亮 (Cao Liang)
Current Assignee
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date
2017-11-02
Filing date
2017-11-02
Publication date
2018-02-09
Application filed by Chengdu University of Information Technology
Priority to CN201711064474.2A
Publication of CN107679251A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3684 Test management for test design, e.g. generating new test cases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Abstract

The invention discloses a universal Spoon-based database extraction method for big data environments. The method combines the Spoon graphical interface provided by the Kettle tool with the Java-based template technology Velocity, so that users can flexibly set parameters and extract related data from different databases quickly and efficiently, laying a foundation for big data analysis.

Description

Universal Database abstracting method based on Spoon under big data environment
Technical field
The invention belongs to the field of computer technology, and specifically relates to a universal Spoon-based database extraction method for big data environments.
Background
In the information age, massive amounts of data are generated every day, such as shopping records on e-commerce websites, ticket purchase transactions on railway ticketing platforms, messages in real-time chat tools, and data collected at industrial sites. To make them easy to manage and look up, these data are stored in corresponding databases. When enterprises, factories, research institutes and similar organizations handle the massive, fast-growing and diverse information assets they own, they find that traditional database management software cannot manage them efficiently; a new processing mode is needed to extract valuable information from such massive data.
Big data technology emerged in response. Its strategic significance lies in mining massive data that contains useful information and extracting the valuable information from it. Big data requires special technologies, such as massively parallel processing (MPP) databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.
In a big data environment, massive data is stored in different databases, and these databases often differ considerably in data structure. Extracting related data from different databases quickly and efficiently, while converting between their different data structures, is therefore crucial for data mining.
Summary of the invention
To solve the above problems, the invention provides a universal Spoon-based database extraction method for big data environments, characterized in that the method comprises the following steps:
1) Create an execution flowchart with the Spoon modeling tool;
2) Configure the relevant parameters of the execution flowchart;
3) Run tests in the Spoon tool;
4) Generate an executable template file;
5) Set up an environment in which Spoon and Velocity can run from Java;
6) Design the generateVM method;
7) Design the executeSpoon method;
8) Design the entry method called from Java.
Further, the relevant parameters in step 2) include the table input database connection information, the table output database connection information, the table output database table, the table output field mapping configuration, and the paging and looping settings for large data volumes.
Further, the tests in step 3) include transformation verification and impact analysis.
Further, in step 4) the parameters in the XML are set dynamically according to the rules of the Velocity template technology.
Further, the parameters include ${source_name}, ${source_server} and the field attributes ${source_colum1}, ${source_colum2} in the source database connection, and ${target_name}, ${target_server} and the field attributes ${target_colum1}, ${target_colum2} in the target database connection; the field attributes of the source database and the target database must correspond one-to-one.
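The one-to-one rule above can be made concrete with a small parameter builder. The sketch below is a hypothetical helper (the class FieldMappingParams and its method are not part of the patent): it collects the variables named above into a map keyed by the template variable names, and rejects a configuration whose source and target field lists differ in length.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: builds the template variables listed above for one source/target pair. */
final class FieldMappingParams {
    private FieldMappingParams() {}

    static Map<String, String> build(String sourceName, String sourceServer,
                                     String targetName, String targetServer,
                                     List<String> sourceColumns, List<String> targetColumns) {
        // Source and target field attributes must correspond one-to-one.
        if (sourceColumns.size() != targetColumns.size()) {
            throw new IllegalArgumentException("source has " + sourceColumns.size()
                    + " fields but target has " + targetColumns.size());
        }
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("source_name", sourceName);
        vars.put("source_server", sourceServer);
        vars.put("target_name", targetName);
        vars.put("target_server", targetServer);
        // Field variables are numbered from 1, so source_colum1 pairs with target_colum1
        // (the "colum" spelling follows the variable names used in this document).
        for (int i = 0; i < sourceColumns.size(); i++) {
            vars.put("source_colum" + (i + 1), sourceColumns.get(i));
            vars.put("target_colum" + (i + 1), targetColumns.get(i));
        }
        return vars;
    }
}
```

A map built this way holds exactly the values a template engine would substitute for the ${...} references in the template file.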
Further, the specific setup requirements of step 5) are: under a Windows environment, the JRE (Java Runtime Environment) version 1.5 or above must be installed; under a Unix-like environment, the shell scripts are made executable by entering the appropriate commands.
Further, step 6) specifically comprises the following steps:
a) Define the Properties object required by the Velocity framework;
b) Define a VelocityEngine and initialize it with the configured Properties;
c) Obtain the file's Template object through the getTemplate method provided by the VelocityEngine class;
d) Replace the variables set in the template file with the parameter information set, through the merge method of Template;
e) Return the file (XML) that, after variable replacement, can be executed in Spoon.
Further, in step b) VelocityEngine creates a new Velocity instance each time, so that each Velocity engine has its own instance and the engines do not interfere with each other.
Brief description of the drawings
Fig. 1 is a detailed flowchart of the database extraction method.
Embodiment
To make the technical features, objectives and effects of the invention clearer, embodiments of the invention are now described with reference to the accompanying drawings.
In this Spoon-based universal database extraction method for big data environments, a "universal database" means a commonly used database, such as the relational databases Oracle, MySQL, SQL Server, DB2 and PostgreSQL, or HBase. The method comprises the following steps:
1) Create an execution flowchart with the Spoon modeling tool
ETTL (Extract-Transform-Transport-Load) describes the process in which data is moved from a source to a destination through extraction, transformation, transport and loading. Kettle is an open-source ETTL tool developed outside China. It is written in pure Java, runs on Windows, Linux, Unix and other operating systems, and extracts data efficiently and stably. Kettle provides a graphical user interface, Spoon, in which users can quickly build execution flowcharts and save them as transformation script files; these files perform the basic conversion of the data. Pan, a tool included in Kettle, is a data transformation engine that reads, processes and writes data, and allows transformations designed in Spoon to be run in batch (for example, from a time scheduler). Pan runs in the background and has no graphical interface.
2) Configure the relevant parameters of the execution flowchart
A transformation involves five interrelated concepts: Value, Row, Input Stream, Hop and Note. A Value is part of a Row and can hold any type of data, such as a string, floating-point number, integer, date or Boolean. Several Values processed together form a Row, which serves as a unit of input. The Rows handled by one step form an Input Stream. A Hop is the graphical representation of the data flow between two steps; a Hop always represents the output of one step and the input of another. A Note is descriptive text that can be added to the transformation.
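To make the relationships between these concepts concrete, here is a deliberately simplified, single-threaded model. All class names are hypothetical and are not Kettle's own classes; in Kettle itself each step typically runs in its own thread with a buffer of rows between steps, which this sketch reduces to a plain queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Queue;

/** Hypothetical, heavily simplified model of Value, Row, Input Stream and Hop. */
final class MiniTransformation {
    private MiniTransformation() {}

    /** A Row groups several Values of any type that are processed together. */
    static final class Row {
        final List<Object> values;
        Row(List<Object> values) { this.values = new ArrayList<>(values); }
    }

    /** A Hop links exactly two steps: rows written by "from" are read by "to". */
    static final class Hop {
        final String from;
        final String to;
        private final Queue<Row> buffer = new ArrayDeque<>();
        Hop(String from, String to) { this.from = from; this.to = to; }
        void put(Row row) { buffer.add(row); }
        Row take() { return buffer.poll(); } // null once the stream is exhausted
    }

    /** Passes an input stream through a two-step transformation and returns what step 2 emits. */
    static List<Row> run(List<Row> inputStream) {
        Hop hop = new Hop("Table input", "Upper-case strings");
        // Step 1 writes every Row of its Input Stream to the Hop.
        for (Row row : inputStream) hop.put(row);
        // Step 2 reads the same Rows from the Hop and upper-cases the String Values.
        List<Row> output = new ArrayList<>();
        Row next;
        while ((next = hop.take()) != null) {
            List<Object> converted = new ArrayList<>();
            for (Object value : next.values) {
                converted.add(value instanceof String
                        ? ((String) value).toUpperCase(Locale.ROOT) : value);
            }
            output.add(new Row(converted));
        }
        return output;
    }
}
```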
A transformation extracts data from the source database, converts it, and writes the converted data to the target database. Because relational databases store information in tables, the following parameters must be configured in the transformation for it to complete successfully:
a) Table input database connection information;
b) Table output database connection information;
c) Table output database table;
d) Table output field mapping configuration;
e) Paging and looping settings for large data volumes.
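Item e) matters because a single query over a very large table can exhaust memory or time out. A minimal sketch of one common approach, assuming LIMIT/OFFSET syntax as used by MySQL and PostgreSQL (Oracle, SQL Server and DB2 use different paging clauses), splits the extraction into fixed-size pages that a loop can run one after another. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of paging and looping over a large source table. */
final class PagedExtraction {
    private PagedExtraction() {}

    /**
     * Returns one LIMIT/OFFSET query per page (MySQL/PostgreSQL syntax).
     * table and orderBy are assumed to come from trusted configuration,
     * because they are concatenated directly into the SQL text.
     */
    static List<String> pageQueries(String table, String orderBy, long totalRows, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        List<String> queries = new ArrayList<>();
        // A stable ORDER BY keeps consecutive pages from overlapping or skipping rows.
        for (long offset = 0; offset < totalRows; offset += pageSize) {
            queries.add("SELECT * FROM " + table + " ORDER BY " + orderBy
                    + " LIMIT " + pageSize + " OFFSET " + offset);
        }
        return queries;
    }
}
```

Large offsets become slow on many databases, so keyset paging (WHERE id > last_seen_id) is a frequent alternative; this sketch keeps OFFSET only for simplicity.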
3) Run tests in the Spoon tool
In the graphical interface, transformation verification and impact analysis are run by clicking the corresponding icons. Verification checks every step to ensure that the whole transformation runs in the designed order. Impact analysis predicts the effects that the transformation may have on the databases.
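Kettle does not allow hops to form a loop, so one part of verification can be pictured as checking that the hops form a directed acyclic graph and deriving an order in which the steps can run. The sketch below is a hypothetical simplification using Kahn's topological sort; Spoon's real verification checks much more, such as each step's own settings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical check that the hops between steps contain no loop. */
final class HopVerifier {
    private HopVerifier() {}

    /** Returns a step order consistent with the hops (key = from step, value = to step). */
    static List<String> executionOrder(Collection<String> steps, List<Map.Entry<String, String>> hops) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        Map<String, List<String>> next = new HashMap<>();
        for (String s : steps) { inDegree.put(s, 0); next.put(s, new ArrayList<>()); }
        for (Map.Entry<String, String> h : hops) {
            if (!inDegree.containsKey(h.getKey()) || !inDegree.containsKey(h.getValue())) {
                throw new IllegalArgumentException("hop references unknown step: " + h);
            }
            next.get(h.getKey()).add(h.getValue());
            inDegree.merge(h.getValue(), 1, Integer::sum);
        }
        // Start with the steps that no hop points into.
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((s, d) -> { if (d == 0) ready.add(s); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String s = ready.poll();
            order.add(s);
            for (String t : next.get(s)) {
                if (inDegree.merge(t, -1, Integer::sum) == 0) ready.add(t);
            }
        }
        // Steps left unordered all lie on (or behind) a loop.
        if (order.size() != inDegree.size()) throw new IllegalStateException("hops contain a loop");
        return order;
    }
}
```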
4) Generate an executable template file
The generation process includes the following steps:
a) Export the configuration file (XML) that passed the tests;
b) Following the rules of the template technology (Velocity), dynamically set in the XML the parameters that were entered for testing in the first step (mainly the parameter information related to the source database, the target database, the field mapping, paging, and so on). That is, using Velocity's variable definition syntax, replace the test parameters, for example ${source_name}, ${source_server} and the field attributes ${source_colum1}, ${source_colum2} in the source database connection, and ${target_name}, ${target_server} and the field attributes ${target_colum1}, ${target_colum2} in the target database connection; the field attributes of the source database and the target database must correspond one-to-one;
c) Save the edited XML file as a VM file that the template technology can execute.
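Sub-steps b) and c) amount to replacing the concrete values used during testing with Velocity references. A hypothetical helper for that is sketched below; it works on a simplified XML fragment (not a complete Kettle file) and replaces a tested value only when it is the entire text of an element.

```java
import java.util.Map;

/** Hypothetical helper: turns a tested, exported XML into a Velocity-style template. */
final class TemplateMaker {
    private TemplateMaker() {}

    /** For each (variable, testedValue) pair, rewrites ">testedValue<" as ">${variable}<". */
    static String parameterize(String exportedXml, Map<String, String> testedValues) {
        String template = exportedXml;
        for (Map.Entry<String, String> e : testedValues.entrySet()) {
            // Matching whole element text avoids rewriting a value that is merely a
            // substring of some other text in the file.
            template = template.replace(">" + e.getValue() + "<", ">${" + e.getKey() + "}<");
        }
        return template;
    }
}
```

If two variables were tested with the same literal value, the helper cannot tell them apart, so distinct test values are assumed. Literal `$` or `#` characters elsewhere in the XML may also need escaping, because Velocity treats them as template syntax.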
5) Set up an environment in which Spoon and Velocity can run from Java
Under a Windows environment, the JRE (Java Runtime Environment) version 1.5 or above must be installed. Under a Unix-like environment (such as Solaris, Linux or macOS), the user must make the shell scripts executable by entering the following commands:
    cd Kettle
    chmod +x *.sh
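When the environment is prepared from Java code rather than a shell, the effect of the two commands above can be approximated with the JDK alone. This is a hypothetical helper, not part of Kettle; `setExecutable(true, false)` sets the execute bit for all users, whereas `chmod +x` also honours the umask, and on Windows the call has no real effect.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Java counterpart of "cd Kettle; chmod +x *.sh". */
final class ScriptPermissions {
    private ScriptPermissions() {}

    /** Marks every *.sh file directly inside kettleDir as executable; returns the files changed. */
    static List<Path> makeShellScriptsExecutable(Path kettleDir) {
        List<Path> changed = new ArrayList<>();
        try (DirectoryStream<Path> scripts = Files.newDirectoryStream(kettleDir, "*.sh")) {
            for (Path script : scripts) {
                // ownerOnly = false grants execute permission to everybody.
                if (script.toFile().setExecutable(true, false)) changed.add(script);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return changed;
    }
}
```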
Note that Velocity also depends on several other jar packages, which are included in the build/lib directory of the distribution. If the binary distribution was downloaded, the following dependencies must be downloaded separately: Jakarta Commons Collections (required); Jakarta Avalon LogKit (optional, but strongly recommended so that log information can be output); Jakarta ORO (optional, needed only when using the org.apache.velocity.convert.WebMacro template conversion tool).
6) Design the generateVM method
The parameters of this method are the dynamic parameter information (a JavaBean) that replaces the variables set in the VM file, and the directory and file name of the Velocity-executable template file generated in step 4). Velocity replaces the parameters in the VM file to produce an XML file that Spoon can execute, and the generated XML file is stored in a temporary directory. The design process is as follows:
a)Define the Properties classes that Velocity frameworks need.Can be carried out in Properties classes template coding, Foreach configurations, set configurations, include configurations, parse configurations, the configuration of template loader, grand configuration, explorer are matched somebody with somebody Put, resolver pond configuration, can be inserted into introspector configuration etc. many kinds of parameters configuration, herein according to being actually needed, put Enter the information such as the file path for needing to replace;
b) Define a VelocityEngine and initialize it with the configured Properties. The org.apache.velocity.app package contains two classes, Velocity and VelocityEngine, whose methods are essentially the same. The difference is that the Velocity class initializes a singleton Velocity instance, which lets multiple classes share one instance and thus access the same data, whereas each VelocityEngine is a new, separate Velocity instance, so that each engine has its own instance and engines do not interfere with each other. To run Velocity engines in isolation, this method uses the VelocityEngine class and initializes it with the Properties;
c) Obtain the file's Template object through the getTemplate method provided by the VelocityEngine class;
d) Replace the variables set in the template file with the parameter information set, through the merge method of Template;
e) Return the file (XML) that, after variable replacement, can be executed in Spoon.
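The five sub-steps can be sketched in plain Java so the flow runs without the Velocity jars. This is an assumption-laden stand-in, not the patent's implementation: getTemplate and merge below only imitate VelocityEngine.getTemplate and Template.merge with simple ${name} substitution, the class VmGenerator is hypothetical, and the property keys follow Velocity 1.x naming ("file.resource.loader.path", "input.encoding"), which may differ in other Velocity versions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain-JDK sketch of generateVM; Velocity's template language is far richer than this. */
final class VmGenerator {
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)}");
    // Instance state, not static: the same reason VelocityEngine is preferred over the singleton.
    private final Properties props = new Properties();

    /** Steps a) and b): build the configuration and initialize this engine instance with it. */
    VmGenerator(Path templateDir, String encoding) {
        props.setProperty("file.resource.loader.path", templateDir.toString());
        props.setProperty("input.encoding", encoding);
    }

    /** Step c): load the template text (stand-in for VelocityEngine.getTemplate). */
    String getTemplate(String fileName) {
        Path file = Path.of(props.getProperty("file.resource.loader.path"), fileName);
        try {
            return Files.readString(file, Charset.forName(props.getProperty("input.encoding")));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Step d): replace each ${name} with its value (stand-in for Template.merge). */
    static String merge(String template, Map<String, String> params) {
        Matcher m = REFERENCE.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = params.get(m.group(1));
            // As Velocity does by default, an undefined reference is left as literal text.
            m.appendReplacement(out, Matcher.quoteReplacement(value != null ? value : m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Step e): write the merged, Spoon-executable XML to a temporary file and return its path. */
    Path generateVM(String templateFile, Map<String, String> params) {
        String xml = merge(getTemplate(templateFile), params);
        try {
            Path out = Files.createTempFile("spoon-", ".ktr");
            Files.writeString(out, xml, Charset.forName(props.getProperty("input.encoding")));
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With Velocity on the classpath, steps b) to d) would roughly become `new VelocityEngine(props)`, `engine.init()`, `engine.getTemplate(name, encoding)` and `template.merge(context, writer)` with a `VelocityContext` holding the parameters.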
7) Design the executeSpoon method
The parameter of this method is the path of the temporary file generated in step 6); the file is executed through the API provided by Spoon.
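The method above calls the API provided by Spoon in-process. As an alternative that is not the patent's approach, the same temporary file could be run with Pan, the command-line engine described in step 1). The sketch assumes a standard Kettle installation directory containing pan.sh (Pan.bat on Windows); the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical executeSpoon variant that runs the generated file with Kettle's Pan. */
final class SpoonRunner {
    private SpoonRunner() {}

    /** Builds the Pan command line for a transformation file. */
    static List<String> panCommand(Path kettleHome, Path transformationFile, boolean windows) {
        Path pan = kettleHome.resolve(windows ? "Pan.bat" : "pan.sh");
        return List.of(pan.toString(), "-file=" + transformationFile.toAbsolutePath());
    }

    /** Runs Pan and returns its exit code (0 means the transformation ran without errors). */
    static int executeSpoon(Path kettleHome, Path transformationFile)
            throws IOException, InterruptedException {
        boolean windows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        Process p = new ProcessBuilder(panCommand(kettleHome, transformationFile, windows))
                .inheritIO() // show Pan's log output in this console
                .start();
        return p.waitFor();
    }
}
```

Running Pan in a separate process isolates the caller from Kettle's class path and memory use, at the cost of process start-up time for every extraction.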
8) Design the entry method called from Java
This method configures the parameter information required in step 6) and completes the data extraction operation.
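The entry method only has to gather the parameters and chain the two previous methods. The sketch below passes generateVM and executeSpoon in as functions so that it compiles on its own; the class name, the mandatory-parameter check and the convention that 0 means success are assumptions.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToIntFunction;

/** Hypothetical entry point tying step 6) and step 7) together. */
final class ExtractionEntry {
    private ExtractionEntry() {}

    static boolean extract(Map<String, String> params,
                           Function<Map<String, String>, Path> generateVM,
                           ToIntFunction<Path> executeSpoon) {
        // Fail early if the mandatory connection parameters were not configured.
        for (String key : new String[] {"source_name", "source_server", "target_name", "target_server"}) {
            if (!params.containsKey(key)) throw new IllegalArgumentException("missing parameter: " + key);
        }
        Path ktr = generateVM.apply(params);       // step 6): template -> executable XML
        return executeSpoon.applyAsInt(ktr) == 0;  // step 7): run it; 0 = success
    }
}
```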
The above disclosure is only a preferred embodiment of the invention and cannot be used to limit the scope of its rights; equivalent changes made according to the claims of the invention still fall within the scope covered by the invention.

Claims (8)

1. A universal Spoon-based database extraction method for big data environments, characterized in that the method comprises the following steps:
1) Create an execution flowchart with the Spoon modeling tool;
2) Configure the relevant parameters of the execution flowchart;
3) Run tests in the Spoon tool;
4) Generate an executable template file;
5) Set up an environment in which Spoon and Velocity can run from Java;
6) Design the generateVM method;
7) Design the executeSpoon method;
8) Design the entry method called from Java.
2. The method according to claim 1, characterized in that the relevant parameters in step 2) include the table input database connection information, the table output database connection information, the table output database table, the table output field mapping configuration, and the paging and looping settings for large data volumes.
3. The method according to claim 1, characterized in that the tests in step 3) include transformation verification and impact analysis.
4. The method according to claim 1, characterized in that in step 4) the parameters in the XML are set dynamically according to the rules of the Velocity template technology.
5. The method according to claim 4, characterized in that the parameters include ${source_name}, ${source_server} and the field attributes ${source_colum1}, ${source_colum2} in the source database connection, and ${target_name}, ${target_server} and the field attributes ${target_colum1}, ${target_colum2} in the target database connection, the field attributes of the source database and the target database corresponding one-to-one.
6. The method according to claim 1, characterized in that the specific setup requirements of step 5) are: under a Windows environment, the JRE (Java Runtime Environment) version 1.5 or above is installed; under a Unix-like environment, the shell scripts are made executable by entering commands.
7. The method according to claim 1, characterized in that step 6) specifically comprises the following steps:
a) Define the Properties object required by the Velocity framework;
b) Define a VelocityEngine and initialize it with the configured Properties;
c) Obtain the file's Template object through the getTemplate method provided by the VelocityEngine class;
d) Replace the variables set in the template file with the parameter information set, through the merge method of Template;
e) Return the file (XML) that, after variable replacement, can be executed in Spoon.
8. The method according to claim 7, characterized in that in step b) VelocityEngine creates a new Velocity instance each time, so that each Velocity engine has its own instance and the engines do not interfere with each other.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711064474.2A CN107679251A (en) 2017-11-02 2017-11-02 Universal Database abstracting method based on Spoon under big data environment

Publications (1)

Publication Number Publication Date
CN107679251A 2018-02-09

Family

ID=61144879

Country Status (1)

Country Link
CN (1) CN107679251A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013058312A1 (en) * 2011-10-18 2013-04-25 ガイアホールディングス株式会社 Household appliance information accumulation server
CN104639558A (en) * 2015-02-25 2015-05-20 浪潮集团有限公司 Data extracting method and system as well as cloud platform
CN106446144A (en) * 2016-09-21 2017-02-22 郑州云海信息技术有限公司 Kettle-based method for extraction and statistics of data on large data platform based on kettle
CN106991100A (en) * 2016-01-21 2017-07-28 北京京东尚科信息技术有限公司 Data lead-in method and device

Non-Patent Citations (2)

Title
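LUKCY110100: "Combining the ETL tool (Kettle) with the Velocity template engine produces an incredibly powerful product: code automation", HTTPS://GITEE.COM/LUCKY110100/TEMPLATE *
细雨飘竹: "ETL: database analysis, extraction and transformation software (Spoon)", HTTPS://WWW.JIANSHU.COM/P/65517E7E428F *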

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180209