CN107679251A - Universal database extraction method based on Spoon in a big data environment - Google Patents
- Publication number
- CN107679251A (application number CN201711064474.2A)
- Authority
- CN
- China
- Prior art keywords
- spoon
- database
- velocity
- target
- source
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G06F11/3672—Test management
- G06F11/3684—Test management for test design, e.g. generating new test cases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Abstract
The invention discloses a universal database extraction method based on Spoon for big data environments. It combines the Spoon graphical interface provided by the Kettle tool with a Java-based template technology (Velocity), so that users can set parameters flexibly and extract related data from different databases quickly and efficiently, laying a foundation for big data analysis.
Description
Technical field
The invention belongs to the field of computer technology, and in particular relates to a universal database extraction method based on Spoon in a big data environment.
Background art
In the information age, massive amounts of data are produced every day, such as shopping records on e-commerce websites, ticket purchase transactions on train ticketing platforms, messages in instant messaging tools, and information collected at industrial sites. For ease of management and lookup, these data are stored in corresponding databases. When enterprises, factories, research institutes and similar organizations handle their own massive, fast-growing and diverse information assets, they find that traditional database management software cannot manage them efficiently, and that a new processing model is needed to extract valuable information from the mass of data.
Big data technology emerged in response. Its strategic significance lies in mining massive data sets that contain useful information in order to extract what is valuable. Big data requires special technologies, such as massively parallel processing (MPP) databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.
In a big data environment, massive data is stored in different databases, and these databases often differ considerably in data structure. Extracting related data from different databases quickly and efficiently, while converting between the data structures of those databases, is critical to data mining.
Summary of the invention
To solve the above problems, the invention provides a universal database extraction method based on Spoon in a big data environment, characterized in that the method comprises the following steps:
1) Create an execution flow chart with the Spoon modeling tool;
2) Configure the parameters of the execution flow chart;
3) Run a test in the Spoon tool;
4) Generate an executable template file;
5) Set up an environment in which Spoon and Velocity can run from Java;
6) Design the generateVM method;
7) Design the executeSpoon method;
8) Design the entry method called from Java.
Further, the parameters in step 2) include the table-input database connection information, the table-output database connection information, the table-output database table, the table-output field mapping configuration, and the paging and loop settings for large data volumes.
Further, the test in step 3) includes transformation verification and impact analysis.
Further, in step 4) the parameters are set dynamically in the XML according to the rules of the Velocity template technology.
Further, the parameters include ${source_name} and ${source_server} in the source database connection together with the field attributes ${source_colum1} and ${source_colum2}, and ${target_name} and ${target_server} in the target database connection together with the field attributes ${target_colum1} and ${target_colum2}; the field attributes of the source database and the target database must correspond one to one.
Further, the specific setup requirement in step 5) is that in a Windows environment a JRE (Java Runtime Environment) of version 1.5 or later is installed, and in a Unix-like environment the shell scripts are made executable by entering a command.
Further, step 6) specifically comprises the following steps:
a) Define the Properties class required by the Velocity framework;
b) Define the VelocityEngine class and initialize it with the configured Properties;
c) Obtain the Template object for the file through the getTemplate method provided by the VelocityEngine class;
d) Through the merge method of Template, replace the variables set in the template file with the supplied parameter information;
e) Return the file (XML), with its variables replaced, that can be executed in Spoon.
Further, in step b) VelocityEngine creates a new Velocity instance each time, so that each Velocity engine has its own instance and the engines do not interfere with one another.
Brief description of the drawings
Fig. 1 is a detailed flow chart of the database extraction method.
Detailed description of the embodiments
So that the technical features, objects and effects of the invention can be understood more clearly, embodiments of the invention are now described with reference to the drawing.
In a universal database extraction method based on Spoon in a big data environment, the universal databases are common relational databases such as Oracle, MySQL, SQL Server, DB2, PostgreSQL, HBase and so on. The method comprises the following steps:
1) Create an execution flow chart with the Spoon modeling tool
ETTL (Extract-Transform-Transport-Load) describes the process in which data travels from a source to a destination through extraction (extract), transformation (transform), transport (transport), and loading (load). Kettle is an open-source ETTL tool from abroad, written in pure Java, that runs on multiple operating systems such as Windows, Linux, and Unix and extracts data efficiently and stably. Kettle provides the graphical user interface Spoon, which lets users create an execution flow chart quickly and conveniently in the interface and produce a transformation script file; the transformation script file performs the basic conversion of the data. Pan, a tool included with Kettle, is a data transformation engine that reads, processes, and writes data, and allows ETTL transformations designed in Spoon to be run in batch (for example by a time scheduler). Pan is a background program without a graphical interface.
2) Configure the parameters of the execution flow chart
A transformation involves the following interrelated concepts: Value, Row, Input Stream, Hop, and Note. A Value is part of a Row and can hold any type of data, such as strings, floating-point numbers, integers, dates, or Boolean values. Several Values processed together form a Row, which is treated as a single input. The Rows entering one operating step form its Input Stream. A Hop is the graphical representation of the data flow between two steps; a Hop always represents the output of one step and the input of another. A Note is descriptive text that can be added to the transformation.
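As an illustration only, the relationship between these concepts can be modelled with a few plain Java classes. The class names below are hypothetical and are not Kettle's own types; the sketch only shows that a Row groups Values, an Input Stream is a sequence of Rows, and a Hop links two steps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the concepts above; these are NOT Kettle's own classes.
final class RowModelSketch {

    // A Row is an ordered group of Values (any Java object) handled as one input.
    static final class Row {
        private final List<Object> values;

        Row(Object... values) {
            this.values = Collections.unmodifiableList(
                    new ArrayList<Object>(Arrays.asList(values)));
        }

        List<Object> values() {
            return values;
        }
    }

    // A Hop links the output of one step to the input of another step.
    static final class Hop {
        final String fromStep;
        final String toStep;

        Hop(String fromStep, String toStep) {
            this.fromStep = fromStep;
            this.toStep = toStep;
        }
    }

    // The Input Stream of a step is modelled here as the list of Rows it receives.
    static List<Row> inputStream(Row... rows) {
        return Collections.unmodifiableList(new ArrayList<Row>(Arrays.asList(rows)));
    }
}
```

For example, `new RowModelSketch.Row("alice", 42, Boolean.TRUE)` holds three Values of different types in one Row.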
A transformation must extract data from the source database and, after the appropriate conversion, put the converted data into the target database. Because relational databases store information in tabular form, the following parameters must be configured in the transformation for it to complete smoothly:
a) Table-input database connection information;
b) Table-output database connection information;
c) Table-output database table;
d) Table-output field mapping configuration;
e) Paging and loop settings for large data volumes.
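The paging and loop setting in item e) is not spelled out in the text. One plausible reading, given here as a sketch under that assumption, is that a large table is read in fixed-size pages and the transformation is run once per page. The class name, method name, and the offset/limit scheme are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative paging helper; the names and the offset/limit scheme are assumptions.
final class PagingSketch {

    // Returns {offset, limit} pairs that together cover totalRows rows exactly once.
    static List<long[]> pages(long totalRows, int pageSize) {
        if (totalRows < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and pageSize > 0");
        }
        List<long[]> result = new ArrayList<long[]>();
        for (long offset = 0; offset < totalRows; offset += pageSize) {
            // The last page may be shorter than pageSize.
            long limit = Math.min(pageSize, totalRows - offset);
            result.add(new long[] {offset, limit});
        }
        return result;
    }
}
```

Each pair could then be supplied as the paging parameters of the table-input query for one loop iteration; how the query expresses offset and limit depends on the database in use.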
3) Run a test in the Spoon tool
In the graphical interface, transformation verification and impact analysis are run by clicking the corresponding icons. Transformation verification checks each step to ensure that the whole conversion runs in the designed order. Impact analysis predicts the effect that the conversion may have on the database.
4) Generate an executable template file
The generation process comprises the following steps:
a) Export the configuration file (XML) that passed the test run;
b) Following the rules of the template technology (Velocity), turn the test parameters entered in sub-step a) into dynamic settings in the XML (mainly the parameter information for the source database, target database, field mapping, paging, and so on). That is, use Velocity's variable definition syntax to replace the test parameters, for example ${source_name} and ${source_server} in the source database connection with the field attributes ${source_colum1} and ${source_colum2}, and ${target_name} and ${target_server} in the target database connection with the field attributes ${target_colum1} and ${target_colum2}; the field attributes of the source database and the target database must correspond one to one;
c) Save the prepared XML file as a VM file that the template technology can execute.
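To make the variable replacement concrete, the following sketch substitutes `${name}` placeholders in a small XML fragment. It is a stand-in written with the JDK only and does not use Velocity; the class name, the sample element names, and the choice to leave unknown variables untouched are assumptions of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in for template variable substitution, written with the JDK only (not Velocity).
final class PlaceholderSketch {

    // Matches ${identifier} and captures the identifier.
    private static final Pattern VARIABLE =
            Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    // Replaces each ${name} with its value; names without a value are left untouched.
    static String render(String template, Map<String, String> params) {
        Matcher m = VARIABLE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("source_name", "orders_db");     // sample value, illustrative only
        params.put("source_server", "192.0.2.10");  // sample value, illustrative only
        String vm = "<name>${source_name}</name><server>${source_server}</server>";
        System.out.println(render(vm, params));
    }
}
```

In the method described here the same replacement is performed by Velocity itself; the sketch only shows the effect on the template text.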
5) Set up an environment in which Spoon and Velocity can run from Java
In a Windows environment, a JRE (Java Runtime Environment) of version 1.5 or later must be installed. In a Unix-like environment (such as Solaris, Linux, or macOS), the user must make the shell scripts executable by entering the following commands:
cd Kettle;
chmod +x *.sh
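The same two requirements can also be checked from Java before the extraction starts. The sketch below is a hedged illustration: the class and method names are hypothetical, the version check reads a value in the format of the standard `java.specification.version` system property, and `File.setExecutable` is used as a rough Java counterpart of `chmod +x`.

```java
import java.io.File;

// Illustrative environment checks; class and method names are hypothetical.
final class EnvironmentSketch {

    // Accepts values such as "1.5", "1.8", "11" or "17".
    static boolean isAtLeastJava5(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        if (major > 1) {
            return true; // newer releases report a single number, e.g. "17"
        }
        int minor = (parts.length > 1) ? Integer.parseInt(parts[1]) : 0;
        return major == 1 && minor >= 5;
    }

    // Marks every *.sh file directly inside dir as executable; returns how many succeeded.
    static int makeScriptsExecutable(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // not a directory, or it could not be read
        }
        int changed = 0;
        for (File f : files) {
            // ownerOnly=false requests execute permission for all users.
            if (f.isFile() && f.getName().endsWith(".sh") && f.setExecutable(true, false)) {
                changed++;
            }
        }
        return changed;
    }
}
```

A caller might pass `System.getProperty("java.specification.version")` to the first method and the Kettle installation directory to the second.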
Note that Velocity also depends on several other jar packages. They are included in build/lib of the full distribution; if the binary distribution was downloaded, the other dependencies must be downloaded separately. Specifically: Jakarta Commons Collections, required; Jakarta Avalon Logkit, optional but strongly recommended so that log information can be output; Jakarta ORO, optional, needed only when using the template conversion tool org.apache.velocity.convert.WebMacro.
6) Design the generateVM method
The parameters of this method are the dynamic parameter information (a JavaBean) to be substituted into the VM file, and the directory and file name of the Velocity-executable template file generated in step 4 c). Velocity replaces the parameters in the VM file to produce an XML file that Spoon can execute, and the generated XML file is stored in a temporary directory. The design process is as follows:
a) Define the Properties class required by the Velocity framework. Properties can carry many kinds of configuration, including template encoding, foreach configuration, set configuration, include configuration, parse configuration, template loader configuration, macro configuration, resource manager configuration, parser pool configuration, and pluggable introspector configuration. Here, according to actual need, it holds information such as the path of the file to be replaced;
b) Define the VelocityEngine class and initialize it with the configured Properties. The org.apache.velocity.app package contains two classes, Velocity and VelocityEngine, whose methods are essentially the same. The difference is that the Velocity class initializes a singleton Velocity instance, which lets multiple classes share one instance and access the same data, whereas VelocityEngine creates a new Velocity instance each time, so that each Velocity engine has its own instance and the engines do not interfere with one another. To run Velocity engines in isolation, the VelocityEngine class is used here and initialized with the Properties;
c) Obtain the Template object for the file through the getTemplate method provided by the VelocityEngine class;
d) Through the merge method of Template, replace the variables set in the template file with the supplied parameter information;
e) Return the file (XML), with its variables replaced, that can be executed in Spoon.
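The flow of steps a) to e) can be sketched with the JDK alone. The code below is a stand-in that mirrors the sequence and does not call Velocity; with Velocity, the corresponding calls would be `VelocityEngine.init(Properties)`, `VelocityEngine.getTemplate(String)` and `Template.merge(Context, Writer)`. The class name, the property key `template.dir`, and the temporary file prefix are assumptions made for this sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Properties;

// JDK-only stand-in that mirrors steps a) to e); it does not call Velocity.
final class GenerateVmSketch {

    // "template.dir" is a hypothetical property key chosen for this sketch.
    static Path generate(Properties props, String templateName, Map<String, String> params)
            throws IOException {
        // a) + b): the Properties tell the sketch where the template files live.
        String dir = props.getProperty("template.dir", ".");
        // c): locate and load the template file.
        Path template = Paths.get(dir, templateName);
        String text = new String(Files.readAllBytes(template), StandardCharsets.UTF_8);
        // d): replace every ${key} with the supplied parameter value.
        for (Map.Entry<String, String> e : params.entrySet()) {
            text = text.replace("${" + e.getKey() + "}", e.getValue());
        }
        // e): store the result as an XML file in the temporary directory and return it.
        Path out = Files.createTempFile("spoon-", ".xml");
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
        return out;
    }
}
```

Returning the path of the generated file keeps the method easy to chain with whatever executes the transformation next.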
7) Design the executeSpoon method
The parameter of this method is the path of the temporary file generated in step 6 e); the file is executed through the API provided by Spoon.
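The text does not show the API call itself. As a hedged alternative rather than the method described here, the sketch below assembles a command line for Pan, the background engine mentioned in step 1), using the JDK's `ProcessBuilder`. The script name `pan.sh`, the `-file=` option, and the class and method names are assumptions of this sketch and should be checked against the installed Kettle version.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hedged alternative: run the generated file with Pan instead of an in-process API call.
final class ExecuteSpoonSketch {

    // Builds, but does not start, a Pan invocation for the generated transformation file.
    static ProcessBuilder panCommand(File kettleHome, File transformationXml) {
        List<String> cmd = new ArrayList<String>();
        cmd.add(new File(kettleHome, "pan.sh").getPath());  // assumed script name
        cmd.add("-file=" + transformationXml.getPath());    // assumed option syntax
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.directory(kettleHome);
        pb.redirectErrorStream(true); // merge stderr into stdout for simpler logging
        return pb;
    }
}
```

Calling `start()` on the returned builder and then `waitFor()` on the process would run the transformation and yield its exit code.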
8) Design the entry method called from Java
Configure the parameter information required by the method in step 6 and complete the data access operation.
The above discloses only preferred embodiments of the invention and cannot be used to limit the scope of the rights of the invention; equivalent variations made according to the claims of the invention still fall within the scope covered by the invention.
Claims (8)
1. A universal database extraction method based on Spoon in a big data environment, characterized in that the method comprises the following steps:
1) Create an execution flow chart with the Spoon modeling tool;
2) Configure the parameters of the execution flow chart;
3) Run a test in the Spoon tool;
4) Generate an executable template file;
5) Set up an environment in which Spoon and Velocity can run from Java;
6) Design the generateVM method;
7) Design the executeSpoon method;
8) Design the entry method called from Java.
2. The method according to claim 1, characterized in that the parameters in step 2) include the table-input database connection information, the table-output database connection information, the table-output database table, the table-output field mapping configuration, and the paging and loop settings for large data volumes.
3. The method according to claim 1, characterized in that the test in step 3) includes transformation verification and impact analysis.
4. The method according to claim 1, characterized in that in step 4) the parameters are set dynamically in the XML according to the rules of the Velocity template technology.
5. The method according to claim 4, characterized in that the parameters include ${source_name} and ${source_server} in the source database connection together with the field attributes ${source_colum1} and ${source_colum2}, and ${target_name} and ${target_server} in the target database connection together with the field attributes ${target_colum1} and ${target_colum2}; the field attributes of the source database and the target database must correspond one to one.
6. The method according to claim 1, characterized in that the specific setup requirement in step 5) is that in a Windows environment a JRE (Java Runtime Environment) of version 1.5 or later is installed, and in a Unix-like environment the shell scripts are made executable by entering a command.
7. The method according to claim 1, characterized in that step 6) specifically comprises the following steps:
a) Define the Properties class required by the Velocity framework;
b) Define the VelocityEngine class and initialize it with the configured Properties;
c) Obtain the Template object for the file through the getTemplate method provided by the VelocityEngine class;
d) Through the merge method of Template, replace the variables set in the template file with the supplied parameter information;
e) Return the file (XML), with its variables replaced, that can be executed in Spoon.
8. The method according to claim 7, characterized in that in step b) VelocityEngine creates a new Velocity instance each time, so that each Velocity engine has its own instance and the engines do not interfere with one another.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711064474.2A CN107679251A (en) | 2017-11-02 | 2017-11-02 | Universal Database abstracting method based on Spoon under big data environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107679251A true CN107679251A (en) | 2018-02-09 |
Family
ID=61144879
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711064474.2A Pending CN107679251A (en) | 2017-11-02 | 2017-11-02 | Universal Database abstracting method based on Spoon under big data environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107679251A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013058312A1 (en) * | 2011-10-18 | 2013-04-25 | ガイアホールディングス株式会社 | Household appliance information accumulation server |
CN104639558A (en) * | 2015-02-25 | 2015-05-20 | 浪潮集团有限公司 | Data extracting method and system as well as cloud platform |
CN106446144A (en) * | 2016-09-21 | 2017-02-22 | 郑州云海信息技术有限公司 | Kettle-based method for extraction and statistics of data on large data platform based on kettle |
CN106991100A (en) * | 2016-01-21 | 2017-07-28 | 北京京东尚科信息技术有限公司 | Data lead-in method and device |
Non-Patent Citations (2)
Title |
---|
LUKCY110100: "ETL工具(kettle)与Velocity模板引擎的结合会产生一个非常变态的产物——代码自动化", 《HTTPS://GITEE.COM/LUCKY110100/TEMPLATE》 * |
细雨飘竹: "ETL:数据库分析抽取转换软件(Spoon)", 《HTTPS://WWW.JIANSHU.COM/P/65517E7E428F》 * |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180209 |