CN101697126A - ETL realization method for incremental data of Excel file


Info

Publication number
CN101697126A
Authority
CN
China
Prior art keywords
excel
file
dataobject
excel file
sink
Legal status
Granted
Application number
CN200910229625A
Other languages
Chinese (zh)
Other versions
CN101697126B (en)
Inventor
扶文海
舒琦
陈俊
Current Assignee
CVIC Software Engineering Co Ltd
Original Assignee
CVIC Software Engineering Co Ltd
Priority date
2009-10-28
Filing date
2009-10-28
Publication date
2010-04-21
Application filed by CVIC Software Engineering Co Ltd
Priority to CN 200910229625
Publication of CN101697126A
Application granted
Publication of CN101697126B
Status: Active

Abstract

The invention discloses an ETL implementation method for incremental data of Excel files. Based on the InfoSib-ExcelSource and InfoSib-ExcelSink components built on an open-source Java API, the method extracts both file increments and the full content of files, and also supports extracting a specified part of a file's content. It supports separate extraction of ordinary files and files in special formats; file filtering, Sheet filtering and data-column filtering; Excel to Excel, Excel to database and database to Excel; remote Excel data extraction; and repeated runs from a single configuration. Three extraction modes are supported: real-time, scheduled and triggered.

Description

ETL implementation method for incremental data of Excel files
Technical field
The present invention relates to an ETL implementation method for incremental data of Excel files.
Background art
At present, Microsoft Excel is arguably the most convenient, fast and easy-to-use software for data processing. It is widely used for storing data, performing simple operations on data and presenting data, so any ETL process should also support Excel. However, precisely because Excel is so widely used and its table layouts vary so much, most existing methods for Excel data ETL only handle Excel documents that follow a standard, fixed layout. These methods lack flexibility and general applicability.
Summary of the invention
The purpose of the present invention is to address the above shortcomings by providing an ETL implementation method for incremental data of Excel files, based on the InfoSib-ExcelSource and InfoSib-ExcelSink components built on an open-source Java API. The method extracts both file increments and the full content of files, and also supports extracting a specified part of a file's content. It supports separate extraction of ordinary and special-format files; file filtering, Sheet filtering and data-column filtering; Excel to Excel, Excel to database and database to Excel; remote Excel data extraction; repeated runs from a single configuration; and three extraction modes: real-time, scheduled and triggered.
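The file, Sheet and data-column filtering listed above can be pictured with a minimal Python sketch. The patent describes Java components and does not publish their filter syntax. The `ExtractionRule` fields, the glob-style file pattern and the in-memory workbook layout (a sheet name mapped to a list of row dictionaries) are therefore all hypothetical stand-ins.

```python
from __future__ import annotations

import fnmatch
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionRule:
    """Hypothetical extraction rule; None means 'no filtering at this level'."""
    file_pattern: str = "*.xls*"
    sheets: Optional[set[str]] = None
    columns: Optional[list[str]] = None


def select_files(file_names: list[str], rule: ExtractionRule) -> list[str]:
    """File filtering: keep only names matching the rule's glob pattern (case-sensitive)."""
    return [name for name in file_names if fnmatch.fnmatchcase(name, rule.file_pattern)]


def filter_workbook(workbook: dict[str, list[dict[str, object]]],
                    rule: ExtractionRule) -> dict[str, list[dict[str, object]]]:
    """Sheet filtering followed by data-column filtering."""
    result: dict[str, list[dict[str, object]]] = {}
    for sheet_name, rows in workbook.items():
        if rule.sheets is not None and sheet_name not in rule.sheets:
            continue  # Sheet filter: skip sheets the rule does not select
        if rule.columns is None:
            result[sheet_name] = [dict(row) for row in rows]
        else:
            # Column filter: keep only the selected columns, in rule order
            result[sheet_name] = [{col: row.get(col) for col in rule.columns} for row in rows]
    return result
```

For example, `ExtractionRule(file_pattern="sales_*.xlsx", sheets={"Q1"}, columns=["id", "amount"])` keeps only the matching workbooks, only their `Q1` sheet, and only the two named columns of each row.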
In this ETL implementation method for incremental data of Excel files, the configuration information of the source end and the sink end is obtained and parsed. Multiple threads are then started in parallel. They respectively read the content of the source-end Excel files, send the encapsulated DataObject objects, and receive and process the DataObject objects at the sink end, which finally writes the data into an Excel file.
The step of obtaining and parsing the configuration information of the source end and the sink end is: obtain the xbean.xml file of the source end and parse the corresponding configuration information; obtain the xbean.xml file of the sink end and parse the corresponding configuration information.
Preferably, the threads that read the source-end Excel file content, send the encapsulated DataObject objects, and receive and process the DataObject objects at the sink end before finally writing the Excel file are started in parallel according to the following steps:
Start the source-end reading thread. It repeatedly reads the Excel files under the base-directory FileObject that match the extraction rule, packages their content into several DataObject objects and places them in the send queue;
Start the sending thread. It repeatedly reads the DataObject objects from the send queue and sends them to the InforSib bus container;
Start the sink-end processing thread, which listens on the InforSib bus. When a DataObject arrives at the sink end over the InforSib bus, the processing thread calls the process method to receive the DataObject and prepare it for processing.
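The three-thread arrangement above (reader, sender, sink processor) can be sketched in Python with standard-library queues. This is an illustration under stated assumptions, not the InfoSib implementation. The InforSib bus is modeled as a plain `queue.Queue`, and `DataObject` is reduced to a sequence number and a list of rows. The `_END` sentinel that shuts the threads down is a hypothetical addition.

```python
import queue
import threading
from dataclasses import dataclass

_END = object()  # hypothetical end-of-stream sentinel used to stop the threads


@dataclass
class DataObject:
    """Simplified stand-in for the DataObject that travels over the bus."""
    seq: int
    rows: list


def source_reader(rows: list, send_queue: queue.Queue, batch_size: int) -> None:
    """Source-end reading thread: package rows into DataObjects and enqueue them."""
    for start in range(0, len(rows), batch_size):
        send_queue.put(DataObject(start // batch_size, rows[start:start + batch_size]))
    send_queue.put(_END)


def sender(send_queue: queue.Queue, bus: queue.Queue) -> None:
    """Sending thread: forward every queued item to the bus, including the sentinel."""
    while True:
        item = send_queue.get()
        bus.put(item)
        if item is _END:
            return


class SinkProcessor:
    """Sink-end processing thread: listen on the bus and call process() per DataObject."""

    def __init__(self) -> None:
        self.received: list = []

    def process(self, obj: DataObject) -> None:
        self.received.extend(obj.rows)  # a real sink would write these rows to Excel

    def listen(self, bus: queue.Queue) -> None:
        while True:
            item = bus.get()
            if item is _END:
                return
            self.process(item)


def run_pipeline(rows: list, batch_size: int = 2) -> list:
    """Start the three threads in parallel and wait for them to finish."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    send_queue: queue.Queue = queue.Queue()
    bus: queue.Queue = queue.Queue()
    sink = SinkProcessor()
    threads = [
        threading.Thread(target=source_reader, args=(rows, send_queue, batch_size)),
        threading.Thread(target=sender, args=(send_queue, bus)),
        threading.Thread(target=sink.listen, args=(bus,)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink.received
```

Each queue has a single producer and a single consumer, so rows reach the sink in the order they were read.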
Preferably, the source-end configuration is obtained according to the following steps:
Parse the source-end xbean.xml file to obtain the Source-end configuration information;
From the parsed XML file, read the runMode item of the configuration to determine whether the extraction mode is real-time, scheduled or triggered;
Based on the remoteFile item in the XML, decide whether remote or local Excel file data is to be extracted. For a remote file, create the FileObject for the base directory with the remote method; for a local file, create it with the local method;
From the obtained configuration, take the information related to the extraction action, generate the extraction rule, and start extracting the Excel files under the FileObject according to that rule.
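The source-side configuration steps can be sketched as follows. The patent names only the `runMode` and `remoteFile` items of `xbean.xml`. Everything else here is an assumption made for illustration: the flat XML layout, the `baseDir` and `filePattern` elements, the mode strings `realtime`/`timing`/`trigger`, and the set of accepted remote URL schemes. Returning a `Path` or a parsed URL stands in for creating the base-directory `FileObject` with the local or remote method.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from pathlib import Path
from typing import Union
from urllib.parse import ParseResult, urlparse

RUN_MODES = {"realtime", "timing", "trigger"}             # hypothetical runMode values
REMOTE_SCHEMES = {"ftp", "sftp", "smb", "http", "https"}  # assumed remote protocols


@dataclass(frozen=True)
class SourceConfig:
    run_mode: str
    remote: bool
    base_dir: str
    file_pattern: str


def parse_source_config(xml_text: str) -> SourceConfig:
    """Parse a simplified, hypothetical source-end xbean.xml."""
    root = ET.fromstring(xml_text)
    run_mode = (root.findtext("runMode") or "").strip()
    if run_mode not in RUN_MODES:
        raise ValueError(f"unsupported runMode: {run_mode!r}")
    remote = (root.findtext("remoteFile") or "false").strip().lower() == "true"
    base_dir = (root.findtext("baseDir") or "").strip()
    if not base_dir:
        raise ValueError("baseDir is required")
    file_pattern = (root.findtext("filePattern") or "*.xls*").strip()
    return SourceConfig(run_mode, remote, base_dir, file_pattern)


def open_base_dir(cfg: SourceConfig) -> Union[Path, ParseResult]:
    """Stand-in for creating the base-directory FileObject (remote or local method)."""
    if cfg.remote:
        url = urlparse(cfg.base_dir)
        if url.scheme not in REMOTE_SCHEMES or not url.netloc:
            raise ValueError(f"not a supported remote location: {cfg.base_dir}")
        return url
    return Path(cfg.base_dir)
```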
Preferably, the Source end reads the Excel file content according to the following steps:
Start the reading thread. It repeatedly reads the Excel files under the base-directory FileObject that match the extraction rule and writes the file data into memory;
According to the extraction rule, write the Excel content held in memory into a Document object;
Package the generated Document object into DataObject objects according to the packet-count and packet-byte-size settings in the extraction rule, and put them into the send queue of the sending thread.
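The packaging step can be illustrated with a small Python function. The patent does not define the two limits precisely. Here they are assumed to be a maximum number of rows per `DataObject` (`max_rows`) and a maximum serialized size in bytes (`max_bytes`). JSON is used as a hypothetical wire format for measuring row size.

```python
import json
from dataclasses import dataclass


@dataclass
class DataObject:
    seq: int          # packet sequence number
    rows: list        # rows carried by this packet
    size_bytes: int   # serialized size of the rows, as measured below


def package_rows(rows: list, max_rows: int, max_bytes: int) -> list:
    """Split rows into DataObjects bounded by a row count and a byte size."""
    if max_rows < 1 or max_bytes < 1:
        raise ValueError("max_rows and max_bytes must be positive")
    packets: list = []
    current: list = []
    current_size = 0
    for row in rows:
        row_size = len(json.dumps(row).encode("utf-8"))
        # Close the current packet if adding this row would break either limit.
        if current and (len(current) >= max_rows or current_size + row_size > max_bytes):
            packets.append(DataObject(len(packets), current, current_size))
            current, current_size = [], 0
        # A row larger than max_bytes ends up alone in its own packet instead of being dropped.
        current.append(row)
        current_size += row_size
    if current:
        packets.append(DataObject(len(packets), current, current_size))
    return packets
```

Keeping an oversized row, rather than rejecting it, is a design choice of this sketch. Failing fast on such rows would be equally defensible.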
Preferably, the sink-end configuration information is obtained according to the following steps:
Parse the Sink-end configuration file xbean.xml to obtain the target Excel file path, the Excel sheet and other information used by the sink end;
Based on the Excel file path obtained from xbean.xml, determine whether it is a local path or a remote path;
If the path is remote, check whether the remote Excel file exists, and create it if it does not;
If the path is local, check whether the local Excel file exists, and create it if it does not;
Automatically load the existing or newly created Excel file into memory.
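The sink-side path handling can be sketched for the local case with `pathlib`. Remote access is outside the standard library, so the sketch delegates remote paths to a caller-supplied `remote_handler` (hypothetical). `create_empty` stands in for whatever Excel library actually writes a new, empty workbook; by default it only creates an empty file. The set of URL schemes treated as remote is also an assumption.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple
from urllib.parse import urlparse

REMOTE_SCHEMES = {"ftp", "sftp", "smb", "http", "https"}  # assumed remote protocols


def is_remote(path: str) -> bool:
    """Treat URL-style paths with a known scheme as remote; everything else is local."""
    return urlparse(path).scheme in REMOTE_SCHEMES


def resolve_sink_target(
    path: str,
    create_empty: Callable[[Path], None] = lambda p: p.touch(),
    remote_handler: Optional[Callable[[str], Tuple[object, bool]]] = None,
) -> Tuple[object, bool]:
    """Return (target, created): create the target file if it is missing, else reuse it."""
    if is_remote(path):
        if remote_handler is None:
            raise NotImplementedError(f"no remote client configured for {path}")
        return remote_handler(path)
    target = Path(path)
    if target.exists():
        return target, False          # existing workbook: reuse it as-is
    target.parent.mkdir(parents=True, exist_ok=True)
    create_empty(target)              # hypothetical: write an empty workbook here
    return target, True
```

Whichever branch runs, the caller then loads the returned file into memory.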
Preferably, the Sink end receives and processes data according to the following steps:
The sink end starts its processing thread, which listens on the InforSib bus. When a DataObject arrives at the sink end over the InforSib bus, the processing thread calls the process method to receive the DataObject and prepare it for processing;
After receiving a DataObject, the sink-end processing thread parses the Excel structure, the Excel data and the Excel data-format information from it;
Using the Excel file and Excel sheet obtained from xbean.xml, the sink-end processing thread saves the Excel data parsed from the DataObject into the Excel file in append mode.
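Append-mode saving can be sketched with an in-memory sheet model. The `DataObject` fields (`headers`, `rows`, `formats`) and the `Sheet` class are hypothetical simplifications of the Excel structure, data and format information described above. The sketch matches incoming columns to the target sheet by header name and pads any column the packet does not carry with `None`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Hypothetical payload: column structure, row data and per-column format hints."""
    headers: list[str]
    rows: list[list[object]]
    formats: dict[str, str] = field(default_factory=dict)  # carried along, not applied here


@dataclass
class Sheet:
    """In-memory stand-in for one sheet of the target workbook."""
    headers: list[str] = field(default_factory=list)
    rows: list[list[object]] = field(default_factory=list)


def append_to_sheet(workbook: dict[str, Sheet], sheet_name: str, obj: DataObject) -> int:
    """Append the packet's rows after the existing rows; return how many were added."""
    sheet = workbook.setdefault(sheet_name, Sheet())
    if not sheet.headers:
        sheet.headers = list(obj.headers)  # an empty target sheet adopts the incoming structure
    unknown = set(obj.headers) - set(sheet.headers)
    if unknown:
        raise ValueError(f"columns not in target sheet: {sorted(unknown)}")
    position = {name: i for i, name in enumerate(obj.headers)}
    for row in obj.rows:
        # Reorder by header name; columns the packet lacks are padded with None.
        sheet.rows.append([row[position[h]] if h in position else None for h in sheet.headers])
    return len(obj.rows)
```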
The ETL implementation method for incremental data of Excel files provided by the invention has the following advantages:
1. No programming is required, so existing programmers can spend their time on more valuable projects. Companies can integrate data from legacy systems across all supported platforms without changing their existing environment, and the investment adapts easily to future changes in the computing environment.
2. Wide applicability: it covers essentially all ETL requirements for Excel file data and supports Excel to Excel, Excel to database and database to Excel.
3. High flexibility: it can perform general ETL on whole Excel files as well as ETL on special Excel files, and it can run in several modes, including real-time, scheduled and triggered.
4. Simple operation: a large Excel file needs little configuration, and the desired data ETL for special-format Excel files can also be set up easily through a convenient, hierarchical configuration interface.
Description of the drawings
Fig. 1 is a flow chart of an embodiment of the invention.
Embodiment
The invention is further explained below by way of a non-limiting embodiment.
As shown in Fig. 1, the ETL implementation method for incremental data of Excel files starts at step 101, where the XML file is parsed to obtain the Source-end configuration information.
Then, in step 102, the runMode item of the parsed configuration determines whether the extraction mode is real-time, scheduled or triggered.
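How the three `runMode` values might drive the extraction job can be sketched with `threading.Event`. The patent only states that the mode is real-time, scheduled or triggered. The mode strings, the 0.1-second polling interval used for real-time mode and the `stop`/`trigger` events are assumptions made for illustration.

```python
import threading
from typing import Callable, Optional

REALTIME_POLL_S = 0.1  # assumed polling interval for "real-time" extraction


def run_extraction(run_mode: str, job: Callable[[], None], stop: threading.Event,
                   interval_s: float = 60.0,
                   trigger: Optional[threading.Event] = None) -> None:
    """Run `job` according to a hypothetical runMode value until `stop` is set."""
    if run_mode == "trigger":
        if trigger is None:
            raise ValueError("trigger mode needs a trigger event")
        while not stop.is_set():
            # Wake up periodically so that setting `stop` is noticed promptly
            if trigger.wait(timeout=REALTIME_POLL_S):
                trigger.clear()
                job()
        return
    if run_mode == "realtime":
        wait_s = REALTIME_POLL_S
    elif run_mode == "timing":
        wait_s = interval_s
    else:
        raise ValueError(f"unknown runMode: {run_mode!r}")
    while not stop.is_set():
        job()
        stop.wait(wait_s)  # returns early if `stop` is set during the wait
```

In practice the function would run on its own thread, with another component setting `trigger` when, for example, a new file arrives.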
In step 103, the remoteFile item in the XML determines whether remote or local Excel file data is to be extracted. For a remote file, the method proceeds to step 1041 and creates the FileObject for the base directory with the remote method; for a local file, it proceeds to step 1042 and creates it with the local method.
Then, in step 105, the information related to the extraction action is taken from the obtained configuration, an extraction rule is generated, and extraction of the Excel files under the FileObject begins according to that rule.
In step 106, the reading thread is started. It repeatedly reads the Excel files under the base-directory FileObject that match the extraction rule and writes the file data into memory.
In step 107, the Excel content in memory is written into a Document object according to the extraction rule.
In step 108, the generated Document object is packaged into DataObject objects according to the packet-count and packet-byte-size settings in the extraction rule, and these are put into the send queue of the sending thread.
In step 109, the sending thread is started. It repeatedly reads the DataObject objects from the send queue and sends them to the InforSib bus container.
In step 1010, the sink end starts its processing thread, which listens on the InforSib bus. When a DataObject (DO) arrives at the sink end over the bus, the processing thread calls the process method to receive the DO and prepare it for processing.
In step 1011, the Sink-end configuration file xbean.xml is parsed to obtain the target Excel file path, the Excel sheet and other information used by the sink end.
In step 1012, the Excel file path obtained from xbean.xml is checked to determine whether it is a local path or a remote path.
If the path is remote, step 1013 checks whether the remote Excel file exists; if it does not, the file is created in step 1014. The method then proceeds to step 1017.
If the path is local, step 1015 checks whether the local Excel file exists; if it does not, the file is created in step 1016. The method then proceeds to step 1017.
In step 1017, the existing or newly created Excel file is loaded into memory.
Then, in step 1018, after receiving a DO, the sink-end processing thread parses the Excel structure, the Excel data, the Excel data format and other information from it.
Finally, in step 1019, using the Excel file and Excel sheet obtained from xbean.xml, the sink-end processing thread saves the Excel data parsed from the DO into the Excel file in append mode.

Claims (7)

1. An ETL implementation method for incremental data of Excel files, characterized in that: the configuration information of the source end and the sink end is obtained and parsed, and multiple threads are started in parallel to respectively read the content of the source-end Excel files, send the encapsulated DataObject objects, and receive and process the DataObject objects at the sink end, finally writing the data into an Excel file.
2. The ETL implementation method for incremental data of Excel files according to claim 1, characterized in that the step of obtaining and parsing the configuration information of the source end and the sink end is: obtaining the xbean.xml file of the source end and parsing the corresponding configuration information; and obtaining the xbean.xml file of the sink end and parsing the corresponding configuration information.
3. The ETL implementation method for incremental data of Excel files according to claim 1, characterized in that the threads that read the source-end Excel file content, send the encapsulated DataObject objects, and receive and process the DataObject objects at the sink end before finally writing the Excel file are started in parallel according to the following steps:
Start the source-end reading thread. It repeatedly reads the Excel files under the base-directory FileObject that match the extraction rule, packages their content into several DataObject objects and places them in the send queue;
Start the sending thread. It repeatedly reads the DataObject objects from the send queue and sends them to the InforSib bus container;
Start the sink-end processing thread, which listens on the InforSib bus. When a DataObject arrives at the sink end over the InforSib bus, the processing thread calls the process method to receive the DataObject and prepare it for processing.
4. The ETL implementation method for incremental data of Excel files according to claim 2, characterized in that the source-end configuration is obtained according to the following steps:
Parse the source-end xbean.xml file to obtain the Source-end configuration information;
From the parsed XML file, read the runMode item of the configuration to determine whether the extraction mode is real-time, scheduled or triggered;
Based on the remoteFile item in the XML, decide whether remote or local Excel file data is to be extracted. For a remote file, create the FileObject for the base directory with the remote method; for a local file, create it with the local method;
From the obtained configuration, take the information related to the extraction action, generate the extraction rule, and start extracting the Excel files under the FileObject according to that rule.
5. The ETL implementation method for incremental data of Excel files according to claim 3, characterized in that the Source end reads the Excel file content according to the following steps:
Start the reading thread. It repeatedly reads the Excel files under the base-directory FileObject that match the extraction rule and writes the file data into memory;
According to the extraction rule, write the Excel content held in memory into a Document object;
Package the generated Document object into DataObject objects according to the packet-count and packet-byte-size settings in the extraction rule, and put them into the send queue of the sending thread.
6. The ETL implementation method for incremental data of Excel files according to claim 2, characterized in that the sink-end configuration information is obtained according to the following steps:
Parse the Sink-end configuration file xbean.xml to obtain the target Excel file path, the Excel sheet and other information used by the sink end;
Based on the Excel file path obtained from xbean.xml, determine whether it is a local path or a remote path;
If the path is remote, check whether the remote Excel file exists, and create it if it does not;
If the path is local, check whether the local Excel file exists, and create it if it does not;
Automatically load the existing or newly created Excel file into memory.
7. The ETL implementation method for incremental data of Excel files according to claim 3, characterized in that the Sink end receives and processes data according to the following steps:
The sink end starts its processing thread, which listens on the InforSib bus. When a DataObject arrives at the sink end over the InforSib bus, the processing thread calls the process method to receive the DataObject and prepare it for processing;
After receiving a DataObject, the sink-end processing thread parses the Excel structure, the Excel data and the Excel data-format information from it;
Using the Excel file and Excel sheet obtained from xbean.xml, the sink-end processing thread saves the Excel data parsed from the DataObject into the Excel file in append mode.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200910229625 CN101697126B (en) 2009-10-28 2009-10-28 ETL realization method for incremental data of Excel file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200910229625 CN101697126B (en) 2009-10-28 2009-10-28 ETL realization method for incremental data of Excel file

Publications (2)

Publication Number Publication Date
CN101697126A (en) 2010-04-21
CN101697126B (en) 2013-03-27

Family

ID=42142231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200910229625 Active CN101697126B (en) 2009-10-28 2009-10-28 ETL realization method for incremental data of Excel file

Country Status (1)

Country Link
CN (1) CN101697126B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411569A (en) * 2010-09-20 2012-04-11 上海众融信息技术有限公司 Database conversion and cleaning information processing method
CN102508912A (en) * 2011-11-09 2012-06-20 深圳市同洲电子股份有限公司 Method and system for data extracting, converting and loading
CN103605747A (en) * 2013-11-20 2014-02-26 北京国双科技有限公司 Method and device for processing file form
CN105701094A (en) * 2014-11-24 2016-06-22 北京航管科技有限公司 ETL data acquisition method and device
CN106021215A (en) * 2016-05-18 2016-10-12 广东源恒软件科技有限公司 Automatic extraction method and system of finance and tax data
CN111241171A (en) * 2019-10-28 2020-06-05 杭州美创科技有限公司 Full-amount data extraction method for database
CN111708621A (en) * 2020-05-22 2020-09-25 伟恩测试技术(武汉)有限公司 Display method of Pattern file based on multithreading parallel processing
CN112364607A (en) * 2020-10-08 2021-02-12 北京麟卓信息科技有限公司 Method and device for editing Linux file by Android application

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411569A (en) * 2010-09-20 2012-04-11 上海众融信息技术有限公司 Database conversion and cleaning information processing method
CN102508912A (en) * 2011-11-09 2012-06-20 深圳市同洲电子股份有限公司 Method and system for data extracting, converting and loading
CN103605747A (en) * 2013-11-20 2014-02-26 北京国双科技有限公司 Method and device for processing file form
CN105701094A (en) * 2014-11-24 2016-06-22 北京航管科技有限公司 ETL data acquisition method and device
CN105701094B (en) * 2014-11-24 2019-03-19 北京航管科技有限公司 A kind of ETL collecting method and device
CN106021215A (en) * 2016-05-18 2016-10-12 广东源恒软件科技有限公司 Automatic extraction method and system of finance and tax data
CN111241171A (en) * 2019-10-28 2020-06-05 杭州美创科技有限公司 Full-amount data extraction method for database
CN111708621A (en) * 2020-05-22 2020-09-25 伟恩测试技术(武汉)有限公司 Display method of Pattern file based on multithreading parallel processing
CN111708621B (en) * 2020-05-22 2024-03-29 伟恩测试技术(武汉)有限公司 Display method of Pattern file based on multithread parallel processing
CN112364607A (en) * 2020-10-08 2021-02-12 北京麟卓信息科技有限公司 Method and device for editing Linux file by Android application

Also Published As

Publication number Publication date
CN101697126B (en) 2013-03-27

Similar Documents

Publication Publication Date Title
CN101697126B (en) ETL realization method for incremental data of Excel file
CN103970903B (en) Large industrial system feedback data real-time processing method and system based on Web
EP1777632A3 (en) Method and server for extracting content based on RSS
EP2093681A3 (en) Method and system for implementing an enhanced database
TWI348640B (en) Method, memory device, and memory card that supports file system interoperability
RU2013149859A (en) PRINTING OF A PANOPTICALLY VISUALIZED DOCUMENT
CN105138312A (en) Table generation method and apparatus
RU2003116280A (en) SYSTEM AND METHOD OF DYNAMIC MASTER INTERFACE
CN101441629A (en) Automatic acquiring method of non-structured web page information
CN105354236B (en) Account checking information generation method and system
CN111078702A (en) SQL sentence classification management and unified query method and device
CN108280056A (en) A kind of Excel file analytic method
CN113163009A (en) Data transmission method, device, electronic equipment and storage medium
CN105335516A (en) Construction method of universal acquisition system
CN109670129A (en) A kind of method and device for switching to html web page to be adapted to MIP format
WO2004070491A3 (en) Method and system for organizing and retrieving energy information
RU2012106121A (en) METHOD AND DEVICE FOR PROVIDING CONTENT THROUGH NETWORK, METHOD AND DEVICE FOR RECEIVING CONTENT THROUGH NETWORK, METHOD AND DEVICE FOR BACKUP OF DATA THROUGH NETWORK, DEVICE FOR PROVIDING RESERVED RESERVE RESERVED
CN101833583A (en) Method, device and system for generating report form based on database
CN102193787B (en) Methods for serialization and de-serialization, device and system
CN109901802A (en) A kind of information paperless recording method, apparatus, equipment and system
CN103530353B (en) Self-identification method of GPS user data format
CN102446206B (en) A kind of cross-platform switch and method of three-dimensional data
CN106156191B (en) Academic probation method based on ePub file and the academic probation system based on ePub file
CN109508211A (en) A kind of multilingual configuration method, device, system and electronic equipment
CN106775643B (en) Application file packaging system and method with channel data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant