CN101697126A - ETL realization method for incremental data of Excel file - Google Patents
- Publication number: CN101697126A
- Authority: CN (China)
- Prior art keywords: excel, file, dataobject, excel file, sink
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses an ETL implementation method for incremental data of Excel files. The method is based on the InfoSib-ExcelSource and InfoSib-ExcelSink components, which are built on an open-source Java API. It supports both incremental and full extraction of file content, as well as extraction of a specified part of a file; separate handling of ordinary files and files in special formats; filtering by file, by sheet, and by data column; Excel-to-Excel, Excel-to-database, and database-to-Excel transfers; extraction of remote Excel data; repeated runs from a single configuration; and three extraction modes: real-time, scheduled, and triggered.
Description
Technical field
The present invention relates to an ETL implementation method for incremental data of Excel files.
Background art
At present, Microsoft Excel is arguably the most convenient, fast, and easy-to-use software for data processing. It is widely used to store data, perform simple operations on data, and present data, so any ETL process should support Excel. However, precisely because Excel is so widely used and its file layouts vary so much, most existing methods that support ETL of Excel data handle only simple Excel files in a fixed, pre-defined format. These methods lack flexibility and general applicability.
Summary of the invention
The purpose of the present invention is to address the above deficiencies by providing an ETL implementation method for incremental data of Excel files. The method is based on the InfoSib-ExcelSource and InfoSib-ExcelSink components built on an open-source Java API. It supports incremental and full extraction of file content, and also extraction of a specified part of a file; separate extraction of ordinary files and files in special formats; file filtering, sheet filtering, and data-column filtering; Excel to Excel, Excel to database, and database to Excel; remote Excel data extraction; repeated runs from a single configuration; and three extraction modes: real-time, scheduled, and triggered.
An ETL implementation method for incremental data of Excel files: obtain and parse the configuration information of the source end and the sink end, then start several threads in parallel that respectively read the Excel file content at the source end, send the encapsulated DataObject objects, and, at the sink end, receive and process the DataObject objects and finally write them to the Excel file.
The step of obtaining and parsing the configuration information of the source end and the sink end is as follows: obtain the xbean.xml file of the source end and parse the corresponding configuration information; obtain the xbean.xml file of the sink end and parse the corresponding configuration information.
Preferably, the threads are started in parallel and perform the reading, the sending, and the sink-side receiving, processing, and writing as follows:
Start the source-end read thread, which loops over the Excel files under the base-directory FileObject object that match the extraction rules, packs their content into a number of DataObject objects, and places them in the send queue;
Start the sending thread, which loops over the DataObject objects in the send queue and sends them to the InforSib bus container;
Start the sink-end processing thread, which listens on the InforSib bus; when a DataObject arrives at the sink end from the InforSib bus, the processing thread calls the process method to receive the DataObject and prepare it for processing.
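The three cooperating threads described above can be sketched as follows. This is only an illustrative Python model, not the InforSib implementation: the bus is stood in for by a second in-memory queue, a DataObject is modeled as a plain dictionary, and the names `reader`, `sender`, `sink`, `run_pipeline`, and the `STOP` sentinel are all assumptions introduced for the example.

```python
import queue
import threading

# Sentinel marking the end of the stream (an assumption of this sketch;
# the patent does not describe how the threads are shut down).
STOP = object()


def reader(files, send_queue):
    """Source read thread: wrap each file's rows in a DataObject-like dict."""
    for name, rows in files.items():
        send_queue.put({"file": name, "rows": rows})
    send_queue.put(STOP)


def sender(send_queue, bus):
    """Sending thread: move DataObjects from the send queue onto the bus."""
    while True:
        item = send_queue.get()
        bus.put(item)  # the sentinel is forwarded so the sink can stop too
        if item is STOP:
            break


def sink(bus, target):
    """Sink processing thread: receive DataObjects and append their rows."""
    while True:
        item = bus.get()
        if item is STOP:
            break
        target.extend(item["rows"])


def run_pipeline(files):
    """Start the three threads in parallel and return the rows written by the sink."""
    send_queue, bus, target = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=reader, args=(files, send_queue)),
        threading.Thread(target=sender, args=(send_queue, bus)),
        threading.Thread(target=sink, args=(bus, target)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target
```

Because each stage is a single thread and both queues are first-in, first-out, rows reach the sink in the order the reader produced them.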
Preferably, the source-end configuration is obtained according to the following steps:
Parse the source-end xbean.xml file to obtain the source-end configuration information;
From the parsed XML, read the runMode item of the configuration information to determine whether the extraction mode is real-time, scheduled, or triggered;
According to the remoteFile item in the XML, determine whether remote or local Excel file data is to be extracted; for remote files, use the remote method to create the FileObject object of the base directory, and for local files, use the local method to create the FileObject object of the base directory;
From the configuration obtained, take the information relevant to the extraction action, generate the extraction rules, and begin extracting the Excel files under the FileObject object according to those rules.
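A minimal sketch of the configuration-parsing step is given below. Only the item names `runMode` and `remoteFile` come from the text; the real layout of xbean.xml is not given, so the `excelSource` element, the `baseDir` and `filePattern` attributes, the default values, and the three mode strings are hypothetical placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical mode names; the text only says real-time, scheduled, or triggered.
VALID_RUN_MODES = {"realtime", "timed", "triggered"}


@dataclass
class SourceConfig:
    run_mode: str
    remote_file: bool
    base_dir: str
    file_pattern: str


def parse_source_config(xml_text):
    """Parse a (hypothetical) source-end configuration into a SourceConfig."""
    root = ET.fromstring(xml_text)
    # Accept the element either as the root or nested anywhere below it.
    node = root if root.tag == "excelSource" else root.find(".//excelSource")
    if node is None:
        raise ValueError("no excelSource element found")
    run_mode = node.get("runMode", "realtime")
    if run_mode not in VALID_RUN_MODES:
        raise ValueError(f"unknown runMode: {run_mode!r}")
    return SourceConfig(
        run_mode=run_mode,
        remote_file=node.get("remoteFile", "false").lower() == "true",
        base_dir=node.get("baseDir", "."),
        file_pattern=node.get("filePattern", "*.xls"),
    )
```

The resulting `remote_file` flag is what would select between the remote and local ways of creating the base-directory object, and the remaining fields stand in for the extraction rules.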
Preferably, the source end reads the Excel file content according to the following steps:
Start the read thread, which loops over the Excel files under the base-directory FileObject object that match the extraction rules and loads each file's data into memory;
According to the extraction rules, write the Excel content held in memory into a Document object;
Pack the generated Document object into DataObject objects according to the packet count and packet byte size set in the extraction rules, and put them into the send queue of the sending thread.
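One plausible reading of the packing step is that rows are grouped into packets bounded both by a maximum row count and by a maximum serialized size. The sketch below illustrates that reading only; the function name `package_rows`, its parameters, and the use of JSON to measure size are assumptions, since the text does not specify how the limits are applied.

```python
import json


def package_rows(rows, max_rows, max_bytes):
    """Split rows into packets bounded by row count and serialized byte size.

    A single row larger than max_bytes is still emitted, alone in its own packet.
    """
    packets, current, current_bytes = [], [], 0
    for row in rows:
        row_bytes = len(json.dumps(row).encode("utf-8"))
        full = len(current) >= max_rows or current_bytes + row_bytes > max_bytes
        if current and full:
            # Close the current packet before adding the row that would overflow it.
            packets.append(current)
            current, current_bytes = [], 0
        current.append(row)
        current_bytes += row_bytes
    if current:
        packets.append(current)
    return packets
```

Each returned packet would then be wrapped in one DataObject and placed on the send queue.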
Preferably, the sink-end configuration information is obtained according to the following steps:
Parse the sink-end configuration file xbean.xml to obtain the Excel file path, the Excel sheet, and other information used by the sink end;
According to the Excel file path obtained from xbean.xml, determine whether the path is local or remote;
If the Excel path is remote, determine whether the remote Excel file exists, and create it if it does not;
If the Excel path is local, determine whether the local Excel file exists, and create it if it does not;
Load the existing or newly created Excel file into memory automatically.
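The local/remote decision and the create-if-missing check can be sketched as below. How the original components recognize a remote path is not stated, so treating a network URL scheme as "remote" is an assumption, as are the names `is_remote` and `ensure_local_target`. Creating a real workbook needs an Excel library, so the sketch takes the creation step as a caller-supplied function.

```python
from pathlib import Path
from urllib.parse import urlparse

# URL schemes treated as remote in this sketch (an assumption, not from the patent).
REMOTE_SCHEMES = {"ftp", "sftp", "http", "https", "smb"}


def is_remote(path):
    """Return True when the path carries a network URL scheme."""
    return urlparse(path).scheme in REMOTE_SCHEMES


def ensure_local_target(path, create_workbook):
    """Create the target workbook if it is missing; return True if it was created."""
    target = Path(path)
    if target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    create_workbook(target)  # caller decides how an empty workbook is written
    return True
```

A remote target would follow the same exists-or-create logic through whatever remote file API is in use.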
Preferably, the sink end receives and processes data according to the following steps:
The sink end starts a processing thread, which listens on the InforSib bus; when a DataObject arrives at the sink end from the InforSib bus, the processing thread calls the process method to receive the DataObject and prepare it for processing;
After receiving a DataObject, the sink-end processing thread parses the Excel structure, the Excel data, and the Excel data format information from it;
Using the Excel file and Excel sheet specified in the xbean configuration, the sink-end processing thread appends the Excel data parsed from the DataObject to the Excel file.
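The append step at the sink can be illustrated with an in-memory model. Here a workbook is a dictionary of sheets and a DataObject is a dictionary with `structure` (column names) and `data` (rows); these shapes, the function name `append_data_object`, and the structure check are all assumptions made for the example rather than details taken from the patent.

```python
def append_data_object(workbook, sheet_name, data_object):
    """Append the rows carried by one DataObject to the named sheet.

    Returns the number of rows in the sheet after the append.
    """
    sheet = workbook.setdefault(sheet_name, {"header": None, "rows": []})
    header = list(data_object["structure"])
    if sheet["header"] is None:
        # First packet for this sheet defines its column structure.
        sheet["header"] = header
    elif sheet["header"] != header:
        raise ValueError("column structure does not match the target sheet")
    sheet["rows"].extend(data_object["data"])
    return len(sheet["rows"])
```

Appending rather than overwriting is what lets successive incremental packets accumulate in the same target sheet.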
The ETL implementation method for incremental data of Excel files provided by the invention has the following advantages:
1. No programming is required, so existing programmers can spend their time on more valuable projects. A company can integrate data from legacy systems and across all supported platforms without changing its existing environment, and the investment can readily adapt to future changes in the computing environment.
2. Wide applicability: it covers essentially all ETL needs for Excel file data, and supports Excel to Excel, Excel to database, and database to Excel.
3. High flexibility: it can perform ETL on ordinary Excel files as a whole and on Excel files in special formats, and it can run in real-time, scheduled, and triggered modes.
4. Simple operation: most Excel files need little configuration to obtain the desired data, and ETL for Excel files in special formats can also be set up easily in a convenient configuration interface.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of the invention.
Detailed description of the embodiment
The invention is further explained below with reference to a non-limiting embodiment.
An ETL implementation method for incremental data of Excel files is shown in Fig. 1. The method begins at step 101, in which the XML file is parsed to obtain the source-end configuration information.
Step 102 follows: from the parsed XML, the runMode item of the configuration information is read to determine whether the extraction mode is real-time, scheduled, or triggered.
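How the three modes might differ at run time can be sketched as a single decision function evaluated on each polling cycle. The mode strings, the function name `should_extract`, and the interval and trigger parameters are hypothetical; the patent names the modes but does not describe their scheduling logic.

```python
def should_extract(run_mode, now, last_run, interval_s=60.0, trigger_pending=False):
    """Decide whether an extraction should start on this polling cycle.

    now and last_run are timestamps in seconds; last_run is None before the first run.
    """
    if run_mode == "realtime":
        return True  # extract on every polling cycle
    if run_mode == "timed":
        return last_run is None or now - last_run >= interval_s
    if run_mode == "triggered":
        return trigger_pending  # extract only when an external trigger has fired
    raise ValueError(f"unknown runMode: {run_mode!r}")
```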
In step 103, the remoteFile item in the XML is used to determine whether remote or local Excel file data is to be extracted. For remote files, the method proceeds to step 1041 and uses the remote method to create the FileObject object of the base directory; for local files, it proceeds to step 1042 and uses the local method to create the FileObject object of the base directory.
Step 105 follows: from the configuration obtained, the information relevant to the extraction action is taken, the extraction rules are generated, and extraction of the Excel files under the FileObject object begins according to those rules.
In step 106, the read thread is started; it loops over the Excel files under the base-directory FileObject object that match the extraction rules and loads each file's data into memory.
In step 107, the Excel content held in memory is written into a Document object according to the extraction rules.
In step 108, the generated Document object is packed into DataObject objects according to the packet count and packet byte size set in the extraction rules, and these are put into the send queue of the sending thread.
In step 109, the sending thread is started; it loops over the DataObject objects in the send queue and sends them to the InforSib bus container.
In step 1010, the sink end starts a processing thread, which listens on the InforSib (sib) bus; when a DataObject (DO) arrives at the sink end from the bus, the processing thread calls the process method to receive the DO and prepare it for processing.
In step 1011, the sink-end configuration file xbean.xml is parsed to obtain the Excel file path, the Excel sheet, and other information used by the sink end.
In step 1012, the Excel file path obtained from xbean.xml is examined to determine whether it is local or remote.
If the path is remote, step 1013 determines whether the remote Excel file exists; if it does not, the file is created in step 1014, and the method proceeds to step 1017.
If the path is local, step 1015 determines whether the local Excel file exists; if it does not, the file is created in step 1016, and the method proceeds to step 1017.
In step 1017, the existing or newly created Excel file is loaded into memory.
In step 1018, after receiving a DO, the sink-end processing thread parses the Excel structure, the Excel data, the Excel data format, and other information from it.
Finally, in step 1019, using the Excel file and Excel sheet specified in the xbean configuration, the sink-end processing thread appends the Excel data parsed from the DO to the Excel file.
Claims (7)
1. An ETL implementation method for incremental data of Excel files, characterized in that: the configuration information of the source end and the sink end is obtained and parsed, and several threads are started in parallel that respectively read the Excel file content at the source end, send the encapsulated DataObject objects, and, at the sink end, receive and process the DataObject objects and finally write them to the Excel file.
2. The ETL implementation method for incremental data of Excel files according to claim 1, characterized in that the step of obtaining and parsing the configuration information of the source end and the sink end is: obtain the xbean.xml file of the source end and parse the corresponding configuration information; obtain the xbean.xml file of the sink end and parse the corresponding configuration information.
3. The ETL implementation method for incremental data of Excel files according to claim 1, characterized in that the threads are started in parallel and perform the reading, the sending, and the sink-side receiving, processing, and writing according to the following steps:
Start the source-end read thread, which loops over the Excel files under the base-directory FileObject object that match the extraction rules, packs their content into a number of DataObject objects, and places them in the send queue;
Start the sending thread, which loops over the DataObject objects in the send queue and sends them to the InforSib bus container;
Start the sink-end processing thread, which listens on the InforSib bus; when a DataObject arrives at the sink end from the InforSib bus, the processing thread calls the process method to receive the DataObject and prepare it for processing.
4. The ETL implementation method for incremental data of Excel files according to claim 2, characterized in that the source-end configuration is obtained according to the following steps:
Parse the source-end xbean.xml file to obtain the source-end configuration information;
From the parsed XML, read the runMode item of the configuration information to determine whether the extraction mode is real-time, scheduled, or triggered;
According to the remoteFile item in the XML, determine whether remote or local Excel file data is to be extracted; for remote files, use the remote method to create the FileObject object of the base directory, and for local files, use the local method to create the FileObject object of the base directory;
From the configuration obtained, take the information relevant to the extraction action, generate the extraction rules, and begin extracting the Excel files under the FileObject object according to those rules.
5. The ETL implementation method for incremental data of Excel files according to claim 3, characterized in that the source end reads the Excel file content according to the following steps:
Start the read thread, which loops over the Excel files under the base-directory FileObject object that match the extraction rules and loads each file's data into memory;
According to the extraction rules, write the Excel content held in memory into a Document object;
Pack the generated Document object into DataObject objects according to the packet count and packet byte size set in the extraction rules, and put them into the send queue of the sending thread.
6. The ETL implementation method for incremental data of Excel files according to claim 2, characterized in that the sink-end configuration information is obtained according to the following steps:
Parse the sink-end configuration file xbean.xml to obtain the Excel file path, the Excel sheet, and other information used by the sink end;
According to the Excel file path obtained from xbean.xml, determine whether the path is local or remote;
If the Excel path is remote, determine whether the remote Excel file exists, and create it if it does not;
If the Excel path is local, determine whether the local Excel file exists, and create it if it does not;
Load the existing or newly created Excel file into memory automatically.
7. The ETL implementation method for incremental data of Excel files according to claim 3, characterized in that the sink end receives and processes data according to the following steps:
The sink end starts a processing thread, which listens on the InforSib bus; when a DataObject arrives at the sink end from the InforSib bus, the processing thread calls the process method to receive the DataObject and prepare it for processing;
After receiving a DataObject, the sink-end processing thread parses the Excel structure, the Excel data, and the Excel data format information from it;
Using the Excel file and Excel sheet specified in the xbean configuration, the sink-end processing thread appends the Excel data parsed from the DataObject to the Excel file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200910229625 CN101697126B (en) | 2009-10-28 | 2009-10-28 | ETL realization method for incremental data of Excel file |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101697126A | 2010-04-21 |
CN101697126B | 2013-03-27 |
Family
ID=42142231
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 200910229625 Active CN101697126B (en) | 2009-10-28 | 2009-10-28 | ETL realization method for incremental data of Excel file |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101697126B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102411569A (en) * | 2010-09-20 | 2012-04-11 | 上海众融信息技术有限公司 | Database conversion and cleaning information processing method |
CN102508912A (en) * | 2011-11-09 | 2012-06-20 | 深圳市同洲电子股份有限公司 | Method and system for data extracting, converting and loading |
CN103605747A (en) * | 2013-11-20 | 2014-02-26 | 北京国双科技有限公司 | Method and device for processing file form |
CN105701094A (en) * | 2014-11-24 | 2016-06-22 | 北京航管科技有限公司 | ETL data acquisition method and device |
CN105701094B (en) * | 2014-11-24 | 2019-03-19 | 北京航管科技有限公司 | A kind of ETL collecting method and device |
CN106021215A (en) * | 2016-05-18 | 2016-10-12 | 广东源恒软件科技有限公司 | Automatic extraction method and system of finance and tax data |
CN111241171A (en) * | 2019-10-28 | 2020-06-05 | 杭州美创科技有限公司 | Full-amount data extraction method for database |
CN111708621A (en) * | 2020-05-22 | 2020-09-25 | 伟恩测试技术(武汉)有限公司 | Display method of Pattern file based on multithreading parallel processing |
CN111708621B (en) * | 2020-05-22 | 2024-03-29 | 伟恩测试技术(武汉)有限公司 | Display method of Pattern file based on multithread parallel processing |
CN112364607A (en) * | 2020-10-08 | 2021-02-12 | 北京麟卓信息科技有限公司 | Method and device for editing Linux file by Android application |
Also Published As
Publication number | Publication date |
---|---|
CN101697126B (en) | 2013-03-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |