Summary of the invention
The invention provides a data processing method and system that can process large volumes of data from multiple data sources in both non-visual and visual systems, improving processing efficiency.
To this end, the embodiments of the present invention provide the following technical solutions:
A data processing method, comprising:
encapsulating an ETL rule in the form of a dynamic library file, and registering information about the dynamic library file in a task table of a back-end database;
scanning the task table and performing ETL processing of data for each task in the task table according to its corresponding ETL rule, wherein each task in the task table corresponds to one ETL rule.
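The key point of the two steps above is that the ETL rule is not compiled into the scheduler but resolved at run time from a library file recorded in the task table. The following is a minimal sketch of that late binding, assuming a Python module file stands in for the native dynamic library (.so/.dll); with a real shared object, `ctypes.CDLL(path)` would play the loader's role. The function name `load_rule` and the entry-point name `run` are hypothetical conventions, not part of the described scheme.

```python
import importlib.util
import tempfile
from pathlib import Path
from typing import Callable

def load_rule(library_path: str, entry_point: str = "run") -> Callable:
    """Load an ETL rule from a library file at run time.

    A Python module stands in for the native dynamic library; the
    entry-point name "run" is a hypothetical convention.
    """
    spec = importlib.util.spec_from_file_location("etl_rule", library_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load rule library {library_path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # execute the module to define its functions
    return getattr(module, entry_point)

# Usage: "publish" a rule as a library file, then resolve it by path,
# as a scheduler would after reading the path from the task table.
with tempfile.TemporaryDirectory() as tmp:
    rule_file = Path(tmp) / "upper_rule.py"
    rule_file.write_text("def run(rows):\n    return [r.upper() for r in rows]\n")
    rule = load_rule(str(rule_file))
    result = rule(["a", "b"])

print(result)  # ['A', 'B']
```

Because only the file path and entry-point name are shared between the scheduler and the rule, a new rule can be published without rebuilding the scheduler.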
Preferably, the ETL rule and the information about the dynamic library file are set and published by a user.
Optionally, the ETL rule comprises any one or more of the following: a data fetching rule, a data splitting rule, a data conversion rule, a data merging rule, a data sorting rule, a data aggregation rule, a network data collection rule, a data loading rule, and a data configuration rule.
Optionally, the information about the dynamic library file comprises any one or more of the following: the start time of each task, the start period, a redo flag, a task type identifier, a task description, a task identifier, whether the task is enabled, and whether the task has subtasks.
Preferably, performing ETL processing of data for each task in the task table according to its corresponding ETL rule comprises:
extracting, for each task in the task table, source data from a data source according to the corresponding ETL rule;
converting the obtained source data into the target data required by the system; and
storing the target data in a target database.
Preferably, the tasks in the task table are scheduled by a back-end multi-process concurrency mechanism.
A data processing system, comprising:
a rule encapsulation unit, configured to encapsulate an ETL rule in the form of a dynamic library file and register information about the dynamic library file in a task table of a back-end database; and
a scheduling unit, configured to scan the task table and perform ETL processing of data for each task in the task table according to its corresponding ETL rule, wherein each task in the task table corresponds to one ETL rule.
Preferably, the system further comprises:
a rule setting unit, configured to obtain the ETL rule and the information about the dynamic library file that are set and published by a user.
Preferably, the scheduling unit comprises:
an extraction subunit, configured to extract, for each task in the task table, source data from a source database according to the corresponding ETL rule;
a conversion subunit, configured to convert the source data extracted by the extraction subunit into the target data required by the system; and
a storage subunit, configured to store the target data converted by the conversion subunit in a target database.
Preferably, the scheduling unit is specifically configured to schedule the tasks in the task table by a multi-process concurrency mechanism.
With the data processing method and system provided by the invention, an ETL rule is encapsulated in the form of a dynamic library file, and information about the dynamic library file is registered in a task table of a back-end database; the task table is scanned, and ETL processing of data is performed for each task in the task table according to its corresponding ETL rule, each task corresponding to one ETL rule. Users need no programming skills, large volumes of data from multiple data sources can be processed with high efficiency, and the scheme is independent of the system environment, working in both non-visual and visual systems.
Embodiment
To enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings and implementations.
With the data processing method and system provided by the invention, an ETL rule is encapsulated in the form of a dynamic library file, and information about the dynamic library file is registered in a task table of a back-end database; the task table is scanned, and ETL processing of data is performed for each task in the task table according to its corresponding ETL rule, each task corresponding to one ETL rule. Thus users need no programming skills, large volumes of data from multiple data sources can be processed with high efficiency, and the scheme is independent of the system environment, working in both non-visual and visual systems.
Fig. 1 is a flowchart of the data processing method according to an embodiment of the present invention. The method comprises the following steps:
Step 101: encapsulate an ETL rule in the form of a dynamic library file, and register information about the dynamic library file in the task table of the back-end database.
In practical applications, the ETL rule and the information about the dynamic library file may be user-defined and published to a server. The ETL rule may be any of the rules used in ETL applications, such as a data fetching rule, a data splitting rule, a data conversion rule, a data merging rule, a data sorting rule, a data aggregation rule, a network data collection rule, a data loading rule, or a data configuration rule.
In step 101, the server may encapsulate these ETL rules as dynamic library files and register information about each dynamic library file in the task table of the back-end database. This information may include any one or more of the following: start time, start period, redo flag, task type identifier, task description, task identifier, whether the task is enabled, and whether the task has subtasks. The information may be defined by the user and published to the server together with the ETL rule.
The start time specifies the point in time at which the task is triggered for scheduling; the task description improves readability by stating what the task does; the start period indicates how often the task is started; and the task identifier uniquely identifies the task.
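The registration in step 101 amounts to writing one row per dynamic library into the task table. A minimal sketch follows, assuming Python's built-in `sqlite3` stands in for the back-end database; the table name `task_table`, the column names, the helper `register_library`, and the library path are all hypothetical and simply mirror the fields listed above.

```python
import sqlite3

# Hypothetical schema for the back-end task table; the columns mirror the
# dynamic-library information listed in the text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS task_table (
    task_id      TEXT PRIMARY KEY,   -- unique task identifier
    library_path TEXT NOT NULL,      -- dynamic library holding the ETL rule
    task_type    TEXT,               -- task type identifier
    description  TEXT,               -- human-readable task description
    start_time   TEXT,               -- ISO-8601 first trigger time
    period_sec   INTEGER,            -- start period in seconds
    redo         INTEGER DEFAULT 0,  -- redo flag
    enabled      INTEGER DEFAULT 1,  -- whether the task may be scheduled
    has_subtasks INTEGER DEFAULT 0   -- whether the task has subtasks
)
"""

def register_library(conn: sqlite3.Connection, info: dict) -> None:
    """Register (or re-register) the information of one rule library."""
    conn.execute(
        "INSERT OR REPLACE INTO task_table (task_id, library_path, task_type, "
        "description, start_time, period_sec, redo, enabled, has_subtasks) "
        "VALUES (:task_id, :library_path, :task_type, :description, "
        ":start_time, :period_sec, :redo, :enabled, :has_subtasks)",
        info,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
register_library(conn, {
    "task_id": "daily_report",
    "library_path": "/opt/etl/rules/libdaily_report.so",  # hypothetical path
    "task_type": "load",
    "description": "Load daily service records into the report database",
    "start_time": "2024-01-01T02:00:00",
    "period_sec": 86400,
    "redo": 0,
    "enabled": 1,
    "has_subtasks": 0,
})
rows = conn.execute("SELECT task_id, period_sec FROM task_table").fetchall()
print(rows)  # [('daily_report', 86400)]
```

`INSERT OR REPLACE` keyed on the task identifier lets a user republish a rule without creating a duplicate task.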
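The start time and start period together determine when a task becomes due. One possible reading, sketched below under the assumption that the task fires at the start time and then once every period, and that missed triggers collapse into a single run, is to compare the most recent trigger time with the trigger the task last ran for. The function names are hypothetical, and the redo flag is not modeled.

```python
from datetime import datetime, timedelta
from typing import Optional

def latest_trigger(start_time: datetime, period: timedelta,
                   now: datetime) -> Optional[datetime]:
    """Most recent trigger time at or before `now`, or None if not yet started."""
    if now < start_time:
        return None
    # timedelta // timedelta is an int: the number of whole periods elapsed.
    return start_time + ((now - start_time) // period) * period

def is_due(enabled: bool, start_time: datetime, period: timedelta,
           last_run_trigger: Optional[datetime], now: datetime) -> bool:
    """An enabled task is due if its latest trigger has not been run yet."""
    trigger = latest_trigger(start_time, period, now)
    if not enabled or trigger is None:
        return False
    return last_run_trigger is None or trigger > last_run_trigger

start = datetime(2024, 1, 1, 2, 0)   # first run at 02:00
daily = timedelta(days=1)            # start period: once a day
print(latest_trigger(start, daily, datetime(2024, 1, 3, 1, 0)))  # 2024-01-02 02:00:00
```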
To ease development for ETL developers and to unify task management, a unified ETL processing API (Application Programming Interface) may also be provided. The ETL processing API may also be cross-platform, so that ETL developers can carry out ETL development on different system platforms, for example by using utility macro APIs such as SRC_TABLE and DES_TABLE, where SRC_TABLE is the macro API for operating on source data and DES_TABLE is the macro API for operating on target data.
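The benefit of a unified API is that a rule touches data only through the API, so the same rule runs on any platform that implements it. The sketch below illustrates this in Python under stated assumptions: `src_table` and `des_table` are hypothetical methods playing the roles of the SRC_TABLE and DES_TABLE macros, `InMemoryContext` is a toy back end holding tables in a dictionary, and `copy_active_users` is an invented example rule.

```python
from typing import Dict, Iterable, List, Protocol

Row = Dict[str, object]

class EtlContext(Protocol):
    """Hypothetical unified ETL API that each platform back end implements.

    src_table / des_table play the roles of the SRC_TABLE and DES_TABLE
    macros: the first reads source data, the second writes target data.
    """
    def src_table(self, name: str) -> Iterable[Row]: ...
    def des_table(self, name: str, rows: Iterable[Row]) -> None: ...

class InMemoryContext:
    """Toy back end holding tables in a dict (stands in for a real database)."""
    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self.tables = tables

    def src_table(self, name: str) -> Iterable[Row]:
        return iter(self.tables[name])

    def des_table(self, name: str, rows: Iterable[Row]) -> None:
        self.tables.setdefault(name, []).extend(rows)

def copy_active_users(ctx: EtlContext) -> None:
    """An example ETL rule written only against the unified API."""
    active = (r for r in ctx.src_table("users") if r["active"])
    ctx.des_table("active_users", ({"id": r["id"]} for r in active))

ctx = InMemoryContext({"users": [{"id": 1, "active": True},
                                 {"id": 2, "active": False}]})
copy_active_users(ctx)
print(ctx.tables["active_users"])  # [{'id': 1}]
```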
Step 102: scan the task table, and perform ETL processing of data for each task in the task table according to its corresponding ETL rule, wherein each task in the task table corresponds to one ETL rule.
The scanning and the rule encapsulation and registration may be performed on different platforms. For example, a scheduler scans the task table (for example, periodically or at scheduled times) and schedules the tasks in the task table according to the information about the dynamic library files; specifically, the scheduler may schedule the tasks in the task table through a back-end multi-process concurrency mechanism.
The scheduler processes each task in the task table roughly as follows:
For each task in the task table, the scheduler extracts source data from a data source (for example, a source database) according to the corresponding ETL rule, converts the obtained source data into the target data required by the system, and stores the target data in the target database.
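A single scan of the task table can be sketched as follows. This is an illustration under explicit assumptions: the scheme specifies multi-process concurrency, but the sketch uses `concurrent.futures.ThreadPoolExecutor` so that it runs anywhere; `ProcessPoolExecutor` exposes the same `submit` interface and can be swapped in, provided the rule callables are picklable (the lambdas in the usage example are not). The row fields, `scan_and_dispatch`, `run_scheduler`, and the library paths are hypothetical, and due-checking by start time is reduced to the enabled flag.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Rule = Callable[[], str]  # a resolved ETL rule; returns a status string

def scan_and_dispatch(task_table: List[Dict], resolve: Callable[[str], Rule],
                      max_workers: int = 4) -> Dict[str, str]:
    """One scan of the task table: run every enabled task concurrently."""
    due = [row for row in task_table if row["enabled"]]
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Resolve each task's rule from its library path and submit it.
        futures = {row["task_id"]: pool.submit(resolve(row["library_path"]))
                   for row in due}
        for task_id, future in futures.items():
            try:
                results[task_id] = future.result()
            except Exception as exc:  # one failing task must not abort the scan
                results[task_id] = f"failed: {exc}"
    return results

def run_scheduler(load_table: Callable[[], List[Dict]], resolve: Callable[[str], Rule],
                  interval_sec: float, scans: int) -> None:
    """Periodic scanning: re-read the task table every `interval_sec` seconds."""
    for _ in range(scans):
        scan_and_dispatch(load_table(), resolve)
        time.sleep(interval_sec)

# Usage: a dict lookup stands in for loading each rule from its library file.
rules: Dict[str, Rule] = {
    "/rules/liba.so": lambda: "a done",
    "/rules/libb.so": lambda: "b done",
}
table = [
    {"task_id": "a", "library_path": "/rules/liba.so", "enabled": 1},
    {"task_id": "b", "library_path": "/rules/libb.so", "enabled": 0},
]
results = scan_and_dispatch(table, rules.__getitem__)
print(results)  # {'a': 'a done'}
```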
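The per-task processing (extract from the source database, convert, store in the target database) can be sketched with two in-memory SQLite databases standing in for the source and target databases. The shape of `EtlRule` (a query, a row-conversion function, and a target table name), the helper `run_task`, and the table names are assumptions made for illustration only.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class EtlRule:
    """Hypothetical shape of the rule attached to one task."""
    extract_sql: str                      # query run against the source database
    transform: Callable[[Tuple], Tuple]   # converts one source row to a target row
    target_table: str                     # table in the target database

def run_task(rule: EtlRule, source: sqlite3.Connection,
             target: sqlite3.Connection) -> int:
    """Extract, convert and store the data for one task; return rows loaded."""
    rows = source.execute(rule.extract_sql).fetchall()   # extract
    converted = [rule.transform(row) for row in rows]     # convert
    if converted:
        marks = ", ".join("?" * len(converted[0]))
        # target_table comes from trusted rule configuration, not user input.
        target.executemany(
            f"INSERT INTO {rule.target_table} VALUES ({marks})", converted)  # store
        target.commit()
    return len(converted)

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE calls (user_id TEXT, seconds INTEGER)")
source.executemany("INSERT INTO calls VALUES (?, ?)", [("u1", 120), ("u2", 30)])
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE call_minutes (user_id TEXT, minutes REAL)")

rule = EtlRule("SELECT user_id, seconds FROM calls ORDER BY user_id",
               lambda row: (row[0], row[1] / 60), "call_minutes")
loaded = run_task(rule, source, target)
print(loaded)  # 2
```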
The above process may further include sorting and aggregating the converted target data and then storing the aggregated data in the target database.
For developers' convenience, a set of APIs (application programming interfaces) may also be provided. These APIs may be defined by developers, and the scheduler calls them to carry out the processing described above. For example, the following APIs may be provided:
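Sorting before aggregating lets the aggregation run in a single pass over adjacent rows with the same key. A minimal sketch, assuming rows are `(key, value)` pairs and aggregation means summing; `sort_and_aggregate` is a hypothetical helper name.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List, Tuple

def sort_and_aggregate(rows: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Sort converted rows by key, then sum the values of each key group."""
    ordered = sorted(rows, key=itemgetter(0))       # sort by grouping key
    return [(key, sum(value for _, value in group))  # aggregate each run of equal keys
            for key, group in groupby(ordered, key=itemgetter(0))]

summary = sort_and_aggregate([("east", 3), ("west", 5), ("east", 4)])
print(summary)  # [('east', 7), ('west', 5)]
```

`itertools.groupby` only groups consecutive equal keys, which is exactly why the sort has to come first.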
1. Data fetching API, for extracting source data, including a network fetching API, a database fetching API, an Excel fetching API, an Access fetching API, and so on.
2. Merge API, for merging data.
3. Data splitting API, for splitting data.
4. Conversion API, for converting data, for example converting a vertical table into a horizontal table. Macro APIs such as SRC_TABLE and DES_TABLE may be used for this processing.
5. Aggregation API, for aggregating data; for example, this type of API can aggregate by index, by row, or by column.
6. Index API, for searching large volumes of data using a row-index technique, that is, row numbers are placed in shared memory as an index.
7. Log interface, for logging the calls of each interface, to support maintenance and to show the current state of the system to the user.
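The list above names the operations but not their signatures. The sketch below shows one plausible form for three of them (merge, split, and vertical-to-horizontal conversion) over rows held as Python dictionaries; the function names and the inner-join semantics of `merge` are assumptions, not the interfaces of the described system.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]

def merge(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Merge API sketch: inner join of two row lists on `key`."""
    index = {r[key]: r for r in right}  # assumes `key` is unique in `right`
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

def split(rows: Iterable[Row], field: str) -> Dict[object, List[Row]]:
    """Data-splitting API sketch: partition rows by the value of `field`."""
    parts: Dict[object, List[Row]] = {}
    for row in rows:
        parts.setdefault(row[field], []).append(row)
    return parts

def long_to_wide(rows: Iterable[Tuple[str, str, object]]) -> Dict[str, Dict[str, object]]:
    """Conversion API sketch: vertical (id, attribute, value) rows to a horizontal table."""
    wide: Dict[str, Dict[str, object]] = {}
    for row_id, attr, value in rows:
        wide.setdefault(row_id, {})[attr] = value
    return wide

wide = long_to_wide([("u1", "calls", 3), ("u1", "sms", 5), ("u2", "calls", 1)])
print(wide)  # {'u1': {'calls': 3, 'sms': 5}, 'u2': {'calls': 1}}
```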
Of course, the user may select among the above APIs according to actual needs; the embodiments of the present invention do not limit this.
It can be seen that, with the data processing method provided by the invention, an ETL rule is encapsulated in the form of a dynamic library file, and information about the dynamic library file is registered in the task table of the back-end database; the task table is scanned, and ETL processing of data is performed for each task in the task table according to its corresponding ETL rule, each task corresponding to one ETL rule. Users therefore need no programming skills, large volumes of data from multiple data sources can be processed with high efficiency, and the method is independent of the system environment: it can be used in both non-visual and visual systems, for example on platforms such as Linux, AIX, Solaris, and Windows.
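The row-index technique of the index API can be read as recording where each row starts, so that a row can be fetched by number without rescanning the data. A minimal sketch under that interpretation, with hypothetical function names and an in-memory byte stream standing in for a large data file: the offsets are kept in a flat array of 64-bit integers, which could be copied into a `multiprocessing.shared_memory.SharedMemory` block so that several worker processes share one index; that step is omitted here.

```python
import io
from array import array
from typing import BinaryIO

def build_row_index(f: BinaryIO) -> array:
    """Record the byte offset at which every row (line) starts."""
    offsets = array("q")  # signed 64-bit integers
    f.seek(0)
    pos = 0
    for line in f:
        offsets.append(pos)
        pos += len(line)
    return offsets

def read_row(f: BinaryIO, offsets: array, row_no: int) -> bytes:
    """Jump straight to row `row_no` (0-based) without scanning earlier rows."""
    f.seek(offsets[row_no])
    return f.readline().rstrip(b"\n")

data = io.BytesIO(b"id,amount\n1,10\n2,25\n3,7\n")
idx = build_row_index(data)
row = read_row(data, idx, 2)
print(row)  # b'2,25'
```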
Correspondingly, an embodiment of the present invention further provides a data processing system; Fig. 2 is a schematic structural diagram of the system.
In this embodiment, the system comprises:
a rule encapsulation unit 201, configured to encapsulate an ETL rule in the form of a dynamic library file and register information about the dynamic library file in the task table of the back-end database; and
a scheduling unit 202, configured to scan the task table and perform ETL processing of data for each task in the task table according to its corresponding ETL rule, wherein each task in the task table corresponds to one ETL rule.
In practical applications, the ETL rule and the information about the dynamic library file may be user-defined and published to a server. The ETL rule may be any of the rules used in ETL applications, such as a data fetching rule, a data splitting rule, a data conversion rule, a data merging rule, a data sorting rule, a data aggregation rule, a network data collection rule, a data loading rule, or a data configuration rule.
Accordingly, in this embodiment the system may further comprise a rule setting unit 203, configured to obtain the ETL rule and the information about the dynamic library file that are set and published by the user.
Correspondingly, the rule encapsulation unit 201 encapsulates these ETL rules as dynamic library files and registers information about each dynamic library file in the task table of the back-end database. This information may include any one or more of the following: start time, start period, redo flag, task type identifier, task description, task identifier, whether the task is enabled, and whether the task has subtasks. The information may be defined by the user and published to the server together with the ETL rule.
In this embodiment, the scheduling unit 202 may be implemented in various ways. In one specific structure, the scheduling unit 202 comprises an extraction subunit, a conversion subunit, and a storage subunit, wherein:
the extraction subunit is configured to extract, for each task in the task table, source data from a source database according to the corresponding ETL rule;
the conversion subunit is configured to convert the source data extracted by the extraction subunit into the target data required by the system; and
the storage subunit is configured to store the target data converted by the conversion subunit in the target database.
Of course, in practical applications the scheduling unit 202 may further comprise other functional units, for example a functional unit for sorting, aggregating, or otherwise processing the target data converted by the conversion subunit.
To further improve the efficiency of processing large volumes of data, the scheduling unit 202 preferably schedules the tasks in the task table through a multi-process concurrency mechanism.
It can be seen that, with the data processing system provided by the invention, an ETL rule is encapsulated in the form of a dynamic library file, and information about the dynamic library file is registered in the task table of the back-end database; the task table is scanned, and ETL processing of data is performed for each task in the task table according to its corresponding ETL rule, each task corresponding to one ETL rule. Users therefore need no programming skills, large volumes of data from multiple data sources can be processed with high efficiency, and the system is independent of the system environment: it can be used in both non-visual and visual systems, for example on platforms such as Linux, AIX, Solaris, and Windows.
It should be noted that the units of the data processing system in the embodiments of the present invention may be integrated in one device (such as a computer) or distributed across different devices.
The processing performed by the method and system of the embodiments of the present invention is further described below by way of example.
For example, consider a reporting platform for mobile services. Because the number of mobile subscribers is huge, processing the services of several hundred million mobile phone users can produce more than one hundred million service records on the reporting platform, and an ETL tool capable of handling such large data volumes is needed to process them. With the method and system provided by the embodiments of the present invention, a report database can be built and different report tasks configured in a database table; different tasks have different ETL rules, and these rules can be hidden behind a unified interface. The server schedules these ETL rules to process the service records, effectively improving processing efficiency.
The embodiments in this specification are described progressively; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described briefly because it is substantially similar to the method embodiment; for relevant details, refer to the description of the method embodiment. The system embodiment described above is merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
The embodiments of the present invention have been described in detail above, and specific examples have been used herein to explain the invention; the description of the embodiments is intended only to help in understanding the method and device of the invention. Meanwhile, those of ordinary skill in the art may make changes to the specific implementations and the scope of application in accordance with the idea of the invention. In summary, this description should not be construed as limiting the invention.