CN106682205A - Device and method for data processing - Google Patents
Device and method for data processing
- Publication number
- CN106682205A (application number CN201611255473.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- data processing
- dimension
- rule
- module
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data processing device. The device comprises an acquisition module, an extraction and addition module, and an abstraction processing module. The acquisition module obtains data from a front-end page; the extraction and addition module extracts basic dimension data from the obtained data and adds it to a preset dimension table; and the abstraction processing module performs abstraction processing on the basic dimension data in the dimension table according to custom rules, to obtain data that is convenient for business analysis. The invention further discloses a data processing method. Before the data is analyzed, dimension extraction and abstraction processing are performed on it, so that the subsequent data analysis is more accurate.
Description
Technical field
The present invention relates to the field of big data, and in particular to a data processing device and method.
Background
With the development of computers and the increasingly widespread use of network applications, data of all kinds keeps growing, and analyzing that data is becoming more and more important.
At present, the analysis of front-end data, such as the daily user visits and clicks on a website, is fairly simple: only traffic changes are monitored, and the number of visits or clicks in a certain period of time or a certain region is determined from those changes. This mode of data analysis is simplistic, and the content it analyzes is not comprehensive enough.
Summary of the invention
The main object of the present invention is to provide a data processing device and method, intended to solve the technical problem that existing data analysis is simplistic and that the content analyzed is not comprehensive enough.
To achieve the above object, the present invention provides a data processing device, which includes:
an acquisition module, configured to obtain data from a front-end page;
an extraction and addition module, configured to extract basic dimension data from the obtained data and add the extracted basic dimension data to a preset dimension table; and
an abstraction processing module, configured to perform abstraction processing on the basic dimension data in the dimension table according to a custom rule, to obtain data that is convenient for business analysis.
Optionally, the data processing device further includes:
an extraction module, configured to extract historical data pre-stored in the dimension table.
The abstraction processing module includes:
an accumulation unit, configured to accumulate the basic dimension data in the dimension table with the historical data pre-stored in the dimension table; and
an abstraction processing unit, configured to perform abstraction processing on the accumulated data according to the custom rule, to obtain data that is convenient for business analysis.
Optionally, the data processing device further includes:
a determining module, configured to determine an online processing rule when an online data processing instruction is received, wherein the online processing rule includes several conditions and/or fields; and
an online processing module, configured to perform online processing on the obtained data according to the determined online processing rule, to obtain the online-processed data.
Optionally, the data processing device further includes:
a report processing module, configured to perform report processing on the abstraction-processed data in the form of a page table, to obtain report data; and
a storage module, configured to store the report data in an open-source database, so that when a report data display instruction is subsequently received, the report data is obtained directly from the open-source database and displayed on a report page in the form of a table.
Optionally, the data processing device further includes:
a query module, configured to query, through a preset program, the capacity value of the data and the available capacity of other servers when data is obtained from the front-end page; and
a selection and distribution module, configured to, if the queried capacity value exceeds a preset threshold, select a server whose available capacity is greater than the capacity value and distribute the obtained data to the selected server through the preset program, so that the selected server performs the data processing operation.
In addition, to achieve the above object, the present invention further provides a data processing method, which includes:
a server obtaining data from a front-end page;
extracting basic dimension data from the obtained data, and adding the extracted basic dimension data to a preset dimension table; and
performing abstraction processing on the basic dimension data in the dimension table according to a custom rule, to obtain data that is convenient for business analysis.
Optionally, before the step of performing abstraction processing on the basic dimension data in the dimension table according to the custom rule, the method further includes:
extracting historical data pre-stored in the dimension table.
The step of performing abstraction processing on the basic dimension data in the dimension table according to the custom rule, to obtain data that is convenient for business analysis, includes:
accumulating the basic dimension data in the dimension table with the historical data pre-stored in the dimension table; and
performing abstraction processing on the accumulated data according to the custom rule, to obtain data that is convenient for business analysis.
Optionally, after the step of the server obtaining data from the front-end page, the data processing method further includes:
if an online data processing instruction is received, determining an online processing rule, wherein the online processing rule includes several conditions and/or fields; and
performing online processing on the obtained data according to the determined online processing rule, to obtain the online-processed data.
Optionally, after the step of performing abstraction processing on the basic dimension data in the dimension table according to the custom rule, to obtain data that is convenient for business analysis, the data processing method further includes:
performing report processing on the abstraction-processed data in the form of a page table, to obtain report data; and
storing the report data in an open-source database, so that when a report data display instruction is subsequently received, the report data is obtained directly from the open-source database and displayed on a report page in the form of a table.
Optionally, the data processing method further includes:
when data is obtained from the front-end page, querying, through a preset program, the capacity value of the data and the available capacity of other servers; and
if the queried capacity value exceeds a preset threshold, selecting a server whose available capacity is greater than the capacity value, and distributing the obtained data to the selected server through the preset program, so that the selected server performs the data processing operation.
With the data processing device and method proposed by the present invention, the server first obtains data from a front-end page, then extracts basic dimension data from the obtained data and adds the extracted basic dimension data to a preset dimension table, and then performs abstraction processing on the basic dimension data in the dimension table according to a custom rule, to obtain data that is convenient for business analysis. When the abstraction-processed data is subsequently analyzed, the actual situation of the data can be analyzed more accurately, rather than analyzing the data on the basis of traffic alone. Because the present invention performs dimension extraction and abstraction processing on the data before it is analyzed, the subsequent data analysis is more accurate.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hardware structure of an optional server for implementing the embodiments of the present invention;
Fig. 2 is a module diagram of the first embodiment of the data processing device of the present invention;
Fig. 3 is a module diagram of the second embodiment of the data processing device of the present invention;
Fig. 4 is a module diagram of the third embodiment of the data processing device of the present invention;
Fig. 5 is a module diagram of the fourth embodiment of the data processing device of the present invention;
Fig. 6 is a module diagram of the fifth embodiment of the data processing device of the present invention;
Fig. 7 is a schematic diagram of a preferred implementation scenario of the present invention;
Fig. 8 is a flowchart of the first embodiment of the data processing method of the present invention;
Fig. 9 is a flowchart of the second embodiment of the data processing method of the present invention;
Fig. 10 is a flowchart of the third embodiment of the data processing method of the present invention;
Fig. 11 is a flowchart of the fourth embodiment of the data processing method of the present invention;
Fig. 12 is a flowchart of the fifth embodiment of the data processing method of the present invention.
The realization of the object, the functional features, and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description
It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.
The server and terminal implementing the embodiments of the present invention are now described with reference to the accompanying drawings. In the following description, suffixes such as "module", "component", or "unit" used to denote elements are adopted only to facilitate the description of the present invention and have no specific meaning in themselves. Therefore, "module" and "component" may be used interchangeably.
A terminal may be implemented in various forms. For example, the terminals described in the present invention may include mobile terminals such as mobile phones, smartphones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and navigation devices, as well as fixed terminals such as digital TVs and desktop computers. In the following, the terminal is assumed to be a mobile terminal. However, those skilled in the art will understand that, apart from elements used specifically for mobile purposes, the configuration according to the embodiments of the present invention can also be applied to fixed terminals.
Fig. 1 is a schematic diagram of the hardware structure of an optional server for implementing the embodiments of the present invention.
As shown in Fig. 1, the server includes a processor 1001 and, communicatively connected to the processor 1001, a communication interface 1002, a memory 1003, and a display interface 1004.
The processor 1001 first obtains data from a front-end page through the communication interface 1002, then extracts basic dimension data from the obtained data, adds the extracted basic dimension data to the preset dimension table in the memory 1003, and then performs abstraction processing on the basic dimension data in the dimension table according to a custom rule, to obtain data that is convenient for business analysis.
Further, before performing abstraction processing on the basic dimension data in the dimension table according to the custom rule, the processor 1001 first extracts the historical data pre-stored in the dimension table in the memory 1003, then accumulates the basic dimension data in the dimension table with that historical data, and finally performs abstraction processing on the accumulated data according to the custom rule, to obtain data that is convenient for business analysis.
Further, when the processor 1001 receives an online data processing instruction through the communication interface 1002, it determines an online processing rule and then performs online processing on the obtained data according to the determined rule, to obtain the online-processed data.
Further, after performing abstraction processing on the basic dimension data in the dimension table according to the custom rule and obtaining data that is convenient for business analysis, the processor 1001 performs report processing on the abstraction-processed data in the form of a page table to obtain report data, and then stores the report data in an open-source database. When a report data display instruction is subsequently received, the report data is obtained directly from the open-source database and displayed in the form of a table on the report page of the display interface 1004.
Further, when the processor 1001 obtains data from the front-end page through the communication interface 1002, it queries, through a preset program, the capacity value of the data and the available capacity of other servers. If the queried capacity value exceeds a preset threshold, it selects a server whose available capacity is greater than the capacity value and, through the preset program, distributes the obtained data via the communication interface 1002 to the selected server, so that the selected server performs the data processing operation.
Based on the hardware structure of the above server, the embodiments of the data processing device of the present invention are proposed.
Referring to Fig. 2, Fig. 2 is a module diagram of the first embodiment of the data processing device of the present invention.
It should be emphasized that, as will be apparent to those skilled in the art, the module diagram shown in Fig. 2 is merely an example of a preferred embodiment, and those skilled in the art can easily add new modules around the modules of the data processing device shown in Fig. 2. The name of each module is a custom name that only serves to aid understanding of the program function blocks of the data processing device and does not limit the technical solution of the present invention; the core of the technical solution lies in the functions that the modules with these custom names are to achieve.
In this embodiment, the data processing device is applied to a server and includes:
an acquisition module 10, configured to obtain data from a front-end page;
an extraction and addition module 20, configured to extract basic dimension data from the obtained data and add the extracted basic dimension data to a preset dimension table; and
an abstraction processing module 30, configured to perform abstraction processing on the basic dimension data in the dimension table according to a custom rule, to obtain data that is convenient for business analysis.
In this embodiment, a management system for ETL (Extract-Transform-Load) rules is developed first, so that an administrator can manage ETL-related rules through the Web. The corresponding ETL rules are then built according to actual business requirements and entered into the ETL database through the ETL rule management system. An ETL rule is a rule for the process of moving data from a source to a destination through extraction (extract), transformation (transform), and loading (load). The ETL rules include, but are not limited to, basic ETL rules, accumulative ETL rules, custom ETL rules, personalized execution chain ETL rules, and report ETL rules. Each ETL rule corresponds to an SQL (Structured Query Language) template, and the template contains various variables to be rendered, including the date, the application ID, and so on.
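The idea of an ETL rule as an SQL template with variables to be rendered can be sketched as follows. This is a minimal illustration, not the patent's implementation: the rule name, the table and column names, and the `render_rule` helper are all hypothetical, and Python's standard `string.Template` is used only as a convenient stand-in for whatever template mechanism the rule management system uses.

```python
from string import Template

# Hypothetical rule store: rule name -> SQL template. The table and column
# names are illustrative and do not come from the patent.
ETL_RULES = {
    "basic": Template(
        "INSERT INTO dim_payment (dt, app_id, user_id, amount) "
        "SELECT dt, app_id, user_id, amount FROM raw_events "
        "WHERE dt = '$date' AND app_id = '$app_id'"
    ),
}


def render_rule(name, **variables):
    """Fill in the variables of the named template and return the SQL text."""
    # substitute() raises KeyError when a variable is missing, so a partly
    # rendered statement is never returned.
    return ETL_RULES[name].substitute(**variables)


sql = render_rule("basic", date="2016-12-30", app_id="APP1")
```

Plain text substitution is only appropriate when the variable values are controlled by the system itself, as a date and an application ID would be here; values that come from users would normally be passed as bound query parameters instead.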
First, the acquisition module 10 obtains data from a front-end page. The front-end page is a page operated by the user, such as a mobile phone page or a computer page; obtaining data from the front-end page is therefore equivalent to obtaining data from a terminal, such as the user's mobile phone or computer. The acquisition module 10 may obtain the data periodically or in real time.
After the acquisition module 10 obtains the data, the extraction and addition module 20 extracts the basic dimensions from the obtained data. In this embodiment, the extraction uses the basic ETL rule described above: the basic dimension data is extracted from the data by means of the basic ETL rule and then added to the preset dimension table, which is created in advance to store basic dimension data.
In this embodiment, the data can be of many kinds, such as visits, clicks, and active-user counts, or a user's consumption on a website, such as the amount a user spends on Taobao; no specific limitation is imposed. Accordingly, the basic dimension data can also be of many kinds. Taking a user's consumption on certain websites as an example, the basic dimension data includes the date, website information, ID, the user's payment amount, and similar data.
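The extraction step for the consumption example above might look like the following sketch. The field names (`date`, `site`, `user_id`, `amount`), the `DimensionRow` type, and the decision to skip incomplete events are assumptions made for illustration; a plain list stands in for the preset dimension table.

```python
from dataclasses import dataclass

# Field names are illustrative; the source only lists date, website
# information, ID, and payment amount as example dimensions.
REQUIRED_FIELDS = ("date", "site", "user_id", "amount")


@dataclass(frozen=True)
class DimensionRow:
    date: str
    site: str
    user_id: str
    amount: float


def extract_basic_dimensions(raw_events):
    """Keep only the basic dimensions of each event; skip incomplete events."""
    rows = []
    for event in raw_events:
        if all(field in event for field in REQUIRED_FIELDS):
            rows.append(DimensionRow(
                date=event["date"],
                site=event["site"],
                user_id=event["user_id"],
                amount=float(event["amount"]),
            ))
    return rows


# A plain list stands in for the preset dimension table.
dimension_table = []
dimension_table.extend(extract_basic_dimensions([
    {"date": "2016-12-30", "site": "example-shop", "user_id": "A",
     "amount": "10", "user_agent": "not a dimension"},
    {"date": "2016-12-30", "site": "example-shop"},  # incomplete: skipped
]))
```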
After the extracted basic dimension data has been added to the preset dimension table, the abstraction processing module 30 performs abstraction processing on the basic dimension data in the dimension table using the custom rule, that is, the custom ETL rule mentioned above. The custom rule abstracts the above data further, so that the abstraction-processed data is closer to business requirements. The custom rule is set according to the actual situation and is not limited here; it includes using the custom rule to abstract the data in order to determine the category of the data. For better understanding, an example follows.
For example, suppose the basic dimension data above is a user's total payment over a period of time. To make the data clearer and more convenient to analyze, the user data is divided into groups and labeled: a cumulative payment of 1-100 denotes a primary paying user, 100-500 an intermediate paying user, and so on until the final result is obtained. The group to which a user belongs can then be determined from the user's total payment during that period. Moreover, the result of the abstraction processing can not only be used on its own for analysis, but can also take part, as dimension information, in cross-analysis over multiple tables; that is, the analysis result is added to the dimension table as new dimension information for further analysis.
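The grouping example can be written as a small classification function. This is a sketch under stated assumptions: the source gives only the 1-100 and 100-500 bands and leaves the boundary at 100 ambiguous, so here each upper bound is treated as inclusive; the "non-payer" and "advanced" labels are hypothetical names for the cases the source covers with "and so on".

```python
def payer_tier(total_paid):
    """Map a user's cumulative payment to a group label."""
    if total_paid <= 0:
        return "non-payer"      # hypothetical label, not in the source
    if total_paid <= 100:
        return "primary"        # 1-100 in the source; 100 assumed inclusive
    if total_paid <= 500:
        return "intermediate"   # 100-500 in the source
    return "advanced"           # hypothetical label for higher bands


# The label can be stored alongside the total as a new dimension.
totals = {"A": 110, "B": 50, "C": 0}
tiers = {user: payer_tier(amount) for user, amount in totals.items()}
```

Keeping the label as a separate mapping mirrors the remark in the text that the abstraction result can itself be added back as new dimension information.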
After the abstraction processing has been performed on the basic dimension data and data convenient for business analysis has been obtained, the abstraction-processed data is stored in a data warehouse. Subsequently, when data is queried or called, it is called directly from the data warehouse.
In this embodiment, the data is preferably processed offline, that is, analyzed and processed at a set point in time, for example at midnight every day.
With the data processing device proposed in this embodiment, the server first obtains data from a front-end page, then extracts basic dimension data from the obtained data and adds the extracted basic dimension data to a preset dimension table, and then performs abstraction processing on the basic dimension data in the dimension table according to a custom rule, to obtain data that is convenient for business analysis. When the abstraction-processed data is subsequently analyzed, the actual situation of the data can be analyzed more accurately, rather than analyzing the data on the basis of traffic alone. Because the present invention performs dimension extraction and abstraction processing on the data before it is analyzed, the subsequent data analysis is more accurate.
Further, a second embodiment of the data processing device of the present invention is proposed.
The second embodiment of the data processing device differs from the first embodiment in that, referring to Fig. 3, the data processing device further includes:
an extraction module 40, configured to extract historical data pre-stored in the dimension table.
The abstraction processing module 30 includes:
an accumulation unit 31, configured to accumulate the basic dimension data in the dimension table with the historical data pre-stored in the dimension table; and
an abstraction processing unit 32, configured to perform abstraction processing on the accumulated data according to the custom rule, to obtain data that is convenient for business analysis.
In this embodiment, to make the subsequent analysis result more accurate, the accumulative ETL rule described above can also be used, before abstraction processing is performed on the basic dimension data, to accumulate the basic dimension data in the dimension table with the historical data. That is, before abstraction processing is performed on the basic dimension data, the extraction module 40 first extracts the historical data pre-stored in the dimension table, and the accumulation unit 31 then uses the accumulative ETL rule to accumulate the basic dimension data in the dimension table with that historical data. For example, if user A's total payment up to the previous day is 100 yuan and user A pays a further 10 yuan today, the accumulative ETL rule yields the result that user A has paid 110 yuan in total to date, which is stored in the corresponding accumulation table. Subsequently, abstraction processing is performed on the accumulated data according to the custom rule, to obtain data that is convenient for business analysis.
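The accumulation example (100 yuan to date plus 10 yuan today) can be sketched as a merge of two per-user mappings. The `accumulate` function and the dictionary representation of the accumulation table are illustrative assumptions.

```python
def accumulate(history, today):
    """Add today's per-user amounts to the historical running totals."""
    totals = dict(history)  # copy, so the stored history is left unchanged
    for user, amount in today.items():
        totals[user] = totals.get(user, 0) + amount
    return totals


history = {"A": 100}        # totals up to the previous day
today = {"A": 10, "B": 5}   # amounts newly paid today
accumulated = accumulate(history, today)
```

User B, who has no history, simply starts from zero; whether new users are handled this way in the actual rule is not stated in the source.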
In this embodiment, the accumulation of data through the accumulative ETL rule is also a form of offline processing: the data of the current day is accumulated with the historical data, and after the accumulation the abstraction processing unit 32 performs the abstraction processing.
This embodiment amounts to maintaining user-related information that has accumulated over time. The accumulative ETL rule takes the earlier ETL results together with the data of the latest day and produces the latest ETL result, and abstraction processing is then performed on the accumulated data according to the custom rule. The analysis therefore relies not only on the currently obtained data but also on the earlier historical data. This prevents an inaccurate analysis result when the data of a particular day deviates strongly, ensures that the data analysis is comprehensive and stable, and further improves its accuracy.
Further, a third embodiment of the data processing device of the present invention is proposed.
The third embodiment of the data processing device differs from the first embodiment in that, referring to Fig. 4, the data processing device further includes:
a determining module 50, configured to determine an online processing rule when an online data processing instruction is received, wherein the online processing rule includes several conditions and/or fields; and
an online processing module 60, configured to perform online processing on the obtained data according to the determined online processing rule, to obtain the online-processed data.
In this embodiment, after data has been obtained from the front-end page, if an online data processing instruction is received, the determining module 50 first determines the online processing rule, which is the personalized execution chain ETL rule mentioned above. In this embodiment, the online processing rule is preferably a rule composed of several conditions and/or fields: a rule composed of several conditions, a rule composed of several fields, or a rule composed of several conditions plus fields. After the online processing rule has been determined, the online processing module 60 performs online processing on the obtained data according to that rule. For example, if the online processing rule is select * from Male where areas='Fujian' and age>25 and actions='Playgames' (select the males in Fujian Province who are playing games and are older than 25), the data is processed online according to this rule to filter out the data that satisfies it.
Further, this embodiment also differs from the two embodiments above in that it can serve as a data processing mode for cases that general-purpose logic cannot satisfy. That is, the online processing rule may also be a rule that differs from the rules of the general-purpose logic, i.e. a rule that the general-purpose logic does not have. For example, traditional data analysis can only analyze the data of one terminal application at a time: the server can only analyze the data in WeChat alone, or the data in QQ alone. This embodiment can analyze the two applications WeChat and QQ jointly; this too is achieved through the conditions and/or fields in the online processing rule, that is, by setting, in the online processing rule, conditions and/or fields that cover different terminal applications. Of course, to analyze the data of two different applications jointly, the server first needs to be associated with the server sides of the two applications; the online processing module 60 then merges and analyzes the data of the two server sides together.
This embodiment provides an online data processing mode that supports both online analysis of data and joint analysis of data from different terminal applications, making the data analysis more flexible.
Further, a fourth embodiment of the data processing device of the present invention is proposed.
The fourth embodiment of the data processing device differs from the first embodiment in that, referring to Fig. 5, the data processing device further includes:
a report processing module 70, configured to perform report processing on the abstraction-processed data in the form of a page table, to obtain report data; and
a storage module 80, configured to store the report data in an open-source database, so that when a report data display instruction is subsequently received, the report data is obtained directly from the open-source database and displayed on a report page in the form of a table.
In this embodiment, after abstraction processing has been performed on the basic dimension data in the dimension table according to the custom rule and data convenient for business analysis has been obtained, the report processing module 70 uses the report ETL rule described above to perform report processing on the abstraction-processed data in the form of a page table and obtain report data. This report processing amounts to integrating the data into a report according to the format of the table. Finally, the storage module 80 stores the report data in the open-source database. Subsequently, if a report data display instruction is received, the report data is obtained directly from the open-source database and displayed on the report page in the form of a table.
It should be understood that each ETL rule in the above three embodiments stores its final execution result in the data warehouse as an intermediate, persisted calculation result; these calculation results exist as part of the data warehouse system.
In this embodiment, the report data may be stored directly in HBase Phoenix (an open-source, distributed, column-oriented database); the report system can access Phoenix directly to obtain the corresponding result and present it on the report page in the form of charts. The advantage of such reports is that the final report result is obtained quickly and presented directly on the page, and that retrieving the data is faster than obtaining it from the data warehouse.
It should be understood that the layers of ETL rules in the present invention have a strict inter-layer order dependency. For example, the accumulative ETL rule can be executed only after the basic ETL rule has finished executing; the underlying reason is that the calculation rules of a later layer may use the calculation results of the preceding layers (including the most original data).
Likewise, the custom ETL rule can be executed only after the basic ETL rule has been executed, or only after the accumulative ETL rule has been executed.
However, the personalized execution chain ETL rule is an online rule: it does not reference any other rule and obtains the source data directly for analysis, so it does not depend on the other rules.
The report ETL rule is likewise built on the rules of the layers above: it references the basic ETL rule, the accumulative ETL rule, the custom ETL rule, or the personalized execution chain ETL rule.
Although the reference relationships between the layers of ETL rules are described above, they apply within a single application. Across different applications, the execution of the ETL rule layers is independent: for example, if the first layer of APP1 has finished executing while the first layer of APP2 has not, APP1 can proceed to its second layer without waiting for the first layer of APP2 to complete.
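The layer dependencies, and the fact that each application progresses independently, can be sketched with a small dependency table. The layer names and the `runnable_layers` helper are hypothetical. The source allows the custom layer to follow either the basic or the accumulative layer, and the report layer to reference any earlier layer; for simplicity this sketch assumes one fixed chain in which custom follows accumulative and report follows custom.

```python
# Prerequisite layers per rule type (assumed fixed chain, see lead-in).
DEPENDS_ON = {
    "basic": [],
    "accumulative": ["basic"],
    "custom": ["basic", "accumulative"],
    "chain": [],             # online rule: reads the source data directly
    "report": ["custom"],
}


def runnable_layers(done):
    """Layers not yet run whose prerequisites have all finished."""
    return [
        layer
        for layer, prerequisites in DEPENDS_ON.items()
        if layer not in done and all(p in done for p in prerequisites)
    ]


# Progress is tracked per application, so APP1 never waits for APP2.
progress = {"APP1": {"basic"}, "APP2": set()}
next_steps = {app: runnable_layers(done) for app, done in progress.items()}
```

In this state APP1 may already start its accumulative layer while APP2 still has its basic layer to run, which is the APP1/APP2 situation described above.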
Further, a fifth embodiment of the data processing device of the present invention is proposed.
The fifth embodiment of the data processing device differs from the first to fourth embodiments in that, referring to Fig. 6, the data processing device further includes:
a query module 90, configured to, when data is obtained from the front-end page, query the volume of the data and the available capacity of other servers through a preset program;
a selection and distribution module 100, configured to, if the queried data volume exceeds a preset threshold, select a server whose available capacity exceeds the data volume and distribute the obtained data to the selected server through the preset program, so that the selected server performs the data processing operation.
In this embodiment, when the acquisition module 10 obtains data from the front-end page, the query module 90 first queries the volume of the data through the preset program (for example, the data obtained today amounts to 200 MB) and monitors the available capacity of the other associated servers. The queried data volume is then compared with a preset threshold, which is set according to actual conditions. When the queried data volume exceeds the preset threshold, the selection and distribution module 100 selects a server whose available capacity exceeds the data volume and distributes the obtained data to the selected server through the preset program, and the selected server performs the data processing operation.
This embodiment is designed for the case where execution needs to be distributed across multiple servers in order to relieve the load of running the ETL engine on a single server. That is, when the processing load on a server is too high, the preset program can distribute the data collected from different applications to different servers, which avoids the drop in data processing efficiency caused by an overloaded server.
Based on the detailed description of the first to fifth embodiments, this embodiment describes the data processing procedure of the present invention again using a specific application scenario.
Referring to Fig. 7, Fig. 7 is a schematic diagram of a specific data processing scenario of the present invention.
As shown in Fig. 7, an ETL rule management system is first developed on the server and the ETL rules are created. Through the ETL rule management system, the created ETL rules are entered into the ETL rule database (MySQL). It is then determined whether the current processing is offline processing (Hive JDBC) or online processing (Presto JDBC), and the ETL engine (ETL Engine) performs the corresponding processing operation. If the data is processed offline, the processed data is stored in the data warehouse (Datawarehouse); if the data is processed online, the processed data is stored in the open-source database (HBase Phoenix). If report processing is performed on the data, the report data can also be stored in the reporting system (Report system). This completes the data processing procedure. The processed data can subsequently be analyzed, so that the analysis results are more accurate.
The present invention also provides a data processing method.
Referring to Fig. 8, Fig. 8 is a schematic flowchart of the first embodiment of the data processing method of the present invention.
In this embodiment, the data processing method is applied to a server, and the data processing method includes:
the server obtains data from a front-end page; basic dimension data is extracted from the obtained data, and the extracted basic dimension data is added to a preset dimension table; and abstraction processing is performed on the basic dimension data in the dimension table according to a custom rule, to obtain data that facilitates business analysis.
In this embodiment, a management system for ETL (Extract-Transform-Load) rules is first developed so that an administrator can manage ETL-related rules through the Web. The corresponding ETL rules are then built according to actual business requirements and entered into the ETL database through the ETL rule management system. An ETL rule is a rule for the process of moving data from a source to a destination through extraction (extract), transformation (transform), and loading (load). The ETL rules include, but are not limited to, basic ETL rules, accumulative ETL rules, custom ETL rules, personalized execution-chain ETL rules, and report ETL rules. Each ETL rule corresponds to a SQL (Structured Query Language) template, and the template contains various variables to be rendered, such as the date and the application ID.
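As a minimal sketch of how such a template might be rendered, the following Python example uses the standard-library `string.Template`. The table names `dim_payment` and `raw_events`, the column names, and the variables `stat_date` and `app_id` are hypothetical and are not taken from the patent.

```python
from string import Template

# Hypothetical template for a basic ETL rule; table and column names are assumptions.
BASIC_RULE_TEMPLATE = Template(
    "INSERT INTO dim_payment "
    "SELECT stat_date, site, user_id, SUM(amount) AS pay_amount "
    "FROM raw_events "
    "WHERE stat_date = '$stat_date' AND app_id = $app_id "
    "GROUP BY stat_date, site, user_id"
)


def render_rule(template: Template, stat_date: str, app_id: int) -> str:
    """Fill in the variables of an ETL rule template and return the SQL text."""
    # substitute() raises KeyError if a template variable is left unfilled.
    return template.substitute(stat_date=stat_date, app_id=app_id)


sql = render_rule(BASIC_RULE_TEMPLATE, "2016-12-29", 1001)
print(sql)
```

Plain string substitution is shown only to mirror the template idea; values that come from untrusted input would normally be bound as query parameters instead.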
The specific steps by which data processing is carried out in this embodiment are as follows:
Step S10: the server obtains data from a front-end page;
First, the server obtains data from a front-end page. The front-end page is the page that the user operates, such as a mobile phone page or a computer page. Obtaining data from the front-end page is equivalent to obtaining data from a terminal, for example from the user's mobile phone or computer. The server may obtain the data at scheduled times or in real time.
Step S20: basic dimension data is extracted from the obtained data, and the extracted basic dimension data is added to a preset dimension table;
After the data is obtained, basic dimensions are extracted from it. In this embodiment, the basic ETL rules described above are used for this extraction, that is, the basic dimension data is extracted from the data by the basic ETL rules. The extracted basic dimension data is then added to a preset dimension table, which is created in advance to store basic dimension data.
In this embodiment, the data may be of many kinds, such as page views, clicks, and active-user counts, or data such as a user's spending on a particular website, for example the amount a user spends on Taobao; no specific limitation is imposed. Correspondingly, the basic dimension data may also be of many kinds. Taking a user's spending on certain websites as an example, the basic dimension data includes the date, website information, user ID, user payment amount, and the like.
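The extraction of basic dimension data described above can be sketched as follows. This is an illustrative assumption of how raw records might be reduced to the dimensions named in the text (date, website, user ID, payment amount); the `RawEvent` type, its `user_agent` field, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RawEvent:
    """A hypothetical raw record collected from a front-end page."""
    date: str
    site: str
    user_id: str
    amount: float
    user_agent: str  # extra field that is not a basic dimension in this sketch


def extract_basic_dimensions(events: List[RawEvent]) -> Dict[Tuple[str, str, str], float]:
    """Aggregate payment amounts per (date, site, user_id) into a dimension table."""
    table: Dict[Tuple[str, str, str], float] = {}
    for event in events:
        key = (event.date, event.site, event.user_id)
        table[key] = table.get(key, 0.0) + event.amount
    return table


events = [
    RawEvent("2016-12-29", "shop.example", "A", 6.0, "mobile"),
    RawEvent("2016-12-29", "shop.example", "A", 4.0, "desktop"),
    RawEvent("2016-12-29", "shop.example", "B", 25.0, "mobile"),
]
dimension_table = extract_basic_dimensions(events)
```

In the patent the extraction is expressed as a SQL rule; the in-memory dictionary here only stands in for the preset dimension table.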
Step S30: abstraction processing is performed on the basic dimension data in the dimension table according to a custom rule, to obtain data that facilitates business analysis.
After the extracted basic dimension data is added to the preset dimension table, abstraction processing is performed on the basic dimension data in the dimension table using a custom rule, that is, the custom ETL rule described above. The custom rule further abstracts the data so that the abstracted data is closer to business requirements. The custom rule is set according to actual conditions and is not limited here. It includes applying the custom rule to abstract the data in order to determine the category of the data. For better understanding, an example is given below:
For example, suppose the basic dimension data above is a user's total payment over a period of time. To make the data clearer and easier to analyze, the user data is labeled by group: a user whose cumulative payment is between 1 and 100 is a primary paying user, a user whose cumulative payment is between 100 and 500 is an intermediate paying user, and so on. The group to which a user belongs can then be determined from the user's total payment over that period. In addition, the result of the abstraction processing can be used for analysis on its own, or it can take part in multi-table cross analysis as dimension information; that is, the analysis result is added to the dimension table as new dimension information for further analysis.
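The group labeling in the example above can be sketched as follows. The text lists only the ranges 1 to 100 and 100 to 500, so the boundary handling (a total of exactly 100 counts as primary), the "non-paying" label, and the "advanced" label for totals above 500 are assumptions made for this illustration.

```python
from bisect import bisect_left

# Inclusive upper bound of each tier. Boundary handling and the extra
# tier names are assumptions, since the text only lists two ranges.
TIER_UPPER_BOUNDS = [0, 100, 500]
TIER_LABELS = ["non-paying", "primary", "intermediate", "advanced"]


def payment_tier(total_paid: float) -> str:
    """Map a cumulative payment total to a user-group label."""
    return TIER_LABELS[bisect_left(TIER_UPPER_BOUNDS, total_paid)]


for total in (50, 100, 110, 800):
    print(total, payment_tier(total))
```

The resulting label can be written back to the dimension table as a new dimension, which is what allows it to take part in later cross analysis.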
After abstraction processing is performed on the basic dimension data and data that facilitates business analysis is obtained, the abstracted data is stored in the data warehouse. Subsequently, when the data is queried or called, it is retrieved directly from the data warehouse.
In this embodiment, the data is preferably processed offline, that is, analyzed and processed at a set time point, for example at midnight every day.
In the data processing method proposed in this embodiment, the server first obtains data from a front-end page, then extracts basic dimension data from the obtained data and adds the extracted basic dimension data to a preset dimension table, and then performs abstraction processing on the basic dimension data in the dimension table according to a custom rule to obtain data that facilitates business analysis. When the abstracted data is subsequently analyzed, the actual state of the data can be analyzed more accurately, rather than analyzing the data solely according to business conditions. Before data analysis, the present invention first performs dimension extraction and abstraction processing on the data, so that subsequent data analysis is more accurate.
Further, a second embodiment of the data processing method of the present invention is proposed.
The second embodiment of the data processing method differs from the first embodiment in that, referring to Fig. 9, before step S30 the method further includes:
Step S40: historical data pre-stored in the dimension table is extracted;
Step S30 includes:
Step S31: the basic dimension data in the dimension table is accumulated with the historical data pre-stored in the dimension table;
Step S32: abstraction processing is performed on the accumulated data according to the custom rule, to obtain data that facilitates business analysis.
In this embodiment, before abstraction processing is performed on the basic dimension data, the accumulative ETL rules described above can also be used to accumulate the basic dimension data in the dimension table with the historical data, in order to make the subsequent analysis results more accurate. That is, before abstraction processing, the historical data is first extracted from the dimension table, and the accumulative ETL rules then accumulate the basic dimension data in the dimension table with the historical data pre-stored in the dimension table. For example, if user A had paid 100 yuan in total up to the previous day and pays another 10 yuan today, the accumulative ETL rule yields a historical total of 110 yuan for user A, which is stored in the corresponding accumulation table. Abstraction processing is then performed on the accumulated data according to the custom rule, to obtain data that facilitates business analysis.
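The accumulation in the example above can be sketched as follows. The function name `accumulate` and the second user "B" are hypothetical; the figures for user A follow the text.

```python
from typing import Dict


def accumulate(history: Dict[str, float], today: Dict[str, float]) -> Dict[str, float]:
    """Return new cumulative totals: the historical total plus today's new amount."""
    totals = dict(history)  # copy so the stored history is not mutated
    for user_id, amount in today.items():
        totals[user_id] = totals.get(user_id, 0.0) + amount
    return totals


history = {"A": 100.0}          # user A had paid 100 yuan up to the previous day
today = {"A": 10.0, "B": 5.0}   # today's newly extracted basic dimension data
cumulative = accumulate(history, today)
print(cumulative)  # {'A': 110.0, 'B': 5.0}
```

A user with no history, such as B here, simply starts from zero, so the same rule covers both new and returning users.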
In this embodiment, accumulating data through the accumulative ETL rules is also a form of offline processing: the current day's data is accumulated with the historical data, and abstraction processing is performed after accumulation.
This embodiment amounts to maintaining user-related information that accumulates over time. The accumulative ETL rule refers to the previous ETL result and the latest day's data to obtain the latest ETL result, and abstraction processing is then performed on the accumulated data according to the custom rule. The analysis therefore relies not only on the currently obtained data but also on earlier historical data. This prevents inaccurate analysis results when the data of a particular day deviates significantly. Analyzing the data together with historical data ensures the comprehensiveness and stability of the data analysis and further improves its accuracy.
Further, a third embodiment of the data processing method of the present invention is proposed.
The third embodiment of the data processing method differs from the first embodiment in that, referring to Fig. 10, after step S10 the method further includes:
Step S50: if an online data processing instruction is received, an online processing rule is determined, where the online processing rule includes several conditions and/or fields;
Step S60: online processing is performed on the obtained data according to the determined online processing rule, to obtain online-processed data.
In this embodiment, after data is obtained from the front-end page, if an online data processing instruction is received, the online processing rule is determined first. The online processing rule is the personalized execution-chain ETL rule mentioned above. In this embodiment, the online processing rule is preferably composed of several conditions and/or fields, including a rule composed of several conditions, a rule composed of several fields, or a rule composed of several conditions plus fields. After the online processing rule is determined, online processing is performed on the obtained data according to that rule. For example, the online processing rule may be: select * from Male where areas='Fujian' and age>25 and actions='Playgames' (select males in Fujian Province who play games and are older than 25). Online processing is then performed on the data according to this rule to filter out the data that satisfies it.
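A rule made up of several conditions can be turned into a query as sketched below. This is an illustration under stated assumptions: the `build_query` helper, the whitelists, the `name` column, and the sample rows are hypothetical, and an in-memory SQLite database stands in for the online query engine only so that the example is runnable.

```python
import sqlite3
from typing import List, Sequence, Tuple

# Each condition is (field, operator, value); only whitelisted names are accepted.
Condition = Tuple[str, str, object]
ALLOWED_FIELDS = {"areas", "age", "actions"}
ALLOWED_OPERATORS = {"=", ">", "<", ">=", "<="}


def build_query(table: str, conditions: Sequence[Condition]) -> Tuple[str, List[object]]:
    """Compose a parameterized SELECT statement from a list of conditions."""
    clauses, params = [], []
    for field, op, value in conditions:
        if field not in ALLOWED_FIELDS or op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported condition: {field} {op}")
        clauses.append(f"{field} {op} ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    # The table name is a fixed constant in this sketch, not user input.
    return f"SELECT * FROM {table} WHERE {where}", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Male (name TEXT, areas TEXT, age INTEGER, actions TEXT)")
conn.executemany(
    "INSERT INTO Male VALUES (?, ?, ?, ?)",
    [
        ("u1", "Fujian", 30, "Playgames"),
        ("u2", "Fujian", 22, "Playgames"),
        ("u3", "Guangdong", 40, "Playgames"),
    ],
)
sql, params = build_query(
    "Male", [("areas", "=", "Fujian"), ("age", ">", 25), ("actions", "=", "Playgames")]
)
rows = conn.execute(sql, params).fetchall()
```

Binding the values as parameters rather than concatenating them keeps the rule composable while avoiding quoting errors in the generated SQL.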
Further, this embodiment also differs from the two preceding embodiments in that it provides a data processing method for cases that general logic cannot handle. That is, the online processing rule differs from the rules of general logic; it is a rule that general logic does not have. For example, traditional data analysis can only analyze the data of one terminal application at a time: the server can only analyze the data in WeChat separately, or the data in QQ separately. This embodiment can analyze the two applications WeChat and QQ together, which is likewise achieved through the conditions and/or fields of the online processing rule, that is, the online processing rule is set to include conditions and/or fields of different terminal applications. Of course, to analyze the data of two different applications together, the server must first be associated with the servers of the two applications, so that the data of the two servers can subsequently be merged and analyzed.
This embodiment provides an online data processing mode that enables both online analysis of data and combined analysis of data from different terminal applications, making data analysis more flexible.
Further, a fourth embodiment of the data processing method of the present invention is proposed.
The fourth embodiment of the data processing method differs from the first embodiment in that, referring to Fig. 11, after step S30 the data processing method further includes:
Step S70: report processing is performed on the abstracted data in the form of a page report, to obtain report data;
Step S80: the report data is stored in an open-source database; when a report data display instruction is subsequently received, the report data is obtained directly from the open-source database and displayed on a report page in the form of a report.
In this embodiment, after abstraction processing is performed on the basic dimension data in the dimension table according to the custom rule to obtain data that facilitates business analysis, the report ETL rules described above are used to perform report processing on the abstracted data in the form of a page report, yielding report data. This report processing is equivalent to consolidating the data into report form according to the report format. The report data is finally stored in the open-source database. Subsequently, if a report data display instruction is received, the report data is obtained directly from the open-source database and displayed on the report page in the form of a report.
It should be understood that each ETL rule in the first three embodiments stores its final execution result in the data warehouse as an intermediate persisted calculation result, and these calculation results exist as part of the data warehouse system.
In this embodiment, the report data can be stored directly in HBase Phoenix (an open-source, distributed, column-oriented database). The reporting system can access Phoenix directly to obtain the corresponding results and present them graphically on the report page.
The advantage of such reports is that the final report results can be obtained quickly and presented directly on the page, and obtaining the data this way is faster than obtaining it from the data warehouse.
It should be understood that the layers of ETL rules in the present invention have a strict inter-layer order dependency. For example, the basic ETL rules must finish executing before the accumulative ETL rules can be executed. The underlying reason is that the calculation rules of a later layer can use the calculation results of the preceding layers (including the original raw data).
Similarly, the custom ETL rules can be executed only after the basic ETL rules, or the accumulative ETL rules, have finished executing.
However, the personalized execution-chain ETL rule is an online rule: it references no other rule and reads the source data directly for analysis, so it does not depend on any other rule.
The report ETL rules are likewise built on the rules of the layers above: they reference the basic ETL rules, the accumulative ETL rules, the custom ETL rules, or the personalized execution-chain ETL rules.
Although the reference relationships among the layers of ETL rules are described above, they apply within a single application. Across different applications, the execution of the ETL rule layers is independent. For example, once the first layer of APP1 has finished executing, APP1 can proceed to its second layer even if the first layer of APP2 has not finished, without waiting for the first layer of APP2 to complete.
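The inter-layer order dependency described above can be sketched as a dependency graph that is sorted topologically. The particular graph below is one possible reading of the text (here the custom rules reference the accumulative layer and the report rules reference the custom layer) and is an assumption for illustration; `graphlib` is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Rule layer -> layers it references. This graph is an assumed reading of the text.
RULE_DEPENDENCIES: Dict[str, Set[str]] = {
    "basic": set(),
    "accumulative": {"basic"},
    "custom": {"accumulative"},
    "report": {"custom"},
    "personalized_chain": set(),  # online rule, references no other rule
}


def execution_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return one valid execution order of the rule layers for a single application."""
    return list(TopologicalSorter(dependencies).static_order())


# The order is computed separately for each application, so the two schedules
# share no state and neither depends on the other's progress.
orders = {app: execution_order(RULE_DEPENDENCIES) for app in ("APP1", "APP2")}
```

Because the personalized execution-chain rule has no predecessors, it may appear anywhere in the order, which matches its description as a rule that does not depend on any other rule.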
Further, a fifth embodiment of the data processing method of the present invention is proposed.
The fifth embodiment of the data processing method differs from the first to fourth embodiments in that, referring to Fig. 12, the data processing method further includes:
Step S90: when data is obtained from the front-end page, the volume of the data and the available capacity of other servers are queried through a preset program;
Step S100: if the queried data volume exceeds a preset threshold, a server whose available capacity exceeds the data volume is selected, and the obtained data is distributed to the selected server through the preset program, so that the selected server performs the data processing operation.
In this embodiment, when the server obtains data from the front-end page, the volume of the data is first queried through the preset program (for example, the data obtained today amounts to 200 MB), and the available capacity of the other associated servers is monitored. The queried data volume is then compared with a preset threshold, which is set according to actual conditions. When the queried data volume exceeds the preset threshold, a server whose available capacity exceeds the data volume is selected, the obtained data is distributed to the selected server through the preset program, and the selected server performs the data processing operation.
This embodiment is designed for the case where execution needs to be distributed across multiple servers in order to relieve the load of running the ETL engine on a single server. That is, when the processing load on a server is too high, the preset program can distribute the data collected from different applications to different servers, which avoids the drop in data processing efficiency caused by an overloaded server.
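Steps S90 and S100 can be sketched as follows. The function name `select_server`, the server names, and the capacities are hypothetical. The text only requires that the selected server's available capacity exceed the data volume, so picking the server with the most free capacity, and falling back to local processing when no server qualifies, are assumptions of this sketch.

```python
from typing import Dict, Optional


def select_server(
    data_size_mb: float,
    threshold_mb: float,
    available_mb: Dict[str, float],
) -> Optional[str]:
    """Pick a server for the data, or return None to process it locally."""
    if data_size_mb <= threshold_mb:
        return None  # at or below the threshold: no distribution needed
    candidates = {name: cap for name, cap in available_mb.items() if cap > data_size_mb}
    if not candidates:
        return None  # no server has enough room; local fallback is an assumption
    # Preferring the server with the most free capacity is an assumption.
    return max(candidates, key=candidates.get)


servers = {"s1": 180.0, "s2": 512.0, "s3": 256.0}
chosen = select_server(200.0, 150.0, servers)
print(chosen)  # s2
```

With 200 MB of data and a 150 MB threshold, s1 is excluded because its 180 MB of free capacity does not exceed the data volume, and s2 is chosen over s3 only because of the most-free-capacity assumption.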
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or system that includes the element.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of the present invention.
Claims (10)
1. A data processing device, characterized in that it is applied to a server, the data processing device comprising:
an acquisition module, configured to obtain data from a front-end page;
an extraction and addition module, configured to extract basic dimension data from the obtained data and add the extracted basic dimension data to a preset dimension table; and
an abstraction processing module, configured to perform abstraction processing on the basic dimension data in the dimension table according to a custom rule, to obtain data that facilitates business analysis.
2. The data processing device according to claim 1, characterized in that the data processing device further comprises:
an extraction module, configured to extract historical data pre-stored in the dimension table;
and the abstraction processing module comprises:
an accumulation unit, configured to accumulate the basic dimension data in the dimension table with the historical data pre-stored in the dimension table; and
an abstraction processing unit, configured to perform abstraction processing on the accumulated data according to the custom rule, to obtain data that facilitates business analysis.
3. The data processing device according to claim 1, characterized in that the data processing device further comprises:
a determining module, configured to determine an online processing rule if an online data processing instruction is received, wherein the online processing rule includes several conditions and/or fields; and
an online processing module, configured to perform online processing on the obtained data according to the determined online processing rule, to obtain online-processed data.
4. The data processing device according to claim 1, characterized in that the data processing device further comprises:
a report processing module, configured to perform report processing on the abstracted data in the form of a page report, to obtain report data; and
a storage module, configured to store the report data in an open-source database and, when a report data display instruction is subsequently received, obtain the report data directly from the open-source database and display it on a report page in the form of a report.
5. The data processing device according to any one of claims 1 to 4, characterized in that the data processing device further comprises:
a query module, configured to, when data is obtained from the front-end page, query the volume of the data and the available capacity of other servers through a preset program; and
a selection and distribution module, configured to, if the queried data volume exceeds a preset threshold, select a server whose available capacity exceeds the data volume and distribute the obtained data to the selected server through the preset program, so that the selected server performs the data processing operation.
6. A data processing method, characterized in that it is applied to a server, the data processing method comprising:
obtaining, by the server, data from a front-end page;
extracting basic dimension data from the obtained data, and adding the extracted basic dimension data to a preset dimension table; and
performing abstraction processing on the basic dimension data in the dimension table according to a custom rule, to obtain data that facilitates business analysis.
7. The data processing method according to claim 6, characterized in that, before the step of performing abstraction processing on the basic dimension data in the dimension table according to the custom rule, the method further comprises the step of:
extracting historical data pre-stored in the dimension table;
and the step of performing abstraction processing on the basic dimension data in the dimension table according to the custom rule, to obtain data that facilitates business analysis, comprises:
accumulating the basic dimension data in the dimension table with the historical data pre-stored in the dimension table; and
performing abstraction processing on the accumulated data according to the custom rule, to obtain data that facilitates business analysis.
8. The data processing method according to claim 6, characterized in that, after the step of obtaining, by the server, data from the front-end page, the data processing method further comprises:
if an online data processing instruction is received, determining an online processing rule, wherein the online processing rule includes several conditions and/or fields; and
performing online processing on the obtained data according to the determined online processing rule, to obtain online-processed data.
9. The data processing method according to claim 6, characterized in that, after the step of performing abstraction processing on the basic dimension data in the dimension table according to the custom rule, to obtain data that facilitates business analysis, the data processing method further comprises:
performing report processing on the abstracted data in the form of a page report, to obtain report data; and
storing the report data in an open-source database and, when a report data display instruction is subsequently received, obtaining the report data directly from the open-source database and displaying it on a report page in the form of a report.
10. The data processing method according to any one of claims 6 to 9, characterized in that the data processing method further comprises:
when data is obtained from the front-end page, querying the volume of the data and the available capacity of other servers through a preset program; and
if the queried data volume exceeds a preset threshold, selecting a server whose available capacity exceeds the data volume, and distributing the obtained data to the selected server through the preset program, so that the selected server performs the data processing operation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611255473.1A CN106682205A (en) | 2016-12-29 | 2016-12-29 | Device and method for data processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611255473.1A CN106682205A (en) | 2016-12-29 | 2016-12-29 | Device and method for data processing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106682205A true CN106682205A (en) | 2017-05-17 |
Family
ID=58872730
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611255473.1A Pending CN106682205A (en) | 2016-12-29 | 2016-12-29 | Device and method for data processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106682205A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107580015A (en) * | 2017-07-26 | 2018-01-12 | 阿里巴巴集团控股有限公司 | Data processing method and device, server |
CN112905593A (en) * | 2021-03-04 | 2021-06-04 | 天九共享网络科技集团有限公司 | Report generation method, device, medium and electronic equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102456050A (en) * | 2010-10-27 | 2012-05-16 | 中国移动通信集团四川有限公司 | Method and device for extracting data from webpage |
CN102567539A (en) * | 2011-12-31 | 2012-07-11 | 北京新媒传信科技有限公司 | Intelligent WEB report implementation method and intelligent WEB report implementation system |
US20120221511A1 (en) * | 2007-11-02 | 2012-08-30 | International Business Machines Corporation | System and method for analyzing data in a report |
US20120240064A1 (en) * | 2011-03-15 | 2012-09-20 | Oracle International Corporation | Visualization and interaction with financial data using sunburst visualization |
CN103473338A (en) * | 2013-09-22 | 2013-12-25 | 北京奇虎科技有限公司 | Webpage content extraction method and webpage content extraction system |
CN104408179A (en) * | 2014-12-15 | 2015-03-11 | 北京国双科技有限公司 | Method and device for processing data from data table |
-
2016
- 2016-12-29 CN CN201611255473.1A patent/CN106682205A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120221511A1 (en) * | 2007-11-02 | 2012-08-30 | International Business Machines Corporation | System and method for analyzing data in a report |
CN102456050A (en) * | 2010-10-27 | 2012-05-16 | 中国移动通信集团四川有限公司 | Method and device for extracting data from webpage |
US20120240064A1 (en) * | 2011-03-15 | 2012-09-20 | Oracle International Corporation | Visualization and interaction with financial data using sunburst visualization |
CN102567539A (en) * | 2011-12-31 | 2012-07-11 | 北京新媒传信科技有限公司 | Intelligent WEB report implementation method and intelligent WEB report implementation system |
CN103473338A (en) * | 2013-09-22 | 2013-12-25 | 北京奇虎科技有限公司 | Webpage content extraction method and webpage content extraction system |
CN104408179A (en) * | 2014-12-15 | 2015-03-11 | 北京国双科技有限公司 | Method and device for processing data from data table |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107580015A (en) * | 2017-07-26 | 2018-01-12 | 阿里巴巴集团控股有限公司 | Data processing method and device, server |
CN112905593A (en) * | 2021-03-04 | 2021-06-04 | 天九共享网络科技集团有限公司 | Report generation method, device, medium and electronic equipment |
CN112905593B (en) * | 2021-03-04 | 2024-02-02 | 天九共享网络科技集团有限公司 | Report generation method, report generation device, report generation medium and electronic equipment |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN102708130B (en) | Calculate the easily extensible engine that fine point of user is mated for offer | |
CN107038200A (en) | Business data processing method and system | |
CN103473238B (en) | Dispense address location system and method | |
CN108509583A (en) | A kind of information-pushing method, server and computer readable storage medium | |
CN110347724A (en) | Abnormal behaviour recognition methods, device, electronic equipment and medium | |
Shin et al. | Forecasting the video data traffic of 5G services in South Korea | |
CN103761228B (en) | The rank threshold of application program determines that method and rank threshold determine system | |
CN110288350A (en) | User's Value Prediction Methods, device, equipment and storage medium | |
CN103250376A (en) | Method and system for carrying out predictive analysis relating to nodes of a communication network | |
CN109615172A (en) | A kind of method and terminal handling examination data | |
CN110852559A (en) | Resource allocation method and device, storage medium and electronic device | |
CN104182544B (en) | The dimension method for decomposing and device of analytical database | |
CN103218411B (en) | Website related information acquisition methods and device | |
CN107977855B (en) | Method and device for managing user information | |
CN108960672A (en) | The air control method, apparatus and computer readable storage medium of limit limit time | |
CN104484435A (en) | Method for cross-over analysis of user behavior | |
CN109190027A (en) | Multi-source recommended method, terminal, server, computer equipment, readable medium | |
CN107133339A (en) | Circuit query method and apparatus and storage medium, processor | |
CN106682205A (en) | Device and method for data processing | |
CN111859115B (en) | User allocation method and system, data processing equipment and user allocation equipment | |
CN110428278A (en) | Determine the method and device of resource share | |
CN109657950A (en) | Hierarchy Analysis Method, device, equipment and computer readable storage medium | |
CN110210884A (en) | Determine the method, apparatus, computer equipment and storage medium of user characteristic data | |
CN111951035B (en) | Consumption analysis method, system, device and platform | |
CN107066602A (en) | A kind of news information method for pushing and system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170517 |