CN106682205A - Device and method for data processing - Google Patents

Device and method for data processing

Info

Publication number
CN106682205A
Authority
CN
China
Prior art keywords
data
data processing
dimension
rule
module
Prior art date
Legal status
Pending
Application number
CN201611255473.1A
Other languages
Chinese (zh)
Inventor
高宋俤
Current Assignee
Nubia Technology Co Ltd
Original Assignee
Nubia Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Nubia Technology Co Ltd
Priority to CN201611255473.1A
Publication of CN106682205A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing device comprising an acquisition module, an extraction-and-addition module and an abstraction processing module. The acquisition module obtains data from a front-end page. The extraction-and-addition module extracts basic dimension data from the obtained data and adds it to a preset dimension table. The abstraction processing module performs abstraction processing on the basic dimension data in the dimension table according to custom rules, so as to obtain data that facilitates business analysis. The invention further discloses a data processing method. Before data analysis, the data first undergoes dimension extraction and abstraction processing. This yields data suited to business analysis and makes subsequent analysis more accurate.

Description

Data processing device and method
Technical field
The present invention relates to the field of big data, and more particularly to a data processing device and method.
Background art
With the development of computers and the increasingly widespread use of network applications, data keeps growing in volume and variety. Data analysis is therefore becoming more and more important.
At present, front-end data such as a website's daily page views and clicks is analyzed in a fairly simple way. The analysis only monitors changes in traffic and uses them to determine the page views or clicks for a certain period of time or a certain page area. This mode of analysis is rather limited, and the content analyzed is not comprehensive enough.
Summary of the invention
The main object of the present invention is to provide a data processing device and method. The invention aims to solve the technical problem that existing data analysis is overly simple and does not analyze content comprehensively enough.
To achieve the above object, the present invention provides a data processing device, comprising:
an acquisition module, configured to obtain data from a front-end page;
an extraction-and-addition module, configured to extract basic dimension data from the obtained data and add the extracted basic dimension data to a preset dimension table; and
an abstraction processing module, configured to perform abstraction processing on the basic dimension data in the dimension table according to a custom rule, so as to obtain data that facilitates business analysis.
Optionally, the data processing device further includes:
an extraction module, configured to extract the historical data prestored in the dimension table;
and the abstraction processing module includes:
an accumulation unit, configured to accumulate the basic dimension data in the dimension table together with the historical data prestored in the dimension table; and
an abstraction processing unit, configured to perform abstraction processing on the accumulated data according to the custom rule, so as to obtain data that facilitates business analysis.
Optionally, the data processing device further includes:
a determination module, configured to determine an online processing rule upon receiving an online data processing instruction, wherein the online processing rule includes several conditions and/or fields; and
an online processing module, configured to perform online processing on the obtained data according to the determined online processing rule, so as to obtain online-processed data.
Optionally, the data processing device further includes:
a report processing module, configured to perform report processing on the abstraction-processed data in the form of a page table, so as to obtain report data; and
a storage module, configured to store the report data in an open-source database. When a report display instruction is subsequently received, the report data is obtained directly from the open-source database and displayed as a table on a report page.
Optionally, the data processing device further includes:
a query module, configured to query, through a preset program, the volume of the data obtained from the front-end page and the available capacity of other servers; and
a selection and distribution module, configured to act when the queried data volume exceeds a preset threshold. It selects a server whose available capacity is greater than the data volume and distributes the obtained data to that server through the preset program, so that the selected server performs the data processing operation.
In addition, to achieve the above object, the present invention further provides a data processing method, comprising:
a server obtains data from a front-end page;
basic dimension data is extracted from the obtained data, and the extracted basic dimension data is added to a preset dimension table; and
abstraction processing is performed on the basic dimension data in the dimension table according to a custom rule, so as to obtain data that facilitates business analysis.
Optionally, before the step of performing abstraction processing on the basic dimension data in the dimension table according to the custom rule, the method further includes:
extracting the historical data prestored in the dimension table;
and the step of performing abstraction processing on the basic dimension data in the dimension table according to the custom rule to obtain data that facilitates business analysis includes:
accumulating the basic dimension data in the dimension table together with the historical data prestored in the dimension table; and
performing abstraction processing on the accumulated data according to the custom rule, so as to obtain data that facilitates business analysis.
Optionally, after the step in which the server obtains data from the front-end page, the data processing method further includes:
if an online data processing instruction is received, determining an online processing rule, wherein the online processing rule includes several conditions and/or fields; and
performing online processing on the obtained data according to the determined online processing rule, so as to obtain online-processed data.
Optionally, after the step of performing abstraction processing on the basic dimension data in the dimension table according to the custom rule to obtain data that facilitates business analysis, the data processing method further includes:
performing report processing on the abstraction-processed data in the form of a page table to obtain report data; and
storing the report data in an open-source database, and when a report display instruction is subsequently received, obtaining the report data directly from the open-source database and displaying it as a table on a report page.
Optionally, the data processing method further includes:
when data is obtained from the front-end page, querying, through a preset program, the volume of the data and the available capacity of other servers; and
if the queried data volume exceeds a preset threshold, selecting a server whose available capacity is greater than the data volume, and distributing the obtained data to the selected server through the preset program, so that the selected server performs the data processing operation.
With the data processing device and method proposed by the present invention, the server first obtains data from the front-end page. It extracts basic dimension data from the obtained data and adds it to a preset dimension table. It then performs abstraction processing on the basic dimension data in the dimension table according to a custom rule, to obtain data that facilitates business analysis. Analyzing the abstraction-processed data afterwards reflects the actual situation of the data more accurately than analyzing it by traffic alone. Because the data undergoes dimension extraction and abstraction processing before analysis, subsequent data analysis is more accurate.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hardware architecture of an optional server for implementing the embodiments of the present invention;
Fig. 2 is a module diagram of a first embodiment of the data processing device of the present invention;
Fig. 3 is a module diagram of a second embodiment of the data processing device of the present invention;
Fig. 4 is a module diagram of a third embodiment of the data processing device of the present invention;
Fig. 5 is a module diagram of a fourth embodiment of the data processing device of the present invention;
Fig. 6 is a module diagram of a fifth embodiment of the data processing device of the present invention;
Fig. 7 is a schematic diagram of a preferred implementation scenario of the present invention;
Fig. 8 is a schematic flowchart of a first embodiment of the data processing method of the present invention;
Fig. 9 is a schematic flowchart of a second embodiment of the data processing method of the present invention;
Fig. 10 is a schematic flowchart of a third embodiment of the data processing method of the present invention;
Fig. 11 is a schematic flowchart of a fourth embodiment of the data processing method of the present invention;
Fig. 12 is a schematic flowchart of a fifth embodiment of the data processing method of the present invention.
The objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings and in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
The server and terminal implementing the embodiments of the present invention are now described with reference to the accompanying drawings. In the following description, suffixes such as "module", "part" or "unit" denote elements. They serve only to make the invention easier to explain and have no specific meaning in themselves. Therefore, "module" and "part" may be used interchangeably.
Terminals may be implemented in various forms. For example, the terminals described in the present invention may include mobile terminals such as mobile phones, smartphones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and navigation devices. They may also include fixed terminals such as digital TVs and desktop computers. In the following, the terminal is assumed to be a mobile terminal. However, those skilled in the art will understand that the configurations according to the embodiments of the present invention can also be applied to fixed terminals, apart from elements used specifically for mobile purposes.
Fig. 1 is a schematic diagram of the hardware configuration of an optional server for implementing the embodiments of the present invention.
As shown in Fig. 1, the server includes a processor 1001. A communication interface 1002, a memory 1003 and a display interface 1004 are communicatively connected to the processor 1001.
The processor 1001 first obtains data from the front-end page through the communication interface 1002. It then extracts basic dimension data from the obtained data and adds it to the preset dimension table in the memory 1003. Finally, it performs abstraction processing on the basic dimension data in the dimension table according to the custom rule, so as to obtain data that facilitates business analysis.
Further, before performing abstraction processing according to the custom rule, the processor 1001 first extracts the historical data prestored in the dimension table of the memory 1003. It then accumulates the basic dimension data in the dimension table with that historical data. Finally, it performs abstraction processing on the accumulated data according to the custom rule, so as to obtain data that facilitates business analysis.
Further, when the processor 1001 receives an online data processing instruction through the communication interface 1002, it determines an online processing rule. It then performs online processing on the obtained data according to that rule, so as to obtain online-processed data.
Further, after abstraction processing has produced data that facilitates business analysis, the processor 1001 performs report processing on that data in the form of a page table to obtain report data. It stores the report data in an open-source database. When a report display instruction is subsequently received, the report data is obtained directly from the open-source database and displayed as a table on the report page of the display interface 1004.
Further, when the processor 1001 obtains data from the front-end page through the communication interface 1002, it queries, through a preset program, the volume of the data and the available capacity of other servers. If the queried data volume exceeds a preset threshold, it selects a server whose available capacity is greater than the data volume. It then distributes the obtained data to the selected server through the preset program and the communication interface 1002, and the selected server performs the data processing operation.
Based on the above hardware configuration of the server, embodiments of the data processing device of the present invention are proposed.
Referring to Fig. 2, Fig. 2 is a module diagram of the first embodiment of the data processing device of the present invention.
It should be emphasized that, as will be apparent to those skilled in the art, the module diagram shown in Fig. 2 is merely an example of a preferred embodiment. Those skilled in the art can easily add new modules to the data processing device shown in Fig. 2. Each module name is a custom name that only helps to understand the program function blocks of the data processing device; it does not limit the technical solution of the present invention. The core of the technical solution lies in the functions that the modules, whatever their names, are to achieve.
In this embodiment, the data processing device is applied to a server and includes:
an acquisition module 10, configured to obtain data from a front-end page;
an extraction-and-addition module 20, configured to extract basic dimension data from the obtained data and add the extracted basic dimension data to a preset dimension table; and
an abstraction processing module 30, configured to perform abstraction processing on the basic dimension data in the dimension table according to a custom rule, so as to obtain data that facilitates business analysis.
In this embodiment, a management system for ETL (Extract-Transform-Load) rules is developed first, so that an administrator can manage ETL-related rules through a Web interface. Corresponding ETL rules are then built according to actual business requirements and entered into an ETL database through the ETL rule management system. An ETL rule governs how data passes from a source to a destination through extraction (extract), transformation (transform) and loading (load). ETL rules include but are not limited to: basic ETL rules, cumulative ETL rules, custom ETL rules, personalized execution-chain ETL rules and report ETL rules. Each ETL rule corresponds to an SQL (Structured Query Language) template. The template contains variables to be rendered, such as the date and the application ID.
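The mapping from a rule to an SQL template can be sketched as follows. This is a minimal illustration and not the patent's implementation. The rule names, the table and column names, and the use of Python's string.Template for rendering are all assumptions. The sketch only shows how a stored template with date and application-ID variables might be rendered before execution. Because values are spliced into the SQL text, they are assumed to come from the rule scheduler, not from end users.

```python
from string import Template

# Hypothetical rule store: rule type -> SQL template. Table and column
# names are illustrative assumptions, not taken from the patent.
ETL_RULES = {
    "basic": Template(
        "INSERT INTO dim_user_pay (dt, app_id, user_id, amount) "
        "SELECT '$dt', '$app_id', user_id, amount FROM raw_events "
        "WHERE dt = '$dt' AND app_id = '$app_id'"
    ),
    "cumulative": Template(
        "INSERT INTO acc_user_pay (dt, app_id, user_id, total) "
        "SELECT '$dt', '$app_id', user_id, SUM(amount) FROM dim_user_pay "
        "WHERE app_id = '$app_id' AND dt <= '$dt' GROUP BY user_id"
    ),
}


def render_rule(rule_type: str, dt: str, app_id: str) -> str:
    """Render one ETL rule template for a given date and application ID."""
    # substitute() raises KeyError if any placeholder is left unfilled, so a
    # half-rendered statement never reaches the database.
    return ETL_RULES[rule_type].substitute(dt=dt, app_id=app_id)


sql = render_rule("cumulative", "2016-12-30", "app1")
print(sql)
```

Raising on a missing variable, rather than using safe_substitute(), is a deliberate choice here. A template that silently keeps a literal $dt would otherwise produce a query that runs but matches nothing.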
First, the acquisition module 10 obtains data from the front-end page, i.e. the page the user operates on, such as a mobile-phone page or a computer page. Obtaining data from the front-end page therefore amounts to obtaining data from a terminal, such as the user's mobile phone or computer. The acquisition module 10 may obtain data periodically or in real time.
After the acquisition module 10 obtains the data, the extraction-and-addition module 20 extracts basic dimensions from it. In this embodiment, the extraction uses the basic ETL rule described above: basic dimension data is extracted from the data according to that rule. The extracted basic dimension data is then added to the preset dimension table, which is established in advance to store basic dimension data.
In this embodiment, the data may be of many kinds. Examples include page views, clicks and active-user counts, or a user's consumption on a website, such as the amount a user spends on Taobao; no specific limitation is imposed. Correspondingly, the basic dimension data may also be of many kinds. Taking a user's consumption on a website as an example, the basic dimension data includes the date, website information, user ID and amount spent.
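Below is a minimal sketch of the extraction-and-addition step for the consumption example. It assumes a hypothetical raw record layout and uses an in-memory list as a stand-in for the preset dimension table. The kept fields (date, site, user_id, amount) mirror the basic dimensions listed above; everything else is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PayDimension:
    """One row of a hypothetical dimension table for consumption data."""
    date: str
    site: str
    user_id: str
    amount: float


# Basic dimensions kept by this illustrative basic rule; other raw fields are dropped.
BASIC_FIELDS = ("date", "site", "user_id", "amount")


def extract_and_add(raw_records, dimension_table):
    """Extract basic dimension data and append it to the preset dimension table."""
    for rec in raw_records:
        if not all(field in rec for field in BASIC_FIELDS):
            continue  # skip records missing a required dimension
        dimension_table.append(
            PayDimension(rec["date"], rec["site"], rec["user_id"], float(rec["amount"]))
        )
    return dimension_table


raw = [
    {"date": "2016-12-30", "site": "shop.example", "user_id": "A",
     "amount": "10", "user_agent": "mobile"},
    {"date": "2016-12-30", "site": "shop.example", "user_id": "B"},  # no amount
]
table = extract_and_add(raw, [])
print(table)
```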
After the extracted basic dimension data is added to the preset dimension table, the abstraction processing module 30 performs abstraction processing on it using a custom rule, i.e. the custom ETL rule described above. The custom rule abstracts the data further so that the result is closer to business requirements. It is set according to actual circumstances and is not limited here. Abstraction processing includes using the custom rule to determine the category of the data. For better understanding, an example follows.
For example, suppose the basic dimension data is a user's total payment over a period of time. To make the data clearer and more useful for analysis, users are divided into labelled groups. A cumulative payment of 1 to 100 marks a primary paying user, 100 to 500 an intermediate paying user, and so on. The group a user belongs to can then be determined from that user's total payment during the period. The result of abstraction processing can be analyzed on its own. It can also serve as dimension information in multi-table cross analysis, and the results of that analysis are added to the dimension table as new dimension information for further analysis.
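The tiering example can be expressed as a small custom rule. The ranges given above (1 to 100, 100 to 500) overlap at 100. This sketch therefore assumes that upper bounds are inclusive, so 100 counts as primary. The "non-paying" and "advanced" labels are hypothetical stand-ins for the tiers implied by "and so on".

```python
from bisect import bisect_left

# Inclusive upper bound of each tier except the last (assumption: 100 -> primary).
TIER_BOUNDS = [0, 100, 500]
TIER_LABELS = ["non-paying", "primary", "intermediate", "advanced"]  # first/last are hypothetical


def pay_tier(total: float) -> str:
    """Custom rule sketch: map a cumulative payment total to a user-group label."""
    # bisect_left returns the index of the first bound >= total, so a total
    # equal to a bound falls into the lower tier.
    return TIER_LABELS[bisect_left(TIER_BOUNDS, total)]


def group_users(totals: dict) -> dict:
    """Group user IDs by tier label."""
    groups = {}
    for user_id, total in totals.items():
        groups.setdefault(pay_tier(total), []).append(user_id)
    return groups


print(group_users({"A": 110.0, "B": 5.0, "C": 100.0}))
# {'intermediate': ['A'], 'primary': ['B', 'C']}
```

A sorted bounds list keeps the rule data-driven. Adding a tier means adding one bound and one label, not another if/else branch.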
Once abstraction processing of the basic dimension data has produced data that facilitates business analysis, that data is stored in the data warehouse. Subsequent queries or calls for the data are served directly from the data warehouse.
In this embodiment, the data is preferably processed offline, i.e. analyzed at a set time point, for example at midnight every day.
With the data processing device proposed in this embodiment, the server first obtains data from the front-end page. It extracts basic dimension data from the obtained data and adds it to the preset dimension table. It then performs abstraction processing on the basic dimension data in the dimension table according to the custom rule, to obtain data that facilitates business analysis. Analyzing the abstraction-processed data afterwards reflects the actual situation of the data more accurately than analyzing it by traffic alone. Because the data undergoes dimension extraction and abstraction processing before analysis, subsequent data analysis is more accurate.
Further, a second embodiment of the data processing device of the present invention is proposed.
The second embodiment differs from the first embodiment of the data processing device as follows. Referring to Fig. 3, the data processing device includes:
an extraction module 40, configured to extract the historical data prestored in the dimension table;
and the abstraction processing module 30 includes:
an accumulation unit 31, configured to accumulate the basic dimension data in the dimension table together with the historical data prestored in the dimension table; and
an abstraction processing unit 32, configured to perform abstraction processing on the accumulated data according to the custom rule, so as to obtain data that facilitates business analysis.
In this embodiment, the basic dimension data and the historical data in the dimension table can be accumulated with the cumulative ETL rule described above before abstraction processing. This makes subsequent analysis results more accurate. First, the extraction module 40 extracts the historical data prestored in the dimension table. The accumulation unit 31 then uses the cumulative ETL rule to accumulate the basic dimension data in the dimension table with that historical data. For example, user A's total payment up to the previous day is 100 yuan, and user A pays another 10 yuan today. The cumulative ETL rule then yields a lifetime total of 110 yuan for user A, which is stored in the corresponding accumulation table. Abstraction processing is then performed on the accumulated data according to the custom rule, so as to obtain data that facilitates business analysis.
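The 100 + 10 = 110 yuan example amounts to a running-total merge. Below is a sketch that assumes the history and today's rows are plain Python structures rather than warehouse tables.

```python
from collections import defaultdict


def accumulate(history: dict, today_rows):
    """Cumulative ETL sketch: previous totals + today's new payments -> new totals."""
    totals = defaultdict(float, history)  # copy; the stored history is not mutated
    for user_id, amount in today_rows:
        totals[user_id] += amount  # users new today start from 0.0
    return dict(totals)


history = {"A": 100.0}
new_totals = accumulate(history, [("A", 10.0), ("B", 5.0)])
print(new_totals)  # {'A': 110.0, 'B': 5.0}
```

Returning a new mapping, rather than updating the history in place, mirrors the idea of writing each day's result as a fresh accumulation snapshot.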
In this embodiment, accumulating data with the cumulative ETL rule is also a form of offline processing. The current day's data is accumulated with the historical data, and the abstraction processing unit 32 then performs abstraction processing on the result.
In effect, this embodiment maintains historically accumulated user-related information. The cumulative ETL rule combines the previous ETL results with the latest day's data to produce the newest ETL results. These results then undergo abstraction processing according to the custom rule. Data analysis is therefore based not only on the currently acquired data but also on the historical data. This prevents inaccurate results when one day's data deviates markedly from the norm, keeps the analysis comprehensive and stable, and further improves its accuracy.
Further, a third embodiment of the data processing device of the present invention is proposed.
The third embodiment differs from the first embodiment of the data processing device as follows. Referring to Fig. 4, the data processing device further includes:
a determination module 50, configured to determine an online processing rule upon receiving an online data processing instruction, wherein the online processing rule includes several conditions and/or fields; and
an online processing module 60, configured to perform online processing on the obtained data according to the determined online processing rule, so as to obtain online-processed data.
In this embodiment, after data is obtained from the front-end page, the determination module 50 determines the online processing rule if an online data processing instruction is received. This rule is the personalized execution-chain ETL rule mentioned above. In this embodiment, the online processing rule is preferably composed of several conditions and/or fields. It may consist of several conditions, of several fields, or of conditions combined with fields. Once the rule is determined, the online processing module 60 processes the obtained data online according to it. For example, the online processing rule SELECT * FROM Male WHERE areas = 'Fujian' AND age > 25 AND actions = 'Playgames' selects male users in Fujian Province who are older than 25 and are playing games. Processing the data online according to this rule filters out the data that satisfies it.
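The example rule can be tried against an in-memory SQLite table standing in for the obtained data. This sketch assumes a rule is supplied as a list of selected fields plus (field, operator, value) conditions joined with AND. The sample rows are invented. The table and column names simply follow the example rule above.

```python
import sqlite3

# In-memory stand-in for the obtained front-end data (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Male (user_id TEXT, areas TEXT, age INTEGER, actions TEXT)")
conn.executemany(
    "INSERT INTO Male VALUES (?, ?, ?, ?)",
    [
        ("u1", "Fujian", 30, "Playgames"),
        ("u2", "Fujian", 22, "Playgames"),
        ("u3", "Guangdong", 40, "Playgames"),
        ("u4", "Fujian", 28, "Shopping"),
    ],
)

ALLOWED_OPS = {"=", "<>", ">", ">=", "<", "<="}


def run_online_rule(conn, fields, conditions):
    """Run an online rule made of selected fields and (field, op, value) conditions."""
    names = list(fields) + [field for field, _, _ in conditions]
    if not all(name.isidentifier() for name in names):
        raise ValueError("field names must be plain identifiers")
    clauses, params = [], []
    for field, op, value in conditions:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"{field} {op} ?")  # values are bound, never inlined
        params.append(value)
    sql = f"SELECT {', '.join(fields)} FROM Male"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchall()


rule = [("areas", "=", "Fujian"), ("age", ">", 25), ("actions", "=", "Playgames")]
print(run_online_rule(conn, ["user_id", "age"], rule))  # [('u1', 30)]
```

Field names are checked against an identifier pattern and values are passed as bound parameters. That way a rule entered through a management page cannot inject arbitrary SQL.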
This embodiment also differs from the two previous embodiments in another way: it provides a data processing method for cases that general logic cannot handle. That is, the online processing rule may be a rule that general logic does not provide. For example, traditional data analysis can only analyze the data of one terminal application at a time. A server can analyze only the data in WeChat, or only the data in QQ. This embodiment, by contrast, can analyze WeChat and QQ jointly. This is achieved through the conditions and/or fields of the online processing rule, which is set to include conditions and/or fields covering the different terminal applications. To merge and analyze the data of two different applications, the server must first be associated with the service ends of both applications. The online processing module 60 then merges and analyzes the data of the two service ends together.
This embodiment provides an online data processing mode. It enables both online analysis of data and combined analysis of data across different terminal applications, which makes data analysis more flexible.
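Below is a rough sketch of combined analysis across two applications' service ends. It assumes, hypothetically, that both service ends report per-user usage minutes under a shared user ID. The generic app_a/app_b names stand in for applications such as WeChat and QQ. A real deployment would need an account-mapping step, which is not shown.

```python
def combined_analysis(app_a_rows, app_b_rows):
    """Merge per-user records reported by two applications' service ends.

    Assumption: both service ends share one user identifier; real systems
    would need an account-mapping step first.
    """
    a = {row["user_id"]: row for row in app_a_rows}
    b = {row["user_id"]: row for row in app_b_rows}
    merged = []
    for uid in sorted(a.keys() | b.keys()):  # every user seen by either app
        merged.append({
            "user_id": uid,
            "minutes_a": a.get(uid, {}).get("minutes", 0),
            "minutes_b": b.get(uid, {}).get("minutes", 0),
        })
    return merged


app_a_rows = [{"user_id": "u1", "minutes": 40}, {"user_id": "u2", "minutes": 15}]
app_b_rows = [{"user_id": "u1", "minutes": 25}, {"user_id": "u3", "minutes": 60}]
merged = combined_analysis(app_a_rows, app_b_rows)
# Users active in both applications, which single-app analysis cannot see.
both = [r["user_id"] for r in merged if r["minutes_a"] and r["minutes_b"]]
print(both)  # ['u1']
```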
Further, a fourth embodiment of the data processing device of the present invention is proposed.
The fourth embodiment differs from the first embodiment of the data processing device as follows. Referring to Fig. 5, the data processing device further includes:
a report processing module 70, configured to perform report processing on the abstraction-processed data in the form of a page table, so as to obtain report data; and
a storage module 80, configured to store the report data in an open-source database. When a report display instruction is subsequently received, the report data is obtained directly from the open-source database and displayed as a table on a report page.
In this embodiment, abstraction processing of the basic dimension data according to the custom rule first produces data that facilitates business analysis. The report processing module 70 then uses the report ETL rule described above to turn this data into report data in the form of a page table. Report processing amounts to integrating the data according to the layout of the report table. Finally, the storage module 80 stores the report data in the open-source database. When a report display instruction is subsequently received, the report data is obtained directly from the open-source database and displayed as a table on the report page.
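Below is a sketch of report processing and storage. SQLite is used purely as a stand-in for the report database. The text names HBase Phoenix as one option, but its client API is not used here. The report layout, a count of users per tier, is an assumption for illustration.

```python
import sqlite3
from collections import Counter


def build_report(user_tiers):
    """Report ETL sketch: turn abstracted data (user -> tier) into (tier, count) rows."""
    return sorted(Counter(user_tiers.values()).items())


def store_report(conn, report_date, rows):
    """Persist report rows so display requests can skip recomputation."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tier_report (dt TEXT, tier TEXT, users INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tier_report VALUES (?, ?, ?)",
        [(report_date, tier, count) for tier, count in rows],
    )
    conn.commit()


def show_report(conn, report_date):
    """Answer a display instruction by reading stored rows and laying them out as a table."""
    rows = conn.execute(
        "SELECT tier, users FROM tier_report WHERE dt = ? ORDER BY tier",
        (report_date,),
    ).fetchall()
    lines = [f"{'tier':<14}{'users':>6}"]
    lines += [f"{tier:<14}{count:>6}" for tier, count in rows]
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")  # stand-in for the report database
store_report(conn, "2016-12-30",
             build_report({"A": "primary", "B": "primary", "C": "intermediate"}))
text = show_report(conn, "2016-12-30")
print(text)
```

The display path only reads the precomputed table. This reflects the text's point that serving reports from dedicated storage is faster than recomputing them from the data warehouse.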
It should be understood that each ETL rule in the three embodiments above stores its final execution results in the data warehouse as persisted intermediate results. These results exist as part of the data warehouse system.
In this embodiment, report data can be stored directly in HBase Phoenix (an open-source distributed column-oriented database). The reporting system can access Phoenix directly to obtain the corresponding results and present them graphically on the report page. This arrangement lets the final report results be obtained quickly and shown directly on the page. Fetching data this way is also faster than fetching it from the data warehouse.
It should be understood that the layers of ETL rules in the present invention have a strict inter-layer order dependency. For example, the cumulative ETL rule can only be executed after the basic ETL rule has finished. The essential reason is that the computation rules of a later layer may use the results computed by earlier layers, including the original raw data.
Likewise, the custom ETL rule can be executed only after the basic ETL rule has been executed, or after the cumulative ETL rule has been executed.
However, the personalized execution-chain ETL rule is an online rule. It references no other rule and analyzes source data directly, so it does not depend on any other rule.
The report ETL rule is likewise built on the rules of the layers above. It references the basic ETL rule, the cumulative ETL rule, the custom ETL rule or the personalized execution-chain ETL rule.
These reference relationships between ETL rule layers apply within a single application. Across different applications, the ETL rule layers execute independently. For example, once APP1 has finished executing its first layer, it can proceed to the second layer even if APP2's first layer is still running. APP1 does not have to wait for APP2's first layer to complete.
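The layer ordering and per-application independence can be modelled with Python's graphlib.TopologicalSorter (standard library, Python 3.9+), keeping one sorter per application. The dependency map below is one possible configuration chosen for illustration. The text allows the custom rule to follow either the basic or the cumulative rule, and the report rule to reference any of the other layers.

```python
from graphlib import TopologicalSorter

# Rule -> rules it references. Assumption: in this configuration the custom
# rule references the cumulative rule and the report rule references custom.
RULE_DEPS = {
    "basic": set(),
    "cumulative": {"basic"},
    "custom": {"cumulative"},
    "personalized_chain": set(),  # online rule: reads source data directly
    "report": {"custom"},
}


def new_schedule(deps):
    """Create an independent scheduler for one application."""
    ts = TopologicalSorter(deps)
    ts.prepare()  # raises graphlib.CycleError if the references form a loop
    return ts


schedules = {app: new_schedule(RULE_DEPS) for app in ("APP1", "APP2")}

app1_first = set(schedules["APP1"].get_ready())
app2_first = set(schedules["APP2"].get_ready())

# APP1 finishes its basic layer while APP2's basic layer is still running.
schedules["APP1"].done("basic")
app1_next = set(schedules["APP1"].get_ready())  # APP1 may move on
app2_next = set(schedules["APP2"].get_ready())  # APP2 has nothing new to start

print(app1_first, app1_next, app2_next)
```

Because each application owns its own sorter, marking a rule done for APP1 never unblocks or blocks anything for APP2. This matches the independence described above.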
Further, a fifth embodiment of the data processing device of the present invention is proposed.
The fifth embodiment of the data processing device differs from the first to fourth embodiments in that, referring to Fig. 6, the data processing device further includes:
Query module 90, configured to, when data is obtained from the front-end page, query the capacity value of the data through a preset program, as well as the available capacity of other servers;
Selection and distribution module 100, configured to, if the queried capacity value exceeds a preset threshold, select a server whose available capacity is greater than the capacity value, and distribute the obtained data to the selected server through the preset program, so that the selected server performs the data processing operation.
In this embodiment, when acquisition module 10 obtains data from the front-end page, query module 90 first queries the capacity value of the data through the preset program (for example, the data obtained today amounts to 200M) and monitors the available capacity of the other associated servers. The queried capacity value is then compared with the preset threshold, which is set according to the actual situation. When the queried capacity value exceeds the preset threshold, selection and distribution module 100 selects a server whose available capacity is greater than the capacity value and distributes the obtained data to it through the preset program, and the selected server performs the data processing operation.
This embodiment is designed for the case where the ETL engine load on a single server is too high and execution must be distributed across multiple servers. That is, when a server's processing load is too high, the preset program can distribute the data collected from different applications to different servers, avoiding the loss of data processing efficiency that an overloaded server would cause.
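The threshold check and server selection above can be sketched as follows. This is a minimal illustration, assuming capacities are measured in MB; the Server type, host names, and the choice of the server with the most headroom among qualifying candidates are hypothetical, since the text only requires available capacity greater than the data's capacity value.

```python
from dataclasses import dataclass

@dataclass
class Server:
    host: str
    available_mb: float

def choose_server(data_mb, threshold_mb, servers):
    """Return the host that should process data_mb, or None to keep it local."""
    if data_mb <= threshold_mb:
        return None  # at or below the threshold: the local server processes it
    candidates = [s for s in servers if s.available_mb > data_mb]
    if not candidates:
        # Assumption: the text does not say what happens here; fail loudly.
        raise RuntimeError("no server has enough available capacity")
    # Assumption: prefer the server with the most free capacity.
    return max(candidates, key=lambda s: s.available_mb).host
```

For today's 200M of data and a hypothetical 150M threshold, choose_server(200, 150, [Server("etl-2", 180), Server("etl-3", 512), Server("etl-4", 300)]) skips etl-2 (not enough room) and returns "etl-3".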
Based on the detailed descriptions of the first to fifth embodiments, this embodiment describes the specific implementation process of data processing in the present invention using a concrete application scenario.
Referring to Fig. 7, Fig. 7 is a schematic diagram of a specific data processing scenario of the present invention.
As shown in Fig. 7, an ETL rule management system is first developed and ETL rules are established on the server side; the established ETL rules are then entered into the ETL rule database (MySQL) through the ETL rule management system. Next, it is determined whether offline processing (Hive JDBC) or online processing (Presto JDBC) is to be performed, and the ETL engine (ETL Engine) executes the corresponding processing operation. If the data is processed offline, the offline-processed data is stored in the data warehouse (Datawarehouse); if the data is processed online, the online-processed data is stored in an open-source database (HBase Phoenix). Of course, if report processing is performed on the data, the report-processed data can also be stored in the reporting system (Report system). This completes the data processing flow; the processed data can subsequently be analyzed, making the analysis results more accurate.
The present invention also provides a data processing method.
Referring to Fig. 8, Fig. 8 is a schematic flowchart of the first embodiment of the data processing method of the present invention.
In this embodiment, the data processing method is applied to a server and includes:
The server obtains data from a front-end page; extracts basic dimension data from the obtained data and adds the extracted basic dimension data to a preset dimension table; and performs abstraction processing on the basic dimension data in the dimension table according to custom rules, to obtain data that facilitates business analysis.
In this embodiment, a management system for ETL (Extract-Transform-Load) rules is first developed so that administrators can manage ETL-related rules via the Web. Then, according to actual business requirements, the corresponding ETL rules are built and entered into the ETL database through the ETL rule management system. An ETL rule is a rule for moving data from a source to a destination through extraction (extract), transformation (transform), and loading (load). ETL rules include but are not limited to: basic ETL rules, accumulative ETL rules, custom ETL rules, personalized execution-chain ETL rules, and report ETL rules. Each ETL rule corresponds to an SQL (Structured Query Language) template containing various variables to be rendered, including the date, the application ID, and so on.
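Since each ETL rule is stored as an SQL template whose variables (date, application ID) are rendered before execution, the rendering step can be sketched with Python's standard string.Template. The table and column names (dim_user_pay, raw_orders, amount) and the helper render_rule are hypothetical, chosen only for illustration.

```python
from string import Template

# Hypothetical basic ETL rule; ${date} and ${app_id} are the variables
# rendered before the rule is executed.
DAILY_PAY_RULE = Template(
    "INSERT INTO dim_user_pay "
    "SELECT '${date}' AS dt, user_id, SUM(amount) AS pay "
    "FROM raw_orders WHERE app_id = '${app_id}' AND dt = '${date}' "
    "GROUP BY user_id"
)

def render_rule(template, date, app_id):
    # substitute() raises KeyError if any placeholder is left unfilled,
    # so a rule is never executed with a missing variable.
    return template.substitute(date=date, app_id=app_id)
```

render_rule(DAILY_PAY_RULE, "2016-12-29", "app1") yields the concrete statement for that day and application. Plain text substitution like this is only safe for trusted, administrator-defined variable values; values that could come from users should be passed as bound query parameters instead.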
The specific steps for carrying out data processing in this embodiment are as follows:
Step S10: the server obtains data from the front-end page;
First, the server obtains data from the front-end page. The front-end page is a page operated by the user, such as a mobile phone page or a computer page, so obtaining data from the front-end page is equivalent to obtaining data from a terminal, for example from the user's mobile phone or computer. The server may obtain the data periodically or in real time.
Step S20: extract basic dimension data from the obtained data, and add the extracted basic dimension data to a preset dimension table;
After the data is obtained, basic dimensions are extracted from it. In this embodiment, the basic ETL rules described above are used for this: the basic ETL rules extract basic dimension data from the data, and the extracted basic dimension data is then added to the preset dimension table, which is established in advance to store basic dimension data.
In this embodiment, the data may be of many kinds, such as visit counts, click counts, and active-user counts, or consumption data of a user on a website, such as the amount a user spends on Taobao; this is not specifically limited. Correspondingly, the basic dimension data may also be of many kinds. Taking a user's consumption on a website as an example, the basic dimension data includes the date, website information, user ID, user spending amount, and so on.
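Using the consumption example above, the basic-dimension extraction of step S20 can be sketched as picking the date, website, user ID, and amount out of raw events and appending them to a dimension table. The raw event layout (a dict with a "type" key) and the in-memory list standing in for the dimension table are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PayDimension:
    date: str
    site: str
    user_id: str
    amount: float

def extract_basic_dimensions(events, dimension_table):
    """Pick the basic dimension fields out of raw events and append them."""
    for e in events:
        if e.get("type") != "pay":
            continue  # only consumption events carry these dimensions
        dimension_table.append(
            PayDimension(e["date"], e["site"], e["user_id"], float(e["amount"]))
        )
    return dimension_table
```

A click event is skipped, while a payment event such as {"type": "pay", "date": "2016-12-29", "site": "taobao", "user_id": "A", "amount": "10"} becomes PayDimension("2016-12-29", "taobao", "A", 10.0).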
Step S30: perform abstraction processing on the basic dimension data in the dimension table according to custom rules, to obtain data that facilitates business analysis.
After the extracted basic dimension data is added to the preset dimension table, a custom rule, i.e., the custom ETL rule described above, is used to perform abstraction processing on the basic dimension data in the dimension table. The custom rule further abstracts the preceding data so that the abstracted data is closer to business requirements; it is set according to the actual situation and is not limited here. This includes using the custom rule to abstract the data in order to determine the category of the data. For better understanding, an example follows:
For example, suppose the basic dimension data above is a user's total payment over a period of time. To make the data clearer and more useful for analysis, cluster labels are assigned to the user data: a user whose accumulated payment is 1-100 is a primary paying user, 100-500 an intermediate paying user, and so on. The group a user belongs to can then be determined from the user's total payment over that period. In addition, the result of abstraction processing can be used on its own for analysis, or can participate in multi-table cross analysis as dimension information, with the analysis results added to the dimension table as new dimension information for further analysis.
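The tiering in this example can be sketched as a lookup over sorted bounds. The ranges 1-100 and 100-500 in the text share the value 100, so this sketch assumes an inclusive upper bound (exactly 100 is "primary"); the "senior" label for payments above 500 is likewise an assumption standing in for "and so on".

```python
import bisect

# Inclusive upper bounds of each tier (assumption: boundary values go to the
# lower tier); TIER_LABELS has one more entry for everything above 500.
TIER_BOUNDS = [100, 500]
TIER_LABELS = ["primary", "intermediate", "senior"]

def pay_tier(total_pay):
    # bisect_left places a value equal to a bound in the lower tier.
    return TIER_LABELS[bisect.bisect_left(TIER_BOUNDS, total_pay)]
```

pay_tier(50) and pay_tier(100) give "primary", pay_tier(101) gives "intermediate", and pay_tier(800) gives "senior". The resulting label can then be stored back in the dimension table as a new dimension, as the text describes.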
After abstraction processing is performed on the basic dimension data to obtain data that facilitates business analysis, the abstracted data is stored in the data warehouse. Subsequently, when data is queried or called, it is retrieved directly from the data warehouse.
In this embodiment, the data is preferably processed offline, that is, analyzed at a set time point, for example at midnight every day.
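For the daily offline run at midnight, a scheduler only needs the next zero-o'clock after the current time. A minimal sketch using the standard datetime module; the function name next_midnight_run is hypothetical, and a production deployment would more likely rely on an existing scheduler.

```python
from datetime import datetime, timedelta

def next_midnight_run(now):
    """Return the next daily midnight run time strictly after `now`."""
    today_midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today_midnight + timedelta(days=1)
```

Called at 15:30 on 2016-12-29 it returns 2016-12-30 00:00, and it rolls over month and year boundaries correctly because the day is added with timedelta rather than by editing the date fields.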
In the data processing method proposed in this embodiment, the server first obtains data from the front-end page, then extracts basic dimension data from the obtained data and adds it to a preset dimension table, and then performs abstraction processing on the basic dimension data in the dimension table according to custom rules to obtain data that facilitates business analysis. When the abstracted data is subsequently analyzed, the actual situation reflected by the data can be analyzed more accurately, rather than the data being analyzed merely according to business conditions. Because the present invention first performs dimension extraction and abstraction processing before analyzing the data, subsequent data analysis is more accurate.
Further, a second embodiment of the data processing method of the present invention is proposed.
The second embodiment of the data processing method differs from the first embodiment in that, referring to Fig. 9, before step S30 the method further includes:
Step S40: extract the historical data pre-stored in the dimension table;
Step S30 includes:
Step S31: accumulate the basic dimension data in the dimension table with the historical data pre-stored in the dimension table;
Step S32: perform abstraction processing on the accumulated data according to custom rules, to obtain data that facilitates business analysis.
In this embodiment, before abstraction processing is performed on the basic dimension data, the accumulative ETL rules described above may also be used to accumulate the basic dimension data in the dimension table with the historical data, so that subsequent analysis results are more accurate. That is, before abstraction processing, the historical data pre-stored in the dimension table is extracted first, and the accumulative ETL rules then accumulate the basic dimension data in the dimension table with this historical data. For example, if user A's total payment up to the previous day is 100 yuan and user A pays another 10 yuan today, the accumulative ETL rule yields a historical total of 110 yuan for user A, which is stored in the corresponding accumulation table. Abstraction processing is then performed on the accumulated data according to custom rules to obtain data that facilitates business analysis.
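The accumulation in step S31 amounts to merging today's per-user amounts into the previous totals. A minimal sketch, assuming both the historical accumulation table and today's data are keyed by user ID; the function name accumulate is hypothetical.

```python
def accumulate(history, today):
    """Merge today's per-user payments into the historical totals."""
    totals = dict(history)  # copy so the previous ETL result stays unchanged
    for user_id, amount in today.items():
        totals[user_id] = totals.get(user_id, 0) + amount
    return totals
```

Reproducing the example in the text, accumulate({"A": 100}, {"A": 10}) gives {"A": 110}; users with no payment today keep their previous totals, and new users start from zero.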
In this embodiment, accumulating data with the accumulative ETL rules is also a form of offline processing: the current day's data is accumulated with the historical data, and abstraction processing is performed after accumulation.
This embodiment effectively maintains user-related information accumulated over history. The accumulative ETL rules combine the previous ETL results with the latest day's data to obtain the newest ETL results, and abstraction processing is then performed on the accumulated data according to custom rules. The analysis is therefore based not only on the currently acquired data but also on earlier historical data, which prevents inaccurate analysis results when one day's data differs greatly from the norm. Analyzing data together with historical data ensures the comprehensiveness and stability of the analysis and further improves its accuracy.
Further, a third embodiment of the data processing method of the present invention is proposed.
The third embodiment of the data processing method differs from the first embodiment in that, referring to Fig. 10, after step S10 the method further includes:
Step S50: if an online data processing instruction is received, determine an online processing rule, where the online processing rule includes several conditions and/or fields;
Step S60: perform online processing on the obtained data according to the determined online processing rule, to obtain online-processed data.
In this embodiment, after data is obtained from the front-end page, if an online data processing instruction is received, an online processing rule is determined first. The online processing rule is the personalized execution-chain ETL rule mentioned above. In this embodiment, the online processing rule is preferably composed of several conditions and/or fields, including: a rule composed of several conditions, a rule composed of several fields, or a rule composed of several conditions plus fields. After the online processing rule is determined, online processing is performed on the obtained data according to it. For example, the online processing rule select * from Male where areas = 'Fujian' and age > 25 and actions = 'PlayGames' selects males in Fujian Province who are older than 25 and play games; online processing is then performed according to this rule to filter out the data that satisfies it.
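An online rule built from several conditions can be sketched with the standard sqlite3 module standing in for the online query engine. The table and columns follow the example rule (Male, areas, age, actions); the user_id column, the (column, operator, value) condition format, and the whitelists are assumptions for illustration.

```python
import sqlite3

ALLOWED_COLUMNS = {"areas", "age", "actions"}
ALLOWED_OPS = {"=", "<", ">", "<=", ">="}

def run_online_rule(rows, conditions):
    """Apply an online rule, given as (column, operator, value) conditions
    joined with AND, directly to source rows (no other rule is referenced)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE male (user_id TEXT, areas TEXT, age INTEGER, actions TEXT)"
        )
        conn.executemany("INSERT INTO male VALUES (?, ?, ?, ?)", rows)
        clauses, params = [], []
        for column, op, value in conditions:
            # Whitelist names and operators; only values go through placeholders.
            if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
                raise ValueError(f"unsupported condition: {column} {op}")
            clauses.append(f"{column} {op} ?")
            params.append(value)
        where = " AND ".join(clauses) or "1"  # no conditions: keep every row
        sql = f"SELECT user_id FROM male WHERE {where} ORDER BY user_id"
        return [row[0] for row in conn.execute(sql, params)]
    finally:
        conn.close()
```

With the conditions [("areas", "=", "Fujian"), ("age", ">", 25), ("actions", "=", "PlayGames")], only rows matching all three are returned, mirroring the example rule above.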
Further, this embodiment also differs in that it provides a data processing method for cases that the generic logic of the two preceding embodiments cannot satisfy. That is, the online processing rule may differ from the rules of the generic logic, i.e., it may be a rule that the generic logic lacks. For example, traditional data analysis can only analyze the data of one terminal application at a time: the server can analyze only the data in WeChat, or only the data in QQ. This embodiment can perform combined analysis of both WeChat and QQ simultaneously, which is likewise realized through the conditions and/or fields in the online processing rule; that is, conditions and/or fields covering different terminal applications are set in the online processing rule. Of course, to merge and analyze the data of two different applications at the same time, the server must first be associated with the service ends of both applications before it can merge and analyze their data.
This embodiment provides an online data processing mode that enables both online analysis of data and combined analysis of data across different terminal applications, making data analysis more flexible.
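Combined analysis across applications can be sketched as an online rule whose condition names more than one application. This minimal illustration assumes the server is already associated with both service ends and that user IDs are linked across them; the record layout and the "minutes" field are hypothetical.

```python
from collections import defaultdict

def combined_analysis(records, apps, field):
    """Sum `field` per user over records from all apps named in the rule."""
    totals = defaultdict(float)
    for r in records:
        if r["app"] in apps:  # one condition covering several terminal apps
            totals[r["user_id"]] += r[field]
    return dict(totals)
```

With apps={"WeChat", "QQ"}, a user's WeChat and QQ records are summed into one figure, while records from any other application are ignored.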
Further, a fourth embodiment of the data processing method of the present invention is proposed.
The fourth embodiment of the data processing method differs from the first embodiment in that, referring to Fig. 11, after step S30 the data processing method further includes:
Step S70: perform report processing on the abstracted data in the form of a page report, to obtain report data;
Step S80: store the report data in an open-source database; when a report data display instruction is subsequently received, obtain the report data directly from the open-source database and display it on the report page in the form of a report.
In this embodiment, after abstraction processing is performed on the basic dimension data in the dimension table according to custom rules to obtain data that facilitates business analysis, the report ETL rules described above perform report processing on the abstracted data in the form of a page report to obtain report data. Report processing essentially integrates the data into report form according to the report's layout, producing report data, which is finally stored in the open-source database. Subsequently, when a report data display instruction is received, the report data is obtained directly from the open-source database and displayed on the report page in the form of a report.
It should be understood that each ETL rule in the three embodiments above stores its final execution results in the data warehouse as intermediate persisted calculation results, and these results exist as part of the data warehouse system.
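Report processing can be illustrated by turning per-user abstraction results (for example, the paying tiers from the first embodiment) into fixed report rows, one per tier, ready to be stored and rendered. The row shape (tier, user count) is an assumption; the actual report layout is not specified in the text.

```python
from collections import Counter

def build_report(tier_by_user, tiers):
    """Turn per-user tier labels into report rows: (tier, number of users)."""
    counts = Counter(tier_by_user.values())
    # Listing every tier keeps the report layout fixed even when a tier is empty.
    return [(tier, counts[tier]) for tier in tiers]
```

For three users labeled primary, intermediate, and primary, build_report(..., ["primary", "intermediate", "senior"]) yields [("primary", 2), ("intermediate", 1), ("senior", 0)].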
In this embodiment, report data can be stored directly in HBase Phoenix (an open-source, distributed, column-oriented database), and the reporting system can access Phoenix directly to obtain the corresponding results and present them graphically on the report page. The advantage of this form is that the final report results can be obtained quickly and presented directly on the page, and retrieving data this way is faster than retrieving it from the data warehouse.
It should be understood that the layers of ETL rules in the present invention have a strict inter-layer order dependency. For example, the accumulative ETL rules can only be executed after the basic ETL rules have finished. The essential reason is that the calculation rules of a later layer may use the calculation results of the preceding layers (including the most original data).
Similarly, the custom ETL rules can only be executed after the basic ETL rules have been executed, or after the accumulative ETL rules have been executed.
However, the personalized execution-chain ETL rules are online rules: they do not reference any other rule but fetch source data directly for analysis, so they do not depend on other rules.
The report ETL rules are likewise built on the rules of the layers above: they reference the basic ETL rules, the accumulative ETL rules, the custom concept ETL rules, or the personalized execution-chain ETL rules.
Although the reference relationships among the layers of ETL rules are described above, they apply within a single application. For different applications, execution of the ETL rules at each layer is independent. For example, if the first layer of APP1 has finished executing but the first layer of APP2 has not, APP1 can proceed to the second layer without waiting for the first layer of APP2 to complete.
Further, a fifth embodiment of the data processing method of the present invention is proposed.
The fifth embodiment of the data processing method differs from the first to fourth embodiments in that, referring to Fig. 12, the data processing method further includes:
Step S90: when data is obtained from the front-end page, query the capacity value of the data through a preset program, as well as the available capacity of other servers;
Step S100: if the queried capacity value exceeds a preset threshold, select a server whose available capacity is greater than the capacity value, and distribute the obtained data to the selected server through the preset program, so that the selected server performs the data processing operation.
In this embodiment, when the server obtains data from the front-end page, it first queries the capacity value of the data through the preset program (for example, the data obtained today amounts to 200M) and monitors the available capacity of the other associated servers. The queried capacity value is then compared with the preset threshold, which is set according to the actual situation. When the queried capacity value exceeds the preset threshold, a server whose available capacity is greater than the capacity value is selected, the obtained data is distributed to it through the preset program, and the selected server performs the data processing operation.
This embodiment is designed for the case where the ETL engine load on a single server is too high and execution must be distributed across multiple servers. That is, when a server's processing load is too high, the preset program can distribute the data collected from different applications to different servers, avoiding the loss of data processing efficiency that an overloaded server would cause.
It should be noted that herein, term " including ", "comprising" or its any other variant are intended to non-row His property is included, so that a series of process, method, article or system including key elements not only include those key elements, and And also include other key elements being not expressly set out, or also include for this process, method, article or system institute inherently Key element.In the absence of more restrictions, the key element for being limited by sentence "including a ...", it is not excluded that including being somebody's turn to do Also there is other identical element in the process of key element, method, article or system.
The embodiments of the present invention are for illustration only, do not represent the quality of embodiment.
Through the above description of the embodiments, those skilled in the art can be understood that above-described embodiment side Method can add the mode of required general hardware platform to realize by software, naturally it is also possible to by hardware, but in many cases The former is more preferably embodiment.Based on such understanding, technical scheme is substantially done to prior art in other words Going out the part of contribution can be embodied in the form of software product, and the computer software product is stored in a storage medium In (such as ROM/RAM, magnetic disc, CD), including some instructions are used so that a station terminal equipment (can be mobile phone, computer takes Business device, air-conditioner, or network equipment etc.) perform method described in each embodiment of the invention.
The preferred embodiments of the present invention are these are only, the scope of the claims of the present invention is not thereby limited, it is every using this Equivalent structure or equivalent flow conversion that bright description and accompanying drawing content are made, or directly or indirectly it is used in other related skills Art field, is included within the scope of the present invention.

Claims (10)

1. A data processing device, characterized in that it is applied to a server, the data processing device comprising:
an acquisition module, configured to obtain data from a front-end page;
an extraction and addition module, configured to extract basic dimension data from the obtained data and add the extracted basic dimension data to a preset dimension table;
an abstraction processing module, configured to perform abstraction processing on the basic dimension data in the dimension table according to custom rules, to obtain data that facilitates business analysis.
2. The data processing device according to claim 1, characterized in that the data processing device comprises:
an extraction module, configured to extract historical data pre-stored in the dimension table;
the abstraction processing module comprises:
an accumulation unit, configured to accumulate the basic dimension data in the dimension table with the historical data pre-stored in the dimension table;
an abstraction processing unit, configured to perform abstraction processing on the accumulated data according to custom rules, to obtain data that facilitates business analysis.
3. The data processing device according to claim 1, characterized in that the data processing device further comprises:
a determination module, configured to determine an online processing rule when an online data processing instruction is received, wherein the online processing rule includes several conditions and/or fields;
an online processing module, configured to perform online processing on the obtained data according to the determined online processing rule, to obtain online-processed data.
4. The data processing device according to claim 1, characterized in that the data processing device further comprises:
a report processing module, configured to perform report processing on the abstracted data in the form of a page report, to obtain report data;
a storage module, configured to store the report data in an open-source database and, when a report data display instruction is subsequently received, obtain the report data directly from the open-source database and display it on a report page in the form of a report.
5. The data processing device according to any one of claims 1-4, characterized in that the data processing device further comprises:
a query module, configured to, when data is obtained from the front-end page, query the capacity value of the data through a preset program, as well as the available capacity of other servers;
a selection and distribution module, configured to, if the queried capacity value exceeds a preset threshold, select a server whose available capacity is greater than the capacity value, and distribute the obtained data to the selected server through the preset program, so that the selected server performs the data processing operation.
6. A data processing method, characterized in that it is applied to a server, the data processing method comprising:
the server obtaining data from a front-end page;
extracting basic dimension data from the obtained data, and adding the extracted basic dimension data to a preset dimension table;
performing abstraction processing on the basic dimension data in the dimension table according to custom rules, to obtain data that facilitates business analysis.
7. The data processing method according to claim 6, characterized in that, before the step of performing abstraction processing on the basic dimension data in the dimension table according to custom rules, the method further comprises the step of:
extracting historical data pre-stored in the dimension table;
the step of performing abstraction processing on the basic dimension data in the dimension table according to custom rules, to obtain data that facilitates business analysis, comprises:
accumulating the basic dimension data in the dimension table with the historical data pre-stored in the dimension table;
performing abstraction processing on the accumulated data according to custom rules, to obtain data that facilitates business analysis.
8. The data processing method according to claim 6, characterized in that, after the step of the server obtaining data from a front-end page, the data processing method further comprises:
if an online data processing instruction is received, determining an online processing rule, wherein the online processing rule includes several conditions and/or fields;
performing online processing on the obtained data according to the determined online processing rule, to obtain online-processed data.
9. The data processing method according to claim 6, characterized in that, after the step of performing abstraction processing on the basic dimension data in the dimension table according to custom rules to obtain data that facilitates business analysis, the data processing method further comprises:
performing report processing on the abstracted data in the form of a page report, to obtain report data;
storing the report data in an open-source database; when a report data display instruction is subsequently received, obtaining the report data directly from the open-source database and displaying it on a report page in the form of a report.
10. The data processing method according to any one of claims 6-9, characterized in that the data processing method further comprises:
when data is obtained from the front-end page, querying the capacity value of the data through a preset program, as well as the available capacity of other servers;
if the queried capacity value exceeds a preset threshold, selecting a server whose available capacity is greater than the capacity value, and distributing the obtained data to the selected server through the preset program, so that the selected server performs the data processing operation.
CN201611255473.1A 2016-12-29 2016-12-29 Device and method for data processing Pending CN106682205A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611255473.1A CN106682205A (en) 2016-12-29 2016-12-29 Device and method for data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611255473.1A CN106682205A (en) 2016-12-29 2016-12-29 Device and method for data processing

Publications (1)

Publication Number Publication Date
CN106682205A true CN106682205A (en) 2017-05-17

Family

ID=58872730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611255473.1A Pending CN106682205A (en) 2016-12-29 2016-12-29 Device and method for data processing

Country Status (1)

Country Link
CN (1) CN106682205A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102456050A (en) * 2010-10-27 2012-05-16 中国移动通信集团四川有限公司 Method and device for extracting data from webpage
CN102567539A (en) * 2011-12-31 2012-07-11 北京新媒传信科技有限公司 Intelligent WEB report implementation method and intelligent WEB report implementation system
US20120221511A1 (en) * 2007-11-02 2012-08-30 International Business Machines Corporation System and method for analyzing data in a report
US20120240064A1 (en) * 2011-03-15 2012-09-20 Oracle International Corporation Visualization and interaction with financial data using sunburst visualization
CN103473338A (en) * 2013-09-22 2013-12-25 北京奇虎科技有限公司 Webpage content extraction method and webpage content extraction system
CN104408179A (en) * 2014-12-15 2015-03-11 北京国双科技有限公司 Method and device for processing data from data table


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107580015A (en) * 2017-07-26 2018-01-12 阿里巴巴集团控股有限公司 Data processing method and device, server
CN112905593A (en) * 2021-03-04 2021-06-04 天九共享网络科技集团有限公司 Report generation method, device, medium and electronic equipment
CN112905593B (en) * 2021-03-04 2024-02-02 天九共享网络科技集团有限公司 Report generation method, report generation device, report generation medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN102708130B (en) Calculate the easily extensible engine that fine point of user is mated for offer
CN107038200A (en) Business data processing method and system
CN103473238B (en) Dispense address location system and method
CN108509583A (en) A kind of information-pushing method, server and computer readable storage medium
CN110347724A (en) Abnormal behaviour recognition methods, device, electronic equipment and medium
Shin et al. Forecasting the video data traffic of 5G services in South Korea
CN103761228B (en) The rank threshold of application program determines that method and rank threshold determine system
CN110288350A (en) User's Value Prediction Methods, device, equipment and storage medium
CN103250376A (en) Method and system for carrying out predictive analysis relating to nodes of a communication network
CN109615172A (en) A kind of method and terminal handling examination data
CN110852559A (en) Resource allocation method and device, storage medium and electronic device
CN104182544B (en) The dimension method for decomposing and device of analytical database
CN103218411B (en) Website related information acquisition methods and device
CN107977855B (en) Method and device for managing user information
CN108960672A (en) The air control method, apparatus and computer readable storage medium of limit limit time
CN104484435A (en) Method for cross-over analysis of user behavior
CN109190027A (en) Multi-source recommended method, terminal, server, computer equipment, readable medium
CN107133339A (en) Circuit query method and apparatus and storage medium, processor
CN106682205A (en) Device and method for data processing
CN111859115B (en) User allocation method and system, data processing equipment and user allocation equipment
CN110428278A (en) Determine the method and device of resource share
CN109657950A (en) Hierarchy Analysis Method, device, equipment and computer readable storage medium
CN110210884A (en) Determine the method, apparatus, computer equipment and storage medium of user characteristic data
CN111951035B (en) Consumption analysis method, system, device and platform
CN107066602A (en) A kind of news information method for pushing and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170517