CN105844107B - Data processing method and device - Google Patents
- Publication number: CN105844107B
- Application number: CN201610197491.2A
- Authority
- CN
- China
- Prior art keywords
- data
- analyzed
- aggregation
- computation complexity
- sampling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16Z—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS, NOT OTHERWISE PROVIDED FOR
- G16Z99/00—Subject matter not provided for in other main groups of this subclass
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
This application discloses a data processing method and device. One specific embodiment of the method includes: obtaining data to be analyzed and the aggregation computation type of the data to be analyzed; sampling the data to be analyzed based on the aggregation computation type and a preset computation complexity to obtain sampled data; performing the aggregation computation on the sampled data; and displaying the aggregation result of the sampled data. This embodiment can quickly provide a partial data analysis result that has reference value, improving the efficiency with which aggregation results for large-scale data are displayed.
Description
Technical field
This application relates to the field of computer technology, specifically to the technical field of telecommunications, and in particular to a data processing method and device.
Background art
With the development of Internet technology, more and more network data is being generated. A back-end data analysis server can perform aggregation analysis on the generated network data to obtain statistical information about network behavior across a large volume of data. Typically, the back-end server presents the aggregation result to the user only after the aggregation computation over all the data to be analyzed has completed.
For ultra-large-scale network data, the limits on server system resources and computing power mean that the aggregation operation takes a long time, so the aggregation result cannot be displayed in real time. In that case the result page stalls while waiting for the result to return, and statistical analysis results are provided to the user inefficiently.
Summary of the invention
In view of this, it is desirable to provide a data analysis and processing method that can display aggregation results quickly. To solve the above technical problem, this application provides a data processing method and device.
In a first aspect, this application provides a data processing method, comprising: obtaining data to be analyzed and the aggregation computation type of the data to be analyzed; sampling the data to be analyzed based on the aggregation computation type and a preset computation complexity to obtain sampled data; performing the aggregation computation on the sampled data; and displaying the aggregation result of the sampled data.
In some optional implementations, sampling the data to be analyzed based on the aggregation computation type and the preset computation complexity to obtain sampled data comprises: determining the sample data volume of the data to be analyzed based on the aggregation computation type and the preset computation complexity; and extracting the sampled data from the data to be analyzed according to the sample data volume.
In some optional implementations, determining the sample data volume of the data to be analyzed based on the aggregation computation type and the preset computation complexity comprises: inputting the aggregation computation type of the data to be analyzed into a trained first computation complexity model to obtain a first relationship model between the computation complexity of the data to be analyzed and the volume of data to be analyzed; and determining, according to the first relationship model, the volume of data to be analyzed that corresponds to the preset computation complexity, as the sample data volume.
In some optional implementations, the method further comprises a step of training the first computation complexity model, comprising: obtaining historical data analysis records, the records including the data volume of at least one historical data set together with the corresponding historical computation complexity and historical aggregation computation type; and training on the historical data analysis records to obtain the first computation complexity model.
In some optional implementations, the method further comprises: obtaining the available computing resource surplus. In this case, determining the sample data volume of the data to be analyzed based on the aggregation computation type and the preset computation complexity comprises: inputting the aggregation computation type of the data to be analyzed and the computing resource surplus into a trained second computation complexity model to obtain a second relationship model between the computation complexity of the data to be analyzed and the volume of data to be analyzed; and determining, according to the second relationship model, the volume of data to be analyzed that corresponds to the preset computation complexity, as the sample data volume.
In some optional implementations, the method further comprises a step of training the second computation complexity model, comprising: obtaining historical data analysis records, the records including the data volume of at least one historical data set together with the corresponding historical computation complexity, historical computing resource surplus and historical aggregation computation type; and training on the historical data analysis records to obtain the second computation complexity model.
In some optional implementations, the computation complexity comprises: the time consumed by the computation and/or the amount of resources required for the computation.
In a second aspect, this application provides a data processing device, comprising: a first acquisition unit, configured to obtain data to be analyzed and the aggregation computation type of the data to be analyzed; a sampling unit, configured to sample the data to be analyzed based on the aggregation computation type and a preset computation complexity to obtain sampled data; a computing unit, configured to perform the aggregation computation on the sampled data; and a display unit, configured to display the aggregation result of the sampled data.
In some optional implementations, the sampling unit is configured to sample the data to be analyzed and obtain sampled data as follows: determining the sample data volume of the data to be analyzed based on the aggregation computation type and the preset computation complexity; and extracting the sampled data from the data to be analyzed according to the sample data volume.
In some optional implementations, the sampling unit is further configured to determine the sample data volume of the data to be analyzed as follows: inputting the aggregation computation type of the data to be analyzed into a trained first computation complexity model to obtain a first relationship model between the computation complexity of the data to be analyzed and the volume of data to be analyzed; and determining, according to the first relationship model, the volume of data to be analyzed that corresponds to the preset computation complexity, as the sample data volume.
In some optional implementations, the device further comprises a first training unit, configured to train the first computation complexity model as follows: obtaining historical data analysis records, the records including the data volume of at least one historical data set together with the corresponding historical computation complexity and historical aggregation computation type; and training on the historical data analysis records to obtain the first computation complexity model.
In some optional implementations, the device further comprises a second acquisition unit, configured to obtain the available computing resource surplus. The sampling unit is further configured to determine the sample data volume of the data to be analyzed as follows: inputting the aggregation computation type of the data to be analyzed and the computing resource surplus into a trained second computation complexity model to obtain a second relationship model between the computation complexity of the data to be analyzed and the volume of data to be analyzed; and determining, according to the second relationship model, the volume of data to be analyzed that corresponds to the preset computation complexity, as the sample data volume.
In some optional implementations, the device further comprises a second training unit, configured to train the second computation complexity model as follows: obtaining historical data analysis records, the records including the data volume of at least one historical data set together with the corresponding historical computation complexity, historical computing resource surplus and historical aggregation computation type; and training on the historical data analysis records to obtain the second computation complexity model.
In some optional implementations, the computation complexity comprises: the time consumed by the computation and/or the amount of resources required for the computation.
The data processing method and device provided by this application obtain data to be analyzed and its aggregation computation type, sample the data to be analyzed based on the aggregation computation type and a preset computation complexity to obtain sampled data, perform the aggregation computation on the sampled data, and finally display the aggregation result of the sampled data. A partial data analysis result with reference value can thus be provided quickly, improving the efficiency with which aggregation results for large-scale data are displayed.
Brief description of the drawings
Other features, objects and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture to which this application may be applied;
Fig. 2 is a flowchart of one embodiment of the data processing method according to this application;
Fig. 3 is a schematic diagram of the principle of the data processing method according to this application;
Fig. 4 is a flowchart of another embodiment of the data processing method according to this application;
Fig. 5 is a flowchart of yet another embodiment of the data processing method according to this application;
Fig. 6 is a structural schematic diagram of one embodiment of the data processing device of this application;
Fig. 7 is a structural schematic diagram of a computer system suitable for implementing the terminal device or server of the embodiments of this application.
Detailed description of the embodiments
This application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, provided there is no conflict, the embodiments of this application and the features in the embodiments may be combined with one another. This application is described in detail below with reference to the drawings and in conjunction with the embodiments.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support network service applications, including but not limited to smartphones, tablet computers, smart watches, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers.
It should be noted that the data processing method provided by the embodiments of this application is generally executed by the server 105; accordingly, the data processing device is generally located in the server 105.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are only illustrative. There may be any number of terminal devices, networks and servers, as the implementation requires.
With continued reference to Fig. 2, a flow 200 of one embodiment of the data processing method according to this application is shown. The data processing method comprises the following steps:
Step 201: obtain the data to be analyzed and the aggregation computation type of the data to be analyzed.
In this embodiment, the electronic device on which the data processing method runs (for example, the server 105 shown in Fig. 1) may obtain the data to be analyzed from multiple terminal devices. The data to be analyzed may include the network behavior data of the terminal devices, for example data from network access applications such as web page browsing, map retrieval and audio/video playback performed by users on the terminal devices. The electronic device may also obtain the aggregation computation type of the data to be analyzed from the configured aggregation algorithm. For example, when the aggregation algorithm performs n accumulations and m averagings, the aggregation computation types may include summation (sum) and averaging (average).
In general, when a user accesses the network through a terminal device, the terminal device can record the behavior data of that network access and store the user's network behavior data in a network log. For example, when the user browses a web page, the terminal device can record the address of the page, the browsing time and the operations performed on the page (such as clicks and typed text) in a web browsing log. The electronic device can obtain the logs of the terminal devices as the data to be analyzed. It should be noted that, in this embodiment, the electronic device may obtain the large volume of data to be analyzed saved by multiple terminals, or may selectively obtain only part of the network logs as the data to be analyzed, for example the network logs from the most recent month.
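The selective acquisition described above, taking only the most recent month of logs, could look like the following minimal sketch. The entry layout (a dict with `url` and `time` fields), the function name `recent_logs` and the 30-day window are assumptions made for illustration; the source does not specify a log format.

```python
from datetime import datetime, timedelta

def recent_logs(logs, now, days=30):
    """Keep only log entries recorded within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [entry for entry in logs if entry["time"] >= cutoff]

# Hypothetical log entries; the field names are illustrative only.
logs = [
    {"url": "example.com/a", "time": datetime(2016, 1, 5)},
    {"url": "example.com/b", "time": datetime(2016, 3, 20)},
    {"url": "example.com/c", "time": datetime(2016, 3, 30)},
]
selected = recent_logs(logs, now=datetime(2016, 3, 31))
```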
In some optional implementations, a terminal device that records user network behavior data may report its network logs over the network to the electronic device on which the data processing method runs. The electronic device may also send a network log collection request to each terminal device over the network and actively obtain the network logs from the terminal devices. It should be noted that the network connection may include, but is not limited to, wireless connections such as 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee and UWB (ultra wideband) connections, and may also include wired connections.
In some embodiments, the electronic device may also obtain user network access data directly on the back end. For example, a web server can obtain the web page addresses and time information of the web page data that users request through terminal devices.
The aggregation computation type may be set manually; for example, a data analyst can specify which operations are to be performed on the data to be analyzed. The aggregation computation type may also be determined from the aggregation result that is required. For example, when the trend of web page visits needs to be tracked, the aggregation computation type may be accumulation. There may be several aggregation computation types, in which case the obtained aggregation computation type may also include the number of computations of each type.
Step 202: sample the data to be analyzed based on the aggregation computation type and a preset computation complexity to obtain sampled data.
In this embodiment, the electronic device may determine the computation complexity of the data to be analyzed from the aggregation computation type obtained in step 201, determine a sampling rate from the computation complexity of the data to be analyzed and the preset computation complexity, and then sample the data to be analyzed at that sampling rate to obtain the sampled data.
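The source does not give a formula for the sampling rate. As a minimal sketch, assuming (and this is only an assumption) that computation complexity grows in proportion to data volume, the rate can be taken as the ratio of the preset complexity to the full-data complexity, capped at 1. The function name `sampling_rate` is hypothetical.

```python
def sampling_rate(full_complexity, preset_complexity):
    """Fraction of the data to keep so the work fits the preset budget.

    Assumes complexity is proportional to data volume (an illustrative
    simplification, not something the source states).
    """
    if full_complexity <= 0:
        raise ValueError("full_complexity must be positive")
    return min(1.0, preset_complexity / full_complexity)

# Full aggregation would take 1000 s; the result is wanted within 10 s.
rate = sampling_rate(full_complexity=1000.0, preset_complexity=10.0)
```

When the full computation already fits the budget, the rate is 1 and no sampling is needed.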
In some optional implementations, the computation complexity of the data to be analyzed may be calculated as follows: set a complexity for each aggregation computation type of the data to be analyzed, then accumulate those complexities according to the number of times each aggregation computation type is performed; the resulting total is the computation complexity of the data to be analyzed.
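The accumulation just described can be sketched as a weighted sum. The per-type complexity values below are made-up numbers for illustration, and `total_complexity` is a hypothetical helper name.

```python
def total_complexity(type_counts, unit_complexity):
    """Sum each aggregation type's complexity times the number of times it runs."""
    return sum(unit_complexity[agg_type] * count
               for agg_type, count in type_counts.items())

# Hypothetical per-type complexities (arbitrary units).
unit_complexity = {"sum": 1.0, "average": 2.0}
# An algorithm that performs 3 summations and 2 averagings.
type_counts = {"sum": 3, "average": 2}

complexity = total_complexity(type_counts, unit_complexity)
```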
In some optional implementations, the computation complexity may include the time consumed by the computation and/or the amount of resources required for the computation. The computation time is the duration consumed by the aggregation computation. The required resources are the resources occupied by the aggregation computation, including the amount of memory; for example, the aggregation computation may occupy 1 CPU + 16 GB of memory. Optionally, the required resources may include the storage space occupied by the data to be analyzed, the storage space occupied by the aggregation computation, and the storage space occupied by the aggregation result.
In some implementations, the computation complexity of the data to be analyzed may also be calculated using an empirical formula, in which the computation complexity depends on the types and number of the aggregation computations.
The preset computation complexity may be set according to the time within which the sampling result is to be displayed. If the computation complexity is the computation time, the display time of the sampling result can be taken as the preset computation complexity. For example, if the user needs the sampling result within 5 seconds, the preset computation complexity may be 5 seconds. If the computation complexity is the amount of resources required, the computation complexity corresponding to the display time can be determined from the positive correlation between computation time and required resources.
The volume of data to be computed is positively correlated with the computation complexity. In this embodiment, the electronic device may determine the volume of data to be computed that corresponds to the preset computation complexity and use it as the sample data volume. For example, suppose the volume of the data to be analyzed is 2^40, its computation complexity is 1000 seconds, and the sampling result must be displayed within 10 seconds; if the computation complexity for a data volume of 2^20 is 10 seconds, the sample data volume can be determined to be 2^20. In some embodiments, the correspondence between computation complexity and data volume may be determined from historical computation data, after which the data volume corresponding to the preset computation complexity is taken as the sample data volume.
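One simple way to use historical computation data for this lookup is sketched below: pick the largest recorded data volume whose measured complexity still fits the preset budget. The endpoints 2^20 (10 s) and 2^40 (1000 s) come from the example in the text; the other history entries and the helper name `sample_volume_for` are assumptions for illustration.

```python
def sample_volume_for(history, preset_complexity):
    """Return the largest recorded data volume whose complexity fits the budget."""
    fitting = [volume for volume, complexity in history
               if complexity <= preset_complexity]
    return max(fitting) if fitting else 0

# (data volume, computation time in seconds); intermediate rows are hypothetical.
history = [(2**10, 0.5), (2**20, 10.0), (2**30, 200.0), (2**40, 1000.0)]

sample_volume = sample_volume_for(history, preset_complexity=10.0)
```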
After the sample data volume has been determined, that volume of data can be extracted from the data to be analyzed as the sampled data using any of several sampling methods. The sampling methods may include, but are not limited to, random sampling, cluster sampling and stratified sampling.
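Two of the sampling methods named above can be sketched with the standard `random` module. The record layout, the helper names and the proportional allocation used for the strata are illustrative assumptions; cluster sampling is not shown.

```python
import random

def simple_random_sample(records, k, seed=0):
    """Draw k records uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))

def stratified_sample(records, key, k, seed=0):
    """Draw roughly k records, allocating them to strata in proportion to size."""
    rng = random.Random(seed)
    strata = {}
    for record in records:
        strata.setdefault(key(record), []).append(record)
    sample = []
    for group in strata.values():
        # Proportional share, with at least one record per stratum.
        share = max(1, round(k * len(group) / len(records)))
        sample.extend(rng.sample(group, min(share, len(group))))
    return sample

# Hypothetical behavior records: 80 from web browsing, 20 from map retrieval.
records = ([{"app": "web", "value": i} for i in range(80)]
           + [{"app": "map", "value": i} for i in range(20)])
random_part = simple_random_sample(records, 10)
stratified_part = stratified_sample(records, key=lambda r: r["app"], k=10)
```

Stratifying by application keeps small groups represented in the sample, which plain random sampling does not guarantee.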
Step 203: perform the aggregation computation on the sampled data.
In this embodiment, the electronic device may perform the aggregation computation on the sampled data according to preset aggregation computation rules. The aggregation computation may include several kinds of aggregate functions; each aggregate function performs a computation on a group of data in the sampled data and returns a single value. An aggregate function may be a function the user defines as needed, or a statistical analysis function stored in the memory of the electronic device. Aggregate functions include, for example, AVG (returns the mean), COUNT (returns the count), MAX (returns the maximum), MIN (returns the minimum), SUM (returns the sum) and VAR (returns the statistical variance).
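The listed aggregate functions map naturally onto Python built-ins and the standard `statistics` module, as in the sketch below. The dispatch table and the `aggregate` helper are hypothetical, and treating VAR as the population variance is an assumption (use `statistics.variance` for the sample variance).

```python
import statistics

# Each aggregate function reduces a group of values to a single value.
AGGREGATES = {
    "AVG": statistics.mean,
    "COUNT": len,
    "MAX": max,
    "MIN": min,
    "SUM": sum,
    "VAR": statistics.pvariance,  # population variance (assumption)
}

def aggregate(values, names):
    """Apply the named aggregate functions to one group of values."""
    return {name: AGGREGATES[name](values) for name in names}

result = aggregate([2, 4, 4, 6], ["AVG", "COUNT", "MAX", "MIN", "SUM", "VAR"])
```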
In some embodiments, the aggregation computation may be performed on the sampled data and on the data to be analyzed at the same time. The aggregation computation performed on the data to be analyzed may be the same as that performed on the sampled data, or the two may be chosen according to what needs to be displayed in real time. For example, when the user needs to know the sum of the data quickly, the summation can be performed on the sampled data while AVG, COUNT, MAX, MIN, SUM, VAR and other operations are performed on the data to be analyzed.
In some optional implementations, to obtain the aggregation result of the sampled data as early as possible, the aggregation computation on the sampled data may be given priority, and the aggregation computation on the data to be analyzed started only after the computation on the sampled data has finished.
Step 204: display the aggregation result of the sampled data.
In this embodiment, the electronic device may be configured with a visual interface in which the aggregation result of the sampled data is displayed. The user can obtain the aggregation result through the visual interface. The electronic device may also be connected to another display device and display the aggregation result of the sampled data on that device. Displaying the aggregation result of the sampled data quickly gives the user a rough statistical analysis result for the data to be analyzed.
In some embodiments, once the aggregation computation on the data to be analyzed has completed, its aggregation result may also be displayed in the visual interface. In this way the user obtains both a real-time preliminary statistical result and an exact statistical result, which improves the efficiency of obtaining information.
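The "sample first, full data afterwards" ordering can be sketched as a generator that yields the preliminary result before starting the full computation. This is a single-threaded illustration under assumed names (`staged_results`, `mean`); a real system would more likely run the full aggregation as a background job.

```python
def staged_results(data, sample, aggregate_fn):
    """Yield the sample-based result first, then the exact result."""
    # The sampled data is aggregated first so it can be shown immediately.
    yield "preliminary", aggregate_fn(sample)
    # The full aggregation only starts once the consumer asks for more.
    yield "exact", aggregate_fn(data)

def mean(values):
    return sum(values) / len(values)

data = list(range(10))
sample = [1, 5, 9]
shown = list(staged_results(data, sample, mean))
```

A display loop would render each yielded pair as it arrives, replacing the preliminary value with the exact one.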
Referring to Fig. 3, a schematic diagram of the principle of the data processing method according to this application is shown. As shown in Fig. 3, after a large volume of data to be analyzed 301 has been obtained, it can be sampled based on the computation complexity of the data to be analyzed to obtain sampled data 302. The aggregation computation can then be performed on the sampled data 302, and its aggregation result shown in the display interface 303. Meanwhile, the aggregation computation can be performed on the data to be analyzed 301, and once it completes, the aggregation result of the data to be analyzed 301 is also shown in the display interface 303.
The data analysis method provided by the above embodiment of this application samples the data to be analyzed based on computation complexity and displays the aggregation result of the sampled data. It can therefore provide a preliminary data analysis result quickly and in real time, improving the efficiency with which aggregation results for large-scale data are displayed.
In some optional implementations of the above embodiment, when sampling in step 202, the sample data volume of the data to be analyzed may be determined based on the aggregation computation type and the preset computation complexity, and the sampled data then extracted from the data to be analyzed according to the sample data volume. The sample data volume of the data to be analyzed can be determined in several ways. Methods for determining it are described further below with reference to Fig. 4 and Fig. 5.
With further reference to Fig. 4, a flow 400 of another embodiment of the data processing method according to this application is shown. As shown in Fig. 4, the flow 400 of the data processing method comprises the following steps:
Step 401: obtain the data to be analyzed and the aggregation computation type of the data to be analyzed.
In this embodiment, the electronic device on which the data processing method runs (for example, the server 105 shown in Fig. 1) may obtain the data to be analyzed from multiple terminal devices. The data to be analyzed may include the network behavior data of the terminal devices. The electronic device may also obtain the aggregation computation type of the data to be analyzed from the configured aggregation algorithm. The aggregation computation type may be set manually, or may be set according to the aggregation result that is required. In some embodiments, the number of computations of each aggregation type may also be obtained.
In some embodiments, the electronic device may obtain user network access data directly on the back end. For example, a web server can obtain the web page addresses and time information of the web page data that users request through terminal devices.
Step 402: input the aggregation computation type of the data to be analyzed into the trained first computation complexity model to obtain a first relationship model between the computation complexity of the data to be analyzed and the volume of data to be analyzed.
In this embodiment, the sample data volume may be determined using machine learning. Specifically, the first relationship model between computation complexity and the volume of data to be analyzed can be determined from the first computation complexity model. The first relationship model may be a mathematical expression. The first computation complexity model may be a preset model whose input is the aggregation computation type and whose output is the relational expression between computation complexity and the volume of data to be analyzed. The input aggregation computation type may include the count for each kind of aggregation computation, that is, the number of times each aggregation computation type is performed. The computation complexity here may be the computation time. Inputting the aggregation computation type of the data to be analyzed into the trained first computation complexity model then yields an expression relating computation time to the volume of data to be analyzed.
Step 403: determine, according to the first relationship model, the volume of data to be analyzed that corresponds to the preset computation complexity, as the sample data volume.
In this embodiment, after the first relationship model between computation complexity and the volume of data to be analyzed has been obtained, the volume of data to be analyzed that corresponds to the preset computation complexity can be determined from it. Specifically, if the first relationship model is a mathematical expression relating computation complexity to the volume of data to be analyzed, the corresponding volume of data to be analyzed can be calculated from that expression and the preset computation complexity, and used as the sample data volume.
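Solving the relationship expression for the data volume can be done numerically when a closed-form inverse is inconvenient. The sketch below assumes only that the relationship model is a non-decreasing function of data volume and uses binary search; the linear model and its coefficients are made-up values, and `max_volume_within` is a hypothetical helper.

```python
def max_volume_within(model, preset_complexity, upper):
    """Largest volume n in [0, upper] with model(n) <= preset_complexity.

    Assumes `model` is non-decreasing in n. Returns 0 if nothing fits.
    """
    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if model(mid) <= preset_complexity:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Hypothetical relationship model: 0.5 s fixed cost plus 10 microseconds per record.
def relation(n):
    return 0.5 + 1e-5 * n

sample_volume = max_volume_within(relation, preset_complexity=5.0, upper=2**40)
```

With these illustrative coefficients a 5-second budget corresponds to roughly 450,000 records.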
Step 404: extract the sampled data from the data to be analyzed according to the sample data volume.
The electronic device may extract from the data to be analyzed an amount of data equal to the sample data volume determined in step 403, as the sampled data. In some optional implementations, a sampling rate may instead be obtained as the ratio of the calculated volume to the total volume of the data to be analyzed, and the data then sampled at that rate to obtain the sampled data.
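Sampling by rate can be sketched as an independent keep-or-drop decision per record, which suits data that is streamed rather than held in memory. The function name `sample_by_rate` and the fixed seed are assumptions; the size of the resulting sample is only approximately the target volume.

```python
import random

def sample_by_rate(records, sample_volume, seed=0):
    """Keep each record with probability sample_volume / total volume."""
    if not records:
        return 0.0, []
    rate = min(1.0, sample_volume / len(records))
    rng = random.Random(seed)
    return rate, [record for record in records if rng.random() < rate]

records = list(range(1000))
rate, sampled = sample_by_rate(records, sample_volume=100)
```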
Step 405: perform the aggregation computation on the sampled data.
In this embodiment, the electronic device may perform the aggregation computation on the sampled data. The aggregation computation may include the various computations used in statistical data analysis, such as summing, averaging, computing the mean square deviation, and taking the maximum or minimum.
In some embodiments, the aggregation computation may be performed on the sampled data and on the data to be analyzed at the same time. In other optional implementations, to obtain the aggregation result of the sampled data as early as possible, the aggregation computation on the sampled data may be given priority, and the aggregation computation on the data to be analyzed started only after the computation on the sampled data has finished.
Step 406: display the aggregation result of the sampled data.
In this embodiment, the aggregation result of the sampled data may be displayed in the visual interface configured on the electronic device or on a connected display device. Displaying the aggregation result of the sampled data quickly gives the user a rough statistical analysis result for the data to be analyzed.
In some optional implementations, the data processing method may further include a step of training the first computation complexity model, comprising: obtaining historical data analysis records, and training on them to obtain the first computation complexity model. The historical data analysis records include the data volume of at least one historical data set together with the corresponding historical computation complexity and historical aggregation computation type. The electronic device may obtain the historical data analysis records from memory, build a training set and a test set from them, train the first computation complexity model on the training set, and then adjust the parameters of the model based on the test set.
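The source does not specify the form of the first computation complexity model. The sketch below assumes, purely for illustration, that computation time is linear in data volume for each aggregation type, fits that line by least squares on a training split, and reports the worst error on a held-out test split. All names and the sample records are hypothetical.

```python
def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def train_complexity_model(records, test_every=4):
    """Fit one line per aggregation type; every `test_every`-th record is held out."""
    train, test = {}, {}
    for i, (agg_type, volume, seconds) in enumerate(records):
        bucket = test if i % test_every == test_every - 1 else train
        bucket.setdefault(agg_type, []).append((volume, seconds))
    # The model maps an aggregation type to its (slope, intercept) relation.
    model = {agg_type: fit_line(points) for agg_type, points in train.items()}
    errors = {}
    for agg_type, points in test.items():
        if agg_type in model:
            slope, intercept = model[agg_type]
            errors[agg_type] = max(abs(slope * x + intercept - y) for x, y in points)
    return model, errors

# Hypothetical history: (aggregation type, data volume, computation time in seconds).
history = [("sum", v, 0.5 + 0.001 * v) for v in range(1000, 9000, 1000)]
model, errors = train_complexity_model(history)
slope, intercept = model["sum"]
```

A large held-out error would be the signal to revise the model's parameters or its functional form, which is the role the text gives to the test set.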
In this embodiment, steps 401, 405 and 406 of the above flow are the same as steps 201, 203 and 204 of the previous embodiment, respectively, and are not described again here.
Compared with the embodiment shown in Fig. 2, the flow 400 of the data processing method shown in Fig. 4 refines the step of sampling the data to be analyzed based on the aggregation calculation type and the preset computation complexity: the sampled data volume is determined by a trained model, which further improves the reliability of the aggregation calculation result of the sampled data.
With further reference to Fig. 5, a flow chart of yet another embodiment of the data processing method according to the present application is shown. The flow 500 of the data processing method comprises the following steps:
In this embodiment, the electronic device on which the data processing method runs (for example, the server 105 shown in Fig. 1) may obtain the data to be analyzed from multiple terminal devices. The data to be analyzed may include network behavior data reported by the terminal devices. The electronic device may also obtain the aggregation calculation type of the data to be analyzed from a manual setting or from an aggregation algorithm determined by the requirements on the aggregation result.
In this embodiment, the electronic device may calculate its current computing resource surplus to obtain the available resource surplus. Computing resources may be the number of CPUs (Central Processing Units) and the amount of memory; for example, the total computing resources of the electronic device may be 1 CPU + 4 GB of memory plus 2 CPUs + 8 GB of memory.
In some optional implementations, the electronic device may first determine its total resources, then determine the amount of resources occupied by other running programs, and subtract the latter from the former to obtain the available resource surplus. For example, when the total computing resources of the electronic device are 1 CPU + 4 GB of memory plus 2 CPUs + 8 GB of memory, and other applications occupy 2 CPUs + 8 GB of memory in total, the available resource surplus is 1 CPU + 4 GB of memory.
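As a non-limiting illustration, the subtraction in the example above can be written as the following Python sketch. The `Resources` class and its field names are hypothetical and serve only to reproduce the arithmetic of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpus: int
    memory_gb: int

    def __add__(self, other):
        return Resources(self.cpus + other.cpus, self.memory_gb + other.memory_gb)

    def __sub__(self, other):
        # A surplus cannot be negative, so clamp each component at zero.
        return Resources(
            max(0, self.cpus - other.cpus),
            max(0, self.memory_gb - other.memory_gb),
        )


total = Resources(1, 4) + Resources(2, 8)  # total computing resources
in_use = Resources(2, 8)                   # occupied by other applications
available = total - in_use                 # available resource surplus
```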
In this embodiment, the sampled data volume may be determined using machine learning. Specifically, a second relational model between the computation complexity and the data volume to be analyzed may be determined based on a second computation complexity model. The second relational model may be a mathematical expression, and the computation complexity here may include the calculation time and the amount of resources required for the calculation. The second computation complexity model may be a preset model whose input is the aggregation calculation type and whose output is a relational expression among the calculation time, the amount of resources required for the calculation, and the data volume to be analyzed. The input aggregation calculation type may include the total count of each class of aggregation calculation, that is, the number of times each kind of aggregation calculation is performed. Inputting the aggregation calculation type of the data to be analyzed into the trained second computation complexity model yields an expression relating the calculation time, the resources required for the calculation, and the data volume to be analyzed.
In this embodiment, the preset computation complexity includes a preset calculation time. After the second relational model among the calculation time, the resources required for the calculation, and the data volume to be analyzed has been obtained, the data volume to be analyzed that corresponds to the preset calculation time and to the available resource surplus obtained in step 502 may be determined from the second relational model. Specifically, if the second relational model is a mathematical expression relating the calculation time, the resources required for the calculation, and the data volume to be analyzed, the corresponding data volume to be analyzed may be calculated from that expression, the preset calculation time, and the available resource surplus, and taken as the sampled data volume.
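As a non-limiting illustration, the Python sketch below solves one assumed relational expression for the data volume. The application does not give the form of the expression; the form used here, seconds = overhead_seconds + seconds_per_record * n / cpus, as well as all parameter names, are hypothetical.

```python
def sample_volume(preset_seconds, available_cpus, seconds_per_record,
                  overhead_seconds, total_records):
    """Largest record count n that fits the preset calculation time.

    Assumed relation (not from the application):
        seconds = overhead_seconds + seconds_per_record * n / available_cpus
    """
    budget = preset_seconds - overhead_seconds
    if budget <= 0 or available_cpus <= 0:
        return 0  # nothing can be processed within the preset time
    n = int(budget * available_cpus / seconds_per_record)
    # Never sample more records than the data to be analyzed contains.
    return min(n, total_records)
```

With a different relational expression only the line that solves for `n` would change; the cap at `total_records` reflects that the sample cannot exceed the data to be analyzed.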
The electronic device may extract, from the data to be analyzed, an amount of data equal to the sampled data volume determined in step 504 and use it as the sampled data. In some optional implementations, a sampling rate may instead be obtained as the ratio of the calculated data volume to the total volume of the data to be analyzed, and the sampled data may then be obtained by sampling at that rate.
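As a non-limiting illustration of the rate-based variant, the Python sketch below derives the sampling rate from the calculated data volume and keeps each record independently with that probability. The function name and the choice of independent per-record sampling are assumptions; the application does not prescribe a particular sampling scheme.

```python
import random


def sample_by_rate(records, sample_volume, seed=None):
    """Return (sampling_rate, sampled_records) for a list of records."""
    total = len(records)
    if total == 0:
        return 0.0, []
    # Ratio of the calculated data volume to the total data volume.
    rate = min(1.0, sample_volume / total)
    rng = random.Random(seed)
    # Keep each record with probability `rate`; the resulting sample size
    # is therefore close to, but not exactly, `sample_volume`.
    sampled = [record for record in records if rng.random() < rate]
    return rate, sampled
```

When an exact sample size is required, `random.sample(records, k)` from the Python standard library could be used instead of per-record sampling.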
In this embodiment, the electronic device may perform the aggregation calculation on the sampled data. The aggregation calculation may include various types of calculation operations used in statistical data analysis, such as summing, averaging, computing the mean square deviation, and taking the maximum or minimum.
In some embodiments, the aggregation calculation may be performed on the sampled data and on the data to be analyzed at the same time. In other optional implementations, to obtain the aggregation calculation result of the sampled data as early as possible, the aggregation calculation of the sampled data may be performed first, and the aggregation calculation of the data to be analyzed may start only after the calculation on the sampled data has finished.
In this embodiment, the aggregation calculation result of the sampled data may be shown in a visualization interface configured on the electronic device or on a display device connected to it. Showing the aggregation calculation result of the sampled data quickly gives the user a rough statistical analysis result for the data to be analyzed.
In this embodiment, steps 501, 506 and 507 of the above flow are identical to steps 201, 203 and 204 of the foregoing embodiment, respectively, and are not described again here.
As can be seen from Fig. 5, compared with the embodiment shown in Fig. 4, the flow 500 of the data processing method provided by the present application adds step 502 of obtaining the available resource surplus. Determining the sampled data volume from both the available resource surplus and the preset computation complexity requirement makes full use of the system's computing resources and speeds up delivery of the data analysis result.
Fig. 6 is a structural schematic diagram of one embodiment of the data processing device of the present application. As shown in Fig. 6, the data processing device 600 may include: a first acquisition unit 601, a sampling unit 602, a computing unit 603 and a display unit 604. The first acquisition unit 601 is used to obtain the data to be analyzed and the aggregation calculation type of the data to be analyzed; the sampling unit 602 is used to sample the data to be analyzed based on the aggregation calculation type and a preset computation complexity to obtain sampled data; the computing unit 603 is used to perform the aggregation calculation on the sampled data; and the display unit 604 is used to show the aggregation calculation result of the sampled data.
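As a non-limiting illustration, the cooperation of the four units can be sketched in Python as a small pipeline. The class name, the callable signatures and the `run` method are hypothetical; the reference numerals in the comments follow the unit descriptions above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Tuple


@dataclass
class DataProcessingDevice:
    acquire: Callable[[], Tuple[Sequence, str]]  # first acquisition unit 601
    sample: Callable[[Sequence, str], Sequence]  # sampling unit 602
    compute: Callable[[Sequence, str], Any]      # computing unit 603
    display: Callable[[Any], None]               # display unit 604

    def run(self):
        """Acquire, sample, aggregate, then show the sampled result."""
        data, agg_type = self.acquire()
        sampled = self.sample(data, agg_type)
        result = self.compute(sampled, agg_type)
        self.display(result)
        return result
```

Passing each unit in as a callable keeps the sketch independent of how sampling or display is actually implemented, for example `DataProcessingDevice(acquire=..., sample=..., compute=..., display=print)`.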
In this embodiment, the first acquisition unit 601 may obtain the data to be analyzed from multiple terminal devices. The data to be analyzed may include network behavior data reported by the terminal devices. The first acquisition unit 601 may also obtain the aggregation calculation type of the data to be analyzed from a manual setting or from an aggregation algorithm determined by the requirements on the aggregation result.
In some optional implementations, the sampling unit 602 may sample the data to be analyzed obtained by the first acquisition unit 601 as follows: determine the sampled data volume of the data to be analyzed based on the aggregation calculation type and the preset computation complexity; and extract the sampled data from the data to be analyzed according to the sampled data volume.
In a further implementation, the sampling unit 602 may determine the sampled data volume of the data to be analyzed using machine learning. One optional approach includes: inputting the aggregation calculation type of the data to be analyzed into the trained first computation complexity model to obtain a first relational model between the computation complexity of the data to be analyzed and the data volume to be analyzed; and determining, according to the first relational model, the data volume to be analyzed that corresponds to the preset computation complexity, as the sampled data volume.
Further, the data processing device 600 may also include a second acquisition unit for obtaining the available resource surplus. In this case, the sampling unit 602 may determine the sampled data volume of the data to be analyzed as follows: input the aggregation calculation type of the data to be analyzed and the computing resource surplus into the trained second computation complexity model to obtain a second relational model between the computation complexity of the data to be analyzed and the data volume to be analyzed; and determine, according to the second relational model, the data volume to be analyzed that corresponds to the preset computation complexity, as the sampled data volume.
Alternatively or additionally, the device 600 further includes a first training unit and a second training unit. The first training unit is used to train the first computation complexity model as follows: obtain historical data analysis records, and train the first computation complexity model from those records. The historical data analysis records used to train the first computation complexity model include the data volume of at least one historical data set, together with the corresponding historical computation complexity and historical aggregation calculation type. The second training unit is used to train the second computation complexity model as follows: obtain historical data analysis records, and train the second computation complexity model from those records. The historical data analysis records used to train the second computation complexity model include the data volume of at least one historical data set, together with the corresponding historical computation complexity, historical computing resource surplus and historical aggregation calculation type.
In some optional implementations, the computation complexity includes the calculation time and/or the amount of resources required for the calculation.
It should be understood that the units recorded in the device 600 correspond to the steps of the methods described with reference to Figs. 2 to 5. The operations and features described above for the data processing method therefore apply equally to the device 600 and the units it contains, and are not repeated here. The corresponding units in the device 600 may cooperate with units in a terminal device and/or server to implement the solutions of the embodiments of the present application.
Those skilled in the art will understand that the data processing device 600 also includes some other well-known structures, such as a processor and memory; in order not to unnecessarily obscure the embodiments of the present disclosure, these well-known structures are not shown in Fig. 6.
The data processing device provided by the present application can quickly provide a partial data analysis result that has reference value, improving the efficiency with which aggregation results for large-scale data are shown.
Referring now to Fig. 7, a structural schematic diagram of a computer system 700 suitable for implementing the terminal device or server of the embodiments of the present application is shown.
As shown in Fig. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processing according to a program stored in read-only memory (ROM) 702 or a program loaded from the storage section 708 into random access memory (RAM) 703. The RAM 703 also stores the various programs and data required for the operation of the system 700. The CPU 701, ROM 702 and RAM 703 are connected to one another by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse and the like; an output section 707 including a cathode ray tube (CRT) or liquid crystal display (LCD), a loudspeaker and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, optical disc, magneto-optical disk or semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flow charts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the methods shown in the flow charts. In such embodiments, the computer program may be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711.
The flow charts and block diagrams in the accompanying drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each box in a flow chart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flow charts, and combinations of boxes in the block diagrams and/or flow charts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, this may be described as: a processor comprising a first acquisition unit, a sampling unit, a computing unit and a display unit. The names of these units do not, in some cases, limit the units themselves; for example, the first acquisition unit may also be described as "a unit that obtains data to be analyzed and the aggregation calculation type of the data to be analyzed".
In another aspect, the present application also provides a non-volatile computer storage medium. The non-volatile computer storage medium may be the one included in the device described in the above embodiments, or it may exist on its own without being assembled into a terminal. The non-volatile computer storage medium stores one or more programs which, when executed by a device, cause the device to: obtain data to be analyzed and the aggregation calculation type of the data to be analyzed; sample the data to be analyzed based on the aggregation calculation type and a preset computation complexity to obtain sampled data; perform the aggregation calculation on the sampled data; and show the aggregation calculation result of the sampled data.
The above description is only of the preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features; it should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.
Claims (12)
1. A data processing method, characterized by comprising:
obtaining data to be analyzed and an aggregation calculation type of the data to be analyzed, the data to be analyzed being network behavior data of terminal devices;
inputting the aggregation calculation type of the data to be analyzed into a trained first computation complexity model to obtain a first relational model between the computation complexity of the data to be analyzed and the data volume to be analyzed;
determining, according to the first relational model, the data volume to be analyzed that corresponds to a preset computation complexity, as the sampled data volume of the data to be analyzed;
extracting sampled data from the data to be analyzed according to the sampled data volume;
performing the aggregation calculation on the sampled data;
showing the aggregation calculation result of the sampled data, the aggregation calculation result being statistical information on network behavior.
2. The method according to claim 1, characterized in that the method further comprises a step of training the first computation complexity model, comprising:
obtaining historical data analysis records, the historical data analysis records including the data volume of at least one historical data set and the corresponding historical computation complexity and historical aggregation calculation type;
training the first computation complexity model from the historical data analysis records.
3. The method according to claim 1 or 2, characterized in that the computation complexity includes: the calculation time and/or the amount of resources required for the calculation.
4. A data processing method, characterized by comprising:
obtaining data to be analyzed and an aggregation calculation type of the data to be analyzed, the data to be analyzed being network behavior data of terminal devices;
obtaining the available computing resource surplus;
inputting the aggregation calculation type of the data to be analyzed and the computing resource surplus into a trained second computation complexity model to obtain a second relational model between the computation complexity of the data to be analyzed and the data volume to be analyzed;
determining, according to the second relational model, the data volume to be analyzed that corresponds to a preset computation complexity, as the sampled data volume of the data to be analyzed;
extracting sampled data from the data to be analyzed according to the sampled data volume;
performing the aggregation calculation on the sampled data;
showing the aggregation calculation result of the sampled data, the aggregation calculation result being statistical information on network behavior.
5. The method according to claim 4, characterized in that the method further comprises a step of training the second computation complexity model, comprising:
obtaining historical data analysis records, the historical data analysis records including the data volume of at least one historical data set and the corresponding historical computation complexity, historical computing resource surplus and historical aggregation calculation type;
training the second computation complexity model from the historical data analysis records.
6. The method according to claim 4 or 5, characterized in that the computation complexity includes: the calculation time and/or the amount of resources required for the calculation.
7. A data processing device, characterized by comprising:
a first acquisition unit, for obtaining data to be analyzed and an aggregation calculation type of the data to be analyzed, the data to be analyzed being network behavior data of terminal devices;
a sampling unit, for inputting the aggregation calculation type of the data to be analyzed into a trained first computation complexity model to obtain a first relational model between the computation complexity of the data to be analyzed and the data volume to be analyzed, determining, according to the first relational model, the data volume to be analyzed that corresponds to a preset computation complexity as the sampled data volume of the data to be analyzed, and extracting sampled data from the data to be analyzed according to the sampled data volume;
a computing unit, for performing the aggregation calculation on the sampled data;
a display unit, for showing the aggregation calculation result of the sampled data, the aggregation calculation result being statistical information on network behavior.
8. The device according to claim 7, characterized in that the device further comprises a first training unit, for training the first computation complexity model as follows:
obtaining historical data analysis records, the historical data analysis records including the data volume of at least one historical data set and the corresponding historical computation complexity and historical aggregation calculation type;
training the first computation complexity model from the historical data analysis records.
9. The device according to claim 7 or 8, characterized in that the computation complexity includes: the calculation time and/or the amount of resources required for the calculation.
10. A data processing device, characterized by comprising:
a first acquisition unit, for obtaining data to be analyzed and an aggregation calculation type of the data to be analyzed, the data to be analyzed being network behavior data of terminal devices;
a second acquisition unit, for obtaining the available computing resource surplus;
a sampling unit, for inputting the aggregation calculation type of the data to be analyzed and the computing resource surplus into a trained second computation complexity model to obtain a second relational model between the computation complexity of the data to be analyzed and the data volume to be analyzed, determining, according to the second relational model, the data volume to be analyzed that corresponds to a preset computation complexity as the sampled data volume of the data to be analyzed, and extracting sampled data from the data to be analyzed according to the sampled data volume;
a computing unit, for performing the aggregation calculation on the sampled data;
a display unit, for showing the aggregation calculation result of the sampled data, the aggregation calculation result being statistical information on network behavior.
11. The device according to claim 10, characterized in that the device further comprises a second training unit, for training the second computation complexity model as follows:
obtaining historical data analysis records, the historical data analysis records including the data volume of at least one historical data set and the corresponding historical computation complexity, historical computing resource surplus and historical aggregation calculation type;
training the second computation complexity model from the historical data analysis records.
12. The device according to claim 10 or 11, characterized in that the computation complexity includes: the calculation time and/or the amount of resources required for the calculation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610197491.2A CN105844107B (en) | 2016-03-31 | 2016-03-31 | Data processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105844107A CN105844107A (en) | 2016-08-10 |
CN105844107B true CN105844107B (en) | 2019-10-15 |
Family
ID=56596374
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610197491.2A Active CN105844107B (en) | 2016-03-31 | 2016-03-31 | Data processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105844107B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |