CN105844107B - Data processing method and device - Google Patents

Publication number
CN105844107B
Authority
CN
China
Prior art keywords
data
analyzed
aggregation
computation complexity
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610197491.2A
Other languages
Chinese (zh)
Other versions
CN105844107A (en
Inventor
汪敏峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201610197491.2A
Publication of CN105844107A
Application granted
Publication of CN105844107B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Z: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS, NOT OTHERWISE PROVIDED FOR
    • G16Z99/00: Subject matter not provided for in other main groups of this subclass

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This application discloses a data processing method and device. One specific embodiment of the method includes: obtaining data to be analyzed and the aggregation calculation type of the data to be analyzed; sampling the data to be analyzed based on the aggregation calculation type and a preset computation complexity to obtain sampled data; performing the aggregation calculation on the sampled data; and displaying the aggregation result of the sampled data. This embodiment can quickly provide a partial data analysis result that has reference value, improving the efficiency with which aggregation results for large-scale data are displayed.

Description

Data processing method and device
Technical field
This application relates to the field of computer technology, specifically to the field of telecommunications technology, and more particularly to a data processing method and device.
Background
With the development of Internet technology, more and more network data is being generated. A back-end data analysis server can perform aggregation analysis on the generated network data to obtain statistics on network behavior across large volumes of data. Usually the back-end server presents the aggregation result to the user only after the aggregation calculation over all the data to be analyzed has completed.
For ultra-large-scale network data, the aggregation operation takes a long time because of limits on server system resources and computing capability, so the aggregation result cannot be displayed in real time. The results page then stalls while waiting for the result to return, and statistical analysis results are provided to the user inefficiently.
Summary of the invention
In view of this, it is desirable to provide a data analysis and processing method that displays aggregation results quickly. To solve the above technical problem, this application provides a data processing method and device.
In a first aspect, this application provides a data processing method, comprising: obtaining data to be analyzed and the aggregation calculation type of the data to be analyzed; sampling the data to be analyzed based on the aggregation calculation type and a preset computation complexity to obtain sampled data; performing the aggregation calculation on the sampled data; and displaying the aggregation result of the sampled data.
In some optional implementations, sampling the data to be analyzed based on the aggregation calculation type and the preset computation complexity to obtain sampled data comprises: determining the sample data volume for the data to be analyzed based on the aggregation calculation type and the preset computation complexity; and extracting the sampled data from the data to be analyzed according to the sample data volume.
In some optional implementations, determining the sample data volume for the data to be analyzed based on the aggregation calculation type and the preset computation complexity comprises: inputting the aggregation calculation type of the data to be analyzed into a trained first computation complexity model to obtain a first relational model between the computation complexity of the data to be analyzed and the volume of data to be analyzed; and determining, according to the first relational model, the volume of data to be analyzed that corresponds to the preset computation complexity, and using it as the sample data volume.
In some optional implementations, the method further includes a step of training the first computation complexity model, comprising: obtaining historical data analysis records, each of which includes the data volume of at least one historical data set together with the corresponding historical computation complexity and historical aggregation calculation type; and training on the historical data analysis records to obtain the first computation complexity model.
In some optional implementations, the method further includes obtaining the remaining available computing resources. In that case, determining the sample data volume for the data to be analyzed based on the aggregation calculation type and the preset computation complexity comprises: inputting the aggregation calculation type of the data to be analyzed and the remaining computing resources into a trained second computation complexity model to obtain a second relational model between the computation complexity of the data to be analyzed and the volume of data to be analyzed; and determining, according to the second relational model, the volume of data to be analyzed that corresponds to the preset computation complexity, and using it as the sample data volume.
In some optional implementations, the method further includes a step of training the second computation complexity model, comprising: obtaining historical data analysis records, each of which includes the data volume of at least one historical data set together with the corresponding historical computation complexity, historical remaining computing resources, and historical aggregation calculation type; and training on the historical data analysis records to obtain the second computation complexity model.
In some optional implementations, the computation complexity includes the calculation time and/or the amount of resources required for the calculation.
In a second aspect, this application provides a data processing device, comprising: a first acquisition unit for obtaining data to be analyzed and the aggregation calculation type of the data to be analyzed; a sampling unit for sampling the data to be analyzed based on the aggregation calculation type and a preset computation complexity to obtain sampled data; a computing unit for performing the aggregation calculation on the sampled data; and a display unit for displaying the aggregation result of the sampled data.
In some optional implementations, the sampling unit samples the data to be analyzed to obtain sampled data as follows: determining the sample data volume for the data to be analyzed based on the aggregation calculation type and the preset computation complexity; and extracting the sampled data from the data to be analyzed according to the sample data volume.
In some optional implementations, the sampling unit further determines the sample data volume for the data to be analyzed as follows: inputting the aggregation calculation type of the data to be analyzed into a trained first computation complexity model to obtain a first relational model between the computation complexity of the data to be analyzed and the volume of data to be analyzed; and determining, according to the first relational model, the volume of data to be analyzed that corresponds to the preset computation complexity, and using it as the sample data volume.
In some optional implementations, the device further includes a first training unit for training the first computation complexity model as follows: obtaining historical data analysis records, each of which includes the data volume of at least one historical data set together with the corresponding historical computation complexity and historical aggregation calculation type; and training on the historical data analysis records to obtain the first computation complexity model.
In some optional implementations, the device further includes a second acquisition unit for obtaining the remaining available computing resources. The sampling unit further determines the sample data volume for the data to be analyzed as follows: inputting the aggregation calculation type of the data to be analyzed and the remaining computing resources into a trained second computation complexity model to obtain a second relational model between the computation complexity of the data to be analyzed and the volume of data to be analyzed; and determining, according to the second relational model, the volume of data to be analyzed that corresponds to the preset computation complexity, and using it as the sample data volume.
In some optional implementations, the device further includes a second training unit for training the second computation complexity model as follows: obtaining historical data analysis records, each of which includes the data volume of at least one historical data set together with the corresponding historical computation complexity, historical remaining computing resources, and historical aggregation calculation type; and training on the historical data analysis records to obtain the second computation complexity model.
In some optional implementations, the computation complexity includes the calculation time and/or the amount of resources required for the calculation.
The data processing method and device provided by this application obtain data to be analyzed and its aggregation calculation type, sample the data to be analyzed based on the aggregation calculation type and a preset computation complexity to obtain sampled data, perform the aggregation calculation on the sampled data, and finally display the aggregation result of the sampled data. A partial data analysis result with reference value can thus be provided quickly, improving the efficiency with which aggregation results for large-scale data are displayed.
Brief description of the drawings
Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which this application can be applied;
Fig. 2 is a flowchart of one embodiment of the data processing method according to this application;
Fig. 3 is a schematic diagram of the principle of the data processing method according to this application;
Fig. 4 is a flowchart of another embodiment of the data processing method according to this application;
Fig. 5 is a flowchart of yet another embodiment of the data processing method according to this application;
Fig. 6 is a structural schematic diagram of one embodiment of the data processing device of this application;
Fig. 7 is a structural schematic diagram of a computer system suitable for implementing the terminal device or server of the embodiments of this application.
Detailed description
This application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, where no conflict arises, the embodiments of this application and the features in those embodiments can be combined with one another. This application is described in detail below with reference to the drawings and in conjunction with the embodiments.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user 110 can use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104, for example to receive or send messages. Network service applications can be installed on the terminal devices 101, 102, 103, such as browsers, map applications, audio/video playback applications, and online lifestyle service applications.
The terminal devices 101, 102, 103 can be any of various electronic devices that have a display screen and support network service applications, including but not limited to smartphones, tablet computers, smartwatches, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, and desktop computers.
The server 105 can be a server that provides various services, for example a back-end web server that supplies data for the web pages displayed on the terminal devices 101, 102, 103. The back-end web server can analyze and otherwise process the access requests it receives and return the processing result (for example, web page data) to the terminal device.
The server 105 can also be a back-end data analysis server that obtains the network behavior data of the terminal devices 101, 102, 103 and analyzes it. For example, the server 105 can obtain the network logs of the terminal devices 101, 102, 103 and analyze and tally those logs to produce statistics on the network behavior data.
It should be noted that the data processing method provided by the embodiments of this application is generally executed by the server 105; accordingly, the data processing device is generally located in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be present, depending on implementation needs.
With continued reference to Fig. 2, a flow 200 of one embodiment of the data processing method according to this application is shown. The data processing method comprises the following steps:
Step 201: obtain the data to be analyzed and the aggregation calculation type of the data to be analyzed.
In this embodiment, the electronic device on which the data processing method runs (for example, the server 105 shown in Fig. 1) can obtain the data to be analyzed from multiple terminal devices. The data to be analyzed may include the network behavior data of the terminal devices, for example data from the network access applications a user runs on a terminal device, such as web browsing data, map search data, and audio/video playback data. The electronic device can also obtain the aggregation calculation type of the data to be analyzed from the configured aggregation algorithm. For example, when the aggregation algorithm performs n accumulations and m averagings, the aggregation calculation types may include summation (sum) and averaging (average).
In general, when a user accesses the network through a terminal device, the terminal device can record the user's network access behavior and store the user's network behavior data in a network log. For example, when a user browses a web page, the terminal device can record the URL of the page, the browsing time, and the operations performed on the page (such as clicks and typed text) in a web browsing log. The electronic device can obtain the logs of the terminal devices as the data to be analyzed. It should be noted that in this embodiment the electronic device can obtain the large volume of data stored by multiple terminals, or it can selectively obtain only part of the network logs as the data to be analyzed, for example the network logs from the most recent month.
In some optional implementations, a terminal device that records user network behavior data can report its network log over the network to the electronic device on which the data processing method runs. The electronic device can also send a log collection request over the network to each terminal device and actively fetch the network logs from the terminal devices. It should be pointed out that the network connection can include, but is not limited to, wireless connections such as 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, and UWB (ultra wideband), as well as wired connections.
In some embodiments, the electronic device can also obtain user network access data directly on the back end. For example, a web server can obtain the web page addresses and timestamps of the web page data that users request through terminal devices.
The aggregation calculation type can be set manually; for example, a data analyst can specify which operations to perform on the data to be analyzed. The aggregation calculation type can also be determined from the required aggregation result. For example, when the trend of web page traffic needs to be tallied, the aggregation calculation type can be accumulation. There can be several aggregation calculation types, in which case the obtained aggregation calculation type can also include the number of calculations of each type.
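As a minimal sketch of what "the aggregation calculation type, including the number of calculations of each type" might look like in practice, the snippet below tallies a list of requested operations. The function name `aggregation_profile` and the operation labels are hypothetical and are not taken from the patent.

```python
from collections import Counter


def aggregation_profile(requested_operations):
    """Count how many times each aggregation type is requested.

    `requested_operations` is a hypothetical list of operation names,
    e.g. one entry per aggregate the analyst has asked for.
    """
    return Counter(op.lower() for op in requested_operations)


# Three summations and two averagings (n = 3, m = 2).
profile = aggregation_profile(["sum", "sum", "sum", "average", "average"])
print(dict(profile))
```

A mapping from type to count is only one possible representation; it is used here because the later complexity estimate depends on both the types and how often each is performed.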
Step 202: sample the data to be analyzed based on the aggregation calculation type and the preset computation complexity to obtain sampled data.
In this embodiment, the electronic device can determine the computation complexity of the data to be analyzed from the aggregation calculation type obtained in step 201, determine a sampling rate from the computation complexity of the data to be analyzed and the preset computation complexity, and then sample the data to be analyzed at that rate to obtain the sampled data.
In some optional implementations, the computation complexity of the data to be analyzed can be calculated as follows: set a complexity for each aggregation calculation type applied to the data to be analyzed, then accumulate those complexities according to the number of times each aggregation calculation type is performed. The resulting total is the computation complexity of the data to be analyzed.
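The accumulation described above can be sketched as a weighted sum. The per-type weights in `UNIT_COMPLEXITY` are made-up values in arbitrary units, chosen only for illustration; the patent does not specify them.

```python
# Hypothetical per-type complexity weights (arbitrary units, not from the patent).
UNIT_COMPLEXITY = {"sum": 1.0, "average": 1.5, "max": 1.0, "var": 3.0}


def total_complexity(op_counts, unit_complexity=UNIT_COMPLEXITY):
    """Accumulate each type's complexity, weighted by how often it is performed."""
    return sum(unit_complexity[op] * count for op, count in op_counts.items())


# Four summations and two averagings: 4 * 1.0 + 2 * 1.5
print(total_complexity({"sum": 4, "average": 2}))  # 7.0
```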
In some optional implementations, the computation complexity may include the calculation time and/or the amount of resources required for the calculation. The calculation time is the duration the aggregation calculation consumes. The amount of resources required is the amount of resources the aggregation calculation occupies, including the amount of memory; for example, the aggregation calculation may occupy 1 CPU + 16 GB of memory. Optionally, the amount of resources required may include the storage space occupied by the data to be analyzed, by the aggregation calculation, and by the aggregation result.
In some implementations, the computation complexity of the data to be analyzed can also be calculated with an empirical formula in which the computation complexity depends on the types and number of aggregation calculations.
The preset computation complexity can be set from the time within which the sampling result should be displayed. If the computation complexity is the calculation time, the display time of the sampling result can be used directly as the preset computation complexity. For example, if the user needs the sampling result within 5 seconds, the preset computation complexity can be 5 seconds. If the computation complexity is the amount of resources required, the computation complexity corresponding to the display time can be determined from the positive correlation between calculation time and required resources.
The volume of data to be calculated is positively correlated with the computation complexity. In this embodiment, the electronic device can determine the volume of data to be calculated that corresponds to the preset computation complexity and use it as the sample data volume. For example, suppose the volume of the data to be analyzed is 2^40, its computation complexity is 1000 seconds, and the display time of the sampling result is 10 seconds. If the computation complexity is 10 seconds when the volume of data to be calculated is 2^20, the sample data volume can be set to 2^20. In some embodiments, the correspondence between computation complexity and the volume of data to be calculated can be determined from historical calculation data, and the volume of data to be calculated that corresponds to the preset computation complexity is then used as the sample data volume.
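One simple way to use historical measurements, sketched here under the assumption that they are available as (data volume, computation complexity) pairs, is to pick the largest measured volume whose complexity still fits the preset budget. The function name and the table format are hypothetical; the two data points are the figures from the example (2^40 records at 1000 seconds, 2^20 records at 10 seconds).

```python
def sample_size_for_budget(measurements, budget):
    """Return the largest measured data volume whose complexity fits the budget.

    measurements: iterable of (data_volume, complexity) pairs, assumed to be
    positively correlated. Returns None when no measured volume fits.
    """
    fitting = [volume for volume, cost in measurements if cost <= budget]
    return max(fitting) if fitting else None


# Data points from the worked example: 2**40 -> 1000 s, 2**20 -> 10 s.
history = [(2**20, 10.0), (2**40, 1000.0)]
print(sample_size_for_budget(history, budget=10.0) == 2**20)  # True
```

A lookup like this never extrapolates beyond what has been measured, which is conservative but coarse; a fitted relation between volume and complexity would give finer-grained answers.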
After the sample data volume is determined, that volume of data can be extracted from the data to be analyzed as the sampled data using any of several sampling methods, including but not limited to random sampling, cluster sampling, and stratified sampling.
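Two of the sampling methods named above can be sketched with the standard library. This is an illustrative sketch, not the patent's implementation: the record layout, the `key` function, and the proportional allocation rule are all assumptions.

```python
import random
from collections import defaultdict


def simple_random_sample(records, sample_size, seed=None):
    """Draw `sample_size` records uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, min(sample_size, len(records)))


def stratified_sample(records, key, sample_size, seed=None):
    """Draw from each stratum in proportion to its share of the data.

    Quotas are rounded down with a minimum of one record per stratum, so the
    total can differ slightly from `sample_size`.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        quota = max(1, sample_size * len(members) // len(records))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Hypothetical log records tagged with the application that produced them.
logs = [("browser", i) for i in range(80)] + [("map", i) for i in range(20)]
picked = stratified_sample(logs, key=lambda r: r[0], sample_size=10, seed=0)
print(len(picked))  # 10: eight browser records and two map records
```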
Step 203: perform the aggregation calculation on the sampled data.
In this embodiment, the electronic device can perform the aggregation calculation on the sampled data according to preset aggregation rules. The aggregation calculation can include several kinds of aggregate functions, each of which performs a calculation over a group of values in the sampled data and returns a single value. An aggregate function can be one the user defines as needed, or a statistical analysis function stored in the memory of the electronic device. Aggregate functions include, for example, AVG (returns the mean), COUNT (returns the count), MAX (returns the maximum), MIN (returns the minimum), SUM (returns the sum), and VAR (returns the statistical variance).
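The listed aggregate functions map naturally onto standard-library calls. The dispatch table below is a minimal sketch; the names `AGGREGATES` and `aggregate` are hypothetical, and VAR is taken here to mean the sample variance, which is an assumption since the text does not say which variance is meant.

```python
import statistics

# Each aggregate takes a group of values and returns a single value.
AGGREGATES = {
    "AVG": statistics.mean,
    "COUNT": len,
    "MAX": max,
    "MIN": min,
    "SUM": sum,
    "VAR": statistics.variance,  # sample variance; pvariance gives the population form
}


def aggregate(values, names):
    """Apply the named aggregate functions to one group of values."""
    return {name: AGGREGATES[name](values) for name in names}


print(aggregate([2, 4, 6], ["AVG", "COUNT", "MAX", "MIN", "SUM", "VAR"]))
```

A user-defined aggregate can be added by inserting another callable into the table, which mirrors the text's point that aggregate functions may be custom or built in.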
In some embodiments, the aggregation calculation can be performed on the sampled data and on the data to be analyzed at the same time. The aggregation calculation type applied to the data to be analyzed can be the same as the one applied to the sampled data, or the calculations can be chosen according to what needs to be displayed in real time. For example, when the user needs a quick view of the sum of the data, the summation can be executed on the sampled data while AVG, COUNT, MAX, MIN, SUM, VAR, and other operations are executed on the data to be analyzed.
In some optional implementations, to obtain the aggregation result of the sampled data as early as possible, the aggregation calculation on the sampled data can be performed first, and the aggregation calculation on the data to be analyzed started only after the calculation on the sampled data has finished.
Step 204: display the aggregation result of the sampled data.
In this embodiment, the electronic device can provide a visualization interface and display the aggregation result of the sampled data in it. The user obtains the aggregation result through the visualization interface. The electronic device can also be connected to another display device and show the aggregation result of the sampled data on that device. Displaying the aggregation result of the sampled data quickly gives the user a rough statistical analysis of the data to be analyzed.
In some embodiments, after the aggregation calculation on the data to be analyzed completes, its aggregation result can also be shown in the visualization interface. The user thus obtains both a real-time preliminary statistical result and an exact statistical result, which makes obtaining information more efficient.
Referring to Fig. 3, a schematic diagram of the principle of the data processing method according to this application is shown. As shown in Fig. 3, after a large volume of data to be analyzed 301 is obtained, it can be sampled based on its computation complexity to obtain sampled data 302. The aggregation calculation can then be performed on the sampled data 302 and the aggregation result of the sampled data 302 shown in a display interface 303. Meanwhile, the aggregation calculation can be performed on the data to be analyzed 301, and once it completes, the aggregation result of the data to be analyzed 301 is also shown in the display interface 303.
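The flow in Fig. 3 can be sketched as a function that reports a preliminary result from the sample first and the exact result afterwards. This sketch runs the two calculations sequentially (the "sample first" variant) and uses plain random sampling; the `show` callback stands in for the display interface, and all names are hypothetical.

```python
import random


def progressive_aggregate(records, sample_size, aggregate_fn, show, seed=None):
    """Show a fast approximate result from a sample, then the exact result."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    show("preliminary", aggregate_fn(sample))  # fast, approximate
    show("final", aggregate_fn(records))       # slower, exact


def mean(values):
    return sum(values) / len(values)


shown = []
progressive_aggregate(
    list(range(1000)), 100, mean,
    show=lambda label, value: shown.append((label, value)),
    seed=1,
)
print(shown)
```

To run the two calculations at the same time instead, the full aggregation could be submitted to a worker thread or process before the sample result is shown; that variant is left out here to keep the sketch short.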
The data analysis method provided by the above embodiment of this application samples the data to be analyzed based on computation complexity and displays the aggregation result of the sampled data. It can deliver a preliminary data analysis result quickly and in real time, improving the efficiency with which aggregation results for large-scale data are displayed.
In some optional implementations of the above embodiment, the sampling in step 202 can first determine the sample data volume for the data to be analyzed based on the aggregation calculation type and the preset computation complexity, and then extract the sampled data from the data to be analyzed according to that volume. Several methods can be used to determine the sample data volume. These methods are described further below with reference to Fig. 4 and Fig. 5.
With further reference to Fig. 4, a flow 400 of another embodiment of the data processing method according to this application is shown. As shown in Fig. 4, the flow 400 of the data processing method comprises the following steps:
Step 401: obtain the data to be analyzed and the aggregation calculation type of the data to be analyzed.
In this embodiment, the electronic device on which the data processing method runs (for example, the server 105 shown in Fig. 1) can obtain the data to be analyzed from multiple terminal devices. The data to be analyzed may include the network behavior data of the terminal devices. The electronic device can also obtain the aggregation calculation type of the data to be analyzed from the configured aggregation algorithm. The aggregation calculation type can be set manually, for example according to the required aggregation result. In some embodiments, the number of calculations of each aggregation calculation type can also be obtained.
In some embodiments, the electronic device can obtain user network access data directly on the back end. For example, a web server can obtain the web page addresses and timestamps of the web page data that users request through terminal devices.
Step 402: input the aggregation calculation type of the data to be analyzed into a trained first computation complexity model to obtain a first relational model between the computation complexity of the data to be analyzed and the volume of data to be analyzed.
In this embodiment, the sample data volume can be determined by machine learning. Specifically, the first relational model between computation complexity and the volume of data to be analyzed can be determined from the first computation complexity model. The first relational model can be a mathematical expression. The first computation complexity model can be a preset model whose input is the aggregation calculation type and whose output is the relational expression between computation complexity and the volume of data to be analyzed. The aggregation calculation type given as input can include the number of calculations of each type, that is, how many times each aggregation calculation type is performed. The computation complexity here can be the calculation time. Inputting the aggregation calculation type of the data to be analyzed into the trained first computation complexity model then yields an expression relating calculation time to the volume of data to be analyzed.
Step 403: determine, according to the first relational model, the volume of data to be analyzed that corresponds to the preset computation complexity, and use it as the sample data volume.
In this embodiment, once the first relational model between computation complexity and the volume of data to be analyzed has been obtained, the volume of data to be analyzed that corresponds to the preset computation complexity can be determined from it. Specifically, if the first relational model is a mathematical expression relating computation complexity to the volume of data to be analyzed, the corresponding volume can be calculated from that expression and the preset computation complexity and used as the sample data volume.
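Steps 402 and 403 can be illustrated with a deliberately simple stand-in for the trained model. The sketch assumes the relation between calculation time and data volume is linear for each aggregation type and fits it by ordinary least squares over historical records; the linear form, the record layout, and the function names are assumptions, since the patent does not fix the model family.

```python
from collections import defaultdict


def train_first_model(history):
    """Fit time = slope * volume + intercept per aggregation type.

    history: iterable of (aggregation_type, data_volume, elapsed_seconds).
    Each type needs at least two records with different volumes.
    """
    grouped = defaultdict(list)
    for agg_type, volume, seconds in history:
        grouped[agg_type].append((volume, seconds))
    model = {}
    for agg_type, points in grouped.items():
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in points)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        slope = sxy / sxx
        model[agg_type] = (slope, mean_y - slope * mean_x)
    return model


def sample_size_for(model, agg_type, preset_seconds):
    """Invert the fitted relation to find the volume that fits the time budget."""
    slope, intercept = model[agg_type]
    return max(0, int((preset_seconds - intercept) / slope))


# Hypothetical historical records: (type, volume, seconds).
records = [("sum", 1000, 1.0), ("sum", 2000, 2.0), ("sum", 3000, 3.0)]
model = train_first_model(records)
print(sample_size_for(model, "sum", preset_seconds=5.0))  # about 5000
```

Here the fitted (slope, intercept) pair plays the role of the first relational model, and inverting it for the preset time gives the sample data volume.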
Step 404: extract the sampled data from the data to be analyzed according to the sample data volume.
The electronic device may extract, from the data to be analyzed, an amount of data equal to the sample data volume determined in step 403 and use it as the sampled data. In some optional implementations, a sampling rate may instead be obtained as the ratio of the calculated volume to the total volume of the data to be analyzed, and sampling is then performed at that rate to obtain the sampled data.
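The two extraction options described for step 404 may be sketched as follows. The embodiment does not specify a sampling algorithm; uniform random sampling, and the names `extract_exact`, `sampling_rate` and `sample_by_rate`, are assumptions made for this example.

```python
import random


def extract_exact(records, sample_volume, seed=None):
    """Draw exactly sample_volume records (or all of them if fewer exist)."""
    k = min(sample_volume, len(records))
    return random.Random(seed).sample(records, k)


def sampling_rate(sample_volume, total_volume):
    """Ratio of the sample data volume to the total data volume, capped at 1."""
    if total_volume == 0:
        return 0.0
    return min(1.0, sample_volume / total_volume)


def sample_by_rate(records, rate, seed=None):
    """Keep each record independently with probability `rate`.

    The resulting size is only approximately rate * len(records).
    """
    rng = random.Random(seed)
    return [r for r in records if rng.random() < rate]
```

Rate-based sampling does not need the total record count to be held in memory at once, which may suit streamed data; exact extraction guarantees the sample size at the cost of requiring a materialized sequence.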
Step 405: perform aggregation calculation on the sampled data.
In this embodiment, the electronic device may perform aggregation calculation on the sampled data. Aggregation calculation covers the various types of calculation operations used in statistical data analysis, such as summation, averaging, mean square deviation, maximum, and minimum.
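The aggregation types listed above may be illustrated with the Python standard library. The name `aggregate`, the string keys, and the reading of "mean square deviation" as the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev

# Mapping from an aggregation calculation type to the function computing it.
AGGREGATIONS = {
    "sum": sum,
    "mean": mean,
    "std": pstdev,  # assumed meaning of "mean square deviation"
    "max": max,
    "min": min,
}


def aggregate(agg_types, values):
    """Apply each requested aggregation to the sampled values."""
    return {name: AGGREGATIONS[name](values) for name in agg_types}
```

Calling `aggregate(["sum", "max"], sample)` returns a dictionary with one entry per requested aggregation type.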
In some embodiments, aggregation calculation may be performed on the sampled data and on the data to be analyzed at the same time. In other optional implementations, in order to obtain the aggregation result of the sampled data as early as possible, the aggregation calculation on the sampled data may be performed first, and the aggregation calculation on the data to be analyzed is started only after the calculation on the sampled data has finished.
Step 406: display the aggregation calculation result of the sampled data.
In this embodiment, the aggregation calculation result of the sampled data may be displayed on a visual interface configured on the electronic device or on a display device connected to it. Displaying the aggregation result of the sampled data quickly gives the user a rough statistical analysis of the data to be analyzed.
In some optional implementations, the data processing method may further include a step of training the first computational complexity model, comprising: obtaining historical data analysis records, and training the first computational complexity model on those records. The historical data analysis records include the data volume of at least one historical data set together with the corresponding historical computational complexity and historical aggregation calculation type. The electronic device may obtain the historical data analysis records from memory, build a training set and a test set from them, train the first computational complexity model on the training set, and then adjust the parameters of the model based on the test set.
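A minimal sketch of such a training step is given below. The embodiment does not name a learning algorithm; fitting one least-squares line per aggregation type, the record layout (aggregation type, data volume, elapsed seconds), and all function names are assumptions made for this example. The test set is used here only to measure error, which a practitioner could then use to decide whether the parameters need adjusting.

```python
import random
from collections import defaultdict


def split_records(records, test_fraction=0.2, seed=0):
    """Shuffle the historical records and split them into train and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct data volumes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_complexity_model(train_records):
    """Return {aggregation type: (slope, intercept)} fitted from history."""
    by_type = defaultdict(list)
    for agg_type, volume, seconds in train_records:
        by_type[agg_type].append((volume, seconds))
    return {t: fit_line(points) for t, points in by_type.items()}


def mean_abs_error(model, test_records):
    """Average absolute gap between predicted and recorded cost."""
    errors = []
    for agg_type, volume, seconds in test_records:
        slope, intercept = model[agg_type]
        errors.append(abs(slope * volume + intercept - seconds))
    return sum(errors) / len(errors)
```

The fitted `(slope, intercept)` pair for an aggregation type plays the role of the first relational model: it relates calculation time to data volume for that type.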
In this embodiment, steps 401, 405 and 406 of the above process are identical to steps 201, 203 and 204 of the foregoing embodiment, respectively, and are not described again here.
Compared with the embodiment shown in Fig. 2, the flow 400 of the data processing method shown in Fig. 4 refines the step of sampling the data to be analyzed based on the aggregation calculation type and the preset computational complexity: the sample data volume is determined by a trained model, which further improves the reliability of the aggregation result of the sampled data.
With further reference to Fig. 5, a flowchart of yet another embodiment of the data processing method according to the present application is shown. The flow 500 of the data processing method comprises the following steps:
Step 501: obtain the data to be analyzed and the aggregation calculation type of the data to be analyzed.
In this embodiment, the electronic device on which the data processing method runs (for example, the server 105 shown in Fig. 1) may obtain the data to be analyzed from multiple terminal devices. The data to be analyzed may include network behavior data reported by the terminal devices. The electronic device may also obtain the aggregation calculation type of the data to be analyzed from a manual setting or from an aggregation algorithm determined according to the required aggregation result.
Step 502: obtain the available resource surplus.
In this embodiment, the electronic device may calculate the current computing resource surplus, that is, the available resource surplus. Computing resources may be the number of CPUs (Central Processing Units) and the amount of memory; for example, the total computing resources of the electronic device may be 1 CPU + 4 GB of memory and 2 CPUs + 8 GB of memory.
In some optional implementations, the electronic device may first determine its total available resources, then determine the amount of resources occupied by other running programs, and subtract the latter from the former to obtain the available resource surplus. For example, when the total computing resources of the electronic device are 1 CPU + 4 GB of memory and 2 CPUs + 8 GB of memory, and other applications occupy a total of 2 CPUs + 8 GB of memory, the available resource surplus is 1 CPU + 4 GB of memory.
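The subtraction in the example above can be written out as follows. The `Resources` type and the function `available_surplus` are hypothetical names introduced for this sketch; how the totals and the occupied amounts are measured on a real system is outside its scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpus: int
    memory_gb: int

    def __add__(self, other):
        return Resources(self.cpus + other.cpus, self.memory_gb + other.memory_gb)

    def __sub__(self, other):
        return Resources(self.cpus - other.cpus, self.memory_gb - other.memory_gb)


def available_surplus(total_parts, used_parts):
    """Total resources minus resources occupied by other running programs."""
    total = sum(total_parts, Resources(0, 0))
    used = sum(used_parts, Resources(0, 0))
    surplus = total - used
    if surplus.cpus < 0 or surplus.memory_gb < 0:
        raise ValueError("occupied resources exceed the total")
    return surplus
```

With a total of 1 CPU + 4 GB and 2 CPUs + 8 GB, and 2 CPUs + 8 GB occupied, the sketch yields `Resources(cpus=1, memory_gb=4)`, matching the example in the text.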
Step 503: input the aggregation calculation type of the data to be analyzed and the computing resource surplus into the trained second computational complexity model to obtain a second relational model between the computational complexity of the data to be analyzed and the volume of data to be analyzed.
In this embodiment, the sample data volume may be determined by machine learning. Specifically, the second relational model between computational complexity and the volume of data to be analyzed may be determined based on the second computational complexity model. The second relational model may be a mathematical expression, and the second computational complexity may include the calculation time and the amount of resources required for the calculation. The second computational complexity model may be a preset model whose input may be the aggregation calculation type and whose output may be the relational expression among calculation time, the amount of resources required for the calculation, and the volume of data to be analyzed. The input aggregation calculation type may include the count for each type of aggregation calculation, that is, the number of times each type of aggregation calculation is executed. Inputting the aggregation calculation type of the data to be analyzed into the trained second computational complexity model therefore yields an expression relating calculation time and the resources required for the calculation to the volume of data to be analyzed.
Step 504: determine, according to the second relational model, the volume of data to be analyzed that corresponds to the preset computational complexity, and use it as the sample data volume.
In this embodiment, the preset computational complexity includes a preset calculation time. Once the second relational model among calculation time, the resources required for the calculation, and the volume of data to be analyzed has been obtained, the volume of data to be analyzed that corresponds to the preset calculation time and to the available resource surplus obtained in step 502 can be determined from it. Specifically, if the second relational model is a mathematical expression relating calculation time and the resources required for the calculation to the volume of data to be analyzed, the corresponding volume can be calculated from that expression, the preset calculation time, and the available resource surplus, and used as the sample data volume.
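One possible shape of this calculation is sketched below. The embodiment does not give the form of the second relational model; the relations time = secs_per_record * n / cpus + fixed_secs and memory = gb_per_record * n, together with every parameter name, are assumptions chosen only to make the example concrete.

```python
import math


def sample_volume_with_resources(
    secs_per_record: float,  # assumed single-CPU time per record
    fixed_secs: float,       # assumed fixed start-up time
    gb_per_record: float,    # assumed memory needed per record
    preset_secs: float,      # preset calculation time
    cpus: int,               # available CPUs (resource surplus)
    memory_gb: float,        # available memory (resource surplus)
) -> int:
    """Largest n that fits both the time budget and the memory surplus."""
    if secs_per_record <= 0 or gb_per_record <= 0 or cpus <= 0:
        raise ValueError("per-record costs and CPU count must be positive")
    by_time = (preset_secs - fixed_secs) * cpus / secs_per_record
    by_memory = memory_gb / gb_per_record
    return max(0, math.floor(min(by_time, by_memory)))
```

Under these assumptions, a larger resource surplus raises the sample data volume until the time budget becomes the binding limit, which reflects the stated aim of making full use of available computing resources.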
Step 505: extract the sampled data from the data to be analyzed according to the sample data volume.
The electronic device may extract, from the data to be analyzed, an amount of data equal to the sample data volume determined in step 504 and use it as the sampled data. In some optional implementations, a sampling rate may instead be obtained as the ratio of the calculated volume to the total volume of the data to be analyzed, and sampling is then performed at that rate to obtain the sampled data.
Step 506: perform aggregation calculation on the sampled data.
In this embodiment, the electronic device may perform aggregation calculation on the sampled data. Aggregation calculation covers the various types of calculation operations used in statistical data analysis, such as summation, averaging, mean square deviation, maximum, and minimum.
In some embodiments, aggregation calculation may be performed on the sampled data and on the data to be analyzed at the same time. In other optional implementations, in order to obtain the aggregation result of the sampled data as early as possible, the aggregation calculation on the sampled data may be performed first, and the aggregation calculation on the data to be analyzed is started only after the calculation on the sampled data has finished.
Step 507: display the aggregation calculation result of the sampled data.
In this embodiment, the aggregation calculation result of the sampled data may be displayed on a visual interface configured on the electronic device or on a display device connected to it. Displaying the aggregation result of the sampled data quickly gives the user a rough statistical analysis of the data to be analyzed.
In this embodiment, steps 501, 506 and 507 of the above process are identical to steps 201, 203 and 204 of the foregoing embodiment, respectively, and are not described again here.
As can be seen from Fig. 5, compared with the embodiment shown in Fig. 4, the flow 500 of the data processing method provided by the present application adds step 502 of obtaining the available resource surplus. When the sample data volume is determined, both the available resource surplus and the preset computational complexity requirement are taken into account, so that system computing resources are fully utilized and data analysis results are provided faster.
Fig. 6 is a structural schematic diagram of one embodiment of the data processing device of the present application. As shown in Fig. 6, the data processing device 600 may include a first acquisition unit 601, a sampling unit 602, a computing unit 603 and a display unit 604. The first acquisition unit 601 is configured to obtain the data to be analyzed and the aggregation calculation type of the data to be analyzed; the sampling unit 602 is configured to sample the data to be analyzed based on the aggregation calculation type and a preset computational complexity to obtain sampled data; the computing unit 603 is configured to perform aggregation calculation on the sampled data; and the display unit 604 is configured to display the aggregation calculation result of the sampled data.
In this embodiment, the first acquisition unit 601 may obtain the data to be analyzed from multiple terminal devices. The data to be analyzed may include network behavior data reported by the terminal devices. The first acquisition unit 601 may also obtain the aggregation calculation type of the data to be analyzed from a manual setting or from an aggregation algorithm determined according to the required aggregation result.
The sampling unit 602 may determine a sampling rate based on the computational complexity of the data to be analyzed and the preset computational complexity, and sample the data to be analyzed at that rate to obtain the sampled data. Specifically, the sampling unit 602 may calculate, according to the preset aggregation calculation type, the calculation time and the amount of computing resources required for the aggregation calculation on the data to be analyzed, and use them as the computational complexity of the data to be analyzed. Optionally, the computational complexity of the data to be analyzed is positively correlated with its data volume, so the computational complexity is a variable that increases as the volume of data to be analyzed increases. In that case, the volume of data to be analyzed that corresponds to the preset computational complexity can be calculated and used as the sample data volume, and an amount of data equal to the sample data volume is extracted from the data to be analyzed as the sampled data.
In some optional implementations, the sampling unit 602 may sample the data to be analyzed obtained by the first acquisition unit 601 as follows: determine the sample data volume of the data to be analyzed based on the aggregation calculation type and the preset computational complexity; and extract the sampled data from the data to be analyzed according to the sample data volume.
In a further implementation, the sampling unit 602 may determine the sample data volume of the data to be analyzed by machine learning. One optional approach comprises: inputting the aggregation calculation type of the data to be analyzed into the trained first computational complexity model to obtain the first relational model between the computational complexity of the data to be analyzed and the volume of data to be analyzed; and determining, according to the first relational model, the volume of data to be analyzed that corresponds to the preset computational complexity, as the sample data volume.
Further, the data processing device 600 may also include a second acquisition unit configured to obtain the available resource surplus. In that case, the sampling unit 602 may determine the sample data volume of the data to be analyzed as follows: input the aggregation calculation type of the data to be analyzed and the computing resource surplus into the trained second computational complexity model to obtain the second relational model between the computational complexity of the data to be analyzed and the volume of data to be analyzed; and determine, according to the second relational model, the volume of data to be analyzed that corresponds to the preset computational complexity, as the sample data volume.
Alternatively, or in addition, the device 600 further includes a first training unit and a second training unit. The first training unit is configured to train the first computational complexity model as follows: obtain historical data analysis records, and train the first computational complexity model on those records. The historical data analysis records used to train the first computational complexity model include the data volume of at least one historical data set together with the corresponding historical computational complexity and historical aggregation calculation type. The second training unit is configured to train the second computational complexity model as follows: obtain historical data analysis records, and train the second computational complexity model on those records. The historical data analysis records used to train the second computational complexity model include the data volume of at least one historical data set together with the corresponding historical computational complexity, historical computing resource surplus and historical aggregation calculation type.
The computing unit 603 may perform aggregation calculation on the sampled data obtained by the sampling unit 602 according to preset aggregation calculation rules. Aggregation calculation may include several classes of aggregate functions, each of which performs a calculation on a group of data in the sampled data and returns a single value. An aggregate function may be a function defined by the user as required, or a statistical analysis function stored in the memory of the electronic device.
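The behavior described for aggregate functions (one calculation per group of data, one value returned) may be sketched as follows. The row layout, the field names `page` and `seconds`, and the function names are hypothetical and chosen only to resemble network behavior data.

```python
from collections import defaultdict


def aggregate_by_group(rows, key, value, func):
    """Group rows by `key` and reduce each group's `value` column with `func`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {group: func(values) for group, values in groups.items()}


def value_range(values):
    """Example of a user-defined aggregate function: max minus min."""
    return max(values) - min(values)
```

Built-in functions such as `sum` or `max` and user-defined functions such as `value_range` can be passed interchangeably, since each takes one group of values and returns a single value.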
The display unit 604 may display the calculation result of the computing unit 603. The display unit may present the aggregation calculation result of the sampled data in various forms, for example graphically or as a document.
In some optional implementations, the computational complexity includes the calculation time and/or the amount of resources required for the calculation.
It should be understood that the units described in the device 600 correspond to the steps of the methods described with reference to Figs. 2 to 5. The operations and features described above for the data processing method therefore apply equally to the device 600 and the units it contains, and are not described again here. The units of the device 600 may cooperate with units in a terminal device and/or a server to implement the solutions of the embodiments of the present application.
Those skilled in the art will understand that the data processing device 600 also includes other well-known structures, such as a processor and a memory. In order not to obscure the embodiments of the present disclosure unnecessarily, these well-known structures are not shown in Fig. 6.
The data processing device provided by the present application can quickly provide partial data analysis results that have reference value, improving the efficiency with which aggregation results for large-scale data are displayed.
Referring now to Fig. 7, a structural schematic diagram is shown of a computer system 700 suitable for implementing a terminal device or server of the embodiments of the present application.
As shown in Fig. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores the various programs and data required for the operation of the system 700. The CPU 701, the ROM 702 and the RAM 703 are connected to one another by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse and the like; an output section 707 including a cathode ray tube (CRT), a liquid crystal display (LCD) and the like, as well as a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disc, a magneto-optical disc or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to the various embodiments of the present application. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the boxes may occur in an order different from that shown in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should further be noted that each box in the block diagrams and/or flowcharts, and combinations of such boxes, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described, for example, as: a processor comprising a first acquisition unit, a sampling unit, a computing unit and a display unit. The names of these units do not, in certain cases, limit the units themselves; for example, the first acquisition unit may also be described as "a unit for obtaining data to be analyzed and the aggregation calculation type of the data to be analyzed".
In another aspect, the present application also provides a non-volatile computer storage medium. The non-volatile computer storage medium may be the one included in the device described in the above embodiments, or it may exist separately without being assembled into a terminal. The non-volatile computer storage medium stores one or more programs which, when executed by a device, cause the device to: obtain data to be analyzed and the aggregation calculation type of the data to be analyzed; sample the data to be analyzed based on the aggregation calculation type and a preset computational complexity to obtain sampled data; perform aggregation calculation on the sampled data; and display the aggregation calculation result of the sampled data.
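The four operations performed by such a program can be strung together as in the following sketch. The per-record cost table `COST_PER_RECORD`, its abstract cost units, the uniform random sampling, and the function name `run_pipeline` are all assumptions made for this example; they stand in for the relational model described in the embodiments.

```python
import random
from statistics import mean, pstdev

AGGREGATIONS = {"sum": sum, "mean": mean, "std": pstdev, "max": max, "min": min}

# Hypothetical cost per record for each aggregation type, in abstract units.
COST_PER_RECORD = {"sum": 1, "mean": 1, "std": 2, "max": 1, "min": 1}


def run_pipeline(records, agg_type, preset_cost, seed=0):
    """Sample within a cost budget, aggregate the sample, format for display."""
    # Sampling: the budget and the aggregation type fix the sample data volume.
    sample_volume = min(preset_cost // COST_PER_RECORD[agg_type], len(records))
    sample = random.Random(seed).sample(records, sample_volume)
    # Aggregation calculation on the sampled data only.
    result = AGGREGATIONS[agg_type](sample)
    # Display: returned as text here; a real system might render a chart.
    return f"{agg_type} over {sample_volume} of {len(records)} records: {result}"
```

The returned text makes the sample size explicit, so a reader of the displayed result can tell that it is a rough estimate rather than the aggregate of the full data set.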
The above description covers only the preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combinations of the above technical features; it also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (12)

1. A data processing method, characterized by comprising:
obtaining data to be analyzed and an aggregation calculation type of the data to be analyzed, the data to be analyzed being network behavior data of terminal devices;
inputting the aggregation calculation type of the data to be analyzed into a trained first computational complexity model to obtain a first relational model between the computational complexity of the data to be analyzed and the volume of data to be analyzed;
determining, according to the first relational model, the volume of data to be analyzed that corresponds to a preset computational complexity, as the sample data volume of the data to be analyzed;
extracting sampled data from the data to be analyzed according to the sample data volume;
performing aggregation calculation on the sampled data;
displaying the aggregation calculation result of the sampled data, the aggregation calculation result being statistical information on network behavior.
2. The method according to claim 1, characterized in that the method further comprises a step of training the first computational complexity model, comprising:
obtaining historical data analysis records, the historical data analysis records including the data volume of at least one historical data set together with the corresponding historical computational complexity and historical aggregation calculation type;
training the first computational complexity model on the historical data analysis records.
3. The method according to claim 1 or 2, characterized in that the computational complexity comprises: the calculation time and/or the amount of resources required for the calculation.
4. A data processing method, characterized by comprising:
obtaining data to be analyzed and an aggregation calculation type of the data to be analyzed, the data to be analyzed being network behavior data of terminal devices;
obtaining an available computing resource surplus;
inputting the aggregation calculation type of the data to be analyzed and the computing resource surplus into a trained second computational complexity model to obtain a second relational model between the computational complexity of the data to be analyzed and the volume of data to be analyzed;
determining, according to the second relational model, the volume of data to be analyzed that corresponds to a preset computational complexity, as the sample data volume of the data to be analyzed;
extracting sampled data from the data to be analyzed according to the sample data volume;
performing aggregation calculation on the sampled data;
displaying the aggregation calculation result of the sampled data, the aggregation calculation result being statistical information on network behavior.
5. The method according to claim 4, characterized in that the method further comprises a step of training the second computational complexity model, comprising:
obtaining historical data analysis records, the historical data analysis records including the data volume of at least one historical data set together with the corresponding historical computational complexity, historical computing resource surplus and historical aggregation calculation type;
training the second computational complexity model on the historical data analysis records.
6. The method according to claim 4 or 5, characterized in that the computational complexity comprises: the calculation time and/or the amount of resources required for the calculation.
7. A data processing device, characterized by comprising:
a first acquisition unit, configured to obtain data to be analyzed and an aggregation calculation type of the data to be analyzed, the data to be analyzed being network behavior data of terminal devices;
a sampling unit, configured to input the aggregation calculation type of the data to be analyzed into a trained first computational complexity model to obtain a first relational model between the computational complexity of the data to be analyzed and the volume of data to be analyzed, to determine, according to the first relational model, the volume of data to be analyzed that corresponds to a preset computational complexity, as the sample data volume of the data to be analyzed, and to extract sampled data from the data to be analyzed according to the sample data volume;
a computing unit, configured to perform aggregation calculation on the sampled data;
a display unit, configured to display the aggregation calculation result of the sampled data, the aggregation calculation result being statistical information on network behavior.
8. The device according to claim 7, characterized in that the device further comprises a first training unit, configured to train the first computational complexity model as follows:
obtaining historical data analysis records, the historical data analysis records including the data volume of at least one historical data set together with the corresponding historical computational complexity and historical aggregation calculation type;
training the first computational complexity model on the historical data analysis records.
9. The device according to claim 7 or 8, characterized in that the computational complexity comprises: the calculation time and/or the amount of resources required for the calculation.
10. A data processing device, characterized by comprising:
a first acquisition unit, configured to obtain data to be analyzed and an aggregation calculation type of the data to be analyzed, the data to be analyzed being network behavior data of terminal devices;
a second acquisition unit, configured to obtain an available computing resource surplus;
a sampling unit, configured to input the aggregation calculation type of the data to be analyzed and the computing resource surplus into a trained second computational complexity model to obtain a second relational model between the computational complexity of the data to be analyzed and the volume of data to be analyzed, to determine, according to the second relational model, the volume of data to be analyzed that corresponds to a preset computational complexity, as the sample data volume of the data to be analyzed, and to extract sampled data from the data to be analyzed according to the sample data volume;
a computing unit, configured to perform aggregation calculation on the sampled data;
a display unit, configured to display the aggregation calculation result of the sampled data, the aggregation calculation result being statistical information on network behavior.
11. The device according to claim 10, characterized in that the device further comprises a second training unit, configured to train the second computational complexity model as follows:
obtaining historical data analysis records, the historical data analysis records including the data volume of at least one historical data set together with the corresponding historical computational complexity, historical computing resource surplus and historical aggregation calculation type;
training the second computational complexity model on the historical data analysis records.
12. The device according to claim 10 or 11, characterized in that the computational complexity comprises: the calculation time and/or the amount of resources required for the calculation.
CN201610197491.2A 2016-03-31 2016-03-31 Data processing method and device Active CN105844107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610197491.2A CN105844107B (en) 2016-03-31 2016-03-31 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610197491.2A CN105844107B (en) 2016-03-31 2016-03-31 Data processing method and device

Publications (2)

Publication Number Publication Date
CN105844107A CN105844107A (en) 2016-08-10
CN105844107B true CN105844107B (en) 2019-10-15

Family

ID=56596374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610197491.2A Active CN105844107B (en) 2016-03-31 2016-03-31 Data processing method and device

Country Status (1)

Country Link
CN (1) CN105844107B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110737679B (en) * 2018-07-03 2022-06-14 百度在线网络技术(北京)有限公司 Data resource query method, device, equipment and storage medium
CN110737691B (en) * 2018-07-03 2022-11-04 百度在线网络技术(北京)有限公司 Method and apparatus for processing access behavior data
CN110309235B (en) * 2019-06-28 2022-01-07 京东科技控股股份有限公司 Data processing method, device, equipment and medium
CN112185575B (en) * 2020-10-14 2024-01-16 北京嘉和美康信息技术有限公司 Method and device for determining medical data to be compared
CN113779150B (en) * 2021-09-14 2024-06-18 杭州数梦工场科技有限公司 Data quality assessment method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102262678A (en) * 2011-08-16 2011-11-30 郑毅 System for sampling mass data and managing sampled data
CN102946319A (en) * 2012-09-29 2013-02-27 焦点科技股份有限公司 System and method for analyzing network user behavior information
CN104317877A (en) * 2014-10-21 2015-01-28 上海交通大学 Netuser behavior data real-time processing method based on distributed computation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040077020A1 (en) * 2001-11-30 2004-04-22 Mannick Elizabeth E. Diagnostic microarray for inflammatory bowel disease, crohn's disease and ulcerative colitis

Also Published As

Publication number Publication date
CN105844107A (en) 2016-08-10

Similar Documents

Publication Publication Date Title
CN105844107B (en) Data processing method and device
CN105320766B (en) Information-pushing method and device
CN107943583B (en) Application processing method and device, storage medium and electronic equipment
CN105117491B (en) Page push method and apparatus
US10657559B2 (en) Generating and utilizing a conversational index for marketing campaigns
US10706454B2 (en) Method, medium, and system for training and utilizing item-level importance sampling models
CN109598576A (en) Service recommendation method, device and equipment
CN111626767B (en) Resource data issuing method, device and equipment
CN109976997A (en) Test method and device
WO2019062405A1 (en) Application program processing method and apparatus, storage medium, and electronic device
US20170178144A1 (en) Synchronized communication platform
CN112948695A (en) User portrait based general financial fast loan product recommendation method and device
US11551256B2 (en) Multivariate digital campaign content exploration utilizing rank-1 best-arm identification
CN108700928A (en) Content is managed based on battery utilization rate when showing content on device
CN111552835A (en) File recommendation method and device and server
CN113450230A (en) Financing risk assessment method and device, storage medium and electronic equipment
CN110046571A (en) The method and apparatus at age for identification
CN112905879B (en) Recommendation method, recommendation device, server and storage medium
CN111683280A (en) Video processing method and device and electronic equipment
CN115037665B (en) Equipment testing method and device
CN109992614B (en) Data acquisition method, device and server
CN115760296A (en) Page data processing and browsing method, terminal device and storage medium
CN112269942B (en) Method, device and system for recommending object and electronic equipment
CN114510668A (en) Data display method and device, computer equipment and storage medium
CN113468354A (en) Method and device for recommending chart, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant