CN108984700A - Data processing method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN108984700A
Authority
CN
China
Prior art keywords
data
sampling
sampling function
sample survey
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810729988.3A
Other languages
Chinese (zh)
Other versions
CN108984700B (en
Inventor
王炼
吕远方
卢力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810729988.3A
Publication of CN108984700A
Application granted
Publication of CN108984700B
Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

A data processing method and device, computer equipment, and a storage medium. The data processing method includes: obtaining data to be processed; obtaining a sampling expression corresponding to the data to be processed; parsing the sampling expression to obtain a sampling function and a sampling parameter value corresponding to the sampling function; and sampling the data to be processed based on the sampling function and the corresponding sampling parameter value to determine sampled data. The method can improve sampling efficiency.

Description

Data processing method and device, computer equipment and storage medium
Technical field
The present application relates to the field of computer technology, and in particular to a data processing method and device, computer equipment, and a storage medium.
Background
With the development of computer technology and mobile Internet technology, the variety of applications keeps growing, bringing convenience to users. For example, a mobile application download platform makes it convenient for users to download applications. During the development of any application, each of its functions needs to be tested to ensure that the application runs normally and stably, and developers need data to carry out those tests. To keep large volumes of data from slowing testing down, the data are sampled, which reduces the amount of data in the test process and improves testing efficiency.
During sampling, the data to be processed are read by data input code. At present, sampling logic (which can be understood as sampling code) is added to the data input code. That is, the data input code is modified so that the data it reads are sampled according to the sampling logic, yielding the sampled data.
However, with this process, every sampling run requires writing sampling logic, modifying the data input code, and then executing the sampling logic, which makes sampling inefficient.
Summary of the invention
In view of this, it is necessary to provide a data processing method and device, computer equipment, and a storage medium that address the low sampling efficiency of the existing sampling process.
A data processing method, comprising the steps of:
obtaining data to be processed;
obtaining a sampling expression;
parsing the sampling expression to obtain a sampling function and a sampling parameter value corresponding to the sampling function; and
sampling the data to be processed based on the sampling function and the corresponding sampling parameter value to determine sampled data.
A data processing device, comprising:
a data acquisition module, configured to obtain data to be processed;
an expression acquisition module, configured to obtain a sampling expression;
a parsing module, configured to parse the sampling expression to obtain a sampling function and a sampling parameter value corresponding to the sampling function; and
a sampling module, configured to sample the data to be processed based on the sampling function and the corresponding sampling parameter value to determine sampled data.
Computer equipment, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the above method when executing the computer program.
A computer-readable storage medium storing a computer program, wherein the steps of the above method are implemented when the computer program is executed by a processor.
With the above data processing method and device, computer equipment, and storage medium, a sampling expression is obtained during data sampling, and the data to be processed can be sampled using the sampling function in the expression and its corresponding sampling parameter value. There is no need to update sampling logic or modify the data input code for each sampling run. Obtaining a sampling expression is enough to sample with its sampling function and sampling parameter value, which simplifies the sampling steps and improves sampling efficiency.
Brief description of the drawings
Fig. 1 is a schematic diagram of the application environment of the data processing method in one embodiment;
Fig. 2 is a schematic flowchart of the data processing method in one embodiment;
Fig. 3 is a diagram of the existing sampling approach;
Fig. 4 is a sampling diagram corresponding to the data processing method of one embodiment;
Fig. 5 is a diagram of the expression configuration interface of one embodiment;
Fig. 6 is a schematic diagram of combined sampling using the skip sampling function and the quantity limit sampling function in one embodiment;
Fig. 7 is a schematic diagram of sampling using the interval sampling function in one embodiment;
Fig. 8 is a schematic diagram of sampling using the random sampling function in one embodiment;
Fig. 9 is a schematic diagram of the modules of the data processing device of one embodiment;
Fig. 10 is a diagram of the internal structure of the computer equipment in one embodiment.
Detailed description
To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit its scope of protection.
The data processing method of the embodiments provided by the application can be applied in the application environment shown in Fig. 1. The environment involves a terminal 10 and a server 20, which communicate over a network. The method can be applied in the server 20: the server 20 samples data with the method to determine sampled data, and can test the functions of an application under test based on the sampled data to determine a test result. After the test passes, the functions of the application under test can go online for users. That is, the terminal 10 can download the application under test by accessing the server 20, and the user can use its functions on the terminal 10. The server 20 can be implemented as an independent server or as a cluster of multiple servers.
In one embodiment, as shown in Fig. 2, a data processing method is provided. Taking its application to the server 20 in Fig. 1 as an example, the method includes the following steps S210 to S240.
S210: obtain data to be processed.
The data to be processed can be understood as the data to be sampled, that is, the data from which samples are drawn. They may include article content, commodity information, network-received data, and so on. Article content may be an introduction to an application, operating instructions, reviews, and the like, and may be obtained by crawling the network with a web crawler. The article content may be stored in a database in advance and obtained by reading the database. Commodity information may be provided by a third-party platform, for example flash-sale commodity information on an e-commerce platform. Here, a third-party platform refers to server equipment that belongs to a different platform from the server 20 currently performing the data processing. Network-received data can be understood as data pulled by accessing a network interface. For example, a news application may offer daily curated content (such as selected answers and column features) from an online question-and-answer community, where users from all walks of life share knowledge, experience, and opinions and discuss particular topics. The news application provides an open interface, and accessing that interface pulls the curated content.
S220: obtain a sampling expression.
S230: parse the sampling expression to obtain a sampling function and a sampling parameter value corresponding to the sampling function.
The sampling expression can be understood as the sampling condition. It corresponds to the data to be processed, i.e., it expresses the condition under which they are sampled. A sampling expression may include a sampling function and a sampling parameter value. The sampling function is a function that implements sampling and can also be understood as a sampling rule, i.e., the requirement or rule to follow during sampling. During sampling, data that satisfy the rule are drawn from the data to be processed. The sampling parameter value is a parameter value supplied to the sampling process, and it further constrains sampling on top of the sampling rule. Together, the sampling function and its sampling parameter value constitute the sampling condition for the data to be processed.
For example, if data should be drawn from the data to be processed once every spacing value, the sampling function in the obtained sampling expression must sample at that interval, and the spacing value must also be set to fix the size of the interval, i.e., to constrain the sampling rule. The sampling function and the spacing value together constitute the sampling condition: the data to be processed are sampled once every spacing value.
S240: sample the data to be processed based on the sampling function and the corresponding sampling parameter value to determine sampled data.
The server stores the code for each sampling function in advance, i.e., sampling code that carries out that function's sampling process. After the sampling expression is parsed into a sampling function and a sampling parameter value, the corresponding sampling code is executed with them to sample the data to be processed and obtain the sampled data.
With the above data processing method, a sampling expression is obtained during data sampling, and the data to be processed can be sampled using the sampling function in the expression and its corresponding sampling parameter value. There is no need to update sampling logic or modify the data input code for each sampling run. Obtaining a sampling expression is enough to sample with its sampling function and sampling parameter value, which simplifies the sampling steps and improves sampling efficiency.
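Steps S230 and S240 can be sketched roughly as follows. The text does not specify the grammar of a sampling expression, so the `name(value)` syntax, the function names `skip` and `limit`, and the `SAMPLERS` registry are assumptions made purely for illustration, not the patented implementation.

```python
import re
from itertools import islice

# Hypothetical grammar: a single call of the form "name(value)", e.g. "skip(5)".
_CALL = re.compile(r"^\s*([A-Za-z_]\w*)\s*\(\s*(\d+(?:\.\d+)?)\s*\)\s*$")

# Sampling code stored in advance, keyed by an assumed function name.
SAMPLERS = {
    "skip": lambda records, n: list(islice(records, n, None)),
    "limit": lambda records, n: list(islice(records, n)),
}


def parse_sampling_expression(expression):
    """Split an expression into (sampling function name, sampling parameter value)."""
    match = _CALL.match(expression)
    if match is None:
        raise ValueError(f"cannot parse sampling expression: {expression!r}")
    name, raw = match.groups()
    value = float(raw) if "." in raw else int(raw)
    return name, value


def sample(expression, records):
    """Parse the expression, then run the pre-stored sampling code it names."""
    name, value = parse_sampling_expression(expression)
    if name not in SAMPLERS:
        raise ValueError(f"unknown sampling function: {name}")
    return SAMPLERS[name](records, value)
```

With this arrangement, changing how the data are sampled means changing only the expression string, for example from `skip(5)` to `limit(3)`, while the data input code stays untouched.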
In one embodiment, after the sampled data are determined, the method may further include the step of filtering the sampled data to determine filtered sampled data.
In one example, after the filtered sampled data are determined, statistical processing may also be performed on them to obtain a data statistics result.
Any sampled data obtained may contain noise such as garbled characters. To ensure data accuracy, the sampled data are filtered, i.e., the noise in them is removed. After filtering, statistics can be computed to obtain a statistical result that makes the data easier to understand. For example, for article content, a word count can be performed to obtain a word count result. It can be understood that this filtering and statistics process can be applied each time sampled data are obtained.
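A minimal sketch of the filtering and statistics steps might look like the following. The text does not define what counts as noise, so treating blank records and records containing the Unicode replacement character as garbled is an assumption, as are the function names.

```python
# U+FFFD is the Unicode replacement character, a common sign of mis-decoded text.
REPLACEMENT_CHAR = "\ufffd"


def filter_noise(records):
    """Keep only records that are non-blank and show no sign of garbling (assumed rule)."""
    return [r for r in records if r.strip() and REPLACEMENT_CHAR not in r]


def word_count_stats(records):
    """Count whitespace-separated words across the filtered records."""
    counts = [len(r.split()) for r in records]
    return {"records": len(counts), "total_words": sum(counts)}
```

Whitespace splitting is only a reasonable word count for space-delimited languages; article content in Chinese would need a character count or a tokenizer instead.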
In one embodiment, obtaining the sampling expression corresponding to the data to be processed includes: receiving a sampling expression obtained in response to an interactive operation on the expression input box in an expression configuration interface.
That is, in this application the sampling process is implemented in B/S (Browser/Server) mode. The server can send the expression configuration interface to the browser, which displays it. The user can then interact with the expression input box in the displayed interface. One interaction is an input operation, in which the sampling expression is typed into the input box. Another is a selection operation: the browser records historical sampling expressions, and one of them can be selected from the drop-down list attached to the input box as the sampling expression for the current data to be processed. The browser responds to the interaction to obtain the sampling expression and then transmits it to the server, which thereby obtains the sampling expression. In other words, a simple interaction in the expression configuration interface lets the browser obtain the sampling expression and pass it to the server, and the server can sample once it has the expression. The steps are simple, no code needs to be modified, and sampling efficiency improves.
In one embodiment, obtaining data to be processed includes: reading each data source based on an iterator to obtain the data to be processed.
An iterator, also called a cursor, is an interface for traversing a container (such as a linked list or an array) without concern for the container's contents. An iterator is an object that can traverse some or all of the elements of a Standard Template Library container, and each iterator object represents a definite address in the container. Iterators adapt the interface of conventional pointers. An iterator is a conceptual abstraction: anything that behaves like an iterator can be called one. Iterators come with different capabilities, and they tie abstract containers and generic algorithms together.
The job of an iterator is to traverse and select the objects in a sequence. It provides a way to access each element of a container object without exposing the container's internal details. With an iterator, a container can be traversed without knowledge of its underlying structure. Because creating an iterator is cheap, iterators are commonly described as lightweight.
Reading one datum through the iterator amounts to inputting one datum. A data source can be viewed as an ordered sequence whose next datum is read on demand by the iterator's next() function, without knowing the length of the sequence. A datum is read only when the next one needs to be returned. For example, the iterator can check whether the container still holds data and, if so, use next() to obtain the next datum.
Data structures differ between data sources, so the data are heterogeneous. That is, the data from different sources may differ in structure. Sampling requires reading every type of data source to determine the data to be processed, which supplies the data for the subsequent sampling process. In this embodiment, each data source is read through an iterator to obtain the data to be processed. The data sources may include a source for article content, a source for commodity information, and a source for network-received data.
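The iterator-based reading described above can be sketched as follows. The function names and the three toy sources are hypothetical, and real sources would be database cursors, files, or network responses rather than in-memory collections.

```python
def read_records(source):
    """Read one data source record by record through an iterator."""
    iterator = iter(source)
    while True:
        try:
            record = next(iterator)  # fetch the next record only when it is needed
        except StopIteration:        # the source has no more data
            return
        yield record


def read_all(sources):
    """Read heterogeneous data sources one after another through the same interface."""
    for source in sources:
        yield from read_records(source)
```

Because every source is consumed only through `iter()` and `next()`, the reader never needs to know whether a source is a list, a tuple, a generator, or something else, nor how long it is.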
In one embodiment, the sampling function includes a skip sampling function, and the sampling parameter value corresponding to the skip sampling function is a skip count.
In this embodiment, sampling the data to be processed based on the sampling function and the corresponding sampling parameter value to determine sampled data includes: based on the data order of the data to be processed, determining the sampled data from the data positioned after the skip count in the data order.
That is, in this embodiment the data to be processed are sampled based on the skip sampling function. Each datum in the data to be processed has a position in the data order, so the data can be understood as sequential data. The skip sampling function corresponds to a skip sampling rule: during sampling, the first skip-count data in the data order are skipped and not sampled, and the data positioned after the skip count are taken as the sampled data. Skipping the leading data reduces the amount of processing. The quantity of data to be processed may be greater than the skip count, or less than or equal to it. In one example, when the quantity is greater than the skip count, the data positioned after the skip count are taken as the sampled data. When the quantity is less than or equal to the skip count, the data to be processed do not meet the requirement of skip sampling, and sampling stops.
For example, the data to be processed contain 10 data, in order A1, A2, A3, A4, A5, A6, A7, A8, A9, and A10, and the skip count is 5. The first 5 data in the data order, A1, A2, A3, A4, and A5, are skipped and not sampled. The data positioned after the 5th are taken as the sampled data, so the sampled data are A6, A7, A8, A9, and A10.
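The skip sampling rule can be sketched in a few lines. The function name `skip_sample` is hypothetical, and returning an empty list stands in for "sampling stops" when there are no more data than the skip count.

```python
from itertools import islice


def skip_sample(records, skip_count):
    """Drop the first skip_count records and keep everything after them."""
    if skip_count < 0:
        raise ValueError("skip_count must be non-negative")
    # islice works on any iterable, so the skipped records are never stored.
    return list(islice(records, skip_count, None))
```

Applied to A1 through A10 with a skip count of 5, this returns A6 through A10, matching the example above.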
In one embodiment, the sampling function includes a quantity limit sampling function, and the sampling parameter value corresponding to the quantity limit sampling function is a sample size threshold.
In this embodiment, sampling the data to be processed based on the sampling function and the corresponding sampling parameter value to determine sampled data includes: based on the data order of the data to be processed, determining the first sample-size-threshold data in the data order as the sampled data.
That is, in this embodiment the data to be processed are sampled based on the quantity limit sampling function. To keep an overly large sample from putting pressure on subsequent processing, the sample size can be limited during sampling. The sample size threshold corresponding to the quantity limit sampling function is the upper limit on the sample size. The quantity limit sampling function corresponds to a quantity limit sampling rule: during sampling, the number of sampled data must not exceed the sample size threshold. When no other sampling function is combined with the quantity limit sampling function, determining the first sample-size-threshold data in the data order as the sampled data implements sampling under the quantity limit sampling rule. In other words, the quantity limit sampling function takes the first sample-size-threshold data in the data order as the sampled data.
For example, the data to be processed contain 10 data, in order A1, A2, A3, A4, A5, A6, A7, A8, A9, and A10, and the sample size threshold is 5. Sampling under the quantity limit sampling rule yields the sampled data A1, A2, A3, A4, and A5.
The quantity of data to be processed may be greater than or equal to the sample size threshold, or less than it. In one embodiment, the step of determining the first sample-size-threshold data in the data order as the sampled data includes: when the quantity of data to be processed is greater than or equal to the sample size threshold, determining the first sample-size-threshold data in the data order as the sampled data; and when the quantity of data to be processed is less than the sample size threshold, determining all of the data to be processed as the sampled data.
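A minimal sketch of the quantity limit rule follows, with `limit_sample` as a hypothetical name. Both branches described above fall out of a single call, because taking the first N items of a shorter sequence simply returns the whole sequence.

```python
from itertools import islice


def limit_sample(records, threshold):
    """Keep at most `threshold` records from the front of the data order."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    # Stops reading as soon as `threshold` records have been taken.
    return list(islice(records, threshold))
```

Because `islice` stops early, records beyond the threshold are never read from the source, which matters when the source is a large file or a network stream.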
In one embodiment, the sampling function includes an interval sampling function, and the sampling parameter value corresponding to the interval sampling function is a spacing value.
In this embodiment, sampling the data to be processed based on the sampling function and the corresponding sampling parameter value to determine sampled data includes: based on the data order of the data to be processed, determining the data spaced apart by the spacing value as the sampled data, where the sampled data include the datum that is first in the data order.
That is, in this embodiment the data to be processed are sampled based on the interval sampling function. The interval sampling function corresponds to an interval sampling rule, i.e., a rule that samples the data to be processed once every spacing value, which can be understood as taking the data whose positions in the data order are separated by the spacing value. In this embodiment, the data to be processed are sampled at equal intervals, or in other words at a fixed spacing value. Specifically, based on the data order, the datum that is first in the data order is taken as the current datum and determined as sampled data. Then, while the number of data after the current datum is greater than or equal to the spacing value, the datum whose position differs from that of the current datum by the spacing value becomes the new current datum, and the step of determining the current datum as sampled data is repeated. This continues until the number of data after the current datum is less than the spacing value, at which point the sampled data are determined.
For example, the data to be processed contain 10 data, in order A1, A2, A3, A4, A5, A6, A7, A8, A9, and A10. The spacing value is 3, and sampling under the interval sampling rule yields sampled data that include A1, A4, and A7.
Sampling with the interval sampling function covers the entire data set, i.e., all of the data to be processed, which ensures that the sampled data are well distributed. For the same data to be processed, repeated sampling with the interval sampling function yields the same sampled data every time, which makes it easier to locate problem data and resolve issues.
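The stepwise interval rule can be sketched as below, with `interval_sample` as a hypothetical name. The loop follows the rule literally: take the first datum, then keep advancing by the spacing value while enough data remain. Applied literally to ten items with a spacing value of 3, this rule also selects the tenth item, since three items still follow A7; how the final boundary is handled is therefore an assumption of this sketch.

```python
def interval_sample(records, spacing):
    """Take the first record, then every record `spacing` positions further on."""
    if spacing < 1:
        raise ValueError("spacing must be at least 1")
    records = list(records)
    sampled = []
    index = 0
    while index < len(records):
        sampled.append(records[index])  # the current datum becomes sampled data
        index += spacing                # move on by the spacing value
    return sampled
```

The function is deterministic, so sampling the same input twice gives identical output, which is the reproducibility property noted above.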
In one embodiment, the sampling function includes a random sampling function, and the sampling parameter value corresponding to the random sampling function is a random probability value.
In this embodiment, sampling the data to be processed based on the sampling function and the corresponding sampling parameter value to determine sampled data includes: randomly sampling the data to be processed based on the random probability value to determine the sampled data.
That is, in this embodiment the data to be processed are sampled based on the random sampling function. The random sampling function corresponds to a random sampling rule, i.e., a rule that samples the data to be processed at random, specifically with a fixed random probability value. Sampling with the random sampling function covers the entire data set, and whether any given datum is drawn is uncertain, which ensures that the sampled data are evenly distributed.
For example, the data to be processed contain 25 data, in order A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, and A25. The random probability value is 0.5, and sampling under the random sampling rule may yield sampled data such as A2, A3, A6, A8, A10, A11, A12, A14, A19, A21, A22, and A25.
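One plausible reading of sampling with a fixed random probability value is an independent keep-or-drop decision per datum, sketched below. The text does not say how the probability is applied, so this per-record interpretation, the name `random_sample`, and the optional `seed` parameter are all assumptions.

```python
import random


def random_sample(records, probability, seed=None):
    """Keep each record independently with the given probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    rng = random.Random(seed)  # a private generator; a fixed seed makes runs repeatable
    # rng.random() returns a float in [0.0, 1.0), so 0 keeps nothing and 1 keeps everything.
    return [record for record in records if rng.random() < probability]
```

With a probability of 0.5 the sample size varies from run to run around half of the input, which is consistent with the example above drawing 12 of 25 data.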
In one embodiment, the sampling function includes at least one of the skip sampling function, the quantity limit sampling function, the interval sampling function, and the random sampling function.
Besides sampling with a single one of these functions, combined sampling is also possible. The skip sampling function, the quantity limit sampling function, the interval sampling function, and the random sampling function can be combined in any way to sample the data to be processed, which further ensures the diversity and uniformity of the sampled data.
If a sampling expression contains more than one sampling function, the sampling functions in it have an order. Parsing the sampling expression then yields not only each sampling function and its corresponding sampling parameter value but also the order of the sampling functions. That is, when the sampling functions in the sampling expression are ordered, the step of sampling the data to be processed based on the sampling function and the corresponding sampling parameter value to determine sampled data may include: sampling the data to be processed based on the sampling functions, their corresponding sampling parameter values, and the order of the sampling functions to determine the sampled data.
After one sampling function has sampled, its output becomes the input of the next sampling function. That is, each sampling function samples from the data produced by the preceding one, and applying the sampling functions in order yields the sampled data. It can be understood that when there are at least two sampling functions, sampling is serial: the first sampling function in the order samples the data to be processed, the sampling function immediately after it samples the resulting data, and so on, until every sampling function in the sampling expression has been applied.
Whichever of the skip, quantity limit, interval, and random sampling functions the sampling expression includes, each function is applied in turn according to the order, and each individual function samples by the same principle as when the sampling expression contains only that function. For example, if the sampling expression includes a skip sampling function followed by a quantity limit sampling function, the skip sampling function samples the data to be processed exactly as it would if it were the only function in the expression. The quantity limit sampling function then samples the data produced by the skip sampling function, again by the same principle as when it is the only function. The only difference is the data being sampled: when the quantity limit sampling function is the only function, it samples the data to be processed, whereas here it samples the data determined by the skip sampling function.
In one embodiment, the number of sampling functions is 2, i.e., the sampling functions are any two of the skip sampling function, the quantity limit sampling function, the interval sampling function, and the random sampling function. Sampling the data to be processed based on the sampling functions and the corresponding sampling parameter values then proceeds as follows: the data to be processed are first sampled based on the earlier sampling function and its sampling parameter value to determine first candidate sampled data, and the first candidate sampled data are then sampled based on the later sampling function and its sampling parameter value to determine the sampled data.
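Serial combination can be sketched as a loop that feeds each function's output into the next. The helper names and the list-of-pairs representation of a parsed expression are assumptions made for illustration.

```python
def skip_sample(records, skip_count):
    """Drop the first skip_count records."""
    return records[skip_count:]


def interval_sample(records, spacing):
    """Keep the first record and every spacing-th record after it."""
    return records[::spacing]


def sample_in_order(records, steps):
    """Apply (sampling function, parameter value) pairs one after another."""
    current = list(records)
    for sampler, value in steps:
        # The output of one sampling function is the input of the next.
        current = sampler(current, value)
    return current
```

Order matters: on A1 through A10, skipping 2 and then sampling at a spacing of 3 gives A3, A6, A9, whereas sampling at a spacing of 3 and then skipping 2 gives A7, A10.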
In one embodiment, the sampling functions include a skip sampling function and a quantity-limit sampling function, with the skip sampling function ordered before the quantity-limit sampling function. The sampling parameter value of the skip sampling function is a skip count, and the sampling parameter value of the quantity-limit sampling function is a sample quantity threshold.
In this embodiment, sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data includes: sampling the pending data based on the skip sampling function and the skip count to determine first undetermined sample data; and sampling the first undetermined sample data based on the quantity-limit sampling function and the sample quantity threshold to determine the sample data.
In one embodiment, when the pending data contains more unextracted items than the skip count, the data positioned after the skip count in the data order of the pending data is determined as the first undetermined sample data. Sample data is then extracted from the first undetermined sample data, and the quantity of sample data is less than or equal to the sample quantity threshold.
That is, in this embodiment, sampling combines the skip sampling function and the quantity-limit sampling function. During sampling, data is skipped first, and it is then determined whether the pending data contains more unextracted items than the skip count. If not, sampling stops. If so, the sample data is determined from the data positioned after the skip count in the pending data (that is, the first undetermined sample data), which satisfies both the skip sampling rule and the quantity-limit sampling rule. Specifically, when the quantity of data after the skip count is greater than or equal to the sample quantity threshold, the leading items after the skip count, up to the sample quantity threshold, are determined as the sample data; that is, the leading threshold number of items in the first undetermined sample data are taken as the sample data. When the quantity of data after the skip count is less than the sample quantity threshold, all data positioned after the skip count is determined as the sample data; that is, the first undetermined sample data is taken as the sample data. In one example, the sample data includes the item whose position in the pending data is the skip count plus one.
For example, the pending data includes 10 items, in order A1, A2, A3, A4, A5, A6, A7, A8, A9, and A10; the skip count is 5 and the sample quantity threshold is 2. The first 5 items in the data order are skipped, that is, A1, A2, A3, A4, and A5 are not sampled. The first undetermined sample data is A6, A7, A8, A9, and A10 (the data positioned after the fifth item). The first 2 of these items are taken as the sample data, so the sample data includes A6 and A7.
In one embodiment, extracting sample data from the first undetermined sample data includes: initializing the sample quantity to zero; selecting, based on the data order of the first undetermined sample data, its first item (that is, the first item after the skip count) as the current data; when the sample quantity is less than the sample quantity threshold and the current data meets a preset requirement, determining the current data as sample data and incrementing the sample quantity by one; and, when the first undetermined sample data contains data after the current data, taking the item immediately following the current data as the new current data and returning to the step of determining the current data as sample data when the sample quantity is less than the sample quantity threshold and the current data meets the preset requirement. This repeats until the sample quantity equals the sample quantity threshold or no data remains after the current data in the first undetermined sample data (that is, the quantity of data after the current data is zero). In addition, when the current data does not meet the preset requirement, the current data is discarded, and the step of taking the next item in the first undetermined sample data as the current data is performed directly.
For example, the pending data includes 10 items, in order A1, A2, A3, A4, A5, A6, A7, A8, A9, and A10; the skip count is 5 and the sample quantity threshold is 2. First, the sixth item, A6, is taken as the current data. The sample quantity is zero, which is less than 2, and A6 meets the preset requirement, so A6 is determined as sample data and the sample quantity is incremented to 1. The data after A6 includes A7, A8, A9, and A10, so A7 becomes the current data. The sample quantity is still less than 2 and A7 meets the preset requirement, so A7 is determined as sample data and the sample quantity is incremented to 2. The data after A7 includes A8, A9, and A10; however, the sample quantity is now 2, which equals the sample quantity threshold, so the termination condition is met and sampling ends. The determined sample data is A6 and A7.
If the current data A7 does not meet the preset requirement, A8 becomes the current data. The sample quantity is less than 2 and A8 meets the preset requirement, so A8 is determined as sample data and the sample quantity is incremented to 2. The data after A8 includes A9 and A10; however, the sample quantity is now 2, which equals the sample quantity threshold, so the termination condition is met and sampling ends. The determined sample data is A6 and A8.
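The item-by-item loop described above can be sketched as follows. This is an illustrative reading under stated assumptions: the function name `skip_then_limit` and the `meets_requirement` callback are hypothetical, and the preset requirement is modeled as an arbitrary predicate supplied by the caller.

```python
from typing import Callable, List, Sequence


def skip_then_limit(
    pending: Sequence[str],
    skip_count: int,
    threshold: int,
    meets_requirement: Callable[[str, List[str]], bool],
) -> List[str]:
    """Skip `skip_count` items, then collect up to `threshold` items that pass the check."""
    sampled: List[str] = []              # the sample quantity starts at zero
    for item in pending[skip_count:]:    # walk the first undetermined sample data in order
        if len(sampled) >= threshold:    # termination: threshold reached
            break
        if meets_requirement(item, sampled):
            sampled.append(item)         # sample quantity is incremented by one
        # otherwise the item is discarded and the next item becomes the current data
    return sampled


pending = [f"A{i}" for i in range(1, 11)]  # A1 .. A10
all_pass = skip_then_limit(pending, 5, 2, lambda item, sampled: True)
a7_fails = skip_then_limit(pending, 5, 2, lambda item, sampled: item != "A7")
print(all_pass)  # ['A6', 'A7']
print(a7_fails)  # ['A6', 'A8']
```

The loop also ends naturally when no data remains after the current item, which corresponds to the second termination condition in the text.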
In this embodiment, after the current data is determined as sample data, the filtering processing and statistical processing described above may be performed on that sample data.
In one embodiment, the preset requirement may be that the quantity of data in the already determined sample data that is of the same type as the current data is less than a preset quantity threshold. That is, when the quantity of same-type data in the determined sample data is less than the preset quantity threshold, the current data meets the preset requirement. If the quantity of consecutive data in the determined sample data that is of the same type as the current data is greater than or equal to the preset quantity threshold, then at least the preset quantity threshold of items of that type have already been extracted. To ensure the diversity of the sample data, data of that type is no longer extracted; the current data is discarded, the current data is updated, and the next round of sample data determination is carried out.
For example, if the current data is article content, corresponding to an article type, and the preset quantity threshold of article-type items has already been extracted as sample data, the article content may be discarded and not extracted. As another example, if the current data is product information, corresponding to a product type, and the preset quantity threshold of product-type items has already been extracted as sample data, the product information may be discarded and not extracted. As yet another example, if the current data is data received over a network, corresponding to a network data type, and the preset quantity threshold of network-data-type items has already been extracted as sample data, the received network data may be discarded and not extracted.
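One possible form of this type-based preset requirement is sketched below. The record layout `(identifier, type)`, the function name `sample_with_type_cap`, and the example types are assumptions made for illustration; the sketch also assumes the cap applies to the total count of each type already sampled, not only to consecutive items.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Item = Tuple[str, str]  # hypothetical record layout: (identifier, type)


def sample_with_type_cap(items: Sequence[Item], threshold: int, per_type_cap: int) -> List[Item]:
    """Collect up to `threshold` items while keeping fewer than or equal to `per_type_cap` per type."""
    sampled: List[Item] = []
    sampled_per_type: Counter = Counter()
    for item in items:
        if len(sampled) >= threshold:  # sample quantity threshold reached
            break
        _, kind = item
        if sampled_per_type[kind] < per_type_cap:  # preset requirement is met
            sampled.append(item)
            sampled_per_type[kind] += 1
        # otherwise the item is discarded to preserve diversity
    return sampled


items = [
    ("A6", "article"),
    ("A7", "article"),
    ("A8", "article"),
    ("A9", "product"),
    ("A10", "network"),
]
result = sample_with_type_cap(items, threshold=4, per_type_cap=2)
print([identifier for identifier, _ in result])  # ['A6', 'A7', 'A9', 'A10']
```

Here the third article, A8, is discarded because two article-type items were already sampled, so later items of other types can still be extracted.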
In one embodiment, the sampling functions include a quantity-limit sampling function and a skip sampling function, with the quantity-limit sampling function ordered before the skip sampling function.
In this embodiment, sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data includes: sampling the pending data based on the quantity-limit sampling function and the sample quantity threshold to determine first undetermined sample data; and sampling the first undetermined sample data based on the skip sampling function and the skip count to determine the sample data. In this embodiment, the first undetermined sample data is determined by the quantity-limit sampling function, so its content may differ from the first undetermined sample data determined above by the skip sampling function.
In one embodiment, based on the data order of the pending data, the leading items of the pending data, up to the sample quantity threshold, may be determined as the first undetermined sample data; the data positioned after the skip count in the data order of the first undetermined sample data is then determined as the sample data.
Here, the data positioned after the skip count refers to position in the data order of the first undetermined sample data. That is, in this embodiment, sampling is first performed by the quantity-limit sampling function, so that the leading items of the pending data, up to the sample quantity threshold, become the first undetermined sample data. The first skip-count items of the first undetermined sample data are then skipped, and the remaining data in the first undetermined sample data is taken as the sample data.
For example, the pending data includes 10 items, in order A1, A2, A3, A4, A5, A6, A7, A8, A9, and A10; the skip count is 2 and the sample quantity threshold is 5. The first 5 items are extracted as the first undetermined sample data, which therefore includes A1, A2, A3, A4, and A5. The data positioned after the second item of the first undetermined sample data is then determined as the sample data. The second item in the first undetermined sample data is A2, so the sample data includes A3, A4, and A5.
In one embodiment, the sampling functions include an interval sampling function and a quantity-limit sampling function, with the interval sampling function ordered before the quantity-limit sampling function. The sampling parameter value of the interval sampling function is a spacing value, and the sampling parameter value of the quantity-limit sampling function is a sample quantity threshold.
In this embodiment, sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data includes: sampling the pending data based on the interval sampling function to determine first undetermined sample data; and sampling the first undetermined sample data based on the quantity-limit sampling function to determine the sample data.
In one embodiment, based on the data order of the pending data, the items of the pending data that are spaced apart by the spacing value are first determined as the first undetermined sample data, and the leading items of the first undetermined sample data are then taken as the sample data. The quantity of sample data is less than or equal to the sample quantity threshold, and the sample data includes the item that is first in the data order of the pending data.
In one embodiment, based on the data order of the pending data, the first item of the pending data is taken as the current data and determined as sample data. Then, while the quantity of data after the current data in the pending data is greater than or equal to the spacing value and the quantity of determined sample data is less than the sample quantity threshold, the item whose position differs from that of the current data by the spacing value is taken as the new current data, and the process returns to the step of determining the current data as sample data. This repeats until the quantity of data after the current data is less than the spacing value or the quantity of determined sample data is greater than or equal to the sample quantity threshold, at which point the sample data is determined. It can be understood that when sampling ends because the quantity of data after the current data is less than the spacing value, the quantity of determined sample data has not reached the sample quantity threshold and the remaining data can no longer satisfy the sampling rule, so the quantity of sample data is less than the sample quantity threshold. When sampling ends because the quantity of determined sample data is greater than or equal to the sample quantity threshold, the quantity of sample data equals the sample quantity threshold.
For example, the pending data includes 10 items, in order A1, A2, A3, A4, A5, A6, A7, A8, A9, and A10; the spacing value is 2 and the sample quantity threshold is 4. First, the pending data is sampled based on the interval sampling function, and the determined first undetermined sample data includes A1, A3, A5, A7, and A9. Quantity-limit sampling is then applied to this data, that is, sample data is extracted with an upper limit of 4 items. The quantity of first undetermined sample data is 5, which is greater than the sample quantity threshold, so the first 4 items of the first undetermined sample data are taken as the sample data, which includes A1, A3, A5, and A7. If the sample quantity threshold is 6, the quantity of first undetermined sample data is less than the sample quantity threshold, so the first undetermined sample data is taken as the sample data, which includes A1, A3, A5, A7, and A9.
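The iterative interval-then-limit procedure can be sketched as a single loop. This is a minimal sketch; the function name `interval_then_limit` is hypothetical, and a spacing value of at least 1 is assumed.

```python
from typing import List, Sequence


def interval_then_limit(pending: Sequence[str], spacing: int, threshold: int) -> List[str]:
    """Take every `spacing`-th item starting from the first, stopping at `threshold` items."""
    if spacing < 1:
        raise ValueError("spacing must be at least 1")
    sampled: List[str] = []
    index = 0  # the first item of the pending data is the initial current data
    while index < len(pending) and len(sampled) < threshold:
        sampled.append(pending[index])  # determine the current data as sample data
        index += spacing                # move to the item `spacing` positions later
    return sampled


pending = [f"A{i}" for i in range(1, 11)]  # A1 .. A10
capped = interval_then_limit(pending, spacing=2, threshold=4)
uncapped = interval_then_limit(pending, spacing=2, threshold=6)
print(capped)    # ['A1', 'A3', 'A5', 'A7']
print(uncapped)  # ['A1', 'A3', 'A5', 'A7', 'A9']
```

For in-memory lists the same result can be written as `pending[::spacing][:threshold]`; the loop form stops early and does not need to build the full intermediate list.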
In one embodiment, the sampling functions include a quantity-limit sampling function and an interval sampling function, with the quantity-limit sampling function ordered before the interval sampling function.
In this embodiment, sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data includes: sampling the pending data based on the quantity-limit sampling function and the sample quantity threshold to determine first undetermined sample data; and sampling the first undetermined sample data based on the interval sampling function and the spacing value to determine the sample data.
In one embodiment, based on the data order of the pending data, the leading items of the pending data, up to the sample quantity threshold, are first determined as the first undetermined sample data; the items of the first undetermined sample data that are spaced apart by the spacing value are then determined as the sample data, and the sample data includes the item that is first in the data order of the first undetermined sample data.
That is, in this embodiment, quantity-limit sampling is performed first, so that the leading items of the pending data, up to the sample quantity threshold, become the first undetermined sample data. Interval sampling with the spacing value is then applied to the first undetermined sample data to determine the sample data.
For example, the pending data includes 10 items, in order A1, A2, A3, A4, A5, A6, A7, A8, A9, and A10; the sample quantity threshold is 6 and the spacing value is 2. The first 6 items are extracted as the first undetermined sample data, which therefore includes A1, A2, A3, A4, A5, and A6. The items of the first undetermined sample data that are spaced apart by the spacing value are then determined as the sample data, and the sample data includes the first item of the first undetermined sample data, A1. The sample data includes A1, A3, and A5.
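With list slicing, this limit-then-interval example can be reproduced in two lines. The sketch below also applies the same parameters in the opposite order, purely to illustrate that the order of the sampling functions changes the result; the variable names are illustrative.

```python
pending = [f"A{i}" for i in range(1, 11)]  # A1 .. A10
threshold, spacing = 6, 2

# Quantity limit first, then interval sampling (the order in this embodiment).
first_undetermined = pending[:threshold]          # A1 .. A6
limit_then_interval = first_undetermined[::spacing]
print(limit_then_interval)  # ['A1', 'A3', 'A5']

# Same parameters in the opposite order, for comparison.
interval_then_limit = pending[::spacing][:threshold]
print(interval_then_limit)  # ['A1', 'A3', 'A5', 'A7', 'A9']
```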
In one embodiment, the sampling functions include a random sampling function and a quantity-limit sampling function, with the random sampling function ordered before the quantity-limit sampling function. The sampling parameter value of the random sampling function is a random probability value, and the sampling parameter value of the quantity-limit sampling function is a sample quantity threshold.
In this embodiment, sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data includes: randomly sampling the pending data based on the random sampling function and the random probability value to determine first undetermined sample data; and sampling the first undetermined sample data based on the quantity-limit sampling function and the sample quantity threshold to determine the sample data. The quantity of sample data is less than or equal to the sample quantity threshold.
Random sampling of the pending data traverses the entire data set. To prevent an overly large sample from burdening subsequent processing, the quantity of samples can be limited: after the first undetermined sample data is obtained by random sampling, sample data whose quantity is less than or equal to the sample quantity threshold is extracted from it. Specifically, the random sampling function is first used to obtain the first undetermined sample data. When the quantity of first undetermined sample data is less than the sample quantity threshold, the first undetermined sample data obtained by random sampling is taken as the sample data, and the quantity of sample data is less than the sample quantity threshold. When the quantity of first undetermined sample data obtained by random sampling is greater than or equal to the sample quantity threshold, the leading items of the first undetermined sample data, up to the sample quantity threshold, are extracted as the sample data, and the quantity of determined sample data equals the sample quantity threshold.
In one embodiment, the sampling functions include a quantity-limit sampling function and a random sampling function, with the quantity-limit sampling function ordered before the random sampling function.
In this embodiment, sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data includes: sampling the pending data based on the quantity-limit sampling function and the sample quantity threshold to determine first undetermined sample data; and sampling the first undetermined sample data based on the random sampling function to determine the sample data.
In one embodiment, based on the data order of the pending data, the leading items of the pending data, up to the sample quantity threshold, may be determined as the first undetermined sample data; the first undetermined sample data is then randomly sampled based on the random sampling function and the random probability value to determine the sample data. That is, in this embodiment, quantity-limit sampling is performed first, so that the leading items of the pending data, up to the sample quantity threshold, become the first undetermined sample data, and random sampling with the random probability value is then applied to the first undetermined sample data to determine the sample data.
For example, the pending data includes 10 items, in order A1, A2, A3, A4, A5, A6, A7, A8, A9, and A10; the sample quantity threshold is 6 and the random probability value is 0.05. The first 6 items are extracted as the first undetermined sample data, which therefore includes A1, A2, A3, A4, A5, and A6. The first undetermined sample data is then randomly sampled based on the random probability value to determine the sample data; for example, the sample data may include A1, A2, A5, and A6.
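The two orderings of the random and quantity-limit sampling functions described above can be sketched as follows. This sketch assumes that the random probability value is the independent probability of keeping each item, which is one plausible reading and not something the text confirms; `random_sample`, `limit`, the seed, and the probability of 0.5 are all illustrative choices.

```python
import random
from typing import List, Sequence


def random_sample(data: Sequence[str], probability: float, rng: random.Random) -> List[str]:
    """Keep each item independently with `probability`, preserving the data order (assumed semantics)."""
    return [item for item in data if rng.random() < probability]


def limit(data: Sequence[str], threshold: int) -> List[str]:
    """Quantity-limit sampling: keep at most the first `threshold` items."""
    return list(data[:threshold])


rng = random.Random(0)  # fixed seed so the sketch is repeatable
pending = [f"A{i}" for i in range(1, 11)]  # A1 .. A10

# Random sampling first: traverses all 10 items, then caps the result at 3.
random_then_limit = limit(random_sample(pending, 0.5, rng), 3)

# Quantity limit first: only the first 6 items are ever candidates.
limit_then_random = random_sample(limit(pending, 6), 0.5, rng)

print(random_then_limit, limit_then_random)
```

With random sampling first, any item of the pending data can appear in the result; with the quantity limit first, only the leading items can.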
In one embodiment, the sampling functions include a skip sampling function and an interval sampling function, with the skip sampling function ordered before the interval sampling function. The sampling parameter value of the skip sampling function is a skip count, and the sampling parameter value of the interval sampling function is a spacing value.
In this embodiment, sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data includes: sampling the pending data based on the skip sampling function and the skip count to determine first undetermined sample data; and sampling the first undetermined sample data based on the interval sampling function to determine the sample data.
That is, in this embodiment, based on the data order of the pending data, the data positioned after the skip count in the pending data may first be determined as the first undetermined sample data. Then, based on the data order of the first undetermined sample data, the items of the first undetermined sample data that are spaced apart by the spacing value are determined as the sample data, and the sample data includes the item that is first in the data order of the first undetermined sample data.
For example, the pending data includes 10 items, in order A1, A2, A3, A4, A5, A6, A7, A8, A9, and A10; the skip count is 5 and the spacing value is 2. The first 5 items are skipped and not sampled, so the first undetermined sample data includes A6, A7, A8, A9, and A10. Interval sampling is then applied to it, and the sample data may include A6, A8, and A10.
In one embodiment, the sampling functions include an interval sampling function and a skip sampling function, with the interval sampling function ordered before the skip sampling function.
In this embodiment, sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data includes: sampling the pending data based on the interval sampling function and the spacing value to determine first undetermined sample data; and sampling the first undetermined sample data based on the skip sampling function and the skip count to determine the sample data.
Here, the first undetermined sample data includes the item that is first in the pending data. That is, in this embodiment, the pending data may first be sampled by the interval sampling function and the spacing value to determine the first undetermined sample data; the leading skip-count items of the first undetermined sample data are then skipped, and the data positioned after the skip count in the first undetermined sample data is taken as the sample data.
For example, the pending data includes 10 items, in order A1, A2, A3, A4, A5, A6, A7, A8, A9, and A10; the skip count is 2 and the spacing value is 2. Interval sampling with the spacing value is first applied to the pending data, and the resulting first undetermined sample data includes A1, A3, A5, A7, and A9. The first 2 items of the first undetermined sample data, A1 and A3, are then skipped, and the remaining A5, A7, and A9 are taken as the sample data.
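The two orderings of the skip and interval sampling functions from the examples in the text can be reproduced with list slicing. This is only an illustrative sketch for in-memory lists; the variable names are not from the source.

```python
pending = [f"A{i}" for i in range(1, 11)]  # A1 .. A10

# Skip first (skip count 5), then interval sampling (spacing value 2).
after_skip = pending[5:]             # A6 .. A10
skip_then_interval = after_skip[::2]
print(skip_then_interval)  # ['A6', 'A8', 'A10']

# Interval sampling first (spacing value 2), then skip (skip count 2).
after_interval = pending[::2]        # A1, A3, A5, A7, A9
interval_then_skip = after_interval[2:]
print(interval_then_skip)  # ['A5', 'A7', 'A9']
```

In the first ordering the skip count refers to positions in the pending data; in the second it refers to positions in the already interval-sampled data.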
In one embodiment, the sampling functions include a skip sampling function and a random sampling function, with the skip sampling function ordered before the random sampling function. The sampling parameter value of the skip sampling function is a skip count, and the sampling parameter value of the random sampling function is a random probability value.
Sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data includes: sampling the pending data based on the skip sampling function and the skip count to determine first undetermined sample data; and sampling the first undetermined sample data based on the random sampling function and the random probability value to determine the sample data.
That is, in this embodiment, based on the data order of the pending data, the data positioned after the skip count in the pending data may be determined as the first undetermined sample data; the first undetermined sample data is then randomly sampled based on the random sampling function and the random probability value to determine the sample data.
Sampling under the skip sampling rule is performed first to obtain the first undetermined sample data, and random sampling is then applied to it to obtain the sample data. For example, the pending data includes 10 items, in order A1, A2, A3, A4, A5, A6, A7, A8, A9, and A10; the skip count is 5 and the random probability value is 0.5. The first 5 items are skipped and not sampled, so the first undetermined sample data includes A6, A7, A8, A9, and A10. Random sampling is then applied to it; for example, the sample data may include A7 and A8.
In one embodiment, the sampling functions include a random sampling function and a skip sampling function, with the random sampling function ordered before the skip sampling function.
Sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data then includes: sampling the pending data based on the random sampling function and the random probability value to determine first undetermined sample data; and sampling the first undetermined sample data based on the skip sampling function and the skip count to determine the sample data.
That is, in this embodiment, the pending data may first be sampled based on the random sampling function and the random probability value to determine the first undetermined sample data; the leading skip-count items of the first undetermined sample data are then skipped, and the data positioned after the skip count in the first undetermined sample data is taken as the sample data.
For example, the pending data includes 10 items, in order A1, A2, A3, A4, A5, A6, A7, A8, A9, and A10; the skip count is 2 and the random probability value is 0.5. Random sampling with the random probability value is first applied to the pending data, and the resulting first undetermined sample data includes, for example, A1, A2, A5, A7, and A9. The first 2 items of the first undetermined sample data, A1 and A2, are then skipped, and the remaining A5, A7, and A9 are taken as the sample data.
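The skip step in this example is deterministic once the random step has produced its output, which the sketch below shows using the first undetermined sample data given in the text. The second half runs the opposite order (skip, then random) with a seeded generator. As an assumption, the random probability value is treated as an independent per-item keep probability; `random_sample` and `skip` are hypothetical helper names.

```python
import random
from typing import List, Sequence


def random_sample(data: Sequence[str], probability: float, rng: random.Random) -> List[str]:
    """Keep each item independently with `probability`, preserving order (assumed semantics)."""
    return [item for item in data if rng.random() < probability]


def skip(data: Sequence[str], skip_count: int) -> List[str]:
    """Skip sampling: drop the first `skip_count` items."""
    return list(data[skip_count:])


# Random then skip: reuse the random outcome stated in the example.
example_first_undetermined = ["A1", "A2", "A5", "A7", "A9"]
random_then_skip = skip(example_first_undetermined, 2)
print(random_then_skip)  # ['A5', 'A7', 'A9']

# Skip then random: only A6 .. A10 can ever be selected.
pending = [f"A{i}" for i in range(1, 11)]
skip_then_random = random_sample(skip(pending, 5), 0.5, random.Random(42))
print(skip_then_random)
```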
In one embodiment, the sampling functions include an interval sampling function and a random sampling function, with the interval sampling function ordered before the random sampling function. The sampling parameter value of the interval sampling function is a spacing value, and the sampling parameter value of the random sampling function is a random probability value.
In this embodiment, sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data includes: sampling the pending data based on the interval sampling function and the spacing value to determine first undetermined sample data; and sampling the first undetermined sample data based on the random sampling function and the random probability value to determine the sample data.
That is, in this embodiment, sampling is performed by the interval sampling function and the spacing value, and the first undetermined sample data obtained by interval sampling is then further randomly sampled. In this way, the sample data is more comprehensive and is not concentrated on one particular part of the data.
In one embodiment, the sampling functions include a random sampling function and an interval sampling function, with the random sampling function ordered before the interval sampling function.
In this embodiment, sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data includes: sampling the pending data based on the random sampling function and the random probability value to determine first undetermined sample data; and sampling the first undetermined sample data based on the interval sampling function and the interval value to determine the sample data.
That is, in this embodiment, sampling is performed not only with the random sampling function and the random probability value; the first undetermined sample data obtained by random sampling is additionally sampled at intervals. This helps ensure that the sample data is more comprehensive rather than concentrated on one part of the data.
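The two orders of interval sampling and random sampling can be compared with a small sketch. The function names, the choice to keep every interval-th item starting from the first, and the seed are assumptions made for illustration.

```python
import random

def interval_sample(items, interval):
    """Keep the first item and every interval-th item after it."""
    return items[::interval]

def random_sample(items, probability, rng):
    """Keep each item independently with the given probability."""
    return [x for x in items if rng.random() < probability]

pending = list(range(1, 21))

# Interval sampling first, then random sampling on its output.
interval_first = random_sample(interval_sample(pending, 2), 0.5, random.Random(1))

# Random sampling first, then interval sampling on its output.
random_first = interval_sample(random_sample(pending, 0.5, random.Random(1)), 2)

print(interval_first)
print(random_first)
```

With interval sampling first, every sampled item comes from the positions 1, 3, 5, and so on of the pending data; with random sampling first, the interval step is taken over whatever survived the random draw, so the two results generally differ.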
In one embodiment, the number of sampling functions is three, that is, the sampling functions include any three of the skip sampling function, the quantity-limit sampling function, the interval sampling function and the random sampling function. In this case, sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data proceeds as follows: the pending data is first sampled based on the first-ordered sampling function and its corresponding sampling parameter value to determine second undetermined sample data; the second undetermined sample data is then sampled based on the middle-ordered sampling function and its corresponding sampling parameter value to determine third undetermined sample data; finally, the third undetermined sample data is sampled based on the last-ordered sampling function and its corresponding sampling parameter value to determine the sample data.
The principle of sampling with three ordered sampling functions in this embodiment is similar to that of sampling with two ordered sampling functions described above; the difference is that one more sampling function is added, so one more sampling pass is executed. Each individual sampling function among the three samples in the same way as when the sampling expression contains only that sampling function. The difference is that, when the three sampling functions sample in turn, only the first-ordered sampling function samples the pending data itself, while each of the other sampling functions resamples the data determined by the preceding sampling pass.
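The chaining principle, in which each sampling function resamples the output of the preceding one, can be sketched generically. The helper names and the representation of a step as a (function, parameter) pair are assumptions for illustration, not the patented implementation.

```python
from functools import reduce

def skip_sample(items, skip_count):
    return items[skip_count:]      # drop the first skip_count items

def limit_sample(items, threshold):
    return items[:threshold]       # keep at most threshold items

def interval_sample(items, interval):
    return items[::interval]       # keep the first item and every interval-th after it

def run_pipeline(pending, steps):
    """Apply each (function, parameter) pair in order; every step
    consumes the output of the step before it."""
    return reduce(lambda data, step: step[0](data, step[1]), steps, pending)

pending = list(range(1, 11))
steps = [(limit_sample, 8), (skip_sample, 2), (interval_sample, 3)]
print(run_pipeline(pending, steps))  # [3, 6]
```

Adding a fourth sampling function only means appending one more pair to steps, which mirrors the statement that one more function adds one more sampling pass.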
The following description takes some of the ordered combinations of three sampling functions as examples (there are 24 ordered combinations of three sampling functions chosen from the skip sampling function, the quantity-limit sampling function, the interval sampling function and the random sampling function).
For example, in one embodiment, the sampling functions include a skip sampling function, an interval sampling function and a quantity-limit sampling function. The skip sampling function is ordered first, the interval sampling function is ordered in the middle (that is, between the skip sampling function and the quantity-limit sampling function), and the quantity-limit sampling function is ordered last. The sampling parameter value corresponding to the skip sampling function is a skip count, the sampling parameter value corresponding to the interval sampling function is an interval value, and the sampling parameter value corresponding to the quantity-limit sampling function is a sample size threshold.
In this embodiment, sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data includes: sampling the pending data based on the skip sampling function and the skip count to determine second undetermined sample data; sampling the second undetermined sample data based on the interval sampling function and the interval value to determine third undetermined sample data; and sampling the third undetermined sample data based on the quantity-limit sampling function and the sample size threshold to determine the sample data.
That is, in this embodiment, based on the order of the items in the pending data, the items ordered after the skip count are determined as the second undetermined sample data (this is the skip sampling pass). The second undetermined sample data is then sampled based on the interval sampling function and the interval value to determine the third undetermined sample data (that is, the items of the second undetermined sample data taken at every interval value, including the first item of the second undetermined sample data). The third undetermined sample data is then sampled based on the quantity-limit sampling function and the sample size threshold to determine the sample data.
Here, the number of items in the sample data is less than or equal to the sample size threshold. The principle is the same as that described above for sampling the pending data or the first undetermined sample data based on the quantity-limit sampling function and the sample size threshold; the difference lies in the underlying data: here the quantity-limit sampling is applied to the third undetermined sample data.
For example, suppose the pending data contains 10 items, A1, A2, A3, A4, A5, A6, A7, A8, A9 and A10 in order, the skip count is 1, the interval value is 2 and the sample size threshold is 4. Skip sampling is first applied to the pending data with the skip sampling function and the skip count, that is, the first item in order is skipped, and the resulting second undetermined sample data contains A2, A3, A4, A5, A6, A7, A8, A9 and A10. Interval sampling is then applied to the second undetermined sample data with the interval sampling function and the interval value, and the resulting third undetermined sample data contains A2, A4, A6, A8 and A10. Finally, the third undetermined sample data is sampled with the quantity-limit sampling function, that is, the first items of the third undetermined sample data, up to the sample size threshold, are selected as the sample data, so the sample data contains A2, A4, A6 and A8.
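The worked example above can be reproduced step by step with list slicing. The variable names are illustrative only, and list slicing is an assumed stand-in for the skip, interval and quantity-limit sampling functions.

```python
pending = [f"A{i}" for i in range(1, 11)]
skip_count, interval, threshold = 1, 2, 4

# Skip sampling: drop the first skip_count items.
second_undetermined = pending[skip_count:]

# Interval sampling: keep the first item and every interval-th item after it.
third_undetermined = second_undetermined[::interval]

# Quantity-limit sampling: keep at most threshold items from the front.
sample = third_undetermined[:threshold]

print(second_undetermined)  # ['A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9', 'A10']
print(third_undetermined)   # ['A2', 'A4', 'A6', 'A8', 'A10']
print(sample)               # ['A2', 'A4', 'A6', 'A8']
```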
When the sampling functions include the skip sampling function, the interval sampling function and the quantity-limit sampling function, their order can be changed freely. Depending on actual requirements, sampling can be performed with any ordered combination of the skip sampling function, the interval sampling function and the quantity-limit sampling function. Each individual sampling function samples on the same principle as when it is used alone; the difference is that, with a different sampling order, each function samples a different underlying data set, so the sampling results may differ.
Sampling is a common early-stage processing technique in data analysis and mining. In general, the overall data (that is, the pending data) is very large; running analysis on the overall data not only consumes a large amount of resources but also significantly increases the analysis time, and may even crash the system during analysis. Sampling the overall data and running the analysis on the sample data saves resources, shortens the analysis time and reduces the likelihood of a system crash. Moreover, from one or more attributes of the sample data, the characteristics of the overall data can be estimated with a certain degree of reliability, which yields an understanding of the overall data. After sampling is completed, the sample data can be applied in various practical scenarios; for example, the sample data can be analyzed to determine its features, which are then applied in a data classification scenario, or the sample data can be used for software testing.
At different stages (for example, the development stage or the testing stage) or under different requirements, the sample data needed may differ. When the sampling functions appear in a different order in the sampling expression, the sampling process differs, and so does the resulting sample data. Thus, sampling expressions with different sampling functions can be configured, and the order of the sampling functions within a sampling expression can be changed, so that different sample data is obtained to meet different requirements.
For example, when sampling is performed in turn with the skip sampling function, the interval sampling function and the quantity-limit sampling function as described above, the skip sampling function is applied first: the first skip-count items of the pending data are skipped and therefore not extracted, which reduces the influence of the leading data on the overall sample data and reduces the data volume. Interval sampling is then performed on the data obtained by skip sampling, which avoids excessive concentration of the data and improves the uniformity of the sample data. Finally, the data obtained by interval sampling is sampled with the quantity-limit sampling function to limit the number of items in the sample data and reduce the data volume. Sample data obtained from the pending data in this order thus excludes the unstable leading data of the pending data, while also ensuring data uniformity and a reduced data volume.
As another example, when sampling is performed in turn with the skip sampling function, the quantity-limit sampling function and the interval sampling function, the sampling order differs from the one above. The first skip-count items of the pending data are skipped first and therefore not extracted, which reduces the influence of the unstable leading data on the overall sample data and reduces the data volume. Quantity-limit sampling (rather than interval sampling as above) is then performed on the data obtained by skip sampling: the leading items of that data, no more than the sample size threshold, are taken (if the number of items obtained by skip sampling is greater than or equal to the sample size threshold, exactly that many items are extracted; otherwise, fewer items than the sample size threshold are extracted). This second sampling pass further reduces the data volume. Interval sampling is performed last: starting from the first item of the data obtained by quantity-limit sampling, the items at every interval value are taken as the sample data, to ensure the uniformity of the sample data. That is, in this embodiment, interval extraction is performed on the data obtained after quantity-limit sampling rather than on the data obtained after skip sampling, so a different sampling result may be obtained. By ordering the sampling functions differently, sampling is performed in different orders and varied sample data is obtained, which increases the diversity of the sample data and meets different requirements.
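The effect of swapping the last two sampling functions can be illustrated with the same parameters used earlier (skip count 1, interval value 2, sample size threshold 4). The helper names are hypothetical and slicing is an assumed implementation of each function.

```python
def skip_sample(items, skip_count):
    return items[skip_count:]

def interval_sample(items, interval):
    return items[::interval]

def limit_sample(items, threshold):
    return items[:threshold]

pending = [f"A{i}" for i in range(1, 11)]

# Order 1: skip, then interval, then quantity limit.
interval_then_limit = limit_sample(interval_sample(skip_sample(pending, 1), 2), 4)

# Order 2: skip, then quantity limit, then interval.
limit_then_interval = interval_sample(limit_sample(skip_sample(pending, 1), 4), 2)

print(interval_then_limit)  # ['A2', 'A4', 'A6', 'A8']
print(limit_then_interval)  # ['A2', 'A4']
```

In the second order the quantity limit cuts the data down to A2 through A5 before the interval step runs, so the final sample is both smaller and drawn from a narrower range of the pending data.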
In one embodiment, the sampling functions include a skip sampling function, an interval sampling function and a random sampling function. The skip sampling function is ordered first, the random sampling function is ordered last, and the interval sampling function is ordered between the skip sampling function and the random sampling function. The sampling parameter value corresponding to the skip sampling function is a skip count, the sampling parameter value corresponding to the random sampling function is a random probability value, and the sampling parameter value corresponding to the interval sampling function is an interval value.
In this embodiment, sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data includes: sampling the pending data based on the skip sampling function and the skip count to determine second undetermined sample data; sampling the second undetermined sample data based on the interval sampling function and the interval value to determine third undetermined sample data; and sampling the third undetermined sample data based on the random sampling function and the random probability value to determine the sample data.
That is, in this embodiment, based on the order of the items in the pending data, the items ordered after the skip count are determined as the second undetermined sample data. The second undetermined sample data is then sampled based on the interval sampling function and the interval value to determine the third undetermined sample data (that is, the items of the second undetermined sample data taken at every interval value, including the first item of the second undetermined sample data). The third undetermined sample data is then sampled based on the random sampling function and the random probability value to determine the sample data.
In this way, the leading items of the pending data are skipped, which avoids their influence on the overall sampling result. After skip sampling, sampling with the interval sampling function ensures that the data is evenly distributed and therefore closer to the pattern of the overall data. Finally, sampling is performed with the random sampling function: during sampling, all of the data obtained by interval sampling is traversed, and whether each item is drawn is uncertain, which further ensures that the sample data is evenly distributed.
When the sampling functions include the skip sampling function, the interval sampling function and the random sampling function, their order can be changed freely. Depending on actual requirements, sampling can be performed with any ordered combination of the skip sampling function, the interval sampling function and the random sampling function. Each individual sampling function samples on the same principle as when it is used alone; the difference is that, with a different sampling order, each function samples a different underlying data set, so the sampling results may differ and different requirements can be met.
In one embodiment, the sampling functions include a quantity-limit sampling function, an interval sampling function and a random sampling function. The quantity-limit sampling function is ordered first, the interval sampling function is ordered in the middle (that is, between the quantity-limit sampling function and the random sampling function), and the random sampling function is ordered last. The sampling parameter value corresponding to the quantity-limit sampling function is a sample size threshold, the sampling parameter value corresponding to the interval sampling function is an interval value, and the sampling parameter value corresponding to the random sampling function is a random probability value.
In this embodiment, sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data includes: sampling the pending data based on the quantity-limit sampling function and the sample size threshold to determine second undetermined sample data; sampling the second undetermined sample data based on the interval sampling function and the interval value to determine third undetermined sample data; and sampling the third undetermined sample data based on the random sampling function and the random probability value to determine the sample data.
That is, in this embodiment, based on the order of the items in the pending data, the leading items of the pending data, up to the sample size threshold, are determined as the second undetermined sample data. The second undetermined sample data is then sampled based on the interval sampling function and the interval value to determine the third undetermined sample data (that is, the items of the second undetermined sample data taken at every interval value, including the first item of the second undetermined sample data). The third undetermined sample data is then sampled based on the random sampling function and the random probability value to determine the sample data.
In this way, the number of items is first kept from becoming too large: it stays within the sample size threshold, which reduces the data volume. After quantity-limit sampling, sampling with the interval sampling function ensures that the data is evenly distributed and therefore closer to the pattern of the overall data. Finally, sampling is performed with the random sampling function: during sampling, all of the data obtained by interval sampling is traversed, and whether each item is drawn is uncertain, which further ensures that the sample data is evenly distributed. That is, the data volume is bounded first and the uniformity of the data distribution is considered afterwards, so the data volume is reduced while the data remains uniform.
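One consequence of placing the quantity limit first is that the final sample size has a simple upper bound. The sketch below, with hypothetical names and an arbitrary seed, checks that bound; it assumes interval sampling keeps the first item and every interval-th item after it.

```python
import math
import random

def limit_interval_random(pending, threshold, interval, probability, rng):
    """Quantity limit first, then interval sampling, then random sampling."""
    second = pending[:threshold]   # at most threshold items
    third = second[::interval]     # at most ceil(threshold / interval) items
    return [x for x in third if rng.random() < probability]

pending = list(range(1, 101))
sample = limit_interval_random(pending, 20, 3, 0.5, random.Random(7))

# No matter how large the pending data is, the result cannot exceed this bound.
upper_bound = math.ceil(20 / 3)
print(len(sample), "<=", upper_bound)
```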
When the sampling functions include the quantity-limit sampling function, the interval sampling function and the random sampling function, their order can be changed freely. Depending on actual requirements, sampling can be performed with any ordered combination of the quantity-limit sampling function, the interval sampling function and the random sampling function. Each individual sampling function samples on the same principle as when it is used alone; the difference is that, with a different sampling order, each function samples a different underlying data set, so the sampling results may differ.
In one embodiment, the sampling functions include a skip sampling function, a quantity-limit sampling function and a random sampling function. The skip sampling function is ordered first, the quantity-limit sampling function is ordered in the middle (that is, between the skip sampling function and the random sampling function), and the random sampling function is ordered last. The sampling parameter value corresponding to the quantity-limit sampling function is a sample size threshold, the sampling parameter value corresponding to the skip sampling function is a skip count, and the sampling parameter value corresponding to the random sampling function is a random probability value.
In this embodiment, sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data includes: sampling the pending data based on the skip sampling function and the skip count to determine second undetermined sample data; sampling the second undetermined sample data based on the quantity-limit sampling function and the sample size threshold to determine third undetermined sample data; and sampling the third undetermined sample data based on the random sampling function and the random probability value to determine the sample data.
That is, in this embodiment, based on the order of the items in the pending data, the items ordered after the skip count are determined as the second undetermined sample data. Based on the order of the items in the second undetermined sample data, the leading items of the second undetermined sample data, up to the sample size threshold, are determined as the third undetermined sample data. The third undetermined sample data is then sampled based on the random sampling function and the random probability value to determine the sample data.
In this way, the leading items of the pending data are skipped, which avoids their influence on the overall sampling result. After skip sampling, sampling with the quantity-limit sampling function keeps the number of items from becoming too large: it stays within the sample size threshold, which reduces the data volume. Finally, sampling is performed with the random sampling function: during sampling, all of the data obtained by quantity-limit sampling is traversed, and whether each item is drawn is uncertain, which further ensures that the sample data is evenly distributed.
When the sampling functions include the skip sampling function, the quantity-limit sampling function and the random sampling function, their order can be changed freely. Depending on actual requirements, sampling can be performed with any ordered combination of the skip sampling function, the quantity-limit sampling function and the random sampling function. Each individual sampling function samples on the same principle as when it is used alone; the difference is that, with a different sampling order, each function samples a different underlying data set, so the sampling results may differ.
In one embodiment, the number of sampling functions is four, that is, the sampling functions include the skip sampling function, the quantity-limit sampling function, the interval sampling function and the random sampling function. In this case, the pending data is first sampled based on the first-ordered sampling function to determine fourth undetermined sample data; the fourth undetermined sample data is then sampled based on the second-ordered sampling function to determine fifth undetermined sample data; the fifth undetermined sample data is sampled based on the third-ordered sampling function to determine sixth undetermined sample data; and finally the sixth undetermined sample data is sampled based on the last-ordered sampling function to determine the sample data.
The following description takes some of the ordered combinations of the four sampling functions as examples (there are 24 ordered combinations of the skip sampling function, the quantity-limit sampling function, the interval sampling function and the random sampling function).
For example, in one embodiment, the sampling functions include a skip sampling function, an interval sampling function, a random sampling function and a quantity-limit sampling function. The skip sampling function is ordered first, the interval sampling function is ordered between the skip sampling function and the random sampling function, the random sampling function is ordered between the interval sampling function and the quantity-limit sampling function, and the quantity-limit sampling function is ordered last. The sampling parameter value corresponding to the skip sampling function is a skip count, the sampling parameter value corresponding to the interval sampling function is an interval value, the sampling parameter value corresponding to the random sampling function is a random probability value, and the sampling parameter value corresponding to the quantity-limit sampling function is a sample size threshold.
In this embodiment, sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data includes: sampling the pending data based on the skip sampling function and the skip count to determine fourth undetermined sample data; sampling the fourth undetermined sample data based on the interval sampling function and the interval value to determine fifth undetermined sample data (which includes the first item of the fourth undetermined sample data); sampling the fifth undetermined sample data based on the random sampling function and the random probability value to determine sixth undetermined sample data; and sampling the sixth undetermined sample data based on the quantity-limit sampling function and the sample size threshold to determine the sample data.
That is, in this embodiment, based on the order of the items in the pending data, the items ordered after the skip count are first determined as the fourth undetermined sample data. The fourth undetermined sample data is then sampled with the interval sampling function and the interval value, the result is sampled with the random sampling function and the random probability value, and finally that result is sampled with the quantity-limit sampling function and the sample size threshold to obtain the sample data.
In this embodiment, the number of items in the sample data is limited to the sample size threshold, that is, it is less than or equal to the sample size threshold. For example, suppose the pending data contains 10 items, A1, A2, A3, A4, A5, A6, A7, A8, A9 and A10 in order, the skip count is 2, the interval value is 2, the random probability value is 0.5 and the sample size threshold is 3. The first 2 items are skipped and not sampled, so the fourth undetermined sample data contains A3, A4, A5, A6, A7, A8, A9 and A10. Interval sampling with an interval value of 2 is applied to it, and the resulting fifth undetermined sample data contains A3, A5, A7 and A9. Random sampling is then applied. If the sixth undetermined sample data obtained by random sampling contains A3, A5, A7 and A9, quantity-limit sampling is performed on it, that is, the leading items of the sixth undetermined sample data, up to the sample size threshold, are taken as the sample data, so the sample data contains A3, A5 and A7. If the sixth undetermined sample data obtained by random sampling contains A5 and A7, its number of items is less than the sample size threshold, so the sixth undetermined sample data is taken as the sample data and the sample data contains A5 and A7.
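The four-stage example can be sketched as below. The helper names and the seed are assumptions; because the third stage is random, the two outcomes described in the text are reproduced by feeding the quantity-limit step the two intermediate results by hand.

```python
import random

def skip_sample(items, skip_count):
    return items[skip_count:]

def interval_sample(items, interval):
    return items[::interval]

def random_sample(items, probability, rng):
    return [x for x in items if rng.random() < probability]

def limit_sample(items, threshold):
    return items[:threshold]

pending = [f"A{i}" for i in range(1, 11)]
rng = random.Random(42)  # fixed seed (assumption), only for repeatability

fourth = skip_sample(pending, 2)        # A3 ... A10
fifth = interval_sample(fourth, 2)      # A3, A5, A7, A9
sixth = random_sample(fifth, 0.5, rng)  # some subset of fifth
sample = limit_sample(sixth, 3)         # never more than 3 items

# The two cases from the text, with the random stage fixed by hand:
print(limit_sample(["A3", "A5", "A7", "A9"], 3))  # ['A3', 'A5', 'A7']
print(limit_sample(["A5", "A7"], 3))              # ['A5', 'A7']
```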
In this way, the leading items of the pending data are skipped, which avoids their influence on the overall sampling result. After skip sampling, sampling with the interval sampling function ensures that the data is evenly distributed and therefore closer to the pattern of the overall data. Sampling is then performed with the random sampling function: during sampling, all of the data obtained by interval sampling is traversed, and whether each item is drawn is uncertain, which further ensures that the sample data is evenly distributed. Finally, sampling with the quantity-limit sampling function keeps the number of items from becoming too large: it stays within the sample size threshold, which reduces the data volume. The resulting sample data satisfies the sampling requirements corresponding to all four sampling functions.
When the sampling functions include the skip sampling function, the quantity-limit sampling function, the interval sampling function and the random sampling function, their order can be changed freely. Depending on actual requirements, sampling can be performed with any ordered combination of the skip sampling function, the quantity-limit sampling function, the interval sampling function and the random sampling function. Each individual sampling function samples on the same principle as when it is used alone; the difference is that, with a different sampling order, each function samples a different underlying data set, so different sampling results can be obtained.
In one embodiment, the above method further includes: when the sampling functions satisfy a preset condition, proceeding to the step of sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data; otherwise, issuing an error prompt.
The sampling expression configured by the user may be faulty, that is, the configured sampling functions may not satisfy the preset condition, in which case the sampling process cannot be executed normally. An error prompt can then be issued to inform the user that the configured sampling expression is wrong, and the sampling expression can be reconfigured. Once the sampling functions in the sampling expression satisfy the preset condition, the step of sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data is executed.
In one embodiment, the preset condition may include that every sampling function belongs to the preset sampling functions; that is, when each of the sampling functions is one of the preset sampling functions, the sampling functions satisfy the preset condition.
A sampling function obtained by parsing may not be one of the preset sampling functions configured in advance on the server. Even though such a sampling function is obtained by parsing, the corresponding sampling operation cannot be executed, so the sampling process cannot proceed normally. An error prompt can then be issued to inform the user that the configured sampling expression is wrong, and the sampling expression can be reconfigured. Once every sampling function in the sampling expression belongs to the preset sampling functions, the step of sampling the pending data based on the sampling functions and the corresponding sampling parameter values to determine the sample data is executed.
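The preset-condition check might be sketched as follows. The set of preset function names and the wording of the error prompt are hypothetical; the source does not specify them.

```python
# Hypothetical names for the preset sampling functions (assumption).
PRESET_FUNCTIONS = {"skip", "limit", "interval", "random"}

def check_sampling_functions(names):
    """Return True if every parsed function name is a preset sampling
    function; otherwise raise an error carrying a prompt for the user."""
    unknown = [name for name in names if name not in PRESET_FUNCTIONS]
    if unknown:
        raise ValueError(
            "sampling expression error: unsupported sampling function(s): "
            + ", ".join(unknown)
        )
    return True

print(check_sampling_functions(["skip", "interval", "limit"]))  # True

try:
    check_sampling_functions(["skip", "shuffle"])
except ValueError as err:
    print(err)  # the prompt shown to the user, who can then reconfigure
```

Running the check before any sampling starts means a misconfigured expression is rejected as a whole, rather than failing partway through the sampling passes.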
The above data processing procedure is illustrated below with a specific embodiment.
Referring to Fig. 3, an existing sampling scheme is shown. The data sources include a database, a message queue, a network interface, and others: the database stores article content, the message queue stores commodity information, and the network interface corresponds to network interface data. Data is read from the data sources as the input of the data to be processed. In general, existing schemes sample intrusively: each sampling requires writing dedicated sampling code and adding it to the data input code, which modifies the data input code. The input data to be processed is sampled by the intrusive sampling code to obtain the sampled data. Subsequent data processing such as filtering and statistics can then be performed on the sampled data to obtain a statistical result. For example, a report on the commodity information can be obtained after statistics are computed on the commodity information.
Referring to Fig. 4, the sampling scheme corresponding to the data processing method of one embodiment of the application is shown. In the overall data processing scheme, the data flow framework and the business logic layer are separated. The data flow framework is a stable framework system that is responsible for connecting all services in series and providing basic services. The business logic layer varies with the usage scenario and is responsible for performing the actual work.
The data flow framework includes a data input layer, a sampling expression engine, data processing plug-ins, and a data output layer. The data input layer reads the data in the data sources to implement data input. The sampling expression engine is responsible for parsing the configured sampling expression and applying it to sample the data to be processed. An example expression is input.skip(100).limit(200).sample(2): first skip the first 100 data in the data order of the data to be processed and take the data after the 100th as the second candidate sampled data; then extract the first 200 data of the second candidate sampled data as the third candidate sampled data; finally perform interval sampling on the third candidate sampled data with an interval value of 2 to determine the sampled data. Sampling is completed by executing the code corresponding to the parsed sampling functions, without intruding into the business logic of the input. After sampling, the data processing plug-ins can process the sampled data, for example by filtering and statistical processing, and the processed result is output to the business logic layer through the data output layer.
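A sampling expression engine of this kind could be sketched as below. This is only an illustration under stated assumptions: the function names skip, limit, sample, and random come from the example expressions, while `parse`, `run`, the `FUNCTIONS` table, the regular expression, and the interval semantics (first datum plus every n-th after it) are hypothetical.

```python
import itertools
import random
import re

def skip(data, n):
    return itertools.islice(data, n, None)       # drop the first n items

def limit(data, n):
    return itertools.islice(data, n)             # keep at most n items

def sample(data, n):
    return itertools.islice(data, 0, None, n)    # first item, then every n-th

def random_pick(data, p, rng=random):
    return (x for x in data if rng.random() < p)  # keep each item with probability p

# Hypothetical dispatch table from expression names to sampling code.
FUNCTIONS = {"skip": skip, "limit": limit, "sample": sample, "random": random_pick}

def parse(expression):
    """Turn 'input.skip(100).limit(200)' into [('skip', 100), ('limit', 200)]."""
    calls = re.findall(r"\.(\w+)\(\s*([0-9.]+)\s*\)", expression)
    return [(name, float(v) if "." in v else int(v)) for name, v in calls]

def run(expression, data):
    """Apply the parsed sampling functions to the data, in expression order."""
    stream = iter(data)
    for name, value in parse(expression):
        stream = FUNCTIONS[name](stream, value)
    return list(stream)
```

Because the stages are chained lazily, the input code only hands over an iterable; changing the sampling strategy means changing the expression string, not the input code.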
Referring to Fig. 5, an expression configuration interface is shown. The user can enter a sampling expression in the expression input box of the expression configuration interface; for example, in Fig. 5 the user enters the sampling expression input.skip(600).limit(300). The sampling expression engine parses the sampling expression to obtain the sampling functions and the corresponding sampling parameter values. The parsed sampling functions include the skip sampling function skip and the quantity-limit sampling function limit. Referring to Fig. 6, a schematic diagram of combined sampling with the skip sampling function and the quantity-limit sampling function is shown. For skip(5) and limit(11), the first 5 data in the data to be processed are skipped, and the 6th to 16th data are then selected as the sampled data.
In Fig. 5, in the configured sampling expression, the skip count corresponding to the skip sampling function skip is 600, and the sample quantity threshold corresponding to the quantity-limit sampling function limit is 300. The data to be processed obtained from the data sources is then sampled: the first 600 data in the data to be processed are skipped, and the sampling corresponding to the quantity-limit sampling function limit starts from the 601st datum. For example, the first 300 data after the 600th datum in the data to be processed (the 601st to the 900th) are selected as the sampled data.
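The skip-then-limit combination above maps directly onto a slice. The sketch below is one possible rendering; the function name `skip_then_limit` and its parameter names are hypothetical.

```python
from itertools import islice

def skip_then_limit(data, skip_count, threshold):
    """Skip the first `skip_count` items, then keep at most `threshold` items."""
    return list(islice(data, skip_count, skip_count + threshold))

# skip(5).limit(11) over data numbered from 1 selects the 6th to 16th data.
fig6_example = skip_then_limit(range(1, 31), 5, 11)

# skip(600).limit(300) over data numbered from 1 selects the 601st to 900th data.
fig5_example = skip_then_limit(range(1, 1001), 600, 300)
```

If fewer data remain after the skip than the threshold allows, the result simply contains the remaining data.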
Based on different requirements, the sampling expression can be modified arbitrarily to obtain different sampling expressions, execute different sampling processes, and obtain different sampled data. Referring to Fig. 7, a schematic diagram of sampling with the interval sampling function is shown. The interval sampling function sample extracts data at intervals of the interval value. For example, in Fig. 7 the sampling expression is sample(4) and the interval value is 4, so one datum is extracted from the data to be processed at every interval value as sampled data.
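Interval sampling can be sketched as follows, assuming that "at every interval value" means keeping the first datum and then every n-th datum after it (the description states elsewhere that the first datum is included). The function name `interval_sample` is hypothetical.

```python
def interval_sample(data, spacing):
    """Keep the first item and then every `spacing`-th item after it."""
    if spacing < 1:
        raise ValueError("spacing must be a positive integer")
    return [item for index, item in enumerate(data) if index % spacing == 0]

# sample(4) over data numbered 1..13 under the assumption above.
picked = interval_sample(range(1, 14), 4)
```

If the intended semantics were instead to leave a gap of n data between two picks, the modulus would be `spacing + 1`; the figure would settle which reading applies.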
Referring to Fig. 8, a schematic diagram of sampling with the random sampling function is shown. The random sampling function random extracts data at random. For example, in Fig. 8 the sampling expression is random(0.5) and the random probability value is 0.5, so the data to be processed is randomly extracted with this random probability value to determine the sampled data. In Fig. 8, the sampled data obtained with the random sampling function random includes the 2nd, 3rd, 6th, and 8th data, the 10th to 12th data, the 14th datum, the 19th datum, the 21st and 22nd data, and the 25th datum.
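One way to realize random sampling with a probability value is an independent draw per datum, as sketched below. This per-item interpretation, the function name `random_sample`, and the optional `rng` parameter are assumptions; the description does not specify how the probability is applied.

```python
import random

def random_sample(data, probability, rng=None):
    """Keep each item independently with the given probability."""
    rng = rng or random.Random()
    return [item for item in data if rng.random() < probability]

# A seeded generator makes a run repeatable, which helps when debugging.
first_run = random_sample(range(1, 26), 0.5, random.Random(42))
second_run = random_sample(range(1, 26), 0.5, random.Random(42))
```

With this approach the number of sampled data varies from run to run around probability times the input size, which is consistent with Fig. 8 selecting 12 of 25 data for a probability value of 0.5.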
When sampling data, the application only requires a sampling expression to be configured for the input data (i.e., the data to be processed); the server completes the sampling by executing the code corresponding to the sampling functions in the sampling expression. To change the sampling strategy, only the sampling expression needs to be modified. For example, when the full data (i.e., all of the data to be processed) is needed, modifying the sampling expression is enough to achieve full sampling.
It should be understood that although the steps in the flowchart of Fig. 2 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the sequence indicated by the arrows. Unless expressly stated herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in Fig. 2 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 9, a data processing apparatus is provided, including:
a data acquisition module 910, configured to obtain data to be processed;
an expression acquisition module 920, configured to obtain a sampling expression;
a parsing module 930, configured to parse the sampling expression to obtain a sampling function and a sampling parameter value corresponding to the sampling function; and
a sampling module 940, configured to sample the data to be processed based on the sampling function and the corresponding sampling parameter value to determine sampled data.
In one embodiment, the expression acquisition module is configured to receive the sampling expression obtained in response to an interactive operation on the expression input box in the expression configuration interface.
In one embodiment, the data acquisition module is configured to read each data source through an iterator to obtain the data to be processed.
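Reading several data sources through an iterator might look like the sketch below. The function name `read_sources` is hypothetical, and the three in-memory sources merely stand in for a database, a message queue, and a network interface; no real client library is implied.

```python
from itertools import chain

def read_sources(*sources):
    """Lazily yield the items of each data source in turn."""
    return chain.from_iterable(sources)

# Hypothetical stand-ins for the data sources named in this description.
database_rows = ["article-1", "article-2"]
queue_messages = (f"commodity-{i}" for i in range(1, 3))
interface_data = iter(["packet-1"])

to_be_processed = list(read_sources(database_rows, queue_messages, interface_data))
```

Because the result is itself an iterator, the sampling functions can consume it one datum at a time without loading every source into memory first.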
In one embodiment, the sampling function includes a skip sampling function, and the sampling parameter value corresponding to the skip sampling function is a skip count;
the sampling module is configured to determine the sampled data based on the data order of the data to be processed and the data whose data order in the data to be processed is after the skip count.
In one embodiment, the sampling function includes a quantity-limit sampling function, and the sampling parameter value corresponding to the quantity-limit sampling function is a sample quantity threshold;
the sampling module is configured to determine, based on the data order of the data to be processed, the leading data in the data to be processed, up to the sample quantity threshold in number, as the sampled data.
In one embodiment, the sampling function includes an interval sampling function, and the sampling parameter value corresponding to the interval sampling function is an interval value;
the sampling module is configured to determine, based on the data order of the data to be processed, the data spaced by the interval value in the data to be processed as the sampled data, where the sampled data includes the datum whose data order is first in the data to be processed.
In one embodiment, the sampling function includes a random sampling function, and the sampling parameter value corresponding to the random sampling function is a random probability value;
the sampling module is configured to randomly sample the data to be processed based on the random probability value to determine the sampled data.
In one embodiment, the sampling function includes at least one of the skip sampling function, the quantity-limit sampling function, the interval sampling function, and the random sampling function.
In one embodiment, the sampling function includes the skip sampling function and the quantity-limit sampling function, the skip sampling function comes before the quantity-limit sampling function in order, the sampling parameter value corresponding to the skip sampling function is the skip count, and the sampling parameter value corresponding to the quantity-limit sampling function is the sample quantity threshold.
In this embodiment, the sampling module is configured to sample the data to be processed based on the skip sampling function and the skip count to determine first candidate sampled data, and to sample the first candidate sampled data based on the quantity-limit sampling function and the sample quantity threshold to determine the sampled data.
In one embodiment, the sampling module is configured to, when the data to be processed contains more unextracted data than the skip count, determine the data whose data order in the data to be processed is after the skip count as the first candidate sampled data, and to extract the sampled data from the first candidate sampled data, where the quantity of the sampled data is less than or equal to the sample quantity threshold.
In one embodiment, the sampling module is configured to: initialize the sample quantity to zero; based on the data order of the first candidate sampled data, select the first datum of the first candidate sampled data (i.e., the first datum after the skip count) as the currently processed datum; when the sample quantity is less than the sample quantity threshold and the currently processed datum meets a preset requirement, determine the currently processed datum as sampled data and increase the sample quantity by one; and, when data exists after the currently processed datum in the first candidate sampled data, take the adjacent datum after the currently processed datum in the first candidate sampled data (i.e., the next datum) as the currently processed datum based on the data order of the first candidate sampled data, and return to the step of determining the currently processed datum as sampled data when the sample quantity is less than the sample quantity threshold and the currently processed datum meets the preset requirement, until the sample quantity equals the sample quantity threshold or no data exists after the currently processed datum in the first candidate sampled data (i.e., the quantity of data after the currently processed datum is zero). In addition, when the currently processed datum does not meet the preset requirement, the currently processed datum is discarded; that is, the step of taking the adjacent datum after the currently processed datum in the first candidate sampled data as the currently processed datum, when data exists after the currently processed datum, is executed to update the currently processed datum.
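The loop described above could be sketched as follows. The function name `limit_with_requirement` is hypothetical, and `meets_requirement` is only a placeholder predicate: this passage does not define the preset requirement, so the default accepts every datum.

```python
def limit_with_requirement(candidates, threshold, meets_requirement=lambda item: True):
    """Collect up to `threshold` items that meet the requirement, in data order."""
    sampled = []
    count = 0                          # sample quantity initialized to zero
    for current in candidates:         # `current` is the currently processed datum
        if count >= threshold:         # stop once the threshold has been reached
            break
        if meets_requirement(current):
            sampled.append(current)    # the datum becomes sampled data
            count += 1
        # otherwise the datum is discarded and the loop moves to the next one
    return sampled

evens = limit_with_requirement([1, 2, 3, 4, 5, 6], 2, lambda x: x % 2 == 0)
```

The loop ends either when the count reaches the threshold or when the candidate data is exhausted, matching the two termination conditions in the text.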
In one embodiment, the sampling function includes the interval sampling function and the quantity-limit sampling function, the interval sampling function comes before the quantity-limit sampling function in order, the sampling parameter value corresponding to the interval sampling function is the interval value, and the sampling parameter value corresponding to the quantity-limit sampling function is the sample quantity threshold.
In this embodiment, the sampling module is configured to sample the data to be processed based on the interval sampling function to determine first candidate sampled data, and to sample the first candidate sampled data based on the quantity-limit sampling function to determine the sampled data.
In one embodiment, the sampling module is configured to determine, based on the data order of the data to be processed, the data spaced by the interval value in the data to be processed as the first candidate sampled data, and to take the leading data of the first candidate sampled data as the sampled data. The quantity of the sampled data is less than or equal to the sample quantity threshold, and the sampled data includes the datum whose data order is first in the data to be processed.
In one embodiment, the sampling function includes the random sampling function and the quantity-limit sampling function, the random sampling function comes before the quantity-limit sampling function in order, the sampling parameter value corresponding to the random sampling function is the random probability value, and the sampling parameter value corresponding to the quantity-limit sampling function is the sample quantity threshold.
In this embodiment, the sampling module is configured to randomly sample the data to be processed based on the random sampling function and the random probability value to determine first candidate sampled data, and to sample the first candidate sampled data based on the quantity-limit sampling function and the sample quantity threshold to determine the sampled data. The quantity of the sampled data is less than or equal to the sample quantity threshold.
In one embodiment, the sampling function includes the skip sampling function and the interval sampling function, the skip sampling function comes before the interval sampling function in order, the sampling parameter value corresponding to the skip sampling function is the skip count, and the sampling parameter value corresponding to the interval sampling function is the interval value.
In this embodiment, the sampling module is configured to sample the data to be processed based on the skip sampling function and the skip count to determine first candidate sampled data, and to sample the first candidate sampled data based on the interval sampling function to determine the sampled data.
In one embodiment, the sampling module is configured to determine, based on the data order of the data to be processed, the data whose data order in the data to be processed is after the skip count as the first candidate sampled data, and then, based on the data order of the first candidate sampled data, to determine the data spaced by the interval value in the first candidate sampled data as the sampled data, where the sampled data includes the datum whose data order is first in the first candidate sampled data.
In one embodiment, the sampling function includes the skip sampling function and the random sampling function, the skip sampling function comes before the random sampling function in order, the sampling parameter value corresponding to the skip sampling function is the skip count, and the sampling parameter value corresponding to the random sampling function is the random probability value.
In this embodiment, the sampling module is configured to sample the data to be processed based on the skip sampling function and the skip count to determine first candidate sampled data, and to sample the first candidate sampled data based on the random sampling function and the random probability value to determine the sampled data.
In one embodiment, the sampling module is configured to determine, based on the data order of the data to be processed, the data whose data order in the data to be processed is after the skip count as the first candidate sampled data, and to randomly sample the first candidate sampled data based on the random sampling function and the random probability value to determine the sampled data.
In one embodiment, the sampling function includes the interval sampling function and the random sampling function, the interval sampling function comes before the random sampling function in order, the sampling parameter value corresponding to the interval sampling function is the interval value, and the sampling parameter value corresponding to the random sampling function is the random probability value.
In this embodiment, the sampling module is configured to sample the data to be processed based on the interval sampling function and the interval value to determine first candidate sampled data, and to sample the first candidate sampled data based on the random sampling function and the random probability value to determine the sampled data.
In one embodiment, the sampling function includes the skip sampling function, the interval sampling function, and the quantity-limit sampling function. The skip sampling function comes first in order, the interval sampling function comes in the middle (i.e., between the skip sampling function and the quantity-limit sampling function), and the quantity-limit sampling function comes last. The sampling parameter value corresponding to the skip sampling function is the skip count, the sampling parameter value corresponding to the interval sampling function is the interval value, and the sampling parameter value corresponding to the quantity-limit sampling function is the sample quantity threshold.
In this embodiment, the sampling module is configured to sample the data to be processed based on the skip sampling function and the skip count to determine second candidate sampled data; to sample the second candidate sampled data based on the interval sampling function and the interval value to determine third candidate sampled data; and to sample the third candidate sampled data based on the quantity-limit sampling function and the sample quantity threshold to determine the sampled data.
In one embodiment, the sampling module is configured to determine, based on the data order of the data to be processed, the data whose data order in the data to be processed is after the skip count as the second candidate sampled data (this is the process of sampling based on the skip sampling function). The second candidate sampled data is sampled based on the interval sampling function and the interval value to determine the third candidate sampled data (i.e., the data spaced by the interval value in the second candidate sampled data, including the datum that is first in order in the second candidate sampled data). The third candidate sampled data is then sampled based on the quantity-limit sampling function and the sample quantity threshold to determine the sampled data.
In one embodiment, the sampling function includes the skip sampling function, the interval sampling function, and the random sampling function. The skip sampling function comes first in order, the random sampling function comes last, and the interval sampling function comes between the skip sampling function and the random sampling function. The sampling parameter value corresponding to the skip sampling function is the skip count, the sampling parameter value corresponding to the interval sampling function is the interval value, and the sampling parameter value corresponding to the random sampling function is the random probability value.
In this embodiment, the sampling module is configured to sample the data to be processed based on the skip sampling function and the skip count to determine second candidate sampled data; to sample the second candidate sampled data based on the interval sampling function and the interval value to determine third candidate sampled data; and to sample the third candidate sampled data based on the random sampling function and the random probability value to determine the sampled data.
In one embodiment, the sampling module is configured to determine, based on the data order of the data to be processed, the data whose data order in the data to be processed is after the skip count as the second candidate sampled data. The second candidate sampled data is sampled based on the interval sampling function and the interval value to determine the third candidate sampled data (i.e., the data spaced by the interval value in the second candidate sampled data, including the datum that is first in order in the second candidate sampled data). The third candidate sampled data is then sampled based on the random sampling function and the random probability value to determine the sampled data.
In one embodiment, the sampling function includes the skip sampling function, the interval sampling function, the random sampling function, and the quantity-limit sampling function. The skip sampling function comes first in order, the interval sampling function comes between the skip sampling function and the random sampling function, the random sampling function comes between the interval sampling function and the quantity-limit sampling function, and the quantity-limit sampling function comes last. The sampling parameter value corresponding to the skip sampling function is the skip count, the sampling parameter value corresponding to the interval sampling function is the interval value, the sampling parameter value corresponding to the random sampling function is the random probability value, and the sampling parameter value corresponding to the quantity-limit sampling function is the sample quantity threshold.
In this embodiment, the sampling module is configured to sample the data to be processed based on the skip sampling function and the skip count to determine fourth candidate sampled data; to sample the fourth candidate sampled data based on the interval sampling function and the interval value to determine fifth candidate sampled data (including the datum that comes first in the fourth candidate sampled data); to sample the fifth candidate sampled data based on the random sampling function and the random probability value to determine sixth candidate sampled data; and to sample the sixth candidate sampled data based on the quantity-limit sampling function and the sample quantity threshold to determine the sampled data.
In one embodiment, the sampling module is configured to first determine, based on the data order of the data to be processed, the data whose data order in the data to be processed is after the skip count as the fourth candidate sampled data. On the basis of the fourth candidate sampled data, sampling is performed with the interval sampling function and the interval value, then with the random sampling function and the random probability value, and finally with the quantity-limit sampling function and the sample quantity threshold, to obtain the sampled data.
In one embodiment, the sampling module is configured to, when the sampling function meets the preset condition, sample the data to be processed based on the sampling function and the corresponding sampling parameter value to determine the sampled data, and otherwise to give an error prompt.
In one embodiment, the preset condition may include belonging to the preset sampling functions; that is, when each sampling function is one of the preset sampling functions, the sampling function meets the preset condition.
For specific limitations on the data processing apparatus, refer to the limitations on the data processing method above; in particular, the limitations on the sampling module of the data processing apparatus correspond to those in the data processing method above and are not repeated here. Each module in the above data processing apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. Each module may be embedded in, or independent of, a processor of the computer device in the form of hardware, or may be stored in a memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be the server 20 in Fig. 1, and its internal structure may be as shown in Fig. 10. The computer device includes a processor, a memory, and a network interface connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements the steps of the method embodiments described above.
Those skilled in the art will understand that the structure shown in Fig. 10 is only a block diagram of the part of the structure relevant to the solution of the application and does not limit the computer device to which the solution of the application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor implements the steps of the above method when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. The computer program implements the steps of the above method when executed by a processor.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium, and when executed it may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the application, and these all fall within the protection scope of the application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (15)

1. A data processing method, characterized by comprising:
obtaining data to be processed;
obtaining a sampling expression;
parsing the sampling expression to obtain a sampling function and a sampling parameter value corresponding to the sampling function; and
sampling the data to be processed based on the sampling function and the corresponding sampling parameter value to determine sampled data.
2. The method according to claim 1, characterized in that the obtaining a sampling expression comprises:
receiving the sampling expression obtained in response to an interactive operation on an expression input box in an expression configuration interface.
3. The method according to claim 1, characterized in that the obtaining data to be processed comprises:
reading each data source based on an iterator to obtain the data to be processed.
4. The method according to claim 1, characterized in that the sampling function comprises a skip sampling function, and the sampling parameter value corresponding to the skip sampling function is a skip count;
the sampling the data to be processed based on the sampling function and the corresponding sampling parameter value to determine sampled data comprises:
determining the sampled data based on the data order of the data to be processed and the data whose data order in the data to be processed is after the skip count.
5. The method according to claim 1, characterized in that the sampling function comprises a quantity-limit sampling function, and the sampling parameter value corresponding to the quantity-limit sampling function is a sample quantity threshold;
the sampling the data to be processed based on the sampling function and the corresponding sampling parameter value to determine sampled data comprises:
determining, based on the data order of the data to be processed, the leading data in the data to be processed, up to the sample quantity threshold in number, as the sampled data.
6. The method according to claim 1, characterized in that the sampling function comprises an interval sampling function, and the sampling parameter value corresponding to the interval sampling function is an interval value;
sampling the data to be processed based on the sampling function and the corresponding sampling parameter value to determine the sampled data comprises:
based on the data order of the data to be processed, determining the data in the data to be processed that are spaced apart by the interval value as the sampled data, wherein the sampled data include the datum that is first in the data order of the data to be processed.
7. The method according to claim 1, characterized in that the sampling function comprises a random sampling function, and the sampling parameter value corresponding to the random sampling function is a random probability value;
sampling the data to be processed based on the sampling function and the corresponding sampling parameter value to determine the sampled data comprises:
randomly sampling the data to be processed based on the random probability value to determine the sampled data.
8. The method according to any one of claims 1 to 7, characterized in that the sampling function comprises at least one of a skip sampling function, a quantity-limit sampling function, an interval sampling function and a random sampling function.
9. The method according to claim 1, characterized by further comprising the step of:
when the sampling function satisfies a preset condition, proceeding to the step of sampling the data to be processed based on the sampling function and the corresponding sampling parameter value to determine the sampled data; otherwise, providing an error prompt.
10. The method according to claim 1, characterized in that, after the sampled data are determined, the method further comprises the step of: filtering the sampled data respectively to determine filtered sampled data.
11. A data processing device, characterized by comprising:
a data acquisition module, configured to obtain data to be processed;
an expression obtaining module, configured to obtain a sampling expression;
a parsing module, configured to parse the sampling expression to obtain a sampling function and a sampling parameter value corresponding to the sampling function; and
a sampling module, configured to sample the data to be processed based on the sampling function and the corresponding sampling parameter value to determine sampled data.
12. The device according to claim 10, characterized in that the expression obtaining module is configured to receive the sampling expression obtained in response to an interactive operation on an expression input box in an expression configuration interface.
13. The device according to claim 10, characterized in that the data acquisition module is configured to read each data source by means of an iterator to obtain the data to be processed.
14. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 10.
15. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 10.
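The method of claims 1, 3 to 7 and 9 can be illustrated with a minimal Python sketch. It is not the patented implementation. The expression syntax `name(value)`, the function names `skip`, `limit`, `interval` and `random`, and every helper name below are assumptions made for illustration only. The interval value is read here as a step between kept indices (0, k, 2k, ...), which is one possible reading of claim 6, and a `ValueError` stands in for the error prompt of claim 9.

```python
import itertools
import random
import re
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

# Assumed (hypothetical) syntax: "<function>(<non-negative number>)", e.g. "skip(2)".
_EXPRESSION = re.compile(r"^\s*(\w+)\s*\(\s*(\d*\.?\d+)\s*\)\s*$")


def read_sources(*sources: Iterable[Any]) -> Iterator[Any]:
    """Read each data source in turn through one iterator (cf. claim 3)."""
    return itertools.chain.from_iterable(sources)


def parse_expression(expression: str) -> Tuple[str, float]:
    """Split a sampling expression into a function name and a parameter value."""
    match = _EXPRESSION.match(expression)
    if match is None:
        raise ValueError(f"malformed sampling expression: {expression!r}")
    return match.group(1), float(match.group(2))


def skip(data: Iterator[Any], count: float) -> Iterator[Any]:
    """Skip sampling: keep the items after the first `count` items."""
    return itertools.islice(data, int(count), None)


def limit(data: Iterator[Any], threshold: float) -> Iterator[Any]:
    """Quantity-limit sampling: keep at most the first `threshold` items."""
    return itertools.islice(data, int(threshold))


def interval(data: Iterator[Any], step: float) -> Iterator[Any]:
    """Interval sampling: keep the first item, then every `step`-th item.

    Assumption: an interval value of k selects indices 0, k, 2k, ...
    """
    return itertools.islice(data, 0, None, int(step))


def random_sample(data: Iterator[Any], probability: float) -> Iterator[Any]:
    """Random sampling: keep each item independently with `probability`."""
    return (item for item in data if random.random() < probability)


# Registry of the sampling functions this sketch recognises.
SAMPLERS: Dict[str, Callable[[Iterator[Any], float], Iterator[Any]]] = {
    "skip": skip,
    "limit": limit,
    "interval": interval,
    "random": random_sample,
}


def sample(data: Iterable[Any], expression: str) -> List[Any]:
    """Parse the expression, then apply the named sampling function."""
    name, value = parse_expression(expression)
    sampler = SAMPLERS.get(name)
    if sampler is None:
        # Stand-in for the error prompt of claim 9.
        raise ValueError(f"unknown sampling function: {name!r}")
    return list(sampler(iter(data), value))
```

Under these assumptions, `sample(range(10), "skip(7)")` returns `[7, 8, 9]` and `sample(range(10), "interval(4)")` returns `[0, 4, 8]`. Keeping the samplers as lazy iterators means the data sources are read only as far as the chosen function requires.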
CN201810729988.3A 2018-07-05 2018-07-05 Data processing method and device, computer equipment and storage medium Active CN108984700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810729988.3A CN108984700B (en) 2018-07-05 2018-07-05 Data processing method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108984700A true CN108984700A (en) 2018-12-11
CN108984700B CN108984700B (en) 2021-07-27

Family

ID=64537230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810729988.3A Active CN108984700B (en) 2018-07-05 2018-07-05 Data processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108984700B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569284A (en) * 2019-09-09 2019-12-13 联想(北京)有限公司 Information processing method and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103078772A (en) * 2013-02-26 2013-05-01 南京理工大学常熟研究院有限公司 Deep packet inspection (DPI) sampling peer-to-peer (P2P) traffic detection system based on credibility
CN104572849A (en) * 2014-12-17 2015-04-29 西安美林数据技术股份有限公司 Automatic standardized filing method based on text semantic mining
CN105243127A (en) * 2015-09-30 2016-01-13 海天水务集团股份公司 Report data sampling method for wastewater treatment plant
CN106886535A (en) * 2015-12-16 2017-06-23 大唐软件技术股份有限公司 Data extraction method and apparatus adapted to multiple data sources
CN107181776A (en) * 2016-03-10 2017-09-19 华为技术有限公司 Data processing method and related device and system
CN107506383A (en) * 2017-07-25 2017-12-22 中国建设银行股份有限公司 Audit data processing method and computer equipment
CN107766486A (en) * 2017-10-16 2018-03-06 山东浪潮通软信息科技有限公司 Method, apparatus, computer-readable storage medium and storage controller for randomly extracting sample data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
仇立平: "《社会研究方法》", 31 May 2015 *

Also Published As

Publication number Publication date
CN108984700B (en) 2021-07-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant