CN101739410A - Method, device and system for revealing operation result - Google Patents

Method, device and system for revealing operation result

Info

Publication number
CN101739410A
Authority
CN
China
Prior art keywords
sampling
result
client
attribute
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200810180712A
Other languages
Chinese (zh)
Inventor
劳玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN200810180712A
Publication of CN101739410A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a device and a system for revealing operation results. They relate to the field of data processing and solve the problem that query results deviate from the production database, so that data cannot be shown in real time and lags behind. The method for revealing the operation result of a production database comprises the following steps: sampling the stored mass data according to the distribution of the data to establish a micro data set; and sending the micro data set to an auxiliary system. The method, the device and the system are applicable to database systems.

Description

Method, device and system for revealing operation results
Technical field
The present invention relates to the field of data processing, and in particular to a method, device and system for revealing operation results.
Background
The following situation often arises in day-to-day network operation services: a customer needs to see what characteristics the users satisfying certain conditions have among a massive population of network users, and how many such users there are in total. Because the volume of data to be queried is large, both querying database records and predicting with a model take a certain amount of time. The customer can only wait and cannot see the result immediately, so the customer's perception is poor and the practical effect is poor. In addition, the system sometimes needs to interact with an operator and instantly display the number of customers that match the customer characteristics the operator selects. When the customer data volume is very large (millions of records or more), computing this customer count instantly is almost impossible.
At present, the industry generally serves such large-volume queries from a backup database of the production database. The back end (the business processing program of the query system, which the user cannot see directly) updates the data during idle periods (the early morning), while the front end (the front-end pages of the query system, which the user can see directly) displays results through an asynchronous process: the updated data is queried, the first few results are displayed on the page as soon as they are obtained, the back end continues computing, and only after the whole query has finished is the total number of records meeting the query condition shown.
In the course of implementing the above way of revealing operation results, the inventor found at least the following problem in the prior art: the query results come from the backup database, so the results returned do not show real-time data, and the data lags behind.
Summary of the invention
Embodiments of the present invention provide a method, device and system for revealing operation results, with which the data in the production database can be queried in real time.
To achieve the above object, the embodiments of the present invention adopt the following technical solutions:
A method for revealing an operation result of a production database, comprising:
sampling the stored mass data according to the data distribution to establish a micro data set;
sending the micro data set to an auxiliary system.
A method for an auxiliary system to reveal an operation result, comprising:
receiving a micro data set from a production database;
receiving a query condition or a selected model from a client;
selectively extracting, from the micro data set, the result that satisfies the client's query condition or selected model, and presenting it to the client.
A production database, comprising:
a sampling unit, configured to sample the stored mass data according to the data distribution to establish a micro data set;
a micro data set sending unit, configured to send the micro data set to an auxiliary system.
An auxiliary system, comprising:
a micro data set receiving unit, configured to receive a micro data set from a production database;
a query receiving unit, configured to receive a query condition or a selected model from a client;
a selective result extraction unit, configured to selectively extract, from the micro data set, the result that satisfies the client's query condition or selected model, and present it to the client.
A system for revealing operation results, comprising a production database and an auxiliary system, wherein:
the production database is configured to establish a micro data set according to the data distribution;
the auxiliary system is configured to selectively extract the query result from the micro data set and present it.
With the method, device and system for revealing operation results provided by the embodiments of the present invention, the query result of the production database is first predicted from the micro data set and presented in real time, and the real query result is then obtained from the production database to replace the predicted one. A backup database is no longer relied upon, so the client can query the data in the production database in real time.
Description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of method embodiment one of the present invention;
Fig. 2 is a flowchart of method embodiment two of the present invention;
Fig. 3 is a detailed flowchart of step S203 in method embodiment two of the present invention;
Fig. 4 is a structural diagram of a system embodiment of the present invention;
Fig. 5 is a structural diagram of the production database in the system embodiment of the present invention;
Fig. 6 is a structural diagram of the auxiliary system in the system embodiment of the present invention;
Fig. 7 is a structural diagram of the selective result extraction unit of the auxiliary system in the system embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Method embodiment one of the present invention, as shown in Fig. 1, comprises:
S101: The production database samples the stored mass data according to the data distribution to establish a micro data set.
The method of sampling the stored mass data comprises one or any combination of simple random sampling, systematic sampling, stratified sampling, cluster sampling and multistage sampling.
The purpose of the micro data set is to build a "miniature" production database. When a client issues a query, the result first presented to the client is produced from the micro data set. Because the micro data set holds less data than the production database, the result can be presented to the user very quickly, which shortens the user's waiting time. The table structure of the micro data set is generally the same as that of the production database, and establishing the micro data set is similar in implementation to establishing an existing database, so it is not described further here.
In addition, when sampling to produce the micro data set, note that the final sample should let the micro data set correctly reflect the distribution characteristics of the mass data in the production database, so that the user can predict the result of querying the whole production database from the micro data set.
S102: The production database sends the micro data set to the auxiliary system.
S103: The auxiliary system receives the micro data set from the production database.
S104: The auxiliary system receives a query condition or a selected model from the client.
S105: The auxiliary system selectively extracts, from the micro data set, the result that satisfies the client's query condition or selected model, and presents it to the client.
In this step, extracting the query result from the micro data set is optional. This is mainly because, when the query condition and the model are fairly simple, existing database technology can still produce a result quickly even though the data volume is large. The auxiliary system can therefore choose to extract the query result from the micro data set only when obtaining it with the prior art would take too long.
In existing methods for displaying large-volume query results, a query is run against the database whenever a client issues a query request. Because the mass data in the database must be searched, the computation is heavy; when several clients query at the same time, it can even paralyze the production database. To protect the normal operation of the production database, the prior art sets up a separate backup database that copies the data in the production database and accepts the clients' query requests. Because the production database is usually very large, each backup takes a long time, so the backup database generally cannot be kept in sync even on a daily basis. The data in the backup database and in the production database therefore diverge, which reduces the accuracy and timeliness of the query results.
In addition, every client query request requires computation over the mass data. Because the amount of computation is so large, presenting the query result only after the whole computation has finished would take a very long time, which users cannot tolerate. The prior art therefore adopts an asynchronous process: the first few results are displayed on the page as soon as they are obtained, the back end continues computing, and the total number of records meeting the query condition is shown only after the whole query has finished. However, the results presented to the user in this way (merely the first few results computed) are unrelated to the overall data distribution of the database. They cannot reflect the overall distribution of the query result or give the user a macroscopic grasp of the query. In many cases the purpose of a user's query is vague: the user does not intend to pinpoint specific information but only wants a macroscopic understanding of the query to guide further searching. The query results of the prior art are therefore poorly perceived by users, cannot quickly show the overall result the user wants to see, and are of little help in decision making.
This embodiment first establishes a micro data set from the production database; the micro data set correctly reflects the distribution characteristics of the mass data in the production database. When a client issues a query request, the query can be run against the micro data set and its result shown to the client. Because the micro data set correctly reflects the distribution characteristics of the mass data in the production database, the user can predict from the micro data set result what the result of the same query against the production database is likely to be. This improves the user's perception of the query result and helps the user make decisions. It also makes returning query results to the user of the client more timely. Meanwhile, the system can run the query against the production database in the back end during idle periods and show the true production database result to the user when needed, which guarantees the accuracy of the final query result.
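The prediction described above can be sketched as a simple scale-up of the match rate observed in the micro data set. This is only an illustration under stated assumptions: the text does not specify the prediction formula, and the function name `predict_production_count` and the record fields `region` and `age` are hypothetical.

```python
def predict_production_count(micro_rows, predicate, production_size):
    """Scale the match rate observed in the micro data set to the full database.

    Assumption of this sketch: the micro data set is representative, so the
    share of matching records is the same in both data sets.
    """
    if not micro_rows:
        raise ValueError("micro data set is empty")
    matches = sum(1 for row in micro_rows if predicate(row))
    return round(matches / len(micro_rows) * production_size)


# Hypothetical micro data set of four user records.
micro = [
    {"region": 0, "age": 25},
    {"region": 1, "age": 41},
    {"region": 0, "age": 33},
    {"region": 1, "age": 58},
]
predicted = predict_production_count(micro, lambda r: r["region"] == 0, 1_000_000)
print(predicted)  # 500000
```

The quality of such an estimate depends entirely on how well the sample reflects the production data, which is why the sampling procedure matters.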
Method embodiment two of the present invention, as shown in Fig. 2, comprises:
S201: During idle periods, the auxiliary system sends updated data to the production database.
S202: The production database receives the updated data from the auxiliary system and updates the stored mass data.
S203: The production database establishes a micro data set from the stored mass data, according to the data distribution, using a sampling method that combines simple random sampling, cluster sampling and systematic sampling.
Simple random sampling means drawing n units at random from a population of N units as the sample, such that every possible sample has an equal probability of being drawn.
Cluster sampling merges the units of the population into several non-intersecting, non-overlapping sets, called clusters, and then draws the sample with the cluster as the sampling unit.
Systematic sampling numbers and orders the units of the population, computes a sampling interval, and then draws units at that fixed interval to form the sample. It is suitable when both the population and the sample size are large.
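The three sampling methods defined above can be sketched in a few lines each. This is a minimal illustration on an in-memory list, not the implementation used in the production database; the function names and the `cluster_key` parameter are assumptions of this sketch.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n units so that every possible sample is equally likely."""
    return rng.sample(population, n)


def cluster_sample(population, cluster_key, num_clusters, rng):
    """Group units into disjoint clusters, then draw whole clusters."""
    clusters = {}
    for unit in population:
        clusters.setdefault(cluster_key(unit), []).append(unit)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [unit for key in chosen for unit in clusters[key]]


def systematic_sample(population, n, rng):
    """Take every k-th unit of the ordered population from a random start."""
    interval = len(population) // n
    start = rng.randrange(interval)
    return [population[start + i * interval] for i in range(n)]


rng = random.Random(1)
population = list(range(100))
print(len(simple_random_sample(population, 10, rng)))               # 10
print(len(cluster_sample(population, lambda u: u // 20, 2, rng)))   # 40
print(len(systematic_sample(population, 10, rng)))                  # 10
```

In the cluster example the 100 units fall into five clusters of 20, so drawing two clusters always yields 40 units.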
Sampling in this step differs somewhat from sampling in general. General sampling first considers what specific question the sampling is meant to answer, and then chooses the sampling method that best reflects the population for that question. Here, because the question the user wants to query cannot be determined in advance, an extra operation is added in practice so that the sample does not depart from the overall characteristics of the important data: the data attributes are ranked by importance according to industry experience, and sampling then follows that ranking, so as to reflect the overall characteristics as fully as possible.
The sampling method in this step, as shown in Fig. 3, specifically comprises:
S203a: Set the total amount of data to be sampled.
S203b: Set the attributes of the sampled data according to the industry selected by the user, and rank the attributes by importance.
S203c: Set the sampling amount for each attribute, and sample from the stored mass data attribute by attribute, in order of importance, according to the value characteristics of each attribute.
The sampling amount of each sampling pass can be set according to actual needs, for example a minimum of 20,000 records and a maximum of 100,000 records.
Suppose the production database holds N data records and that three attributes are used for extraction, ranked by importance (from most to least important) as Z_1, Z_2, Z_3. Z_1 and Z_3 take discrete values, and Z_2 takes continuous values: Z_1 takes the discrete values 0, 1, 2, 3; Z_2 takes continuous values from 0 to 100; Z_3 takes the discrete values 0, 1, 2. Sampling proceeds attribute by attribute in order of importance. First, records are sampled according to the percentage of the population that each discrete value of Z_1 accounts for. Then, within the records drawn, sampling follows the percentage of the Z_1 sample that each discretized value of Z_2 accounts for, and so on for the remaining attributes.
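One way to read the attribute-by-attribute procedure above is as repeated proportional sampling within strata. The sketch below follows that reading and rests on several assumptions: the same fraction is drawn at every stage, Z_2 is discretized into ten half-open bins of width 10, and the helper names `proportional_draw`, `z2_segment` and `sample_by_importance` are hypothetical.

```python
import random
from collections import defaultdict


def proportional_draw(records, key, fraction, rng):
    """Draw the same fraction from every stratum, preserving value proportions."""
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    sample = []
    for value in sorted(strata):
        group = strata[value]
        sample.extend(rng.sample(group, round(len(group) * fraction)))
    return sample


def z2_segment(rec):
    """Discretize the continuous attribute Z_2 (0 to 100) into ten bins."""
    return min(int(rec["Z_2"] // 10), 9)


def sample_by_importance(records, stage_keys, fraction, rng):
    """Sample stage by stage, most important attribute first."""
    sample = records
    for i in range(len(stage_keys)):
        keys = stage_keys[: i + 1]  # stratify on all attributes used so far
        sample = proportional_draw(
            sample, lambda r, ks=keys: tuple(k(r) for k in ks), fraction, rng
        )
    return sample


rng = random.Random(0)
z1_values = [0] * 3000 + [1] * 3500 + [2] * 2000 + [3] * 1500
records = [
    {"Z_1": z, "Z_2": rng.uniform(0, 100), "Z_3": rng.choice([0, 1, 2])}
    for z in z1_values
]
stage1 = proportional_draw(records, lambda r: r["Z_1"], 0.5, rng)
print([sum(1 for r in stage1 if r["Z_1"] == v) for v in range(4)])
# [1500, 1750, 1000, 750]

keys = [lambda r: r["Z_1"], z2_segment, lambda r: r["Z_3"]]
micro = sample_by_importance(records, keys, 0.5, rng)
```

After the first stage the Z_1 values keep their original proportions; later stages preserve proportions only up to per-stratum rounding.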
In addition, the sampling amount for each attribute must be determined. Let n be the number of attributes taking part in the importance ranking (three in this embodiment: Z_1, Z_2, Z_3), let N be the total number of records in the population, and let the final sample account for y% of the population. Then N * (x%)^n = N * y% = number of sampled records, where x% is the percentage of the records sampled for the previous attribute that the records sampled for the next attribute account for.
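The relation (x%)^n = y% can be worked through numerically. The figures below (10 million records, a final sample of 0.8%) are illustrative assumptions, not values from the text, and the function names are hypothetical.

```python
def per_stage_fraction(overall_fraction, num_attributes):
    """Solve (x)^n = y for x, the fraction kept at each attribute stage."""
    return overall_fraction ** (1 / num_attributes)


def final_sample_size(total_records, stage_fraction, num_attributes):
    """N * x^n, the number of records left after all n stages."""
    return total_records * stage_fraction ** num_attributes


x = per_stage_fraction(0.008, 3)          # y = 0.8%, n = 3 attributes
print(f"{x:.3f}")                          # 0.200
print(f"{final_sample_size(10_000_000, x, 3):.0f}")  # 80000
```

With three attributes, keeping about 20% at each stage leaves roughly 80,000 of 10 million records, which falls inside the 20,000 to 100,000 range mentioned earlier.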
Suppose that in the production database the record counts for Z_1 = 0, 1, 2, 3 are in the ratio 3 : 3.5 : 2 : 1.5.
Let N_1 be the number of records drawn in the first pass, by Z_1, with the ratio among the values kept unchanged. The number of records with Z_1 = 0 is then N_1 * (3/10).
Discretize Z_2 into 10 segments: 0~10, 11~20, 21~30, 31~40, 41~50, 51~60, 61~70, 71~80, 81~90, 91~100.
Suppose that, among the records with Z_1 = 0, the record counts across the Z_2 segments are in the ratio a_1 : a_2 : a_3 : a_4 : a_5 : a_6 : a_7 : a_8 : a_9 : a_10. Let N_2 be the number of records drawn at random, in this ratio, from the records with Z_1 = 0.
Suppose that, among the records with Z_1 = 1, the record counts across the Z_2 segments are in the ratio b_1 : b_2 : b_3 : b_4 : b_5 : b_6 : b_7 : b_8 : b_9 : b_10. Let N_3 be the number of records drawn at random, in this ratio, from the records with Z_1 = 1.
Suppose that, among the records with Z_1 = 2, the record counts across the Z_2 segments are in the ratio c_1 : c_2 : c_3 : c_4 : c_5 : c_6 : c_7 : c_8 : c_9 : c_10. Let N_4 be the number of records drawn at random, in this ratio, from the records with Z_1 = 2.
Suppose that, among the records with Z_1 = 3, the record counts across the Z_2 segments are in the ratio d_1 : d_2 : d_3 : d_4 : d_5 : d_6 : d_7 : d_8 : d_9 : d_10. Let N_5 be the number of records drawn at random, in this ratio, from the records with Z_1 = 3.
Extraction proceeds recursively in this way, following the importance ranking of the attributes. The percentage that the record count of a later attribute accounts for of the record count of the previous attribute is then x% = (N_2+N_3+N_4+N_5)/N_1.
Let the Z_2 record count be x_1% of the Z_1 record count and the Z_3 record count be x_2% of the Z_2 record count, with 20,000 < N_1*x_1%*x_2% < 100,000. The range of the first simple random sampling amount N_1 can be derived from this. Repeating this step for the other attributes establishes the micro data set.
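The bound on the first sampling amount follows by dividing the inequality through by x_1% * x_2%. A small worked example, with stage fractions of 50% and 25% chosen purely for illustration and a hypothetical function name:

```python
def first_pass_bounds(x1, x2, min_final=20_000, max_final=100_000):
    """Bounds on N_1 implied by min_final < N_1 * x1 * x2 < max_final."""
    ratio = x1 * x2
    if not 0 < ratio <= 1:
        raise ValueError("stage fractions must multiply to a value in (0, 1]")
    return min_final / ratio, max_final / ratio


low, high = first_pass_bounds(0.5, 0.25)
print(low, high)  # 160000.0 800000.0
```

So with these assumed fractions, the first pass by Z_1 would need to draw between 160,000 and 800,000 records.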
When the attribute being sampled takes continuous values, random sampling after discretization may miss the records at the extreme values of the segments. In the example above, after Z_2 is discretized, the records with values in 10~11, 20~21, 30~31, 40~41, 50~51, 60~61, 70~71, 80~81 and 90~91 are lost. After sampling finishes, these intervals therefore need to be sampled separately and the records drawn added to the micro data set.
Suppose the sample finally obtained by the above sampling accounts for y% of the population. If the continuous value interval S_1 (here the Z_2 values in 10~11 from the example above; the other continuous value intervals follow by analogy) contains P_1 extreme-value records, S_2 contains P_2, S_3 contains P_3, and so on, then the number of extreme-value records to sample is [P_1*y%]+[P_2*y%]+..., where [] denotes rounding up, that is, a fraction of a record counts as one record. If a record drawn already exists in the data set, it is not inserted again.
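The supplementary sampling of boundary records can be sketched as follows. The counts P_i and the 1% fraction are made-up illustration values, the record field `id` is hypothetical, and `Fraction` is used only so that the rounding up is exact.

```python
import math
import random
from fractions import Fraction


def boundary_sample_sizes(boundary_counts, overall_fraction):
    """Return ceil(P_i * y%) for each boundary interval S_i."""
    return [math.ceil(p * overall_fraction) for p in boundary_counts]


def add_boundary_records(micro_set, boundary_records, size, rng):
    """Draw `size` boundary records and append only those not already present."""
    existing_ids = {rec["id"] for rec in micro_set}
    for rec in rng.sample(boundary_records, size):
        if rec["id"] not in existing_ids:
            micro_set.append(rec)
            existing_ids.add(rec["id"])
    return micro_set


sizes = boundary_sample_sizes([40, 250, 1000], Fraction(1, 100))
print(sizes, sum(sizes))  # [1, 3, 10] 14
```

An interval with 40 records at y = 1% would contribute 0.4 of a record, which rounds up to one, matching the rule that a fraction of a record counts as one record.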
S204: The production database sends the micro data set to the auxiliary system.
S205: The auxiliary system receives the micro data set from the production database.
S206: The auxiliary system receives a query condition or a selected model from the client.
S207: The auxiliary system judges whether the existing database method for revealing operation results can return a result within the standard time.
S208: If the existing database method can present the result to the client within the standard time, that existing method is used.
S209: If the existing database method cannot present the result to the client within the standard time, the result that satisfies the client's query condition or selected model is extracted from the micro data set.
In this step, extracting the query result from the micro data set is optional. This is mainly because, when the query condition and the model are fairly simple, existing database technology can still produce a result quickly even though the data volume is large, for example when the data is stored in partitions. The auxiliary system can therefore choose to extract the query result from the micro data set only when obtaining it with the prior art would take too long. Users generally expect to enter a condition and see the result on the page immediately. Surveys show that nearly 40% of internet users will tolerate a wait of at most 10 seconds for a web page to open, and more than 80% find it intolerable if a page has not opened after 20 seconds. Here 20 seconds is therefore taken as the standard time, and the query result is extracted from the micro data set for any computation that cannot produce the full result within 20 seconds. The judgment formula is as follows:
Page open time T = 20/(3 * 1000) * (N (number of fields) * P (field query coefficient) + C (computational complexity)) * G (data volume, in units of ten thousand records)
Here 20/(3 * 1000) is an empirical figure: querying 3 fields across 10 million unindexed records takes 20 seconds. P (field query coefficient) is an empirical coefficient that accounts for the fact that when the number of queried fields multiplies, the time consumed does not multiply in proportion. C (computational complexity) is defined empirically: an ordinary query is defined as 0, an intersection or union of 2 data sets is defined as 1, and an intersection or union of n data sets is defined as n/2*P.
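The judgment formula translates directly into a small function. In this sketch the value P = 1 is an assumption (the text gives no concrete value, though the empirical baseline of 3 fields over 10 million records in 20 seconds is consistent with it), and the function names are hypothetical.

```python
STANDARD_TIME_SECONDS = 20


def complexity(num_data_sets, p):
    """C as defined in the text: 0 for an ordinary query, 1 for an
    intersection or union of 2 data sets, n/2*P for n data sets."""
    if num_data_sets <= 1:
        return 0.0
    if num_data_sets == 2:
        return 1.0
    return num_data_sets / 2 * p


def estimated_page_open_time(field_count, p, c, records_in_ten_thousands):
    """T = 20/(3*1000) * (N*P + C) * G, in seconds."""
    return 20 / (3 * 1000) * (field_count * p + c) * records_in_ten_thousands


def use_micro_data_set(estimated_seconds):
    """Fall back to the micro data set when the estimate exceeds 20 seconds."""
    return estimated_seconds > STANDARD_TIME_SECONDS


baseline = estimated_page_open_time(3, 1.0, complexity(1, 1.0), 1000)
print(f"{baseline:.1f}")  # 20.0

big = estimated_page_open_time(3, 1.0, complexity(1, 1.0), 5000)   # 50 million records
small = estimated_page_open_time(3, 1.0, complexity(1, 1.0), 100)  # 1 million records
print(use_micro_data_set(big), use_micro_data_set(small))  # True False
```

The baseline call reproduces the empirical figure from the text; larger data volumes push the estimate past the standard time and trigger the micro data set path.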
S210: Convert the client's query condition or selected model into an SQL statement, run the query against the micro data set, and obtain the query result of the micro data set.
S211: From the query result of the micro data set, predict the query result of the production database, and present the predicted production database result to the client at the front end.
S212: In the back end, continue computing over the mass data in the production database according to the condition set by the client, to obtain the real query result of the production database.
S213: Compare the real query result of the production database with the predicted one. If the error is less than the standard threshold, replace the predicted result with the real result and present it to the client; if the error is greater than the standard threshold, send an alarm message.
After the overall result predicted from the micro data set has been displayed at the front end, the back end allocates a certain amount of memory and computes over the original data set according to the condition set or the model selected by the user. Once the computation finishes and the real query result of the production database is obtained, it is compared with the predicted result. If the error is less than the standard threshold (5% in this embodiment), the real query result replaces the predicted one. Otherwise, an alarm is sent to the system administrator, who checks whether the sampling method is reasonable and finds the specific cause of the problem.
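The comparison in S213 can be sketched as below. The text does not define how the error is measured; this sketch assumes relative error against the real result, treats an error exactly equal to the threshold as an alarm, and uses the hypothetical function name `reconcile`.

```python
def reconcile(predicted, real, threshold=0.05):
    """Compare the predicted and real results and decide what to do.

    Returns ("replace", error) when the error is below the threshold,
    otherwise ("alert", error).
    """
    if real == 0:
        error = 0.0 if predicted == 0 else float("inf")
    else:
        error = abs(predicted - real) / real
    action = "replace" if error < threshold else "alert"
    return action, error


print(reconcile(30_600, 30_000)[0])  # replace
print(reconcile(40_000, 30_000)[0])  # alert
```

In the first call the prediction is off by 2%, so the real result simply replaces it; in the second the error is about 33%, which would prompt a review of the sampling method.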
This embodiment first establishes a micro data set from the production database; the micro data set correctly reflects the distribution characteristics of the mass data in the production database. When a client issues a query request, the query can be run against the micro data set and its result shown to the client; alternatively, when the query time is short, the prior art can be used for the query, so that the query result is presented to the user in the shortest possible time. Because the micro data set correctly reflects the distribution characteristics of the mass data in the production database, the user can predict from the micro data set result what the result of the same query against the production database is likely to be. This improves the user's perception of the query result and helps the user make decisions. It also makes returning query results to the user of the client more timely. Meanwhile, the system can run the query against the production database in the back end during idle periods and show the true production database result to the user when needed, which guarantees the accuracy of the final query result.
In addition, this embodiment no longer relies on a backup database, which saves system cost, and it derives its data directly from the production database, so the data is real and not delayed.
The system embodiment of the present invention, as shown in Fig. 4, comprises a production database 1 and an auxiliary system 2, wherein:
Production database 1: configured to establish the micro data set.
Auxiliary system 2: configured to selectively extract the query result from the micro data set and present it.
In practice, the auxiliary system may be a support system used to cooperate with and support a system already operating on the live network.
The production database 1, as shown in Fig. 5, comprises:
Receiving unit 11: configured to receive updated data from the auxiliary system and update the stored mass data.
Sampling unit 12: configured to sample the stored mass data to establish the micro data set.
Micro data set sending unit 13: configured to send the micro data set to the auxiliary system.
The sampling unit 12 comprises:
Total amount setting subunit 121: configured to set the total amount of data to be sampled.
Ranking subunit 122: configured to set the attributes of the sampled data according to the industry selected by the user and rank the attributes by importance.
Sample processing subunit 123: configured to set the sampling amount for each attribute and to sample from the stored mass data attribute by attribute, in order of importance, according to the value characteristics of each attribute.
Further, the sample processing subunit 123 specifically operates as follows:
If the attribute takes discrete values, simple random sampling is carried out according to the percentage of the attribute's sampling amount that each discrete value accounts for;
If the attribute takes continuous values, the value range of the attribute is divided into more than one discrete interval, and simple random sampling is carried out according to the percentage of the attribute's sampling amount that each discrete interval accounts for.
In addition, the auxiliary system 2, as shown in Fig. 6, comprises:
Updating unit 21: configured to send updated data to the production database during idle periods.
Micro data set receiving unit 22: configured to receive the micro data set from the production database.
Query receiving unit 23: configured to receive a query condition or a selected model from the client.
Selective result extraction unit 24: configured to selectively extract, from the micro data set, the result that satisfies the client's query condition or selected model, and present it to the client.
The selective result extraction unit 24, as shown in Fig. 7, comprises:
Judgment subunit 241: configured to judge whether the existing database method for revealing operation results can return a result within the standard time.
Display subunit 242: configured to, if the existing database method cannot present the result to the client within the standard time, predict the query result of the production database from the micro data set and present the predicted result to the client at the front end;
and, if the existing database method can present the result to the client within the standard time, to use that existing method.
Query result computing subunit 243: configured to continue computing, in the back end, over the mass data in the production database according to the condition set by the client, to obtain the real query result of the production database.
Comparison subunit 244: configured to compare the real query result of the production database with the predicted one; if the error is less than the standard threshold, the real result replaces the predicted result and is presented to the client; if the error is greater than the standard threshold, an alarm message is sent.
This embodiment first predicts the query result of the production database from the micro data set and presents it in real time, then obtains the real query result from the production database to replace the predicted one. A backup database is no longer relied upon, so the client can query the data in the production database in real time.
A person of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.

Claims (15)

1. A method for presenting operation results of a production database, characterized by comprising:
Sampling the stored mass data according to the data distribution to establish a micro data set;
Sending the micro data set to an auxiliary system.
2. The method according to claim 1, characterized in that the method of sampling the stored mass data comprises one of, or any combination of: simple random sampling, systematic sampling, stratified sampling, cluster sampling, and multistage sampling.
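For orientation, three of the sampling methods named in claim 2 can be sketched as below. These are textbook forms written as a minimal illustration, not the patent's own procedure; the function names and the `fraction` parameter are assumptions introduced here.

```python
import random


def simple_random_sample(rows, n, rng):
    """Draw n rows uniformly at random, without replacement."""
    return rng.sample(rows, n)


def systematic_sample(rows, n, rng):
    """Pick a random start, then take every step-th row (requires 1 <= n <= len(rows))."""
    step = len(rows) // n
    start = rng.randrange(step)
    return [rows[start + i * step] for i in range(n)]


def stratified_sample(rows, key, fraction, rng):
    """Split rows into strata by key(row), then sample the same fraction from each stratum."""
    strata = {}
    for row in rows:
        strata.setdefault(key(row), []).append(row)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # keep at least one row per stratum
        sample.extend(rng.sample(members, k))
    return sample
```

Passing an explicit `random.Random` instance keeps the draws reproducible, which is convenient when a micro data set has to be rebuilt and compared.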
3. The method according to claim 2, characterized in that the sampling method combining simple random sampling, cluster sampling and systematic sampling comprises:
Setting the total amount of data to be sampled;
Setting the attributes of the sampled data according to the industry selected by the user, and ranking the attributes by importance;
Setting the sampling amount of each attribute and, in order of attribute importance, sampling each attribute in turn from the stored mass data according to the value characteristic of that attribute.
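The steps of claim 3 might be sketched as the loop below. The claim does not say how the per-attribute sampling amounts are chosen or how overlapping draws are handled, so this sketch assumes the quotas are given directly, that a row is never drawn twice, and that each per-attribute draw is a plain simple random sample. The names `build_micro_data_set`, `attributes` and `quotas` are hypothetical.

```python
import random


def build_micro_data_set(rows, attributes, quotas, rng):
    """Sample rows attribute by attribute, most important attribute first.

    rows:       list of dicts, one per record in the mass data
    attributes: attribute names already ranked by importance (assumed input)
    quotas:     attribute name -> number of rows to draw for that attribute
    """
    chosen = {}  # row index -> row, so that no row is included twice
    for attr in attributes:
        # Candidates: rows that carry this attribute and were not drawn earlier.
        candidates = [i for i, row in enumerate(rows)
                      if row.get(attr) is not None and i not in chosen]
        take = min(quotas[attr], len(candidates))
        for i in rng.sample(candidates, take):
            chosen[i] = rows[i]
    return [chosen[i] for i in sorted(chosen)]
```

With quotas that sum to the total sampling amount, the result has exactly that many rows as long as enough candidates remain for each attribute.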
4. The method according to claim 3, characterized in that sampling according to the value characteristic of the attribute is specifically:
If the value characteristic of the attribute is discrete, performing simple random sampling according to the percentage of the attribute's sampling amount that each discrete value accounts for;
If the value characteristic of the attribute is continuous, dividing the value range of the attribute into more than one discrete interval, and performing simple random sampling according to the percentage of the attribute's sampling amount that each discrete interval accounts for.
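One way to read claim 4 is as proportional allocation: each discrete value, or each interval of a continuous attribute, receives a share of the attribute's sampling amount. The sketch below assumes that share equals the group's frequency in the mass data, which the claim does not state explicitly. The function name, the `bin_edges` parameter and the rounding rule are likewise assumptions made for illustration.

```python
import bisect
import random
from collections import defaultdict


def proportional_sample(rows, attr, quota, rng, bin_edges=None):
    """Sample about `quota` rows so each group keeps its share of the data.

    bin_edges=None  -> attribute is discrete, one group per distinct value
    bin_edges=[...] -> attribute is continuous, one group per interval
                       delimited by the sorted edges
    """
    groups = defaultdict(list)
    for row in rows:
        value = row[attr]
        if bin_edges is None:
            key = value
        else:
            key = bisect.bisect_right(bin_edges, value)  # index of the interval
        groups[key].append(row)

    sample = []
    for members in groups.values():
        # Share of the quota proportional to the group's size (assumed rule).
        k = min(len(members), round(quota * len(members) / len(rows)))
        sample.extend(rng.sample(members, k))
    return sample
```

Because each group's count is rounded independently, the total can differ slightly from `quota`; a production version would need a rule for distributing the remainder.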
5. The method according to claim 4, characterized in that, when the value characteristic of the attribute is continuous, after the value range has been divided into discrete intervals, the portion lying between the discrete intervals is sampled by simple random sampling using a fixed percentage of the attribute's sampling amount.
6. the method for a backup system revealing operation result is characterized in that, comprising:
Reception is from the micro data set in production data storehouse;
Reception is from the querying condition or the preference pattern of client;
Selection is extracted the result who satisfies client query conditioned disjunction preference pattern by the micro data set, represents to described client.
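The three steps of claim 6 could look roughly like this on the auxiliary-system side. This is a minimal sketch: the class name, its method names, and the representation of a query condition as an attribute-to-value mapping are all assumptions, since the claim does not fix any of them.

```python
class AuxiliarySystem:
    """Holds the micro data set and answers client queries from it."""

    def __init__(self):
        self.micro_data_set = []

    def receive_micro_data_set(self, rows):
        # Step 1: receive the micro data set from the production database.
        self.micro_data_set = list(rows)

    def query(self, condition):
        # Steps 2 and 3: take the client's condition (here a dict of
        # attribute -> required value) and extract the matching rows.
        return [row for row in self.micro_data_set
                if all(row.get(attr) == value for attr, value in condition.items())]
```

For example, after `receive_micro_data_set` has been called, `query({"region": "north"})` would return only the sampled rows whose `region` is `"north"`.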
7. the method for backup system revealing operation result according to claim 6 is characterized in that, described selection is extracted the result who satisfies client query conditioned disjunction preference pattern by the micro data set, and the step that returns to described client comprises:
Judge to use existing database revealing operation result method, whether can be in the standard time return results;
If existing database revealing operation result method can not represent the result to client in the standard time, predict the Query Result in production data storehouse by the micro data set, and the Query Result of described prediction is represented to client on the foreground;
On the backstage mass data in the production data storehouse is continued to carry out computing according to the condition that client is set, obtain the Query Result in real production data storehouse;
The Query Result in the production data storehouse of the Query Result in described real production data storehouse and prediction is compared, if error is less than the standard thresholding, the Query Result of replacing the production data storehouse of prediction with the Query Result in real production data storehouse represents to the client; If error overgauge thresholding sends warning message.
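The judging and comparing steps of claim 7 can be illustrated as follows. The claim does not define how the error is measured, how the running time of the existing method is judged, or what happens when the error equals the threshold. This sketch therefore assumes a relative error on a numeric result, an externally supplied time estimate, and treats the equal case as an alarm. All names are hypothetical.

```python
def use_existing_method(estimated_seconds, standard_seconds):
    """Judge whether the existing method can return within the standard time.

    estimated_seconds is an assumed input; the claim does not say how it is obtained.
    """
    return estimated_seconds <= standard_seconds


def compare_results(predicted, real, threshold):
    """Compare the predicted and real results using a relative error (assumed metric).

    Returns ("replace", real) when the prediction was close enough, so the real
    result replaces it on screen; otherwise ("alarm", error).
    """
    if real == 0:
        error = abs(predicted)
    else:
        error = abs(real - predicted) / abs(real)
    if error < threshold:
        return ("replace", real)
    return ("alarm", error)
```

With a 5% threshold, a prediction of 300 against a real count of 310 would be replaced quietly, while the same prediction against a real count of 500 would raise an alarm.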
8. the method for backup system revealing operation result according to claim 7 is characterized in that, described selection is extracted the result who satisfies client query conditioned disjunction preference pattern by the micro data set, and the step that returns to described client also comprises:
If existing database revealing operation result method can represent the result to client in the standard time, use described existing database revealing operation result method.
9. A production database, characterized by comprising:
Sampling unit: configured to sample the stored mass data according to the data distribution to establish a micro data set;
Micro data set sending unit: configured to send the micro data set to an auxiliary system.
10. The production database according to claim 9, characterized in that the sampling unit comprises:
Total amount setting subunit: configured to set the total amount of data to be sampled;
Ranking subunit: configured to set the attributes of the sampled data according to the industry selected by the user, and to rank the attributes by importance;
Sampling processing subunit: configured to set the sampling amount of each attribute and, in order of attribute importance, to sample each attribute in turn from the stored mass data according to the value characteristic of that attribute.
11. The production database according to claim 10, characterized in that the sampling processing subunit is specifically configured such that:
If the value characteristic of the attribute is discrete, simple random sampling is performed according to the percentage of the attribute's sampling amount that each discrete value accounts for;
If the value characteristic of the attribute is continuous, the value range of the attribute is divided into more than one discrete interval, and simple random sampling is performed according to the percentage of the attribute's sampling amount that each discrete interval accounts for.
12. An auxiliary system, characterized by comprising:
Micro data set receiving unit: configured to receive the micro data set from the production database;
Query receiving unit: configured to receive a query condition or preference pattern from the client;
Result selection and extraction unit: configured to select and extract, from the micro data set, the result that satisfies the client's query condition or preference pattern, and to present it to the client.
13. The auxiliary system according to claim 12, characterized in that the result selection and extraction unit comprises:
Judgment subunit: configured to judge whether the existing database result-presentation method can return the result within the standard time;
Display subunit: configured to, if the existing database result-presentation method cannot present the result to the client within the standard time, predict the query result of the production database from the micro data set and present the predicted query result to the client in the foreground;
Query result computation subunit: configured to continue, in the background, computing over the mass data in the production database according to the conditions set by the client, to obtain the real query result of the production database;
Comparison subunit: configured to compare the real query result of the production database with the predicted query result; if the error is less than the standard threshold, the real query result replaces the predicted query result presented to the client; if the error is greater than the standard threshold, an alarm message is sent.
14. The auxiliary system according to claim 13, characterized in that the selection subunit is further configured to: if the existing database result-presentation method can present the result to the client within the standard time, use that existing database result-presentation method.
15. A system for presenting operation results, characterized by comprising a production database and an auxiliary system, wherein:
Production database: configured to establish a micro data set according to the data distribution;
Auxiliary system: configured to select, extract and present query results from the micro data set.
CN200810180712A 2008-11-24 2008-11-24 Method, device and system for revealing operation result Pending CN101739410A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810180712A CN101739410A (en) 2008-11-24 2008-11-24 Method, device and system for revealing operation result

Publications (1)

Publication Number Publication Date
CN101739410A true CN101739410A (en) 2010-06-16

Family

ID=42462902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810180712A Pending CN101739410A (en) 2008-11-24 2008-11-24 Method, device and system for revealing operation result

Country Status (1)

Country Link
CN (1) CN101739410A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737063A (en) * 2011-04-15 2012-10-17 阿里巴巴集团控股有限公司 Processing method and processing system for log information
CN102737063B (en) * 2011-04-15 2014-09-10 阿里巴巴集团控股有限公司 Processing method and processing system for log information
CN104881475A (en) * 2015-06-02 2015-09-02 北京京东尚科信息技术有限公司 Method and system for randomly sampling big data
CN106776733A (en) * 2016-11-22 2017-05-31 北京航天自动控制研究所 The data staging methods of sampling in data handling system
CN108491262A (en) * 2018-02-28 2018-09-04 华为技术有限公司 A kind of method and apparatus of sampling inquiry
WO2019165762A1 (en) * 2018-02-28 2019-09-06 华为技术有限公司 Sampling query method and device
CN111538728A (en) * 2020-04-27 2020-08-14 中国科学技术大学 Method for archiving and querying historical data of large scientific device

Similar Documents

Publication Publication Date Title
AU2007336337B2 (en) System and method for optimizing changes of data sets
CN102193929B (en) Method and equipment for searching by using word information entropy
CN103678647A (en) Method and system for recommending information
CN101840418A (en) User word library synchronous update method, update server and input method system
CN102780603B (en) Web traffic control method and device
CN103207882A (en) Shop visiting data processing method and system
CN109324905A (en) Database operation method, device, electronic equipment and storage medium
CN101739410A (en) Method, device and system for revealing operation result
CN101765096A (en) Method, device and system for querying order relations
CN103455560A (en) Data query method and system
CN113515545A (en) Data query method, device, system, electronic equipment and storage medium
CN109829098A (en) Search result optimization method, device and server
CN109978575B (en) Method and device for mining user flow operation scene
CN115423555A (en) Commodity recommendation method and device, electronic equipment and storage medium
CN110737432A (en) script aided design method and device based on root list
CN109739885A (en) Data query method, apparatus, equipment and storage medium based on local cache
CN110765226B (en) Goods owner matching method, device, equipment and medium
CN105095224A (en) Method, apparatus and system for carrying out OLAP analysis in mobile communication network
CN112347099A (en) Data processing method and device, computing equipment and computer readable storage medium
CN112052259A (en) Data processing method, device, equipment and computer storage medium
CN115455042A (en) Data processing method, apparatus and computer readable storage medium
CN114581202A (en) Decoration style customization method and device, equipment, medium and product thereof
CN115080824A (en) Target word mining method and device, electronic equipment and storage medium
CN114036391A (en) Data pushing method and device, electronic equipment and storage medium
CN114253938A (en) Data management method, data management device, and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20100616