CN104809175A - Generation method and device of feature library - Google Patents

Info

Publication number
CN104809175A
Authority
CN
China
Prior art keywords
target
random number
arbitrary width
array
random
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510173241.0A
Other languages
Chinese (zh)
Other versions
CN104809175B (en)
Inventor
朱仲颖
张钦
张黎敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Dameng Database Co Ltd
Original Assignee
Shanghai Dameng Database Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Dameng Database Co Ltd
Priority to CN201510173241.0A
Publication of CN104809175A
Application granted
Publication of CN104809175B
Legal status: Active

Abstract

The invention provides a method and a device for generating a feature library. The method comprises: determining a target element of set scale and the number of feature records in a target element of set; using a preset random seed to randomly generate as many random numbers as there are feature records in the target element of set, and storing them as an initial random step-length array; correcting the initial random step-length array at most twice to obtain a target random step-length array, the correction being efficient; dividing all target records according to the target element of set scale; obtaining the corresponding feature records in each element of set according to the target random step-length array, as the sample library of that element of set; and taking the union of the sample libraries of all elements of set as the feature library of all target records. Because the target random step-length array is generated once and applied to every element of set, the method reduces the performance cost of collecting feature records in a database management system, improves the analysis efficiency of the CBO, and satisfies both sampling randomness and sampling-rate accuracy.

Description

Method and apparatus for generating a feature database
Technical field
Embodiments of the present invention relate to data sampling techniques for database management systems (DBMS), and in particular to a method and apparatus for generating a feature database.
Background
A database management system generates an execution plan for each SQL (Structured Query Language) statement entered by a user. Most database management systems include a cost-based optimizer (CBO): the system gathers all information relevant to the execution plan, analyzes it, and selects the lowest-cost plan among all feasible execution plans as the final execution plan, which improves execution efficiency. Sampling and analyzing database records is the foundation on which the CBO rests.
Analyzing every record would improve the accuracy of the CBO, but for massive record sets the cost is too high and would instead reduce the execution efficiency of the database management system. It is therefore particularly important to sample randomly from massive record sets, obtain feature records, and generate a feature database from them.
Records in a database management system can usually be regarded as stored contiguously. The sampling process for building a feature database is generally as follows: starting from the current record, move forward by a step-length offset A to obtain one feature record; from that feature record, move forward by a further offset A′ to obtain the next feature record; repeat this process until the feature database is obtained.
Because data in a database is unevenly distributed, most database management system vendors currently use the random sampling method described above to obtain feature records and generate a feature database. However, published material does not describe how to use random step lengths effectively to generate a feature database.
Summary of the invention
Embodiments of the present invention provide a method and apparatus for generating a feature database, so as to optimize how feature records are obtained.
In a first aspect, an embodiment of the present invention provides a method for generating a feature database, comprising:
determining a target element of set scale and the number of feature records in a target element of set according to a preset initial element of set scale and a preset sampling percentage;
using a preset random seed to randomly generate as many random numbers as there are feature records in the target element of set, and saving the generated random numbers as an initial random step-length array, where each random number lies between 0 and the target element of set scale;
calculating the sum of the random numbers in the initial random step-length array;
when the sum of the random numbers in the initial random step-length array equals the target element of set scale, taking the initial random step-length array as the target random step-length array;
dividing all target records according to the target element of set scale;
for each element of set obtained by the division, using the target random step-length array to obtain the corresponding feature records in that element of set, as the sample library of that element of set; and
taking the union of the sample libraries of all elements of set as the feature database of all target records.
In a second aspect, an embodiment of the present invention provides an apparatus for generating a feature database, comprising:
a parameter configuration module, configured to determine a target element of set scale and the number of feature records in a target element of set according to a preset initial element of set scale and a preset sampling percentage;
an initial random step-length array generation module, configured to use a preset random seed to randomly generate as many random numbers as there are feature records in the target element of set, and to save the generated random numbers as an initial random step-length array, where each random number lies between 0 and the target element of set scale;
a target random step-length array generation module, configured to calculate the sum of the random numbers in the initial random step-length array and, when that sum equals the target element of set scale, to take the initial random step-length array as the target random step-length array; and
a feature database generation module, configured to divide all target records according to the target element of set scale; for each element of set obtained by the division, to use the target random step-length array to obtain the corresponding feature records in that element of set, as the sample library of that element of set; and to take the union of the sample libraries of all elements of set as the feature database of all target records.
In the method and apparatus for generating a feature database provided by the embodiments of the present invention, a target element of set scale is determined, and all target records in a specified table stored in the database management system are divided by that scale into elements of set. The number of feature records in a target element of set is determined and used as the capacity of the initial random step-length array, whose random numbers are generated from a random seed. The sum of the random numbers in the initial random step-length array is checked against the target element of set scale, and an initial random step-length array that passes this check is taken as the target random step-length array. The target random step-length array controls how many feature records are collected in each element of set, and it needs to be generated only once to be used for all elements of set. This reduces the performance cost of collecting feature records in the database management system, reduces the cost for the CBO of analyzing the collected feature records, and improves the analysis efficiency of the CBO. In addition, because the sum of the random numbers in the target random step-length array equals the target element of set scale, the sampling range of the feature records obtained in each element of set covers that element of set, so that both the randomness of sampling and the accuracy of the sampling rate are satisfied.
Brief description of the drawings
To illustrate the present invention more clearly, the drawings used in its description are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1a is a schematic flowchart of a method for generating a feature database provided by Embodiment one of the present invention;
Fig. 1b is a schematic diagram of obtaining feature records in an element of set using the target random step-length array, provided by Embodiment one of the present invention;
Fig. 2a is a schematic flowchart of a method for generating a feature database provided by Embodiment two of the present invention;
Fig. 2b is a schematic flowchart of obtaining the target random step-length array through one correction, provided by Embodiment two of the present invention;
Fig. 2c is a schematic flowchart of obtaining the target random step-length array through two corrections, provided by Embodiment two of the present invention;
Fig. 3 is a schematic structural diagram of an apparatus for generating a feature database provided by Embodiment three of the present invention.
Detailed description
To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments are described in further detail below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. The specific embodiments described here serve only to explain the present invention and do not limit it; all other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention. Note also that, for ease of description, the drawings show only the parts related to the present invention rather than the full content.
Embodiment one
Refer to Fig. 1a, a schematic flowchart of a method for generating a feature database provided by Embodiment one of the present invention. The method of this embodiment can be performed by a feature database generating apparatus implemented in hardware and/or software, which can generally be integrated into a server that provides a feature record collection service.
The method comprises steps 110 to 170.
Step 110: determine the target element of set scale and the number of feature records in a target element of set according to a preset initial element of set scale and a preset sampling percentage.
Specifically, this can comprise the following two steps:
calculating the product of the preset initial element of set scale and the preset sampling percentage;
when the product of the initial element of set scale and the sampling percentage is less than 1, expanding the initial element of set scale until the product of the expanded element of set scale and the sampling percentage is greater than or equal to 1, taking the current expanded element of set scale as the target element of set scale, and taking the rounded product of the target element of set scale and the sampling percentage as the number of feature records in a target element of set.
Here, the element of set scale is the total number of target records that an element of set contains. The target records are stored in the database management system, and all target records in a specified table are divided into several elements of set. That is, the database management system stores many tables, each holding different records, and the division in this embodiment applies to all target records in one specified table.
A product of the target element of set scale and the preset sampling percentage that is greater than or equal to 1 means that at least one feature record can be obtained from an element of set of the target scale.
For example, suppose the preset initial element of set scale G0 is 1000 and the preset sampling percentage is 0.09%. The product of G0 and the sampling percentage is 0.9, which is less than 1, so G0 is expanded 10 times and the current element of set scale G becomes 10000. The product of G and the sampling percentage is then 9, which is greater than 1, so the expanded scale G = 10000 is taken as the target element of set scale, and the rounded product 9 is taken as the number of feature records in a target element of set. In other words, 9 of every 10000 target records should be randomly collected as feature records: in each element of set of scale 10000, 9 target records are randomly collected as the feature records of that element of set.
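The calculation in step 110 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `target_scale_and_count`, the expansion factor of 10 (taken from the example; the text only says the scale is expanded), and the use of Python's built-in `round` for the rounding step are all assumptions.

```python
def target_scale_and_count(initial_scale: int, sample_fraction: float,
                           factor: int = 10) -> tuple[int, int]:
    """Return (target scale, feature records per element of set).

    sample_fraction is the sampling percentage as a fraction,
    e.g. 0.09% is passed as 0.0009. The factor of 10 is an assumption.
    """
    if initial_scale <= 0 or not 0 < sample_fraction <= 1 or factor < 2:
        raise ValueError("invalid scale, sampling fraction, or factor")

    scale = initial_scale
    # Expand the scale until at least one record per element can be sampled.
    while scale * sample_fraction < 1:
        scale *= factor

    # Rounding mode is an assumption; max() guards against rounding to 0.
    count = max(1, round(scale * sample_fraction))
    return scale, count
```

With the values from the example, `target_scale_and_count(1000, 0.0009)` gives a target scale of 10000 and 9 feature records per element of set.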
Step 120: use a preset random seed to randomly generate as many random numbers as there are feature records in the target element of set, and save the generated random numbers as an initial random step-length array, where each random number lies between 0 and the target element of set scale.
In other words, if the number of feature records in a target element of set is N, the random seed is used to generate N random numbers, each between 0 and the target element of set scale G, and the N random numbers are saved as the initial random step-length array. The number N of feature records in a target element of set is therefore the capacity of the initial random step-length array.
In this embodiment, the random seed is set so that the initial random step-length array is relatively controllable.
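One way to read "relatively controllable" is that a fixed seed makes the array reproducible. The sketch below illustrates step 120 under that reading. The function name `initial_step_array` is hypothetical, and treating both bounds (0 and G) as inclusive is an assumption, since the text only says the values lie between 0 and the target scale.

```python
import random


def initial_step_array(count: int, target_scale: int, seed: int) -> list[int]:
    """Generate `count` integer step lengths in [0, target_scale]."""
    rng = random.Random(seed)  # private generator, so the seed fully fixes the output
    return [rng.randint(0, target_scale) for _ in range(count)]
```

Calling the function twice with the same seed returns the same array, so the sampling pattern can be regenerated on demand.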
Step 130: calculate the sum of the random numbers in the initial random step-length array.
Step 140: when the sum of the random numbers in the initial random step-length array equals the target element of set scale, take the initial random step-length array as the target random step-length array.
Step 150: divide all target records according to the target element of set scale.
Step 160: for each element of set obtained by the division, use the target random step-length array to obtain the corresponding feature records in that element of set, as the sample library of that element of set.
Specifically, if the total number of target records is divisible by the target element of set scale, the division yields as many elements of set as the quotient of the total number of target records divided by the target element of set scale, each with the target element of set scale. For example, suppose the database management system stores 30000 target records in total, the determined target element of set scale G is 10000, and the preset sampling percentage is 0.09%; the division then yields 3 elements of set, each of scale 10000.
If the total number of target records is not divisible by the target element of set scale, the division yields as many elements of set of the target scale as the integer quotient, plus one element of set whose scale is the remainder. For example, suppose the database management system stores 34000 target records in total, the determined target element of set scale G is 10000, and the preset sampling percentage is 0.09%; the division then yields 4 elements of set, where the first 3 have scale 10000 and the 4th has scale 4000.
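The two cases above reduce to one integer division with remainder. A minimal sketch, with the hypothetical function name `element_scales`:

```python
def element_scales(total_records: int, target_scale: int) -> list[int]:
    """Return the scale of each element of set, in storage order."""
    if total_records < 0 or target_scale <= 0:
        raise ValueError("invalid record count or scale")
    full, remainder = divmod(total_records, target_scale)
    # `full` elements of the target scale, plus one smaller element if needed.
    return [target_scale] * full + ([remainder] if remainder else [])
```

For the examples in the text, 30000 records give three elements of scale 10000, and 34000 records give three elements of scale 10000 followed by one of scale 4000.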
Note that the advantage of using elements of set is that the number of feature records collected each time can be controlled, and the target random step-length array needs to be generated only once to be used for all elements of set. This reduces the performance cost of collecting feature records in the database management system and reduces the cost for the CBO of analyzing the collected feature records.
Correspondingly, using the target random step-length array to obtain the corresponding feature records in an element of set can specifically comprise the following.
For each element of set whose scale equals the target element of set scale: obtain the first feature record in that element of set according to the first random number in the target random step-length array; then, according to the i-th random number in the target random step-length array, obtain the record whose offset relative to the (i-1)-th feature record equals the i-th random number, as the i-th feature record, where i ≥ 2.
For an element of set whose scale is less than the target element of set scale: obtain the first feature record in that element of set according to the first random number in the target random step-length array; then, according to the i-th random number in the target random step-length array, obtain the record whose offset relative to the (i-1)-th feature record equals the i-th random number, as the i-th feature record, where i ≥ 2. When the sum of the first i+1 random numbers in the target random step-length array exceeds the scale of that element of set, the operation of obtaining feature records stops.
In this embodiment, each random number in the target random step-length array is used as a random step length, that is, the relative offset from the previous feature record to the current feature record. Each random step length (each random number in the target random step-length array) should be an integer greater than or equal to 0.
Refer to Fig. 1b, a schematic diagram of obtaining feature records in an element of set using the target random step-length array, provided by Embodiment one of the present invention. The random numbers in this target random step-length array are 0, 3, 2, 5, 3, 6, 3. According to the first random number, 0, the first feature record is obtained in the element of set shown in Fig. 1b (the first solid black mark from the left in Fig. 1b). According to the 2nd random number, the record whose offset relative to the 1st feature record is 3 is obtained as the 2nd feature record (the second solid black mark from the left in Fig. 1b). According to the 3rd random number, the record whose offset relative to the 2nd feature record is 2 is obtained as the 3rd feature record (the third solid black mark from the left in Fig. 1b). The remaining feature records in the element of set are obtained in the same way, which yields the sample library of that element of set.
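The walk through an element of set can be sketched as a running sum of the step lengths. The sketch makes two assumptions that the text does not settle: record positions are 0-based offsets from the start of the element, and sampling stops as soon as the running sum reaches or passes the end of the element (the text says "exceeds the scale"). The function name `sample_offsets` is hypothetical.

```python
def sample_offsets(steps: list[int], element_scale: int) -> list[int]:
    """Return 0-based offsets of the feature records inside one element of set."""
    offsets = []
    position = 0
    for step in steps:
        position += step  # offset relative to the previous feature record
        if position >= element_scale:
            break  # ran past the last record of a (smaller) element
        offsets.append(position)
    return offsets
```

For the array in the example (0, 3, 2, 5, 3, 6, 3), an element of 100 records yields offsets 0, 3, 5, 10, 13, 19, 22, while an element of only 12 records yields 0, 3, 5, 10 and then stops.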
Step 170: take the union of the sample libraries of all elements of set as the feature database of all target records.
In the technical solution of this embodiment, a target element of set scale is determined, and all target records in a specified table stored in the database management system are divided by that scale into elements of set. The number of feature records in a target element of set is determined and used as the capacity of the initial random step-length array, whose random numbers are generated from a random seed. The sum of the random numbers in the initial random step-length array is checked against the target element of set scale, and an initial random step-length array that passes this check is taken as the target random step-length array. The target random step-length array controls how many feature records are collected in each element of set, and it needs to be generated only once to be used for all elements of set. This reduces the performance cost of collecting feature records in the database management system, reduces the cost for the CBO of analyzing the collected feature records, and improves the analysis efficiency of the CBO. In addition, because the sum of the random numbers in the target random step-length array equals the target element of set scale, the sampling range of the feature records obtained in each element of set covers that element of set, so that both the randomness of sampling and the accuracy of the sampling rate are satisfied.
Embodiment two
Refer to Fig. 2a, a schematic flowchart of a method for generating a feature database provided by Embodiment two of the present invention. The method comprises steps 210 to 290.
Step 210: determine the target element of set scale and the number of feature records in a target element of set according to a preset initial element of set scale and a preset sampling percentage.
The specific operations of step 110 in Embodiment one apply equally to this step and are not repeated here.
Step 220: use a preset random seed to randomly generate as many random numbers as there are feature records in the target element of set, and save the generated random numbers as an initial random step-length array, where each random number lies between 0 and the target element of set scale.
Step 230: calculate the sum of the random numbers in the initial random step-length array.
Step 240: judge whether the sum of the random numbers in the initial random step-length array equals the target element of set scale; if so, perform step 250; if not, perform step 260.
Step 250: take the initial random step-length array as the target random step-length array, and continue with step 270.
Step 260: correct the random numbers in the initial random step-length array at most twice to obtain the target random step-length array, where the error between the sum of the random numbers in the target random step-length array and the target element of set scale satisfies a preset error rate; continue with step 270.
Note that, for each element of set obtained by the division, the sample library of the element of set should range over the whole element of set, so that the sampling range of the feature records covers the whole element. Ideally, the sum of the random numbers in the target random step-length array equals the target element of set scale.
However, because the numbers are random, there is no guarantee that the sum of the random numbers in the generated initial random step-length array equals the target element of set scale. The random numbers in the initial random step-length array therefore need to be corrected at most twice, so that the error between the sum of the random numbers in the resulting target random step-length array and the target element of set scale satisfies the preset error rate.
The first correction method and the second correction method are introduced in turn below.
Refer to Fig. 2b, a schematic flowchart of obtaining the target random step-length array through one correction, provided by Embodiment two of the present invention. It comprises steps 261 to 263.
Step 261: scale every random number in the initial random step-length array by the same factor and round the results, obtaining the first-corrected random step-length array, where the scaling factor is the ratio of the target element of set scale to the sum of the random numbers in the initial random step-length array.
Step 262: judge whether the error between the sum of the random numbers in the first-corrected random step-length array and the target element of set scale satisfies the preset error rate; if so, perform step 263.
Step 263: take the first-corrected random step-length array as the target random step-length array.
In other words, when the sum of the random numbers in the initial random step-length array does not equal the target element of set scale, a first correction is applied to every random number in the initial random step-length array: each random number is scaled by the ratio of the target element of set scale G to the sum S of the random numbers in the initial random step-length array (that is, each random number is multiplied by G ÷ S) and then rounded.
In this approach, when the sum of the random numbers in the initial random step-length array does not equal the target element of set scale, all random numbers in the initial random step-length array are corrected once, so that the error between the sum of the random numbers in the first-corrected random step-length array and the target element of set scale satisfies the preset error rate, which yields the target random step-length array. Because that error satisfies the preset error rate, the sampling range of the feature records obtained in each element of set covers that element of set, so that both the randomness of sampling and the accuracy of the sampling rate are satisfied.
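The first correction and the check that follows it can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, Python's `round` stands in for the unspecified rounding rule, and the "error" is taken to be the relative deviation |sum − G| ÷ G, which matches the bounds used in the second correction but is not spelled out in the text.

```python
def first_correction(steps: list[int], target_scale: int) -> list[int]:
    """Scale every step by G / S and round, where S is the current sum."""
    total = sum(steps)
    if total == 0:
        raise ValueError("cannot scale an array whose sum is 0")
    ratio = target_scale / total
    return [round(step * ratio) for step in steps]


def within_error(steps: list[int], target_scale: int, error_rate: float) -> bool:
    """Check whether the sum deviates from G by at most G * error_rate."""
    return abs(sum(steps) - target_scale) <= target_scale * error_rate
```

Because each element is rounded independently, the corrected sum can still miss G by a few units, which is why a check and a possible second correction follow.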
Refer to Fig. 2c, a schematic flowchart of obtaining the target random step-length array through two corrections, provided by Embodiment two of the present invention. It comprises steps 261 to 267.
Step 261: scale every random number in the initial random step-length array by the same factor and round the results, obtaining the first-corrected random step-length array, where the scaling factor is the ratio of the target element of set scale to the sum of the random numbers in the initial random step-length array.
Step 262: judge whether the error between the sum of the random numbers in the first-corrected random step-length array and the target element of set scale satisfies the preset error rate; if so, perform step 263; if not, perform step 264.
Step 263: take the first-corrected random step-length array as the target random step-length array.
In other words, when the sum of the random numbers in the initial random step-length array does not equal the target element of set scale, a first correction is applied to every random number in the initial random step-length array. If the error between the sum of the random numbers in the first-corrected random step-length array and the target element of set scale satisfies the preset error rate, the target random step-length array is obtained with one correction. Otherwise, a second correction is applied to the first-corrected random step-length array, so that the error between the sum of the random numbers in the second-corrected random step-length array and the target element of set scale satisfies the preset error rate, which yields the target random step-length array. The second correction method comprises steps 264 to 267.
Step 264, according to described target element of set scale and described default error rate, determine lower limit and the higher limit of element of set coverage, continue to perform step 265.
Particularly, according to described target element of set scale G and described default error rate, determine lower limit L and the higher limit U of element of set coverage, wherein, L=G-G × preset error rate, U=G+G × preset error rate.
Step 265, in the lower limit of described element of set coverage between higher limit, random selecting one value is as corrected parameter, determine described corrected parameter and described first time the deviation of each random number sum that comprises of revised arbitrary width array, continue to perform step 266.
Also, namely, in scope [L, U], W is as corrected parameter for Stochastic choice one value, and determines the deviation D between W and first time each random number sum S of comprising of revised arbitrary width array.
Step 266, the random number that random selecting number is identical with described deviation in described first time revised arbitrary width array, continue to perform step 267.
Specifically, first time revised arbitrary width array in a random selecting D random number, as second time correction correction object.
Step 267: if the correction parameter is greater than the sum of the random numbers in the first-corrected random step-length array, add 1 to each of the randomly selected random numbers, whose count equals the deviation, to obtain the second-corrected random step-length array, and take it as the target random step-length array;
If the correction parameter is less than the sum of the random numbers in the first-corrected random step-length array, subtract 1 from each of the randomly selected random numbers, whose count equals the deviation, to obtain the second-corrected random step-length array, and take it as the target random step-length array.
That is, if W > S, where S is the sum of the random numbers in the first-corrected random step-length array, each of the D random numbers randomly selected from that array is increased by 1; if W < S, each of the D randomly selected random numbers is decreased by 1.
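Steps 264 to 267 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function name `second_correction` and its parameters are hypothetical, W is assumed to be an integer drawn uniformly from [L, U], and the D random numbers are assumed to be picked without repetition, so the sketch rejects the case where D exceeds the array length (the text does not say how that case is handled).

```python
import math
import random


def second_correction(steps, target_scale, error_rate, rng):
    """Second correction of a first-corrected step-length array (steps 264-267)."""
    # Step 264: L = G - G * e and U = G + G * e, narrowed to integers (assumption).
    lower = math.ceil(target_scale - target_scale * error_rate)
    upper = math.floor(target_scale + target_scale * error_rate)
    # Step 265: pick the correction parameter W in [L, U] and compute D = |W - S|.
    w = rng.randint(lower, upper)
    s = sum(steps)
    d = abs(w - s)
    if d > len(steps):
        # Not covered by the source text; this sketch simply refuses.
        raise ValueError("deviation exceeds the number of steps")
    # Step 266: pick D distinct positions at random.
    picked = rng.sample(range(len(steps)), d)
    # Step 267: add 1 to each picked step if W > S, otherwise subtract 1.
    delta = 1 if w > s else -1
    corrected = list(steps)
    for i in picked:
        corrected[i] += delta
    return corrected
```

After the call, the sum of the returned array equals W and therefore lies in [L, U]. Note that subtracting 1 can turn a step of 0 into a negative value; the source text does not address that case either.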
In this manner, when the sum of the random numbers in the initial random step-length array is inconsistent with the target set-element scale, a first correction is applied to all random numbers in the initial random step-length array. If the error between the sum of the random numbers in the first-corrected random step-length array and the target set-element scale satisfies the preset error rate, the target random step-length array is obtained with a single correction. If the error still does not satisfy the preset error rate, a second correction is applied to obtain the target random step-length array. Because the error between the sum of the random numbers in the target random step-length array and the target set-element scale satisfies the preset error rate, the sampling range of the feature records obtained in each set element produced by the division covers that set element, and both the randomness of sampling and the accuracy of the sampling rate are satisfied.
It should be noted that the first correction applies to all random numbers in the initial random step-length array, whereas the second correction only modifies a randomly selected subset of the random numbers in the first-corrected random step-length array and does not involve all of them.
Step 270: divide all target records according to the target set-element scale.
Step 280: for each set element obtained by the division, use the target random step-length array to obtain the corresponding feature records in that set element, as the sample library corresponding to that set element.
The operation described in Embodiment One for obtaining the corresponding feature records in a set element by using the target random step-length array applies equally to this step and is not repeated here.
Step 290: determine the union of the sample libraries corresponding to the set elements as the feature library of all the target records.
In the technical solution of this embodiment, the target set-element scale is determined and used to divide all target records in a specified table stored by the database management system (DBMS) into set elements. The number of feature records in a target set element is determined, the random seed is used to generate the corresponding random numbers, and the number of feature records in the target set element is taken as the capacity of the initial random step-length array, which yields the initial random step-length array. When the sum of the random numbers in the initial random step-length array is inconsistent with the target set-element scale, the target random step-length array is obtained with at most two corrections. The target random step-length array controls the number of feature records collected in each set element, and it needs to be generated only once to be used for all set elements. This reduces the performance overhead of feature-record collection by the DBMS, reduces the cost for the CBO of analyzing the collected feature records, and improves the analysis efficiency of the CBO. In addition, because the error between the sum of the random numbers in the target random step-length array and the target set-element scale satisfies the preset error rate, the sampling range of the feature records obtained in each set element produced by the division covers that set element, and both the randomness of sampling and the accuracy of the sampling rate are satisfied.
Embodiment three
Refer to Fig. 3, which is a schematic structural diagram of a device for generating a feature library according to Embodiment Three of the present invention. The device comprises: a parameter configuration module 310, an initial random step-length array generation module 320, a target random step-length array generation module 330, and a feature library generation module 340.
The parameter configuration module 310 is configured to determine the target set-element scale and the number of feature records in a target set element according to a preset initial set-element scale and a sample percentage. The initial random step-length array generation module 320 is configured to randomly generate, by using a preset random seed, as many random numbers as there are feature records in the target set element, and to save the generated random numbers as the initial random step-length array, where each random number takes a value between 0 and the target set-element scale. The target random step-length array generation module 330 is configured to calculate the sum of the random numbers in the initial random step-length array and, when that sum is consistent with the target set-element scale, to take the initial random step-length array as the target random step-length array. The feature library generation module 340 is configured to divide all target records according to the target set-element scale; for each set element obtained by the division, to obtain the corresponding feature records in that set element by using the target random step-length array, as the sample library corresponding to that set element; and to determine the union of the sample libraries corresponding to the set elements as the feature library of all the target records.
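The behaviour attributed to modules 320 and 330 (seeded generation of the initial random step-length array, followed by the consistency check on its sum) might look like the following Python sketch. The function names are hypothetical, the generator is Python's standard `random.Random`, and treating the range "between 0 and the target set-element scale" as inclusive at both ends is an assumption; the source does not specify the generator or the inclusivity.

```python
import random


def initial_step_array(seed, feature_count, target_scale):
    """Generate `feature_count` random steps in [0, target_scale] from a fixed seed."""
    rng = random.Random(seed)  # preset random seed makes the array reproducible
    return [rng.randint(0, target_scale) for _ in range(feature_count)]


def is_consistent(steps, target_scale):
    """True when the steps sum exactly to the target set-element scale."""
    return sum(steps) == target_scale
```

Because the seed is fixed, the same array is produced on every run, which is what allows one array to be generated once and reused for all set elements. When `is_consistent` returns false, the corrections described in the text apply.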
In the technical solution of this embodiment, the target set-element scale is determined and used to divide all target records in a specified table stored by the database management system (DBMS) into set elements. The number of feature records in a target set element is determined, the random seed is used to generate the corresponding random numbers, and the number of feature records in the target set element is taken as the capacity of the initial random step-length array, which yields the initial random step-length array. The sum of the random numbers in the initial random step-length array is checked for consistency with the target set-element scale, and an initial random step-length array that passes the check is taken as the target random step-length array. The target random step-length array controls the number of feature records collected in each set element, and it needs to be generated only once to be used for all set elements. This reduces the performance overhead of feature-record collection by the DBMS, reduces the cost for the CBO of analyzing the collected feature records, and improves the analysis efficiency of the CBO. In addition, because the sum of the random numbers in the target random step-length array is consistent with the target set-element scale, the sampling range of the feature records obtained in each set element produced by the division covers that set element, and both the randomness of sampling and the accuracy of the sampling rate are satisfied.
In the above solution, the target random step-length array generation module 330 may further be configured to, after calculating the sum of the random numbers in the initial random step-length array and before all target records are divided according to the target set-element scale, apply at most two corrections to the random numbers in the initial random step-length array when their sum is inconsistent with the target set-element scale, so as to obtain the target random step-length array, where the error between the sum of the random numbers in the target random step-length array and the target set-element scale satisfies the preset error rate.
Further, the target random step-length array generation module 330 may preferably comprise: a calculation submodule, a first correction submodule, and a target random step-length array generation submodule.
The calculation submodule is configured to calculate the sum of the random numbers in the initial random step-length array. The first correction submodule is configured to, after that sum is calculated and before all target records are divided according to the target set-element scale, scale every random number in the initial random step-length array by the same factor and round the result when the sum is inconsistent with the target set-element scale, so as to obtain the first-corrected random step-length array, where the scaling factor is the ratio of the target set-element scale to the sum of the random numbers in the initial random step-length array. The target random step-length array generation submodule is configured to take the initial random step-length array as the target random step-length array when the sum of its random numbers is consistent with the target set-element scale, or to take the first-corrected random step-length array as the target random step-length array when the error between the sum of its random numbers and the target set-element scale satisfies the preset error rate.
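As a rough illustration of the first correction and of the error check that follows it, consider the Python sketch below. The helper names are hypothetical, and two details are assumptions: Python's built-in `round` (which rounds halves to the nearest even integer) stands in for the unspecified rounding rule, and "satisfies the preset error rate" is read as |S - G| <= G × (error rate), in line with the bounds L and U given for the second correction.

```python
def first_correction(steps, target_scale):
    """Scale every step by G / S and round, where S is the current sum."""
    total = sum(steps)
    if total == 0:
        raise ValueError("cannot rescale an all-zero step array")
    factor = target_scale / total  # scaling factor = target scale / current sum
    return [round(x * factor) for x in steps]


def within_error(steps, target_scale, error_rate):
    """Check |S - G| <= G * error_rate for the (corrected) array."""
    return abs(sum(steps) - target_scale) <= target_scale * error_rate
```

For example, with G = 100 the array [40, 70, 90] (sum 200) is scaled by 0.5 to [20, 35, 45], whose sum is exactly 100. Rounding can leave a residual error, as with [3, 3, 3] and G = 10, which is the situation the second correction is meant to handle.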
Further, the target random step-length array generation module 330 may also comprise a second correction submodule, configured to, after the first-corrected random step-length array is obtained and when the error between the sum of its random numbers and the target set-element scale does not satisfy the preset error rate: determine the lower limit and the upper limit of the set-element coverage range according to the target set-element scale and the preset error rate; randomly select a value between the lower limit and the upper limit of the set-element coverage range as the correction parameter, and determine the deviation between the correction parameter and the sum of the random numbers in the first-corrected random step-length array; randomly select, from the first-corrected random step-length array, as many random numbers as the deviation; if the correction parameter is greater than the sum of the random numbers in the first-corrected random step-length array, add 1 to each of the randomly selected random numbers to obtain the second-corrected random step-length array, and take it as the target random step-length array; and if the correction parameter is less than that sum, subtract 1 from each of the randomly selected random numbers to obtain the second-corrected random step-length array, and take it as the target random step-length array.
In the above solution, the parameter configuration module 310 may specifically be configured to:
Calculate the product of the preset initial set-element scale and the preset sample percentage;
When the product of the initial set-element scale and the sample percentage is less than 1, expand the initial set-element scale until the product of the expanded set-element scale and the sample percentage is greater than or equal to 1, take the current expanded set-element scale as the target set-element scale, and take the rounded product of the target set-element scale and the sample percentage as the number of feature records in a target set element.
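The parameter configuration described above can be illustrated with a short Python sketch. The function name `configure` is hypothetical, the sample percentage is passed as a fraction (0.05 for 5%), and doubling is assumed as the way the scale is "expanded"; this passage does not state the expansion rule, so any monotonically growing rule could be substituted.

```python
def configure(initial_scale, sample_fraction):
    """Return (target_scale, feature_count) for one set element."""
    if initial_scale <= 0 or not 0 < sample_fraction <= 1:
        raise ValueError("scale must be positive and fraction in (0, 1]")
    scale = initial_scale
    # Expand until at least one feature record fits in a set element.
    while scale * sample_fraction < 1:
        scale *= 2  # assumption: the expansion rule is doubling
    return scale, round(scale * sample_fraction)
```

With an initial scale of 100 and a 5% sample the product is already 5, so the scale is left unchanged. With an initial scale of 10 and a 1% sample the scale grows to 160 before the product (1.6) reaches 1, giving 2 feature records per set element.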
In the above solution, if the total number of target records is exactly divisible by the target set-element scale, the set elements obtained by the division comprise: set elements whose scale is the target set-element scale and whose number is the quotient of the total number of target records divided by the target set-element scale;
If the total number of target records is not exactly divisible by the target set-element scale, the set elements obtained by the division comprise: set elements whose scale is the target set-element scale and whose number is the integer quotient of the total number of target records divided by the target set-element scale, and one set element whose scale is the remainder of that division.
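The two division cases reduce to an integer quotient and a remainder. A minimal Python sketch, with the hypothetical helper name `partition_sizes`:

```python
def partition_sizes(total_records, target_scale):
    """Sizes of the set elements obtained by dividing all target records."""
    full, rest = divmod(total_records, target_scale)
    sizes = [target_scale] * full  # `full` set elements of the target scale
    if rest:
        sizes.append(rest)         # one smaller set element holding the remainder
    return sizes
```

For 1050 records and a target scale of 100, this gives ten set elements of 100 records and one of 50.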
Accordingly, the feature library generation module 340 is specifically configured to:
For each set element whose scale is the target set-element scale, obtain the corresponding first feature record in the set element according to the first random number in the target random step-length array; and, according to the i-th random number in the target random step-length array, obtain the record in the set element whose offset relative to the (i-1)-th feature record equals the i-th random number, as the i-th feature record, where i ≥ 2;
For a set element whose scale is smaller than the target set-element scale, obtain the corresponding first feature record in the set element according to the first random number in the target random step-length array; and, according to the i-th random number in the target random step-length array, obtain the record in the set element whose offset relative to the (i-1)-th feature record equals the i-th random number, as the i-th feature record, where i ≥ 2, and stop obtaining feature records once the sum of the first i+1 random numbers in the target random step-length array is greater than the scale of the set element.
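The offset-based sampling and the final union can be sketched in Python as follows. Several details here are assumptions rather than statements from the text: the first random number is read as a 1-based position counted from the start of the set element, so that the running sum of the steps is the position of each feature record; the records are held in an in-memory list; and the stop rule for the short set element is applied uniformly to every set element, which also covers a target array whose sum slightly exceeds the scale.

```python
from itertools import accumulate


def sample_positions(steps, size):
    """1-based positions of the feature records inside one set element."""
    positions = []
    for pos in accumulate(steps):  # running sum = offset from the element start
        if pos > size:             # stepped past the end of the set element
            break
        if pos >= 1:               # a leading step of 0 selects nothing
            positions.append(pos)
    return positions


def build_feature_library(records, steps, target_scale):
    """Apply the same step array to every set element and merge the samples."""
    library = []
    for base in range(0, len(records), target_scale):
        element = records[base:base + target_scale]
        library.extend(element[p - 1] for p in sample_positions(steps, len(element)))
    return library
```

With steps [3, 4, 3], a target scale of 10 and 25 records numbered 0 to 24, the two full set elements each contribute three records, and the last set element (5 records) contributes one, because the second running sum (7) exceeds its size.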
The device for generating a feature library provided by the embodiments of the present invention can perform the method for generating a feature library provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the performed method.
Obviously, those skilled in the art will understand that the modules or steps of the present invention described above can be implemented by the server and client described above. Optionally, the embodiments of the present invention can be implemented as a program executable by a computer device, so that the program can be stored in a storage device and executed by a processor. The program can be stored in a computer-readable storage medium, such as a read-only memory (ROM), a magnetic disk, or an optical disc. Alternatively, the modules or steps can each be made into an individual integrated circuit module, or several of them can be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit them. The preferred embodiments are likewise not limiting; for those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for generating a feature library, characterized by comprising:
Determining a target set-element scale and the number of feature records in a target set element according to a preset initial set-element scale and a sample percentage;
Randomly generating, by using a preset random seed, as many random numbers as there are feature records in the target set element, and saving the generated random numbers as an initial random step-length array, wherein each random number takes a value between 0 and the target set-element scale;
Calculating the sum of the random numbers in the initial random step-length array;
When the sum of the random numbers in the initial random step-length array is consistent with the target set-element scale, taking the initial random step-length array as a target random step-length array;
Dividing all target records according to the target set-element scale;
For each set element obtained by the division, obtaining the corresponding feature records in the set element by using the target random step-length array, as the sample library corresponding to the set element;
Determining the union of the sample libraries corresponding to the set elements as the feature library of all the target records.
2. The method according to claim 1, characterized in that, after calculating the sum of the random numbers in the initial random step-length array and before dividing all target records according to the target set-element scale, the method further comprises:
When the sum of the random numbers in the initial random step-length array is inconsistent with the target set-element scale, applying at most two corrections to the random numbers in the initial random step-length array to obtain the target random step-length array, wherein the error between the sum of the random numbers in the target random step-length array and the target set-element scale satisfies a preset error rate.
3. The method according to claim 2, characterized in that applying at most two corrections to the random numbers in the initial random step-length array to obtain the target random step-length array comprises:
Scaling every random number in the initial random step-length array by the same factor and rounding the result to obtain a first-corrected random step-length array, wherein the scaling factor is the ratio of the target set-element scale to the sum of the random numbers in the initial random step-length array;
When the error between the sum of the random numbers in the first-corrected random step-length array and the target set-element scale satisfies the preset error rate, taking the first-corrected random step-length array as the target random step-length array.
4. The method according to claim 3, characterized in that, after the first-corrected random step-length array is obtained, the method further comprises:
When the error between the sum of the random numbers in the first-corrected random step-length array and the target set-element scale does not satisfy the preset error rate, determining the lower limit and the upper limit of the set-element coverage range according to the target set-element scale and the preset error rate;
Randomly selecting a value between the lower limit and the upper limit of the set-element coverage range as a correction parameter, and determining the deviation between the correction parameter and the sum of the random numbers in the first-corrected random step-length array;
Randomly selecting, from the first-corrected random step-length array, as many random numbers as the deviation;
If the correction parameter is greater than the sum of the random numbers in the first-corrected random step-length array, adding 1 to each of the randomly selected random numbers to obtain a second-corrected random step-length array, and taking it as the target random step-length array;
If the correction parameter is less than the sum of the random numbers in the first-corrected random step-length array, subtracting 1 from each of the randomly selected random numbers to obtain a second-corrected random step-length array, and taking it as the target random step-length array.
5. The method according to any one of claims 1 to 4, characterized in that determining the target set-element scale and the number of feature records in a target set element according to the preset initial set-element scale and the sample percentage comprises:
Calculating the product of the preset initial set-element scale and the preset sample percentage;
When the product of the initial set-element scale and the sample percentage is less than 1, expanding the initial set-element scale until the product of the expanded set-element scale and the sample percentage is greater than or equal to 1, taking the current expanded set-element scale as the target set-element scale, and taking the rounded product of the target set-element scale and the sample percentage as the number of feature records in a target set element.
6. The method according to any one of claims 1 to 4, characterized in that:
If the total number of target records is exactly divisible by the target set-element scale, the set elements obtained by the division comprise: set elements whose scale is the target set-element scale and whose number is the quotient of the total number of target records divided by the target set-element scale;
If the total number of target records is not exactly divisible by the target set-element scale, the set elements obtained by the division comprise: set elements whose scale is the target set-element scale and whose number is the integer quotient of the total number of target records divided by the target set-element scale, and one set element whose scale is the remainder of that division.
7. The method according to claim 6, characterized in that obtaining the corresponding feature records in the set element by using the target random step-length array comprises:
For each set element whose scale is the target set-element scale, obtaining the corresponding first feature record in the set element according to the first random number in the target random step-length array; and, according to the i-th random number in the target random step-length array, obtaining the record in the set element whose offset relative to the (i-1)-th feature record equals the i-th random number, as the i-th feature record, wherein i ≥ 2;
For a set element whose scale is smaller than the target set-element scale, obtaining the corresponding first feature record in the set element according to the first random number in the target random step-length array; and, according to the i-th random number in the target random step-length array, obtaining the record in the set element whose offset relative to the (i-1)-th feature record equals the i-th random number, as the i-th feature record, wherein i ≥ 2, and stopping the obtaining of feature records once the sum of the first i+1 random numbers in the target random step-length array is greater than the scale of the set element.
8. A device for generating a feature library, characterized by comprising:
A parameter configuration module, configured to determine a target set-element scale and the number of feature records in a target set element according to a preset initial set-element scale and a sample percentage;
An initial random step-length array generation module, configured to randomly generate, by using a preset random seed, as many random numbers as there are feature records in the target set element, and to save the generated random numbers as an initial random step-length array, wherein each random number takes a value between 0 and the target set-element scale;
A target random step-length array generation module, configured to calculate the sum of the random numbers in the initial random step-length array, and, when that sum is consistent with the target set-element scale, to take the initial random step-length array as a target random step-length array;
A feature library generation module, configured to divide all target records according to the target set-element scale; for each set element obtained by the division, to obtain the corresponding feature records in the set element by using the target random step-length array, as the sample library corresponding to the set element; and to determine the union of the sample libraries corresponding to the set elements as the feature library of all the target records.
9. The device according to claim 8, characterized in that:
The target random step-length array generation module is further configured to, after calculating the sum of the random numbers in the initial random step-length array and before all target records are divided according to the target set-element scale, apply at most two corrections to the random numbers in the initial random step-length array when their sum is inconsistent with the target set-element scale, so as to obtain the target random step-length array, wherein the error between the sum of the random numbers in the target random step-length array and the target set-element scale satisfies a preset error rate.
10. The device according to claim 9, characterized in that the target random step-length array generation module comprises:
A calculation submodule, configured to calculate the sum of the random numbers in the initial random step-length array;
A first correction submodule, configured to, after the sum of the random numbers in the initial random step-length array is calculated and before all target records are divided according to the target set-element scale, scale every random number in the initial random step-length array by the same factor and round the result when the sum is inconsistent with the target set-element scale, so as to obtain a first-corrected random step-length array, wherein the scaling factor is the ratio of the target set-element scale to the sum of the random numbers in the initial random step-length array;
A target random step-length array generation submodule, configured to take the initial random step-length array as the target random step-length array when the sum of its random numbers is consistent with the target set-element scale, or to take the first-corrected random step-length array as the target random step-length array when the error between the sum of its random numbers and the target set-element scale satisfies the preset error rate;
The target random step-length array generation module further comprises:
A second correction submodule, configured to, after the first-corrected random step-length array is obtained and when the error between the sum of its random numbers and the target set-element scale does not satisfy the preset error rate: determine the lower limit and the upper limit of the set-element coverage range according to the target set-element scale and the preset error rate; randomly select a value between the lower limit and the upper limit of the set-element coverage range as a correction parameter, and determine the deviation between the correction parameter and the sum of the random numbers in the first-corrected random step-length array; randomly select, from the first-corrected random step-length array, as many random numbers as the deviation; if the correction parameter is greater than the sum of the random numbers in the first-corrected random step-length array, add 1 to each of the randomly selected random numbers to obtain a second-corrected random step-length array, and take it as the target random step-length array; and if the correction parameter is less than that sum, subtract 1 from each of the randomly selected random numbers to obtain a second-corrected random step-length array, and take it as the target random step-length array.
CN201510173241.0A 2015-04-13 2015-04-13 The generation method and device of feature database Active CN104809175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510173241.0A CN104809175B (en) 2015-04-13 2015-04-13 The generation method and device of feature database

Publications (2)

Publication Number Publication Date
CN104809175A true CN104809175A (en) 2015-07-29
CN104809175B CN104809175B (en) 2018-02-27

Family

ID=53693997

Country Status (1)

Country Link
CN (1) CN104809175B (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
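Li Fu (李伏) et al.: "Query optimization for big data partitioning in a hybrid MapReduce environment", Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》) *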

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108279864A (en) * 2018-01-31 2018-07-13 上海集成电路研发中心有限公司 System random number generation method
CN112308330A (en) * 2020-11-09 2021-02-02 清华大学 Digital accident database construction method and device and computer equipment
CN112308330B (en) * 2020-11-09 2021-07-09 清华大学 Digital accident database construction method and device and computer equipment

Also Published As

Publication number Publication date
CN104809175B (en) 2018-02-27

Similar Documents

Publication Publication Date Title
CN102930062B (en) Method for rapid horizontal scaling of a database
CN105786808B (en) Method and apparatus for distributed execution of relational computing instructions
US10169412B2 (en) Selectivity estimation for query execution planning in a database
AU2015347304B2 (en) Testing insecure computing environments using random data sets generated from characterizations of real data sets
US20160246852A1 (en) Systems and Methods for Quantile Estimation in a Distributed Data System
US9652498B2 (en) Processing queries using hybrid access paths
US20090024607A1 (en) Query selection for effectively learning ranking functions
CN111512283B (en) Cardinality estimation in a database
US10726006B2 (en) Query optimization using propagated data distinctness
JP6694447B2 (en) Big data calculation method and system, program, and recording medium
US9152670B2 (en) Estimating number of iterations or self joins required to evaluate iterative or recursive database queries
CN108205571B (en) Key-value data table join method and device
US20120136879A1 (en) Systems and methods for filtering interpolated input data based on user-supplied or other approximation constraints
CN105447035A (en) Data scanning method and apparatus
US20150120271A1 (en) System and method for visualization and optimization of system of systems
CN104679858A (en) Method and device for inquiring data
CN104809175A (en) Generation method and device of feature library
US10482076B2 (en) Single level, multi-dimension, hash-based table partitioning
Mavrotas et al. AUGMECON2: A novel version of the ε-constraint method for finding the exact Pareto set in Multi-Objective Integer Programming problems
US20210124781A1 (en) Single view presentation of multiple queries in a data visualization application
CN105653355A (en) Method and system for calculating Hadoop configuration parameters
CN104361090A (en) Data query method and device
Shettar et al. A MapReduce framework to implement enhanced K-means algorithm
CN111090708B (en) User characteristic output method and system based on data warehouse
CN112667859A (en) Data processing method and device based on memory

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant