CN104809175B - Method and device for generating a feature database - Google Patents
- Publication number
- CN104809175B (application number CN201510173241.0A)
- Authority
- CN
- China
- Prior art keywords
- target
- random number
- arbitrary width
- record
- scale
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Complex Calculations (AREA)
Abstract
The present invention provides a method and device for generating a feature database. The method includes: determining a target element-of-set scale and the number of feature records in a target element of set; using a preset random seed to randomly generate as many random numbers as there are feature records in a target element of set, and saving them as an initial random step array; applying at most two corrections to the initial random step array to obtain a target random step array, which makes the correction efficient; dividing all target records according to the target element-of-set scale; obtaining the corresponding feature records in each element of set using the target random step array, as the sample library of that element of set; and taking the union of the sample libraries of all elements of set as the feature database of all target records. The target random step array needs to be generated only once and can then be used for all elements of set, which reduces the performance cost of collecting feature records in the database management system, improves CBO analysis efficiency, and satisfies both sampling randomness and sampling-rate accuracy.
Description
Technical field
Embodiments of the present invention relate to data sampling techniques for database management systems, and in particular to a method and device for generating a feature database.
Background
A database management system generates an execution plan for each SQL (Structured Query Language) statement entered by the user. Most database management systems introduce a cost-based optimizer (CBO): the database management system obtains all information related to the candidate execution plans, performs calculation and analysis on it, and selects the lowest-cost plan among all feasible execution plans as the final execution plan, thereby improving its execution efficiency. Sampling database records for calculation and analysis is the cornerstone of the CBO.
Calculating and analyzing all records would certainly improve the accuracy of the CBO, but for massive numbers of records the cost is too high and would instead reduce the execution efficiency of the database management system. How to randomly sample feature records from massive numbers of records and generate a feature database is therefore particularly important.
It is generally assumed that records in a database management system are stored contiguously. The sampling process for obtaining a feature database is generally: offset from the current record by a step A and take one feature record, then offset by A' relative to that feature record and take the next feature record; repeating this process finally yields the feature database.
Because data in a database are distributed non-uniformly, current database management system vendors mostly use the above random sampling method to obtain feature records and generate a feature database; however, currently published material does not address how to use random steps effectively to generate a feature database.
Summary of the invention
Embodiments of the present invention provide a method and device for generating a feature database, to optimize the way feature records are obtained.
In a first aspect, an embodiment of the present invention provides a method for generating a feature database, including:
determining, according to a preset initial element-of-set scale and a sampling percentage, a target element-of-set scale and the number of feature records in a target element of set;
randomly generating, using a preset random seed, as many random numbers as there are feature records in the target element of set, and saving the generated random numbers as an initial random step array, each random number ranging from 0 to the target element-of-set scale;
calculating the sum of the random numbers in the initial random step array;
when it is determined that the sum of the random numbers in the initial random step array is consistent with the target element-of-set scale, taking the initial random step array as the target random step array;
dividing all target records according to the target element-of-set scale;
for each element of set obtained by the division, obtaining the corresponding feature records in the element of set using the target random step array, as the sample library of that element of set;
determining the union of the sample libraries of all elements of set as the feature database of all target records.
In a second aspect, an embodiment of the present invention provides a device for generating a feature database, including:
a parameter configuration module, configured to determine, according to a preset initial element-of-set scale and sampling percentage, a target element-of-set scale and the number of feature records in a target element of set;
an initial random step array generation module, configured to randomly generate, using a preset random seed, as many random numbers as there are feature records in the target element of set, and save the generated random numbers as an initial random step array, each random number ranging from 0 to the target element-of-set scale;
a target random step array generation module, configured to calculate the sum of the random numbers in the initial random step array and, when the sum is determined to be consistent with the target element-of-set scale, take the initial random step array as the target random step array;
a feature database generation module, configured to divide all target records according to the target element-of-set scale; for each element of set obtained by the division, obtain the corresponding feature records in the element of set using the target random step array, as the sample library of that element of set; and determine the union of the sample libraries of all elements of set as the feature database of all target records.
In the feature database generation method and device provided by the embodiments of the present invention, a target element-of-set scale is determined and used to divide all target records in a specified table stored in the database management system into elements of set. The number of feature records in a target element of set is determined, the corresponding random numbers are generated from a random seed, and that number is used as the capacity of the initial random step array. The sum of the random numbers in the initial random step array is checked for consistency with the target element-of-set scale, and an array that passes the check is used as the target random step array. The target random step array controls how many feature records are collected in each element of set, and it needs to be generated only once to be usable for all elements of set. This reduces the performance cost of collecting feature records in the database management system and the cost of the CBO analyzing the collected feature records, and improves CBO analysis efficiency. In addition, because the sum of the random numbers in the target random step array is consistent with the target element-of-set scale, the sampling range of the feature records obtained in each element of set obtained by the division covers that whole element of set, so both sampling randomness and sampling-rate accuracy are satisfied.
Brief description of the drawings
To illustrate the present invention more clearly, the drawings used in the description are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1a is a schematic flowchart of a feature database generation method provided by Embodiment one of the present invention;
Fig. 1b is a schematic diagram of obtaining feature records in an element of set using the target random step array, provided by Embodiment one of the present invention;
Fig. 2a is a schematic flowchart of a feature database generation method provided by Embodiment two of the present invention;
Fig. 2b is a schematic flowchart of obtaining the target random step array through one correction, provided by Embodiment two of the present invention;
Fig. 2c is a schematic flowchart of obtaining the target random step array through a second correction, provided by Embodiment two of the present invention;
Fig. 3 is a schematic structural diagram of a feature database generation device provided by Embodiment three of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. It should be understood that the specific embodiments described here only explain the present invention and do not limit it; all other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than all of it.
Embodiment one
Referring to Fig. 1a, which is a schematic flowchart of the feature database generation method provided by Embodiment one of the present invention. The method of this embodiment may be performed by a feature database generation device implemented in hardware and/or software; the device may typically be integrated in a server capable of providing a feature record collection service.
The method includes steps 110 to 170.
Step 110: determine, according to a preset initial element-of-set scale and sampling percentage, the target element-of-set scale and the number of feature records in a target element of set.
Specifically, this may include the following two steps:
Calculate the product of the preset initial element-of-set scale and the preset sampling percentage;
When the product of the initial element-of-set scale and the sampling percentage is less than 1, enlarge the initial element-of-set scale until the product of the enlarged scale and the sampling percentage is greater than or equal to 1; take the current, enlarged element-of-set scale as the target element-of-set scale, and round the product of the target element-of-set scale and the sampling percentage to obtain the number of feature records in a target element of set.
Here, the element-of-set scale is the total number of target records contained in an element of set. The target records are stored in the database management system, and all target records in a specified table stored in the database management system are divided into several elements of set. That is, the database management system stores many tables, each holding different records, and the division in this embodiment applies to all target records in one specified table.
Requiring the product of the target element-of-set scale and the preset sampling percentage to be greater than or equal to 1 means that at least one feature record is obtained from each element of set of the target scale.
For example, assume the preset initial element-of-set scale G0 is 1000 and the preset sampling percentage is 0.09%. The product of G0 and the sampling percentage is 0.9, which is less than 1, so G0 is enlarged 10 times to a current element-of-set scale G of 10000. Recomputing, the product of G and the sampling percentage is 9, which is greater than 1, so the enlarged scale G = 10000 is taken as the target element-of-set scale, and the rounded product 9 of the target element-of-set scale 10000 and the sampling percentage 0.09% is the number of feature records in a target element of set. This means that 9 of every 10000 target records should be randomly collected as feature records; that is, 9 target records must be randomly collected as feature records from each target element of set of scale 10000.
Step 120: randomly generate, using a preset random seed, as many random numbers as there are feature records in the target element of set, and save the generated random numbers as the initial random step array, each random number ranging from 0 to the target element-of-set scale.
In other words, if the number of feature records in a target element of set is N, N random numbers are generated using the random seed, each ranging from 0 to the target element-of-set scale G, and the N random numbers are saved as the initial random step array; the number N of feature records in a target element of set is thus the capacity of the initial random step array.
In this embodiment, the random seed is set so that the initial random step array is relatively controllable: the same seed reproduces the same array.
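Step 120 can be sketched in Python with a private, seeded generator. This is an illustration under assumptions: the range [0, G] is taken as inclusive at both ends, and the function name `initial_step_array` and the seed value 2015 are hypothetical.

```python
import random


def initial_step_array(scale: int, count: int, seed: int) -> list[int]:
    """Step 120 sketch: `count` seeded random integers, each in [0, scale]."""
    rng = random.Random(seed)  # private generator; the global random state is untouched
    return [rng.randint(0, scale) for _ in range(count)]  # randint is inclusive


steps = initial_step_array(10000, 9, seed=2015)
print(steps, sum(steps))  # re-running with the same seed prints the same array
```

Since each value is drawn from [0, G], the expected sum of N values is about N·G/2, so with N > 2 the consistency check of steps 130 to 140 will rarely pass on its own.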
Step 130: calculate the sum of the random numbers in the initial random step array.
Step 140: when it is determined that the sum of the random numbers in the initial random step array is consistent with the target element-of-set scale, take the initial random step array as the target random step array.
Step 150: divide all target records according to the target element-of-set scale.
Step 160: for each element of set obtained by the division, obtain the corresponding feature records in the element of set using the target random step array, as the sample library of that element of set.
Specifically, if the total number of target records is divisible by the target element-of-set scale, the division yields a number of elements of set equal to the quotient of the total number of target records divided by the target element-of-set scale, each with a scale equal to the target element-of-set scale. For example, if the database management system stores 30000 target records in total, the target element-of-set scale G is 10000, and the preset sampling percentage is 0.09%, the division yields 3 elements of set, each of scale 10000.
If the total number of target records is not divisible by the target element-of-set scale, the division yields a number of elements of set equal to that quotient, each with a scale equal to the target element-of-set scale, plus one element of set whose scale is the remainder. For example, if the database management system stores 34000 target records in total, the target element-of-set scale G is 10000, and the preset sampling percentage is 0.09%, the division yields 4 elements of set: the first 3 have scale 10000 and the 4th has scale 4000.
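The division in step 150 can be sketched in Python as follows, assuming records are addressed by their 0-based position in the table (consistent with the contiguous-storage assumption in the background). The function name `partition` is hypothetical.

```python
def partition(total_records: int, scale: int) -> list[tuple[int, int]]:
    """Step 150 sketch: split records 0..total_records-1 into elements of set.

    Returns (start position, size) pairs; every element has `scale` records
    except possibly the last, which holds the remainder.
    """
    return [(start, min(scale, total_records - start))
            for start in range(0, total_records, scale)]


print(partition(34000, 10000))
# [(0, 10000), (10000, 10000), (20000, 10000), (30000, 4000)]
```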
Note that the advantage of using elements of set is that the number of feature records collected each time can be controlled, and the target random step array needs to be generated only once to be usable for all elements of set. This reduces the performance cost of collecting feature records in the database management system and the cost of the CBO analyzing the collected feature records.
Correspondingly, obtaining the corresponding feature records in the element of set using the target random step array may specifically include:
For each element of set whose scale equals the target element-of-set scale: obtain the first feature record in the element of set according to the first random number in the target random step array; then, according to the i-th random number in the target random step array, obtain the record in the element of set whose offset relative to the (i-1)-th feature record equals the i-th random number, as the i-th feature record, where i ≥ 2.
For an element of set whose scale is smaller than the target element-of-set scale: obtain the first feature record in the element of set according to the first random number in the target random step array; then, according to the i-th random number in the target random step array, obtain the record in the element of set whose offset relative to the (i-1)-th feature record equals the i-th random number, as the i-th feature record, where i ≥ 2; once the sum of the first i+1 random numbers in the target random step array exceeds the scale of the element of set, stop obtaining feature records.
In this embodiment, each random number in the target random step array serves as a random step, that is, the offset from the previous feature record to the current feature record. The value of a random step (that is, of a random number in the target random step array) should be an integer greater than or equal to 0.
Referring to Fig. 1b, which is a schematic diagram of obtaining feature records in an element of set using the target random step array, provided by Embodiment one of the present invention. The random numbers in the target random step array are 0, 3, 2, 5, 3, 6, 3. According to the first random number, 0, the first feature record is obtained in the element of set shown in Fig. 1b (the first solid black mark from the left in Fig. 1b). According to the second random number, 3, the record whose offset relative to the first feature record is 3 is obtained as the second feature record (the second solid black mark from the left). According to the third random number, 2, the record whose offset relative to the second feature record is 2 is obtained as the third feature record (the third solid black mark from the left). Proceeding in the same way, the remaining feature records in the element of set are obtained, yielding the sample library of the element of set.
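The offset walk of step 160 amounts to taking running sums of the steps. The Python sketch below assumes 0-based offsets counted from the start of the element of set, so the first step 0 selects the element's first record. Under that assumption an offset equal to the element's size is already outside it, which is why the sketch stops at `>=` where the text says "exceeds", and why it applies the same bounds check to full-size elements whose corrected step sum reaches the scale. The function name `sample_offsets` is hypothetical.

```python
from itertools import accumulate


def sample_offsets(steps: list[int], element_size: int) -> list[int]:
    """Step 160 sketch: positions of feature records inside one element of set.

    The first step is the offset of the first feature record; each later step
    is the offset from the previous feature record. Offsets are 0-based, so
    sampling stops at the first offset that falls outside the element.
    """
    offsets = []
    for offset in accumulate(steps):  # running sums: 0, 3, 5, 10, ...
        if offset >= element_size:
            break
        offsets.append(offset)
    return offsets


fig_1b_steps = [0, 3, 2, 5, 3, 6, 3]
print(sample_offsets(fig_1b_steps, 10000))  # [0, 3, 5, 10, 13, 19, 22]
print(sample_offsets(fig_1b_steps, 12))     # [0, 3, 5, 10]
```

The second call models a smaller last element of set, where sampling stops early.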
Step 170: determine the union of the sample libraries of all elements of set as the feature database of all target records.
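Steps 150 to 170 combine into a short loop: divide, walk the same steps in every element of set, and collect the union. The Python sketch below makes the same assumptions as before (0-based record positions, running-sum offsets). The function name `feature_database` and the toy sizes (25 records, scale 10, steps 0, 3, 2, 3) are hypothetical and chosen only to keep the output readable.

```python
from itertools import accumulate


def feature_database(total_records: int, scale: int, steps: list[int]) -> list[int]:
    """Steps 150-170 sketch: sample every element of set with the same steps."""
    features: set[int] = set()
    for start in range(0, total_records, scale):   # step 150: divide
        size = min(scale, total_records - start)
        for offset in accumulate(steps):            # step 160: walk the steps
            if offset >= size:
                break
            features.add(start + offset)            # global record position
    return sorted(features)                         # step 170: union


print(feature_database(25, 10, [0, 3, 2, 3]))
# [0, 3, 5, 8, 10, 13, 15, 18, 20, 23]
```

Only one step array is used for every element of set, which is the reuse the text credits with lowering the collection cost. The last, 5-record element keeps only offsets 0 and 3.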
The technical solution of this embodiment determines a target element-of-set scale and uses it to divide all target records in a specified table stored in the database management system into elements of set. It determines the number of feature records in a target element of set, generates the corresponding random numbers from a random seed, and uses that number as the capacity of the initial random step array. The sum of the random numbers in the initial random step array is checked for consistency with the target element-of-set scale, and an array that passes the check is used as the target random step array. The target random step array controls how many feature records are collected in each element of set and needs to be generated only once to be usable for all elements of set. This reduces the performance cost of collecting feature records in the database management system, reduces the cost of the CBO analyzing the collected feature records, and improves CBO analysis efficiency. In addition, because the sum of the random numbers in the target random step array is consistent with the target element-of-set scale, the sampling range of the feature records obtained in each element of set obtained by the division covers that whole element of set, so both sampling randomness and sampling-rate accuracy are satisfied.
Embodiment two
Referring to Fig. 2a, which is a schematic flowchart of the feature database generation method provided by Embodiment two of the present invention. The method includes steps 210 to 290.
Step 210: determine, according to a preset initial element-of-set scale and sampling percentage, the target element-of-set scale and the number of feature records in a target element of set.
The specific operations of step 110 in Embodiment one apply equally to this step and are not repeated here.
Step 220: randomly generate, using a preset random seed, as many random numbers as there are feature records in the target element of set, and save the generated random numbers as the initial random step array, each random number ranging from 0 to the target element-of-set scale.
Step 230: calculate the sum of the random numbers in the initial random step array.
Step 240: judge whether the sum of the random numbers in the initial random step array is consistent with the target element-of-set scale; if so, perform step 250; if not, perform step 260.
Step 250: take the initial random step array as the target random step array, then proceed to step 270.
Step 260: apply at most two corrections to the random numbers in the initial random step array to obtain the target random step array, such that the error between the sum of the random numbers in the target random step array and the target element-of-set scale satisfies the preset error rate; then proceed to step 270.
Note that, for each element of set obtained by the division, to ensure that the sampling range of the feature records covers the whole element of set, the scope of the element of set's sample library should be the whole element of set. Ideally, the sum of the random numbers in the target random step array equals the target element-of-set scale.
However, because the numbers are random, the sum of the random numbers in the generated initial random step array cannot be guaranteed to equal the target element-of-set scale. The random numbers in the initial random step array therefore need at most two corrections, so that the error between the sum of the random numbers in the resulting target random step array and the target element-of-set scale satisfies the preset error rate.
The first and second correction methods are described below.
Refer to Fig. 2 b, for the embodiment of the present invention two provide by once correcting to obtain the stream of target arbitrary width array
Journey schematic diagram.Specifically include:Step 261~step 263.
Step 261, each random number equalization proportional zoom included to the initial random step-length array simultaneously round, and obtain
Once revised arbitrary width array, wherein, the zoom factor is the target element of set scale and the initial random step
The ratio for each random number sum that long array includes.
Step 262, judge each random number sum and the mesh that the first time revised arbitrary width array includes
Whether the error between mark element of set scale meets default error rate, if so, performing step 263.
Step 263, using the first time revised arbitrary width array as target arbitrary width array.
In other words, when the sum of the random numbers in the initial random step array is determined to be inconsistent with the target element-of-set scale, a first correction is applied to every random number in the original random step array. Specifically, with S the sum of the random numbers in the initial random step array and G the target element-of-set scale, each random number in the original random step array is scaled proportionally by multiplying it by (G ÷ S) and then rounded.
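Steps 261 and 262 can be sketched in Python as follows. The text does not say how the error is measured, so this sketch assumes the relative criterion |S - G| ≤ G × r, where r is the preset error rate. That criterion matches the bounds G - G × r and G + G × r used later in step 264. The names `first_correction` and `within_error` are hypothetical.

```python
from fractions import Fraction


def first_correction(steps: list[int], scale: int) -> list[int]:
    """Step 261 sketch: multiply every step by G / S and round to an integer."""
    total = sum(steps)
    if total == 0:
        raise ValueError("cannot rescale an all-zero step array")
    # Fraction keeps v * G / S exact before rounding; round() returns an int.
    return [round(Fraction(v * scale, total)) for v in steps]


def within_error(steps: list[int], scale: int, error_rate: float) -> bool:
    """Step 262 sketch (assumed criterion): |sum - G| <= G * error_rate."""
    return abs(sum(steps) - scale) <= scale * error_rate


corrected = first_correction([1, 1, 1], 10)   # each 10/3 rounds down to 3
print(corrected, sum(corrected))              # [3, 3, 3] 9
print(within_error(corrected, 10, 0.1))       # True: |9 - 10| = 1 <= 1.0
```

Rounding each of the N scaled values moves the sum by at most N/2 in total, so the first correction alone usually lands close to G. That leftover rounding error is what the optional second correction absorbs.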
In this manner, when the sum of the random numbers in the initial random step array is determined to be inconsistent with the target element-of-set scale, a first correction is applied to all random numbers in the original random step array, so that the error between the sum of the random numbers in the first-corrected random step array and the target element-of-set scale satisfies the preset error rate, which yields the target random step array. Because the error between the sum of the random numbers in the target random step array and the target element-of-set scale satisfies the preset error rate, the sampling range of the feature records obtained in each element of set obtained by the division covers that element of set, and both sampling randomness and sampling-rate accuracy are satisfied.
Fig. 2 c are referred to, the stream of target arbitrary width array is obtained by second-order correction for what the embodiment of the present invention two provided
Journey schematic diagram.Specifically include:Step 261~step 267.
Step 261, each random number equalization proportional zoom included to the initial random step-length array simultaneously round, and obtain
Once revised arbitrary width array, wherein, the zoom factor is the target element of set scale and the initial random step
The ratio for each random number sum that long array includes.
Step 262, judge each random number sum and the mesh that the first time revised arbitrary width array includes
Whether the error between mark element of set scale meets default error rate, if so, step 263 is performed, if it is not, performing step 264.
Step 263, using the first time revised arbitrary width array as target arbitrary width array.
In other words, when the sum of the random numbers in the initial random step array is determined to be inconsistent with the target element-of-set scale, a first correction is applied to each random number in the original random step array. If the error between the sum of the random numbers in the first-corrected random step array and the target element-of-set scale satisfies the preset error rate, the target random step array is obtained through one correction. Otherwise, a second correction is applied to the first-corrected random step array, so that the error between the sum of the random numbers in the second-corrected random step array and the target element-of-set scale satisfies the preset error rate, which yields the target random step array. The second correction method specifically includes steps 264 to 267.
Step 264: determine a lower limit and an upper limit of the element-of-set coverage according to the target element-of-set scale and the preset error rate, then proceed to step 265.
Specifically, the lower limit L and upper limit U of the element-of-set coverage are determined from the target element-of-set scale G and the preset error rate r as L = G - G × r and U = G + G × r.
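The bounds of step 264 can be written compactly. Here r denotes the preset error rate. The value r = 1% is purely hypothetical, since the text does not fix one, and G = 10000 is the target scale from the Embodiment one example:

```latex
\begin{aligned}
L &= G - G\,r = G\,(1 - r), \qquad U = G + G\,r = G\,(1 + r) \\
G &= 10000,\ r = 0.01 \;\Rightarrow\; L = 9900,\ U = 10100
\end{aligned}
```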
Step 265: randomly select a value between the lower and upper limits of the element-of-set coverage as the correction parameter, and determine the deviation between the correction parameter and the sum of the random numbers in the first-corrected random step array; then proceed to step 266.
That is, a value W is randomly selected in the range [L, U] as the correction parameter, and the deviation D = |W - S| between W and the sum S of the random numbers in the first-corrected random step array is determined.
Step 266: randomly select, from the first-corrected random step array, as many random numbers as the deviation; then proceed to step 267.
Specifically, D random numbers are randomly selected from the first-corrected random step array as the objects of the second correction.
Step 267: if the correction parameter is greater than the sum of the random numbers in the first-corrected random step array, add 1 to each of the randomly selected random numbers (as many as the deviation) to obtain the second-corrected random step array, and use it as the target random step array;
if the correction parameter is less than the sum of the random numbers in the first-corrected random step array, subtract 1 from each of the randomly selected random numbers to obtain the second-corrected random step array, and use it as the target random step array.
That is, if W is greater than the sum S of the random numbers in the first-corrected random step array, each of the D randomly selected random numbers in that array is increased by 1; otherwise, each of them is decreased by 1.
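A Python sketch of steps 264 to 267 follows, under these assumptions. L and U are rounded inward to integers so that W is an integer. The D unit adjustments are applied one at a time at positions chosen with replacement, so the sketch still works when D exceeds the array length, a case the text does not address. This differs slightly from selecting D distinct numbers. When subtracting, only positive entries are chosen, so every step stays a non-negative integer. The function name `second_correction` is hypothetical.

```python
import math
import random


def second_correction(steps: list[int], scale: int, error_rate: float,
                      rng: random.Random) -> list[int]:
    """Steps 264-267 sketch: move the step sum to a random target W in [L, U]."""
    low = math.ceil(scale - scale * error_rate)    # L, rounded inward to an integer
    high = math.floor(scale + scale * error_rate)  # U, rounded inward to an integer
    w = rng.randint(low, high)                     # step 265: correction parameter W
    s = sum(steps)
    out = list(steps)
    for _ in range(abs(w - s)):                    # D = |W - S| unit adjustments
        if w > s:
            out[rng.randrange(len(out))] += 1      # step 267: add 1
        else:
            # Assumption: only decrement positive steps so none becomes negative.
            positive = [i for i, v in enumerate(out) if v > 0]
            out[rng.choice(positive)] -= 1         # step 267: subtract 1
    return out


rng = random.Random(7)
fixed = second_correction([1000, 2500, 3000], 10000, 0.01, rng)
print(fixed, sum(fixed))  # the sum now lies in [9900, 10100]
```

After the loop the sum equals W exactly, so it always lies within [L, U].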
In this manner, when the sum of the random numbers in the initial random step array is determined to be inconsistent with the target element-of-set scale, a first correction is applied to all random numbers in the original random step array. If the error between the sum of the random numbers in the first-corrected random step array and the target element-of-set scale satisfies the preset error rate, the target random step array is obtained through one correction. If that error still does not satisfy the preset error rate, a second correction is performed to obtain the target random step array. Because the error between the sum of the random numbers in the target random step array and the target element-of-set scale satisfies the preset error rate, the sampling range of the feature records obtained in each element of set obtained by the division covers that element of set, and both sampling randomness and sampling-rate accuracy are satisfied.
It should be noted that the first correction applies to all random numbers in the initial random step array. The second correction only modifies a randomly selected subset of the random numbers in the once-corrected random step array, not all of them.
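The decision logic around the first correction can be sketched in Python as below. This is a hedged illustration. The source says only that values are scaled by M / S and rounded, so round-half-up is an assumption. The error test |S' - M| <= e * M is one plausible reading of "satisfies the preset error rate." The name `correct_once` is hypothetical.

```python
def correct_once(steps, target_size, error_rate):
    """First-stage handling of an initial random step array (sketch).

    Returns (array, needs_second_correction).
    """
    total = sum(steps)
    if total == target_size:
        return list(steps), False          # consistent: the initial array is the target array
    if total == 0:
        raise ValueError("cannot scale an all-zero step array")
    factor = target_size / total           # zoom factor M / S, applied to every entry
    scaled = [int(v * factor + 0.5) for v in steps]  # assumed round-half-up (steps are >= 0)
    error = abs(sum(scaled) - target_size)
    return scaled, error > error_rate * target_size


if __name__ == "__main__":
    print(correct_once([10, 20, 30, 40], 50, 0.05))  # ([5, 10, 15, 20], False)
    print(correct_once([1, 1, 1], 10, 0.05))         # ([3, 3, 3], True)
```

The second correction comes into play only when the flag is True. It adjusts just a few entries by 1, which is why at most two passes are needed.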
Step 270: divide all target records according to the target element-of-set size.
Step 280: for each element of set obtained by the division, obtain the corresponding feature records within it using the target random step array, and use them as the sample library of that element of set.
The specific operation described in Embodiment One, obtaining the corresponding feature records within an element of set using the target random step array, applies equally to this step and is not repeated here.
Step 290: determine the union of the sample libraries of all elements of set as the feature database of all target records.
In the technical solution of this embodiment, a target element-of-set size is determined and used to divide all target records in a specified table stored by the database management system into elements of set. The number of feature records per target element of set is determined and used as the capacity of the initial random step array, whose random numbers are generated from a random seed. When the sum of the random numbers in the initial random step array differs from the target element-of-set size, at most two corrections yield the target random step array.

The target random step array controls how many feature records are collected in each element of set, and it needs to be generated only once to serve all elements of set. This reduces the performance overhead of collecting feature records in the database management system, lowers the cost of the feature records collected for CBO (cost-based optimizer) analysis, and improves the efficiency of CBO analysis.

Furthermore, the error between the sum of the random numbers in the target random step array and the target element-of-set size satisfies the preset error rate. The sample range of the feature records obtained in each element of set is therefore guaranteed to cover that element of set, so both the randomness of sampling and the accuracy of the sampling rate are satisfied.
Embodiment three
Fig. 3 is a schematic structural diagram of the feature database generation device provided by Embodiment Three of the present invention. Referring to Fig. 3, the device includes a parameter configuration module 310, an initial random step array generation module 320, a target random step array generation module 330, and a feature database generation module 340.
The parameter configuration module 310 is configured to determine, according to a preset initial element-of-set size and a sample percentage, the target element-of-set size and the number of feature records per target element of set.
The initial random step array generation module 320 is configured to use a preset random seed to randomly generate as many random numbers as there are feature records per target element of set. It saves the generated random numbers as the initial random step array, each random number taking a value between 0 and the target element-of-set size.
The target random step array generation module 330 is configured to calculate the sum of the random numbers in the initial random step array. When that sum is consistent with the target element-of-set size, it uses the initial random step array as the target random step array.
The feature database generation module 340 is configured to divide all target records according to the target element-of-set size. For each element of set obtained by the division, it obtains the corresponding feature records within that element of set using the target random step array, as the sample library of that element of set. It then determines the union of the sample libraries of all elements of set as the feature database of all target records.
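The work of modules 320 and 330 can be sketched with Python's standard `random.Random`, seeded so that the same seed reproduces the same array. This is an illustration only. Treating the range "between 0 and the target element-of-set size" as inclusive at both ends is an assumption, and `initial_step_array` and `is_consistent` are hypothetical names.

```python
import random


def initial_step_array(count, target_size, seed):
    """Generate `count` random steps in [0, target_size] from a preset seed."""
    rng = random.Random(seed)  # preset seed => the same array on every run
    return [rng.randint(0, target_size) for _ in range(count)]


def is_consistent(steps, target_size):
    """Module 330's check: can the initial array be used as the target array as-is?"""
    return sum(steps) == target_size


if __name__ == "__main__":
    steps = initial_step_array(5, 100, seed=42)
    print(steps, is_consistent(steps, 100))
```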
In the technical solution of this embodiment, a target element-of-set size is determined and used to divide all target records in a specified table stored by the database management system into elements of set. The number of feature records per target element of set is determined and used as the capacity of the initial random step array, whose random numbers are generated from a random seed. The sum of the random numbers in the initial random step array is checked for consistency with the target element-of-set size, and an initial random step array that passes the check is used as the target random step array.

The target random step array controls how many feature records are collected in each element of set and needs to be generated only once to serve all elements of set. This reduces the performance overhead of collecting feature records in the database management system, lowers the cost of the feature records collected for CBO analysis, and improves the efficiency of CBO analysis.

In addition, the sum of the random numbers in the target random step array is consistent with the target element-of-set size. The sample range of the feature records obtained in each element of set produced by the division is therefore guaranteed to cover that element of set, so both the randomness of sampling and the accuracy of the sampling rate are satisfied.
In the above scheme, the target random step array generation module 330 may further apply at most two corrections to the random numbers in the initial random step array when their sum is inconsistent with the target element-of-set size. These corrections take place after the sum has been calculated and before all target records are divided according to the target element-of-set size. They yield the target random step array, for which the error between the sum of its random numbers and the target element-of-set size satisfies the preset error rate.
Further, the target random step array generation module 330 preferably includes a calculation submodule, a first correction submodule, and a target random step array generation submodule.
The calculation submodule is configured to calculate the sum of the random numbers in the initial random step array.
The first correction submodule acts after the sum has been calculated and before all target records are divided according to the target element-of-set size. When the sum is inconsistent with the target element-of-set size, it scales every random number in the initial random step array by the same factor and rounds the result, producing the once-corrected random step array. The scaling factor is the ratio of the target element-of-set size to the sum of the random numbers in the initial random step array.
The target random step array generation submodule is configured to use the initial random step array as the target random step array when the sum of its random numbers is consistent with the target element-of-set size. Alternatively, it uses the once-corrected random step array as the target random step array when the error between the sum of its random numbers and the target element-of-set size satisfies the preset error rate.
Further, the target random step array generation module 330 may also include a second correction submodule. It acts after the once-corrected random step array has been obtained, when the error between the sum of its random numbers and the target element-of-set size does not satisfy the preset error rate. The second correction submodule is configured to:
- determine the lower and upper limits of the element-of-set coverage range according to the target element-of-set size and the preset error rate;
- randomly select a value between the lower and upper limits as the correction parameter, and determine the deviation between the correction parameter and the sum of the random numbers in the once-corrected random step array;
- randomly select as many random numbers as the deviation from the once-corrected random step array;
- add 1 to each selected random number if the correction parameter is greater than that sum, or subtract 1 from each if it is less, to obtain the twice-corrected random step array, which is used as the target random step array.
In the above scheme, the parameter configuration module 310 may specifically be configured to:
calculate the product of the preset initial element-of-set size and the preset sample percentage;
when the product of the initial element-of-set size and the sample percentage is less than 1, enlarge the initial element-of-set size until the product of the enlarged size and the sample percentage is greater than or equal to 1. The current enlarged element-of-set size is then determined as the target element-of-set size, and the product of the target element-of-set size and the sample percentage is rounded to obtain the number of feature records per target element of set.
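A Python sketch of this parameter-configuration rule follows. The text does not say how the element-of-set size is enlarged or how the product is rounded. Doubling and truncation toward zero are therefore assumptions here, as is expressing the sample percentage as a fraction (0.05 for 5%). The name `configure` is hypothetical.

```python
def configure(initial_size, sample_fraction):
    """Return (target element-of-set size, feature records per element of set)."""
    if initial_size <= 0 or not 0 < sample_fraction <= 1:
        raise ValueError("need a positive size and a sample fraction in (0, 1]")
    size = initial_size
    while size * sample_fraction < 1:        # fewer than one sample per element of set
        size *= 2                            # assumption: enlarge by doubling
    return size, int(size * sample_fraction)  # assumption: truncate the product


if __name__ == "__main__":
    print(configure(1000, 0.05))  # (1000, 50): product already >= 1
    print(configure(100, 0.001))  # (1600, 1): 100 -> 200 -> 400 -> 800 -> 1600
```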
In the above scheme, suppose the number of target records is divisible by the target element-of-set size. The division then yields as many elements of set of the target element-of-set size as the quotient of the number of target records divided by the target element-of-set size.
If the number of target records is not divisible by the target element-of-set size, the division yields as many elements of set of the target element-of-set size as the integer quotient of that division. It also yields one further element of set whose size is the remainder of that division.
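The division rule amounts to an integer division with remainder. The following Python sketch uses half-open record-index ranges [start, end), which is an illustrative choice and not from the source. The name `partition` is hypothetical.

```python
def partition(total_records, target_size):
    """Split record indices 0..total_records-1 into elements of set."""
    full, rest = divmod(total_records, target_size)  # quotient and remainder
    chunks = [(i * target_size, (i + 1) * target_size) for i in range(full)]
    if rest:  # not divisible: one extra, shorter element of set
        chunks.append((full * target_size, total_records))
    return chunks


if __name__ == "__main__":
    print(partition(8, 4))   # [(0, 4), (4, 8)]
    print(partition(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```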
Accordingly, the feature database generation module 340 is specifically configured to:
for each element of set whose size is the target element-of-set size, obtain the first feature record in the element of set according to the first random number in the target random step array. Then, according to the i-th random number in the target random step array, obtain the record whose offset relative to the (i-1)-th feature record equals the i-th random number, as the i-th feature record, where i ≥ 2;
for an element of set whose size is smaller than the target element-of-set size, obtain the first feature record in the element of set according to the first random number in the target random step array. Then, according to the i-th random number in the target random step array, obtain the record whose offset relative to the (i-1)-th feature record equals the i-th random number, as the i-th feature record, where i ≥ 2. Stop obtaining feature records once the sum of the first i+1 random numbers in the target random step array exceeds the size of the element of set.
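The per-element-of-set sampling rule and the final union can be sketched in Python as follows, under stated assumptions. Records are addressed by 0-based indices, so the i-th feature record sits at an offset equal to the sum of the first i steps from the start of its element of set. Sampling stops once that offset falls outside the element of set (offset >= size). This matches the source's "sum exceeds the size" rule if positions are counted from 1. The source states this stop rule only for the shorter element of set. Applying the same boundary check to full-size elements of set, whose step sum may exceed the size within the error rate, is also an assumption. The function names are hypothetical.

```python
from itertools import accumulate


def sample_element(start, end, steps):
    """Feature-record indices taken from the element of set [start, end)."""
    size = end - start
    picked = []
    for offset in accumulate(steps):  # offset of the i-th record = sum of the first i steps
        if offset >= size:            # the record would fall outside this element of set
            break
        picked.append(start + offset)
    return picked


def build_feature_db(total_records, target_size, steps):
    """Union of the sample libraries of all elements of set."""
    feature_db = set()
    for start in range(0, total_records, target_size):
        end = min(start + target_size, total_records)
        feature_db.update(sample_element(start, end, steps))  # one step array reused everywhere
    return feature_db


if __name__ == "__main__":
    steps = [2, 3, 4]  # an example step array for an element-of-set size of 10
    print(sorted(build_feature_db(25, 10, steps)))  # [2, 5, 9, 12, 15, 19, 22]
```

The same `steps` list is reused for every element of set. This reflects the stated benefit that the target random step array is generated only once.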
The feature database generation device provided by the embodiments of the present invention can perform the feature database generation method provided by any embodiment of the present invention. It has the functional modules and beneficial effects corresponding to that method.
Those skilled in the art will understand that the modules or steps of the present invention described above may be implemented by the server and client described above. Alternatively, the embodiments of the present invention may be implemented as a program executable by a computer device, stored in a storage device and executed by a processor. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc. The modules may also be made into individual integrated circuit modules, or several of the modules or steps may be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solution of the present invention, not to limit it. The preferred embodiments do not limit the invention either. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
- 1. A method for generating a feature database, characterized by comprising: determining, according to a preset initial element-of-set size and a sample percentage, a target element-of-set size and the number of feature records per target element of set; randomly generating, using a preset random seed, as many random numbers as the number of feature records per target element of set, and saving the generated random numbers as an initial random step array, each random number taking a value between 0 and the target element-of-set size; calculating the sum of the random numbers in the initial random step array; when the sum of the random numbers in the initial random step array is determined to be consistent with the target element-of-set size, using the initial random step array as a target random step array; dividing all target records according to the target element-of-set size; for each element of set obtained by the division, obtaining corresponding feature records within the element of set using the target random step array, as the sample library of the element of set; and determining the union of the sample libraries of all elements of set as the feature database of all target records.
- 2. The method according to claim 1, characterized in that, after calculating the sum of the random numbers in the initial random step array and before dividing all target records according to the target element-of-set size, the method further comprises: when the sum of the random numbers in the initial random step array is determined to be inconsistent with the target element-of-set size, applying at most two corrections to the random numbers in the initial random step array to obtain the target random step array, wherein the error between the sum of the random numbers in the target random step array and the target element-of-set size satisfies a preset error rate.
- 3. The method according to claim 2, characterized in that applying at most two corrections to the random numbers in the initial random step array to obtain the target random step array comprises: scaling each random number in the initial random step array by the same factor and rounding, to obtain a once-corrected random step array, wherein the scaling factor is the ratio of the target element-of-set size to the sum of the random numbers in the initial random step array; and when the error between the sum of the random numbers in the once-corrected random step array and the target element-of-set size is determined to satisfy the preset error rate, using the once-corrected random step array as the target random step array.
- 4. The method according to claim 3, characterized in that, after the once-corrected random step array is obtained, the method further comprises: when the error between the sum of the random numbers in the once-corrected random step array and the target element-of-set size is determined not to satisfy the preset error rate, determining a lower limit and an upper limit of an element-of-set coverage range according to the target element-of-set size and the preset error rate; randomly selecting a value between the lower limit and the upper limit of the element-of-set coverage range as a correction parameter, and determining the deviation between the correction parameter and the sum of the random numbers in the once-corrected random step array; randomly selecting, from the once-corrected random step array, as many random numbers as the deviation; if the correction parameter is greater than the sum of the random numbers in the once-corrected random step array, adding 1 to each of the selected random numbers to obtain a twice-corrected random step array, which is used as the target random step array; and if the correction parameter is less than the sum of the random numbers in the once-corrected random step array, subtracting 1 from each of the selected random numbers to obtain a twice-corrected random step array, which is used as the target random step array.
- 5. The method according to any one of claims 1 to 4, characterized in that determining, according to the preset initial element-of-set size and sample percentage, the target element-of-set size and the number of feature records per target element of set comprises: calculating the product of the preset initial element-of-set size and the preset sample percentage; when the product of the initial element-of-set size and the sample percentage is less than 1, enlarging the initial element-of-set size until the product of the enlarged element-of-set size and the sample percentage is greater than or equal to 1, determining the current enlarged element-of-set size as the target element-of-set size, and rounding the product of the target element-of-set size and the sample percentage to obtain the number of feature records per target element of set.
- 6. The method according to any one of claims 1 to 4, characterized in that: if the number of target records is divisible by the target element-of-set size, the elements of set obtained by the division comprise elements of set of the target element-of-set size, their number being the quotient of the number of target records divided by the target element-of-set size; if the number of target records is not divisible by the target element-of-set size, the elements of set obtained by the division comprise elements of set of the target element-of-set size, their number being the integer quotient of the number of target records divided by the target element-of-set size, and one element of set whose size is the remainder of that division.
- 7. The method according to claim 6, characterized in that obtaining corresponding feature records within the element of set using the target random step array comprises: for each element of set whose size is the target element-of-set size, obtaining the first feature record in the element of set according to the first random number in the target random step array, and, according to the i-th random number in the target random step array, obtaining in the element of set the record whose offset relative to the (i-1)-th feature record is the i-th random number, as the i-th feature record, where i ≥ 2; for the element of set whose size is smaller than the target element-of-set size, obtaining the first feature record in the element of set according to the first random number in the target random step array, and, according to the i-th random number in the target random step array, obtaining in the element of set the record whose offset relative to the (i-1)-th feature record is the i-th random number, as the i-th feature record, where i ≥ 2, until the sum of the first i+1 random numbers in the target random step array exceeds the size of the element of set, at which point the acquisition of feature records stops.
- 8. A device for generating a feature database, characterized by comprising: a parameter configuration module, configured to determine, according to a preset initial element-of-set size and a sample percentage, a target element-of-set size and the number of feature records per target element of set; an initial random step array generation module, configured to randomly generate, using a preset random seed, as many random numbers as the number of feature records per target element of set, and to save the generated random numbers as an initial random step array, each random number taking a value between 0 and the target element-of-set size; a target random step array generation module, configured to calculate the sum of the random numbers in the initial random step array and, when that sum is determined to be consistent with the target element-of-set size, to use the initial random step array as a target random step array; and a feature database generation module, configured to divide all target records according to the target element-of-set size, to obtain, for each element of set obtained by the division, corresponding feature records within the element of set using the target random step array as the sample library of the element of set, and to determine the union of the sample libraries of all elements of set as the feature database of all target records.
- 9. The device according to claim 8, characterized in that the target random step array generation module is further configured to, after calculating the sum of the random numbers in the initial random step array and before dividing all target records according to the target element-of-set size, apply at most two corrections to the random numbers in the initial random step array when that sum is determined to be inconsistent with the target element-of-set size, to obtain the target random step array, wherein the error between the sum of the random numbers in the target random step array and the target element-of-set size satisfies a preset error rate.
- 10. The device according to claim 9, characterized in that the target random step array generation module comprises: a calculation submodule, configured to calculate the sum of the random numbers in the initial random step array; a first correction submodule, configured to, after the sum of the random numbers in the initial random step array has been calculated and before all target records are divided according to the target element-of-set size, scale each random number in the initial random step array by the same factor and round when that sum is determined to be inconsistent with the target element-of-set size, to obtain a once-corrected random step array, wherein the scaling factor is the ratio of the target element-of-set size to the sum of the random numbers in the initial random step array; and a target random step array generation submodule, configured to use the initial random step array as the target random step array when the sum of its random numbers is determined to be consistent with the target element-of-set size, or to use the once-corrected random step array as the target random step array when the error between the sum of its random numbers and the target element-of-set size is determined to satisfy the preset error rate; the target random step array generation module further comprises: a second correction submodule, configured to, after the once-corrected random step array is obtained and when the error between the sum of its random numbers and the target element-of-set size is determined not to satisfy the preset error rate, determine a lower limit and an upper limit of an element-of-set coverage range according to the target element-of-set size and the preset error rate; randomly select a value between the lower limit and the upper limit of the element-of-set coverage range as a correction parameter, and determine the deviation between the correction parameter and the sum of the random numbers in the once-corrected random step array; randomly select, from the once-corrected random step array, as many random numbers as the deviation; if the correction parameter is greater than that sum, add 1 to each of the selected random numbers to obtain a twice-corrected random step array, which is used as the target random step array; and if the correction parameter is less than that sum, subtract 1 from each of the selected random numbers to obtain a twice-corrected random step array, which is used as the target random step array.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510173241.0A CN104809175B (en) | 2015-04-13 | 2015-04-13 | The generation method and device of feature database |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104809175A CN104809175A (en) | 2015-07-29 |
CN104809175B true CN104809175B (en) | 2018-02-27 |
Family
ID=53693997
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |