CN104809175B - Method and device for generating a feature database - Google Patents

Info

Publication number
CN104809175B
Authority
CN
China
Prior art keywords
target
random number
arbitrary width
record
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510173241.0A
Other languages
Chinese (zh)
Other versions
CN104809175A (en)
Inventor
朱仲颖
张钦
张黎敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Dameng Database Co Ltd
Original Assignee
Shanghai Dameng Database Co Ltd
Landscapes

  • Complex Calculations (AREA)

Abstract

The present invention provides a method and a device for generating a feature database. The method includes: determining a target set-element size and the number of feature records per target set element; using a preset random seed to randomly generate as many random numbers as there are feature records in a target set element, and saving them as an initial random step array; correcting the initial random step array at most twice to obtain a target random step array, which keeps the correction efficient; dividing all target records according to the target set-element size; using the target random step array to obtain the corresponding feature records in each set element, which form that set element's sample library; and taking the union of the sample libraries of all set elements as the feature database of all target records. The target random step array needs to be generated only once and can then be used for every set element, which reduces the performance overhead of collecting feature records in the database management system, improves the efficiency of CBO analysis, and satisfies both the randomness of sampling and the accuracy of the sampling rate.

Description

Method and device for generating a feature database
Technical field
Embodiments of the present invention relate to data sampling techniques for database management systems, and in particular to a method and a device for generating a feature database.
Background
A database management system generates an execution plan for each SQL (Structured Query Language) statement entered by the user. Most database management systems have introduced a cost-based optimizer (CBO): the database management system gathers all the information relevant to the execution plan, computes and analyzes that information, and selects the lowest-cost plan among all feasible execution plans as the final execution plan, thereby improving the execution efficiency of the database management system. Sampling the database records is the foundation of the CBO's computation and analysis.
Computing and analyzing all records would certainly improve the accuracy of the CBO, but for massive numbers of records the cost is too high and would instead reduce the execution efficiency of the database management system. How to sample a massive set of records at random, obtain feature records, and generate a feature database is therefore particularly important.
Records in a database management system are generally considered to be stored contiguously, and the sampling process for obtaining a feature database is typically as follows: offset from the current record by a step of A and obtain one feature record; then offset from the feature record just obtained by a step of A' and obtain the next feature record; repeat this process until the feature database is obtained.
Because data in a database is unevenly distributed, most database management system vendors currently use the random sampling method described above to obtain feature records and generate a feature database. However, none of the publicly available material addresses how to use random steps effectively to generate a feature database.
Summary of the invention
Embodiments of the present invention provide a method and a device for generating a feature database, so as to optimize the way feature records are obtained.
In a first aspect, an embodiment of the present invention provides a method for generating a feature database, including:
determining, according to a preset initial set-element size and a sampling percentage, a target set-element size and the number of feature records per target set element;
using a preset random seed to randomly generate as many random numbers as there are feature records in the target set element, and saving the generated random numbers as an initial random step array, where each random number takes a value between 0 and the target set-element size;
calculating the sum of the random numbers in the initial random step array;
when the sum of the random numbers in the initial random step array is determined to be consistent with the target set-element size, using the initial random step array as a target random step array;
dividing all target records according to the target set-element size;
for each set element obtained by the division, using the target random step array to obtain the corresponding feature records in the set element, as the sample library of that set element;
determining the union of the sample libraries of all set elements, as the feature database of all target records.
In a second aspect, an embodiment of the present invention provides a device for generating a feature database, including:
a parameter configuration module, configured to determine, according to a preset initial set-element size and a sampling percentage, a target set-element size and the number of feature records per target set element;
an initial random step array generation module, configured to use a preset random seed to randomly generate as many random numbers as there are feature records in the target set element, and to save the generated random numbers as an initial random step array, where each random number takes a value between 0 and the target set-element size;
a target random step array generation module, configured to calculate the sum of the random numbers in the initial random step array and, when that sum is determined to be consistent with the target set-element size, to use the initial random step array as a target random step array;
a feature database generation module, configured to divide all target records according to the target set-element size; for each set element obtained by the division, to use the target random step array to obtain the corresponding feature records in the set element, as the sample library of that set element; and to determine the union of the sample libraries of all set elements, as the feature database of all target records.
In the method and device for generating a feature database provided by the embodiments of the present invention, a target set-element size is determined and used to divide all target records in a specified table stored by the database management system into set elements. The number of feature records per target set element is determined, that many random numbers are generated using a random seed, and the number of feature records per target set element serves as the capacity of the initial random step array. The sum of the random numbers in the initial random step array is checked for consistency with the target set-element size, and an initial random step array that passes this check is used as the target random step array. The target random step array controls the number of feature records collected in each set element, and it needs to be generated only once to be usable for all set elements. This reduces the performance overhead of collecting feature records in the database management system, lowers the cost of the CBO's analysis of the collected feature records, and improves the efficiency of CBO analysis. In addition, because the sum of the random numbers in the target random step array is consistent with the target set-element size, the sampling range of the feature records obtained in each set element covers that set element, so both the randomness of sampling and the accuracy of the sampling rate are satisfied.
Brief description of the drawings
To illustrate the present invention more clearly, the accompanying drawings used in the present invention are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1a is a schematic flowchart of a method for generating a feature database provided by Embodiment One of the present invention;
Fig. 1b is a schematic diagram of obtaining feature records in a set element using the target random step array, provided by Embodiment One of the present invention;
Fig. 2a is a schematic flowchart of a method for generating a feature database provided by Embodiment Two of the present invention;
Fig. 2b is a schematic flowchart of obtaining the target random step array through one correction, provided by Embodiment Two of the present invention;
Fig. 2c is a schematic flowchart of obtaining the target random step array through two corrections, provided by Embodiment Two of the present invention;
Fig. 3 is a schematic structural diagram of a device for generating a feature database provided by Embodiment Three of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described in further detail below with reference to the accompanying drawings. Evidently, the described embodiments are some, not all, of the embodiments of the present invention. The specific embodiments described here serve only to explain the present invention and do not limit it; all other embodiments that a person of ordinary skill in the art obtains from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the full content.
Embodiment One
Referring to Fig. 1a, a schematic flowchart of a method for generating a feature database provided by Embodiment One of the present invention is shown. The method of this embodiment can be performed by a device for generating a feature database that is implemented in hardware and/or software, and this device can typically be integrated into a server that provides a feature-record collection service.
The method includes steps 110 to 170.
Step 110: determine, according to a preset initial set-element size and a sampling percentage, the target set-element size and the number of feature records per target set element.
Specifically, this can include the following two steps:
Calculate the product of the preset initial set-element size and the preset sampling percentage;
When the product of the initial set-element size and the sampling percentage is less than 1, enlarge the initial set-element size until the product of the enlarged set-element size and the sampling percentage is greater than or equal to 1; take the enlarged set-element size as the target set-element size, and take the rounded product of the target set-element size and the sampling percentage as the number of feature records per target set element.
Here, the set-element size is the total number of target records that a set element contains. The target records are stored in the database management system, and all target records in a specified table stored in the database management system are divided into several set elements. That is, the database management system stores many tables, each holding different records, and the object divided in this embodiment is the full set of target records in a specified table.
The product of the target set-element size and the preset sampling percentage being greater than or equal to 1 means that at least one feature record is obtained in each set element of the target set-element size.
For example, suppose the preset initial set-element size G0 is 1000 and the preset sampling percentage is 0.09%. The product of G0 and the sampling percentage is 0.9, which is less than 1, so G0 is enlarged 10 times and the current set-element size G becomes 10000. The product of G and the sampling percentage is then 9, which is greater than 1. The enlarged set-element size G is therefore taken as the target set-element size, and the product 9 of the target set-element size 10000 and the sampling percentage 0.09%, after rounding, is taken as the number of feature records per target set element. This means that 9 of every 10000 target records should be collected at random as feature records; in other words, in each set element of the target size 10000, 9 target records are collected at random as that set element's feature records.
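The calculation in step 110 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: the function name `determine_target_size` is hypothetical, the growth factor of 10 is borrowed from the worked example (the text only says the size is enlarged), and Python's built-in `round` stands in for the unspecified rounding rule.

```python
def determine_target_size(initial_size, sample_rate, growth=10):
    """Return (target set-element size, feature records per set element).

    sample_rate is a fraction, e.g. 0.0009 for 0.09%.
    growth=10 is an assumption taken from the worked example.
    """
    if initial_size <= 0 or sample_rate <= 0:
        raise ValueError("initial_size and sample_rate must be positive")
    size = initial_size
    # Enlarge the set element until at least one record is sampled from it.
    while size * sample_rate < 1:
        size *= growth
    return size, round(size * sample_rate)


print(determine_target_size(1000, 0.0009))  # (10000, 9), as in the example above
```

If the initial product is already at least 1, the loop body never runs and the initial size is returned unchanged, which is how this sketch interprets the case the text leaves implicit.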
Step 120: use a preset random seed to randomly generate as many random numbers as there are feature records in the target set element, and save the generated random numbers as the initial random step array, where each random number takes a value between 0 and the target set-element size.
In other words, if the number of feature records per target set element is N, then N random numbers are generated using the random seed, each taking a value between 0 and the target set-element size G, and the N generated random numbers are saved as the initial random step array. The number N of feature records per target set element is thus the capacity of the initial random step array.
In this embodiment, the random seed is set so that the initial random step array is relatively controllable.
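Step 120 might look like the sketch below. The function name `initial_step_array` and the seed value are hypothetical, and two details are assumptions: the range is treated as inclusive at both ends, and the steps are drawn as integers (the text later states that steps are integers greater than or equal to 0).

```python
import random


def initial_step_array(count, target_size, seed):
    """Generate `count` random integer steps in [0, target_size]."""
    # A private generator: the same seed always yields the same array.
    rng = random.Random(seed)
    return [rng.randint(0, target_size) for _ in range(count)]


steps = initial_step_array(9, 10000, seed=2015)  # seed value is arbitrary
print(len(steps), sum(steps))
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the result independent of any other random draws in the program, which is one way to read "relatively controllable".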
Step 130: calculate the sum of the random numbers in the initial random step array.
Step 140: when the sum of the random numbers in the initial random step array is determined to be consistent with the target set-element size, use the initial random step array as the target random step array.
Step 150: divide all target records according to the target set-element size.
Step 160: for each set element obtained by the division, use the target random step array to obtain the corresponding feature records in the set element, as the sample library of that set element.
Specifically, if the total number of target records is evenly divisible by the target set-element size, the division yields as many set elements as the quotient of the total number of target records divided by the target set-element size, each of the target set-element size. For example, suppose the database management system stores 30000 target records in total, the determined target set-element size G is 10000, and the preset sampling percentage is 0.09%; the division then yields 3 set elements, each of size 10000.
If the total number of target records is not evenly divisible by the target set-element size, the division yields as many set elements of the target set-element size as the integer quotient of the total number of target records divided by the target set-element size, plus one set element whose size is the remainder of that division. For example, suppose the database management system stores 34000 target records in total, the determined target set-element size G is 10000, and the preset sampling percentage is 0.09%; the division then yields 4 set elements, the first 3 of size 10000 and the 4th of size 4000.
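The two division cases above reduce to a quotient and a remainder. A minimal sketch, with the hypothetical function name `divide_into_set_elements`:

```python
def divide_into_set_elements(total_records, target_size):
    """Return the sizes of the set elements, in order."""
    full, remainder = divmod(total_records, target_size)
    sizes = [target_size] * full
    if remainder:
        # One trailing, smaller set element holds the leftover records.
        sizes.append(remainder)
    return sizes


print(divide_into_set_elements(30000, 10000))  # [10000, 10000, 10000]
print(divide_into_set_elements(34000, 10000))  # [10000, 10000, 10000, 4000]
```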
It should be noted that the advantage of using set elements is that the number of feature records collected each time can be controlled, and the target random step array needs to be generated only once to be usable for all set elements. This reduces the performance overhead of collecting feature records in the database management system and lowers the cost of the CBO's analysis of the collected feature records.
Correspondingly, using the target random step array to obtain the corresponding feature records in a set element can specifically include:
For each set element whose size equals the target set-element size: according to the first random number in the target random step array, obtain the corresponding first feature record in the set element; according to the i-th random number in the target random step array, obtain the record in the set element whose offset relative to the (i-1)-th feature record equals the i-th random number, and take it as the i-th feature record, where i ≥ 2.
For a set element whose size is smaller than the target set-element size: according to the first random number in the target random step array, obtain the corresponding first feature record in the set element; according to the i-th random number in the target random step array, obtain the record in the set element whose offset relative to the (i-1)-th feature record equals the i-th random number, and take it as the i-th feature record, where i ≥ 2; once the sum of the first i+1 random numbers in the target random step array exceeds the size of the set element, stop obtaining feature records.
In this embodiment, each random number in the target random step array serves as a random step, that is, the relative offset from the previous feature record to the current feature record. The value of a random step (a random number in the target random step array) should be an integer greater than or equal to 0.
Referring to Fig. 1b, a schematic diagram of obtaining feature records in a set element using the target random step array, provided by Embodiment One of the present invention, is shown. The random numbers in the target random step array are 0, 3, 2, 5, 3, 6, 3. According to the first random number, 0, the corresponding first feature record is obtained in the set element shown in Fig. 1b (the first solid black mark from the left in Fig. 1b). According to the 2nd random number, the record in the set element whose offset relative to the 1st feature record equals the 2nd random number (that is, the record at a relative offset of 3 from the 1st feature record) is obtained as the 2nd feature record (the second solid black mark from the left in Fig. 1b). According to the 3rd random number, the record in the set element whose offset relative to the 2nd feature record equals the 3rd random number (that is, the record at a relative offset of 2 from the 2nd feature record) is obtained as the 3rd feature record (the third solid black mark from the left in Fig. 1b). The remaining feature records in the set element are obtained in the same way, yielding the sample library of the set element.
Step 170: determine the union of the sample libraries of all set elements, as the feature database of all target records.
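Steps 150 to 170 together might be sketched as below, with one step array reused for every set element. The function name `build_feature_database` is hypothetical, records are represented only by their zero-based global index, and the stopping rule (stop once the offset leaves the set element) is the same assumption of this sketch rather than something the text fixes.

```python
from itertools import accumulate


def build_feature_database(total_records, target_size, steps):
    """Return the sorted global indices of all feature records."""
    feature_db = set()
    for start in range(0, total_records, target_size):
        # The last set element may be smaller than target_size.
        element_size = min(target_size, total_records - start)
        for offset in accumulate(steps):
            if offset >= element_size:
                break
            feature_db.add(start + offset)  # union of the per-element sample libraries
    return sorted(feature_db)


print(build_feature_database(25, 10, [0, 3, 2]))
# [0, 3, 5, 10, 13, 15, 20, 23]
```

In the usage line, the third set element holds only 5 records, so its walk stops after offsets 0 and 3.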
In the technical solution of this embodiment, a target set-element size is determined and used to divide all target records in a specified table stored by the database management system into set elements. The number of feature records per target set element is determined, that many random numbers are generated using a random seed, and the number of feature records per target set element serves as the capacity of the initial random step array. The sum of the random numbers in the initial random step array is checked for consistency with the target set-element size, and an initial random step array that passes this check is used as the target random step array. The target random step array controls the number of feature records collected in each set element, and it needs to be generated only once to be usable for all set elements. This reduces the performance overhead of collecting feature records in the database management system, lowers the cost of the CBO's analysis of the collected feature records, and improves the efficiency of CBO analysis. In addition, because the sum of the random numbers in the target random step array is consistent with the target set-element size, the sampling range of the feature records obtained in each set element covers that set element, so both the randomness of sampling and the accuracy of the sampling rate are satisfied.
Embodiment Two
Referring to Fig. 2a, a schematic flowchart of a method for generating a feature database provided by Embodiment Two of the present invention is shown. The method includes steps 210 to 290.
Step 210: determine, according to a preset initial set-element size and a sampling percentage, the target set-element size and the number of feature records per target set element.
The specific operations of step 110 in Embodiment One apply equally to this step and are not repeated here.
Step 220: use a preset random seed to randomly generate as many random numbers as there are feature records in the target set element, and save the generated random numbers as the initial random step array, where each random number takes a value between 0 and the target set-element size.
Step 230: calculate the sum of the random numbers in the initial random step array.
Step 240: determine whether the sum of the random numbers in the initial random step array is consistent with the target set-element size; if so, perform step 250; if not, perform step 260.
Step 250: use the initial random step array as the target random step array, and continue with step 270.
Step 260: correct the random numbers in the initial random step array at most twice to obtain the target random step array, where the error between the sum of the random numbers in the target random step array and the target set-element size satisfies a preset error rate; continue with step 270.
It should be noted that, for each set element obtained by the division, the sample library of the set element should span the whole set element so that the sampling range of the feature records covers it. Ideally, the sum of the random numbers in the target random step array equals the target set-element size.
However, because the numbers are random, there is no guarantee that the sum of the random numbers in the generated initial random step array equals the target set-element size. The random numbers in the initial random step array therefore need to be corrected at most twice, so that the error between the sum of the random numbers in the resulting target random step array and the target set-element size satisfies the preset error rate.
The first correction method and the second correction method are described in turn below.
Referring to Fig. 2b, a schematic flowchart of obtaining the target random step array through one correction, provided by Embodiment Two of the present invention, is shown. It specifically includes steps 261 to 263.
Step 261: scale every random number in the initial random step array by the same factor and round the result, obtaining the first-corrected random step array, where the scaling factor is the ratio of the target set-element size to the sum of the random numbers in the initial random step array.
Step 262: determine whether the error between the sum of the random numbers in the first-corrected random step array and the target set-element size satisfies the preset error rate; if so, perform step 263.
Step 263: use the first-corrected random step array as the target random step array.
In other words, when the sum of the random numbers in the initial random step array is determined to be inconsistent with the target set-element size, the first correction is applied to every random number in the initial random step array. Specifically, each random number in the initial random step array is scaled proportionally by the ratio of the target set-element size G to the sum S of the random numbers in the initial random step array (that is, multiplied by G ÷ S) and then rounded.
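The first correction and the error check of step 262 can be sketched as follows. The function names are hypothetical. Two points are assumptions of this sketch: Python's built-in `round` (which rounds halves to even) stands in for the unspecified rounding rule, and the error check is read as |sum − G| ≤ G × error rate, which matches the bounds L = G − G × rate and U = G + G × rate that the text gives for the second correction. The all-zero case is not covered by the text and is rejected here.

```python
def first_correction(steps, target_size):
    """Scale every step by G / S and round, so the sum moves close to G."""
    total = sum(steps)  # S
    if total == 0:
        raise ValueError("an all-zero step array cannot be rescaled")
    factor = target_size / total  # G / S
    return [round(step * factor) for step in steps]


def within_error(steps, target_size, error_rate):
    """Check |sum(steps) - G| <= G * error_rate (an assumed reading of the error rate)."""
    return abs(sum(steps) - target_size) <= target_size * error_rate


corrected = first_correction([1, 2, 3], 12)
print(corrected, within_error(corrected, 12, 0.0))  # [2, 4, 6] True
```

Because each element is rounded independently, the corrected sum can still differ slightly from G, which is why the error check is needed afterwards.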
In this approach, when the sum of the random numbers in the initial random step array is determined to be inconsistent with the target set-element size, the first correction is applied to all random numbers in the initial random step array, so that the error between the sum of the random numbers in the first-corrected random step array and the target set-element size satisfies the preset error rate, which yields the target random step array. Because the error between the sum of the random numbers in the target random step array and the target set-element size satisfies the preset error rate, the sampling range of the feature records obtained in each set element covers that set element, so both the randomness of sampling and the accuracy of the sampling rate are satisfied.
Referring to Fig. 2c, a schematic flowchart of obtaining the target random step array through two corrections, provided by Embodiment Two of the present invention, is shown. It specifically includes steps 261 to 267.
Step 261: scale every random number in the initial random step array by the same factor and round the result, obtaining the first-corrected random step array, where the scaling factor is the ratio of the target set-element size to the sum of the random numbers in the initial random step array.
Step 262: determine whether the error between the sum of the random numbers in the first-corrected random step array and the target set-element size satisfies the preset error rate; if so, perform step 263; if not, perform step 264.
Step 263: use the first-corrected random step array as the target random step array.
In other words, when the sum of the random numbers in the initial random step array is determined to be inconsistent with the target set-element size, the first correction is applied to every random number in the initial random step array. If the error between the sum of the random numbers in the first-corrected random step array and the target set-element size satisfies the preset error rate, one correction is enough to obtain the target random step array. Otherwise, a second correction must be applied to the first-corrected random step array, so that the error between the sum of the random numbers in the second-corrected random step array and the target set-element size satisfies the preset error rate, which yields the target random step array. The second correction method specifically includes steps 264 to 267.
Step 264, according to the target element of set scale and the default error rate, determine the lower limit of element of set coverage And higher limit, continue executing with step 265.
Specifically, according to the target element of set scale G and the default error rate, the lower limit of element of set coverage is determined L and higher limit U, wherein, L=G-G × default error rate, U=G+G × default error rate.
Step 265, the element of set coverage lower limit between higher limit, randomly selecting value as amendment Parameter, determine the inclined of each random number sum that the corrected parameter includes with the first time revised arbitrary width array Difference, continue executing with step 266.
That is, in the scope [L, U], one value W is as corrected parameter for random selection, and determine W with after first time amendment Each random number sum S for including of arbitrary width array between deviation D.
Step 266, randomly select number in the first time revised arbitrary width array and the deviation is identical Random number, continue executing with step 267.
Specifically, D random number is randomly selected in revised arbitrary width array for the first time, is corrected as second Amendment object.
Step 267: If the correction parameter is greater than the sum of the random numbers in the arbitrary width array after the first correction, add 1 to each of the randomly selected random numbers, whose number equals the deviation, to obtain the arbitrary width array after the second correction, which is used as the target arbitrary width array;
If the correction parameter is less than the sum of the random numbers in the arbitrary width array after the first correction, subtract 1 from each of the randomly selected random numbers, whose number equals the deviation, to obtain the arbitrary width array after the second correction, which is used as the target arbitrary width array.
That is, if W is greater than the sum S of the random numbers in the arbitrary width array after the first correction, each of the D random numbers randomly selected from that array is increased by 1; otherwise, each of the D randomly selected random numbers is decreased by 1.
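The second correction of steps 264 to 267 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation of the embodiment: the function name `second_correction` is hypothetical, W is assumed to be an integer drawn uniformly from [L, U], and D is assumed not to exceed the length of the array (the text does not say how a larger D would be handled).

```python
import math
import random

def second_correction(steps, target_scale, error_rate, rng):
    """Adjust D randomly chosen steps by 1 so that their sum equals W."""
    # Step 264: bounds of the element of set coverage, L = G - G*e and U = G + G*e.
    lower = target_scale - target_scale * error_rate
    upper = target_scale + target_scale * error_rate
    # Step 265: pick W in [L, U] (assumed to be an integer) and compute D = |W - S|.
    w = rng.randint(math.ceil(lower), math.floor(upper))
    s = sum(steps)
    d = abs(w - s)
    if d > len(steps):
        # Assumption: the source does not cover this case, so it is rejected here.
        raise ValueError("deviation exceeds the number of random numbers")
    # Steps 266 and 267: add 1 to D distinct entries if W > S, otherwise subtract 1.
    delta = 1 if w > s else -1
    corrected = list(steps)
    for index in rng.sample(range(len(corrected)), d):
        corrected[index] += delta
    return corrected

rng = random.Random(2015)  # fixed seed only to make the example reproducible
first_corrected = [6] * 15  # S = 90, outside a 2% error around G = 100
target_steps = second_correction(first_corrected, 100, 0.02, rng)
print(sum(target_steps))  # a value W within [98, 102]
```

Because exactly D entries change by 1 each, the corrected sum equals W and therefore lies within [L, U], which is the property the second correction relies on.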
In this manner, when the sum of the random numbers in the initial random step-length array is determined to be inconsistent with the target element of set scale, a first correction is applied to all random numbers in the initial random step-length array. When the error between the sum of the random numbers in the arbitrary width array after the first correction and the target element of set scale meets the default error rate, the target arbitrary width array is obtained with a single correction. When that error still does not meet the default error rate, a second correction is applied to obtain the target arbitrary width array. Because the error between the sum of the random numbers in the target arbitrary width array and the target element of set scale meets the default error rate, the sampling range of the feature records obtained in each element of set produced by the division covers that element of set, so both the randomness of sampling and the accuracy of the sample rate can be satisfied.
It should be noted that the first correction applies to all random numbers in the initial random step-length array, whereas the second correction only needs to correct a randomly selected part of the random numbers in the arbitrary width array after the first correction and does not involve all of them.
Step 270: Divide the whole target record according to the target element of set scale.
Step 280: For each element of set obtained by the division, obtain the corresponding feature records in the element of set using the target arbitrary width array, as the Sample Storehouse corresponding to the element of set.
The concrete operations, provided in embodiment one above, for obtaining the corresponding feature records in an element of set using the target arbitrary width array apply equally to this step and are not repeated here.
Step 290: Determine the union of the Sample Storehouses corresponding to the elements of set, as the feature database of the whole target record.
In the technical scheme of this embodiment, the target element of set scale is determined and used to divide the whole target record in a specified table stored by the database management system into elements of set. The number of feature records in a target element of set is determined and used as the capacity of the initial random step-length array, and the corresponding random numbers are generated with the random seed to obtain the initial random step-length array. When the sum of the random numbers in the initial random step-length array is determined to be inconsistent with the target element of set scale, the target arbitrary width array is obtained through at most two corrections. The target arbitrary width array controls the number of feature records collected in each element of set, and it only needs to be generated once to be used for all elements of set. This reduces the performance cost of collecting feature records in the database management system, reduces the cost of the feature records collected for CBO analysis, and improves the efficiency of CBO analysis. In addition, because the error between the sum of the random numbers in the target arbitrary width array and the target element of set scale meets the default error rate, the sampling range of the feature records obtained in each element of set produced by the division covers that element of set, so both the randomness of sampling and the accuracy of the sample rate can be satisfied.
Embodiment three
Referring to Fig. 3, a structural schematic diagram of the feature database generating device provided by embodiment three of the present invention is shown. The device includes: parameter configuration module 310, initial random step-length array generation module 320, target arbitrary width array generation module 330 and feature database generation module 340.
Specifically, parameter configuration module 310 is configured to determine the target element of set scale and the number of feature records in a target element of set according to the default initial element of set scale and the sample percentage. Initial random step-length array generation module 320 is configured to randomly generate, using a default random seed, as many random numbers as there are feature records in a target element of set, and to save the generated random numbers as the initial random step-length array, where each random number takes a value between 0 and the target element of set scale. Target arbitrary width array generation module 330 is configured to calculate the sum of the random numbers in the initial random step-length array and, when that sum is determined to be consistent with the target element of set scale, to use the initial random step-length array as the target arbitrary width array. Feature database generation module 340 is configured to divide the whole target record according to the target element of set scale; for each element of set obtained by the division, to obtain the corresponding feature records in the element of set using the target arbitrary width array, as the Sample Storehouse corresponding to the element of set; and to determine the union of the Sample Storehouses corresponding to the elements of set as the feature database of the whole target record.
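The behaviour of modules 320 and 330 described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names `generate_initial_steps` and `is_consistent` are hypothetical, the random numbers are assumed to be integers, and the range from 0 to the target element of set scale is assumed to include both ends.

```python
import random

def generate_initial_steps(target_scale: int, feature_count: int, seed: int) -> list[int]:
    """Draw feature_count random step lengths, each in [0, target_scale]."""
    rng = random.Random(seed)  # a fixed seed makes the array reproducible
    return [rng.randint(0, target_scale) for _ in range(feature_count)]

def is_consistent(steps: list[int], target_scale: int) -> bool:
    """True when the step lengths sum exactly to the target element of set scale."""
    return sum(steps) == target_scale

initial_steps = generate_initial_steps(target_scale=100, feature_count=5, seed=42)
# When the sum already equals the scale, the initial array serves as the target array;
# None marks the case in which a correction is still required.
target_steps = initial_steps if is_consistent(initial_steps, 100) else None
```

Seeding a dedicated `random.Random` instance keeps the generated array independent of any other use of the global random state.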
In the technical scheme of this embodiment, the target element of set scale is determined and used to divide the whole target record in a specified table stored by the database management system into elements of set. The number of feature records in a target element of set is determined and used as the capacity of the initial random step-length array, and the corresponding random numbers are generated with the random seed to obtain the initial random step-length array. The sum of the random numbers in the initial random step-length array is checked for consistency with the target element of set scale, and an initial random step-length array that passes this check is used as the target arbitrary width array. The target arbitrary width array controls the number of feature records collected in each element of set, and it only needs to be generated once to be used for all elements of set. This reduces the performance cost of collecting feature records in the database management system, reduces the cost of the feature records collected for CBO analysis, and improves the efficiency of CBO analysis. In addition, because the sum of the random numbers in the target arbitrary width array is consistent with the target element of set scale, the sampling range of the feature records obtained in each element of set produced by the division covers that element of set, so both the randomness of sampling and the accuracy of the sample rate can be satisfied.
In the above scheme, target arbitrary width array generation module 330 can further be configured to, after the sum of the random numbers in the initial random step-length array is calculated and before the whole target record is divided according to the target element of set scale, apply at most two corrections to the random numbers in the initial random step-length array when that sum is determined to be inconsistent with the target element of set scale, so as to obtain the target arbitrary width array, where the error between the sum of the random numbers in the target arbitrary width array and the target element of set scale meets the default error rate.
Further, target arbitrary width array generation module 330 preferably includes: a calculation submodule, a first correction submodule and a target arbitrary width array generation submodule.
The calculation submodule is configured to calculate the sum of the random numbers in the initial random step-length array. The first correction submodule is configured to, after the sum of the random numbers in the initial random step-length array is calculated and before the whole target record is divided according to the target element of set scale, scale every random number in the initial random step-length array by the same proportion and round the results when that sum is determined to be inconsistent with the target element of set scale, so as to obtain the arbitrary width array after the first correction, where the zoom factor is the ratio of the target element of set scale to the sum of the random numbers in the initial random step-length array. The target arbitrary width array generation submodule is configured to use the initial random step-length array as the target arbitrary width array when the sum of the random numbers in the initial random step-length array is determined to be consistent with the target element of set scale; or to use the arbitrary width array after the first correction as the target arbitrary width array when the error between the sum of its random numbers and the target element of set scale is determined to meet the default error rate.
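The first correction and the error check performed by these submodules can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `first_correction` and `within_error` are hypothetical, the rounding mode is assumed to be Python's built-in `round` (the text only says the scaled values are rounded), and an all-zero array is rejected because the zoom factor would be undefined.

```python
def first_correction(steps, target_scale):
    """Scale every step by target_scale / sum(steps) and round the result."""
    total = sum(steps)
    if total == 0:
        # Assumption: the source does not cover an all-zero array.
        raise ValueError("cannot scale an array whose sum is zero")
    zoom_factor = target_scale / total
    return [round(step * zoom_factor) for step in steps]

def within_error(steps, target_scale, error_rate):
    """True when |sum(steps) - G| does not exceed G * error_rate."""
    return abs(sum(steps) - target_scale) <= target_scale * error_rate

initial_steps = [10, 40, 70, 80]                 # sum 200, target scale 100
corrected = first_correction(initial_steps, 100)
print(corrected)                                 # [5, 20, 35, 40]
print(within_error(corrected, 100, 0.05))        # True
```

Rounding can leave the sum slightly off the target scale, for example [30, 30, 30] scaled to a target of 100 becomes [33, 33, 33] with a sum of 99, which is why the error check follows the first correction.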
Further, target arbitrary width array generation module 330 may also include a second correction submodule, which is configured to, after the arbitrary width array after the first correction is obtained and when the error between the sum of its random numbers and the target element of set scale is determined not to meet the default error rate: determine the lower limit and the upper limit of the element of set coverage according to the target element of set scale and the default error rate; randomly select a value between the lower limit and the upper limit of the element of set coverage as the correction parameter, and determine the deviation between the correction parameter and the sum of the random numbers in the arbitrary width array after the first correction; randomly select, from the arbitrary width array after the first correction, as many random numbers as the deviation; if the correction parameter is greater than the sum of the random numbers in the arbitrary width array after the first correction, add 1 to each of the randomly selected random numbers to obtain the arbitrary width array after the second correction, which is used as the target arbitrary width array; and if the correction parameter is less than that sum, subtract 1 from each of the randomly selected random numbers to obtain the arbitrary width array after the second correction, which is used as the target arbitrary width array.
In the above scheme, parameter configuration module 310 can specifically be configured to:
Calculate the product of the default initial element of set scale and the default sample percentage;
When the product of the initial element of set scale and the sample percentage is less than 1, expand the initial element of set scale until the product of the expanded element of set scale and the sample percentage is greater than or equal to 1, determine the current expanded element of set scale as the target element of set scale, and determine the rounded product of the target element of set scale and the sample percentage as the number of feature records in a target element of set.
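The parameter configuration described above can be sketched as follows. This is a minimal illustration and several details are assumptions: the function name `configure_parameters` is hypothetical, the scale is expanded by doubling (the text only says it is expanded), the sample percentage is given as a fraction such as 0.001, and the product is rounded half up.

```python
def configure_parameters(initial_scale: int, sample_fraction: float) -> tuple[int, int]:
    """Return (target element of set scale, feature records per element of set)."""
    if initial_scale <= 0 or sample_fraction <= 0:
        raise ValueError("scale and sample fraction must be positive")
    scale = initial_scale
    # Expand the scale until at least one record would be sampled per element of set.
    while scale * sample_fraction < 1:
        scale *= 2  # assumption: doubling is one possible expansion rule
    feature_count = int(scale * sample_fraction + 0.5)  # round half up (assumption)
    return scale, feature_count

print(configure_parameters(1000, 0.01))   # (1000, 10): no expansion needed
print(configure_parameters(100, 0.001))   # (1600, 2): expanded 100 -> 1600
```

With a product of at least 1 the loop body never runs, so the initial element of set scale is used unchanged as the target element of set scale.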
In the above scheme, if the whole target record is exactly divisible by the target element of set scale, the elements of set obtained by the division include: elements of set whose scale is the target element of set scale, their number being the quotient of the whole target record divided by the target element of set scale;
If the whole target record is not exactly divisible by the target element of set scale, the elements of set obtained by the division include: elements of set whose scale is the target element of set scale, their number being the integer quotient of the whole target record divided by the target element of set scale, and one element of set whose scale is the remainder of the whole target record divided by the target element of set scale.
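The division into elements of set can be sketched as follows. This is a minimal illustration: the function name `partition` is hypothetical, and each element of set is represented as a (start index, scale) pair over a 0-based record numbering, which is an assumption of the example.

```python
def partition(total_records: int, target_scale: int) -> list[tuple[int, int]]:
    """Split records 0..total_records-1 into (start, scale) elements of set."""
    full_count, remainder = divmod(total_records, target_scale)
    # full_count elements of set have exactly the target scale.
    groups = [(i * target_scale, target_scale) for i in range(full_count)]
    # A non-zero remainder forms one final, smaller element of set.
    if remainder:
        groups.append((full_count * target_scale, remainder))
    return groups

print(partition(200, 100))  # [(0, 100), (100, 100)]
print(partition(250, 100))  # [(0, 100), (100, 100), (200, 50)]
```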
Accordingly, feature database generation module 340 is specifically configured to:
For each element of set whose scale is the target element of set scale, obtain the corresponding first feature record in the element of set according to the first random number in the target arbitrary width array; and, according to the i-th random number in the target arbitrary width array, obtain the record in the element of set whose offset relative to the (i-1)-th feature record is the i-th random number, as the i-th feature record, where i ≥ 2;
For the element of set whose scale is less than the target element of set scale, obtain the corresponding first feature record in the element of set according to the first random number in the target arbitrary width array; and, according to the i-th random number in the target arbitrary width array, obtain the record in the element of set whose offset relative to the (i-1)-th feature record is the i-th random number, as the i-th feature record, where i ≥ 2, until the sum of the first i+1 random numbers in the target arbitrary width array exceeds the scale of the element of set, at which point the operation of obtaining feature records stops.
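The acquisition of feature records and the union of the Sample Storehouses can be sketched as follows. This is a minimal illustration with several assumptions that the text leaves open: the function names `sample_group` and `build_feature_db` are hypothetical, the running sum of the random numbers is treated as a 1-based position inside the element of set, a running sum of 0 is skipped, and acquisition stops in any element of set once the running sum exceeds its scale.

```python
def sample_group(start: int, size: int, steps: list[int]) -> list[int]:
    """Return 0-based record indexes sampled from one element of set."""
    picked = []
    offset = 0
    for step in steps:
        offset += step          # offset relative to the previous feature record
        if offset > size:       # running sum exceeds the scale: stop acquiring
            break
        if offset >= 1:         # assumption: position 0 precedes the first record
            picked.append(start + offset - 1)
    return picked

def build_feature_db(total_records: int, target_scale: int, steps: list[int]) -> list[int]:
    """Union of the per-element samples, reusing the same step array for each."""
    feature_db = set()
    for start in range(0, total_records, target_scale):
        size = min(target_scale, total_records - start)  # last element may be smaller
        feature_db.update(sample_group(start, size, steps))
    return sorted(feature_db)

steps = [3, 4, 2, 1]  # sums to the target scale of 10
print(build_feature_db(25, 10, steps))
# [2, 6, 8, 9, 12, 16, 18, 19, 22]
```

The same step array is applied to every element of set, so the array is generated once, and the final smaller element of set simply yields fewer feature records.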
The feature database generating device provided by the embodiments of the present invention can perform the feature database generation method provided by any embodiment of the present invention, and possesses the functional modules and beneficial effects corresponding to the method performed.
Obviously, those skilled in the art will understand that each module or step of the present invention described above can be implemented by the server and client described above. Alternatively, the embodiments of the present invention can be implemented with a program executable by a computer device, so that the program can be stored in a storage device and executed by a processor. The program can be stored in a computer-readable storage medium, and the storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, or the like. Alternatively, the modules or steps can each be made into a separate integrated circuit module, or several of them can be made into a single integrated circuit module. Thus, the present invention is not restricted to any specific combination of hardware and software.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical scheme of the present invention, not to limit it. The preferred embodiments among them do not limit the invention either; to those skilled in the art, the present invention can have various changes and modifications. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

  1. A method for generating a feature database, characterized by comprising:
    According to a default initial element of set scale and a sample percentage, determining a target element of set scale and the number of feature records in a target element of set;
    Randomly generating, using a default random seed, as many random numbers as there are feature records in the target element of set, and saving the generated random numbers as an initial random step-length array, each random number taking a value between 0 and the target element of set scale;
    Calculating the sum of the random numbers in the initial random step-length array;
    When the sum of the random numbers in the initial random step-length array is determined to be consistent with the target element of set scale, using the initial random step-length array as a target arbitrary width array;
    Dividing the whole target record according to the target element of set scale;
    For each element of set obtained by the division, obtaining the corresponding feature records in the element of set using the target arbitrary width array, as the Sample Storehouse corresponding to the element of set;
    Determining the union of the Sample Storehouses corresponding to the elements of set, as the feature database of the whole target record.
  2. The method according to claim 1, characterized in that, after calculating the sum of the random numbers in the initial random step-length array and before dividing the whole target record according to the target element of set scale, the method further comprises:
    When the sum of the random numbers in the initial random step-length array is determined to be inconsistent with the target element of set scale, applying at most two corrections to the random numbers in the initial random step-length array to obtain the target arbitrary width array, wherein the error between the sum of the random numbers in the target arbitrary width array and the target element of set scale meets a default error rate.
  3. The method according to claim 2, characterized in that applying at most two corrections to the random numbers in the initial random step-length array to obtain the target arbitrary width array comprises:
    Scaling every random number in the initial random step-length array by the same proportion and rounding the results, to obtain the arbitrary width array after the first correction, wherein the zoom factor is the ratio of the target element of set scale to the sum of the random numbers in the initial random step-length array;
    When the error between the sum of the random numbers in the arbitrary width array after the first correction and the target element of set scale is determined to meet the default error rate, using the arbitrary width array after the first correction as the target arbitrary width array.
  4. The method according to claim 3, characterized in that, after obtaining the arbitrary width array after the first correction, the method further comprises:
    When the error between the sum of the random numbers in the arbitrary width array after the first correction and the target element of set scale is determined not to meet the default error rate, determining the lower limit and the upper limit of the element of set coverage according to the target element of set scale and the default error rate;
    Randomly selecting a value between the lower limit and the upper limit of the element of set coverage as a correction parameter, and determining the deviation between the correction parameter and the sum of the random numbers in the arbitrary width array after the first correction;
    Randomly selecting, from the arbitrary width array after the first correction, as many random numbers as the deviation;
    If the correction parameter is greater than the sum of the random numbers in the arbitrary width array after the first correction, adding 1 to each of the randomly selected random numbers, whose number equals the deviation, to obtain the arbitrary width array after the second correction, which is used as the target arbitrary width array;
    If the correction parameter is less than the sum of the random numbers in the arbitrary width array after the first correction, subtracting 1 from each of the randomly selected random numbers, whose number equals the deviation, to obtain the arbitrary width array after the second correction, which is used as the target arbitrary width array.
  5. The method according to any one of claims 1-4, characterized in that determining the target element of set scale and the number of feature records in a target element of set according to the default initial element of set scale and the sample percentage comprises:
    Calculating the product of the default initial element of set scale and the default sample percentage;
    When the product of the initial element of set scale and the sample percentage is less than 1, expanding the initial element of set scale until the product of the expanded element of set scale and the sample percentage is greater than or equal to 1, determining the current expanded element of set scale as the target element of set scale, and determining the rounded product of the target element of set scale and the sample percentage as the number of feature records in a target element of set.
  6. The method according to any one of claims 1-4, characterized in that:
    If the whole target record is exactly divisible by the target element of set scale, the elements of set obtained by the division include: elements of set whose scale is the target element of set scale, their number being the quotient of the whole target record divided by the target element of set scale;
    If the whole target record is not exactly divisible by the target element of set scale, the elements of set obtained by the division include: elements of set whose scale is the target element of set scale, their number being the integer quotient of the whole target record divided by the target element of set scale, and one element of set whose scale is the remainder of the whole target record divided by the target element of set scale.
  7. The method according to claim 6, characterized in that obtaining the corresponding feature records in the element of set using the target arbitrary width array comprises:
    For each element of set whose scale is the target element of set scale, obtaining the corresponding first feature record in the element of set according to the first random number in the target arbitrary width array; and, according to the i-th random number in the target arbitrary width array, obtaining the record in the element of set whose offset relative to the (i-1)-th feature record is the i-th random number, as the i-th feature record, where i ≥ 2;
    For the element of set whose scale is less than the target element of set scale, obtaining the corresponding first feature record in the element of set according to the first random number in the target arbitrary width array; and, according to the i-th random number in the target arbitrary width array, obtaining the record in the element of set whose offset relative to the (i-1)-th feature record is the i-th random number, as the i-th feature record, where i ≥ 2, until the sum of the first i+1 random numbers in the target arbitrary width array exceeds the scale of the element of set, at which point the operation of obtaining feature records stops.
  8. A device for generating a feature database, characterized by comprising:
    A parameter configuration module, configured to determine a target element of set scale and the number of feature records in a target element of set according to a default initial element of set scale and a sample percentage;
    An initial random step-length array generation module, configured to randomly generate, using a default random seed, as many random numbers as there are feature records in the target element of set, and to save the generated random numbers as an initial random step-length array, each random number taking a value between 0 and the target element of set scale;
    A target arbitrary width array generation module, configured to calculate the sum of the random numbers in the initial random step-length array and, when that sum is determined to be consistent with the target element of set scale, to use the initial random step-length array as a target arbitrary width array;
    A feature database generation module, configured to divide the whole target record according to the target element of set scale; for each element of set obtained by the division, to obtain the corresponding feature records in the element of set using the target arbitrary width array, as the Sample Storehouse corresponding to the element of set; and to determine the union of the Sample Storehouses corresponding to the elements of set, as the feature database of the whole target record.
  9. The device according to claim 8, characterized in that:
    The target arbitrary width array generation module is further configured to, after calculating the sum of the random numbers in the initial random step-length array and before the whole target record is divided according to the target element of set scale, apply at most two corrections to the random numbers in the initial random step-length array when that sum is determined to be inconsistent with the target element of set scale, so as to obtain the target arbitrary width array, wherein the error between the sum of the random numbers in the target arbitrary width array and the target element of set scale meets a default error rate.
  10. The device according to claim 9, characterized in that the target arbitrary width array generation module includes:
    A calculation submodule, configured to calculate the sum of the random numbers in the initial random step-length array;
    A first correction submodule, configured to, after the sum of the random numbers in the initial random step-length array is calculated and before the whole target record is divided according to the target element of set scale, scale every random number in the initial random step-length array by the same proportion and round the results when that sum is determined to be inconsistent with the target element of set scale, so as to obtain the arbitrary width array after the first correction, wherein the zoom factor is the ratio of the target element of set scale to the sum of the random numbers in the initial random step-length array;
    A target arbitrary width array generation submodule, configured to use the initial random step-length array as the target arbitrary width array when the sum of the random numbers in the initial random step-length array is determined to be consistent with the target element of set scale; or to use the arbitrary width array after the first correction as the target arbitrary width array when the error between the sum of its random numbers and the target element of set scale is determined to meet the default error rate;
    The target arbitrary width array generation module further includes:
    A second correction submodule, configured to, after the arbitrary width array after the first correction is obtained and when the error between the sum of its random numbers and the target element of set scale is determined not to meet the default error rate: determine the lower limit and the upper limit of the element of set coverage according to the target element of set scale and the default error rate; randomly select a value between the lower limit and the upper limit of the element of set coverage as a correction parameter, and determine the deviation between the correction parameter and the sum of the random numbers in the arbitrary width array after the first correction; randomly select, from the arbitrary width array after the first correction, as many random numbers as the deviation; if the correction parameter is greater than the sum of the random numbers in the arbitrary width array after the first correction, add 1 to each of the randomly selected random numbers to obtain the arbitrary width array after the second correction, which is used as the target arbitrary width array; and if the correction parameter is less than that sum, subtract 1 from each of the randomly selected random numbers to obtain the arbitrary width array after the second correction, which is used as the target arbitrary width array.
CN201510173241.0A 2015-04-13 2015-04-13 The generation method and device of feature database Active CN104809175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510173241.0A CN104809175B (en) 2015-04-13 2015-04-13 The generation method and device of feature database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510173241.0A CN104809175B (en) 2015-04-13 2015-04-13 The generation method and device of feature database

Publications (2)

Publication Number Publication Date
CN104809175A CN104809175A (en) 2015-07-29
CN104809175B true CN104809175B (en) 2018-02-27

Family

ID=53693997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510173241.0A Active CN104809175B (en) 2015-04-13 2015-04-13 The generation method and device of feature database

Country Status (1)

Country Link
CN (1) CN104809175B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108279864A (en) * 2018-01-31 2018-07-13 上海集成电路研发中心有限公司 System random number generation method
CN112308330B (en) * 2020-11-09 2021-07-09 清华大学 Digital accident database construction method and device and computer equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102081651A (en) * 2010-12-29 2011-06-01 北京像素软件科技股份有限公司 Table division method for online game database
CN102999594A (en) * 2012-11-16 2013-03-27 上海交通大学 Safety nearest neighbor query method and system based on maximum division and random data block
CN104156451A (en) * 2014-08-18 2014-11-19 深圳市一五一十网络科技有限公司 Data storage managing method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8880506B2 (en) * 2009-10-16 2014-11-04 Oracle International Corporation Leveraging structured XML index data for evaluating database queries
US9128984B2 (en) * 2009-02-02 2015-09-08 Hewlett-Packard Development Company, L.P. Query plan analysis of alternative plans using robustness mapping
CN104216894B (en) * 2013-05-31 2017-07-14 国际商业机器公司 Method and system for data query

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
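Query optimization for big data partitioning in a hybrid MapReduce environment (混合MapReduce环境下大数据划分的查询优化); Li Fu et al.; 《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology); 2012-08-15; pp. 877-887 *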

Also Published As

Publication number Publication date
CN104809175A (en) 2015-07-29

Similar Documents

Publication Publication Date Title
CN108364085B (en) Takeout delivery time prediction method and device
AU2018221097B2 (en) Data processing method and device
CN102930062B (en) Method for rapid horizontal scaling of a database
Zhijia et al. Study of the Xinanjiang model parameter calibration
CN104281940A (en) Method and device for providing data processing mode list through communication network
CN104809175B (en) The generation method and device of feature database
CN108204819A (en) Automatic map data testing method and device, and hybrid navigation system
CN104077438A (en) Power grid large-scale topological structure construction method and system
CN103714124B (en) Ultra-large-scale low-voltage data processing method
CN106250457A (en) Query processing method and system for materialized views on a big data platform
CN101576849A (en) Method for generating test data
CN108632047A (en) Method and device for determining tariff data
CN109872159B (en) Block chain consensus method and architecture
CN110019205A (en) Data storage and restoration method, device, and computer equipment
CN111190814A (en) Software test case generation method and device, storage medium and terminal
CN112787402B (en) District switch physical topology identification method based on power grid full data acquisition
CN106022590B (en) Voltage quality evaluation method and device for active power distribution network
CN112733234A (en) Three-dimensional bridge automatic calculation and generation device based on cable information transmission
CN103281202A (en) System of browser/server architecture and front-end presentation method of system
CN104299065B (en) Method for verifying model correctness between main and standby dispatching automation systems
CN109447512B (en) Large power grid reliability assessment method based on uniform design
CN110784349A (en) Automatic generation method and device for power communication equipment and network cutover scheme
JP5353641B2 (en) Business process structure estimation method, program, and apparatus
WO2016206191A1 (en) Data processing method and device
CN107133281A (en) Packet-based global multi-query optimization method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant