CN101567694B - Multilevel data sampling method based on connected subgraph - Google Patents

Multilevel data sampling method based on connected subgraph

Info

Publication number
CN101567694B
Authority
CN
China
Prior art keywords
sampling
data
distortion
sampled
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 200910031265
Other languages
Chinese (zh)
Other versions
CN101567694A (en)
Inventor
钱宇
张康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN 200910031265
Publication of CN101567694A
Application granted
Publication of CN101567694B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a multilevel data sampling method based on connected subgraphs, characterized in that the method comprises the following steps: (1) building a K-nearest-neighbor graph or a K-mutual-neighbor graph on the input data, where K is an integer; (2) obtaining the connected subgraphs of the graph built in step (1); (3) computing the mean or the median of each connected subgraph as a sampling point, the set of sampling points being the result of the sampling; and (4) taking the sampling result of step (3) as new input data and repeating steps (1) to (3) until a sampling termination condition is met, thereby producing the required multilevel data sampling. Because the sampling process is built on graphs, the method samples efficiently and runs quickly; and because a termination condition can be set, the system can stop sampling automatically once the sampling points become too few to represent the source data.

Description

A multilevel data sampling method based on connected subgraphs
Technical field
The application relates to a new data sampling method in the field of information processing and statistics, which samples data in order to reduce the cost of data storage, transmission time, and data analysis.
Background art
Data sampling is widely used in information processing and statistics. In particular, with the spread of digital devices and networks, the volume of data to be processed is growing rapidly, and the growth rate keeps accelerating. This growth is constrained by storage capacity, communication bandwidth, the system's capacity to interpret data, and other factors. In such cases the data often have to be sampled or compressed, so that storage, transmission, and analysis are accelerated while the essential information in the data is preserved.
The simplest and most widely used sampling method is random sampling. It runs quickly and is easy to implement, but its shortcomings are equally clear. First, the positions of the sampling points cannot be controlled, so the sampling process is not reproducible and the sampling error is hard to control. Second, regions with few data points have a low probability of being selected and are usually ignored, leaving the whole region without a representative point. In addition, the user must supply the number of sampling points in advance, and an ordinary user can hardly know how many sampling points keep the sample as small as possible without serious distortion. An important improvement to random sampling adjusts the positions of the sampling points according to density (that is, how densely the data are packed): the lower the density of a region, the higher its sampling frequency, which ensures that regions with few data points also receive sampling points. This, however, increases the computational cost and does not remedy the other defects of random sampling.
Vector quantization is another typical sampling and compression method, and the LBG (Linde-Buzo-Gray) algorithm is a typical vector quantization method. It uses the K-means clustering algorithm to produce representative points, and then applies the same set of representative points to new data to compute sampling points. The drawbacks of the LBG algorithm are its long running time and, again, the need for the user to specify the sample size. The LBG algorithm is a supervised learning method and is unsuitable where no training data are available.
The PNN algorithm merges the two closest data points at each step until the number of data points has fallen to the sample size specified by the user. Its complexity is $O(N^3)$, and because the user must specify the sample size, it shares the problem that an ordinary user can hardly determine the number of sampling points.
Summary of the invention
The object of the invention is to provide a multilevel data sampling method based on connected subgraphs that terminates multilevel sampling automatically while reducing algorithm complexity and sampling time.
To achieve this object, the technical scheme adopted by the invention is a multilevel data sampling method based on connected subgraphs, comprising the following steps:
(1) build a K-nearest-neighbor graph or a K-mutual-neighbor graph on the input data; for N data vectors, K is an integer as given in Figure G2009100312657D00021;
(2) obtain the connected subgraphs of the K-nearest-neighbor graph or K-mutual-neighbor graph thus built;
(3) for each connected subgraph, compute its mean or median as a sampling point; the set of all sampling points is the result of this round of sampling;
(4) take the sampling result of step (3) as new input data and repeat steps (1) to (3) until the sampling termination condition is met, thereby achieving the required multilevel data sampling.
In the above technical scheme, K in step (1) is an integer variable; the larger K is, the fewer sampling points are produced, and K = 1 is usually taken unless otherwise specified. A sampling result can itself be sampled further by repeating steps (1) to (3). Note that if step (3) uses the mean, further sampling should also use the mean, and if it uses the median, further sampling should also use the median; the former is called mean sampling and the latter median sampling. Each sampling produces fewer sampling points than the previous one, which lowers data fidelity but yields a more condensed data sample. These samples form a set of data samples in which the number of sampling points decreases step by step, the distortion increases step by step, and the level of abstraction rises step by step. It is easy to show that, if further sampling is never stopped, the last sample contains only one sampling point: in a nearest-neighbor graph every point is linked to at least one other point, so each connected subgraph merges at least two points. At that point the data distortion is greatest but the level of abstraction is highest.
In the above technical scheme, building a K-nearest-neighbor graph or a K-mutual-neighbor graph is prior art, explained as follows:
A) The input data can be treated as a set of data vectors; every data vector has the same number of attributes, and attribute values may be empty.
B) Each data vector can be represented as a point in a Cartesian coordinate system, with the attribute values of the vector as the coordinates of the point.
C) The similarity between a pair of data vectors is the Euclidean distance between the two corresponding points.
D) The nearest-neighbor graph links each data point to the K other data points nearest to it.
E) The K-mutual-neighbor graph requires, for each of the K nearest points of a data point X, a check of whether X is also one of the K nearest points of that point. If it is not, there is no edge between X and that point in the mutual-neighbor graph.
F) A nearest-neighbor graph or mutual-neighbor graph is created from a set of data points.
G) Two data points are defined as close if they lie in the same connected subgraph.
H) A group of close data points can be characterized by its center.
I) The center of a group of data points can be defined as the mean or the median of the group.
J) The sampling process can be carried on continuously: the output of each sampling is the input of the next, so that the sample keeps shrinking.
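Items A) to I) can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's implementation: the function names `neighbour_edges`, `connected_components`, and `sample_once` are hypothetical, the neighbor search is brute force, union-find is one possible way to extract connected subgraphs, and only the mean (not the median) is used as the center.

```python
import math


def neighbour_edges(points, k=1, mutual=False):
    """Edges of the K-nearest-neighbor graph, or of the K-mutual-neighbor graph."""
    n = len(points)
    nearest = []
    for i in range(n):
        # Brute-force search: rank every other point by Euclidean distance.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        nearest.append(set(order[:k]))
    edges = set()
    for i in range(n):
        for j in nearest[i]:
            # In the mutual graph, keep the edge only if i is also a neighbor of j.
            if not mutual or i in nearest[j]:
                edges.add((min(i, j), max(i, j)))
    return edges


def connected_components(n, edges):
    """Group node indices 0..n-1 into connected subgraphs using union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def sample_once(points, k=1, mutual=False):
    """One sampling level: the mean of each connected subgraph is a sampling point."""
    components = connected_components(len(points), neighbour_edges(points, k, mutual))
    dim = len(points[0])
    return [tuple(sum(points[i][axis] for i in comp) / len(comp) for axis in range(dim))
            for comp in components]
```

Applied to the four second-level points of Table 1 in Embodiment 1 with K = 1, this sketch merges them into a single point at about (132.0092, 133.3186), which agrees with the third-level sample listed there.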
In the above technical scheme, the sampling termination condition in step (4) is explained as follows:
A) Suppose the data set G contains N vector points. The first sampling $A_1$ produces sample $D_1$ with $N_1$ sampling points, the second sampling $A_2$ produces sample $D_2$ with $N_2$ sampling points, and so on until only one sampling point remains. According to the algorithm of claim 1, $1 < \dots < N_2 < N_1 < N$, and for all i the distortion of $D_i$ is less than that of $D_{i+1}$.
B) Take as a baseline the reduction in sampling points from $N_i$ to $N_{i+1}$ and the change in distortion from sample $D_i$ to $D_{i+1}$. If, in the next sampling process $A_{i+2}$, the reduction from $N_{i+1}$ to $N_{i+2}$ changes little while the distortion of $D_{i+2}$ increases sharply relative to $D_{i+1}$, the data pattern has been destroyed, and sampling must stop at the end of the previous round.
C) The normal ratio of sampling distortion to sampling-point reduction is initially estimated from the reduction of the first sampling relative to the original data and from the average distance between the original data points themselves. The expected distortion of each subsequent sampling is then estimated from the actual distortion of the previous sampling and the reduction in sampling points of the current sampling.
The actual distortion of a sampling result is defined in terms of the distance from each data point to its nearest sampling point, as expressed by formula (1):
$$ad = \left( \sum_{i=1}^{N} \mathrm{dist}(X_i, C(X_i))^2 / N \right)^{1/2} \tag{1}$$
where ad is the distortion, N is the number of input data points, the input data are $X_1, X_2, \dots, X_N$, the corresponding sampling points are $C(X_1), C(X_2), \dots, C(X_N)$, $C(X_i)$ is the function returning the sampling point of $X_i$, and $\mathrm{dist}(X_i, C(X_i))$ is the Euclidean distance between $X_i$ and $C(X_i)$;
The expected distortion of the sampling result $D_i$ obtained at sampling stage $A_i$ is defined from the actual distortion of the previous stage and the ratio by which the sample shrinks:
$$pd_i = (ad_{i-1} + ad_1)\left(N_{i-1}/N_i\right)^{1/d} - ad_1, \quad \forall i > 1 \tag{2}$$
where d is the dimension of the data, $ad_1$ is the actual distortion of the first sampling stage, $ad_{i-1}$ is the actual distortion of sampling stage $A_{i-1}$, $N_{i-1}$ is the number of sampling points of stage $A_{i-1}$, and $N_i$ is the number of sampling points of stage $A_i$;
When sampling stage $A_t$ satisfies the condition $\forall i < t,\ ad_i \le pd_i$ and $ad_t > pd_t$, that is, when the actual distortion of the t-th sampling exceeds the expected distortion, sampling stops automatically.
Alternatively, the sampling termination condition in step (4) is that the system continues sampling until only one sampling point remains; during this process the result of each sampling is saved, and the user chooses the required sample from the saved results according to the sample size or the sample distortion.
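Formulas (1) and (2) and the automatic stop test can be sketched as follows. The function names are hypothetical, and `actual_distortion` assumes that $C(X_i)$ is the sampling point nearest to $X_i$, as the definition of the actual distortion states; treat this as a sketch under those assumptions.

```python
import math


def actual_distortion(points, samples):
    """Formula (1): RMS Euclidean distance from each point to its nearest sampling point."""
    total = sum(min(math.dist(p, s) for s in samples) ** 2 for p in points)
    return math.sqrt(total / len(points))


def expected_distortion(ad_prev, ad_first, n_prev, n_cur, dim):
    """Formula (2): pd_i = (ad_{i-1} + ad_1) * (N_{i-1} / N_i) ** (1 / d) - ad_1, for i > 1."""
    return (ad_prev + ad_first) * (n_prev / n_cur) ** (1.0 / dim) - ad_first


def stopping_stage(sizes, distortions, dim):
    """Return the first stage t (1-based) with ad_t > pd_t, or None if there is none.

    sizes[i] and distortions[i] describe stage i + 1; stage 1 has no expected distortion.
    """
    for i in range(1, len(sizes)):
        pd = expected_distortion(distortions[i - 1], distortions[0],
                                 sizes[i - 1], sizes[i], dim)
        if distortions[i] > pd:
            return i + 1
    return None
```

With the figures reported in Table 2 of Embodiment 2 (sample sizes 63, 20, 7; actual distortions 8.785595, 17.587867, 44.71084; d = 2), `expected_distortion` gives about 22.4002 for the second stage and about 35.7937 for the third, in agreement with the table, and `stopping_stage` returns 3.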
In a further technical scheme, K = 1 in step (1).
By virtue of the above technical scheme, the invention has the following advantages over the prior art:
1. The invention introduces the concept of a graph into the sampling process and merges, in each round, an entire connected subgraph of nearby points, so sampling is more efficient and the running time is shorter than that of PNN: for N data vectors the complexity of PNN is $O(N^3)$, whereas that of the proposed method is $O(N^2)$, falling to $O(N \log N)$ for low-dimensional data. The sampling process can also be carried on continuously, with the output of each sampling serving as the input of the next. The user need not specify a sample size and can, after sampling finishes, select from the resulting set of samples according to sample attributes such as size and distortion.
2. The invention can set a sampling stop condition, so that the system stops sampling automatically when the sampling points become too few to represent the source data.
Description of the drawings
Fig. 1 is a schematic diagram of the sampling algorithm of Embodiment 1;
Fig. 2 is a schematic diagram of the sampling algorithm of Embodiment 2;
Fig. 3 is a schematic diagram of the data sampling results of Embodiment 2.
Embodiments
The invention is further described below with reference to the drawings and embodiments:
Embodiment 1: as shown in Fig. 1, a multilevel data sampling method based on connected subgraphs comprises the following steps:
(1) build a K-nearest-neighbor graph or a K-mutual-neighbor graph on the input data, with K = 1;
(2) obtain the connected subgraphs of the nearest-neighbor graph or mutual-neighbor graph thus built;
(3) for each connected subgraph, compute its mean as a sampling point; the set of all sampling points is the result of this round of sampling;
(4) take the sampling result of step (3) as new input data and repeat steps (1) to (3) until only one sampling point remains, thereby achieving the required multilevel data sampling.
During this process the result of each sampling is saved, and the user chooses the required sample from the saved results according to the sample size or the sample distortion.
Taking source data of 40 two-dimensional vectors as an example, the sampling results are shown in the table below.
Table 1: source data (40 two-dimensional vectors), sampled three times until one sampling point remains
First sampling: sample size 13, sample distortion 5.037373
Second sampling: sample size 4, sample distortion 14.352662
Third sampling: sample size 1, sample distortion 59.752489
Original data | First sampling | Second sampling | Third sampling
169.7998 74.50672 | 173.4262 72.55558 | 176.273 72.16654 | 132.0092 133.3186
166.7934 63.05775 | 166.173 59.49513 | 112.3492 168.3199
202.7356 76.41366 | 195.7937 79.14297 | 107.1235 148.6582
193.5498 70.38806 | 160.2242 73.59232 | 132.291 144.1296
164.6338 53.93999 | 172.0314 92.31077
195.8784 80.32839 | 189.9896 55.90245
197.9796 81.092 | 106.9004 165.3225
171.0991 69.93897 | 96.29275 147.0986
158.0456 73.13734 | 110.0779 147.4216
167.9566 89.69758 | 114.9997 151.4545
162.4027 74.0473 | 134.6302 138.35
180.0624 64.92613 | 129.9518 149.9093
193.5652 54.91849 | 117.798 171.3173
167.0918 61.48766
188.8248 87.49272
173.3717 93.5637
189.9956 57.91901
172.7436 80.8505
186.4079 54.86986
174.7658 93.67102
106.7336 165.4157
93.78157 158.2717
107.0671 165.2292
97.56 145.3172
107.346 151.3961
113.5398 152.9269
136.9691 137.3639
131.4998 153.2854
129.429 137.3368
94.33297 144.2756
114.8089 168.0962
123.047 174.4868
115.5382 171.3688
114.358 151.3219
110.1604 149.505
128.4038 146.5332
112.7274 141.3637
117.1014 150.1147
99.49649 140.5297
137.4924 140.3492
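The final step of Embodiment 1, in which the user picks one of the saved samples by size or distortion, can be illustrated with the sizes and distortions reported in Table 1. The record layout and the `choose_level` helper are hypothetical, and preferring the most compact qualifying level is one possible selection policy, not one prescribed by the patent.

```python
# Sample sizes and distortions as reported in Table 1 (sampling levels 1 to 3).
LEVELS = [
    {"level": 1, "size": 13, "distortion": 5.037373},
    {"level": 2, "size": 4, "distortion": 14.352662},
    {"level": 3, "size": 1, "distortion": 59.752489},
]


def choose_level(levels, max_size=None, max_distortion=None):
    """Return the most compact saved level that meets the user's limits, or None."""
    candidates = [
        lv for lv in levels
        if (max_size is None or lv["size"] <= max_size)
        and (max_distortion is None or lv["distortion"] <= max_distortion)
    ]
    # Among the levels that satisfy every limit, prefer the smallest sample.
    return min(candidates, key=lambda lv: lv["size"], default=None)
```

For example, a user who tolerates a distortion of at most 15 would receive the second-level sample of 4 points, while a limit of 6 leaves only the first-level sample of 13 points.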
Embodiment 2: as shown in Fig. 2, a multilevel data sampling method based on connected subgraphs comprises the following steps:
(1) build a K-nearest-neighbor graph or a K-mutual-neighbor graph on the input data, with K = 1;
(2) obtain the connected subgraphs of the nearest-neighbor graph or mutual-neighbor graph thus built;
(3) for each connected subgraph, compute its mean as a sampling point; the set of all sampling points is the result of this round of sampling;
(4) take the sampling result of step (3) as new input data and repeat steps (1) to (3) until the sampling termination condition is met, thereby achieving the required multilevel data sampling.
The sampling termination condition in step (4) is as follows.
The actual distortion of a sampling result is defined in terms of the distance from each data point to its nearest sampling point, as expressed by formula (1):
$$ad = \left( \sum_{i=1}^{N} \mathrm{dist}(X_i, C(X_i))^2 / N \right)^{1/2} \tag{1}$$
where ad is the distortion, N is the number of input data points, the input data are $X_1, X_2, \dots, X_N$, the corresponding sampling points are $C(X_1), C(X_2), \dots, C(X_N)$, $C(X_i)$ is the function returning the sampling point of $X_i$, and $\mathrm{dist}(X_i, C(X_i))$ is the Euclidean distance between $X_i$ and $C(X_i)$;
The expected distortion of the sampling result $D_i$ obtained at sampling stage $A_i$ is defined from the actual distortion of the previous stage and the ratio by which the sample shrinks:
$$pd_i = (ad_{i-1} + ad_1)\left(N_{i-1}/N_i\right)^{1/d} - ad_1, \quad \forall i > 1 \tag{2}$$
where d is the dimension of the data, $ad_1$ is the actual distortion of the first sampling stage, $ad_{i-1}$ is the actual distortion of sampling stage $A_{i-1}$, $N_{i-1}$ is the number of sampling points of stage $A_{i-1}$, and $N_i$ is the number of sampling points of stage $A_i$;
When sampling stage $A_t$ satisfies the condition $\forall i < t,\ ad_i \le pd_i$ and $ad_t > pd_t$, that is, when the actual distortion of the t-th sampling exceeds the expected distortion, sampling stops automatically.
Fig. 3 illustrates the result of two successive samplings of a two-dimensional data set using the method of this embodiment. The original data set has 60000 data points; the first round of sampling yields 18612 sampling points, and the second round further reduces the sample to 5272 points. The figure shows that the sample retains the data distribution pattern of the original data. For pattern recognition applications, such a sample represents the original data with less than 9% of the data (5272 of 60000 points, about 8.8%), which greatly reduces data storage space, transmission time, and data analysis time.
Another data example is given in the table below.
Table 2: source data (200 two-dimensional vectors), sampling stops automatically after 3 samplings
First sampling: sample size 63, actual distortion 8.785595, expected distortion N/A
Second sampling: sample size 20, actual distortion 17.587867, expected distortion 22.400175; expected distortion > actual distortion, sampling can continue
Third sampling: sample size 7, actual distortion 44.71084, expected distortion 35.793693; expected distortion < actual distortion, sampling stops
Original data | First sampling | Second sampling | Third sampling
169.7997776 74.50672302 | 173.426227 72.55558 | 176.272997 72.166536 | 114.129588 81.678394
166.7933783 63.05774696 | 166.172992 59.49513 | 112.349174 168.31988 | 98.778549 170.6385
202.7356144 76.41366387 | 195.793659 79.14297 | 102.559628 142.22789 | 239.078243 181.25253
193.5498314 70.38806409 | 160.224158 73.59232 | 132.291001 144.12965 | 142.265443 240.25743
164.6338182 53.93998559 | 172.031384 92.31077 | 239.945324 165.63614 | 299.776329 241.73492
195.8784235 80.3283904 | 189.989563 55.90245 | 238.211163 196.86892 | 96.537169 326.67356
197.9796412 81.09200337 | 106.900354 165.3225 | 137.050508 247.99492 | 290.446734 71.594537
171.0991433 69.93896589 | 96.292754 147.0986 | 147.480377 232.51995
158.0455791 73.13733714 | 110.077946 147.4216 | 278.149385 310.43626
167.9566124 89.69758466 | 114.99971 151.4545 | 106.978795 322.56198
162.402737 74.04729564 | 134.63016 138.35 | 86.095542 330.78514
180.0623559 64.92612878 | 129.951841 149.9093 | 260.092535 70.635186
193.5651531 54.91848737 | 117.797993 171.3173 | 305.544689 46.601105
167.0917808 61.48766111 | 239.22731 161.3595 | 305.702979 97.547319
188.8247858 87.49271938 | 226.118832 188.5736 | 292.850153 230.32031
173.3717278 93.5637015 | 222.426235 168.0075 | 328.329449 184.44819
189.9956312 57.91901393 | 235.327043 205.4823 | 52.936274 214.40892
172.7436315 80.85050011 | 245.850746 165.365 | 93.756668 184.10616
186.4079048 54.86986105 | 252.277006 167.8126 | 105.704 94.144446
174.7658105 93.67102089 | 253.187613 196.5509 | 60.411768 78.7242
106.7335659 165.4157125 | 156.460007 254.659
93.78156518 158.2716753 | 136.229203 247.991
107.0671423 165.2292331 | 128.699508 239.2955
97.55999842 145.3172039 | 154.11463 235.5789
107.3460249 151.3960818 | 140.846123 229.461
113.5397882 152.9269165 | 142.974101 260.8046
136.9691312 137.3639306 | 120.88972 237.2245
131.4998494 153.2854434 | 265.981118 303.8293
129.4289837 137.3368179 | 292.491996 309.1535
94.33296513 144.2756367 | 282.205436 296.0294
114.808868 168.0962061 | 280.493316 326.4101
123.0469569 174.4868095 | 278.662942 311.712
115.5381537 171.3688195 | 269.061504 315.4833
114.357963 151.3218645 | 96.617666 312.5471
110.1603909 149.5049531 | 90.827076 340.7038
128.4038316 146.5332052 | 114.241393 315.3127
112.7274216 141.3636768 | 84.754761 329.8122
117.101379 150.1147107 | 117.92681 335.7055
99.49648807 140.5296939 | 82.70479 321.8394
137.4923659 140.3491654 | 99.129313 326.6825
235.2346947 162.2671429 | 269.295705 61.01423
221.0634138 196.3193248 | 317.537391 56.33134
239.1552526 159.9851635 | 292.489174 104.9481
218.9283092 169.496724 | 293.551986 36.87087
234.4233497 203.305418 | 250.889365 80.25614
238.394729 161.5411915 | 318.916785 90.14651
225.9241606 166.5181978 | 270.771999 218.9408
248.1528726 164.2897652 | 289.953656 250.0757
220.2353575 187.7092192 | 320.75349 193.944
240.8418064 182.6705168 | 335.905407 174.9524
242.8508888 161.364773 | 318.524202 234.5358
240.5009864 161.638997 | 292.150756 217.7289
253.8990317 166.5638696 | 62.420124 231.0347
253.6184175 194.0966203 | 43.452424 197.7831
252.7568084 199.0051041 | 92.529373 191.1006
222.3540715 191.3795932 | 94.983963 177.1117
226.0995132 184.7891818 | 114.265343 106.6252
243.5486188 166.4402978 | 61.483045 107.5512
236.2307353 207.6592271 | 97.455955 98.69963
250.6549806 169.0613718 | 105.390702 77.1085
153.6266513 264.3702777 | 88.868103 122.937
140.2554482 244.3291167 | 61.503216 44.26813
127.4981005 237.3244286 | 58.249043 84.35325
150.0450694 238.2197243
139.9488918 229.9625585
152.4128685 253.7015052
129.9009162 241.2665011
165.0723569 242.5129241
137.5494144 259.9084716
157.388017 248.9612482
153.8284342 258.0855702
158.1841914 232.9381509
123.5829529 242.5294869
121.7241763 239.8786043
156.4317144 260.3225515
137.6629203 249.5985649
146.6320587 262.0880899
144.7408289 260.4171827
141.7433537 228.9593758
130.7692399 250.0454107
270.2152371 302.4620969
272.177858 306.1354661
295.2478816 314.6564228
282.2274576 303.1310009
281.5348935 326.4685219
279.3628062 312.8095062
291.6605084 306.0408
277.9630781 310.6144287
255.5502594 302.8904372
284.4850095 290.7885228
284.7994825 295.7164115
279.451738 326.3516213
270.3049469 311.9283375
277.3097955 294.4818477
291.3577418 308.7947095
271.100245 314.1428586
265.4750133 315.1515665
272.0883585 314.3626217
266.3389558 321.8309697
291.7018514 307.1219722
93.99414939 312.6508404
88.68348413 340.924915
92.97066779 340.4827158
108.6450879 322.5673166
79.92435981 329.0684224
122.537263 328.2926627
110.6075591 313.373927
80.64741437 320.3693805
116.5127737 339.7257055
104.5108955 331.2226058
87.07737898 328.9883561
99.39594504 324.5342045
93.48109903 324.2907953
83.57849292 329.7692623
84.76216555 323.3094776
114.7303927 339.0982433
88.4388114 331.4226647
117.9328551 310.056421
119.7800703 315.2532746
99.24118189 312.4433646
264.5958928 55.52543732
315.1461845 48.29076454
299.4000389 110.5455044
319.9285974 64.371913
296.6773291 58.69920433
282.4113216 31.95051365
273.9955181 66.50301916
274.787336 95.34290012
260.4228332 77.96874658
321.2433564 85.01493692
239.8905738 80.35811653
301.9841923 109.7890365
253.0161871 76.02233523
299.4793505 24.86013611
293.7851308 104.115055
288.6640596 29.63217691
300.5278703 39.21232171
250.2278647 86.67537928
312.069147 95.20103603
323.4378523 90.22356736
274.1651315 199.4700348
290.7485889 245.0252879
320.9462415 195.4197857
333.9955542 170.8090094
329.4123264 195.6631252
342.1049732 208.4304607
290.6502159 248.8807703
288.4621634 256.3210496
313.718554 227.5113916
326.8414486 229.1041347
289.3432217 220.4176825
271.4961509 213.0431657
316.3975971 196.53059
315.0126038 246.9920203
337.8152603 179.095773
293.2171377 167.177807
294.9582904 215.0402094
272.4049588 225.8314788
265.0217553 237.418322
322.442665 200.4421204
68.83360455 215.1384059
36.98937184 194.9917144
95.12690113 186.1412053
45.78798258 194.24183
53.76184019 218.3447956
95.3531616 175.9992938
55.69049976 246.8171295
93.07948623 190.493174
66.13368371 240.4401933
58.95348222 188.9294226
45.05658057 177.5552126
76.60044508 192.3126568
41.73376966 244.1037793
94.61476488 178.2241594
65.21252494 227.6262047
39.50000902 208.2659874
34.42711922 222.7144593
85.57494506 224.7726419
105.3106608 195.4553072
117.3620314 229.2653928
112.1635718 106.3020536
66.45241694 115.9552987
55.12324246 98.01123497
97.12060662 97.7510742
104.4510167 75.69535689
62.87347605 108.6871273
96.45828527 121.7012639
114.2851887 92.88195385
56.97830472 45.52186784
84.70505196 75.57541713
101.0071849 94.05021798
80.71707671 34.05409407
108.8038105 58.64582851
116.3671141 106.9483625
114.7084426 82.74396325
81.27792014 124.172652
55.03400835 83.2137695
46.81426629 53.22843064
61.46407719 85.49272839
94.24007377 104.2975849
Embodiment 3: a multilevel data sampling method based on connected subgraphs comprises the following steps:
(1) build a K-nearest-neighbor graph or a K-mutual-neighbor graph on the input data, with K = 1;
(2) obtain the connected subgraphs of the K-nearest-neighbor graph or K-mutual-neighbor graph thus built;
(3) for each connected subgraph, compute its median as a sampling point; the set of all sampling points is the result of this round of sampling;
(4) take the sampling result of step (3) as new input data and repeat steps (1) to (3) until the sampling termination condition is met, thereby achieving the required multilevel data sampling.
The sampling termination condition in step (4) is as follows.
The actual distortion of a sampling result is defined in terms of the distance from each data point to its nearest sampling point, as expressed by formula (1):
$$ad = \left( \sum_{i=1}^{N} \mathrm{dist}(X_i, C(X_i))^2 / N \right)^{1/2} \tag{1}$$
where ad is the distortion, N is the number of input data points, the input data are $X_1, X_2, \dots, X_N$, the corresponding sampling points are $C(X_1), C(X_2), \dots, C(X_N)$, $C(X_i)$ is the function returning the sampling point of $X_i$, and $\mathrm{dist}(X_i, C(X_i))$ is the Euclidean distance between $X_i$ and $C(X_i)$;
The expected distortion of the sampling result $D_i$ obtained at sampling stage $A_i$ is defined from the actual distortion of the previous stage and the ratio by which the sample shrinks:
$$pd_i = (ad_{i-1} + ad_1)\left(N_{i-1}/N_i\right)^{1/d} - ad_1, \quad \forall i > 1 \tag{2}$$
where d is the dimension of the data, $ad_1$ is the actual distortion of the first sampling stage, $ad_{i-1}$ is the actual distortion of sampling stage $A_{i-1}$, $N_{i-1}$ is the number of sampling points of stage $A_{i-1}$, and $N_i$ is the number of sampling points of stage $A_i$;
When sampling stage $A_t$ satisfies the condition $\forall i < t,\ ad_i \le pd_i$ and $ad_t > pd_t$, that is, when the actual distortion of the t-th sampling exceeds the expected distortion, sampling stops automatically.
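Embodiment 3 differs from the others only in using the median of each connected subgraph as its sampling point. The text does not say how the median of a group of vectors is computed; the sketch below assumes a coordinate-wise median, and the helper name `median_center` is hypothetical.

```python
from statistics import median


def median_center(points):
    """Coordinate-wise median of a group of equal-length vectors (an assumed definition)."""
    # zip(*points) regroups the vectors into one tuple of values per coordinate.
    return tuple(median(coordinate) for coordinate in zip(*points))
```

Under this assumption a single outlying point in a subgraph shifts the sampling point less than it would with the mean: for the group (0, 0), (1, 10), (100, 2) the coordinate-wise median is (1, 2).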

Claims (4)

1. A multilevel data sampling method based on connected subgraphs, characterized in that it comprises the following steps:
(1) building a K-nearest-neighbor graph or a K-mutual-neighbor graph on the input data; for N data vectors, K is an integer as given in Figure FSB00000359934100011;
(2) obtaining the connected subgraphs of the K-nearest-neighbor graph or K-mutual-neighbor graph thus built;
(3) for each connected subgraph, computing its mean as a sampling point, the set of all sampling points being the result of this round of sampling;
(4) taking the sampling result of step (3) as new input data and repeating steps (1) to (3) until the sampling termination condition is met, thereby achieving the required multilevel data sampling;
the sampling termination condition in step (4) being as follows:
the actual distortion of a sampling result is defined in terms of the distance from each data point to its nearest sampling point, as expressed by formula (1):
$$ad = \left( \sum_{i=1}^{N} \mathrm{dist}(X_i, C(X_i))^2 / N \right)^{1/2} \tag{1}$$
where ad is the distortion, N is the number of input data points, the input data are $X_1, X_2, \dots, X_N$, the corresponding sampling points are $C(X_1), C(X_2), \dots, C(X_N)$, $C(X_i)$ is the function returning the sampling point of $X_i$, and $\mathrm{dist}(X_i, C(X_i))$ is the Euclidean distance between $X_i$ and $C(X_i)$;
the expected distortion of the sampling result $D_i$ obtained at sampling stage $A_i$ is defined from the actual distortion of the previous stage and the ratio by which the sample shrinks:
$$pd_i = (ad_{i-1} + ad_1)\left(N_{i-1}/N_i\right)^{1/d} - ad_1, \quad \forall i > 1 \tag{2}$$
where d is the dimension of the data, $ad_1$ is the actual distortion of the first sampling stage, $ad_{i-1}$ is the actual distortion of sampling stage $A_{i-1}$, $N_{i-1}$ is the number of sampling points of stage $A_{i-1}$, and $N_i$ is the number of sampling points of stage $A_i$; i is an integer greater than 1, stage $A_1$ denotes the first sampling, $A_2$ the second sampling, ..., and $A_i$ the i-th sampling;
when sampling stage $A_t$ satisfies the condition:
Figure FSB00000359934100014
$ad_i \le pd_i$ and $ad_t > pd_t$, that is, when the actual distortion of the t-th sampling exceeds the expected distortion, sampling stops automatically;
or, the sampling termination condition in step (4) is that the system continues sampling until only one sampling point remains; during this process the result of each sampling is saved, and the user chooses the required sample from the saved results according to the sample size or the sample distortion.
2. The multilevel data sampling method based on connected subgraphs according to claim 1, characterized in that K = 1 in step (1).
3. A multilevel data sampling method based on connected subgraphs, characterized in that it comprises the following steps:
(1) building a K-nearest-neighbor graph or a K-mutual-neighbor graph on the input data; for N data vectors, K is an integer as given in Figure FSB00000359934100021;
(2) obtaining the connected subgraphs of the K-nearest-neighbor graph or K-mutual-neighbor graph thus built;
(3) for each connected subgraph, computing its median as a sampling point, the set of all sampling points being the result of this round of sampling;
(4) taking the sampling result of step (3) as new input data and repeating steps (1) to (3) until the sampling termination condition is met, thereby achieving the required multilevel data sampling;
The sampling termination condition in said step (4) is as follows:
The actual distortion of a sampling result is defined in terms of the distances from each data point to its nearest sample point (their root mean square), as expressed by formula (1):
$$ad = \left( \frac{\sum_{i=1}^{N} \mathrm{dist}(X_i, C(X_i))^2}{N} \right)^{1/2} \qquad (1)$$
where $ad$ is the distortion, $N$ is the number of input data points, the input data are $X_1, X_2, \ldots, X_N$, and the corresponding sample points are $C(X_1), C(X_2), \ldots, C(X_N)$, where $C(X_i)$ is the function that returns the sample point of $X_i$ and $\mathrm{dist}(X_i, C(X_i))$ is the Euclidean distance between $X_i$ and $C(X_i)$;
The predicted distortion $pd_i$ of the sampling result $D_i$ obtained at sampling stage $A_i$ is defined by the actual distortion of the preceding sampling stage $A_{i-1}$ and the sample reduction ratio:
$$pd_i = (ad_{i-1} + ad_1)\left(\frac{N_{i-1}}{N_i}\right)^{1/d} - ad_1, \quad \forall i > 1 \qquad (2)$$
where $d$ is the dimensionality of the data, $ad_1$ is the actual distortion of the first sampling stage, $ad_{i-1}$ is the actual distortion of sampling stage $A_{i-1}$, $N_{i-1}$ is the number of sample points at sampling stage $A_{i-1}$, and $N_i$ is the number of sample points at sampling stage $A_i$; $i$ is an integer greater than 1, stage $A_1$ denotes the first sampling, $A_2$ the second sampling, ..., and $A_i$ the $i$-th sampling;
When sampling stage $A_t$ satisfies the condition:
Figure FSB00000359934100024
$ad_i \le pd_i$ and $ad_t > pd_t$, that is, when the actual distortion of the $t$-th sampling exceeds its predicted distortion, sampling stops automatically;
Alternatively, the sampling termination condition in said step (4) is that the system continues sampling until the number of sample points is 1; the result of each sampling is saved during this process, and the user selects the required sample from the saved results according to the required sample size or sample distortion.
4. The multilevel data sampling method based on connected subgraphs according to claim 3, characterized in that, in said step (1), K = 1.
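Steps (1) to (4) of claim 3, with K = 1 as in claim 4 and the alternative termination condition (sample until one point remains, saving every level), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it covers only the K-nearest-neighbor graph variant, treats graph edges as undirected, finds neighbors by brute force, and takes the "median" of a connected subgraph to be the coordinate-wise median of its points. The function names `knn_components`, `sample_once`, and `multilevel_sample` are hypothetical.

```python
import numpy as np

def knn_components(X, k=1):
    """Label the connected components of the undirected k-nearest-neighbor graph."""
    n = len(X)
    parent = list(range(n))          # union-find forest over the data points

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    # Brute-force pairwise squared Euclidean distances (assumption: small N).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=2)
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbor
    k = min(k, n - 1)
    neighbors = np.argsort(d2, axis=1)[:, :k]
    for i in range(n):
        for j in neighbors[i]:
            ri, rj = find(i), find(int(j))
            if ri != rj:
                parent[rj] = ri      # merge the two components
    return np.array([find(i) for i in range(n)])

def sample_once(X, k=1):
    """One sampling round: one sample point (coordinate-wise median) per component."""
    X = np.asarray(X, dtype=float)
    labels = knn_components(X, k)
    return np.array([np.median(X[labels == r], axis=0) for r in np.unique(labels)])

def multilevel_sample(X, k=1):
    """Repeat sampling until a single point remains, saving every level."""
    X = np.asarray(X, dtype=float)
    levels = []
    while len(X) > 1:
        X = sample_once(X, k)
        levels.append(X)
    return levels
```

With k >= 1 and at least two points, every point is linked to some neighbor, so each component holds at least two points and every round at least halves the number of points; the loop therefore terminates. A caller could then pick a saved level by its size, for instance the first level with no more than a desired number of points.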
CN 200910031265 2009-04-30 2009-04-30 Multilevel data sampling method based on connected subgraph Expired - Fee Related CN101567694B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200910031265 CN101567694B (en) 2009-04-30 2009-04-30 Multilevel data sampling method based on connected subgraph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200910031265 CN101567694B (en) 2009-04-30 2009-04-30 Multilevel data sampling method based on connected subgraph

Publications (2)

Publication Number Publication Date
CN101567694A CN101567694A (en) 2009-10-28
CN101567694B true CN101567694B (en) 2012-04-18

Family

ID=41283684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200910031265 Expired - Fee Related CN101567694B (en) 2009-04-30 2009-04-30 Multilevel data sampling method based on connected subgraph

Country Status (1)

Country Link
CN (1) CN101567694B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1797409A (en) * 2004-12-30 2006-07-05 李强 Method for automatic obtaining engineering parameter values of sampling points in graph by using computer
CN1809840A (en) * 2003-05-22 2006-07-26 Lm爱立信电话有限公司 Method and system for supersampling rasterization of image data
US20070064806A1 (en) * 2005-09-16 2007-03-22 Sony Corporation Multi-stage linked process for adaptive motion vector sampling in video compression

Also Published As

Publication number Publication date
CN101567694A (en) 2009-10-28

Similar Documents

Publication Publication Date Title
CN111160108B (en) Anchor-free face detection method and system
CN102594360B (en) Method and device for computer data compression
CN107481293B (en) Differential image compressed sensing reconstruction method based on multi-hypothesis weighting and intelligent terminal
CN111199740B (en) Unloading method for accelerating automatic voice recognition task based on edge calculation
CN103532645B (en) The compression frequency spectrum sensing method that a kind of observing matrix is optimized
CN110690931B (en) Digital signal adaptive code rate estimation method and device based on multi-wavelet-base combination
CN106028451A (en) User grouping mechanism applied to NOMA
CN104105193A (en) Power distribution method in heterogeneous network based on Starckelberg game
CN114422311A (en) Signal modulation identification method and system combining deep neural network and expert prior characteristics
CN101567694B (en) Multilevel data sampling method based on connected subgraph
CN109067678A (en) Based on Higher Order Cumulants WFRFT signal cascade Modulation Identification method, wireless communication system
CN101242168B (en) A realization method and device for FIR digital filter direct-connection
CN107197192B (en) A kind of method and system for face video in compressed video communication
CN103974274B (en) A kind of robustness beam form-endowing method promoting multiple cell efficiency
He et al. Deep learning-based automatic modulation recognition algorithm in non-cooperative communication systems
CN106102148A (en) A kind of base station dormancy method and device
CN108173610B (en) Second-order statistic-based cooperative spectrum sensing method for heterogeneous wireless network
CN116582133A (en) Intelligent management system for data in transformer production process
CN106331719A (en) K-L transformation error space dividing based image data compression method
CN100518323C (en) Method for performing matching compression to image using rotary compressed codebook
CN115857823A (en) Distributed compression storage method based on data sharing
CN109194667A (en) The device of realization I/Q data signal data compression and transfer function based on frequency domain detection
CN108712655A (en) A kind of group's image encoding method merged for similar image collection
CN115936291A (en) Method for constructing dynamic standard library based on multi-energy collaborative enterprise energy consumption under mass data
CN113595954A (en) PSS timing synchronization detection method based on segmented differential algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120418

Termination date: 20150430

EXPY Termination of patent right or utility model