CN101814112A

CN101814112A - Method and device for processing data

Info

Publication number: CN101814112A
Application number: CN201010033881A
Authority: CN
Inventors: 付新刚; 贾学力; 李建军
Original assignee: Beijing Cennavi Technologies Co Ltd
Current assignee: Beijing Cennavi Technologies Co Ltd
Priority date: 2010-01-11
Filing date: 2010-01-11
Publication date: 2010-08-25
Anticipated expiration: 2030-01-11
Also published as: CN101814112B; WO2011082616A1

Abstract

Embodiments of the invention disclose a method and device for processing data, which relate to the field of intelligent transportation and address the problem that large volumes of vehicle speed data are difficult to store and manage and complicate subsequent work such as vehicle speed analysis. The method comprises: obtaining two or more data sets to be processed; dividing the data sets to be processed into one or more categories according to the similarity between them, obtained in advance; and merging two or more data sets in the same category according to a preset merging rule. The invention can be applied to the compression of large volumes of data.

Description

Method and apparatus for processing data

Technical field

The present invention relates to the field of intelligent transportation, and in particular to a method and apparatus for processing data.

Background technology

Dynamic traffic information services are one of the core research directions of current intelligent transportation systems. Such services require collecting a large amount of vehicle speed data; analysing and processing these data makes it possible to guide people's travel routes intelligently and to improve the utilisation of roads.

In the course of realising the present invention, the inventors found that a large amount of vehicle speed data is not only difficult to store and manage, but also complicates subsequent operations such as vehicle speed analysis.

Summary of the invention

Embodiments of the invention provide a method and apparatus for processing data that can effectively compress the amount of data.

To achieve the above object, embodiments of the invention adopt the following technical solution:

A method for processing data, comprising:

obtaining two or more data sets to be processed;

dividing the two or more data sets to be processed into one or more categories according to the similarity, obtained in advance, between the data sets;

merging two or more data sets in the same category among the one or more categories according to a preset merging rule.

An apparatus for processing data, comprising:

a first obtaining unit, configured to obtain two or more data sets to be processed;

a division unit, configured to divide the two or more data sets obtained by the first obtaining unit into one or more categories according to the similarity, obtained in advance, between the data sets;

a merging unit, configured to merge two or more data sets in the same category among the one or more categories obtained by the division unit according to a preset merging rule.

The method and apparatus provided by the embodiments of the invention obtain the similarity between the data sets to be processed, classify the data according to that similarity, and merge the data sets in the same category. This effectively reduces the number of data sets, so that storing and managing the data is easier than with the unprocessed originals; because the data volume is reduced, subsequent work also becomes simpler. The method and apparatus can therefore effectively compress the amount of data.

Description of drawings

To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the accompanying drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of the data processing method provided by an embodiment of the invention;

Fig. 2 is a flowchart of the data processing method provided by another embodiment of the invention;

Fig. 3 is a flowchart of the H test provided by an embodiment of the invention;

Fig. 4 is a flowchart of the method for setting the sample size provided by an embodiment of the invention;

Fig. 5 is a first structural diagram of the data processing apparatus provided by an embodiment of the invention;

Fig. 6 is a structural diagram of the first obtaining unit 501 shown in Fig. 5;

Fig. 7 is a structural diagram of the division unit 502 shown in Fig. 5;

Fig. 8 is another structural diagram of the division unit 502 shown in Fig. 5;

Fig. 9 is a structural diagram of the second division unit 5014 shown in Fig. 6.

Embodiment

To make the purpose, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

To solve the problem that the large amount of vehicle speed data currently available is difficult to store and manage and complicates subsequent operations such as vehicle speed analysis, embodiments of the invention provide a method and apparatus for processing data.

As shown in Fig. 1, the data processing method provided by an embodiment of the invention comprises:

Step 101: obtain two or more data sets to be processed.

In this embodiment, the two or more data sets to be processed may be the traffic flow data of several days extracted from a historical database established in advance, or the traffic flow data of several time periods obtained by dividing the traffic flow data of a single day into time periods.

Step 102: divide the two or more data sets to be processed into one or more categories according to the similarity, obtained in advance, between the data sets.

In this embodiment, the similarity between two data sets is obtained with the H test, which comprises an F test and a T test; that is, it checks whether the mean square deviations and the means of the two sets are equal. If the H test passes, the two sets are similar and can be assigned to one category; if it fails, the two sets are dissimilar and do not belong to one category.

Step 103: merge two or more data sets in the same category among the one or more categories according to a preset merging rule.

In this embodiment, the traffic flow data sets of several similar days are merged, by averaging, into the traffic flow data set of one day; alternatively, the traffic flow data sets of several similar time periods within one day are merged, by averaging, into a single datum that represents the traffic flow of the whole merged time period.

The method provided by this embodiment obtains the similarity between the data sets to be processed, classifies the data according to that similarity, and merges the data sets in the same category. This effectively reduces the number of data sets, so that storing and managing the data is easier than with the unprocessed originals; because the data volume is reduced, subsequent work also becomes simpler.

So that a person skilled in the art can understand the technical solution more clearly, the data processing method provided by another embodiment of the invention is described in detail below by way of a specific example.

As shown in Fig. 2, the data processing method provided by another embodiment of the invention comprises:

Step 201: obtain traffic flow data sets of two or more days collected in advance; these sets are the data sets to be processed.

In this embodiment, the traffic flow data in the historical database are first divided by characteristic day, and traffic flow data with the same characteristic are merged into one day of traffic flow data; each day is then divided by time period, and several time periods are merged into one datum. Of course, the division by time period may be carried out first and the division by characteristic day afterwards, or the data sets to be processed may be divided only by characteristic day or only by time period; these variants are not described one by one here. A characteristic day denotes a recurring factor in the analysed historical data that has a material influence on traffic flow, such as holidays, day of the week or weather. For a given road, if "day of the week" is used as the characteristic day, there are seven kinds: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday; if "holidays" are used, there are three kinds: the National Day ("October 1st") Golden Week, ordinary festivals, and long consecutive holidays such as the Spring Festival; if "weather" is used, there are four kinds: sunny, overcast and rainy, heavy rain, and heavy snow.

From the established historical database, L links are drawn at random, with M months of historical traffic flow data in total. The random sampling follows these principles: the links cover every road class evenly, cover different geographic areas evenly, and are longer than 200 metres. Taking the complexity and feasibility of the computation into account, L is usually 1% of the total number of links in the historical database and usually greater than 100; M is usually 12 months. Where computing resources allow, L and M can be increased. In this embodiment, "day of the week" is chosen as the characteristic day, and 12 months of traffic flow data from 100 links are taken as the data sets to be processed.

Step 202: calculate the correlation coefficients between the data sets that share the same specified characteristic among the traffic flow data sets of two or more days, obtaining two or more correlation coefficient series.

In this embodiment, the data of one link are processed first, and the shared specified characteristic is the same day of the week: for example, all Mondays, all Tuesdays and so on of one link over the 12 months. The correlation coefficients between the data sets of all Mondays in the 12 months are calculated pairwise. For example, if there are 8 Monday sets, the correlation coefficient between every two of these 8 sets is calculated, giving 28 correlation coefficient values in total, and these 28 values form the correlation coefficient series for Monday. The correlation coefficient series for Tuesday to Sunday are obtained in the same way. These 7 series together form a correlation coefficient table. The correlation coefficient between two sets is calculated as:

ρ_{xy} = (Σ_{t = 1}^{n} y_{t} x_{t}) / \sqrt{Σ_{t = 1}^{n} y_{t}^{2} Σ_{t = 1}^{n} x_{t}^{2}}

where n is the number of data in a set, and x_{t} and y_{t} are each the traffic flow data of one day, i.e. the vehicle speed as it varies over the day. The correlation coefficients calculated with this formula are listed in the following table:

Table one:

Monday	Tuesday	Wednesday	Thursday	Friday	Saturday	Sunday
ρ_{11}	ρ_{21}	ρ_{31}	ρ_{41}	ρ_{51}	ρ_{61}	ρ_{71}
ρ_{12}	ρ_{22}	ρ_{32}	ρ_{42}	ρ_{52}	ρ_{62}	ρ_{72}
ρ_{13}	ρ_{23}	ρ_{33}	ρ_{43}	ρ_{53}	ρ_{63}	ρ_{73}
ρ_{14}	ρ_{24}	ρ_{34}	ρ_{44}	ρ_{54}	ρ_{64}	ρ_{74}
…	…	…	…	…	…	…
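The calculation in step 202 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `correlation` and `correlation_series` and the sample speed profiles are hypothetical, and each day is assumed to be a list of speed values of equal length.

```python
import math
from itertools import combinations


def correlation(x, y):
    """Coefficient from the formula above:
    sum(x_t * y_t) / sqrt(sum(x_t^2) * sum(y_t^2))."""
    if len(x) != len(y):
        raise ValueError("both days must have the same number of samples")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


def correlation_series(day_sets):
    """Pairwise coefficients over all sets of one weekday (e.g. all Mondays)."""
    return [correlation(x, y) for x, y in combinations(day_sets, 2)]


# Hypothetical data: 8 Monday speed profiles with 4 samples each.
mondays = [[40 + d, 35 + d, 30 + 2 * d, 45 - d] for d in range(8)]
series = correlation_series(mondays)
print(len(series))  # 8 sets give 8 * 7 / 2 = 28 coefficients
```

Running the same function over the Tuesday to Sunday sets would give the other six columns of the table.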

Step 203: perform the H test on every two correlation coefficient series among the two or more correlation coefficient series, obtaining a first test result.

In this embodiment, the data in the correlation coefficient series obtained in step 202 are assumed to follow a normal distribution, so the H test can be used to judge the similarity between two series. The H test comprises an F test and a T test: the F test judges whether the mean square deviations of the two series are the same, and the T test judges whether their means are the same. As shown in Fig. 3, the H test comprises:

Step 301: read two correlation coefficient series.

Step 302: standardise the correlation coefficient series, converting them to the standard normal distribution.

In this embodiment, for convenience of notation, the ρ_{i} obtained in step 202 are denoted x_{i} and y_{i}. Suppose one correlation coefficient series is x ~ N (μ_{1}, σ_{1}^{2}), where x_{1}, x_{2}, …, x_{n_{1}} is the sample of x, i.e. the data of the series, μ_{1} is the mean of x and σ_{1}^{2} is the variance of x; the other series is y ~ N (μ_{2}, σ_{2}^{2}), where y_{1}, y_{2}, …, y_{n_{2}} is the sample of y, μ_{2} is the mean of y and σ_{2}^{2} is the variance of y. The two samples x and y are independent of each other. x and y can be standardised with the following formulas:

x_{i}^{'} = \frac{x_{i} - μ_{1}}{σ_{1} / \sqrt{n_{1}}} ~ N (0,1), (1 \leq i \leq n_{1})

y_{i}^{'} = \frac{y_{i} - μ_{2}}{σ_{2} / \sqrt{n_{2}}} ~ N (0,1), (1 \leq i \leq n_{2})

Because σ_{1} and σ_{2} are in practice unknown, S_{1} and S_{2} are used in their place:

S_{1} = \sqrt{\frac{1}{n_{1} - 1} Σ_{i = 1}^{n_{1}} {(x_{i} - \bar{x})}^{2}},

S_{2} = \sqrt{\frac{1}{n_{2} - 1} Σ_{i = 1}^{n_{2}} {(y_{i} - \bar{y})}^{2}} .

where \bar{x} and \bar{y} are the means of x_{1}, …, x_{n_{1}} and y_{1}, …, y_{n_{2}} respectively.

Step 303: perform the F test on the standardised correlation coefficient series.

Construct the statistic of the F test:

F = \frac{S_{1}^{2} / σ_{1}^{2}}{S_{2}^{2} / σ_{2}^{2}} ~ F (n_{1} - 1, n_{2} - 1)

Let H_{0}: σ_{1}^{2} = σ_{2}^{2}, H_{1}: σ_{1}^{2} ≠ σ_{2}^{2}.

When H_{0} holds,

F = \frac{S_{1}^{2}}{S_{2}^{2}} ~ F (n_{1} - 1, n_{2} - 1)

so the rejection region of the F test is:

W = {F < F_{\frac{α}{2}} (n_{1} - 1, n_{2} - 1)}

or

W = {F > F_{1 - \frac{α}{2}} (n_{1} - 1, n_{2} - 1)}

When the test statistic falls in this rejection region, the variances of the two sets are unequal and the F test is not satisfied; the test result is output directly and the T test does not need to be carried out. In this embodiment, the output for a failed H test is set to "1" and the output for a passed H test is set to "0", so when the statistic falls in the above rejection region the output is 1.

Step 304: when the F test passes, perform the T test.

Construct the statistic of the T test:

T = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{(n_{1} - 1) S_{1}^{2} + (n_{2} - 1) S_{2}^{2}}{n_{1} + n_{2} - 2}} \cdot \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}} ~ t (n_{1} + n_{2} - 2)

Let H_{0}: μ_{1} = μ_{2}, H_{1}: μ_{1} ≠ μ_{2}.

Then, when H_{0} holds, the rejection region of the T test is:

W = {| T | > t_{1 - \frac{α}{2}} (n_{1} + n_{2} - 2)}

When the test statistic falls in this rejection region, the means of the two sets are unequal and the T test is not satisfied. In this embodiment, the output for a failed H test is set to "1" and the output for a passed H test is set to "0", so when the statistic falls in the above rejection region the output is 1.

In this embodiment, the correlation coefficient series in Table one are H-tested pairwise with the above steps at the significance level α = 0.05. The test result is denoted h_{ij}; h_{ij} takes the value 0 or 1, and h_{ij} = h_{ji}.
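The F-then-T decision of steps 303 and 304 can be sketched as below. This is a hedged illustration under stated assumptions: the function name `h_test` is hypothetical, the critical values `f_low`, `f_high` and `t_crit` are passed in by the caller (taken from F and t tables for the chosen α) rather than computed, the T statistic uses the pooled standard deviation of the usual two-sample form, and the standardisation of step 302 is omitted because the two statistics are computed directly from the samples.

```python
import math
from statistics import mean, variance


def h_test(x, y, f_low, f_high, t_crit):
    """Return 0 if x and y pass both the F test and the T test, else 1.

    f_low, f_high and t_crit are critical values supplied by the caller
    (an assumption of this sketch, not part of the source text).
    Both samples need at least 2 values and y must not be constant.
    """
    n1, n2 = len(x), len(y)
    s1_sq, s2_sq = variance(x), variance(y)  # sample variances S1^2, S2^2

    # F test: reject equal variances if F falls outside [f_low, f_high].
    f_stat = s1_sq / s2_sq
    if f_stat < f_low or f_stat > f_high:
        return 1  # the T test is skipped

    # T test with the pooled standard deviation.
    pooled = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    t_stat = (mean(x) - mean(y)) / (pooled * math.sqrt(1 / n1 + 1 / n2))
    return 1 if abs(t_stat) > t_crit else 0


# Hypothetical series of 5 values each; for alpha = 0.05 the table values are
# F_0.975(4, 4) = 9.60 (so F_0.025(4, 4) = 1 / 9.60) and t_0.975(8) = 2.306.
a = [0.90, 0.92, 0.91, 0.93, 0.89]
b = [0.91, 0.90, 0.92, 0.94, 0.88]
c = [0.50, 0.52, 0.51, 0.53, 0.49]
print(h_test(a, b, 1 / 9.60, 9.60, 2.306), h_test(a, c, 1 / 9.60, 9.60, 2.306))
```

Passing the critical values in keeps the sketch within the standard library; a statistics package could supply them instead.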

The statistics of the test results are shown in Table two:

Table two:

The table above gives the test results for one link. The 100 links drawn in step 201 are each H-tested according to steps 202 to 203, and each yields test results of the form shown in Table two. Because the statistics may differ from link to link, all of them need to be analysed and aggregated into the result shown in Table three:

Table three:

Share of total tests	Monday, Tuesday	Monday, Wednesday	Monday, Thursday	……	Friday, Sunday	Saturday, Sunday
H test result is 0	a_{12}	a_{13}	a_{14}	……	a_{57}	a_{67}
H test result is 1	b_{12}	b_{13}	b_{14}	……	b_{57}	b_{67}

As long as a_{ij} > 0.7 is satisfied, the share of tests whose H test result is 0 can be considered much larger than the share whose result is 1, i.e. column i and column j are similar.
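The aggregation across links and the 0.7 threshold can be sketched as follows. The data layout (one dictionary per link, keyed by a pair of day indices) and the name `similar_pairs` are assumptions made for illustration only.

```python
from itertools import combinations

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def similar_pairs(results_per_link, threshold=0.7):
    """results_per_link: one dict per link mapping (i, j) -> h (0 or 1).

    Returns the day pairs whose share of h == 0 results exceeds threshold.
    """
    total = len(results_per_link)
    pairs = []
    for i, j in combinations(range(len(DAYS)), 2):
        zeros = sum(1 for link in results_per_link if link[(i, j)] == 0)
        a_ij = zeros / total  # share of links on which the pair passed
        if a_ij > threshold:
            pairs.append((DAYS[i], DAYS[j]))
    return pairs


# Hypothetical results for 10 links: Tuesday/Wednesday pass on 8 links,
# Saturday/Sunday on exactly 7, every other pair on none.
links = []
for k in range(10):
    link = {pair: 1 for pair in combinations(range(7), 2)}
    if k < 8:
        link[(1, 2)] = 0
    if k < 7:
        link[(5, 6)] = 0
    links.append(link)
print(similar_pairs(links))  # only Tuesday/Wednesday exceeds 0.7
```

The comparison is strict, so a pair that passes on exactly 70% of the links is not counted as similar.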

Step 204: obtain the similarity between the traffic flow data sets of two or more days from the H test results.

Tables two and three show clearly which two days' data sets are similar. For example, if a_{23} > 0.7, the data sets of Tuesday and Wednesday are similar; if a_{12} ≤ 0.7, the data sets of Monday and Tuesday are dissimilar.

Step 205: divide the traffic flow data sets of two or more days into one or more categories according to the similarity between them.

In this embodiment, two data sets that are similar can be assigned to one category. Suppose that in Table three a_{15} > 0.7, a_{23} > 0.7, a_{34} > 0.7 and a_{67} > 0.7; the seven days of data can then be assigned to three categories: Monday and Friday; Tuesday, Wednesday and Thursday; Saturday and Sunday.
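The grouping in step 205 can be sketched with a small union-find structure. The example in the text places Tuesday, Wednesday and Thursday together from the pairs (Tuesday, Wednesday) and (Wednesday, Thursday), so this sketch assumes that similarity is chained transitively; the function name `group_days` is hypothetical.

```python
def group_days(days, similar_pairs):
    """Group days into categories: days linked by a chain of similar pairs
    share a category (transitive chaining is an assumption of this sketch)."""
    parent = {d: d for d in days}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    for a, b in similar_pairs:
        parent[find(b)] = find(a)

    groups = {}
    for d in days:  # iterate in calendar order to keep the output stable
        groups.setdefault(find(d), []).append(d)
    return list(groups.values())


days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
pairs = [("Mon", "Fri"), ("Tue", "Wed"), ("Wed", "Thu"), ("Sat", "Sun")]
print(group_days(days, pairs))
# [['Mon', 'Fri'], ['Tue', 'Wed', 'Thu'], ['Sat', 'Sun']]
```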

Step 206: merge the traffic flow data sets of two or more days in the same category among the one or more categories into one day of traffic flow data according to a preset merging rule.

In this embodiment, the rule for merging the traffic flow data of several days into one day of data is to take the mean value at each corresponding time instant. For example, the traffic flow data for 12 noon on Tuesday, Wednesday and Thursday are merged by taking the mean of the three days' 12 o'clock data. The mean values at the other time instants are calculated in the same way. These mean values constitute the merged traffic flow data.
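The averaging rule of step 206 amounts to an element-wise mean over the days in one category. A minimal sketch, assuming each day is a list of speed values sampled at the same instants (the name `merge_days` and the sample values are hypothetical):

```python
def merge_days(day_profiles):
    """Average several equally sampled daily profiles instant by instant."""
    length = len(day_profiles[0])
    if any(len(p) != length for p in day_profiles):
        raise ValueError("all profiles must have the same number of samples")
    return [sum(values) / len(values) for values in zip(*day_profiles)]


# Hypothetical speeds at three instants on Tuesday, Wednesday and Thursday.
tue = [30.0, 42.0, 36.0]
wed = [33.0, 45.0, 39.0]
thu = [36.0, 48.0, 42.0]
print(merge_days([tue, wed, thu]))  # [33.0, 45.0, 39.0]
```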

Step 207: obtain one day of traffic flow data.

In this embodiment, the one day of traffic flow data is the traffic flow data obtained after the original data have been merged by "day of the week". Of course, data sets that have not been merged by characteristic day can also be taken directly from the historical database; this is not described further here.

Step 208: divide the one day of traffic flow data into two or more time-period data sets according to a preset sample size.

In this embodiment, a time span is first set for dividing one day of traffic flow data. The number of data contained in this time span is the sample size. As shown in Fig. 4, the sample size is set as follows:

Step 401: obtain the value set of the sample size.

In this embodiment the sampling interval is 5 minutes, so one day of traffic flow data contains 288 values. If each sampling instant is taken as one time span, the sample size is 1 and one day of traffic flow data can be divided into 288 consecutive time periods; if the sample size is 2, it can be divided into 287 consecutive time periods, and so on. If the sample size is n, one day of traffic flow data can be divided into 288 - (n - 1) consecutive time periods. In theory the sample size can be as large as 287, in which case one day of traffic flow data is divided into 2 consecutive time periods. In practice, however, n is meaningless once it exceeds half the number of data in a day. The value set of the sample size is therefore {n | 1 ≤ n ≤ N/2}, where N is the number of traffic flow data in one day (288 in this embodiment) and n and N are both integers.

Step 402: for each sample size value in the value set, obtain the similarity between the time-period data sets corresponding to that sample size value.

In this embodiment, n takes values from 1 to 144. For example, when n = 1, T_{1} = {x_{1}}, T_{2} = {x_{2}}, T_{3} = {x_{3}}, …, T_{288} = {x_{288}}; when n = 2, T_{1} = {x_{1}, x_{2}}, T_{2} = {x_{2}, x_{3}}, …, T_{287} = {x_{287}, x_{288}}. The general expression for T_{i} is:

T_{i} = {x_{i}, x_{i+1}, …, x_{i+n-1}},  T_{i+1} = {x_{i+1}, x_{i+2}, …, x_{i+n}}

where 1 ≤ i ≤ N - n. The H test is performed on T_{i} and T_{i+1} and the test result h_{i(i+1)}^{n} is recorded; the H test ends once i has run from 1 to N - n. h_{i(i+1)}^{n} is the similarity between the time-period data sets corresponding to each sample size value.

Step 403: obtain, from the similarity, the set of similar counts between the time-period data sets corresponding to each sample size value.

In this embodiment, for each value of n the corresponding values h_{i(i+1)}^{n} are collected, and the number of results with h_{i(i+1)}^{n} = 0 among them is counted. For example, when n = 1 this number is s_{1}; when n = 2 it is s_{2}; when n = 144 it is s_{144}. All these numbers form the set {s_{1}, s_{2}, …, s_{144}}.

Step 404: set the sample size to the sample size value that corresponds to the largest number in the set of similar counts.

In this embodiment, the maximum of the set {s_{1}, s_{2}, …, s_{144}} is taken and denoted s_{max} = max{s_{1}, s_{2}, …, s_{144}}, and the value of n that corresponds to s_{max} is used as the sample size.

It should be noted that the sample size can also be chosen on the principle that it should be as small as possible. Usually, as n goes from 1 to N/2, s first increases gradually and then decreases gradually; it may later increase again, but it eventually decreases. The sample size corresponding to the first turning point can therefore be chosen; the sample size is then small, usually 3 or 4.
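The selection of the sample size in steps 401 to 404 can be sketched as below. The H test is represented by a caller-supplied predicate `is_similar(a, b)` that returns True when h = 0; this predicate, the function name `choose_sample_size` and the tie-breaking rule (smallest n wins) are assumptions of the sketch. A real H test needs at least two values per window, so a predicate built on it would report no similarity for n = 1.

```python
from statistics import mean


def choose_sample_size(data, is_similar):
    """Return the window length n (1 <= n <= N // 2) with the most similar
    adjacent sliding windows T_i and T_{i+1}. Ties go to the smallest n."""
    N = len(data)
    best_n, best_count = 1, -1
    for n in range(1, N // 2 + 1):
        count = 0
        for i in range(N - n):  # 0-based i, so windows i and i + 1 both fit
            if is_similar(data[i:i + n], data[i + 1:i + n + 1]):
                count += 1
        if count > best_count:
            best_n, best_count = n, count
    return best_n


def toy_similar(a, b):
    """Stand-in for the H test (hypothetical): windows of at least two
    values whose means differ by less than 3 count as similar."""
    return len(a) >= 2 and abs(mean(a) - mean(b)) < 3


speeds = [10, 10, 10, 10, 20, 20, 20, 20]
print(choose_sample_size(speeds, toy_similar))
```

Swapping `toy_similar` for a predicate that wraps the F and T tests gives the behaviour described in the text; choosing the first turning point instead of the global maximum would need a different loop.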

Step 209: perform the H test on every two adjacent time-period data sets among the two or more time-period data sets, obtaining a second test result.

In this embodiment, suppose the sample size obtained by the method of step 208 is 3; then Y_{1} = (x_{1}, x_{2}, x_{3}), Y_{2} = (x_{4}, x_{5}, x_{6}), Y_{3} = (x_{7}, x_{8}, x_{9}), …, Y_{96} = (x_{286}, x_{287}, x_{288}). For a sample size of n the general expression is: Y_{a} = (x_{i}, x_{i+1}, …, x_{i+n-1}), Y_{b} = (x_{i+n}, x_{i+n+1}, …, x_{i+2n-1}), i = 1 + nj, 0 ≤ j ≤ ([N/n]_{+∞} - 2), where []_{+∞} denotes rounding towards positive infinity. As i changes, the H test is performed on each pair Y_{a} and Y_{b}, giving the test result h_{i}.
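The non-overlapping partition used in step 209 can be sketched as follows; the function names are hypothetical. With N = 288 and n = 3 it yields the 96 blocks Y_{1} to Y_{96} and 95 adjacent pairs, matching j running from 0 to [N/n]_{+∞} - 2.

```python
def partition(data, n):
    """Split data into consecutive, non-overlapping blocks of n samples.
    The last block is shorter if n does not divide len(data)."""
    return [data[i:i + n] for i in range(0, len(data), n)]


def adjacent_pairs(blocks):
    """Return (Y_a, Y_b) for every pair of neighbouring blocks."""
    return list(zip(blocks, blocks[1:]))


# x_1 .. x_288 stand in for one day of 5-minute samples.
day = list(range(1, 289))
blocks = partition(day, 3)
print(len(blocks), len(adjacent_pairs(blocks)))  # 96 95
```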

Step 210: obtain the similarity between the two or more time-period data sets from the second test result.

In this embodiment, when h_{i} = 0 the data sets of the two time periods are similar; when h_{i} = 1 the data sets of the two time periods are dissimilar.

Step 211: divide the two or more time-period data sets into one or more categories according to the similarity between them.

In this embodiment, two data sets that are similar can be assigned to one category. It should be noted that the data sets concerned are data sets that are consecutive in time.

Step 212: merge two or more time-period data sets in the same category among the one or more categories into one datum according to a preset merging rule.

In this embodiment, the rule for merging the data sets of several time periods into one datum is to take the mean of all the data in those time periods. For example, the traffic flow data from 0:00 to 6:00 are merged by taking the mean of all data between 0:00 and 6:00. Other time periods are merged in the same way, which is not described further here.
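Steps 210 to 212 can be sketched as a single pass that grows a run of consecutive blocks while the H test result between neighbours is 0 and emits the mean of the run when a 1 is met. Chaining neighbouring similarities into one run is an assumption of this sketch, and the function name `merge_segments` is hypothetical.

```python
def merge_segments(blocks, h_results):
    """Merge runs of consecutive similar blocks into one mean value each.

    h_results[k] is the H test result for blocks[k] and blocks[k + 1]
    (0 = similar, 1 = not similar).
    Returns a list of (first_block_index, last_block_index, mean).
    """
    if len(h_results) != len(blocks) - 1:
        raise ValueError("need one H result per adjacent block pair")
    merged = []
    start, values = 0, list(blocks[0])
    for k, h in enumerate(h_results):
        if h == 0:
            values.extend(blocks[k + 1])  # same category: keep growing
        else:
            merged.append((start, k, sum(values) / len(values)))
            start, values = k + 1, list(blocks[k + 1])
    merged.append((start, len(blocks) - 1, sum(values) / len(values)))
    return merged


# Hypothetical blocks of two samples each and their adjacent H results.
blocks_demo = [[10, 12], [11, 13], [30, 32], [31, 29]]
print(merge_segments(blocks_demo, [0, 1, 0]))
# [(0, 1, 11.5), (2, 3, 30.5)]
```

Each output tuple records which blocks were merged, so the merged datum can be mapped back to its time period.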

As shown in Fig. 5, an embodiment of the invention also provides an apparatus for processing data, comprising:

a first obtaining unit 501, configured to obtain two or more data sets to be processed;

a division unit 502, configured to divide the two or more data sets obtained by the first obtaining unit 501 into one or more categories according to the similarity, obtained in advance, between the data sets;

a merging unit 503, configured to merge two or more data sets in the same category among the one or more categories obtained by the division unit 502 according to a preset merging rule.

Further, as shown in Fig. 6, the first obtaining unit 501 comprises:

a first obtaining subunit 5011, configured to obtain traffic flow data sets of two or more days collected in advance, and a first setting unit 5012, configured to set the traffic flow data sets of two or more days obtained by the first obtaining subunit 5011 as the two or more data sets to be processed; or

a second obtaining subunit 5013, configured to obtain one day of traffic flow data collected in advance; a second division unit 5014, configured to divide the one day of traffic flow data obtained by the second obtaining subunit 5013 into two or more time-period data sets according to a preset sample size; and a second setting unit 5015, configured to set the two or more time-period data sets divided by the second division unit 5014 as the two or more data sets to be processed.

Further, as shown in Fig. 7, when the two or more data sets to be processed are the traffic flow data sets of two or more days, the division unit 502 comprises:

a first calculation unit 5021, configured to calculate the correlation coefficients between the data sets that share the same specified characteristic among the traffic flow data sets of two or more days obtained by the first obtaining subunit 5011, obtaining two or more correlation coefficient series;

a first test unit 5022, configured to perform the H test on every two correlation coefficient series among the two or more correlation coefficient series calculated by the first calculation unit 5021, obtaining a first test result;

a second obtaining unit 5023, configured to obtain the similarity between the traffic flow data sets of two or more days from the first test result obtained by the first test unit 5022.

Further, as shown in Fig. 8, when the two or more data sets to be processed are the two or more time-period data sets, the division unit 502 comprises:

a second test unit 5024, configured to perform the H test on every two adjacent time-period data sets among the two or more time-period data sets obtained by the second obtaining subunit 5013, obtaining a second test result;

a third obtaining unit 5025, configured to obtain the similarity between the two or more time-period data sets from the second test result obtained by the second test unit 5024.

Further, as shown in Fig. 9, the second division unit 5014 comprises:

a fourth obtaining unit 601, configured to obtain the value set of the sample size;

a second calculation unit 602, configured to calculate, for each sample size value in the value set, the similarity between the time-period data sets corresponding to that sample size value;

a fifth obtaining unit 603, configured to obtain, from the similarity calculated by the second calculation unit 602, the set of similar counts between the time-period data sets corresponding to each sample size value;

a third setting unit 604, configured to set the sample size to the sample size value that corresponds to the largest number in the set of similar counts obtained by the fifth obtaining unit 603.

For the specific implementation of the above apparatus, see steps 201 to 212 described with reference to Fig. 2 to Fig. 4; it is not described further here.

The apparatus provided by this embodiment obtains the similarity between the data sets to be processed, classifies the data according to that similarity, and merges the data sets in the same category. This effectively reduces the number of data sets, so that storing and managing the data is easier than with the unprocessed originals; because the data volume is reduced, subsequent work also becomes simpler.

The technical solution provided by the invention can be applied to the compression of large volumes of data.

A person of ordinary skill in the art will appreciate that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or an optical disc.

The above are only specific embodiments of the invention, but the scope of protection of the invention is not limited to them. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.

Claims

1. the method for a deal with data is characterized in that, comprising:

Obtain data acquisition pending more than two;

2. the method for deal with data according to claim 1 is characterized in that, describedly obtains data acquisition pending more than two and comprises:

Obtain the set of traffic flow data more than two days of gathering in advance; Described more than two days traffic flow data set be set to described data acquisition pending more than two; Perhaps,

Obtain one day traffic flow data gathering in advance; Described one day traffic flow data is divided into two above segment data set equal time according to the sample size that sets in advance; Described two above segment data set equal time are set to described data acquisition pending more than two.

3. The method for processing data according to claim 2, characterized in that, when the two or more data sets to be processed are the traffic flow data sets of two or more days, the step of obtaining the similarity between the two or more data sets to be processed comprises:

calculating correlation coefficients of the data sets having the same designated feature among the traffic flow data sets of two or more days, to obtain two or more correlation coefficient sequences;

performing an H test on every two correlation coefficient sequences among the two or more correlation coefficient sequences, to obtain a first test result;

obtaining the similarity between the traffic flow data sets of two or more days according to the first test result.
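The H test step can be sketched as follows, assuming that "H test" refers to the Kruskal-Wallis H test applied to two samples. The sketch computes the H statistic without tie correction and treats two sequences as similar when H stays below the chi-square critical value for one degree of freedom at the 5% level (3.841). The decision rule, the significance level, and all function names are assumptions made for illustration; the claim itself does not fix them.

```python
from typing import List, Sequence

# Chi-square critical value, 1 degree of freedom, significance level 0.05.
CHI2_CRIT_DF1_ALPHA_05 = 3.841


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def h_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Kruskal-Wallis H statistic for two samples (no tie correction)."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[:len(a)])
    rank_sum_b = sum(ranks[len(a):])
    return (12.0 / (n * (n + 1))
            * (rank_sum_a ** 2 / len(a) + rank_sum_b ** 2 / len(b))
            - 3 * (n + 1))


def is_similar(a: Sequence[float], b: Sequence[float],
               critical: float = CHI2_CRIT_DF1_ALPHA_05) -> bool:
    """Assumed decision rule: similar when the H test does not reject at the 5% level."""
    return h_statistic(a, b) < critical


# Hypothetical correlation coefficient sequences.
interleaved = is_similar([0.1, 0.3, 0.5, 0.7, 0.9], [0.2, 0.4, 0.6, 0.8, 1.0])
separated = is_similar([0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0])
```

The chi-square approximation is rough for samples this small, so a practical implementation would likely use longer sequences or exact critical values.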

4. The method for processing data according to claim 2, characterized in that, when the two or more data sets to be processed are the two or more equal-time-period data sets, the step of obtaining the similarity between the two or more data sets to be processed comprises:

performing an H test on every two adjacent time-period data sets among the two or more equal-time-period data sets, to obtain a second test result;

obtaining the similarity between the two or more equal-time-period data sets according to the second test result.
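Once a similarity decision exists for every pair of adjacent time-period data sets, the division into categories described in the abstract can be sketched as grouping runs of consecutive similar segments. The sketch below takes the similarity decision as a pluggable predicate. The predicate `mean_close` used in the demonstration is a deliberately simple stand-in (difference of means within a tolerance) and is not the H test; it and the other names are hypothetical.

```python
from typing import Callable, List, Sequence

Segment = Sequence[float]


def group_adjacent(segments: Sequence[Segment],
                   similar: Callable[[Segment, Segment], bool]) -> List[List[int]]:
    """Group segment indices so that each group is a run of pairwise-similar neighbours."""
    if not segments:
        return []
    groups = [[0]]
    for i in range(1, len(segments)):
        if similar(segments[i - 1], segments[i]):
            groups[-1].append(i)  # same category as the previous segment
        else:
            groups.append([i])    # start a new category
    return groups


def mean_close(a: Segment, b: Segment, tol: float = 5.0) -> bool:
    """Stand-in similarity predicate (NOT the H test): means differ by at most `tol`."""
    return abs(sum(a) / len(a) - sum(b) / len(b)) <= tol


# Hypothetical speed segments in km/h.
segments = [[60, 62, 61], [59, 63, 60], [30, 28, 33], [31, 29, 30], [58, 61, 60]]
groups = group_adjacent(segments, mean_close)
```

In this example the two free-flow segments, the two congested segments, and the final free-flow segment end up in three separate groups, each of which could then be combined under a preset combining rule.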

5. The method for processing data according to claim 2, characterized in that the step of setting the sample size comprises:

obtaining a set of candidate values of the sample size;

for each sample size value in the set of candidate values, obtaining the similarity between the time-period data sets corresponding to that sample size value;

obtaining, according to the similarity, a set of similarity counts between the time-period data sets corresponding to each sample size value;

setting the sample size value corresponding to the maximum value in the set of similarity counts as the sample size.
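The selection procedure of claim 5 can be sketched as follows, under the assumption that the "similarity count" for a candidate sample size is the number of adjacent segment pairs judged similar when the data is split with that size. The similarity predicate is passed in as a parameter; the demonstration uses a simple mean-difference stand-in rather than the H test. All names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Segment = Sequence[float]


def count_similar_pairs(samples: Sequence[float], sample_size: int,
                        similar: Callable[[Segment, Segment], bool]) -> int:
    """Count adjacent segment pairs judged similar for one candidate sample size."""
    # A trailing partial segment, if any, is dropped.
    segs = [samples[i:i + sample_size]
            for i in range(0, len(samples) - sample_size + 1, sample_size)]
    return sum(1 for a, b in zip(segs, segs[1:]) if similar(a, b))


def choose_sample_size(samples: Sequence[float], candidates: List[int],
                       similar: Callable[[Segment, Segment], bool]) -> int:
    """Return the candidate with the largest similarity count (first one wins on ties)."""
    counts: Dict[int, int] = {
        c: count_similar_pairs(samples, c, similar) for c in candidates
    }
    return max(candidates, key=lambda c: counts[c])


def mean_close(a: Segment, b: Segment, tol: float = 5.0) -> bool:
    """Stand-in similarity predicate (NOT the H test): means differ by at most `tol`."""
    return abs(sum(a) / len(a) - sum(b) / len(b)) <= tol


# Hypothetical data: six free-flow samples followed by six congested samples.
samples = [60.0] * 6 + [30.0] * 6
best = choose_sample_size(samples, [2, 3, 4], mean_close)
```

Note that a raw count tends to favour smaller sample sizes, because they produce more adjacent pairs; whether the patent normalises the count is not stated in the claim.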

6. A device for processing data, characterized by comprising:

a first obtaining unit, configured to obtain two or more data sets to be processed;

7. The device for processing data according to claim 6, characterized in that the first obtaining unit comprises:

a first obtaining subunit, configured to obtain pre-collected traffic flow data sets of two or more days; and a first setting unit, configured to set the traffic flow data sets of two or more days obtained by the first obtaining subunit as the two or more data sets to be processed; or

a second obtaining subunit, configured to obtain pre-collected traffic flow data of one day; a second dividing unit, configured to divide the one-day traffic flow data obtained by the second obtaining subunit into two or more equal-time-period data sets according to a preset sample size; and a second setting unit, configured to set the two or more equal-time-period data sets produced by the second dividing unit as the two or more data sets to be processed.

8. The device for processing data according to claim 7, characterized in that, when the two or more data sets to be processed are the traffic flow data sets of two or more days, the dividing unit comprises:

a first computing unit, configured to calculate correlation coefficients of the data sets having the same designated feature among the traffic flow data sets of two or more days obtained by the first obtaining subunit, to obtain two or more correlation coefficient sequences;

a first testing unit, configured to perform an H test on every two correlation coefficient sequences among the two or more correlation coefficient sequences calculated by the first computing unit, to obtain a first test result;

a second obtaining unit, configured to obtain the similarity between the traffic flow data sets of two or more days according to the first test result obtained by the first testing unit.

9. The device for processing data according to claim 7, characterized in that, when the two or more data sets to be processed are the two or more equal-time-period data sets, the dividing unit comprises:

a second testing unit, configured to perform an H test on every two adjacent time-period data sets among the two or more equal-time-period data sets obtained by the second obtaining subunit, to obtain a second test result;

a third obtaining unit, configured to obtain the similarity between the two or more equal-time-period data sets according to the second test result obtained by the second testing unit.

10. The device for processing data according to claim 7, characterized in that the second dividing unit comprises:

a fourth obtaining unit, configured to obtain a set of candidate values of the sample size;

a second computing unit, configured to calculate, for each sample size value in the set of candidate values, the similarity between the equal-time-period data sets corresponding to that sample size value;

a fifth obtaining unit, configured to obtain, according to the similarity calculated by the second computing unit, a set of similarity counts between the equal-time-period data sets corresponding to each sample size value;

a third setting unit, configured to set the sample size value corresponding to the maximum value in the set of similarity counts obtained by the fifth obtaining unit as the sample size.