CN101814112A - Method and device for processing data - Google Patents



Publication number
CN101814112A
Authority
CN
China
Prior art keywords
data
traffic flow
data acquisition
sample size
similarity
Legal status
Granted
Application number
CN201010033881A
Other languages
Chinese (zh)
Other versions
CN101814112B (en)
Inventor
付新刚
贾学力
李建军
Current Assignee
Beijing Cennavi Technologies Co Ltd
Original Assignee
Beijing Cennavi Technologies Co Ltd
Application filed by Beijing Cennavi Technologies Co Ltd
Priority to CN2010100338819A (CN101814112B)
Publication of CN101814112A
Priority to PCT/CN2010/079706 (WO2011082616A1)
Application granted
Publication of CN101814112B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures

Abstract

The embodiments of the invention disclose a method and device for processing data in the field of intelligent transportation. They address the problem that large volumes of vehicle speed data are difficult to store and manage and make subsequent work, such as vehicle speed analysis, complicated. The method comprises: obtaining two or more data sets to be processed; dividing the data sets to be processed into one or more categories according to previously obtained similarities between them; and merging two or more data sets belonging to the same category according to a preset merging rule. The invention can be applied in the technical field of compressing large amounts of data.

Description

Method and apparatus for processing data
Technical field
The present invention relates to the field of intelligent transportation, and in particular to a method and apparatus for processing data.
Background
Dynamic traffic information services are one of the core research directions of current intelligent transportation systems. Dynamic traffic information service technology requires the collection of large amounts of vehicle speed data. Analyzing and processing these data makes it possible to guide travelers intelligently in choosing routes and to improve the utilization efficiency of roads.
In the course of making the present invention, the inventors found that large volumes of vehicle speed data are not only difficult to store and manage, but also make subsequent tasks such as vehicle speed analysis complicated.
Summary of the invention
Embodiments of the invention provide a method and apparatus for processing data that can effectively compress the amount of data.
To achieve the above objective, the embodiments of the invention adopt the following technical solutions:
A method for processing data, comprising:
obtaining two or more data sets to be processed;
dividing the two or more data sets to be processed into one or more categories according to previously obtained similarities between them;
merging two or more data sets in the same category according to a preset merging rule.
An apparatus for processing data, comprising:
a first acquiring unit, configured to obtain two or more data sets to be processed;
a division unit, configured to divide the two or more data sets obtained by the first acquiring unit into one or more categories according to previously obtained similarities between them;
a merging unit, configured to merge, according to a preset merging rule, two or more data sets in the same category obtained by the division unit.
The method and apparatus provided by the embodiments of the invention obtain the similarities between the data sets to be processed, classify the data sets according to these similarities, and merge the data sets within each category. This effectively reduces the number of data sets, so that storing and managing the data is easier than with the unprocessed originals, and the reduced data volume also simplifies subsequent work. The method and apparatus therefore effectively compress the amount of data.
Brief description of the drawings
To describe the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for processing data according to an embodiment of the invention;
Fig. 2 is a flowchart of a method for processing data according to another embodiment of the invention;
Fig. 3 is a flowchart of the H test according to an embodiment of the invention;
Fig. 4 is a flowchart of a method for setting the sample size according to an embodiment of the invention;
Fig. 5 is a first schematic structural diagram of an apparatus for processing data according to an embodiment of the invention;
Fig. 6 is a schematic structural diagram of the first acquiring unit 501 shown in Fig. 5;
Fig. 7 is a schematic structural diagram of the division unit 502 shown in Fig. 5;
Fig. 8 is another schematic structural diagram of the division unit 502 shown in Fig. 5;
Fig. 9 is a schematic structural diagram of the second division unit 5014 shown in Fig. 6.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the invention.
To solve the problem that the large amount of vehicle speed data currently collected is difficult to store and manage and makes subsequent tasks such as vehicle speed analysis complicated, the embodiments of the invention provide a method and apparatus for processing data.
As shown in Fig. 1, the method for processing data provided by an embodiment of the invention comprises:
Step 101: obtain two or more data sets to be processed.
In this embodiment, the data sets to be processed may be the traffic flow data of several days extracted from a pre-built historical database, or the traffic flow data of several time periods obtained by dividing one day's traffic flow data into time periods.
Step 102: divide the two or more data sets to be processed into one or more categories according to previously obtained similarities between them.
In this embodiment, the similarity between two data sets is obtained with the H test, which consists of an F test and a T test, i.e., it checks whether the variances and the means of the two sets are equal. If the H test is passed, the two sets are similar and can be placed in the same category; otherwise they are dissimilar and do not belong to the same category.
Step 103: merge two or more data sets in the same category according to a preset merging rule.
In this embodiment, the traffic flow data sets of several similar days are merged into a single day's traffic flow data set by averaging; alternatively, the traffic flow data sets of several similar time periods within one day are merged into a single value by averaging, and this value represents the traffic flow over the whole merged period.
By obtaining the similarities between the data sets to be processed, classifying the data sets according to these similarities, and merging the data sets within each category, the method provided by this embodiment effectively reduces the number of data sets, making the data easier to store and manage than the unprocessed originals; the reduced data volume also simplifies subsequent work.
To help those skilled in the art understand the technical solutions more clearly, the method for processing data provided by another embodiment of the invention is described in detail below through a specific embodiment.
As shown in Fig. 2, the method for processing data provided by another embodiment of the invention comprises:
Step 201: obtain previously collected traffic flow data sets covering two or more days; these sets are the data sets to be processed.
In this embodiment, the traffic flow data in the historical database are first divided by characteristic day, and the traffic flow data sharing the same characteristic are merged into one day's traffic flow data; each day is then divided into time periods, and several time periods are merged into a single value. Alternatively, the division into time periods can be done first and the division by characteristic day afterwards, or only one of the two divisions can be applied to the data sets to be processed; these variants are not described one by one here. A characteristic day is a recurring factor in the historical data of the studied object that has a significant influence on traffic flow, such as holidays, the day of the week, or the weather. For a road, if "day of the week" is used as the characteristic day, there are seven kinds: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday. If "holiday" is used, there are three kinds: the National Day Golden Week (starting October 1), ordinary public holidays, and long consecutive holidays such as the Spring Festival. If "weather" is used, there are four kinds: sunny, overcast or rainy, heavy rain, and heavy snow.
From the established historical database, L links are drawn at random, together with M months of their historical traffic flow data. The sampling principles are: first, the links should cover every road class evenly and different geographic areas evenly, and each link should be longer than 200 meters; second, considering computational complexity and feasibility, L is usually 1% of the total number of links in the historical database and usually more than 100, and M is usually 12 months. Where computing resources allow, L and M can be increased appropriately. In this embodiment, "day of the week" is selected as the characteristic day, and 12 months of traffic flow data from 100 links are taken as the data sets to be processed.
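As a minimal sketch of Step 201 with "day of the week" as the characteristic day, the Python below groups one link's daily speed series by weekday. The input layout (a dict from `datetime.date` to that day's list of speed samples) and the function name `group_by_weekday` are assumptions for illustration, not part of the patent.

```python
from collections import defaultdict
from datetime import date

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def group_by_weekday(daily_series):
    """Group one link's daily speed series by day of the week.

    daily_series: hypothetical layout, dict of datetime.date -> list of
    speed samples for that day. Returns weekday name -> list of daily
    series in chronological order.
    """
    groups = defaultdict(list)
    for day in sorted(daily_series):
        # date.weekday() returns 0 for Monday ... 6 for Sunday
        groups[WEEKDAY_NAMES[day.weekday()]].append(daily_series[day])
    return dict(groups)


# Usage: 2009-01-05 and 2009-01-12 are Mondays, 2009-01-06 is a Tuesday.
history = {
    date(2009, 1, 5): [50, 52, 49],
    date(2009, 1, 6): [45, 47, 46],
    date(2009, 1, 12): [48, 51, 50],
}
by_day = group_by_weekday(history)
```

Each weekday group then holds the data sets that share the same characteristic, which is the input Step 202 works on.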
Step 202: calculate the correlation coefficients between the data sets that share the same characteristic within the traffic flow data sets of two or more days, obtaining two or more correlation coefficient series.
In this embodiment, the data of one link are processed first; here the shared characteristic is the same day of the week, for example all the Mondays of one link within the 12 months, and likewise for the other days. The correlation coefficient is computed between every two of the Monday data sets in these 12 months. For example, if there were 8 Monday sets, the correlation coefficient between every two of the 8 sets is computed, giving 8 × 7 / 2 = 28 values in total, and these 28 values form the correlation coefficient series for Monday. The series for Tuesday through Sunday are obtained in the same way, and the 7 series together form a correlation coefficient table. Specifically, the correlation coefficient between two sets is calculated as:
$$\rho_{xy} = \frac{\sum_{t=1}^{n} y_t x_t}{\sqrt{\sum_{t=1}^{n} y_t^2 \sum_{t=1}^{n} x_t^2}}$$
where n is the number of data points in a set, and x_t and y_t are the traffic flow data of the two days, i.e., the vehicle speed values over time within each day. The correlation coefficients calculated with this formula are listed in the following table:
Table 1:
Monday   Tuesday   Wednesday   Thursday   Friday   Saturday   Sunday
ρ_11     ρ_21      ρ_31        ρ_41       ρ_51     ρ_61       ρ_71
ρ_12     ρ_22      ρ_32        ρ_42       ρ_52     ρ_62       ρ_72
ρ_13     ρ_23      ρ_33        ρ_43       ρ_53     ρ_63       ρ_73
ρ_14     ρ_24      ρ_34        ρ_44       ρ_54     ρ_64       ρ_74
...      ...       ...         ...        ...      ...        ...
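The correlation formula and the pairwise construction of one weekday's series can be sketched in Python as follows. The formula above does not subtract the means, so it behaves like a cosine similarity rather than a Pearson coefficient; this sketch follows the formula as written. The function names `corr` and `correlation_series` are illustrative.

```python
import math
from itertools import combinations


def corr(x, y):
    """rho_xy = sum(y_t * x_t) / sqrt(sum(y_t^2) * sum(x_t^2)), as in the formula above."""
    if len(x) != len(y):
        raise ValueError("both days must have the same number of samples")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(b * b for b in y) * sum(a * a for a in x))
    return numerator / denominator


def correlation_series(day_sets):
    """Coefficients between every two sets of one weekday: k sets give k*(k-1)/2 values."""
    return [corr(a, b) for a, b in combinations(day_sets, 2)]


# Usage: 8 hypothetical Monday sets yield 28 coefficients, as in the Step 202 example.
mondays = [[50 + d, 48 + d, 30 + d] for d in range(8)]
monday_series = correlation_series(mondays)
```

Running `correlation_series` once per weekday gives the seven columns of Table 1.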
Step 203: perform the H test on every two of the two or more correlation coefficient series, obtaining the first test results.
In this embodiment, assuming that the data in each correlation coefficient series obtained in Step 202 follow a normal distribution, the H test can be used to judge the similarity between two correlation coefficient series. The H test comprises an F test and a T test: the F test judges whether the variances of the two series are equal, and the T test judges whether their means are equal. As shown in Fig. 3, the H test specifically comprises:
Step 301: read two correlation coefficient series.
Step 302: standardize the correlation coefficient series so that they follow the standard normal distribution.
In this embodiment, for convenience of notation, the values ρ_i obtained in Step 202 are denoted x_i and y_i. Suppose one correlation coefficient series is x ~ N(μ_1, σ_1²), and x_1, x_2, ..., x_{n_1} is a sample of x, i.e., the data of that series, where μ_1 is the mean of x and σ_1² its variance. The other series is y ~ N(μ_2, σ_2²), with sample y_1, y_2, ..., y_{n_2}, where μ_2 is the mean of y and σ_2² its variance. The two samples x and y are independent of each other. x and y can be standardized with the following formulas:
$$x_i' = \frac{x_i - \mu_1}{\sigma_1/\sqrt{n_1}} \sim N(0,1), \quad 1 \le i \le n_1$$
$$y_i' = \frac{y_i - \mu_2}{\sigma_2/\sqrt{n_2}} \sim N(0,1), \quad 1 \le i \le n_2$$
Because σ_1 and σ_2 are unknown in practice, they are replaced by the sample standard deviations S_1 and S_2:
$$S_1 = \sqrt{\frac{1}{n_1 - 1}\sum_{i=1}^{n_1}(x_i - \bar{x})^2}, \qquad S_2 = \sqrt{\frac{1}{n_2 - 1}\sum_{i=1}^{n_2}(y_i - \bar{y})^2}$$
where x̄ and ȳ are the means of x_1, ..., x_{n_1} and y_1, ..., y_{n_2}, respectively.
Step 303: perform the F test on the standardized correlation coefficient series.
Construct the F test statistic:
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1 - 1, n_2 - 1)$$
Let H_0: σ_1² = σ_2² and H_1: σ_1² ≠ σ_2².
When H_0 holds,
$$F = \frac{S_1^2}{S_2^2} \sim F(n_1 - 1, n_2 - 1)$$
so the rejection region of the F test is:
$$W = \{F \le F_{\alpha/2}(n_1 - 1, n_2 - 1)\} \cup \{F \ge F_{1-\alpha/2}(n_1 - 1, n_2 - 1)\}$$
If the test statistic falls in this rejection region, the variances of the two sets are not equal and the F test is not satisfied; the test result is output directly and the T test is not needed. In this embodiment, the output is set to "1" when the H test is not satisfied and to "0" when it is satisfied, so when the statistic falls in the above rejection region, the result 1 is output.
Step 304: when the F test is passed, perform the T test.
Construct the T test statistic:
$$T = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2)$$
Let H_0: μ_1 = μ_2 and H_1: μ_1 ≠ μ_2.
When H_0 holds, the rejection region of the T test is:
$$W = \left\{ |T| > t_{1-\alpha/2}(n_1 + n_2 - 2) \right\}$$
If the statistic falls in this rejection region, the means of the two sets are not equal and the T test is not satisfied. As above, the output is "1" when the H test is not satisfied and "0" when it is satisfied, so in this case the test result 1 is output.
In this embodiment, the H test is performed with the above steps on every two correlation coefficient series in Table 1 at significance level α = 0.05. The test result between series i and series j is denoted h_ij; h_ij is 0 or 1, and h_ij = h_ji.
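The H test of Steps 301 to 304 can be sketched in Python. To stay within the standard library, this sketch computes two-sided p-values from the regularized incomplete beta function (evaluated with the standard continued fraction) instead of looking up critical values of the F and t distributions; rejecting when the p-value is below α is equivalent to testing whether the statistic falls in the rejection region. The names `h_test` and `betainc_reg` are illustrative, and the sketch assumes each sample has at least two values and non-zero variance. Where SciPy is available, `scipy.stats.f` and `scipy.stats.t` could supply the quantiles instead.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step, then odd step of the continued fraction
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def h_test(x, y, alpha=0.05):
    """Return 0 if x and y pass both the F test and the T test, otherwise 1."""
    n1, n2 = len(x), len(y)
    s1_sq, s2_sq = statistics.variance(x), statistics.variance(y)
    if s1_sq == 0 or s2_sq == 0:
        raise ValueError("this sketch needs non-zero sample variances")
    # F test, H0: sigma1^2 == sigma2^2; CDF of F(d1, d2) via I_x(d1/2, d2/2)
    d1, d2 = n1 - 1, n2 - 1
    f_stat = s1_sq / s2_sq
    f_cdf = betainc_reg(d1 / 2.0, d2 / 2.0, d1 * f_stat / (d1 * f_stat + d2))
    if 2.0 * min(f_cdf, 1.0 - f_cdf) < alpha:
        return 1  # variances differ: output 1 without running the T test
    # T test with pooled variance, H0: mu1 == mu2; two-sided p-value of t(df)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    t_stat = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(
        pooled * (1.0 / n1 + 1.0 / n2))
    p_value = betainc_reg(df / 2.0, 0.5, df / (df + t_stat * t_stat))
    return 1 if p_value < alpha else 0
```

For example, `h_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])` passes the F test (equal variances) but fails the T test, so it returns 1.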
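The test results are summarized in Table 2:
Table 2:
[Table 2: matrix of pairwise H test results h_ij between the Monday to Sunday correlation coefficient series for one link; table image not available]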
The table above gives the test results for one link. For each of the 100 links drawn in Step 201, the H test is performed by the method of Steps 202 to 203, yielding a result table like Table 2 for every link. Because the results may differ from link to link, all of them are analyzed together and aggregated into the results shown in Table 3:
Table 3:
Share of all tests   Mon-Tue   Mon-Wed   Mon-Thu   ...   Fri-Sun   Sat-Sun
H test result 0      a_12      a_13      a_14      ...   a_57      a_67
H test result 1      b_12      b_13      b_14      ...   b_57      b_67
If a_ij > 0.7, the percentage of tests with H test result 0 can be considered much larger than the percentage with result 1, i.e., column i and column j (day i and day j) are similar.
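Aggregating the per-link results into the shares a_ij of Table 3 and applying the 0.7 threshold can be sketched as below. The input layout (one k × k matrix of 0/1 results per link, with days numbered from 0 for Monday) and the function name `aggregate_similarity` are assumptions for illustration; b_ij is simply 1 − a_ij.

```python
from itertools import combinations


def aggregate_similarity(per_link_h, threshold=0.7):
    """Share of links whose H test result is 0 for each pair of days.

    per_link_h: list of k x k matrices (one per link) with h[i][j] in {0, 1}.
    Returns (shares, similar): shares[(i, j)] is a_ij for i < j, and
    similar is the set of pairs with a_ij > threshold.
    """
    k = len(per_link_h[0])
    shares = {}
    for i, j in combinations(range(k), 2):
        zeros = sum(1 for h in per_link_h if h[i][j] == 0)
        shares[(i, j)] = zeros / len(per_link_h)
    similar = {pair for pair, a in shares.items() if a > threshold}
    return shares, similar


# Usage with 3 days and 10 hypothetical links: days 0 and 1 pass on 8 links.
passing = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
failing = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
shares, similar = aggregate_similarity([passing] * 8 + [failing] * 2)
```

Here a_01 = 0.8 exceeds the 0.7 threshold, so only days 0 and 1 are reported as similar.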
Step 204: obtain the similarities between the traffic flow data sets of the two or more days according to the H test results.
Tables 2 and 3 show clearly which pairs of days have similar data sets. For example, if a_23 > 0.7, the data sets of Tuesday and Wednesday are similar; if a_12 ≤ 0.7, the data sets of Monday and Tuesday are not similar.
Step 205: according to the similarities between the traffic flow data sets of the two or more days, divide these data sets into one or more categories.
In this embodiment, two data sets that are similar are placed in the same category. Suppose that in Table 3, a_15 > 0.7, a_23 > 0.7, a_34 > 0.7 and a_67 > 0.7; then the data of the seven days can be divided into three categories: Monday and Friday; Tuesday, Wednesday and Thursday; and Saturday and Sunday.
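One way to turn the similar pairs into categories is to treat each similar pair as an edge and take connected components, sketched below with a small union-find. This assumes similarity chains are transitive (Tuesday and Thursday end up together through Wednesday, as in the example above, even though a_24 is not stated); the function name `group_categories` is illustrative.

```python
def group_categories(k, similar_pairs):
    """Connected components of k items (0 = Monday ... 6 = Sunday) under the similar pairs."""
    parent = list(range(k))

    def find(i):
        # follow parent links to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in similar_pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    groups = {}
    for i in range(k):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Usage: a_15, a_23, a_34 and a_67 > 0.7 (1-based days) written as 0-based pairs.
categories = group_categories(7, [(0, 4), (1, 2), (2, 3), (5, 6)])
```

The result reproduces the three categories of the example: Monday and Friday; Tuesday to Thursday; Saturday and Sunday.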
Step 206: merge the traffic flow data sets of two or more days in the same category into one day's traffic flow data according to a preset merging rule.
In this embodiment, the rule for merging several days' traffic flow data into one day's data is to take the mean value at each corresponding instant. For example, the traffic flow data of Tuesday, Wednesday and Thursday at 12:00 noon are merged by taking the mean of the three days' 12:00 values. The mean values at the other instants are calculated in the same way, and together these means constitute the merged traffic flow data.
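The averaging rule of Step 206 amounts to an element-wise mean over the days in one category. The sketch below assumes every day has the same sampling instants (for example, 288 five-minute samples); the function name `merge_days` is illustrative.

```python
def merge_days(day_series):
    """Merge several days into one by averaging the values at each sampling instant."""
    length = len(day_series[0])
    if any(len(series) != length for series in day_series):
        raise ValueError("all days must have the same sampling instants")
    # zip(*...) groups the values of all days at the same instant
    return [sum(values) / len(values) for values in zip(*day_series)]


# Usage: Tuesday, Wednesday and Thursday (two instants each) merged into one day.
merged = merge_days([[30, 60], [40, 50], [50, 40]])
```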
Step 207: obtain one day's traffic flow data.
In this embodiment, the one day's traffic flow data are the original traffic flow data after merging by "day of the week". Alternatively, data sets that have not been merged by characteristic day can be taken directly from the historical database; this is not repeated here.
Step 208: divide the one day's traffic flow data into two or more equal-length time-segment data sets according to a preset sample size.
In this embodiment, a time span is first set for dividing one day's traffic flow data, and the number of data points within this time span is the sample size. As shown in Fig. 4, setting the sample size specifically comprises:
Step 401: obtain the value set of the sample size.
In this embodiment, the sampling interval is 5 minutes, so one day contains 288 traffic flow data points. If each sampling instant is taken as one time span, the sample size is 1 and one day's data can be divided into 288 consecutive time segments; if the sample size is 2, the data can be divided into 287 consecutive time segments, and so on. With sample size n, one day's data can be divided into 288 - (n - 1) consecutive time segments. In theory the sample size could be as large as 287, which divides one day's data into 2 consecutive time segments. In practice, however, a value of n greater than half the number of data points in a day is meaningless. The value set of the sample size is therefore {n | 1 ≤ n ≤ N/2}, where N is the number of traffic flow data points in a day (288 in this embodiment), and n and N are integers.
Step 402: for each sample size value in the value set, obtain the similarities between the time-segment data sets corresponding to that value.
In this embodiment, n takes the values 1 to 144. For example, when n = 1, T_1 = {x_1}, T_2 = {x_2}, T_3 = {x_3}, ..., T_288 = {x_288}; when n = 2, T_1 = {x_1, x_2}, T_2 = {x_2, x_3}, ..., T_287 = {x_287, x_288}. The general form of T_i is:
$$T_i = \{x_i, x_{i+1}, \ldots, x_{i+n-1}\}, \qquad T_{i+1} = \{x_{i+1}, x_{i+2}, \ldots, x_{i+n}\}$$
where 1 ≤ i ≤ N - n. The H test is performed on T_i and T_{i+1}, and the test result h_{i(i+1)n} is recorded. When i has run from 1 to N - n, the H tests for this n are complete, and the results h_{i(i+1)n} are the similarities between the equal-length time-segment data sets corresponding to this sample size value.
Step 403: according to the similarities, obtain the set of counts of similar time-segment data sets for each sample size value.
In this embodiment, for each value of n, the corresponding results h_{i(i+1)n} are collected and the number of results with h_{i(i+1)n} = 0 is counted. For example, for n = 1 this number is s_1; for n = 2 it is s_2; ...; for n = 144 it is s_144. These counts form the set {s_1, s_2, ..., s_144}.
Step 404: set the sample size to the sample size value corresponding to the largest count in this set.
In this embodiment, the maximum of the set {s_1, s_2, ..., s_144} is taken, denoted s_max = max{s_1, s_2, ..., s_144}, and the value of n corresponding to s_max is used as the sample size.
It should be noted that the sample size can also be chosen on the principle of keeping it as small as possible. As n varies from 1 to N/2, s usually first increases and then decreases; it may rise again afterwards, but it eventually decreases. Therefore, the sample size at the first turning point can be selected; this sample size is small, typically 3 or 4.
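Steps 401 to 404 can be sketched as a search over n: slide a window of n samples across the day, test each window T_i against T_{i+1}, count the results equal to 0 as s_n, and keep the n with the largest count (the first one on ties). The H test is passed in as a callable so any implementation can be plugged in; the toy `means_close` test, the function names, and the `first_turning_point` helper for the alternative rule are hypothetical. A variance-based test such as the F test needs at least two samples per window, so with a real H test the search would in practice start at n = 2.

```python
from statistics import mean


def similar_counts(day, h_test, max_n=None):
    """s_n for n = 1 .. max_n (default N // 2): adjacent windows T_i, T_{i+1} with result 0."""
    N = len(day)
    max_n = max_n or N // 2
    counts = {}
    for n in range(1, max_n + 1):
        # overlapping windows T_1 .. T_{N-n+1}, each of n consecutive samples
        windows = [day[i:i + n] for i in range(N - n + 1)]
        counts[n] = sum(1 for a, b in zip(windows, windows[1:]) if h_test(a, b) == 0)
    return counts


def best_sample_size(counts):
    """Step 404: the n whose count is largest (max() keeps the first n on ties)."""
    return max(counts, key=counts.get)


def first_turning_point(counts):
    """Alternative rule: the last n before the counts first decrease."""
    ns = sorted(counts)
    for n, nxt in zip(ns, ns[1:]):
        if counts[nxt] < counts[n]:
            return n
    return ns[-1]


def means_close(a, b):
    # hypothetical stand-in for the H test: 0 (similar) if the means differ by at most 1
    return 0 if abs(mean(a) - mean(b)) <= 1 else 1


# Usage on a toy day of N = 8 samples
counts = similar_counts([10, 10, 10, 10, 20, 20, 20, 20], means_close)
```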
Step 209: perform the H test on every two adjacent time-period data sets among the two or more equal-length time-segment data sets, obtaining the second test results.
In this embodiment, suppose the sample size obtained by the method of Step 208 is 3; then Y_1 = (x_1, x_2, x_3), Y_2 = (x_4, x_5, x_6), Y_3 = (x_7, x_8, x_9), ..., Y_96 = (x_286, x_287, x_288). For a general sample size n: Y_a = (x_i, x_{i+1}, ..., x_{i+n-1}) and Y_b = (x_{i+n}, x_{i+n+1}, ..., x_{i+2n-1}), with i = 1 + nj and 0 ≤ j ≤ ⌈N/n⌉ - 2, where ⌈·⌉ denotes rounding toward positive infinity (the ceiling). As i varies, the H test is performed on each pair Y_a and Y_b, giving the test result h_i.
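The indexing of Step 209 (non-overlapping blocks of n samples, with j running from 0 to ⌈N/n⌉ - 2) can be sketched as follows. The H test is again a callable argument, and the function name `adjacent_block_tests` and the (i, h_i) output format are illustrative assumptions. When N is not a multiple of n, the last Y_b in this sketch is simply shorter.

```python
import math


def adjacent_block_tests(day, n, h_test):
    """Test each pair Y_a = (x_i .. x_{i+n-1}), Y_b = (x_{i+n} .. x_{i+2n-1}).

    Returns a list of (i, h_i), where i = 1 + n*j is the 1-based index of
    the first sample of Y_a and 0 <= j <= ceil(N / n) - 2.
    """
    N = len(day)
    results = []
    for j in range(math.ceil(N / n) - 1):
        i = 1 + n * j
        y_a = day[i - 1:i - 1 + n]  # convert 1-based i to 0-based slices
        y_b = day[i - 1 + n:i - 1 + 2 * n]
        results.append((i, h_test(y_a, y_b)))
    return results


# Usage: N = 288 samples and n = 3 give blocks Y_1 .. Y_96 and 95 adjacent pairs;
# a placeholder test that always returns 0 stands in for the H test here.
pairs = adjacent_block_tests(list(range(288)), 3, lambda a, b: 0)
```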
Step 210: obtain the similarities between the two or more equal-length time-segment data sets according to the second test results.
In this embodiment, h_i = 0 indicates that the data sets of the two time periods are similar; h_i = 1 indicates that they are dissimilar.
Step 211: according to the similarities between the equal-length time-segment data sets, divide them into one or more categories.
In this embodiment, two data sets that are similar are placed in the same category. Note that the data sets in a category are consecutive in time.
Step 212: merge two or more equal-length time-segment data sets in the same category into a single value according to a preset merging rule.
In this embodiment, the rule for merging the data sets of several time periods into a single value is to take the mean of all data in those time periods. For example, the traffic flow data from 0:00 to 6:00 are merged by taking the mean of all data between 0:00 and 6:00. Other time periods are merged in the same way and are not described again.
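Steps 210 to 212 can be sketched as merging runs of consecutive blocks whose adjacent test results are 0, with each merged segment represented by the mean of all its samples. This assumes, following the note that a category is consecutive in time, that only time-adjacent blocks are merged; the function name `merge_similar_runs` and its output format are hypothetical.

```python
def merge_similar_runs(blocks, h):
    """blocks: consecutive time-segment data sets; h[k]: result between blocks k and k+1.

    Returns (first_block, last_block, mean) for each merged segment.
    """
    if len(h) != len(blocks) - 1:
        raise ValueError("need one test result per adjacent pair of blocks")
    segments = []
    start = 0
    for k in range(len(blocks)):
        # a run ends at the last block or where the next pair is dissimilar
        if k == len(blocks) - 1 or h[k] != 0:
            samples = [v for block in blocks[start:k + 1] for v in block]
            segments.append((start, k, sum(samples) / len(samples)))
            start = k + 1
    return segments


# Usage: two similar blocks, a dissimilar boundary, then two more similar blocks.
segments = merge_similar_runs([[60, 62], [61, 59], [30, 32], [31, 29]], [0, 1, 0])
```

Four blocks are thus stored as two values, which is the compression the embodiment aims for.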
By obtaining the similarities between the data sets to be processed, classifying the data sets according to these similarities, and merging the data sets within each category, the method provided by this embodiment effectively reduces the number of data sets, making the data easier to store and manage than the unprocessed originals; the reduced data volume also simplifies subsequent work.
As shown in Fig. 5, an embodiment of the invention further provides an apparatus for processing data, comprising:
a first acquiring unit 501, configured to obtain two or more data sets to be processed;
a division unit 502, configured to divide the two or more data sets obtained by the first acquiring unit 501 into one or more categories according to previously obtained similarities between them;
a merging unit 503, configured to merge, according to a preset merging rule, two or more data sets in the same category obtained by the division unit 502.
Further, as shown in Fig. 6, the first acquiring unit 501 comprises:
a first obtaining subunit 5011, configured to obtain previously collected traffic flow data sets covering two or more days; and a first setting unit 5012, configured to set the traffic flow data sets of two or more days obtained by the first obtaining subunit 5011 as the two or more data sets to be processed; or,
a second obtaining subunit 5013, configured to obtain one day's previously collected traffic flow data; a second division unit 5014, configured to divide the one day's traffic flow data obtained by the second obtaining subunit 5013 into two or more equal-length time-segment data sets according to a preset sample size; and a second setting unit 5015, configured to set the two or more equal-length time-segment data sets produced by the second division unit 5014 as the two or more data sets to be processed.
Further, as shown in Fig. 7, when the two or more data sets to be processed are the traffic flow data sets of two or more days, the division unit 502 comprises:
a first calculating unit 5021, configured to calculate the correlation coefficients between the data sets sharing the same characteristic within the traffic flow data sets of two or more days obtained by the first obtaining subunit 5011, obtaining two or more correlation coefficient series;
a first test unit 5022, configured to perform the H test on every two of the correlation coefficient series calculated by the first calculating unit 5021, obtaining the first test results;
a second acquiring unit 5023, configured to obtain the similarities between the traffic flow data sets of the two or more days according to the first test results obtained by the first test unit 5022.
Further, as shown in Fig. 8, when the two or more data sets to be processed are the two or more equal-length time-segment data sets, the division unit 502 comprises:
a second test unit 5024, configured to perform the H test on every two adjacent time-period data sets among the equal-length time-segment data sets obtained through the second obtaining subunit 5013, obtaining the second test results;
a third acquiring unit 5025, configured to obtain the similarities between the equal-length time-segment data sets according to the second test results obtained by the second test unit 5024.
Further, as shown in Fig. 9, the second division unit 5014 comprises:
a fourth acquiring unit 601, configured to obtain the value set of the sample size;
a second calculating unit 602, configured to calculate, for each sample size value in the value set, the similarities between the equal-length time-segment data sets corresponding to that value;
a fifth acquiring unit 603, configured to obtain, according to the similarities calculated by the second calculating unit 602, the set of counts of similar time-segment data sets for each sample size value;
a third setting unit 604, configured to set the sample size to the sample size value corresponding to the largest count in the set obtained by the fifth acquiring unit 603.
For the specific implementation of the above apparatus, refer to Steps 201 to 212 shown in Fig. 2 and the steps shown in Fig. 4; details are not repeated here.
The device of the deal with data that the embodiment of the invention provides, by obtaining the similarity between the pending data acquisition, according to described similarity with pending data qualification, and the data acquisition in the same classification merged, effectively reduced the quantity of data acquisition, it is easier when unprocessed than original to make for the storage of data and management; Because the minimizing of data volume makes follow-up work also become simpler.
The technical solution provided by the present invention can be applied in the technical field of compressing massive amounts of data.
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.

Claims (10)

1. A method for processing data, characterized by comprising:
obtaining two or more data sets to be processed;
dividing the two or more data sets to be processed into one or more classes according to a similarity, obtained in advance, between the two or more data sets to be processed;
merging, according to a preset merging rule, two or more data sets in the same class among the one or more classes.
2. The method for processing data according to claim 1, characterized in that obtaining the two or more data sets to be processed comprises:
obtaining traffic flow data sets for two or more days, collected in advance, and setting the traffic flow data sets for the two or more days as the two or more data sets to be processed; or
obtaining one day of traffic flow data collected in advance, dividing the one day of traffic flow data into two or more equal-duration time-segment data sets according to a preset sample size, and setting the two or more equal-duration time-segment data sets as the two or more data sets to be processed.
3. The method for processing data according to claim 2, characterized in that, when the two or more data sets to be processed are the traffic flow data sets for two or more days, the step of obtaining the similarity between the two or more data sets to be processed comprises:
calculating correlation coefficients of the data sets that share the same specified feature within the traffic flow data sets for the two or more days, to obtain two or more correlation coefficient sequences;
performing an H test on every pair of correlation coefficient sequences among the two or more correlation coefficient sequences, to obtain a first test result;
obtaining the similarity between the traffic flow data sets for the two or more days according to the first test result.
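Claim 3 does not fix which data sets share "the same specified feature" or how the correlation coefficients are paired. One possible reading, used here purely as an assumption: each day holds several equal-length series sharing the feature (for example, detectors on the same class of road; hypothetical), the day's correlation sequence is the Pearson coefficient of every pair of those series, and two days are similar when a Kruskal-Wallis H test on their sequences does not reject at the 0.05 level. All function names are illustrative.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

CHI2_CRIT_DF1_ALPHA05 = 3.841  # assumed: chi-square, 1 d.o.f., alpha = 0.05


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def correlation_sequence(series: List[Sequence[float]]) -> List[float]:
    """Assumed pairing: Pearson r for every pair of series recorded on one day."""
    return [pearson(a, b) for a, b in combinations(series, 2)]


def kruskal_wallis_h(*groups: Sequence[float]) -> float:
    """Kruskal-Wallis H with tie correction (0.0 if every value is tied)."""
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n, rank_sums, tie_term, i = len(pooled), [0.0] * len(groups), 0.0, 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for _, g in pooled[i:j + 1]:
            rank_sums[g] += (i + j) / 2 + 1  # average 1-based rank of the tie run
        tie_term += (j - i + 1) ** 3 - (j - i + 1)
        i = j + 1
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(grp) for r, grp in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def similar_day_pairs(days: Dict[str, List[Sequence[float]]]) -> Dict[Tuple[str, str], bool]:
    """H-test the correlation sequences of every pair of days."""
    seqs = {day: correlation_sequence(s) for day, s in days.items()}
    return {
        (a, b): kruskal_wallis_h(seqs[a], seqs[b]) <= CHI2_CRIT_DF1_ALPHA05
        for a, b in combinations(seqs, 2)
    }


days = {
    "wed": [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]],  # every pair: r = 1
    "thu": [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]],
    "sun": [[1, 2, 3], [3, 2, 1], [1, 2, 3], [3, 2, 1]],   # mix of r = 1 and r = -1
}
print(similar_day_pairs(days))
# {('wed', 'thu'): True, ('wed', 'sun'): False, ('thu', 'sun'): False}
```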
4. The method for processing data according to claim 2, characterized in that, when the two or more data sets to be processed are the two or more equal-duration time-segment data sets, the step of obtaining the similarity between the two or more data sets to be processed comprises:
performing an H test on every two adjacent time-segment data sets among the two or more equal-duration time-segment data sets, to obtain a second test result;
obtaining the similarity between the two or more equal-duration time-segment data sets according to the second test result.
5. The method for processing data according to claim 2, characterized in that the step of setting the sample size comprises:
obtaining a set of candidate values for the sample size;
when the sample size takes each sample size value in the value set, obtaining the similarity between the time-segment data sets corresponding to that sample size value;
obtaining, according to the similarity, the set of similarity counts for the time-segment data sets corresponding to each sample size value;
setting as the sample size the sample size value that corresponds to the largest count in the set of similarity counts.
6. A device for processing data, characterized by comprising:
a first acquiring unit, configured to obtain two or more data sets to be processed;
a division unit, configured to divide the two or more data sets to be processed, obtained by the first acquiring unit, into one or more classes according to a similarity, obtained in advance, between the two or more data sets to be processed;
a merging unit, configured to merge, according to a preset merging rule, two or more data sets in the same class among the one or more classes obtained by the division unit.
7. The device for processing data according to claim 6, characterized in that the first acquiring unit comprises:
a first obtaining subunit, configured to obtain traffic flow data sets for two or more days, collected in advance; and a first setting unit, configured to set the traffic flow data sets for the two or more days, obtained by the first obtaining subunit, as the two or more data sets to be processed; or
a second obtaining subunit, configured to obtain one day of traffic flow data collected in advance; a second division unit, configured to divide the one day of traffic flow data obtained by the second obtaining subunit into two or more equal-duration time-segment data sets according to a preset sample size; and a second setting unit, configured to set the two or more equal-duration time-segment data sets produced by the second division unit as the two or more data sets to be processed.
8. The device for processing data according to claim 7, characterized in that, when the two or more data sets to be processed are the traffic flow data sets for two or more days, the division unit comprises:
a first computing unit, configured to calculate correlation coefficients of the data sets that share the same specified feature within the traffic flow data sets for the two or more days obtained by the first obtaining subunit, to obtain two or more correlation coefficient sequences;
a first test unit, configured to perform an H test on every pair of correlation coefficient sequences among the two or more correlation coefficient sequences calculated by the first computing unit, to obtain a first test result;
a second acquiring unit, configured to obtain the similarity between the traffic flow data sets for the two or more days according to the first test result obtained by the first test unit.
9. The device for processing data according to claim 7, characterized in that, when the two or more data sets to be processed are the two or more equal-duration time-segment data sets, the division unit comprises:
a second test unit, configured to perform an H test on every two adjacent time-segment data sets among the two or more equal-duration time-segment data sets obtained by the second obtaining subunit, to obtain a second test result;
a third acquiring unit, configured to obtain the similarity between the two or more equal-duration time-segment data sets according to the second test result obtained by the second test unit.
10. The device for processing data according to claim 7, characterized in that the second division unit comprises:
a fourth acquiring unit, configured to obtain the set of candidate values for the sample size;
a second computing unit, configured to calculate, when the sample size takes each value in the value set, the similarity between the equal-duration time-segment data sets corresponding to that sample size value;
a fifth acquiring unit, configured to obtain, according to the similarity calculated by the second computing unit, the set of similarity counts for the equal-duration time-segment data sets corresponding to each sample size value;
a third setting unit, configured to set as the sample size the sample size value that corresponds to the largest count in the set of similarity counts obtained by the fifth acquiring unit.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN2010100338819A CN101814112B (en) 2010-01-11 2010-01-11 Method and device for processing data
PCT/CN2010/079706 WO2011082616A1 (en) 2010-01-11 2010-12-13 Method and device for processing data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010100338819A CN101814112B (en) 2010-01-11 2010-01-11 Method and device for processing data

Publications (2)

Publication Number Publication Date
CN101814112A true CN101814112A (en) 2010-08-25
CN101814112B CN101814112B (en) 2012-05-23

Family

ID=42621364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010100338819A Active CN101814112B (en) 2010-01-11 2010-01-11 Method and device for processing data

Country Status (2)

Country Link
CN (1) CN101814112B (en)
WO (1) WO2011082616A1 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100762629B1 (en) * 2003-08-26 2007-10-01 삼성전자주식회사 Method for processing back-up service of mobile terminal
JP4784322B2 (en) * 2006-01-31 2011-10-05 ソニー株式会社 Image processing device
CN101296373B (en) * 2007-04-27 2011-11-23 北京信心晟通科技发展有限公司 Multimedia data processing system and method based on material exchange format
CN100570664C (en) * 2008-01-11 2009-12-16 孟小峰 A kind of system and method thereof of coming the monitoring and controlling traffic congestion based on cluster
CN101309125B (en) * 2008-07-10 2011-04-06 浙江大学 Multimedia data transmission method of concurrent access of multiple threads
CN101814112B (en) * 2010-01-11 2012-05-23 北京世纪高通科技有限公司 Method and device for processing data

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011082616A1 (en) * 2010-01-11 2011-07-14 北京世纪高通科技有限公司 Method and device for processing data
CN101950483A (en) * 2010-09-15 2011-01-19 青岛海信网络科技股份有限公司 Repairing method and device for traffic data fault
CN101950483B (en) * 2010-09-15 2013-03-20 青岛海信网络科技股份有限公司 Repairing method and device for traffic data fault
CN101982820B (en) * 2010-11-22 2011-12-07 北京航空航天大学 Curve display and inquiry method for large data quantity
CN101982820A (en) * 2010-11-22 2011-03-02 北京航空航天大学 Curve display and inquiry method for large data quantity
CN103366017A (en) * 2013-08-02 2013-10-23 人民搜索网络股份公司 Microblog information capturing method and device
CN103366017B (en) * 2013-08-02 2016-11-23 人民搜索网络股份公司 A kind of micro-blog information grasping means and device
CN104679970B (en) * 2013-11-29 2018-11-09 高德软件有限公司 A kind of data detection method and device
CN104679970A (en) * 2013-11-29 2015-06-03 高德软件有限公司 Data detection method and device
CN104699056B (en) * 2015-02-13 2017-03-15 北京金控数据技术股份有限公司 A kind of method is monitored by sewage treatment process unit runnability
CN104699056A (en) * 2015-02-13 2015-06-10 北京金控自动化技术有限公司 Sewage treatment process unit running performance monitoring method
CN106407215A (en) * 2015-07-31 2017-02-15 阿里巴巴集团控股有限公司 Data processing method and device
US10535166B2 (en) 2016-02-29 2020-01-14 Shanghai United Imaging Healthcare Co., Ltd. System and method for reconstructing ECT image
US11557067B2 (en) 2016-02-29 2023-01-17 Shanghai United Imaging Healthcare Co., Ltd. System and method for reconstructing ECT image
CN106251381B (en) * 2016-07-29 2020-02-04 上海联影医疗科技有限公司 Image reconstruction method
CN106251381A (en) * 2016-07-29 2016-12-21 上海联影医疗科技有限公司 Image rebuilding method
WO2018099089A1 (en) * 2016-11-29 2018-06-07 华为技术有限公司 Method and device for recognizing stationary state
CN106850336A (en) * 2016-12-28 2017-06-13 中国科学院信息工程研究所 The data stream merging method and service end of a kind of monitoring system
CN106850336B (en) * 2016-12-28 2019-12-03 中国科学院信息工程研究所 A kind of the data stream merging method and server-side of monitoring system
CN107305209B (en) * 2017-01-24 2019-08-06 浙江农林大学 Volatility based on LEIF model poisons reagent leak hunting method
CN106970180B (en) * 2017-01-24 2019-06-25 浙江农林大学 Poison reagent leakage monitoring method
CN107305209A (en) * 2017-01-24 2017-10-31 浙江农林大学 Volatility based on LEIF models poisons reagent leak hunting method
CN106970180A (en) * 2017-01-24 2017-07-21 浙江农林大学 Poison reagent leakage monitoring method
CN112347113A (en) * 2020-09-16 2021-02-09 北京中兵数字科技集团有限公司 Aviation data fusion method, aviation data fusion device and storage medium
CN112347113B (en) * 2020-09-16 2021-12-14 北京中兵数字科技集团有限公司 Aviation data fusion method, aviation data fusion device and storage medium

Also Published As

Publication number Publication date
CN101814112B (en) 2012-05-23
WO2011082616A1 (en) 2011-07-14

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant