Detailed Description of the Embodiments
In one embodiment, as shown in Figure 1, a data collection system includes a statistical server and multiple sampling servers connected to the statistical server. Each sampling server acquires data according to a preset sampling period (for example, hourly or daily). The time at which the data is collected is the generation time of the acquisition data (at this point the acquisition data is generated by the sampling server through data collection; therefore, relative to the statistical server, this time is referred to as the generation time). As shown in Figure 1, a sampling server may upload acquisition data to the statistical server at irregular times. The time at which the statistical server receives the acquisition data is the acquisition time of the acquisition data (at this point the acquisition data has been uploaded by the sampling server and successfully acquired by the statistical server; therefore, relative to the statistical server, this time is referred to as the acquisition time).
In the present embodiment, as shown in Fig. 2, a storage method for data acquisition is provided. The method relies entirely on a computer program, which can run on the above statistical server based on the von Neumann architecture. The method includes:
Step S102: obtain acquisition data, and obtain the generation time and the acquisition time of the acquisition data.
In the present embodiment, the statistical server can obtain the acquisition data by receiving the acquisition data uploaded by a sampling server. After the sampling server collects acquisition data in each preset time interval, that is, each sampling period, the data can be stored as an independent file, and the creation time of that file is the generation time of the acquisition data. When the statistical server receives the acquisition data, it can record the time of receipt, and the recorded time is the acquisition time of the acquisition data.
Step S104: obtain an offset value by calculating the difference between the generation time and the acquisition time.
The offset value is the number of sampling periods between the acquisition time and the generation time. For example, if the sampling period is one day, the generation time is August 1, 2013, and the acquisition time is August 4, 2013, then the offset value is 3.
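The calculation in step S104 can be sketched as follows. This is a minimal illustration, assuming both times are reduced to calendar dates and the sampling period is a whole number of days; the function name `offset_value` and the `period_days` parameter are hypothetical and do not appear in the embodiment.

```python
from datetime import date


def offset_value(generation: date, acquisition: date, period_days: int = 1) -> int:
    """Return the number of whole sampling periods between generation and acquisition.

    Assumption: the sampling period is expressed in days (hypothetical parameter).
    """
    if acquisition < generation:
        raise ValueError("acquisition time precedes generation time")
    return (acquisition - generation).days // period_days


# Example from the text: generated on August 1, 2013, acquired on August 4, 2013.
print(offset_value(date(2013, 8, 1), date(2013, 8, 4)))  # 3
```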
Step S106: obtain a preset offset threshold and judge whether the offset value is less than the offset threshold; if so, execute step S108:
Obtain the centralized storage directory corresponding to the acquisition data, obtain the generation time subdirectory corresponding to the generation time under the centralized storage directory, obtain the offset subdirectory corresponding to the offset value under the generation time subdirectory, and store the acquisition data in the offset subdirectory.
In the present embodiment, if the offset value is greater than or equal to the offset threshold, step S110 is executed:
Obtain the decentralized storage directory corresponding to the acquisition data, obtain the acquisition time subdirectory corresponding to the acquisition time under the decentralized storage directory, and store the acquisition data in the acquisition time subdirectory.
The centralized storage directory and the decentralized storage directory are two directories in the file system of the statistical server. Preferably, the centralized storage directory and the decentralized storage directory are located under the same type directory. The acquisition data can be classified in advance, and acquisition data of the same data type is stored in the centralized storage directory or the decentralized storage directory under the same type directory. That is, the data type of the acquisition data can be obtained; the type directory corresponding to the data type is obtained; and the centralized storage directory or decentralized storage directory under the type directory is obtained.
For example, the data type of the acquisition data can be determined according to the format of the acquisition data. Acquisition data of the picture type can be stored under a picture directory, and acquisition data of the video type can be stored under a video directory. Correspondingly, a centralized storage directory and a decentralized storage directory can each be established under the picture directory and under the video directory.
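One way to picture the mapping from data format to type directory is the sketch below. It assumes the format is inferred from the file extension; the extension table `TYPE_DIRECTORIES` and the function `type_directory` are hypothetical and only illustrate the idea.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to type directory.
TYPE_DIRECTORIES = {
    ".jpg": "picture",
    ".png": "picture",
    ".mp4": "video",
    ".avi": "video",
}


def type_directory(filename: str) -> str:
    """Return the type directory for a file, based on its extension."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in TYPE_DIRECTORIES:
        raise ValueError(f"no type directory configured for {suffix!r}")
    return TYPE_DIRECTORIES[suffix]
```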
In an application scenario corresponding to the present embodiment, as shown in Fig. 3, the picture directory (a type directory) is used to store acquisition data whose data type is picture. The picture1 directory under the picture directory is the centralized storage directory, and the picture2 directory under the picture directory is the decentralized storage directory. The preset offset threshold is 6; it can be recorded in a configuration file and obtained by reading the configuration file. If the offset value of the acquisition data is less than 6, the acquisition data is stored in the picture1 directory; that is, the picture1 directory and the subdirectories it contains store acquisition data whose offset values are between 0 and 5.
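Reading the offset threshold from a configuration file might look like the following sketch. The embodiment does not specify a file format, so an INI-style file is assumed here; the section name `storage` and the key `offset_threshold` are hypothetical.

```python
import configparser


def load_offset_threshold(config_text: str) -> int:
    """Parse an INI-style configuration and return the offset threshold.

    Assumption: the threshold is stored as [storage] offset_threshold = <int>.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getint("storage", "offset_threshold")


print(load_offset_threshold("[storage]\noffset_threshold = 6\n"))  # 6
```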
As shown in Fig. 3, if the generation time of acquisition data of the picture type is August 1, 2013, the corresponding generation time subdirectory under the picture1 directory is the 20130801 directory. If the acquisition time of the acquisition data is August 4, 2013, its offset value is 3 (the sampling period is one day), and the corresponding offset subdirectory under the generation time subdirectory, that is, the 20130801 directory, is the p3 directory; the acquisition data can be stored under the p3 directory. In other words, the p0 to p5 directories (offset subdirectories) under the 20130801 directory in Fig. 3 respectively store acquisition data whose generation time is August 1, 2013 and whose acquisition time falls between August 1, 2013 and August 6, 2013. The acquisition data stored in the p0 to p5 directories may have been uploaded to the statistical server by multiple sampling servers at different times between August 1, 2013 and August 6, 2013.
As shown in Fig. 3, for acquisition data generated on August 1, 2013 whose acquisition time is August 9, 2013, the offset value is 8, which is greater than the offset threshold of 6. Therefore, the acquisition data is stored in a subdirectory of the decentralized storage directory, the picture2 directory. According to its acquisition time of August 9, 2013, the acquisition time subdirectory corresponding to the acquisition data under the picture2 directory is determined to be the 20130809 directory, and the acquisition data is stored under the 20130809 directory. That is, for acquisition data whose offset value is greater than or equal to the offset threshold, the corresponding acquisition time subdirectory can be found in the decentralized storage directory according to its acquisition time, and the data is stored there.
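The storage decision of steps S106 to S110, applied to the Fig. 3 scenario, can be sketched as follows. This is an illustration only: it assumes a sampling period of one day and uses the directory names from the scenario (picture, picture1, picture2, pN) as defaults; the function `storage_directory` and its parameters are hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def storage_directory(generation: date, acquisition: date,
                      offset_threshold: int = 6,
                      type_dir: str = "picture",
                      centralized: str = "picture1",
                      decentralized: str = "picture2") -> PurePosixPath:
    """Choose the directory in which a piece of acquisition data is stored."""
    offset = (acquisition - generation).days  # assumes a one-day sampling period
    root = PurePosixPath(type_dir)
    if offset < offset_threshold:
        # Centralized: <generation date>/p<offset value>
        return root / centralized / generation.strftime("%Y%m%d") / f"p{offset}"
    # Decentralized: <acquisition date>
    return root / decentralized / acquisition.strftime("%Y%m%d")


print(storage_directory(date(2013, 8, 1), date(2013, 8, 4)))  # picture/picture1/20130801/p3
print(storage_directory(date(2013, 8, 1), date(2013, 8, 9)))  # picture/picture2/20130809
```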
In one embodiment, a user can also read the acquisition data stored in the statistical server by inputting a keyword. The keyword may include information such as a data type, an acquisition time keyword, or a generation time keyword. The corresponding type directory can be located according to the data type. For example, if the data type in the keyword is picture, the picture directory can be located.
For an acquisition time keyword, the method of reading the corresponding acquisition data includes:
Obtain the input acquisition time keyword and extract a first input time;
In the centralized storage directory, obtain each offset subdirectory for which the difference between the generation time corresponding to its parent generation time subdirectory and the first input time is less than the offset threshold, and for which the sum of that generation time and the offset value corresponding to the offset subdirectory equals the first input time; read the acquisition data stored in the offset subdirectory;
In the decentralized storage directory, obtain the acquisition time subdirectory whose corresponding acquisition time is the same as the first input time, and read the acquisition data stored in the acquisition time subdirectory.
For example, as shown in Fig. 4, if the first input time corresponding to the acquisition time keyword input by the user is August 6, 2013, the offset subdirectories storing the corresponding acquisition data can be located as the directories marked with diagonal hatching in Fig. 4: the p5 directory under the 20130801 directory, the p4 directory under the 20130802 directory, the p3 directory under the 20130803 directory, the p2 directory under the 20130804 directory, the p1 directory under the 20130805 directory, and the p0 directory under the 20130806 directory (in each case, the generation time corresponding to the generation time subdirectory plus the offset value corresponding to the offset subdirectory gives August 6, 2013). By reading these offset subdirectories, the acquisition data whose offset value is less than the offset threshold can be obtained.
Acquisition data whose offset value is greater than or equal to the offset threshold can be read from the decentralized storage directory: it can be obtained by directly reading the 20130806 directory under the decentralized storage directory.
As can be seen from the above, by performing the above read operations on the subdirectories under the centralized storage directory and under the decentralized storage directory respectively, all acquisition data whose acquisition time is August 6, 2013 can be obtained. Because the reading process does not need to traverse all acquisition data, and the locations of the directories storing the acquisition data can be determined by simple addition and subtraction and then read directly, reading efficiency is improved compared with the traditional technology.
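The "simple addition and subtraction" used to locate directories for an acquisition time keyword can be sketched as follows. The sketch assumes a one-day sampling period and the picture1/picture2 directory names of the Fig. 3 scenario; the function `directories_for_acquisition_date` is hypothetical.

```python
from datetime import date, timedelta


def directories_for_acquisition_date(acquired: date, offset_threshold: int,
                                     centralized: str = "picture1",
                                     decentralized: str = "picture2") -> list:
    """List every directory that can hold data acquired on the given date."""
    fmt = "%Y%m%d"
    paths = []
    for offset in range(offset_threshold):
        # generation time + offset value == acquisition time
        generated = acquired - timedelta(days=offset)
        paths.append(f"{centralized}/{generated.strftime(fmt)}/p{offset}")
    # Data with an offset value at or above the threshold is filed by acquisition date.
    paths.append(f"{decentralized}/{acquired.strftime(fmt)}")
    return paths


for path in directories_for_acquisition_date(date(2013, 8, 6), offset_threshold=6):
    print(path)
```

With an offset threshold of 6, only seven directories are inspected, regardless of how much acquisition data the server holds.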
For a generation time keyword, the method of reading the corresponding acquisition data includes:
Obtain the input generation time keyword and extract a second input time;
In the centralized storage directory, obtain the generation time subdirectory whose corresponding generation time is the same as the second input time, and read the acquisition data stored in the generation time subdirectory and the offset subdirectories it contains;
In the decentralized storage directory, traverse the acquisition time subdirectories and read the acquisition data under them whose generation time corresponds to the second input time.
For example, if the second input time corresponding to the generation time keyword input by the user is August 1, 2013, the 20130801 directory under the centralized storage directory can be read directly, and all acquisition time subdirectories in the decentralized storage directory are traversed to read the acquisition data whose generation time is August 1, 2013.
Because the offset threshold can be set relatively large, the amount of acquisition data stored in the acquisition time subdirectories under the decentralized storage directory is relatively small. Compared with traversing all acquisition data as in the traditional technology, traversing only the decentralized storage directory, which holds a small amount of data, improves reading efficiency.
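A read by generation time keyword could be sketched as below. The embodiment does not say how the generation time of a file in the decentralized storage directory is recorded, so this sketch assumes, purely for illustration, that each such file name starts with its generation date (for example `20130801_c.jpg`); the function `read_by_generation_date` is likewise hypothetical.

```python
from pathlib import Path


def read_by_generation_date(centralized: Path, decentralized: Path, day: str) -> list:
    """Collect the files generated on `day` (formatted YYYYMMDD)."""
    hits = []

    # Centralized directory: read the matching generation time subdirectory directly,
    # including every offset subdirectory it contains.
    generation_dir = centralized / day
    if generation_dir.is_dir():
        hits.extend(p for p in generation_dir.rglob("*") if p.is_file())

    # Decentralized directory: traverse every acquisition time subdirectory.
    if decentralized.is_dir():
        for acquisition_dir in decentralized.iterdir():
            if not acquisition_dir.is_dir():
                continue
            # Assumed naming convention: "<generation date>_<name>".
            hits.extend(p for p in acquisition_dir.iterdir()
                        if p.is_file() and p.name.startswith(day + "_"))
    return sorted(hits)
```

Only the decentralized branch involves a traversal, which is why a larger offset threshold (less data in that directory) makes this read cheaper.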
In one embodiment, the statistical server can also adjust the above offset threshold according to the acquisition data already received from the sampling servers, specifically including:
Traverse the acquisition data stored under the centralized storage directory and its subdirectories and under the decentralized storage directory and its subdirectories, obtain the generation time and acquisition time corresponding to the acquisition data, and calculate the corresponding offset values;
Generate an offset value probability distribution according to the formula:
$$P(T) = \frac{S(T)}{N}$$
where S(T) is the number of pieces of acquisition data whose offset value is less than or equal to T, N is the total number of pieces of acquisition data, and P(T) is the offset value probability distribution; obtain a preset probability threshold, and update the offset threshold according to the probability threshold.
For example, if there are 100 pieces of acquisition data, the number of pieces corresponding to each offset value is as shown in Table 1:
Table 1
| Offset value (T) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ≥7 |
|---|---|---|---|---|---|---|---|---|
| Number | 23 | 32 | 16 | 13 | 8 | 5 | 2 | 1 |
| S(T) | 23 | 55 | 71 | 84 | 92 | 97 | 99 | 100 |
| P(T) | 23% | 55% | 71% | 84% | 92% | 97% | 99% | 100% |
If the preset probability threshold is 98%, the offset threshold must be greater than the offset values of at least 98% of the acquisition data; therefore, the offset threshold can be set to 7. If the preset probability threshold is 60%, the offset threshold must be greater than the offset values of at least 60% of the acquisition data; therefore, the offset threshold can be set to 3.
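The threshold update can be sketched as follows, following Table 1, where the cumulative count S(T) includes the data whose offset value equals T. The function `updated_offset_threshold` and the list representation of the counts are hypothetical; the last list entry stands for the final column of Table 1.

```python
def updated_offset_threshold(counts_by_offset, probability_threshold):
    """Return the smallest offset threshold covering the requested share of data.

    counts_by_offset[t] is the number of pieces of acquisition data with offset value t.
    """
    total = sum(counts_by_offset)  # N
    if total == 0:
        raise ValueError("no acquisition data to analyse")
    cumulative = 0
    for t, count in enumerate(counts_by_offset):
        cumulative += count  # S(T): data with offset value <= T
        if cumulative / total >= probability_threshold:  # P(T) = S(T) / N
            # Offset values 0..T are all strictly less than T + 1.
            return t + 1
    return len(counts_by_offset)


table_1 = [23, 32, 16, 13, 8, 5, 2, 1]  # counts from Table 1
print(updated_offset_threshold(table_1, 0.98))  # 7
print(updated_offset_threshold(table_1, 0.60))  # 3
```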
It should be noted that the larger the probability threshold is set (and hence the larger the offset threshold), the less acquisition data is stored in the decentralized storage directory, the fewer files are traversed when reading by generation time keyword, and the higher the efficiency of that read; however, when reading by acquisition time keyword, more offset subdirectories under the centralized storage directory need to be read, so reading efficiency is relatively lower (although still higher than in the traditional technology). Conversely, the smaller the probability threshold is set (and hence the smaller the offset threshold), the more acquisition data is stored in the decentralized storage directory, the more files are traversed when reading by generation time keyword, and the lower the efficiency of that read; however, when reading by acquisition time keyword, fewer offset subdirectories under the centralized storage directory need to be read, so reading efficiency is relatively higher.
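The trade-off can be made concrete with the counts of Table 1. The sketch below is illustrative only: for a given offset threshold it reports how many offset subdirectories an acquisition-time read must open and what share of the data ends up in the decentralized storage directory, where a generation-time read has to traverse it. The function `tradeoff` is hypothetical.

```python
TABLE_1_COUNTS = [23, 32, 16, 13, 8, 5, 2, 1]  # counts from Table 1


def tradeoff(counts_by_offset, offset_threshold):
    """Return (offset subdirectories per acquisition-time read, decentralized share)."""
    total = sum(counts_by_offset)
    # Data with an offset value at or above the threshold goes to decentralized storage.
    decentralized = sum(counts_by_offset[offset_threshold:])
    return offset_threshold, decentralized / total


for threshold in (3, 7):
    subdirs, share = tradeoff(TABLE_1_COUNTS, threshold)
    print(f"threshold={threshold}: {subdirs} offset subdirectories, "
          f"{share:.0%} of data in decentralized storage")
```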
Preferably, the preset probability threshold can be 99.5%.
In one embodiment, as shown in Fig. 5, a storage device for data acquisition includes:
A data reception module 102, configured to obtain acquisition data and to obtain the generation time and the acquisition time of the acquisition data;
An offset value computing module 104, configured to obtain an offset value by calculating the difference between the generation time and the acquisition time;
A data memory module 106, configured to obtain a preset offset threshold and judge whether the offset value is less than the offset threshold; if so, to obtain the centralized storage directory corresponding to the acquisition data, obtain the generation time subdirectory corresponding to the generation time under the centralized storage directory, obtain the offset subdirectory corresponding to the offset value under the generation time subdirectory, and store the acquisition data in the offset subdirectory.
In the present embodiment, the data memory module 106 is also configured to, when the offset value is greater than or equal to the offset threshold, obtain the decentralized storage directory corresponding to the acquisition data, obtain the acquisition time subdirectory corresponding to the acquisition time under the decentralized storage directory, and store the acquisition data in the acquisition time subdirectory.
In one embodiment, the data memory module 106 is also configured to obtain the data type of the acquisition data, obtain the type directory corresponding to the data type, and obtain the centralized storage directory or decentralized storage directory under the type directory.
In one embodiment, as shown in Fig. 6, the storage device for data acquisition further includes a first read module 108, configured to obtain the input acquisition time keyword and extract a first input time; in the centralized storage directory, to obtain each offset subdirectory for which the difference between the generation time corresponding to its parent generation time subdirectory and the first input time is less than the offset threshold, and for which the sum of that generation time and the offset value corresponding to the offset subdirectory equals the first input time, and to read the acquisition data stored in the offset subdirectory; and, in the decentralized storage directory, to obtain the acquisition time subdirectory whose corresponding acquisition time is the same as the first input time, and to read the acquisition data stored in the acquisition time subdirectory.
In one embodiment, as shown in Fig. 6, the storage device for data acquisition further includes a second read module 110, configured to obtain the input generation time keyword and extract a second input time; in the centralized storage directory, to obtain the generation time subdirectory whose corresponding generation time is the same as the second input time, and to read the acquisition data stored in the generation time subdirectory and the offset subdirectories it contains; and, in the decentralized storage directory, to traverse the acquisition time subdirectories and read the acquisition data under them whose generation time corresponds to the second input time.
In one embodiment, as shown in Fig. 6, the storage device for data acquisition further includes an offset threshold adjustment module 112, configured to traverse the acquisition data stored under the centralized storage directory and its subdirectories and under the decentralized storage directory and its subdirectories; to obtain the generation time and acquisition time corresponding to the acquisition data and calculate the corresponding offset values; and to generate an offset value probability distribution according to the formula:
$$P(T) = \frac{S(T)}{N}$$
where S(T) is the number of pieces of acquisition data whose offset value is less than or equal to T, N is the total number of pieces of acquisition data, and P(T) is the offset value probability distribution; and to obtain a preset probability threshold and update the offset threshold according to the probability threshold.
The above storage method and device for data acquisition provide an offset threshold and, according to the offset threshold, locate the obtained acquisition data in the centralized storage directory and store it in the offset subdirectory corresponding to the offset value, under the generation time subdirectory corresponding to the generation time of the acquisition data. As a result, when acquisition data is read, the corresponding directory can be located quickly according to the offset value and read, which improves reading efficiency compared with the traditional approach of traversing all acquisition data.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The embodiments described above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.