CN104424236B - Storage method and device in data acquisition - Google Patents

Storage method and device in data acquisition

Info

Publication number
CN104424236B
CN104424236B CN201310377205.7A CN201310377205A CN104424236B CN 104424236 B CN104424236 B CN 104424236B CN 201310377205 A CN201310377205 A CN 201310377205A CN 104424236 B CN104424236 B CN 104424236B
Authority
CN
China
Prior art keywords
acquisition
data
time
subdirectory
catalogue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310377205.7A
Other languages
Chinese (zh)
Other versions
CN104424236A (en)
Inventor
邱跃鹏
廖建魁
章猛
范成涛
李恭伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201310377205.7A
Priority to PCT/CN2014/085004 (WO2015027868A1)
Publication of CN104424236A
Priority to US14/732,231 (US9977836B2)
Application granted
Publication of CN104424236B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9017 Indexing; Data structures therefor; Storage structures using directory or table look-up
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477 Temporal data queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A storage method in data acquisition comprises: obtaining acquisition data, and obtaining the generation time and acquisition time of the acquisition data; calculating the difference between the generation time and the acquisition time to obtain an offset value; obtaining a preset offset threshold and determining whether the offset value is less than the offset threshold; if so, obtaining the centralized storage directory corresponding to the acquisition data, obtaining the generation time subdirectory under the centralized storage directory that corresponds to the generation time, obtaining the offset subdirectory under the generation time subdirectory that corresponds to the offset value, and storing the acquisition data in the offset subdirectory. A storage device in data acquisition is also provided. The above storage method and device can improve reading efficiency when data is read.

Description

Storage method and device in data acquisition
Technical field
The present invention relates to the field of data mining technology, and in particular to a storage method and device in data acquisition.
Background art
In the field of data mining, a large amount of data must be obtained by sampling before data analysis can be carried out on the collected data. In the conventional technique, as shown in Figs. 1 and 2, multiple sampling servers each collect data according to a preset period T (for example, T may be 1 day, in which case a sampling server collects data once a day). The time at which a sampling server collects the data is the generation time of the acquisition data (T0, T0+T and so on in Fig. 1). The sampling servers then send the acquisition data to a statistics server at irregular intervals for aggregation and storage. The time at which the statistics server receives the acquisition data is the acquisition time of the acquisition data (T0+4T and T0+2T in Fig. 1).
When storing acquisition data, the statistics server usually creates a directory for acquisition data of the same type according to acquisition time, and then creates subdirectories in that directory according to the generation time of the acquisition data.
The inventors found through research that the above storage approach has at least the following problem: when data must be retrieved by generation time, all directories storing acquisition data have to be traversed and searched by generation time, so the number of directories traversed is large and reading efficiency is low.
Summary of the invention
In view of this, it is necessary to provide a storage method in data acquisition that can improve reading efficiency.
A storage method in data acquisition, comprising:
obtaining acquisition data, and obtaining the generation time and acquisition time of the acquisition data;
calculating the difference between the generation time and the acquisition time to obtain an offset value;
obtaining a preset offset threshold and determining whether the offset value is less than the offset threshold; if so, obtaining the centralized storage directory corresponding to the acquisition data, obtaining the generation time subdirectory under the centralized storage directory that corresponds to the generation time, obtaining the offset subdirectory under the generation time subdirectory that corresponds to the offset value, and storing the acquisition data in the offset subdirectory.
In addition, it is also necessary to provide a storage device in data acquisition that can improve reading efficiency.
A storage device in data acquisition, comprising:
a data receiving module, configured to obtain acquisition data and to obtain the generation time and acquisition time of the acquisition data;
an offset value calculation module, configured to calculate the difference between the generation time and the acquisition time to obtain an offset value;
a data storage module, configured to obtain a preset offset threshold and determine whether the offset value is less than the offset threshold, and if so, to obtain the centralized storage directory corresponding to the acquisition data, obtain the generation time subdirectory under the centralized storage directory that corresponds to the generation time, obtain the offset subdirectory under the generation time subdirectory that corresponds to the offset value, and store the acquisition data in the offset subdirectory.
The above storage method and device in data acquisition set an offset threshold, use it to route the obtained acquisition data to the centralized storage directory, and store the data in the offset subdirectory corresponding to the offset value, under the generation time subdirectory corresponding to the generation time of the acquisition data. When acquisition data is read, the corresponding directory can therefore be located quickly from the offset value, which improves reading efficiency compared with the conventional approach of traversing all acquisition data.
Brief description of the drawings
Fig. 1 is a data flow diagram of a data acquisition system in the conventional technique;
Fig. 2 is a flowchart of the storage method in data acquisition in one embodiment;
Fig. 3 is a schematic diagram of the file storage structure used when storing data in one embodiment;
Fig. 4 is a schematic diagram of the process of locating offset subdirectories in the centralized storage directory when reading data in one embodiment;
Fig. 5 is a structural schematic diagram of the storage device in data acquisition in one embodiment;
Fig. 6 is a structural schematic diagram of the storage device in data acquisition in another embodiment.
Detailed description of the embodiments
In one embodiment, as shown in Fig. 1, a data acquisition system includes a statistics server and multiple sampling servers connected to the statistics server. Each sampling server collects data according to a preset sampling period (for example, hourly or daily). The time at which the data is collected is the generation time of the acquisition data (at this point the acquisition data is produced by the sampling server's collection, so from the statistics server's point of view this time is called the generation time). As shown in Fig. 1, the sampling servers may upload acquisition data to the statistics server at irregular intervals. The time at which the statistics server receives the acquisition data is the acquisition time of the acquisition data (at this point the acquisition data has been uploaded by the sampling server and successfully acquired by the statistics server, so from the statistics server's point of view this time is called the acquisition time).
In this embodiment, as shown in Fig. 2, a storage method in data acquisition is provided. The method relies entirely on a computer program, which can run on the above statistics server based on the von Neumann architecture. The method includes:
Step S102: obtain acquisition data, and obtain the generation time and acquisition time of the acquisition data.
In this embodiment, the statistics server obtains acquisition data by receiving what the sampling servers upload. After a sampling server collects acquisition data in each preset time interval (that is, each period), it can store the data as an independent file; the creation time of that file is the generation time of the acquisition data. When the statistics server receives the acquisition data, it can record the time of receipt; that recorded time is the acquisition time of the acquisition data.
Step S104: calculate the difference between the generation time and the acquisition time to obtain an offset value.
The offset value is the number of sampling periods between the acquisition time and the generation time. For example, if the sampling period is one day, the generation time is August 1, 2013, and the acquisition time is August 4, 2013, then the offset value is 3.
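The offset calculation of step S104 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `offset_value` and its parameters are hypothetical, and it assumes both timestamps are already aligned to period boundaries (for a daily period, plain dates).

```python
from datetime import datetime, timedelta


def offset_value(generation_time: datetime,
                 acquisition_time: datetime,
                 period: timedelta = timedelta(days=1)) -> int:
    """Return the number of whole sampling periods between the two times.

    Assumption: both times are aligned to period boundaries, so floor
    division by the period gives the offset value described in the text.
    """
    if acquisition_time < generation_time:
        raise ValueError("acquisition time precedes generation time")
    # timedelta // timedelta yields an int (whole periods elapsed).
    return (acquisition_time - generation_time) // period


# Example from the text: generated August 1, 2013, acquired August 4, 2013.
print(offset_value(datetime(2013, 8, 1), datetime(2013, 8, 4)))  # 3
```

With an hourly sampling period, the same function would be called with `period=timedelta(hours=1)`.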
Step S106: obtain a preset offset threshold and determine whether the offset value is less than the offset threshold. If so, execute step S108:
Obtain the centralized storage directory corresponding to the acquisition data, obtain the generation time subdirectory under the centralized storage directory that corresponds to the generation time, obtain the offset subdirectory under the generation time subdirectory that corresponds to the offset value, and store the acquisition data in the offset subdirectory.
In this embodiment, if the offset value is greater than or equal to the offset threshold, step S110 is executed:
Obtain the decentralized storage directory corresponding to the acquisition data, obtain the acquisition time subdirectory under the decentralized storage directory that corresponds to the acquisition time, and store the acquisition data in the acquisition time subdirectory.
The centralized storage directory and the decentralized storage directory are two directories in the file system of the statistics server. Preferably, the centralized storage directory and the decentralized storage directory are located under the same type directory. Acquisition data can be classified in advance, and acquisition data of the same data type is stored in the centralized storage directory or the decentralized storage directory under the same type directory. That is: obtain the data type of the acquisition data; obtain the type directory corresponding to the data type; obtain the centralized storage directory or decentralized storage directory under the type directory.
For example, the data type of acquisition data can be determined from the format of the acquisition data. Acquisition data of the picture type can be stored under a picture directory, and acquisition data of the video type under a video directory. A centralized storage directory and a decentralized storage directory can be created under each of the picture directory and the video directory.
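One way to picture the lookup from data format to type directory is the sketch below. The extension table, the root path `/data`, and the function name `type_directory` are all assumptions made for illustration; the text only says that the data type can be determined from the format of the acquisition data.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to data type (not from the source).
EXTENSION_TYPES = {
    ".jpg": "picture",
    ".png": "picture",
    ".mp4": "video",
    ".avi": "video",
}


def type_directory(filename: str, root: str = "/data") -> PurePosixPath:
    """Return the type directory for a file, based on its extension."""
    extension = PurePosixPath(filename).suffix.lower()
    if extension not in EXTENSION_TYPES:
        raise ValueError(f"unknown data format: {extension!r}")
    # The type directory is named after the data type, e.g. /data/picture.
    return PurePosixPath(root) / EXTENSION_TYPES[extension]


print(type_directory("sample_0001.JPG"))  # /data/picture
print(type_directory("clip.mp4"))         # /data/video
```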
In an application scenario corresponding to this embodiment, as shown in Fig. 3, the picture directory (a type directory) stores acquisition data whose data type is picture. The picture1 directory under the picture directory is the centralized storage directory, and the picture2 directory under the picture directory is the decentralized storage directory. The preset offset threshold is 6; it can be recorded in a configuration file and obtained by reading that file. If the offset value of the acquisition data is less than 6, the data is stored under the picture1 directory; that is, the picture1 directory and its subdirectories store acquisition data whose offset value is between 0 and 5.
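Reading the offset threshold from a configuration file might look like the following sketch. The INI layout, the section name `storage`, and the key `offset_threshold` are hypothetical; the source does not specify a configuration format.

```python
import configparser

# Hypothetical configuration content; the source does not define the format.
EXAMPLE_CONFIG = """
[storage]
offset_threshold = 6
"""


def load_offset_threshold(config_text: str, default: int = 6) -> int:
    """Parse the offset threshold, falling back to a default if it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getint("storage", "offset_threshold", fallback=default)


print(load_offset_threshold(EXAMPLE_CONFIG))  # 6
```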
As shown in Fig. 3, if the generation time of acquisition data of the picture type is August 1, 2013, the corresponding generation time subdirectory under the picture1 directory is the 20130801 directory. If the acquisition time of the acquisition data is August 4, 2013, its offset value is 3 (the collection period is one day), the corresponding offset subdirectory under the generation time subdirectory (the 20130801 directory) is the p3 directory, and the acquisition data can be stored under the p3 directory. In other words, the p0 to p5 directories (offset subdirectories) under the 20130801 directory in Fig. 3 store acquisition data whose generation time is August 1, 2013 and whose acquisition time falls between August 1, 2013 and August 6, 2013. The acquisition data stored in p0 to p5 may have been uploaded to the statistics server by multiple sampling servers at different times between August 1, 2013 and August 6, 2013.
As shown in Fig. 3, for acquisition data whose acquisition time is August 9, 2013 and whose offset value is 8, the offset value is greater than the offset threshold of 6, so the acquisition data is stored in a subdirectory of the decentralized storage directory picture2. Based on its acquisition time of August 9, 2013, the corresponding acquisition time subdirectory under the picture2 directory is determined to be the 20130809 directory, and the acquisition data is stored under the 20130809 directory. In other words, for acquisition data whose offset value is greater than or equal to the offset threshold, the corresponding acquisition time subdirectory can be found in the decentralized storage directory according to the acquisition time, and the data is stored there.
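The routing decision of steps S106 to S110 can be illustrated with the directory names from Fig. 3. This is a sketch under stated assumptions: a daily sampling period, subdirectory names of the form `YYYYMMDD` and `p<offset>` as in the example, and a hypothetical function name `storage_path`. It only computes the target path and does not touch the file system.

```python
from datetime import date
from pathlib import PurePosixPath


def storage_path(type_dir: str, generated: date, acquired: date,
                 offset_threshold: int = 6,
                 central_dir: str = "picture1",
                 scattered_dir: str = "picture2") -> PurePosixPath:
    """Choose the directory in which one item of acquisition data is stored."""
    offset = (acquired - generated).days  # daily sampling period assumed
    if offset < 0:
        raise ValueError("acquisition date precedes generation date")
    base = PurePosixPath(type_dir)
    if offset < offset_threshold:
        # Step S108: centralized directory / generation date / offset subdirectory.
        return base / central_dir / generated.strftime("%Y%m%d") / f"p{offset}"
    # Step S110: decentralized directory / acquisition date.
    return base / scattered_dir / acquired.strftime("%Y%m%d")


print(storage_path("picture", date(2013, 8, 1), date(2013, 8, 4)))
# picture/picture1/20130801/p3
print(storage_path("picture", date(2013, 8, 1), date(2013, 8, 9)))
# picture/picture2/20130809
```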
In one embodiment, a user can also read the acquisition data stored on the statistics server by entering keywords. The keywords may include information such as a data type and an acquisition time keyword or a generation time keyword. The corresponding type directory can be located according to the data type. For example, if the data type in the keywords is picture, the picture directory is located.
For an acquisition time keyword, the method of reading the corresponding acquisition data includes:
Obtain the input acquisition time keyword and extract a first input time;
In the centralized storage directory, obtain each offset subdirectory for which the difference between the generation time of its containing generation time subdirectory and the first input time is less than the offset threshold, and for which the generation time of that generation time subdirectory plus the offset value of the offset subdirectory equals the first input time; read the acquisition data stored in those offset subdirectories;
In the decentralized storage directory, obtain the acquisition time subdirectory whose acquisition time is the same as the first input time, and read the acquisition data stored in that acquisition time subdirectory.
For example, as shown in Fig. 4, if the first input time corresponding to the acquisition time keyword entered by the user is August 6, 2013, the offset subdirectories storing the corresponding acquisition data are those marked with diagonal hatching in Fig. 4: the p5 directory under the 20130801 directory, the p4 directory under the 20130802 directory, the p3 directory under the 20130803 directory, the p2 directory under the 20130804 directory, the p1 directory under the 20130805 directory, and the p0 directory under the 20130806 directory (in each case, the generation time of the generation time subdirectory plus the offset value of the offset subdirectory gives August 6). Reading these offset subdirectories yields the acquisition data whose offset value is less than the offset threshold.
Acquisition data whose offset value is greater than or equal to the offset threshold is read from the decentralized storage directory: directly reading the 20130806 directory under the decentralized storage directory yields it.
It follows that performing the above read operations on the subdirectories of the centralized storage directory and of the decentralized storage directory yields all acquisition data whose acquisition time is August 6, 2013. Because the read process does not need to traverse all acquisition data, and the locations of the directories storing the acquisition data can be determined by simple addition and subtraction and then read directly, reading efficiency is improved compared with the conventional technique.
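The "simple addition and subtraction" used to locate directories for an acquisition time query can be sketched as below. The function name `dirs_for_acquisition_date` is hypothetical, and the sketch assumes a daily period and the directory naming from Fig. 3; it lists the candidate directories without checking that they exist.

```python
from datetime import date, timedelta


def dirs_for_acquisition_date(acquired: date, offset_threshold: int = 6,
                              central_dir: str = "picture1",
                              scattered_dir: str = "picture2") -> list[str]:
    """List every directory that can hold data acquired on the given date."""
    paths = []
    # Centralized storage: generation date + offset must equal the query date,
    # so for each allowed offset the generation date is found by subtraction.
    for offset in range(offset_threshold - 1, -1, -1):
        generated = acquired - timedelta(days=offset)
        paths.append(f"{central_dir}/{generated:%Y%m%d}/p{offset}")
    # Decentralized storage: one subdirectory named after the acquisition date.
    paths.append(f"{scattered_dir}/{acquired:%Y%m%d}")
    return paths


for path in dirs_for_acquisition_date(date(2013, 8, 6)):
    print(path)
# picture1/20130801/p5
# picture1/20130802/p4
# picture1/20130803/p3
# picture1/20130804/p2
# picture1/20130805/p1
# picture1/20130806/p0
# picture2/20130806
```

The number of directories visited is the offset threshold plus one, regardless of how much data has been stored overall.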
For a generation time keyword, the method of reading the corresponding acquisition data includes:
Obtain the input generation time keyword and extract a second input time;
In the centralized storage directory, obtain the generation time subdirectory whose generation time is the same as the second input time, and read the acquisition data stored in that generation time subdirectory and the offset subdirectories it contains;
In the decentralized storage directory, traverse the acquisition time subdirectories and read, from under them, the acquisition data whose generation time corresponds to the second input time.
For example, if the second input time corresponding to the generation time keyword entered by the user is August 1, 2013, the 20130801 directory can be read directly under the centralized storage directory, and all acquisition time subdirectories in the decentralized storage directory are traversed to read the acquisition data whose generation time is August 1, 2013.
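A generation time query combines one direct lookup with one traversal. The sketch below models the two directories as in-memory dictionaries rather than a real file system; the data layout, the item names, and the function name `read_by_generation_date` are assumptions made purely for illustration.

```python
def read_by_generation_date(central: dict, scattered: dict, generated: str) -> list:
    """Collect all items generated on the given date (format YYYYMMDD).

    central:   {generation_date: {offset: [item, ...]}}
    scattered: {acquisition_date: [(generation_date, item), ...]}
    """
    results = []
    # Centralized storage: direct lookup of one generation time subdirectory,
    # then read every offset subdirectory it contains.
    for offset in sorted(central.get(generated, {})):
        results.extend(central[generated][offset])
    # Decentralized storage: traverse every acquisition time subdirectory
    # and keep only the items with a matching generation date.
    for acquired in sorted(scattered):
        results.extend(item for gen, item in scattered[acquired] if gen == generated)
    return results


central = {"20130801": {0: ["a.jpg"], 3: ["b.jpg"]}, "20130802": {1: ["c.jpg"]}}
scattered = {"20130809": [("20130801", "d.jpg"), ("20130802", "e.jpg")]}
print(read_by_generation_date(central, scattered, "20130801"))
# ['a.jpg', 'b.jpg', 'd.jpg']
```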
Because the offset threshold can be set relatively large, the amount of acquisition data stored in the acquisition time subdirectories under the decentralized storage directory is relatively small. Compared with traversing all acquisition data as in the conventional technique, traversing only the decentralized storage directory, which holds less data, improves reading efficiency.
In one embodiment, the statistics server can also adjust the offset threshold according to the acquisition data already received from the sampling servers, specifically:
Traverse the acquisition data stored under the centralized storage directory and its subdirectories and under the decentralized storage directory and its subdirectories, obtain the generation time and acquisition time of each item of acquisition data, and calculate the corresponding offset value;
According to the formula:
$$P(T) = \frac{S(T)}{N} \times 100\%$$
generate the offset value probability distribution, where S(T) is the total number of items of acquisition data whose offset value is less than T, N is the total number of items of acquisition data, and P(T) is the offset value probability distribution; obtain a preset probability threshold, and update the offset threshold according to the probability threshold.
For example, the number of the corresponding acquisition data of difference deviant is as shown in table 1 if there is 100 acquisition data:
Table 1
Offset value (T)   0     1     2     3     4     5     6     >7
Number             23    32    16    13    8     5     2     1
S(T)               23    55    71    84    92    97    99    100
P(T)               23%   55%   71%   84%   92%   97%   99%   100%
If the preset probability threshold is 98%, the offset threshold must be greater than the offset values of at least 98% of the acquisition data, so the offset threshold can be set to 7. If the preset probability threshold is 60%, the offset threshold must be greater than the offset values of at least 60% of the acquisition data, so the offset threshold can be set to 3.
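The threshold update can be sketched with the figures from Table 1. Two points are assumptions: the function name `update_offset_threshold` is hypothetical, and the selection rule (the smallest threshold such that the share of items with an offset value below it reaches the probability threshold) is one reading that reproduces the worked results of 7 and 3. Note that Table 1 tabulates S(T) cumulatively up to and including T, and the sketch follows the table. The single item in the ">7" column is given an offset value of 8 here for concreteness.

```python
from collections import Counter


def update_offset_threshold(offsets: list[int], probability_threshold: float) -> int:
    """Return the smallest offset threshold whose coverage reaches the target.

    Coverage of a threshold is the fraction of items whose offset value is
    strictly less than that threshold.
    """
    if not offsets:
        raise ValueError("no acquisition data to analyse")
    counts = Counter(offsets)
    total = len(offsets)
    cumulative = 0
    for t in sorted(counts):
        cumulative += counts[t]          # S(T) as tabulated in Table 1
        if cumulative / total >= probability_threshold:
            return t + 1                 # all offsets <= t are below t + 1
    raise ValueError("probability threshold cannot be reached")


# Offsets reproducing Table 1 (100 items in total).
table_1 = {0: 23, 1: 32, 2: 16, 3: 13, 4: 8, 5: 5, 6: 2, 8: 1}
offsets = [t for t, n in table_1.items() for _ in range(n)]

print(update_offset_threshold(offsets, 0.98))  # 7
print(update_offset_threshold(offsets, 0.60))  # 3
```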
It should be noted that the larger the probability threshold (and hence the larger the offset threshold), the less acquisition data is stored in the decentralized storage directory, so fewer files are traversed when reading by generation time keyword and efficiency is higher; however, when reading by acquisition time keyword, more offset subdirectories under the centralized storage directory must be read, so reading efficiency is relatively lower (though still higher than in the conventional technique). The smaller the probability threshold (and hence the smaller the offset threshold), the more acquisition data is stored in the decentralized storage directory, so more files are traversed when reading by generation time keyword and efficiency is lower; however, when reading by acquisition time keyword, fewer offset subdirectories under the centralized storage directory must be read, so reading efficiency is relatively higher. Preferably, the preset probability threshold is 99.5%.
In one embodiment, as shown in Fig. 5, a storage device in data acquisition includes:
a data receiving module 102, configured to obtain acquisition data and to obtain the generation time and acquisition time of the acquisition data;
an offset value calculation module 104, configured to calculate the difference between the generation time and the acquisition time to obtain an offset value;
a data storage module 106, configured to obtain a preset offset threshold and determine whether the offset value is less than the offset threshold, and if so, to obtain the centralized storage directory corresponding to the acquisition data, obtain the generation time subdirectory under the centralized storage directory that corresponds to the generation time, obtain the offset subdirectory under the generation time subdirectory that corresponds to the offset value, and store the acquisition data in the offset subdirectory.
In this embodiment, the data storage module 106 is further configured to, when the offset value is greater than or equal to the offset threshold, obtain the decentralized storage directory corresponding to the acquisition data, obtain the acquisition time subdirectory under the decentralized storage directory that corresponds to the acquisition time, and store the acquisition data in the acquisition time subdirectory.
In one embodiment, the data storage module 106 is further configured to obtain the data type of the acquisition data, obtain the type directory corresponding to the data type, and obtain the centralized storage directory or decentralized storage directory under the type directory.
In one embodiment, as shown in Fig. 6, the storage device in data acquisition further includes a first reading module 108, configured to: obtain the input acquisition time keyword and extract a first input time; in the centralized storage directory, obtain each offset subdirectory for which the difference between the generation time of its containing generation time subdirectory and the first input time is less than the offset threshold, and for which the generation time of that generation time subdirectory plus the offset value of the offset subdirectory equals the first input time, and read the acquisition data stored in those offset subdirectories; in the decentralized storage directory, obtain the acquisition time subdirectory whose acquisition time is the same as the first input time, and read the acquisition data stored in that acquisition time subdirectory.
In one embodiment, as shown in Fig. 6, the storage device in data acquisition further includes a second reading module 110, configured to: obtain the input generation time keyword and extract a second input time; in the centralized storage directory, obtain the generation time subdirectory whose generation time is the same as the second input time, and read the acquisition data stored in that generation time subdirectory and the offset subdirectories it contains; in the decentralized storage directory, traverse the acquisition time subdirectories and read, from under them, the acquisition data whose generation time corresponds to the second input time.
In one embodiment, as shown in Fig. 6, the storage device in data acquisition further includes an offset threshold adjustment module 112, configured to: traverse the acquisition data stored under the centralized storage directory and its subdirectories and under the decentralized storage directory and its subdirectories; obtain the generation time and acquisition time of each item of acquisition data, and calculate the corresponding offset value; according to the formula:
$$P(T) = \frac{S(T)}{N} \times 100\%$$
generate the offset value probability distribution, where S(T) is the total number of items of acquisition data whose offset value is less than T, N is the total number of items of acquisition data, and P(T) is the offset value probability distribution; obtain a preset probability threshold, and update the offset threshold according to the probability threshold.
The above storage method and device in data acquisition set an offset threshold, use it to route the obtained acquisition data to the centralized storage directory, and store the data in the offset subdirectory corresponding to the offset value, under the generation time subdirectory corresponding to the generation time of the acquisition data. When acquisition data is read, the corresponding directory can therefore be located quickly from the offset value, which improves reading efficiency compared with the conventional approach of traversing all acquisition data.
Those of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing relevant hardware. The program can be stored in a computer-readable storage medium, and when executed may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above embodiments express only several implementations of the present invention. Their description is relatively specific and detailed, but should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A storage method in data acquisition, comprising:
obtaining acquisition data, and obtaining the generation time and acquisition time of the acquisition data;
calculating the difference between the generation time and the acquisition time to obtain an offset value;
obtaining a preset offset threshold and determining whether the offset value is less than the offset threshold; if so, obtaining the centralized storage directory corresponding to the acquisition data, obtaining the generation time subdirectory under the centralized storage directory that corresponds to the generation time, obtaining the offset subdirectory under the generation time subdirectory that corresponds to the offset value, and storing the acquisition data in the offset subdirectory;
if the offset value is greater than or equal to the offset threshold, obtaining the decentralized storage directory corresponding to the acquisition data, obtaining the acquisition time subdirectory under the decentralized storage directory that corresponds to the acquisition time, and storing the acquisition data in the acquisition time subdirectory.
2. The storage method in data acquisition according to claim 1, characterized in that the step of obtaining the centralized storage directory or decentralized storage directory corresponding to the acquisition data comprises:
obtaining the data type of the acquisition data;
obtaining the type directory corresponding to the data type;
obtaining the centralized storage directory or decentralized storage directory under the type directory.
3. The storage method in data acquisition according to claim 1, characterized in that the method further comprises:
obtaining an input acquisition time keyword and extracting a first input time;
in the centralized storage directory, obtaining each offset subdirectory for which the difference between the generation time of its containing generation time subdirectory and the first input time is less than the offset threshold, and for which the generation time of that generation time subdirectory plus the offset value of the offset subdirectory equals the first input time, and reading the acquisition data stored in the offset subdirectory;
in the decentralized storage directory, obtaining the acquisition time subdirectory whose acquisition time is the same as the first input time, and reading the acquisition data stored in the acquisition time subdirectory.
4. The storage method in data acquisition according to claim 1, characterized in that the method further comprises:
Obtaining an input generation time keyword and extracting a second input time from it;
In the centralized storage directory, obtaining the generation time subdirectory whose corresponding generation time is identical to the second input time, and reading the acquisition data stored in that generation time subdirectory and in the offset subdirectories it contains;
In the decentralized storage directory, traversing the acquisition time subdirectories and reading the acquisition data under them whose generation time corresponds to the second input time.
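A query by generation time, as in claim 4, could be sketched in the same in-memory style. Here `centralized` maps generation time to offset value to stored records, and `decentralized` maps acquisition time to a list of `(generation_time, record)` pairs so that each record carries its own generation time. The data model and names are hypothetical.

```python
def read_by_generation_time(centralized, decentralized, input_time):
    """Collect records whose generation time equals input_time."""
    results = []
    # Centralized side: one direct lookup, then every offset subdirectory under it.
    for records in centralized.get(input_time, {}).values():
        results.extend(records)
    # Decentralized side: every acquisition time subdirectory must be traversed.
    for records in decentralized.values():
        results.extend(
            record for generation_time, record in records
            if generation_time == input_time
        )
    return results


centralized = {100: {0: ["a"], 2: ["b"]}, 101: {1: ["c"]}}
decentralized = {110: [(100, "d"), (103, "e")], 120: [(100, "f")]}
print(read_by_generation_time(centralized, decentralized, 100))  # ['a', 'b', 'd', 'f']
```

The full traversal on the decentralized side is the costly part of this query, which is why it matters how much data ends up there.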
5. The storage method in data acquisition according to claim 1, characterized in that the method further comprises:
Traversing the acquisition data stored under the centralized storage directory and its subdirectories and under the decentralized storage directory and its subdirectories;
Obtaining the generation time and acquisition time corresponding to the acquisition data, and calculating the corresponding offset value;
Generating an offset value probability distribution according to the formula:
P(T) = S(T) / N
where S(T) is the number of acquisition data items whose offset value is less than T, N is the total number of acquisition data items, P(T) is the offset value probability distribution, and T is the offset value;
Obtaining a preset probability threshold, and updating the offset threshold according to the probability threshold.
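The threshold update in claim 5 can be illustrated with a small sketch. It takes P(T) to be S(T) / N, which is how the definitions of S(T) and N read, and it assumes integer offset values. The rule used here, picking the smallest T whose P(T) reaches the probability threshold, is one plausible reading of "updating the offset threshold according to the probability threshold" and is an assumption of this sketch, as are the function names.

```python
def offset_probability(offsets, t):
    """P(T) = S(T) / N: share of records whose offset value is less than t."""
    if not offsets:
        raise ValueError("at least one offset value is required")
    return sum(1 for offset in offsets if offset < t) / len(offsets)


def updated_offset_threshold(offsets, probability_threshold):
    """Smallest integer T such that P(T) >= probability_threshold (assumed rule)."""
    # P(max(offsets) + 1) is always 1.0, so the loop finds a value
    # for any probability threshold up to 1.0.
    for t in range(max(offsets) + 2):
        if offset_probability(offsets, t) >= probability_threshold:
            return t
    raise ValueError("probability threshold must not exceed 1.0")


offsets = [0, 0, 1, 1, 2, 3, 10, 20, 30, 60]
print(updated_offset_threshold(offsets, 0.75))  # 21
```

With this rule, a probability threshold of 0.75 means that at least three quarters of the stored data would fall under the centralized branch once the new offset threshold is in effect.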
6. A storage device in data acquisition, characterized by comprising:
A data receiving module, configured to obtain acquisition data and to obtain the generation time and acquisition time of the acquisition data;
An offset value calculation module, configured to obtain an offset value by calculating the difference between the generation time and the acquisition time;
A data storage module, configured to obtain a preset offset threshold and judge whether the offset value is less than the offset threshold and, if so, to obtain the centralized storage directory corresponding to the acquisition data, obtain the generation time subdirectory corresponding to the generation time under the centralized storage directory, obtain the offset subdirectory corresponding to the offset value under the generation time subdirectory, and store the acquisition data in the offset subdirectory;
The data storage module being further configured, when the offset value is greater than or equal to the offset threshold, to obtain the decentralized storage directory corresponding to the acquisition data, obtain the acquisition time subdirectory corresponding to the acquisition time under the decentralized storage directory, and store the acquisition data in the acquisition time subdirectory.
7. The storage device in data acquisition according to claim 6, characterized in that the data storage module is further configured to obtain the data type of the acquisition data; obtain the type directory corresponding to the data type; and obtain the centralized storage directory/decentralized storage directory under the type directory.
8. The storage device in data acquisition according to claim 6, characterized in that the device further comprises a first reading module, configured to obtain an input acquisition time keyword and extract a first input time from it; in the centralized storage directory, to obtain each offset subdirectory for which the difference between the generation time of the generation time subdirectory containing it and the first input time is less than the offset threshold, and for which the sum of that generation time and the offset value corresponding to the offset subdirectory equals the first input time, and to read the acquisition data stored in the offset subdirectory; and, in the decentralized storage directory, to obtain the acquisition time subdirectory whose corresponding acquisition time is identical to the first input time, and to read the acquisition data stored in that acquisition time subdirectory.
9. The storage device in data acquisition according to claim 6, characterized in that the device further comprises a second reading module, configured to obtain an input generation time keyword and extract a second input time from it; in the centralized storage directory, to obtain the generation time subdirectory whose corresponding generation time is identical to the second input time, and to read the acquisition data stored in that generation time subdirectory and in the offset subdirectories it contains; and, in the decentralized storage directory, to traverse the acquisition time subdirectories and read the acquisition data under them whose generation time corresponds to the second input time.
10. The storage device in data acquisition according to claim 6, characterized in that the device further comprises an offset threshold adjustment module, configured to traverse the acquisition data stored under the centralized storage directory and its subdirectories and under the decentralized storage directory and its subdirectories; obtain the generation time and acquisition time corresponding to the acquisition data, and calculate the corresponding offset value; generate an offset value probability distribution according to the formula:
P(T) = S(T) / N
where S(T) is the number of acquisition data items whose offset value is less than T, N is the total number of acquisition data items, P(T) is the offset value probability distribution, and T is the offset value; and obtain a preset probability threshold and update the offset threshold according to the probability threshold.
CN201310377205.7A 2013-08-26 2013-08-26 Storage method and device in data acquisition Active CN104424236B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201310377205.7A CN104424236B (en) 2013-08-26 2013-08-26 Storage method and device in data acquisition
PCT/CN2014/085004 WO2015027868A1 (en) 2013-08-26 2014-08-22 Storing method and apparatus for data acquisition
US14/732,231 US9977836B2 (en) 2013-08-26 2015-06-05 Storing method and apparatus for data acquisition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310377205.7A CN104424236B (en) 2013-08-26 2013-08-26 Storage method and device in data acquisition

Publications (2)

Publication Number Publication Date
CN104424236A CN104424236A (en) 2015-03-18
CN104424236B true CN104424236B (en) 2018-12-07

Family

ID=52585572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310377205.7A Active CN104424236B (en) 2013-08-26 2013-08-26 Storage method and device in data acquisition

Country Status (3)

Country Link
US (1) US9977836B2 (en)
CN (1) CN104424236B (en)
WO (1) WO2015027868A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107528870B (en) * 2016-06-22 2019-08-23 腾讯科技(深圳)有限公司 A kind of collecting method and its equipment
CN110716966B (en) * 2019-10-16 2022-12-27 京东方科技集团股份有限公司 Data visualization processing method and system, electronic device and storage medium
CN110765321B (en) * 2019-10-28 2022-10-25 北京明略软件系统有限公司 Data storage path generation method and device and readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1747398A (en) * 2004-09-08 2006-03-15 大唐移动通信设备有限公司 Mass performance data statistical method in network element management system
CN102761524A (en) * 2011-04-27 2012-10-31 中兴通讯股份有限公司 Streaming media storage and playing method and corresponding system

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7392235B2 (en) * 2005-04-15 2008-06-24 Emc Corporation Methods and apparatus for retrieval of content units in a time-based directory structure
JP4772378B2 (en) * 2005-05-26 2011-09-14 株式会社東芝 Method and apparatus for generating time-series data from a web page
US8732386B2 (en) * 2008-03-20 2014-05-20 Sandisk Enterprise IP LLC. Sharing data fabric for coherent-distributed caching of multi-node shared-distributed flash memory
US8810369B2 (en) * 2008-11-19 2014-08-19 Intermec Ip Corp Finding sensor data in an RFID network
CN101667205B (en) * 2009-09-28 2011-03-30 河南电力试验研究院 Method for memorizing real time measure point data for quick review
JP2011109469A (en) * 2009-11-18 2011-06-02 Canon Inc Content receiving apparatus, and method of controlling the same
CN102841823A (en) * 2011-06-23 2012-12-26 鸿富锦精密工业(深圳)有限公司 Data backup system and method
CN102402592A (en) * 2011-11-04 2012-04-04 同辉佳视(北京)信息技术股份有限公司 Information collecting method based on webpage data mining

Also Published As

Publication number Publication date
US9977836B2 (en) 2018-05-22
WO2015027868A1 (en) 2015-03-05
CN104424236A (en) 2015-03-18
US20150269277A1 (en) 2015-09-24

Similar Documents

Publication Publication Date Title
WO2016054908A1 (en) Internet of things big data platform-based intelligent user profiling method and apparatus
CN103702053A (en) Video storage and search method and system as well as monitoring system
CN102685717B (en) network service quality parameter identification method and device
CN105490854A (en) Real-time log collection method and system, and application server cluster
CN107797894B (en) APP user behavior analysis method and device
CN102880712A (en) Method and system for sequencing searched network videos
CN105933772B (en) Exchange method, interactive device and interactive system
CN103617260B (en) Index generation method and device for repeated data deletion
CN105095211A (en) Acquisition method and device for multimedia data
CN104424236B (en) Storage method and device in data acquisition
CN105718590A (en) Multi-tenant oriented SaaS public opinion monitoring system and method
CN104182482B (en) A kind of news list page determination methods and the method for screening news list page
CN106528787A (en) Mass data multi-dimensional analysis-based query method and device
CN106033324A (en) Data storage method and device
CN105069113A (en) Data flow real-time visualization method and data flow real-time visualization system
CN106610774A (en) Webpage table editing method and device
CN101929859B (en) Image full-frame scanning based space debris detecting method
CN105608135A (en) Data mining method and system based on Apriori algorithm
CN102737093A (en) Data storage apparatus and data storage method
CN110493085A (en) Statistical method, system, electronic equipment and the medium of IPv6 active users
CN103019575B (en) A kind of mobile terminal and information processing method thereof
Rathore et al. Hadoop based real-time big data architecture for remote sensing earth observatory system
CN105517018B (en) A kind of method and device obtaining location information
CN103929339A (en) Method and system for collecting web data
CN104954351B (en) data detection method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190807

Address after: 518000 Nanshan District science and technology zone, Guangdong, Zhejiang Province, science and technology in the Tencent Building on the 1st floor of the 35 layer

Co-patentee after: Tencent cloud computing (Beijing) limited liability company

Patentee after: Tencent Technology (Shenzhen) Co., Ltd.

Address before: Shenzhen Futian District City, Guangdong province 518000 Zhenxing Road, SEG Science Park 2 East Room 403

Patentee before: Tencent Technology (Shenzhen) Co., Ltd.