CN105005617B - Method and device for storing time series data - Google Patents

Info

Publication number
CN105005617B
Authority
CN
China
Prior art keywords
data
storage
stored
hbase databases
hbase
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510429895.5A
Other languages
Chinese (zh)
Other versions
CN105005617A (en
Inventor
李江颖
吴培荣
程衍明
陈刚
杨超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NAVIMENTUM INFORMATION SYSTEM CO Ltd
Original Assignee
NAVIMENTUM INFORMATION SYSTEM CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NAVIMENTUM INFORMATION SYSTEM CO Ltd filed Critical NAVIMENTUM INFORMATION SYSTEM CO Ltd
Priority to CN201510429895.5A priority Critical patent/CN105005617B/en
Publication of CN105005617A publication Critical patent/CN105005617A/en
Application granted granted Critical
Publication of CN105005617B publication Critical patent/CN105005617B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471Distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for storing time series data. The method includes: storing data to be stored into an Hbase database; counting the data volume in the Hbase database to obtain a statistical result; merging the data in the Hbase database within the N-th merge statistics period to obtain a merged file; obtaining, from the statistical result, the data storage amount s for the (N+1)-th merge statistics period; determining whether the data storage amount s exceeds the preset data storage amount of the (N+1)-th merge statistics period; and, if not, splitting the merged file. The invention solves the technical problem that the prior art can neither meet the storage capacity requirements of the data nor guarantee the stability of data storage.

Description

Method and device for storing time series data
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for storing time series data.
Background
In industrial informatization, real-time historical databases have long been a key research direction in industry; the sectors involved include oil, electric power, metallurgy and the chemical industry. Real-time historical databases play a core role in data monitoring, management and storage. They describe production data with the measuring point as the basic unit, where one measuring point represents one actual data source, such as the voltage on a given power transmission line or the temperature at a given test point. To meet the growing transactional demands of industrial applications, every field has carried out extensive systematic research on real-time historical databases.
With the rapid development of the Internet of Things, smart devices of all kinds are widely used in industry, and more and more data is collected into real-time historical databases. Because production control processes run without interruption around the clock, the total amount of historical data keeps accumulating over time and can reach the TB or even PB level. Current real-time historical databases are deployed on high-performance servers, but massive historical data still puts pressure on mainstream server hardware, and the problem is difficult to solve even by expanding the hardware.
Industrial data has the typical characteristics of time series data: the data uploaded by smart devices is usually a data stream carrying time tags, and the time tag is an important filter condition for historical data queries. Current real-time historical databases use special storage schemes to improve their ability to process time-related series data, but the scalability of their storage remains a bottleneck.
Hbase (Hadoop Database, a distributed storage system), built on an underlying distributed file system, has been widely used in recent years as a low-cost distributed storage solution. However, Hbase shows a certain delay when large volumes of data are written. Massive data entering Hbase continually triggers splits of large files, and a file split usually takes place right after a file merge, when the I/O load in the cluster is already extremely high. In addition, a file split takes the server temporarily offline, so data archiving requests are blocked, the system response time becomes extremely unstable, and the archiving speed cannot be guaranteed.
In summary, for the massive data of the industrial field there is currently no method that can both meet the storage capacity requirements of the data and guarantee the stability of data storage.
Summary of the invention
The present application provides a method and device for storing time series data, which solve the technical problem that the prior art can neither meet the storage capacity requirements of the data nor guarantee the stability of data storage.
An embodiment of the present invention provides a method for storing time series data, including:
Storing data to be stored into an Hbase database;
Counting the data volume in the Hbase database to obtain a statistical result;
Merging the data in the Hbase database within the N-th merge statistics period to obtain a merged file;
Obtaining, from the statistical result, the data storage amount s for the (N+1)-th merge statistics period;
Determining whether the data storage amount s exceeds the preset data storage amount of the (N+1)-th merge statistics period;
If not, splitting the merged file.
Further, storing the data to be stored into the Hbase database specifically includes:
Storing the data to be stored in a buffer;
Determining whether the data capacity in the buffer has reached a preset data capacity threshold;
And/or
Determining whether the storage period of the buffer has reached a preset storage period threshold;
If the data capacity in the buffer has reached the preset data capacity threshold and/or the storage period of the buffer has reached the preset storage period threshold, storing the data to be stored into the Hbase database.
Further, storing the data to be stored into the Hbase database specifically includes:
Using the identifier and the storage timestamp of the data to be stored as the row key of the Hbase database;
Using the content of the data to be stored as the archive data of the Hbase database;
Storing the row key and the archive data into the Hbase database as a data item of the Hbase database.
Further, counting the data volume in the Hbase database specifically includes:
Determining whether the storage moment T0 of the data to be stored is earlier than a preset earliest statistics moment;
If so, discarding the data to be stored;
Determining whether the storage moment T0 of the data to be stored lies between the preset earliest statistics moment and a preset latest statistics moment;
If so, adding the data to be stored to the data volume previously counted for the Hbase database, which completes the count;
Determining whether the storage moment T0 of the data to be stored is later than the preset latest statistics moment;
If so, taking the storage moment T0 of the data to be stored as the new latest statistics moment and adding the data to be stored to the data volume previously counted for the Hbase database, which completes the count.
Further, merging the data in the Hbase database specifically includes:
Determining whether the amount of data written to the Hbase database within the N-th merge statistics period has reached a preset data write amount threshold;
If so, writing the data items in the Hbase database to a separate file in ascending order of row key;
Determining whether the number of files under the corresponding server in the Hbase database exceeds a preset file merge threshold;
If so, merging the files on the corresponding server.
An embodiment of the present invention further provides a device for storing time series data, including:
A data storage module, configured to store data to be stored into an Hbase database;
A data statistics module, configured to count the data volume in the Hbase database to obtain a statistical result;
A data merging module, configured to merge the data in the Hbase database within the N-th merge statistics period to obtain a merged file;
A data processing module, configured to obtain, from the statistical result, the data storage amount s for the (N+1)-th merge statistics period;
A judgment module, configured to determine whether the data storage amount s exceeds the preset data storage amount of the (N+1)-th merge statistics period;
A splitting module, configured to split the merged file if the judgment result of the judgment module is no.
Further, the data storage module specifically includes:
A data storage execution unit, configured to store the data to be stored in a buffer;
A first judging unit, configured to determine whether the data capacity in the buffer has reached a preset data capacity threshold;
A first data storage subunit, configured to store the data to be stored into the Hbase database if the judgment result of the first judging unit is yes;
A second judging unit, configured to determine whether the storage period of the buffer has reached a preset storage period threshold;
A second data storage subunit, configured to store the data to be stored into the Hbase database if the judgment result of the second judging unit is yes.
Further, the first data storage subunit is specifically configured to, if the judgment result of the first judging unit is yes, use the identifier and the storage timestamp of the data to be stored as the row key of the Hbase database, use the content of the data to be stored as the archive data of the Hbase database, and store the row key and the archive data into the Hbase database as a data item of the Hbase database;
The second data storage subunit is specifically configured to, if the judgment result of the second judging unit is yes, use the identifier and the storage timestamp of the data to be stored as the row key of the Hbase database, use the content of the data to be stored as the archive data of the Hbase database, and store the row key and the archive data into the Hbase database as a data item of the Hbase database.
Further, the data statistics module specifically includes:
A third judging unit, configured to determine whether the storage moment T0 of the data to be stored is earlier than a preset earliest statistics moment;
A data discarding unit, configured to discard the data to be stored if the judgment result of the third judging unit is yes;
A fourth judging unit, configured to determine whether the storage moment T0 of the data to be stored lies between the preset earliest statistics moment and a preset latest statistics moment;
A first statistics object, configured to, if the judgment result of the fourth judging unit is yes, add the data to be stored to the data volume previously counted for the Hbase database, completing the count and obtaining the statistical result;
A fifth judging unit, configured to determine whether the storage moment T0 of the data to be stored is later than the preset latest statistics moment;
A second statistics object, configured to, if the judgment result of the fifth judging unit is yes, take the storage moment T0 of the data to be stored as the new latest statistics moment and add the data to be stored to the data volume previously counted for the Hbase database, completing the count and obtaining the statistical result.
Further, the data merging module specifically includes:
A sixth judging unit, configured to determine whether the amount of data written to the Hbase database within the N-th merge statistics period has reached a preset data write amount threshold;
A file generating unit, configured to write the data items in the Hbase database to a separate file in ascending order of row key if the judgment result of the sixth judging unit is yes;
A seventh judging unit, configured to determine whether the number of files under the corresponding server in the Hbase database exceeds a preset file merge threshold;
A merge execution unit, configured to merge the files on the corresponding server if the judgment result of the seventh judging unit is yes, obtaining the merged file.
The one or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:
1. Data to be stored is stored into the Hbase database, and the data volume in the Hbase database is counted to obtain a statistical result; the data storage amount s for the next merge statistics period is obtained from the statistical result; it is determined whether s exceeds the preset data storage amount for a merge statistics period; if not, the merged file is split. Because Hbase is a distributed database that supports massive data storage and offers scalable storage capacity and data redundancy, the embodiments of the present invention can meet the storage requirements of historical time series data. Furthermore, in the prior art Hbase merges and splits its internal data files using static policies based on fixed parameters, and therefore cannot adapt well to the persistence requirements of massive time series data. By using a scheme based on dynamic statistics, the present application improves the efficiency with which Hbase merges and splits its internal data files, which reduces the latency of data access, ensures its stability, and suits both the persistence and the access requirements of massive time series data.
2. The data to be stored is first cached, and the data cached over a period of time is then stored into Hbase in one batch. This reduces the number of disk I/O operations and network communications during data persistence, and thereby greatly improves the persistence efficiency of the data and the data throughput of Hbase.
3. The identifier and the storage timestamp of the data to be stored are used as the row key of the Hbase database; the content of the data to be stored is used as the archive data of the Hbase database; the row key and the archive data are stored into the Hbase database as a data item of the Hbase database. The embodiments of the present invention thus propose a scheme for storing time series data in an Hbase database, ensuring that the Hbase database can meet the storage requirements of historical data. Because the row keys of the Hbase database are classified and sorted by the identifier of the data to be stored and stored in a distributed manner, on the one hand the stored data is distributed reasonably across the nodes of the Hbase database in a distributed environment without creating read/write hot spots; on the other hand, the data of one observation point is not scattered in physical storage, which would cause high query overhead and low efficiency. In addition, the contiguous data layout allows historical data queries to make use of the scan mechanism of the Hbase database, reducing the network transfer volume required by each query.
Brief description of the drawings
Fig. 1 is a flow chart of the method for storing time series data provided by Embodiment 1 of the present invention;
Fig. 2 is a flow chart of storing industrial historical data using the method for storing time series data provided by Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of the principle by which the method for storing time series data provided by Embodiment 1 of the present invention stores data in Hbase;
Fig. 4 is a flow chart of counting, using the method for storing time series data provided by Embodiment 1 of the present invention, the amount of historical data written on each server of the cluster in each merge statistics period;
Fig. 5 is a flow chart of adaptive splitting of Hbase files using the method for storing time series data provided by Embodiment 1 of the present invention;
Fig. 6 is a block diagram of the device for storing time series data provided by Embodiment 2 of the present invention.
Detailed description of the embodiments
By providing a method and device for storing time series data, the embodiments of the present invention solve the technical problem that the prior art can neither meet the storage capacity requirements of the data nor guarantee the stability of data storage.
To solve the above technical problem, the general idea of the technical solution in the embodiments of the present invention is as follows:
Data to be stored is stored into the Hbase database, and the data volume in the Hbase database is counted to obtain a statistical result; the data storage amount s for the next merge statistics period is obtained from the statistical result; it is determined whether s exceeds the preset data storage amount for a merge statistics period; if not, the merged file is split. Because Hbase is a distributed database that supports massive data storage and offers scalable storage capacity and data redundancy, the embodiments of the present invention can meet the storage requirements of historical time series data. Furthermore, in the prior art Hbase merges and splits its internal data files using static policies based on fixed parameters, and therefore cannot adapt well to the persistence requirements of massive time series data. By using a scheme based on dynamic statistics, the embodiments of the present invention improve the efficiency with which Hbase merges and splits its internal data files, which reduces the latency of data access, ensures its stability, and suits both the persistence and the access requirements of massive time series data.
For a better understanding of the above technical solution, it is described in detail below with reference to the accompanying drawings and specific embodiments.
Embodiment 1
Referring to Fig. 1, the method for storing time series data provided by the embodiment of the present invention includes:
Step S110: Store data to be stored into an Hbase database;
Specifically, step S110 includes:
Storing the data to be stored in a buffer;
Determining whether the data capacity in the buffer has reached a preset data capacity threshold;
And/or
Determining whether the storage period of the buffer has reached a preset storage period threshold;
If the data capacity in the buffer has reached the preset data capacity threshold and/or the storage period of the buffer has reached the preset storage period threshold, storing the data to be stored into the Hbase database;
Otherwise, no processing is performed.
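The buffering rule of step S110 can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name WriteBuffer, the item-count capacity, the injectable clock and the flush callback are all hypothetical and are not taken from the patent, which does not specify how the thresholds are measured.

```python
import time


class WriteBuffer:
    """Buffer that is flushed when it is full or when its contents are old enough."""

    def __init__(self, capacity, max_age_seconds, flush, clock=time.monotonic):
        self.capacity = capacity                # assumed: threshold counted in items
        self.max_age_seconds = max_age_seconds  # assumed: storage period threshold in seconds
        self.flush = flush                      # callback that persists one batch
        self.clock = clock                      # injectable for testing
        self.items = []
        self.started_at = None

    def add(self, value):
        # The storage period starts when the first item enters an empty buffer.
        if not self.items:
            self.started_at = self.clock()
        self.items.append(value)
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if either threshold is reached; return True if a flush happened."""
        if not self.items:
            return False
        full = len(self.items) >= self.capacity
        expired = self.clock() - self.started_at >= self.max_age_seconds
        if full or expired:
            batch, self.items = self.items, []
            self.flush(batch)
            return True
        return False  # otherwise, no processing is performed
```

In such a sketch a timer thread would call maybe_flush periodically, so that a buffer that never fills is still written out once its storage period has elapsed.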
The specific steps of storing the data to be stored into the Hbase database in step S110 are as follows:
Using the identifier and the storage timestamp of the data to be stored as the row key of the Hbase database;
Using the content of the data to be stored as the archive data of the Hbase database;
Storing the row key and the archive data into the Hbase database as a data item of the Hbase database.
In this embodiment, the length of each data item in the Hbase database depends on the actual size of the historical data when it is archived.
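One possible byte layout for such a data item is sketched below. The patent only states that the row key is formed from the identifier and the storage timestamp; the zero-byte separator, the 8-byte big-endian timestamp and the encoding of values as doubles are assumptions made here for illustration, and the function names are hypothetical.

```python
import struct


def make_row_key(point_id: str, timestamp_ms: int) -> bytes:
    """Row key: identifier, a separator, then a big-endian timestamp (assumed layout)."""
    return point_id.encode("utf-8") + b"\x00" + struct.pack(">Q", timestamp_ms)


def make_item(point_id: str, timestamp_ms: int, values):
    """Pair the row key with the buffered values serialized as a byte array."""
    column_data = struct.pack(f">{len(values)}d", *values)
    return make_row_key(point_id, timestamp_ms), column_data


# Sorting the keys as bytes groups each measuring point together and orders
# its entries by time, which is the property a range scan relies on.
keys = sorted([make_row_key("P2", 5), make_row_key("P1", 300), make_row_key("P1", 20)])
```

A big-endian fixed-width timestamp is chosen in this sketch because lexicographic byte order then agrees with chronological order for non-negative values.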
Step S120: Count the data volume in the Hbase database to obtain a statistical result;
Specifically, step S120 includes:
Determining whether the storage moment T0 of the data to be stored is earlier than the preset earliest statistics moment;
If so, the storage of the data to be stored is illegal, and the data to be stored is discarded;
Determining whether the storage moment T0 of the data to be stored lies between the preset earliest statistics moment and the preset latest statistics moment;
If so, adding the data to be stored to the data volume previously counted for the Hbase database, which completes the count;
Determining whether the storage moment T0 of the data to be stored is later than the preset latest statistics moment;
If so, taking the storage moment T0 of the data to be stored as the new latest statistics moment and adding the data to be stored to the data volume previously counted for the Hbase database, which completes the count.
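The three cases of step S120 can be sketched as a small accumulator. The class name VolumeStats and the choice to count volume in bytes are hypothetical; the patent does not name the unit.

```python
class VolumeStats:
    """Running data volume between an earliest and a latest statistics moment."""

    def __init__(self, earliest, latest):
        self.earliest = earliest  # preset earliest statistics moment
        self.latest = latest      # preset latest statistics moment
        self.total_bytes = 0      # assumed unit: bytes

    def record(self, t0, size):
        """Account for data stored at moment t0; return False if it was discarded."""
        if t0 < self.earliest:
            return False          # too early: the data is discarded
        if t0 > self.latest:
            self.latest = t0      # t0 becomes the new latest statistics moment
        self.total_bytes += size  # in-range and late data are both counted
        return True
```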
Step S130: Merge the data in the Hbase database within the N-th merge statistics period to obtain a merged file;
Specifically, step S130 includes:
Determining whether the amount of data written to the Hbase database within the N-th merge statistics period has reached the preset data write amount threshold;
If so, writing the data items in the Hbase database to a separate file in ascending order of row key;
If not, no processing is performed.
Determining whether the number of files under the corresponding server in the Hbase database exceeds the preset file merge threshold;
If so, merging the files on the corresponding server;
If not, no processing is performed.
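The two checks of step S130 are simple threshold comparisons. The sketch below uses hypothetical function names and models the in-memory data items as a dictionary from row key to data, which is an assumption made for illustration rather than the way Hbase stores them.

```python
def should_flush(written_in_period, write_threshold):
    """True once the amount written in the current period reaches the threshold."""
    return written_in_period >= write_threshold


def flush_sorted(mem_items):
    """Return the in-memory items as a list ordered by ascending row key."""
    return sorted(mem_items.items())


def should_merge(file_count, merge_threshold):
    """True when the number of files on the server exceeds the merge threshold."""
    return file_count > merge_threshold
```

Note the asymmetry taken from the text: the write amount must reach its threshold, whereas the file count must exceed its threshold.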
Step S140: Obtain, from the statistical result, the data storage amount s for the (N+1)-th merge statistics period;
Step S150: Determine whether the data storage amount s exceeds the preset data storage amount of the (N+1)-th merge statistics period;
If so, no processing is performed;
If not, the merged file is split.
Referring to Fig. 2, the specific steps of storing industrial historical data using the method provided by the embodiment of the present invention are as follows:
Step 301: Initialization. Create the buffers and the Hbase client pools. The detailed process is as follows. Open the Hbase data table using the API (Application Programming Interface) functions of Hbase. If the return value is null, no formal historical data table exists in Hbase yet, so a table H is created according to the unified naming rule HisData. Otherwise, a formal historical data table already exists in Hbase. Next, determine whether table H contains the column family that holds the historical data. If not, create a column family F according to the unified naming rule. If so, read the configuration information of all measuring points from the database and allocate two buffers of equal size to each measuring point; the number of data values that each measuring point can store in a buffer is fixed, and a timer thread is started. According to the configuration, create two fixed-size HTable pools, PoolF and PoolA, to manage the writing of measuring-point data; this ensures data safety in a multi-threaded environment. All Hbase clients connect to ZooKeeper and keep their sessions alive. At the same time, all write requests issued through clients in PoolF are configured to be delivered directly to the corresponding server, whereas each object in PoolA is configured to enable its built-in cache space and to write data in automatic write mode. The number of objects in each pool depends on the performance of the current server and the needs of the actual application scenario.
Step 302: Receive a data insertion request, store the data in the buffer of measuring point NID whose read-write state is currently the write state, and then update the buffered data count DataNum in the buffer corresponding to measuring point NID.
Step 303: Determine whether the buffer corresponding to measuring point NID is full, that is, whether the data capacity in the buffer has reached the preset data capacity threshold. If so, execute step 304; if not, execute step 305.
Step 304: Request a currently available idle object CF from the HTable pool PoolF. If a null object is returned, there is currently no idle object; wait 1 millisecond and try again to obtain an idle object CF. Otherwise, the currently available idle object CF in the HTable pool PoolF is obtained. Referring to Fig. 3, generate the row key RowKey from the identifier of the data to be inserted and the request moment, generate the archive data ColumnData by serializing the buffered data into a byte array, and combine RowKey and ColumnData into the data item Item to be archived in Hbase. Using the write interface provided by the idle object CF, deliver Item immediately to the corresponding server in Hbase for processing;
Step 305: Determine whether the time for which the buffer of measuring point NID has been written exceeds the buffer archiving time set by the timer. If so, execute step 306; if not, end this flow.
Step 306: Request a currently available idle object CA from the HTable pool PoolA. If a null object is returned, there is currently no idle object; wait 1 millisecond and try again to obtain an idle object CA. Otherwise, the currently available idle object CA in the HTable pool PoolA is obtained. Generate the row key RowKey from the identifier of the data to be inserted and the request moment, generate the archive data ColumnData by serializing the buffered data into a byte array, and combine RowKey and ColumnData into the data item Item to be archived in Hbase. Using the interface provided by the idle object CA, deliver Item to the corresponding server in Hbase for processing.
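The acquire-or-wait behaviour of steps 304 and 306 can be illustrated with a generic pool. This sketch does not use any Hbase client API: the ClientPool class, its method names and the placeholder client objects are hypothetical, and only the 1 millisecond retry delay comes from the text.

```python
import queue
import time


class ClientPool:
    """Fixed-size pool of client objects with non-blocking acquisition."""

    def __init__(self, clients):
        self._idle = queue.Queue()
        for client in clients:
            self._idle.put(client)

    def try_acquire(self):
        """Return an idle client, or None if every client is in use."""
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            return None

    def acquire(self, retry_delay=0.001):
        """Retry every millisecond until an idle client becomes available."""
        while True:
            client = self.try_acquire()
            if client is not None:
                return client
            time.sleep(retry_delay)

    def release(self, client):
        """Hand a client back to the pool once the write has been delivered."""
        self._idle.put(client)
```

Because queue.Queue is thread-safe, several writer threads could share one pool in this sketch, which is consistent with the multi-threaded setting described in step 301.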
It should be noted that after the industrial historical data enters Hbase, the embodiment of the present invention continuously counts the amount of historical data written on each server of the cluster in each merge statistics period. When a file split is to be carried out, it determines whether the amount of historical data written in the next merge statistics period is within the data value preset by the system; if so, the file split is carried out, and otherwise no action is taken.
The specific steps for counting the amount of historical data written on each server of the cluster in each merge statistics period, and for splitting files, are described below.
Referring to Fig. 4, the amount of historical data written on each server of the cluster in each merge statistics period is counted as follows:
Step 401: Initialization. After every process in the Hbase cluster has started normally, create a data statistics queue Queue in each Region Server. The basic unit of the queue is encapsulated as a Writing Record object. Each statistics object Writing Record is responsible for counting the amount of historical data archived in one merge statistics period T, and the time ranges covered by the statistics objects in the queue are contiguous and increase in order. Initialize the statistics objects Writing Record in the queue by setting their internal variable records to 0. The length of the data statistics queue Queue is LEN, which remains unchanged from the creation of the queue until its destruction. In this embodiment, the merge statistics period T is 20 seconds.
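The queue described in step 401 can be sketched with a deque of small record objects. The field names start and records, the helper names and the use of seconds as the time unit are assumptions for illustration; only the 20-second period and the fixed length LEN come from the text.

```python
from collections import deque
from dataclasses import dataclass

PERIOD_SECONDS = 20  # merge statistics period T in this embodiment


@dataclass
class WritingRecord:
    start: float      # start of the period this object is responsible for
    records: int = 0  # amount of data archived during that period


def make_queue(start_time, length, period=PERIOD_SECONDS):
    """Create `length` records covering contiguous, increasing periods."""
    return deque(WritingRecord(start_time + i * period) for i in range(length))


def covered_range(q, period=PERIOD_SECONDS):
    """Return (TS, TE): the start and end of the time range the queue covers."""
    return q[0].start, q[-1].start + period
```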
Step 402: After a Region Server receives a data insertion request, determine whether the value of the write-ahead log switch variable in Hbase is True. If so, the write-ahead log is enabled and the data item Item is written to the log; otherwise, the data item Item is inserted directly at the corresponding position in memory according to its row key, so that the data items in memory remain ordered overall. If the data is inserted successfully, execute step 403.
Step 403: Obtain the current data insertion time T0 and, synchronously, the start time TS and the end time TE of the time range covered by the data statistics queue Queue. The start time TS is the preset earliest statistics moment, and the end time TE is the preset latest statistics moment. If T0 is less than TS, the insertion time of the current historical data is illegal: discard the current historical data, make no change to the queue, and end this flow. If T0 is greater than TE, no object in the current data statistics queue Queue is responsible for counting the data written in the period T that contains T0: execute step 404. If T0 is greater than or equal to TS and less than or equal to TE, the insertion time of the current historical data lies within the time range covered by the data statistics queue Queue: execute step 405 directly.
Step 404: Obtain the head object WRHEAD of the data statistics queue Queue and remove it from the queue; synchronously obtain the tail object WRTAIL of the data statistics queue Queue and the end time QTE covered by the queue. Create a new statistics object WR', set WR' to be responsible for counting the historical data written in the time range from QTE to QTE+T, then add the newly created object to the tail of the data statistics queue Queue and repeat step 403.
Step 405: According to the current data insertion time T0, synchronously obtain the corresponding statistics object WR0 in the data statistics queue Queue, increase its internal counting variable records by the size of the current data, and end this flow.
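Steps 403 to 405 amount to a sliding window over fixed-length periods. In the sketch below each queue entry is a two-element list [start, amount]; that representation, the function name and the way a moment on a period boundary is assigned to a period are assumptions made for illustration.

```python
from collections import deque


def record_write(q, period, t0, size):
    """Add `size` to the period containing t0; return False if t0 is too early.

    q is a deque of [start, amount] entries with contiguous, increasing periods.
    """
    if t0 < q[0][0]:
        return False                      # step 403: earlier than TS, discard
    while t0 > q[-1][0] + period:         # step 403: later than TE
        new_start = q[-1][0] + period     # step 404: new object covers QTE..QTE+T
        q.popleft()                       # drop the head object
        q.append([new_start, 0])          # append the new object at the tail
    # Step 405: locate the object responsible for t0 (t0 == TE goes to the tail).
    index = min(int((t0 - q[0][0]) // period), len(q) - 1)
    q[index][1] += size
    return True


q = deque([[0, 0], [20, 0], [40, 0]])
record_write(q, 20, 25, 10)  # falls in the period starting at 20
record_write(q, 20, 65, 7)   # beyond TE = 60, so the window slides once
```

The queue length never changes in this sketch, since each removal at the head is paired with one insertion at the tail, which matches the statement that LEN stays constant.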
Referring to Fig. 5, in the embodiment of the present invention, the specific steps of adaptive splitting of Hbase files are as follows:
Step 501: Insert the data item Item at the designated position in the Hbase memory store MemStore according to its row key, and determine whether the amount of data written to the current MemStore within one merge statistics period has reached the preset data write amount threshold. If not, end this flow; otherwise, write all data items in the MemStore to a separate file StoreFile in ascending order of row key and execute step 502.
Step 502: Determine whether the number of files under the current Region Server exceeds the preset file merge threshold Compact Threshold and whether the current Region Server is in the running state. If not, end this flow; otherwise, execute step 503.
Step 503: From the files in the current Region Server, select at least MinC and at most MaxC StoreFile files in order from small to large, merge them into one large file FileC, and continue with step 504. MinC and MaxC are configured according to the needs of the actual application scenario. In this embodiment, MinC is 64 and MaxC is 256.
Step 504:N number of objects of statistics Writing Record that the time is nearest are selected out of data statistics queue Queue, N is less than queue length LEN.According to a sliding average algorithm, it is calculated in data statistics queue Queue from deadline TE Play next writing s for merging historical data in statistic period T.
Step 505:Judge s whether be more than preset unit period in data writing upper limit value Writing Upper Limit judges whether s is more than the preset amount of storage for merging data in measurement period;If it does, not making any action then And terminate this flow, it is no to then follow the steps 506.Wherein, in the present embodiment, Writing Upper Limit are 64M.
Step 506:To the file File after mergingCFile Split operation is carried out, this flow is terminated.
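The selection, prediction and decision in steps 503 to 505 can be sketched as three small functions. This is a hedged illustration rather than Hbase code: the function names are hypothetical, "from smallest to largest" is assumed to mean ordering by file size, returning an empty selection when fewer than MinC files exist is an assumption, and "64M" is assumed to mean 64 MiB.

```python
def select_for_merge(file_sizes, min_c=64, max_c=256):
    """Step 503 (assumed reading): pick the smallest files, between min_c and max_c."""
    if len(file_sizes) < min_c:
        return []  # assumption: not enough files, nothing is merged
    return sorted(file_sizes)[:max_c]


def predict_next_write_volume(counts, n):
    """Step 504: simple moving average over the n most recent periods.

    `counts` holds the per-period write volumes, oldest first; n must be
    smaller than the queue length, as stated in step 504.
    """
    if not 0 < n < len(counts):
        raise ValueError("n must satisfy 0 < n < len(counts)")
    recent = counts[-n:]
    return sum(recent) / n


def should_split(predicted, upper_limit=64 * 1024 * 1024):
    """Step 505: split only if the prediction does not exceed the upper limit."""
    return predicted <= upper_limit
```

A caller would pass the `counts` values of the statistics queue to `predict_next_write_volume` and perform the split of step 506 only when `should_split` returns True.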
Embodiment two
Referring to Fig. 6, the storage device for time series data provided in this embodiment of the present invention includes:
A data storage module 100, configured to store data to be stored into the Hbase database;
The data storage module 100 is described below. The data storage module 100 specifically includes:
A data storage execution unit, configured to store the data to be stored in a buffer;
A first judging unit, configured to judge whether the data capacity in the buffer reaches a preset data capacity threshold;
A first data storage subunit, configured to store the data to be stored into the Hbase database if the judgment result of the first judging unit is yes;
Specifically, in this embodiment, the first data storage subunit is configured to, if the judgment result of the first judging unit is yes, use the identifier and the storage timestamp of the data to be stored as the row key of the Hbase database, use the content of the data to be stored as the column data of the Hbase database, and store the row key and the column data into the Hbase database as a data item of the Hbase database;
A second judging unit, configured to judge whether the storage period of the buffer reaches a preset storage period threshold;
A second data storage subunit, configured to store the data to be stored into the Hbase database if the judgment result of the second judging unit is yes;
Specifically, in this embodiment, the second data storage subunit is configured to, if the judgment result of the second judging unit is yes, use the identifier and the storage timestamp of the data to be stored as the row key of the Hbase database, use the content of the data to be stored as the column data of the Hbase database, and store the row key and the column data into the Hbase database as a data item of the Hbase database;
In this embodiment, the length of each data item in the Hbase database depends on the actual size of the historical data when it is archived.
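The two flush conditions of the data storage module (buffer capacity and buffer storage period) can be sketched as follows. This is a minimal sketch under stated assumptions: the class name WriteBuffer, the `flush` callback and the injectable `clock` are hypothetical, and no real Hbase client call is shown; the callback stands in for whatever batch write the deployment uses.

```python
import time


class WriteBuffer:
    """Collects data items and flushes them in one batch on size or age."""

    def __init__(self, flush, max_bytes, max_age_s, clock=time.monotonic):
        self.flush = flush          # hypothetical callback taking a list of (row_key, value)
        self.max_bytes = max_bytes  # preset data capacity threshold
        self.max_age_s = max_age_s  # preset storage period threshold
        self.clock = clock
        self.items = []
        self.size = 0
        self.started = clock()

    def add(self, row_key: bytes, value: bytes) -> None:
        if not self.items:
            # The storage period is measured from the first buffered item.
            self.started = self.clock()
        self.items.append((row_key, value))
        self.size += len(row_key) + len(value)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        too_big = self.size >= self.max_bytes                     # first judging unit
        too_old = self.clock() - self.started >= self.max_age_s   # second judging unit
        if self.items and (too_big or too_old):
            self.flush(self.items)
            self.items, self.size = [], 0
```

In practice `maybe_flush` would also be called from a timer so that a quiet buffer is still written out once its storage period elapses.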
Data statistics module 200 obtains statistical result for being counted to the data volume in Hbase databases;
Data statistics module 200 is illustrated, in the present embodiment, data statistics module 200 specifically includes:
Third judging unit, the storage moment T for judging data to be stored0When whether earlier than preset earliest statistics It carves;
Data discarding unit illustrates that the deposit of data to be stored is non-if the judging result for third judging unit is yes Method abandons data to be stored;
4th judging unit, the storage moment T for judging data to be stored0Whether between the preset earliest statistics moment Between the preset statistics moment the latest;
First objects of statistics, if the judging result for the 4th judging unit is yes, by the data in former Hbase databases Amount adds data to be stored, completes statistics, obtains statistical result;
5th judging unit, the storage moment T for judging data to be stored0Whether it is later than preset when counting the latest It carves;
Second objects of statistics, if the judging result for the 5th judging unit is yes, by the storage moment of data to be stored T0Data to be stored is added as the new statistics moment the latest, and by the data volume in former Hbase databases, statistics is completed, obtains Obtain statistical result.
Data combiners block 300 closes the data in Hbase databases for merging in measurement period in n-th And the file after being merged;
Data combiners block 300 is illustrated, in the present embodiment, data combiners block 300 specifically includes:
6th judging unit, for whether judging data writing of the Hbase databases in n-th merging measurement period Reach preset data writing threshold value;
File generating unit, if the judging result for the 6th judging unit is yes, by the data item in Hbase databases It presses ranks key ascending order and an independent file is written;
7th judging unit, for judging whether the number of file under corresponding server in Hbase databases is more than default Piece file mergence threshold value;
Merge execution unit, if the judging result for the 7th judging unit is yes, merge the file in corresponding server, File after being merged.
Data processing module 400, for obtaining the N+1 data storage capacity merged in measurement period according to statistical result s;
Judgment module 500, for judge data storage capacity s whether more than the N+1 merging measurement period preset data Amount of storage;
Divide module 600, if the judging result for judgment module 500 is no, the file after merging is split.
【Technical Effects】
1. Data to be stored is stored into the Hbase database, and the data volume in the Hbase database is counted to obtain a statistical result; the data storage amount s in the next merge statistics period is obtained according to the statistical result; it is judged whether the data storage amount s exceeds the preset data storage amount for the merge statistics period; if not, the merged file is split. Because Hbase is a distributed database that supports massive data storage and offers scalable storage capacity and data redundancy, the embodiment of the present invention can meet the storage requirements of historical time series data. In addition, in the prior art Hbase merges and splits its internal data files using a static policy based on preset parameters, and therefore cannot adapt well to the persistence requirements of massive time series data. Through a scheme based on dynamic statistics, the embodiment of the present invention improves the efficiency of merging and splitting data files inside Hbase. This not only reduces data access latency but also ensures the stability of data access, and so adapts well to the persistence and data access requirements of massive time series data.
2. The data to be stored is first cached, and the data cached over a period of time is then written to Hbase in a single operation. This reduces the number of disk I/O operations and network communications during data persistence, and thereby greatly improves the persistence efficiency of the data and the data throughput of Hbase.
3. The identifier and the storage timestamp of the data to be stored are used as the row key of the Hbase database; the content of the data to be stored is used as the column data of the Hbase database; and the row key and the column data are stored into the Hbase database as a data item of the Hbase database. The embodiment of the present invention proposes a scheme for storing time series data in the Hbase database, which ensures that the Hbase database can meet the storage requirements of historical data. Because the embodiment classifies and sorts the row keys of the Hbase database by the identifier of the data to be stored and implements distributed storage, on the one hand the data is, when stored, distributed reasonably across the nodes of the Hbase database in the distributed environment without triggering read/write hotspots in the Hbase database; on the other hand, the data of the same observation point is not scattered in physical storage, which would otherwise cause high query overhead and low efficiency. In addition, the contiguous data arrangement allows queries on historical data to readily use the scan mechanism of the Hbase database, reducing the network transfer volume required by each query.
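The row key layout described in effect 3 (identifier followed by storage timestamp) can be illustrated with a short sketch. The byte layout below is an assumption made for illustration, not the encoding specified by the patent: the identifier is UTF-8 encoded, a NUL byte separates it from the timestamp, and the timestamp is an 8-byte big-endian unsigned integer so that byte-wise ordering matches chronological ordering. The function name make_row_key is hypothetical.

```python
import struct


def make_row_key(point_id: str, timestamp_ms: int) -> bytes:
    """Row key = identifier bytes + NUL separator + 8-byte big-endian timestamp."""
    return point_id.encode("utf-8") + b"\x00" + struct.pack(">Q", timestamp_ms)


# Byte-wise sorting (the order in which a key-sorted store keeps rows) groups
# all rows of one observation point together, oldest first.
samples = [("p2", 1000), ("p1", 3000), ("p1", 1000), ("p2", 500)]
keys = sorted(make_row_key(pid, ts) for pid, ts in samples)
```

With keys of this shape, a range scan from `make_row_key("p1", t_start)` to `make_row_key("p1", t_end)` would touch only the contiguous rows of observation point p1, provided identifiers never contain a NUL byte.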
By means of a well-designed data structure in Hbase, the embodiment of the present invention improves the efficiency of storing and querying time series data. In addition, the embodiment uses the statistics of periodic data storage amounts to predict the data storage amount of future statistics periods, and uses the prediction to decide on the merging and splitting of Hbase data files. This strategy not only greatly reduces the occurrence of potential file merge or split storms inside Hbase and effectively improves data read/write performance, but also ensures the availability of the system during service time and the write speed of massive data.
Although preferred embodiments of the present invention have been described, those skilled in the art, once they learn of the basic inventive concept, can make additional changes and modifications to these embodiments. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (8)

1. A storage method for time series data, characterized by comprising:
Storing data to be stored into an Hbase database;
Counting the data volume in the Hbase database to obtain a statistical result;
Merging the data in the Hbase database within an N-th merge statistics period to obtain a merged file;
Obtaining, according to the statistical result, a data storage amount s in the (N+1)-th merge statistics period;
Judging whether the data storage amount s exceeds a preset data storage amount of the (N+1)-th merge statistics period;
If not, splitting the merged file;
Wherein the counting of the data volume in the Hbase database specifically comprises:
Judging whether the storage moment T0 of the data to be stored is earlier than a preset earliest statistics moment;
If so, discarding the data to be stored;
Judging whether the storage moment T0 of the data to be stored is between the preset earliest statistics moment and a preset latest statistics moment;
If so, adding the data to be stored to the original data volume in the Hbase database to complete the statistics;
Judging whether the storage moment T0 of the data to be stored is later than the preset latest statistics moment;
If so, using the storage moment T0 of the data to be stored as the new latest statistics moment, and adding the data to be stored to the original data volume in the Hbase database to complete the statistics.
2. The method according to claim 1, characterized in that the storing of data to be stored into the Hbase database specifically comprises:
Storing the data to be stored in a buffer;
Judging whether the data capacity in the buffer reaches a preset data capacity threshold;
And/or
Judging whether the storage period of the buffer reaches a preset storage period threshold;
If the data capacity in the buffer reaches the preset data capacity threshold and/or the storage period of the buffer reaches the preset storage period threshold, storing the data to be stored into the Hbase database.
3. The method according to claim 2, characterized in that the storing of the data to be stored into the Hbase database specifically comprises:
Using the identifier and the storage timestamp of the data to be stored as the row key of the Hbase database;
Using the content of the data to be stored as the column data of the Hbase database;
Storing the row key and the column data into the Hbase database as a data item of the Hbase database.
4. The method according to claim 3, characterized in that the merging of the data in the Hbase database specifically comprises:
Judging whether the data write volume of the Hbase database within the N-th merge statistics period reaches a preset data write volume threshold;
If so, writing the data items in the Hbase database to a separate file in ascending row-key order;
Judging whether the number of files under the corresponding server in the Hbase database exceeds a preset file merge threshold;
If so, merging the files in the corresponding server.
5. A storage device for time series data, characterized by comprising:
A data storage module, configured to store data to be stored into an Hbase database;
A data statistics module, configured to count the data volume in the Hbase database to obtain a statistical result;
A data merging module, configured to merge the data in the Hbase database within an N-th merge statistics period to obtain a merged file;
A data processing module, configured to obtain, according to the statistical result, a data storage amount s in the (N+1)-th merge statistics period;
A judgment module, configured to judge whether the data storage amount s exceeds a preset data storage amount of the (N+1)-th merge statistics period;
A splitting module, configured to split the merged file if the judgment result of the judgment module is no;
The data statistics module specifically comprises:
A third judging unit, configured to judge whether the storage moment T0 of the data to be stored is earlier than a preset earliest statistics moment;
A data discarding unit, configured to discard the data to be stored if the judgment result of the third judging unit is yes;
A fourth judging unit, configured to judge whether the storage moment T0 of the data to be stored is between the preset earliest statistics moment and a preset latest statistics moment;
A first statistics object, configured to, if the judgment result of the fourth judging unit is yes, add the data to be stored to the original data volume in the Hbase database, thereby completing the statistics and obtaining the statistical result;
A fifth judging unit, configured to judge whether the storage moment T0 of the data to be stored is later than the preset latest statistics moment;
A second statistics object, configured to, if the judgment result of the fifth judging unit is yes, use the storage moment T0 of the data to be stored as the new latest statistics moment and add the data to be stored to the original data volume in the Hbase database, thereby completing the statistics and obtaining the statistical result.
6. The device according to claim 5, characterized in that the data storage module specifically comprises:
A data storage execution unit, configured to store the data to be stored in a buffer;
A first judging unit, configured to judge whether the data capacity in the buffer reaches a preset data capacity threshold;
A first data storage subunit, configured to store the data to be stored into the Hbase database if the judgment result of the first judging unit is yes;
A second judging unit, configured to judge whether the storage period of the buffer reaches a preset storage period threshold;
A second data storage subunit, configured to store the data to be stored into the Hbase database if the judgment result of the second judging unit is yes.
7. The device according to claim 6, characterized in that:
The first data storage subunit is specifically configured to, if the judgment result of the first judging unit is yes, use the identifier and the storage timestamp of the data to be stored as the row key of the Hbase database, use the content of the data to be stored as the column data of the Hbase database, and store the row key and the column data into the Hbase database as a data item of the Hbase database;
The second data storage subunit is specifically configured to, if the judgment result of the second judging unit is yes, use the identifier and the storage timestamp of the data to be stored as the row key of the Hbase database, use the content of the data to be stored as the column data of the Hbase database, and store the row key and the column data into the Hbase database as a data item of the Hbase database.
8. The device according to claim 7, characterized in that the data merging module specifically comprises:
A sixth judging unit, configured to judge whether the data write volume of the Hbase database within the N-th merge statistics period reaches a preset data write volume threshold;
A file generating unit, configured to, if the judgment result of the sixth judging unit is yes, write the data items in the Hbase database to a separate file in ascending row-key order;
A seventh judging unit, configured to judge whether the number of files under the corresponding server in the Hbase database exceeds a preset file merge threshold;
A merge execution unit, configured to, if the judgment result of the seventh judging unit is yes, merge the files in the corresponding server to obtain the merged file.
CN201510429895.5A 2015-07-21 2015-07-21 A kind of storage method and device of time series data Active CN105005617B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510429895.5A CN105005617B (en) 2015-07-21 2015-07-21 A kind of storage method and device of time series data

Publications (2)

Publication Number Publication Date
CN105005617A CN105005617A (en) 2015-10-28
CN105005617B true CN105005617B (en) 2018-10-12

Family

ID=54378293

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant