OLAP dynamic caching method and device
Technical field
The present invention relates to the technical field of data storage, and in particular to an OLAP dynamic caching method and device.
Background art
Caching is a common technique for improving query performance and increasing concurrency. In an OLAP engine, however, caching is difficult to apply when a query covers both historical data and real-time data, because the underlying data keeps changing over time and the result data changes with it, so the result data cannot be cached in time. For example, consider a real-time statistic of the sales total from 0:00 to the current moment. Suppose the current query reports a total of 10,000 yuan, and the query is run again one minute later, after a 100-yuan sale has occurred in the meantime. The total obtained from the cache is still the previous 10,000 yuan, while the actual total is 10,100 yuan; the statistical result obtained from the cache is clearly inaccurate.
Summary of the invention
The purpose of the present invention is to provide an OLAP dynamic caching method that guarantees the real-time freshness and accuracy of query results while improving cluster query performance.
To achieve the above purpose, one aspect of the present invention provides an OLAP dynamic caching method, comprising:
constructing a cache architecture, the cache architecture comprising a coordinator node, history nodes and a real-time node;
setting a real-time file packet and a plurality of history file packets according to data volume, the real-time file packet residing on the real-time node and the plurality of history file packets being distributed across the history nodes;
acquiring real-time data and caching it in the real-time file packet, and, when the caching duration reaches a preset period, distributing the cached data in the real-time file packet into the history file packets for storage;
receiving a query statement at the current moment, acquiring, through the coordinator node, target data from the corresponding history file packets and/or the real-time file packet, and outputting the result after statistical aggregation.
Preferably, after acquiring real-time data and caching it in the real-time file packet and, when the caching duration reaches the preset period, distributing the cached data in the real-time file packet into the history file packets for storage, the method further comprises:
parsing, from the query statement, information on the history file packets that store the target data, the history file packet information including one or more of the version information, index information and key-value information of the history file packets;
setting cache packets in one-to-one correspondence with the relevant history file packets, the cache packets being either distributed on the corresponding history nodes or gathered on the coordinator node.
Preferably, acquiring real-time data and caching it in the real-time file packet and, when the caching duration reaches the preset period, distributing the cached data in the real-time file packet into the history file packets for storage comprises:
presetting a data distribution period T; at every interval T, uniformly distributing the cached data of the current period in the real-time file packet into the history file packets for storage, while clearing the cache of the real-time file packet to prepare for the cached data of the next period.
Preferably, receiving the query statement at the current moment, acquiring target data through the coordinator node from the corresponding history file packets and/or the real-time file packet, and outputting the result after statistical aggregation comprises:
dynamically selecting the placement mode of the cache packets based on the target data involved in the query statement and the distribution strategy of the cache packets;
when the dynamically selected placement mode is distribution on the corresponding history nodes, at the first query, acquiring the relevant scattered target data from the relevant history file packets through the corresponding cache packets, aggregating and merging the data on the history nodes to which the respective cache packets belong to obtain the historical target data, and then, through the coordinator node, aggregating and merging the historical target data with the cached data in the real-time file packet at the current moment to obtain the target data for output;
when the dynamically selected placement mode is gathering on the coordinator node, at the first query, acquiring the relevant scattered target data from the relevant history file packets through the corresponding cache packets, aggregating and merging the data on the coordinator node to obtain the historical target data, and then, through the coordinator node, aggregating and merging the historical target data with the cached data in the real-time file packet at the current moment to obtain the target data for output.
Optionally, the method further comprises:
when the dynamically selected placement mode is distribution on the corresponding history nodes and, after the first query, the same query operation is executed again within the same period, directly acquiring the cached historical target data from the history nodes to which the respective cache packets belong, and, through the coordinator node, aggregating and merging the historical target data with the cached data in the real-time file packet at the current moment to obtain the target data for output; and/or
when the dynamically selected placement mode is gathering on the coordinator node and, after the first query, the same query operation is executed again within the same period, directly acquiring the cached historical target data from the cache packets on the coordinator node, and, through the coordinator node, aggregating and merging the historical target data with the cached data in the real-time file packet at the current moment to obtain the target data for output.
Preferably, the distribution strategy of the cache packets is as follows:
when the number of history file packets storing the target data involved in the query statement is less than N, adopting the placement strategy of gathering the cache packets on the coordinator node, where N < 2*M; or
when the number of history file packets storing the target data involved in the query statement is less than M and the number of history nodes to which they belong is greater than 4*M, adopting the placement strategy of gathering the cache packets on the coordinator node;
otherwise, adopting the placement strategy of distributing the cache packets on the corresponding history nodes.
Illustratively, M is 20 and N is 30.
Compared with the prior art, the OLAP dynamic caching method provided by the present invention has the following advantages:
In the OLAP dynamic caching method provided by the present invention, the entire cluster can be classified into a coordinator node, history nodes and a real-time node, and then one real-time file packet and a plurality of history file packets are set according to the size of the cached data volume. The real-time file packet is used to acquire and cache real-time data, and when the caching duration reaches the preset period, the cached data in the real-time file packet is distributed into the history file packets for storage; when a history file packet is full, a new history file packet can be added to continue storage, or the existing history file packets can be reused cyclically. Thereafter, according to the query statement at the current moment, the coordinator node acquires target data from the corresponding history file packets and/or the real-time file packet and outputs the result after statistical aggregation.
It can be seen that, through the coordinated arrangement of the coordinator node, the history nodes and the real-time node, the present invention guarantees the real-time freshness and accuracy of query results while improving cluster query performance, and overcomes the prior-art problem that, as the query moment advances, the query result cannot be updated in real time and therefore becomes inaccurate.
Another aspect of the present invention provides an OLAP dynamic caching device applied in the OLAP dynamic caching method of the above technical solution, the device comprising:
an architecture setting unit for constructing a cache architecture, the cache architecture comprising a coordinator node, history nodes and a real-time node;
a file packet setting unit for setting a real-time file packet and a plurality of history file packets according to data volume, the real-time file packet residing on the real-time node and the plurality of history file packets being distributed across the history nodes;
a distribution storage unit for acquiring real-time data and caching it in the real-time file packet, and for distributing the cached data in the real-time file packet into the history file packets for storage when the caching duration reaches the preset period;
a query output unit for receiving the query statement at the current moment, acquiring target data through the coordinator node from the corresponding history file packets and/or the real-time file packet, and outputting the result after statistical aggregation.
Preferably, the device further comprises, between the distribution storage unit and the query output unit:
a cache packet setting unit for parsing, from the query statement, information on the history file packets that store the target data, the history file packet information including one or more of the version information, index information and key-value information of the history file packets; and for setting cache packets in one-to-one correspondence with the relevant history file packets, the cache packets being either distributed on the corresponding history nodes or gathered on the coordinator node.
Compared with the prior art, the beneficial effects of the OLAP dynamic caching device provided by the present invention are the same as those of the OLAP dynamic caching method provided by the above technical solution, and are not repeated here.
A third aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the steps of the above OLAP dynamic caching method are executed when the computer program is run by a processor.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the present invention are the same as those of the OLAP dynamic caching method provided by the above technical solution, and are not repeated here.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present invention and constitute a part of the present invention. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of the present invention. In the drawings:
Fig. 1 is a schematic flowchart of the OLAP dynamic caching method in Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the first-query process when the cache packets reside on the history nodes in Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of the repeat-query process when the cache packets reside on the history nodes in Embodiment 1 of the present invention;
Fig. 4 is a schematic diagram of the first-query process when the cache packets reside on the coordinator node in Embodiment 1 of the present invention;
Fig. 5 is a schematic diagram of the repeat-query process when the cache packets reside on the coordinator node in Embodiment 1 of the present invention.
Detailed description of the embodiments
In order to make the above objects, features and advantages of the present invention clearer and more comprehensible, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative labor fall within the protection scope of the present invention.
Embodiment 1
Referring to Fig. 1, this embodiment provides an OLAP dynamic caching method, comprising: constructing a cache architecture, the cache architecture comprising a coordinator node, history nodes and a real-time node; setting a real-time file packet and a plurality of history file packets according to data volume, the real-time file packet residing on the real-time node and the plurality of history file packets being distributed across the history nodes; acquiring real-time data and caching it in the real-time file packet, and, when the caching duration reaches a preset period, distributing the cached data in the real-time file packet into the history file packets for storage; receiving the query statement at the current moment, acquiring target data through the coordinator node from the corresponding history file packets and/or the real-time file packet, and outputting the result after statistical aggregation.
In specific implementation, under massive-data conditions the data volume within a unit time period can be very large, possibly reaching tens or even hundreds of gigabytes. If such data were managed as a single file packet, statistical aggregation would consume enormous computing resources and produce results slowly, the resource advantage of a distributed cluster and the multi-core advantage of a single machine could not be fully exploited, and fault tolerance and capacity expansion would be inconvenient. Therefore, the massive data within a period can be split according to a certain strategy into multiple history file packets or real-time file packets, each file packet being called a segment. The strategy includes a random algorithm, hash-modulo partitioning on a certain field, and the like. These segments can be distributed to different machines of the cluster, so that one statistical query can be distributed to multiple machines for execution, improving resource utilization and query performance. In addition, the above history file packets or real-time file packets can have multiple replicas, which provides high availability while also improving the concurrency of statistical computation.
In the OLAP dynamic caching method provided by this embodiment, the entire cluster can be classified into a coordinator node, history nodes and a real-time node, and then one real-time file packet and a plurality of history file packets are set according to the size of the cached data volume. The real-time file packet is used to acquire and cache real-time data, and when the caching duration reaches the preset period, the cached data in the real-time file packet is distributed into the history file packets for storage; when a history file packet is full, a new history file packet can be added to continue storage, or the existing history file packets can be reused cyclically. Thereafter, according to the query statement at the current moment, the coordinator node acquires target data from the corresponding history file packets and/or the real-time file packet and outputs the result after statistical aggregation.
It can be seen that, through the coordinated arrangement of the coordinator node, the history nodes and the real-time node, this embodiment guarantees the real-time freshness and accuracy of query results while improving cluster query performance, and overcomes the prior-art problem that, as the query moment advances, the query result cannot be updated in real time and therefore becomes inaccurate.
In order to improve query efficiency, the above embodiment further comprises, after acquiring real-time data and caching it in the real-time file packet and distributing the cached data in the real-time file packet into the history file packets for storage when the caching duration reaches the preset period:
parsing, from the query statement, information on the history file packets that store the target data, the history file packet information including one or more of the version information, index information and key-value information of the history file packets; and setting cache packets in one-to-one correspondence with the relevant history file packets, the cache packets being either distributed on the corresponding history nodes or gathered on the coordinator node.
In specific implementation, if the data of a history file packet on a history node is changed, the corresponding version information changes as well. At that point the stale statistical result in the cache packet can no longer be used; the statistics must be recomputed from the history file packet, and once complete the result is stored into the cache packet again in key-value (KV) form.
In the above embodiment, acquiring real-time data and caching it in the real-time file packet and distributing the cached data in the real-time file packet into the history file packets for storage when the caching duration reaches the preset period comprises:
presetting a data distribution period T; at every interval T, uniformly distributing the cached data of the current period in the real-time file packet into the history file packets for storage, while clearing the cache of the real-time file packet to prepare for the cached data of the next period.
The history file packets are used to store historical data, whose statistics can be computed and cached; the real-time file packet is used to store real-time data, and once the current time slot has passed, the cached data of the current period is committed into the history file packets and becomes historical data. The real-time node is used to receive newly generated real-time data. The coordinator node is responsible for receiving the query statement, distributing the query to the corresponding history nodes and real-time node, merging the statistical results returned by the history nodes and the real-time node, and, as needed, caching the statistical result of each history file packet.
In the above embodiment, receiving the query statement at the current moment, acquiring target data through the coordinator node from the corresponding history file packets and/or the real-time file packet, and outputting the result after statistical aggregation comprises:
dynamically selecting the placement mode of the cache packets based on the target data involved in the query statement and the distribution strategy of the cache packets;
when the dynamically selected placement mode is distribution on the corresponding history nodes, at the first query, acquiring the relevant scattered target data from the relevant history file packets through the corresponding cache packets, aggregating and merging the data on the history nodes to which the respective cache packets belong to obtain the historical target data, and then, through the coordinator node, aggregating and merging the historical target data with the cached data in the real-time file packet at the current moment to obtain the target data for output;
when the dynamically selected placement mode is gathering on the coordinator node, at the first query, acquiring the relevant scattered target data from the relevant history file packets through the corresponding cache packets, aggregating and merging the data on the coordinator node to obtain the historical target data, and then, through the coordinator node, aggregating and merging the historical target data with the cached data in the real-time file packet at the current moment to obtain the target data for output.
In specific implementation, referring to Fig. 2, if the cache packets reside on the history nodes, then at the first query the statistics are computed from the segment data and the statistical results are cached on the history nodes; each history node merges the results of its own segments, and finally the coordinator node merges the statistical results of the history nodes and the real-time node to obtain the target data for output. Referring to Fig. 4, if the cache packets reside on the coordinator node, the results of the multiple segments are not merged on the history nodes; instead, the result of each segment is transferred to the coordinator node, which caches them in memory and merges them, together with the statistical result of the real-time node, to obtain the target data for output.
Preferably, the above embodiment further comprises:
when the dynamically selected placement mode is distribution on the corresponding history nodes and, after the first query, the same query operation is executed again within the same period, directly acquiring the cached historical target data from the history nodes to which the respective cache packets belong, and, through the coordinator node, aggregating and merging the historical target data with the cached data in the real-time file packet at the current moment to obtain the target data for output. Referring to Fig. 3, in specific implementation, at the second query each history node obtains its statistical result directly from the cache, merges the results to form the historical target data and commits it to the coordinator node, and the coordinator node aggregates and merges the historical target data with the cached data in the real-time file packet at the current moment to obtain the target data for output.
When the dynamically selected placement mode is gathering on the coordinator node and, after the first query, the same query operation is executed again within the same period, the cached historical target data is directly obtained from the cache packets on the coordinator node, and the coordinator node aggregates and merges the historical target data with the cached data in the real-time file packet at the current moment to obtain the target data for output. Referring to Fig. 5, in specific implementation, at the second query, if the historical target data involved in the query is found to have been cached, the historical target data is obtained directly from the local cache, and the coordinator node aggregates and merges it with the cached data in the real-time file packet at the current moment to obtain the target data for output; in this case the coordinator node does not interact with the history nodes at all.
In addition, if the second query involves historical target data spanning a longer time range, the cached historical target data can be used directly, while the uncached historical target data is first computed separately from the history nodes; the coordinator node then aggregates and merges both with the cached data in the real-time file packet at the current moment to obtain the target data for output.
It should be noted that the query statement contains one or more of the version information, index information and key-value information involved in the target data, which are used to retrieve the required historical target data and the cached data in the real-time file packet.
It should be noted that whether the cache packets are placed on the history nodes or on the coordinator node must be dynamically selected according to the cluster size and the characteristics of the query. For example:
Scenario 1: a cluster has 5 history nodes, a query involves only 5 history file packets (segments), and the 5 segments are evenly distributed across the 5 history nodes. If the cache packets are placed on the history nodes, each query must first upload the cached results from the history nodes to the coordinator node, which then performs the statistical merge; assuming the data transfer takes 100 ms and merging each additional segment takes 50 ms, the total time is 50*(5-1)+100 = 300 ms. If the cache packets are placed on the coordinator node, no data transfer is needed and the statistical merge is performed directly on the coordinator node, so the total time is 50*(5-1) = 200 ms. In this scenario, placing the cache packets on the coordinator node clearly performs better.
Scenario 2: a cluster has 20 history nodes, a query involves 200 history file packets (segments), and the 200 segments are evenly distributed across the 20 history nodes, i.e. each history node holds 10 segments on average. If the cache packets are placed on the coordinator node, caching the results of 200 segments may occupy a large amount of memory, and merging the 200 cached segment results is also time-consuming: a single query takes (200-1)*50 ms = 9950 ms in total. If instead the cache packets are placed on the history nodes, which can execute concurrently, the total time is (10-1)*50+100+(20-1)*50 = 1500 ms. It can be seen that in this scenario, placing the cache on the history nodes clearly accelerates performance.
In summary, in specific implementation, whether the cache is placed on the coordinator node or on the history nodes can be determined dynamically according to the size of the current cluster, the number of segments participating in the query, the complexity of the query, and the number of history nodes participating. The strategy principles usually followed are as follows:
1. If the number of segments participating in the query is small, the cache is placed on the coordinator node. In this way, on a cache hit the statistics can be computed directly on the coordinator node and the real-time node, saving the network latency and overhead of transferring the cached data, and the pressure of merging results on the coordinator node is also small.
2. If the number of segments participating in the statistics is large, the number of history nodes participating in the query must then be considered specifically:
If the number of participating history nodes is small, that is, if each history node holds many of the segments participating in the query, placing the cache on the history nodes is very advantageous: the history nodes can merge concurrently, improving the concurrency of the query, and the time the coordinator node spends merging data is greatly reduced.
If the number of participating history nodes is large, that is, if each history node holds few of the segments participating in the query, caching and merging on the history nodes brings little benefit, because the amount of historical target data that must still be transferred to and merged on the coordinator node remains large, and the merge delay on the coordinator node is also large. In this case, placing the cache packets on the coordinator node or on the history nodes has little impact on performance, and the complexity of the query, the cached data volume of each segment and the memory cache pressure on the coordinator node must be considered: if the cached data volume of each segment is small, or the memory cache pressure on the coordinator node is small, the cache may be placed on the coordinator node, which reduces the network transfer overhead; otherwise, placing the cache packets on the history nodes should be considered, which reduces the merge overhead on the coordinator node.
Illustratively, the distribution strategy of the cache packets in the above embodiment is as follows:
when the number of history file packets storing the target data involved in the query statement is less than N, the placement strategy of gathering the cache packets on the coordinator node is adopted, where N < 2*M; or, when the number of history file packets storing the target data involved in the query statement is less than M and the number of history nodes to which they belong is greater than 4*M, the placement strategy of gathering the cache packets on the coordinator node is adopted; otherwise, the placement strategy of distributing the cache packets on the corresponding history nodes is adopted. For example, M is 20 and N is 30.
In specific implementation, the above configuration can be adjusted dynamically online according to the actual operation effect. Analysis of practice results shows that after the above dynamic cache packet distribution strategy was applied, the query latency of the entire cluster decreased by 30% on average and the concurrency improved by 20%, reaching the expected effect of dynamic caching.
Embodiment 2
This embodiment provides an OLAP dynamic caching device, comprising:
an architecture setting unit for constructing a cache architecture, the cache architecture comprising a coordinator node, history nodes and a real-time node;
a file packet setting unit for setting a real-time file packet and a plurality of history file packets according to data volume, the real-time file packet residing on the real-time node and the plurality of history file packets being distributed across the history nodes;
a distribution storage unit for acquiring real-time data and caching it in the real-time file packet, and for distributing the cached data in the real-time file packet into the history file packets for storage when the caching duration reaches the preset period;
a query output unit for receiving the query statement at the current moment, acquiring target data through the coordinator node from the corresponding history file packets and/or the real-time file packet, and outputting the result after statistical aggregation.
Preferably, the device further comprises, between the distribution storage unit and the query output unit:
a cache packet setting unit for parsing, from the query statement, information on the history file packets that store the target data, the history file packet information including one or more of the version information, index information and key-value information of the history file packets; and for setting cache packets in one-to-one correspondence with the relevant history file packets, the cache packets being either distributed on the corresponding history nodes or gathered on the coordinator node.
Compared with the prior art, the beneficial effects of the OLAP dynamic caching device provided by this embodiment of the present invention are the same as those of the OLAP dynamic caching method provided by Embodiment 1, and are not repeated here.
Embodiment 3
This embodiment provides a computer-readable storage medium on which a computer program is stored, wherein the steps of the above OLAP dynamic caching method are executed when the computer program is run by a processor.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by this embodiment are the same as those of the OLAP dynamic caching method provided by the above technical solution, and are not repeated here.
A person skilled in the art will appreciate that all or part of the steps for implementing the above method of the invention can be completed by instructing the relevant hardware through a program; the program can be stored in a computer-readable storage medium and, when executed, includes each step of the method of the above embodiments. The storage medium may be a ROM/RAM, a magnetic disk, an optical disc, a memory card, or the like.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that can be easily conceived by a person skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.