CN109947796A - Caching method for query intermediate result set of distributed database system
- Publication number: CN109947796A (application CN201910166410.6A)
- Authority: CN (China)
- Prior art keywords: data, caching, intermediate result, cache, result set
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a caching method for the query intermediate result sets of a distributed database system, comprising the following steps: recording the intermediate result sets returned by subquery tasks and establishing a storage model for the cached intermediate result data sets; establishing a mechanism for recognizing the query range of a query statement; and handling timeout expiry of the intermediate result sets. With this intermediate result set cache and recognition mechanism, the distributed database can answer qualifying subqueries without network interaction, which improves the efficiency of distributed queries.
Description
Technical field
The invention belongs to the field of computer databases, and specifically relates to a method for caching and using the intermediate result sets generated by query statements in a distributed database system.
Background art
Distributed databases, an important research topic in the database field, emerged during the past decade of rapid development in computer technology. The spread of the Internet and mobile applications has exposed data services of all kinds to ever-growing data volumes and access request pressure. As distributed databases have come into wide use, improving the information processing capacity of data services through distributed databases has become a common solution among data service providers.
Distributed databases grew out of mature centralized database technology. The core idea is that a database cluster provides data services externally as a single whole while storing data in a dispersed manner internally. Techniques such as data redundancy, data sharding, and replica synchronization provide data reliability, and techniques such as read-write separation improve the execution efficiency of data operations. For management, a distributed database generally still uses a single node or a small number of nodes as management nodes that perform query parsing, statement rewriting, and result merging, and that direct multiple sub-database nodes in order to provide service externally.
At present, because most distributed databases rigidly adhere to data consistency requirements, the query results submitted by the child node databases are not cached at all, so a single distributed query inherently spends a large amount of time on network interaction. However, some practical applications, such as those in the medical field, do not place high demands on data consistency, and the data queried within a period of time tends to be correlated or nearly homogeneous. How to improve the query efficiency of a distributed database system so that it delivers high-performance service is therefore a problem that needs to be studied and solved.
Summary of the invention
To address the above shortcomings of traditional distributed databases, the invention proposes a method for caching and using the query intermediate results of a distributed database system. The method caches the subquery result sets generated by the multiple child nodes involved when the distributed database executes a query task. If the system executes a query task again within a subsequent period of time and the query range is smaller than, or the same as, the range of the cached results, the system can directly reuse the intermediate result set, reducing the network interaction resources consumed by repeated queries.
The main idea of the method is as follows. After a query statement is decomposed across the child nodes, the intermediate result set information generated by each child node's query clause is collected to build the cached intermediate result data set. A recognition mechanism that partitions the query range according to the statement executed by the database is established, so that when a new query task arrives the system can judge whether the cached intermediate result data set is usable. This reduces both the number of query tasks the database sends to the back-end child nodes and the network interaction resources consumed when executing query tasks.
The invention is realized through the following steps:
(1) Establish the storage model for the cached intermediate result data set
The purpose of this step is to establish a storage model for subquery results, used to cache the intermediate results that subqueries generate. The storage model is divided into two parts, a head cache and a result cache. The head cache records the database IP, database name, relation schema, column field names, and so on; the result cache records all the data items present in a single row of data. Generating these two parts of cached data caches the data information returned by each node.
(2) Establish the query range recognition mechanism
The purpose of query range recognition is to determine, through the recognition mechanism, how well an intermediate result matches a future query. To this end, a multiway tree representation is designed to judge the query range of a query statement (its structure is shown in Fig. 1). In this multiway tree, the path from the head node to any non-root node yields a complete where clause, and the data stored at a single node is the result of executing, on a sub-database node, the query whose where clause is formed by joining all the nodes on the path from the head node to that node. The multiway tree is built as follows: whenever a subquery intermediate result set is cached, the corresponding data node is added to the multiway tree. The information in that node comprises the search condition Key that labels the node, together with all the information of the query record corresponding to the query condition statement that ends at this node. A query record comprises the query target database IP, table name, query conditions, query target columns, and the expiry time of the record; the data structure is shown in Fig. 2. When a subsequent query statement is executed, the routing module first processes it to obtain the subquery tasks that the corresponding child nodes would execute, and each task is divided into three parts: query items, query table, and multiple query conditions. The multiway tree is then searched according to the query conditions. If the tree contains a query path generated by those conditions, the range of every query condition on the path is greater than or equal to the range restricted by the current subtask, and the tail node of the path holds a cache record, the statement is considered recognized and the cache set contains an intermediate result set cache usable by the statement.
(3) Handle expiry of the cached intermediate result data set
The intermediate result set cache built in step (1) has limited timeliness, and if it is kept too long the volume of data inevitably becomes huge. Left unhandled, this seriously slows the database and also fails to guarantee the validity of the data. To solve this problem, an expiry deadline is set on each record in the cached intermediate result set, and a timed traverser is set up that periodically traverses the expiry deadline attribute of the cached result set. When the current time is found to be later than a record's expiry time, the data set is deemed to have timed out and is deleted. The expiry deadline generally depends on the application field and on the operation of the database. Here the expiry duration is set through a configuration file: the configuration file contains an expiry-time attribute, which acts as a fixed value once it has been read.
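The expiry rule described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the configuration section name `cache`, the key `expire_seconds`, and the helper names `load_ttl`, `expiry_time`, and `is_expired` are all assumptions introduced for the example.

```python
import configparser

# Hypothetical configuration text; the section and key names are assumptions.
CONFIG_TEXT = """
[cache]
expire_seconds = 300
"""


def load_ttl(config_text: str) -> float:
    """Read the validity duration once; afterwards it acts as a fixed value."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getfloat("cache", "expire_seconds")


def expiry_time(cached_at: float, ttl: float) -> float:
    """Expiry time = time the record was cached + validity duration."""
    return cached_at + ttl


def is_expired(expire_at: float, now: float) -> bool:
    """A record has timed out once its expiry time lies before the current time."""
    return expire_at < now
```

A periodic task would call `is_expired` for every cached record and delete those that return `True`; how that task is scheduled is left open here.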
Compared with the prior art, the invention has the following clear advantages and beneficial effects:
The invention proposes a caching method for the intermediate result sets generated by distributed database query statements. It lowers the demand for network interaction and gives better database response speed for query tasks that do not require strict database consistency but involve large data volumes, such as queries over medical data.
Description of the drawings
Fig. 1 is an explanatory diagram of the recognition tree;
Fig. 2 is a class description diagram of a recognition tree node;
Fig. 3 is an explanatory diagram of the intermediate result cache;
Fig. 4 is an example of a recognition tree;
Fig. 5 is a flow chart of the method of the invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and a specific embodiment.
Step 1: establish the model of the cached intermediate result data set.
The intermediate result set caching method takes the content returned by the relational database MySQL as the standard and caches according to its structure. A result set returned by a relational database contains two parts: result header information and result column information.
The structures of the intermediate result cache header data set and the intermediate result cache content data set are shown in Fig. 3. The cache header data set is FieldCache in the figure; its attributes, from top to bottom, are the header record id, database IP address, database member name, database table name, array of original column names, complete original cache header, and array of cache content ids. The cache content data set is RowCache in the figure; its attributes, from top to bottom, are the row record id and the row data content.
Based on these two data entities, a storage mechanism for the cached data is established, and the data is stored in a database or in documents.
The results returned by a non-relational database also differ depending on the type of database. The main work in that case is therefore to identify the format of the returned content, convert the content into intermediate result set data conforming to the above model, and cache it.
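The two entities of Step 1 can be sketched as plain data classes. The class names FieldCache and RowCache come from Fig. 3 as described in the text; the field names, the in-memory `ResultStore`, and the `from_documents` converter for document-style (non-relational) results are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # simple id generator shared by header and row records


@dataclass
class RowCache:
    row_id: int        # row record id
    row_data: tuple    # row data content


@dataclass
class FieldCache:
    head_id: int                 # header record id
    db_ip: str                   # database IP address
    db_name: str                 # database member name
    table_name: str              # database table name
    column_names: list           # array of original column names
    raw_head: bytes              # complete original cache header
    row_ids: list = field(default_factory=list)  # array of cache content ids


class ResultStore:
    """In-memory stand-in for the database or document store named in the text."""

    def __init__(self):
        self.heads = {}
        self.rows = {}

    def put(self, db_ip, db_name, table_name, column_names, rows, raw_head=b""):
        head = FieldCache(next(_ids), db_ip, db_name, table_name,
                          list(column_names), raw_head)
        for data in rows:
            row = RowCache(next(_ids), tuple(data))
            self.rows[row.row_id] = row
            head.row_ids.append(row.row_id)
        self.heads[head.head_id] = head
        return head.head_id

    def get_rows(self, head_id):
        head = self.heads[head_id]
        return [dict(zip(head.column_names, self.rows[r].row_data))
                for r in head.row_ids]


def from_documents(docs):
    """Convert document-style results (a list of dicts) into columns plus rows."""
    columns = sorted({key for doc in docs for key in doc})
    rows = [tuple(doc.get(col) for col in columns) for doc in docs]
    return columns, rows
```

Keeping the header and the rows in separate records linked by ids mirrors the split between FieldCache and RowCache, so deleting a header can also delete exactly the rows that belong to it.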
Step 2: establish the query range recognition mechanism.
The recognition mechanism uses a multiway tree to determine whether a usable intermediate result set cache exists when a query task is executed. The content of a single multiway tree node is shown in Fig. 2.
Its contents, from top to bottom, are the read-write lock, the where clause stored at the current node (column, operator, value), operator, value, cached database IP, table name, cached attribute array, cache expiry time, and cache header id. After parsing, the SQL statement of a query task executed by a child node becomes one path of the following tree.
The generated multiway tree structure is shown in Fig. 4. The path marked in the figure is the one generated by the query statement "select name, sex from user where id > 1 and age > 13 and type = 4;".
When a new SQL statement is executed, the statement is divided into selection conditions, table name, and query attributes, and the selection conditions are sorted in text order. The selection conditions are then looked up in the tree. When all of the conditions are identical to the data stored on some path in the tree, or their range is smaller than the range marked in the tree, and the tail node exists, has the same database IP and table name as the query being executed, has a cached attribute range identical to or greater than the target attributes of the current query, and the cache has not expired, the current cache is usable. If a range inconsistency was found along the way, a secondary query is executed on the cached results and the output is assembled into a new intermediate result set packet; once the subsequent intermediate result sets have returned, the merge operation is executed and the result is returned to the client. Otherwise, the intermediate result set packet is generated directly and awaits subsequent processing.
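The lookup just described can be sketched as a small tree walk. This is only an illustration under stated assumptions: conditions are limited to numeric comparisons with the operators `>`, `>=`, `<`, `<=`, and `=`; the names `Cond`, `Node`, `covers`, `insert`, and `lookup` are hypothetical; locking is omitted; and the positional matching assumes the new query and the cached path sort their conditions into the same column order.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Cond:
    column: str
    op: str        # one of ">", ">=", "<", "<=", "="
    value: float


def covers(cached: Cond, new: Cond) -> bool:
    """True if every row matching `new` also matches `cached`."""
    if cached.column != new.column:
        return False
    c, n = cached.value, new.value
    if cached.op == "=":
        return new.op == "=" and n == c
    if cached.op in (">", ">="):
        if new.op not in (">", ">=", "="):
            return False
        # A strict cached bound excludes c, so an inclusive new bound must lie above c.
        strict = cached.op == ">" and new.op in (">=", "=")
        return n > c if strict else n >= c
    if cached.op in ("<", "<="):
        if new.op not in ("<", "<=", "="):
            return False
        strict = cached.op == "<" and new.op in ("<=", "=")
        return n < c if strict else n <= c
    return False


@dataclass
class Node:
    cond: Optional[Cond] = None                    # None for the head node
    children: list = field(default_factory=list)
    record: Optional[dict] = None                  # query record at a tail node


def ordered(conds):
    """Sort the selection conditions in text order."""
    return sorted(conds, key=lambda c: f"{c.column} {c.op} {c.value}")


def insert(root, conds, record):
    """Add the path for `conds` and attach the query record to its tail node."""
    node = root
    for cond in ordered(conds):
        nxt = next((ch for ch in node.children if ch.cond == cond), None)
        if nxt is None:
            nxt = Node(cond)
            node.children.append(nxt)
        node = nxt
    node.record = record


def _walk(node, conds, db_ip, table, columns, now, depth, mismatches):
    if depth == len(conds):
        r = node.record
        usable = (r is not None and r["db_ip"] == db_ip and r["table"] == table
                  and set(columns) <= set(r["columns"]) and r["expire_at"] > now)
        return (r, mismatches) if usable else None
    for child in node.children:
        if covers(child.cond, conds[depth]):
            hit = _walk(child, conds, db_ip, table, columns, now, depth + 1,
                        mismatches + (child.cond != conds[depth]))
            if hit is not None:
                return hit
    return None


def lookup(root, conds, db_ip, table, columns, now):
    """Return (record, number of narrower conditions) or None if no cache is usable."""
    return _walk(root, ordered(conds), db_ip, table, columns, now, 0, 0)
```

A non-zero mismatch count signals that the cached rows are a superset of what the new query needs, which is the case where the text calls for a secondary query on the cached results.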
Step 3: expiry of the cached intermediate result data set.
After the intermediate result set returned by a child node is obtained, its content is parsed and cached. The sum of the current time of the cache record and the record's validity duration is computed and taken as the expiry time, which is entered into the cache header data set. In addition, the system starts a timed task that cyclically traverses the cached results. When a cache header record whose expiry time is earlier than the current time is found in the data set, the current header record and the cache content data associated with it are deleted. Similarly, the recognition mechanism starts another timed task that traverses it and deletes expired data; the recognition mechanism can also delete a cache record whenever it encounters expired data, in order to free memory. Deletion in the recognition mechanism follows this principle: when a cache record is being deleted, if it is the only cache record of the current node and the current node has no child nodes, the current node is deleted directly and its parent node is then checked for a cache record. If the parent node has no cache record, the parent node is deleted as well, and the check for whether each parent node is empty is repeated upward.
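The two clean-up passes can be sketched as follows. This is a simplified illustration, not the patent's code: the dictionary layouts, `TreeNode`, `sweep_store`, `delete_record`, and `sweep_tree` are hypothetical names, and the sketch assumes that a parent node is removed only when it has neither cache records nor remaining children, which is one reading of the "empty" check in the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class TreeNode:
    key: str                                       # search condition labelling the node
    parent: Optional["TreeNode"] = None
    children: dict = field(default_factory=dict)
    records: list = field(default_factory=list)    # each: {"head_id", "expire_at"}

    def add_child(self, key):
        child = self.children.get(key)
        if child is None:
            child = TreeNode(key, parent=self)
            self.children[key] = child
        return child


def sweep_store(heads, rows, now):
    """Delete expired header records together with their associated row data."""
    expired = [hid for hid, head in heads.items() if head["expire_at"] < now]
    for hid in expired:
        for rid in heads[hid]["row_ids"]:
            rows.pop(rid, None)
        del heads[hid]
    return expired


def delete_record(node, record):
    """Remove one cache record, then prune empty nodes upward toward the head node."""
    node.records.remove(record)
    while node.parent is not None and not node.records and not node.children:
        parent = node.parent
        del parent.children[node.key]
        node = parent


def sweep_tree(node, now):
    """Post-order traversal that deletes expired records in the recognition tree."""
    for child in list(node.children.values()):
        sweep_tree(child, now)
    for record in [r for r in node.records if r["expire_at"] < now]:
        delete_record(node, record)
```

In a running system each sweep would be triggered by its own timed task; the scheduling is deliberately left out so the deletion rule itself stays visible.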
Step 4: realize subqueries without network interaction.
The flow of a subquery task without network interaction is shown in Fig. 5.
As shown in Fig. 5, before a subquery is executed, a parsing phase first obtains its target path in the recognition cache.
Whether the cache exists is judged from the path, and at the same time the number of conditions for which the subquery range is smaller than the range of the cached path is recorded. If the path exists and holds a cached intermediate result set for the database node and data table that the subquery targets, the cache is hit. If the detected number of range inconsistencies is greater than 0, the column names and row data are extracted in turn and a secondary query is executed to filter the result set. If the number of inconsistencies is 0, the intermediate result set packet is generated directly, and the results are merged once all subqueries have finished.
If the path does not exist, the cache is missed: the subquery is issued to the corresponding child node and the system waits for the child node to return its result packet. Once the returned result packet is received, its header, result row data, and other information are parsed in turn and entered into the intermediate result set cache together with the expiry time. The path of this intermediate result set cache is then added to the recognition cache according to the subquery statement, and the cache expiry time is recorded. Finally, the result merge operation is executed after all subqueries have finished.
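The hit and miss branches of this flow can be sketched with a deliberately reduced cache. The assumptions are significant: a plain dictionary keyed by database IP, table, and filter column stands in for the recognition tree; every subquery has the single form `column > lower`; the cached rows are assumed to contain the filter column so that the secondary query can run locally; and `find_cache`, `run_subquery`, and the `send_to_node` callback are hypothetical names.

```python
def find_cache(cache, sub, now):
    """Return (entry, mismatches) when a usable cached result exists, else None."""
    entry = cache.get((sub["db_ip"], sub["table"], sub["column"]))
    if entry is None or entry["expire_at"] <= now:
        return None
    if entry["lower"] > sub["lower"]:
        return None  # the cached range is narrower than the requested range
    if not set(sub["columns"]) <= set(entry["columns"]):
        return None  # a requested column was never cached
    return entry, int(entry["lower"] != sub["lower"])


def run_subquery(cache, sub, send_to_node, now, ttl):
    """Answer one subquery of the form `column > lower`, preferring the cache."""
    hit = find_cache(cache, sub, now)
    if hit is None:
        # Cache miss: issue the subquery to the child node and cache its result packet.
        packet = send_to_node(sub)
        rows = [dict(zip(packet["columns"], row)) for row in packet["rows"]]
        cache[(sub["db_ip"], sub["table"], sub["column"])] = {
            "lower": sub["lower"],
            "columns": packet["columns"],
            "rows": rows,
            "expire_at": now + ttl,
        }
        source = "miss"
    else:
        entry, mismatches = hit
        rows = entry["rows"]
        if mismatches > 0:
            # Secondary query: filter the cached superset down to the requested range.
            rows = [r for r in rows if r[sub["column"]] > sub["lower"]]
        source = "hit"
    # Project onto the requested columns to form the intermediate result set packet.
    return [{c: r[c] for c in sub["columns"]} for r in rows], source
```

The caller would collect the packets from all subqueries and merge them once every subquery has finished; only the miss branch involves a network round trip.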
Claims (1)
1. A caching method for the query intermediate result sets of a distributed database system, characterized by comprising the following steps:
Step 1: establish the model of the cached intermediate result data set;
The intermediate result set caching method takes the content returned by the relational database MySQL as the standard and caches according to its structure; a result set returned by a relational database contains two parts: result header information and result column information;
The intermediate result cache header data set structure comprises the header record id, database IP address, database member name, database table name, array of original column names, complete original cache header, and array of cache content ids; the intermediate result cache content data set comprises the row record id and the row data content;
Based on these two data entities, a storage mechanism for the cached data is established, and the data is stored in a database or in documents;
The results returned by a non-relational database also differ depending on the type of database; the operation in that case is therefore to identify the format of the returned content, convert the content into intermediate result set data conforming to the above model, and cache it;
Step 2: establish the query range recognition mechanism;
The recognition mechanism uses a multiway tree to determine whether a usable intermediate result set cache exists when a query task is executed; a single multiway tree node comprises a read-write lock for high concurrency, the where clause stored at the current node, operator, value, cached database IP, table name, cached attribute array, cache expiry time, and cache header id; after parsing, the SQL statement of a query task executed by a child node becomes one path of the tree;
When a new SQL statement is executed, the statement is divided into selection conditions, table name, and query attributes, and the selection conditions are sorted in text order; the selection conditions are then looked up in the tree; when all of the conditions are identical to the data stored on some path in the tree, or their range is smaller than the range marked in the tree, and the tail node exists, has the same database IP and table name as the query being executed, has a cached attribute range identical to or greater than the target attributes of the current query, and the cache has not expired, the current cache is usable; if a range inconsistency was found along the way, a secondary query is executed on the cached results and the output is assembled into a new intermediate result set packet, and once the subsequent intermediate result sets have returned, the merge operation is executed and the result is returned to the client; otherwise, the intermediate result set packet is generated directly and awaits subsequent processing;
Step 3: set the expiry handling of the cached intermediate result data set;
After the intermediate result set returned by a child node is obtained, its content is parsed and cached; the sum of the current time of the cache record and the record's validity duration is computed and taken as the expiry time, which is entered into the cache header data set; in addition, the system starts a timed task that cyclically traverses the cached results; when a cache header record whose expiry time is earlier than the current time is found in the data set, the current header record and the cache content data associated with it are deleted; similarly, the recognition mechanism starts another timed task that traverses it and deletes expired data, and the recognition mechanism can also delete a cache record whenever it encounters expired data, in order to free memory; deletion in the recognition mechanism follows this principle: when a cache record is being deleted, if it is the only cache record of the current node and the current node has no child nodes, the current node is deleted directly and its parent node is then checked for a cache record; if the parent node has no cache record, the parent node is deleted as well, and the check for whether each parent node is empty is repeated upward;
Step 4: realize subqueries without network interaction;
Before a subquery is executed, a parsing phase first obtains its target path in the recognition cache;
Whether the cache exists is judged from the path, and at the same time the number of conditions for which the subquery range is smaller than the range of the cached path is recorded; if the path exists and holds a cached intermediate result set for the database node and data table that the subquery targets, the cache is hit; if the detected number of range inconsistencies is greater than 0, the column names and row data are extracted in turn and a secondary query is executed to filter the result set; if the number of inconsistencies is 0, the intermediate result set packet is generated directly, and the results are merged once all subqueries have finished;
If the path does not exist, the cache is missed: the subquery is issued to the corresponding child node and the system waits for the child node to return its result packet; once the returned result packet is received, its header and result row data are parsed in turn and entered into the intermediate result set cache together with the expiry time; the path of this intermediate result set cache is then added to the recognition cache according to the subquery statement, and the cache expiry time is recorded; finally, the result merge operation is executed after all subqueries have finished.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910166410.6A CN109947796B (en) | 2019-04-12 | 2019-04-12 | Caching method for query intermediate result set of distributed database system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109947796A true CN109947796A (en) | 2019-06-28 |
CN109947796B CN109947796B (en) | 2021-04-30 |
Family
ID=67008343
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910166410.6A Active CN109947796B (en) | 2019-04-12 | 2019-04-12 | Caching method for query intermediate result set of distributed database system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109947796B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102163195A (en) * | 2010-02-22 | 2011-08-24 | 北京东方通科技股份有限公司 | Query optimization method based on unified view of distributed heterogeneous database |
CN102521406A (en) * | 2011-12-26 | 2012-06-27 | 中国科学院计算技术研究所 | Distributed query method and system for complex task of querying massive structured data |
CN105912666A (en) * | 2016-04-12 | 2016-08-31 | 中国科学院软件研究所 | Method for high-performance storage and inquiry of hybrid structure data aiming at cloud platform |
US20160267132A1 (en) * | 2013-12-17 | 2016-09-15 | Hewlett-Packard Enterprise Development LP | Abstraction layer between a database query engine and a distributed file system |
CN106682147A (en) * | 2016-12-22 | 2017-05-17 | 北京锐安科技有限公司 | Mass data based query method and device |
CN108108456A (en) * | 2017-12-28 | 2018-06-01 | 重庆邮电大学 | A kind of information resources distributed enquiring method based on metadata |
Non-Patent Citations (2)
Title |
---|
SIMONE: "An Elastic Multi-Core Allocation Mechanism for Database Systems", 《IEEE》 * |
亓开元: "支持高并发数据流处理的MapReduce中间结果缓存", 《计算机研究与发展》 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110543494A (en) * | 2019-08-19 | 2019-12-06 | 湖南麟淇网络科技股份有限公司 | Method for constructing reachable graph based on cache table |
CN110543494B (en) * | 2019-08-19 | 2023-03-24 | 湖南麟淇网络科技股份有限公司 | Method for constructing reachable graph based on cache table |
CN112380256A (en) * | 2020-11-24 | 2021-02-19 | 广东机场白云信息科技有限公司 | Method for accessing energy system data, database and computer readable storage medium |
CN112380256B (en) * | 2020-11-24 | 2023-10-13 | 广东机场白云信息科技有限公司 | Method for accessing data of energy system, database and computer readable storage medium |
CN112905592A (en) * | 2021-02-08 | 2021-06-04 | 中国工商银行股份有限公司 | Data query method, system and server |
CN113420033A (en) * | 2021-08-17 | 2021-09-21 | 蚂蚁金服(杭州)网络技术有限公司 | Table data query method, table data query device and system for distributed database |
WO2023020236A1 (en) * | 2021-08-17 | 2023-02-23 | 北京奥星贝斯科技有限公司 | Table data query of distributed database |
CN113515549A (en) * | 2021-09-14 | 2021-10-19 | 江西科技学院 | Financial data query method and device and readable storage medium |
CN114840562A (en) * | 2022-07-04 | 2022-08-02 | 深圳市茗格科技有限公司 | Distributed caching method and device for business data, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109947796B (en) | 2021-04-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||