CN109947796A - Caching method for query intermediate result sets of a distributed database system - Google Patents

Caching method for query intermediate result sets of a distributed database system

Info

Publication number
CN109947796A
Authority
CN
China
Prior art keywords
data
caching
intermediate result
cache
result set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910166410.6A
Other languages
Chinese (zh)
Other versions
CN109947796B (en)
Inventor
杜金莲
陈子昂
金雪云
苏航
李童
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN201910166410.6A
Publication of CN109947796A
Application granted
Publication of CN109947796B
Active
Anticipated expiration

Abstract

The present invention discloses a caching method for the intermediate result sets produced by queries in a distributed database system. The method comprises the following steps: recording the intermediate result sets returned by subquery tasks and establishing a storage model for the cached intermediate result data; establishing a mechanism for identifying the query range of a query statement; and expiring cached intermediate result sets after a timeout. With this caching and identification mechanism, a distributed database can answer subqueries that satisfy the matching conditions without any network interaction, which improves the efficiency of distributed queries.

Description

Caching method for query intermediate result sets of a distributed database system
Technical field
The invention belongs to the field of computer databases, and specifically relates to a method for caching and using the intermediate result sets generated by query statements in a distributed database system.
Background
Distributed databases, an important research topic in the database field, rose to prominence during the last decade of rapid development in computer technology. The spread of the Internet and of mobile applications has exposed data services of all kinds to ever larger data volumes and access loads. As distributed databases have become widely used, raising the information processing capacity of a data service by means of a distributed database has become a common solution among data service providers.
Distributed databases grew out of mature centralized database technology. The core idea is that a database cluster provides data services to the outside as a single whole while the data is stored in a dispersed fashion internally. Data reliability is achieved through techniques such as data redundancy, data sharding, and replica synchronization, and the execution efficiency of data operations is improved through techniques such as read/write splitting. In terms of management, a distributed database generally still uses a single node or a small number of nodes as management nodes, which parse queries, rewrite statements, and merge results, and which direct multiple sub-database nodes in order to serve external requests.
At present, most distributed databases adhere rigidly to data consistency requirements and therefore do not cache the query results submitted by child-node databases at all, so a single distributed query inherently spends a large amount of time on network interaction. In practice, however, some applications, for example in the medical field, do not place high demands on data consistency, and the data they query within a given period is often correlated or nearly identical. How to improve the query efficiency of a distributed database system so that it can deliver high-performance service is therefore a problem that needs to be studied and solved.
Summary of the invention
To address the above shortcomings of traditional distributed databases, the invention proposes a method for caching and using the intermediate result sets of queries in a distributed database system. The method caches the subquery result sets generated by the child nodes involved when the distributed database executes a query task. If the system executes another query task within a subsequent period and the query range is narrower than, or coincides with, the range of the cached results, the system can reuse the intermediate result sets directly, which reduces the network interaction resources consumed by repeated queries.
The main idea of the method is as follows. After a query statement has been decomposed into clauses for the individual child nodes, the intermediate result set generated by each child node's query clause is collected and an intermediate result cache data set is built. An identification mechanism is established that divides query ranges according to the statements executed by the database; when a new query task arrives, it determines whether a cached intermediate result set can be used. This reduces both the number of query tasks the database sends to the back-end child nodes and the network interaction resources consumed when executing query tasks.
The invention is implemented through the following steps:
(1) Establish the storage model for the intermediate result cache data set
The purpose of this step is to establish a storage model for subquery results so that the intermediate results generated by subqueries can be cached. The storage model is divided into two parts: a head cache and a result cache. The head cache records the IP address of the database where the data resides, the database name, the relation schema, the column field names, and so on. The result cache records all the data items present in a single row of data. The data returned by each node is cached by generating these two kinds of cache data.
(2) Establish the query range identification mechanism
The purpose of query range identification is to determine how well an intermediate result matches a future query. To this end, a multiway tree representation is designed (its structure is shown in Fig. 1) to judge the query range of a query statement. In this multiway tree, the path from the root node to any non-root node yields a complete where clause, and the data stored in a single node corresponds to the query executed on a sub-database node with the where clause formed by joining all nodes on the path from the root to that node. The multiway tree is built incrementally: whenever a subquery intermediate result set is cached, the corresponding data node is added to the tree. The information in a node includes the node's retrieval key and all the information of the query records corresponding to the query condition statement that ends at that node. A query record contains the IP address of the target database, the table name, the query conditions, the target columns, and the expiry time of the record; its data structure is shown in Fig. 2. When a subsequent query statement is executed, the routing module first parses it to obtain the subquery task to be executed on each child node and splits the task into three parts: the query items, the query table, and the query conditions. The multiway tree is then searched according to the query conditions. If the tree contains a path generated by these conditions, every condition range on that path is greater than or equal to the range defined by the current subtask, and the tail node of the path holds a cache record, the statement is considered identified, meaning the cache contains an intermediate result set that the statement can use.
(3) processing of the data cached collection failure of intermediate result
In step (1), intermediate result set cache is that have certain timeliness, and overlong time data volume must be huge, If untreated can seriously drag the slow database speed of service, while cannot guarantee that the validity of data.It is asked to solve this Topic concentrates the failure time limit of setting record in the intermediate result of caching, and timing walker is arranged, by time to buffered results It concentrates failure time limit attribute to be traversed, assert that the data set is super when time attribute occur in buffered results and being greater than current time When fail, delete operation is executed to it.The failure time limit is generally related with the operation of field and database.Out-of-service time is logical here Crossing inside the configuration file of configuration file realization has failure time attribute, this time is equivalent to one wherein and determines after reading Value.
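The expiry handling of step (3) can be illustrated with a minimal Python sketch. It is not the patented implementation: the configuration section `cache`, the key `ttl_seconds`, and the functions `load_ttl` and `sweep` are hypothetical names chosen for this example, and the cache is reduced to a plain dictionary.

```python
import configparser

# Hypothetical configuration text; the section and key names are assumptions.
CONFIG_TEXT = """
[cache]
ttl_seconds = 300
"""

def load_ttl(config_text):
    """Read the validity period once; afterwards it is treated as a fixed value."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getfloat("cache", "ttl_seconds")

def sweep(cache, now):
    """Delete every record whose expiry time is earlier than `now`."""
    expired = [key for key, record in cache.items() if record["expires_at"] < now]
    for key in expired:
        del cache[key]
    return expired

ttl = load_ttl(CONFIG_TEXT)
start = 1_000.0  # fixed timestamp so the example is deterministic
cache = {
    "old": {"expires_at": start - 1.0},
    "fresh": {"expires_at": start + ttl},
}
removed = sweep(cache, now=start)
```

In a running system `sweep` would be called by a timer with the real current time, for example `time.time()`; the fixed timestamp here only keeps the example reproducible.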
Compared with the prior art, the invention has the following clear advantages and beneficial effects:
The invention proposes a method for caching the intermediate result sets generated by distributed database query statements. It gives better database response times for query tasks that call for little network interaction, do not place high demands on database consistency, and involve large data volumes, such as queries over medical data.
Brief description of the drawings
Fig. 1: explanatory diagram of the identification tree;
Fig. 2: class diagram of an identification tree node;
Fig. 3: explanatory diagram of the intermediate result cache;
Fig. 4: example of an identification tree;
Fig. 5: flow chart of the method of the invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and specific embodiments.
Step 1: establish the intermediate result cache data set model.
The intermediate result set caching method takes the content returned by the relational database MySQL as its standard and caches data according to that structure. A result set returned by a relational database contains two parts: the result header information and the result column information.
The structures of the intermediate result cache head data set and of the intermediate result cache content data set are shown in Fig. 3. The cache head data set is FieldCache in the figure; from top to bottom its attributes are the head record id, the database IP address, the database name, the table name, the array of original column names, the complete original cache head, and the array of cache content ids. The cache content data set is RowCache in the figure; from top to bottom its attributes are the row record id and the row data content.
Based on these two data entities, a cache storage mechanism is established and the data is stored in a database or in documents.
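As an illustration of the two entities, the following Python sketch models FieldCache and RowCache as dataclasses. The class names come from Fig. 3 as described in the text; the attribute names, the `CacheStore` class, and its in-memory dictionaries are assumptions made for this example and stand in for the database or document store.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

@dataclass
class RowCache:
    row_id: int                  # row record id
    row_data: Tuple[Any, ...]    # row data content

@dataclass
class FieldCache:
    head_id: int                 # head record id
    db_ip: str                   # database IP address
    db_name: str                 # database name
    table_name: str              # table name
    column_names: List[str]      # original column names
    raw_header: bytes            # complete original cache head
    row_ids: List[int] = field(default_factory=list)  # cache content ids

class CacheStore:
    """In-memory stand-in for the database or document store (an assumption)."""

    def __init__(self):
        self.heads: Dict[int, FieldCache] = {}
        self.rows: Dict[int, RowCache] = {}
        self._next_id = 1

    def _new_id(self):
        value = self._next_id
        self._next_id += 1
        return value

    def put(self, db_ip, db_name, table_name, column_names, rows, raw_header=b""):
        """Store one returned result set as a head record plus one record per row."""
        head = FieldCache(self._new_id(), db_ip, db_name, table_name,
                          list(column_names), raw_header)
        for values in rows:
            row = RowCache(self._new_id(), tuple(values))
            self.rows[row.row_id] = row
            head.row_ids.append(row.row_id)
        self.heads[head.head_id] = head
        return head

    def rows_of(self, head_id):
        """Return the cached row data that belongs to one head record."""
        return [self.rows[row_id].row_data for row_id in self.heads[head_id].row_ids]

store = CacheStore()
head = store.put("10.0.0.1", "clinic", "user", ["name", "sex"],
                 [("Ann", "F"), ("Bob", "M")])
```

Keeping the rows in a separate collection and linking them through the id array mirrors the head/content split in the text, so a head record can be inspected without loading its rows.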
The results returned by non-relational databases differ according to the type of database. The main task for them is therefore to identify the format of the returned content, convert that content into intermediate result set data conforming to the above model, and cache it.
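The text does not say which non-relational formats are handled or how. As one hedged example, the sketch below assumes a document store that returns a list of key-value documents and flattens them into column names plus rows; the function name `documents_to_result_set` and the choice to fill missing keys with `None` are assumptions.

```python
def documents_to_result_set(documents):
    """Convert a list of dict-like documents into (column_names, rows).

    Columns are collected in first-seen order; a key missing from a
    document becomes None in that document's row.
    """
    column_names = []
    for document in documents:
        for key in document:
            if key not in column_names:
                column_names.append(key)
    rows = [tuple(document.get(name) for name in column_names)
            for document in documents]
    return column_names, rows

docs = [{"name": "Ann", "sex": "F"}, {"name": "Bob", "age": 40}]
columns, rows = documents_to_result_set(docs)
```

The resulting column list and row tuples have the same shape as a relational result set, so they can be cached with the head/content model described above.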
Step 2: establish the query range identification mechanism.
The identification mechanism uses a multiway tree to determine, when a query task is executed, whether a usable intermediate result set cache exists. The content of a single tree node is shown in Fig. 2.
From top to bottom, the contents are a read-write lock, the where clause stored in the current node (column, operator, value), the operator, the value, the IP address of the cached database, the table name, the array of cached attributes, the cache expiry time, and the cache head id. After parsing, the SQL statement of a query task executed on a child node is turned into a path in the tree.
The resulting multiway tree structure is shown in Fig. 4. The path shown in the figure is the one generated by the query statement "select name, sex from user where id > 1 and age > 13 and type = 4;".
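How a parsed statement becomes a path can be sketched as follows. This is an illustration under assumptions, not the structure of Fig. 2 or Fig. 4: the names `TreeNode`, `QueryRecord`, `condition_key`, and `add_path` are hypothetical, a plain `threading.Lock` stands in for the read-write lock, and conditions are sorted by their text form so that the same set of conditions always maps to the same path.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Condition = Tuple[str, str, int]  # (column, operator, value)

@dataclass
class QueryRecord:
    db_ip: str
    table: str
    columns: List[str]
    expires_at: float
    head_id: int

@dataclass
class TreeNode:
    condition: Optional[Condition] = None  # None for the root
    children: Dict[str, "TreeNode"] = field(default_factory=dict)
    records: List[QueryRecord] = field(default_factory=list)
    # A plain mutex stands in for the read-write lock named in the text.
    lock: threading.Lock = field(default_factory=threading.Lock)

def condition_key(condition):
    column, operator, value = condition
    return f"{column} {operator} {value}"

def add_path(root, conditions, record):
    """Walk or create one node per condition, then attach the record to the tail."""
    node = root
    for condition in sorted(conditions, key=condition_key):
        key = condition_key(condition)
        with node.lock:
            child = node.children.get(key)
            if child is None:
                child = TreeNode(condition=condition)
                node.children[key] = child
        node = child
    with node.lock:
        node.records.append(record)
    return node

# Path for: select name, sex from user where id > 1 and age > 13 and type = 4;
root = TreeNode()
conditions = [("id", ">", 1), ("age", ">", 13), ("type", "=", 4)]
record = QueryRecord("10.0.0.1", "user", ["name", "sex"], expires_at=2_000.0, head_id=1)
tail = add_path(root, conditions, record)

path_keys = []
node = root
while node.children:
    key, node = next(iter(node.children.items()))
    path_keys.append(key)
```

After the call, the tree holds a single path whose tail node carries the cache record for the example query.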
When a new SQL statement is executed, it is split into selection conditions, table name, and query attributes, and the selection conditions are sorted in text order. The selection conditions are then looked up in the tree. The current cache is usable if all of the following hold: every condition is identical to, or narrower in range than, the data stored on some path in the tree; the tail node of that path holds a record whose database IP and table name match those of the query; the cached attributes are identical to, or a superset of, the attributes targeted by the current query; and the cache has not expired. If any query range differs from the cached one, a secondary query is executed on the cached results and the output is assembled into a new intermediate result set packet; once the remaining intermediate result sets have returned, the merge is performed and the result is returned to the client. Otherwise the intermediate result set packet is generated directly and awaits subsequent processing.
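The range comparison at the heart of this lookup can be sketched by turning each comparison into an interval and testing containment. The sketch covers only the range part of the rule, not the IP, table, attribute, or expiry checks. The functions `to_interval`, `covers`, and `match_path` are hypothetical, and the sketch assumes at most one condition per column with numeric values.

```python
import math

def to_interval(operator, value):
    """Return (low, low_closed, high, high_closed) for a single comparison."""
    if operator == ">":
        return (value, False, math.inf, False)
    if operator == ">=":
        return (value, True, math.inf, False)
    if operator == "<":
        return (-math.inf, False, value, False)
    if operator == "<=":
        return (-math.inf, False, value, True)
    if operator == "=":
        return (value, True, value, True)
    raise ValueError(f"unsupported operator: {operator}")

def covers(cached, query):
    """True if every value accepted by `query` is also accepted by `cached`."""
    (c_col, c_op, c_val), (q_col, q_op, q_val) = cached, query
    if c_col != q_col:
        return False
    c_low, c_low_closed, c_high, c_high_closed = to_interval(c_op, c_val)
    q_low, q_low_closed, q_high, q_high_closed = to_interval(q_op, q_val)
    low_ok = q_low > c_low or (q_low == c_low and (c_low_closed or not q_low_closed))
    high_ok = q_high < c_high or (q_high == c_high and (c_high_closed or not q_high_closed))
    return low_ok and high_ok

def match_path(cached_conditions, query_conditions):
    """Return how many query conditions differ from the cached path,
    or None if the cached path does not cover the query."""
    if len(cached_conditions) != len(query_conditions):
        return None
    differing = 0
    # Sorting pairs conditions by column, given one condition per column.
    for cached, query in zip(sorted(cached_conditions), sorted(query_conditions)):
        if not covers(cached, query):
            return None
        if cached != query:
            differing += 1
    return differing

cached_path = [("id", ">", 1), ("age", ">", 13), ("type", "=", 4)]
same = match_path(cached_path, [("id", ">", 1), ("age", ">", 13), ("type", "=", 4)])
narrower = match_path(cached_path, [("id", ">", 10), ("age", ">", 13), ("type", "=", 4)])
wider = match_path(cached_path, [("id", ">", 1), ("age", ">", 5), ("type", "=", 4)])
```

A result of 0 corresponds to direct reuse of the cached result set, a positive count to reuse followed by a secondary query, and `None` to a cache that cannot serve the statement.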
Step 3: expiry of the intermediate result cache data set.
After the intermediate result set returned by a child node is obtained, its content is parsed and cached. The expiry time is computed as the sum of the current time at which the record is cached and the record's validity period, and is recorded in the cache head data set. The system also starts a scheduled task that traverses the cached results in a loop. When it finds cache head data whose expiry time is earlier than the current time, it deletes that head data together with the cache content data associated with it. The identification mechanism likewise traverses and deletes expired data through a separate scheduled task, and it can additionally delete a cache record whenever it encounters expired data, in order to free memory. Deletion in the identification mechanism follows this principle: when a cache record is deleted, if it is the only cache record of the current node and the current node has no child nodes, the current node itself is deleted and its parent node is then checked for cache records. If the parent node has no cache record, the parent is deleted as well and the same emptiness check is repeated on its parent.
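The upward deletion rule can be sketched as a short loop. The `Node` class and `delete_record` function are hypothetical. The sketch also makes one assumption beyond the text: a parent is pruned only if it has no remaining children as well as no records, since removing a parent that still has children would detach their cached paths.

```python
class Node:
    """Minimal tree node for the pruning example; names are assumptions."""

    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
        self.children = {}
        self.records = []
        if parent is not None:
            parent.children[key] = self

def delete_record(node, record):
    """Remove `record` from `node`, then prune nodes left with no records and no children."""
    node.records.remove(record)
    while node.parent is not None and not node.records and not node.children:
        parent = node.parent
        del parent.children[node.key]
        node = parent

root = Node("root")
a = Node("age > 13", root)
b = Node("id > 1", a)
c = Node("type = 4", b)
sibling = Node("id > 100", a)
sibling.records.append("cache-2")
c.records.append("cache-1")

delete_record(c, "cache-1")
```

Deleting the only record on the tail node removes that node and its now-empty parent, while the node for `age > 13` survives because another cached path still passes through it.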
Step 4: execute subqueries without network interaction.
The flow of a subquery task without network interaction is shown in Fig. 5.
As shown in Fig. 5, before a subquery is executed it first passes through the parsing stage to obtain its target path in the identification cache.
Whether the cache exists is determined from the path, and at the same time the number of conditions whose subquery range is narrower than the range on the cached path is recorded. If the path exists and holds a cached intermediate result set for the database node and table targeted by the subquery, the cache is hit. If the number of differing query ranges is greater than 0, the column names and row data are extracted in turn and a secondary query is executed to filter the result set. If the number is 0, the intermediate result set packet is generated directly, and the results are merged after all subqueries have finished.
If the path does not exist, the subquery is sent to the corresponding child node and the system waits for the child node to return a result packet. Once the returned packet has been received, its header, row data, and other information are parsed in turn and recorded in the intermediate result set cache together with the expiry time. The cache path for this intermediate result set is then added to the identification cache according to the subquery statement, and the cache expiry time is recorded. The results are merged after all subqueries have finished.
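The hit and miss branches of this flow can be put together in a simplified sketch. It is a stand-in, not the flow of Fig. 5. A linear scan over a dictionary replaces the identification tree, only `>` conditions are compared, `fake_fetch` simulates the child node, and all names are hypothetical. It also assumes that the cached rows contain the columns referenced by the conditions, since otherwise a secondary query over the cache would not be possible.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

def filter_rows(columns, rows, conditions):
    """Secondary query: keep only the rows that satisfy every condition."""
    index = {name: position for position, name in enumerate(columns)}
    return [
        row for row in rows
        if all(OPS[op](row[index[col]], value) for col, op, value in conditions)
    ]

def narrower_count(cached, query):
    """Count strictly narrower '>' conditions; None means the cache cannot be used."""
    if any(op != ">" for _, op, _ in list(cached) + list(query)):
        return None
    cached_map = {col: value for col, _, value in cached}
    if {col for col, _, _ in query} != set(cached_map):
        return None
    if any(value < cached_map[col] for col, _, value in query):
        return None
    return sum(1 for col, _, value in query if value > cached_map[col])

def run_subquery(cache, node_ip, table, conditions, fetch, now, ttl):
    """Serve a subquery from the cache if possible, otherwise dispatch and cache it."""
    for entry in cache.get((node_ip, table), []):
        if entry["expires_at"] < now:
            continue  # expired entries are ignored here; a sweep would delete them
        count = narrower_count(entry["conditions"], conditions)
        if count is None:
            continue
        if count > 0:
            return filter_rows(entry["columns"], entry["rows"], conditions), "hit+filter"
        return list(entry["rows"]), "hit"
    # Cache miss: send the subquery to the child node and record the result.
    columns, rows = fetch(node_ip, table, conditions)
    cache.setdefault((node_ip, table), []).append({
        "conditions": list(conditions), "columns": columns,
        "rows": rows, "expires_at": now + ttl,
    })
    return rows, "miss"

TABLE = [(1, "Ann", 20), (2, "Bob", 30), (3, "Cy", 40)]
calls = []

def fake_fetch(node_ip, table, conditions):
    """Simulated child node; a real system would issue a network request here."""
    calls.append(conditions)
    columns = ["id", "name", "age"]
    return columns, filter_rows(columns, TABLE, conditions)

cache = {}
first = run_subquery(cache, "10.0.0.1", "user", [("age", ">", 15)], fake_fetch, now=0.0, ttl=60.0)
second = run_subquery(cache, "10.0.0.1", "user", [("age", ">", 25)], fake_fetch, now=1.0, ttl=60.0)
third = run_subquery(cache, "10.0.0.1", "user", [("age", ">", 5)], fake_fetch, now=2.0, ttl=60.0)
```

The second call is answered from the cache with a secondary filter and never reaches the simulated child node, while the third call asks for a wider range than the cached one and therefore has to be dispatched.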

Claims (1)

1. A caching method for query intermediate result sets of a distributed database system, characterized in that it comprises the following steps:
Step 1: establish the intermediate result cache data set model;
The intermediate result set caching method takes the content returned by the relational database MySQL as its standard and caches data according to that structure; a result set returned by a relational database contains two parts: the result header information and the result column information;
The intermediate result cache head data set comprises the head record id, the database IP address, the database name, the table name, the array of original column names, the complete original cache head, and the array of cache content ids; the intermediate result cache content data set comprises the row record id and the row data content;
Based on these two data entities, a cache storage mechanism is established and the data is stored in a database or in documents;
The results returned by non-relational databases differ according to the type of database; the operation for them is therefore to identify the format of the returned content, convert that content into intermediate result set data conforming to the above model, and cache it;
Step 2: establish the query range identification mechanism;
The identification mechanism uses a multiway tree to determine, when a query task is executed, whether a usable intermediate result set cache exists; a single multiway tree node contains a read-write lock for high concurrency, the where clause stored in the current node, the operator, the value, the IP address of the cached database, the table name, the array of cached attributes, the cache expiry time, and the cache head id; after parsing, the SQL statement of a query task executed on a child node is turned into a path in the tree;
When a new SQL statement is executed, it is split into selection conditions, table name, and query attributes, and the selection conditions are sorted in text order; the selection conditions are then looked up in the tree; the current cache is usable if every condition is identical to, or narrower in range than, the data stored on some path in the tree, the tail node of that path holds a record whose database IP and table name match those of the query, the cached attributes are identical to, or a superset of, the attributes targeted by the current query, and the cache has not expired; if any query range differs from the cached one, a secondary query is executed on the cached results and the output is assembled into a new intermediate result set packet, and once the remaining intermediate result sets have returned, the merge is performed and the result is returned to the client; otherwise the intermediate result set packet is generated directly and awaits subsequent processing;
Step 3: set the expiry mode of the intermediate result cache data set;
After the intermediate result set returned by a child node is obtained, its content is parsed and cached; the expiry time is computed as the sum of the current time at which the record is cached and the record's validity period, and is recorded in the cache head data set; the system also starts a scheduled task that traverses the cached results in a loop; when it finds cache head data whose expiry time is earlier than the current time, it deletes that head data together with the cache content data associated with it; the identification mechanism likewise traverses and deletes expired data through a separate scheduled task, and it can additionally delete a cache record whenever it encounters expired data, in order to free memory; deletion in the identification mechanism follows this principle: when a cache record is deleted, if it is the only cache record of the current node and the current node has no child nodes, the current node itself is deleted and its parent node is then checked for cache records; if the parent node has no cache record, the parent is deleted as well and the same emptiness check is repeated on its parent;
Step 4: execute subqueries without network interaction;
Before a subquery is executed, it first passes through the parsing stage to obtain its target path in the identification cache;
Whether the cache exists is determined from the path, and at the same time the number of conditions whose subquery range is narrower than the range on the cached path is recorded; if the path exists and holds a cached intermediate result set for the database node and table targeted by the subquery, the cache is hit; if the number of differing query ranges is greater than 0, the column names and row data are extracted in turn and a secondary query is executed to filter the result set; if the number is 0, the intermediate result set packet is generated directly, and the results are merged after all subqueries have finished;
If the path does not exist, the cache is missed, the subquery is sent to the corresponding child node, and the system waits for the child node to return a result packet; once the returned packet has been received, its header and row data are parsed in turn and recorded in the intermediate result set cache together with the expiry time; the cache path for this intermediate result set is then added to the identification cache according to the subquery statement, and the cache expiry time is recorded; the results are merged after all subqueries have finished.
CN201910166410.6A 2019-04-12 2019-04-12 Caching method for query intermediate result set of distributed database system Active CN109947796B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910166410.6A CN109947796B (en) 2019-04-12 2019-04-12 Caching method for query intermediate result set of distributed database system

Publications (2)

Publication Number Publication Date
CN109947796A true CN109947796A (en) 2019-06-28
CN109947796B CN109947796B (en) 2021-04-30

Family

ID=67008343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910166410.6A Active CN109947796B (en) 2019-04-12 2019-04-12 Caching method for query intermediate result set of distributed database system

Country Status (1)

Country Link
CN (1) CN109947796B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163195A (en) * 2010-02-22 2011-08-24 北京东方通科技股份有限公司 Query optimization method based on unified view of distributed heterogeneous database
CN102521406A (en) * 2011-12-26 2012-06-27 中国科学院计算技术研究所 Distributed query method and system for complex task of querying massive structured data
CN105912666A (en) * 2016-04-12 2016-08-31 中国科学院软件研究所 Method for high-performance storage and inquiry of hybrid structure data aiming at cloud platform
US20160267132A1 (en) * 2013-12-17 2016-09-15 Hewlett-Packard Enterprise Development LP Abstraction layer between a database query engine and a distributed file system
CN106682147A (en) * 2016-12-22 2017-05-17 北京锐安科技有限公司 Mass data based query method and device
CN108108456A (en) * 2017-12-28 2018-06-01 重庆邮电大学 A kind of information resources distributed enquiring method based on metadata

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SIMONE: "An Elastic Multi-Core Allocation Mechanism for Database Systems", 《IEEE》 *
亓开元: "支持高并发数据流处理的MapReduce中间结果缓存", 《计算机研究与发展》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110543494A (en) * 2019-08-19 2019-12-06 湖南麟淇网络科技股份有限公司 Method for constructing reachable graph based on cache table
CN110543494B (en) * 2019-08-19 2023-03-24 湖南麟淇网络科技股份有限公司 Method for constructing reachable graph based on cache table
CN112380256A (en) * 2020-11-24 2021-02-19 广东机场白云信息科技有限公司 Method for accessing energy system data, database and computer readable storage medium
CN112380256B (en) * 2020-11-24 2023-10-13 广东机场白云信息科技有限公司 Method for accessing data of energy system, database and computer readable storage medium
CN112905592A (en) * 2021-02-08 2021-06-04 中国工商银行股份有限公司 Data query method, system and server
CN113420033A (en) * 2021-08-17 2021-09-21 蚂蚁金服(杭州)网络技术有限公司 Table data query method, table data query device and system for distributed database
WO2023020236A1 (en) * 2021-08-17 2023-02-23 北京奥星贝斯科技有限公司 Table data query of distributed database
CN113515549A (en) * 2021-09-14 2021-10-19 江西科技学院 Financial data query method and device and readable storage medium
CN114840562A (en) * 2022-07-04 2022-08-02 深圳市茗格科技有限公司 Distributed caching method and device for business data, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109947796B (en) 2021-04-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant