CN109947796A

CN109947796A - Caching method for intermediate result sets of queries in a distributed database system

Info

Publication number: CN109947796A
Application number: CN201910166410.6A
Authority: CN
Inventors: 杜金莲; 陈子昂; 金雪云; 苏航; 李童
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2019-04-12
Filing date: 2019-04-12
Publication date: 2019-06-28
Anticipated expiration: 2039-04-12
Also published as: CN109947796B

Abstract

The present invention discloses a caching method for intermediate result sets of queries in a distributed database system, comprising the following steps: recording the intermediate result sets returned by subquery tasks and establishing a storage model for the cached intermediate-result data sets; establishing a mechanism for recognizing the query range of query statements; and invalidating intermediate result sets after a timeout. With this intermediate result set cache and recognition mechanism, the distributed database can answer qualifying subqueries without network interaction, which improves the efficiency of distributed queries.

Description

Caching method for intermediate result sets of queries in a distributed database system

Technical field

The invention belongs to the field of computer databases, and specifically relates to a method for caching and using the intermediate result sets generated by query statements in a distributed database system.

Background art

As an important research topic in the database field, distributed databases emerged and developed during the last decade of rapid growth in computer technology. The spread of the Internet and mobile applications confronts data services of all kinds with ever-growing data volumes and access-request pressure. With the widespread adoption of distributed databases, using them to increase the information-processing capacity of data services has become a common solution among data service providers.

Distributed databases are built on the mature technology of centralized databases. The core idea is that a database cluster provides data services externally as a single whole while storing data internally in a dispersed manner: data reliability is achieved through techniques such as data redundancy, data sharding, and replica synchronization, and the efficiency of data operations is improved through techniques such as read/write splitting. Architecturally, a distributed database generally still uses a single node or a small number of nodes as management nodes, which perform query parsing, statement rewriting, and result merging, and direct multiple child database nodes in order to provide services externally.

Currently, because most distributed databases rigidly guard data consistency, the query results submitted by child database nodes are not cached at all, so every distributed query inherently spends a large amount of time on network interaction. However, some practical applications, such as those in the medical field, have low requirements for data consistency, and the data they query within a period of time are correlated or nearly identical. How to improve the query efficiency of distributed database systems so that they can provide high-performance services is therefore a problem that needs to be researched and solved.

Summary of the invention

To address the above shortcomings of traditional distributed databases, the invention proposes a method for caching and using the intermediate result sets of queries in a distributed database system. The method caches the subquery result sets generated by the multiple child nodes involved when the distributed database executes a query task. If, within a subsequent period of time, the distributed database system executes a query task whose query range is contained in, or identical to, the range of a cached result, the system can reuse the intermediate result set directly, thereby reducing the network-interaction resources consumed by repeated queries.

The main idea of the method is as follows: after a query statement is decomposed and dispatched to the child nodes, the intermediate result set produced by each child node's query clause is collected to build the intermediate-result cache data set. A recognition mechanism that divides query ranges according to the statement executed by the database is established to decide, when a new query task arrives, whether the cached intermediate results can be used. This reduces both the number of query tasks the database sends to back-end child nodes and the network-interaction resources consumed when executing query tasks.

The invention is implemented through the following steps:

(1) Establishing the storage model for the intermediate-result cache data set

The purpose of this step is to build a storage model for subquery results so that the intermediate results generated by subqueries can be cached. The model is divided into two parts: a head cache and a result cache. The head cache records the local database IP, the database name, the relational schema, the column field names, and so on; the result cache records all data items present in each row of data. The data returned by each node are cached by generating these two parts.

(2) recognition mechanism of query context is established

Query-range recognition determines how well a cached intermediate result matches a future query. To judge the query range of a query statement, a multiway tree representation is designed (its structure is shown in Fig. 1). Every path from the root node to any non-root node of the multiway tree forms a complete where clause, and the data stored at a node is the result of executing, on a child database node, the where clause formed by joining all nodes on the path from the root to that node. The tree is built while subquery intermediate result sets are cached: when a result set is cached, the corresponding data node is added to the tree. A node contains its search condition Key and all information of the query record corresponding to the condition statement generated with that node as the tail node. A query record contains the target database IP, table name, query conditions, target columns, and expiration time of the query; its data structure is shown in Fig. 2. When a subsequent query statement is executed, the routing module first derives the subquery task to be executed on each child node, and the task is split into three parts: query columns, queried table, and query conditions. The multiway tree is then searched according to the query conditions. If the tree contains the query path generated by these conditions, the condition ranges on that path are all greater than or equal to the ranges restricted by the current subtask, and a cache record exists at the tail node of the path, the statement is considered recognized, and the cache set contains an intermediate result set cache that the statement can use.

(3) Handling invalidation of the intermediate-result cache data set

The intermediate result sets cached in step (1) are valid only for a limited time. If they are kept too long, the data volume inevitably becomes huge, which, if left unhandled, would severely slow down the database and could not guarantee the validity of the data. To solve this problem, an expiration time is set on each cached intermediate result set, and a timed walker periodically traverses the expiration attributes of the cached result sets; when the current time exceeds a cached result set's expiration time, the data set is deemed to have timed out and is deleted. The expiration period generally depends on the application domain and on how the database is operated. Here the expiration time is supplied through a configuration file containing an expiration-time attribute; once read, this value is treated as a fixed constant.
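A minimal Python sketch of the timed walker described above, assuming an in-memory dictionary as the cache and a hypothetical INI-style configuration file with a `[cache]` section and an `expire_seconds` option (neither name comes from the patent). The expiration duration is read once and then treated as a constant, and each sweep deletes every entry whose expiration time is earlier than the current time.

```python
import configparser
import threading
import time
from dataclasses import dataclass


@dataclass
class CachedResult:
    """One cached intermediate result set and its absolute expiration time."""
    key: str
    rows: list
    expire_at: float


def load_ttl_seconds(config_text):
    """Read the expiration attribute once; afterwards it is treated as a constant.

    The "[cache]" section and "expire_seconds" option are hypothetical names.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getfloat("cache", "expire_seconds")


class ExpiringCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}              # key -> CachedResult
        self.lock = threading.Lock()   # the walker thread and writers share the dict

    def put(self, key, rows, now=None):
        now = time.time() if now is None else now
        with self.lock:
            self.entries[key] = CachedResult(key, rows, now + self.ttl)

    def sweep_once(self, now=None):
        """Delete every entry whose expiration time is earlier than `now`; return the deleted keys."""
        now = time.time() if now is None else now
        with self.lock:
            expired = [k for k, e in self.entries.items() if e.expire_at < now]
            for k in expired:
                del self.entries[k]
        return expired

    def start_walker(self, interval):
        """Call sweep_once() every `interval` seconds in a daemon thread.

        Setting the returned Event stops the walker.
        """
        stop = threading.Event()

        def loop():
            while not stop.wait(interval):
                self.sweep_once()

        threading.Thread(target=loop, daemon=True).start()
        return stop
```

Passing `now` explicitly keeps a sweep deterministic for testing; in a running system the walker thread calls `sweep_once()` with the wall-clock time.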

Compared with the prior art, the present invention has the following clear advantages and beneficial effects:

The present invention proposes a caching method for the intermediate result sets generated by distributed database query statements. For query tasks that have low requirements for network interaction and for database consistency but involve large data volumes, such as queries on medical data, the method achieves better database response speed.

Brief description of the drawings

Fig. 1: Explanatory diagram of the identification tree;

Fig. 2: Class definition of an identification tree node;

Fig. 3: Explanatory diagram of the intermediate result cache;

Fig. 4: Example of an identification tree;

Fig. 5: Flow chart of the method of the present invention.

Detailed description of the embodiments

The present invention is further described below with reference to the drawings and specific embodiments.

Step 1: Establishing the intermediate-result cache data set model.

The intermediate result set caching method uses the result format returned by the relational database MySQL as the standard for its cache structure. A result set returned by a relational database contains two parts: result header information and result column information.

The organization of the intermediate result cache head data set and of the intermediate result cache content data set is shown in Fig. 3. The cache head data set, FieldCache in the figure, contains from top to bottom: the head record id, the database IP address, the database name, the database table name, the array of original column names, the complete raw cached head, and the array of cache content ids. The cache content data set, RowCache in the figure, contains from top to bottom: the row record id and the row data content.
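The two cache entities can be sketched as Python dataclasses. The field names below are illustrative renderings of the attributes listed for FieldCache and RowCache in Fig. 3, and the integer id generators are assumptions; the patent does not specify how ids are assigned.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical id generators; the patent does not say how ids are allocated.
_head_ids = itertools.count(1)
_row_ids = itertools.count(1)


@dataclass
class RowCache:
    row_id: int       # row record id
    content: tuple    # row data content


@dataclass
class FieldCache:
    head_id: int                 # head record id
    db_ip: str                   # database IP address
    db_name: str                 # database name
    table_name: str              # database table name
    column_names: list           # original column names
    raw_head: bytes              # complete raw result-set header as received
    row_ids: list = field(default_factory=list)  # ids of the RowCache entries


def cache_result_set(db_ip, db_name, table_name, column_names, raw_head, rows):
    """Split one MySQL-style result set into a FieldCache head and its RowCache rows."""
    row_caches = [RowCache(next(_row_ids), tuple(r)) for r in rows]
    head = FieldCache(next(_head_ids), db_ip, db_name, table_name,
                      list(column_names), raw_head,
                      [rc.row_id for rc in row_caches])
    return head, row_caches
```

Keeping the row ids in the head record lets a head and all of its rows be deleted together when the head expires, as Step 3 requires.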

According to above-mentioned two data entity, data cached memory mechanism is established, is deposited into database or document.

Results returned by non-relational databases differ by database type. The main operation for them is therefore to identify the format of the returned content, convert the content into intermediate result set data of the model above, and cache it.
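A sketch of the format-identification step, assuming two common non-relational result shapes: a key-value store returning a mapping, and a document store returning a list of documents (dicts). Both shapes and the `key`/`value` column names are illustrative assumptions; real drivers return their own types.

```python
def to_result_set(result):
    """Convert a non-relational result into (column_names, rows) for the cache model."""
    if isinstance(result, dict):
        # Key-value store: each pair becomes a two-column row.
        return ["key", "value"], [(k, v) for k, v in result.items()]
    if isinstance(result, list) and all(isinstance(d, dict) for d in result):
        # Document store: columns are the union of keys in first-seen order;
        # fields missing from a document become None.
        columns = list(dict.fromkeys(k for doc in result for k in doc))
        rows = [tuple(doc.get(c) for c in columns) for doc in result]
        return columns, rows
    raise TypeError(f"unrecognized result format: {type(result).__name__}")
```

The resulting column list and rows can then be stored with the same head and row entities used for MySQL results.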

Step 2: Establishing the recognition mechanism for query ranges.

When a query task is executed, the recognition mechanism uses a multiway tree to determine whether a usable intermediate result set cache exists. The contents of a single multiway tree node are shown in Fig. 2.

From top to bottom, a node contains: a read-write lock, the where clause stored by the current node (column, operator, value), the operator, the value, the cache database IP, the table name, the cached attribute array, the cache expiration time, and the cache head id. After parsing, the SQL statement of the query task executed by a child node is converted into one path in the identification tree.
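The node of Fig. 2 can be sketched as a Python dataclass. Two assumptions are made: Python's standard library has no read-write lock, so a plain `threading.Lock` stands in for it, and each node keeps a list of cache records (one per database IP and table) because Step 3 speaks of a node's "only cache record". Children are keyed by their (column, operator, value) condition, so inserting a parsed statement walks or extends one path.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CacheRecord:
    db_ip: str          # cache database IP
    table: str          # table name
    columns: tuple      # cached attribute array
    expire_at: float    # cache expiration time (epoch seconds)
    head_id: int        # id of the FieldCache head record


@dataclass(eq=False)
class TreeNode:
    column: Optional[str] = None   # where-clause column (None for the root)
    op: Optional[str] = None       # comparison operator
    value: object = None           # comparison value
    records: list = field(default_factory=list)   # CacheRecords ending at this node
    children: dict = field(default_factory=dict)  # (column, op, value) -> TreeNode
    # Stand-in for the read-write lock: the stdlib only offers exclusive locks.
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def insert_path(self, conditions, record):
        """Walk or extend the path for text-sorted `conditions`; attach `record` at its tail."""
        node = self
        for cond in conditions:
            with node.lock:
                child = node.children.get(cond)
                if child is None:
                    child = TreeNode(*cond)
                    node.children[cond] = child
            node = child
        with node.lock:
            node.records.append(record)
        return node
```

Locking one node at a time keeps concurrent inserts on disjoint branches from blocking each other.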

The resulting multiway tree structure is shown in Fig. 4. The path highlighted in the figure is the one generated by the query statement "select name, sex from user where id > 1 and age > 13 and type = 4;".
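The path for the example statement can be reproduced with a small parser. This sketch only handles statements of the form `select cols from table where c op v and ...` with simple comparison operators and single-token values; a real system would take the conditions from its SQL parser rather than from regular expressions. Conditions are sorted by their text (column, operator, value), which gives the root-to-leaf order of the path.

```python
import re

_STMT = re.compile(r"\s*select\s+(.+?)\s+from\s+(\w+)(?:\s+where\s+(.+?))?\s*;?\s*$", re.I)
_COND = re.compile(r"^\s*(\w+)\s*(>=|<=|<>|!=|=|>|<)\s*(\S+)\s*$")


def _to_value(raw):
    """Interpret a literal as int, then float, else as a (possibly quoted) string."""
    for conv in (int, float):
        try:
            return conv(raw)
        except ValueError:
            pass
    return raw.strip("'\"")


def parse_select(sql):
    """Split a simple AND-only select into (columns, table, text-sorted conditions)."""
    m = _STMT.match(sql)
    if m is None:
        raise ValueError("unsupported statement")
    columns = tuple(c.strip() for c in m.group(1).split(","))
    conditions = []
    if m.group(3):
        for part in re.split(r"\s+and\s+", m.group(3), flags=re.I):
            cm = _COND.match(part)
            if cm is None:
                raise ValueError(f"unsupported condition: {part!r}")
            col, op, raw = cm.groups()
            conditions.append((col, op, _to_value(raw)))
    # Text order of each condition determines its depth in the identification tree.
    conditions.sort(key=lambda c: (c[0], c[1], str(c[2])))
    return columns, m.group(2), conditions
```

For the statement above this yields the path age > 13, then id > 1, then type = 4.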

When a new SQL statement is executed, it is split into selection conditions, table name, and query attributes, and the selection conditions are sorted in text order. The sorted conditions are then searched for in the identification tree. The current cache is usable if a path in the tree matches all conditions, with each stored condition either identical to the query condition or covering a range that contains it, and the tail node holds a record whose query-execution database IP and table name are identical to the query's, whose cached attributes are identical to or a superset of the current query's target attributes, and which has not expired. If the query range differs from the cached range, a secondary query is executed on the cached result and its output is assembled into a new intermediate result set packet; after the remaining intermediate result sets have been returned, a union operation is performed and the result is returned to the client. Otherwise, an intermediate result set packet is generated directly and waits for subsequent processing.
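The hit test can be sketched as follows. The `covers` function assumes conditions of the forms `=`, `>`, `>=`, `<`, `<=` on orderable values and checks that every value satisfying the query condition also satisfies the cached one. `find_cache` walks the tree depth-first, one query condition per level, and counts how many path conditions are wider than the query's, since a non-zero count means the cached rows must be filtered again. The node and record layouts are simplified assumptions, not the exact class of Fig. 2.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    db_ip: str
    table: str
    columns: tuple      # cached attribute array
    expire_at: float


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # (column, op, value) -> Node
    records: list = field(default_factory=list)   # Records ending at this node


def covers(cached, query):
    """True if every value satisfying `query` also satisfies `cached` (same column)."""
    (c_col, c_op, c_v), (q_col, q_op, q_v) = cached, query
    if c_col != q_col:
        return False
    if c_op == "=":
        return q_op == "=" and q_v == c_v
    if c_op in (">", ">="):
        if q_op not in (">", ">=", "="):
            return False
        # A strict cached bound needs a strictly larger value unless the query is also strict.
        strict = c_op == ">" and q_op != ">"
        return q_v > c_v if strict else q_v >= c_v
    if c_op in ("<", "<="):
        if q_op not in ("<", "<=", "="):
            return False
        strict = c_op == "<" and q_op != "<"
        return q_v < c_v if strict else q_v <= c_v
    return False


def find_cache(root, db_ip, table, columns, conditions, now):
    """Depth-first search for a cached path covering the text-sorted `conditions`.

    Returns (record, mismatches), where mismatches counts path conditions that
    differ from the query's, or None when no usable cache exists.
    """
    def walk(node, i, mismatches):
        if i == len(conditions):
            for r in node.records:
                if (r.db_ip == db_ip and r.table == table
                        and set(columns) <= set(r.columns) and r.expire_at > now):
                    return r, mismatches
            return None
        for cond, child in node.children.items():
            if covers(cond, conditions[i]):
                found = walk(child, i + 1, mismatches + (cond != conditions[i]))
                if found is not None:
                    return found
        return None

    return walk(root, 0, 0)
```

A path is only accepted when its tail record also matches the database IP and table, covers the requested columns, and is unexpired, mirroring the conditions listed above.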

Step 3: Invalidation of the intermediate-result cache data set.

After the intermediate result set returned by a child node is obtained, its contents are parsed and cached. The expiration time is computed as the sum of the time at which the record is cached and the record's effective duration, and is entered into the cache head data set. In addition, the system starts a timed task that periodically traverses the cached results. When it finds a cache head record in the data set whose expiration time is earlier than the current time, it deletes that record together with the cache content data associated with it. Likewise, the recognition mechanism starts another timed task that traverses the tree and deletes stale entries; when it finds expired data, it also deletes the corresponding cache record to free space. Deletion in the recognition mechanism follows this rule: when a cache record is deleted, if it was the only cache record of the current node and the current node has no child nodes, the current node itself is deleted, and its parent node is then checked for cache records. If the parent node has no cache record, the same emptiness check is repeated on it: it is deleted if it also has no remaining children, and the check continues with its own parent.
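A sketch of the expiration arithmetic and the cascading deletion rule, assuming each tree node keeps a parent pointer and a list of records represented as dicts with `head_id` and `expire_at` keys (both assumptions). A node is removed only when it holds no records and has no children, and the check then moves to its parent, stopping at the root.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    cond: tuple = ()                                    # (column, op, value); () for the root
    parent: Optional["Node"] = field(default=None, repr=False)
    children: dict = field(default_factory=dict)        # cond -> Node
    records: list = field(default_factory=list)         # dicts with "head_id", "expire_at"

    def child(self, cond):
        """Return the child for `cond`, creating it if needed."""
        if cond not in self.children:
            self.children[cond] = Node(cond, self)
        return self.children[cond]


def expire_at(effective_seconds, now=None):
    """Expiration time = time the record is cached + its effective duration."""
    return (time.time() if now is None else now) + effective_seconds


def remove_record(node, record):
    """Delete one record, then remove empty, childless nodes up toward the root."""
    node.records.remove(record)
    while node.parent is not None and not node.records and not node.children:
        parent = node.parent
        del parent.children[node.cond]
        node = parent


def sweep(node, now):
    """Remove every record in the subtree whose expiration time is earlier than `now`."""
    for child in list(node.children.values()):
        sweep(child, now)
    for rec in [r for r in node.records if r["expire_at"] < now]:
        remove_record(node, rec)
```

Sweeping children before the node itself means a branch emptied by expired leaves is pruned in the same pass.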

Step 4: Answering subqueries without network interaction

The flow of a subquery task executed without network interaction is shown in Fig. 5.

As shown in Fig. 5, before a subquery is executed, it first passes through the parsing stage to obtain its target path in the identification cache.

The path is used to determine whether a cache entry exists, and at the same time the number of conditions whose subquery range is narrower than the range on the cached path is recorded. If the path exists and holds a cached intermediate result set for the database node and data table targeted by the subquery, the cache is hit. If the number of inconsistent query ranges is greater than 0, the column names and row data are extracted in turn and a secondary query is executed to filter the result set. If the number is 0, an intermediate result set packet is generated directly; after all subqueries have finished, the results are merged.
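On a hit, the secondary query can be sketched as re-applying the subquery's own conditions to the cached rows. The sketch assumes that the cached rows also contain the columns referenced in the where clause and that only the comparison operators in the mapping below occur; when the mismatch count is 0 the conditions are skipped and only the projection is performed.

```python
import operator

# Mapping from SQL comparison operators to Python functions (subset; assumption).
_OPS = {"=": operator.eq, "!=": operator.ne, "<>": operator.ne,
        ">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def secondary_query(column_names, rows, conditions, target_columns, mismatches):
    """Filter cached rows by the subquery's conditions and project its target columns.

    When `mismatches` is 0 the cached range equals the query range, so the
    rows are only projected; otherwise every condition is re-evaluated.
    """
    idx = {name: i for i, name in enumerate(column_names)}
    checks = conditions if mismatches > 0 else []
    return [tuple(row[idx[c]] for c in target_columns)
            for row in rows
            if all(_OPS[op](row[idx[col]], val) for col, op, val in checks)]
```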

If the path does not exist, the subquery is sent to the corresponding child node, and the system waits for the child node to return a result packet. Once received, the result packet is parsed in turn (result packet header, result row data, and so on), and the information is entered into the intermediate result set cache together with the expiration time. The intermediate result set cache path is then added to the identification cache according to the subquery statement, and the cache expiration time is recorded. Finally, after all subqueries have finished, the result merge operation is executed.
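The overall hit/miss flow can be sketched with a thread pool that runs all subqueries and waits for them before merging. To stay short, the sketch replaces the identification tree with an exact-match dictionary keyed by (node IP, table, columns, conditions), stands in for the network round trip with a caller-supplied `fetch` function (a hypothetical interface), and treats the merge as simple concatenation, i.e. a UNION ALL. All three are simplifications, not the patent's implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class SubqueryCache:
    """Exact-match stand-in for the identification tree (a simplification)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}   # key -> (expire_at, rows)

    def get(self, key, now):
        hit = self.store.get(key)
        return hit[1] if hit is not None and hit[0] > now else None

    def put(self, key, rows, now):
        self.store[key] = (now + self.ttl, rows)


def run_query(subqueries, cache, fetch, now=None):
    """Answer each subquery from the cache or its child node, then merge.

    subqueries: list of (node_ip, table, columns, conditions) tuples.
    fetch: callable(node_ip, table, columns, conditions) -> list of rows.
    """
    now = time.time() if now is None else now

    def one(sq):
        node_ip, table, columns, conditions = sq
        key = (node_ip, table, tuple(columns), tuple(conditions))
        rows = cache.get(key, now)
        if rows is None:                  # cache miss: ask the child node
            rows = fetch(node_ip, table, columns, conditions)
            cache.put(key, rows, now)     # record result and expiration time
        return rows

    # pool.map keeps subquery order; list() waits until every subquery has finished.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(one, subqueries))
    merged = []
    for part in partials:                 # merge as UNION ALL (concatenation)
        merged.extend(part)
    return merged
```

On a second run within the expiration window, every subquery is served from the cache and `fetch` is not called.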

Claims

1. A caching method for intermediate result sets of queries in a distributed database system, characterized by comprising the following steps:

Step 1, establishing the intermediate-result cache data set model;

The intermediate result set caching method uses the result format returned by the relational database MySQL as the standard for its cache structure; a result set returned by a relational database contains two parts: result header information and result column information;

The intermediate result cache head data set comprises a head record id, a database IP address, a database name, a database table name, an array of original column names, the complete raw cached head, and an array of cache content ids; the intermediate result cache content data set comprises a row record id and row data content;

According to above-mentioned two data entity, data cached memory mechanism is established, is deposited into database or document；

Results returned by non-relational databases differ by database type; the operation therefore identifies the format of the returned content, converts the content into intermediate result set data of the model above, and caches it;

Step 2, establishing the recognition mechanism for query ranges;

When a query task is executed, the recognition mechanism uses a multiway tree to determine whether a usable intermediate result set cache exists; a single node of the multiway tree contains a read-write lock for high concurrency, the where clause stored by the current node, the operator, the value, the cache database IP, the table name, the cached attribute array, the cache expiration time, and the cache head id; after parsing, the SQL statement of the query task executed by a child node is converted into one path in the identification tree;

When a new SQL statement is executed, it is split into selection conditions, table name, and query attributes, and the selection conditions are sorted in text order; the sorted conditions are then searched for in the identification tree; the current cache is usable if a path in the tree matches all conditions, with each stored condition either identical to the query condition or covering a range that contains it, and the tail node holds a record whose query-execution database IP and table name are identical to the query's, whose cached attributes are identical to or a superset of the current query's target attributes, and which has not expired; if the query range differs from the cached range, a secondary query is executed on the cached result, its output is assembled into a new intermediate result set packet, and after the remaining intermediate result sets have been returned, a union operation is performed and the result is returned to the client; otherwise, an intermediate result set packet is generated directly and waits for subsequent processing;

Step 3, setting the invalidation mode of the intermediate-result cache data set;

After the intermediate result set returned by a child node is obtained, its contents are parsed and cached; the expiration time is computed as the sum of the time at which the record is cached and the record's effective duration, and is entered into the cache head data set; in addition, the system starts a timed task that periodically traverses the cached results, and when it finds a cache head record whose expiration time is earlier than the current time, it deletes that record together with the cache content data associated with it; likewise, the recognition mechanism starts another timed task that traverses the tree and deletes stale entries, and when it finds expired data it also deletes the corresponding cache record to free space; deletion in the recognition mechanism follows this rule: when a cache record is deleted, if it was the only cache record of the current node and the current node has no child nodes, the current node itself is deleted, and its parent node is then checked for cache records; if the parent node has no cache record, the same emptiness check is repeated on it, and it is deleted if it also has no remaining children, continuing with its own parent;

Step 4, answering subqueries without network interaction;

Before a subquery is executed, it first passes through the parsing stage to obtain its target path in the identification cache;

The path is used to determine whether a cache entry exists, while the number of conditions whose subquery range is narrower than the range on the cached path is recorded; if the path exists and holds a cached intermediate result set for the database node and data table targeted by the subquery, the cache is hit: if the number of inconsistent query ranges is greater than 0, the column names and row data are extracted in turn and a secondary query is executed to filter the result set; if the number is 0, an intermediate result set packet is generated directly, and the results are merged after all subqueries have finished;

If the path does not exist, the cache misses: the subquery is sent to the corresponding child node, and the system waits for the child node to return a result packet; once received, the result packet header and result row data are parsed in turn and entered into the intermediate result set cache together with the expiration time; the intermediate result set cache path is then added to the identification cache according to the subquery statement, and the cache expiration time is recorded; finally, after all subqueries have finished, the result merge operation is executed.