CN100594497C

CN100594497C - System for implementing network search caching and search method

Info

Publication number: CN100594497C
Application number: CN200810117515A
Authority: CN
Inventors: 李晓林; 徐志伟; 谢毅
Original assignee: Institute of Computing Technology of CAS
Current assignee: Hainan Nanhai cloud Information Technology Co., Ltd.
Priority date: 2008-07-31
Filing date: 2008-07-31
Publication date: 2010-03-17
Anticipated expiration: 2028-07-31
Also published as: CN101329686A

Abstract

The invention provides a caching system for network queries, comprising a query binder, a query parser, a query descriptor manager, a query scheduler and a query cacher. The query binder binds the virtual view metadata related to a query request instance into system memory and sets the write-operation timestamps of the virtual views in order from the lower layers to the upper layers; the query parser parses the virtual views into a query tree according to the validity of the query request instance; the query descriptor manager stores the queue of query request instances applied to the virtual views, determines which query request instances enter and leave the queue, and judges the validity of the query request instances; the query scheduler dispatches the query nodes of the query tree to execute the query; the query cacher creates and deletes temporary tables, manages the data sets in the temporary tables, updates the time characteristic identifiers of the query request instances in the query descriptors, and outputs the query result. Caching in the system is incremental, which improves query performance and throughput and supports the execution of query requests transparently.

Description

A system and query method for implementing network query caching

Technical field

The present invention relates to the field of computer application technology, and in particular to a system and query method for implementing network query caching for database applications in a network environment.

Background technology

As the business logic and computation handled at the network application layer grow increasingly complex, a query request may involve immediate access to multiple distributed data sources on the network. For the many federated queries over distributed data sources, access performance is often the bottleneck of this class of application, because it is affected by factors such as network bandwidth, request load and data volume. Performance optimization for this application pattern has therefore long been an active research topic.

From a technical standpoint, two main approaches have so far been used to build caching mechanisms that improve the access performance of such distributed queries.

Approach one is to cache in memory: the result data sets needed by query requests are read into memory in advance (or on the first request), and later requests are served from memory.

One example is the open-source project memcache, a high-performance distributed object cache that sits between the application and the data store and keeps objects in RAM. When an object is accessed, if it is in the cache it is simply retrieved and processing continues; otherwise the required data is fetched from the database, mapped into an object, and added to the cache. In this way memcache minimizes or eliminates the database query cost for information that has already been processed. The drawbacks of this approach are: 1) data objects are cached in memory and are therefore constrained by memory size, so caching of massive query result sets cannot be supported; 2) memcache organizes its in-memory data as objects rather than in a relational data model, so it offers application developers a dedicated access interface and does not accept standard SQL statements.

Another example is the caching mechanism of the MySQL database, which also caches table data in memory: if an identical SQL statement is run, the server takes the result directly from the cache without parsing and executing the SQL again. If a table is changed, all cached queries that use the table become invalid and the corresponding entries in the query cache are cleared. A change includes any change to the data or structure of the table, and also covers queries that use MERGE tables mapped to the changed table. Clearly, the query cache is unsuitable for frequently updated tables, whereas for tables whose data rarely changes and that receive many identical SQL queries, the query cache can yield substantial performance savings. The drawbacks of this approach are: 1) it is likewise constrained by memory size; 2) cached data is replaced in units of whole tables rather than in units of finer-grained query requests; 3) queries must be byte-for-byte identical to be considered the same, so requests with the same condition but a different set of fields are treated as different requests; 4) the cached data resides on the server hosting the database, and caching across multiple server nodes in a distributed network environment is not yet supported.

Approach two is to set up a mechanism of temporarily stored data tables: part of the intermediate results is stored in temporary tables, and part of the query access is carried out against those temporary tables. This pattern is not constrained by memory size and can support access to massive data. Different applications adopt different solutions, and the key is how to determine the creation, deletion and validity-checking mechanisms for the temporary tables. The core problem is balancing two factors: first, accessing as little data as possible, especially over the network, so as to avoid transferring massive data across the network; second, raising cache utilization as far as possible while meeting timeliness requirements. Federated database systems and data warehouses use some similar techniques.

As shown in Fig. 1, a federated database system (FDBS) is made up of semi-autonomous database systems that share data with one another, with the data sources of the federation providing access interfaces to each other; a federated database system may be a centralized database system or a distributed database system. An application can submit a query written in SQL to the federated server. The federated server optimizes the query and produces an execution plan in which the query is decomposed into fragments that can be executed on the individual data sources. There may be several possible decompositions of the query, and the optimizer chooses among them according to the lowest estimated resource consumption. Once a plan is chosen, the federated database begins execution and invokes the wrappers of the corresponding underlying data sources to execute the fragments assigned to them. To execute a fragment, a wrapper may perform whatever data source operations are required; these may be a series of function calls or a query submitted to the data source in its native query language. The resulting data streams are returned to the federated server, which combines them, performs any further processing that the data sources cannot complete, and then returns the final result to the application. To improve performance, federated databases often use an automatic summary table (AST) mechanism to implement a form of caching, which lets an administrator define materialized views over the data in a set of underlying tables. For certain types of queries, the database can automatically determine whether the AST can be used to answer the query without accessing the base tables. The drawbacks of this approach are: 1) materialized views are replaced in units of whole tables rather than in units of finer-grained query requests; 2) it is difficult to keep materialized views reasonably fresh in a network environment.

As shown in Fig. 2, a data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data used in enterprise management and decision making, with data resources and software support provided by the underlying database management system (DBMS). The data is organized into broad, functionally independent, non-overlapping subjects. A global data buffer caches database pages and derived data of the data warehouse: the data to be accessed is read in advance into the local database of the data warehouse and updated periodically, providing working space for query operations. Application requests are all served from local tables or views, and the global data buffer must generate efficient buffer allocation schemes and replacement policies for multiple queries. This approach can improve the efficiency of query processing, but it does not readily address the following needs: 1) sharing of result data sets among the query requests of multiple users; 2) in a distributed network environment it is difficult to keep the buffered data reasonably fresh, and periodic spikes in network load arise easily.

Solving the above problems requires combining the temporal and spatial characteristics of requests' data access with a corresponding caching mechanism, so as to provide a transparent, increment-based query caching mechanism.

Summary of the invention

The present invention addresses the need for distributed queries over massive data sources in a network environment and provides a system and query method for implementing network query caching, so as to improve query performance and throughput while supporting the execution of query requests transparently.

To achieve the above object, according to one aspect of the present invention, a system for implementing network query caching is provided, comprising a query binder, a query parser, a query descriptor manager, a query scheduler and a query cacher, wherein:

The query binder is configured to bind the virtual view metadata related to a query request instance into system memory, and to set the write-operation timestamps of the virtual views in order from the lower layers to the upper layers;

The query parser is configured to parse the virtual views, according to the validity of the query request instance, into a query tree composed of query nodes;

The query descriptor manager is configured to store the queue of query request instances applied to the virtual views, to determine which query request instances enter and leave the queue, and to judge the validity of a query request instance according to the write-operation timestamp of the virtual view and the time characteristic identifier of the query request instance;

The query scheduler is configured to dispatch the query nodes of the query tree generated by the query parser, layer by layer from the lower layers of the query tree to the upper layers, to execute the query;

The query cacher is configured to create and delete temporary tables, to manage the data sets in the temporary tables as query request instances enter and leave the queue, to update the time characteristic identifiers of the query request instances in the query descriptors, and to output the query result.

In the above system, the query descriptor manager includes a query lock, which controls write operations on the query descriptor and updates to the time characteristic identifiers of the query request instances in the query descriptor.

In the above system, the query parser uses query merging and query pushdown techniques to parse multiple virtual views into a single subquery node of the query tree.

In the above system, the query scheduler performs concurrent scheduling.

According to another aspect of the present invention, a network query method is also provided, comprising the following steps:

1) binding the virtual view metadata related to a query request instance into system memory, and setting the write-operation timestamps of the virtual views in order from the lower layers to the upper layers;

2) judging the validity of the query request instance according to the write-operation timestamp of the virtual view and the time characteristic identifier of the query request instance, and parsing the virtual views into a query tree composed of query nodes according to that validity;

3) dispatching all query nodes of the query tree, layer by layer from the lower layers to the upper layers, to execute the query, importing the data set obtained by each query node into a temporary table, and updating the time characteristic identifiers of the query request instances in the query descriptors;

4) outputting the data set in the temporary table that corresponds to the root query node of the request instance.

In the above method, step 2) further comprises:

21) creating or obtaining the query descriptor corresponding to the virtual view currently being parsed;

22) judging the validity of the query request instance according to the write-operation timestamp of the virtual view and the time characteristic identifier of the query request instance and, if it is valid, proceeding to step 3);

23) judging whether the virtual view corresponding to the query request instance accesses no other virtual views and, if so, proceeding to step 3);

24) parsing the query request instance, according to the mapping relations of the virtual view, into subquery nodes on the lower-layer virtual views of that virtual view;

25) placing the query request instance on the virtual view currently being parsed into the query request instance queue of the query descriptor; then repeating steps 21), 22), 23) and 24) in turn until a query tree composed of query nodes is formed.

In the above method, step 22) comprises the following steps:

221) judging whether the query request instance is in the request queue of the query descriptor; if not, the cache for the query request instance is invalid;

222) judging whether the time characteristic identifier of the query request instance has expired; if it has, the cache for the query request instance is invalid; otherwise it is valid.

In the above method, step 3) comprises the following steps:

31) dispatching the current lowest-layer query nodes to execute their queries;

32) importing the queried data set into a temporary table and updating the time characteristic identifier of the query request instance;

33) executing, on the basis of the results of the lower-layer query nodes, the upper-layer query node of those lower-layer query nodes, thereby aggregating the data;

34) judging whether the node is the root query node; if not, repeating steps 31), 32) and 33) in turn, so as to obtain the data set queried by the query node.

In the above method:

Step 2) further comprises the step of obtaining the query lock of the query descriptor;

Step 3) further comprises the step of releasing, after the data set obtained by each query node has been imported into the temporary table, the query lock of the query descriptor corresponding to that query node;

Step 4) further comprises the step of releasing the query lock of the root query node.

The effects of the present invention are as follows. Through virtualization and view techniques, distributed query requests are managed and expressed in a unified way. During execution, it is not necessary to move all data into temporary tables at once; the data in the temporary tables is built up gradually through incremental accumulation as execution proceeds. Validity checking, management and updating of data are performed not at the granularity of tables but at the granularity of request instances. The size of a request instance's result set is not fixed; in the query descriptor, each request instance identifies one data set (the data sets identified by the conditions of different request instances may intersect, but only one physical copy of each data record exists). In addition, this incremental caching mechanism is completely transparent to applications.

With the method of the present invention, distributed multi-data-source queries can exploit the temporal and spatial locality of requests' data access: by checking the validity of, managing and scheduling request instances, the data access result sets are shared among request instances at fine granularity, and the overall throughput and performance of the system are substantially improved while relatively high timeliness is maintained.

Description of drawings

Fig. 1 is a schematic diagram of a federated database architecture;

Fig. 2 is a schematic diagram of a data warehouse architecture;

Fig. 3 is a structural diagram of a system for incremental caching of distributed network queries according to an embodiment of the invention;

Fig. 4 shows the access and mapping relations among the virtual views according to an embodiment of the invention, together with the corresponding query tree formed after optimized parsing by the query parser;

Fig. 5 is a flow chart of a network query method according to an embodiment of the invention.

Embodiment

The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

As shown in Fig. 3, according to one embodiment of the present invention, a system for incremental caching of distributed queries in a network environment is provided, comprising five main functional modules: a query binder, a query parser, a query descriptor manager, a query scheduler and a query cacher. The function and implementation of each component are as follows:

Query binder: For query requests over multiple distributed data sources, virtualization and view techniques are first used to represent the data tables the application needs as virtual views. Because virtual views can be combined through relational operations such as join and union to form new virtual views, the virtual views required by complex applications can be constructed. According to one embodiment of the invention, a hierarchy of virtual views is built that meets the application's ultimate needs, and the access relations among the virtual views express the federated queries over the data sources. Applications access data by submitting query requests against these virtual views. When the system receives a query request instance, the query binder, following the access relations of the virtual views, binds the virtual view metadata related to that query request instance into system memory, forming the virtual views that correspond to the query request instance. Because data changes caused by write operations on a lower-layer virtual view may invalidate the temporary data of upper-layer virtual views, binding must also set the write-operation timestamp of each virtual view, working from the bottom up, to the time of the most recent write operation on that virtual view or any of its lower-layer virtual views; this provides the basis for validity checking during query parsing.
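The bottom-up timestamp rule can be illustrated with a minimal Python sketch. The `VirtualView` class, its field names and the numeric timestamps are assumptions made for illustration; the patent does not specify a data structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualView:
    name: str
    children: List["VirtualView"] = field(default_factory=list)
    own_last_write: float = 0.0  # hypothetical: time of the last write made directly to this view
    write_ts: float = 0.0        # effective write-operation timestamp, set during binding


def bind_write_timestamps(view: VirtualView) -> float:
    """Post-order walk: a view's timestamp is the latest write in itself or any view below it."""
    latest = view.own_last_write
    for child in view.children:
        latest = max(latest, bind_write_timestamps(child))
    view.write_ts = latest
    return latest


# View 2 is built from views 4 and 5; the most recent write happened on view 5.
v4 = VirtualView("view4", own_last_write=10.0)
v5 = VirtualView("view5", own_last_write=25.0)
v2 = VirtualView("view2", children=[v4, v5], own_last_write=5.0)
bind_write_timestamps(v2)
print(v2.write_ts)  # 25.0
```

Because the upper view inherits the newest write below it, a single comparison against the upper view's timestamp is enough to detect stale cached data anywhere in its subtree.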

Query parser: The query request instance submitted by the application targets virtual views. It must be decomposed according to the bound virtual view metadata: depending on the validity of the query request instance, it is parsed into a query tree composed of query nodes that have a certain partial order with respect to the virtual views, where each subquery node represents the query request instance applied to the corresponding virtual view. Preferably, optimization techniques such as query merging and query pushdown are used where appropriate to combine the queries on several views into a single query node; this does not affect the method of the invention.

As shown in Fig. 4(a), relational operations such as join and union form a hierarchy of access and mapping relations among virtual views: virtual views 4 and 5 form virtual view 2 through a join; virtual views 6 and 7 form virtual view 3 through a union; and virtual views 2 and 3 in turn form virtual view 1 through a join. During parsing, the query parser starts from the top layer (the root) and, following the schema mapping relations of the virtual views, parses in turn the query request instance applied to each layer of virtual views, forming the final query tree shown in Fig. 4(b). Note that the query tree and the virtual view access relations do not correspond one to one. In Fig. 4(a), virtual view 2 accesses virtual views 4 and 5; when virtual views 4 and 5 map to two data tables in the same physical data source, the parser's optimization maps query 2 in the query tree directly onto the schemas of those two tables, so that at execution time the join query request instance can be applied directly to that data source. As shown in Fig. 4(b), query 2, query 4 and query 5 are subquery nodes, and query 1 is the root query node.
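The recursive decomposition and the same-source merge can be sketched as follows. This is a simplified illustration loosely modeled on the hierarchy of Fig. 4(a), not a reproduction of Fig. 4(b): the `View` and `QueryNode` classes, the `source` labels and the `cache_valid` callback are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class View:
    name: str
    children: List["View"] = field(default_factory=list)
    source: Optional[str] = None  # assumed label of the physical data source behind a base view


@dataclass
class QueryNode:
    view: str
    children: List["QueryNode"] = field(default_factory=list)


def parse(view: View, cache_valid: Callable[[str], bool]) -> QueryNode:
    """Decompose top-down; stop at valid caches, base views, or mergeable children."""
    node = QueryNode(view.name)
    if cache_valid(view.name) or not view.children:
        return node  # served from the temporary table, or nothing below to decompose
    sources = {child.source for child in view.children}
    if len(sources) == 1 and None not in sources:
        return node  # all children live in one source: push the combined query down
    node.children = [parse(child, cache_valid) for child in view.children]
    return node


view2 = View("view2", [View("view4", source="db_a"), View("view5", source="db_a")])
view3 = View("view3", [View("view6", source="db_b"), View("view7", source="db_c")])
view1 = View("view1", [view2, view3])

tree = parse(view1, cache_valid=lambda name: False)
print([child.view for child in tree.children])              # ['view2', 'view3']
print([child.view for child in tree.children[1].children])  # ['view6', 'view7']
```

Here the node for `view2` has no children because both of its inputs are assumed to sit in `db_a`, which mirrors the case described above where the join is applied directly to the shared data source.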

When a specific query request instance is parsed, the query locks of the query descriptors representing all lower-layer views must be obtained, so as to avoid conflicting reads and writes on the corresponding temporary tables, which are critical resources, when multiple users access the same virtual view simultaneously.

Query descriptor manager: The query request instances applied to virtual views are managed by the query descriptor manager, and virtual views correspond one to one with query descriptors. Because multiple user sessions run concurrently, several query request instances may access the same view at the same time, but the query conditions and times with which they are applied to the virtual view differ; this creates the opportunity to share part of the data sets among query request instances. Each query descriptor therefore holds a queue dedicated to storing and managing query request instances; the queue size can be set according to the server configuration and the characteristics of the requests. Each query request instance applied to the virtual view is an element of this queue. A query request instance is described by three important attributes: the request condition, which identifies the data set of the query request instance; the time characteristic identifier, which is the timestamp of the read operation performed by the query request instance; and the owner attribute, which indicates the session to which the query request instance belongs. These make it easy to determine set relations between query request instances, for example that the data set identified by the request condition of one query request instance is a subset of the data set identified by that of another. Because the queue size is limited, a replacement algorithm governing entry and exit is needed. Query descriptors also correspond one to one with physically existing temporary tables, so this algorithm maps directly to the replacement and update policy for data sets in the temporary table, and also serves as the criterion for checking whether the cached data set for the current request is valid. The policy is as follows:

If the current query request instance is not in the request queue of the query descriptor, the cache is invalid. The instance is considered to be in the request queue if its query condition equals the condition of some query request instance in the queue, or if its query condition identifies a subset of the data set represented by the condition of some query request instance in the queue. If either holds, the read-operation timestamp in the time characteristic identifier of that query request instance is checked for expiry: if the write-operation timestamp of the corresponding virtual view shows a write that occurred after the last read operation of the query request instance, the time characteristic identifier of the query request instance has expired. If the time characteristic identifier has expired, the cache corresponding to the query request instance is invalid; otherwise it is valid. When the cache is invalid, the virtual view continues to be parsed downward. When it is valid, the query parser does not decompose the query further into new subquery nodes, and the query scheduler can dispatch this subquery node for execution directly. When a new query request instance is executed and the queue is full, some query request instance must be replaced and removed from the queue; the instance replaced is the one whose last access time is the earliest.
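The validity check and replacement policy above can be sketched in Python. Several simplifications are assumptions of this sketch: a condition is modeled as a `frozenset` of record keys so that the subset test is a plain set comparison, the read timestamp doubles as the last access time, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class RequestInstance:
    condition: FrozenSet[int]  # stand-in for the data set that the request condition identifies
    read_ts: float             # time characteristic identifier: time of the last read
    owner: str                 # session that owns the request instance


class QueryDescriptor:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.queue: List[RequestInstance] = []

    def find(self, condition: FrozenSet[int]) -> Optional[RequestInstance]:
        """Return a queued instance whose condition equals or contains the given one."""
        for inst in self.queue:
            if condition <= inst.condition:
                return inst
        return None

    def is_cache_valid(self, condition: FrozenSet[int], view_write_ts: float) -> bool:
        inst = self.find(condition)
        if inst is None:
            return False                      # not in the request queue
        return view_write_ts <= inst.read_ts  # expired only if a write came after the last read

    def enqueue(self, inst: RequestInstance) -> Optional[RequestInstance]:
        """Add an instance; if the queue is full, evict the one accessed earliest."""
        evicted = None
        if len(self.queue) >= self.capacity:
            evicted = min(self.queue, key=lambda i: i.read_ts)
            self.queue.remove(evicted)
        self.queue.append(inst)
        return evicted


descriptor = QueryDescriptor(capacity=2)
descriptor.enqueue(RequestInstance(frozenset({1, 2, 3}), read_ts=100.0, owner="s1"))
print(descriptor.is_cache_valid(frozenset({1, 2}), view_write_ts=90.0))   # True
print(descriptor.is_cache_valid(frozenset({1, 2}), view_write_ts=110.0))  # False
```

A real system would compare SQL predicates rather than materialized key sets; deciding whether one predicate implies another is the hard part and is left out here.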

Each query descriptor manager also has a corresponding lock control, namely the query lock. Only after the query lock has been obtained can write operations be performed on the query descriptor and the time characteristic identifiers of query request instances be changed, which in turn controls reads and writes of the data sets in the temporary table.

Query scheduler: After the query parser generates the query tree, the query scheduler dispatches the query nodes to execute the query. The scheduler can apply a query directly to the corresponding database server, or distribute the query to another cooperating server, where the complete request-handling process is carried out again; this is in effect an optimization that keeps the intermediate results transferred across the network as small as possible.

Because the generated query tree carries partial-order dependencies (temporal relations) as well as query caches (spatial relations), query scheduling can be concurrent, provided that the relations between query nodes are satisfied. When all lower-layer subqueries have finished executing, the data sets containing their results have all been stored in the corresponding physical temporary tables; the upper-layer query node is then executed against the data in those temporary tables, aggregating the data across layers and forming the temporary data set of each query node. Once the upper-layer query node has been executed, the query locks of its lower-layer query nodes can be fully released. This continues until the top-layer query node has been executed and the final result obtained. The final result is returned to the application, and the query lock of the top-layer query node is released.
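A minimal sketch of layer-by-layer concurrent scheduling follows. It assumes each query node carries a hypothetical `run` callable that combines the result sets of its children, uses an in-memory dictionary in place of physical temporary tables, and leaves out query locks entirely.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    name: str
    run: Callable[[List[list]], list]  # combines the children's rows into this node's rows
    children: List["Node"] = field(default_factory=list)


def levels(root: Node) -> List[List[Node]]:
    """Group the tree's nodes by depth, root first."""
    out, frontier = [], [root]
    while frontier:
        out.append(frontier)
        frontier = [child for node in frontier for child in node.children]
    return out


def schedule(root: Node) -> list:
    temp_tables: Dict[str, list] = {}  # stand-in for the physical temporary tables

    def execute(node: Node) -> None:
        inputs = [temp_tables[child.name] for child in node.children]
        temp_tables[node.name] = node.run(inputs)

    with ThreadPoolExecutor() as pool:
        for level in reversed(levels(root)):  # deepest layer first
            list(pool.map(execute, level))    # nodes of one layer run concurrently
    return temp_tables[root.name]


def join(inputs: List[list]) -> list:
    return sorted(set(inputs[0]) & set(inputs[1]))


q4 = Node("q4", lambda _: [1, 2, 3])
q5 = Node("q5", lambda _: [2, 3, 4])
q2 = Node("q2", join, [q4, q5])
q3 = Node("q3", lambda _: [3, 9])
q1 = Node("q1", join, [q2, q3])
result = schedule(q1)
print(result)  # [3]
```

Running whole layers in lockstep is the simplest way to respect the partial order; a finer-grained scheduler could start an upper node as soon as its own children finish.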

Query cacher: As described above, each query descriptor corresponds to one physical temporary table. The query cacher accumulates the data sets of all query request instances of each query descriptor and imports them into the corresponding temporary table; data sets that are already valid in the temporary table are not imported again. The query cacher manages the data sets in the temporary table at fine granularity, chiefly the creation and deletion of temporary tables and the swapping in and out of data sets. After a query request instance of a query descriptor has been replaced, the data set corresponding to its condition may still be shared by other query request instances; that is, the data sets identified by the conditions of different query request instances may intersect, and the query cacher must not swap out this shared portion. Therefore, when swapping out data, the query request instance queue must be obtained and examined, and the constraint clause for the data to be swapped out must be computed. Suppose the query request instance queue has length n and the conditions of the query request instances are cond1, cond2, ..., condn. If the m-th (1≤m≤n) query request instance is removed from the queue, the condition identifying the data to be swapped out is in theory condm and not (cond1 or cond2 ... or condm-1 or condm+1 ... or condn). In practice, the constraint condition needs to be simplified into an equivalent optimized form. The query cacher is also used to set and update the time characteristic identifiers of the query request instances in the query descriptor and, after the top-layer query node has been executed, to output to the application the data set in the temporary table that corresponds to the root query node of the request instance.
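The swap-out constraint can be built mechanically from the queue's conditions. The sketch below assumes the conditions are available as SQL predicate strings and produces the unoptimized form only; the function name is hypothetical, and the equivalence optimization mentioned above is not attempted.

```python
from typing import List


def eviction_predicate(conditions: List[str], m: int) -> str:
    """Predicate for rows used only by the m-th instance (m is 1-based, as in the text)."""
    target = f"({conditions[m - 1]})"
    others = [f"({cond})" for i, cond in enumerate(conditions, start=1) if i != m]
    if not others:
        return target  # no other instance shares the temporary table
    return f"{target} AND NOT ({' OR '.join(others)})"


queue_conditions = ["a < 10", "a < 5", "b = 1"]
print(eviction_predicate(queue_conditions, 2))
# (a < 5) AND NOT ((a < 10) OR (b = 1))
```

In this example every row with `a < 5` also satisfies `a < 10`, so the predicate selects nothing and no rows would be swapped out, which is the shared-data case the text describes.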

Fig. 5 is a flow chart of a network query method according to an embodiment of the invention, which comprises the following steps:

Step S1: the application submits a query request instance against a virtual view.

Step S2: the virtual view metadata related to the query request instance is bound into system memory, and the write-operation timestamp of each virtual view is set, from the bottom up, to the most recent write-operation timestamp of that virtual view and its lower-layer virtual views.

The validity of the query request instance is judged according to the write-operation timestamp of the virtual view and the time characteristic identifier of the query request instance, and the virtual views are parsed, according to that validity, into a query tree composed of query nodes, where each query node represents a query request instance. The specific operations are:

Step S3: the query descriptor corresponding to the virtual view currently being parsed is created or obtained, and the query lock of that query descriptor is acquired to control reads and writes of the data set in the temporary table.

Step S4: the cache validity of the query request instance is judged, that is, whether the data required by the query request instance exists in the temporary table and its time characteristic identifier has not expired; if so, step S8 is executed.

Step S5: it is judged whether the virtual view corresponding to the query request instance accesses no other virtual views; if so, step S8 is executed.

Step S6: the query parser, following the mapping relations of the virtual view, parses the query request instance into subquery nodes on the lower-layer virtual views of that virtual view.

Step S7: the query request instance on the virtual view currently being parsed is placed into the query request instance queue of the corresponding query descriptor; if the queue is full, the query request instance with the earliest last access time is replaced and removed from the queue, and its data set is swapped out of the temporary table. Steps S3, S4, S5, S6 and S7 are then repeated in turn until a query tree composed of query nodes is formed.

All query nodes of the query tree are dispatched, layer by layer from the lower layers to the upper layers, to execute the query; the data set obtained by each query node is imported into a temporary table, and the time characteristic identifiers of the query request instances in the query descriptors are updated. The specific operations are:

Step S8: the query scheduler, subject to the partial-order dependencies among the queries, dispatches the current lowest-layer query nodes to execute their queries.

Step S9: after each subquery node has finished executing, the queried data set is imported into the temporary table and the time characteristic identifier of the corresponding query request instance is updated.

Step S10: after all lower-layer query nodes have finished executing, the upper-layer query node of those lower-layer query nodes is executed on the basis of their query results, aggregating the data, and the query locks of all the lower-layer nodes are released.

Step S11: it is judged whether the node is the root query node; if not, steps S8, S9 and S10 are repeated in turn until the data set queried by the root query node is obtained.

Step S12: the data set queried by that query node is output and returned to the application as the result, and the top-layer query lock is released.

It should be noted and understood that various modifications and improvements can be made to the invention described in detail above without departing from the spirit and scope of the invention as required by the appended claims. Therefore, the scope of the claimed technical solution is not limited by any specific exemplary teaching given.

Claims

1. A system for implementing network query caching, comprising a query binder, a query parser, a query descriptor manager, a query scheduler and a query cacher, wherein:

2. The system according to claim 1, characterized in that the query descriptor manager includes a query lock, which controls write operations on the query descriptor and updates to the time characteristic identifiers of the query request instances in the query descriptor.

3. The system according to claim 1, characterized in that the query parser uses query merging and query pushdown techniques to parse multiple virtual views into a single subquery node of the query tree.

4. The system according to claim 1, characterized in that the query scheduler performs concurrent scheduling.

5. A network query method, comprising the following steps:

6. The method according to claim 5, characterized in that step 2) further comprises:

7. The method according to claim 6, characterized in that step 22) comprises the following steps:

8. The method according to claim 5, characterized in that step 3) comprises the following steps:

31) dispatching the current lowest-layer query nodes to execute their queries;

9. The method according to claim 5, characterized in that,