CN106599095A - Pruning method based on complete historical record - Google Patents

Info

Publication number
CN106599095A
Authority
CN
China
Prior art keywords
query
complete historical
historical
complete
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611056390.XA
Other languages
Chinese (zh)
Other versions
CN106599095B (en)
Inventor
陈海波
姚友阳
陈榕
臧斌宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201611056390.XA priority Critical patent/CN106599095B/en
Publication of CN106599095A publication Critical patent/CN106599095A/en
Application granted granted Critical
Publication of CN106599095B publication Critical patent/CN106599095B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a pruning method based on a complete historical record. The method comprises: a first step in which a client sends a query request and a server receives it; a second step in which the server parses the query request and decomposes the query statement into small steps for execution; a third step in which the query is executed small step by small step to obtain intermediate query results, and corresponding pruning operations are performed on them, the pruning operations comprising a simple pruning operation and a pruning operation based on the complete historical record; and a fourth step in which the pruned result and all historical results are added together to a new historical record table, which is passed to the next small query step so that pruning can continue. Compared with the prior art, the method removes useless intermediate results as early as possible according to the complete historical record, takes full account of the characteristics of a high-performance network (RDMA), and reduces communication overhead. Compared with a traditional single-step pruning method, it avoids the high-overhead final result merge operation, thereby greatly improving the performance of the query system.

Description

Pruning method based on complete historical record
Technical field
The present invention relates to a pruning method for graph queries, and more particularly to a pruning method based on a complete historical record.
Background
Graph-structured data is increasingly common in large-scale network applications. In particular, massive data sets exhibit free and rich associations, and strongly connected graph data is widely used in network applications of every kind. For example, commercial search engines including Google and Bing use RDF (Resource Description Framework) to explicitly express the content of web pages. For network applications that process massive graph data, the execution speed of users' online queries is a critical factor, and pruning possible results is one of the important means of reducing latency. An efficient pruning method rejects incorrect results earlier, reduces communication overhead, and improves the overall performance of the query system.
Remote direct memory access (RDMA) is a high-performance network communication technology that can directly access remote memory addresses, including direct read and write operations. Because RDMA can completely bypass the CPU of the target machine and needs no assistance from it, it exhibits low latency and high throughput, a large advantage over traditional network communication. A notable property of RDMA is that, up to a certain transfer size, its latency stays low and essentially constant, because small amounts of data do not occupy much network bandwidth.
When a system executes a user's query request, it usually produces many useless intermediate results. If these results are kept until the end and only then rejected, resources are heavily wasted and communication overhead is large, so existing systems generally use some specific pruning method to reject useless intermediate results. Existing RDF query systems usually combine single-step pruning with a final result merge operation to obtain the final result the user needs. In this approach each step sees only the results of the previous step, so useless results cannot be rejected completely, which brings extra communication overhead. In addition, because each step still carries useless results, all results must be gathered onto one machine at the end of execution for a merge operation, and this process easily becomes the performance bottleneck of the whole system.
Therefore, how to design an efficient pruning method that rejects useless results as early as possible, reduces communication overhead, avoids the time-consuming final result merge operation as far as possible, and thereby improves the overall performance of a distributed query system and speeds up user queries has become a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a pruning method based on a complete historical record that makes full use of the characteristics of a high-performance network, rejects useless intermediate results as early as possible, avoids the time-consuming final merge operation, and reduces the latency of user query requests.
The pruning method based on a complete historical record provided according to the present invention comprises:
Step 1: A client sends a query request, and a server receives the query request;
Step 2: The server parses the query request and decomposes the query statements in the query request into multiple steps for execution, wherein each of the multiple steps is referred to as a small step;
Step 3: The query is executed small step by small step to obtain intermediate query results, and a pruning operation is performed on the intermediate results to obtain the pruned result;
Step 4: The pruned result and all historical results are added together to a new historical record table, and the new historical record table is passed to the next small query step so that pruning can continue.
Preferably, step 1 includes: the client selects a server and sends the query request to it; the server listens for query requests, initializes the query-related data, clears the historical record table, and prepares to execute the query.
Preferably, step 2 includes: after receiving the query request, the server parses it; the query request comprises multiple query statements, and the server decomposes the query into multiple small steps for execution according to the different query statements.
Preferably, step 3 includes:
Step 3.1: According to the query statement of the small step, perform a matching operation on the data in the data set. If the complete historical record has no entries, perform data matching using the constant in the query statement to obtain the intermediate result. If the complete historical record is not empty, perform the matching operation on the data set using the value of the variable that appears both in the query statement and in the historical record to obtain the intermediate result;
Step 3.2: Perform a different pruning operation on the intermediate result produced by the small step, according to the following cases:
- When the query interface corresponding to the newly added intermediate result does not exist in the complete historical record, perform the pruning operation according to the constant in the small step being executed; intermediate results that do not satisfy the constant condition are rejected;
- When the query interface corresponding to the newly added intermediate result already has records in the complete historical record, prune according to whether the variable value in the historical record matches the variable value in the intermediate result. Specifically, for each record in the complete historical record corresponding to the newly added intermediate result, compare the variable value in the historical record with the corresponding variable value in the newly added intermediate result; reject the record if they are unequal, and retain it otherwise;
Wherein, a query interface refers to an unknown in the query statement whose corresponding value must be returned in the query result.
Preferably, step 4 includes:
Step 4.1: Add the pruned result to the complete historical record table. A new column is added to the table to represent the newly added query interface, and the number of entries in the table increases or decreases accordingly;
Step 4.2: The complete historical record table travels with the query to the next small-step query statement, where it is used for the next round of pruning. Depending on where the data involved in the next small step resides, one of the following transfer operations is performed:
- When the data involved in executing the next small step is on the local machine, the complete historical record table is passed locally and no network transfer is involved;
- When the data involved in executing the next small step is on a remote machine, the complete historical record table is sent along with the query request to the remote server, which continues execution.
Compared with the prior art, the present invention has the following beneficial effects:
1. The pruning method based on a complete historical record proposed by the present invention rejects useless intermediate results as early as possible according to the complete historical record and reduces communication overhead. Compared with the traditional single-step pruning method, it avoids the costly final result merge operation and therefore shows a considerable performance advantage.
2. The present invention takes full account of the characteristics of a high-performance network (RDMA) and uses them to reduce as far as possible the communication overhead of transferring the complete historical record, so that communication latency stays at a low level and the high network bandwidth is fully used.
3. The pruning method based on a complete historical record proposed by the present invention is widely applicable to distributed query systems. It schedules limited resources fully, reduces resource waste, reduces the latency of query requests as far as possible, and improves the performance of the whole query system.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawing:
Fig. 1 is a flow chart of the pruning method based on a complete historical record used by the present invention.
Detailed description
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any way. It should be noted that those of ordinary skill in the art can make a number of changes and improvements without departing from the inventive concept. All of these fall within the protection scope of the present invention.
The pruning method based on a complete historical record provided according to the present invention comprises the following steps:
Step 0: Multiple servers load the initial data in parallel, distribute it, and perform some initialization operations;
Step 1: A server receives the query request sent by a client;
Step 2: The server parses the query request and decomposes the query statement into several small steps (usually no more than 15) for execution;
Step 3: The query is executed small step by small step to obtain intermediate query results, and the corresponding pruning operation is performed on the intermediate results;
Step 4: The pruned result and all historical results are added together to a new historical record table, which is passed to the next small query step so that pruning can continue.
Step 1 includes: the client selects a lightly loaded server (judged by the number of requests the server is currently executing) and sends the query request to it. The server listens for requests, initializes the query-related data, clears the historical record result table, and makes the corresponding preparations for executing the query.
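The server selection in step 1 can be sketched as follows. This is a minimal illustration only: the `Server` class, its `in_flight` counter, and the `pick_server` function are hypothetical names introduced here, and the text does not say how the client learns each server's load.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    in_flight: int  # assumed load metric: requests currently being executed


def pick_server(servers):
    """Return the server that is executing the fewest requests."""
    if not servers:
        raise ValueError("no servers available")
    return min(servers, key=lambda s: s.in_flight)


servers = [Server("s0", 7), Server("s1", 2), Server("s2", 5)]
chosen = pick_server(servers)
print(chosen.name)
```

Ties are broken by list order here; a real client might randomize among equally loaded servers.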
Step 2 includes: after receiving the query request, the server parses it. A query request generally consists of multiple query statements, and the server decomposes it into multiple small execution steps according to the different query statements. Under the RDF data format a query request usually consists of multiple triples, so the small execution steps are divided by triple.
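As a rough illustration of dividing a query into small steps by triple, the sketch below splits a simplified query body into one triple pattern per small step. The textual format (patterns separated by " . ", variables prefixed with "?") and the names `TriplePattern` and `decompose` are assumptions made for this example, not something the patent specifies.

```python
from typing import List, NamedTuple


class TriplePattern(NamedTuple):
    subject: str
    predicate: str
    obj: str


def is_variable(term: str) -> bool:
    # Assumed convention: unknowns are written with a leading "?".
    return term.startswith("?")


def decompose(query_body: str) -> List[TriplePattern]:
    """Split a simplified query body into one small step per triple pattern."""
    steps = []
    for part in query_body.split(" . "):
        part = part.strip().rstrip(".").strip()
        if not part:
            continue
        terms = part.split()
        if len(terms) != 3:
            raise ValueError(f"expected three terms, got: {part!r}")
        steps.append(TriplePattern(*terms))
    return steps


steps = decompose("?x memberOf lab1 . ?x takes ?c . ?p teaches ?c .")
for step in steps:
    print(step)
```

Each resulting `TriplePattern` corresponds to one small step; the order in which the steps run is left as given in the query.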
Step 3 includes:
Step 3.1: According to the statement of the small query step, perform a matching operation on the data in the data set. If the complete historical record has no entries, which generally happens when the first small query step is executed, data matching simply starts from the constant in the query statement and yields the intermediate result. If the complete historical record is not empty, matching starts from the value of the variable that appears both in the query statement and in the historical record, and the matching operation on the data set yields the intermediate result; here the intermediate result is the value at the other end of the triple relative to that variable;
Step 3.2: Perform a different pruning operation on the intermediate result produced by the small step, depending on the case:
- When the query interface corresponding to the newly added result does not exist in the complete historical record, a simple pruning operation is performed according to the constant in the small query step, that is, pruning by checking whether the intermediate result obtained in step 3.1 equals the constant; results that do not satisfy the constant condition are rejected;
- When the query interface corresponding to the newly added result already has records in the complete historical record, pruning is performed according to whether the variable value in the historical record matches the new value. Specifically, for each record in the complete historical record corresponding to the newly produced result, the value of the variable in the historical record is compared for equality with the value of the variable in the new result, and the record is rejected if they are unequal;
Wherein, a query interface refers to an unknown in the query statement whose corresponding value the query system must return in the query result.
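The matching and pruning of step 3 can be sketched on a toy in-memory data set. Everything below is an illustrative assumption rather than the patented implementation: the history is modelled as a list of dictionaries (one per entry, keyed by query interface), the empty starting history is represented by a single entry with no columns so that the first small step falls through to constant matching, and matching scans all triples instead of using an index.

```python
def is_var(term):
    # Assumed convention: query interfaces (unknowns) start with "?".
    return term.startswith("?")


def run_small_step(triples, pattern, history):
    """Execute one small step and return the pruned, extended history."""
    s_term, pred, o_term = pattern
    edges = [(s, o) for (s, p, o) in triples if p == pred]
    survivors = []
    for row in history:
        for s, o in edges:
            candidate = dict(row)
            keep = True
            for term, value in ((s_term, s), (o_term, o)):
                if not is_var(term):
                    keep = (term == value)             # simple pruning by constant
                elif term in candidate:
                    keep = (candidate[term] == value)  # pruning by complete history
                else:
                    candidate[term] = value            # new query interface (new column)
                if not keep:
                    break
            if keep:
                survivors.append(candidate)
    return survivors


TRIPLES = [
    ("alice", "memberOf", "lab1"), ("carol", "memberOf", "lab1"),
    ("alice", "takes", "db"), ("carol", "takes", "os"),
    ("bob", "teaches", "db"), ("dave", "teaches", "os"),
    ("alice", "advisor", "bob"), ("carol", "advisor", "erin"),
]
STEPS = [
    ("?x", "memberOf", "lab1"),
    ("?x", "takes", "?c"),
    ("?p", "teaches", "?c"),
    ("?x", "advisor", "?p"),  # both ends already in the history: a cycle
]

history = [{}]  # one entry with no columns stands for the empty starting history
for pattern in STEPS:
    history = run_small_step(TRIPLES, pattern, history)
print(history)
```

In the last small step both `?x` and `?p` already have columns in the history, so the entry for carol is rejected by comparing against the recorded values, which is the second pruning case described above.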
Step 4 includes:
Step 4.1: Add the new pruned result to the complete historical record table. A new column must be added to the table to represent the newly added query interface, and the number of entries in the table increases or decreases accordingly (pruning may reduce the number of intermediate results); here the number of entries means the number of rows in the complete historical record table;
Step 4.2: After the new result has been added to the complete historical record table, the complete historical record travels with the query to the next small-step query statement, where it is used for the next round of pruning. Depending on where the data involved in the next small step resides, a different transfer operation is performed:
- When the data involved in executing the next small step is on the local machine, the complete historical record only needs to be passed locally and no network transfer is involved;
- When the data involved in executing the next small step is on a remote machine, the complete historical record must be sent along with the query sub-request to the remote server, which continues execution;
More specifically, transferring the complete historical record in this step exploits the characteristics of the high-performance network (RDMA). A notable feature of RDMA communication is that when the transfer size is small (for example, less than 2000 bytes), the transfer latency stays low and essentially constant. The present invention uses this property to transfer the complete historical record, whose data volume is small, achieving high transfer efficiency and low transfer latency. The data volume is small because the number of query steps and query interfaces in an RDF query is usually small and the historical records are generally converted to numeric IDs. If the next small step executes locally, the complete historical record is simply passed locally, which avoids communication altogether.
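The transfer decision in step 4.2 can be sketched as a small planning function. The 2000-byte figure is the example quoted in the text; the 8-byte ID width and the names `history_size_bytes` and `plan_transfer` are assumptions for illustration, and no real RDMA library is called here.

```python
ID_BYTES = 8                 # assumption: each value is stored as a 64-bit numeric ID
SMALL_TRANSFER_BYTES = 2000  # example threshold quoted in the text


def history_size_bytes(num_rows, num_cols, id_bytes=ID_BYTES):
    """Approximate payload size of a history table holding only numeric IDs."""
    return num_rows * num_cols * id_bytes


def plan_transfer(current_server, target_server, num_rows, num_cols):
    """Decide how the history table reaches the server running the next small step."""
    if current_server == target_server:
        # Data for the next small step is local: pass the table in memory.
        return {"mode": "local", "bytes_on_wire": 0, "small": True}
    size = history_size_bytes(num_rows, num_cols)
    return {
        "mode": "remote",
        "bytes_on_wire": size,
        "small": size < SMALL_TRANSFER_BYTES,
    }


print(plan_transfer("s0", "s0", 50, 3))
print(plan_transfer("s0", "s1", 50, 3))
print(plan_transfer("s0", "s1", 100, 3))
```

With these assumed sizes, a table of 50 entries and 3 query interfaces stays under the example threshold, while 100 entries does not; how a system should react to a table above the threshold is not stated in the text.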
To implement the pruning method based on a complete historical record, the present invention stores the complete historical record in a dynamic table structure, referred to as the complete historical record table. The table consists of columns and rows: columns represent the query interfaces contained in the user's query request, and rows store the entries of the historical results. The table changes dynamically during the query, that is, its numbers of rows and columns may grow or shrink, because the query may add new results (for example, when a small query step is executed) and may also prune results (when the query statements form a cycle). Storing the complete historical record in such a dynamic table structure is concise and convenient, and keeps insertion and deletion of rows and columns on the table efficient.
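A minimal sketch of such a dynamic table is shown below, assuming rows are stored as tuples and columns as a list of query interface names. The class name `HistoryTable` and its `extend` and `prune` methods are hypothetical; the patent does not describe the concrete layout.

```python
class HistoryTable:
    """Toy complete historical record table: columns are query interfaces, rows are entries."""

    def __init__(self):
        self.columns = []
        self.rows = [()]  # a single empty entry stands for the cleared table

    def extend(self, column, expand):
        """Add a column; expand(bound) returns the new values for one entry."""
        if column in self.columns:
            raise ValueError(f"column {column!r} already exists")
        new_rows = []
        for row in self.rows:
            bound = dict(zip(self.columns, row))
            for value in expand(bound):
                new_rows.append(row + (value,))
        self.columns.append(column)
        self.rows = new_rows  # row count may grow or shrink

    def prune(self, keep):
        """Drop every entry for which keep(bound) is false."""
        self.rows = [r for r in self.rows if keep(dict(zip(self.columns, r)))]


table = HistoryTable()
table.extend("?x", lambda bound: ["alice", "carol"])
table.extend("?c", lambda bound: {"alice": ["db", "os"], "carol": []}[bound["?x"]])
table.prune(lambda bound: bound["?c"] == "db")
print(table.columns, table.rows)
```

Adding `?c` both grows the table (alice gets two entries) and shrinks it (carol has no match and disappears), which mirrors the dynamic row count described above.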
The present invention uses a pruning method based on a complete historical record rather than traditional single-step pruning, mainly because the traditional pruning method often causes considerable extra overhead. Traditional single-step pruning has the following problems:
(1) Higher communication overhead. In single-step pruning, redundant useless intermediate results are often kept until the end, so many intermediate results are carried along only to be rejected, which wastes communication;
(2) A time-consuming final result merge operation. Because single-step pruning can judge only by the results of the previous step, it cannot reject all useless results. After the query statements have been executed, some results still do not satisfy the final requirement, so all results must be gathered onto one machine for a final result merge operation, which may become the performance bottleneck of the whole system.
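The two problems listed above can be illustrated with a toy comparison. The data, the query shape, and the function names `single_step` and `complete_history` are invented for this sketch, and the single-step variant is one plausible reading of "each step sees only the results of the previous step", not a reproduction of any particular system.

```python
TRIPLES = [
    ("alice", "memberOf", "lab1"), ("carol", "memberOf", "lab1"),
    ("alice", "takes", "db"), ("carol", "takes", "os"),
    ("bob", "teaches", "db"), ("dave", "teaches", "os"),
    ("alice", "advisor", "bob"), ("carol", "advisor", "bob"),
]


def edges(triples, pred):
    return [(s, o) for s, p, o in triples if p == pred]


def single_step(triples):
    """Each step filters only on the values produced by the previous step."""
    members = {s for s, o in edges(triples, "memberOf") if o == "lab1"}
    takes = [(x, c) for x, c in edges(triples, "takes") if x in members]
    courses = {c for _, c in takes}
    teaches = [(p, c) for p, c in edges(triples, "teaches") if c in courses]
    teachers = {p for p, _ in teaches}
    advisor = [(x, p) for x, p in edges(triples, "advisor") if p in teachers]
    carried = len(takes) + len(teaches) + len(advisor)
    # Final merge: join the per-step results to discard inconsistent ones.
    advisor_set = set(advisor)
    merged = [(x, c, p) for x, c in takes for p, c2 in teaches
              if c == c2 and (x, p) in advisor_set]
    return carried, merged


def complete_history(triples):
    """Each step extends or prunes full entries, so no final merge is needed."""
    rows = [(s,) for s, o in edges(triples, "memberOf") if o == "lab1"]
    rows = [(x, c) for (x,) in rows for x2, c in edges(triples, "takes") if x2 == x]
    rows = [(x, c, p) for x, c in rows for p, c2 in edges(triples, "teaches") if c2 == c]
    advisor_set = set(edges(triples, "advisor"))
    rows = [(x, c, p) for x, c, p in rows if (x, p) in advisor_set]
    return rows


carried, merged = single_step(TRIPLES)
print(carried, merged)
print(complete_history(TRIPLES))
```

In this example the single-step variant carries six per-step results to the end and still needs the join to discard the entries for carol, whereas the complete-history variant finishes with exactly the one valid entry.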
The pruning method based on a complete historical record adopted by the present invention has the following advantages over the traditional pruning method:
(1) It effectively avoids the time-consuming final result merge operation of traditional pruning. By passing the complete historical record, all useless results can be rejected during execution, with no need to wait until the end of execution to merge results; the historical record already contains all the results the user needs;
(2) It uses RDMA for efficient communication to reduce the overhead of transferring the complete historical record. An RDMA-friendly communication pattern uses network bandwidth effectively, reduces transfer latency, and avoids the resource waste of the traditional method.
In summary, the pruning method based on a complete historical record proposed by the present invention rejects useless results as early as possible, saves network bandwidth, and makes full use of the characteristics of high-performance network transfer to keep communication latency low. Finally, the present invention avoids the very time-consuming final result merge that traditional pruning entails, and can therefore make the greatest possible use of limited resources and improve the overall performance of the query system.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to the particular embodiments described; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the invention. Where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another arbitrarily.

Claims (5)

1. A pruning method based on a complete historical record, characterized by comprising:
Step 1: A client sends a query request, and a server receives the query request;
Step 2: The server parses the query request and decomposes the query statements in the query request into multiple steps for execution, wherein each of the multiple steps is referred to as a small step;
Step 3: The query is executed small step by small step to obtain intermediate query results, and a pruning operation is performed on the intermediate results to obtain the pruned result;
Step 4: The pruned result and all historical results are added together to a new historical record table, and the new historical record table is passed to the next small query step so that pruning can continue.
2. The pruning method based on a complete historical record according to claim 1, characterized in that step 1 includes: the client selects a server and sends the query request to it; the server listens for query requests, initializes the query-related data, clears the historical record table, and prepares to execute the query.
3. The pruning method based on a complete historical record according to claim 1, characterized in that step 2 includes: after receiving the query request, the server parses it; the query request comprises multiple query statements, and the server decomposes the query into multiple small steps for execution according to the different query statements.
4. The pruning method based on a complete historical record according to claim 1, characterized in that step 3 includes:
Step 3.1: According to the query statement of the small step, perform a matching operation on the data in the data set. If the complete historical record has no entries, perform data matching using the constant in the query statement to obtain the intermediate result. If the complete historical record is not empty, perform the matching operation on the data set using the value of the variable that appears both in the query statement and in the historical record to obtain the intermediate result;
Step 3.2: Perform a different pruning operation on the intermediate result produced by the small step, according to the following cases:
- When the query interface corresponding to the newly added intermediate result does not exist in the complete historical record, perform the pruning operation according to the constant in the small step being executed; intermediate results that do not satisfy the constant condition are rejected;
- When the query interface corresponding to the newly added intermediate result already has records in the complete historical record, prune according to whether the variable value in the historical record matches the variable value in the intermediate result. Specifically, for each record in the complete historical record corresponding to the newly added intermediate result, compare the variable value in the historical record with the corresponding variable value in the newly added intermediate result; reject the record if they are unequal, and retain it otherwise;
Wherein, a query interface refers to an unknown in the query statement whose corresponding value must be returned in the query result.
5. The pruning method based on a complete historical record according to claim 1, characterized in that step 4 includes:
Step 4.1: Add the pruned result to the complete historical record table. A new column is added to the table to represent the newly added query interface, and the number of entries in the table increases or decreases accordingly;
Step 4.2: The complete historical record table travels with the query to the next small-step query statement, where it is used for the next round of pruning. Depending on where the data involved in the next small step resides, one of the following transfer operations is performed:
- When the data involved in executing the next small step is on the local machine, the complete historical record table is passed locally and no network transfer is involved;
- When the data involved in executing the next small step is on a remote machine, the complete historical record table is sent along with the query request to the remote server, which continues execution.
CN201611056390.XA 2016-11-24 2016-11-24 Branch reduction method based on complete historical record Active CN106599095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611056390.XA CN106599095B (en) 2016-11-24 2016-11-24 Branch reduction method based on complete historical record

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611056390.XA CN106599095B (en) 2016-11-24 2016-11-24 Branch reduction method based on complete historical record

Publications (2)

Publication Number Publication Date
CN106599095A true CN106599095A (en) 2017-04-26
CN106599095B CN106599095B (en) 2020-07-14

Family

ID=58591987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611056390.XA Active CN106599095B (en) 2016-11-24 2016-11-24 Branch reduction method based on complete historical record

Country Status (1)

Country Link
CN (1) CN106599095B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491274A (en) * 2018-04-02 2018-09-04 深圳市华傲数据技术有限公司 Optimization method, device, storage medium and the equipment of distributed data management

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254001A (en) * 2011-07-14 2011-11-23 青岛海信网络科技股份有限公司 Efficient data management method and system
CN102546247A (en) * 2011-12-29 2012-07-04 华中科技大学 Massive data continuous analysis system suitable for stream processing
CN103455556A (en) * 2013-08-08 2013-12-18 成都市欧冠信息技术有限责任公司 Intelligent storage unit data clipping process
CN103593435A (en) * 2013-11-12 2014-02-19 河海大学 Approximate treatment system and method for uncertain data PT-TopK query
US20140143281A1 (en) * 2012-11-20 2014-05-22 International Business Machines Corporation Scalable Summarization of Data Graphs

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LEI ZOU et al.: "gStore: Answering SPARQL Queries via Subgraph Matching", 《PROCEEDINGS OF THE VLDB ENDOWMENT》 *

Also Published As

Publication number Publication date
CN106599095B (en) 2020-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant