CN106528687B - A kind of SPARQL enquiring and optimizing method of column family covering storage - Google Patents

Info

Publication number
CN106528687B
Authority
CN
China
Prior art keywords
query
column
queried
execution strategy
priority
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610939104.8A
Other languages
Chinese (zh)
Other versions
CN106528687A (en)
Inventor
刘光曹
张世栋
王琰
邱鹤庆
林黎鸣
陈升
赵光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XIAMEN GREAT POWER GEO INFORMATION TECHNOLOGY Co Ltd
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Original Assignee
XIAMEN GREAT POWER GEO INFORMATION TECHNOLOGY Co Ltd
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by XIAMEN GREAT POWER GEO INFORMATION TECHNOLOGY Co Ltd, State Grid Corp of China SGCC, State Grid Information and Telecommunication Co Ltd, Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd filed Critical XIAMEN GREAT POWER GEO INFORMATION TECHNOLOGY Co Ltd
Priority to CN201610939104.8A priority Critical patent/CN106528687B/en
Publication of CN106528687A publication Critical patent/CN106528687A/en
Application granted granted Critical
Publication of CN106528687B publication Critical patent/CN106528687B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a SPARQL query optimization method for column-family covering storage. For a star query, the join-first execution strategy is used, and the query is optimized into the union of column-family projections. For a chain query, cost estimation determines whether the join-first or the union-first execution strategy has the lower cost, and the lower-cost strategy is selected, which can significantly improve query efficiency.

Description

A SPARQL query optimization method for column-family covering storage
Technical field
The present invention relates to the field of RDF data query technology, and in particular to a SPARQL query optimization method for column-family covering storage.
Background art
The RDF data model is a graph-shaped model built from edges; its flexible structure suits schema-less or semi-structured data. Querying RDF data has become a pressing need, and the SPARQL query language has been adopted by the W3C as the standard query language for RDF data. As new data sources are added and the content of the data itself changes, the schema is constantly changing. For this reason, storage in an RDF database involves distributing the data across column families. These column families resemble one another and cover one another: several column families contain identical column names, which is referred to as column-family covering. In general, a predicate of the same name may appear in multiple column families, but the data in different column families is not redundant: all information describing a given subject appears in only one column family. In this scenario, the optimization of queries over data sets stored with column-family covering has to be considered. SPARQL query structures fall into three types: chain queries, star queries, and hybrid queries. Because a hybrid query is composed of star queries and chain queries, optimization mainly targets the first two types.
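The storage layout described above can be illustrated with a small in-memory sketch. The family names (`T1`, `T2`), the predicate columns, and the helper functions below are hypothetical and chosen only for illustration; the patent does not prescribe any particular data structure.

```python
from collections import defaultdict

# Hypothetical layout: family name -> {subject ID -> {predicate column: value}}.
column_families = {
    "T1": {"person1": {"firstName": "Ann", "surName": "Lee", "age": 30}},
    "T2": {"person2": {"firstName": "Bob", "surName": "Ray", "email": "bob@example.org"}},
}


def column_names(family):
    """Collect every column name used by any row of one column family."""
    names = set()
    for row in family.values():
        names.update(row)
    return names


def covered_columns(families):
    """Return the column names that occur in more than one family (the covering)."""
    owners = defaultdict(set)
    for fam_name, family in families.items():
        for col in column_names(family):
            owners[col].add(fam_name)
    return {col: sorted(fams) for col, fams in owners.items() if len(fams) > 1}


def subjects_are_disjoint(families):
    """Check that each subject is described in exactly one column family."""
    seen = set()
    for family in families.values():
        if not seen.isdisjoint(family):
            return False
        seen.update(family)
    return True
```

In this sketch, `firstName` and `surName` are covered by both families, while each subject still lives in a single family, which matches the non-redundancy property stated in the background.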
Summary of the invention
The technical problem to be solved by the present invention is to provide a SPARQL query optimization method for data sets stored with column-family covering.
To solve this technical problem, the technical solution adopted by the present invention is a SPARQL query optimization method for column-family covering storage, comprising:
For a star query, executing the query with the join-first execution strategy;
For a chain query, determining the cost of executing the query with the join-first execution strategy and with the union-first execution strategy respectively, and executing the query with the lower-cost strategy.
The beneficial effect of the present invention is that, for SPARQL queries over data sets stored with column-family covering, a separate optimization method is provided for each of the two cases, star queries and chain queries, which improves query efficiency.
Brief description of the drawings
Fig. 1 is a flow chart of the SPARQL query optimization method for column-family covering storage according to the present invention;
Fig. 2 is a schematic diagram of a star query according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a chain query according to an embodiment of the present invention.
Detailed description
To explain the technical content, objectives, and effects of the present invention in detail, the following description is given with reference to the embodiments and the accompanying drawings.
The key idea of the present invention is as follows: for a star query, the join-first execution strategy is used; for a chain query, it is first determined whether the join-first or the union-first execution strategy has the lower cost, and the lower-cost strategy is selected.
Referring to Fig. 1 to Fig. 3, a SPARQL query optimization method for column-family covering storage comprises:
For a star query, executing the query with the join-first execution strategy;
For a chain query, determining the cost of executing the query with the join-first execution strategy and with the union-first execution strategy respectively, and executing the query with the lower-cost strategy.
As can be seen from the above description, the beneficial effect of the present invention is that, for SPARQL queries over data sets stored with column-family covering, a separate optimization method is provided for each of the two cases, star queries and chain queries: a star query uses the join-first execution strategy, while for a chain query it is first determined whether the join-first or the union-first execution strategy has the lower cost and the lower-cost strategy is selected, which improves query efficiency.
Further, for a star query, before the query is executed with the join-first execution strategy, the method further comprises converting the star join into a filter over column families, to obtain the column families that contain all attributes of the star query.
As can be seen from the above description, filtering out the column families that do not contain all attributes of the star query before executing it avoids unnecessary scans during the query and improves query efficiency.
Further, executing a star query with the join-first execution strategy specifically comprises:
For the attributes to be queried, projecting the column families that contain all attributes of the star query;
If only one column family remains after filtering, the query result set is the projection of that column family;
If multiple column families remain after filtering, the query result set is the union of the projections of those column families.
Further, for a chain query, a cost estimation tool determines the cost of executing the query with the join-first execution strategy and with the union-first execution strategy respectively, and the lower-cost strategy is selected to execute the query.
Further, the cost estimation tool makes this determination based on whether an index exists: if the columns involved in the query have an index, the query is executed with the join-first execution strategy; if the columns involved in the query have no index, the query is executed with the union-first execution strategy.
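The index-based rule above can be written as a very small decision function. This is only a sketch: the function name, the strategy labels, and the choice to require an index on every queried column (rather than on any one of them) are assumptions, since the text does not say which reading is intended.

```python
from typing import Iterable, Set

JOIN_FIRST = "join-first"
UNION_FIRST = "union-first"


def choose_chain_strategy(query_columns: Iterable[str], indexed_columns: Set[str]) -> str:
    """Pick an execution strategy for a chain query from index availability.

    Assumption: join-first is chosen only when every queried column is indexed;
    otherwise the query falls back to union-first.
    """
    cols = list(query_columns)
    if cols and all(col in indexed_columns for col in cols):
        return JOIN_FIRST
    return UNION_FIRST
```

For example, `choose_chain_strategy(["Pred1", "Pred2"], {"Pred1"})` falls back to union-first under this reading because `Pred2` has no index.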
Further, executing a chain query with the join-first execution strategy specifically comprises:
From the column values of all column families, finding the combinations that together contain all attributes of the chain query, and stringing each combination into a group of joins ordered by the chain query relationship;
If only one combination is obtained, the query result set is the result of that join;
If multiple combinations are obtained, the query result set is the union of the join results.
Further, executing a chain query with the union-first execution strategy specifically comprises:
For each RDF subject-predicate-object triple, finding in all column families the columns involved in the chain query attributes, and taking the union of the non-empty column values of those columns;
Joining the union results according to the chain query relationship to obtain the query result set.
Embodiment
Referring to Fig. 1 to Fig. 3, embodiment one of the present invention is as follows. As shown in Fig. 1, a SPARQL query optimization method for column-family covering storage comprises:
For a star query, the query is executed with the join-first execution strategy and further optimized into a union of column-family projections. The implementation is as follows:
As shown in Fig. 2, the purpose of this kind of query is to retrieve information about directly associated nodes; here, the query obtains the values of foaf:person1 for the two attributes foaf:firstName and foaf:surName. In ?firstName and ?surName, the "?" indicates that a variable name follows. To simplify the description, a star query is written in simplified form as: {?a Pred1 ?v1} {?a Pred2 ?v2} {?a Pred3 ?v3}, etc.
The star join is converted into a filter over column families. Each column family that remains after filtering must satisfy the following condition: the column names it contains must include all attributes involved in the star query. Among all column families, each remaining column family T1, T2, T3, ... contains the relations Pred1, Pred2, Pred3, ....
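The filtering condition just described amounts to a subset test on column names. A minimal sketch follows; the function name and the dictionary-of-sets input format are assumptions made for illustration.

```python
def filter_families(family_columns, star_predicates):
    """Keep only the column families whose columns include every star-query attribute.

    family_columns: {family name: set of column names} (hypothetical format).
    star_predicates: the predicates Pred1, Pred2, ... used by the star query.
    """
    needed = set(star_predicates)
    return [name for name, cols in family_columns.items() if needed <= set(cols)]


# Example: T2 lacks Pred3, so it is filtered out and never scanned.
example_columns = {
    "T1": {"Pred1", "Pred2", "Pred3"},
    "T2": {"Pred1", "Pred2"},
    "T3": {"Pred1", "Pred2", "Pred3", "Pred4"},
}
remaining = filter_families(example_columns, ["Pred1", "Pred2", "Pred3"])
```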
Next, each column family that remains after filtering is projected onto the attributes to be queried: $\pi_{ID,Pred1,Pred2,\ldots}(T1)$, $\pi_{ID,Pred1,Pred2,\ldots}(T2)$, etc. On the one hand, this removes the columns of the column family that are unrelated to the queried attributes; on the other hand, it removes the rows whose queried attribute values are empty.
If only one column family T1 remains after filtering, the query result set is the projection of that column family, i.e. $\pi_{ID,Pred1,Pred2,\ldots}(T1)$;
If multiple column families T1, T2, T3, ... remain after filtering, the query result set is the union of the projections of these column families, i.e. $\pi_{ID,Pred1,Pred2,\ldots}(T1) \cup \pi_{ID,Pred1,Pred2,\ldots}(T2) \cup \pi_{ID,Pred1,Pred2,\ldots}(T3) \cup \cdots$.
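The projection-and-union step can be sketched over in-memory data as follows. The data layout and function names are hypothetical. Dropping a row when any queried attribute is empty is one reading of the text, and concatenating the projections stands in for the union because each subject is stored in only one column family.

```python
def project(family, predicates):
    """Projection onto ID plus the queried predicates, skipping rows with empty values."""
    rows = []
    for subject_id, row in family.items():
        values = [row.get(p) for p in predicates]
        if all(v is not None for v in values):  # assumption: any empty value drops the row
            rows.append((subject_id, *values))
    return rows


def star_query(families, predicates):
    """Filter families by column names, project each survivor, and union the results."""
    needed = set(predicates)
    result = []
    for family in families.values():
        cols = set()
        for row in family.values():
            cols.update(row)
        if needed <= cols:  # the family contains every queried attribute
            result.extend(project(family, predicates))
    return result


# Hypothetical data: T3 has no surName column, and p3 has an empty surName.
families = {
    "T1": {"p1": {"firstName": "Ann", "surName": "Lee", "age": 30},
           "p3": {"firstName": "Cy", "surName": None, "age": 5}},
    "T2": {"p2": {"firstName": "Bob", "surName": "Ray"}},
    "T3": {"p4": {"firstName": "Dee"}},
}
rows = star_query(families, ["firstName", "surName"])
```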
For a chain query, the cost of executing the query with the join-first execution strategy and with the union-first execution strategy is determined respectively, and the lower-cost strategy is selected to execute the query.
That is, for a chain query, it is determined whether join-first or union-first has the lower cost, and the lower-cost execution strategy is selected. One method of cost estimation is to check whether an index exists: if the columns involved in the query have an index, the query is executed with the join-first execution strategy; if the columns involved in the query have no index, the query is executed with the union-first execution strategy. The implementation is as follows:
Algorithm ChainOptimize(q, dd)
Input:
q -- a chain SPARQL query
dd -- the data dictionary of the database, including column-family information
Output: sql -- the optimized SQL query
ChainOptimize is the main function. From the input chain SPARQL query and the data dictionary information (including column-family information), it produces the optimized SQL query. Line 1 decomposes the chain query into multiple basic graph patterns (i.e. RDF subject-predicate-object triples); each pair of adjacent basic graph patterns is connected, and together they form the chain query. Lines 2 to 4 then obtain the column families corresponding to each basic graph pattern and store them in the variable colFamSet. Lines 5 to 7 are the key step: for each pair of connected patterns, they obtain the optimized SQL, with the concrete implementation in the AddSQL function. The main job of AddSQL is, for each join, to decide between join-first and union-first. Union-first is the default; if the estimated execution cost shows that join-first is more advantageous, the query is converted to the join-first form. The procedure is as follows:
Algorithm AddSQL(sql, curCols, nextCols, dd)
Input:
sql -- the SQL query before this join is optimized
curCols -- the column families corresponding to the current basic graph pattern
nextCols -- the column families corresponding to the next basic graph pattern
dd -- the data dictionary
Output: sql -- the SQL statement after this join is optimized
When estimating the execution cost, the function IndexExists first determines whether the column families corresponding to this pair of basic graph patterns have an index or an ordering. If they do, the JoinFirst function determines whether the execution cost of join-first is better than that of union-first; if so, the SQL query is rewritten in the join-first manner, and the AddJoin function joins the column-family union of the next graph pattern with the column-family union of the current graph pattern. Otherwise, as shown in line 6, the AddUnion method is called and the two graph patterns are joined in the union-first manner.
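The control flow of ChainOptimize and AddSQL described above can be sketched as follows. This is not the patent's listing, which is not reproduced in the text: instead of emitting SQL, the sketch records which strategy is chosen for each pair of adjacent graph patterns. The data dictionary layout, the bodies of `index_exists` and `join_first`, and the `max_combinations` threshold are all assumptions made for illustration.

```python
def index_exists(cur_cols, next_cols, dd):
    # Assumption: an index "exists" when every family on both sides is indexed.
    return all(fam in dd["indexed_families"] for fam in (*cur_cols, *next_cols))


def join_first(cur_cols, next_cols, dd):
    # Placeholder cost comparison (assumption): prefer join-first while the
    # number of family combinations to join stays small.
    return len(cur_cols) * len(next_cols) <= dd.get("max_combinations", 4)


def add_sql(plan, cur_cols, next_cols, dd):
    """Decide the strategy for one join; union-first is the default."""
    if index_exists(cur_cols, next_cols, dd) and join_first(cur_cols, next_cols, dd):
        strategy = "join-first"
    else:
        strategy = "union-first"
    plan.append((strategy, tuple(cur_cols), tuple(next_cols)))
    return plan


def chain_optimize(patterns, dd):
    """Look up the families of each graph pattern, then decide per adjacent pair."""
    col_fam_set = [dd["families_by_predicate"][pred] for _, pred, _ in patterns]
    plan = []
    for cur_cols, next_cols in zip(col_fam_set, col_fam_set[1:]):
        add_sql(plan, cur_cols, next_cols, dd)
    return plan


# Hypothetical data dictionary and chain query.
dd = {
    "families_by_predicate": {"Pred1": ["T1", "T2"], "Pred2": ["T2"], "Pred3": ["T3"]},
    "indexed_families": {"T1", "T2"},
}
patterns = [("?a", "Pred1", "?v1"), ("?v1", "Pred2", "?v2"), ("?v2", "Pred3", "?v3")]
plan = chain_optimize(patterns, dd)
```

Here the first join involves only indexed families and is planned join-first, while the second touches the unindexed family `T3` and keeps the union-first default.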
Fig. 3 is a schematic diagram of a chain query. A chain query focuses on finding information about other nodes that are indirectly associated with a node, for example the name of a partner of 'foaf:person1'. Because an RDF graph is a directed graph, an actual chain query may not appear as a single directed chain in the graph; it is more likely to consist of several chains with opposite directions. Edge direction has little effect on query conversion, the only difference being the order of the joins, so to simplify the description a chain query is written in simplified form as: {?a Pred1 ?v1} {?v1 Pred2 ?v2} {?v2 Pred3 ?v3}, etc.
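Using the simplified forms of star and chain queries, the shape of a query can be recognized from its triple patterns. The sketch below is an assumption-laden illustration: the function name and the tuple representation are hypothetical, and it only handles the simplified single-direction chain, not chains made of edges with opposite directions.

```python
def classify(patterns):
    """Classify a list of (subject, predicate, object) patterns as star, chain, or hybrid.

    star:  every pattern shares the same subject variable.
    chain: each pattern's object is the next pattern's subject (simplified form only).
    """
    subjects = {s for s, _, _ in patterns}
    if len(subjects) == 1:
        return "star"
    if all(prev[2] == cur[0] for prev, cur in zip(patterns, patterns[1:])):
        return "chain"
    return "hybrid"


star = [("?a", "Pred1", "?v1"), ("?a", "Pred2", "?v2"), ("?a", "Pred3", "?v3")]
chain = [("?a", "Pred1", "?v1"), ("?v1", "Pred2", "?v2"), ("?v2", "Pred3", "?v3")]
mixed = [("?a", "Pred1", "?v1"), ("?a", "Pred2", "?v2"), ("?v2", "Pred3", "?v3")]
```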
When cost estimation selects the join-first execution strategy for a chain query, it is implemented as follows: the various combinations are found in the column values of all column families, and each combination is strung into a group of joins in the order Pred1, Pred2, Pred3, ...: $\{?a\ \text{column values},\ ?v1\ \text{column values}\} \bowtie_{v1\ \text{equal}} \{?v1\ \text{column values},\ ?v2\ \text{column values}\} \bowtie_{v2\ \text{equal}} \{?v2\ \text{column values},\ ?v3\ \text{column values}\} \cdots$. If only one combination is obtained, the query result set is the result of this join; if multiple combinations are obtained, the query result set is the union of these join results.
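A minimal in-memory sketch of the join-first strategy is given below. It interprets a "combination" as one choice of column family per predicate, which is an assumption; the data, predicate names, and function names are likewise hypothetical.

```python
from itertools import product


def pairs(family, predicate):
    """(subject, value) pairs of one predicate column, skipping empty values."""
    return [(s, row[predicate]) for s, row in family.items()
            if row.get(predicate) is not None]


def join_first(families, predicates):
    """Join each combination of families in chain order, then union the join results."""
    candidates = [
        [fam for fam in families.values() if any(p in row for row in fam.values())]
        for p in predicates
    ]
    results = set()
    for combo in product(*candidates):  # one family per predicate
        paths = pairs(combo[0], predicates[0])
        for fam, pred in zip(combo[1:], predicates[1:]):
            step = pairs(fam, pred)
            # Join on equality between the last bound value and the next subject.
            paths = [path + (o,) for path in paths for s, o in step if path[-1] == s]
        results.update(paths)
    return results


# Hypothetical data for the chain {?a knows ?v1} {?v1 name ?v2}.
families = {
    "T1": {"person1": {"knows": "person2", "name": "Ann"}},
    "T2": {"person2": {"knows": "person3", "name": "Bob"},
           "person3": {"name": "Cy"}},
}
result = join_first(families, ["knows", "name"])
```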
When a chain query uses the union-first execution strategy, it is implemented as follows: for each RDF subject-predicate-object triple involved in the chain query, the columns involved are found in all column families and their non-empty column values are unioned. For example, the columns involved in {?a Pred1 ?v1} are found in all column families and their non-empty column values are unioned; likewise, the columns involved in {?v1 Pred2 ?v2} are found in all column families and their non-empty column values are unioned, and so on. The union results are then joined according to the association relationship of the chain query, i.e. in the form $\{?a,\ ?v1\} \bowtie_{v1\ \text{equal}} \{?v1,\ ?v2\} \bowtie_{v2\ \text{equal}} \{?v2,\ ?v3\} \cdots$, to obtain the query result set.
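The union-first strategy can be sketched in the same in-memory style: union each predicate's non-empty column values across all families first, then join the unions in chain order. The data layout and function names are hypothetical and serve only as an illustration.

```python
def union_column(families, predicate):
    """Union of the non-empty (subject, value) pairs of one predicate over all families."""
    rows = set()
    for family in families.values():
        for subject, row in family.items():
            value = row.get(predicate)
            if value is not None:
                rows.add((subject, value))
    return rows


def union_first(families, predicates):
    """Union per triple pattern first, then join the unions along the chain."""
    paths = set(union_column(families, predicates[0]))
    for pred in predicates[1:]:
        step = union_column(families, pred)
        # Join on equality between the last bound value and the next subject.
        paths = {path + (o,) for path in paths for s, o in step if path[-1] == s}
    return paths


# Hypothetical data for the chain {?a knows ?v1} {?v1 name ?v2}.
families = {
    "T1": {"person1": {"knows": "person2", "name": "Ann"}},
    "T2": {"person2": {"knows": "person3", "name": "Bob"},
           "person3": {"name": "Cy"}},
}
result = union_first(families, ["knows", "name"])
```

Compared with the join-first strategy, this performs one join per chain step regardless of how many column families cover each predicate, at the price of building the full unions first.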
In summary, in the SPARQL query optimization method for column-family covering storage provided by the present invention, a star query uses the join-first execution strategy and is optimized into the union of column-family projections; for a chain query, cost estimation determines whether the join-first or the union-first execution strategy has the lower cost, and the lower-cost strategy is selected, which can significantly improve query efficiency.
The above is only an embodiment of the present invention and does not limit the scope of the patent. Any equivalent transformation made using the content of the specification and drawings of the present invention, whether applied directly or indirectly in related technical fields, is likewise included in the scope of patent protection of the present invention.

Claims (2)

1. A SPARQL query optimization method for column-family covering storage, characterized by comprising:
For a star query, executing the query with the join-first execution strategy;
For a chain query, using a cost estimation tool to determine the cost of executing the query with the join-first execution strategy and with the union-first execution strategy respectively, and executing the query with the lower-cost strategy; the cost estimation tool makes this determination based on whether an index exists: if the columns involved in the query have an index, the query is executed with the join-first execution strategy; if the columns involved in the query have no index, the query is executed with the union-first execution strategy;
Executing a star query with the join-first execution strategy specifically comprises:
For the attributes to be queried, projecting the column families that contain all attributes of the star query;
If only one column family remains after filtering, the query result set is the projection of that column family;
If multiple column families remain after filtering, the query result set is the union of the projections of those column families;
Executing a chain query with the join-first execution strategy specifically comprises:
From the column values of all column families, finding the combinations that together contain all attributes of the chain query, and stringing each combination into a group of joins ordered by the chain query relationship;
If only one combination is obtained, the query result set is the result of that join;
If multiple combinations are obtained, the query result set is the union of the join results;
Executing a chain query with the union-first execution strategy specifically comprises:
For each RDF subject-predicate-object triple, finding in all column families the columns involved in the chain query attributes, and taking the union of the non-empty column values of those columns;
Joining the union results according to the chain query relationship to obtain the query result set.
2. The SPARQL query optimization method for column-family covering storage according to claim 1, characterized in that, for a star query, before the query is executed with the join-first execution strategy, the method further comprises converting the star join into a filter over column families, to obtain the column families that contain all attributes of the star query.
CN201610939104.8A 2016-10-25 2016-10-25 A kind of SPARQL enquiring and optimizing method of column family covering storage Active CN106528687B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610939104.8A CN106528687B (en) 2016-10-25 2016-10-25 A kind of SPARQL enquiring and optimizing method of column family covering storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610939104.8A CN106528687B (en) 2016-10-25 2016-10-25 A kind of SPARQL enquiring and optimizing method of column family covering storage

Publications (2)

Publication Number Publication Date
CN106528687A CN106528687A (en) 2017-03-22
CN106528687B true CN106528687B (en) 2019-07-16

Family

ID=58292948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610939104.8A Active CN106528687B (en) 2016-10-25 2016-10-25 A kind of SPARQL enquiring and optimizing method of column family covering storage

Country Status (1)

Country Link
CN (1) CN106528687B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8046354B2 (en) * 2004-09-30 2011-10-25 International Business Machines Corporation Method and apparatus for re-evaluating execution strategy for a database query
CN104462609A (en) * 2015-01-06 2015-03-25 福州大学 RDF data storage and query method combined with star figure coding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Efficiently joining group patterns in SPARQL queries; Maria-Esther Vidal et al.; ESWC: Proceedings of the 7th International Conference on The Semantic Web: Research and Applications; 20100603; 228-242
关联模型支持下的关联参考服务研究; 刘媛媛 et al.; 现代图书情报技术; 20121225 (No. 12); 15-20

Also Published As

Publication number Publication date
CN106528687A (en) 2017-03-22

Similar Documents

Publication Publication Date Title
CN106227800B (en) Storage method and management system for highly-associated big data
Shen et al. Discovering queries based on example tuples
De Virgilio et al. Converting relational to graph databases
US8166074B2 (en) Index data structure for a peer-to-peer network
US8326825B2 (en) Automated partitioning in parallel database systems
CN105630881B (en) A kind of date storage method and querying method of RDF
US20120084296A1 (en) Method and Apparatus for Searching a Hierarchical Database and an Unstructured Database with a Single Search Query
US9753960B1 (en) System, method, and computer program for dynamically generating a visual representation of a subset of a graph for display, based on search criteria
JP2005267612A (en) Improved query optimizer using implied predicates
US20080040317A1 (en) Decomposed query conditions
US8812492B2 (en) Automatic and dynamic design of cache groups
CN102955843B (en) Method for realizing multi-key finding of key value database
Qin et al. Scalable keyword search on large data streams
CN103631911A (en) OLAP query processing method based on array storage and vector processing
CN108052514A (en) A kind of blending space Indexing Mechanism for handling geographical text Skyline inquiries
Zhang et al. Towards efficient join processing over large RDF graph using mapreduce
CN104102699B (en) A kind of subgraph search method in the set of graphs that clusters
US7472130B2 (en) Select indexing in merged inverse query evaluations
CN110032676B (en) SPARQL query optimization method and system based on predicate association
CN106484815A (en) A kind of automatic identification optimization method for retrieving scene based on mass data class SQL
Cedeno et al. R2DF framework for ranked path queries over weighted RDF graphs
CN108241709A (en) A kind of data integrating method, device and system
CN106528687B (en) A kind of SPARQL enquiring and optimizing method of column family covering storage
Pitoura et al. Contextual Database Preferences.
CN115994146A (en) Hybrid data storage engine system, data storage method and access method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant