CN105630881B - A data storage method and query method for RDF - Google Patents
- Publication number
- CN105630881B
- Authority
- CN
- China
- Prior art keywords
- storage
- data
- rdf
- triple
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/81—Indexing, e.g. XML tags; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/83—Querying
- G06F16/835—Query processing
- G06F16/8373—Query execution
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a data storage method and a query method for RDF. An entity-oriented storage structure and a storage mapping are designed for RDF data; the URIs and literals of the RDF data are converted into 64-bit binary data and stored according to the designed storage structure. In the query method, the SPARQL query statement is parsed and converted, the cost of each individual query triple in the SPARQL statement is estimated from the analysis results for the entire data set and the join relationships between the triples, and a minimum-cost query flow is finally generated. The invention can greatly increase the speed of comparisons between data and reduce storage space; compared with the traditional approach of directly converting SPARQL into SQL for querying, it significantly improves query efficiency. It can be used in fields such as Web data management and Web semantic retrieval.
Description
Technical field
The invention belongs to the technical field of Web data management, and in particular relates to an RDF data storage method and query method that reduce the storage space of RDF data and improve the efficiency of SPARQL queries.
Background
RDF (Resource Description Framework) is a framework proposed by the World Wide Web Consortium (W3C) for describing information on the World Wide Web; it provides an information description standard for the various applications on the Web. RDF describes resources on the Web as triples of subject S (Subject), predicate P (Predicate) and object O (Object). The subject usually uses a Uniform Resource Identifier (URI) to denote an information entity (or concept) on the Web, the predicate describes an attribute of the entity, and the object is the corresponding attribute value. This form of expression allows RDF to represent any identified information on the Web and lets that information be exchanged between applications without losing its semantics. RDF has therefore become the standard for describing semantic data and is widely used for metadata description, ontologies and the Semantic Web. As the volume of Semantic Web data keeps growing, building systems that store and query these data efficiently has become an important aspect of making Semantic Web applications widespread. Because RDF is the foundation for describing Semantic Web data, efficient storage and querying of RDF data have become a research hotspot. At present there are three main ways of storing and optimizing RDF data.
First, storage based on a relational database
Since RDF data can be regarded as a set of <Subject, Predicate, Object> triples, the most natural approach is to store the data directly in a triple table. Many relational RDF storage systems therefore use a relational database directly and design a triple table or a similar schema to store RDF data. The steps of this approach are: (1) parse the RDF data into triples; (2) encode the URIs in the triples with the MD5 (Message Digest Algorithm 5) hash and take the first 64 bits of the MD5 hash as the resource identifier; (3) store the data in the relational database in a three-column table and build the related indexes. However, to run a SPARQL query with this approach, the SPARQL query must be converted into Structured Query Language (SQL), which requires several layers of conversion. Because RDF data and relational data differ greatly, storing RDF data in relational tables also requires mapping between tables. Both space utilization and query efficiency are therefore reduced.
Second, storage based on local binary files
RDF documents can be stored in files in a given format; on the Semantic Web, a large number of RDF documents exist in RDF/XML form. RDF data differ considerably from relational data in structure, and their description syntax is much more complex than that of a relational database, but describing resources with RDF offers greater flexibility. Storing RDF documents in files on disk can achieve better storage efficiency while still answering queries quickly. Several disk-based storage designs exist, and they generally rely on the B-tree, B+ tree and hash table techniques commonly used in databases. However, file-based storage has a relatively high development cost, and because RDF is the foundation for describing Semantic Web data, a large amount of additional work is needed if querying and reasoning over the data must be supported on top of the basic storage structure.
Third, memory-based storage
As hardware technology advances, memory is becoming cheaper and larger, and building memory-based RDF storage systems has become a research hotspot in recent years. Memory offers very fast access, so data can be operated on in real time and disk I/O overhead is avoided; a well-designed in-memory storage structure can further improve the efficiency of querying and analysis. However, this approach is not suitable for storing large-scale RDF data, and existing schemes such as BRAHMS and BitMat do not support direct SPARQL queries. Memory-based RDF storage structures are therefore still being studied and improved.
Summary of the invention
The object of the invention is to overcome the shortcomings of the prior art described above and to provide, for RDF learning resources, an RDF data storage method that makes comparisons between data fast and reduces storage space.
The invention also provides an RDF data query method that matches the above storage method and enables fast querying, so as to improve the retrieval efficiency of RDF learning resources.
To achieve the above objects, the invention adopts the following technical solution:
The RDF data storage method of the invention comprises the following steps:
(1) Design an entity-oriented storage structure for the RDF data
(1.1) In an entity-oriented manner, the data are stored in a relational database table of n rows and k columns, where k is the average number of predicates over all subjects in the RDF data and n is the total number of rows, line, needed by all subjects. When the number of predicates of a single subject satisfies sum ≤ k, the required number of rows is line = 1; when sum > k, the subject is stored in multiple rows and the required number of rows is line = (sum/k) + 1;
(1.2) After k is determined, each predicate is converted into a column subscript by the predicate mapping algorithm, giving a table structure of n rows and k columns;
The predicate in step (1.2) is converted into a column subscript as follows:
(1.2.1) The column subscript is computed with the predicate mapping algorithm, whose formula is:
In the formula, h1, h2, …, hj are j hash functions and i is the column subscript;
(1.2.2) If no free subscript has been found after all j hash functions have been computed, a new row is opened and the data are stored at the subscript computed by h1.
(2) Design the storage mapping for the RDF data
A hash algorithm converts the URIs and literals of the RDF data into 64-bit binary data: a URI takes the high 64 bits of the hash value and a literal takes the low 64 bits. The converted binary data are stored in a hash index table, and the rows of the hash index table are sorted in ascending order so that lookups can map and convert quickly with a binary search algorithm;
(3) Store the RDF data
After the RDF data have been mapped and converted by the method of step (2), they are stored for the first time in the table structure of step (1). The data stored in the table structure are then analyzed and an analysis table S is created, recording the number of triples contained in each Subject and each Object, together with the 20 most frequent URIs and the 20 most frequent literals and their frequencies. Then, again using the table structure of step (1) but with the Object as the storage entity, the data mapped and converted by step (2) are stored a second time, which completes the storage of the RDF data.
An RDF data query method matching the above RDF data storage method comprises the following steps:
(a.1) Extraction and conversion of variables
The triple basic graph patterns in the SPARQL query statement are decomposed and the number of variables in the query statement, count, is determined. The URIs and literals in the query statement are each converted into 64-bit binary data using the mapping of step (2) of the storage method, and the variables are assigned the values -1 to -count;
(a.2) Conversion of basic graph patterns
According to the decomposition of the triple basic graph patterns in step (a.1), each basic graph pattern is converted into a triple query node structure, where the triple query node structure is:
Triple query node structure
{
Id of the node;
Id of the subject;
Id of the predicate;
Id of the object;
Flag of the storage mode;
}
The storage mode flag selects the first storage or the second storage of step (3) of the RDF data storage method;
For URIs and literals, the Ids of the subject, predicate and object are the respective 64-bit binary data; for variables, the Ids of the subject, predicate and object are the assigned values;
(a.3) Representation of query join operations
The triples decomposed from the basic graph patterns in step (a.1) are compared with one another. For triples that share a variable, a join relationship is established using the node Id of the step (a.2) structure as the unique identifier, and the join relationship is converted into a join operation edge structure, where the join operation edge structure is:
Join operation edge structure
{
Id of the node of the starting triple,
Id of the node of the ending triple,
Id of the shared variable
};
(a.4) Compute the query cost of each query
Based on the triple query node structures obtained in step (a.2), a cost analysis is performed on each join operation edge structure obtained in step (a.3) using the cost algorithm, giving the cost value c of the join operation edge structure. The formula of the cost algorithm is:
TMC(t,m,S)→c
where t is the triple to be queried; m is the first storage or the second storage of step (3) of the RDF data storage method; and S is the analysis table;
(a.5) Generation of the query plan
The cost values c of all join operation edge structures obtained in step (a.4) are sorted in ascending order, giving a node sequence ordered by cost value. The node with the smallest c in the sequence is chosen as the start node, and the next nodes in the sequence are then chosen in turn; if the variables in a node have not yet been queried, a join query is performed, until the variables in all nodes have been queried, which completes the query of the statement.
After step (a.5), the method further includes step (a.6), establishing a caching mechanism, specifically: a hash operation is applied to the set of triple query node structures obtained in step (a.2) for the query statement entered by the user, giving a hash result value. If the value is in the cache list, the cached result is fetched directly and returned to the user; otherwise, steps (a.3) to (a.5) are carried out, the result is stored on the hard disk, and the corresponding address identifier and the hash result value are stored in the cache list.
The RDF data storage method and query method of the invention optimize the storage structure of the data and optimize SPARQL queries for that structure, providing a method for fast retrieval and querying of RDF-based learning resources. Compared with the prior art, the invention has the following advantages:
(1) 64-bit binary data replace the original URIs and literals in storage, which greatly increases the speed of comparisons between data and reduces storage space. URIs and literals take the high 64 bits and the low 64 bits of the hash value respectively, which distinguishes a URI from a literal with the same character string. The hash index records are stored in sorted order, so that a required record can be located quickly by binary search.
(2) The RDF data are stored in an entity-oriented manner in two ways at once: with the subject (Subject) as the entity and with the object (Object) as the entity. The former gives efficient queries from the subject (Subject) to the predicate (Predicate) and avoids the large number of join operations that conventional storage needs at query time; the latter gives efficient queries from the predicate (Predicate) to the subject (Subject).
(3) The SPARQL query statement is parsed and converted; the cost of each individual query triple in the SPARQL statement is estimated from the analysis results for the entire data set and the join relationships between the triples, and a minimum-cost query flow is generated. Compared with the traditional approach of directly converting SPARQL into SQL for querying, this significantly improves query efficiency.
(4) A caching mechanism is added to the query process: frequently queried result sets are cached, and a cache list is kept in memory in which each row contains a hash result value and an address identifier, improving query efficiency.
(5) The data storage model and query optimization scheme proposed by the invention can be extended to fields such as Web data management and Web semantic retrieval, and to the storage and retrieval of other RDF resource data.
Brief description of the drawings
Fig. 1 is a schematic diagram of the analysis and conversion of the SPARQL query in step (a.2) of the embodiment.
Fig. 2 illustrates the query tree generated from the SPARQL query in step (a.3) of the embodiment.
Fig. 3 is a schematic diagram of the cache model in step (a.6) of the embodiment.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and an embodiment.
In this embodiment, the RDF data storage method is implemented by the following steps:
(1) Design an entity-oriented storage structure for the RDF data
The storage structure for RDF data is entity-oriented: the data are stored in a relational database table of n rows and k columns, where k is the average number of predicates over all subjects in the RDF data and n is the total number of rows, line, needed by all subjects.
(1.1) Determine the number of columns k and the required number of rows n of the table structure
When the number of predicates (Predicate) of a single subject (Subject) satisfies sum ≤ k, the required number of rows is line = 1; when sum > k, multiple row tuples are needed and the required number of rows is line = (sum/k) + 1;
For example, given the following data:
(Charles Flint,born,1850)
(Charles Flint,died,1934)
(Charles Flint,founder,IBM)
(Larry Page,born,1973)
(Larry Page,founder,Google)
(Larry Page,board,Google)
(Larry Page,home,Palo Alto)
(Android,developer,Google)
(Android,version,4.1)
(Android,kernel,Linux)
(Android,preceded,4.0)
(Android,graphics,OpenGL)
The storage form is shown in Table 1:
Table 1. Storage table with the Subject as the entity
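The two rules of step (1.1) can be checked against the example data with a short Python sketch. The function names are hypothetical, the average k is rounded down and the division in (sum/k)+1 is read as integer division; both readings are assumptions, since the text does not state them.

```python
from collections import Counter

# Example triples from the text (subject, predicate, object)
TRIPLES = [
    ("Charles Flint", "born", "1850"),
    ("Charles Flint", "died", "1934"),
    ("Charles Flint", "founder", "IBM"),
    ("Larry Page", "born", "1973"),
    ("Larry Page", "founder", "Google"),
    ("Larry Page", "board", "Google"),
    ("Larry Page", "home", "Palo Alto"),
    ("Android", "developer", "Google"),
    ("Android", "version", "4.1"),
    ("Android", "kernel", "Linux"),
    ("Android", "preceded", "4.0"),
    ("Android", "graphics", "OpenGL"),
]


def column_count(triples):
    """k: average number of predicates per subject (rounded down, an assumption)."""
    per_subject = Counter(s for s, _, _ in triples)
    return sum(per_subject.values()) // len(per_subject)


def rows_needed(pred_count, k):
    """line = 1 if sum <= k, otherwise (sum / k) + 1 with integer division (assumed)."""
    return 1 if pred_count <= k else pred_count // k + 1


def table_shape(triples):
    """Return (n, k): total rows needed by all subjects and the number of columns."""
    k = column_count(triples)
    per_subject = Counter(s for s, _, _ in triples)
    n = sum(rows_needed(count, k) for count in per_subject.values())
    return n, k
```

For the twelve example triples this gives k = 4 and n = 4: Charles Flint and Larry Page each fit in one row, while Android, with five predicates, needs two.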
(1.2) Determine the subscript i at which a predicate (Predicate) is stored
After k is determined, each predicate is converted into a column subscript by the predicate mapping algorithm. When several predicates of the same entity are mapped to the same subscript, this is called a conflict. Several hash functions are defined so that the columns are used as fully as possible and conflicts are avoided; if a conflict remains after all the hash functions have been computed, an additional row tuple is added for that Subject. The predicate mapping function is:
In the formula, h1, h2, …, hj are j hash functions and i is the column subscript.
If no free subscript has been found after all j hash functions have been computed, a new row is opened and the data are stored at the subscript computed by h1.
With reference to Table 1, consider the triples whose Subject is Android. Assume the triples are inserted into the database one by one and j is set to 2, so that the hash functions are h1 and h2. The process of computing the subscript of each predicate is shown in Table 2:
Table 2. Process of computing the predicate subscripts
developer: h1 gives subscript 1; subscript 1 is empty at this point, so the predicate is placed there directly.
version: placed at subscript 2 in the same way.
kernel: h1 gives subscript 1, which is no longer free, so a conflict occurs; h2 is then computed and gives subscript 3, where the predicate is placed.
preceded: h1 gives subscript k, where the predicate is placed.
graphics: h1 and h2 give subscripts 3 and 2, both of which conflict, so a new row is created and the predicate is placed in pred3.
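The Android walkthrough can be replayed with a minimal Python sketch. The two lookup tables below are hypothetical stand-ins for h1 and h2, chosen only so that they return the subscripts quoted above (with k = 4); real hash functions would compute them. Trying only the subject's most recent row is also an assumption.

```python
K = 4  # number of predicate columns in the example

# Hypothetical stand-ins for h1 and h2. Values not quoted in the text
# (h2 for developer, version, preceded) are arbitrary and never consulted.
H1 = {"developer": 1, "version": 2, "kernel": 1, "preceded": K, "graphics": 3}
H2 = {"developer": 2, "version": 3, "kernel": 3, "preceded": 1, "graphics": 2}
HASHES = [H1.__getitem__, H2.__getitem__]


def insert_predicate(rows, pred, value):
    """Place (pred, value) in the subject's rows; return (row index, column subscript)."""
    row = rows[-1]  # assumption: only the most recent row is probed
    for h in HASHES:
        col = h(pred)
        if col not in row:  # free column: store here
            row[col] = (pred, value)
            return len(rows) - 1, col
    # Every candidate column is occupied: open a new row, use h1's subscript
    col = HASHES[0](pred)
    rows.append({col: (pred, value)})
    return len(rows) - 1, col


def store_subject(pairs):
    """Insert (predicate, value) pairs one by one; return the rows and placements."""
    rows = [{}]
    placement = {pred: insert_predicate(rows, pred, val) for pred, val in pairs}
    return rows, placement


ANDROID = [
    ("developer", "Google"),
    ("version", "4.1"),
    ("kernel", "Linux"),
    ("preceded", "4.0"),
    ("graphics", "OpenGL"),
]
```

Calling `store_subject(ANDROID)` fills columns 1 to 4 of the first row and puts graphics in column 3 of a second row, matching the steps described for Table 2.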
(2) Design the storage mapping for the RDF data
The terms in RDF triples generally fall into two classes: URIs and literals.
A hash algorithm converts URIs and literals into 64-bit binary data: a URI takes the high 64 bits of the hash value and a literal takes the low 64 bits, which distinguishes a URI from a literal with the same character string. The converted binary data are stored in a hash index table and the rows of the hash index table are sorted in ascending order, so that lookups can map and convert quickly with a binary search algorithm;
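A minimal Python sketch of this mapping follows. The text does not name the hash algorithm; MD5 is used here purely as an assumption because it yields 128 bits that split cleanly into a high and a low half. The names `encode` and `HashIndex` are hypothetical.

```python
import bisect
import hashlib

MASK64 = 0xFFFFFFFFFFFFFFFF


def encode(term, is_literal):
    """Map a URI or literal string to a 64-bit integer id.

    Assumption: MD5 as the hash. URIs keep the high 64 bits and literals
    the low 64 bits, so the same string gets a different id per kind.
    """
    digest = int.from_bytes(hashlib.md5(term.encode("utf-8")).digest(), "big")
    return digest & MASK64 if is_literal else digest >> 64


class HashIndex:
    """Hash index table kept in ascending id order and searched by binary search."""

    def __init__(self, terms):
        # terms: iterable of (string, is_literal) pairs
        self.rows = sorted({(encode(t, lit), t) for t, lit in terms})
        self.keys = [key for key, _ in self.rows]

    def lookup(self, term_id):
        """Return the original string for an id, or None if it is not indexed."""
        pos = bisect.bisect_left(self.keys, term_id)
        if pos < len(self.keys) and self.keys[pos] == term_id:
            return self.rows[pos][1]
        return None
```

With this sketch, `encode("Google", False)` and `encode("Google", True)` produce different ids, and `HashIndex.lookup` converts an id back to its string in logarithmic time.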
(3) Store the RDF data
After the RDF data have been mapped and converted by the method of step (2), they are stored for the first time in the table structure of step (1). The data stored in the table structure are then analyzed and an analysis table S is created, recording the number of triples contained in each Subject and each Object, together with the 20 most frequent URIs and the 20 most frequent literals and their frequencies. Then, again using the table structure of step (1) but with the Object as the storage entity, the data mapped and converted by step (2) are stored a second time, which completes the storage of the RDF data.
With the data of Table 1, the storage form is shown in Table 3:
Table 3. Storage form of the data of Table 1 with the Object as the entity
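The analysis table S of step (3) could be built along the following lines. This is a sketch under assumptions: the dictionary layout, the function name and the choice to count subjects and predicates as URIs are all hypothetical, and the caller supplies which object values are literals.

```python
from collections import Counter


def build_analysis_table(triples, literals, top=20):
    """Build a hypothetical analysis table S.

    triples:  iterable of (subject, predicate, object) tuples
    literals: set of object values that are literals (everything else is a URI)
    """
    triples = list(triples)
    by_subject = Counter(s for s, _, _ in triples)
    by_object = Counter(o for _, _, o in triples)
    uri_freq, lit_freq = Counter(), Counter()
    for s, p, o in triples:
        uri_freq.update([s, p])  # assumption: subjects and predicates are URIs
        (lit_freq if o in literals else uri_freq).update([o])
    return {
        "total_triples": len(triples),
        "subject_triples": dict(by_subject),  # triples per Subject
        "object_triples": dict(by_object),    # triples per Object
        "top_uris": uri_freq.most_common(top),
        "top_literals": lit_freq.most_common(top),
    }
```

Applied to the example data, such a table would record 5 triples for the Subject Android and 3 triples for the Object Google, which are the counts a cost estimate can later read.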
An efficient fast query method for RDF data stored by the above method is implemented by the following steps:
Take as an example a SPARQL statement containing 6 triple basic graph patterns (Basic Graph Pattern, BGP). The SPARQL query statement must first be converted so that the underlying stored data can be operated on conveniently; after conversion, the query cost of each triple is estimated and a lowest-cost execution flow is finally formed. This is implemented by the following steps:
(a.1) Extraction and conversion of variables
The triple basic graph patterns (Basic Graph Pattern, BGP) of the SPARQL query statement are decomposed and the number of variables in the query statement, count, is determined. The URIs and literals in the query statement are converted into 64-bit binary data using the mapping and conversion method of step (2) of the above RDF data storage method, and the variables in the query statement are assigned the values -1 to -count;
For example, given the following query:
SELECT ?x ?y WHERE {
?x home "Palo Alto" . //q1
?y founder "IBM" . //q2
?z founder "Google" . //q3
?x memberOf ?z . //q4
?z revenue ?y . //q5
?x developer ?y . //q6
}
Parsing the above query statement yields three variables, ?x, ?y and ?z, which are encoded with the ids -1, -2 and -3 respectively; the other URIs and literals are looked up directly in the index table of step (2).
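The variable encoding of step (a.1) can be sketched in Python as follows. The patterns are written as already-parsed tuples rather than parsed from SPARQL text, and numbering variables in order of first appearance is an assumption that happens to reproduce the ids given above.

```python
# The six basic graph patterns q1..q6 of the example query, already split
# into (subject, predicate, object); variables start with "?".
QUERY_PATTERNS = [
    ("?x", "home", "Palo Alto"),
    ("?y", "founder", "IBM"),
    ("?z", "founder", "Google"),
    ("?x", "memberOf", "?z"),
    ("?z", "revenue", "?y"),
    ("?x", "developer", "?y"),
]


def assign_variable_ids(patterns):
    """Give each distinct variable an id -1, -2, ..., -count (first-appearance order)."""
    ids = {}
    for pattern in patterns:
        for term in pattern:
            if term.startswith("?") and term not in ids:
                ids[term] = -(len(ids) + 1)
    return ids
```

Negative ids cannot collide with the unsigned 64-bit ids of URIs and literals, so a single integer field can hold either a constant or a variable.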
(a.2) Conversion of basic graph patterns
Referring to Fig. 1, according to the decomposition of the triple basic graph patterns (Basic Graph Pattern, BGP) in step (a.1), each basic graph pattern is converted into a triple query node structure, where the triple query node structure is:
Triple query node structure
{
Id of the node;
Id of the subject;
Id of the predicate;
Id of the object;
Flag of the storage mode;
}
For URIs and literals, the Ids of the subject, predicate and object are the respective 64-bit binary data; for variables, the Ids of the subject, predicate and object are the assigned values;
The storage mode flag may select the first storage (access-by-Subject) or the second storage (access-by-Object) of step (3) of the above RDF data storage method. The first storage gives efficient queries from the subject (Subject) to the predicate (Predicate) and avoids the large number of join operations that conventional storage needs at query time; when the subject is unknown, the second storage mode can be chosen for the query.
Before a single triple query is run, the number of variables and the number of constants in each triple, and the relationships between the variables and constants of the triples, must first be determined; the order of the queries can be decided from these relationships.
(a.3) Representation of query join operations
The triples decomposed from all the triple basic graph patterns in step (a.1) are compared with one another. For triples that share a variable, a join relationship is established using the node Id of the step (a.2) structure as the unique identifier, and the join relationship is converted into a join operation edge structure, where the join operation edge structure is:
Join operation edge structure
{
Id of the node of the starting triple,
Id of the node of the ending triple,
Id of the shared variable
}
This finally forms the join operation structure shown in Fig. 2.
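The node and edge structures of steps (a.2) and (a.3) map naturally onto small record types. The Python sketch below uses hypothetical class and field names, keeps constants as readable strings instead of 64-bit ids, and picks the storage flag by hand; it only illustrates how edges follow from shared variables.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class QueryNode:
    node_id: int
    subject: object    # negative int for a variable, otherwise a constant
    predicate: object
    obj: object
    storage: str       # "access-by-subject" or "access-by-object"


@dataclass(frozen=True)
class JoinEdge:
    start: int         # node id of the starting triple
    end: int           # node id of the ending triple
    variable: int      # id of the shared variable


def variables(node):
    """Ids of the variables in a node (variables are negative integers)."""
    terms = (node.subject, node.predicate, node.obj)
    return {t for t in terms if isinstance(t, int) and t < 0}


def build_edges(nodes):
    """One edge per pair of nodes and per variable they share."""
    edges = []
    for a, b in combinations(nodes, 2):
        for var in sorted(variables(a) & variables(b), reverse=True):
            edges.append(JoinEdge(a.node_id, b.node_id, var))
    return edges


# q1..q6 with ?x = -1, ?y = -2, ?z = -3
NODES = [
    QueryNode(1, -1, "home", "Palo Alto", "access-by-object"),
    QueryNode(2, -2, "founder", "IBM", "access-by-object"),
    QueryNode(3, -3, "founder", "Google", "access-by-object"),
    QueryNode(4, -1, "memberOf", -3, "access-by-subject"),
    QueryNode(5, -3, "revenue", -2, "access-by-subject"),
    QueryNode(6, -1, "developer", -2, "access-by-subject"),
]
```

For the example query this yields nine edges, for instance q1 to q4 on ?x and q4 to q5 on ?z.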
Through the conversion and processing above, the query statement has had its variables encoded and collected, its basic graph patterns represented as triples, and its join operations represented.
(a.4) Compute the query cost of each query
Based on the triple query node structures obtained in step (a.2), a cost analysis is performed on the join operation edge structures obtained in step (a.3) using a conventional cost algorithm, giving the cost value c of the join operation edge structure. The formula of the cost algorithm is:
TMC(t,m,S)→c
where t is the triple to be queried; m is the first storage or the second storage of step (3) of the RDF data storage method; and S is the analysis table;
For example:
(?x founder Google)
If access-by-Object is used for this triple, the result of the TMC function is the number of triples recorded in the analysis table S for that Object.
(a.5) Generation of the query plan
The cost values c of all join operation edge structures obtained in step (a.4) are sorted in ascending order, giving a node sequence ordered by cost value. The node with the smallest c in the sequence is chosen as the start node, and the next nodes in the sequence are then chosen in turn; if the variables in a node have not yet been queried, a join query is performed, until the variables in all nodes have been queried, which completes the query of the statement.
Referring to Fig. 2, the query plan first takes the first triple query node as the starting point and then selects the fourth query node in the plan. According to the information given by the query plan, a join is performed on the variable ?x, giving an intermediate result set over the two variables <?x ?z>. This intermediate result set is then joined with the fifth triple query node on the variable ?z, giving an intermediate table over the three variables <?z ?x ?y>. Continuing in this way until all the query patterns have been executed gives the final <?z ?x ?y> intermediate table. Finally, a SELECT operation is applied to the query result to extract the values of the variables ?x and ?y.
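The ordering described above can be approximated by a greedy Python sketch. It is a simplification under assumptions: costs are attached to nodes rather than to join edges, the cost values are invented so that the plan starts with nodes 1, 4 and 5 as in the walkthrough, and each step picks the cheapest remaining node that shares a variable with the results so far.

```python
def plan(nodes, cost):
    """Order query nodes for execution.

    nodes: {node_id: set of variable names in that triple pattern}
    cost:  {node_id: estimated cost c} (hypothetical values)
    Returns [(node_id, shared variables to join on)] in execution order.
    """
    remaining = sorted(nodes, key=lambda n: cost[n])
    order, bound = [], set()
    while remaining:
        # Cheapest node sharing a variable with the intermediate result;
        # the first pick (or a disconnected pattern) falls back to the cheapest.
        pick = next((n for n in remaining if nodes[n] & bound), remaining[0])
        order.append((pick, sorted(nodes[pick] & bound)))
        bound |= nodes[pick]
        remaining.remove(pick)
    return order


# Variables of q1..q6 and invented costs
QUERY_NODES = {
    1: {"?x"}, 2: {"?y"}, 3: {"?z"},
    4: {"?x", "?z"}, 5: {"?z", "?y"}, 6: {"?x", "?y"},
}
NODE_COST = {1: 1, 4: 2, 5: 3, 2: 4, 3: 5, 6: 6}
```

Under these invented costs the plan is node 1, then node 4 joined on ?x, then node 5 joined on ?z, followed by the remaining patterns, so every join has at least one shared variable.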
(a.6) Establish a caching mechanism
During data querying, a caching mechanism is established to cache query results (see Fig. 3) and so improve query efficiency. The specific operations are:
A hash operation is applied to the set of triple query node structures obtained in step (a.2) for the query statement entered by the user, giving a hash result value. If the value is in the cache list, the cached result is fetched directly and returned to the user; otherwise, steps (a.3) to (a.5) above are carried out, the result is stored on the hard disk, and the corresponding address identifier and the hash result value are stored in the cache list. When the size of the cache exceeds the configured limit, the entries with the lowest query frequency are deleted.
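The cache list can be sketched in Python as a small least-frequently-used map from a query hash to a result address. The class name, the use of SHA-256 as the hash and the string addresses are assumptions made for illustration; writing results to disk is left out.

```python
import hashlib


class QueryCache:
    """Cache list: hash of the query's node set -> [address identifier, hit count]."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}

    @staticmethod
    def key(nodes):
        """Hash the set of (subject, predicate, object) id tuples, ignoring order."""
        canonical = "|".join(sorted(repr(tuple(n)) for n in nodes))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, nodes):
        """Return the cached address for this query, or None on a miss."""
        entry = self.entries.get(self.key(nodes))
        if entry is None:
            return None
        entry[1] += 1
        return entry[0]

    def put(self, nodes, address):
        """Record where a result was stored, evicting the least-used entry if full."""
        k = self.key(nodes)
        if k not in self.entries and len(self.entries) >= self.capacity:
            coldest = min(self.entries, key=lambda e: self.entries[e][1])
            del self.entries[coldest]
        self.entries[k] = [address, 1]
```

On a miss the caller would run steps (a.3) to (a.5), store the result on disk and call `put` with its address; on a hit, `get` returns the address so the stored result can be read back directly.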
Claims (3)
1. An RDF data storage method, characterized by comprising the following steps:
(1) Design an entity-oriented storage structure for the RDF data
(1.1) In an entity-oriented manner, the data are stored in a relational database table of n rows and k columns, where k is the average number of predicates over all subjects in the RDF data and n is the total number of rows, line, needed by all subjects; when the number of predicates of a single subject satisfies sum ≤ k, the required number of rows is line = 1; when sum > k, the subject is stored in multiple rows and the required number of rows is line = (sum/k) + 1;
(1.2) After k is determined, each predicate is converted into a column subscript by the predicate mapping algorithm, giving a table structure of n rows and k columns, wherein the predicate is converted into a column subscript as follows:
(1.2.1) The column subscript is computed with the predicate mapping algorithm, whose formula is:
In the formula, h1, h2, …, hj are j hash functions and i is the column subscript;
(1.2.2) If no free subscript has been found after all j hash functions have been computed, a new row is opened and the data are stored at the subscript computed by h1;
(2) Design the storage mapping for the RDF data
A hash algorithm converts the URIs and literals of the RDF data into 64-bit binary data: a URI takes the high 64 bits of the hash value and a literal takes the low 64 bits; the converted binary data are stored in a hash index table, and the rows of the hash index table are sorted in ascending order so that lookups can map and convert quickly with a binary search algorithm;
(3) Store the RDF data
After the RDF data have been mapped and converted by the method of step (2), they are stored for the first time in the table structure of step (1); the data stored in the table structure are analyzed and an analysis table S is created, recording the number of triples contained in each Subject and each Object, together with the 20 most frequent URIs and the 20 most frequent literals and their frequencies; then, again using the table structure of step (1) but with the Object as the storage entity, the data mapped and converted by step (2) are stored a second time, which completes the storage of the RDF data.
2. a kind of and matched RDF data querying method of RDF data storage method described in claim 1, it is characterised in that by
Following steps composition:
(a.1) extraction and conversion of variable
The basic chart-pattern of triple in SPARQL query statement is decomposed, and determines that the variable number in query statement is
Count, in query statement URI and literal respectively refer to the mapping mode in the step (2) in storage method for its turn
64 bit binary datas are turned to, -1 assignment for arriving-count is carried out to the variable for being included;
(a.2) conversion of basic query chart-pattern
It is looked into according to the triple parent map Mode Decomposition in step (a.1) as a result, converting triple for each basic chart-pattern
Node structure is ask, wherein triple query node structure are as follows:
Triple query node structure
The first time storage or second of storage of step (3) in the mark selection RDF data storage method of storage mode;
To URI and literal, the Id of subject, predicate, object are respectively 64 bit binary datas;To variable,
The Id of subject, predicate, object correspond to institute's assigned value;
(a.3) expression of attended operation is inquired
It is mutually compared according to the triple decomposed in chart-pattern basic in step (a.1), to there are the three of identical variable
Tuple is established a connection as unique identifier using the node Id in step (a.2) structure, and converts connection for connection relationship
Side structure is operated, wherein attended operation side structure are as follows:
Attended operation side structure
(a.4) Calculation of the query cost of each query
Based on the triple query node structures obtained in step (a.2), a cost analysis is carried out with the cost algorithm
on each join operation edge structure obtained in step (a.3), giving a cost value c for each join operation edge
structure. The formula of the cost algorithm is:
TMC(t,m,S)→c
where t is the triple to be queried; m is the first storage or the second storage of step (3) of the RDF data storage
method; S is the analysis table;
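The text gives only the signature of the cost algorithm, not its body. The Python sketch below is therefore a purely hypothetical stand-in that shows how a function of the form TMC(t, m, S) → c could be wired up: the layout of the analysis table `S`, the factor of 10 per bound term, and the per-mode weights are all invented for illustration.

```python
def tmc(t, m, S):
    """Hypothetical cost function with the signature TMC(t, m, S) -> c.

    t -- encoded triple (subject, predicate, object); negative = variable
    m -- storage mode (1 = first storage, 2 = second storage; assumed values)
    S -- analysis table; assumed here to hold a total triple count,
         per-predicate counts, and a weight per storage mode
    """
    s, p, o = t
    # Start from the number of triples with this predicate, or from the
    # total number of triples when the predicate itself is a variable.
    base = S["by_predicate"].get(p, 0) if p >= 0 else S["total"]
    # Assume each bound subject/object narrows the scan by a fixed factor.
    bound = sum(1 for term in (s, o) if term >= 0)
    estimate = base / (10 ** bound)
    return estimate * S["mode_weight"][m]


S = {
    "total": 1000,
    "by_predicate": {1: 200, 3: 50},
    "mode_weight": {1: 1.0, 2: 2.0},
}
cost_bound_object = tmc((-1, 1, 2), 1, S)
cost_unbound = tmc((-1, 3, -2), 1, S)
```

Whatever the real cost model is, the planner in step (a.5) only needs the resulting values c to be comparable across join operation edge structures.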
(a.5) Generation of the query plan
The cost values c of all join operation edge structures obtained in step (a.4) are sorted in ascending order, giving a
sequence of nodes ordered by cost value. The node with the smallest c value in the sequence is chosen as the start node,
and the following nodes in the sequence are then taken in turn: if a variable in a node has not yet been queried, a join
query is performed. This continues until the variables in all nodes have been queried, which completes the query of the
statement.
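The ordering logic of step (a.5) can be sketched in Python as below. The input format `(cost, node_id, variables)` and the name `generate_plan` are assumptions. The claim does not say how a node is treated when all of its variables have already been queried; this sketch keeps such a node in the plan with an empty list of new variables, which is also an assumption.

```python
def generate_plan(costed_nodes):
    """Order nodes by ascending cost and record which variables each binds.

    costed_nodes -- iterable of (cost, node_id, variables) tuples, where
                    variables is a set of negative variable values
    Returns a list of (node_id, newly_queried_variables) in execution order.
    """
    # The cheapest node comes first and acts as the start node.
    ordered = sorted(costed_nodes, key=lambda item: item[0])
    plan = []
    queried = set()
    for _cost, node_id, variables in ordered:
        # Variables of this node that no earlier node has queried yet.
        new_vars = set(variables) - queried
        plan.append((node_id, sorted(new_vars)))
        queried |= new_vars
    return plan


costed = [(50.0, 1, {-1, -2}), (20.0, 0, {-1}), (70.0, 2, {-2})]
plan = generate_plan(costed)
```

With these inputs node 0 is the start node, node 1 is joined on the already queried variable -1 and contributes -2, and node 2 introduces no new variable.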
3. The RDF data query method according to claim 2, characterized in that a step (a.6) follows step (a.5), in which a
caching mechanism is established, specifically:
A hash operation is applied to the set of triple query node structures obtained in step (a.2) for the query statement
entered by the user, giving the result value of the hash function. If this value exists in the cache list, the cached
result is retrieved directly and returned to the user; otherwise, steps (a.3) to (a.5) are repeated, the result is
stored on the hard disk, and the corresponding address identifier and the result value of the hash function are stored
in the cache list.
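A minimal Python sketch of the caching mechanism of step (a.6) follows. The class name `QueryCache`, the use of SHA-256 as the hash function, JSON files as the on-disk format, and the `run_query` callback standing in for steps (a.3) to (a.5) are all assumptions for illustration; the claim names none of them.

```python
import hashlib
import json
import os
import tempfile


class QueryCache:
    def __init__(self, directory):
        self.directory = directory
        self.cache_list = {}  # hash value -> file path (the address identifier)

    @staticmethod
    def key(nodes):
        # Sort the encoded triples so the hash depends on the set of nodes,
        # not on the order in which they appear in the query.
        canonical = json.dumps(sorted(nodes))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def query(self, nodes, run_query):
        k = self.key(nodes)
        path = self.cache_list.get(k)
        if path is not None:
            # Cache hit: read the stored result back from disk.
            with open(path, encoding="utf-8") as f:
                return json.load(f)
        # Cache miss: run the query, store the result, remember its address.
        result = run_query(nodes)
        path = os.path.join(self.directory, k + ".json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(result, f)
        self.cache_list[k] = path
        return result


calls = []


def run_query(nodes):
    calls.append(nodes)   # count how often the query is really executed
    return [[10, 20]]     # placeholder result rows


with tempfile.TemporaryDirectory() as d:
    cache = QueryCache(d)
    first = cache.query([(-1, 1, 2), (-1, 3, -2)], run_query)
    second = cache.query([(-1, 3, -2), (-1, 1, 2)], run_query)
```

The second call lists the same triples in a different order, hashes to the same value, and is answered from disk without running the query again.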
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510955821.5A CN105630881B (en) | 2015-12-18 | 2015-12-18 | A kind of date storage method and querying method of RDF |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105630881A (en) | 2016-06-01 |
CN105630881B (en) | 2019-04-09 |
Family ID: 56045814
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201510955821.5A | Active | CN105630881B (en) | 2015-12-18 | 2015-12-18 | A kind of date storage method and querying method of RDF |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105630881B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10445361B2 (en) * | 2016-12-15 | 2019-10-15 | Microsoft Technology Licensing, Llc | Caching of subgraphs and integration of cached subgraphs into graph query results |
US10242223B2 (en) | 2017-02-27 | 2019-03-26 | Microsoft Technology Licensing, Llc | Access controlled graph query spanning |
CN107066573B (en) * | 2017-04-10 | 2020-04-17 | 北京工商大学 | Data association access method based on three-dimensional table structure and application |
CN107229704A (en) * | 2017-05-25 | 2017-10-03 | 深圳大学 | A kind of resource description framework querying method and system based on KSP algorithms |
CN108268580A (en) * | 2017-07-14 | 2018-07-10 | 广东神马搜索科技有限公司 | The answering method and device of knowledge based collection of illustrative plates |
CN107480199B (en) * | 2017-07-17 | 2020-06-12 | 深圳先进技术研究院 | Query reconstruction method, device, equipment and storage medium of database |
CN110019911A (en) * | 2017-12-29 | 2019-07-16 | 苏州工业职业技术学院 | Support the querying method and device of the knowledge mapping of Knowledge Evolvement |
EP3514706A1 (en) * | 2018-01-18 | 2019-07-24 | Université Jean-Monnet | Method for processing a question in natural language |
CN109446358A (en) * | 2018-08-27 | 2019-03-08 | 电子科技大学 | A kind of chart database accelerator and method based on ID caching technology |
CN109656946B (en) * | 2018-09-29 | 2022-12-16 | 创新先进技术有限公司 | Multi-table association query method, device and equipment |
CN112287043B (en) * | 2020-12-29 | 2021-06-18 | 成都数联铭品科技有限公司 | Automatic graph code generation method and system based on domain knowledge and electronic equipment |
CN112732746B (en) * | 2021-01-13 | 2023-05-12 | 首都师范大学 | SPARQL endpoint combination-based dynamic connection ordering method |
CN114996370A (en) * | 2022-08-03 | 2022-09-02 | 杰为软件系统(深圳)有限公司 | Data conversion and migration method from relational database to semantic triple |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521299A (en) * | 2011-11-30 | 2012-06-27 | 华中科技大学 | Method for processing data of resource description framework |
CN103970820A (en) * | 2014-01-23 | 2014-08-06 | 河海大学 | Method and device for visualization of Web multimedia resource open annotation data |
CN104462609A (en) * | 2015-01-06 | 2015-03-25 | 福州大学 | RDF data storage and query method combined with star figure coding |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7426525B2 (en) * | 2003-08-08 | 2008-09-16 | Hewlett-Packard Development Company, L.P. | Method and apparatus for identifying an object using an object description language |
US8078646B2 (en) * | 2008-08-08 | 2011-12-13 | Oracle International Corporation | Representing and manipulating RDF data in a relational database management system |
Non-Patent Citations (1)
Title |
---|
"An RDF data clustering method based on clustering patterns"; Yuan Liu et al.; Computer Science (《计算机科学》); 2015-10-31; Vol. 42, No. 10; pp. 266-269 * |
Also Published As
Publication number | Publication date |
---|---|
CN105630881A (en) | 2016-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105630881B (en) | A kind of date storage method and querying method of RDF | |
CN103646032B (en) | A kind of based on body with the data base query method of limited natural language processing | |
Özsu | A survey of RDF data management systems | |
Hartig et al. | Publishing and consuming provenance metadata on the web of linked data | |
Etcheverry et al. | Enhancing OLAP analysis with web cubes | |
Görlitz et al. | Federated data management and query optimization for linked open data | |
US7702685B2 (en) | Querying social networks | |
US11599535B2 (en) | Query translation for searching complex structures of objects | |
Bikakis et al. | The XML and semantic web worlds: technologies, interoperability and integration: a survey of the state of the art | |
CN109947998A (en) | The calculating data lineage of network across heterogeneous system | |
CN104636478A (en) | Information query method and device | |
US8825621B2 (en) | Transformation of complex data source result sets to normalized sets for manipulation and presentation | |
CN104137095B (en) | System for evolution analysis | |
de la Vega et al. | Mortadelo: Automatic generation of NoSQL stores from platform-independent data models | |
Masmoudi et al. | Knowledge hypergraph-based approach for data integration and querying: Application to Earth Observation | |
Banane et al. | SPARQL2Hive: An approach to processing SPARQL queries on Hive based on meta-models | |
US20140067853A1 (en) | Data search method, information system, and recording medium storing data search program | |
CN108241709A (en) | A kind of data integrating method, device and system | |
CN101719162A (en) | Multi-version open geographic information service access method and system based on fragment pattern matching | |
Fernández et al. | Management of big semantic data | |
KR101897760B1 (en) | A system of converting and storing triple for linked open data cloud information service and a method thereof | |
RU2605387C2 (en) | Method and system for storing graphs data | |
Babalou et al. | Towards a semantic toolbox for reproducible knowledge graph generation in the biodiversity domain-how to make the most out of biodiversity data | |
Hauswirth et al. | Linked data management | |
Yuksel et al. | An analysis of RDF storage models and query optimization techniques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||