CN105630881B - RDF data storage method and query method - Google Patents

RDF data storage method and query method

Info

Publication number
CN105630881B
Authority
CN
China
Prior art keywords
storage
data
rdf
triple
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510955821.5A
Other languages
Chinese (zh)
Other versions
CN105630881A (en)
Inventor
袁柳
张鸿洋
翟梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Normal University
Original Assignee
Shaanxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi Normal University
Priority to CN201510955821.5A
Publication of CN105630881A
Application granted
Publication of CN105630881B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/81 Indexing, e.g. XML tags; Data structures therefor; Storage structures
    • G06F16/83 Querying
    • G06F16/835 Query processing
    • G06F16/8373 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an RDF data storage method and query method. An entity-oriented storage structure and a storage mapping are designed for RDF data; the URIs and literals of the RDF data are converted to 64-bit binary data and stored according to the designed storage structure. In the query method, the SPARQL query statement is parsed and converted; the cost of each single query triple in the SPARQL statement is estimated from the analysis results over the whole data set and the join relationships between the queries, and a minimum-cost query flow is generated. The invention can greatly speed up comparisons between data and reduce storage space and, compared with the traditional direct conversion of SPARQL to SQL, significantly improves query efficiency. It can be used in Web data management, Web semantic retrieval and similar fields.

Description

RDF data storage method and query method
Technical field
The invention belongs to the technical field of Web data management, and specifically relates to an RDF data storage method and query method that reduce the storage space of RDF data and improve the efficiency of SPARQL queries.
Background art
RDF (Resource Description Framework) is a framework proposed by the World Wide Web Consortium (W3C) for describing information on the World Wide Web. It provides a standard for information description for the various applications on the Web. RDF describes resources on the Web as triples of subject S (Subject), predicate P (Predicate) and object O (Object). The subject generally uses a uniform resource identifier URI (Uniform Resource Identifier) to denote an information entity (or concept) on the Web, the predicate describes an attribute of the entity, and the object is the corresponding attribute value. This form of expression lets RDF represent any identified information on the Web and exchange it among applications without losing semantic information. RDF has therefore become the standard for semantic data description and is widely used in metadata description, ontologies and the Semantic Web. As Semantic Web data keeps growing, building systems that efficiently store and query this data has become a very important aspect of the widespread adoption of Semantic Web applications. Since RDF is the descriptive foundation of Semantic Web data, the efficient storage and querying of RDF data has become a research hotspot. There are currently three main approaches to storing and optimizing RDF data.
First, storage based on relational databases
Since RDF data can be regarded as a set of <Subject, Predicate, Object> triples, the most natural approach is to store the data directly in a triple table. Many relational-database-based RDF storage systems therefore use a relational database directly and design a triple table or a similar scheme to store RDF data. The steps of this approach are: (1) parse the RDF data into triples; (2) encode the URIs in the triples with the MD5 (Message Digest Algorithm 5) hash and take the first 64 bits of the MD5 hash as the resource identifier; (3) store the data in the relational database in a three-column table and build the related indexes. However, when executing a SPARQL query, this approach must convert the SPARQL query into the structured query language SQL, which requires multiple layers of conversion. Because RDF data differs greatly from relational data, storing RDF data in relational tables also requires mapping operations between tables. Both space utilization and query efficiency are therefore reduced.
Second, storage based on local binary files
RDF documents can be stored in files in a given format; on the Semantic Web, a large number of RDF documents exist in RDF/XML form. RDF data differs considerably from relational data in structure, and its description syntax is much more complex than that of a relational database, but describing resources in RDF offers greater flexibility. Storing RDF documents in disk files can achieve better storage efficiency while still answering queries quickly. Some disk-based storage designs already exist, and they often rely on the B-tree, B+ tree and hash table techniques commonly used by databases. However, file-based storage has a relatively high development cost, and because RDF is only the basic description layer for Semantic Web data, a great deal of additional work is needed if the basic storage structure must also support query reasoning over the data.
Third, memory-based storage
As hardware technology develops, memory becomes cheaper and larger, and building memory-based RDF data storage systems has become a research hotspot in recent years. Memory provides fast access, so data can be operated on in real time and disk I/O overhead is avoided; a well-designed in-memory storage structure can further improve the efficiency of querying and analysis. However, this approach is not suitable for storing large-scale RDF data, and current schemes such as BRAHMS and BitMat do not support direct SPARQL queries. Memory-based RDF storage structures are therefore still being studied and improved.
Summary of the invention
The object of the invention is to overcome the shortcomings of the above prior art and to provide, for RDF learning resources, an RDF data storage method that makes comparisons between data fast and reduces storage space.
The invention also provides an RDF data query method that matches the above storage method and allows fast querying, so as to improve the retrieval efficiency of RDF learning resources.
To achieve the above objects, the technical solution adopted by the invention is:
The RDF data storage method of the invention comprises the following steps:
(1) Design an entity-oriented storage structure for RDF data
(1.1) In an entity-oriented manner, data is stored in a relational database table of n rows and k columns, where k is the average number of predicates per subject in the RDF data and n is the total of the row counts line needed by all subjects. When the predicate count sum of a single subject satisfies sum ≤ k, the required row count is line = 1; when sum > k, multi-row storage is used and the required row count is line = (sum/k) + 1;
(1.2) After k is determined, predicates are converted into column indexes by the predicate mapping algorithm, giving the table structure of n rows and k columns;
The specific method for converting a predicate into a column index in step (1.2) is:
(1.2.1) Compute the column index with the predicate mapping algorithm. In its formula (not reproduced in this text), h1, h2 … hj are the j hash functions and i is the column index;
(1.2.2) When all j hash functions have been computed and no free index has been found, open a new row and store the data at the index computed by h1.
(2) Design the storage mapping for RDF data
The URIs and literals of the RDF data are each converted to 64-bit binary data with a hash algorithm: a URI takes the high 64 bits of the hash and a literal takes the low 64 bits. The converted binary data is stored in a hash index table whose rows are sorted in ascending order, so that lookups can be mapped and converted quickly by binary search;
(3) Store the RDF data
After the RDF data has been mapped and converted by the method of step (2), it is stored for the first time in the table structure of step (1). The data stored in the table structure is then analyzed and an analysis table S is created, recording the number of triples contained by each Subject and Object, and the 20 most frequent URIs and the 20 most frequent literals with their frequencies. Then, following the table structure of step (1) with Object as the storage entity, the data stored in the table structure is mapped and converted by step (2) and stored a second time, which completes the storage of the RDF data.
An RDF data query method matched to the above RDF data storage method comprises the following steps:
(a.1) Extraction and conversion of variables
Decompose the triple basic graph patterns in the SPARQL query statement and determine the number of variables in the query statement, count. The URIs and literals in the query statement are converted to 64-bit binary data using the mapping of step (2) of the storage method, and the variables are assigned the values -1 to -count;
(a.2) Conversion of basic query graph patterns
According to the decomposition result of step (a.1), convert each basic graph pattern into a triple query node structure, where the triple query node structure is:
Triple query node structure
{
The Id of node;
The Id of subject;
The Id of predicate;
The Id of object;
The mark of storage mode;
}
The storage mode flag selects the first storage or the second storage of step (3) of the RDF data storage method;
For URIs and literals, the Ids of the subject, predicate and object are the respective 64-bit binary data; for variables, the Ids of the subject, predicate and object are the assigned values;
(a.3) Representation of query join operations
Compare the triples decomposed from the basic graph pattern in step (a.1) with one another. For triples that share a variable, establish a join relationship using the node Id of the step (a.2) structure as the unique identifier, and convert the join relationship into a join operation edge structure, where the join operation edge structure is:
Join operation edge structure
{
Id of the start triple node,
Id of the end triple node,
Id of the shared variable
};
(a.4) Compute the query cost of each query
Based on the triple query node structures obtained in step (a.2), perform cost analysis on each join operation edge structure obtained in step (a.3) with the cost algorithm, giving the cost value c of the join operation edge structure. The formula of the cost algorithm is:
TMC(t,m,S)→c
where t is the triple to be queried; m is the first storage or the second storage of step (3) of the RDF data storage method; S is the analysis table;
(a.5) Generation of the query plan
Sort the cost values c of all join operation edge structures obtained in step (a.4) in ascending order to get a node sequence ordered by cost value. Take the node with the smallest c as the start node, then take the next nodes in the sequence in turn; if the variables in a node have not yet been queried, perform a join query, until the variables in all nodes have been queried, which completes the query.
After step (a.5), the method further includes step (a.6), establishing a cache mechanism: perform a hash operation on the set of triple query node structures obtained in step (a.2) for the query statement input by the user to get a hash result value; if the value is in the cache list, fetch the cached result directly and return it to the user; otherwise repeat steps (a.3) to (a.5), store the result on disk, and store the corresponding address identifier and the hash result value in the cache list.
The RDF data storage method and query method of the invention optimize the storage structure of the data and optimize SPARQL queries for that structure, providing fast retrieval and querying of RDF-based learning resources. Compared with the prior art, the invention has the following advantages:
(1) Storing 64-bit binary data in place of the original URIs and literals greatly speeds up comparisons between data and reduces storage space. Taking the high 64 bits of the hash for URIs and the low 64 bits for literals distinguishes a URI from a literal with the same string. The hash index records are stored in sorted order so that the required record can be located quickly by binary search.
(2) The storage structure of the RDF data is entity-oriented (entry-oriented) and stores the data both with the subject (Subject) as entity and with the object (Object) as entity. The former allows efficient queries from the subject (Subject) to its predicates (Predicate), avoiding the many join operations of conventional storage during queries; the latter allows efficient queries from the predicate (Predicate) to the subject (Subject).
(3) SPARQL query statements are parsed and converted. The cost of each single query triple in the SPARQL statement is estimated from the analysis results over the whole data set and the join relationships between the queries, and a minimum-cost query flow is generated; compared with the traditional direct conversion of SPARQL to SQL, query efficiency is significantly improved.
(4) A cache mechanism is added to the query process to cache frequently queried data sets. A cache list is kept in memory; each row of the cache list holds a hash result value and an address identifier, which improves query efficiency.
(5) The data storage model and query optimization scheme proposed by the invention can be extended to Web data management, Web semantic retrieval and similar fields, and to the storage and retrieval of other RDF resource data.
Description of the drawings
Fig. 1 is a schematic diagram of the SPARQL analysis and conversion of step (a.2) in the embodiment.
Fig. 2 illustrates the generation of the query tree for SPARQL in step (a.3) of the embodiment.
Fig. 3 is a schematic diagram of the cache model of step (a.6) in the embodiment.
Detailed description of the embodiment
The invention is further described below with reference to the drawings and an embodiment.
In this embodiment, the RDF data storage method is realized by the following steps:
(1) Design the entity-oriented storage structure for RDF data
For the storage structure of RDF data, an entity-oriented (entry-oriented) approach is used: data is stored in a relational database table of n rows and k columns, where k is the average number of predicates per subject in the RDF data and n is the total of the row counts line needed by all subjects.
(1.1) Determine the number of columns k and the required number of rows n of the table structure
When the predicate (Predicate) count sum of a single subject (Subject) satisfies sum ≤ k, the required row count is line = 1; when sum > k, multiple row tuples are needed and the required row count is line = (sum/k) + 1;
For example, for the following data:
(Charles Flint,born,1850)
(Charles Flint,died,1934)
(Charles Flint,founder,IBM)
(Larry Page,born,1973)
(Larry Page,founder,Google)
(Larry Page,board,Google)
(Larry Page,home,Palo Alto)
(Android,developer,Google)
(Android,version,4.1)
(Android,kernel,Linux)
(Android,preceded,4.0)
(Android,graphics,OpenGL)
The storage form is shown in Table 1:
Table 1. Storage table with Object as the entity
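The sizing rule of step (1.1) can be sketched on the example triples above. This is a minimal illustration, not the patent's implementation: it assumes that sum/k means integer (floor) division and that the average k is rounded down, neither of which the text states. The function names rows_needed and table_shape are hypothetical.

```python
from collections import defaultdict

# Example triples from the text above.
TRIPLES = [
    ("Charles Flint", "born", "1850"),
    ("Charles Flint", "died", "1934"),
    ("Charles Flint", "founder", "IBM"),
    ("Larry Page", "born", "1973"),
    ("Larry Page", "founder", "Google"),
    ("Larry Page", "board", "Google"),
    ("Larry Page", "home", "Palo Alto"),
    ("Android", "developer", "Google"),
    ("Android", "version", "4.1"),
    ("Android", "kernel", "Linux"),
    ("Android", "preceded", "4.0"),
    ("Android", "graphics", "OpenGL"),
]


def rows_needed(pred_count, k):
    """Row count for one subject: 1 if sum <= k, else (sum/k) + 1."""
    # Assumption: sum/k is integer (floor) division.
    return 1 if pred_count <= k else pred_count // k + 1


def table_shape(triples):
    """Return (k, n) for a list of (subject, predicate, object) triples."""
    predicates = defaultdict(set)
    for subject, predicate, _obj in triples:
        predicates[subject].add(predicate)
    counts = [len(p) for p in predicates.values()]
    k = sum(counts) // len(counts)  # assumption: average rounded down
    n = sum(rows_needed(c, k) for c in counts)
    return k, n


print(table_shape(TRIPLES))  # (4, 4)
```

With 3, 4 and 5 predicates for the three subjects, k is 4; Charles Flint and Larry Page each fit in one row and Android needs two, so n is 4.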
(1.2) Determine the index i at which a predicate (Predicate) is stored
After k is determined, predicates are converted into column indexes by the predicate mapping algorithm. When several predicates of the same entity are mapped to the same index, this is called a conflict; several hash algorithms are defined so that the column space is used as fully as possible and conflicts are avoided. When all hash algorithms have been computed and the conflict remains, one more row tuple is added for that Subject.
In the predicate mapping function (its formula is not reproduced in this text), h1, h2 … hj are the j hash functions and i is the column index.
When all j hash functions have been computed and no free index has been found, a new row is opened and the data is stored at the index computed by h1.
With reference to Table 1, consider the triples whose Subject is Android and assume they are inserted into the database one by one. With j set to 2 there are two hash functions, h1 and h2, and the process of computing the predicate indexes is shown in Table 2:
Table 2. Process of computing the predicate indexes
developer: h1 gives index 1; index 1 is empty at this point, so it is placed directly.
version: placed at index 2 in the same way.
kernel: h1 gives index 1, which is no longer free, so a conflict occurs; h2 then gives index 3, where it is placed.
preceded: h1 gives index k, where it is placed.
graphics: h1 and h2 give indexes 3 and 2, both of which conflict, so a new row is created and it is placed in pred3.
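The walk-through above can be replayed with a short sketch. The source does not define h1 and h2, so the lookup tables H1 and H2 below are hypothetical values chosen only to reproduce the indexes in the walk-through with k = 4; the function place is likewise an illustrative name. The sketch also assumes that existing rows of a subject are tried before a new row is opened.

```python
# Hypothetical hash outputs, chosen only to reproduce the walk-through above;
# the source does not define h1 and h2.
H1 = {"developer": 1, "version": 2, "kernel": 1, "preceded": 4, "graphics": 3}
H2 = {"developer": 2, "version": 3, "kernel": 3, "preceded": 1, "graphics": 2}
HASHES = [H1.get, H2.get]


def place(rows, predicate, value):
    """Store (predicate, value) in the first free column found by h1..hj."""
    for row in rows:
        for h in HASHES:
            col = h(predicate)
            if col not in row:
                row[col] = (predicate, value)
                return
    # Every candidate column is taken (or no row exists yet): open a new row
    # and store the pair at the index computed by h1.
    rows.append({HASHES[0](predicate): (predicate, value)})


android_rows = []
for pred, obj in [("developer", "Google"), ("version", "4.1"),
                  ("kernel", "Linux"), ("preceded", "4.0"),
                  ("graphics", "OpenGL")]:
    place(android_rows, pred, obj)

for number, row in enumerate(android_rows):
    print(number, sorted(row.items()))
```

The first row ends up holding developer, version, kernel and preceded in columns 1 to 4, and graphics lands in column 3 of a second row, matching the placement described in the text.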
(2) Design the storage mapping for RDF data
The triple data of RDF generally falls into two classes: URIs and literals.
URIs and literals are each converted to 64-bit binary data with a hash algorithm: a URI takes the high 64 bits of the hash and a literal takes the low 64 bits, which distinguishes a URI from a literal with the same string. The converted binary data is stored in a hash index table whose rows are sorted in ascending order, so that lookups can be mapped and converted quickly by binary search;
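The mapping of step (2) might look like the following sketch. The text only says "a hash algorithm"; MD5 is used here purely as a stand-in because it yields 128 bits that split into two 64-bit halves. The names term_id and HashIndex are hypothetical, and "high 64 bits" is read as the first 8 bytes of the digest.

```python
import bisect
import hashlib


def term_id(term, is_literal):
    """64-bit Id: high half of the digest for a URI, low half for a literal."""
    digest = hashlib.md5(term.encode("utf-8")).digest()  # 16 bytes (assumed hash)
    half = digest[8:] if is_literal else digest[:8]
    return int.from_bytes(half, "big")


class HashIndex:
    """Hash index table kept in ascending Id order for binary search."""

    def __init__(self, terms):
        # terms: iterable of (text, is_literal) pairs
        entries = sorted((term_id(text, lit), text) for text, lit in terms)
        self.ids = [entry[0] for entry in entries]
        self.texts = [entry[1] for entry in entries]

    def lookup(self, ident):
        """Map a 64-bit Id back to its text, or None if it is not indexed."""
        pos = bisect.bisect_left(self.ids, ident)
        if pos < len(self.ids) and self.ids[pos] == ident:
            return self.texts[pos]
        return None


index = HashIndex([("Google", False), ("Google", True), ("IBM", True)])
print(index.lookup(term_id("IBM", True)))
```

Because the URI and the literal take different halves of the same digest, the string "Google" gets two different Ids depending on whether it is indexed as a URI or as a literal.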
(3) Store the RDF data
After the RDF data has been mapped and converted by the method of step (2), it is stored for the first time in the table structure of step (1). The data stored in the table structure is then analyzed and an analysis table S is created, recording the number of triples contained by each Subject and Object, and the 20 most frequent URIs and the 20 most frequent literals with their frequencies. Then, following the table structure of step (1) with Object as the storage entity, the data stored in the table structure is mapped and converted by step (2) and stored a second time, which completes the storage of the RDF data.
For the data in Table 1, the storage form is shown in Table 3:
Table 3. Storage form of the data in Table 1 with Object as the entity
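A rough sketch of the statistics gathered for analysis table S and of the regrouping by Object is given below. The layout of S as a dictionary, the function names, and the flag marking which objects are literals are all assumptions made for illustration; the text only lists what S records.

```python
from collections import Counter, defaultdict


def build_analysis_table(triples, top=20):
    """triples: iterable of (subject, predicate, obj, obj_is_literal)."""
    triples = list(triples)
    uris, literals = Counter(), Counter()
    for s, p, o, is_literal in triples:
        uris.update([s, p])
        (literals if is_literal else uris)[o] += 1
    return {
        "subject_count": Counter(s for s, _p, _o, _l in triples),
        "object_count": Counter(o for _s, _p, o, _l in triples),
        "top_uris": uris.most_common(top),
        "top_literals": literals.most_common(top),
    }


def group_by_object(triples):
    """Second storage pass: one entity per Object."""
    entities = defaultdict(list)
    for s, p, o, _is_literal in triples:
        entities[o].append((p, s))
    return dict(entities)


# Subset of the example data; the literal flags are assumed.
data = [
    ("Larry Page", "founder", "Google", False),
    ("Larry Page", "board", "Google", False),
    ("Android", "developer", "Google", False),
    ("Charles Flint", "founder", "IBM", False),
    ("Android", "version", "4.1", True),
]
S = build_analysis_table(data)
print(S["object_count"]["Google"])  # 3
print(group_by_object(data)["Google"])
```

Grouping by Object puts the three triples that point at Google into a single entity, which is what lets a query with a known object and unknown subject avoid scanning subject rows.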
An efficient fast query method for RDF data stored by the above method is realized by the following steps:
Taking a SPARQL statement containing 6 triple basic graph patterns (Basic Graph Pattern, BGP) as an example, the SPARQL query statement must first be converted so that the underlying stored data can be operated on conveniently. After conversion, the query cost of each triple is estimated and a lowest-cost execution flow is formed, specifically through the following steps:
(a.1) Extraction and conversion of variables
Decompose the triple basic graph patterns (Basic Graph Pattern, BGP) of the SPARQL query statement and determine the number of variables in the query statement, count. The URIs and literals in the query statement are converted to 64-bit binary data using the mapping and conversion method of step (2) of the above RDF data storage method, and the variables contained in the query statement are assigned the values -1 to -count;
For example, the following query:
SELECT ?x ?y WHERE {
?x home "Palo Alto" . //q1
?y founder "IBM" . //q2
?z founder "Google" . //q3
?x memberOf ?z . //q4
?z revenue ?y . //q5
?x developer ?y . //q6
}
Parsing the above query statement gives three variables, ?x, ?y and ?z, which are encoded with the Ids -1, -2 and -3. The other URIs or literals are looked up directly in the index table of step (2).
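The variable encoding of step (a.1) can be illustrated with a small sketch. The patterns are the six triples q1 to q6 of the example query, already split into terms; the function name encode_variables is hypothetical, and Ids are assigned in order of first appearance, which is an assumption that happens to match the -1, -2, -3 assignment in the text.

```python
# The six basic graph patterns of the example query, split into terms.
PATTERNS = [
    ("?x", "home", "Palo Alto"),   # q1
    ("?y", "founder", "IBM"),      # q2
    ("?z", "founder", "Google"),   # q3
    ("?x", "memberOf", "?z"),      # q4
    ("?z", "revenue", "?y"),       # q5
    ("?x", "developer", "?y"),     # q6
]


def encode_variables(patterns):
    """Assign -1 .. -count to variables in order of first appearance."""
    ids = {}
    for pattern in patterns:
        for term in pattern:
            if term.startswith("?") and term not in ids:
                ids[term] = -(len(ids) + 1)
    return ids


print(encode_variables(PATTERNS))  # {'?x': -1, '?y': -2, '?z': -3}
```

Negative Ids cannot collide with the unsigned 64-bit Ids of URIs and literals, so a single integer field can hold either a constant or a variable.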
(a.2) Conversion of basic query graph patterns
Referring to Fig. 1, according to the decomposition of the triple basic graph patterns (Basic Graph Pattern, BGP) in step (a.1), convert each basic graph pattern into a triple query node structure, where the triple query node structure is:
Triple query node structure
{
The Id of node;
The Id of subject;
The Id of predicate;
The Id of object;
The mark of storage mode;
}
For URIs and literals, the Ids of the subject, predicate and object are the respective 64-bit binary data; for variables, the Ids of the subject, predicate and object are the assigned values;
The storage mode flag may select the first storage (access-by-Subject) or the second storage (access-by-Object) of step (3) of the above RDF data storage method. The first storage allows efficient queries from the subject (Subject) to its predicates (Predicate), avoiding the many join operations of conventional storage during queries; when the subject is unknown, the second storage mode may be selected for the query.
Before a single triple query is executed, the number of variables and the number of constants in each triple, and the relationships between the triple's variables and constants, must first be determined; the query order can be decided from these relationships.
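The triple query node structure and the choice of storage mode could be sketched as follows. The field names, the make_node helper and the small integer constants standing in for 64-bit hashes are all hypothetical. The rule "use access-by-Object when the subject is a variable" is one reading of the sentence that the second storage may be selected when the subject is unknown.

```python
from dataclasses import dataclass

BY_SUBJECT = "access-by-Subject"  # first storage of step (3)
BY_OBJECT = "access-by-Object"    # second storage of step (3)


@dataclass(frozen=True)
class TripleNode:
    node_id: int
    subject: int
    predicate: int
    obj: int
    mode: str


def make_node(node_id, pattern, var_ids, const_ids):
    """Build a query node; negative Ids denote variables."""
    def encode(term):
        return var_ids[term] if term.startswith("?") else const_ids[term]

    s, p, o = (encode(term) for term in pattern)
    # Assumption: an unknown (variable) subject selects access-by-Object.
    mode = BY_OBJECT if s < 0 else BY_SUBJECT
    return TripleNode(node_id, s, p, o, mode)


var_ids = {"?x": -1}
const_ids = {"home": 101, "Palo Alto": 102}  # hypothetical stand-ins for 64-bit hashes
node = make_node(1, ("?x", "home", "Palo Alto"), var_ids, const_ids)
print(node)
```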
(a.3) Representation of query join operations
Compare all the triples decomposed from the triple basic graph patterns in step (a.1) with one another. For triples that share a variable, establish a join relationship using the node Id of the step (a.2) structure as the unique identifier, and convert the join relationship into a join operation edge structure, where the join operation edge structure is:
Join operation edge structure
{
Id of the start triple node,
Id of the end triple node,
Id of the shared variable
}
This finally forms the join operation structure in Fig. 2.
Converting and processing the query statement as above yields the encoding and collection of the variables, the triple representation of the basic graph patterns, and the representation of the query's join operations.
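A minimal sketch of step (a.3) follows: every pair of query nodes that shares a variable yields one join edge per shared variable. The node table uses the variable Ids -1, -2, -3 for ?x, ?y, ?z from the example query; the positive constant Ids and the function name join_edges are hypothetical.

```python
from itertools import combinations

# node Id -> (subject Id, predicate Id, object Id) for q1..q6.
# Negative Ids are variables; positive Ids are hypothetical constants.
NODES = {
    1: (-1, 10, 20),
    2: (-2, 11, 21),
    3: (-3, 11, 22),
    4: (-1, 12, -3),
    5: (-3, 13, -2),
    6: (-1, 14, -2),
}


def join_edges(nodes):
    """Return (start node Id, end node Id, shared variable Id) tuples."""
    edges = []
    for (a, ta), (b, tb) in combinations(sorted(nodes.items()), 2):
        shared = {t for t in ta if t < 0} & {t for t in tb if t < 0}
        for var in sorted(shared, reverse=True):
            edges.append((a, b, var))
    return edges


for edge in join_edges(NODES):
    print(edge)
```

For the example query this gives nine edges, for instance (1, 4, -1) because q1 and q4 share ?x.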
(a.4) Compute the query cost of each query
Based on the triple query node structures obtained in step (a.2), perform cost analysis on the join operation edge structures obtained in step (a.3) with a conventional cost algorithm, giving the cost value c of the join operation edge structure. The formula of the cost algorithm is:
TMC(t,m,S)→c
where t is the triple to be queried; m is the first storage or the second storage of step (3) of the RDF data storage method; S is the analysis table;
For example:
(?x founder Google)
Access-by-Object is used for this triple, so the result of the TMC function is the number of triples that analysis table S records for the Object.
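One possible reading of TMC(t, m, S) → c is sketched below. Only the access-by-Object case is described in the text; the access-by-Subject branch and the fall-back to the total triple count are assumptions, as is the dictionary layout of S. The counts in S are taken from the twelve example triples given earlier.

```python
BY_SUBJECT = "access-by-Subject"
BY_OBJECT = "access-by-Object"


def tmc(t, m, S):
    """Estimate the cost c of triple pattern t under storage mode m."""
    s, _p, o = t
    if m == BY_OBJECT and not o.startswith("?"):
        return S["object_count"].get(o, 0)
    if m == BY_SUBJECT and not s.startswith("?"):
        return S["subject_count"].get(s, 0)
    return S["total"]  # assumption: no bound entity means a full scan


# Analysis table S for the example data (layout is hypothetical).
S = {
    "subject_count": {"Charles Flint": 3, "Larry Page": 4, "Android": 5},
    "object_count": {"Google": 3, "IBM": 1, "Palo Alto": 1},
    "total": 12,
}
print(tmc(("?x", "founder", "Google"), BY_OBJECT, S))  # 3
```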
(a.5) Generation of the query plan
Sort the cost values c of all join operation edge structures obtained in step (a.4) in ascending order to get a node sequence ordered by cost value. Take the node with the smallest c as the start node, then take the next nodes in the sequence in turn; if the variables in a node have not yet been queried, perform a join query, until the variables in all nodes have been queried, which completes the query.
Referring to Fig. 2, the query plan first takes the first triple query node in the plan as the starting point, then selects the 4th query node in the plan graph and, according to the information in the given query plan, performs a join on variable ?x, giving an intermediate result set over the two variables <?x, ?z>. This intermediate result set is then joined with the 5th query triple node on variable ?z, giving an intermediate table over the three variables <?z, ?x, ?y>. Continuing in this way until all query statements have been executed gives the intermediate table <?z, ?x, ?y>. Finally a SELECT operation is applied to the query result to extract the values of variables ?x and ?y.
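The ordering in step (a.5) might be sketched as a greedy walk over the cost-sorted nodes. Two points are assumptions rather than statements from the text: costs are attached to nodes instead of edges, and among the remaining nodes the cheapest one that shares a variable with the nodes already chosen is preferred, so that each step is a join instead of a cross product. The cost values are hypothetical and chosen so that the plan starts 1, 4, 5 as in the Fig. 2 description.

```python
def plan(nodes, costs):
    """nodes: node Id -> set of variable Ids; costs: node Id -> cost c."""
    remaining = sorted(nodes, key=lambda n: costs[n])
    order = [remaining.pop(0)]          # cheapest node is the start node
    bound = set(nodes[order[0]])        # variables queried so far
    while remaining:
        # Prefer the cheapest node that joins on an already bound variable.
        nxt = next((n for n in remaining if nodes[n] & bound), remaining[0])
        remaining.remove(nxt)
        order.append(nxt)
        bound |= nodes[nxt]
    return order


# Variables of q1..q6 (-1, -2, -3 stand for ?x, ?y, ?z).
NODES = {1: {-1}, 2: {-2}, 3: {-3}, 4: {-1, -3}, 5: {-3, -2}, 6: {-1, -2}}
COSTS = {1: 1, 4: 2, 5: 3, 2: 4, 3: 5, 6: 6}  # hypothetical cost values
print(plan(NODES, COSTS))  # [1, 4, 5, 2, 3, 6]
```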
(a.6) Establish the cache mechanism
During data querying, a cache mechanism is established to cache query results (see Fig. 3) and so improve query efficiency. The specific operation is:
Perform a hash operation on the set of triple query node structures obtained in step (a.2) for the query statement input by the user to get a hash result value. If the value is in the cache list, fetch the cached result directly and return it to the user; otherwise repeat steps (a.3) to (a.5) above, store the result on disk, and store the corresponding address identifier and the hash result value in the cache list. When the cache exceeds its configured capacity, the entry with the lowest query frequency is deleted.
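The cache of step (a.6) can be sketched as a small least-frequently-used map keyed by a hash of the query's node set. This is a simplified illustration: results are kept in memory here, whereas the text stores them on disk and keeps only an address identifier in the cache list. SHA-1 over a canonical string is an assumed choice of hash, and the class and method names are hypothetical.

```python
import hashlib


class QueryCache:
    """Cache list keyed by a hash of the triple query node set."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # hash value -> [result, query frequency]

    @staticmethod
    def key(nodes):
        # Sort so that the same set of nodes always hashes identically.
        canonical = repr(sorted(nodes)).encode("utf-8")
        return hashlib.sha1(canonical).hexdigest()

    def get_or_compute(self, nodes, run_query):
        k = self.key(nodes)
        if k in self.entries:
            self.entries[k][1] += 1
            return self.entries[k][0]
        result = run_query(nodes)  # stands in for steps (a.3) to (a.5)
        if len(self.entries) >= self.capacity:
            coldest = min(self.entries, key=lambda e: self.entries[e][1])
            del self.entries[coldest]  # drop the least frequently queried entry
        self.entries[k] = [result, 1]
        return result


cache = QueryCache(capacity=2)
query = [(1, -1, 10, 20), (4, -1, 12, -3)]
print(cache.get_or_compute(query, lambda nodes: len(nodes)))  # computed
print(cache.get_or_compute(query, lambda nodes: -1))          # served from cache
```

The second call returns the cached value 2 and never runs its query function, because the node set hashes to the same key.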

Claims (3)

1. An RDF data storage method, characterized by comprising the following steps:
(1) Design an entity-oriented storage structure for RDF data
(1.1) In an entity-oriented manner, data is stored in a relational database table of n rows and k columns, where k is the average number of predicates per subject in the RDF data and n is the total of the row counts line needed by all subjects. When the predicate count sum of a single subject satisfies sum ≤ k, the required row count is line = 1; when sum > k, multi-row storage is used and the required row count is line = (sum/k) + 1;
(1.2) After k is determined, predicates are converted into column indexes by the predicate mapping algorithm, giving the table structure of n rows and k columns; the method for converting a predicate into a column index is:
(1.2.1) Compute the column index with the predicate mapping algorithm. In its formula (not reproduced in this text), h1, h2 … hj are the j hash functions and i is the column index;
(1.2.2) When all j hash functions have been computed and no free index has been found, open a new row and store the data at the index computed by h1;
(2) Design the storage mapping for RDF data
The URIs and literals of the RDF data are each converted to 64-bit binary data with a hash algorithm: a URI takes the high 64 bits of the hash and a literal takes the low 64 bits. The converted binary data is stored in a hash index table whose rows are sorted in ascending order, so that lookups can be mapped and converted quickly by binary search;
(3) Store the RDF data
After the RDF data has been mapped and converted by the method of step (2), it is stored for the first time in the table structure of step (1). The data stored in the table structure is then analyzed and an analysis table S is created, recording the number of triples contained by each Subject and Object, and the 20 most frequent URIs and the 20 most frequent literals with their frequencies. Then, following the table structure of step (1) with Object as the storage entity, the data stored in the table structure is mapped and converted by step (2) and stored a second time, which completes the storage of the RDF data.
2. An RDF data query method matched to the RDF data storage method of claim 1, characterized by comprising the following steps:
(a.1) Extraction and conversion of variables
Decompose the triple basic graph patterns in the SPARQL query statement and determine the number of variables in the query statement, count. The URIs and literals in the query statement are converted to 64-bit binary data using the mapping of step (2) of the storage method, and the variables are assigned the values -1 to -count;
(a.2) Conversion of basic query graph patterns
According to the decomposition result of step (a.1), convert each basic graph pattern into a triple query node structure, where the triple query node structure is:
Triple query node structure
The storage-mode flag selects either the first storage or the second storage of step (3) of the RDF data storage method;
For URIs and literals, the Ids of the subject, predicate and object are the respective 64-bit binary data; for variables, the Ids of the subject, predicate and object are the values assigned to them;
(a.3) Representation of join operations in the query
The triples decomposed from the basic graph patterns in step (a.1) are compared with one another. For triples that share a variable, a join relationship is established using the node Id of the structure from step (a.2) as the unique identifier, and the join relationship is converted into a join operation edge structure, where the join operation edge structure is:
Join operation edge structure
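The node and edge structures referred to in steps (a.2) and (a.3) are not reproduced in this text, so the following is only a sketch under stated assumptions: the field layouts of `QueryNode` and `JoinEdge` are hypothetical, built from the fields the claim mentions (a node Id, a storage-mode flag, and the subject, predicate and object Ids).

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class QueryNode:
    node_id: int   # unique identifier of the node
    mode: int      # 1 = first storage, 2 = second (Object-keyed) storage
    s: int         # subject Id: 64-bit value, or a negative variable number
    p: int         # predicate Id
    o: int         # object Id

    def variables(self):
        """Variables are the negative Ids assigned in step (a.1)."""
        return {x for x in (self.s, self.p, self.o) if x < 0}


@dataclass(frozen=True)
class JoinEdge:
    left: int            # node Id of the first triple
    right: int           # node Id of the second triple
    shared: frozenset    # variable numbers the two triples have in common


def build_join_edges(nodes):
    """Compare every pair of nodes and emit an edge where they share a variable."""
    edges = []
    for a, b in combinations(nodes, 2):
        shared = a.variables() & b.variables()
        if shared:
            edges.append(JoinEdge(a.node_id, b.node_id, frozenset(shared)))
    return edges
```

For example, the patterns `?x knows ?y` and `?y name ?n` share `?y`, so they yield one edge, while a pattern with no variable in common with the others yields none.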
(a.4) Calculating the query cost of each query
Based on the triple query node structures obtained in step (a.2), a cost analysis is carried out with the cost algorithm for each join operation edge structure obtained in step (a.3), giving the cost value c of the join operation edge structure. The formula of the cost algorithm is:
TMC(t,m,S)→c
where: t is the triple to be queried; m is the first storage or the second storage of step (3) of the RDF data storage method; S is the analysis table;
(a.5) Generation of the query plan
The cost values c of all join operation edge structures obtained in step (a.4) are sorted in ascending order, giving a node sequence ordered by cost value. The node with the smallest c value in the sequence is chosen as the start node, and the following nodes in the sequence are then chosen one by one; if the variables in a node have not yet been queried, a join query is performed, until the variables in all nodes have been queried, which completes the query of the statement.
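The ordering in step (a.5) can be sketched as a greedy pass over the cost-sorted sequence. The cost function TMC itself is not spelled out in the claim, so the sketch takes already computed cost values as input; the function name `generate_plan` and the input layout are hypothetical.

```python
def generate_plan(costed_nodes):
    """Return node Ids in execution order.

    `costed_nodes` is a list of (cost, node_id, variables) entries, where
    `variables` is the set of variable numbers used by that node.
    """
    ordered = sorted(costed_nodes, key=lambda entry: entry[0])  # ascending cost
    plan, bound = [], set()
    for cost, node_id, variables in ordered:
        # The cheapest node starts the plan; later nodes are joined only if
        # they still contain a variable that has not been queried.
        if not plan or not set(variables) <= bound:
            plan.append(node_id)
            bound |= set(variables)
    return plan
```

This follows the claim's wording literally: a node whose variables are all already bound is skipped. A complete engine would presumably still need to apply such a node as a filter on the intermediate result.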
3. The RDF data query method according to claim 2, characterized in that step (a.5) is followed by a step (a.6) of establishing a caching mechanism, specifically:
A hash operation is performed on the set of triple query node structures obtained in step (a.2) from the query statement entered by the user, giving the result value of the hash function. If this value is present in the cache list, the cached result is fetched directly and returned to the user; otherwise, steps (a.3) to (a.5) are carried out, the result is stored on the hard disk, and the corresponding address identifier and the result value of the hash function are stored in the cache list.
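The caching step might look like the sketch below. It assumes that query nodes are plain tuples of integers, that results are JSON-serializable, and that SHA-256 over a canonical JSON encoding is an acceptable stand-in for the unnamed hash function; the class `QueryCache` and its method names are hypothetical.

```python
import hashlib
import json
import os


class QueryCache:
    def __init__(self, directory):
        self.directory = directory   # where results are written on disk
        self.cache_list = {}         # hash value -> file path (address identifier)

    @staticmethod
    def key(nodes):
        """Hash the set of query nodes; sorting makes the key order-independent."""
        canonical = json.dumps(sorted(nodes)).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()

    def get_or_compute(self, nodes, run_query):
        k = self.key(nodes)
        path = self.cache_list.get(k)
        if path is not None:
            # Cache hit: read the stored result back from disk.
            with open(path, encoding="utf-8") as f:
                return json.load(f)
        # Cache miss: run steps (a.3) to (a.5), represented here by run_query.
        result = run_query(nodes)
        path = os.path.join(self.directory, k + ".json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(result, f)
        self.cache_list[k] = path
        return result
```

Because variables are numbered by order of appearance in step (a.1), two queries that differ only in variable names produce the same node set and therefore the same cache key.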
CN201510955821.5A 2015-12-18 2015-12-18 A kind of date storage method and querying method of RDF Active CN105630881B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510955821.5A CN105630881B (en) 2015-12-18 2015-12-18 A kind of date storage method and querying method of RDF

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510955821.5A CN105630881B (en) 2015-12-18 2015-12-18 A kind of date storage method and querying method of RDF

Publications (2)

Publication Number Publication Date
CN105630881A CN105630881A (en) 2016-06-01
CN105630881B true CN105630881B (en) 2019-04-09

Family

ID=56045814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510955821.5A Active CN105630881B (en) 2015-12-18 2015-12-18 A kind of date storage method and querying method of RDF

Country Status (1)

Country Link
CN (1) CN105630881B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10445361B2 (en) * 2016-12-15 2019-10-15 Microsoft Technology Licensing, Llc Caching of subgraphs and integration of cached subgraphs into graph query results
US10242223B2 (en) 2017-02-27 2019-03-26 Microsoft Technology Licensing, Llc Access controlled graph query spanning
CN107066573B (en) * 2017-04-10 2020-04-17 北京工商大学 Data association access method based on three-dimensional table structure and application
CN107229704A (en) * 2017-05-25 2017-10-03 深圳大学 A kind of resource description framework querying method and system based on KSP algorithms
CN108268580A (en) * 2017-07-14 2018-07-10 广东神马搜索科技有限公司 The answering method and device of knowledge based collection of illustrative plates
CN107480199B (en) * 2017-07-17 2020-06-12 深圳先进技术研究院 Query reconstruction method, device, equipment and storage medium of database
CN110019911A (en) * 2017-12-29 2019-07-16 苏州工业职业技术学院 Support the querying method and device of the knowledge mapping of Knowledge Evolvement
EP3514706A1 (en) * 2018-01-18 2019-07-24 Université Jean-Monnet Method for processing a question in natural language
CN109446358A (en) * 2018-08-27 2019-03-08 电子科技大学 A kind of chart database accelerator and method based on ID caching technology
CN109656946B (en) * 2018-09-29 2022-12-16 创新先进技术有限公司 Multi-table association query method, device and equipment
CN112287043B (en) * 2020-12-29 2021-06-18 成都数联铭品科技有限公司 Automatic graph code generation method and system based on domain knowledge and electronic equipment
CN112732746B (en) * 2021-01-13 2023-05-12 首都师范大学 SPARQL endpoint combination-based dynamic connection ordering method
CN114996370A (en) * 2022-08-03 2022-09-02 杰为软件系统(深圳)有限公司 Data conversion and migration method from relational database to semantic triple

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521299A (en) * 2011-11-30 2012-06-27 华中科技大学 Method for processing data of resource description framework
CN103970820A (en) * 2014-01-23 2014-08-06 河海大学 Method and device for visualization of Web multimedia resource open annotation data
CN104462609A (en) * 2015-01-06 2015-03-25 福州大学 RDF data storage and query method combined with star figure coding

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7426525B2 (en) * 2003-08-08 2008-09-16 Hewlett-Packard Development Company, L.P. Method and apparatus for identifying an object using an object description language
US8078646B2 (en) * 2008-08-08 2011-12-13 Oracle International Corporation Representing and manipulating RDF data in a relational database management system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"一种基于聚类模式的RDF数据聚类方法";袁柳等;《计算机科学》;20151031;第42卷(第10期);第266-269页 *

Also Published As

Publication number Publication date
CN105630881A (en) 2016-06-01

Similar Documents

Publication Publication Date Title
CN105630881B (en) A kind of date storage method and querying method of RDF
CN103646032B (en) A kind of based on body with the data base query method of limited natural language processing
Özsu A survey of RDF data management systems
Hartig et al. Publishing and consuming provenance metadata on the web of linked data
Etcheverry et al. Enhancing OLAP analysis with web cubes
Görlitz et al. Federated data management and query optimization for linked open data
US7702685B2 (en) Querying social networks
US11599535B2 (en) Query translation for searching complex structures of objects
Bikakis et al. The XML and semantic web worlds: technologies, interoperability and integration: a survey of the state of the art
CN109947998A (en) The calculating data lineage of network across heterogeneous system
CN104636478A (en) Information query method and device
US8825621B2 (en) Transformation of complex data source result sets to normalized sets for manipulation and presentation
CN104137095B (en) System for evolution analysis
de la Vega et al. Mortadelo: Automatic generation of NoSQL stores from platform-independent data models
Masmoudi et al. Knowledge hypergraph-based approach for data integration and querying: Application to Earth Observation
Banane et al. SPARQL2Hive: An approach to processing SPARQL queries on Hive based on meta-models
US20140067853A1 (en) Data search method, information system, and recording medium storing data search program
CN108241709A (en) A kind of data integrating method, device and system
CN101719162A (en) Multi-version open geographic information service access method and system based on fragment pattern matching
Fernández et al. Management of big semantic data
KR101897760B1 (en) A system of converting and storing triple for linked open data cloud information service and a method thereof
RU2605387C2 (en) Method and system for storing graphs data
Babalou et al. Towards a semantic toolbox for reproducible knowledge graph generation in the biodiversity domain-how to make the most out of biodiversity data
Hauswirth et al. Linked data management
Yuksel et al. An analysis of RDF storage models and query optimization techniques

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant