CN105930419B - RDF data distributed parallel semantic coding method - Google Patents


Info

Publication number
CN105930419B
Authority
CN
China
Prior art keywords
triple
class
file
coding
item
Prior art date
Legal status
Active
Application number
CN201610242787.1A
Other languages
Chinese (zh)
Other versions
CN105930419A (en)
Inventor
汪璟玢
郑翠春
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN201610242787.1A
Publication of CN105930419A
Application granted
Publication of CN105930419B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The present invention relates to an RDF data distributed parallel semantic coding method comprising the following steps. Step S1: read in the RDF ontology file and construct a class relation model and an attribute relation model. Step S2: read in the RDF data file, split each triple into its triple items, partition the triple items by class, delete duplicate triple items, and generate prefix codes; the triple items are filtered to keep RDF triple coding consistent, so that the same triple item is never assigned different codes. Step S3: encode the triple items and generate a dictionary table. Step S4: encode the triples and generate the encoded triple file. Step S5: taking the result file of step S4 as input, invert it into the original RDF data file using the dictionary table from step S3. By incorporating the ontology, the invention efficiently performs compressed encoding and inversion of large-scale data in a distributed environment.

Description

RDF data distributed parallel semantic coding method
Technical field
The present invention relates to the field of Semantic Web technology, and in particular to an RDF data distributed parallel semantic coding method.
Background art
Because RDF data sets are very large, managing them is difficult. To speed up querying of and reasoning over RDF data and to reduce storage space, the common practice is to compress-encode the triples. Compressed encoding has proven to be efficient: each original triple item (subject, predicate, or object) is replaced by a numeric ID, so that all triple data are ultimately converted into numeric form. Because of memory limits, a centralized environment is unsuitable for encoding large-scale data, and distributed parallel compression of RDF data is a relatively new research field. Goodman et al. proposed a parallel hashing method with linear probing on the Cray XMT that encodes in parallel against a single dictionary table. The encoding time of this algorithm scales linearly with the number of cores used, but the method requires all data to be held in memory and depends heavily on the shared-memory architecture of the Cray XMT, so it is unsuitable for ordinary distributed-memory systems. Long Cheng et al. compressed RDF data using the X10 language: the triples are first filtered, then each triple item is assigned by its hash value to one of several nodes for local encoding, producing multiple dictionary tables. Urbani et al. proposed a distributed MapReduce compression algorithm with two main stages: in the compression stage the triples are compressed and a dictionary table is built; in the inversion stage the compressed triples are joined with the dictionary table to regenerate the original triple data. This algorithm is not efficient enough in the inversion stage.
These are the latest results on distributed parallel compression of RDF data. The three effective parallel RDF compression algorithms above can compress massive RDF data in parallel, but none of them takes the ontology file into account, so the encoded triples carry no semantic information, which hinders later distributed querying or semantic reasoning. No existing method performs parallel semantic encoding of RDF data in combination with the ontology file.
In summary, a centralized environment cannot meet the demands of massive data; compressed encodings produced in distributed environments carry no semantic information, which hinders distributed querying or reasoning; and some distributed compression algorithms are not efficient enough in the inversion stage.
The technical problems to be solved are: (1) how to guarantee the uniqueness of triple item codes in a distributed environment, i.e., identical triple items are never assigned different codes; (2) how to guarantee lossless encoding in a distributed environment, i.e., the encoded triples can be inverted back into the original triples; (3) how to design, together with the proposed distributed scheme, a corresponding parallel encoding scheme that meets the needs of distributed parallel semantic encoding of large-scale data.
Summary of the invention
In view of this, the object of the present invention is to provide an RDF data distributed parallel semantic coding method that encodes RDF data mainly in combination with the ontology, so that the codes of RDF triples carry semantic information and follow a regular structure. This facilitates distributed querying and semantic reasoning, and, by incorporating the ontology, allows compressed encoding and inversion of large-scale data to be performed efficiently in a distributed environment.
The present invention is realized by the following scheme: an RDF data distributed parallel semantic coding method, comprising the following steps:
Step S1: read in the RDF ontology file, construct the class relation model and the attribute relation model, and generate a mapping file of classes and their codes and a mapping file of attributes and their codes;
Step S2: read in the RDF data file, split each triple into its triple items, partition the triple items by class, delete duplicate triple items, and generate prefix codes; the triple items are filtered to keep RDF triple coding consistent, so that the same triple item is never assigned different codes;
Step S3: encode the triple items and generate the dictionary table;
Step S4: encode the triples and generate the encoded triple file;
Step S5: taking the result file of step S4 as input, invert it into the original RDF data file according to the dictionary table from step S3.
Further, in step S1, the ontology file in RDF format is first parsed with Jena, a relation tree is generated from the class relations, and the class relation model is constructed;
Here, a class/attribute type flag Flag is defined to distinguish classes from attributes; for the current datum v:
Define the tree node code width TreenodeDigit, abbreviated TD; if the total number of nodes is M:
Define the class code TreeClasscode, abbreviated TC. TC consists of Flag, the direct-parent count IPF, the class-node sequence code, and the node sequence code. With M the total number of nodes, the class-node sequence code and the node sequence code each have TD(M) digits. TC(h,i) denotes the class code of the i-th node A on layer h; f(h,i) denotes the node sequence code of the i-th node A on layer h; REPT(0,n) denotes a string of n zeros; anc(h) denotes the class-node sequence code on layer h; and f(h-1,m) denotes the node sequence code of node B, the parent class of node A. Then, with & denoting string concatenation:
$$TC(h,i) = Flag \,\&\, IPF \,\&\, \mathrm{REPT}\big(0,\, TD(M) - TD(f(h-1,m))\big) \,\&\, f(h-1,m) \,\&\, \mathrm{REPT}\big(0,\, TD(M) - TD(f(h,i))\big) \,\&\, f(h,i)$$
When IPF > 1, the class-node sequence code is the concatenation of the node sequence codes of all direct parents;
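As a rough illustration of the TC definition, the sketch below assembles a class code as a digit string. It assumes, as the Part-timeGraduateStudent example in the embodiment suggests, that & is string concatenation, that the class Flag is the digit 0, and that IPF is written as one decimal digit; TD(M), whose formula is not reproduced here, is passed in by the caller. The function names are hypothetical, and the all-zero parent field for a root class (IPF = 0) is also an assumption.

```python
from __future__ import annotations


def rept(ch: str, n: int) -> str:
    """REPT(ch, n): n copies of ch (empty when n <= 0)."""
    return ch * max(n, 0)


def pad(seq_code: str, td_m: int) -> str:
    """REPT(0, TD(M) - TD(code)) & code: left-pad a sequence code to TD(M) digits."""
    if len(seq_code) > td_m:
        raise ValueError(f"sequence code {seq_code!r} exceeds TD(M) = {td_m} digits")
    return rept("0", td_m - len(seq_code)) + seq_code


def class_code(flag: str, parent_seq_codes: list[str], own_seq_code: str, td_m: int) -> str:
    """TC(h, i) = Flag & IPF & anc(h) & padded f(h, i).

    parent_seq_codes holds f(h-1, m) for every direct parent; when IPF > 1,
    anc(h) is the concatenation of all of them in the given order.
    """
    ipf = len(parent_seq_codes)
    if ipf > 9:
        raise ValueError("this sketch writes IPF as a single digit")
    if ipf == 0:
        # Assumption: a root class (IPF = 0) gets an all-zero parent field.
        anc = rept("0", td_m)
    else:
        anc = "".join(pad(p, td_m) for p in parent_seq_codes)
    return flag + str(ipf) + anc + pad(own_seq_code, td_m)


# Parents with sequence codes 09 and 11, own sequence code 13, TD(M) = 2.
print(class_code("0", ["9", "11"], "13", 2))  # 02091113
```

Padding each parent code to exactly TD(M) digits is what keeps the fields aligned, so a class code can later be split back into Flag, IPF, parent codes, and its own sequence code by position alone.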
Define the attribute code TreePropertycode, abbreviated TP. TP consists of Flag, the class code, the parent-attribute node sequence code, and the node sequence code. With M the total number of nodes, the parent-attribute node sequence code and the node sequence code each have TD(M) digits. TP(h,i) denotes the attribute code of the i-th node C on layer h; letting R be the class to which C belongs, the code of class node R is denoted TC(p,r); f(h,i) denotes the node sequence code of the i-th node C on layer h; REPT(0,n) denotes a string of n zeros; anc(h) denotes the attribute-node sequence code on layer h; and f(h-1,m) denotes the node sequence code of node D, the parent attribute of node C. Then:
$$TP(h,i) = Flag \,\&\, TC(p,r) \,\&\, \mathrm{REPT}\big(0,\, TD(M) - TD(f(h-1,m))\big) \,\&\, f(h-1,m) \,\&\, \mathrm{REPT}\big(0,\, TD(M) - TD(f(h,i))\big) \,\&\, f(h,i)$$
The relation tree is a multiway tree. A breadth-first traversal of it, combined with the class-code definition, generates the class codes, and the relation tree of the attributes is obtained in the same way.
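The attribute code can be sketched in the same way. In this hypothetical helper, & is again string concatenation, the attribute Flag value ("1" below) is an assumed placeholder because the text does not state it, and a top-level attribute with no parent attribute is assumed to get an all-zero parent field. The second helper reads the embedded class code back out, which is how a predicate's code can reveal the class it belongs to.

```python
from __future__ import annotations


def pad(seq_code: str, td_m: int) -> str:
    """Left-pad a node sequence code with zeros to TD(M) digits."""
    if len(seq_code) > td_m:
        raise ValueError(f"sequence code {seq_code!r} exceeds TD(M) = {td_m} digits")
    return "0" * (td_m - len(seq_code)) + seq_code


def attribute_code(flag: str, class_code: str, parent_seq_code: str | None,
                   own_seq_code: str, td_m: int) -> str:
    """TP(h, i) = Flag & TC(p, r) & padded f(h-1, m) & padded f(h, i)."""
    # Assumption: no parent attribute -> TD(M) zeros in the parent field.
    parent = pad(parent_seq_code, td_m) if parent_seq_code else "0" * td_m
    return flag + class_code + parent + pad(own_seq_code, td_m)


def class_of_attribute(tp: str, td_m: int) -> str:
    """Recover TC(p, r): drop the 1-digit Flag and the two TD(M)-digit fields."""
    return tp[1:len(tp) - 2 * td_m]


# Hypothetical values: attribute Flag "1", domain class code "010507", TD(M) = 2.
tp = attribute_code("1", "010507", "3", "4", 2)
print(tp)                         # 10105070304
print(class_of_attribute(tp, 2))  # 010507
```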
Further, in step S2, triples are partitioned by class. The predicate of a triple is an attribute from the RDF ontology file whose code was already generated when the attribute relation model was built, so only the subjects and objects of triples need to be partitioned by class. Because a triple item (TripleItem) may occur more than once in the RDF data, the TripleItems are filtered while they are being partitioned by class;
The TripleItem classification and filtering algorithm is as follows: input, an RDF triple-format file; output, class files in which the TripleItems are partitioned by class, and a prefix-code relation file;
Because different TripleItems may share the same URI prefix, identical prefixes are extracted from the RDF data file so that the codes have semantic similarity, i.e., similar URIs are encoded as similar numbers;
This step requires overriding MapReduce's MultipleOutputFormat so that the output is written as one file per class.
Preferably, a triple item TripleItem is the subject, predicate, or object of a triple. Given n triples in total, if v is the subject, predicate, or object of any of the n triples, then v ∈ TripleItem.
Further, in step S3, the result files of the TripleItem classification and filtering algorithm serve as the input of TripleItem encoding. In the Map stage the TripleItem class files are processed; in the Reduce stage the TripleItems are encoded, and a dictionary mapping file from each TripleItem to its code is generated and stored on the cluster's HDFS. Each TripleItem code has the format: class code + prefix code + mantissa code;
The TripleItem encoding algorithm is as follows: input, the class files in which the TripleItems are partitioned by class, and the prefix-code relation file; output, the dictionary mapping file.
Further, in step S4, every triple in the input RDF triple-format file is encoded using the dictionary mapping table generated by the TripleItem encoding algorithm: the TripleItems are joined with the dictionary mapping table to produce the triple codes;
The triple encoding algorithm is as follows: input, the RDF triple-format file and the dictionary mapping file; output, the encoded RDF file.
Further, in step S5, since the SCOM algorithm builds a dictionary table of TripleItems and their codes, the SCOM inversion algorithm uses this dictionary table to invert the encoded RDF file into the original RDF file.
Compared with the prior art, the present invention has the following advantages: the proposed RDF data distributed parallel semantic coding scheme efficiently performs distributed parallel encoding of large-scale RDF data and supports inversion of the RDF data; compared with existing encoding schemes, it has significant advantages in both the compressed-encoding stage and the inversion stage; and it can improve RDFS rule-based reasoning.
Description of the drawings
Fig. 1 is a schematic diagram of the framework of the method of the present invention.
Fig. 2 is a schematic diagram of part of the LUBM class relation model in the present invention.
Fig. 3 is a schematic diagram of part of the LUBM attribute relation model in the present invention.
Detailed description of embodiments
The present invention is further described below with reference to the drawings and embodiments.
This embodiment provides an RDF data distributed parallel semantic coding method (Semantic Coding with Ontology on MapReduce, abbreviated SCOM). Exploiting the characteristics of MapReduce, it builds the class relation model and the attribute relation model from the ontology and classifies and encodes the RDF data according to these models, thereby achieving distributed parallel compressed encoding of RDF data. The SCOM scheme is divided into a compressed-encoding stage and an inversion stage; as shown in Fig. 1, it comprises the following steps:
Step S1: read in the RDF ontology file, construct the class relation model and the attribute relation model, and generate a mapping file of classes and their codes and a mapping file of attributes and their codes;
Step S2: read in the RDF data file, split each triple into its triple items, partition the triple items by class, delete duplicate triple items, and generate prefix codes; the triple items are filtered to keep RDF triple coding consistent, so that the same triple item is never assigned different codes;
Step S3: encode the triple items and generate the dictionary table;
Step S4: encode the triples and generate the encoded triple file;
Step S5: taking the result file of step S4 as input, invert it into the original RDF data file according to the dictionary table from step S3.
In this embodiment, in step S1, class codes must be generated so that the codes of the RDF data carry semantic information. Because all predicates in RDF data are defined in the ontology file and are far fewer than the subjects or objects in the RDF data, this step also encodes the attributes after the classes have been encoded.
In this embodiment, the following definitions are given:
A class/attribute type flag Flag is defined to distinguish classes from attributes; for the current datum v:
Define the tree node code width TreenodeDigit, abbreviated TD; if the total number of nodes is M:
Define the class code TreeClasscode, abbreviated TC. TC consists of Flag, the direct-parent count IPF, the class-node sequence code, and the node sequence code. With M the total number of nodes, the class-node sequence code and the node sequence code each have TD(M) digits. TC(h,i) denotes the class code of the i-th node A on layer h; f(h,i) denotes the node sequence code of the i-th node A on layer h; REPT(0,n) denotes a string of n zeros; anc(h) denotes the class-node sequence code on layer h; and f(h-1,m) denotes the node sequence code of node B, the parent class of node A. Then, with & denoting string concatenation:
$$TC(h,i) = Flag \,\&\, IPF \,\&\, \mathrm{REPT}\big(0,\, TD(M) - TD(f(h-1,m))\big) \,\&\, f(h-1,m) \,\&\, \mathrm{REPT}\big(0,\, TD(M) - TD(f(h,i))\big) \,\&\, f(h,i)$$
When IPF > 1, the class-node sequence code is the concatenation of the node sequence codes of all direct parents;
Define the attribute code TreePropertycode, abbreviated TP. TP consists of Flag, the class code, the parent-attribute node sequence code, and the node sequence code. With M the total number of nodes, the parent-attribute node sequence code and the node sequence code each have TD(M) digits. TP(h,i) denotes the attribute code of the i-th node C on layer h; letting R be the class to which C belongs, the code of class node R is denoted TC(p,r); f(h,i) denotes the node sequence code of the i-th node C on layer h; REPT(0,n) denotes a string of n zeros; anc(h) denotes the attribute-node sequence code on layer h; and f(h-1,m) denotes the node sequence code of node D, the parent attribute of node C. Then:
$$TP(h,i) = Flag \,\&\, TC(p,r) \,\&\, \mathrm{REPT}\big(0,\, TD(M) - TD(f(h-1,m))\big) \,\&\, f(h-1,m) \,\&\, \mathrm{REPT}\big(0,\, TD(M) - TD(f(h,i))\big) \,\&\, f(h,i)$$
In this embodiment, step S1 first parses the ontology file in RDF format with Jena, generates a relation tree from the class relations (subclass and superclass), and constructs the class relation model. The relation tree is a multiway tree; a breadth-first traversal of it, combined with the class-code definition, generates the class codes, and the relation tree of the attributes is obtained in the same way.
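A breadth-first numbering of the class tree might look like the sketch below. It assumes that f(h, i) is simply the 1-based position i of a node within layer h, that a class with several parents is placed where the traversal first reaches it, and that TD(M) is the number of decimal digits of M; the text does not fix any of these details, and the class names in the example are only illustrative.

```python
from __future__ import annotations

from collections import deque


def bfs_layers(children: dict[str, list[str]], root: str) -> dict[str, tuple[int, int]]:
    """Return {class: (h, i)} with layer h (root = 0) and 1-based index i in that layer."""
    position = {root: (0, 1)}
    layer_size = {0: 1}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        h = position[node][0]
        for child in children.get(node, []):
            if child in position:  # already reached through another parent
                continue
            layer_size[h + 1] = layer_size.get(h + 1, 0) + 1
            position[child] = (h + 1, layer_size[h + 1])
            queue.append(child)
    return position


def td(m: int) -> int:
    """Assumed TD(M): number of decimal digits needed to write M."""
    return len(str(m))


tree = {
    "Things": ["Person", "Organization"],
    "Person": ["Student", "Employee"],
    "Organization": ["University"],
}
layers = bfs_layers(tree, "Things")
print(layers["University"], td(len(layers)))  # (2, 3) 1
```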
Taking a fragment of the classes in the LUBM data set as an example, and assuming the code width given by the tree node code width definition is 2, the resulting class relation model is shown in Fig. 2. In each code, the first digit is the class flag, the second digit is the direct-parent count, the third and fourth digits together form the direct-parent class-node sequence code of the current class, and the last two digits form the node sequence code of the current class. The Things class is the parent of all classes, and its direct-parent count is 0 (i.e., IPF = 0).
Since a class may have more than one parent, suppose the direct parents of the Part-timeGraduateStudent class (part-time graduate student) are the GraduateStudent class (graduate student) and the TeachingAssistant class (teaching assistant) in the class relation model of Fig. 2. The class-node sequence code of Part-timeGraduateStudent is then the concatenation of the sequence codes of GraduateStudent and TeachingAssistant (i.e., 0911), and Part-timeGraduateStudent is encoded as 02091113, where IPF = 2 indicates that Part-timeGraduateStudent has two direct parents.
The attribute relation tree is built in the same way as the class relation tree, and the attributes are encoded accordingly. Unlike class codes, attribute codes also embed class code information, so that the attribute codes carry semantic information.
Taking a fragment of the attributes in the LUBM data set as an example, assuming a code width of 2 as given by the tree node code width definition, and combining the class codes of Fig. 2, the attribute relation model is shown in Fig. 3. In each code, the first digit is the attribute flag, the second through seventh digits form the class code of the current attribute (the domain class of the attribute), the eighth and ninth digits form the direct-parent attribute node sequence code of the current attribute, and the last two digits form the node sequence code of the current attribute. This step also generates the attribute domain and range files.
With this encoding, the codes of the RDF data described below gain semantic information: when a TripleItem is a subject or object, its class can be determined from its code; when a TripleItem is a predicate, its code reveals the parent attribute of the predicate or the class it belongs to.
In this embodiment, in step S2, triples are partitioned by class. The predicate of a triple is an attribute from the RDF ontology file whose code was already generated when the attribute relation model was built, so only the subjects and objects need to be partitioned by class. Because a TripleItem may occur more than once in the RDF data, duplicate TripleItems are deleted so that each TripleItem is unique, which guarantees that identical TripleItems are never assigned different codes. Further, because different TripleItems may share the same URI prefix, identical prefixes (namespaces) are extracted from the RDF data file so that the codes have semantic similarity, i.e., similar URIs are encoded as similar numbers. In addition, this step requires overriding MapReduce's MultipleOutputFormat so that the output is written as one file per class.
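Namespace extraction and prefix-code assignment could be sketched as follows. The split rule (cut an IRI after its last '#' or '/'), the two-digit code width, the 00 code for terms without a prefix, and first-seen numbering are assumptions chosen so that the example reproduces the codes listed in Table 5. In a MapReduce job, the assignment would presumably have to happen in one place (for example, a single reducer) to keep the codes consistent across the cluster.

```python
from __future__ import annotations


def split_namespace(term: str) -> tuple[str, str]:
    """Split an IRI into (namespace, local name) after its last '#' or '/'.

    Terms that are not http(s) IRIs, such as literals, get an empty namespace.
    """
    if not term.startswith(("http://", "https://")):
        return "", term
    cut = max(term.rfind("#"), term.rfind("/")) + 1
    return term[:cut], term[cut:]


def assign_prefix_codes(terms: list[str], width: int = 2) -> dict[str, str]:
    """Give each distinct namespace a fixed-width code in first-seen order."""
    codes = {"": "0" * width}  # "no prefix" (e.g. literals) -> 00
    for term in terms:
        namespace, _ = split_namespace(term)
        if namespace not in codes:
            if len(codes) >= 10 ** width:
                raise OverflowError(f"more than {10 ** width - 1} namespaces")
            codes[namespace] = str(len(codes)).zfill(width)
    return codes


# Illustrative terms only.
terms = [
    "http://swat.cse.lehigh.edu/onto/univ-bench.owl#GraduateStudent",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://www.Department0.University0.edu/GraduateStudent1",
    "http://www.Department2.University0.edu/Course3",
    "GraduateStudent1",  # a literal value
]
for namespace, code in assign_prefix_codes(terms).items():
    print(code, namespace or "(no prefix)")
```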
Preferably, a triple item TripleItem is the subject, predicate, or object of a triple. Given n triples in total, if v is the subject, predicate, or object of any of the n triples, then v ∈ TripleItem.
The TripleItem classification and filtering algorithm is specified as follows:
Input: RDF triple-format file
Output: class files in which the TripleItems are partitioned by class; prefix-code relation file
The pseudocode is given in Table 1 below.
Table 1 Classification and filtering algorithm for TripleItems
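As a rough, sequential sketch of classification and filtering (not the pseudocode of Table 1), the code below works under these assumptions: a resource's class is taken from its rdf:type triple (the first one if there are several), items with no known type go into a hypothetical UNTYPED bucket, predicates are skipped because step S1 already coded them, and class IRIs appearing as rdf:type objects are skipped for the same reason. Collecting items into sets removes duplicates, so each distinct item is later coded exactly once. The `ex:`/`ub:` names are shorthand placeholders.

```python
from __future__ import annotations

from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def classify_items(triples: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Partition subjects and objects by class and drop duplicates."""
    # Pass 1: learn each resource's class from its rdf:type triple.
    type_of: dict[str, str] = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            type_of.setdefault(s, o)  # keep the first type if there are several

    # Pass 2: bucket subjects and objects by class; sets remove duplicates.
    buckets: dict[str, set[str]] = defaultdict(set)
    for s, p, o in triples:
        buckets[type_of.get(s, "UNTYPED")].add(s)
        if p != RDF_TYPE:  # rdf:type objects are classes, coded in step S1
            buckets[type_of.get(o, "UNTYPED")].add(o)
    return {cls: sorted(items) for cls, items in buckets.items()}


triples = [
    ("ex:s1", RDF_TYPE, "ub:GraduateStudent"),
    ("ex:s1", "ub:takesCourse", "ex:c1"),
    ("ex:c1", RDF_TYPE, "ub:Course"),
    ("ex:s1", "ub:name", "GraduateStudent1"),
]
print(classify_items(triples))
```

In an actual MapReduce job, pass 1 and pass 2 would need a join between the rdf:type statements and the other triples; the in-memory dictionary here stands in for that join.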
In this embodiment, in step S3, the result files of the TripleItem classification and filtering algorithm serve as the input of TripleItem encoding. In the Map stage the TripleItem class files are processed; in the Reduce stage the TripleItems are encoded, and a dictionary mapping file from each TripleItem to its code is generated and stored on the cluster's HDFS. Each TripleItem code has the format: class code + prefix code + mantissa code;
The TripleItem encoding algorithm is specified as follows:
Input: class files in which the TripleItems are partitioned by class; prefix-code relation file;
Output: dictionary mapping file.
The pseudocode is given in Table 2 below.
Table 2 TripleItem encoding algorithm
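The stated code layout (class code + prefix code + mantissa code) can be sketched as below; this is an illustration, not the pseudocode of Table 2. Assumptions: one mantissa counter per (class, prefix) pair, standing in for one Reduce group; mantissas starting at 1 and zero-padded to α digits; items numbered in sorted order; and made-up class codes in the example. Because the prefix and mantissa fields have fixed widths, two distinct items cannot receive the same code as long as the class codes are distinct.

```python
from __future__ import annotations

from typing import Callable


def encode_items(buckets: dict[str, list[str]],
                 class_codes: dict[str, str],
                 prefix_code_of: Callable[[str], str],
                 alpha: int = 3) -> dict[str, str]:
    """Build the dictionary {TripleItem: class code + prefix code + mantissa}."""
    dictionary: dict[str, str] = {}
    for cls, items in buckets.items():
        counters: dict[str, int] = {}  # one mantissa counter per prefix code
        for item in sorted(items):
            prefix = prefix_code_of(item)
            n = counters.get(prefix, 0) + 1
            if n >= 10 ** alpha:
                raise OverflowError(
                    f"more than {10 ** alpha - 1} items for class {cls!r}, prefix {prefix}")
            counters[prefix] = n
            dictionary[item] = class_codes[cls] + prefix + str(n).zfill(alpha)
    return dictionary


dept0 = "http://www.Department0.University0.edu/"
buckets = {
    "GraduateStudent": [dept0 + "GraduateStudent2", dept0 + "GraduateStudent1"],
    "Course": [dept0 + "Course3"],
}
# Hypothetical class codes; every item here uses prefix code 03 (see Table 5).
codes = encode_items(buckets, {"GraduateStudent": "010509", "Course": "010302"},
                     lambda item: "03")
for item, code in codes.items():
    print(code, item)
```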
Taking the triple fragment from the LUBM data set in Table 3 as an example, the process of the TripleItem encoding algorithm is described below.
Table 3 Input RDF triple data fragment
For brevity in the following description, a mapping table between TripleItem numbers and the original data is generated from Table 3, as listed in Table 4. The subject of the 3rd triple in Table 3 is identical to the object of the 5th triple, so both correspond to a single TripleItem.
Table 4 Mapping between TripleItem numbers and the original data
Using the triple fragment of Table 3 as input to the TripleItem classification and filtering algorithm yields the TripleItem class files and the prefix-code relation file; assume the resulting prefix-code relation file is as listed in Table 5.
Table 5 Prefix-code information for the triple fragment of Table 3
Prefix                                                          Prefix code
(xmlns:) http://swat.cse.lehigh.edu/onto/univ-bench.owl#        01
(rdf:) http://www.w3.org/1999/02/22-rdf-syntax-ns#              02
http://www.Department0.University0.edu/                         03
http://www.Department2.University0.edu/                         04
No prefix (e.g., literal data)                                  00
Using the result files of the TripleItem classification and filtering algorithm as input to the TripleItem encoding algorithm, together with the class (attribute) relation models, yields the TripleItem codes. Let the threshold α denote the number of mantissa digits in a TripleItem code. For the triple fragment of Table 3 with α = 3, combining Tables 4 and 5 gives the TripleItem code information listed in Table 6.
Table 6 TripleItem code information
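Because the prefix code and the mantissa have fixed widths (two digits in Table 5 and α digits respectively), a TripleItem code can be taken apart again from the right, which is how the class of a subject or object can be read from its code. A minimal sketch, assuming those two widths are known and that everything to their left is the class code; the example codes are hypothetical, not the contents of Table 6.

```python
from __future__ import annotations


def split_item_code(code: str, prefix_width: int = 2, alpha: int = 3) -> tuple[str, str, str]:
    """Split 'class code + prefix code + mantissa' into its three fields."""
    tail = prefix_width + alpha
    if len(code) <= tail:
        raise ValueError(f"code {code!r} is too short to contain a class code")
    return code[:-tail], code[-tail:-alpha], code[-alpha:]


for code in ("01050903001", "0209111304017"):
    class_code, prefix_code, mantissa = split_item_code(code)
    # The second digit of a class code is IPF, the number of direct parents.
    print(class_code, prefix_code, mantissa, "IPF =", class_code[1])
```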
In this embodiment, in step S4, every triple in the input RDF triple-format file is encoded using the dictionary mapping table generated by the TripleItem encoding algorithm: the TripleItems are joined with the dictionary mapping table to produce the triple codes;
The triple encoding algorithm is specified as follows:
Input: RDF triple-format file and dictionary mapping file;
Output: the encoded RDF file.
The pseudocode is given in Table 7 below.
Table 7 Triple encoding algorithm
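The sketch below mimics a reduce-side join in memory to show how triples could be joined with the dictionary; it is not the pseudocode of Table 7. The "map" loop records each term together with the triple and position it came from, and the "reduce" loop looks up each term's code once and writes it into every position where that term occurs. The dictionary is assumed to already hold the predicate codes from step S1 alongside the TripleItem codes, and the codes in the example are made up.

```python
from __future__ import annotations

from collections import defaultdict


def encode_triples(triples: list[tuple[str, str, str]],
                   dictionary: dict[str, str]) -> list[tuple[str, str, str]]:
    # "Map": group every occurrence of a term as (triple index, slot 0/1/2).
    occurrences: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for tid, triple in enumerate(triples):
        for slot, term in enumerate(triple):
            occurrences[term].append((tid, slot))

    # "Reduce" per term: one dictionary lookup, written to all its occurrences.
    encoded = [["", "", ""] for _ in triples]
    for term, places in occurrences.items():
        code = dictionary[term]  # KeyError: the term was never assigned a code
        for tid, slot in places:
            encoded[tid][slot] = code
    return [(s, p, o) for s, p, o in encoded]


dictionary = {"ex:s1": "01050903001", "ub:takesCourse": "10105090101",
              "ex:c1": "01030203001"}
print(encode_triples([("ex:s1", "ub:takesCourse", "ex:c1")], dictionary))
# [('01050903001', '10105090101', '01030203001')]
```

Grouping by term means each dictionary entry is consulted once, however many triples mention it, which is the point of joining by term rather than looking up every triple independently.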
In this embodiment, in step S5: SCOM is a lossless compression algorithm, and its inversion algorithm can quickly and completely restore the encoded RDF data file to the original data. Because SCOM builds a dictionary table of TripleItems and their codes, the encoded RDF file can easily be inverted into the original RDF file using the dictionary table. To state the SCOM inversion algorithm precisely, it is given as pseudocode in Table 8:
Table 8 SCOM inversion algorithm
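Inversion only needs the dictionary read in the other direction. A minimal sketch, assuming the dictionary fits in memory (in the described method it is a file on HDFS): the inverse mapping is built once, and building it fails loudly if two items share a code, since that would make lossless inversion impossible. The round trip uses made-up codes and is not the pseudocode of Table 8.

```python
from __future__ import annotations


def invert_dictionary(dictionary: dict[str, str]) -> dict[str, str]:
    """Turn {item: code} into {code: item}, rejecting duplicate codes."""
    inverse: dict[str, str] = {}
    for item, code in dictionary.items():
        if code in inverse:
            raise ValueError(f"code {code} is shared by {inverse[code]!r} and {item!r}")
        inverse[code] = item
    return inverse


def decode_triples(encoded: list[tuple[str, str, str]],
                   dictionary: dict[str, str]) -> list[tuple[str, str, str]]:
    """Replace every code in every encoded triple with its original item."""
    inverse = invert_dictionary(dictionary)
    return [(inverse[s], inverse[p], inverse[o]) for s, p, o in encoded]


dictionary = {"ex:s1": "01050903001", "ub:takesCourse": "10105090101",
              "ex:c1": "01030203001"}
encoded = [("01050903001", "10105090101", "01030203001")]
print(decode_triples(encoded, dictionary))  # [('ex:s1', 'ub:takesCourse', 'ex:c1')]
```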
The RDF data distributed parallel semantic coding scheme proposed in this embodiment efficiently performs distributed parallel encoding of large-scale RDF data and supports inversion of the RDF data. Experiments show that, compared with existing encoding schemes, it has significant advantages in both the compressed-encoding stage and the inversion stage, and that it can improve RDFS rule-based reasoning.
The foregoing is merely a preferred embodiment of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention fall within the scope of the present invention.

Claims (6)

1. An RDF data distributed parallel semantic coding method, characterized by comprising the following steps:
Step S1: reading in the RDF ontology file, constructing the class relation model and the attribute relation model, and generating a mapping file of classes and their codes and a mapping file of attributes and their codes;
Step S2: reading in the RDF data file, splitting each triple into its triple items, partitioning the triple items by class, deleting duplicate triple items, and generating prefix codes; the triple items are filtered to keep RDF triple coding consistent, so that the same triple item is never assigned different codes;
Step S3: encoding the triple items and generating the dictionary table;
Step S4: encoding the triples and generating the encoded triple file;
Step S5: taking the result file of step S4 as input, inverting it into the original RDF data file according to the dictionary table from step S3;
wherein, in step S1, the ontology file in RDF format is first parsed with Jena, a relation tree is generated from the class relations, and the class relation model is constructed;
wherein a class/attribute type flag Flag is defined to distinguish classes from attributes; for the current datum v:
a tree node code width TreenodeDigit, abbreviated TD, is defined; if the total number of nodes is M:
a class code TreeClasscode, abbreviated TC, is defined. TC consists of Flag, the direct-parent count IPF, the class-node sequence code, and the node sequence code. With M the total number of nodes, the class-node sequence code and the node sequence code each have TD(M) digits. TC(h,i) denotes the class code of the i-th node A on layer h; f(h,i) denotes the node sequence code of the i-th node A on layer h; REPT(0,n) denotes a string of n zeros; anc(h) denotes the class-node sequence code on layer h; and f(h-1,m) denotes the node sequence code of node B, the parent class of node A. Then, with & denoting string concatenation:
$$TC(h,i) = Flag \,\&\, IPF \,\&\, \mathrm{REPT}\big(0,\, TD(M) - TD(f(h-1,m))\big) \,\&\, f(h-1,m) \,\&\, \mathrm{REPT}\big(0,\, TD(M) - TD(f(h,i))\big) \,\&\, f(h,i)$$
when IPF > 1, the class-node sequence code is the concatenation of the node sequence codes of all direct parents;
an attribute code TreePropertycode, abbreviated TP, is defined. TP consists of Flag, the class code, the parent-attribute node sequence code, and the node sequence code. With M the total number of nodes, the parent-attribute node sequence code and the node sequence code each have TD(M) digits. TP(h,i) denotes the attribute code of the i-th node C on layer h; letting R be the class to which C belongs, the code of class node R is denoted TC(p,r); f(h,i) denotes the node sequence code of the i-th node C on layer h; REPT(0,n) denotes a string of n zeros; anc(h) denotes the attribute-node sequence code on layer h; and f(h-1,m) denotes the node sequence code of node D, the parent attribute of node C. Then:
$$TP(h,i) = Flag \,\&\, TC(p,r) \,\&\, \mathrm{REPT}\big(0,\, TD(M) - TD(f(h-1,m))\big) \,\&\, f(h-1,m) \,\&\, \mathrm{REPT}\big(0,\, TD(M) - TD(f(h,i))\big) \,\&\, f(h,i)$$
the relation tree is a multiway tree; a breadth-first traversal of it, combined with the class-code definition, generates the class codes, and the relation tree of the attributes is obtained in the same way.
2. The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S2, triples are partitioned by class; the predicate of a triple is an attribute from the RDF ontology file whose code was already generated when the attribute relation model was built, so only the subjects and objects of triples need to be partitioned by class; because a triple item TripleItem may occur more than once in the RDF data, the TripleItems are filtered while they are being partitioned by class;
the TripleItem classification and filtering algorithm is as follows: input, an RDF triple-format file; output, class files in which the TripleItems are partitioned by class, and a prefix-code relation file;
because different TripleItems may share the same URI prefix, identical prefixes are extracted from the RDF data file so that the codes have semantic similarity, i.e., similar URIs are encoded as similar numbers;
this step requires overriding MapReduce's MultipleOutputFormat so that the output is written as one file per class.
3. The RDF data distributed parallel semantic coding method according to claim 2, characterized in that: a triple item TripleItem is the subject, predicate or object of a triple, defined as:
where n denotes the total number of triples;
If …, then v ∈ TripleItem.
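The defining expression for TripleItem is not reproduced in this text. Reading only the prose (a TripleItem is the subject, predicate or object of one of the n triples), a sketch of the set would be the union of all three positions over every triple; treat this as an interpretation, not the claim's exact formula.

```python
def triple_items(triples) -> set[str]:
    """Union of subjects, predicates and objects over all n triples
    (an interpretation of the claim's prose definition)."""
    items: set[str] = set()
    for s, p, o in triples:
        items.update((s, p, o))
    return items
```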
4. The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S3, the result files of the TripleItem classification and filtering algorithm are taken as the input files for TripleItem encoding. In the Map phase the TripleItem class files are processed; in the Reduce phase the TripleItems are encoded, and a dictionary mapping file pairing each TripleItem with its code is generated and stored in the cluster's HDFS. Each TripleItem code has the format: class code + prefix code + tail code;
The TripleItem encoding algorithm is as follows. Input: the TripleItem class files partitioned by class, and the prefix-code relation file. Output: the dictionary mapping file.
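The Reduce-side encoding can be sketched as one group per class, emitting class code + prefix code + tail code for each item. Several details are assumptions rather than the patent's rules: class and prefix codes are fixed-width (so concatenation stays unambiguous), URIs are split after the last "/" or "#", and the tail code is the item's 1-based rank among same-class, same-prefix items in sorted order, zero-padded to a hypothetical `tail_width`.

```python
def split_uri(uri: str) -> tuple[str, str]:
    """Split a URI after its last '/' or '#' (assumed prefix boundary)."""
    cut = max(uri.rfind("/"), uri.rfind("#"))
    return uri[:cut + 1], uri[cut + 1:]


def encode_items(class_files: dict[str, list[str]],
                 class_codes: dict[str, str],
                 prefix_codes: dict[str, str],
                 tail_width: int = 4) -> dict[str, str]:
    """Build the dictionary mapping TripleItem -> class + prefix + tail code."""
    dictionary: dict[str, str] = {}
    for cls, items in class_files.items():      # one Reduce group per class
        counters: dict[str, int] = {}           # next tail number per prefix
        for item in sorted(items):
            prefix, _local = split_uri(item)
            n = counters.get(prefix, 0) + 1
            counters[prefix] = n
            dictionary[item] = (class_codes[cls] + prefix_codes[prefix]
                                + str(n).zfill(tail_width))
    return dictionary
```

Numbering tails within each (class, prefix) group is what makes items that share a class and a prefix receive numerically adjacent codes.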
5. The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S4, each triple in the input RDF triple-format file is encoded using the dictionary mapping table generated by the TripleItem encoding algorithm; the TripleItems are joined with the dictionary mapping table to produce the triple codes;
The triple encoding algorithm is as follows. Input: the RDF triple-format file and the dictionary mapping file. Output: the encoded RDF file.
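The join between triples and the dictionary can be simulated as a reduce-side join keyed on the TripleItem. In this sketch (an in-memory stand-in, not the patent's MapReduce code) the "Map" step records every (triple id, position) where an item occurs, and the "Reduce" step looks the item up once and writes its code into every recorded slot. It assumes the dictionary also holds codes for predicates (the attribute codes); a missing item raises `KeyError`.

```python
from collections import defaultdict


def encode_triples(triples, dictionary):
    """Replace every item in every triple by its code via a keyed join."""
    # "Map": key each occurrence by the item itself.
    occurrences = defaultdict(list)
    for tid, triple in enumerate(triples):
        for pos, item in enumerate(triple):
            occurrences[item].append((tid, pos))
    # "Reduce": join each item key with its dictionary entry.
    encoded = [[None, None, None] for _ in triples]
    for item, places in occurrences.items():
        code = dictionary[item]
        for tid, pos in places:
            encoded[tid][pos] = code
    return [tuple(row) for row in encoded]
```

Keying on the item means each dictionary entry is read once per item rather than once per occurrence, which is the usual motivation for a join over per-triple lookups.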
6. The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S5, a dictionary table of TripleItems and their codes is built according to the SCOM algorithm; using this dictionary table, the SCOM inversion algorithm restores the encoded RDF file to the original RDF file.
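Inversion needs the dictionary to be injective, i.e. no two TripleItems share a code. The sketch below is a plain reverse-dictionary lookup that checks this property; it illustrates the idea of restoring the original triples and is not claimed to be the exact SCOM inversion procedure, whose details are not given here.

```python
def invert(encoded_triples, dictionary):
    """Map codes back to the original TripleItems using a reversed dictionary."""
    reverse = {code: item for item, code in dictionary.items()}
    if len(reverse) != len(dictionary):
        raise ValueError("codes are not unique; inversion would be ambiguous")
    return [tuple(reverse[c] for c in triple) for triple in encoded_triples]
```

Round-tripping a file through encoding and then `invert` should reproduce the original triples whenever every code is unique.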
CN201610242787.1A 2016-04-19 2016-04-19 RDF data distributed parallel semantic coding method Active CN105930419B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610242787.1A CN105930419B (en) 2016-04-19 2016-04-19 RDF data distributed parallel semantic coding method

Publications (2)

Publication Number Publication Date
CN105930419A CN105930419A (en) 2016-09-07
CN105930419B true CN105930419B (en) 2019-08-09

Family

ID=56838391

Country Status (1)

Country Link
CN (1) CN105930419B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144123B (en) * 2018-10-16 2024-02-02 Industrial Internet Innovation Center (Shanghai) Co., Ltd. Industrial Internet identifier resolution data dictionary construction method
CN110110329B (en) * 2019-04-30 2022-05-17 Hunan Xinghan Shuzhi Technology Co., Ltd. Entity behavior extraction method and device, computer device and computer readable storage medium
CN110457491A (en) * 2019-08-19 2019-11-15 China Agricultural University Knowledge graph reconstruction method and device based on free-state nodes
CN112182139A (en) * 2019-08-29 2021-01-05 Yingsheng Zhichuang Technology (Guangzhou) Co., Ltd. Method, device and equipment for tracing resource description framework triple
CN110516079B (en) * 2019-08-29 2022-04-29 Peking University RDF object model class hierarchical tree establishing method and system

Citations (3)

Publication number Priority date Publication date Assignee Title
CN104462610A (en) * 2015-01-06 2015-03-25 Fuzhou University Distributed RDF storage and query optimization method combined with ontology
CN104462609A (en) * 2015-01-06 2015-03-25 Fuzhou University RDF data storage and query method combined with star-graph coding
CN104615703A (en) * 2015-01-30 2015-05-13 Fuzhou University RDF data distributed parallel inference method combined with Rete algorithm

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7890518B2 (en) * 2007-03-29 2011-02-15 Franz Inc. Method for creating a scalable graph database

Non-Patent Citations (1)

Title
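Highly scalable RDF data storage system (高可扩展的RDF数据存储系统); Yuan Pingpeng (袁平鹏); Journal of Computer Research and Development (《计算机研究与发展》); 2012-12-20; vol. 49, no. 10; pp. 2131-2141 *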

Also Published As

Publication number Publication date
CN105930419A (en) 2016-09-07

Similar Documents

Publication Publication Date Title
CN105930419B (en) RDF data distributed parallel semantic coding method
Grigorchuk et al. Spectra of Schreier graphs of Grigorchuk’s group and Schroedinger operators with aperiodic order
CN107017992A (en) A high-performance consortium blockchain based on a duplex structure
Wardani et al. Semantic mapping relational to graph model
Yang et al. Resolving structural conflicts in the integration of XML schemas: A semantic approach
Yang et al. An approximate dynamic programming approach for improving accuracy of lossy data compression by Bloom filters
Macke et al. Lifting the curse of multidimensional data with learned existence indexes
CN109710775A (en) A knowledge graph dynamic creation method based on multiple rules
Bouhali et al. Exploiting RDF open data using NoSQL graph databases
CN104462610B (en) Distributed RDF storage and query optimization method combined with ontology
CN108595588B (en) Scientific data storage association method
Fathy et al. ProGOMap: automatic generation of mappings from property graphs to ontologies
Roul et al. GM-Tree: An efficient frequent pattern mining technique for dynamic database
Kumar et al. Fuzzy clustering of web documents using equivalence relations and fuzzy hierarchical clustering
Liu et al. Incremental mining algorithm of sequential patterns based on sequence tree
Qin et al. Efficient XML query and update processing using a novel prime-based middle fraction labeling scheme
CN112395286B (en) Chained data table connection method, device, equipment and storage medium
Feng et al. An Approach to Converting Relational Database to Graph Database: from MySQL to Neo4j
KR20240004518A (en) Decoder, encoder, control section, method and computer program for updating neural network parameters using node information
CN101131699A (en) Construction method for structure tree with genetic information
Dawelbeit et al. Efficient dictionary compression for processing RDF big data using google BigQuery
Ji et al. An improved random walk based community detection algorithm
Wu et al. Privacy-protection path finding supporting the ranked order on encrypted graph in big data environment
Tejaswi et al. Semantic inference method using ontologies
Xu et al. Construction of Ontology Knowledge by Attribute Reduction and Rule Extraction of Three-Way Formal Concept Analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant