CN105930419B - RDF data distributed parallel semantic coding method - Google Patents
- Publication number: CN105930419B
- Application number: CN201610242787.1A
- Authority: CN (China)
- Prior art keywords: triple, class, file, coding, item
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The present invention relates to an RDF data distributed parallel semantic coding method comprising the following steps. Step S1: read in the RDF ontology file and construct the class relation model and the attribute relation model. Step S2: read in the RDF data file, split each triple into triple items, partition the triple items by class, delete duplicate triple items, and generate prefix codes; filtering the triple items ensures that RDF triple encoding is consistent, so that the same triple item is never assigned different codes. Step S3: encode the triple items and generate the dictionary table. Step S4: encode the triples and generate the encoded triple file. Step S5: taking the result file of step S4 as input, reverse the encoding according to the dictionary table of step S3 to regenerate the original RDF data file. By incorporating the ontology, the invention efficiently performs compression encoding and reversal of large-scale data in a distributed environment.
Description
Technical field
The present invention relates to the field of Semantic Web technology, and in particular to an RDF data distributed parallel semantic coding method.
Background art
Because RDF data sets are very large, they are difficult to manage. To speed up querying and reasoning over RDF data and to reduce storage space, the common practice is to apply compression encoding to the triples. Compression encoding has proved to be an efficient encoding: each triple item (subject, predicate, or object) is replaced by a numeric value (ID), so that all triple data are ultimately converted to numeric form. A centralized environment is unsuitable for encoding large-scale data because of memory limits. Distributed parallel compression encoding of RDF data is a relatively new research area. Goodman et al. proposed an adaptation of linear probing on the Cray XMT machine, using parallel hashing to encode in parallel against a single dictionary table. The encoding time of this algorithm scales linearly with the number of processor cores used, but the method requires all data to be held in memory and depends heavily on the shared-memory architecture of the Cray XMT, so it does not suit common distributed-memory systems. Long Cheng et al. compress RDF data using the X10 language. The triples are first filtered; then, according to the hash values of the triple items, the triple data are distributed evenly across different nodes for local encoding, and multiple dictionary tables are generated. Urbani et al. proposed a distributed MapReduce data compression algorithm consisting of a data compression stage and a data reversal stage. The compression stage compresses the triples and builds the dictionary table; the reversal stage joins the compressed triples with the dictionary table to regenerate the original triple data. This algorithm is not efficient enough in the data reversal stage.
The above are the latest research results on distributed parallel compression of RDF data. Three effective parallel compression algorithms for RDF data currently exist, and they can perform parallel compression encoding of massive RDF data. None of them, however, takes the ontology file into account, so the encoded triples carry no semantic information, which hinders later distributed querying or semantic reasoning. No existing method combines the ontology file to achieve parallel semantic encoding of RDF data.
Centralized environments cannot meet the demands of massive data, and compression encoding in distributed environments carries no semantic information, which hinders distributed querying or reasoning. Some distributed compression algorithms are also inefficient in the data reversal stage.
Technical problems to be solved:
1. How to guarantee the uniqueness of triple item codes in a distributed environment, i.e. identical triple items are never assigned different codes.
2. How to guarantee that encoding in a distributed environment is lossless, i.e. encoded triples can be reversed to the original triples.
3. How to devise a parallel encoding scheme that matches the proposed distributed scheme, to meet the demand for distributed parallel semantic encoding of large-scale data.
Summary of the invention
In view of this, the object of the present invention is to provide an RDF data distributed parallel semantic coding method that encodes RDF data in combination with the ontology, so that the codes of RDF triples carry semantic information and follow regular patterns. This benefits distributed querying and semantic reasoning, and combining the ontology makes it possible to perform compression encoding and reversal of large-scale data efficiently in a distributed environment.
The present invention is realized by the following scheme: an RDF data distributed parallel semantic coding method comprising the following steps:
Step S1: read in the RDF ontology file, construct the class relation model and the attribute relation model, and generate the mapping file of classes and their codes and the mapping file of attributes and their codes;
Step S2: read in the RDF data file, split each triple into triple items, partition the triple items by class, delete duplicate triple items, and generate prefix codes; the triple items are filtered to ensure that RDF triple encoding is consistent, so that the same triple item is never assigned different codes;
Step S3: encode the triple items and generate the dictionary table;
Step S4: encode the triples and generate the encoded triple file;
Step S5: taking the result file of step S4 as the input of this step, reverse the encoding according to the dictionary table of step S3 to regenerate the original RDF data file.
Further, in step S1, the ontology file in RDF data format is first parsed with Jena, and a relation tree is generated from the class relations to build the class relation model;
A class/attribute type flag, Flag, is defined to distinguish classes from attributes: for the current data item v, the value of Flag depends on whether v is a class or an attribute.
The tree node code width TreenodeDigit, abbreviated TD, is defined in terms of the total number of nodes M and written TD(M).
The class code TreeClasscode, abbreviated TC, consists of Flag, the direct-parent count flag (IPF), the class-node sequence code, and the node sequence code. With M nodes in total, the class-node sequence code and the node sequence code each have TD(M) digits. TC(h, i) denotes the class node code of the i-th node A at layer h; f(h, i) denotes the node sequence code of the i-th node A at layer h; REPT(0, n) denotes a run of n zeros. If anc(h) denotes the class-node sequence code at layer h and f(h-1, m) denotes the node sequence code of the parent class node B of node A, then
TC(h, i) = Flag & IPF & REPT(0, TD(M) - TD(f(h-1, m))) & f(h-1, m) & REPT(0, TD(M) - TD(f(h, i))) & f(h, i)
where & denotes concatenation. When IPF > 1, the class-node sequence code is the combination of the node sequence codes of all direct parents;
The attribute code TreePropertycode, abbreviated TP, consists of Flag, the class code, the parent attribute node sequence code, and the node sequence code. With M nodes in total, the parent attribute node sequence code and the node sequence code each have TD(M) digits. TP(h, i) denotes the attribute node code of the i-th node C at layer h; the class to which C belongs is denoted R, with class node code TC(p, r); f(h, i) denotes the node sequence code of the i-th node C at layer h; REPT(0, n) denotes a run of n zeros. If anc(h) denotes the attribute node sequence code at layer h and f(h-1, m) denotes the node sequence code of the parent attribute node D of node C, then
TP(h, i) = Flag & TC(p, r) & REPT(0, TD(M) - TD(f(h-1, m))) & f(h-1, m) & REPT(0, TD(M) - TD(f(h, i))) & f(h, i);
The relation tree is a multiway tree; applying a breadth-first algorithm together with the class code definition yields the relation tree of the attribute relations and generates the class codes.
Further, in step S2, the triples are partitioned by class. The predicate of a triple is an attribute in the RDF ontology file, and its attribute code has already been generated while building the attribute relation model, so only the subject and object of each triple need to be partitioned by class. If a triple item (TripleItem) is not unique in the RDF data, the TripleItems are filtered at the same time as they are partitioned by class;
The TripleItem classification and filtering algorithm is as follows: the input is a file in RDF triple format; the output is the class files into which the TripleItems are partitioned and the prefix code relation file;
If different TripleItems share the same URI prefix, then, to ensure that the codes preserve semantic similarity so that similar URIs are encoded as similar numbers, the common prefixes are extracted from the RDF data file;
This step requires overriding the MultipleOutputFormat of MapReduce so that the output is written as one file per class.
Preferably, a triple item (TripleItem) is the subject, predicate, or object of a triple. The definition ranges over all triples, where n denotes the total number of triples: any value v that occurs as the subject, predicate, or object of one of the n triples satisfies v ∈ TripleItem.
Further, in step S3, the result files of the TripleItem classification and filtering algorithm are taken as the input files for triple item encoding. The Map stage processes the class files of the TripleItems; the Reduce stage encodes the TripleItems and at the same time generates the dictionary mapping file between TripleItems and their codes, which is stored on the HDFS of the cluster. Each TripleItem code has the format: class code + prefix code + mantissa code;
The TripleItem encoding algorithm is as follows: the input is the class files into which the TripleItems are partitioned and the prefix code relation file; the output is the dictionary mapping file.
Further, in step S4, each triple in the input RDF triple format file is encoded according to the dictionary mapping table generated by the TripleItem encoding algorithm; the TripleItems are joined with the dictionary mapping table to generate the codes of the triples;
The triple encoding algorithm is as follows: the input is the RDF triple format file and the dictionary mapping file; the output is the encoded RDF file.
Further, in step S5, the SCOM algorithm has established the dictionary table of TripleItems and their codes; using this dictionary table, the SCOM reversal algorithm reverses the encoded RDF file to the original RDF file.
Compared with the prior art, the present invention has the following advantages: the proposed RDF data distributed parallel semantic coding scheme efficiently completes distributed parallel encoding of RDF data at large scale and supports reversal of the RDF data; compared with existing encoding schemes, it has a significant advantage in both the compression encoding stage and the reversal stage; and it can improve RDFS rule reasoning.
Brief description of the drawings
Fig. 1 is a schematic diagram of the method framework of the present invention.
Fig. 2 is a schematic diagram of the class relation model for part of the LUBM classes in the present invention.
Fig. 3 is a schematic diagram of the attribute relation model for part of the LUBM attributes in the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and embodiments.
This embodiment provides an RDF data distributed parallel semantic coding method (Semantic Coding with Ontology on MapReduce, SCOM for short). Exploiting the characteristics of MapReduce, it builds the class relation model and the attribute relation model from the ontology and classifies and encodes RDF data according to these models, thereby achieving distributed parallel compression encoding of RDF data. The SCOM scheme is divided into a compression encoding stage and a reversal stage, as shown in Fig. 1, and comprises the following steps:
Step S1: read in the RDF ontology file, construct the class relation model and the attribute relation model, and generate the mapping file of classes and their codes and the mapping file of attributes and their codes;
Step S2: read in the RDF data file, split each triple into triple items, partition the triple items by class, delete duplicate triple items, and generate prefix codes; the triple items are filtered to ensure that RDF triple encoding is consistent, so that the same triple item is never assigned different codes;
Step S3: encode the triple items and generate the dictionary table;
Step S4: encode the triples and generate the encoded triple file;
Step S5: taking the result file of step S4 as the input of this step, reverse the encoding according to the dictionary table of step S3 to regenerate the original RDF data file.
In this embodiment, in step S1, class codes must be generated so that the encoding of RDF data carries semantic information. Because the predicates in RDF data are all defined in the ontology file, and are far fewer than the subjects or objects in the RDF data, this step also completes the encoding of the attributes after the classes have been encoded.
In this embodiment, the following definitions are given:
A class/attribute type flag, Flag, is defined to distinguish classes from attributes: for the current data item v, the value of Flag depends on whether v is a class or an attribute.
The tree node code width TreenodeDigit, abbreviated TD, is defined in terms of the total number of nodes M and written TD(M).
The class code TreeClasscode, abbreviated TC, consists of Flag, the direct-parent count flag (IPF), the class-node sequence code, and the node sequence code. With M nodes in total, the class-node sequence code and the node sequence code each have TD(M) digits. TC(h, i) denotes the class node code of the i-th node A at layer h; f(h, i) denotes the node sequence code of the i-th node A at layer h; REPT(0, n) denotes a run of n zeros. If anc(h) denotes the class-node sequence code at layer h and f(h-1, m) denotes the node sequence code of the parent class node B of node A, then
TC(h, i) = Flag & IPF & REPT(0, TD(M) - TD(f(h-1, m))) & f(h-1, m) & REPT(0, TD(M) - TD(f(h, i))) & f(h, i)
where & denotes concatenation. When IPF > 1, the class-node sequence code is the combination of the node sequence codes of all direct parents;
The attribute code TreePropertycode, abbreviated TP, consists of Flag, the class code, the parent attribute node sequence code, and the node sequence code. With M nodes in total, the parent attribute node sequence code and the node sequence code each have TD(M) digits. TP(h, i) denotes the attribute node code of the i-th node C at layer h; the class to which C belongs is denoted R, with class node code TC(p, r); f(h, i) denotes the node sequence code of the i-th node C at layer h; REPT(0, n) denotes a run of n zeros. If anc(h) denotes the attribute node sequence code at layer h and f(h-1, m) denotes the node sequence code of the parent attribute node D of node C, then
TP(h, i) = Flag & TC(p, r) & REPT(0, TD(M) - TD(f(h-1, m))) & f(h-1, m) & REPT(0, TD(M) - TD(f(h, i))) & f(h, i);
In this embodiment, step S1 first parses the ontology file in RDF data format with Jena and generates a relation tree from the class relations (subclass and parent class) to build the class relation model. The relation tree is a multiway tree; applying a breadth-first algorithm together with the class code definition yields the relation tree of the attribute relations and generates the class codes.
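As an illustration only, the following sketch shows one way a breadth-first traversal could assign fixed-width node sequence codes over a class relation tree. The hierarchy, the function name, numbering from 1, and taking the width as the number of digits of the node count are all assumptions made for the example; none of them is taken from the figures.

```python
from collections import deque

def assign_sequence_codes(children, root):
    """Number nodes in breadth-first order and zero-pad to a fixed width."""
    order = []
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children.get(node, []):
            if child not in seen:      # a class with several parents is numbered once
                seen.add(child)
                queue.append(child)
    width = len(str(len(order)))       # assumed width: digits needed for the node count
    return {node: str(i).zfill(width) for i, node in enumerate(order, start=1)}

# Hypothetical class hierarchy (not the LUBM fragment of Fig. 2)
children = {
    "Thing": ["Person", "Organization"],
    "Person": ["Student", "Employee"],
    "Student": ["GraduateStudent"],
}
codes = assign_sequence_codes(children, "Thing")
```

Because siblings receive consecutive numbers, classes at the same layer end up with neighbouring codes, which is the kind of regularity the encoding relies on.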
Take the class fragment of the LUBM data set as an example, and assume that the code width determined by the tree node code width definition is 2. The resulting class relation model is shown in Fig. 2. In each code, the first digit is the class flag; the second digit is the direct-parent count flag; the third and fourth digits together form the sequence code of the direct parent class node of the current class; and the last two digits form the node sequence code of the current class. The Things class is the parent of all classes, so its direct-parent count flag is 0 (i.e. IPF = 0).
A class may have more than one parent. Suppose the direct parents of the Part-timeGraduateStudent class are the GraduateStudent class and the TeachingAssistant class in the class relation model of Fig. 2. Then the class-node sequence code of Part-timeGraduateStudent is the combination of the sequence codes of GraduateStudent and TeachingAssistant (i.e. 0911), and Part-timeGraduateStudent is encoded as 02091113, where IPF = 2 indicates that the class has two direct parents.
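The worked example can be reproduced with a short sketch of the class code layout (Flag, IPF, parent sequence codes, own sequence code). The function name and signature are hypothetical; the class flag value "0" is read off the example code above, and the zero-filled parent field for a class without parents is an assumption.

```python
def class_code(parent_seqs, own_seq, width, flag="0"):
    """Concatenate Flag, IPF, the parents' sequence codes and the node's own code."""
    ipf = str(len(parent_seqs))                    # direct-parent count flag
    parents = "".join(str(p).zfill(width) for p in parent_seqs)
    if not parent_seqs:
        parents = "0" * width                      # assumption: root pads the parent field
    return flag + ipf + parents + str(own_seq).zfill(width)

# Part-timeGraduateStudent: parents with sequence codes 09 and 11, own code 13
code = class_code([9, 11], 13, width=2)
```

With a code width of 2 this yields 02091113 for the two-parent case, and a six-digit code for a class with a single direct parent.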
As with the class relation tree, an attribute relation tree is built and the attributes are encoded. The difference from class encoding is that an attribute code additionally includes class code information, so that attribute codes carry semantic information.
Take the attribute fragment of the LUBM data set as an example, and assume that the code width determined by the tree node code width definition is 2. Combined with the class codes of Fig. 2, the attribute relation model is as shown in Fig. 3. In each code, the first digit is the attribute flag; the second through seventh digits together form the class code of the current attribute (the domain class of the attribute); the eighth and ninth digits together form the sequence code of the direct parent attribute node of the current attribute; and the last two digits are the node sequence code of the current attribute. In addition, this step generates the attribute domain and range files.
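A minimal sketch of the eleven-digit attribute code layout described above, assuming a six-digit class code and a code width of 2. The function name, the attribute flag value "1", and the sample values are assumptions for illustration; Fig. 3 is not reproduced here.

```python
def attribute_code(domain_class_code, parent_seq, own_seq, width, flag="1"):
    """Concatenate Flag, the domain class code, the parent attribute code and the own code."""
    return (
        flag                                   # assumed attribute flag value
        + domain_class_code                    # class code of the attribute's domain class
        + str(parent_seq).zfill(width)         # sequence code of the parent attribute
        + str(own_seq).zfill(width)            # sequence code of this attribute
    )

# Hypothetical attribute: domain class code 010105, parent attribute 0, own number 3
code = attribute_code("010105", 0, 3, width=2)
```

Embedding the domain class code in the attribute code is what lets a reader of the code recover the class an attribute belongs to without consulting the ontology.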
With this encoding, semantic information is added to the codes of the RDF data described below: when a TripleItem is a subject or object, the class it belongs to can be determined from its code; when a TripleItem is a predicate, its parent attribute or the class it belongs to can be obtained from its code.
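To illustrate how class information could be read back from a code, the sketch below splits a class code into its fields. The function name and the returned dictionary are hypothetical, and treating the parent field as one slot wide when IPF = 0 is an assumption based on the six-digit layout described for Fig. 2.

```python
def parse_class_code(code, width):
    """Split a class code into flag, IPF, parent sequence codes and own sequence code."""
    flag, ipf = code[0], int(code[1])
    n_parents = max(ipf, 1)        # assumption: the parent field is at least one slot wide
    body = code[2:]
    parents = [body[k * width:(k + 1) * width] for k in range(n_parents)]
    own = body[n_parents * width:]
    return {"flag": flag, "ipf": ipf, "parents": parents, "own": own}

info = parse_class_code("02091113", width=2)
is_subclass_of_09 = "09" in info["parents"]   # direct-subclass test on the code alone
```

A check such as `"09" in info["parents"]` answers a direct-subclass question from the code alone, which is the sort of shortcut that benefits RDFS-style reasoning.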
In this embodiment, in step S2, the triples are partitioned by class. The predicate of a triple is an attribute in the RDF ontology file, and its attribute code has already been generated while building the attribute relation model, so only the subject and object of each triple need to be partitioned by class. Because a triple item (TripleItem) may occur more than once in the RDF data, duplicate TripleItems are deleted to ensure that each TripleItem is unique and that identical TripleItems are never assigned different codes. In addition, because different TripleItems may share the same URI prefix, the common prefixes (namespaces) are extracted from the RDF data file so that the codes preserve semantic similarity, i.e. similar URIs are encoded as similar numbers. This step also requires overriding the MultipleOutputFormat of MapReduce so that the output is written as one file per class.
Preferably, a triple item (TripleItem) is the subject, predicate, or object of a triple. The definition ranges over all triples, where n denotes the total number of triples: any value v that occurs as the subject, predicate, or object of one of the n triples satisfies v ∈ TripleItem.
The TripleItem classification and filtering algorithm proceeds as follows:
Input: RDF triple format file
Output: class files into which the TripleItems are partitioned; prefix code relation file
The detailed pseudocode is given in Table 1 below.
Table 1. TripleItem classification and filtering algorithm
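The following is an independent, single-process sketch of the classification and filtering idea, not the pseudocode of Table 1. It assumes a hypothetical `class_of` lookup giving the class of each subject or object, treats any item without such an entry as a literal, and cuts the prefix at the last `#` or `/`; the sample triples and predicate names are made up.

```python
def split_prefix(item):
    """Return (prefix, local part); items that are not URIs have no prefix."""
    if not item.startswith("http"):
        return "", item
    cut = max(item.rfind("#"), item.rfind("/")) + 1
    return item[:cut], item[cut:]

def classify_and_filter(triples, class_of):
    by_class = {}                  # class name -> set of distinct subject/object items
    prefix_codes = {"": "00"}      # "00" marks items without a prefix
    for s, p, o in triples:
        for item in (s, o):        # predicates were already coded in step S1
            cls = class_of.get(item, "Literal")
            by_class.setdefault(cls, set()).add(item)   # the set drops duplicates
            prefix, _ = split_prefix(item)
            if prefix not in prefix_codes:
                prefix_codes[prefix] = str(len(prefix_codes)).zfill(2)
    return by_class, prefix_codes

base = "http://www.Department0.University0.edu/"
triples = [
    (base + "GraduateStudent1", "takesCourse", base + "Course1"),
    (base + "GraduateStudent1", "name", "Alice"),
]
class_of = {base + "GraduateStudent1": "GraduateStudent", base + "Course1": "Course"}
by_class, prefix_codes = classify_and_filter(triples, class_of)
```

In a MapReduce job the same effect would typically come from emitting each item as a key so that the shuffle groups duplicates before the reducer writes one output per class; the in-memory sets here only stand in for that grouping.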
In this embodiment, in step S3, the result files of the TripleItem classification and filtering algorithm are taken as the input files for triple item encoding. The Map stage processes the class files of the TripleItems; the Reduce stage encodes the TripleItems and at the same time generates the dictionary mapping file between TripleItems and their codes, which is stored on the HDFS of the cluster. Each TripleItem code has the format: class code + prefix code + mantissa code;
The TripleItem encoding algorithm is as follows:
Input: class files into which the TripleItems are partitioned; prefix code relation file;
Output: dictionary mapping file.
The detailed pseudocode is given in Table 2 below.
Table 2. TripleItem encoding algorithm
Take the triple fragment of the LUBM data set (Table 3) as an example to describe the TripleItem encoding algorithm.
Table 3. Input RDF triple data fragment
For brevity in what follows, a mapping table between TripleItem numbers and the original data is generated from Table 3, as listed in Table 4. The subject of the 3rd triple in Table 3 is identical to the object of the 5th triple, so the two correspond to a single TripleItem.
Table 4. Mapping between TripleItem numbers and original data
Taking the triple fragment of Table 3 as the input of the TripleItem classification and filtering algorithm yields the class files and the prefix code relation file of the TripleItems; assume that the resulting prefix code relation file is as listed in Table 5.
Table 5. Prefix code information for the triple fragment
Prefix | Prefix code |
---|---|
(xmlns:) http://swat.cse.lehigh.edu/onto/univ-bench.owl# | 01 |
(rdf:) http://www.w3.org/1999/02/22-rdf-syntax-ns# | 02 |
http://www.Department0.University0.edu/ | 03 |
http://www.Department2.University0.edu/ | 04 |
No prefix (e.g. Literal data) | 00 |
Taking the result files of the TripleItem classification and filtering algorithm as the input of the TripleItem encoding algorithm, together with the class (attribute) relation model, yields the codes of the TripleItems. Let the threshold α denote the number of mantissa digits in a TripleItem code. If α = 3, then for the triple fragment of Table 3, combining Table 4 and Table 5 gives the TripleItem code information listed in Table 6.
Table 6. TripleItem code information
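As a hedged illustration of the format class code + prefix code + mantissa code with α = 3, the sketch below builds a small dictionary. It is not the content of Table 6: the class code 010509 is made up, the prefix code 03 is borrowed from Table 5, and using a running number within each class as the mantissa is an assumption, since the text does not specify how the mantissa is chosen.

```python
def prefix_code_of(item, prefix_codes):
    """Look up the code of the longest known prefix; "" stands for 'no prefix'."""
    matches = [p for p in prefix_codes if p and item.startswith(p)]
    return prefix_codes[max(matches, key=len)] if matches else prefix_codes[""]

def encode_items(by_class, class_codes, prefix_codes, alpha=3):
    dictionary = {}
    for cls in sorted(by_class):
        # Assumed mantissa: running number within the class, padded to alpha digits
        for n, item in enumerate(sorted(by_class[cls]), start=1):
            mantissa = str(n).zfill(alpha)
            dictionary[item] = class_codes[cls] + prefix_code_of(item, prefix_codes) + mantissa
    return dictionary

base = "http://www.Department0.University0.edu/"
by_class = {"GraduateStudent": {base + "GraduateStudent1", base + "GraduateStudent2"}}
class_codes = {"GraduateStudent": "010509"}          # hypothetical class code
prefix_codes = {"": "00", base: "03"}                # "03" as in Table 5
dictionary = encode_items(by_class, class_codes, prefix_codes)
```

Items of the same class under the same prefix differ only in their mantissa, so similar URIs map to numerically close codes.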
In this embodiment, in step S4, each triple in the input RDF triple format file is encoded according to the dictionary mapping table generated by the TripleItem encoding algorithm; the TripleItems are joined with the dictionary mapping table to generate the codes of the triples;
The triple encoding algorithm is as follows:
Input: RDF triple format file and dictionary mapping file;
Output: the encoded RDF file.
The detailed pseudocode is given in Table 7 below.
Table 7. Triple encoding algorithm
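The join between triples and the dictionary can be pictured with the following sketch, which is independent of the pseudocode in Table 7. An in-memory dictionary lookup stands in for the distributed join, and all terms and codes shown are hypothetical.

```python
def encode_triples(triples, dictionary):
    """Replace every subject, predicate and object by its code from the dictionary."""
    return [tuple(dictionary[term] for term in triple) for triple in triples]

# Hypothetical dictionary: TripleItem -> code
dictionary = {
    "http://www.Department0.University0.edu/GraduateStudent1": "01050903001",
    "takesCourse": "10105090002",
    "http://www.Department0.University0.edu/Course1": "01020703001",
}
triples = [
    (
        "http://www.Department0.University0.edu/GraduateStudent1",
        "takesCourse",
        "http://www.Department0.University0.edu/Course1",
    )
]
encoded = encode_triples(triples, dictionary)
```

A missing term raises `KeyError` here, which makes an incomplete dictionary visible rather than silently producing a partially encoded file.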
In this embodiment, in step S5, the SCOM algorithm is a lossless compression algorithm, and the reversal algorithm in SCOM quickly and completely restores the encoded RDF data file to the original data. Because the SCOM algorithm has established the dictionary table of TripleItems and their codes, the encoded RDF file can readily be reversed to the original RDF file using the dictionary table. To make the SCOM reversal algorithm precise, it is described in pseudocode in Table 8:
Table 8. SCOM reversal algorithm
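Reversal amounts to looking each code up in the inverted dictionary. The sketch below is an illustration under that reading, not the pseudocode of Table 8; the terms and codes are hypothetical, and the uniqueness check is an added safeguard rather than something the text prescribes.

```python
def decode_triples(encoded, dictionary):
    """Map codes back to the original terms using the inverted dictionary."""
    reverse = {code: term for term, code in dictionary.items()}
    if len(reverse) != len(dictionary):
        # Two terms sharing one code would make the reversal lossy
        raise ValueError("codes are not unique")
    return [tuple(reverse[code] for code in triple) for triple in encoded]

# Hypothetical dictionary and encoded file contents
dictionary = {"ex:student1": "0101001", "ex:takesCourse": "1010002", "ex:course1": "0102001"}
encoded = [("0101001", "1010002", "0102001")]
original = decode_triples(encoded, dictionary)
```

The round trip only succeeds because every TripleItem received exactly one code in step S2 and step S3, which is why the filtering step matters for losslessness.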
The RDF data distributed parallel semantic coding scheme proposed in this embodiment efficiently completes distributed parallel encoding of RDF data at large scale and supports reversal of the RDF data. Experiments show that, compared with existing encoding schemes, it has a significant advantage in both the compression encoding stage and the reversal stage, and that it can improve RDFS rule reasoning.
The foregoing describes only preferred embodiments of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention fall within the coverage of the present invention.
Claims (6)
1. An RDF data distributed parallel semantic coding method, characterized by comprising the following steps:
Step S1: read in the RDF ontology file, construct the class relation model and the attribute relation model, and generate the mapping file of classes and their codes and the mapping file of attributes and their codes;
Step S2: read in the RDF data file, split each triple into triple items, partition the triple items by class, delete duplicate triple items, and generate prefix codes; the triple items are filtered to ensure that RDF triple encoding is consistent, so that the same triple item is never assigned different codes;
Step S3: encode the triple items and generate the dictionary table;
Step S4: encode the triples and generate the encoded triple file;
Step S5: taking the result file of step S4 as the input of this step, reverse the encoding according to the dictionary table of step S3 to regenerate the original RDF data file;
wherein, in step S1, the ontology file in RDF data format is first parsed with Jena, and a relation tree is generated from the class relations to build the class relation model;
wherein a class/attribute type flag, Flag, is defined to distinguish classes from attributes: for the current data item v, the value of Flag depends on whether v is a class or an attribute;
the tree node code width TreenodeDigit, abbreviated TD, is defined in terms of the total number of nodes M and written TD(M);
the class code TreeClasscode, abbreviated TC, consists of Flag, the direct-parent count flag (IPF), the class-node sequence code, and the node sequence code; with M nodes in total, the class-node sequence code and the node sequence code each have TD(M) digits; TC(h, i) denotes the class node code of the i-th node A at layer h; f(h, i) denotes the node sequence code of the i-th node A at layer h; REPT(0, n) denotes a run of n zeros; if anc(h) denotes the class-node sequence code at layer h and f(h-1, m) denotes the node sequence code of the parent class node B of node A, then
TC(h, i) = Flag & IPF & REPT(0, TD(M) - TD(f(h-1, m))) & f(h-1, m) & REPT(0, TD(M) - TD(f(h, i))) & f(h, i)
when IPF > 1, the class-node sequence code is the combination of the node sequence codes of all direct parents;
the attribute code TreePropertycode, abbreviated TP, consists of Flag, the class code, the parent attribute node sequence code, and the node sequence code; with M nodes in total, the parent attribute node sequence code and the node sequence code each have TD(M) digits; TP(h, i) denotes the attribute node code of the i-th node C at layer h; the class to which C belongs is denoted R, with class node code TC(p, r); f(h, i) denotes the node sequence code of the i-th node C at layer h; REPT(0, n) denotes a run of n zeros; if anc(h) denotes the attribute node sequence code at layer h and f(h-1, m) denotes the node sequence code of the parent attribute node D of node C, then
TP(h, i) = Flag & TC(p, r) & REPT(0, TD(M) - TD(f(h-1, m))) & f(h-1, m) & REPT(0, TD(M) - TD(f(h, i))) & f(h, i);
the relation tree is a multiway tree; applying a breadth-first algorithm together with the class code definition yields the relation tree of the attribute relations and generates the class codes.
2. The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S2, the triples are partitioned by class; the predicate of a triple is an attribute in the RDF ontology file, and its attribute code has already been generated while building the attribute relation model, so only the subject and object of each triple need to be partitioned by class; if a triple item (TripleItem) is not unique in the RDF data, the TripleItems are filtered at the same time as they are partitioned by class;
the TripleItem classification and filtering algorithm is as follows: the input is a file in RDF triple format; the output is the class files into which the TripleItems are partitioned and the prefix code relation file;
if different TripleItems share the same URI prefix, then, to ensure that the codes preserve semantic similarity so that similar URIs are encoded as similar numbers, the common prefixes are extracted from the RDF data file;
this step requires overriding the MultipleOutputFormat of MapReduce so that the output is written as one file per class.
3. The RDF data distributed parallel semantic coding method according to claim 2, characterized in that: a triple item (TripleItem) is the subject, predicate, or object of a triple; the definition ranges over all triples, where n denotes the total number of triples: any value v that occurs as the subject, predicate, or object of one of the n triples satisfies v ∈ TripleItem.
4. The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S3, the result files of the triple item TripleItem classification and filtering algorithm are taken as the input files for triple item coding; the class files of the triple items TripleItem are processed in the Map stage, and the triple items TripleItem are encoded in the Reduce stage, which also generates the dictionary mapping file between each triple item TripleItem and its code; the dictionary mapping file is stored in the HDFS of the cluster; each triple item TripleItem is coded in the format: class code + prefix code + suffix (mantissa) code;
The triple item TripleItem coding algorithm is specifically: input: the class files in which the triple items TripleItem are divided by class, and the prefix-code relation file; output: the dictionary mapping file.
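The three-part code format (class code + prefix code + suffix code) can be illustrated by packing the parts into one integer. The sketch below is an assumption-laden example, not the coding defined by the patent: the field widths (`prefix_bits`, `suffix_bits`), the per-class, per-prefix running counter used as the suffix code, and the names `encode_items` and `split_prefix` are all hypothetical, and the class codes are simply passed in as if they came from the ontology model.

```python
def split_prefix(uri):
    """Split a URI into (prefix, suffix) at the last '#' or '/' (assumed rule)."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return (uri[:cut + 1], uri[cut + 1:]) if cut >= 0 else ("", uri)


def encode_items(class_files, class_codes, prefix_codes,
                 prefix_bits=16, suffix_bits=32):
    """Build a dictionary mapping each item to class|prefix|suffix bits."""
    dictionary = {}
    for cls, items in class_files.items():
        counters = {}  # running suffix counter per prefix within this class
        for item in sorted(items):
            prefix, _ = split_prefix(item)
            p = prefix_codes[prefix]
            n = counters.get(p, 0)
            counters[p] = n + 1
            # Guard the assumed field widths so fields never overlap.
            assert p < (1 << prefix_bits) and n < (1 << suffix_bits)
            code = (class_codes[cls] << (prefix_bits + suffix_bits)) \
                | (p << suffix_bits) | n
            dictionary[item] = code
    return dictionary


# Hypothetical inputs shaped like the outputs of the classification step.
class_files = {
    "Person": ["http://example.org/people/alice", "http://example.org/people/bob"],
    "Org": ["http://example.org/org/acme"],
}
class_codes = {"Person": 1, "Org": 2}
prefix_codes = {"http://example.org/people/": 0, "http://example.org/org/": 1}
dictionary = encode_items(class_files, class_codes, prefix_codes)
```

With this layout, items of the same class that share a prefix receive adjacent integers, which is one plausible reading of encoding similar URIs as similar numbers.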
5. The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S4, each triple in the input RDF triple-format file is encoded according to the dictionary mapping table generated by the triple item TripleItem coding algorithm; the triple items TripleItem are joined with the dictionary mapping table to produce the code of each triple;
The triple coding algorithm is specifically: input: the RDF triple-format file and the dictionary mapping file; output: the encoded RDF file.
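The join between triples and the dictionary mapping table can be pictured as one lookup per triple position. The sketch below assumes the dictionary fits in memory as a Python dict, which corresponds to a map-side join; the claim does not say which join strategy is used. The function name `encode_triples`, the integer codes, and the `example.org` URIs are hypothetical.

```python
def encode_triples(triples, dictionary):
    """Replace each subject, predicate, and object with its integer code."""
    encoded = []
    for s, p, o in triples:
        # A KeyError here would mean the item never reached the dictionary,
        # i.e. the item-coding step and the triple file are out of sync.
        encoded.append((dictionary[s], dictionary[p], dictionary[o]))
    return encoded


# Hypothetical dictionary mapping, standing in for the file stored on HDFS.
dictionary = {
    "http://example.org/people/alice": 101,
    "http://example.org/people/bob": 102,
    "http://example.org/onto#knows": 7,
}
triples = [
    ("http://example.org/people/alice", "http://example.org/onto#knows",
     "http://example.org/people/bob"),
    ("http://example.org/people/bob", "http://example.org/onto#knows",
     "http://example.org/people/alice"),
]
encoded = encode_triples(triples, dictionary)
```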
6. The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S5, the dictionary table of the triple items TripleItem and their codes is built according to the SCOM algorithm, and, using this dictionary table, the SCOM reversal algorithm restores the encoded RDF file to the original RDF file.
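Reversal depends on the dictionary being a one-to-one mapping, so that every code leads back to exactly one item. The following sketch shows that idea with an in-memory dict; it is not the SCOM reversal algorithm itself, whose details are not given in this claim. The names `build_reverse_dictionary` and `decode_triples` and the sample codes are hypothetical.

```python
def build_reverse_dictionary(dictionary):
    """Invert item -> code into code -> item, rejecting ambiguous codes."""
    reverse = {}
    for item, code in dictionary.items():
        if code in reverse:
            raise ValueError(f"code {code} is assigned to more than one item")
        reverse[code] = item
    return reverse


def decode_triples(encoded, reverse):
    """Restore the original (subject, predicate, object) strings."""
    return [(reverse[s], reverse[p], reverse[o]) for s, p, o in encoded]


# Hypothetical dictionary and encoded triples.
dictionary = {
    "http://example.org/people/alice": 101,
    "http://example.org/people/bob": 102,
    "http://example.org/onto#knows": 7,
}
encoded = [(101, 7, 102)]
restored = decode_triples(encoded, build_reverse_dictionary(dictionary))
```

The uniqueness check mirrors the purpose of the filtering in step S2: if one item could receive two codes, or two items one code, the round trip would no longer reproduce the original file.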
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610242787.1A CN105930419B (en) | 2016-04-19 | 2016-04-19 | RDF data distributed parallel semantic coding method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105930419A (en) | 2016-09-07 |
CN105930419B (en) | 2019-08-09 |
Family ID: 56838391
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610242787.1A Active CN105930419B (en) | 2016-04-19 | 2016-04-19 | RDF data distributed parallel semantic coding method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105930419B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111144123B (en) * | 2018-10-16 | 2024-02-02 | 工业互联网创新中心(上海)有限公司 | Industrial Internet identification analysis data dictionary construction method |
CN110110329B (en) * | 2019-04-30 | 2022-05-17 | 湖南星汉数智科技有限公司 | Entity behavior extraction method and device, computer device and computer readable storage medium |
CN110457491A (en) * | 2019-08-19 | 2019-11-15 | 中国农业大学 | A kind of knowledge mapping reconstructing method and device based on free state node |
CN112182139A (en) * | 2019-08-29 | 2021-01-05 | 盈盛智创科技(广州)有限公司 | Method, device and equipment for tracing resource description framework triple |
CN110516079B (en) * | 2019-08-29 | 2022-04-29 | 北京大学 | RDF object model class hierarchical tree establishing method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104462610A (en) * | 2015-01-06 | 2015-03-25 | 福州大学 | Distributed RDF storage and query optimization method combined with ontology |
CN104462609A (en) * | 2015-01-06 | 2015-03-25 | 福州大学 | RDF data storage and query method combined with star graph coding |
CN104615703A (en) * | 2015-01-30 | 2015-05-13 | 福州大学 | RDF data distributed parallel inference method combined with Rete algorithm |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7890518B2 (en) * | 2007-03-29 | 2011-02-15 | Franz Inc. | Method for creating a scalable graph database |
Non-Patent Citations (1)
Title |
---|
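A highly scalable RDF data storage system; 袁平鹏 (Yuan Pingpeng); 《计算机研究与发展》 (Journal of Computer Research and Development); 2012-12-20; Vol. 49, No. 10; pp. 2131-2141 * |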
Also Published As
Publication number | Publication date |
---|---|
CN105930419A (en) | 2016-09-07 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN105930419B (en) | RDF data distributed parallel semantic coding method | |
Grigorchuk et al. | Spectra of Schreier graphs of Grigorchuk’s group and Schroedinger operators with aperiodic order | |
CN107017992A (en) | A kind of high-performance alliance block chain based on duplex structure | |
Wardani et al. | Semantic mapping relational to graph model | |
Yang et al. | Resolving structural conflicts in the integration of XML schemas: A semantic approach | |
Yang et al. | An approximate dynamic programming approach for improving accuracy of lossy data compression by Bloom filters | |
Macke et al. | Lifting the curse of multidimensional data with learned existence indexes | |
CN109710775A (en) | A kind of knowledge mapping dynamic creation method based on more rules | |
Bouhali et al. | Exploiting RDF open data using NoSQL graph databases | |
CN104462610B (en) | Distributed RDF storage and query optimization method combined with ontology | |
CN108595588B (en) | Scientific data storage association method | |
Fathy et al. | ProGOMap: automatic generation of mappings from property graphs to ontologies | |
Roul et al. | GM-Tree: An efficient frequent pattern mining technique for dynamic database | |
Kumar et al. | Fuzzy clustering of web documents using equivalence relations and fuzzy hierarchical clustering | |
Liu et al. | Incremental mining algorithm of sequential patterns based on sequence tree | |
Qin et al. | Efficient XML query and update processing using a novel prime-based middle fraction labeling scheme | |
CN112395286B (en) | Chained data table connection method, device, equipment and storage medium | |
Feng et al. | An Approach to Converting Relational Database to Graph Database: from MySQL to Neo4j | |
KR20240004518A (en) | Decoder, encoder, control section, method and computer program for updating neural network parameters using node information | |
CN101131699A (en) | Construction method for structure tree with genetic information | |
Dawelbeit et al. | Efficient dictionary compression for processing RDF big data using google BigQuery | |
Ji et al. | An improved random walk based community detection algorithm | |
Wu et al. | Privacy-protection path finding supporting the ranked order on encrypted graph in big data environment | |
Tejaswi et al. | Semantic inference method using ontologies | |
Xu et al. | Construction of Ontology Knowledge by Attribute Reduction and Rule Extraction of Three-Way Formal Concept Analysis |
Legal Events
Code | Title | Date | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||