CN105930419A - RDF data distributed parallel semantic coding method - Google Patents


Info

Publication number
CN105930419A
Authority
CN
China
Prior art keywords
coding
triple
file
class
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610242787.1A
Other languages
Chinese (zh)
Other versions
CN105930419B (en)
Inventor
汪璟玢
郑翠春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN201610242787.1A
Publication of CN105930419A
Application granted
Publication of CN105930419B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to an RDF data distributed parallel semantic coding method. The method specifically comprises the following steps of S1: reading an RDF ontology file and constructing a class relation model and an attribute relation model; S2: reading an RDF data file, dividing a triple into triple items, classifying the triple items by class, deleting the repeated triple items, and generating prefix codes; filtering the triple items to ensure the consistency of RDF triple codes and enable the same triple item not to be allocated with different codes; S3: coding the triple items to generate a dictionary table; S4: coding the triple to generate a coded triple file; and S5: taking a result file in the step S4 as an input of the step S5, and performing inversion according to the dictionary table in the step S3 to generate an original RDF data file. According to the method, compressed coding and inversion of large-scale data can be efficiently realized in combination with ontology in a distributed environment.

Description

RDF data distributed parallel semantic coding method
Technical field
The present invention relates to the field of Semantic Web technology, and in particular to a distributed parallel semantic coding method for RDF data.
Background
Because RDF data is large in scale, managing it is difficult. To accelerate querying and reasoning over RDF data and to reduce its storage space, the common practice is to compress the triples by encoding them. Compression coding has proven to be an efficient approach: each original triple item (subject, predicate, or object) is replaced by a numeric value (ID), so that all triple data is eventually converted into numeric form. A centralized environment is limited by memory and is not suitable for encoding large-scale data. Distributed parallel compression coding of RDF data is a relatively new research field. Goodman et al. proposed a method adapted to linear probing on the Cray XMT machine, which achieves parallel encoding through parallel hashing on a single dictionary table. The encoding time of this algorithm is linear in the number of processor cores used, but the method requires all data to be held in memory and depends heavily on the shared-memory architecture of the Cray XMT, so it is not suitable for ordinary distributed-memory systems. Long Cheng et al. used the X10 language to compress RDF data: the triples are first filtered, then the triple data is distributed in equal amounts to different nodes according to the hash values of the triple items for local encoding, producing multiple dictionary tables. Urbani et al. proposed a distributed MapReduce data compression algorithm, divided into a data compression phase and a data inversion phase. In the compression phase the triples are compressed and a dictionary table is built; in the inversion phase the compressed triples are joined with the dictionary table to regenerate the original triple data. This algorithm is not efficient enough in the data inversion phase.
The above are the latest research results on distributed parallel compression of RDF data, and the three effective parallel RDF compression algorithms that currently exist. They can compress and encode massive RDF data in parallel, but none of them takes the ontology file into account, so the encoded triples carry no semantic information, which hinders later distributed querying or semantic reasoning. No existing work combines the ontology file to achieve parallel semantic coding of RDF data.
In summary, a centralized environment cannot meet the demands of massive data, while compression coding in a distributed environment carries no semantic information and therefore does not favor distributed querying or reasoning. Some distributed compression algorithms are also not efficient enough in the data inversion phase.
Technical problems to be solved: 1. How to guarantee, in a distributed environment, that triple item codes are unique, i.e. that identical triple items are never assigned different codes. 2. How to guarantee, in a distributed environment, that the coding is lossless, i.e. that the encoded triples can be inverted back to the original triples. 3. How to derive a corresponding parallel coding scheme from the proposed distributed scheme, so as to meet the demand for distributed parallel semantic coding of large-scale data.
Summary of the invention
In view of this, the object of the invention is to provide a distributed parallel semantic coding method for RDF data. RDF data is encoded in combination with the ontology, so that the codes of RDF triples carry semantic information and are regular, which facilitates distributed querying and semantic reasoning. By combining the ontology, compression coding and inversion of large-scale data can be achieved efficiently in a distributed environment.
The invention is implemented by the following scheme: a distributed parallel semantic coding method for RDF data, comprising the following steps:
Step S1: read the RDF ontology file, build the class relation model and the attribute relation model, and generate the mapping file of classes to their codes and the mapping file of attributes to their codes;
Step S2: read the RDF data file, split each triple into triple items, partition the triple items by class, delete duplicate triple items, and at the same time generate prefix codes; the triple items are filtered to guarantee the consistency of RDF triple coding, so that the same triple item is never assigned different codes;
Step S3: encode the triple items and generate the dictionary table;
Step S4: encode the triples and generate the encoded triple file;
Step S5: take the result file of step S4 as the input of this step and, according to the dictionary table from step S3, invert it to regenerate the original RDF data file.
Further, in step S1, the ontology file in RDF format is first parsed with Jena, a relation tree is generated from the class relations, and the class relation model is built;
Here, a class/attribute type flag Flag is defined to identify whether the current datum v is a class or an attribute.
Define the tree node code digit count TreenodeDigit, TD for short. If the total number of nodes is M, then
$$TD(M) = \begin{cases} 1, & 0 < M < 10 \\ TD(\lfloor M/10 \rfloor) + 1, & M \ge 10 \end{cases}$$
Define the class code TreeClasscode, TC for short. TC consists of Flag, the immediate-parent count flag (IPF), the parent-class node order code, and the node order code. With M total nodes, the parent-class node order code and the node order code each have TD(M) digits. TC(h, i) denotes the class node code of the i-th node A in layer h; f(h, i) denotes the node order code of the i-th node A in layer h; REPT(0, n) produces n zeros. Let anc(h) denote the class node order codes of layer h, and let f(h-1, m) denote the node order code of the parent class node B of node A; then
$$f(h, i) = \begin{cases} \max\{anc(h-1)\} + i, & h \ge 1 \\ 0, & h = 0,\ i = 1 \end{cases}$$
$$TC(h, i) = Flag \,\&\, IPF \,\&\, REPT(0,\, TD(M) - TD(f(h-1, m))) \,\&\, f(h-1, m) \,\&\, REPT(0,\, TD(M) - TD(f(h, i))) \,\&\, f(h, i)$$
where & denotes concatenation. When IPF > 1, the parent-class node order code is the concatenation of the node order codes of all immediate parent classes;
Define the attribute code TreePropertycode, TP for short. TP consists of Flag, the class code, the parent-attribute node order code, and the node order code. With M total nodes, the parent-attribute node order code and the node order code each have TD(M) digits. TP(h, i) denotes the attribute node code of the i-th node C in layer h; the class to which C belongs is R, whose class node code is TC(p, r); f(h, i) denotes the node order code of the i-th node C in layer h; REPT(0, n) produces n zeros. Let anc(h) denote the attribute node order codes of layer h, and let f(h-1, m) denote the node order code of the parent attribute node D of node C; then
$$f(h, i) = \begin{cases} \max\{anc(h-1)\} + i, & h \ge 1 \\ 0, & h = 0,\ i = 1 \end{cases}$$
$$TP(h, i) = Flag \,\&\, TC(p, r) \,\&\, REPT(0,\, TD(M) - TD(f(h-1, m))) \,\&\, f(h-1, m) \,\&\, REPT(0,\, TD(M) - TD(f(h, i))) \,\&\, f(h, i);$$
The relation tree is a multiway tree. It is traversed breadth-first in combination with the definition of the class code to generate the class codes, and the relation tree of the attribute relations is obtained likewise.
Further, in step S2, the triples are partitioned by class. The predicate of a triple is an attribute in the RDF ontology file, and its attribute code was already generated when the attribute relation model was built, so only the subject and object of each triple need to be partitioned by class. Because a triple item (TripleItem) is not necessarily unique in the RDF data, the TripleItems are filtered while they are partitioned by class;
The TripleItem classification and filtering algorithm is as follows. Input: the RDF triple-format file. Output: the class files of the TripleItems partitioned by class, and the prefix code relation file;
Different TripleItems may share the same URI prefix. To give the codes semantic similarity, so that similar URIs are encoded into similar numbers, the shared prefixes are extracted from the RDF data file;
This step requires overriding MapReduce's MultipleOutputFormat so that the output is written to one file per class.
Preferably, a TripleItem is the subject, predicate, or object of a triple, defined as:
$$\forall (S_i, P_j, O_k), \quad 1 \le i, j, k \le n$$
where n is the total number of triples;
if $v \in \{S_i, P_j, O_k\}$, then $v \in TripleItem$.
Further, in step S3, the result files of the TripleItem classification and filtering algorithm are taken as the input of triple item encoding. In the Map phase the class files of the TripleItems are processed; in the Reduce phase the TripleItems are encoded and, at the same time, the dictionary mapping file from each TripleItem to its code is generated and stored on the cluster's HDFS. The code format of each TripleItem is: class code + prefix code + tail code;
The TripleItem encoding algorithm is as follows. Input: the class files of the TripleItems partitioned by class, and the prefix code relation file. Output: the dictionary mapping file.
Further, in step S4, each triple in the input RDF triple-format file is encoded according to the dictionary mapping table generated by the TripleItem encoding algorithm: the TripleItems are joined with the dictionary mapping table to produce the triple codes;
The triple encoding algorithm is as follows. Input: the RDF triple-format file and the dictionary mapping file.
Output: the encoded RDF file.
Further, in step S5, the SCOM algorithm has built the dictionary table of TripleItems and their codes; using this dictionary table, the SCOM inversion algorithm converts the encoded RDF file back into the original RDF file.
Compared with the prior art, the invention has the following advantages: the proposed distributed parallel semantic coding scheme for RDF data can efficiently complete distributed parallel coding of RDF data at large scale and can invert the encoded data. Compared with existing coding schemes, it has significant advantages in both the compression coding phase and the inversion phase, and it can facilitate RDFS rule reasoning.
Brief description of the drawings
Fig. 1 is a schematic diagram of the method framework of the invention.
Fig. 2 is a schematic diagram of the relation model for part of the LUBM classes in the invention.
Fig. 3 is a schematic diagram of the relation model for part of the LUBM attributes in the invention.
Detailed description
The invention is further described below with reference to the drawings and an embodiment.
This embodiment provides a distributed parallel semantic coding method for RDF data (Semantic Coding with Ontology on MapReduce, SCOM for short). Exploiting the characteristics of MapReduce, class relation and attribute relation models are built from the ontology, and the RDF data is classified and encoded according to these models, thereby achieving distributed parallel compression coding of RDF data. The SCOM scheme is divided into a compression coding phase and an inversion phase and, as shown in Fig. 1, comprises the following steps:
Step S1: read the RDF ontology file, build the class relation model and the attribute relation model, and generate the mapping file of classes to their codes and the mapping file of attributes to their codes;
Step S2: read the RDF data file, split each triple into triple items, partition the triple items by class, delete duplicate triple items, and at the same time generate prefix codes; the triple items are filtered to guarantee the consistency of RDF triple coding, so that the same triple item is never assigned different codes;
Step S3: encode the triple items and generate the dictionary table;
Step S4: encode the triples and generate the encoded triple file;
Step S5: take the result file of step S4 as the input of this step and, according to the dictionary table from step S3, invert it to regenerate the original RDF data file.
In this embodiment, in step S1, class codes must be generated so that the codes of the RDF data carry semantic information. Since the predicates in RDF data are all defined in the ontology file, and there are far fewer of them than subjects or objects in the RDF data, this step also completes the encoding of the attributes after the classes have been encoded.
In this embodiment, the following definitions are given:
Define the class/attribute type flag Flag, which identifies whether the current datum v is a class or an attribute.
Define the tree node code digit count TreenodeDigit, TD for short. If the total number of nodes is M, then
$$TD(M) = \begin{cases} 1, & 0 < M < 10 \\ TD(\lfloor M/10 \rfloor) + 1, & M \ge 10 \end{cases}$$
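The recursive definition of TD(M) simply counts the decimal digits of M. A minimal sketch, assuming integer node counts; the function name `td` is illustrative and not taken from the patent:

```python
def td(total_nodes: int) -> int:
    """Digits reserved for a node order code, following the recursive TD(M)."""
    if total_nodes <= 0:
        # The definition only covers M > 0.
        raise ValueError("TD(M) is only defined for M > 0")
    if total_nodes < 10:
        return 1
    # TD(floor(M / 10)) + 1 for M >= 10
    return td(total_nodes // 10) + 1
```

For any ontology with between 10 and 99 nodes this yields 2, which is the digit count assumed in the LUBM examples of this description.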
Define the class code TreeClasscode, TC for short. TC consists of Flag, the immediate-parent count flag (IPF), the parent-class node order code, and the node order code. With M total nodes, the parent-class node order code and the node order code each have TD(M) digits. TC(h, i) denotes the class node code of the i-th node A in layer h; f(h, i) denotes the node order code of the i-th node A in layer h; REPT(0, n) produces n zeros. Let anc(h) denote the class node order codes of layer h, and let f(h-1, m) denote the node order code of the parent class node B of node A; then
$$f(h, i) = \begin{cases} \max\{anc(h-1)\} + i, & h \ge 1 \\ 0, & h = 0,\ i = 1 \end{cases}$$
$$TC(h, i) = Flag \,\&\, IPF \,\&\, REPT(0,\, TD(M) - TD(f(h-1, m))) \,\&\, f(h-1, m) \,\&\, REPT(0,\, TD(M) - TD(f(h, i))) \,\&\, f(h, i)$$
where & denotes concatenation. When IPF > 1, the parent-class node order code is the concatenation of the node order codes of all immediate parent classes;
Define the attribute code TreePropertycode, TP for short. TP consists of Flag, the class code, the parent-attribute node order code, and the node order code. With M total nodes, the parent-attribute node order code and the node order code each have TD(M) digits. TP(h, i) denotes the attribute node code of the i-th node C in layer h; the class to which C belongs is R, whose class node code is TC(p, r); f(h, i) denotes the node order code of the i-th node C in layer h; REPT(0, n) produces n zeros. Let anc(h) denote the attribute node order codes of layer h, and let f(h-1, m) denote the node order code of the parent attribute node D of node C; then
$$f(h, i) = \begin{cases} \max\{anc(h-1)\} + i, & h \ge 1 \\ 0, & h = 0,\ i = 1 \end{cases}$$
$$TP(h, i) = Flag \,\&\, TC(p, r) \,\&\, REPT(0,\, TD(M) - TD(f(h-1, m))) \,\&\, f(h-1, m) \,\&\, REPT(0,\, TD(M) - TD(f(h, i))) \,\&\, f(h, i);$$
In this embodiment, in step S1, the ontology file in RDF format is first parsed with Jena, a relation tree is generated from the class relations (subclass and parent class), and the class relation model is built. The relation tree is a multiway tree. It is traversed with a breadth-first algorithm in combination with the definition of the class code to generate the class codes, and the relation tree of the attribute relations is obtained likewise.
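The node order code f(h, i) numbers the root 0 and then continues layer by layer, which is what a breadth-first traversal produces. The following sketch illustrates that numbering on an in-memory tree; the dictionary-of-children input and the function name are assumptions for illustration (the patent obtains the hierarchy by parsing the ontology with Jena), and the sample class names in the usage note are hypothetical.

```python
from collections import deque


def assign_order_codes(children, root):
    """Number nodes breadth-first: root gets 0, then each layer continues
    from the largest code of the previous layer, as in f(h, i)."""
    order = {}
    queue = deque([root])
    next_code = 0
    while queue:
        node = queue.popleft()
        if node in order:
            # A class reachable through several parents is numbered once.
            continue
        order[node] = next_code
        next_code += 1
        queue.extend(children.get(node, []))
    return order
```

For example, `assign_order_codes({"Things": ["Person", "Organization"], "Person": ["Student"]}, "Things")` numbers Things, Person, Organization, and Student as 0, 1, 2, and 3.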
Take the class fragment of the LUBM data set as an example. Assume that the number of code digits determined by the definition of the tree node code digit count is 2. The resulting class relation model is shown in Fig. 2, where the first digit of the code is the class flag, the second digit is the immediate-parent count flag, the third and fourth digits together form the parent-class node order code of the current class, and the last two digits form the node order code of the current class. The Things class is the parent of all classes, and its immediate-parent count flag is 0 (i.e. IPF = 0).
A class may have more than one parent. Suppose the immediate parents of the Part-timeGraduateStudent class are the GraduateStudent class and the TeachingAssistant class in the class relation model of Fig. 2. Then the parent-class node order code of Part-timeGraduateStudent is the concatenation of the order codes of GraduateStudent and TeachingAssistant (i.e. 0911), and the code of Part-timeGraduateStudent is 02091113, where IPF = 2 indicates that Part-timeGraduateStudent has two immediate parents.
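The assembly of a class code can be sketched as plain string concatenation with zero padding, which plays the role of REPT(0, TD(M) - TD(f)) & f. This is a minimal illustration, not the patent's implementation: the function name and parameters are assumptions, the flag value 0 for classes is read off the example code 02091113, and the root class (IPF = 0) is not handled because the text does not show how its parent field is filled.

```python
def class_code(flag: int, parent_orders: list, order: int, digits: int) -> str:
    """Concatenate Flag, IPF, zero-padded parent order codes, and the
    zero-padded node order code into one class code string."""
    ipf = len(parent_orders)  # number of immediate parent classes
    if not 1 <= ipf <= 9:
        # Assumption: IPF occupies a single digit, as in the example.
        raise ValueError("this sketch supports 1 to 9 immediate parents")
    parents = "".join(str(p).zfill(digits) for p in parent_orders)
    return f"{flag}{ipf}{parents}{str(order).zfill(digits)}"
```

With the values from the example above, `class_code(0, [9, 11], 13, 2)` reproduces the code 02091113 of Part-timeGraduateStudent.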
Analogously to the class relation tree, a relation tree of attributes is built and the attributes are encoded. Unlike class coding, attribute coding adds the class code information, so that attribute codes carry semantic information.
Take the attribute fragment of the LUBM data set as an example. Assume that the number of code digits determined by the definition of the tree node code digit count is 2. Combined with the class codes of Fig. 2, the attribute relation model is shown in Fig. 3, where the first digit of the code is the attribute flag, digits two to seven are the class code of the current attribute (the class of the attribute's domain), digits eight and nine form the parent-attribute node order code of the current attribute, and the last two digits are the node order code of the current attribute. In addition, this step generates the attribute domain and range files.
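The attribute code layout described above can be sketched in the same way as the class code. All names below are illustrative assumptions; in particular, the text does not state which flag digit marks an attribute, so the flag is left as a parameter and the value used in the usage note is hypothetical.

```python
def attribute_code(flag: int, domain_class_code: str,
                   parent_order: int, order: int, digits: int) -> str:
    """Concatenate Flag, the code of the attribute's domain class, the
    zero-padded parent-attribute order code, and the node order code."""
    return (
        f"{flag}{domain_class_code}"
        f"{str(parent_order).zfill(digits)}{str(order).zfill(digits)}"
    )
```

With a hypothetical flag of 1, a domain class code of 010001, parent order 0, and node order 3, the result is the 11-digit string 10100010003, matching the layout of one flag digit, six class digits, two parent digits, and two node digits.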
With this coding scheme, semantic information is added to the codes of the RDF data described below: when a TripleItem is a subject or object, the class it belongs to can be determined from its code; when a TripleItem is a predicate, the parent attribute of the predicate or the class it belongs to can be obtained.
In this embodiment, in step S2, the triples are partitioned by class. The predicate of a triple is an attribute in the RDF ontology file, and its attribute code was already generated when the attribute relation model was built, so only the subject and object of each triple need to be partitioned by class. Because a TripleItem is not necessarily unique in the RDF data, duplicate TripleItems are deleted to guarantee uniqueness, so that identical TripleItems are never assigned different codes. In addition, different TripleItems may share the same URI prefix; to give the codes semantic similarity, so that similar URIs are encoded into similar numbers, the shared prefixes (namespaces) are extracted from the RDF data file. This step also requires overriding MapReduce's MultipleOutputFormat so that the output is written to one file per class.
Preferably, a TripleItem is the subject, predicate, or object of a triple, defined as:
$$\forall (S_i, P_j, O_k), \quad 1 \le i, j, k \le n$$
where n is the total number of triples;
if $v \in \{S_i, P_j, O_k\}$, then $v \in TripleItem$.
The TripleItem classification and filtering algorithm is as follows:
Input: the RDF triple-format file
Output: the class files of the TripleItems partitioned by class; the prefix code relation file
The pseudocode is shown in Table 1.
Table 1: TripleItem classification and filtering algorithm
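The pseudocode of Table 1 is not reproduced in this text. As a rough illustration only, the sketch below shows two of the ideas in this step, duplicate removal and prefix extraction, on an in-memory list of triples. It is not the MapReduce job of the patent and it omits the partitioning by class; the function names, the rule of splitting a URI at its last `#` or `/`, and the first-seen assignment of two-digit prefix codes are all assumptions.

```python
def split_prefix(item: str):
    """Split a URI into (namespace prefix, local name).
    Anything that is not an http(s) URI is treated as having no prefix."""
    if not item.startswith(("http://", "https://")):
        return "", item
    cut = max(item.rfind("#"), item.rfind("/"))
    return item[: cut + 1], item[cut + 1:]


def filter_items(triples):
    """Collect unique subjects and objects and assign prefix codes."""
    items = []                 # unique TripleItems in first-seen order
    seen = set()
    prefix_codes = {"": "00"}  # "00" for items without a prefix
    for s, _p, o in triples:
        for item in (s, o):    # predicates were already coded in step S1
            if item in seen:
                continue       # the same item never gets a second entry
            seen.add(item)
            items.append(item)
            prefix, _local = split_prefix(item)
            if prefix not in prefix_codes:
                prefix_codes[prefix] = str(len(prefix_codes)).zfill(2)
    return items, prefix_codes
```

Because each distinct item is recorded exactly once, a later encoding pass over `items` cannot assign two codes to the same TripleItem.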
In this embodiment, in step S3, the result files of the TripleItem classification and filtering algorithm are taken as the input of triple item encoding. In the Map phase the class files of the TripleItems are processed; in the Reduce phase the TripleItems are encoded and, at the same time, the dictionary mapping file from each TripleItem to its code is generated and stored on the cluster's HDFS. The code format of each TripleItem is: class code + prefix code + tail code;
The TripleItem encoding algorithm is as follows:
Input: the class files of the TripleItems partitioned by class; the prefix code relation file;
Output: the dictionary mapping file.
The pseudocode is shown in Table 2.
Table 2: TripleItem encoding algorithm
Take a triple fragment from the LUBM data set (Table 3) as an example to illustrate the TripleItem encoding algorithm.
Table 3: Input RDF triple data fragment
For brevity in what follows, the mapping between TripleItem numbers and the original data is generated from Table 3, as listed in Table 4. In Table 3, the subject of the third triple and the object of the fifth triple are identical, so they correspond to a single TripleItem.
Table 4: Mapping between TripleItem numbers and original data
Taking the triple fragment of Table 3 as the input of the TripleItem classification and filtering algorithm yields the class files and the prefix code relation file of the TripleItems. Assume the resulting prefix code relation file is as listed in Table 5.
Table 5: Prefix code information for the triple fragment
Prefix Prefix code
(xmlns:)http://swat.cse.lehigh.edu/onto/univ-bench.owl# 01
(rdf:)http://www.w3.org/1999/02/22-rdf-syntax-ns# 02
http://www.Department0.University0.edu/ 03
http://www.Department2.University0.edu/ 04
No prefix (e.g. Literal data) 00
Taking the result files of the TripleItem classification and filtering algorithm as the input of the TripleItem encoding algorithm, together with the class (attribute) relation models, the codes of the TripleItems are obtained. Let the threshold α denote the number of tail-code digits in a TripleItem code. With α = 3, the TripleItem code information for the triple fragment of Table 3, derived from Tables 4 and 5, is as listed in Table 6.
Table 6: TripleItem code information
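The code format "class code + prefix code + tail code" can be illustrated with a short helper. This is a sketch under assumptions: the function name is hypothetical, the tail code is taken to be a per-class sequence number padded to α digits, and the sample values in the usage note are made up rather than taken from Table 6, whose contents are not reproduced here.

```python
def item_code(class_code: str, prefix_code: str, seq: int, alpha: int = 3) -> str:
    """Build a TripleItem code as class code + prefix code + tail code,
    where the tail code is seq zero-padded to alpha digits."""
    tail = str(seq)
    if seq < 0 or len(tail) > alpha:
        raise ValueError("sequence number does not fit in alpha digits")
    return class_code + prefix_code + tail.zfill(alpha)
```

For instance, a hypothetical item of class 010001 with prefix code 03 and sequence number 1 would be coded as 01000103001 when α = 3. Items of the same class and namespace then share a common leading substring, which is the regularity the scheme relies on.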
In this embodiment, in step S4, each triple in the input RDF triple-format file is encoded according to the dictionary mapping table generated by the TripleItem encoding algorithm: the TripleItems are joined with the dictionary mapping table to produce the triple codes;
The triple encoding algorithm is as follows:
Input: the RDF triple-format file and the dictionary mapping file;
Output: the encoded RDF file.
The pseudocode is shown in Table 7.
Table 7: Triple encoding algorithm
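The pseudocode of Table 7 is not reproduced in this text. Conceptually, joining the triples with the dictionary amounts to a lookup per triple item. The sketch below shows that lookup on an in-memory dictionary; it is an illustration only, not the distributed join of the patent, and the function name is an assumption.

```python
def encode_triples(triples, dictionary):
    """Replace every item of every triple by its code from the dictionary.
    A KeyError signals an item that was never assigned a code."""
    return [tuple(dictionary[item] for item in triple) for triple in triples]
```

The dictionary is expected to hold codes for subjects and objects (from step S3) as well as for predicates (attribute codes from step S1).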
In this embodiment, in step S5: SCOM is a lossless compression algorithm, and its inversion algorithm fully restores the encoded RDF data file to the original data. Since the SCOM algorithm has built the dictionary table of TripleItems and their codes, the encoded RDF file can easily be converted back into the original RDF file using this dictionary table. For precision, the SCOM inversion algorithm is described in pseudocode form in Table 8:
Table 8: SCOM inversion algorithm
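The pseudocode of Table 8 is not reproduced in this text. The idea of inversion can be sketched as a reverse dictionary lookup: because every TripleItem has exactly one code and every code belongs to one TripleItem, the mapping can be turned around without loss. The function name and the in-memory representation are assumptions for illustration.

```python
def invert_triples(encoded, dictionary):
    """Map encoded triples back to the original items using the
    item -> code dictionary built during encoding."""
    reverse = {code: item for item, code in dictionary.items()}
    if len(reverse) != len(dictionary):
        # Two items sharing one code would make the inversion lossy.
        raise ValueError("codes are not unique")
    return [tuple(reverse[code] for code in triple) for triple in encoded]
```

Applying this function to the output of the encoding step with the same dictionary returns the original triples, which is the lossless property required in the problem statement.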
The distributed parallel semantic coding scheme for RDF data proposed in this embodiment can efficiently complete distributed parallel coding of RDF data at large scale and can invert the encoded data. Experiments show that, compared with existing coding schemes, it has significant advantages in both the compression coding phase and the inversion phase, and it can facilitate RDFS rule reasoning.
The above is only a preferred embodiment of the invention; all equivalent changes and modifications made within the scope of the claims of the invention shall fall within the scope of the invention.

Claims (7)

1. A distributed parallel semantic coding method for RDF data, characterized by comprising the following steps:
Step S1: read the RDF ontology file, build the class relation model and the attribute relation model, and generate the mapping file of classes to their codes and the mapping file of attributes to their codes;
Step S2: read the RDF data file, split each triple into triple items, partition the triple items by class, delete duplicate triple items, and at the same time generate prefix codes; the triple items are filtered to guarantee the consistency of RDF triple coding, so that the same triple item is never assigned different codes;
Step S3: encode the triple items and generate the dictionary table;
Step S4: encode the triples and generate the encoded triple file;
Step S5: take the result file of step S4 as the input of this step and, according to the dictionary table from step S3, invert it to regenerate the original RDF data file.
The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S1, the ontology file in RDF data format is first parsed with Jena, a relation tree is generated from the class relations, and the class relation model is built;
Wherein, a class/attribute type flag Flag is defined to distinguish classes from attributes; assuming the current data is v, then
The tree node code digit count TreenodeDigit, abbreviated TD, is defined; if the total number of nodes is M, then
The class code TreeClasscode, abbreviated TC, is defined; TC consists of Flag, the direct-parent count mark (IPF), the parent-class node order code, and the node order code; wherein the total number of nodes is M, and the parent-class node order code and the node order code each have TD(M) digits; TC(h, i) denotes the class node code of the i-th node A of layer h; f(h, i) denotes the node order code of the i-th node A of layer h, and REPT(0, n) denotes a run of n zeros; if anc(h) denotes the parent-class node order code of layer h, and f(h-1, m) denotes the node order code of the parent class node B of node A, then
When IPF > 1, the parent-class node order code is the combination of the node order codes of all direct parents;
The attribute code TreePropertycode, abbreviated TP, is defined; TP consists of Flag, the class code, the parent-attribute node order code, and the node order code; wherein the total number of nodes is M, and the parent-attribute node order code and the node order code each have TD(M) digits; TP(h, i) denotes the attribute node code of the i-th node C of layer h; the class to which C belongs is R, and its class node code is TC(p, r); f(h, i) denotes the node order code of the i-th node C of layer h, and REPT(0, n) denotes a run of n zeros; if anc(h) denotes the attribute node order code of layer h, and f(h-1, m) denotes the node order code of the parent attribute node D of node C, then
The relation tree is a multiway tree; traversing it breadth-first in combination with the definition of the class code yields the relation tree of the attribute relation and generates the class codes.
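The breadth-first numbering of the relation tree can be illustrated with a minimal Python sketch. It only assigns each node its position within its layer, in the spirit of f(h, i). The padding width (the number of decimal digits of the total node count M), the function name `node_order_codes`, and the example hierarchy are assumptions made for illustration; the full composition of TC and TP is not reproduced here.

```python
from collections import deque

def node_order_codes(tree, root):
    """Breadth-first traversal of a multiway tree given as {parent: [children]}.

    Returns {node: (layer, code)}, where code is the node's zero-padded
    position within its layer. The padding width is the number of decimal
    digits of the total node count M (an assumption for this sketch).
    """
    # First pass: count all nodes reachable from the root to obtain M.
    total = 0
    queue = deque([root])
    while queue:
        node = queue.popleft()
        total += 1
        queue.extend(tree.get(node, []))
    width = len(str(total))

    # Second pass: number the nodes layer by layer in visiting order.
    codes = {}
    seen_in_layer = {}
    queue = deque([(root, 0)])
    while queue:
        node, layer = queue.popleft()
        seen_in_layer[layer] = seen_in_layer.get(layer, 0) + 1
        codes[node] = (layer, str(seen_in_layer[layer]).zfill(width))
        for child in tree.get(node, []):
            queue.append((child, layer + 1))
    return codes

# Hypothetical class hierarchy, for illustration only.
classes = {
    "Thing": ["Person", "Organization"],
    "Person": ["Student", "Professor"],
    "Organization": ["University"],
}
codes = node_order_codes(classes, "Thing")
```

Because sibling subtrees are visited in layer order, nodes of the same layer receive consecutive numbers regardless of which parent they hang under.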
The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S2, the triples are partitioned by class; the predicate of a triple is an attribute in the RDF ontology file, and its attribute code has already been generated when the attribute relation model was built, so only the subject and the object of each triple need to be partitioned by class; to make every triple item TripleItem in the RDF data unique, the triple items TripleItem are filtered at the same time as they are partitioned by class;
The classification and filtering algorithm for triple items TripleItem is specifically: input: the RDF triple format file; output: the class files of triple items TripleItem partitioned by class, and the prefix code relation file;
If different triple items TripleItem share the same URI, then, to ensure that the codes have semantic similarity so that similar URIs are encoded as similar numbers, the common prefix is extracted from the RDF data file;
This step needs to override MapReduce's MultipleOutputFormat so that the output can be written to separate files by class.
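The classification and filtering step can be sketched in a single process as follows. This is not the MapReduce job itself: the `class_of` lookup, the rule of cutting a URI at its last `#` or `/`, the sequential prefix numbering, and the example URIs are all assumptions made for illustration.

```python
from collections import defaultdict

def split_uri(uri):
    """Split a URI into (prefix, local part) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[:cut + 1], uri[cut + 1:]

def classify_and_filter(triples, class_of):
    """Group subjects and objects by class and drop duplicate items.

    triples  : iterable of (subject, predicate, object) strings
    class_of : dict mapping a triple item to its class name (hypothetical
               lookup; in the method the classes come from the ontology)
    Returns (items_by_class, prefix_codes).
    """
    items_by_class = defaultdict(set)   # a set filters repeated items
    prefix_codes = {}
    for s, _p, o in triples:            # predicates already have codes from S1
        for item in (s, o):
            items_by_class[class_of.get(item, "Unknown")].add(item)
            prefix, _local = split_uri(item)
            if prefix not in prefix_codes:
                prefix_codes[prefix] = len(prefix_codes) + 1
    return items_by_class, prefix_codes

# Hypothetical data, for illustration only.
alice, bob, fzu = "http://ex.org/u#Alice", "http://ex.org/u#Bob", "http://ex.org/u#FZU"
triples = [
    (alice, "http://ex.org/o#memberOf", fzu),
    (bob, "http://ex.org/o#memberOf", fzu),
]
class_of = {alice: "Person", bob: "Person", fzu: "University"}
items_by_class, prefix_codes = classify_and_filter(triples, class_of)
```

In the example the object URI occurs twice but is stored once, and all three items share one prefix and therefore one prefix code.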
The RDF data distributed parallel semantic coding method according to claim 3, characterized in that: the triple item TripleItem is the subject, predicate or object of a triple, defined as:
Wherein n denotes the total number of triples;
If, then
The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S3, the result files of the triple item TripleItem classification and filtering algorithm are taken as the input files for triple item coding; in the Map phase, the class files of the triple items TripleItem are processed; in the Reduce phase, the triple items TripleItem are encoded, and at the same time the dictionary mapping file between each triple item TripleItem and its code is generated and stored on the HDFS of the cluster; the code format of each triple item TripleItem is: code of the class it belongs to + prefix code + mantissa code;
The triple item TripleItem coding algorithm is specifically: input: the class files of triple items TripleItem partitioned by class, and the prefix code relation file; output: the dictionary mapping file.
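The three-part code format (class code + prefix code + mantissa code) can be illustrated by simulating the Map and Reduce phases in memory. The fixed field width of 4 digits, the per-class sequential mantissa, the function names, and the example class codes are assumptions made for this sketch; the method's actual field widths are not stated in the text above.

```python
from itertools import groupby

def map_phase(items_by_class):
    """Emit (class name, item) pairs, one per triple item."""
    for cls, items in items_by_class.items():
        for item in items:
            yield cls, item

def reduce_phase(pairs, class_codes, prefix_codes, width=4):
    """Give each item the code: class code + prefix code + mantissa code."""
    dictionary = {}
    # Sorting stands in for the shuffle that groups pairs by key.
    for cls, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        for mantissa, (_cls, item) in enumerate(group, start=1):
            cut = max(item.rfind("#"), item.rfind("/")) + 1
            prefix = item[:cut]
            dictionary[item] = (
                class_codes[cls]
                + str(prefix_codes[prefix]).zfill(width)
                + str(mantissa).zfill(width)
            )
    return dictionary

# Hypothetical inputs, for illustration only.
items_by_class = {
    "Person": {"http://ex.org/u#Alice", "http://ex.org/u#Bob"},
    "University": {"http://ex.org/u#FZU"},
}
class_codes = {"Person": "11", "University": "21"}
prefix_codes = {"http://ex.org/u#": 1}
dictionary = reduce_phase(map_phase(items_by_class), class_codes, prefix_codes)
```

Items of the same class that share a prefix end up with codes that differ only in the trailing mantissa, which is the sense in which similar URIs receive similar numbers.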
The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S4, according to the dictionary mapping table generated by the triple item TripleItem coding algorithm, each triple in the input RDF triple format file is encoded; the triple items TripleItem are joined with the dictionary mapping table, thereby generating the codes of the triples;
The triple coding algorithm is specifically: input: the RDF triple format file and the dictionary mapping file;
output: the encoded RDF file.
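Joining triple items with the dictionary mapping table amounts to replacing each item by its code. A minimal in-memory sketch follows; the function name `encode_triples`, the shortened item names, and the code values are hypothetical, and a hash lookup stands in for the distributed join.

```python
def encode_triples(triples, dictionary):
    """Replace every subject, predicate and object by its dictionary code.

    Raises KeyError if an item is missing from the dictionary, since an
    unmapped item could not be restored later.
    """
    return [tuple(dictionary[item] for item in triple) for triple in triples]

# Hypothetical dictionary and data, for illustration only.
dictionary = {
    "ex:Alice": "1100010001",
    "ex:memberOf": "2001",
    "ex:FZU": "2100010001",
}
triples = [("ex:Alice", "ex:memberOf", "ex:FZU")]
encoded = encode_triples(triples, dictionary)
```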
The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S5, the dictionary table of triple items TripleItem and their codes is established according to the SCOM algorithm and, in combination with the dictionary table, the SCOM inversion algorithm is used to convert the encoded RDF file back into the original RDF file.
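Inversion relies on the dictionary being one-to-one, which is what the filtering in step S2 is meant to guarantee. The following sketch shows the idea only; it is not the SCOM inversion algorithm itself, and the function names and example values are hypothetical.

```python
def invert_dictionary(dictionary):
    """Build the code -> item table, rejecting codes that are not unique."""
    inverse = {}
    for item, code in dictionary.items():
        if code in inverse:
            raise ValueError(f"code {code} is assigned to more than one item")
        inverse[code] = item
    return inverse

def decode_triples(encoded, dictionary):
    """Turn encoded triples back into the original triple items."""
    inverse = invert_dictionary(dictionary)
    return [tuple(inverse[code] for code in triple) for triple in encoded]

# Hypothetical dictionary and encoded data, for illustration only.
dictionary = {
    "ex:Alice": "1100010001",
    "ex:memberOf": "2001",
    "ex:FZU": "2100010001",
}
encoded = [("1100010001", "2001", "2100010001")]
decoded = decode_triples(encoded, dictionary)
```

The explicit uniqueness check makes a dictionary with a repeated code fail loudly instead of silently restoring the wrong item.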
CN201610242787.1A 2016-04-19 2016-04-19 RDF data distributed parallel semantic coding method Active CN105930419B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610242787.1A CN105930419B (en) 2016-04-19 2016-04-19 RDF data distributed parallel semantic coding method

Publications (2)

Publication Number Publication Date
CN105930419A true CN105930419A (en) 2016-09-07
CN105930419B CN105930419B (en) 2019-08-09

Family

ID=56838391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610242787.1A Active CN105930419B (en) 2016-04-19 2016-04-19 RDF data distributed parallel semantic coding method

Country Status (1)

Country Link
CN (1) CN105930419B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant