CN105930419A - RDF data distributed parallel semantic coding method
- Publication number
- CN105930419A (application CN201610242787.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention relates to an RDF data distributed parallel semantic coding method. The method specifically comprises the following steps of S1: reading an RDF ontology file and constructing a class relation model and an attribute relation model; S2: reading an RDF data file, dividing a triple into triple items, classifying the triple items by class, deleting the repeated triple items, and generating prefix codes; filtering the triple items to ensure the consistency of RDF triple codes and enable the same triple item not to be allocated with different codes; S3: coding the triple items to generate a dictionary table; S4: coding the triple to generate a coded triple file; and S5: taking a result file in the step S4 as an input of the step S5, and performing inversion according to the dictionary table in the step S3 to generate an original RDF data file. According to the method, compressed coding and inversion of large-scale data can be efficiently realized in combination with ontology in a distributed environment.
Description
Technical field
The present invention relates to the field of Semantic Web technology, and in particular to a distributed parallel semantic coding method for RDF data.
Background
The sheer scale of RDF data limits how well it can be managed. To speed up querying or reasoning over RDF data and to reduce storage space, the common practice is to compression-encode the triples. Compression encoding has proven to be an efficient form of encoding: each original triple item (subject, predicate, or object) is replaced by a numeric value (ID), so that all triple data are eventually converted to numeric form. A centralized environment is limited by memory and is therefore unsuitable for encoding large-scale data. Distributed parallel compression encoding of RDF data is a relatively new research area. Goodman et al. proposed a method on the Cray XMT machine that adapts linear probing and achieves parallel encoding through parallel hashing on a single dictionary table. The encoding time of this algorithm is linear in the number of processor cores used, but the method requires all data to be held in memory and depends heavily on the shared-memory architecture of the Cray XMT, so it is not suitable for common distributed-memory systems. Long Cheng et al. used the X10 language to compress RDF data: the triples are first filtered, then the triple data are distributed in equal amounts to different nodes according to the hash values of the triple items and encoded locally, producing multiple dictionary tables. Urbani et al. proposed a distributed MapReduce data compression algorithm, divided into a data compression stage and a data inversion stage. In the compression stage the triples are compressed and a dictionary table is built; in the inversion stage the compressed triples are joined with the dictionary table to regenerate the original triple data. This algorithm is not efficient enough in the inversion stage.
The above are the most recent research results on distributed parallel compression of RDF data, and they are the three effective parallel RDF compression algorithms currently available. They can compress massive RDF data in parallel, but none of them takes the ontology file into account, so the encoded triples carry no semantic information, which hinders later distributed querying or semantic reasoning. No existing work combines the ontology file to achieve parallel semantic coding of RDF data.
In summary, a centralized environment cannot meet the demands of massive data, compression encoding in a distributed environment carries no semantic information and therefore hinders distributed querying or reasoning, and some distributed compression algorithms are not efficient enough in the inversion stage.
The technical problems to be solved are:
1. How to guarantee the uniqueness of triple item codes, i.e. that identical triple items are never assigned different codes in a distributed environment.
2. How to guarantee that the encoding is lossless in a distributed environment, i.e. that the encoded triples can be inverted back to the original triples.
3. How to derive a corresponding parallel encoding scheme from the proposed distributed scheme, so as to meet the demand for distributed parallel semantic coding of large-scale data.
Summary of the invention
In view of this, the object of the present invention is to provide a distributed parallel semantic coding method for RDF data. The method encodes RDF data in combination with the ontology, so that the codes of RDF triples carry semantic information and follow a regular pattern, which benefits distributed querying and semantic reasoning. By combining the ontology, compression encoding and inversion of large-scale data can be carried out efficiently in a distributed environment.
The present invention is realized by the following scheme: a distributed parallel semantic coding method for RDF data, comprising the following steps:
Step S1: read the RDF ontology file, build the class relation model and the attribute relation model, and generate the mapping file of classes and their codes and the mapping file of attributes and their codes;
Step S2: read the RDF data file, split the triples into triple items, divide the triple items by class, delete duplicate triple items, and generate prefix codes at the same time; the triple items are filtered to guarantee the consistency of RDF triple encoding, so that the same triple item is never assigned different codes;
Step S3: encode the triple items and generate the dictionary table;
Step S4: encode the triples and generate the encoded triple file;
Step S5: take the result file of step S4 as the input of this step and, according to the dictionary table of step S3, invert it to regenerate the original RDF data file.
Further, in step S1, the ontology file in RDF format is first parsed with Jena, a relation tree is generated from the class relations, and the class relation model is built;
A class/attribute type flag, Flag, is defined to distinguish classes from attributes. Assuming the current data item is v, then
[formula missing from the source text]
The tree node code digit count, TreenodeDigit (TD for short), is defined. If the total number of nodes is M, then
[formula missing from the source text]
The class code, TreeClasscode (TC for short), is defined. TC consists of Flag, the direct-parent count marker (IPF), the parent-class node sequence code, and the node sequence code. The total number of nodes is M, and both the parent-class node sequence code and the node sequence code have TD(M) digits. TC(h, i) denotes the class node code of the i-th node A at layer h; f(h, i) denotes the node sequence code of the i-th node A at layer h; REPT(0, n) produces n zeros. If anc(h) denotes the parent-class node sequence code at layer h and f(h-1, m) denotes the node sequence code of the parent class node B of node A, then
TC(h, i) = Flag & IPF & REPT(0, TD(M) - TD(f(h-1, m))) & f(h-1, m) & REPT(0, TD(M) - TD(f(h, i))) & f(h, i)
where & denotes concatenation. When IPF > 1, the parent-class node sequence code is the combination of the node sequence codes of all direct parent classes;
The attribute code, TreePropertycode (TP for short), is defined. TP consists of Flag, the class code, the parent-attribute node sequence code, and the node sequence code. The total number of nodes is M, and both the parent-attribute node sequence code and the node sequence code have TD(M) digits. TP(h, i) denotes the attribute node code of the i-th node C at layer h; the class to which C belongs is R, whose class node code is TC(p, r); f(h, i) denotes the node sequence code of the i-th node C at layer h; REPT(0, n) produces n zeros. If anc(h) denotes the attribute node sequence code at layer h and f(h-1, m) denotes the node sequence code of the parent attribute node D of node C, then
TP(h, i) = Flag & TC(p, r) & REPT(0, TD(M) - TD(f(h-1, m))) & f(h-1, m) & REPT(0, TD(M) - TD(f(h, i))) & f(h, i);
The relation tree is a multiway tree. It is traversed breadth-first in combination with the definition of the class code, obtaining the relation tree of the attribute relations and generating the class codes.
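The breadth-first numbering described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented procedure: the tiny class tree, the function names, the reading of TD(M) as the number of decimal digits of M, Flag = 0 for classes, numbering that starts at 1, and a zero-filled parent field for the root are all choices made for the example. It handles single-parent trees only.

```python
from collections import deque

def td(n):
    """Assumed reading of TD: the number of decimal digits of n."""
    return len(str(n))

def pad(value, width):
    """REPT(0, width - TD(value)) & value, i.e. left-pad with zeros."""
    return "0" * (width - td(value)) + str(value)

def assign_class_codes(children, root):
    """Number the nodes breadth-first and build a class code for each node."""
    total = 1 + sum(len(kids) for kids in children.values())  # M for a tree
    width = td(total)
    order = {root: 1}   # node -> node sequence code (assumed to start at 1)
    parent = {}         # node -> its single direct parent
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for kid in children.get(node, []):
            order[kid] = len(order) + 1
            parent[kid] = node
            queue.append(kid)
    codes = {}
    for node, seq in order.items():
        ipf = 1 if node in parent else 0                    # root has no parent
        parent_seq = order[parent[node]] if node in parent else 0
        # Flag (assumed "0" for classes) & IPF & parent code & own code
        codes[node] = "0" + str(ipf) + pad(parent_seq, width) + pad(seq, width)
    return codes

# Hypothetical class tree, not taken from the patent figures.
tree = {"Things": ["Person", "Organization"], "Person": ["Student"]}
codes = assign_class_codes(tree, "Things")
```

Because every child code embeds its parent's sequence number, a subclass relationship can be read off the code without consulting the ontology again.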
Further, in step S2, the triples are divided by class. The predicate of a triple is an attribute in the RDF ontology file, and its attribute code was already generated when the attribute relation model was built, so only the subjects and objects of the triples need to be divided by class. Because a triple item (TripleItem) may not be unique in the RDF data, the triple items are filtered at the same time as they are divided by class;
The triple item classification and filtering algorithm is as follows: the input is the RDF triple-format file; the output is the class files of the triple items divided by class and the prefix-code relation file;
Different triple items may share the same URI prefix. To give the codes semantic similarity, so that similar URIs are encoded as similar numbers, the common prefixes are extracted from the RDF data file;
This step requires overriding the MultipleOutputFormat of MapReduce so that the output is written to one file per class.
Preferably, a triple item TripleItem is the subject, predicate, or object of a triple, defined as:
[formula missing from the source text]
where n denotes the total number of triples;
if [condition missing from the source text], then v ∈ TripleItem.
Further, in step S3, the result files of the triple item classification and filtering algorithm are taken as the input files for triple item encoding. In the Map stage the class files of the triple items are processed; in the Reduce stage the triple items are encoded and, at the same time, the dictionary mapping file of triple items and their codes is generated and stored on the HDFS of the cluster. The code format of each triple item is: class code + prefix code + mantissa code;
The triple item encoding algorithm is as follows: the input is the class files of the triple items divided by class and the prefix-code relation file; the output is the dictionary mapping file.
Further, in step S4, each triple in the input RDF triple-format file is encoded according to the dictionary mapping table generated by the triple item encoding algorithm. The triple items are joined with the dictionary mapping table to produce the codes of the triples;
The triple encoding algorithm is as follows: the input is the RDF triple-format file and the dictionary mapping file; the output is the encoded RDF file.
Further, in step S5, since the SCOM algorithm has built the dictionary table of triple items and their codes, the SCOM inversion algorithm uses the dictionary table to invert the encoded RDF file back into the original RDF file.
Compared with the prior art, the present invention has the following advantages: the proposed distributed parallel semantic coding scheme for RDF data can efficiently complete the distributed parallel encoding of large-scale RDF data and can invert the encoded data. Compared with existing encoding schemes, it has significant advantages in both the compression encoding stage and the inversion stage, and it can facilitate RDFS rule reasoning.
Brief description of the drawings
Fig. 1 is a schematic diagram of the method framework of the present invention.
Fig. 2 is a schematic diagram of the class relation model for part of LUBM in the present invention.
Fig. 3 is a schematic diagram of the attribute relation model for part of LUBM in the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and an embodiment.
This embodiment provides a distributed parallel semantic coding method for RDF data (Semantic Coding with Ontology on MapReduce, SCOM for short). Based on the characteristics of MapReduce, the class relation model and the attribute relation model are built from the ontology, and the RDF data are classified and encoded according to these models, which achieves distributed parallel compression encoding of RDF data. The SCOM scheme is divided into a compression encoding stage and an inversion stage. As shown in Fig. 1, it comprises the following steps:
Step S1: read the RDF ontology file, build the class relation model and the attribute relation model, and generate the mapping file of classes and their codes and the mapping file of attributes and their codes;
Step S2: read the RDF data file, split the triples into triple items, divide the triple items by class, delete duplicate triple items, and generate prefix codes at the same time; the triple items are filtered to guarantee the consistency of RDF triple encoding, so that the same triple item is never assigned different codes;
Step S3: encode the triple items and generate the dictionary table;
Step S4: encode the triples and generate the encoded triple file;
Step S5: take the result file of step S4 as the input of this step and, according to the dictionary table of step S3, invert it to regenerate the original RDF data file.
In this embodiment, in step S1, class codes must be generated so that the codes of the RDF data carry semantic information. Because the predicates in RDF data are all defined in the ontology file, and are far fewer than the subjects or objects in the RDF data, this step also completes the encoding of attributes after the classes have been encoded.
In this embodiment, the following definitions are given:
A class/attribute type flag, Flag, is defined to distinguish classes from attributes. Assuming the current data item is v, then
[formula missing from the source text]
The tree node code digit count, TreenodeDigit (TD for short), is defined. If the total number of nodes is M, then
[formula missing from the source text]
The class code, TreeClasscode (TC for short), is defined. TC consists of Flag, the direct-parent count marker (IPF), the parent-class node sequence code, and the node sequence code. The total number of nodes is M, and both the parent-class node sequence code and the node sequence code have TD(M) digits. TC(h, i) denotes the class node code of the i-th node A at layer h; f(h, i) denotes the node sequence code of the i-th node A at layer h; REPT(0, n) produces n zeros. If anc(h) denotes the parent-class node sequence code at layer h and f(h-1, m) denotes the node sequence code of the parent class node B of node A, then
TC(h, i) = Flag & IPF & REPT(0, TD(M) - TD(f(h-1, m))) & f(h-1, m) & REPT(0, TD(M) - TD(f(h, i))) & f(h, i)
When IPF > 1, the parent-class node sequence code is the combination of the node sequence codes of all direct parent classes;
The attribute code, TreePropertycode (TP for short), is defined. TP consists of Flag, the class code, the parent-attribute node sequence code, and the node sequence code. The total number of nodes is M, and both the parent-attribute node sequence code and the node sequence code have TD(M) digits. TP(h, i) denotes the attribute node code of the i-th node C at layer h; the class to which C belongs is R, whose class node code is TC(p, r); f(h, i) denotes the node sequence code of the i-th node C at layer h; REPT(0, n) produces n zeros. If anc(h) denotes the attribute node sequence code at layer h and f(h-1, m) denotes the node sequence code of the parent attribute node D of node C, then
TP(h, i) = Flag & TC(p, r) & REPT(0, TD(M) - TD(f(h-1, m))) & f(h-1, m) & REPT(0, TD(M) - TD(f(h, i))) & f(h, i);
In this embodiment, in step S1, the ontology file in RDF format is first parsed with Jena, a relation tree is generated from the class relations (subclass and parent class), and the class relation model is built. The relation tree is a multiway tree. It is traversed breadth-first in combination with the definition of the class code, obtaining the relation tree of the attribute relations and generating the class codes.
Take a fragment of the classes in the LUBM data set as an example, and assume that the digit count determined by the definition of the tree node code digit count is 2. The resulting class relation model is shown in Fig. 2. In each code, the first digit is the class flag, the second digit is the direct-parent count marker, the third and fourth digits together form the direct parent-class node sequence code of the current class, and the last two digits form the node sequence code of the current class. The Things class is the parent of all classes, and its direct-parent count marker is 0 (i.e. IPF = 0).
A class may have more than one parent. Suppose the direct parents of the Part-timeGraduateStudent class are the GraduateStudent class and the TeachingAssistant class in the class relation model of Fig. 2. Then the parent-class node sequence code of Part-timeGraduateStudent is the combination of the sequence codes of GraduateStudent and TeachingAssistant (i.e. 0911), and the code of Part-timeGraduateStudent is 02091113, where IPF = 2 indicates that Part-timeGraduateStudent has two direct parent classes.
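The worked example above can be reproduced with a few lines of Python. This is a minimal sketch assuming that & is string concatenation, that Flag is "0" for classes (consistent with the leading digit of the example code), and that IPF fits in one digit; the function names are hypothetical. The sequence codes 09, 11 and 13 are the ones that appear in the example.

```python
def rept_pad(value, width):
    """REPT(0, width - digits(value)) & value: zero-pad to `width` digits."""
    digits = str(value)
    return "0" * (width - len(digits)) + digits

def class_code(parent_seqs, node_seq, width, flag="0"):
    """Concatenate Flag, IPF, all direct-parent sequence codes, and the node's own code."""
    ipf = len(parent_seqs)  # number of direct parents; assumed to be at most 9
    parents = "".join(rept_pad(p, width) for p in parent_seqs)
    return flag + str(ipf) + parents + rept_pad(node_seq, width)

# Part-timeGraduateStudent: parents GraduateStudent (09) and TeachingAssistant (11), own code 13.
part_time = class_code([9, 11], 13, width=2)
print(part_time)  # 02091113
```

Note that the code length grows with the number of direct parents, so a decoder has to read IPF first to know how many parent fields follow.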
The relation tree of the attributes is built in the same way as the class relation tree, and the attributes are then encoded. Unlike class codes, attribute codes also include the class code, so that an attribute code carries semantic information.
Take a fragment of the attributes in the LUBM data set as an example, and assume that the digit count determined by the definition of the tree node code digit count is 2. Combined with the class codes of Fig. 2, the resulting attribute relation model is shown in Fig. 3. In each code, the first digit is the attribute flag, the second to seventh digits together are the class code of the current attribute (the domain class of the attribute), the eighth and ninth digits together are the direct parent-attribute node sequence code of the current attribute, and the last two digits are the node sequence code of the current attribute. This step also generates the attribute domain and range files.
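The attribute code layout described above (one flag digit, a six-digit class code, then two two-digit fields) can be illustrated in Python. This is a sketch under assumptions: Flag = "1" for attributes is a guess, since the flag value is not given in this text, and the class code "010102" and the sequence numbers are hypothetical rather than read from Fig. 3.

```python
def zero_pad(value, width):
    """REPT(0, width - digits(value)) & value."""
    return str(value).zfill(width)

def property_code(domain_class_code, parent_attr_seq, node_seq, width, flag="1"):
    """TP = Flag & TC(p, r) & padded parent-attribute code & padded node code."""
    return (flag
            + domain_class_code
            + zero_pad(parent_attr_seq, width)
            + zero_pad(node_seq, width))

# Hypothetical attribute whose domain class has code 010102,
# whose parent attribute has sequence code 1, and whose own sequence code is 4.
code = property_code("010102", 1, 4, width=2)
print(code)  # 10101020104
```

With a six-digit class code and a width of 2, the result has 11 digits, matching the digit positions listed in the text.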
This coding scheme adds semantic information to the codes of the RDF data described below. When a TripleItem is a subject or an object, the class it belongs to can be determined from its code; when a TripleItem is a predicate, the parent attribute of the predicate or the class it belongs to can be obtained from its code.
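To illustrate how class information might be read back from a code, the following Python sketch splits a class code into its fields and checks a direct subclass relationship. It assumes the field layout of the earlier example (one flag digit, one IPF digit, then IPF parent fields and one node field of equal width) and IPF of at least 1; the function names and the parent code "010509" are hypothetical.

```python
def parse_class_code(code, width):
    """Split a class code into flag, IPF, parent sequence codes, and node sequence code."""
    ipf = int(code[1])
    parents = [code[2 + k * width: 2 + (k + 1) * width] for k in range(ipf)]
    node = code[2 + ipf * width:]
    return {"flag": code[0], "ipf": ipf, "parents": parents, "node": node}

def is_direct_subclass(child_code, parent_code, width):
    """True if the parent's node sequence code appears among the child's parent fields."""
    child = parse_class_code(child_code, width)
    parent = parse_class_code(parent_code, width)
    return parent["node"] in child["parents"]

fields = parse_class_code("02091113", 2)
print(fields["parents"])  # ['09', '11']
```

A check like `is_direct_subclass` needs only the two codes, which is the kind of lookup-free test that could help rule-based reasoning over encoded data.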
In this embodiment, in step S2, the triples are divided by class. The predicate of a triple is an attribute in the RDF ontology file, and its attribute code was already generated when the attribute relation model was built, so only the subjects and objects of the triples need to be divided by class. Because a triple item (TripleItem) may not be unique in the RDF data, duplicate triple items are deleted to guarantee uniqueness, so that identical TripleItems are never assigned different codes. In addition, different triple items may share the same URI prefix. To give the codes semantic similarity, so that similar URIs are encoded as similar numbers, the common prefixes (namespaces) are extracted from the RDF data file. This step also requires overriding the MultipleOutputFormat of MapReduce so that the output is written to one file per class.
Preferably, a triple item TripleItem is the subject, predicate, or object of a triple, defined as:
[formula missing from the source text]
where n denotes the total number of triples;
if [condition missing from the source text], then v ∈ TripleItem.
The triple item classification and filtering algorithm is as follows:
Input: RDF triple-format file
Output: class files of TripleItem divided by class; prefix-code relation file
The pseudocode is shown in Table 1 below.
Table 1: Triple item (TripleItem) classification and filtering algorithm
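The pseudocode table itself is not reproduced in this text. As a rough single-machine illustration of the classification and filtering idea, the following Python sketch groups subjects and objects by class, drops duplicates, and assigns prefix codes. Several things here are assumptions rather than the patented algorithm: classes are looked up from rdf:type triples, untyped items go to an "Unclassified" group, anything not starting with "http" is treated as a literal, and prefix codes are two-digit counters with "00" reserved for items without a prefix. In the patent this work is done by a MapReduce job that writes one output file per class.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def split_prefix(term):
    """Split a URI into (namespace prefix, local name); literals get an empty prefix."""
    if not term.startswith("http"):
        return "", term
    cut = max(term.rfind("#"), term.rfind("/"))
    return term[:cut + 1], term[cut + 1:]

def classify_and_filter(triples):
    """Group triple items by class, removing duplicates, and assign prefix codes."""
    # Learn each subject's class from rdf:type triples (assumed classification rule).
    class_of = {s: o for s, p, o in triples if p == RDF_TYPE}
    items_by_class = defaultdict(set)   # sets discard repeated triple items
    prefix_codes = {"": "00"}           # "00" means "no prefix"
    for s, p, o in triples:
        # Predicates and class names are already coded from the ontology, so skip them.
        terms = [s] if p == RDF_TYPE else [s, o]
        for term in terms:
            items_by_class[class_of.get(term, "Unclassified")].add(term)
            prefix, _ = split_prefix(term)
            if prefix not in prefix_codes:
                prefix_codes[prefix] = f"{len(prefix_codes):02d}"
    return items_by_class, prefix_codes

# Hypothetical input in the style of LUBM; the second triple appears twice.
UB = "http://swat.cse.lehigh.edu/onto/univ-bench.owl#"
S = "http://www.Department0.University0.edu/GraduateStudent1"
triples = [(S, RDF_TYPE, UB + "GraduateStudent"),
           (S, UB + "name", "GraduateStudent1"),
           (S, UB + "name", "GraduateStudent1")]
items, prefixes = classify_and_filter(triples)
```

Using a set per class is what removes duplicates here; in a MapReduce setting the same effect would typically come from grouping identical keys in the shuffle phase.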
In this embodiment, in step S3, the result files of the triple item classification and filtering algorithm are taken as the input files for triple item encoding. In the Map stage the class files of the triple items are processed; in the Reduce stage the triple items are encoded and, at the same time, the dictionary mapping file of triple items and their codes is generated and stored on the HDFS of the cluster. The code format of each triple item is: class code + prefix code + mantissa code;
The triple item encoding algorithm is as follows:
Input: class files of the triple items divided by class; prefix-code relation file;
Output: dictionary mapping file.
The pseudocode is shown in Table 2 below.
Table 2: Triple item (TripleItem) encoding algorithm
The TripleItem encoding algorithm is illustrated below with a fragment of triples from the LUBM data set (Table 3).
Table 3: Input RDF triple data fragment
For brevity in what follows, a mapping table between TripleItem numbers and the original data is generated from Table 3, as listed in Table 4. In Table 3, the subject of the 3rd triple and the object of the 5th triple are identical, so they correspond to a single TripleItem.
Table 4: Mapping between TripleItem numbers and original data
Taking the triple fragment listed in Table 3 as the input of the triple item classification and filtering algorithm yields the class files and the prefix-code relation file of TripleItem. Assume the prefix-code relation file obtained is as listed in Table 5.
Table 5: Prefix code information for the listed triple fragment
Prefix | Prefix code
(xmlns:)http://swat.cse.lehigh.edu/onto/univ-bench.owl# | 01
(rdf:)http://www.w3.org/1999/02/22-rdf-syntax-ns# | 02
http://www.Department0.University0.edu/ | 03
http://www.Department2.University0.edu/ | 04
No prefix (e.g. Literal-type data) | 00
Taking the result files of the triple item classification and filtering algorithm as the input of the TripleItem encoding algorithm, together with the class (attribute) relation model, yields the TripleItem codes. Let the threshold α denote the number of mantissa digits in a TripleItem code. With α = 3, the TripleItem code information obtained from the triple fragment of Table 3 together with Tables 4 and 5 is as listed in Table 6.
Table 6: TripleItem code information
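The contents of Table 6 are not reproduced in this text. The code format itself (class code + prefix code + mantissa code, with α mantissa digits) can be sketched in Python as follows. The way mantissas are assigned here, as a counter per (class code, prefix code) pair, is an assumption, as are the function name and the class code "010509"; the prefix code "03" follows Table 5.

```python
from collections import defaultdict

def encode_items(classified_items, alpha=3):
    """Build a dictionary table: item -> class code + prefix code + mantissa code.

    classified_items is an iterable of (item, class_code, prefix_code) tuples.
    """
    counters = defaultdict(int)
    dictionary = {}
    for item, class_code, prefix_code in classified_items:
        if item in dictionary:          # an item never receives a second code
            continue
        counters[(class_code, prefix_code)] += 1
        mantissa = str(counters[(class_code, prefix_code)]).zfill(alpha)
        dictionary[item] = class_code + prefix_code + mantissa
    return dictionary

BASE = "http://www.Department0.University0.edu/"
rows = [(BASE + "GraduateStudent1", "010509", "03"),
        (BASE + "GraduateStudent2", "010509", "03"),
        (BASE + "GraduateStudent1", "010509", "03")]   # repeated item
table = encode_items(rows)
print(table[BASE + "GraduateStudent2"])  # 01050903002
```

Items of the same class and namespace end up with codes that differ only in the last α digits, which is the "similar URIs, similar numbers" property the text aims for.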
In this embodiment, in step S4, each triple in the input RDF triple-format file is encoded according to the dictionary mapping table generated by the triple item encoding algorithm. The triple items are joined with the dictionary mapping table to produce the codes of the triples;
The triple encoding algorithm is as follows:
Input: RDF triple-format file and dictionary mapping file;
Output: the encoded RDF file.
The pseudocode is shown in Table 7 below.
Table 7: Triple encoding algorithm
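The join between triples and the dictionary table can be pictured with a small Python sketch. It is a single-machine stand-in for what the patent runs as a MapReduce join, and the function name and all codes in the sample dictionary are hypothetical.

```python
def encode_triples(triples, dictionary):
    """Replace every subject, predicate, and object by its code from the dictionary table."""
    return [tuple(dictionary[term] for term in triple) for triple in triples]

# Hypothetical dictionary entries and one input triple.
dictionary = {
    "http://www.Department0.University0.edu/GraduateStudent1": "01050903001",
    "http://swat.cse.lehigh.edu/onto/univ-bench.owl#name": "10101020104",
    "GraduateStudent1": "00000000001",
}
triples = [("http://www.Department0.University0.edu/GraduateStudent1",
            "http://swat.cse.lehigh.edu/onto/univ-bench.owl#name",
            "GraduateStudent1")]
encoded = encode_triples(triples, dictionary)
```

A term missing from the dictionary raises `KeyError` here, which makes an incomplete dictionary table visible immediately instead of silently producing a partial encoding.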
In this embodiment, in step S5, SCOM is a lossless compression algorithm: the inversion algorithm in SCOM can fully restore the encoded RDF data file to the original data. Because the SCOM algorithm has built the dictionary table of TripleItems and their codes, the encoded RDF file can easily be inverted back into the original RDF file with the help of the dictionary table. To make the SCOM inversion algorithm precise, it is described in pseudocode form in Table 8 below:
Table 8: SCOM inversion algorithm
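The pseudocode of Table 8 is not reproduced in this text. The idea of inversion through the dictionary table can be sketched in Python as below; this is an illustration on one machine with hypothetical names and codes, not the SCOM inversion algorithm itself. The uniqueness check reflects the requirement that the same item never has two codes and, conversely, that no code is shared by two items.

```python
def invert_triples(encoded_triples, dictionary):
    """Map codes back to the original terms using the reversed dictionary table."""
    reverse = {code: term for term, code in dictionary.items()}
    if len(reverse) != len(dictionary):
        raise ValueError("codes are not unique, so inversion would be lossy")
    return [tuple(reverse[code] for code in triple) for triple in encoded_triples]

# Hypothetical dictionary table and encoded triple.
dictionary = {"ex:Student1": "01050903001",
              "ex:name": "10101020104",
              "Alice": "00000000001"}
original = [("ex:Student1", "ex:name", "Alice")]
encoded = [tuple(dictionary[t] for t in triple) for triple in original]
restored = invert_triples(encoded, dictionary)
```

The round trip returns exactly the input triples, which is what "lossless" means in this context.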
The distributed parallel semantic coding scheme for RDF data proposed in this embodiment can efficiently complete the distributed parallel encoding of large-scale RDF data and can invert the encoded data. Experiments show that, compared with existing encoding schemes, this scheme has significant advantages in both the compression encoding stage and the inversion stage, and that it can facilitate RDFS rule reasoning.
The foregoing is only a preferred embodiment of the present invention. All equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope covered by the present invention.
Claims (7)
1. An RDF data distributed parallel semantic coding method, characterized in that it comprises the following steps:
Step S1: read the RDF ontology file, build the class relation model and the attribute relation model, and generate the mapping file of classes and their codes and the mapping file of attributes and their codes;
Step S2: read the RDF data file, split the triples into triple items, divide the triple items by class, delete duplicate triple items, and generate prefix codes at the same time; the triple items are filtered to guarantee the consistency of RDF triple encoding, so that the same triple item is never assigned different codes;
Step S3: encode the triple items and generate the dictionary table;
Step S4: encode the triples and generate the encoded triple file;
Step S5: take the result file of step S4 as the input of this step and, according to the dictionary table of step S3, invert it to regenerate the original RDF data file.
2. The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S1, the ontology file in RDF format is first parsed with Jena, a relation tree is generated from the class relations, and the class relation model is built;
A class/attribute type flag, Flag, is defined to distinguish classes from attributes. Assuming the current data item is v, then
[formula missing from the source text]
The tree node code digit count, TreenodeDigit (TD for short), is defined. If the total number of nodes is M, then
[formula missing from the source text]
The class code, TreeClasscode (TC for short), is defined. TC consists of Flag, the direct-parent count marker (IPF), the parent-class node sequence code, and the node sequence code. The total number of nodes is M, and both the parent-class node sequence code and the node sequence code have TD(M) digits. TC(h, i) denotes the class node code of the i-th node A at layer h; f(h, i) denotes the node sequence code of the i-th node A at layer h; REPT(0, n) produces n zeros. If anc(h) denotes the parent-class node sequence code at layer h and f(h-1, m) denotes the node sequence code of the parent class node B of node A, then
TC(h, i) = Flag & IPF & REPT(0, TD(M) - TD(f(h-1, m))) & f(h-1, m) & REPT(0, TD(M) - TD(f(h, i))) & f(h, i)
When IPF > 1, the parent-class node sequence code is the combination of the node sequence codes of all direct parent classes;
The attribute code, TreePropertycode (TP for short), is defined. TP consists of Flag, the class code, the parent-attribute node sequence code, and the node sequence code. The total number of nodes is M, and both the parent-attribute node sequence code and the node sequence code have TD(M) digits. TP(h, i) denotes the attribute node code of the i-th node C at layer h; the class to which C belongs is R, whose class node code is TC(p, r); f(h, i) denotes the node sequence code of the i-th node C at layer h; REPT(0, n) produces n zeros. If anc(h) denotes the attribute node sequence code at layer h and f(h-1, m) denotes the node sequence code of the parent attribute node D of node C, then
TP(h, i) = Flag & TC(p, r) & REPT(0, TD(M) - TD(f(h-1, m))) & f(h-1, m) & REPT(0, TD(M) - TD(f(h, i))) & f(h, i);
The relation tree is a multiway tree. It is traversed breadth-first in combination with the definition of the class code, obtaining the relation tree of the attribute relations and generating the class codes.
3. The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S2, the triples are partitioned by class; the predicate of a triple is a property in the RDF ontology file, and its attribute code has already been generated when the attribute relation model was built, so only the subject and the object of each triple need to be partitioned by class; if the triple items TripleItem in the RDF data are unique, the triple items TripleItem are filtered while they are being partitioned by class;
The TripleItem classification and filtering algorithm is specifically: input: an RDF triple-format file; output: the class files into which the triple items TripleItem are partitioned by class, and a prefix code relation file;
If different triple items TripleItem share the same URI, then, to ensure that the codes are semantically similar so that similar URIs are encoded into similar numbers, the common prefix is extracted from the RDF data file;
This step requires overriding the MultipleOutputFormat of MapReduce so that the output can be written as per-class files.
4. The RDF data distributed parallel semantic coding method according to claim 3, characterized in that: the triple item TripleItem is the subject, predicate or object of a triple, defined as:
where n denotes the total number of triples;
If, then。
5. The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S3, the result files of the TripleItem classification and filtering algorithm are obtained and used as the input files for triple item encoding; in the Map phase, the class files of the triple items TripleItem are processed; in the Reduce phase, the triple items TripleItem are encoded and, at the same time, a dictionary mapping file between each triple item TripleItem and its code is generated, and the dictionary mapping file is stored on the HDFS of the cluster; the code format of each triple item TripleItem is: code of the class it belongs to + prefix code + tail code;
The TripleItem encoding algorithm is specifically: input: the class files into which the triple items TripleItem are partitioned by class, and the prefix code relation file; output: the dictionary mapping file.
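The three-part code format (class code + prefix code + tail, or mantissa, code) can be sketched in Python as below. This is a single-process illustration and not the Map and Reduce implementation of the claim; the fixed tail width, the per-class counter used as the tail code, and the names `encode_items`, `class_codes` and `prefix_codes` are assumptions made for the example.

```python
def encode_items(class_files, class_codes, prefix_codes, tail_width=4):
    """Build a dictionary mapping each triple item to its code string.

    Assumed layout: class code + prefix code + zero-padded per-class counter.
    """
    dictionary = {}
    for cls in sorted(class_files):
        # Sorting makes the numbering deterministic for the example.
        for tail, item in enumerate(sorted(class_files[cls]), start=1):
            cut = max(item.rfind("#"), item.rfind("/"))
            prefix = item[:cut + 1]
            dictionary[item] = (
                class_codes[cls] + prefix_codes[prefix] + str(tail).zfill(tail_width)
            )
    return dictionary


# Hypothetical inputs, shaped like the outputs of the classification step.
class_files = {
    "Student": {"http://example.org/univ#Alice", "http://example.org/univ#Bob"},
}
class_codes = {"Student": "101"}
prefix_codes = {"http://example.org/univ#": "01"}
dictionary = encode_items(class_files, class_codes, prefix_codes)
```

Items of the same class with the same URI prefix differ only in the tail segment, which is one way to obtain similar codes for similar URIs.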
6. The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S4, according to the dictionary mapping table generated by the TripleItem encoding algorithm, each triple in the input RDF triple-format file is encoded; the triple items TripleItem are joined with the dictionary mapping table, thereby generating the codes of the triples;
The triple encoding algorithm is specifically: input: the RDF triple-format file and the dictionary mapping file; output: the encoded RDF file.
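Joining triples with the dictionary mapping table amounts to a lookup per triple item. The Python sketch below shows that join in memory; the function name `encode_triples`, the sample dictionary and its code strings are hypothetical, and an item missing from the dictionary simply raises `KeyError` here.

```python
def encode_triples(triples, dictionary):
    """Replace subject, predicate and object of each triple by its code."""
    encoded = []
    for subject, predicate, obj in triples:
        encoded.append((dictionary[subject], dictionary[predicate], dictionary[obj]))
    return encoded


# Hypothetical dictionary mapping table and input triples.
dictionary = {
    "http://example.org/univ#Alice": "101010001",
    "http://example.org/univ#Bob": "101010002",
    "http://example.org/univ#knows": "201010001",
}
triples = [
    (
        "http://example.org/univ#Alice",
        "http://example.org/univ#knows",
        "http://example.org/univ#Bob",
    ),
]
encoded = encode_triples(triples, dictionary)
```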
7. The RDF data distributed parallel semantic coding method according to claim 1, characterized in that: in step S5, the dictionary table of the triple items TripleItem and their codes is built according to the SCOM algorithm; combined with the dictionary table, the SCOM inverse algorithm is used to convert the encoded RDF file back into the original RDF file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610242787.1A CN105930419B (en) | 2016-04-19 | 2016-04-19 | RDF data distributed parallel semantic coding method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105930419A (en) | 2016-09-07 |
CN105930419B (en) | 2019-08-09 |
Family
ID=56838391
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610242787.1A Active CN105930419B (en) | 2016-04-19 | 2016-04-19 | RDF data distributed parallel semantic coding method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105930419B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080243770A1 (en) * | 2007-03-29 | 2008-10-02 | Franz Inc. | Method for creating a scalable graph database |
CN104462609A (en) * | 2015-01-06 | 2015-03-25 | 福州大学 | RDF data storage and query method combined with star figure coding |
CN104462610A (en) * | 2015-01-06 | 2015-03-25 | 福州大学 | Distributed type RDF storage and query optimization method combined with body |
CN104615703A (en) * | 2015-01-30 | 2015-05-13 | 福州大学 | RDF data distributed parallel inference method combined with Rete algorithm |
Non-Patent Citations (1)
Title |
---|
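Yuan Pingpeng (袁平鹏): "Highly scalable RDF data storage system" (高可扩展的RDF数据存储系统), Journal of Computer Research and Development (《计算机研究与发展》) * |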
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111144123A (en) * | 2018-10-16 | 2020-05-12 | 工业互联网创新中心(上海)有限公司 | Industrial Internet identification analysis data dictionary construction method |
CN111144123B (en) * | 2018-10-16 | 2024-02-02 | 工业互联网创新中心(上海)有限公司 | Industrial Internet identification analysis data dictionary construction method |
CN110110329A (en) * | 2019-04-30 | 2019-08-09 | 湖南星汉数智科技有限公司 | A kind of entity behavior derivation method, apparatus, computer installation and computer readable storage medium |
CN110110329B (en) * | 2019-04-30 | 2022-05-17 | 湖南星汉数智科技有限公司 | Entity behavior extraction method and device, computer device and computer readable storage medium |
CN110457491A (en) * | 2019-08-19 | 2019-11-15 | 中国农业大学 | A kind of knowledge mapping reconstructing method and device based on free state node |
CN110516079A (en) * | 2019-08-29 | 2019-11-29 | 北京大学 | A kind of RDF object model class hierarchy tree method for building up and system |
CN112182139A (en) * | 2019-08-29 | 2021-01-05 | 盈盛智创科技(广州)有限公司 | Method, device and equipment for tracing resource description framework triple |
CN110516079B (en) * | 2019-08-29 | 2022-04-29 | 北京大学 | RDF object model class hierarchical tree establishing method and system |
CN113676290A (en) * | 2021-09-27 | 2021-11-19 | 深圳市金斧子网络科技有限公司 | Data transmission method based on fund system and related equipment |
CN113676290B (en) * | 2021-09-27 | 2024-05-03 | 深圳市金斧子网络科技有限公司 | Data transmission method and related equipment based on foundation system |
Also Published As
Publication number | Publication date |
---|---|
CN105930419B (en) | 2019-08-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||