CN102902809B - A novel semantic association mining method - Google Patents

A novel semantic association mining method

Info

Publication number
CN102902809B
Authority
CN
China
Prior art keywords
block
semantic
semantic association
mining
association
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210399288.5A
Other languages
Chinese (zh)
Other versions
CN102902809A (en)
Inventor
张祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN201210399288.5A
Publication of CN102902809A
Application granted
Publication of CN102902809B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a novel semantic association mining method comprising the following steps: the input semantic data is parsed and clustered to form semantic data clusters, the clusters are cleaned, and their Resource Description Framework (RDF) graphs are converted into typed object graphs; the typed object graphs are partitioned or merged to obtain multiple blocks; the order of magnitude of the potential link patterns and semantic associations in each block is predicted, and the prediction is fed back to a secondary partitioning unit, which further divides blocks that are large or structurally complex; the local link patterns and semantic associations of the blocks are mined, and the semantic associations are aggregated, counted, and output to the user. The method is efficient and yields accurate mining results. It can describe complex associations among multiple objects, uses link patterns to measure how typical a semantic association is, and mines with graph mining techniques, which improves the feasibility and efficiency of semantic association mining on large-scale semantic data.

Description

A novel semantic association mining method
Technical field
The present invention relates to the field of information retrieval, and in particular to a novel semantic association mining method.
Background art
With the rapid development of the Semantic Web over the past decade, online semantic data has become increasingly abundant, and huge semantic data sets now form a vast and complex web of data. Semantic search on the Semantic Web focuses mainly on semantic objects and the semantic associations between them. The goal of semantic association retrieval is to help users discover and understand the direct or indirect connections between objects that are hidden in massive semantic data.
In Semantic Web research, a semantic association is generally defined as a direct or indirect relation between objects in a Resource Description Framework (RDF) graph. Semantic associations are usually modeled as directed paths in the graph-theoretic sense: given two objects, semantic association discovery is the process of quickly finding one or more shortest, or relatively short, paths between them in the RDF graph.
After ten years of development, semantic association mining has made some progress, but the related techniques still have shortcomings. The traditional semantic association model based on semantic paths is limited in two ways: (1) a semantic path can only describe the association between a pair of objects, and each association is independent of the others, so the complex association among multiple objects cannot be described as a unified whole, even though such multi-object associations are abundant in real semantic data; (2) the semantic path model does not consider how typical a semantic association is, that is, whether the semantic path between two associated objects also occurs in other semantic associations. In many cases an ordinary, unimportant semantic path merely shows that two objects are connected in the RDF graph and does not indicate a meaningful semantic association between them.
Existing semantic association mining methods struggle with large-scale semantic data. As the Semantic Web has developed, many large semantic data sets of more than one million triples have appeared. At that scale, neither path discovery techniques nor traditional graph mining techniques can produce meaningful results within a limited time. Traditional graph mining algorithms in particular mostly have exponential complexity and assume that the graph fits in memory, so they have not yet been applied directly to mining large graphs. In data mining research, the main way to deal with this problem is to partition the large graph. PartMiner is the most popular graph partitioning algorithm for graph mining, but in theory it can produce incorrect mining results, so the correctness of the results must be checked again after global mining. Consequently, no complete partitioning method has so far been able to divide and merge large-scale semantic data quickly and accurately.
Summary of the invention
The technical problem mainly solved by the present invention is to provide a novel semantic association mining method that is efficient and yields accurate mining results.
To solve the above technical problem, the technical solution adopted by the present invention is a novel semantic association mining method comprising the following steps:
(1) parse and cluster the input semantic data to form semantic data clusters, clean the data in the semantic data clusters, and convert the Resource Description Framework (RDF) graphs of the semantic data clusters into typed object graphs;
(2) partition or merge the typed object graphs using a basic labeling rule or an optimized labeling rule to obtain multiple blocks, where the basic labeling rule uses a nearly random labeling method and the optimized labeling rule uses heuristic rules;
(3) predict the order of magnitude of the potential link patterns and semantic associations in each block, and feed the prediction back to a secondary partitioning unit, which further divides blocks that are large or structurally complex;
(4) mine the local link patterns and semantic associations of the blocks, merge the local link patterns and semantic associations, aggregate and count the semantic associations, and output them to the user.
In a preferred embodiment of the present invention, the conversion of the Resource Description Framework (RDF) graph into the typed object graph in step (1) filters the RDF triples in the RDF graph, extends the RDF triples into link quintuples, and uses the link quintuples to convert the RDF graph into the typed object graph.
In a preferred embodiment of the present invention, step (2) partitions the typed object graph using an edge labeling method, which assigns a label to every edge in the typed object graph; the label contains the types of the edge's subject and object.
In a preferred embodiment of the present invention, the mining method in step (4) is a staged mining method: some or all of the link patterns are mined from the typed object graph with a frequent pattern mining algorithm, a subset of the link patterns is selected, and the subgraphs of the RDF graph that instantiate the selected patterns are taken as the semantic associations.
In a preferred embodiment of the present invention, the mining method in step (4) is a merged mining method: when the link patterns are mined with a frequent pattern mining algorithm, the semantic associations are mined at the same time, while the support of each link pattern is computed.
The beneficial effects of the invention are as follows. Compared with existing methods, the novel semantic association mining method of the present invention is efficient and yields accurate mining results. It proposes the typed object graph model to describe complex associations among multiple objects, uses link patterns to measure how typical a semantic association is, and mines semantic associations with graph mining techniques, which improves the feasibility and efficiency of semantic association mining on large-scale semantic data.
Brief description of the drawings
Fig. 1 is a flowchart of a preferred embodiment of the novel semantic association mining method of the present invention.
Detailed description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawing, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the scope of protection of the invention can be defined more clearly.
Referring to Fig. 1, the invention provides a novel semantic association mining method comprising the following steps:
(1) the input data set for semantic association discovery is Resource Description Framework (RDF) data in the general sense, consisting of triples that describe objects and the relations between them. The input semantic data is first parsed, and the single-edge patterns shared between disconnected RDF graphs are then analyzed so that the semantic data can be clustered into semantic data clusters;
(2) basic data cleaning is performed on the generated semantic data clusters, and the RDF graphs are converted into typed object graphs. Because a generated typed object graph may exceed the memory limit, it is partitioned into blocks to improve the efficiency and scalability of mining;
(3) an edge labeling method is used to partition and merge the typed object graphs. There are two rules for labeling edges: the basic labeling rule uses a nearly random labeling method, whereas the optimized labeling rule applies a series of heuristic rules to raise the success rate of the edge-labeling partitioning method on typed object graphs and to reduce the proportion of pseudo-failures;
(4) the typed object graph is divided into multiple blocks using the basic labeling rule or the optimized labeling rule, such that the blocks have certain mathematical properties and every block can be loaded into memory. To make the blocks more uniform, a secondary partitioning unit is introduced: a quick mining pass predicts the order of magnitude of the potential link patterns and semantic associations in each block, and the result is fed back to the secondary partitioning unit, which hierarchically subdivides blocks that are still large or structurally complex after partitioning. Such blocks are thus found and divided as early as possible, which improves the overall uniformity of the partition;
(5) a modified gSpan algorithm is used to mine all local link patterns and semantic associations in every block. Because partitioning may cause some link patterns and semantic associations to be missed, the local link patterns and semantic associations are merged by a dedicated algorithm to ensure that the mining result is complete. The semantic associations, which are a by-product of link pattern mining, are aggregated, counted, and presented to the user as output.
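The feedback loop of step (4) can be illustrated with a minimal Python sketch. The patent does not specify how the order of magnitude is estimated or how a block is subdivided, so the `estimate` callback, the `limit` threshold, and the split-in-half rule below are all assumptions made only for illustration.

```python
def refine_blocks(blocks, estimate, limit):
    """Subdivide blocks until each one is predicted to be cheap enough to mine.

    blocks:   list of blocks, each a list of edges.
    estimate: hypothetical callback predicting the mining workload of a block.
    limit:    hypothetical threshold above which a block is divided again.
    """
    work, done = list(blocks), []
    while work:
        block = work.pop()
        if len(block) <= 1 or estimate(block) <= limit:
            done.append(block)  # small or simple enough: keep as is
        else:
            mid = len(block) // 2  # placeholder split, not the patent's rule
            work += [block[:mid], block[mid:]]
    return done


# Example: use the edge count as a stand-in for the predicted workload.
refined = refine_blocks([[1, 2, 3, 4], [5]], estimate=len, limit=2)
```

A real secondary partitioning unit would split along edge labels rather than by position, so that the properties guaranteed by the labeling rule are preserved.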
In the novel semantic association mining method of the present invention, graph-based modeling of semantic associations proceeds as follows: (1) the Resource Description Framework (RDF) triples in the RDF graph are filtered so that only triples describing an association between named objects are considered; (2) each RDF triple is extended into a link quintuple, which contains not only the Uniform Resource Identifiers (URIs) of the subject, predicate, and object but also the type information of the subject and the object; (3) the link quintuples are used to convert the RDF graph into a typed object graph. In an RDF graph the type of a resource is implicit and must be obtained through RDF semantic reasoning, whereas in a typed object graph the type of each object is explicit. A link pattern is a frequent subgraph of the typed object graph; as the pattern behind semantic associations, it ensures that every mined semantic association is typical to some degree; (4) the graph model of semantic associations is defined on the basis of link patterns: in a given RDF graph, a link pattern may be instantiated by multiple subgraphs of that RDF graph, and each instantiation constitutes a semantic association among a group of objects.
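The conversion described above can be sketched in a few lines of Python. This is a simplified illustration, not the patented implementation: it assumes every object has exactly one explicit `rdf:type` statement (the patent obtains types through RDF semantic reasoning), and the names `LinkQuintuple` and `build_typed_object_graph` are hypothetical.

```python
from collections import namedtuple

RDF_TYPE = "rdf:type"

# A link quintuple: subject, predicate, object, plus the types of both ends.
LinkQuintuple = namedtuple(
    "LinkQuintuple", "subject subject_type predicate object object_type"
)


def build_typed_object_graph(triples):
    """Turn (s, p, o) triples into a list of link quintuples (typed edges)."""
    # Collect explicit type statements; assumes one type per object.
    types = {s: o for s, p, o in triples if p == RDF_TYPE}
    edges = []
    for s, p, o in triples:
        # Filtering step: keep only triples that relate two typed objects.
        if p == RDF_TYPE or s not in types or o not in types:
            continue
        edges.append(LinkQuintuple(s, types[s], p, o, types[o]))
    return edges


triples = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:paper1", RDF_TYPE, "ex:Paper"),
    ("ex:alice", "ex:wrote", "ex:paper1"),
    ("ex:alice", "ex:name", '"Alice"'),  # literal value: filtered out
]
edges = build_typed_object_graph(triples)
```

Each quintuple is one directed, labeled edge of the typed object graph; the pair of type fields is what later steps use as the edge label.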
In the novel semantic association mining method of the present invention, an object may be declared with multiple types in a Resource Description Framework (RDF) graph. This multi-type problem affects the construction of the typed object graph, so a set of heuristic rules is used to determine the type of an object in complex cases. When the type of an object is declared more than once in one or more RDF graphs, the local type and the global type of the object are distinguished, the authoritative and non-authoritative types of the object are distinguished by statistical analysis, and finally a single type is chosen for the object based on context.
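The patent does not list the heuristic rules themselves. The Python sketch below shows one plausible combination, stated here purely as an assumption: prefer types declared locally (in the graph being processed), and among the candidates pick the type declared most often across all graphs as a stand-in for the "authoritative" type. The function name `resolve_type` is hypothetical.

```python
from collections import Counter


def resolve_type(local_types, global_types):
    """Pick a single type for one object.

    local_types:  types declared for the object in the current graph.
    global_types: every type declaration for the object across all graphs,
                  with repeats, used as a simple frequency statistic.
    """
    counts = Counter(global_types)
    # Local declarations take precedence; otherwise fall back to global ones.
    candidates = local_types or list(counts)
    if not candidates:
        return None
    # The most frequently declared candidate wins; sorting makes ties
    # deterministic (alphabetically first).
    return max(sorted(candidates), key=lambda t: counts[t])


chosen = resolve_type(
    ["ex:Student", "ex:Person"],
    ["ex:Person", "ex:Person", "ex:Student"],
)
```

With these inputs `ex:Person` is chosen because it is declared twice globally and `ex:Student` only once.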
The graph model of semantic associations involves link patterns, so mining semantic associations is in effect the process of mining link patterns and finding their instantiations. Two semantic association mining methods are used: (1) the staged mining method: in the first stage, some or all link patterns are mined from the typed object graph with a classical frequent pattern mining algorithm; in the second stage, a subset of the mined link patterns is selected, and the subgraphs that instantiate these patterns are found in the Resource Description Framework (RDF) graph and taken as the final semantic associations; (2) the merged mining method: link patterns and semantic associations are mined together in the same step. While a frequent pattern mining algorithm mines the link patterns, the semantic associations are mined as the support of each link pattern is computed, and the final result is all link patterns in the typed object graph and all semantic associations in the RDF graph. The two methods suit different scenarios. Because the staged method introduces pattern selection before the second stage, it is better suited to focused mining of semantic associations, that is, mining a subset of the semantic associations under given conditions, such as those relevant to a user's interests. The merged method is suited to mining the complete set of semantic associations, but its time and space complexity are both high on large-scale RDF data sets.
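The difference between the two methods can be shown on a deliberately reduced case. The Python sketch below handles only single-edge link patterns of the form (subject type, predicate, object type); the patent mines general frequent subgraphs with a modified gSpan, which this sketch does not attempt. The function names and the `choose` callback are hypothetical.

```python
from collections import defaultdict


def mine_merged(edges, min_support):
    """Merged mining: collect instances while the support is being counted."""
    instances = defaultdict(list)
    for s, s_type, p, o, o_type in edges:
        instances[(s_type, p, o_type)].append((s, p, o))
    return {pat: inst for pat, inst in instances.items()
            if len(inst) >= min_support}


def mine_staged(edges, min_support, choose):
    """Staged mining: find frequent patterns, then instantiate chosen ones."""
    # Stage 1: count support only, without storing any instances.
    support = defaultdict(int)
    for _, s_type, p, _, o_type in edges:
        support[(s_type, p, o_type)] += 1
    frequent = {pat for pat, n in support.items() if n >= min_support}
    # Stage 2: instantiate only the patterns selected by the caller.
    selected = {pat for pat in frequent if choose(pat)}
    return {pat: [(s, p, o) for s, st, p, o, ot in edges if (st, p, ot) == pat]
            for pat in selected}


edges = [
    ("a", "Person", "wrote", "p1", "Paper"),
    ("b", "Person", "wrote", "p2", "Paper"),
    ("a", "Person", "livesIn", "c", "City"),
]
all_frequent = mine_merged(edges, min_support=2)
focused = mine_staged(edges, min_support=1, choose=lambda pat: pat[1] == "livesIn")
```

The merged variant keeps every instance in memory while counting, which mirrors the higher space cost the text attributes to it; the staged variant defers instantiation until the patterns of interest are known.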
For frequent pattern mining, gSpan, a classical mining algorithm based on pattern growth, was selected and modified. The traditional gSpan algorithm applies to undirected simple graphs, whereas a typed object graph is a directed labeled graph that may contain special cases such as self-loops and multiple edges. The modifications allow gSpan to mine link patterns and semantic associations in typed object graphs. In addition, two parameters, the minimum support and the maximum number of edges, are used to control the scale of link pattern mining.
The novel semantic association mining method of the present invention is based on clustering and partitioning. A given Resource Description Framework (RDF) data set may contain multiple disconnected RDF graphs, which in turn yield multiple disconnected typed object graphs. The connection relation between typed object graphs is defined by analyzing the single-edge link patterns they share, and typed object graphs that may contain common link patterns are merged according to this relation. This allows semantic association mining to follow a divide-and-conquer strategy: mining is completed independently within each cluster rather than over the whole data set, and the results from the individual clusters can be merged directly into the final result.
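One way to realize this clustering is a union-find pass over the single-edge patterns of each graph, sketched below in Python. The input format (a dictionary from graph identifier to its set of single-edge patterns) and the function name `cluster_graphs` are assumptions; the patent does not prescribe a data structure.

```python
def cluster_graphs(graph_patterns):
    """Group graphs that share at least one single-edge link pattern.

    graph_patterns: dict mapping a graph id to the set of single-edge
    patterns, e.g. (subject type, predicate, object type), found in it.
    Returns a sorted list of clusters, each a sorted list of graph ids.
    """
    parent = {g: g for g in graph_patterns}

    def find(g):
        # Find the cluster representative, compressing the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    first_owner = {}  # pattern -> first graph seen containing it
    for g, patterns in graph_patterns.items():
        for pat in patterns:
            if pat in first_owner:
                parent[find(g)] = find(first_owner[pat])  # union
            else:
                first_owner[pat] = g

    clusters = {}
    for g in graph_patterns:
        clusters.setdefault(find(g), set()).add(g)
    return sorted(sorted(c) for c in clusters.values())


A = ("Person", "wrote", "Paper")
B = ("Paper", "cites", "Paper")
C = ("City", "locatedIn", "Country")
clusters = cluster_graphs({"g1": {A, B}, "g2": {B}, "g3": {C}})
```

Here `g1` and `g2` land in one cluster because both contain pattern `B`, while `g3` shares nothing with them and can be mined on its own.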
Graph partitioning based on edge labeling makes it possible to mine large typed object graphs. An edge labeling rule assigns a label to every edge in the typed object graph; the label reflects the types of the edge's subject and object. Given the memory size and the scale of the typed object graph, multiple blocks are created, and each edge of the typed object graph is assigned in turn to a suitable block according to its label. During partitioning, a degree of connectivity between an edge and a block is defined, and each edge is assigned to the block with which it has the highest connectivity. However, partitioning based on edge labeling does not work for every Resource Description Framework (RDF) data set. Under a given edge labeling rule, if an RDF data set contains many edges of the same type, many edges will receive the same label, which eventually produces one or more large typed object blocks that cannot fit in memory; this is called a failure case of the labeling rule. There are also pseudo-failure cases, in which one edge labeling rule cannot partition a data set effectively but another rule can; the optimized labeling rule based on heuristics reduces the occurrence of pseudo-failure cases. Partitioning based on edge labeling guarantees that the final mining result is complete and that every local pattern is a global pattern, which simplifies merging the mining results after partitioning.
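The assignment of edges to blocks can be sketched as a greedy loop in Python. The patent does not define the connectivity measure, so this sketch assumes one: the number of types an edge's label shares with the types already present in a block. The `capacity` parameter stands in for the memory limit, and the function name is hypothetical. The sketch does not reproduce the completeness guarantee claimed for the patented method.

```python
def partition_edges(edges, num_blocks, capacity):
    """Greedily assign typed edges to blocks by label connectivity.

    edges: list of (subject, subject_type, predicate, object, object_type).
    capacity: maximum number of edges per block (stand-in for memory size).
    """
    blocks = [{"edges": [], "types": set()} for _ in range(num_blocks)]
    for edge in edges:
        label = {edge[1], edge[4]}  # types of the edge's subject and object
        open_blocks = [b for b in blocks if len(b["edges"]) < capacity]
        if not open_blocks:
            # Corresponds to a failure case of the labeling rule.
            raise ValueError("no block has room left")
        # Assumed connectivity: shared types; emptier blocks win ties.
        best = max(open_blocks,
                   key=lambda b: (len(label & b["types"]), -len(b["edges"])))
        best["edges"].append(edge)
        best["types"] |= label
    return [b["edges"] for b in blocks]


e1 = ("a", "Person", "wrote", "p1", "Paper")
e2 = ("b", "Person", "wrote", "p2", "Paper")
e3 = ("c", "City", "locatedIn", "k", "Country")
e4 = ("d", "City", "locatedIn", "k", "Country")
parts = partition_edges([e1, e2, e3, e4], num_blocks=2, capacity=3)
```

Edges with the same label gravitate to the same block, which is also why a data set dominated by one label can overflow a block and trigger the failure case described above.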
The novel semantic association mining method disclosed by the present invention improves the expressive power of the semantic association model. Traditional semantic associations based on semantic paths cannot express the more complex associations among multiple objects, whereas the present invention builds typed object graphs that connect multiple objects of the same or different kinds. Using the typed object graph as the data basis for mining, it can mine complex multi-object associations that existing methods cannot obtain.
The present invention proposes the concept of a link pattern to measure how frequent a semantic association is. The frequency reflected by a link pattern characterizes whether a semantic association is typical, which cannot be expressed in existing semantic association models.
The present invention improves the feasibility and efficiency of semantic association mining on large-scale semantic data. First, it modifies the classical gSpan algorithm to handle the special structure of typed object graphs. Second, it proposes a clustering and partitioning algorithm that solves the problem of mining large-scale semantic data. We demonstrate in theory that this algorithm provides the result completeness that existing partitioning algorithms lack, and experiments further show that the algorithm makes semantic association mining feasible on semantic data at the scale of millions of triples and substantially improves mining efficiency.
The above are only embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the description and drawing of the present invention, or any direct or indirect application in other related technical fields, likewise falls within the scope of patent protection of the present invention.

Claims (2)

1. A novel semantic association mining method, characterized in that it comprises the following steps:
(1) parse and cluster the input semantic data to form semantic data clusters, clean the data in the semantic data clusters, and convert the Resource Description Framework (RDF) graphs of the semantic data clusters into typed object graphs, wherein converting an RDF graph into a typed object graph comprises filtering the RDF triples in the RDF graph, extending the RDF triples into link quintuples, and using the link quintuples to convert the RDF graph into the typed object graph;
(2) partition or merge the typed object graphs using a basic labeling rule to obtain multiple blocks, wherein the basic labeling rule uses a nearly random labeling method;
(3) use a quick mining method to predict the order of magnitude of the potential link patterns and semantic associations in each block, and feed the prediction back to a secondary partitioning unit, which further divides blocks that are large or structurally complex; specifically, the typed object graph is divided into multiple blocks using the basic labeling rule, such that the blocks have certain mathematical properties and every block can be loaded into memory; to make the blocks more uniform, a secondary partitioning unit is introduced, a quick mining pass predicts the order of magnitude of the potential link patterns and semantic associations in each block, and the result is fed back to the secondary partitioning unit, which hierarchically subdivides blocks that are still large or structurally complex after partitioning, so that such blocks are found and divided as early as possible, thereby improving the overall uniformity of the partition;
(4) mine the local link patterns and semantic associations of the blocks, merge the local link patterns and semantic associations, aggregate and count the semantic associations, and output them to the user; the mining method is a staged mining method or a merged mining method, wherein the staged mining method mines some or all of the link patterns from the typed object graph with a frequent pattern mining algorithm, selects a subset of the link patterns, and takes the subgraphs of the RDF graph that instantiate the selected patterns as the semantic associations; and the merged mining method, when mining the link patterns with a frequent pattern mining algorithm, mines the semantic associations while computing the support of each link pattern.
2. The novel semantic association mining method according to claim 1, characterized in that step (2) partitions the typed object graph using an edge labeling method, which assigns a label to every edge in the typed object graph, the label containing the types of the edge's subject and object.
CN201210399288.5A 2012-10-19 2012-10-19 A novel semantic association mining method Expired - Fee Related CN102902809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210399288.5A CN102902809B (en) 2012-10-19 2012-10-19 A novel semantic association mining method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210399288.5A CN102902809B (en) 2012-10-19 2012-10-19 A novel semantic association mining method

Publications (2)

Publication Number Publication Date
CN102902809A CN102902809A (en) 2013-01-30
CN102902809B CN102902809B (en) 2016-02-24

Family

ID=47575041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210399288.5A Expired - Fee Related CN102902809B (en) 2012-10-19 2012-10-19 A novel semantic association mining method

Country Status (1)

Country Link
CN (1) CN102902809B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106021354A (en) 2016-05-10 2016-10-12 北京信息科技大学 Establishment method of digital interpretation library of Dongba classical ancient books

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436480A (en) * 2011-10-15 2012-05-02 西安交通大学 Incidence relation excavation method for text-oriented knowledge unit
CN102722569A (en) * 2012-05-31 2012-10-10 浙江理工大学 Knowledge discovery device based on path migration of RDF (Resource Description Framework) picture and method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
An RDF Approach for Discovering the Relevant Semantic Associations in a Social Network;Thushar, A.K.,Thilagam, P.S.;《Advanced Computing and Communications, 2008. ADCOM 2008. 16th International Conference on 》;20081217;214-220 *
Wang Xuping,Ni Zijian;Cao Haiyan.Research on Association Rules Mining Based-On Ontology in E-Commerce.《Wireless Communications, Networking and Mobile Computing, 2007. WiCom 2007. International Conference on 》.2007,3549 - 3552. *
基于对gSpan改进的有向频繁子图挖掘算法;周溜溜等;《南京大学学报(自然科学)》;20110930;第47卷(第5期);532-543 *
大规模图集的频繁子图挖掘算法研究;郑超;《中国优秀硕士学位论文全文数据库 信息科学辑(月刊)》;20120315(第03期);I138-48 *
语义Web链接结构分析之综述;葛唯益等;《计算机科学》;20100331;第37卷(第3期);19-21 *

Also Published As

Publication number Publication date
CN102902809A (en) 2013-01-30

Similar Documents

Publication Publication Date Title
Marinescu et al. AND/OR branch-and-bound search for combinatorial optimization in graphical models
CN102073700B (en) Discovery method of complex network community
Afrati et al. Transitive closure and recursive datalog implemented on clusters
CN105468702A (en) Large-scale RDF data association path discovery method
Mantle et al. Large scale distributed spatio-temporal reasoning using real-world knowledge graphs
CN106445645A (en) Method and device for executing distributed computation tasks
Kotilainen et al. P2PRealm-peer-to-peer network simulator
Chen et al. Graph indexing for efficient evaluation of label-constrained reachability queries
CN102708285B (en) Coremedicine excavation method based on complex network model parallelizing PageRank algorithm
Arge et al. Multiway simple cycle separators and I/O-efficient algorithms for planar graphs
CN102902809B (en) A novel semantic association mining method
CN102420812A (en) Automatic quality of service (QoS) combination method supporting distributed parallel processing in web service
CN102184194B (en) Ontology-based knowledge map drawing system
Zhou A practical scalable shared-memory parallel algorithm for computing minimum spanning trees
CN102737134A (en) Query processing method being suitable for large-scale real-time data stream
CN106330559B (en) Complex network topologies calculation of characteristic parameters method and system based on MapReduce
CN109829056A (en) Predicate explains the fact that template-driven Abductive reasoning method
CN105488056B (en) A kind of object processing method and equipment
CN108804788B (en) Web service evolution method based on data cell model
WO2018015814A1 (en) Systems and methods for database compression and evaluation
CN107133281A (en) A kind of packet-based global multi-query optimization method
CN102750386A (en) Inquiry processing method suitable for large-scale real-time data flows
Wei et al. Accelerating the shortest-path calculation using cut nodes for problem reduction and division
Liu et al. An Abstract Description Method of Map‐Reduce‐Merge Using Haskell
Huang et al. Growing Like a Tree: Finding Trunks From Graph Skeleton Trees

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160224

Termination date: 20161019