CN103345536B - A semantic association indexing method - Google Patents

A semantic association indexing method

Info

Publication number
CN103345536B
Authority
CN
China
Prior art keywords
semantic
virtual document
association
semanteme
semantic association
Prior art date
2013-07-30
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310322357.7A
Other languages
Chinese (zh)
Other versions
CN103345536A (en)
Inventor
姚瑞波 (Yao Ruibo)
周凤波 (Zhou Fengbo)
翁强 (Weng Qiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing New Silk Road Consulting Group Co., Ltd
Original Assignee
Focus Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-07-30
Filing date
2013-07-30
Publication date
2016-06-15
Application filed by Focus Technology Co Ltd
Priority to CN201310322357.7A
Publication of CN103345536A
Application granted
Publication of CN103345536B
Active (current legal status)
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a semantic association indexing method. Using virtual document techniques, it addresses the scalability problem of indexing massive numbers of semantic associations and the precision and recall problems of semantic association retrieval, enabling effective indexing and retrieval of large-scale semantic associations. The invention uses virtual documents to capture both each semantic association's own textual information and its graph-structure information in the semantic association graph, and iteratively expands them into multi-step global virtual documents. During expansion, each multi-step global virtual document stays relevant to its target semantic association, which avoids the loss of retrieval precision caused by blind expansion and improves the feasibility and efficiency of indexing semantic associations over large-scale semantic data.

Description

A semantic association indexing method
Technical field
The invention belongs to the field of information retrieval and relates to a semantic association indexing method.
Background art
Over the past decade, with the rapid development of the Semantic Web, online semantic data has become increasingly abundant, and huge semantic datasets now form a complex web of data. Semantic search on the Semantic Web focuses mainly on semantic objects and the semantic associations between them; the goal of semantic association retrieval is to help users discover and understand the direct or indirect connections between objects hidden in massive semantic data.
In Semantic Web research, a semantic association is generally defined as a direct or indirect relationship between objects in a Resource Description Framework (RDF) graph. Semantic associations are usually modeled as directed paths in the graph-theoretic sense: given two objects, semantic association discovery is the process of quickly finding one or more shortest (or short) paths between them in the RDF graph.
After about ten years of development, research on semantic associations has achieved certain results, but existing techniques still have shortcomings. The traditional model based on semantic paths has the following limitations: (1) a traditional semantic path can only capture the association between two objects, and each such association is independent and cannot be unified with others, so complex associations among multiple objects cannot be captured, even though such multi-object associations are abundant in real semantic data and should be treated as a single, unified semantic association; (2) the path model does not consider how typical a semantic association is, i.e., whether the semantic path between two associated objects also appears in other semantic associations. In many cases an ordinary, unimportant semantic path merely shows that two objects are connected in the RDF graph and does not represent a meaningful semantic association between them; (3) research on indexing and retrieval methods for semantic associations is currently lacking.
Existing semantic association methods have difficulty indexing large-scale semantic data. With the development of the Semantic Web, many large-scale semantic datasets have appeared whose size exceeds one million triples. At this scale, neither path indexing techniques nor traditional graph indexing techniques can index effectively. Traditional graph indexing algorithms borrow frequent pattern mining algorithms from data mining: they first mine the frequent patterns in the graph and then serialize and index those patterns. The complexity of these algorithms is mostly exponential, so they are not suitable for indexing massive numbers of semantic associations.
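To make the traditional path model concrete, the sketch below finds one shortest directed path between two objects with breadth-first search. It illustrates only the background technique, not the claimed method; the triple format, the function name `shortest_path`, and the `ex:` identifiers are hypothetical.

```python
from collections import deque


def shortest_path(triples, start, goal):
    """Return one shortest directed path of nodes from start to goal, or None.

    triples: iterable of (subject, predicate, object) strings. Edges run
    subject -> object only, following the directed-path model.
    """
    # Adjacency list keyed by subject.
    adj = {}
    for s, _p, o in triples:
        adj.setdefault(s, []).append(o)

    # Breadth-first search; prev records the parent of each visited node.
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nbr in adj.get(node, []):
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    return None


triples = [
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Bob", "ex:worksAt", "ex:Acme"),
    ("ex:Alice", "ex:livesIn", "ex:Paris"),
]
print(shortest_path(triples, "ex:Alice", "ex:Acme"))  # ['ex:Alice', 'ex:Bob', 'ex:Acme']
```

Each call answers one object pair at a time, which is the pairwise limitation described in point (1) below.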
Summary of the invention
Technical problem: The present invention provides a semantic association indexing method that indexes efficiently and uses a simple retrieval model.
Technical solution: The semantic association indexing method of the present invention comprises the following steps:
1) Parse the semantic associations, which are based on the Resource Description Framework (RDF) model, to obtain the objects described in each semantic association and the relations between them; then, by analyzing the objects shared between different semantic associations, obtain an undirected, labeled semantic association graph;
2) Weight the semantic association graph built in step 1). Specifically: compute the ratio of the number of objects shared by two semantic associations to the total number of objects, and take the logarithm of this ratio to obtain the weight of the corresponding edge in the semantic association graph. Edge weights lie between 0 and 1 and indicate the strength of the correlation between two semantic associations;
3) For each semantic association, build a set of keywords as its local virtual document. Each local virtual document contains the keywords in the description of every object in that semantic association and the keywords in the descriptions of the relations between objects;
4) For each semantic association, add the local virtual documents of its neighbor nodes (those connected to it by a direct edge in the semantic association graph) to its own local virtual document; the resulting virtual document is the global virtual document of that semantic association;
5) Obtain the weight w of the direct edge between the semantic association and a neighbor node and set S = S × w. If S > K, take the global virtual document obtained in step 4) as this semantic association's own local virtual document and return to step 4); otherwise, take the global virtual document obtained in step 4) as the final multi-step global virtual document of this semantic association and proceed to step 6). Here S is the iteration strength and K is the iteration threshold;
6) Index the multi-step global virtual document of each semantic association as a text index, and perform keyword retrieval with a text retrieval model from information retrieval theory to obtain the semantic associations that match a keyword query.
In the present invention, each node of the semantic association graph obtained in step 1) is a semantic association, and each edge indicates that the two semantic associations share at least one object.
The local virtual document built in step 3) belongs to a single semantic association and describes that association's own textual information.
In the present invention, the semantic association's own local virtual document in step 4) is initially the local virtual document obtained in step 3).
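Step 1) can be sketched as follows. The sketch assumes each semantic association has already been parsed into (subject, predicate, object) triples and that "objects" means the subjects and objects of those triples (literals are not treated specially). The function names and association ids are hypothetical.

```python
from itertools import combinations


def objects_of(triples):
    """Objects described by one association: every subject and object of its triples."""
    objs = set()
    for s, _p, o in triples:
        objs.add(s)
        objs.add(o)
    return objs


def build_association_graph(associations):
    """Build an undirected graph whose nodes are semantic associations.

    associations: dict mapping an association id to its list of
    (subject, predicate, object) triples. An edge joins two associations
    that share at least one object; the edge label is the set of shared objects.
    """
    objs = {name: objects_of(triples) for name, triples in associations.items()}
    graph = {name: {} for name in associations}
    for a, b in combinations(sorted(associations), 2):
        shared = objs[a] & objs[b]
        if shared:
            graph[a][b] = shared
            graph[b][a] = shared
    return graph


associations = {
    "A1": [("ex:Alice", "ex:knows", "ex:Bob")],
    "A2": [("ex:Bob", "ex:worksAt", "ex:Acme")],
    "A3": [("ex:Carol", "ex:knows", "ex:Dave")],
}
graph = build_association_graph(associations)
print(graph["A1"])  # {'A2': {'ex:Bob'}}
print(graph["A3"])  # {}
```

Storing the shared objects on each edge keeps the graph labeled, and their count is what the edge weighting in step 2) needs.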
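A local virtual document is a bag of words (see step 3) of the embodiment), so a `Counter` is a natural fit. The sketch assumes object descriptions are available as a plain dict (for example, taken from `rdfs:label` values) and uses a simple lower-case alphanumeric tokenizer; both the tokenizer and the function names are hypothetical.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lower-case the text and split it into alphanumeric keywords."""
    return TOKEN.findall(text.lower())


def local_virtual_document(triples, descriptions):
    """Bag of words for one semantic association; duplicates are kept as counts.

    descriptions: dict mapping an object to its textual description;
    objects without one fall back to their identifier.
    """
    doc = Counter()
    objects = set()
    for s, p, o in triples:
        objects.update((s, o))
        doc.update(tokenize(p))  # keywords from the relation label
    for obj in objects:
        doc.update(tokenize(descriptions.get(obj, obj)))  # keywords from the object description
    return doc


doc = local_virtual_document(
    [("ex:Alice", "knows", "ex:Bob")],
    {"ex:Alice": "Alice Smith", "ex:Bob": "Bob Smith"},
)
print(doc["smith"], doc["knows"])  # 2 1
```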
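A single pass of step 4), starting from the step 3) local documents, can be sketched as a multiset union over direct neighbors. The graph layout (node mapped to a dict of neighbor and edge data) and the function name are assumptions for illustration.

```python
from collections import Counter


def one_step_global_document(graph, local_docs, node):
    """Step 4): merge the local documents of all direct neighbors into node's own.

    graph: dict node -> {neighbor: edge data}; local_docs: dict node -> Counter.
    """
    doc = Counter(local_docs[node])  # start from the association's own local document
    for nbr in graph[node]:
        doc.update(local_docs[nbr])
    return doc


graph = {"A1": {"A2": 0.8}, "A2": {"A1": 0.8, "A3": 0.4}, "A3": {"A2": 0.4}}
local_docs = {
    "A1": Counter({"alice": 1, "bob": 1}),
    "A2": Counter({"bob": 1, "acme": 1}),
    "A3": Counter({"acme": 1, "paris": 1}),
}
print(sorted(one_step_global_document(graph, local_docs, "A2").items()))
# [('acme', 2), ('alice', 1), ('bob', 2), ('paris', 1)]
```

Copying into a new `Counter` leaves the step 3) documents untouched, so every association's one-step document is computed from the same starting point.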
Beneficial effects: Compared with the prior art, the present invention has the following advantages:
The method uses virtual document techniques to solve the scalability problem of indexing massive numbers of semantic associations and the precision and recall problems of semantic association retrieval, enabling effective indexing and retrieval of large-scale semantic associations. Technically, the invention first converts graph structure into text through virtual documents. This reduces indexing complexity and simplifies the retrieval model, so that indexing a complex graph structure is replaced by relatively simple text indexing. On the one hand, it avoids the large number of index entries that graph-structure-based indexing would produce; on the other hand, it serializes graph structure, which is hard to store and query, into linear text, which is easy to store and query. This reduces the number and size of index entries when indexing massive numbers of semantic associations, giving efficient indexing and a simple retrieval model compared with existing methods. Because the virtual documents are built from the semantic association graph, retrieval precision can be improved; and because a decay factor is introduced into virtual document expansion, the expansion remains controllable.
The invention also proposes a semantic association graph model to capture the correlation among multiple semantic associations. Virtual documents capture both each semantic association's own textual information and its graph-structure information in the semantic association graph, and are iteratively expanded into multi-step global virtual documents. During expansion, each multi-step global virtual document stays relevant to its target semantic association, which avoids the loss of precision caused by blind expansion and improves the feasibility and efficiency of indexing semantic associations over large-scale semantic data.
Brief description of the drawings
Fig. 1 is a logic flowchart of the method of the invention.
Fig. 2 shows the iterative process for obtaining the final multi-step global virtual document of a semantic association.
Detailed description of embodiments
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the scope of protection of the invention is defined more clearly.
The semantic association indexing method of the present invention comprises the following steps:
1) Parse the input semantic associations based on the RDF model to obtain a triple representation of the relations between the objects in each semantic association; then, from the objects shared between semantic associations, obtain the semantic association graph, which is an undirected labeled graph. In the semantic association graph, each node is a semantic association, and each edge indicates that two semantic associations share at least one object.
2) Weight the semantic association graph. The weight of an edge is obtained by computing the ratio of the number of objects shared by the two semantic associations to the total number of objects and taking the logarithm of this ratio. When the semantic associations in a dataset generally share few objects, the edge weights are further normalized. Edge weights lie between 0 and 1 and indicate the strength of the correlation between two semantic associations.
3) Build a local virtual document for each semantic association. The mathematical model of the local virtual document is a bag of words containing all keywords from each object in the semantic association and from the labels of the relations between objects. The local virtual document allows duplicate keywords and records how often each occurs, so it effectively describes the association's own textual information.
4) For each semantic association, add the local virtual documents of its neighbor nodes (those connected to it by a direct edge in the semantic association graph) to its own local virtual document; the resulting virtual document is the one-step global virtual document of this semantic association.
5) Set the iteration strength S and the iteration threshold K. S and K are usually real numbers between 0 and 1, with K less than S. For retrieval scenarios that prioritize recall, K should be much smaller than S; for scenarios that prioritize precision, K should be only slightly smaller than S. Obtain the weight w of the direct edge between the semantic association and a neighbor node and set S = S × w. If S > K, take the global virtual document obtained in step 4) as this association's own local virtual document and return to step 4); otherwise, take the global virtual document obtained in step 4) as the final multi-step global virtual document of this semantic association and proceed to step 6).
The mathematical model of the global virtual document is likewise a bag of words. The global virtual document effectively reflects the topology of the semantic association graph, and indexing global virtual documents improves the efficiency of indexing massive numbers of semantic associations.
6) Index the global virtual document of each semantic association as text: build a text index based on a B+ tree for the global virtual documents, and provide keyword-based retrieval using a traditional text retrieval model to obtain the semantic associations that match a keyword query. This reduces the space required to index massive numbers of semantic associations, shortens indexing time, and simplifies the semantic association retrieval model.
The semantic association indexing method of the present invention is based on virtual documents. The virtual document of a semantic association describes not only the association's own textual information but also its graph-structure information in the semantic association graph. The virtual document method effectively improves the efficiency of retrieval over massive numbers of semantic associations.
The present invention proposes a method for weighting the edges of the semantic association graph. It strengthens the ability of virtual documents to describe semantic associations and increases the relevance between the keywords in a virtual document and the corresponding semantic association, thereby improving retrieval precision and recall.
A decay factor is introduced into the iterative construction of virtual documents. This improves the controllability and flexibility of the construction process, keeps the size of the virtual documents that capture the structure of the semantic association graph within a controlled range, and reduces the space required to index massive numbers of semantic associations.
The foregoing are only embodiments of the present invention and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.
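As a rough stand-in for step 6), the sketch below indexes the global virtual documents with an in-memory inverted index and ranks matches by TF-IDF. Both choices are assumptions: the patent specifies a B+-tree-based text index and says only that a traditional text retrieval model is used, so a Python dict replaces the B+ tree here and TF-IDF is one common model among several. Function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each keyword to {association id: term frequency in its virtual document}."""
    index = defaultdict(dict)
    for assoc, bag in docs.items():
        for term, tf in bag.items():
            index[term][assoc] = tf
    return index


def search(index, n_docs, query):
    """Rank associations by a simple TF-IDF score summed over the query keywords."""
    scores = Counter()
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))  # rarer keywords weigh more
        for assoc, tf in postings.items():
            scores[assoc] += tf * idf
    return [assoc for assoc, _score in scores.most_common()]


docs = {
    "A1": Counter({"alice": 1, "bob": 1}),
    "A2": Counter({"bob": 2, "acme": 1}),
}
index = build_inverted_index(docs)
print(search(index, len(docs), "bob"))    # ['A2', 'A1']
print(search(index, len(docs), "alice"))  # ['A1']
```

Because the graph structure has already been folded into the bags of words, retrieval needs nothing beyond an ordinary text index, which is the simplification the method relies on.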
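The patent names the ingredients of the edge weight (the ratio of shared objects to total objects, a logarithm, and an optional normalization) but not the exact formula, and a plain logarithm of a ratio in (0, 1] would be negative. The sketch below therefore assumes that "total objects" means the union of both associations' objects and that the weight is log2(1 + ratio). This maps ratios in (0, 1] into (0, 1] and gives small overlaps relatively more weight than the raw ratio would. Treat it as one possible reading, not the patented formula; `edge_weight` is a hypothetical name.

```python
import math


def edge_weight(objs_a, objs_b):
    """Hypothetical edge weight in [0, 1] for two semantic associations.

    objs_a, objs_b: sets of objects described by each association.
    ASSUMPTION: ratio = |shared| / |union| and weight = log2(1 + ratio).
    """
    shared = objs_a & objs_b
    if not shared:
        return 0.0  # no shared object, so no edge in the association graph
    ratio = len(shared) / len(objs_a | objs_b)
    return math.log2(1.0 + ratio)


print(edge_weight({"ex:Alice", "ex:Bob"}, {"ex:Bob", "ex:Acme"}))  # about 0.415
print(edge_weight({"ex:Bob"}, {"ex:Bob"}))  # 1.0
```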
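Steps 4) and 5) leave open how S evolves when an association has several neighbors with different edge weights. One reading, sketched below, is a per-path decay: each hop multiplies the current strength by the edge weight (S = S × w), every newly reached association contributes its local document once, and expansion continues from an association only while its strength still exceeds K. Direct neighbors are therefore always merged, as in step 4). The best-first traversal (a max-heap on strength) is a choice made for this sketch so that each association is expanded along its strongest path; the function name and parameter defaults are hypothetical.

```python
import heapq
from collections import Counter


def multistep_global_document(graph, local_docs, source, S=1.0, K=0.3):
    """Expand source's virtual document hop by hop with a decaying strength.

    graph: dict node -> {neighbor: edge weight w in [0, 1]}
    local_docs: dict node -> Counter (bag of words)
    S: initial iteration strength; K: iteration threshold (K < S).
    """
    doc = Counter(local_docs[source])
    merged = {source}
    best = {source: S}     # highest strength found so far for each node
    heap = [(-S, source)]  # max-heap on strength via negated keys
    while heap:
        neg_strength, node = heapq.heappop(heap)
        strength = -neg_strength
        if strength < best[node]:
            continue  # stale entry: node was already expanded with a higher strength
        for nbr, w in graph[node].items():
            if nbr not in merged:  # step 4: add the neighbor's local document
                merged.add(nbr)
                doc.update(local_docs[nbr])
            s = strength * w       # step 5: S = S * w
            if s > K and s > best.get(nbr, 0.0):
                best[nbr] = s      # S > K: keep expanding from nbr (return to step 4)
                heapq.heappush(heap, (-s, nbr))
    return doc


graph = {
    "A1": {"A2": 0.9},
    "A2": {"A1": 0.9, "A3": 0.5},
    "A3": {"A2": 0.5, "A4": 0.9},
    "A4": {"A3": 0.9},
}
local_docs = {n: Counter({n.lower(): 1}) for n in graph}
print(sorted(multistep_global_document(graph, local_docs, "A1", S=1.0, K=0.3)))  # ['a1', 'a2', 'a3', 'a4']
print(sorted(multistep_global_document(graph, local_docs, "A1", S=1.0, K=0.5)))  # ['a1', 'a2', 'a3']
```

Raising K from 0.3 to 0.5 stops the expansion one hop earlier, which matches the embodiment's guidance that recall-oriented settings use a K much smaller than S while precision-oriented settings keep K close to S.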

Claims (4)

1. A semantic association indexing method, characterized in that the method comprises the following steps:
1) parsing the semantic associations, which are based on the Resource Description Framework (RDF) model, to obtain the objects described in each semantic association and the relations between them; then, by analyzing the objects shared between different semantic associations, obtaining an undirected, labeled semantic association graph;
2) weighting the semantic association graph built in said step 1), specifically: computing the ratio of the number of objects shared by two semantic associations to the total number of objects and taking the logarithm of this ratio to obtain the weight of the corresponding edge in the semantic association graph, said edge weight lying between 0 and 1 and indicating the strength of the correlation between the two semantic associations;
3) for each semantic association, building a set of keywords as its local virtual document, each local virtual document containing the keywords in the description of every object in that semantic association and the keywords in the descriptions of the relations between objects;
4) for each semantic association, adding the local virtual documents of its neighbor nodes (those connected to it by a direct edge in the semantic association graph) to its own local virtual document, the resulting virtual document being the global virtual document of that semantic association;
5) setting an iteration strength S and an iteration threshold K, S and K being real numbers between 0 and 1; obtaining the weight w of the direct edge between the semantic association and a neighbor node and setting S = S × w; if S > K, taking the global virtual document obtained in step 4) as this semantic association's own local virtual document and returning to step 4); otherwise, taking the global virtual document obtained in step 4) as the final multi-step global virtual document of this semantic association and proceeding to step 6); wherein S is the iteration strength and K is the iteration threshold;
6) indexing the multi-step global virtual document of each semantic association as a text index, and performing keyword retrieval with a text retrieval model from information retrieval theory to obtain the semantic associations that match a keyword query.
2. The semantic association indexing method according to claim 1, characterized in that, in the semantic association graph obtained in said step 1), each node is a semantic association, and each edge represents the objects shared by two semantic associations.
3. The semantic association indexing method according to claim 1, characterized in that the local virtual document built in said step 3) is the local virtual document of a semantic association and describes that association's own textual information.
4. The semantic association indexing method according to claim 1, 2 or 3, characterized in that this semantic association's own local virtual document in said step 4) is initially the local virtual document obtained in said step 3).
CN201310322357.7A 2013-07-30 2013-07-30 A semantic association indexing method Active CN103345536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310322357.7A CN103345536B (en) 2013-07-30 2013-07-30 A semantic association indexing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310322357.7A CN103345536B (en) 2013-07-30 2013-07-30 A semantic association indexing method

Publications (2)

Publication Number Publication Date
CN103345536A CN103345536A (en) 2013-10-09
CN103345536B CN103345536B (en) 2016-06-15

Family

ID=49280331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310322357.7A Active CN103345536B (en) 2013-07-30 2013-07-30 A semantic association indexing method

Country Status (1)

Country Link
CN (1) CN103345536B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105468702B (en) * 2015-11-18 2019-03-22 中国科学院计算机网络信息中心 A kind of extensive RDF data associated path discovery method
CN106407484B (en) * 2016-12-09 2023-09-01 上海交通大学 Video tag extraction method based on semantic association of bullet-screen comments
CN109614465A (en) * 2018-11-13 2019-04-12 中科创达软件股份有限公司 Data processing method, device and electronic equipment based on citation relations
CN111708819B (en) * 2020-05-28 2023-04-07 北京百度网讯科技有限公司 Method, apparatus, electronic device, and storage medium for information processing

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US7039634B2 (en) * 2003-03-12 2006-05-02 Hewlett-Packard Development Company, L.P. Semantic querying a peer-to-peer network
CN100442292C (en) * 2007-03-22 2008-12-10 华中科技大学 Method for indexing and acquiring semantic net information

Also Published As

Publication number Publication date
CN103345536A (en) 2013-10-09

Similar Documents

Publication Publication Date Title
CN107291807B (en) SPARQL query optimization method based on graph traversal
CN104809190B (en) A database access method for tree-structured data
CN102270232B (en) Semantic data query system with optimized storage
Gao et al. Relational approach for shortest path discovery over large graphs
CN102571954B (en) Complex network clustering method based on key influence of nodes
CN103177094B (en) Method for cleaning Internet of Things data
CN103345536B (en) A semantic association indexing method
CN102722566B (en) Method for inquiring potential friends in social network
CN103116625A (en) Distributed query processing method for massive RDF data based on Hadoop
CN103793467B (en) Method for optimizing real-time query on big data on basis of hyper-graphs and dynamic programming
CN102609490B (en) Column-storage-oriented B+ tree index method for DWMS (data warehouse management system)
CN102163218A (en) Graph-index-based graph database keyword vicinity searching method
CN102156756A (en) Method for finding optimal path in road network based on graph embedding
CN103678550A (en) Mass data real-time query method based on dynamic index structure
Gao et al. Shortest path computing in relational DBMSs
CN103226608B (en) A parallel file search method based on a directory-level scalable Bloom filter bitmap table
CN104933143A (en) Method and device for acquiring recommended object
CN104572832B (en) A requirements meta-model construction method and device
CN102325161B (en) Query workload estimation-based extensible markup language (XML) fragmentation method
CN104765767A (en) Knowledge storage algorithm for intelligent learning
CN104156431A (en) RDF keyword research method based on stereogram community structure
CN103186674A (en) Fast Web data query method based on Extensible Markup Language (XML)
CN108595588B (en) Scientific data storage association method
CN102955860B (en) Improved keyword query method based on a schema graph
CN102722546B (en) Method for querying shortest paths on graphs in a relational database environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200201

Address after: 100000 2505, 21 / F, building 6, No. 93 yard, Jianguo Road, Chaoyang District, Beijing

Patentee after: Beijing New Silk Road Consulting Group Co., Ltd

Address before: 210061 12F, building A, Spark Road software building, hi tech Zone, Jiangsu, Nanjing

Patentee before: Focus Technology Co., Ltd.