CN103345536B - A semantic association indexing method - Google Patents
A semantic association indexing method
- Publication number
- CN103345536B (application number CN201310322357.7A)
- Authority
- CN
- China
- Prior art keywords
- semantic
- virtual document
- association
- semanteme
- semantic association
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses a semantic association indexing method that uses virtual document technology to address both the massive scale of semantic association indexing and the precision and recall of semantic association retrieval, achieving effective indexing and retrieval of large-scale semantic associations. The invention uses virtual documents to capture the independent text information of each semantic association together with its graph structure information in the semantic association graph, and iteratively expands them to obtain multi-step global virtual documents. During expansion, each multi-step global virtual document remains relevant to its target semantic association, which avoids the loss of retrieval precision caused by blind expansion and improves the feasibility and efficiency of semantic association indexing over large-scale semantic data.
Description
Technical field
The invention belongs to the field of information retrieval and relates to a semantic association indexing method.
Background art
With the rapid growth of the Semantic Web over the past decade, online semantic data has become increasingly abundant, and huge semantic datasets now form a vast and intricate web of data. Semantic search on the Semantic Web focuses mainly on semantic objects and the semantic associations between them. The goal of semantic association retrieval is to help users discover and understand the direct or indirect connections between objects hidden in massive semantic data.
In Semantic Web research, a semantic association is generally defined as a direct or indirect relationship between objects in a Resource Description Framework (RDF) graph. Semantic associations are usually modeled as directed paths in the graph-theoretic sense: given two objects, semantic association discovery is the process of quickly finding one or more shortest, or relatively short, paths between them in the RDF graph.
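To make the path-based model concrete, the following is a minimal sketch of finding one shortest directed path between two objects with breadth-first search. The triple format `(subject, predicate, object)`, the function name `shortest_path`, and the sample data are assumptions made for illustration; the patent does not prescribe a particular path-finding algorithm.

```python
from collections import deque

def shortest_path(triples, source, target):
    """Return one shortest directed path from source to target as a list of
    triples, or None if target is unreachable (illustrative helper)."""
    # Adjacency list: subject -> list of (predicate, object)
    out_edges = {}
    for s, p, o in triples:
        out_edges.setdefault(s, []).append((p, o))

    parent = {source: None}      # node -> (previous node, predicate)
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for p, nxt in out_edges.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, p)
                queue.append(nxt)

    if target not in parent:
        return None
    # Walk back from target to source, then reverse
    path, node = [], target
    while parent[node] is not None:
        prev, p = parent[node]
        path.append((prev, p, node))
        node = prev
    return path[::-1]

# Hypothetical sample data
triples = [
    ("Alice", "worksFor", "AcmeCorp"),
    ("AcmeCorp", "locatedIn", "Nanjing"),
    ("Alice", "knows", "Bob"),
]
path = shortest_path(triples, "Alice", "Nanjing")
```

Each such path links only one pair of objects, which is the limitation the following paragraphs discuss.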
After ten years of development, research on semantic associations has made some progress, but the related techniques still have shortcomings. The traditional semantic association model based on semantic paths has the following limitations: (1) A traditional semantic path can only describe the semantic association between a pair of objects, and each semantic association is independent of the others, so complex associations among multiple objects cannot be described in a unified way. In real semantic data, however, associations among multiple objects are common and should be treated as a single semantic association. (2) The semantic path model does not consider the typicality of a semantic association, that is, whether the semantic path between two associated objects also appears in other semantic associations. In many cases, a trivial, unimportant semantic path merely shows that two objects are connected in the RDF graph and does not indicate a meaningful semantic association between them. (3) Research on indexing and retrieval methods for semantic associations is currently lacking.
Existing semantic association indexing methods have difficulty handling large-scale semantic data. As the Semantic Web has developed, many large-scale semantic datasets have appeared whose size exceeds one million triples. At this scale, neither path indexing techniques nor traditional graph indexing techniques can index the data effectively. Traditional graph indexing algorithms borrow frequent pattern mining algorithms from data mining: they first mine the frequent patterns in the graph and then serialize and index those patterns. The complexity of such algorithms is mostly exponential, so they are unsuitable for indexing massive numbers of semantic associations.
Summary of the invention
Technical problem: The present invention provides a semantic association indexing method that is efficient and has a simple retrieval model.
Technical solution: The semantic association indexing method of the present invention comprises the following steps:
1) Parse each semantic association, which is based on the Resource Description Framework (RDF) model, to obtain the objects described in the semantic association and the relationships between them; then analyze the objects shared between different semantic associations to obtain an undirected, labeled semantic association graph;
2) Assign weights to the semantic association graph built in step 1). Specifically: compute the ratio of the number of objects shared by two semantic associations to the total number of objects, and take the logarithm of this ratio to obtain the weight of the corresponding edge of the semantic association graph. The edge weight lies between 0 and 1 and indicates the strength of the relevance between the two semantic associations;
3) For each semantic association, build a set of keywords as its local virtual document. Each local virtual document contains the keywords in the description of each object in the semantic association and the keywords in the descriptions of the relationships between objects;
4) For each semantic association, add the local virtual documents of its neighbor nodes, that is, the nodes connected to it by a direct edge in the semantic association graph, to its own local virtual document, and take the resulting virtual document as the global virtual document of the semantic association;
5) Obtain the weight w of the direct edge between the semantic association and its neighbor node and let S = S × w. If S > K, take the global virtual document obtained in step 4) as the local virtual document of the semantic association and return to step 4); otherwise, take the global virtual document obtained in step 4) as the final multi-step global virtual document of the semantic association and proceed to step 6). Here S is the iteration strength and K is the iteration threshold;
6) Index the multi-step global virtual document of each semantic association in the manner of a text index, and perform keyword retrieval according to a text retrieval model from information retrieval theory to obtain the semantic associations that match the keyword query.
In the present invention, each node of the semantic association graph obtained in step 1) is a semantic association, and each edge indicates that the two semantic associations it connects share objects.
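The graph structure described above can be sketched as follows. This is only an illustration under stated assumptions: each semantic association is given as a list of `(subject, predicate, object)` triples, the objects of an association are taken to be its subjects and objects, and the names `objects_of` and `build_association_graph` are hypothetical. Each edge is labeled with the set of shared objects.

```python
from itertools import combinations

def objects_of(triples):
    """Objects of a semantic association: all subjects and objects (assumption)."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}

def build_association_graph(associations):
    """associations: dict mapping an association name to its list of triples.
    Returns an undirected labeled graph as {(a, b): shared_objects}."""
    objects = {name: objects_of(triples) for name, triples in associations.items()}
    edges = {}
    for a, b in combinations(sorted(objects), 2):
        shared = objects[a] & objects[b]
        if shared:                      # an edge exists only if objects are shared
            edges[(a, b)] = shared      # the label is the set of shared objects
    return edges

# Hypothetical sample data
associations = {
    "a1": [("Alice", "worksFor", "AcmeCorp"), ("AcmeCorp", "locatedIn", "Nanjing")],
    "a2": [("Bob", "worksFor", "AcmeCorp")],
    "a3": [("Carol", "livesIn", "Paris")],
}
graph = build_association_graph(associations)
```

Comparing every pair of associations is quadratic in the number of associations; a real system would more likely group associations by object first, but the pairwise form keeps the sketch short.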
The local virtual document built in step 3) of the present invention is the local virtual document of a semantic association and describes the independent text information of that semantic association.
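A local virtual document of this kind can be sketched as a bag of words built from the labels in an association. The tokenization rule (splitting camelCase labels and lowercasing), the choice to count each distinct object once and each relation label once per triple, and the names `tokenize` and `local_virtual_document` are assumptions made for illustration, not details given in the patent.

```python
import re
from collections import Counter

def tokenize(label):
    """Split a label such as 'worksFor' into lowercase keywords (assumed rule)."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", spaced)]

def local_virtual_document(triples):
    """Bag of words over object labels and relation labels of one association."""
    doc = Counter()
    distinct_objects = {s for s, _, _ in triples} | {o for _, _, o in triples}
    for obj in distinct_objects:
        doc.update(tokenize(obj))       # each distinct object contributes once
    for _, predicate, _ in triples:
        doc.update(tokenize(predicate)) # repeated relations raise the frequency
    return doc

# Hypothetical sample data
doc = local_virtual_document([
    ("Alice", "worksFor", "AcmeCorp"),
    ("Bob", "worksFor", "AcmeCorp"),
])
```

Using a `Counter` keeps keyword frequencies, so repeated keywords are preserved rather than collapsed into a set.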
In the present invention, the semantic association's own local virtual document in step 4) is initially the local virtual document obtained in step 3).
Beneficial effects: Compared with the prior art, the present invention has the following advantages:
The method uses virtual document technology to address both the massive scale of semantic association indexing and the precision and recall of semantic association retrieval, achieving effective indexing and retrieval of large-scale semantic associations. Technically, the invention first converts graph structure into text by means of virtual documents, which reduces indexing complexity and simplifies the retrieval model, so that indexing a complex graph structure is replaced by relatively simple text indexing. On the one hand, this avoids generating a large number of graph-structure-based index entries; on the other hand, it serializes graph structures, which are hard to store and query, into linear text that is easy to store and query. This reduces the number and size of index entries when indexing massive numbers of semantic associations, and compared with existing methods it offers efficient indexing and a simple retrieval model. In the method, virtual documents are constructed from the semantic association graph, which can improve retrieval precision; a decay factor is introduced into the expansion of virtual documents, which keeps the expansion controllable.
The invention also proposes the semantic association graph model for describing the relevance among multiple semantic associations. It uses virtual documents to capture the independent text information of each semantic association together with its graph structure information in the semantic association graph, and iteratively expands them to obtain multi-step global virtual documents. During expansion, each multi-step global virtual document remains relevant to its target semantic association, which avoids the loss of retrieval precision caused by blind expansion and improves the feasibility and efficiency of semantic association indexing over large-scale semantic data.
Brief description of the drawings
Fig. 1 is a logical flow chart of the method of the present invention.
Fig. 2 is a diagram of the iteration that produces the final multi-step global virtual document of a semantic association.
Detailed description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be more readily understood by those skilled in the art and the scope of protection of the invention can be defined more clearly.
The semantic association indexing method of the present invention comprises the following steps:
1) Parse the input semantic associations according to the RDF model to obtain a triple representation of the relationships between the objects in each semantic association, and use the objects shared between semantic associations to build the semantic association graph, which is an undirected, labeled graph. In the semantic association graph, each node is a semantic association, and each edge indicates that two semantic associations share objects.
2) Assign weights to the semantic association graph. The weight of an edge is obtained by computing the ratio of the number of objects shared by the two semantic associations to the total number of objects and taking the logarithm of this ratio. When the semantic associations in a dataset generally share few objects, the edge weights are further normalized. The edge weight lies between 0 and 1 and indicates the strength of the relevance between the two semantic associations.
3) Build a local virtual document for each semantic association. The mathematical model of a local virtual document is a bag of words, which contains all the keywords in the labels of the objects in the semantic association and of the relations between those objects. A local virtual document may contain repeated keywords and records the frequency of each. The local virtual document effectively describes the independent text information of the semantic association.
4) For each semantic association, add the local virtual documents of its neighbor nodes, that is, the nodes connected to it by a direct edge in the semantic association graph, to its own local virtual document. The resulting virtual document is the one-step global virtual document of the semantic association.
5) Set the iteration strength S and the iteration threshold K. S and K are usually real numbers between 0 and 1, with K less than S. For retrieval scenarios that prioritize recall, K should be much smaller than S; for scenarios that prioritize precision, K should be only slightly smaller than S. Obtain the weight w of the direct edge between the semantic association and its neighbor node and let S = S × w. If S > K, take the global virtual document obtained in step 4) as the local virtual document of the semantic association and return to step 4); otherwise, take the global virtual document obtained in step 4) as the final multi-step global virtual document of the semantic association and proceed to step 6).
The mathematical model of a global virtual document is likewise a bag of words. Global virtual documents effectively reflect the topology of the semantic association graph, and indexing them improves the efficiency of indexing massive numbers of semantic associations.
6) Index the global virtual document of each semantic association in the manner of a text index: build a text index with a B+ tree as its basic structure over the global virtual documents, and provide keyword-based retrieval according to a traditional text retrieval model to obtain the semantic associations that match the keyword query. This method reduces the space required to index massive numbers of semantic associations, shortens indexing time, and simplifies the semantic association retrieval model.
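The iterative expansion in steps 4) and 5) can be sketched as below. The text does not say which weight w to use when an association has several neighbors, so this sketch makes two assumptions: every association keeps its own copy of S, and w is the mean weight of its incident edges. The function name `expand_documents` and the `max_rounds` safety limit are also hypothetical.

```python
from collections import Counter

def expand_documents(local_docs, edge_weights, S=0.9, K=0.1, max_rounds=50):
    """local_docs: {name: list of keywords}; edge_weights: {(a, b): w}.
    Returns the multi-step global virtual document of every association."""
    neighbors = {name: {} for name in local_docs}
    for (a, b), w in edge_weights.items():
        neighbors[a][b] = w
        neighbors[b][a] = w

    docs = {name: Counter(words) for name, words in local_docs.items()}
    strength = {name: S for name in docs}          # per-association S (assumption)
    active = {name for name in docs if neighbors[name]}

    rounds = 0
    while active and rounds < max_rounds:
        rounds += 1
        # Step 4: merge the neighbors' documents from the previous round
        previous = {name: doc.copy() for name, doc in docs.items()}
        for name in sorted(active):
            for other in neighbors[name]:
                docs[name].update(previous[other])
            # Step 5: decay S by the mean incident edge weight (assumption)
            mean_w = sum(neighbors[name].values()) / len(neighbors[name])
            strength[name] *= mean_w
            if strength[name] <= K:
                active.discard(name)               # document is final
    return docs

# Hypothetical sample data
local_docs = {"a": ["alice", "acme"], "b": ["bob", "acme"], "c": ["carol"]}
result = expand_documents(local_docs, {("a", "b"): 0.5}, S=0.9, K=0.3)
```

With S = 0.9, K = 0.3 and w = 0.5, each connected association is expanded twice (0.45 > 0.3, then 0.225 ≤ 0.3), while the isolated association `c` keeps its local document unchanged.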
The semantic association indexing method of the present invention is based on virtual documents. The virtual document of a semantic association describes not only the independent text information in the association but also the graph structure information of the association in the semantic association graph. The virtual document approach effectively improves the efficiency of retrieving massive numbers of semantic associations.
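Because each virtual document is a bag of words, ordinary text indexing applies to it directly. The sketch below uses an in-memory inverted index with a simple TF-IDF style score as a stand-in; it is not the B+ tree based index of step 6), and the scoring formula and the names `build_index` and `search` are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def build_index(documents):
    """documents: {association name: Counter of keywords}.
    Returns an inverted index {keyword: {association name: frequency}}."""
    index = defaultdict(dict)
    for name, doc in documents.items():
        for term, freq in doc.items():
            index[term][name] = freq
    return index

def search(index, n_docs, query_terms):
    """Rank associations by summed frequency * smoothed IDF (assumed scoring)."""
    scores = Counter()
    for term in query_terms:
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))
        for name, freq in postings.items():
            scores[name] += freq * idf
    return [name for name, _ in scores.most_common()]

# Hypothetical sample data
documents = {
    "a": Counter({"alice": 1, "acme": 2}),
    "b": Counter({"bob": 1, "acme": 1}),
    "c": Counter({"carol": 1}),
}
index = build_index(documents)
hits = search(index, len(documents), ["acme", "alice"])
```

A keyword query is answered by a few dictionary lookups rather than by graph traversal, which is the simplification the virtual document approach aims for.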
The present invention proposes a method for assigning weights to the edges of the semantic association graph. This method enhances the ability of virtual documents to describe semantic associations and improves the relevance between the keywords in a virtual document and the corresponding semantic association, thereby improving the precision and recall of retrieval.
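The edge weight is described only in words, without a closed formula. One reading that satisfies the stated properties (based on the ratio of shared objects, logarithmic, and bounded between 0 and 1) is given below. The notation $O(a)$ and $O(b)$ for the object sets of two semantic associations, the use of the union as the "total number of objects", and the base-2 logarithm with the added 1 are all assumptions made for illustration.

```latex
r(a,b) = \frac{|O(a) \cap O(b)|}{|O(a) \cup O(b)|}, \qquad
w(a,b) = \log_2\bigl(1 + r(a,b)\bigr), \qquad 0 < w(a,b) \le 1 .
```

For example, if two associations share 1 object out of 4 distinct objects in total, then $r = 0.25$ and $w = \log_2 1.25 \approx 0.32$. Under this reading, $w = 1$ only when both associations contain exactly the same objects.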
A decay factor is introduced into the iterative construction of virtual documents. This improves the controllability and flexibility of the construction process, keeps the size of the virtual documents that capture the structure of the semantic association graph within a controllable range, and reduces the space needed to index massive numbers of semantic associations.
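The effect of the decay can be checked with a small calculation. The sketch below counts how many times the expansion step runs for a given iteration strength S, threshold K, and a single constant edge weight w; the constant weight, the function name `expansion_steps`, and the `max_steps` guard are simplifying assumptions.

```python
def expansion_steps(S, K, w, max_steps=1000):
    """Number of expansion rounds performed before S * w**n drops to K or below.
    The expansion always runs once before the first check."""
    steps = 1
    S *= w
    while S > K and steps < max_steps:
        steps += 1
        S *= w
    return steps

recall_oriented = expansion_steps(S=0.9, K=0.1, w=0.5)     # K much smaller than S
precision_oriented = expansion_steps(S=0.9, K=0.8, w=0.5)  # K slightly smaller than S
```

With S = 0.9 and w = 0.5, a threshold of K = 0.1 allows four rounds (0.45, 0.225, 0.1125, then 0.05625), while K = 0.8 stops after the first round, so a lower threshold pulls in text from more distant neighbors.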
The foregoing describes only embodiments of the present invention and does not thereby limit the scope of the patent. Any equivalent structure or equivalent process transformation made using the contents of this specification and the accompanying drawings, or any direct or indirect application in other related technical fields, likewise falls within the scope of patent protection of the present invention.
Claims (4)
1. A semantic association indexing method, characterized in that the method comprises the following steps:
1) parsing each semantic association, which is based on the Resource Description Framework (RDF) model, to obtain the objects described in the semantic association and the relationships between them, and then analyzing the objects shared between different semantic associations to obtain an undirected, labeled semantic association graph;
2) assigning weights to the semantic association graph built in said step 1), specifically: computing the ratio of the number of objects shared by two semantic associations to the total number of objects, and taking the logarithm of this ratio to obtain the weight of the corresponding edge of the semantic association graph, wherein the weight of said edge lies between 0 and 1 and indicates the strength of the relevance between the two semantic associations;
3) for each semantic association, building a set of keywords as its local virtual document, wherein each local virtual document contains the keywords in the description of each object in the semantic association and the keywords in the descriptions of the relationships between objects;
4) for each semantic association, adding the local virtual documents of its neighbor nodes, which are connected to it by a direct edge in the semantic association graph, to its own local virtual document, and taking the resulting virtual document as the global virtual document of the semantic association;
5) setting an iteration strength S and an iteration threshold K, S and K being real numbers between 0 and 1; obtaining the weight w of the direct edge between the semantic association and its neighbor node and letting S = S × w; if S > K, taking the global virtual document obtained in step 4) as the local virtual document of the semantic association and returning to step 4); otherwise, taking the global virtual document obtained in step 4) as the final multi-step global virtual document of the semantic association and proceeding to step 6);
6) indexing the multi-step global virtual document of each semantic association in the manner of a text index, and performing keyword retrieval according to a text retrieval model from information retrieval theory to obtain the semantic associations that match the keyword query.
2. The semantic association indexing method according to claim 1, characterized in that in the semantic association graph obtained in said step 1), each node is a semantic association, and each edge represents the shared objects that exist between two semantic associations.
3. The semantic association indexing method according to claim 1, characterized in that the local virtual document built in said step 3) is the local virtual document of a semantic association, and said local virtual document describes the independent text information of the semantic association.
4. The semantic association indexing method according to claim 1, 2 or 3, characterized in that the semantic association's own local virtual document in said step 4) is initially the local virtual document obtained in step 3).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310322357.7A CN103345536B (en) | 2013-07-30 | 2013-07-30 | A semantic association indexing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310322357.7A CN103345536B (en) | 2013-07-30 | 2013-07-30 | A semantic association indexing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103345536A (en) | 2013-10-09 |
CN103345536B (en) | 2016-06-15 |
Family
ID=49280331
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310322357.7A Active CN103345536B (en) | 2013-07-30 | 2013-07-30 | A semantic association indexing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103345536B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105468702B (en) * | 2015-11-18 | 2019-03-22 | 中国科学院计算机网络信息中心 | A kind of extensive RDF data associated path discovery method |
CN106407484B (en) * | 2016-12-09 | 2023-09-01 | 上海交通大学 | Video tag extraction method based on barrage semantic association |
CN109614465A (en) * | 2018-11-13 | 2019-04-12 | 中科创达软件股份有限公司 | Data processing method, device and electronic equipment based on citation relations |
CN111708819B (en) * | 2020-05-28 | 2023-04-07 | 北京百度网讯科技有限公司 | Method, apparatus, electronic device, and storage medium for information processing |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7039634B2 (en) * | 2003-03-12 | 2006-05-02 | Hewlett-Packard Development Company, L.P. | Semantic querying a peer-to-peer network |
CN100442292C (en) * | 2007-03-22 | 2008-12-10 | 华中科技大学 | Method for indexing and acquiring semantic net information |
- 2013-07-30: CN application CN201310322357.7A, patent CN103345536B, status Active
Also Published As
Publication number | Publication date |
---|---|
CN103345536A (en) | 2013-10-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107291807B (en) | SPARQL query optimization method based on graph traversal | |
CN104809190B (en) | A kind of database access method of tree structure data | |
CN102270232B (en) | Semantic data query system with optimized storage | |
Gao et al. | Relational approach for shortest path discovery over large graphs | |
CN102571954B (en) | Complex network clustering method based on key influence of nodes | |
CN103177094B (en) | Cleaning method of data of internet of things | |
CN103345536B (en) | A semantic association indexing method | |
CN102722566B (en) | Method for inquiring potential friends in social network | |
CN103116625A (en) | Massive RDF data distributed query processing method based on Hadoop | |
CN103793467B (en) | Method for optimizing real-time query on big data on basis of hyper-graphs and dynamic programming | |
CN102609490B (en) | Column-storage-oriented B+ tree index method for DWMS (data warehouse management system) | |
CN102163218A (en) | Graph-index-based graph database keyword vicinity searching method | |
CN102156756A (en) | Method for finding optimal path in road network based on graph embedding | |
CN103678550A (en) | Mass data real-time query method based on dynamic index structure | |
Gao et al. | Shortest path computing in relational DBMSs | |
CN103226608B (en) | A kind of parallel file searching method based on directory level telescopic Bloom Filter bitmap table | |
CN104933143A (en) | Method and device for acquiring recommended object | |
CN104572832B (en) | A kind of demand meta-model construction method and device | |
CN102325161B (en) | Query workload estimation-based extensible markup language (XML) fragmentation method | |
CN104765767A (en) | Knowledge storage algorithm for intelligent learning | |
CN104156431A (en) | RDF keyword research method based on stereogram community structure | |
CN103186674A (en) | Web data quick inquiry method based on Extensible Markup Language (XML) | |
CN108595588B (en) | Scientific data storage association method | |
CN102955860B (en) | Keyword query based on mode chart is improved one's methods | |
CN102722546B (en) | Method for querying shortest paths in a graph in a relational database environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20200201. Address after: 100000 2505, 21 / F, building 6, No. 93 yard, Jianguo Road, Chaoyang District, Beijing. Patentee after: Beijing New Silk Road Consulting Group Co., Ltd. Address before: 210061 12F, building A, Spark Road software building, hi tech Zone, Jiangsu, Nanjing. Patentee before: Focus Technology Co., Ltd. |