CN102136975B - Large-scale network environment-oriented similarity network construction method - Google Patents

Large-scale network environment-oriented similarity network construction method

Info

Publication number
CN102136975B
Authority
CN
China
Prior art keywords
network
layer
similar
topic
community
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201110044235.7A
Other languages
Chinese (zh)
Other versions
CN102136975A (en)
Inventor
骆祥峰
倪晶晶
张顺香
张俊
陆磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN201110044235.7A
Publication of CN102136975A
Application granted
Publication of CN102136975B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a similarity network construction method for large-scale network resource environments. Its purpose is to enrich the semantics of links between resources, form a semantic virtual layer based on similarity relations, and remedy the inefficiency and inaccuracy of current network resource management and network services. First, the method follows a divide-and-conquer strategy: resources are roughly divided into several smaller resource blocks, and each block is then processed in finer detail, which reduces the complexity of processing the large-scale resource set directly. Second, a three-layer network model is built, comprising a resource layer, a topic layer, and a community layer, to ensure the connectivity of the constructed network. Third, feedback mechanisms are built at the topic layer and the community layer so that more resources are integrated into the constructed network. Finally, a human-computer interaction mechanism lets users correct the links of the machine-built similarity network so that the links better match human thinking.

Description

A similarity network construction method for large-scale network environments
Technical field
The present invention relates to a similarity network construction method for large-scale network resource environments, and more specifically to a method of establishing similarity links among massive text resources so as to form a similarity network.
Background
A similarity network is a semantic layer that organizes network resources according to similarity relations. It aims to provide semantically rich network services, for example intelligent browsing and intelligent search. Traditional methods of constructing a similarity network, however, cannot be applied in a large-scale network resource environment, for two reasons. The first is high time complexity: if similarity is computed directly between every pair of resources, the running time of the algorithm grows quadratically with the number of resources. The second is the poor quality of the links in the resulting network: a low similarity threshold guarantees the connectivity of the constructed network but lowers its accuracy, whereas a high similarity threshold guarantees accuracy but reduces connectivity. This patent application uses a series of strategies and techniques to address the organization and management problems caused by massive resources.
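The quadratic growth described above can be made concrete with a small calculation. The following is a minimal sketch, not part of the patent: it counts the pairwise similarity computations for n resources and compares that with the count obtained if the resources are first split into equally sized blocks and compared only within each block. The function names and the equal-size split are assumptions made for illustration.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n resources: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def blocked_comparisons(n: int, blocks: int) -> int:
    """Pairs compared when n resources are split into `blocks` near-equal
    blocks and similarity is computed only inside each block
    (hypothetical equal-size split, for illustration only)."""
    size, extra = divmod(n, blocks)
    sizes = [size + 1] * extra + [size] * (blocks - extra)
    return sum(pairwise_comparisons(s) for s in sizes)


if __name__ == "__main__":
    n = 1000
    print(pairwise_comparisons(n))      # all pairs
    print(blocked_comparisons(n, 10))   # pairs inside 10 blocks of 100
```

With 1000 resources, direct comparison needs 499500 pair computations, while ten blocks of 100 need 49500 in total, which illustrates why a coarse partition followed by finer processing reduces the cost.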
Summary of the invention
The object of the invention is to address the management difficulties caused by the sheer volume of information on the network, and the inefficiency and inaccuracy of the network services built on it, by providing a similarity network construction method for large-scale network environments that links together text resources having a similarity relation and thereby forms a semantic virtual layer based on similarity relations.
To achieve the above object, the concept of the present invention is as follows:
First, use a divide-and-conquer strategy: roughly divide the resources into several smaller resource blocks and then process each block in finer detail, reducing the complexity of processing the large-scale resource set directly;
Second, build a three-layer network model comprising a resource layer, a topic layer, and a community layer, to ensure the connectivity of the constructed network;
Third, build feedback mechanisms at the topic layer and the community layer so that more resources are integrated into the constructed network;
Finally, add a human-computer interaction mechanism that lets users correct the links of the machine-built similarity network so that the links better match human thinking.
According to the above inventive concept, the present invention adopts the following technical solution:
(1) Based on the idea of the acquaintance immunization algorithm, adaptively mine the centers of potential similar communities in the network;
(2) According to the adaptively obtained community centers, mine the rough similar communities in the large-scale network resources based on Boolean operations;
(3) Based on matrix reasoning, mine the topic centers within each similar community;
(4) According to the obtained topic centers, form topics with a modified k-means algorithm;
(5) Build the similarity network with a three-layer structure comprising a community layer, a topic layer, and a resource layer;
(6) Use the human-computer interaction mechanism to adjust the links in the established similarity network.
Compared with existing methods of constructing semantic link networks, the present invention has the following notable advantages. It targets large-scale network resources and adopts an overall divide-and-conquer strategy, which reduces the time complexity of direct construction. When similar communities are mined, the selection of community centers is inspired by the immunization algorithm, so the mined community centers are more accurate than centers chosen at random. In addition, similar communities are formed by an algorithm based on Boolean operations, which avoids the large time cost of numerical computation. When different topics are produced within a rough similar community, the center of a topic is not a single text resource but a frequent pattern of keywords obtained by matrix reasoning; such a center is more accurate, has stronger representational power, and is generated adaptively. Topics are then formed with the k-means approach, to which a similarity threshold is added so that text resources of low similarity are not assigned to a topic, which would reduce the accuracy of the constructed network. Finally, the constructed similarity network has a three-layer structure in which different layers manage knowledge at different levels; this matches the hierarchical structure of knowledge and increases the number of network links.
A preferred embodiment of the present invention is described in detail below:
The concrete implementation steps of this similarity network construction method for large-scale network environments are as follows:
(1) Based on the idea of the acquaintance immunization algorithm, adaptively mine the centers of potential similar communities in the network. A text is taken at random from the resource collection, several texts similar to it are found, and the main content of these texts is extracted as the center of a potential similar community.
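Step (1) can be sketched in a few lines. This is only an illustration under stated assumptions: each text is represented as a set of keywords, similarity is measured with the Jaccard coefficient, and the "main content" is taken to be the keywords shared by at least a given fraction of the similar texts. The patent does not specify any of these choices, and the names `jaccard`, `mine_community_center`, `n_similar`, and `min_share` are hypothetical.

```python
import random
from collections import Counter


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two keyword sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mine_community_center(docs, rng, n_similar=3, min_share=0.5):
    """Pick a random text, find the texts most similar to it, and return the
    keywords shared by at least `min_share` of those similar texts."""
    seed = rng.choice(docs)
    # Rank every other text by similarity to the randomly chosen seed.
    ranked = sorted(
        (d for d in docs if d is not seed),
        key=lambda d: jaccard(seed, d),
        reverse=True,
    )
    neighbours = ranked[:n_similar]
    # The center comes from the similar texts, not from the seed itself.
    counts = Counter(word for d in neighbours for word in d)
    needed = min_share * len(neighbours)
    return {word for word, c in counts.items() if c >= needed}


if __name__ == "__main__":
    docs = [
        {"network", "routing", "latency"},
        {"network", "routing", "topology"},
        {"network", "routing", "protocol"},
        {"network", "routing", "bandwidth"},
    ]
    print(mine_community_center(docs, random.Random(), min_share=1.0))
```

Building the center from the neighbours of a random text, rather than from the random text itself, mirrors the observation in the claims that the similar texts are far more likely to describe a community center than the randomly selected text.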
(2) According to the adaptively obtained community centers, mine the rough similar communities in the large-scale network resources based on Boolean operations. Using the similar community centers formed in the previous step, a logical AND is performed between the massive network resources and each similar community center in turn, and the resources that meet the requirement are assigned to the corresponding similar community, forming several rough similar communities.
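The Boolean assignment in step (2) might look like the sketch below. It assumes that texts and community centers are encoded as keyword bitmasks and that a text "meets the requirement" when it shares at least `min_overlap` keywords with a center; the patent leaves that requirement unspecified, so the criterion and all names here are hypothetical.

```python
def to_mask(words, vocab_index):
    """Encode a keyword collection as an integer bitmask over the vocabulary."""
    mask = 0
    for word in words:
        if word in vocab_index:
            mask |= 1 << vocab_index[word]
    return mask


def assign_rough_communities(doc_masks, center_masks, min_overlap):
    """AND every text with every community center; a text joins a community
    when the AND result has at least `min_overlap` bits set (assumed rule)."""
    communities = [[] for _ in center_masks]
    for doc_id, doc_mask in enumerate(doc_masks):
        for community_id, center_mask in enumerate(center_masks):
            shared = bin(doc_mask & center_mask).count("1")
            if shared >= min_overlap:
                communities[community_id].append(doc_id)
    return communities


if __name__ == "__main__":
    vocab = {"net": 0, "graph": 1, "cell": 2, "gene": 3}
    docs = [{"net", "graph"}, {"cell", "gene"}, {"net", "gene", "cell"}]
    centers = [{"net", "graph"}, {"cell", "gene"}]
    doc_masks = [to_mask(d, vocab) for d in docs]
    center_masks = [to_mask(c, vocab) for c in centers]
    print(assign_rough_communities(doc_masks, center_masks, min_overlap=2))
```

A single bitwise AND per text and center replaces a numerical similarity computation, which is the kind of saving the description attributes to Boolean operations. A text may fall into more than one rough community under this rule.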
(3) Based on matrix reasoning, mine the topic centers within each similar community. Each rough similar community is represented as a matrix, and matrix reasoning is used to mine multi-item frequent patterns, from which the different topic centers are formed adaptively. The matrix is defined as follows: each row represents one text resource and each column corresponds to one keyword occurring in the texts; an entry is set to 1 if the text resource contains the keyword and to 0 otherwise. A new matrix operation is also defined: every row of the former matrix is combined with every column of the latter matrix by a logical AND, and the results that meet the requirement are recorded in the result matrix as the basis for the next step of frequent keyword pattern mining. Let the former matrix be frontier (with m rows and n columns), the latter matrix be later (with n rows and p columns), and the result matrix be product.
[Formula images: the operator symbol and the step-by-step definition of the operation]
The purpose of this matrix operation is to mine multi-item frequent itemsets, which form the potential topic centers.
[Formula image: the formula for mining k-item frequent patterns]
In this formula, the matrix D is the matrix representation of a rough similar community, D^T is the transpose of D, and the result matrix records the k-item frequent itemsets. Finally, the mined frequent itemsets serve as the candidate set of potential topic centers.
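Because the formulas for this operation survive only as images, the following is a hedged reconstruction of the idea rather than the patent's exact definition. It assumes that the row-by-column AND counts the texts in which two keywords co-occur, and that a result "meets the requirement" when this count reaches a minimum support. The names `and_product`, `frequent_pairs`, and `min_support` are hypothetical, and only the 2-item case is shown.

```python
def and_product(former, latter, min_support):
    """Combine every row of `former` (m x n) with every column of `latter`
    (n x p) by element-wise AND; record 1 in the m x p result when the AND
    has at least `min_support` ones (assumed acceptance rule)."""
    columns = list(zip(*latter))
    return [
        [
            1 if sum(a & b for a, b in zip(row, col)) >= min_support else 0
            for col in columns
        ]
        for row in former
    ]


def frequent_pairs(D, min_support):
    """Mine frequent keyword pairs from a text-by-keyword 0/1 matrix D by
    applying the operation to the transpose of D and D itself."""
    D_t = [list(col) for col in zip(*D)]  # keyword-by-text matrix
    product = and_product(D_t, D, min_support)
    n = len(product)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if product[i][j]]


if __name__ == "__main__":
    # Rows are texts, columns are keywords 0, 1, 2.
    D = [
        [1, 1, 0],
        [1, 1, 0],
        [1, 0, 1],
        [0, 1, 1],
    ]
    print(frequent_pairs(D, min_support=2))
```

In this toy community, keywords 0 and 1 co-occur in two texts and so form the only frequent pair at a support of 2. Extending the sketch to k items would repeat the AND between the texts covered by each frequent (k-1)-item pattern and each keyword column, which is one plausible reading of the k-item formula.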
(4) According to the obtained topic centers, form topics with a modified k-means algorithm. A similarity threshold is set and the k-means method is used to form topics within each community, which guarantees that no formed topic contains a text resource whose similarity is below the threshold.
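One way to read the modified k-means of step (4) is sketched below. The assumptions are that texts and topic centers are numeric keyword vectors, that similarity is cosine similarity, and that a text whose best similarity falls below the threshold is left unassigned (label `None`). The patent does not fix these details, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def threshold_kmeans(vectors, centers, threshold, rounds=10):
    """k-means variant: a text joins its most similar topic center only if
    that similarity is at least `threshold`; otherwise it stays unassigned."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(rounds):
        new_labels = []
        for v in vectors:
            sims = [cosine(v, c) for c in centers]
            best = max(range(len(centers)), key=sims.__getitem__)
            new_labels.append(best if sims[best] >= threshold else None)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each center as the mean of its assigned texts.
        for k in range(len(centers)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    vectors = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [1, 1]]
    labels, _ = threshold_kmeans(vectors, [[1, 0], [0, 1]], threshold=0.9)
    print(labels)
```

In the example, the last vector is equally related to both topic centers, and its cosine similarity of about 0.71 is below the threshold of 0.9, so it is excluded from both topics instead of being forced into one.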
(5) Build the similarity network with a three-layer structure comprising a community layer, a topic layer, and a resource layer. The community layer links all community centers; the topic layer links the cores of all topics within one community; the resource layer manages the network resources within one topic.
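The three-layer structure of step (5) can be pictured as nested containers. The sketch below is an illustrative data model only: the class names, the use of keyword sets for centers and cores, and the `resources_for` lookup are assumptions and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Topic layer node: a topic core plus the resources it manages."""
    core: frozenset
    resources: list = field(default_factory=list)  # resource layer


@dataclass
class Community:
    """Community layer node: a community center linking its topics."""
    center: frozenset
    topics: list = field(default_factory=list)


@dataclass
class SimilarityNetwork:
    """Top level: links all community centers."""
    communities: list = field(default_factory=list)

    def resources_for(self, keyword: str) -> list:
        """Walk community -> topic -> resource and collect the resources of
        every topic whose core contains `keyword` (hypothetical lookup)."""
        found = []
        for community in self.communities:
            for topic in community.topics:
                if keyword in topic.core:
                    found.extend(topic.resources)
        return found


if __name__ == "__main__":
    network = SimilarityNetwork([
        Community(
            center=frozenset({"network"}),
            topics=[
                Topic(frozenset({"routing"}), ["doc1", "doc2"]),
                Topic(frozenset({"security"}), ["doc3"]),
            ],
        )
    ])
    print(network.resources_for("routing"))
```

Keeping each layer as a separate type reflects the description's point that different layers manage knowledge at different levels, and a lookup only needs to descend through the layers that match.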
(6) Use the human-computer interaction mechanism to adjust the links in the established similarity network.

Claims (3)

1. A similarity network construction method for large-scale network environments, characterized in that the operating steps are as follows: (1) based on the idea of the acquaintance immunization algorithm, adaptively mine the centers of potential similar communities in the network; (2) according to the adaptively obtained community centers, mine the rough similar communities in the large-scale network resources based on Boolean operations; (3) based on matrix reasoning, mine the topic centers within each similar community; (4) according to the obtained topic centers, form topics with a modified k-means algorithm: a similarity threshold is set and the k-means method is used to form topics within each community, guaranteeing that no formed topic contains a text resource below the similarity threshold; (5) build the similarity network with a three-layer structure comprising a community layer, a topic layer, and a resource layer; (6) use the human-computer interaction mechanism to adjust the links in the established similarity network; wherein the topic centers in step (3) are obtained by matrix reasoning, for which a new matrix operation is defined that performs a logical operation between every row of the former matrix and every column of the latter matrix and records the results that meet the requirement in the result matrix as the basis for the next step of frequent keyword pattern mining, and the frequent patterns obtained by this matrix operation are finally used as the candidate set of potential topic centers.
2. The similarity network construction method for large-scale network environments according to claim 1, characterized in that the formation of the similar community centers in step (1) is inspired by the idea of the immunization algorithm: the probability that a randomly selected text resource is the center of a potential similar community is far smaller than the probability that the main content of several texts similar to that randomly selected text is such a center; therefore, a text is taken at random from the resource collection, several texts similar to it are found, and the main content of these texts is extracted as the center of a potential similar community.
3. The similarity network construction method for large-scale network environments according to claim 1, characterized in that the similarity network built in step (5) is designed with a three-layer structure comprising a community layer, a topic layer, and a resource layer, in which the community layer links all community centers, the topic layer links the cores of all topics within one community, and the resource layer manages the network resources within one topic.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110044235.7A CN102136975B (en) 2011-02-24 2011-02-24 Large-scale network environment-oriented similarity network construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110044235.7A CN102136975B (en) 2011-02-24 2011-02-24 Large-scale network environment-oriented similarity network construction method

Publications (2)

Publication Number Publication Date
CN102136975A CN102136975A (en) 2011-07-27
CN102136975B true CN102136975B (en) 2014-04-02

Family

ID=44296635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110044235.7A Expired - Fee Related CN102136975B (en) 2011-02-24 2011-02-24 Large-scale network environment-oriented similarity network construction method

Country Status (1)

Country Link
CN (1) CN102136975B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2618274A1 (en) * 2012-01-18 2013-07-24 Alcatel Lucent Method for providing a set of services of a first subset of a social network to a user of a second subset of said social network
EP2741249A1 (en) * 2012-12-04 2014-06-11 Alcatel Lucent Method and device for optimizing information diffusion between communities linked by interaction similarities

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101369921A (en) * 2008-09-12 2009-02-18 中国科学技术大学 A method for generating self-similar network traffic
CN101571853A (en) * 2009-05-22 2009-11-04 哈尔滨工程大学 Evolution analysis device and method for contents of network topics

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7035838B2 (en) * 2002-12-06 2006-04-25 General Electric Company Methods and systems for organizing information stored within a computer network-based system

Also Published As

Publication number Publication date
CN102136975A (en) 2011-07-27

Similar Documents

Publication Publication Date Title
CN102222092A (en) Massive high-dimension data clustering method for MapReduce platform
CN118861318A (en) A knowledge graph completion method based on heterogeneous relational graph attention
CN102136975B (en) Large-scale network environment-oriented similarity network construction method
Hu [Retracted] Decision‐Making Model of Product Modeling Big Data Design Scheme Based on Neural Network Optimized by Genetic Algorithm
CN106844934B (en) Smart city planning and designing expert system and smart city planning and designing method
CN105678382B (en) A kind of concept lattice merging method and system based on sub- Formal Context attributes similarity
Hong et al. The study of improved FP-growth algorithm in MapReduce
Gao et al. When decoupled GCN meets group discrimination: A special graph contrastive learning framework
CN112883278A (en) Bad public opinion propagation inhibition method based on big data knowledge graph of smart community
Xu Deep mining method for high-dimensional big data based on association rule
Fersini et al. A probabilistic relational approach for web document clustering
Wei et al. Industrial development and spatial structure in Changzhou city, China: The restructuring of the Sunan model
CN109829056A (en) Predicate explains the fact that template-driven Abductive reasoning method
Wu et al. Knowledge map application of business-oriented problem solving
Zhou et al. Identifying technology evolution pathways by integrating citation network and text mining
Zhang et al. Intelligent Analysis and Research on Clinical Data of Traditional Chinese Medicine Diagnosis and Treatment of Coronary Heart Disease Based on Data Mining
CN106156259A (en) A kind of user behavior information displaying method and system
CN104268270A (en) Map Reduce based method for mining triangles in massive social network data
Butka et al. One approach to combination of FCA-based local conceptual models for text analysis—grid-based approach
Kumar et al. Automatic retargeting of web page content
WO2020195545A1 (en) Information management device and information management method
He et al. Bat: mining binary-api topic for multi-service application development
Li et al. Noise Suppression with Label Graph in Distantly Supervised Relation Extraction
Han et al. PatHT: an efficient method of classification over evolving data streams
Zhai Research on the development of computer simulation technology in the context of blockchain

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140402

Termination date: 20170224