CN102136975B - Large-scale network environment-oriented similarity network construction method - Google Patents


Info

Publication number
CN102136975B
Authority
CN
China
Prior art keywords
network
community
layer
topic
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201110044235.7A
Other languages
Chinese (zh)
Other versions
CN102136975A (en)
Inventor
骆祥峰
倪晶晶
张顺香
张俊
陆磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN201110044235.7A
Publication of CN102136975A
Application granted
Publication of CN102136975B
Expired - Fee Related

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a similarity network construction method oriented to large-scale network resource environments. It aims to enrich the semantics of the links among resources, form a semantic virtual layer based on similarity relations, and remedy the low efficiency and inaccuracy of conventional network resource management and network services. The method comprises the following steps: first, using a divide-and-conquer strategy, roughly dividing the resources into a number of smaller resource blocks and then processing each block in finer detail, which reduces the complexity of processing the large-scale resources directly; second, constructing a three-layer network model comprising a resource layer, a topic layer and a community layer to ensure the connectivity of the constructed network; third, constructing feedback mechanisms at the topic layer and the community layer so that more resources are merged into the constructed network; and finally, adding a human-computer interaction mechanism that lets users correct the links of the machine-built similarity network so that the links better match human thinking.

Description

A similarity network construction method oriented to large-scale network environments
Technical field
The present invention relates to a similarity network construction method oriented to large-scale network resource environments and, more specifically, to a method for establishing similarity links among massive text resources so as to form a similarity network.
Background art
A similarity network is a semantic layer that organizes network resources on the basis of similarity relations. It aims to provide semantically rich network services, such as intelligent browsing and intelligent search. Traditional methods of constructing a similarity network, however, are impractical in a large-scale network resource environment, for two reasons. The first is high time complexity: if similarity is computed directly between every pair of resources, the running time of the algorithm grows quadratically with the number of resources. The second is the poor quality of the links in the constructed network: if the similarity threshold is set low, the connectivity of the constructed network is guaranteed but its accuracy is low; if the threshold is set high, accuracy is guaranteed but connectivity is reduced. The present application uses a series of strategies and techniques to address the organization and management problems that massive resources bring.
Summary of the invention
The object of the invention is to address the management difficulties caused by the massive amount of information on the network, and the inefficiency and inaccuracy encountered when providing network services, by providing a similarity network construction method oriented to large-scale network environments. The method links text resources that have similarity relations, forming a semantic virtual layer based on similarity relations.
To achieve the above object, the concept of the invention is as follows:
First, use a divide-and-conquer strategy: roughly divide the resources into several smaller resource blocks, then process each resource block in finer detail, which reduces the complexity of processing the large-scale resources directly;
Second, build a three-layer network model comprising a resource layer, a topic layer and a community layer, which ensures the connectivity of the constructed network;
Third, build feedback mechanisms at the topic layer and the community layer respectively, so that more resources are merged into the constructed network;
Finally, add a human-computer interaction mechanism that lets users revise the links of the machine-built similarity network, so that the network links better match human thinking.
According to the above inventive concept, the present invention adopts the following technical solution:
(1) based on the idea of the acquaintance immunization algorithm, adaptively mine the centers of potential similarity communities in the network;
(2) using the adaptively obtained community centers, mine coarse similarity communities in the large-scale network resources by Boolean operations;
(3) based on matrix reasoning, mine the topic centers within each similarity community;
(4) using the obtained topic centers, form topics with a modified k-means algorithm;
(5) build a similarity network with a three-layer structure comprising a community layer, a topic layer and a resource layer;
(6) use a human-computer interaction mechanism to adjust the links in the constructed similarity network.
Compared with existing semantic link network construction methods, the present invention has the following notable advantages. The invention targets large-scale network resources and adopts an overall divide-and-conquer strategy, which reduces the time complexity compared with direct construction. When mining similarity communities, the selection of community centers is inspired by the immunization algorithm, so the mined centers are more accurate than randomly chosen ones; in addition, a Boolean-operation algorithm is designed to form the similarity communities, avoiding the heavy time cost of numerical computation. When different topics are generated within a coarse similarity community, the center of a topic is not a single text resource but a frequent pattern of keywords produced by matrix reasoning, which is more accurate and more representative and is generated adaptively. Furthermore, topics are formed following the idea of the k-means algorithm, with a similarity threshold added so that text resources of low similarity are not assigned to a topic, which would otherwise reduce the accuracy of the constructed network. Finally, the constructed similarity network has a three-layer structure in which different layers manage knowledge at different levels; this matches the hierarchical structure of knowledge and increases the number of network links.
A preferred embodiment of the present invention is described in detail below.
The specific implementation steps of this similarity network construction method oriented to large-scale network environments are as follows:
(1) Based on the idea of the acquaintance immunization algorithm, adaptively mine the centers of potential similarity communities in the network. Select a text at random from the resource collection, find several texts similar to it, and extract the main content of these texts as the center of a potential similarity community.
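Step (1) can be illustrated with a minimal Python sketch. The source does not say how "similar" or "main content" are measured, so the sketch makes two assumptions: two texts count as similar when they share at least `min_shared` keywords, and the main content is the set of keywords that occur in at least a fraction `min_support` of the similar texts. The function name `mine_community_center` and both parameters are hypothetical.

```python
from collections import Counter


def mine_community_center(texts, seed_index, min_shared=2, min_support=0.5):
    """Derive a potential community center from the neighbours of a seed text.

    texts is a list of keyword sets. Both thresholds are illustrative
    assumptions; the source does not specify them.
    """
    seed = texts[seed_index]
    # "Acquaintances" of the seed: texts sharing enough keywords with it.
    neighbours = [
        t for i, t in enumerate(texts)
        if i != seed_index and len(t & seed) >= min_shared
    ]
    if not neighbours:
        return frozenset()
    # Main content: keywords present in a large enough share of the neighbours.
    counts = Counter(k for t in neighbours for k in t)
    return frozenset(
        k for k, c in counts.items() if c / len(neighbours) >= min_support
    )


texts = [
    {"graph", "community", "cluster"},                # seed text
    {"graph", "community", "detection"},
    {"graph", "community", "cluster", "modularity"},
    {"community", "cluster", "graph", "network"},
    {"graph", "recipe"},                              # shares only one keyword
    {"protein", "gene"},
]
center = mine_community_center(texts, seed_index=0)
```

In practice the seed would be drawn at random, for example with `random.randrange(len(texts))`; a fixed index is used here only to keep the example reproducible.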
(2) Using the adaptively obtained community centers, mine coarse similarity communities in the large-scale network resources by Boolean operations. Each of the massive network resources is combined with the similarity community centers formed in the previous step, one center at a time, by a logical AND operation; resources that satisfy the requirement are assigned to the corresponding similarity community, forming several coarse similarity communities.
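The Boolean assignment in step (2) can be sketched as follows. The source does not state which resources count as satisfying the requirement, so this sketch assumes a resource joins a community when the AND of its keyword bitmask with the center's bitmask has at least `min_hits` bits set. The names `to_mask`, `coarse_communities` and `min_hits` are hypothetical.

```python
def to_mask(keywords, vocab):
    """Encode a keyword set as an integer bitmask using the vocab index."""
    mask = 0
    for k in keywords:
        if k in vocab:
            mask |= 1 << vocab[k]
    return mask


def coarse_communities(resources, centers, vocab, min_hits=2):
    """Assign each resource to every center it overlaps with sufficiently.

    The membership rule (at least min_hits shared keywords) is an assumption.
    """
    center_masks = [to_mask(c, vocab) for c in centers]
    communities = [[] for _ in centers]
    for idx, res in enumerate(resources):
        r = to_mask(res, vocab)
        for c_idx, c in enumerate(center_masks):
            # A single logical AND replaces a numeric similarity computation.
            if bin(r & c).count("1") >= min_hits:
                communities[c_idx].append(idx)
    return communities


vocab = {"graph": 0, "community": 1, "cluster": 2,
         "gene": 3, "protein": 4, "cell": 5}
centers = [{"graph", "community", "cluster"}, {"gene", "protein", "cell"}]
resources = [
    {"graph", "community"},
    {"gene", "protein", "graph"},
    {"cell", "cluster"},
    {"graph", "cluster", "community", "gene"},
]
result = coarse_communities(resources, centers, vocab)
```

Each resource is compared only against the community centers rather than against every other resource, which is where the divide-and-conquer saving comes from.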
(3) Based on matrix reasoning, mine the topic centers within each similarity community. Each coarse similarity community that has been mined is represented as a matrix, and matrix reasoning is used to mine multi-item frequent patterns, adaptively forming different topic centers. The matrix is defined as follows: each row represents one text resource and each column corresponds to a keyword that appears in the texts; if a text resource contains a keyword, the corresponding position is set to 1, and otherwise it is set to 0. A new matrix operation is also defined (its operator symbol appears in the original only as an image): every row of the first matrix is combined with every column of the second matrix by a logical AND operation, and the results that satisfy the requirement are recorded in a result matrix, which serves as the basis for the next step of mining frequent keyword patterns. Let the first matrix be frontier, the second matrix be later, and the result matrix be product, where m and n are the numbers of rows and columns of frontier, and n and p are the numbers of rows and columns of later.
[Formula images defining the operation step by step are not reproduced here.]
The purpose of this matrix operation is to mine multi-item frequent itemsets, which form the potential topic centers. The formula for mining k-item frequent patterns is defined as:
[Formula image not reproduced here.]
where matrix D is the matrix representation of a coarse similarity community, D^T is the transpose of D, and the result matrix of the formula records the k-item frequent sets. Finally, the mined frequent itemsets are used as the candidate set of potential topic centers.
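Because the exact formulas survive only as images, the following Python sketch is an interpretation rather than the patented definition. It assumes that product[i][j] counts the positions at which row i of frontier and column j of later are both 1, and that a keyword pattern is frequent when its count reaches `min_support`. The names frontier, later, product and D come from the text; `bool_product`, `transpose`, `frequent_patterns` and `min_support` are hypothetical.

```python
def bool_product(frontier, later):
    """product[i][j] = number of k with frontier[i][k] AND later[k][j] set."""
    m, n, p = len(frontier), len(later), len(later[0])
    return [
        [sum(frontier[i][k] & later[k][j] for k in range(n)) for j in range(p)]
        for i in range(m)
    ]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def frequent_patterns(D, k, min_support):
    """Return {tuple of keyword indices: support} for all k-item patterns.

    D has one row per text and one column per keyword (entries 0 or 1).
    """
    columns = transpose(D)
    # Each pattern keeps an indicator row: 1 where a text has all its keywords.
    patterns = {
        (j,): col for j, col in enumerate(columns) if sum(col) >= min_support
    }
    for _ in range(k - 1):
        keys = list(patterns)
        frontier = [patterns[key] for key in keys]
        # Row r, column j: support of pattern keys[r] extended with keyword j.
        support = bool_product(frontier, D)
        extended = {}
        for r, key in enumerate(keys):
            for j in range(len(columns)):
                if j > key[-1] and support[r][j] >= min_support:
                    extended[key + (j,)] = [
                        a & b for a, b in zip(frontier[r], columns[j])
                    ]
        patterns = extended
    return {key: sum(row) for key, row in patterns.items()}


# Columns: graph, community, cluster, gene
D = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
pairs = frequent_patterns(D, k=2, min_support=2)
triples = frequent_patterns(D, k=3, min_support=2)
```

Under this reading, `bool_product(transpose(D), D)` yields the keyword co-occurrence counts, which matches the role of D^T and D described above, and repeated application extends (k-1)-item patterns to k-item patterns.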
(4) Using the obtained topic centers, form topics with a modified k-means algorithm. A similarity threshold is set, and the k-means method is used to form topics within each community, ensuring that no formed topic contains a text resource whose similarity is below the threshold.
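A minimal sketch of the thresholded k-means idea in step (4) follows. The similarity measure and the center update rule are not given in the source, so the sketch assumes Jaccard similarity over keyword sets and updates each center to the keywords held by a strict majority of its members. The names `jaccard`, `form_topics`, `threshold` and `max_iter` are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def form_topics(texts, centers, threshold=0.3, max_iter=10):
    """Assign texts to topic centers, leaving low-similarity texts unassigned."""
    centers = [frozenset(c) for c in centers]
    assignment = []
    for _ in range(max_iter):
        assignment = []
        for text in texts:
            sims = [jaccard(text, c) for c in centers]
            best = max(range(len(centers)), key=sims.__getitem__)
            # The modification: a text below the threshold joins no topic.
            assignment.append(best if sims[best] >= threshold else None)
        new_centers = []
        for c_idx, old in enumerate(centers):
            members = [texts[i] for i, a in enumerate(assignment) if a == c_idx]
            if not members:
                new_centers.append(old)
                continue
            counts = Counter(k for m in members for k in m)
            # Assumed update rule: keep keywords held by a strict majority.
            new_centers.append(
                frozenset(k for k, n in counts.items() if 2 * n > len(members))
            )
        if new_centers == centers:
            break
        centers = new_centers
    return assignment, centers


texts = [
    {"graph", "community", "cluster"},
    {"graph", "community"},
    {"gene", "protein"},
    {"gene", "protein", "cell"},
    {"recipe", "soup"},
]
assignment, final_centers = form_topics(
    texts, [{"graph", "community"}, {"gene", "protein"}]
)
```

The last text shares no keywords with either center, so it stays unassigned instead of being forced into the nearest topic as plain k-means would do.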
(5) Build a similarity network with a three-layer structure comprising a community layer, a topic layer and a resource layer. The community layer links all community centers; the topic layer links the cores of all topics within a community; the resource layer manages the network resources within a topic.
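One possible in-memory representation of the three-layer structure is sketched below. The class names, field names and the choice of pairwise links within the community and topic layers are assumptions made for illustration; the source only states what each layer links or manages.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Topic:
    core: frozenset                                  # keyword pattern of the topic
    resources: list = field(default_factory=list)    # resource layer content


@dataclass
class Community:
    center: frozenset
    topics: list = field(default_factory=list)


@dataclass
class SimilarityNetwork:
    communities: list = field(default_factory=list)


def layer_links(network):
    """Return the links of each layer, assuming pairwise linking per layer."""
    # Community layer: links between all community centers.
    community_layer = list(combinations(range(len(network.communities)), 2))
    # Topic layer: links between topic cores inside each community.
    topic_layer = {
        c_idx: list(combinations(range(len(c.topics)), 2))
        for c_idx, c in enumerate(network.communities)
    }
    # Resource layer: the resources managed by each topic.
    resource_layer = {
        (c_idx, t_idx): list(t.resources)
        for c_idx, c in enumerate(network.communities)
        for t_idx, t in enumerate(c.topics)
    }
    return community_layer, topic_layer, resource_layer


network = SimilarityNetwork([
    Community(frozenset({"graph"}), [
        Topic(frozenset({"graph", "community"}), ["doc1", "doc2"]),
        Topic(frozenset({"graph", "cluster"}), ["doc3"]),
    ]),
    Community(frozenset({"gene"}), [
        Topic(frozenset({"gene", "protein"}), ["doc4"]),
    ]),
])
community_layer, topic_layer, resource_layer = layer_links(network)
```

Keeping the layers separate means that a user correction in step (6) can add or remove a link in one layer without touching the others.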
(6) Use a human-computer interaction mechanism to adjust the links in the constructed similarity network.

Claims (3)

1. A similarity network construction method oriented to large-scale network environments, characterized by the following operating steps:
(1) based on the idea of the acquaintance immunization algorithm, adaptively mining the centers of potential similarity communities in the network;
(2) using the adaptively obtained community centers, mining coarse similarity communities in the large-scale network resources by Boolean operations;
(3) based on matrix reasoning, mining the topic centers within each similarity community;
(4) using the obtained topic centers, forming topics with a modified k-means algorithm; a similarity threshold is set, and the k-means method is used to form topics within each community, ensuring that no formed topic contains a text resource whose similarity is below the threshold;
(5) building a similarity network with a three-layer structure comprising a community layer, a topic layer and a resource layer;
(6) using a human-computer interaction mechanism to adjust the links in the constructed similarity network;
wherein the topic centers in step (3) are obtained by matrix reasoning: a new matrix operation is defined in which every row of the first matrix is combined with every column of the second matrix by a logical operation, the results that satisfy the requirement are recorded in a result matrix as the basis for the next step of mining frequent keyword patterns, and finally the frequent patterns obtained by the matrix operation are used as the candidate set of potential topic centers.
2. The similarity network construction method oriented to large-scale network environments according to claim 1, characterized in that the similarity community centers in step (1) are obtained under the inspiration of the immunization algorithm: the probability that a randomly selected text resource becomes the center of a potential similarity community is much smaller than the probability that the main content of several texts similar to that randomly selected text becomes such a center; therefore, a text is selected at random from the resource collection, several texts similar to it are found, and the main content of these texts is extracted as the center of a potential similarity community.
3. The similarity network construction method oriented to large-scale network environments according to claim 1, characterized in that the similarity network built in step (5) has a three-layer structure comprising a community layer, a topic layer and a resource layer, wherein the community layer links all community centers, the topic layer links the cores of all topics within a community, and the resource layer manages the network resources within a topic.
CN201110044235.7A 2011-02-24 2011-02-24 Large-scale network environment-oriented similarity network construction method Expired - Fee Related CN102136975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110044235.7A CN102136975B (en) 2011-02-24 2011-02-24 Large-scale network environment-oriented similarity network construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110044235.7A CN102136975B (en) 2011-02-24 2011-02-24 Large-scale network environment-oriented similarity network construction method

Publications (2)

Publication Number Publication Date
CN102136975A (en) 2011-07-27
CN102136975B (en) 2014-04-02

Family

ID=44296635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110044235.7A Expired - Fee Related CN102136975B (en) 2011-02-24 2011-02-24 Large-scale network environment-oriented similarity network construction method

Country Status (1)

Country Link
CN (1) CN102136975B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2618274A1 (en) * 2012-01-18 2013-07-24 Alcatel Lucent Method for providing a set of services of a first subset of a social network to a user of a second subset of said social network
EP2741249A1 (en) * 2012-12-04 2014-06-11 Alcatel Lucent Method and device for optimizing information diffusion between communities linked by interaction similarities

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101369921A (en) * 2008-09-12 2009-02-18 中国科学技术大学 Self-similar network service generation method
CN101571853A (en) * 2009-05-22 2009-11-04 哈尔滨工程大学 Evolution analysis device and method for contents of network topics

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7035838B2 (en) * 2002-12-06 2006-04-25 General Electric Company Methods and systems for organizing information stored within a computer network-based system

Also Published As

Publication number Publication date
CN102136975A (en) 2011-07-27

Similar Documents

Publication Publication Date Title
CN104281617A (en) Domain knowledge-based multilayer association rules mining method and system
CN102768670B (en) Webpage clustering method based on node property label propagation
CN102609528B (en) Frequent mode association sorting method based on probabilistic graphical model
CN102073700A (en) Discovery method of complex network community
CN107819756B (en) Method for improving mining income
CN104361036A (en) Association rule mining method for alarm event
CN106484754A (en) Based on hierarchical data and the knowledge forest layout method of diagram data visualization technique
CN103150163A (en) Map/Reduce mode-based parallel relating method
CN105404637A (en) Data mining method and device
Fadaei et al. Enhanced K-means re-clustering over dynamic networks
Zhai et al. A two-layer algorithm based on PSO for solving unit commitment problem
CN102136975B (en) Large-scale network environment-oriented similarity network construction method
CN106844934B (en) Smart city planning and designing expert system and smart city planning and designing method
Le et al. A novel algorithm for mining high utility itemsets
CN103577899B (en) A kind of service combining method combined with QoS based on creditability forceast
CN104834709A (en) Parallel cosine mode mining method based on load balancing
Seol et al. Reduction of association rules for big data sets in socially-aware computing
CN105678382B (en) A kind of concept lattice merging method and system based on sub- Formal Context attributes similarity
Hong et al. The study of improved FP-growth algorithm in MapReduce
Hou et al. Simulating the dynamics of urban land quantity in China from 2020 to 2070 under the Shared Socioeconomic Pathways
Zhou et al. Identifying technology evolution pathways by integrating citation network and text mining
CN109829056A (en) Predicate explains the fact that template-driven Abductive reasoning method
CN104268270A (en) Map Reduce based method for mining triangles in massive social network data
CN106156259A (en) A kind of user behavior information displaying method and system
CN104036024A (en) Spatial clustering method based on GACUC (greedy agglomerate category utility clustering) and Delaunay triangulation network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140402

Termination date: 20170224
