Summary of the invention
The object of the invention is to address the management difficulty caused by the massive volume of information on the network, and the inefficiency and inaccuracy encountered when providing network services, by providing a method for building a similar network oriented to large-scale network environments. The method links together text resources that have a similarity relation, forming a semantic virtual layer of similarity relations.
To achieve the above object, the concept of the invention is as follows:
First, a divide-and-conquer strategy is used: the resources are first coarsely divided into several smaller resource blocks, and each block is then processed in finer detail, which reduces the complexity of operating directly on large-scale resources;
Second, a three-layer network model is built, comprising a resource layer, a topic layer and a community layer, to guarantee the connectivity of the constructed network;
Third, feedback mechanisms are built at the topic layer and at the community layer, so that more resources are incorporated into the constructed network;
Finally, a human-computer interaction mechanism is added, allowing users to revise the links of the machine-built similar network so that the links better match human thinking.
According to the above inventive concept, the invention adopts the following technical solution:
(1) drawing on the idea of the immune algorithm, adaptively mine the centers of potential similar communities in the network;
(2) based on the adaptively obtained community centers, mine the coarse similar communities in the large-scale network resources using Boolean operations;
(3) based on matrix reasoning, mine the topic centers within each similar community;
(4) according to the obtained topic centers, form topics using a modified k-means algorithm;
(5) build a similar network with a three-layer architecture, comprising a community layer, a topic layer and a resource layer;
(6) use a human-computer interaction mechanism to adjust the links in the similar network that has been built.
Compared with existing methods for building semantic link networks, the invention has the following significant advantages. The invention targets large-scale network resources and adopts an overall divide-and-conquer strategy, which reduces the time complexity of direct construction. When mining similar communities, the selection of community centers is inspired by the immune algorithm, so the mined community centers are more accurate than randomly chosen ones; in addition, an algorithm based on Boolean operations is designed to form the similar communities, which avoids the large time cost of numerical computation. When different topics are produced within a coarse similar community, the center of a topic is not a particular text resource but a frequent pattern of keywords produced by matrix reasoning, which is more accurate, more representative, and generated adaptively. Furthermore, topics are formed following the idea of the k-means algorithm, with a similarity threshold added so that text resources of low similarity are not assigned to a topic, which would reduce the accuracy of the network to be built. Finally, what is built is a similar network with a three-layer architecture: different layers manage knowledge at different levels, which matches the hierarchical structure of knowledge and increases the number of network links.
A preferred embodiment of the invention is described in detail below:
The specific steps of this method for building a similar network oriented to large-scale network environments are as follows:
(1) Drawing on the idea of the immune algorithm, adaptively mine the centers of potential similar communities in the network. A text is picked at random from the resource collection, a number of texts similar to it are found, and the main content of these texts is extracted as the center of a potential similar community.
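Step (1) can be illustrated with a minimal sketch. The source does not specify the text representation, the similarity measure, or how the "main content" is extracted, so everything concrete here is an assumption made for illustration: texts are keyword sets, similarity is the Jaccard coefficient, and the center is the set of keywords shared by at least a given fraction of the similar texts. The names community_center, random_community_center, sim_threshold and min_share are hypothetical.

```python
import random
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def community_center(docs, seed_index, sim_threshold=0.3, min_share=0.5):
    """Keywords shared by the texts that are similar to docs[seed_index].

    A keyword joins the center when it occurs in at least `min_share`
    of the similar texts (an assumed reading of "main content").
    """
    seed = docs[seed_index]
    similar = [d for d in docs if jaccard(seed, d) >= sim_threshold]
    counts = Counter(kw for d in similar for kw in d)
    return {kw for kw, c in counts.items() if c >= min_share * len(similar)}


def random_community_center(docs, rng=random):
    """Pick the seed text at random, as step (1) describes."""
    return community_center(docs, rng.randrange(len(docs)))
```

Repeating the random pick over texts not yet covered by an existing center would yield several candidate centers; how the method decides when to stop is not stated in the source.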
(2) Based on the adaptively obtained community centers, mine the coarse similar communities in the large-scale network resources using Boolean operations. Using the similar community centers formed in the previous step, a logical AND is performed between each of the massive network resources and each community center in turn, and every resource that meets the requirement is assigned to the corresponding similar community, forming several coarse similar communities.
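The Boolean assignment of step (2) might look like the sketch below. It assumes that resources and centers are encoded as bit masks over a shared vocabulary, and that "meeting the requirement" means the AND of the two masks has at least min_hits bits set; the source leaves this criterion unspecified. The function names and the min_hits parameter are hypothetical.

```python
def to_mask(keywords, vocab_index):
    """Encode a keyword set as an integer bit mask over the vocabulary."""
    mask = 0
    for kw in keywords:
        if kw in vocab_index:  # keywords outside the vocabulary are ignored
            mask |= 1 << vocab_index[kw]
    return mask


def assign_communities(resources, centers, vocab, min_hits=2):
    """AND every resource with every center and group the matches.

    Returns (communities, unassigned): communities[c] lists the indices
    of the resources placed in community c; a resource may match more
    than one center (an assumption of this sketch).
    """
    vocab_index = {kw: i for i, kw in enumerate(vocab)}
    center_masks = [to_mask(c, vocab_index) for c in centers]
    communities = [[] for _ in centers]
    unassigned = []
    for rid, keywords in enumerate(resources):
        rmask = to_mask(keywords, vocab_index)
        placed = False
        for cid, cmask in enumerate(center_masks):
            # Count the keywords shared by resource and center.
            if bin(rmask & cmask).count("1") >= min_hits:
                communities[cid].append(rid)
                placed = True
        if not placed:
            unassigned.append(rid)
    return communities, unassigned
```

Because each comparison is a single integer AND followed by a bit count, no floating-point similarity has to be computed at this coarse stage, which is the saving the text attributes to Boolean operations.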
(3) Based on matrix reasoning, mine the topic centers within each similar community. Each mined coarse similar community is represented as a matrix, and matrix reasoning is used to mine multi-item frequent patterns, adaptively forming different topic centers. The matrix is defined as follows: each row of the matrix represents one text resource, and each column corresponds to one keyword of the texts; if a text resource contains a given keyword, the corresponding position is set to 1, otherwise it is set to 0.
A new matrix operation is also defined: every row of the former matrix is combined by logical AND with every column of the latter matrix, and the results that meet the requirement are recorded in the result matrix as the basis for the next step of mining keyword frequent patterns. In this operation the former matrix is denoted frontier, the latter matrix is denoted later, and the result matrix is denoted product; m and n are the numbers of rows and columns of frontier, and n and p are the numbers of rows and columns of later.
This matrix operation is used to mine multi-item frequent itemsets, which form the potential topic centers. In the formula defined for mining k-item frequent patterns, D is the matrix representation of a coarse similar community, D^T is the transpose of D, and the result matrix records the k-item frequent sets. Finally, the mined frequent itemsets serve as the candidate set of potential topic centers.
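As a hedged illustration of the matrix operation, the sketch below handles only the 2-item case. It assumes that the "requirement" is a minimum support count (min_support, at least 1): the entry for row i of frontier and column j of later is the number of positions where both are 1, kept only if it reaches the threshold. With frontier = D^T and later = D, entry (i, j) then counts the texts that contain both keyword i and keyword j. The names and_product, frequent_pairs and min_support are hypothetical, and the source's own formula for k items is not reproduced here.

```python
def transpose(matrix):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def and_product(frontier, later, min_support):
    """AND each row of `frontier` with each column of `later`.

    product[i][j] is the number of positions where both are 1, or 0
    when that count is below `min_support` (assumed requirement).
    """
    later_cols = transpose(later)
    product = []
    for row in frontier:
        out = []
        for col in later_cols:
            support = sum(a & b for a, b in zip(row, col))
            out.append(support if support >= min_support else 0)
        product.append(out)
    return product


def frequent_pairs(D, keywords, min_support):
    """2-item frequent keyword patterns of a text-by-keyword 0/1 matrix D."""
    product = and_product(transpose(D), D, min_support)
    return {
        (keywords[i], keywords[j]): product[i][j]
        for i in range(len(keywords))
        for j in range(i + 1, len(keywords))
        if product[i][j] > 0
    }
```

For example, with three texts over the keywords immune, algorithm, antibody and clone, a support threshold of 2 keeps only the pairs that co-occur in at least two texts; these pairs would then be candidates for topic centers.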
(4) According to the obtained topic centers, form topics using a modified k-means algorithm. A similarity threshold is set, and the k-means method is used to produce topics within each community, ensuring that no formed topic contains a text resource whose similarity is below the threshold.
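One possible reading of the modified k-means step is sketched below. The assumptions are: texts and topic centers are keyword sets, similarity is the Jaccard coefficient, and a center is updated to the keywords that occur in more than half of its members. None of these choices is given in the source; only the similarity threshold is. The names form_topics, threshold and max_iter are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign(texts, centers, threshold):
    """Put each text in its most similar topic, or leave it out if too dissimilar."""
    topics = [[] for _ in centers]
    outliers = []
    for tid, keywords in enumerate(texts):
        sims = [jaccard(keywords, c) for c in centers]
        best = max(range(len(centers)), key=sims.__getitem__)
        if sims[best] >= threshold:
            topics[best].append(tid)
        else:
            outliers.append(tid)  # below the threshold: not in any topic
    return topics, outliers


def form_topics(texts, centers, threshold=0.3, max_iter=10):
    """k-means-style loop with a similarity threshold on membership."""
    centers = [set(c) for c in centers]
    for _ in range(max_iter):
        topics, _ = assign(texts, centers, threshold)
        new_centers = []
        for members, old in zip(topics, centers):
            if not members:
                new_centers.append(old)  # keep the center of an empty topic
                continue
            counts = Counter(kw for tid in members for kw in texts[tid])
            new_centers.append({kw for kw, c in counts.items() if 2 * c > len(members)})
        if new_centers == centers:
            break
        centers = new_centers
    topics, outliers = assign(texts, centers, threshold)
    return topics, outliers, centers
```

The difference from plain k-means is the outliers list: a text that is not similar enough to any center is kept out of every topic instead of being forced into the nearest one.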
(5) Build a similar network with a three-layer architecture, comprising a community layer, a topic layer and a resource layer. The community layer links all community centers; the topic layer links the cores of all topics within one community; the resource layer manages the network resources within one topic.
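The three-layer structure can be pictured with the following data-structure sketch. It assumes that "links all" means pairwise links, that centers and cores are keyword sets, and that resources are referred to by identifiers; the class and method names (Topic, Community, SimilarNetwork, locate) are hypothetical and not taken from the source.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Topic:
    """Resource layer: one topic manages the resources it contains."""
    core: frozenset
    resources: list = field(default_factory=list)


@dataclass
class Community:
    """Topic layer: one community links the cores of its topics."""
    center: frozenset
    topics: list = field(default_factory=list)

    def topic_links(self):
        # Assumed pairwise links between topic cores, as index pairs.
        return list(combinations(range(len(self.topics)), 2))


@dataclass
class SimilarNetwork:
    """Community layer: links all community centers."""
    communities: list = field(default_factory=list)

    def community_links(self):
        # Assumed pairwise links between community centers, as index pairs.
        return list(combinations(range(len(self.communities)), 2))

    def locate(self, resource_id):
        """Return (community index, topic index) of a resource, or None."""
        for ci, community in enumerate(self.communities):
            for ti, topic in enumerate(community.topics):
                if resource_id in topic.resources:
                    return ci, ti
        return None
```

Navigating from a community to a topic to a resource narrows the search at each layer, which reflects the hierarchical management of knowledge described in the text.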
(6) Use a human-computer interaction mechanism to adjust the links in the similar network that has been built.