CN101685470B - Query statistic-based guidance searching method for P2P system - Google Patents


Info

Publication number
CN101685470B
Authority
CN
China
Prior art keywords
statistics
query
guiding table
inquiry
sgt
Legal status
Expired - Fee Related
Application number
CN2009103022853A
Other languages
Chinese (zh)
Other versions
CN101685470A (en)
Inventor
Chen Guihai (陈贵海)
Yu Nannan (于南南)
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
2009-05-14
Filing date
2009-05-14
Publication date
2011-05-25
2009-05-14 Application filed by Nanjing University
2009-05-14 Priority to CN2009103022853A
2010-03-31 Publication of CN101685470A
2011-05-25 Application granted
2011-05-25 Publication of CN101685470B
2012-05-14 Expired - Fee Related (current legal status)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a query-statistics-based guided search method for a P2P system. The method comprises the following steps: (1) building a statistics guiding table (SGT): a two-layer table is built from historical query statistics and semantic analysis results, in which the upper-layer entries keep only the most recent queries, ordered by query time, and the lower-layer entries store information about, and advantage factors of, the nodes that responded to the current query or to similar queries; (2) querying files using the SGT: once a node has built its SGT, it queries using the information in the SGT according to the semantic relevance between the current query and the historical queries in the SGT, and it automatically falls back to the underlying query strategy if the stored history has little relevance to the current query; and (3) adaptively updating the SGT by two methods, active updating and passive updating, where passive updating takes place after a node forwards a query request.

Description

A query-statistics-based guided search method for P2P systems
I. Technical Field
The present invention relates to a query enhancement method applicable to P2P networks in general. The method is independent of the underlying topology: using historical query statistics and semantic analysis results, it directs each query request to the nodes most likely to provide the corresponding information. This effectively reduces the network overhead of queries, raises the query success rate, and reduces communication delay, while adding almost no extra maintenance cost.
II. Background
Many classic query methods already exist: server-based queries in hybrid P2P networks; flooding, random walk, expanding ring, and super-peer approaches in unstructured P2P networks; and numerical-proximity, location-proximity, and digit-by-digit (prefix) matching approaches based on DHTs in structured P2P models. In practice, however, each of these models is limited by its own shortcomings. Unstructured P2P networks have become the mainstream in deployed P2P systems thanks to their simple topology and strong fault tolerance, but some inherent weaknesses limit their scalability and operation; the biggest is the tension between query success rate and query overhead. Traditional flooding generates excessive network overhead, while low-overhead approaches such as random walk and expanding ring cannot guarantee the query success rate.
Based on these observations, the present invention proposes a query enhancement method applicable to P2P networks in general. Drawing on historical query statistics and semantic analysis results, it guides each query to more efficient nodes, so that the underlying network's query method is used as rarely as possible. This effectively improves the query success rate, reduces query delay, lightens network load, and further improves the fault tolerance and robustness of the network.
III. Summary of the Invention
The object of the present invention is to provide a query enhancement method applicable to P2P networks in general. The method is independent of the underlying topology and uses historical query statistics and semantic analysis results to direct each query request to the nodes most likely to provide the corresponding information. To achieve this object, the technical solution of the invention is as follows:
(1) Build a statistics guiding table (Statistics Guided Table, SGT). Based on historical query statistics and semantic analysis results, a two-layer statistics guiding table is built. The upper-layer entries keep only the records of the most recent queries, ordered by query time; the lower-layer entries store information about the nodes that responded to this query or to similar queries (IP, port, etc.) together with their advantage factors (performance metrics such as available bandwidth, delay, and number of responses).
(2) Query files using the statistics guiding table. Once a node has built its SGT, it first queries using the information in the SGT, according to the semantic relevance between the current query and the historical queries in the SGT. If the records in the SGT have little relevance to the current query, or the SGT cannot yield the required result, the underlying query strategy is enabled automatically. The very first query request can only use the query strategy of the underlying structure; the SGT is created after response messages are received.
(3) Adaptively update the statistics guiding table, using two methods: active updating and passive updating. Passive updates occur after a node forwards a query request and are piggybacked on it, so they add no extra overhead. Active updates occur after a successful query: the source node (the query initiator) first updates its own SGT with the reply information it received, then proactively sends the node information in its SGT related to this query to the destination node, which uses this information to update its own SGT.
The SGT update scheme implicitly clusters nodes by interest. If the current SGT has little relevance to a query, the underlying query method is used directly; suppose the resource is then obtained from some node. The information of the earliest query in the SGT must now be replaced, and if the similarity computation finds two historical queries to be very similar, their node information is merged. On an SGT miss, the underlying query method is used; suppose nodes R and S respond to the request. As before, a new query entry is created, and the earliest entry is merged with its most similar query.
Compared with the prior art, the beneficial effects of the invention are as follows. The invention proposes a query enhancement method applicable to P2P networks in general; based on historical query statistics and semantic analysis results, it guides each query to more efficient nodes and avoids using the underlying network's query method as far as possible. The method is independent of the underlying topology and can be used in both unstructured and structured P2P networks. While keeping the cost of maintaining the SGT very small, it effectively improves overall network performance (for example, raising the query success rate, lightening network load, and reducing query delay), and it also improves the robustness and fault tolerance of the system.
IV. Description of Drawings
Fig. 1 shows the structure of the statistics guiding table of the present invention.
Fig. 2 is a schematic diagram of active SGT updates in the present invention.
Fig. 3 compares performance when the SGT is used for multiple iterative queries: (a) success rate, (b) average delay, (c) average number of messages.
Fig. 4 shows the overhead of maintaining the SGT.
Fig. 5 shows how the system's query success rate changes under dynamic node changes.
V. Detailed Embodiment
The invention can be divided into three stages: building the statistics guiding table, the algorithm for querying with the statistics guiding table, and the algorithm for adaptively updating the statistics guiding table.
Partial simulation results are given at the end.
Stage 1: Building the statistics guiding table
For convenience in the description and algorithms below, the meanings of some symbols are first given in Table 1.
Table 1. Symbol table
This method rests on two reasonable assumptions: (1) when users share resources, they are very likely to share related resources; (2) within a relatively concentrated period of time, a user's resource requests show a certain preference. The first assumption resembles the interest-locality phenomenon described in [18]. The second comes from analyzing user behavior: when requesting network resources within a relatively concentrated period, users have a fairly clear purpose. Based on this, we introduce a statistics guiding table (Statistics Guided Table, SGT) built on historical query statistics and semantic analysis, as shown in Fig. 1. The SGT has two layers. The upper-layer entries keep only the records of the K most recent queries (K = 3 in the figure), ordered by query time. Each upper-layer entry corresponds to a group of M lower-layer records (M = 2 in the figure), which store information about the nodes that responded to this query or to similar queries (IP, port, etc.) together with their advantage factors (available bandwidth, delay, number of responses, etc.). The upper-layer entries therefore act as a directory, or classification index, over the lower-layer node information. The advantages of this structure become apparent in the query and update strategies described below.
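The two-layer layout described above can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `NodeInfo`, `SGTEntry`, and `StatisticsGuidedTable`, the single scalar `advantage` score standing in for bandwidth, delay, and response counts, and the plain oldest-first eviction are all hypothetical (the merge-based replacement policy of Stage 3 is not modelled here).

```python
from dataclasses import dataclass, field


@dataclass
class NodeInfo:
    """Lower-layer record: a node that answered this (or a similar) query."""
    ip: str
    port: int
    advantage: float  # assumed scalar combining bandwidth, delay, responses


@dataclass
class SGTEntry:
    """Upper-layer record: one historical query and its best (at most M) nodes."""
    query: str
    timestamp: float
    nodes: list[NodeInfo] = field(default_factory=list)

    def add_node(self, node: NodeInfo, m: int) -> None:
        # Drop any older record for the same (ip, port), then keep the M best.
        self.nodes = [n for n in self.nodes if (n.ip, n.port) != (node.ip, node.port)]
        self.nodes.append(node)
        self.nodes.sort(key=lambda n: n.advantage, reverse=True)
        del self.nodes[m:]


class StatisticsGuidedTable:
    def __init__(self, k: int = 3, m: int = 2) -> None:
        self.k = k  # number of most recent queries kept in the upper layer
        self.m = m  # number of node records per upper-layer entry
        self.entries: list[SGTEntry] = []  # ordered oldest -> newest

    def add_entry(self, query: str, timestamp: float, nodes: list[NodeInfo]) -> None:
        entry = SGTEntry(query, timestamp)
        for n in nodes:
            entry.add_node(n, self.m)
        self.entries.append(entry)
        self.entries.sort(key=lambda e: e.timestamp)
        # Keep only the K most recent queries (simple oldest-first eviction).
        while len(self.entries) > self.k:
            self.entries.pop(0)
```

With K = 3 and M = 2 as in Fig. 1, adding a fourth query evicts the oldest entry, and an entry offered three responding nodes retains only the two with the largest advantage factors.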
Many techniques for computing semantic similarity have emerged in the field of information retrieval. Here we suggest the popular vector space model (VSM) and latent semantic indexing (LSI) [28], i.e., converting each query condition into a vector representation. The similarity of two queries is then computed as the cosine of the angle between their vectors:
$$\mathrm{similarity}(q_1, q_2) = \cos\langle \vec{q}_1, \vec{q}_2 \rangle = \frac{\vec{q}_1 \cdot \vec{q}_2}{|\vec{q}_1| \cdot |\vec{q}_2|}$$
where $\vec{q}$ is the vector of query $q$.
Stage 2: The algorithm for querying with the statistics guiding table
When a node joins the P2P network for the first time, it has no historical query records, so its first query request can only use the query strategy of the underlying structure. It creates its SGT after receiving response messages, and thereafter queries according to Algorithm 1.
Algorithm 1  SGT search
  sum = 0
  I.  for i = 1 to K
          sum = sum + Sim(q, Q_i)
  II. if sum ≥ T_q
          1. From the K_q historical records in the SGT most relevant to q,
             select the M_q nodes with the largest advantage factors and
             send them the query request;
          2. Wait for a period of time and collect the reply messages;
             if some node responds                  // SGT hit
                 connect and obtain the resource;
             else                                   // SGT miss
                 enable the underlying query method;
      else                                          // SGT has low relevance to the query
          enable the underlying query method directly;
As can be seen, this search method is a loose enhancement: how much the SGT contributes to the system depends on its hit rate. Simulations show that the SGT hit rate is very high, especially when the SGT is used iteratively (SGT_hops set to 2, 3, or 4). Algorithm 1 describes querying with the SGT when SGT_hops = 1; iterating over the SGT multiple times works similarly.
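Algorithm 1 with SGT_hops = 1 can be sketched in Python as follows. This is a hypothetical illustration, not the authors' code: the flat table layout (a list of `(historical query, [(node, advantage)])` pairs), the bag-of-words cosine similarity, and the callbacks `send_query` (assumed to return the first responding node, or `None` after the wait) and `underlying_search` (the fallback, e.g. flooding) are stand-ins, and `t_q`, `k_q`, `m_q` are assumed to play the roles of T_q, K_q, and M_q in the pseudocode.

```python
import math
from collections import Counter
from typing import Callable, Optional

# Hypothetical flat SGT layout: (historical query, [(node id, advantage factor)])
Entry = tuple[str, list[tuple[str, float]]]


def similarity(q1: str, q2: str) -> float:
    """Cosine similarity of bag-of-words term-frequency vectors."""
    v1, v2 = Counter(q1.lower().split()), Counter(q2.lower().split())
    dot = sum(v1[t] * v2[t] for t in v1)
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def sgt_search(
    q: str,
    sgt: list[Entry],
    t_q: float,
    k_q: int,
    m_q: int,
    send_query: Callable[[str, list[str]], Optional[str]],
    underlying_search: Callable[[str], Optional[str]],
) -> tuple[str, Optional[str]]:
    """Algorithm 1 sketch for SGT_hops = 1; returns (path taken, responder)."""
    # Step I: accumulate the similarity of q to every historical query in the SGT.
    total = sum(similarity(q, hq) for hq, _ in sgt)
    if total < t_q:
        # The SGT is not relevant enough: use the underlying query method directly.
        return "underlying", underlying_search(q)
    # Step II.1: the k_q historical queries most similar to q ...
    relevant = sorted(sgt, key=lambda e: similarity(q, e[0]), reverse=True)[:k_q]
    # ... and, among the nodes they point to, the m_q with the largest advantage.
    best: dict[str, float] = {}
    for _, nodes in relevant:
        for node, adv in nodes:
            best[node] = max(adv, best.get(node, adv))
    targets = sorted(best, key=lambda n: best[n], reverse=True)[:m_q]
    # Step II.2: send q to those nodes and wait for replies.
    responder = send_query(q, targets)
    if responder is not None:
        return "sgt_hit", responder
    # SGT miss: fall back to the underlying query method.
    return "sgt_miss", underlying_search(q)
```

Iterating the SGT (SGT_hops > 1) would repeat step II at the contacted nodes; that extension is not sketched here.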
Stage 3: The algorithm for adaptively updating the statistics guiding table
To raise the SGT hit rate while reducing the cost of maintaining the SGT, we combine active and passive updates. Passive updates occur after a node forwards a query request and are piggybacked on it, so they add no extra overhead. Active updates occur after a successful query: the source node (the query initiator) first updates its own SGT with the reply information it received, then proactively sends the node information in its SGT related to this query to the destination node, which uses this information to update its own SGT. The destination node thereby learns more about this class of files. This also acts as an incentive policy (the more resources a node shares, the more likely it is to obtain related resources itself). In addition, active updates let the SGT learn about relevant nodes farther away, enlarging its coverage and raising its hit rate. Maintenance cost arises only at this point, and it is very small because each update is a single one-way message. Combining passive and active updates in this way makes the SGT considerably more effective.
The following algorithms describe passive and active SGT updates. Note that a passive update touches only the second (lower) layer of the SGT: it only needs to sort by advantage factor and add or replace the nodes pointed to by some historical query. An active update changes both layers: it first merges (or deletes) historical query information in the first layer, then merges (or replaces) the node information of the related historical queries.
Algorithm 2  Passive SGT update
  // executed after an intermediate node forwards query q
  I.  Find the historical query Q_max most similar to q; let the maximum similarity be Sim_max.
  II. if Sim_max ≥ T_m
          merge the source-node information of q into the node list pointed to by Q_max;
      else
          ignore the information of q.
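A minimal sketch of Algorithm 2, assuming each SGT entry is a dict with a `query` string and a `nodes` map from a node identifier to its advantage factor, that the forwarding node has some advantage score for the query's source node, and that the latest observation of a node overwrites the older value. All names are hypothetical. Only the lower layer (`nodes`) is modified, in line with the note that passive updates touch only the second layer.

```python
import math
from collections import Counter


def similarity(q1: str, q2: str) -> float:
    """Cosine similarity of bag-of-words term-frequency vectors."""
    v1, v2 = Counter(q1.lower().split()), Counter(q2.lower().split())
    dot = sum(v1[t] * v2[t] for t in v1)
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def passive_update(sgt: list[dict], q: str, source: str, source_adv: float,
                   t_m: float, m: int) -> bool:
    """Algorithm 2 sketch, run by an intermediate node after forwarding q.

    Returns True if q's source node was merged into the SGT, False if ignored.
    """
    if not sgt:
        return False
    # Step I: the historical query Q_max most similar to q, and Sim_max.
    q_max = max(sgt, key=lambda e: similarity(q, e["query"]))
    sim_max = similarity(q, q_max["query"])
    if sim_max < t_m:
        return False  # not similar enough: ignore the information carried by q
    # Step II: add (or refresh) the source node in Q_max's node list; the upper
    # layer (the query text) is left unchanged.
    nodes = dict(q_max["nodes"])
    nodes[source] = source_adv
    # Keep only the m nodes with the largest advantage factors.
    q_max["nodes"] = dict(sorted(nodes.items(), key=lambda kv: kv[1], reverse=True)[:m])
    return True
```

Because the information rides on a query the node is already forwarding, the only cost in this sketch is local computation, which matches the "no extra overhead" property claimed for piggybacking.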
Algorithm 3  Active SGT update
  // executed after the source node successfully obtains the resource
  I.  Update its own SGT
      if the result was obtained using the SGT
          1. Replace the most similar query content Q_max with this query q;
          2. Merge the nodes that responded to q with the node information pointed to by the original Q_max;
          3. Re-sort the historical query information in the upper layer of the SGT by time;
      else
          1. Find the historical query Q_max most similar to the earliest query Q_1; let the similarity be Sim_max;
             if Sim_max ≥ T_m
                 merge the nodes related to Q_1 and Q_max;
          2. Delete the query information and node information of Q_1;
          3. Create a new entry from the query condition q and the responding nodes, and insert it into the table in chronological order.
  II. Notify the destination node to update its SGT
      1. Send the M_q node records most relevant to this query to the destination node;
      2. The destination node updates its SGT in the same way as a passive update.
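Algorithm 3 can be sketched in a similar hypothetical dict-based layout, with an added `time` field for ordering. Assumptions beyond the pseudocode: the earliest entry Q_1 is evicted only when the upper layer already holds K entries, node maps are capped at M by advantage factor, and the function returns the node records to push to the destination node instead of sending them; the destination would then apply the passive update of Algorithm 2. The names `merge_nodes` and `active_update` are illustrative only.

```python
import math
from collections import Counter


def similarity(q1: str, q2: str) -> float:
    """Cosine similarity of bag-of-words term-frequency vectors."""
    v1, v2 = Counter(q1.lower().split()), Counter(q2.lower().split())
    dot = sum(v1[t] * v2[t] for t in v1)
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def merge_nodes(a: dict, b: dict, m: int) -> dict:
    """Union of two node -> advantage maps (b wins on conflict), capped at m."""
    merged = {**a, **b}
    return dict(sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:m])


def active_update(sgt: list[dict], q: str, responders: dict, now: float,
                  via_sgt: bool, k: int, m: int, t_m: float, m_q: int) -> list:
    """Algorithm 3 sketch, run by the source node after a successful query.

    Returns the m_q best (node, advantage) records to send to the destination.
    """
    if via_sgt and sgt:
        # I.1-I.2: q replaces the most similar query Q_max; merge the responders.
        entry = max(sgt, key=lambda e: similarity(q, e["query"]))
        entry["query"], entry["time"] = q, now
        entry["nodes"] = merge_nodes(entry["nodes"], responders, m)
    else:
        if len(sgt) >= k:  # assumption: evict Q_1 only when the table is full
            q1 = min(sgt, key=lambda e: e["time"])  # earliest query Q_1
            others = [e for e in sgt if e is not q1]
            if others:
                q_max = max(others, key=lambda e: similarity(q1["query"], e["query"]))
                if similarity(q1["query"], q_max["query"]) >= t_m:
                    # Merge Q_1's nodes into Q_max; Q_max's (newer) values win.
                    q_max["nodes"] = merge_nodes(q1["nodes"], q_max["nodes"], m)
            sgt[:] = others  # I.2: delete Q_1 and its node information
        # I.3: create a new entry for q with the responding nodes.
        entry = {"query": q, "time": now, "nodes": merge_nodes({}, responders, m)}
        sgt.append(entry)
    sgt.sort(key=lambda e: e["time"])  # keep the upper layer in time order
    # II.1: records for the destination node (it applies the passive update).
    return sorted(entry["nodes"].items(), key=lambda kv: kv[1], reverse=True)[:m_q]
```

The test scenario loosely follows Fig. 2(b): querying "film 1" outside the SGT evicts "singer 1" after merging its nodes into the similar "singer 3" entry, and a later SGT-assisted query then rewrites the "film 1" entry in place.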
This SGT update scheme implicitly clusters nodes by interest, as shown in Fig. 2. Suppose the current SGT of source node X is as shown in Fig. 2(a), and X then queries "film 1". Its SGT has little relevance to this query, so X uses the underlying query method directly. Suppose it obtains the resource from node F. The information of the earliest query in the SGT must now be replaced; the similarity computation finds "singer 1" very similar to "singer 3", so the node information of the two is merged, as in Fig. 2(b). Next, suppose X queries "film 2". Because the SGT contains a similar query, X first uses the SGT; on a miss it is forced to use the underlying query method. Suppose nodes R and S respond to the request. As before, a new entry "film 2" is created, and the earliest entry is merged with its most similar query, as in Fig. 2(c). Then X searches for "books 1". Because the table contents are unrelated, X obtains the resource directly via the underlying method. When updating the SGT it tries to merge the earliest historical query into a later entry, but since that query has low relevance to the other queries, it is simply deleted, and a new entry is created and added to the SGT, as in Fig. 2(d).
Fig. 2 also shows that this replacement policy adjusts the clustering granularity according to a time-priority principle: the closer a historical query is to the present, the richer its information. This is another advantage of the statistics guiding table.
Simulation results
Fig. 3 shows system performance when the SGT is used for iterative queries, again analysed in terms of query success rate, query delay, and system load. SGT_hops denotes the number of iterations; SGT_hops = 1 corresponds to the case described by Algorithm 1. Fig. 3 shows that using the SGT repeatedly further improves system performance. In Fig. 3(a), with SGT_hops = 3 or 4 the query success rate is nearly 270% higher than that of Gnutella, rising from 0.088% to 32%. Fig. 3(b) shows that the query delay decreases further as the number of SGT iterations grows, by up to 78%. Fig. 3(c), however, shows that with SGT_hops = 4 the system load grows with the number of queries after the system has run for a while. The reason is that too many SGT iterations draw too many nodes into each query, producing a flooding-like effect. More SGT iterations therefore do not always yield a larger performance gain. Taken together, the figures show that SGT_hops = 3 gives the best overall improvement: the query success rate is nearly 270% higher than Gnutella's, the average delay is reduced by nearly 78%, and the system load is reduced by nearly 32%.
Fig. 4 shows the overhead required to maintain the SGT. We simulated, for SGT_hops = 1, 2, and 3, the ratio of SGT maintenance messages to the total number of messages. The maintenance overhead is in fact very small, less than 0.04%, which is consistent with our design goal for the SGT algorithm.
Fig. 5 compares the query success rate under a dynamic environment when the SGT is used for 1, 2, and 3 iterations against that of Gnutella. In the experiment, starting from a stable system, 10%, 30%, and 50% of the nodes are made to crash in turn, with 10,000 queries performed at each level; the crashed nodes then rejoin the system, and 10,000 queries are again performed at each stage. The figure shows that the SGT improves system robustness and fault tolerance: when 50% of the nodes have crashed, Gnutella's query success rate has dropped below 1%, whereas with the SGT (SGT_hops = 3) it remains at about 11%, still nearly 20% higher than that of Gnutella in the steady state. The figure also shows that after the crashed nodes rejoin, the SGT helps the system quickly return to its previous steady state.

Claims (3)

1. A query-statistics-based guided search method for a P2P system, characterized in that the guided search method comprises the following steps:
(1) building a statistics guiding table: based on historical query statistics and semantic analysis results, a two-layer "statistics guiding table" is built; the upper-layer entries of the table keep only the records of the most recent queries, ordered by query time; the lower-layer entries of the table store, for the nodes that responded to this query or to similar queries, node information (IP, port) and advantage factors (available bandwidth, delay, number of responses);
(2) querying files using the statistics guiding table: once a node has built the table, it queries using the information in the table according to the semantic relevance between the current query and the table's historical queries; if the records in the table have little relevance to the current query, or the table cannot yield the required result, the underlying query strategy is enabled automatically; the first query request can only use the query strategy of the underlying structure, and the table is created after response messages are received;
(3) adaptively updating the statistics guiding table by two methods, active updating and passive updating; a passive update occurs after a node forwards a query request and is piggybacked on it, adding no extra overhead; an active update occurs after a successful query: the query initiator, i.e., the source node, first updates its own statistics guiding table with the reply information obtained, then proactively sends the node information in its own table related to this query to the destination node, which uses this information to update its own statistics guiding table.
2. The query-statistics-based guided search method for a P2P system according to claim 1, characterized in that the statistics guiding table is built from the results of historical query statistics and semantic analysis; the table implicitly clusters the user's interests, and when entries are updated it collects, as far as possible, the node information most relevant to those interests; queries can thus make full use of the records in the table to find the corresponding resources faster and more accurately, effectively reducing the overhead introduced by queries.
3. The query-statistics-based guided search method for a P2P system according to claim 1, characterized in that the update algorithm for the statistics guiding table is designed to make the table record as much useful information as possible without adding much overhead, so both active and passive updates are used; a passive update is piggybacked and introduces no extra overhead at all; an active update can selectively replace node information according to the user's latent interests, and the source node proactively notifies the destination node so that the destination can update its statistics guiding table using the source's own table, which also serves as an incentive policy; in addition, active updates let the statistics guiding table obtain information about relevant nodes farther away, expanding its coverage and making its use more efficient.
CN2009103022853A 2009-05-14 2009-05-14 Query statistic-based guidance searching method for P2P system Expired - Fee Related CN101685470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009103022853A CN101685470B (en) 2009-05-14 2009-05-14 Query statistic-based guidance searching method for P2P system

Publications (2)

Publication Number Publication Date
CN101685470A CN101685470A (en) 2010-03-31
CN101685470B true CN101685470B (en) 2011-05-25

Family

ID=42048631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009103022853A Expired - Fee Related CN101685470B (en) 2009-05-14 2009-05-14 Query statistic-based guidance searching method for P2P system

Country Status (1)

Country Link
CN (1) CN101685470B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102045392A (en) * 2010-12-14 2011-05-04 武汉大学 Interest-based adaptive topology optimization method for unstructured P2P (peer-to-peer) network
CN106446235B (en) * 2016-10-10 2021-04-06 Tcl科技集团股份有限公司 Video searching method and device

Also Published As

Publication number Publication date
CN101685470A (en) 2010-03-31

Similar Documents

Publication Publication Date Title
Chen et al. Leveraging social networks for P2P content-based file sharing in disconnected MANETs
Cambazoglu et al. Scalability challenges in web search engines
Lv et al. RISC: ICN routing mechanism incorporating SDN and community division
Soheili et al. Spatial queries in sensor networks
Doulkeridis et al. Peer-to-peer similarity search in metric spaces
CN101272399A (en) Method for implementing full text retrieval system based on P2P network
Zhou et al. kNN processing with co-space distance in SoLoMo systems
Yin et al. Efficient distributed skyline computation using dependency-based data partitioning
Sourlas et al. Caching in content-based publish/subscribe systems
Li et al. Efficient progressive processing of skyline queries in peer-to-peer systems
CN102377826B (en) Method for optimal placement of unpopular resource indexes in peer-to-peer network
CN101685470B (en) Query statistic-based guidance searching method for P2P system
Lubbe et al. DiSCO: A distributed semantic cache overlay for location-based services
Cui et al. Efficient skyline computation in structured peer-to-peer systems
CN101442466B (en) Superpose network and implementing method
Yeferny et al. An efficient peer-to-peer semantic overlay network for learning query routing
CN103491128A (en) Optimal placement method for popular resource duplicates in peer-to-peer network
Jia et al. Cluster-based content caching driven by popularity prediction
Qiu et al. Web service discovery based on semantic matchmaking with UDDI
Lee et al. A prediction-based query processing strategy in mobile commerce systems
Atsan et al. Applicability of eigenvector centrality principle to data replication in MANETs
Yang et al. Centrality-based peer rewiring in semantic overlay networks: Short paper
Zhang et al. Forming and searching content-based hierarchical agent clusters in distributed information retrieval systems
Zhang et al. Yarqs: Yet another range queries schema in dht based p2p network
Zhao Data quality-oriented data integration in Peer-to-Peer system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110525

Termination date: 20120514