CN106445989A - Search recommendation model optimization based on a query click graph - Google Patents
 Publication number
 CN106445989A (application CN201610390608.9A)
 Authority
 CN
 China
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
 G06F16/90—Details of database functions independent of the retrieved data types
 G06F16/95—Retrieval from the web
 G06F16/953—Querying, e.g. by the use of web search engines
 G06F16/9535—Search customisation based on user profiles and personalisation
Abstract
Description
Technical field
The present invention relates to a graph model optimization scheme, and more particularly to the optimization of a search recommendation model based on a query click graph.
Background art
Many scholars have studied and analyzed user search logs, building query recommendation models mainly from two angles: query term association and the query click bipartite graph. Users differ in their knowledge, and while searching they casually submit non-standard query terms and click on irrelevant results, so query logs contain a large amount of inaccurate, non-standard, and unrepresentative query information. This inaccurate information accumulates over time. If a conventional recommendation method mines it, the method will recommend queries that are inaccurate or that users do not accept. Therefore, in the big data era, mining accurate, representative, high-quality query information from large-scale logs is an important foundation for building query recommendation.
Summary of the invention
The purpose of the present invention is to solve the above problems by providing an optimization of a search recommendation model based on a query click graph.
The present invention achieves this purpose through the following technical solution:
The present invention comprises optimization objective construction, weight reconstruction, and recommendation algorithm optimization.
Optimization objective construction:
According to the above analysis, the most-clicked pages among the search results are the most important search results for a query. We first set up a formal description of the relations among the elements of the query click bipartite graph:
Definition 1. Let the query click bipartite graph be G = {Q ∪ U, E, W}, where Q is the set of query session nodes, U is the set of result web pages, E is the set of edges in the graph, and W is the set of edge weights. The weight w_{ij} of edge e_{ij} in the query click bipartite graph is then constructed as follows:
Optimization objective of the query click bipartite graph:
Formula (1) states: when the query session node is q_{i} (q_{i} ∈ Q), the binary optimization variable c_{ij} indicates whether the query click graph selects edge e_{ij}. The objective is to maximize the total weight of the selected edges, subject to the constraint that every retained edge carries the maximum query-page relevance weight, i.e. when c_{ij} = 1, w_{ij} ≥ w_{ik} and w_{ij} ≥ w_{kj}. When this objective is met, the query click graph retains as many as possible of the edges with the maximum click count for each query and page.
Under objective (1), a query or a web page may select several maximum-weight edges of equal weight. If the degree of each node is introduced, d(i) = ∑_{j} δ(i, j) and d(j) = ∑_{i} δ(i, j), then formula (1) is equivalent to formula (2), where δ(i, j) indicates whether an edge exists between query node q_{i} and web page node u_{j} (1 if it exists, 0 otherwise).
Equivalent form of the optimization objective for the query click core graph:
The constraints of objective (2) explicitly allow one query node in the query click core graph to connect to multiple web page nodes, and likewise allow one web page node in the core graph to connect to multiple query nodes.
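The selection rule described for formula (1) can be sketched as follows. This is a minimal illustration, assuming the constraint is read literally (an edge is kept only when its weight is the maximum both for its query and for its web page, with ties kept); the function name `select_core_edges` and the sample data are hypothetical.

```python
def select_core_edges(weights):
    """Keep edge (q, u) only if its weight is maximal for both q and u.

    weights: dict mapping (query, page) -> click weight w_ij.
    Ties are kept, so one query or page may retain several edges.
    """
    best_for_query, best_for_page = {}, {}
    for (q, u), w in weights.items():
        best_for_query[q] = max(best_for_query.get(q, w), w)
        best_for_page[u] = max(best_for_page.get(u, w), w)
    return {
        (q, u)
        for (q, u), w in weights.items()
        if w >= best_for_query[q] and w >= best_for_page[u]
    }


# Hypothetical click weights: q1 has two equally strong pages.
weights = {
    ("q1", "u1"): 5,
    ("q1", "u2"): 5,
    ("q1", "u3"): 2,
    ("q2", "u3"): 4,
    ("q2", "u1"): 1,
}
core = select_core_edges(weights)
print(sorted(core))
```

In this example q1 keeps both of its equally weighted edges, which matches the remark that a node in the core graph may connect to several nodes on the other side.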
Weight reconstruction:
As in Definition 1, consider the query click bipartite graph G = {Q ∪ U, E, W}. First, suppose that a_{ij} users have performed the click. Conventionally, the weight of an edge connecting a query and a web page is the number of clicks c_{ij} on web page u_{j} for query q_{i}, i.e. w_{ij} = c_{ij}. Our analysis shows that when browsing search results some users are more active and click often, while others click rarely. Because of this difference in user activity, the click count does not truly reflect the relevance between a query and a web page. To avoid this bias, we replace the click count with the user frequency, i.e. w_{ij} = a_{ij}. Second, suppose that for the same query users click two web pages u_{1} and u_{2} equally often. If u_{1} has also been clicked for more queries, then a click on u_{1} is less important than a click on u_{2}, that is, u_{1} is less relevant to the query. An inverse query frequency can therefore be defined for each web page, i.e.:
In the formula, N is the total number of queries and N_{q} is the number of queries for which the web page was clicked. Then let w_{ij} = c_{ij} · iqf(u).
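The inverse query frequency weighting can be illustrated with a short sketch. The defining formula is not reproduced in the text, so the form iqf(u) = log(N / N_q), chosen by analogy with inverse document frequency, is an assumption, as are the function name `iqf_weights` and the sample counts.

```python
import math


def iqf_weights(clicks):
    """Return w_ij = c_ij * iqf(u_j) for every clicked (query, page) pair.

    clicks: dict mapping (query, page) -> click count c_ij.
    Assumed form: iqf(u) = log(N / N_q), with N the number of queries and
    N_q the number of distinct queries that led to a click on page u.
    """
    n_queries = len({q for q, _ in clicks})
    queries_per_page = {}
    for q, u in clicks:
        queries_per_page.setdefault(u, set()).add(q)
    return {
        (q, u): c * math.log(n_queries / len(queries_per_page[u]))
        for (q, u), c in clicks.items()
    }


# Hypothetical log: u1 is clicked for every query, u2 only for q1.
clicks = {("q1", "u1"): 4, ("q1", "u2"): 4, ("q2", "u1"): 1, ("q3", "u1"): 2}
w = iqf_weights(clicks)
print(w[("q1", "u1")], w[("q1", "u2")])
```

Both pages receive the same number of clicks for q1, but u2 ends up with the larger weight because u1 is clicked for every query. Under the assumed logarithmic form a page clicked for all queries gets weight zero, which may or may not be the behavior the original formula intends.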
On this basis, the weights can also be constructed using transition probability theory. First compute the following two probabilities:
(1) The transition probability from a query session to a related web page:
(2) The transition probability from a related web page to a query session:
Because transition probabilities are asymmetric, i.e. P(u_{j} | q_{i}) ≠ P(q_{i} | u_{j}), linear interpolation or a product can be used to balance the weight. For example, let w_{ij} = α P(q_{i} | u_{j}) + (1 − α) P(u_{j} | q_{i}) (where α is an adjustable parameter), or let w_{ij} = P(q_{i} | u_{j}) · P(u_{j} | q_{i}).
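The two symmetrized weights can be sketched in Python. The formulas for the two transition probabilities are not reproduced in the text, so normalizing each edge weight by the total weight at the query (for P(u_j | q_i)) and at the page (for P(q_i | u_j)) is an assumption; the function name and sample data are hypothetical.

```python
def transition_weights(weights, alpha=0.5):
    """Compute both transition probabilities and the two combined weights.

    weights: dict mapping (query, page) -> edge weight.
    Assumed: P(u|q) = w / sum of weights at q, P(q|u) = w / sum at u.
    """
    query_total, page_total = {}, {}
    for (q, u), w in weights.items():
        query_total[q] = query_total.get(q, 0.0) + w
        page_total[u] = page_total.get(u, 0.0) + w
    p_u_given_q = {(q, u): w / query_total[q] for (q, u), w in weights.items()}
    p_q_given_u = {(q, u): w / page_total[u] for (q, u), w in weights.items()}
    # Linear interpolation: alpha * P(q|u) + (1 - alpha) * P(u|q)
    interpolated = {
        e: alpha * p_q_given_u[e] + (1.0 - alpha) * p_u_given_q[e]
        for e in weights
    }
    # Product: P(q|u) * P(u|q)
    product = {e: p_q_given_u[e] * p_u_given_q[e] for e in weights}
    return p_u_given_q, p_q_given_u, interpolated, product


weights = {("q1", "u1"): 3, ("q1", "u2"): 1, ("q2", "u1"): 1}
p_uq, p_qu, interpolated, product = transition_weights(weights, alpha=0.5)
print(p_uq[("q1", "u2")], p_qu[("q1", "u2")])
```

For the edge (q1, u2) the two directions disagree: only a quarter of the weight at q1 goes to u2, yet all of the weight at u2 comes from q1, which is the asymmetry that the interpolation and the product are meant to balance.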
Recommendation algorithm optimization:
(1) Basic model: The most basic query recommendation method recommends queries according to click co-occurrence in the query click bipartite graph. Extending this idea, queries with identical clicks are similar, and we propagate this similarity with a random walk. That is, starting from the initial query, the walk moves to adjacent queries according to the click probabilities on the query click bipartite graph and continues from those adjacent queries, iterating until termination. The random walk model has two modes, forward and backward, and both can be expressed with the same set of definitions.
As before, the query click bipartite graph is defined as G = {Q ∪ U, E, W}. Let M be the number of query nodes and N the number of web page nodes, and let w_{ij} be the click weight between query q_{i} and web page u_{j}. Build a probability transition matrix A of size (M+N) × (M+N), with node transition probability A[i, j] = P(q_{j} | q_{i}). Then introduce the self-transition probability s, so that the new transition probability P(v_{j} | v_{i}) is defined as in formula (6):
Given a start node v_{i}, the random walk can be iterated forward or backward. The forward walk finds the query q' most likely to be reached from query q on the query click bipartite graph; it considers the probability of walking from the start node v_{i} to the other nodes. The backward walk finds the nodes from which the initial query node q is likely to be reached; it considers the probability of walking from the other nodes to the start node v_{i}.
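The forward and backward iterations can be sketched in plain Python. Formula (6) is not reproduced in the text, so the way the self-transition probability s enters the matrix here (s on the diagonal, the remaining 1 − s shared among neighbours in proportion to click weight) is an assumption, and all function names are hypothetical.

```python
def build_transition_matrix(weights, queries, pages, s=0.9):
    """Build the (M+N) x (M+N) row-stochastic matrix A over queries and pages."""
    nodes = list(queries) + list(pages)
    index = {v: k for k, v in enumerate(nodes)}
    size = len(nodes)
    A = [[0.0] * size for _ in range(size)]
    for (q, u), w in weights.items():
        A[index[q]][index[u]] += w   # query -> page
        A[index[u]][index[q]] += w   # page -> query
    for i in range(size):
        total = sum(A[i])
        for j in range(size):
            A[i][j] = (1.0 - s) * A[i][j] / total if total else 0.0
        A[i][i] = s if total else 1.0  # isolated nodes only loop on themselves
    return nodes, A


def forward_step(v, A):
    """Forward walk: v^T(n+1) = v^T(n) A."""
    size = len(v)
    return [sum(v[i] * A[i][j] for i in range(size)) for j in range(size)]


def backward_step(v, A):
    """Backward walk: v(n+1) = norm(A v(n))."""
    size = len(v)
    raw = [sum(A[i][j] * v[j] for j in range(size)) for i in range(size)]
    z = sum(raw)
    return [x / z for x in raw]


def walk(start, nodes, A, steps, step_fn):
    """Run `steps` iterations from an indicator vector on `start`."""
    v = [1.0 if node == start else 0.0 for node in nodes]
    for _ in range(steps):
        v = step_fn(v, A)
    return dict(zip(nodes, v))


weights = {("q1", "u1"): 3, ("q2", "u1"): 1, ("q2", "u2"): 2}
nodes, A = build_transition_matrix(weights, ["q1", "q2"], ["u1", "u2"], s=0.9)
forward = walk("q1", nodes, A, 10, forward_step)
backward = walk("q1", nodes, A, 10, backward_step)
print(forward)
print(backward)
```

Candidate recommendations for q1 would be the other query nodes ranked by their score in `forward` or `backward`. The sketch uses dense lists for clarity; a real click graph would need a sparse representation.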
(2) Problem finding: On the basis of the above algorithm, set the values of the parameters n and s. n is the number of nodes brought in from the bipartite graph. s is the self-transition probability; the walk should not move to other nodes too quickly during transition, so s is set to 0.9. In query recommendation, a larger n means that more nodes are brought into the walk, possibly all nodes in the whole graph. This causes the "recommendation topic drift" problem: the queries reached by the walk are not highly relevant to the user's query. Specifically, the following problems exist:
For the forward walk, after several iterations the transition probability flows to the more popular queries, so the recommended queries become inaccurate or irrelevant. For example, for the query "People Weekly", popular publications such as "Global Personage" and "Epoch People" may end up being recommended. With backward propagation the probabilities tend toward uniformity, so misspelled or low-frequency queries may be recommended.
Traditional recommendation models cannot effectively distinguish queries with different intents. Query recommendation in the random walk model relies on propagating probability by similarity, which causes some closely associated or highly similar queries to be ranked at the top, so the recommendation results are rather uniform and the diversity of recommendations is reduced.
(3) Algorithm optimization: To solve the above problems of the traditional random walk recommendation model, a random walk recommendation model based on the query click graph is proposed, which prunes the inaccurately described and unrepresentative recommendations of the conventional model. The iterative random walk algorithm yields the probability distribution over web page nodes for a query; for each web page, the corresponding query in the query click graph can then be selected and recommended to the user.
Random walk recommendation algorithm based on the query click graph:
The convergence of the forward and backward random walk algorithms is as follows:
In the forward random walk, convergence of the transition probability matrix follows from the stationary distribution of a Markov chain. Given the transition matrix A, if there exists an iteration count n such that A^{n}[i, j] > 0 for all i and j, then the Markov chain formed by all nodes is homogeneous, aperiodic, and irreducible, and has a unique stationary distribution. The forward random walk iteration can then be written as v^{T}(n+1) = v^{T}(n) A = v^{T}(0) A^{n+1}. As A^{n} tends to the stationary distribution, A^{n}[i, j] → π_{j}, where the stationary distribution is π^{T} = [π_{1}, π_{2}, ..., π_{M+N}], so lim_{n→∞} v(n) = π. Clearly, when v(0) is a probability distribution, v^{T}(n) A is also a probability distribution.
For the backward random walk, the literature that first proposed the backward random walk model gave no convergence proof. Assume again that the stochastic matrix A has a stationary distribution. Even if v(0) is a probability distribution, A v(n) is not necessarily a probability distribution, so the vector v is normalized at each iteration, i.e. v(n+1) = norm(A v(n)). Because every row of the transition matrix A sums to 1 and all transition probabilities in A are greater than 0, when v(0) is the uniform distribution the normalization follows the rows of A, i.e. norm(A v(n)) = v(0), and the algorithm keeps returning the uniform distribution. If the whole query click bipartite graph is strongly connected, so that any two nodes communicate, every entry of v stays greater than zero during the iteration, and v can be normalized at every step. Formally, the iteration left-multiplies by the matrix A, so after the n-th iteration the value is v(n) = A^{n} v(0) / Z. If A has a stationary distribution, every row of A^{n} tends to [π_{1}, π_{2}, ..., π_{M+N}], so A^{n} v(0) is a vector of the same length as v^{T} whose entries are all equal. Because Z is the normalization factor, if v(0) is a probability distribution, A^{n} v(0) / Z is the uniform distribution.
The state of maximum entropy of a system is the uniform distribution, so the backward random walk model in essence returns to the most primitive initial state of the system, whereas the forward random walk model extends forward through repeated iteration until it finds a steady state. In query recommendation, the steady state of the forward model is reached when the query nodes with the most clicks in the whole graph are ranked first, and the backward model reaches its stationary distribution when all nodes in the graph have the same probability. Therefore, convergence to either the uniform distribution or the popular-node distribution is unfavorable for query recommendation. A suitable number of iterations and self-transition probability should be set in advance, e.g. n = 10 and s = 0.9, to limit the range of the random walk on the graph.
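The two limits can be checked numerically on a toy chain. The 3 × 3 matrix below is a hypothetical example chosen only because its rows sum to 1, its square has all entries positive, and its stationary distribution [0.25, 0.5, 0.25] is easy to verify by hand; it is not taken from the patent's data.

```python
# Hypothetical row-stochastic matrix; pi = [0.25, 0.5, 0.25] satisfies pi A = pi.
A = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]


def forward(v, A, steps):
    """Iterate v^T(n+1) = v^T(n) A."""
    size = len(v)
    for _ in range(steps):
        v = [sum(v[i] * A[i][j] for i in range(size)) for j in range(size)]
    return v


def backward(v, A, steps):
    """Iterate v(n+1) = norm(A v(n))."""
    size = len(v)
    for _ in range(steps):
        raw = [sum(A[i][j] * v[j] for j in range(size)) for i in range(size)]
        z = sum(raw)
        v = [x / z for x in raw]
    return v


# Forward: different start nodes end at the same stationary distribution.
fwd_from_first = forward([1.0, 0.0, 0.0], A, 60)
fwd_from_last = forward([0.0, 0.0, 1.0], A, 60)
# Backward: the normalized iteration flattens toward the uniform distribution.
bwd = backward([1.0, 0.0, 0.0], A, 60)
print(fwd_from_first, fwd_from_last, bwd)
```

After many steps the forward walk forgets its start node and the backward walk forgets everything except the node count, which is why the text recommends stopping early (for example n = 10) rather than iterating to convergence.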
The beneficial effects of the present invention are as follows:
The present invention is an optimization of a search recommendation model based on a query click graph. Compared with the prior art, the invention first analyzes users' search behavior and intent, studies methods for extracting and representing search behavior data, and, through in-depth mining of query sessions, proposes a query term association method based on user query logs. Second, it analyzes the theory and computation of the traditional query click bipartite graph recommendation model. Because the query click bipartite graph has a simple structure, is practical, and can be built without computing the similarity of search terms or web pages, it is widely used in search engines. The invention proposes replacing the click count with the click frequency when building edge weights in the bipartite graph, which prevents the weights from being biased by excessive invalid clicks and brings the recommender system as close as possible to a steady state. Finally, experiments and data analysis verify the superiority of the improved model in three respects.
Detailed description of the embodiments
The invention is further described below:
The present invention comprises optimization objective construction, weight reconstruction, and recommendation algorithm optimization.
Optimization objective construction:
Analysis of the above optimization objective shows that the problem is both related to and distinct from the traditional stable matching problem. The core idea of stable matching is to reach a steady state in which, once matching is finished, there no longer exist two agents who would both prefer each other to their assigned partners. Familiar real-world examples such as blind dates between men and women, companies and interns, and buyers and sellers all developed from the ideas of stable market matching theory. The two-sided model and the deferred acceptance algorithm are the two cornerstones of stable matching theory.
Two-sided matching model: the main function of many markets and social systems is to match one agent with another, for example students and schools, employees and companies, and men and women of marriageable age. Such market matching is divided into "one-sided market matching" (Single-Sided Market Match) and "two-sided market matching" (Two-Sided Market Match). One-sided market matching means that the market contains only one set, whose individuals are matched with one another according to their own preferences. However, the "roommate" phenomenon in one-sided market matching can make the matching unstable. Suppose there are four roommates {A, B, C, D}, where A most prefers B, B most prefers C, C most prefers A, and all three rank D last. In this case no grouping into pairs is stable, because whoever is paired with D will leave the current match to pair with someone who is already matched, and that new match will succeed, so the market never becomes stable (Gale & Shapley, 1962). The "two-sided matching model" was first proposed by Gale and Shapley (1962) in their study of college admissions and the stability of marriage. A "two-sided market" is a market with two classes of individuals, in which individuals in the first set can only be matched with individuals in the second set. They proved that in such a two-sided market, as long as individual preferences are complete and transitive and the market is free enough to allow any potentially possible match, the process can be carried out iteratively until all individuals have a match and the whole market is stable. This stable matching property has led to wide application of the two-sided matching model in theory and practice.
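The roommate example can be verified by brute force. The sketch below enumerates the three possible pairings of {A, B, C, D} and lists the blocking pairs of each one (two people who prefer each other to their assigned partners). The text fixes only the first choices of A, B, and C and the fact that all three rank D last; the second choices and D's own ranking are filled in as assumptions, and the function names are hypothetical.

```python
from itertools import combinations

# Preference lists, most preferred first. D's ranking is an assumption.
prefs = {
    "A": ["B", "C", "D"],
    "B": ["C", "A", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}


def pairings(people):
    """Yield every way to split `people` into unordered pairs."""
    if not people:
        yield []
        return
    first, rest = people[0], people[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def blocking_pairs(matching, prefs):
    """Return pairs who both prefer each other to their current partner."""
    partner = {}
    for a, b in matching:
        partner[a], partner[b] = b, a

    def prefers(x, y):
        return prefs[x].index(y) < prefs[x].index(partner[x])

    return [
        (x, y)
        for x, y in combinations(sorted(prefs), 2)
        if partner[x] != y and prefers(x, y) and prefers(y, x)
    ]


results = {tuple(m): blocking_pairs(m, prefs) for m in pairings(sorted(prefs))}
for matching, blockers in results.items():
    print(matching, "blocked by", blockers)
```

Every pairing has at least one blocking pair, and in each case the blocking pair involves the person currently grouped with D, as the text describes.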
The improved recommendation model on the query click bipartite graph proposed in this chapter is similar to "one-sided market matching" in the stable matching problem. It is a stable matching problem over query nodes and web page nodes, with the following characteristics:
(1) The numbers of query session nodes and returned web page nodes may differ, so it cannot be guaranteed that every node has a match.
(2) Most query sessions have a click preference only for their own related web pages, not for all web pages.
(3) Edges with the same click count (weight) may occur in the query click bipartite graph, in which case a strict matching cannot be obtained.
Weight reconstruction:
Building the weights in the query click bipartite graph from user frequency, as proposed here, prevents the weights from being biased by excessive invalid clicks. A benefit of this construction is that all edge weights in the bipartite graph are integers, which simplifies solving the subsequent optimization algorithm. In addition, using the number of users in the search log as the click weight in the bipartite graph makes the result intuitive and easy to understand.
Recommendation algorithm optimization:
Having analyzed the mathematical model of the query click bipartite graph and the corresponding algorithms, we propose a new recommendation algorithm based on the query click graph. The algorithm filters out inaccurate and unrepresentative candidate queries, and avoids the conventional recommendation algorithms' neglect of the equivalence and typicality of queries within the same group. It also avoids the bias caused by excessive invalid clicks, which improves the precision of the query recommendation model.
The traditional random walk recommendation algorithm is as follows:
Experiment and analysis
The inquiry click figure recommended models optimized algorithm performance that this section is proposed to this chapter by experiment is verified.By reality Test the given inquiry of data set and diagram data analysis is clicked on, mainly click on degree of association, recommend performance and recommendation results various from inquiry Change the recommendation method after three aspects compare traditional method and optimize, demonstrate the retrieval based on inquiry click figure after optimizing The effectiveness of proposed algorithm.
Analysis of experimental data
The experimental data set is the web query log provided by the Beijing Zhongke laboratory. After the data set was cleaned and analyzed, the log file is 47 MB in size and contains 1135274 user query records, 3675413 clicks in total, and 176687 query words in total. A statistical analysis of query word frequency shows that 28745 query words were retrieved more than 5 times; we classify these as high-frequency words, and together they account for 883752 query records. The high-frequency query words therefore make up only 16.3% of all query words but account for 77.8% of all queries. When preprocessing the data we set the threshold to 5 and filter out the 83.7% of query words that are low-frequency, that is, retrieved fewer than 5 times, which removes only 22.2% of the query information. A larger data space makes the recommendation model more complex; after 83.7% of the query words are discarded, the sampling space of the model is about 1/6 of the original. In addition, low-frequency query words generally correspond to low-quality queries, and using a low-frequency word as the starting point of the iteration generally does not produce a useful recommendation. The basic information of the data after pruning is shown in Table 1.
Table 1. Statistics of the query click log data
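The frequency-threshold pruning described above can be sketched as follows. This is a minimal illustration, not the patent's code: the function name `prune_low_frequency` and the toy log are hypothetical, and the sketch assumes that a query word is kept when its count is strictly greater than the threshold, since the text does not say how a count of exactly 5 is treated.

```python
from collections import Counter


def prune_low_frequency(query_log, threshold=5):
    """Keep query words that occur more than `threshold` times.

    Returns the kept words, the share of distinct query words kept,
    and the share of query records that the kept words cover.
    The strict comparison is an assumption about the boundary case.
    """
    counts = Counter(query_log)
    kept = {word for word, count in counts.items() if count > threshold}
    kept_records = sum(counts[word] for word in kept)
    word_share = len(kept) / len(counts)
    record_share = kept_records / len(query_log)
    return kept, word_share, record_share


# Hypothetical toy log: 6 distinct query words, 21 records.
toy_log = ["a"] * 10 + ["b"] * 6 + ["c"] * 2 + ["d", "e", "f"]
kept_words, word_share, record_share = prune_low_frequency(toy_log)
```

Applied to the counts reported in the text, 28745 / 176687 and 883752 / 1135274 give the 16.3% and 77.8% shares quoted above.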
Based on the query click graph information mined from the data set, we sampled edges of several different frequencies for case analysis. Users interact with a search engine in three main ways: (1) searching for the main domain name of a website in order to find that website; (2) searching for a name or a fixed term in order to find an authoritative page, for example using a Baidu Baike page for a related query; (3) searching for the main description of a piece of information in order to find its source, for example searching for lyrics by song title. We find that the query click graph retains the web pages with the most clicks, which are the pages users are most interested in, and that the query words submitted for those pages describe the user's need accurately.
To analyze how the optimized query click graph is distributed across frequencies, we classify the edges according to the weight of the edge between the "query" node and the "web page" node into strong-weight, high-weight, medium-weight, low-weight, and weak-weight edges, whose corresponding user click frequencies are [1000, +∞), [100, 1000], [10, 100], [2, 10], and [1, 1], respectively. We then compute the distribution of the traditional bipartite graph and the improved bipartite graph over these five classes, as shown in Table 2; the figures in parentheses in the table give the improved bipartite graph as a proportion of the traditional bipartite graph. The table shows that: (1) the proportion retained by the improved graph is higher for strong-weight and weak-weight edges than for the other three types, because strong-weight and weak-weight edges represent the query clicks with the strongest relevance; (2) high-weight, medium-weight, and low-weight edges make up a relatively low proportion, which indicates lower relevance, so they can be removed.
Table 2. Distribution of the traditional query click bipartite graph and its improved graph over the different edge types
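The five-way edge classification described above can be sketched as below. The names `EDGE_CLASSES`, `edge_class`, and `class_distribution` are hypothetical, as is the toy edge data. Because the intervals given in the text share their endpoints (for example, 1000 appears in both [100, 1000] and [1000, +∞)), the sketch assumes that a boundary value belongs to the higher class.

```python
from collections import Counter

# Lower bounds of each class, checked from the highest class down.
# Assigning shared endpoints to the higher class is an assumption.
EDGE_CLASSES = [
    ("strong", 1000),
    ("high", 100),
    ("medium", 10),
    ("low", 2),
    ("weak", 1),
]


def edge_class(clicks):
    """Return the weight class of an edge with the given click count."""
    for name, lower_bound in EDGE_CLASSES:
        if clicks >= lower_bound:
            return name
    raise ValueError("an edge needs at least one click")


def class_distribution(edge_clicks):
    """Count edges per class; `edge_clicks` maps (query, page) to clicks."""
    return Counter(edge_class(clicks) for clicks in edge_clicks.values())


# Hypothetical edges of a query click bipartite graph.
toy_edges = {
    ("q1", "d1"): 1500,
    ("q1", "d2"): 120,
    ("q2", "d2"): 10,
    ("q3", "d3"): 3,
    ("q4", "d1"): 1,
    ("q5", "d4"): 1,
}
distribution = class_distribution(toy_edges)
```

Dividing the per-class counts of an improved graph by those of the traditional graph would give the kind of ratio that the text says is shown in parentheses in Table 2.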
The basic principles, main features, and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited by the above embodiments; the embodiments and the description merely illustrate the principles of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all of them fall within the scope of the claimed invention. The claimed scope of the invention is defined by the appended claims and their equivalents.
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title
CN201610390608.9A CN106445989A (en)  2016-06-03  2016-06-03  Query click graph-based search recommendation model optimization
Publications (1)
Publication Number  Publication Date
CN106445989A (en)  2017-02-22
Family
ID=58183837
Country Status (1)
Country  Link 

CN (1)  CN106445989A (en) 
Cited By (2)
Publication number  Priority date  Publication date  Assignee  Title
CN106997379A (en) *  2017-03-20  2017-08-01  Hangzhou Dianzi University (杭州电子科技大学)  A method for merging similar texts based on image-text click volume
CN107832468A (en) *  2017-11-29  2018-03-23  Baidu Online Network Technology (Beijing) Co., Ltd. (百度在线网络技术（北京）有限公司)  Demand recognition method and device

Legal Events
Date  Code  Title  Description
PB01  Publication
C06  Publication
SE01  Entry into force of request for substantive examination
CB03  Change of inventor or designer information
Inventor after: Jia Hailong, Hu Zhenwen, Chen Ning, Gao Zheng, Tian Wenqiang
Inventor before: Jia Hailong
TA01  Transfer of patent application right
Effective date of registration: 2017-12-28
Address after: No. 122 Luoshi Road, Hongshan District, Wuhan City, Hubei Province 430070
Applicant after: Wuhan University of Technology; Xinxiang University
Address before: Xinxiang College, Jinsui Avenue East, Xinxiang City, Henan Province 453000
Applicant before: Xinxiang University