CN106445989A - Query click graph-based search recommendation model optimization - Google Patents

Query click graph-based search recommendation model optimization

Info

Publication number: CN106445989A
Authority: CN (China)
Prior art keywords: query, probability, click, random walk
Application number: CN201610390608.9A
Other languages: Chinese (zh)
Inventor: 贾海龙
Original Assignee: 新乡学院
Application filed by: 新乡学院
Priority: CN201610390608.9A
Publication: CN106445989A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation

Abstract

The invention discloses a search recommendation model optimization based on the query click graph. Compared with the prior art, the optimization proceeds as follows. First, the user's search behavior and intent are analyzed, the data extraction methods and representation of search behavior are studied, and a query term association method based on user query logs is proposed through in-depth mining of query sessions. Second, the theory and computation of the traditional query-click bipartite graph recommendation model are analyzed. The query-click bipartite graph is structurally simple, highly practical, and does not depend on computing similarity between search terms and web pages, so it is widely applicable in search engines. The invention uses click frequency in place of the number of clicks to construct the edge weights of the bipartite graph, which keeps the weights from being biased by excessive invalid clicks and lets the recommendation system reach a stable state as far as possible. Finally, experiments and data analysis demonstrate the superiority of the improved model in three respects.

Description

Search recommendation model optimization based on the query click graph

Technical field

The present invention relates to a graph model optimization scheme, and in particular to a search recommendation model optimization based on the query click graph.

Background art

Many scholars have studied and analyzed users' search logs, building query recommendation models mainly from query term association and from the query-click bipartite graph. Because users differ in their level of knowledge, and because they casually submit non-standard query terms and click on irrelevant results while searching, query logs contain a large amount of inaccurate, non-standard and unrepresentative query information. This inaccurate information accumulates over time; if it is mined with traditional recommendation methods, the system will recommend queries that are inaccurate or that users do not accept. In the big data era, therefore, mining accurate, representative, high-quality query information from large-scale logs is an important foundation for building query recommendation.

Summary of the invention

The purpose of the present invention is to solve the above problems by providing a search recommendation model optimization based on the query click graph.

The present invention achieves this purpose through the following technical solution:

The present invention comprises optimization objective construction, weight reconstruction, and recommendation algorithm optimization.

Optimization objective construction:

According to the above analysis, the most-clicked page among the search results is the most important search result for a query. We first give a formal description of the relations among the elements of the query-click bipartite graph:

Definition 1. Let the query-click bipartite graph be $G = \{Q \cup U, E, W\}$, where $Q$ is the set of query session nodes, $U$ is the set of query result web pages, $E$ is the set of edges in the graph, and $W$ is the set of edge weights. The weight $w_{ij}$ of edge $e_{ij}$ in the query-click bipartite graph is then constructed as follows:

The optimization objective of the query-click bipartite graph:

Formula (1) states the following. For a query session node $q_i$ ($q_i \in Q$), the binary optimization variable $c_{ij}$ indicates whether edge $e_{ij}$ of the query click graph is selected. The objective maximizes the total weight of the selected edges, subject to the constraint that every retained edge carries the maximum query-page relevance weight, i.e. when $c_{ij} = 1$, $w_{ij} \ge w_{ik}$ and $w_{ij} \ge w_{kj}$. When this objective is met, the query click graph retains as many as possible of the edges that carry the maximum click count for a query or a web page.

Under objective (1), a query or a web page may select several edges that share the same maximum weight. If we introduce the degree of each node, $d(i) = \sum_j \delta(i, j)$ and $d(j) = \sum_i \delta(i, j)$, then formula (1) is equivalent to formula (2), where $\delta(i, j)$ indicates whether an edge exists between query node $q_i$ and web page node $u_j$ (1 if it exists, 0 otherwise).

The equivalent form of the optimization objective for the query-click core graph:

The constraints of objective (2) explicitly allow a query node in the query-click core graph to connect to several web page nodes at once, and likewise allow a web page node in the core graph to connect to several query nodes.

Weight reconstruction:

As in Definition 1, the query-click bipartite graph is $G = \{Q \cup U, E, W\}$. First, suppose that $a_{ij}$ users have performed a click. Conventionally, the weight of the edge connecting a query and a web page is the number of clicks $c_{ij}$ on web page $u_j$ for query $q_i$, i.e. $w_{ij} = c_{ij}$. Our analysis shows that when browsing search results some users are more active and click often, while others click rarely. Because of this difference in user activity, the click count does not truly reflect the relevance between a query and a web page. To avoid this bias, we replace the click count with the user frequency, i.e. $w_{ij} = a_{ij}$. Second, suppose that for the same query users click on two web pages $u_1$ and $u_2$ with equal click counts. If $u_1$ has also been clicked from more queries, then a click on $u_1$ is less important than a click on $u_2$, that is, $u_1$ is less relevant to the query. An inverse query frequency can therefore be defined for each web page, namely:

In the formula, $N$ is the number of queries and $N_q$ is the number of queries from which the web page was clicked. The weight is then set to $w_{ij} = c_{ij} \cdot \mathrm{iqf}(u_j)$.

On this basis, the weights can also be constructed using transition probability theory. The following two probabilities are computed first:

(1) The probability of a transition from a query session to a related web page:

(2) The probability of a transition from a related web page to a query session:

Because transition probabilities are asymmetric, i.e. $P(u_j \mid q_i) \ne P(q_i \mid u_j)$, linear interpolation or a product can be used to make the weight symmetric, for example $w_{ij} = \alpha P(q_i \mid u_j) + (1 - \alpha) P(u_j \mid q_i)$ (where $\alpha$ is a tunable parameter), or $w_{ij} = P(q_i \mid u_j) \cdot P(u_j \mid q_i)$.

Recommendation algorithm optimization:

(1) Basic model: The most basic query recommendation method recommends queries according to click co-occurrence in the query-click bipartite graph. Extending this idea, queries that share the same clicks are similar, and we propagate this similarity with a random walk. Starting from the initial query, the walk moves to adjacent queries according to the click probabilities on the query-click bipartite graph and continues from those adjacent queries, iterating until termination. The random walk model has two modes, forward and backward, and both can be expressed with the same set of definitions.

As before, the query-click bipartite graph is defined as $G = \{Q \cup U, E, W\}$. Let $M$ be the number of query nodes, $N$ the number of web page nodes, and $w_{ij}$ the click weight between query $q_i$ and web page $u_j$. Build the probability transition matrix $A$ of size $(M+N) \times (M+N)$ with node transition probability $A[i, j] = P(q_j \mid q_i)$, then introduce the self-transition probability $s$. The new transition probability $P(v_j \mid v_i)$ is defined as in formula (6).

Given a start node $v_i$, either a forward or a backward random walk iteration can be performed. They differ as follows. The forward walk finds the query $q'$ most likely to be reached from query $q$ on the query-click bipartite graph, and considers the probability of walking from the start node $v_i$ to the other nodes. The backward walk finds the queries most likely to arrive at the initial query node $q$, and considers the probability of walking from the other nodes to the start node $v_i$.

(2) Problems identified: On top of the above algorithm, the parameters $n$ and $s$ must be set. $n$ is the number of nodes of the bipartite graph brought into the walk. $s$ is the self-transition probability, which keeps the walk from moving to other nodes too quickly during transitions; the value of $s$ is set to 0.9. In query recommendation, the larger $n$ is, the more nodes are brought into the walk, possibly every node in the graph. This causes the "recommendation topic drift" problem: the queries reached by the walk are only weakly related to the user's query. Specifically, the following problems exist:

For the forward walk, after several iterations the transition probability flows to the more popular queries, so the recommended queries become inaccurate or irrelevant. For example, for the query "People Weekly", the walk may end up recommending more popular publications such as "Global Personage" and "Epoch Personage". With backward walk propagation, the probabilities tend to become uniform, so misspelled or low-frequency queries may be recommended.

Traditional recommendation models cannot effectively distinguish queries with different intents. Query recommendation in the random walk model relies on propagating probabilistic similarity, which can push some tightly associated or nearly identical queries to the top of the recommendations, making the results monotonous and reducing recommendation diversity.

(3) Algorithm optimization: To solve the problems of the traditional random walk recommendation model described above, a random walk recommendation model based on the query click graph is proposed, which prunes the inaccurate and unrepresentative recommendations of the traditional model. The iterative random walk algorithm yields the probability distribution over the query and web page nodes; for each web page, the corresponding queries in the query click graph can then be selected and recommended to the user.

The random walk recommendation algorithm based on the query click graph:

The convergence of the forward and backward random walk algorithms is as follows:

In the forward random walk, the transition probability matrix converges according to the stationary distribution of a Markov chain. Given the transition matrix $A$, if there exists an iteration count $n$ such that $A^n[i, j] > 0$, then the Markov chain formed by all nodes is homogeneous, aperiodic and irreducible, and has a unique stationary distribution. The forward random walk iteration can then be written as $v^T(n+1) = v^T(n) A = v^T(0) A^{n+1}$. As $A^n$ approaches the stationary distribution, $A^n[i, j] \to \pi_j$, where the stationary distribution is $\pi^T = [\pi_1, \pi_2, \ldots, \pi_{M+N}]$, so $\lim_{n \to \infty} v(n) = \pi$. It is easy to see that when $v(0)$ is a probability distribution, $v^T(n) A$ is necessarily a stationary random process.

For the backward random walk, the literature that first proposed the backward random walk model did not give a convergence proof. We likewise assume that the stochastic matrix $A$ has a stationary distribution. It is easy to see that even if $v(0)$ is a probability distribution, $A\,v(n)$ is not necessarily a probability distribution, so the vector $v$ is normalized at every iteration. Every row of the probability transition matrix $A$ sums to 1 and all transition probabilities in $A$ are greater than 0, so when $v(0)$ is the uniform distribution the iteration normalizes according to the rows of $A$, i.e. $\mathrm{norm}(A\,v(n)) = v(0)$, and the algorithm keeps returning the uniform distribution. If the whole query-click bipartite graph is strongly connected, so that any two nodes communicate, every entry of $v$ stays greater than zero during the iteration, and repeated iteration normalizes $v$. Formally, the iteration left-multiplies by the matrix $A$, so after the $n$-th iteration the value is $A^n v(0)$ scaled by a normalization factor $Z$. If $A$ has a stationary distribution, then the rows of $A^n$ approach $[\pi_1, \pi_2, \ldots, \pi_{M+N}]$, and the resulting vector has the same length as $v^T$; because $Z$ is the normalization factor, if $v(0)$ is a probability distribution the result is the uniform distribution.

The state of maximum entropy of a system is its uniform state, so the backward random walk model in essence tends to return the system to its original starting state, whereas the forward random walk model extends forward through repeated iteration and eventually finds a steady state. In query recommendation, the steady state of the forward random walk model is the one in which the query nodes with the most clicks in the whole graph rank first, and the backward random walk model reaches its stationary distribution when all nodes in the graph have the same probability. In the recommendation process, therefore, convergence to either the uniform distribution or the popular-node distribution is unfavorable to query recommendation. A suitable iteration count and self-transition probability are set in advance, for example $n = 10$ and $s = 0.9$, to limit the range of the random walk in the graph.

The beneficial effects of the present invention are as follows:

The present invention is a search recommendation model optimization based on the query click graph. Compared with the prior art, the invention first analyzes users' search behavior and intent, studies the data extraction methods and representation of search behavior, and, by mining query sessions in depth, proposes a query term association method based on user query logs. Second, it analyzes in detail the theory and computation of the traditional query-click bipartite graph recommendation model. Because the query-click bipartite graph is structurally simple and practical, and its implementation does not depend on computing similarity between search terms and web pages, it is widely used in search engines. The invention proposes using click frequency in place of the number of clicks to build the edge weights of the bipartite graph, which keeps the weights from being biased by excessive invalid clicks and brings the recommendation system as close as possible to a stable state. Finally, experiments and data analysis demonstrate the superiority of the improved model in three respects.

Specific embodiment

The invention will be further described below:

The present invention comprises optimization objective construction, weight reconstruction, and recommendation algorithm optimization.

Optimization objective construction:

According to the above analysis, the most-clicked page among the search results is the most important search result for a query. We first give a formal description of the relations among the elements of the query-click bipartite graph:

Definition 1. Let the query-click bipartite graph be $G = \{Q \cup U, E, W\}$, where $Q$ is the set of query session nodes, $U$ is the set of query result web pages, $E$ is the set of edges in the graph, and $W$ is the set of edge weights. The weight $w_{ij}$ of edge $e_{ij}$ in the query-click bipartite graph is then constructed as follows:

The optimization objective of the query-click bipartite graph:

Formula (1) states the following. For a query session node $q_i$ ($q_i \in Q$), the binary optimization variable $c_{ij}$ indicates whether edge $e_{ij}$ of the query click graph is selected. The objective maximizes the total weight of the selected edges, subject to the constraint that every retained edge carries the maximum query-page relevance weight, i.e. when $c_{ij} = 1$, $w_{ij} \ge w_{ik}$ and $w_{ij} \ge w_{kj}$. When this objective is met, the query click graph retains as many as possible of the edges that carry the maximum click count for a query or a web page.

Under objective (1), a query or a web page may select several edges that share the same maximum weight. If we introduce the degree of each node, $d(i) = \sum_j \delta(i, j)$ and $d(j) = \sum_i \delta(i, j)$, then formula (1) is equivalent to formula (2), where $\delta(i, j)$ indicates whether an edge exists between query node $q_i$ and web page node $u_j$ (1 if it exists, 0 otherwise).

The equivalent form of the optimization objective for the query-click core graph:

The constraints of objective (2) explicitly allow a query node in the query-click core graph to connect to several web page nodes at once, and likewise allow a web page node in the core graph to connect to several query nodes.
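The selection constraint stated for formula (1) can be illustrated with a minimal Python sketch. It assumes positive weights and reads the constraint literally: an edge is kept only when its weight is the maximum both among the edges of its query and among the edges of its web page, and ties are all kept. The function name `select_core_edges` and the sample weights are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def select_core_edges(weights):
    """Keep edge (q, u) only if its weight is the maximum for both q and u.

    `weights` maps (query, page) -> positive weight. Ties are all kept,
    so a node may retain several edges of equal maximum weight.
    """
    best_for_query = defaultdict(float)
    best_for_page = defaultdict(float)
    for (q, u), w in weights.items():
        best_for_query[q] = max(best_for_query[q], w)
        best_for_page[u] = max(best_for_page[u], w)
    # c_ij = 1 exactly when w_ij >= w_ik for all k and w_ij >= w_kj for all k.
    return {
        (q, u): w
        for (q, u), w in weights.items()
        if w >= best_for_query[q] and w >= best_for_page[u]
    }


# Hypothetical sample graph.
weights = {
    ("q1", "u1"): 5, ("q1", "u2"): 5, ("q1", "u3"): 1,
    ("q2", "u2"): 3, ("q2", "u3"): 4,
}
core = select_core_edges(weights)
```

With these sample weights the result keeps both tied edges of `q1` as well as the edge `(q2, u3)`, which matches the remark that one node may retain several maximum-weight edges.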

Analysis of the above optimization objective shows that this problem is both related to and distinct from the traditional stable matching problem. The core idea of stable matching is to reach a stable state: once matching is finished, there no longer exist two agents, one from each set, who would both prefer each other to their assigned partners. Familiar real-world examples such as matchmaking between men and women, companies and interns, and buyers and sellers are all developed from stable market matching theory. The two-sided model and the deferred acceptance algorithm are the two cornerstones of stable matching theory.

Two-sided matching model: the main function of many markets and social systems is to match one agent with another, for example students and schools, employees and companies, or marriageable men and women. Such market matching divides into "one-sided market matching" (Single-Sided Market Match) and "two-sided market matching" (Two-Sided Market Match). One-sided market matching means that the market contains only one set, whose members match one another according to their own preferences. However, the "roommate" phenomenon can make one-sided matching unstable. Suppose there are four roommates {A, B, C, D}, where A most prefers B, B most prefers C, C most prefers A, and all three rank D last. In this case no pairing is stable, because whoever is paired with D can break the current match and re-match with someone else, and the new match will succeed, so the market never becomes stable (Gale & Shapley, 1962). The two-sided matching model was first proposed by Gale and Shapley (1962) from their study of college admissions and the stability of marriage. A "two-sided market" is a market with two classes of agents in which agents in the first set can match only with agents in the second set. They proved that in such a market, as long as individual preferences are complete and transitive and the market is free enough to let individuals attempt any potential match, the process can iterate until every individual has a partner and the whole market is stable. Because the two-sided matching model guarantees a stable matching, it has been widely applied in theory and in practice.
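The deferred acceptance algorithm named above as a cornerstone of stable matching theory can be sketched as follows. This is a minimal Python version of the textbook Gale-Shapley procedure, assuming both sides have the same number of agents and complete preference lists; the function name, variable names and sample preferences are illustrative only.

```python
def deferred_acceptance(proposer_prefs, receiver_prefs):
    """Return a stable matching as {receiver: proposer}.

    Both arguments map each agent to a complete preference list over the
    other side, most preferred first.
    """
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # next receiver index to try
    engaged = {}  # receiver -> proposer currently held
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = p            # receiver was unmatched: hold the offer
        elif rank[r][p] < rank[r][current]:
            engaged[r] = p            # receiver trades up; old partner is free
            free.append(current)
        else:
            free.append(p)            # offer rejected; p tries the next choice
    return engaged


# Hypothetical example: two students propose to two schools.
students = {"s1": ["a", "b"], "s2": ["a", "b"]}
schools = {"a": ["s2", "s1"], "b": ["s1", "s2"]}
matching = deferred_acceptance(students, schools)
```

In this example both students prefer school `a`, which prefers `s2`, so `s1` is rejected once and ends up at `b`; no student-school pair would both rather be matched to each other.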

The improved recommendation model for the query-click bipartite graph proposed in this chapter is similar to "one-sided market matching" in the stable matching problem. Viewed as a stable matching problem over query and web page nodes, the relevant aspects are as follows:

(1) The numbers of query session nodes and returned web page nodes may differ, so it cannot be guaranteed that every node has a matching partner;

(2) Most query sessions have a click preference only for their own related web pages, not for all web pages;

(3) Edges with the same click count (weight) may appear in the query-click bipartite graph, in which case a proper matching cannot be obtained.

Weight reconstruction:

As in Definition 1, the query-click bipartite graph is $G = \{Q \cup U, E, W\}$. First, suppose that $a_{ij}$ users have performed a click. Conventionally, the weight of the edge connecting a query and a web page is the number of clicks $c_{ij}$ on web page $u_j$ for query $q_i$, i.e. $w_{ij} = c_{ij}$. Our analysis shows that when browsing search results some users are more active and click often, while others click rarely. Because of this difference in user activity, the click count does not truly reflect the relevance between a query and a web page. To avoid this bias, we replace the click count with the user frequency, i.e. $w_{ij} = a_{ij}$. Second, suppose that for the same query users click on two web pages $u_1$ and $u_2$ with equal click counts. If $u_1$ has also been clicked from more queries, then a click on $u_1$ is less important than a click on $u_2$, that is, $u_1$ is less relevant to the query. An inverse query frequency can therefore be defined for each web page, namely:

In the formula, $N$ is the number of queries and $N_q$ is the number of queries from which the web page was clicked. The weight is then set to $w_{ij} = c_{ij} \cdot \mathrm{iqf}(u_j)$.
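The two weighting ideas above, user frequency and inverse query frequency, can be sketched in Python. The formula for the inverse query frequency is not reproduced in this text, so the sketch assumes the conventional form iqf(u) = log(N / N_q), by analogy with inverse document frequency; that form, the record layout `(user, query, page)` and the name `build_weights` are assumptions rather than details from the patent.

```python
import math
from collections import defaultdict


def build_weights(click_log):
    """click_log: iterable of (user, query, page) click records.

    Returns (clicks, users, iqf):
      clicks[(q, u)] = c_ij, raw number of clicks
      users[(q, u)]  = a_ij, number of distinct users who clicked
      iqf[u]         = log(N / N_q)  (assumed form, see lead-in)
    """
    clicks = defaultdict(int)
    user_sets = defaultdict(set)
    queries_per_page = defaultdict(set)
    all_queries = set()
    for user, q, u in click_log:
        clicks[(q, u)] += 1
        user_sets[(q, u)].add(user)
        queries_per_page[u].add(q)
        all_queries.add(q)
    users = {edge: len(s) for edge, s in user_sets.items()}
    n = len(all_queries)  # N: number of distinct queries
    iqf = {u: math.log(n / len(qs)) for u, qs in queries_per_page.items()}
    return dict(clicks), users, iqf


# Hypothetical log: one very active user inflates the click count on u1.
log = [
    ("alice", "q1", "u1"), ("alice", "q1", "u1"), ("alice", "q1", "u1"),
    ("bob", "q1", "u2"), ("carol", "q1", "u2"),
    ("bob", "q2", "u2"),
]
clicks, users, iqf = build_weights(log)
```

In this sample, `u1` has the higher click count for `q1` but only one distinct user, so the user-frequency weight ranks `u2` higher; `u2` is clicked from every query, so its assumed iqf is zero.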

On this basis, the weights can also be constructed using transition probability theory. The following two probabilities are computed first:

(1) The probability of a transition from a query session to a related web page:

(2) The probability of a transition from a related web page to a query session:

Because transition probabilities are asymmetric, i.e. $P(u_j \mid q_i) \ne P(q_i \mid u_j)$, linear interpolation or a product can be used to make the weight symmetric, for example $w_{ij} = \alpha P(q_i \mid u_j) + (1 - \alpha) P(u_j \mid q_i)$ (where $\alpha$ is a tunable parameter), or $w_{ij} = P(q_i \mid u_j) \cdot P(u_j \mid q_i)$.
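A short Python sketch of the two symmetrization options follows. The two transition probability formulas are not reproduced in this text, so the sketch assumes the usual normalizations $P(u_j \mid q_i) = w_{ij} / \sum_k w_{ik}$ and $P(q_i \mid u_j) = w_{ij} / \sum_k w_{kj}$; the function name `symmetric_weights` and the sample weights are hypothetical.

```python
from collections import defaultdict


def symmetric_weights(weights, alpha=0.5):
    """Return (interpolated, product) symmetric weights for every edge.

    Assumes P(u_j|q_i) = w_ij / sum_k w_ik and P(q_i|u_j) = w_ij / sum_k w_kj.
    """
    query_total = defaultdict(float)
    page_total = defaultdict(float)
    for (q, u), w in weights.items():
        query_total[q] += w
        page_total[u] += w
    interpolated, product = {}, {}
    for (q, u), w in weights.items():
        p_u_given_q = w / query_total[q]
        p_q_given_u = w / page_total[u]
        # w_ij = alpha * P(q_i|u_j) + (1 - alpha) * P(u_j|q_i)
        interpolated[(q, u)] = alpha * p_q_given_u + (1 - alpha) * p_u_given_q
        # w_ij = P(q_i|u_j) * P(u_j|q_i)
        product[(q, u)] = p_q_given_u * p_u_given_q
    return interpolated, product


# Hypothetical sample weights.
weights = {("q1", "u1"): 1, ("q1", "u2"): 3, ("q2", "u2"): 1}
interp, prod = symmetric_weights(weights, alpha=0.5)
```

The interpolation keeps a tunable balance between the two directions, while the product is large only when the edge is strong from both the query side and the page side.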

Building the weights of the query-click bipartite graph from user frequency, as proposed here, keeps the weights from being biased by an excessive number of invalid clicks. A further benefit is that all edge weights in the bipartite graph are integers, which simplifies solving the subsequent optimization. In addition, using the number of users in the search log as the weight throughout the query-click bipartite graph makes the result intuitive and easy to understand.

Recommendation algorithm optimization:

Having analyzed the mathematical model and the corresponding algorithms of the query-click bipartite graph above, we propose a new recommendation algorithm based on the query click graph. The algorithm filters out inaccurate and unrepresentative candidate queries, and avoids the traditional algorithms' neglect of the equivalence and representativeness of queries within the same group. It also avoids the bias caused by an excessive number of invalid clicks, which improves the precision of the query recommendation model.

(1) Basic model: The most basic query recommendation method recommends queries according to click co-occurrence in the query-click bipartite graph. Extending this idea, queries that share the same clicks are similar, and we propagate this similarity with a random walk. Starting from the initial query, the walk moves to adjacent queries according to the click probabilities on the query-click bipartite graph and continues from those adjacent queries, iterating until termination. The random walk model has two modes, forward and backward, and both can be expressed with the same set of definitions.

As before, the query-click bipartite graph is defined as $G = \{Q \cup U, E, W\}$. Let $M$ be the number of query nodes, $N$ the number of web page nodes, and $w_{ij}$ the click weight between query $q_i$ and web page $u_j$. Build the probability transition matrix $A$ of size $(M+N) \times (M+N)$ with node transition probability $A[i, j] = P(q_j \mid q_i)$, then introduce the self-transition probability $s$. The new transition probability $P(v_j \mid v_i)$ is defined as in formula (6).
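The construction of the transition matrix with a self-transition probability can be sketched in Python. Formula (6) is not reproduced in this text, so the sketch assumes a common form from the click-graph random walk literature: $A[i][i] = s$ and, for $i \ne j$, $A[i][j] = (1 - s)\, w_{ij} / \sum_k w_{ik}$, with one-step transitions between all $M + N$ nodes and edges treated as undirected. That form and the name `build_transition_matrix` are assumptions.

```python
def build_transition_matrix(queries, pages, weights, s=0.9):
    """Row-stochastic matrix over the node list [queries..., pages...].

    Assumed form: A[i][i] = s and, for i != j,
    A[i][j] = (1 - s) * w_ij / sum_k w_ik.
    """
    nodes = list(queries) + list(pages)
    index = {v: i for i, v in enumerate(nodes)}
    size = len(nodes)
    w = [[0.0] * size for _ in range(size)]
    for (q, u), weight in weights.items():
        i, j = index[q], index[u]
        w[i][j] = weight  # query -> page
        w[j][i] = weight  # page -> query (undirected click edge)
    a = [[0.0] * size for _ in range(size)]
    for i in range(size):
        total = sum(w[i])
        if total == 0:
            a[i][i] = 1.0  # isolated node: the walk stays put
            continue
        for j in range(size):
            a[i][j] = (1 - s) * w[i][j] / total
        a[i][i] = s  # self-transition probability
    return nodes, a


# Hypothetical sample graph with M = 2 queries and N = 2 pages.
queries, pages = ["q1", "q2"], ["u1", "u2"]
weights = {("q1", "u1"): 1, ("q1", "u2"): 3, ("q2", "u2"): 1}
nodes, A = build_transition_matrix(queries, pages, weights, s=0.9)
```

Each row sums to 1, so the matrix can be used directly in the forward iteration; a large `s` such as 0.9 keeps most of the probability mass near the start node.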

Given a start node $v_i$, either a forward or a backward random walk iteration can be performed. They differ as follows. The forward walk finds the query $q'$ most likely to be reached from query $q$ on the query-click bipartite graph, and considers the probability of walking from the start node $v_i$ to the other nodes. The backward walk finds the queries most likely to arrive at the initial query node $q$, and considers the probability of walking from the other nodes to the start node $v_i$.
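The two walk directions can be sketched in Python as follows. The sketch assumes that a forward step right-multiplies the row vector by the matrix ($v^T A$) and that a backward step left-multiplies and then renormalizes ($A v$ divided by its sum); the 4-node matrix `A`, the function names and the choice of 10 steps are hypothetical.

```python
def forward_step(v, a):
    """One forward step: v_next[j] = sum_i v[i] * a[i][j], i.e. v^T A."""
    size = len(v)
    return [sum(v[i] * a[i][j] for i in range(size)) for j in range(size)]


def backward_step(v, a):
    """One backward step: A v, renormalized so the entries sum to 1."""
    size = len(v)
    raw = [sum(a[i][j] * v[j] for j in range(size)) for i in range(size)]
    z = sum(raw)  # normalization factor
    return [x / z for x in raw]


def walk(start, a, steps, step_fn):
    """Start with all probability on node `start` and apply `steps` steps."""
    v = [0.0] * len(a)
    v[start] = 1.0
    for _ in range(steps):
        v = step_fn(v, a)
    return v


# Hypothetical graph: nodes 0-1 are queries, nodes 2-3 are pages, s = 0.9.
A = [
    [0.9, 0.0, 0.025, 0.075],
    [0.0, 0.9, 0.0, 0.1],
    [0.1, 0.0, 0.9, 0.0],
    [0.075, 0.025, 0.0, 0.9],
]
forward = walk(0, A, steps=10, step_fn=forward_step)
backward = walk(0, A, steps=10, step_fn=backward_step)
```

Ranking the query entries of `forward` or `backward` (excluding the start node) would give recommendation candidates under each direction; capping `steps` keeps the walk close to the start node.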

(2) Problems identified: On top of the above algorithm, the parameters $n$ and $s$ must be set. $n$ is the number of nodes of the bipartite graph brought into the walk. $s$ is the self-transition probability, which keeps the walk from moving to other nodes too quickly during transitions; the value of $s$ is set to 0.9. In query recommendation, the larger $n$ is, the more nodes are brought into the walk, possibly every node in the graph. This causes the "recommendation topic drift" problem: the queries reached by the walk are only weakly related to the user's query. Specifically, the following problems exist:

For the forward walk, after several iterations the transition probability flows to the more popular queries, so the recommended queries become inaccurate or irrelevant. For example, for the query "People Weekly", the walk may end up recommending more popular publications such as "Global Personage" and "Epoch Personage". With backward walk propagation, the probabilities tend to become uniform, so misspelled or low-frequency queries may be recommended.

Traditional recommendation models cannot effectively distinguish queries with different intents. Query recommendation in the random walk model relies on propagating probabilistic similarity, which can push some tightly associated or nearly identical queries to the top of the recommendations, making the results monotonous and reducing recommendation diversity.

The traditional random walk recommendation algorithm is as follows:

(3) Algorithm optimization: To solve the problems of the traditional random walk recommendation model described above, a random walk recommendation model based on the query click graph is proposed, which prunes the inaccurate and unrepresentative recommendations of the traditional model. The iterative random walk algorithm yields the probability distribution over the query and web page nodes; for each web page, the corresponding queries in the query click graph can then be selected and recommended to the user.

The random walk recommendation algorithm based on the query click graph:

The convergence of the forward and backward random walk algorithms is as follows:

In forward direction random walk, transition probability matrix is carried out using the markovian Stationary Distribution in stochastic process Convergence;Given shift-matrix A, if there is iterationses n, works as AnDuring [i, j] > 0, then the Ma Erke being made up of all nodes Husband's chain is homogeneous aperiodic and irreducible, with unique Stationary Distribution;Now forward direction random walk iterative model can turn It is changed into vT(n+1)=vT(n) A=v (0) An;Work as AnTend to A [i, j]=π when Stationary Distributionj, wherein each stage is steady Distribution probability is πT=[π12,...,πM+N], so limn→∞V (n)=π, it is probability distribution to be apparent from as probability v (0) When, vTN () A must be stationary binomial random process;

For the backward random walk, the literature that first proposed the backward random walk model did not give a convergence proof. As before, we assume that the stochastic matrix $A$ has a stationary distribution. Even if $v(0)$ is a probability distribution, $A\,v(n)$ is not necessarily a probability distribution, so the vector $v$ is normalized at every iteration by setting $v(n+1) = \mathrm{norm}(A\,v(n))$. Because every row of the probability transition matrix $A$ sums to 1 and all transition probabilities in $A$ are greater than 0, when $v(0)$ is the uniform distribution the iteration normalizes according to the rows of $A$, that is, $\mathrm{norm}(A\,v(n)) = v(0)$, and the algorithm keeps returning the uniform distribution.

If the whole query click bipartite graph is strongly connected, any two nodes can reach each other, so every entry of $v$ stays greater than zero during the iteration and $v$ can be normalized at every step. Formally, $v(n+1) = A\,v(n)/Z$, where $Z$ is the normalization factor. Each iteration left-multiplies by $A$, so after the $n$-th iteration the value is $v(n) = A^{n}\,v(0)/Z$. If $A$ has a stationary distribution, every row of $A^{n}$ tends to $\pi^{T} = [\pi_1, \pi_2, \ldots, \pi_{M+N}]$, a row vector of the same length as $v^{T}$; because $Z$ is the normalization factor, if $v(0)$ is a probability distribution the normalized result is the uniform distribution.

The state of maximum entropy of a system is exactly its uniformly distributed state, so the backward random walk model in essence returns the system to its original starting state, whereas the forward random walk model extends the system forward through repeated iteration until it finds a steady state. In query recommendation, the forward random walk model reaches its steady state when the query nodes with the most clicks in the whole graph are ranked first, and the backward random walk model reaches its stationary distribution when all nodes in the graph have the same probability. Convergence to either the uniform probabilities or the popular-node probabilities is therefore unfavorable to query recommendation. A suitable number of iterations and self-transition probability, for example $n = 10$ and $s = 0.9$, are set in advance to control the range of the random walk on the graph.
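The two iterations discussed above can be illustrated with a minimal sketch. The 3×3 matrix `A`, the helper names `forward_step`, `backward_step` and `iterate`, and the iteration counts are illustrative assumptions, not values from the experiments; the sketch only shows that, for a positive row-stochastic matrix, the forward walk settles on a stationary distribution while the normalized backward walk drifts to the uniform distribution.

```python
def forward_step(A, v):
    """One forward step: v^T(n+1) = v^T(n) A."""
    size = len(A)
    return [sum(v[i] * A[i][j] for i in range(size)) for j in range(size)]


def backward_step(A, v):
    """One backward step: v(n+1) = norm(A v(n)), rescaled to sum to 1."""
    size = len(A)
    raw = [sum(A[i][j] * v[j] for j in range(size)) for i in range(size)]
    z = sum(raw)  # normalization factor Z
    return [x / z for x in raw]


def iterate(step, A, v, n):
    """Apply the given step function n times."""
    for _ in range(n):
        v = step(A, v)
    return v


# Hypothetical row-stochastic matrix with strictly positive entries.
A = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.4, 0.5],
]
start = [1.0, 0.0, 0.0]  # all probability mass on the first node

forward_200 = iterate(forward_step, A, start, 200)
forward_201 = forward_step(A, forward_200)
backward_200 = iterate(backward_step, A, start, 200)

print("forward after 200 steps: ", [round(x, 4) for x in forward_200])
print("backward after 200 steps:", [round(x, 4) for x in backward_200])
```

Stopping both loops after a small number of steps, as the text suggests with n = 10, keeps the result between the start vector and these two limits.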

Experiment and analysis

This section verifies by experiment the performance of the query click graph recommendation model optimization algorithm proposed in this chapter. Using the queries given in the experimental data set and an analysis of the click graph data, the traditional method and the optimized recommendation method are compared in three respects: query-click relevance, recommendation performance, and diversification of the recommendation results. The comparison demonstrates the effectiveness of the optimized retrieval recommendation algorithm based on the query click graph.

Analysis of experimental data

The experimental data set is a web query log provided by the BeiJing ZhongKe laboratory. After cleaning and analyzing the data set, the log file is 47 MB in size and contains 1135274 user query records, 3675413 clicks in total, and 176687 query terms in total. Statistical analysis of query term frequency shows that 28745 query terms were searched more than 5 times; we classify these as high-frequency terms, and together they account for 883752 query records. Thus the high-frequency query terms, which make up 16.3% of all query terms, account for 77.8% of all queries. When preprocessing the data we set the threshold to 5 and filter out the 83.7% of terms that are low-frequency (searched fewer than 5 times), which removes only 22.2% of the query information. A larger data space makes the recommendation model more complex; after ignoring 83.7% of the query terms, the sample space of the model is about 1/6 of the original. Moreover, low-frequency query terms correspond to low-quality queries, and using a low-frequency term as the starting point of the iteration generally does not produce a useful recommendation. The basic information of the data after pruning is shown in Table 1.
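The percentages quoted above follow directly from the four counts in the paragraph. The short check below recomputes them; the variable names are illustrative and only the four input counts come from the text.

```python
# Counts reported for the query log.
total_terms = 176687          # query terms in total
total_queries = 1135274       # user query records
high_freq_terms = 28745       # terms searched more than 5 times
high_freq_queries = 883752    # query records covered by those terms

term_share = high_freq_terms / total_terms
query_share = high_freq_queries / total_queries
shrink_factor = total_terms / high_freq_terms

print(f"high-frequency share of terms:   {term_share:.1%}")
print(f"high-frequency share of queries: {query_share:.1%}")
print(f"terms filtered out:              {1 - term_share:.1%}")
print(f"query information removed:       {1 - query_share:.1%}")
print(f"remaining sample space:          about 1/{shrink_factor:.1f} of the original")
```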

Table 1. Statistics of the query click log data

Based on the query click graph information mined from the data set, we carry out a case analysis on sampled edges of different frequencies. Users interact with a search engine in three main ways: (1) searching for a website by its main domain name; (2) searching for a name or a fixed term to find an authoritative page, for example using a Baidupedia page for a related query; (3) searching for a main description of the information to find its source, for example searching for lyrics by song title. We find that the query click graph retains the webpages with the most clicks, which are the pages users are most interested in, and that the corresponding submitted query terms describe the user's need accurately.

To analyze how the optimized query click graph is distributed over different frequencies, we classify the edges by the weight of the edge between a "query" node and a "webpage" node into strong-weight, high-weight, medium-weight, low-weight and weak-weight edges, whose user click frequencies are respectively [1000, +∞), [100, 1000], [10, 100], [2, 10] and [1, 1]. We then compute how the traditional bipartite graph and the improved bipartite graph are distributed over these five classes, as shown in Table 2; the number in parentheses in the table is the proportion of the improved bipartite graph relative to the traditional bipartite graph. The table shows that: (1) the improved graph retains a higher proportion of strong-weight and weak-weight edges than of the other three types, because strong-weight and weak-weight edges represent the most strongly related query clicks; (2) high-weight, medium-weight and low-weight edges are retained in lower proportions, which indicates lower relevance, so they can be removed.
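The five edge classes can be expressed as a small lookup. This is only a sketch: the intervals in the text share their end points (for example 100 appears in both [100, 1000] and [10, 100]), so the code assumes that a shared end point belongs to the higher class. The function name `edge_class` and the sample frequencies are hypothetical.

```python
from collections import Counter


def edge_class(user_click_frequency: int) -> str:
    """Map a user click frequency to one of the five edge classes.

    Assumption: boundary values 10, 100 and 1000 go to the higher class.
    """
    if user_click_frequency >= 1000:
        return "strong"
    if user_click_frequency >= 100:
        return "high"
    if user_click_frequency >= 10:
        return "medium"
    if user_click_frequency >= 2:
        return "low"
    if user_click_frequency == 1:
        return "weak"
    raise ValueError("click frequency must be a positive integer")


# Hypothetical edge frequencies, used only to show the counting step.
sample_frequencies = [1, 1, 3, 9, 10, 57, 100, 420, 1000, 2500]
distribution = Counter(edge_class(f) for f in sample_frequencies)
print(dict(distribution))
```

Running the same counter over the edges of the traditional graph and of the improved graph would give the two columns that a table such as Table 2 compares.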

Table 2. Distribution of the traditional query click bipartite graph and its improved graph over the edge types

The above shows and describes the basic principles, main features and advantages of the present invention. Those skilled in the art should understand that the invention is not limited by the above embodiments; the embodiments and the description only illustrate the principle of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and these changes and improvements all fall within the scope of the claimed invention. The claimed scope of the invention is defined by the appended claims and their equivalents.

Claims (1)

1. A retrieval recommendation model optimization based on the query click graph, characterized in that it comprises optimization target construction, weight value reconstruction, and recommendation algorithm optimization;
The optimization target construction:
According to the above analysis, the most-clicked pages among the search results are the most important search results for a query. We first give a formal description of the relations among the elements of the query click bipartite graph:
Definition 1. Let the query click bipartite graph be $G = \{Q \cup U, E, W\}$, where $Q$ is the set of query session nodes, $U$ is the set of query result webpages, $E$ is the set of edges in the graph, and $W$ is the set of edge weights. The weight $w_{ij}$ of edge $e_{ij}$ in the query click bipartite graph is then constructed as follows:
The optimization target of the query click bipartite graph:
Formula (1) states: for a query session node $q_i$ ($q_i \in Q$), the binary optimization variable $c_{ij}$ indicates whether the query click graph selects edge $e_{ij}$. The objective maximizes the total weight of the selected edges, subject to the constraint that each retained edge has the maximum query-webpage relevance weight, i.e. when $c_{ij} = 1$, $w_{ij} \ge w_{ik}$ and $w_{ij} \ge w_{kj}$. When this target is met, the query click graph retains as many of the highest-count query clicks as possible;
Under optimization target (1), one query or one webpage may select several edges that share the same maximum weight. If the degree of each node is introduced as $d(i) = \sum_j \delta(i, j)$ and $d(j) = \sum_i \delta(i, j)$, formula (1) is equivalent to formula (2), where $\delta(i, j)$ indicates whether an edge exists between query node $q_i$ and webpage node $u_j$ (1 if it exists, 0 otherwise);
The equivalent form of the optimization target for the query click core graph:
The constraints of optimization target (2) explicitly allow one query node in the query click core graph to connect to several webpage nodes at once, and likewise allow one webpage node in the core graph to connect to several query nodes;
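The retention condition stated for formula (1) can be sketched as a filter over the edges. Formulas (1) and (2) themselves are not reproduced in this text, so the code below is only one literal reading of the stated condition ($w_{ij} \ge w_{ik}$ and $w_{ij} \ge w_{kj}$ for every retained edge), with ties kept; the function name `core_edges` and the sample weights are hypothetical.

```python
def core_edges(weights: dict) -> set:
    """Keep the edges whose weight is maximal for both their query and their webpage.

    `weights` maps (query, webpage) pairs to edge weights. Ties are retained,
    so one node may keep several edges of equal maximum weight.
    """
    best_for_query = {}
    best_for_page = {}
    for (query, page), w in weights.items():
        best_for_query[query] = max(best_for_query.get(query, w), w)
        best_for_page[page] = max(best_for_page.get(page, w), w)
    return {
        (query, page)
        for (query, page), w in weights.items()
        if w >= best_for_query[query] and w >= best_for_page[page]
    }


# Hypothetical weights: q1 has two equally strong edges.
example_weights = {
    ("q1", "u1"): 5,
    ("q1", "u2"): 5,
    ("q2", "u1"): 2,
    ("q2", "u3"): 1,
}
kept = core_edges(example_weights)
print(sorted(kept))
```

In this example both edges of q1 survive, which matches the statement that a query node may connect to several webpage nodes. Under this strict reading q2 keeps no edge at all, because its strongest edge is not the strongest edge of u1.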
The weight value reconstruction:
As in Definition 1, the query click bipartite graph is $G = \{Q \cup U, E, W\}$. First, suppose that $a_{ij}$ users have performed the click operation. Traditionally, the weight $W$ on the edge connecting a query and a webpage is the number of clicks $c_{ij}$ on webpage $u_j$ for query $q_i$, i.e. $w_{ij} = c_{ij}$. Our analysis shows that when browsing search results some users are more active and click many times, while others click only a few times; because user activity differs, the click count cannot truly reflect the relevance between a query and a webpage. To avoid this bias, we replace the click count with the user frequency, i.e. $w_{ij} = a_{ij}$. Next, suppose that for the same query users click two webpages $u_1$ and $u_2$ the same number of times. If $u_1$ has also been clicked for more other queries, then a click on $u_1$ is less significant than a click on $u_2$, that is, $u_1$ is less relevant to the query. An inverse query frequency can therefore be set up for each webpage, i.e.:
where $N$ is the number of queries and $N_q$ is the number of queries for which the webpage was clicked. Then let $w_{ij} = c_{ij} \cdot iqf(u)$;
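A minimal sketch of the two weightings follows. The inverse query frequency formula is not reproduced in this text, so the code assumes the form $iqf(u) = \log(N / N_q)$ by analogy with inverse document frequency; that form, the function name `build_weights`, and the sample log of (user, query, webpage) records are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def build_weights(log):
    """Return click counts c_ij, user frequencies a_ij, iqf(u), and c_ij * iqf(u).

    `log` is an iterable of (user, query, webpage) click records.
    Assumption: iqf(u) = log(N / N_q); the source formula is not shown.
    """
    clicks = defaultdict(int)            # c_ij: number of clicks per edge
    users = defaultdict(set)             # distinct users per edge
    queries_per_page = defaultdict(set)  # queries that led to a click on u
    all_queries = set()
    for user, query, page in log:
        clicks[(query, page)] += 1
        users[(query, page)].add(user)
        queries_per_page[page].add(query)
        all_queries.add(query)
    user_freq = {edge: len(who) for edge, who in users.items()}  # a_ij
    n_queries = len(all_queries)                                  # N
    iqf = {page: math.log(n_queries / len(qs)) for page, qs in queries_per_page.items()}
    iqf_weight = {(q, u): c * iqf[u] for (q, u), c in clicks.items()}
    return dict(clicks), user_freq, iqf, iqf_weight


# Hypothetical log: one very active user inflates the click count of (q1, u1).
sample_log = [
    ("alice", "q1", "u1"), ("alice", "q1", "u1"), ("alice", "q1", "u1"),
    ("bob", "q1", "u2"), ("carol", "q1", "u2"),
    ("bob", "q2", "u1"), ("dave", "q3", "u1"),
]
clicks, user_freq, iqf, iqf_weight = build_weights(sample_log)
print("clicks:   ", clicks)
print("user freq:", user_freq)
print("iqf:      ", iqf)
```

In the sample log the raw click count ranks u1 above u2 for q1, while the user frequency ranks u2 higher, which is the bias the text describes. Webpage u1 is clicked for every query, so under the assumed formula its inverse query frequency is zero.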
On this basis, the weight can also be constructed using transition probability theory. The following two probabilities are computed first:
(1) the probability of transferring from a query session to a related webpage:
(2) the probability of transferring from a related webpage to a query session:
Because the transition probability is asymmetric, i.e. $P(u_j|q_i) \ne P(q_i|u_j)$, linear interpolation or a product can be used to make the weight symmetric, for example $w_{ij} = \alpha P(q_i|u_j) + (1-\alpha) P(u_j|q_i)$, where $\alpha$ is an adjustable parameter, or $w_{ij} = P(q_i|u_j) \cdot P(u_j|q_i)$;
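The two symmetric weightings can be sketched as follows. The formulas for the two transition probabilities are not reproduced in this text, so the code assumes the usual normalized forms $P(u_j|q_i) = w_{ij} / \sum_k w_{ik}$ and $P(q_i|u_j) = w_{ij} / \sum_k w_{kj}$; the function name `symmetric_weights` and the sample weights are hypothetical.

```python
from collections import defaultdict


def symmetric_weights(weights, alpha=0.5):
    """Return {(q, u): (interpolated weight, product weight)}.

    Assumption: each transition probability is the edge weight divided by
    the total weight leaving the source node.
    """
    query_total = defaultdict(float)
    page_total = defaultdict(float)
    for (query, page), w in weights.items():
        query_total[query] += w
        page_total[page] += w
    result = {}
    for (query, page), w in weights.items():
        p_page_given_query = w / query_total[query]  # P(u_j | q_i)
        p_query_given_page = w / page_total[page]    # P(q_i | u_j)
        interpolated = alpha * p_query_given_page + (1 - alpha) * p_page_given_query
        product = p_query_given_page * p_page_given_query
        result[(query, page)] = (interpolated, product)
    return result


# Hypothetical click weights.
example = {("q1", "u1"): 3.0, ("q1", "u2"): 1.0, ("q2", "u1"): 1.0}
sym = symmetric_weights(example, alpha=0.5)
for edge, (interp, prod) in sorted(sym.items()):
    print(edge, round(interp, 4), round(prod, 4))
```

The product form penalizes an edge when either direction is weak, while the interpolated form lets the adjustable parameter decide which direction dominates.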
The recommendation algorithm optimization:
(1) Basic model: The most basic query recommendation method recommends queries that share co-occurring clicks in the query click bipartite graph. Extending this idea, queries with identical clicks are similar, and we propagate this similarity by random walk: starting from the initial query, the walk moves to adjacent queries according to the click probabilities on the query click bipartite graph, continues from those adjacent queries, and iterates in this way until it terminates. The random walk model has two modes, forward and backward, and both can be expressed with the same set of definitions;
As before, the query click bipartite graph is defined as $G = \{Q \cup U, E, W\}$. Let $M$ be the number of query nodes, $N$ the number of webpage nodes, and $w_{ij}$ the click weight between query $q_i$ and webpage $u_j$. Build the $(M+N) \times (M+N)$ probability transition matrix $A$, with node transition probability $A[i, j] = P(q_j|q_i)$, and then introduce the self-transition probability $s$; the new transition probability $P(v_j|v_i)$ is defined as in formula (6);
From a given start node $v_i$, the random walk can be iterated forward or backward. The difference is that the forward walk yields the query $q'$ most likely to be reached from query $q$ on the query click bipartite graph, and considers the probability of walking from the start node $v_i$ to the other nodes, i.e.: The backward walk yields the nodes from which the initial query node $q$ may be reached, and considers the probability of walking from the other nodes to the start node $v_i$, i.e.:
(2) Problem identification: On the basis of the above algorithm, the values of the parameters $n$ and $s$ are set. $n$ determines how many nodes of the bipartite graph are introduced into the walk; $s$ is the self-transition probability, which keeps the walk from moving to other nodes too quickly during transfer, and $s$ is set to 0.9. In query recommendation, the larger $n$ is, the more nodes are introduced into the walk, possibly even all the nodes in the whole graph. This causes the "recommendation topic drift" problem, in which the queries reached by the walk are not closely related to the user's query. The specific problems are as follows:
For the forward walk, after several iterations the transition probability is carried to more popular queries, so the recommended queries become inaccurate or irrelevant. For example, for the query "People Weekly", the walk may end up recommending more popular publications such as "global personage" and "epoch personage". When the backward walk is used for propagation, the probabilities tend toward uniformity, so misspelled or low-frequency queries may be recommended;
Traditional recommendation models cannot effectively distinguish queries with different intents. Query recommendation in the random walk model works by propagating similarity through probabilities, which causes some closely associated or highly similar queries to be ranked at the top, so the recommendation results are rather uniform and the diversity of the recommendations is reduced;
(3) Algorithm optimization: To solve the above problems of the traditional random walk recommendation model, a random walk recommendation model based on the query click graph is proposed, which prunes the recommendations that are inaccurately described or unrepresentative in the traditional recommendation model. The iterative random walk algorithm gives the probability distribution over the query and webpage nodes; for each webpage, the corresponding queries in the query click graph can then be selected and recommended to the user;
The random walk model recommendation algorithm based on the query click graph:
The convergence of the forward and backward random walk algorithms is as follows:
In the forward random walk, convergence of the transition probability matrix follows from the stationary distribution of a Markov chain in stochastic process theory. Given the transition matrix $A$, if there exists an iteration count $n$ such that $A^{n}[i, j] > 0$, then the Markov chain formed by all the nodes is homogeneous, aperiodic and irreducible, and has a unique stationary distribution. The forward random walk iteration can then be written as $v^{T}(n+1) = v^{T}(n)\,A = v^{T}(0)\,A^{n+1}$. As $A^{n}$ tends to the stationary distribution, $A^{n}[i, j] \to \pi_j$, where the stationary distribution is $\pi^{T} = [\pi_1, \pi_2, \ldots, \pi_{M+N}]$, so $\lim_{n \to \infty} v(n) = \pi$. Clearly, when $v(0)$ is a probability distribution, $v^{T}(n)\,A$ is also a probability distribution;
For the backward random walk, the literature that first proposed the backward random walk model did not give a convergence proof. As before, we assume that the stochastic matrix $A$ has a stationary distribution. Even if $v(0)$ is a probability distribution, $A\,v(n)$ is not necessarily a probability distribution, so the vector $v$ is normalized at every iteration by setting $v(n+1) = \mathrm{norm}(A\,v(n))$. Because every row of the probability transition matrix $A$ sums to 1 and all transition probabilities in $A$ are greater than 0, when $v(0)$ is the uniform distribution the iteration normalizes according to the rows of $A$, that is, $\mathrm{norm}(A\,v(n)) = v(0)$, and the algorithm keeps returning the uniform distribution. If the whole query click bipartite graph is strongly connected, any two nodes can reach each other, so every entry of $v$ stays greater than zero during the iteration and $v$ can be normalized at every step. Formally, $v(n+1) = A\,v(n)/Z$, where $Z$ is the normalization factor. Each iteration left-multiplies by $A$, so after the $n$-th iteration the value is $v(n) = A^{n}\,v(0)/Z$. If $A$ has a stationary distribution, every row of $A^{n}$ tends to $\pi^{T} = [\pi_1, \pi_2, \ldots, \pi_{M+N}]$, a row vector of the same length as $v^{T}$; because $Z$ is the normalization factor, if $v(0)$ is a probability distribution the normalized result is the uniform distribution.
The state of maximum entropy of a system is exactly its uniformly distributed state, so the backward random walk model in essence returns the system to its original starting state, whereas the forward random walk model extends the system forward through repeated iteration until it finds a steady state. In query recommendation, the forward random walk model reaches its steady state when the query nodes with the most clicks in the whole graph are ranked first, and the backward random walk model reaches its stationary distribution when all nodes in the graph have the same probability. Convergence to either the uniform probabilities or the popular-node probabilities is therefore unfavorable to query recommendation. A suitable number of iterations and self-transition probability, for example $n = 10$ and $s = 0.9$, are set in advance to control the range of the random walk on the graph.
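A bounded forward walk with a self-transition probability can be sketched as follows. Formula (6) is not reproduced in this text, so the code assumes one common form: a node stays in place with probability $s$ and otherwise moves to a neighbor with probability $(1-s)$ times the normalized click weight. The function names, the node labels and the weights are hypothetical.

```python
def transition_matrix(weights, queries, pages, s=0.9):
    """Build an (M+N) x (M+N) row-stochastic matrix over query and webpage nodes.

    Assumed form of the self-transition rule: stay with probability s,
    otherwise follow an edge in proportion to its click weight.
    """
    nodes = list(queries) + list(pages)
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    raw = [[0.0] * size for _ in range(size)]
    for (query, page), w in weights.items():
        raw[index[query]][index[page]] = w
        raw[index[page]][index[query]] = w
    matrix = []
    for i, row in enumerate(raw):
        total = sum(row)
        if total == 0:
            new_row = [0.0] * size
            new_row[i] = 1.0  # isolated node: the walk stays put
        else:
            new_row = [(1 - s) * x / total for x in row]
            new_row[i] = s
        matrix.append(new_row)
    return nodes, index, matrix


def forward_walk(matrix, start, n=10):
    """Run n forward steps v^T(k+1) = v^T(k) P from a one-hot start vector."""
    size = len(matrix)
    v = [0.0] * size
    v[start] = 1.0
    for _ in range(n):
        v = [sum(v[i] * matrix[i][j] for i in range(size)) for j in range(size)]
    return v


# Hypothetical click graph: q1 and q2 share webpage u1, q3 is unrelated.
weights = {("q1", "u1"): 3.0, ("q2", "u1"): 2.0, ("q3", "u2"): 4.0}
queries, pages = ["q1", "q2", "q3"], ["u1", "u2"]
nodes, index, P = transition_matrix(weights, queries, pages, s=0.9)
v = forward_walk(P, index["q1"], n=10)
ranking = sorted((q for q in queries if q != "q1"), key=lambda q: v[index[q]], reverse=True)
print({node: round(v[index[node]], 4) for node in nodes})
print("recommended order:", ranking)
```

With $s = 0.9$ and $n = 10$ most of the probability stays near the start query, which is the effect the parameter choice aims at: the walk reaches the query that shares a clicked webpage and never reaches the unrelated one.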

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610390608.9A CN106445989A (en) 2016-06-03 2016-06-03 Query click graph-based search recommendation model optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610390608.9A CN106445989A (en) 2016-06-03 2016-06-03 Query click graph-based search recommendation model optimization

Publications (1)

Publication Number Publication Date
CN106445989A true CN106445989A (en) 2017-02-22

Family

ID=58183837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610390608.9A CN106445989A (en) 2016-06-03 2016-06-03 Query click graph-based search recommendation model optimization

Country Status (1)

Country Link
CN (1) CN106445989A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106997379A (en) * 2017-03-20 2017-08-01 杭州电子科技大学 A kind of merging method of the close text based on picture text click volume
CN107832468A (en) * 2017-11-29 2018-03-23 百度在线网络技术(北京)有限公司 Demand recognition methods and device

Similar Documents

Publication Publication Date Title
Jäschke et al. Tag recommendations in folksonomies
US7856449B1 (en) Methods and apparatus for determining social relevance in near constant time
CN105378764B (en) Interactive concept editor in computer-human's interactive learning
Reid et al. Mapping the contemporary terrorism research domain
US20110078188A1 (en) Mining and Conveying Social Relationships
Thelwall Conceptualizing documentation on the Web: An evaluation of different heuristic‐based models for counting links between university Web sites
US7529735B2 (en) Method and system for mining information based on relationships
Hsu et al. Collaborative and Structural Recommendation of Friends using Weblog-based Social Network Analysis.
Liu et al. How do users describe their information need: Query recommendation based on snippet click model
CN101055587A (en) Search engine retrieving result reordering method based on user behavior information
Sharma et al. A comparative analysis of web page ranking algorithms
CN101770520A (en) User interest modeling method based on user browsing behavior
CN103399883B (en) Method and system for performing personalized recommendation according to user interest points/concerns
Spink et al. Overlap among major web search engines
CN102609512A (en) System and method for heterogeneous information mining and visual analysis
US7895195B2 (en) Method and apparatus for constructing a link structure between documents
CN103488724A (en) Book-oriented reading field knowledge map construction method
US20140358911A1 (en) Search and discovery system
CN103577549B (en) Crowd portrayal system and method based on microblog label
CN100583804C (en) Method and system for processing social network expert information based on expert value propagation algorithm
Thelwall et al. Online presentations as a source of scientific impact? An analysis of PowerPoint files citing academic journals
Zhong et al. Research on China's tourism: A 35‐year review and authorship analysis
Han et al. International collaboration in LIS: global trends and networks at the country and institution level
CN103823844A (en) Question forwarding system and question forwarding method on the basis of subjective and objective context and in community question-and-answer service
Arbelaitz et al. Web usage and content mining to extract knowledge for modelling the users of the Bidasoa Turismo website and to adapt it

Legal Events

Date Code Title Description
PB01 Publication
C06 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Jia Hailong

Inventor after: Hu Zhenwen

Inventor after: Chen Ning

Inventor after: Gao Zheng

Inventor after: Tian Wenqiang

Inventor before: Jia Hailong

TA01 Transfer of patent application right

Effective date of registration: 20171228

Address after: 430070 Hubei Province, Wuhan city Hongshan District Luoshi Road No. 122

Applicant after: Wuhan University of Technology

Applicant after: Xinxiang University

Address before: Xinxiang City, Henan province 453000 Jinsui Avenue East Xinxiang College

Applicant before: Xinxiang University
