CN102779193B

CN102779193B - Self-adaptive personalized information retrieval system and method

Info

Publication number: CN102779193B
Application number: CN201210244519.5A
Authority: CN
Inventors: 杨沐昀; 王晓春; 李生; 齐浩亮; 赵铁军
Original assignee: Harbin Institute of Technology
Current assignee: Harbin University of technology high tech Development Corporation
Priority date: 2012-07-16
Filing date: 2012-07-16
Publication date: 2015-05-13
Anticipated expiration: 2032-07-16
Also published as: CN102779193A

Abstract

The invention discloses a self-adaptive personalized information retrieval system and method. To capture in a timely manner the dynamic retrieval needs of users whose distribution is irregular, the retrieval model is updated promptly as the user interacts with the search engine. The system comprises a data input subsystem, a parameter training and prediction subsystem, a retrieval execution subsystem, and a data output subsystem. The data input subsystem combines the current query information with historical query information and historical click information to form a feature matrix, and obtains the training parameter prediction model from the feature matrix; the parameter training and prediction subsystem trains and applies the parameter prediction model on the feature matrix to obtain the predicted parameters; the retrieval execution subsystem uses the predicted parameters to organize the current query and the historical queries, and combines the user model with the query model to form a personalized query model; and the data output subsystem finds, among the documents to be retrieved, the documents that match the personalized query as the preliminary retrieval result, ranks the preliminary retrieval result by relevance, and outputs the ranked result as the final retrieval result.

Description

Self-adaptive personalized information retrieval system and method

Technical field

The present invention relates to computer information retrieval technology.

Background art

The vast amount of information on the network and the rapid development of related technologies have led people to use search engines more and more frequently. According to statistics from the China Internet Network Information Center (CNNIC), the search engine has become the most common tool that helps people retrieve Web information.

In recent years, many excellent information retrieval models have emerged and achieved good results in improving retrieval precision, making retrieval easier, and improving the user's search experience. One of the main improvements is to build a user interest model, whose purpose is to ensure that a document is relevant to the user's interest as well as to the content of the query. User interest is divided by time span into long-term interest and short-term interest. Short-term interest comes from the search history of one query session. In personalized retrieval research based on short-term interest, Cao et al. (2008; 2009) treated the queries and clicks in a query session as sequential data and trained an HMM, an improved HMM (vlHMM), and a CRF model to predict query intent. Zhu and Mishne (2009) clustered users' query session processes (query sessions for short), then aggregated the importance values produced by all query sessions into a global importance, and proposed the ClickRank model for measuring the importance of a web page or website. Besides these methods, which model the query session directly, other researchers have used the query session as a feature in a ranking model: Xiang et al. (2010) added several query reformulation relations to RankSVM as features. Conventional retrieval models can also be applied to the study of short-term interest: Chen et al. (2009) incorporated the similarity between the current query and the summaries of clicked documents into a conventional language model. By contrast, most personalized retrieval models that include long-term interest are based on conventional IR models. Tan (2006) proposed, on the basis of the language model, several methods for computing the historical information relevant to the current query; this retrieval model has a positive effect on both new and old queries. Dou et al. (2007) carried out similar experiments on the vector space model and on the language model. Ahn et al. (2008) chained multiple query sessions together by task and built, on the BM25 probabilistic model, a personalized retrieval system that reflects the user's long-term interest.

The personalized retrieval models based on user interest described above share a significant shortcoming: once training is complete, the internal parameters of the model are fixed values. In practice, information needs differ from one retrieval situation to another, and handling every user search in a uniform way inevitably lacks flexibility. In personalized retrieval models based on query expansion, the user model is combined with the current query model, and in earlier research the weights of the two parts were usually set to constants. However, if the current query is very short, the user's query intent is not expressed clearly or completely enough, so the role of the user model should be strengthened and the importance of the current query model reduced. Conversely, if the current query is long and the query intent is expressed clearly, the user model matters less. An adaptive, dynamic retrieval model can therefore further improve the user's personalized retrieval experience, and adaptivity is a key property that current retrieval systems lack.
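The trade-off described above, in which a short query leans more on the user model and a long query less, can be illustrated with a linear interpolation of two unigram language models. This is only a minimal sketch: the interpolation form, the function names `unigram_model` and `interpolate`, and the example texts are assumptions made for illustration, not the formula used by the invention.

```python
from collections import Counter


def unigram_model(text):
    """Maximum-likelihood unigram model over whitespace tokens (illustrative)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()} if total else {}


def interpolate(query_model, user_model, query_weight):
    """Mix two models: query_weight * query + (1 - query_weight) * user.

    query_weight is the parameter that a fixed-coefficient model keeps
    constant and that an adaptive model would predict for each query.
    """
    words = set(query_model) | set(user_model)
    return {
        word: query_weight * query_model.get(word, 0.0)
        + (1.0 - query_weight) * user_model.get(word, 0.0)
        for word in words
    }


# Hypothetical example: a one-word query backed by a richer history.
query_model = unigram_model("jaguar")
user_model = unigram_model("car dealer car price")
personalized = interpolate(query_model, user_model, query_weight=0.4)
```

A lower `query_weight` shifts probability mass toward words from the user's history, which is the behaviour the paragraph asks for when the current query is short.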

An ideal dynamic personalized retrieval model should be grounded in real-world retrieval applications, and its design and implementation should take the following aspects into account:

1. User distribution

In the real world users are randomly distributed, yet earlier studies often made assumptions about the user distribution. Radlinski (2007) assumed that users are drawn at random from a population of fixed size. In the following year it was assumed that users always belong to a fixed, predetermined population. Existing research has confirmed that user behavior is irregular (Agichtein et al., 2006), so assumptions about the user distribution should be avoided as far as possible.

2. User interest

User interest also changes. Belkin (1997) found early on that a user's retrieval needs can change while the user is searching for information, and Sofia Stamou (2009) likewise held that user interest changes over time.

3. Query capability

The process in which a user interacts with a search engine is also a process of learning to use the search engine (Shen et al., 2005). The user resubmits a new query according to the quality of the returned results and his or her satisfaction with them. In other words, the interaction with the search engine influences the queries the user goes on to submit. As the user's search experience grows, the user's ability to construct queries also improves. The importance of each historical query therefore changes over time, and newer queries are more important (BinTan et al., 2006; Dou et al., 2007).

The prior art does not set the parameters of the retrieval model with reference to rich user behavior features. In fact, the user's retrieval behavior itself provides important interest information, and assigning weights on the basis of this information greatly increases their rationality. For example, if the current query is short, it carries little information, and the weight given to historical information should be increased. Conversely, if the user has little historical information, the weight of the current query should be increased. The parameter training and prediction subsystem of the present invention assigns weights dynamically on the basis of the interest information that user behavior itself provides, which greatly increases the rationality of the weight allocation.

Third, the present invention uses a machine learning algorithm to carry out the prediction automatically.

For example, if the current query is short, the weight given to historical information should be increased; if the user has little historical information, the weight of the current query should be increased. But when the current query is short and the historical information is also scarce, assigning the weights becomes complicated. The adaptive personalized retrieval model uses a machine learning algorithm to solve the problem that the model parameters are hard to determine, which ensures the accuracy of the predicted weights to a certain extent.

Fourth, the present invention takes the temporal order of queries into account.

The user's query history is ordered by time, and newer queries are more important than older ones, so the weights of historical queries are decayed according to their temporal distance from the current query.

Fifth, the present invention solves the problem, in personalized retrieval modeling, of how to organize the relationship among the current query, the historical queries, and the historical clicks.

Sixth, the present invention strengthens the processing of personalized information and explores how to mine the current user's historical information and current query information to further improve the retrieval effectiveness for the current query.

Seventh, the present invention makes no assumption about the user distribution. This avoids the situation in which the real user distribution differs from the assumed one and retrieval effectiveness suffers.
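The time-based weight decay of historical queries mentioned in the fourth effect can be sketched as below. The text does not give the decay function, so the exponential form, the name `decay_weights`, and the use of the query's position in the session as its temporal distance are all assumptions made for illustration; in the invention the decay parameter would come from the parameter prediction model.

```python
import math


def decay_weights(num_history, decay):
    """Normalised weights for historical queries, oldest first.

    Each weight is proportional to exp(-decay * distance), where distance is
    how many steps the historical query lies before the current query
    (1 for the most recent one). This functional form is an assumption.
    """
    raw = [math.exp(-decay * (num_history - i)) for i in range(num_history)]
    total = sum(raw)
    return [value / total for value in raw]


# Three historical queries Q1, Q2, Q3 preceding the current query.
weights = decay_weights(3, decay=0.5)   # newer queries receive larger weights
uniform = decay_weights(3, decay=0.0)   # no decay: every query weighs the same
```

Setting `decay` to zero reproduces the case in which the importance of historical information does not depend on how old it is.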

Summary of the invention

To capture in a timely manner the dynamic retrieval needs of users whose distribution is irregular, and to update the retrieval model promptly as the user interacts with the search engine, the present invention provides a self-adaptive personalized information retrieval system and method.

The self-adaptive personalized information retrieval system of the present invention comprises:

a data input subsystem for forming a feature matrix from the current query information in combination with historical query information and historical click information, and also for obtaining the training parameter prediction model from the feature matrix;

a parameter training and prediction subsystem for training and applying the parameter prediction model on the feature matrix to obtain the predicted parameters;

a retrieval execution subsystem for organizing the current query, the historical queries, and the historical clicks with the predicted parameters, and also for combining the user model with the query model to form a personalized query model;

a data output subsystem for finding, among the documents to be retrieved, the documents that match the personalized query as the preliminary retrieval result, for ranking the preliminary retrieval result by relevance, and for outputting the ranked result as the final retrieval result.

The above data input subsystem comprises:

a module for generating user behavior features from the current query information, and

a module for forming the feature matrix from all of the user's behavior features obtained.

The above parameter training and prediction subsystem comprises:

a data input module for receiving the data to be processed;

a module for computing the historical queries and historical clicks corresponding to each query and organizing them into the required data format;

a module for constructing the feature matrix;

a module for finding the optimal parameter for the current query by traversal search, the step size of the traversal being 0.1;

a module for using an SVM regression model to establish the mapping between user features and the optimal parameters.

The self-adaptive personalized information retrieval method of the present invention comprises:

a step of forming a feature matrix from the current query information in combination with historical query information and historical click information;

a step of obtaining the training parameter prediction model from the feature matrix;

a step of training and applying the parameter prediction model on the feature matrix to obtain the predicted parameters;

a step of organizing the current query, the historical queries, and the historical clicks with the predicted parameters, and combining the user model with the query model to form a personalized query model;

a step of finding, among the documents to be retrieved, the documents that match the personalized query model as the preliminary retrieval result, ranking the preliminary retrieval result by relevance, and outputting the ranked result as the final retrieval result.

The above step of forming a feature matrix from the current query information in combination with historical query information and historical click information comprises:

a step of generating user behavior features from the current query information, and

a step of forming the feature matrix from all of the user's behavior features obtained.

The above step of training and applying the parameter prediction model on the feature matrix to obtain the predicted parameters further comprises:

a step of receiving the data to be processed;

a step of computing the historical queries and historical clicks corresponding to each query and organizing them into the required data format;

a step of constructing the feature matrix;

a step of finding the optimal parameter for the current query by traversal search, the step size of the traversal being 0.1;

a step of using an SVM regression model to establish the mapping between user features and the optimal parameters.

In the technical solution of the present invention, the user behavior features comprise:

history click features, representing the web documents the user viewed within one query session, that is, the web documents the user viewed within a very short time;

historical query features, representing the queries the user submitted to the retrieval system within one query session, that is, the queries the user submitted within a very short time;

current query features, representing the current query;

features between the current query and the historical queries, representing the relationship between the current query and the historical queries;

features between the current query and the history clicks, representing the relationship between the current query and the history clicks.

The specific content of the above five classes of features is as follows:

The history click features comprise: the total number of history clicks; the total length of history clicks; the mean history click length (the mean of the total click length corresponding to each query); the average length of each click; the total length of the previous history click; the number of documents in the last click; and the mean document length of the last click.

The historical query features comprise: the total length of the historical queries, the average length of the historical queries, and the total number of historical queries.

The current query features representing the current query comprise: the current query length.

The features between the current query and the historical queries comprise: for the current query terms compared with the previous historical query, the recurrence probability of the newly added words with respect to the previous history click; for the current query compared with the previous query, the number of newly added words; for the current query terms compared with the previous historical query, the percentage of the current query length accounted for by co-occurring words; the mean similarity between the current query and the historical queries; the maximum similarity between the current query and the historical queries; the similarity between the current query terms and the previous historical query; for the current query compared with the previous historical query, the recurrence probability of the newly added words with respect to the current query; the number of newly added words; the total number of occurrences of the newly added words; for the current query terms compared with the previous historical query, the recurrence probability of the deleted words with respect to the previous historical query; the number of deleted words in the previous historical query; the total number of occurrences of the deleted words in the previous historical query; for the current query compared with the previous historical query, the recurrence probability of the co-occurring words with respect to the previous historical query; the number of co-occurring words in the previous historical query; and the total number of occurrences of the co-occurring words in the previous historical query.

The features between the current query and the history clicks comprise: the mean similarity between the current query terms and all history clicks; the maximum similarity between the current query terms and all history clicks; the similarity between the current query terms and the previous history click; the number of newly added words in the current query relative to the previous history click; the total number of occurrences of the newly added words in the previous history click; for the current query terms compared with the previous historical query, the recurrence probability of the deleted words with respect to the previous history click; the number of deleted words; the number of deleted words in the previous history click; the total number of occurrences of the deleted words in the previous history click; for the current query terms compared with the previous history click, the recurrence probability of the co-occurring words with respect to the previous history click; the number of co-occurring words; the number of co-occurring words in the previous history click; and the total number of occurrences of the co-occurring words in the previous history click.
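A few of the term-change features listed above (newly added words, deleted words, and co-occurring words between the current query and the previous historical query) can be sketched as follows. The tokenization by whitespace, the function name `query_change_features`, the dictionary keys, and the example queries are assumptions made for illustration; the text does not define how words are segmented or how the recurrence probabilities are computed, so those are left out.

```python
def query_change_features(current_query, previous_query):
    """Count term changes between the current and the previous query."""
    current = current_query.lower().split()
    previous = previous_query.lower().split()
    current_set, previous_set = set(current), set(previous)

    added = current_set - previous_set      # words new in the current query
    deleted = previous_set - current_set    # words dropped from the previous query
    shared = current_set & previous_set     # co-occurring words

    shared_tokens = sum(1 for word in current if word in shared)
    return {
        "added_count": len(added),
        "deleted_count": len(deleted),
        "shared_count": len(shared),
        # Share of the current query length covered by co-occurring words.
        "shared_fraction_of_current": shared_tokens / len(current) if current else 0.0,
    }


# Hypothetical pair of consecutive queries from one session.
features = query_change_features(
    "acquisition u.s. foreign company",
    "foreign company takeover",
)
```

Each returned value would occupy one column of the feature matrix for the query it was computed for.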

The user behavior features corresponding to each query are not necessarily the same, so the parameters of the corresponding query model are not necessarily the same either. The method of the present invention, which assigns parameters dynamically for the specific retrieval context of each query, is therefore closer to the objective regularities of user retrieval behavior.

In actual information retrieval, the feature weights obtained in training are called to predict the optimal parameters that the retrieval model should use. The present invention uses the five kinds of features involved in the retrieval information to jointly determine which of the three parts (current query, historical queries, and historical clicks) expresses the user's retrieval intent more accurately and contributes more to the current retrieval task, and thereby assigns weights to the three parts dynamically so as to obtain the optimal parameters.

In summary, the parameters of the adaptive personalized retrieval model of the present invention are all predicted from each user's interaction behavior, and a machine learning algorithm is used in the prediction. Such a retrieval model can handle its parameters flexibly and therefore offers higher flexibility and retrieval accuracy.

The adaptive retrieval model of the present invention continually updates itself as the number of interactions between the user and the retrieval system increases. Historical information is weighted dynamically according to its time interval from the present, and the parameter that determines the decay amplitude is produced by the parameter prediction model. To compare the present invention with mainstream techniques, the data of (Shen et al., 2005) were used, and the experimental settings are also consistent with that article. To cover the special case in which the importance of historical information does not change with its time interval from the present, the present invention also compares the dynamic retrieval model with the fixed-coefficient retrieval model under that condition. Overall, as the historical information grows richer, the retrieval effectiveness of the personalized retrieval models keeps improving and the gap between the models becomes more pronounced; see the following table:

For the fourth query Q4 that the user submits in a query session, the first query Q1, the second query Q2, and the third query Q3 are likewise used as historical information. Even when differences in the importance of historical information are not considered, the proposed method under this condition (the AdaptiveEW result) achieves a relative improvement over the traditional model (BayesInt) of 38.18% on MAP and 17.74% on PR@20. When differences in importance among historical information are considered, the proposed AdaptiveDW model achieves relative improvements over the BatchUp model of 27.54% on MAP and 15.94% on PR@20. The data show that the retrieval effectiveness of the adaptive personalized retrieval model proposed by the present invention (AdaptiveDW) exceeds that of the best personalized retrieval model among current mainstream methods (BatchUp).

In summary, the adaptive personalized retrieval model of the present invention uses a separate parameter prediction model to produce the weights, balancing the flexibility and the rationality of the weight allocation. On the same data set, the adaptive dynamic personalized retrieval model outperforms mainstream techniques in retrieval effectiveness, which confirms the validity of the technique proposed in this invention.
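For readers unfamiliar with the measures quoted above, the sketch below shows how average precision (whose mean over queries gives MAP), precision at a cutoff, and a relative improvement are commonly computed. It assumes that PR@20 denotes precision over the top 20 results; the function names and the toy ranking are illustrative and are not taken from the experiments.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    top = ranked_ids[:k]
    return sum(1 for doc in top if doc in relevant_ids) / k


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def relative_improvement(new_score, baseline_score):
    """Relative gain of new_score over baseline_score, as a fraction."""
    return (new_score - baseline_score) / baseline_score


# Toy ranking with two relevant documents, found at ranks 2 and 4.
ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
ap = average_precision(ranking, relevant)
p2 = precision_at_k(ranking, relevant, k=2)
```

MAP for a system is then the mean of `average_precision` over all test queries, and a figure such as "38.18%" corresponds to `relative_improvement` multiplied by 100.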

The specific effects of the invention are:

First, the present invention is effective for both the new and the old queries that a user submits.

An old query is a query that has appeared in the user's retrieval history; a new query is one that the user submits for the first time. For an old query, historical information is available for reference, so the weight of historical information in the personalized retrieval model is increased and is usually set to a constant close to 1. For a new query, no history is available for reference, so the weight of historical information is reduced and is usually set to a constant close to 0. Unlike the prior art, the adaptive retrieval model of the present invention does not need to first classify a query as new or old; it sets the parameters of the retrieval model flexibly and directly from the user behavior features. The present invention is therefore applicable to all types of user behavior features.

Second, the present invention assigns weights dynamically according to the user's interaction behavior.

Description of the drawings

Fig. 1 is a schematic diagram of the principle of the present invention. Fig. 2 is the information processing flow chart of the parameter prediction subsystem.

Embodiments

Embodiment 1. The self-adaptive personalized information retrieval system of this embodiment comprises:

Embodiment 2. This embodiment further limits the data input subsystem of the self-adaptive personalized information retrieval system of Embodiment 1. The data input subsystem in this embodiment comprises:

Embodiment 3. This embodiment further limits the parameter training and prediction subsystem of the self-adaptive personalized information retrieval system of Embodiment 1. In this embodiment, the parameter training and prediction subsystem comprises:

a data input module for receiving the data to be processed;

a module for constructing the feature matrix;

a module for using an SVM regression model to establish the mapping between user features and the optimal parameters.

Embodiment 4. This embodiment further describes the user behavior features in the self-adaptive personalized information retrieval system of Embodiment 1. The user behavior features comprise:

history click features, representing the web documents the user viewed within one query session, that is, the history clicks the user viewed within a very short time;

historical query features, representing the queries the user submitted to the retrieval system within one query session, that is, the historical queries the user submitted within a very short time;

current query features, representing the current query;

Embodiment 5. This embodiment further describes the self-adaptive personalized information retrieval system of Embodiment 4.

The history click features comprise: the total number of history clicks; the total length of history clicks (in units of individual words/terms); the mean history click length (the mean of the total click length corresponding to each query); the average length of each click; the total length of the previous history click; the number of documents in the last click; and the mean document length of the last click.

Embodiment 6. The self-adaptive personalized information retrieval method of this embodiment comprises:

Embodiment 7. This embodiment further limits, in the self-adaptive personalized information retrieval method of Embodiment 6, the step of forming a feature matrix from the current query information in combination with historical query information and historical click information. This step further comprises:

Embodiment 8. This embodiment further limits, in the self-adaptive personalized information retrieval method of Embodiment 6, the step of training and applying the parameter prediction model on the feature matrix to obtain the predicted parameters. This step further comprises:

a step of receiving the data to be processed;

a step of constructing the feature matrix;

Embodiment 9. This embodiment further limits the user behavior features in the self-adaptive personalized information retrieval method of Embodiment 6. The user behavior features comprise:

historical query features, representing the historical queries the user submitted to the retrieval system within one query session, that is, the historical queries the user submitted within a very short time;

current query features, representing the current query;

Embodiment 10. This embodiment further describes the five classes of technical features of Embodiment 9:

The input data of the present invention are the consecutive query actions that each user performs, ordered by time of occurrence, to satisfy one search need. They comprise each query the user submits to the retrieval system, the documents the retrieval system returns (including title and summary), and the numbers of the documents the user viewed.

Taking the file query_history.topic2 as an example, the data format is:

The retrieval results for the query string "acquisition u.s.foreign company" are recorded between <retrieval result> and </retrieval result>. The order in which the document numbers appear reflects the ranking of the documents in the results returned by the retrieval system. The click set records the numbers of the documents the user clicked to view.

The step of forming the feature matrix from the current query information in combination with historical query information and historical click information is as follows:

After the data are input, feature extraction follows. The relationships between the current query and the historical queries, between the current query and the history clicks, and among the historical queries and history clicks within the query session need to be analyzed. Finally, 39 search behavior features in five classes are extracted for each user at the time each query is submitted, as follows:

History click features, representing the web documents the user viewed within one query session, comprising:
	Total number of history clicks
	Total length of history clicks
	Mean history click length (the mean of the total click length corresponding to each query)
	Average length of each click
	Total length of the previous history click
	Number of documents in the last click
	Mean document length of the last click

Historical query features, representing the queries the user submitted to the retrieval system within one query session, comprising:
	Total length of historical queries
	Mean length of historical queries
	Number of historical queries

Current query features, representing the current query, comprising:
	Current query length

Features between the current query and the history clicks, representing the relationship between the current query and the history clicks, comprising:
	Mean similarity between the current query terms and all history clicks
	Maximum similarity between the current query terms and all history clicks
	Similarity between the current query terms and the previous history click
	Number of newly added words in the current query relative to the previous history click
	Total number of occurrences of the newly added words in the previous history click
	For the current query terms compared with the previous historical query, the recurrence probability of the deleted words with respect to the previous history click
	Number of deleted words
	Number of deleted words in the previous history click
	Total number of occurrences of the deleted words in the previous history click
	For the current query terms compared with the previous history click, the recurrence probability of the co-occurring words with respect to the previous history click
	Number of co-occurring words
	Number of co-occurring words in the previous history click
	Total number of occurrences of the co-occurring words in the previous history click

Features between the current query and the historical queries, representing the relationship between the current query and the historical queries, comprising:
	For the current query terms compared with the previous historical query, the recurrence probability of the newly added words with respect to the previous history click
	For the current query compared with the previous query, the number of newly added words
	For the current query terms compared with the previous historical query, the percentage of the current query length accounted for by co-occurring words
	Mean similarity between the current query and the historical queries
	Maximum similarity between the current query and the historical queries
	Similarity between the current query terms and the previous historical query
	For the current query terms compared with the previous historical query, the recurrence probability of the newly added words with respect to the current query
	Number of newly added words
	Total number of occurrences of the newly added words
	For the current query terms compared with the previous historical query, the recurrence probability of the deleted words with respect to the previous historical query
	Number of deleted words in the previous historical query
	Total number of occurrences of the deleted words in the previous historical query
	For the current query terms compared with the previous historical query, the recurrence probability of the co-occurring words with respect to the previous historical query
	Number of co-occurring words in the previous historical query
	Total number of occurrences of the co-occurring words in the previous historical query
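The similarity features in the list above (mean, maximum, and similarity to the previous historical query) could be computed along the following lines. The text does not name the similarity measure, so cosine similarity over bag-of-words term counts is an assumption here, as are the function names and the example queries.

```python
import math
from collections import Counter


def cosine(text_a, text_b):
    """Cosine similarity between bag-of-words term-count vectors (assumed measure)."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(count * vec_b[word] for word, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_features(current_query, history_queries):
    """Mean, maximum, and most-recent similarity; history is ordered oldest first."""
    sims = [cosine(current_query, past) for past in history_queries]
    if not sims:
        return {"mean": 0.0, "max": 0.0, "previous": 0.0}
    return {"mean": sum(sims) / len(sims), "max": max(sims), "previous": sims[-1]}


# Hypothetical session: the current query repeats the first historical query.
sims = similarity_features("foreign company", ["foreign company", "stock price"])
```

The same helper can be applied to the text of clicked documents (title and summary) to obtain the corresponding query-to-click similarity features.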

On the other hand, the optimal weight value of each query is computed. These 39 features together with the optimal weight value make up the training data of the parameter prediction model. In the training data, the part beginning with @ gives the file name of the training data, the name of each feature, and the attribute description of each feature. The part below @DATA is the feature matrix (this format can be fed directly to existing SVM regression toolkits).

Taking q₂ as an example, the corresponding training data are:

@RELATION q2.arff

@ATTRIBUTE cqlenth numeric

@ATTRIBUTE class numeric

@DATA

3,2,20,20,10,20,0,2,0.0869565217391304,0.0869565217391304,0.0869565217391304,0,0,0,2,2,1,0.4,0.4,0.4,0.333333333333333,0,0.5,0.4

4,3,2,2,0.666666666666667,2,1,2,0,0,0,0,0,0,2,2,1,0.333333333333333,0.333333333333333,0.333333333333333,0.25,0,0.5,0.4

The first line of the training data above gives the file name "q2.arff", with the keyword "RELATION". The second line describes the first feature, "length of the current query", with the keyword "ATTRIBUTE". The remaining features are described in the same way, for 39 feature descriptions in total.

The next line states that the optimal parameter is a decimal between 0 and 1, again with the keyword "ATTRIBUTE". What follows @DATA is the corresponding feature matrix. The feature matrix is the content of the training data file other than the lines beginning with @, and it consists of the 39 user behavior features mentioned earlier and the corresponding optimal parameter. Each row has 40 data items: the first 39 are feature values and the 40th is the optimal parameter. Each training example is represented by one 40-dimensional row vector and forms one row of the feature matrix, so the number of training examples determines the number of rows of the feature matrix. Data items are separated by commas.
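The file layout described above can be read with a few lines of Python. This is only a minimal sketch under stated assumptions: the function names `parse_arff` and `split_features_target` and the attribute name `hqnum` are hypothetical, and the sample uses two features instead of 39 to stay short. A "?" in the last column is read as a missing value.

```python
def parse_arff(text):
    """Split ARFF-style text into (relation, attribute names, data rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        upper = line.upper()
        if in_data:
            # "?" marks a value that is not known yet
            rows.append([None if v.strip() == "?" else float(v)
                         for v in line.split(",")])
        elif upper.startswith("@RELATION"):
            relation = line.split(None, 1)[1]
        elif upper.startswith("@ATTRIBUTE"):
            attributes.append(line.split()[1])
        elif upper.startswith("@DATA"):
            in_data = True
    return relation, attributes, rows


def split_features_target(rows):
    """Last item of each row is the optimal parameter; the rest are features."""
    return [row[:-1] for row in rows], [row[-1] for row in rows]


# Hypothetical, shortened sample (2 features plus the target column).
SAMPLE = """@RELATION demo.arff
@ATTRIBUTE cqlenth numeric
@ATTRIBUTE hqnum numeric
@ATTRIBUTE class numeric
@DATA
3,2,0.4
4,3,?
"""

relation, attributes, rows = parse_arff(SAMPLE)
features, targets = split_features_target(rows)
```

With the full 39-feature file, each row of `features` would have 39 entries and `targets` would hold the 40th column.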

Based on the training data above, the machine learning method SVM regression (SVM-Regression) is used to train the parameter prediction model. This model expresses the functional relationship between the optimal weight and the features.

The maximum MAP value of each query is used as the search target, and the traversal step size is 0.1. Support Vector Regression (SVR) (Chang and Lin, 2001) is used for training to determine the functional relationship between the optimal weight of each query and the 39 features, which yields the parameter prediction model.

When the parameter prediction model is applied for prediction, the 39 feature values of each test query are input and the model produces the corresponding weight value. The current query, the historical queries, and the history clicks are combined in this way. The format of the test data is essentially the same as that of the training data; the difference is that the last column of each feature vector in the test data is "?", which marks the value to be predicted. The test data format is:
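The traversal with step size 0.1 can be sketched as a simple grid search. The sketch below assumes a single weight, assumes the end points 0 and 1 are included in the grid, and uses a hypothetical stand-in `toy_map` where a real system would run retrieval with the candidate weight and return the MAP of the query.

```python
def best_weight(score_fn, steps=10):
    """Return (weight, score) maximizing score_fn over 0, 0.1, ..., 1.0."""
    best_w, best_s = None, float("-inf")
    for i in range(steps + 1):
        w = i / steps  # integer index avoids accumulating float error
        s = score_fn(w)
        if s > best_s:
            best_w, best_s = w, s
    return best_w, best_s


def toy_map(weight):
    """Hypothetical stand-in for the MAP of one query at a given weight."""
    return 1.0 - (weight - 0.3) ** 2


optimal_weight, optimal_score = best_weight(toy_map)
```

The weight found this way for each training query would become the 40th column of that query's row in the feature matrix.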

The main task of the retrieval execution subsystem is to take the TREC AP88-90 documents as the documents to be retrieved, build an index with Lemur, and then complete the retrieval task within the conventional language model framework.

The current query and the historical information are organized according to the prediction results produced by the parameter prediction model in the previous step, forming the personalized query model.

If the current query is the k-th query Q_k in the query session, the user interest represented by the short-term historical queries is reflected in the average of the term occurrence probabilities in the historical queries Q_i (1≤i≤k-1). Similarly, the user's short-term interest is also reflected in the average of the term occurrences in the history clicks C_i (1≤i≤k-1). The query history consists of the historical queries H_Q and the history clicks H_C. A query word is denoted by ω.

A) Calculate the current query model

p(ω | Q_{i}) = \frac{c(ω, Q_{i})}{|Q_{i}|}    (1)

The parameters in the formula are as follows: ω denotes a word, Q_i denotes a query, p denotes a probability, and i is the index of the query. The current query model is determined by the number of times each current query word occurs and by the length of the current query. p(ω | Q_i) is the probability of each word ω in the query Q_i. c(ω, Q_i) is the number of times the word ω occurs in the query Q_i. |Q_i| is the length of the query Q_i, that is, the number of words it contains. In other words, the probability of a word in the query string is the number of times the word occurs in the query divided by the total number of words in the query.
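Formula (1) can be illustrated with a short sketch. It assumes the query has already been split into a list of words; the function name `query_model` is hypothetical.

```python
from collections import Counter


def query_model(terms):
    """p(w | Q) = c(w, Q) / |Q| for a query given as a list of words."""
    if not terms:
        return {}
    counts = Counter(terms)
    total = len(terms)
    return {word: count / total for word, count in counts.items()}


example = query_model(["java", "coffee", "java"])
```

Here "java" occurs twice in a query of length three, so its probability is 2/3, and "coffee" receives 1/3.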

B) Calculate the historical query model

p(ω | H_{Q}) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(ω | Q_{i})    (2)

The parameters in the formula are as follows: ω denotes a word, Q_i denotes a query, p denotes a probability, H_Q denotes the whole set of historical queries, and i is the index of the query. The historical query model p(ω | H_Q) is obtained by summing the single historical query models p(ω | Q_i) and averaging them. For the current query Q_k, the historical queries are Q_1, Q_2, ..., Q_{k-1}. The single historical query models p(ω | Q_i) are summed and then divided by the number of historical queries, k-1. Each single historical query model p(ω | Q_i) is calculated according to formula (1). In other words, the probability of a single word ω in the whole history H_Q is calculated by first dividing the number of times the word occurs in each historical query by the total number of words in that query, then summing the k-1 resulting probabilities, and finally dividing the sum by k-1.

C) Calculate the history click model

p(ω | H_{C}) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(ω | C_{i})    (3)

The parameters in the formula are as follows: ω denotes a word, C_i denotes a web document viewed by the user, p denotes a probability, H_C denotes the whole set of web documents the user has viewed, and i is the index of the click. Similar to the historical query model, the history click model p(ω | H_C) is obtained by summing the single history click models p(ω | C_i) and averaging them. For the current query Q_k, the history clicks are C_1, C_2, ..., C_{k-1}. The single history click models p(ω | C_i) are summed and then divided by the number of history clicks, k-1. Each single history click model is calculated according to formula (1), with the clicked document C_i in place of the query Q_i.
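Formulas (2) and (3) share the same averaging step, which the following sketch makes explicit. It assumes each historical query or clicked document is available as a list of words; the names `unigram_model` and `average_model` are hypothetical.

```python
from collections import Counter


def unigram_model(terms):
    """Formula (1): relative frequency of each word in one query or document."""
    if not terms:
        return {}
    total = len(terms)
    return {word: count / total for word, count in Counter(terms).items()}


def average_model(items):
    """Formulas (2) and (3): average the single models of the k-1 history items."""
    if not items:
        return {}
    summed = {}
    for terms in items:
        for word, prob in unigram_model(terms).items():
            summed[word] = summed.get(word, 0.0) + prob
    return {word: prob / len(items) for word, prob in summed.items()}


# Two historical queries (k - 1 = 2); clicked documents would be passed the same way.
history_queries = [["a", "b"], ["a"]]
p_history = average_model(history_queries)
```

In this example the word "a" has probability 0.5 in the first query and 1.0 in the second, so its averaged probability is 0.75.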

D) Extract the current query features

These mainly comprise the length of the current query.

E) Extract the historical query features

These mainly comprise the number of historical queries, their total length, and their average length.

F) Extract the features between the current query and the historical queries

These mainly comprise the similarity between the current query and the previous query, the similarity between the current query and all historical queries, the numbers of added and deleted words, and the proportions these words account for in the current query or the historical queries.

G) Extract the features between the current query and the history clicks

These mainly comprise the similarity between the current query and all history clicks and the previous history click, the numbers of added and deleted words, and the proportions these words account for in the current query and in the history clicks.
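A few of the features between the current query and the previous query can be sketched as set operations. This is an illustration only: the feature names are hypothetical, words are compared as exact strings, and the ratio is taken as distinct shared words over the current query length, which is one possible reading of the feature description.

```python
def query_change_features(current, previous):
    """Count added, deleted and co-occurring words between two queries."""
    cur, prev = set(current), set(previous)
    added = cur - prev      # words in the current query only
    deleted = prev - cur    # words dropped from the previous query
    shared = cur & prev     # words occurring in both queries
    return {
        "added_count": len(added),
        "deleted_count": len(deleted),
        "shared_count": len(shared),
        "shared_ratio": len(shared) / len(current) if current else 0.0,
    }


features_example = query_change_features(
    ["cheap", "flights", "paris"],
    ["flights", "paris", "hotel"],
)
```

The same function could be applied to the current query and the words of the previous clicked document to obtain the corresponding click features.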

H) Obtain the parameters with the parameter prediction model

The user features are the input of the parameter prediction system, which outputs the optimal parameters for the current query.

I) Organize the current query model, the historical query model, and the history click model according to the predicted parameters

The parameter β_k ∈ (0,1) determines how weight is allocated between the historical queries and the history clicks: the larger β_k is, the more important the history clicks are. When β_k = 1, the user interest model is represented entirely by the history clicks. Likewise, the larger α_k is, the more important the current query is.

The adaptive personalized retrieval model tries two methods. The first is a retrieval model in which all history items are equally important (AdaptiveEW), formalized in formula (4). The second is a retrieval model in which the importance of a history item grows as its time distance from the current query shrinks (AdaptiveDW), formalized in formula (5). Here Q_k denotes the current query, H_C denotes the history clicks that precede the current query in the current query session, and H_Q denotes the historical queries in the current query session. The parameters α_k, β_k, m_k, and n_k are weights, each of which can take any decimal value between 0 and 1.

The query model p(ω | θ_k) of the adaptive personalized retrieval model AdaptiveEW has two parts: the current query model p(ω | Q_k) and the historical model. The weight of the current query model is α_k and the weight of the historical model is 1-α_k. The current query model gives the probability that the current query word ω occurs and is calculated according to formula (1). The historical information consists of the history click model p(ω | H_C) and the historical query model p(ω | H_Q). The historical query model is calculated according to formula (2) and the history click model according to formula (3). All historical queries have equal weight, and all history clicks have equal weight. The weight of the historical query model is 1-β_k and the weight of the history click model is β_k, as shown in formula (4).

p(ω | θ_{k}) = α_{k} p(ω | Q_{k}) + (1 - α_{k}) [β_{k} p(ω | H_{C}) + (1 - β_{k}) p(ω | H_{Q})]    (4)

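The parameters in the formula are as follows: α_k is the weight of the current query model p(ω | Q_k), 1-α_k is the weight of the historical model, and within the historical model β_k is the weight of the history click model p(ω | H_C) and 1-β_k is the weight of the historical query model p(ω | H_Q).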
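The interpolation of formula (4) can be sketched as follows. The sketch assumes each model is a dictionary mapping words to probabilities and that words missing from a model have probability 0; the function name `adaptive_ew` is hypothetical.

```python
def adaptive_ew(p_query, p_hist_click, p_hist_query, alpha, beta):
    """Formula (4): mix the current query, history click and historical query models."""
    vocabulary = set(p_query) | set(p_hist_click) | set(p_hist_query)
    return {
        word: alpha * p_query.get(word, 0.0)
        + (1 - alpha) * (
            beta * p_hist_click.get(word, 0.0)
            + (1 - beta) * p_hist_query.get(word, 0.0)
        )
        for word in vocabulary
    }


theta_k = adaptive_ew(
    p_query={"a": 1.0},
    p_hist_click={"b": 1.0},
    p_hist_query={"a": 0.5, "c": 0.5},
    alpha=0.5,
    beta=0.5,
)
```

Because the three inputs are probability distributions and the weights are convex, the mixed model again sums to 1.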

The above is the adaptive retrieval model AdaptiveEW, in which all history items have equal weight. The other adaptive retrieval model holds that the importance of a history item depends on its time distance from the current query. The model p(ω | ψ_k) of this adaptive retrieval model (AdaptiveDW) has two parts: the query model p(ω | θ_k) and the history click model p(ω | H_C). The weight of the history click model p(ω | H_C) is m_k and the weight of the query model p(ω | θ_k) is 1-m_k. The query model p(ω | θ_k) in turn consists of the current query model p(ω | Q_k) and the query model of the previous moment, p(ω | θ_{k-1}). The weight of the current query model p(ω | Q_k) is n_k and the weight of the query model of the previous moment p(ω | θ_{k-1}) is 1-n_k. The historical query model is calculated according to formula (2) and the history click model according to formula (3). In AdaptiveDW the weight of old query models decays as time passes, so that newer historical queries weigh more than older ones, as formalized in formula (5).

p(ω | θ_{k}) = n_{k} p(ω | Q_{k}) + (1 - n_{k}) p(ω | θ_{k-1})    (5)

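The parameters in the formula are as follows: n_k is the weight of the current query model p(ω | Q_k), and 1-n_k is the weight of the query model of the previous moment, p(ω | θ_{k-1}).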
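The decaying combination described for AdaptiveDW can be sketched as a recursion over the queries of a session. Two points are assumptions made only for this sketch: the recursion starts from the model of the first query, and one weight n is supplied for each later query. The function names are hypothetical.

```python
def mix(p, q, weight):
    """Return weight * p + (1 - weight) * q over the union vocabulary."""
    return {
        word: weight * p.get(word, 0.0) + (1 - weight) * q.get(word, 0.0)
        for word in set(p) | set(q)
    }


def decayed_query_model(query_models, n_weights):
    """theta_k = n_k * p(.|Q_k) + (1 - n_k) * theta_{k-1}, starting at the first query."""
    theta = query_models[0]
    for p_query, n in zip(query_models[1:], n_weights):
        theta = mix(p_query, theta, n)
    return theta


def adaptive_dw(query_models, n_weights, p_hist_click, m):
    """psi_k = m_k * p(.|H_C) + (1 - m_k) * theta_k, as described in the text."""
    return mix(p_hist_click, decayed_query_model(query_models, n_weights), m)


session = [{"a": 1.0}, {"b": 1.0}, {"c": 1.0}]  # models of Q_1, Q_2, Q_3
theta_3 = decayed_query_model(session, [0.5, 0.5])
psi_3 = adaptive_dw(session, [0.5, 0.5], {"a": 1.0}, 0.5)
```

In `theta_3` the most recent query keeps weight 0.5 while the two older queries are reduced to 0.25 each, which is the decay the text describes.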

J) Start the retrieval process

The retrieval results that match the personalized query are found in the documents to be retrieved and sorted in descending order of relevance probability. Each query returns 1000 documents.

After the personalized query is submitted to the retrieval system, the retrieval system returns the retrieval results. The data format of the personalized retrieval results is:

The first column is the query number, the second column is the document number, the third column is the rank, and the fourth column is the language model score. This completes the implementation process of the whole adaptive personalized retrieval model.
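The ranking step can be sketched in plain Python. The text only states that retrieval runs within a conventional language model framework using Lemur, so the scoring function below (the query model weighted by the log of a linearly smoothed document model) and the smoothing weight `lam` are assumptions chosen for illustration; nothing here is Lemur's API.

```python
import math
from collections import Counter


def rank_documents(query_model, documents, top_n=1000, lam=0.5):
    """Score documents by sum_w p(w|theta) * log p(w|d) and sort descending."""
    collection = Counter()
    for terms in documents.values():
        collection.update(terms)
    coll_len = sum(collection.values())

    scored = []
    for doc_id, terms in documents.items():
        counts, doc_len = Counter(terms), len(terms)
        score = 0.0
        for word, p_word in query_model.items():
            p_doc = counts[word] / doc_len if doc_len else 0.0
            p_coll = collection[word] / coll_len if coll_len else 0.0
            p = lam * p_doc + (1 - lam) * p_coll  # assumed linear smoothing
            if p > 0.0:  # words absent from the whole collection are skipped
                score += p_word * math.log(p)
        scored.append((doc_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


docs = {"d1": ["a", "a", "b"], "d2": ["b", "c", "c"]}
ranked = rank_documents({"a": 1.0}, docs)
```

The default `top_n=1000` mirrors the 1000 documents returned per query.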
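The four-column result layout can be written and read back as shown in this sketch. The whitespace separator, the four-decimal score format, and the document number used in the example are assumptions, since the sample output itself is not reproduced in the text.

```python
def format_results(query_id, ranked):
    """Emit one line per document: query number, document number, rank, score."""
    return [
        f"{query_id} {doc_id} {rank} {score:.4f}"
        for rank, (doc_id, score) in enumerate(ranked, start=1)
    ]


def parse_result_line(line):
    """Read one result line back into (query number, document number, rank, score)."""
    query_id, doc_id, rank, score = line.split()
    return query_id, doc_id, int(rank), float(score)


# Hypothetical document number and score, for illustration only.
lines = format_results("q2", [("AP880212-0001", -1.5)])
record = parse_result_line(lines[0])
```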

Claims

1. A self-adaptive personalized information retrieval system, characterized in that the system comprises:

a data output subsystem for finding, in the documents to be retrieved, the documents that match the personalized query as preliminary retrieval results, for sorting the preliminary retrieval results by relevance, and for outputting the sorted results as the final retrieval results;

wherein the parameter training and prediction subsystem comprises:

a data input module for receiving the data to be processed;

a module for constructing the feature matrix;

2. The self-adaptive personalized information retrieval system according to claim 1, characterized in that the data input subsystem comprises:

3. The self-adaptive personalized information retrieval system according to claim 2, characterized in that the user behavior features comprise:

history click features, which represent the web documents viewed by the user in a query session,

historical query features, which represent the queries submitted by the user to the retrieval system in a query session,

current query features, which represent the current query;

4. The self-adaptive personalized information retrieval system according to claim 3, characterized in that

the history click features comprise: the total number of history clicks, the total length of the history clicks, the mean length of the history clicks, the average length per click, the total length of the previous history click, the number of documents in the last click, and the mean document length of the last click;

5. A self-adaptive personalized information retrieval method, characterized in that the method comprises:

a step of finding, in the documents to be retrieved, the documents that match the personalized query model as preliminary retrieval results, sorting the preliminary retrieval results by relevance, and outputting the sorted results as the final retrieval results;

wherein the step of training and applying the parameter prediction model according to the feature matrix to obtain the predicted parameters further comprises:

a step of receiving the data to be processed;

a step of constructing the feature matrix;

6. The self-adaptive personalized information retrieval method according to claim 5, characterized in that the step of constructing the feature matrix from the current query information in combination with the historical query information and the history click information comprises:

7. The self-adaptive personalized information retrieval method according to claim 6, characterized in that the user behavior features comprise:

current query features, which represent the current query;

8. The self-adaptive personalized information retrieval method according to claim 7, characterized in that