CN102637170A

CN102637170A - Question pushing method and system

Info

Publication number: CN102637170A
Application number: CN2011100356794A
Authority: CN
Inventors: 姜庭欣; 谢双宾; 李连华; 罗建岚
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2011-02-10
Filing date: 2011-02-10
Publication date: 2012-08-15

Abstract

The invention provides a question pushing method and system based on pre-established user models, the user models including at least two of the following: an interest model, an attribute model, a behavior model and a relation model, wherein the interest model is established by mining the questions asked by users and their answers, the attribute model is established based on user attributes, the behavior model is established by collecting statistics on user behaviors, and the relation model is established based on relations between different users. The method comprises the following steps: A, performing text analysis on a question to be answered to extract question features; B, matching the extracted question features against the user models, and ranking users according to the degree to which the question features match each user in each user model and a preset ranking weight of each user model; and C, pushing the question to be answered to the top M users in the ranking, wherein M is a preset positive integer. The method and system can improve the efficiency and quality with which a knowledge question-and-answer system provides answers.

Description

Question pushing method and system

[technical field]

The present invention relates to the field of Internet technology, and in particular to a question pushing method and system.

[background technology]

With the rapid development of Internet technology, obtaining information and communicating with one another over the Internet has become part of people's daily lives. A knowledge question-and-answer (Q&A) system is a system that uses this communication function to obtain information: a user can submit various questions to the knowledge Q&A system through a web page, check the status of the submitted questions, and decide which answer to adopt according to how the questions have been answered. Other users can view a question by visiting this web page and answer it according to their own interests and knowledge.

In current knowledge Q&A systems, once a question is submitted and presented in the form of a question page, it relies on other users of the system who happen to see that question page to answer it. However, this approach causes the following problems:

First, other users must log in to the knowledge Q&A system and browse to the question page before they can possibly answer the question on it.

Second, a user who browses to the question page may have neither the interest nor the ability to answer the question and so will not answer it, or the answer provided may not be of high quality.

It can be seen that existing knowledge Q&A systems provide answers to asking users with low efficiency and quality.

[summary of the invention]

In view of this, the present invention provides a question pushing method and system, so as to improve the efficiency and quality with which a knowledge Q&A system provides answers.

The specific technical solution is as follows:

A question pushing method, based on pre-established user models, wherein the user models include at least two of the following: an interest model established by mining the questions asked by users and their answers, an attribute model established based on user attributes, a behavior model established by collecting statistics on user behaviors, and a relation model established based on relations between different users. The method comprises:

A. performing text analysis on a question to be answered to extract question features;

B. matching the extracted question features against the user models, and ranking users according to the degree to which the question features match each user in each user model and a preset ranking weight of each user model;

C. pushing the question to be answered to the top M users in the ranking, where M is a preset positive integer.

Wherein, establishing the interest model comprises:

S1. crawling the Q&A history data of each user;

S2. performing text analysis on the Q&A history data of each user to extract each user's interest words;

S3. establishing or updating each user's interest model using the extracted interest words, where the interest model contains the user's interest words and the weight FeatW of each interest word in the interest model of the corresponding user.

Specifically, step S2 may comprise performing steps S21 to S23 for each user:

S21. performing semantics-based word segmentation on the questions the user has answered, asked, browsed or queried;

S22. determining the expressive weight TermW of each word obtained from the word segmentation, based on the word's inverse document frequency and expressive ability;

S23. determining the words whose expressive weight TermW is greater than a preset interest selection weight as the user's interest words.

In addition, step A specifically comprises:

A1. performing semantics-based word segmentation on the question to be answered;

A2. determining the expressive weight TermW of each word obtained from the word segmentation, based on the word's inverse document frequency and expressive ability;

A3. determining the words whose expressive weight TermW is greater than a preset feature extraction weight as the feature words of the question to be answered.

The expressive weight TermW of each word is:

TermW = a * idf + b * ind, where idf is the inverse document frequency, computed with c, a preset parameter greater than or equal to 1, from df, the number of times the word occurs in all questions to be answered, and N, the number of questions to be answered; ind denotes the expressive ability of the word; and a and b are preset weight coefficients.

Further, before step S22, the method also comprises:

filtering out, from the words obtained by word segmentation, those whose df is not within a preset range, where df is the number of times the word occurs in all questions to be answered;

in step S22, the expressive weight is calculated only for words whose df is within the preset range.
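The filtering and weighting steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_words`, the tuple layout `(df, idf, ind)` and the inclusive df range are assumptions, and idf is taken as a precomputed input because its defining expression is not given in this text.

```python
def select_words(words, a, b, df_range, threshold):
    """Keep words whose expressive weight TermW exceeds a threshold.

    words: dict mapping word -> (df, idf, ind); idf is assumed precomputed.
    df_range: (low, high) inclusive bounds on df (an assumed convention).
    Returns a dict mapping each kept word to its TermW.
    """
    low, high = df_range
    selected = {}
    for word, (df, idf, ind) in words.items():
        if not low <= df <= high:
            continue  # filtered out before TermW is computed
        term_w = a * idf + b * ind  # TermW = a * idf + b * ind
        if term_w > threshold:
            selected[word] = term_w
    return selected


if __name__ == "__main__":
    sample = {
        "python": (50, 2.0, 0.8),
        "the": (100000, 0.1, 0.0),
        "thing": (60, 1.0, 0.1),
    }
    print(select_words(sample, a=0.5, b=0.5, df_range=(5, 1000), threshold=1.0))
```

The same routine serves both step S23 (interest words, using the preset interest selection weight as `threshold`) and step A3 (feature words, using the preset feature extraction weight).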

The weight FeatW of an interest word in the interest model of the corresponding user is:

FeatW = a1 * TermW + b1 * ΔTr, where ΔTr is the change, or rate of change, in the frequency with which the interest word occurs in the user Q&A history data crawled in the current time period relative to the frequency with which it occurred in the user Q&A history data crawled in the previous time period, and a1 and b1 are preset weight coefficients.
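A short sketch of this weight follows, under stated assumptions: the function name `feat_w` and the `relative` flag are hypothetical, and the rate of change is taken here as the change divided by the previous frequency, which is one plausible reading that the text does not spell out.

```python
def feat_w(term_w, freq_now, freq_prev, a1, b1, relative=False):
    """FeatW = a1 * TermW + b1 * delta_tr.

    delta_tr is the change in frequency between two periods, or, when
    relative=True, that change divided by the previous frequency
    (an assumed definition of "rate of change").
    """
    delta_tr = freq_now - freq_prev
    if relative:
        if freq_prev == 0:
            raise ValueError("rate of change is undefined when freq_prev is 0")
        delta_tr = delta_tr / freq_prev
    return a1 * term_w + b1 * delta_tr


if __name__ == "__main__":
    print(feat_w(1.4, freq_now=6, freq_prev=4, a1=0.7, b1=0.3))
    print(feat_w(1.4, freq_now=6, freq_prev=4, a1=0.7, b1=0.3, relative=True))
```

A rising frequency raises the weight of the interest word and a falling frequency lowers it, so recent interests count for more than stale ones.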

Specifically, the user's interest model is updated according to a formula in which Q' denotes the updated interest model, Q denotes the interest model before the update, D_r denotes the positive sample set, which contains the interest words whose frequency of occurrence in the user's Q&A history data is rising together with their weights in the interest model, N_r denotes the number of positive samples, D_n denotes the negative sample set, which contains the interest words whose frequency of occurrence in the user's Q&A history data is falling together with their weights in the interest model, N_n denotes the number of negative samples, and α, β and γ are preset adjustment coefficients.
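The symbols defined here (Q, Q', D_r, N_r, D_n, N_n, α, β, γ) coincide with those of the classical Rocchio relevance-feedback update. Assuming the update takes that standard form, which this text does not confirm, it would read:

```latex
Q' = \alpha\, Q
   + \beta\, \frac{1}{N_r} \sum_{d \in D_r} d
   - \gamma\, \frac{1}{N_n} \sum_{d \in D_n} d
```

Under this reading, interest words with rising frequency pull the model toward themselves with strength β, while interest words with falling frequency push it away with strength γ.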

Preferably, the method also comprises expanding the interest words in a user's interest model;

wherein expanding an interest word T_i in the interest model of user u comprises:

D1. determining an expansion word T_j of the interest word T_i;

D2. calculating the interest level W_j of user u in the expansion word T_j:

W_j = α1 * log(jnum) + P_{u,T_j} + β1 * W_avg,

where P_{u,T_j} is computed over T', the set of interest words in the interest model of user u whose correlation with T_j exceeds a preset correlation value, from s(T_n, T_j), the correlation between an interest word T_n and its expansion word T_j, and from the weight of the interest word T_i in the interest model of user u; α1 and β1 are preset adjustment coefficients; jnum is the number of times the expansion word T_j occurs in the interest models of all users; and W_avg is the average weight of the interest word T_i in all users' interest models;

D3. if the value of W_j exceeds the preset interest selection weight, adding the expansion word T_j to the interest model of user u as an interest word.
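Steps D2 and D3 can be sketched as below. The names `interest_level` and `maybe_expand` are hypothetical, the logarithm is assumed to be natural, and P_{u,T_j} is passed in as a precomputed value because its defining expression is not given in this text. Storing W_j as the weight of the newly added word is also an assumption.

```python
import math


def interest_level(jnum, p_u_tj, w_avg, alpha1, beta1):
    """W_j = alpha1 * log(jnum) + P_{u,T_j} + beta1 * W_avg (natural log assumed)."""
    if jnum < 1:
        raise ValueError("jnum must be at least 1")
    return alpha1 * math.log(jnum) + p_u_tj + beta1 * w_avg


def maybe_expand(model, expansion_word, w_j, selection_weight):
    """Return a copy of the interest model, extended if W_j passes the threshold.

    model: dict mapping interest word -> weight for one user.
    """
    updated = dict(model)
    if w_j > selection_weight:
        updated[expansion_word] = w_j  # assumed: W_j becomes the stored weight
    return updated


if __name__ == "__main__":
    w_j = interest_level(jnum=20, p_u_tj=0.5, w_avg=1.0, alpha1=0.1, beta1=0.2)
    print(maybe_expand({"python": 1.4}, "django", w_j, selection_weight=0.8))
```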

Specifically, step D1 may comprise:

from the interest models of other users that contain the interest word T_i, selecting the other interest words in those models that belong to the same interest category as the interest word T_i;

according to the correlation between the selected interest words and the interest word T_i, selecting the P interest words with the highest correlation as the expansion words of the interest word T_i, where P is a preset positive integer.

Wherein, the attribute model contains user attributes and the weight of each user attribute in the attribute model;

the user attributes include one or any combination of the following: geographic location, gender, age and industry;

the weight in the attribute model of a user attribute that the user has is set to a fixed value, and the weight of a user attribute that the user does not have is set to 0; alternatively, the interest word with a geographic location attribute that has the highest weight in the user's interest model is added to the user's attribute model as the user's geographic location attribute, and the weight of this geographic location attribute in the attribute model is the same as the weight of that interest word in the user's interest model.

The relations between different users include one or any combination of the following: the Q&A relation, the same-group relation and the similar-interest relation;

wherein the same-group relation includes: appearing in the same community, the same column or the same discussion group, or replying to the same post.

Specifically, if the relations between different users in the relation model include only the Q&A relation or the same-group relation, the relation weight RelaRank1(u1, u2) between user u1 and user u2 is: RelaRank1(u1, u2) = log(relaCnt + α2) / log β2, where α2 and β2 are preset smoothing factors and relaCnt is the number of times a Q&A relation or same-group relation occurs between user u1 and user u2 within a set time period;

if the relations between different users in the relation model include only the similar-interest relation, the relation weight RelaRank2(u1, u2) between user u1 and user u2 is computed from the interest word vector of user u1 and the interest word vector of user u2;

if the relations between different users in the relation model include, in addition to the similar-interest relation, the Q&A relation or the same-group relation, the relation weight RelaRank(u1, u2) between user u1 and user u2 is: RelaRank(u1, u2) = RelaRank1(u1, u2) * α3 + RelaRank2(u1, u2) * β3, where α3 and β3 are preset weight coefficients.
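The two closed-form relation weights can be sketched as follows. The function names are hypothetical and the logarithm base is assumed to be natural (the ratio of two logarithms is the same in any base). RelaRank2 is accepted as an input because its expression in terms of the two interest word vectors is not given in this text.

```python
import math


def rela_rank1(rela_cnt, alpha2, beta2):
    """RelaRank1 = log(relaCnt + alpha2) / log(beta2)."""
    return math.log(rela_cnt + alpha2) / math.log(beta2)


def rela_rank(rank1, rank2, alpha3, beta3):
    """RelaRank = RelaRank1 * alpha3 + RelaRank2 * beta3."""
    return rank1 * alpha3 + rank2 * beta3


if __name__ == "__main__":
    r1 = rela_rank1(rela_cnt=9, alpha2=1, beta2=10)
    print(r1)
    print(rela_rank(r1, rank2=0.5, alpha3=0.6, beta3=0.4))
```

With α2 = 1, two users with no recorded interactions get a weight of 0, and the logarithm damps the effect of very frequent interactions.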

The behavior model includes the user's activity level, where the activity level Active(u) of user u is:

Active(u) = [log(S_answer + α4) * w1 + log(S_login + α4) * w2 + log(S_browse + α4) * w3] / log β4,

where w1, w2 and w3 are preset weight coefficients, α4 and β4 are preset smoothing factors, and S_answer, S_login and S_browse are, respectively, the number of times user u answered questions, logged in to the Q&A system and browsed pages of the Q&A system within a set time period, or, respectively, the frequencies with which user u answered questions, logged in to the Q&A system and browsed pages of the Q&A system within the set time period.
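The activity level formula translates directly into code. In this sketch the function name `active` and the sample parameter values are illustrative assumptions; the formula itself is the one given above.

```python
import math


def active(s_answer, s_login, s_browse, w1, w2, w3, alpha4, beta4):
    """Activity level of a user from answer, login and browse counts."""
    numerator = (
        math.log(s_answer + alpha4) * w1
        + math.log(s_login + alpha4) * w2
        + math.log(s_browse + alpha4) * w3
    )
    return numerator / math.log(beta4)


if __name__ == "__main__":
    print(active(9, 9, 9, w1=0.5, w2=0.3, w3=0.2, alpha4=1, beta4=10))
    print(active(0, 0, 0, w1=0.5, w2=0.3, w3=0.2, alpha4=1, beta4=10))
```

Choosing α4 = 1 keeps the logarithm defined for users with zero counts and gives them an activity level of 0.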

Specifically, step B may comprise:

B1. matching the extracted question features against one or any combination of the interest model, the attribute model and the relation model, to determine the users that match the extracted question features;

B2. using one or any combination of the interest model, the attribute model, the relation model and the behavior model to determine the matching degree between each user determined in step B1 and the question to be answered, and ranking the users determined in step B1 according to the matching degree;

wherein the user models used in step B1 and the user models used in step B2 are not completely identical.

If two or more user models are used in step B1, the union is taken of the users matching the extracted question features that are determined with each user model.

If the interest model is used in step B1, the extracted question features include the feature words of the question to be answered;

determining the users that match the extracted question features using the interest model comprises:

matching the feature words against each user's interest model, and determining the users whose interest words are hit by the feature words as the users that match the extracted question features;

or, matching the feature words against each user's interest model, determining the users whose interest words are hit by the feature words and the weights of the hit interest words in the interest model, and determining as matching users those for whom the weight of a hit interest word in the interest model is greater than a preset matching weight.
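Both matching variants can be sketched over an inverted list that maps each interest word to the users holding it, in the spirit of the inverted lists shown in Fig. 3. The function name `match_users`, the nested-dict layout and the optional `min_weight` argument are assumptions made for illustration.

```python
def match_users(feature_words, inverted_index, min_weight=None):
    """Return the set of users whose interest words are hit by the feature words.

    inverted_index: dict mapping interest word -> {user: weight in interest model}.
    min_weight: if given, keep only users whose hit word weight exceeds it.
    """
    matched = set()
    for word in feature_words:
        for user, weight in inverted_index.get(word, {}).items():
            if min_weight is None or weight > min_weight:
                matched.add(user)
    return matched


if __name__ == "__main__":
    index = {
        "python": {"u1": 0.9, "u2": 0.2},
        "java": {"u3": 0.6},
    }
    print(sorted(match_users(["python", "go"], index)))
    print(sorted(match_users(["python", "go"], index, min_weight=0.5)))
```

Because the result is a set, the candidates found through several feature words, or through several user models, can be merged with a plain set union.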

Preferably, the method also comprises: determining the interest category to which the question to be answered belongs, and filtering out from the matching users those whose weight for this interest category is less than a preset interest filtering weight.

If the attribute model is used in step B1, the extracted question features include the feature words of the question to be answered;

determining the users that match the extracted question features using the attribute model comprises:

matching the feature words against each user's attribute model, and determining as users that match the extracted question features the users corresponding to user attributes whose correlation with the feature words meets a preset correlation requirement.

If the relation model is used in step B1, the extracted question features include information about the user who asked the question to be answered;

determining the users that match the extracted question features using the relation model comprises:

matching the asking user's information against the relation model, and determining as users that match the extracted question features those whose relation weight with the asking user reaches a preset matching weight.

Wherein, the matching degree Rank(U) between user U and the question to be answered is: Rank(U) = Rank(interest) * W1 + Rank(Profile) * W2 + Rank(rela) * W3 + Rank(behavior) * W4;

W1, W2, W3 and W4 are preset weight coefficients; Rank(interest) is computed over T_N, the set of feature words extracted from the question to be answered, from TermW_i, the expressive weight of the i-th feature word, and FeatW_i, the weight of the i-th feature word in the interest model of user U;

Rank(Profile) is computed from FeaVec_i, the weight of the i-th feature word in the attribute model of user U; Rank(rela) is the relation weight, looked up in the relation model, between the user who asked the question to be answered and user U; and Rank(behavior) is determined by the activity level Active(U) of user U looked up in the behavior model of user U.
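The weighted combination and the resulting ranking can be sketched as follows. The per-model scores are taken as inputs, since the expressions for Rank(interest) and Rank(Profile) are not given in this text; the names `rank_score` and `rank_users` and the tie-break by user id are assumptions.

```python
def rank_score(interest, profile, rela, behavior, weights):
    """Rank(U) = Rank(interest)*W1 + Rank(Profile)*W2 + Rank(rela)*W3 + Rank(behavior)*W4."""
    w1, w2, w3, w4 = weights
    return interest * w1 + profile * w2 + rela * w3 + behavior * w4


def rank_users(components, weights):
    """Sort users by descending Rank(U); ties are broken by user id (assumed).

    components: dict mapping user -> (interest, profile, rela, behavior) scores.
    """
    scores = {user: rank_score(*parts, weights) for user, parts in components.items()}
    return sorted(scores, key=lambda user: (-scores[user], user))


if __name__ == "__main__":
    candidates = {
        "u1": (0.9, 0.0, 0.2, 0.5),
        "u2": (0.1, 1.0, 0.0, 0.1),
    }
    print(rank_users(candidates, weights=(0.5, 0.2, 0.2, 0.1)))
```

Setting a weight to 0 drops the corresponding user model from the ranking, which is one way to honor the requirement that at least two, but not necessarily all four, models are used.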

In addition, step C comprises:

according to the ranking result of step B, pushing the question to be answered to those of the top M users who are online.

Alternatively, step C comprises:

according to the ranking result of step B, pushing the question to be answered to those of the top M users whose activity level exceeds a preset push activity level.
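Both variants of step C take the top M users and then apply an eligibility test. A minimal sketch, in which the function name `push_targets` and the use of a predicate argument are illustrative choices rather than anything specified in the text:

```python
def push_targets(ranked_users, m, eligible):
    """Return those of the top m ranked users that pass the eligibility test.

    ranked_users: user ids sorted best first (the output of step B).
    eligible: callable taking a user id and returning True or False.
    """
    if m <= 0:
        raise ValueError("M must be a positive integer")
    return [user for user in ranked_users[:m] if eligible(user)]


if __name__ == "__main__":
    ranked = ["u2", "u3", "u1", "u4"]
    online = {"u3", "u4"}
    activity = {"u2": 0.9, "u3": 0.1, "u1": 0.6, "u4": 0.8}

    # Variant 1: push only to users who are online.
    print(push_targets(ranked, 3, lambda u: u in online))
    # Variant 2: push only to users above a preset push activity level.
    print(push_targets(ranked, 3, lambda u: activity[u] > 0.5))
```

Note that the filter is applied after the cut at M, so fewer than M users may receive the question; this follows the wording of the text, which selects among the top M users.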

A question pushing system, based on pre-established user models, wherein the user models include at least two of the following: an interest model established by mining the questions asked by users and their answers, an attribute model established based on user attributes, a behavior model established by collecting statistics on user behaviors, and a relation model established based on relations between different users. The system comprises a feature extraction unit, a user ranking unit and a question pushing unit;

the feature extraction unit is configured to perform text analysis on a question to be answered to extract question features;

the user ranking unit is configured to match the question features extracted by the feature extraction unit against the user models, and to rank users according to the degree to which the question features match each user in each user model and a preset ranking weight of each user model;

the question pushing unit is configured to push, according to the ranking result of the user ranking unit, the question to be answered to the top M users in the ranking, where M is a preset positive integer.

Further, the system also comprises an interest model maintenance unit;

the interest model maintenance unit specifically comprises a data crawling subunit, a text analysis subunit and a model maintenance subunit;

the data crawling subunit is configured to crawl the Q&A history data of each user;

the text analysis subunit is configured to perform text analysis on the Q&A history data of each user to extract each user's interest words;

the model maintenance subunit is configured to establish or update each user's interest model using the interest words extracted by the text analysis subunit, where the interest model contains the user's interest words and the weight FeatW of each interest word in the interest model of the corresponding user.

Specifically, the text analysis subunit may comprise a word segmentation module, a weight determining module and an interest word determining module;

the word segmentation module is configured to perform semantics-based word segmentation on the questions the user has answered, asked, browsed or queried;

the weight determining module is configured to determine the expressive weight TermW of each word obtained from the word segmentation performed by the word segmentation module, based on the word's inverse document frequency and expressive ability;

the interest word determining module is configured to determine the words whose expressive weight TermW is greater than a preset interest selection weight as the user's interest words.

Wherein, the feature extraction unit specifically comprises a word segmentation subunit, a weight determining subunit and an interest word determining subunit;

the word segmentation subunit is configured to perform semantics-based word segmentation on the question to be answered;

the weight determining subunit is configured to determine the expressive weight TermW of each word obtained from the word segmentation performed by the word segmentation subunit, based on the word's inverse document frequency and expressive ability;

the interest word determining subunit is configured to determine the words whose expressive weight TermW is greater than a preset feature extraction weight as the feature words of the question to be answered.

The expressive weight TermW of each word is: TermW = a * idf + b * ind, with idf, ind, a and b as defined for the method above.

Preferably, the text analysis subunit also comprises a filtering module, configured to filter out, from the words obtained from the word segmentation performed by the word segmentation module, those whose df is not within a preset range, where df is the number of times the word occurs in all questions to be answered;

the weight determining module calculates the expressive weight only for the words that remain after filtering by the filtering module.

Specifically, the model maintenance subunit updates the user's interest model according to a formula in which Q' denotes the updated interest model, Q denotes the interest model before the update, D_r denotes the positive sample set, which contains the interest words whose frequency of occurrence in the user's Q&A history data is rising together with their weights in the interest model, N_r denotes the number of positive samples, D_n denotes the negative sample set, which contains the interest words whose frequency of occurrence in the user's Q&A history data is falling together with their weights in the interest model, N_n denotes the number of negative samples, and α, β and γ are preset adjustment coefficients.

Further, the interest model maintenance unit also comprises an interest word expansion subunit, configured to expand the interest words in a user's interest model;

the interest word expansion subunit specifically comprises an expansion word determining module, an interest level calculation module and an interest word expansion module;

the expansion word determining module is configured to determine an expansion word T_j of an interest word T_i;

the interest level calculation module is configured to calculate the interest level W_j of user u in the expansion word T_j, where

W_j = α1 * log(jnum) + P_{u,T_j} + β1 * W_avg,

and P_{u,T_j} is computed over T', the set of interest words in the interest model of user u whose correlation with T_j exceeds a preset correlation value, from s(T_n, T_j), the correlation between an interest word T_n and its expansion word T_j, and from the weight of the interest word T_i in the interest model of user u; α1 and β1 are preset adjustment coefficients; jnum is the number of times the expansion word T_j occurs in the interest models of all users; and W_avg is the average weight of the interest word T_i in all users' interest models;

the interest word expansion module is configured to judge whether the value of W_j exceeds the preset interest selection weight and, if so, to add the expansion word T_j to the interest model of user u as an interest word.

The expansion word determining module specifically selects, from the interest models of other users that contain the interest word T_i, the other interest words in those models that belong to the same interest category as the interest word T_i and, according to the correlation between the selected interest words and the interest word T_i, selects the P interest words with the highest correlation as the expansion words of the interest word T_i, where P is a preset positive integer.

Further, the system also comprises an attribute model maintenance unit, configured to establish and update each user's attribute model based on user attributes;

the attribute model contains user attributes and the weight of each user attribute in the attribute model;

wherein the user attributes include one or any combination of the following: geographic location, gender, age and industry;

the attribute model maintenance unit sets the weight in the attribute model of a user attribute that the user has to a fixed value, and sets the weight of a user attribute that the user does not have to 0; alternatively, it adds the interest word with a geographic location attribute that has the highest weight in the user's interest model to the user's attribute model as the user's geographic location attribute, the weight of this geographic location attribute in the attribute model being the same as the weight of that interest word in the user's interest model.

In addition, the system also comprises a relation model maintenance unit, configured to establish and update the relation model based on relations between different users;

the relations between different users include one or any combination of the following: the Q&A relation, the same-group relation and the similar-interest relation;

if the relations between different users in the relation model include only the Q&A relation or the same-group relation, the relation weight RelaRank1(u1, u2) between user u1 and user u2 is: RelaRank1(u1, u2) = log(relaCnt + α2) / log β2, where α2 and β2 are preset smoothing factors and relaCnt is the number of times a Q&A relation or same-group relation occurs between user u1 and user u2 within a set time period;

if the relations between different users in the relation model include only the similar-interest relation, the relation weight RelaRank2(u1, u2) between user u1 and user u2 is computed from the interest word vector of user u1 and the interest word vector of user u2.

The system also comprises a behavior model maintenance unit, configured to collect statistics on user behaviors and to establish and update each user's behavior model;

wherein the behavior model includes the user's activity level, and the activity level Active(u) of user u is: Active(u) = [log(S_answer + α4) * w1 + log(S_login + α4) * w2 + log(S_browse + α4) * w3] / log β4, with the symbols as defined for the method above.

Specifically, the user ranking unit may comprise a user matching subunit and a user ranking subunit;

the user matching subunit is configured to match the question features extracted by the feature extraction unit against one or any combination of the interest model, the attribute model and the relation model, to determine the users that match the extracted question features;

the user ranking subunit is configured to use one or any combination of the interest model, the attribute model, the relation model and the behavior model to determine the matching degree between each user determined by the user matching subunit and the question to be answered, and to rank the users determined by the user matching subunit according to the matching degree;

wherein the user models used by the user matching subunit and the user models used by the user ranking subunit are not completely identical.

If the user matching subunit uses two or more user models, it is also configured to take the union of the users matching the extracted question features that are determined with each user model.

If the user matching subunit uses the interest model, the extracted question features include the feature words of the question to be answered;

the user matching subunit matches the feature words against each user's interest model and determines the users whose interest words are hit by the feature words as the users that match the extracted question features; or it matches the feature words against each user's interest model, determines the users whose interest words are hit by the feature words and the weights of the hit interest words in the interest model, and determines as matching users those for whom the weight of a hit interest word in the interest model is greater than a preset matching weight.

Preferably, the user ranking unit also comprises a user filtering subunit, configured to determine the interest category to which the question to be answered belongs, and to filter out from the users determined by the user matching subunit those whose weight for this interest category is less than a preset interest filtering weight.

If the user matching subunit uses the attribute model, the extracted question features include the feature words of the question to be answered;

the user matching subunit matches the feature words against each user's attribute model, and determines as users that match the extracted question features the users corresponding to user attributes whose correlation with the feature words meets a preset correlation requirement.

If the user matching subunit uses the relation model, the extracted question features include information about the user who asked the question to be answered;

the user matching subunit matches the asking user's information against the relation model, and determines as users that match the extracted question features those whose relation weight with the asking user reaches a preset matching weight.

The user ranking subunit determines the matching degree Rank(U) between user U and the question to be answered with the following formula: Rank(U) = Rank(interest) * W1 + Rank(Profile) * W2 + Rank(rela) * W3 + Rank(behavior) * W4;

where W1, W2, W3 and W4 are preset weight coefficients,

and Rank(interest) is computed over T_N, the set of feature words extracted from the question to be answered, from TermW_i, the expressive weight of the i-th feature word, and FeatW_i, the weight of the i-th feature word in the interest model of user U.

Specifically, the question pushing unit may comprise a state determining subunit and a first pushing subunit;

the state determining subunit is configured to determine, according to the ranking result of the user ranking unit, whether each of the top M users is online;

the first pushing subunit is configured to push the question to be answered to those of the top M users who are online.

Alternatively, the question pushing unit may comprise an activity determining subunit and a second pushing subunit;

the activity determining subunit is configured to look up in the behavior model, according to the ranking result of the user ranking unit, the activity levels of the top M users;

the second pushing subunit is configured to push the question to be answered to those of the top M users whose activity level exceeds a preset push activity level.

As can be seen from the above technical solution, after performing text analysis on the question to be answered and extracting question features, the present invention ranks the users who match the question on the basis of multiple pre-established user models, and actively pushes the question to the top M users in the ranking, who are the users most likely to have the interest or the ability to answer it. With the method and system provided by the present invention, users who have the interest or the ability to answer a question receive it by push even if they have not browsed the question page, which improves the efficiency and quality with which answers are provided.

[Brief description of the drawings]

Fig. 1 is a flowchart of the interest model training process provided by Embodiment one of the present invention;

Fig. 2 is a flowchart of the method for expanding interest words provided by Embodiment one of the present invention;

Fig. 3 (a), (b) and (c) are schematic diagrams of the inverted lists of the interest model, the Profile model and the relation model, respectively;

Fig. 4 is a flowchart of determining answering users provided by Embodiment five of the present invention;

Fig. 5 is a schematic diagram of the system structure provided by Embodiment six of the present invention;

Fig. 6 is a schematic structural diagram of the text analysis subunit provided by Embodiment six of the present invention;

Fig. 7 is a schematic structural diagram of the interest word expansion subunit provided by Embodiment six of the present invention.

[Detailed description of the embodiments]

To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described below with reference to the drawings and specific embodiments.

The method provided by the present invention mainly comprises the following two processes:

The background model training process: user models are established in advance, where the user models comprise at least two of the following: an interest model established by mining the questions asked by users and their answers, a user attribute (Profile) model established on the basis of user attributes, a behavior model established by collecting statistics on user behavior, and a relation model established on the basis of the relations between different users.

The foreground process of determining answering users for a question: text analysis is performed on the question to extract question features; the extracted question features are matched against the user models, and users are ranked according to the matching degree between the question features and the users in each user model and the preset ranking weight of each user model; the question is pushed to the top M users in the ranking, where M is a preset positive integer.

The model training process and the process of determining answering users for a question are each described in detail below. The process of establishing the interest model is described first, through Embodiment one.

Embodiment one

Fig. 1 is a flowchart of the interest model training process provided by Embodiment one of the present invention. In this process, interest words are the feature values of the interest model. As shown in Fig. 1, the process may comprise the following steps:

Step 101: Fetch the question-and-answer history data of each user.

In this embodiment, after the interest model has been established for the first time, the question-and-answer history data of each user within a set time period may be fetched periodically and used to update the interest model.

Step 102: Perform text analysis on the question-and-answer history data of each user and extract the interest words of each user.

In this step, a user's question-and-answer history data may include the questions the user has answered, asked, browsed, queried, and so on. Of these, it is the questions a user has answered that reflect the user's interests, and the following description takes the questions answered by the user as an example.

The text analysis in this step may comprise: performing semantic-based word segmentation on the questions answered by the user; determining the expressiveness weight of each word obtained from the segmentation on the basis of its inverse document frequency and its expressive power; and determining the words whose expressiveness weight is greater than a preset interest selection weight as the interest words of this user.

Wherein, the expressiveness weight TermW of a word can be expressed as:

TermW = a*idf + b*ind (1)

Wherein idf denotes the inverse document frequency of the word, which is determined from c, N and df: c is a preset parameter greater than or equal to 1, N is the number of questions to be answered, and df is the document frequency, that is, the number of times the word appears in all the questions to be answered. ind denotes the expressive power of the word, which can be decided by its part of speech: for example, nouns are usually given a higher expressive power, while conjunctions and function words are given a lower one. It can be determined by looking up a preset expressive power table that stores the expressive power value corresponding to each part of speech. a and b are weight coefficients set in advance.

Further, before the expressiveness weight of a word is calculated, the document frequency of the word may be calculated first. Words whose document frequency falls outside a set range are filtered out; the expressiveness weight is calculated only for words whose document frequency is within the set range, and words whose expressiveness weight is greater than the preset interest selection weight are determined as the interest words of this user. The reason is that the document frequency of a word reflects how often the word appears in questions. A word that appears very rarely cannot reflect the user's main interests, and neither can a word that appears very often, such as a particle or an interrogative like "what". Such words can therefore be filtered out without calculating their expressiveness weight, which improves efficiency.
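
The filtering and weighting described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the function name `select_interest_words`, the tuple layout of its input and the choice to pass idf and ind in as precomputed values are all hypothetical.

```python
def select_interest_words(words, a, b, df_range, threshold):
    """Return {word: TermW} for words that pass the df filter and the weight threshold.

    words:     iterable of (word, df, idf, ind) tuples; idf and ind are taken as given
    a, b:      preset weight coefficients of formula (1)
    df_range:  (df_min, df_max), the set range for the document frequency
    threshold: preset interest selection weight
    """
    df_min, df_max = df_range
    selected = {}
    for word, df, idf, ind in words:
        if not (df_min <= df <= df_max):
            continue  # too rare or too common: skipped before TermW is computed
        term_w = a * idf + b * ind  # formula (1)
        if term_w > threshold:
            selected[word] = term_w
    return selected
```

Filtering on document frequency first means the weight is never computed for very rare or very common words, which is the efficiency gain the text refers to.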

Step 103: Establish or update each interest model using the extracted interest words, where an interest model comprises the user's interest words and the weight FeatW of each interest word in this user's interest model.

The weight FeatW of an interest word in a user's interest model depends on the expressiveness weight of the interest word and on the user's interest trend for that interest word, namely

FeatW = a1*TermW + b1*ΔTr, (2)

Wherein ΔTr is the user's interest trend for this interest word. It can be taken as the change, or the rate of change, in the frequency with which this interest word appears in the user's question-and-answer history data fetched in the current time period relative to the frequency with which it appears in the data fetched in the previous time period. A positive ΔTr indicates that the user's interest in this interest word is increasing, and a negative ΔTr indicates that it is decreasing. a1 and b1 are preset weight coefficients.
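
Formula (2) can be illustrated with a short Python sketch. The helper names `interest_trend` and `feat_w` are hypothetical, and the handling of a zero frequency in the previous period is an assumption that the text does not specify.

```python
def interest_trend(current_freq, previous_freq, as_rate=False):
    """ΔTr: change (or rate of change) of an interest word's frequency
    between the previous and the current time period."""
    change = current_freq - previous_freq
    if not as_rate:
        return change
    # Assumption: with no occurrences in the previous period, fall back to the raw change.
    return change / previous_freq if previous_freq else float(change)


def feat_w(term_w, delta_tr, a1, b1):
    """Formula (2): FeatW = a1*TermW + b1*ΔTr."""
    return a1 * term_w + b1 * delta_tr
```

A word that appears less often than in the previous period yields a negative ΔTr, which lowers its FeatW.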

In the interest model, the user's interest words may further be grouped into interest categories, and the FeatW of the interest words in each interest category may be used to calculate the weight of that interest category in the interest model. Usually the FeatW values of the interest words in an interest category are averaged to give the weight of the category; alternatively, the weight of an interest category may be determined from the statistical distribution of the interest words in the category.

In this embodiment, the establishment and updating of the interest model draw on the idea of adaptive filtering, with an extended relevance feedback method (the Rocchio algorithm) at the core. When the interest model is established, the user's interest words can be described using a vector space model. By actively learning from the user's ongoing answer history and updating the interest model, the whole interest model changes along with the trend of the user's answering activity, so that the model retains its ability to describe the user's answering interests.

The formula for updating the interest model with the Rocchio algorithm is:

Q' = \alpha Q + \beta \left( \frac{1}{N_r} \sum_{D_i \in D_r} D_i \right) - \gamma \left( \frac{1}{N_n} \sum_{D_j \in D_n} D_j \right) \qquad (3)

Wherein Q' denotes the updated interest model, Q the interest model before the update, D_r the set of positive samples, N_r the number of positive samples, D_n the set of negative samples, and N_n the number of negative samples. α, β and γ are preset adjustment coefficients that reflect the importance, when the interest model is updated, of the interest model before the update, the positive samples and the negative samples respectively. The positive sample set contains the interest words whose frequency of occurrence in this user's question-and-answer history data is rising, together with their weights in the interest model; it is used to expand the interest model with related interest features, which helps improve the recall of answering users. The negative sample set contains the interest words whose frequency of occurrence in this user's question-and-answer history data is falling, together with their weights in the interest model; it is used to reduce the importance of irrelevant interest features and so safeguard the accuracy of the recommendation results.
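
The Rocchio update of formula (3) can be sketched in Python as follows. The sketch rests on assumptions of its own: interest models and samples are represented as sparse `{word: weight}` dictionaries, the negative centroid is divided by the number of negative samples, and words whose updated weight is not positive are dropped. The names `centroid` and `rocchio_update` are hypothetical.

```python
from collections import defaultdict


def centroid(samples):
    """Average a list of sparse vectors ({word: weight}); an empty list gives {}."""
    total = defaultdict(float)
    for vec in samples:
        for word, weight in vec.items():
            total[word] += weight
    n = len(samples)
    return {word: weight / n for word, weight in total.items()} if n else {}


def rocchio_update(q, positives, negatives, alpha, beta, gamma):
    """Q' = alpha*Q + beta*centroid(positives) - gamma*centroid(negatives)."""
    pos_c = centroid(positives)
    neg_c = centroid(negatives)
    updated = {}
    for word in set(q) | set(pos_c) | set(neg_c):
        weight = (alpha * q.get(word, 0.0)
                  + beta * pos_c.get(word, 0.0)
                  - gamma * neg_c.get(word, 0.0))
        if weight > 0:  # assumption: words with non-positive weight leave the model
            updated[word] = weight
    return updated
```

Words that appear only in the positive samples enter the model with a weight of β times their centroid value, which is how the update expands the model with related interest features.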

Preferably, to improve the recall of answering users, a user's interest words may be expanded. The expansion may proceed as shown in Fig. 2; the process of expanding the interest words of user u may comprise the following steps:

Step 201: First determine the expansion words T_j of interest word T_i.

When determining the expansion words of interest word T_i, other interest words that belong to the same interest category as T_i may be selected from the interest models of other users that contain T_i. According to the relevance between the selected interest words and T_i, the P interest words with the highest relevance are chosen as the expansion words of T_i, where P is a preset positive integer.

For example, user u has the interest word "Sprite". In the interest models of other users that also contain "Sprite", the interest words that belong to the beverage category along with "Sprite" are "cola", "Xingmu" and "Pepsi". The relevance of each to "Sprite" is calculated. Supposing the result of ranking by relevance is "Xingmu", "cola", "Pepsi", the top-ranked interest word "Xingmu" can be chosen as the expansion word of "Sprite".
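
Step 201 can be sketched in Python as follows. The function name `select_expansion_words`, the dictionary layouts and the `relevance` callable are hypothetical; how relevance between two words is computed is left open here, and the alphabetical tie-break is an added assumption.

```python
def select_expansion_words(t_i, user_u, interest_models, category_of, relevance, p):
    """Step 201: pick up to p expansion words for interest word t_i of user_u.

    interest_models: {user_id: {interest_word: FeatW}}
    category_of:     {interest_word: interest category}
    relevance:       callable (word_a, word_b) -> relevance score
    """
    target = category_of.get(t_i)
    if target is None:
        return []
    candidates = set()
    for user_id, model in interest_models.items():
        if user_id == user_u or t_i not in model:
            continue  # only other users whose model contains t_i
        for word in model:
            if word != t_i and category_of.get(word) == target:
                candidates.add(word)
    # Highest relevance first; ties broken alphabetically so the result is stable.
    ranked = sorted(candidates, key=lambda w: (-relevance(t_i, w), w))
    return ranked[:p]
```

With P set to 1 this returns only the single most relevant word in the same category, as in the beverage example.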

Step 202: Calculate the interest level W_j of user u in the expansion word T_j:

W_j = \alpha_1 \cdot \log(jnum) + P_{u,T_j} + \beta_1 \cdot W_{avg} \qquad (4)

Wherein T' is the set of interest words in the interest model of user u whose relevance to T_j exceeds a preset relevance value, and s(T_n, T_j) is the relevance between interest word T_n and the expansion word T_j.

α1 and β1 are preset adjustment coefficients that can be chosen empirically; for example, α1 may be 2.1 and β1 may be 0.45. jnum is the number of times the expansion word T_j appears in the interest models of all users, and W_avg is the mean weight of interest word T_i over all user interest models. The weight of interest word T_i in the interest model of user u is the FeatW described above.

Step 203: If the value of W_j exceeds the preset interest selection weight, add the expansion word T_j to the interest model of user u as an interest word.

The weight of the expansion word T_j in the interest model is likewise calculated using formula (2).

Because one interest word can be expanded into several, the number of interest words finally added can be decided by the length of the user's interest model before the update: if user u has LEN interest words in the interest model before the update, the number of interest words added when the model is updated can be (LEN/10+1) * 10.

The processes of establishing the Profile model, the relation model and the behavior model are described below through Embodiment two, Embodiment three and Embodiment four.

Embodiment two

When the Profile model is established, user attributes serve as the feature values of the Profile model. A set of user attributes is obtained, including geographic location, gender, age, industry, and so on.

These user attributes are usually static attributes, for example the registration information, such as geographic location, gender, age and industry, that the user filled in on the knowledge Q&A platform. In this case, if the user has a given static attribute, the weight of that static attribute in the Profile model is set to a fixed value, for example 1; if the user does not have that static attribute, its weight in the Profile model is 0.

In particular, the geographic location attribute can also be determined from the user's interest words: the interest word with a geographic location attribute that has the highest weight in the interest model of user u can be taken as this user's geographic location attribute. For example, if the highest-weighted interest word with a geographic location attribute in the interest model of user u is "Beijing", then "Beijing" can be taken as the geographic location attribute of user u, and the weight of this geographic location attribute in the Profile model can simply be its weight in the same user's interest model, namely FeatW.

Embodiment three

In the relation model, the relations between different users serve as the feature values. These relations may include, but are not limited to, the question-and-answer relation, the same-group relation and the similar-interest relation. Accordingly, any one of these relations, or any combination of them, may be considered when calculating the relation between users.

If the relation model considers only the question-and-answer relation, the same-group relation, or both, the relation weight Rela Rank1(u1, u2) between user u1 and user u2 can be determined by the following formula:

Rela Rank1(u1, u2) = log(relaCnt + α2) / log β2 (5)

Wherein α2 and β2 are smoothing factors that can be chosen empirically; for example, α2 may be 1.0 and β2 may be 2.0. relaCnt is the number of times a question-and-answer relation or a same-group relation occurs between users u1 and u2 within a set time period. The same-group relation may include, but is not limited to, appearing in the same community, the same column or the same discussion group, or replying to the same post.

If the relation model considers only the similar-interest relation, the relation weight Rela Rank2(u1, u2) between user u1 and user u2, calculated on the basis of the interest model, can be determined by the following formula:

Rela Rank2(u1, u2) = \cos(\vec{T}_{u1}, \vec{T}_{u2}) = \frac{\vec{T}_{u1} \cdot \vec{T}_{u2}}{\| \vec{T}_{u1} \|_2 \times \| \vec{T}_{u2} \|_2} \qquad (6)

Wherein \vec{T}_{u1} is the interest word vector of user u1 and \vec{T}_{u2} is the interest word vector of user u2.

If the relation model considers the question-and-answer relation or the same-group relation in addition to the similar-interest relation, the relation weight Rela Rank(u1, u2) between user u1 and user u2 can be:

Rela Rank(u1, u2) = Rela Rank1(u1, u2) * α3 + Rela Rank2(u1, u2) * β3 (7)

Wherein α3 and β3 are weight coefficients set in advance.
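
The three relation weights of formulas (5) to (7) can be sketched in Python as follows. The function names are hypothetical, interest word vectors are assumed to be sparse `{word: weight}` dictionaries, the cosine is the standard cosine similarity, and returning 0 for an empty vector is an added assumption. The defaults for α2 and β2 are the example values given in the text.

```python
import math


def rela_rank1(rela_cnt, alpha2=1.0, beta2=2.0):
    """Formula (5): smoothed log of the number of Q&A or same-group occurrences."""
    return math.log(rela_cnt + alpha2) / math.log(beta2)


def rela_rank2(t_u1, t_u2):
    """Formula (6): cosine similarity of two sparse interest word vectors."""
    dot = sum(weight * t_u2.get(word, 0.0) for word, weight in t_u1.items())
    norm1 = math.sqrt(sum(weight * weight for weight in t_u1.values()))
    norm2 = math.sqrt(sum(weight * weight for weight in t_u2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # assumption: a user with no interest words is similar to nobody
    return dot / (norm1 * norm2)


def rela_rank(rela_cnt, t_u1, t_u2, alpha3, beta3):
    """Formula (7): weighted combination of the two relation weights."""
    return rela_rank1(rela_cnt) * alpha3 + rela_rank2(t_u1, t_u2) * beta3
```

With α2 = 1.0, a pair of users with no recorded interactions gets log(1) = 0 from formula (5), so the smoothing factor keeps the logarithm defined without inventing a relation.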

Embodiment four

When the behavior model is established, statistics are first collected on each user's behavior data within a set time period, and each user's activity level is calculated from the statistics. A user's behavior data includes, but is not limited to: answering questions, logging in to the Q&A system, and browsing Q&A system pages (browsing here can include both browsing from query results and browsing after logging in to the Q&A system). Each of these can be measured as a count or a frequency.

The activity level Active(u) of user u can be calculated using the following formula:

Active(u) = [log(S_answer + α4) * w1 + log(S_login + α4) * w2 + log(S_browse + α4) * w3] / log β4 (8)

Wherein w1, w2 and w3 are preset weight coefficients, and α4 and β4 are smoothing factors that can be set empirically; for example, α4 may be 1.0 and β4 may be 2.0. S_answer, S_login and S_browse are, respectively, the number of times user u answered questions, logged in to the Q&A system and browsed Q&A system pages within the set time period, or, respectively, the frequencies of those three behaviors within the set time period.
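
Formula (8) translates directly into a few lines of Python. The function name `active` and its parameter names are hypothetical; the defaults for α4 and β4 are the example values given in the text.

```python
import math


def active(s_answer, s_login, s_browse, w1, w2, w3, alpha4=1.0, beta4=2.0):
    """Formula (8): activity level from answer, login and browse counts (or frequencies)."""
    numerator = (math.log(s_answer + alpha4) * w1
                 + math.log(s_login + alpha4) * w2
                 + math.log(s_browse + alpha4) * w3)
    return numerator / math.log(beta4)
```

The logarithm dampens very large counts, so a user who browses a thousand pages is not treated as a thousand times more active than one who browses a single page.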

After the above user models have been established, an inverted list can be built from the user IDs and the feature values of the interest model (that is, the users' interest words). Each feature value serves as an inverted index key, and the user IDs together with the weights of the feature value in those users' interest models form the posting list, as shown for example in (a) of Fig. 3.

Likewise, an inverted list can be built from the user IDs and the feature values of the Profile model (that is, the user attributes). Each feature value serves as an inverted index key, and the user IDs together with the weights of the feature value in those users' Profile models form the posting list, as shown for example in (b) of Fig. 3.

An inverted list can also be built from the user IDs and the feature values of the relation model (that is, the IDs of other users whose relation weight with the user exceeds a preset weight). Here the user ID serves as the inverted index key, and the feature values of the relation model together with the corresponding relation weights form the posting list, as shown for example in (c) of Fig. 3.
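
The three inverted lists can be sketched in Python with plain dictionaries. This is an illustrative sketch only: the function names and the nested-dictionary input layout are assumptions, and a production system would more likely use a dedicated index store.

```python
from collections import defaultdict


def build_inverted_list(user_models):
    """Interest or Profile model: {user_id: {feature: weight}}
    -> {feature: [(user_id, weight), ...]}, keyed by feature value."""
    index = defaultdict(list)
    for user_id, features in user_models.items():
        for feature, weight in features.items():
            index[feature].append((user_id, weight))
    return dict(index)


def build_relation_list(relation_weights, min_weight):
    """Relation model: {user_id: {other_user_id: weight}}
    -> {user_id: [(other_user_id, weight), ...]}, keyed by user ID and
    keeping only relations whose weight exceeds min_weight."""
    return {user_id: [(other, weight) for other, weight in others.items()
                      if weight > min_weight]
            for user_id, others in relation_weights.items()}
```

Keying the interest and Profile lists by feature value lets a question's feature words be looked up directly, while keying the relation list by user ID lets the asking user be looked up directly.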

The inverted lists built above are used later when answering users are determined. The process of determining answering users is described below through Embodiment five.

Embodiment five

Fig. 4 is a flowchart of determining answering users provided by Embodiment five of the present invention. A Q&A system usually places questions to be answered in a pending question list and polls the list periodically, determining the answering users for each question to be answered and pushing the question to them. As shown in Fig. 4, the process of determining answering users and pushing a question may comprise the following steps:

Step 401: Perform text analysis on the question to be answered and extract the feature words of the question.

In this step, the text analysis of the question to be answered may comprise: performing semantic-based word segmentation on the question; determining the expressiveness weight of each word obtained from the segmentation on the basis of its inverse document frequency and its expressive power; and determining the words whose expressiveness weight is greater than a preset feature extraction weight as the feature words of the question.

The weight of each word can likewise be calculated using formula (1) of Embodiment one, and the details are not repeated here.

Step 402: Match the extracted feature words against the user models and determine the users that match the feature words.

In this step, the extracted feature words can be matched against each user's interest model to determine the users whose interest words are hit by the feature words and the weights of the hit interest words in their interest models. That is, the inverted index of the interest model is searched and the posting lists that are hit are determined.

The users whose interest words are hit by the feature words can be taken directly as matching users. Preferably, after those users have been determined, only the users for whom the weight of the hit interest word in the interest model is greater than a preset matching weight are selected as matching users.

In some cases, although a feature word of the question to be answered hits an interest word of a certain user, the user may not be interested in the interest category to which the question belongs and so would have no interest in answering questions in that category. Therefore, in embodiments of the present invention, the interest category to which the question belongs may further be determined, and users for whom the weight of that interest category is less than a preset interest filtering weight are filtered out of the matching users. For example, suppose the question to be answered is "What time does the Forbidden City in Beijing open?" and "Beijing" hits an interest word of user A whose weight in user A's interest model is high. User A could then be selected as a user matching this question. However, the question belongs to the travel category, and the weight of the travel category in user A's interest model is low, which indicates that user A is not interested in travel and is therefore unlikely to answer the question. User A can then be filtered out of the matching users.

When the interest category of the question is determined, the interest category of each word obtained from segmenting the question can be determined, each interest category can be scored according to the weights of its words, and the highest-scoring interest category is taken as the interest category of the question. This process is mature existing technology and is not described in detail here.

In addition, the extracted feature words can be matched against each user's Profile model to determine the users who have a user attribute whose relevance to the feature words meets a preset relevance requirement; these users are taken as matching users. For example, suppose the feature word "Forbidden City" is extracted from the question to be answered and the industry attribute in a certain user's Profile model is travel. The feature word is then highly relevant to this user's attribute, and if the preset relevance requirement is met, the user can be determined as a user matching the question to be answered.

Further, the asking user of the question can be matched against the relation model to determine the users whose relation weight with the asking user reaches a preset matching weight; these users are taken as matching users. That is, the inverted index of the relation model is searched using the asking user's ID, and the users in the hit posting list whose relation weight reaches the preset matching weight are determined.

If two or more user models are used for user matching, the union of the matching users determined with each user model is taken.
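
Matching through the interest model and the relation model, followed by the union, can be sketched in Python as follows. The function names and the index layout (`{key: [(user_id, weight), ...]}`) are assumptions made for illustration; Profile matching is omitted because the text does not say how relevance between a feature word and an attribute is computed.

```python
def match_by_interest(feature_words, interest_index, min_weight):
    """Users for whom a hit interest word weighs more than min_weight."""
    return {user_id
            for word in feature_words
            for user_id, weight in interest_index.get(word, [])
            if weight > min_weight}


def match_by_relation(asker_id, relation_index, min_weight):
    """Users whose relation weight with the asking user reaches min_weight."""
    return {user_id
            for user_id, weight in relation_index.get(asker_id, [])
            if weight >= min_weight}


def match_users(feature_words, asker_id, interest_index, relation_index,
                min_interest, min_relation):
    """Union of the matching users found through each model."""
    return (match_by_interest(feature_words, interest_index, min_interest)
            | match_by_relation(asker_id, relation_index, min_relation))
```

Using sets makes the union step a single operator and removes users who are matched through more than one model only once.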

Step 403: Rank the determined users using the user's interest level in the question Rank(interest), the relevance between the question and the user's attributes Rank(Profile), the relation between the user and the asking user Rank(rela), and the user's behavior Rank(behavior).

The ranking weight Rank(U) of user U can be calculated as follows:

Rank(U) = Rank(interest) * W1 + Rank(Profile) * W2 + Rank(rela) * W3 + Rank(behavior) * W4 (9)

Wherein W1, W2, W3 and W4 are the preset weight coefficients corresponding to Rank(interest), Rank(Profile), Rank(rela) and Rank(behavior) respectively.

Rank(interest) can be determined jointly from the expressiveness weight TermW of each feature word of the question to be answered and the weight FeatW of that feature word in the interest model of user U, namely

Rank(interest) = \sum_{i \in T_N} TermW_i \times FeatW_i

Wherein T_N is the set of feature words extracted from the question to be answered, TermW_i is the expressiveness weight of the i-th feature word, and FeatW_i is the weight of the i-th feature word in the interest model of user U.

Rank(Profile) can be determined jointly from the expressiveness weight TermW of each feature word of the question and the weight FeaVec of that feature word in the Profile model of user U, namely

Rank(Profile) = \sum_{i \in T_N} TermW_i \times FeaVec_i

Wherein FeaVec_i is the weight of the i-th feature word of the question in the Profile model of user U.

Rank(rela) can be taken as the relation weight Rela Rank(U, A) between user U and the asking user A, which can be found by matching user A against the relation model.

Rank(behavior) can be determined from the activity level Active(U) of user U, which can be found by looking up the behavior model of user U.

When the matching users are ranked, only some of the user models may be used. For example, the behavior model alone may be used: the behavior of each matching user is determined and the matching users are ranked by it, in which case W1, W2 and W3 in formula (9) are 0. As another example, the interest model and the behavior model may be combined: the user's interest level in the question and the user's behavior are determined and the matching users are then ranked, in which case W2 and W3 are set to 0. One or more user models can be used flexibly to rank the matching users, and the possibilities are not enumerated here.

In other words, this step uses any one of the interest model, the Profile model, the relation model and the behavior model, or any combination of them, to calculate the matching degree between each matching user and the question to be answered, and ranks the matching users by matching degree. Note that the user models used to determine the matching users in step 402 and the user models used to rank the matching users in step 403 need not be the same.
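
The ranking of step 403 can be sketched in Python as follows. The function names and data layouts are hypothetical. The sum-of-products form used for Rank(interest) and Rank(Profile) is taken from the arithmetic of the worked example for user A in this embodiment; treating a feature word that is absent from a model as weight 0 is an added assumption.

```python
def weighted_overlap(term_w, model_w):
    """Sum of TermW_i * weight_i over the question's feature words."""
    return sum(tw * model_w.get(word, 0.0) for word, tw in term_w.items())


def rank_user(term_w, feat_w, fea_vec, rela, active, weights):
    """Formula (9). weights = (W1, W2, W3, W4); a zero weight switches that model off."""
    w1, w2, w3, w4 = weights
    return (weighted_overlap(term_w, feat_w) * w1    # Rank(interest)
            + weighted_overlap(term_w, fea_vec) * w2  # Rank(Profile)
            + rela * w3                               # Rank(rela)
            + active * w4)                            # Rank(behavior)


def rank_users(candidates, term_w, weights):
    """candidates: {user_id: (feat_w, fea_vec, rela, active)} -> user IDs, best first."""
    scores = {u: rank_user(term_w, *parts, weights) for u, parts in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Figures for user A from the worked example in this embodiment.
term_w = {"Beijing": 0.6, "Forbidden City": 0.8}
score_a = rank_user(term_w,
                    feat_w={"Beijing": 0.7, "Forbidden City": 0.3},
                    fea_vec={"Beijing": 0.8, "Forbidden City": 0.6},
                    rela=0.8, active=0.5,
                    weights=(0.4, 0.2, 0.2, 0.2))
print(round(score_a, 3))  # 0.716
```

Passing `weights=(0, 0, 0, 1)` ranks by the behavior model alone, which corresponds to the case in which W1, W2 and W3 are 0.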

Step 404: Determine the top M users in the ranking as the answering users for the question to be answered and push the question to them, where M is a preset positive integer.

Preferably, before the question is pushed, the current state of each answering user is queried and the question to be answered is pushed only to the answering users who are currently online.

Alternatively, the activity levels of the top M users in the ranking can be determined, and the question to be answered is pushed to those of the top M users whose activity level exceeds a preset push activity level. For example, users can be divided by activity level into core users, ordinary users and declining users. The question to be answered can be pushed to the top M core users; it can also be pushed to the top M1 core users, the top M2 ordinary users and the top M3 declining users; or other strategies can be used.
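
The selection of push targets in step 404 can be sketched in Python as follows. The function name `select_push_targets` and its parameters are hypothetical. The text presents the online check and the activity check as alternatives; allowing both filters in one call is a convenience of this sketch, not something the text prescribes.

```python
def select_push_targets(ranked_users, m, is_online=None, activity=None, min_activity=None):
    """Take the top m ranked users, then optionally keep only online or active ones.

    ranked_users: user IDs, best match first
    is_online:    optional callable user_id -> bool
    activity:     optional {user_id: activity level}
    min_activity: preset push activity level, used together with `activity`
    """
    targets = ranked_users[:m]
    if is_online is not None:
        targets = [u for u in targets if is_online(u)]
    if activity is not None and min_activity is not None:
        targets = [u for u in targets if activity.get(u, 0.0) > min_activity]
    return targets
```

Both filters are applied after the cut to the top M, so fewer than M users may receive the question.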

After the question to be answered has been pushed to the determined answering users, how each answering user responds to the question is recorded as that user's question-and-answer history data and behavior data, and is used to update the answering user's interest model and attribute model. In addition, an answering user who answers the question forms a new answer relation with the asking user, which can be used to update the relation model. The behavior of the answering user in answering the question can also be used to update that user's behavior model.

As a concrete example, suppose the question asked by user X is "What time does the Forbidden City in Beijing open?". Text analysis is performed on this question. Semantic-based word segmentation first yields "Beijing", "Forbidden City", "what time" and "open". The expressiveness weight of each word is then determined from its inverse document frequency and expressive power; see formula (1). Suppose the expressiveness weights of "Beijing", "Forbidden City", "what time" and "open" are 0.6, 0.8, 0.5 and 0.1 respectively and the preset feature extraction weight is 0.6; "Beijing" and "Forbidden City" are then determined as the feature words of the question.

The feature words "Beijing" and "Forbidden City" are matched against the user models. Suppose the user models used for matching are the interest model and the relation model.

"Beijing" is matched against the interest model, that is, "Beijing" is used as the key to search the inverted index. Suppose the posting list that is hit is: [user A, weight 0.7 in the interest model of user A], [user B, weight 0.3 in the interest model of user B], [user C, weight 0.6 in the interest model of user C]. "Forbidden City" is matched against the interest model, and the posting list that is hit is: [user A, weight 0.3 in the interest model of user A], [user D, weight 0.9 in the interest model of user D]. Supposing the preset matching weight is 0.5, the users determined through the interest model as matching the feature words of the question are user A, user C and user D.

User X is matched against the relation model. Suppose the users whose relation weight with user X reaches the preset matching weight are: [user A, relation weight 0.8 with user X], [user C, relation weight 0.7 with user X].

Taking the union of the users determined through the interest model and the relation model gives the matching users: user A, user C and user D.

Calculate the ordering weight of user A, user C and user D then respectively; Suppose that the weight coefficient W1, W2, W3 and the W4 that use are respectively 0.4,0.2,0.2,0.2; Suppose that " Beijing " is 0.8 as the geographic position at the weighted value of the Profile of user A model; " the Forbidden City " is 0.6 as tourism industry at the weighted value of the Profile of user A model; The Rank (behavior) that the liveness grade of user A is confirmed is 0.5, then: Rank (A)=(0.6*0.7+0.8*0.3) * 0.4+ (0.6*0.8+0.8*0.6) * 0.2+0.8*0.2+0.5*0.2=0.716

The ranking weights of user C and user D are calculated in the same way; suppose they are:

Rank(C) = 0.424

Rank(D) = 0.628

Sorting the users by ranking weight gives the order: user A, user D, user C.

The top 2 users are determined as the answering users, i.e. user A and user D. The current status of user A and user D is looked up; both are found to be online, so the question "What time does the Forbidden City in Beijing open?" is pushed to user A and user D.
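The ranking and pushing step of this example can be sketched as below. The function name and the representation of online status as a set are assumptions made for illustration; the ranking weights are the ones given in the example.

```python
def select_push_targets(rank_by_user, online_users, m):
    """Return the top-m users by ranking weight that are currently online."""
    ranked = sorted(rank_by_user, key=rank_by_user.get, reverse=True)
    top_m = ranked[:m]
    return [user for user in top_m if user in online_users]


ranks = {"A": 0.716, "C": 0.424, "D": 0.628}
targets = select_push_targets(ranks, online_users={"A", "D"}, m=2)
print(targets)  # ['A', 'D']
```

Note that the online check is applied after the top M users are chosen, so fewer than M users may receive the question when some of them are offline.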

The above describes the method provided by the present invention. The system provided by the present invention is described in detail below through Embodiment six.

Embodiment six

The system provided by Embodiment six of the present invention is based on pre-established user models, where the user models include at least two of the following: an interest model established by mining users' questions and answers, a Profile model established based on user attributes, a behavior model established from statistics on user behavior, and a relational model established based on relations between different users. As shown in Fig. 5, the system may include: a feature extraction unit 500, a user ranking unit 510 and a question pushing unit 520.

The feature extraction unit 500 is configured to perform text analysis on the question to be answered and extract question features.

The user ranking unit 510 is configured to match the question features extracted by the feature extraction unit 500 against the user models, and to rank the users according to the degree of match between the question features and the users in each user model and the preset ranking weight of each user model.

The question pushing unit 520 is configured to push the question to be answered, according to the ranking result of the user ranking unit 510, to the top M users, where M is a preset positive integer.

To establish and maintain the user models, the system may further include an interest model maintenance unit 530.

The interest model maintenance unit 530 may specifically include: a data crawling subunit 531, a text analysis subunit 532 and a model maintenance subunit 533.

The data crawling subunit 531 is configured to crawl each user's question-and-answer history data.

The data crawling subunit 531 may periodically crawl each user's question-and-answer history data within a set time period, in order to establish and update the interest model.

The text analysis subunit 532 is configured to perform text analysis on each user's question-and-answer history data and extract each user's interest words.

The model maintenance subunit 533 is configured to establish or update each user's interest model using the interest words extracted by the text analysis subunit 532, where the interest model contains: the user's interest words and the weight FeatW of each interest word in the interest model of the corresponding user.

More specifically, the structure of the text analysis subunit 532 may be as shown in Fig. 6 and include: a word segmentation module 601, a weight determination module 602 and an interest word determination module 603.

The word segmentation module 601 is configured to perform semantic-based word segmentation on the questions the user has answered, asked, browsed or searched. Fig. 6 takes as an example the case where semantic-based word segmentation is performed only on the questions the user has answered.

The weight determination module 602 is configured to determine the semantic weight TermW of each word obtained by the word segmentation module 601, based on the word's inverse document frequency and expressive ability.

The interest word determination module 603 is configured to determine the words whose semantic weight TermW is greater than a preset interest selection weight as the user's interest words.

The semantic weight TermW of each word is: TermW = a*idf + b*ind, where idf, the inverse document frequency, is computed from N, df and c; c is a preset parameter greater than or equal to 1; df is the number of times the word appears in all questions to be answered; N is the number of questions to be answered; ind denotes the expressive ability of the word; and a and b are preset weight coefficients.

The expressive ability of a word may be determined by its part of speech; for example, nouns are usually set to have a higher expressive ability, while conjunctions and function words are set to have a lower one.
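A minimal sketch of the semantic weight and the interest word selection follows. The inverse document frequency is taken as an input because its exact formula is not reproduced here; the part-of-speech table, the default values of a and b, and the function names are hypothetical choices for illustration.

```python
# Hypothetical expressive-ability scores: the text only states that nouns
# are set higher and that conjunctions and function words are set lower.
IND_BY_POS = {"noun": 1.0, "verb": 0.6, "conjunction": 0.1, "function": 0.1}


def term_weight(idf, pos, a=0.5, b=0.5, default_ind=0.3):
    """TermW = a*idf + b*ind, with ind looked up from the part of speech."""
    ind = IND_BY_POS.get(pos, default_ind)
    return a * idf + b * ind


def select_words(term_weights, threshold):
    """Keep the words whose semantic weight is greater than the threshold."""
    return [word for word, tw in term_weights.items() if tw > threshold]


weights = {
    "Forbidden City": term_weight(idf=0.8, pos="noun"),
    "and": term_weight(idf=0.8, pos="conjunction"),
}
print(select_words(weights, threshold=0.5))  # ['Forbidden City']
```

With equal idf, the noun ends up with the larger weight purely because of its part of speech, which is the effect the ind term is meant to have.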

In addition, the document frequency df of a word may be used to filter the words obtained after word segmentation. That is, the text analysis subunit 532 may further include a filtering module 604, configured to filter out, from the words obtained by the word segmentation module 601, the words whose df is not within a preset range, where df is the number of times the word appears in all questions to be answered.

In this case, the weight determination module 602 calculates the semantic weight only for the words that remain after filtering by the filtering module 604.

In Fig. 5, the model maintenance subunit 533 determines the weight FeatW of an interest word in the interest model of the corresponding user according to FeatW = a1*TermW + b1*ΔTr, where ΔTr is the change, or the rate of change, of the frequency with which the interest word appears in the user's question-and-answer history data crawled in the current time period relative to the frequency with which it appears in the data crawled in the previous time period. If ΔTr is positive, the user's interest in the interest word is increasing; if ΔTr is negative, the user's interest in the interest word is decreasing. a1 and b1 are preset weight coefficients.
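The FeatW formula can be illustrated with a short sketch. It uses the "change value" variant of ΔTr (current frequency minus previous frequency); the function name and the default coefficients a1 and b1 are assumptions.

```python
def feat_weight(term_w, freq_now, freq_prev, a1=0.7, b1=0.3):
    """FeatW = a1*TermW + b1*dTr, with dTr taken as the frequency change."""
    delta_tr = freq_now - freq_prev
    return a1 * term_w + b1 * delta_tr


rising = feat_weight(term_w=0.8, freq_now=0.5, freq_prev=0.3)
falling = feat_weight(term_w=0.8, freq_now=0.3, freq_prev=0.5)
print(rising > falling)  # True
```

Two interest words with the same semantic weight therefore receive different weights in the interest model when one is trending up and the other is trending down.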

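In summary, the establishment and updating of the interest model in effect adopts the idea of adaptive filtering, with an extended relevance feedback method (the Rocchio algorithm) at its core; that is, the model maintenance subunit 533 updates the user's interest model according to $Q' = \alpha Q + \beta \frac{1}{N_r} \sum_{d \in D_r} d - \gamma \frac{1}{N_n} \sum_{d \in D_n} d$, where Q' denotes the interest model after the update, Q denotes the interest model before the update, D_r denotes the positive sample set, containing the interest words whose frequency of occurrence in the user's question-and-answer history data is rising and their weights in the interest model, N_r denotes the number of positive samples, D_n denotes the negative sample set, containing the interest words whose frequency of occurrence is falling and their weights in the interest model, N_n denotes the number of negative samples, and α, β and γ are preset adjustment coefficients.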
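The text names the Rocchio relevance feedback algorithm as the core of the update. The sketch below assumes the textbook Rocchio form, in which the updated model is α times the old model plus β times the mean of the positive samples minus γ times the mean of the negative samples. Representing the model and each sample as a sparse word-to-weight dictionary, as well as the default coefficients, are assumptions of this sketch rather than details confirmed by the source.

```python
from collections import defaultdict


def rocchio_update(q, positives, negatives, alpha=1.0, beta=0.75, gamma=0.25):
    """Textbook Rocchio update on sparse vectors (dicts word -> weight)."""
    updated = defaultdict(float)
    for word, weight in q.items():
        updated[word] += alpha * weight
    # Add the mean of the positive samples, subtract the mean of the negatives.
    for samples, coeff in ((positives, beta), (negatives, -gamma)):
        if not samples:
            continue
        for sample in samples:
            for word, weight in sample.items():
                updated[word] += coeff * weight / len(samples)
    return dict(updated)


old_model = {"Beijing": 0.5, "travel": 0.4}
new_model = rocchio_update(
    old_model,
    positives=[{"Beijing": 0.8}],   # interest word trending upward
    negatives=[{"travel": 0.4}],    # interest word trending downward
)
print({w: round(v, 2) for w, v in new_model.items()})
```

In this run the weight of "Beijing" grows and the weight of "travel" shrinks, which matches the intended behavior of reinforcing rising interests and damping falling ones.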

To improve the recall of answering users, the interest words in a user's interest model may be expanded. To this end, the interest model maintenance unit 530 may further include an interest word expansion subunit 534, configured to expand the interest words in the user's interest model.

The interest word expansion subunit 534 may, as shown in Fig. 7, include: an expansion word determination module 701, an interest degree calculation module 702 and an interest word expansion module 703.

The expansion word determination module 701 is configured to determine an expansion word T_j of an interest word T_i.

The interest degree calculation module 702 is configured to calculate user u's degree of interest W_j in the expansion word T_j, where $W_j = \alpha_1 \log(jnum) + P_{u,T_j} + \beta_1 W_{avg}$.

α1 and β1 are preset adjustment coefficients, and jnum is the number of times the expansion word T_j appears in the interest models of all users. In the definition of P_{u,T_j}, T' is the set of interest words in the interest model of user u whose relevance to T_j exceeds a preset relevance value, s(T_n, T_j) is the relevance between an interest word T_n and the expansion word T_j of that interest word, and the weight of the interest word T_i in the interest model of user u is used. W_avg is the average weight of the interest word T_i over the interest models of all users.

The interest word expansion module 703 is configured to judge whether the value of W_j exceeds the preset interest selection weight and, if so, to add the expansion word T_j as an interest word to the interest model of user u.

Specifically, the expansion word determination module 701 may, from the interest models of other users that contain the interest word T_i, select the other interest words that belong to the same interest category as T_i, and then, according to the relevance between the selected interest words and T_i, select the top P interest words by relevance as the expansion words of T_i, where P is a preset positive integer.
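The candidate selection performed by the expansion word determination module can be sketched as follows. The data structures (one dictionary per user model, a word-to-category table, and a relevance table keyed by word pairs) and all sample values are hypothetical.

```python
def select_expansion_words(target, other_models, category_of, relevance, p):
    """Top-p same-category words from other users' models containing target."""
    candidates = set()
    for model in other_models:
        if target not in model:
            continue  # only models that contain the interest word T_i count
        for word in model:
            if word != target and category_of.get(word) == category_of.get(target):
                candidates.add(word)
    # Sort by relevance to the target, highest first; ties broken by name.
    ranked = sorted(candidates, key=lambda w: (-relevance.get((target, w), 0.0), w))
    return ranked[:p]


other_models = [
    {"Forbidden City": 0.9, "Great Wall": 0.5, "hotpot": 0.4},
    {"Forbidden City": 0.3, "Summer Palace": 0.6},
    {"Great Wall": 0.7, "Terracotta Army": 0.9},  # lacks the target word
]
category_of = {
    "Forbidden City": "travel", "Great Wall": "travel",
    "Summer Palace": "travel", "Terracotta Army": "travel", "hotpot": "food",
}
relevance = {
    ("Forbidden City", "Great Wall"): 0.6,
    ("Forbidden City", "Summer Palace"): 0.8,
}
print(select_expansion_words("Forbidden City", other_models, category_of, relevance, 2))
# ['Summer Palace', 'Great Wall']
```

"hotpot" is excluded because it belongs to a different interest category, and "Terracotta Army" is excluded because it only appears in a model that does not contain the target interest word.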

Still as shown in Fig. 5, the system further includes a Profile model maintenance unit 540, configured to establish and update users' Profile models based on user attributes.

The Profile model may contain: user attributes and the weights of the user attributes in the Profile model.

The user attributes may include, but are not limited to, one or any combination of the following: geographic location, gender, age and industry.

The Profile model maintenance unit 540 may set the weight in the Profile model of each user attribute the user has to a fixed value, and the weight of each user attribute the user does not have to 0. Alternatively, the interest word with a geographic location attribute that has the highest weight in the user's interest model is added to the user's Profile model as the user's geographic location attribute, and the weight of this geographic location attribute in the Profile model is the same as the weight of the interest word in the user's interest model. In this case, the Profile model maintenance unit 540 may use the interest words in the interest model to form the geographic location attribute in the Profile model; this connection is not shown in Fig. 5.
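The second option, deriving the geographic location attribute from the interest model, can be sketched as below. How a word is recognized as a geographic location is not specified in the text, so the sketch simply assumes a given set of location words; the function name and sample data are hypothetical.

```python
def derive_geo_attribute(interest_model, geo_words):
    """Pick the highest-weight interest word that is a geographic location.

    Returns (word, weight) so the Profile model can reuse the same weight,
    or None when the interest model contains no location word.
    """
    geo = {w: wt for w, wt in interest_model.items() if w in geo_words}
    if not geo:
        return None
    best = max(geo, key=geo.get)
    return best, geo[best]


interest_model = {"Beijing": 0.7, "Shanghai": 0.4, "hotpot": 0.9}
print(derive_geo_attribute(interest_model, {"Beijing", "Shanghai"}))
# ('Beijing', 0.7)
```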

The system may further include a relational model maintenance unit 550, configured to establish and update the users' relational model based on relations between different users.

The relations between different users include, but are not limited to, one or any combination of the following: question-and-answer relations, same-group relations and similar-interest relations. The relational model maintenance unit 550 may likewise obtain the relations between different users from users' question-and-answer history data or user behavior data; the connections between the relational model maintenance unit 550 and the user question-and-answer history database and the user behavior database are not shown in Fig. 5.

Same-group relations may include, but are not limited to: appearing in the same community, the same column or the same discussion group, or replying to the same post.

If the relations between different users in the relational model include only question-and-answer relations or same-group relations, the relation weight RelaRank1(u1, u2) between user u1 and user u2 is: RelaRank1(u1, u2) = log(relaCnt + α2) / log β2, where α2 and β2 are preset smoothing coefficients and relaCnt is the number of times a question-and-answer relation or same-group relation occurs between user u1 and user u2 within a set time period.
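The relation weight formula can be evaluated directly. In the sketch below the smoothing coefficients α2 = 1 and β2 = 100 are hypothetical values, chosen so that zero interactions give a weight of 0 and 99 interactions give a weight of 1.

```python
import math


def rela_rank1(rela_cnt, alpha2=1.0, beta2=100.0):
    """RelaRank1(u1, u2) = log(relaCnt + alpha2) / log(beta2)."""
    return math.log(rela_cnt + alpha2) / math.log(beta2)


for count in (0, 9, 99):
    print(count, round(rela_rank1(count), 3))
```

The logarithm damps the effect of very frequent interactions: going from 0 to 9 interactions raises the weight as much as going from 9 to 99.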

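If the relations between different users include similar-interest relations, the relation weight between user u1 and user u2 is determined from the interest word vector of user u1 and the interest word vector of user u2.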

As shown in Fig. 5, the system may further include a behavior model maintenance unit 560, configured to compile statistics on user behavior and to establish and update users' behavior models.

The behavior model contains the user's activity level, where the activity level Active(u) of user u may be computed from the quantities below.

w1, w2 and w3 are preset weight coefficients, α4 and β4 are preset smoothing coefficients, and S_answer, S_login and S_browse are, respectively, the number of times user u answers questions, logs in to the question answering system and browses pages of the question answering system within a set time period, or, respectively, the frequency with which user u answers questions, logs in to the question answering system and browses pages of the question answering system within a set time period.

Specifically, the feature extraction unit 500 may include: a word segmentation subunit 501, a weight determination subunit 502 and an interest word determination subunit 503.

The word segmentation subunit 501 is configured to perform semantic-based word segmentation on the question to be answered.

The weight determination subunit 502 is configured to determine the semantic weight TermW of each word obtained by the word segmentation subunit 501, based on the word's inverse document frequency and expressive ability.

The interest word determination subunit 503 is configured to determine the words whose semantic weight TermW is greater than a preset feature extraction weight as the feature words of the question to be answered.

The feature extraction unit 500 and the text analysis subunit 532 may be provided as separate units, or may be implemented by a single unit.

The user ranking unit 510 in Fig. 5 may specifically include: a user matching subunit 511 and a user sorting subunit 512.

The user matching subunit 511 is configured to match the question features extracted by the feature extraction unit 500 against one or any combination of the interest model, the Profile model and the relational model, and to determine the users that match the extracted question features.

The user sorting subunit 512 is configured to use one or any combination of the interest model, the Profile model, the relational model and the behavior model to determine the degree of match between the users determined by the user matching subunit 511 and the question to be answered, and to sort those users according to the degree of match.

The user models used by the user matching subunit 511 and the user models used by the user sorting subunit 512 are not completely identical.

If the user matching subunit 511 uses two or more user models, it is further configured to take the union of the users, determined using each user model, that match the extracted question features.

Depending on the user model adopted, the user matching subunit 511 determines the matched users in the following cases:

If the user matching subunit 511 uses the interest model, the extracted question features include the feature words of the question to be answered. In this case,

the user matching subunit 511 matches the feature words against each user's interest model and determines the users corresponding to the interest words hit by the feature words as the users matching the extracted question features; or it matches the feature words against each user's interest model, determines the users corresponding to the interest words hit by the feature words and the weights of the hit interest words in the interest models, and determines as matched users those users for whom the weight of the hit interest word in the interest model is greater than the preset matching weight.

Preferably, the user ranking unit 510 may further include a user filtering subunit 513, configured to determine the interest category to which the question to be answered belongs, and to filter out, from the users determined by the user matching subunit 511, the users whose weight for that interest category is less than a preset interest filtering weight.

If the user matching subunit 511 uses the Profile model, the extracted question features include the feature words of the question to be answered. In this case,

the user matching subunit 511 matches the feature words against each user's Profile model and determines as users matching the extracted question features the users corresponding to the user attributes whose relevance to the feature words meets a preset relevance requirement.

If the user matching subunit 511 uses the relational model, the extracted question features include the asking-user information of the question to be answered. In this case,

the user matching subunit 511 matches the asking-user information against the relational model and determines as users matching the extracted question features the users whose relation weight with the asking user reaches the preset matching weight.

Specifically, the user sorting subunit 512 may determine the degree of match Rank(U) between user U and the question to be answered using the following formula: Rank(U) = Rank(interest)*W1 + Rank(Profile)*W2 + Rank(rela)*W3 + Rank(behavior)*W4.

Here W1, W2, W3 and W4 are preset weight coefficients;

$Rank(interest) = \sum_{i \in T_N} TermW_i \cdot FeatW_i$, where T_N is the set of feature words extracted from the question to be answered, TermW_i is the semantic weight of the i-th feature word, and FeatW_i is the weight of the i-th feature word in the interest model of user U;

$Rank(Profile) = \sum_{i \in T_N} TermW_i \cdot FeaVec_i$, where FeaVec_i is the weight of the i-th feature word in the Profile model of user U; Rank(rela) is the relation weight, looked up in the relational model, between the asking user of the question to be answered and user U; and Rank(behavior) is determined from the activity level Active(U) of user U looked up in the behavior model of user U.
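A minimal sketch of the Rank(U) computation follows. It assumes that Rank(interest) and Rank(Profile) are sums of products of each feature word's semantic weight with its weight in the respective model, which is how the worked example for user A combines them; the function and parameter names are hypothetical.

```python
def rank_user(term_w, feat_w, fea_vec, rank_rela, rank_behavior,
              weights=(0.4, 0.2, 0.2, 0.2)):
    """Rank(U) as a weighted sum of the four per-model scores.

    term_w:  feature word -> semantic weight TermW
    feat_w:  feature word -> weight in user U's interest model (FeatW)
    fea_vec: feature word -> weight in user U's Profile model (FeaVec)
    """
    w1, w2, w3, w4 = weights
    rank_interest = sum(tw * feat_w.get(word, 0.0) for word, tw in term_w.items())
    rank_profile = sum(tw * fea_vec.get(word, 0.0) for word, tw in term_w.items())
    return (rank_interest * w1 + rank_profile * w2
            + rank_rela * w3 + rank_behavior * w4)


rank_a = rank_user(
    term_w={"Beijing": 0.6, "Forbidden City": 0.8},
    feat_w={"Beijing": 0.7, "Forbidden City": 0.3},
    fea_vec={"Beijing": 0.8, "Forbidden City": 0.6},
    rank_rela=0.8,
    rank_behavior=0.5,
)
print(round(rank_a, 3))  # 0.716
```

Feature words that are absent from a user's interest or Profile model contribute 0, so a user matched only through the relational model still receives a score from the last two terms.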

As shown in Fig. 5, the question pushing unit 520 may adopt either of two structures. In the first structure, the question pushing unit 520 may specifically include: a status determination subunit 521 and a first pushing subunit 522.

The status determination subunit 521 is configured to determine, according to the ranking result of the user ranking unit 510, whether each of the top M users is online.

The first pushing subunit 522 is configured to push the question to be answered to those of the top M users who are online.

In the second structure (not shown in Fig. 5), the question pushing unit 520 specifically includes: an activity determination subunit and a second pushing subunit.

The activity determination subunit is configured to look up in the behavior model, according to the ranking result of the user ranking unit 510, the activity levels of the top M users.

The second pushing subunit is configured to push the question to be answered to those of the top M users whose activity level exceeds a preset push activity level.
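The second structure can be sketched as a filter on the activity levels of the top M users. The function name, the dictionary holding the activity levels, and the sample values are assumptions made for illustration.

```python
def push_by_activity(ranked_users, activity_level, m, min_activity):
    """Among the top-m ranked users, keep those active enough to be pushed to."""
    return [
        user for user in ranked_users[:m]
        if activity_level.get(user, 0.0) > min_activity
    ]


ranked = ["A", "D", "C"]                      # already sorted by Rank(U)
activity = {"A": 0.5, "D": 0.2, "C": 0.9}     # looked up in the behavior model
print(push_by_activity(ranked, activity, m=2, min_activity=0.3))  # ['A']
```

User C is highly active but is not considered because it is outside the top M, while user D is inside the top M but is not active enough.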

In addition, the system further includes a user data recording unit 570, configured to record how the answering users respond to the question, i.e. to record the answering users' question-and-answer history data and behavior data, which can later be used to update the answering users' interest models, Profile models and behavior models. An answering user who answers the question also forms a new answer relation with the asking user, which can be used to update the relational model. The user data recording unit 570 is an existing unit of knowledge question-and-answer systems and is not described further here.

The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A question pushing method, characterized in that the method is based on pre-established user models, wherein the user models include at least two of the following: an interest model established by mining users' questions and answers, an attribute model established based on user attributes, a behavior model established from statistics on user behavior, and a relational model established based on relations between different users; the method comprises:

2. The method according to claim 1, characterized in that establishing the interest model comprises:

S1. crawling each user's question-and-answer history data;

3. The method according to claim 2, characterized in that step S2 specifically comprises: performing steps S21 to S23 for each user respectively;

4. The method according to claim 1, characterized in that step A specifically comprises:

5. The method according to claim 3 or 4, characterized in that the semantic weight TermW of each word is:

TermW = a*idf + b*ind, where idf, the inverse document frequency, is computed from N, df and c;

c is a preset parameter greater than or equal to 1; df is the number of times the word appears in all questions to be answered; N is the number of questions to be answered; ind denotes the expressive ability of the word; and a and b are preset weight coefficients.

6. The method according to claim 3, characterized in that, before step S22, the method further comprises:

7. The method according to claim 3, characterized in that the weight FeatW of the interest word in the interest model of the corresponding user is:

8. The method according to claim 2, characterized in that

the user's interest model is updated according to $Q' = \alpha Q + \beta \frac{1}{N_r} \sum_{d \in D_r} d - \gamma \frac{1}{N_n} \sum_{d \in D_n} d$, where Q' denotes the interest model after the update, Q denotes the interest model before the update, D_r denotes the positive sample set, containing the interest words whose frequency of occurrence in the user's question-and-answer history data is rising and their weights in the interest model, N_r denotes the number of positive samples, D_n denotes the negative sample set, containing the interest words whose frequency of occurrence in the user's question-and-answer history data is falling and their weights in the interest model, N_n denotes the number of negative samples, and α, β and γ are preset adjustment coefficients.

9. The method according to claim 2, characterized in that the method further comprises: expanding the interest words in the user's interest model;

D1. determining an expansion word T_j of the interest word T_i;

D2. calculating user u's degree of interest W_j in the expansion word T_j,

$W_j = \alpha_1 \log(jnum) + P_{u,T_j} + \beta_1 W_{avg}$,

where the computation uses the weight of the interest word T_i in the interest model of user u;

10. The method according to claim 9, characterized in that step D1 specifically comprises:

11. The method according to claim 1, characterized in that the attribute model comprises: user attributes and the weights of the user attributes in the attribute model;

12. The method according to claim 1, characterized in that the relations between different users comprise one or any combination of the following: question-and-answer relations, same-group relations and similar-interest relations;

13. The method according to claim 11, characterized in that, if the relations between different users in the relational model include only question-and-answer relations or same-group relations, the relation weight RelaRank1(u1, u2) between user u1 and user u2 is: RelaRank1(u1, u2) = log(relaCnt + α2) / log β2, where α2 and β2 are preset smoothing coefficients and relaCnt is the number of times a question-and-answer relation or same-group relation occurs between user u1 and user u2 within a set time period;

where the relation weight for similar-interest relations is determined from the interest word vector of user u1 and the interest word vector of user u2;

14. The method according to claim 1, characterized in that the behavior model comprises the user's activity level, where the activity level Active(u) of user u is:

15. The method according to claim 1, characterized in that step B specifically comprises:

16. The method according to claim 15, characterized in that, if two or more user models are used in step B1, the union is taken of the users, determined using each user model respectively, that match the extracted question features.

17. The method according to claim 15 or 16, characterized in that, if the interest model is used in step B1, the extracted question features comprise the feature words of the question to be answered;

18. The method according to claim 17, characterized in that the method further comprises: determining the interest category to which the question to be answered belongs, and filtering out from the matched users the users whose weight for that interest category is less than a preset interest filtering weight.

19. The method according to claim 15 or 16, characterized in that, if the attribute model is used in step B1, the extracted question features comprise the feature words of the question to be answered;

20. The method according to claim 15 or 16, characterized in that, if the relational model is used in step B1, the extracted question features comprise the asking-user information of the question to be answered;

21. The method according to claim 15, characterized in that the degree of match Rank(U) between user U and the question to be answered is: Rank(U) = Rank(interest)*W1 + Rank(Profile)*W2 + Rank(rela)*W3 + Rank(behavior)*W4;

where W1, W2, W3 and W4 are preset weight coefficients, $Rank(interest) = \sum_{i \in T_N} TermW_i \cdot FeatW_i$ and $Rank(Profile) = \sum_{i \in T_N} TermW_i \cdot FeaVec_i$;

T_N is the set of feature words extracted from the question to be answered, TermW_i is the semantic weight of the i-th feature word, FeatW_i is the weight of the i-th feature word in the interest model of user U, and FeaVec_i is the weight of the i-th feature word in the attribute model of user U; Rank(rela) is the relation weight, looked up in the relational model, between the asking user of the question to be answered and user U; and Rank(behavior) is determined from the activity level Active(U) of user U looked up in the behavior model of user U.

22. The method according to claim 1, characterized in that step C comprises:

23. The method according to claim 14, characterized in that step C comprises:

24. A question pushing system, characterized in that the system is based on pre-established user models, wherein the user models include at least two of the following: an interest model established by mining users' questions and answers, an attribute model established based on user attributes, a behavior model established from statistics on user behavior, and a relational model established based on relations between different users; the system comprises: a feature extraction unit, a user ranking unit and a question pushing unit;

25. The system according to claim 24, characterized in that the system further comprises an interest model maintenance unit;

26. The system according to claim 25, characterized in that the text analysis subunit specifically comprises: a word segmentation module, a weight determination module and an interest word determination module;

27. The system according to claim 24, characterized in that the feature extraction unit specifically comprises: a word segmentation subunit, a weight determination subunit and an interest word determination subunit;

28. The system according to claim 26 or 27, characterized in that the semantic weight TermW of each word is:

TermW = a*idf + b*ind, where idf is the inverse document frequency

29. The system according to claim 26, characterized in that the text analysis subunit further comprises: a filtering module, configured to filter out, from the words obtained by the word segmentation module, the words whose df is not within a preset range, where df is the number of times the word appears in all questions to be answered;

30. The system according to claim 26, characterized in that the weight FeatW of the interest word in the interest model of the corresponding user is:

31. The system according to claim 25, characterized in that the model maintenance subunit updates the user's interest model according to $Q' = \alpha Q + \beta \frac{1}{N_r} \sum_{d \in D_r} d - \gamma \frac{1}{N_n} \sum_{d \in D_n} d$, where Q' denotes the interest model after the update, Q denotes the interest model before the update, D_r denotes the positive sample set, containing the interest words whose frequency of occurrence in the user's question-and-answer history data is rising and their weights in the interest model, N_r denotes the number of positive samples, D_n denotes the negative sample set, containing the interest words whose frequency of occurrence in the user's question-and-answer history data is falling and their weights in the interest model, N_n denotes the number of negative samples, and α, β and γ are preset adjustment coefficients.

32. The system according to claim 25, characterized in that the interest model maintenance unit further comprises: an interest word expansion subunit, configured to expand the interest words in the user's interest model;

the interest degree calculation module is configured to calculate user u's degree of interest W_j in the expansion word T_j, where

$W_j = \alpha_1 \log(jnum) + P_{u,T_j} + \beta_1 W_{avg}$,

and the computation uses the weight of the interest word T_i in the interest model of user u;

33. The system according to claim 32, characterized in that the expansion word determination module specifically selects, from the interest models of other users that contain the interest word T_i, the other interest words that belong to the same interest category as T_i, and, according to the relevance between the selected interest words and T_i, selects the top P interest words by relevance as the expansion words of T_i, where P is a preset positive integer.

34. The system according to claim 24, characterized in that the system further comprises: an attribute model maintenance unit, configured to establish and update users' attribute models based on user attributes;

35. The system according to claim 24, characterized in that the system further comprises: a relational model maintenance unit, configured to establish and update the relational model based on relations between different users;

36. The system according to claim 35, characterized in that, if the relations between different users in the relational model include only question-and-answer relations or same-group relations, the relation weight RelaRank1(u1, u2) between user u1 and user u2 is: RelaRank1(u1, u2) = log(relaCnt + α2) / log β2, where α2 and β2 are preset smoothing coefficients and relaCnt is the number of times a question-and-answer relation or same-group relation occurs between user u1 and user u2 within a set time period;

where the relation weight for similar-interest relations is determined from the interest word vector of user u1 and the interest word vector of user u2;

37. The system according to claim 24, characterized in that the system further comprises: a behavior model maintenance unit, configured to compile statistics on user behavior and to establish and update users' behavior models;

38. The system according to claim 24, characterized in that the user ranking unit specifically comprises: a user matching subunit and a user sorting subunit;

39. The system according to claim 38, characterized in that, if the user matching subunit uses two or more user models, it is further configured to take the union of the users, determined using each user model, that match the extracted question features.

40. The system according to claim 38 or 39, characterized in that, if the user matching subunit uses the interest model, the extracted question features comprise the feature words of the question to be answered;

41. The system according to claim 40, characterized in that the user ranking unit further comprises: a user filtering subunit, configured to determine the interest category to which the question to be answered belongs, and to filter out, from the users determined by the user matching subunit, the users whose weight for that interest category is less than a preset interest filtering weight.

42. The system according to claim 38 or 39, characterized in that, if the user matching subunit uses the attribute model, the extracted question features comprise the feature words of the question to be answered;

43. The system according to claim 38 or 39, characterized in that, if the user matching subunit uses the relational model, the extracted question features comprise the asking-user information of the question to be answered;

44. The system according to claim 38, characterized in that the user sorting subunit determines the degree of match Rank(U) between user U and the question to be answered using the following formula: Rank(U) = Rank(interest)*W1 + Rank(Profile)*W2 + Rank(rela)*W3 + Rank(behavior)*W4;

where W1, W2, W3 and W4 are preset weight coefficients,

45. The system according to claim 24, characterized in that the question pushing unit specifically comprises: a status determination subunit and a first pushing subunit;

46. The system according to claim 37, characterized in that the question pushing unit specifically comprises: an activity determination subunit and a second pushing subunit;