CN106886561A

CN106886561A - Method for ranking the influence of online community posts based on time-related interactive fusion

Info

Publication number: CN106886561A
Application number: CN201611249593.0A
Authority: CN
Inventors: 胡卫明; 游强; 吴偶
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2016-12-29
Filing date: 2016-12-29
Publication date: 2017-06-23

Abstract

The present invention relates to a method for ranking the influence of online community posts based on time-related interactive fusion. The method includes: determining a time-related influence ranking; performing semantic modeling on the text content of posts according to a text semantic model, and building a semantic tree based on semantic context similarity; and, according to a unified interactive fusion algorithm, fusing the text semantics of the posts with the structural information of the online community to rank the influence of the posts. Incorporating time-related information into the ranking method yields a time-related influence ranking. On this basis, a unified fusion framework that combines semantics and community structure fuses the network structure with the text content. This addresses the problem of discovering potential influence sources in online communities and makes full use of the explicit temporal information and the semantic context information of community posts.

Description

Method for ranking the influence of online community posts based on time-related interactive fusion

Technical field

The present invention relates to the technical field of Web mining and social computing, and in particular to a method for ranking the influence of online community posts based on time-related interactive fusion.

Background art

With the rapid growth of the Internet, more and more users like to share interests and express opinions online. As a result, some highly influential posts emerge and attract large numbers of users to participate. Unlike the Internet as a whole, which indiscriminately contains vast and varied information, an online community often focuses on one or a few related fields, and such communities become ideal places for researchers in a particular field to obtain information quickly or to make decisions. Given the influence and activity of some online communities, and their role as stable content-publishing platforms for specific fields, mining online communities to find valuable or potentially influential posts has become a problem in urgent need of a solution.

Some online communities have a specific network structure, explicit or implicit. For example, the structure of a classic thread-organized discussion forum is shown in Fig. 1. The vertical axis shows the individual threads and the horizontal axis shows the progression of time; within each thread, the discussion posts advance through interactive replies. From a basic analysis of influence, potential influence is highly correlated with the quality of a post and is also closely tied to the activity of the replies to that post within the thread. The former concerns the content of the post itself, while the latter reflects the structural information contained in the whole thread. Besides content and structural information, posts in online interaction also carry rich metadata, such as posting timestamps and information about the poster. The intervals between posting timestamps also reflect the activity of posts in a thread, which has often been ignored in previous work.

In view of this, the present invention is proposed.

Summary of the invention

To solve the above problem in the prior art, namely the problem of discovering potential influence sources in online communities, a method for ranking the influence of online community posts based on time-related interactive fusion is provided.

To achieve this, the following technical solution is provided:

A method for ranking the influence of online community posts based on time-related interactive fusion, the method including:

determining a time-related influence ranking;

performing semantic modeling on the text content of posts according to a text semantic model, and building a semantic tree based on semantic context similarity;

fusing, according to a unified interactive fusion algorithm, the text semantics of the posts with the structural information of the online community, and ranking the influence of the posts.

Preferably, determining the time-related influence ranking specifically includes:

screening the posts of the online community threads and extracting the timestamp at which each post in a thread was published;

determining, from the extracted timestamps, the timeliness function and the time-related weights of the posts in the thread;

determining the time-related influence ranking iteratively according to a random walk algorithm.

Preferably, performing semantic modeling on the text content of posts according to a text semantic model and building a semantic tree based on semantic context similarity specifically includes:

extracting, according to the text semantic model, the text semantic features of the posts in each thread of the online community;

building the set of posts in a thread into a semantic tree, based on the text semantic features, with the thread as the organizational unit and according to the degree of semantic association between posts.

Preferably, building the set of posts in a thread into a semantic tree, based on the text semantic features, with the thread as the organizational unit and according to the degree of semantic association between posts, specifically includes:

computing, under the text semantic model, the semantic context similarity between the posts in the thread;

determining, according to the timestamps at which the posts were published, the candidate set of parent nodes for each post in the thread;

determining, based on the candidate set of parent nodes and according to the following formula, the parent node of each post in the thread, thereby building the semantic tree:

where L_* denotes the post selected as the parent node; L_i denotes the text semantic feature, under the text semantic model, of a post contained in the thread, with i, j = 1, ..., M; and M denotes the number of posts contained in the thread.

Preferably, computing, under the text semantic model, the semantic context similarity between the posts in the thread specifically includes:

computing the direction similarity according to the following formula:

where S_cos(L_i, L_j) denotes the direction similarity; L_j denotes the text semantic feature, under the text semantic model, of a post contained in the thread, with i < j and j = 1, ..., M;

computing the amplitude similarity according to the following formula:

where S_str(L_i, L_j) denotes the amplitude similarity;

computing the semantic context similarity between the posts in the thread under the text semantic model according to the following formula:

where S(L_i, L_j) denotes the semantic context similarity between posts in the thread, and λ denotes the weight that balances the direction similarity against the amplitude similarity.

Preferably, the text semantic model includes a bag-of-words model represented by term frequency-inverse document frequency (TF-IDF), a probabilistic latent semantic indexing (pLSI) model, and a latent Dirichlet allocation (LDA) model;

the method further includes:

fusing the similarities of the posts obtained respectively under the TF-IDF bag-of-words model, the pLSI model, and the LDA model, and taking the fusion result as the semantic context similarity between the posts in the thread.

Preferably, fusing, according to the unified interactive fusion algorithm, the text semantics of the posts with the structural information of the online community and ranking the influence of the posts specifically includes:

evaluating the time-related influence ranking scores on the semantic trees built under the text semantic model, setting aside part of the semantic trees as a training set, and obtaining the degree to which the text semantic model contributes to the ranking score;

fusing the time-related influence scores at the semantic level according to the quality evaluation of the text semantic model and the semantic trees built, and performing a weighted fusion of the semantic structure and the reply structure.

Preferably, evaluating the time-related influence ranking scores on the semantic trees built under the text semantic model, setting aside part of the semantic trees as a training set, and obtaining the degree to which the text semantic model contributes to the ranking score specifically includes:

fusing the text semantic features of the posts under the text semantic model, the timestamps, and the quality vector corresponding to the text semantic model, to obtain the time-related ranking scores of the posts under the text semantic model.

Preferably, fusing the time-related influence scores at the semantic level according to the quality evaluation of the text semantic model and the semantic trees built, and performing a weighted fusion of the semantic structure and the reply structure, specifically includes:

computing the quality evaluation vector corresponding to the text semantic model by the following formula:

where q_* denotes the quality evaluation vector; q denotes the quality vector; N_t denotes the number of threads; m_i denotes the number of posts in the i-th thread; y_j denotes the rank or score that other community users give a post according to how helpful it was to them; the formula also uses the predicted score of each post;

computing the total time-related influence score according to the following formula:

trr = α q_* trr_se^T + (1-α) trr_st

where trr denotes the total time-related influence score; trr_se denotes the vector of time-related ranking scores obtained from the text semantic model; trr_st denotes the vector of time-related ranking scores obtained from the reply structure; and α is the parameter that balances the proportions of trr_se and trr_st;

ranking the influence of the posts according to the total time-related influence score.
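The weighted fusion in the formula for trr can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `fuse_scores`, the layout of `trr_se` as one row of K per-model scores for each post, and all numeric values are hypothetical.

```python
def fuse_scores(q, trr_se, trr_st, alpha):
    """Compute trr = alpha * q * trr_se^T + (1 - alpha) * trr_st per post.

    q       -- K quality weights, one per text semantic model (assumed given)
    trr_se  -- one row per post, each row holding K semantic-model scores
               (this row layout is an assumption of the sketch)
    trr_st  -- one reply-structure score per post
    alpha   -- balance between the semantic and reply-structure scores
    """
    fused = []
    for row, structure_score in zip(trr_se, trr_st):
        # q * trr_se^T restricted to this post: weighted sum over the K models
        semantic_score = sum(qk * s for qk, s in zip(q, row))
        fused.append(alpha * semantic_score + (1 - alpha) * structure_score)
    return fused


# Hypothetical values for three posts and three semantic models.
q = [0.5, 0.3, 0.2]
trr_se = [[0.9, 0.8, 0.7], [0.2, 0.4, 0.1], [0.5, 0.5, 0.5]]
trr_st = [0.6, 0.9, 0.1]

trr = fuse_scores(q, trr_se, trr_st, alpha=0.5)
# Rank posts by descending total score.
ranking = sorted(range(len(trr)), key=lambda i: trr[i], reverse=True)
```

With α = 0.5 the semantic and reply-structure scores contribute equally; moving α toward 1 lets the text semantic models dominate the final ranking.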

The present invention provides a method for ranking the influence of online community posts based on time-related interactive fusion. The method includes: determining a time-related influence ranking; performing semantic modeling on the text content of posts according to a text semantic model, and building a semantic tree based on semantic context similarity; and, according to a unified interactive fusion algorithm, fusing the text semantics of the posts with the structural information of the online community to rank the influence of the posts. Incorporating time-related information into the ranking method yields a time-related influence ranking. On this basis, a unified fusion framework that combines semantics and community structure fuses the network structure with the text content. This addresses the problem of discovering potential influence sources in online communities, makes full use of the explicit temporal information and the semantic context information of community posts, and achieves better results than traditional ranking methods that rely only on community structure or methods that use only text mining.

Brief description of the drawings

Fig. 1 is a schematic diagram of an online community organized as threaded discussions;

Fig. 2 is a schematic flowchart of the method for ranking the influence of online community posts based on time-related interactive fusion according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the distribution of timestamp intervals extracted from a data set according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the influence flow that flows out of a post with a flow rate equal to its time-related influence score, according to an embodiment of the present invention;

Fig. 5a is a schematic structural diagram of the bag-of-words model represented by term frequency-inverse document frequency (TF-IDF) according to an embodiment of the present invention;

Fig. 5b is a schematic structural diagram of the probabilistic latent semantic indexing (pLSI) model according to an embodiment of the present invention;

Fig. 5c is a schematic structural diagram of the latent Dirichlet allocation (LDA) model according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of the unified fusion framework based on post quality evaluation according to an embodiment of the present invention.

Detailed description of the embodiments

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art will understand that these embodiments serve only to explain the technical principles of the invention and are not intended to limit its scope of protection.

The method provided by the embodiments of the present invention is not restricted to particular hardware or a particular programming language. It can be implemented in any language, such as C, VC++, Java, or Python, so other implementations are not described further. For example, the method can be applied to a system that includes a processor, ROM (read-only memory), RAM (random access memory), and interfaces connected by a system bus. The ROM stores an operating system and a database. The processor provides computing and control capabilities and can execute the method provided by the embodiments. As one preferred example only, a computer with a 2.67 GHz quad-core central processing unit and 4 GB of memory is used, and the method is implemented in Python.

Some online communities have a specific network structure, explicit or implicit. For example, the structure of a classic thread-organized discussion forum is shown in Fig. 1. The vertical axis shows the individual threads and the horizontal axis shows the progression of time; within each thread, the discussion posts advance through interactive replies. From a basic analysis of influence, potential influence is highly correlated with the quality of a post and is also closely tied to the activity of the replies to that post within the thread. The former concerns the content of the post itself, while the latter reflects the structural information contained in the whole thread. Besides content and structural information, posts in online interaction also carry rich metadata, such as posting timestamps, information about the poster, and the intervals between posting timestamps, which have often been ignored in previous work.

Therefore, an embodiment of the present invention proposes a method for ranking the influence of online community posts based on time-related interactive fusion. As shown in Fig. 2, the method can be implemented through the following steps:

S200: Determine the time-related influence ranking.

Specifically, this step can include:

S201: Screen the posts of the online community threads and extract the timestamp at which each post in a thread was published.

This step screens the posts of the online community threads (preferably discussion threads), removes threads that contain too few posts, and extracts the timestamp at which each post in a thread was published. The extracted timestamps show the position of each post on the thread's timeline; the number of posts in a thread and the lengths of the intervals between posts reflect how active the posts in the thread are. This plays an important role when the time-related ranking score algorithm is modeled later.

Extraction of the publication timestamps of the posts in a thread is described below with reference to Fig. 3.

Suppose the given set of timestamped posts forms thread D. The posts of the online community threads are screened and threads containing too few posts (for example, fewer than 3 posts) are removed; thread D can then be represented as a directed graph G(V, E, ts), where V is the node set formed by all posts, E is the edge set formed by the reply relations between posts, and ts is the set of timestamps of the posts in the thread. Each post is a node in the graph. For example, if post v replies to post u, an edge (u → v) ∈ E is formed between the two posts. Fig. 3 shows, by way of example, the distribution of timestamp intervals extracted from one data set.
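The graph construction described above can be sketched as follows. This is a minimal illustration only: the `ThreadGraph` class, the `build_thread_graph` function, and the dictionary keys `id`, `ts`, and `reply_to` are hypothetical names chosen for the sketch, and timestamps are assumed to be numbers of seconds.

```python
from dataclasses import dataclass, field

MIN_POSTS = 3  # threads with fewer posts are dropped (the example threshold)


@dataclass
class ThreadGraph:
    """Directed graph G(V, E, ts) for one thread."""
    nodes: set = field(default_factory=set)   # V: post ids
    edges: set = field(default_factory=set)   # E: (u, v) pairs, v replies to u
    ts: dict = field(default_factory=dict)    # ts: post id -> timestamp


def build_thread_graph(posts):
    """Build G(V, E, ts) from a list of post records.

    Each record is a dict with the hypothetical keys 'id', 'ts', 'reply_to'.
    Returns None when the thread has too few posts to keep.
    """
    if len(posts) < MIN_POSTS:
        return None
    graph = ThreadGraph()
    for post in posts:
        graph.nodes.add(post["id"])
        graph.ts[post["id"]] = post["ts"]
    for post in posts:
        parent = post.get("reply_to")
        if parent is not None and parent in graph.nodes:
            graph.edges.add((parent, post["id"]))  # edge (u -> v)
    return graph


thread = [
    {"id": "a", "ts": 0, "reply_to": None},
    {"id": "b", "ts": 60, "reply_to": "a"},
    {"id": "c", "ts": 300, "reply_to": "a"},
]
graph = build_thread_graph(thread)
```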

Note that each post is a node in the graph so constructed; in the description below, the terms post and node are used interchangeably.

S202: From the extracted timestamps, determine the timeliness function and the time-related weights of the posts in the thread.

When determining the time-related influence ranking, the embodiment considers the following: (1) a post in a thread should be published promptly; (2) once published, a post should attract a large number of replies; (3) replies should answer the original post as promptly as possible. Posts that satisfy these conditions tend to rank higher in potential influence.

Definition of the time-related weight and the timeliness function in a thread is described below with reference to Fig. 3.

In the directed graph that models the posts of a thread, the time-related weight of edge (u → v) ∈ E is defined as w(u, v), and the timeliness function of post v is defined as h(v). The time-related weight indicates how frequently a post interacts with other posts. The timeliness function indicates the relative position of post v on the thread's timeline.

Based on the distribution of time intervals shown in Fig. 3, the time-related weight w(u, v) is defined as the time-interval function K(ts_u, ts_v) = log(ts_u − ts_v), where ts_v is the timestamp of the post v under consideration and ts_u is the timestamp of u, the reply to post v.
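As a small illustration of the time-interval function K, the sketch below computes log(ts_u − ts_v) for a reply and the post it answers. The function name `interval_weight`, the use of seconds as the unit, and the choice of the natural logarithm are assumptions; the text writes log without stating a base.

```python
import math


def interval_weight(ts_reply, ts_post):
    """K(ts_u, ts_v) = log(ts_u - ts_v).

    ts_reply -- timestamp of the reply u (seconds, an assumed unit)
    ts_post  -- timestamp of the post v being replied to
    The natural logarithm is an assumption of this sketch.
    """
    gap = ts_reply - ts_post
    if gap <= 0:
        raise ValueError("the reply must be posted after the post it answers")
    return math.log(gap)


# A reply posted one minute later and one posted a day later.
quick = interval_weight(60, 0)
slow = interval_weight(86400, 0)
```

The logarithm compresses long gaps, so a reply that arrives a day later does not outweigh a one-minute reply by a factor of 1440.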

The timeliness function h(v) of post v itself is defined as a function H(ts_v) of its timestamp ts_v:

where ts_min is the timestamp of the first post in the thread and ts_max is the timestamp of the last post on the thread's timeline.

S203: Determine the time-related influence ranking iteratively according to a random walk algorithm.

Fig. 4 schematically shows the influence flow that flows out of a post with a flow rate equal to its time-related influence score trr. This flow is carried to the post under examination through a conduit of length w and cross-sectional area h. Suppose the post under examination is v and the posts semantically associated with it are x, y, ..., u, and suppose that at time t-1 the influence flows of posts x, y, ..., u are trr^(t-1)(x), trr^(t-1)(y), ..., trr^(t-1)(u). Then the influence flow trr^(t)(v) of the post under examination at time t can be solved with the following iterative formula:

Computation of the time-related influence ranking is described below by way of a preferred embodiment.

The influence is carried to the post under examination through a conduit of length w and cross-sectional area h.

First, the trr value of each node is initialized at random, where trr denotes the time-related influence ranking score of a node. Preferably, the value of the timeliness function is used as the initial value, that is, trr^(1)(v) = h(v) = H(ts_v). Suppose that at step t-1 the score of a node u that carries influence to the node v under examination is trr^(t-1)(u); then at step t the influence score of node v can be solved with the following equation:

Iterating in this way until convergence yields the time-related influence score of each post in the thread, and the posts are finally ranked by influence according to the corresponding scores.
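The iterate-until-convergence procedure can be sketched generically in Python. The iterative formula itself does not appear in the text above, so the update rule `damped_update` below is a hypothetical stand-in (a damped weighted sum, in the spirit of random-walk ranking) and should not be read as the patent's formula. The direction in which influence flows, the weights, and the h values are likewise assumptions for illustration.

```python
def iterate_scores(nodes, incoming, h, update, tol=1e-9, max_iter=1000):
    """Iterate trr until the largest per-node change falls below tol.

    incoming -- maps a node v to a list of (u, weight) pairs that carry
                influence into v
    h        -- timeliness value per node, used as the initial score
    update   -- callable (v, trr, sources) -> new score of v
    """
    trr = {v: h[v] for v in nodes}  # trr^(1)(v) = h(v)
    for _ in range(max_iter):
        new = {v: update(v, trr, incoming.get(v, [])) for v in nodes}
        delta = max(abs(new[v] - trr[v]) for v in nodes)
        trr = new
        if delta < tol:
            break
    return trr


def damped_update(h, damping=0.5):
    """Hypothetical update rule, NOT the patent's formula."""
    def update(v, trr, sources):
        flow = sum(weight * trr[u] for u, weight in sources)
        return (1 - damping) * h[v] + damping * flow
    return update


# Toy thread: replies b and c both carry influence to post a (assumed direction).
nodes = ["a", "b", "c"]
h = {"a": 1.0, "b": 0.6, "c": 0.2}
incoming = {"a": [("b", 1.0), ("c", 1.0)]}

scores = iterate_scores(nodes, incoming, h, damped_update(h))
ranking = sorted(nodes, key=scores.get, reverse=True)
```

Passing the update rule in as a parameter keeps the convergence loop separate from the particular formula, so the stand-in can be replaced without touching the iteration.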

S210: Perform semantic modeling on the text content of posts according to a text semantic model, and build a semantic tree based on semantic context similarity.

Specifically, this step can include:

S211: Extract, according to the text semantic model, the text semantic features of the posts in each thread of the online community.

The text semantic model includes, but is not limited to, a bag-of-words model represented by term frequency-inverse document frequency (TF-IDF), a probabilistic latent semantic indexing (pLSI) model, and a latent Dirichlet allocation (LDA) model. Figs. 5a-5c schematically show the structures of the TF-IDF bag-of-words model, the pLSI model, and the LDA model, respectively. TF-IDF combines two kinds of information, term frequency (TF) and inverse document frequency (IDF): the former reflects the number of times a word occurs in a document, and the latter reflects the inverse of the number of documents that contain the word.

Taking TF-IDF as an example, for the M posts in thread D, if word w_i occurs in post d_j with term frequency tf(w_i, d_j) and the number of posts in which it occurs is df(w_i), the inverse document frequency can be computed by the following formula:

where idf(w_i) denotes the inverse document frequency.

Combining the term frequency TF and the inverse document frequency IDF, the TF-IDF value is computed according to the following formula:

where tfidf(w_i, d_j) denotes the TF-IDF value, tf(w_i, d_j) denotes the term frequency of the word in the post, and idf(w_i) denotes the inverse document frequency.
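A TF-IDF computation over the posts of one thread might look like the sketch below. The two formulas are not reproduced in the text above, so the sketch assumes the common textbook variant idf(w_i) = log(M / df(w_i)) and tfidf(w_i, d_j) = tf(w_i, d_j) · idf(w_i); the patent's exact form may differ. The function name `tfidf` and the toy posts are hypothetical.

```python
import math
from collections import Counter


def tfidf(posts):
    """Return one {word: weight} dict per post.

    posts -- list of token lists, one per post in the thread
    Assumes idf = log(M / df) and tfidf = tf * idf (textbook variant).
    """
    m = len(posts)
    # df(w): number of posts in which word w occurs at least once
    df = Counter(word for tokens in posts for word in set(tokens))
    weights = []
    for tokens in posts:
        tf = Counter(tokens)  # tf(w, d): raw count of w in this post
        weights.append({w: tf[w] * math.log(m / df[w]) for w in tf})
    return weights


posts = [
    ["gpu", "driver", "crash"],
    ["gpu", "driver", "update"],
    ["gpu", "fan", "noise", "noise"],
]
weights = tfidf(posts)
```

Under this variant a word that occurs in every post, such as "gpu" here, receives weight zero, while a word repeated within a single post is weighted up.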

Because the pLSI model considers the relations among documents, hidden categories (also called "concepts"), and words, singular value decomposition is applied to the term-document frequency matrix according to the following formula:

W = U Σ V^T;

where W denotes the term-document frequency matrix, whose element in row i and column j is the term frequency tf(w_i, d_j).

In the formula above, U and V are orthogonal matrices with U^T U = V^T V = I, where U^T and V^T are the transposes of U and V respectively.

In practice, the singular values on the main diagonal of the diagonal matrix Σ can be arranged from top to bottom in descending order. The number of singular values represents the number of latent semantic concepts, and the magnitude of each singular value represents the strength of the corresponding latent semantic concept. Preferably, to suppress noise that deviates from the semantic center of the semantic space, an approximation is usually made: only the several largest singular values are kept and the smaller ones are discarded. For example, some posts contain rare misspellings. Because their entries in the term-document frequency matrix W are small and extremely sparse, these words produce very small singular values after decomposition, and discarding those small singular values removes their effect on the core latent concepts.

The concepts produced in the pLSI model come from the documents themselves; through singular value decomposition, words are in effect clustered by meaning. The more similar two words are in the semantic space, the smaller the distance between their word vectors in the orthogonal matrix U obtained from the decomposition. The original term-document frequency matrix W represents the relation between N words and M posts. After singular value decomposition and selection of the Z largest singular values, Z latent semantic concepts are generated. Z is generally far smaller than N, so the N different words are assigned to Z concepts, which achieves the effect of clustering similar words.
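The decomposition W = U Σ V^T and the truncation to the Z largest singular values can be sketched with NumPy. This is an illustration only: the helper name `truncated_svd` and the toy 5-word by 4-post matrix are hypothetical, and the last row imitates a rare misspelling with a single small entry.

```python
import numpy as np


def truncated_svd(W, z):
    """Decompose W = U S V^T and keep the z largest singular values."""
    # numpy returns the singular values already sorted in descending order
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :z], s[:z], Vt[:z, :]


# Rows are words (N = 5), columns are posts (M = 4).
W = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 0.0, 0.1],   # rare word: small, sparse entry
])

U_z, s_z, Vt_z = truncated_svd(W, z=2)
W_approx = U_z @ np.diag(s_z) @ Vt_z   # rank-2 reconstruction of W
```

Each row of `U_z` is a Z-dimensional vector for one word, so words that share a latent concept end up close together, which is the clustering effect the paragraph above describes.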

The idea behind the LDA model is similar to that of the pLSI model. The difference is that the concepts generated in the pLSI model come from the documents themselves, whereas the concepts in the LDA model come from a topic distribution, which is independent of the size of the document collection. Specifically, the LDA model describes a generative process for a post. It assumes that each post has an implicit "latent topic" layer that is a mixture of several latent topics, where a latent topic (topic for short) represents the latent semantics in the post. The process by which the LDA model generates a post can be as follows: a relatively broad topic library is set up, and the concepts of the post are drawn from it; for example, some concepts come from a "military" topic and some from a "humanities" topic. The different topics and their proportions form the topic distribution. The concepts have their own dictionaries; words are repeatedly selected from the dictionaries of the corresponding concepts to form sentences and thus the whole text.
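The generative story above can be imitated with a toy sampler. This is a deliberately simplified sketch: the topic dictionaries in `TOPIC_WORDS`, the fixed topic mixture, and the function name `generate_post` are hypothetical, and full LDA additionally draws the topic mixture and the per-topic word distributions from Dirichlet priors, which is omitted here.

```python
import random

# Hypothetical toy dictionaries, one per topic.
TOPIC_WORDS = {
    "military": ["tank", "fleet", "radar", "drill"],
    "humanities": ["poem", "history", "museum", "essay"],
}


def generate_post(topic_mix, length, rng):
    """Sample a post word by word.

    For each word: pick a topic according to topic_mix, then pick a word
    uniformly from that topic's dictionary (uniform choice is an assumption).
    """
    topics = list(topic_mix)
    weights = [topic_mix[t] for t in topics]
    words = []
    for _ in range(length):
        topic = rng.choices(topics, weights=weights, k=1)[0]
        words.append(rng.choice(TOPIC_WORDS[topic]))
    return words


rng = random.Random(0)  # seeded for repeatability
post = generate_post({"military": 0.7, "humanities": 0.3}, length=8, rng=rng)
```

Fitting an LDA model runs this story in reverse: given only the observed words, it infers the topic mixture of each post and the dictionary of each topic.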

The three models above characterize text semantics at scales from low level to high level, moving gradually from the words themselves to abstract topics. In subsequent computation on the posts, the text feature representations (that is, the text semantic features) of the posts in a thread under all three semantic models can be extracted at the same time.

Obtaining the semantics of a single post in an online community in isolation is extremely difficult. Such posts are often just a few isolated words or sentences and are extremely sparse in the high-dimensional semantic space; the semantics of a post on its own is incomplete and lacks a great deal of background knowledge. Knowledge-engineering approaches often expend a great deal of manpower building contextual semantic information for individual posts, and the result may not be worth the cost. However, the posts in an online community are not independent of one another; the posts of a thread in particular are strongly associated semantically. The embodiment therefore builds a semantic tree in step S212.

S212: Based on the text semantic features, with the thread as the organizational unit and according to the degree of semantic association between posts, build the set of posts in the thread into a semantic tree.

Through this step, adjacent nodes on the semantic tree have similar semantics and can therefore provide a certain amount of contextual information for each other.

Specifically, this step can include:

S2121: Under the text semantic model, compute the semantic context similarity between the posts in the thread.

For similarity, the embodiment considers both the direction information and the amplitude information of the semantics. The semantic context similarity (or similarity measure) between posts in a thread is thus the weighted sum of the direction similarity and the amplitude similarity.

Further, this step can include:

Step 1: Compute the direction similarity according to the following formula:

S_cos(L_i, L_j) = (L_i · L_j) / (‖L_i‖ ‖L_j‖)

where S_cos(L_i, L_j) denotes the direction similarity; L_i denotes the text semantic feature, under the text semantic model, of a post contained in the thread, with i = 1, ..., M; M denotes the number of posts contained in the thread; and L_j denotes the text semantic feature, under the text semantic model, of a post contained in the thread, with i < j and j = 1, ..., M.

In the formula above, i < j implies ts_i < ts_j, where ts_i and ts_j are the timestamps extracted from the posts.

The direction similarity S_cos(L_i, L_j) is the cosine similarity. It reflects how similar the feature directions of two posts are and captures the semantic consistency between the two posts.

Step 2: Compute the amplitude similarity according to the following formula:

where S_str(L_i, L_j) denotes the amplitude similarity.

The amplitude similarity expresses how similar two posts are in semantic strength.

Step 3: Compute the semantic context similarity between the posts in the thread under the text semantic model according to the following formula:

where S(L_i, L_j) denotes the semantic context similarity between posts in the thread, and λ denotes the weight that balances the direction similarity against the amplitude similarity.
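Steps 1 to 3 can be sketched as follows. Only the cosine similarity is fixed by the description; the amplitude formula and the exact weighted form are not reproduced in the text, so `amplitude_similarity` (the ratio of the smaller vector norm to the larger) and the combination λ · S_cos + (1 − λ) · S_str are hypothetical stand-ins chosen for illustration.

```python
import math


def _norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def cosine_similarity(a, b):
    """Direction similarity S_cos: cosine of the angle between a and b."""
    na, nb = _norm(a), _norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def amplitude_similarity(a, b):
    """Hypothetical stand-in for S_str: smaller norm divided by larger norm."""
    na, nb = _norm(a), _norm(b)
    if max(na, nb) == 0:
        return 1.0
    return min(na, nb) / max(na, nb)


def context_similarity(a, b, lam=0.5):
    """Assumed weighted sum: lam * S_cos + (1 - lam) * S_str."""
    return lam * cosine_similarity(a, b) + (1 - lam) * amplitude_similarity(a, b)


# Two feature vectors pointing the same way but with different strength.
s = context_similarity([1.0, 0.0], [2.0, 0.0], lam=0.5)
```

In this example the two vectors agree perfectly in direction but differ in magnitude, so the amplitude term pulls the combined similarity below 1.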

If several text semantic models are used in an implementation, step 4 can also be performed. For example, the text semantic models can be a bag-of-words model represented by term frequency-inverse document frequency (TF-IDF), a probabilistic latent semantic indexing (pLSI) model, and a latent Dirichlet allocation (LDA) model.

Step 4: Fuse the similarities of the posts obtained respectively under the TF-IDF bag-of-words model, the pLSI model, and the LDA model, and take the fusion result as the semantic context similarity between the posts in the thread.

This step uses the unified fusion framework to fuse the similarities of the posts under the different semantic models.

S2122: Determine, according to the timestamps at which the posts were published, the candidate set of parent nodes for each post in the thread.

For example, in a discussion thread, for a particular post with feature L_j, one post can be selected as its parent node from the feature set of posts with earlier timestamps, {L_i | ts_i < ts_j}.

The way the candidate set of parent nodes is determined is described in detail below by a preferred embodiment: suppose a post was created at 14:58 on November 10, 2015; the set of posts with earlier timestamps is then the subset of posts in the thread created before 14:58 on November 10, 2015, and this set serves as the candidate set for selecting the parent node.
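The candidate-set rule from the example above can be sketched directly with the standard `datetime` module. The function name `parent_candidates`, the post identifiers, and the other timestamps are hypothetical; only the 14:58 on November 10, 2015 cutoff comes from the example.

```python
from datetime import datetime


def parent_candidates(posts, target_id):
    """Return ids of posts created strictly before the target post.

    posts -- dict mapping post id to creation time
    The result is ordered from earliest to latest.
    """
    cutoff = posts[target_id]
    earlier = [pid for pid, created in posts.items() if created < cutoff]
    return sorted(earlier, key=posts.get)


posts = {
    "p1": datetime(2015, 11, 10, 14, 30),
    "p2": datetime(2015, 11, 10, 14, 45),
    "p3": datetime(2015, 11, 10, 14, 58),   # the post from the example
    "p4": datetime(2015, 11, 10, 15, 5),
}
candidates = parent_candidates(posts, "p3")
```

The first post of a thread has an empty candidate set, which is why it becomes the root of the semantic tree.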

Note that the above assumption is only an example and should not be construed as unduly limiting the scope of the present invention.

S2123: Based on the candidate set of parent nodes, determine the parent node of each post in the thread according to the following formula, thereby building the semantic tree:

where L_* denotes the post selected as the parent node; L_i denotes the text semantic feature, under the text semantic model, of a post contained in the thread, with i, j = 1, ..., M.

This step can be regarded as the following optimization problem:

L_* = argmax over {L_i | ts_i < ts_j} of S(L_i, L_j)

That is, the post L_* is selected so that its similarity to L_j under the defined similarity measure is maximal.

As the index j is decremented from M to 2, a semantic tree based on the semantic similarity measure (that is, a reply structure based on semantic context) is reconstructed.
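The tree construction can be sketched as a loop that, for each post from the last down to the second, picks the most similar earlier post as its parent. This is a minimal illustration: the function names are hypothetical, the features are assumed to be sorted by timestamp already, and plain cosine similarity stands in for the full semantic context similarity S.

```python
import math


def cosine(a, b):
    """Stand-in similarity for the sketch (S is assumed, not reproduced)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def build_semantic_tree(features, similarity):
    """Return parent[j] for each post; the first post is the root (None).

    features -- feature vectors ordered by timestamp (index 0 is earliest),
                so every index i < j is a valid parent candidate for j
    """
    parent = [None] * len(features)
    # j runs from M down to 2 in the 1-based numbering of the description
    for j in range(len(features) - 1, 0, -1):
        parent[j] = max(range(j), key=lambda i: similarity(features[i], features[j]))
    return parent


features = [
    [1.0, 0.0],   # post 1: about topic A
    [0.0, 1.0],   # post 2: about topic B
    [0.9, 0.1],   # post 3: mostly topic A
    [0.1, 0.9],   # post 4: mostly topic B
]
parent = build_semantic_tree(features, cosine)
```

Here the third post attaches to the first and the fourth to the second, even if the forum's own reply links say otherwise, which is the sense in which the semantic tree is a reply structure rebuilt from content.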

S220：According to unified interactive blending algorithm, the structural information of the text semantic of model and Web Community is merged Together, the influence to model is ranked up.

For the different text semantic models in vector space, the embodiment of the present invention introduces a quality factor that describes how much each of these semantic models contributes to the final ranking score.

Specifically, this step can be implemented through steps S221 and S222, where:

S221: Evaluate the semantic trees built under the text semantic model with respect to the time-association influence ranking score, set aside part of the semantic trees as a training set, and obtain the degree to which the text semantic model contributes to the ranking score.

Specifically, this step may include: fusing the text semantic feature of the post under the text semantic model, its timestamp, and the quality vector corresponding to the text semantic model, to obtain the time-association ranking score of the post under that text semantic model.

The embodiment of the present invention considers that text semantic models characterize the semantic space of a thread from different aspects or at different semantic levels, so the quality with which a text semantic model characterizes that space can be measured by a single value, which forms one component of the quality vector. For K text semantic models, the corresponding quality vector is q = (q₁, ..., q_k, ..., q_K).

In practice, N_t threads can be selected at random from all threads for training. The i-th thread D_i contains m_i posts {(L_j, ts_j, y_j)}, j = 1, ..., m_i, where L_j denotes the feature vector of the content of the j-th post in thread D_i under a given text semantic model, ts_j denotes the timestamp of the post, and y_j denotes the ranking or score that other network users gave the post according to how helpful it was to them, which can serve as the label value during training.

For example, for the j-th post, its text semantic feature L_j, its timestamp ts_j, and the quality vector q corresponding to the text semantic models are combined through quality fusion to obtain a scoring function f(·), i.e., ŷ_j = f(L_j, ts_j, q), which denotes the predicted score.

The value y_j can also serve as the label value during training.

For multiple text semantic models, the quality evaluation vectors corresponding to the individual text semantic models are merged. Fusion is then performed on the basis of the constructed semantic trees based on semantic context similarity, yielding a score for each post.

S222: According to the quality evaluation of the text semantic models and the constructed semantic trees, perform semantic-level fusion of the time-association influence scores, and perform weighted fusion of the semantic structure and the reply structure.

Here, the reply structure is the thread structure based on replies.

S2221: Calculate the quality evaluation vector corresponding to the text semantic models by the following formula:

q_{*} = \underset{q}{\arg\min} \sum_{i = 1}^{N_{t}} \sum_{j = 1}^{m_{i}} (\hat{y}_{j} - y_{j})^{2}

In the formula, q_* denotes the quality evaluation vector; q denotes the quality vector; N_t denotes the number of threads; m_i denotes the number of posts; y_j denotes the ranking or score that other network users gave the post according to how helpful it was to them; ŷ_j denotes the predicted score.
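The text does not specify the form of the scoring function f. Purely as an illustration, the sketch below assumes that the predicted score is linear in q, i.e., ŷ_j = Σ_k q_k s_{j,k}, where s_{j,k} is a hypothetical per-model score of post j under semantic model k. Under that assumption, minimizing the squared error over all training posts (with the double sum over threads and posts flattened into one list) is an ordinary least-squares problem, solved here through the normal equations. All names are hypothetical.

```python
def fit_quality_vector(scores, labels):
    """Least-squares q minimizing sum_j (q . scores[j] - labels[j])**2.

    scores[j][k]: assumed score of training post j under semantic model k.
    labels[j]:    user-given label y_j of training post j.
    """
    k = len(scores[0])
    # Normal equations: (S^T S) q = S^T y
    a = [[sum(s[r] * s[c] for s in scores) for c in range(k)] for r in range(k)]
    b = [sum(s[r] * y for s, y in zip(scores, labels)) for r in range(k)]

    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("training scores do not determine q uniquely")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    q = [0.0] * k
    for r in range(k - 1, -1, -1):
        tail = sum(a[r][c] * q[c] for c in range(r + 1, k))
        q[r] = (b[r] - tail) / a[r][r]
    return q


# Hypothetical training data generated from q = (0.5, 0.3, 0.2).
scores = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
labels = [0.5, 0.3, 0.2, 1.0]
q_star = fit_quality_vector(scores, labels)
```

With a different choice of f, the same objective would generally call for an iterative optimizer rather than a closed-form solve.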

S2222: Calculate the total time-association influence score according to the following formula:

trr = α q_{*} trr_{se}^{T} + (1 - α) trr_{st}

In the formula, trr denotes the total time-association influence score; trr_se denotes the vector of time-association ranking scores obtained from the text semantic models; trr_st denotes the vector of time-association ranking scores obtained from the reply structure; α denotes the parameter that balances the weight between trr_se and trr_st.

S2223: Rank the influence of the posts according to the total time-association influence score.

For example, for the bag-of-words model represented by term frequency-inverse document frequency (TFIDF), the probabilistic latent semantic indexing (pLSI) model, and the latent Dirichlet allocation (LDA) model, one obtains q_* = (q_TFIDF, q_LSI, q_LDA) and trr_se = (trr_TFIDF, trr_LSI, trr_LDA). Here, q_TFIDF denotes the quality evaluation vector corresponding to the bag-of-words model represented by TFIDF; q_LSI denotes the quality evaluation vector corresponding to the pLSI model; q_LDA denotes the quality evaluation vector corresponding to the LDA model; trr_TFIDF denotes the total time-association influence score corresponding to the bag-of-words model represented by TFIDF; trr_LSI denotes the total time-association influence score corresponding to the pLSI model; trr_LDA denotes the total time-association influence score corresponding to the LDA model. Fusing these yields the total time-association ranking score, and the influence of the posts is then ranked according to that score.
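The weighted fusion and final ranking can be sketched in Python as follows. The sketch assumes that each component of q_* is a scalar weight, that `trr_se[k][p]` holds the score of post p under semantic model k, and that `trr_st[p]` holds the reply-structure score of post p; the numeric values, the choice α = 0.5, and all names are hypothetical.

```python
def fuse_scores(q_star, trr_se, trr_st, alpha=0.5):
    """Per post: trr = alpha * (q_star . trr_se) + (1 - alpha) * trr_st."""
    fused = []
    for p in range(len(trr_st)):
        semantic = sum(q * model_scores[p] for q, model_scores in zip(q_star, trr_se))
        fused.append(alpha * semantic + (1.0 - alpha) * trr_st[p])
    return fused


def rank_posts(fused):
    """Return post indices ordered from highest to lowest fused score."""
    return sorted(range(len(fused)), key=lambda p: fused[p], reverse=True)


# Hypothetical values for three semantic models (TFIDF, pLSI, LDA) and three posts.
q_star = [0.5, 0.3, 0.2]
trr_se = [
    [0.2, 0.6, 0.1],  # TFIDF scores of posts 0..2
    [0.3, 0.5, 0.2],  # pLSI scores
    [0.1, 0.7, 0.4],  # LDA scores
]
trr_st = [0.4, 0.3, 0.9]  # reply-structure scores

fused = fuse_scores(q_star, trr_se, trr_st, alpha=0.5)
print(rank_posts(fused))  # [2, 1, 0]
```

In this example, post 2 ranks first because its strong reply-structure score outweighs its weaker semantic scores at α = 0.5; a larger α would shift the ranking toward the semantic scores.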

The present invention is described in detail below with reference to Fig. 6, taking as the text semantic models the bag-of-words model represented by TFIDF, the pLSI model, and the LDA model. For the training threads, under each of these three models, the semantic context similarity between the posts in a thread can be calculated according to step S1121, and the semantic trees can then be reconstructed according to step S1123. Next, a set of semantic trees for training and a set of semantic trees for testing are chosen, post quality evaluation is carried out on the training set, and fusion at the semantic level is then performed. For testing, structure fusion is performed according to the reply structure. Finally, the fusion results at the semantic level and at the structural level are fused in a unified manner, thereby ranking the potential influence of the online community posts.

The method for ranking the influence of online community posts based on time-association interactive fusion provided in the embodiment of the present invention organically combines the content information of a post itself with the structural information contained in the whole thread, and also takes metadata into account, such as posting timestamps, information about the poster, and the intervals between posting timestamps (which reflect how active the posts in a thread are). By incorporating time-association information into the ranking method, a time-association influence ranking algorithm is proposed, and on the basis of this algorithm a unified fusion framework combining semantics and community structure is proposed, which fuses the network structure with the content. The method can make full use of the explicit temporal information and the semantic context information of online community posts, and achieves better results than traditional ranking methods that rely only on community structure or methods that use only text mining.

Although the steps in the above embodiment are described in the order given, those skilled in the art will appreciate that, to achieve the effect of this embodiment, the different steps need not be executed in that order. They may be executed simultaneously (in parallel) or in reverse order, and such simple variations all fall within the protection scope of the present invention.

The method provided in the embodiment of the present invention may also be implemented using a programmable logic device, or as computer program software or program modules (including routines, programs, objects, components, or data structures that perform particular tasks or implement particular abstract data types). For example, an embodiment according to the present invention may be a computer program product, and running the computer program product causes a computer to perform the demonstrated method. The computer program product includes a computer-readable storage medium containing computer program logic or code portions for implementing the method. The computer-readable storage medium may be a built-in medium installed in a computer or a removable medium that can be detached from the computer body (for example, a storage device using hot-plug technology). The built-in medium includes, but is not limited to, rewritable non-volatile memory, for example RAM, ROM, flash memory, and hard disks. The removable medium includes, but is not limited to: optical storage media (for example, CD-ROM and DVD), magneto-optical storage media (for example, MO), magnetic storage media (for example, tape or removable hard disks), media with built-in rewritable non-volatile memory (for example, memory cards), and media with built-in ROM (for example, ROM cartridges).

The foregoing detailed description of example embodiments of the present invention has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms described. Obviously, many modifications and variations will be apparent to those skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, so that others skilled in the art can understand the various embodiments of the invention and the various modifications suited to the particular use contemplated. Embodiments of the present invention may omit some of the technical features described above and solve only part of the technical problems present in the prior art. Moreover, the described technical features may be combined. The protection scope of the present invention is defined by the appended claims and their equivalents; those skilled in the art may make various modifications, substitutions, and combinations to the technical solutions described in the appended claims, and the technical solutions so modified or substituted also fall within the protection scope of the present invention.

Claims

1. A method for ranking the influence of online community posts based on time-association interactive fusion, characterized in that the method comprises:

determining an influence ranking based on time association;

according to a text semantic model, performing semantic modeling on the text content of the posts, and building a semantic tree based on semantic context similarity;

according to a unified interactive fusion algorithm, fusing the text semantics of the posts with the structural information of the online community, and ranking the influence of the posts.

2. The method according to claim 1, characterized in that determining the influence ranking based on time association specifically comprises:

screening the posts of a thread of the online community, and extracting the timestamp at which each post in the thread was published;

for the extracted timestamps, determining the timeliness function and the time-association weight of the posts in the thread;

determining the time-association influence ranking by iteration according to a random walk algorithm.

3. The method according to claim 1, characterized in that performing semantic modeling on the text content of the posts according to the text semantic model and building the semantic tree based on semantic context similarity specifically comprises:

extracting, according to the text semantic model, the text semantic features of the posts in each thread of the online community;

based on the text semantic features, with the thread as the organizational form, constructing the set of posts in the thread into a semantic tree according to the degree of semantic association between the posts.

4. The method according to claim 3, characterized in that constructing the set of posts in the thread into a semantic tree, based on the text semantic features, with the thread as the organizational form, according to the degree of semantic association between the posts, specifically comprises:

under the text semantic model, calculating the semantic context similarity between the posts in the thread;

according to the timestamps at which the posts were published, determining the candidate set of parent nodes for each post in the thread;

based on the candidate parent set, determining the parent node of each post in the thread according to the following formula, thereby building the semantic tree:

L_{*} = \underset{1 \leq i \leq j - 1}{\arg\max} \; S(L_{i}, L_{j})

wherein L_* denotes the post; L_i denotes the text semantic feature, under the text semantic model, of a post contained in the thread; i, j = 1, ..., M; and M denotes the number of posts contained in the thread.

5. The method according to claim 4, characterized in that calculating, under the text semantic model, the semantic context similarity between the posts in the thread specifically comprises:

calculating the direction similarity according to the following formula:

S_{cos}(L_{i}, L_{j}) = \frac{L_{i} L_{j}^{T}}{\|L_{i}\| \, \|L_{j}\|}

wherein S_cos(L_i, L_j) denotes the direction similarity; L_j denotes the text semantic feature, under the text semantic model, of a post contained in the thread; i < j; j = 1, ..., M;

calculating the amplitude similarity according to the following formula:

S_{str}(L_{i}, L_{j}) = \frac{2 \|L_{i}\| \, \|L_{j}\|}{\|L_{i}\|^{2} + \|L_{j}\|^{2}}

wherein S_str(L_i, L_j) denotes the amplitude similarity;

calculating, according to the following formula, the semantic context similarity between the posts in the thread under the text semantic model:

S(L_{i}, L_{j}) = \frac{1}{2} \left( λ S_{cos}(L_{i}, L_{j}) + (1 - λ) S_{str}(L_{i}, L_{j}) \right)

wherein S(L_i, L_j) denotes the semantic context similarity between the posts in the thread; and λ denotes the weight that controls the direction similarity and the amplitude similarity.

6. The method according to claim 5, wherein the text semantic model comprises a bag-of-words model represented by term frequency-inverse document frequency (TFIDF), a probabilistic latent semantic indexing (pLSI) model, and a latent Dirichlet allocation (LDA) model;

characterized in that the method further comprises:

merging the similarities of each post under the bag-of-words model represented by TFIDF, the pLSI model, and the LDA model, respectively, and taking the merged result as the semantic context similarity between the posts in the thread.

7. The method according to claim 1, characterized in that fusing the text semantics of the posts with the structural information of the online community according to the unified interactive fusion algorithm, and ranking the influence of the posts, specifically comprises:

evaluating the semantic trees built under the text semantic model with respect to the time-association influence ranking score, setting aside part of the semantic trees as a training set, and obtaining the degree to which the text semantic model contributes to the ranking score;

according to the quality evaluation of the text semantic model and the constructed semantic trees, performing semantic-level fusion of the time-association influence scores, and performing weighted fusion of the semantic structure and the reply structure.

8. The method according to claim 7, characterized in that evaluating the semantic trees built under the text semantic model with respect to the time-association influence ranking score, setting aside part of the semantic trees as a training set, and obtaining the degree to which the text semantic model contributes to the ranking score specifically comprises:

fusing the text semantic feature of the post under the text semantic model, the timestamp, and the quality vector corresponding to the text semantic model, to obtain the time-association ranking score of the post under the text semantic model.

9. The method according to claim 7, characterized in that performing semantic-level fusion of the time-association influence scores according to the quality evaluation of the text semantic model and the constructed semantic trees, and performing weighted fusion of the semantic structure and the reply structure, specifically comprises:

calculating the quality evaluation vector corresponding to the text semantic model by the following formula:

q_{*} = \underset{q}{\arg\min} \sum_{i = 1}^{N_{t}} \sum_{j = 1}^{m_{i}} (\hat{y}_{j} - y_{j})^{2}

wherein q_* denotes the quality evaluation vector; q denotes the quality vector; N_t denotes the number of threads; m_i denotes the number of posts; y_j denotes the ranking or score that other network users gave the post according to how helpful it was to them; and ŷ_j denotes the predicted score;

calculating the total time-association influence score according to the following formula:

trr = α q_{*} trr_{se}^{T} + (1 - α) trr_{st}

wherein trr denotes the total time-association influence score; trr_se denotes the vector of time-association ranking scores obtained from the text semantic model; trr_st denotes the vector of time-association ranking scores obtained from the reply structure; and α denotes the parameter that balances the weight between trr_se and trr_st;