CN106886561A - Method for ranking the influence of online community posts based on time-correlated interactive fusion - Google Patents

Info

Publication number
CN106886561A
Authority
CN
China
Prior art keywords
model
semantic
association
clue
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611249593.0A
Other languages
Chinese (zh)
Inventor
胡卫明
游强
吴偶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201611249593.0A priority Critical patent/CN106886561A/en
Publication of CN106886561A publication Critical patent/CN106886561A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3346: Query execution using probabilistic model
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31: Indexing; Data structures therefor; Storage structures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis


Abstract

The present invention relates to a method for ranking the influence of online community posts based on time-correlated interactive fusion. The method includes: determining a time-correlated influence ranking; performing semantic modeling of the text content of the posts according to a text semantic model, and building a semantic tree based on semantic context similarity; and, according to a unified interactive fusion algorithm, fusing the text semantics of the posts with the structural information of the online community in order to rank the influence of the posts. Time-correlation information is incorporated into the ranking method to form a time-correlated influence ranking. On this basis, a unified fusion framework combining semantics and community structure is realized, fusing network structure with text content. This addresses the problem of discovering potential sources of influence in an online community and makes full use of the explicit time information and semantic context information of online community posts.

Description

Method for ranking the influence of online community posts based on time-correlated interactive fusion
Technical field
The present invention relates to the technical fields of Web mining and social computing, and in particular to a method for ranking the influence of online community posts based on time-correlated interactive fusion.
Background
With the rapid growth of the Internet, more and more users like to share their interests and express their opinions online. As a result, some highly influential posts emerge and attract large numbers of users to participate. Unlike the Internet as a whole, which indiscriminately contains a vast and disorderly mass of information, online communities tend to focus on one or a few related fields, and they have become ideal places for researchers in specific fields to obtain information quickly or to make decisions. Given the influence and activity of some online communities, and their role as stable content distribution platforms for specific fields, mining online communities to find valuable or potentially highly influential posts has become a pressing problem.
Some online communities have a specific network structure (explicit or implicit). For example, the structure of a classic thread-organized discussion forum is shown in Fig. 1. The vertical axis shows the individual threads, and the horizontal axis shows the forward progression of the timeline; within each thread, the discussion posts advance through interactive replies. From a basic analysis of influence, potential influence is highly correlated with the quality of a post, and it is also closely tied to how actively the post is replied to within its thread. The former concerns the content information of the post itself, while the latter reflects the structural information contained in the whole thread. Besides content and structural information, posts in online interaction also carry rich metadata, such as the posting timestamp and information about the poster. The intervals between posting timestamps also reflect the activity of the posts in a thread, which has often been ignored in previous work.
In view of this, the present invention is proposed.
Summary of the invention
In order to solve the above problem in the prior art, namely the problem of discovering potential sources of influence in an online community, a method for ranking the influence of online community posts based on time-correlated interactive fusion is provided.
To achieve this purpose, the following technical solution is provided:
A method for ranking the influence of online community posts based on time-correlated interactive fusion, the method including:
determining a time-correlated influence ranking;
performing semantic modeling of the text content of the posts according to a text semantic model, and building a semantic tree based on semantic context similarity;
according to a unified interactive fusion algorithm, fusing the text semantics of the posts with the structural information of the online community, and ranking the influence of the posts.
Preferably, determining the time-correlated influence ranking specifically includes:
screening the posts of the online community threads, and extracting the timestamp at which each post in a thread was published;
for the extracted timestamps, determining the timeliness function and the time-correlation weight of each post in the thread;
determining the time-correlated influence ranking iteratively, according to a random walk algorithm.
Preferably, performing semantic modeling of the text content of the posts according to the text semantic model and building the semantic tree based on semantic context similarity specifically includes:
extracting, according to the text semantic model, the text semantic features of the posts in each thread of the online community;
based on the text semantic features, with the thread as the organizational form, building the set of posts in a thread into a semantic tree according to the degree of semantic association between posts.
Preferably, building the set of posts in a thread into a semantic tree, based on the text semantic features, with the thread as the organizational form, and according to the degree of semantic association between posts, specifically includes:
calculating, under the text semantic model, the semantic context similarity between the posts in the thread;
determining the candidate set of parent nodes of each post in the thread according to the timestamps at which the posts were published;
based on the candidate set of parent nodes, determining the parent node of each post in the thread according to the following formula, thereby building the semantic tree:
$L^{*} = \arg\max_{L_i,\ i<j} S(L_i, L_j)$
where $L^{*}$ denotes the selected parent post; $L_i$ denotes the text semantic feature, under the text semantic model, of a post contained in the thread, with i, j = 1, ..., M; and M denotes the number of posts contained in the thread.
Preferably, calculating, under the text semantic model, the semantic context similarity between the posts in the thread specifically includes:
calculating the direction similarity according to the following formula:
$S_{cos}(L_i, L_j) = \dfrac{L_i \cdot L_j}{\lVert L_i \rVert\, \lVert L_j \rVert}$
where $S_{cos}(L_i, L_j)$ denotes the direction similarity (the cosine similarity); $L_j$ denotes the text semantic feature, under the text semantic model, of a post contained in the thread, with i < j and j = 1, ..., M;
calculating the amplitude similarity according to the following formula:
where $S_{str}(L_i, L_j)$ denotes the amplitude similarity;
calculating, under the text semantic model, the semantic context similarity between the posts in the thread according to the following formula:
where $S(L_i, L_j)$ denotes the semantic context similarity between posts in the thread, and λ denotes the weight that balances the direction similarity against the amplitude similarity.
Preferably, the text semantic models include the bag-of-words model represented by term frequency-inverse document frequency (TFIDF), the probabilistic latent semantic indexing (pLSI) model, and the latent Dirichlet allocation (LDA) model;
the method further includes:
fusing the similarities of each post under the TFIDF bag-of-words model, the pLSI model, and the LDA model, and taking the fusion result as the semantic context similarity between the posts in the thread.
Preferably, fusing the text semantics of the posts with the structural information of the online community according to the unified interactive fusion algorithm and ranking the influence of the posts specifically includes:
evaluating, on the time-correlated influence ranking scores, the semantic trees built under the text semantic models, and setting aside part of the semantic trees as a training set, to obtain the degree to which each text semantic model contributes to the ranking score;
according to the quality evaluation of the text semantic models and the semantic trees built, fusing the time-correlated influence scores at the semantic level, and performing a weighted fusion of the semantic structure and the reply structure.
Preferably, evaluating the semantic trees built under the text semantic models on the time-correlated influence ranking scores, and setting aside part of the semantic trees as a training set to obtain the degree to which each text semantic model contributes to the ranking score, specifically includes:
fusing the text semantic features of the posts under the text semantic model, the timestamps, and the quality vector corresponding to the text semantic model, to obtain the time-correlated ranking score of each post under that text semantic model.
Preferably, fusing the time-correlated influence scores at the semantic level according to the quality evaluation of the text semantic models and the semantic trees built, and performing a weighted fusion of the semantic structure and the reply structure, specifically includes:
calculating the quality evaluation vector corresponding to the text semantic models by the following formula:
where $q^{*}$ denotes the quality evaluation vector; q denotes the quality vector; $N_t$ denotes the number of threads; $m_i$ denotes the number of posts; $y_j$ denotes the ranking or score that other network users give to a post according to how helpful it is to them; and the remaining term of the formula is the predicted score;
calculating the total time-correlated influence score according to the following formula:
$trr = \alpha\, q^{*}\, trr_{se}^{T} + (1-\alpha)\, trr_{st}$
where trr denotes the total time-correlated influence score; $trr_{se}$ denotes the vector of time-correlated ranking scores obtained from the text semantic models; $trr_{st}$ denotes the vector of time-correlated ranking scores obtained from the reply structure; and α denotes the parameter that balances the proportions of $trr_{se}$ and $trr_{st}$;
ranking the influence of the posts according to the total time-correlated influence score.
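The weighted fusion in the last formula can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes that `q` holds one quality weight per text semantic model and that `trr_se` holds, for each post, one score per model, so that the product of q* and the transposed score matrix reduces to a per-post weighted sum. The names `fuse_scores` and `rank_by_score` and the sample numbers are hypothetical.

```python
def fuse_scores(q, trr_se, trr_st, alpha=0.5):
    """Combine semantic-model scores and reply-structure scores for each post.

    q      : list of K quality weights, one per text semantic model (assumed shape)
    trr_se : list of M rows, each holding K semantic-model scores for one post
    trr_st : list of M reply-structure scores, one per post
    alpha  : balance between the semantic part and the structural part
    """
    fused = []
    for semantic_row, structural in zip(trr_se, trr_st):
        semantic = sum(weight * score for weight, score in zip(q, semantic_row))
        fused.append(alpha * semantic + (1.0 - alpha) * structural)
    return fused


def rank_by_score(scores):
    """Return post indices ordered from highest to lowest total score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical example: two posts, two semantic models.
total = fuse_scores(q=[0.5, 0.5],
                    trr_se=[[1.0, 1.0], [0.0, 0.0]],
                    trr_st=[0.0, 1.0],
                    alpha=0.75)
order = rank_by_score(total)
```

With α close to 1 the ranking follows the semantic trees; with α close to 0 it follows the reply structure alone.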
The present invention provides a method for ranking the influence of online community posts based on time-correlated interactive fusion. The method includes: determining a time-correlated influence ranking; performing semantic modeling of the text content of the posts according to a text semantic model, and building a semantic tree based on semantic context similarity; and, according to a unified interactive fusion algorithm, fusing the text semantics of the posts with the structural information of the online community in order to rank the influence of the posts. Time-correlation information is incorporated into the ranking method to form a time-correlated influence ranking. On this basis, a unified fusion framework combining semantics and community structure is realized, fusing network structure with text content. This addresses the problem of discovering potential sources of influence in an online community, makes full use of the explicit time information and semantic context information of online community posts, and achieves better results than traditional ranking methods that rely only on community structure or methods that use text mining alone.
Brief description of the drawings
Fig. 1 is a schematic diagram of an online community organized as discussion threads;
Fig. 2 is a schematic flow chart of the method for ranking the influence of online community posts based on time-correlated interactive fusion according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the distribution of timestamp intervals extracted from a data set according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the influence flow formed when a post emits flow at a rate equal to its time-correlated influence score, according to an embodiment of the present invention;
Fig. 5a is a schematic structural diagram of the bag-of-words model represented by term frequency-inverse document frequency (TFIDF) according to an embodiment of the present invention;
Fig. 5b is a schematic structural diagram of the probabilistic latent semantic indexing (pLSI) model according to an embodiment of the present invention;
Fig. 5c is a schematic structural diagram of the latent Dirichlet allocation (LDA) model according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the unified fusion framework based on post quality evaluation according to an embodiment of the present invention.
Detailed description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art will understand that these embodiments serve only to explain the technical principle of the invention and are not intended to limit its scope.
The method provided by the embodiment of the present invention is not restricted to particular hardware or a particular programming language; it can be implemented in any language, such as C, VC++, Java, or Python, so other modes of operation are not described further. For example, the method can be applied to a system comprising a processor, ROM (read-only memory), RAM (random access memory), and an interface, connected by a system bus. The ROM stores an operating system and a database. The processor provides computing and control capabilities and can execute the method provided by the embodiment. As one preferred example only, a computer with a 2.67 GHz quad-core central processor and 4 GB of memory is used, and the method is implemented in Python.
Some online communities have a specific network structure (explicit or implicit). For example, the structure of a classic thread-organized discussion forum is shown in Fig. 1. The vertical axis shows the individual threads, and the horizontal axis shows the forward progression of the timeline; within each thread, the discussion posts advance through interactive replies. From a basic analysis of influence, potential influence is highly correlated with the quality of a post, and it is also closely tied to how actively the post is replied to within its thread. The former concerns the content information of the post itself, while the latter reflects the structural information contained in the whole thread. Besides content and structural information, posts in online interaction also carry rich metadata, such as the posting timestamp, information about the poster, and the intervals between posting timestamps, which has often been ignored in previous work.
Therefore, the embodiment of the present invention proposes a method for ranking the influence of online community posts based on time-correlated interactive fusion. As shown in Fig. 2, the method can be realized by the following steps:
S200: Determine the time-correlated influence ranking.
Specifically, this step can include:
S201: Screen the posts of the online community threads, and extract the timestamp at which each post in a thread was published.
This step screens the posts of the online community threads (preferably discussion threads), removes threads that contain too few posts, and extracts the timestamp at which each remaining post was published. The extracted timestamps show the position of each post on the timeline of its thread; the number of posts in a thread and the lengths of the time intervals between posts reflect how active the thread is. This is very useful when the time-correlated ranking score algorithm is modeled below.
The extraction of the timestamps of the posts in a thread is illustrated below with reference to Fig. 3.
Suppose the given set of timestamped posts forms a thread D. The posts of the online community threads are screened, and threads containing too few posts are removed (for example, threads containing fewer than 3 posts). The thread D can then be expressed as a directed graph G(V, E, ts), where V denotes the node set formed by all posts, E denotes the edge set formed by the replies between posts, and ts denotes the set of timestamps of the posts in the thread. Each post is one node of the graph. For example, if post v replies to post u, then an edge (u → v) ∈ E is formed between the two posts. Fig. 3 shows, by way of example, the distribution of timestamp intervals extracted from one data set.
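The construction of the directed graph G(V, E, ts) described above can be sketched in Python as follows. This is a minimal illustration, not the embodiment's actual code: the `Post` record, its field names, and the function `build_thread_graph` are hypothetical, and the threshold of 3 posts is the example value mentioned in the text.

```python
from dataclasses import dataclass
from typing import Optional

MIN_POSTS = 3  # example threshold from the text: shorter threads are removed


@dataclass
class Post:
    post_id: str
    timestamp: float            # e.g. seconds since the epoch
    reply_to: Optional[str]     # id of the post being replied to, or None


def build_thread_graph(posts):
    """Return (V, E, ts) for one thread, or None if the thread is too short."""
    if len(posts) < MIN_POSTS:
        return None
    V = {p.post_id for p in posts}
    ts = {p.post_id: p.timestamp for p in posts}
    # Edge (u, v) means that post v replies to post u.
    E = {(p.reply_to, p.post_id) for p in posts if p.reply_to in V}
    return V, E, ts


thread = [Post("a", 0.0, None), Post("b", 60.0, "a"), Post("c", 120.0, "a")]
graph = build_thread_graph(thread)
```

Replies whose target is missing from the thread are simply dropped here; a real crawler might need a different policy.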
It should be noted that each post is one node in the graph that is built; in the description below, the terms post and node are no longer distinguished.
S202: For the extracted timestamps, determine the timeliness function and the time-correlation weight of each post in the thread.
When determining the time-correlated influence ranking, the embodiment of the present invention takes the following into account: (1) a post in a thread should be published promptly; (2) once published, a post should attract a large number of follow-up replies; (3) the follow-up replies should respond to the original post as promptly as possible. Posts that satisfy these conditions will therefore tend to rank high in potential influence.
The definitions of the time-correlation weight and the timeliness function in a thread are illustrated below with reference to Fig. 3.
In the directed graph that models a thread of posts, the time-correlation weight corresponding to the edge (u → v) ∈ E is defined as w(u, v), and the timeliness function of post v is defined as h(v). The time-correlation weight indicates the frequency of interaction between a post and other posts. The timeliness function indicates the relative position of post v on the timeline of the thread.
Following the distribution of time intervals shown in Fig. 3, the time-correlation weight w(u, v) is defined as the time-interval function $K(ts_u, ts_v) = \log(ts_u - ts_v)$, where $ts_v$ denotes the timestamp of the post v currently under consideration and $ts_u$ denotes the timestamp of the reply u to the current post v.
The timeliness function h(v) of post v is defined as a function $H(ts_v)$ of its own timestamp $ts_v$:
where $ts_{min}$ denotes the timestamp of the first post in the thread, and $ts_{max}$ denotes the timestamp of the last post on the timeline of the thread.
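The two quantities defined above can be sketched in Python as follows. The `time_weight` function follows the stated definition K(ts_u, ts_v) = log(ts_u - ts_v). The formula for H is not reproduced in this text, so `timeliness` below is only a placeholder assumption (a linear normalization between ts_min and ts_max that favors early posts); both function names are hypothetical.

```python
import math


def time_weight(ts_reply, ts_post):
    """w(u, v) = log(ts_u - ts_v), where u is a reply published after post v."""
    gap = ts_reply - ts_post
    if gap <= 0:
        raise ValueError("a reply must be later than the post it answers")
    return math.log(gap)


def timeliness(ts_v, ts_min, ts_max):
    """Placeholder for H(ts_v): 1.0 for the first post, falling linearly to 0.0.

    ASSUMPTION: the source's own formula is not available; this linear form
    only illustrates a function of ts_v bounded by ts_min and ts_max.
    """
    if ts_max == ts_min:
        return 1.0
    return (ts_max - ts_v) / (ts_max - ts_min)


w = time_weight(ts_reply=160.0, ts_post=100.0)    # log of a 60-unit gap
h = timeliness(ts_v=5.0, ts_min=0.0, ts_max=10.0)
```

Note that the logarithm is zero or negative when the gap is one time unit or less, so the choice of time unit (seconds, minutes) affects the weights.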
S203: Determine the time-correlated influence ranking iteratively, according to a random walk algorithm.
Fig. 4 schematically illustrates the influence flow formed when a post emits flow at a rate equal to its time-correlated influence score trr. This influence flow is introduced into the post under consideration through a conduit of length w and cross-sectional area h. Suppose the post under consideration is v, and the posts that have a semantic context association with it are, in order, x, y, ..., u. Suppose further that at time t-1 the influence flows of posts x, y, ..., u are $trr^{(t-1)}(x), trr^{(t-1)}(y), \ldots, trr^{(t-1)}(u)$. Then at time t the influence flow $trr^{(t)}(v)$ of the post under consideration can be solved by the following iterative formula:
The calculation of the time-correlated influence ranking is illustrated below by way of a preferred embodiment.
Influence is introduced into the post under consideration through a conduit of length w and cross-sectional area h.
First, trr is set randomly for each node, where trr denotes the time-correlated influence ranking score of a node. Preferably, the result of the timeliness function is used as the initial value, i.e. $trr^{(1)}(v) = h(v) = H(ts_v)$. Suppose that at step t-1 the score of a node u whose influence is introduced into the node v under consideration is $trr^{(t-1)}(u)$; then at step t the influence score of node v can be solved by the following equation:
Iterating in this way until convergence yields the time-correlated influence score of each post in the thread; finally, the posts are ranked by influence according to the corresponding scores.
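The iteration described above can be sketched in Python as follows. Because the iterative formula itself is not reproduced in this text, the update rule below is an assumption, not the patented formula: it treats each conduit as having conductance h(u) / w(u, v) (cross-section over length), normalizes the outgoing conductances of each node, and adds a PageRank-style damping term so that the iteration converges. The function name `iterate_trr`, the `damping` parameter, and the sample data are hypothetical.

```python
def iterate_trr(edges, h, w, damping=0.85, tol=1e-9, max_iter=1000):
    """Iterate influence scores until they stop changing.

    edges : list of (u, v) pairs; influence flows from u into v (assumed direction)
    h     : dict node -> timeliness value, also used as the initial score
    w     : dict (u, v) -> positive time-correlation weight (conduit length)
    """
    nodes = list(h)
    # ASSUMPTION: conductance of a conduit = cross-section h(u) / length w(u, v).
    conductance = {(u, v): h[u] / w[(u, v)] for (u, v) in edges}
    out_total = {u: 0.0 for u in nodes}
    for (u, _), c in conductance.items():
        out_total[u] += c

    trr = dict(h)  # initial value trr(v) = h(v), as the text prefers
    for _ in range(max_iter):
        new = {v: (1.0 - damping) * h[v] for v in nodes}
        for (u, v), c in conductance.items():
            if out_total[u] > 0:
                new[v] += damping * trr[u] * c / out_total[u]
        delta = max(abs(new[v] - trr[v]) for v in nodes)
        trr = new
        if delta < tol:
            break
    return trr


# Hypothetical thread: posts b and c both feed influence into post a.
scores = iterate_trr(edges=[("b", "a"), ("c", "a")],
                     h={"a": 1.0, "b": 0.5, "c": 0.5},
                     w={("b", "a"): 2.0, ("c", "a"): 4.0})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Sorting the nodes by their converged score gives the influence ranking; any other contraction-style update could be substituted once the exact formula is known.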
S210: Perform semantic modeling of the text content of the posts according to a text semantic model, and build a semantic tree based on semantic context similarity.
Specifically, this step can include:
S211: Extract, according to the text semantic model, the text semantic features of the posts in each thread of the online community.
The text semantic models include, but are not limited to, the bag-of-words model represented by term frequency-inverse document frequency (TFIDF), the probabilistic latent semantic indexing (pLSI) model, and the latent Dirichlet allocation (LDA) model. Figs. 5a-5c schematically illustrate the structures of the TFIDF bag-of-words model, the pLSI model, and the LDA model, respectively. TFIDF combines two kinds of information, the term frequency TF and the inverse document frequency IDF: the former reflects the number of times a word occurs in a document, and the latter reflects the inverse of the number of documents that contain the word.
Taking TFIDF as an example, for the M posts under thread D, if the term frequency with which word $w_i$ occurs in post $d_j$ is $tf(w_i, d_j)$, and the number of posts in which the word occurs is $df(w_i)$, then the inverse document frequency can be calculated by the following formula:
where $idf(w_i)$ denotes the inverse document frequency.
Combining the term frequency TF and the inverse document frequency IDF, the term frequency-inverse document frequency is calculated according to the following formula:
where $tfidf(w_i, d_j)$ denotes the term frequency-inverse document frequency; $tf(w_i, d_j)$ denotes the term frequency of the word in the post; and $idf(w_i)$ denotes the inverse document frequency.
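The TFIDF features described above can be sketched in Python as follows. The formulas are not reproduced in this text, so the sketch assumes the common textbook definitions, idf(w) = log(M / df(w)) and tfidf(w, d) = tf(w, d) × idf(w), which may differ in detail from the source. The function name `tfidf_features` and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def tfidf_features(posts):
    """posts: list of token lists, one per post. Returns one {word: weight} per post."""
    m = len(posts)
    df = Counter()
    for tokens in posts:
        df.update(set(tokens))              # count each word once per post
    # ASSUMPTION: idf(w) = log(M / df(w)), the usual unsmoothed form.
    idf = {word: math.log(m / count) for word, count in df.items()}

    features = []
    for tokens in posts:
        tf = Counter(tokens)                # raw occurrences of each word in this post
        features.append({word: n * idf[word] for word, n in tf.items()})
    return features


features = tfidf_features([["price", "rise", "price"], ["rise", "tax"]])
```

A word that appears in every post (here "rise") gets weight zero, which is the intended effect of the inverse document frequency.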
Because the pLSI model considers the relations among documents, hidden categories (also called "concepts"), and words, singular value decomposition is applied to the term-frequency matrix according to the following formula:
$W = U \Sigma V^{T}$
where W denotes the term-frequency matrix, whose element in row i and column j is the term frequency $tf(w_i, d_j)$.
In the formula above, U and V are orthogonal matrices with $U^{T}U = V^{T}V = I$, where $U^{T}$ and $V^{T}$ are the transposes of U and V, respectively.
In practice, the singular values on the main diagonal of the diagonal matrix Σ can be arranged from top to bottom in descending order. The number of singular values represents the number of latent semantic concepts, and the magnitude of each singular value represents the strength of the corresponding latent semantic concept. Preferably, in order to suppress noise that deviates from the semantic center in the semantic space, an approximation is usually applied: only the few largest singular values are retained, and the smaller ones are ignored. For example, the text of some posts contains rare misspellings. Because their values in the term-frequency matrix W are small and extremely sparse, these words produce very small singular values after the decomposition; ignoring these smaller singular values removes their influence on the core latent concepts.
The concepts produced in the pLSI model come from the documents themselves. Through singular value decomposition, the pLSI model achieves an effect similar to clustering words by their semantics. The more similar two words are in the semantic space, the smaller the distance between their corresponding word vectors in the orthogonal matrix U obtained from the decomposition. The original term-frequency matrix W represents the relation between N words and M posts. After the decomposition, selecting the Z largest singular values produces Z latent semantic concepts. In general Z is far smaller than N, so the N different words are assigned to Z concepts, which completes a clustering-like grouping of the words.
The idea behind the LDA model is similar to that of the pLSI model. The difference is that the concepts in the pLSI model come from the documents themselves, whereas the concepts in the LDA model come from a topic distribution that is independent of the scale of the documents. Specifically, the LDA model describes a generative process for a post. It assumes that each post has an implicit "latent topic" layer, which is a mixture of several "latent topics"; a "latent topic" (topic for short) represents latent semantics in the post. The process by which the LDA model generates a post can be as follows: a relatively broad topic library is set up, and the post is composed by drawing from this library. For example, some concepts come from a "military" topic and some from a "humanities" topic; the different topics and their proportions form the topic distribution. These concepts have their own dedicated dictionaries; words are repeatedly selected from the dictionaries of the corresponding concepts to form sentences, and the sentences make up the text.
The three models above characterize text semantics at successively higher levels, abstracting gradually from the words themselves to topics. In the subsequent computation, the text feature representations (i.e. the text semantic features) of the posts in a thread under all three semantic models can be extracted at the same time.
Obtaining the semantics of an individual post in an online community is extremely difficult. Such posts often consist of a few isolated words or sentences and are extremely sparse in the high-dimensional semantic space; the semantics of the post itself is incomplete and lacks a great deal of background knowledge. Knowledge-engineering methods, which build large amounts of related contextual semantic information for individual posts, often consume substantial manpower, and the result may not be worth the cost. However, the posts in an online community are not independent of one another; in particular, the posts that make up a thread are strongly related semantically. The embodiment of the present invention therefore builds a semantic tree in step S212.
S212: Based on the text semantic features, with the thread as the organizational form, build the set of posts in a thread into a semantic tree according to the degree of semantic association between posts.
After this step, adjacent nodes on the semantic tree have similar semantics and can therefore provide a certain amount of contextual information for each other.
Specifically, this step can include:
S2121: Under the text semantic model, calculate the semantic context similarity between the posts in the thread.
For similarity, the embodiment of the present invention considers both the direction information and the amplitude information of the semantics. The semantic context similarity (also called the similarity measure) between posts in a thread is thus the weighted sum of the direction similarity and the amplitude similarity.
Further, this step can include:
Step 1: Calculate the direction similarity according to the following formula:
$S_{cos}(L_i, L_j) = \dfrac{L_i \cdot L_j}{\lVert L_i \rVert\, \lVert L_j \rVert}$
where $S_{cos}(L_i, L_j)$ denotes the direction similarity; $L_i$ denotes the text semantic feature, under the text semantic model, of a post contained in the thread, with i = 1, ..., M; M denotes the number of posts contained in the thread; and $L_j$ denotes the text semantic feature, under the text semantic model, of a post contained in the thread, with i < j and j = 1, ..., M.
In the formula above, i < j implies $ts_i < ts_j$, where $ts_i$ and $ts_j$ denote the timestamps extracted from the posts.
The direction similarity $S_{cos}(L_i, L_j)$ is the cosine similarity. It reflects how similar the feature directions of the two posts are, and thus expresses the semantic consistency between the two posts.
Step 2: Calculate the amplitude similarity according to the following formula:
where $S_{str}(L_i, L_j)$ denotes the amplitude similarity.
The amplitude similarity indicates how similar the two posts are in semantic strength.
Step 3: Calculate, under the text semantic model, the semantic context similarity between the posts in the thread according to the following formula:
where $S(L_i, L_j)$ denotes the semantic context similarity between posts in the thread, and λ denotes the weight that balances the direction similarity against the amplitude similarity.
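The three similarity steps above can be sketched in Python as follows. Only the cosine similarity is fully determined by the text. The amplitude formula and the exact weighted sum are not reproduced here, so the sketch assumes an amplitude similarity equal to the ratio of the smaller vector norm to the larger one, and a combination of the form λ·S_cos + (1-λ)·S_str. All function names are hypothetical.

```python
import math


def _norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def direction_similarity(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    na, nb = _norm(a), _norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def amplitude_similarity(a, b):
    """ASSUMED form: ratio of the smaller norm to the larger, in [0, 1]."""
    na, nb = _norm(a), _norm(b)
    if max(na, nb) == 0.0:
        return 1.0
    return min(na, nb) / max(na, nb)


def context_similarity(a, b, lam=0.5):
    """ASSUMED weighted sum: lam * direction + (1 - lam) * amplitude."""
    return lam * direction_similarity(a, b) + (1.0 - lam) * amplitude_similarity(a, b)


s = context_similarity([1.0, 0.0], [2.0, 0.0], lam=0.5)
```

In this example the two vectors point the same way (direction similarity 1.0) but differ in length (assumed amplitude similarity 0.5), so the combined value lies between the two.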
If multiple text semantic models are used in an implementation, step 4 can also be carried out. For example, the text semantic models can be the bag-of-words model represented by term frequency-inverse document frequency (TFIDF), the probabilistic latent semantic indexing (pLSI) model and the latent Dirichlet allocation (LDA) model.
Step 4: Fuse the similarities of each model under the TFIDF bag-of-words model, the pLSI model and the LDA model, and take the fusion result as the semantic context similarity between models in the clue.
This step uses a unified fusion framework to fuse the similarities that each model has under the different semantic models.
S2122: According to the timestamps at which the models were posted, determine the candidate parent-node set of each model in the clue.
For example, in a given discussion clue, for a particular model whose feature is $L_j$, one model can be selected as its parent node from the feature set of the models with earlier timestamps, $\{L_i \mid 1 \le i \le j-1\}$.
The way of determining the candidate parent-node set of a model in a clue is described in detail below by way of a preferred embodiment. Suppose a model was created at 14:58 on November 10, 2015. The set of models with earlier timestamps is then the subset of models created before 14:58 on November 10, 2015, and this set is used as the candidate set for parent-node selection.
It should be noted that the above assumption is only an example and should not be construed as an improper limitation on the scope of the present invention.
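The timestamp filter described in S2122 can be sketched as follows. The dictionary layout of a model (post) and the name `candidate_parents` are hypothetical; the source only requires that candidates have an earlier timestamp than the model under consideration.

```python
from datetime import datetime


def candidate_parents(posts, target_index):
    """Return the indices of all posts created strictly before the target post."""
    target_ts = posts[target_index]["ts"]
    return [k for k, post in enumerate(posts) if post["ts"] < target_ts]


# Hypothetical clue: the third post matches the 14:58 example in the text.
posts = [
    {"id": "a", "ts": datetime(2015, 11, 10, 14, 30)},
    {"id": "b", "ts": datetime(2015, 11, 10, 14, 45)},
    {"id": "c", "ts": datetime(2015, 11, 10, 14, 58)},
    {"id": "d", "ts": datetime(2015, 11, 10, 15, 5)},
]
earlier = candidate_parents(posts, 2)  # posts "a" and "b"
```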
S2123: Based on the candidate parent-node set, determine the parent node of each model in the clue according to the following formula, thereby building the semantic tree:
$$L^{*} = \arg\max_{L_i,\; 1 \le i \le j-1} S(L_i, L_j)$$
In the formula, $L^{*}$ denotes the selected model; $L_i$ denotes the text semantic feature, under the text semantic model, of a model contained in the clue, $i, j = 1, \ldots, M$.
This step can be regarded as an optimization problem: select the model $L^{*}$ whose value under the defined similarity (i.e. the similarity measure function) with $L_j$ is the largest.
After the index $j$ has been decremented from $M$ to 2, a semantic tree based on the semantic similarity measure (i.e. a reply structure based on semantic context) has been successfully rebuilt.
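The reconstruction loop in S2123 can be sketched as below. The sketch assumes the feature vectors are already sorted by ascending timestamp, so that the candidates for model j are exactly the models before it. The `cosine` helper stands in for the full similarity measure S only to keep the example self-contained; the function names are illustrative.

```python
import math


def cosine(a, b):
    """Stand-in similarity; the full measure S from step 3 could be passed instead."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def build_semantic_tree(features, similarity=cosine):
    """Return parent[j] for every post; the first (root) post has parent None.

    `features` must be ordered by ascending timestamp. The index j runs from
    the last post down to the second one, matching "j from M to 2" in the
    text's 1-based numbering.
    """
    m = len(features)
    parent = [None] * m
    for j in range(m - 1, 0, -1):
        # Pick the earlier post that is most similar to post j.
        parent[j] = max(range(j),
                        key=lambda i: similarity(features[i], features[j]))
    return parent


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
tree = build_semantic_tree(features)
```

In this toy clue the third post attaches to the first and the fourth post attaches to the second, because those are their most similar predecessors.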
S220: According to a unified interactive fusion algorithm, fuse the text semantics of the models with the structural information of the web community, and rank the influence of the models.
For the different text semantic models in vector space, the embodiment of the present invention introduces a quality factor to describe how much each of these semantic models contributes to the final ranking score.
Specifically, this step can be realized by step S221 and step S222.Wherein:
S221: Evaluate the semantic trees built under the text semantic model with respect to the time-association influence ranking score, set aside part of the semantic tree set for training, and obtain the degree to which the text semantic model contributes to the ranking score.
Specifically, this step can include: fusing the text semantic feature of a model under the text semantic model, its timestamp and the quality vector corresponding to the text semantic model, to obtain the time-association ranking score of the model under that text semantic model.
The embodiment of the present invention considers that the text semantic models characterize the semantic space of a clue from different sides or at different semantic levels, so the quality of the characterization given by a text semantic model can be measured by a single value, which forms an entry of the quality vector. For $K$ text semantic models, the corresponding quality vector is $q = (q_1, \ldots, q_k, \ldots, q_K)$.
In practical applications, $N_t$ clues can be randomly selected from all clues for training. The $i$-th clue $D_i$ contains $m_i$ models $\{(L_j, ts_j, y_j)\}_{j=1}^{m_i}$, where $L_j$ denotes the feature vector of the content of the $j$-th model in clue $D_i$ under a given text semantic model, $ts_j$ denotes the timestamp of the model, and $y_j$ denotes the ranking or score that other network users gave the model according to how helpful it was to them, which can be used as the label value during training.
For example, for the $j$-th model, its text semantic feature $L_j$, its timestamp $ts_j$ and the quality vector $q$ corresponding to the text semantic models are combined by quality fusion to obtain a scoring function $f(\cdot)$, i.e. $\hat{y}_j = f(L_j, ts_j, q)$, which denotes the predicted score.
The value $y_j$ above can also be used as the label value during training.
For multiple text semantic models, the quality evaluation vectors corresponding to the text semantic models are fused. Then, based on the semantic trees built from the semantic context similarity, fusion is performed to obtain the score of each model.
S222: According to the quality evaluation of the text semantic model and the semantic trees that have been built, fuse the time-association influence scores at the semantic level, and perform a weighted fusion of the semantic structure and the reply structure.
Here, the reply structure is the clue structure based on replies.
S2221: Calculate the quality evaluation vector corresponding to the text semantic model by the following formula:
$$q^{*} = \arg\min_{q} \sum_{i=1}^{N_t} \sum_{j=1}^{m_i} \left(\hat{y}_j - y_j\right)^{2}$$
In the formula, $q^{*}$ denotes the quality evaluation vector; $q$ denotes the quality vector; $N_t$ denotes the number of clues; $m_i$ denotes the number of models; $y_j$ denotes the ranking or score that other network users gave the model according to how helpful it was to them; $\hat{y}_j$ denotes the predicted score.
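The source does not give the form of the scoring function f, so the following is only a sketch under a loudly stated assumption: the predicted score is taken to be linear in q, i.e. the prediction for a model is the q-weighted sum of the scores that model receives under each of the K text semantic models. Under that assumption the minimization in S2221 is an ordinary least-squares problem, solved here through the normal equations in plain Python. The function name `fit_quality_vector` and the flattening of all training clues into one list of rows are illustrative choices.

```python
def fit_quality_vector(scores, labels):
    """Least-squares estimate of q under the ASSUMED linear scoring function.

    scores: one row per training post (all clues flattened), each row holding
            the K per-semantic-model scores of that post.
    labels: the user-given score y_j of each post, in the same order.
    """
    k = len(scores[0])
    # Normal equations A q = b, with A = S^T S and b = S^T y.
    a = [[sum(row[r] * row[c] for row in scores) for c in range(k)]
         for r in range(k)]
    b = [sum(row[r] * y for row, y in zip(scores, labels)) for r in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("per-model scores are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    q = [0.0] * k
    for r in range(k - 1, -1, -1):
        tail = sum(a[r][c] * q[c] for c in range(r + 1, k))
        q[r] = (b[r] - tail) / a[r][r]
    return q


# Toy data generated with q = (2, 3), so the fit should recover those values.
q_star = fit_quality_vector([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                            [2.0, 3.0, 5.0])
```

If f is not linear in q, the same objective would need a general-purpose optimizer instead of the closed-form solve used here.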
S2222: Calculate the total time-association influence score according to the following formula:
$$trr = \alpha \, q^{*} \, trr_{se}^{T} + (1 - \alpha) \, trr_{st}$$
In the formula, $trr$ denotes the total time-association influence score; $trr_{se}$ denotes the time-association ranking score vector obtained from the text semantic models; $trr_{st}$ denotes the time-association ranking score vector obtained from the reply structure; $\alpha$ denotes the parameter that balances the proportion between $trr_{se}$ and $trr_{st}$.
S2223: Rank the influence of the models according to the total time-association influence score.
For example, for the TFIDF bag-of-words model, the pLSI model and the LDA model, one obtains $q^{*} = (q_{TFIDF}, q_{LSI}, q_{LDA})$ and $trr_{se} = (trr_{TFIDF}, trr_{LSI}, trr_{LDA})$. Here, $q_{TFIDF}$, $q_{LSI}$ and $q_{LDA}$ denote the quality evaluations corresponding to the TFIDF bag-of-words model, the pLSI model and the LDA model respectively; $trr_{TFIDF}$, $trr_{LSI}$ and $trr_{LDA}$ denote the time-association influence scores corresponding to the TFIDF bag-of-words model, the pLSI model and the LDA model respectively. Fusing these gives the total time-association ranking score, and the influence of the models is then ranked according to that score.
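The weighted fusion and the final ranking can be sketched as follows. The sketch reads the product of the quality evaluation vector and the semantic score vector as a quality-weighted sum over the K text semantic models, evaluated separately for each model (post) in the clue. All numbers and the function names are made up for illustration.

```python
def fuse_scores(q_star, trr_se, trr_st, alpha=0.5):
    """trr = alpha * (q* . trr_se) + (1 - alpha) * trr_st, computed per post.

    q_star: K quality values, one per text semantic model.
    trr_se: K lists, each holding one score per post under that semantic model.
    trr_st: one score per post obtained from the reply structure.
    """
    fused = []
    for p in range(len(trr_st)):
        semantic = sum(q * per_model[p] for q, per_model in zip(q_star, trr_se))
        fused.append(alpha * semantic + (1.0 - alpha) * trr_st[p])
    return fused


def rank_posts(fused):
    """Post indices ordered from most to least influential."""
    return sorted(range(len(fused)), key=lambda p: fused[p], reverse=True)


# Hypothetical values for three posts under TFIDF, pLSI and LDA.
q_star = [0.5, 0.3, 0.2]
trr_se = [[0.2, 0.6, 0.2],   # TFIDF
          [0.1, 0.7, 0.2],   # pLSI
          [0.3, 0.3, 0.4]]   # LDA
trr_st = [0.5, 0.2, 0.3]     # reply structure
fused = fuse_scores(q_star, trr_se, trr_st, alpha=0.5)
order = rank_posts(fused)
```

With these values the second post ranks first: its strong semantic scores outweigh the first post's higher reply-structure score.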
The present invention is described in detail below with reference to Fig. 6, taking the TFIDF bag-of-words model, the pLSI model and the LDA model as the text semantic models. For the training clues, under the TFIDF bag-of-words model, the pLSI model and the LDA model, the semantic context similarity between models in a clue can be calculated according to step S2121, and the semantic trees can then be rebuilt according to step S2123. Next, a semantic tree set for training and a semantic tree set for testing are chosen, model quality evaluation is carried out on the semantic tree set used for training, and the fusion at the semantic level is performed. For the test scenario, structure fusion is performed according to the reply structure. Finally, the fusion result at the semantic level and the fusion result at the structural level are fused in a unified way, so that the potential influence of web community models is ranked.
The web community model influence ranking method based on time-association interactive fusion provided by the embodiment of the present invention organically combines the content information of a model itself with the structural information contained in the whole clue, and takes metadata into account, such as the posting timestamp, information about the poster, and the interval between posting timestamps (which reflects the activity of the models in the clue). By incorporating time-association information into the ranking method, a time-association influence ranking algorithm is proposed, and on the basis of this algorithm a unified fusion framework combining semantics and community structure is proposed, which fuses network structure with content. The method can make full use of the temporal information and the semantic context information exhibited by web community models, and achieves better results than traditional ranking methods that rely only on community structure or methods that use only text mining.
Although the steps in the above embodiment are described in the order given above, those skilled in the art will appreciate that, to achieve the effect of the present embodiment, the different steps need not be executed in that order. They may be executed simultaneously (in parallel) or in reverse order, and these simple variations all fall within the protection scope of the present invention.
The method provided by the embodiment of the present invention can also be implemented using a programmable logic device, or as computer program software or a program module (which includes routines, programs, objects, components or data structures that perform particular tasks or implement particular abstract data types). For example, an embodiment of the present invention can be a computer program product which, when run, causes a computer to perform the demonstrated method. The computer program product includes a computer-readable storage medium containing computer program logic or code portions for implementing the method. The computer-readable storage medium can be a built-in medium installed in the computer or a removable medium that can be detached from the computer body (for example, a storage device using hot-plug technology). The built-in medium includes but is not limited to rewritable non-volatile memory, for example RAM, ROM, flash memory and hard disk. The removable medium includes but is not limited to: optical storage media (for example CD-ROM and DVD), magneto-optical storage media (for example MO), magnetic storage media (for example magnetic tape or removable hard disk), media with built-in rewritable non-volatile memory (for example memory cards) and media with built-in ROM (for example ROM cartridges).
The above detailed description of example embodiments of the present invention is provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms described. Obviously, many variations and modifications will be apparent to those skilled in the art. The embodiments were selected and described in order to best explain the principles of the invention and its practical application, so that others skilled in the art can understand the various embodiments of the invention and the various modifications suited to the particular use contemplated. Embodiments of the present invention may omit some of the technical features described above and solve only part of the technical problems present in the prior art. Moreover, the described technical features can be combined. The protection scope of the present invention is defined by the appended claims and their equivalents. Others skilled in the art can make various modifications, replacements and combinations to the technical solutions described in the appended claims, and the technical solutions after such modifications or replacements also fall within the protection scope of the present invention.

Claims (9)

1. A web community model influence ranking method based on time-association interactive fusion, characterized in that the method comprises:
determining an influence ranking based on time association;
performing semantic modeling on the text content of the models according to a text semantic model, and building a semantic tree based on semantic context similarity;
fusing the text semantics of the models with the structural information of the web community according to a unified interactive fusion algorithm, and ranking the influence of the models.
2. The method according to claim 1, characterized in that determining the influence ranking based on time association specifically comprises:
screening the models of a web community clue, and extracting the timestamp at which each model in the clue was posted;
for the extracted timestamps, determining the timeliness function and the time-association weight of each model in the clue;
determining the time-association influence ranking according to a random walk algorithm and by means of iteration.
3. The method according to claim 1, characterized in that performing semantic modeling on the text content of the models according to the text semantic model and building the semantic tree based on semantic context similarity specifically comprises:
extracting, according to the text semantic model, the text semantic features of the models in each clue of the web community;
based on the text semantic features, with the clue as the form of organization, constructing the set of models in the clue into a semantic tree by means of the degree of semantic association between the models.
4. The method according to claim 3, characterized in that constructing the set of models in the clue into a semantic tree, based on the text semantic features, with the clue as the form of organization and by means of the degree of semantic association between the models, specifically comprises:
calculating, under the text semantic model, the semantic context similarity between the models in the clue;
determining the candidate parent-node set of each model in the clue according to the timestamps at which the models were posted;
based on the candidate parent-node set, determining the parent node of each model in the clue according to the following formula, thereby building the semantic tree:
$$L^{*} = \arg\max_{L_i,\; 1 \le i \le j-1} S(L_i, L_j)$$
wherein $L^{*}$ denotes the model; $L_i$ denotes the text semantic feature, under the text semantic model, of a model contained in the clue, $i, j = 1, \ldots, M$; $M$ denotes the number of models contained in the clue.
5. The method according to claim 4, characterized in that calculating, under the text semantic model, the semantic context similarity between the models in the clue specifically comprises:
calculating the direction similarity according to the following formula:
$$S_{cos}(L_i, L_j) = \frac{L_i L_j^{T}}{\|L_i\| \, \|L_j\|}$$
wherein $S_{cos}(L_i, L_j)$ denotes the direction similarity; $L_j$ denotes the text semantic feature, under the text semantic model, of a model contained in the clue, $i < j$, $j = 1, \ldots, M$;
calculating the amplitude similarity according to the following formula:
$$S_{str}(L_i, L_j) = \frac{2 \, \|L_i\| \, \|L_j\|}{\|L_i\|^{2} + \|L_j\|^{2}}$$
wherein $S_{str}(L_i, L_j)$ denotes the amplitude similarity;
calculating the semantic context similarity between the models in the clue, under the text semantic model, according to the following formula:
$$S(L_i, L_j) = \frac{1}{2}\left(\lambda \, S_{cos}(L_i, L_j) + (1 - \lambda) \, S_{str}(L_i, L_j)\right)$$
wherein $S(L_i, L_j)$ denotes the semantic context similarity between the models in the clue; $\lambda$ denotes the weight that balances the direction similarity against the amplitude similarity.
6. The method according to claim 5, wherein the text semantic model comprises the bag-of-words model represented by term frequency-inverse document frequency (TFIDF), the probabilistic latent semantic indexing (pLSI) model and the latent Dirichlet allocation (LDA) model;
characterized in that the method further comprises:
fusing the similarities of each model under the TFIDF bag-of-words model, the pLSI model and the LDA model, and taking the fusion result as the semantic context similarity between the models in the clue.
7. The method according to claim 1, characterized in that fusing the text semantics of the models with the structural information of the web community according to the unified interactive fusion algorithm, and ranking the influence of the models, specifically comprises:
evaluating the semantic trees built under the text semantic model with respect to the time-association influence ranking score, setting aside part of the semantic tree set for training, and obtaining the degree to which the text semantic model contributes to the ranking score;
according to the quality evaluation of the text semantic model and the semantic trees that have been built, fusing the time-association influence scores at the semantic level, and performing a weighted fusion of the semantic structure and the reply structure.
8. The method according to claim 7, characterized in that evaluating the semantic trees built under the text semantic model with respect to the time-association influence ranking score, setting aside part of the semantic tree set for training, and obtaining the degree to which the text semantic model contributes to the ranking score, specifically comprises:
fusing the text semantic feature of the model under the text semantic model, the timestamp and the quality vector corresponding to the text semantic model, to obtain the time-association ranking score of the model under the text semantic model.
9. The method according to claim 7, characterized in that fusing the time-association influence scores at the semantic level according to the quality evaluation of the text semantic model and the semantic trees that have been built, and performing a weighted fusion of the semantic structure and the reply structure, specifically comprises:
calculating the quality evaluation vector corresponding to the text semantic model by the following formula:
$$q^{*} = \arg\min_{q} \sum_{i=1}^{N_t} \sum_{j=1}^{m_i} \left(\hat{y}_j - y_j\right)^{2}$$
wherein $q^{*}$ denotes the quality evaluation vector; $q$ denotes the quality vector; $N_t$ denotes the number of clues; $m_i$ denotes the number of models; $y_j$ denotes the ranking or score that other network users gave the model according to how helpful it was to them; $\hat{y}_j$ denotes the predicted score;
calculating the total time-association influence score according to the following formula:
$$trr = \alpha \, q^{*} \, trr_{se}^{T} + (1 - \alpha) \, trr_{st}$$
wherein $trr$ denotes the total time-association influence score; $trr_{se}$ denotes the time-association ranking score vector obtained from the text semantic model; $trr_{st}$ denotes the time-association ranking score vector obtained from the reply structure; $\alpha$ denotes the parameter that balances the proportion between $trr_{se}$ and $trr_{st}$;
ranking the influence of the models according to the total time-association influence score.
CN201611249593.0A 2016-12-29 2016-12-29 Web Community's model influence sort method based on association in time interaction fusion Pending CN106886561A (en)

Publications (1)

Publication Number Publication Date
CN106886561A true CN106886561A (en) 2017-06-23

Family

ID=59175788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611249593.0A Pending CN106886561A (en) 2016-12-29 2016-12-29 Web Community's model influence sort method based on association in time interaction fusion

Country Status (1)

Country Link
CN (1) CN106886561A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170623
