CN106886561A - Method for ranking the influence of online community posts based on time-correlated interactive fusion
- Publication number: CN106886561A
- Application number: CN201611249593.0A
- Authority: CN (China)
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G: PHYSICS
- G06: COMPUTING; CALCULATING OR COUNTING
- G06F: ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31: Indexing; Data structures therefor; Storage structures
- G06F16/33: Querying
- G06F16/3331: Query processing
- G06F16/334: Query execution
- G06F16/3344: Query execution using natural language analysis
- G06F16/3346: Query execution using probabilistic model
Abstract
The present invention relates to a method for ranking the influence of posts in an online community based on time-correlated interactive fusion. The method comprises: determining a time-correlated influence ranking; performing semantic modeling on the text content of the posts according to a text semantic model, and building a semantic tree based on semantic context similarity; and, according to a unified interactive fusion algorithm, fusing the text semantics of the posts with the structural information of the online community to rank the influence of the posts. Time-correlation information is incorporated into the ranking method to form a time-correlated influence ranking. On this basis, a unified fusion framework that combines semantics and community structure fuses network structure with text content. This addresses the problem of discovering potential sources of influence in online communities and makes full use of the explicit time information and the semantic context information of the posts.
Description
Technical field
The present invention relates to the technical fields of Web mining and social computing, and in particular to a method for ranking the influence of online community posts based on time-correlated interactive fusion.
Background
With the rapid growth of the Internet, more and more users like to share their interests and express their opinions online. As a result, some highly influential posts have emerged and attracted large numbers of participating users. Unlike the Internet as a whole, which indiscriminately contains vast and varied information, online communities often focus on one or a few related fields. This makes them ideal places for researchers in a specific field to obtain information quickly or to make decisions. Given the influence and activity of some online communities, and their role as stable content-distribution platforms for specific fields, mining online communities to find valuable or potentially influential posts has become a pressing problem.
Some online communities have a specific network structure (explicit or implicit). For example, the structure of a classic thread-organized discussion forum is shown in Fig. 1. The vertical axis shows the individual threads and the horizontal axis shows the progression of time; the posts in each thread advance through interactive replies. From the basic perspective of influence analysis, potential influence is highly correlated with the quality of a post and is also closely tied to the reply activity around the post within its thread. The former concerns the content of the post itself, while the latter reflects the structural information of the whole thread. In addition to context and structural information, posts in online interactions also carry rich metadata, such as posting timestamps and information about the poster. The intervals between posting timestamps also reflect how active the posts in a thread are, which has often been ignored in previous work.
In view of this, the present invention is proposed.
Summary of the invention
To solve the above problem in the prior art, namely the discovery of potential sources of influence in online communities, a method for ranking the influence of online community posts based on time-correlated interactive fusion is provided.
To achieve this objective, the following technical solution is provided:
A method for ranking the influence of online community posts based on time-correlated interactive fusion, the method comprising:
determining a time-correlated influence ranking;
performing semantic modeling on the text content of the posts according to a text semantic model, and building a semantic tree based on semantic context similarity;
according to a unified interactive fusion algorithm, fusing the text semantics of the posts with the structural information of the online community, and ranking the influence of the posts.
Preferably, determining the time-correlated influence ranking specifically comprises:
screening the posts of the online community threads, and extracting the timestamp at which each post in a thread was published;
for the extracted timestamps, determining the timeliness function and the time-correlation weights of the posts in the thread;
determining the time-correlated influence ranking iteratively according to a random walk algorithm.
Preferably, performing semantic modeling on the text content of the posts according to the text semantic model and building the semantic tree based on semantic context similarity specifically comprises:
extracting, according to the text semantic model, the text semantic features of the posts in each thread of the online community;
based on the text semantic features, and taking the thread as the unit of organization, building the set of posts in a thread into a semantic tree according to the degree of semantic association between posts.
Preferably, building the set of posts in a thread into a semantic tree, based on the text semantic features and according to the degree of semantic association between posts, specifically comprises:
computing, under the text semantic model, the semantic context similarity between posts in the thread;
determining the candidate parent set of each post in the thread according to the timestamps at which the posts were published;
based on the candidate parent set, determining the parent node of each post in the thread according to the following formula, thereby building the semantic tree:
where $L^{*}$ denotes the post; $L_i$ denotes the text semantic feature, under the text semantic model, of a post contained in the thread, with i, j = 1, ..., M; and M denotes the number of posts contained in the thread.
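The parent-selection step above can be sketched in Python. This is a minimal sketch under stated assumptions, not the patent's formula, which is not reproduced in this text: it assumes that the candidate parents of a post are all posts with an earlier timestamp and that the parent is the candidate with the highest similarity. The names `build_semantic_tree`, `similarity` and `dot` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def build_semantic_tree(
    features: Sequence[Sequence[float]],
    timestamps: Sequence[float],
    similarity: Callable[[Sequence[float], Sequence[float]], float],
) -> List[Optional[int]]:
    """Return parent[i] for every post; the earliest post is the root (None).

    Assumption: candidates are all earlier posts, and the parent is the
    candidate with the highest similarity to the post.
    """
    order = sorted(range(len(features)), key=lambda i: timestamps[i])
    parent: List[Optional[int]] = [None] * len(features)
    for pos, j in enumerate(order):
        candidates = order[:pos]  # posts published before post j
        if candidates:
            parent[j] = max(
                candidates, key=lambda i: similarity(features[i], features[j])
            )
    return parent


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Toy similarity used only for this example."""
    return sum(x * y for x, y in zip(a, b))


feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
times = [0.0, 10.0, 20.0, 30.0]
parents = build_semantic_tree(feats, times, dot)
```

Any similarity function can be passed in, so the same routine can be reused for each text semantic model.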
Preferably, computing, under the text semantic model, the semantic context similarity between posts in the thread specifically comprises:
computing the direction similarity according to the following formula:
where $S_{cos}(L_i, L_j)$ denotes the direction similarity; $L_j$ denotes the text semantic feature, under the text semantic model, of a post contained in the thread, with i < j, j = 1, ..., M;
computing the amplitude similarity according to the following formula:
where $S_{str}(L_i, L_j)$ denotes the amplitude similarity;
computing the semantic context similarity between posts in the thread under the text semantic model according to the following formula:
where $S(L_i, L_j)$ denotes the semantic context similarity between posts in the thread, and λ denotes the weight that balances the direction similarity and the amplitude similarity.
Preferably, the text semantic model includes a bag-of-words model represented by term frequency-inverse document frequency (TF-IDF), the probabilistic latent semantic indexing (pLSI) model, and the latent Dirichlet allocation (LDA) model;
the method further comprises:
fusing the similarities of the posts under the TF-IDF bag-of-words model, the pLSI model and the LDA model respectively, and taking the fusion result as the semantic context similarity between posts in the thread.
Preferably, fusing the text semantics of the posts with the structural information of the online community according to the unified interactive fusion algorithm, and ranking the influence of the posts, specifically comprises:
evaluating the semantic trees built under the text semantic models on the time-correlated influence ranking scores, setting aside a subset of the semantic trees for training, and obtaining the degree to which each text semantic model contributes to the ranking score;
performing semantic-level fusion of the time-correlated influence scores according to the quality evaluation of the text semantic models and the semantic trees built, and performing weighted fusion of the semantic structure and the reply structure.
Preferably, evaluating the semantic trees built under the text semantic models on the time-correlated influence ranking scores, setting aside a subset of the semantic trees for training, and obtaining the degree to which each text semantic model contributes to the ranking score specifically comprises:
fusing the text semantic features of the posts under the text semantic model, the timestamps, and the quality vector corresponding to the text semantic model, to obtain the time-correlated ranking score that the text semantic model assigns to the posts.
Preferably, performing semantic-level fusion of the time-correlated influence scores according to the quality evaluation of the text semantic models and the semantic trees built, and performing weighted fusion of the semantic structure and the reply structure, specifically comprises:
computing the quality evaluation vector corresponding to the text semantic models according to the following formula:
where $q^{*}$ denotes the quality evaluation vector; q denotes the quality vector; $N_t$ denotes the number of threads; $m_i$ denotes the number of posts; $y_j$ denotes the ranking or score that other users give a post according to how helpful it is to them; and a further term denotes the predicted score;
computing the total time-correlated influence score according to the following formula:
$trr = \alpha\, q^{*}\, trr_{se}^{T} + (1-\alpha)\, trr_{st}$
where trr denotes the total time-correlated influence score; $trr_{se}$ denotes the time-correlated ranking score vector obtained from the text semantic models; $trr_{st}$ denotes the time-correlated ranking score vector obtained from the reply structure; and α denotes the parameter that balances the proportion between $trr_{se}$ and $trr_{st}$;
ranking the influence of the posts according to the total time-correlated influence score.
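The weighted fusion formula above can be illustrated with a short Python sketch. It assumes shapes that the text does not state: `q_star` holds one quality weight per text semantic model, `trr_se` holds one row per post with one score per semantic model, and `trr_st` holds one reply-structure score per post. The function name `fuse_scores` and the numbers are hypothetical.

```python
from typing import List, Sequence


def fuse_scores(
    q_star: Sequence[float],
    trr_se: Sequence[Sequence[float]],
    trr_st: Sequence[float],
    alpha: float,
) -> List[float]:
    """Compute trr = alpha * q_star . trr_se^T + (1 - alpha) * trr_st per post.

    Assumed shapes: q_star has K entries, trr_se has n rows of K entries,
    trr_st has n entries.
    """
    fused = []
    for per_model, structural in zip(trr_se, trr_st):
        semantic = sum(q * s for q, s in zip(q_star, per_model))
        fused.append(alpha * semantic + (1.0 - alpha) * structural)
    return fused


q_star = [0.5, 0.3, 0.2]          # illustrative quality weights for 3 models
trr_se = [[0.2, 0.4, 0.6],        # post 0 under the 3 semantic models
          [0.8, 0.6, 0.4],        # post 1
          [0.1, 0.1, 0.1]]        # post 2
trr_st = [0.3, 0.5, 0.9]          # reply-structure scores
trr = fuse_scores(q_star, trr_se, trr_st, alpha=0.5)

# Posts sorted from most to least influential.
ranking = sorted(range(len(trr)), key=lambda i: trr[i], reverse=True)
```

With α close to 1 the ranking follows the semantic scores; with α close to 0 it follows the reply structure.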
The present invention provides a method for ranking the influence of online community posts based on time-correlated interactive fusion. The method comprises: determining a time-correlated influence ranking; performing semantic modeling on the text content of the posts according to a text semantic model, and building a semantic tree based on semantic context similarity; and, according to a unified interactive fusion algorithm, fusing the text semantics of the posts with the structural information of the online community to rank the influence of the posts. Incorporating time-correlation information into the ranking method forms a time-correlated influence ranking. On this basis, a unified fusion framework that combines semantics and community structure fuses network structure with text content. This solves the problem of discovering potential sources of influence in online communities, makes full use of the explicit time information and the semantic context information of the posts, and achieves better results than traditional ranking methods that rely only on community structure or methods that use only text mining.
Brief description of the drawings
Fig. 1 is a schematic diagram of an online community organized by discussion threads;
Fig. 2 is a flow chart of the method for ranking the influence of online community posts based on time-correlated interactive fusion according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the distribution of timestamp intervals extracted from a data set according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the influence flow formed when a post emits influence at a rate equal to its time-correlated influence score, according to an embodiment of the present invention;
Fig. 5a is a structural diagram of the bag-of-words model represented by TF-IDF according to an embodiment of the present invention;
Fig. 5b is a structural diagram of the probabilistic latent semantic indexing (pLSI) model according to an embodiment of the present invention;
Fig. 5c is a structural diagram of the latent Dirichlet allocation (LDA) model according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the unified fusion framework based on post quality evaluation according to an embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art will understand that these embodiments only serve to explain the technical principles of the invention and are not intended to limit its scope of protection.
The method provided by the embodiments of the present invention is not restricted to particular hardware or a particular programming language. It can be implemented in any language, such as C, VC++, Java or Python, so other modes of operation are not described further. For example, the method can be applied to a system comprising a processor, ROM (read-only memory), RAM (random access memory) and an interface, connected by a system bus. An operating system and a database are stored in the ROM. The processor provides computing and control capabilities and can execute the method provided by the embodiments. As one preferred example, the method is implemented in Python on a computer with a quad-core CPU clocked at 2.67 GHz and 4 GB of memory.
Some online communities have a specific network structure (explicit or implicit). For example, the structure of a classic thread-organized discussion forum is shown in Fig. 1. The vertical axis shows the individual threads and the horizontal axis shows the progression of time; the posts in each thread advance through interactive replies. From the basic perspective of influence analysis, potential influence is highly correlated with the quality of a post and is also closely tied to the reply activity around the post within its thread. The former concerns the content of the post itself, while the latter reflects the structural information of the whole thread. In addition to context and structural information, posts in online interactions also carry rich metadata, such as posting timestamps, information about the poster, and the intervals between posting timestamps, which has often been ignored in previous work.
Therefore, an embodiment of the present invention proposes a method for ranking the influence of online community posts based on time-correlated interactive fusion. As shown in Fig. 2, the method can be implemented through the following steps:
S200: Determine the time-correlated influence ranking.
Specifically, this step can include:
S201: Screen the posts of the online community threads and extract the timestamp at which each post in a thread was published.
In this step, the posts of the online community threads (preferably discussion threads) are screened and threads containing too few posts are removed, so that the publication timestamp of each post in a thread can be extracted. The extracted timestamps show the position of each post on the thread's timeline. The number of posts in a thread and the length of the time intervals between posts reflect how active the posts in the thread are. This plays an important role in the modeling of the time-correlated ranking algorithm below.
The extraction of the publication timestamps of the posts in a thread is described below with reference to Fig. 3.
Suppose the given set of timestamped posts forms a thread D. After the posts of the online community threads are screened and threads containing too few posts are removed (for example, threads with fewer than 3 posts), thread D can be represented as a directed graph G(V, E, ts), where V denotes the node set formed by all posts, E denotes the edge set formed by the replies between posts, and ts denotes the set of timestamps of the posts in the thread. Each post is a node in the graph. For example, if post v replies to post u, an edge (u → v) ∈ E is formed between them. Fig. 3 shows, by way of example, the distribution of timestamp intervals extracted from a data set.
Note that each post corresponds to one node in the constructed graph, so the terms post and node are used interchangeably in the description below.
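The screening step and the graph G(V, E, ts) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation; the `Post` record, its field names and the function `build_thread_graphs` are hypothetical, and the threshold of 3 posts follows the example in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Post:
    post_id: str
    thread_id: str
    timestamp: float              # e.g. seconds since the thread started
    reply_to: Optional[str] = None  # id of the post being replied to, if any


Graph = Tuple[Set[str], Set[Tuple[str, str]], Dict[str, float]]


def build_thread_graphs(posts: List[Post], min_posts: int = 3) -> Dict[str, Graph]:
    """Group posts by thread, drop short threads, and build G(V, E, ts)."""
    by_thread: Dict[str, List[Post]] = {}
    for p in posts:
        by_thread.setdefault(p.thread_id, []).append(p)

    graphs: Dict[str, Graph] = {}
    for thread_id, plist in by_thread.items():
        if len(plist) < min_posts:
            continue  # remove threads that contain too few posts
        V = {p.post_id for p in plist}
        ts = {p.post_id: p.timestamp for p in plist}
        # If post v replies to post u, add the edge (u -> v).
        E = {(p.reply_to, p.post_id) for p in plist if p.reply_to in V}
        graphs[thread_id] = (V, E, ts)
    return graphs


posts = [
    Post("a1", "A", 0.0),
    Post("a2", "A", 30.0, reply_to="a1"),
    Post("a3", "A", 95.0, reply_to="a1"),
    Post("b1", "B", 5.0),
    Post("b2", "B", 50.0, reply_to="b1"),
]
graphs = build_thread_graphs(posts)
V, E, ts = graphs["A"]
```

Thread B has only two posts and is filtered out, so only thread A yields a graph.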
S202: For the extracted timestamps, determine the timeliness function and the time-correlation weights of the posts in the thread.
When determining the time-correlated influence ranking, the embodiment considers the following: (1) a post in a thread should be published in a timely manner; (2) once published, a post should attract a large number of follow-up replies; (3) follow-up replies should respond to the original post as promptly as possible. Posts that meet these conditions are likely to rank high in influence.
The definition of the time-correlation weight and the timeliness function in a thread is described below with reference to Fig. 3.
In the directed graph that models a thread, the time-correlation weight of an edge (u → v) ∈ E is defined as w(u, v), and the timeliness function of post v is defined as h(v). The time-correlation weight indicates the frequency of interaction between a post and other posts. The timeliness function indicates the relative position of post v on the thread's timeline.
Based on the distribution of time intervals shown in Fig. 3, the time-correlation weight w(u, v) is defined as the time-interval function $K(ts_u, ts_v) = \log(ts_u - ts_v)$, where $ts_v$ denotes the timestamp of the post v currently under consideration and $ts_u$ denotes the timestamp of a reply u to post v.
The timeliness function h(v) of post v is defined as a function $H(ts_v)$ of its timestamp $ts_v$:
where $ts_{min}$ denotes the timestamp of the first post in the thread and $ts_{max}$ denotes the timestamp of the last post on the thread's timeline.
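The two quantities defined in this step can be sketched in Python. The weight follows the stated definition $K(ts_u, ts_v) = \log(ts_u - ts_v)$. The form of $H(ts_v)$ is not reproduced in this text, so the `timeliness` function below is only a placeholder that assumes a linear position between $ts_{min}$ and $ts_{max}$, with earlier posts scoring higher. Both function names are hypothetical.

```python
import math


def time_weight(ts_u: float, ts_v: float) -> float:
    """Time-correlation weight K(ts_u, ts_v) = log(ts_u - ts_v).

    ts_u is the timestamp of a reply u to post v, so ts_u must be later
    than ts_v. The result depends on the time unit, and gaps shorter than
    one unit give a negative weight.
    """
    gap = ts_u - ts_v
    if gap <= 0:
        raise ValueError("a reply must be later than the post it replies to")
    return math.log(gap)


def timeliness(ts_v: float, ts_min: float, ts_max: float) -> float:
    """Placeholder for H(ts_v); ASSUMPTION, not the patent's formula.

    Maps the first post in the thread to 1.0 and the last post to 0.0.
    """
    if ts_max == ts_min:
        return 1.0
    return (ts_max - ts_v) / (ts_max - ts_min)


w = time_weight(ts_u=160.0, ts_v=60.0)   # reply arrives 100 time units later
h_first = timeliness(0.0, ts_min=0.0, ts_max=100.0)
h_mid = timeliness(25.0, ts_min=0.0, ts_max=100.0)
h_last = timeliness(100.0, ts_min=0.0, ts_max=100.0)
```

Because the logarithm compresses long gaps, the choice of time unit (seconds, minutes, hours) changes the weights and would need to be fixed for a real data set.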
S203: Determine the time-correlated influence ranking iteratively according to a random walk algorithm.
Fig. 4 schematically shows the influence flow formed when a post emits influence at a rate equal to its time-correlated influence score trr. This influence flow is led into the post under consideration through a conduit of length w and cross-sectional area h. Suppose the post under consideration is v and the posts that have a semantic context association with it are x, y, ..., u, and that at time t-1 their influence flows are $trr^{(t-1)}(x), trr^{(t-1)}(y), \ldots, trr^{(t-1)}(u)$. Then at time t the influence flow $trr^{(t)}(v)$ of the post under consideration can be solved by the following iterative formula:
The computation of the time-correlated influence ranking is described below by way of a preferred embodiment.
The influence flow is led into the post under consideration through a conduit of length w and cross-sectional area h.
First, the trr of each node is initialized randomly, where trr denotes the time-correlated influence ranking score of a node. Preferably, the value of the timeliness function is used as the initial value, i.e. $trr^{(1)}(v) = h(v) = H(ts_v)$. Suppose that at step t-1 the score of a node u that feeds influence into the node v under consideration is $trr^{(t-1)}(u)$; then at step t the influence score of node v can be solved by the following equation:
Iterating in this way until convergence yields the time-correlated influence score of each post in the thread. Finally, the posts are ranked by influence according to these scores.
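The iterate-until-convergence procedure can be illustrated with a Python sketch. The patent's iterative formula is not reproduced in this text, so the update rule below is a stand-in: a personalized-PageRank-style update in which each node keeps a share of its timeliness value h(v) and receives influence from its neighbours in proportion to normalized edge weights. The damping factor, the normalization and the name `rank_by_random_walk` are all assumptions.

```python
from typing import Dict, List, Tuple


def rank_by_random_walk(
    nodes: List[str],
    in_edges: Dict[str, List[Tuple[str, float]]],
    h: Dict[str, float],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Iterate a weighted random-walk update until the scores stop changing.

    in_edges[v] lists (u, weight) pairs for nodes u that feed influence
    into v; weights are assumed to be positive.
    """
    # Total outgoing weight of each source, used to normalise its flow.
    out_total = {u: 0.0 for u in nodes}
    for v in nodes:
        for u, w in in_edges.get(v, []):
            out_total[u] += w

    trr = {v: h[v] for v in nodes}  # initial value trr^(1)(v) = h(v)
    for _ in range(max_iter):
        new = {}
        for v in nodes:
            inflow = sum(
                w / out_total[u] * trr[u] for u, w in in_edges.get(v, [])
            )
            new[v] = (1.0 - damping) * h[v] + damping * inflow
        delta = max(abs(new[v] - trr[v]) for v in nodes)
        trr = new
        if delta < tol:
            break
    return trr


nodes = ["a", "b", "c"]
# Replies b and c feed influence into post a; reply c also feeds into b.
in_edges = {"a": [("b", 2.0), ("c", 1.0)], "b": [("c", 1.0)]}
h = {"a": 1.0, "b": 0.5, "c": 0.1}
scores = rank_by_random_walk(nodes, in_edges, h)
ranked = sorted(nodes, key=lambda v: scores[v], reverse=True)
```

Starting from h(v) rather than random values, as the text prefers, makes the run deterministic.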
S210: Perform semantic modeling on the text content of the posts according to a text semantic model, and build a semantic tree based on semantic context similarity.
Specifically, this step can include:
S211: Extract, according to the text semantic model, the text semantic features of the posts in each thread of the online community.
The text semantic models include, but are not limited to, the bag-of-words model represented by term frequency-inverse document frequency (TF-IDF), the probabilistic latent semantic indexing (pLSI) model, and the latent Dirichlet allocation (LDA) model. Figs. 5a-5c schematically show the structures of the TF-IDF bag-of-words model, the pLSI model and the LDA model respectively. TF-IDF combines two kinds of information, term frequency (TF) and inverse document frequency (IDF): the former reflects the number of times a word occurs in a document, and the latter reflects the inverse of the number of documents that contain the word.
Taking TF-IDF as an example, for the M posts in thread D, if the term frequency of word $w_i$ in post $d_j$ is $tf(w_i, d_j)$ and the number of posts in which the word appears is $df(w_i)$, the inverse document frequency can be computed by the following formula:
where $idf(w_i)$ denotes the inverse document frequency.
Combining the term frequency TF and the inverse document frequency IDF, the TF-IDF weight is computed by the following formula:
where $tfidf(w_i, d_j)$ denotes the TF-IDF weight; $tf(w_i, d_j)$ denotes the term frequency of the word in the post; and $idf(w_i)$ denotes the inverse document frequency.
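A small Python sketch of the TF-IDF computation follows. Since the formulas are not reproduced in this text, it assumes the common textbook definitions $idf(w_i) = \log(M / df(w_i))$ and $tfidf(w_i, d_j) = tf(w_i, d_j) \cdot idf(w_i)$; the patent may use a variant (for example with smoothing). The function name `tfidf` and the toy thread are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(posts: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {word: tf-idf weight} dictionary per tokenised post.

    Assumes idf(w) = log(M / df(w)) with M posts in the thread.
    """
    M = len(posts)
    df: Counter = Counter()
    for tokens in posts:
        df.update(set(tokens))  # count each word once per post
    idf = {w: math.log(M / df[w]) for w in df}

    weights = []
    for tokens in posts:
        tf = Counter(tokens)  # raw number of occurrences in this post
        weights.append({w: tf[w] * idf[w] for w in tf})
    return weights


thread = [
    ["time", "rank", "rank"],
    ["time", "post"],
    ["time", "thread"],
]
weights = tfidf(thread)
```

A word that appears in every post of the thread, such as "time" here, gets weight 0 and therefore does not help distinguish posts.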
Because the pLSI model considers the relations among documents, hidden categories (also called "concepts") and words, singular value decomposition is applied to the word-document frequency matrix according to the following formula:
$W = U \Sigma V^{T}$;
where W denotes the word-document frequency matrix, whose element in row i and column j is the term frequency $tf(w_i, d_j)$.
In the above formula, U and V are orthogonal matrices with $U^{T}U = V^{T}V = I$, where $U^{T}$ and $V^{T}$ are the transposes of U and V respectively.
In practice, the singular values on the main diagonal of the diagonal matrix Σ can be arranged from top to bottom in descending order. The number of singular values represents the number of latent semantic concepts, and the magnitude of a singular value indicates the strength of the corresponding latent semantic concept. Preferably, to suppress noise that deviates from the semantic center in the semantic space, an approximation is usually applied: only the few largest singular values are retained and the smaller ones are ignored. For example, some posts contain rare misspellings. Because their values in the frequency matrix W are small and extremely sparse, these words produce very small singular values after singular value decomposition. Ignoring these small singular values removes their influence on the core latent concepts.
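The truncation described above can be sketched with NumPy. This is an illustration only; the matrix values, the number of retained singular values `z` and the function name `truncated_svd` are hypothetical. `numpy.linalg.svd` returns the singular values in descending order, which matches the arrangement described in the text.

```python
import numpy as np


def truncated_svd(W: np.ndarray, z: int):
    """Keep only the z largest singular values of W = U S V^T."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :z], s[:z], Vt[:z, :]


# Rows are words, columns are posts; entries are term frequencies.
W = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],
    [0.0, 1.0, 1.0],
])
U_z, s_z, Vt_z = truncated_svd(W, z=2)

# Rank-2 approximation of W with the weakest concept removed.
W_z = U_z @ np.diag(s_z) @ Vt_z

# Each row of U_z is a word vector in the 2-dimensional concept space.
word_vectors = U_z
```

The Frobenius norm of `W - W_z` equals the discarded singular value here, which quantifies how much information the truncation removes.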
The concepts produced in the pLSI model come from the documents themselves. Singular value decomposition in the pLSI model has an effect similar to clustering words by meaning: the more similar two words are in the semantic space, the smaller the distance between their word vectors in the orthogonal matrix U obtained from the decomposition. The original word-document frequency matrix W represents the relation between N words and M posts. After singular value decomposition and selection of the Z largest singular values, Z latent semantic concepts are generated. In general Z is far smaller than N, so the N different words are assigned to Z concepts, which achieves the effect of clustering similar words.
The idea behind the LDA model is similar to that of the pLSI model. The difference is that the concepts generated in the pLSI model come from the documents themselves, whereas the concepts in the LDA model come from a topic distribution that is independent of the size of the document collection. Specifically, the LDA model describes a generative process for a post. Each post is assumed to have an implicit "latent topic" layer, which is a mixture of several latent topics; a latent topic (topic for short) represents latent semantics in the post. The process by which the LDA model produces a post can be described as follows: a relatively broad topic library is established, and the post is assumed to be produced from this library. For example, some concepts come from a "military" topic and others from a "humanities" topic. The different topics and their proportions form the topic distribution. Each of these concepts has its own dictionary, and words are repeatedly selected from the dictionaries of the corresponding concepts to form sentences and thus compose the text.
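The generative story described above can be illustrated with a toy Python sketch. It only simulates how a post could be generated from a small topic library and does not perform LDA inference. The topic library, the symmetric Dirichlet parameter `alpha` and the function name `generate_post` are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def generate_post(
    topics: Dict[str, Dict[str, float]],
    alpha: float,
    length: int,
    seed: int = 0,
) -> Tuple[List[float], List[str]]:
    """Generate one post: draw a topic mixture, then draw each word.

    topics maps a topic name to its dictionary {word: probability}.
    """
    rng = random.Random(seed)
    names = list(topics)

    # Topic mixture theta ~ Dirichlet(alpha), sampled via normalised gammas.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in names]
    total = sum(gammas)
    theta = [g / total for g in gammas]

    words = []
    for _ in range(length):
        topic = rng.choices(names, weights=theta, k=1)[0]  # pick a topic
        vocab = topics[topic]
        word = rng.choices(list(vocab), weights=list(vocab.values()), k=1)[0]
        words.append(word)
    return theta, words


topic_library = {
    "military": {"army": 0.6, "navy": 0.4},
    "humanities": {"poetry": 0.5, "history": 0.5},
}
theta, words = generate_post(topic_library, alpha=0.5, length=8, seed=42)
```

Fitting an LDA model reverses this process: given the observed words, it estimates the topic mixture of each post and the dictionary of each topic.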
The three models above characterize text semantics at successively higher levels, abstracting gradually from the words themselves to topics. In the subsequent computation, the text feature representations (i.e. the text semantic features) of the posts in a thread under all three semantic models can be extracted at the same time.
Obtaining the semantics of a single post in an online community in isolation is extremely difficult, because posts are often just a few isolated words and phrases that are extremely sparse in the high-dimensional semantic space. The semantics of a post on its own are incomplete and lack a large amount of background knowledge. Knowledge-engineering methods that build large amounts of related contextual semantic information for individual posts often consume a great deal of manual effort, and the results may not be worth the cost. However, posts in an online community are not independent of other posts; posts organized as a thread in particular have strong semantic associations with each other. Therefore, the embodiment builds a semantic tree through step S212.
S212:Based on text semantic feature, with clue as organizational form, by the semantic association degree between model, will
The set of model is configured to semantic tree in clue.
By above-mentioned steps, just there is similar semanteme on semantic tree between adjacent node, so as to each other can
Certain contextual information is provided.
Specifically, this step may include:
S2121: Under the text semantic model, calculate the semantic context similarity between the posts in the thread.
For the similarity, the embodiment of the present invention considers both the direction information and the amplitude information of the semantics. The semantic context similarity (also called the similarity measure) between posts in a thread is therefore the weighted sum of a direction similarity and an amplitude similarity.
Further, this step may include:
Step 1: Calculate the direction similarity according to the following formula:
$$S_{\cos}(L_i, L_j) = \frac{L_i \cdot L_j}{\lVert L_i \rVert \, \lVert L_j \rVert}$$
where S_cos(L_i, L_j) denotes the direction similarity; L_i denotes the text semantic feature, under the text semantic model, of a post contained in the thread, i = 1, ..., M; M denotes the number of posts contained in the thread; and L_j denotes the text semantic feature, under the text semantic model, of a post contained in the thread, i < j, j = 1, ..., M.
In the above formula, i < j implies ts_i < ts_j, where ts_i and ts_j denote the timestamps extracted from the posts.
The direction similarity S_cos(L_i, L_j) is the cosine similarity. It reflects how similar the feature directions of the two posts are, that is, the semantic consistency between the two posts.
Step 2: Calculate the amplitude similarity according to the following formula:
where S_str(L_i, L_j) denotes the amplitude similarity.
The amplitude similarity indicates how similar the two posts are in semantic intensity.
Step 3: Calculate the semantic context similarity between posts in the thread under the text semantic model according to the following formula:
where S(L_i, L_j) denotes the semantic context similarity between posts in the thread, and λ denotes the weight that balances the direction similarity against the amplitude similarity.
If multiple text semantic models are used in an implementation, step 4 can also be carried out. For example, the text semantic models may be a bag-of-words model represented by term frequency-inverse document frequency (TF-IDF), a probabilistic latent semantic indexing (pLSI) model, and a latent Dirichlet allocation (LDA) model.
Step 4: Fuse the similarities of the posts under the TF-IDF bag-of-words model, the pLSI model and the LDA model, and take the fusion result as the semantic context similarity between posts in the thread.
This step uses the unified fusion framework to fuse the similarities of the posts under the different semantic models.
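Steps 1 to 4 above can be sketched in Python as follows. Only the cosine form of the direction similarity is stated in the text. The amplitude similarity (ratio of the smaller to the larger vector norm), the convex combination with weight `lam`, and the plain average used for the multi-model fusion are illustrative assumptions, and all function names are hypothetical.

```python
import math

def _norm(v):
    """Euclidean norm of a feature vector."""
    return math.sqrt(sum(x * x for x in v))

def cosine_similarity(a, b):
    """Direction similarity S_cos: cosine of the angle between two feature vectors."""
    na, nb = _norm(a), _norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def amplitude_similarity(a, b):
    """ASSUMPTION: ratio of the smaller to the larger norm (1.0 means equal intensity)."""
    na, nb = _norm(a), _norm(b)
    if na == 0.0 and nb == 0.0:
        return 1.0
    return min(na, nb) / max(na, nb)

def context_similarity(a, b, lam=0.5):
    """ASSUMPTION: convex combination of direction and amplitude similarity."""
    return lam * cosine_similarity(a, b) + (1.0 - lam) * amplitude_similarity(a, b)

def fused_similarity(features_i, features_j, lam=0.5):
    """ASSUMPTION: average the similarity over all semantic models.

    features_i / features_j map a model name (e.g. "tfidf", "plsi", "lda")
    to the feature vector of the post under that model.
    """
    sims = [context_similarity(features_i[m], features_j[m], lam) for m in features_i]
    return sum(sims) / len(sims)
```

For instance, two posts whose feature vectors point in the same direction but differ in length by a factor of two get a direction similarity of 1.0 and, under the assumed ratio, an amplitude similarity of 0.5.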
S2122: According to the timestamps at which the posts were published, determine the candidate set of parent nodes for each post in the thread.
For example, in a given discussion thread, for a particular post whose feature is L_j, one post can be selected from the set of features of posts with earlier timestamps as its parent node.
A preferred embodiment is used below to describe in detail how the candidate set of parent nodes of a post in a thread is determined. Assume that the timestamp at which a post was created is 14:58 on November 10, 2015. The set of posts with earlier timestamps is then the subset of posts created before 14:58 on November 10, 2015, and this set of posts is used as the candidate set for parent-node selection.
It should be noted that the above assumption is only an example and is not to be construed as an improper limitation on the scope of the present invention.
S2123: Based on the candidate set of parent nodes, determine the parent node of each post in the thread according to the following formula, thereby building the semantic tree:
$$L^{*} = \arg\max_{L_i,\; ts_i < ts_j} S(L_i, L_j)$$
where L* denotes the selected post; L_i denotes the text semantic feature, under the text semantic model, of a post contained in the thread; and i, j = 1, ..., M.
This step can be viewed as an optimization problem: select the post L* whose value under the defined similarity measure function with L_j is the largest.
After the index j has been decremented from M to 2, a semantic tree based on the semantic similarity measure (i.e., a reply structure based on semantic context) has been rebuilt.
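The parent selection in steps S2122 and S2123 can be sketched as below. This is a minimal sketch under stated assumptions: each post is a hypothetical `(post_id, timestamp, feature)` tuple with numeric timestamps, and plain cosine similarity stands in for the semantic context similarity S, which the text defines as a weighted sum of direction and amplitude similarity.

```python
import math

def cosine(a, b):
    """Stand-in similarity measure (assumption: cosine only)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def build_semantic_tree(posts, similarity=cosine):
    """Return a dict mapping each post id to the id of its parent (None for a root).

    posts: list of (post_id, timestamp, feature) tuples (hypothetical layout).
    For each post, the candidate parents are the posts with a strictly earlier
    timestamp; the candidate with the highest similarity is chosen as the parent.
    """
    parent = {}
    for post_id, ts, feature in posts:
        candidates = [p for p in posts if p[1] < ts]
        if not candidates:
            parent[post_id] = None  # earliest post: root of the tree
            continue
        best = max(candidates, key=lambda p: similarity(p[2], feature))
        parent[post_id] = best[0]
    return parent

thread = [
    ("p1", 1, [1.0, 0.0]),
    ("p2", 2, [0.0, 1.0]),
    ("p3", 3, [0.9, 0.1]),
    ("p4", 4, [0.1, 0.9]),
]
tree = build_semantic_tree(thread)
```

In this toy thread, `p3` attaches to `p1` and `p4` attaches to `p2`, because those are the earlier posts closest in direction. The order in which posts are visited does not affect the result, since each parent choice depends only on timestamps and features.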
S220: According to a unified interactive fusion algorithm, fuse the text semantics of the posts with the structural information of the Web community, and rank the influence of the posts.
Given the different text semantic models in the vector space, the embodiment of the present invention extracts a quality factor to describe how much each of these semantic models contributes to the final ranking score.
Specifically, this step can be implemented by steps S221 and S222, where:
S221: Evaluate the semantic trees built under the text semantic model with respect to the time-related influence ranking score, and set aside part of the semantic trees as a training set, to obtain the degree to which the text semantic model contributes to the ranking score.
Specifically, this step may include: fusing the text semantic feature of a post under the text semantic model, its timestamp, and the quality vector corresponding to the text semantic model, to obtain the time-related ranking score of the post under that text semantic model.
The embodiment of the present invention considers that text semantic models characterize the semantic space of a thread from different aspects or at different semantic levels, so the quality of the characterization given by each text semantic model can be measured by a single value; these values form the quality vector. For K text semantic models, the corresponding quality vector is q = (q_1, ..., q_k, ..., q_K).
In practice, N_t threads can be selected at random from all threads for training. The i-th thread D_i contains m_i posts, where L_j denotes the feature vector of the content of the j-th post in thread D_i under a given text semantic model, ts_j denotes the timestamp of that post, and y_j denotes the ranking or score that other network users gave to the value of the post according to how helpful it was to them. y_j can serve as the label value during training.
For example, for the j-th post, its text semantic feature L_j, its timestamp ts_j, and the quality vector q corresponding to the text semantic models are combined by quality fusion to obtain a scoring function f(·), whose output is the predicted score.
The y_j defined above can serve as the label value during training.
For multiple text semantic models, the quality evaluation vectors corresponding to the individual text semantic models are fused. Fusion is then performed on the constructed semantic trees based on semantic context similarity to obtain the score of each post.
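One way to picture the training step above is as a least-squares fit of the quality vector q against the user labels y_j. The source does not state the form of the scoring function or of the training objective, so the linear predictor and the squared-error criterion below are assumptions made for illustration only, and `fit_quality_vector` is a hypothetical name.

```python
def fit_quality_vector(model_scores, labels):
    """Least-squares estimate of q (ASSUMPTION: predicted score = sum_k q_k * score_k).

    model_scores: one row per training post, each row holding the K scores that
                  the K text semantic models assign to that post.
    labels:       the user-provided label y_j for each training post.
    Solves the normal equations (A^T A) q = A^T y by Gauss-Jordan elimination.
    """
    k = len(model_scores[0])
    # Build the augmented matrix [A^T A | A^T y].
    aug = []
    for a in range(k):
        row = [sum(r[a] * r[b] for r in model_scores) for b in range(k)]
        row.append(sum(r[a] * y for r, y in zip(model_scores, labels)))
        aug.append(row)
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("model scores are linearly dependent; q is not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[a][k] / aug[a][a] for a in range(k)]

# Toy data: labels generated from two models with true weights (0.7, 0.3).
scores = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
labels = [0.7, 0.3, 1.0, 1.7]
q_star = fit_quality_vector(scores, labels)
```

On this toy data the fit recovers the generating weights up to floating-point error. A real system would need to decide whether q should be constrained, for example to non-negative values that sum to one; the source does not say.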
S222: According to the quality evaluation of the text semantic models and the constructed semantic trees, fuse the time-related influence scores at the semantic level, and perform a weighted fusion of the semantic structure and the reply structure.
Here, the reply structure is the thread structure based on replies.
S2221: Calculate the quality evaluation vector corresponding to the text semantic models according to the following formula:
where q* denotes the quality evaluation vector; q denotes the quality vector; N_t denotes the number of threads; m_i denotes the number of posts; y_j denotes the ranking or score that other network users gave to the value of the post according to how helpful it was to them; and the remaining term denotes the predicted score.
S2222: Calculate the total time-related influence score according to the following formula:
$$\mathit{trr} = \alpha \, q^{*} \, \mathit{trr}_{se}^{T} + (1 - \alpha) \, \mathit{trr}_{st}$$
where trr denotes the total time-related influence score; trr_se denotes the vector of time-related ranking scores obtained from the text semantic models; trr_st denotes the vector of time-related ranking scores obtained from the reply structure; and α denotes the parameter that balances the proportions of trr_se and trr_st.
S2223: Rank the influence of the posts according to the total time-related influence score.
For example, for the bag-of-words model represented by term frequency-inverse document frequency (TF-IDF), the probabilistic latent semantic indexing (pLSI) model and the latent Dirichlet allocation (LDA) model, one obtains q* = (q_TFIDF, q_LSI, q_LDA) and trr_se = (trr_TFIDF, trr_LSI, trr_LDA). Here q_TFIDF, q_LSI and q_LDA denote the quality evaluations corresponding to the TF-IDF bag-of-words model, the pLSI model and the LDA model, respectively, and trr_TFIDF, trr_LSI and trr_LDA denote the time-related influence scores corresponding to the TF-IDF bag-of-words model, the pLSI model and the LDA model, respectively. Fusing them yields the total time-related ranking score, and the influence of the posts is then ranked according to that score.
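The fusion formula of step S2222 and the ranking of step S2223 can be sketched as follows. The per-model and reply-structure scores are taken as given inputs; the numbers, the post identifiers and the function names are hypothetical and serve only to show how the terms combine.

```python
def total_time_related_scores(q_star, trr_se, trr_st, alpha):
    """Compute trr = alpha * q* * trr_se^T + (1 - alpha) * trr_st for every post.

    q_star: K quality weights, one per text semantic model.
    trr_se: K lists of per-post scores, one list per text semantic model.
    trr_st: per-post scores obtained from the reply structure.
    alpha:  balance between the semantic and the structural scores.
    """
    n_posts = len(trr_st)
    semantic = [
        sum(q * scores[p] for q, scores in zip(q_star, trr_se))
        for p in range(n_posts)
    ]
    return [alpha * s + (1.0 - alpha) * t for s, t in zip(semantic, trr_st)]

def rank_posts(post_ids, scores):
    """Return post ids ordered from highest to lowest total score."""
    ordered = sorted(zip(post_ids, scores), key=lambda pair: pair[1], reverse=True)
    return [pid for pid, _ in ordered]

# Hypothetical inputs for three posts and three semantic models (TF-IDF, pLSI, LDA).
q_star = [0.5, 0.3, 0.2]
trr_se = [
    [0.2, 0.8, 0.5],  # scores under the TF-IDF bag-of-words model
    [0.1, 0.9, 0.4],  # scores under the pLSI model
    [0.3, 0.7, 0.6],  # scores under the LDA model
]
trr_st = [0.6, 0.2, 0.9]  # scores from the reply structure
scores = total_time_related_scores(q_star, trr_se, trr_st, alpha=0.5)
ranking = rank_posts(["a", "b", "c"], scores)
```

With alpha = 0.5, post "c" ranks first here because its strong reply-structure score outweighs the higher semantic score of post "b"; moving alpha toward 1 shifts the ranking toward the semantic scores.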
The present invention is described in detail below with reference to Fig. 6, taking as the text semantic models the bag-of-words model represented by term frequency-inverse document frequency (TF-IDF), the probabilistic latent semantic indexing (pLSI) model and the latent Dirichlet allocation (LDA) model. For the training threads, under the TF-IDF bag-of-words model, the pLSI model and the LDA model, the semantic context similarity between posts in a thread can be calculated according to step S1121, and the semantic trees can then be rebuilt according to step S1123. The semantic trees are then divided into a training set and a test set, model quality evaluation is carried out on the training set, and fusion at the semantic level is performed. For the test threads, structure fusion is carried out according to the reply structure. Finally, the fusion results at the semantic level and the fusion results at the structural level are combined in a unified fusion, so that the potential influence of the posts in the Web community is ranked.
The method for ranking the influence of Web community posts based on time-related interactive fusion provided in the embodiment of the present invention organically combines the content information of a post itself with the structural information contained in the whole thread, and takes metadata into account, such as the posting timestamp, information about the poster, and the interval between posting timestamps (which reflects how active the posts in a thread are). By incorporating time-related information into the ranking method, a time-related influence ranking algorithm is proposed, and on the basis of this algorithm a unified fusion framework that combines semantics and community structure is proposed, fusing the network structure with the content. The temporal information and the semantic context information exhibited by Web community posts can thus be fully used, achieving better results than traditional ranking methods that rely only on community structure or methods that use only text mining.
Although the steps in the above embodiment are described in the order given, those skilled in the art will understand that, to achieve the effect of this embodiment, the different steps need not be executed in that order. They may be executed simultaneously (in parallel) or in a reversed order, and these simple variations all fall within the protection scope of the present invention.
The method provided in the embodiment of the present invention can also be implemented with a programmable logic device, or as computer program software or program modules (including routines, programs, objects, components or data structures that perform particular tasks or implement particular abstract data types). For example, an embodiment of the present invention may be a computer program product; running the computer program product causes a computer to perform the demonstrated method. The computer program product includes a computer-readable storage medium containing computer program logic or code portions for implementing the method. The computer-readable storage medium may be a built-in medium installed in a computer or a removable medium that can be detached from the computer body (for example, a storage device using hot-plug technology). The built-in medium includes but is not limited to rewritable non-volatile memory, for example RAM, ROM, flash memory and hard disk. The removable medium includes but is not limited to: optical storage media (for example, CD-ROM and DVD), magneto-optical storage media (for example, MO), magnetic storage media (for example, magnetic tape or removable hard disk), media with built-in rewritable non-volatile memory (for example, memory cards) and media with built-in ROM (for example, ROM cartridges).
The above detailed description of example embodiments of the present invention is provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms described. Many variations and modifications will be obvious to those skilled in the art. The embodiments were selected and described in order to best explain the principles of the invention and its practical application, so that others skilled in the art can understand the various embodiments of the invention and the various modifications suited to the particular use contemplated. Embodiments of the present invention may omit some of the technical features described above and solve only part of the technical problems present in the prior art. Moreover, the described technical features may be combined. The protection scope of the present invention is defined by the appended claims and their equivalents; others skilled in the art may make various modifications, replacements and combinations to the technical solutions described in the appended claims, and the technical solutions after such modifications or replacements also fall within the protection scope of the present invention.
Claims (9)
1. A method for ranking the influence of posts in a Web community based on time-related interactive fusion, characterized in that the method comprises:
determining a time-related influence ranking;
performing semantic modeling on the text content of the posts according to a text semantic model, and building a semantic tree based on semantic context similarity;
fusing the text semantics of the posts with the structural information of the Web community according to a unified interactive fusion algorithm, and ranking the influence of the posts.
2. The method according to claim 1, characterized in that determining the time-related influence ranking specifically comprises:
screening the posts of a thread of the Web community, and extracting the timestamp at which each post in the thread was published;
determining, from the extracted timestamps, an aging function and time-related weights of the posts in the thread;
determining the time-related influence ranking iteratively according to a random walk algorithm.
3. The method according to claim 1, characterized in that performing semantic modeling on the text content of the posts according to the text semantic model, and building the semantic tree based on semantic context similarity, specifically comprises:
extracting, according to the text semantic model, the text semantic features of the posts in each thread of the Web community;
based on the text semantic features, with the thread as the organizational form, constructing the set of posts in the thread into a semantic tree according to the degree of semantic association between the posts.
4. The method according to claim 3, characterized in that constructing the set of posts in the thread into a semantic tree, based on the text semantic features, with the thread as the organizational form, and according to the degree of semantic association between the posts, specifically comprises:
calculating, under the text semantic model, the semantic context similarity between the posts in the thread;
determining, according to the timestamps at which the posts were published, the candidate set of parent nodes for each post in the thread;
determining, based on the candidate set of parent nodes and according to the following formula, the parent node of each post in the thread, thereby building the semantic tree:
wherein L* denotes the post; L_i denotes the text semantic feature, under the text semantic model, of a post contained in the thread, i, j = 1, ..., M; and M denotes the number of posts contained in the thread.
5. The method according to claim 4, characterized in that calculating, under the text semantic model, the semantic context similarity between the posts in the thread specifically comprises:
calculating the direction similarity according to the following formula:
wherein S_cos(L_i, L_j) denotes the direction similarity; L_j denotes the text semantic feature, under the text semantic model, of a post contained in the thread, i < j, j = 1, ..., M;
calculating the amplitude similarity according to the following formula:
wherein S_str(L_i, L_j) denotes the amplitude similarity;
calculating, under the text semantic model, the semantic context similarity between the posts in the thread according to the following formula:
wherein S(L_i, L_j) denotes the semantic context similarity between the posts in the thread, and λ denotes the weight that balances the direction similarity against the amplitude similarity.
6. The method according to claim 5, wherein the text semantic model comprises a bag-of-words model represented by term frequency-inverse document frequency (TF-IDF), a probabilistic latent semantic indexing (pLSI) model and a latent Dirichlet allocation (LDA) model;
characterized in that the method further comprises:
fusing the similarities of the posts under the TF-IDF bag-of-words model, the pLSI model and the LDA model, and taking the fusion result as the semantic context similarity between the posts in the thread.
7. The method according to claim 1, characterized in that fusing the text semantics of the posts with the structural information of the Web community according to the unified interactive fusion algorithm, and ranking the influence of the posts, specifically comprises:
evaluating the semantic trees built under the text semantic model with respect to the time-related influence ranking score, and setting aside part of the semantic trees as a training set, to obtain the degree to which the text semantic model contributes to the ranking score;
fusing, according to the quality evaluation of the text semantic model and the constructed semantic trees, the time-related influence scores at the semantic level, and performing a weighted fusion of the semantic structure and the reply structure.
8. The method according to claim 7, characterized in that evaluating the semantic trees built under the text semantic model with respect to the time-related influence ranking score, and setting aside part of the semantic trees as a training set, to obtain the degree to which the text semantic model contributes to the ranking score, specifically comprises:
fusing the text semantic feature of the post under the text semantic model, the timestamp, and the quality vector corresponding to the text semantic model, to obtain the time-related ranking score of the post under the text semantic model.
9. The method according to claim 7, characterized in that fusing, according to the quality evaluation of the text semantic model and the constructed semantic trees, the time-related influence scores at the semantic level, and performing a weighted fusion of the semantic structure and the reply structure, specifically comprises:
calculating the quality evaluation vector corresponding to the text semantic model according to the following formula:
wherein q* denotes the quality evaluation vector; q denotes the quality vector; N_t denotes the number of threads; m_i denotes the number of posts; y_j denotes the ranking or score that other network users gave to the value of the post according to how helpful it was to them; and the remaining term denotes the predicted score;
calculating the total time-related influence score according to the following formula:
$$\mathit{trr} = \alpha \, q^{*} \, \mathit{trr}_{se}^{T} + (1 - \alpha) \, \mathit{trr}_{st}$$
wherein trr denotes the total time-related influence score; trr_se denotes the vector of time-related ranking scores obtained from the text semantic model; trr_st denotes the vector of time-related ranking scores obtained from the reply structure; and α denotes the parameter that balances the proportions of trr_se and trr_st;
ranking the influence of the posts according to the total time-related influence score.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611249593.0A CN106886561A (en) | 2016-12-29 | 2016-12-29 | Web Community's model influence sort method based on association in time interaction fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611249593.0A CN106886561A (en) | 2016-12-29 | 2016-12-29 | Web Community's model influence sort method based on association in time interaction fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106886561A (en) | 2017-06-23 |
Family
ID=59175788
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611249593.0A Pending CN106886561A (en) | 2016-12-29 | 2016-12-29 | Web Community's model influence sort method based on association in time interaction fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106886561A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010123264A2 (en) * | 2009-04-20 | 2010-10-28 | Konkuk University Industrial Cooperation Corp. | Online community post search method and apparatus based on interactions between online community users and computer readable storage medium storing program thereof |
CN102270240A (en) * | 2011-08-15 | 2011-12-07 | 哈尔滨工业大学 | Method for discovering hot views in network forum and analyzing evolvement trend thereof |
CN102637170A (en) * | 2011-02-10 | 2012-08-15 | 北京百度网讯科技有限公司 | Question pushing method and system |
CN104077354A (en) * | 2014-05-29 | 2014-10-01 | 小米科技有限责任公司 | Forum post heat determining method and related device thereof |
CN103488637B (en) * | 2012-06-11 | 2016-12-14 | 北京大学 | A kind of method carrying out expert Finding based on dynamics community's excavation |
Non-Patent Citations (1)
Title |
---|
QIANG YOU等: "A Unified Fusion Framework for Time-Related Rank in Threaded Discussion Communities", 《SPRINGER INTERNATIONAL PUBLISHING SWITZERLAND 2014》 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107992596A (en) * | 2017-12-12 | 2018-05-04 | 百度在线网络技术(北京)有限公司 | A kind of Text Clustering Method, device, server and storage medium |
CN107992596B (en) * | 2017-12-12 | 2021-05-18 | 百度在线网络技术(北京)有限公司 | Text clustering method, text clustering device, server and storage medium |
CN108717469A (en) * | 2018-06-11 | 2018-10-30 | 北京五八信息技术有限公司 | A kind of model sort method, device, equipment and computer readable storage medium |
CN108717469B (en) * | 2018-06-11 | 2021-11-23 | 北京五八信息技术有限公司 | Post sorting method, device and equipment and computer readable storage medium |
CN110727794A (en) * | 2018-06-28 | 2020-01-24 | 上海传漾广告有限公司 | System and method for collecting and analyzing network semantics and summarizing and analyzing content |
CN110825939A (en) * | 2019-09-19 | 2020-02-21 | 五八有限公司 | Method and device for generating and sorting scores of posts, electronic equipment and storage medium |
CN110825939B (en) * | 2019-09-19 | 2023-10-13 | 五八有限公司 | Post score generation and ordering method and device, electronic equipment and storage medium |
CN114579739A (en) * | 2022-01-12 | 2022-06-03 | 中国电子科技集团公司第十研究所 | Topic detection and tracking method for text data stream |
CN114579739B (en) * | 2022-01-12 | 2023-05-30 | 中国电子科技集团公司第十研究所 | Topic detection and tracking method for text data stream |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170623 |