CN104765757A - Micro-blog timing sequence ranking method based on heterogeneous network - Google Patents


Info

Publication number
CN104765757A
Authority
CN
China
Prior art keywords
microblogging
user
matrix
webpage
ranking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410737709.XA
Other languages
Chinese (zh)
Inventor
金海
余辰
李瑞丹
姚德中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201410737709.XA priority Critical patent/CN104765757A/en
Publication of CN104765757A publication Critical patent/CN104765757A/en
Pending legal-status Critical Current

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

Existing micro-blog ranking methods rank only a single type of node, because they do not learn from other node types, and they do not sufficiently consider temporal characteristics. To solve these problems, the invention discloses a micro-blog time-sequence ranking method based on a heterogeneous network. The method uses cross-type links to the web documents associated with micro-blog information and exploits the mutually reinforcing relationships among subjects of different types during ranking, so that the ranking quality is higher than when relying on the micro-blog information network alone. In addition, the influence of time-sequence information on the ranking results is studied during ranking, and the life-cycle characteristics of micro-blog time sequences are used to improve ranking accuracy.

Description

A microblog time-sequence ranking method based on a heterogeneous network
Technical field
The invention belongs to the technical field of network applications and, more specifically, relates to a microblog time-sequence ranking method based on a heterogeneous network.
Background
In recent years, the rapid growth and widespread use of the Internet have fundamentally changed how people discover and share information, and the volume of user-generated data has grown exponentially. This mass of data makes it very inconvenient for users to exploit Internet data effectively. Network analysis technologies such as search engines, which address problems such as ranking, have emerged in response and have brought new vitality and new operating models to the Internet. As one of the most basic network analysis techniques, ranking has gradually become a hot topic in Internet analysis and in academia. The basic goal of ranking is to order information according to how much users are expected to like it. Conventional ranking methods are graph-based, such as PageRank. Most of them assume that all nodes in the network are of the same type, that is, that only one node type exists. Real-world networks, however, contain multiple node types; the DBLP network, for example, includes authors, papers, conferences and keywords. The social nature of networks means that they contain large and rich heterogeneous resources. Compared with using the information of a single node type, a heterogeneous network provides richer information but also poses more challenges: nodes of different types differ in visibility and complexity, so the problem is how to exploit the links between them in order to adapt ranking models designed for homogeneous networks and apply them accurately to other node types and tasks in a heterogeneous environment. Time is another major criterion for ranking accuracy. Most existing methods assume that users' attention and preferences are static throughout the process and do not change as the network changes, or they extract only the most recently followed information. Although these methods can cover the temporal trend of the information provided, they cannot meet the demand for real-time, time-sensitive information.
The basic goal of ranking microblog content by informativeness is to extract and filter detailed, accurate and reliable information of public concern from the large amount of information that users share. Sina Weibo, one of the most popular online short-message platforms today, delivers a large amount of fresh information every day, including real-time news, comments, chat, personal reflections and advertisements. Some of the shared information is reliable first-hand information provided by users who are at or near the scene of an event or who have access to its source, but a larger part consists of personal opinions that users post out of interest in the event. Microblog content therefore has the advantages of breadth and timeliness, but it is also cluttered and not always detailed or reliable.
Based on the above analysis, there is a need to exploit the imbalance of information among the different node types in a heterogeneous network so that ranking models designed for homogeneous networks can be transferred to tasks on other node types in a heterogeneous environment, and a need to study the influence of time sequence on the ranking result by incorporating the temporal features of the heterogeneous network nodes, so as to mine the key information on the nodes of the heterogeneous network and improve ranking precision.
Summary of the invention
The object of the present invention is to provide a microblog time-sequence ranking method based on a heterogeneous network. The method uses the repost counts of microblogs to model the microblog life cycle and thereby obtain a credible temporal weight for each microblog. It associates microblogs with web pages according to the similarity of their content, and then exploits the imbalance of information among the different node types in the heterogeneous network (web pages, microblogs and users) by propagating information between nodes as information flows, so that the node types reinforce one another and the ranking yields detailed, accurate and real-time microblog information.
The microblog time-sequence ranking method based on a heterogeneous network provided by the invention operates on the network defined below and comprises the following steps.
The microblog heterogeneous information network is $G=(V,E)$, where $V = V_d \cup V_w \cup V_u$ is the set of all nodes in the network, comprising the web page set $V_d$, the microblog set $V_w$ and the user set $V_u$, and $E = \{(v_i, v_j) \mid v_i, v_j \in V\}$ is the set of relationship links between all nodes in the network.
(1) Apply noise filtering to the microblog data according to four filtering principles: the content is too short and contains no complete URL; the content begins in the first person; the content contains common expressions and emoticons; the content contains the generic mention and repost formats;
(2) Segment all filtered microblog content into words, count the microblog keywords in the data set from the segmentation result, and retrieve web documents using the popular keywords;
(3) Initialize the microblog ranking matrix $R^w$ and the web page ranking matrix $R^d$: compute the web page-web page text similarity matrix $M^d$ and the microblog-microblog text similarity matrix $M^w$, and, based on the relationships between web pages and between microblogs in $M^d$ and $M^w$, use the DivRank algorithm to assign ranking weights to the web pages and microblogs;
(4) Initialize the user ranking matrix $R^u$: build the user-user adjacency matrix $M^u$ from the follow relationship matrix $M^{uf}$ between users and the user reliability matrix $M^{uc}$, and, based on the relationships between users in $M^u$, use the DivRank algorithm to initialize the user ranking weight matrix $R^u$;
(5) Analyze the temporal features of the microblogs from their repost counts, fit the microblog temporal weight and the microblog life cycle with a sigmoid curve, and update the microblog ranking weight $R^w$ according to the temporal weight;
(6) Compute the web page-microblog association matrix $M^{dw}$ and the microblog-user association matrix $M^{wu}$ to build the web page-microblog-user heterogeneous information network; for $M^{dw}$, microblogs and web pages are associated by the similarity of their text content; for $M^{wu}$, a user is associated with a microblog by the text similarity between that microblog and the microblogs the user posted within a period of time;
(7) Exploit the imbalance of information among the different node types in the network by propagating information flows between nodes so that they reinforce one another: first update the web page ranking $R^d$ and the user ranking $R^u$ through the information flow from microblogs to web pages and users; then update the microblog ranking $R^w$ through the information flow from web pages and users to microblogs;
(8) Output the ranking result of the microblogs in the heterogeneous network and end.
Compared with the prior art, the above technical scheme conceived by the present invention has the following beneficial effects:
(1) High accuracy: a web page data set is collected in step (2), and microblogs and web pages are associated by text similarity in step (6). The reliability and accuracy of web documents are thus fully used to mine the key information on microblog nodes, which improves the accuracy of the ranking result.
(2) Real-time performance: step (5) analyzes the repost counts of microblogs in different time periods and fits the temporal weight of each microblog. The ranking result thus fully incorporates the temporal features of the microblog nodes in the heterogeneous network and is adjusted dynamically according to the microblog temporal weights.
(3) Information balance: step (7) uses the information flows among web pages, users and microblogs to balance the information of the different node types in the heterogeneous network and to transfer a ranking model designed for homogeneous networks to tasks on other node types in the heterogeneous environment.
Brief description of the drawings
Fig. 1 is a flow diagram of the heterogeneous-network microblog time-sequence ranking method of the present invention;
Fig. 2 (1) shows the distribution of repost counts over time, in different time periods after posting, for 40 randomly selected microblogs;
Fig. 2 (2) shows the microblog temporal weight fitted from the repost counts of microblogs in different time periods.
Detailed description of the embodiments
To make the objects, technical schemes and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below can be combined with one another as long as they do not conflict.
In existing microblog ranking, the information between nodes is incomplete, so that only a single set of nodes is ranked, and temporal features are not fully considered. To address these problems, the present invention proposes a microblog time-sequence ranking method based on a heterogeneous network. It uses cross-type links to the web documents related to microblog information and exploits the mutually reinforcing relationships among subjects of different types during ranking, thereby obtaining higher ranking quality than relying on the microblog information network alone. In addition, it takes the influence of time-sequence information on the ranking result into account and uses the life-cycle characteristics of microblog time sequences to improve ranking accuracy.
A heterogeneous network is a network in which there are multiple types of subjects or multiple types of relationships between subjects. For example, in a product recommendation network the subject types include users and products, and the relationship types include "user buys product" and "product is bought by user"; in the DBLP network the subject types are authors, conferences, papers and keywords, and the relationship types include "author publishes paper", "paper is included in conference proceedings" and "paper contains keyword".
The flow of the microblog time-sequence ranking method based on a heterogeneous network provided by the invention is shown in Fig. 1. The invention is further described below with reference to the drawings and embodiments. The main steps are as follows:
The microblog heterogeneous information network is $G=(V,E)$, where $V = V_d \cup V_w \cup V_u$ is the set of all nodes in the network, comprising the web page set $V_d$, the microblog set $V_w$ and the user set $V_u$, and $E = \{(v_i, v_j) \mid v_i, v_j \in V\}$ is the set of relationship links between all nodes in the network.
(1) To rank microblog content by informativeness, noise filtering is first applied to the microblog data according to four filtering principles: the content is too short and contains no complete URL; the content begins in the first person; the content contains common expressions and emoticons; the content contains the generic mention and repost formats.
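The four filtering principles can be sketched as a simple predicate. This is a minimal illustration under stated assumptions, not the patent's implementation: the length threshold `MIN_LEN`, the first-person word list, the stop-phrase list and all regular expressions are hypothetical choices made for the example. The sketch also assumes one possible reading of the rules, in which emoticons, common expressions and mention/repost markup are stripped, while posts that open in the first person or are too short without a URL are discarded.

```python
import re

# All thresholds, word lists and patterns below are illustrative assumptions.
MIN_LEN = 10
FIRST_PERSON = ("I ", "I'm", "we ", "my ", "我", "我们")
STOP_PHRASES = ("lol", "haha")
URL_RE = re.compile(r"https?://\S+")
EMOTICON_RE = re.compile(r"\[[^\[\]\s]{1,8}\]")  # bracketed emoticons such as [smile]
MENTION_RE = re.compile(r"(//\s*)?@[\w\-]+:?")    # "@name" mentions and "//@name:" repost chains


def clean_post(text):
    """Return the cleaned post text, or None if the post is treated as noise."""
    # Principles 3 and 4: strip emoticons, common expressions and mention/repost markup.
    body = EMOTICON_RE.sub(" ", text)
    body = MENTION_RE.sub(" ", body)
    for phrase in STOP_PHRASES:
        body = re.sub(r"\b%s\b" % re.escape(phrase), " ", body, flags=re.IGNORECASE)
    body = " ".join(body.split())
    # Principle 2: discard posts that begin in the first person.
    if body.lower().startswith(tuple(p.lower() for p in FIRST_PERSON)):
        return None
    # Principle 1: discard posts that are too short and contain no complete URL.
    without_url = URL_RE.sub("", body).strip()
    if len(without_url) < MIN_LEN and not URL_RE.search(body):
        return None
    return body
```

A short post survives only when it carries a complete URL, which matches the wording of the first principle.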
(2) Segment all filtered microblog content into words and count the microblog keywords in the data set from the segmentation result. Then retrieve web documents using the popular keywords.
(2-1) Segment the microblog content into words and determine the top m popular microblog keywords in the data set (m is a preset value; m = 10 in this embodiment).
(2-2) Use the Google Search API to retrieve the web documents corresponding to the top m popular microblog keywords, thereby collecting the web document data set.
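Step (2-1) can be sketched as a frequency count over already segmented posts. The function name `top_keywords` is hypothetical, word segmentation is assumed to have been done beforehand, and counting each keyword once per post is an assumption of this sketch rather than something the text specifies. The retrieval call of step (2-2) is left out because the text does not give its details.

```python
from collections import Counter


def top_keywords(tokenized_posts, m=10):
    """Return the m keywords that occur in the largest number of posts."""
    counts = Counter()
    for tokens in tokenized_posts:
        # Assumption: a keyword counts once per post, however often it repeats.
        counts.update(set(tokens))
    return [word for word, _ in counts.most_common(m)]


posts = [["quake", "rescue"], ["quake", "relief", "rescue"], ["quake", "weather"]]
popular = top_keywords(posts, m=2)
```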
(3) Initialize the microblog ranking matrix $R^w$ and the web page ranking matrix $R^d$. First compute the web page-web page and microblog-microblog text similarity matrices $M^d$ and $M^w$. Then, based on the relationships between web pages and between microblogs in $M^d$ and $M^w$, use the DivRank algorithm to assign ranking weights to the web pages and microblogs.
(3-1) Segment both web page content and microblog content into words, treating them as short texts, and filter out common expressions. For the term vectors of the content of microblogs or web pages $w_i$ and $w_j$, use text cosine similarity to compute the web page-web page and microblog-microblog text similarity matrices $M^d$ and $M^w$ respectively, where each entry of either matrix is:
$$M_{ij} = \frac{\mathrm{sim}(w_i, w_j)}{\sum_k \mathrm{sim}(w_i, w_k)}, \qquad \mathrm{sim}(w_i, w_j) = \frac{\vec{w}_i \cdot \vec{w}_j}{\|\vec{w}_i\| \cdot \|\vec{w}_j\|}$$
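The row-normalized cosine similarity matrix of step (3-1) can be sketched as follows. The sketch assumes raw term-frequency vectors (the text does not state a term weighting), and the names `cosine` and `similarity_matrix` are hypothetical. As in the formula, the normalizing sum runs over all k, including k = i.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(docs):
    """Build M with M[i][j] = sim(w_i, w_j) / sum_k sim(w_i, w_k)."""
    vecs = [Counter(tokens) for tokens in docs]
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    M = []
    for row in sim:
        total = sum(row)
        M.append([s / total if total else 0.0 for s in row])
    return M


M = similarity_matrix([["a", "b"], ["a", "b"], ["c"]])
```

Each row of the result sums to 1, so the matrix can serve directly as a transition matrix for the random walk in step (3-2).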
(3-2) Apply the random-walk DivRank algorithm to the web page-web page text similarity matrix $M^d$ and the microblog-microblog text similarity matrix $M^w$ to initialize the web page ranking matrix $R^d$ and the microblog ranking matrix $R^w$ respectively. Taking the microblog ranking as an example, α is a fixed damping factor: at each step the walk either continues the random walk with probability α or jumps to an arbitrary node with probability (1-α). DivRank takes both the importance and the diversity of the data into account, so a dynamic transition matrix is introduced in each iteration. After z iterations the transition matrix is no longer static but is:
$$M_z^w = \alpha \cdot M_{z-1}^w \cdot R_{z-1}^w + \frac{1-\alpha}{|V_w|} E$$
The microblog ranking matrix $R^w$ contains the ranking weights of all microblog nodes in the network, $[M_z^w]^T$ is the transpose of $M_z^w$, $E$ is a matrix with $|V_w|$ elements, each of value 1, and $V_w$ is the set of all microblogs in the network. The microblog ranking matrix $R^w$ is computed as:
$$R_z^w = \alpha \cdot [M_z^w]^T \cdot R_{z-1}^w + \frac{1-\alpha}{|V_w|} E$$
The web page ranking matrix is computed in the same way, $R_z^d = \alpha \cdot [M_z^d]^T \cdot R_{z-1}^d + \frac{1-\alpha}{|V_d|} E$ with $M_z^d = \alpha \cdot M_{z-1}^d \cdot R_{z-1}^d + \frac{1-\alpha}{|V_d|} E$, where $V_d$ is the set of all web pages in the network.
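The two update formulas can be sketched under one possible reading. The sketch assumes that the product of the previous transition matrix and the previous ranking means scaling each column j by the rank of node j, and it renormalizes every row so that the matrix stays a valid transition matrix. Both choices are assumptions of this example, not statements from the text, and the function name `divrank` is hypothetical.

```python
def divrank(M0, alpha=0.85, iters=50):
    """DivRank-style reinforced random walk; returns one rank per node.

    M0 is an n x n row-normalized similarity matrix; alpha must be below 1.
    """
    n = len(M0)
    M = [row[:] for row in M0]
    R = [1.0 / n] * n  # assumption: uniform initial ranking
    for _ in range(iters):
        new_M = []
        for i in range(n):
            # Reinforce transitions into nodes that already rank highly,
            # then add the uniform jump term (1 - alpha) / n.
            row = [alpha * M[i][j] * R[j] + (1 - alpha) / n for j in range(n)]
            total = sum(row)
            new_M.append([x / total for x in row])  # assumption: keep rows stochastic
        M = new_M
        # R_z = alpha * M_z^T * R_{z-1} + (1 - alpha) / n
        R = [alpha * sum(M[i][j] * R[i] for i in range(n)) + (1 - alpha) / n
             for j in range(n)]
    return R


# Path graph 0 - 1 - 2: the middle node is the best connected.
R = divrank([[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]])
```

With row-stochastic matrices the ranks keep summing to 1, which makes the scores of different runs comparable.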
(4) Initialize the user ranking matrix $R^u$. Build the user-user adjacency matrix $M^u$ from the follow relationship matrix $M^{uf}$ between users and the user reliability matrix $M^{uc}$. Based on the relationships between users in $M^u$, use the DivRank algorithm to initialize the user ranking weight matrix $R^u$.
(4-1) Build the user-user follow relationship matrix $M^{uf}$ from the follow relationships between users. When user $u_i$ follows user $u_j$, a link $(u_i, u_j)$ is added to the relation function $f(\cdot)$. The relation function $f(u_i, u_j)$ of users $u_i$ and $u_j$ and the in-degree $\sum_k f(u_k, u_j)$ of user $u_j$ are used as input to assign the entries of $M^{uf}$:
$$M_{ij}^{uf} = \frac{f(u_i, u_j)}{\sum_k f(u_k, u_j)}, \qquad f(u_i, u_j) = \begin{cases} 1, & (u_i, u_j) \in E_u \\ 0, & (u_i, u_j) \notin E_u \end{cases}$$
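As a small worked example of the follow matrix, the sketch below builds it from a list of (follower, followee) pairs. The function name `follow_matrix` and the input format are hypothetical; entries for users with no followers are set to 0 to avoid division by zero, which is an assumption of the sketch.

```python
def follow_matrix(users, follows):
    """M[i][j] = f(u_i, u_j) / in-degree(u_j), where f is 1 if u_i follows u_j."""
    index = {u: i for i, u in enumerate(users)}
    n = len(users)
    f = [[0.0] * n for _ in range(n)]
    for follower, followee in follows:
        f[index[follower]][index[followee]] = 1.0
    in_degree = [sum(f[k][j] for k in range(n)) for j in range(n)]
    return [[f[i][j] / in_degree[j] if in_degree[j] else 0.0 for j in range(n)]
            for i in range(n)]


M_uf = follow_matrix(["a", "b", "c"], [("a", "c"), ("b", "c"), ("a", "b")])
```

Here user c has two followers, so each follower contributes 0.5, while b has a single follower who contributes 1.0.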
(4-2) User reliability is taken into account to improve the accuracy of the user ranking. The user reliability matrix $M^{uc}$ is computed from the number of interactions between users on the microblog platform. In this patent, interactions (actions) between users are of three kinds: mention, repost and comment (reply), i.e. actions ∈ {mention, repost, reply}. The interaction counts for users $u_i$ and $u_j$ are of two types: the first is the number of interactions produced by $u_i$ that relate to $u_j$, $\mathit{actions\_from\_}u_i$; the second is the number of interactions that all users in the network have produced with $u_j$, $\mathit{actions\_of\_}u_j$. The ratio of these two counts is the entry of $M^{uc}$:
$$M_{ij}^{uc} = \frac{\mathit{actions\_from\_}u_i}{\mathit{actions\_of\_}u_j}, \qquad \mathit{actions} \in \{\mathit{mention}, \mathit{repost}, \mathit{reply}\}$$
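The ratio above can be illustrated with a short count over an interaction log. The log format `(source_user, target_user, action)` and the function name `reliability_matrix` are assumptions made for this sketch; interactions of any other kind are ignored, and a target with no interactions gets 0.

```python
from collections import Counter

ACTIONS = {"mention", "repost", "reply"}


def reliability_matrix(users, interactions):
    """M[i][j] = interactions from u_i toward u_j / all interactions toward u_j."""
    pair_counts = Counter()   # (source, target) -> number of interactions
    received = Counter()      # target -> interactions received from all users
    for source, target, action in interactions:
        if action in ACTIONS:
            pair_counts[(source, target)] += 1
            received[target] += 1
    return [[pair_counts[(ui, uj)] / received[uj] if received[uj] else 0.0
             for uj in users]
            for ui in users]


log = [("a", "c", "repost"), ("a", "c", "reply"), ("b", "c", "mention"), ("b", "a", "like")]
M_uc = reliability_matrix(["a", "b", "c"], log)
```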
(4-3) Combine the user follow relationship matrix $M^{uf}$ and the user reliability matrix $M^{uc}$ to obtain the user-user adjacency matrix $M^u$, i.e. $M^u = M^{uc} M^{uf}$. Based on the association information between users in $M^u$, use the DivRank algorithm to initialize the user ranking $R^u$:
$$R_z^u = \alpha \cdot [M_z^u]^T \cdot R_{z-1}^u + \frac{1-\alpha}{|V_u|} E$$
where α is the fixed damping factor, $[M_z^u]^T$ is the transpose of $M_z^u$, $E$ is a matrix with $|V_u|$ elements, each of value 1, $V_u$ is the set of all users in the network, and z denotes the z-th iteration.
(5) Analyze the temporal features of the microblogs from their repost counts. Fit the microblog temporal weight and the microblog life cycle with a sigmoid curve, and update the microblog ranking weight $R^w$ according to the temporal weight.
(5-1) From the repost data, count the hourly reposts of every microblog after it is posted, and plot the microblog life-cycle distribution from the repost counts and the repost time intervals. Fig. 2 (1) shows the repost counts of 40 randomly selected microblogs over different time intervals.
(5-2) Sum the repost counts of all microblogs over the same time intervals, analyze the trend of the life-cycle curve, and fit the life-cycle pattern of the microblogs with a sigmoid curve. Fig. 2 (2) shows the microblog life cycle fitted from the repost counts. The user-defined parameters a, d and c control the horizontal position of the curve, and parameter b adjusts how quickly the growth of the curve levels off. The temporal weight of a microblog t hours after it is posted is computed as follows:
$$M_t^{life} = a - d \cdot \exp(c - b \cdot t)$$
(5-3) Continually adjust the microblog ranking weight matrix $R^w$ according to the dynamic temporal weights of the microblogs in different time periods, i.e. $R^w = R^w M^{life}$.
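The temporal weighting can be sketched as below, assuming the weight formula reads a - d·exp(c - b·t) and that the update in (5-3) multiplies each microblog's rank by its own weight. The parameter values are placeholders chosen for the example, not fitted values from the patent, and the function names are hypothetical.

```python
import math


def life_weight(t, a, b, c, d):
    """Temporal weight of a microblog t hours after posting: a - d * exp(c - b * t)."""
    return a - d * math.exp(c - b * t)


def apply_life_weight(ranks, ages_hours, a=1.0, b=0.1, c=0.0, d=1.0):
    """Scale each rank by the temporal weight for that microblog's age in hours."""
    return [r * life_weight(t, a, b, c, d) for r, t in zip(ranks, ages_hours)]


adjusted = apply_life_weight([0.5, 0.5], [0.0, 10.0])
```

In practice a, b, c and d would be fitted to the summed repost curve of step (5-2) before the weights are applied.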
(6) Compute the web page-microblog and microblog-user association matrices $M^{dw}$ and $M^{wu}$ to build the web page-microblog-user heterogeneous information network. For $M^{dw}$, microblogs and web pages are associated by the similarity of their text content. For $M^{wu}$, a user is associated with a microblog by the text similarity between that microblog and the microblogs the user posted within a period of time.
(6-1) Compute the web page-microblog association matrix $M^{dw}$. First compute the cosine similarity $\mathrm{sim}(d_i, w_j)$ between the microblog content term vector $w_j$ and the web page text term vector $d_i$. Then check whether $\mathrm{sim}(d_i, w_j)$ is greater than a given threshold δ. If it is, the degree of association between microblog $w_j$ and web page $d_i$ is $\mathrm{sim}(d_i, w_j)$; otherwise it is 0. The formula is as follows.
$$M_{ij}^{dw} = \begin{cases} \mathrm{sim}(d_i, w_j), & \text{if } \mathrm{sim}(d_i, w_j) > \delta \\ 0, & \text{otherwise} \end{cases}$$
(6-2) Compute the microblog-user association matrix $M^{wu}$. Collect the set of microblogs $\{\mathit{posted\_by\_}u_j\}$ that user $u_j$ posted within a recent period of time. Then take the maximum cosine similarity between the content of microblog $w_i$ and the microblogs in this set, and check whether this maximum is greater than a given threshold δ. If it is, the degree of association between user $u_j$ and microblog $w_i$ is this maximum; otherwise it is 0.
$$M_{ij}^{wu} = \begin{cases} \max\limits_{w_k \in \{\mathit{posted\_by\_}u_j\}} \mathrm{sim}(w_i, w_k), & \text{if } \max \mathrm{sim}(w_i, w_k) > \delta \\ 0, & \text{otherwise} \end{cases}$$
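Both association matrices of step (6) follow the same threshold pattern, sketched below. The sketch assumes term-frequency vectors and an arbitrary threshold `delta = 0.3`; the function names and the input layout (one list of recent posts per user) are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def page_post_matrix(pages, posts, delta=0.3):
    """M_dw[i][j] = sim(d_i, w_j) if it exceeds delta, else 0."""
    post_vecs = [Counter(w) for w in posts]
    M = []
    for page in pages:
        d = Counter(page)
        sims = [cosine(d, w) for w in post_vecs]
        M.append([s if s > delta else 0.0 for s in sims])
    return M


def post_user_matrix(posts, recent_posts_by_user, delta=0.3):
    """M_wu[i][j] = best similarity between post w_i and user u_j's recent posts."""
    M = []
    for post in posts:
        w = Counter(post)
        row = []
        for user_posts in recent_posts_by_user:
            best = max((cosine(w, Counter(p)) for p in user_posts), default=0.0)
            row.append(best if best > delta else 0.0)
        M.append(row)
    return M


M_dw = page_post_matrix([["quake", "rescue", "team"]], [["quake", "rescue"], ["weather"]])
M_wu = post_user_matrix([["quake", "rescue"]],
                        [[["quake", "rescue", "team"], ["lunch"]], [["weather"]], []])
```

The threshold removes weak links, so unrelated pages, posts and users do not exchange information in the later propagation step.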
(7) Exploit the imbalance of information among the different node types in the network by propagating information flows between nodes so that they reinforce one another. First update the web page ranking $R^d$ and the user ranking $R^u$ through the information flow from microblogs to web pages and users. Then update the microblog ranking $R^w$ through the information flow from web pages and users to microblogs.
(7-1) Update the web page ranking $R^d$ and the user ranking $R^u$ through the information flow from microblogs to web pages and users. The user-defined parameters $\lambda_d \in [0,1]$ and $\lambda_u \in [0,1]$ balance the initial ranking values against the influence of the web page and user information flows on the ranking result. With the web page-microblog association matrix $M^{dw}$, the microblog-user association matrix $M^{wu}$, the initial web page ranking matrix $R^d$, the initial microblog ranking matrix $R^w$ and the initial user ranking matrix $R^u$ as input, the information flow in the k-th iteration is expressed in matrix form as:
$$R_{d_i}^{d}(k+1) = (1-\lambda_d)\, R_{d_i}^{d}(k) + \lambda_d \sum_{w_j \in V_w} M_{ij}^{dw}\, R_{w_j}^{w}(k)$$
$$R_{u_l}^{u}(k+1) = (1-\lambda_u)\, R_{u_l}^{u}(k) + \lambda_u \sum_{w_j \in V_w} M_{lj}^{uw}\, R_{w_j}^{w}(k)$$
(7-2) Use the information flow from web pages and users to microblogs to adjust the microblog ranking value $R^w$. The algorithm stops iterating when, between two consecutive iterations, the change in the ranking weight of every microblog node is less than a given threshold μ. Otherwise it checks whether the maximum number of iterations θ has been reached, and stops if it has.
$$R_{w_j}^{w}(k+1) = (1-\lambda_d-\lambda_u)\, R_{w_j}^{w}(k) + \lambda_d \sum_{d_i \in V_d} M_{ji}^{wd}\, R_{d_i}^{d}(k) + \lambda_u \sum_{u_l \in V_u} M_{jl}^{wu}\, R_{u_l}^{u}(k)$$
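The propagation of step (7) can be sketched as a fixed-point loop. The sketch assumes that the microblog-to-page matrix is the transpose of the page-microblog matrix and that the user-microblog matrix is the transpose of the microblog-user matrix, that λ_d + λ_u ≤ 1, and that the stopping rule compares each microblog's rank across two consecutive iterations. The function name `reinforce` and the default parameter values are hypothetical.

```python
def reinforce(R_d, R_w, R_u, M_dw, M_wu, lam_d=0.2, lam_u=0.2, mu=1e-6, theta=100):
    """Mutually reinforce page, microblog and user ranks.

    M_dw[i][j] links page i to microblog j; M_wu[j][l] links microblog j to user l.
    """
    n_d, n_w, n_u = len(R_d), len(R_w), len(R_u)
    for _ in range(theta):  # theta is the maximum number of iterations
        # (7-1) flow from microblogs to pages and to users
        new_d = [(1 - lam_d) * R_d[i]
                 + lam_d * sum(M_dw[i][j] * R_w[j] for j in range(n_w))
                 for i in range(n_d)]
        new_u = [(1 - lam_u) * R_u[l]
                 + lam_u * sum(M_wu[j][l] * R_w[j] for j in range(n_w))
                 for l in range(n_u)]
        # (7-2) flow from pages and users back to microblogs; as in the formula,
        # the right-hand side uses the ranks of iteration k
        new_w = [(1 - lam_d - lam_u) * R_w[j]
                 + lam_d * sum(M_dw[i][j] * R_d[i] for i in range(n_d))
                 + lam_u * sum(M_wu[j][l] * R_u[l] for l in range(n_u))
                 for j in range(n_w)]
        change = max(abs(new - old) for new, old in zip(new_w, R_w))
        R_d, R_w, R_u = new_d, new_w, new_u
        if change < mu:  # every microblog rank moved by less than mu
            break
    return R_d, R_w, R_u


# One page and one user, both linked to the first microblog only.
step_d, step_w, step_u = reinforce([1.0], [0.5, 0.5], [1.0],
                                   [[0.8, 0.0]], [[0.9], [0.0]], theta=1)
final_d, final_w, final_u = reinforce([1.0], [0.5, 0.5], [1.0],
                                      [[0.8, 0.0]], [[0.9], [0.0]])
```

The update as written does not renormalize the scores, so their absolute values can drift between iterations; the sketch relies only on their relative order.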
(8) Output the ranking result of the microblogs in the heterogeneous network and end.
Those skilled in the art will readily understand that the foregoing describes only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. A microblog time-sequence ranking method based on a heterogeneous network, characterized in that the method comprises the following steps:
(1) Apply noise filtering to the microblog data according to four filtering principles: the content is too short and contains no complete URL; the content begins in the first person; the content contains common expressions and emoticons; the content contains the generic mention and repost formats;
(2) Segment all filtered microblog content into words, count the microblog keywords in the data set from the segmentation result, and retrieve web documents using the popular keywords;
(3) Initialize the microblog ranking matrix $R^w$ and the web page ranking matrix $R^d$: compute the web page-web page text similarity matrix $M^d$ and the microblog-microblog text similarity matrix $M^w$, and, based on the relationships between web pages and between microblogs in $M^d$ and $M^w$, use the DivRank algorithm to assign ranking weights to the web pages and microblogs;
(4) Initialize the user ranking matrix $R^u$: build the user-user adjacency matrix $M^u$ from the follow relationship matrix $M^{uf}$ between users and the user reliability matrix $M^{uc}$, and, based on the relationships between users in $M^u$, use the DivRank algorithm to initialize the user ranking weight matrix $R^u$;
(5) Analyze the temporal features of the microblogs from their repost counts, fit the microblog temporal weight and the microblog life cycle with a sigmoid curve, and update the microblog ranking weight $R^w$ according to the temporal weight;
(6) Compute the web page-microblog association matrix $M^{dw}$ and the microblog-user association matrix $M^{wu}$ to build the web page-microblog-user heterogeneous information network; for $M^{dw}$, microblogs and web pages are associated by the similarity of their text content; for $M^{wu}$, a user is associated with a microblog by the text similarity between that microblog and the microblogs the user posted within a period of time;
(7) Exploit the imbalance of information among the different node types in the network by propagating information flows between nodes so that they reinforce one another: first update the web page ranking $R^d$ and the user ranking $R^u$ through the information flow from microblogs to web pages and users; then update the microblog ranking $R^w$ through the information flow from web pages and users to microblogs;
(8) Output the ranking result of the microblogs in the heterogeneous network and end.
2. The method of claim 1, characterized in that step (2) specifically comprises:
(2-1) Segment the microblog content into words and determine the top m popular microblog keywords in the data set, where m is a preset value;
(2-2) Use the Google Search API to retrieve the web documents corresponding to the top m popular microblog keywords, thereby collecting the web document data set.
3. The method of claim 1 or 2, characterized in that step (3) specifically comprises:
(3-1) Segment both web page content and microblog content into words, treating them as short texts, and filter out common expressions; for the term vectors of the content of microblogs or web pages $w_i$ and $w_j$, use text cosine similarity to compute the web page-web page and microblog-microblog text similarity matrices $M^d$ and $M^w$ respectively, where each entry of either matrix is:
$$M_{ij} = \frac{\mathrm{sim}(w_i, w_j)}{\sum_k \mathrm{sim}(w_i, w_k)}, \qquad \mathrm{sim}(w_i, w_j) = \frac{\vec{w}_i \cdot \vec{w}_j}{\|\vec{w}_i\| \cdot \|\vec{w}_j\|}$$
(3-2) Apply the random-walk DivRank algorithm to the web page-web page text similarity matrix $M^d$ and the microblog-microblog text similarity matrix $M^w$ to initialize the web page ranking matrix $R^d$ and the microblog ranking matrix $R^w$ respectively; specifically, the microblog ranking matrix is $R_z^w = \alpha \cdot [M_z^w]^T \cdot R_{z-1}^w + \frac{1-\alpha}{|V_w|} E$, where $M_z^w = \alpha \cdot M_{z-1}^w \cdot R_{z-1}^w + \frac{1-\alpha}{|V_w|} E$, and the web page ranking matrix is $R_z^d = \alpha \cdot [M_z^d]^T \cdot R_{z-1}^d + \frac{1-\alpha}{|V_d|} E$, where $M_z^d = \alpha \cdot M_{z-1}^d \cdot R_{z-1}^d + \frac{1-\alpha}{|V_d|} E$; α is a fixed damping factor, $[M_z^w]^T$ is the transpose of $M_z^w$, $[M_z^d]^T$ is the transpose of $M_z^d$, $E$ is a matrix whose elements all have value 1, with $|V_w|$ elements for microblogs and $|V_d|$ elements for web pages, $V_w$ is the set of all microblogs in the network, $V_d$ is the set of all web pages in the network, and z denotes the z-th iteration.
4. The method of claim 1 or 2, characterized in that step (4) specifically comprises:
(4-1) Build the user-user follow relationship matrix $M^{uf}$ from the follow relationships between users: when user $u_i$ follows user $u_j$, a link $(u_i, u_j)$ is added to the relation function $f(\cdot)$; the relation function $f(u_i, u_j)$ of users $u_i$ and $u_j$ and the in-degree $\sum_k f(u_k, u_j)$ of user $u_j$ are used as input to assign the entries of $M^{uf}$,
$$M_{ij}^{uf} = \frac{f(u_i, u_j)}{\sum_k f(u_k, u_j)}, \qquad f(u_i, u_j) = \begin{cases} 1, & (u_i, u_j) \in E_u \\ 0, & (u_i, u_j) \notin E_u \end{cases}$$
(4-2) The user reliability matrix $M^{uc}$ is computed from the number of interactions between users on the microblog platform; interactions between users are of three kinds: mention, repost and comment (reply), i.e. actions ∈ {mention, repost, reply}; the interaction counts for users $u_i$ and $u_j$ are of two types: the first is the number of interactions produced by $u_i$ that relate to $u_j$, $\mathit{actions\_from\_}u_i$; the second is the number of interactions that all users in the network have produced with $u_j$, $\mathit{actions\_of\_}u_j$; the ratio of these two counts is the entry of $M^{uc}$;
$$M_{ij}^{uc} = \frac{\mathit{actions\_from\_}u_i}{\mathit{actions\_of\_}u_j}, \qquad \mathit{actions} \in \{\mathit{mention}, \mathit{repost}, \mathit{reply}\}$$
(4-3) Combine the user follow relationship matrix $M^{uf}$ and the user reliability matrix $M^{uc}$ to obtain the user-user adjacency matrix $M^u$, i.e. $M^u = M^{uc} M^{uf}$; based on the association information between users in $M^u$, use the DivRank algorithm to initialize the user ranking matrix $R^u$, where $R_z^u = \alpha \cdot [M_z^u]^T \cdot R_{z-1}^u + \frac{1-\alpha}{|V_u|} E$ and $M_z^u = \alpha \cdot M_{z-1}^u \cdot R_{z-1}^u + \frac{1-\alpha}{|V_u|} E$; α is a fixed damping factor, $[M_z^u]^T$ is the transpose of $M_z^u$, $E$ is a matrix with $|V_u|$ elements, each of value 1, $V_u$ is the set of all users in the network, and z denotes the z-th iteration.
5. method as claimed in claim 1 or 2, it is characterized in that, described step (5) specifically comprises:
(5-1) according to microblogging forwarding data add up all microbloggings its issue after transfer amount per hour, and according to transfer amount and forwarding time interval do microblogging life cycle distribution plan;
(5-2) transfer amount at all microblogging same time intervals is sued for peace, analyze Life cycle curve conversion trend, and adopt sigmoid curve to go out the life change rule of microblogging; Custom parameter a, d and c are used for controlling curve horizontal level, and parameter b adjusts curve and increases mild speed; The sequential weight of microblogging after microblogging issues t hour account form is as follows:
$$M_t^{life} = a - b \cdot \exp(c - b \cdot t)$$
(5-3) continually adjusting the microblog ranking weight matrix $R^{w}$ according to the dynamic temporal weights of microblogs in different time periods, i.e. $R^{w} = R^{w} M^{life}$.
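Steps (5-2) and (5-3) can be sketched in Python under two stated assumptions: that the exponent in the temporal-weight formula groups as exp(c − b·t), and that the product of $R^{w}$ and $M^{life}$ scales each microblog's rank by its own weight. The function names and parameter values below are illustrative only.

```python
import math

def life_weight(t, a, b, c):
    """Temporal weight t hours after publication, assuming M_t = a - b * exp(c - b * t)."""
    return a - b * math.exp(c - b * t)

def apply_life_weights(ranks, hours_since_post, a=1.0, b=0.5, c=0.0):
    """Scale each microblog's rank by its own temporal weight (assumed element-wise)."""
    return [r * life_weight(t, a, b, c) for r, t in zip(ranks, hours_since_post)]

R_w = [0.4, 0.6]                          # illustrative initial microblog ranks
adjusted = apply_life_weights(R_w, [0, 12])  # hours since each post was published
```

With these illustrative parameters the weight rises from 0.5 at t = 0 toward the upper bound a as t grows.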
6. The method as claimed in claim 1 or 2, characterized in that step (6) specifically comprises:
(6-1) calculating the webpage-microblog association matrix $M^{dw}$: first compute the cosine similarity $sim(d_i, w_j)$ between the microblog content term vector $w_j$ and the webpage text term vector $d_i$; then determine whether $sim(d_i, w_j)$ is greater than a given threshold $\delta$; if it is, the degree of association between microblog $w_j$ and webpage $d_i$ is $sim(d_i, w_j)$, otherwise it is 0; the specific formula is as follows:
$$M_{ij}^{dw} = \begin{cases} sim(d_i, w_j), & \text{if } sim(d_i, w_j) > \delta \\ 0, & \text{otherwise} \end{cases}$$
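A minimal Python sketch of the thresholded similarity in step (6-1) is given below. The dense list representation of the term vectors, the helper names and the sample threshold are assumptions; the claims do not specify how the term vectors are built.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length term vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def webpage_microblog_matrix(page_vectors, blog_vectors, delta):
    """Entry (i, j) is sim(d_i, w_j) when it exceeds delta, otherwise 0."""
    matrix = []
    for d in page_vectors:
        row = []
        for w in blog_vectors:
            s = cosine(d, w)
            row.append(s if s > delta else 0.0)
        matrix.append(row)
    return matrix

pages = [[1, 0, 1], [0, 1, 0]]  # toy term vectors for two webpages
blogs = [[1, 0, 0], [0, 1, 1]]  # toy term vectors for two microblogs
M_dw = webpage_microblog_matrix(pages, blogs, delta=0.6)
```

Here the similarity of the first webpage and the second microblog is 0.5, below the threshold, so that entry is set to 0.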
(6-2) calculating the microblog-user association matrix $M^{wu}$: collect the set {posted_by_u_j} of microblog contents that user $u_j$ has published within a recent period; then take the maximum cosine similarity, max, between the contents in {posted_by_u_j} and microblog $w_i$; and determine whether this maximum is greater than the given threshold $\delta$. If it is, the degree of association between user $u_j$ and microblog $w_i$ is max, otherwise it is 0; the specific formula is as follows:
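$$M_{ij}^{wu} = \begin{cases} \max\limits_{w_k \in \{posted\_by\_u_j\}} sim(w_i, w_k), & \text{if } \max\limits_{w_k \in \{posted\_by\_u_j\}} sim(w_i, w_k) > \delta \\ 0, & \text{otherwise} \end{cases}$$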
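The maximum-similarity rule of step (6-2) can be sketched in the same way. The input layout (one list of recent post vectors per user), the treatment of users with no recent posts as having association 0, and all names are assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length term vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def microblog_user_matrix(blog_vectors, recent_posts_by_user, delta):
    """Entry (i, j) is the largest sim(w_i, w_k) over u_j's recent posts, if above delta."""
    matrix = []
    for w_i in blog_vectors:
        row = []
        for posts in recent_posts_by_user:
            # Assumption: a user with no recent posts gets similarity 0.
            best = max((cosine(w_i, w_k) for w_k in posts), default=0.0)
            row.append(best if best > delta else 0.0)
        matrix.append(row)
    return matrix

blogs = [[1, 0], [0, 1]]
recent = [
    [[1, 0], [1, 1]],  # posts by user 0
    [[1, 1]],          # posts by user 1
    [],                # user 2 posted nothing in the window
]
M_wu = microblog_user_matrix(blogs, recent, delta=0.8)
```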
7. The method as claimed in claim 1 or 2, characterized in that step (7) specifically comprises:
(7-1) updating the webpage ranking $R^{d}$ and the user ranking $R^{u}$ through the information flow from microblogs to webpages and users; the custom parameters $\lambda_d \in [0,1]$ and $\lambda_u \in [0,1]$ balance the initial microblog ranking value against the influence of the webpage and user information flows on the ranking result; taking as input the webpage-microblog association matrix $M^{dw}$, the microblog-user association matrix $M^{wu}$, the initial webpage ranking weights $R^{d}$, the initial microblog ranking $R^{w}$ and the initial user ranking weights $R^{u}$, the information flow at the k-th iteration is expressed in matrix form as:
$$R_{d_i}^{d}(k+1) = (1-\lambda_d)\, R_{d_i}^{d}(k) + \lambda_d \sum_{w_j \in V^{w}} M_{ij}^{dw}\, R_{w_j}^{w}(k)$$
$$R_{u_i}^{u}(k+1) = (1-\lambda_u)\, R_{u_i}^{u}(k) + \lambda_u \sum_{w_j \in V^{w}} M_{ij}^{uw}\, R_{w_j}^{w}(k)$$
(7-2) using the information flow from webpages and users to microblogs to adjust the microblog rank value $R^{w}$; if, between two successive iterations of the algorithm, the change in the ranking result of every microblog node is less than a given threshold $\mu$, the iteration stops; otherwise, it is checked whether the maximum number of iterations $\theta$ has been reached, and if so the iteration stops.
$$R_{w_j}^{w}(k+1) = (1-\lambda_d-\lambda_u)\, R_{w_j}^{w}(k) + \lambda_d \sum_{d_i \in V^{d}} M_{ji}^{wd}\, R_{d_i}^{d}(k) + \lambda_u \sum_{u_l \in V^{u}} M_{jl}^{wu}\, R_{u_l}^{u}(k)$$
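The mutual reinforcement of step (7) can be sketched as a single loop. Several points are assumptions rather than statements of the claimed method: $M^{uw}$ and $M^{wd}$ are taken to be the transposes of $M^{wu}$ and $M^{dw}$, all three rankings are updated simultaneously from the values of iteration k, and convergence is read as the largest per-microblog change between successive iterations falling below $\mu$. The function name and sample values are hypothetical.

```python
def propagate(m_dw, m_wu, r_d, r_w, r_u, lam_d, lam_u, mu=1e-6, theta=100):
    """Iterate the webpage, user and microblog ranking updates until convergence or theta."""
    n_d, n_w, n_u = len(r_d), len(r_w), len(r_u)
    for _ in range(theta):
        # (7-1) microblogs push rank to webpages and users.
        new_d = [
            (1 - lam_d) * r_d[i] + lam_d * sum(m_dw[i][j] * r_w[j] for j in range(n_w))
            for i in range(n_d)
        ]
        new_u = [
            (1 - lam_u) * r_u[l] + lam_u * sum(m_wu[j][l] * r_w[j] for j in range(n_w))
            for l in range(n_u)
        ]
        # (7-2) webpages and users push rank back to microblogs.
        new_w = [
            (1 - lam_d - lam_u) * r_w[j]
            + lam_d * sum(m_dw[i][j] * r_d[i] for i in range(n_d))
            + lam_u * sum(m_wu[j][l] * r_u[l] for l in range(n_u))
            for j in range(n_w)
        ]
        converged = max(abs(x - y) for x, y in zip(new_w, r_w)) < mu
        r_d, r_u, r_w = new_d, new_u, new_w
        if converged:
            break  # every microblog changed by less than mu
    return r_d, r_w, r_u

M_dw = [[0.5, 0.0]]    # one webpage, two microblogs
M_wu = [[1.0], [0.0]]  # two microblogs, one user
ranks = propagate(M_dw, M_wu, [1.0], [0.5, 0.5], [1.0], lam_d=0.2, lam_u=0.2)
```

The loop always terminates because the iteration count is capped at theta, whether or not the threshold is met.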
8. The method as claimed in claim 1 or 2, characterized in that the value of m in step (2) is 10.
CN201410737709.XA 2014-12-05 2014-12-05 Micro-blog timing sequence ranking method based on heterogeneous network Pending CN104765757A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410737709.XA CN104765757A (en) 2014-12-05 2014-12-05 Micro-blog timing sequence ranking method based on heterogeneous network

Publications (1)

Publication Number Publication Date
CN104765757A true CN104765757A (en) 2015-07-08

Family

ID=53647590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410737709.XA Pending CN104765757A (en) 2014-12-05 2014-12-05 Micro-blog timing sequence ranking method based on heterogeneous network

Country Status (1)

Country Link
CN (1) CN104765757A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477536A (en) * 2008-12-30 2009-07-08 华中科技大学 Scientific and technical literature entity integrated ranking method based on associating network
CN102663046A (en) * 2012-03-29 2012-09-12 中国科学院自动化研究所 Sentiment analysis method oriented to micro-blog short text
CN103123649A (en) * 2013-01-29 2013-05-29 广州一找网络科技有限公司 Method and system for searching information based on micro blog platform
CN103745000A (en) * 2014-01-24 2014-04-23 福州大学 Hot topic detection method of Chinese micro-blogs

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN YU et al.: "Temporal-Based Ranking in Heterogeneous Networks", 11th IFIP International Conference on Network and Parallel Computing (NPC) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105320647A (en) * 2015-12-04 2016-02-10 北京邮电大学 User characteristic modeling method based on character interaction behaviors
CN105320647B (en) * 2015-12-04 2018-05-15 北京邮电大学 A kind of user characteristics modeling method based on word interbehavior
CN110245757A (en) * 2019-06-14 2019-09-17 上海商汤智能科技有限公司 A kind of processing method and processing device of image pattern, electronic equipment and storage medium
CN110245757B (en) * 2019-06-14 2022-04-01 上海商汤智能科技有限公司 Image sample processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150708