CN107451703A - A social network multi-task prediction method based on a factor graph model - Google Patents

Publication number
CN107451703A
Authority
CN
China
Legal status
Pending
Application number
CN201710770816.6A
Other languages
Chinese (zh)
Inventor
张子柯
林松
刘闯
Current Assignee
Hangzhou Normal University
Original Assignee
Hangzhou Normal University
Application filed by Hangzhou Normal University
Priority to CN201710770816.6A
Publication of CN107451703A
Legal status: Pending

Classifications

    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G06Q30/0218 Discounts or incentives, e.g. coupons or rebates involving input on products or services in exchange for incentives or rewards based on score
    • G06Q30/0631 Item recommendations
    • G06Q50/01 Social networking

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Tourism & Hospitality (AREA)
  • Databases & Information Systems (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A social network multi-task prediction method based on a factor graph model, comprising the following steps: step 1, network data acquisition, comprising network data crawling and data preprocessing; step 2, establishing a multi-task factor graph model, comprising network feature extraction, construction of network migration structures, and factor graph model construction; step 3, evaluation of the prediction results.

Description

A social network multi-task prediction method based on a factor graph model
Technical field
The present invention relates to machine learning methods, factor graph models, personalized recommendation and the construction of network migration structures, and is suited to multi-task link prediction in heterogeneous networks.
Background
With the progress of Internet technology and the spread of online social networks, networks keep growing in scale, and the networks people deal with have shifted from simple interpersonal relationship networks to coupled social networks. A coupled social network usually has a complex structure, with multiple types of nodes (such as users and items) and multiple types of edges (such as social links and rating links). Link prediction research on coupled social networks has traditionally focused on either social link prediction or rating link prediction, and generally treats the different link prediction tasks as independent of one another. In real-world networks, however, the two kinds of prediction are often related. For example, if two people are friends, they are more likely to buy or rate the same items; likewise, if two people often buy or rate the same items, they are more likely to share interests and hobbies and to become friends. Therefore, how to build network migration structures and combine them with a factor graph model, so that multiple prediction tasks are linked through information flow, is of great theoretical and practical significance for heterogeneous link prediction in complex networks.
Existing link prediction approaches first compute node features of the network, such as in-degree, out-degree and clustering coefficient, and topological features, such as the common neighbours index, the AA (Adamic-Adar) index, the Salton index, the Jaccard index and the HPI (hub promoted index). The features are then integrated as $R = \sum_i w_i x_i$, where $R$ is the integrated result, $x_i$ is the $i$-th extracted feature and $w_i$ is the weight of the $i$-th feature. Finally, the result is fed into an existing machine learning model, and the weight vector $w$ is obtained by training.
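The neighbourhood-based indices named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is undirected and stored as a dictionary of neighbour sets, and the function names `similarity_indices` and `integrate` are hypothetical, not taken from the patent.

```python
import math


def similarity_indices(adj, x, y):
    """Classical neighbourhood-based similarity indices for the node pair (x, y).

    adj maps each node to the set of its neighbours (undirected graph, assumed).
    """
    nx, ny = adj[x], adj[y]
    common = nx & ny
    kx, ky = len(nx), len(ny)
    cn = len(common)
    # Adamic-Adar: each common neighbour is weighted by 1 / log(degree).
    aa = sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1)
    salton = cn / math.sqrt(kx * ky) if kx and ky else 0.0
    union = len(nx | ny)
    jaccard = cn / union if union else 0.0
    hpi = cn / min(kx, ky) if kx and ky else 0.0
    return {"CN": cn, "AA": aa, "Salton": salton, "Jaccard": jaccard, "HPI": hpi}


def integrate(features, weights):
    """Weighted integration R = sum_i w_i * x_i of the extracted features."""
    return sum(w * x for w, x in zip(weights, features))


# Toy undirected graph (hypothetical data).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}
s = similarity_indices(adj, "b", "d")
r = integrate([s["CN"], s["Jaccard"]], [0.5, 0.25])
```

In practice the weights passed to `integrate` would come from training rather than being fixed by hand as in this toy call.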
However, because social link prediction and rating link prediction operate on different network structures, the extracted features also differ, so multiple models have to be learned. Current applications of factor graph models to link prediction treat the various prediction tasks in a social network as independent, so making several kinds of predictions requires training different models; no existing work performs multi-task link prediction by mining network migration structures.
Existing prediction techniques do not solve the data sparsity problem, do not make full use of the structural information of the network, and cannot accommodate multi-task link prediction.
Multi-task prediction requires learning multiple machine learning models, which is computationally inefficient, and the mutual reinforcement produced by coupling between the prediction tasks is not fully considered.
Summary of the invention
The present invention aims to overcome the above shortcomings of the prior art and provides a social network multi-task prediction method based on a factor graph model.
The flow chart of the method is shown in Fig. 1. The method uses traditional complex-network link prediction indices as features and builds network migration structures so that information flows between the prediction tasks and the tasks are coupled. This addresses data sparsity and low computational efficiency while improving prediction accuracy. The method comprises the following steps.
The first step, network data acquisition: users' social information and behaviour information are collected by a web crawler, and the crawled data are cleaned to simplify later computation. This step mainly comprises network data crawling and data preprocessing.
(1) Network data crawling: crawl users' social behaviour information and users' behaviour information towards items. Each record contains either two user IDs (UserID and UserID) or a user ID and an item ID (UserID and ItemID).
(2) Data preprocessing: to simplify later computation, redundant and incomplete records are removed, and the data are organised into the unified matrices required by the model: the user-user social behaviour matrix $w^1$ and the user-item rating behaviour matrix $w^2$. In $w^1$, the element $w^1_{ij}$ represents a relation such as friendship or following between user $i$ and user $j$ (the relation may be one-way or two-way); in $w^2$, the element $w^2_{ij}$ represents a relation such as favouriting, purchasing or rating between user $i$ and item $j$.
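The preprocessing step can be illustrated with a minimal Python sketch. It assumes crawled records arrive as `(UserID, UserID)` and `(UserID, ItemID)` tuples, that an incomplete record is one with a missing or empty ID, and that matrix entries are binary; the helper names `clean` and `build_matrices` are hypothetical.

```python
def clean(records):
    """Drop incomplete records (missing or empty IDs) and exact duplicates, keeping order."""
    seen, out = set(), []
    for rec in records:
        if len(rec) != 2 or any(x is None or x == "" for x in rec):
            continue
        if rec not in seen:
            seen.add(rec)
            out.append(rec)
    return out


def build_matrices(social_records, rating_records):
    """Build the user-user matrix w1 and the user-item matrix w2 as nested lists."""
    social = clean(social_records)
    ratings = clean(rating_records)
    users = sorted({u for pair in social for u in pair} | {u for u, _ in ratings})
    items = sorted({i for _, i in ratings})
    u_idx = {u: k for k, u in enumerate(users)}
    i_idx = {i: k for k, i in enumerate(items)}
    w1 = [[0] * len(users) for _ in users]
    w2 = [[0] * len(items) for _ in users]
    for u, v in social:
        if u != v:
            w1[u_idx[u]][u_idx[v]] = 1  # directed entry: u is linked to v
    for u, i in ratings:
        w2[u_idx[u]][i_idx[i]] = 1      # u has collected, bought or rated i
    return users, items, w1, w2


# Hypothetical crawled records, including a duplicate and two incomplete rows.
social = [("u1", "u2"), ("u1", "u2"), ("u2", "u3"), ("u3", None)]
ratings = [("u1", "i1"), ("u2", "i1"), ("u3", "i2"), ("", "i2")]
users, items, w1, w2 = build_matrices(social, ratings)
```

A two-way relation would simply appear as two records, one in each direction, so `w1` need not be symmetric.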
The second step, establishing the multi-task factor graph model:
(1) Network feature extraction: the factor graph model is a supervised learning model, so we need to extract features for social links and rating links from the heterogeneous information in the network. For a given node $i$ in the social network, we can extract node features, including the degree $k(v_i)$, out-degree $k_{out}(v_i)$, in-degree $k_{in}(v_i)$ and clustering coefficient $c_i$. For a node pair $i$ and $j$ in the social network, similarity indices are the features most relevant to predicting whether the pair is connected, so we extract several traditional similarity indices as features.
The cross network (the rating relation between users and items) also carries information about node pairs in the social network: for example, the more items two users have both commented on, the more likely they are to be friends. On this basis, we also extract similarity indices from the cross network. Similarly, for a target node pair in the cross network consisting of user $i$ and item $a$, we can extract features from similarity indices.
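A minimal Python sketch of the node features and of one cross-network feature follows. It assumes binary matrices `w1` and `w2` stored as nested lists, takes the degree to be the number of distinct neighbours regardless of direction, and computes the clustering coefficient on that undirected neighbourhood; these choices and the function names are assumptions made for illustration.

```python
def node_features(w1, i):
    """Degree, out-degree, in-degree and local clustering coefficient of user i."""
    n = len(w1)
    k_out = sum(w1[i])
    k_in = sum(w1[j][i] for j in range(n))
    # Undirected neighbourhood: every user that i links to or is linked from.
    nbrs = [j for j in range(n) if j != i and (w1[i][j] or w1[j][i])]
    k = len(nbrs)
    # Count neighbour pairs that are themselves linked in either direction.
    links = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if w1[nbrs[a]][nbrs[b]] or w1[nbrs[b]][nbrs[a]]
    )
    c = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
    return {"k": k, "k_out": k_out, "k_in": k_in, "c": c}


def common_items(w2, i, j):
    """Number of items that users i and j have both interacted with."""
    return sum(1 for a, b in zip(w2[i], w2[j]) if a and b)


# Toy matrices (hypothetical data): 3 users, 3 items.
w1 = [[0, 1, 1], [1, 0, 1], [0, 0, 0]]
w2 = [[1, 1, 0], [1, 0, 0], [0, 1, 1]]
f0 = node_features(w1, 0)
shared = common_items(w2, 0, 1)
```

The count returned by `common_items` is the cross-network analogue of the common neighbours index; other indices can be derived from the same rows of `w2`.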
(2) Construction of network migration structures: migration structures are important factors in the factor graph model, because the label information of edges can propagate through them. In this work we build migration structures from triangles, so that information can propagate both within the social network and between the social network and the cross network. The migration structures are shown in Fig. 2-1 to Fig. 2-18.
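One way to enumerate candidate triangles is sketched below in Python. It is an assumption-laden illustration: it ignores edge direction, so it does not distinguish the individual structure types drawn in Fig. 2-1 to Fig. 2-18, and the function name `migration_triangles` is hypothetical. For each user pair (i, j) it lists the third users linked to both and the items both users have interacted with.

```python
def migration_triangles(w1, w2):
    """Enumerate user-user-user and user-item-user triangles around each user pair."""
    n = len(w1)

    def linked(a, b):
        # Treat a social link in either direction as a connection.
        return bool(w1[a][b] or w1[b][a])

    uuu, uiu = [], []
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(n):
                if k not in (i, j) and linked(i, k) and linked(j, k):
                    uuu.append((i, k, j))   # users i and j share neighbour k
            for a in range(len(w2[i])):
                if w2[i][a] and w2[j][a]:
                    uiu.append((i, a, j))   # users i and j share item a
    return uuu, uiu


# Toy matrices (hypothetical data): links 0->1 and 1->2; users 0 and 1 share item 0.
w1 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
w2 = [[1, 0], [1, 0], [0, 1]]
uuu, uiu = migration_triangles(w1, w2)
```

Each returned triple groups the edge labels that a migration-structure factor would tie together, which is how a known label on one side of the triangle can inform an unknown label on another.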
(3) Factor graph model construction: the coupled network $G = (G^S, G^C)$ can be divided into a social network $G^S$ and a cross network $G^C$. Our goal is to learn a single model that predicts potential social links and rating links at the same time (the orange dashed lines in Fig. 3b).
For a node pair $e_{ij}$ in the network, we use the label $y_e$ to denote its state: $y_e = 1$ means that an edge exists between the pair, and $y_e = 0$ means that no edge exists. The final model outputs the probability $P(y_e = 1)$ that the label is 1.
(a) Joint probability distribution
For a coupled social network $G = (V, E, X)$, $V = \{v_i\}$ is the node set, $E = \{e_{ij}\}$ is the set of node pairs, and $X$ is an attribute matrix in which each row is the attribute vector of a node pair $e_{ij}$. Our goal is to estimate the probability $P(y_e \mid x_e)$ that each unknown link forms. We write the joint probability distribution of the network as $P(Y \mid X, G)$, where $G$ denotes all the information in the network. This joint distribution expresses that the label of a link depends not only on the local attributes of the node pair but also on the network structure. It can be instantiated as:
$$P(Y \mid X, G) = \prod_{e \in E^S} \prod_{i=1}^{d} P(y_e^s \mid x_{ei}^s) \prod_{e \in E^C} \prod_{i=1}^{d'} P(y_e^c \mid x_{ei}^c) \prod_{\pi \in \Pi} \prod_{\varepsilon \in \pi} P(Y_\varepsilon) \qquad (1)$$
where $d$ and $d'$ are the feature dimensions of the social network and the cross network respectively; $x_{ei}$ is the $i$-th attribute value of node pair $e$; $E^S$ is the set of node pairs in the social network and $E^C$ is the set of node pairs in the cross network; $P(y_e^s \mid x_{ei}^s)$ is the probability of $y_e^s$ given the attribute $x_{ei}^s$ in the social network; $P(y_e^c \mid x_{ei}^c)$ is the probability of $y_e^c$ given the attribute $x_{ei}^c$ in the cross network; $P(Y_\varepsilon)$ represents the influence of the migration structures; $\Pi$ is the set of migration structure types, $\pi$ is one type of migration structure, and $\varepsilon$ is one migration structure of that type.
(b) Factor instantiation
In principle, the attribute feature functions and the social relation feature functions can be instantiated in different ways. Here we model them using the Hammersley-Clifford theorem for Markov random fields.
$f_i(\cdot)$, $g_i(\cdot)$ and $h_\varepsilon(\cdot)$ are the feature functions of the social network, the cross network and the migration structures respectively; $\alpha_i$, $\beta_i$ and $\gamma_\varepsilon$ are the corresponding weights; and $Z_1$, $Z_2$ and $Z_3$ are normalization factors.
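The instantiated factors are not reproduced in this text. A plausible form, assuming the exponential-linear parameterization that is commonly paired with the Hammersley-Clifford theorem and using only the symbols defined above, would be the following. It is a sketch consistent with the stated definitions, not a quotation of the patent's formulas.

```latex
\begin{aligned}
P(y_e^s \mid x_{ei}^s) &= \frac{1}{Z_1} \exp\{\alpha_i f_i(x_{ei}^s, y_e^s)\} \\
P(y_e^c \mid x_{ei}^c) &= \frac{1}{Z_2} \exp\{\beta_i g_i(x_{ei}^c, y_e^c)\} \\
P(Y_\varepsilon) &= \frac{1}{Z_3} \exp\{\gamma_\varepsilon h_\varepsilon(Y_\varepsilon)\}
\end{aligned}
```

Under this assumed form, each factor contributes one weighted feature term to the exponent of the joint distribution, which is what makes the log-likelihood linear in the weights.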
(c) Objective function optimization
Combining the formulas above gives the objective function, in which $Z = Z_1 Z_2 Z_3$ is the normalization factor.
Using stochastic gradient descent, the gradient with respect to each parameter can be obtained. In the gradient, $E[h_\varepsilon(Y_\varepsilon)]$ denotes the expectation of $h_\varepsilon(Y_\varepsilon)$ under the data distribution, and the corresponding model term is the expectation under the distribution $P_{\alpha,\beta,\gamma}(Y \mid X, G)$ given by the estimated model.
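The objective and gradient themselves are missing from this text. Assuming exponential-linear factors and treating $Z$ as a single global normalizer, a log-likelihood objective and its gradient with respect to $\gamma_\varepsilon$ would take the familiar form below. This is a hedged reconstruction from the surrounding definitions, not the patent's own equations; the symbol $\mathcal{O}$ is introduced here for illustration.

```latex
\begin{aligned}
\mathcal{O}(\alpha, \beta, \gamma) &= \log P_{\alpha,\beta,\gamma}(Y \mid X, G) \\
&= \sum_{e \in E^S} \sum_{i=1}^{d} \alpha_i f_i(x_{ei}^s, y_e^s)
 + \sum_{e \in E^C} \sum_{i=1}^{d'} \beta_i g_i(x_{ei}^c, y_e^c)
 + \sum_{\pi \in \Pi} \sum_{\varepsilon \in \pi} \gamma_\varepsilon h_\varepsilon(Y_\varepsilon)
 - \log Z \\
\frac{\partial \mathcal{O}}{\partial \gamma_\varepsilon} &=
 E[h_\varepsilon(Y_\varepsilon)] - E_{P_{\alpha,\beta,\gamma}(Y \mid X, G)}[h_\varepsilon(Y_\varepsilon)]
\end{aligned}
```

The gradients for $\alpha_i$ and $\beta_i$ would have the same "data expectation minus model expectation" shape, with $f_i$ and $g_i$ in place of $h_\varepsilon$.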
The third step, evaluation of the prediction results
Three metrics are used to measure the effectiveness of the method: AUC, Precision and Ranking Score. They emphasise different aspects of prediction accuracy: AUC (area under the receiver operating characteristic curve) measures the accuracy of the algorithm as a whole; Precision considers only whether the edges ranked in the top $L$ positions are predicted correctly; Ranking Score takes fuller account of the ranking of the predicted edges.
AUC can be interpreted as the probability that the score of a randomly chosen edge in the test set is higher than the score of a randomly chosen nonexistent edge. Each time, one edge is drawn at random from the test set and compared with a randomly drawn nonexistent edge: if the test edge has the higher score, 1 point is added; if the two scores are equal, 0.5 points are added. If, in $n$ independent comparisons, the test edge scores higher $n'$ times and the two scores are equal $n''$ times, then AUC is defined as:
$$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n}$$
Clearly, if all scores are generated at random, AUC is about 0.5. The extent to which AUC exceeds 0.5 therefore measures how much more accurate the algorithm is than random selection.
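The sampling procedure described above translates directly into code. The following Python sketch assumes the scores of test-set edges and of nonexistent edges are given as two lists; the function name `sampled_auc` and the fixed random seed are choices made for illustration.

```python
import random


def sampled_auc(test_scores, nonexistent_scores, n, seed=0):
    """Estimate AUC from n independent comparisons of a test edge with a nonexistent edge."""
    rng = random.Random(seed)
    n_higher = 0  # n': comparisons where the test edge scores higher
    n_equal = 0   # n'': comparisons where the two scores are equal
    for _ in range(n):
        s_test = rng.choice(test_scores)
        s_non = rng.choice(nonexistent_scores)
        if s_test > s_non:
            n_higher += 1
        elif s_test == s_non:
            n_equal += 1
    return (n_higher + 0.5 * n_equal) / n


# Hypothetical scores: every test edge outranks every nonexistent edge.
auc = sampled_auc([0.9, 0.8], [0.1, 0.2], n=100)
```

Larger values of `n` give a more stable estimate; with a perfectly separating score, as in the toy call, every comparison is won by the test edge.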
Precision is defined as the fraction of the top $L$ predicted edges that are predicted correctly. If $m$ of them are correct, that is, $m$ of the top $L$ ranked edges are in the test set, then Precision is defined as:
$$\mathrm{Precision} = \frac{m}{L}$$
Clearly, the larger the Precision, the more accurate the prediction. If two algorithms have the same AUC but algorithm 1 has a higher Precision than algorithm 2, then algorithm 1 is better, because it tends to rank the node pairs that are truly connected nearer the top.
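Precision at $L$ can be computed as in the following Python sketch. It assumes the predicted scores are held in a dictionary keyed by candidate edge and the test set is a set of edges; the function name `precision_at_l` is hypothetical.

```python
def precision_at_l(scores, test_edges, L):
    """Fraction of the L highest-scoring candidate edges that appear in the test set."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    m = sum(1 for e in ranked[:L] if e in test_edges)
    return m / L


# Hypothetical scores for four candidate edges; two of them are true test edges.
scores = {("a", "b"): 0.9, ("a", "c"): 0.7, ("b", "c"): 0.4, ("c", "d"): 0.1}
test_edges = {("a", "b"), ("b", "c")}
p = precision_at_l(scores, test_edges, L=2)
```

In the toy data only one of the two top-ranked edges is a test edge, so the value reflects a single hit out of two.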
Ranking Score mainly considers the positions of the test-set edges in the final ranking. Let $H = U - E^T$ be the set of unknown edges (that is, the edges in the test set together with the nonexistent edges), and let $r_i$ be the rank of unknown edge $i$ in the ranking. The Ranking Score of that edge is $RS_i = r_i / |H|$, where $|H|$ is the number of elements in $H$. Averaging over all edges in the test set gives the Ranking Score of the system:
$$RS = \frac{1}{|E^P|} \sum_{i \in E^P} RS_i = \frac{1}{|E^P|} \sum_{i \in E^P} \frac{r_i}{|H|}$$
where $E^P$ denotes the set of edges in the test set.
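A minimal Python sketch of the Ranking Score follows. It assumes the dictionary `scores` covers every unknown edge in $H$ (test edges plus nonexistent edges), that ranks start at 1 for the highest score, and that ties are broken by insertion order; the function name `ranking_score` is hypothetical.

```python
def ranking_score(scores, test_edges):
    """Average of r_i / |H| over the test edges, where r_i is the rank among all unknown edges."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    rank = {e: r for r, e in enumerate(ranked, start=1)}
    h = len(ranked)  # |H|: number of unknown edges
    return sum(rank[e] / h for e in test_edges) / len(test_edges)


# Hypothetical scores for four unknown edges; two of them are true test edges.
scores = {("a", "b"): 0.9, ("a", "c"): 0.7, ("b", "c"): 0.4, ("c", "d"): 0.1}
test_edges = {("a", "b"), ("b", "c")}
rs = ranking_score(scores, test_edges)
```

Because the score is an average relative rank, smaller values indicate that test edges sit nearer the top of the ranking.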
The key to the present invention is that, by building network migration structures and combining them with a factor graph model, information flows between networks of different types, which enables multi-task link prediction in heterogeneous networks.
The distinguishing features of the invention lie in network feature extraction, the construction of network migration structures, and the construction of the multi-task factor graph model.
The advantages of the invention are as follows. Because data are collected by crawling, relevant data can be gathered efficiently and comprehensively. Empirical analysis of the individual and interactive behaviour of social network users reveals the mutual influence between users' propagation behaviour and network structure. We use network migration structures to capture the inherent coupling between prediction tasks: users' preferences influence how their interactions in the social network form, and the social network in turn influences users' preferences. On this basis, we use a multi-task factor graph model that lets information propagate within and between networks, solving the multi-task prediction problem and the data sparsity problem at the same time.
Brief description of the drawings
Fig. 1 is the flow chart of the method of the invention.
Fig. 2-1 to Fig. 2-10 show the user-user-user migration structures. Fig. 2-11 to Fig. 2-18 show the user-item-user migration structures.
Fig. 3a is an example of a coupled social network; Fig. 3b shows the coupled social network divided into a social network and a cross network; Fig. 3c shows example migration structures; Fig. 3d shows the output: the probabilities of the missing edges in the social network and the rating network.
Fig. 3a to Fig. 3d together are schematic diagrams of multi-task link prediction in a coupled network according to the invention.
Embodiment
The technical scheme of the invention is illustrated with reference to the accompanying drawings. The embodiment follows the three steps set out in the summary of the invention above: network data acquisition, establishment of the multi-task factor graph model, and evaluation of the prediction results.

Claims (1)

1. A social network multi-task prediction method based on a factor graph model, comprising the following steps:
The first step, network data acquisition: users' social information and behaviour information are collected by a web crawler, and the crawled data are cleaned to simplify later computation. This step mainly comprises network data crawling and data preprocessing.
(11) Network data crawling: crawl users' social behaviour information and users' behaviour information towards items. Each record contains either two user IDs (UserID and UserID) or a user ID and an item ID (UserID and ItemID).
(12) Data preprocessing: to simplify later computation, redundant and incomplete records are removed, and the data are organised into the unified matrices required by the model: the user-user social behaviour matrix $w^1$ and the user-item rating behaviour matrix $w^2$. In $w^1$, the element $w^1_{ij}$ represents the friendship or following relation between user $i$ and user $j$; in $w^2$, the element $w^2_{ij}$ represents the favouriting, purchasing or rating relation between user $i$ and item $j$.
The second step, establishing the multi-task factor graph model:
(21) Network feature extraction: the factor graph model is a supervised learning model, so features must be extracted for social links and rating links from the heterogeneous information in the network. For a given node $i$ in the social network, node features are extracted, including the degree $k(v_i)$, out-degree $k_{out}(v_i)$, in-degree $k_{in}(v_i)$ and clustering coefficient $c_i$. For a node pair $i$ and $j$ in the social network, similarity indices are the features most relevant to predicting whether the pair is connected, so several traditional similarity indices are extracted as features.
The cross network (the rating relation between users and items) also carries information about node pairs in the social network: the more items two users have both commented on, the more likely they are to be friends. On this basis, several similarity indices are extracted from the cross network. Similarly, for a target node pair in the cross network consisting of user $i$ and item $a$, features are extracted from the above similarity indices.
(22) Construction of network migration structures: migration structures are important factors in the factor graph model, because the label information of edges can propagate through them. Migration structures are built from triangles, so that information can propagate both within the social network and between the social network and the cross network.
(23) Factor graph model construction: the coupled network $G = (G^S, G^C)$ can be divided into a social network $G^S$ and a cross network $G^C$. The goal is to learn a single model that predicts potential social links and rating links at the same time.
For a node pair $e_{ij}$ in the network, the label $y_e$ denotes its state: $y_e = 1$ means that an edge exists between the pair, and $y_e = 0$ means that no edge exists. The final model outputs the probability $P(y_e = 1)$ that the label is 1.
(a) Joint probability distribution
For a coupled social network $G = (V, E, X)$, $V = \{v_i\}$ is the node set, $E = \{e_{ij}\}$ is the set of node pairs, and $X$ is an attribute matrix in which each row is the attribute vector of a node pair $e_{ij}$. The goal is to estimate the probability $P(y_e \mid x_e)$ that each unknown link forms. The joint probability distribution of the network is written $P(Y \mid X, G)$, where $G$ denotes all the information in the network. This joint distribution expresses that the label of a link depends not only on the local attributes of the node pair but also on the network structure. It can be instantiated as:
$$P(Y \mid X, G) = \prod_{e \in E^S} \prod_{i=1}^{d} P(y_e^s \mid x_{ei}^s) \prod_{e \in E^C} \prod_{i=1}^{d'} P(y_e^c \mid x_{ei}^c) \prod_{\pi \in \Pi} \prod_{\varepsilon \in \pi} P(Y_\varepsilon) \qquad (1)$$
Here d and d' denote the feature dimensions of the social network and the crossover network respectively, x_ei denotes the i-th attribute value of node pair e, E^S denotes the set of node pairs in the social network, and E^C denotes the set of node pairs in the crossover network. P(y_e^s | x_ei^s) is the probability of y_e^s in the social network given the attribute x_ei^s, and P(y_e^c | x_ei^c) is the probability of y_e^c in the crossover network given the attribute x_ei^c. P(Y_ε) represents the influence of the migration structures, Π denotes the set of migration structure types, π denotes one type of migration structure, and ε denotes one migration structure of that type.
(b) factor instantiation
In principle, the attribute-correlation feature functions and the social-relation feature functions can be instantiated in different ways. Here they are modeled with the Hammersley-Clifford theorem from Markov random fields:
$$P(y_e^s \mid x_{ei}^s) = \frac{1}{Z_1} \exp\{\alpha_i f_i(x_{ei}^s, y_e^s)\} \qquad (2)$$
$$P(y_e^c \mid x_{ei}^c) = \frac{1}{Z_2} \exp\{\beta_i g_i(x_{ei}^c, y_e^c)\} \qquad (3)$$
$$P(Y_\varepsilon) = \frac{1}{Z_3} \exp\{\gamma_\varepsilon h_\varepsilon(Y_\varepsilon)\} \qquad (4)$$
f_i(·), g_i(·) and h_ε(·) are the feature functions of the social network, the crossover network and the migration structures respectively; α_i, β_i and γ_ε are their corresponding weights; and Z_1, Z_2 and Z_3 are normalization factors.
(c) objective function optimization
Combining the formulas above, the objective function is finally obtained:
where Z = Z_1 Z_2 Z_3 is the normalization factor.
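As a hedged reconstruction rather than the patent's own formula, substituting (2), (3) and (4) into (1) and taking the logarithm gives a log-likelihood of the following shape. The symbol O for the objective and the collection of all normalization constants into a single log Z term are assumptions made here for illustration.

```latex
\begin{aligned}
O(\alpha, \beta, \gamma) &= \log P(Y \mid X, G) \\
&= \sum_{e \in E^S} \sum_{i=1}^{d} \alpha_i f_i(x_{ei}^s, y_e^s)
 + \sum_{e \in E^C} \sum_{i=1}^{d'} \beta_i g_i(x_{ei}^c, y_e^c)
 + \sum_{\pi \in \Pi} \sum_{\varepsilon \in \pi} \gamma_\varepsilon h_\varepsilon(Y_\varepsilon)
 - \log Z
\end{aligned}
```

Each sum corresponds to one group of factors in (1): the social attribute factors, the crossover attribute factors, and the migration structure factors.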
With the method for stochastic gradient descent, the gradient of each parameter can be obtained:
With E [hε(Yε)] respectively represent data distribution function hε(Yε) expectation,WithIt is according to estimation model In Pα, beta, gammaExpectation under (Y | X, G) distribution.
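For a log-linear model of this kind, the gradient of the log-likelihood with respect to a weight is, in the standard case, the difference between the data expectation and the model expectation of the matching feature function. The sketch below writes this out for γ_ε. It is an assumption that the patent's omitted formula has this form, and the symbols O (the objective) and η (a learning rate) are introduced here only for illustration.

```latex
\frac{\partial O}{\partial \gamma_\varepsilon}
  = E\big[h_\varepsilon(Y_\varepsilon)\big]
  - E_{P_{\alpha,\beta,\gamma}(Y \mid X, G)}\big[h_\varepsilon(Y_\varepsilon)\big],
\qquad
\gamma_\varepsilon \leftarrow \gamma_\varepsilon + \eta \, \frac{\partial O}{\partial \gamma_\varepsilon}
```

The gradients for α_i and β_i would take the same form with f_i and g_i in place of h_ε. The model expectation generally has to be approximated, since exact inference on a graph containing triangles is intractable in general.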
Step 3, evaluation of prediction results:
Three indices are used to measure the effectiveness of the method: AUC, Precision and Ranking Score. They emphasize different aspects of prediction accuracy. AUC (area under the receiver operating characteristic curve) measures the accuracy of the algorithm as a whole. Precision only considers whether the edges ranked in the top L positions are predicted correctly. Ranking Score gives more weight to the ordering of the predicted edges.
AUC can be understood as the probability that the score of an edge in the test set is higher than the score of a randomly chosen nonexistent edge. That is, each time an edge is chosen at random from the test set and compared with a randomly chosen nonexistent edge: if the score of the test-set edge is greater than the score of the nonexistent edge, 1 point is added; if the two scores are equal, 0.5 points are added. After n independent comparisons, if the test-set edge has the higher score n' times and the two scores are equal n'' times, AUC is defined as:
$$AUC = \frac{n' + 0.5\,n''}{n}$$
Clearly, if all scores are generated at random, AUC is approximately 0.5. The degree to which AUC exceeds 0.5 therefore measures how much more accurate the algorithm is than random selection.
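The sampling procedure for AUC described above can be sketched directly. The function name, the default number of comparisons and the fixed seed are assumptions added for reproducibility; the counting rule follows the definition in the text.

```python
import random

def auc_sampled(test_scores, nonexistent_scores, n=10000, seed=0):
    """Estimate AUC by n independent random comparisons.
    test_scores: scores of edges in the test set.
    nonexistent_scores: scores of node pairs that have no edge."""
    rng = random.Random(seed)
    n_higher = 0  # n' in the text: the test edge scored strictly higher
    n_equal = 0   # n'' in the text: the two scores were equal
    for _ in range(n):
        s_test = rng.choice(test_scores)
        s_none = rng.choice(nonexistent_scores)
        if s_test > s_none:
            n_higher += 1
        elif s_test == s_none:
            n_equal += 1
    return (n_higher + 0.5 * n_equal) / n

# Toy scores, invented for illustration only.
print(auc_sampled([0.9, 0.4, 0.7], [0.1, 0.5, 0.2]))
```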
Precision is defined as the fraction of the top L predicted edges that are predicted correctly. If m predictions are correct, that is, m of the L top-ranked edges are in the test set, then Precision is defined as:
$$Precision = \frac{m}{L}$$
Clearly, the larger the Precision, the more accurate the prediction. If two algorithms have the same AUC and the Precision of algorithm 1 is greater than that of algorithm 2, then algorithm 1 is better, because it tends to rank the node pairs that are truly connected nearer the top.
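A minimal sketch of the Precision computation, assuming predictions are supplied as a dictionary from candidate edges to scores (the function name and data layout are hypothetical):

```python
def precision_at_l(scores, test_edges, L):
    """Fraction of the L highest-scored candidate edges that are in the test set.
    scores: dict mapping candidate edge -> predicted score.
    test_edges: set of edges held out as the test set."""
    top = sorted(scores, key=scores.get, reverse=True)[:L]
    m = sum(1 for edge in top if edge in test_edges)
    return m / L

# Toy data, invented for illustration only.
scores = {("a", "b"): 0.9, ("a", "c"): 0.8, ("c", "d"): 0.7, ("b", "c"): 0.1}
test_edges = {("a", "b"), ("c", "d")}
print(precision_at_l(scores, test_edges, 2))
```

Ties between equal scores are broken here by insertion order, a choice the text does not address.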
Ranking Score mainly considers the positions of the test-set edges in the final ranking. Let H = U - E^T be the set of unknown edges (equivalent to the union of the test-set edges and the nonexistent edges), and let r_i denote the rank of an unknown edge i ∈ E^P in the ranking. The Ranking Score of that unknown edge is then RS_i = r_i / |H|, where |H| denotes the number of elements in the set H. Traversing all edges in the test set, the Ranking Score of the system is:
$$RS = \frac{1}{|E^P|} \sum_{i \in E^P} RS_i = \frac{1}{|E^P|} \sum_{i \in E^P} \frac{r_i}{|H|}$$
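The Ranking Score can be sketched as below, assuming that every unknown edge in H has a predicted score and that rank 1 goes to the highest score (the function name and data layout are hypothetical).

```python
def ranking_score(scores, test_edges):
    """Average relative rank r_i / |H| of the test-set edges.
    scores: dict mapping every unknown edge in H -> predicted score.
    test_edges: the test-set edges E^P, all of which must appear in scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    rank = {edge: pos for pos, edge in enumerate(ranked, start=1)}
    h = len(ranked)  # |H|
    return sum(rank[edge] / h for edge in test_edges) / len(test_edges)

# Toy data, invented for illustration only.
scores = {("a", "b"): 0.9, ("a", "c"): 0.8, ("c", "d"): 0.7, ("b", "c"): 0.1}
test_edges = {("a", "b"), ("c", "d")}
print(ranking_score(scores, test_edges))
```

Under this ranking convention a smaller value is better, since test-set edges near the top of the ranking have small r_i.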
CN201710770816.6A 2017-08-31 2017-08-31 A kind of social networks multitask Forecasting Methodology based on factor graph model Pending CN107451703A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710770816.6A CN107451703A (en) 2017-08-31 2017-08-31 A kind of social networks multitask Forecasting Methodology based on factor graph model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710770816.6A CN107451703A (en) 2017-08-31 2017-08-31 A kind of social networks multitask Forecasting Methodology based on factor graph model

Publications (1)

Publication Number Publication Date
CN107451703A true CN107451703A (en) 2017-12-08

Family

ID=60493405

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710770816.6A Pending CN107451703A (en) 2017-08-31 2017-08-31 A kind of social networks multitask Forecasting Methodology based on factor graph model

Country Status (1)

Country Link
CN (1) CN107451703A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108399491A (en) * 2018-02-02 2018-08-14 浙江工业大学 A kind of employee's diversity ranking method based on network
CN109829561A (en) * 2018-11-15 2019-05-31 西南石油大学 Accident forecast method based on smoothing processing Yu network model machine learning
CN110083778A (en) * 2019-04-08 2019-08-02 清华大学 The figure convolutional neural networks construction method and device of study separation characterization
CN110599358A (en) * 2019-07-10 2019-12-20 杭州师范大学钱江学院 Cross-social network user identity association method based on probability factor graph model
CN111324812A (en) * 2020-02-20 2020-06-23 深圳前海微众银行股份有限公司 Federal recommendation method, device, equipment and medium based on transfer learning
WO2020168851A1 (en) * 2019-02-18 2020-08-27 北京三快在线科技有限公司 Behavior recognition
CN112039700A (en) * 2020-08-26 2020-12-04 重庆理工大学 Social network link abnormity prediction method based on stack generalization and cost sensitive learning
CN112073227A (en) * 2020-08-26 2020-12-11 重庆理工大学 Social network link abnormity detection method by utilizing cascading generalization and cost sensitive learning
CN112233734A (en) * 2020-09-30 2021-01-15 山东大学 Water quality data deduction acquisition method and system based on machine learning
US20210021616A1 (en) * 2018-03-14 2021-01-21 Intelici - Cyber Defense System Ltd. Method and system for classifying data objects based on their network footprint

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108399491A (en) * 2018-02-02 2018-08-14 浙江工业大学 A kind of employee's diversity ranking method based on network
CN108399491B (en) * 2018-02-02 2021-10-29 浙江工业大学 Employee diversity ordering method based on network graph
US20210021616A1 (en) * 2018-03-14 2021-01-21 Intelici - Cyber Defense System Ltd. Method and system for classifying data objects based on their network footprint
CN109829561A (en) * 2018-11-15 2019-05-31 西南石油大学 Accident forecast method based on smoothing processing Yu network model machine learning
CN109829561B (en) * 2018-11-15 2021-03-16 西南石油大学 Accident prediction method based on smoothing processing and network model machine learning
WO2020168851A1 (en) * 2019-02-18 2020-08-27 北京三快在线科技有限公司 Behavior recognition
CN110083778A (en) * 2019-04-08 2019-08-02 清华大学 The figure convolutional neural networks construction method and device of study separation characterization
CN110599358A (en) * 2019-07-10 2019-12-20 杭州师范大学钱江学院 Cross-social network user identity association method based on probability factor graph model
CN110599358B (en) * 2019-07-10 2021-05-04 杭州师范大学钱江学院 Cross-social network user identity association method based on probability factor graph model
CN111324812A (en) * 2020-02-20 2020-06-23 深圳前海微众银行股份有限公司 Federal recommendation method, device, equipment and medium based on transfer learning
CN112073227A (en) * 2020-08-26 2020-12-11 重庆理工大学 Social network link abnormity detection method by utilizing cascading generalization and cost sensitive learning
CN112039700A (en) * 2020-08-26 2020-12-04 重庆理工大学 Social network link abnormity prediction method based on stack generalization and cost sensitive learning
CN112073227B (en) * 2020-08-26 2021-11-05 重庆理工大学 Social network link abnormity detection method by utilizing cascading generalization and cost sensitive learning
CN112233734A (en) * 2020-09-30 2021-01-15 山东大学 Water quality data deduction acquisition method and system based on machine learning
CN112233734B (en) * 2020-09-30 2022-07-19 山东大学 Water quality data deduction acquisition method and system based on machine learning

Similar Documents

Publication Publication Date Title
CN107451703A (en) A kind of social networks multitask Forecasting Methodology based on factor graph model
Yu et al. Identifying critical nodes in complex networks via graph convolutional networks
CN105117422B (en) Intelligent social network recommendation system
Zheng et al. Understanding the tourist mobility using GPS: How similar are the tourists?
Park et al. Interpretation of Bayesian neural networks for predicting the duration of detected incidents
CN110837602B (en) User recommendation method based on representation learning and multi-mode convolutional neural network
De Winter et al. Combining temporal aspects of dynamic networks with node2vec for a more efficient dynamic link prediction
CN108172301A (en) A kind of mosquito matchmaker's epidemic Forecasting Methodology and system based on gradient boosted tree
Li et al. An extended TODIM method for group decision making with the interval intuitionistic fuzzy sets
CN108108844A (en) A kind of urban human method for predicting and system
CN106778894A (en) A kind of method of author's cooperative relationship prediction in academic Heterogeneous Information network
Chen et al. Calibrating a Land Parcel Cellular Automaton (LP-CA) for urban growth simulation based on ensemble learning
CN106530687B (en) A kind of transportation network pitch point importance measuring method based on time-space attribute
Deng et al. Bridge condition assessment using D numbers
CN109214599A (en) The method that a kind of pair of complex network carries out link prediction
CN103034687B (en) A kind of relating module recognition methodss based on 2 class heterogeneous networks
CN110347932A (en) A kind of across a network user's alignment schemes based on deep learning
CN106952167A (en) A kind of catering trade good friend Lian Bian influence force prediction methods based on multiple linear regression
CN109034960A (en) A method of more inferred from attributes based on user node insertion
CN107742169A (en) A kind of Urban Transit Network system constituting method and performance estimating method based on complex network
Wei et al. STGSA: A novel spatial-temporal graph synchronous aggregation model for traffic prediction
Liu et al. Modeling the interaction coupling of multi-view spatiotemporal contexts for destination prediction
Zhao et al. Incorporating spatio-temporal smoothness for air quality inference
He et al. Next point-of-interest recommendation via a category-aware Listwise Bayesian Personalized Ranking
Liao et al. Reimagining multi-criterion decision making by data-driven methods based on machine learning: A literature review

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171208
