CN107451703A - A social network multitask prediction method based on a factor graph model
- Publication number: CN107451703A
- Application number: CN201710770816.6A
- Authority: CN (China)
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0207—Discounts or incentives, e.g. coupons or rebates
- G06Q30/0217—Discounts or incentives, e.g. coupons or rebates involving input on products or services in exchange for incentives or rewards
- G06Q30/0218—Discounts or incentives, e.g. coupons or rebates involving input on products or services in exchange for incentives or rewards based on score
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
A social network multitask prediction method based on a factor graph model, comprising the following steps: a first step, network data acquisition, which comprises network data crawling and data preprocessing; a second step, establishing a multitask factor graph model, which comprises network feature extraction, network migration structure construction and factor graph model construction; and a third step, evaluation of the prediction results.
Description
Technical field
The present invention relates to machine learning methods, factor graph models, personalized recommendation technology and network migration structure construction technology, and is suitable for solving multitask link prediction problems in heterogeneous networks.
Background art
With the progress of Internet technology and the spread of online social networks, network scale keeps growing, and the networks people face are shifting from simple interpersonal relationship networks to coupled social networks. A coupled social network generally has a complex structure, containing multiple types of nodes (such as users and commodities) and multiple types of edges (such as social links and scoring links). Research on link prediction in coupled social networks has traditionally concentrated on either social link prediction or scoring link prediction, and generally assumes that the different types of link prediction tasks are independent of each other. In real-world networks, however, the two kinds of prediction are often related: if two people are friends, they are more likely to buy or rate the same commodities; likewise, if two people often buy or rate the same commodities, they are more likely to share interests and hobbies and have a higher probability of becoming friends. How to couple multiple prediction tasks through information flow, by constructing network migration structures and combining them with a factor graph model, is therefore of great theoretical and practical significance for research on heterogeneous link prediction in complex networks.
Existing link prediction approaches first compute node features, such as in-degree, out-degree and clustering coefficient, and topological features, such as the common neighbours index, the AA (Adamic-Adar) index, the Salton index, the Jaccard index and the HPI (hub promoted index). The features are then integrated as $R=\sum_i w_i x_i$, where R is the integrated result, $x_i$ is the i-th extracted feature and $w_i$ is the weight of the i-th feature. Finally, the integrated features are fed into an existing machine learning model, and the weight vector w is obtained by training.
However, because social link prediction and scoring link prediction operate on different network structures, the extracted features also differ, so multiple models must be learned. Current applications of factor graph models to link prediction in social networks treat the prediction tasks as independent, so making several kinds of prediction requires training different models; no existing work performs multitask link prediction by mining network migration structures.
Existing prediction techniques do not solve the data sparsity problem, do not make full use of the structural information of the network, and cannot adapt to multitask link prediction.
Multitask prediction requires learning multiple machine learning models, so computational efficiency is low, and the mutually reinforcing coupling between the prediction tasks is not fully taken into account.
Summary of the invention
To overcome the above shortcomings of the prior art, the present invention provides a social network multitask prediction method based on a factor graph model.
The flow chart of the method is shown in Fig. 1. The method uses traditional complex-network link prediction indices as features and, by constructing network migration structures, lets information flow between the prediction tasks so that they are coupled to one another. This addresses the problems of data sparsity and low computational efficiency while improving the accuracy of the prediction results. The method comprises the following steps.
First step, network data acquisition: users' social information and behavioural information are collected by a web crawler, and the crawled data are cleaned to facilitate subsequent computation. This step mainly comprises network data crawling and data preprocessing.
(1) Network data crawling: crawl users' social behaviour information and users' behaviour information on commodities. Each record comprises either a user UserID and a user UserID, or a user UserID and a commodity ItemID.
(2) Data preprocessing: to facilitate subsequent computation, redundant and incomplete records are removed from the data, forming the unified matrices required by the model: the user-user social behaviour matrix $w^1$ and the user-commodity scoring behaviour matrix $w^2$. In matrix $w^1$, the element $w^1_{ij}$ represents a relation such as friendship or following between user i and user j (the relation may be unidirectional or bidirectional); in matrix $w^2$, the element $w^2_{ij}$ represents a relation such as collecting, purchasing or rating between user i and commodity j.
Second step, establishing the multitask factor graph model:
(1) Network feature extraction: the factor graph model is a supervised learning model, so features must be extracted for social links and scoring links using the heterogeneous information in the network. For a specific node i in the social network, node features can be extracted, including degree $k(v_i)$, out-degree $k_{out}(v_i)$, in-degree $k_{in}(v_i)$ and clustering coefficient $c_i$. For a node pair i and j in the social network, similarity indices are the features most relevant to predicting whether the two nodes are connected, so a number of traditional similarity indices are extracted as features.
The crossover network (the scoring relations between users and commodities) also carries hidden information about node pairs in the social network: for example, the more commodities two users have both commented on, the more likely they are to be friends. Some similarity indices are therefore also extracted from the crossover network. Similarly, for a target node pair in the crossover network, user i and commodity a, features can likewise be extracted from similarity indices.
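As an illustration of the similarity features, the sketch below computes five standard indices (common neighbours, Jaccard, Salton, HPI and Adamic-Adar) for a node pair. It assumes an undirected network stored as a dictionary of neighbour sets; the function name `similarity_features` and the toy graph are hypothetical, and the patent does not state exactly which indices it uses.

```python
import math


def similarity_features(adj, x, y):
    """Standard neighbourhood-based similarity indices for the node pair (x, y)."""
    nx, ny = adj[x], adj[y]
    common = nx & ny
    cn = len(common)
    kx, ky = len(nx), len(ny)
    union = len(nx | ny)
    return {
        "CN": cn,                                                # common neighbours
        "Jaccard": cn / union if union else 0.0,                 # |common| / |union|
        "Salton": cn / math.sqrt(kx * ky) if kx and ky else 0.0,
        "HPI": cn / min(kx, ky) if kx and ky else 0.0,           # hub promoted index
        # Adamic-Adar: common neighbours weighted by 1 / log(degree).
        "AA": sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1),
    }


# Toy undirected network given as neighbour sets.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}
features = similarity_features(adj, "a", "b")
```

If `adj` is instead a bipartite dictionary containing both users and commodities, the same function counts the commodities two users have in common, which corresponds to the crossover-network features described above.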
(2) Network migration structure construction: migration structures are important factors in the factor graph model, since the label information of edges can be transferred within a migration structure. In this work, migration structures are built from triangles, so that information can propagate inside the social network and between the social network and the crossover network. The migration structures are shown in Fig. 2-1 to Fig. 2-18.
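A minimal sketch of how triangle-shaped migration structures might be enumerated for a candidate user pair follows. It ignores edge direction, which the eighteen structures in the figures distinguish, and the names `triads_for_pair`, `social_adj` and `rated` are hypothetical.

```python
def triads_for_pair(social_adj, rated, i, j):
    """List the triangles that would be closed by an edge between users i and j."""
    # user-user-user: a third user k linked to both i and j in the social network.
    shared_users = (social_adj.get(i, set()) & social_adj.get(j, set())) - {i, j}
    uuu = sorted((i, k, j) for k in shared_users)
    # user-commodity-user: a commodity a that both i and j have rated.
    shared_items = rated.get(i, set()) & rated.get(j, set())
    uiu = sorted((i, a, j) for a in shared_items)
    return uuu, uiu


social_adj = {"u1": {"u3"}, "u2": {"u3"}, "u3": {"u1", "u2"}}
rated = {"u1": {"i1", "i2"}, "u2": {"i2"}}
uuu, uiu = triads_for_pair(social_adj, rated, "u1", "u2")
```

Each returned triple names the three nodes of one triangle; in the factor graph, the three edge labels of such a triangle would share one migration-structure factor.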
(3) Factor graph model construction: a coupled network $G=(G^S,G^C)$ can be divided into a social network $G^S$ and a crossover network $G^C$. The goal is to learn a single model that simultaneously predicts potential social links and scoring links (the orange dashed lines in Fig. 3b).
For a node pair $e_{ij}$ in the network, its state is represented by a label $y_e$: $y_e=1$ means an edge exists between the node pair, and $y_e=0$ means no edge exists. The model finally outputs the probability $P(y_e=1)$ that the label is 1.
(a) Joint probability distribution
For a coupled social network $G=(V,E,X)$, $V=\{v_i\}$ denotes the set of nodes, $E=\{e_{ij}\}$ denotes the set of node pairs, and X is an attribute matrix in which each row is the attribute vector of a node pair $e_{ij}$. The goal is to estimate the probability $P(y_e|x_e)$ that each unknown link forms. The joint probability distribution of the network is written $P(Y|X,G)$, where G denotes all the information of the network. This joint distribution expresses that the label of a link depends not only on the local attributes of the node pair but also on the network structure. It can be instantiated as:
$$P(Y|X,G)=\prod_{e\in E^S}\prod_{i=1}^{d}P(y_e^s|x_{ei}^s)\prod_{e\in E^C}\prod_{i=1}^{d'}P(y_e^c|x_{ei}^c)\prod_{\pi\in\Pi}\prod_{\varepsilon\in\pi}P(Y_\varepsilon)$$
where d and d' are the feature dimensions of the social network and the crossover network respectively, $x_{ei}$ is the i-th attribute value of node pair e, $E^S$ is the set of node pairs in the social network, $E^C$ is the set of node pairs in the crossover network, $P(y_e^s|x_{ei}^s)$ is the probability of $y_e^s$ in the social network given attribute $x_{ei}^s$, $P(y_e^c|x_{ei}^c)$ is the probability of $y_e^c$ in the crossover network given attribute $x_{ei}^c$, $P(Y_\varepsilon)$ represents the influence of a migration structure, $\Pi$ is the set of migration structure types, $\pi$ is one type of migration structure, and $\varepsilon$ is one migration structure of that type.
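(b) Factor instantiation
In principle, the attribute correlation feature functions and the social relation feature functions can be instantiated in different ways. Here they are modelled using the Hammersley-Clifford theorem for Markov random fields:
$$P(y_e^s|x_{ei}^s)=\frac{1}{Z_1}\exp\{\alpha_i f_i(x_{ei}^s,y_e^s)\}$$
$$P(y_e^c|x_{ei}^c)=\frac{1}{Z_2}\exp\{\beta_i g_i(x_{ei}^c,y_e^c)\}$$
$$P(Y_\varepsilon)=\frac{1}{Z_3}\exp\{\gamma_\varepsilon h_\varepsilon(Y_\varepsilon)\}$$
where $f_i(\cdot)$, $g_i(\cdot)$ and $h_\varepsilon(\cdot)$ are the feature functions of the social network, the crossover network and the migration structures respectively, $\alpha_i$, $\beta_i$ and $\gamma_\varepsilon$ are their corresponding weights, and $Z_1$, $Z_2$ and $Z_3$ are normalization factors.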
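(c) Objective function optimization
Combining the above formulas gives the objective function:
$$O(\alpha,\beta,\gamma)=\log P(Y|X,G)=\sum_{e\in E^S}\sum_{i=1}^{d}\alpha_i f_i(x_{ei}^s,y_e^s)+\sum_{e\in E^C}\sum_{i=1}^{d'}\beta_i g_i(x_{ei}^c,y_e^c)+\sum_{\pi\in\Pi}\sum_{\varepsilon\in\pi}\gamma_\varepsilon h_\varepsilon(Y_\varepsilon)-\log Z$$
where $Z=Z_1Z_2Z_3$ is the normalization factor.
Using the stochastic gradient descent method, the gradient with respect to each parameter is:
$$\frac{\partial O}{\partial\alpha_i}=E[f_i(x_{ei}^s,y_e^s)]-E_{P_{\alpha,\beta,\gamma}(Y|X,G)}[f_i(x_{ei}^s,y_e^s)]$$
$$\frac{\partial O}{\partial\beta_i}=E[g_i(x_{ei}^c,y_e^c)]-E_{P_{\alpha,\beta,\gamma}(Y|X,G)}[g_i(x_{ei}^c,y_e^c)]$$
$$\frac{\partial O}{\partial\gamma_\varepsilon}=E[h_\varepsilon(Y_\varepsilon)]-E_{P_{\alpha,\beta,\gamma}(Y|X,G)}[h_\varepsilon(Y_\varepsilon)]$$
where $E[f_i(x_{ei}^s,y_e^s)]$, $E[g_i(x_{ei}^c,y_e^c)]$ and $E[h_\varepsilon(Y_\varepsilon)]$ are the expectations of the feature functions $f_i$, $g_i$ and $h_\varepsilon(Y_\varepsilon)$ under the data distribution, and the terms $E_{P_{\alpha,\beta,\gamma}(Y|X,G)}[\cdot]$ are the corresponding expectations under the distribution $P_{\alpha,\beta,\gamma}(Y|X,G)$ given by the estimated model.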
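To make the log-linear model and the "data expectation minus model expectation" gradient concrete, here is a toy sketch that evaluates the joint distribution exactly by enumerating every labelling of a very small graph. Several things are assumptions made only for illustration: the attribute feature is taken to be x times y, the migration-structure feature is 1 when all three edges of a triangle are present, a single weight `gamma` is shared by all triangles, and all function names are hypothetical. Exhaustive enumeration is feasible only for a handful of edges; the patent does not state which inference procedure it uses for real networks.

```python
import itertools
import math


def h_total(labels, triangles):
    """Number of migration triangles whose three edges are all labelled 1."""
    return sum(1 for tri in triangles if all(labels[e] for e in tri))


def log_score(labels, edges, weights, triangles, gamma):
    """Unnormalised log-probability of one labelling Y."""
    score = 0.0
    for y, (kind, xs) in zip(labels, edges):
        # Assumed attribute feature x * y, weighted by alpha ("s") or beta ("c").
        score += y * sum(w * x for w, x in zip(weights[kind], xs))
    return score + gamma * h_total(labels, triangles)


def distribution(edges, weights, triangles, gamma):
    """Exact P(Y | X, G) over all 2^n labellings (tiny graphs only)."""
    configs = list(itertools.product((0, 1), repeat=len(edges)))
    scores = [log_score(c, edges, weights, triangles, gamma) for c in configs]
    top = max(scores)  # subtract the maximum for numerical stability
    unnorm = [math.exp(s - top) for s in scores]
    z = sum(unnorm)
    return configs, [u / z for u in unnorm]


def marginals(edges, weights, triangles, gamma):
    """P(y_e = 1) for every candidate edge e."""
    configs, probs = distribution(edges, weights, triangles, gamma)
    return [sum(p for c, p in zip(configs, probs) if c[e]) for e in range(len(edges))]


def gamma_gradient(observed, edges, weights, triangles, gamma):
    """Observed triangle feature minus its expectation under the model."""
    configs, probs = distribution(edges, weights, triangles, gamma)
    expected = sum(p * h_total(c, triangles) for c, p in zip(configs, probs))
    return h_total(observed, triangles) - expected


# One social edge and two scoring edges forming a single triangle.
edges = [("s", [0.0]), ("c", [0.0]), ("c", [0.0])]
weights = {"s": [1.0], "c": [1.0]}  # alpha for social edges, beta for scoring edges
triangles = [(0, 1, 2)]
probs_with_triangle = marginals(edges, weights, triangles, math.log(2))
grad = gamma_gradient((1, 1, 1), edges, weights, triangles, 0.0)
```

With all attributes set to zero, the only thing raising each marginal above 0.5 is the triangle factor, which shows how a migration structure lets the label of one edge inform the others.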
Third step, evaluation of prediction results
Three indices are used to measure the effectiveness of the method: AUC, Precision and Ranking Score. They emphasise different aspects of prediction accuracy: AUC (area under the receiver operating characteristic curve) measures the accuracy of the algorithm as a whole; Precision considers only whether the edges ranked in the top L positions are predicted correctly; Ranking Score takes greater account of the ranking of the predicted edges.
AUC can be understood as the probability that the score of an edge in the test set is higher than the score of a randomly selected nonexistent edge. Each time, an edge is drawn at random from the test set and compared with a randomly selected nonexistent edge: if the score of the test-set edge is greater, 1 point is added; if the two scores are equal, 0.5 points are added. After n independent comparisons, of which n' have the test-set edge scoring higher and n'' have equal scores, AUC is defined as:
$$AUC=\frac{n'+0.5\,n''}{n}$$
Clearly, if all scores are generated at random, AUC ≈ 0.5. The degree to which AUC exceeds 0.5 therefore measures how much more accurate the algorithm is than random selection.
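The sampling procedure for AUC described above can be sketched as follows. The function name `auc_by_sampling`, the default number of comparisons and the fixed random seed are illustrative assumptions.

```python
import random


def auc_by_sampling(test_scores, nonexistent_scores, n=10000, seed=0):
    """Estimate AUC from n independent random comparisons."""
    rng = random.Random(seed)
    higher = equal = 0
    for _ in range(n):
        s_test = rng.choice(test_scores)        # score of a random test-set edge
        s_none = rng.choice(nonexistent_scores)  # score of a random nonexistent edge
        if s_test > s_none:
            higher += 1   # counts towards n'
        elif s_test == s_none:
            equal += 1    # counts towards n''
    return (higher + 0.5 * equal) / n


perfect = auc_by_sampling([0.9, 0.8], [0.1, 0.2])
tied = auc_by_sampling([0.5], [0.5])
```

When every test-set edge outscores every nonexistent edge the estimate is exactly 1, and when all scores are tied it is exactly 0.5, matching the definition.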
Precision is defined as the proportion of correct predictions among the top L predicted edges. If m predictions are correct, that is, m of the top L ranked edges are in the test set, Precision is defined as:
$$Precision=\frac{m}{L}$$
Clearly, the larger the Precision, the more accurate the prediction. If two algorithms have the same AUC but algorithm 1 has a higher Precision than algorithm 2, algorithm 1 is better, because it tends to rank truly connected node pairs nearer the top.
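A short sketch of the Precision computation, assuming the candidate edges are already sorted by descending score; the name `precision_at_l` and the sample edges are hypothetical.

```python
def precision_at_l(ranked_edges, test_edges, L):
    """Fraction of the top-L ranked edges that appear in the test set (m / L)."""
    m = sum(1 for edge in ranked_edges[:L] if edge in test_edges)
    return m / L


ranked = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]  # best score first
test_edges = {("a", "b"), ("c", "d")}
p_top1 = precision_at_l(ranked, test_edges, 1)
p_top2 = precision_at_l(ranked, test_edges, 2)
```

Only the first L positions matter here, so two rankings that differ below position L receive the same Precision.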
Ranking Score mainly considers the positions of the test-set edges in the final ranking. Let $H=U-E^T$ be the set of unknown edges, where U is the set of all node pairs and $E^T$ is the training set, so that H is the union of the test-set edges and the nonexistent edges. Let $r_i$ be the rank of unknown edge $i\in E^P$ in the ranking, where $E^P$ denotes the test set. The Ranking Score of this edge is $RS_i=r_i/|H|$, where |H| is the number of elements in H. Traversing all edges in the test set gives the Ranking Score of the system:
$$RS=\frac{1}{|E^P|}\sum_{i\in E^P}RS_i=\frac{1}{|E^P|}\sum_{i\in E^P}\frac{r_i}{|H|}$$
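The Ranking Score can be sketched in the same style, assuming that `ranked_unknown` lists every edge of H sorted by descending score and that every test-set edge appears in it; the function name and sample data are hypothetical.

```python
def ranking_score(ranked_unknown, test_edges):
    """Average of r_i / |H| over the test-set edges; lower is better."""
    h = len(ranked_unknown)
    rank = {edge: pos for pos, edge in enumerate(ranked_unknown, start=1)}
    return sum(rank[edge] / h for edge in test_edges) / len(test_edges)


ranked_unknown = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]  # best score first
rs_spread = ranking_score(ranked_unknown, {("a", "b"), ("c", "d")})
rs_top = ranking_score(ranked_unknown, {("a", "b"), ("a", "c")})
```

Unlike Precision, this index changes whenever any test-set edge moves in the ranking, which is why it reflects the ordering of the predicted edges more fully.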
The key of the invention is that, by constructing network migration structures and combining them with a factor graph model, information flows between networks of different types, thereby achieving multitask link prediction in heterogeneous networks.
The invention is distinctive in its network feature extraction, network migration structure construction and multitask factor graph model construction.
The advantages of the invention are as follows. Because data are gathered by crawling, relevant data can be collected efficiently and comprehensively. Empirical analysis of the individual and interactive behaviour of social network users reveals the mutual influence between user propagation behaviour and network structure. The inherent coupling between the prediction tasks is characterised by network migration structures: users' preferences influence the formation of their interactions in the social network, and the social network in turn influences users' preferences. On this basis, a multitask factor graph model is used so that information can propagate within and between networks, solving the multitask prediction problem and the data sparsity problem at the same time.
Brief description of the drawings
Fig. 1 is the flow chart of the method of the invention.
Fig. 2-1 to Fig. 2-10 show user-user-user migration structures. Fig. 2-11 to Fig. 2-18 show user-commodity-user migration structures.
Fig. 3a to Fig. 3d are schematic diagrams of multitask link prediction in a coupled network according to the invention. Fig. 3a is an example of a coupled social network; Fig. 3b shows the division of the coupled social network into a social network and a crossover network; Fig. 3c shows examples of migration structures; Fig. 3d is the output: the probabilities of missing edges in the social network and the scoring network.
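Detailed description of the embodiments
The technical scheme of the invention is described with reference to the accompanying drawings and follows the three steps set out in the summary of the invention.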
Claims (1)
1. a kind of social networks multitask Forecasting Methodology based on factor graph model, comprises the following steps:
The first step, Network Data Capture:User social contact information and behavioural information, and the data to crawling are collected by web crawlers
Cleared up, it is convenient follow-up to calculate, it is main to be crawled including network data and data prediction.
(11) network data crawls:The behavioural information of user social contact behavioural information and user to commodity is crawled, every information includes:
User UserID and user UserID, user UserID and commodity ItemID.
(12) data prediction:In order to facilitate follow-up calculating, it is necessary to clear up redundancy in data, incomplete data, model is formed
Required unified user and user social contact behavioural matrix w1, user and commodity scoring behavioural matrix w2.In matrix w1In, square
Element w in battle array1 ijGood friend, concern relation between expression user i and user j, in matrix w2In, element w in matrix2 ijRepresent to use
Collection, purchase between family i and commodity j, evaluation relation.
Second step, establish multitask factor graph model:
(21) network characterization extracts:Factor graph model is a supervised learning model, it is necessary to be using the Heterogeneous Information in network
Social activity link and scoring link extraction feature.For a specific node i in social networks, the feature of node is extracted, is wrapped
Degree of including k (vi), out-degree kout(vi), in-degree kin(vi), cluster coefficients ci.For social networks node to i and j, similarity indices
It is to predict its maximally related feature whether connected in a network.Therefore, some traditional similarity indices are extracted as special
Sign.
Also under cover the information of social networks node pair, two users comment crossover network (the scoring relation of user and commodity) jointly
The commodity of opinion are more, then the possibility that they are friends is bigger.Based on this, some similarity indices are extracted according to crossover network.
Similar, for crossover network destination node to user i and commodity a, feature is extracted according to above-mentioned similarity indices.
(22) Network migration structure construction: the migration structure is an important factor in the factor graph model, and the label information of edges can be transferred within the structure. In this work, migration structures are built from triangles, so that information can propagate within the social network and between the social network and the crossover network.
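One way to enumerate triangle-shaped migration structures is sketched below. The patent states only that triangles are used, so the two triangle types shown (three users, or two users and one commodity) and the exhaustive enumeration are assumptions; a practical system would restrict the candidates, since the number of triples grows quickly.

```python
from itertools import combinations

def social_triangles(users):
    """Triples of user-user node pairs that close a triangle inside the social network."""
    return [((a, b), (b, c), (a, c))
            for a, b, c in combinations(sorted(users), 3)]

def cross_triangles(users, items):
    """Triangles made of one user-user pair and two user-commodity pairs."""
    return [((a, b), (a, x), (b, x))
            for a, b in combinations(sorted(users), 2)
            for x in sorted(items)]

tri_social = social_triangles({"u1", "u2", "u3"})
tri_cross = cross_triangles({"u1", "u2"}, {"i1"})
```

Each triangle lists the three node pairs whose labels are coupled, which is the form a structure factor over $Y_\varepsilon$ would consume.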
(23) Factor graph model construction: the coupled network $G = (G_S, G_C)$ can be divided into a social network $G_S$ and a crossover network $G_C$. The goal is to learn a single model that predicts potential social links and scoring links at the same time.
For a node pair $e_{ij}$ in the network, the label $y_e$ represents its state: $y_e = 1$ indicates that an edge exists between the node pair, and $y_e = 0$ indicates that no edge exists. The final output of the model is the probability $P(y_e = 1)$ that the label is 1.
(a) Joint probability distribution
For the coupled social network $G = (V, E, X)$, $V = \{v_i\}$ denotes the node set, $E = \{e_{ij}\}$ denotes the set of node pairs, and $X$ is an attribute matrix in which each row is the attribute vector of a node pair $e_{ij}$. The goal is to estimate the probability $P(y_e \mid x_e)$ that each unknown link is formed. The joint probability distribution of the network is written $P(Y \mid X, G)$, where G denotes all the information of the network. This joint distribution shows that the label of a link depends not only on the local attributes of the node pair but also on the network structure. It can be instantiated as:
$$P(Y \mid X, G) = \prod_{e \in E^S} \prod_{i=1}^{d} P(y_e^s \mid x_{ei}^s) \prod_{e \in E^C} \prod_{i=1}^{d'} P(y_e^c \mid x_{ei}^c) \prod_{\pi \in \Pi} \prod_{\varepsilon \in \pi} P(Y_\varepsilon) \tag{1}$$
where $d$ and $d'$ denote the feature dimensions of the social network and the crossover network respectively, $x_{ei}$ denotes the i-th attribute value of node pair e, $E^S$ denotes the set of node pairs in the social network, and $E^C$ denotes the set of node pairs in the crossover network. $P(y_e^s \mid x_{ei}^s)$ denotes the probability of $y_e^s$ given the attribute $x_{ei}^s$ in the social network, and $P(y_e^c \mid x_{ei}^c)$ denotes the probability of $y_e^c$ given the attribute $x_{ei}^c$ in the crossover network. $P(Y_\varepsilon)$ represents the influence of a migration structure, $\Pi$ denotes the set of migration structure types, $\pi$ denotes one type of migration structure, and $\varepsilon$ denotes one migration structure of that type.
(b) Factor instantiation
In principle, the attribute-correlation feature functions and the social-relation feature functions can be instantiated in different ways. Here they are modelled with the Hammersley-Clifford theorem for Markov random fields:
$$P(y_e^s \mid x_{ei}^s) = \frac{1}{Z_1} \exp\{\alpha_i f_i(x_{ei}^s, y_e^s)\} \tag{2}$$
$$P(y_e^c \mid x_{ei}^c) = \frac{1}{Z_2} \exp\{\beta_i g_i(x_{ei}^c, y_e^c)\} \tag{3}$$
$$P(Y_\varepsilon) = \frac{1}{Z_3} \exp\{\gamma_\varepsilon h_\varepsilon(Y_\varepsilon)\} \tag{4}$$
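Here $f_i(\cdot)$, $g_i(\cdot)$ and $h_\varepsilon(\cdot)$ are the feature functions of the social network, the crossover network and the migration structure respectively, $\alpha_i$, $\beta_i$ and $\gamma_\varepsilon$ are their corresponding weights, and $Z_1$, $Z_2$ and $Z_3$ are normalization factors.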
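To make the factorization concrete, the sketch below evaluates the unnormalized logarithm of the joint distribution for a given labelling, that is, the sum of the exponents of the three factor types with the normalization terms left out. The feature functions `feature` and `triangle_feature` are hypothetical choices, and a single shared weight `gamma` replaces the per-structure $\gamma_\varepsilon$ for brevity; none of these choices is specified by the patent.

```python
def feature(x, y):
    # Hypothetical attribute feature: the attribute value counts only when the label is 1.
    return x * y

def triangle_feature(labels):
    # Hypothetical structure feature: 1 when all three edges of the triangle exist.
    return 1.0 if all(l == 1 for l in labels) else 0.0

def log_score(social, cross, triangles, alpha, beta, gamma):
    """Sum of the factor exponents for one labelling (normalization terms omitted).

    social, cross: dict mapping a node pair to (attribute list, label).
    triangles: list of 3-tuples of node pairs (migration structures).
    """
    labels = {e: y for e, (_, y) in list(social.items()) + list(cross.items())}
    s = 0.0
    for x, y in social.values():
        s += sum(a * feature(xi, y) for a, xi in zip(alpha, x))
    for x, y in cross.values():
        s += sum(b * feature(xi, y) for b, xi in zip(beta, x))
    for tri in triangles:
        s += gamma * triangle_feature([labels[e] for e in tri])
    return s

social = {("u1", "u2"): ([1.0, 0.5], 1)}
cross = {("u1", "i1"): ([2.0], 1), ("u2", "i1"): ([1.0], 0)}
triangles = [(("u1", "u2"), ("u1", "i1"), ("u2", "i1"))]
score = log_score(social, cross, triangles, alpha=[0.2, 0.4], beta=[0.5], gamma=1.0)
```

In this example the triangle contributes nothing because one of its three labels is 0; setting that label to 1 would add both the attribute term for that pair and the structure term.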
(c) Objective function optimization
Combining the above formulas gives the final objective function, in which $Z = Z_1 Z_2 Z_3$ is the normalization factor.
Using stochastic gradient descent, the gradient with respect to each parameter can be obtained. In the gradient, $E[f_i(x_{ei}^s, y_e^s)]$, $E[g_i(x_{ei}^c, y_e^c)]$ and $E[h_\varepsilon(Y_\varepsilon)]$ denote the expectations of the feature functions $f_i$, $g_i$ and $h_\varepsilon$ under the data distribution, and the corresponding expectations under $P_{\alpha,\beta,\gamma}(Y \mid X, G)$ are taken with respect to the distribution given by the estimated model.
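The text names the objective function and its gradients without writing them out. As a sketch only, assuming the usual log-likelihood objective obtained by substituting formulas (2) to (4) into formula (1) and taking the logarithm, with the normalization terms collected into a single $\log Z$, one common form is shown below. The symbol $\mathcal{O}$ and the learning rate $\eta$ are names introduced here for illustration and do not appear in the patent.

```latex
\mathcal{O}(\alpha,\beta,\gamma) = \log P(Y \mid X, G)
  = \sum_{e \in E^S} \sum_{i=1}^{d} \alpha_i f_i(x_{ei}^s, y_e^s)
  + \sum_{e \in E^C} \sum_{i=1}^{d'} \beta_i g_i(x_{ei}^c, y_e^c)
  + \sum_{\pi \in \Pi} \sum_{\varepsilon \in \pi} \gamma_\varepsilon h_\varepsilon(Y_\varepsilon)
  - \log Z

\frac{\partial \mathcal{O}}{\partial \alpha_i}
  = E\left[f_i(x_{ei}^s, y_e^s)\right]
  - E_{P_{\alpha,\beta,\gamma}(Y \mid X, G)}\left[f_i(x_{ei}^s, y_e^s)\right]

\frac{\partial \mathcal{O}}{\partial \beta_i}
  = E\left[g_i(x_{ei}^c, y_e^c)\right]
  - E_{P_{\alpha,\beta,\gamma}(Y \mid X, G)}\left[g_i(x_{ei}^c, y_e^c)\right]

\frac{\partial \mathcal{O}}{\partial \gamma_\varepsilon}
  = E\left[h_\varepsilon(Y_\varepsilon)\right]
  - E_{P_{\alpha,\beta,\gamma}(Y \mid X, G)}\left[h_\varepsilon(Y_\varepsilon)\right]

\alpha_i \leftarrow \alpha_i + \eta \, \frac{\partial \mathcal{O}}{\partial \alpha_i}
```

Each gradient is the gap between the expectation of a feature under the data and its expectation under the current model, so the updates stop changing a weight once the model reproduces the observed feature statistics. The updates for $\beta_i$ and $\gamma_\varepsilon$ take the same form as the one shown for $\alpha_i$.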
The third step, evaluation of prediction results:
Three indices are used to measure the effectiveness of the method: AUC, Precision and Ranking Score. They emphasize different aspects of prediction accuracy. AUC (area under the receiver operating characteristic curve) measures the accuracy of the algorithm as a whole. Precision considers only whether the edges ranked in the top L positions are predicted correctly. Ranking Score gives more weight to the ranking of the predicted edges.
AUC can be understood as the probability that the score of a randomly chosen edge in the test set is higher than the score of a randomly chosen nonexistent edge. That is, each time an edge is drawn at random from the test set and compared with a randomly drawn nonexistent edge: if the score of the test-set edge is larger than that of the nonexistent edge, 1 point is added; if the two scores are equal, 0.5 point is added. After n independent comparisons, if the test-set edge has the higher score n′ times and the two scores are equal n″ times, AUC is defined as:
$$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n}$$
Clearly, if all scores are generated at random, AUC is approximately 0.5. The degree to which AUC exceeds 0.5 therefore measures how much more accurate the algorithm is than random selection.
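The sampling definition of AUC above translates directly into code. The sketch below assumes the scores are held in two plain lists; the function name `auc`, the default number of comparisons and the fixed random seed are illustrative choices, not part of the patent.

```python
import random

def auc(test_scores, nonexistent_scores, n=10000, seed=0):
    """Estimate AUC from n independent random comparisons.

    Each comparison draws one test-set edge score and one nonexistent-edge
    score; a win counts 1 point and a tie counts 0.5 point.
    """
    rng = random.Random(seed)
    higher = 0  # n' in the formula
    ties = 0    # n'' in the formula
    for _ in range(n):
        s_test = rng.choice(test_scores)
        s_non = rng.choice(nonexistent_scores)
        if s_test > s_non:
            higher += 1
        elif s_test == s_non:
            ties += 1
    return (higher + 0.5 * ties) / n

perfect = auc([0.9, 0.8], [0.1, 0.2])  # every test edge outscores every nonexistent edge
all_tied = auc([0.5], [0.5])           # every comparison is a tie
```

For small score lists the same quantity can be computed exactly by comparing every test-set score with every nonexistent-edge score; sampling is used when the number of nonexistent edges is too large for that.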
Precision is defined as the proportion of correct predictions among the top L predicted edges. If m predictions are correct, that is, m of the top L ranked edges are in the test set, Precision is defined as:
$$\mathrm{Precision} = \frac{m}{L}$$
Clearly, a larger Precision means a more accurate prediction. If two algorithms have the same AUC and the Precision of algorithm 1 is larger than that of algorithm 2, algorithm 1 is better, because it tends to rank the node pairs that are truly connected nearer the top.
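Precision can be computed as in the sketch below, assuming the predicted scores of all unknown edges are stored in a dictionary keyed by edge. The function name `precision_at_l` and the sample data are hypothetical.

```python
def precision_at_l(scores, test_edges, L):
    """Fraction of the L highest-scored unknown edges that are in the test set."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:L]
    m = sum(1 for e in ranked if e in test_edges)
    return m / L

# Sample data: predicted scores of four unknown edges, two of which are test-set edges.
scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
test_edges = {"a", "c"}

p_top2 = precision_at_l(scores, test_edges, 2)
p_top3 = precision_at_l(scores, test_edges, 3)
```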
Ranking Score mainly considers the position of the test-set edges in the final ranking. Let $H = U - E^T$ be the set of unknown edges (that is, the test-set edges together with the nonexistent edges, with $U$ the set of all node pairs and $E^T$ the training set), and let $r_i$ denote the rank of unknown edge $i \in E^P$ in the ranking, where $E^P$ is the test set. The Ranking Score of this unknown edge is $RS_i = r_i / |H|$, where $|H|$ is the number of elements in H. Traversing all edges in the test set gives the Ranking Score of the system:
$$RS = \frac{1}{|E^P|} \sum_{i \in E^P} RS_i = \frac{1}{|E^P|} \sum_{i \in E^P} \frac{r_i}{|H|}.$$
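The Ranking Score can be sketched as follows, assuming the scores of all unknown edges (set H) are stored in a dictionary and that ranks start at 1 for the highest score. The function name `ranking_score` and the sample data are hypothetical.

```python
def ranking_score(scores, test_edges):
    """Average of r_i / |H| over the test-set edges.

    scores: dict mapping every unknown edge (set H) to its predicted score.
    test_edges: the test-set edges, a subset of the keys of `scores`.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    rank = {e: pos for pos, e in enumerate(ranked, start=1)}
    h = len(ranked)
    return sum(rank[e] / h for e in test_edges) / len(test_edges)

# Sample data: four unknown edges, two of which are test-set edges.
scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
rs = ranking_score(scores, {"a", "c"})
```

Under this definition a smaller Ranking Score is better, because it means the test-set edges sit nearer the top of the ranking.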
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710770816.6A CN107451703A (en) | 2017-08-31 | 2017-08-31 | A kind of social networks multitask Forecasting Methodology based on factor graph model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107451703A true CN107451703A (en) | 2017-12-08 |
Family
ID=60493405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710770816.6A Pending CN107451703A (en) | 2017-08-31 | 2017-08-31 | A kind of social networks multitask Forecasting Methodology based on factor graph model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107451703A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108399491A (en) * | 2018-02-02 | 2018-08-14 | 浙江工业大学 | A kind of employee's diversity ranking method based on network |
CN109829561A (en) * | 2018-11-15 | 2019-05-31 | 西南石油大学 | Accident forecast method based on smoothing processing Yu network model machine learning |
CN110083778A (en) * | 2019-04-08 | 2019-08-02 | 清华大学 | The figure convolutional neural networks construction method and device of study separation characterization |
CN110599358A (en) * | 2019-07-10 | 2019-12-20 | 杭州师范大学钱江学院 | Cross-social network user identity association method based on probability factor graph model |
CN111324812A (en) * | 2020-02-20 | 2020-06-23 | 深圳前海微众银行股份有限公司 | Federal recommendation method, device, equipment and medium based on transfer learning |
WO2020168851A1 (en) * | 2019-02-18 | 2020-08-27 | 北京三快在线科技有限公司 | Behavior recognition |
CN112039700A (en) * | 2020-08-26 | 2020-12-04 | 重庆理工大学 | Social network link abnormity prediction method based on stack generalization and cost sensitive learning |
CN112073227A (en) * | 2020-08-26 | 2020-12-11 | 重庆理工大学 | Social network link abnormity detection method by utilizing cascading generalization and cost sensitive learning |
CN112233734A (en) * | 2020-09-30 | 2021-01-15 | 山东大学 | Water quality data deduction acquisition method and system based on machine learning |
US20210021616A1 (en) * | 2018-03-14 | 2021-01-21 | Intelici - Cyber Defense System Ltd. | Method and system for classifying data objects based on their network footprint |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108399491A (en) * | 2018-02-02 | 2018-08-14 | 浙江工业大学 | A kind of employee's diversity ranking method based on network |
CN108399491B (en) * | 2018-02-02 | 2021-10-29 | 浙江工业大学 | Employee diversity ordering method based on network graph |
US20210021616A1 (en) * | 2018-03-14 | 2021-01-21 | Intelici - Cyber Defense System Ltd. | Method and system for classifying data objects based on their network footprint |
CN109829561A (en) * | 2018-11-15 | 2019-05-31 | 西南石油大学 | Accident forecast method based on smoothing processing Yu network model machine learning |
CN109829561B (en) * | 2018-11-15 | 2021-03-16 | 西南石油大学 | Accident prediction method based on smoothing processing and network model machine learning |
WO2020168851A1 (en) * | 2019-02-18 | 2020-08-27 | 北京三快在线科技有限公司 | Behavior recognition |
CN110083778A (en) * | 2019-04-08 | 2019-08-02 | 清华大学 | The figure convolutional neural networks construction method and device of study separation characterization |
CN110599358A (en) * | 2019-07-10 | 2019-12-20 | 杭州师范大学钱江学院 | Cross-social network user identity association method based on probability factor graph model |
CN110599358B (en) * | 2019-07-10 | 2021-05-04 | 杭州师范大学钱江学院 | Cross-social network user identity association method based on probability factor graph model |
CN111324812A (en) * | 2020-02-20 | 2020-06-23 | 深圳前海微众银行股份有限公司 | Federal recommendation method, device, equipment and medium based on transfer learning |
CN112073227A (en) * | 2020-08-26 | 2020-12-11 | 重庆理工大学 | Social network link abnormity detection method by utilizing cascading generalization and cost sensitive learning |
CN112039700A (en) * | 2020-08-26 | 2020-12-04 | 重庆理工大学 | Social network link abnormity prediction method based on stack generalization and cost sensitive learning |
CN112073227B (en) * | 2020-08-26 | 2021-11-05 | 重庆理工大学 | Social network link abnormity detection method by utilizing cascading generalization and cost sensitive learning |
CN112233734A (en) * | 2020-09-30 | 2021-01-15 | 山东大学 | Water quality data deduction acquisition method and system based on machine learning |
CN112233734B (en) * | 2020-09-30 | 2022-07-19 | 山东大学 | Water quality data deduction acquisition method and system based on machine learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107451703A (en) | A kind of social networks multitask Forecasting Methodology based on factor graph model | |
Yu et al. | Identifying critical nodes in complex networks via graph convolutional networks | |
CN105117422B (en) | Intelligent social network recommendation system | |
Zheng et al. | Understanding the tourist mobility using GPS: How similar are the tourists? | |
Park et al. | Interpretation of Bayesian neural networks for predicting the duration of detected incidents | |
CN110837602B (en) | User recommendation method based on representation learning and multi-mode convolutional neural network | |
De Winter et al. | Combining temporal aspects of dynamic networks with node2vec for a more efficient dynamic link prediction | |
CN108172301A (en) | A kind of mosquito matchmaker's epidemic Forecasting Methodology and system based on gradient boosted tree | |
Li et al. | An extended TODIM method for group decision making with the interval intuitionistic fuzzy sets | |
CN108108844A (en) | A kind of urban human method for predicting and system | |
CN106778894A (en) | A kind of method of author's cooperative relationship prediction in academic Heterogeneous Information network | |
Chen et al. | Calibrating a Land Parcel Cellular Automaton (LP-CA) for urban growth simulation based on ensemble learning | |
CN106530687B (en) | A kind of transportation network pitch point importance measuring method based on time-space attribute | |
Deng et al. | Bridge condition assessment using D numbers | |
CN109214599A (en) | The method that a kind of pair of complex network carries out link prediction | |
CN103034687B (en) | A kind of relating module recognition methodss based on 2 class heterogeneous networks | |
CN110347932A (en) | A kind of across a network user's alignment schemes based on deep learning | |
CN106952167A (en) | A kind of catering trade good friend Lian Bian influence force prediction methods based on multiple linear regression | |
CN109034960A (en) | A method of more inferred from attributes based on user node insertion | |
CN107742169A (en) | A kind of Urban Transit Network system constituting method and performance estimating method based on complex network | |
Wei et al. | STGSA: A novel spatial-temporal graph synchronous aggregation model for traffic prediction | |
Liu et al. | Modeling the interaction coupling of multi-view spatiotemporal contexts for destination prediction | |
Zhao et al. | Incorporating spatio-temporal smoothness for air quality inference | |
He et al. | Next point-of-interest recommendation via a category-aware Listwise Bayesian Personalized Ranking | |
Liao et al. | Reimagining multi-criterion decision making by data-driven methods based on machine learning: A literature review |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20171208 |