CN110134881A - A friend recommendation method and system based on multi-source information graph embedding - Google Patents
A friend recommendation method and system based on multi-source information graph embedding
- Publication number
- CN110134881A (application CN201910450218.XA, also referenced as CN201910450218A)
- Authority
- CN
- China
- Prior art keywords
- vector
- information
- attribute
- source node
- destination node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application discloses a friend recommendation method and system based on multi-source information graph embedding. The friend recommendation method comprises: obtaining source node information and destination node information; performing graph embedding on the source node information and the destination node information respectively, obtaining an embedded structure vector and an embedded attribute vector for the source node and an embedded structure vector and an embedded attribute vector for the destination node; concatenating the source node's embedded structure vector and embedded attribute vector to obtain a complete source node vector; concatenating the destination node's embedded structure vector and embedded attribute vector to obtain a complete destination node vector; concatenating the complete source node vector with the complete destination node vector and feeding the result into hidden layers for learning; and, after learning, outputting the probability that a link exists between the two nodes and pushing recommendations according to that probability. The application has the technical effect of improving the accuracy of friend recommendation.
Description
Technical field
This application relates to the field of information technology, and in particular to a friend recommendation method and system based on multi-source information graph embedding.
Background
A social network is a network formed by the communication and relationships among the members of a society. Familiar examples include WeChat, Sina Weibo, and Zhihu. On these platforms users can post their own ideas, interact by liking or commenting on the ideas of others, and make new friends through these interactions. In a social network, a node represents a user, and nodes connected by relationships form a network of enormous scale. Traditionally, link prediction estimates the probability that an edge exists between two nodes from the observed edges and node attributes: an edge between two nodes means the two users are friends, while the absence of an edge means they are not friends or their relationship is unknown. To obtain the exact relationship between two users, a major issue in link prediction research on social networks is how to represent the network information accurately. Traditional network representations generally use high-dimensional sparse vectors, but such vectors require more running time and computation space, which limits the application of statistical learning methods. In addition, nodes carry rich attribute information, for example a user's gender, age, school, zodiac sign, and location. In social network interactions, users of similar ages are more likely to become friends, and two users from the same school, or even the same department within a school, are more likely to become friends. A user's hobbies are also a key factor in friend recommendation. Although some current social network sites provide a dedicated friend recommendation module, such modules generally use collaborative filtering, i.e., they analyze the similarity between users through a user-user relation matrix and recommend new friends according to that similarity. In real life, however, the user-user interaction matrix contains a large number of missing values, and no interaction information is available when a new user joins the network, which easily causes data sparsity and new-user cold-start problems. Traditional friend recommendation modules mostly compute the similarity of two users only from structural information and ignore the rich attribute information present in the nodes. How to make full use of both structural information and attribute information to analyze users' preferences, that is, to understand what kind of people a user prefers to associate with and what kind of items a user prefers to buy, and thereby to increase the accuracy of recommendation results and improve user satisfaction, is a problem that remains to be solved.
Summary of the invention
The purpose of this application is to provide a friend recommendation method and system based on multi-source information graph embedding, which has the technical effect of using attribute information and a graph embedding algorithm to improve the accuracy of friend recommendation.
To achieve the above purpose, this application provides a friend recommendation method based on multi-source information graph embedding, comprising the following steps: obtaining source node information and destination node information; performing graph embedding on the source node information and the destination node information respectively, obtaining an embedded structure vector and an embedded attribute vector for the source node and an embedded structure vector and an embedded attribute vector for the destination node; concatenating the source node's embedded structure vector and embedded attribute vector to obtain a complete source node vector; concatenating the destination node's embedded structure vector and embedded attribute vector to obtain a complete destination node vector; concatenating the complete source node vector with the complete destination node vector and feeding the result into hidden layers for learning; and, after learning, outputting the probability that a link exists between the two nodes and pushing recommendations according to the size of that probability.
Preferably, the source node information includes source node structural information and source node attribute information; the destination node information includes destination node structural information and destination node attribute information.
Preferably, the source node structural information is converted into a binary structure input vector in one-hot form and input to the model.
Preferably, the source node attribute information is either a categorical attribute or a continuous attribute; categorical attributes are converted into binary attribute input vectors in one-hot form, and continuous attributes are converted into real-valued vectors by TF-IDF (term frequency-inverse document frequency).
Preferably, the one-hot structure input vector of the source node is embedded by a node embedding algorithm into the source node's embedded structure vector h_s; the attribute input vector or real-valued vector of the source node is embedded by a custom weight matrix W^(k) into the source node's embedded attribute vector a_s.
Preferably, the source node's embedded structure vector h_s and embedded attribute vector a_s are fed into a concatenation layer and concatenated to obtain the complete source node vector u_s, whose expression is as follows: u_s = h_s ⊕ α·a_s, where ⊕ denotes vector concatenation and α is a weight used to adjust the balance between structural information and attribute information.
Preferably, the value range of α is [0,1].
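The early-fusion expression above can be illustrated in a few lines. This is a sketch under the reconstructed notation (h_s, a_s, u_s are named after the patent's symbols; the function name is invented):

```python
import numpy as np

def fuse(h_s, a_s, alpha):
    """Complete source node vector: the structure vector concatenated with
    the attribute vector scaled by the weight alpha (early fusion)."""
    return np.concatenate([np.asarray(h_s), alpha * np.asarray(a_s)])

h_s = np.array([0.3, -0.1, 0.7])   # embedded structure vector
a_s = np.array([0.5, 0.5])         # embedded attribute vector

u_structural_only = fuse(h_s, a_s, alpha=0.0)  # attributes switched off
u_balanced = fuse(h_s, a_s, alpha=0.2)         # alpha value used in the experiments
```

With alpha = 0 the attribute half of u_s vanishes and the model degenerates to structure only, which matches the behavior described later in the experiments.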
A friend recommendation system based on multi-source information graph embedding includes a server and at least one client connected to the server. The server executes the above friend recommendation method based on multi-source information graph embedding and hosts a recommendation model; the client receives push information from the server.
Preferably, the recommendation model includes, arranged in order, an input layer, an embedding layer, a concatenation layer, a hidden layer, and an output layer. Input layer: obtains the source node information and the destination node information. Embedding layer: performs graph embedding on the source node information and the destination node information, obtaining the low-dimensional dense structure vector and low-dimensional dense attribute vector of the source node, and the low-dimensional dense structure vector and low-dimensional dense attribute vector of the destination node. Concatenation layer: concatenates the source node's low-dimensional dense structure vector and attribute vector to obtain the complete source node vector, and concatenates the destination node's low-dimensional dense structure vector and attribute vector to obtain the complete destination node vector. Hidden layer: concatenates the complete source node vector with the complete destination node vector, and analyzes and trains on the concatenated vector. Output layer: outputs a probability value according to the analysis and training, and pushes friends according to the probability value.
Preferably, the hidden layer consists of multiple sub-hidden layers.
The beneficial effects realized by this application are as follows:
(1) The friend recommendation method and system based on multi-source information graph embedding of this application apply deep learning to friend recommendation in social networks. Using both structural information and attribute information, the two kinds of information are processed by graph embedding and fused at an early stage; a deep neural network then learns the node representations of users, obtains the likelihood of two users becoming friends, and recommends friends to the user according to that likelihood, greatly improving the accuracy of friend recommendation.
(2) Through graph embedding, the friend recommendation method and system of this application convert each node in the graph network, via a mapping function, into a low-dimensional latent representation in a lower-dimensional space that preserves the graph information.
(3) The friend recommendation method and system of this application take structural information and attribute information as simultaneous inputs to the push model, where the structural information captures the structural similarity of nodes in the network and the attribute information captures attribute homogeneity; the two kinds of information complement each other and fully mine the relationships between nodes. In addition, the weight between nodes is used to represent their relationship: the larger the weight, the closer the relationship; conversely, a smaller weight indicates a more distant relationship.
(4) The friend recommendation method and system of this application process structural information and attribute information in different ways; the processed structure vector and attribute vector undergo early fusion and learning. A fusion scheme merges the two kinds of information, with their ratio adjusted by the weight α; the fused node representation is finally fed into the push model for learning, to predict the relationship between two nodes.
Brief description of the drawings
In order to more clearly illustrate the technical solutions in the embodiments of this application or in the prior art, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments recorded in this application; those of ordinary skill in the art can also obtain other drawings from these drawings.
Fig. 1 is a schematic structural diagram of an embodiment of the push model;
Fig. 2 is a line chart of the influence of the number of sub-hidden layers on the AUC value on the UNC data set;
Fig. 3 is a flow chart of an embodiment of the friend recommendation method based on multi-source information graph embedding;
Fig. 4 is a histogram of the performance of the four algorithms Common Neighbors, Adamic-Adar, Deepwalk and node2vec on the OKLAHOMA data set;
Fig. 5 is a histogram of the performance of the four algorithms Common Neighbors, Adamic-Adar, Deepwalk and node2vec on the UNC data set;
Fig. 6 is a histogram of the performance of the four algorithms Common Neighbors, Adamic-Adar, Deepwalk and node2vec on the Citeseer data set;
Fig. 7 is a line chart comparing different weights α on the data sets.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of the present invention.
This application provides a friend recommendation method and system based on multi-source information graph embedding that applies deep learning to friend recommendation in social networks. Structural information and attribute information are processed by the method of graph embedding, the two processed kinds of information are fused, a deep neural network learns the node representations of users and obtains the likelihood of becoming friends, and friends are then recommended to the user according to that likelihood, greatly improving the accuracy of friend recommendation.
As shown in Fig. 1, this application provides a friend recommendation system based on multi-source information graph embedding, including a server 1 and at least one client 2 connected to the server 1.
The server 1 executes the friend recommendation method based on multi-source information graph embedding and hosts a push model 11.
The client 2 receives push information from the server 1.
Further, the push model 11 includes, arranged in order, an input layer 111, an embedding layer 112, a concatenation layer 113, a hidden layer 114, and an output layer 115.
The input layer 111 obtains the source node information and the destination node information.
The embedding layer 112 performs graph embedding on the source node information and the destination node information, obtaining the low-dimensional dense structure vector and low-dimensional dense attribute vector of the source node, and the low-dimensional dense structure vector and low-dimensional dense attribute vector of the destination node.
The concatenation layer 113 concatenates the source node's low-dimensional dense structure vector and attribute vector to obtain the complete source node vector, and concatenates the destination node's low-dimensional dense structure vector and attribute vector to obtain the complete destination node vector.
The hidden layer 114 concatenates the complete source node vector with the complete destination node vector, and analyzes and trains on the concatenated vector.
Further, the hidden layer 114 consists of multiple sub-hidden layers 1141, and the specific number of sub-hidden layers 1141 can be set according to actual conditions.
Specifically, as one embodiment, the influence of different numbers of sub-hidden layers 1141 on the experimental results was tested. Increasing the depth of the hidden layer (neural network) 114 can increase the generalization ability of the push model 11, but when the number of sub-hidden layers 1141 in the hidden layer 114 is too large, optimization becomes difficult and the performance of the algorithm may even decrease. Fig. 2 shows the performance of the push model 11 with different numbers of sub-hidden layers 1141 on the UNC data set (the trends of the push model with different numbers of sub-hidden layers on the other data sets are similar, so only the UNC data set is analyzed): as the number of sub-hidden layers 1141 in the hidden layer 114 increases, the performance of the push model 11 improves. The experiment shows that the deep framework of the push model 11 is effective; deepening the sub-hidden layers 1141 can increase the generalization ability of the push model 11 and improve its performance. Further, since the push model 11 in the experiment uses fully connected layers, additional sub-hidden layers 1141 become harder to optimize, so this experiment is illustrated with three sub-hidden layers 1141. Without any hidden layer, the performance of the push model 11 is very poor, and each sub-hidden layer 1141 added to the hidden layer 114 gradually improves the performance of the push model 11, which shows that jointly learning the interaction of structural information and attribute information in a nonlinear way is effective.
The output layer 115 outputs a probability value according to the analysis and training, and pushes friends according to the probability value.
In conjunction with the friend recommendation system based on multi-source information graph embedding described above, this application further provides a friend recommendation method based on multi-source information graph embedding which, as shown in Fig. 3, includes the following steps:
S110: obtain the source node information and the destination node information.
Specifically, the server 1 inputs a data set to the push model 11; the data set contains the source node information and the destination node information, and the input layer 111 obtains the source node information and the destination node information directly from the data set.
Further, the source node information includes source node structural information and source node attribute information.
The source node structural information is converted into a binary structure input vector in one-hot form and input to the push model 11.
The source node attribute information is encoded as attribute input using different encoding forms; it is either a categorical attribute or a continuous attribute.
Specifically, most source node attribute information consists of categorical attributes, which are converted into binary attribute input vectors in one-hot form and input to the push model 11. Continuous attributes are converted into real-valued vectors by TF-IDF (term frequency-inverse document frequency) and input to the push model 11.
Further, the destination node information includes destination node structural information and destination node attribute information.
The destination node structural information is converted into a binary structure input vector in one-hot form and input to the push model 11.
The destination node attribute information is encoded as attribute input using different encoding forms; it is either a categorical attribute or a continuous attribute.
Specifically, most destination node attribute information consists of categorical attributes, which are converted into binary attribute input vectors in one-hot form and input to the push model 11. Continuous attributes are converted into real-valued vectors by TF-IDF (term frequency-inverse document frequency) and input to the push model 11.
Specifically, when the source node attribute information and the destination node attribute information are categorical attributes, these discrete attributes are all processed in one-hot form. For example, gender in the attribute information is a discrete attribute that can take the three values male, female, and unknown; one-hot encoding converts it into three binary features (male, female, unknown). A female user is encoded as female = {0, 1, 0}, where the 1 in the second position indicates the female gender.
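The gender example above can be written as a few lines of code. A minimal sketch, not from the patent; the helper name is illustrative:

```python
def one_hot(value, categories):
    """Convert a categorical attribute value into a binary one-hot vector."""
    return [1 if value == c else 0 for c in categories]

gender_categories = ["male", "female", "unknown"]
female_code = one_hot("female", gender_categories)    # -> [0, 1, 0]
unknown_code = one_hot("unknown", gender_categories)  # -> [0, 0, 1]
```

The vectors for all categorical attributes of a node would then be concatenated into one binary attribute input vector.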
Specifically, when the source node attribute information and the destination node attribute information are continuous attributes, they cannot be simply classified and therefore cannot be represented by the simple one-hot encoding scheme. For such continuous attributes, TF-IDF (term frequency-inverse document frequency) is used. The idea of TF-IDF is: if a word occurs frequently in one document but rarely in other documents, the word is considered representative of that document and has good discriminating power. For example, if "friend recommendation" occurs 20 times in one paper in the Citeseer data set but only 30 times in the entire data set, this paper is mainly about friend recommendation.
The TF-IDF calculation formulas are as follows:

tf_ij = n_ij / Σ_k n_kj (1)

idf_i = log( |D| / (1 + |{j : t_i ∈ d_j}|) ) (2)

TF-IDF = TF * IDF (3)

In the formulas, i denotes a word and j a document; n_ij is the number of times word i occurs in document j, and Σ_k n_kj is the total number of word occurrences in document j, so tf_ij is the frequency of word i in document j; |D| is the total number of documents; |{j : t_i ∈ d_j}| is the number of documents containing word t_i, and 1 is usually added to the denominator to avoid the case where the denominator is 0; idf_i is the inverse document frequency. TF denotes the term frequency, i.e., the value tf_ij, and IDF denotes the inverse document frequency, i.e., the value idf_i.
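Formulas (1)-(3) translate directly into code. The following is an illustrative sketch with a made-up toy corpus, not the patent's preprocessing pipeline:

```python
import math
from collections import Counter

def tf_idf(word, doc, docs):
    """tf_ij = n_ij / sum_k n_kj; idf_i = log(|D| / (1 + df_i)); score = tf * idf."""
    counts = Counter(doc)
    tf = counts[word] / sum(counts.values())          # formula (1)
    df = sum(1 for d in docs if word in d)            # documents containing the word
    idf = math.log(len(docs) / (1 + df))              # formula (2), +1 avoids /0
    return tf * idf                                   # formula (3)

docs = [
    ["friend", "recommendation", "graph", "embedding"],
    ["graph", "partitioning"],
    ["database", "indexing"],
]
score = tf_idf("friend", docs[0], docs)
```

A word such as "graph" that appears in many documents gets an idf near zero, so its score vanishes; "friend", which appears in only one document, keeps a positive score.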
Further, the data set is divided into a training set, a test set, and a validation set.
Specifically, as one embodiment, the data sets come from three real-world sources: the University of North Carolina (UNC), Oklahoma (OKLAHOMA), and CiteSeer (also known as ResearchIndex, an academic paper digital library built by NEC Research Institute on the basis of the Autonomous Citation Indexing (ACI) mechanism). The method is not limited to these three real-world data sets; data sets obtained from other sources can also serve as the initial data set of this application.
Specifically, the UNC and OKLAHOMA data sets can be obtained from the Facebook friendship networks constructed by Traud et al. Each data set consists of the student information of one school, where the student information includes a student id and student attribute information; the student attribute information includes one or more of status, gender, major, second major, dormitory, high school, and year.
Specifically, the Citeseer data set is a citation network data set of computer science publications, divided into six classes and containing 3312 nodes and 4715 edges. Each node represents a paper; the citing and/or cited relationships between papers are reflected by the edge information, and the attribute information of a node comes from the title content of the paper. Further, stop words and words that occur fewer than 10 times in the paper documents are removed from the attribute information.
Further, as one embodiment, the data set is split into training set : test set : validation set = 8 : 1 : 1.
Specifically, after the server 1 trains the created push model 11 on the training set, the test set is used to evaluate how well the push model 11 generalizes (a generalized model is one whose learned rules apply to new samples).
Further, while training the push model 11, the server 1 uses the test set to produce a test error, which is used to assess the ability of the push model 11 to handle new samples. If the push model 11 performs well only on the training set but poorly on the test set, the push model 11 is likely overfitting.
Further, after training the push model 11, the server 1 uses the validation set to determine the hyperparameters of the push model 11, so that the push model 11 can be reused.
Specifically, the validation set is used for a preliminary assessment of the ability of the push model 11. As one embodiment, the validation set of this application is a portion of the data chosen from the training set.
S120: perform graph embedding on the source node information and the destination node information respectively, and obtain the source node's embedded structure vector and embedded attribute vector and the destination node's embedded structure vector and embedded attribute vector.
Specifically, since the push model 11 is symmetric, the source node information and the destination node information are processed in the same way, so the following steps are illustrated with the push model's processing of the source node information. The Deepwalk (deep walk) algorithm was the first method to use deep learning for graph embedding. It verified experimentally that nodes in random walk sequences follow a power law, just like words in documents, and therefore applied the well-known word representation learning method Word2vec to random walk sequences to learn the representations of nodes in the network. The Deepwalk algorithm treats the nodes in the network as words, generates short sequences by random walks on the network, and feeds them as sentences into the Word2vec model for training, thereby obtaining vector representations of the nodes. The node2vec (node embedding) algorithm improves on the Deepwalk algorithm. Deepwalk generates node sequences by uniform random walks, while node2vec changes the random walk strategy and defines two parameters p and q: parameter p controls the likelihood of revisiting a node during the walk, and parameter q controls the likelihood of the walk visiting a node's further-hop neighbors. Through parameters p and q, node2vec strikes a balance between breadth-first search and depth-first search over the nodes. Therefore, the push model of this application uses the graph embedding method of the node2vec algorithm to learn the structural information of nodes: node sequences are obtained by walking on the network, the node embedding problem is treated as a word embedding problem, and the final result is that similar nodes have similar embedding vectors.
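The p/q bias described above can be sketched for a single walk. This is a simplified illustration of node2vec's second-order walk (unweighted graph, no alias-sampling optimization, invented names), not the algorithm's reference implementation:

```python
import random

def biased_walk(graph, start, length, p=1.0, q=1.0, seed=0):
    """One node2vec-style walk: weight the next step by 1/p for returning to
    the previous node, 1 for a common neighbor of the previous node, and
    1/q for moving further away (DFS-like exploration)."""
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = graph[cur]
        if not neighbors:
            break
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)          # revisit the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)              # stays at distance one
            else:
                weights.append(1.0 / q)          # moves outward
        walk.append(rng.choices(neighbors, weights=weights)[0])
    return walk

graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
walk = biased_walk(graph, start=0, length=6, p=0.5, q=2.0)
```

The generated walks would then be fed to a Word2vec-style model as sentences; a small p biases the walk toward revisiting (BFS-like behavior), a small q toward exploring outward (DFS-like behavior).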
Specifically, the node2vec algorithm was compared with several other link prediction algorithms. The comparison selected two traditional algorithms, Common Neighbors and Adamic-Adar, and two deep-learning-based algorithms, Deepwalk and node2vec. In the experiment, the weight α that adjusts structural information and attribute information was set to 0.2, the hidden layer had 3 layers, and the numbers of neurons were 512, 256, and 128. As shown in Figs. 4, 5, and 6 (where method 1 is the AA index, the method proposed by the two authors Adamic and Adar; method 2 is the common neighbors method; method 3 is the Deepwalk (deep walk) method; method 4 is the node2vec (node embedding) method; and method 5 is MIEP, the multi-source information embedding recommendation method), the performance of the different algorithms on the three data sets is as follows. The CN (Common Neighbors) algorithm and the AA (Adamic-Adar) algorithm are traditional neighbor-based link prediction algorithms that consider only the topological structure of users; AA outperforms CN mainly because, on the basis of CN, it assigns node weights that penalize high-degree nodes. Although the Deepwalk and node2vec algorithms also consider only structural information, they outperform the AA and CN algorithms because both methods use deep learning: the sequences generated by random walks are learned as sentences to obtain vector representations of the nodes, capturing deeper structural information. The node2vec algorithm improves on the Deepwalk algorithm by considering local information and global information simultaneously. None of the four baseline methods considers the attribute information of nodes, yet nodes in networks, especially social networks, contain rich attributes. This application considers structural information and attribute information at the same time and fuses the attribute information, as auxiliary information, with the structural information, alleviating the link sparsity problem in the network; therefore this application's use of the node2vec algorithm achieves the best results.
In the present application, the one-hot-encoded structure input vector of the source node information is embedded by the node2vec algorithm into a dense low-dimensional vector (i.e. the post-embedding structure vector of the source node information), which captures the structural information of the source node; the attribute input vector or real-valued vector of the source node is embedded through a self-defined weight matrix W(k) into an attribute vector that aggregates the attribute information (i.e. the post-embedding attribute vector of the source node information). The dimension of the self-defined weight matrix W(k) may be defined by the creator; commonly, the dimensionality of the source node attribute information is reduced by multiplying the attribute input vector or real-valued vector by the self-defined weight matrix.
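The attribute embedding step described above amounts to a single linear projection: multiplying the attribute input vector by a weight matrix whose output dimension the creator chooses. A hedged numpy sketch; the dimensions 6 → 3 and the random matrix are illustrative assumptions, not values from the application:

```python
import numpy as np

rng = np.random.default_rng(0)

attr_dim, embed_dim = 6, 3   # illustrative sizes only
# One-hot / TF-IDF style attribute input vector (illustrative).
x_attr = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0])
# Self-defined weight matrix W^(k); its shape fixes the reduced dimension.
W = rng.standard_normal((embed_dim, attr_dim))

# Post-embedding attribute vector: aggregates attributes and reduces dimension.
u_attr = W @ x_attr
print(u_attr.shape)  # (3,)
```

Because the input is one-hot-like, the projection simply sums the weight-matrix columns of the active attributes, which is what "aggregating attribute information" means here.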
Likewise, the structure input vector of the destination node information of the present application is embedded by the node2vec algorithm into a dense low-dimensional vector u_t^str, and the attribute input vector or real-valued vector of the destination node information is embedded through the self-defined weight matrix into the attribute vector u_t^attr that aggregates the attribute information.
S130: splicing the post-embedding structure vector and the post-embedding attribute vector of the source node information to obtain a complete source node vector; splicing the post-embedding structure vector and the post-embedding attribute vector of the destination node information to obtain a complete destination node vector.
Specifically, the embedding layer 112 feeds the post-embedding structure vector of the source node information (i.e. the dense low-dimensional vector u_s^str) and the post-embedding attribute vector u_s^attr into the splicing layer 113; the splicing layer 113 concatenates the dense low-dimensional vector u_s^str and the attribute vector u_s^attr to obtain the complete source node vector u_s (i.e. the source node representation in Fig. 1). The expression of the complete source node vector u_s is as follows:
u_s = [u_s^str ; α·u_s^attr]   (4),
where α is a weight used to adjust the balance between the structural information and the attribute information.
Further, in the push model 11 the value of α may be any non-negative real number.
Specifically, as one embodiment, the values of α set in the experiments are [0, 0.01, 0.1, 1, 10, 100]. When α = 0, the push model 11 can learn only the structural information of the source node and the destination node, and the push model 11 degenerates into a model with structural information only; when α = 100, the attribute information of the source node and the destination node plays the dominant role in the push model 11, and the effect of the structural information is no longer evident. The experiments show that when the value of α lies in [0, 1], the push model 11 obtains the best experimental results. Therefore, in the experiments the value range of α is restricted to [0, 1], with an interval of 0.2 between values. (Here data set 1 is the Citeseer data set, whose specific experimental results are shown in Fig. 7; data set 2 is the UNC (University of North Carolina) data set; data set 3 is the OKLAHOMA (University of Oklahoma) data set.) The results show that the attribute information indeed plays an important role in improving the performance of the push model 11: when α = 0, the push model 11 contains only structural information, the attribute information is ignored, and the performance of the push model 11 is the worst. This is most evident in the experiment on the Citeseer data set; when α lies in the interval from 0 to 0.2, the performance of the push model 11 is greatly improved. Since the link information of the Citeseer data set is relatively sparse, the above problem can be effectively alleviated by using the attribute information.
Specifically, the embedding layer 112 feeds the post-embedding structure vector of the destination node information (i.e. the dense low-dimensional vector u_t^str) and the post-embedding attribute vector u_t^attr into the splicing layer 113; the splicing layer 113 concatenates the dense low-dimensional vector u_t^str and the attribute vector u_t^attr to obtain the complete destination node vector u_t (i.e. the destination node representation in Fig. 1). The expression of the complete destination node vector u_t is as follows:
u_t = [u_t^str ; γ·u_t^attr]   (5),
where γ is a weight used to adjust the balance between the structural information and the attribute information. In the present application, γ in expression (5) is identical to α in expression (4).
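The splicing layer's operation can be sketched as concatenating the structure vector with the attribute vector scaled by the weight α; the vectors and the value α = 0.6 below are illustrative assumptions, not trained embeddings from the push model:

```python
import numpy as np

u_str = np.array([0.2, -0.5, 0.7])  # post-embedding structure vector (illustrative)
u_attr = np.array([1.0, 0.3])       # post-embedding attribute vector (illustrative)
alpha = 0.6                         # weight balancing structure vs. attribute information

def splice(structure_vec, attribute_vec, alpha):
    """Complete node vector: the structure part concatenated with the
    alpha-weighted attribute part."""
    return np.concatenate([structure_vec, alpha * attribute_vec])

u_s = splice(u_str, u_attr, alpha)
print(u_s.shape)  # 5-dimensional spliced vector
```

With α = 0 the attribute part of the spliced vector vanishes, which matches the degenerate structure-only model described in the experiments above.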
S140: after the complete source node vector and the complete destination node vector are spliced, the result is fed into the hidden layer for learning.
Specifically, the splicing layer 113 feeds the acquired complete source node vector u_s and complete destination node vector u_t into the hidden layer 114, where they are spliced by the first sub-hidden layer in the hidden layer 114. The expression of the spliced first sub-hidden layer h^(1) is:
h^(1) = δ_1(W^(1)[u_s ; u_t] + b^(1))   (6),
Further, the expression of the k-th sub-hidden layer h^(k) is:
h^(k) = δ_k(W^(k)h^(k-1) + b^(k))   (7),
where k indicates which sub-hidden layer it is, k = 2, …, n; W^(k) is a self-defined weight matrix, defined by the creator; b^(k) is a self-defined offset parameter; δ_k is the activation function, and the present application chooses the ReLU function as the activation function of the sub-hidden layers.
The ReLU function is specifically as follows:
ReLU(X) = max(0, X),
where max denotes choosing, element-wise, the larger value between 0 and X, and X is the input vector.
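Expressions (6) and (7) with ReLU as δ_k reduce to repeated affine transforms followed by an element-wise max(0, ·). A minimal forward pass; the random weights and layer sizes below are illustrative, not trained parameters of the push model:

```python
import numpy as np

def relu(x):
    """ReLU activation: element-wise max of 0 and the input."""
    return np.maximum(0.0, x)

def forward(h0, layers):
    """Apply sub-hidden layers h_k = relu(W_k @ h_{k-1} + b_k) in turn."""
    h = h0
    for W, b in layers:
        h = relu(W @ h + b)
    return h

rng = np.random.default_rng(1)
h0 = rng.standard_normal(4)  # spliced input vector [u_s; u_t] (illustrative size)
layers = [(rng.standard_normal((3, 4)), np.zeros(3)),
          (rng.standard_normal((2, 3)), np.zeros(2))]

out = forward(h0, layers)
print(out.shape)         # (2,)
print((out >= 0).all())  # ReLU guarantees non-negative outputs
```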
Further, in order to improve the prediction precision of the push model 11, an optimization algorithm is applied to optimize the push model 11.
Specifically, as one embodiment, in deep learning frameworks the stochastic gradient descent (SGD) method is a general-purpose neural network optimization algorithm that iteratively updates the parameters θ_j in the push model. The SGD method obtains the gradient from a single randomly selected sample and uses it to update the parameters θ_j. The loss function J(θ) is defined as follows:
J(θ) = (1/2m) Σ_{j=1}^{m} (ŷ_j − y_j)²,
where m is the sample size, ŷ_j is the predicted value, y_j is the true value, and j denotes the j-th sample.
The loss function J(θ) is used to update the value of the parameter θ_j, with the following expression:
θ_j := θ_j − β · ∂J(θ)/∂θ_j,
where β is the learning rate, defined by the creator (random values are tried in order to obtain the optimum); it is a key parameter of the stochastic gradient descent (SGD) algorithm and determines the step length of the gradient descent. Specifically, a learning rate β that is too large or too small will both affect the learning of the loss function J(θ): if β is too small, the rate at which J(θ) is minimized suffers; if β is too large, the minimum point may be overshot.
Further, the hidden layer of the push model also employs the dropout technique.
Specifically, during the learning (training) of the push model, dropout is applied at random in each sub-hidden layer (of the neural network) at a certain ratio. Dropout refers to temporarily discarding neurons of the neural network with a certain probability; this prevents the push model from depending on particular local features, thereby increasing the generalization ability of the model.
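Dropout as described above can be sketched as zeroing each hidden activation with probability p during training; the inverted-dropout variant shown here, which rescales the survivors so the expected activation is unchanged, and the rate p = 0.5 are common choices assumed for illustration, not details stated in the application:

```python
import numpy as np

def dropout(h, p, rng, training=True):
    """Inverted dropout: during training, drop each neuron with
    probability p and rescale survivors by 1/(1-p); at inference,
    return the activations unchanged."""
    if not training:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

rng = np.random.default_rng(42)
h = np.ones(10)
out = dropout(h, p=0.5, rng=rng)
print(out)  # each entry is 0.0 (dropped) or 2.0 (kept and rescaled)
```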
S150: after the learning is completed, outputting the probability that a connection exists between the nodes, and performing a push according to the magnitude of the probability.
Specifically, after the learning is completed, the probability that a link exists between the nodes (i.e. the probability that an edge exists, in Fig. 1) is output by the output layer 115 through the Sigmoid function, and information is recommended to the client 2 according to the magnitude of the probability; as one embodiment, the information pushed by the present application is a friend push.
The Sigmoid function is as follows:
Sigmoid(x) = 1 / (1 + e^(−x)),
where e is the base of the natural logarithm and x is the output of the hidden layer.
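The Sigmoid output maps the hidden-layer score to a probability in (0, 1); the score 1.2 and the 0.5 push threshold below are illustrative assumptions for the sketch:

```python
import math

def sigmoid(x):
    """Map a real-valued score to a link probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

score = 1.2            # illustrative hidden-layer output for a node pair
prob = sigmoid(score)
print(round(prob, 3))  # 0.769
print(prob > 0.5)      # push the candidate friend when the probability is high enough
```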
Further, AUC (Area Under Curve, the area under a curve) is used as the evaluation index of the link prediction. AUC literally means the area of the region under the curve, where the curve refers to the ROC curve, with FPR as the abscissa and the TPR value as the ordinate. FPR (False Positive Rate) denotes the false positive rate and TPR (True Positive Rate) denotes the true positive rate. Further, the value of AUC generally lies between 0.5 and 1 and quantifies the classification ability reflected by the ROC curve: the larger the value of AUC, the better the classification ability, the more reasonable the output probabilities, the more reasonable the final ranking results, and the more satisfactory the recommendation results are to the user. Specifically, the relevant evaluation indexes are calculated from the confusion matrix shown in Table 1, specifically as follows:
The calculation formula of FPR is:
FPR = FP / (FP + TN),
The calculation formula of TPR is:
TPR = TP / (TP + FN),
where TP (the true positives in Table 1) is the number of samples predicted to be positive whose actual result is also positive; FP (the false positives in Table 1) is the number of samples predicted to be positive whose actual result is negative; FN (the false negatives in Table 1) is the number of samples predicted to be negative whose actual result is positive; and TN (the true negatives in Table 1) is the number of samples predicted to be negative whose actual result is also negative.
Table 1
| | Actual positive | Actual negative |
---|---|---|
| Predicted positive | TP (true positive) | FP (false positive) |
| Predicted negative | FN (false negative) | TN (true negative) |
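The FPR and TPR formulas can be checked directly on confusion-matrix counts; the counts below are illustrative examples, not experimental results from the application:

```python
def fpr(fp, tn):
    """False positive rate: FP / (FP + TN)."""
    return fp / (fp + tn)

def tpr(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

# Illustrative confusion-matrix counts.
TP, FP, FN, TN = 80, 10, 20, 90

print(tpr(TP, FN))  # 0.8
print(fpr(FP, TN))  # 0.1
```

A classifier whose TPR exceeds its FPR at a given threshold, as here, contributes an ROC point above the diagonal and hence an AUC above 0.5.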
The beneficial effects achieved by the present application are as follows:
(1) The friend recommendation method and system based on multiple-information-source graph embedding of the present application apply deep learning to friend recommendation in social networks, using two kinds of information, structural information and attribute information; the two parts of information are processed by the graph embedding method and fused at an early stage, the node representations of the users are learned with a deep neural network to obtain the possibility of becoming friends, and friends are recommended to the users according to this possibility, which substantially improves the accuracy of friend recommendation.
(2) The friend recommendation method and system based on multiple-information-source graph embedding of the present application convert, through graph embedding, each node in the graph network into a low-dimensional space that preserves the graph information, using a mapping function to obtain a low-dimensional latent representation.
(3) The friend recommendation method and system based on multiple-information-source graph embedding of the present application take the structural information and the attribute information simultaneously as the input of the push model, where the structural information captures the structural similarity of the nodes in the network and the attribute information captures the attribute homogeneity; the two parts of information complement each other and fully mine the relationship between the nodes. In addition, the weight between nodes is used to indicate the relationship between the nodes: the larger the weight, the closer the relationship; conversely, a smaller weight indicates a more distant relationship.
(4) The friend recommendation method and system based on multiple-information-source graph embedding of the present application apply different processing to the structural information and the attribute information; the processed structure vector and attribute vector undergo early-stage fusion and learning, the two parts of information are merged with different fusion modes, the ratio of the two parts of information is adjusted by the weight α, and the integrated node representation is finally fed into the push model for learning, predicting the relationship between two nodes.
Although the preferred embodiments of the present application have been described, once a person skilled in the art learns of the basic inventive concept, additional changes and modifications may be made to these embodiments. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present application. Obviously, those skilled in the art can make various modifications and variations to the present application without departing from its spirit and scope. If these modifications and variations of the present application fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to encompass them.
Claims (10)
1. A friend recommendation method based on multiple-information-source graph embedding, characterized in that the steps are as follows:
obtaining source node information and destination node information;
performing graph embedding processing on the source node information and the destination node information respectively, and obtaining a post-embedding structure vector and a post-embedding attribute vector of the source node information and a post-embedding structure vector and a post-embedding attribute vector of the destination node information;
splicing the post-embedding structure vector and the post-embedding attribute vector of the source node information to obtain a complete source node vector; splicing the post-embedding structure vector and the post-embedding attribute vector of the destination node information to obtain a complete destination node vector;
after splicing the complete source node vector and the complete destination node vector, feeding the result into a hidden layer for learning;
after the learning is completed, outputting the probability that a connection exists between the nodes, and performing a push according to the magnitude of the probability.
2. The friend recommendation method based on multiple-information-source graph embedding according to claim 1, characterized in that the source node information comprises source node structural information and source node attribute information; the destination node information comprises destination node structural information and destination node attribute information.
3. The friend recommendation method based on multiple-information-source graph embedding according to claim 2, characterized in that the source node structural information is converted into a group of binary structure input vectors in the form of one-hot encoding for input.
4. The friend recommendation method based on multiple-information-source graph embedding according to claim 3, characterized in that the source node attribute information is a categorical attribute or a connection attribute; the categorical attribute is converted into a group of binary attribute input vectors in the form of one-hot encoding for input, and the connection attribute is converted into a real-valued vector for input by term frequency-inverse document frequency.
5. The friend recommendation method based on multiple-information-source graph embedding according to claim 4, characterized in that the one-hot-encoded structure input vector of the source node information is embedded into the post-embedding structure vector u_s^str of the source node information by a node embedding algorithm; the attribute input vector or real-valued vector of the source node is embedded into the post-embedding attribute vector u_s^attr of the source node information through a self-defined weight matrix W(k).
6. The friend recommendation method based on multiple-information-source graph embedding according to claim 5, characterized in that the post-embedding structure vector u_s^str and the post-embedding attribute vector u_s^attr of the source node information are fed into a splicing layer for splicing, obtaining the complete source node vector u_s, the expression of which is as follows:
u_s = [u_s^str ; α·u_s^attr],
where α is a weight used to adjust the balance between the structural information and the attribute information.
7. The friend recommendation method based on multiple-information-source graph embedding according to claim 6, characterized in that the value range of α is [0, 1].
8. A friend recommendation system based on multiple-information-source graph embedding, characterized by comprising a server and at least one client connected to the server;
wherein the server is configured to execute the friend recommendation method based on multiple-information-source graph embedding according to any one of claims 1-7, and a recommendation model resides in the server;
and the client is configured to receive the pushed information of the server.
9. The friend recommendation system based on multiple-information-source graph embedding according to claim 8, characterized in that the recommendation model comprises an input layer, an embedding layer, a splicing layer, a hidden layer and an output layer arranged in sequence;
the input layer: configured to obtain the source node information and the destination node information;
the embedding layer: configured to perform graph embedding processing on the source node information and the destination node information, obtaining a dense low-dimensional structure vector and a dense low-dimensional attribute vector of the source node information, and obtaining a dense low-dimensional structure vector and a dense low-dimensional attribute vector of the destination node information;
the splicing layer: configured to splice the dense low-dimensional structure vector and the dense low-dimensional attribute vector of the source node information to obtain the complete source node vector, and to splice the dense low-dimensional structure vector and the dense low-dimensional attribute vector of the destination node information to obtain the complete destination node vector;
the hidden layer: configured to splice together the complete source node vector and the complete destination node vector, and to analyze and train on the spliced vector;
the output layer: configured to output a probability value according to the analysis and training, and to perform a friend push according to the probability value.
10. The friend recommendation system based on multiple-information-source graph embedding according to claim 9, characterized in that the hidden layer has a plurality of sub-hidden layers.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910450218.XA CN110134881A (en) | 2019-05-28 | 2019-05-28 | A kind of friend recommendation method and system based on the insertion of multiple information sources figure |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110134881A true CN110134881A (en) | 2019-08-16 |
Family
ID=67582312
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910450218.XA Pending CN110134881A (en) | 2019-05-28 | 2019-05-28 | A kind of friend recommendation method and system based on the insertion of multiple information sources figure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110134881A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111143684A (en) * | 2019-12-30 | 2020-05-12 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based generalized model training method and device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106997373A (en) * | 2016-12-29 | 2017-08-01 | 南京邮电大学 | A kind of link prediction method based on depth confidence network |
CN107330115A (en) * | 2017-07-12 | 2017-11-07 | 广东工业大学 | A kind of information recommendation method and device |
CN108647800A (en) * | 2018-03-19 | 2018-10-12 | 浙江工业大学 | A kind of online social network user missing attribute forecast method based on node insertion |
CN108920641A (en) * | 2018-07-02 | 2018-11-30 | 北京理工大学 | A kind of information fusion personalized recommendation method |
CN109376857A (en) * | 2018-09-03 | 2019-02-22 | 上海交通大学 | A kind of multi-modal depth internet startup disk method of fusion structure and attribute information |
CN109446171A (en) * | 2017-08-30 | 2019-03-08 | 腾讯科技(深圳)有限公司 | A kind of data processing method and device |
- 2019-05-28 CN CN201910450218.XA patent/CN110134881A/en active Pending
Non-Patent Citations (2)
Title |
---|
LIZI LIAO et al.: "Attributed Social Network Embedding", JOURNAL OF LATEX CLASS FILES * |
KANG QI et al.: "Imbalanced Classification Methods in Machine Learning", 31 October 2017 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111143684A (en) * | 2019-12-30 | 2020-05-12 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based generalized model training method and device |
CN111143684B (en) * | 2019-12-30 | 2023-03-21 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based generalized model training method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yuan et al. | Jointly embedding the local and global relations of heterogeneous graph for rumor detection | |
Luo et al. | Tourism attraction selection with sentiment analysis of online reviews based on probabilistic linguistic term sets and the IDOCRIW-COCOSO model | |
CN110837602B (en) | User recommendation method based on representation learning and multi-mode convolutional neural network | |
Logesh et al. | Learning recency and inferring associations in location based social network for emotion induced point-of-interest recommendation. | |
Xu et al. | Improving user recommendation by extracting social topics and interest topics of users in uni-directional social networks | |
CN108647800B (en) | Online social network user missing attribute prediction method based on node embedding | |
CN110532379B (en) | Electronic information recommendation method based on LSTM (least Square TM) user comment sentiment analysis | |
KR101543780B1 (en) | System and method for expert search by dynamic profile and social network reliability | |
CN112966091B (en) | Knowledge map recommendation system fusing entity information and heat | |
Amine et al. | Merging deep learning model for fake news detection | |
CN111538827A (en) | Case recommendation method and device based on content and graph neural network and storage medium | |
Zhang et al. | Alleviating new user cold-start in user-based collaborative filtering via bipartite network | |
Kang et al. | LA-CTR: A limited attention collaborative topic regression for social media | |
CN111666496A (en) | Group recommendation method based on comment text | |
CN113742586B (en) | Learning resource recommendation method and system based on knowledge graph embedding | |
CN110781300A (en) | Tourism resource culture characteristic scoring algorithm based on Baidu encyclopedia knowledge graph | |
CN110134881A (en) | A kind of friend recommendation method and system based on the insertion of multiple information sources figure | |
CN116010681A (en) | Training and retrieving method and device for recall model and electronic equipment | |
CN107203632A (en) | Topic Popularity prediction method based on similarity relation and cooccurrence relation | |
Li | Context-based collective preference aggregation for prioritizing crowd opinions in social decision-making | |
Deenadayalan et al. | User Feature Similarity Supported Collaborative Filtering for Page Recommendation Using Hybrid Shuffled Frog Leaping Algorithm. | |
Xiang et al. | Interactive Web API Recommendation for Mashup Development based on Light Neural Graph Collaborative Filtering | |
Rahim et al. | A comparative study of similarity and centrality measures for friends recommendation | |
Qin et al. | Recommender resources based on acquiring user's requirement and exploring user's preference with Word2Vec model in web service | |
Koopmann et al. | CoBERT: Scientific Collaboration Prediction via Sequential Recommendation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190816 |