CN110134881A - A friend recommendation method and system based on multi-source information graph embedding - Google Patents
A friend recommendation method and system based on multi-source information graph embedding
- Publication number
- CN110134881A (application CN201910450218.XA, also referenced as CN201910450218A)
- Authority
- CN
- China
- Prior art keywords
- vector
- information
- attribute
- source node
- destination node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application discloses a friend recommendation method and system based on multi-source information graph embedding. The friend recommendation method comprises: obtaining source node information and destination node information; performing graph embedding on the source node information and the destination node information respectively, obtaining an embedded structure vector and an embedded attribute vector for the source node and an embedded structure vector and an embedded attribute vector for the destination node; concatenating the source node's embedded structure vector and embedded attribute vector to obtain a complete source node vector; concatenating the destination node's embedded structure vector and embedded attribute vector to obtain a complete destination node vector; concatenating the complete source node vector with the complete destination node vector and feeding the result into hidden layers for learning; and, after learning, outputting the probability that a link exists between the two nodes and pushing recommendations according to that probability. The application has the technical effect of improving the accuracy of friend recommendation.
Description
Technical field
This application relates to the field of information technology, and in particular to a friend recommendation method and system based on multi-source information graph embedding.
Background
A social network is a network formed by the communication and relationships among the members of a society. Familiar examples include WeChat, Sina Weibo, and Zhihu. On these platforms users can post their own ideas, interact by liking or commenting on the ideas of others, and make new friends through these interactions. In a social network, a node represents a user, and nodes connected by relationships form a network of enormous scale. Traditionally, link prediction estimates the probability that an edge exists between two nodes from the observed edges and node attributes: an edge between two nodes means the two users are friends, while the absence of an edge means they are not friends or their relationship is unknown. To obtain the exact relationship between two users, a major issue in link prediction research on social networks is how to represent the network information accurately. Traditional network representations generally use high-dimensional sparse vectors, but such vectors require more running time and computation space, which limits the application of statistical learning methods. In addition, nodes carry rich attribute information, for example a user's gender, age, school, zodiac sign, and location. In social network interactions, users of similar ages are more likely to become friends, and two users from the same school, or even the same department within a school, are more likely to become friends. A user's hobbies are also a key factor in friend recommendation. Although some current social network sites provide a dedicated friend recommendation module, such modules generally use collaborative filtering, i.e., they analyze the similarity between users through a user-user relation matrix and recommend new friends according to that similarity. In real life, however, the user-user interaction matrix contains a large number of missing values, and no interaction information is available when a new user joins the network, which easily causes data sparsity and new-user cold-start problems. Traditional friend recommendation modules mostly compute the similarity of two users only from structural information and ignore the rich attribute information present in the nodes. How to make full use of both structural information and attribute information to analyze users' preferences, that is, to understand what kind of people a user prefers to associate with and what kind of items a user prefers to buy, and thereby to increase the accuracy of recommendation results and improve user satisfaction, is a problem that remains to be solved.
Summary of the invention
The purpose of this application is to provide a friend recommendation method and system based on multi-source information graph embedding, which has the technical effect of using attribute information and a graph embedding algorithm to improve the accuracy of friend recommendation.
To achieve the above purpose, this application provides a friend recommendation method based on multi-source information graph embedding, comprising the following steps: obtaining source node information and destination node information; performing graph embedding on the source node information and the destination node information respectively, obtaining an embedded structure vector and an embedded attribute vector for the source node and an embedded structure vector and an embedded attribute vector for the destination node; concatenating the source node's embedded structure vector and embedded attribute vector to obtain a complete source node vector; concatenating the destination node's embedded structure vector and embedded attribute vector to obtain a complete destination node vector; concatenating the complete source node vector with the complete destination node vector and feeding the result into hidden layers for learning; and, after learning, outputting the probability that a link exists between the two nodes and pushing recommendations according to the size of that probability.
Preferably, the source node information includes source node structural information and source node attribute information; the destination node information includes destination node structural information and destination node attribute information.
Preferably, the source node structural information is converted into a binary structure input vector in one-hot form and input to the model.
Preferably, the source node attribute information is either a categorical attribute or a continuous attribute; categorical attributes are converted into binary attribute input vectors in one-hot form, and continuous attributes are converted into real-valued vectors by TF-IDF (term frequency-inverse document frequency).
Preferably, the one-hot structure input vector of the source node is embedded by a node embedding algorithm into the source node's embedded structure vector h_s; the attribute input vector or real-valued vector of the source node is embedded by a custom weight matrix W^(k) into the source node's embedded attribute vector a_s.
Preferably, the source node's embedded structure vector h_s and embedded attribute vector a_s are fed into a concatenation layer and concatenated to obtain the complete source node vector u_s, whose expression is as follows: u_s = h_s ⊕ α·a_s, where ⊕ denotes vector concatenation and α is a weight used to adjust the balance between structural information and attribute information.
Preferably, the value range of α is [0,1].
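The early-fusion expression above can be illustrated in a few lines. This is a sketch under the reconstructed notation (h_s, a_s, u_s are named after the patent's symbols; the function name is invented):

```python
import numpy as np

def fuse(h_s, a_s, alpha):
    """Complete source node vector: the structure vector concatenated with
    the attribute vector scaled by the weight alpha (early fusion)."""
    return np.concatenate([np.asarray(h_s), alpha * np.asarray(a_s)])

h_s = np.array([0.3, -0.1, 0.7])   # embedded structure vector
a_s = np.array([0.5, 0.5])         # embedded attribute vector

u_structural_only = fuse(h_s, a_s, alpha=0.0)  # attributes switched off
u_balanced = fuse(h_s, a_s, alpha=0.2)         # alpha value used in the experiments
```

With alpha = 0 the attribute half of u_s vanishes and the model degenerates to structure only, which matches the behavior described later in the experiments.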
A friend recommendation system based on multi-source information graph embedding includes a server and at least one client connected to the server. The server executes the above friend recommendation method based on multi-source information graph embedding and hosts a recommendation model; the client receives push information from the server.
Preferably, the recommendation model includes, arranged in order, an input layer, an embedding layer, a concatenation layer, a hidden layer, and an output layer. Input layer: obtains the source node information and the destination node information. Embedding layer: performs graph embedding on the source node information and the destination node information, obtaining the low-dimensional dense structure vector and low-dimensional dense attribute vector of the source node, and the low-dimensional dense structure vector and low-dimensional dense attribute vector of the destination node. Concatenation layer: concatenates the source node's low-dimensional dense structure vector and attribute vector to obtain the complete source node vector, and concatenates the destination node's low-dimensional dense structure vector and attribute vector to obtain the complete destination node vector. Hidden layer: concatenates the complete source node vector with the complete destination node vector, and analyzes and trains on the concatenated vector. Output layer: outputs a probability value according to the analysis and training, and pushes friends according to the probability value.
Preferably, the hidden layer consists of multiple sub-hidden layers.
The beneficial effects realized by this application are as follows:
(1) The friend recommendation method and system based on multi-source information graph embedding of this application apply deep learning to friend recommendation in social networks. Using both structural information and attribute information, the two kinds of information are processed by graph embedding and fused at an early stage; a deep neural network then learns the node representations of users, obtains the likelihood of two users becoming friends, and recommends friends to the user according to that likelihood, greatly improving the accuracy of friend recommendation.
(2) Through graph embedding, the friend recommendation method and system of this application convert each node in the graph network, via a mapping function, into a low-dimensional latent representation in a lower-dimensional space that preserves the graph information.
(3) The friend recommendation method and system of this application take structural information and attribute information as simultaneous inputs to the push model, where the structural information captures the structural similarity of nodes in the network and the attribute information captures attribute homogeneity; the two kinds of information complement each other and fully mine the relationships between nodes. In addition, the weight between nodes is used to represent their relationship: the larger the weight, the closer the relationship; conversely, a smaller weight indicates a more distant relationship.
(4) The friend recommendation method and system of this application process structural information and attribute information in different ways; the processed structure vector and attribute vector undergo early fusion and learning. A fusion scheme merges the two kinds of information, with their ratio adjusted by the weight α; the fused node representation is finally fed into the push model for learning, to predict the relationship between two nodes.
Brief description of the drawings
In order to more clearly illustrate the technical solutions in the embodiments of this application or in the prior art, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments recorded in this application; those of ordinary skill in the art can also obtain other drawings from these drawings.
Fig. 1 is a schematic structural diagram of an embodiment of the push model;
Fig. 2 is a line chart of the influence of the number of sub-hidden layers on the AUC value on the UNC data set;
Fig. 3 is a flow chart of an embodiment of the friend recommendation method based on multi-source information graph embedding;
Fig. 4 is a histogram of the performance of the four algorithms Common Neighbors, Adamic-Adar, Deepwalk and node2vec on the OKLAHOMA data set;
Fig. 5 is a histogram of the performance of the four algorithms Common Neighbors, Adamic-Adar, Deepwalk and node2vec on the UNC data set;
Fig. 6 is a histogram of the performance of the four algorithms Common Neighbors, Adamic-Adar, Deepwalk and node2vec on the Citeseer data set;
Fig. 7 is a line chart comparing different weights α on the data sets.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of the present invention.
This application provides a friend recommendation method and system based on multi-source information graph embedding that applies deep learning to friend recommendation in social networks. Structural information and attribute information are processed by the method of graph embedding, the two processed kinds of information are fused, a deep neural network learns the node representations of users and obtains the likelihood of becoming friends, and friends are then recommended to the user according to that likelihood, greatly improving the accuracy of friend recommendation.
As shown in Fig. 1, this application provides a friend recommendation system based on multi-source information graph embedding, including a server 1 and at least one client 2 connected to the server 1.
The server 1 executes the friend recommendation method based on multi-source information graph embedding and hosts a push model 11.
The client 2 receives push information from the server 1.
Further, the push model 11 includes, arranged in order, an input layer 111, an embedding layer 112, a concatenation layer 113, a hidden layer 114, and an output layer 115.
The input layer 111 obtains the source node information and the destination node information.
The embedding layer 112 performs graph embedding on the source node information and the destination node information, obtaining the low-dimensional dense structure vector and low-dimensional dense attribute vector of the source node, and the low-dimensional dense structure vector and low-dimensional dense attribute vector of the destination node.
The concatenation layer 113 concatenates the source node's low-dimensional dense structure vector and attribute vector to obtain the complete source node vector, and concatenates the destination node's low-dimensional dense structure vector and attribute vector to obtain the complete destination node vector.
The hidden layer 114 concatenates the complete source node vector with the complete destination node vector, and analyzes and trains on the concatenated vector.
Further, the hidden layer 114 consists of multiple sub-hidden layers 1141, and the specific number of sub-hidden layers 1141 can be set according to actual conditions.
Specifically, as one embodiment, the influence of different numbers of sub-hidden layers 1141 on the experimental results was tested. Increasing the depth of the hidden layer (neural network) 114 can increase the generalization ability of the push model 11, but when the number of sub-hidden layers 1141 in the hidden layer 114 is too large, optimization becomes difficult and the performance of the algorithm may even decrease. Fig. 2 shows the performance of the push model 11 with different numbers of sub-hidden layers 1141 on the UNC data set (the trends of the push model with different numbers of sub-hidden layers on the other data sets are similar, so only the UNC data set is analyzed): as the number of sub-hidden layers 1141 in the hidden layer 114 increases, the performance of the push model 11 improves. The experiment shows that the deep framework of the push model 11 is effective; deepening the sub-hidden layers 1141 can increase the generalization ability of the push model 11 and improve its performance. Further, since the push model 11 in the experiment uses fully connected layers, additional sub-hidden layers 1141 become harder to optimize, so this experiment is illustrated with three sub-hidden layers 1141. Without any hidden layer, the performance of the push model 11 is very poor, and each sub-hidden layer 1141 added to the hidden layer 114 gradually improves the performance of the push model 11, which shows that jointly learning the interaction of structural information and attribute information in a nonlinear way is effective.
The output layer 115 outputs a probability value according to the analysis and training, and pushes friends according to the probability value.
In conjunction with the friend recommendation system based on multi-source information graph embedding described above, this application further provides a friend recommendation method based on multi-source information graph embedding which, as shown in Fig. 3, includes the following steps:
S110: obtain the source node information and the destination node information.
Specifically, the server 1 inputs a data set to the push model 11; the data set contains the source node information and the destination node information, and the input layer 111 obtains the source node information and the destination node information directly from the data set.
Further, the source node information includes source node structural information and source node attribute information.
The source node structural information is converted into a binary structure input vector in one-hot form and input to the push model 11.
The source node attribute information is encoded as attribute input using different encoding forms; it is either a categorical attribute or a continuous attribute.
Specifically, most source node attribute information consists of categorical attributes, which are converted into binary attribute input vectors in one-hot form and input to the push model 11. Continuous attributes are converted into real-valued vectors by TF-IDF (term frequency-inverse document frequency) and input to the push model 11.
Further, the destination node information includes destination node structural information and destination node attribute information.
The destination node structural information is converted into a binary structure input vector in one-hot form and input to the push model 11.
The destination node attribute information is encoded as attribute input using different encoding forms; it is either a categorical attribute or a continuous attribute.
Specifically, most destination node attribute information consists of categorical attributes, which are converted into binary attribute input vectors in one-hot form and input to the push model 11. Continuous attributes are converted into real-valued vectors by TF-IDF (term frequency-inverse document frequency) and input to the push model 11.
Specifically, when the source node attribute information and the destination node attribute information are categorical attributes, these discrete attributes are all processed in one-hot form. For example, gender in the attribute information is a discrete attribute that can take the three values male, female, and unknown; one-hot encoding converts it into three binary features (male, female, unknown). A female user is encoded as female = {0, 1, 0}, where the 1 in the second position indicates the female gender.
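The gender example above can be written as a few lines of code. A minimal sketch, not from the patent; the helper name is illustrative:

```python
def one_hot(value, categories):
    """Convert a categorical attribute value into a binary one-hot vector."""
    return [1 if value == c else 0 for c in categories]

gender_categories = ["male", "female", "unknown"]
female_code = one_hot("female", gender_categories)    # -> [0, 1, 0]
unknown_code = one_hot("unknown", gender_categories)  # -> [0, 0, 1]
```

The vectors for all categorical attributes of a node would then be concatenated into one binary attribute input vector.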
Specifically, when the source node attribute information and the destination node attribute information are continuous attributes, they cannot be simply classified and therefore cannot be represented by the simple one-hot encoding scheme. For such continuous attributes, TF-IDF (term frequency-inverse document frequency) is used. The idea of TF-IDF is: if a word occurs frequently in one document but rarely in other documents, the word is considered representative of that document and has good discriminating power. For example, if "friend recommendation" occurs 20 times in one paper in the Citeseer data set but only 30 times in the entire data set, this paper is mainly about friend recommendation.
The TF-IDF calculation formulas are as follows:

tf_ij = n_ij / Σ_k n_kj (1)

idf_i = log( |D| / (1 + |{j : t_i ∈ d_j}|) ) (2)

TF-IDF = TF * IDF (3)

In the formulas, i denotes a word and j a document; n_ij is the number of times word i occurs in document j, and Σ_k n_kj is the total number of word occurrences in document j, so tf_ij is the frequency of word i in document j; |D| is the total number of documents; |{j : t_i ∈ d_j}| is the number of documents containing word t_i, and 1 is usually added to the denominator to avoid the case where the denominator is 0; idf_i is the inverse document frequency. TF denotes the term frequency, i.e., the value tf_ij, and IDF denotes the inverse document frequency, i.e., the value idf_i.
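Formulas (1)-(3) translate directly into code. The following is an illustrative sketch with a made-up toy corpus, not the patent's preprocessing pipeline:

```python
import math
from collections import Counter

def tf_idf(word, doc, docs):
    """tf_ij = n_ij / sum_k n_kj; idf_i = log(|D| / (1 + df_i)); score = tf * idf."""
    counts = Counter(doc)
    tf = counts[word] / sum(counts.values())          # formula (1)
    df = sum(1 for d in docs if word in d)            # documents containing the word
    idf = math.log(len(docs) / (1 + df))              # formula (2), +1 avoids /0
    return tf * idf                                   # formula (3)

docs = [
    ["friend", "recommendation", "graph", "embedding"],
    ["graph", "partitioning"],
    ["database", "indexing"],
]
score = tf_idf("friend", docs[0], docs)
```

A word such as "graph" that appears in many documents gets an idf near zero, so its score vanishes; "friend", which appears in only one document, keeps a positive score.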
Further, the data set is divided into a training set, a test set, and a validation set.
Specifically, as one embodiment, the data sets come from three real-world sources: the University of North Carolina (UNC), Oklahoma (OKLAHOMA), and CiteSeer (also known as ResearchIndex, an academic paper digital library built by NEC Research Institute on the basis of the Autonomous Citation Indexing (ACI) mechanism). The method is not limited to these three real-world data sets; data sets obtained from other sources can also serve as the initial data set of this application.
Specifically, the UNC and OKLAHOMA data sets can be obtained from the Facebook friendship networks constructed by Traud et al. Each data set consists of the student information of one school, where the student information includes a student id and student attribute information; the student attribute information includes one or more of status, gender, major, second major, dormitory, high school, and year.
Specifically, the Citeseer data set is a citation network data set of computer science publications, divided into six classes and containing 3312 nodes and 4715 edges. Each node represents a paper; the citing and/or cited relationships between papers are reflected by the edge information, and the attribute information of a node comes from the title content of the paper. Further, stop words and words that occur fewer than 10 times in the paper documents are removed from the attribute information.
Further, as one embodiment, the data set is split into training set : test set : validation set = 8 : 1 : 1.
Specifically, after the server 1 trains the created push model 11 on the training set, the test set is used to evaluate how well the push model 11 generalizes (a generalized model is one whose learned rules apply to new samples).
Further, while training the push model 11, the server 1 uses the test set to produce a test error, which is used to assess the ability of the push model 11 to handle new samples. If the push model 11 performs well only on the training set but poorly on the test set, the push model 11 is likely overfitting.
Further, after training the push model 11, the server 1 uses the validation set to determine the hyperparameters of the push model 11, so that the push model 11 can be reused.
Specifically, the validation set is used for a preliminary assessment of the ability of the push model 11. As one embodiment, the validation set of this application is a portion of the data chosen from the training set.
S120: perform graph embedding on the source node information and the destination node information respectively, and obtain the source node's embedded structure vector and embedded attribute vector and the destination node's embedded structure vector and embedded attribute vector.
Specifically, since the push model 11 is symmetric, the source node information and the destination node information are processed in the same way, so the following steps are illustrated with the push model's processing of the source node information. The Deepwalk (deep walk) algorithm was the first method to use deep learning for graph embedding. It verified experimentally that nodes in random walk sequences follow a power law, just like words in documents, and therefore applied the well-known word representation learning method Word2vec to random walk sequences to learn the representations of nodes in the network. The Deepwalk algorithm treats the nodes in the network as words, generates short sequences by random walks on the network, and feeds them as sentences into the Word2vec model for training, thereby obtaining vector representations of the nodes. The node2vec (node embedding) algorithm improves on the Deepwalk algorithm. Deepwalk generates node sequences by uniform random walks, while node2vec changes the random walk strategy and defines two parameters p and q: parameter p controls the likelihood of revisiting a node during the walk, and parameter q controls the likelihood of the walk visiting a node's further-hop neighbors. Through parameters p and q, node2vec strikes a balance between breadth-first search and depth-first search over the nodes. Therefore, the push model of this application uses the graph embedding method of the node2vec algorithm to learn the structural information of nodes: node sequences are obtained by walking on the network, the node embedding problem is treated as a word embedding problem, and the final result is that similar nodes have similar embedding vectors.
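The p/q bias described above can be sketched for a single walk. This is a simplified illustration of node2vec's second-order walk (unweighted graph, no alias-sampling optimization, invented names), not the algorithm's reference implementation:

```python
import random

def biased_walk(graph, start, length, p=1.0, q=1.0, seed=0):
    """One node2vec-style walk: weight the next step by 1/p for returning to
    the previous node, 1 for a common neighbor of the previous node, and
    1/q for moving further away (DFS-like exploration)."""
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = graph[cur]
        if not neighbors:
            break
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)          # revisit the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)              # stays at distance one
            else:
                weights.append(1.0 / q)          # moves outward
        walk.append(rng.choices(neighbors, weights=weights)[0])
    return walk

graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
walk = biased_walk(graph, start=0, length=6, p=0.5, q=2.0)
```

The generated walks would then be fed to a Word2vec-style model as sentences; a small p biases the walk toward revisiting (BFS-like behavior), a small q toward exploring outward (DFS-like behavior).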
Specifically, the node2vec algorithm was compared with several other link prediction algorithms. The comparison selected two traditional algorithms, Common Neighbors and Adamic-Adar, and two deep-learning-based algorithms, Deepwalk and node2vec. In the experiment, the weight α that adjusts structural information and attribute information was set to 0.2, the hidden layer had 3 layers, and the numbers of neurons were 512, 256, and 128. As shown in Figs. 4, 5, and 6 (where method 1 is the AA index, the method proposed by the two authors Adamic and Adar; method 2 is the common neighbors method; method 3 is the Deepwalk (deep walk) method; method 4 is the node2vec (node embedding) method; and method 5 is MIEP, the multi-source information embedding recommendation method), the performance of the different algorithms on the three data sets is as follows. The CN (Common Neighbors) algorithm and the AA (Adamic-Adar) algorithm are traditional neighbor-based link prediction algorithms that consider only the topological structure of users; AA outperforms CN mainly because, on the basis of CN, it assigns node weights that penalize high-degree nodes. Although the Deepwalk and node2vec algorithms also consider only structural information, they outperform the AA and CN algorithms because both methods use deep learning: the sequences generated by random walks are learned as sentences to obtain vector representations of the nodes, capturing deeper structural information. The node2vec algorithm improves on the Deepwalk algorithm by considering local information and global information simultaneously. None of the four baseline methods considers the attribute information of nodes, yet nodes in networks, especially social networks, contain rich attributes. This application considers structural information and attribute information at the same time and fuses the attribute information, as auxiliary information, with the structural information, alleviating the link sparsity problem in the network; therefore this application's use of the node2vec algorithm achieves the best results.
In the present application, the one-hot-encoded structure input vector of the source node information is embedded by the node2vec algorithm into a dense low-dimensional vector (i.e. the post-embedding structure vector of the source node information), which captures the structural information of the source node; the attribute input vector or real-valued vector of the source node is embedded through a self-defined weight matrix W(k) into an attribute vector that aggregates the attribute information (i.e. the post-embedding attribute vector of the source node information). The dimension of the self-defined weight matrix W(k) may be defined by the creator; commonly, the dimensionality of the source node attribute information is reduced by multiplying the attribute input vector or real-valued vector by the self-defined weight matrix.
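The attribute embedding step described above amounts to a single linear projection: multiplying the attribute input vector by a weight matrix whose output dimension the creator chooses. A hedged numpy sketch; the dimensions 6 → 3 and the random matrix are illustrative assumptions, not values from the application:

```python
import numpy as np

rng = np.random.default_rng(0)

attr_dim, embed_dim = 6, 3   # illustrative sizes only
# One-hot / TF-IDF style attribute input vector (illustrative).
x_attr = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0])
# Self-defined weight matrix W^(k); its shape fixes the reduced dimension.
W = rng.standard_normal((embed_dim, attr_dim))

# Post-embedding attribute vector: aggregates attributes and reduces dimension.
u_attr = W @ x_attr
print(u_attr.shape)  # (3,)
```

Because the input is one-hot-like, the projection simply sums the weight-matrix columns of the active attributes, which is what "aggregating attribute information" means here.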
Likewise, the structure input vector of the destination node information of the present application is embedded by the node2vec algorithm into a dense low-dimensional vector u_t^str, and the attribute input vector or real-valued vector of the destination node information is embedded through the self-defined weight matrix into the attribute vector u_t^attr that aggregates the attribute information.
S130: splicing the post-embedding structure vector and the post-embedding attribute vector of the source node information to obtain a complete source node vector; splicing the post-embedding structure vector and the post-embedding attribute vector of the destination node information to obtain a complete destination node vector.
Specifically, the embedding layer 112 feeds the post-embedding structure vector of the source node information (i.e. the dense low-dimensional vector u_s^str) and the post-embedding attribute vector u_s^attr into the splicing layer 113; the splicing layer 113 concatenates the dense low-dimensional vector u_s^str and the attribute vector u_s^attr to obtain the complete source node vector u_s (i.e. the source node representation in Fig. 1). The expression of the complete source node vector u_s is as follows:
u_s = [u_s^str ; α·u_s^attr]   (4),
where α is a weight used to adjust the balance between the structural information and the attribute information.
Further, in the push model 11 the value of α may be any non-negative real number.
Specifically, as one embodiment, the values of α set in the experiments are [0, 0.01, 0.1, 1, 10, 100]. When α = 0, the push model 11 can learn only the structural information of the source node and the destination node, and the push model 11 degenerates into a model with structural information only; when α = 100, the attribute information of the source node and the destination node plays the dominant role in the push model 11, and the effect of the structural information is no longer evident. The experiments show that when the value of α lies in [0, 1], the push model 11 obtains the best experimental results. Therefore, in the experiments the value range of α is restricted to [0, 1], with an interval of 0.2 between values. (Here data set 1 is the Citeseer data set, whose specific experimental results are shown in Fig. 7; data set 2 is the UNC (University of North Carolina) data set; data set 3 is the OKLAHOMA (University of Oklahoma) data set.) The results show that the attribute information indeed plays an important role in improving the performance of the push model 11: when α = 0, the push model 11 contains only structural information, the attribute information is ignored, and the performance of the push model 11 is the worst. This is most evident in the experiment on the Citeseer data set; when α lies in the interval from 0 to 0.2, the performance of the push model 11 is greatly improved. Since the link information of the Citeseer data set is relatively sparse, the above problem can be effectively alleviated by using the attribute information.
Specifically, the embedding layer 112 feeds the post-embedding structure vector of the destination node information (i.e. the dense low-dimensional vector u_t^str) and the post-embedding attribute vector u_t^attr into the splicing layer 113; the splicing layer 113 concatenates the dense low-dimensional vector u_t^str and the attribute vector u_t^attr to obtain the complete destination node vector u_t (i.e. the destination node representation in Fig. 1). The expression of the complete destination node vector u_t is as follows:
u_t = [u_t^str ; γ·u_t^attr]   (5),
where γ is a weight used to adjust the balance between the structural information and the attribute information. In the present application, γ in expression (5) is identical to α in expression (4).
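The splicing layer's operation can be sketched as concatenating the structure vector with the attribute vector scaled by the weight α; the vectors and the value α = 0.6 below are illustrative assumptions, not trained embeddings from the push model:

```python
import numpy as np

u_str = np.array([0.2, -0.5, 0.7])  # post-embedding structure vector (illustrative)
u_attr = np.array([1.0, 0.3])       # post-embedding attribute vector (illustrative)
alpha = 0.6                         # weight balancing structure vs. attribute information

def splice(structure_vec, attribute_vec, alpha):
    """Complete node vector: the structure part concatenated with the
    alpha-weighted attribute part."""
    return np.concatenate([structure_vec, alpha * attribute_vec])

u_s = splice(u_str, u_attr, alpha)
print(u_s.shape)  # 5-dimensional spliced vector
```

With α = 0 the attribute part of the spliced vector vanishes, which matches the degenerate structure-only model described in the experiments above.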
S140: after the complete source node vector and the complete destination node vector are spliced, the result is fed into the hidden layer for learning.
Specifically, the splicing layer 113 feeds the acquired complete source node vector u_s and complete destination node vector u_t into the hidden layer 114, where they are spliced by the first sub-hidden layer in the hidden layer 114. The expression of the spliced first sub-hidden layer h^(1) is:
h^(1) = δ_1(W^(1)[u_s ; u_t] + b^(1))   (6),
Further, the expression of the k-th sub-hidden layer h^(k) is:
h^(k) = δ_k(W^(k)h^(k-1) + b^(k))   (7),
where k indicates which sub-hidden layer it is, k = 2, …, n; W^(k) is a self-defined weight matrix, defined by the creator; b^(k) is a self-defined offset parameter; δ_k is the activation function, and the present application chooses the ReLU function as the activation function of the sub-hidden layers.
The ReLU function is specifically as follows:
ReLU(X) = max(0, X),
where max denotes choosing, element-wise, the larger value between 0 and X, and X is the input vector.
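Expressions (6) and (7) with ReLU as δ_k reduce to repeated affine transforms followed by an element-wise max(0, ·). A minimal forward pass; the random weights and layer sizes below are illustrative, not trained parameters of the push model:

```python
import numpy as np

def relu(x):
    """ReLU activation: element-wise max of 0 and the input."""
    return np.maximum(0.0, x)

def forward(h0, layers):
    """Apply sub-hidden layers h_k = relu(W_k @ h_{k-1} + b_k) in turn."""
    h = h0
    for W, b in layers:
        h = relu(W @ h + b)
    return h

rng = np.random.default_rng(1)
h0 = rng.standard_normal(4)  # spliced input vector [u_s; u_t] (illustrative size)
layers = [(rng.standard_normal((3, 4)), np.zeros(3)),
          (rng.standard_normal((2, 3)), np.zeros(2))]

out = forward(h0, layers)
print(out.shape)         # (2,)
print((out >= 0).all())  # ReLU guarantees non-negative outputs
```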
Further, in order to improve the prediction precision of the push model 11, an optimization algorithm is applied to optimize the push model 11.
Specifically, as one embodiment, in deep learning frameworks the stochastic gradient descent (SGD) method is a general-purpose neural network optimization algorithm that iteratively updates the parameters θ_j in the push model. The SGD method obtains the gradient from a single randomly selected sample and uses it to update the parameters θ_j. The loss function J(θ) is defined as follows:
J(θ) = (1/2m) Σ_{j=1}^{m} (ŷ_j − y_j)²,
where m is the sample size, ŷ_j is the predicted value, y_j is the true value, and j denotes the j-th sample.
The loss function J(θ) is used to update the value of the parameter θ_j, with the following expression:
θ_j := θ_j − β · ∂J(θ)/∂θ_j,
where β is the learning rate, defined by the creator (random values are tried in order to obtain the optimum); it is a key parameter of the stochastic gradient descent (SGD) algorithm and determines the step length of the gradient descent. Specifically, a learning rate β that is too large or too small will both affect the learning of the loss function J(θ): if β is too small, the rate at which J(θ) is minimized suffers; if β is too large, the minimum point may be overshot.
Further, the hidden layer of the push model also employs the dropout technique.
Specifically, during the learning (training) of the push model, dropout is applied at random in each sub-hidden layer (of the neural network) at a certain ratio. Dropout refers to temporarily discarding neurons of the neural network with a certain probability; this prevents the push model from depending on particular local features, thereby increasing the generalization ability of the model.
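Dropout as described above can be sketched as zeroing each hidden activation with probability p during training; the inverted-dropout variant shown here, which rescales the survivors so the expected activation is unchanged, and the rate p = 0.5 are common choices assumed for illustration, not details stated in the application:

```python
import numpy as np

def dropout(h, p, rng, training=True):
    """Inverted dropout: during training, drop each neuron with
    probability p and rescale survivors by 1/(1-p); at inference,
    return the activations unchanged."""
    if not training:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

rng = np.random.default_rng(42)
h = np.ones(10)
out = dropout(h, p=0.5, rng=rng)
print(out)  # each entry is 0.0 (dropped) or 2.0 (kept and rescaled)
```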
S150: after the learning is completed, outputting the probability that a connection exists between the nodes, and performing a push according to the magnitude of the probability.
Specifically, after the learning is completed, the probability that a link exists between the nodes (i.e. the probability that an edge exists, in Fig. 1) is output by the output layer 115 through the Sigmoid function, and information is recommended to the client 2 according to the magnitude of the probability; as one embodiment, the information pushed by the present application is a friend push.
The Sigmoid function is as follows:
Sigmoid(x) = 1 / (1 + e^(−x)),
where e is the base of the natural logarithm and x is the output of the hidden layer.
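The Sigmoid output maps the hidden-layer score to a probability in (0, 1); the score 1.2 and the 0.5 push threshold below are illustrative assumptions for the sketch:

```python
import math

def sigmoid(x):
    """Map a real-valued score to a link probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

score = 1.2            # illustrative hidden-layer output for a node pair
prob = sigmoid(score)
print(round(prob, 3))  # 0.769
print(prob > 0.5)      # push the candidate friend when the probability is high enough
```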
Further, AUC (Area Under Curve, the area under a curve) is used as the evaluation index of the link prediction. AUC literally means the area of the region under the curve, where the curve refers to the ROC curve, with FPR as the abscissa and the TPR value as the ordinate. FPR (False Positive Rate) denotes the false positive rate and TPR (True Positive Rate) denotes the true positive rate. Further, the value of AUC generally lies between 0.5 and 1 and quantifies the classification ability reflected by the ROC curve: the larger the value of AUC, the better the classification ability, the more reasonable the output probabilities, the more reasonable the final ranking results, and the more satisfactory the recommendation results are to the user. Specifically, the relevant evaluation indexes are calculated from the confusion matrix shown in Table 1, specifically as follows:
The calculation formula of FPR is:
FPR = FP / (FP + TN),
The calculation formula of TPR is:
TPR = TP / (TP + FN),
where TP (the true positives in Table 1) is the number of samples predicted to be positive whose actual result is also positive; FP (the false positives in Table 1) is the number of samples predicted to be positive whose actual result is negative; FN (the false negatives in Table 1) is the number of samples predicted to be negative whose actual result is positive; and TN (the true negatives in Table 1) is the number of samples predicted to be negative whose actual result is also negative.
Table 1
| | Actual positive | Actual negative |
---|---|---|
| Predicted positive | TP (true positive) | FP (false positive) |
| Predicted negative | FN (false negative) | TN (true negative) |
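The FPR and TPR formulas can be checked directly on confusion-matrix counts; the counts below are illustrative examples, not experimental results from the application:

```python
def fpr(fp, tn):
    """False positive rate: FP / (FP + TN)."""
    return fp / (fp + tn)

def tpr(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

# Illustrative confusion-matrix counts.
TP, FP, FN, TN = 80, 10, 20, 90

print(tpr(TP, FN))  # 0.8
print(fpr(FP, TN))  # 0.1
```

A classifier whose TPR exceeds its FPR at a given threshold, as here, contributes an ROC point above the diagonal and hence an AUC above 0.5.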
The beneficial effects achieved by the present application are as follows:
(1) The friend recommendation method and system based on multiple-information-source graph embedding of the present application apply deep learning to friend recommendation in social networks, using two kinds of information, structural information and attribute information; the two parts of information are processed by the graph embedding method and fused at an early stage, the node representations of the users are learned with a deep neural network to obtain the possibility of becoming friends, and friends are recommended to the users according to this possibility, which substantially improves the accuracy of friend recommendation.
(2) The friend recommendation method and system based on multiple-information-source graph embedding of the present application convert, through graph embedding, each node in the graph network into a low-dimensional space that preserves the graph information, using a mapping function to obtain a low-dimensional latent representation.
(3) The friend recommendation method and system based on multiple-information-source graph embedding of the present application take the structural information and the attribute information simultaneously as the input of the push model, where the structural information captures the structural similarity of the nodes in the network and the attribute information captures the attribute homogeneity; the two parts of information complement each other and fully mine the relationship between the nodes. In addition, the weight between nodes is used to indicate the relationship between the nodes: the larger the weight, the closer the relationship; conversely, a smaller weight indicates a more distant relationship.
(4) The friend recommendation method and system based on multiple-information-source graph embedding of the present application apply different processing to the structural information and the attribute information; the processed structure vector and attribute vector undergo early-stage fusion and learning, the two parts of information are merged with different fusion modes, the ratio of the two parts of information is adjusted by the weight α, and the integrated node representation is finally fed into the push model for learning, predicting the relationship between two nodes.
Although the preferred embodiments of the present application have been described, once a person skilled in the art learns of the basic inventive concept, additional changes and modifications may be made to these embodiments. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present application. Obviously, those skilled in the art can make various modifications and variations to the present application without departing from its spirit and scope. If these modifications and variations of the present application fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to encompass them.
Claims (10)
1. A friend recommendation method based on multiple-information-source graph embedding, characterized in that the steps are as follows:
obtaining source node information and destination node information;
performing graph embedding processing on the source node information and the destination node information respectively, and obtaining a post-embedding structure vector and a post-embedding attribute vector of the source node information and a post-embedding structure vector and a post-embedding attribute vector of the destination node information;
splicing the post-embedding structure vector and the post-embedding attribute vector of the source node information to obtain a complete source node vector; splicing the post-embedding structure vector and the post-embedding attribute vector of the destination node information to obtain a complete destination node vector;
after splicing the complete source node vector and the complete destination node vector, feeding the result into a hidden layer for learning;
after the learning is completed, outputting the probability that a connection exists between the nodes, and performing a push according to the magnitude of the probability.
2. The friend recommendation method based on multiple-information-source graph embedding according to claim 1, characterized in that the source node information comprises source node structural information and source node attribute information; the destination node information comprises destination node structural information and destination node attribute information.
3. The friend recommendation method based on multiple-information-source graph embedding according to claim 2, characterized in that the source node structural information is converted into a group of binary structure input vectors in the form of one-hot encoding for input.
4. The friend recommendation method based on multiple-information-source graph embedding according to claim 3, characterized in that the source node attribute information is a categorical attribute or a connection attribute; the categorical attribute is converted into a group of binary attribute input vectors in the form of one-hot encoding for input, and the connection attribute is converted into a real-valued vector for input by term frequency-inverse document frequency.
5. The friend recommendation method based on multiple-information-source graph embedding according to claim 4, characterized in that the one-hot-encoded structure input vector of the source node information is embedded into the post-embedding structure vector u_s^str of the source node information by a node embedding algorithm; the attribute input vector or real-valued vector of the source node is embedded into the post-embedding attribute vector u_s^attr of the source node information through a self-defined weight matrix W(k).
6. The friend recommendation method based on multiple-information-source graph embedding according to claim 5, characterized in that the post-embedding structure vector u_s^str and the post-embedding attribute vector u_s^attr of the source node information are fed into a splicing layer for splicing, obtaining the complete source node vector u_s, the expression of which is as follows:
u_s = [u_s^str ; α·u_s^attr],
where α is a weight used to adjust the balance between the structural information and the attribute information.
7. The friend recommendation method based on multiple-information-source graph embedding according to claim 6, characterized in that the value range of α is [0, 1].
8. A friend recommendation system based on multiple-information-source graph embedding, characterized by comprising a server and at least one client connected to the server;
wherein the server is configured to execute the friend recommendation method based on multiple-information-source graph embedding according to any one of claims 1-7, and a recommendation model resides in the server;
and the client is configured to receive the pushed information of the server.
9. The friend recommendation system based on multiple-information-source graph embedding according to claim 8, characterized in that the recommendation model comprises an input layer, an embedding layer, a splicing layer, a hidden layer and an output layer arranged in sequence;
the input layer: configured to obtain the source node information and the destination node information;
the embedding layer: configured to perform graph embedding processing on the source node information and the destination node information, obtaining a dense low-dimensional structure vector and a dense low-dimensional attribute vector of the source node information, and obtaining a dense low-dimensional structure vector and a dense low-dimensional attribute vector of the destination node information;
the splicing layer: configured to splice the dense low-dimensional structure vector and the dense low-dimensional attribute vector of the source node information to obtain the complete source node vector, and to splice the dense low-dimensional structure vector and the dense low-dimensional attribute vector of the destination node information to obtain the complete destination node vector;
the hidden layer: configured to splice together the complete source node vector and the complete destination node vector, and to analyze and train on the spliced vector;
the output layer: configured to output a probability value according to the analysis and training, and to perform a friend push according to the probability value.
10. The friend recommendation system based on multiple-information-source graph embedding according to claim 9, characterized in that the hidden layer has a plurality of sub-hidden layers.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910450218.XA CN110134881A (en) | 2019-05-28 | 2019-05-28 | A kind of friend recommendation method and system based on the insertion of multiple information sources figure |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110134881A true CN110134881A (en) | 2019-08-16 |
Family
ID=67582312
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910450218.XA Pending CN110134881A (en) | 2019-05-28 | 2019-05-28 | A kind of friend recommendation method and system based on the insertion of multiple information sources figure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110134881A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111143684A (en) * | 2019-12-30 | 2020-05-12 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based generalized model training method and device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106997373A (en) * | 2016-12-29 | 2017-08-01 | 南京邮电大学 | A kind of link prediction method based on depth confidence network |
CN107330115A (en) * | 2017-07-12 | 2017-11-07 | 广东工业大学 | A kind of information recommendation method and device |
CN108647800A (en) * | 2018-03-19 | 2018-10-12 | 浙江工业大学 | A kind of online social network user missing attribute forecast method based on node insertion |
CN108920641A (en) * | 2018-07-02 | 2018-11-30 | 北京理工大学 | A kind of information fusion personalized recommendation method |
CN109376857A (en) * | 2018-09-03 | 2019-02-22 | 上海交通大学 | A kind of multi-modal depth internet startup disk method of fusion structure and attribute information |
CN109446171A (en) * | 2017-08-30 | 2019-03-08 | 腾讯科技(深圳)有限公司 | A kind of data processing method and device |
- 2019-05-28 CN CN201910450218.XA patent/CN110134881A/en active Pending
Non-Patent Citations (2)
Title |
---|
LIZI LIAO et al.: "Attributed Social Network Embedding", JOURNAL OF LATEX CLASS FILES * |
KANG QI et al.: "Imbalanced Classification Methods in Machine Learning", 31 October 2017 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111143684A (en) * | 2019-12-30 | 2020-05-12 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based generalized model training method and device |
CN111143684B (en) * | 2019-12-30 | 2023-03-21 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based generalized model training method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yuan et al. | Jointly embedding the local and global relations of heterogeneous graph for rumor detection | |
Luo et al. | Tourism attraction selection with sentiment analysis of online reviews based on probabilistic linguistic term sets and the IDOCRIW-COCOSO model | |
CN110837602B (en) | User recommendation method based on representation learning and multi-mode convolutional neural network | |
Logesh et al. | Learning recency and inferring associations in location based social network for emotion induced point-of-interest recommendation. | |
Xu et al. | Improving user recommendation by extracting social topics and interest topics of users in uni-directional social networks | |
CN108647800B (en) | Online social network user missing attribute prediction method based on node embedding | |
CN110532379B (en) | Electronic information recommendation method based on LSTM (least Square TM) user comment sentiment analysis | |
KR101543780B1 (en) | System and method for expert search by dynamic profile and social network reliability | |
CN112966091B (en) | Knowledge map recommendation system fusing entity information and heat | |
Amine et al. | Merging deep learning model for fake news detection | |
CN111538827A (en) | Case recommendation method and device based on content and graph neural network and storage medium | |
Zhang et al. | Alleviating new user cold-start in user-based collaborative filtering via bipartite network | |
Kang et al. | LA-CTR: A limited attention collaborative topic regression for social media | |
CN111666496A (en) | Group recommendation method based on comment text | |
CN113742586B (en) | Learning resource recommendation method and system based on knowledge graph embedding | |
CN110781300A (en) | Tourism resource culture characteristic scoring algorithm based on Baidu encyclopedia knowledge graph | |
CN110134881A (en) | A kind of friend recommendation method and system based on the insertion of multiple information sources figure | |
CN116010681A (en) | Training and retrieving method and device for recall model and electronic equipment | |
CN107203632A (en) | Topic Popularity prediction method based on similarity relation and cooccurrence relation | |
Li | Context-based collective preference aggregation for prioritizing crowd opinions in social decision-making | |
Deenadayalan et al. | User Feature Similarity Supported Collaborative Filtering for Page Recommendation Using Hybrid Shuffled Frog Leaping Algorithm. | |
Xiang et al. | Interactive Web API Recommendation for Mashup Development based on Light Neural Graph Collaborative Filtering | |
Rahim et al. | A comparative study of similarity and centrality measures for friends recommendation | |
Qin et al. | Recommender resources based on acquiring user's requirement and exploring user's preference with Word2Vec model in web service | |
Koopmann et al. | CoBERT: Scientific Collaboration Prediction via Sequential Recommendation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190816 |