CN107122455A

CN107122455A - Microblog-based enhanced representation method for network users

Info

Publication number: CN107122455A
Application number: CN201710283853.4A
Authority: CN
Inventors: 胡玥; 贾焰; 周斌; 杨树强; 韩伟红; 李爱平; 黄九鸣; 江荣; 全拥; 邓璐; 刘强; 张涛; 童咏之; 刘心; 韩文祥
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2017-04-26
Filing date: 2017-04-26
Publication date: 2017-09-01
Anticipated expiration: 2037-04-26
Also published as: CN107122455B

Abstract

The invention discloses a microblog-based enhanced network representation method. It belongs to the field of microblog data mining and relates in particular to network representation learning for microblog data. The method takes the colloquial character of short microblog text into account and preprocesses the text accordingly, which reduces the influence of noisy data. It generates a feature representation of each user's historical posts with an LDA topic model and computes the cosine similarity between the post features of every pair of users, from which a potential friend relationship network is built. It then integrates the structural information of the original network and fuses the potential friend relationships into it to obtain a corrected network structure. By correcting the original network topology with the potential friend relationship network extracted from user-generated text, the invention obtains more accurate feature representations of microblog user nodes. Compared with network representation learning methods that consider only network structure, accuracy on the two tasks of gender and age inference is significantly improved.

Description

Microblog-based enhanced representation method for network users

Technical field

The invention belongs to the field of microblog data mining and relates in particular to a network representation learning method for microblog data.

Background technology

In the Web 2.0 era the Internet is steadily developing into a ubiquitous platform for spreading information, and new social media built on social networking services (SNS), such as Twitter and microblogs, have quickly won public favor. Recent statistics show that Twitter has reached 310 million monthly active users and Sina Weibo 297 million. People express opinions, share information, and interact through social media, and social media spread and diffuse messages through social networks, with profound influence on politics, the economy, culture, education, and other fields. The huge scale, varied forms, complexity, and dynamic change of online social network data, together with the far-reaching steering effect of trending public opinion, therefore give online social network analysis important research value. Taking Sina Weibo as an example, a user can publish original posts of up to 140 characters, which may contain pictures, hyperlinks, video, audio, and other forms, and can also browse, forward, and comment on the posts of followed friends. Microblog data are multi-source and heterogeneous: user-generated text, user attribute lists, and network topology are all important data sources, so fusing multi-source microblog information to compute the feature representation of a user node becomes essential.

Representation learning is an important research problem in machine learning: it obtains effective feature representations by automatically learning a transformation from the raw input data to a new feature representation. Network representation learning learns feature representations of network nodes in a low-dimensional space, achieving both quantified features and dimensionality reduction.

Many research results have already appeared in network representation learning. Traditional manifold learning methods recover a low-dimensional manifold structure from high-dimensional data, that is, they find a low-dimensional embedding of high-dimensional network data. For example, the Isomap algorithm builds on the MDS framework and uses the geodesic distance between any two points as the geometric description of the manifold. The LLE algorithm (Locally Linear Embedding) assumes that a manifold is approximately linear within a very small local neighborhood and uses the coefficients of this linear fit to characterize the local geometry of the manifold. The basic idea of the LE algorithm (Laplacian Eigenmaps) is to describe a manifold with an undirected weighted graph and then find the low-dimensional representation by graph embedding, that is, to redraw the graph in the low-dimensional space while preserving its local adjacency relations.

In recent years deep learning has offered new ideas for network representation learning, and network representation models based on deep learning keep emerging for large-scale network structure data and rich node information.

Inspired by the word2vec model, the DeepWalk model considers only the topology of the network. Nodes in the network correspond to words in a corpus and node sequences correspond to sentences. DeepWalk produces its input sequences by random walks and then models the sequences with the Skip-gram model to obtain vector representations of the nodes. However, DeepWalk does not define an explicit objective function, cannot learn node representations for weighted directed graphs, and, because its node sequences are generated at random, is strongly affected by noise.
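The random-walk step described above can be sketched as follows. This is a minimal illustration, not the DeepWalk reference implementation: the function name `random_walks`, its parameters, and the toy graph are assumptions made for the example, and the Skip-gram training that would consume the walks is omitted.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate truncated random walks over an unweighted graph.

    adjacency maps each node to a list of its neighbours (hypothetical format).
    Each walk plays the role of a "sentence" whose "words" are node ids.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)  # visit start nodes in random order on every pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:  # isolated node: the walk cannot continue
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Toy graph; node "d" is isolated, which illustrates the sparsity problem.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": []}
walks = random_walks(graph, walk_length=4, walks_per_node=2)
```

Note that the isolated node `d` only ever yields the one-element walk `["d"]`, so a structure-only model learns nothing useful about it.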

The LINE model considers both the first-order and the second-order similarity of the network topology. First-order similarity is the pairwise similarity between two nodes in the network, given by the weight of the edge between them. Second-order similarity rests on the assumption that nodes sharing similar neighbors tend to be similar, and is characterized by the common neighbors of two nodes. After the models based on first-order and second-order similarity are built, node representations are obtained with edge-based negative sampling. The GraRep model considers higher-order similarity information: it models the local information of each order separately and obtains node vector representations by SVD matrix factorization, which suits large-scale network structures.

To address the randomness of DeepWalk's node sequence generation, the node2vec model improves the way neighbor nodes are found. It holds that nodes in a network exhibit content similarity and structural similarity. Content similarity is mainly the similarity between adjacent nodes, and neighbors with this homogeneity are found by breadth-first search. Structurally similar nodes are not necessarily adjacent, and neighbors with structural homogeneity are found by depth-first search. The Skip-gram method is then applied to the resulting node sequences to extract node vector representations.

The studies above all start from network structure, but an online social network such as Sina Weibo is more than a network topology: its nodes also carry a large amount of information in other forms. Given the diversity of node information, the TADW (Text-Associated DeepWalk) method uses an inductive matrix completion algorithm to model text features and network structure together and obtains better node representations. The GENE model observes that users of online social networks can create groups and join groups created by others, and that nodes in the same group may be related even without a direct edge, so it brings group information into network representation learning. The Multi-faceted Representations model considers three kinds of information, namely user-generated text, node attributes, and network topology, to obtain more faithful node representations.

However, real-world networks are usually sparse, that is, very few pairs of nodes are directly connected, so it is hard to learn an accurate network representation from the limited initial structure alone. For users of an online social network, similar features in their generated text can indicate shared interests and hence a possible potential friend relationship. Existing research has not yet extended the network topology with the text information of nodes in order to improve network representation learning.

Summary of the invention

Addressing the sparsity of network structure and building on the assumption above, the invention establishes an enhanced representation learning method for network users that incorporates user-generated text, and uses the resulting user feature representations to perform gender and age inference.

The invention is implemented in the following steps:

Step 1: Preprocess the user-generated posts with reference to existing methods for processing short microblog text, so as to eliminate the influence of noisy data;

Step 2: Using natural language processing techniques, generate feature vectors for the preprocessed post text of each user, compute the similarity between post vectors with a similarity measure, extract potential friend relationships from the user-generated text, and build a potential friend relationship network;

Step 3: Considering the first-order and second-order similarity of the network structure, integrate the original network structure information and expand the original microblog topological relationship network;

Step 4: Fuse the potential friend relationship network extracted from the post information into the integrated network topology to correct the original network structure. There are two kinds of correction: adding some edges and increasing the weights of some existing edges;

Step 5: Learn the feature representations of users in the enhanced microblog network with an existing network representation learning technique;

Step 6: To compare the representation vectors of the enhanced network with those of the original network, apply the learned representations to gender and age inference tasks and compare the inference accuracy with that of the baseline method.

Compared with the prior art, the advantage of the invention is as follows. To address the sparsity of the network topology, the invention builds on the fact that two users who publish similar posts in an online social network have similar interests, and proposes an enhanced network representation learning method that incorporates user-generated text. The method characterizes the users of an online social network more accurately and improves the accuracy of attribute inference for microblog users.

Brief description of the drawings

Fig. 1 is a flow chart of the enhanced network representation method that incorporates user-generated text.

Fig. 2 is a schematic diagram of the enhanced network representation in the embodiment of the invention.

Fig. 3 is a distribution map of the text features extracted by LDA in the embodiment of the invention.

Fig. 4 is a visualization of the potential network structure extracted from user-generated text in the embodiment of the invention.

Fig. 5 is a visualization of the enhanced network topology in the embodiment of the invention.

Fig. 6 is a comparison chart of the experimental results of the age inference task in the embodiment of the invention.

Detailed description of the embodiments

Addressing the sparsity of network structure and building on the assumption above, the invention establishes an enhanced representation learning method for network users that incorporates user-generated text, and uses the resulting user feature representations to perform gender and age inference.

The invention is described below with reference to the drawings and a specific embodiment. First, the following formal definitions are given:

In the social network, each node corresponds to a user and is associated with a large amount of text, namely the user's historical posts. Let G denote the network, G = (V, E, T), where V = {v_i} is the set of user nodes, E = {(v_i, v_j)} is the binary edge set in which every edge has a weight w ∈ {0, 1}, and T = {t_i} is the set of posts generated by the users. The goal of the invention is to capture text features from the user-generated posts, use them to correct the original network, and learn a low-dimensional representation of each node in the corrected network G''.

Preprocessing of short microblog text. Sina Weibo posts are short texts of no more than 140 characters. First, the historical posts of each user are merged into one text passage. Because of the colloquial style of posts, microblog text contains a great deal of noise. Preprocessing removes this noise by filtering stop words, replacing abnormal words, segmenting words, and similar operations, which makes text features easier to extract. The specific preprocessing operations applied to microblog text are as follows:

1) In Sina Weibo the text between two "#" characters is the topic of the post and reflects the user's interests, so it is extracted directly as a keyword without further segmentation;

2) "@" mentions a user, so the text after "@" is a user nickname and is not segmented further;

3) Filter out punctuation marks and other special characters from the original text;

4) Replace all abnormal words in the text by looking them up in an abnormal-word table. Abnormal words are widely accepted Internet slang commonly used by netizens, including abbreviations and split words. For example, "3Q" or "3q" is used for "thank you", and for certain purposes "和谐" ("harmony") may be split into "禾口言皆";

5) Replace all traditional Chinese characters with the corresponding simplified characters by looking them up in a traditional-simplified table;

6) Segment the remaining microblog text with the HanLP word segmentation tool;

7) Filter out the stop words listed in a stop-word table;

8) Compute the TF-IDF values of all words and filter out the low-frequency words.
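The preprocessing operations listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the lookup tables `ABNORMAL_WORDS` and `STOP_WORDS` are tiny hypothetical stand-ins for the real vocabularies, the `segment` argument defaults to whitespace splitting as a placeholder for the HanLP segmenter, and the traditional-to-simplified conversion and the TF-IDF filter (operations 5 and 8) are left out.

```python
import re

# Hypothetical lookup tables; the actual vocabularies are not given in the text.
ABNORMAL_WORDS = {"3Q": "谢谢", "3q": "谢谢", "禾口言皆": "和谐"}
STOP_WORDS = {"的", "了", "是"}

TOPIC_RE = re.compile(r"#([^#]+)#")     # text between two "#" characters
MENTION_RE = re.compile(r"@([\w\-]+)")  # nickname following "@"
PUNCT_RE = re.compile(r"[^\w\s]")       # anything that is not a word char or space


def preprocess(post, segment=str.split):
    """Return the keyword list for one post.

    segment is a placeholder for a real word segmenter; whitespace splitting
    is an assumption made so that the sketch has no external dependency.
    """
    # 1) topics are kept whole as keywords
    topics = TOPIC_RE.findall(post)
    rest = TOPIC_RE.sub(" ", post)
    # 2) mentioned nicknames are kept whole
    mentions = MENTION_RE.findall(rest)
    rest = MENTION_RE.sub(" ", rest)
    # 3) drop punctuation and other special characters
    rest = PUNCT_RE.sub(" ", rest)
    # 4) replace abnormal words with their standard form
    for slang, standard in ABNORMAL_WORDS.items():
        rest = rest.replace(slang, standard)
    # 6) segment, then 7) filter stop words
    tokens = [t for t in segment(rest) if t not in STOP_WORDS]
    return topics + mentions + tokens


keywords = preprocess("#世界杯# @张三 3Q 的 支持！")
```

Passing a real segmenter as `segment` keeps the rest of the pipeline unchanged, which is why the sketch takes it as a parameter rather than hard-coding a library call.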

Extraction of potential friend relationships from user-generated text. Similar posts can reflect interests that users share; in other words, users whose posts are similar are fairly likely to be potential friends. The user relationships extracted from user-generated text are therefore called potential friend relationships.

Extracting potential friend relationships essentially reduces to a text similarity computation. First, an LDA topic model generates a feature vector for each user's microblog text. Then the cosine similarity between the post vectors of every two users is computed and used as the weight of the corresponding potential relationship edge, which yields the potential friend relationship network.

LDA is a generative probabilistic model with three levels: documents, topics, and words. A document is regarded as a random mixture of K latent topics, where each topic follows a multinomial distribution over words and each document follows a multinomial distribution over the K topics. For every document in the corpus, the generative process is as follows:

1) For each document M_i, draw θ ~ Dir(α), where Dir(α) is the Dirichlet distribution with parameter α and θ is a topic vector whose elements give the probability of each topic appearing in the document;

2) For the j-th word w_ij in the i-th document, choose a latent topic z_i from the topic vector θ according to the conditional probability p(z_i | θ), then generate the word w_j according to the conditional probability p(w_j | z_i, β);

3) Given the parameters α and β, the joint distribution of the model is

$$p(\theta, z, w \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{i=1}^{N} p(z_i \mid \theta)\, p(w_j \mid z_i, \beta)$$

where w is the observed variable and θ is the latent variable. The parameters α and β are then learned with the expectation-maximization (EM) algorithm.
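To make the generative process above concrete, the following sketch samples one synthetic document from it. It only illustrates the forward (generative) direction, not the EM parameter learning, and every name in it (`sample_dirichlet`, `generate_document`, the toy vocabulary and the values of `alpha` and `beta`) is an assumption made for the example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dir(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, beta, vocabulary, n_words, rng):
    """Sample one document.

    alpha: K Dirichlet parameters (one per topic).
    beta:  K rows, each a probability distribution over the vocabulary.
    """
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    words = []
    for _ in range(n_words):
        # choose a latent topic z with probability p(z | theta)
        z = rng.choices(range(len(alpha)), weights=theta)[0]
        # generate a word with probability p(w | z, beta)
        words.append(rng.choices(vocabulary, weights=beta[z])[0])
    return theta, words


rng = random.Random(42)
vocabulary = ["football", "goal", "movie", "actor"]
beta = [[0.5, 0.5, 0.0, 0.0],   # hypothetical "sports" topic
        [0.0, 0.0, 0.5, 0.5]]   # hypothetical "film" topic
theta, words = generate_document([0.5, 0.5], beta, vocabulary, 10, rng)
```

Inference runs this process in reverse: given only the observed words, it estimates the topic mixture θ of each document, and that mixture is what serves as the user's text feature vector.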

Assuming the top T topics are retained, each text passage is embedded as a T-dimensional vector whose i-th component w_i is the weight of the i-th topic, that is, the likelihood that the text generated by the user belongs to the i-th topic. Fig. 2 is the distribution map of the text features: for the text generated by each user, the top three topics are selected and the corresponding weights are used as coordinates on three axes, so that each point corresponds to the vector representation of one text.

Finally, each feature vector represents the topics associated with the text generated by a user, in other words the interests extracted from the posts the user has published. Cosine similarity is then used to extract potential friend relationships from these representation vectors; other similarity functions could of course be used to compute the similarity between vectors. Given the representation vectors of two users v_i and v_j, the potential friend relationship between them is defined as the cosine similarity of the two vectors.

The potential adjacency matrix extracted from user-generated text can therefore be described as a matrix W' in which each element w'_ij ∈ [0, 1].
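As an illustration of this step, the sketch below computes the pairwise cosine similarity of topic vectors and collects the values in a symmetric matrix. The function names, the example vectors, and the choice to leave the diagonal at zero (no self-loops) are assumptions for the example. For non-negative topic weights the cosine x·y / (‖x‖‖y‖) always lies in [0, 1], which matches the stated range of the matrix elements.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def potential_adjacency(topic_vectors):
    """Build the symmetric matrix of pairwise similarities.

    The diagonal is left at 0.0, i.e. self-loops are excluded (an assumption).
    """
    n = len(topic_vectors)
    w_prime = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine_similarity(topic_vectors[i], topic_vectors[j])
            w_prime[i][j] = w_prime[j][i] = s
    return w_prime


# Hypothetical topic vectors for three users (T = 3 topics).
topic_vectors = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.0, 0.1, 0.9]]
W_prime = potential_adjacency(topic_vectors)
```

Users 0 and 1 concentrate on the same topic and receive a weight close to 1, while user 2 writes about a different topic and is only weakly linked to either of them.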

Integration of the original network structure information. Real-world social networks are usually sparse, because only some users follow one another directly. Moreover, direct friend relationships are generally added by users according to their own preferences, so direct follow relationships play an important role in network embedding problems that consider only network structure. Direct friend relationships, however, are not enough to describe the whole network structure: two people who are not friends may still share some features. In fact, two users in a social network who have common friends tend to have the same interests and features.

LINE takes both facts into account and was the first to propose the concepts of first-order and second-order similarity, which together capture the local and global information of the network structure.

1) First-order similarity:

Given the edge set E, the first-order similarity of each node pair in it is the weight of the corresponding edge. The elements of the first-order similarity matrix W¹ are defined accordingly.

2) Second-order similarity:

The number of common neighbors of a node pair defines the second-order similarity, which describes how similar the neighborhood structures of two users are in the social network. Given the neighbor sets of users v_i and v_j, the number of their common friends is counted and the second-order similarity is defined from it.

The first-order and second-order similarities are now fused into the adjacency matrix extracted from the network structure. To this end a matrix W is introduced to denote the integrated adjacency matrix, each element of which is composed of the two similarity values,

where λ and μ are normalization coefficients whose values are determined by repeated experimental tuning.
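The integration can be sketched as below. The exact formulas are not reproduced in this text, so the sketch makes three loudly stated assumptions: edges are treated as undirected, second-order similarity is the raw count of common neighbors, and the two similarities are combined linearly as λ·w¹_ij + μ·w²_ij. The function names and the toy edge list are likewise hypothetical.

```python
def first_order(n, edges):
    """First-order similarity: the (binary) weight of the edge itself."""
    w1 = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w1[i][j] = w1[j][i] = 1.0  # assumption: edges are undirected
    return w1


def second_order(w1):
    """Second-order similarity: number of common neighbours (assumed form)."""
    n = len(w1)
    neighbours = [{j for j in range(n) if w1[i][j] > 0} for i in range(n)]
    w2 = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w2[i][j] = w2[j][i] = float(len(neighbours[i] & neighbours[j]))
    return w2


def integrate(w1, w2, lam, mu):
    """Assumed linear combination of the two similarity matrices."""
    n = len(w1)
    return [[lam * w1[i][j] + mu * w2[i][j] for j in range(n)] for i in range(n)]


edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
W1 = first_order(4, edges)
W2 = second_order(W1)
W = integrate(W1, W2, lam=1.0, mu=0.5)
```

In this toy graph users 0 and 3 are not directly connected but share neighbor 2, so the integrated matrix gives them a non-zero weight, which is the densifying effect the integration step is meant to achieve.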

Correcting the network structure with potential friend relationships. The network structure is first corrected with the potential friend relationships extracted from text, and the LINE model then learns the latent representation of the extended network structure. The extension brings two kinds of change: first, a weight goes from absent to present, that is, from 0 to 1; second, an existing weight grows. As shown in Fig. 1, the subgraph of gray nodes is the original network structure, in which the colored nodes are isolated, that is, they have no relation to the other nodes of the network. After the network structure is corrected with the potential friend relationships, the newly created dashed edges are new friend relationships extracted from microblog text, and the bold solid edges are edges of the original network structure whose weights have increased, that is, friend relationships that have been strengthened. Fig. 3 and Fig. 4 show the microblog friend relationship topology before and after the correction of the network structure, respectively.

Let W'' be the adjacency matrix of the corrected network, with elements w''_ij obtained by fusing the potential friend relationships into the integrated network.

Some elements of the corrected adjacency matrix are too small, so a threshold is set and all elements below it are deleted. The final corrected adjacency matrix is then used as the input of LINE to compute the low-dimensional representation. LINE first introduces first-order and second-order similarity, learns a vector representation of each node under each of the two, and then describes how to merge the two vector representations into one final node representation.
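The correction and thresholding can be illustrated with the sketch below. The fusion formula is not reproduced in this text, so the element-wise sum used here is purely an assumption chosen because it produces both effects described above (new edges appear and existing edges gain weight). The function name, the example matrices, and the threshold value are hypothetical.

```python
def correct_network(w, w_prime, threshold):
    """Fuse the integrated matrix w with the potential matrix w_prime.

    Assumption: fusion is an element-wise sum. Entries below `threshold`
    are deleted (set to 0.0), as are self-loops.
    """
    n = len(w)
    w_corrected = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            value = w[i][j] + w_prime[i][j]
            if value >= threshold:
                w_corrected[i][j] = value
    return w_corrected


# Users 0 and 1 are already friends; user 2 is isolated in the original network.
W = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]
W_prime = [[0.0, 0.9, 0.6],
           [0.9, 0.0, 0.05],
           [0.6, 0.05, 0.0]]
W_corrected = correct_network(W, W_prime, threshold=0.1)
```

Here the existing edge between users 0 and 1 is strengthened, a new edge connects the previously isolated user 2 to user 0, and the very weak link between users 1 and 2 is pruned by the threshold.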

In essence, first-order similarity is the weight of the edge between a pair of nodes. To model it, LINE builds an empirical probability directly from the edge weights, constructs a joint probability from the representation vectors, and uses the K-L divergence to measure the discrepancy between the empirical and joint probabilities, which gives the objective function. A similar objective function is built for second-order similarity. Negative sampling is used to optimize both, giving a node vector representation under each similarity. Finally the two vectors are simply concatenated to obtain the final network representation.
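For reference, the two objectives described above can be written out following the published LINE formulation. The symbols are assumptions introduced for this illustration: u_i denotes the vector of node v_i, u'_j the context vector of node v_j, and w_ij the edge weight, which in this method would be the corresponding element of the corrected adjacency matrix.

```latex
% First-order similarity: joint probability, empirical probability, objective
\[
p_1(v_i, v_j) = \frac{1}{1 + \exp\!\left(-\vec{u}_i^{\top}\vec{u}_j\right)},
\qquad
\hat{p}_1(v_i, v_j) = \frac{w_{ij}}{\sum_{(m,n)\in E} w_{mn}}
\]
\[
O_1 = -\sum_{(i,j)\in E} w_{ij}\,\log p_1(v_i, v_j)
\]

% Second-order similarity: conditional probability of a context node, objective
\[
p_2(v_j \mid v_i) =
\frac{\exp\!\left(\vec{u}_j'^{\top}\vec{u}_i\right)}
     {\sum_{k=1}^{|V|}\exp\!\left(\vec{u}_k'^{\top}\vec{u}_i\right)},
\qquad
O_2 = -\sum_{(i,j)\in E} w_{ij}\,\log p_2(v_j \mid v_i)
\]
```

Both objectives result from minimizing the K-L divergence between the empirical and model distributions after dropping constant terms. The softmax denominator in p_2 is what makes negative sampling attractive, since it avoids summing over all nodes for every edge.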

Gender inference for microblog users can be treated as a supervised binary classification problem based on user feature representations. A linear-kernel SVM is used, with the final representation vectors as the extracted features for training a gender classifier. The results of the baseline method are shown in Table 1 and those of the method of the invention in Table 2.

Table 1. Experimental results of the gender inference task (baseline method)

Table 2. Experimental results of the gender inference task (method of the invention)

As the tables show, average accuracy improves by about 4 percentage points. Moreover, accuracy increases with the sample size of the test set. One explanation is that the more training samples there are, the more accurate the classifier obtained by SVM training.

Age inference is a supervised multi-class classification problem. To infer the age of test samples more accurately, users' ages are divided into 4 intervals according to the distribution of birth dates in the user profiles. The statistics show that most users are young people between 18 and 30 years old. Age is then inferred with two SVM extension schemes, "one-vs-one" and "one-vs-rest". The results are shown in Table 3 and Table 4.

Table 3. Experimental results of the age inference task (baseline method)

Table 4. Experimental results of the age inference task (method of the invention)

In both tables, the first row of accuracies gives the age inference results of the SVM classifier extended with the "one-vs-one" scheme, and the second row gives those of the SVM classifier extended with the "one-vs-rest" scheme. The data show that the representation vectors obtained by the enhanced network representation classify much better than those obtained by the baseline scheme. For example, when the corresponding percentage is about 10%, the accuracy of the first extension scheme rises from 69.03% to 76.25%. Fig. 6 plots the comparison curves for age inference and shows that the vector representations obtained by the enhanced network representation do give better classification results than those of the baseline method.

In summary, to address the sparsity of real-world online social networks, and building on the fact that two users who publish similar posts are potential friends, an enhanced network representation learning method that fuses node text information is proposed. Specifically, the potential friend relationship network extracted from user-generated text is used to correct the original network topology, which yields more accurate node representations. Compared with network representation learning that considers only network topology, accuracy on the two tasks of gender and age inference is significantly improved.

The microblog-based enhanced network representation method proposed by the invention therefore has significant practical value for the feature representation of network users and for subsequent classification and inference tasks.

To illustrate the content and implementation of the invention, this specification gives a specific embodiment. The details introduced in the embodiment are not intended to limit the scope of the claims but to aid understanding of the method of the invention. Those skilled in the art should understand that various modifications, changes, or substitutions of the steps of the preferred embodiment are possible without departing from the spirit and scope of the invention and its appended claims. The invention should therefore not be limited to what is disclosed in the preferred embodiment and the drawings.

Claims

1. A microblog-based enhanced representation method for network users, characterized by comprising the following steps:

Step 1: Preprocess the user-generated posts with reference to existing methods for processing short microblog text, so as to eliminate the influence of noisy data;

Step 2: Using natural language processing techniques, generate feature vectors for the preprocessed post text of each user, compute the similarity between post vectors with a similarity measure, extract potential friend relationships from the user-generated text, and build a potential friend relationship network;

Step 3: Considering the first-order and second-order similarity of the network structure, integrate the original network structure information and expand the topological relationship network between microblog users;

Step 4: Fuse the potential friend relationship network extracted from the post information into the integrated network topology to correct the original network structure, the correction comprising two kinds: adding some potential edges and increasing the weights of some existing edges;

Step 6: To compare the representation vectors of the enhanced network with those of the original network, apply the learned representations to gender and age inference tasks and compare the inference accuracy with that of the baseline method.

2. The microblog-based enhanced representation method for network users according to claim 1, characterized in that, in the social network, each node corresponds to a user and is associated with a large amount of text, namely the user's historical posts. Let G denote the network, G = (V, E, T), where V = {v_i} is the set of user nodes, E = {(v_i, v_j)} is the binary edge set in which every edge has a weight w ∈ {0, 1}, and T = {t_i} is the set of posts generated by the users. The invention captures text features from the user-generated posts and uses them to correct the original network, so as to learn a low-dimensional representation of each node in the corrected network G''.

3. The microblog-based enhanced representation method for network users according to claim 1, characterized in that the short microblog text preprocessing method in Step 2 comprises the following:

(1) Extract the text between two "#" characters directly as a keyword;

(2) Extract the text after "@";

(3) Filter out punctuation marks and other special characters from the original text;

(4) Replace all abnormal words in the text by looking them up in an abnormal-word table;

(5) Segment the remaining microblog text with the HanLP word segmentation tool;

(6) Filter out the stop words listed in a stop-word table;

(7) Compute the TF-IDF values of all words and filter out the low-frequency words.

4. The microblog-based enhanced representation method for network users according to claim 2, characterized in that the method of extracting potential friend relationships from user-generated text in Step 2 is as follows:

(1) Generate the feature vectors of the users' microblog text with an LDA topic model:

LDA is a generative probabilistic model with three levels: documents, topics, and words. A document is regarded as a random mixture of K latent topics, where each topic follows a multinomial distribution over words and each document follows a multinomial distribution over the K topics. For every document in the corpus, the generative process is as follows:

For each document M_i, draw θ ~ Dir(α), where Dir(α) is the Dirichlet distribution with parameter α and θ is a topic vector whose elements give the probability of each topic appearing in the document;

For the j-th word w_ij in the i-th document, choose a latent topic z_i from the topic vector θ according to the conditional probability p(z_i | θ), then generate the word w_j according to the conditional probability p(w_j | z_i, β).

Given the parameters α and β, the joint distribution of the model is

<mrow> <mi>p</mi> <mrow> <mo>(</mo> <mi>&theta;</mi> <mo>,</mo> <mi>z</mi> <mo>,</mo> <mi>w</mi> <mo>|</mo> <mi>&alpha;</mi> <mo>,</mo> <mi>&beta;</mi> <mo>)</mo> </mrow> <mo>=</mo> <mi>p</mi> <mrow> <mo>(</mo> <mi>&theta;</mi> <mo>|</mo> <mi>&alpha;</mi> <mo>)</mo> </mrow> <munderover> <mo>&Pi;</mo> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>N</mi> </munderover> <mi>p</mi> <mrow> <mo>(</mo> <msub> <mi>z</mi> <mi>i</mi> </msub> <mo>|</mo> <mi>&theta;</mi> <mo>)</mo> </mrow> <mi>p</mi> <mrow> <mo>(</mo> <msub> <mi>w</mi> <mi>j</mi> </msub> <mo>|</mo> <msub> <mi>z</mi> <mi>i</mi> </msub> <mo>,</mo> <mi>&beta;</mi> <mo>)</mo> </mrow> </mrow>

where w is the observed variable and θ is the hidden variable. The parameters α and β are then learned using the expectation-maximization (EM) algorithm.

Assuming that the top T topics are retained, each text document is embedded as a T-dimensional vector, where w_i is the weight corresponding to the i-th topic and represents the probability that the text generated by the user belongs to the i-th topic.
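As a non-limiting illustration of the generative process described above, the following Python sketch samples one document from an LDA model with known parameters. It covers only the forward generative process (draw θ from a Dirichlet distribution, then draw a topic and a word for each position); the EM learning of α and β is not shown. The function names, the toy vocabulary, and the parameter values are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dir(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, beta, vocab, n_words, rng):
    """Generate one document following the LDA generative process.

    alpha: K Dirichlet parameters.
    beta:  K rows, each a list of word probabilities over vocab.
    """
    theta = sample_dirichlet(alpha, rng)             # theta ~ Dir(alpha)
    topics = list(range(len(alpha)))
    words = []
    for _ in range(n_words):
        z = rng.choices(topics, weights=theta)[0]    # topic drawn from p(z | theta)
        w = rng.choices(vocab, weights=beta[z])[0]   # word drawn from p(w | z, beta)
        words.append(w)
    return theta, words


# Toy usage with two topics over a two-word vocabulary (assumed values).
theta, words = generate_document(
    alpha=[0.5, 0.5],
    beta=[[1.0, 0.0], [0.0, 1.0]],
    vocab=["sports", "music"],
    n_words=5,
    rng=random.Random(0),
)
```

The returned `theta` plays the role of the per-document topic vector; in the method itself such a vector would be inferred from the observed text rather than sampled.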

(2) computing the cosine similarity between the blog-post vectors of any two users and using it as the weight of the corresponding potential-relation edge, thereby constructing the potential friend relation network;

Potential friend relations are extracted from these representation vectors using cosine similarity. Given the representation vectors of two users v_i and v_j, the potential friend relation between them can be defined as the cosine similarity of the two vectors, denoted w'_ij.

Therefore, the potential adjacency matrix extracted from user-generated text can be described as a matrix in which each element w'_ij ∈ [0, 1].
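As a non-limiting illustration, the potential adjacency matrix might be computed from the users' topic vectors as sketched below. The function names are hypothetical, and two choices are assumptions not stated in the source: the diagonal is left at zero (no self-relations), and a user with an all-zero vector is given similarity 0 to everyone.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero vector yields no potential relation
    # Clamp to 1.0 to absorb floating-point rounding just above 1.
    return min(1.0, dot / (norm_u * norm_v))


def latent_adjacency(topic_vectors):
    """Build the symmetric matrix of pairwise cosine similarities w'_ij."""
    n = len(topic_vectors)
    w_prime = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine_similarity(topic_vectors[i], topic_vectors[j])
            w_prime[i][j] = w_prime[j][i] = s
    return w_prime
```

Because topic weights are non-negative, every entry produced this way falls in [0, 1], matching the range stated above.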

5. The microblog-based enhanced representation method for network users according to claim 2, characterized in that the method in step 3 for integrating the structural information of the original network is as follows:

In a social network, two users who share common friends tend to have the same interests and characteristics. LINE takes the above facts into account and was the first to propose the concepts of first-order and second-order similarity, which together fully capture the local and global information of the network structure.

(1) First-order similarity:

Given the edge set E, for each node pair in it, the weight of the corresponding edge represents the first-order similarity. The elements of the first-order similarity matrix W¹ can be defined as

$$w^{1}_{ij} = \begin{cases} 1, & \text{if } (v_i, v_j) \in E \\ 0, & \text{otherwise} \end{cases}$$

(2) Second-order similarity:

The number of common neighbors of any node pair is used to define the second-order similarity, which describes how similar the neighborhood structures of two users are in the social network. Given the neighbor node sets of user v_i and user v_j respectively, the number of their common friends is computed and used to define the second-order similarity w²_ij.

We now consider both the first-order and second-order similarity and fuse them into the adjacency matrix extracted from the network structure. To this end, we introduce W to denote the integrated adjacency matrix, each element of which is composed of the two similarity values,

$$w_{ij} = \lambda \cdot w^{1}_{ij} + \mu \cdot w^{2}_{ij}$$
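As a non-limiting illustration, the integration of first-order and second-order similarity might be sketched in Python as follows. Several points are assumptions rather than statements of the source: edges are treated as undirected, the second-order similarity is taken to be the raw count of common neighbors (the source formula is not reproduced here), and the weights `lam` and `mu` (standing for λ and μ) are arbitrary example values.

```python
def first_order(n, edges):
    """w1[i][j] = 1 if (v_i, v_j) is an edge, else 0 (edges treated as undirected)."""
    w1 = [[0] * n for _ in range(n)]
    for i, j in edges:
        w1[i][j] = w1[j][i] = 1
    return w1


def second_order(n, edges):
    """w2[i][j] = number of common neighbors of v_i and v_j (assumed raw count)."""
    neighbors = [set() for _ in range(n)]
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    return [
        [len(neighbors[i] & neighbors[j]) if i != j else 0 for j in range(n)]
        for i in range(n)
    ]


def fuse(w1, w2, lam, mu):
    """Integrated adjacency: w[i][j] = lam * w1[i][j] + mu * w2[i][j]."""
    n = len(w1)
    return [[lam * w1[i][j] + mu * w2[i][j] for j in range(n)] for i in range(n)]


# Toy path graph 0 - 1 - 2: users 0 and 2 are not linked but share friend 1.
edges = [(0, 1), (1, 2)]
w = fuse(first_order(3, edges), second_order(3, edges), lam=1.0, mu=0.5)
```

In this toy graph the pair (0, 2) receives a nonzero integrated weight purely from the shared neighbor, which is the effect the second-order term is meant to capture.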

6. The microblog-based enhanced representation method for network users according to claim 2, characterized in that the method in step 4 for correcting the original network structure with the potential friend relations is as follows:

$$w''_{ij} = \frac{w_{ij} + w'_{ij}}{\max_{i,j}\{w_{ij}, w'_{ij}\}}$$

However, some elements of the corrected adjacency matrix are too small, so a threshold needs to be set and all elements below the threshold are deleted. We then use the final corrected adjacency matrix as the input to LINE to compute the low-dimensional representations of the user nodes in the network.
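As a non-limiting illustration, the correction and thresholding step might be sketched in Python as follows. The denominator is read here as the single largest value found anywhere in either matrix, which is one interpretation of the max term in the formula above; "deleting" an element is assumed to mean setting it to zero. The function name and the threshold value are hypothetical, and the subsequent call to LINE is not shown.

```python
def correct_adjacency(w, w_prime, threshold):
    """Combine structural and potential adjacency, normalize, and prune small entries."""
    n = len(w)
    # Interpretation: the largest entry over both matrices, max_{i,j}{w_ij, w'_ij}.
    peak = max(max(max(row) for row in w), max(max(row) for row in w_prime))
    if peak == 0:
        return [[0.0] * n for _ in range(n)]
    corrected = [
        [(w[i][j] + w_prime[i][j]) / peak for j in range(n)] for i in range(n)
    ]
    # Assumption: elements below the threshold are "deleted" by zeroing them.
    return [[x if x >= threshold else 0.0 for x in row] for row in corrected]


# Toy example: integrated matrix w and potential matrix w_prime for three users.
w = [[0, 2, 0], [2, 0, 0], [0, 0, 0]]
w_prime = [[0, 0, 0.2], [0, 0, 1.0], [0.2, 1.0, 0]]
corrected = correct_adjacency(w, w_prime, threshold=0.3)
```

Here the weak potential relation between users 0 and 2 is pruned, while the stronger potential relation between users 1 and 2 survives as a new edge in the corrected network.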