CN107608962A - Twitter general election data analysis method based on complex networks - Google Patents


Info

Publication number
CN107608962A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710816286.4A
Other languages
Chinese (zh)
Inventor
费高雷
胡翔宇
许乔若
顾杰
艾小翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201710816286.4A
Publication of CN107608962A
Legal status: Pending

Abstract

The present invention discloses a tweet general election data analysis method based on complex networks, addressing the problem of mining the information hidden in tweets. The method analyzes the sentiment orientation of the tweets in a tweet network and, according to the sentiment analysis results, groups nodes with the same sentiment into one community, which yields several communities of known sentiment orientation. A complex-network community division method then assigns the nodes whose sentiment orientation cannot be determined to the previously obtained communities of known sentiment orientation, which finally gives the predicted result of the general election. The method provided by the application can judge the day-by-day shift in voter leaning during a general election more accurately, which helps to further predict the election result.

Description

Twitter general election data analysis method based on complex networks
Technical field
The invention belongs to the field of data mining, and in particular relates to a Twitter data prediction technique.
Background
Successfully predicting the result of another country's general election, by analyzing how its voters lean before the election, is of great significance for formulating national strategy. The traditional approach, in which media outlets poll their own users, is limited by the audience size of the outlet and by the number of respondents, so its predictions are not accurate enough. Predicting general election results from Twitter data is therefore an emerging and effective alternative. By analyzing the words used in a tweet, one can judge the attitude of the tweet, and of the user who posted it, toward the candidates in the election and their parties, and then predict the final election result with sentiment analysis methods.
When the sentiment orientation of a tweet is judged from the words it uses, the tweet must first be preprocessed to remove functional stop words. For each individual tweet, the sentiment orientation of the words it uses must then be considered. In recent years, judgments of the sentiment orientation of online language have mainly been based on two kinds of sentiment dictionaries: 1. a sentiment dictionary of common words, containing 4783 words that express negative sentiment and 2006 words that express positive sentiment; 2. a sentiment dictionary of internet slang, needed because wording habits in microblogs differ markedly from standard written usage, so the sentiment orientation of some slang terms must also be added. The sentiment orientation of a tweet is determined mainly by the sentiment words it uses, together with modifiers that express intensification (such as "very" and "extremely") and modifiers that express negation (such as "not").
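The dictionary-based scoring described above can be sketched as follows. This is a minimal illustration rather than the applicants' implementation: the tiny word lists, the intensifier weights, and the function name `tweet_sentiment` are all hypothetical, and the sketch returns plain polarity rather than a stance toward a particular party.

```python
# Hypothetical miniature lexicons; real dictionaries hold thousands of entries.
POSITIVE = {"good", "great", "win"}
NEGATIVE = {"bad", "terrible", "lose"}
INTENSIFIERS = {"very": 2.0, "extremely": 3.0}   # assumed weights
NEGATIONS = {"not", "never", "no"}
STOP_WORDS = {"the", "a", "is", "are", "to", "of"}


def tweet_sentiment(text):
    """Return 'positive', 'negative', or None when no dictionary word decides."""
    tokens = [t.strip(".,!?;:\"'").lower() for t in text.split()]
    tokens = [t for t in tokens if t and t not in STOP_WORDS]  # preprocessing

    score, weight, negate, matched = 0.0, 1.0, False, False
    for tok in tokens:
        if tok in INTENSIFIERS:
            weight *= INTENSIFIERS[tok]          # strengthens the next sentiment word
        elif tok in NEGATIONS:
            negate = not negate                  # flips the next sentiment word
        elif tok in POSITIVE or tok in NEGATIVE:
            polarity = 1.0 if tok in POSITIVE else -1.0
            score += (-polarity if negate else polarity) * weight
            matched = True
            weight, negate = 1.0, False
        else:
            weight, negate = 1.0, False          # modifiers only reach the next word

    if not matched or score == 0:
        return None                              # sentiment cannot be determined
    return "positive" if score > 0 else "negative"
```

Tweets for which the function returns `None` correspond to the nodes whose sentiment orientation cannot be determined from the dictionary alone.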
Sentiment analysis based on sentiment words, as described above, can only analyze tweets that contain words from the sentiment dictionary, and such tweets account for only 20% of all tweets. No effective method has yet emerged for mining the information hidden in the remaining 80% of tweets.
Many of the complex systems found in nature can be described by networks of various kinds. A typical network consists of nodes and the edges that connect them: the nodes represent the distinct individuals of the real system, and the edges represent relationships between individuals. An edge is usually added between two nodes when a specific relationship holds between them, and omitted otherwise. Two nodes joined by an edge are regarded as adjacent in the network. As research into the physical meaning and mathematical properties of networks has deepened, many real networks have been found to share a common property, community structure: the whole network is composed of several communities, with dense connections among the nodes inside each community and comparatively sparse connections between communities. A large complex network can therefore be partitioned by a community division algorithm into several communities with dense internal links.
Summary of the invention
To solve the above technical problem, the application proposes a tweet general election data analysis method based on complex networks, which performs general election data analysis by linking together tweets that express similar views.
The technical solution adopted by the invention is a tweet general election data analysis method based on complex networks, comprising:
S1: constructing a complex network based on tweet similarity, tweet retweet relationships, or tweet like relationships;
S2: performing community division on the complex network built in step S1.
Further, step S1 specifically covers a tweet similarity complex network, a tweet retweet complex network, and a tweet like complex network;
The tweet similarity complex network is defined as follows: if the similarity of two tweets exceeds a threshold, an edge exists between the two corresponding nodes;
The tweet retweet complex network is defined as follows: if two tweets are retweeted by the same person, an edge exists between the two nodes that represent those tweets;
The tweet like complex network is defined as follows: if two tweets are liked by the same person, an edge exists between the two nodes that represent those tweets.
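The retweet and like networks share one construction rule: two tweets are linked whenever the same user acted on both. A minimal sketch of that rule, assuming the input is a list of hypothetical `(user_id, tweet_id)` records (the function name `co_action_edges` is also illustrative):

```python
from collections import defaultdict
from itertools import combinations


def co_action_edges(actions):
    """Build undirected tweet-tweet edges from (user_id, tweet_id) records.

    An edge (u, v) with u < v is produced when one user retweeted (or liked)
    both tweets. Feed retweet records to get the retweet network and like
    records to get the like network.
    """
    tweets_by_user = defaultdict(set)
    for user, tweet in actions:
        tweets_by_user[user].add(tweet)

    edges = set()
    for tweets in tweets_by_user.values():
        # every pair of tweets touched by the same user becomes an edge
        edges.update(combinations(sorted(tweets), 2))
    return edges
```

A user who acted on n tweets contributes n(n-1)/2 edges, so very active accounts can dominate the edge set; whether to cap their contribution is a design choice the text does not address.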
Further, step S2 specifically comprises:
S21: initializing node labels, specifically: according to the sentiment analysis results, adding a sentiment label to each node in the network whose tweet has a known sentiment orientation;
S22: initializing communities, by grouping nodes with the same sentiment label into the same community, which yields several known communities;
S23: selecting a node that has not yet been assigned, computing the modularity of that node, and assigning the node to the corresponding known community.
Further, step S23 is specifically: computing the modularity increment between the unassigned node and the assigned nodes in each known community, finding the community that contains the assigned nodes giving the largest modularity increment with the unassigned node, and assigning the unassigned node to that known community.
Beneficial effects of the invention: the Twitter general election data analysis method based on complex networks analyzes the sentiment orientation of the tweets in a tweet network and, according to the sentiment analysis results, groups nodes with the same sentiment into one community, which yields several communities of known sentiment orientation. A complex-network community division method then assigns the nodes whose sentiment orientation cannot be determined to the previously obtained communities of known sentiment orientation, which finally gives the predicted result of the general election. The method provided by the application can judge the day-by-day shift in voter leaning during a general election more accurately, which helps to further predict the election result.
Brief description of the drawings
Fig. 1 is a flow chart of the scheme of the invention;
Fig. 2 is a schematic diagram of the prediction provided by an embodiment of the invention.
Detailed description of the embodiments
To help those skilled in the art understand the technical content of the invention, the invention is further explained below with reference to the drawings.
Network analysis shows that a user is more inclined to retweet and like tweets whose views resemble the user's own, and that tweets expressing similar views also tend to be more similar in text. A tweet network is therefore constructed with retweets, likes, and tweet similarity as the basic edge relationships, and a community division method divides the tweet network into several communities.
As shown in Fig. 1, the flow chart of the scheme of the application, the technical solution of the invention is a tweet general election data analysis method based on complex networks, comprising:
S1: constructing a complex network based on tweet similarity, tweet retweet relationships, or tweet like relationships. There are three classes of network:
1. Tweet retweet network: if two tweets are retweeted by the same person, an edge exists between the two nodes that represent those tweets;
2. Tweet like network: if two tweets are liked by the same person, an edge exists between the two nodes that represent those tweets;
3. Tweet similarity network: a network based on the word similarity of tweets. If the similarity of two tweets exceeds a threshold, an edge exists between the two nodes; otherwise there is no edge. The threshold is determined by the edge counts of the first two networks: it is the value at which the number of edges in the resulting tweet similarity network is about the same as the number of edges in the tweet like network and the tweet retweet network.
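The threshold rule for the similarity network can be sketched as below. The text does not say which similarity measure is used, so Jaccard overlap of token sets is an assumption here, as are the names `jaccard` and `similarity_edges`; `target_edges` stands for the edge count of the retweet or like network.

```python
from itertools import combinations


def jaccard(tokens_a, tokens_b):
    """Jaccard similarity of two token collections (assumed similarity measure)."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0


def similarity_edges(tweets, target_edges):
    """Pick the threshold that yields roughly `target_edges` edges.

    tweets: dict mapping tweet_id -> list of tokens.
    Returns (threshold, edges); pairs tied at the threshold are all kept,
    so the edge count can slightly exceed the target.
    """
    scored = sorted(
        ((jaccard(tweets[u], tweets[v]), u, v)
         for u, v in combinations(sorted(tweets), 2)),
        reverse=True,
    )
    scored = [s for s in scored if s[0] > 0]      # ignore pairs with no shared word
    if not scored or target_edges <= 0:
        return None, set()
    threshold = scored[min(target_edges, len(scored)) - 1][0]
    edges = {(u, v) for sim, u, v in scored if sim >= threshold}
    return threshold, edges
```

The sketch scores every pair of tweets, which is quadratic in the number of tweets; for a network of about ten thousand nodes that is tens of millions of pairs, so a real pipeline would likely need an index or blocking step.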
After the three classes of complex networks are constructed, community division is performed on them, and the final result of the general election is predicted from the division result.
S2: performing community division on the complex network built in step S1. The sentiment orientation of the tweets in the tweet network is analyzed and, according to the sentiment analysis results, nodes with the same sentiment are grouped into one community, which yields several communities of known sentiment orientation. A complex-network community division method then assigns the nodes whose sentiment orientation cannot be determined to the previously obtained communities of known sentiment orientation, which finally gives the predicted result of the general election. Specifically:
S21, initializing node labels: according to the sentiment analysis results, a sentiment label is added to each node in the network whose tweet has a known sentiment orientation.
S22, initializing communities: according to the sentiment analysis results, nodes with the same sentiment are grouped into one community, forming several known communities.
S23, community division: a node that has not yet been assigned to a community is selected, its modularity increment is computed with the modularity increment algorithm, and the node is assigned to the community that gives the largest modularity increment.
Step S23 is repeated until every node in the network has been assigned to a known community. The sentiment orientation of a previously unassigned node is determined by the type of the community it ends up in.
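Steps S21 to S23 can be sketched as a seeded assignment loop. The sketch makes two assumptions that the text leaves open: each unassigned node is treated as a singleton community, so that joining community C changes modularity by k_vC/M - k_v·d_C/(2M²) (k_vC is the number of edges from the node into C, k_v its degree, d_C the total degree of C), and at each round the node and community pair with the largest gain is assigned first. The name `assign_unlabeled` is hypothetical.

```python
from collections import defaultdict


def assign_unlabeled(edges, seed_labels):
    """Assign unlabeled nodes to seed communities by largest modularity gain.

    edges: iterable of undirected (u, v) pairs.
    seed_labels: dict node -> sentiment label from sentiment analysis (S21/S22).
    Returns a dict node -> label covering every node reachable from a seed.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2       # number of edges M

    label = dict(seed_labels)
    comm_degree = defaultdict(int)                        # d_C per community
    for node, lab in label.items():
        comm_degree[lab] += len(adj[node])

    pending = {n for n in adj if n not in label}
    while pending:
        best = None
        for v in sorted(pending):
            links = defaultdict(int)                      # k_vC per community
            for w in adj[v]:
                if w in label:
                    links[label[w]] += 1
            for c, k_vc in links.items():
                gain = k_vc / m - len(adj[v]) * comm_degree[c] / (2 * m * m)
                if best is None or gain > best[0]:
                    best = (gain, v, c)
        if best is None:
            break                 # remaining nodes have no labelled neighbour
        _, v, c = best
        label[v] = c              # S23: node joins the best known community
        comm_degree[c] += len(adj[v])
        pending.remove(v)
    return label
```

Nodes in components that contain no labelled tweet are left unassigned by this sketch, since no modularity gain can be computed for them.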
Modularity has become a commonly used measure of the quality of a community division in recent years. Its basic idea is to compare the network, after it has been divided into communities, with a corresponding null model in order to measure the quality of the division. The null model corresponding to a network is a random graph model that shares certain properties with the network and is completely random in all other respects. The computation of modularity is briefly described below:
Because the degree distribution is regarded as an important topological property of a network, and real networks often have heterogeneous degree distributions, the network under study is usually compared with a random graph having the same degree sequence when community structure is analyzed.
For a given real network, suppose a community division has been found. The total number of edges inside all communities can then be calculated as follows:
$$Q_{\mathrm{real}} = \frac{1}{2}\sum_{ij} a_{ij}\,\delta(C_i, C_j)$$
where $A = (a_{ij})$ is the adjacency matrix of the real network, and $C_i$ and $C_j$ denote the communities to which node i and node j belong: δ takes the value 1 if the two nodes belong to the same community, and 0 otherwise.
For a null model of the same size corresponding to this real network, under the same community division, the expected total number of edges inside all communities is
$$Q_{\mathrm{null}} = \frac{1}{2}\sum_{ij} p_{ij}\,\delta(C_i, C_j)$$
where $p_{ij}$ is the expected number of edges between node i and node j in the null model.
The modularity of a network is defined as the difference between the number of edges inside the communities of the network and the number of edges inside the communities of the corresponding null model, taken as a fraction of the total number of edges M of the network, i.e.
$$Q = \frac{Q_{\mathrm{real}} - Q_{\mathrm{null}}}{M} = \frac{1}{2M}\sum_{ij}\left(a_{ij} - p_{ij}\right)\delta(C_i, C_j)$$
In theory, for the configuration model that has the same degree sequence as the original network but no degree correlations, $p_{ij} = k_i k_j/(2M)$, where $k_i$ and $k_j$ are the degrees of node i and node j in the original network. The commonly used definition of modularity is therefore
$$Q = \frac{1}{2M}\sum_{ij}\left(a_{ij} - \frac{k_i k_j}{2M}\right)\delta(C_i, C_j) = \frac{1}{2M}\sum_{ij} b_{ij}\,\delta(C_i, C_j)$$
where
$$b_{ij} = a_{ij} - \frac{k_i k_j}{2M}$$
$B = (b_{ij})_{N\times N}$ is also called the modularity matrix.
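The commonly used modularity definition can be checked numerically with a few lines of code. This is a direct, unoptimized sketch (quadratic in the number of nodes); the function name `modularity` and the input layout are illustrative, and the graph is assumed to be simple and undirected with no repeated edges.

```python
def modularity(edges, community):
    """Q = 1/(2M) * sum_ij (a_ij - k_i*k_j/(2M)) * delta(C_i, C_j).

    edges: list of undirected (u, v) pairs without duplicates.
    community: dict node -> community label, covering every node.
    """
    degree = {n: 0 for n in community}
    adjacent = set()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    two_m = sum(degree.values())                  # 2M

    q = 0.0
    for i in community:
        for j in community:
            if community[i] == community[j]:      # delta(C_i, C_j) = 1
                a_ij = 1.0 if (i, j) in adjacent else 0.0
                q += a_ij - degree[i] * degree[j] / two_m   # b_ij
    return q / two_m
```

For two triangles joined by a single edge, with each triangle as one community, M = 7 and the function returns 5/14; placing every node in one community returns 0, as the definition requires.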
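The modularity-based community detection algorithm used in the application is as follows:
1. Initialization: each node is initially assumed to be a separate community, the modularity value is Q = 0, and the initial $e_{ij}$ and $a_i$ are computed as follows:
$$e_{ij} = \begin{cases} \dfrac{1}{2M}, & \text{if nodes } i \text{ and } j \text{ are connected by an edge} \\ 0, & \text{otherwise} \end{cases}$$
$$a_i = \frac{k_i}{2M}$$
The elements of the initial modularity increment matrix are computed as follows:
$$\Delta Q_{ij} = \begin{cases} e_{ij} + e_{ji} - 2a_i a_j, & \text{if nodes } i \text{ and } j \text{ are connected by an edge} \\ 0, & \text{otherwise} \end{cases}$$
Once the initial modularity increment matrix is obtained, the max-heap H formed from the largest element of each of its rows can be built.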
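2. Select the largest $\Delta Q_{ij}$ from the max-heap H, merge the corresponding communities i and j, label the merged community j, and update the modularity increment matrix $\Delta Q_{ij}$, the max-heap H, and the auxiliary vector $a_i$.
(1) Update of $\Delta Q_{ij}$: delete the elements of row i and column i, and update the elements of row j and column j, which gives:
$$\Delta Q'_{jk} = \begin{cases} \Delta Q_{ik} + \Delta Q_{jk}, & \text{if community } k \text{ is connected to both } i \text{ and } j \\ \Delta Q_{ik} - 2a_j a_k, & \text{if community } k \text{ is connected only to } i \\ \Delta Q_{jk} - 2a_i a_k, & \text{if community } k \text{ is connected only to } j \end{cases}$$
(2) Update of the max-heap H: update the largest elements of the corresponding rows and columns in the heap.
(3) Update of the auxiliary vector $a_i$:
$$a_j' = a_i + a_j,\qquad a_i' = 0$$
Record the modularity value after the merge: $Q = Q + \Delta Q_{ij}$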
3. Repeat step 2 until all nodes in the network have been merged into a single community.
Over the whole run of the algorithm, the modularity Q has only one maximum peak. Once the largest element of the modularity increment matrix has dropped below zero, Q can only keep decreasing. Therefore, as soon as the largest element of the modularity increment matrix changes from positive to negative, merging can stop, and the result at that point is taken as the community structure of the network.
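The greedy merging procedure can be sketched compactly. To stay short, the sketch scans all stored increments instead of maintaining the max-heap H, and it starts Q from the true modularity of the all-singletons partition, -Σ a_i², so that the returned value equals the modularity of the final division; both choices, and the name `greedy_modularity_merge`, are assumptions of this illustration.

```python
from collections import defaultdict


def greedy_modularity_merge(edges):
    """Greedy agglomerative community detection driven by modularity gain.

    Returns (communities, q): a list of node sets and the final modularity.
    Node identifiers must be mutually comparable (all ints or all strings).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    two_m = sum(len(nbrs) for nbrs in adj.values())
    a = {i: len(adj[i]) / two_m for i in adj}              # a_i = k_i / 2M
    # dq[i][j] = e_ij + e_ji - 2*a_i*a_j, stored only for connected pairs
    dq = {i: {j: 2 * (1 / two_m - a[i] * a[j]) for j in adj[i]} for i in adj}
    members = {i: {i} for i in adj}
    q = -sum(x * x for x in a.values())

    while True:
        best = max(((g, i, j) for i in dq for j, g in dq[i].items()), default=None)
        if best is None or best[0] <= 0:
            break                                          # no merge raises Q
        gain, i, j = best
        for k in (set(dq[i]) | set(dq[j])) - {i, j}:
            if k in dq[i] and k in dq[j]:
                new = dq[i][k] + dq[j][k]                  # k touches both
            elif k in dq[i]:
                new = dq[i][k] - 2 * a[j] * a[k]           # k touches only i
            else:
                new = dq[j][k] - 2 * a[i] * a[k]           # k touches only j
            dq[j][k] = dq[k][j] = new
        for k in dq.pop(i):                                # drop row/column i
            dq[k].pop(i, None)
        a[j] += a[i]
        a[i] = 0.0
        members[j] |= members.pop(i)
        q += gain
    return list(members.values()), q
```

The loop stops at the first point where the largest stored increment is no longer positive, which matches the stopping rule stated above; replacing the linear scan with a heap per row changes the running time but not the result.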
The method of the application is further described below with an example. Tweets related to the UK general election were filtered from Twitter for the single day of May 12, 10440 tweets in total, so the corresponding complex network has 10440 nodes. The analysis distinguishes: nodes whose sentiment orientation cannot be determined; nodes that support the Conservative Party; nodes that oppose the Labour Party; nodes that support the Labour Party; and nodes that oppose the Conservative Party. The node counts are shown in Table 1:
Table 1: Sentiment orientation classification of general election tweets
Sentiment orientation             Number of nodes
Support the Conservative Party    2683
Oppose the Conservative Party     2589
Support the Labour Party          2367
Oppose the Labour Party           2802
The change in the support rates of the two parties obtained with the method of the application is shown in Fig. 2. The final general election result was: the Conservative Party won 314 seats, 17 fewer than before; the Labour Party won 266 seats, 34 more than before. This result is close to our prediction, namely that the Conservative Party leads in the general election but its advantage is not pronounced.
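As a rough cross-check of Table 1 (not the procedure behind Fig. 2, which the text does not spell out), the four counts can be folded into two camps. The sketch assumes, purely for illustration, that a tweet opposing one party counts toward the other; the names `table_1` and `camp_shares` are hypothetical.

```python
# Node counts copied from Table 1.
table_1 = {
    "support_conservative": 2683,
    "oppose_conservative": 2589,
    "support_labour": 2367,
    "oppose_labour": 2802,
}


def camp_shares(counts):
    """Return (conservative_share, labour_share) under the two-camp assumption."""
    conservative = counts["support_conservative"] + counts["oppose_labour"]
    labour = counts["support_labour"] + counts["oppose_conservative"]
    total = conservative + labour
    return conservative / total, labour / total
```

Under that assumption the Conservative camp holds roughly 52.5% of the classified tweets, a narrow lead that is in line with the qualitative conclusion above.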
Those of ordinary skill in the art will appreciate that the embodiments described here are intended to help the reader understand the principles of the invention, and that the scope of protection of the invention is not limited to these specific statements and embodiments. For those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the invention shall fall within the scope of the claims of the invention.

Claims (4)

1. A tweet general election data analysis method based on complex networks, characterized by comprising:
S1: constructing a complex network based on tweet similarity, tweet retweet relationships, or tweet like relationships;
S2: performing community division on the complex network built in step S1.
2. The tweet general election data analysis method based on complex networks according to claim 1, characterized in that step S1 specifically covers a tweet similarity complex network, a tweet retweet complex network, and a tweet like complex network;
The tweet similarity complex network is defined as follows: if the similarity of two tweets exceeds a threshold, an edge exists between the two corresponding nodes;
The tweet retweet complex network is defined as follows: if two tweets are retweeted by the same person, an edge exists between the two nodes that represent those tweets;
The tweet like complex network is defined as follows: if two tweets are liked by the same person, an edge exists between the two nodes that represent those tweets.
3. The tweet general election data analysis method based on complex networks according to claim 1, characterized in that step S2 specifically comprises:
S21: initializing node labels, specifically: according to the sentiment analysis results, adding a sentiment label to each node in the network whose tweet has a known sentiment orientation;
S22: initializing communities, by grouping nodes with the same sentiment label into the same community, which yields several known communities;
S23: selecting a node that has not yet been assigned, computing the modularity of that node, and assigning the node to the corresponding known community.
4. The tweet general election data analysis method based on complex networks according to claim 3, characterized in that step S23 is specifically: computing the modularity increment between the unassigned node and the assigned nodes in each known community, finding the community that contains the assigned nodes giving the largest modularity increment with the unassigned node, and assigning the unassigned node to that known community.
CN201710816286.4A 2017-09-12 2017-09-12 Twitter general election data analysis method based on complex networks Pending CN107608962A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710816286.4A CN107608962A (en) 2017-09-12 2017-09-12 Twitter general election data analysis method based on complex networks

Publications (1)

Publication Number Publication Date
CN107608962A true CN107608962A (en) 2018-01-19

Family

ID=61062616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710816286.4A Pending CN107608962A (en) 2017-09-12 2017-09-12 Twitter general election data analysis method based on complex networks

Country Status (1)

Country Link
CN (1) CN107608962A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180119)