CN107608962A - Twitter general election data analysis method based on complex networks - Google Patents
- Publication number
- CN107608962A (application number CN201710816286.4A)
- Authority
- CN
- China
- Prior art keywords
- tweet
- community
- complex network
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a complex-network-based method for analysing tweets about a general election, addressing the problem of mining the information hidden in tweets. The sentiment orientation of the tweets in a tweet network is analysed, and nodes to which the sentiment analysis assigns the same sentiment are placed in one community, yielding several communities of known sentiment orientation. A complex-network community division method then assigns the nodes whose sentiment orientation cannot be determined to these known communities, which finally gives the prediction of the election result. The method provided by the application can judge more accurately how voter leanings change from day to day during the election, which helps to further predict the election result.
Description
Technical field
The invention belongs to the field of data mining, and in particular relates to a technique for prediction from Twitter data.
Background technology
Successfully predicting the result of another country's general election by analysing the leanings of the public before the vote is of great significance for formulating national strategy. The traditional method, in which media organisations poll their own users, is limited by the audience size of each outlet and by the number of respondents, so its predictions are not accurate enough. Predicting election results from Twitter data is therefore an emerging and effective approach. The use of words in a tweet is analysed to judge the attitude of the tweet, and of the user who posted it, towards the candidates in the election and their parties, and sentiment analysis is then used to predict the final election result.
When the words in a tweet are analysed to judge its sentiment orientation, the tweet must first be preprocessed to remove functional stop words. For each individual tweet, the sentiment orientation of the words it uses must then be considered. In recent years, judging the sentiment orientation of online language has relied mainly on two kinds of sentiment dictionary: 1. a sentiment dictionary of everyday words, containing 4783 negative words and 2006 positive words; 2. a sentiment dictionary of internet slang, which is needed because word usage in microblogs differs markedly from ordinary written usage, so the sentiment orientation of some slang terms must be added. The sentiment orientation of a tweet is determined mainly by the sentiment words it uses, together with modifiers that express intensification (such as "very" or "extremely") and modifiers that express negation (such as "not").
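The dictionary-based scoring described above can be illustrated with a minimal sketch. The word lists, the intensifier weights, and the function name `tweet_sentiment` are assumptions made for illustration only; they are not the dictionaries or the scoring rule referred to in the text.

```python
# Toy lexicons. The words and weights are illustrative assumptions,
# not the sentiment dictionaries mentioned in the description.
POSITIVE = {"good", "great", "win"}
NEGATIVE = {"bad", "awful", "lose"}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}
NEGATORS = {"not", "never"}
STOP_WORDS = {"the", "a", "is", "are", "to", "of"}


def tweet_sentiment(text):
    """Return a signed score: > 0 positive, < 0 negative, 0 undetermined."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t and t not in STOP_WORDS]
    score = 0.0
    weight = 1.0  # multiplier built up from the modifiers just before a word
    for tok in tokens:
        if tok in INTENSIFIERS:
            weight *= INTENSIFIERS[tok]
        elif tok in NEGATORS:
            weight *= -1.0
        elif tok in POSITIVE or tok in NEGATIVE:
            polarity = 1.0 if tok in POSITIVE else -1.0
            score += weight * polarity
            weight = 1.0
        else:
            weight = 1.0  # a modifier only affects the word that follows it
    return score
```

A tweet that contains no dictionary word scores 0 here, which corresponds to the tweets whose sentiment orientation cannot be determined by this kind of analysis.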
The sentiment analysis described above, being based on sentiment words, can only analyse tweets that contain words from the sentiment dictionaries, and such tweets account for only 20% of all tweets. No effective method has yet emerged for mining the information hidden in the remaining 80% of tweets.
Many of the complex systems found in nature can be described by networks of various kinds. A typical network consists of a set of nodes and the edges that connect them. The nodes represent the individuals of the real system and the edges represent relations between individuals: usually an edge is added between two nodes when a certain specific relation holds between them, and no edge is added otherwise. Two nodes joined by an edge are regarded as adjacent in the network. As the physical meaning and mathematical properties of networks have been studied in depth, it has been found that many real networks share a common property: community structure. That is, the whole network is made up of several communities, where the connections between nodes inside each community are dense and the connections between communities are comparatively sparse. A large complex network can therefore be divided by a community division algorithm into multiple communities that are densely connected internally.
The content of the invention
To solve the above technical problems, the application proposes a complex-network-based method for analysing general election tweets, which carries out the election data analysis by linking together tweets that express similar views.
The technical solution adopted by the invention is a complex-network-based method for analysing general election tweets, including:
S1, constructing complex networks based on tweet similarity, tweet retweet relations, or tweet like relations;
S2, carrying out community division on the complex networks established in step S1.
Further, the networks of step S1 specifically include a tweet similarity complex network, a tweet retweet complex network, and a tweet like complex network;
the tweet similarity complex network is: if the similarity of two tweets is greater than a threshold, an edge exists between the two nodes;
the tweet retweet complex network is: if two tweets are retweeted by the same person, an edge exists between the two nodes representing the tweets;
the tweet like complex network is: if two tweets are liked by the same person, an edge exists between the two nodes representing the tweets.
Further, step S2 specifically includes:
S21, initialising node labels, specifically: according to the result of the sentiment analysis, adding a sentiment label to each node in the network whose tweet has a known sentiment orientation;
S22, initialising communities: placing nodes with the same sentiment label in the same community, which yields several known communities;
S23, selecting a node that has not yet been assigned, computing the modularity for that node, and assigning the node to the corresponding known community.
Further, step S23 is specifically: computing the modularity increment between the unassigned node and the assigned nodes of each known community, finding the community whose assigned nodes give the largest modularity increment with the unassigned node, and assigning the unassigned node to that known community.
Beneficial effects of the invention: the complex-network-based Twitter general election data analysis method of the invention analyses the sentiment orientation of the tweets in a tweet network and places nodes to which the sentiment analysis assigns the same sentiment in one community, yielding several communities of known sentiment orientation. A complex-network community division method then assigns the nodes whose sentiment orientation cannot be determined to these known communities, which finally gives the prediction of the election result. The method provided by the application can judge more accurately how voter leanings change from day to day during the election, which helps to further predict the election result.
Brief description of the drawings
Fig. 1 is a flow chart of the scheme of the invention;
Fig. 2 is a schematic diagram of the prediction provided by an embodiment of the invention.
Embodiment
To help those skilled in the art understand the technical content of the invention, the invention is explained further below with reference to the accompanying drawings.
Analysis of online behaviour shows that a user is more inclined to retweet and like tweets whose views resemble their own, and that tweets expressing similar views also tend to be more similar to one another. Tweet networks are therefore constructed with retweets, likes, and tweet similarity as the basic edge relations, and a community division method is used to divide each tweet network into multiple communities.
Fig. 1 shows the flow chart of the scheme of the application. The technical solution of the invention is a complex-network-based method for analysing general election tweets, including:
S1, constructing complex networks based on tweet similarity, tweet retweet relations, or tweet like relations. There are the following three types:
1. Tweet retweet network: if two tweets are retweeted by the same person, an edge exists between the two nodes representing the tweets.
2. Tweet like network: if two tweets are liked by the same person, an edge exists between the two nodes representing the tweets.
3. Tweet similarity network: a network based on the word similarity of tweets. If the similarity of two tweets is greater than a threshold, an edge exists between the two nodes; otherwise no edge exists. The threshold is determined by the edge counts of the first two networks: it is the threshold value at which the number of edges in the generated tweet similarity network is about the same as the number of edges in the tweet like network and the tweet retweet network.
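The construction of the three networks can be sketched as follows. This is only an illustration under stated assumptions: the description does not say which word similarity measure is used, so Jaccard similarity over word sets stands in for it, and the function names `shared_user_edges` and `similarity_edges` are hypothetical. The threshold is taken as the similarity of the k-th most similar pair, so that roughly `target_edges` edges result; ties can produce a few more.

```python
from itertools import combinations


def shared_user_edges(actors):
    """Edges of a retweet (or like) network.

    actors maps a tweet id to the set of users who retweeted (or liked) it.
    Two tweets are linked when at least one user acted on both.
    """
    return {
        (a, b)
        for a, b in combinations(sorted(actors), 2)
        if actors[a] & actors[b]
    }


def jaccard(a, b):
    """Jaccard similarity of two word sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(words, target_edges):
    """Choose a threshold so the similarity network has about target_edges edges.

    words maps a tweet id to its set of words. Returns (threshold, edges).
    Pairs with similarity >= threshold are linked (an assumption: the text
    says "greater than a threshold"); pairs with zero similarity never are.
    """
    scored = sorted(
        ((jaccard(words[a], words[b]), a, b)
         for a, b in combinations(sorted(words), 2)),
        reverse=True,
    )
    k = min(target_edges, len(scored))
    if k <= 0:
        return float("inf"), set()
    threshold = scored[k - 1][0]
    edges = {(a, b) for s, a, b in scored if s >= threshold and s > 0.0}
    return threshold, edges
```

In use, `target_edges` would be set from the edge counts of the retweet and like networks, for example `len(shared_user_edges(retweeters))`.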
After the three types of complex network have been constructed, community division is carried out on them, and the final result of the general election is predicted from the result of the community division.
S2, carrying out community division on the complex networks established in step S1. The sentiment orientation of the tweets in the tweet network is analysed, and nodes to which the sentiment analysis assigns the same sentiment are placed in one community, yielding several communities of known sentiment orientation. A complex-network community division method then assigns the nodes whose sentiment orientation cannot be determined to these known communities, which finally gives the prediction of the election result. Specifically:
S21, initialising node labels: according to the result of the sentiment analysis, a sentiment label is added to each node in the network whose tweet has a known sentiment orientation.
S22, initialising communities: nodes to which the sentiment analysis assigns the same sentiment are placed in one community, forming multiple known communities.
S23, community division: a node that has not yet been assigned to a community is selected, its modularity increment with respect to each community is computed with the modularity increment algorithm, and the node is assigned to the community that gives the largest modularity increment.
Step S23 is repeated until all nodes in the network have been assigned to known communities. The sentiment orientation of a previously unassigned node is determined by the type of the community it ends up in.
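Steps S21 to S23 can be sketched as below. The gain of adding a single node u to a community c is computed here as e_uc/M - k_u·D_c/(2M²), where e_uc is the number of edges from u into c, k_u is the degree of u, D_c is the sum of degrees in c, and M is the total number of edges; this is the usual modularity change for merging a singleton into a community. The order in which unassigned nodes are processed is not fixed by the description, so picking the globally best (node, community) pair in each round is an assumption, as is the function name.

```python
def assign_by_modularity_gain(adj, seed_labels):
    """Assign unlabeled nodes to seeded communities by modularity gain.

    adj maps each node to the set of its neighbours (undirected, no
    self-loops, at least one edge). seed_labels maps some nodes to a
    sentiment label (steps S21 and S22). Returns a label for every node.
    """
    if not seed_labels:
        raise ValueError("at least one labeled node is required")
    m = sum(len(nb) for nb in adj.values()) / 2  # total number of edges M
    labels = dict(seed_labels)
    deg_sum = {}  # label -> sum of degrees of its current members (D_c)
    for node, lab in labels.items():
        deg_sum[lab] = deg_sum.get(lab, 0) + len(adj[node])
    unassigned = set(adj) - set(labels)

    while unassigned:  # step S23, repeated until every node is assigned
        best = None
        for u in sorted(unassigned):
            k_u = len(adj[u])
            links = {}  # label -> number of edges from u into that community
            for v in adj[u]:
                if v in labels:
                    links[labels[v]] = links.get(labels[v], 0) + 1
            for lab in sorted(deg_sum):
                gain = links.get(lab, 0) / m - k_u * deg_sum[lab] / (2 * m * m)
                if best is None or gain > best[0]:
                    best = (gain, u, lab)
        _, u, lab = best
        labels[u] = lab
        deg_sum[lab] += len(adj[u])
        unassigned.remove(u)
    return labels
```

On a toy graph of two triangles joined by one edge, with one seed in each triangle, every remaining node ends up with the label of the seed in its own triangle.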
Modularity is a measure of the quality of a community division that has been widely used in recent years. Its basic idea is to compare the network, after it has been divided into communities, with a corresponding null model in order to measure the quality of the division. The null model corresponding to a network is a random graph model that shares certain properties with the network and is completely random in all other respects. The calculation of modularity is described briefly below.
Because the degree distribution is regarded as an important topological property of a network, and real networks often have heterogeneous degree distributions, the network under study is at present usually compared with a random graph that has the same degree sequence when its community structure is analysed.
For a given real network, suppose a community division has been found. The total number of edges inside the communities can then be computed as

$$Q_{real} = \frac{1}{2}\sum_{ij} a_{ij}\,\delta(C_i, C_j)$$

where $A = (a_{ij})$ is the adjacency matrix of the real network, and $C_i$ and $C_j$ denote the communities to which node $i$ and node $j$ belong: if the two nodes belong to the same community, $\delta(C_i, C_j)$ is 1; otherwise it is 0.
For a null model of the same size corresponding to the real network, with the same community division, the expected total number of edges inside all communities is

$$Q_{null} = \frac{1}{2}\sum_{ij} p_{ij}\,\delta(C_i, C_j)$$

where $p_{ij}$ is the expected number of edges between node $i$ and node $j$ in the null model.
The modularity of a network is defined as the difference between the number of edges inside the communities of the network and the number of edges inside the communities of the corresponding null model, as a proportion of the total number of edges $M$ of the network, i.e.

$$Q = \frac{Q_{real} - Q_{null}}{M} = \frac{1}{2M}\sum_{ij}\left(a_{ij} - p_{ij}\right)\delta(C_i, C_j)$$

In theory, for the configuration model that has the same degree sequence as the original network but no degree correlations, we have $p_{ij} = k_i k_j / (2M)$, where $k_i$ and $k_j$ are the degrees of node $i$ and node $j$ in the original network. The commonly used definition of modularity is therefore

$$Q = \frac{1}{2M}\sum_{ij}\left(a_{ij} - \frac{k_i k_j}{2M}\right)\delta(C_i, C_j) = \frac{1}{2M}\sum_{ij} b_{ij}\,\delta(C_i, C_j)$$

where

$$b_{ij} = a_{ij} - \frac{k_i k_j}{2M}$$

$B = (b_{ij})_{N \times N}$ is also called the modularity matrix.
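As a check on the definition, the modularity of a given division can be computed directly from the adjacency structure with p_ij = k_i k_j / (2M). The sketch below is a plain double loop written for readability rather than speed; the function name and the dictionary-based graph representation are assumptions for illustration.

```python
def modularity(adj, community):
    """Modularity Q of a division, with the null model p_ij = k_i * k_j / (2M).

    adj maps each node to the set of its neighbours (undirected, no
    self-loops); community maps each node to its community label.
    """
    two_m = sum(len(nb) for nb in adj.values())  # 2M, the sum of all degrees
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] != community[j]:
                continue  # delta(C_i, C_j) = 0
            a_ij = 1.0 if j in adj[i] else 0.0
            q += a_ij - len(adj[i]) * len(adj[j]) / two_m  # b_ij
    return q / two_m
```

For two triangles joined by a single edge and divided into the two triangles, this gives Q = 5/14; placing all six nodes in one community gives Q = 0.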
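The modularity-based community detection algorithm used in the application is as follows:
1. Initialisation: each node is initially assumed to be an independent community, with modularity value Q = 0. The initial $e_{ij}$ and $a_i$ are computed as follows:

$$e_{ij} = \begin{cases} \dfrac{1}{2M}, & \text{if nodes } i \text{ and } j \text{ are connected} \\ 0, & \text{otherwise} \end{cases}$$

$$a_i = \frac{k_i}{2M}$$

The elements of the initial modularity increment matrix are computed as follows:

$$\Delta Q_{ij} = \begin{cases} 2\left(e_{ij} - a_i a_j\right), & \text{if nodes } i \text{ and } j \text{ are connected} \\ 0, & \text{otherwise} \end{cases}$$

Once the initial modularity increment matrix has been obtained, the max-heap H formed from the largest element of each of its rows can be built.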
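2. Select the largest $\Delta Q_{ij}$ from the max-heap H, merge the corresponding communities $i$ and $j$, label the merged community $j$, and update the modularity increment matrix $\Delta Q_{ij}$, the max-heap H, and the auxiliary vector $a_i$.
(1) Update of $\Delta Q_{ij}$: delete the elements of row $i$ and column $i$, and update the elements of row $j$ and column $j$, which gives:

$$\Delta Q'_{jk} = \begin{cases} \Delta Q_{ik} + \Delta Q_{jk}, & \text{if community } k \text{ is connected to both } i \text{ and } j \\ \Delta Q_{ik} - 2 a_j a_k, & \text{if community } k \text{ is connected only to } i \\ \Delta Q_{jk} - 2 a_i a_k, & \text{if community } k \text{ is connected only to } j \end{cases}$$

(2) Update of the max-heap H: update the largest elements of the corresponding rows and columns in the max-heap.
(3) Update of the auxiliary vector $a_i$:
$a_j' = a_i + a_j$, $a_i' = 0$
Record the modularity value after the merge: $Q = Q + \Delta Q_{ij}$.
Repeat step 2 until all nodes in the network have been merged into a single community.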
Over the whole course of the algorithm, the modularity Q has only one maximum. Once the largest element of the modularity increment matrix is less than zero, Q can only keep decreasing. Therefore, as soon as the largest element of the modularity increment matrix changes from positive to negative, merging can stop, and the result at that point is taken as the community structure of the network.
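The greedy merging procedure can be sketched in a simplified form. To keep the example short, the max-heap H is replaced by a linear scan for the best pair, and the gain of merging communities i and j is recomputed from e and a as 2(e_ij - a_i a_j) instead of being maintained incrementally in a ΔQ matrix. Both simplifications are assumptions of this sketch and make it slower than a heap-based version, but the sequence of merges follows the same rule. Merging stops as soon as no pair has a positive gain.

```python
def greedy_modularity_merge(adj):
    """Greedy agglomerative community detection by modularity gain.

    adj maps each node to the set of its neighbours (undirected, no
    self-loops, comparable node ids). Returns a list of node sets.
    """
    two_m = sum(len(nb) for nb in adj.values())
    members = {v: {v} for v in adj}             # community id -> its nodes
    a = {v: len(adj[v]) / two_m for v in adj}   # auxiliary vector a_i
    # e[i][j]: fraction of edge ends joining community i to community j
    e = {v: {w: 1.0 / two_m for w in adj[v]} for v in adj}

    while True:
        best_gain, best_pair = 0.0, None
        for i in sorted(e):
            for j, e_ij in sorted(e[i].items()):
                if i < j:
                    gain = 2.0 * (e_ij - a[i] * a[j])
                    if gain > best_gain:
                        best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            break  # largest increment is no longer positive: stop merging
        i, j = best_pair
        # Merge community i into community j (the merged community keeps label j).
        for k, w in e[i].items():
            if k == j:
                continue
            e[j][k] = e[j].get(k, 0.0) + w
            e[k][j] = e[j][k]
            del e[k][i]
        e[j].pop(i, None)
        del e[i]
        a[j] += a[i]
        del a[i]
        members[j] |= members.pop(i)
    return list(members.values())
```

On the two-triangle toy graph the procedure merges each triangle into one community and then stops, because joining the two triangles would lower the modularity.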
The method of the application is described further below with an example. Taking the tweets related to the UK general election that were filtered from Twitter on the single day of May 12, there are 10440 tweets in total, so the corresponding complex network has 10440 nodes. The analysis yields: nodes whose sentiment orientation cannot be determined; nodes that support the Conservative Party; nodes that oppose the Labour Party; nodes that support the Labour Party; and nodes that oppose the Conservative Party. The specific node data are shown in Table 1:
Table 1. Sentiment orientation categories of the general election tweets
Sentiment orientation | Number of nodes |
---|---|
Supports the Conservative Party | 2683 |
Opposes the Conservative Party | 2589 |
Supports the Labour Party | 2367 |
Opposes the Labour Party | 2802 |
The change in the support rates of the two parties obtained with the method of the application is shown in Fig. 2. The final general election result was: the Conservative Party won 314 seats, 17 fewer than before, and the Labour Party won 266 seats, 34 more than before. This result is close to our prediction, namely that the Conservative Party would lead in the election, but without a clear advantage.
Those of ordinary skill in the art will appreciate that the embodiments described here are intended to help the reader understand the principles of the invention, and it should be understood that the scope of protection of the invention is not limited to these specific statements and embodiments. For those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall be included within the scope of the claims of the invention.
Claims (4)
1. A complex-network-based method for analysing general election tweets, characterised in that it includes:
S1, constructing complex networks based on tweet similarity, tweet retweet relations, or tweet like relations;
S2, carrying out community division on the complex networks established in step S1.
2. The complex-network-based method for analysing general election tweets according to claim 1, characterised in that the networks of step S1 specifically include a tweet similarity complex network, a tweet retweet complex network, and a tweet like complex network;
the tweet similarity complex network is: if the similarity of two tweets is greater than a threshold, an edge exists between the two nodes;
the tweet retweet complex network is: if two tweets are retweeted by the same person, an edge exists between the two nodes representing the tweets;
the tweet like complex network is: if two tweets are liked by the same person, an edge exists between the two nodes representing the tweets.
3. The complex-network-based method for analysing general election tweets according to claim 1, characterised in that step S2 specifically includes:
S21, initialising node labels, specifically: according to the result of the sentiment analysis, adding a sentiment label to each node in the network whose tweet has a known sentiment orientation;
S22, initialising communities: placing nodes with the same sentiment label in the same community, which yields several known communities;
S23, selecting a node that has not yet been assigned, computing the modularity for that node, and assigning the node to the corresponding known community.
4. The complex-network-based method for analysing general election tweets according to claim 3, characterised in that step S23 is specifically: computing the modularity increment between the unassigned node and the assigned nodes of each known community, finding the community whose assigned nodes give the largest modularity increment with the unassigned node, and assigning the unassigned node to that known community.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710816286.4A CN107608962A (en) | 2017-09-12 | 2017-09-12 | Twitter general election data analysis method based on complex networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107608962A true CN107608962A (en) | 2018-01-19 |
Family
ID=61062616
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710816286.4A Pending CN107608962A (en) | Twitter general election data analysis method based on complex networks | 2017-09-12 | 2017-09-12 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107608962A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110363350A (en) * | 2019-07-15 | 2019-10-22 | 西华大学 | A kind of regional air pollutant analysis method based on complex network |
CN110851733A (en) * | 2019-10-31 | 2020-02-28 | 天津大学 | Community discovery and emotion interpretation method based on network topology and document content |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130227104A1 (en) * | 2012-02-28 | 2013-08-29 | Samsung Electronics Co., Ltd. | Topic-based community index generation apparatus and method and topic-based community searching apparatus and method |
CN104516947A (en) * | 2014-12-03 | 2015-04-15 | 浙江工业大学 | Chinese microblog emotion analysis method fused with dominant and recessive characters |
CN105427125A (en) * | 2015-10-29 | 2016-03-23 | 电子科技大学 | Goods clustering method based on goods network connection graph |
CN105760426A (en) * | 2016-01-28 | 2016-07-13 | 仲恺农业工程学院 | Subject community mining method for online social networking service |
CN107145516A (en) * | 2017-04-07 | 2017-09-08 | 北京捷通华声科技股份有限公司 | A kind of Text Clustering Method and system |
Non-Patent Citations (4)
Title |
---|
J.BORONDO 等: "Analyzing the Usage of Social Media During Spanish", 《2016 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING》 * |
何有世 等: "基于复杂网络构建面向主题的在线评论挖掘模型", 《软科学》 * |
刘璐: "基于粒计算的社会网络中社团挖掘的研究", 《中国优秀硕士学位论文全文数据库 基础科学辑》 * |
汪小帆 等: "复杂网络中的社团结构算法综述", 《电子科技大学学报》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110363350A (en) * | 2019-07-15 | 2019-10-22 | 西华大学 | A kind of regional air pollutant analysis method based on complex network |
CN110363350B (en) * | 2019-07-15 | 2023-10-10 | 西华大学 | Regional air pollutant analysis method based on complex network |
CN110851733A (en) * | 2019-10-31 | 2020-02-28 | 天津大学 | Community discovery and emotion interpretation method based on network topology and document content |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180119 |