CN109241430A - An election prediction method based on the fusion of multi-source heterogeneous internet data
- Publication number
- CN109241430A (application number CN201811038860.9A)
- Authority
- CN
- China
- Prior art keywords
- candidate
- election
- prediction
- internet
- index
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Abstract
The invention discloses an election prediction method based on the fusion of multi-source heterogeneous internet data, belonging to the field of data mining. First, information sources that reflect public opinion trends in the country holding the election are screened from internet data. Then, specific features are extracted from the selected internet information sources to construct a candidate support-rate prediction index system based on internet platforms. Finally, the extracted prediction indices are treated as signals reflecting public opinion and fused with a Kalman filter model to track and predict each candidate's support rate dynamically and in real time. The invention draws on a wide range of data sources and offers strong real-time performance, and has significant application value in fields such as public opinion monitoring and opinion analysis.
Description
Technical field
The invention belongs to the field of data mining and relates to an election prediction method based on the fusion of multi-source heterogeneous internet data.
Background
Electoral systems have existed for more than a century, and predicting general election results attracts attention from all sectors of society. Many kinds of prediction methods and techniques have emerged.
Early election prediction relied on opinion polls. Polling is generally conducted by survey organizations, major mainstream media and university research institutions. They typically collect information according to sampling survey theory, apply expert corrections, and assess the political climate through polls to arrive at a prediction. The advantage of poll-based prediction is its timeliness: new information that affects public opinion shortly before the election can be incorporated into the result. However, owing to factors such as the survey method, the sample size and the partisan leaning of the polling organization, poll results are often biased.
Later, some scholars and corporate institutions proposed prediction methods based on macro variables. These methods take national macroeconomic data into account and build a prediction model for the general election vote share. Such models are easy to obtain and explain election results well. However, they are usually built on long-run historical data, so their timeliness is poor and they cannot incorporate the latest information from the run-up to the election. Moreover, when the candidates are closely matched, they struggle to predict accurately.
With the rapid development of internet technology, information is growing explosively and election information is presented in increasingly diverse ways. The rich information contained in big data offers new approaches to election prediction. Elections in several countries have demonstrated the role of social networks such as Facebook and Twitter in predicting vote share. Election prediction methods based on internet big data are more timely than polls and macro-variable methods, but most current methods are after-the-fact analyses that rely on a single social media data source and ignore the diversity of users' participation across social media platforms. As a result, the predicted candidate support rates often deviate considerably and fail to reflect election public opinion comprehensively.
Summary of the invention
To solve the above problems, the invention proposes a method for predicting election vote share, specifically an election prediction method based on the fusion of multi-source heterogeneous internet data. The method takes candidates' support rates as the prediction target and fuses multi-source heterogeneous big data from social media, search engines and campaign homepages. This overcomes the bias of any single data source in revealing public opinion, and achieves real-time tracking and prediction of candidate support rates.
The specific steps of the election prediction method based on the fusion of multi-source heterogeneous internet data are as follows:
Step 1: From internet data, screen the information sources that reflect public opinion trends in the country holding the election.
The information sources are screened as follows:
First, for the country holding the election, look up the research reports issued by that country's internet management and service organizations, and extract from them the internet platforms that are widely used by netizens.
Then, obtain the usage ranking of websites in that country from website traffic statistics, and select the most frequently used websites.
Finally, from the most frequently used websites, retain the information sources that carry user-generated content, such as social networks and search engines. In addition, add the candidates' campaign homepages to the candidate information sources, and use a traffic statistics website to analyze how much attention the public pays to each candidate's campaign website.
Step 2: Extract specific features from the selected internet information sources and construct a candidate support-rate prediction index system based on internet platforms.
The prediction indices comprise social network prediction indices, a search engine prediction index and a campaign homepage prediction index. They are constructed as follows:
(1) Construct social network prediction indices in terms of both quantity and sentiment.
In terms of quantity, the share of posts that mention a candidate on the social network serves as a prediction index.
Specifically, given the number of posts that mention candidate i on the social network platform on day t, the mention-based support index that candidate i obtains on that platform on that day is calculated as follows:
Alternatively, the average number of likes that each of a candidate's posts receives per day serves as a measure of netizens' support for that candidate.
Specifically, if candidate i publishes n posts on the social network platform on day t, then from the number of likes received by each post j, the like-based support index that candidate i obtains on that platform on that day is calculated as follows:
In terms of sentiment, the text in the social network is classified by sentiment and the ratio of positive to negative sentiment is calculated; this ratio serves as the index of netizens' support for the candidate.
Specifically, given the total number of posts about candidate i on the social network on day t, of which some express positive sentiment and some express negative sentiment, the text-sentiment support index of candidate i is calculated as follows:
(2) Construct the search engine prediction index.
First, choose the most widely used search engine in the country holding the election.
Then, obtain the search volume for candidate i on day t and calculate the search-engine attention index of candidate i on day t:
(3) Construct the campaign homepage prediction index.
From the number of IP visits to candidate i's campaign website on day t, calculate the campaign-homepage attention index of candidate i on day t:
Step 3: Treat the extracted prediction indices as signals that reflect public opinion, and fuse them with a Kalman filter model to track and predict each candidate's support rate dynamically and in real time.
The detailed process is as follows:
Step 301: Smooth each extracted prediction index with the moving average method to obtain the smoothed value of each index.
To predict the support rate of candidate i on day t+1, first calculate the daily value of each index c ∈ {count, like, senti, search, IP} from day t-l to day t, then calculate the moving-average smoothed value of each index. The calculation is as follows:
Step 302: Calculate the true state value of public support for candidate i on day t as it evolves from the state on day t-1:
Here $B$ is the coefficient matrix of the control input; $u_{t-1}$ is the control input; $w_{t-1}$ is the process noise vector, which follows a multivariate normal distribution with mean 0 and covariance matrix $Q_t$, $w_t \sim N(0, Q_t)$.
Step 303: At each time step, treat the smoothed index values as reflections of the true state value, and construct the mapping between the measured value on day t and the true state value.
In the measurement equation, $H_t$ is the matrix that maps the true state value to the observed measured value; $v_t$ is Gaussian white measurement noise, which follows a multivariate normal distribution with mean 0 and covariance matrix $R_t$, $v_t \sim N(0, R_t)$. The initial state, the process noise $w_t$ and the measurement noise $v_t$ are assumed to be mutually independent throughout the state evolution.
Step 304: Once the measured value for day t has been observed and fed into the Kalman filter model, the filter fuses that day's prior state estimate of the candidate's support rate with the observation, weighted by the Kalman gain, to obtain that day's posterior state estimate.
The prior state estimate is the estimate of candidate i's support rate on day t based on the observations of the preceding t-1 days. $K_t$ is the Kalman gain, which sets the relative weights of the prior state estimate and the measured value in the fusion.
Step 305: The Kalman filter uses the posterior state estimate of day t together with the state transition equation to obtain the posterior state estimate of the next day's support rate.
The advantage of the invention is that this election prediction method, based on the fusion of multi-source heterogeneous internet data, takes into account the diversity of the internet platforms that users use. It draws on a wide range of data sources, offers strong real-time performance, and has significant application value in fields such as public opinion monitoring and opinion analysis.
Brief description of the drawings
Fig. 1 is a flow chart of the election prediction method based on the fusion of multi-source heterogeneous internet data according to the invention.
Fig. 2 is a flow chart of the dynamic, real-time tracking and prediction of candidate support rates after the extracted prediction indices are fused.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and embodiments.
Big data is huge in volume, varied in type and low in value density, and must be processed quickly. Given the broad participation of users on internet platforms, the invention proposes a method for mining public opinion about events such as elections from internet platforms. Given the diversity of the platforms that users use, it further proposes a candidate support-rate prediction method that fuses multi-source heterogeneous data with a Kalman filter model. The method first considers internet usage in the country holding the election and selects, from a large and disorderly set of internet platforms, the information sources that reflect public opinion trends in that country. Next, for each selected information source, the invention proposes a method for extracting public opinion prediction indices, and constructs a candidate support-rate prediction index system based on internet platforms. Finally, the extracted indices are treated as signals reflecting public opinion, and a signal processing model, the Kalman filter, fuses the multi-source heterogeneous prediction indices in real time to track and predict candidates' vote shares dynamically.
The detailed flow of the election prediction method based on the fusion of multi-source heterogeneous internet data is shown in Fig. 1. The implementation steps are as follows:
Step 1: Screen the information sources that reflect public opinion trends in the country holding the election.
Given the abundance of internet data, finding reliable information sources that reflect public opinion trends in the country holding the election is the basis for predicting election results accurately. Screening the information sources mainly involves the following steps:
Step 101: Look up the research reports published by the internet management and service organizations of the country holding the election.
Internet management and service organizations publish annual reports analyzing internet usage in their country or region. These reports give a preliminary picture of network usage habits in the country holding the election, from which the internet platforms widely used by its netizens can be extracted.
Internationally, the main internet management and service organizations include the International Telecommunication Union (ITU), the Internet Society (ISOC) and the Internet Network Information Center (INTERNIC). In the Asia-Pacific region they mainly include the Asia & Pacific Internet Association (APIA), the Asia Pacific Networking Group (APNG), the Asia Pacific Network Information Centre (APNIC), the China Internet Network Information Center (CNNIC), the Japan Network Information Center (JPNIC), the Korea Network Information Center (KRNIC) and the Malaysian domain name registry (MIMOS). In the Americas they mainly include the IP address management and allocation body for the Americas (ARIN), the United States domain name registry (NeuStar) and the Canadian Internet Registration Authority (CIRA). In Europe they mainly include the Council of European National Top-Level Domain Registries (CENTR), the German network information center (DENIC), the United Kingdom network information center (Nominet) and the European IP address management and allocation body (RIPE). In Africa they mainly include the African Network Information Centre (AfriNIC). In Australia they mainly include the Australian domain name registry (AUDA).
Step 102: Consult the web usage survey reports published by research firms in the country holding the election. For example, website traffic statistics sites such as Alexa provide usage rankings of websites for each country or region. Based on these rankings, select the most frequently used websites in the country holding the election.
The first two steps yield a set of heavily used websites in the country holding the election. Because they are used frequently and the public participates widely, these websites are more likely to reveal public opinion trends.
Step 103: Because the prediction indices should reveal the public's views as far as possible, retain from the high-traffic websites only those information sources that carry user-generated content, such as social networks and search engines. In addition, given the particular nature of general elections, add the candidates' campaign homepages to the candidate information sources, and use traffic statistics sites such as Alexa to analyze how much attention the public pays to each candidate's campaign website. This gives a preliminary set of internet information sources that reflect public opinion trends.
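The screening in Steps 101 to 103 can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Site` record, the category labels, the `top_n` cutoff and the example domains are introduced only for illustration, and a real ranking would come from a traffic statistics service.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    rank: int       # national usage rank (1 = most used), e.g. from a traffic statistics site
    category: str   # hypothetical label such as "social", "search", "shopping"

# Categories assumed (for illustration) to carry user-generated content
UGC_CATEGORIES = {"social", "search"}

def screen_sources(sites, campaign_homepages, top_n=10):
    """Keep the top-N sites by rank that carry user-generated content,
    then append the candidates' campaign homepages."""
    top = sorted(sites, key=lambda s: s.rank)[:top_n]
    kept = [s.name for s in top if s.category in UGC_CATEGORIES]
    return kept + list(campaign_homepages)

sites = [
    Site("social-a.example", 2, "social"),
    Site("search-a.example", 1, "search"),
    Site("shop-a.example", 3, "shopping"),
    Site("social-b.example", 40, "social"),   # falls outside the top 3
]
sources = screen_sources(
    sites, ["candidate-x.example", "candidate-y.example"], top_n=3
)
print(sources)
```

The cutoff and the category set are judgment calls; the text only requires that the retained sites are heavily used and carry user-generated content.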
Step 2: Extract specific features from the selected internet information sources and construct a candidate support-rate prediction index system based on internet platforms.
The prediction indices comprise social network prediction indices, a search engine prediction index and a campaign homepage prediction index. A comprehensive and scientifically sound index system is the key to election prediction. Based on the specific features of each information source, the prediction indices for each channel are constructed as follows:
(1) Construct social network prediction indices in terms of both quantity and sentiment.
Because of their interactivity and timeliness, social networks have become the main platforms on which the public obtains information and expresses opinions. Social media such as Facebook and Twitter are favored by a growing number of users. These platforms let netizens express their views on election candidates through actions such as liking and commenting. When mining this user-generated content for the public's leaning toward each candidate, prediction indices can be constructed in terms of both quantity and sentiment.
In terms of quantity, the number of posts on Facebook and Twitter that discuss a candidate reflects the public's attention to that candidate. The share of posts that mention a candidate on the social network can therefore serve as a prediction index. Specifically, given the number of posts that mention candidate i on the social network platform on day t, the mention-based support index that candidate i obtains on that platform on that day is calculated as follows:
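The formula for the mention index is not reproduced in this text. A minimal sketch, assuming the index is simply each candidate's share of all candidate mentions on that day, might look like this. The function name `share_index`, the example counts and the equal-split fallback for a day with no mentions are all assumptions.

```python
def share_index(counts):
    """Map {candidate: raw daily count} to {candidate: share of the total}."""
    total = sum(counts.values())
    if total == 0:
        # No mentions at all that day: fall back to an equal split (assumption)
        return {c: 1.0 / len(counts) for c in counts}
    return {c: n / total for c, n in counts.items()}

# Hypothetical counts of posts mentioning each candidate on day t
mentions_day_t = {"A": 600, "B": 400}
mention_index = share_index(mentions_day_t)
print(mention_index)  # {'A': 0.6, 'B': 0.4}
```

Normalizing across candidates keeps the index on the same 0 to 1 scale as a support rate, which makes it easier to fuse with the other indices later.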
Besides mentions of a candidate, which can reflect public support, many social networking sites also provide functions such as likes. A like can be regarded as a netizen's strong approval of a candidate's statement. The average number of likes that each of a candidate's posts receives per day can therefore serve as a measure of netizens' support for the candidate. Specifically, if candidate i publishes n posts on the social network platform on day t, then from the number of likes received by each post j, the like-based support index that candidate i obtains on that platform on that day is calculated as follows:
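The like-index formula is also missing here. One plausible reading, offered only as an assumption, is to average the likes over the n posts a candidate published that day and then normalize the averages across candidates. The function name and the sample data below are hypothetical.

```python
def like_index(likes_per_post):
    """likes_per_post: {candidate: [likes on each post published that day]}.
    Returns each candidate's share of the per-post average likes."""
    avg = {c: (sum(v) / len(v) if v else 0.0) for c, v in likes_per_post.items()}
    total = sum(avg.values())
    if total == 0:
        # Nobody received likes: equal split (assumption)
        return {c: 1.0 / len(avg) for c in avg}
    return {c: a / total for c, a in avg.items()}

# Candidate A: 2 posts averaging 200 likes; candidate B: 3 posts averaging 100 likes
likes_day_t = {"A": [100, 300], "B": [50, 150, 100]}
like_idx = like_index(likes_day_t)
print(like_idx)
```

Averaging per post rather than summing avoids rewarding a candidate merely for posting more often, which matches the text's emphasis on likes per post.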
In terms of sentiment, posts that mention a candidate on social networks and comments on the candidate's personal page express a rich range of netizen views. To mine the sentiment orientation contained in this text, the text in the social network can be classified by sentiment and the ratio of positive to negative sentiment calculated; this ratio serves as the index of netizens' support for the candidate. Specifically, given the total number of posts about candidate i on the social network on day t, of which some express positive sentiment and some express negative sentiment, the text-sentiment support index of candidate i is calculated as follows:
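The sentiment formula did not survive in this text, and "the ratio of positive to negative sentiment" admits more than one reading. The sketch below assumes the bounded form positive / (positive + negative), which ignores neutral posts and stays between 0 and 1. The function name and the neutral default of 0.5 are assumptions, and the sentiment classifier that produces the counts is outside the scope of the sketch.

```python
def sentiment_index(n_pos, n_neg):
    """Share of positive posts among the posts with a clear polarity."""
    denom = n_pos + n_neg
    # With no polarized posts, return a neutral 0.5 (assumption)
    return 0.5 if denom == 0 else n_pos / denom

# Hypothetical day: 300 positive and 100 negative posts about candidate i
print(sentiment_index(300, 100))  # 0.75
```

A raw ratio n_pos / n_neg would be an equally defensible reading, but it is unbounded and undefined when there are no negative posts.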
(2) Construct the search engine prediction index.
Each user's search behavior in a search engine is an expression of active intent. To help users understand what netizens are paying attention to, many search engines provide keyword search index services, such as Google Trends. These indices are based on massive amounts of netizen behavior data, report the search volume of a given keyword in the search engine, and are usually updated daily. For the scenario considered by the invention, choose the most widely used search engine in the country holding the election, obtain the search volume for candidate i on day t, and calculate the search-engine attention index of candidate i on day t:
(3) Construct the campaign homepage prediction index.
To publicize their policy positions and win votes, candidates usually set up campaign homepages. Through the campaign website, a candidate presents recent campaign activities and speeches, and usually also sets up a donation page to raise funds for campaign activities. The number of IP visits to a candidate's campaign homepage reflects the public's attention to the candidate's words and deeds. To help with website tuning and optimization, traffic statistics services such as Alexa and the SEO comprehensive query of webmaster tools provide the daily number of IP visits to a specified website. From the number of IP visits to candidate i's campaign website on day t, calculate the campaign-homepage attention index of candidate i on day t:
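The formulas for the search-engine and campaign-homepage attention indices are absent from this text. If both are shares across candidates, in line with the "share of posts" wording used for the mention index, they could be written as below. The symbols $V_{i,t}$ (search volume), $U_{i,t}$ (IP visits), $S^{search}_{i,t}$ and $S^{IP}_{i,t}$ are notation assumed here, not taken from the patent; only the superscripts follow the index names count, like, senti, search and IP that the text lists.

```latex
S^{search}_{i,t} = \frac{V_{i,t}}{\sum_{j} V_{j,t}},
\qquad
S^{IP}_{i,t} = \frac{U_{i,t}}{\sum_{j} U_{j,t}}
```

Here $j$ runs over all candidates, so each index sums to 1 across candidates on any given day.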
Step 3: Treat the extracted prediction indices as signals that reflect public opinion, and fuse them with a Kalman filter model to track and predict each candidate's support rate dynamically and in real time.
The five classes of prediction indices extracted in Step 2 reflect the public's attention to the candidates from different angles. Because users' use of internet platforms is diverse and unevenly distributed, a prediction that relies on any single index may be biased. A method is therefore needed that fuses the multi-source heterogeneous indices and reflects candidate support comprehensively. The invention treats the five classes of prediction indices extracted in Step 2 as signals reflecting public opinion and fuses these multi-source heterogeneous signals with a signal processing method, the Kalman filter model. The implementation comprises the following steps:
Step 301: Smooth the prediction indices. To reflect the trend in each candidate's support rate while limiting the influence of fluctuations in individual indices on the prediction, the five classes of extracted prediction indices are first smoothed. The invention uses the moving average method. Specifically, to predict the support rate of candidate i on day t+1, first calculate the daily values of the prediction indices from day t-l to day t, then calculate the moving average of each of the five classes of prediction indices, which serves as the input to the next step, the Kalman filter model. The calculation is as follows:
Step 302: Fuse the prediction indices with the Kalman filter model. Kalman filtering is an algorithm that uses the state equation of a linear system to estimate the system state optimally from noisy observation data. In the invention, the prediction indices computed from the multi-source online data serve as observations of public support, and the actual state of public support is estimated by fusing them. Let the true state value of public support for candidate i on day t evolve from the state at time (t-1), that is:
where $B$ is the coefficient matrix of the control input; $u_{t-1}$ is the control input; $w_{t-1}$ is the process noise vector, which follows a multivariate normal distribution with mean 0 and covariance matrix $Q_t$, $w_t \sim N(0, Q_t)$.
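The state equation itself is not reproduced in this text. Under the standard linear-Gaussian form that the definitions of $B$, $u_{t-1}$ and $w_{t-1}$ suggest, it would read as below. The state symbol $x_{i,t}$ and the state transition matrix $A$ are assumptions introduced here, since the text does not name them.

```latex
x_{i,t} = A\,x_{i,t-1} + B\,u_{t-1} + w_{t-1},
\qquad w_{t} \sim N(0, Q_{t})
```

With no external control input and $A = I$, this reduces to a random walk in which support on day t equals support on day t-1 plus noise.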
Step 303: At each time step, construct the mapping between the measured value and the true state value, treating the observation as noisy, that is:
where $H_t$ is the matrix that maps the state value to the measured value; $v_t$ is the measurement noise, assumed to be Gaussian white noise, $v_t \sim N(0, R_t)$. The initial state, the process noise $w_t$ and the measurement noise $v_t$ are assumed to be mutually independent throughout the state evolution.
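The measurement equation is likewise missing. In the standard Kalman formulation consistent with the definitions of $H_t$ and $v_t$, it would take the form below. The symbols $z_{i,t}$ and $x_{i,t}$, and the stacking of the five smoothed indices into one vector, are assumptions about notation.

```latex
z_{i,t} = H_{t}\,x_{i,t} + v_{t},
\qquad v_{t} \sim N(0, R_{t}),
\qquad
z_{i,t} = \left(\bar{S}^{count}_{i,t},\ \bar{S}^{like}_{i,t},\ \bar{S}^{senti}_{i,t},\ \bar{S}^{search}_{i,t},\ \bar{S}^{IP}_{i,t}\right)^{\top}
```

If the state is a single support rate, $H_t$ is a 5-by-1 matrix and $R_t$ a 5-by-5 covariance whose diagonal expresses how noisy each channel is taken to be.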
Step 304: Kalman filtering comprises two stages, prediction and update. In the prediction stage, the Kalman filter predicts the state value at the current time from the posterior state of the previous time step. The result is the prior state estimate for candidate i at time t, based on the observations of the preceding (t-1) days. Once the measured value for day t has been observed, the prior state estimate for that time and the observation are fused with weights to obtain the posterior state estimate for the current time:
where $K_t$ is the Kalman gain, which sets the relative weights of the prior state estimate and the measured value in the fusion. The posterior state estimation error is the difference between the true state value and the posterior state estimate, and its covariance matrix is the expected outer product of that error.
To bring the posterior state estimate as close as possible to the true state value, the posterior state estimation error is minimized. This is equivalent to minimizing the trace of the posterior state estimation error covariance matrix, and solving this optimization yields the Kalman gain:
Thus, the prior state estimate of each candidate's daily support rate and the observations of support from each channel can be fused with Kalman-gain weights to obtain the posterior state estimate for that day.
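The equations for Step 304 are not reproduced in this text. The standard Kalman filter expressions that match the description are given below as a reference sketch. The symbols $\hat{x}^{-}_{i,t}$ (prior estimate), $\hat{x}_{i,t}$ (posterior estimate), $e_t$, $P^{-}_{t}$, $P_{t}$ and the transition matrix $A$ are notation assumed here.

```latex
\begin{aligned}
\hat{x}^{-}_{i,t} &= A\,\hat{x}_{i,t-1} + B\,u_{t-1},
  & P^{-}_{t} &= A\,P_{t-1}A^{\top} + Q_{t} \\
\hat{x}_{i,t} &= \hat{x}^{-}_{i,t} + K_{t}\left(z_{i,t} - H_{t}\,\hat{x}^{-}_{i,t}\right),
  & e_{t} &= x_{i,t} - \hat{x}_{i,t} \\
P_{t} &= \mathrm{E}\!\left[e_{t}e_{t}^{\top}\right],
  & K_{t} &= P^{-}_{t}H_{t}^{\top}\left(H_{t}P^{-}_{t}H_{t}^{\top} + R_{t}\right)^{-1}
\end{aligned}
```

Minimizing $\mathrm{tr}(P_t)$ over $K_t$ gives the gain shown, after which the covariance simplifies to $P_t = (I - K_t H_t)P^{-}_{t}$. A large $R_t$ entry shrinks the weight of the corresponding channel.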
Step 305: In the update stage of the Kalman filter, the posterior state estimate of the current day and the state transition equation give the posterior state estimate of the next day's support rate:
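Steps 302 to 305 can be sketched end to end in Python with NumPy. This is an illustration under stated assumptions, not the patent's parameterization: the state is a single support rate, the transition matrix `A` is the identity (a random walk), the control input is omitted, `H` maps the state equally to all five indices, and the values of `Q`, `R`, the initial state and the observations are invented.

```python
import numpy as np

def kalman_step(x_post, P_post, z, A, H, Q, R):
    """One predict/update cycle of a linear Kalman filter (no control input)."""
    # Prediction: prior estimate from the previous posterior
    x_prior = A @ x_post
    P_prior = A @ P_post @ A.T + Q
    # Update: Kalman gain, then weighted fusion of prior and observation
    S = H @ P_prior @ H.T + R
    K = P_prior @ H.T @ np.linalg.inv(S)
    x_new = x_prior + K @ (z - H @ x_prior)
    P_new = (np.eye(len(x_post)) - K @ H) @ P_prior
    return x_new, P_new

# Assumed model: scalar state observed through five smoothed indices
A = np.eye(1)                 # random-walk state transition (assumption)
H = np.ones((5, 1))           # each index observes the support rate directly
Q = np.array([[1e-4]])        # process noise covariance (invented)
R = 0.01 * np.eye(5)          # measurement noise covariance (invented)

x = np.array([0.5])           # initial support estimate
P = np.array([[1.0]])         # large initial uncertainty

# Hypothetical smoothed indices (count, like, senti, search, IP) for two days
observations = [
    np.array([0.55, 0.60, 0.58, 0.52, 0.57]),
    np.array([0.56, 0.61, 0.57, 0.54, 0.58]),
]
for z in observations:
    x, P = kalman_step(x, P, z, A, H, Q, R)

forecast = float((A @ x)[0])  # one-day-ahead estimate from the transition model
print(round(forecast, 3))
```

In practice the entries of `R` would differ by channel, so that a noisier index such as raw mention share carries less weight than a steadier one.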
The invention takes into account the diversity of the internet platforms that users use, draws on a wide range of data sources and offers strong real-time performance. It overcomes the bias of any single data source in revealing public opinion and has broad application prospects.
Claims (2)
1. a kind of election prediction technique of internet multi-resources Heterogeneous data fusion, which is characterized in that specific step is as follows:
Step 1: from internet data, screening is able to reflect the information source of election country popular feelings trend;
Step 2: extracting specific features from the internet information source filtered out, constructs candidate's branch based on internet platform
Holdup prediction index system;
The prediction index includes: social networks prediction index, search engine prediction index and the prediction of candidates participating in election campaign homepage
Index;Specific building process is as follows:
(1) Construct the social network prediction indices in terms of both quantity and sentiment;
In terms of quantity, the proportion of posts on the social network that mention a candidate is used as a prediction index;
Specifically, given the number of posts that mention candidate i on a social network platform on day t, the mention-based support-rate index that candidate i obtains on that platform for that day is calculated as follows:
Alternatively, the average number of likes that each candidate receives per post per day is used as a measure of netizens' support for that candidate;
Specifically, if candidate i publishes n posts on a social network platform on day t and the number of likes received by each post j is known, the like-based support-rate index that candidate i obtains on that platform for that day is calculated as follows:
In terms of sentiment, sentiment classification is performed on the text in the social network, and the ratio of positive to negative sentiment is calculated and used as a prediction index of netizens' support for the candidate;
Specifically, given the total number of posts about candidate i on the social network on day t, together with the numbers of those posts classified as positive and as negative, the text-sentiment support-rate index of candidate i is calculated as follows:
(2) Construct the search engine prediction index;
First, select the search engine with the largest usage in the election country;
Then, obtain the search volume for candidate i on day t and calculate the search-engine attention index of candidate i on day t:
(3) Construct the candidate campaign homepage prediction index;
Given the number of IP visits to candidate i's campaign website on day t, calculate the campaign-homepage attention index of candidate i on day t:
Step 3: treat each extracted prediction index as a signal reflecting public opinion, fuse the indices with a Kalman filter model, and dynamically track and predict the candidates' support rates in real time;
The detailed process is as follows:
Step 301: smooth each extracted prediction index with the moving-average method to obtain the smoothed value of each prediction index;
When predicting the support rate of candidate i for day t+1, first calculate the daily value of each index c ∈ {count, like, senti, search, IP} from day t-l to day t, then calculate the moving-average smoothed value of each prediction index; the calculation is as follows:
Step 302: calculate the true state value of candidate i on day t from the evolution of the public's state toward candidate i on day t-1;
In the state equation, $B$ is the control-input coefficient matrix; $u_{t-1}$ is the control input variable; $w_{t-1}$ is the process noise vector, which follows a multivariate normal distribution with mean 0 and covariance matrix $Q_t$, $w_t \sim N(0, Q_t)$;
Step 303: at each moment, treat the smoothed value of each prediction index as a reflection of the true state value, and construct the mapping between the measured value on day t and the true state value;
In the measurement equation, $H_t$ is the matrix that maps the true state value to the observed measurement; $v_t$ is Gaussian white measurement noise, which follows a multivariate normal distribution with mean 0 and covariance matrix $R_t$, $v_t \sim N(0, R_t)$; it is assumed that, during state evolution, the initial state, the process noise $w_t$ and the measurement noise $v_t$ are mutually independent;
Step 304: after the measured value observed on day t is input into the Kalman filter model, the Kalman filter fuses the prior state estimate of the candidate's support rate for that day with the observed value, weighted by the Kalman gain coefficient, to obtain the posterior state estimate for that day;
Here the prior state estimate is the estimate of candidate i's support rate on day t based on the observations of the preceding t-1 days; $K_t$ is the Kalman gain coefficient, which determines the weights given to the prior state estimate and the measured value in the fusion process;
Step 305: the Kalman filter performs its update using the posterior state estimate for day t and the state transition equation, to obtain the state estimate of the next day's support rate:
2. The election prediction method based on the fusion of multi-source heterogeneous internet data according to claim 1, characterized in that the step of screening information sources in step 1 is specifically:
First, for the election country, find the research reports published by that country's internet administration and service organizations, and extract from those reports the internet platforms widely used by netizens;
Then, collect traffic statistics for the websites of these internet platforms to obtain the website usage ranking in the election country, and select the most frequently used websites;
Finally, from the most frequently used websites, retain the information sources that carry user-generated content; in addition, add the candidates' campaign homepages to the information sources, and then use traffic-statistics websites to analyze the public's degree of attention to the different candidates' campaign websites.
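The index construction in step 2 and the moving-average smoothing in step 301 of claim 1 can be illustrated with a short Python sketch. It assumes that the quantity-type indices (mentions, search volume, IP visits) are each candidate's share of the daily total across candidates, and that smoothing is a simple unweighted mean over a fixed window; both are assumptions for illustration. The function names and all numbers are hypothetical.

```python
def share_index(values):
    """Each candidate's share of a daily total (posts, searches, or IP visits).

    Assumption: the index is a simple proportion across candidates.
    """
    total = sum(values.values())
    return {cand: (v / total if total else 0.0) for cand, v in values.items()}

def moving_average(series, window):
    """Unweighted mean of the last `window` daily values of one index."""
    if not series or window < 1:
        raise ValueError("need a non-empty series and a positive window")
    recent = series[-window:]
    return sum(recent) / len(recent)

# Hypothetical daily post counts mentioning two candidates.
posts_day_t = {"A": 1200, "B": 800}
count_index = share_index(posts_day_t)

# Hypothetical history of candidate A's mention-based index, oldest first.
history_a = [0.58, 0.61, 0.60, 0.63]
smoothed_a = moving_average(history_a, window=3)
```

The smoothed values, one per index c, would then form the measurement vector passed to the Kalman filter in step 303.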
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811038860.9A CN109241430A (en) | 2018-09-06 | 2018-09-06 | A kind of election prediction technique of internet multi-resources Heterogeneous data fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811038860.9A CN109241430A (en) | 2018-09-06 | 2018-09-06 | A kind of election prediction technique of internet multi-resources Heterogeneous data fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109241430A (en) | 2019-01-18 |
Family
ID=65067469
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811038860.9A Pending CN109241430A (en) | 2018-09-06 | 2018-09-06 | A kind of election prediction technique of internet multi-resources Heterogeneous data fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109241430A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111563918A (en) * | 2020-03-30 | 2020-08-21 | 西北工业大学 | Target tracking method for data fusion of multiple Kalman filters |
CN112348257A (en) * | 2020-11-09 | 2021-02-09 | 中国石油大学(华东) | Election prediction method driven by multi-source data fusion and time sequence analysis |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140289004A1 (en) * | 2012-08-10 | 2014-09-25 | Itron, Inc. | Near-Term Data Filtering, Smoothing and Load Forecasting |
CN104408108A (en) * | 2014-11-18 | 2015-03-11 | 重庆邮电大学 | Hot topic group influence analysis system and method based on grey system theory |
CN105050132A (en) * | 2015-08-10 | 2015-11-11 | 北京邮电大学 | Method for estimating extreme value throughput capacity of cell |
CN106227766A (en) * | 2016-07-15 | 2016-12-14 | 国家计算机网络与信息安全管理中心 | A kind of election public opinion prediction method of big data-driven |
CN107577782A (en) * | 2017-09-14 | 2018-01-12 | 国家计算机网络与信息安全管理中心 | A kind of people-similarity depicting method based on heterogeneous data |
Non-Patent Citations (1)
Title |
---|
ZHENG XIE 等: "《Wisdom of fusion: Prediction of 2016 Taiwan election with heterogeneous big data》", 《2016 13TH INTERNATIONAL CONFERENCE ON SERVICE SYSTEMS AND SERVICE MANAGEMENT(ICSSSM)》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111563918A (en) * | 2020-03-30 | 2020-08-21 | 西北工业大学 | Target tracking method for data fusion of multiple Kalman filters |
CN111563918B (en) * | 2020-03-30 | 2022-03-04 | 西北工业大学 | Target tracking method for data fusion of multiple Kalman filters |
CN112348257A (en) * | 2020-11-09 | 2021-02-09 | 中国石油大学(华东) | Election prediction method driven by multi-source data fusion and time sequence analysis |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Jaidka et al. | Predicting elections from social media: a three-country, three-method comparative study | |
Elçi | The rise of populism in Turkey: A content analysis | |
Bozarth et al. | Toward a better performance evaluation framework for fake news classification | |
Rao et al. | Actionable and political text classification using word embeddings and LSTM | |
CN108363753A (en) | Comment text sentiment classification model is trained and sensibility classification method, device and equipment | |
CN104750856B (en) | A kind of System and method for of multidimensional Collaborative Recommendation | |
CN103198072B (en) | Method and device is recommended in a kind of excavation of popular search word | |
Castro et al. | Back to# 6D: Predicting Venezuelan states political election results through Twitter | |
CN107291886A (en) | A kind of microblog topic detecting method and system based on incremental clustering algorithm | |
CN103699626A (en) | Method and system for analysing individual emotion tendency of microblog user | |
CN104572888B (en) | A kind of associated information retrieval method of time series | |
CN111241425B (en) | POI recommendation method based on hierarchical attention mechanism | |
Gómez Fortes et al. | Basque regional elections 2012: The return of nationalism under the influence of the economic crisis | |
Jerven | Measuring African development: past and present. Introduction to the Special Issue | |
CN109241430A (en) | A kind of election prediction technique of internet multi-resources Heterogeneous data fusion | |
CN113407729A (en) | Judicial-oriented personalized case recommendation method and system | |
Chueri et al. | Closing the gap: how descriptive and substantive representation affect women’s vote for populist radical right parties | |
Buono et al. | Big data econometrics: Now casting and early estimates | |
Nawaz et al. | Mining public opinion: a sentiment based forecasting for democratic elections of Pakistan | |
Stauffer et al. | Contextualizing the gender gap in voter turnout | |
De Groot | Culture, contiguity and conflict: on the measurement of ethnolinguistic effects in spatial spillovers | |
JP7291100B2 (en) | Anomaly/change estimation method, program and device using multiple posted time-series data | |
Bergman | Insights from the Quantification of the Study of Populism | |
CN106227766A (en) | A kind of election public opinion prediction method of big data-driven | |
Vilas et al. | The irruption of cryptocurrencies into Twitter cashtags: a classifying solution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190118 |