TWI744620B

TWI744620B - Pooling prediction system

Info

Publication number: TWI744620B
Application number: TW108111316A
Authority: TW
Inventors: 黃建源; 周宇軒; 陳詳翰
Original assignee: 金志聿
Priority date: 2019-03-29
Filing date: 2019-03-29
Publication date: 2021-11-01
Also published as: TW202036447A

Abstract

An election prediction system is provided. The election prediction system comprises a winning factor analysis module, an online word-of-mouth estimation module, a media poll data accessing module, a public tendency data accessing module, a weight calculation module, and a win probability estimation module. The winning factor analysis module is utilized for generating a candidate strength data. The online word-of-mouth estimation module is utilized for generating an online word-of-mouth estimation data. The media poll data accessing module is utilized for generating a media poll data. The public tendency data accessing module is utilized for generating a public tendency data. The weight calculation module is utilized for generating a weight data based on a voter structure data. The win probability estimation module is utilized for generating a candidate predictive win probability.

Description

Election Forecast System

The present invention relates to a prediction system, and more particularly to an election prediction system.

With the rapid development of the Internet, online public opinion has become a data source that brand owners, service providers, and even policy research institutions take very seriously. Traditional election prediction methods that rely solely on sample surveys by polling agencies fail to reach different demographic groups effectively, often do not reflect actual voter preferences, and are therefore prone to inaccurate predictions.

In view of this, the present invention provides an election prediction system to improve prediction accuracy.

An embodiment of the present invention provides an election prediction system. The election prediction system includes a candidate winning factor analysis module, an Internet word-of-mouth prediction module, a media poll data acquisition module, a crowd prediction data acquisition module, a weight calculation module, and an election rate prediction module.

The candidate winning factor analysis module retrieves at least one historical election data and analyzes it to obtain at least one corresponding winning factor weight, from which a candidate strength data is generated. The Internet word-of-mouth prediction module retrieves website data from at least one website and analyzes the website data to generate an Internet word-of-mouth prediction data. The Internet word-of-mouth prediction data includes at least one of a voter familiarity data, a voter favorability data, and an online business performance data. The media poll data acquisition module retrieves at least one media poll report to generate a media poll data. The crowd prediction data acquisition module retrieves at least one network statistical data to generate a crowd prediction data.

The weight calculation module generates, based on a voter structure data, a weight data corresponding to at least one of the candidate strength data, the Internet word-of-mouth prediction data, the media poll data, and the crowd prediction data. The election rate prediction module adjusts the weights of the candidate strength data, the Internet word-of-mouth prediction data, the media poll data, and the crowd prediction data according to the weight data to generate a candidate's predicted election rate.

Compared with traditional election prediction methods that rely solely on sample surveys by polling agencies, the election prediction system provided by the present invention takes into account both media poll data and Internet crowd prediction data, uses machine learning to obtain candidate strength data, obtains Internet word-of-mouth data through semantic and sentiment analysis, and assigns an appropriate weight to each type of data. In this way, the system can reach voter groups that traditional election prediction methods cannot reach, and it also takes other possible influencing factors into consideration, thereby improving the accuracy of election prediction.

The specific embodiments adopted by the present invention are further described through the following embodiments and drawings.

10: Election prediction system

100: Candidate winning factor analysis module

200: Internet word-of-mouth prediction module

300: Media poll data acquisition module

400: Crowd prediction data acquisition module

500: Weight calculation module

600: Election rate prediction module

20: Historical election database

220: Web crawler unit

240: Semantic analysis unit

260: Sentiment analysis unit

30a, 30b: Website

262: Vector conversion module

264: Emotion prediction module

266: Predicted value generation module

40a, 40b: Media

520: Weight generation unit

540: Weight adjustment unit

D_H: Historical election data

D1: Candidate strength data

DOC: Website data

D2: Internet word-of-mouth prediction data

V_DOC: Text vector

C_P: Positive reliability data

C_N: Negative reliability data

V_S: Sentiment value

D3: Media poll data

P_M1, P_M2: Media poll report

P_T: Media tendency data

D4: Crowd prediction data

D_NS: Voter structure data

D5: Weight data

D_HNS: Historical voter structure data

D_W: Initial estimated weight data

D6: Candidate's predicted election rate

D_A: Relative advantage value data

P_TL: Release time data

640: Election rate calculation unit

P_N1, P_N2: Network statistical data

250: Internet voice volume statistics unit

620: Advantage value calculation unit

50: Prediction market website

The first figure is a block diagram of an embodiment of the election prediction system of the present invention.

The second figure is a block diagram of an embodiment of the candidate winning factor analysis module of the first figure.

The third figure is a block diagram of an embodiment of the Internet word-of-mouth prediction module of the first figure.

The fourth figure is a block diagram of an embodiment of the sentiment analysis unit of the third figure.

The fifth figure is a block diagram of an embodiment of the media poll data acquisition module of the first figure.

The sixth figure is a block diagram of an embodiment of the crowd prediction analysis unit of the first figure.

The seventh figure is a block diagram of an embodiment of the weight calculation module of the first figure.

The eighth figure is a block diagram of an embodiment of the election rate prediction module of the first figure.

The ninth figure is a flowchart of an embodiment of the election prediction method of the present invention.

The specific embodiments of the present invention are described in more detail below with reference to the schematic drawings. The advantages and features of the present invention will become clearer from the following description and the claims. Note that the drawings are in a highly simplified form and are not drawn to precise scale; they serve only to assist in explaining the embodiments of the present invention conveniently and clearly.

The first figure is a block diagram of an embodiment of the election prediction system 10 of the present invention. As shown in the figure, the election prediction system 10 includes a candidate winning factor analysis module 100, an Internet word-of-mouth prediction module 200, a media poll data acquisition module 300, a crowd prediction data acquisition module 400, a weight calculation module 500, and an election rate prediction module 600.

The candidate winning factor analysis module 100 retrieves at least one historical election data D_H and analyzes it to obtain the winning factor weights of each electoral district, from which a candidate strength data D1 is generated. In one embodiment, the historical election data D_H includes a candidate data, a political party recommendation data, an election experience data, and a government affairs experience data.

In one embodiment, as shown in the second figure, the candidate winning factor analysis module 100 is connected to a historical election database 20 to retrieve the historical election data D_H of each electoral district. In one embodiment, the historical election database 20 is the election database of the Central Election Commission, but it is not limited to this. The historical election database 20 may also be another public election database, such as Wikipedia. In addition, the candidate winning factor analysis module 100 may be connected to multiple historical election databases 20 to retrieve more complete historical election data.

The candidate winning factor analysis module 100 analyzes the historical election data D_H by regression analysis and/or machine learning to determine the winning factor weights of each electoral district, and then generates the candidate strength data D1. In one embodiment, the candidate winning factor analysis module 100 uses regression analysis to determine how strongly each kind of historical election data D_H (such as candidate age, recommending party, election experience, and government affairs experience) is associated with a candidate winning. The resulting winning factor weights of each electoral district are then applied to the candidates to be predicted to generate the candidate strength data D1. In another embodiment, the candidate winning factor analysis module 100 uses the historical election data D_H as training material and, through machine learning, derives a function or degree of association between the historical election data D_H and a candidate winning, which is then applied to the candidates to be predicted to generate the candidate strength data D1.
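The regression embodiment above can be illustrated with a minimal sketch. It assumes logistic regression fitted by plain gradient descent, which is only one possible choice; the patent does not specify a model. The feature layout, the toy history, and the function names `fit_logistic` and `candidate_strength` are hypothetical.

```python
import math

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent.

    The learned weights play the role of the winning factor weights.
    """
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted minus actual
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias

def candidate_strength(weights, bias, features):
    """Apply the fitted weights to one candidate; returns a score in (0, 1)."""
    z = bias + sum(w * xi for w, xi in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical historical rows: [party_recommended, prior_campaigns, held_office]
history = [
    [1, 2, 1], [1, 0, 0], [0, 1, 0], [0, 0, 0],
    [1, 1, 1], [0, 2, 1], [0, 0, 1], [1, 0, 1],
]
won = [1, 0, 0, 0, 1, 1, 0, 1]  # 1 = elected, 0 = not elected

weights, bias = fit_logistic(history, won)
strong = candidate_strength(weights, bias, [1, 2, 1])
weak = candidate_strength(weights, bias, [0, 0, 0])
```

In this sketch a separate model would be fitted per electoral district, since the text states that winning factor weights are determined for each district.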

The Internet word-of-mouth prediction module 200 retrieves website data DOC from at least one website and analyzes the website data DOC to generate an Internet word-of-mouth prediction data D2. The Internet word-of-mouth prediction data D2 includes at least one of a voter familiarity data, a voter favorability data, and an online business performance data.

Regarding the voter familiarity data, in one embodiment, as shown in the third figure, the Internet word-of-mouth prediction module 200 includes a web crawler unit 220, a semantic analysis unit 240, an Internet voice volume statistics unit 250, and a sentiment analysis unit 260. The web crawler unit 220 retrieves website data DOC from at least one website (the figure shows two websites 30a and 30b as an example), such as text data from online discussion forums. The semantic analysis unit 240 performs keyword analysis on the retrieved website data DOC to generate the Internet word-of-mouth prediction data D2, in particular the voter familiarity data.

In one embodiment, the voter familiarity data includes Internet voice volume, search popularity, number of social media fans, number of social media interactions, and number of social media mentions. The Internet voice volume can be obtained through the Internet voice volume statistics unit 250. The more often a candidate's name appears in discussion articles on websites, the higher the candidate's Internet voice volume, which usually also indicates higher voter familiarity. Search popularity is how frequently voters use a search engine (such as Google) to search for a particular candidate. Higher search popularity usually also indicates higher voter familiarity. Search popularity data can be obtained with existing statistical tools, such as Google Trends. The numbers of social media fans, interactions, and mentions all represent the influence of a particular candidate's fan page. Greater fan page influence usually also indicates higher voter familiarity. These numbers can be obtained from social networking sites, such as Facebook and Twitter, through the web crawler unit 220 and existing statistical software.
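As a rough illustration of the Internet voice volume described above, the following sketch counts how often each candidate's name appears in a set of crawled posts. The function name `voice_volume` and the sample posts are hypothetical, and a real system would likely need name aliases and word segmentation, which are not covered here.

```python
def voice_volume(posts, candidate_names):
    """Count occurrences of each candidate name across all post texts."""
    counts = {name: 0 for name in candidate_names}
    for text in posts:
        for name in candidate_names:
            # str.count returns the number of non-overlapping occurrences
            counts[name] += text.count(name)
    return counts

# Hypothetical crawled forum posts
posts = [
    "Candidate A debated Candidate B last night.",
    "Candidate A leads in the latest poll.",
    "Candidate B answered questions; Candidate A did not attend.",
]
volume = voice_volume(posts, ["Candidate A", "Candidate B"])
```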

Regarding voter favorability, the sentiment analysis unit 260 analyzes the retrieved website data by sentiment analysis to generate the Internet word-of-mouth prediction data D2, in particular the voter favorability data. In one embodiment, as shown in the fourth figure, the sentiment analysis unit 260 has a vector conversion module 262, an emotion prediction module 264, and a predicted value generation module 266. The vector conversion module 262 converts the retrieved website data DOC (in particular text data) into a text vector V_DOC. The emotion prediction module 264 contains a positive emotion prediction model and a negative emotion prediction model. The text vector V_DOC is fed into the positive emotion prediction model and the negative emotion prediction model to generate a positive reliability data C_P and a negative reliability data C_N. The predicted value generation module 266 generates a sentiment value V_S from the positive reliability data C_P and the negative reliability data C_N, so that each text can be classified as positive, negative, or neutral. The sentiment value V_S can serve as the Internet word-of-mouth prediction data D2, in particular the voter favorability data.
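The last step above, turning the positive and negative reliability data into a sentiment value, might look like the following sketch. The margin threshold, the encoding of V_S as +1, 0, or -1, and the averaging into a favorability score are assumptions made for illustration; the patent does not state how module 266 combines C_P and C_N.

```python
def sentiment_value(c_pos, c_neg, margin=0.2):
    """Map (C_P, C_N) to V_S: +1 positive, -1 negative, 0 neutral.

    A text counts as neutral when neither score beats the other by `margin`.
    """
    diff = c_pos - c_neg
    if diff > margin:
        return 1
    if diff < -margin:
        return -1
    return 0

def favorability(score_pairs, margin=0.2):
    """Average sentiment value over all texts that mention one candidate."""
    values = [sentiment_value(p, n, margin) for p, n in score_pairs]
    return sum(values) / len(values)

# Hypothetical (C_P, C_N) pairs produced by the two emotion prediction models
pairs = [(0.9, 0.1), (0.2, 0.8), (0.5, 0.45), (0.8, 0.1)]
score = favorability(pairs)
```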

In addition to the text sentiment analysis described above, the Internet word-of-mouth prediction module 200 may also use data such as online positive and negative ratings, like-to-dislike ratios on social networking sites, and emoticons to generate the voter favorability data.

Regarding the online business performance data, in one embodiment, the online business performance data includes a community activity index, a fan adhesion index, and a post interaction index. The community activity index represents how actively netizens post and discuss a particular candidate on their own initiative. The fan adhesion index represents how actively fans interact with a particular candidate. The post interaction index represents how lively the interaction between fans and a particular candidate is.

The aforementioned data can be obtained through the web crawler unit 220 and the semantic analysis unit 240. For example, the web crawler unit 220 can retrieve text data from a particular discussion forum, such as main posts and replies. The format of a main post may include the article title, article content, article author, article link, publication time, number of likes, number of replies, number of shares, and so on; the format of a reply may include the reply content, reply author, link to the main post being replied to, reply time, and so on. The semantic analysis unit 240 can analyze the text data retrieved by the web crawler unit 220 to determine which candidates each text involves, and thereby learn each candidate's online business performance.

For example, through the web crawler unit 220 and the semantic analysis unit 240, it is possible to analyze, for a particular online discussion forum, how actively netizens post about a particular candidate (for example, based on the number of main posts in the forum that involve the candidate), how actively they interact (for example, based on the number of likes and shares on posts involving the candidate), and how lively the interaction is (for example, based on the number of replies to posts involving the candidate). The online business performance data can be obtained from these figures.
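The three measures in the example above can be sketched as simple aggregates over crawled main posts. The `MainPost` record, its field names, and the use of raw counts as index values are hypothetical; the patent names the underlying figures but not a formula.

```python
from dataclasses import dataclass

@dataclass
class MainPost:
    candidate: str  # candidate the semantic analysis step linked to this post
    likes: int
    shares: int
    replies: int

def performance_indices(posts, candidate):
    """Aggregate raw counts for one candidate into the three indices."""
    own = [p for p in posts if p.candidate == candidate]
    return {
        "community_activity": len(own),                          # main posts
        "fan_adhesion": sum(p.likes + p.shares for p in own),    # likes + shares
        "post_interaction": sum(p.replies for p in own),         # replies
    }

# Hypothetical crawled posts
posts = [
    MainPost("Candidate A", likes=120, shares=30, replies=45),
    MainPost("Candidate A", likes=80, shares=10, replies=20),
    MainPost("Candidate B", likes=200, shares=5, replies=15),
]
indices_a = performance_indices(posts, "Candidate A")
```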

The media poll data acquisition module 300 retrieves at least one media poll report to generate a media poll data D3. A media poll report is report data obtained by a credible media outlet that uses a sample survey method, for example by telephone, to directly survey voters' support for candidates. In one embodiment, as shown in the fifth figure, the media poll data acquisition module 300 obtains media poll reports P_M1, P_M2 from at least one media outlet (the figure shows two media outlets 40a and 40b producing two media poll reports P_M1 and P_M2 as an example), generates a media tendency data P_T and a release time data P_TL corresponding to each media poll report P_M1, P_M2, and adjusts the figures in the media poll reports P_M1, P_M2 according to the media tendency data P_T and the release time data P_TL to generate the media poll data D3. For example, if a media poll report leans toward a particular political party, the poll figures for the candidates recommended by that party in the report must be adjusted downward. If a media poll report was released more recently, its influence on the generated media poll data D3 should be increased. In one embodiment, the media tendency data can be derived from the historical media poll data of each media outlet and used to adjust the figures in each media poll report.
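One way to picture the adjustment above is the sketch below. It assumes a multiplicative discount for candidates a report is judged to favor (standing in for the media tendency data P_T) and an exponential recency weight with a chosen half-life (standing in for the release time data P_TL). Both formulas, the dictionary layout, and the name `combine_polls` are illustrative assumptions.

```python
def combine_polls(reports, half_life_days=7.0):
    """Blend poll reports into one support figure per candidate.

    Assumes every report covers the same set of candidates.
    """
    totals = {}
    weight_sum = 0.0
    for report in reports:
        # Newer reports get more influence; weight halves every half-life
        weight = 0.5 ** (report["days_old"] / half_life_days)
        for candidate, support in report["support"].items():
            # Discount figures for candidates the outlet is judged to favor
            discount = report["lean_discount"].get(candidate, 0.0)
            adjusted = support * (1.0 - discount)
            totals[candidate] = totals.get(candidate, 0.0) + weight * adjusted
        weight_sum += weight
    return {c: total / weight_sum for c, total in totals.items()}

# Hypothetical reports: the first outlet leans toward Candidate A's party
reports = [
    {"support": {"A": 50.0, "B": 40.0}, "lean_discount": {"A": 0.1}, "days_old": 0},
    {"support": {"A": 40.0, "B": 50.0}, "lean_discount": {}, "days_old": 7},
]
d3 = combine_polls(reports)
```

With these inputs the discounted, recency-weighted figures for A and B come out equal, which shows how a raw lead in a leaning poll can be neutralized by the adjustment.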

As shown in the sixth figure, the crowd prediction data acquisition module 400 retrieves at least one network statistical data P_N1, P_N2 to generate a crowd prediction data D4. The network statistical data P_N1 may be statistical data from a prediction market website 50, such as Future Event Exchange or Smart Exchange (also known as Taipei Political Economy Exchange). It may also be network statistical data P_N2 produced by members of the public expressing their views directly, for example through online voting. In general, the more participants and votes a source has, the greater its influence on the generated crowd prediction data D4.
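The statement that sources with more participants carry more influence can be sketched as a participant-weighted average. The data layout and the name `crowd_prediction` are hypothetical, and linear weighting by participant count is only one plausible reading of the text.

```python
def crowd_prediction(sources):
    """Participant-weighted average of each candidate's share across sources.

    Assumes every source reports a share for the same set of candidates.
    """
    total_participants = sum(s["participants"] for s in sources)
    result = {}
    for source in sources:
        weight = source["participants"] / total_participants
        for candidate, share in source["share"].items():
            result[candidate] = result.get(candidate, 0.0) + weight * share
    return result

# Hypothetical sources: a prediction market (P_N1) and an online vote (P_N2)
sources = [
    {"participants": 300, "share": {"A": 0.6, "B": 0.4}},
    {"participants": 100, "share": {"A": 0.4, "B": 0.6}},
]
d4 = crowd_prediction(sources)
```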

The weight calculation module 500 generates, based on a voter structure data D_NS, a weight data D5 corresponding to at least one of the candidate strength data D1, the Internet word-of-mouth prediction data D2, the media poll data D3, and the crowd prediction data D4. In one embodiment, the voter structure data may include age structure data, geographic data, gender structure data, population movement data, and other data that may affect voting tendencies.

In one embodiment, the weight calculation module 500 refers to historical voter structure data and the historical election data D_H to calculate, for each electoral district, the weight distribution that corresponds to the current voter structure data D_NS. In another embodiment, the weight calculation module 500 refers to historical voter structure data and previous election prediction data to calculate, for each electoral district, the weight distribution that corresponds to the current voter structure data D_NS.

In one embodiment, as shown in the seventh figure, the weight calculation module 500 has a weight generation unit 520 and a weight adjustment unit 540. The weight generation unit 520 calculates a corresponding initial estimated weight data D_W from the historical voter structure data D_HNS and the historical election data D_H. The weight adjustment unit 540 uses the voter structure data D_NS to distinguish a network-native voter group from a non-network-native voter group and applies different weights to the two groups to obtain the weight data D5.

For example, the voter structure data D_NS includes voter age structure data. The weight adjustment unit 540 may define voters younger than forty as network-native voters and voters aged forty or older as non-network-native voters. For non-network-native voters, the weight adjustment unit 540 increases the weights of the candidate strength data D1 and the media poll data D3. For network-native voters, the weight adjustment unit 540 increases the weights of the Internet word-of-mouth prediction data D2 and the crowd prediction data D4. However, the invention is not limited to this. In one embodiment, for non-network-native voters, the weight adjustment unit 540 may ignore the Internet word-of-mouth prediction data D2 and the crowd prediction data D4. For network-native voters, the weight adjustment unit 540 may ignore the candidate strength data D1 and the media poll data D3.
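The age-based adjustment above can be sketched as follows. The forty-year cutoff comes from the text. The boost factor, the renormalization, and the blending of the two groups' weights by their share of the electorate are assumptions added for illustration, as are the function names.

```python
NATIVE_AGE_LIMIT = 40  # from the text: voters under 40 are network-native

def group_weights(base, boosted, factor=2.0):
    """Boost the listed data sources, then renormalize so weights sum to 1."""
    raw = {k: v * (factor if k in boosted else 1.0) for k, v in base.items()}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}

def weight_data(base, ages, factor=2.0):
    """Blend per-group weights by each group's share of the electorate."""
    native_share = sum(1 for age in ages if age < NATIVE_AGE_LIMIT) / len(ages)
    native = group_weights(base, {"D2", "D4"}, factor)      # online data boosted
    non_native = group_weights(base, {"D1", "D3"}, factor)  # strength and polls boosted
    return {
        k: native_share * native[k] + (1.0 - native_share) * non_native[k]
        for k in base
    }

# Hypothetical initial estimated weights D_W and a toy electorate
base = {"D1": 0.25, "D2": 0.25, "D3": 0.25, "D4": 0.25}
d5 = weight_data(base, ages=[25, 39, 40, 55, 70])
```

Setting `factor` very high approximates the alternative embodiment in which each group ignores the other group's data sources.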

The election rate prediction module 600 adjusts the weights of the candidate strength data D1, the Internet word-of-mouth prediction data D2, the media poll data D3, and the crowd prediction data D4 according to the weight data D5 generated by the weight calculation module 500 to generate a candidate's predicted election rate D6. As shown in the eighth figure, in one embodiment, the election rate prediction module 600 includes an advantage value calculation unit 620 and an election rate calculation unit 640. The advantage value calculation unit 620 adjusts the weights of the candidate strength data D1, the Internet word-of-mouth prediction data D2, the media poll data D3, and the crowd prediction data D4 according to the weight data D5 to calculate a relative advantage value data D_A for each candidate. The election rate calculation unit 640 takes the characteristics of the electoral district into account, such as differences in the electoral system (for example, a single-member district system, in which one winner is elected per district, or a multi-member district system, in which several winners are elected per district), and uses the relative advantage value data D_A to calculate each candidate's probability of being elected, that is, the candidate's predicted election rate D6. For example, the election rate prediction module 600 may use machine learning or regression analysis to calculate each candidate's probability of being elected from the candidates' relative advantage values.
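A minimal sketch of the two units above, for a single-member district only. It assumes the relative advantage value is a weighted sum of the four data sources and that probabilities come from a softmax over the advantage values. The softmax and its temperature are stand-ins chosen for illustration; the text says only that machine learning or regression analysis may be used.

```python
import math

def relative_advantage(scores, weights):
    """Weighted sum of one candidate's D1..D4 scores using weight data D5."""
    return sum(weights[key] * scores[key] for key in weights)

def election_rates(advantages, temperature=0.1):
    """Softmax over advantage values; returns probabilities that sum to 1."""
    peak = max(advantages.values())  # subtract the max for numerical stability
    exps = {c: math.exp((a - peak) / temperature) for c, a in advantages.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

# Hypothetical weight data D5 and per-candidate scores scaled to [0, 1]
d5 = {"D1": 0.25, "D2": 0.25, "D3": 0.25, "D4": 0.25}
candidates = {
    "A": {"D1": 0.6, "D2": 0.6, "D3": 0.6, "D4": 0.6},
    "B": {"D1": 0.4, "D2": 0.4, "D3": 0.4, "D4": 0.4},
}
d_a = {name: relative_advantage(scores, d5) for name, scores in candidates.items()}
d6 = election_rates(d_a)
```

A multi-member district would need a different final step, for example estimating the chance of finishing among the top seats, which this sketch does not attempt.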

Please also refer to the ninth figure, which is a flowchart of an embodiment of the election prediction method of the present invention. As shown in the figure, the election prediction method includes the following steps.

In step S120, at least one historical election data is retrieved and analyzed to obtain the corresponding winning factor weights, from which a candidate strength data is generated. In step S130, website data of at least one website is retrieved and analyzed to generate an Internet word-of-mouth prediction data. In step S140, at least one media poll report is retrieved to generate a media poll data. In step S150, at least one network statistical data is retrieved to generate a crowd prediction data. The Internet word-of-mouth prediction data includes at least one of a voter familiarity data, a voter favorability data, and an online business performance data.

The data in steps S120 to S150 may be generated simultaneously or sequentially, and the order of these steps is not limited. In this embodiment, the data are generated simultaneously and provided to the subsequent steps for processing.

Subsequently, as shown in step S160, a weight data corresponding to at least one of the candidate strength data, the Internet word-of-mouth prediction data, the media poll data, and the crowd prediction data is generated based on a voter structure data.

Next, as shown in step S180, the weights of the candidate strength data, the Internet word-of-mouth prediction data, the media poll data, and the crowd prediction data are adjusted according to the weight data to generate a candidate's predicted election rate.

Compared with traditional election prediction methods that rely solely on sample surveys by polling agencies, the election prediction system provided by the present invention takes into account both media poll data and Internet crowd prediction data, uses machine learning to obtain candidate strength data, obtains Internet word-of-mouth data through semantic and sentiment analysis, and assigns an appropriate weight to each type of data. In this way, the system can reach voter groups that traditional election prediction methods cannot reach, and it also takes other possible influencing factors into consideration, thereby improving the accuracy of election prediction.

The above are only preferred embodiments of the present invention and do not limit the present invention in any way. Any equivalent replacement, modification, or other change in any form made by a person skilled in the art to the technical means and technical content disclosed by the present invention, without departing from the scope of the technical means of the present invention, does not depart from the content of the technical means of the present invention and still falls within the scope of protection of the present invention.

Claims

An election prediction system, comprising: a candidate winning factor analysis module for retrieving at least one historical election data and analyzing at least one corresponding winning factor weight to generate a candidate strength data; an Internet word-of-mouth prediction module for retrieving website data from at least one website and analyzing the website data to generate an Internet word-of-mouth prediction data, the Internet word-of-mouth prediction data including at least one of a voter familiarity data, a voter favorability data, and an online business performance data; a media poll data acquisition module for retrieving at least one media poll report to generate a media poll data; a crowd prediction data acquisition module for retrieving at least one network statistical data to generate a crowd prediction data; a weight calculation module for generating, based on a voter structure data, a weight data corresponding to at least one of the candidate strength data, the Internet word-of-mouth prediction data, the media poll data, and the crowd prediction data; and an election rate prediction module for adjusting the weights of the candidate strength data, the Internet word-of-mouth prediction data, the media poll data, and the crowd prediction data according to the weight data to generate a candidate's predicted election rate; wherein the voter structure data includes at least an age structure data, and the weight calculation module distinguishes a network-native voter group from a non-network-native voter group based on the age structure data and increases the weights of the candidate strength data and the media poll data for the non-network-native voter group; wherein the candidate winning factor analysis module generates the candidate strength data by regression analysis and/or machine learning; and wherein the Internet word-of-mouth prediction module analyzes the website data by semantic analysis and/or sentiment analysis to generate the voter familiarity data and the voter favorability data.

The election prediction system of claim 1, wherein the voter structure data includes at least a geographic data, a gender structure data, an age structure data, and a population movement data.

The election prediction system of claim 1, wherein the media poll data acquisition module obtains a media tendency data corresponding to each media poll report and adjusts the figures in the media poll report according to the media tendency data to generate the media poll data.

The election prediction system of claim 1, wherein the historical election data includes a candidate data, a political party recommendation data, an election experience data, and a government affairs experience data.