TWI697789B - Public opinion inquiry system and method - Google Patents

Info

Publication number
TWI697789B
Authority
TW
Taiwan
Prior art keywords
public opinion
articles
images
text
group
Prior art date
Application number
TW107119658A
Other languages
Chinese (zh)
Other versions
TW202001595A (en
Inventor
楊孟鑫
黃華泰
黃博威
Original Assignee
中華電信股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中華電信股份有限公司
Priority to TW107119658A
Publication of TW202001595A
Application granted
Publication of TWI697789B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A public opinion inquiry system and a public opinion inquiry method are provided. In some embodiments, an internet module for video capture settings sets the rules for searching primary articles about public opinion related to a keyword, and sets the rules for searching videos about public opinion related to the keyword. Then, a module of text integration and arrangement locates supplementary articles about public opinion based on the text in subtitle regions of the videos about public opinion, and a module for optimizing the index of public opinion combines the primary articles and supplementary articles about public opinion as a result of public opinion inquiry.

Description

輿情查詢系統及方法 Public opinion query system and method

本發明係關於一種輿情查詢技術,特別是一種藉由影像比對來找尋輿情文章的輿情查詢系統及方法。 The present invention relates to a public opinion query technology, in particular to a public opinion query system and method for searching public opinion articles through image comparison.

在傳統的輿情查詢系統中,通常使用者會先輸入欲查詢輿情文章的關鍵字(如:稅改制度、美牛進口等),再由查詢系統找尋資料庫中是否存在包含該關鍵字的輿情文章,然後將找到的輿情文章提供給使用者。 In a traditional public opinion query system, the user typically first enters a keyword for the public opinion articles to be searched (e.g., tax reform system, US beef imports). The query system then checks whether the database holds public opinion articles containing that keyword and returns any articles it finds to the user.

然而這樣的輿情查詢系統存在許多問題。首先,僅由文章內容是否包含該關鍵字的方式來找到的輿情文章,其文章數量通常不夠多,這對於想要深度了解特定主題(關鍵字)之輿情內容的使用者是遠遠不夠的。 However, such public opinion query system has many problems. First of all, the number of public opinion articles found only by whether the content of the article contains the keyword is usually not enough, which is far from enough for users who want to deeply understand the public opinion content of a specific topic (keyword).

其次,由於使用者通常僅以單一關鍵字在輿情查詢系統中進行查詢,而無法以窮舉的方式在輿情查詢系統中輸入所有與特定主題相關的關鍵字,因此在查詢時,系統無法提供完整、全方位的輿情文章。 Second, users usually query the system with a single keyword and cannot exhaustively enter every keyword related to a given topic, so the system cannot return a complete, comprehensive set of public opinion articles for a query.

舉例而言,使用者對於主題為稅改制度的輿情文章有興趣時,其在輿情查詢系統的搜尋欄位輸入「稅改制度」。然而,實際上與「稅改制度」相關的關鍵字可能有「稅務改革」、「稅務制度變更」、「稅改聯盟」、「稅務修法」等。此時,若僅輸入「稅改制度」的查詢,基本上是無法在一般輿情查詢系統中查到與「稅改制度」相關且完整的輿情文章。 For example, a user interested in public opinion articles on the tax reform system enters "稅改制度" (tax reform system) in the search field of the query system. In practice, however, keywords related to that topic may also include "稅務改革" (tax reform), "稅務制度變更" (tax system change), "稅改聯盟" (tax reform alliance), and "稅務修法" (tax law amendment). A query for "稅改制度" alone therefore generally cannot retrieve a complete set of related public opinion articles from an ordinary public opinion query system.

鑑於前述問題,著實有必要提供一有效的輿情查詢系統及方法,來改善使用者在使用輿情查詢系統時所面臨的難題。 In view of the aforementioned problems, it is really necessary to provide an effective public opinion query system and method to improve the difficulties that users face when using the public opinion query system.

基於先前技術所存在的問題,本發明揭示了一能提供較完整輿情文章的輿情查詢系統及方法。相較於先前技術,本發明之一實施例揭示了結合文字辨識模組、合併字幕模組、文字整合與排序模組及輿情指標優化模組等模組來搜尋、建立完整的輿情文章,以供使用者能查詢、瞭解較完整的輿情內容。 Based on the problems in the prior art, the present invention discloses a public opinion query system and method that can provide relatively complete public opinion articles. Compared with the prior art, an embodiment of the present invention discloses a combination of a text recognition module, a merged subtitle module, a text integration and sorting module, and a public opinion index optimization module to search for and create a complete public opinion article. For users to query and understand more complete public opinion content.

本發明之一實施例提供了一種輿情查詢系統,包含:一網路影片擷取設定模組,係用以設定一第一規則及一第二規則,其中,該第一規則係在複數輿情文章中搜尋與一關鍵字相關的第一組輿情文章的規則,而該第二規則係在複數影像中搜尋輿情影像的規則,該等輿情影像係為與該關鍵字相關的影像;一文字辨識模組,係將根據該第二規則所搜尋到的該等輿情影像進行文字辨識;一合併字幕模組,係根據該等輿情影像進行文字辨識的結果,以將該等輿情影像中的文字部分進行合併,俾作為該等輿情影像中的字幕區域;一文字整合與排序模組,係用以將該等字幕區域中的文字與該複數輿情文章進行相關性比對,以找出第二組輿情文章;及一輿情指標優化模組,係用以將該第一組輿情文章與該第二組輿情文章結合,以產生第三組輿情文章。 An embodiment of the present invention provides a public opinion query system comprising: a network video capture setting module for setting a first rule and a second rule, wherein the first rule is a rule for searching a plurality of public opinion articles for a first group of public opinion articles related to a keyword, and the second rule is a rule for searching a plurality of images for public opinion images, the public opinion images being images related to the keyword; a text recognition module that performs text recognition on the public opinion images found according to the second rule; a combined subtitle module that, based on the text recognition results, merges the text portions of the public opinion images to serve as the subtitle areas of those images; a text integration and sorting module that compares the text in the subtitle areas with the plurality of public opinion articles for relevance to find a second group of public opinion articles; and a public opinion index optimization module that combines the first group of public opinion articles with the second group to produce a third group of public opinion articles.

在另一實施例中,該複數輿情文章係由一輿情擷取模組從網路上所下載及/或儲存於一輿情文章資料庫中的輿情文章。 In another embodiment, the plural public opinion articles are public opinion articles downloaded from the Internet by a public opinion capturing module and/or stored in a public opinion article database.

在另一實施例中,該等輿情影像係由一影片下載模組從網路上所下載及/或儲存於一影像資料庫中的輿情影像。 In another embodiment, the public opinion images are public opinion images downloaded from the Internet by a video download module and/or stored in an image database.

在另一實施例中,該第一規則係將該複數輿情文章中包含該關鍵字之文章名稱或文章內容作為第一組輿情文章。 In another embodiment, the first rule is to use the article name or article content of the plural public opinion articles containing the keyword as the first group of public opinion articles.

在另一實施例中,該第二規則係將該複數影像中包含該關鍵字之檔案名稱作為該等輿情影像。 In another embodiment, the second rule is to use the file names of the plural images containing the keyword as the public opinion images.

在另一實施例中,該文字辨識模組係使用光學文字辨識(OCR)的方式來對於該等輿情影像進行文字辨識。 In another embodiment, the text recognition module uses optical text recognition (OCR) to perform text recognition on the public opinion images.

在另一實施例中,該合併字幕模組使用一跑馬燈判別方式來將該等輿情影像中的複數文字區域分成跑馬燈區域及主字幕區域,以將該等主字幕區域及/或該等跑馬燈區域作為該等輿情影像中的字幕區域。 In another embodiment, the combined subtitle module uses a marquee discriminating method to divide the plural text areas in the public opinion images into marquee regions and main subtitle regions, so that the main subtitle regions and/or the The marquee area is used as the subtitle area in the public opinion images.

在另一實施例中,該跑馬燈判別方式係由該合併字幕模組判斷該等輿情影像中的複數文字區域中之文字的內容及位置是否會隨著時間而改變。 In another embodiment, the marquee determination method is that the combined subtitle module determines whether the content and position of the text in the plural text areas in the public opinion images will change over time.

在另一實施例中,該文字整合與排序模組係進行以下操作,將該等字幕區域中的文字與該複數輿情文章進行相關性比對,以找出第二組輿情文章:該文字整合與排序模組以一數值化方式將該複數輿情文章及該等字幕區域中的文字轉換成數值化資料;該文字整合與排序模組利用餘弦夾角演算法,針對該複數輿情文章之數值化資料及該等字幕區域中的文字的數值化資料進行運算,以獲得該複數輿情文章與該等字幕區域中的文字的相似度分數;及該文字整合與排序模組在該複數輿情文章中,找出其相似度分數大於一預設值的輿情文章,以作為第二組輿情文章。 In another embodiment, the text integration and sorting module performs the following operations to compare the text in the subtitle areas with the plurality of public opinion articles for relevance and find the second group of public opinion articles: the module converts the plurality of public opinion articles and the text in the subtitle areas into numerical data; the module applies a cosine angle (cosine similarity) algorithm to the numerical data of the articles and of the subtitle text to obtain a similarity score between each article and the subtitle text; and the module selects, from the plurality of public opinion articles, those whose similarity score exceeds a preset value as the second group of public opinion articles.

在另一實施例中,該輿情指標優化模組係將該第一組輿情文章與該第二組輿情文章進行聯集以產生該第三組輿情文章。 In another embodiment, the public opinion index optimization module combines the first group of public opinion articles with the second group of public opinion articles to generate the third group of public opinion articles.

本發明之又一實施例提供了一種輿情查詢方法,包含以下步驟:(1)在複數輿情文章中搜尋與一關鍵字相關的第一組輿情文章;(2)在複數影像中搜尋輿情影像,其中,該等輿情影像係為與該關鍵字相關的影像;(3)針對該等輿情影像進行文字辨識,找出該等輿情影像中的字幕區域;(4)將該等字幕區域中的文字與該複數輿情文章進行相關性比對,以找出第二組輿情文章;及(5)將該第一組輿情文章與該第二組輿情文章結合,以產生第三組輿情文章。 Another embodiment of the present invention provides a public opinion query method, including the following steps: (1) searching for a first group of public opinion articles related to a keyword among plural public opinion articles; (2) searching for public opinion images among plural images, Among them, the public opinion images are images related to the keyword; (3) perform text recognition on the public opinion images to find the subtitle areas in the public opinion images; (4) the text in the subtitle areas Perform correlation comparison with the plural public opinion articles to find the second group of public opinion articles; and (5) combine the first group of public opinion articles with the second group of public opinion articles to generate a third group of public opinion articles.

在另一實施例中,該步驟(1)係在該複數輿情文章中找尋包含該關鍵字的輿情文章,以作為該第一組輿情文章。 In another embodiment, the step (1) is to find a public opinion article containing the keyword among the plural public opinion articles, as the first group of public opinion articles.

在另一實施例中,該步驟(2)係在該複數影像中找尋包含該關鍵字之檔案名稱的影像以作為該等輿情影像。 In another embodiment, step (2) searches the plurality of images for images whose file names contain the keyword and uses them as the public opinion images.

在另一實施例中,該步驟(3)係使用光學文字辨識(OCR)的方式來對於該等輿情影像進行文字辨識、找出該等輿情影像中的文字區域,以作為該等輿情影像中的字幕區域。 In another embodiment, the step (3) uses optical character recognition (OCR) to perform text recognition on the public opinion images, find the text areas in the public opinion images, and use them as the public opinion images. Subtitle area.

在另一實施例中,在該步驟(3)中,係使用一跑馬燈判別方式來將該等輿情影像中的複數文字區域分成跑馬燈區域及主字幕區域,以將該等主字幕區域及/或該等跑馬燈區域作為該等輿情影像中的字幕區域。 In another embodiment, in this step (3), a marquee discrimination method is used to divide the plural text areas in the public opinion images into marquee areas and main subtitle areas, so that the main subtitle areas and / Or the marquee area as the subtitle area in the public opinion images.

在另一實施例中,該跑馬燈判別方式係為判斷該等輿情影像中的複數文字區域中之文字的內容及位置是否會隨著時間而改變。 In another embodiment, the marquee determination method is to determine whether the content and position of the text in the plural text areas in the public opinion images will change over time.

在另一實施例中,該步驟(4)包含以下步驟:(4-1)以一數值化方式將該複數輿情文章及該等字幕區域中的文字轉換成數值化資料;(4-2)以餘弦夾角演算法,針對該複數輿情文章之數值化資料及該等字幕區域中的文字的數值化資料進行運算,來獲得該複數輿情文章與該等字幕區域中的文字的相似度分數;及(4-3)在該複數輿情文章中,找出其相似度分數大於一預設值的輿情文章,以作為第二組輿情文章。 In another embodiment, the step (4) includes the following steps: (4-1) convert the plural public opinion articles and the text in the subtitle areas into numerical data in a numerical manner; (4-2) Use the cosine angle algorithm to calculate the numerical data of the plural public opinion articles and the numerical data of the text in the subtitle areas to obtain the similarity scores between the plural public opinion articles and the text in the subtitle areas; and (4-3) In the plural public opinion articles, find public opinion articles whose similarity score is greater than a preset value, and use them as the second group of public opinion articles.
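Steps (4-1) to (4-3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the patent does not fix the numerical representation, so character-bigram counts are assumed here, and the names `char_bigrams`, `cosine_similarity`, and `second_group` are hypothetical.

```python
import math
from collections import Counter


def char_bigrams(text: str) -> Counter:
    """Step (4-1), assumed form: count overlapping character bigrams."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Step (4-2): cosine of the angle between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def second_group(articles: dict[str, str], subtitle_text: str, threshold: float) -> list[str]:
    """Step (4-3): ids of articles whose score exceeds the preset value."""
    subtitle_vector = char_bigrams(subtitle_text)
    return [
        article_id
        for article_id, body in articles.items()
        if cosine_similarity(char_bigrams(body), subtitle_vector) > threshold
    ]
```

The threshold plays the role of the preset value in step (4-3); a suitable value would have to be tuned on real articles.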

在另一實施例中,該步驟(5)係將該第一組輿情文章與該第二組輿情文章進行聯集,以產生該第三組輿情文章。 In another embodiment, the step (5) is to combine the first group of public opinion articles with the second group of public opinion articles to generate the third group of public opinion articles.
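The union in step (5) can be sketched as below, assuming each article is identified by a unique id (an assumption; the patent does not say how articles are keyed). The function name `union_articles` is hypothetical.

```python
def union_articles(first_group: list[str], second_group: list[str]) -> list[str]:
    """Union of two groups of article ids, keeping first-seen order and dropping duplicates."""
    seen: set[str] = set()
    third_group: list[str] = []
    for article_id in [*first_group, *second_group]:
        if article_id not in seen:
            seen.add(article_id)
            third_group.append(article_id)
    return third_group
```

Keeping first-seen order places the keyword matches ahead of the articles found through subtitle text; that ordering is a design choice of this sketch only.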

應理解,以上描述的標的可實施為電腦控制的設備、電腦程式、計算系統,或作為製品,諸如,電腦可讀取儲存媒體。 It should be understood that the subject matter described above can be implemented as a computer-controlled device, computer program, computing system, or as a product, such as a computer-readable storage medium.

為讓本發明之上述特徵和優點能更明顯易懂,下文特舉實施例,並配合所附圖式作詳細說明。在以下描述內容中將部分闡述本發明之額外特徵及優點,且此等特徵及優點將部分自所述描述內容顯而易見,或可藉由對本發明之實踐習得。本發明之特徵及優點借助於在申請專利範圍中特別指出的元件及組合來認識到並達到。應理解,前文一般描述與以下詳細描述兩者均僅為例示性及解釋性的,且不欲約束本發明所主張之範圍。 In order to make the above-mentioned features and advantages of the present invention more comprehensible, embodiments are specifically described below in conjunction with the accompanying drawings. In the following description, the additional features and advantages of the present invention will be partially explained, and these features and advantages will be partly obvious from the description, or can be learned by practicing the present invention. The features and advantages of the present invention are realized and achieved by means of the elements and combinations specifically pointed out in the scope of the patent application. It should be understood that the foregoing general description and the following detailed description are both illustrative and explanatory, and are not intended to limit the claimed scope of the present invention.

100‧‧‧影片資料庫 100‧‧‧Video Database

101‧‧‧網路影片擷取設定模組 101‧‧‧Network Video Capture Setting Module

102‧‧‧影片下載模組 102‧‧‧Video download module

103‧‧‧取樣設定模組 103‧‧‧Sampling setting module

104‧‧‧取樣切片模組 104‧‧‧Sampling and slice module

105‧‧‧影音文字查詢模組 105‧‧‧Video and text query module

106‧‧‧字幕區塊設定模組 106‧‧‧Subtitle block setting module

107‧‧‧文字辨識模組 107‧‧‧Text recognition module

108‧‧‧合併字幕模組 108‧‧‧Combined subtitle module

109‧‧‧輿情擷取模組 109‧‧‧Public opinion capture module

110‧‧‧文字整合與排序模組 110‧‧‧Text Integration and Sorting Module

111‧‧‧輿情指標優化模組 111‧‧‧Public Opinion Index Optimization Module

112‧‧‧輿情文章資料庫 112‧‧‧Public opinion article database

113‧‧‧輿情查詢系統 113‧‧‧Public Opinion Inquiry System

B1、B2、R1、R2‧‧‧字幕區域 B1, B2, R1, R2‧‧‧Subtitle area

第1圖所示係為根據本發明之一實施例的輿情查詢系統的示意架構圖;及第2圖至第4圖所示係為根據本發明之一實施例,切片後輿情影像之示意圖。 Figure 1 is a schematic architecture diagram of a public opinion query system according to an embodiment of the present invention; and Figures 2 to 4 are schematic diagrams of public opinion images after slicing according to an embodiment of the present invention.

以下藉由特定的具體實施形態說明本發明之實施方式,熟悉此技術之人士可由本說明書所揭示之內容輕易地了解本發明之其他優點與功效,亦可藉由其他不同的具體實施形態加以施行或應用。 The following describes the implementation of the present invention with specific specific embodiments. Those familiar with this technology can easily understand the other advantages and effects of the present invention from the contents disclosed in this specification, and can also be implemented by other different specific embodiments. Or apply.

本發明揭示了一種輿情查詢系統及方法,現在參閱第1圖,其圖示根據本發明之一實施例的輿情查詢系統的示意架構圖。輿情查詢系統113主要包含影片資料庫100、網路影片擷取設定模組101、影片下載模組102、取樣設定模組103、取樣切片模組104、影音文字查詢模組105、字幕區塊設定模組106、文字辨識模組107、合併字幕模組 108、輿情擷取模組109、文字整合與排序模組110、輿情指標優化模組111及/或輿情文章資料庫112。 The present invention discloses a public opinion query system and method. Now referring to FIG. 1, which illustrates a schematic architecture diagram of a public opinion query system according to an embodiment of the present invention. The public opinion query system 113 mainly includes a video database 100, a network video capture setting module 101, a video download module 102, a sampling setting module 103, a sampling slicing module 104, a video and audio text query module 105, and a subtitle block setting Module 106, text recognition module 107, combined subtitle module 108, public opinion capture module 109, text integration and sorting module 110, public opinion index optimization module 111 and/or public opinion article database 112.

如圖所示,影片資料庫100係為可儲存影像資料的資料庫,其中,影像資料包含但不限於圖像、影片、影音等視覺化或多媒體資料。相對地,輿情文章資料庫112係為儲存輿情文章的資料庫,其中,輿情文章的內容包含但不限於民眾對社會事件、政治選舉、教育制度改革等議題的看法及言論。 As shown in the figure, the video database 100 is a database that can store image data. The image data includes, but is not limited to, visual or multimedia data such as images, movies, and audio-visual data. In contrast, the public opinion article database 112 is a database for storing public opinion articles, where the content of the public opinion articles includes, but is not limited to, public opinion and speech on social events, political elections, and educational system reforms.

網路影片擷取設定模組101可供使用者設定搜尋與特定關鍵字(如:「電信」等)相關的輿情文章及與關鍵字相關的影像(簡稱:輿情影像)之規則。 The network video capture setting module 101 allows users to set rules for searching public opinion articles related to specific keywords (such as "telecommunications", etc.) and images related to keywords (referred to as public opinion images).

舉例而言,使用者可設定欲搜尋輿情文章之規格,係指搜尋其文章名稱或文章內容包含關鍵字的文章。此外,使用者亦可設定欲搜尋輿情影像之規則,係指搜尋其檔案名稱包含關鍵字的影像。 For example, the user can set the rule for searching public opinion articles so that it selects articles whose title or content contains the keyword. The user can likewise set the rule for searching public opinion images so that it selects images whose file name contains the keyword.
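The two example rules amount to simple substring filters. The sketch below assumes in-memory records; the `Article` type and the function names `first_rule` and `second_rule` are hypothetical and only illustrate the rules as described.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    body: str


def first_rule(articles: list[Article], keyword: str) -> list[Article]:
    """Keep articles whose title or content contains the keyword."""
    return [a for a in articles if keyword in a.title or keyword in a.body]


def second_rule(file_names: list[str], keyword: str) -> list[str]:
    """Keep images or videos whose file name contains the keyword."""
    return [name for name in file_names if keyword in name]
```

For instance, `second_rule(["電信新聞.mp4", "weather.mp4"], "電信")` keeps only the first file name.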

應注意,上述使用者在網路影片擷取設定模組101之規則設定僅為釋例性。實際上,使用者在網路影片擷取設定模組101之規則設定可視使用者需求而作增減或修改。 It should be noted that the user's rule setting in the web video capture setting module 101 is only illustrative. In fact, the rule setting of the user in the network video capture setting module 101 can be increased, decreased or modified according to user needs.

此外,網路影片擷取設定模組101設定好輿情影像的搜尋規則後,可直接自影片資料庫100搜尋輿情影像。或者,亦可由網路影片擷取設定模組101將關鍵字(搜尋規則)傳送給影片下載模組102,再由影片下載模組102根據關鍵字(搜尋規則)在網路上搜尋及下載輿情影像。 In addition, after the network video capture setting module 101 has set the search rules of the public opinion image, it can directly search the public opinion image from the video database 100. Alternatively, the network video capture setting module 101 can also send the keywords (search rules) to the video download module 102, and then the video download module 102 searches and downloads public opinion images on the Internet according to the keywords (search rules) .

舉例而言,影片下載模組102在收到關鍵字後,可直接在如youtube等影像搜尋引擎中以該關鍵字(如:「電信」)作為檔名找尋、下載輿情影像。 For example, after receiving a keyword, the video download module 102 can search a video search engine such as YouTube directly, using the keyword (e.g., "電信", telecommunications) as the file name, and download the matching public opinion images.

另一方面,網路影片擷取設定模組101設定好輿情文章的搜尋規則後,可直接從輿情文章資料庫112搜尋輿情文章。或者,亦可由網路影片擷取設定模組101將關鍵字(搜尋規則)傳送給輿情擷取模組109,再由輿情擷取模組109根據關鍵字(搜尋規則)在網路上搜尋及下載輿情文章。 On the other hand, after the network video capture setting module 101 has set the search rules of the public opinion articles, it can directly search the public opinion articles from the public opinion article database 112. Alternatively, the web video capture setting module 101 can send the keywords (search rules) to the public opinion capture module 109, and then the public opinion capture module 109 searches and downloads on the Internet according to the keywords (search rules) Public opinion articles.

舉例而言,輿情擷取模組109在收到關鍵字後,可直接在各新聞網站、論壇、時事討論平台、網路留言板等發表文章的網路平台,找尋其文章名稱或文章內容包含關鍵字的文章,以作為輿情文章的搜尋結果(以下簡稱主要輿情文章)。 For example, after receiving a keyword, the public opinion capture module 109 can search online platforms where articles are published (news websites, forums, current affairs discussion platforms, online message boards, and so on) for articles whose title or content contains the keyword, and use them as the search results for public opinion articles (hereinafter, the primary public opinion articles).

在搜尋及下載輿情影像後,取樣設定模組103可針對輿情影像進行影片切片(取樣)的頻率設定。舉例而言,取樣設定模組103可設定僅取出特定時段內(如:民國107年4月20日早上6點至下午3點、每個星期一的早上6點30分至6點40分、每段輿情影像之開頭5分鐘等)的輿情影像。應注意,前述頻率設定僅為釋例性,而非用以限制本發明之保護範圍。 After the public opinion images are searched for and downloaded, the sampling setting module 103 can set the frequency at which the public opinion images are sliced (sampled). For example, the sampling setting module 103 can be set to extract only the public opinion images within a specific period (e.g., 6 a.m. to 3 p.m. on April 20, 2018 (ROC year 107); 6:30 a.m. to 6:40 a.m. every Monday; or the first 5 minutes of each public opinion video). It should be noted that these frequency settings are only illustrative and are not intended to limit the protection scope of the present invention.

舉例而言,取樣設定模組103將影片下載模組102從youtube所下載關於關鍵字「電信」的輿情影像之影片切片(取樣)的頻率設定為從2017年11月24日某段時間內的輿情影像進行每秒15次的切片。 For example, the sampling setting module 103 sets the frequency of video slices (sampling) of public opinion images about the keyword "telecommunications" downloaded by the video download module 102 from youtube to a certain period of time from November 24, 2017 Public opinion images are sliced 15 times per second.

爾後,取樣切片模組104可根據所設定的輿情影像切片(取樣)頻率,來將輿情影像進行切片並記錄每個切片後影像片段的時間戳記,其中,切片後影片的時間戳記格式可為日期+時間+亂數+切片序。 Thereafter, the sampling and slicing module 104 can slice the public opinion images at the configured slicing (sampling) frequency and record a time stamp for each sliced image segment, where the time stamp format may be date + time + random number + slice sequence number.

舉例而言,切片後影片的時間戳記的格式可為「20171124080808-824791419114957289-01、20171124080808-824791419114957289-02、……20171124080808-10071954288746-15」。 For example, the format of the time stamp of the sliced video may be " 20171124080808-824791419114957289-01, 20171124080808-824791419114957289-02, ... 20171124080808-10071954288746-15 ".
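A label in the date + time + random number + slice sequence format could be produced as in the sketch below. It assumes the random number is supplied by the caller and shared by all slices of one second, and that the sequence number is zero-padded to two digits; the function name `slice_labels` is hypothetical.

```python
from datetime import datetime


def slice_labels(start: datetime, random_token: int, slices: int) -> list[str]:
    """Build one label per slice: YYYYMMDDhhmmss-<random number>-<two-digit sequence>."""
    prefix = start.strftime("%Y%m%d%H%M%S")
    return [f"{prefix}-{random_token}-{index:02d}" for index in range(1, slices + 1)]


# 15 slices for the second starting at 2017-11-24 08:08:08.
labels = slice_labels(datetime(2017, 11, 24, 8, 8, 8), 824791419114957289, 15)
```

With these inputs the first label is `20171124080808-824791419114957289-01`, matching the first value in the example above.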

對於切片後影片(影像)的內容可參考第2圖至第4圖,其圖示根據本發明之一實施例,切片後輿情影像之示意圖。 For the content of the sliced movie (image), please refer to FIGS. 2 to 4, which illustrate a schematic diagram of a public opinion image after sliced according to an embodiment of the present invention.

接著,可由字幕區塊設定模組106將切片後的輿情影像進行字幕區塊的設定、標示。 Then, the subtitle block setting module 106 can set and mark the subtitle block of the sliced public opinion image.

舉例而言,使用者可直接透過字幕區塊設定模組106在切片後的輿情影像中,設定、標示感興趣的字幕區塊(區域)。或者,亦可由文字辨識模組107將輿情影像中的文字進行辨識,爾後再由字幕區塊設定模組106針對文字辨識的結果,自動在輿情影像中設定、標示感興趣的字幕區域。 For example, the user can directly use the subtitle block setting module 106 to set and mark the subtitle block (region) of interest in the sliced public opinion image. Alternatively, the text recognition module 107 can also recognize the text in the public opinion image, and then the subtitle block setting module 106 automatically sets and marks the subtitle area of interest in the public opinion image according to the result of the text recognition.

舉例而言,字幕區塊設定模組106可根據時間戳記,針對第2圖至第4圖中所示切片後影片(影像)中所辨識出的文字加以組合,並針對切片後影片進行字幕區域的框定,例如:底部的字幕區域(B1、B2)及右部的字幕區域(R1、R2)。 For example, the subtitle block setting module 106 can use the time stamps to combine the text recognized in the sliced videos (images) shown in Figures 2 to 4 and frame the subtitle areas of the sliced video, for example the subtitle areas at the bottom (B1, B2) and the subtitle areas on the right (R1, R2).

在一實施例中,文字辨識模組107可使用光學文字辨識(OCR)的方式來對於該等輿情影像進行文字辨識。 In one embodiment, the text recognition module 107 can use optical character recognition (OCR) to perform text recognition on the public opinion images.

舉例而言,針對第2圖至第4圖之切片後影片,文字辨識模組107可針對圖式中R1、R2、B1、B2字幕區域中的文字進行辨識,其辨識結果如下: For example, for the sliced video in Figures 2 to 4, the text recognition module 107 can recognize the text in subtitle areas R1, R2, B1, and B2 of the drawings, with the following results (the suffix is the slice number; the readings are reproduced with their recognition errors):

R1_1: 上街頭!反核怒!勞工怨 (Take to the streets! Anti-nuclear anger! Workers' grievances)
R1_2: 上街頭!又核怒!勞土怨 (same caption, with 反 read as 又 and 工 read as 土)
R1_3: 土街頭!反核怒!勞工怨 (same caption, with 上 read as 土)
R2_1: 政府下個大地雷預言 (Prediction of the government's next big landmine)
R2_2: 政府下個太地雷預言 (same caption, with 大 read as 太)
R2_3: 正又府下個大地雷預言 (same caption, with 政 read as 正又)
B1_1: 自己出去成立的子公司 (The subsidiary that was spun off on its own)
B1_2: 賺了非常多錢,成為母公司金雞母 (made a great deal of money and became the parent company's cash cow)
B1_3: 但員工薪水調幅卻跟賺的錢不成比例 (but employee pay raises are out of proportion to the money earned)
B2_1: 環團上街頭,反核總動員 (Environmental groups take to the streets in an anti-nuclear mobilization)
B2_2: 嗆低薪!反派遣!勞動節兩萬人怒圍勞動部 (Protesting low pay! Against dispatch labor! 20,000 people angrily surround the Ministry of Labor on Labor Day)
B2_3: 嗆低薪!反派遣!勞動節兩萬人怒圍勞動部 (same as B2_2)

接著,合併字幕模組108可根據切片後影片文字辨識的結果,將輿情影像中文字部分(文字區域)進行合併,以產生輿情影像中的字幕區域。 Then, the subtitle merging module 108 can merge the text part (text area) in the public opinion image according to the result of the text recognition of the video after slicing to generate the subtitle area in the public opinion image.

在一實施例中,可使用公式 [Figure 107119658-A0101-12-0010-22] 來代表合併字幕模組108所產生的輿情影像中之字幕區域,其中,x代表特定時間的切割點,Cx代表特定時間的切割點下所產生輿情影像中的字幕區域,而符號U代表聯集。 In one embodiment, a formula (an image in the original, Figure 107119658-A0101-12-0010-22) can represent the subtitle areas in the public opinion images produced by the combined subtitle module 108, where x denotes a slicing point at a specific time, C_x denotes the subtitle area in the public opinion image produced at that slicing point, and the symbol U denotes union.

以第2圖至第4圖中的切片影片為例,其輿情影像中的字幕區域Cx等於{自己出去成立的子公司 賺了非常多錢 成為母公司金雞母 但員工薪水調幅卻跟賺的錢不成比例|環團上街頭,反核總動員 嗆低薪!反派遣!勞動節兩萬人怒圍勞動部 嗆低薪!反派遣!勞動節兩萬人怒圍勞動部|上街頭!反核怒!勞工怨 上街頭!又核怒!勞土怨 土街頭!反核怒!勞工怨|政府下個大地雷預言 政府下個太地雷預言 正又府下個大地雷預言}。 Taking the sliced video in Figures 2 to 4 as an example, the subtitle area C_x of the public opinion image is the set of all recognized strings, grouped by area and separated by "|": the three B1 strings, the three B2 strings, the three R1 strings, and the three R2 strings, still including the duplicates and the recognition errors.

在一實施例中,合併字幕模組108可使用主字幕分析方式或是跑馬燈判別方式來將輿情影像中的文字區域分成跑馬燈區域及主字幕區域,以將主字幕區域及/或跑馬燈區域作為輿情影像中的字幕區域。 In one embodiment, the combined subtitle module 108 can use the main subtitle analysis method or the marquee discrimination method to divide the text area in the public opinion image into a marquee area and a main subtitle area, so as to divide the main subtitle area and/or the marquee area. The area is used as the subtitle area in the public opinion image.

在一實施例中,主字幕分析方式可為:在輿情影像中的一文字區域中,若某個字串在該文字區域中停留時間越短、變換字幕頻率越高,則該文字區域係為輿情影像的主字幕區域的可能性就越大。 In one embodiment, the main subtitle analysis method may be as follows: within a text area of a public opinion image, the shorter a string stays in that area and the more often its subtitles change, the more likely the area is the main subtitle area of the public opinion image.

以前述文字區域(R1、R2、B1、B2)為例,因為R1、R2中出現3次的字串內容都相同,因此R1、R2可能為輿情影像的主字幕區域之可能性較低。 Taking the aforementioned text area (R1, R2, B1, B2) as an example, because the content of the string that appears three times in R1, R2 is the same, it is less likely that R1, R2 may be the main subtitle area of the public opinion image.

相對地,B1、B2中出現3次的字串內容不盡相同,因此B1、B2可能為輿情影像的主字幕區域之可能性相對較高。 In contrast, the content of the string that appears three times in B1 and B2 is not the same, so the possibility that B1 and B2 may be the main subtitle area of the public opinion image is relatively high.

在合併字幕模組108判斷B1、B2區塊較可能為主字幕區域後,可接著使用公式Mx來在B1、B2區塊中找出最可能為主字幕區域者,並用符號BM來表示主字幕區域。Mx公式為Mx=Max(Main_Text(Bi)|i=1 to 2)(亦即,在B1、B2區塊中,取其為主字幕區域可能性最高者,以作為主字幕區域)。 After the combined subtitle module 108 determines that blocks B1 and B2 are the more likely main subtitle areas, it can use the formula M_x to find which of B1 and B2 is most likely the main subtitle area, and denote that area by the symbol BM. The formula is M_x = Max(Main_Text(B_i) | i = 1 to 2); that is, of blocks B1 and B2, the one most likely to be the main subtitle area is taken as the main subtitle area.

此時,可計算得出BM=B1,亦即,在前述文字區域中,最有可能為主字幕之區域為B1={自己出去成立的子公司 賺了非常多錢 成為母公司金雞母 但員工薪水調幅卻跟賺的錢不成比例}。 At this time, it can be calculated that BM=B1, that is, in the aforementioned text area, the area most likely to be the main subtitle is B1={The subsidiary that I established by myself has made a lot of money to become the parent company’s Golden Rooster but employees The salary adjustment is not proportional to the money earned}.
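The patent names the scoring function Main_Text but does not define it. The sketch below assumes one plausible scoring: count how often the caption in an area changes between consecutive slices, treating readings that differ only by a few misrecognized characters as the same caption. The similarity test uses Python's `difflib.SequenceMatcher`, and the 0.6 cutoff and the function names are assumptions of this sketch.

```python
from difflib import SequenceMatcher


def same_caption(a: str, b: str, cutoff: float = 0.6) -> bool:
    """Treat two OCR readings as the same caption if they are similar enough."""
    return SequenceMatcher(None, a, b).ratio() >= cutoff


def change_count(readings: list[str]) -> int:
    """Assumed Main_Text score: number of caption changes between consecutive slices."""
    return sum(1 for prev, cur in zip(readings, readings[1:]) if not same_caption(prev, cur))


def main_subtitle_area(areas: dict[str, list[str]]) -> str:
    """Return the area whose caption changes most often."""
    return max(areas, key=lambda name: change_count(areas[name]))


areas = {
    "R1": ["上街頭!反核怒!勞工怨", "上街頭!又核怒!勞土怨", "土街頭!反核怒!勞工怨"],
    "R2": ["政府下個大地雷預言", "政府下個太地雷預言", "正又府下個大地雷預言"],
    "B1": ["自己出去成立的子公司", "賺了非常多錢,成為母公司金雞母", "但員工薪水調幅卻跟賺的錢不成比例"],
    "B2": ["環團上街頭,反核總動員", "嗆低薪!反派遣!勞動節兩萬人怒圍勞動部", "嗆低薪!反派遣!勞動節兩萬人怒圍勞動部"],
}
```

On the readings from the example, B1 changes twice, B2 once, and R1 and R2 not at all, so `main_subtitle_area(areas)` selects B1, in line with BM=B1 above.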

接著,合併字幕模組108除了B1區域視為主字幕區域外,亦可將其餘區域(即,B2、R1、R2)視為係非主字幕區域(如:跑馬燈區域),並針對非主字幕區域進行錯別字修正及重複字結合來產生另一集合。爾後,再以聯集方式將主字幕區域之文字集合、及修正後非主字幕區域之文字加以結合。 Then, in addition to the B1 area as the main subtitle area, the merged subtitle module 108 can also treat the remaining areas (ie, B2, R1, R2) as non-main subtitle areas (such as the marquee area), and target non-main subtitle areas. In the subtitle area, typos are corrected and repeated words are combined to generate another set. After that, the text collection in the main subtitle area and the text in the non-main subtitle area after correction are combined in a united manner.
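The patent does not say how typo correction and duplicate merging are carried out. One simple possibility, shown here purely as an assumption, is to group readings that are near-duplicates and keep from each group the reading most similar to the others. The names `merge_readings` and `similarity` and the 0.6 cutoff are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def merge_readings(readings: list[str], cutoff: float = 0.6) -> list[str]:
    """Group near-duplicate OCR readings, then keep one representative per group."""
    groups: list[list[str]] = []
    for reading in readings:
        for group in groups:
            # Compare against the first member of each existing group.
            if similarity(group[0], reading) >= cutoff:
                group.append(reading)
                break
        else:
            groups.append([reading])
    merged = []
    for group in groups:
        # The representative is the reading closest, in total, to the rest of its group.
        merged.append(max(group, key=lambda c: sum(similarity(c, other) for other in group)))
    return merged


r1 = ["上街頭!反核怒!勞工怨", "上街頭!又核怒!勞土怨", "土街頭!反核怒!勞工怨"]
r2 = ["政府下個大地雷預言", "政府下個太地雷預言", "正又府下個大地雷預言"]
b2 = ["環團上街頭,反核總動員", "嗆低薪!反派遣!勞動節兩萬人怒圍勞動部", "嗆低薪!反派遣!勞動節兩萬人怒圍勞動部"]
```

On these readings the sketch keeps one string for R1, one for R2, and two for B2. It can only pick the cleanest reading that was actually observed; it cannot repair a character that every reading got wrong.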

舉例而言,前述B1、B2、R1、R2區域的Cx在結合後可更新為 [Figure 107119658-A0101-12-0012-23] ={自己出去成立的子公司 賺了非常多錢 成為母公司金雞母 但員工薪水調幅卻跟賺的錢不成比例;環團上街頭,反核總動員 嗆低薪!反派遣!勞動節兩萬人怒圍勞動部|上街頭!反核怒!勞工怨|政府下個大地雷預言}。 For example, after this combination the C_x of areas B1, B2, R1, and R2 can be updated to the set shown above (the left-hand symbol is an image in the original, Figure 107119658-A0101-12-0012-23): the B1 main subtitle text, followed by the B2, R1, and R2 text with duplicates merged and recognition errors corrected.

另外,在一實施例中,跑馬燈判別方式係指合併字幕模組108判斷輿情影像的文字區域中之文字的內容及位置是否會隨著時間而改變。 In addition, in one embodiment, the marquee determination method means that the combined subtitle module 108 determines whether the content and position of the text in a text area of the public opinion image change over time.

舉例而言,在輿情影像中的一文字區域中,若某個字串在該文字區域中不斷位移(即,位置不斷改變),但字串內容卻沒改變,那麼該文字區域可能為跑馬燈區域的可能性便很高。 For example, if a string in a text area of a public opinion image keeps shifting (that is, its position keeps changing) while its content stays the same, that text area is very likely a marquee area.
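The check described in this example can be sketched as below. It assumes each slice yields an observation of the form (horizontal position in pixels, recognized text) for the text area; the tuple layout and the function name `is_marquee` are assumptions of this sketch.

```python
def is_marquee(observations: list[tuple[int, str]]) -> bool:
    """Flag a text area as a likely marquee: position changes while the text stays the same."""
    positions = {x for x, _ in observations}
    texts = {text for _, text in observations}
    return len(positions) > 1 and len(texts) == 1


scrolling = [(300, "政府下個大地雷預言"), (280, "政府下個大地雷預言"), (260, "政府下個大地雷預言")]
static = [(40, "自己出去成立的子公司"), (40, "賺了非常多錢,成為母公司金雞母")]
```

Here `is_marquee(scrolling)` is true and `is_marquee(static)` is false. In practice the text comparison would need to tolerate OCR errors rather than require exact equality.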

After determining the marquee regions and the main subtitle regions in the public opinion images, the combined subtitle module 108 may use the main subtitle regions and/or the marquee regions as the subtitle regions of the public opinion images.

The text integration and sorting module 110 may compare the text in the subtitle regions of the public opinion images for relevance against (i) public opinion articles downloaded by the public opinion capture module 109, (ii) public opinion articles stored in the public opinion article database 112, or (iii) both the public opinion articles downloaded by the public opinion capture module 109 and those stored in the public opinion article database 112 (the public opinion articles of (i) to (iii) are referred to below as candidate public opinion articles), so as to find more public opinion articles that meet the user's needs.

In one embodiment, the text integration and sorting module 110 may first convert the candidate public opinion articles and the text in the subtitle regions of the public opinion images into numerical data.

For example, take the text 「中華電公司」 appearing in a public opinion article and the text 「公司中華電」 appearing in a subtitle region. Suppose (中, 華, 電, 公, 司, 中華, 華電, 電公, 公司, 司中) is used as the basis for converting phrases into numerical form (here, a 10-dimensional vectorization), and suppose that if the text of a given dimension (e.g., 「中華」, 「華電」, 「司中」) appears in a phrase, the value of the phrase in that dimension is 1; otherwise, it is 0.

Thus, the phrase 「中華電公司」 is converted to (1,1,1,1,1,1,1,1,1,0), and the phrase 「公司中華電」 is converted to (1,1,1,1,1,1,1,1,1,1).
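The 0/1 vectorization described above can be sketched as follows. This is a minimal illustration, assuming plain substring matching against the ten basis tokens; `BASIS` and `vectorize` are hypothetical names. Under literal substring matching the bigram 「電公」 does not occur in 「公司中華電」, so this sketch yields 0 in that dimension rather than the 1 given in the example.

```python
# Basis tokens from the example: five single characters and five bigrams.
BASIS = ["中", "華", "電", "公", "司", "中華", "華電", "電公", "公司", "司中"]


def vectorize(phrase, basis=BASIS):
    """Return a 0/1 vector: 1 where the basis token occurs in the phrase."""
    return [1 if token in phrase else 0 for token in basis]


article_vector = vectorize("中華電公司")
subtitle_vector = vectorize("公司中華電")
```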

Then, the text integration and sorting module 110 applies a correlation comparison algorithm (e.g., the cosine-angle algorithm) to the numerical data of the candidate public opinion articles and of the text in the subtitle regions of the public opinion images, to compute a similarity score between each candidate public opinion article and the subtitle text.

Taking the aforementioned phrases 「中華電公司」 and 「公司中華電」 as an example, the cosine-angle value of their numerical vectors is (1*1+1*1+1*1+1*1+1*1+1*1+1*1+1*1+1*1+0*1)/10=0.9, so the computed similarity score is 0.9.
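The calculation in the example above can be sketched as follows, using the two vectors exactly as stated in the example. The example divides the dot product by the number of dimensions (10); the textbook cosine measure instead divides by the product of the two vector norms, which for these vectors gives roughly 0.949. Both variants are shown, and the function names are illustrative rather than taken from the source.

```python
import math


def dot(a, b):
    """Dot product of two equal-length numeric vectors."""
    return sum(x * y for x, y in zip(a, b))


def dimension_normalized_score(a, b):
    """Dot product divided by the number of dimensions, as in the worked example."""
    return dot(a, b) / len(a)


def cosine_similarity(a, b):
    """Standard cosine: dot product divided by the product of Euclidean norms."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


article_vector = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
subtitle_vector = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
example_score = dimension_normalized_score(article_vector, subtitle_vector)
standard_score = cosine_similarity(article_vector, subtitle_vector)
```

Either value exceeds example thresholds such as 0.8 or 0.7, so the choice of normalization does not change the outcome for this pair.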

Next, among the candidate public opinion articles, the text integration and sorting module 110 finds the articles whose similarity score with the text in the subtitle regions of the public opinion images is greater than a preset value (e.g., 0.8 or 0.7) and uses them as supplementary public opinion articles.
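The selection step can be sketched as follows. This is a minimal illustration, assuming the similarity scores have already been computed and are supplied as (article, score) pairs; `select_supplementary` is a hypothetical name.

```python
def select_supplementary(scored_articles, threshold=0.8):
    """Keep candidate articles whose similarity score is strictly above the threshold."""
    return [article for article, score in scored_articles if score > threshold]
```

The comparison is strict because the description says "greater than" the preset value.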

The public opinion index optimization module 111 may combine the primary public opinion articles obtained directly by keyword search with the supplementary public opinion articles produced by the text integration and sorting module 110, to generate the public opinion articles finally presented to the user.

In one embodiment, the public opinion index optimization module 111 takes the union of the primary public opinion articles and the supplementary public opinion articles to generate the public opinion articles finally presented to the user.
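The union can be sketched as follows. This is a minimal illustration, assuming articles are identified by hashable IDs; `combine_articles` is a hypothetical name, and listing primary articles first is a choice of this sketch rather than something the description requires.

```python
def combine_articles(primary_ids, supplementary_ids):
    """Union of two article-ID lists, primary first, without duplicates."""
    return list(dict.fromkeys(list(primary_ids) + list(supplementary_ids)))
```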

In addition, a cumulative algorithm or an averaging algorithm may be used to evaluate the quantity and quality of the public opinion articles generated by the public opinion index optimization module 111.

For example, the cumulative algorithm may be as follows. The score of the public opinion articles presented to the user after the primary and supplementary public opinion articles are combined may be set as
Figure 107119658-A0101-12-0015-14
where the score of the primary public opinion articles is M×P(hits), and the score of the supplementary public opinion articles is
Figure 107119658-A0101-12-0015-25
Here, M is the score of a single public opinion article (e.g., the similarity score computed above); P is the number of public opinion articles found by the keyword search; r|s denotes a public opinion image text group found by the keyword search, which carries a different weight r depending on its position s; the summed value for the subtitles of one video is
Figure 107119658-A0101-12-0015-28
and Q(hits) denotes the public opinion image text groups found by the keyword search.

The averaging algorithm may be as follows. The score of the public opinion articles presented to the user after the primary and supplementary public opinion articles are combined may be set as
Figure 107119658-A0101-12-0015-3
where the score of the primary public opinion articles is
Figure 107119658-A0101-12-0015-18
Here, M is the score of a single public opinion article; P is the number of public opinion articles found by the keyword search; base is the divisor (e.g., the number of items); r|s denotes a public opinion image text group found by the keyword search, which carries a different weight r depending on its position s; and the summed value for the subtitles of one video is
Figure 107119658-A0101-12-0015-29
This summed value is multiplied by Q(hits), which denotes the public opinion image text groups found by the keyword search, added to M×P(hits), and then divided by the number of public opinion articles base|P and the number of videos base|Q.

Take the cumulative algorithm as an example, without limitation. Assuming each primary public opinion article scores 1 and the keyword search finds 100 of them, the total score of the primary public opinion articles is 1 × 100 = 100. For the supplementary public opinion articles, suppose 20 items hit B1=1 (i.e., the similarity score of region B1 is 1 point), 10 items hit L1=0.5 (i.e., the similarity score of region L1 is 0.5 points), 10 items hit R1=0.2, and 10 items hit T=0.1. The score of the public opinion articles finally presented to the user is then 100+20+5+2+1=128 points.
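The arithmetic of this example can be sketched as follows. This is a minimal illustration, assuming the supplementary score is the sum over regions of the region weight times the number of hits in that region, which is inferred from the numbers in the example rather than transcribed from the formula images; `cumulative_score` and its parameters are hypothetical names.

```python
def cumulative_score(article_score, article_hits, region_hits):
    """region_hits maps a region name to (weight, number_of_hits)."""
    primary = article_score * article_hits
    supplementary = sum(weight * hits for weight, hits in region_hits.values())
    return primary + supplementary


# Values from the worked example: region -> (weight, hits).
example_regions = {"B1": (1.0, 20), "L1": (0.5, 10), "R1": (0.2, 10), "T": (0.1, 10)}
total = cumulative_score(1, 100, example_regions)
```

With these inputs the primary part contributes 100 and the supplementary part contributes 28, matching the 128 points in the example.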

In this example, searching for primary public opinion articles by keyword alone yields a score of 100 points, whereas searching with the public opinion inquiry system 113 yields a final score of 128 points. This shows that the public opinion inquiry system can find a more complete and more varied set of public opinion articles than a general public opinion article search system.

Finally, the audiovisual text query module 105 allows the user to query the public opinion images and public opinion articles related to the keyword.

The above embodiments merely illustrate the principles, features, and effects of the present invention and are not intended to limit its implementable scope. Anyone skilled in the art may modify and alter the above embodiments without departing from the spirit and scope of the present invention. Any equivalent changes and modifications made using the disclosure of the present invention shall still be covered by the claims. Accordingly, the scope of protection of the present invention shall be as set forth in the claims.

100‧‧‧Video database

101‧‧‧Network video capture setting module

102‧‧‧Video download module

103‧‧‧Sampling setting module

104‧‧‧Sampling and slicing module

105‧‧‧Audiovisual text query module

106‧‧‧Subtitle block setting module

107‧‧‧Text recognition module

108‧‧‧Combined subtitle module

109‧‧‧Public opinion capture module

110‧‧‧Text integration and sorting module

111‧‧‧Public opinion index optimization module

112‧‧‧Public opinion article database

113‧‧‧Public opinion inquiry system

Claims (18)

1. A public opinion inquiry system, comprising: a network video capture setting module configured to set a first rule and a second rule, wherein the first rule is a rule for searching a plurality of public opinion articles for a first group of public opinion articles related to a keyword, and the second rule is a rule for searching a plurality of images for public opinion images, the public opinion images being images related to the keyword; a text recognition module configured to perform text recognition on the public opinion images found according to the second rule; a combined subtitle module configured to merge the text portions of the public opinion images according to the results of the text recognition, so as to serve as subtitle regions of the public opinion images, wherein the combined subtitle module uses a marquee determination method to divide a plurality of text regions in the public opinion images into marquee regions and main subtitle regions, and uses the main subtitle regions and/or the marquee regions as the subtitle regions of the public opinion images; a text integration and sorting module configured to perform a correlation comparison between the text in the subtitle regions and the plurality of public opinion articles to find a second group of public opinion articles; and a public opinion index optimization module configured to combine the first group of public opinion articles with the second group of public opinion articles to generate a third group of public opinion articles.
2. The public opinion inquiry system of claim 1, wherein the plurality of public opinion articles are public opinion articles downloaded from the network by a public opinion capture module and/or stored in a public opinion article database.
3. The public opinion inquiry system of claim 1, wherein the public opinion images are public opinion images downloaded from the network by a video download module and/or stored in an image database.
4. The public opinion inquiry system of claim 1, wherein the first rule takes, as the first group of public opinion articles, those of the plurality of public opinion articles whose article title or article content contains the keyword.
5. The public opinion inquiry system of claim 1, wherein the second rule takes, as the public opinion images, those of the plurality of images whose file name contains the keyword.
6. The public opinion inquiry system of claim 1, wherein the text recognition module uses optical character recognition (OCR) to perform text recognition on the public opinion images.
7. The public opinion inquiry system of claim 1, wherein the public opinion index optimization module takes the union of the first group of public opinion articles and the second group of public opinion articles to generate the third group of public opinion articles.
8. The public opinion inquiry system of claim 1, wherein, in the marquee determination method, the combined subtitle module determines whether the content and position of the text in the plurality of text regions in the public opinion images change over time.
9. A public opinion inquiry system, comprising: a network video capture setting module configured to set a first rule and a second rule, wherein the first rule is a rule for searching a plurality of public opinion articles for a first group of public opinion articles related to a keyword, and the second rule is a rule for searching a plurality of images for public opinion images, the public opinion images being images related to the keyword; a text recognition module configured to perform text recognition on the public opinion images found according to the second rule; a combined subtitle module configured to merge the text portions of the public opinion images according to the results of the text recognition, so as to serve as subtitle regions of the public opinion images; a text integration and sorting module configured to perform a correlation comparison between the text in the subtitle regions and the plurality of public opinion articles to find a second group of public opinion articles; and a public opinion index optimization module configured to combine the first group of public opinion articles with the second group of public opinion articles to generate a third group of public opinion articles; wherein the text integration and sorting module performs the following operations to perform the correlation comparison between the text in the subtitle regions and the plurality of public opinion articles and find the second group of public opinion articles: the text integration and sorting module converts the plurality of public opinion articles and the text in the subtitle regions into numerical data by a numerical conversion method; the text integration and sorting module applies a cosine-angle algorithm to the numerical data of the plurality of public opinion articles and the numerical data of the text in the subtitle regions to obtain similarity scores between the plurality of public opinion articles and the text in the subtitle regions; and the text integration and sorting module finds, among the plurality of public opinion articles, the public opinion articles whose similarity scores are greater than a preset value, as the second group of public opinion articles.
10. The public opinion inquiry system of claim 9, wherein the public opinion index optimization module takes the union of the first group of public opinion articles and the second group of public opinion articles to generate the third group of public opinion articles.
11. A public opinion inquiry method, comprising the following steps: (1) searching a plurality of public opinion articles for a first group of public opinion articles related to a keyword; (2) searching a plurality of images for public opinion images, wherein the public opinion images are images related to the keyword; (3) performing text recognition on the public opinion images to find subtitle regions in the public opinion images, wherein a marquee determination method is used to divide a plurality of text regions in the public opinion images into marquee regions and main subtitle regions, and the main subtitle regions and/or the marquee regions are used as the subtitle regions of the public opinion images; (4) performing a correlation comparison between the text in the subtitle regions and the plurality of public opinion articles to find a second group of public opinion articles; and (5) combining the first group of public opinion articles with the second group of public opinion articles to generate a third group of public opinion articles.
12. The public opinion inquiry method of claim 11, wherein step (1) finds, among the plurality of public opinion articles, the public opinion articles containing the keyword, as the first group of public opinion articles.
13. The public opinion inquiry method of claim 11, wherein step (2) finds, among the plurality of images, the images whose file name contains the keyword, as the public opinion images.
14. The public opinion inquiry method of claim 11, wherein step (3) uses optical character recognition (OCR) to perform text recognition on the public opinion images and find the text regions in the public opinion images, as the subtitle regions of the public opinion images.
15. The public opinion inquiry method of claim 11, wherein step (5) takes the union of the first group of public opinion articles and the second group of public opinion articles to generate the third group of public opinion articles.
16. The public opinion inquiry method of claim 11, wherein the marquee determination method determines whether the content and position of the text in the plurality of text regions in the public opinion images change over time.
17. A public opinion inquiry method, comprising the following steps: (1) searching a plurality of public opinion articles for a first group of public opinion articles related to a keyword; (2) searching a plurality of images for public opinion images, wherein the public opinion images are images related to the keyword; (3) performing text recognition on the public opinion images to find subtitle regions in the public opinion images; (4) performing a correlation comparison between the text in the subtitle regions and the plurality of public opinion articles to find a second group of public opinion articles; and (5) combining the first group of public opinion articles with the second group of public opinion articles to generate a third group of public opinion articles; wherein step (4) comprises the following steps: (4-1) converting the plurality of public opinion articles and the text in the subtitle regions into numerical data by a numerical conversion method; (4-2) applying a cosine-angle algorithm to the numerical data of the plurality of public opinion articles and the numerical data of the text in the subtitle regions to obtain similarity scores between the plurality of public opinion articles and the text in the subtitle regions; and (4-3) finding, among the plurality of public opinion articles, the public opinion articles whose similarity scores are greater than a preset value, as the second group of public opinion articles.
18. The public opinion inquiry method of claim 17, wherein step (5) takes the union of the first group of public opinion articles and the second group of public opinion articles to generate the third group of public opinion articles.
TW107119658A 2018-06-07 2018-06-07 Public opinion inquiry system and method TWI697789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW107119658A TWI697789B (en) 2018-06-07 2018-06-07 Public opinion inquiry system and method

Publications (2)

Publication Number Publication Date
TW202001595A TW202001595A (en) 2020-01-01
TWI697789B true TWI697789B (en) 2020-07-01

Family

ID=69941577

Family Applications (1)

Application Number Title Priority Date Filing Date
TW107119658A TWI697789B (en) 2018-06-07 2018-06-07 Public opinion inquiry system and method

Country Status (1)

Country Link
TW (1) TWI697789B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200519912A (en) * 2003-10-31 2005-06-16 Samsung Electronics Co Ltd Storage medium storing meta information for enhanced search and subtitle information, and reproducing apparatus
CN101887445A (en) * 2009-05-12 2010-11-17 大相科技股份有限公司 Dynamic image processing method, network server and added value processing method
TW201220099A (en) * 2010-11-05 2012-05-16 Microsoft Corp Multi-modal approach to search query input
TWI412277B (en) * 2009-08-10 2013-10-11 Univ Nat Cheng Kung Video summarization method based on mining the story-structure and semantic relations among concept entities
