TWI385540B - Article content value-added service system and method of the same - Google Patents

Article content value-added service system and method of the same

Info

Publication number
TWI385540B
Authority
TW
Taiwan
Prior art keywords
article
keyword
module
word
article content
Prior art date
Application number
TW97108551A
Other languages
Chinese (zh)
Other versions
TW200921430A (en)
Inventor
Jen Diann Chiou
Tingchia Chen
Ming Yang Chiang
Erh Chan Yeh
Yungwei Cheng
Kuohua Chung
Original Assignee
Intumit Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intumit Inc
Priority to TW97108551A
Publication of TW200921430A
Application granted
Publication of TWI385540B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Description

Article content value-added service system and method thereof

The present invention relates to article content value-added services, and in particular to an article content value-added service system and method that can provide related wiki-style knowledge, related words, and advertisements.

When viewing an article, a typical reader can usually get only a rough idea of its main content from the title. Grasping the key points in advance, so as to read faster and understand more deeply, is not easy unless the keywords of the article are marked before the whole article is read. With the title and the keywords closely related to it, the reader can better grasp the main content of the article.

However, merely marking the keywords of an article is of limited help to the reader, especially when a keyword is a proper noun that the reader does not know. The reader must then use another tool to look up the proper noun. A system or method that solves these problems is therefore still needed to improve the speed and depth with which readers read articles.

The present invention provides an article content value-added service system and method. In one aspect, the article content value-added service system includes a database; a keyword extraction and marking module for extracting and marking keywords in the article content that are highly relevant to the article topic; a wiki-style knowledge retrieval module, coupled to the keyword extraction and marking module and the database, for searching the database for wiki-style knowledge matching the keywords; and an advertisement providing module, coupled to the keyword extraction and marking module and the database, for searching the database for advertisements matched to the keywords. The system further includes a related word feedback module, coupled to the keyword extraction and marking module and the database, for searching the database for related words associated with the keywords, and a multi-language translation module, coupled to the wiki-style knowledge retrieval module, for translating the wiki-style knowledge. The keyword extraction and marking module includes a content pre-processing module, a string matching module, and a topic relevance scoring module.

In another aspect, the article content value-added service method includes extracting and marking keywords in the article content that are highly relevant to the article topic; searching for wiki-style knowledge matching the keywords, for related words associated with the keywords, and for advertisements matched to the keywords; and displaying the wiki-style knowledge, related words, and advertisements in a window associated with the keyword. The method further includes translating the wiki-style knowledge in real time into the user's preferred language. The step of extracting and marking the keywords includes pre-processing the article content, comparing the pre-processed article content against a corpus to find the words that are present in the corpus, and calculating a relevance score between each such word and the article topic.

One advantage of the invention is that the article content value-added service system and method let the user obtain further information about the keywords in an article while reading it.

Another advantage of the invention is that the system and method let the user quickly grasp the key points and meaning of an article through the marked keywords.

A further advantage of the invention is that the system and method let the user see advertisements of interest while reading an article.

The invention is described in detail below with preferred embodiments and aspects. The description explains the structure and procedures of the invention and is illustrative only; it does not limit the scope of the claims. Accordingly, besides the preferred embodiments in this specification, the invention may also be widely practiced in other embodiments.

The invention discloses an article content value-added service system and method. The system may be installed in the operating system of a personal computer or mobile device, so that users of either can use it to read or browse value-added information. The article content may be that of a news article, a technical document, or any other type of digital document.

As shown in Figure 1, in a preferred embodiment of the invention, the article content value-added service system 10 includes a keyword extraction and marking module 101, a wiki-style knowledge retrieval module 102, a multi-language translation module 103, an advertisement providing module 104, a related word feedback module 105, and a database 106. In one embodiment, the system 10 may be coupled to a processor 107 of a mobile device or personal computer, as shown in Figure 1. In the mobile device or personal computer, the processor 107 is coupled to a memory 108, a display 109, a hard disk 110, an input/output interface 111, an audio input/output interface 112, and a pointing device 113. Accordingly, video data of the value-added information processed by the system 10 can be shown on the display 109, audio data of that information can be output through the audio input/output interface 112, and the system 10 may, while running, be held temporarily in the memory 108 or stored on the hard disk 110. In addition, the user can use the pointing device 113 to select objects in the system 10 in order to operate it.

In one embodiment, the keyword extraction and marking module 101 includes a content pre-processing module 101a, a string matching module 101b, and a topic relevance scoring module 101c, which together determine and mark the keywords. The content pre-processing module 101a pre-processes the article content being read by the user, for example by converting between full-width and half-width characters, capitalizing the first letter of English words, and splitting English words and separating them with underscores. The string matching module 101b compares the pre-processed article content against a previously built corpus, for example one downloaded from Wikipedia, using a string matching algorithm, in order to find the words in the article content that are present in the corpus. The string matching algorithm is a conventional Chinese string matching algorithm. The topic relevance weight module 101c calculates the relevance weight between each matched word and the article topic, and sorts the words by weight.
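The pre-processing and matching steps described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the patent names neither a width-conversion routine nor a specific matching algorithm, so Unicode NFKC normalization and forward maximum matching are assumptions chosen here as common stand-ins, and the function names `preprocess` and `match_corpus` are hypothetical.

```python
import re
import unicodedata


def preprocess(text: str) -> str:
    """Normalize width, then capitalize English words and join them with underscores."""
    # NFKC folds full-width Latin letters and digits to half-width forms
    # (assumption: the patent does not name a specific conversion routine).
    text = unicodedata.normalize("NFKC", text)

    def join_words(match: re.Match) -> str:
        words = match.group().split(" ")
        return "_".join(w[:1].upper() + w[1:] for w in words)

    # A run of space-separated ASCII words is treated as one English phrase.
    return re.sub(r"[A-Za-z]+(?: [A-Za-z]+)*", join_words, text)


def match_corpus(text: str, corpus: set[str]) -> list[str]:
    """Forward maximum matching: at each position take the longest corpus entry."""
    max_len = max(map(len, corpus), default=0)
    found = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in corpus:
                found.append(text[i:i + n])
                i += n
                break
        else:
            # No corpus entry starts here; move on by one character.
            i += 1
    return found
```

Joining capitalized English words with underscores resembles the form of Wikipedia page titles, which may be why the pre-processing produces that shape; the source does not state the reason.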

The parameters considered in calculating the relevance weight may include, but are not limited to, term frequency (TF), inverse document frequency (IDF), long-word priority, stop words, frequency of occurrence in the text, presence of an English abbreviation, and wiki word relevance. Wikipedia is an open writing project on the Internet: anyone may write new entries, or edit and revise existing ones, provided they meet the needs and norms of an encyclopedia. Each wiki word on Wikipedia has a page that introduces it, and in that introductory text the editors mark other wiki words that are strongly related to it and also exist on Wikipedia. The wiki word is thereby linked to those other wiki words, and this link relationship is the wiki word relevance referred to above. Wikipedia's public data includes a page link table that records these links. Notably, because the links from an introductory page to other wiki words are marked by editors, the linked words tend to be relevant to the topic of the wiki word being introduced.

In addition, the user can set a threshold score in the keyword extraction and marking module 101, so that the module marks as keywords the words whose weights, as calculated by the topic relevance weight module 101c, are higher than the threshold score, and deletes the words that do not reach it. In one embodiment of the invention, the relevance weight is calculated on the basis of Wikipedia's public data, using term frequency, inverse document frequency, and long-word priority as parameters. An example follows to illustrate how the relevance weight is calculated; those skilled in the art will appreciate that the example illustrates the invention and does not limit it. The formula is: term frequency (TF) × inverse document frequency (IDF) × (1 + (wiki word relevance) + (long word score × third weight))

where the wiki word relevance can be expressed as: (association count × first weight) + (contribution count × second weight)

In other words, the formula of the invention is: term frequency (TF) × inverse document frequency (IDF) × (1 + (association count × first weight) + (contribution count × second weight) + (long word score × third weight))
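The expanded formula can be written directly as a small function. The sketch below only transcribes the formula from the text; the function and parameter names are hypothetical, and the three weights are left as arguments because the text says the user sets them.

```python
def relevance_weight(
    tf: float,
    idf: float,
    association_count: float,
    contribution_count: float,
    long_word_score: float,
    first_weight: float,
    second_weight: float,
    third_weight: float,
) -> float:
    """TF x IDF x (1 + wiki word relevance + long word score x third weight)."""
    # Wiki word relevance, as defined in the text.
    wiki_relevance = (association_count * first_weight
                      + contribution_count * second_weight)
    return tf * idf * (1 + wiki_relevance + long_word_score * third_weight)
```

Because the bracketed factor starts at 1, a word with no wiki links and no long-word bonus still keeps its plain TF × IDF value.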

The values of the first, second, and third weights can be set by the user according to the type of article to be examined. Term frequency (TF) is the number of times a given term appears in an article. Inverse document frequency (IDF) is a measure of the general importance of a term, based on how frequently the term appears across all documents in the database. If a wiki word has a high term frequency in an article but a low document frequency across the whole database, that is, it appears in only some of the database documents, it yields a high TF × IDF weight. For example, if an article contains 100 terms in total and the term 「棒球」 ("baseball") appears 3 times, the term frequency of "baseball" in that article is 0.03 (= 3/100). One way to calculate document frequency (DF) is to count how many documents contain "baseball" and divide by the total number of documents in the collection. So if "baseball" appears in 1,000 documents and the collection holds 10,000,000 documents, its document frequency is 0.0001 (= 1,000/10,000,000). Finally, the TF × IDF score is obtained by dividing the term frequency by the document frequency. In this example, the TF × IDF score of "baseball" for the collection is 300 (= 0.03/0.0001).
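The baseball arithmetic can be checked with a short sketch. It follows the text in computing the score as term frequency divided by document frequency, with no logarithm (many other TF-IDF formulations apply a logarithm to the inverse document frequency). Exact fractions are used so that the result is not affected by floating-point rounding; the function names are hypothetical.

```python
from fractions import Fraction


def term_frequency(occurrences: int, total_terms: int) -> Fraction:
    """Share of the article's terms that are the given term."""
    return Fraction(occurrences, total_terms)


def document_frequency(docs_with_term: int, total_docs: int) -> Fraction:
    """Share of documents in the collection that contain the term."""
    return Fraction(docs_with_term, total_docs)


def tf_idf(tf: Fraction, df: Fraction) -> Fraction:
    """TF x IDF as described in the text: TF divided by DF."""
    return tf / df


tf = term_frequency(3, 100)                 # 3/100 = 0.03
df = document_frequency(1_000, 10_000_000)  # 1/10000 = 0.0001
score = tf_idf(tf, df)                      # 300
```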

In addition, Wikipedia's public data includes a page link table that records the relationships between wiki words. The association count is the number of associated words in the link table of a given wiki word. A contribution word is a wiki word that appears in the link table of another wiki word. For example, 「詹詠然」 (Chan Yung-jan) appears in the link table of 「莊佳容」 (Chuang Chia-jung), and 「莊佳容」 also appears in the link table of 「詹詠然」; if each contribution word is assumed to score 0.5, the contribution word scores of 「詹詠然」 and 「莊佳容」 are both 0.5. The long word score is a specific score given to a keyword longer than 3 characters once it has been judged to be an important keyword. For example, 「詹詠然」 and 「莊佳容」 are both 3 characters long, so each has a long word score of 1; 「洋基球場」 (Yankee Stadium) is 4 characters, so its long word score is 1.5; and 「衛視體育台」 (a satellite TV sports channel) is 5 characters, so its long word score is 2. In other words, in this embodiment 3 characters is the baseline, and each additional character raises the long word score by 0.5. From the weight values set by the user and the parameter values, the relevance weight of each word matched in the article can then be calculated, and the words sorted by weight.
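The long word score and the link-table counts in this paragraph can be sketched as below. The link table is modelled as a dictionary from each wiki word to the set of words its page links to; that representation, the function names, and the choice to return 0 for words shorter than the 3-character baseline are all assumptions, since the text gives scores only for words of 3 characters or more.

```python
def long_word_score(word: str, base_len: int = 3,
                    base: float = 1.0, step: float = 0.5) -> float:
    """1.0 at three characters, plus 0.5 for each additional character."""
    if len(word) < base_len:
        return 0.0  # assumption: the text does not score shorter words
    return base + step * (len(word) - base_len)


def association_count(word: str, links: dict[str, set[str]]) -> int:
    """Number of associated words in the word's own link table."""
    return len(links.get(word, set()))


def contribution_score(word: str, links: dict[str, set[str]],
                       per_word: float = 0.5) -> float:
    """per_word for every other wiki word whose link table contains this word."""
    contributors = sum(1 for other, targets in links.items()
                       if other != word and word in targets)
    return per_word * contributors


# The two-player example from the text: each page links to the other.
links = {"詹詠然": {"莊佳容"}, "莊佳容": {"詹詠然"}}
```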

The keyword extraction and marking module 101 is coupled to the wiki-style knowledge retrieval module 102, the advertisement providing module 104, and the related word feedback module 105, so that it can send each of the three the data for the keywords that score above the threshold. The wiki-style knowledge retrieval module 102, the advertisement providing module 104, and the related word feedback module 105 are each coupled to the database 106, in which each searches for the data it needs.

The wiki-style knowledge retrieval module 102 uses the keyword data to search the database 106 for wiki-style knowledge matching the keyword, so that the knowledge can be displayed in the window associated with that keyword. The wiki-style knowledge in the database 106 may be built in advance by downloading Wikipedia's public data over the Internet or a mobile phone network, and may also be updated periodically by connecting to Wikipedia over the Internet or a mobile phone network. The advertisement providing module 104 uses the keyword data to search the database 106 for advertisements matched to the keyword, so that the advertisement can be displayed below, above, or beside the wiki-style knowledge in the associated window. The advertisement data in the database 106 may be text, images, animation, or a combination of these. It may be supplied in advance by advertisers to an advertising service provider and then sent from the provider to the database 106 over the Internet or a mobile phone network, and it may also be updated periodically by connecting to the advertising service provider over the Internet or a mobile phone network. The advertiser can set a keyword for each advertisement, which is compared with the keyword data in order to match a suitable advertisement. The keyword data can thus be used to match advertisements likely to attract the user. Those skilled in the art will appreciate that after the user clicks, reads, or browses the advertisement, the user may receive a reward from the advertiser or the system owner.
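The advertisement matching described above, in which each advertisement carries an advertiser-set keyword that is compared with the article's keywords, can be sketched as a simple lookup. The `Ad` record, its fields, and `match_ads` are hypothetical; the text does not describe the database schema, and exact string equality is assumed as the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    keyword: str   # keyword set by the advertiser
    content: str   # text here; the source also allows images or animation


def match_ads(keywords: list[str], ads: list[Ad]) -> dict[str, list[Ad]]:
    """Map each article keyword to the ads whose advertiser keyword equals it."""
    by_keyword: dict[str, list[Ad]] = {}
    for ad in ads:
        by_keyword.setdefault(ad.keyword, []).append(ad)
    # Keywords without any matching ad are left out of the result.
    return {kw: by_keyword[kw] for kw in keywords if kw in by_keyword}
```

Indexing the advertisements by keyword once keeps each lookup constant-time, which matters if the associated window is filled in while the user is reading.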

The related word feedback module 105 uses the keyword data to search the database 106 for related words associated with the keyword, so that the related words can be displayed below, above, or beside the wiki-style knowledge in the associated window. Likewise, the related word data in the database 106 may be stored there in advance, and may also be updated periodically over the Internet or a mobile phone network. In one embodiment, the related words include fuzzy words. In one embodiment, as shown in Figure 1, the related word feedback module 105 may be coupled to the wiki-style knowledge retrieval module 102, so that it can also extract related words, through the module 102, from the wiki-style knowledge in the database 106 that matches the keyword. Because most of the terms in the wiki-style knowledge matching the keyword are close to the keyword, most of the extracted related words are close to it as well. In one embodiment, the wiki-style knowledge retrieval module 102 can use the related word data to search the database 106 for wiki-style knowledge matching the related word, so that this knowledge can be displayed in a window associated with the related word. The multi-language translation module 103 is coupled to the wiki-style knowledge retrieval module 102, so that it can translate in real time the wiki-style knowledge that the module 102 retrieves for a keyword or related word into the user's preferred language, such as Traditional Chinese, Simplified Chinese, English, Japanese, German, or any other language. In one embodiment, the wiki-style knowledge retrieval module 102 may also include a multi-language selection mode that lets the user choose a preferred language, so that the module 102 searches the database 106 for the version of the wiki-style knowledge in that language.
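The two language-handling embodiments above (looking up a stored version in the preferred language, or translating in real time) can be combined in one sketch. Combining them as a lookup with a translation fallback is an assumption made for illustration; the source presents them as separate embodiments. The `entries` layout, `fetch_knowledge`, and the `translate` callback are hypothetical, and no real translation API is implied.

```python
from typing import Callable, Optional

# Hypothetical layout: keyword -> {language code -> stored wiki-style text}
Entries = dict[str, dict[str, str]]
Translator = Callable[[str, str, str], str]  # (text, source_lang, target_lang)


def fetch_knowledge(keyword: str, preferred_lang: str,
                    entries: Entries, translate: Translator) -> Optional[str]:
    """Return the preferred-language version, translating only when it is absent."""
    versions = entries.get(keyword, {})
    if preferred_lang in versions:
        # Selection-mode embodiment: the stored version already exists.
        return versions[preferred_lang]
    if not versions:
        return None  # nothing stored for this keyword
    # Translation embodiment: fall back to the first stored version.
    source_lang, text = next(iter(versions.items()))
    return translate(text, source_lang, preferred_lang)
```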

The article content value-added service system 10 provided by the invention can therefore extract and mark the keywords in the article content that are highly relevant to the article topic, and provide beside each keyword an associated window containing value-added information such as wiki-style knowledge, related words, and related advertisements. A user reading the article on a mobile device or personal computer can browse or read the wiki-style knowledge related to the keyword and to its related words, or click a related advertisement and receive a reward. The system 10 thus lets the user obtain further information about the keywords in an article while reading it, quickly grasp the key points and meaning of the article through the marked keywords, and see advertisements of interest.

As described above, in another embodiment the invention discloses a storage medium readable by an electronic device (or computer), in which an article content value-added service program or software is stored. The program includes a keyword extraction and marking module for extracting and marking keywords in the article content that are highly relevant to the article topic; a wiki-style knowledge retrieval module, coupled to the keyword extraction and marking module and a database, for searching the database for wiki-style knowledge matching the keywords; and an advertisement providing module, coupled to the keyword extraction and marking module and the database, for searching the database for advertisements matched to the keywords. The storage medium further includes a related word feedback module, coupled to the keyword extraction and marking module and the database, for searching the database for related words associated with the keywords. The storage medium may also include a multi-language translation module, coupled to the wiki-style knowledge retrieval module, for translating the wiki-style knowledge. The keyword extraction and marking module includes a content pre-processing module or a string matching module. In one embodiment, the keyword extraction and marking module may also include a topic relevance weight module.

As shown in Figure 2, in the article content value-added service method of the invention, step 201 first extracts and marks the keywords in the article content that are highly relevant to the article topic. Step 202 then searches for wiki-style knowledge matching the keywords, for related words associated with the keywords, and for advertisements matched to the keywords. Step 203 then displays the wiki-style knowledge, related words, and advertisements in the window associated with the keyword. The method further includes step 204, which translates the wiki-style knowledge in real time into the user's preferred language. The step of extracting and marking the keywords includes step 201a, pre-processing the article content; step 201b, comparing the pre-processed article content against a corpus to find the words that are present in the corpus; and step 201c, calculating the relevance score between each such word and the article topic. In one embodiment, the step of extracting and marking the keywords further includes step 201d, sorting the words by relevance score, and step 201e, setting a threshold score and deleting the words whose relevance scores do not reach it. In one embodiment, if the article content is English, the step of pre-processing the article content includes converting between full-width and half-width characters, capitalizing the first letter of the English words in the article content, and splitting the English words and separating them with underscores.
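Steps 201d and 201e, sorting the matched words by relevance score and dropping those that do not reach the threshold, can be sketched as below. The function name is hypothetical, and treating a score exactly equal to the threshold as having reached it is an assumption; the text says only that words not reaching the threshold are deleted.

```python
def select_keywords(scores: dict[str, float],
                    threshold: float) -> list[tuple[str, float]]:
    """Sort words by descending relevance score, then keep those at or above the threshold."""
    # Step 201d: rank the matched words, highest score first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Step 201e: delete the words whose score does not reach the threshold.
    return [(word, score) for word, score in ranked if score >= threshold]
```

The surviving words are the ones the method would mark as keywords and pass on to the searches in step 202.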

The above describes preferred embodiments of the invention. Those skilled in the art will appreciate that it illustrates the invention and does not limit the scope of the patent rights claimed. The scope of patent protection is determined by the appended claims and their equivalents. Any change or refinement made by those skilled in the art without departing from the spirit or scope of this patent is an equivalent change or design completed within the spirit disclosed by the invention, and shall fall within the scope of the claims below.

10 ... article content value-added service system
101 ... keyword extraction and marking module
101a ... content pre-processing module
101b ... string matching module
101c ... topic relevance scoring module
102 ... wiki-style knowledge retrieval module
103 ... multi-language translation module
104 ... advertisement providing module
105 ... related word feedback module
106 ... database
107 ... processor
108 ... memory
109 ... display
110 ... hard disk
111 ... input/output interface
112 ... audio input/output interface
113 ... pointing device
201 ... step
201a ... step
201b ... step
201c ... step
201d ... step
201e ... step
202 ... step
203 ... step
204 ... step

The invention may be understood through the preferred embodiments and detailed description in the specification and through the accompanying drawings. Those skilled in the art will appreciate, however, that all preferred embodiments of the invention are illustrative and do not limit the scope of the claims. In the drawings: Figure 1 is a block diagram of the article content value-added service system according to the invention.

Figure 2 is a schematic diagram of the article content value-added service method according to the invention.

Claims (23)

一種文章內容加值服務系統,包含:一資料庫;一關鍵字擷取標示模組,用以擷取並標示文章內容中與文章主題關聯性高之關鍵字;一維基式知識擷取模組,其耦合至該關鍵字擷取標示模組以及該資料庫,用以於該資料庫中搜尋與該關鍵字相匹配之維基式知識;以及一廣告提供模組,其耦合至該關鍵字擷取標示模組以及該資料庫,用以於該資料庫中搜尋與該關鍵字相媒合之廣告。An article content value-added service system includes: a database; a keyword capture indicator module for capturing and marking keywords in the article content that are highly relevant to the article topic; a wiki-based knowledge capture module And coupled to the keyword capture indicator module and the database for searching the database for wiki-like knowledge matching the keyword; and an advertisement providing module coupled to the keyword 撷The tag module and the database are used to search for the ad that matches the keyword in the database. 如請求項1所述之文章內容加值服務系統,其中更包含一相關詞回饋模組,其耦合至該關鍵字擷取標示模組以及該資料庫,用以於該資料庫中搜尋與該關鍵字相關聯之相關詞。The article content value-added service system of claim 1, further comprising a related word feedback module coupled to the keyword capture indicator module and the database for searching and searching in the database The related words associated with the keyword. 如請求項1所述之文章內容加值服務系統,其中更包含一多國語言翻譯模組,其耦合至該維基式知識擷取模組,用以翻譯該維基式知識。The article content value-added service system of claim 1, further comprising a multi-language translation module coupled to the wiki-based knowledge capture module for translating the wiki-style knowledge. 如請求項1所述之文章內容加值服務系統,其中該關鍵字擷取標示模組包含一內容前處理模組。The article content value-added service system of claim 1, wherein the keyword capture indicator module comprises a content pre-processing module. 如請求項1所述之文章內容加值服務系統,其中該關鍵字擷取標示模組包含一字串比對模組。The article content value-added service system of claim 1, wherein the keyword capture indicator module comprises a string comparison module. 如請求項1所述之文章內容加值服務系統,其中該關鍵字擷取標示模組包含一主題關聯性權重模組。The article content value-added service system of claim 1, wherein the keyword capture indicator module comprises a topic relevance weight module. 
7. An article content value-added service method, comprising: extracting and marking a keyword in article content that is highly relevant to the topic of the article; searching for wiki-style knowledge matching the keyword, searching for related words associated with the keyword, and searching for an advertisement matched to the keyword; and displaying the wiki-style knowledge, the related words, and the advertisement in a window associated with the keyword.
8. The article content value-added service method of claim 7, further comprising translating the wiki-style knowledge in real time into a language preferred by the user.
9. The article content value-added service method of claim 7, wherein the step of extracting and marking the keyword that is highly relevant to the topic of the article comprises pre-processing the article content.
10. The article content value-added service method of claim 9, wherein the article content is English article content.
11. The article content value-added service method of claim 10, wherein the step of pre-processing the article content comprises: performing full-width/half-width conversion; converting the first letter of each English word in the English article content to uppercase; and splitting the English article content into English words and separating the words with underscores.
12. The article content value-added service method of claim 9, wherein the step of extracting and marking the keyword that is highly relevant to the topic of the article further comprises comparing the pre-processed article content against a corpus to find the words that are stored in the corpus.
13. The article content value-added service method of claim 12, wherein the step of extracting and marking the keyword that is highly relevant to the topic of the article further comprises calculating a relevance score between each word and the topic of the article.
14. The article content value-added service method of claim 13, wherein the step of extracting and marking the keyword that is highly relevant to the topic of the article further comprises sorting the words according to the relevance score.
15. The article content value-added service method of claim 13, wherein the step of extracting and marking the keyword that is highly relevant to the topic of the article further comprises setting a threshold score and deleting the words whose relevance score does not reach the threshold score.
16. The article content value-added service method of claim 13, wherein the parameters considered in calculating the relevance score between the word and the topic of the article comprise term frequency, inverse document frequency, long-word priority, stop words, frequency of occurrence in the text, presence of an English abbreviation, and wiki term relevance.
17. The article content value-added service method of claim 13, wherein the relevance score between the word and the topic of the article is calculated as: term frequency × inverse document frequency × (1 + (wiki term relevance) + (long-word score × weight)).
18. A storage medium readable by an electronic device and having an article content value-added service program stored therein, the program comprising: a keyword extraction and marking module for extracting and marking a keyword in article content that is highly relevant to the topic of the article; a wiki-style knowledge retrieval module, coupled to the keyword extraction and marking module and to a database, for searching the database for wiki-style knowledge matching the keyword; and an advertisement providing module, coupled to the keyword extraction and marking module and to the database, for searching the database for an advertisement matched to the keyword.
19. The storage medium of claim 18, further comprising a related-word feedback module, coupled to the keyword extraction and marking module and to the database, for searching the database for related words associated with the keyword.
20. The storage medium of claim 18, further comprising a multilingual translation module, coupled to the wiki-style knowledge retrieval module, for translating the wiki-style knowledge.
21. The storage medium of claim 18, wherein the keyword extraction and marking module comprises a content pre-processing module.
22. The storage medium of claim 18, wherein the keyword extraction and marking module comprises a string matching module.
23. The storage medium of claim 18, wherein the keyword extraction and marking module comprises a topic relevance weighting module.
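
The pre-processing steps of claim 11 and the scoring, thresholding, and sorting steps of claims 14, 15, and 17 can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the patented implementation: the names `Candidate`, `preprocess`, `relevance_score`, and `select_keywords`, the tokenisation rule, and every numeric value are hypothetical, and Unicode NFKC normalisation is used here as one possible way to perform full-width/half-width conversion.

```python
import re
import unicodedata
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one corpus-matched term (field names are assumptions)."""
    term: str
    tf: float               # term frequency
    idf: float              # inverse document frequency
    wiki_relevance: float   # wiki term relevance
    long_word_score: float  # long-word score


def preprocess(text: str) -> str:
    """Apply the three pre-processing steps listed in claim 11."""
    # NFKC normalisation maps full-width Latin letters and spaces to half-width.
    half_width = unicodedata.normalize("NFKC", text)
    # Assumed tokenisation: each run of ASCII letters or digits is one word.
    words = re.findall(r"[A-Za-z0-9]+", half_width)
    # Uppercase only the first letter of each word, then join with underscores.
    return "_".join(w[0].upper() + w[1:] for w in words)


def relevance_score(c: Candidate, weight: float) -> float:
    """Claim 17: TF x IDF x (1 + wiki term relevance + long-word score x weight)."""
    return c.tf * c.idf * (1 + c.wiki_relevance + c.long_word_score * weight)


def select_keywords(candidates, threshold: float, weight: float):
    """Score each term, drop terms below the threshold, sort the rest descending."""
    scored = [(c.term, relevance_score(c, weight)) for c in candidates]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # The first word is written with full-width letters (U+FF57 U+FF49 U+FF4B U+FF49).
    print(preprocess("\uff57\uff49\uff4b\uff49 based knowledge"))
    demo = [
        Candidate("page", 1, 1.0, 0.0, 0.0),
        Candidate("database", 2, 2.0, 0.25, 0.0),
        Candidate("search engine", 3, 2.0, 0.5, 1.0),
    ]
    print(select_keywords(demo, threshold=2.0, weight=0.5))
```

With these made-up inputs the term "page" scores 1.0 and is removed by the threshold of 2.0, while "search engine" (12.0) ranks ahead of "database" (5.0). Claim 16 lists further parameters (stop words, frequency of occurrence in the text, English abbreviations) that do not appear as separate terms in the formula of claim 17, so the sketch leaves them out.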
TW97108551A 2007-11-02 2008-03-11 Article content value-added service system and method of the same TWI385540B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW97108551A TWI385540B (en) 2007-11-02 2008-03-11 Article content value-added service system and method of the same

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
TW96141538 2007-11-02
TW97108551A TWI385540B (en) 2007-11-02 2008-03-11 Article content value-added service system and method of the same

Publications (2)

Publication Number Publication Date
TW200921430A (en) 2009-05-16
TWI385540B (en) 2013-02-11

Family

ID=44727858

Family Applications (1)

Application Number Title Priority Date Filing Date
TW97108551A TWI385540B (en) 2007-11-02 2008-03-11 Article content value-added service system and method of the same

Country Status (1)

Country Link
TW (1) TWI385540B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI423053B (en) * 2010-03-05 2014-01-11 Univ Nat Chi Nan Domain Interpretation Data Retrieval Method and Its System

Citations (3)

Publication number Priority date Publication date Assignee Title
US20060004734A1 (en) * 2004-05-21 2006-01-05 Peter Malkin Method, system, and article to provide data analysis or searching
TWI267005B (en) * 2002-11-21 2006-11-21 Ibm System, method, and computer readable medium for annotating a displayed received document without changing the received document content
KR100756382B1 (en) * 2006-04-26 2007-09-10 엔에이치엔(주) Method for accumulating user created contents and system thereof

Also Published As

Publication number Publication date
TW200921430A (en) 2009-05-16

Similar Documents

Publication Publication Date Title
JP5395461B2 (en) Information recommendation device, information recommendation method, and information recommendation program
JP5224868B2 (en) Information recommendation device and information recommendation method
JP5311378B2 (en) Feature word automatic learning system, content-linked advertisement distribution computer system, search-linked advertisement distribution computer system, text classification computer system, and computer programs and methods thereof
US8355997B2 (en) Method and system for developing a classification tool
US8001135B2 (en) Search support apparatus, computer program product, and search support system
US9201955B1 (en) Unambiguous noun identification
US9189562B2 (en) Apparatus, method and program product for classifying web browsing purposes
US20090144240A1 (en) Method and systems for using community bookmark data to supplement internet search results
US8812505B2 (en) Method for recommending best information in real time by appropriately obtaining gist of web page and user's preference
US20090254455A1 (en) System and method for virtual canvas generation, product catalog searching, and result presentation
KR101574277B1 (en) Providing content using stored query information
JP2011529600A (en) Method and apparatus for relating datasets by using semantic vector and keyword analysis
US8782049B2 (en) Keyword presenting device
JP6429382B2 (en) Content recommendation device and program
JP2010225115A (en) Device and method for recommending content
KR100393176B1 (en) Internet information searching system and method by document auto summation
JP2010044585A (en) Advertisement distribution device, advertisement distribution method and advertisement distribution control program
RU2683482C2 (en) Method of displaying relevant contextual information
KR101606758B1 (en) Issue data extracting method and system using relevant keyword
JP4879775B2 (en) Dictionary creation method
JP2007193697A (en) Information collection apparatus, information collection method and program
Gali et al. Extracting representative image from web page
TWI385540B (en) Article content value-added service system and method of the same
JP2010026996A (en) Tag attachment support method and its device, program, and recording medium
Mishra et al. VisualTextRank: Unsupervised Graph-based Content Extraction for Automating Ad Text to Image Search

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees