TW201513013A - Method of digging product evaluation words in electronic articles and system thereof - Google Patents


Info

Publication number
TW201513013A
TW201513013A TW102134816A TW102134816A TW201513013A TW 201513013 A TW201513013 A TW 201513013A TW 102134816 A TW102134816 A TW 102134816A TW 102134816 A TW102134816 A TW 102134816A TW 201513013 A TW201513013 A TW 201513013A
Authority
TW
Taiwan
Prior art keywords
word
mouth
new
positive
module
Application number
TW102134816A
Other languages
Chinese (zh)
Other versions
TWI501174B (en)
Inventor
Jing-Yi Li
wei-jun Wu
Jia-Jie Lu
Original Assignee
Telexpress Corp
Application filed by Telexpress Corp
Priority to TW102134816A
Publication of TW201513013A
Application granted
Publication of TWI501174B


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and system for mining product evaluation words in electronic articles are disclosed. Articles containing word-of-mouth (consumer opinion) are selected from general test data, and a general prediction model is established. Industry-domain test data are predicted with the general prediction model, and new positive and negative word-of-mouth terms are identified and mined. The existing and new positive and negative word-of-mouth terms are accumulated, new word-of-mouth narrative sentences are extracted, and a new prediction model is established. The industry-domain test data are then predicted again, and the cycle of building a new prediction model and mining new positive and negative word-of-mouth terms and narrative sentences is repeated. Only when no further new positive or negative word-of-mouth terms are discovered are all accumulated terms imported into a general vocabulary database for use as a shared vocabulary.

Description

Product evaluation word mining method and system for electronic articles

The present invention relates to a method and system for mining product evaluation words in electronic articles, and in particular to one that discovers product evaluation words and builds a general vocabulary database, so that when product evaluations in electronic articles are analyzed online, public opinion on a specific product can be filtered out faster and more accurately on the basis of those evaluation words.

The Internet has become the main channel for disseminating product information. Before a purchase, consumers go online to check and compare prices; beyond price, a larger share of their searching is for other users' experiences with, and evaluations of, the product they intend to buy. Some consumers also go online after a purchase to look for technical help or to post negative feedback. If the vendor can offer the necessary assistance or an explanation at these moments, it not only reduces user dissatisfaction but also strengthens consumer confidence in the product, growing its customer base.

Some vendors therefore screen online articles to gauge public opinion on specific products. However, different fields and industries define and use some evaluation terms differently, and some industries (domains) even have evaluation terms of their own. Collecting these terms takes domain experts a great deal of manual effort, while self-defined general-purpose term lists are rarely complete enough to serve every industry.

A better solution, therefore, is to mine product evaluation terms so as to gradually build a continuously updated general vocabulary database that can be used across domains. As articles from different domains are tested repeatedly, new general terms can keep being added, so the database keeps growing and learning until it is applicable to a wide range of fields and industries.

The present invention provides a method and system for mining product evaluation words in electronic articles that can discover product evaluation terms and build a general vocabulary database, so that when product evaluations in electronic articles are analyzed online, public opinion on a specific product can be filtered out faster and more accurately on the basis of those terms.

The above object is achieved by the following method for mining product evaluation words in electronic articles:
1. Select a number of articles containing word-of-mouth from general test data, extract a set of positive and negative word-of-mouth terms and a set of word-of-mouth narrative sentences, and use these terms and sentences to configure and build a general prediction model.
2. Apply the general prediction model to industry-domain test data, and identify and mine new positive and negative word-of-mouth terms specific to that industry.
3. Accumulate the existing and new positive and negative word-of-mouth terms; use the accumulated terms to extract new word-of-mouth narrative sentences from the articles containing word-of-mouth; then, from the accumulated terms and the new narrative sentences, set the rules for labeling text and build a new prediction model.
4. Apply the new prediction model to the industry-domain test data again, and again identify and mine new positive and negative word-of-mouth terms and narrative sentences. Repeat the cycle of building a new prediction model and mining new terms and narrative sentences; once no new positive or negative word-of-mouth terms are discovered, end the mining process.
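The four steps amount to a bootstrapping loop that stops when a round finds no new terms. The Python sketch below illustrates only that control flow under stated assumptions: `mine_evaluation_terms`, `toy_extract` and `toy_train` are hypothetical names, and the toy cue-word "model" is a stand-in (the patent trains a CRF) that exists purely so the example runs.

```python
import re
from typing import Callable, List, Set, Tuple

Predictor = Callable[[str], Set[str]]


def mine_evaluation_terms(
    seed_terms: Set[str],
    review_articles: List[str],
    domain_articles: List[str],
    extract_sentences: Callable[[Set[str], List[str]], List[str]],
    train_model: Callable[[Set[str], List[str]], Predictor],
    max_rounds: int = 20,
) -> Tuple[Set[str], List[int]]:
    """Repeat train -> predict -> accumulate until a round yields no new terms."""
    terms = set(seed_terms)        # accumulated positive/negative word-of-mouth terms
    new_per_round: List[int] = []  # number of new terms found in each round
    for _ in range(max_rounds):    # safety cap in case the loop never converges
        # Re-extract narrative sentences from the word-of-mouth articles
        sentences = extract_sentences(terms, review_articles)
        predict = train_model(terms, sentences)
        found: Set[str] = set()
        for article in domain_articles:
            found |= predict(article)
        new_terms = found - terms
        if not new_terms:          # stop: nothing new was discovered
            break
        new_per_round.append(len(new_terms))
        terms |= new_terms
    return terms, new_per_round


# --- Hypothetical toy stand-ins so the loop can run (not a CRF) ---
def _words(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def toy_extract(terms: Set[str], articles: List[str]) -> List[str]:
    # Keep sentences that contain at least one known term
    sentences = [s for a in articles for s in re.split(r"[.!?]", a)]
    return [s for s in sentences if terms & set(_words(s))]


def toy_train(terms: Set[str], sentences: List[str]) -> Predictor:
    # "Learn" cue words that directly precede a known term
    cues = {w[i - 1] for w in map(_words, sentences)
            for i in range(1, len(w)) if w[i] in terms}

    def predict(article: str) -> Set[str]:
        # Flag any word that follows a learned cue word
        w = _words(article)
        return {w[i] for i in range(1, len(w)) if w[i - 1] in cues}

    return predict


terms, rounds = mine_evaluation_terms(
    {"good"},
    ["The milk is very good. Delivery was slow."],
    ["This formula is very gentle.", "Baby found it very tasty."],
    toy_extract,
    toy_train,
)
print(sorted(terms), rounds)  # ['gentle', 'good', 'tasty'] [2]
```

Passing the extractor and trainer in as functions keeps the stopping rule (no new terms) separate from whatever model is actually used.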

More specifically, after the mining process ends, all accumulated positive and negative word-of-mouth terms are imported into a general vocabulary database for use as a shared vocabulary.

More specifically, the articles are web articles.

The system of the present invention for mining product evaluation words in electronic articles comprises a test data input module, a general prediction model building module, a proprietary vocabulary mining module, a vocabulary accumulation module, a narrative sentence extraction module, and a prediction model training module. The test data input module receives test data, which includes at least general test data and industry-domain test data.

The general prediction model building module comprises: a word-of-mouth article selection module, connected to the test data input module, which selects articles containing word-of-mouth from the general test data; a word-of-mouth term selection module, connected to the word-of-mouth article selection module, which compiles positive and negative word-of-mouth terms from those articles; a narrative sentence extraction module, connected to the word-of-mouth article selection module and the word-of-mouth term selection module, which uses the positive and negative word-of-mouth terms to extract word-of-mouth narrative sentences from the articles; and a general prediction model training module, connected to the word-of-mouth term selection module and the narrative sentence extraction module, which sets the rules for labeling text from the positive and negative word-of-mouth terms and the narrative sentences to build a general prediction model.

The proprietary vocabulary mining module is connected to the test data input module and the general prediction model training module; it applies the general prediction model to the industry-domain test data and identifies and extracts new positive and negative word-of-mouth terms for that industry. The vocabulary accumulation module is connected to the proprietary vocabulary mining module and the word-of-mouth term selection module, and accumulates the existing and new positive and negative word-of-mouth terms. The narrative sentence extraction module is connected to the word-of-mouth article selection module and the vocabulary accumulation module, and uses the accumulated terms to extract new word-of-mouth narrative sentences from the articles containing word-of-mouth. The prediction model training module is connected to the vocabulary accumulation module, the narrative sentence extraction module, and the proprietary vocabulary mining module, and sets the rules for labeling text from the accumulated terms and the new narrative sentences to build a new prediction model. The proprietary vocabulary mining module then applies the new prediction model to the industry-domain test data and repeatedly identifies and extracts new positive and negative word-of-mouth terms for the industry.

More specifically, the system further comprises a general vocabulary database connected to the vocabulary accumulation module, which holds shared positive and negative word-of-mouth terms. When the proprietary vocabulary mining module determines that there are no new positive or negative word-of-mouth terms, the vocabulary accumulation module imports the accumulated terms into the general vocabulary database.

More specifically, the articles are web articles.

[Present invention]

11‧‧‧Test data input module

12‧‧‧General prediction model building module

121‧‧‧Word-of-mouth article selection module

122‧‧‧Word-of-mouth term selection module

123‧‧‧Narrative sentence extraction module

124‧‧‧General prediction model training module

13‧‧‧Proprietary vocabulary mining module

14‧‧‧Vocabulary accumulation module

15‧‧‧Narrative sentence extraction module

16‧‧‧Prediction model training module

17‧‧‧General vocabulary database

18‧‧‧Domain vocabulary database

Fig. 1 is a flowchart of the product evaluation word mining method and system for electronic articles of the present invention.

Fig. 2 is a schematic diagram of the overall architecture of the product evaluation word mining method and system for electronic articles of the present invention.

The foregoing and other technical contents, features, and effects of the present invention will become clear from the following detailed description of preferred embodiments with reference to the drawings.

Referring to Fig. 1, a flowchart of the product evaluation word mining method and system for electronic articles of the present invention, the steps are:
1. Select articles containing word-of-mouth from general test data, extract positive and negative word-of-mouth terms and word-of-mouth narrative sentences, and use them to configure and build a general prediction model (101).
2. Apply the general prediction model to industry-domain test data, and identify and mine new positive and negative word-of-mouth terms for the industry (102).
3. Accumulate the existing and new positive and negative word-of-mouth terms; use the accumulated terms to extract new narrative sentences from the articles containing word-of-mouth; then set the rules for labeling text from the accumulated terms and the new narrative sentences to build a new prediction model (103).
4. Apply the new prediction model to the industry-domain test data again, and again identify and mine new positive and negative word-of-mouth terms and narrative sentences. Repeat model building and mining, and end the mining process once no new positive or negative word-of-mouth terms are discovered (104).

Referring to Fig. 2, the system for mining product evaluation words in electronic articles comprises a test data input module 11, a general prediction model building module 12, a proprietary vocabulary mining module 13, a vocabulary accumulation module 14, a narrative sentence extraction module 15, a prediction model training module 16, a general vocabulary database 17, and a domain vocabulary database 18. The test data input module 11 receives test data, which may be general test data or industry-domain test data. The general prediction model building module 12, whose main purpose is to build a general prediction model, comprises a word-of-mouth article selection module 121, a word-of-mouth term selection module 122, a narrative sentence extraction module 123, and a general prediction model training module 124.

When the test data input module 11 feeds general test data into the word-of-mouth article selection module 121, that module selects the articles that already contain word-of-mouth. For example, one article reads: "My little Hong-Bao started out on Quaker Xin Mei Li (桂格 新美力), but he kept pooping all the time, like diarrhea, so we switched to Nestlé S26. On S26 he really did get rather puffy (looking at photos from back then, he looks so chunky), and for a while he would spit up milk, so we switched again to New Angel (新安琪兒) An-Bu formula (my classmate's child drinks this one too). After switching to New Angel it went pretty well, and now at one year old he has moved on to New Angel goat's-milk formula. Sharing for other moms' reference!" Because this article contains evaluation terms such as "Quaker", "Xin Mei Li", "diarrhea", and "pretty good", it is selected.

The word-of-mouth term selection module 122, connected to the word-of-mouth article selection module 121, then compiles positive and negative word-of-mouth terms from the selected articles. For example, in "Our family originally drank S26, and I also felt it made him puffy; now we've switched to 優生 (Yousheng)", "puffy" (虛胖) can be labeled as a negative word-of-mouth term and "switched to" (改喝) as a positive one.

Once the positive and negative terms have been labeled and compiled, the narrative sentence extraction module 123, connected to the word-of-mouth article selection module 121 and the word-of-mouth term selection module 122, uses the labeled terms to extract word-of-mouth narrative sentences from the articles. Module 123 locates the product and the word-of-mouth and captures complete sentences. Consider the following article containing word-of-mouth: <div class="body dfs"><p>You could go join a mommy class</p><p>Pregnant women need some activity too~</p><p>Joining a mommy class, for one thing, you learn how to be a mom</p><p>and for another, you meet other moms who can be your classmates~</p><p>Some classes give out small gifts, which I think is nice added value too~</p><p>There are quite a lot of mommy classes</p><p>I've attended ones by 再生緣, 優生 and mamaway, and the experience was pretty good</p><p>The knowledge I gained is really useful~</p> Here "優生" and "good" (不錯) are both word-of-mouth terms, so together they pick out the narrative sentence "I've attended ones by 再生緣, 優生 and mamaway, and the experience was pretty good".

Narrative sentence extraction also applies a long-sentence segmentation rule and a short-sentence segmentation rule. The length of each sentence is checked first: in this embodiment, a sentence longer than 25 characters is handled by the long-sentence rule, and a sentence shorter than 25 characters by the short-sentence rule. The long-sentence rule splits the sentence at punctuation marks and captures the span covered by the positive and negative word-of-mouth terms; the short-sentence rule instead expands the short sentence with the sentences before and after it to capture the span covered by the terms.
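A minimal sketch of the long- and short-sentence rules, assuming the article has already been split into ordered sentence-like units (for example, its `<p>` elements), that terms are matched as plain substrings (Chinese text has no spaces between words), and that a unit of exactly 25 characters counts as short, which the text leaves open. The function name `extract_narratives` and the punctuation set are illustrative assumptions.

```python
import re
from typing import List, Set

# Clause-level punctuation, ASCII and full-width (an assumed set)
PUNCT = r"[,;:!?~，；：。！？～]"


def extract_narratives(units: List[str], terms: Set[str], max_short_len: int = 25) -> List[str]:
    """Return narrative snippets around word-of-mouth terms.

    units: the article's sentence-like units in order (e.g. its <p> lines).
    """
    snippets: List[str] = []
    for i, unit in enumerate(units):
        if not any(t in unit for t in terms):
            continue
        if len(unit) > max_short_len:
            # Long-sentence rule: cut at punctuation, keep clauses covering a term
            clauses = (c.strip() for c in re.split(PUNCT, unit))
            snippets.extend(c for c in clauses if c and any(t in c for t in terms))
        else:
            # Short-sentence rule: widen to the preceding and following units
            window = units[max(i - 1, 0): i + 2]
            snippets.append(" / ".join(window))
    return snippets


units = [
    "I joined a mommy class",
    "The brand A course was nice, and the gifts were small",
    "Lots of courses",
    "Knowledge was useful",
]
result = extract_narratives(units, {"nice", "useful"})
print(result)
# ['The brand A course was nice', 'Lots of courses / Knowledge was useful']
```

The long unit is trimmed to the clause that contains "nice", while the short unit containing "useful" is widened with its neighbour (it is the last unit, so only the preceding one is added).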

The general prediction model training module 124, connected to the word-of-mouth term selection module 122 and the narrative sentence extraction module 123, then sets the rules for labeling text from the positive and negative word-of-mouth terms and the narrative sentences. Module 124 uses Conditional Random Fields (CRF) to let the system learn: it is trained to label input text, the algorithm learns the knowledge applied in the earlier manual labeling, and it then imitates that labeling, thereby building a general prediction model.
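The text does not say how the labeled terms are encoded for the CRF. One common choice, assumed here rather than taken from the patent, is character-level BIO tagging with per-character context features. The sketch below builds that training pair (`X`, `y`) for one narrative sentence; `bio_tags` and `char_features` are hypothetical helpers, and the sketch deliberately stops short of calling any particular CRF library.

```python
from typing import Dict, List, Tuple


def bio_tags(sentence: str, lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Tag each character B-<polarity>, I-<polarity> or O from a term lexicon."""
    tags = ["O"] * len(sentence)
    # Longer terms first, so a shorter overlapping term cannot overwrite them
    for term in sorted(lexicon, key=len, reverse=True):
        start = sentence.find(term)
        while start != -1:
            span = range(start, start + len(term))
            if all(tags[k] == "O" for k in span):
                tags[start] = "B-" + lexicon[term]
                for k in span[1:]:
                    tags[k] = "I-" + lexicon[term]
            start = sentence.find(term, start + 1)
    return list(zip(sentence, tags))


def char_features(sentence: str, i: int) -> Dict[str, str]:
    """Per-character features: the character and its immediate neighbours."""
    return {
        "char": sentence[i],
        "prev": sentence[i - 1] if i > 0 else "<BOS>",
        "next": sentence[i + 1] if i + 1 < len(sentence) else "<EOS>",
    }


# Example from the text: 虛胖 ("puffy") negative, 改喝 ("switched to") positive
lexicon = {"虛胖": "NEG", "改喝": "POS"}
sentence = "我也覺得會虛胖,現在改喝優生"
pairs = bio_tags(sentence, lexicon)
X = [char_features(sentence, i) for i in range(len(sentence))]  # CRF inputs
y = [tag for _, tag in pairs]                                    # CRF targets
print([p for p in pairs if p[1] != "O"])
# [('虛', 'B-NEG'), ('胖', 'I-NEG'), ('改', 'B-POS'), ('喝', 'I-POS')]
```

Character-level tags avoid depending on a Chinese word segmenter; richer features (wider windows, dictionary hits) would be natural extensions.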

Once the general prediction model has been built, the proprietary vocabulary mining module 13, connected to the test data input module 11 and the general prediction model training module 124, applies the general prediction model to the industry-domain test data supplied by the test data input module. Predicted terms that are not among the previously obtained positive and negative word-of-mouth terms are extracted as new positive and negative word-of-mouth terms for the industry. The vocabulary accumulation module 14, connected to the proprietary vocabulary mining module 13 and the word-of-mouth term selection module 122, then accumulates the existing and new terms; the newly accumulated terms can also be stored directly in a domain vocabulary database 18, so that domain vocabulary databases for different fields accumulate each field's proprietary terms.

Next, the narrative sentence extraction module 15, connected to the word-of-mouth article selection module 121 and the vocabulary accumulation module 14, uses the accumulated terms to extract new word-of-mouth narrative sentences from the articles containing word-of-mouth. The prediction model training module 16, connected to the vocabulary accumulation module 14, the narrative sentence extraction module 15, and the proprietary vocabulary mining module 13, then sets the rules for labeling text from the accumulated terms and the new narrative sentences to build a new prediction model. The proprietary vocabulary mining module 13 applies the new model to the industry-domain test data again, repeatedly identifying and extracting new positive and negative word-of-mouth terms for the industry. When module 13 determines that there are no more new terms, the vocabulary accumulation module 14 imports the accumulated positive and negative word-of-mouth terms and narrative sentences into a general vocabulary database 17 for use as general vocabulary data. As test data from different domains are mined, the general vocabulary database 17 becomes more complete, so that later, when product evaluations in electronic articles are analyzed online, the general product evaluation terms allow public opinion on a specific product to be filtered out faster and more accurately.
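The split between the domain vocabulary database 18 and the general vocabulary database 17 can be pictured as two stores updated at different moments. In this hypothetical sketch (the class `VocabularyStore` and its methods are illustrative, not from the patent), new terms go into the per-domain store every round, and everything accumulated, terms and narrative sentences alike, is copied into the general store only once mining converges.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class VocabularyStore:
    general_terms: Set[str] = field(default_factory=set)             # like database 17
    general_sentences: List[str] = field(default_factory=list)
    domain_terms: Dict[str, Set[str]] = field(default_factory=dict)  # like database 18

    def accumulate(self, domain: str, known: Set[str], predicted: Set[str]) -> Set[str]:
        """Keep only terms not seen before and record them under their domain."""
        new_terms = predicted - known
        self.domain_terms.setdefault(domain, set()).update(new_terms)
        return new_terms

    def finalize(self, terms: Set[str], sentences: List[str]) -> None:
        """Call once a round finds nothing new: publish to the shared store."""
        self.general_terms |= terms
        for s in sentences:
            if s not in self.general_sentences:  # avoid duplicate sentences
                self.general_sentences.append(s)


store = VocabularyStore()
known = {"good", "puffy"}
new = store.accumulate("milk powder", known, {"good", "gentle"})
known |= new
store.finalize(known, ["the experience was pretty good"])
print(sorted(new), sorted(store.general_terms))
# ['gentle'] ['gentle', 'good', 'puffy']
```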

Next, the present invention is illustrated with an implementation using articles about the Mead Johnson (美強生) brand and the milk powder industry. The steps are as follows:

1. First, from about 9,000 Mead Johnson articles, about 500 articles containing word-of-mouth were selected and labeled, yielding about 300 positive and negative word-of-mouth terms; about 200 narrative sentences were then extracted, and training produced the general prediction model.

2. A total of 30,000 articles (test data) from the milk powder industry were obtained to mine the industry's proprietary vocabulary. The first prediction with the general prediction model yielded about 100 new terms.

3. Word-of-mouth narrative sentences were then re-extracted from the original 500 articles: adding the 100 new terms gave 400 positive and negative word-of-mouth terms (the existing 300 plus the 100 new ones), which yielded about 300 narrative sentences from those articles. Training then produced an updated prediction model.

4. A second prediction over the 30,000 milk powder industry articles with the new prediction model yielded about 80 new terms.

5. Adding the 80 new terms gave 480 positive and negative word-of-mouth terms, which yielded about 350 narrative sentences from the original 500 articles; training then produced an updated prediction model.

6. Steps 2 and 3 were repeated; after the fourth test, a total of 200 new terms had been obtained (100 + 80 + 20). No further new terms appeared, and the mining process ended.
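As a quick check of the reported (approximate) figures, the vocabulary grows from the roughly 300 seed terms as each prediction round adds its new terms. The intermediate sizes 400 and 480 match steps 3 and 5; the final size of about 500 is implied by 300 + 200 rather than stated in the text.

```python
seed_terms = 300                      # ~300 terms labelled in the Mead Johnson articles
new_terms_per_round = [100, 80, 20]   # approximate new terms from successive predictions

vocab_sizes = [seed_terms]
for found in new_terms_per_round:
    vocab_sizes.append(vocab_sizes[-1] + found)  # cumulative vocabulary after each round

print(vocab_sizes)                # [300, 400, 480, 500]
print(sum(new_terms_per_round))   # 200
```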

As the steps above show, the present invention uses the general model to predict (mine) articles in a new domain, and the predictions surface new vocabulary. The mined terms do not immediately become vocabulary of the new domain, however: they must additionally pass rule-based screening to become candidate terms, and manual judgment can be added as an aid. As the new-domain vocabulary grows, the number of possible word-of-mouth narrative sentences increases, so applying the word-of-mouth sentence extraction technique yields more narrative sentences as training data. In addition, whenever the positive and negative word-of-mouth terms or narrative sentences of step 3 increase, a new model is trained and steps 1 to 3 are repeated; each repetition grows the domain dictionary until nothing new is added. If no new narrative sentences appear, the mining process terminates, and qualified terms can be added to the shared dictionary. In the Mead Johnson and milk powder implementation above, the initial accuracy exceeded 95%, which is better than the methods currently in common use on the market.

Compared with other conventional techniques, the method and system of the present invention for mining product evaluation words in electronic articles have the following advantages:

1. By mining product evaluation terms, the invention gradually builds a continuously updated general vocabulary database usable across domains. As articles from different domains are tested repeatedly, new general terms continue to be added, so the database keeps growing and learning and becomes applicable to a wide range of fields and industries.

2. With the general vocabulary database of the invention, evaluating public opinion on specific products in a given field does not require collecting word-of-mouth terms from scratch: the database can be used directly, or first mined with articles from that industry and then put straight into use in the new field.

3. Building the general prediction model still requires people to label data on the computer, but once the model is built, the subsequent process runs on its own, which minimizes the cost of human effort.

4. The larger the volume of test data fed into the invention, the more accurate the results will be when it is finally applied to an industry.

The above detailed description of preferred embodiments is intended to describe the features and spirit of the present invention more clearly, not to limit its scope to the preferred embodiments disclosed above. On the contrary, the intention is to cover various modifications and equivalent arrangements within the scope of the claims of the present invention.

Claims (6)

1. A method for mining product evaluation terms in electronic articles, comprising the steps of: first selecting a plurality of articles containing word of mouth from a plurality of general test data, and extracting a plurality of positive and negative word-of-mouth terms and a plurality of word-of-mouth narrative sentences; then configuring and establishing a general prediction model according to the positive and negative word-of-mouth terms and the word-of-mouth narrative sentences; next, predicting a plurality of industry-domain test data according to the general prediction model, and determining and mining new positive and negative word-of-mouth terms for the industry domain; then accumulating the existing positive and negative word-of-mouth terms together with the new positive and negative word-of-mouth terms, using the accumulated positive and negative word-of-mouth terms to extract a plurality of new word-of-mouth narrative sentences from the plurality of articles containing word of mouth, and setting rules for labeling text according to the accumulated positive and negative word-of-mouth terms and the new word-of-mouth narrative sentences, so as to establish a new prediction model; and finally, predicting the plurality of industry-domain test data again according to the new prediction model, again determining and mining new positive and negative word-of-mouth terms and new word-of-mouth narrative sentences, and thereafter repeating the establishment of a new prediction model and the mining of new positive and negative word-of-mouth terms and new word-of-mouth narrative sentences, the mining procedure ending when no new positive and negative word-of-mouth terms are found.

2. The method for mining product evaluation terms in electronic articles according to claim 1, wherein, after the mining procedure ends, all accumulated positive and negative word-of-mouth terms are imported into a general vocabulary database for use as shared vocabulary.

3. The method for mining product evaluation terms in electronic articles according to claim 1, wherein the articles are web articles.

4. A system for mining product evaluation terms in electronic articles, comprising:
a test data input module for inputting test data, the test data comprising at least a plurality of general test data and a plurality of industry-domain test data;
a general prediction model building module, comprising:
a word-of-mouth article selection module, connected to the test data input module, for selecting a plurality of articles containing word of mouth from the plurality of general test data;
a word-of-mouth term selection module, connected to the word-of-mouth article selection module, capable of compiling a plurality of positive and negative word-of-mouth terms from the plurality of articles containing word of mouth;
a narrative sentence extraction module, connected to the word-of-mouth article selection module and the word-of-mouth term selection module, which uses the positive and negative word-of-mouth terms to extract a plurality of word-of-mouth narrative sentences from the plurality of articles containing word of mouth;
a general prediction model training module, connected to the word-of-mouth term selection module and the narrative sentence extraction module, for setting rules for labeling text according to the positive and negative word-of-mouth terms and the word-of-mouth narrative sentences, so as to establish a general prediction model;
a proprietary vocabulary mining module, connected to the test data input module and the general prediction model training module, capable of predicting the plurality of industry-domain test data according to the general prediction model, and determining and extracting new positive and negative word-of-mouth terms for the industry domain;
a vocabulary accumulation module, connected to the proprietary vocabulary mining module and the word-of-mouth term selection module, for accumulating the existing positive and negative word-of-mouth terms together with the new positive and negative word-of-mouth terms;
a narrative sentence extraction module, connected to the word-of-mouth article selection module and the vocabulary accumulation module, which uses the accumulated positive and negative word-of-mouth terms to extract a plurality of new word-of-mouth narrative sentences from the plurality of articles containing word of mouth;
a prediction model training module, connected to the vocabulary accumulation module, the narrative sentence extraction module, and the proprietary vocabulary mining module, for setting rules for labeling text according to the accumulated positive and negative word-of-mouth terms and the new word-of-mouth narrative sentences, so as to establish a new prediction model, whereby the proprietary vocabulary mining module can again predict the plurality of industry-domain test data according to the new prediction model, and repeatedly determine and extract new positive and negative word-of-mouth terms for the industry domain.

5. The system for mining product evaluation terms in electronic articles according to claim 4, further comprising a general vocabulary database connected to the vocabulary accumulation module, the general vocabulary database containing shared positive and negative word-of-mouth terms, wherein, when the proprietary vocabulary mining module determines that there are no new positive and negative word-of-mouth terms, the accumulated positive and negative word-of-mouth terms are imported into the general vocabulary database through the vocabulary accumulation module.

6. The system for mining product evaluation terms in electronic articles according to claim 4, wherein the articles are web articles.
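The final step of claim 5, importing the accumulated terms into the shared general vocabulary database once mining stops finding new terms, can be sketched with Python's built-in `sqlite3` module. The table layout, the `source_domain` column, and the conflict policy (an existing shared entry is kept, and a polarity disagreement is reported rather than overwritten) are assumptions for illustration; the patent does not specify them, and the function names are hypothetical.

```python
import sqlite3


def open_vocab_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a hypothetical shared vocabulary table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vocab ("
        " term TEXT PRIMARY KEY,"
        " polarity TEXT NOT NULL CHECK (polarity IN ('pos', 'neg')),"
        " source_domain TEXT)"
    )
    return conn


def import_accumulated(conn: sqlite3.Connection, lexicon: dict[str, str],
                       domain: str) -> tuple[int, list[str]]:
    """Add unseen terms; return (number added, terms whose polarity disagrees)."""
    # Each row is a (term, polarity) pair, so the cursor converts directly to a dict.
    existing = dict(conn.execute("SELECT term, polarity FROM vocab"))
    conflicts = sorted(t for t, p in lexicon.items() if t in existing and existing[t] != p)
    new_rows = [(t, p, domain) for t, p in sorted(lexicon.items()) if t not in existing]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO vocab (term, polarity, source_domain) VALUES (?, ?, ?)", new_rows
        )
    return len(new_rows), conflicts
```

Keeping the earlier entry on a conflict is one conservative choice; flagged terms could instead be routed to the manual review mentioned in the description before the shared vocabulary is updated.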
TW102134816A 2013-09-26 2013-09-26 Method of digging product evaluation words in electronic articles and system thereof TW201513013A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW102134816A TW201513013A (en) 2013-09-26 2013-09-26 Method of digging product evaluation words in electronic articles and system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW102134816A TW201513013A (en) 2013-09-26 2013-09-26 Method of digging product evaluation words in electronic articles and system thereof

Publications (2)

Publication Number Publication Date
TW201513013A (en) 2015-04-01
TWI501174B (en) 2015-09-21

Family

ID=53437185

Family Applications (1)

Application Number Title Priority Date Filing Date
TW102134816A TW201513013A (en) 2013-09-26 2013-09-26 Method of digging product evaluation words in electronic articles and system thereof

Country Status (1)

Country Link
TW (1) TW201513013A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI612488B (en) * 2016-12-05 2018-01-21 Institute for Information Industry (財團法人資訊工業策進會) Computer device and method for predicting market demand of commodities

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201118619A (en) * 2009-11-30 2011-06-01 Inst Information Industry An opinion term mining method and apparatus thereof
US8352405B2 (en) * 2011-04-21 2013-01-08 Palo Alto Research Center Incorporated Incorporating lexicon knowledge into SVM learning to improve sentiment classification

Also Published As

Publication number Publication date
TWI501174B (en) 2015-09-21

Similar Documents

Publication Publication Date Title
CN106295796B (en) entity link method based on deep learning
CN109492229B (en) Cross-domain emotion classification method and related device
CN107368468A (en) A kind of generation method and system of O&M knowledge mapping
CN112329467A (en) Address recognition method and device, electronic equipment and storage medium
CN113065003B (en) Knowledge graph generation method based on multiple indexes
CN103336852B (en) Across language ontology construction method and device
CN104866498A (en) Information processing method and device
CN105740404A (en) Label association method and device
CN108875072A (en) File classification method, device, equipment and storage medium
CN105740382A (en) Aspect classification method for short comment texts
CN111428051A (en) Method and system for constructing self-adaptive learning knowledge graph fused with multivariate intelligence
CN111046171B (en) Emotion discrimination method based on fine-grained labeled data
CN110888989A (en) Intelligent learning platform and construction method thereof
CN115204156A (en) Keyword extraction method and device
CN112101029A (en) College instructor recommendation management method based on bert model
Cobos et al. Moods in MOOCs: Analyzing emotions in the content of online courses with edX-CAS
CN109388749A (en) The detection of accurate high-efficiency network public sentiment and method for early warning based on multi-layer geography
Peterlin et al. Automated content analysis: The review of the big data systemic discourse in tourism and hospitality
Addepalli et al. A proposed framework for measuring customer satisfaction and product recommendation for ecommerce
Pan et al. Image2Triplets: A computer vision-based explicit relationship extraction framework for updating construction activity knowledge graphs
Ayala et al. A neural network for semantic labelling of structured information
Raviya et al. An approach for recommender system based on multilevel sentiment analysis using hybrid deep learning models
CN111159400A (en) Product comment emotion classification method and system
TW201513013A (en) Method of digging product evaluation words in electronic articles and system thereof
CN104679492B (en) The computer implemented device and method that technical support is provided

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees