TWI286697B - Chinese opinion retrieval and extraction systems - Google Patents

Chinese opinion retrieval and extraction systems

Info

Publication number
TWI286697B
Authority
TW
Taiwan
Prior art keywords
opinion
vocabulary
emotional
file
database
Prior art date
Application number
TW94105374A
Other languages
Chinese (zh)
Other versions
TW200630827A (en)
Inventor
Hsin-Hsi Chen
Lun-Wei Ku
Tung-Ho Wu
Li-Ying Lee
Original Assignee
Hsin-Hsi Chen
Lun-Wei Ku
Tung-Ho Wu
Li-Ying Lee
Priority date
Filing date
Publication date
Application filed by Hsin-Hsi Chen, Lun-Wei Ku, Tung-Ho Wu, Li-Ying Lee
Priority to TW94105374A
Publication of TW200630827A
Application granted
Publication of TWI286697B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention concerns an opinion analysis system. It employs sentiment words to determine the opinion polarity, and detects the major topics of the related events. When users input the opinion analysis requests, the system extracts the positive and the negative opinions, and provides the summarization of the events. For the data containing temporal information, the invention presents the opinion analysis results along the time axis.

Description

1286697

IX. Description of the invention:

[Technical field of the invention]

The present invention relates to a technique for analyzing opinion information with a computer system, and in particular to a technique that performs opinion analysis with a sentiment word database and can provide summaries of opinion-related events.

[Prior art]

In a society where information is increasingly digitized, the volume of information multiplies, and users have a growing need to find the information they want in large document collections. The systems most widely used today are search engines: the user enters an information need, and the search engine returns material related to that need for the user's reference. However, a traditional search engine (information retrieval system) only returns information relevant to the user's information need. It does not indicate whether that information expresses a positive, negative, or neutral opinion, and it cannot automatically identify the main causes of those opinion tendencies.

Traditional opinion surveys rely heavily on marketing staff, market researchers, and polling organizations, who survey specific topics manually. Members of the public cannot freely choose the topic whose opinions they want analyzed. As electronic documents have become easier to obtain and publishing opinions has become a social trend, techniques in which a computer analyzes opinions automatically have emerged. Such techniques can analyze opinions quickly across a large volume of information, provide reliable opinion reference information, and let users enter any opinion analysis request they like.

Information extraction systems can extract the required information from electronic documents. Traditional information extraction systems, however, focus on mining named entities. They do not distinguish support from opposition at the opinion level, and so cannot give the user a reference for opinion tendency.

Earlier opinion analysis systems took product reputation, product reviews, online discussions, and the like as their scope of analysis, and mainly aimed to analyze sentiment components […].

[Summary of the invention]

It is hoped that the present invention can be extended to industry, so that the design of public policies, site selection, products, and the like can be improved according to public opinion and human well-being can be raised. The invention […] sentiment vocabulary and provides topic information on opinion-related events.

The computer system of the invention may include at least any of the following modules:

  • a search engine, for finding documents related to the opinion analysis request;
  • a related document database, for storing documents related to the topic;
  • a temporal document database, for storing documents that have a temporal relationship;
  • an opinion document database, for storing documents after opinion analysis;
  • a temporal opinion document database, for storing documents that have a temporal relationship and have undergone opinion analysis;
  • a document topic database, for storing the event topic information related to the opinion analysis request;
  • a sentiment word database, for storing sentiment words and their sentiment strength;
  • a temporal integration engine, for classifying articles by time, where the time unit may be any time interval;
  • an opinion analysis engine, which uses sentiment words to find the positive and negative opinion information in an article and gives a reference score;
  • a topic detection engine, which summarizes the topics of the documents according to the similarity of topic words, event words, and opinions.

The sentiment word database contains positive sentiment words, negative sentiment words, neutral sentiment words, and opinion expression words. The invention assigns different strength scores to different opinion descriptions, to reflect different degrees of support or opposition. Because sentiment vocabularies can be mined for different domains, the invention is domain-adaptive. The invention is novel in construction, is industrially applicable, and works well in a Chinese-language environment; an invention patent is therefore applied for according to law.

[Embodiments]

To help the examiner better understand the technical content of the invention, a preferred embodiment is described below. The user first decides on a topic to analyze, and the system collects the related documents. The opinion analysis system of the invention then mines the related positive and negative opinions and provides a summary. Take a user opinion analysis request 100 of "National Card" (國民卡) as an example.

The user enters the opinion analysis request 100, for example "National Card", into the system. The system then searches the real-time document database 210 and the existing document database 220 for related documents and places them in the related document database 400. Each document stored in the related document database 400 carries the following information: document title, document body, related opinion analysis request, document number, relevance score, and document time. See Fig. 4, a schematic diagram of the related document database 400. For example:

Document title: No consensus on National Card fees; contract negotiations progress slowly
Document body: National Card contract negotiations are progressing slowly; no consensus has been reached on the fee for first-time card issuance...
Opinion analysis request: National Card
Document number: source1_type1_19981023_0251
Relevance score: 15.950
Document time: 19981023
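The record layout of the related document database 400 can be pictured as a small data structure. The following is only a sketch: the class name, the field names, and the exact spelling of the document number are assumptions made for illustration, and the patent does not prescribe any storage format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RelatedDocument:
    """One record of the related document database 400 (names are illustrative)."""
    title: str                # document title
    body: str                 # document body
    request: str              # opinion analysis request that retrieved the document
    doc_id: str               # document number
    relevance: float          # relevance score assigned during retrieval
    doc_time: Optional[str]   # document time as YYYYMMDD, or None when unknown


# The example record from the description, with assumed field names.
example = RelatedDocument(
    title="No consensus on National Card fees; contract negotiations progress slowly",
    body="National Card contract negotiations are progressing slowly...",
    request="National Card",
    doc_id="source1_type1_19981023_0251",
    relevance=15.950,
    doc_time="19981023",
)
```

Making `doc_time` optional mirrors the later split between documents that have "document time" information and documents that do not.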

Once the related document database 400 is ready, word segmentation and part-of-speech tags can be added, and the opinion analysis that forms the core of the invention begins. See Fig. 1, a schematic diagram of the opinion analysis system. The user opinion analysis request 100 of "National Card" is again used as the example.

First, the documents in the related document database 400 are divided into two groups: documents that have "document time" information and documents that do not. The documents with "document time" information pass through the temporal integration engine 500, are classified by the time unit the user wants, and are stored in the temporal document database 600. Suppose the earliest document time among the related documents is 1 January 2003 (the "document time" field in the related document database 400 is 20030101) and the latest is 31 December 2003 (the "document time" field is 20031231):

Example 1: with the day as the time unit, the documents are divided into 365 temporal document groups and placed in the temporal document database 600.
Example 2: with the week as the time unit, the documents are divided into 52 temporal document groups and placed in the temporal document database 600.
Example 3: with the month as the time unit, the documents are divided into 12 temporal document groups and placed in the temporal document database 600.

Whether or not a document has "document time" information, it can be passed to the opinion analysis engine 700, which detects its opinion tendency.

The opinion analysis engine 700 relies mainly on the sentiment word database 1100 to mine the related positive and negative opinions from the documents. Sentiment words are the key words of opinion expression. The sentiment word database 1100 contains opinion expression words, positive and negative sentiment words, and their strength scores. See Fig. 11, a schematic diagram of the sentiment word database. Examples of the content of the sentiment word database 1100 follow. Each word is listed with its polarity and score, followed by its component characters with their character frequency (freq) and score; [?] marks a character or value that is illegible in the source.

Example 1, positive sentiment words:

尊重 (respect)    positive    0.3713
    尊    positive    freq=5     score=0.400000
    重    positive    freq=22    score=0.3426
讚許 (approve)    positive    0.5
    讚    positive    freq=6     score=0.500000
    許    positive    freq=6     score=0.500000

Example 2, negative sentiment words:

[?]    negative    -0.875
    失     negative    freq=37     score=-0.900000
    [?]    negative    freq=18     score=-0.850000
[?]    negative    -0.51256
    死     negative    freq=64     score=-0.925120
    [?]    negative    freq=[?]    score=-0.100000

Example 3, opinion expression words:

表示 (state), 指出 (point out), 提出 (propose), 強調 (emphasize), 重申 (reiterate), 主張 (advocate), 反映 (reflect), 聲明 (declare), 透露 (disclose), 看法 (view), 表達 (express), 呼籲 (appeal), 建議 (suggest), 認為 (think)
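The grouping by time unit performed by the temporal integration engine 500, described before the sentiment word examples, could be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the label formats are hypothetical, and the use of ISO calendar weeks is an assumption, since the description does not say where a week starts.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def time_unit_key(doc_time, unit):
    """Map a YYYYMMDD document time to the label of its time unit."""
    day = datetime.strptime(doc_time, "%Y%m%d").date()
    if unit == "day":
        return day.isoformat()
    if unit == "week":
        iso = day.isocalendar()  # (ISO year, ISO week number, weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if unit == "month":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unsupported time unit: {unit}")


def group_by_time_unit(doc_times, unit):
    """Group document times into buckets keyed by time unit label."""
    groups = defaultdict(list)
    for doc_time in doc_times:
        groups[time_unit_key(doc_time, unit)].append(doc_time)
    return dict(groups)


# One document time for every day of 2003, the range used in the description.
all_days = [(date(2003, 1, 1) + timedelta(days=i)).strftime("%Y%m%d") for i in range(365)]
print(len(group_by_time_unit(all_days, "day")))    # prints 365
print(len(group_by_time_unit(all_days, "month")))  # prints 12
```

The number of weekly groups depends on the week convention chosen, so the sketch does not assert one.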

In the embodiment, the score of a sentiment word is computed statistically from the characters that make up the word. See Fig. 3, a flow chart of the sentiment word processing engine. The basic vocabulary 105 is a set of commonly used basic sentiment words, such as "快[?]". The scores are obtained by machine learning: from all the positive and negative words in the basic vocabulary 105, the frequency with which each character appears (the character frequency) is used to compute the sentiment tendency of that character.

Let $f^{+}_{c}$ be the character frequency of character $c$ in the positive words, $f^{-}_{c}$ its character frequency in the negative words, $F^{+}$ the total character frequency of all positive characters, and $F^{-}$ the total character frequency of all negative characters. Then:

$$P_c = \frac{f^{+}_{c}/F^{+}}{f^{+}_{c}/F^{+} + f^{-}_{c}/F^{-}} \qquad \text{(positive tendency of the character)}$$

$$N_c = \frac{f^{-}_{c}/F^{-}}{f^{+}_{c}/F^{+} + f^{-}_{c}/F^{-}} \qquad \text{(negative tendency of the character)}$$

$$S_c = (P_c - N_c) \times \mathrm{CI}_c \qquad \text{(total tendency of the character)}$$

The confidence index $\mathrm{CI}_c$ is the position of the character's frequency within the frequency distribution of all characters. In this embodiment, character frequencies in the range 3 to 50 are normalized to the range 0 to 1. In the example below, the character 尊 has a character frequency of 5, and the confidence index of characters with a frequency of 5 falls at 0.4; that is, at least 60% of the characters have a frequency greater than 5, and at most 40% of the characters have a frequency less than 5.

Finally, the opinion tendency of an opinion word $w$ made up of the characters $c_1, \dots, c_n$ is the average of the total tendencies of its characters:

$$S_w = \frac{1}{n}\sum_{i=1}^{n} S_{c_i}$$

Take the opinion word 尊重 (respect) as an example:

$$P_{\text{尊}} = \frac{5/2309}{5/2309 + 0/5148} = 1.0 \qquad N_{\text{尊}} = \frac{0/5148}{5/2309 + 0/5148} = 0.0$$

$$S_{\text{尊}} = (1 - 0) \times 0.4 = 0.4$$

$$P_{\text{重}} = \frac{11/2309}{11/2309 + 11/5148} = 0.6903 \qquad N_{\text{重}} = \frac{11/5148}{11/2309 + 11/5148} = 0.3096$$

$$S_{\text{重}} = (0.6903 - 0.3096) \times 0.9 = 0.3426$$

$$S_{\text{尊重}} = \frac{0.4 + 0.3426}{2} = 0.3713$$
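The worked example for 尊重 can be reproduced with a few lines of code. The sketch below assumes the confidence index is supplied as an input, because the description gives only its values (0.4 and 0.9), not a closed formula; the function names and the neutral fallback for unseen characters are likewise assumptions.

```python
def char_tendency(pos_freq, neg_freq, pos_total, neg_total, confidence):
    """Total tendency of one character: (positive - negative) * confidence index."""
    pos_ratio = pos_freq / pos_total
    neg_ratio = neg_freq / neg_total
    denom = pos_ratio + neg_ratio
    if denom == 0:
        return 0.0  # character never seen in the basic vocabulary (assumed neutral)
    positive = pos_ratio / denom
    negative = neg_ratio / denom
    return (positive - negative) * confidence


def word_tendency(char_scores):
    """Opinion tendency of a word: the mean of its characters' total tendencies."""
    return sum(char_scores) / len(char_scores)


POS_TOTAL, NEG_TOTAL = 2309, 5148  # totals taken from the worked example

zun = char_tendency(5, 0, POS_TOTAL, NEG_TOTAL, confidence=0.4)      # 尊
zhong = char_tendency(11, 11, POS_TOTAL, NEG_TOTAL, confidence=0.9)  # 重
print(round(zun, 4), round(zhong, 4), round(word_tendency([zun, zhong]), 4))
# prints 0.4 0.3426 0.3713
```

Dividing each frequency by the class total before comparing the two sides compensates for the negative vocabulary being larger than the positive one, which is why 重, with equal raw counts of 11 on both sides, still leans positive.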

The opinion analysis engine 700 analyzes all the articles of each time unit separately. If there is no time unit, it analyzes all the articles together, that is, it treats all the articles as a single time unit; in the embodiment, the "time unit" field of an article without a time unit is set to 0. See Fig. 8, a schematic diagram of the temporal opinion document database 800 and the opinion document database 850. The score of an article is the sum of the scores of the positive and negative opinion sentences in the article, and the score of a sentence is the sum of the scores of the sentiment words found in the sentence. The sentence opinion score calculation component 705 computes the opinion score of each sentence, and the document opinion score calculation component 710 then computes the total opinion score of the document. See Fig. 7, a flow chart of the opinion analysis engine 700. In the embodiment, a sentence that contains an opinion expression word is an opinion sentence, and sentiment word scores are computed only for opinion sentences. In the following examples, a box denotes an opinion expression word and an underline denotes a sentiment word.

Example 1, a positive opinion sentence about the "National Card":

[…] the expected benefits of promoting the National Card project include improving administrative efficiency, for example by saving the large social cost of issuing and replacing existing identity cards and health insurance cards and by sparing the public from repeatedly filling in their data when dealing with government offices; in addition, it is of great benefit to promoting e-commerce applications and to business opportunities in the IC card industry, such as card readers, chips, and card manufacturing.

Positive sentiment words: 提升 (improve), 效率 (efficiency), 安全 (security), 促進 (promote), 助益 (benefit).

Example 2, a negative opinion sentence about the "National Card":

[…] scholars yesterday […] the government […], and launched a "[…] card alliance".

Negative sentiment words: 專制 (autocracy), 極權 (totalitarianism), 反對 (oppose).

If a sentence contains both positive and negative sentiment words, the side with the larger total decides the final opinion tendency of the sentence. For example, if the positive sentiment words total 11.5 and the negative sentiment words total 3.8, the opinion score of the sentence is 11.5 - 3.8 = 7.7 and the sentence is judged positive. Conversely, if the positive sentiment words total 3.8 and the negative sentiment words total 11.5, the opinion score of the sentence is 3.8 - 11.5 = -7.7 and the sentence is judged negative.

The opinion analysis engine 700 analyzes and mines the opinions in all the articles. The articles it has processed carry both sentence-level and document-level opinion scores. A scored document that has "document time" information is stored in the temporal opinion document database 800, grouped by time unit; a scored document without "document time" information is stored directly in the opinion document database 850. See Fig. 8, a schematic diagram of the temporal opinion document database 800 and the opinion document database 850. The opinion-related information of each document can be read directly from the articles in the opinion document database 850 and the temporal opinion document database 800.
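The sentence-level and document-level scoring described for components 705 and 710 can be sketched as below. The sketch assumes sentences are already segmented into words; the function names, the tiny lexicon, the score given to 反對, and the "neutral" label for a zero score are all hypothetical choices made for illustration.

```python
def sentence_score(tokens, sentiment_scores, expression_words):
    """Sum the sentiment word scores of a sentence; 0.0 if it is not an opinion sentence."""
    if not any(token in expression_words for token in tokens):
        return 0.0  # only sentences with an opinion expression word are scored
    return sum(sentiment_scores.get(token, 0.0) for token in tokens)


def document_score(sentences, sentiment_scores, expression_words):
    """Total opinion score of a document: the sum of its sentence scores."""
    return sum(sentence_score(s, sentiment_scores, expression_words) for s in sentences)


def polarity(score):
    """Label a score by its sign (the neutral label is an assumption)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical lexicon: the first two scores come from the sentiment word
# examples; the score for 反對 is invented for this sketch.
sentiment_scores = {"尊重": 0.3713, "讚許": 0.5, "反對": -0.6}
expression_words = {"表示", "指出"}

sentences = [
    ["學者", "表示", "尊重", "讚許"],  # opinion sentence, 0.3713 + 0.5
    ["民眾", "反對"],                  # no expression word, so not scored
    ["業者", "指出", "反對"],          # opinion sentence, -0.6
]
total = document_score(sentences, sentiment_scores, expression_words)
print(polarity(total))  # prints positive
```

Because negative sentiment words carry negative scores, summing them with the positive ones gives the same result as the subtraction in the 11.5 - 3.8 = 7.7 example.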
For example, the temporal opinion document database 800 holds information such as the following:

Document level (the "article opinion total score" field in Fig. 8):
Document number: source1_type1_19981023_0251
Document title: No consensus on National Card fees; contract negotiations progress slowly
Total opinion score: -10, negative

Sentence level:
[…] [opinion expression word], National Card […] (0.706959) […] military identity card […] benefit (0.468159) […]

Although the embodiment treats only sentences that contain an opinion expression word as opinion sentences, the opinion words implied by the other sentences in the text may also be considered when deciding the opinion tendency.

The topic detection engine 900 integrates related summaries according to the similarity of the various input opinions, and concisely reflects the main causes of the opinion tendency. In the embodiment, documents with positive opinions and documents with negative opinions are summarized separately, to show the main events that give rise to the different opinions. See Fig. 9, a flow chart of the topic detection engine. In the topic detection engine 900, the topic word and event word selection and scoring component 905 selects the topic words and event words in the articles. The invention uses the topic words and event words to select important representative event sentences from the documents. Topic words are the set of words that can represent the common topic of all the documents; event words represent the important sub-events that occur under that common topic. For the theory and score calculation of topic words and event words, see "Event Tracking based on Domain Dependency", Fumiyo Fukumoto and Yoshimi Suzuki, SIGIR 2000, pp. 57-64.

The following are the topic words and event words found in the articles related to the topic "National Card":

Example 1, topic words:

疑慮 (doubt) (Na)    33.4515511550498
同意 (agree) (VK)    30.4936548784942
項目 (item) (Na)     41.7396637249344

Example 2, event words:

策略 (strategy)       16.919289139043
實施 (implement)      16.8777314125345
指紋 (fingerprint)    15.0655470437868
研考會 (Research, Development and Evaluation Commission)    14.1486909212288
資料 (data)           14.1082889350082
晶片 (chip)           13.4815851793844

The above are examples of topic words and event words with their scores. An article or sentence that contains more topic words and event words (or has a larger total score) is more representative. The representative sentence selection component 910 extracts the representative sentences or articles, which provides event information about the different opinions on the topic. The following is the representative sentence information that the topic detection engine 900 produces for the topic "National Card" (extracted from the 37 documents related to "National Card"):

Example 1, representative sentences for positive events (the "representative score" and "representative sentence" fields in Fig. 10):

10.77    Wang Ling-tai (王令台): National Card security is absolutely no problem
10.12    Operator: operations will not exceed the scope entrusted by the government
9.88     National Card health insurance data is protected layer by layer

Example 2, representative sentences for negative events (the "representative score" and "representative sentence" fields in Fig. 10):

11.51    National Card: 64% of the public fear their personal details will be exposed
11.39    The National Card plan should be halted at the brink and thoroughly reviewed
10.10    Legality in doubt; the National Card policy may have loopholes
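The selection of representative sentences by the representative sentence selection component 910 can be sketched as a simple ranking. This is an illustration under stated assumptions: sentences are already segmented, the score is a plain sum of topic and event word scores with no further normalization, and the function names and example sentences are hypothetical.

```python
def representativeness(tokens, term_scores):
    """Sum the topic and event word scores of the words in a sentence."""
    return sum(term_scores.get(token, 0.0) for token in tokens)


def pick_representatives(sentences, term_scores, top_k=3):
    """Return the top_k (score, sentence) pairs, highest score first."""
    scored = [(representativeness(tokens, term_scores), tokens) for tokens in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Scores rounded from the topic and event word examples in the description.
term_scores = {"疑慮": 33.45, "同意": 30.49, "項目": 41.74, "實施": 16.88, "晶片": 13.48}

# Hypothetical segmented sentences.
sentences = [
    ["民眾", "疑慮", "晶片"],  # 33.45 + 13.48
    ["同意", "實施"],          # 30.49 + 16.88
    ["天氣", "很好"],          # no topic or event words
]
best = pick_representatives(sentences, term_scores, top_k=2)
```

To summarize positive and negative opinions separately, as the embodiment does, the same ranking would be run once over the sentences of the positive documents and once over those of the negative documents.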

The event topics summarized in this way are stored in the document topic database 1000. See Fig. 10, a schematic diagram of the document topic database. In the embodiment, the representative score is obtained by summing the scores of the topic words and event words that a sentence contains and then dividing by […]. Once the document topic database 1000 has been produced, combining it with the opinion document databases yields opinion analysis information and charts for the user. Producing analysis information and charts from summaries and the like is a well-known presentation technique and is not described further here.

[Effects of the invention]

In summary, the invention can provide information related to the user's opinion analysis request. In other words, its effects are:

(1) It can tell the user the opinion tendency. In the embodiment, when the opinion analysis request is "National Card", the system can tell the user whether the overall opinion tendency is positive or negative, and […]; it can also tell the user whether the opinion tendency within a particular time interval (for example, this month) is positive or negative.

(2) It can tell the user the causes of the opinion tendency. In the embodiment, when the opinion analysis request is "National Card" […], one of the reasons the opinion tendency is negative is "National Card: 64% of the public fear their personal details will be exposed".

(3) […]

(4) […]

(5) […]

The features that distinguish the invention from the prior art are: […]

The embodiment above only illustrates the principle and effects of the invention and is not intended to limit the invention […] sentiment word database. Modifications and changes may be made to the embodiment […]. The scope of rights of the invention […].

[Brief description of the drawings]

Fig. 1 is a schematic diagram of the opinion analysis system of the invention.
Fig. 2 is a flow chart of the request processing engine of the invention.
Fig. 3 is a flow chart of the sentiment word processing engine of the invention.
Fig. 4 is a schematic diagram of the related document database of the invention.
Fig. 5 is a flow chart of the temporal integration engine of the invention.
Fig. 6 is a schematic diagram of the temporal document database of the invention.
Fig. 7 is a flow chart of the opinion analysis engine of the invention.
Fig. 8 is a schematic diagram of the temporal opinion document database and the opinion document database of the invention.
Fig. 9 is a flow chart of the topic detection engine of the invention.
Fig. 10 is a schematic diagram of the document topic database of the invention.
Fig. 11 is a schematic diagram of the sentiment word database of the invention.

[Description of main reference numerals]

Opinion analysis request 100
Basic vocabulary 105
Request processing engine 200
Search engine 205
Real-time document database 210
Existing document database 220
Sentiment word processing engine 300
Sentiment word learning component 305
Sentiment word weight calculation component 310
Related document database 400
Temporal integration engine 500
Temporal grouping component 505
Temporal document database 600
Opinion analysis engine 700
Sentence opinion score calculation component 705
Document opinion score calculation component 710
Temporal opinion document database 800
Opinion document database 850
Topic detection engine 900
Topic word and event word selection and scoring component 905
Representative sentence selection component 910
Document topic database 1000
Sentiment word database 1100

Claims (1)

Temporal integration engine 500
Temporal clustering component 505
Temporal document database 600
Opinion analysis engine 700
Sentence opinion score calculation component 705
Document opinion score calculation component 710
Temporal opinion document database 800
Opinion document database 850
Topic detection engine 900
Topic word and event word selection and scoring component 905
Representative sentence selection component 910
Document topic database 1000
Sentiment word database 1100

X. Scope of the patent application:

1. A computer system for providing opinion information, comprising the following modules:
a sentiment word processing engine, which identifies sentiment words and calculates their opinion tendency, wherein the input of the sentiment word processing engine is any Chinese word; according to the proportion in which each character of the word appears among all known positive words and negative words, the engine determines whether the word is a positive sentiment word, a negative sentiment word, or a neutral sentiment word, that is, the sentiment tendency of the word; the engine also calculates the opinion score corresponding to the word, that is, the sentiment strength of the word; and the sentiment tendency and sentiment strength of the Chinese word are stored in a sentiment word database; and
a sentiment word database, which stores a plurality of Chinese words together with the corresponding information provided by the sentiment word processing engine.

2. The computer system of claim 1, further comprising:
an opinion analysis engine, which uses the sentiment words to find the positive and negative opinion information in a document, and […] the related document opinion scores and sentence opinion scores.

3. The computer system of claim […], further comprising:
[…] selecting, from a single document or from multiple documents, the one or more sentences having the highest topic and opinion scores, or the most topic and opinion words, as representative sentences, so as to summarize the single document […].

4. The computer system of claim 2, further comprising:
an opinion document database, which stores the documents analyzed by the opinion analysis engine.

5. The computer system of claim 2, further comprising:
a temporal opinion document database, which stores documents that have a temporal relationship and have been analyzed by the opinion analysis engine.

VII. Designated representative figure:
(1) The designated representative figure of this case is: Figure (1).
(2) Brief description of the reference numerals in the representative figure:
Opinion analysis request 100
Basic vocabulary 105
Request processing engine 200
Sentiment word processing engine 300
Relevant document database 400
Temporal integration engine 500
Temporal document database 600
Opinion analysis engine 700
Temporal opinion document database 800
Opinion document database 850
Topic detection engine 900
Document topic database 1000
Sentiment word database 1100

VIII. If this case contains a chemical formula, please disclose the chemical formula that best shows the features of the invention:
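As an illustration of the sentiment word processing engine recited in claim 1, the character-proportion idea can be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions: the claims do not give an exact formula, so the per-character score `(p - n) / (p + n)`, the averaging over characters, the `threshold` value, the seed word lists, and all function names are hypothetical choices made for illustration, not the patented method itself.

```python
from collections import Counter

def char_ratios(words):
    """Fraction of the words in `words` that contain each character."""
    counts = Counter(ch for w in words for ch in set(w))
    total = len(words)
    return {ch: n / total for ch, n in counts.items()}

def word_opinion_score(word, pos_ratio, neg_ratio):
    """Average per-character score in [-1, 1] (assumed formula).

    The sign is read as the sentiment tendency and the magnitude
    as the sentiment strength.
    """
    scores = []
    for ch in word:
        p = pos_ratio.get(ch, 0.0)  # share of positive seed words containing ch
        n = neg_ratio.get(ch, 0.0)  # share of negative seed words containing ch
        scores.append((p - n) / (p + n) if p + n > 0 else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

def classify(score, threshold=0.1):
    """Map a score to a tendency label; `threshold` is an assumed cutoff."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

# Hypothetical seed vocabularies standing in for "all known positive
# and negative words".
POSITIVE_SEEDS = ["快樂", "歡樂", "喜歡", "美好"]
NEGATIVE_SEEDS = ["悲傷", "難過", "討厭", "傷心"]

pos_ratio = char_ratios(POSITIVE_SEEDS)
neg_ratio = char_ratios(NEGATIVE_SEEDS)

# A dictionary standing in for the sentiment word database:
# word -> (sentiment tendency, sentiment strength)
sentiment_db = {}
for w in ["歡喜", "悲痛", "電腦"]:
    s = word_opinion_score(w, pos_ratio, neg_ratio)
    sentiment_db[w] = (classify(s), abs(s))
```

With these toy seed lists, a word built only from characters seen in positive seeds is labeled positive, a word whose characters appear in neither list is labeled neutral, and the stored pair mirrors the tendency and strength that claim 1 says are kept in the sentiment word database.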
TW94105374A 2005-02-23 2005-02-23 Chinese opinion retrieval and extraction systems TWI286697B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW94105374A TWI286697B (en) 2005-02-23 2005-02-23 Chinese opinion retrieval and extraction systems

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW94105374A TWI286697B (en) 2005-02-23 2005-02-23 Chinese opinion retrieval and extraction systems

Publications (2)

Publication Number Publication Date
TW200630827A TW200630827A (en) 2006-09-01
TWI286697B TWI286697B (en) 2007-09-11

Family

ID=39459368

Family Applications (1)

Application Number Title Priority Date Filing Date
TW94105374A TWI286697B (en) 2005-02-23 2005-02-23 Chinese opinion retrieval and extraction systems

Country Status (1)

Country Link
TW (1) TWI286697B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI550422B (en) * 2015-04-08 2016-09-21 雲拓科技有限公司 Claim text generalizing method
TWI582627B (en) * 2016-05-13 2017-05-11 國立雲林科技大學 Device and method for analyzing information, application software and computer readable storage medium
TWI639927B (en) 2016-05-27 2018-11-01 雲拓科技有限公司 Method for corresponding element symbols in the specification to the corresponding element terms in claims
TWI598751B (en) 2016-12-05 2017-09-11 雲拓科技有限公司 Automatic claim computerized-translating apparatus
TWI665567B (en) * 2018-09-26 2019-07-11 華碩電腦股份有限公司 Semantic processing method, electronic device, and non-transitory computer readable storage medium
CN110955748B (en) * 2018-09-26 2022-10-28 华硕电脑股份有限公司 Semantic processing method, electronic device and non-transitory computer readable recording medium

Also Published As

Publication number Publication date
TW200630827A (en) 2006-09-01

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees