TW201416887A - Methods for sentimental analysis of news text - Google Patents

Methods for sentimental analysis of news text

Info

Publication number
TW201416887A
Authority
TW
Taiwan
Prior art keywords
vocabulary
emotional
text
clauses
clause
Prior art date
Application number
TW101140206A
Other languages
Chinese (zh)
Other versions
TWI477987B (en
Inventor
Yang-Cheng Lu
Jen-Nan Chen
Sue-Jin Ker
Yu-Chen Wei
Original Assignee
Univ Ming Chuan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Ming Chuan
Priority to TW101140206A (TWI477987B)
Priority to CN201310462920.0A (CN103793371B)
Publication of TW201416887A
Application granted
Publication of TWI477987B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

Methods for analyzing a text are provided. The proposed method includes: decomposing the text into a plurality of sentences, each of which includes at least one clause, each clause including at least one word; analyzing an attribute of each word, wherein the attribute is selected from a group consisting of an optimistic word, a pessimistic word, a non-sentiment word and a negation modifier; accumulating the attributes of all the words in each clause to estimate a sentiment tendency of that clause; and accumulating the sentiment tendencies of the clauses sentence by sentence to calculate an entropy value for each sentiment tendency in the text, so as to determine the sentiment tendency of the text.

Description

Method for analyzing the sentiment tendency of news text

The present invention relates to a method for analyzing the sentiment tendency of news text, and in particular to such a method that uses a finite state automaton and an entropy value.

Research on sentiment analysis of news text in the financial domain has confirmed that the content of financial news often affects stock prices and trading volumes in financial markets, and even a company's future revenue. Such analysis therefore has significant practical value.

In the prior art for sentiment analysis of news text, techniques already exist that use machine learning to automatically judge whether the sentiment tendency of a financial news item is optimistic or pessimistic. However, these techniques require a sentiment language model to be trained and tested, so a considerable amount of historical data must be collected for training, and data such as the word probability distribution of the corpus must be computed in advance. Their application is therefore rather limited, and there is room for improvement.

How to improve the existing techniques so that no sentiment language model has to be trained and tested and no word probability distribution of the corpus has to be computed, thereby improving efficiency of use, is therefore an issue worth further study.

In view of these shortcomings of the prior art, the inventors conceived of an improvement and arrived at the "method for analyzing the sentiment tendency of news text" of the present application.

The main object of the present application is to propose a method for analyzing the sentiment tendency of news text. The method has the following features: no word probability distribution of the corpus needs to be built; taking the clause as the unit, the sentiment tendency of each clause is estimated by a finite state automaton; and the sentiment tendencies of the clauses are integrated and the sentiment tendency of the text is estimated by calculating entropy values. Its advantages are improved efficiency of sentiment analysis of news text and a shorter time to build application modules based on the analysis method.

Another main object of the present application is to provide a method for analyzing the sentiment tendency of a news text, comprising: providing a sentiment lexicon, a negation-modifier lexicon and a finite state automaton; performing sentence splitting and word segmentation on the news text to produce a plurality of sentences, wherein each sentence includes at least one clause and each clause includes at least one word; comparing each word of each clause of the sentences against the sentiment lexicon and the negation-modifier lexicon to label each word as an optimistic word, a pessimistic word, a non-sentiment word or a negation modifier; converting each word into a representative symbol according to the result of the comparison; using the finite state automaton and the representative symbols to infer whether the sentiment tendency of each clause is optimistic, pessimistic or neutral; summing, sentence by sentence, the sentiment tendencies of the clauses contained in each sentence of the news text, and then calculating an entropy value for each sentiment tendency over all the summed sentences; and determining from the entropy values whether the sentiment tendency of the news text is optimistic, pessimistic or neutral.

A further main object of the present application is to provide a method for analyzing the sentiment tendency of a news text, comprising: providing a sentiment lexicon, a negation-modifier lexicon and a finite state automaton; performing sentence splitting and word segmentation on the news text to produce a plurality of sentences, wherein each sentence includes at least one clause and each clause includes at least one word; comparing the words of the sentences against the negation-modifier lexicon and the sentiment lexicon to label each word as an optimistic word, a pessimistic word, a non-sentiment word or a negation modifier; converting each word into a representative symbol according to the result of the comparison; and using the finite state automaton and the representative symbols of the words of each clause to infer the sentiment tendency of each clause.

Another main object of the present application is to provide a method for analyzing a text, comprising: providing a plurality of lexicons and a finite state automaton; analyzing the text to produce a plurality of sentences, each sentence including at least one clause having at least one word; comparing the at least one word against the plurality of lexicons to label an attribute of the at least one word and a code corresponding to the attribute; using the finite state automaton to process the code so as to infer the sentiment tendency of each clause; accumulating the sentiment tendencies sentence by sentence to calculate an entropy value for each sentiment tendency in the text; and determining a sentiment state of the text according to the entropy values.

Yet another main object of the present application is to provide a method for analyzing a text, comprising: decomposing the text into a plurality of sentences, each sentence including at least one clause and each clause including at least one word; analyzing an attribute of the at least one word, wherein the attribute is one selected from the group consisting of an optimistic word, a pessimistic word, a non-sentiment word and a negation modifier; accumulating the attributes of all the words in each clause to infer the sentiment tendency of each clause; and accumulating the sentiment tendencies of the clauses sentence by sentence to calculate an entropy value for each sentiment tendency in the text, so as to determine the sentiment tendency of the text.

To make the above objects, features and advantages of the present invention more readily understood, preferred embodiments are described in detail below with reference to the accompanying drawings.

Figure 1 is a flow chart of a method for analyzing the sentiment tendency of a news text according to a preferred embodiment of the present invention. Figure 1 shows a sentiment lexicon 1, a negation-modifier lexicon 2 and a finite state automaton 3. As shown in Figure 1, the method includes the following steps: (S1) providing a news text; (S2) performing sentence splitting and word segmentation; (S3) converting the words of each clause into sentiment symbols; (S4) determining the sentiment tendency of each clause by means of the finite state automaton; (S5) outputting the clause sentiment tendencies; (S6) accumulating the clause sentiment tendencies sentence by sentence; (S7) calculating the entropy values of the sentiment tendencies of the text; and (S8) estimating the sentiment tendency of the text.

The sentiment lexicon 1 shown in Figure 1 includes a plurality of optimistic words and a plurality of pessimistic words, for example:

◆ Sentiment lexicon

In addition, the negation-modifier lexicon 2 shown in Figure 1 includes a plurality of negation modifiers, for example:

◆ Negation-modifier lexicon
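The lexicon lookup that turns each word into a symbol (+, -, ~ or ?) can be sketched in Python as follows. This is only an illustrative sketch: the word lists are hypothetical stand-ins rather than the patent's lexicon entries, the function names are invented for the example, and checking the negation-modifier lexicon before the sentiment lexicon is an assumption, since the text does not state a precedence.

```python
# Hypothetical mini-lexicons (NOT the patent's actual word lists).
OPTIMISTIC_WORDS = {"成長", "獲利", "上漲"}
PESSIMISTIC_WORDS = {"虧損", "下跌", "衰退"}
NEGATION_MODIFIERS = {"不", "未", "沒有"}


def to_symbol(word: str) -> str:
    """Map one segmented word to its representative symbol."""
    # Assumption: negation modifiers are checked first.
    if word in NEGATION_MODIFIERS:
        return "~"
    if word in OPTIMISTIC_WORDS:
        return "+"
    if word in PESSIMISTIC_WORDS:
        return "-"
    return "?"  # non-sentiment word


def clause_to_symbols(words: list[str]) -> list[str]:
    """Convert an already word-segmented clause into a symbol sequence."""
    return [to_symbol(w) for w in words]


print(clause_to_symbols(["營收", "未", "成長"]))
```

Sets are used so that each lookup is a constant-time membership test; a real lexicon would be loaded from a file rather than hard-coded.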

As described above, the method for analyzing the sentiment tendency of news text proposed according to the concept of the present invention has the following features: 1. no word probability distribution of the corpus needs to be built; 2. taking the clause as the unit, the sentiment tendency of each clause is estimated by a finite state automaton; and 3. the sentiment tendencies of the clauses are integrated and the sentiment tendency of the text is estimated by calculating entropy values.

In addition, the finite state automaton 3 shown in Figure 1 determines the sentiment tendency of each clause by performing state transitions according to the table below. The leftmost column lists the current state of the clause, the top row lists the symbol of the next input word, and each cell gives the next state of the clause. S0 denotes an optimistic sentiment tendency, S1 a pessimistic one, and S2 a neutral one. The entries are those recited in Embodiment 2.

Current state    +     -     ~     ?
S0               S0    S1    S1    S0
S1               S1    S1    S0    S1
S2               S0    S1    S1    S2

1. Input symbols: + denotes an optimistic word; - denotes a pessimistic word; ~ denotes a negation modifier; ? denotes a non-sentiment word.

2. The initial state of the sentiment tendency of each clause is S2.

3. A final state of S0 means the sentiment tendency of the input clause is optimistic; a final state of S1 means it is pessimistic; a final state of S2 means it is neutral.

4. The above matrix values of the finite state automaton are rules obtained by observing randomly sampled texts.
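The automaton described by these notes can be written as a small lookup table. The sketch below is a minimal Python illustration: the transitions are the ones recited in Embodiment 2, while the names `TRANSITIONS` and `clause_tendency` are invented for the example.

```python
# States: S0 = optimistic, S1 = pessimistic, S2 = neutral.
S0, S1, S2 = "optimistic", "pessimistic", "neutral"

# TRANSITIONS[current_state][input_symbol] -> next_state.
# Symbols: "+" optimistic word, "-" pessimistic word,
#          "~" negation modifier, "?" non-sentiment word.
TRANSITIONS = {
    S0: {"+": S0, "-": S1, "~": S1, "?": S0},
    S1: {"+": S1, "-": S1, "~": S0, "?": S1},
    S2: {"+": S0, "-": S1, "~": S1, "?": S2},
}


def clause_tendency(symbols):
    """Run the automaton over one clause's symbols; the start state is S2."""
    state = S2
    for symbol in symbols:
        state = TRANSITIONS[state][symbol]
    return state


print(clause_tendency(["?", "+"]))  # a clause containing one optimistic word
print(clause_tendency(["-", "~"]))  # a pessimistic word followed by a negation
```

Because the final state is itself the clause's sentiment tendency, no separate accept/reject step is needed; a clause with only non-sentiment words stays in S2 and is reported as neutral.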

Two examples of analyzing a news item with the proposed method are given below. The method takes a sentiment-bearing news item as a sample, segments the text into words with a word segmentation system, applies the finite state automaton 3 shown in Figure 1, and finally uses entropy values to compute the sentiment tendency of the news text. The procedure is as follows:

◆ Determining the sentiment tendency involves the following steps:

1. Build the feature-word database.

2. Use the finite state automaton to convert the sentiment states of the words into the sentiment state of each clause.

3. Count the sentiment classes of the clauses in each sentence, and compute a statistic for the three classes by means of entropy values.

I. Example of a news text with predominantly negative sentiment (the text below has already been word-segmented)

The following news item contains four sentences including the headline, with 1, 4, 5 and 4 clauses respectively, as shown in (Table 1). The words of each clause are converted into sentiment symbols according to the sentiment lexicon and the negation-modifier lexicon, and the finite state automaton outputs the sentiment tendency of each clause, as shown in (Table 2). Based on (Table 2), the clause sentiment tendencies are accumulated sentence by sentence, and the sentiment tendency of the article is then estimated by entropy, as detailed in (Table 3):

Entropy weight calculation steps

◆ Step 1: Normalize the closeness d_ij of each clause sentiment tendency X_ij in the matrix table.

◆ Step 2: Convert d_ij into an occurrence probability P_ij.

◆ Step 3: Calculate the entropy value e_j of each criterion from P_ij.

Here m is the number of sentences and n is the number of sentiment tendencies; the sentiment tendencies are positive (+), negative (-) and undetermined (?).
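The three steps above can be sketched in Python as follows. This is a sketch under stated assumptions, not the patent's exact computation: the normalization of Step 1 is not spelled out in this text, so the code takes d_ij to be the raw clause count and sets P_ij = X_ij / (sum of X_ij over all sentences i); the entropy is computed as e_j = -k * sum over i of P_ij * ln(P_ij) with k = 1/ln(m), where k is as defined in Embodiment 3. The function name and the sample counts are illustrative and are not taken from the patent's tables.

```python
import math


def entropy_per_tendency(counts):
    """counts[i][j] = number of clauses in sentence i with tendency j.

    Returns one entropy value e_j per tendency (column), scaled to [0, 1].
    """
    m = len(counts)  # number of sentences
    if m < 2:
        raise ValueError("need at least two sentences, since k = 1/ln(m)")
    n = len(counts[0])  # number of sentiment tendencies
    k = 1.0 / math.log(m)
    entropies = []
    for j in range(n):
        column = [row[j] for row in counts]
        total = sum(column)
        e = 0.0
        if total > 0:
            for x in column:
                if x > 0:  # terms with p = 0 contribute nothing
                    p = x / total  # assumption: P_ij is the column share
                    e -= p * math.log(p)
        entropies.append(k * e)
    return entropies


# Illustrative counts; columns are (+, -, ?), rows are sentences.
sample = [
    [1, 0, 2],
    [1, 0, 1],
    [1, 3, 0],
]
print(entropy_per_tendency(sample))
```

With this scaling, a tendency spread evenly over all sentences has entropy 1, and a tendency confined to a single sentence has entropy 0.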

Evaluation value = (entropy(+) - entropy(-)) / (entropy(+) + entropy(-)) = (0.4591 - 0.7595) / (0.4591 + 0.7595) = -0.2465

Sentiment thresholds: the thresholds on the sentiment tendency can be set by the user. For example, with an optimistic threshold of 0.1, a news item whose evaluation value is ≥ 0.1 is judged to be optimistic news; with a pessimistic threshold of -0.1, a news item whose evaluation value is ≤ -0.1 is judged to be pessimistic news. Since the evaluation value of the above news text is -0.2465 ≤ -0.1, its sentiment tendency is judged to be pessimistic; that is, it is a pessimistic news item.
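The evaluation value and the threshold test can be reproduced with a few lines of Python. The sketch below uses the entropy values and the example thresholds (0.1 and -0.1) quoted in the text; the function names are invented for illustration, and two behaviors are assumptions not stated in the text: a score strictly between the two thresholds is reported as neutral, and a zero denominator yields a score of 0.

```python
def evaluation_value(entropy_pos: float, entropy_neg: float) -> float:
    """(entropy(+) - entropy(-)) / (entropy(+) + entropy(-))."""
    denominator = entropy_pos + entropy_neg
    if denominator == 0:
        return 0.0  # assumption: no sentiment evidence at all
    return (entropy_pos - entropy_neg) / denominator


def classify(score: float,
             optimistic_threshold: float = 0.1,
             pessimistic_threshold: float = -0.1) -> str:
    """Apply the user-set thresholds to an evaluation value."""
    if score >= optimistic_threshold:
        return "optimistic"
    if score <= pessimistic_threshold:
        return "pessimistic"
    return "neutral"  # assumption: in-between scores are neutral


score = evaluation_value(0.4591, 0.7595)
print(round(score, 4), classify(score))  # -0.2465 pessimistic
```

Keeping the thresholds as parameters mirrors the text's statement that they are user-configurable.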

After the entropy calculation, the sentiment tendency of the news text can be estimated. The same judgment logic applies to news texts other than financial news, such as political or international news, so the proposed sentiment analysis method can be used to estimate the sentiment tendency of large volumes of text.

II. Example of a news text with predominantly positive sentiment (the text below has already been word-segmented)

The above news item contains seven sentences including the headline, with 1, 3, 6, 7, 2, 2 and 4 clauses respectively, as shown in (Table 11):

The words of each clause are converted into sentiment symbols according to the sentiment lexicon and the negation-modifier lexicon, and the finite state automaton outputs the sentiment tendency of each clause, as shown in (Table 12):

Based on (Table 12), the clause sentiment tendencies are accumulated sentence by sentence, and the sentiment tendency of the article is then estimated by entropy, as detailed below:

Entropy weight calculation steps

◆ Step 1: Normalize the closeness d_ij of each clause sentiment tendency X_ij in the matrix table.

◆ Step 2: Convert d_ij into an occurrence probability P_ij.

◆ Step 3: Calculate the entropy value e_j of each criterion from P_ij.

Here m is the number of sentences and n is the number of sentiment tendencies; the sentiment tendencies are positive (+), negative (-) and undetermined (?).

Evaluation value = (entropy(+) - entropy(-)) / (entropy(+) + entropy(-)) = (0.9010 - 0.4360) / (0.9010 + 0.4360) = 0.35

Since the evaluation value of the above news text is 0.35 ≥ 0.1, its sentiment tendency is judged to be optimistic; that is, it is an optimistic news item.

III. Empirical accuracy of the method for analyzing the sentiment tendency of news text proposed according to the concept of the present invention:

(1) Determining the correct sentiment tendency

Five people manually judged 30 sentiment-bearing news items, and the "correct sentiment tendency" of each item was decided by majority vote. The results are shown in (Table 21):

1. For "News 3", four judges considered the sentiment tendency of the article positive and one considered it negative. By majority vote, the correct sentiment tendency of this article is positive.

2. For "News 8", four judges considered the sentiment tendency of the article positive and one considered it negative. By majority vote, the correct sentiment tendency of this article is positive.

3. For "News 28", one judge considered the sentiment tendency of the article positive and four considered it negative. By majority vote, the correct sentiment tendency of this article is negative.

(2) Accuracy of manual judgment

Five randomly sampled people made the judgments; whenever their opinions on an item did not agree, the item was counted as a manual judgment error. The sample contains 30 news items, of which three ("News 3", "News 8" and "News 28") received inconsistent manual judgments. Hence (30 - 3) / 30 = 0.9, giving a manual judgment accuracy of 90%; the average time taken was 18.6 minutes.
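The majority vote and the 90% figure can be illustrated with a short Python sketch. The three split votes below follow the description of "News 3", "News 8" and "News 28"; the labels of the other 27 items are not given in this text, so they are filled with a placeholder unanimous vote purely so that the arithmetic can run. The function names are invented for the example.

```python
from collections import Counter


def majority_label(votes):
    """Return the most frequent label among one item's votes."""
    return Counter(votes).most_common(1)[0][0]


def unanimous_rate(all_votes):
    """Share of items on which every judge gave the same label."""
    unanimous = sum(1 for votes in all_votes if len(set(votes)) == 1)
    return unanimous / len(all_votes)


# Placeholder data: 27 unanimous items (labels assumed), plus the three
# split items described in the text.
panel = [["+"] * 5 for _ in range(27)]
panel.append(["+", "+", "+", "+", "-"])  # News 3
panel.append(["+", "+", "+", "+", "-"])  # News 8
panel.append(["+", "-", "-", "-", "-"])  # News 28

print(majority_label(panel[27]), majority_label(panel[29]))
print(unanimous_rate(panel))
```

With five judges and two labels a tie cannot occur; with more labels, `most_common` would break a tie by first occurrence, which a real evaluation would need to handle explicitly.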

(3) Accuracy of machine judgment

The machine's judgments were compared against the correct sentiment tendencies; the results are shown in (Table 22). The above analysis shows that, for a sample of 30 news items, manual judgment achieved an accuracy of 90% and took 18.6 minutes on average, whereas the method proposed according to the concept of the present invention, run by machine, achieved an accuracy of 83.3% and took only 5.1 seconds. Machine judgment of the sentiment tendency of text with the proposed method thus attains a relatively high accuracy in a relatively short time. With the proposed method, the machine can first estimate the sentiment tendency of news texts and humans can then check the correctness, which greatly reduces the manpower and time required and keeps the consistency of quality under control. The proposed method for analyzing the sentiment tendency of news text therefore does have its advantages.

Embodiments:

1. A method for analyzing the sentiment tendency of a news text, comprising: providing a sentiment lexicon, a negation-modifier lexicon and a finite state automaton; performing sentence splitting and word segmentation on the news text to produce a plurality of sentences, wherein each sentence includes at least one clause and each clause includes at least one word; comparing each word of each clause of the sentences against the sentiment lexicon and the negation-modifier lexicon to label each word as an optimistic word, a pessimistic word, a non-sentiment word or a negation modifier; converting each word into a representative symbol according to the result of the comparison; using the finite state automaton and the representative symbols to infer whether the sentiment tendency of each clause is optimistic, pessimistic or neutral; summing, sentence by sentence, the sentiment tendencies of the clauses contained in each sentence of the news text, and then calculating an entropy value for each sentiment tendency over all the summed sentences; and determining from the entropy values whether the sentiment tendency of the news text is optimistic, pessimistic or neutral.

2. The method of embodiment 1, wherein the sentiment tendency of each clause is determined as follows: starting from a current state of the clause, when a next word of the clause is added, the finite state automaton transitions to a next state; before a further next word is added, that next state replaces the previous current state and becomes the current state; and this cycle repeats until all of the clauses have been determined; when the current state of the clause is optimistic and the next word is an optimistic word, a pessimistic word, a negation modifier or a non-sentiment word, the next state of the clause after the next word is added is optimistic, pessimistic, pessimistic or optimistic, respectively; when the current state of the clause is pessimistic and the next word is an optimistic word, a pessimistic word, a negation modifier or a non-sentiment word, the next state of the clause after the next word is added is pessimistic, pessimistic, optimistic or pessimistic, respectively; when the current state of the clause is neutral and the next word is an optimistic word, a pessimistic word, a negation modifier or a non-sentiment word, the next state of the clause after the next word is added is optimistic, pessimistic, pessimistic or neutral, respectively; the initial state of the sentiment tendency of each clause is neutral; when the final state of a particular clause is optimistic, the sentiment tendency of that clause is optimistic; when the final state of the particular clause is pessimistic, the sentiment tendency of that clause is pessimistic; and when the final state of the particular clause is neutral, the sentiment tendency of that clause is neutral.

3. The method of embodiment 1 or 2, wherein the step of summing sentence by sentence further comprises the following steps:

normalizing the frequency of occurrence of each sentiment tendency within the news text, and converting the normalized frequency of occurrence into an occurrence probability p_ij; calculating from p_ij an entropy value e_j of each sentiment tendency:

e_j = -k \sum_{i=1}^{m} p_{ij} \ln p_{ij}    Formula (1)

where k = 1/ln(m); i = 1, 2, 3, ..., m, with m the total number of sentences; j = 1, 2, 3, ..., n, with n the total number of sentiment tendencies; and calculating an evaluation value:

evaluation value = (entropy(+) - entropy(-)) / (entropy(+) + entropy(-))    Formula (2)

where entropy(+) is the entropy value for the optimistic sentiment tendency and entropy(-) is the entropy value for the pessimistic sentiment tendency; when the evaluation value is greater than a first threshold, the sentiment tendency of the news text is optimistic, and when the evaluation value is less than a second threshold, the sentiment tendency of the news text is pessimistic.

4. The method of any of embodiments 1-3, wherein the first threshold is a positive real number and the second threshold is a negative real number.

5. The method of any of embodiments 1-4, wherein the news text is one selected from the group consisting of financial news, political news, international news and combinations thereof; each sentence in the news text is separated from the other sentences by a period; and each clause in a sentence is separated from the other clauses of that sentence by a comma or a semicolon.
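The punctuation-based splitting described in embodiment 5 can be sketched in Python as follows. This is a minimal sketch that assumes Chinese full-width punctuation (。 for the period, , and ; for the comma and semicolon); the function names are invented for illustration, and word segmentation within each clause is a separate step that is not shown here.

```python
import re

SENTENCE_END = "。"  # full-width period (assumption: Chinese news text)
CLAUSE_SEPARATORS = re.compile(r"[,;]")  # full-width comma / semicolon


def split_sentences(text: str) -> list[str]:
    """Split a text into sentences at each full-width period."""
    return [s.strip() for s in text.split(SENTENCE_END) if s.strip()]


def split_clauses(sentence: str) -> list[str]:
    """Split one sentence into clauses at commas and semicolons."""
    return [c.strip() for c in CLAUSE_SEPARATORS.split(sentence) if c.strip()]


def split_text(text: str) -> list[list[str]]:
    """Return the text as a list of sentences, each a list of clauses."""
    return [split_clauses(s) for s in split_sentences(text)]


print(split_text("營收成長,獲利上升。股價下跌;投資人觀望。"))
```

A headline that carries no closing period would still come out as its own sentence only if it is passed in separately or terminated before splitting; the examples in the description count the headline as one sentence.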

6. A method for analyzing the sentiment tendency of a news text, comprising: providing a sentiment lexicon, a negation-modifier lexicon and a finite state automaton; performing sentence splitting and word segmentation on the news text to produce a plurality of sentences, wherein each sentence includes at least one clause and each clause includes at least one word; comparing the words of the sentences against the negation-modifier lexicon and the sentiment lexicon to label each word as an optimistic word, a pessimistic word, a non-sentiment word or a negation modifier; converting each word into a representative symbol according to the result of the comparison; and using the finite state automaton and the representative symbols of the words of each clause to infer the sentiment tendency of each clause.

7. The method of embodiment 6, further comprising: after summing the sentiment tendencies of the clauses contained in each sentence of the news text, calculating an entropy value for each sentiment tendency in the news text, so as to determine whether the sentiment tendency of the news text is optimistic, pessimistic or neutral.

8.一種分析一文本之方法,包含:提供複數詞彙庫及一有限狀態自動機;分析該文本以產生複數個句子,各該句子包含具有至少一詞彙之至少一分句;將該至少一詞彙與該複數詞彙庫進行比對,以標示該至少一詞彙之一屬 性及對應該屬性之一代號;使用該有限狀態自動機比對該代號,以推算各該分句之一情緒傾向;以各該句子為單位累加該等情緒傾向而計算出該文本中各該情緒傾向之一熵(entropy)值;以及依據該熵值以決定該文本之一情緒狀態。 8. A method of analyzing a text, comprising: providing a complex vocabulary library and a finite state automaton; analyzing the text to generate a plurality of sentences, each sentence comprising at least one clause having at least one vocabulary; the at least one vocabulary Comparing with the plural vocabulary to indicate one of the at least one vocabulary One of the attributes and the corresponding attribute; use the finite state automaton to compare the code to estimate the emotional tendency of each of the clauses; calculate the emotional tendency by adding the emotional tendency to each sentence, and calculate the An entropy value of an emotional tendency; and determining an emotional state of the text based on the entropy value.

9. The method according to embodiment 8, wherein the text is a news text, the plurality of vocabulary libraries comprise an emotional vocabulary library and a negation modifier vocabulary library, the emotional vocabulary library comprises a plurality of optimistic words and a plurality of pessimistic words, the negation modifier vocabulary library comprises a plurality of negation modifiers, and each emotional tendency of the text is optimistic, pessimistic, or neutral.

10. A method for analyzing a text, comprising: decomposing the text into a plurality of sentences, each sentence comprising at least one clause and each clause comprising at least one word; analyzing an attribute of the at least one word, wherein the attribute is one selected from the group consisting of an optimistic word, a pessimistic word, a non-emotional word, and a negation modifier; accumulating the attributes of all the words in each clause to estimate an emotional tendency of each clause; and accumulating the emotional tendencies of the clauses sentence by sentence to calculate an entropy value of each emotional tendency in the text, to determine an emotional tendency of the text.
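As an illustration of the segmentation and labeling steps shared by embodiments 6 to 10, the following Python sketch splits a text into sentences and clauses and converts each word into a representative symbol. It is a minimal sketch under stated assumptions, not the implementation described in this patent: the toy English lexicons, the one-letter symbols P, N, X, and O, and the whitespace tokenization are all hypothetical (Chinese news text would need a real word segmenter), and the delimiters follow claim 5 (periods between sentences, commas or semicolons between clauses).

```python
import re

# Hypothetical toy lexicons; the actual vocabulary libraries are not given in this text.
OPTIMISTIC = {"growth", "gain", "recovery"}
PESSIMISTIC = {"loss", "decline", "risk"}
NEGATION = {"not", "no", "never"}

# Hypothetical one-letter representative symbols.
SYM_OPT, SYM_PES, SYM_NEG, SYM_NON = "P", "N", "X", "O"


def split_clauses(text):
    """Split text into sentences (on periods), then clauses (on commas/semicolons).

    Returns a list of sentences, each a list of clause strings.
    """
    sentences = [s for s in re.split(r"[.。]", text) if s.strip()]
    return [
        [c.strip() for c in re.split(r"[,;,;]", s) if c.strip()]
        for s in sentences
    ]


def to_symbols(clause):
    """Map each whitespace-separated word of a clause to its representative symbol."""
    symbols = []
    for word in clause.lower().split():
        if word in NEGATION:
            symbols.append(SYM_NEG)
        elif word in OPTIMISTIC:
            symbols.append(SYM_OPT)
        elif word in PESSIMISTIC:
            symbols.append(SYM_PES)
        else:
            symbols.append(SYM_NON)
    return "".join(symbols)
```

With these toy lexicons, `to_symbols("not loss")` yields `"XN"`. Splitting on every period is a simplification that would also break decimal numbers, so a production splitter would need a more careful rule.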

In summary, the present invention provides a method for analyzing the emotional tendency of a news text. The method does not require building a word probability distribution from a corpus. It takes the clause as its unit and estimates the emotional tendency of each clause with a finite state automaton, then integrates the clause tendencies and estimates the emotional tendency of the text by calculating entropy values. The method thereby improves the efficiency of analyzing the emotional tendency of news text and shortens the time needed to build application modules that rely on such analysis, and it therefore possesses both inventive step and novelty.
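The clause-level automaton can be sketched as a lookup table. The transitions below follow the rules recited in claim 2 (the initial state is neutral, and the final state is the clause's tendency); the symbols P (optimistic word), N (pessimistic word), X (negation modifier), and O (non-emotional word) are hypothetical names chosen for this illustration, and the code is a sketch rather than the patent's own implementation.

```python
# States of the automaton: the clause's tendency after the words read so far.
OPT, PES, NEU = "optimistic", "pessimistic", "neutral"

# TRANSITIONS[current_state][symbol] -> next_state, per the rules in claim 2.
# Symbols (hypothetical): P = optimistic word, N = pessimistic word,
# X = negation modifier, O = non-emotional word.
TRANSITIONS = {
    OPT: {"P": OPT, "N": PES, "X": PES, "O": OPT},
    PES: {"P": PES, "N": PES, "X": OPT, "O": PES},
    NEU: {"P": OPT, "N": PES, "X": PES, "O": NEU},
}


def clause_tendency(symbols):
    """Run the automaton over a clause's symbol string and return the final state."""
    state = NEU  # every clause starts in the neutral state
    for symbol in symbols:
        state = TRANSITIONS[state][symbol]
    return state
```

For instance, `clause_tendency("PX")` moves from neutral to optimistic and then to pessimistic, while a clause containing only non-emotional words stays neutral. Because the table needs no trained probabilities, adding or correcting a rule is a one-entry change.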

Accordingly, although the present invention has been described in detail through the above embodiments and may be modified in various ways by those skilled in the art, none of such modifications departs from the scope of protection sought by the appended claims.

1‧‧‧Emotional vocabulary library

2‧‧‧Negation modifier vocabulary library

3‧‧‧Finite state automaton

Fig. 1: a flowchart of a method for analyzing an emotional tendency of a news text according to a preferred embodiment of the present invention.

Claims (10)

1. A method for analyzing an emotional tendency of a news text, comprising: providing an emotional vocabulary library, a negation modifier vocabulary library, and a finite state automaton; performing sentence and word segmentation on the news text to produce a plurality of sentences, wherein each sentence comprises at least one clause and each clause comprises at least one word; comparing each word of the plurality of sentences and their clauses against the emotional vocabulary library and the negation modifier vocabulary library, to label each word as an optimistic word, a pessimistic word, a non-emotional word, or a negation modifier; converting each word into a representative symbol according to the result of the comparison; using the finite state automaton and the representative symbols to estimate whether the emotional tendency of each clause is optimistic, pessimistic, or neutral; summing, sentence by sentence, the emotional tendencies of the clauses contained in each sentence of the news text, and calculating therefrom an entropy value of each emotional tendency over all the summed sentences; and determining, according to the entropy values, whether the emotional tendency of the news text is optimistic, pessimistic, or neutral.

2. The method of claim 1, wherein the emotional tendency of each clause is determined as follows: starting from a current state of the clause, after a next word of the clause is added, the finite state automaton transitions to a next state; before a further next word is added, the next state replaces the current state and becomes the current state; this cycle repeats until every clause has been determined; when the current state is optimistic and the next word is an optimistic word, a pessimistic word, a negation modifier, or a non-emotional word, the next state becomes optimistic, pessimistic, pessimistic, or optimistic, respectively; when the current state is pessimistic and the next word is an optimistic word, a pessimistic word, a negation modifier, or a non-emotional word, the next state becomes pessimistic, pessimistic, optimistic, or pessimistic, respectively; when the current state is neutral and the next word is an optimistic word, a pessimistic word, a negation modifier, or a non-emotional word, the next state becomes optimistic, pessimistic, pessimistic, or neutral, respectively; the initial state of each clause is neutral; when the final state of a given clause is optimistic, the emotional tendency of that clause is optimistic; when the final state is pessimistic, the emotional tendency of that clause is pessimistic; and when the final state is neutral, the emotional tendency of that clause is neutral.

3. The method of claim 1, wherein the step of summing sentence by sentence further comprises: normalizing the frequency of occurrence of each emotional tendency in the news text, and converting the normalized frequency into an occurrence probability $p_{ij}$; calculating from $p_{ij}$ an entropy value of each emotional tendency, $e_j = -k \sum_{i=1}^{m} p_{ij} \ln p_{ij}$ (1), where $k = 1/\ln(m)$, $i = 1, 2, 3, \ldots, m$, $m$ is the total number of the sentences, $j = 1, 2, 3, \ldots, n$, and $n$ is the total number of the emotional tendencies; and computing an evaluation value $= (e_{+} - e_{-})/(e_{+} + e_{-})$ (2), where $e_{+}$ is the entropy value when the emotional tendency is optimistic and $e_{-}$ is the entropy value when the emotional tendency is pessimistic; when the evaluation value is greater than a first threshold, the emotional tendency of the news text is optimistic, and when the evaluation value is less than a second threshold, the emotional tendency of the news text is pessimistic.

4. The method of claim 3, wherein the first threshold is a positive real number and the second threshold is a negative real number.

5. The method of claim 1, wherein the news text is one selected from the group consisting of financial news, political news, international news, and combinations thereof; the sentences in the news text are separated from one another by periods; and the clauses in each sentence are separated from one another by commas or semicolons.

6. A method for analyzing an emotional tendency of a news text, comprising: providing an emotional vocabulary library, a negation modifier vocabulary library, and a finite state automaton; performing sentence and word segmentation on the news text to produce a plurality of sentences, wherein each sentence comprises at least one clause and each clause comprises at least one word; comparing the words of the plurality of sentences against the negation modifier vocabulary library and the emotional vocabulary library, to label each word as an optimistic word, a pessimistic word, a non-emotional word, or a negation modifier; converting each word into a representative symbol according to the result of the comparison; and using the finite state automaton and the representative symbols of the words of each clause to estimate an emotional tendency of each clause.

7. The method of claim 6, further comprising: after summing the emotional tendencies of the clauses contained in each sentence of the news text, calculating an entropy value of each emotional tendency in the news text, to determine whether the emotional tendency of the news text is optimistic, pessimistic, or neutral.

8. A method for analyzing a text, comprising: providing a plurality of vocabulary libraries and a finite state automaton; analyzing the text to produce a plurality of sentences, each sentence comprising at least one clause having at least one word; comparing the at least one word against the plurality of vocabulary libraries, to label an attribute of the at least one word and a code corresponding to the attribute; matching the codes with the finite state automaton to estimate an emotional tendency of each clause; accumulating the emotional tendencies sentence by sentence to calculate an entropy value of each emotional tendency in the text; and determining an emotional state of the text according to the entropy values.

9. The method of claim 8, wherein the text is a news text, the plurality of vocabulary libraries comprise an emotional vocabulary library and a negation modifier vocabulary library, the emotional vocabulary library comprises a plurality of optimistic words and a plurality of pessimistic words, the negation modifier vocabulary library comprises a plurality of negation modifiers, and each emotional tendency of the text is optimistic, pessimistic, or neutral.

10. A method for analyzing a text, comprising: decomposing the text into a plurality of sentences, each sentence comprising at least one clause and each clause comprising at least one word; analyzing an attribute of the at least one word, wherein the attribute is one selected from the group consisting of an optimistic word, a pessimistic word, a non-emotional word, and a negation modifier; accumulating the attributes of all the words in each clause to estimate an emotional tendency of each clause; and accumulating the emotional tendencies of the clauses sentence by sentence to calculate an entropy value of each emotional tendency in the text, to determine an emotional tendency of the text.
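The text-level aggregation recited in claims 3 and 4 can be illustrated with a short Python sketch. Several details are assumptions made for this example rather than statements of the patented method: the entropy is taken as the Shannon form scaled by k = 1/ln(m), each probability is a sentence's clause count divided by the column total, zero counts contribute nothing to the sum, degenerate inputs (a single sentence, an empty column, or a zero denominator) fall back to 0, and the thresholds 0.1 and -0.1 are hypothetical values that merely satisfy the positive and negative requirement of claim 4.

```python
import math


def column_entropy(counts):
    """Entropy of one tendency across m sentences, scaled by k = 1/ln(m).

    counts[i] is how many clauses of sentence i carry this tendency.
    Assumption: returns 0.0 when m < 2 or the tendency never occurs.
    """
    m = len(counts)
    total = sum(counts)
    if m < 2 or total == 0:
        return 0.0
    k = 1.0 / math.log(m)
    probs = [c / total for c in counts]  # assumed normalization to p_ij
    return -k * sum(p * math.log(p) for p in probs if p > 0)


def evaluation_value(sentence_counts):
    """(e_plus - e_minus) / (e_plus + e_minus) for (optimistic, pessimistic) counts."""
    e_plus = column_entropy([pos for pos, _ in sentence_counts])
    e_minus = column_entropy([neg for _, neg in sentence_counts])
    denom = e_plus + e_minus
    return 0.0 if denom == 0 else (e_plus - e_minus) / denom


def text_tendency(sentence_counts, first_threshold=0.1, second_threshold=-0.1):
    """Classify a text from per-sentence clause counts (thresholds are hypothetical)."""
    value = evaluation_value(sentence_counts)
    if value > first_threshold:
        return "optimistic"
    if value < second_threshold:
        return "pessimistic"
    return "neutral"
```

Each element of `sentence_counts` is a pair giving the number of optimistic and pessimistic clauses in one sentence. Under this formula a tendency scores higher when it is spread evenly across sentences than when it is concentrated in one, so the evaluation value reflects distribution as well as raw counts.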
TW101140206A 2012-10-30 2012-10-30 Methods for sentimental analysis of news text TWI477987B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
TW101140206A TWI477987B (en) 2012-10-30 2012-10-30 Methods for sentimental analysis of news text
CN201310462920.0A CN103793371B (en) 2012-10-30 2013-09-30 News text emotional tendency analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW101140206A TWI477987B (en) 2012-10-30 2012-10-30 Methods for sentimental analysis of news text

Publications (2)

Publication Number Publication Date
TW201416887A (en) 2014-05-01
TWI477987B TWI477987B (en) 2015-03-21

Family

ID=50669057

Family Applications (1)

Application Number Title Priority Date Filing Date
TW101140206A TWI477987B (en) 2012-10-30 2012-10-30 Methods for sentimental analysis of news text

Country Status (2)

Country Link
CN (1) CN103793371B (en)
TW (1) TWI477987B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104504032A (en) * 2014-12-12 2015-04-08 北京智谷睿拓技术服务有限公司 Method and equipment for providing service upon user emotion tendencies
CN104503959A (en) * 2014-12-12 2015-04-08 北京智谷睿拓技术服务有限公司 Method and equipment for predicting user emotion tendency
TWI661319B (en) * 2017-11-30 2019-06-01 財團法人資訊工業策進會 Apparatus, method, and computer program product thereof for generatiing control instructions based on text

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016085409A1 (en) * 2014-11-24 2016-06-02 Agency For Science, Technology And Research A method and system for sentiment classification and emotion classification
CN105893582B (en) * 2016-04-01 2019-06-28 深圳市未来媒体技术研究院 A kind of social network user mood method of discrimination
TWI587156B (en) * 2016-07-25 2017-06-11 元智大學 System and method for evaluating the rating of overall text

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080249764A1 (en) * 2007-03-01 2008-10-09 Microsoft Corporation Smart Sentiment Classifier for Product Reviews
US7996210B2 (en) * 2007-04-24 2011-08-09 The Research Foundation Of The State University Of New York Large-scale sentiment analysis
CN101408883B (en) * 2008-11-24 2010-09-01 电子科技大学 Method for collecting network public feelings viewpoint
CN101639824A (en) * 2009-08-27 2010-02-03 北京理工大学 Text filtering method based on emotional orientation analysis against malicious information
TW201137632A (en) * 2010-04-22 2011-11-01 Univ Nat Taiwan Document analyzing system and document analyzing method thereof in reader and writer emotion analysis
CN102663046A (en) * 2012-03-29 2012-09-12 中国科学院自动化研究所 Sentiment analysis method oriented to micro-blog short text

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104504032A (en) * 2014-12-12 2015-04-08 北京智谷睿拓技术服务有限公司 Method and equipment for providing service upon user emotion tendencies
CN104503959A (en) * 2014-12-12 2015-04-08 北京智谷睿拓技术服务有限公司 Method and equipment for predicting user emotion tendency
CN104504032B (en) * 2014-12-12 2019-03-01 北京智谷睿拓技术服务有限公司 The method and apparatus for being inclined to the service of offer based on user feeling
TWI661319B (en) * 2017-11-30 2019-06-01 財團法人資訊工業策進會 Apparatus, method, and computer program product thereof for generatiing control instructions based on text
US10460731B2 (en) 2017-11-30 2019-10-29 Institute For Information Industry Apparatus, method, and non-transitory computer readable storage medium thereof for generating control instructions based on text

Also Published As

Publication number Publication date
TWI477987B (en) 2015-03-21
CN103793371B (en) 2016-06-01
CN103793371A (en) 2014-05-14
