TW201021024A - Method for classifying speech emotion and method for establishing emotional semantic model thereof - Google Patents

Method for classifying speech emotion and method for establishing emotional semantic model thereof

Info

Publication number
TW201021024A
TW201021024A
Authority
TW
Taiwan
Prior art keywords
semantic
emotional
rhythm
attribute
prosody
Prior art date
Application number
TW97144755A
Other languages
Chinese (zh)
Other versions
TWI389100B (en)
Inventor
Chung-Hsien Wu
Wei-Chuan Lee
Red-Tom Lin
Chin-Shun Hsu
Chia-Te Chu
Original Assignee
Inst Information Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inst Information Industry filed Critical Inst Information Industry
Priority to TW97144755A priority Critical patent/TWI389100B/en
Publication of TW201021024A publication Critical patent/TW201021024A/en
Application granted granted Critical
Publication of TWI389100B publication Critical patent/TWI389100B/en

Links

Abstract

A method for classifying a speech emotion and a method for establishing an emotional semantic model thereof are provided. First, the emotional semantic model is established from the speech signals in an emotional corpus, each carrying a semantic attribute and a prosodic attribute. Next, the semantic attribute and the prosodic attribute of each test word in a test speech signal are extracted. Afterward, the semantic attribute and the prosodic attribute of each test word are input into the emotional semantic model so as to classify the test speech signal into a corresponding emotional category.

Description

VI. Description of the Invention

[Technical Field of the Invention]

The present invention relates to an emotion recognition method, and more particularly to a method for classifying speech emotion that combines semantics with prosody, and to a method for establishing the emotional semantic model used by it.

[Prior Art]

In recent years, with the rapid advance of technology, the interaction between people and intelligent electronic devices can no longer be satisfied by the old pattern in which commands are typed into a device and the device answers with text. In the future, the human-machine interface between humans and intelligent electronic devices will therefore be controlled through speech, the most natural and convenient medium of communication. To make human-machine interface systems more versatile and more humane, many researchers and vendors have begun to study emotion recognition.

Take customer service systems as an example. Shopping by television and over the Internet is ever more common, and when a product fails, most users call the customer service center. If the customer service system could recognize the caller's current emotional state, a service representative could soothe the caller early on. The representative could also judge, from the recognized emotion, whether the matter can be resolved personally, and decide whether to transfer the call to a senior representative for appeasement. Many unnecessary conflicts could be avoided in this way. Accordingly, improving the accuracy of emotion recognition is an important part of current research.

[Summary of the Invention]

The present invention provides a method for establishing an emotional semantic model, which constructs the model from semantic attributes and prosodic attributes. The present invention also provides a method for classifying speech emotion, which analyzes semantic attributes and combines them with prosodic attributes so as to improve classification accuracy.

The proposed method for establishing an emotional semantic model is as follows. An emotional corpus is provided, which includes a plurality of speech signals belonging to a plurality of emotion categories. A semantic attribute and a prosodic attribute of each of a plurality of words in the speech signals are extracted, where the semantic attribute is obtained by querying a lexical knowledge base and the prosodic attribute is extracted from each speech signal. The emotional semantic model is then established from the semantic and prosodic attributes of each speech signal.
In an embodiment of the invention, establishing the emotional semantic model comprises converting each speech signal into a semantic prosody vector according to its semantic and prosodic attributes, and then substituting the semantic prosody vectors into a Gaussian mixture model to establish the emotional semantic model.

In an embodiment, converting a speech signal into a semantic prosody vector comprises obtaining a semantic prosody record from the semantic and prosodic attributes, mining emotion rules from the semantic prosody records, and converting each record into a semantic prosody vector according to the emotion rules.

In an embodiment, obtaining the semantic prosody record comprises determining from its semantic attribute whether each word belongs to a semantic label, the semantic labels being defined from the lexical knowledge base. When a word belongs to a semantic label, the label is combined with the corresponding prosodic attribute into a semantic prosody label, which is recorded into the semantic prosody record. It is further determined whether the words include an emotional feature word; such a word is combined with its corresponding prosodic attribute into a feature set, and the feature set is also recorded into the semantic prosody record.
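The model building just described (per-category semantic prosody vectors fitted with a Gaussian mixture model) can be pictured with a short sketch. This is an illustration under stated assumptions, not the patent's implementation: the vectors are assumed precomputed, and scikit-learn's GaussianMixture stands in for the GMM named in the text.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def build_emotional_semantic_models(vectors_by_emotion, n_components=4):
    """vectors_by_emotion: dict mapping an emotion name to an (N, D) array
    of semantic prosody vectors mined from the emotional corpus."""
    models = {}
    for emotion, vectors in vectors_by_emotion.items():
        # One GMM per emotion category, fitted on that category's vectors.
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=0)
        gmm.fit(np.asarray(vectors))
        models[emotion] = gmm
    return models
```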

In an embodiment, the method further comprises defining the semantic labels, which include specific semantic labels, negation semantic labels and transition semantic labels.

In an embodiment of the invention, converting the semantic prosody record into the semantic prosody vector proceeds as follows. First, a semantic score and a prosody score of the emotional feature words in the semantic prosody record are computed according to the emotion rules. Then, from the semantic score and the prosody score, the dimension scores of the semantic prosody record of each speech signal within the semantic prosody vector are obtained.
The dimension of the semantic prosody vector is determined by the number of emotion rules.

In an embodiment, before the semantic and prosodic attributes of the words in the speech signals are extracted, each speech signal is converted into a sentence, and word segmentation is performed on the sentence to obtain the words.

In an embodiment, the lexical knowledge base is HowNet, and the prosodic attributes include pitch, energy and duration.

The invention further proposes a method for classifying speech emotion. First, an emotional semantic model is established from the semantic and prosodic attributes of the test words in a plurality of speech signals; the semantic attributes are obtained by querying a lexical knowledge base, and the prosodic attributes are extracted from the speech signals. Next, the semantic and prosodic attributes of each word to be tested in a test speech signal are extracted and substituted into the emotional semantic model to obtain an emotional semantic score. Finally, the emotion category of the test speech signal is determined from the emotional semantic score.

In an embodiment, the classification method further comprises detecting an emotionally salient segment in the test speech signal, extracting the prosodic features of that segment, and substituting the prosodic features into an emotional prosody model to obtain an emotional prosody score. The emotion category of the test speech signal can then be determined from both the emotional semantic score and the emotional prosody score.

In an embodiment, detecting the emotionally salient segment comprises extracting the pitch contour of the test speech signal and detecting the continuous segments in the pitch contour, which serve as the emotionally salient segment.

In summary, the invention first builds an emotional semantic model from the semantic attributes of the words, and then strengthens the emotional characteristics of each emotion category through the prosodic attributes, thereby improving the accuracy of emotion classification.

To make the features and advantages above more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

[Embodiments]

Unlike conventional approaches that use only emotion keywords for emotion classification, the following embodiments further analyze the textual semantics and apply the analysis to emotion classification. The embodiments below serve as examples by which the invention can indeed be practiced.

First Embodiment

FIG. 1 is a flowchart of a method for establishing an emotional semantic model according to the first embodiment of the invention. Referring to FIG. 1, in step S105 an emotional corpus comprising a plurality of speech signals is provided. Before the emotional semantic model is built, speech signals of several emotion categories are collected; for example, several different speakers may record speech for each of the four emotion categories angry, sad, happy and neutral to build the emotional corpus.
Next, in step S110, the semantic attribute and the prosodic attribute of each word in each speech signal are extracted, so that the semantic and prosodic attributes serve as features for classification. For example, each speech signal is first converted into a sentence, and the sentence is segmented into a plurality of words. The semantic attributes of these words are then queried from a lexical knowledge base, and their prosodic attributes (for example, pitch, energy and duration) are extracted from the speech signal.

Finally, in step S115, the emotional semantic model is established from the semantic and prosodic attributes. For example, each speech signal is converted into a semantic prosody vector according to its semantic and prosodic attributes, and the emotional semantic model is trained from the semantic prosody vectors; that is, the extracted emotional features are generalized with a classification technique.

In general, classification techniques include the support vector machine (SVM), neural networks (NN), the hidden Markov model (HMM) and the Gaussian mixture model (GMM); such techniques are usually trained on vectors in a feature space.

For example, one semantic prosody record of a sentence is obtained from the semantic and prosodic attributes of its words. The emotion rules of the individual emotion categories are then mined from all the semantic prosody records obtained from the emotional corpus, and each semantic prosody record is converted into a semantic prosody vector according to the emotion rules.

Here, a plurality of semantic labels can be defined in advance with the lexical knowledge base. After the semantic labels are defined, the prosodic attributes from the speech signal are added to expand each semantic label into a semantic prosody label, so that the emotional characteristics of each emotion category are strengthened through prosody.

More specifically, after the semantic attribute of a word is extracted, it is determined whether the attribute matches a semantic label. When it does, the semantic label and the corresponding prosodic attribute are combined into a semantic prosody label, which is recorded into the semantic prosody record of the sentence. For the words that do not match any semantic label, it is further determined from the semantic attributes whether they include an emotional feature word; such a word is combined with its prosodic attribute into a feature set, which is also recorded into the semantic prosody record. Each record can then be converted into a semantic prosody vector according to the automatically mined emotion rules, and the emotional semantic model is trained from the relationships of the semantic prosody vectors in the vector space.
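Step S110 can be pictured as a small extraction pipeline. In the sketch below, `segment`, `hownet` and `measure_prosody` are hypothetical stand-ins for the word segmenter, the knowledge-base lookup and the per-span prosody measurement; the patent does not tie these to any particular library.

```python
def extract_word_features(sentence, audio, segment, hownet, measure_prosody):
    """Pair each word's semantic attribute with the prosody of its audio span.

    segment(sentence) -> iterable of (word, (start, end)) time spans
    hownet.get(word)  -> semantic attribute string, e.g. "Vachieve|達成"
    measure_prosody(audio, start, end) -> (pitch, energy, duration)
    """
    features = []
    for word, (start, end) in segment(sentence):
        semantic = hownet.get(word)                   # lexical knowledge base
        prosody = measure_prosody(audio, start, end)  # from the speech signal
        features.append((word, semantic, prosody))
    return features
```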
After the emotional semantic model is built, recognition of speech emotion can begin. An example follows.

FIG. 2 is a flowchart of a method for classifying speech emotion according to the first embodiment of the invention. Referring to FIG. 2, in step S205 a test speech signal is received. After the test speech signal is received, it is converted into a sentence, and the sentence is segmented into a plurality of words to be tested.

Next, as shown in step S210, the semantic attributes of the words to be tested are queried from the lexical knowledge base, and their prosodic attributes are extracted from the test speech signal.

Then, in step S215, the semantic and prosodic attributes of each word to be tested are substituted into the emotional semantic model to obtain an emotional semantic score. Since the emotional semantic model has already been established, substituting the attributes into it yields the emotional semantic score of the test speech signal for each emotion category.

Finally, in step S220, the emotion category of the test speech signal is determined from the emotional semantic scores. In general, the highest emotional semantic score indicates the final classification result; if the happy category has the highest emotional semantic score, the test speech signal belongs to the happy category.
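A minimal sketch of steps S215 and S220, reusing the per-category models of the earlier training sketch: the test signal's semantic prosody vector (assumed precomputed) is scored against every category's GMM, and the highest emotional semantic score decides the category.

```python
import numpy as np

def classify_emotion(test_vector, models):
    """models: dict mapping emotion name -> fitted GaussianMixture."""
    scores = {emotion: float(gmm.score_samples(np.asarray([test_vector]))[0])
              for emotion, gmm in models.items()}
    best = max(scores, key=scores.get)  # highest emotional semantic score wins
    return best, scores
```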

The lexical knowledge base is, for example, HowNet. HowNet is a knowledge base that takes the concepts represented by Chinese and English words as its objects of description, and that reveals the relationships among concepts and among the attributes of concepts. Taking HowNet as the example, a further embodiment describes in detail the steps of establishing the emotional semantic model.

Second Embodiment

FIG. 3 is a flowchart of a method for establishing an emotional semantic model according to the second embodiment of the invention. Referring to FIG. 3, in step S310, HowNet 301 is first queried to extract the semantic attribute of each word in the sentence. HowNet 301 records the concepts of a large number of words and the relationships among these words.

An example illustrates HowNet's concept record format. FIG. 4 is a schematic diagram of the HowNet concept record format according to the second embodiment of the invention. Referring to FIG. 4, in HowNet every word forms one record from its concept and its description. Each record mainly comprises field names (word, part of speech, word example and concept definition) together with their data. Taking "W_C 打" as an example, "打" is the data and its field name is "W_C"; that is, "打" is a word. Taking "G_C = V" as an example, "V" is the data and its field name is "G_C"; that is, "V" is a part of speech. The remaining fields follow by analogy.
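The record format just described can be pictured as a small data structure. This is a toy rendering only; the field values follow the examples in the text, and the empty example field is a placeholder.

```python
from dataclasses import dataclass

@dataclass
class HowNetRecord:
    w_c: str   # W_C: the word itself, e.g. "打"
    g_c: str   # G_C: the part of speech, e.g. "V"
    e_c: str   # E_C: a usage example of the word
    defn: str  # DEF: the concept definition, e.g. "Vachieve|達成"

# "查出" is recorded with DEF = "Vachieve|達成", as cited further below.
record = HowNetRecord(w_c="查出", g_c="V", e_c="", defn="Vachieve|達成")
```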

Returning to FIG. 3, in step S315 a semantic label database 302 is queried to determine whether the semantic attribute belongs to a semantic label. Here, the semantic labels are defined from the semantic attributes in HowNet 301; an example illustrates the steps of the definition method.

FIG. 5 is a flowchart of a method for defining semantic labels according to the second embodiment of the invention. Referring to FIG. 5, in step S505 the basic emotion-eliciting factors are determined first, for example by consulting the psychology of emotion to understand under which situations or conditions human emotions arise. After the basic emotional factors that elicit emotions are compiled, the principal semantics implied by these factors are analyzed.

For example, FIG. 6 is a schematic diagram of the basic emotional factors according to the second embodiment of the invention, showing the compiled factors for the happy, angry and sad emotions respectively.

Observing the basic emotional factors shows that each of them manifests certain specific semantics, for example obtaining some benefit, relieving some pressure, or losing some benefit. In such expressions, the action-describing parts such as "obtain", "relieve" and "lose" are called principal action semantic words, while the parts that combine with an action word to complete the meaning, such as the benefit, the pressure or the goal, are called subordinate action semantic words.

Returning to FIG. 5, so that the semantic attributes can be correctly extracted from the sentence recognized from the speech signal, in step S510 the semantic labels are defined with the basic emotion rules and HowNet.

For example, FIG. 7 is a schematic diagram of the semantic labels according to the second embodiment of the invention. Here, the semantic labels include specific semantic labels, negation semantic labels and transition semantic labels: a specific semantic label covers words that express a specific meaning, a negation semantic label covers words with a negating sense, and a transition semantic label covers words that turn the tone of a sentence.

From the verbs in HowNet, the semantic attributes that express specific meanings are selected and divided into 15 classes, which become the definitions of 15 specific semantic labels.

Taking the semantic label [達成] (achieve) as an example, words that carry the attribute "Vachieve|達成", "fulfill|實現", "end|終結", "finish|完畢" or "succeed|成功" in HowNet are assigned to the label [達成]. For instance, the words "查出" (find out) and "猜到" (guess) are both recorded in HowNet as DEF = "Vachieve|達成", so both are assigned to the semantic label [達成].

The negation semantic labels are defined by directly extracting all the words whose HowNet definitions carry the feature "neg|負". The transition semantic labels are defined by inspecting all the adverbs and conjunctions in HowNet and extracting the words that carry a transitional tone; according to the characteristics of transition words, the transition labels are further divided into two kinds, [transition-extract] and [transition-omit].

Once the definition of the semantic labels is complete, they can be used to classify speech emotion.

Returning to FIG. 3, when the semantic attribute of a word matches a semantic label in step S315, the corresponding semantic label is marked, as shown in step S320.
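The assignment of words to the specific semantic label [達成] described above reduces to a membership test on the word's HowNet DEF. A toy sketch, with an invented two-entry DEF table standing in for a real HowNet lookup:

```python
ACHIEVE_ATTRS = {"Vachieve|達成", "fulfill|實現", "end|終結",
                 "finish|完畢", "succeed|成功"}

# Toy DEF table: in practice these definitions come from HowNet itself.
hownet_def = {"查出": "Vachieve|達成", "猜到": "Vachieve|達成"}

def belongs_to_achieve(word):
    """True when the word's HowNet DEF carries one of the [達成] attributes."""
    return hownet_def.get(word) in ACHIEVE_ATTRS

assert belongs_to_achieve("查出") and belongs_to_achieve("猜到")
```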
Afterward, as shown in step S325, a prosody attribute database 303 is queried to expand the semantic label into a semantic prosody label, so that the emotional characteristics of each emotion category are strengthened through the prosodic attributes. Then, in step S340, the semantic prosody label is recorded into the semantic prosody record corresponding to the sentence.

For example, after the speech signal is converted into a sentence and segmented, the prosodic attributes of each word can be extracted from that word's segment of the speech signal and recorded into the prosody attribute database 303. Here, the prosodic attributes include pitch, energy and duration, and each attribute is quantized into three levels: pitch and energy are expressed as high (H), medium (M) or low (L), and duration as long (L), medium (M) or short (S).

Returning to step S315, when the semantic attribute of a word does not match any semantic label, emotional feature words are extracted from such words, as shown in step S330. Then, as shown in step S335, each emotional feature word is combined with its corresponding prosodic attribute into a feature set, and in step S340 the feature set is recorded into the semantic prosody record corresponding to the sentence.

That is to say, besides the words already marked with semantic labels, the emotional feature words of the other, unmarked words (for example, adjectives and nouns) are extracted from HowNet 301, and these feature sets are also recorded into the semantic prosody record; only then is the semantic feature of the sentence complete.
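Steps S325 to S340 amount to quantizing a word's prosody into a three-level code and appending that code to the word's semantic label or HowNet feature word. The sketch below reproduces the tag notation of the worked example that follows; the quantization thresholds are assumptions, since the patent does not fix concrete values.

```python
def quantize(value, lo, hi, levels):
    """Map a raw value to one of three levels, e.g. 'L'/'M'/'H'."""
    return levels[0] if value < lo else levels[2] if value > hi else levels[1]

def prosody_code(pitch, energy, duration, th):
    """th: dict of (lo, hi) thresholds per attribute, e.g. corpus statistics."""
    p = quantize(pitch, *th["pitch"], "LMH")        # pitch: L/M/H
    e = quantize(energy, *th["energy"], "LMH")      # energy: L/M/H
    d = quantize(duration, *th["duration"], "SML")  # duration: S/M/L
    return f"P{p}_E{e}_D{d}"

def expand(label, pitch, energy, duration, th):
    """label: a semantic label such as '得到' or a HowNet feature word such
    as 'wealth|錢財'. Returns e.g. '[得到_PH_EH_DM]'."""
    return f"[{label}_{prosody_code(pitch, energy, duration, th)}]"
```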

舉例來說’假設語音訊號轉換的文句為,“都快沒錢過 日子了 ’還好今天拿到-點錢了,,。在經過上述步驟 S3H)〜S325之後,獲得“沒”的語意韻律標籤為[否定 _PM_EH—DS],“還好”的語意韻律標籤為[轉折—擷取 _PL—EM—DL],“拿到”的語意韻律標籤為[得到 —PH—EH—DM] ’而其他詞語則無任何語意標籤。 接著,在其他未標示語意標籤的詞語中,依據其語意 屬性來找出情緒特徵詞。在經過上述步驟S330〜S335之 後’獲得“錢”的特徵集合為[wealth |錢財j>HJEMJDS], 而“一點”的特徵集合為[few |少-PL—EMJDM]。 據此’所獲得的語意韻律記錄則為{[否定 -PM-EH-DS]、[轉折—擷取 _PL_EM_DL]、[得到 _PH—EH_DM]、[few丨少—pLJEM—DM]、[wealth丨錢財 一PH—EM—DS]} 〇 — 值得注意的是,在本實施例中,僅擷取[轉折_擷取] 之後的語意韻律標籤,因此最後得到之語意韻律記錄為 13 201021024 iNivuy/Uly 29579twf.doc/n {[得到—PH_EH_DM]、[few| 少 _PL—EM DM]、[wealth丨錢財 —PH—EM—DS]}。 _ ^ 而辨識後之文句經由語意屬性以及韻律屬性的擷取 後,成為一筆語意韻律記錄。之後,便運用資料探勘之技 術,從全部的語意韻律記錄中,自動來探勘出情緒規則 304。For example, 'assuming that the sentence of the voice signal conversion is, "all have no money to live." Fortunately, I got it today - I got some money. After the above steps S3H)~S325, I got the "no" semantic rhythm label. For [negative_PM_EH-DS], the semantic rhythm label of "good for" is [turn--take _PL-EM-DL], and the semantic rhythm label of "get it" is [get-PH-EH-DM] Other words have no semantic label. Then, in other words that are not marked with semantic meanings, the emotional feature words are found according to their semantic attributes. After the above steps S330~S335, the feature set of 'obtaining money' is [ Wealth | money j>HJEMJDS], and the feature set of "one point" is [few | less - PL - EMJDM]. According to this, the semantic rhythm record obtained is {[negative-PM-EH-DS], [turning] - _PL_EM_DL], [Get _PH-EH_DM], [few 丨--pLJEM-DM], [wealth丨钱一 PH-EM-DS] 〇 - It is worth noting that, in this embodiment, Only the semantic rhythm label after [turning_crawling] is taken, so the final semantic rhythm record is obtained. 13 201021024 iNivuy/Uly 29579twf.doc/n {[Get - PH_EH_DM], [few| Less _PL-EM DM], [wealth丨钱财-PH-EM-DS]}. _ ^ And the recognized sentence is via semantics After the attributes and rhythm attributes are captured, they become a semantic rhythm record. After that, using the technique of data exploration, the emotion rules 304 are automatically explored from all the semantic rhythm records.

由於#f意韻律標籤的標示程序是以被標示之語意韻律 標籤為中心,因此所要求之情緒規則形態為T—D。其中, τ代表語意韻律標籤,例如[達成_pH_EM_DS]、[解除 JPMJEH一DM]等。而D為附屬於某個動作之附屬語意詞, 是利用知網301來擷取出主要的情緒特徵詞並結合韻律屬 性而成為特徵集合,如[symbol|符號_PM_EH_DM]。在此T 與D可以是一個或多個。據此,不論是τ1λΤ2—D1或是 T3—D2aD3均有可能。 經由資料探勘技術得到情緒規則304後,在步驟 S345〜S355中,利用情緒規則3〇4,將每一筆語意韻律記 錄轉換為一語意韻律向量表示,其中每一條情緯規則代表 向量空間中之一個維度。 倘若中性情緒規則為,快樂情緒規則為 及产’㈤7,…,,生氣情緒規則,悲傷情緒 規則為。 / 則每一筆語意韻律記錄的語意韻律向量表示為 14 201021024 iNMiy/uiy 29579twf.doc/n S吾意韻律向量則是在維度為+Q +4向量空間中之— 在步驟S345中,依據情緒規則304 ’計算語意韻律記 錄的語意分數。詳細地說,在計分時’先檢查語意韻律記 錄的T部分是否符合情緒規則3〇4的T部分。倘若語意韻 律記錄的T部分符合情緒規則的T部分’才進一步對d部 分進行檢查,以計算此維度分數。 • . . · ·Since the labeling procedure of the #f rhythm label is centered on the marked rhythm label, the required emotional rule form is T-D. Where τ represents a semantic rhythm label, such as [achieve_pH_EM_DS], [release JPMJEH-DM], and the like. D is an auxiliary language affixed to an action. It uses the knowledge network 301 to extract the main emotional feature words and combine them with the prosodic attributes to become a feature set, such as [symbol|symbol_PM_EH_DM]. Here T and D can be one or more. Accordingly, it is possible to use either τ1λΤ2—D1 or T3—D2aD3. After the emotion rule 304 is obtained through the data exploration technique, each of the semantic prosody records is converted into a semantic prosodic vector representation using the emotion rule 3〇4 in steps S345 to S355, wherein each of the emotion rules represents one of the vector spaces. Dimensions. If the neutral emotion rule is, the happy emotional rule is the production of (5) 7, ..., angry rules, sad emotion rules. / Then the semantic rhythm vector of each semantic rhythm record is expressed as 14 201021024 iNMiy/uiy 29579twf.doc/n S The Italian rhythm vector is in the dimension +Q +4 vector space - in step S345, according to the emotional rules 304 'Compute the semantic score of the semantic rhythm record. In detail, at the time of scoring, it is first checked whether the T portion of the semantic rhythm record conforms to the T portion of the emotional rule 3〇4. If the T part of the semantic rhythm record conforms to the T part of the emotional rule, the d part is further examined to calculate the dimension score. • . . · ·

由於在知網301定義中,詞語具有階層關係,此階層 關係是在語意韻律記錄與情緒規則3 〇 4之間二個情緒特^ 詞不一樣時’用來計算二個情緒特徵詞之語意相似度。在 知網3〇1中,若階層關係最深共分m層,階層愈深,其關 係愈好’即語意愈相近。因此語意韻律記錄與情緒規則1〇4 之間二個情緒特徵詞螞,巧之比對分數為: vp L{DhDj) =1二5’為〜'最大相同路徑長,零(^·))) 後再===== 附力的0屬性的示意圖。圖8δ 例所、會不的附加屬性權重表的示意圖。在圖8Α,以^ 15 201021024 ΝΜΪ97019 29579twf.doc/n 301為例,疋義了八種附加屬性。在圖8 屬性之__,給予各附加屬性—權重值。據各附加 表二在步驟S355中,依據韻律屬性權重 :第律屬性權重表的示意Ξ = 量以高(Η)、中(刀^里^三個程度來表示,音高與能 rT Ν .. )低(L)來表示,而音長則以長 ❿ 來仏予-權㈣^(s)來表示。依據量化的程度差距 來、、σ予權重值。兩者程度較近,賦予之權重值為 0.L而Η與L兩者程度較遠,賦予之權重值為Q 25 ^ 與Η兩者程度較近,所賦权權重值亦為〇.5。以 上〜之後,在步驟S360中,根據上述步驟S345〜S355來 计异出5吾意韻律向量中的各維度分數。 舉例來說’假設^為[symbol丨符號_PM_EH_DM],Dj-為[&language|語言jph_EMJDM]。Di在知網的路徑為 1.12.5.1.1 ’而1)】在知網的路徑為1 ] 2 5],1(^,巧)為$, 為4 ’求得階層關係分數%為。附加屬性 分數〜為〇.5。韻律屬性分數9為0.5。最後求得在語意韻 律向嚴中的一維度分數為V=VpXVpXV/。 曰最後’在步驟S365中,在各情緒類別的語意韻律向 量收集完成之後’便可以高斯混合模型建構出每一情绪類 別的情緒語意模型。在情緒語意模型建構完成之後,便可 開始進行語音情緒的分類。以下再舉另一實施例來說明。 16 201021024 iNMiy/uiv 29579twf.doc/n 第三實施例 圖10是依照本發明第三實施例所繪示的語音情緒分 類方法的流程圖。請參照圖1G,在本實施射,接收到待 測語音訊號之後,可分別對待測語音訊號進行文字語意的 分析以及韻律特徵的擷取。 在文字語意的分析上,首先如步驟讀〇所示,利用 「語音辨識的聲學模型職(例如HMM)將待測語音訊 雜換為文句。接著,如步驟1〇ls所示,利用語意標鐵資 料庫1002、韻律屬性資料庫1〇〇3以及情緒規則1〇〇4,將 此文句轉換為語意韻律向量。 在此,步驟1015與前述第二實施例中的步驟 smo〜s36〇相同或相似,而語意標籤資料庫1〇〇2、韻律屬 性資料庫聰以及情緒規則_與前述第二實施例中的 語意標籤資料庫3〇2、韻律屬性資料庫3〇3以及情緒規則 304的建立方法亦相同或相似,故在此皆不再資述。 -另方面,在韻律特徵的擷取上,首先如步驟S1020 Φ 所* ’在制語音職巾,偵測-情賴著音段。 在待測 語9訊號中,為了避免非情緒音段影響了情緒辨識之準確 率,因而可先偵測情緒顯著音段(Em〇ti〇naUy故 Segment) ϋ情崎音絲絲财雜。所謂的情緒 ”、、頁著曰¥又疋先汁算出整個待測語音訊號的音高軌跡(pi&h C〇ntour)。倘若音高轨跡巾存在連續音段,·此連續音 段定義為情緒顯著音段。 接著’在步驟S1025中,基於情緒顯著音段來擷取韻 17 201021024 NMiy/uiy 29579twf.doc/n 律特徵。韻律特徵包括最大音高值、最小音高值、平均音 高值、音高變異數、最大能量值、最小能量值、平均能量 值、能量變異數、最大共振峰值、最小共振峰值、平均共 振峰值、共振峰變異數,共計12個參數。將此12個參^ 視為12維度的韻律向量。 " 最後在步驟S1030中,結合情緒語意模型1〇〇5與情 緒韻律模型1_,並且根據貝式定理來決定Since in the definition of HowNet 301, words have a hierarchical relationship, which is used to calculate the semantic similarity of two emotional feature words when the two emotional characteristics are different between the semantic rhythm record and the emotional rule 3 〇4. degree. In Zhiwang 3〇1, if the class relationship is deeply divided into m layers, the deeper the class, the better the relationship is, that is, the closer the semantics are. Therefore, between the semantic rhythm record and the emotional rule 1〇4, the two emotional feature words are sac, and the coincidence score is: vp L{DhDj) =1 2 5' is ~ 'maximum same path length, zero (^·)) After that ===== A schematic diagram of the attached 0 attribute. Figure 8 is a schematic diagram of the additional attribute weight table of the δ example. In Fig. 8Α, taking ^ 15 201021024 ΝΜΪ97019 29579twf.doc/n 301 as an example, eight additional attributes are deprecated. In the __ of the attribute of Figure 8, each additional attribute-weight value is given. According to each additional table 2 in step S355, according to the rhythm attribute weight: the indication of the law attribute weight table 量 = quantity is represented by high (Η), medium (knife ^ 里 ^ three degrees, pitch and energy rT Ν . . ) Low (L) to indicate, and the length of the sound is expressed by the long-sentence-right (four)^(s). According to the degree of difference in quantization, σ is given a weight value. The degree of the two is relatively close, the weight value given is 0.L and the degree of Η and L is far. The weight value given is closer to Q 25 ^ and Η, and the weight of the weight is also 〇.5 . Above and after, in step S360, the respective dimension scores in the five my-sense prosody vectors are calculated according to the above-described steps S345 to S355. 
For example, suppose D_i is [symbol|符號_PM_EH_DM] and D_j is [language|語言_PH_EM_DM]. The path of D_i in HowNet is 1.12.5.1.1 and the path of D_j is 1.12.5.1, so the maximal common path length L(D_i, D_j) is 4, from which the hierarchy score V_p is obtained. The additional attribute score V_a is 0.5, and the prosody attribute score V_r is 0.5. The dimension score in the semantic prosody vector is finally obtained as V = V_p × V_a × V_r.

Finally, in step S365, after the semantic prosody vectors of every emotion category have been collected, the emotional semantic model of each emotion category is constructed with a Gaussian mixture model. Once the emotional semantic models are constructed, classification of speech emotion can begin. A further embodiment follows.

Third Embodiment

FIG. 10 is a flowchart of the speech emotion classification method according to the third embodiment of the invention. Referring to FIG. 10, after a test speech signal is received, textual semantic analysis and prosodic feature extraction are performed on it separately.

For the textual semantic analysis, first, as shown in step S1010, an acoustic model 1001 for speech recognition (for example, an HMM) converts the test speech signal into a sentence. Then, as shown in step S1015, the sentence is converted into a semantic prosody vector using a semantic label database 1002, a prosody attribute database 1003 and emotion rules 1004. Step S1015 is the same as or similar to steps S310 to S360 of the second embodiment, and the semantic label database 1002, the prosody attribute database 1003 and the emotion rules 1004 are established in the same or a similar way as the semantic label database 302, the prosody attribute database 303 and the emotion rules 304 of the second embodiment, so they are not described again.

For the prosodic feature extraction, first, as shown in step S1020, an emotionally salient segment is detected in the test speech signal. To prevent non-emotional segments from degrading the accuracy of emotion recognition, the emotionally salient segment (emotionally salient segment of the utterance) is detected first, and the prosodic features are then extracted from it. The emotionally salient segment is found by first computing the pitch contour of the whole test speech signal; a continuous segment in the pitch contour is defined as the emotionally salient segment.

Next, in step S1025, the prosodic features are extracted from the emotionally salient segment. The prosodic features comprise twelve parameters in total: the maximum, minimum, mean and variance of the pitch; the maximum, minimum, mean and variance of the energy; and the maximum, minimum, mean and variance of the formant. These twelve parameters are regarded as a 12-dimensional prosody vector.
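Steps S1020 and S1025 can be sketched as below, assuming precomputed frame tracks for pitch, energy and formant in which unvoiced frames carry pitch 0 and at least one voiced frame exists; the longest voiced run stands in for the continuous segment of the pitch contour.

```python
import numpy as np

def salient_segment(pitch):
    """(start, end) of the longest run of voiced (pitch > 0) frames."""
    best, start = (0, 0), None
    for i, p in enumerate(np.append(pitch, 0.0)):  # sentinel closes last run
        if p > 0 and start is None:
            start = i
        elif p <= 0 and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best

def prosody_vector(pitch, energy, formant):
    """The 12 prosodic features measured over the emotionally salient
    segment: max/min/mean/variance of pitch, energy and formant."""
    s, e = salient_segment(pitch)
    feats = []
    for track in (pitch[s:e], energy[s:e], formant[s:e]):
        feats += [track.max(), track.min(), track.mean(), track.var()]
    return np.array(feats)  # 12-dimensional prosody vector
```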

的情緒類別。詳細地說’也就是將語意韻律向量代入情^ 居思模型1005而獲得情緒語意分數,並將韻律向量代入情 緒韻律模型1006而獲得情緒韻律分數。之後,藉由貝式定 =所找出之事後齡最大的情緒_,即騎後的辨識結 果0 在此’情绪韻律模型1006例如亦是以高斯混合模型所 t 4^就疋將情緒語料庫巾·音訊號,分卿取上述 m’6以此12個參數作為韻律向量來.此情緒韻 祐、隹在上述實施财,分析各詞語的語意屬性, 々進。韻律屬性,據以提高情緒分類的正確率。此 ’更可僅針對語音訊號巾的情賴著音聽行分以 避免非情緒的音段影響了情緒分類的正確率。 本發^彳本㈣已以實關揭露如上,然其並翻以限定 ^壬可所屬技術領域中具有通常知識者,在不脫離 圍内’當可作些許之更動與潤飾,故本 月之保瘦關當視後附之申請專利範圍所界定者為準。 18 201021024 JNMiy/uiy 29579twf.doc/n 【圖式簡單說明】 圖1是依照本發明第一實施例所繪示的情緒語意模型 的建立方法流程圖。 圖2是依照本發明第一實施例所繪示的語音情緒的分 類方法流程圖。 圖3是依照本發明第二實施例所繪示的情緒語意模型 的建立方法流程圖。 圖4疋依照本發明弟二實施例所繪不的知網概念記錄 ® 形式的示意圖。 圖5是依照本發明第二實施例所繪示的語意標籤的定 義方法流程圖。 圖6是依照本發明第二實施例所繪示的基本情緒因素 的示意圖。 圖7是依照本發明第二實施例所繪示的語意標籤的示 意圖。 圖8A是依照本發明第二實施例所纷示的附加屬性的 φ 示意圖。 圖8B是依照本發明第二實施例所繪示的附加屬性權 重表的示意圖。 圖9是依照本發明第二實施例所緣示的韻律屬性權重 表的示意圖。 圖10是依照本發明第三實施例所繪示的語音情緒分 類方法的流程圖。 19 201021024 NMiy /Uiy 29579twf.doc/n 【主要元件符號說明】 S105〜S115 :本發明第一實施例的情緒語意模型的 立方法各步驟 S205〜S220 ··本發明第一實施例的語音情緒的分類方 法各步驟 S310〜S365 :本發明第二實施例的情緒語意模型的建 立方法各步驟 S505〜S510 :本發明第二實施例的語意標籤的定義方 法各步驟 S1010〜S1030 :本發明第三實施例的語音情緒分類方 法各步驟 301 :知網 3〇2、1〇〇2 :語意標籤資料庫 303、 1003 .韻律屬性資料庫 304、 1004 :情緒規則 305 :附加屬性權重表 306 :韻律屬性權重表 1001 :聲學模型 1005 :情緒語意模型 1006 :情緒韻律模型 20The emotional category. In detail, that is, the semantic rhythm vector is substituted into the emotion model 1005 to obtain an emotional semantic score, and the prosody vector is substituted into the emotional prosody model 1006 to obtain an emotional rhythm score. After that, the maximum emotion _ after the horse is determined by the syllabus = the identification result 0 after the ride. Here, the 'emotional rhythm model 1006 is also a Gaussian mixture model, and the emotional corpus is used. · The audio signal, the division takes the above m'6 with these 12 parameters as the prosody vector. This emotional Yunyou, 隹 in the above implementation of wealth, analysis of the semantic attributes of each word, hyperthyroidism. The rhythm attribute is used to improve the correct rate of emotional classification. This can be used only for the voice signal to avoid the non-emotional segments affecting the correct rate of emotional classification. This issue (4) has been exposed as above, but it is also limited to the general knowledge of the technical field, and it can be used to make some changes and refinements. The warranty is based on the scope of the patent application attached to it. 18 201021024 JNMiy/uiy 29579twf.doc/n [Simplified Schematic Description] FIG. 1 is a flow chart showing a method for establishing an emotional semantic model according to a first embodiment of the present invention. 2 is a flow chart of a method for classifying a voice emotion according to a first embodiment of the present invention. 3 is a flow chart of a method for establishing an emotional semantic model according to a second embodiment of the present invention. Figure 4 is a schematic diagram showing the form of the Knowledge Base Concept Record ® according to the second embodiment of the present invention. FIG. 5 is a flow chart of a method for defining a semantic tag according to a second embodiment of the present invention. Figure 6 is a schematic illustration of the basic emotional factors depicted in accordance with a second embodiment of the present invention. Figure 7 is a schematic illustration of a semantic tag in accordance with a second embodiment of the present invention. Figure 8A is a schematic illustration of φ of additional attributes highlighted in accordance with a second embodiment of the present invention. FIG. 
In the embodiments above, the semantic attributes of the words are analyzed and further combined with the prosodic attributes, which improves the accuracy of emotion classification. Moreover, classification can be restricted to the emotionally salient segments of the speech signal, so that non-emotional segments do not degrade the accuracy of emotion classification.

Although the invention has been disclosed by the embodiments above, they are not intended to limit the invention. Anyone with ordinary knowledge in the art may make some modifications and refinements without departing from the spirit and scope of the invention, so the scope of protection is defined by the appended claims.

[Brief Description of the Drawings]

FIG. 1 is a flowchart of a method for establishing an emotional semantic model according to the first embodiment of the invention.
FIG. 2 is a flowchart of a method for classifying speech emotion according to the first embodiment of the invention.
FIG. 3 is a flowchart of a method for establishing an emotional semantic model according to the second embodiment of the invention.
FIG. 4 is a schematic diagram of the HowNet concept record format according to the second embodiment of the invention.
FIG. 5 is a flowchart of a method for defining semantic labels according to the second embodiment of the invention.
FIG. 6 is a schematic diagram of basic emotional factors according to the second embodiment of the invention.
FIG. 7 is a schematic diagram of semantic labels according to the second embodiment of the invention.
FIG. 8A is a schematic diagram of additional attributes according to the second embodiment of the invention.
FIG. 8B is a schematic diagram of the additional attribute weight table according to the second embodiment of the invention.
FIG. 9 is a schematic diagram of the prosody attribute weight table according to the second embodiment of the invention.
FIG. 10 is a flowchart of the speech emotion classification method according to the third embodiment of the invention.

[Description of Main Reference Numerals]

S105-S115: steps of the method for establishing the emotional semantic model of the first embodiment
S205-S220: steps of the method for classifying speech emotion of the first embodiment
S310-S365: steps of the method for establishing the emotional semantic model of the second embodiment
S505-S510: steps of the method for defining semantic labels of the second embodiment
S1010-S1030: steps of the speech emotion classification method of the third embodiment
301: HowNet
302, 1002: semantic label database
303, 1003: prosody attribute database
304, 1004: emotion rules
305: additional attribute weight table
306: prosody attribute weight table
1001: acoustic model
1005: emotional semantic model
1006: emotional prosody model

Claims (1)

VII. Claims:

1. A method for establishing an emotional semantic model, comprising: providing an emotional corpus comprising a plurality of speech signals belonging to a plurality of emotion categories; extracting a semantic attribute and a prosodic attribute of each of a plurality of words in the speech signals, wherein the semantic attribute is obtained from a lexical knowledge base and the prosodic attribute is extracted from the speech signals; and establishing the emotional semantic model from the semantic attribute and the prosodic attribute of each of the speech signals.

2. The method for establishing an emotional semantic model of claim 1, wherein establishing the emotional semantic model comprises converting each of the speech signals into a semantic prosody vector according to the semantic attribute and the prosodic attribute, so as to establish the emotional semantic model from the semantic prosody vectors of the speech signals.

3. The method of claim 2, wherein establishing the emotional semantic model from the semantic prosody vectors comprises substituting the semantic prosody vectors into a Gaussian mixture model.

4. The method of claim 2, wherein converting each of the speech signals into the semantic prosody vector comprises: obtaining a semantic prosody record from the semantic attribute and the prosodic attribute of each of the words; mining an emotion rule from the semantic prosody record; and converting the semantic prosody record into the semantic prosody vector according to the emotion rule.

5. The method of claim 4, wherein obtaining the semantic prosody record comprises: determining from the semantic attribute whether each of the words belongs to a semantic label, the semantic label being defined from the lexical knowledge base; and when one of the words belongs to the semantic label, combining the semantic label of the word with the corresponding prosodic attribute into a semantic prosody label and recording the semantic prosody label into the semantic prosody record.

6. The method of claim 5, further comprising defining a specific semantic label, a negation semantic label and a transition semantic label according to basic emotional factors and the lexical knowledge base.

7. The method of claim 5, wherein obtaining the semantic prosody record further comprises: determining from the semantic attribute whether the words not belonging to the semantic label include an emotional feature word, combining the emotional feature word with the corresponding prosodic attribute into a feature set, and recording the feature set into the semantic prosody record.

8. The method of claim 7, wherein converting the semantic prosody record into the semantic prosody vector according to the emotion rule comprises: computing, according to the emotion rule, a semantic score and a prosody score of the emotional feature word in the semantic prosody record; and obtaining, from the semantic score and the prosody score, a dimension score of the semantic prosody record of each speech signal in the semantic prosody vector, the dimension of the semantic prosody vector being determined by the number of emotion rules.

9. The method of claim 1, further comprising, before extracting the semantic attribute and the prosodic attribute of the words: converting each of the speech signals into a sentence; and performing word segmentation on the sentence to obtain the words.

10. The method of claim 1, wherein the lexical knowledge base is HowNet.

11. The method of claim 1, wherein the prosodic attribute comprises pitch, energy and duration.

12. A method for classifying speech emotion, comprising: establishing an emotional semantic model from a semantic attribute and a prosodic attribute of each of a plurality of test words in a plurality of speech signals, wherein the semantic attribute is obtained from a lexical knowledge base and the prosodic attribute is extracted from the speech signals; receiving a test speech signal; extracting the semantic attribute and the prosodic attribute of each of a plurality of words to be tested in the test speech signal; substituting the semantic attribute and the prosodic attribute of each of the words to be tested into the emotional semantic model to obtain an emotional semantic score; and determining an emotion category of the test speech signal from the emotional semantic score.

13. The method for classifying speech emotion of claim 12, further comprising: detecting an emotionally salient segment in the test speech signal; extracting a plurality of prosodic features of the emotionally salient segment; and substituting the prosodic features into an emotional prosody model to obtain an emotional prosody score.

14. The method of claim 13, wherein determining the emotion category of the test speech signal from the emotional semantic score further comprises determining the emotion category from the emotional semantic score and the emotional prosody score.

15. The method of claim 13, wherein detecting the emotionally salient segment in the test speech signal comprises: extracting a pitch contour of the test speech signal; and detecting continuous segments in the pitch contour as the emotionally salient segment.

16. The method of claim 12, wherein substituting the semantic attribute and the prosodic attribute of each of the words to be tested into the emotional semantic model comprises: converting the test speech signal into a semantic prosody vector according to the semantic attribute and the prosodic attribute; and substituting the semantic prosody vector into the emotional semantic model.

17. The method of claim 16, wherein converting the test speech signal into the semantic prosody vector comprises: obtaining a semantic prosody record from the semantic attribute and the prosodic attribute of each of the words to be tested; and converting the semantic prosody record into the semantic prosody vector according to an emotion rule.

18. The method of claim 17, wherein obtaining the semantic prosody record comprises: determining from the semantic attribute whether each of the words to be tested belongs to a semantic label defined from the lexical knowledge base; when one of the words to be tested belongs to the semantic label, combining the semantic label with the corresponding prosodic attribute into a semantic prosody label and recording the semantic prosody label into the semantic prosody record; and determining from the semantic attribute whether the words to be tested not belonging to the semantic label include an emotional feature word, combining the emotional feature word with the corresponding prosodic attribute into a feature set, and recording the feature set into the semantic prosody record.

19. The method of claim 18, wherein the semantic label comprises a specific semantic label, a negation semantic label and a transition semantic label.

20. The method of claim 18, wherein converting the semantic prosody record into the semantic prosody vector according to the emotion rule comprises: computing, according to the emotion rule, a semantic score and a prosody score of the emotional feature word in the semantic prosody record; and obtaining a dimension score of the semantic prosody record in the semantic prosody vector from the semantic score and the prosody score, the dimension of the semantic prosody vector being determined by the number of emotion rules.

21. The method of claim 12, wherein the emotional semantic model is established according to a Gaussian mixture model.

22. The method of claim 12, further comprising, before extracting the semantic attribute and the prosodic attribute of the words to be tested: converting the test speech signal into a sentence; and performing word segmentation on the sentence to obtain the words to be tested.

23. The method of claim 12, wherein the lexical knowledge base is HowNet.

24. The method of claim 12, wherein the prosodic attribute comprises pitch, energy and duration.
TW97144755A 2008-11-19 2008-11-19 Method for classifying speech emotion and method for establishing emotional semantic model thereof TWI389100B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW97144755A TWI389100B (en) 2008-11-19 2008-11-19 Method for classifying speech emotion and method for establishing emotional semantic model thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW97144755A TWI389100B (en) 2008-11-19 2008-11-19 Method for classifying speech emotion and method for establishing emotional semantic model thereof

Publications (2)

Publication Number Publication Date
TW201021024A true TW201021024A (en) 2010-06-01
TWI389100B TWI389100B (en) 2013-03-11

Family

ID=44832510

Family Applications (1)

Application Number Title Priority Date Filing Date
TW97144755A TWI389100B (en) 2008-11-19 2008-11-19 Method for classifying speech emotion and method for establishing emotional semantic model thereof

Country Status (1)

Country Link
TW (1) TWI389100B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103390409A (en) * 2012-05-11 2013-11-13 鸿富锦精密工业(深圳)有限公司 Electronic device and method for sensing pornographic voice bands
TWI512719B (en) * 2013-02-01 2015-12-11 Tencent Tech Shenzhen Co Ltd An acoustic language model training method and apparatus
US9396723B2 (en) 2013-02-01 2016-07-19 Tencent Technology (Shenzhen) Company Limited Method and device for acoustic language model training
TWI579830B (en) * 2015-12-29 2017-04-21 Chunghwa Telecom Co Ltd On the Chinese Text Normalization System and Method of Semantic Cooperative Processing
TWI639997B (en) * 2017-09-28 2018-11-01 大仁科技大學 Dialog understanding method based on probabilistic rule

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI602174B (en) * 2016-12-27 2017-10-11 李景峰 Emotion recording and management device, system and method based on voice recognition
TWI666558B (en) * 2018-11-20 2019-07-21 財團法人資訊工業策進會 Semantic analysis method, semantic analysis system, and non-transitory computer-readable medium

Also Published As

Publication number Publication date
TWI389100B (en) 2013-03-11

Similar Documents

Publication Publication Date Title
US11740863B2 (en) Search and knowledge base question answering for a voice user interface
CN109196495B (en) System and method for fine-grained natural language understanding
Poria et al. Fusing audio, visual and textual clues for sentiment analysis from multimodal content
Yu et al. A neural approach to pun generation
Poria et al. Towards an intelligent framework for multimodal affective data analysis
CN107016994B (en) Voice recognition method and device
CN110674339A (en) Chinese song emotion classification method based on multi-mode fusion
TW201021024A (en) Method for classifying speech emotion and method for establishing emotional semantic model thereof
Negi et al. A study of suggestions in opinionated texts and their automatic detection
Chandrasekar et al. Automatic speech emotion recognition: A survey
CN108846063A (en) Determine the method, apparatus, equipment and computer-readable medium of problem answers
CN105551485B (en) Voice file retrieval method and system
TW201140559A (en) Method and system for identifying emotional voices
JP5017534B2 (en) Drinking state determination device and drinking state determination method
Liu et al. A multi-modal chinese poetry generation model
Li et al. Towards zero-shot learning for automatic phonemic transcription
Houjeij et al. A novel approach for emotion classification based on fusion of text and speech
CN106021234A (en) Label extraction method and system
CN115422947A (en) Ancient poetry assignment method and system based on deep learning
CN113420556A (en) Multi-mode signal based emotion recognition method, device, equipment and storage medium
TWI269192B (en) Semantic emotion classifying system
CN113761377A (en) Attention mechanism multi-feature fusion-based false information detection method and device, electronic equipment and storage medium
CN107562907A (en) A kind of intelligent lawyer's expert system and case answering device
KR20130068624A (en) Apparatus and method for recognizing speech based on speaker group
CN107609096A (en) A kind of intelligent lawyer's expert responses method

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees