TW201008222A - A mobile phone for emotion recognition of incoming-phones and a method thereof - Google Patents
- Publication number
- TW201008222A (application TW97131191A)
- Authority
- TW
- Taiwan
- Prior art keywords
- emotional
- mobile phone
- voice
- voice signal
- data
- Prior art date
Landscapes
- Telephonic Communication Services (AREA)
- Telephone Function (AREA)
Abstract
Description
201008222

IX. Description of the Invention:

[Technical Field of the Invention]
The present invention relates to speech processing technology, and more particularly to a mobile phone and method for emotion recognition of incoming calls.

[Prior Art]
According to research, humans exhibit five basic emotional responses: anger, boredom, happiness, neutrality, and sadness. Busy modern people commonly use the telephone as the medium for communicating and keeping in touch with family, friends, and colleagues. Because a telephone conversation is not face-to-face, a caller often cannot tell the current emotional state of the other party, and may misread the emotion behind the other party's words; saying the wrong thing as a result can provoke quarrels and misunderstandings between the two parties. If today's mobile phones could provide emotion information in this respect, recognizing the other party's emotional state while speaking, they would markedly improve interpersonal communication.

[Summary of the Invention]
In view of the above, there is a need for a mobile phone and method that recognize the emotional state of the calling party during a call.

A mobile phone implementing incoming-call emotion recognition includes: a voice recording unit for recording the other party's incoming voice as an analog voice signal; an A/D converter for converting the analog voice signal into a digital voice signal; a feature extraction unit for separating the voiced speech data from the unvoiced speech data in the digital voice signal by the endpoint-detection principle, and for extracting different feature parameters from the voiced voice signal according to its frequency; an emotion classifier for reading the emotion feature data corresponding to the voiced voice signal according to the different feature parameters, and for compiling classification statistics on the emotion feature data read so as to produce classification statistics of the emotion features; and an emotion output unit for generating an emotion analysis report of the calling party according to the classification statistics produced by the emotion classifier.
A method for emotion recognition of incoming calls includes the steps of: recording the other party's incoming voice as an analog voice signal; converting the analog voice signal into a digital voice signal; separating the voiced speech data from the unvoiced speech data in the digital voice signal by the endpoint-detection principle; extracting different feature parameters from the voiced voice signal according to its frequency; reading the emotion feature data corresponding to the voiced voice signal according to the different feature parameters; compiling classification statistics on the emotion feature data read to produce classification statistics of the emotion features; and generating an emotion analysis report of the calling party according to the classification statistics.

Compared with the prior art, the mobile phone and method described above can recognize the other party's emotional state during a call, thereby improving the quality of the conversation between the two parties.

[Embodiments]
Referring to FIG. 1, a structural diagram of a preferred embodiment of a mobile phone 8 implementing incoming-call emotion recognition according to the present invention: in this embodiment, the mobile phone 8 includes a voice recording unit 1, an analog-to-digital (A/D) converter 2, a feature extraction unit 3, a memory 4, an emotion classifier 5, an emotion output unit 6, and a display screen 7. The voice recording unit 1 records the other party's incoming voice as an analog voice signal and transmits the analog voice signal to the A/D converter 2.
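The sequence of steps above can be sketched end to end as follows. This is only an illustrative sketch, not the patented implementation: the magnitude threshold stands in for endpoint detection, the mean-magnitude feature stands in for MFCC extraction, and all names, thresholds, and template values are assumptions.

```python
def endpoint_detect(samples, threshold=0.1):
    """Keep only the 'voiced' samples whose magnitude exceeds a threshold."""
    return [s for s in samples if abs(s) > threshold]

def extract_feature(voiced):
    """Summarize the voiced samples with a single stand-in feature value."""
    return sum(abs(s) for s in voiced) / len(voiced)

def classify(feature, templates):
    """Return the stored emotion whose reference value lies nearest."""
    return min(templates, key=lambda emotion: abs(templates[emotion] - feature))

# Hypothetical emotion feature data, standing in for the stored templates.
templates = {"angry": 0.9, "happy": 0.6, "neutral": 0.3, "sadness": 0.15}

digital_signal = [0.02, 0.75, 0.8, -0.7, 0.05, -0.01, 0.65]
voiced = endpoint_detect(digital_signal)
feature = extract_feature(voiced)
emotion = classify(feature, templates)
```

The sample signal is assigned whichever stored emotion its feature value lies nearest; the report-generation step would then summarize such assignments over the whole call.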
The A/D converter 2 converts the analog voice signal into a digital voice signal. The feature extraction unit 3 separates the voiced speech data from the unvoiced speech data in the digital voice signal by the endpoint-detection principle, and extracts different feature parameters from the voiced voice signal according to its frequency. How the endpoint-detection principle separates the voiced speech data from the unvoiced speech data is described in detail below with reference to FIG. 2. The feature parameters are acoustic parameters that describe speech characteristics, for example Mel-Frequency Cepstrum Coefficients (MFCC).

The memory 4 stores the emotion feature data corresponding to the different feature parameters. For example, a feature parameter A corresponds to one item of emotion feature data (e.g., "angry"). The emotion feature data are predefined by the mobile phone manufacturer. In this embodiment, the emotion feature data are stored directly in the memory 4 of the mobile phone 8; in other embodiments, they may be stored in a network database of the mobile phone operator.

The emotion classifier 5 reads the emotion feature data corresponding to the voiced voice signal from the memory 4 according to the different feature parameters, and compiles classification statistics on the emotion feature data read to produce classification statistics of the emotion features. The emotion classifier 5 classifies the emotion feature data on the principle that similar data share the same characteristics: for example, if the MFCC values of two voiced voice signals differ by no more than a preset value, the two voiced voice signals are similar and correspond to the same emotion feature (e.g., "angry").
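The similarity rule above can be sketched as follows: two voiced segments are grouped under the same emotion when their MFCC-style feature vectors differ by no more than a preset value. The vectors, the distance measure, and the preset are invented for illustration; the patent does not specify them.

```python
import math

PRESET = 0.5  # hypothetical preset difference limit

def mfcc_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_emotion(a, b, preset=PRESET):
    """True when the two segments' features differ by at most the preset."""
    return mfcc_distance(a, b) <= preset

seg1 = [1.0, 0.2, -0.3]
seg2 = [1.1, 0.1, -0.2]   # close to seg1, so grouped with it
seg3 = [2.5, -1.0, 0.8]   # far from seg1, so a different emotion
```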
In this embodiment, the emotion classifier 5 judges the other party's current emotion from the emotion feature with the highest value in the classification statistics. For example, if the classification statistics are: sadness = 4, angry = 2, happy = 1, neutral = 1, and bored = 0, the emotion classifier 5 determines that the emotion category is "sadness".
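The tallying and decision rule in this example can be sketched as follows. Only the "highest count wins" rule comes from the description; the per-segment assignments are invented so that the tally matches the example figures.

```python
from collections import Counter

# Hypothetical per-segment emotion assignments as they might come out of
# the classifier for one call.
assignments = ["sadness", "sadness", "angry", "sadness", "happy",
               "angry", "neutral", "sadness"]

# Classification statistics: sadness=4, angry=2, happy=1, neutral=1, bored=0.
stats = Counter(assignments)

def dominant_emotion(stats):
    """Report the emotion with the highest count in the statistics."""
    return max(stats, key=stats.get)
```

Note that `Counter` returns 0 for emotions that never occur, so "bored = 0" needs no special handling.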
情緒輸出單元6用於根據情緒特徵之分類統計資料生 成來電對方之情緒分析報告,並將該情緒分析報告輸出並 顯不在手機8之顯示熒幕7上。所述之情緒分析報告包括 生氣度、厭倦度、快樂度、平常度及悲傷度,從而讓使用 者瞭解對方通話時之情緒狀態。 參閱圖2所示’係圖1中之特徵擷取單元3利用端點 偵測原理切割有聲語音與無聲語音之示意圖。本實施例 中’端點偵測主要目的係為在切割出語音訊號中之有聲資 料與無聲資料’其依據某一個時間内語音訊號中之能量或 越零率。如圖2所示,“Enl”表示一個能量保守值,若語音 訊號之能量小於等於該能量保守值“Enl”,則特徵擷取單元 3判定該語音訊號為無聲語音;若語音訊號之能量大於該 能量保守值“Enl”,則特徵擷取單元3判定該語音訊號為有 聲語音。“En2”表示一個比“Enl”大之開始能量值,若某一 時刻“U”之語音訊號能量大於能量值“En2”,則該時刻“tl” 即為該語音有聲訊號之開始。“EnEnd”表示一個比“Enl,,小 之終點能量值,若某一時刻“t2”之語音訊號能量小於能量 值EnEnd”,則該時刻“t2”即為該語音有聲訊號之結束。特 徵擷取單元3將時刻“tl”到時刻“t2”之間之按能量值之大 201008222 Π語切割出聲語音資料與無聲語音資料。在圖 有聲1盘採用越零率“ZCR ”來切割出語音訊射之 原理相同,因此本實施例不再做詳細地蘭述 判斷 佳實施不’係本發明手機來電情緒辨識之方法較 製為類比圖。㊄音錄製單元1將對方之來電語音錄 W為類比語音訊號,並將 曰% 參 器2 (步驟S31)。A/n魅員比"訊说傳送給A/D轉換 位語音訊號(步驟议)換盗2將類比語音訊號轉換為數 之二= 偵測原理將數位語音訊號中 音訊號t獲取4;=_切割開來,以便從數位語 根據有聲語音訊號之頻率大小特徵擷取單元3 之特徵參數(步驟S34),有聲曰訊號中擷取不同 語音訊號中之有聲纽立5Γ利用端點備測原理切割數位 情緒分類不=聲語音資料如圖2描述。 百心。日减對應之情崎” 賈取 器5對讀取之情緒特徵資料 驟^)。情緒分類 之分類統計資料(步驟生情緒特徵 具有同類特徵之原理對讀取 九5利用相近資料 計。例如,特徵擷取單^ 3擷月聲特^貝料進行分類統The emotion output unit 6 is configured to generate an emotion analysis report of the calling party based on the classification statistics of the emotion characteristics, and output the emotion analysis report and display it on the display screen 7 of the mobile phone 8. The sentiment analysis report includes anger, tiredness, happiness, normality, and sadness, so that the user can understand the emotional state of the other party during the call. Referring to Fig. 2, the feature capturing unit 3 in Fig. 1 cuts the schematic diagram of the voiced voice and the voiceless voice by using the endpoint detection principle. In this embodiment, the main purpose of the endpoint detection is to cut the voiced data and the unvoiced data in the voice signal by the energy or the zero rate in the voice signal at a certain time. As shown in FIG. 2, "Enl" represents a conservative value of energy. 
If the energy of the voice signal is less than or equal to the conservative value "Enl" of the energy, the feature extraction unit 3 determines that the voice signal is silent voice; if the energy of the voice signal is greater than The energy conservation value "Enl", the feature extraction unit 3 determines that the voice signal is voiced speech. "En2" indicates a starting energy value larger than "Enl". If the voice signal energy of "U" at a certain time is greater than the energy value "En2", then the time "tl" is the beginning of the voiced voice signal. "EnEnd" indicates an end energy value that is smaller than "Enl,". If the voice signal energy of a certain time "t2" is less than the energy value EnEnd", the time "t2" is the end of the voiced signal. The feature extracting unit 3 cuts out the voice data and the silent voice data from the time value "tl" to the time "t2" according to the energy value of the 201008222 slang. In the figure, the principle that the zero-rate "ZCR" is used to cut out the voice signal is the same. Therefore, the method of the present invention is not described in detail. Analog map. The five-tone recording unit 1 records the incoming call voice of the other party as an analog voice signal, and 曰% the parameter 2 (step S31). A/n charm ratio than "communication" is transmitted to the A/D conversion bit voice signal (step negotiation) change theft 2 to convert the analog voice signal into the number two = detection principle will digital voice signal midrange signal t get 4; _cutting, so as to extract the characteristic parameters of the unit 3 according to the frequency size characteristics of the voiced voice signal from the digital language (step S34), and picking up the voice in the voice signal from the voice signal, using the terminal preparation principle Cutting digital emotion classification is not = acoustic speech data as depicted in Figure 2. Hundreds of hearts. 
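The FIG. 2 thresholding rule can be sketched over a sequence of per-frame energies as follows: a voiced segment starts at the first frame whose energy rises above En2 and ends at the first later frame whose energy falls below EnEnd. The threshold values (with EnEnd < En1 < En2) and the energy trace are invented for illustration; a real implementation would derive frame energies (or zero-crossing rates) from the digital voice signal.

```python
EN1, EN2, EN_END = 0.2, 0.5, 0.1   # EnEnd < En1 < En2

def is_voiced(energy):
    """Per-frame voiced/unvoiced judgment against the conservative value En1."""
    return energy > EN1

def find_voiced_segment(energies):
    """Return (t1, t2): start and end frame indices of the voiced segment."""
    start = end = None
    for t, e in enumerate(energies):
        if start is None and e > EN2:
            start = t            # t1: energy exceeds the start threshold En2
        elif start is not None and e < EN_END:
            end = t              # t2: energy drops below the end threshold EnEnd
            break
    return start, end

energies = [0.05, 0.08, 0.6, 0.9, 0.7, 0.3, 0.05, 0.02]
t1, t2 = find_voiced_segment(energies)
```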
The emotion classifier 5 reads from the memory 4 the emotion feature data corresponding to the voiced voice signals (step S35), and compiles classification statistics on the emotion feature data read to produce classification statistics of the emotion features (step S36). The emotion classifier 5 classifies the data on the principle that similar data share the same characteristics; for example, the feature extraction unit 3 extracts the MFCC values of the voiced voice signals as the feature parameters used for classification.
情緒分類器5將MFCC值進行相鄰距離計算,取個L 離最短之情緒資料定義語音之情個值距 傷度(一)=4,生氣戶(緒特徵,如果修5’悲 玍孔度Ungry)=2,快樂度( 201008222 • =1,中性度(neutral) =1及厭倦度(bored) =0,則情緒 分類器5判定該情緒類別係為“悲傷(sadness )’’。 情緒輸出單元6根據情绪分類器5產生之分類統計資 料生成來電對方之情緒分析報告。所述之情緒分析報告描 迷了對方通話時之情緒狀態,其包括生氣度、厭倦度、快 樂度、平常度及悲傷度(步驟S37)。最後,情緒輸出單元 6將該情緒分析報告輸出並顯示在手機8之顯示螢幕7 魯上’以供使用者瞭解對方通話時之情緒狀態(步驟S38)。 本發明雖以較佳實施方式揭露如上,然其並非用以限 定本發明。任何熟悉此項技藝者,在不脫離本發明之精神 和範圍内’當可做更動與潤飾,因此本發明之保護範圍當 視後附之申請專利範圍所界定者為準。 【圖式簡單說明】 圖1係本發明實現來電情緒辨識之手機較佳實施例之 結構圖。 » 圖2係圖1中之特徵擷取單元利用端點偵測原理切割 有聲語音與無聲語音之示意圖。 圖3係本發明實現手機來電情緒辨識之方法較佳實施 例之流程圖。 【主要元件符號說明】 語音錄製單元 丄 A/D轉換器 2 特徵擷取單元 3 記憶體 , 201008222 情緒分類器 情緒輸出單元 顯示熒幕 手機The emotion classifier 5 calculates the MFCC value for the adjacent distance, and takes an L from the shortest emotional data to define the voice. The value of the sentiment is from the injury degree (1) = 4, angry households (when the characteristics, if the repair 5' grief Ungry)=2, happiness (201008222 • =1, neutrality =1 and bored =0, the emotion classifier 5 determines that the emotion category is “sadness”. The output unit 6 generates an emotional analysis report of the incoming caller according to the classified statistical data generated by the emotion classifier 5. The sentiment analysis report describes the emotional state of the other party's call, including anger, tiredness, happiness, and normality. And the degree of sadness (step S37). Finally, the emotion output unit 6 outputs and displays the sentiment analysis report on the display screen 7 of the mobile phone 8 for the user to know the emotional state of the other party's call (step S38). The present invention is not limited to the scope of the present invention, and may be modified and retouched, and thus the protection of the present invention. BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a structural diagram of a preferred embodiment of a mobile phone for implementing call emotion recognition according to the present invention. FIG. 2 is a feature capture of FIG. 
The unit uses the endpoint detection principle to cut the schematic diagram of the voiced voice and the voiceless voice. Fig. 3 is a flow chart of a preferred embodiment of the method for realizing the emotional recognition of the incoming call of the mobile phone according to the present invention. [Description of main component symbols] Voice recording unit 丄 A/D conversion Device 2 feature extraction unit 3 memory, 201008222 emotion classifier emotion output unit display screen mobile phone
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW97131191A TW201008222A (en) | 2008-08-15 | 2008-08-15 | A mobile phone for emotion recognition of incoming-phones and a method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
TW201008222A true TW201008222A (en) | 2010-02-16 |
Family
ID=44827370
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW97131191A TW201008222A (en) | 2008-08-15 | 2008-08-15 | A mobile phone for emotion recognition of incoming-phones and a method thereof |
Country Status (1)
Country | Link |
---|---|
TW (1) | TW201008222A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9329677B2 (en) | 2011-12-29 | 2016-05-03 | National Taiwan University | Social system and method used for bringing virtual social network into real life |
TWI684148B (en) * | 2014-02-26 | 2020-02-01 | 華為技術有限公司 | Grouping processing method and device of contact person |
- 2008-08-15: Application TW97131191A filed in Taiwan (TW); published as TW201008222A. Legal status: unknown.