TW201008222A - A mobile phone for emotion recognition of incoming-phones and a method thereof - Google Patents
- Publication number
- TW201008222A (application TW97131191A)
- Authority
- TW
- Taiwan
- Prior art keywords
- emotional
- mobile phone
- voice
- voice signal
- data
- Prior art date
Landscapes
- Telephonic Communication Services (AREA)
- Telephone Function (AREA)
Abstract
Description
201008222

IX. Description of the Invention:

[Technical Field of the Invention]
The present invention relates to speech processing technology, and more particularly to a mobile phone and method for emotion recognition of incoming calls.

[Prior Art]
According to research, humans exhibit five basic emotional responses: anger, boredom, happiness, neutrality, and sadness. Busy modern people commonly use the telephone as the medium for communicating and keeping in touch with family, friends, and colleagues. Because a telephone conversation is not face-to-face, a caller often cannot tell the current emotional state of the other party, and may misread the emotion behind the other party's words; saying the wrong thing as a result can provoke quarrels and misunderstandings between the two parties. If today's mobile phones could provide emotion information in this respect, recognizing the other party's emotional state while speaking, they would markedly improve interpersonal communication.

[Summary of the Invention]
In view of the above, there is a need for a mobile phone and method that recognize the emotional state of the calling party during a call.

A mobile phone implementing incoming-call emotion recognition includes: a voice recording unit for recording the other party's incoming voice as an analog voice signal; an A/D converter for converting the analog voice signal into a digital voice signal; a feature extraction unit for separating the voiced speech data from the unvoiced speech data in the digital voice signal by the endpoint-detection principle, and for extracting different feature parameters from the voiced voice signal according to its frequency; an emotion classifier for reading the emotion feature data corresponding to the voiced voice signal according to the different feature parameters, and for compiling classification statistics on the emotion feature data read so as to produce classification statistics of the emotion features; and an emotion output unit for generating an emotion analysis report of the calling party according to the classification statistics produced by the emotion classifier.
A method for emotion recognition of incoming calls includes the steps of: recording the other party's incoming voice as an analog voice signal; converting the analog voice signal into a digital voice signal; separating the voiced speech data from the unvoiced speech data in the digital voice signal by the endpoint-detection principle; extracting different feature parameters from the voiced voice signal according to its frequency; reading the emotion feature data corresponding to the voiced voice signal according to the different feature parameters; compiling classification statistics on the emotion feature data read to produce classification statistics of the emotion features; and generating an emotion analysis report of the calling party according to the classification statistics.

Compared with the prior art, the mobile phone and method described above can recognize the other party's emotional state during a call, thereby improving the quality of the conversation between the two parties.

[Embodiments]
Referring to FIG. 1, a structural diagram of a preferred embodiment of a mobile phone 8 implementing incoming-call emotion recognition according to the present invention: in this embodiment, the mobile phone 8 includes a voice recording unit 1, an analog-to-digital (A/D) converter 2, a feature extraction unit 3, a memory 4, an emotion classifier 5, an emotion output unit 6, and a display screen 7. The voice recording unit 1 records the other party's incoming voice as an analog voice signal and transmits the analog voice signal to the A/D converter 2.
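The sequence of steps above can be sketched end to end as follows. This is only an illustrative sketch, not the patented implementation: the magnitude threshold stands in for endpoint detection, the mean-magnitude feature stands in for MFCC extraction, and all names, thresholds, and template values are assumptions.

```python
def endpoint_detect(samples, threshold=0.1):
    """Keep only the 'voiced' samples whose magnitude exceeds a threshold."""
    return [s for s in samples if abs(s) > threshold]

def extract_feature(voiced):
    """Summarize the voiced samples with a single stand-in feature value."""
    return sum(abs(s) for s in voiced) / len(voiced)

def classify(feature, templates):
    """Return the stored emotion whose reference value lies nearest."""
    return min(templates, key=lambda emotion: abs(templates[emotion] - feature))

# Hypothetical emotion feature data, standing in for the stored templates.
templates = {"angry": 0.9, "happy": 0.6, "neutral": 0.3, "sadness": 0.15}

digital_signal = [0.02, 0.75, 0.8, -0.7, 0.05, -0.01, 0.65]
voiced = endpoint_detect(digital_signal)
feature = extract_feature(voiced)
emotion = classify(feature, templates)
```

The sample signal is assigned whichever stored emotion its feature value lies nearest; the report-generation step would then summarize such assignments over the whole call.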
The A/D converter 2 converts the analog voice signal into a digital voice signal. The feature extraction unit 3 separates the voiced speech data from the unvoiced speech data in the digital voice signal by the endpoint-detection principle, and extracts different feature parameters from the voiced voice signal according to its frequency. How the endpoint-detection principle separates the voiced speech data from the unvoiced speech data is described in detail below with reference to FIG. 2. The feature parameters are acoustic parameters that describe speech characteristics, for example Mel-Frequency Cepstrum Coefficients (MFCC).

The memory 4 stores the emotion feature data corresponding to the different feature parameters. For example, a feature parameter A corresponds to one item of emotion feature data (e.g., "angry"). The emotion feature data are predefined by the mobile phone manufacturer. In this embodiment, the emotion feature data are stored directly in the memory 4 of the mobile phone 8; in other embodiments, they may be stored in a network database of the mobile phone operator.

The emotion classifier 5 reads the emotion feature data corresponding to the voiced voice signal from the memory 4 according to the different feature parameters, and compiles classification statistics on the emotion feature data read to produce classification statistics of the emotion features. The emotion classifier 5 classifies the emotion feature data on the principle that similar data share the same characteristics: for example, if the MFCC values of two voiced voice signals differ by no more than a preset value, the two voiced voice signals are similar and correspond to the same emotion feature (e.g., "angry").
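The similarity rule above can be sketched as follows: two voiced segments are grouped under the same emotion when their MFCC-style feature vectors differ by no more than a preset value. The vectors, the distance measure, and the preset are invented for illustration; the patent does not specify them.

```python
import math

PRESET = 0.5  # hypothetical preset difference limit

def mfcc_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_emotion(a, b, preset=PRESET):
    """True when the two segments' features differ by at most the preset."""
    return mfcc_distance(a, b) <= preset

seg1 = [1.0, 0.2, -0.3]
seg2 = [1.1, 0.1, -0.2]   # close to seg1, so grouped with it
seg3 = [2.5, -1.0, 0.8]   # far from seg1, so a different emotion
```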
In this embodiment, the emotion classifier 5 judges the other party's current emotion from the emotion feature with the highest value in the classification statistics. For example, if the classification statistics are: sadness = 4, angry = 2, happy = 1, neutral = 1, and bored = 0, the emotion classifier 5 determines that the emotion category is "sadness".
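The tallying and decision rule in this example can be sketched as follows. Only the "highest count wins" rule comes from the description; the per-segment assignments are invented so that the tally matches the example figures.

```python
from collections import Counter

# Hypothetical per-segment emotion assignments as they might come out of
# the classifier for one call.
assignments = ["sadness", "sadness", "angry", "sadness", "happy",
               "angry", "neutral", "sadness"]

# Classification statistics: sadness=4, angry=2, happy=1, neutral=1, bored=0.
stats = Counter(assignments)

def dominant_emotion(stats):
    """Report the emotion with the highest count in the statistics."""
    return max(stats, key=stats.get)
```

Note that `Counter` returns 0 for emotions that never occur, so "bored = 0" needs no special handling.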
情緒輸出單元6用於根據情緒特徵之分類統計資料生 成來電對方之情緒分析報告,並將該情緒分析報告輸出並 顯不在手機8之顯示熒幕7上。所述之情緒分析報告包括 生氣度、厭倦度、快樂度、平常度及悲傷度,從而讓使用 者瞭解對方通話時之情緒狀態。 參閱圖2所示’係圖1中之特徵擷取單元3利用端點 偵測原理切割有聲語音與無聲語音之示意圖。本實施例 中’端點偵測主要目的係為在切割出語音訊號中之有聲資 料與無聲資料’其依據某一個時間内語音訊號中之能量或 越零率。如圖2所示,“Enl”表示一個能量保守值,若語音 訊號之能量小於等於該能量保守值“Enl”,則特徵擷取單元 3判定該語音訊號為無聲語音;若語音訊號之能量大於該 能量保守值“Enl”,則特徵擷取單元3判定該語音訊號為有 聲語音。“En2”表示一個比“Enl”大之開始能量值,若某一 時刻“U”之語音訊號能量大於能量值“En2”,則該時刻“tl” 即為該語音有聲訊號之開始。“EnEnd”表示一個比“Enl,,小 之終點能量值,若某一時刻“t2”之語音訊號能量小於能量 值EnEnd”,則該時刻“t2”即為該語音有聲訊號之結束。特 徵擷取單元3將時刻“tl”到時刻“t2”之間之按能量值之大 201008222 Π語切割出聲語音資料與無聲語音資料。在圖 有聲1盘採用越零率“ZCR ”來切割出語音訊射之 原理相同,因此本實施例不再做詳細地蘭述 判斷 佳實施不’係本發明手機來電情緒辨識之方法較 製為類比圖。㊄音錄製單元1將對方之來電語音錄 W為類比語音訊號,並將 曰% 參 器2 (步驟S31)。A/n魅員比"訊说傳送給A/D轉換 位語音訊號(步驟议)換盗2將類比語音訊號轉換為數 之二= 偵測原理將數位語音訊號中 音訊號t獲取4;=_切割開來,以便從數位語 根據有聲語音訊號之頻率大小特徵擷取單元3 之特徵參數(步驟S34),有聲曰訊號中擷取不同 語音訊號中之有聲纽立5Γ利用端點備測原理切割數位 情緒分類不=聲語音資料如圖2描述。 百心。日减對應之情崎” 賈取 器5對讀取之情緒特徵資料 驟^)。情緒分類 之分類統計資料(步驟生情緒特徵 具有同類特徵之原理對讀取 九5利用相近資料 計。例如,特徵擷取單^ 3擷月聲特^貝料進行分類統The emotion output unit 6 is configured to generate an emotion analysis report of the calling party based on the classification statistics of the emotion characteristics, and output the emotion analysis report and display it on the display screen 7 of the mobile phone 8. The sentiment analysis report includes anger, tiredness, happiness, normality, and sadness, so that the user can understand the emotional state of the other party during the call. Referring to Fig. 2, the feature capturing unit 3 in Fig. 1 cuts the schematic diagram of the voiced voice and the voiceless voice by using the endpoint detection principle. In this embodiment, the main purpose of the endpoint detection is to cut the voiced data and the unvoiced data in the voice signal by the energy or the zero rate in the voice signal at a certain time. As shown in FIG. 2, "Enl" represents a conservative value of energy. 
If the energy of the voice signal is less than or equal to the conservative value "Enl" of the energy, the feature extraction unit 3 determines that the voice signal is silent voice; if the energy of the voice signal is greater than The energy conservation value "Enl", the feature extraction unit 3 determines that the voice signal is voiced speech. "En2" indicates a starting energy value larger than "Enl". If the voice signal energy of "U" at a certain time is greater than the energy value "En2", then the time "tl" is the beginning of the voiced voice signal. "EnEnd" indicates an end energy value that is smaller than "Enl,". If the voice signal energy of a certain time "t2" is less than the energy value EnEnd", the time "t2" is the end of the voiced signal. The feature extracting unit 3 cuts out the voice data and the silent voice data from the time value "tl" to the time "t2" according to the energy value of the 201008222 slang. In the figure, the principle that the zero-rate "ZCR" is used to cut out the voice signal is the same. Therefore, the method of the present invention is not described in detail. Analog map. The five-tone recording unit 1 records the incoming call voice of the other party as an analog voice signal, and 曰% the parameter 2 (step S31). A/n charm ratio than "communication" is transmitted to the A/D conversion bit voice signal (step negotiation) change theft 2 to convert the analog voice signal into the number two = detection principle will digital voice signal midrange signal t get 4; _cutting, so as to extract the characteristic parameters of the unit 3 according to the frequency size characteristics of the voiced voice signal from the digital language (step S34), and picking up the voice in the voice signal from the voice signal, using the terminal preparation principle Cutting digital emotion classification is not = acoustic speech data as depicted in Figure 2. Hundreds of hearts. 
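The FIG. 2 thresholding rule can be sketched over a sequence of per-frame energies as follows: a voiced segment starts at the first frame whose energy rises above En2 and ends at the first later frame whose energy falls below EnEnd. The threshold values (with EnEnd < En1 < En2) and the energy trace are invented for illustration; a real implementation would derive frame energies (or zero-crossing rates) from the digital voice signal.

```python
EN1, EN2, EN_END = 0.2, 0.5, 0.1   # EnEnd < En1 < En2

def is_voiced(energy):
    """Per-frame voiced/unvoiced judgment against the conservative value En1."""
    return energy > EN1

def find_voiced_segment(energies):
    """Return (t1, t2): start and end frame indices of the voiced segment."""
    start = end = None
    for t, e in enumerate(energies):
        if start is None and e > EN2:
            start = t            # t1: energy exceeds the start threshold En2
        elif start is not None and e < EN_END:
            end = t              # t2: energy drops below the end threshold EnEnd
            break
    return start, end

energies = [0.05, 0.08, 0.6, 0.9, 0.7, 0.3, 0.05, 0.02]
t1, t2 = find_voiced_segment(energies)
```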
The emotion classifier 5 reads from the memory 4 the emotion feature data corresponding to the voiced voice signals (step S35), and compiles classification statistics on the emotion feature data read to produce classification statistics of the emotion features (step S36). The emotion classifier 5 classifies the data on the principle that similar data share the same characteristics; for example, the feature extraction unit 3 extracts the MFCC values of the voiced voice signals as the feature parameters used for classification.
情緒分類器5將MFCC值進行相鄰距離計算,取個L 離最短之情緒資料定義語音之情個值距 傷度(一)=4,生氣戶(緒特徵,如果修5’悲 玍孔度Ungry)=2,快樂度( 201008222 • =1,中性度(neutral) =1及厭倦度(bored) =0,則情緒 分類器5判定該情緒類別係為“悲傷(sadness )’’。 情緒輸出單元6根據情绪分類器5產生之分類統計資 料生成來電對方之情緒分析報告。所述之情緒分析報告描 迷了對方通話時之情緒狀態,其包括生氣度、厭倦度、快 樂度、平常度及悲傷度(步驟S37)。最後,情緒輸出單元 6將該情緒分析報告輸出並顯示在手機8之顯示螢幕7 魯上’以供使用者瞭解對方通話時之情緒狀態(步驟S38)。 本發明雖以較佳實施方式揭露如上,然其並非用以限 定本發明。任何熟悉此項技藝者,在不脫離本發明之精神 和範圍内’當可做更動與潤飾,因此本發明之保護範圍當 視後附之申請專利範圍所界定者為準。 【圖式簡單說明】 圖1係本發明實現來電情緒辨識之手機較佳實施例之 結構圖。 » 圖2係圖1中之特徵擷取單元利用端點偵測原理切割 有聲語音與無聲語音之示意圖。 圖3係本發明實現手機來電情緒辨識之方法較佳實施 例之流程圖。 【主要元件符號說明】 語音錄製單元 丄 A/D轉換器 2 特徵擷取單元 3 記憶體 , 201008222 情緒分類器 情緒輸出單元 顯示熒幕 手機The emotion classifier 5 calculates the MFCC value for the adjacent distance, and takes an L from the shortest emotional data to define the voice. The value of the sentiment is from the injury degree (1) = 4, angry households (when the characteristics, if the repair 5' grief Ungry)=2, happiness (201008222 • =1, neutrality =1 and bored =0, the emotion classifier 5 determines that the emotion category is “sadness”. The output unit 6 generates an emotional analysis report of the incoming caller according to the classified statistical data generated by the emotion classifier 5. The sentiment analysis report describes the emotional state of the other party's call, including anger, tiredness, happiness, and normality. And the degree of sadness (step S37). Finally, the emotion output unit 6 outputs and displays the sentiment analysis report on the display screen 7 of the mobile phone 8 for the user to know the emotional state of the other party's call (step S38). The present invention is not limited to the scope of the present invention, and may be modified and retouched, and thus the protection of the present invention. BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a structural diagram of a preferred embodiment of a mobile phone for implementing call emotion recognition according to the present invention. FIG. 2 is a feature capture of FIG. 
The unit uses the endpoint detection principle to cut the schematic diagram of the voiced voice and the voiceless voice. Fig. 3 is a flow chart of a preferred embodiment of the method for realizing the emotional recognition of the incoming call of the mobile phone according to the present invention. [Description of main component symbols] Voice recording unit 丄 A/D conversion Device 2 feature extraction unit 3 memory, 201008222 emotion classifier emotion output unit display screen mobile phone
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW97131191A TW201008222A (en) | 2008-08-15 | 2008-08-15 | A mobile phone for emotion recognition of incoming-phones and a method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
TW201008222A true TW201008222A (en) | 2010-02-16 |
Family
ID=44827370
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW97131191A TW201008222A (en) | 2008-08-15 | 2008-08-15 | A mobile phone for emotion recognition of incoming-phones and a method thereof |
Country Status (1)
Country | Link |
---|---|
TW (1) | TW201008222A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9329677B2 (en) | 2011-12-29 | 2016-05-03 | National Taiwan University | Social system and method used for bringing virtual social network into real life |
TWI684148B (en) * | 2014-02-26 | 2020-02-01 | 華為技術有限公司 | Grouping processing method and device of contact person |
- 2008-08-15: Application TW97131191A filed in Taiwan (TW); published as TW201008222A. Legal status: unknown.