KR20110017559A - Method and apparatus for analyzing emotion - Google Patents

Method and apparatus for analyzing emotion Download PDF

Info

Publication number
KR20110017559A
Authority
KR
South Korea
Prior art keywords
voice
emotional state
information
feature
emotion
Prior art date
Application number
KR1020090075078A
Other languages
Korean (ko)
Inventor
이군섭
Original Assignee
에스케이 텔레콤주식회사
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 에스케이 텔레콤주식회사 filed Critical 에스케이 텔레콤주식회사
Priority to KR1020090075078A priority Critical patent/KR20110017559A/en
Publication of KR20110017559A publication Critical patent/KR20110017559A/en

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W88/00 Devices specially adapted for wireless communication networks, e.g. terminals, base stations or access point devices
    • H04W88/18 Service support devices; Network management devices

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephonic Communication Services (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

PURPOSE: A method and an apparatus for analyzing emotion are provided to accurately analyze various emotions of a voice call subscriber. CONSTITUTION: A DB accumulation unit(210) samples voices from a voice call between mobile terminals and extracts a first voice feature from the sampled voice. The DB accumulation unit then stores the voice feature in a DB by matching an emotional state to the extracted first voice feature. An emotional state analysis unit(220) extracts a second voice feature from the voice input to a communication terminal requesting an emotional state, and analyzes the emotional state by using the voice feature DB and the extracted second voice feature.

Description

Method and Apparatus for Emotion Analysis {Method And Apparatus for Analyzing Emotion}

One embodiment of the invention relates to a method and apparatus for emotion analysis. More specifically, it relates to an emotion analysis method and apparatus that, when extracting emotions from speech, can analyze various emotions through an adaptive emotion analysis algorithm structure that gradually improves the accuracy of emotion analysis by accumulating a per-individual voice feature DB reflecting the voice characteristics of a specific speaker.

Speech recognition is a technique for identifying phonemes, syllables, and words by analyzing and quantifying the vibrations of a person's speech, exploiting the fact that each person's accent and pitch characteristics differ. Emotion analysis technology, which determines a person's emotional state, analyzes human emotion from the different signals the voice produces depending on that emotional state.

However, although voice-based emotion analysis services have been commercialized, they have not taken hold because the accuracy of voice emotion analysis is low and both performance and user convenience need to improve. In other words, to broaden the fields in which emotion analysis can be applied and to increase its utilization, algorithms that improve analysis accuracy must be developed to raise service reliability, and a variety of everyday services must be offered to expand the user base and diversify the service.

To solve the above problems, an embodiment of the present invention provides a method and apparatus for emotion analysis that can accurately analyze a variety of emotions for a voice call subscriber.

To achieve the above object, an embodiment of the present invention provides an emotion analysis apparatus comprising: a DB accumulator that samples a voice from a voice call between communication terminals, extracts a first voice feature from the sampled voice, and accumulates in memory a voice feature DB in which an emotional state is matched to the extracted first voice feature and stored; and an emotional state analyzer that extracts a second voice feature from a voice input to a communication terminal requesting an emotional state and analyzes the emotional state using the extracted second voice feature and the accumulated voice feature DB.

According to another aspect of the invention, there is provided an emotion analysis method comprising: a voice sampling step of sampling a voice from a voice call between communication terminals; a step of extracting a first voice feature from the sampled voice; a voice feature DB accumulation step of accumulating a voice feature DB in which an emotional state is matched to the extracted first voice feature and stored; a second voice feature extraction step of extracting a second voice feature from a voice input to a communication terminal requesting an emotional state; and an emotional state analysis step of analyzing the emotional state using the extracted second voice feature and the accumulated voice feature DB.
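As a rough orientation, the sketch below strings those five steps together in Python. Every identifier (extract_features, the toy down-sampling, the nearest-feature comparison) is an illustrative assumption; the patent does not prescribe any particular data structures or matching rule.

```python
# Minimal sketch of the five-step method described above; all names are
# illustrative assumptions, not identifiers from the patent.
from typing import Dict, List, Tuple

VoiceFeature = Dict[str, float]              # e.g. {"pitch": ..., "energy": ...}
FeatureDB = List[Tuple[VoiceFeature, str]]   # (feature vector, emotional state)

def accumulate_voice_feature_db(call_audio: List[float],
                                emotional_state: str,
                                db: FeatureDB) -> None:
    """Steps 1-3: sample the call, extract a first voice feature,
    and store it in the DB matched to an emotional state."""
    sampled = call_audio[::2]                # toy down-sampling stand-in
    first_feature = extract_features(sampled)
    db.append((first_feature, emotional_state))

def analyze_emotional_state(input_audio: List[float], db: FeatureDB) -> str:
    """Steps 4-5: extract a second voice feature from the requesting
    terminal's input and compare it against the accumulated DB."""
    second_feature = extract_features(input_audio)

    def distance(f: VoiceFeature) -> float:
        # nearest stored feature wins (simplistic stand-in for the analysis)
        return sum((f[k] - second_feature.get(k, 0.0)) ** 2 for k in f)

    _, state = min(db, key=lambda item: distance(item[0]))
    return state

def extract_features(samples: List[float]) -> VoiceFeature:
    """Placeholder feature extractor (mean energy only)."""
    energy = sum(s * s for s in samples) / max(len(samples), 1)
    return {"energy": energy}
```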

As described above, according to an exemplary embodiment of the present invention, the voice features of all voice call subscribers can be extracted and accumulated in a DB, and more accurate and more varied emotions of a voice call subscriber can be analyzed through the accumulated DB.

In addition, according to an embodiment of the present invention, a DB matching emotional states for each individual can be accumulated by extracting voice features from voices whose emotional state has not been confirmed, taken from the voice calls of all mobile communication subscribers.

Hereinafter, some embodiments of the present invention will be described in detail with reference to exemplary drawings. In assigning reference numerals to the components of each drawing, note that the same components are given the same reference numerals wherever possible, even when they appear in different drawings. In the following description, detailed descriptions of well-known functions and configurations are omitted where they would obscure the subject matter of the present invention.

In describing the components of the present invention, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another; the nature, sequence, or order of the components is not limited by them. When a component is described as being "connected", "coupled", or "linked" to another component, it may be directly connected or linked to that component, but it should be understood that another component may also be "connected", "coupled", or "linked" between them.

FIG. 1 is a block diagram schematically showing an emotion analysis system according to an embodiment of the present invention.

The emotion analysis system according to an embodiment of the present invention includes a communication terminal 110, a wired/wireless communication network 120, and an emotion analysis apparatus 130.

The communication terminal 110 refers to a terminal that performs ordinary voice calls and data communication over wired/wireless communication in association with the wired/wireless communication network 120. The communication terminal 110 may be a wired terminal or a wireless terminal; in the case of a wireless terminal it may be a personal digital assistant (PDA), a cellular phone, a personal communication service (PCS) phone, a hand-held PC, a CDMA-2000 phone, a WCDMA phone, a Portable Multimedia Player (PMP), a PlayStation Portable (PSP), or a Mobile Broadband System (MBS) phone. Meanwhile, the communication terminal 110 may request the analyzed emotional state of the counterpart talker from the emotion analysis apparatus 130 and receive it as an SMS or MMS message.

The wired/wireless communication network 120 refers to a network capable of transmitting and receiving data over the Internet protocol using various wired and wireless communication technologies, such as an Internet network, an intranet, a mobile communication network, or a satellite communication network. The wired/wireless communication network 120, which connects the emotion analysis apparatus 130 and the communication terminals 110, may be a closed network such as a local area network (LAN) or a wide area network (WAN), but is preferably an open network such as the Internet. The Internet is the global open computer network architecture that provides the many services built on TCP/IP and its upper-layer protocols, including HyperText Transfer Protocol (HTTP), Telnet, File Transfer Protocol (FTP), Domain Name System (DNS), Simple Mail Transfer Protocol (SMTP), Simple Network Management Protocol (SNMP), Network File Service (NFS), and Network Information Service (NIS). In addition, when the communication terminal 110 is a mobile communication terminal, the wired/wireless communication network 120 may include a mobile communication network. Since the technology of the wired/wireless communication network 120 is well known, a detailed description thereof is omitted.

The emotion analysis apparatus 130 according to an embodiment of the present invention samples an untrained voice whose emotional state has not been confirmed from a voice call between the communication terminals 110, extracts a first voice feature from the sampled voice, accumulates a voice feature DB in which an emotional state is matched to the extracted first voice feature and stored, extracts a second voice feature from a voice input to a communication terminal 110 requesting an emotional state, and analyzes the emotional state by comparing the extracted second voice feature with the accumulated voice feature DB. In addition, when a specific communication terminal requests information on the analyzed emotional state, the emotion analysis apparatus 130 transmits the analyzed emotional state information to that terminal using an SMS or MMS message. Here, a voice whose emotional state is not confirmed refers to a voice whose emotional state has not yet been confirmed by the emotion analysis apparatus 130.

The emotion analysis apparatus 130 distinguishes the user's own voice from the counterpart's voice based on user information of the communication terminal on the call, or distinguishes them by filtering the counterpart's voice using usage frequency or voice features, and extracts a voice feature from each of the user's own voice and the counterpart's voice. In addition, the emotion analysis apparatus 130 groups the voice features of users calling through the same communication terminal and extracts a voice feature for each grouped user. The emotion analysis apparatus 130 also organizes the accumulated voice feature DB into a metadata structure that includes at least one of age information, speaker-specific information, and other information contained in the extracted first voice feature.
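The following sketch shows one way the per-terminal grouping and the metadata-tagged DB described above could be represented. The field names (age, speaker-specific info) and the per-frame speaker labels are assumptions made for illustration; the patent only names the kinds of information involved.

```python
# Hypothetical record and grouping structure for the accumulated voice feature DB.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class VoiceFeatureRecord:
    features: Dict[str, float]                 # pitch, energy, jitter, ...
    emotional_state: Optional[str] = None      # filled in once matched
    metadata: Dict[str, str] = field(default_factory=dict)  # e.g. {"age": "30s"}

@dataclass
class TerminalVoiceDB:
    # voice features grouped per user calling through the same terminal
    users: Dict[str, List[VoiceFeatureRecord]] = field(default_factory=dict)

    def add(self, user_id: str, record: VoiceFeatureRecord) -> None:
        self.users.setdefault(user_id, []).append(record)

def split_own_and_counterpart(frames, own_user_id, frame_user_ids):
    """Separate the terminal user's own frames from the counterpart's, here
    simply by per-frame speaker labels (a stand-in for filtering by usage
    frequency or voice characteristics)."""
    own = [f for f, uid in zip(frames, frame_user_ids) if uid == own_user_id]
    other = [f for f, uid in zip(frames, frame_user_ids) if uid != own_user_id]
    return own, other
```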

The emotion analysis apparatus 130 analyzes the emotional state using the voice feature of a specific registered speaker when the extracted second voice feature matches an emotional state of the accumulated voice feature DB to a degree equal to or greater than a predetermined value. Conversely, the emotion analysis apparatus 130 analyzes the emotional state using the voice feature of an unspecified speaker when the extracted second voice feature matches the emotional states of the accumulated voice feature DB to a degree less than the predetermined value.
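A minimal sketch of that routing rule follows, assuming a simple feature-similarity score and an arbitrary threshold; the patent specifies neither the similarity measure nor the "predetermined value".

```python
# Route to speaker-dependent analysis when the second voice feature matches a
# registered speaker's DB entries above a threshold, otherwise fall back to
# speaker-independent analysis. Threshold and similarity are assumptions.
from typing import Dict

MATCH_THRESHOLD = 0.8  # stand-in for the patent's unspecified "predetermined value"

def similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Toy similarity: inverse of mean absolute difference over shared keys."""
    keys = set(a) & set(b)
    if not keys:
        return 0.0
    diff = sum(abs(a[k] - b[k]) for k in keys) / len(keys)
    return 1.0 / (1.0 + diff)

def analyze(second_feature, speaker_db, generic_db):
    best = max((similarity(second_feature, f) for f, _ in speaker_db), default=0.0)
    if best >= MATCH_THRESHOLD:
        return speaker_dependent_analysis(second_feature, speaker_db)
    return speaker_independent_analysis(second_feature, generic_db)

def speaker_dependent_analysis(feature, db):
    # pick the emotional state of the closest registered-speaker entry
    return max(db, key=lambda item: similarity(feature, item[0]))[1]

def speaker_independent_analysis(feature, db):
    # same rule, but over features pooled from unspecified speakers
    return max(db, key=lambda item: similarity(feature, item[0]))[1]
```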

The emotion analysis apparatus 130 generates a first analysis result by analyzing the correlation between the voice feature of the counterpart's voice and the extracted second voice feature, and analyzes the emotional state based on the first analysis result. Likewise, the emotion analysis apparatus 130 generates a second analysis result by analyzing the correlation between the voice feature of the ambient noise and the extracted second voice feature, and analyzes the emotional state based on the second analysis result.
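One plausible reading of these "analysis results" is a plain correlation between feature contours, as sketched below; the exact computation is not specified in the patent, so the Pearson correlation and the reliability rule are assumptions.

```python
# Illustrative interpretation of the first/second analysis result: correlate
# the extracted second voice feature contour with (a) the counterpart's voice
# feature contour and (b) the ambient-noise feature contour.
import math
from typing import Sequence

def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def contextual_analysis(second_feature_vec, counterpart_vec, noise_vec):
    first_result = pearson_correlation(second_feature_vec, counterpart_vec)
    second_result = pearson_correlation(second_feature_vec, noise_vec)
    # e.g. a strong correlation with noise suggests the feature is unreliable
    reliable = abs(second_result) < 0.5
    return {"counterpart_corr": first_result,
            "noise_corr": second_result,
            "feature_reliable": reliable}
```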

The emotion analysis apparatus 130 matches and stores the emotional state based on at least one of the following items included in the extracted first voice feature: pitch information, energy information, Teager energy information (roughly amplitude squared times frequency squared), frequency jitter information, amplitude shimmer information, rate-of-speech information, vowel formant information, Frequency Range of Meaningful Signal (FRMS) information, and, among the statistics used, mean information, standard deviation information, minimum/maximum value information, gradient information, range information, and percentile and linear regression coefficient information.
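For concreteness, the snippet below computes a handful of the listed features (energy, Teager energy, pitch, jitter, shimmer) with standard textbook formulas using NumPy; the patent does not state how these features are to be implemented, so this is only one possible realization.

```python
# Illustrative extraction of a few of the listed voice features.
import numpy as np

def frame_energy(x: np.ndarray) -> float:
    return float(np.mean(x ** 2))

def teager_energy(x: np.ndarray) -> float:
    # Teager energy operator: x[n]^2 - x[n-1]*x[n+1]
    # (roughly proportional to amplitude squared times frequency squared)
    return float(np.mean(x[1:-1] ** 2 - x[:-2] * x[2:]))

def pitch_autocorr(x: np.ndarray, sr: int, fmin: int = 75, fmax: int = 400) -> float:
    """Fundamental frequency estimate from the autocorrelation peak."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

def jitter(periods: np.ndarray) -> float:
    # relative average perturbation of successive pitch periods
    return float(np.mean(np.abs(np.diff(periods))) / np.mean(periods))

def shimmer(amplitudes: np.ndarray) -> float:
    # relative average perturbation of successive peak amplitudes
    return float(np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes))

if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    voiced = 0.5 * np.sin(2 * np.pi * 120 * t)   # synthetic 120 Hz "voice"
    print("energy:", frame_energy(voiced))
    print("teager:", teager_energy(voiced))
    print("pitch :", pitch_autocorr(voiced, sr))
```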

The emotion analysis apparatus 130 matches and stores an emotional state including at least one of falsehood, truth, affection, stress, humor, resentment, happiness, sorrow, apocryphal, pleasure, displeasure, excitement, depression, tension, relaxation, joy, sadness, effort, fatigue, uplift, vanity, stability, anxiety, anger, disgust, fear, jealousy, love, hate, hatred, obedience, and rebellion.

FIG. 2 is a block diagram schematically illustrating an apparatus for analyzing emotions according to an embodiment of the present invention.

The emotion analysis apparatus 130 according to an embodiment of the present invention includes a DB accumulator 210, an emotional state analyzer 220, and a message transmitter 230. Although this embodiment describes the emotion analysis apparatus 130 as including only the DB accumulator 210, the emotional state analyzer 220, and the message transmitter 230, this merely illustrates the technical idea of one embodiment of the present invention, and those skilled in the art will appreciate that various modifications and variations may be applied to the components included in the emotion analysis apparatus 130 without departing from the essential characteristics of the embodiment.

The DB accumulator 210 samples an untrained voice whose emotional state has not been confirmed from a voice call between the communication terminals 110, extracts a first voice feature from the sampled voice, and accumulates a voice feature DB in which an emotional state is matched to the extracted first voice feature and stored.

The DB accumulator 210 distinguishes the user's own voice from the counterpart's voice based on user information of the communication terminal on the call, or distinguishes them by filtering the counterpart's voice using usage frequency or voice features, and extracts a voice feature from each of the user's own voice and the counterpart's voice. In addition, the DB accumulator 210 groups the voice features of users calling through the same communication terminal and extracts a voice feature for each grouped user. The DB accumulator 210 also organizes the accumulated voice feature DB into a metadata structure that includes at least one of age information, speaker-specific information, and other information contained in the extracted first voice feature.

The DB accumulator 210 matches and stores the emotional state based on at least one of the following items included in the extracted first voice feature: pitch information, energy information, Teager energy information (roughly amplitude squared times frequency squared), frequency jitter information, amplitude shimmer information, rate-of-speech information, vowel formant information, Frequency Range of Meaningful Signal (FRMS) information, and, among the statistics used, mean information, standard deviation information, minimum/maximum value information, gradient information, range information, and percentile and linear regression coefficient information. In addition, the DB accumulator 210 matches the extracted first voice feature to an emotional state including at least one of falsehood, truth, affection, stress, humor, resentment, happiness, sorrow, apocryphal, pleasure, displeasure, excitement, depression, tension, relaxation, joy, sadness, effort, fatigue, uplift, a sense of insincerity, stability, anxiety, anger, disgust, fear, jealousy, love, hate, hatred, obedience, and rebellion, and stores the result.

The emotional state analyzer 220 extracts a second voice feature from a voice input to a communication terminal 110 requesting an emotional state and analyzes the emotional state by comparing the extracted second voice feature with the accumulated voice feature DB. The emotional state analyzer 220 includes an input voice feature analyzer 222, a speaker-dependent voice emotion analyzer 224, a speaker-independent voice emotion analyzer 226, and a peripheral information utilization unit 228. Although this embodiment describes the emotional state analyzer 220 as including only the input voice feature analyzer 222, the speaker-dependent voice emotion analyzer 224, the speaker-independent voice emotion analyzer 226, and the peripheral information utilization unit 228, this merely illustrates the technical idea of one embodiment of the present invention, and those skilled in the art will appreciate that various modifications and variations may be applied to the components included in the emotional state analyzer 220 without departing from the essential characteristics of the embodiment.

The speaker-dependent voice emotion analyzer 224 analyzes the emotional state of a specific registered speaker: when the extracted second voice feature matches an emotional state of the accumulated voice feature DB to a degree equal to or greater than a predetermined value, it analyzes the emotional state using the voice feature of that registered speaker. The speaker-independent voice emotion analyzer 226 analyzes the emotional state of an unspecified speaker: when the extracted voice feature matches the emotional states of the accumulated voice feature DB to a degree less than the predetermined value, it analyzes the emotional state using the voice feature of the unspecified speaker. The speaker-dependent voice emotion analyzer 224 or the speaker-independent voice emotion analyzer 226 receives from the peripheral information utilization unit 228 a first analysis result analyzing the correlation between the voice feature of the counterpart's voice and the extracted second voice feature, and analyzes the emotional state based on the first analysis result. Likewise, the speaker-dependent voice emotion analyzer 224 or the speaker-independent voice emotion analyzer 226 receives from the peripheral information utilization unit 228 a second analysis result analyzing the correlation between the voice feature of the ambient noise and the extracted second voice feature, and analyzes the emotional state based on the second analysis result.

The peripheral information utilization unit 228 analyzes the correlation of the voice or noise information input to the communication terminal 110 requesting the emotional state. It delivers the first analysis result, which analyzes the correlation between the voice feature of the counterpart talker and the extracted second voice feature, to the speaker-dependent voice emotion analyzer 224 or the speaker-independent voice emotion analyzer 226. It likewise delivers the second analysis result, which analyzes the correlation between the voice feature of the ambient noise and the extracted second voice feature, to the speaker-dependent voice emotion analyzer 224 or the speaker-independent voice emotion analyzer 226.

The message transmitter 230 transmits SMS or MMS messages: when a specific communication terminal requests information on the analyzed emotional state, it transmits the analyzed emotional state information to that terminal using an SMS or MMS message.

FIG. 3 is a flowchart illustrating an emotion analysis method according to an embodiment of the present invention.

The emotion analysis apparatus 130 samples a voice whose emotional state has not been confirmed from a voice call between the communication terminals 110 (S310). The emotion analysis apparatus 130 then extracts a first voice feature from the sampled voice (S320). In doing so, the emotion analysis apparatus 130 distinguishes the user's own voice from the counterpart's voice based on user information of the communication terminal on the call, or distinguishes them by filtering the counterpart's voice using usage frequency or voice features, and extracts a voice feature from each of the separated voices. In addition, the emotion analysis apparatus 130 groups the voice features of users calling through the same communication terminal and extracts a voice feature for each grouped user, and organizes the accumulated voice feature DB into a metadata structure that includes at least one of age information, speaker-specific information, and other information contained in the extracted first voice feature.

The emotion analysis apparatus 130 accumulates a voice feature DB in which an emotional state is matched to the extracted first voice feature and stored (S330). That is, the emotion analysis apparatus 130 matches and stores the emotional state based on at least one of the following items included in the extracted first voice feature: pitch information, energy information, Teager energy information (roughly amplitude squared times frequency squared), frequency jitter information, amplitude shimmer information, rate-of-speech information, vowel formant information, Frequency Range of Meaningful Signal (FRMS) information, and, among the statistics used, mean information, standard deviation information, minimum/maximum value information, gradient information, range information, and percentile and linear regression coefficient information. In addition, the emotion analysis apparatus 130 matches the extracted first voice feature to an emotional state including at least one of falsehood, truth, affection, stress, humor, resentment, happiness, sadness, apocryphal, pleasure, displeasure, excitement, sympathy, tension, relaxation, joy, sadness, effort, fatigue, uplift, vanity, stability, anxiety, anger, disgust, fear, jealousy, love, hate, hatred, obedience, and rebellion, and stores the result.
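The statistics named in step S330 can be summarized as below, here over a per-frame feature contour such as pitch; the dict-based DB and the example emotion label are assumptions for illustration only.

```python
# Turn a per-frame feature contour into the summary statistics listed above and
# store them matched to an emotional-state label. A plain list stands in for the DB.
import numpy as np

def contour_statistics(contour: np.ndarray) -> dict:
    t = np.arange(len(contour))
    slope, intercept = np.polyfit(t, contour, 1)   # linear regression coefficients
    return {
        "mean": float(np.mean(contour)),
        "std": float(np.std(contour)),
        "min": float(np.min(contour)),
        "max": float(np.max(contour)),
        "range": float(np.ptp(contour)),
        "gradient": float(np.mean(np.gradient(contour))),
        "p25": float(np.percentile(contour, 25)),
        "p75": float(np.percentile(contour, 75)),
        "reg_slope": float(slope),
        "reg_intercept": float(intercept),
    }

voice_feature_db = []  # list of (statistics, emotional_state) pairs

def store_matched(contour: np.ndarray, emotional_state: str) -> None:
    voice_feature_db.append((contour_statistics(contour), emotional_state))

# usage: store a rising pitch contour matched to an assumed label
store_matched(np.linspace(110, 180, 200), "excitement")
```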

The emotion analyzing apparatus 130 extracts a second voice feature from the voice input to the communication terminal 110 requesting an emotional state (S340). That is, the emotion analysis device 130 extracts the second voice feature from the voice input to the communication terminal 110 requesting the emotion state via the wired / wireless communication network 120.

The emotion analysis apparatus 130 analyzes the emotional state by comparing the extracted second voice feature with the accumulated voice feature DB (S350). Here, the emotion analysis apparatus 130 analyzes the emotional state using the voice feature of a specific registered speaker when the extracted second voice feature matches an emotional state of the accumulated voice feature DB to a degree equal to or greater than a predetermined value, and analyzes the emotional state using the voice feature of an unspecified speaker when the match is less than the predetermined value. In addition, the emotion analysis apparatus 130 generates a first analysis result analyzing the correlation between the voice feature of the counterpart's voice and the extracted second voice feature and analyzes the emotional state based on it, and generates a second analysis result analyzing the correlation between the voice feature of the ambient noise and the extracted second voice feature and analyzes the emotional state based on it.
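One concrete stand-in for the comparison in step S350 is a small nearest-neighbour vote over the stored statistics, sketched below; the patent does not name a specific classifier or distance measure.

```python
# Illustrative comparison of the second voice feature against the accumulated DB.
from collections import Counter
import math

def feature_distance(a: dict, b: dict) -> float:
    keys = set(a) & set(b)
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))

def classify_emotion(second_feature: dict, db, k: int = 5) -> str:
    """db: iterable of (statistics_dict, emotional_state) pairs."""
    nearest = sorted(db, key=lambda item: feature_distance(second_feature, item[0]))[:k]
    votes = Counter(state for _, state in nearest)
    return votes.most_common(1)[0][0]
```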

FIG. 3 describes the emotion analysis apparatus 130 as executing steps S310 to S350 sequentially. However, this merely illustrates the technical idea of an embodiment of the present invention; those skilled in the art may change the order described in FIG. 3 or execute one or more of steps S310 to S350 in parallel without departing from the essential characteristics of the embodiment, so FIG. 3 is not limited to a time-series order.

As described above, the emotion analysis method according to the embodiment of the present invention described in FIG. 3 may be implemented as a program and recorded on a computer-readable recording medium. The computer-readable recording medium on which the program implementing the emotion analysis method is recorded includes all kinds of recording devices that store data readable by a computer system. Examples of such computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage, as well as media implemented in the form of a carrier wave (e.g., transmission over the Internet). The computer-readable recording medium may also be distributed over networked computer systems so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing an embodiment of the present invention can be easily inferred by programmers skilled in the art.

FIG. 4 is an exemplary diagram illustrating transmission of the analyzed emotional state of a called party to a caller according to an embodiment of the present invention.

FIG. 4 illustrates an example of an in-call voice supplementary service procedure in which a caller (customer) has the other party's emotion analyzed during a voice call with the called party using the communication terminal 110. When the caller dials 'prefix + called number' on the communication terminal 110, the wired/wireless communication network 120 routes the call to the emotion analysis apparatus 130. The emotion analysis apparatus 130 samples a voice whose emotional state has not been confirmed from the voice call between the caller's and called party's communication terminals 110, extracts a first voice feature from the sampled voice, accumulates a voice feature DB in which an emotional state is matched to the extracted first voice feature and stored, extracts a second voice feature from the voice input to the communication terminal 110 requesting the emotional state, analyzes the emotional state by comparing the extracted second voice feature with the accumulated voice feature DB, and transmits the analyzed emotional state to the caller's communication terminal 110 by SMS or MMS.
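An end-to-end sketch of this supplementary-service flow follows. The function names, the stub SMS gateway, and the toy phone numbers are assumptions; only the overall sequence (route the prefixed call, analyze the called party's voice against the accumulated DB, return the result by SMS/MMS) comes from the description above.

```python
# Hypothetical end-to-end flow for the FIG. 4 scenario.
def extract_features(audio):
    """Stub: in the earlier sketches this returns a statistics dict."""
    return {"energy": sum(s * s for s in audio) / max(len(audio), 1)}

def classify_emotion(feature, db):
    """Stub nearest-neighbour lookup over (feature, state) pairs."""
    return min(db, key=lambda item: abs(item[0]["energy"] - feature["energy"]))[1]

def send_sms(to: str, text: str) -> None:
    print(f"[SMS to {to}] {text}")          # placeholder for an SMS/MMS gateway

def handle_prefixed_call(caller_id, called_number, called_party_audio, db):
    """The caller dials 'prefix + called number'; the network routes the call
    here, the called party's voice is analyzed, and the result goes back by SMS."""
    state = classify_emotion(extract_features(called_party_audio), db)
    send_sms(caller_id, f"Estimated emotional state of {called_number}: {state}")

# usage with a toy DB of (feature, emotional_state) pairs and toy numbers
db = [({"energy": 0.1}, "calm"), ({"energy": 0.6}, "excitement")]
handle_prefixed_call("010-1234-5678", "010-8765-4321", [0.7, -0.7, 0.8], db)
```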

For example, the emotion analysis apparatus 130 extracts a voice feature from the voice input by the called party and compares it against the voice feature DB accumulated in the emotion analysis apparatus 130 based on at least one of pitch information, energy information, Teager energy information (roughly amplitude squared times frequency squared), frequency jitter information, amplitude shimmer information, rate-of-speech information, vowel formant information, Frequency Range of Meaningful Signal (FRMS) information, and, among the statistics used, mean information, standard deviation information, minimum/maximum value information, gradient information, range information, and percentile and linear regression coefficient information, in order to analyze at least one emotion among falsehood, truth, affection, stress, humor, dissatisfaction, happiness, sadness, apocryphal, pleasure, displeasure, excitement, ecstasy, tension, relaxation, joy, sadness, effort, fatigue, uplift, vanity, stability, anxiety, anger, disgust, fear, jealousy, love, hate, hatred, obedience, and rebellion. The analyzed emotional state may then be transmitted to the caller's communication terminal 110 by SMS or MMS.

In the above description, all components constituting the embodiments of the present invention have been described as being combined or operating in combination, but the present invention is not necessarily limited to such embodiments. Within the scope of the present invention, one or more of the components may be selectively combined and operated. In addition, although each component may be implemented as independent hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the functions combined in one or more pieces of hardware. The codes and code segments constituting the computer program can be easily inferred by those skilled in the art. Such a computer program may be stored in a computer-readable storage medium and read and executed by a computer, thereby implementing the embodiments of the present invention. The storage medium of the computer program may include a magnetic recording medium, an optical recording medium, a carrier wave medium, and the like.

In addition, the terms "comprise", "include", or "have" used above mean that the corresponding component may be present unless stated otherwise, and should therefore be construed as allowing the inclusion of other components rather than excluding them. All terms, including technical and scientific terms, have the same meanings as commonly understood by those skilled in the art unless defined otherwise. Commonly used terms, such as those defined in dictionaries, should be interpreted as consistent with their contextual meaning in the related art and shall not be interpreted in an idealized or overly formal sense unless explicitly so defined in the present invention.

The foregoing description is merely illustrative of the technical idea of the present invention, and various changes and modifications may be made by those skilled in the art without departing from the essential characteristics of the present invention. Therefore, the embodiments disclosed in the present invention are not intended to limit the technical idea of the present invention but to describe the present invention, and the scope of the technical idea of the present invention is not limited by these embodiments. The protection scope of the present invention should be interpreted by the following claims, and all technical ideas within the equivalent scope should be interpreted as being included in the scope of the present invention.

As described above, the present invention applies to application services capable of analyzing various emotions through voice recognition. It is a useful invention that can not only accumulate a DB matching emotional states by extracting voice features from voices whose emotional states are not confirmed in the voice calls of all mobile communication subscribers, but also analyze a more accurate emotional state using the accumulated DB.

FIG. 1 is a block diagram schematically showing an emotion analysis system according to an embodiment of the present invention;

FIG. 2 is a block diagram schematically showing an emotion analysis apparatus according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating an emotion analysis method according to an embodiment of the present invention;

FIG. 4 is an exemplary diagram illustrating transmission of the analyzed emotional state of a called party to a caller according to an embodiment of the present invention.

<Description of Symbols for Main Parts of Drawings>

110: communication terminal 120: wired and wireless communication network

130: emotion analysis device 210: DB accumulation unit

220: emotional state analysis unit 230: message transmission unit

Claims (16)

1. An emotion analysis apparatus comprising: a DB accumulator that samples a voice from a voice call between communication terminals, extracts a first voice feature from the sampled voice, and accumulates a voice feature DB in which an emotional state is matched to the extracted first voice feature and stored; and an emotional state analyzer that extracts a second voice feature from a voice input to a communication terminal requesting an emotional state and analyzes the emotional state using the extracted second voice feature and the accumulated voice feature DB.

2. The apparatus of claim 1, wherein the emotional state analyzer includes a speaker-dependent voice emotion analyzer for analyzing the emotional state of a specific registered speaker, and the speaker-dependent voice emotion analyzer analyzes the emotional state using the voice feature of the specific registered speaker when the extracted second voice feature matches an emotional state of the accumulated voice feature DB to a degree equal to or greater than a predetermined value.

3. The apparatus of claim 2, wherein the emotional state analyzer includes a peripheral information utilization unit for analyzing the correlation of voice or noise information input to the communication terminal requesting the emotional state, and the speaker-dependent voice emotion analyzer receives from the peripheral information utilization unit a first analysis result analyzing the correlation between the voice feature of the counterpart's voice and the extracted second voice feature and analyzes the emotional state based on the first analysis result.

4. The apparatus of claim 3, wherein the speaker-dependent voice emotion analyzer receives from the peripheral information utilization unit a second analysis result analyzing the correlation between the voice feature of ambient noise and the extracted second voice feature and analyzes the emotional state based on the second analysis result.

5. The apparatus of claim 1, wherein the emotional state analyzer includes a speaker-independent voice emotion analyzer for analyzing the emotional state of an unspecified speaker, and the speaker-independent voice emotion analyzer analyzes the emotional state using the voice feature of the unspecified speaker when the extracted second voice feature matches an emotional state of the accumulated voice feature DB to a degree less than a predetermined value.

6. The apparatus of claim 5, wherein the emotional state analyzer includes a peripheral information utilization unit for analyzing the correlation of voice or noise information input to the communication terminal requesting the emotional state, and the speaker-independent voice emotion analyzer receives from the peripheral information utilization unit a first analysis result analyzing the correlation between the voice feature of the counterpart's voice and the extracted second voice feature and analyzes the emotional state based on the first analysis result.

7. The apparatus of claim 6, wherein the speaker-independent voice emotion analyzer receives from the peripheral information utilization unit a second analysis result analyzing the correlation between the voice feature of ambient noise and the extracted second voice feature and analyzes the emotional state based on the second analysis result.

8. The apparatus of claim 1, wherein the DB accumulator distinguishes the user's own voice from the counterpart's voice based on user information of the communication terminal on the call, or distinguishes them by filtering the counterpart's voice using usage frequency or voice features, and extracts a voice feature from each of the user's own voice and the counterpart's voice.

9. The apparatus of claim 1, wherein the DB accumulator extracts a voice feature for each grouped user by grouping the voice features of users calling through the same communication terminal.

10. The apparatus of claim 1, wherein the DB accumulator matches and stores the emotional state based on at least one of pitch information, energy information, Teager energy information (roughly amplitude squared times frequency squared), frequency jitter information, amplitude shimmer information, rate-of-speech information, vowel formant information, Frequency Range of Meaningful Signal (FRMS) information, and, among the statistics used, mean information, standard deviation information, minimum/maximum value information, gradient information, range information, and percentile and linear regression coefficient information included in the extracted first voice feature.

11. The apparatus of claim 1, wherein the DB accumulator organizes the accumulated voice feature DB into a metadata structure including at least one of age information, speaker-specific information, and other information included in the extracted first voice feature.

12. The apparatus of claim 1, further comprising a message transmitter for transmitting an SMS or MMS message, wherein, when a request for information on the analyzed emotional state is received from a specific communication terminal, the message transmitter transmits the analyzed emotional state information to the specific communication terminal as an SMS or MMS message.

13. The apparatus of claim 1, wherein the DB accumulator matches and stores an emotional state including at least one of falsehood, truth, affection, stress, humor, resentment, happiness, sadness, apocryphal, pleasure, displeasure, excitement, sympathy, tension, relaxation, joy, sorrow, effort, fatigue, uplift, vanity, stability, anxiety, anger, disgust, fear, jealousy, love, hate, hatred, obedience, and rebellion.

14. An emotion analysis method comprising: a voice sampling step of sampling a voice from a voice call between communication terminals; a step of extracting a first voice feature from the sampled voice; a voice feature DB accumulation step of accumulating a voice feature DB in which an emotional state is matched to the extracted first voice feature and stored; a second voice feature extraction step of extracting a second voice feature from a voice input to a communication terminal requesting an emotional state; and an emotional state analysis step of analyzing the emotional state using the extracted second voice feature and the accumulated voice feature DB.

15. The method of claim 14, wherein the emotional state analysis step comprises: analyzing the emotional state using the voice feature of a specific registered speaker when the extracted second voice feature matches an emotional state of the accumulated voice feature DB to a degree equal to or greater than a predetermined value; and analyzing the emotional state using the voice feature of an unspecified speaker when the extracted voice feature matches an emotional state of the accumulated voice feature DB to a degree less than the predetermined value.

16. The method of claim 14, wherein the emotional state analysis step comprises: generating a first analysis result analyzing the correlation between the voice feature of the counterpart's voice and the extracted second voice feature, and analyzing the emotional state based on the first analysis result; and generating a second analysis result analyzing the correlation between the voice feature of ambient noise and the extracted second voice feature, and analyzing the emotional state based on the second analysis result.
KR1020090075078A 2009-08-14 2009-08-14 Method and apparatus for analyzing emotion KR20110017559A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020090075078A KR20110017559A (en) 2009-08-14 2009-08-14 Method and apparatus for analyzing emotion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020090075078A KR20110017559A (en) 2009-08-14 2009-08-14 Method and apparatus for analyzing emotion

Publications (1)

Publication Number Publication Date
KR20110017559A true KR20110017559A (en) 2011-02-22

Family

ID=43775525

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020090075078A KR20110017559A (en) 2009-08-14 2009-08-14 Method and apparatus for analyzing emotion

Country Status (1)

Country Link
KR (1) KR20110017559A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013137512A1 (en) * 2012-03-13 2013-09-19 주식회사 이루온 Emotion-based advertisement system and method
CN104754110A (en) * 2013-12-31 2015-07-01 广州华久信息科技有限公司 Machine voice conversation based emotion release method mobile phone
US9972341B2 (en) 2014-01-22 2018-05-15 Samsung Electronics Co., Ltd. Apparatus and method for emotion recognition
KR20190022151A (en) * 2017-08-25 2019-03-06 강원대학교산학협력단 Non-contact biosignal detecting method and apparatus thereof
KR20190069992A (en) * 2017-12-12 2019-06-20 세종대학교산학협력단 Method and system for recognizing emotions based on speaking style
KR20200143991A (en) 2019-06-17 2020-12-28 주식회사 오니온파이브 Answer recommendation system and method based on text content and emotion analysis
WO2021162489A1 (en) * 2020-02-12 2021-08-19 Samsung Electronics Co., Ltd. Method and voice assistance apparatus for providing an intelligence response
US11741954B2 2020-02-12 2023-08-29 Samsung Electronics Co., Ltd. Method and voice assistance apparatus for providing an intelligence response

Similar Documents

Publication Publication Date Title
US11811970B2 (en) Voice and speech recognition for call center feedback and quality assurance
US10645214B1 (en) Identical conversation detection method and apparatus
KR20110017559A (en) Method and apparatus for analyzing emotion
CN105489221B (en) A kind of audio recognition method and device
US8804918B2 (en) Method and system for using conversational biometrics and speaker identification/verification to filter voice streams
EP2523441A1 (en) A Mass-Scale, User-Independent, Device-Independent, Voice Message to Text Conversion System
US8051134B1 (en) Systems, methods, and programs for evaluating audio messages
US20110004473A1 (en) Apparatus and method for enhanced speech recognition
US20150310877A1 (en) Conversation analysis device and conversation analysis method
CN102780819A (en) Method of voice recognition of contact for mobile terminal
CN107886951B (en) Voice detection method, device and equipment
JP2013011830A (en) Abnormal state detection device, telephone set, abnormal state detection method, and program
CN108831456A (en) It is a kind of by speech recognition to the method, apparatus and system of video marker
CN103856626A (en) Customization method and device of individual voice
JP2010103751A (en) Method for preventing prohibited word transmission, telephone for preventing prohibited word transmission, and server for preventing prohibited word transmission
JP6268916B2 (en) Abnormal conversation detection apparatus, abnormal conversation detection method, and abnormal conversation detection computer program
JP5988077B2 (en) Utterance section detection apparatus and computer program for detecting an utterance section
US9875236B2 (en) Analysis object determination device and analysis object determination method
EP2913822A1 (en) Speaker recognition method
CN113194210A (en) Voice call access method and device
KR100463706B1 (en) A system and a method for analyzing human emotion based on voice recognition through wire or wireless network
CN108040185B (en) A kind of method and apparatus identifying harassing call
US10237399B1 (en) Identical conversation detection method and apparatus
CN113593580B (en) Voiceprint recognition method and device
US20090326940A1 (en) Automated voice-operated user support

Legal Events

Date Code Title Description
N231 Notification of change of applicant
WITN Withdrawal due to no request for examination