EP3706125B1 - Method and system using successive differences of speech signals for emotion identification - Google Patents
- Publication number
- EP3706125B1 (application EP20161439.3A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech signal
- emotion
- features
- recognition model
- emotion recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
Description
- The present application claims priority from Indian Patent Application No. 201921009121, filed with the Indian Patent Office on March 8, 2019.
- The disclosure herein generally relates to speech signal processing and, more particularly, to a method and system for processing a speech signal for emotion identification.
- Typically, when users communicate with each other, their speech signals convey different emotions. For example, when a user shares happy news with another user, the emotion 'happiness' is conveyed through the speech signal. Similarly, when two users hold a long conversation covering multiple topics, different emotions may be conveyed depending on the nature of the topic being discussed. Within the same communication session, therefore, the users may exchange speech signals conveying different emotions.
- Speech signal processing allows identification of emotions associated with a speech signal, and different mechanisms exist for this purpose. However, the accuracy with which the existing mechanisms identify the emotions associated with the speech signals being processed varies from one approach to another. An example of such a mechanism is proposed in US 2019/074028 A1.
- Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a processor-implemented method of speech signal processing according to independent claim 1 is provided.
- In another aspect, a system of speech signal processing according to independent claim 5 is provided.
- In yet another aspect, a non-transitory computer readable medium for speech signal processing according to independent claim 9 is provided.
- It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention which is defined by the following claims.
- The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
- FIG. 1 illustrates an exemplary block diagram of a system used for speech signal processing, according to some embodiments of the present disclosure.
- FIG. 2 is a flow diagram depicting steps involved in the process of emotion identification using the system of FIG. 1, according to some embodiments of the present disclosure.
- FIG. 3 is a flow diagram depicting steps involved in the process of extracting differential values corresponding to a speech signal, using the system of FIG. 1, according to some embodiments of the present disclosure.
- FIGS. 4a and 4b are example diagrams depicting signals at different stages of processing by the system of FIG. 1, according to some embodiments of the present disclosure.
- Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the scope of the invention being defined by the following claims.
- Referring now to the drawings, and more particularly to FIG. 1 through FIG. 4b, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
- FIG. 1 illustrates an exemplary block diagram of a system used for speech signal processing, according to some embodiments of the present disclosure. The system 100 includes at least one memory module 101, at least one hardware processor 102, and at least one communication interface 103.
- The one or more hardware processors 102 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, graphics controllers, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the hardware processor(s) 102 are configured to fetch and execute computer-readable instructions stored in the memory module 101, which causes the hardware processor(s) 102 to perform actions depicted in FIG. 2 and FIG. 3 for the purpose of performing speech processing for emotion identification. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
- The communication interface(s) 103 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the communication interface(s) 103 can include one or more ports for connecting a number of devices to one another or to another server.
- The memory module(s) 101 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, one or more modules (not shown) of the system 100 can be stored in the memory 101. The memory module(s) 101 stores a plurality of instructions which, when executed, cause the one or more hardware processors 102 to perform the speech signal processing for emotion identification.
- The system 100 can be implemented in such a way that it can collect real-time speech input from at least one user and process the collected speech signal for the purpose of emotion identification. In an example mode of implementation, the system 100 may be a component of a communication device being used by the at least one user. In another possible mode of implementation, the system 100 is part of a communication network, and one or more user communications may be routed through the system 100.
- The system 100 processes the speech signal collected as input, and extracts a plurality of features from the speech signal. The system 100 may use any suitable mechanism/techniques for extracting the plurality of frames from the speech signal. The system 100 then processes the frames and extracts one or more differential features. The process of extracting the differential features is explained below:
- During this process, the speech signal corresponding to each of the extracted features is sampled at a defined sampling rate. For example, the sampling rate may be 8000 Hz (this is depicted in FIG. 4a). In an embodiment, if the original sampling is not carried out at the defined sampling rate, samples corresponding to the defined sampling rate are generated from the original samples. Each of the sampling outputs, i.e. the sampled speech signals, is then split into a plurality of overlapping or non-overlapping frames, each of length 20 ms. Each frame is then processed separately by the system 100. During the processing of each frame, the system 100 selects one sample in every M samples of the speech signal in the frame. In various embodiments, the selection of the one sample from every M samples may be at random or may be based on any specific pre-defined conditions that may have been configured with the system 100. For example, the pre-defined condition may be in terms of intervals within which the samples need to be picked. The system 100 may be configured to pick the first of each M samples and skip M-1 samples, then pick the first sample of the next M samples, and so on. The system 100 then generates the differential values of the speech signal by calculating differences between adjacent samples of the frame to get an output frame of size (L - N), where L is the number of samples in the frame divided by M and N is the number of times the difference between adjacent values is calculated.
- The system then extracts a plurality of features from the differential values (as depicted in FIG. 4b), and then compares the extracted plurality of differential values with an emotion recognition model to identify at least one emotion that matches the speech signal. The emotion recognition model is a reference database that stores at least the information pertaining to the mapping between different differential values and corresponding emotions. By identifying a match for the differential values of the speech signal in the emotion recognition model, the system 100 identifies the corresponding emotions as the emotions matching the speech signal. The system 100 may then associate each of the identified emotions with the speech signal. This process is elaborated in FIG. 2 as well as in FIG. 3.
- The emotion recognition model may be a data model generated using suitable machine learning algorithms. Inputs for the machine learning model(s) for generating the emotion recognition model may be speech signals, sentences extracted from the speech signal, utterances extracted from the sentences and so on.
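The comparison against a reference model described above can be illustrated with a minimal sketch. The disclosure does not specify which features are extracted from the differential values or how a match is determined, so everything concrete here is an assumption made for illustration: the three summary statistics in `summarize`, the Euclidean nearest-reference rule in `match_emotion`, and the made-up reference vectors are all hypothetical.

```python
import numpy as np

def summarize(diff_frame):
    """Hypothetical per-frame features: mean, standard deviation, mean absolute value."""
    return np.array([diff_frame.mean(), diff_frame.std(), np.abs(diff_frame).mean()])

def match_emotion(features, reference):
    """Return the label whose stored reference vector is closest (Euclidean) to features."""
    labels = list(reference)
    distances = [np.linalg.norm(features - reference[label]) for label in labels]
    return labels[int(np.argmin(distances))]

# Made-up reference vectors, for illustration only; not taken from the disclosure.
reference = {
    "HAPPY": np.array([0.0, 1.0, 0.8]),
    "SAD": np.array([0.0, 0.2, 0.15]),
}

# A frame of small differential values lands nearest the low-energy reference.
quiet_frame = np.array([0.1, -0.2, 0.15, -0.1, 0.05])
print(match_emotion(summarize(quiet_frame), reference))  # SAD
```

A trained machine learning classifier, which the disclosure also mentions as an option, could stand in for the lookup table without changing the surrounding flow.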
- FIG. 2 is a flow diagram depicting steps involved in the process of emotion identification using the system of FIG. 1, according to some embodiments of the present disclosure. In this method, at least one speech signal is collected (202) as input by the system 100, via one or more hardware processors. The collected speech signal is then processed, via the one or more hardware processors, during which a plurality of differential features are extracted (204) by the system 100 using the method 300. Out of the plurality of differential features, the system 100 extracts (206) a plurality of features, which are then compared (208) with an emotion recognition model, via the one or more hardware processors. Based on matching data found in the emotion recognition model, the system 100 identifies at least one emotion as matching the collected speech signal, and then associates (210) the identified at least one emotion with the speech signal, via the one or more hardware processors. In an embodiment, the system 100 associates the emotions with the speech signal such that the emotions that match each frame of the speech signal are associated with the corresponding frame. For example, consider a speech signal of length 20 seconds. If the system 100 identifies that the emotion that matches the first 10 seconds (i.e. 0-10 sec) is HAPPY, and that the emotion for the speech signal from 11 to 20 sec is SAD, then the system 100 accordingly associates the emotions with the speech signal and presents them in an appropriate format to the user.
- FIG. 3 is a flow diagram depicting steps involved in the process of extracting differential values corresponding to a speech signal, using the system of FIG. 1, according to some embodiments of the present disclosure. In this process, the system 100 samples (302) the speech signal at a defined sampling rate. For example, the sampling rate is 8000 Hz. The output of the sampling process is a plurality of sampled speech signals. The system 100 then splits (304) each of the plurality of sampled speech signals into a plurality of overlapping or non-overlapping frames, each of length 20 ms. The system 100 then processes each of the plurality of frames separately. The steps involved in processing each frame are explained below.
- The system 100 selects (306) one sample in every M samples of the speech signal in the frame being considered. In various embodiments, the system 100 may randomly select one out of M samples or may select the sample based on at least one defined condition. The system 100 then generates differential values corresponding to the speech signal, by calculating differences between adjacent samples of the frame as a differential feature to get an output frame of size (L - N). By iterating steps 306 and 308, the system 100 generates a plurality of differential features corresponding to the speech signal being processed by the system 100.
- It is to be noted that although the mechanism of generating the differential features (as in method 300) is explained in the context of speech signal processing, the same method can be used for processing other types of signals as well, in different signal processing domains.
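The framing, sample selection, and differencing steps of method 300 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: the function names are hypothetical, frames are non-overlapping, a trailing partial frame is dropped, the first of every M samples is the one selected, "calculating the difference N times" is read as an N-th order difference via `numpy.diff`, and random noise stands in for a real speech signal.

```python
import numpy as np

def split_into_frames(signal, sample_rate=8000, frame_ms=20):
    """Split a 1-D signal into non-overlapping frames of frame_ms milliseconds."""
    frame_len = sample_rate * frame_ms // 1000   # 160 samples at 8000 Hz
    n_frames = len(signal) // frame_len          # assumption: drop any trailing partial frame
    return np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))

def differential_frame(frame, m=2, n=1):
    """Keep the first of every m samples, then difference adjacent samples n times."""
    selected = frame[::m]            # L = len(frame) / m samples remain
    return np.diff(selected, n=n)    # output has L - n values

# Assumption: one second of synthetic noise at 8000 Hz in place of recorded speech.
rng = np.random.default_rng(0)
speech = rng.standard_normal(8000)

frames = split_into_frames(speech)
diffs = np.stack([differential_frame(f, m=2, n=1) for f in frames])
print(frames.shape, diffs.shape)  # (50, 160) (50, 79)
```

With an 8000 Hz sampling rate and 20 ms frames, each frame holds 160 samples; selecting one sample in every M = 2 leaves L = 80, and a single difference (N = 1) yields 80 - 1 = 79 values per frame, which matches the (L - N) output size stated above.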
- The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims.
- The embodiments of the present disclosure address the unresolved problem of emotion identification from a speech signal. The embodiments thus provide a mechanism of speech signal processing for identifying emotions associated with the speech signal.
- It is to be understood that the embodiments herein can comprise a program and in addition a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
- The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Also, the words "comprising," "having," "containing," and "including," and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise.
- Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term "computer-readable medium" should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
- It is intended that the disclosure and examples be considered as exemplary only, with the scope of the invention being defined by the following claims.
Claims (12)
- A processor implemented method (200) of speech signal processing, comprising: collecting (202) a speech signal from at least one user, as input, via one or more hardware processors; extracting (204) a plurality of differential features from the speech signal to generate a feature file, via the one or more hardware processors, by: sampling (302) the speech signal at a defined sampling rate to generate a plurality of sampled speech signals; splitting (304) each of the plurality of sampled speech signals to a plurality of overlapping or non-overlapping frames of 20 milliseconds length; iteratively performing for each of the plurality of frames, for a pre-defined number of times: selecting (306) one sample in every M samples of the speech signal in the frame, where M is 2; and calculating (308) differences between adjacent samples of the frame as the differential feature to get an output frame of size (L - (N)), where L is number of samples in a frame divided by M, and N represents number of times the difference between adjacent values is calculated; extracting (206) a plurality of features from the plurality of differential features of the plurality of overlapping or non-overlapping frames, via the one or more hardware processors; comparing (208) each of the plurality of features from the plurality of differential features with an emotion recognition model to identify at least one emotion corresponding to the speech signal, via the one or more hardware processors; and associating (210) the identified at least one emotion with the speech signal, via the one or more hardware processors.
- The method (200) as claimed in claim 1, wherein the emotion recognition model comprises of information pertaining to a plurality of speech signal characteristics and corresponding emotions, annotated at an utterance level.
- The method (200) as claimed in claim 1, wherein identifying the at least one emotion corresponding to the speech signal comprises: identifying matching data in the emotion recognition model, corresponding to each of the plurality of features in the feature file; and identifying at least one emotion tagged against each of the identified matching data in the emotion recognition model.
- The method (200) as claimed in claim 1, wherein the emotion recognition model is machine learning model generated by using training data comprising a plurality of sentences from at least one speech signal, and a plurality of utterances corresponding each of the plurality of sentences.
- A system (100) of speech signal processing, comprising: a memory module (101) storing a plurality of instructions; one or more communication interfaces (103); and one or more hardware processors (102) coupled to the memory module (101) via the one or more communication interfaces (103), wherein the one or more hardware processors are caused by the plurality of instructions to: collect (202) a speech signal from at least one user, as input; extract (204) a plurality of differential features from the speech signal to generate a feature file, by: sampling (302) the speech signal at a defined sampling rate to generate a plurality of sampled speech signals; splitting (304) each of the plurality of sampled speech signals to a plurality of overlapping or non-overlapping frames of 20 milliseconds length; iteratively performing for each of the plurality of frames, for a pre-defined number of times: selecting (306) one sample in every M samples of the speech signal in the frame, where M is 2; and calculating (308) differences between adjacent samples of the frame as the differential feature to get an output frame of size (L - (N)), where L is number of samples in a frame divided by M, and N represents number of times the difference between adjacent values is calculated; extract (206) a plurality of features from the plurality of differential features of the plurality of overlapping or non-overlapping frames; compare (208) each of the plurality of features from the plurality of differential features with an emotion recognition model to identify at least one emotion corresponding to the speech signal; and associate (210) the identified at least one emotion with the speech signal.
- The system (100) as claimed in claim 5, wherein the emotion recognition model comprises of information pertaining to a plurality of speech signal characteristics and corresponding emotions, annotated at an utterance level.
- The system (100) as claimed in claim 5, wherein the system identifies the at least one emotion corresponding to the speech signal by: identifying matching data in the emotion recognition model, corresponding to each of the plurality of features in the feature file; and identifying at least one emotion tagged against each of the identified matching data in the emotion recognition model.
- The system (100) as claimed in claim 5, wherein the system generates the emotion recognition model by using training data comprising a plurality of sentences from at least one speech signal, and a plurality of utterances corresponding each of the plurality of sentences.
- A non-transitory computer readable medium for speech signal processing, the non-transitory computer readable medium performs the speech signal processing by: collecting (202) a speech signal from at least one user, as input, via one or more hardware processors; extracting (204) a plurality of differential features from the speech signal to generate a feature file, via the one or more hardware processors, by: sampling (302) the speech signal at a defined sampling rate to generate a plurality of sampled speech signals; splitting (304) each of the plurality of sampled speech signals to a plurality of overlapping or non-overlapping frames of 20 milliseconds length; iteratively performing for each of the plurality of frames, for a pre-defined number of times: selecting (306) one sample in every M samples of the speech signal in the frame, where M is 2; and calculating (308) differences between adjacent samples of the frame as the differential feature to get an output frame of size (L - (N)), where L is number of samples in a frame divided by M, and N represents number of times the difference between adjacent values is calculated; extracting (206) a plurality of features from the plurality of differential features of the plurality of overlapping or non-overlapping frames, via the one or more hardware processors; comparing (208) each of the plurality of features from the plurality of differential features with an emotion recognition model to identify at least one emotion corresponding to the speech signal, via the one or more hardware processors; and associating (210) the identified at least one emotion with the speech signal, via the one or more hardware processors.
- The non-transitory computer readable medium as claimed in claim 9, wherein the emotion recognition model comprises of information pertaining to a plurality of speech signal characteristics and corresponding emotions, annotated at an utterance level.
- The non-transitory computer readable medium as claimed in claim 9, wherein identifying the at least one emotion corresponding to the speech signal comprises: identifying matching data in the emotion recognition model, corresponding to each of the plurality of features in the feature file; and identifying at least one emotion tagged against each of the identified matching data in the emotion recognition model.
- The non-transitory computer readable medium as claimed in claim 9, wherein the emotion recognition model is machine learning model generated by using training data comprising a plurality of sentences from at least one speech signal, and a plurality of utterances corresponding each of the plurality of sentences.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IN201921009121 | 2019-03-08 |
Publications (2)
Publication Number | Publication Date |
---|---|
EP3706125A1 (en) | 2020-09-09 |
EP3706125B1 (en) | 2021-12-22 |
Family
ID=69779997
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20161439.3A (Active, EP3706125B1 (en)) | Method and system using successive differences of speech signals for emotion identification | 2019-03-08 | 2020-03-06 |
Country Status (2)
Country | Link |
---|---|
US (1) | US11227624B2 (en) |
EP (1) | EP3706125B1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115050077A (en) * | 2022-06-30 | 2022-09-13 | 浪潮电子信息产业股份有限公司 | Emotion recognition method, device, equipment and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IL115697A (en) * | 1995-10-19 | 1999-09-22 | Audiocodes Ltd | Pitch determination preprocessor based on correlation techniques |
US6151571A (en) | 1999-08-31 | 2000-11-21 | Andersen Consulting | System, method and article of manufacture for detecting emotion in voice signals through analysis of a plurality of voice signal parameters |
US20080040110A1 (en) | 2005-08-08 | 2008-02-14 | Nice Systems Ltd. | Apparatus and Methods for the Detection of Emotions in Audio Interactions |
US9020822B2 (en) * | 2012-10-19 | 2015-04-28 | Sony Computer Entertainment Inc. | Emotion recognition using auditory attention cues extracted from users voice |
US11004461B2 (en) * | 2017-09-01 | 2021-05-11 | Newton Howard | Real-time vocal features extraction for automated emotional or mental state assessment |
- 2020-03-06: EP application EP20161439.3A, patent EP3706125B1 (en), status Active
- 2020-03-09: US application US16/812,757, patent US11227624B2 (en), status Active
Also Published As
Publication number | Publication date |
---|---|
EP3706125A1 (en) | 2020-09-09 |
US20200286506A1 (en) | 2020-09-10 |
US11227624B2 (en) | 2022-01-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110444198B (en) | Retrieval method, retrieval device, computer equipment and storage medium | |
US11017775B1 (en) | Systems and methods to utilize text representations of conversations | |
US10354677B2 (en) | System and method for identification of intent segment(s) in caller-agent conversations | |
CN110472224B (en) | Quality of service detection method, apparatus, computer device and storage medium | |
US10388283B2 (en) | System and method for improving call-centre audio transcription | |
US10956480B2 (en) | System and method for generating dialogue graphs | |
US10255346B2 (en) | Tagging relations with N-best | |
US11451666B1 (en) | Systems and methods for handling calls based on call insight information | |
CN110459223B (en) | Data tracking processing method, device, storage medium and apparatus | |
CN110287318B (en) | Service operation detection method and device, storage medium and electronic device | |
US11886509B2 (en) | Predictive prompt generation by an automated prompt system | |
CN115935182A (en) | Model training method, topic segmentation method in multi-turn conversation, medium, and device | |
EP3618061B1 (en) | Method and system for improving recognition of disordered speech | |
EP3706125B1 (en) | Method and system using successive differences of speech signals for emotion identification | |
CN111508530B (en) | Speech emotion recognition method, device and storage medium | |
CN113255368B (en) | Method and device for emotion analysis of text data and related equipment | |
CN113393845A (en) | Method and device for speaker recognition, electronic equipment and readable storage medium | |
US11647115B2 (en) | Conformational framework for call drop likelihood from interactive voice response system | |
US20220130414A1 (en) | Selection of speech segments for training classifiers for detecting emotional valence from input speech signals | |
US20240127790A1 (en) | Systems and methods for reconstructing voice packets using natural language generation during signal loss | |
US11947872B1 (en) | Natural language processing platform for automated event analysis, translation, and transcription verification | |
US11830489B2 (en) | System and method for speech processing based on response content | |
US11792243B2 (en) | System and method for conducting multi-session user interactions | |
CN117459637A (en) | Service data processing method, device, computer equipment and storage medium | |
CN113762786A (en) | Agent quality inspection method, device, equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED |
AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
AX | Request for extension of the European patent | Extension state: BA ME |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
17P | Request for examination filed | Effective date: 20210309 |
RBV | Designated contracting states (corrected) | Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: GRANT OF PATENT IS INTENDED |
RIC1 | Information provided on IPC code assigned before grant | Ipc: G10L 25/63 20130101AFI20210616BHEP; Ipc: G10L 25/03 20130101ALN20210616BHEP |
INTG | Intention to grant announced | Effective date: 20210701 |
RIC1 | Information provided on IPC code assigned before grant | Ipc: G10L 25/63 20130101AFI20210618BHEP; Ipc: G10L 25/03 20130101ALN20210618BHEP |
GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE PATENT HAS BEEN GRANTED |
AK | Designated contracting states | Kind code of ref document: B1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
REG | Reference to a national code | Ref country code: GB Ref legal event code: FG4D |
REG | Reference to a national code | Ref country code: CH Ref legal event code: EP |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R096 Ref document number: 602020001338 Country of ref document: DE |
REG | Reference to a national code | Ref country code: AT Ref legal event code: REF Ref document number: 1457594 Country of ref document: AT Kind code of ref document: T Effective date: 20220115 |
REG | Reference to a national code | Ref country code: IE Ref legal event code: FG4D |
REG | Reference to a national code | Ref country code: NL Ref legal event code: FP |
REG | Reference to a national code | Ref country code: LT Ref legal event code: MG9D |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: RS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222; Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222; Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222; Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220322 |
REG | Reference to a national code | Ref country code: AT Ref legal event code: MK05 Ref document number: 1457594 Country of ref document: AT Kind code of ref document: T Effective date: 20211222 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222; Ref country code: NO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220322; Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222; Ref country code: HR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222; Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220323 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: SM Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222; Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222; Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222; Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220422; Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222; Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222; Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222; Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222 |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R097 Ref document number: 602020001338 Country of ref document: DE |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220422 |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: MC Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222; Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222; Ref country code: AL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222 |
26N | No opposition filed | Effective date: 20220923 |
REG | Reference to a national code | Ref country code: BE Ref legal event code: MM Effective date: 20220331 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20220306; Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20220306 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222; Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20220331 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: NL Payment date: 20230215 Year of fee payment: 4 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: FR Payment date: 20230208 Year of fee payment: 4 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20211222 |
P01 | Opt-out of the competence of the Unified Patent Court (UPC) registered | Effective date: 20230526 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: DE Payment date: 20230331 Year of fee payment: 4; Ref country code: CH Payment date: 20230401 Year of fee payment: 4 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: NL Payment date: 20240326 Year of fee payment: 5 |