KR20170073113A - Method and apparatus for recognizing emotion using tone and tempo of voice signal - Google Patents
Method and apparatus for recognizing emotion using tone and tempo of voice signal
- Publication number
- KR20170073113A
- Authority
- KR
- South Korea
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Child & Adolescent Psychology (AREA)
- General Health & Medical Sciences (AREA)
- Hospice & Palliative Care (AREA)
- Psychiatry (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
According to an aspect of the present invention, there is provided an emotion recognition method using tone and tempo information, comprising: receiving a voice signal of a user; detecting a voice interval by dividing the voice signal into a voice interval and a non-voice interval using an integral absolute value (IAV); extracting tone information and tempo information from the detected voice interval; and extracting emotion information from the tone information and the tempo information using two or more neural networks, wherein a first neural network distinguishes the normal emotion from the sadness emotion, and a second neural network distinguishes the joy emotion from the anger emotion.
Description
BACKGROUND OF THE INVENTION
In communication, the transmission and recognition of emotion is a very important factor, which is necessary for accurate communication not only between people but also between people and animals or between people and machines.
Communication between human beings consists of various elements such as voice, gesture, and facial expression, which act individually or in combination to convey and recognize emotions.
Recently, as Internet of Things technology has developed, communication between humans and machines has emerged as an important channel for conveying emotion. Until now, however, research has mainly focused on identifying and judging human emotions based on facial expressions.
A variety of studies have used speech for communication between humans and machines. However, research has focused on recognizing human speech, synthesizing speech from text, or recognizing and authenticating a speaker by voice; research on recognizing emotions is not yet active.
Conventionally, emotion recognition from speech has determined emotions such as anger by comparing the pitch, or the volume (that is, the strength) of the voice signal, with that of the speaker in a calm state.
However, such pitch-based methods have problems: the pitch deviates widely between individuals, making it difficult to obtain an average value, and the strength of the voice signal is greatly influenced by the state of the microphone and the distance between the speaker and the microphone, so the accuracy is low.
Also, since a voice signal contains both voice sections and non-voice sections, analyzing the entire signal lowers the accuracy of speech recognition or emotion recognition because of the non-voice sections included in it. A voice interval detection technique capable of detecting only the voice sections is therefore also necessary.
The present invention has been made in view of the technical background described above, and it is an object of the present invention to provide an apparatus and method for recognizing emotions using the tone and tempo of a voice section obtained by distinguishing the voice sections from the non-voice sections of a voice signal.
The objects of the present invention are not limited to the above-mentioned objects, and other objects not mentioned can be clearly understood by those skilled in the art from the following description.
According to an aspect of the present invention, there is provided an emotion recognition method using tone and tempo information, the method comprising: receiving a voice signal of a user; detecting a voice interval by dividing the voice signal into a voice interval and a non-voice interval using an integral absolute value (IAV); extracting tone information and tempo information from the detected voice interval; and extracting emotion information from the tone information and the tempo information using two or more neural networks, wherein a first neural network distinguishes the normal emotion from the sadness emotion, and a second neural network distinguishes the joy emotion from the anger emotion.
According to another aspect of the present invention, an emotion recognition apparatus using tone and tempo information includes: an input unit for receiving a user's voice signal; a voice section detector for detecting a voice section by dividing the voice signal into a voice section and a non-voice section using an integral absolute value (IAV); a tone information extracting unit for extracting tone information from the detected voice interval; a tempo information extracting unit for extracting tempo information from the detected voice interval; and an emotion recognition unit for extracting emotion information using the tone information and the tempo information, wherein a first neural network distinguishes the normal emotion from the sadness emotion, and a second neural network distinguishes the joy emotion from the anger emotion.
According to the present invention, it is possible to correctly distinguish the voice sections and non-voice sections of a voice signal, and to recognize emotions more effectively and accurately from the voice sections.
FIG. 1 is a flowchart of an emotion recognition method according to an embodiment of the present invention.
FIG. 2 is a flowchart of a voice interval extraction method according to an embodiment of the present invention.
FIG. 3 illustrates extracted voice segments according to an embodiment of the present invention.
FIG. 4 is a structural diagram of an emotion recognition apparatus according to another embodiment of the present invention.
FIG. 5 is a diagram showing tone characteristics of a voice signal according to emotion.
FIG. 6 is a diagram showing tempo characteristics of a voice signal according to emotion.
FIG. 7 is a structural diagram of an emotion recognition apparatus according to another embodiment of the present invention.
FIG. 8 is a structural diagram of an emotion recognition apparatus according to another embodiment of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS The advantages and features of the present invention, and the manner of achieving them, will become apparent with reference to the embodiments described hereinafter in conjunction with the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art, and the invention is defined only by the scope of the claims. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. In this specification, the singular form includes the plural form unless otherwise specified. As used herein, the terms "comprises" and/or "comprising" do not preclude the presence or addition of one or more other components, steps, or operations.
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. FIG. 1 shows a flowchart of an emotion recognition method according to the present invention.
First, the user's voice to be subjected to emotion recognition is input (S110).
The input voice can be acquired via a microphone or the like, obtained from a voice message, or obtained by extracting only the audio portion of a moving image attached to a mail.
Next, a voice interval necessary for emotion recognition is detected from the voice signal of the input user (S120).
Since the input voice signal mixes voice and non-voice sections, using the entire signal lowers the recognition rate. Therefore, only the voice section is separated and used for emotion recognition.
The IAV (Integral Absolute Value) feature is used to separate the voice segments. It reflects the energy magnitude of the signal, which is larger in a voice interval than in a non-voice interval.
FIG. 2 is a flowchart for detecting a voice interval.
First, in order to detect a voice interval, the integral absolute value (IAV) for each frame is calculated (S210). The frame size of the voice signal depends on the sampling frequency and the number of samples: at a sampling frequency of 48 kHz, one frame of 1536 samples has a length of 32 milliseconds (ms).
That is, the IAV is obtained by summing the absolute values of the 1536 samples in one frame.
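The per-frame IAV described above can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation; the function name `frame_iav` is chosen for this example.

```python
import numpy as np

def frame_iav(signal, frame_len=1536):
    """Integral Absolute Value (IAV) per frame: the sum of the absolute
    sample values, a proxy for signal energy in each frame.
    At 48 kHz, 1536 samples correspond to a 32 ms frame."""
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))
    return np.abs(frames).sum(axis=1)

# A loud (voice-like) frame yields a much larger IAV than a quiet one.
sig = np.concatenate([0.01 * np.ones(1536), 0.5 * np.ones(1536)])
iav = frame_iav(sig)
```

A voice/non-voice decision then reduces to comparing each frame's IAV against the threshold derived in steps S220–S250.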
Once the IAV of the input signal is obtained, the maximum value and the minimum value over the interval are calculated (S220), and from them a threshold value for judging whether a frame belongs to a voice interval is derived.
First, it is determined whether the minimum value exceeds 70% of the maximum value (S230). When the minimum value is close to the maximum value, the threshold would otherwise become too high, making the section judged as the voice section too short; this check prevents that.
If the minimum value is equal to or greater than 70% of the maximum value, the threshold value is set to 20% of the maximum value (S240).
If the minimum value is less than 70% of the maximum value, the threshold value is set to the minimum value plus 10% of the difference between the maximum value and the minimum value (S250).
If the IAV of a frame exceeds the threshold value, it is determined that a voice interval has started (S270); if the IAV falls below the threshold value, it is determined that the voice interval has ended (S280), which concludes the voice interval detection step (S120).
The numerical values used in the voice interval detection step (S120) are example values for explanation; optimal values can be substituted through experiments.
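The threshold rule of steps S230–S280 can be sketched as below. The 70%/20%/10% figures are the example values from the text, and the function names are illustrative only.

```python
def voice_threshold(iav_max, iav_min, ratio=0.7, first_rate=0.2, second_rate=0.1):
    """Threshold rule of steps S230-S250 (example rates from the text)."""
    if iav_min >= ratio * iav_max:
        # Minimum close to maximum: cap the threshold at 20% of the maximum
        # so the detected voice section does not become too short.
        return first_rate * iav_max
    # Otherwise: minimum plus 10% of the (max - min) spread.
    return iav_min + second_rate * (iav_max - iav_min)

def classify_frames(iav_values, threshold):
    """A frame whose IAV exceeds the threshold is voice, otherwise non-voice."""
    return [v > threshold for v in iav_values]
```

For instance, with a maximum IAV of 100 and a minimum of 10, the threshold becomes 10 + 0.1 × 90 = 19.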
When the voice interval detection step S120 is finished, the tone information of the voice interval is extracted (S130), the tempo information of the voice interval is extracted (S140), and used for emotion recognition.
FIG. 4 shows an apparatus for extracting tone and tempo information and performing emotion recognition using a neural network.
A human voice signal is a quasi-periodic signal generated by the vibration of the vocal cords. The frequency of this vibration is called the fundamental frequency, pitch, or tone.
The tone of a voice signal is an important feature widely used in the field of voice signal processing, and there are various methods for obtaining tone information.
The autocorrelation or AMDF (Average Magnitude Difference Function) method finds the period with the greatest autocorrelation in the voice signal and determines the corresponding frequency as the fundamental frequency, that is, the tone. Since the fundamental frequency of the human voice usually lies between 80 Hz and 500 Hz, the candidate frequency is varied from 80 Hz to 500 Hz, the period with the largest autocorrelation value is found, and the frequency with the highest correlation is determined as the fundamental frequency.
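The autocorrelation search over the 80–500 Hz band can be sketched as follows; this is an assumed minimal implementation for illustration, not the patented code.

```python
import numpy as np

def pitch_autocorr(frame, fs=48000, fmin=80.0, fmax=500.0):
    """Find the lag (period) with the largest autocorrelation inside the
    80-500 Hz search band and return the corresponding frequency."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lag_lo, lag_hi = int(fs / fmax), int(fs / fmin)  # 96..600 samples at 48 kHz
    best_lag = lag_lo + int(np.argmax(ac[lag_lo:lag_hi + 1]))
    return fs / best_lag

# A 200 Hz test tone is recovered close to 200 Hz.
t = np.arange(4096) / 48000.0
f0 = pitch_autocorr(np.sin(2 * np.pi * 200 * t))
```

The AMDF variant replaces the correlation with an average magnitude difference and takes the lag with the smallest difference instead of the largest correlation.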
In the method using the energy of the voice signal, the time-domain voice signal is converted into a frequency signal by an FFT (Fast Fourier Transform) or the like, the energy value at each frequency is measured, and the frequency with the largest energy value is determined as the fundamental frequency. Besides the FFT, methods such as the DCT (Discrete Cosine Transform), the DFT (Discrete Fourier Transform), or a filter bank may be used to convert the voice signal into a frequency signal.
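The energy-based variant can be sketched as follows, again as an assumed illustration restricted to the 80–500 Hz band mentioned above.

```python
import numpy as np

def pitch_fft(frame, fs=48000, fmin=80.0, fmax=500.0):
    """FFT the frame and pick the frequency bin with the largest energy
    inside the 80-500 Hz band as the fundamental frequency."""
    energy = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(energy[band])]

# With 4800 samples at 48 kHz the bin spacing is 10 Hz, so a 300 Hz tone
# falls exactly on a bin.
t = np.arange(4800) / 48000.0
f0 = pitch_fft(np.sin(2 * np.pi * 300 * t))
```

A DCT, DFT, or filter-bank front end would slot into the same structure in place of `np.fft.rfft`.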
As shown in FIG. 4, the tone extracted for each frame is used to compute an average value and a variance value over the entire voice interval, and these values are transmitted to the neural network to recognize the emotion.
The tempo of the voice signal is measured in BPM (beats per minute). In music, the tempo is the number of beats within one minute; for the human voice, the tempo is obtained using the number of syllables, each composed of one consonant and a vowel or of a single vowel.
In the present invention, a vowel and a consonant are extracted by analyzing the envelope of a speech signal, and the length of the vowel is defined as the length of the syllable.
The syllable extraction result is expressed as the number of frames for one vowel. As described above, one frame has a length of 32 ms at 48 kHz with 1536 samples per frame, so the average value of the syllable lengths extracted from one sentence is used as the tempo.
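The frame-count-to-tempo conversion above can be sketched as follows. The envelope-based vowel detection itself is not shown; `syllable_frames` is assumed to hold the number of 32 ms frames spanned by each detected syllable.

```python
def tempo_from_syllables(syllable_frames, frame_ms=32):
    """Average syllable length (in 32 ms frames) over a sentence, converted
    to a rough syllables-per-minute (BPM-like) tempo figure."""
    avg_frames = sum(syllable_frames) / len(syllable_frames)
    avg_ms = avg_frames * frame_ms
    return 60000.0 / avg_ms  # syllables per minute
```

For example, syllables averaging 10 frames (320 ms) correspond to 187.5 syllables per minute.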
An artificial intelligence algorithm is used in step S150 for recognizing the emotion based on the extracted tone and tempo. In this embodiment, a Recurrent Neural Network (RNN) algorithm is used; however, Deep Neural Network (DNN), Convolutional Neural Network (CNN), Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), or Deep Q-Network algorithms can also be used.
Like the tone information, the tempo information is also obtained for each frame, and its average value and variance value are computed and transmitted to the neural network.
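The four summary statistics passed to the network can be packed as below; a trivial sketch, with the function name chosen for this example.

```python
import numpy as np

def emotion_features(tone_per_frame, tempo_values):
    """Collapse per-frame tone values and tempo values into the four-number
    feature vector (mean and variance of each) fed to the neural network."""
    return np.array([
        np.mean(tone_per_frame), np.var(tone_per_frame),
        np.mean(tempo_values), np.var(tempo_values),
    ])

feat = emotion_features([100.0, 200.0], [2.0, 4.0])
```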
To analyze emotion with the artificial intelligence algorithm using the tone information and the tempo information thus obtained, an initial learning process is required: voice signals for the four emotions are input, and the optimal threshold values are set.
After learning is completed, the neural network recognizes emotion in two stages, a primary neural network and a secondary neural network. The primary neural network recognizes the normal and sadness emotions, which have a relatively low tone; tones not recognized by the primary neural network go through a recognition process in the secondary neural network for the joy and anger emotions. By dividing recognition between the two networks in this way, the primary neural network distinguishes only the normal and sadness emotions, and the secondary neural network distinguishes only the joy and anger emotions.
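The two-stage routing described above can be sketched as follows. The stand-in classifiers are hypothetical: real recognition uses the trained neural networks, not fixed thresholds; the thresholds here merely echo the tone ranges shown later in FIG. 5.

```python
def two_stage_emotion(features, first_net, second_net):
    """Two-stage recognition: the first network handles the low-tone classes
    (normal vs. sadness); anything it does not recognize is passed to the
    second network, which separates joy from anger."""
    label = first_net(features)      # 'normal', 'sadness', or None
    if label is not None:
        return label
    return second_net(features)      # 'joy' or 'anger'

# Toy stand-ins for the trained networks, keyed on mean tone (Hz).
first = lambda tone: 'sadness' if tone < 150 else None
second = lambda tone: 'anger' if tone > 300 else 'joy'
```

In practice `first_net` and `second_net` would be the learned recurrent networks, each taking the full tone/tempo feature vector.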
In the emotion recognition step, the average value and the variance value of the tone and tempo extracted in the previous stage are compared with the average value and the variance value of the tone and tempo of each emotion, which are set in advance through learning; if the difference is within a set threshold, the corresponding emotion is judged to be expressed.
FIG. 5 is a graph showing features extracted from the tones of a voice signal. In the graph, the abscissa represents time and the ordinate represents the frequency of the voice signal in hertz (Hz).
The diamonds corresponding to sadness are distributed below 150 Hz, showing low-frequency (bass) characteristics, while joy lies above 200 Hz and anger above 300 Hz, showing high-frequency characteristics compared to sadness.
Therefore, by analyzing these characteristics, a neural network can analyze and recognize the sadness, joy, anger, or normal emotional state.
FIG. 6 is a graph showing features extracted from the tempo of a voice signal. The vertical axis indicates the presence or absence of a voice signal: an interval in which voice is present is 1, and an interval without voice is 0. The horizontal axis represents time in frame units.
The difference in thickness of each bar in the graph indicates the tempo speed. The thicker the bar, the faster the tempo.
FIG. 6(a) shows the tempo for sadness, (b) for joy, and (c) for anger. In the cases of anger and joy, the bars drawn with thicker lines (indicating faster tempo) appear more frequently than in sadness.
The emotion can be determined using the tone information and the tempo information of the emotions shown in FIGS. 5 and 6, and the threshold value of tone and tempo for emotion determination can be determined through experiments.
If the emotion cannot be recognized through the above steps, a method of recognizing the emotion by analyzing the person's breathing sound can also be used.
When a person is extremely sad or angry, only a breathing sound may be produced without speech, so no voice interval is detected using the existing threshold value; the breathing-sound analysis compensates for such situations in which the emotion could otherwise not be recognized.
In addition, even when a voice interval is detected, emotion recognition can be supplemented by analyzing the energy level and tempo of the breathing sound when recognition is ambiguous at the boundary between the normal/sadness or joy/anger emotions. The threshold for breath sounds can also be set by experiment.
FIG. 7 shows an emotion recognition apparatus according to another embodiment of the present invention.
The emotion recognition apparatus includes an input unit, a voice section detector, a tone information extracting unit, a tempo information extracting unit, and an emotion recognition unit.
The input unit receives the user's voice signal and passes it to the voice section detector.
In order to detect the voice interval, the voice interval and the non-voice interval are separated based on the energy level using the integral absolute value (IAV) feature as described above, and the detected voice interval is transmitted to the tone information extracting unit and the tempo information extracting unit.
The tone information extracting unit extracts tone information from the detected voice interval.
The tone information can be obtained by using an autocorrelation function or a method using the energy of each frequency of the frequency signal.
The tempo information extracting unit extracts tempo information from the detected voice interval.
When the tone information and the tempo information are obtained, the emotion recognition unit extracts emotion information using them.
The emotion recognition unit uses a first neural network to distinguish the normal and sadness emotions and a second neural network to distinguish the joy and anger emotions.
It is possible to recognize the user's emotion more precisely by the emotion recognition apparatus as described above, and there is a possibility that the emotion recognition apparatus can be utilized in many parts.
Meanwhile, the emotion recognition method according to the embodiment of the present invention can be implemented in a computer system or recorded on a recording medium. As shown in FIG. 8, the computer system may include at least one processor, a memory, a user input device, a data communication bus, a user output device, and a storage.
The computer system may further include a network interface 129 coupled to the network. The processor may be a central processing unit (CPU) or a semiconductor device that executes processing instructions stored in the memory and/or the storage.
The memory and the storage may include various forms of volatile or non-volatile storage media. For example, the memory may include a read-only memory (ROM) and a random access memory (RAM).
Accordingly, the emotion recognition method according to the embodiment of the present invention can be implemented as a computer-executable method. When the emotion recognition method according to the embodiment of the present invention is performed on a computer device, computer-readable instructions carry out the recognition method according to the present invention.
Meanwhile, the emotion recognition method according to the present invention can be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all kinds of recording media storing data that can be decoded by a computer system, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), a magnetic tape, a magnetic disk, a flash memory, and an optical data storage device. The computer-readable recording medium may also be distributed over computer systems connected through a computer network, and stored and executed as code readable in a distributed manner.
While the present invention has been described in detail with reference to the accompanying drawings, the invention is not limited to the above-described embodiments, and those skilled in the art will appreciate that various modifications are, of course, possible. Accordingly, the scope of protection of the present invention should not be limited to the above-described embodiments but should be determined by the description of the following claims.
Claims (10)
Receiving a voice signal of a user;
Detecting a voice interval by dividing the voice signal into a voice interval and a non-voice interval using an integral absolute value (IAV);
Extracting tone information and tempo information from the detected voice interval; And
Extracting emotion information using the tone information and the tempo information in two or more neural networks, wherein a first neural network distinguishes the normal emotion from the sadness emotion, and a second neural network distinguishes the joy emotion from the anger emotion;
The emotion recognition method comprising:
Calculating a maximum value and a minimum value of the integral absolute value (IAV) of the voice signal,
If the minimum value exceeds a preset ratio of the maximum value, the threshold value is set to the maximum value multiplied by a first rate; if the minimum value is less than the preset ratio, the threshold value is set to the minimum value plus the difference between the maximum value and the minimum value multiplied by a second rate,
Determining a voice interval if the IAV exceeds the threshold value, and determining a non-voice interval if the IAV is less than the threshold value
Wherein the tone information includes an average value and a variance value of a fundamental frequency of the detected voice interval,
Wherein the tempo information includes an average value and a variance value of the tempo of the detected voice interval
The step of extracting the emotion information may include comparing the average value and the variance value of the fundamental frequency and the average value and the variance value of the tempo with the average value and the variance value of the fundamental frequency and the tempo of each emotion set in advance through learning, and judging the emotion if the difference is below a set threshold value.
Extracting a fundamental frequency using an autocorrelation function, AMDF (Average Magnitude Difference Function), or FFT (Fast Fourier Transform)
An input unit for receiving a user's voice signal;
A voice section detector for detecting a voice section by dividing the voice signal into a voice section and a non-voice section using an integral absolute value (IAV);
A tone information extracting unit for extracting tone information from the detected voice interval;
A tempo information extracting unit for extracting tempo information from the detected voice interval; And
An emotion recognition unit for extracting emotion information using the tone information and the tempo information in two or more neural networks, wherein a first neural network distinguishes the normal emotion from the sadness emotion, and a second neural network distinguishes the joy emotion from the anger emotion;
And an emotion recognition device.
Calculating a maximum value and a minimum value of the integral absolute value (IAV) of the voice signal,
If the minimum value exceeds a preset ratio of the maximum value, the threshold value is set to the maximum value multiplied by a first rate; if the minimum value is less than the preset ratio, the threshold value is set to the minimum value plus the difference between the maximum value and the minimum value multiplied by a second rate,
Determining a voice interval if the IAV exceeds the threshold value, and determining a non-voice interval if the IAV is less than the threshold value
Wherein the tone information extracting unit extracts tone information including an average value and a variance value of the tones of the detected voice interval,
Wherein the tempo information extracting unit extracts tempo information including an average value and a variance value of the detected tempo of the speech interval
Wherein the emotion recognition unit compares the average value and the variance value of the tone and the average value and the variance value of the tempo with the average value and the variance value of the tone and the tempo of each predetermined emotion,
Extracting a fundamental frequency using an autocorrelation function, AMDF (Average Magnitude Difference Function), or FFT (Fast Fourier Transform)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/KR2015/013968 WO2017104875A1 (en) | 2015-12-18 | 2015-12-18 | Emotion recognition method using voice tone and tempo information, and apparatus therefor |
KR1020150181619A KR20170073113A (en) | 2015-12-18 | 2015-12-18 | Method and apparatus for recognizing emotion using tone and tempo of voice signal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150181619A KR20170073113A (en) | 2015-12-18 | 2015-12-18 | Method and apparatus for recognizing emotion using tone and tempo of voice signal |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20170073113A true KR20170073113A (en) | 2017-06-28 |
Family
ID=59056830
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020150181619A KR20170073113A (en) | 2015-12-18 | 2015-12-18 | Method and apparatus for recognizing emotion using tone and tempo of voice signal |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR20170073113A (en) |
WO (1) | WO2017104875A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108806667B (en) * | 2018-05-29 | 2020-04-17 | 重庆大学 | Synchronous recognition method of voice and emotion based on neural network |
CN109147826B (en) * | 2018-08-22 | 2022-12-27 | 平安科技(深圳)有限公司 | Music emotion recognition method and device, computer equipment and computer storage medium |
US10810382B2 (en) * | 2018-10-09 | 2020-10-20 | Disney Enterprises, Inc. | Automated conversion of vocabulary and narrative tone |
CN109243491B (en) * | 2018-10-11 | 2023-06-02 | 平安科技(深圳)有限公司 | Method, system and storage medium for emotion recognition of speech in frequency spectrum |
CN111627462B (en) * | 2020-05-22 | 2023-12-19 | 上海师范大学 | Semantic analysis-based emotion recognition method and device |
CN113327630B (en) * | 2021-05-27 | 2023-05-09 | 平安科技(深圳)有限公司 | Speech emotion recognition method, device, equipment and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI221574B (en) * | 2000-09-13 | 2004-10-01 | Agi Inc | Sentiment sensing method, perception generation method and device thereof and software |
US8788270B2 (en) * | 2009-06-16 | 2014-07-22 | University Of Florida Research Foundation, Inc. | Apparatus and method for determining an emotion state of a speaker |
US9020822B2 (en) * | 2012-10-19 | 2015-04-28 | Sony Computer Entertainment Inc. | Emotion recognition using auditory attention cues extracted from users voice |
-
2015
- 2015-12-18 WO PCT/KR2015/013968 patent/WO2017104875A1/en active Application Filing
- 2015-12-18 KR KR1020150181619A patent/KR20170073113A/en not_active Application Discontinuation
Also Published As
Publication number | Publication date |
---|---|
WO2017104875A1 (en) | 2017-06-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Boles et al. | Voice biometrics: Deep learning-based voiceprint authentication system | |
KR20170073113A (en) | Method and apparatus for recognizing emotion using tone and tempo of voice signal | |
KR101988222B1 (en) | Apparatus and method for large vocabulary continuous speech recognition | |
US8145486B2 (en) | Indexing apparatus, indexing method, and computer program product | |
JPH0352640B2 (en) | ||
WO2011046474A2 (en) | Method for identifying a speaker based on random speech phonograms using formant equalization | |
CN108899033B (en) | Method and device for determining speaker characteristics | |
KR101616112B1 (en) | Speaker separation system and method using voice feature vectors | |
KR101893789B1 (en) | Method for speech endpoint detection using normalizaion and apparatus thereof | |
KR101943381B1 (en) | Endpoint detection method of speech using deep neural network and apparatus thereof | |
CN112102850A (en) | Processing method, device and medium for emotion recognition and electronic equipment | |
JP2018180334A (en) | Emotion recognition device, method and program | |
CN110827853A (en) | Voice feature information extraction method, terminal and readable storage medium | |
Pao et al. | Combining acoustic features for improved emotion recognition in mandarin speech | |
KR101992955B1 (en) | Method for speech endpoint detection using normalizaion and apparatus thereof | |
JP2015055653A (en) | Speech recognition device and method and electronic apparatus | |
KR102098956B1 (en) | Voice recognition apparatus and method of recognizing the voice | |
Hasija et al. | Recognition of Children Punjabi Speech using Tonal Non-Tonal Classifier | |
CN114822502A (en) | Alarm method, alarm device, computer equipment and storage medium | |
Jamil et al. | Influences of age in emotion recognition of spontaneous speech: A case of an under-resourced language | |
KR100391123B1 (en) | speech recognition method and system using every single pitch-period data analysis | |
Mishra et al. | Speaker identification, differentiation and verification using deep learning for human machine interface | |
Lertwongkhanakool et al. | An automatic real-time synchronization of live speech with its transcription approach | |
Laleye et al. | Automatic boundary detection based on entropy measures for text-independent syllable segmentation | |
Raj et al. | Gender based affection recognition of speech signals using spectral & prosodic feature extraction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E90F | Notification of reason for final refusal | ||
E601 | Decision to refuse application |