CN110390942A - Mood detection method and its device based on vagitus - Google Patents
Mood detection method and its device based on vagitus
- Publication number
- CN110390942A (application number CN201910571836.XA)
- Authority
- CN
- China
- Prior art keywords
- vagitus
- signal
- mood
- environment voice
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/26—Pre-filtering or post-filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Abstract
The invention discloses a mood detection method and device based on vagitus (infant crying), relating to the field of big data technology. The mood detection method includes: obtaining a vagitus signal to be detected and applying windowing to it; performing a Fast Fourier Transform (FFT) on the vagitus window signal in each time window to generate the spectrogram corresponding to the vagitus signal; and inputting the spectrogram into a trained mood detection model to determine the mood corresponding to the vagitus signal. The vagitus signal is thus converted into a sound spectrum image by Fourier transformation, and emotion recognition is performed on the spectrogram using a capsule network, improving recognition accuracy for vagitus. The technical solution provided by the embodiments of the present invention solves the problem of low recognition accuracy for vagitus in the prior art.
Description
[technical field]
The present invention relates to the field of big data technology, and in particular to a mood detection method and device based on vagitus.
[background technique]
In the related art, a voice signal is input directly into a voice recognition model, which outputs the corresponding emotion recognition result, thereby realizing emotion recognition for the voice signal. The voice recognition model is a pre-trained neural network.
For a special kind of voice signal such as vagitus, which carries no specific speech content, existing voice recognition models suffer from low recognition accuracy.
[summary of the invention]
In view of this, the embodiments of the present invention provide a mood detection method and device based on vagitus, intended to solve the problem of low emotion recognition accuracy for vagitus in the prior art.
In one aspect, an embodiment of the present invention provides a mood detection method based on vagitus. The method comprises: obtaining a vagitus signal to be detected; applying windowing to the vagitus signal; performing a Fast Fourier Transform (FFT) on the vagitus window signal in each time window to generate the spectrogram corresponding to the vagitus signal; and inputting the spectrogram into a trained mood detection model to determine the mood corresponding to the vagitus signal, wherein the mood detection model includes a capsule network.
Further, obtaining the vagitus signal to be detected comprises: obtaining an environment voice signal; applying high-pass filtering to the environment voice signal to filter out low-frequency voice signals; and detecting the filtered environment voice signal with a silence detection algorithm to obtain the vagitus signal.
Further, detecting the filtered environment voice signal with the silence detection algorithm comprises: obtaining one frame of the filtered environment voice signal as an environment voice frame signal; computing the spectral flatness and the short-time energy value of the environment voice frame signal; judging whether the spectral flatness is less than a first preset threshold and whether the short-time energy value is less than a second preset threshold; and if so, taking the environment voice frame signal as a vagitus frame signal of the vagitus signal.
Further, the trained mood detection model is trained by the following steps: obtaining a reference spectrogram and a reference mood corresponding to a reference vagitus signal; inputting the reference spectrogram into the capsule network; computing the difference between the output of the capsule network and the reference mood using a cross-entropy loss function, so as to optimize the parameters of the capsule network; and determining the trained mood detection model based on the optimized parameters of the capsule network.
In one aspect, an embodiment of the present invention provides a mood detection device based on vagitus. The device comprises: a first obtaining module for obtaining a vagitus signal to be detected; a windowing module for applying windowing to the vagitus signal; a conversion module for performing a Fast Fourier Transform on the vagitus window signal in each time window to generate the spectrogram corresponding to the vagitus signal; and a first input module for inputting the spectrogram into a trained mood detection model to determine the mood corresponding to the vagitus signal, wherein the mood detection model includes a capsule network.
Further, the first obtaining module comprises: an obtaining submodule for obtaining the environment voice signal; a filtering submodule for applying high-pass filtering to the environment voice signal to filter out low-frequency voice signals; and a detection submodule for detecting the filtered environment voice signal with a silence detection algorithm to obtain the vagitus signal.
Further, the detection submodule comprises: an obtaining unit for obtaining one frame of the filtered environment voice signal as an environment voice frame signal; a computing unit for computing the spectral flatness and the short-time energy value of the environment voice frame signal; a judging unit for judging whether the spectral flatness is less than the first preset threshold and whether the short-time energy value is less than the second preset threshold; and a setting unit for taking the environment voice frame signal as a vagitus frame signal of the vagitus signal when the judging unit outputs an affirmative signal.
Further, the device further comprises: a second obtaining module for obtaining a reference spectrogram and a reference mood corresponding to a reference vagitus signal; a second input module for inputting the reference spectrogram into the capsule network; a computing module for computing the difference between the output of the capsule network and the reference mood using a cross-entropy loss function, so as to optimize the parameters of the capsule network; and a determining module for determining the trained mood detection model based on the optimized parameters of the capsule network.
In one aspect, an embodiment of the present invention provides a computer device including a memory and a processor, the memory being used to store information including program instructions and the processor being used to control the execution of the program instructions, which, when loaded and executed by the processor, implement the steps of the above mood detection method based on vagitus.
In one aspect, an embodiment of the present invention provides a storage medium including a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the steps of the above mood detection method based on vagitus.
In the embodiments of the present invention, the vagitus signal is converted into a sound spectrum image by Fourier transformation and emotion recognition is performed on the spectrogram using a capsule network, which solves the problem of low recognition accuracy for vagitus in the prior art and achieves the effect of improving that accuracy.
[Detailed description of the invention]
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative labor.
Fig. 1 is a schematic flowchart of a mood detection method based on vagitus provided by an embodiment of the present invention;
Fig. 2 is a structural schematic diagram of the capsule network provided by an embodiment of the present invention;
Fig. 3 is an exemplary flowchart of the mood detection method based on vagitus provided by an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of a mood detection device based on vagitus provided by an embodiment of the present invention; and
Fig. 5 is a schematic diagram of a computer device provided by an embodiment of the present invention.
[specific embodiment]
For a better understanding of the technical solutions of the present invention, the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
It should be understood that the described embodiments are only a part of the embodiments of the present invention, rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative labor shall fall within the protection scope of the present invention.
The terms used in the embodiments of the present invention are only for the purpose of describing particular embodiments and are not intended to limit the present invention. The singular forms "a", "said" and "the" used in the embodiments of the present invention and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" used herein merely describes an association between related objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: A alone, both A and B, and B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present invention to describe preset ranges, these preset ranges should not be limited by these terms, which are only used to distinguish preset ranges from one another. For example, without departing from the scope of the embodiments of the present invention, a first preset range may also be referred to as a second preset range, and similarly, a second preset range may also be referred to as a first preset range.
Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined" or "in response to determining" or "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".
From the above description of the prior art, it can be seen that in the related art, for a special kind of voice signal such as vagitus, which carries no specific speech content, existing voice recognition models suffer from low recognition accuracy.
To address this problem, an embodiment of the present invention provides a mood detection method based on vagitus: the vagitus signal is converted into a sound spectrum image by Fourier transformation, and emotion recognition is performed on the spectrogram using a capsule network, improving recognition accuracy for vagitus.
Fig. 1 is a schematic flowchart of a mood detection method based on vagitus provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps.
Step S101: obtain the vagitus signal to be detected.
It should be noted that the mood detection method based on vagitus proposed by the embodiment of the present invention can obtain the vagitus signal from a noisy environment voice signal and then perform mood detection on the vagitus signal.
Therefore, in order to obtain the vagitus signal from the noisy environment voice signal, one possible implementation is to obtain the environment voice signal, apply high-pass filtering to it to filter out low-frequency voice signals, and then detect the filtered environment voice signal with a silence detection algorithm to obtain the vagitus signal.
It should be noted that the environment voice signal may contain the vagitus signal together with adult speech signals, environmental noise signals, and so on, and the vagitus signal has a somewhat higher frequency than these other voice signals, so the other signals can be filtered out by high-pass filtering.
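The high-pass filtering step can be sketched as follows. The Butterworth design, the 300 Hz cutoff, and the 16 kHz sample rate are illustrative assumptions; the patent only states that high-pass filtering is used to suppress the lower-frequency sounds.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def highpass_filter(signal, sample_rate=16000, cutoff_hz=300, order=4):
    """Attenuate low-frequency speech/noise, keeping the higher-pitched cry.

    cutoff_hz and order are assumed values; the patent does not specify them.
    """
    sos = butter(order, cutoff_hz, btype="highpass", fs=sample_rate, output="sos")
    return sosfiltfilt(sos, signal)

# Example: a 100 Hz hum is attenuated while a 1 kHz cry-like tone passes.
t = np.arange(0, 1.0, 1 / 16000)
mixed = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
filtered = highpass_filter(mixed)
```

A zero-phase `sosfiltfilt` is used here so the pulse timing of the cry is not shifted by the filter.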
In addition, the vagitus signal is not one continuous waveform but a sequence of pulses, each of a different duration. Therefore, the filtered environment voice signal also needs to be detected with a silence detection algorithm to obtain the vagitus signal.
One possible implementation is to obtain one frame of the filtered environment voice signal as an environment voice frame signal and compute its spectral flatness and short-time energy value; if the spectral flatness is less than the first preset threshold and the short-time energy value is less than the second preset threshold, the environment voice frame signal is taken as a vagitus frame signal of the vagitus signal.
The short-time energy value is the weighted sum of squares of the sample values of one frame of the voice signal, and the spectral flatness measures the relative variation of the power of one frame of the voice signal across the frequency domain.
The spectral flatness is computed as follows:
SF_dB = 10 · log10(G_S / A_S)
where SF_dB is the spectral flatness, and G_S and A_S are the geometric mean and the arithmetic mean of the power spectrum, respectively.
It should be understood that if the spectral flatness of the environment voice frame signal is less than the first preset threshold and its short-time energy value is less than the second preset threshold, the environment voice frame signal is a vagitus frame signal.
Further, multiple vagitus frame signals are concatenated to obtain the vagitus signal.
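The frame-wise test above can be sketched as below. The two thresholds and the 512-sample frame length are hypothetical, since the patent leaves the preset values open; the condition itself (both spectral flatness and short-time energy below their thresholds) follows the description.

```python
import numpy as np

def spectral_flatness_db(frame):
    """10*log10(geometric mean / arithmetic mean) of the frame's power spectrum."""
    power = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12   # epsilon avoids log(0)
    geometric = np.exp(np.mean(np.log(power)))
    arithmetic = np.mean(power)
    return 10.0 * np.log10(geometric / arithmetic)

def short_time_energy(frame, weights=None):
    """Weighted sum of squares of the frame's samples (uniform weights here)."""
    if weights is None:
        weights = np.ones_like(frame)
    return float(np.sum(weights * frame ** 2))

def is_cry_frame(frame, flatness_thresh_db=-20.0, energy_thresh=200.0):
    """A frame passes when BOTH values fall below their preset thresholds,
    mirroring the patent's stated test (the threshold values are made up)."""
    return bool(spectral_flatness_db(frame) < flatness_thresh_db
                and short_time_energy(frame) < energy_thresh)

# Example: a harmonic, cry-like frame versus a broadband-noise frame.
fs, n = 16000, 512
tone = 0.5 * np.sin(2 * np.pi * 500 * np.arange(n) / fs)
noise = np.random.default_rng(1).normal(size=n)
```

A tonal frame has a very low (strongly negative) flatness because its power concentrates in a few bins, while white noise sits near 0 dB, which is what makes the flatness test a useful cry/noise discriminator.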
Step S102: apply windowing to the vagitus signal.
In order to give the generated spectrogram higher resolution in time, one possible implementation is to add one time window every 15 frames of the vagitus signal to obtain vagitus window signals, with adjacent vagitus window signals overlapping by 50%.
Step S103: perform a Fast Fourier Transform on the vagitus window signal in each time window to generate the spectrogram corresponding to the vagitus signal.
The size of the Fast Fourier Transform is 256.
It should be understood that each vagitus window signal corresponds to one spectrogram; when two adjacent vagitus window signals overlap by 50%, the corresponding two spectrograms also overlap accordingly. Since the vagitus signal corresponds to multiple vagitus window signals, it corresponds to multiple spectrograms.
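Steps S102 and S103 can be sketched together as follows. The 15-frame time window, the 50% overlap, and the 256-point FFT come from the description; the 16-sample frame length (chosen so one 15-frame window fits within the FFT size) and the Hamming taper are assumptions.

```python
import numpy as np

def cry_spectrogram(signal, frame_len=16, frames_per_window=15, fft_size=256):
    """Window the cry signal (15 frames per time window, 50% overlap) and
    FFT each window (size 256) to build the spectrogram column by column.

    frame_len and the Hamming taper are assumed; windows shorter than
    fft_size are zero-padded by np.fft.rfft.
    """
    window_len = frames_per_window * frame_len     # samples per time window
    hop = window_len // 2                          # adjacent windows overlap 50%
    taper = np.hamming(window_len)
    columns = [
        np.abs(np.fft.rfft(signal[start:start + window_len] * taper, n=fft_size))
        for start in range(0, len(signal) - window_len + 1, hop)
    ]
    return np.stack(columns, axis=1)  # shape: (fft_size // 2 + 1, n_windows)

# Example: a 1 kHz tone at 16 kHz concentrates energy in a single frequency bin.
fs = 16000
t = np.arange(fs) / fs
spec = cry_spectrogram(np.sin(2 * np.pi * 1000 * t))
```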
Step S104: input the spectrogram into the trained mood detection model to determine the mood corresponding to the vagitus signal.
The mood detection model includes a capsule network.
It should be noted that the capsule network proposed by the embodiment of the present invention is composed of filters that can detect objects, specifically by aligning the extracted features on a linear subspace.
As shown in Fig. 2, the capsule network includes a ReLU convolution layer, a primary capsule layer, and a sobbing capsule layer.
The main function of the ReLU convolution layer is to extract low-level features of the input image; it contains 128 9x9 convolution kernels with stride 2, and its activation function is the ReLU rectified linear function. This layer converts pixel intensities into the activities of local feature detectors that serve as the input to the capsules. The ReLU convolution layer activates the bottom-level multi-dimensional capsules, and these capsules correspond to an inverse rendering process.
The primary capsule layer is a convolutional capsule layer with 32 channels; each capsule contains 8 convolution units, i.e., each capsule is 8-dimensional, and each unit is a 9x9 convolution kernel with stride 2. The output of each capsule sees the outputs of all 128x28x28 convolution units whose receptive fields overlap with the center of the capsule. In total, the primary capsule layer has [32, 10, 10] capsule outputs, each an 8-dimensional vector, and the capsules within each [6, 6] grid share weights.
The sobbing capsule layer corresponds to the different mood categories: each category has one 16-dimensional capsule, the vector length of each capsule in this layer represents whether the corresponding category occurs, and the vector length is also used to compute the cross-entropy loss function. The output of the sobbing capsule layer is connected to a softmax to determine the mood corresponding to the vagitus signal.
It should be particularly noted that a capsule is a group of neurons that learns to recognize whether a certain object appears in an image and encodes its properties into a vector; the norm of the vector represents whether the object appears. For example, each capsule can learn to recognize a specific object, or a part of an object, in the image. Within the framework of a neural network, many capsules can be gathered to form a capsule layer, in which each unit outputs a vector rather than the traditional scalar activation.
The norm of a capsule's output vector represents the probability that the object represented by the capsule appears in the current input. The norm of any output vector can be compressed into [0, 1] using a nonlinear squashing function:
v_j = (||s_j||² / (1 + ||s_j||²)) · (s_j / ||s_j||)
where v_j is the output vector of capsule j. That is, capsule j applies this nonlinear squashing activation function to its input vector s_j and outputs the vector v_j: the direction of s_j is retained, and only its norm is compressed to between 0 and 1. The parameters inside v_j represent the different properties of the object (such as position, size, and texture), and the size of its norm represents whether the object exists.
The input vector s_j is a weighted sum over all prediction vectors û_{j|i}, each obtained by multiplying the output u_i of a capsule in the layer below by a weight matrix W_ji:
û_{j|i} = W_ji · u_i,   s_j = Σ_i c_ij · û_{j|i}
where c_ij is called the coupling coefficient and is updated between capsules by the dynamic routing algorithm; the coupling coefficients between capsule i and all capsules in the layer above sum to 1.
The initial logits b_ij are set to 0, so the coupling coefficients from capsule i to all capsules in the next layer start out equal. All received inputs û_{j|i} are then weighted by the coupling coefficients c_ij and summed to obtain s_j, the nonlinear squashing function is applied to s_j to obtain v_j, and b_ij and c_ij are then updated (in the standard dynamic routing algorithm, b_ij ← b_ij + û_{j|i} · v_j and c_ij = softmax_j(b_ij)).
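A minimal NumPy sketch of the squashing function and one dynamic-routing step described above (the layer sizes are arbitrary; this illustrates the equations, not the patent's actual implementation):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||): direction kept,
    norm compressed into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def routing_iteration(u_hat, b):
    """One dynamic-routing step.
    u_hat: prediction vectors, shape (n_lower, n_upper, dim)
    b:     routing logits,     shape (n_lower, n_upper)
    """
    # softmax over upper capsules: each lower capsule's couplings sum to 1
    c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
    s = np.einsum("ij,ijd->jd", c, u_hat)          # weighted sum of predictions
    v = squash(s)                                  # squashed output per upper capsule
    b_new = b + np.einsum("ijd,jd->ij", u_hat, v)  # agreement update
    return v, b_new

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 3, 8))  # 6 lower capsules predict for 3 upper, 8-D each
b = np.zeros((6, 3))                # zero logits: equal initial coupling
v, b = routing_iteration(u_hat, b)
```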
Each capsule unit in a traditional capsule layer is a convolution unit; therefore, each capsule outputs a series of vectors rather than a single vector.
The coupling coefficients determine how information flows between pairs of capsules. For a classification task with K classes, the last layer of the capsule network can be designed to have K capsules, each representing one class. Because the length of the output vector represents whether an object exists, in the last layer the length of each capsule can be regarded as the probability that the input belongs to class k.
Specifically, in the embodiments of the present invention, the mood corresponding to a vagitus signal can be divided into three classes: positive, irritated, and crying.
To obtain the trained mood detection model provided by the embodiment of the present invention, one possible implementation is to train the mood detection model by the following steps:
S11: obtain the reference spectrogram and the reference mood corresponding to a reference vagitus signal.
It should be understood that the reference spectrograms and reference moods provided by the embodiment of the present invention are used to train the mood detection model; therefore, each reference spectrogram correctly corresponds to one reference mood.
S12: input the reference spectrogram into the capsule network.
S13: compute the difference between the output of the capsule network and the reference mood using a cross-entropy loss function, so as to optimize the parameters of the capsule network.
Cross-entropy measures the distance between the actual output probability distribution and the desired output probability distribution; that is, the smaller the cross-entropy value, the closer the two distributions are.
When the actual output probabilities match the desired output probabilities in each test, the capsule network has completed its parameter optimization.
S14: determine the trained mood detection model based on the optimized parameters of the capsule network.
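The cross-entropy comparison in step S13 can be sketched as follows. The three mood classes follow the description above; turning the sobbing-capsule vector lengths into probabilities with a softmax is an assumption about how the comparison is wired up, not something the patent spells out.

```python
import numpy as np

MOODS = ["positive", "irritated", "crying"]

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cross_entropy(capsule_lengths, target_index):
    """Distance between the network's output distribution (softmax over the
    sobbing-capsule vector lengths) and the one-hot reference mood; smaller
    means the two distributions are closer."""
    probs = softmax(np.asarray(capsule_lengths, dtype=float))
    return -np.log(probs[target_index])

# A confident, correct output yields a lower loss than a uniform one.
confident = cross_entropy([0.1, 0.2, 5.0], MOODS.index("crying"))
uniform = cross_entropy([1.0, 1.0, 1.0], MOODS.index("crying"))
```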
In conclusion a kind of mood detection method based on vagitus that the embodiment of the present invention is proposed.It obtains to be checked
The vagitus signal of survey, to vagitus signal carry out adding window, respectively to the vagitus window signal in each time window into
Row Fast Fourier Transform (FFT), to generate the corresponding spectrogram of vagitus signal.Spectrogram is inputted into trained mood detection
In model, to determine the corresponding mood of vagitus signal.Wherein, mood detection model includes capsule network.Hereby it is achieved that
According to vagitus signal to be detected, corresponding spectrogram is generated.Mould is detected using the mood generated by capsule network training
Type carries out mood detection to spectrogram, and then determines the corresponding mood of vagitus signal.
To clearly illustrate the mood detection method based on vagitus provided by the embodiment of the present invention, an example is given below.
As shown in Fig. 3, a Fast Fourier Transform is first applied to the reference vagitus signal to obtain the reference spectrogram corresponding to the reference vagitus signal, and the parameters of the capsule network are trained with the reference spectrogram to obtain the mood detection model. An environment voice signal containing a vagitus signal is obtained, high-pass filtering may be applied to it to filter out environmental noise, and the filtered voice signal is then detected with a silence detection algorithm to obtain the vagitus signal on which mood detection is to be performed. A Fast Fourier Transform is applied to the vagitus signal to obtain the corresponding spectrogram, and the corresponding mood is detected with the mood detection model.
To realize the above embodiments, an embodiment of the present invention also proposes a mood detection device based on vagitus. Fig. 4 is a structural schematic diagram of a mood detection device based on vagitus provided by an embodiment of the present invention. The device is used to execute the above mood detection method. As shown in Fig. 4, the device includes: a first obtaining module 210, a windowing module 220, a conversion module 230, and a first input module 240.
The first obtaining module 210 is used to obtain the vagitus signal to be detected.
The windowing module 220 is used to apply windowing to the vagitus signal.
The conversion module 230 is used to perform a Fast Fourier Transform on the vagitus window signal in each time window to generate the spectrogram corresponding to the vagitus signal.
The first input module 240 is used to input the spectrogram into the trained mood detection model to determine the mood corresponding to the vagitus signal, wherein the mood detection model includes a capsule network.
Further, in order to acquire the vagitus signal from a noisy environment voice signal, in one possible implementation the first acquisition module 210 includes: an acquisition submodule 211, configured to acquire an environment voice signal; a filtering submodule 212, configured to apply high-pass filtering to the environment voice signal to filter out low-frequency voice signals; and a detection submodule 213, configured to detect the filtered environment voice signal using a silence detection algorithm to obtain the vagitus signal.
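As a rough illustration of the high-pass filtering of the environment voice signal, the sketch below uses a first-order pre-emphasis filter. The filter order and coefficient are assumptions, since the patent does not specify a cutoff frequency or filter design.

```python
import numpy as np

def highpass(signal, alpha=0.97):
    """First-order high-pass (pre-emphasis) filter:
    y[n] = x[n] - alpha * x[n-1].
    A minimal stand-in for the patent's high-pass stage; the
    coefficient 0.97 is an illustrative assumption."""
    signal = np.asarray(signal, dtype=float)
    out = np.empty_like(signal)
    out[0] = signal[0]
    out[1:] = signal[1:] - alpha * signal[:-1]
    return out

# A constant (0 Hz) signal is strongly attenuated,
# while the highest-frequency alternation is preserved (amplified)
dc = np.ones(1000)
nyq = (-1.0) ** np.arange(1000)  # alternating +1/-1 samples
print(highpass(dc)[1:].max())    # about 1 - alpha = 0.03
```

In practice a designed IIR or FIR high-pass filter (for example a Butterworth design) would likely be used; the pre-emphasis form is chosen here only because it is self-contained.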
Further, in order to detect the filtered environment voice signal and thereby obtain the vagitus signal, in one possible implementation the detection submodule 213 includes: an acquiring unit 2131, configured to acquire one frame of the filtered environment voice signal as an environment voice frame signal; a computing unit 2132, configured to calculate the spectral flatness and short-time energy value corresponding to the environment voice frame signal; a judging unit 2133, configured to judge whether the spectral flatness is less than a first preset threshold and whether the short-time energy value is less than a second preset threshold; and a setting unit 2134, configured to, when the output of the judging unit is affirmative, take the environment voice frame signal as a vagitus frame signal corresponding to the vagitus signal.
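The two per-frame quantities, spectral flatness and short-time energy, can be computed as sketched below. The threshold values, and the convention that a cry-like frame is tonal (low flatness) and loud (high energy), are assumptions of this sketch; the patent only states that both quantities are compared against preset thresholds.

```python
import numpy as np

def frame_features(frame, eps=1e-10):
    """Spectral flatness (geometric mean / arithmetic mean of the
    power spectrum) and short-time energy for one audio frame."""
    power = np.abs(np.fft.rfft(frame)) ** 2 + eps
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)
    energy = np.mean(frame ** 2)
    return flatness, energy

def is_cry_frame(frame, flat_thresh=0.3, energy_thresh=0.01):
    """Frame decision: compare spectral flatness and short-time
    energy against preset thresholds (illustrative values, not
    taken from the patent)."""
    flatness, energy = frame_features(frame)
    return bool(flatness < flat_thresh and energy > energy_thresh)

rng = np.random.default_rng(0)
t = np.arange(1024) / 16000.0
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # harmonic, cry-like frame
noise = rng.standard_normal(1024)           # broadband background noise
print(frame_features(tone)[0] < frame_features(noise)[0])  # True
```

A frame that passes the decision would be kept as a vagitus frame; the kept frames together form the vagitus signal passed to the spectrogram stage.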
Further, in order to obtain the trained mood detection model provided by the embodiment of the present invention, in one possible implementation the device further includes: a second acquisition module 250, configured to acquire a reference spectrogram and a reference mood corresponding to a reference vagitus signal; a second input module 260, configured to input the reference spectrogram into the capsule network; a computing module 270, configured to calculate, using a cross-entropy loss function, the difference between the output of the capsule network and the reference mood, so as to optimize the parameters of the capsule network; and a determining module 280, configured to determine the trained mood detection model based on the optimized parameters of the capsule network.
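The cross-entropy loss and the parameter-optimization step can be illustrated with the sketch below. A full capsule network with dynamic routing is beyond a short example, so a single linear layer over a flattened spectrogram stands in for the capsule network here; the input sizes, learning rate, and iteration count are assumptions, not values from the patent.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(probs, target):
    """Difference between the network output (class probabilities)
    and the one-hot reference mood, as a cross-entropy loss."""
    return float(-np.sum(target * np.log(probs + 1e-10)))

# Stand-in for the capsule network: one linear layer over a
# flattened reference spectrogram. Only the loss computation and
# the gradient update are meant to mirror the training step.
rng = np.random.default_rng(0)
spec = rng.random(257)               # flattened reference spectrogram
target = np.array([0.0, 1.0, 0.0])   # one-hot reference mood
W = rng.normal(scale=0.1, size=(3, 257))

losses = []
for _ in range(100):
    probs = softmax(W @ spec)
    losses.append(cross_entropy(probs, target))
    grad = np.outer(probs - target, spec)  # dL/dW for softmax + CE
    W -= 0.01 * grad                       # gradient-descent update

print(losses[0] > losses[-1])  # True: the loss decreases as W fits
```

In the patent's setting, the optimized parameters obtained from such updates define the trained mood detection model.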
It should be noted that the foregoing explanation of the embodiments of the vagitus-based mood detection method also applies to the vagitus-based mood detection device of this embodiment, and details are not repeated here.
In conclusion a kind of mood detection device based on vagitus that the embodiment of the present invention is proposed.It obtains to be checked
The vagitus signal of survey, to vagitus signal carry out adding window, respectively to the vagitus window signal in each time window into
Row Fast Fourier Transform (FFT), to generate the corresponding spectrogram of vagitus signal.Spectrogram is inputted into trained mood detection
In model, to determine the corresponding mood of vagitus signal.Wherein, mood detection model includes capsule network.Hereby it is achieved that
According to vagitus signal to be detected, corresponding spectrogram is generated.Mould is detected using the mood generated by capsule network training
Type carries out mood detection to spectrogram, and then determines the corresponding mood of vagitus signal.
To implement the above embodiments, an embodiment of the present invention further provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the vagitus-based mood detection method of the foregoing method embodiments.
Fig. 5 is a schematic diagram of a computer device according to an embodiment of the present invention. As shown in Fig. 5, the computer device 50 of this embodiment includes a processor 51, a memory 52, and a computer program 53 stored in the memory 52 and executable on the processor 51. When the computer program 53 is executed by the processor 51, the vagitus-based mood detection method of the embodiments is implemented; to avoid repetition, the details are not repeated here. Alternatively, when the computer program is executed by the processor 51, the functions of the modules/units of the vagitus-based mood detection device of the embodiments are implemented; to avoid repetition, the details are not repeated here.
The computer device 50 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device may include, but is not limited to, the processor 51 and the memory 52. Those skilled in the art will understand that Fig. 5 is merely an example of the computer device 50 and does not constitute a limitation on the computer device 50; the computer device may include more or fewer components than illustrated, combine certain components, or use different components. For example, the computer device may further include input/output devices, a network access device, a bus, and the like.
The processor 51 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 52 may be an internal storage unit of the computer device 50, such as a hard disk or memory of the computer device 50. The memory 52 may also be an external storage device of the computer device 50, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) provided on the computer device 50. Further, the memory 52 may include both an internal storage unit and an external storage device of the computer device 50. The memory 52 is used to store the computer program and other programs and data required by the computer device. The memory 52 may also be used to temporarily store data that has been output or is to be output.
To implement the above embodiments, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the vagitus-based mood detection method of the foregoing method embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the systems, devices, and units described above, and details are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a division by logical function, and there may be other division manners in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (Processor) to execute some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (10)
1. A mood detection method based on vagitus, characterized in that the method includes:
acquiring a vagitus signal to be detected;
applying windowing to the vagitus signal;
performing a fast Fourier transform on the vagitus window signal in each time window, so as to generate a spectrogram corresponding to the vagitus signal; and
inputting the spectrogram into a trained mood detection model to determine a mood corresponding to the vagitus signal;
wherein the mood detection model includes a capsule network.
2. The method according to claim 1, characterized in that acquiring the vagitus signal to be detected includes:
acquiring an environment voice signal;
applying high-pass filtering to the environment voice signal to filter out low-frequency voice signals; and
detecting the filtered environment voice signal using a silence detection algorithm to obtain the vagitus signal.
3. The method according to claim 2, characterized in that detecting the filtered environment voice signal using the silence detection algorithm includes:
acquiring one frame of the filtered environment voice signal as an environment voice frame signal;
calculating a spectral flatness and a short-time energy value corresponding to the environment voice frame signal;
judging whether the spectral flatness is less than a first preset threshold and whether the short-time energy value is less than a second preset threshold; and
if so, taking the environment voice frame signal as a vagitus frame signal corresponding to the vagitus signal.
4. The method according to any one of claims 1 to 3, characterized in that the trained mood detection model is trained through the following steps:
acquiring a reference spectrogram and a reference mood corresponding to a reference vagitus signal;
inputting the reference spectrogram into the capsule network;
calculating, using a cross-entropy loss function, a difference between an output of the capsule network and the reference mood, so as to optimize parameters of the capsule network; and
determining the trained mood detection model based on the optimized parameters of the capsule network.
5. A mood detection device based on vagitus, characterized in that the device includes:
a first acquisition module, configured to acquire a vagitus signal to be detected;
a windowing module, configured to apply windowing to the vagitus signal;
a conversion module, configured to perform a fast Fourier transform on the vagitus window signal in each time window, so as to generate a spectrogram corresponding to the vagitus signal; and
a first input module, configured to input the spectrogram into a trained mood detection model to determine a mood corresponding to the vagitus signal; wherein the mood detection model includes a capsule network.
6. The device according to claim 5, characterized in that the first acquisition module includes:
an acquisition submodule, configured to acquire an environment voice signal;
a filtering submodule, configured to apply high-pass filtering to the environment voice signal to filter out low-frequency voice signals; and
a detection submodule, configured to detect the filtered environment voice signal using a silence detection algorithm to obtain the vagitus signal.
7. The device according to claim 6, characterized in that the detection submodule includes:
an acquiring unit, configured to acquire one frame of the filtered environment voice signal as an environment voice frame signal;
a computing unit, configured to calculate a spectral flatness and a short-time energy value corresponding to the environment voice frame signal;
a judging unit, configured to judge whether the spectral flatness is less than a first preset threshold and whether the short-time energy value is less than a second preset threshold; and
a setting unit, configured to, when the output of the judging unit is affirmative, take the environment voice frame signal as a vagitus frame signal corresponding to the vagitus signal.
8. The device according to any one of claims 5 to 7, characterized in that the device further includes:
a second acquisition module, configured to acquire a reference spectrogram and a reference mood corresponding to a reference vagitus signal;
a second input module, configured to input the reference spectrogram into the capsule network;
a computing module, configured to calculate, using a cross-entropy loss function, a difference between an output of the capsule network and the reference mood, so as to optimize parameters of the capsule network; and
a determining module, configured to determine the trained mood detection model based on the optimized parameters of the capsule network.
9. A computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the mood detection method based on vagitus according to any one of claims 1 to 4.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the mood detection method based on vagitus according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910571836.XA CN110390942A (en) | 2019-06-28 | 2019-06-28 | Mood detection method and its device based on vagitus |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110390942A (en) | 2019-10-29
Family
ID=68285966
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910571836.XA Pending CN110390942A (en) | 2019-06-28 | 2019-06-28 | Mood detection method and its device based on vagitus |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110390942A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111105812A (en) * | 2019-12-31 | 2020-05-05 | 普联国际有限公司 | Audio feature extraction method and device, training method and electronic equipment |
CN111261173A (en) * | 2020-01-10 | 2020-06-09 | 珠海格力电器股份有限公司 | Electric appliance control method and device, storage medium and electric appliance |
CN111967361A (en) * | 2020-08-07 | 2020-11-20 | 盐城工学院 | Emotion detection method based on baby expression recognition and crying |
CN112057089A (en) * | 2020-08-31 | 2020-12-11 | 五邑大学 | Emotion recognition method, emotion recognition device and storage medium |
CN112599134A (en) * | 2020-12-02 | 2021-04-02 | 国网安徽省电力有限公司 | Transformer sound event detection method based on voiceprint recognition |
WO2021143411A1 (en) * | 2020-01-17 | 2021-07-22 | 海信视像科技股份有限公司 | Ambient sound output apparatus, system, method, and nonvolatile storage medium |
WO2021147157A1 (en) * | 2020-01-20 | 2021-07-29 | 网易(杭州)网络有限公司 | Game special effect generation method and apparatus, and storage medium and electronic device |
CN114863950A (en) * | 2022-07-07 | 2022-08-05 | 深圳神目信息技术有限公司 | Baby crying detection and network establishment method and system based on anomaly detection |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020135618A1 (en) * | 2001-02-05 | 2002-09-26 | International Business Machines Corporation | System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input |
US20050108004A1 (en) * | 2003-03-11 | 2005-05-19 | Takeshi Otani | Voice activity detector based on spectral flatness of input signal |
CN105976831A (en) * | 2016-05-13 | 2016-09-28 | 中国人民解放军国防科学技术大学 | Lost child detection method based on cry recognition |
CN109410917A (en) * | 2018-09-26 | 2019-03-01 | 河海大学常州校区 | Voice data classification method based on modified capsule network |
2019-06-28: Application CN201910571836.XA filed (CN); status: Pending
Non-Patent Citations (1)
Title |
---|
M. A. Tuğtekin Turan et al.: "Monitoring Infant's Emotional Cry in Domestic Environments", ISCA-Speech *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110390942A (en) | Mood detection method and its device based on vagitus | |
CN110600017B (en) | Training method of voice processing model, voice recognition method, system and device | |
US10621971B2 (en) | Method and device for extracting speech feature based on artificial intelligence | |
KR102213013B1 (en) | Frequency-based audio analysis using neural networks | |
CN111814574B (en) | Face living body detection system, terminal and storage medium applying double-branch three-dimensional convolution model | |
CN110600059B (en) | Acoustic event detection method and device, electronic equipment and storage medium | |
CN110444202B (en) | Composite voice recognition method, device, equipment and computer readable storage medium | |
CN110082135A (en) | Equipment fault recognition methods, device and terminal device | |
CN109658943B (en) | Audio noise detection method and device, storage medium and mobile terminal | |
CN111798828B (en) | Synthetic audio detection method, system, mobile terminal and storage medium | |
CN112949708A (en) | Emotion recognition method and device, computer equipment and storage medium | |
CN113205820B (en) | Method for generating voice coder for voice event detection | |
CN111357051A (en) | Speech emotion recognition method, intelligent device and computer readable storage medium | |
CN110648669B (en) | Multi-frequency shunt voiceprint recognition method, device and system and computer readable storage medium | |
CN114863938A (en) | Bird language identification method and system based on attention residual error and feature fusion | |
CN114155875B (en) | Method and device for identifying voice scene tampering, electronic equipment and storage medium | |
CN111462755A (en) | Information prompting method and device, electronic equipment and medium | |
CN113763966A (en) | End-to-end text-independent voiceprint recognition method and system | |
Chinmayi et al. | Emotion Classification Using Deep Learning | |
CN115328661B (en) | Computing power balance execution method and chip based on voice and image characteristics | |
CN117496990A (en) | Speech denoising method, device, computer equipment and storage medium | |
CN111639537A (en) | Face action unit identification method and device, electronic equipment and storage medium | |
US20220408201A1 (en) | Method and system of audio processing using cochlear-simulating spike data | |
CN115565548A (en) | Abnormal sound detection method, abnormal sound detection device, storage medium and electronic equipment | |
CN114548262A (en) | Feature level fusion method for multi-modal physiological signals in emotion calculation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191029 |