CN110390942A - Emotion detection method and device based on infant crying - Google Patents


Info

Publication number
CN110390942A
CN110390942A (application CN201910571836.XA)
Authority
CN
China
Prior art keywords
infant cry
signal
emotion
environment speech
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910571836.XA
Other languages
Chinese (zh)
Inventor
刘博卿
王健宗
贾雪丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910571836.XA
Publication of CN110390942A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 - Speech or audio signals analysis-synthesis techniques using spectral analysis, e.g. transform vocoders or subband vocoders
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 - Speech or audio signals analysis-synthesis techniques using predictive techniques
    • G10L19/26 - Pre-filtering or post-filtering
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/45 - Speech or voice analysis techniques characterised by the type of analysis window
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques specially adapted for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses an emotion detection method and device based on infant crying, relating to the field of big data technology. The emotion detection method includes: obtaining the infant cry signal to be detected and applying windowing to it; performing a fast Fourier transform (FFT) on the windowed cry signal in each time window to generate the spectrogram corresponding to the cry signal; and inputting the spectrogram into a trained emotion detection model to determine the emotion corresponding to the cry signal. The cry signal is thereby converted into the corresponding spectrogram image via the Fourier transform, and emotion recognition is performed on the spectrogram with a capsule network, improving recognition accuracy for infant cries. The technical solution provided by the embodiments of the present invention addresses the prior-art problem of low recognition accuracy for infant cries.

Description

Emotion detection method and device based on infant crying
[Technical field]
The present invention relates to the field of big data technology, and in particular to an emotion detection method and device based on infant crying.
[Background]
In the related art, a speech signal is input directly into a voice recognition model, which outputs the corresponding emotion recognition result, thereby realizing emotion recognition on the speech signal. The voice recognition model is a neural network trained in advance.
For a special kind of speech signal such as an infant cry, which carries no specific speech content, an existing voice recognition model suffers from low recognition accuracy.
[Summary of the invention]
In view of this, embodiments of the present invention provide an emotion detection method and device based on infant crying, to solve the prior-art problem of low emotion recognition accuracy for infant cries.
In one aspect, an embodiment of the present invention provides an emotion detection method based on infant crying. The method includes: obtaining the infant cry signal to be detected; applying windowing to the cry signal; performing a fast Fourier transform on the windowed cry signal in each time window to generate the spectrogram corresponding to the cry signal; and inputting the spectrogram into a trained emotion detection model to determine the emotion corresponding to the cry signal, where the emotion detection model includes a capsule network.
Further, obtaining the infant cry signal to be detected includes: obtaining an environment speech signal; applying high-pass filtering to the environment speech signal to filter out low-frequency speech; and detecting the filtered environment speech signal with a silence detection algorithm to obtain the infant cry signal.
Further, detecting the filtered environment speech signal with the silence detection algorithm includes: taking one frame of the filtered environment speech signal as an environment speech frame; computing the spectral flatness and short-time energy of the frame; judging whether the spectral flatness is below a first preset threshold and the short-time energy is below a second preset threshold; and, if so, taking the frame as a cry frame of the infant cry signal.
Further, the trained emotion detection model is obtained through the following training steps: obtaining the reference spectrogram and reference emotion corresponding to a reference infant cry signal; inputting the reference spectrogram into the capsule network; computing the difference between the output of the capsule network and the reference emotion with a cross-entropy loss function, to optimize the parameters of the capsule network; and determining the trained emotion detection model based on the optimized parameters of the capsule network.
In another aspect, an embodiment of the present invention provides an emotion detection device based on infant crying. The device includes: a first obtaining module for obtaining the infant cry signal to be detected; a windowing module for applying windowing to the cry signal; a transform module for performing a fast Fourier transform on the windowed cry signal in each time window to generate the spectrogram corresponding to the cry signal; and a first input module for inputting the spectrogram into a trained emotion detection model to determine the emotion corresponding to the cry signal, where the emotion detection model includes a capsule network.
Further, the first obtaining module includes: an obtaining submodule for obtaining an environment speech signal; a filtering submodule for applying high-pass filtering to the environment speech signal to filter out low-frequency speech; and a detection submodule for detecting the filtered environment speech signal with a silence detection algorithm to obtain the infant cry signal.
Further, the detection submodule includes: an obtaining unit for taking one frame of the filtered environment speech signal as an environment speech frame; a computing unit for computing the spectral flatness and short-time energy of the frame; a judging unit for judging whether the spectral flatness is below a first preset threshold and the short-time energy is below a second preset threshold; and a setting unit for taking the frame as a cry frame of the infant cry signal when the judging unit outputs an affirmative signal.
Further, the device also includes: a second obtaining module for obtaining the reference spectrogram and reference emotion corresponding to a reference infant cry signal; a second input module for inputting the reference spectrogram into the capsule network; a computing module for computing the difference between the output of the capsule network and the reference emotion with a cross-entropy loss function, to optimize the parameters of the capsule network; and a determining module for determining the trained emotion detection model based on the optimized parameters of the capsule network.
In another aspect, an embodiment of the present invention provides a computer device including a memory and a processor. The memory stores information including program instructions, and the processor controls the execution of those instructions; when loaded and executed by the processor, the program instructions carry out the steps of the above emotion detection method based on infant crying.
In another aspect, an embodiment of the present invention provides a storage medium containing a stored program; when the program runs, the device on which the storage medium resides is controlled to execute the steps of the above emotion detection method based on infant crying.
In the embodiments of the present invention, the infant cry signal is converted into the corresponding spectrogram image via the Fourier transform, and emotion recognition is performed on the spectrogram with a capsule network, which solves the prior-art problem of low recognition accuracy for infant cries and achieves the effect of improving that accuracy.
[Description of the drawings]
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of an emotion detection method based on infant crying according to an embodiment of the present invention;
Fig. 2 is a structural diagram of the capsule network according to an embodiment of the present invention;
Fig. 3 is an example flowchart of the emotion detection method based on infant crying according to an embodiment of the present invention;
Fig. 4 is a structural diagram of an emotion detection device based on infant crying according to an embodiment of the present invention; and
Fig. 5 is a diagram of a computer device according to an embodiment of the present invention.
[Detailed description]
For a better understanding of the technical solution of the present invention, the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
It should be understood that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The terms used in the embodiments of the present invention are only for describing particular embodiments and are not intended to limit the present invention. The singular forms "a", "said", and "the" used in the embodiments of the present invention and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" used herein merely describes an association between related objects and indicates that three relationships may exist; for example, "A and/or B" covers three cases: A alone, both A and B, and B alone. The character "/" herein generally indicates an "or" relationship between the preceding and following objects.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present invention to describe preset ranges, the preset ranges are not limited by these terms, which are only used to distinguish one preset range from another. For example, without departing from the scope of the embodiments of the present invention, a first preset range could also be called a second preset range, and similarly a second preset range could be called a first preset range.
Depending on the context, the word "if" as used herein may be interpreted as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
As described in the background above, in the related art a special kind of speech signal such as an infant cry carries no specific speech content, so an existing voice recognition model suffers from low recognition accuracy.
To address this problem, embodiments of the present invention provide an emotion detection method based on infant crying: the infant cry signal is converted into the corresponding spectrogram image via the Fourier transform, and emotion recognition is performed on the spectrogram with a capsule network, improving recognition accuracy for infant cries.
Fig. 1 is a flow diagram of an emotion detection method based on infant crying according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step S101: obtain the infant cry signal to be detected.
It should be noted that the emotion detection method proposed by the embodiment of the present invention can extract the infant cry signal from a noisy environment speech signal and then perform emotion detection on the cry signal.
To obtain the cry signal from a noisy environment speech signal, one possible implementation is to obtain the environment speech signal, apply high-pass filtering to it to filter out low-frequency speech, and then detect the filtered environment speech signal with a silence detection algorithm to obtain the cry signal.
It should be noted that the environment speech signal may contain the infant cry signal together with human speech, environmental noise, and so on. The cry signal has a somewhat higher frequency than the other speech signals, so the others can be filtered out by high-pass filtering.
In addition, the infant cry signal is not one continuous waveform but a sequence of pulses, each of different duration. Therefore, the filtered environment speech signal also needs to be examined with a silence detection algorithm to obtain the cry signal.
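The high-pass filtering step above can be sketched with a simple FFT-based filter. This is only a sketch under assumed parameters: the patent does not specify a cutoff frequency, so the 350 Hz value below is hypothetical, chosen because much adult-speech and hum energy sits below it while infant cries are pitched higher.

```python
import numpy as np

def highpass(signal, sample_rate, cutoff_hz=350.0):
    """Remove components below cutoff_hz by zeroing low-frequency FFT bins.

    A minimal brick-wall sketch; a real implementation (e.g. a Butterworth
    filter) would avoid the ringing this approach can introduce.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs < cutoff_hz] = 0.0            # zero out the low band
    return np.fft.irfft(spectrum, n=len(signal))

# Example: a 100 Hz hum plus a 500 Hz tone; filtering keeps only the tone.
sr = 8000
t = np.arange(sr) / sr
mixed = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 500 * t)
clean = highpass(mixed, sr)
```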
One possible implementation is to take one frame of the filtered environment speech signal as an environment speech frame and compute the spectral flatness and short-time energy of that frame. If the spectral flatness is below a first preset threshold and the short-time energy is below a second preset threshold, the frame is taken as a cry frame of the infant cry signal.
Here the short-time energy is the weighted sum of the squared sample values of one speech frame, and the spectral flatness measures the relative variation of the frame's power across the frequency domain.
The spectral flatness is computed as follows:
SF_dB = 10 · log10(G_s / A_s)
where SF_dB is the spectral flatness in decibels, and G_s and A_s are respectively the geometric mean and the arithmetic mean of the power spectrum.
It should be understood that if the spectral flatness of an environment speech frame is below the first preset threshold and its short-time energy is below the second preset threshold, the frame is a cry frame.
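The two frame-level features can be computed directly from the definitions above; a minimal sketch (threshold values are left out, since the patent does not specify the two preset thresholds):

```python
import numpy as np

def spectral_flatness_db(frame):
    """10*log10(geometric mean / arithmetic mean) of the power spectrum."""
    power = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12   # floor avoids log(0)
    geometric = np.exp(np.mean(np.log(power)))
    arithmetic = np.mean(power)
    return 10.0 * np.log10(geometric / arithmetic)

def short_time_energy(frame):
    """Sum of squared sample values of one frame (uniform weights)."""
    return float(np.sum(frame ** 2))

# A pure tone concentrates power in one bin (very negative flatness),
# while white noise has a nearly flat spectrum (flatness close to 0 dB).
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 32 * np.arange(512) / 512)  # exactly periodic
noise = rng.standard_normal(512)
```

A frame would then be flagged as a cry frame when both features fall below their respective thresholds, as the method describes.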
Further, the infant cry signal can be obtained by splicing together multiple cry frames.
Step S102: apply windowing to the infant cry signal.
So that the generated spectrogram has sufficient resolution in time, one possible implementation is to place a time window over every 15 frames of the cry signal to obtain a windowed cry signal, with two adjacent windows overlapping by 50%.
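The windowing scheme above (one time window per 15 frames, adjacent windows overlapping 50%) can be sketched as follows. The frame length of 160 samples is a hypothetical value, since the patent does not state it:

```python
import numpy as np

def make_windows(signal, frame_len=160, frames_per_window=15):
    """Split the cry signal into windows of 15 frames each, hopping by
    half a window so that adjacent windows overlap 50%."""
    win_len = frame_len * frames_per_window      # samples per time window
    hop = win_len // 2                           # 50% overlap
    windows = []
    start = 0
    while start + win_len <= len(signal):
        windows.append(signal[start:start + win_len])
        start += hop
    return np.array(windows)

sig = np.arange(9600, dtype=float)               # dummy cry signal
wins = make_windows(sig)                         # 7 windows of 2400 samples
```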
Step S103: perform a fast Fourier transform on the windowed cry signal in each time window to generate the spectrogram corresponding to the cry signal.
Here the size of the fast Fourier transform is 256.
It should be understood that each windowed cry signal corresponds to one spectrogram, and if two adjacent windows overlap by 50%, the two corresponding spectrograms also overlap accordingly.
Since the cry signal corresponds to multiple windows, the cry signal corresponds to multiple spectrograms.
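The spectrogram of one time window can be sketched as below. This is a sketch under assumptions: the patent only fixes the FFT size at 256, so the non-overlapping sub-chunks within a window and the log-magnitude scale are choices of ours, not the patent's:

```python
import numpy as np

def spectrogram(window, n_fft=256):
    """Log-magnitude spectrogram of one windowed cry signal: split the
    window into n_fft-sample chunks and take a 256-point FFT of each."""
    n_chunks = len(window) // n_fft
    spec = []
    for i in range(n_chunks):
        chunk = window[i * n_fft:(i + 1) * n_fft]
        mag = np.abs(np.fft.rfft(chunk))         # 256-point FFT -> 129 bins
        spec.append(np.log10(mag + 1e-10))       # log scale for dynamic range
    return np.array(spec).T                      # (freq_bins, time_chunks)

# A 1000 Hz tone at 8 kHz sampling lands exactly in FFT bin 32.
win = np.sin(2 * np.pi * 1000 * np.arange(2400) / 8000)
img = spectrogram(win)
```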
Step S104: input the spectrogram into a trained emotion detection model to determine the emotion corresponding to the infant cry signal.
Here the emotion detection model includes a capsule network.
It should be noted that the capsule network proposed by the embodiment of the present invention is composed of filters that can detect objects, specifically by aligning the extracted features on a linear subspace.
As shown in Fig. 2, the capsule network includes a convolutional ReLU layer, a primary capsule layer, and a cry capsule layer.
The main function of the convolutional ReLU layer is to extract low-level features of the input image. It contains 128 9x9 convolution kernels with stride 2, and its activation function is the ReLU linear rectification function. This layer converts pixel intensities into local feature detectors whose activations serve as the input to the capsules. The convolutional ReLU layer is the lowest level of multi-dimensional entities; activating its capsules corresponds to inverting the rendering process.
The primary capsule layer is a convolutional capsule layer with 32 channels. Each capsule contains 8 convolutional units, i.e. each capsule is 8-dimensional, and each unit is a 9x9 convolution kernel with stride 2. The output of each capsule can see the outputs of all 128x28x28 convolutional units whose receptive fields overlap the center of the capsule. In total, the primary capsule layer has [32, 10, 10] capsule outputs, each an 8-dimensional vector, and the capsules within each [6, 6] grid share weights.
The cry capsule layer corresponds to the different emotion classes: each class has one 16-dimensional capsule, the vector length of each capsule in this layer represents whether the corresponding class is present, and the vector lengths are also used to compute the cross-entropy loss function. The output of the cry capsule layer is fed to a softmax to determine the emotion corresponding to the cry signal.
It is worth emphasizing that a capsule is a group of neurons that learns to recognize whether a certain object appears in an image and encodes its properties into a vector, where the length (modulus) of the vector represents whether the object is present. For example, each capsule may learn to recognize a specific object, or part of an object, in the image. Within a neural network, many capsules can be grouped into a capsule layer in which each unit outputs a vector rather than the traditional scalar activation.
The length of a capsule's output vector represents the probability that the object represented by the capsule appears in the current input. A nonlinear function is used to compress the length of any output vector into [0, 1]; this nonlinear "squashing" function is:
v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||)
where v_j is the output vector of capsule j. That is, capsule j applies a nonlinear squashing activation to its total input vector s_j and outputs v_j: the direction of s_j is preserved, and only its length is compressed to between 0 and 1. The components of v_j represent the different properties of the object (such as position, size, and texture), and the length of v_j represents whether the object is present.
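The squashing nonlinearity can be sketched directly in code; this follows the standard capsule-network formulation and is our illustration, not code from the patent:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """v = (|s|^2 / (1 + |s|^2)) * (s / |s|): keep the direction of s,
    compress its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * s

v = squash(np.array([3.0, 4.0]))   # |s| = 5, so |v| = 25/26, direction kept
```

Long vectors are squashed to length just under 1, short vectors to nearly 0, which is exactly what lets a vector's length act as a presence probability.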
The total input s_j is a weighted sum over all the prediction vectors û_(j|i), each of which is obtained by multiplying the output u_i of a capsule i in the layer below by a weight matrix W_ij:
û_(j|i) = W_ij · u_i,  s_j = Σ_i c_ij · û_(j|i)
where the coupling coefficients c_ij are updated by the dynamic routing algorithm, and the coupling coefficients between capsule i and all the capsules in the layer above sum to 1.
The routing logits b_ij are initialized to 0, so the coupling coefficients from capsule i to all capsules in the next layer start out equal, with c_ij = exp(b_ij) / Σ_k exp(b_ik). All received inputs û_(j|i) are weighted by the coupling coefficients c_ij and summed to obtain s_j, the nonlinear squashing function is applied to s_j to obtain v_j, and the logits are then updated by b_ij ← b_ij + û_(j|i) · v_j before the coupling coefficients c_ij are recomputed.
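The routing update just described can be sketched as a small routing-by-agreement loop between two capsule layers. The dimensions are made up for illustration, and the 3 routing iterations follow common practice, which the patent does not specify:

```python
import numpy as np

def softmax(b, axis):
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def squash(s, axis=-1):
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1 + sq)) * s / np.sqrt(sq + 1e-9)

def route(u_hat, n_iter=3):
    """u_hat: (n_lower, n_upper, dim) prediction vectors u_hat_(j|i).
    Returns the upper-layer outputs v_j and couplings c after routing."""
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))            # routing logits, start at 0
    for _ in range(n_iter):
        c = softmax(b, axis=1)                  # coupling coefficients c_ij
        s = np.einsum("ij,ijd->jd", c, u_hat)   # s_j = sum_i c_ij u_hat_(j|i)
        v = squash(s)                           # v_j = squash(s_j)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # agreement update
    return v, c

rng = np.random.default_rng(1)
u_hat = rng.standard_normal((6, 3, 8))          # 6 lower caps, 3 upper, 8-D
v, c = route(u_hat)
```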
Each capsule unit in a traditional capsule layer is a convolutional unit; each capsule therefore outputs a series of vectors rather than a single vector.
The coupling coefficients determine how information flows between pairs of capsules. For a classification task with K classes, the last layer of the capsule network can be designed with K capsules, one per class. Because the length of an output vector represents whether an object is present, in the last layer the length of each capsule can be regarded as the probability that the input belongs to class k.
Specifically, in the embodiments of the present invention, the emotion corresponding to the infant cry signal is divided into three classes: positive, fussy, and crying.
To obtain the trained emotion detection model provided by the embodiment of the present invention, one possible implementation trains it through the following steps:
S11: obtain the reference spectrogram and reference emotion corresponding to a reference infant cry signal.
It should be understood that the reference spectrograms and reference emotions provided by the embodiment of the present invention are used to train the emotion detection model, so each reference spectrogram is correctly paired with one reference emotion.
S12: input the reference spectrogram into the capsule network.
S13: compute the difference between the output of the capsule network and the reference emotion with a cross-entropy loss function, to optimize the parameters of the capsule network.
Here the cross entropy measures the distance between the actual output probability and the desired output probability: the smaller the cross-entropy value, the closer the two probability distributions.
When the actual output probability matches the desired output probability in each test, the parameter optimization of the capsule network is complete.
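The cross-entropy comparison in step S13 can be sketched as follows, over the three emotion classes named in this description; the probability values are made up for illustration:

```python
import numpy as np

def cross_entropy(predicted, target, eps=1e-12):
    """H(target, predicted) = -sum_k target_k * log(predicted_k).
    Smaller values mean the two distributions are closer."""
    predicted = np.clip(predicted, eps, 1.0)
    return float(-np.sum(target * np.log(predicted)))

# Reference emotion "fussy" as a one-hot target over
# (positive, fussy, crying), compared with two candidate network outputs.
target = np.array([0.0, 1.0, 0.0])
close = np.array([0.1, 0.8, 0.1])   # output close to the reference
far = np.array([0.6, 0.2, 0.2])     # output far from the reference
```

Training drives the loss down, pulling the network's output distribution toward the reference emotion.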
S14: determine the trained emotion detection model based on the optimized parameters of the capsule network.
In conclusion a kind of mood detection method based on vagitus that the embodiment of the present invention is proposed.It obtains to be checked The vagitus signal of survey, to vagitus signal carry out adding window, respectively to the vagitus window signal in each time window into Row Fast Fourier Transform (FFT), to generate the corresponding spectrogram of vagitus signal.Spectrogram is inputted into trained mood detection In model, to determine the corresponding mood of vagitus signal.Wherein, mood detection model includes capsule network.Hereby it is achieved that According to vagitus signal to be detected, corresponding spectrogram is generated.Mould is detected using the mood generated by capsule network training Type carries out mood detection to spectrogram, and then determines the corresponding mood of vagitus signal.
In order to clearly illustrate the mood detection method provided by the embodiment of the present invention based on vagitus, below It is illustrated.
As shown in figure 3, carrying out Fast Fourier Transform (FFT) to reference vagitus signal first, obtain believing with reference to vagitus Number corresponding reference spectrum figure, is trained using parameter of the reference spectrum figure to capsule network, obtains mood detection model.
High-pass filtering may be carried out to it and be made an uproar with filtering environmental comprising the environment voice signal of vagitus signal by obtaining Then sound detects filtered voice signal using mute detection algorithm, to obtain the vagitus detected to mood Signal.Fast Fourier Transform (FFT) is carried out to vagitus signal, obtains corresponding spectrogram, utilizes the detection pair of mood detection model The mood answered.
To implement the above embodiments, an embodiment of the present invention also proposes an emotion detection device based on infant crying. Fig. 4 is a structural diagram of such a device according to an embodiment of the present invention. The device executes the above emotion detection method and, as shown in Fig. 4, includes: a first obtaining module 210, a windowing module 220, a transform module 230, and a first input module 240.
The first obtaining module 210 obtains the infant cry signal to be detected.
The windowing module 220 applies windowing to the cry signal.
The transform module 230 performs a fast Fourier transform on the windowed cry signal in each time window to generate the spectrogram corresponding to the cry signal.
The first input module 240 inputs the spectrogram into a trained emotion detection model to determine the emotion corresponding to the cry signal, where the emotion detection model includes a capsule network.
Further, to obtain the cry signal from a noisy environment speech signal, one possible implementation is for the first obtaining module 210 to include: an obtaining submodule 211 for obtaining the environment speech signal; a filtering submodule 212 for applying high-pass filtering to the environment speech signal to filter out low-frequency speech; and a detection submodule 213 for detecting the filtered environment speech signal with a silence detection algorithm to obtain the cry signal.
Further, to detect the filtered environment speech signal and obtain the cry signal, one possible implementation is for the detection submodule 213 to include: an obtaining unit 2131 for taking one frame of the filtered environment speech signal as an environment speech frame; a computing unit 2132 for computing the spectral flatness and short-time energy of the frame; a judging unit 2133 for judging whether the spectral flatness is below a first preset threshold and the short-time energy is below a second preset threshold; and a setting unit 2134 for taking the frame as a cry frame of the cry signal when the judging unit outputs an affirmative signal.
Further, to obtain the trained emotion detection model provided by the embodiment of the present invention, one possible implementation is for the device to also include: a second obtaining module 250 for obtaining the reference spectrogram and reference emotion corresponding to a reference infant cry signal; a second input module 260 for inputting the reference spectrogram into the capsule network; a computing module 270 for computing the difference between the output of the capsule network and the reference emotion with a cross-entropy loss function, to optimize the parameters of the capsule network; and a determining module 280 for determining the trained emotion detection model based on the optimized parameters of the capsule network.
It should be noted that the foregoing explanation of the method embodiment also applies to the device embodiment; details are not repeated here.
In summary, the embodiment of the present invention proposes an emotion detection device based on infant crying that obtains the cry signal to be detected, applies windowing to it, performs a fast Fourier transform on the windowed cry signal in each time window to generate the corresponding spectrogram, and inputs the spectrogram into a trained emotion detection model, which includes a capsule network, to determine the emotion corresponding to the cry signal.
To implement the above embodiments, an embodiment of the present invention further proposes a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the vagitus-based mood detection method of the foregoing method embodiments.
Fig. 5 is a schematic diagram of a computer device provided by an embodiment of the present invention. As shown in Fig. 5, the computer device 50 of this embodiment includes a processor 51, a memory 52, and a computer program 53 stored in the memory 52 and executable on the processor 51. When the computer program 53 is executed by the processor 51, the vagitus-based mood detection method of the embodiments is implemented; to avoid repetition, details are not repeated here. Alternatively, when the computer program is executed by the processor 51, the functions of the modules/units of the vagitus-based mood detection device of the embodiments are implemented; to avoid repetition, details are not repeated here.
The computer device 50 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The computer device may include, but is not limited to, the processor 51 and the memory 52. Those skilled in the art will understand that Fig. 5 is merely an example of the computer device 50 and does not constitute a limitation on the computer device 50, which may include more or fewer components than illustrated, combine certain components, or have different components; for example, the computer device may also include input/output devices, network access devices, buses, and the like.
The processor 51 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 52 may be an internal storage unit of the computer device 50, such as a hard disk or memory of the computer device 50. The memory 52 may also be an external storage device of the computer device 50, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device 50. Further, the memory 52 may include both an internal storage unit and an external storage device of the computer device 50. The memory 52 is used to store the computer program and other programs and data required by the computer device. The memory 52 may also be used to temporarily store data that has been output or is to be output.
To implement the above embodiments, an embodiment of the present invention further proposes a computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the vagitus-based mood detection method of the foregoing method embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, for the specific working processes of the systems, devices, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through certain interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The above integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A vagitus-based mood detection method, characterized in that the method comprises:
obtaining a vagitus signal to be detected;
windowing the vagitus signal;
performing a fast Fourier transform on the windowed vagitus signal in each time window, respectively, to generate a spectrogram corresponding to the vagitus signal; and
inputting the spectrogram into a trained mood detection model to determine a mood corresponding to the vagitus signal, wherein the mood detection model comprises a capsule network.
2. The method according to claim 1, characterized in that obtaining the vagitus signal to be detected comprises:
obtaining an environment voice signal;
performing high-pass filtering on the environment voice signal to filter out low-frequency voice signals; and
detecting the filtered environment voice signal using a silence detection algorithm to obtain the vagitus signal.
3. The method according to claim 2, characterized in that detecting the filtered environment voice signal using the silence detection algorithm comprises:
obtaining one frame of the filtered environment voice signal as an environment voice frame signal;
calculating a spectral flatness and a short-time energy value corresponding to the environment voice frame signal;
judging whether the spectral flatness is less than a first preset threshold and whether the short-time energy value is less than a second preset threshold; and
if so, taking the environment voice frame signal as a vagitus frame signal of the vagitus signal.
4. The method according to any one of claims 1 to 3, characterized in that the trained mood detection model is trained by the following steps:
obtaining a reference spectrogram corresponding to a reference vagitus signal and a reference mood;
inputting the reference spectrogram into the capsule network;
calculating a difference between an output of the capsule network and the reference mood using a cross-entropy loss function, so as to optimize parameters of the capsule network; and
determining the trained mood detection model based on the parameters of the optimized capsule network.
5. A vagitus-based mood detection device, characterized in that the device comprises:
a first obtaining module, configured to obtain a vagitus signal to be detected;
a windowing module, configured to window the vagitus signal;
a transform module, configured to perform a fast Fourier transform on the windowed vagitus signal in each time window, respectively, to generate a spectrogram corresponding to the vagitus signal; and
a first input module, configured to input the spectrogram into a trained mood detection model to determine a mood corresponding to the vagitus signal, wherein the mood detection model comprises a capsule network.
6. The device according to claim 5, characterized in that the first obtaining module comprises:
an obtaining sub-module, configured to obtain an environment voice signal;
a filtering sub-module, configured to perform high-pass filtering on the environment voice signal to filter out low-frequency voice signals; and
a detection sub-module, configured to detect the filtered environment voice signal using a silence detection algorithm to obtain the vagitus signal.
7. The device according to claim 6, characterized in that the detection sub-module comprises:
an acquiring unit, configured to obtain one frame of the filtered environment voice signal as an environment voice frame signal;
a computing unit, configured to calculate a spectral flatness and a short-time energy value corresponding to the environment voice frame signal;
a judging unit, configured to judge whether the spectral flatness is less than a first preset threshold and whether the short-time energy value is less than a second preset threshold; and
a setting unit, configured to take the environment voice frame signal as a vagitus frame signal of the vagitus signal when the judging unit outputs an affirmative signal.
8. The device according to any one of claims 5 to 7, characterized in that the device further comprises:
a second obtaining module, configured to obtain a reference spectrogram corresponding to a reference vagitus signal and a reference mood;
a second input module, configured to input the reference spectrogram into the capsule network;
a computing module, configured to calculate a difference between an output of the capsule network and the reference mood using a cross-entropy loss function, so as to optimize parameters of the capsule network; and
a determining module, configured to determine the trained mood detection model based on the parameters of the optimized capsule network.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the vagitus-based mood detection method according to any one of claims 1 to 4.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the vagitus-based mood detection method according to any one of claims 1 to 4.
CN201910571836.XA 2019-06-28 2019-06-28 Mood detection method and its device based on vagitus Pending CN110390942A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910571836.XA CN110390942A (en) 2019-06-28 2019-06-28 Mood detection method and its device based on vagitus

Publications (1)

Publication Number Publication Date
CN110390942A true CN110390942A (en) 2019-10-29

Family

ID=68285966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910571836.XA Pending CN110390942A (en) 2019-06-28 2019-06-28 Mood detection method and its device based on vagitus

Country Status (1)

Country Link
CN (1) CN110390942A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105812A (en) * 2019-12-31 2020-05-05 普联国际有限公司 Audio feature extraction method and device, training method and electronic equipment
CN111261173A (en) * 2020-01-10 2020-06-09 珠海格力电器股份有限公司 Electric appliance control method and device, storage medium and electric appliance
CN111967361A (en) * 2020-08-07 2020-11-20 盐城工学院 Emotion detection method based on baby expression recognition and crying
CN112057089A (en) * 2020-08-31 2020-12-11 五邑大学 Emotion recognition method, emotion recognition device and storage medium
CN112599134A (en) * 2020-12-02 2021-04-02 国网安徽省电力有限公司 Transformer sound event detection method based on voiceprint recognition
WO2021143411A1 (en) * 2020-01-17 2021-07-22 海信视像科技股份有限公司 Ambient sound output apparatus, system, method, and nonvolatile storage medium
WO2021147157A1 (en) * 2020-01-20 2021-07-29 网易(杭州)网络有限公司 Game special effect generation method and apparatus, and storage medium and electronic device
CN114863950A (en) * 2022-07-07 2022-08-05 深圳神目信息技术有限公司 Baby crying detection and network establishment method and system based on anomaly detection

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020135618A1 (en) * 2001-02-05 2002-09-26 International Business Machines Corporation System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input
US20050108004A1 (en) * 2003-03-11 2005-05-19 Takeshi Otani Voice activity detector based on spectral flatness of input signal
CN105976831A (en) * 2016-05-13 2016-09-28 中国人民解放军国防科学技术大学 Lost child detection method based on cry recognition
CN109410917A (en) * 2018-09-26 2019-03-01 河海大学常州校区 Voice data classification method based on modified capsule network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
M. A. TUĞTEKIN TURAN et al.: "Monitoring Infant's Emotional Cry in Domestic Environments", ISCA-SPEECH *


Similar Documents

Publication Publication Date Title
CN110390942A (en) Mood detection method and its device based on vagitus
CN110600017B (en) Training method of voice processing model, voice recognition method, system and device
US10621971B2 (en) Method and device for extracting speech feature based on artificial intelligence
KR102213013B1 (en) Frequency-based audio analysis using neural networks
CN111814574B (en) Face living body detection system, terminal and storage medium applying double-branch three-dimensional convolution model
CN110600059B (en) Acoustic event detection method and device, electronic equipment and storage medium
CN110444202B (en) Composite voice recognition method, device, equipment and computer readable storage medium
CN110082135A (en) Equipment fault recognition methods, device and terminal device
CN109658943B (en) Audio noise detection method and device, storage medium and mobile terminal
CN111798828B (en) Synthetic audio detection method, system, mobile terminal and storage medium
CN112949708A (en) Emotion recognition method and device, computer equipment and storage medium
CN113205820B (en) Method for generating voice coder for voice event detection
CN111357051A (en) Speech emotion recognition method, intelligent device and computer readable storage medium
CN110648669B (en) Multi-frequency shunt voiceprint recognition method, device and system and computer readable storage medium
CN114863938A (en) Bird language identification method and system based on attention residual error and feature fusion
CN114155875B (en) Method and device for identifying voice scene tampering, electronic equipment and storage medium
CN111462755A (en) Information prompting method and device, electronic equipment and medium
CN113763966A (en) End-to-end text-independent voiceprint recognition method and system
Chinmayi et al. Emotion Classification Using Deep Learning
CN115328661B (en) Computing power balance execution method and chip based on voice and image characteristics
CN117496990A (en) Speech denoising method, device, computer equipment and storage medium
CN111639537A (en) Face action unit identification method and device, electronic equipment and storage medium
US20220408201A1 (en) Method and system of audio processing using cochlear-simulating spike data
CN115565548A (en) Abnormal sound detection method, abnormal sound detection device, storage medium and electronic equipment
CN114548262A (en) Feature level fusion method for multi-modal physiological signals in emotion calculation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191029)