CN108648748B - Acoustic event detection method under hospital noise environment - Google Patents

Acoustic event detection method under hospital noise environment

Info

Publication number
CN108648748B
CN108648748B (Application CN201810297418.1A)
Authority
CN
China
Prior art keywords
acoustic event
audio
event
target acoustic
mfcc
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810297418.1A
Other languages
Chinese (zh)
Other versions
CN108648748A (en)
Inventor
邵虹
田影
刘阳
崔文成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang University of Technology
Original Assignee
Shenyang University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang University of Technology filed Critical Shenyang University of Technology
Priority to CN201810297418.1A priority Critical patent/CN108648748B/en
Publication of CN108648748A publication Critical patent/CN108648748A/en
Application granted granted Critical
Publication of CN108648748B publication Critical patent/CN108648748B/en

Links

Images

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/14Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142Hidden Markov Models [HMMs]
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Abstract

The invention relates to an acoustic event detection method, in particular to an acoustic event detection method in a hospital noise environment. The method and device enable speech to be accurately recognized as text, improving the recognition rate of voice-entry electronic medical records and reducing the misrecognition rate. The method comprises the following steps: step 1, intercept the characteristic segments of the audio signal of each acoustic event and label the corresponding audio segments; step 2, extract the MFCC feature coefficients of each target acoustic event in the audio; step 3, align the speech phonemes; step 4, generate the feature matrix of the speech; step 5, build a CRNN model for each target acoustic event; step 6, preprocess the audio signal of the target acoustic event to be detected, acquired in real time in the hospital noise environment, and then extract its MFCC features; step 7, obtain the category of the target acoustic event to be detected; and step 8, filter out audio segments irrelevant to the target acoustic event.

Description

Acoustic event detection method under hospital noise environment
Technical Field
The invention relates to an acoustic event detection method, in particular to an acoustic event detection method in a hospital noise environment.
Background
Under very low signal-to-noise ratios, or when several people speak at once, the recognition rate of existing voice-entry electronic medical records drops sharply. Acoustic event detection therefore becomes a key step in removing the influence of noise in the hospital environment.
Current speech recognizers lump all non-speech sounds into a single class: noise. In practice, real-world noise can be more complex than speech, and if the various noise types can also be modeled, the recognizer can more easily distinguish which audio is useful speech, which is significant for the recognition of voice-entry electronic medical records.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an acoustic event detection method in a hospital noise environment that can accurately recognize speech as text, improve the recognition rate of voice-entry electronic medical records, and reduce the misrecognition rate.
To achieve this purpose, the invention adopts the following technical scheme, which comprises the steps below.
Step 1, in the training stage, taking the audio signals of the target acoustic events and hospital environmental noise signals as training data, intercepting the characteristic segments of the audio signal of each acoustic event, and labelling the corresponding audio segments.
Step 2, extracting the MFCC feature coefficients of each target acoustic event in the audio according to the segments intercepted in step 1, wherein the intercepted audio segments contain the features of the acoustic events.
Step 3, training an HMM-CNN alignment model on the extracted MFCC feature coefficients and aligning the speech phonemes.
Step 4, computing the cepstral mean and variance normalization statistics of the MFCC features aligned in step 3, indexed by sound-event number; each set of statistics is a matrix, i.e. the feature matrix of the speech.
Step 5, building a CRNN model for each target acoustic event from the feature matrix generated in step 4, using Keras with Theano as the backend.
Step 6, in the recognition stage, preprocessing the audio signal of the target acoustic event to be detected, acquired in real time in the hospital noise environment, and then extracting its MFCC features.
Step 7, classifying the MFCC coefficients extracted in step 6 with the CRNN model obtained in step 5 to obtain the category of the target acoustic event to be detected.
Step 8, for the acoustic events whose categories were determined in step 7, analyzing the noise events jointly by time sequence and direction to obtain the corresponding event sequence codes, filtering the current event sequence according to these codes, and filtering out the audio segments irrelevant to the target acoustic events.
Preferably, the method adopted by the model for aligning the speech phonemes in step 3 comprises the following steps.
Step 3-1, processing the MFCC features to extract a frame sequence; each frame is normalized to the same scale and fed into the CNN, which outputs the posterior probability of the frame belonging to each class.
Step 3-2, reducing the number of training parameters through CNN weight sharing to suppress overfitting, enhancing the original speech signal features by the convolution operation of the convolutional layers, and reducing the background noise.
Step 3-3, sub-sampling the features in the pooling layer by exploiting the local correlation of the speech spectrum, reducing the dimensionality of the data while retaining the useful information.
Step 3-4, after normalization, this probability is used as the output (emission) probability of the HMM, which is used to infer the most likely sequence of feature frames.
Preferably, the feature matrix in step 4 is obtained as follows: compute the cepstral mean and variance normalization statistics of the extracted features, indexed by sound-event number; each set of statistics is a feature matrix.
Preferably, the method for training the CRNN acoustic model in step 5 includes the following steps.
Step 5-1, using gated linear units (GLUs) as activation functions in the CNN, which in audio classification introduce an attention mechanism into all layers of the neural network; relevant audio events are attended to by driving the gate values of irrelevant time-frequency units close to zero; convolutional layers are applied to extract high-level features.
Step 5-2, capturing the temporal context information with a bidirectional recurrent neural network (Bi-RNN), and predicting the posterior of each audio category for each frame with a feed-forward neural network (FNN) whose output size equals the number of audio categories. The predicted probability of each audio tag is obtained by averaging the posteriors over all frames.
Step 5-3, applying a binary cross-entropy loss between the predicted probability of the audio recording and the ground truth; the weights of the neural network are updated using the weight gradients computed by back-propagation.
Wherein the GLU is given by the following formula.
Y=(W*X+b)⊙σ(V*X+c) (1)
In the above formula, σ is the sigmoid nonlinearity, ⊙ is the element-wise product, and * is the convolution operator; W and V are convolution filters, and b and c are biases; X represents the input time-frequency (T-F) representation in the first layer, or the feature map of an intermediate layer.
In addition, training uses the following binary cross-entropy loss.
E = -∑_{n=1}^{N}[P_n log O_n + (1 - P_n) log(1 - O_n)] (2)
Where E is the binary cross entropy, and O_n and P_n represent the estimated and reference label vectors at sample index n, respectively; the batch size is denoted by N. Adam is used as the stochastic optimization method.
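As an illustration of formula (1), the following Python sketch applies a one-dimensional gated linear unit to a toy feature sequence; the 1-D convolution, filter taps, and shapes are illustrative assumptions standing in for the 2-D time-frequency case, not parameters fixed by the invention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glu_1d(x, w, b, v, c):
    """Gated linear unit, formula (1): (W*X + b) gated by sigma(V*X + c)."""
    linear = np.convolve(x, w, mode="same") + b           # linear branch W*X + b
    gate = sigmoid(np.convolve(x, v, mode="same") + c)    # gate sigma(V*X + c)
    return linear * gate                                   # element-wise product

# Toy usage with made-up filter taps and a random 128-frame feature sequence.
x = np.random.randn(128)
y = glu_1d(x, w=np.array([0.2, 0.5, 0.2]), b=0.0,
           v=np.array([0.1, 0.3, 0.1]), c=0.0)
```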
Compared with the prior art, the invention has the following beneficial effects.
The invention enables voice-entry electronic medical records to transcribe speech into text accurately in a hospital noise environment, thereby improving the recognition rate of voice-entry electronic medical records and reducing the misrecognition rate. Target acoustic event detection in the hospital noise environment is realized, with a degree of robustness to noise.
Drawings
The invention is further described below with reference to the figures and the detailed description. The scope of the invention is not limited to the following description.
Fig. 1 is a block diagram of the overall structure of the present invention.
FIG. 2 is a model diagram of a CNN-HMM according to the present invention.
FIG. 3 is a diagram of the CRNN model of the present invention.
In the figure, 1 is a pooling layer, 2 is a hidden layer, and 3 is a convolutional layer.
Detailed Description
As shown in Figs. 1-3, the present invention relates to an acoustic event detection method, in particular for noisy data in a hospital environment. The method comprises the following steps.
In the training phase:
1. First, the hospital noise is analyzed comprehensively, and various acoustic events occurring in the hospital are used for detection and classification. Six acoustic events are detected: the sounds of the medical equipment, namely the ventilator, the ECG monitor, and the cardiac defibrillation and pacing equipment, together with the movement of the nursing cart, the printer, and the patient crying. Each class comprises 100 events, each no shorter than 1 second, with audio recordings of ten seconds or more; the target sound event categories are selected according to their frequency of occurrence in the original annotation and the number of different recordings in which they occur. The data set is divided into training and evaluation subsets according to the number of examples available for each event class, while also taking the recording location into account. To tune the parameters for the best effect, the development set is further divided into four folds, and each recording is used only once as test data. At this stage, the only condition imposed is that the test subset does not contain training data. For part of the sound event data, the evaluation set consists of five recordings and 12 recordings are distributed over the four folds into the training and testing subsets; for the remaining sound event data, the evaluation set consists of five recordings and 10 recordings are distributed over the four folds into the training and testing subsets.
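A minimal sketch of the four-fold development split described above is given below; it assumes a simple list of recording identifiers and uses scikit-learn, which is not named in the patent, purely for illustration. Each recording lands in the test split of exactly one fold.

```python
from sklearn.model_selection import KFold

# Hypothetical recording identifiers for the 12 development recordings.
recordings = [f"rec_{i:02d}" for i in range(12)]

kf = KFold(n_splits=4, shuffle=True, random_state=0)   # four folds, as in the text
for fold, (train_idx, test_idx) in enumerate(kf.split(recordings)):
    train_recs = [recordings[i] for i in train_idx]
    test_recs = [recordings[i] for i in test_idx]      # each recording is tested once
    print(f"fold {fold}: train={train_recs}, test={test_recs}")
```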
2. Then, the audio signals and noise signals of the various target acoustic events are acquired with a microphone-array voice recording system; the audio of the target acoustic events and the hospital noise signals are taken as training data, the characteristic segments of the audio signal of each acoustic event are intercepted, and the corresponding audio segments are labelled.
3. The MFCC feature coefficients of each target acoustic event in the audio are extracted from the intercepted segments, the audio segments containing the features of the acoustic events.
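The MFCC extraction step could look like the following sketch. The patent does not name a feature library; librosa and the parameter values (16 kHz sample rate, 25 ms window, 10 ms hop, 13 coefficients) are assumptions chosen only to make the example concrete.

```python
import librosa

def extract_mfcc(wav_path, sr=16000, n_mfcc=13):
    """Return an MFCC matrix of shape (frames, n_mfcc) for one labelled segment."""
    y, _ = librosa.load(wav_path, sr=sr)                # mono audio at the target rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=400, hop_length=160)  # 25 ms window, 10 ms hop
    return mfcc.T

# features = extract_mfcc("ventilator_001.wav")         # hypothetical file name
```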
4. The model that aligns the speech phonemes from the extracted MFCC coefficients adopts the following method.
(1) The MFCC features are processed to extract a frame sequence. Each frame is normalized to the same scale and fed into the CNN, which outputs the posterior probability of the frame belonging to each class.
(2) The weight-sharing property of the CNN reduces the number of training parameters and suppresses overfitting; the convolution operation of the convolutional layers enhances the original speech signal features and reduces background noise. The CNN model comprises three kinds of layers: a pooling layer, a hidden layer, and a convolutional layer.
(3) The pooling layer sub-samples the features by exploiting the local correlation of the speech spectrum, reducing the dimensionality of the data while retaining the useful information.
(4) After normalization, this probability is used as the output (emission) probability of the HMM, which is then used to infer the most likely sequence of feature frames.
Referring to Fig. 2, which shows the CNN-HMM model, the speech phonemes are aligned by the trained CNN-HMM acoustic model, as illustrated by the sketch below.
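The alignment idea of Fig. 2 can be sketched as follows: the normalized per-frame class posteriors from the CNN serve as HMM emission probabilities, and Viterbi decoding recovers the most likely frame-level state sequence. The transition matrix, priors, and state inventory are illustrative assumptions, not values given in the patent.

```python
import numpy as np

def viterbi_align(log_emission, log_trans, log_init):
    """log_emission: (T, S) log posteriors from the CNN, one row per frame;
    log_trans: (S, S) log transition matrix; log_init: (S,) log state priors."""
    T, S = log_emission.shape
    delta = np.full((T, S), -np.inf)                    # best log score per state
    back = np.zeros((T, S), dtype=int)                  # back-pointers
    delta[0] = log_init + log_emission[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans      # (from_state, to_state)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emission[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):                      # trace the best path backwards
        path[t] = back[t + 1, path[t + 1]]
    return path                                          # frame-level state alignment
```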
5. The cepstral mean and variance normalization statistics of the aligned MFCC features are computed, indexed by sound-event number; each set of statistics is a matrix, i.e. the feature matrix of the speech.
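A compact sketch of the cepstral mean and variance normalization (CMVN) statistics is shown below; the per-event dictionary layout is an assumption, since the patent only states that each statistic set, indexed by sound-event number, forms a matrix.

```python
import numpy as np

def cmvn_stats(mfcc):
    """mfcc: (frames, coeffs) aligned MFCC features for one acoustic event."""
    mean = mfcc.mean(axis=0)
    std = mfcc.std(axis=0) + 1e-8                       # avoid division by zero
    normalized = (mfcc - mean) / std
    return normalized, np.stack([mean, std])            # per-event statistic matrix

# Hypothetical feature matrix indexed by sound-event number:
# feature_matrix = {event_id: cmvn_stats(m)[1] for event_id, m in enumerate(mfcc_list)}
```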
6. A CRNN model is established for each target acoustic event from the generated feature matrix, using Keras with Theano as the backend.
The method for training the CRNN acoustic model is as follows.
(1) The CNN uses gated linear units (GLUs) as activation functions, which in audio classification introduce an attention mechanism into all layers of the neural network. Relevant audio events are attended to by driving the gate values of irrelevant time-frequency (T-F) units close to zero: if a GLU gate is close to 1, the corresponding T-F unit is attended to; if it is close to 0, the corresponding T-F unit is ignored. In this way, the network learns to focus on the audio events and to ignore irrelevant sounds. Convolutional layers are applied to extract high-level features.
(2) The temporal context information is captured with a bidirectional recurrent neural network (Bi-RNN), and the posterior of each audio category for each frame is predicted with a feed-forward neural network (FNN) whose output size equals the number of audio categories. The predicted probability of each audio tag is obtained by averaging the posteriors over all frames.
(3) A binary cross-entropy loss is applied between the predicted probability of the audio recording and the ground truth. The weights of the neural network are updated using the weight gradients computed by back-propagation; a CRNN acoustic model is created for each target acoustic event, as shown in Fig. 3, which is a diagram of the CRNN model.
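The CRNN structure described in (1)-(3) could be sketched in Keras as below: a gated (GLU) convolutional block, a bidirectional recurrent layer for temporal context, a frame-wise sigmoid output layer, and averaging of the frame posteriors into a clip-level prediction. The layer sizes, the choice of GRU cells, and the pooling settings are illustrative assumptions; the patent fixes only the overall CNN, Bi-RNN, FNN, and averaging pipeline.

```python
from keras.layers import (Input, Conv2D, MaxPooling2D, Multiply, Reshape,
                          Bidirectional, GRU, TimeDistributed, Dense,
                          GlobalAveragePooling1D)
from keras.models import Model

def build_crnn(n_frames=240, n_mels=64, n_classes=6):
    x_in = Input(shape=(n_frames, n_mels, 1))            # time-frequency input
    # Gated linear unit: linear branch modulated by a sigmoid "attention" gate.
    linear = Conv2D(32, (3, 3), padding="same", activation="linear")(x_in)
    gate = Conv2D(32, (3, 3), padding="same", activation="sigmoid")(x_in)
    h = Multiply()([linear, gate])
    h = MaxPooling2D(pool_size=(1, 2))(h)                # pool frequency, keep time
    h = Reshape((n_frames, -1))(h)                       # (time, features) for the RNN
    h = Bidirectional(GRU(64, return_sequences=True))(h) # temporal context (Bi-RNN)
    frame_post = TimeDistributed(Dense(n_classes, activation="sigmoid"))(h)
    clip_prob = GlobalAveragePooling1D()(frame_post)     # average frame posteriors
    return Model(x_in, clip_prob)
```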
With reference to Fig. 1, in the recognition phase:
1. The audio signal of the target acoustic event to be detected, acquired in real time in the hospital noise environment with the microphone-array voice recording system, is preprocessed and its MFCC features are then extracted.
2. The extracted MFCC coefficients are classified with the CRNN model obtained in the training phase to obtain the category of the target acoustic event to be detected.
3. Among the acoustic events whose category has been determined, the noise events are analyzed jointly by time sequence and direction to obtain the corresponding event sequence codes; the current event sequence is filtered according to these codes, and the audio segments irrelevant to the target acoustic events are filtered out.
The GLU formula is as follows.
Y=(W*X+b)⊙σ(V*X+c) (1)
Where σ is the sigmoid nonlinearity, ⊙ is the element-wise product, and * is the convolution operator. W and V are convolution filters, and b and c are biases. X represents the input T-F representation in the first layer, or the feature map of an intermediate layer.
In addition, training uses the following binary cross-entropy loss.
E = -∑_{n=1}^{N}[P_n log O_n + (1 - P_n) log(1 - O_n)] (2)
Where E is the binary cross entropy, and O_n and P_n represent the estimated and reference label vectors at sample index n, respectively. The batch size is denoted by N; Adam is used as the stochastic optimization method.
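Consistent with formula (2), a minimal training sketch pairing binary cross entropy with the Adam optimizer might look as follows; the batch size, epoch count, and data variables are placeholders, and build_crnn refers to the hypothetical model sketch given earlier.

```python
from keras.optimizers import Adam

model = build_crnn()                                     # hypothetical sketch from above
model.compile(optimizer=Adam(),                          # Adam as the stochastic optimizer
              loss="binary_crossentropy",                # formula (2) per audio tag
              metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=32, epochs=50,
#           validation_data=(x_val, y_val))
```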
The invention solves the recognition-rate problem of traditional voice-entry electronic medical records in a noisy environment, can greatly improve the working efficiency and effectiveness of medical personnel, and can be widely popularized and applied in the field of voice-entry electronic medical records.
It should be understood that the detailed description is only intended to illustrate the present invention and that the invention is not limited to the technical solutions described in the embodiments; those skilled in the art will understand that the invention may be modified or equivalently substituted to achieve the same technical effects, and as long as the use requirements are met, such modifications fall within the protection scope of the invention.

Claims (1)

1. The method for detecting the acoustic events in the noise environment of the hospital is characterized by comprising the following steps of:
step 1, in a training stage, taking an audio signal of a target acoustic event and a hospital environment noise signal as training data, carrying out feature interception on the audio signal of each acoustic event, and carrying out corresponding marking on an audio segment of the audio signal;
step 2, extracting MFCC feature coefficients of each target acoustic event in the audio according to the segments intercepted in step 1, wherein the intercepted audio segments contain the features of the acoustic events;
step 3, according to the extracted MFCC characteristic coefficients, adopting an HMM-CNN training alignment model to align the speech phonemes;
step 4, calculating statistics of cepstrum mean and variance normalization of the MFCC features after alignment in the step 3, taking the sound event number as an index, wherein each statistic set is a matrix, namely a feature matrix of the generated voice;
step 5, establishing a CRNN model for each target acoustic event according to the feature matrix generated in step 4, using Keras with Theano as the backend;
step 6, in the identification stage, the audio signals of the target acoustic events to be detected, which are acquired in real time in the noise environment of the hospital, are preprocessed and then MFCC (Mel frequency cepstrum coefficient) feature extraction is carried out;
step 7, classifying and identifying by adopting the CRNN model obtained in the step 5 according to the MFCC coefficient extracted in the step 6 to obtain the category of the target acoustic event to be detected;
step 8, in the acoustic events of the determined category in the step 7, carrying out comprehensive analysis based on time sequence and direction on the noise events to obtain corresponding event sequence codes, filtering the current event sequence according to the obtained event sequence codes, and filtering out audio segments irrelevant to the target acoustic events;
the method adopted by the model for aligning the speech phonemes in the step 3 comprises the following steps:
step 3-1, processing according to MFCC characteristics to extract a frame sequence; each frame is normalized to the same scale and fed into the CNN which produces a posterior probability of belonging to one class;
step 3-2, reducing the number of training parameters through CNN weight sharing to suppress overfitting, enhancing the features of the original speech signal by the convolution operation of the convolutional layer, and reducing background noise;
3-3, performing sub-sampling on the characteristics by utilizing a speech signal frequency spectrum local correlation principle at a pooling layer, reducing the dimension of data and reserving useful information;
step 3-4, after normalization, this probability is used as the output probability of the HMM, which is used to infer the most likely sequence of feature frames.
CN201810297418.1A 2018-03-30 2018-03-30 Acoustic event detection method under hospital noise environment Expired - Fee Related CN108648748B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810297418.1A CN108648748B (en) 2018-03-30 2018-03-30 Acoustic event detection method under hospital noise environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810297418.1A CN108648748B (en) 2018-03-30 2018-03-30 Acoustic event detection method under hospital noise environment

Publications (2)

Publication Number Publication Date
CN108648748A CN108648748A (en) 2018-10-12
CN108648748B true CN108648748B (en) 2021-07-13

Family

ID=63745447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810297418.1A Expired - Fee Related CN108648748B (en) 2018-03-30 2018-03-30 Acoustic event detection method under hospital noise environment

Country Status (1)

Country Link
CN (1) CN108648748B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259189B (en) * 2018-11-30 2023-04-18 马上消费金融股份有限公司 Music classification method and device
CN109961017A (en) * 2019-02-26 2019-07-02 杭州电子科技大学 A kind of cardiechema signals classification method based on convolution loop neural network
CN110097193B (en) * 2019-04-28 2021-03-19 第四范式(北京)技术有限公司 Method and system for training model and method and system for predicting sequence data
CN110085249B (en) * 2019-05-09 2021-03-16 南京工程学院 Single-channel speech enhancement method of recurrent neural network based on attention gating
CN110147788B (en) * 2019-05-27 2021-09-21 东北大学 Feature enhancement CRNN-based metal plate strip product label character recognition method
CN110179466A (en) * 2019-06-03 2019-08-30 珠海涵辰科技有限公司 Breathing detection system after calamity based on intelligent terminal
CN110223713A (en) * 2019-06-11 2019-09-10 苏州思必驰信息科技有限公司 Sound event detection model training method and sound event detection method
CN110232927B (en) * 2019-06-13 2021-08-13 思必驰科技股份有限公司 Speaker verification anti-spoofing method and device
CN110334243A (en) * 2019-07-11 2019-10-15 哈尔滨工业大学 Audio representation learning method based on multilayer timing pond
CN110600059B (en) * 2019-09-05 2022-03-15 Oppo广东移动通信有限公司 Acoustic event detection method and device, electronic equipment and storage medium
CN111261192A (en) * 2020-01-15 2020-06-09 厦门快商通科技股份有限公司 Audio detection method based on LSTM network, electronic equipment and storage medium
CN111259188B (en) * 2020-01-19 2023-07-25 成都潜在人工智能科技有限公司 Lyric alignment method and system based on seq2seq network
CN111899760A (en) * 2020-07-17 2020-11-06 北京达佳互联信息技术有限公司 Audio event detection method and device, electronic equipment and storage medium
CN111933188B (en) * 2020-09-14 2021-02-05 电子科技大学 Sound event detection method based on convolutional neural network
CN112309405A (en) * 2020-10-29 2021-02-02 平安科技(深圳)有限公司 Method and device for detecting multiple sound events, computer equipment and storage medium
CN112712804B (en) * 2020-12-23 2022-08-26 哈尔滨工业大学(威海) Speech recognition method, system, medium, computer device, terminal and application
CN112863492B (en) * 2020-12-31 2022-06-10 思必驰科技股份有限公司 Sound event positioning model training method and device
CN113159217B (en) * 2021-05-12 2023-08-01 深圳龙岗智能视听研究院 Attention mechanism target detection method based on event camera
CN113761269B (en) * 2021-05-21 2023-10-10 腾讯科技(深圳)有限公司 Audio recognition method, apparatus and computer readable storage medium
CN113903003B (en) * 2021-10-15 2022-07-29 宿迁硅基智能科技有限公司 Event occurrence probability determination method, storage medium, and electronic apparatus
CN113920473B (en) * 2021-10-15 2022-07-29 宿迁硅基智能科技有限公司 Complete event determination method, storage medium and electronic device
CN114974303B (en) * 2022-05-16 2023-05-12 江苏大学 Self-adaptive hierarchical aggregation weak supervision sound event detection method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1598770A2 (en) * 2004-05-20 2005-11-23 Microsoft Corporation Low resolution optical character recognition for camera acquired documents
WO2013057652A3 (en) * 2011-10-17 2013-07-18 Koninklijke Philips Electronics N.V. A medical feedback system based on sound analysis in a medical environment
CN104916289A (en) * 2015-06-12 2015-09-16 哈尔滨工业大学 Quick acoustic event detection method under vehicle-driving noise environment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1598770A2 (en) * 2004-05-20 2005-11-23 Microsoft Corporation Low resolution optical character recognition for camera acquired documents
WO2013057652A3 (en) * 2011-10-17 2013-07-18 Koninklijke Philips Electronics N.V. A medical feedback system based on sound analysis in a medical environment
CN103875034A (en) * 2011-10-17 2014-06-18 皇家飞利浦有限公司 A medical feedback system based on sound analysis in a medical environment
CN104916289A (en) * 2015-06-12 2015-09-16 哈尔滨工业大学 Quick acoustic event detection method under vehicle-driving noise environment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Emre Çakır; "Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection"; Transactions on Audio, Speech, and Language Processing; 2017-12-31; pp. 1291-1302 *
Oscar Koller; "Re-Sign: Re-Aligned End-to-End Sequence Modelling with Deep Recurrent CNN-HMMs"; 2017 IEEE Conference on Computer Vision and Pattern Recognition; 2017-12-31; full text *

Also Published As

Publication number Publication date
CN108648748A (en) 2018-10-12

Similar Documents

Publication Publication Date Title
CN108648748B (en) Acoustic event detection method under hospital noise environment
CN107492382B (en) Voiceprint information extraction method and device based on neural network
CN109044396B (en) Intelligent heart sound identification method based on bidirectional long-time and short-time memory neural network
Ghoraani et al. Time–frequency matrix feature extraction and classification of environmental audio signals
CN109036382B (en) Audio feature extraction method based on KL divergence
CN110033756B (en) Language identification method and device, electronic equipment and storage medium
CN113537005B (en) Online examination student behavior analysis method based on attitude estimation
CN111724770B (en) Audio keyword identification method for generating confrontation network based on deep convolution
CN111951824A (en) Detection method for distinguishing depression based on sound
CN111986699B (en) Sound event detection method based on full convolution network
CN110364168B (en) Voiceprint recognition method and system based on environment perception
CN111816185A (en) Method and device for identifying speaker in mixed voice
CN110991238A (en) Speech auxiliary system based on speech emotion analysis and micro-expression recognition
CN111666996A (en) High-precision equipment source identification method based on attention mechanism
CN112466284B (en) Mask voice identification method
CN107274912A (en) A kind of equipment source discrimination method of mobile phone recording
CN115831352B (en) Detection method based on dynamic texture features and time slicing weight network
CN116842460A (en) Cough-related disease identification method and system based on attention mechanism and residual neural network
CN109584861A (en) The screening method of Alzheimer's disease voice signal based on deep learning
CN114881668A (en) Multi-mode-based deception detection method
Xie et al. Image processing and classification procedure for the analysis of australian frog vocalisations
CN113571092A (en) Method for identifying abnormal sound of engine and related equipment thereof
CN111785262A (en) Speaker age and gender classification method based on residual error network and fusion characteristics
CN111524523A (en) Instrument and equipment state detection system and method based on voiceprint recognition technology
CN116052725B (en) Fine granularity borborygmus recognition method and device based on deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210713