CN116092517A - Audio detection method, audio detection device and computer storage medium - Google Patents

Audio detection method, audio detection device and computer storage medium

Info

Publication number
CN116092517A
Authority
CN
China
Prior art keywords
audio
detection
detected
audio detection
detection method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211734814.9A
Other languages
Chinese (zh)
Inventor
方瑞东
杜海云
吴人杰
史巍
林聚财
殷俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202211734814.9A
Publication of CN116092517A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/14: Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The application discloses an audio detection method, an audio detection device and a computer storage medium. The audio detection method comprises: acquiring audio to be detected; acquiring, by using a feature analysis model, acoustic features in the audio to be detected that belong to a positive sample; and performing classification detection on the acoustic features by using an audio detection model, and outputting the detection type of the audio to be detected based on the classification detection result. In this way, the method and the device can pre-screen the extracted acoustic features of the audio to be detected with the feature analysis model before sending them into the audio detection model for detection, which reduces the computation of the audio detection model, lowers the training complexity of the whole audio detection network structure, and shortens the overall running time of the audio detection algorithm.

Description

Audio detection method, audio detection device and computer storage medium
Technical Field
The present disclosure relates to the field of audio processing, and in particular, to an audio detection method, an audio detection apparatus, and a computer storage medium.
Background
With the continuous development of intelligent acoustic technology, the demand of related technologies for audio event detection keeps increasing. Audio event detection mainly judges whether an event has occurred according to the detected audio signal strength: if the detected signal strength is higher than a set threshold, the algorithm judges that the event has occurred and then prompts the user accordingly. The detection of audio events makes people's lives more convenient and efficient.
In one application scenario, infant crying needs to be detected; when infant crying is detected, warning information is sent to a user so that the user can notice the abnormal state of the infant in time. In an actual scene, however, environmental factors are usually complex and changeable, and besides the sound event to be detected, the surrounding environment typically contains various noise interferences, so the overall audio detection takes too long and costs too much.
Disclosure of Invention
The application mainly solves the technical problem of how to reduce the time consumption of audio detection. To this end, the application provides an audio detection method, an audio detection device and a computer storage medium.
In order to solve the above technical problem, one technical solution adopted by the application is as follows: an audio detection method is provided, the method comprising: acquiring audio to be detected; acquiring, by using a feature analysis model, acoustic features in the audio to be detected that belong to a positive sample; and performing classification detection on the acoustic features by using an audio detection model, and outputting the detection type of the audio to be detected based on the classification detection result.
The classification detection result includes a probability value of being detected as a positive sample; if the probability value is higher than a first preset threshold, the detection type of the audio to be detected is output as a positive sample.
The classification detection result further includes the number of times the audio is detected as a positive sample; if the number of detections and the probability value are higher than a second preset threshold and the first preset threshold respectively, the detection type of the audio to be detected is output as a positive sample.
Before the acoustic features belonging to the positive sample in the audio to be detected are acquired by using the feature analysis model, the method further comprises: extracting the audio features of the audio to be detected by using a feature extraction model.
Extracting the audio features of the audio to be detected by using the feature extraction model comprises: performing frequency-domain and/or cepstral-domain conversion on the audio signal of the audio to be detected by using the feature extraction model to obtain the audio features.
The audio features may be one or more of the logarithmic mel spectrum, mel cepstral coefficients, filter bank features, perceptual linear prediction and channel energy normalization features.
Acquiring the acoustic features belonging to the positive sample in the audio to be detected by using the feature analysis model comprises: obtaining, by the feature analysis model, parameters of a probability density function by using a maximum likelihood estimation method based on a preset positive sample; obtaining the probability density function by training based on the parameters; and obtaining the acoustic features belonging to the positive sample in the audio to be detected by using the probability density function.
The network structure of the audio detection model comprises a grouped convolution layer and a residual layer.
In order to solve the technical problems, another technical scheme adopted by the application is as follows: there is provided an audio detection device comprising a processor and a memory coupled to the processor, the memory storing program data, the processor being configured to execute the program data to implement an audio detection method as described above.
In order to solve the technical problems, another technical scheme adopted by the application is as follows: there is provided a computer readable storage medium storing program data which, when executed, is adapted to carry out the above-described audio detection method.
The beneficial effects of the application are as follows: unlike the prior art, the audio detection method provided by the application is applied to an audio detection device. The audio detection device acquires audio to be detected; acquires, by using a feature analysis model, the acoustic features in the audio to be detected that belong to a positive sample; and performs classification detection on the acoustic features by using an audio detection model, outputting the detection type of the audio to be detected based on the classification detection result. In this way, compared with a conventional audio detection method, the feature analysis model in the audio detection device pre-screens the acoustic features of the audio to be detected, removes the data most likely to be negative samples so that they are not sent into the audio detection model for classification, and sends only the data belonging to positive samples into the audio detection model for further prediction. This saves the computation time of the audio detection model, improves computational efficiency, improves the overall efficiency of audio event prediction, and reduces the waste of time and cost in the detection process.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained from them by a person skilled in the art without inventive effort.
Wherein:
fig. 1 is a schematic flow chart of a first embodiment of an audio detection method provided in the present application;
fig. 2 is a schematic flow chart of an audio detection method applied to an audio detection device;
fig. 3 is a schematic structural diagram of an audio detection device provided in the present application;
fig. 4 is a schematic flow chart of a second embodiment of an audio detection method provided in the present application;
fig. 5 is a schematic structural diagram of a first embodiment of an audio detection device provided in the present application;
fig. 6 is a schematic structural diagram of a second embodiment of an audio detection device provided in the present application;
fig. 7 is a schematic structural diagram of an embodiment of a computer readable storage medium provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without inventive effort fall within the scope of protection of the present application.
The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship. Further, "a plurality" herein means two or more than two. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
The audio detection method of the present application is mainly applied to an audio detection device, where the audio detection device may be a server, or a system formed by a server and a terminal device cooperating with each other. Accordingly, each part included in the audio detection device, such as each unit, sub-unit, module and sub-module, may all be disposed in the server, or may be disposed in the server and the terminal device respectively.
Further, the server may be hardware or software. When the server is hardware, the server may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules, for example, software or software modules for providing a distributed server, or may be implemented as a single software or software module, which is not specifically limited herein. In some possible implementations, the audio detection method of the embodiments of the present application may be implemented by a processor invoking computer readable instructions stored in a memory.
The audio detection method of the present application is mainly applied to the detection of audio events: whether an event has occurred is judged according to the detected audio signal strength, and if the detected signal strength is higher than a set threshold, the event is judged to have occurred and the user is prompted accordingly. In an actual scene, environmental factors are usually complex and changeable, and besides the sound event to be detected, the surrounding environment typically contains various noise interferences, so suitable audio signal features need to be extracted to reduce the cost of audio event detection.
Referring to fig. 1 to 3, fig. 1 is a schematic flow chart of a first embodiment of an audio detection method provided in the present application, fig. 2 is a schematic flow chart of an audio detection method applied to an audio detection device provided in the present application, and fig. 3 is a schematic structural diagram of an audio detection device provided in the present application.
Step 11: acquiring the audio to be detected.
Specifically, the audio to be detected may be recorded in advance by a recording device, or recorded in real time by a recording device. The recording device may be any device with recording and/or video-recording functions, such as a mobile phone, a microphone or a tablet, which is not limited here. Any application and/or recording equipment with a recording function, such as an audio recorder, a video recorder or instant messaging software (such as WeChat, QQ, etc.), may acquire audio data and record it to obtain the audio to be detected.
Specifically, after the audio detection device obtains the audio to be detected, it further performs labeling, framing and other processing on the audio to be detected, so as to segment the audio data of indefinite length into small segments of fixed length and label each small segment. Framing is required because audio is a long-duration non-stationary sequence; framing makes the non-stationary audio approximately stationary over a short time, so that the subsequent audio detection device can obtain relatively stable feature parameters.
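For illustration only, the following is a minimal Python sketch of such fixed-length framing; it is not part of the original disclosure, and the 1-second segment length, 50 % overlap and 16 kHz sampling rate are assumed values, since the patent does not specify them.

```python
import numpy as np

def frame_audio(signal: np.ndarray, frame_len: int, hop_len: int) -> np.ndarray:
    """Split a 1-D waveform of indefinite length into fixed-length segments.

    The tail is zero-padded so every segment has the same length, which keeps
    the subsequent feature extraction uniform.
    """
    n_frames = max(1, 1 + int(np.ceil((len(signal) - frame_len) / hop_len)))
    padded_len = (n_frames - 1) * hop_len + frame_len
    signal = np.pad(signal, (0, padded_len - len(signal)))
    return np.stack([signal[i * hop_len: i * hop_len + frame_len] for i in range(n_frames)])

# Illustrative usage: 1-second segments with 50 % overlap at a 16 kHz sampling rate.
sr = 16000
segments = frame_audio(np.random.randn(int(3.4 * sr)), frame_len=sr, hop_len=sr // 2)
```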
Optionally, the audio detection device may further add noise, reverberation, and the like to the acquired audio to be detected, so as to increase the robustness of the audio to be detected.
Optionally, the audio detection device may further perform speed adjustment, noise addition, pitch adjustment, volume adjustment and the like on the audio data to generate training data that is similar to but different from the audio data of the audio to be detected, and then use the augmented audio data as the training data set. By labeling and augmenting the original audio data, the scale of the training data set is enlarged and the generalization ability of the audio detection network during subsequent training is improved.
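As a rough sketch of how such augmentation could be performed (the patent names only the operation types, so the librosa calls and parameter values below are assumptions):

```python
import numpy as np
import librosa

def augment(y: np.ndarray, sr: int) -> dict:
    """Generate variants of a clip by speed, pitch, volume and additive-noise changes.

    The stretch rate, pitch step, gain and noise level are illustrative values only.
    """
    return {
        "speed":  librosa.effects.time_stretch(y, rate=1.1),          # speed adjustment
        "pitch":  librosa.effects.pitch_shift(y, sr=sr, n_steps=2),   # pitch adjustment
        "volume": 0.7 * y,                                            # volume adjustment
        "noise":  y + 0.005 * np.random.randn(len(y)),                # noise addition
    }
```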
Specifically, before the audio to be detected is sent to the feature analysis model, the audio detection device also extracts the audio features of the audio to be detected by using the feature extraction model. The audio features may be one or more of the logarithmic mel spectrum, mel cepstral coefficients, filter bank features, perceptual linear prediction and channel energy normalization features.
The mel cepstral coefficients are based on the auditory characteristics of the human ear: the mel cepstral frequency bands are divided equally on the mel scale, and the logarithmic relation between the mel-frequency scale value and the actual frequency is more consistent with the auditory characteristics of the human ear, so the speech signal can be better represented. The logarithmic mel spectrum is obtained by taking the logarithm of the mel spectrum.
The filter bank features correspond to the mel cepstral coefficients with the last step, the discrete cosine transform, removed, and therefore retain more of the original speech information than the mel cepstral coefficients.
Perceptual linear prediction is a feature parameter based on an auditory model. The parameter is equivalent to a set of linear prediction coefficients, i.e. the coefficients of an all-pole model prediction polynomial. Perceptual linear prediction applies an auditory model of the human ear in the spectral analysis: the input speech signal is processed by the auditory model instead of the time-domain signal used by conventional linear prediction, which is beneficial to the extraction of noise-robust speech features.
Specifically, before the audio detection device acquires the acoustic features belonging to the positive sample in the audio to be detected by using the feature analysis model, the audio signal of the audio to be detected is subjected to frequency domain and/or cepstrum domain conversion by using the feature extraction model so as to obtain the audio features.
In the frequency-domain conversion, the one-dimensional sound signal of the audio to be detected is converted, through the Fourier transform, wavelet transform and the like, into a representation in terms of frequency components. In the cepstral-domain conversion, an inverse Fourier transform is applied to the logarithm of the short-time amplitude spectrum of the signal; the cepstrum can be used to analyze periodic structures in a complex spectrogram and to separate and extract the periodic components of densely frequency-modulated signals.
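For illustration, a minimal sketch of the frequency-domain and cepstral-domain conversions using librosa; the toolkit choice and the frame and mel parameters below are assumptions, since the patent does not name them.

```python
import numpy as np
import librosa

def extract_features(y: np.ndarray, sr: int = 16000):
    """Convert a 1-D audio segment into frequency-domain and cepstral-domain features."""
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400, hop_length=160, n_mels=64)
    log_mel = librosa.power_to_db(mel)                  # logarithmic mel spectrum (frequency domain)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)  # mel cepstral coefficients (cepstral domain)
    return log_mel, mfcc
```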
Step 12: acquiring, by using the feature analysis model, the acoustic features in the audio to be detected that belong to a positive sample.
Specifically, the feature analysis model may use a GMM (Gaussian Mixture Model) to cluster the obtained acoustic features and screen out the desired data. Referring to fig. 4, fig. 4 is a schematic flow chart of a second embodiment of the audio detection method provided in the present application.
Step 41: obtaining, by the feature analysis model, the parameters of the probability density function by maximum likelihood estimation based on preset positive samples.
Specifically, maximum likelihood estimation finds the parameter values that are most likely to have produced the known sample distribution, and then uses that distribution.
Specifically, the probability density function of the GMM is

p(x) = \sum_{k=1}^{K} \alpha_k \, \phi(x \mid \mu_k, \sigma_k^2),

where K is the total number of Gaussian probability models, \phi(x \mid \mu_k, \sigma_k^2) is the k-th Gaussian probability model, \mu_k and \sigma_k^2 respectively represent its mean and variance, and \alpha_k is the prior probability of \phi(x \mid \mu_k, \sigma_k^2), with \alpha_k \ge 0 and \sum_{k=1}^{K} \alpha_k = 1.
Step 42: obtaining the probability density function by training based on the parameters.
The GMM is trained on the preset positive samples by maximum likelihood estimation, i.e. the parameters \theta = \{\alpha_k, \mu_k, \sigma_k^2\}_{k=1}^{K} are chosen to maximize the likelihood \prod_{i=1}^{N} p(x_i \mid \theta) of the positive-sample features x_1, \dots, x_N; the probability density function of the positive samples is then obtained by modeling with these parameters.
Step 43: obtaining the acoustic features belonging to the positive sample in the audio to be detected by using the probability density function.
Specifically, the feature analysis model in the audio detection device substitutes the acoustic features of the audio to be detected into the probability density function to cluster and screen the acoustic features, thereby obtaining the acoustic features belonging to the positive sample. Here, a positive sample indicates that the event to be judged has occurred.
Specifically, the audio detection device filters the acoustic features of the audio to be detected in advance through the trained acoustic feature model in the feature analysis model, and divides the sample data to be detected by a manually set probability threshold: if the probability that a sample is predicted to be a positive sample is lower than the manually set threshold, it is regarded as negative-sample data; otherwise, it is judged to be a positive sample and sent into the audio detection model of the next stage for further analysis.
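A minimal sketch of this screening stage, assuming scikit-learn's GaussianMixture as a stand-in for the feature analysis model; the component count and the manually set threshold are illustrative, not values from the disclosure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def screen_features(positive_features: np.ndarray, test_features: np.ndarray,
                    n_components: int = 8, percentile: float = 5.0) -> np.ndarray:
    """Keep only the test frames whose likelihood under the positive-sample GMM is high enough.

    GaussianMixture fits the preset positive samples by maximum likelihood (EM);
    frames of the audio under test whose log-likelihood falls below the manually
    chosen threshold are discarded as negative-sample data.
    """
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag", random_state=0)
    gmm.fit(positive_features)                                         # (n_frames, n_dims)
    threshold = np.percentile(gmm.score_samples(positive_features), percentile)
    scores = gmm.score_samples(test_features)                          # per-frame log-likelihood
    return test_features[scores >= threshold]
```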
Step 13: performing classification detection on the acoustic features by using the audio detection model, and outputting the detection type of the audio to be detected based on the classification detection result.
Specifically, the network structure of the audio detection model includes a grouped convolution layer and a residual layer.
In grouped convolution, the feature maps of the input layer are divided into groups and each group is convolved with its own convolution kernels, which reduces the computation of the convolution. The residual layer helps to alleviate the problems of vanishing and exploding gradients when the network becomes very deep, so that the accuracy of the model is improved while the number of convolution layers keeps increasing.
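A sketch in PyTorch of one possible residual block built from grouped convolutions; the patent does not disclose the exact architecture, so the channel count, group count and kernel size below are assumptions.

```python
import torch
import torch.nn as nn

class GroupedResidualBlock(nn.Module):
    """Residual block whose two convolutions are grouped to reduce the multiply count."""

    def __init__(self, channels: int = 64, groups: int = 4):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=groups)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=groups)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # skip connection mitigates vanishing/exploding gradients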
Specifically, the classification detection result includes a probability value detected as a positive sample; if the probability value is higher than a first preset threshold value, outputting that the detection type of the audio to be detected is a positive sample.
Specifically, the classification detection result further includes the number of times of detection detected as a positive sample; if the detection times and the probability value are higher than a second preset threshold value and a first preset threshold value respectively, outputting the detection type of the audio to be detected as a positive sample.
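The two-threshold decision can be sketched as follows; both preset threshold values are illustrative, since the patent leaves them unspecified.

```python
def decide(frame_probs, prob_threshold: float = 0.5, count_threshold: int = 3) -> str:
    """Combine per-frame positive-sample probabilities into a clip-level decision.

    A frame counts as one positive detection when its probability exceeds the first
    preset threshold; the clip is output as a positive sample only if the number of
    such detections also exceeds the second preset threshold.
    """
    detections = [p for p in frame_probs if p > prob_threshold]
    return "positive" if len(detections) > count_threshold else "negative"

# Example: decide([0.2, 0.8, 0.9, 0.7, 0.95]) -> "positive" (4 detections above 0.5)
```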
The audio detection device performs preliminary screening on the acoustic features of the audio to be detected with the feature analysis module, eliminates the acoustic features belonging to the negative sample, and does not send them into the audio detection model. Only the acoustic features belonging to the positive sample are sent into the audio detection model for re-classification, which reduces the overall time consumption of audio event detection and improves the efficiency of audio detection.
Optionally, after the audio detection device obtains the class to which the acoustic features of the audio to be detected belong, it may further send the result for the positive class, that is, the corresponding audio data or the detection result that the event has occurred, to the user's client (such as a mobile phone APP), so as to inform the user that the event has occurred and allow the user to take follow-up actions in time.
In an embodiment of the present application, audio data of infant crying may be collected. The audio detection device divides the audio data into audio segments of equal length, performs frequency-domain transformation on the audio signal of each audio segment, and converts the one-dimensional audio signal into a frequency-domain signal, so as to improve the representation capability of the signal. The audio detection device then extracts suitable audio features from the frequency-domain signal and sends them to the feature analysis module for a first judgment of whether the infant is crying. The audio features judged to contain an infant-crying event are sent into the neural network in the audio detection model for audio detection again, so that the final detection result is obtained.
Unlike the prior art, the audio detection method provided by the application is applied to an audio detection device. The audio detection device acquires audio to be detected; acquires, by using a feature analysis model, the acoustic features in the audio to be detected that belong to a positive sample; and performs classification detection on the acoustic features by using an audio detection model, outputting the detection type of the audio to be detected based on the classification detection result. In this way, compared with a conventional audio detection method, the feature analysis model in the audio detection device pre-screens the acoustic features of the audio to be detected, removes the data most likely to be negative samples so that they are not sent into the audio detection model for classification, and sends only the data belonging to positive samples into the audio detection model for further prediction. This saves the computation time of the audio detection model, improves computational efficiency, improves the overall efficiency of audio event prediction, and reduces the waste of time and cost in the detection process.
The method of the above embodiment may be implemented by an audio detection device, and is described below with reference to fig. 5, where fig. 5 is a schematic structural diagram of a first embodiment of an audio detection device provided in the present application.
As shown in fig. 5, the audio detection apparatus 50 in the embodiment of the present application includes an acquisition module 51, a feature analysis module 52, and an audio detection module 53.
The acquiring module 51 is configured to acquire audio to be detected.
The feature analysis module 52 is configured to obtain acoustic features belonging to a positive sample in the audio to be detected by using the feature analysis model.
The audio detection module 53 is configured to perform classification detection on the acoustic features by using an audio detection model, and output a detection type of the audio to be detected based on a classification detection result.
The method of the above embodiment may be implemented by an audio detection device, and referring to fig. 6, fig. 6 is a schematic structural diagram of a second embodiment of the audio detection device provided in the present application, where the audio detection device 60 includes a memory 61 and a processor 62, the memory 61 is used for storing program data, and the processor 62 is used for executing the program data to implement the following method:
acquiring audio to be detected; acquiring acoustic characteristics belonging to a positive sample in the audio to be detected by using a characteristic analysis model; and carrying out classification detection on the acoustic features by using an audio detection model, and outputting the detection type of the audio to be detected based on the classification detection result.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of a computer readable storage medium 70 provided in the present application, where the computer readable storage medium 70 stores program data 71, and the program data 71, when executed by a processor, is configured to implement the following method:
acquiring audio to be detected; acquiring acoustic characteristics belonging to a positive sample in the audio to be detected by using a characteristic analysis model; and carrying out classification detection on the acoustic features by using an audio detection model, and outputting the detection type of the audio to be detected based on the classification detection result.
Embodiments of the present application, when implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The foregoing description is only of embodiments of the present application and is not intended to limit the scope of the patent; all equivalent structures or equivalent process changes made using the description of the present application, or direct or indirect applications in other related technical fields, are likewise included within the scope of patent protection of the present application.

Claims (10)

1. An audio detection method, characterized in that the audio detection method comprises:
acquiring audio to be detected;
acquiring acoustic characteristics belonging to a positive sample in the audio to be detected by using a characteristic analysis model;
and carrying out classification detection on the acoustic features by using an audio detection model, and outputting the detection type of the audio to be detected based on a classification detection result.
2. The audio detection method of claim 1, wherein,
the classification detection result comprises a probability value detected as a positive sample;
and if the probability value is higher than a first preset threshold value, outputting the detection type of the audio to be detected as a positive sample.
3. The audio detection method of claim 2, wherein,
the classification detection result also comprises detection times of positive samples;
and if the detection times and the probability value are higher than a second preset threshold value and the first preset threshold value respectively, outputting that the detection type of the audio to be detected is a positive sample.
4. The audio detection method according to claim 1, wherein
Before the acoustic features belonging to the positive sample in the audio to be detected are acquired by using the feature analysis model, the method further comprises:
and extracting the audio characteristics of the audio to be detected by using a characteristic extraction model.
5. The audio detection method of claim 4, wherein,
the extracting the audio features of the audio to be detected by using the feature extraction model comprises:
and performing frequency domain and/or cepstrum domain conversion on the audio signal of the audio to be detected by using a feature extraction model to obtain audio features.
6. The audio detection method of claim 4, wherein,
the audio features may be one or more of logarithmic mel-spectrum, mel-cepstral coefficients, filter bank features, perceptual linear prediction, channel energy normalization features.
7. The audio detection method of claim 1, wherein,
the obtaining the acoustic features belonging to the positive sample in the audio to be detected by using the feature analysis model comprises the following steps:
obtaining, by the feature analysis model, parameters of a probability density function by using a maximum likelihood estimation method based on a preset positive sample;
training to obtain the probability density function based on the parameters;
and obtaining the acoustic characteristics belonging to the positive sample in the audio to be detected by using the probability density function.
8. The audio detection method of claim 1, wherein,
the network structure of the audio detection model comprises a grouped convolution layer and a residual layer.
9. An audio detection device, comprising a memory and a processor coupled to the memory;
wherein the memory is for storing program data and the processor is for executing the program data to implement the audio detection method according to any one of claims 1 to 8.
10. A computer storage medium for storing program data which, when executed by a computer, is adapted to carry out the audio detection method according to any one of claims 1 to 8.
CN202211734814.9A 2022-12-30 2022-12-30 Audio detection method, audio detection device and computer storage medium Pending CN116092517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211734814.9A CN116092517A (en) 2022-12-30 2022-12-30 Audio detection method, audio detection device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211734814.9A CN116092517A (en) 2022-12-30 2022-12-30 Audio detection method, audio detection device and computer storage medium

Publications (1)

Publication Number Publication Date
CN116092517A true CN116092517A (en) 2023-05-09

Family

ID=86200401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211734814.9A Pending CN116092517A (en) 2022-12-30 2022-12-30 Audio detection method, audio detection device and computer storage medium

Country Status (1)

Country Link
CN (1) CN116092517A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination