CN115565525A - Audio anomaly detection method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN115565525A (application No. CN202211552884.2A)
- Authority
- CN
- China
- Prior art keywords
- audio
- tensor
- detection model
- initial
- random variable
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/01—Assessment or evaluation of speech recognition systems
        - G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/063—Training
        - G10L15/08—Speech classification or search
          - G10L15/16—Speech classification or search using artificial neural networks
      - G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
        - G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
          - G10L19/26—Pre-filtering or post-filtering
Landscapes
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Signal Processing (AREA)
- Debugging And Monitoring (AREA)
Abstract
The embodiments of the invention provide an audio anomaly detection method and device, an electronic device, and a storage medium, and relate to the field of data processing. The audio anomaly detection method provided by the application comprises: constructing an initial detection model; processing initial card-punching (daily check-in) audio data to generate an audio feature tensor; inputting the audio feature tensor into the initial detection model and outputting a first random variable and a second random variable; training the initial detection model according to an optimization function to obtain a corrected detection model; inputting the first random variable and the second random variable into the corrected detection model to generate a reconstruction tensor; performing an anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score; and, if the anomaly score is greater than or equal to an anomaly threshold, determining that the initial card-punching audio data is abnormal. The embodiments jointly encode temporal and spatial data, can be used to monitor the daily state of personnel, the running state of machines, and the like, give timely early warnings, and help enterprises, institutions, and similar organizations manage more effectively.
Description
Technical Field
The invention relates to the technical field of data processing, and in particular to an audio anomaly detection method and device, an electronic device, and a storage medium.
Background
Existing audio anomaly detection tasks mainly detect suspicious events such as vehicle collisions, shouting, or gunshots, in order to improve the reliability of security systems or to monitor the state of equipment. Compared with images and text, building an audio experimental environment is more demanding and labelling audio is more costly, so the abnormal state of people is rarely detected directly from audio.
Existing research mainly focuses on emotion recognition from single audio clips. The audio data sets are recorded by professional actors guided through emotion prompts, scene recall, environmental changes, and the like, and are annotated by experts. Such data sets suffer from two main problems: the authenticity of the emotion cannot be guaranteed, and there is variability from individual to individual. In addition, manually labelling audio data requires a great deal of time and labor, and how to find abnormal audio in large amounts of unlabelled audio data has not yet been studied.
Disclosure of Invention
In order to solve the foregoing technical problem, embodiments of the present application provide an audio anomaly detection method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides an audio anomaly detection method, where the method includes:
constructing an initial detection model based on a variational network and a generative network;
generating an audio feature tensor based on initial card-punching audio data;
inputting the audio feature tensor into the initial detection model, and outputting a first random variable and a second random variable through the initial detection model;
training the initial detection model according to an optimization function to obtain a corrected detection model;
inputting the first random variable and the second random variable into the corrected detection model, and generating a reconstruction tensor corresponding to the audio feature tensor;
performing an anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor;
and, if the anomaly score is greater than or equal to an anomaly threshold, determining that the initial card-punching audio data is abnormal.
In one embodiment, the step of generating an audio feature tensor based on the initial card-punching audio data includes:
obtaining N₁ pieces of initial card-punching audio data;
preprocessing each piece of initial card-punching audio data to obtain N₁ pieces of modified card-punching audio data;
converting each piece of modified card-punching audio data into N₂ corresponding pieces of feature data, and splicing the N₂ pieces of feature data into a feature vector;
and splicing the N₁ feature vectors into the audio feature tensor.
In one embodiment, the step of preprocessing each piece of initial card-punching audio data includes:
removing the background noise of each piece of initial card-punching audio data to obtain noise-reduced card-punching audio data;
and resampling the noise-reduced card-punching audio data at a preset frequency.
In one embodiment, the initial detection model comprises a preset convolution layer, a preset deconvolution layer, a gated recurrent unit (GRU) layer, a linear transformation layer, and a fully connected layer;
the variational network consists of the preset convolution layer, the preset deconvolution layer, and the GRU layer;
the generative network consists of the preset deconvolution layer, the GRU layer, the linear transformation layer, and the fully connected layer.
In one embodiment, the step of training the initial detection model according to an optimization function includes:
the optimization function is:
where L denotes the training loss, E denotes the mathematical expectation taken for the audio feature tensor, p_θ denotes the probability of the audio feature tensor under the generative network, q_ϕ denotes the posterior distribution that the variational network infers from the audio feature tensor, KL denotes the KL divergence, the remaining term is a constant, θ denotes the layer parameters of the generative network, and ϕ denotes the layer parameters of the variational network;
adjusting θ and ϕ by stochastic gradient variational Bayes (SGVB) estimation and reparameterization, and recalculating the training loss L with the adjusted θ and ϕ; when L is smaller than a loss threshold, storing the adjusted θ and ϕ.
The step of generating a reconstruction tensor corresponding to the audio feature tensor includes:
mapping the first random variable through the linear transformation layer to obtain a mapping result;
inputting the second random variable into the preset deconvolution layer to obtain a deconvolution result;
connecting the mapping result and the deconvolution result to obtain a connection result;
and decoding the connection result through the fully connected layer to obtain the reconstruction tensor.
In an embodiment, the step of performing anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor includes:
sampling the reconstruction tensor to obtain L reconstruction samples;
carrying out Monte Carlo integration on the L reconstruction samples to obtain reconstruction probability;
and taking the negative of the reconstruction probability as the anomaly score corresponding to the audio feature tensor.
In a second aspect, an embodiment of the present application provides an audio anomaly detection apparatus, including:
the construction module is used for constructing an initial detection model based on the variational network and the generative network;
the first generation module is used for generating an audio feature tensor based on the initial card-punching audio data;
the input module is used for inputting the audio feature tensor into the initial detection model and outputting a first random variable and a second random variable through the initial detection model;
the training module is used for training the initial detection model according to an optimization function to obtain a corrected detection model;
the second generation module is used for inputting the first random variable and the second random variable into the corrected detection model and generating a reconstruction tensor corresponding to the audio feature tensor;
the calculation module is used for performing an anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor;
and the determination module is used for determining that the initial card-punching audio data is abnormal if the anomaly score is greater than or equal to an anomaly threshold.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory is used to store a computer program, and the computer program, when run on the processor, performs the audio anomaly detection method provided in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program runs on a processor, the computer program performs the audio anomaly detection method provided in the first aspect.
In the audio anomaly detection method provided by the application, an initial detection model is constructed with a variational autoencoder; initial card-punching audio data is processed to generate an audio feature tensor; the audio feature tensor is input into the initial detection model, which outputs a first random variable and a second random variable; the initial detection model is trained according to an optimization function to obtain a corrected detection model; the first random variable and the second random variable are input into the corrected detection model to generate a reconstruction tensor corresponding to the audio feature tensor; an anomaly evaluation calculation is performed on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor; and, if the anomaly score is greater than or equal to an anomaly threshold, the initial card-punching audio data is determined to be abnormal. The embodiments of the application jointly encode temporal and spatial data and, for the first time, perform anomaly detection on consecutive card-punching audio of the same subject. They can be used to monitor the daily state of personnel, the running state of machines, and the like, give timely early warnings, and help enterprises, institutions, and similar organizations manage more effectively.
Drawings
In order to more clearly explain the technical solutions of the present application, the drawings needed to be used in the embodiments are briefly introduced below, and it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope of protection of the present application. Like components are numbered similarly in the various figures.
Fig. 1 is a schematic flow chart illustrating an audio anomaly detection method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an initial detection model provided by an embodiment of the present application;
FIG. 3 is a diagram illustrating a one-dimensional feature vector provided by an embodiment of the present application;
FIG. 4 is a diagram illustrating the seven-day card punching audio feature tensor provided by the embodiment of the application;
FIG. 5 shows another schematic diagram of a time series provided by an embodiment of the present application;
fig. 6 shows a schematic structural diagram of an audio anomaly detection device provided in an embodiment of the present application.
Reference numerals: 210-variational network, 220-generative network;
510-fundamental-frequency feature anomaly in the time series, 520-silent-segment-percentage feature anomaly in the time series, 530-multiple-feature anomaly in the time series;
600-audio anomaly detection apparatus, 610-construction module, 620-first generation module, 630-input module, 640-training module, 650-second generation module, 660-calculation module, 670-determination module.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Hereinafter, the terms "including", "having", and their derivatives, which may be used in various embodiments of the present application, are intended only to indicate specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as excluding the existence of, or the possibility of adding, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of this application belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments.
Example 1
The embodiment of the disclosure provides an audio anomaly detection method.
Specifically, referring to fig. 1, the audio anomaly detection method includes:
Step S110, constructing an initial detection model based on the variational network 210 and the generative network 220;
in one embodiment, referring to fig. 2, the initial detection model includes a preset convolution layer Conv1D, a preset deconvolution layer deConv1D, a gated recurrent unit layer GRU, a linear transformation layer linear, and a fully connected layer dense. The variational network consists of the preset convolution layer Conv1D, the preset deconvolution layer deConv1D, and the GRU layer; the generative network consists of the preset deconvolution layer deConv1D, the GRU layer, the linear transformation layer linear, and the fully connected layer dense. In fig. 2 the variational network is labelled 210 and the generative network 220; for convenience of description, the layer names are written in English in all subsequent formulas.
Step S120, generating an audio feature tensor based on the initial card-punching audio data;
in one embodiment, the step of generating an audio feature tensor based on the initial card-punching audio data includes: obtaining N₁ pieces of initial card-punching audio data;
in one embodiment, daily card-punching audio is collected by a card-punching machine (an attendance check-in terminal) and used as the initial card-punching audio data. Two questions are preset in the machine, with 15 s of answer time reserved after each; after the machine asks a question, the person checking in answers it and the machine records the answer, giving 30 s of card-punching audio per person per day. In one embodiment, the initial card-punching audio data may be collected for one consecutive week, in which case N₁ is 7.
Preprocessing each piece of initial card-punching audio data to obtain N₁ pieces of modified card-punching audio data;
in one embodiment, the step of preprocessing each piece of initial card-punching audio data includes: removing the background noise of each piece to obtain noise-reduced card-punching audio data; and resampling the noise-reduced card-punching audio data at a preset frequency.
In one embodiment, audio denoising removes the audio noise floor with a filter, and audio down-sampling fixes the sampling rate at 16 kHz to simplify subsequent computation.
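The two preprocessing steps can be sketched in plain Python as follows. The text only states that a filter removes the noise floor and that audio is resampled to 16 kHz; the amplitude noise gate, its hypothetical `floor_ratio` parameter, and the linear-interpolation resampler are illustrative stand-ins, not the embodiment's actual filter. A production pipeline would normally use a proper denoising filter and an anti-aliased resampler.

```python
import math

TARGET_SR = 16_000  # sampling rate fixed by the embodiment (16 kHz)


def noise_gate(samples, floor_ratio=0.02):
    """Zero out samples quieter than floor_ratio * peak amplitude.

    Hypothetical stand-in for the unspecified noise-floor filter.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    threshold = floor_ratio * peak
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def resample_linear(samples, src_sr, dst_sr=TARGET_SR):
    """Resample by linear interpolation (no anti-aliasing low-pass filter)."""
    if src_sr == dst_sr or not samples:
        return list(samples)
    n_out = int(round(len(samples) * dst_sr / src_sr))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_sr / dst_sr  # fractional index into the source signal
        j = int(math.floor(pos))
        frac = pos - j
        a = samples[min(j, last)]
        b = samples[min(j + 1, last)]
        out.append(a + frac * (b - a))
    return out


def preprocess(samples, src_sr):
    """Noise-gate the clip, then resample it to TARGET_SR."""
    return resample_linear(noise_gate(samples), src_sr)


# 1 s of (silent) audio recorded at 48 kHz becomes 16 000 samples.
clip = preprocess([0.0] * 48_000, 48_000)
print(len(clip))  # 16000
```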
Converting each piece of modified card-punching audio data into N₂ corresponding pieces of feature data and splicing the N₂ pieces of feature data into a feature vector; splicing the N₁ feature vectors into the audio feature tensor.
In one embodiment, as shown in fig. 3, fig. 3 shows a schematic diagram of a one-dimensional feature vector provided in the embodiment of the present application. The N₂ pieces of feature data comprise 1 fundamental frequency, 1 silent-segment percentage, 1 average energy value, 40 Mel spectrum values, 13 Mel cepstral coefficients, and 12 first-order Mel cepstral coefficients; splicing them yields a one-dimensional feature vector of length 68 (1 + 1 + 1 + 40 + 13 + 12), i.e., N₂ = 68.
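The splicing of the 68 features can be sketched as follows, together with one possible way to compute the silent-segment percentage and average energy from frame energies. The frame length, hop, and silence threshold are assumptions, not values from the text, and the fundamental-frequency, Mel-spectrum, and Mel-cepstral values would in practice come from a standard audio feature toolkit; they are zero-filled placeholders here.

```python
FEATURE_LAYOUT = [  # (name, length), in the order listed for fig. 3
    ("f0", 1),           # fundamental frequency
    ("silence_pct", 1),  # silent-segment percentage
    ("avg_energy", 1),   # average energy
    ("mel", 40),         # Mel spectrum
    ("mfcc", 13),        # Mel cepstral coefficients
    ("mfcc_d1", 12),     # first-order Mel cepstral coefficients
]
N2 = sum(length for _, length in FEATURE_LAYOUT)  # 1+1+1+40+13+12 = 68


def frame_energies(samples, frame_len=400, hop=160):
    """Mean-square energy per frame (assumed 25 ms frames, 10 ms hop at 16 kHz)."""
    energies = []
    last_start = max(len(samples) - frame_len, 0)
    for start in range(0, last_start + 1, hop):
        frame = samples[start:start + frame_len]
        if frame:
            energies.append(sum(s * s for s in frame) / len(frame))
    return energies


def silence_and_energy(samples, silence_threshold=1e-4):
    """Return (fraction of frames below silence_threshold, mean frame energy)."""
    energies = frame_energies(samples)
    if not energies:
        return 0.0, 0.0
    silent = sum(1 for e in energies if e < silence_threshold)
    return silent / len(energies), sum(energies) / len(energies)


def splice_features(features):
    """Concatenate the named features into one vector of length N2."""
    vector = []
    for name, length in FEATURE_LAYOUT:
        value = features[name]
        part = list(value) if isinstance(value, (list, tuple)) else [value]
        if len(part) != length:
            raise ValueError(f"{name}: expected {length} values, got {len(part)}")
        vector.extend(float(v) for v in part)
    return vector


silence_pct, avg_energy = silence_and_energy([0.0] * 16_000)
vec = splice_features({
    "f0": 0.0, "silence_pct": silence_pct, "avg_energy": avg_energy,
    "mel": [0.0] * 40, "mfcc": [0.0] * 13, "mfcc_d1": [0.0] * 12,
})
print(len(vec))  # 68
```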
The daily audio feature vectors of the same person are spliced to obtain the audio feature tensor X ∈ ℝ^(M×t), where M is the feature dimension and t is the length of time (in this embodiment, M = N₂ = 68 and t = N₁ = 7). For ease of description, these symbols are used in the rest of the text. In an embodiment, as shown in fig. 4, fig. 4 shows a schematic diagram of a seven-day card-punching audio feature tensor obtained by splicing the one-dimensional feature vectors of the same person over seven consecutive days.
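Stacking the daily vectors into the feature tensor amounts to building a features-by-days matrix. This sketch assumes rows index features and columns index days (consistent with a feature dimension times time layout); the helper name is hypothetical and the values are dummies.

```python
def build_feature_tensor(daily_vectors):
    """Stack N1 daily feature vectors (each of length M) into an M x t matrix.

    Row i holds feature i across days; column d is the vector for day d,
    so t equals the number of days N1.
    """
    if not daily_vectors:
        raise ValueError("need at least one day of data")
    m = len(daily_vectors[0])
    if any(len(v) != m for v in daily_vectors):
        raise ValueError("all daily vectors must have the same length")
    return [[day[i] for day in daily_vectors] for i in range(m)]


week = [[float(day)] * 68 for day in range(7)]  # 7 days x 68 features (dummy values)
X = build_feature_tensor(week)
print(len(X), len(X[0]))  # 68 7
```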
Step S130, inputting the audio feature tensor into the initial detection model, and outputting a first random variable z₁ and a second random variable z₂ through the initial detection model;
In this embodiment, a variational autoencoder is used to construct and train the initial detection model. The variational network can be expressed as q_ϕ(z₁, z₂ | X), where X is the input audio tensor, ϕ denotes the layer parameters of the variational network, and z₁ and z₂ are random latent variables: z₁ learns the embedding of dependency information between features, and z₂ learns the temporal embedding of the features. z₂ is obtained by passing the input X through the preset convolution layer; see formula 1:
where k denotes the time length of z₂ after the convolution operation, which is determined by the convolution kernel size and the sliding-window stride. z₂ is then restored to its original size through the deconvolution layer in preparation for subsequent decoding.
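The length bookkeeping around formula 1 can be illustrated with an unpadded 1-D convolution along the time axis and its transposed counterpart. This sketch assumes the features act as input channels (as in common Conv1D layers), no padding, and hypothetical weights: with kernel size K and stride S the time length becomes k = ⌊(t − K)/S⌋ + 1, and the transposed convolution maps k back to (k − 1)·S + K, which recovers t whenever t − K is divisible by S.

```python
def conv1d(x, w, stride=1):
    """Valid (unpadded) convolution along time.

    x: C_in x T input, w: C_out x C_in x K weights -> C_out x k output,
    with k = (T - K) // stride + 1.
    """
    c_in, t = len(x), len(x[0])
    k_size = len(w[0][0])
    if t < k_size:
        raise ValueError("input shorter than kernel")
    t_out = (t - k_size) // stride + 1
    return [
        [sum(w[o][c][j] * x[c][p * stride + j] for c in range(c_in) for j in range(k_size))
         for p in range(t_out)]
        for o in range(len(w))
    ]


def conv_transpose1d(y, w, stride=1):
    """Transposed convolution with the same weight layout as conv1d.

    y: C_out x k input -> C_in x ((k - 1) * stride + K) output; each input
    value is scattered back over the window it was computed from.
    """
    c_out, t_out = len(y), len(y[0])
    c_in, k_size = len(w[0]), len(w[0][0])
    t = (t_out - 1) * stride + k_size
    out = [[0.0] * t for _ in range(c_in)]
    for o in range(c_out):
        for p in range(t_out):
            for c in range(c_in):
                for j in range(k_size):
                    out[c][p * stride + j] += w[o][c][j] * y[o][p]
    return out


x = [[1, 2, 3, 4, 5, 6, 7]]         # 1 channel, t = 7 days
w = [[[1, 1, 1]]]                   # 1 output channel, kernel size K = 3
z2 = conv1d(x, w)                   # k = 7 - 3 + 1 = 5
restored = conv_transpose1d(z2, w)  # back to length 7
print(z2)                # [[6, 9, 12, 15, 18]]
print(len(restored[0]))  # 7
```

Note that the transposed convolution restores the shape, not the original values; in the model its weights are learned separately.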
Step S140, training the initial detection model according to an optimization function to obtain a corrected detection model;
In an embodiment of the present application, the model is trained by means of the evidence lower bound (ELBO) according to the optimization function, and the step of training the initial detection model according to the optimization function includes:
the optimization function is given by formula 2:
expanding formula 2 gives:
where L denotes the training loss, E denotes the mathematical expectation taken for the audio feature tensor, p_θ denotes the probability of the audio feature tensor under the generative network, q_ϕ denotes the posterior distribution that the variational network infers from the audio feature tensor, KL denotes the KL divergence, the remaining term is a constant, θ denotes the layer parameters of the generative network, and ϕ denotes the layer parameters of the variational network;
θ and ϕ are adjusted by stochastic gradient variational Bayes (SGVB) estimation and reparameterization, and the training loss L is recalculated with the adjusted θ and ϕ; when L is smaller than a loss threshold, the adjusted θ and ϕ are stored.
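Formula 2 itself is not reproduced in this text. For orientation, the standard variational-autoencoder decomposition that these definitions describe is shown below, written for the two latent variables and assuming a prior p(z₁, z₂); it is the textbook form, not necessarily the exact expression of the embodiment. Because log p_θ(X) does not depend on ϕ, it may be the constant term referred to above, and minimizing L = −ELBO both raises the reconstruction likelihood and pulls q_ϕ toward the true posterior.

```latex
\begin{aligned}
\log p_\theta(X) &= \mathrm{ELBO}(\theta,\phi;X)
  + D_{\mathrm{KL}}\big(q_\phi(z_1,z_2\mid X)\,\|\,p_\theta(z_1,z_2\mid X)\big),\\
\mathrm{ELBO}(\theta,\phi;X) &= \mathbb{E}_{q_\phi(z_1,z_2\mid X)}\big[\log p_\theta(X\mid z_1,z_2)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z_1,z_2\mid X)\,\|\,p(z_1,z_2)\big),\\
\mathcal{L}(\theta,\phi) &= -\,\mathrm{ELBO}(\theta,\phi;X).
\end{aligned}
```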
Here the KL divergence describes the difference between two probability distributions; it acts as a regularization term that keeps the variational distribution somewhat random. The optimization objective is for the variational distribution to match the true posterior as closely as possible and for the reconstruction of X from z₁ and z₂ to be as probable as possible, so stochastic gradient variational Bayes (SGVB) estimation and reparameterization can be used to optimize the parameters θ and ϕ so that L is minimized.
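Under common but here assumed choices (a standard normal prior N(0, I), a diagonal Gaussian q_ϕ, and a diagonal Gaussian reconstruction likelihood), both terms of the loss have closed forms. The sketch below computes a single-sample estimate of L; the function names are illustrative and the model outputs (means and log-variances) are supplied by hand.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def gaussian_log_likelihood(x, mean, logvar):
    """log N(x; mean, diag(exp(logvar))), summed over elements."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi) + lv + (xi - m) ** 2 / math.exp(lv))
        for xi, m, lv in zip(x, mean, logvar)
    )


def negative_elbo(x, recon_mean, recon_logvar, mu, logvar):
    """Single-sample estimate of L = -E_q[log p(x|z)] + KL(q || prior)."""
    reconstruction = gaussian_log_likelihood(x, recon_mean, recon_logvar)
    return -reconstruction + kl_to_standard_normal(mu, logvar)


print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0
print(kl_to_standard_normal([1.0], [0.0]))            # 0.5
```

The KL term is zero only when q_ϕ equals the prior, which is why it acts as the regularizer described above.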
Specifically, several points can first be sampled from the variational distribution q_ϕ(z₁, z₂ | X) and the expectation estimated from them by Monte Carlo integration. However, sampling is not a differentiable operation, so gradients cannot be back-propagated through it to optimize ϕ. A reparameterization technique is therefore introduced: the randomness is moved into an auxiliary noise variable of known form, which makes the sampling differentiable.
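A minimal sketch of the reparameterization trick, assuming a diagonal Gaussian variational distribution (the usual choice, not stated explicitly in the text). In a real model the gradients would come from an autodiff framework; here the point is that z becomes a deterministic function of the mean, the log-variance, and an external noise sample.

```python
import math
import random


def reparameterize(mu, logvar, eps):
    """z = mu + sigma * eps with sigma = exp(logvar / 2).

    All randomness lives in eps ~ N(0, I), so z is a deterministic,
    differentiable function of (mu, logvar): dz/dmu = 1, dz/dsigma = eps.
    """
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, logvar, eps)]


def monte_carlo_expectation(f, mu, logvar, n_samples=1000, seed=0):
    """Estimate E_{z ~ N(mu, diag(exp(logvar)))}[f(z)] by averaging samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in mu]
        total += f(reparameterize(mu, logvar, eps))
    return total / n_samples


print(reparameterize([1.0], [0.0], [2.0]))  # [3.0]
estimate = monte_carlo_expectation(lambda z: z[0], [5.0], [0.0], n_samples=2000)
```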
Step S150, inputting the first random variable and the second random variable into the corrected detection model, and generating a reconstruction tensor corresponding to the audio feature tensor;
the step of generating a reconstruction tensor corresponding to the audio feature tensor includes:
mapping the first random variable through the linear transformation layer to obtain a mapping result; inputting the second random variable into the preset deconvolution layer to obtain a deconvolution result; concatenating the mapping result and the deconvolution result to obtain a connection result; and decoding the connection result through the fully connected layer to obtain the reconstruction tensor.
As shown in fig. 2, the input audio tensor X passes through the preset convolution layer to obtain the second random variable z₂. Because the feature data may contain abnormal data, the autoencoder easily overfits during training. Therefore, to prevent the model from overfitting to anomalous data, moving-average processing is applied to z₂ to eliminate abnormal feature points. After the abnormal feature points are eliminated, z₂ is fed into the GRU layer and encoded to obtain the first random variable z₁. z₁ learns the embedding of dependency information between features, and its length is consistent with the input; see formula 3.
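The smoothing and GRU encoding steps can be sketched in plain Python as follows. The trailing moving average, its window size, the GRU gate convention (the common reset/update formulation used by mainstream frameworks), and all weights are assumptions for illustration; formula 3 is not reproduced here.

```python
import math


def moving_average(series, window=3):
    """Trailing moving average; output has the same length as the input."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out, acc = [], 0.0
    for i, value in enumerate(series):
        acc += value
        if i >= window:
            acc -= series[i - window]
        out.append(acc / min(i + 1, window))
    return out


def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def _matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def gru_step(x, h, p):
    """One GRU step; p holds W_* (H x D), U_* (H x H) and b_* (H) weights."""
    r = [_sigmoid(a + b + c) for a, b, c in zip(_matvec(p["W_r"], x), _matvec(p["U_r"], h), p["b_r"])]
    u = [_sigmoid(a + b + c) for a, b, c in zip(_matvec(p["W_u"], x), _matvec(p["U_u"], h), p["b_u"])]
    n = [math.tanh(a + ri * b + c)
         for a, ri, b, c in zip(_matvec(p["W_n"], x), r, _matvec(p["U_n"], h), p["b_n"])]
    return [(1.0 - ui) * ni + ui * hi for ui, ni, hi in zip(u, n, h)]


def encode_z1(z2, p, hidden_size, window=3):
    """Smooth each row of z2 (channels x time), then run a GRU over time.

    Returns one hidden vector per time step, so z1 keeps the input length.
    """
    smoothed = [moving_average(row, window) for row in z2]
    h = [0.0] * hidden_size
    z1 = []
    for x_t in zip(*smoothed):  # iterate over time steps
        h = gru_step(list(x_t), h, p)
        z1.append(h)
    return z1


zeros = {k: [[0.0]] for k in ("W_r", "U_r", "W_u", "U_u", "U_n")}
params = {**zeros, "W_n": [[1.0]], "b_r": [0.0], "b_u": [0.0], "b_n": [0.0]}
print(moving_average([0, 0, 9, 0, 0]))  # [0.0, 0.0, 3.0, 3.0, 3.0]
z1 = encode_z1([[1.0, 1.0]], params, hidden_size=1)
```

The spike of 9 in the usage example is damped to 3, which illustrates how smoothing limits the influence of isolated abnormal points before encoding.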
The generative network can be expressed as p_θ(X | z₁, z₂), where θ denotes the layer parameters of the generative network and the inputs are the first random variable z₁ and the second random variable z₂. z₁ is mapped through the linear transformation layer to obtain a mapping result; z₂ is input into the preset deconvolution layer to obtain a deconvolution result; the mapping result and the deconvolution result are joined by a concatenation (concat) function to obtain a connection result; and the connection result, i.e., the inter-feature dependency embedding together with the temporal embedding, is jointly decoded by the fully connected layer to generate the reconstruction tensor X̂ of the original audio, whose size is consistent with the original input; see formula 4.
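The decoding path (linear mapping of z₁, deconvolution of z₂, concatenation, fully connected decoding) can be sketched per time step as follows. The deconvolution is passed in as a callable so the wiring stays visible; the identity function, the weights, and the helper names in the usage example are hypothetical.

```python
def affine(weights, bias, vector):
    """weights (out x in) @ vector + bias."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]


def generate(z1_steps, z2, deconv, lin_w, lin_b, dense_w, dense_b):
    """Sketch of the generative network's forward pass.

    z1_steps: list of t hidden vectors (from the GRU).
    z2:       second random variable; deconv(z2) must return a C x t matrix.
    Returns the reconstruction as an M x t matrix, where M = len(dense_b).
    """
    restored = deconv(z2)                              # deconvolution result, C x t
    z2_steps = [list(col) for col in zip(*restored)]   # t x C
    if len(z2_steps) != len(z1_steps):
        raise ValueError("z1 and deconvolved z2 must have the same time length")
    recon_steps = []
    for h, d in zip(z1_steps, z2_steps):
        mapped = affine(lin_w, lin_b, h)                      # linear transformation of z1
        joined = mapped + d                                   # concatenation (concat)
        recon_steps.append(affine(dense_w, dense_b, joined))  # fully connected decoding
    return [list(row) for row in zip(*recon_steps)]    # back to M x t


X_hat = generate(
    z1_steps=[[1.0], [2.0]],
    z2=[[3.0, 4.0]],
    deconv=lambda z: z,  # identity stands in for the deconvolution layer
    lin_w=[[1.0]], lin_b=[0.0],
    dense_w=[[1.0, 0.0], [0.0, 1.0]], dense_b=[0.0, 0.0],
)
print(X_hat)  # [[1.0, 2.0], [3.0, 4.0]]
```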
step S160, carrying out anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor;
the step of performing anomaly evaluation calculation on the reconstruction tensor comprises:
sampling the reconstruction tensor to obtain L reconstruction samples; performing Monte Carlo integration on the L reconstruction samples to obtain the reconstruction probability; and taking the negative of the reconstruction probability to obtain the anomaly score corresponding to the audio feature tensor. Specifically, see formula 5:
where the anomaly score is the mathematical expectation of the anomaly value of the reconstruction tensor X̂; the average over l = 1, …, L denotes Monte Carlo integration over the L reconstruction samples, where the l-th latent sample is drawn from the variational distribution; and the l-th term represents the probability of the l-th reconstruction sample.
In anomaly detection, the reconstruction probability serves as the anomaly indicator. Assume the input X consists of observed data and missing data. Assuming the missing data follow the distribution of the observed data, the latent variables can be sampled from the variational distribution; given a sampled point, X is reconstructed while the observed values are kept, which yields values for the missing data such that X follows the normal pattern of the observed data, i.e., stays close to it. The reconstruction probability of the reconstructed data is then computed by Monte Carlo integration over L samples, and the anomaly score is the negative of the reconstruction probability, as given by formula 5 above.
Step S170, if the anomaly score is greater than or equal to an anomaly threshold, determining that the initial card-punching audio data is abnormal. An anomaly threshold is set, and when the calculated anomaly score reaches the threshold, a prompt indicates that the initial card-punching audio data is abnormal.
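Formula 5 is not reproduced here. A common form of this score, used in this sketch under the assumption of a Gaussian reconstruction likelihood with a fixed per-element variance, is the negative average log-likelihood of X over the L reconstructions; the threshold comparison of step S170 then follows directly. The function names, variance, and threshold values are illustrative.

```python
import math


def gaussian_log_prob(x, mean, var):
    """log N(x; mean, diag(var)), summed over elements."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def anomaly_score(x, recon_means, var):
    """Negative Monte Carlo estimate of the (log) reconstruction probability.

    recon_means holds L reconstructions of the flattened feature tensor x,
    one per latent sample; var is an assumed per-element variance.
    """
    if not recon_means:
        raise ValueError("need at least one reconstruction sample")
    log_probs = [gaussian_log_prob(x, r, var) for r in recon_means]
    return -sum(log_probs) / len(log_probs)


def is_abnormal(score, threshold):
    """Step S170: flag the audio when score >= threshold."""
    return score >= threshold


x = [0.0, 0.0]
near = anomaly_score(x, [[0.1, -0.1], [0.0, 0.0]], var=[1.0, 1.0])
far = anomaly_score(x, [[3.0, -3.0], [2.5, 2.5]], var=[1.0, 1.0])
print(near < far)  # True
```

In practice the threshold would be chosen from scores on known-normal data, for example a high percentile; the text does not specify how it is set.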
Referring to fig. 4 and 5, in an embodiment, 7 consecutive days of card-punching audio were collected from 10 volunteers. Fig. 4 shows the spatial sequence corresponding to the processed 7-day card-punching audio of an abnormal volunteer, and fig. 5 shows the volunteer's one-dimensional feature vectors over the corresponding time series. The 7 days of card-punching data are converted into the audio feature tensor, and anomaly monitoring is then performed on both the time sequence and the spatial sequence. The model can detect data that are clearly abnormal over time as well as anomalies between features within the same day's audio: the trends of the fundamental frequency on the first day (510 in fig. 5) and of the silent-segment percentage on the sixth day (520 in fig. 5) run opposite to the usual relationship between these features, and the data of the fourth day (530 in fig. 5) are clearly abnormal compared with the previous three days. After the fourth day's card punching, the corrected detection model gave a timely early warning. An interview revealed that, affected by poor sleep, the volunteer had developed a feeling of boredom and resistance toward card punching; after psychological counseling, the subsequent card-punching data returned to normal.
The audio anomaly detection method provided by this embodiment uses a variational autoencoder to jointly encode temporal and spatial data and, for the first time, performs anomaly detection on consecutive card punching audio of the same subject. It can be used to monitor the daily state of personnel, the running state of machines, and the like, and to issue timely early warnings that help enterprises, institutions, and other organizations manage more effectively.
Example 2
In addition, an embodiment of the present disclosure provides an audio anomaly detection apparatus.
Specifically, as shown in fig. 6, the audio anomaly detection apparatus 600 includes:
a construction module 610, configured to construct an initial detection model based on the variational network and the generation network;
a first generating module 620, configured to generate an audio feature tensor based on the initial card punching audio data;
an input module 630, configured to input the audio feature tensor into the initial detection model and output a first random variable and a second random variable through the initial detection model;
a training module 640, configured to train the initial detection model according to an optimization function to obtain a corrected detection model;
a second generating module 650, configured to input the first random variable and the second random variable into the corrected detection model and generate a reconstruction tensor corresponding to the audio feature tensor;
a calculating module 660, configured to perform anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor;
a determining module 670, configured to determine that the initial card punching audio data is abnormal if the anomaly score is greater than or equal to an anomaly threshold.
The audio anomaly detection apparatus 600 provided in this embodiment can implement the audio anomaly detection method provided in embodiment 1, and is not described herein again to avoid repetition.
The audio anomaly detection apparatus provided by this embodiment uses a variational autoencoder to jointly encode temporal and spatial data and, for the first time, performs anomaly detection on consecutive card punching audio of the same subject. It can be used to monitor the daily state of personnel, the running state of machines, and the like, and to issue timely early warnings that help enterprises, institutions, and other organizations manage more effectively.
Example 3
Furthermore, an embodiment of the present disclosure provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and the computer program executes the audio anomaly detection method provided in embodiment 1 when running on the processor.
The electronic device provided in this embodiment of the present invention can perform the steps performed by the audio anomaly detection apparatus in the above method embodiment; details are not repeated here.
The electronic device provided by this embodiment uses a variational autoencoder to jointly encode temporal and spatial data and, for the first time, performs anomaly detection on consecutive card punching audio of the same subject. It can be used to monitor the daily state of personnel, the running state of machines, and the like, and to issue timely early warnings that help enterprises, institutions, and other organizations manage more effectively.
Example 4
The present application also provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the audio anomaly detection method provided in embodiment 1.
In this embodiment, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The computer-readable storage medium provided in this embodiment may implement the audio anomaly detection method provided in embodiment 1, and is not described herein again to avoid repetition.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. A method of audio anomaly detection, the method comprising:
constructing an initial detection model based on a variational network and a generation network;
generating an audio feature tensor based on initial card punching audio data;
inputting the audio feature tensor into the initial detection model, and outputting a first random variable and a second random variable through the initial detection model;
training the initial detection model according to an optimization function to obtain a corrected detection model;
inputting the first random variable and the second random variable into the corrected detection model, and generating a reconstruction tensor corresponding to the audio feature tensor;
performing anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor;
and if the anomaly score is greater than or equal to an anomaly threshold, determining that the initial card punching audio data is abnormal.
2. The audio anomaly detection method according to claim 1, wherein the step of generating an audio feature tensor based on the initial card punching audio data comprises:
obtaining $N_1$ pieces of initial card punching audio data;
preprocessing each piece of initial card punching audio data to obtain $N_1$ pieces of modified card punching audio data;
converting each piece of modified card punching audio data into $N_2$ corresponding feature data, and splicing the $N_2$ feature data into a feature vector;
splicing the $N_1$ feature vectors into the audio feature tensor.
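Claim 2 does not name the $N_2$ features. The sketch below is one hypothetical choice: it uses the fundamental frequency and silent-segment percentage mentioned in the embodiment, plus mean RMS energy as an assumed third feature, so $N_2 = 3$. The crude autocorrelation pitch estimator, frame sizes, and the -40 dB silence threshold are all assumptions; each clip (one per day) becomes one row of an $N_1 \times N_2$ array.

```python
import numpy as np


def frame_signal(y, frame_len, hop):
    """Split a 1-D signal into overlapping frames of frame_len samples."""
    n = 1 + max(0, (len(y) - frame_len) // hop)
    return np.stack([y[i * hop: i * hop + frame_len] for i in range(n)])


def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Crude autocorrelation pitch estimate in Hz (0.0 if no usable peak)."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), min(int(sr / fmin), len(ac) - 1)
    if hi <= lo or ac[0] <= 0:
        return 0.0
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag


def clip_features(y, sr, frame_len=1024, hop=512, silence_db=-40.0):
    """Hypothetical per-clip features: [mean F0 (Hz), silent %, mean RMS]."""
    frames = frame_signal(np.asarray(y, dtype=float), frame_len, hop)
    rms = np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12
    db = 20.0 * np.log10(rms / rms.max())   # level relative to loudest frame
    voiced = db > silence_db
    silent_pct = 100.0 * (1.0 - voiced.mean())
    f0 = [estimate_f0(f, sr) for f in frames[voiced]]
    mean_f0 = float(np.mean(f0)) if f0 else 0.0
    return np.array([mean_f0, silent_pct, float(rms.mean())])


def build_feature_tensor(clips, sr):
    """Stack N1 per-clip feature vectors into an (N1, N2) array."""
    return np.stack([clip_features(y, sr) for y in clips])
```

For seven daily recordings, `build_feature_tensor(clips, 16000)` would return a `(7, 3)` array; a production system would use a more robust pitch tracker.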
3. The method of claim 2, wherein the step of preprocessing each of the initial card punching audio data comprises:
removing the background noise of each initial card punching audio data to obtain noise-reduced card punching audio data;
and sampling the noise-reduced card punching audio data at a preset frequency.
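The two preprocessing steps of claim 3 can be sketched as below. The claim names neither the denoising method nor the resampler, so both are swapped-in assumptions: denoising is plain magnitude spectral subtraction on non-overlapping frames, with the noise spectrum estimated from the first few frames (assumed noise-only), and resampling is linear interpolation without an anti-aliasing filter. The 16 kHz target rate is likewise hypothetical.

```python
import numpy as np


def spectral_subtract(y, frame_len=512, noise_frames=5, floor=0.0):
    """Crude spectral subtraction; assumes the leading frames are noise only."""
    y = np.asarray(y, dtype=float)
    n = len(y) // frame_len
    if n == 0:
        return y.copy()
    frames = y[: n * frame_len].reshape(n, frame_len)
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = mag[: min(noise_frames, n)].mean(axis=0)   # noise estimate
    clean_mag = np.maximum(mag - noise_mag, floor * mag)   # subtract, clip at floor
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    out = y.copy()
    out[: n * frame_len] = clean.reshape(-1)  # any trailing partial frame is left as-is
    return out


def resample_linear(y, sr_in, sr_out):
    """Resample by linear interpolation (no anti-aliasing filter)."""
    n_out = int(round(len(y) / sr_in * sr_out))
    t_in = np.arange(len(y)) / sr_in
    t_out = np.arange(n_out) / sr_out
    return np.interp(t_out, t_in, y)


def preprocess(y, sr_in, sr_out=16000):
    """Noise reduction followed by sampling at the preset frequency."""
    return resample_linear(spectral_subtract(y), sr_in, sr_out)
```

Spectral subtraction with a zero floor tends to leave "musical noise"; a small positive `floor`, overlapping windows, or a polyphase resampler would be the usual refinements.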
4. The audio anomaly detection method according to claim 1, wherein the initial detection model comprises: a preset convolution layer, a preset deconvolution layer, a gated recurrent layer, a linear transformation layer, and a fully connected layer;
the variational network consists of the preset convolution layer, the preset deconvolution layer, and the gated recurrent layer;
the generation network consists of the preset deconvolution layer, the gated recurrent layer, the linear transformation layer, and the fully connected layer.
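A forward pass of the variational network in claim 4 might look like the NumPy sketch below. The claim lists the layers but not their order, sizes, or activations, so the order convolution, deconvolution, then GRU, the ReLU, the softplus for the standard deviation, and all parameter names (`init_params`, `W_conv`, and so on) are assumptions; training code is omitted.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def softplus(a):
    # Numerically stable log(1 + exp(a)).
    return np.log1p(np.exp(-np.abs(a))) + np.maximum(a, 0.0)


def conv1d(x, w, b):
    """'Valid' 1-D convolution over time. x: (T, C_in), w: (K, C_in, C_out)."""
    k = w.shape[0]
    return np.stack([np.einsum("kc,kco->o", x[t:t + k], w)
                     for t in range(x.shape[0] - k + 1)]) + b


def conv_transpose1d(x, w, b):
    """Stride-1 transposed convolution: length T - K + 1 -> T."""
    k, _, c_out = w.shape
    out = np.zeros((x.shape[0] + k - 1, c_out))
    for t in range(x.shape[0]):
        out[t:t + k] += np.einsum("c,kco->ko", x[t], w)
    return out + b


def gru(x, p):
    """Single-layer GRU over the time axis: (T, C) -> (T, hidden)."""
    h = np.zeros(p["Uz"].shape[0])
    states = []
    for x_t in x:
        z = sigmoid(x_t @ p["Wz"] + h @ p["Uz"] + p["bz"])   # update gate
        r = sigmoid(x_t @ p["Wr"] + h @ p["Ur"] + p["br"])   # reset gate
        h_cand = np.tanh(x_t @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])
        h = (1.0 - z) * h + z * h_cand
        states.append(h)
    return np.stack(states)


def init_params(c_in, c_hid, hidden, latent, k=3, seed=0):
    """Hypothetical randomly initialised parameters."""
    rng = np.random.default_rng(seed)

    def randn(*shape):
        return 0.1 * rng.standard_normal(shape)

    p = {"W_conv": randn(k, c_in, c_hid), "b_conv": np.zeros(c_hid),
         "W_deconv": randn(k, c_hid, c_hid), "b_deconv": np.zeros(c_hid),
         "W_mu": randn(hidden, latent), "W_std": randn(hidden, latent)}
    for g in "zrh":
        p["W" + g] = randn(c_hid, hidden)
        p["U" + g] = randn(hidden, hidden)
        p["b" + g] = np.zeros(hidden)
    return p


def variational_network(x, p):
    """x: (T, N2) feature sequence -> per-step mean and std of q_phi(z | x)."""
    h = np.maximum(conv1d(x, p["W_conv"], p["b_conv"]), 0.0)  # preset convolution layer + ReLU
    h = conv_transpose1d(h, p["W_deconv"], p["b_deconv"])     # preset deconvolution layer
    h = gru(h, p)                                             # gated recurrent layer
    return h @ p["W_mu"], softplus(h @ p["W_std"]) + 1e-4
```

The deconvolution restores the time length lost by the valid convolution, so the GRU sees one step per input day.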
5. The audio anomaly detection method according to claim 4, wherein the step of training the initial detection model according to an optimization function comprises:
the optimization function is:
$$\mathcal{L}(\theta, \phi) = -\mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] + \beta \, D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p_\theta(z)\right)$$
wherein $\mathcal{L}$ denotes the training loss, $\mathbb{E}$ denotes the mathematical expectation for the audio feature tensor $x$, $p_\theta(x \mid z)$ denotes the probability assigned by the generation network to the audio feature tensor, $q_\phi(z \mid x)$ denotes the posterior probability of the variational network for the audio feature tensor, $D_{\mathrm{KL}}$ denotes the KL divergence, $\beta$ is a constant, $\theta$ is the layer parameter of the generation network, and $\phi$ is the layer parameter of the variational network;
adjusting $\theta$ and $\phi$ by stochastic gradient variational Bayes (SGVB) estimation and reparameterization, and performing the calculation according to the adjusted $\theta$ and $\phi$;
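The optimization function of claim 5 can be illustrated by evaluating a negative evidence lower bound with a reparameterized sample. This is a sketch under assumptions: the prior over $z$ is taken to be a standard normal, the generation network is assumed to output a diagonal Gaussian, `beta` plays the role of the constant, and `encode`/`decode` are hypothetical callables. NumPy only computes the loss value; the gradient step on $\theta$ and $\phi$ would in practice be done by an autodiff framework.

```python
import numpy as np


def kl_diag_gaussian_to_std_normal(mu, std):
    """Analytic KL( N(mu, diag(std^2)) || N(0, I) ), summed over dimensions."""
    var = np.asarray(std, dtype=float) ** 2
    return float(0.5 * np.sum(var + np.asarray(mu) ** 2 - 1.0 - np.log(var)))


def gaussian_log_likelihood(x, mean, std):
    """log N(x; mean, diag(std^2)), summed over all elements."""
    var = np.asarray(std, dtype=float) ** 2
    return float(np.sum(-0.5 * np.log(2.0 * np.pi * var) - (x - mean) ** 2 / (2.0 * var)))


def negative_elbo(x, encode, decode, beta=1.0, num_samples=1, rng=None):
    """-E_q[log p(x|z)] + beta * KL(q(z|x) || p(z)), estimated by sampling."""
    rng = np.random.default_rng(0) if rng is None else rng
    mu, std = encode(x)
    recon = 0.0
    for _ in range(num_samples):
        eps = rng.standard_normal(np.shape(mu))
        z = mu + std * eps  # reparameterisation: z is a differentiable function of mu, std
        x_mean, x_std = decode(z)
        recon += gaussian_log_likelihood(x, x_mean, x_std)
    recon /= num_samples
    return -recon + beta * kl_diag_gaussian_to_std_normal(mu, std)
```

Writing $z = \mu + \sigma\epsilon$ moves the randomness into $\epsilon$, which is what lets the SGVB estimator pass gradients through the sampling step.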
6. The method according to claim 5, wherein the step of generating the reconstruction tensor corresponding to the audio feature tensor comprises:
mapping the first random variable through the linear transformation layer to obtain a mapping result;
inputting the second random variable into the preset deconvolution layer to obtain a deconvolution result;
connecting the mapping result and the deconvolution result to obtain a connection result;
and decoding the connection result through the fully connected layer to obtain the reconstruction tensor.
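The four steps of claim 6 can be sketched as a NumPy forward pass. Assumptions: "connecting" is read as concatenation along the feature axis, the second random variable is assumed to be shorter in time by `K - 1` steps so the stride-1 deconvolution brings it back to length `T`, and all sizes and parameter names are hypothetical. The gated recurrent layer that claim 4 also assigns to the generation network is omitted because claim 6 does not place it.

```python
import numpy as np


def conv_transpose1d(x, w, b):
    """Stride-1 transposed convolution. x: (T_in, C_in), w: (K, C_in, C_out)."""
    k, _, c_out = w.shape
    out = np.zeros((x.shape[0] + k - 1, c_out))
    for t in range(x.shape[0]):
        # Each input step scatters a K-long window into the output.
        out[t:t + k] += np.einsum("c,kco->ko", x[t], w)
    return out + b


def init_generation_params(d1, d2, h1, h2, n_features, k=3, seed=0):
    """Hypothetical randomly initialised parameters."""
    rng = np.random.default_rng(seed)
    return {
        "W_lin": 0.1 * rng.standard_normal((d1, h1)), "b_lin": np.zeros(h1),
        "W_dec": 0.1 * rng.standard_normal((k, d2, h2)), "b_dec": np.zeros(h2),
        "W_fc": 0.1 * rng.standard_normal((h1 + h2, n_features)),
        "b_fc": np.zeros(n_features),
    }


def reconstruct(z1, z2, p):
    """z1: (T, d1), z2: (T - K + 1, d2) -> reconstruction tensor (T, n_features)."""
    mapped = z1 @ p["W_lin"] + p["b_lin"]                  # linear transformation layer
    deconv = conv_transpose1d(z2, p["W_dec"], p["b_dec"])  # preset deconvolution layer
    joined = np.concatenate([mapped, deconv], axis=-1)     # connect the two results
    return joined @ p["W_fc"] + p["b_fc"]                  # fully connected decoding
```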
7. The method according to claim 1, wherein the step of performing anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor comprises:
sampling the reconstruction tensor to obtain L reconstruction samples;
carrying out Monte Carlo integration on the L reconstruction samples to obtain reconstruction probability;
and taking the negative of the reconstruction probability to obtain the anomaly score.
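The scoring in claim 7 can be sketched with $L$ Monte Carlo samples. This assumes, as in common VAE-based detectors, that the reconstruction probability is the average of $\log p_\theta(x \mid z^{(l)})$ over latent samples $z^{(l)} \sim q_\phi(z \mid x)$ and that the generation network outputs a diagonal Gaussian; `encode`, `decode`, and `num_samples` (standing in for $L$) are hypothetical.

```python
import numpy as np


def gaussian_log_likelihood(x, mean, std):
    """log N(x; mean, diag(std^2)), summed over all elements of x."""
    var = np.asarray(std, dtype=float) ** 2
    return float(np.sum(-0.5 * np.log(2.0 * np.pi * var) - (x - mean) ** 2 / (2.0 * var)))


def anomaly_score(x, encode, decode, num_samples=32, rng=None):
    """Negative Monte Carlo estimate of the reconstruction probability."""
    rng = np.random.default_rng(0) if rng is None else rng
    z_mean, z_std = encode(x)
    log_probs = []
    for _ in range(num_samples):  # L reconstruction samples
        z = z_mean + z_std * rng.standard_normal(np.shape(z_mean))
        x_mean, x_std = decode(z)
        log_probs.append(gaussian_log_likelihood(x, x_mean, x_std))
    reconstruction_probability = float(np.mean(log_probs))  # Monte Carlo integration
    return -reconstruction_probability  # higher score = less likely = more anomalous
```

Inputs that the model reconstructs poorly get a low log-likelihood and therefore a high score, which is then compared with the anomaly threshold.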
8. An audio anomaly detection apparatus, the apparatus comprising:
the construction module is configured to construct an initial detection model based on a variational network and a generation network;
the first generation module is configured to generate an audio feature tensor based on initial card punching audio data;
the input module is configured to input the audio feature tensor into the initial detection model and output a first random variable and a second random variable through the initial detection model;
the training module is configured to train the initial detection model according to an optimization function to obtain a corrected detection model;
the second generation module is configured to input the first random variable and the second random variable into the corrected detection model and generate a reconstruction tensor corresponding to the audio feature tensor;
the calculation module is configured to perform anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor;
and the determining module is configured to determine that the initial card punching audio data is abnormal if the anomaly score is greater than or equal to an anomaly threshold.
9. An electronic device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, performs the audio anomaly detection method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when run on a processor, performs the audio anomaly detection method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211552884.2A CN115565525A (en) | 2022-12-06 | 2022-12-06 | Audio anomaly detection method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115565525A | 2023-01-03 |
Family
ID=84769976
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211552884.2A (Pending, published as CN115565525A) | Audio anomaly detection method and device, electronic equipment and storage medium | 2022-12-06 | 2022-12-06 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115565525A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112466290A (en) * | 2021-02-02 | 2021-03-09 | 鹏城实验室 | Abnormal sound detection model training method and device and computer storage medium |
US11010666B1 (en) * | 2017-10-24 | 2021-05-18 | Tunnel Technologies Inc. | Systems and methods for generation and use of tensor networks |
US11075933B1 (en) * | 2019-03-27 | 2021-07-27 | Ca, Inc. | Abnormal user behavior detection |
CN113255835A (en) * | 2021-06-28 | 2021-08-13 | 国能大渡河大数据服务有限公司 | Hydropower station pump equipment anomaly detection method |
CN114386521A (en) * | 2022-01-14 | 2022-04-22 | 湖南师范大学 | Method, system, device and storage medium for detecting abnormality of time-series data |
CN114400019A (en) * | 2021-12-31 | 2022-04-26 | 深圳市声扬科技有限公司 | Model generation method, abnormality detection device, and electronic apparatus |
Non-Patent Citations (1)
Title |
---|
Zhihan Li et al.: "Multivariate time series anomaly detection and interpretation using hierarchical inter-metric and temporal embedding", KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116994609A (en) * | 2023-09-28 | 2023-11-03 | 苏州芯合半导体材料有限公司 | Data analysis method and system applied to intelligent production line |
CN116994609B (en) * | 2023-09-28 | 2023-12-01 | 苏州芯合半导体材料有限公司 | Data analysis method and system applied to intelligent production line |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20230103 |