CN115565525A - Audio anomaly detection method and device, electronic equipment and storage medium - Google Patents

Publication number
CN115565525A
Authority
CN
China
Prior art keywords
audio
tensor
detection model
initial
random variable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211552884.2A
Other languages
Chinese (zh)
Inventor
张伟
郑子强
何得淮
何行知
姚佳
唐怀都
朱鑫海
路浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Provincial Prison Administration
West China Hospital of Sichuan University
Original Assignee
Sichuan Provincial Prison Administration
West China Hospital of Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Provincial Prison Administration, West China Hospital of Sichuan University filed Critical Sichuan Provincial Prison Administration
Priority to CN202211552884.2A priority Critical patent/CN115565525A/en
Publication of CN115565525A publication Critical patent/CN115565525A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/01 Assessment or evaluation of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26 Pre-filtering or post-filtering

Abstract

An embodiment of the invention provides an audio anomaly detection method and device, electronic equipment and a storage medium, and relates to the field of data processing. The audio anomaly detection method comprises: constructing an initial detection model; processing initial card punching audio data to generate an audio feature tensor; inputting the audio feature tensor into the initial detection model, which outputs a first random variable and a second random variable; training the initial detection model according to an optimization function to obtain a corrected detection model; inputting the first random variable and the second random variable into the corrected detection model to generate a reconstruction tensor; performing anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score; and, if the anomaly score is greater than or equal to an anomaly threshold, determining that the initial card punching audio data is abnormal. The embodiment jointly encodes temporal and spatial data, can be used to monitor the daily state of personnel, the running state of machines and the like, gives timely early warning, and helps enterprises, institutions and the like manage better.

Description

Audio anomaly detection method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of data processing, in particular to an audio anomaly detection method and device, electronic equipment and a storage medium.
Background
Existing audio anomaly detection tasks mainly detect suspicious events such as vehicle collisions, shouting or gunshots, and are used to improve the reliability of security systems or to monitor the state of equipment. Unlike images and text, audio requires a more demanding experimental environment and is more costly to label, so the abnormal state of a person is rarely detected directly from audio.
Existing research mainly focuses on emotion recognition from a single audio clip. The audio data sets are recorded by professional actors using emotion guidance, scene recall, environment changes and the like, and are annotated by experts. Such data sets suffer from two main problems: the authenticity of the emotion cannot be guaranteed, and individuals differ from one another. In addition, manually labeling audio data takes a great deal of time and labor, and how to find abnormal audio in large amounts of unlabeled audio data has not yet been studied.
Disclosure of Invention
In order to solve the foregoing technical problem, embodiments of the present application provide an audio anomaly detection method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides an audio anomaly detection method, where the method includes:
constructing an initial detection model based on a variational network and a generation network;
generating an audio feature tensor based on initial card punching audio data;
inputting the audio feature tensor into the initial detection model, and outputting a first random variable and a second random variable through the initial detection model;
training the initial detection model according to an optimization function to obtain a corrected detection model;
inputting the first random variable and the second random variable into the corrected detection model, and generating a reconstruction tensor corresponding to the audio feature tensor;
performing anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor;
and if the anomaly score is greater than or equal to an anomaly threshold, determining that the initial card punching audio data is abnormal.
In one embodiment, the step of generating an audio feature tensor based on the initial card punching audio data includes:
obtaining $N_1$ pieces of initial card punching audio data;
preprocessing each piece of initial card punching audio data to obtain $N_1$ pieces of modified card punching audio data;
converting each piece of modified card punching audio data into $N_2$ corresponding feature data, and splicing the $N_2$ feature data into a feature vector;
and splicing the $N_1$ feature vectors into the audio feature tensor.
In one embodiment, the step of preprocessing each piece of initial card punching audio data includes:
removing the background noise of each piece of initial card punching audio data to obtain noise-reduced card punching audio data;
and sampling the noise-reduced card punching audio data at a preset frequency.
In one embodiment, the initial detection model comprises: a preset convolution layer, a preset deconvolution layer, a gated recurrent layer, a linear transformation layer and a fully connected layer;
the variational network consists of the preset convolution layer, the preset deconvolution layer and the gated recurrent layer;
the generation network consists of the preset deconvolution layer, the gated recurrent layer, the linear transformation layer and the fully connected layer.
In one embodiment, the step of training the initial detection model according to an optimization function includes:
the optimization function is:
$$\mathcal{L}(\theta, \phi) = -\mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] + \beta\, D_{KL}\left(q_\phi(z \mid x) \,\|\, p_\theta(z)\right)$$
where $\mathcal{L}$ denotes the training loss, $\mathbb{E}$ denotes the mathematical expectation with respect to the audio feature tensor $x$, $p_\theta(x \mid z)$ denotes the posterior probability of the generation network for the audio feature tensor, $q_\phi(z \mid x)$ denotes the posterior probability of the variational network for the audio feature tensor, $D_{KL}$ denotes the KL divergence, $\beta$ is a constant, $\theta$ is a layer parameter of the generation network, and $\phi$ is a layer parameter of the variational network;
adjusting $\theta$ and $\phi$ by stochastic gradient variational Bayes (SGVB) estimation and reparameterization, and calculating $\mathcal{L}$ from the adjusted $\theta$ and $\phi$; when $\mathcal{L}$ is smaller than the loss threshold, storing the adjusted $\theta$ and $\phi$.
In one embodiment, the step of generating a reconstruction tensor corresponding to the audio feature tensor includes:
mapping the first random variable through the linear transformation layer to obtain a mapping result;
inputting the second random variable into the preset deconvolution layer to obtain a deconvolution result;
connecting the mapping result and the deconvolution result to obtain a connection result;
and decoding the connection result through the fully connected layer to obtain the reconstruction tensor.
In an embodiment, the step of performing anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor includes:
sampling the reconstruction tensor to obtain L reconstruction samples;
performing Monte Carlo integration over the L reconstruction samples to obtain a reconstruction probability;
and taking the negative of the reconstruction probability to obtain the anomaly score corresponding to the audio feature tensor.
In a second aspect, an embodiment of the present application provides an audio anomaly detection apparatus, including:
a construction module, configured to construct an initial detection model based on a variational network and a generation network;
a first generation module, configured to generate an audio feature tensor based on initial card punching audio data;
an input module, configured to input the audio feature tensor into the initial detection model and output a first random variable and a second random variable through the initial detection model;
a training module, configured to train the initial detection model according to an optimization function to obtain a corrected detection model;
a second generation module, configured to input the first random variable and the second random variable into the corrected detection model and generate a reconstruction tensor corresponding to the audio feature tensor;
a calculation module, configured to perform anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor;
and a determination module, configured to determine that the initial card punching audio data is abnormal if the anomaly score is greater than or equal to an anomaly threshold.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory is used to store a computer program, and the computer program, when run by the processor, performs the audio anomaly detection method provided in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program that, when run on a processor, performs the audio anomaly detection method provided in the first aspect.
In the audio anomaly detection method provided by the application, an initial detection model is constructed using a variational autoencoder; initial card punching audio data is processed to generate an audio feature tensor; the audio feature tensor is input into the initial detection model, which outputs a first random variable and a second random variable; the initial detection model is trained according to an optimization function to obtain a corrected detection model; the first random variable and the second random variable are input into the corrected detection model to generate a reconstruction tensor corresponding to the audio feature tensor; anomaly evaluation calculation is performed on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor; and if the anomaly score is greater than or equal to an anomaly threshold, the initial card punching audio data is determined to be abnormal. The embodiment of the application jointly encodes temporal and spatial data and, for the first time, detects anomalies in continuous card punching audio of the same target. It can be used to monitor the daily state of personnel, the running state of machines and the like, gives timely early warning, and helps enterprises, institutions and the like manage better.
Drawings
In order to more clearly explain the technical solutions of the present application, the drawings needed to be used in the embodiments are briefly introduced below, and it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope of protection of the present application. Like components are numbered similarly in the various figures.
Fig. 1 is a schematic flow chart illustrating an audio anomaly detection method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an initial detection model provided by an embodiment of the present application;
FIG. 3 is a diagram illustrating a one-dimensional feature vector provided by an embodiment of the present application;
FIG. 4 is a diagram illustrating the seven-day card punching audio feature tensor provided by the embodiment of the application;
FIG. 5 shows another schematic diagram of a time series provided by an embodiment of the present application;
fig. 6 shows a schematic structural diagram of an audio anomaly detection device provided in an embodiment of the present application.
Reference numerals: 210-variational network, 220-generation network;
510-fundamental frequency characteristic anomaly in time series, 520-silence segment percentage characteristic anomaly in time series, 530-multiple characteristic anomaly in time series;
600-audio anomaly detection means, 610-construction module, 620-first generation module, 630-input module, 640-training module, 650-second generation module, 660-calculation module, 670-determination module.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Hereinafter, the terms "including", "having", and their derivatives, which may be used in various embodiments of the present application, are intended to indicate only specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as excluding in advance the existence of, or the possibility of adding, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of this application belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments.
Example 1
The embodiment of the disclosure provides an audio anomaly detection method.
Specifically, referring to fig. 1, the audio anomaly detection method includes:
Step S110, constructing an initial detection model based on the variational network 210 and the generation network 220;
in one embodiment, referring to fig. 2, the initial detection model includes: a preset convolution layer Conv1D, a preset deconvolution layer deConv1D, a gated recurrent layer GRU, a linear transformation layer linear and a fully connected layer dense. The variational network 210 consists of the preset convolution layer Conv1D, the preset deconvolution layer deConv1D and the gated recurrent layer GRU; the generation network 220 consists of the preset deconvolution layer deConv1D, the gated recurrent layer GRU, the linear transformation layer linear and the fully connected layer dense. For convenience of description, the layers are referred to by these English names in all subsequent formulas.
Step S120, generating an audio feature tensor based on the initial card punching audio data;
in one embodiment, the step of generating an audio feature tensor based on the initial card punching audio data includes: obtaining $N_1$ pieces of initial card punching audio data;
in one embodiment, daily card punching audio is collected by a card punching machine as the initial card punching audio data. Two questions are set in the card punching machine in advance, and 15 s of answering time is reserved after each question; the person punching the card answers each question after the machine asks it, and the machine records the answers, yielding 30 s of card punching audio data per person per day. In one embodiment, the initial card punching audio data may be collected continuously for one week, in which case $N_1$ is 7.
Preprocessing each piece of initial card punching audio data to obtain $N_1$ pieces of modified card punching audio data;
in one embodiment, the step of preprocessing each piece of initial card punching audio data includes: removing the background noise of each piece of initial card punching audio data to obtain noise-reduced card punching audio data; and sampling the noise-reduced card punching audio data at a preset frequency.
In one embodiment, audio denoising removes the audio noise floor with a filter, and audio down-sampling fixes the audio sampling rate at 16 kHz to simplify subsequent computation.
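The two preprocessing operations can be illustrated with a minimal sketch. The patent does not specify the filter, so the amplitude gate below is only a stand-in for it, and the block-averaging decimator handles only integer rate ratios; the function names, the `gate` value and the 48 kHz source rate are assumptions made for illustration.

```python
def remove_noise_floor(samples, gate=0.01):
    """Zero out samples whose magnitude is below an amplitude gate.

    Assumption: a simple gate stands in for the unspecified denoising filter.
    """
    return [s if abs(s) >= gate else 0.0 for s in samples]


def downsample(samples, source_rate, target_rate=16000):
    """Reduce the sampling rate by averaging consecutive blocks of samples."""
    if source_rate % target_rate != 0:
        raise ValueError("this sketch only handles integer rate ratios")
    factor = source_rate // target_rate
    usable = len(samples) - len(samples) % factor  # drop the incomplete tail block
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]


# Hypothetical 48 kHz input: six samples become two samples at 16 kHz.
raw = [0.005, 0.5, -0.5, 0.002, 0.3, 0.3]
clean = remove_noise_floor(raw)
resampled = downsample(clean, source_rate=48000)
```

Averaging each block before decimation acts as a crude low-pass step; a production system would more likely use a proper resampling filter.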
Converting each piece of modified card punching audio data into $N_2$ corresponding feature data, and splicing the $N_2$ feature data into a feature vector; splicing the $N_1$ feature vectors into the audio feature tensor.
In one embodiment, fig. 3 shows a schematic diagram of a one-dimensional feature vector provided in the embodiment of the present application. The $N_2$ feature data comprise 1 fundamental frequency, 1 silent-section percentage, 1 average energy value, 40 Mel spectrum values, 13 Mel cepstrum coefficients and 12 first-order Mel cepstrum coefficients; the spliced feature vector is a one-dimensional feature vector of length 68, that is, $N_2$ equals 68 in this case.
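As an illustration of the splicing described above, the following sketch concatenates the six feature groups in the listed order and checks that the result has length 68. The feature extraction itself is not shown; the dictionary keys and the placeholder values are hypothetical.

```python
# Feature groups and their sizes, in the order listed in the description.
FEATURE_LAYOUT = [
    ("fundamental_frequency", 1),
    ("silent_section_percentage", 1),
    ("average_energy", 1),
    ("mel_spectrum", 40),
    ("mel_cepstrum", 13),
    ("first_order_mel_cepstrum", 12),
]


def build_feature_vector(features):
    """Splice the named feature groups into one flat feature vector."""
    vector = []
    for name, size in FEATURE_LAYOUT:
        values = list(features[name])
        if len(values) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
        vector.extend(values)
    return vector


# Placeholder values only; real values would come from audio analysis.
example = {name: [0.0] * size for name, size in FEATURE_LAYOUT}
example["fundamental_frequency"] = [180.0]
feature_vector = build_feature_vector(example)
```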
Splicing the audio feature vectors of the same person's daily card punching yields the audio feature tensor, denoted $x \in \mathbb{R}^{d \times t}$, where $d$ denotes the feature dimension and $t$ denotes the time length (in the seven-day embodiment, $d = 68$ and $t = 7$). For ease of description, these symbols are used in the rest of the description. In an embodiment, fig. 4 shows a schematic diagram of a seven-day card punching audio feature tensor obtained by splicing the one-dimensional feature vectors of the same person for seven consecutive days.
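The stacking of daily feature vectors into a tensor can be sketched as follows. The layout (one row per feature, one column per day) is an assumption chosen to match a feature-dimension-by-time-length arrangement, and the function name is hypothetical.

```python
def stack_feature_vectors(daily_vectors):
    """Arrange daily feature vectors as a nested list: row = feature, column = day."""
    length = len(daily_vectors[0])
    if any(len(vector) != length for vector in daily_vectors):
        raise ValueError("all daily feature vectors must have the same length")
    return [[day[i] for day in daily_vectors] for i in range(length)]


# Seven hypothetical days, each with a 68-value feature vector.
week = [[float(day)] * 68 for day in range(7)]
tensor = stack_feature_vectors(week)
```

With seven days of length-68 vectors, `tensor` has 68 rows and 7 columns.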
Step S130, inputting the audio feature tensor into the initial detection model, and outputting a first random variable $z_1$ and a second random variable $z_2$ through the initial detection model;
In the present embodiment, a variational autoencoder is used to construct and train the initial detection model. The variational network can be expressed as $q_\phi(z \mid x)$, where $x$ is the input audio feature tensor, $\phi$ denotes the layer parameters of the variational network, and $z = (z_1, z_2)$ is a random hidden variable: $z_1$ learns the embedding of the dependency information between features, and $z_2$ learns the temporal embedding between features. $z_2$ is obtained by passing the input $x$ through the preset convolution layer, see formula 1:
$$z_2 = \mathrm{Conv1D}(x) \tag{1}$$
where $k$ denotes the dimension of $z_2$ after the convolution operation, which is determined by the number of convolution kernels and the sliding-window step size. $z_2$ is then restored to the original size through the deconvolution layer in preparation for subsequent decoding.
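To make the role of the kernel count and the sliding-window step concrete, here is a minimal sketch of a one-dimensional convolution along the time axis. It is a plain, unpadded convolution written for illustration and is not the patent's layer; the function name and the toy input are assumptions. Each kernel yields one output row, and a window of width w moved with the given step over t time steps yields floor((t - w) / step) + 1 output positions.

```python
def conv1d(x, kernels, stride=1):
    """Slide each kernel along the time axis of x.

    x: nested list, one row per feature and one column per time step.
    kernels: list of nested lists with the same number of rows as x.
    Returns one output row per kernel.
    """
    time_length = len(x[0])
    output = []
    for kernel in kernels:
        width = len(kernel[0])
        row = []
        for start in range(0, time_length - width + 1, stride):
            row.append(sum(kernel[i][j] * x[i][start + j]
                           for i in range(len(x))
                           for j in range(width)))
        output.append(row)
    return output


# Two features over seven time steps, one all-ones kernel of width 3, step 2.
x = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]]
kernel = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
embedding = conv1d(x, [kernel], stride=2)
```

Here the output has one row (one kernel) and three positions, since (7 - 3) // 2 + 1 = 3.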
Step S140, training the initial detection model according to an optimization function to obtain a corrected detection model;
in an embodiment of the present application, the model is trained by means of the evidence lower bound (ELBO), and the step of training the initial detection model according to the optimization function includes:
see formula 2 for the optimization function:
$$\mathcal{L}(\theta, \phi) = -\mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] + \beta\, D_{KL}\left(q_\phi(z \mid x) \,\|\, p_\theta(z)\right) \tag{2}$$
expanding formula 2 gives
$$\mathcal{L}(\theta, \phi) = -\mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z) + \beta \log p_\theta(z) - \beta \log q_\phi(z \mid x)\right]$$
where $\mathcal{L}$ denotes the training loss, $\mathbb{E}$ denotes the mathematical expectation with respect to the audio feature tensor $x$, $p_\theta(x \mid z)$ denotes the posterior probability of the generation network for the audio feature tensor, $q_\phi(z \mid x)$ denotes the posterior probability of the variational network for the audio feature tensor, $D_{KL}$ denotes the KL divergence, $\beta$ is a constant, $\theta$ is a layer parameter of the generation network, and $\phi$ is a layer parameter of the variational network;
adjusting $\theta$ and $\phi$ by stochastic gradient variational Bayes (SGVB) estimation and reparameterization, and calculating $\mathcal{L}$ from the adjusted $\theta$ and $\phi$; when $\mathcal{L}$ is smaller than the loss threshold, storing the adjusted $\theta$ and $\phi$.
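A loss of this kind, a reconstruction term plus a weighted KL term, can be sketched for the common Gaussian case. The choices below are assumptions made for illustration and are not stated in the patent: a diagonal Gaussian variational distribution, a standard normal prior, a unit-variance Gaussian reconstruction likelihood, and the names `beta` and `should_store`.

```python
import math


def gaussian_kl(mu, log_var):
    """KL divergence from N(mu, exp(log_var)) to a standard normal prior."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def gaussian_nll(x, x_hat):
    """Negative log-likelihood of x under a unit-variance Gaussian centred at x_hat."""
    return 0.5 * sum((a - b) ** 2 + math.log(2.0 * math.pi) for a, b in zip(x, x_hat))


def training_loss(x, x_hat, mu, log_var, beta=1.0):
    """Reconstruction term plus the KL term weighted by the constant beta."""
    return gaussian_nll(x, x_hat) + beta * gaussian_kl(mu, log_var)


def should_store(loss, loss_threshold):
    """Parameters are stored once the loss falls below the loss threshold."""
    return loss < loss_threshold


# Perfect reconstruction with a variational distribution equal to the prior.
loss = training_loss(x=[1.0], x_hat=[1.0], mu=[0.0], log_var=[0.0])
```

In this toy case the KL term is zero and only the constant part of the Gaussian likelihood remains.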
The KL divergence describes the difference between two probability distributions; here $D_{KL}\left(q_\phi(z \mid x) \,\|\, p_\theta(z)\right)$ serves as a regularization term whose effect is to give the variational distribution a certain randomness. The optimization objective is for the variational distribution and the posterior distribution to be as close as possible, and for the probability of reconstructing the audio feature tensor $x$ from the random variables $z_1$ and $z_2$ to be as high as possible, so stochastic gradient variational Bayes (SGVB) estimation and reparameterization can be used to optimize the parameters $\theta$ and $\phi$ so that the training loss $\mathcal{L}$ is minimized.
Specifically, several points may first be sampled from $q_\phi(z \mid x)$, and $\mathcal{L}$ estimated by Monte Carlo integration over these points. However, the sampling operation is not differentiable, so the parameters cannot be optimized by gradient backpropagation. A reparameterization technique can therefore be introduced, which brings in a parameter of known form to make the sampling differentiable.
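The reparameterization idea can be sketched in a few lines: the random draw is moved into a noise term with a fixed, known distribution, so the sample becomes a deterministic function of the mean and variance. A Gaussian variational distribution parameterized by a mean and a log-variance is assumed here for illustration; the patent does not state the distribution family.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1).

    The randomness lives only in eps, so z depends on mu and log_var through
    ordinary arithmetic that a gradient can flow through.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


rng = random.Random(0)
# With zero variance the sample collapses onto the mean.
collapsed = reparameterize([1.0, -2.0], [float("-inf")] * 2, rng)
# With unit variance the samples scatter around the mean.
draws = [reparameterize([3.0], [0.0], rng)[0] for _ in range(2000)]
sample_mean = sum(draws) / len(draws)
```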
Step S150, inputting the first random variable and the second random variable into the corrected detection model, and generating a reconstruction tensor corresponding to the audio feature tensor;
the step of generating a reconstruction tensor corresponding to the audio feature tensor includes:
mapping the first random variable through the linear transformation layer to obtain a mapping result; inputting the second random variable into the preset deconvolution layer to obtain a deconvolution result; connecting the mapping result and the deconvolution result to obtain a connection result; and decoding the connection result through the fully connected layer to obtain the reconstruction tensor.
As shown in fig. 2, the input audio feature tensor $x$ passes through the preset convolution layer to obtain the second random variable $z_2$. Since the feature data may contain abnormal data, overfitting is likely to occur while training the autoencoder. Therefore, to prevent the model from overfitting to abnormal data, moving-average processing is applied to the second random variable $z_2$ to eliminate abnormal feature points. After the abnormal feature points are eliminated, the result is input into the gated recurrent layer GRU and encoded to obtain the first random variable $z_1$. The first random variable $z_1$ learns the embedding of the dependency information between features, and its length is consistent with the input; see formula 3:
$$z_1 = \mathrm{GRU}\left(\mathrm{MA}(z_2)\right) \tag{3}$$
where $\mathrm{MA}$ denotes the moving-average processing, and the dimension of $z_1$ is determined by the output-layer dimension of the gated recurrent layer GRU.
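The moving-average step can be illustrated with a short sketch. The patent does not give the window size or the edge handling, so the trailing window of 3 and the shortened windows at the start of the series are assumptions.

```python
def moving_average(series, window=3):
    """Trailing moving average that keeps the series length unchanged.

    Early positions average over the values seen so far.
    """
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# A single spike is spread out and damped by the averaging.
smoothed = moving_average([1.0, 1.0, 10.0, 1.0, 1.0], window=3)
```

The spike of 10.0 is reduced to 4.0, which limits how strongly an isolated abnormal point can influence the subsequent encoding.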
The generation network may be represented as $p_\theta(x \mid z)$, where $\theta$ denotes the layer parameters of the generation network and the input $z$ consists of the first random variable $z_1$ and the second random variable $z_2$. The first random variable $z_1$ is mapped through the linear transformation layer to obtain a mapping result; the second random variable $z_2$ is input into the preset deconvolution layer to obtain a deconvolution result; the mapping result and the deconvolution result are connected through a connection function (concat function) to obtain a connection result; and the connection result, namely the dependency information embedding and the temporal embedding between features, is jointly decoded through the fully connected layer to generate the reconstruction tensor $\hat{x}$ of the original audio, whose size is consistent with the original input; see formula 4:
$$\hat{x} = \mathrm{dense}\left(\mathrm{concat}\left(\mathrm{linear}(z_1),\ \mathrm{deConv1D}(z_2)\right)\right) \tag{4}$$
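The data flow of the generation network (map, restore, connect, decode) can be sketched with toy vectors. This is only a shape-level illustration: nearest-neighbour upsampling stands in for the deconvolution layer, the GRU is omitted, and all weights, sizes and function names are hypothetical.

```python
def linear(vector, weights, bias):
    """Affine map: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def upsample(vector, factor):
    """Stand-in for the deconvolution layer: repeat each value `factor` times."""
    return [v for v in vector for _ in range(factor)]


def generate(z1, z2, map_w, map_b, factor, dense_w, dense_b):
    mapped = linear(z1, map_w, map_b)           # mapping result
    restored = upsample(z2, factor)             # deconvolution result (stand-in)
    connected = mapped + restored               # connection (concat) result
    return linear(connected, dense_w, dense_b)  # fully connected decoding


reconstruction = generate(
    z1=[1.0, 2.0],
    z2=[3.0],
    map_w=[[1.0, 0.0], [0.0, 1.0]],
    map_b=[0.0, 0.0],
    factor=2,
    dense_w=[[1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0]],
    dense_b=[0.0, 0.5],
)
```

The connected vector is [1.0, 2.0, 3.0, 3.0], and the final layer maps it to two reconstructed values.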
Step S160, performing anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor;
the step of performing anomaly evaluation calculation on the reconstruction tensor includes:
sampling the reconstruction tensor to obtain $L$ reconstruction samples; performing Monte Carlo integration over the $L$ reconstruction samples to obtain the reconstruction probability; and taking the negative of the reconstruction probability to obtain the anomaly score corresponding to the audio feature tensor. Specifically, see formula 5:
$$S = -\frac{1}{L} \sum_{l=1}^{L} p_\theta\left(\hat{x} \mid z^{(l)}\right) \tag{5}$$
where $S$ is the anomaly score, interpreted as the mathematical expectation of the abnormal value of the reconstruction tensor $\hat{x}$; $\frac{1}{L} \sum_{l=1}^{L}$ represents Monte Carlo integration over the $L$ reconstruction samples, where $z^{(l)}$ is the $l$-th intermediate sample drawn from $q_\phi(z \mid x)$; and $p_\theta\left(\hat{x} \mid z^{(l)}\right)$ represents the probability of the $l$-th reconstruction sample.
In anomaly detection, the reconstruction probability is used as the anomaly index. Assume that the input is $x = (x_o, x_m)$, where $x_o$ is the observed data and $x_m$ is the missing data, and assume that $x_m$ obeys the distribution of the observed data $x_o$. Then $x_m$ can be sampled from the distribution $p(x_m \mid x_o)$: given $z$, the observed values are reconstructed to obtain the missing values $x_m'$, and $x_m'$ satisfies the normal pattern of the observed data $x_o$, that is, it is close to $x_o$. Let the reconstructed data be $\hat{x}$; the reconstruction probability can then be calculated by Monte Carlo integration over the $L$ samples, and the anomaly score is the negative of the reconstruction probability, as calculated by formula 5 above.
Step S170, if the anomaly score is greater than or equal to an anomaly threshold, determining that the initial card punching audio data is abnormal. An anomaly threshold
Figure M_221130152807088_088607001
is set; when the calculated anomaly score is greater than or equal to the threshold
Figure M_221130152807104_104221002
, a prompt is issued that the initial card punching audio data is abnormal.
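The decision rule of Step S170 can be sketched as a simple comparison. The function names, the daily scores, and the threshold value below are made up for illustration; the source does not state how the threshold is chosen.

```python
from typing import List, Sequence


def is_abnormal(score: float, threshold: float) -> bool:
    """Step S170 decision rule: abnormal when the score reaches the threshold."""
    return score >= threshold


def abnormal_days(scores: Sequence[float], threshold: float) -> List[int]:
    """Return the 1-based indices of the days whose score reaches the threshold."""
    return [day for day, s in enumerate(scores, start=1) if is_abnormal(s, threshold)]


# Illustrative scores for 7 consecutive days (not data from the source).
daily_scores = [2.1, 2.4, 2.0, 9.7, 2.3, 5.0, 2.2]
flagged = abnormal_days(daily_scores, threshold=5.0)
```

A score exactly equal to the threshold is treated as abnormal, matching the "greater than or equal to" wording of the step.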
Referring to Fig. 4 and Fig. 5, in one embodiment, card punching audio data were collected from 10 volunteers over 7 consecutive days. Fig. 4 shows the spatial sequence corresponding to the processing results of the 7-day continuous card punching audio of one abnormal volunteer, and Fig. 3 shows the one-dimensional feature vectors of that volunteer in the corresponding time sequence. The 7 days of card punching data are converted into an audio feature tensor, and anomaly monitoring is then performed over both the time sequence and the spatial sequence. The model can detect data that are clearly abnormal along the time sequence, such as the fourth-day data (530 in Fig. 5), which are clearly abnormal compared with the data of the previous three days. It can also detect anomalies among the features within the audio of a single day: the fundamental frequency on the first day (510 in Fig. 5) and the silent-segment percentage on the sixth day (520 in Fig. 5) follow trends opposite to the usual trend among the features. After the card punching on the fourth day, the corrected detection model issued a timely early warning. An interview with the volunteer revealed that, affected by sleep, the volunteer had felt bored and psychologically resistant during card punching; after psychological counseling, the subsequent card punching data returned to normal.
The audio anomaly detection method provided in this embodiment uses a variational autoencoder to jointly encode temporal and spatial data and, for the first time, performs anomaly detection on the continuous card punching audio of the same target. It can be used to monitor the daily state of personnel, the running state of machines, and the like, and gives timely early warnings that help enterprises, institutions, and similar organizations manage better.
Example 2
In addition, the embodiment of the disclosure provides an audio anomaly detection device.
Specifically, as shown in Fig. 6, the audio anomaly detection apparatus 600 includes:
a construction module 610, configured to construct an initial detection model based on a variational network and a generation network;
a first generating module 620, configured to generate an audio feature tensor based on initial card punching audio data;
an input module 630, configured to input the audio feature tensor into the initial detection model, and output a first random variable and a second random variable through the initial detection model;
a training module 640, configured to train the initial detection model according to an optimization function to obtain a corrected detection model;
a second generating module 650, configured to input the first random variable and the second random variable into the corrected detection model, and generate a reconstruction tensor corresponding to the audio feature tensor;
a calculating module 660, configured to perform anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor;
a determining module 670, configured to determine that the initial card punching audio data is abnormal if the anomaly score is greater than or equal to an anomaly threshold.
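The way the inference-time modules hand data to one another can be sketched as follows. This is a structural illustration only: every callable is a hypothetical stand-in supplied by the caller, the construction module 610 and training module 640 are assumed to have already produced a corrected model, and the toy stubs in the usage example have nothing to do with real audio processing.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Tensor = List[List[float]]  # days x features, a stand-in for a real tensor type


@dataclass
class AudioAnomalyDetector:
    featurize: Callable[[List[bytes]], Tensor]         # first generating module 620
    encode: Callable[[Tensor], Tuple[object, object]]  # input module 630
    decode: Callable[[object, object], Tensor]         # second generating module 650
    score: Callable[[Tensor, Tensor], float]           # calculating module 660
    threshold: float                                   # used by determining module 670

    def detect(self, recordings: List[bytes]) -> Tuple[float, bool]:
        x = self.featurize(recordings)     # audio feature tensor
        z1, z2 = self.encode(x)            # first and second random variables
        x_hat = self.decode(z1, z2)        # reconstruction tensor
        s = self.score(x, x_hat)           # anomaly score
        return s, s >= self.threshold      # abnormal when score >= threshold


# Toy stubs, for wiring only: features are byte lengths, the decoder returns ones,
# and the score is the summed absolute difference.
detector = AudioAnomalyDetector(
    featurize=lambda recs: [[float(len(r))] for r in recs],
    encode=lambda x: (x, x),
    decode=lambda z1, z2: [[1.0] for _ in z1],
    score=lambda x, x_hat: sum(abs(a[0] - b[0]) for a, b in zip(x, x_hat)),
    threshold=3.0,
)
result = detector.detect([b"a", b"ab", b"abcd"])
```

Keeping each module behind a plain callable mirrors the module-per-step structure of the apparatus and lets any one stage be swapped without touching the others.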
The audio anomaly detection apparatus 600 provided in this embodiment can implement the audio anomaly detection method provided in Embodiment 1; to avoid repetition, the details are not described again here.
The audio anomaly detection apparatus provided in this embodiment uses a variational autoencoder to jointly encode temporal and spatial data and, for the first time, performs anomaly detection on the continuous card punching audio of the same target. It can be used to monitor the daily state of personnel, the running state of machines, and the like, and gives timely early warnings that help enterprises, institutions, and similar organizations manage better.
Example 3
In addition, an embodiment of the present disclosure provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program that, when run on the processor, executes the audio anomaly detection method provided in Embodiment 1.
The electronic device provided in this embodiment can execute the steps executable by the audio anomaly detection apparatus in the above method embodiment; the details are not described again.
The electronic device provided in this embodiment uses a variational autoencoder to jointly encode temporal and spatial data and, for the first time, performs anomaly detection on the continuous card punching audio of the same target. It can be used to monitor the daily state of personnel, the running state of machines, and the like, and gives timely early warnings that help enterprises, institutions, and similar organizations manage better.
Example 4
The present application also provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the audio anomaly detection method provided in embodiment 1.
In this embodiment, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The computer-readable storage medium provided in this embodiment can implement the audio anomaly detection method provided in Embodiment 1; to avoid repetition, the details are not described again here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method of audio anomaly detection, the method comprising:
constructing an initial detection model based on a variational network and a generation network;
generating an audio feature tensor based on initial card punching audio data;
inputting the audio feature tensor into the initial detection model, and outputting a first random variable and a second random variable through the initial detection model;
training the initial detection model according to an optimization function to obtain a corrected detection model;
inputting the first random variable and the second random variable into the corrected detection model, and generating a reconstruction tensor corresponding to the audio feature tensor;
performing anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor;
and if the anomaly score is greater than or equal to an anomaly threshold, determining that the initial card punching audio data is abnormal.
2. The audio anomaly detection method according to claim 1, wherein the step of generating an audio feature tensor based on the initial card punching audio data comprises:
obtaining N₁ pieces of initial card punching audio data;
preprocessing each piece of initial card punching audio data to obtain N₁ pieces of modified card punching audio data;
converting each piece of modified card punching audio data into N₂ corresponding pieces of feature data, and splicing the N₂ pieces of feature data into a feature vector;
and splicing the N₁ feature vectors into the audio feature tensor.
3. The method of claim 2, wherein the step of preprocessing each of the initial card punching audio data comprises:
removing the background noise of each initial card punching audio data to obtain noise-reduced card punching audio data;
and sampling the noise-reduced card punching audio data at a preset frequency.
4. The audio anomaly detection method according to claim 1, characterized in that the initial detection model comprises: a preset convolution layer, a preset deconvolution layer, a gated recurrent layer, a linear transformation layer and a fully connected layer;
the variational network consists of the preset convolution layer, the preset deconvolution layer and the gated recurrent layer;
the generation network consists of the preset deconvolution layer, the gated recurrent layer, the linear transformation layer and the fully connected layer.
5. The audio anomaly detection method of claim 4, wherein said step of training said initial detection model according to an optimization function comprises:
the optimization function is:
Figure M_221130152759917_917203001
where
Figure M_221130152800232_232143001
denotes the training loss,
Figure M_221130152800263_263383002
denotes the mathematical expectation of the audio feature tensor,
Figure M_221130152800310_310258003
denotes the posterior probability of the generation network with respect to the audio feature tensor,
Figure M_221130152800341_341506004
denotes the posterior probability of the variational network with respect to the audio feature tensor,
Figure M_221130152800390_390299005
denotes the KL divergence,
Figure M_221130152800406_406448006
is a constant, θ is the layer parameter of the generation network, and ϕ is the layer parameter of the variational network;
adjusting θ and ϕ by stochastic gradient variational estimation and reparameterization, and calculating, according to the adjusted θ and ϕ,
Figure M_221130152800453_453315001
; and when
Figure M_221130152800484_484574001
is smaller than the loss threshold, storing the adjusted θ and ϕ.
6. The method according to claim 5, wherein the step of generating the reconstruction tensor corresponding to the audio feature tensor comprises:
mapping the first random variable through the linear transformation layer to obtain a mapping result;
inputting the second random variable into the preset deconvolution layer to obtain a deconvolution result;
connecting the mapping result and the deconvolution result to obtain a connection result;
and decoding the connection result through the fully connected layer to obtain the reconstruction tensor.
7. The method according to claim 1, wherein the step of performing anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor comprises:
sampling the reconstruction tensor to obtain L reconstruction samples;
carrying out Monte Carlo integration on the L reconstruction samples to obtain reconstruction probability;
and taking the negative of the reconstruction probability to obtain the anomaly score.
8. An audio anomaly detection apparatus, the apparatus comprising:
a construction module, configured to construct an initial detection model based on a variational network and a generation network;
a first generating module, configured to generate an audio feature tensor based on initial card punching audio data;
an input module, configured to input the audio feature tensor into the initial detection model, and output a first random variable and a second random variable through the initial detection model;
a training module, configured to train the initial detection model according to an optimization function to obtain a corrected detection model;
a second generating module, configured to input the first random variable and the second random variable into the corrected detection model, and generate a reconstruction tensor corresponding to the audio feature tensor;
a calculating module, configured to perform anomaly evaluation calculation on the reconstruction tensor to obtain an anomaly score corresponding to the audio feature tensor;
and a determining module, configured to determine that the initial card punching audio data is abnormal if the anomaly score is greater than or equal to an anomaly threshold.
9. An electronic device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, performs the audio anomaly detection method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when run on a processor, performs the audio anomaly detection method of any one of claims 1 to 7.
CN202211552884.2A 2022-12-06 2022-12-06 Audio anomaly detection method and device, electronic equipment and storage medium Pending CN115565525A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211552884.2A CN115565525A (en) 2022-12-06 2022-12-06 Audio anomaly detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115565525A (en) 2023-01-03

Family

ID=84769976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211552884.2A Pending CN115565525A (en) 2022-12-06 2022-12-06 Audio anomaly detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115565525A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112466290A (en) * 2021-02-02 2021-03-09 鹏城实验室 Abnormal sound detection model training method and device and computer storage medium
US11010666B1 (en) * 2017-10-24 2021-05-18 Tunnel Technologies Inc. Systems and methods for generation and use of tensor networks
US11075933B1 (en) * 2019-03-27 2021-07-27 Ca, Inc. Abnormal user behavior detection
CN113255835A (en) * 2021-06-28 2021-08-13 国能大渡河大数据服务有限公司 Hydropower station pump equipment anomaly detection method
CN114386521A (en) * 2022-01-14 2022-04-22 湖南师范大学 Method, system, device and storage medium for detecting abnormality of time-series data
CN114400019A (en) * 2021-12-31 2022-04-26 深圳市声扬科技有限公司 Model generation method, abnormality detection device, and electronic apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHIHAN LI et al.: "Multivariate time series anomaly detection and interpretation using hierarchical inter-metric and temporal embedding", KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994609A (en) * 2023-09-28 2023-11-03 苏州芯合半导体材料有限公司 Data analysis method and system applied to intelligent production line
CN116994609B (en) * 2023-09-28 2023-12-01 苏州芯合半导体材料有限公司 Data analysis method and system applied to intelligent production line

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230103