WO2013190551A1 - Method for classifying glass break sounds present in an audio signal - Google Patents

Method for classifying glass break sounds present in an audio signal

Info

Publication number
WO2013190551A1
WO2013190551A1 (application PCT/IL2013/050522)
Authority
WO
WIPO (PCT)
Prior art keywords
parameters
psd
sounds
audio signal
characterizing
Prior art date
Application number
PCT/IL2013/050522
Other languages
English (en)
Inventor
Per-Olof Gutman
Noroz Akhlagi
Shmuel Melman
Original Assignee
Securitas Direct Ab
Crow Electronic Engineering Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Securitas Direct Ab, Crow Electronic Engineering Ltd. filed Critical Securitas Direct Ab
Priority to EP13742753.0A priority Critical patent/EP2864969A1/fr
Publication of WO2013190551A1 publication Critical patent/WO2013190551A1/fr


Classifications

    • GPHYSICS
    • G08SIGNALLING
    • G08BSIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B13/00Burglar, theft or intruder alarms
    • G08B13/16Actuation by interference with mechanical vibrations in air or other fluid
    • G08B13/1654Actuation by interference with mechanical vibrations in air or other fluid using passive vibration detection systems
    • G08B13/1672Actuation by interference with mechanical vibrations in air or other fluid using passive vibration detection systems using sonic detecting means, e.g. a microphone operating in the audio frequency range
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • the present invention in some embodiments thereof, relates to audio signal analysis, and, more particularly, but not exclusively, to audio signal analysis for detection of glass breakage and, yet more particularly, but not exclusively, to a method and device enabling computationally efficient audio signal analysis for detection of glass breakage sounds.
  • the audio signal from movies or TV programs is segmented and classified into basic types such as speech, music, song, environmental sound, speech with music background, environmental sound with music background, silence, etc.
  • Simple audio features including the energy function, the average zero-crossing rate, the fundamental frequency, and the spectral peak tracks are extracted to ensure the feasibility of real-time processing.
  • a heuristic rule-based procedure is proposed to segment and classify audio signals, built upon morphological and statistical analysis of the time-varying functions of these audio features. Experimental results show that the proposed scheme achieves an accuracy rate of more than 90% in audio classification;
  • the first method employs a linear autoregressive (AR) model to extract linear features of the audio data, and then performs classification on these features, i.e., the AR coefficients.
  • AR coefficient estimations based on least squares and higher order statistics are considered in this study.
  • the second approach uses nonlinear predictors to model the audio data and then classifies the signals according to the prediction errors.
  • the real-life audio radar data set used here was collected by an AN/PPS-15 ground surveillance radar and consists of 13 different target classes, which include men marching, a man walking, airplanes, a man crawling, boats, etc. It is found that each classification method has some classes which are difficult to classify.
  • the AR feature extraction approach is most effective and has a correct classification rate of 88% for the training data and 67% for data not used for training;
  • Non-uniformity among sensor recordings may occasionally preclude using PSDs, or other delicate metrics, to classify sounds. Indeed, in such cases, traditional crude methods may be more robust to sensor, sensor location, room acoustics, and other variations.
  • a typical PSD condenses a signal into a limited number of real numbers, for example 128 real numbers, corresponding for example to 128 frequency bins up to the Nyquist frequency.
  • the PSD is computed in one go, using a whole record of a sound event, where a signal of 0.5 sec duration at 44kHz corresponds to a vector containing approximately 22,000 real numbers. Not all hardware/software supports storing such a vector, so the many samples of the sound event are processed serially, and only a few, for example 128 calculated parameters, are retained for further analysis.
  • the PSD is optionally computed sequentially from successive PSDs, each successive PSD computed for overlapping time segments. Such a method can save memory, but not always computational complexity.
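  • As an illustration of such sequential computation, the following is a minimal Matlab sketch (not part of Appendix 1; the segment length, overlap, and Hann window are assumptions chosen for illustration) that averages windowed periodograms over overlapping segments, so that only one segment and the running average need to be held in memory at any time:

      function psd_avg = sequential_psd(y, nfft, noverlap)
      % Average windowed periodograms over overlapping segments of y,
      % keeping only one segment (plus the running average) in memory.
      y    = y(:);                                       % work on a column vector
      step = nfft - noverlap;
      win  = 0.5*(1 - cos(2*pi*(0:nfft-1)'/(nfft-1)));   % Hann window, base Matlab only
      nseg = floor((length(y) - nfft)/step) + 1;
      psd_avg = zeros(nfft/2, 1);                        % nfft assumed even; bins up to Nyquist
      for i = 1:nseg
          s1  = (i-1)*step + 1;
          seg = y(s1:s1+nfft-1) .* win;                  % one overlapping segment at a time
          P   = abs(fft(seg)).^2 / (nfft*sum(win.^2));   % periodogram of the windowed segment (scaling for illustration)
          psd_avg = psd_avg + P(1:nfft/2)/nseg;          % running average
      end
      end

  • For example, sequential_psd(y, 256, 128) condenses the record into 128 frequency bins up to the Nyquist frequency, matching the example of 128 real numbers mentioned above.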
  • an alternative method is used: computing an autoregressive model (AR-model), for example as computed by the Matlab arx command, or the corresponding recursive Matlab command, rarx, which are optionally coded on a glass break detector processor.
  • AR-model autoregressive model
  • the number of identified AR-parameters is suggested to be on the order of 30, that is, a captured signal is condensed into approximately 30 numbers only, computed in a recursive manner, handling each sample when it arrives without a need to save the sample.
  • an impulse response of the AR-model is an estimate of what a sound would be if provoked by a single impulse, such as an audio response of a glass pane when broken by a sudden blow.
  • the AR-model also acts as a "prewhitening" filter of a captured sound, that is, when a captured sound is passed through an inverse of the AR-model, white noise is produced.
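  • To illustrate the prewhitening property, the following is a minimal Matlab sketch (the variable names a and y are assumptions, and the 30 AR parameters are assumed to have been identified already) that passes a captured sound through the inverse of the AR model and crudely checks that the residual is approximately white:

      % a : vector of 30 identified AR parameters [a1 ... a30] (assumed already available)
      % y : captured, digitized sound samples
      A = [1, a(:)'];                 % inverse-model polynomial: 1 + a1*q^-1 + ... + a30*q^-30
      e = filter(A, 1, y(:));         % residual of passing the sound through the inverse AR model
      % crude whiteness check: residual autocorrelation should be small at nonzero lags
      e0 = e - mean(e);
      r  = zeros(31, 1);
      for lag = 0:30
          r(lag+1) = sum(e0(1:end-lag) .* e0(1+lag:end)) / sum(e0.^2);
      end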
  • glass break sound signals are defined as ranges of AR parameters.
  • a sound signal which produces even one AR parameter outside the glass break AR range is considered a non-glass break event.
  • the PSD of a captured sound signal is optionally computed, even on a glass-break-detector processor, which is considered weaker than today's typical personal computers.
  • the PSD optionally computed from an AR model is equivalent to a PSD computed by a batch windowed FFT, and is optionally used to classify events into glass break events and other events, by comparing the PSD with glass break standards, or by computing a power spectrum in selected frequency ranges and comparing with glass break standards.
  • an iterative method is used for processing sound and classifying acoustic signatures.
  • the iterative method can be used to provide an advantage of on-the-fly computing, without saving too much of a stream of digitized audio samples in memory.
  • a relatively simple and inexpensive processing unit such as an ARM-7, or a Cortex M3, may be used.
  • a simple, inexpensive processing unit, especially such as those described above, which are already produced in large numbers for use in mobile phones, provides the benefit of lowering the cost of the security device while still providing smart processing and classification of sound.
  • a method for detecting sounds of breaking glass within an audio signal including receiving an audio signal, digitizing the audio signal, producing a digitized audio signal, producing a plurality of autoregressive (AR) model parameters based, at least in part, on the digitized audio signal, comparing the plurality of AR parameters to a characterizing set of AR parameters associated with sounds of breaking glass, and determining whether the audio signal includes sounds of breaking glass based on a result of the comparison.
  • AR autoregressive
  • the receiving, the digitizing, the producing, the comparing, and the determining are performed by an alarm device.
  • the producing a plurality of AR parameters is performed by an iterative process.
  • the characterizing set of AR parameters includes an upper limit value and a lower limit value for each one of the AR parameters, the upper and lower limits defining a range within which a single AR parameter is within the characterizing set.
  • the characterizing set of AR parameters includes a central value for each one of the AR parameters and a range of departure from the central value for each one of the AR parameters, the central value and the range of departure defining whether an AR parameter is within the characterizing set.
  • the determining includes determining that each one of the plurality of AR parameters is within the characterizing set. According to some embodiments of the invention, the determining includes determining that a threshold percentage of the plurality of AR parameters is within the characterizing set.
  • PSD Power Spectrum Density
  • the characterizing PSD values include an upper limit to PSD values and a lower limit to PSD values for at least some of a range of PSD values, the upper and lower limits defining a range within which PSD values are within the characterizing PSD values.
  • the characterizing PSD values include a central PSD value and a maximum allowed delta from the central PSD value for at least some of the range of PSD values, the central PSD value and the maximum allowed delta defining a range within which PSD values are within the characterizing PSD values.
  • the determining includes determining that all of the PSD values are within the characterizing PSD values. According to some embodiments of the invention, the determining includes determining that a threshold percentage of the PSD values are within the characterizing PSD values.
  • a method for preparing a characterizing set of AR parameters for use in detecting sounds of breaking glass within an audio signal including capturing an audio signal belonging to the specific class of sounds, digitizing the audio signal, producing a digitized audio signal, producing a plurality of AR parameters based, at least in part, on the digitized audio signal, storing the plurality of AR parameters as a characterizing set of AR parameters associated with sounds of breaking glass.
  • PSD Power Spectrum Density
  • the storing including storing the upper and lower limits associated with the specific class of sounds.
  • the capturing and the digitizing are performed at a first location, further including transmitting the digitized audio signal to a second location, and the producing the plurality of AR parameters is performed at the second location.
  • Implementation of the method and/or system of embodiments of the invention can involve performing or completing selected tasks manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of embodiments of the method and/or system of the invention, several selected tasks could be implemented by hardware, by software or by firmware or by a combination thereof using an operating system.
  • a data processor such as a computing platform for executing a plurality of instructions.
  • the data processor includes a volatile memory for storing instructions and/or data and/or a non-volatile storage, for example, a magnetic hard-disk and/or removable media, for storing instructions and/or data.
  • a network connection is provided as well.
  • a display and/or a user input device such as a keyboard or mouse are optionally provided as well.
  • FIG. 1 is a graph showing AR parameters of various sounds and upper and lower limits to each one of the AR parameters according to an example embodiment of the invention in use for detecting sounds of glass breaking;
  • FIG. 2 is a graph showing optional PSD values of various sounds and upper and lower limits to the PSD values according to the example embodiment of FIG. 1;
  • FIG. 3 is a simplified flow chart representation of a method for detecting sounds of breaking glass, according to an example embodiment of the invention;
  • FIG. 4 is a simplified flow chart representation of a method for preparing a characterizing set of AR parameters for use in detecting sounds of breaking glass within an audio signal according to an example embodiment of the invention; and
  • FIG. 5 is a simplified flow chart representation of a method of using an AR filter method to classify sounds according to an example embodiment of the invention.
  • the present invention in some embodiments thereof, relates to audio signal analysis for detection of glass breakage and, more particularly, but not exclusively, to a method and device allowing computationally efficient audio signal analysis for detection of glass breakage.
  • the present invention, in some embodiments thereof, relates to applying a pre-whitening Auto Regressive (AR) filter model to an audio signal.
  • AR Auto Regressive
  • the AR-model condenses an audio signal possibly consisting of tens, hundreds, or many thousands of samples into about 30 AR parameters, and may include an optional computation of the Power Spectral Density (PSD) of the audio signal.
  • PSD Power Spectral Density
  • Classification of the audio signal into, for example, a glass break event or a non-glass break event is based on the various measures of the AR-parameters and/or the PSD.
  • an audio signal is captured and digitized.
  • An Auto Regressive (AR) filter model is used to condense the captured audio signal, which may include tens or even hundreds of thousands of samples, to a smaller number of parameters, for example about 20, 30, 40, 50 or more parameters.
  • the AR filter model also enables optionally computing a Power Spectral Density (PSD) of the captured signal.
  • PSD Power Spectral Density
  • the AR parameters, and optionally also the PSD parameters are compared to a set of parameters which characterize a specific class of sounds, for example sounds of glass breaking. The comparison enables a determination whether the audio signal belongs to the specific class of sounds, and/or whether the audio signal contains sounds which belong to the specific class of sounds.
  • classification of an audio signal into glass break, and non-glass break, sound events is optionally based on various measures of the PSDs and on AR- parameters of the sound events.
  • An efficient feature of a method according to an embodiment of the present invention is that the method enables computation of time-varying AR-parameters, such as one AR-model for each sub-interval of a sound signal, and optionally also time-varying PSDs, to form additional classification measures.
  • post-processing of the AR-parameters and the PSDs are tuned so as to distinguish between glass break, and non-glass break events.
  • the comparison includes determining whether values of the computed parameters are between upper and lower bounds for a class of sounds. If the values of the computed parameters are in a range between the upper and the lower bounds for a specific class of sounds, the audio signal may belong to the specific class of sounds.
  • the comparison includes determining whether values of the computed parameters are less than a specific delta away from characterizing values of the parameters for the class of sounds. If the values of the computed parameters are less than the specific delta away from characterizing values of the parameters for the class of sounds, the audio signal may belong to the specific class of sounds.
  • the comparison includes determining whether each one of the computed parameters are within the characterizing set, optionally as determined by the above-described comparisons.
  • the audio signal is considered to belong to the specific class of sounds if each one of the computed parameters is within the characterizing set.
  • the comparison includes determining what percentage of the computed parameters are within the characterizing set, optionally as determined by the above-described comparisons.
  • the audio signal is considered to belong to the specific class of sounds if the percentage is above a specific threshold percentage, such as 99%, 98%, 95%, 90%, 80%, 75%, 66%, and 50%.
  • the threshold is set to correspond to a user's trade-off between type 1 errors (missed detection) and type 2 errors (false alarms) that the user is willing to suffer. In applications where a false alarm is much costlier than a missed detection, the threshold percentage is set to be high.
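  • A minimal Matlab sketch of this comparison step (the variable names and the 90% threshold are illustrative assumptions, not values required by any embodiment):

      % ar      : the 30 AR parameters computed for the captured sound
      % lo, hi  : characterizing lower and upper limits per AR parameter (learned off-line)
      thresh = 0.90;                                      % illustrative threshold percentage
      inside = (ar(:) >= lo(:)) & (ar(:) <= hi(:));       % which parameters fall inside the envelope
      glass_break_all    = all(inside);                   % embodiment: every parameter must be in range
      glass_break_thresh = mean(inside) >= thresh;        % embodiment: a threshold percentage in range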
  • the present invention in some embodiments thereof, includes a method for preparing a characterizing set of AR parameters for use in detecting a specific class of sounds.
  • the preparing includes capturing an audio signal belonging to the specific class of sounds, for example breaking glass, digitizing the audio signal, producing AR parameters based on the digitized audio signal, and optionally also producing PSD parameters of the audio signal, and storing the parameters as characterizing parameters for the specific class of sounds.
  • each one of the samples is used to produce AR parameters, and optionally PSD parameters.
  • the several, or many, parameters are used to determine a high limit and a low limit to each of the parameters in the class of sounds.
  • the limits are saved as associated with the specific class of sounds.
  • the several, or many, parameters are used to determine a central value to each of the parameters in the class of sounds, and a delta value of a variation of each of the parameters in the class of sounds.
  • the central value and the delta value are saved as associated with the specific class of sounds.
  • the central value is optionally, for example, an average or a median.
  • the variation is optionally, for example, a standard deviation, a multiple of a standard deviation such as three standard deviations, four standard deviations, six standard deviations, or a maximum delta of a parameter value from the parameter's central value.
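  • A minimal Matlab sketch of deriving such a characterizing set from N laboratory recordings (the matrix layout and the choice of three standard deviations are assumptions for illustration):

      % AR : 30-by-N matrix; column j holds the 30 AR parameters of laboratory recording j
      lo      = min(AR, [], 2);       % lower limit per parameter (envelope embodiment)
      hi      = max(AR, [], 2);       % upper limit per parameter
      central = mean(AR, 2);          % central value per parameter (central-value embodiment)
      delta   = 3*std(AR, 0, 2);      % allowed departure, here three standard deviations
      % (lo, hi) or (central, delta) are stored as the characterizing set for the class of sounds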
  • an entire sound signal segment which is to be analyzed is stored for the analysis.
  • a sound signal segment which is to be analyzed is captured, digitized, and analyzed, producing parameters such as AR- parameters and/or PSD parameters, and only the parameters are kept, either short term or stored for further analysis.
  • an initial portion of the sound signal, from a triggering event to a decision whether to continue capturing the sound for analysis, is stored.
  • the stored signal is optionally classified by methods as described below, optionally followed sequentially by a remainder of the sound signal to be analyzed, optionally in real time.
  • the triggering event is optionally a noise loud enough to produce sounds which can be analyzed, that is, sounds having a maximum amplitude large enough to be suspect as glass breaking, and/or having sufficient low-frequency power to be suspect as glass breaking.
  • the triggering event is typical of what a simple loudness test of a captured signal would provide.
  • maximal signal excerpt length for analysis: 29,000 samples, corresponding to 0.6576 seconds;
  • dead time length after triggering, representing samples not to be included in the excerpt for analysis: 309 samples, corresponding to approximately 7 ms;
  • a word length of each sample: 16 bits.
  • Matlab code which may be used to implement the example embodiment described below is provided in Appendix 1 below.
  • the process implemented by the Matlab code is now generally described as a process which includes:
  • a most suitable model order is optionally found.
  • an AR-model of order 30 gave rise to a computed signal spectrum (PSD) sufficiently similar to a PSD computed by direct application of a suitably windowed Fast Fourier Transform (etfe(data, 100,128) in Appendix 1).
  • an AR-model is defined as follows: Let y(k) be the k-th sample, and z(k) a model prediction of y(k), given measurements y(k-1), y(k-2), ..., y(k-30).
  • the AR-model is defined as: z(k) = -a1*y(k-1) - a2*y(k-2) - ... - a30*y(k-30).
  • the coefficients a1, ..., a30 are those that minimize the sum of squared prediction errors between y(k) and z(k).
  • A(w) = 1/(s^30 + a1*s^29 + a2*s^28 + ... + a29*s + a30)
  • a power versus frequency, for example |A(w)|^2 vs. w, or using a log-log scale (Bode diagram) 10*log10(|A(w)|^2) vs. log10(w), is calculated and plotted.
  • a subsequent PSD computation and display is optionally performed normalized relative to its maximum value.
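  • A brief Matlab sketch of this AR-based PSD computation, normalized relative to its maximum value (the logarithmic frequency grid is an assumption; the polyval expression mirrors the Appendix 1 code):

      % a : identified AR parameters [a1 ... a30];  Fs : sampling frequency in Hz
      w      = 2*pi*logspace(1, log10(Fs/2), 128);        % 128 frequencies up to Nyquist, rad/s
      Aw     = 1 ./ polyval([1, a(:)'], exp(1j*w/Fs));    % A(w), mirroring the Appendix polyval usage
      psd    = abs(Aw).^2;                                % power versus frequency
      psd    = psd / max(psd);                            % normalized relative to its maximum value
      psd_db = 10*log10(psd);                             % Bode-style values, 10*log10(|A(w)|^2)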
  • Figure 1 is a graph 100 showing AR parameters of various sounds and upper and lower limits to each one of the AR parameters according to an example embodiment of the invention in use for detecting sounds of glass breaking.
  • Figure 1 depicts parameter sets of AR-models for 14 non-glass-break events, each one of which includes at least one parameter which falls outside an envelope produced from parameter sets of AR-models for 33 glass break events.
  • the sounds of the 33 glass break events were produced by breaking glass panes embedded in window frames. The sounds were produced in a Glass Break laboratory.
  • Figure 1 has a Y-axis 110 depicting AR moving average (ARMA) parameter values, which are unit-less amplification and/or attenuation factors, and an X-axis 105 depicting an AR parameter number.
  • ARMA AR moving average
  • Figure 1 depicts 14 thin lines such as an example thin line 112. Each of the thin lines connects AR values of one captured non-glass-break audio signal from which the AR parameters were calculated.
  • Figure 1 also depicts thick lines, which are a lower limit 115 and an upper limit 120.
  • the lower limit 115 and the upper limit 120 are lower and upper limits for each AR parameter between which the AR parameters are classified as belonging to the specific class of sounds of glass breaking, according to the example embodiment of Figure 1.
  • the lower limit 115 and the upper limit 120 were produced from upper and lower limits of parameters measured for 33 glass break events.
  • a central value is defined for each AR parameter, and a delta value, or range by which a computed AR parameter may differ from the central value, are defined.
  • the central value and the delta also define lower and upper limits between which AR parameters are classified as belonging to the specific class of sounds of glass breaking, according to another example embodiment of the invention.
  • the lower limit 115 and the upper limit 120 are produced off-line, for example in a sound lab. Sound events of a specific class of sounds are produced, digitized, and a number of AR parameters are produced for each sound event. In some embodiments of the invention a number of sound events are produced, so as to provide a variety of sound events from the specific class.
  • choosing a good number of AR parameters for use in sound classification is optionally done by adding sound events of the class until the limits of a sound class change by less than a threshold amount, and/or until limits are defined which differentiate between the class and some other sound class.
  • 30 AR parameters have been found to be suitable for classifying glass break sounds, providing a large enough number for classifying and a small enough number which does not require an undue computational load on a relatively simple and inexpensive processing unit, as described above in the Summary of the Invention.
  • choosing a suitable duration of a sound signal to digitize in order to classify the sound signal is optionally done according to known characteristics of the class of sounds. For example, panes of glass are broken in a laboratory, and the duration of the sound of glass breaking is optionally measured by an operator defining the duration of the sound of glass breaking by looking at a spectrum of the sound, optionally in a recording of the sound.
  • a duration of an audio sample for digitizing is set to be a maximal duration measured during the laboratory glass breakings.
  • the number of parameters is 30, and the following equations are used as an Auto-Regressive (AR) model of a captured audio signal:
  • z(k) = -a1*y(k-1) - a2*y(k-2) - ... - a30*y(k-30)    (Equation 1)
  • where: k is a sample number;
  • z(k) is the sample value predicted by the model; and
  • y(k) is a sample of the audio signal.
  • the parameters a1, a2, ..., a30 are optionally found off-line (batch), optionally using a Least Squares method by minimizing the sum of squared prediction errors over the samples: V = sum over k of (y(k) - z(k))^2    (Equation 2)
  • Equation 2 may optionally be performed by using a standard least squares function of any one of the standard mathematical packages.
  • the lower limit 115 and the upper limit 120 are optionally a minimal ai and a maximal ai of parameter i in the samples.
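  • A minimal Matlab sketch of this batch Least Squares estimation, written without the System Identification Toolbox (treating it as equivalent to the arx call of the Appendix is an assumption):

      % y : digitized sound samples (column vector);  n : model order
      n   = 30;
      y   = y(:);
      N   = length(y);
      Phi = zeros(N-n, n);
      for i = 1:n
          Phi(:, i) = -y(n+1-i : N-i);       % regressor column i holds -y(k-i), for k = n+1..N
      end
      a = Phi \ y(n+1:N);                    % minimizes sum_k (y(k) - z(k))^2, i.e. Equation 2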
  • AR-modeling is possible with a standard-quality, off-the-shelf microphone. It is also noted that quality of classification can depend on, inter alia, the dynamic range of the microphone.
  • Using the learned classification parameters to classify sounds is termed herein classification mode.
  • the learning mode may be performed based on glass-break sounds produced in a laboratory; on glass-break sounds recorded in a laboratory; on glass-break sounds received by a glass-break detector placed in a laboratory for learning; on glass-break sounds received by a glass-break detector located in-situ, in the location where it will act as a glass-break detector.
  • classifying a sound as a glass breaking sound is optionally useful in security devices, where a sound of glass breaking may be interpreted as a window breaking, and an alarm may be initiated.
  • an on-line implementation of the AR model is optionally desired.
  • An example embodiment of an on-line implementation of an iterative Least Squares implementation is now described, which minimizes the sum of squares of a difference between a recorded sound level and a model output at each sample, as per equation 2 above.
  • An example embodiment of the invention implemented in real time as an iterative "Recursive" Least Squares (RLS) method
  • the AR parameters are computed using an iterative least squares method.
  • Matlab also includes a command for iterative, so called recursive, least squares procedure called rarx.
  • the rarx command is optionally used for comparison with a real time implementation described below, as well as with results obtained by an off-line least squares using arx, as described in item 5 of the section titled "An example embodiment of the invention as implemented in Matlab code" above.
  • the code of rarx.m is based on a description in the above-mentioned text book "System Identification - Theory for the User", is found as open source code in the Matlab Identification Toolbox, and can be easily modified and adapted for a real time implementation of an embodiment of the invention.
  • z(k) model output (prediction) for time k.
  • the following pseudo-code describes iterative calculations optionally used as an Auto-Regressive (AR) model of a captured audio signal:
  • K(k) = P(k)·φ(k) / (1 + φ^T(k)·P(k)·φ(k)), where φ(k) is the regression vector of past samples
  • a system noise covariance matrix optionally provided by a user, is marked: R; a parameter estimate covariance matrix is marked: P(k); and
  • K(k) a Kalman filter gain matrix
  • the system noise covariance matrix is chosen as the variance of the noise recorded during what is practically silence, using appropriate units, which are the square of the units of y.
  • the system noise covariance matrix R should be positive.
  • the matrix sizes correspond to the number of AR-parameters to be identified.
  • at each k, P(k) is optionally checked to be positive definite.
  • the Matlab routine rarx does not perform such a check.
  • the check is computationally expensive.
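  • If the check is nevertheless wanted, a compact way to perform it in Matlab is an attempted Cholesky factorization, as in the following sketch (whether its cost is acceptable on the target processor is not addressed here):

      [~, flag] = chol((P + P')/2);   % symmetrize first; flag == 0 only if the matrix is positive definite
      P_is_pd = (flag == 0);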
  • One potential advantage of recursive modeling is that several models may be found, one for each time interval, for example for the first 0.3 seconds, for the next 0.3 seconds, for the following 0.3 seconds.
  • more indicators are optionally computed for use in classification, in that a matrix of AR-parameters is computed, [θ1, θ2, ..., θM], where θi equals an AR-parameter vector for sub-interval i.
  • two RLS-routines are computed in parallel, for example one to compute an AR-vector θ valid for the whole interval, and one, with P-reset, to compute sub-interval AR-vectors θi. It is noted that optionally it is possible to run only the latter sub-interval AR-routine to compute the θi, and then compute an estimate of θ by an averaging procedure.
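  • A minimal Matlab sketch of the recursive update, assuming the standard RLS/Kalman-filter form of the Ljung text referred to above (the initial values of P and R are illustrative assumptions, the exact update used by rarx may differ in detail, and the P-reset logic for sub-interval models is omitted):

      n     = 30;                          % number of AR parameters
      theta = zeros(n, 1);                 % running AR parameter estimate
      P     = 1e4 * eye(n);                % parameter covariance, large initial value (assumed)
      R     = 1e-6 * eye(n);               % system noise covariance, from recorded near-silence (assumed)
      phi   = zeros(n, 1);                 % regression vector [-y(k-1); ...; -y(k-n)]
      for k = 1:length(y)                  % each sample is handled when it arrives, none are stored
          z     = phi' * theta;                        % model prediction z(k)
          K     = P * phi / (1 + phi' * P * phi);      % gain, cf. the equation for K(k) above
          theta = theta + K * (y(k) - z);              % update the AR parameters from the prediction error
          P     = P - K * (phi' * P) + R;              % covariance update
          phi   = [-y(k); phi(1:end-1)];               % shift the regressor for the next sample
      end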
  • the above iterative method provides an advantage of not requiring saving an entire stream of digitized audio samples in memory.
  • Another potential advantage of the above iterative method is that an optional calculation of Power Spectral Density (PSD) of the audio samples is also enabled.
  • PSD Power Spectral Density
  • a normalized PSD is computed as follows:
  • A(w) = 1/(s^30 + a1*s^29 + a2*s^28 + ... + a29*s + a30)    (Equation 3)
  • where A(w) is an amplitude of a sinusoidal component of a Fourier transform
  • a power versus frequency, for example |A(w)|^2 vs. w, or using a log-log scale (Bode diagram) 10*log10(|A(w)|^2) vs. log10(w), is calculated and plotted.
  • to compute the power of |A(w)|^2 from w1 to w2, a numerical integration of the above is performed, for example by the Euler approximation.
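  • A brief Matlab sketch of the band-power computation by the Euler approximation (the frequency grid and the band edges w1, w2 are assumed given, with at least two grid points inside the band):

      % w : frequency grid in rad/s (ascending);  Aw2 : |A(w)|^2 on that grid;  [w1, w2] : band of interest
      in_band    = find(w >= w1 & w <= w2);                % indices of grid points inside the band
      k          = in_band(1:end-1);                       % left endpoints of the rectangles
      band_power = sum(Aw2(k) .* (w(k+1) - w(k)));         % Euler (left-rectangle) approximation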
  • PSD is typically computed by using an FFT, yet the above-mentioned method uses less memory than an FFT would use.
  • audio samples are digitized to capture audio frequencies up to approximately 20 kHz, at a rate of approximately 40,000 samples per second.
  • at any time only approximately 280 samples are kept in memory, corresponding to approximately 7 milliseconds of sound. In some embodiments more or fewer samples may be kept in memory, for example 1024 samples, or 100 samples.
  • the 7 milliseconds include an initial sample, which optionally serves to detect if there is a possible glass break or not, and a decision is optionally taken if further analysis, of additional sound captured after the initial 7 milliseconds, should be done.
  • a relatively simple and inexpensive processing unit such as an ARM-7, or a Cortex M3, may be used.
  • a simple, inexpensive processing unit, especially such as those described above, which are already produced in large numbers for use in mobile phones, provides the benefit of lowering the cost of the security device while still providing smart processing and classification of sound.
  • Figure 2 is a graph 200 showing optional PSD values of various sounds and upper and lower limits to the PSD values according to the example embodiment of Figure 1.
  • Figure 2 has a Y-axis 210 depicting PSD values, in units of power spectrum (sound amplitude squared and divided by 2) and an X-axis 105 depicting sound frequency in terms of rad/s.
  • Figure 2 depicts several thin lines such as example thin lines 212, each of which graphs optional PSD values of one captured non-glass-break audio signal.
  • Figure 2 also depicts thick lines, which are a lower limit 215 and an upper limit 220.
  • the lower limit 215 and the upper limit 220 are lower and upper limits between which the optional PSD values are classified as belonging to the specific class of sounds of glass breaking, according to the example embodiment of Figure 1.
  • Figure 2 is a Bode diagram of 14 power-normalized non-glass event AR-computed PSDs, depicted by the thin lines 212, all of which, at least at some frequencies, fall outside an envelope of the lower limit 215 and the upper limit 220.
  • the lower limit 215 and the upper limit 220 were produced by taking an outer envelope of 33 AR-computed power-normalized PSDs of 33 glass-break sound events. The sound events were produced in the Crow Glass Break laboratory on 8 August 2011.
  • tuning for classification is performed by capturing two sets of audio measurements; one set of glass break events, and one set of non-glass break events. Capturing both glass break events and non-glass break events optionally enables tuning for the glass break sounds, and then optionally verifying that the non-glass break events do not produce an undue number of false alarms, beyond a certain acceptable limit.
  • a set of glass break events produces a set of AR-parameter vectors, or matrices, as described above. From the AR-parameters, PSD-values are optionally computed at a desired number of frequencies, also as described above.
  • i is a recording index
  • M is the number of recordings
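  • A minimal Matlab sketch of collecting these features for one measurement set of M recordings (the cell array recordings and the helper functions estimate_ar and ar_psd are hypothetical names standing in for the estimation and PSD steps sketched above):

      % recordings : cell array holding the digitized sound events of one measurement set (assumed)
      M   = numel(recordings);
      AR  = zeros(30, M);                                  % one AR-parameter vector per recording i
      PSD = zeros(128, M);                                 % one PSD vector per recording i
      for i = 1:M
          AR(:, i)  = estimate_ar(recordings{i}, 30);      % hypothetical helper: AR fit as sketched above
          PSD(:, i) = ar_psd(AR(:, i), Fs);                % hypothetical helper: PSD from the AR parameters
      end
      % min/max over the columns of AR and PSD then give the characterizing envelopes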
  • a definition glass break set is defined as a set of glass break events which can characterize sounds of glass breaking.
  • a definition non-glass break set is defined as a set of non-glass break events, used in a tuning procedure.
  • a non-glass event is defined as one where at least one PSD-value, and at least one AR-parameter value fall outside the envelope.
  • the definition glass break set and the definition non-glass break set are optionally defined according to sounds produced at a factory or laboratory in physical settings defined by appropriate European standards.
  • the definition sets are optionally defined together with other appropriate glass-break and non-glass break events from other sources, such as field recordings from cases where prior art glass break detectors have failed.
  • a representative partial set of glass break detectors produced according to an embodiment of the invention is selected for discovering tuning parameters, being subjected to true glass break sounds, and/or to high-fidelity replay of these sounds, optionally in the physical setting defined by the European standards.
  • Classification values are computed according to the procedure in the preceding paragraph, and the tuning parameters envelope, r, and R are determined.
  • the tuning parameters are then optionally programmed into other glass break detectors which have not been selected for participating in the tuning.
  • the following procedure is optionally used to tune, and/or to further tune the envelope, r, and R.
  • An installation or service technician optionally provides tuning sounds at the site.
  • the tuning sounds are optionally sufficiently loud to activate the glass break detector.
  • the tuning sounds optionally include one or more of: a glass-break simulation such as a recording of glass breaking; one or more knocks, without breaking, on window panes in a room; and a non-glass-break simulator.
  • the tuning sounds are produced while optionally indicating to a tuning system, what sort of sound (glass-break simulation, glass pane knocks, and non-glass-break simulation) was produced.
  • the tuning system is built into a glass break detector.
  • the tuning system is a remote system.
  • the remote system is optionally provided with sounds captured at the site, by providing sounds recorded at the site, and/or by using sound transmission capabilities of some security detector to transmit sounds captured on site to the remote tuning system.
  • an analysis of the classification values can be done, and the envelope, r, R of the local sensors may optionally be modified remotely.
  • Figure 3 is a simplified flow chart representation of a method for detecting sounds of breaking glass within an audio signal, according to an example embodiment of the invention.
  • the method depicted by Figure 3 includes:
  • the method of Figure 3 is applied to a specific class of sounds which comprises sounds of breaking glass.
  • the characterizing set of AR parameters comprises an upper limit value and a lower limit value for each one of the AR parameters, the upper and lower limits defining a range within which a single AR parameter is within the characterizing set.
  • the characterizing set of AR parameters comprises a central value for each one of the AR parameters and a maximum allowed delta from the central value for each one of the AR parameters, the central value and the maximum allowed delta defining a range within which a single AR parameter is within the characterizing set.
  • the determining includes determining that each one of the plurality of AR parameters is within the characterizing set.
  • the determining includes determining that a threshold percentage of the plurality of AR parameters is within the characterizing set.
  • Power Spectrum Density (PSD) values are computed, based on the digitized audio signal, and the comparing includes comparing the PSD values to characterizing PSD values associated with the specific class of sounds.
  • the characterizing PSD values include an upper limit to PSD values and a lower limit to PSD values for at least some of a range of PSD values, the upper and lower limits defining a range within which PSD values are within the characterizing PSD values.
  • the characterizing PSD values include a central PSD value and a maximum allowed delta from the central PSD value for at least some of the range of PSD values, the central PSD value and the maximum allowed delta defining a range within which PSD values are within the characterizing PSD values.
  • the determining includes determining that all of the PSD values are within the characterizing PSD values.
  • the determining includes determining that a threshold percentage of the PSD values are within the characterizing PSD values.
  • Figure 4 is a simplified flow chart representation of a method for preparing a characterizing set of AR parameters for use in detecting sounds of breaking glass within an audio signal according to an example embodiment of the invention.
  • the method of Figure 4 includes:
  • the specific class of sounds comprises sounds of breaking glass.
  • an upper limit value and a lower limit value for each one of the AR parameters are calculated, and the upper and lower limits are stored associated with the specific class of sounds.
  • a central value for each one of the AR parameters is calculated, and optionally also a maximum allowed delta from the central value for each one of the AR parameters.
  • the central value and the optional maximum allowed delta define a range within which a single AR parameter is within the characterizing set.
  • the storing includes storing the central value and the optional maximum allowed delta associated with the specific class of sounds.
  • Power Spectrum Density (PSD) values are computed, based on the digitized audio signal.
  • the PSD values are stored associated with the specific class of sounds.
  • an upper limit value and a lower limit value for the PSD values are calculated and stored associated with the specific class of sounds.
  • a central value for the PSD values and a maximum allowed delta from the central value for the PSD values are calculated, the central value and the maximum allowed delta defining a range within which a PSD value is within the characterizing PSD values, and stored associated with the specific class of sounds.
  • Figure 5 is a simplified flow chart representation of a method of using an AR filter method to classify sounds according to an example embodiment of the invention.
  • the method of Figure 5 includes:
  • sound events belonging to the specific class of sounds are produced in a laboratory.
  • the sounds are picked up, and AR parameters, and optionally also PSD values, are calculated as described above.
  • AR parameters and optionally also PSD values, are calculated as described above.
  • PSD values are produced.
  • upper and lower limits to the class of sounds, such as glass breaking sounds, are produced.
  • central values and delta limits may be produced.
  • calibration may be done per each microphone type, or even per specific microphone.
  • calibration may be done per each housing type, or even per specific housing and/or device.
  • calibration is optionally done on-location, that is, at a site where the sound classification will take place.
  • Such calibration is optionally typical for security devices for detecting sounds of breaking glass, which may be done on location, with breaking glass sounds for calibration optionally produced at locations of actual windows, possibly at different directions from a security device, possibly even in different rooms.
  • calibration is optionally done on-location, optionally by a security device placed in learning mode. After a learning session has ended, the security device may optionally be switched to classifying mode.
  • compositions, method or structure may include additional ingredients, steps and/or parts, but only if the additional ingredients, steps and/or parts do not materially alter the basic and novel characteristics of the claimed composition, method or structure.
  • "a unit" or "at least one unit" may include a plurality of units, including combinations thereof.
  • range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
  • Appendix 1 includes sample Matlab code, which was used in an example embodiment of the invention as described above.
  • Var_length = 29000; % signal length restricted to ca 0.5 s.
  • MAG30F = []; % psd from arx
  • Nbits = 16; % bits
  • suffg = char('01','02','03','04','05','06','06-02','07','08','09','10',...
  • % y1 = sscanf(String, ' %4x ',[1 inf]);
  • silent_smpl = silent_smpl+1;
  • %pause
  • y3 = y2((silent_smpl+ms_7_smpl):length(y2));
  • y4 = y3(1:min(Var_length,length(y3)));
  • plot(silent_smpl+ms_7_smpl,y2(silent_smpl+ms_7_smpl),'r*')
  • plot(silent_smpl+ms_7_smpl+min(Var_length,length(y3)),...
  • mag30alt = abs(1./polyval(M30.a,exp(j*F30/Fs))); % arx freq vector
  • mag30F = abs(1./polyval(M30.a,exp(j*F/Fs))); % orig freq vector
  • MAG30F = [MAG30F,mag30F];
  • figure('Position', [0.59*scrsz(3) 0.35*scrsz(4) 0.41*scrsz(3) 0.55*scrsz(4)]);

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

A method of detecting a class of sounds, such as breaking glass, in an audio signal is disclosed. The method includes receiving an audio signal, digitizing the audio signal, producing a digitized audio signal, producing a plurality of autoregressive (AR) model parameters based on the digitized audio signal, comparing the plurality of AR parameters to a characterizing range of AR parameters associated with sounds of the particular class, and determining whether the audio signal includes sounds of that particular class based on whether or not the AR parameters fall within the characterizing range. A power spectral density value may also be computed by AR modelling. A related learning method is also disclosed.
PCT/IL2013/050522 2012-06-21 2013-06-18 Method for classifying glass break sounds present in an audio signal WO2013190551A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP13742753.0A EP2864969A1 (fr) 2012-06-21 2013-06-18 Method for classifying glass break sounds present in an audio signal

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261662439P 2012-06-21 2012-06-21
US61/662,439 2012-06-21

Publications (1)

Publication Number Publication Date
WO2013190551A1 true WO2013190551A1 (fr) 2013-12-27

Family

ID=48906466

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2013/050522 WO2013190551A1 (fr) 2012-06-21 2013-06-18 Method for classifying glass break sounds present in an audio signal

Country Status (2)

Country Link
EP (1) EP2864969A1 (fr)
WO (1) WO2013190551A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018052791A1 (fr) * 2016-09-13 2018-03-22 Walmart Apollo, Llc System and methods for identifying an action based on sound detection
US10070238B2 (en) 2016-09-13 2018-09-04 Walmart Apollo, Llc System and methods for identifying an action of a forklift based on sound detection
US10656266B2 (en) 2016-09-13 2020-05-19 Walmart Apollo, Llc System and methods for estimating storage capacity and identifying actions based on sound detection
CN111895344A (zh) * 2020-07-27 2020-11-06 韶关学院 A mosquito repellent lamp

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008024624A2 (fr) * 2006-08-25 2008-02-28 Savic Research, Llc Computer aided diagnosis of lung disease
US20090312660A1 (en) * 2008-06-17 2009-12-17 Biorics Nv Recognition and localisation of pathologic animal and human sounds

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008024624A2 (fr) * 2006-08-25 2008-02-28 Savic Research, Llc Computer aided diagnosis of lung disease
US20090312660A1 (en) * 2008-06-17 2009-12-17 Biorics Nv Recognition and localisation of pathologic animal and human sounds

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TJØSTHEIM D: "Recognition of Waveforms Using Autoregressive Feature Extraction", IEEE TRANSACTIONS ON COMPUTERS, vol. 26, no. 3, 1 March 1977 (1977-03-01), pages 268 - 270, XP001345030 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018052791A1 (fr) * 2016-09-13 2018-03-22 Walmart Apollo, Llc System and methods for identifying an action based on sound detection
US10070238B2 (en) 2016-09-13 2018-09-04 Walmart Apollo, Llc System and methods for identifying an action of a forklift based on sound detection
US10656266B2 (en) 2016-09-13 2020-05-19 Walmart Apollo, Llc System and methods for estimating storage capacity and identifying actions based on sound detection
CN111895344A (zh) * 2020-07-27 2020-11-06 韶关学院 A mosquito repellent lamp

Also Published As

Publication number Publication date
EP2864969A1 (fr) 2015-04-29

Similar Documents

Publication Publication Date Title
US11678013B2 (en) Methods and apparatus to determine a state of a media presentation device
Marchi et al. A novel approach for automatic acoustic novelty detection using a denoising autoencoder with bidirectional LSTM neural networks
JP6377592B2 (ja) Abnormal sound detection device, abnormal sound detection learning device, methods therefor, and program
Huang et al. Scream detection for home applications
RU2488815C2 (ru) Method and device for classifying sound-generating processes
US20120185418A1 (en) System and method for detecting abnormal audio events
KR100580643B1 (ko) Impact sound detection device and method, and impact sound identification device and method using the same
CA2382122A1 (fr) Classification de sources sonores
CN106683687B (zh) Method and device for classifying abnormal sounds
WO2013190551A1 (fr) Method for classifying glass break sounds present in an audio signal
CN106683333B (zh) Equipment safety detection method and device
Kiktova et al. Comparison of different feature types for acoustic event detection system
CN110890087A (zh) Speech recognition method and device based on cosine similarity
KR102314824B1 (ko) Acoustic event detection method in a deep-learning-based detection situation
Kiapuchinski et al. Spectral noise gate technique applied to birdsong preprocessing on embedded unit
EP3446296B1 (fr) Glass break detection system
KR20190046569A (ko) Sound-based tunnel accident detection system
JP5627962B2 (ja) Abnormality detection device
CN105810222A (zh) Defect detection method, device and system for audio equipment
JP4926588B2 (ja) Insulator discharge sound discrimination method and device
Singh et al. Linear Prediction Residual based Short-term Cepstral Features for Replay Attacks Detection.
CN116364108A (zh) Transformer voiceprint detection method and device, electronic equipment, and storage medium
JP2010071773A (ja) Device and method for recognizing opening/closing operation of an opening/closing member
CN102789780B (zh) Method for identifying environmental sound events based on spectro-temporal amplitude graded vectors
US10109298B2 (en) Information processing apparatus, computer readable storage medium, and information processing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13742753

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE