EP2864969A1 - Method of classifying glass break sounds in an audio signal - Google Patents

Method of classifying glass break sounds in an audio signal

Info

Publication number
EP2864969A1
Authority
EP
European Patent Office
Prior art keywords
parameters
psd
sounds
audio signal
characterizing
Prior art date
Legal status
Withdrawn
Application number
EP13742753.0A
Other languages
German (de)
French (fr)
Inventor
Per-Olof Gutman
Noroz Akhlagi
Shmuel Melman
Current Assignee
Crow Electronic Engineering Ltd
Securitas Direct AB
Original Assignee
Crow Electronic Engineering Ltd
Securitas Direct AB
Priority date
Filing date
Publication date
Application filed by Crow Electronic Engineering Ltd, Securitas Direct AB filed Critical Crow Electronic Engineering Ltd

Classifications

    • G: PHYSICS
    • G08: SIGNALLING
    • G08B: SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B13/00: Burglar, theft or intruder alarms
    • G08B13/16: Actuation by interference with mechanical vibrations in air or other fluid
    • G08B13/1654: Actuation by interference with mechanical vibrations in air or other fluid using passive vibration detection systems
    • G08B13/1672: Actuation by interference with mechanical vibrations in air or other fluid using passive vibration detection systems using sonic detecting means, e.g. a microphone operating in the audio frequency range
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

A method for detecting a class of sounds, such as breaking glass, within an audio signal, the method including receiving an audio signal, digitizing the audio signal, producing a digitized audio signal, producing a plurality of autoregressive (AR) model parameters based on the digitized audio signal, comparing the plurality of AR parameters to a characterizing range of AR parameters associated with sounds of the particular class, and determining whether the audio signal includes sounds of that particular class based on whether the AR parameters fall within the characterizing range. Power spectral density values may also be derived by AR modeling. A related training method is also described.

Description

METHOD OF CLASSIFYING GLASS BREAK SOUNDS IN AN AUDIO SIGNAL
RELATED APPLICATION/S
This application is a PCT application claiming priority from U.S. Provisional Application number 61/662,439, filed on 21 June 2012, by Gutman et al.
FIELD AND BACKGROUND OF THE INVENTION
The present invention, in some embodiments thereof, relates to audio signal analysis, and, more particularly, but not exclusively, to audio signal analysis for detection of glass breakage and, yet more particularly, but not exclusively, to a method and device enabling computationally efficient audio signal analysis for detection of glass breakage sounds.
Several signal processing algorithms exist to analyze audio signals in order to detect glass breaks. Some are described in patents such as listed below, and some are reported in published literature, also listed below.
Additional background art includes:
US Patent 6,538,570 to Smith et al, titled "Glass-break detector and method of alarm discrimination";
US Patent 6,493,687 to Wu et al, titled "Apparatus and method for detecting glass break";
US Patent 6,236,313 to Eskildsen et al, titled "Glass breakage detector";
An article by Bemke I, and Zielonko R, titled "Improvement of glass break acoustic signal detection via application of wavelet packet decomposition", published in Metrology and Measurement Systems, Volume 15, Issue 4, pages 513-526, 2008;
An article by Gestner, B.; Tanner, J.; and Anderson, D., titled "Glass Break Detector Analog Front-End Using Novel Classifier Circuit", published in IEEE International Symposium on Circuits and Systems, ISCAS 2007, pages 3586-3589, New Orleans, LA, 27-30 May 2007;
An article by Cowling M, and Sitte R, titled "Comparison of techniques for environmental sound recognition", published in Pattern Recognition Letters 24, 2895-2907, 2003;
an article by Zhang T. and Jay Kuo C.-C, titled "Audio Content Analysis for Online Audiovisual Data Segmentation and Classification", published in IEEE Transactions on Speech and Audio Processing, Vol. 9, No. 4, May 2001, which describes that while current approaches for audiovisual data segmentation and classification are mostly focused on visual cues, audio signals may actually play a more important role in content parsing for many applications. An approach to automatic segmentation and classification of audiovisual data based on audio content analysis is proposed. The audio signal from movies or TV programs is segmented and classified into basic types such as speech, music, song, environmental sound, speech with music background, environmental sound with music background, silence, etc. Simple audio features including the energy function, the average zero-crossing rate, the fundamental frequency, and the spectral peak tracks are extracted to ensure the feasibility of real-time processing. A heuristic rule-based procedure is proposed to segment and classify audio signals and built upon morphological and statistical analysis of the time-varying functions of these audio features. Experimental results show that the proposed scheme achieves an accuracy rate of more than 90% in audio classification;
an article by McConaghy T. Leung, H., Bosse E. and Varadan V., titled "Classification of Audio Radar Signals Using Radial Basis Function Neural Networks", published in IEEE Transactions on Instrumentation and Measurement, Vol. 52, No. 6., December 2003, which describes radial basis function (RBF) neural networks which are used to classify real-life audio radar signals that are collected by a ground surveillance radar mounted on a tank. Currently, a human operator is required to operate the radar system to discern among signals bouncing off tanks, vehicles, planes, and so on. The objective of this project is to investigate the possibility of using a neural network to perform this target recognition task, with the aim of reducing the number of personnel required in a tank. Different signal classification methods in the neural net literature are considered. The first method employs a linear autoregressive (AR) model to extract linear features of the audio data, and then perform classification on these features, i.e., the AR coefficients. AR coefficient estimations based on least squares and higher order statistics are considered in this study. The second approach uses nonlinear predictors to model the audio data and then classifies the signals according to the prediction errors. The real-life audio radar data set used here was collected by an AN/PPS-15 ground surveillance radar and consists of 13 different target classes, which include men marching, a man walking, airplanes, a man crawling, boats, etc. It is found that each classification method has some classes which are difficult to classify. Overall, the AR feature extraction approach is most effective and has a correct classification rate of 88% for the training data and 67% for data not used for training;
L. Ljung, System Identification - Theory for the User, Prentice-Hall 1987, Section 7.3, pages 176-180;
an article by L Ljung, titled "Issues in system identification", published in Control Systems Magazine, IEEE, 1991; and
an article by BSI British Standards, DD CLC/TS 50131-2-7-1:2009, titled "Alarm systems— Intrusion and hold-up systems— Part 2-7-1: Intrusion detectors— Glass break detectors (acoustic)".
The disclosures of all references mentioned above and throughout the present specification, as well as the disclosures of all references mentioned in those references, are hereby incorporated herein by reference.
SUMMARY OF THE INVENTION
In an example embodiment of the invention, a possible glass-break event can be detected within a short period of time, such as 0.5 sec after the event, and distinguished as either a glass break sound event or a non-glass break sound event.
Non-uniformity among sensor recordings, such as, for example, the same glass break event giving rise to different PSDs when different sensors are used, may occasionally preclude using PSDs, or other delicate metrics, to classify sounds. Indeed, in such cases, traditional crude methods may be more robust to sensor, sensor location, room acoustics, and other variations.
A typical PSD condenses a signal into a limited number of real numbers, for example 128 real numbers, corresponding for example to 128 frequency bins up to the Nyquist frequency.
In some embodiments of the invention, the PSD is computed in one go, using a whole record of a sound event, where a signal of 0.5 sec duration at 44kHz corresponds to a vector containing approximately 22,000 real numbers. Not all hardware/software supports storing such a vector, so the many samples of the sound event are processed serially, and only a few, for example 128 calculated parameters, are retained for further analysis.
In some embodiments of the invention, the PSD is optionally computed sequentially from successive PSDs, each successive PSD computed for overlapping time segments. Such a method can save memory, but not always computational complexity.
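The sequential, overlapping-segment scheme described above resembles Welch's method of averaging windowed periodograms, which holds only one short segment in memory at a time. As an illustration only, not the patent's implementation, here is a NumPy sketch; the function name `welch_psd` and the segment sizes are assumptions for the example:

```python
import numpy as np

def welch_psd(x, seg_len=256, overlap=128, fs=44100):
    """Average windowed periodograms over overlapping segments
    (Welch-style), so only one segment is held in memory at a time."""
    window = np.hanning(seg_len)
    norm = fs * np.sum(window ** 2)          # density scaling
    step = seg_len - overlap
    acc = np.zeros(seg_len // 2 + 1)
    n_segs = 0
    for start in range(0, len(x) - seg_len + 1, step):
        seg = x[start:start + seg_len] * window
        acc += (np.abs(np.fft.rfft(seg)) ** 2) / norm
        n_segs += 1
    return acc / max(n_segs, 1)

x = np.random.randn(22000)                   # ~0.5 s of audio at 44 kHz
psd = welch_psd(x)
print(psd.shape)                             # (129,) - one value per frequency bin
```

The memory saving is clear: a 22,000-sample record is reduced to a 129-bin spectrum without ever storing the whole record in a frequency-domain buffer, though each sample is still touched once per overlapping segment, so the computational cost is not necessarily lower.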
In some embodiments of the invention, an alternative method is used: computing an autoregressive model (AR-model), for example as computed by the Matlab arx command, or the corresponding recursive Matlab command, rarx, which are optionally coded on a glass break detector processor. Using the above method, the number of identified AR-parameters is suggested to be on the order of 30, that is, a captured signal is condensed into approximately 30 numbers only, computed in a recursive manner, handling each sample when it arrives without a need to save the sample.
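In batch form, the `arx`-style fit mentioned above is an ordinary least-squares problem: predict each sample from its predecessors and solve for the coefficients that minimize the squared prediction error. The following Python/NumPy sketch shows this idea; the function name `fit_ar` and the synthetic demo signal are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def fit_ar(y, order):
    """Batch least-squares AR fit in the arx sign convention:
    y(k) is predicted by -a1*y(k-1) - a2*y(k-2) - ... - a_n*y(k-n)."""
    N = len(y)
    # Regressor row for sample k holds y[k-1], y[k-2], ..., y[k-order]
    Phi = np.column_stack([y[order - 1 - i : N - 1 - i] for i in range(order)])
    # Solve Phi @ a ≈ -y[k] in the least-squares sense
    a, *_ = np.linalg.lstsq(Phi, -y[order:], rcond=None)
    return a

# Demo on synthetic AR(2) data: y[k] = 0.5*y[k-1] - 0.3*y[k-2] + noise,
# i.e. true parameters a1 = -0.5, a2 = 0.3 in the convention above.
rng = np.random.default_rng(0)
y = np.zeros(10000)
e = 0.1 * rng.standard_normal(10000)
for k in range(2, 10000):
    y[k] = 0.5 * y[k - 1] - 0.3 * y[k - 2] + e[k]
a = fit_ar(y, order=2)
print(np.round(a, 2))        # close to [-0.5, 0.3]
```

A recursive variant (as with Matlab's `rarx`) would update the same coefficient vector sample by sample, which is what allows an embedded detector to discard each sample after processing it.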
In some embodiments of the invention, an impulse response of the AR-model is an estimate of what a sound would be if provoked by a single impulse, such as an audio response of a glass pane when broken by a sudden blow.
In some embodiments the AR-model also acts as a "prewhitening" filter of a captured sound, that is, when a captured sound is passed through an inverse of the AR-model, white noise is produced.
In some embodiments of the invention, glass break sound signals are defined as ranges of AR parameters.
In some embodiments, a sound signal which produces even one AR-parameter outside the glass break AR range is considered a non-glass break event. Also, as described herein, with the AR-model, the PSD of a captured sound signal is optionally computed, even on a glass-break-detector processor, which is considered weaker than today's typical personal computers.
The PSD optionally computed from an AR model according to an example embodiment of the invention is equivalent to a PSD computed by a batch windowed FFT, and is optionally used to classify events into glass break events and other events, by comparing the PSD with glass break standards, or by computing a power spectrum in selected frequency ranges and comparing with glass break standards.
In some embodiments of the invention an iterative method is used for processing sound and classifying acoustic signatures. The iterative method can be used to provide an advantage of on-the-fly computing, without saving too much of a stream of digitized audio samples in memory.
In some embodiments of the invention, such as a glass break detector in a security device, a relatively simple and inexpensive processing unit, such as an ARM7 or a Cortex-M3, may be used. Using a simple, inexpensive processing unit, especially one such as described above, which is already produced in large numbers for use in mobile phones, provides a benefit of lowering the cost of the security device, while still providing smart processing and classification of sound.
According to an aspect of some embodiments of the present invention there is provided a method for detecting sounds of breaking glass within an audio signal, the method including receiving an audio signal, digitizing the audio signal, producing a digitized audio signal, producing a plurality of autoregressive (AR) model parameters based, at least in part, on the digitized audio signal, comparing the plurality of AR parameters to a characterizing set of AR parameters associated with sounds of breaking glass, and determining whether the audio signal includes sounds of breaking glass based on a result of the comparison.
According to some embodiments of the invention, the receiving, the digitizing, the producing, the comparing, and the determining are performed by an alarm device.
According to some embodiments of the invention, the producing a plurality of AR parameters is performed by an iterative process.
According to some embodiments of the invention, the characterizing set of AR parameters includes an upper limit value and a lower limit value for each one of the AR parameters, the upper and lower limits defining a range within which a single AR parameter is within the characterizing set.
According to some embodiments of the invention, the characterizing set of AR parameters includes a central value for each one of the AR parameters and a range of departure from the central value for each one of the AR parameters, the central value and the range of departure defining whether an AR parameter is within the characterizing set.
According to some embodiments of the invention, the determining includes determining that each one of the plurality of AR parameters is within the characterizing set. According to some embodiments of the invention, the determining includes determining that a threshold percentage of the plurality of AR parameters is within the characterizing set.
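The strictest rule above, every parameter inside its characterizing range, is a simple element-wise bounds check. As a sketch only, with hypothetical names (`within_characterizing_set`, `lower`, `upper`) and made-up example bounds:

```python
import numpy as np

def within_characterizing_set(params, lower, upper):
    """True only if every AR parameter falls inside its [lower, upper] range."""
    p = np.asarray(params, dtype=float)
    return bool(np.all((p >= np.asarray(lower)) & (p <= np.asarray(upper))))

# Hypothetical per-parameter limits for a two-parameter model
lower = np.array([-0.6, 0.1])
upper = np.array([-0.4, 0.4])
print(within_characterizing_set([-0.5, 0.3], lower, upper))  # True
print(within_characterizing_set([-0.5, 0.5], lower, upper))  # False
```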
According to some embodiments of the invention, further including producing Power Spectrum Density (PSD) values based, at least in part, on the digitized audio signal, and in which the comparing further includes comparing the PSD values to characterizing PSD values associated with the specific class of sounds.
According to some embodiments of the invention, the characterizing PSD values include an upper limit to PSD values and a lower limit to PSD values for at least some of a range of PSD values, the upper and lower limits defining a range within which PSD values are within the characterizing PSD values.
According to some embodiments of the invention, the characterizing PSD values include a central PSD value and a maximum allowed delta from the central PSD value for at least some of the range of PSD values, the central PSD value and the maximum allowed delta defining a range within which PSD values are within the characterizing PSD values.
According to some embodiments of the invention, the determining includes determining that all of the PSD values are within the characterizing PSD values. According to some embodiments of the invention, the determining includes determining that a threshold percentage of the PSD values are within the characterizing PSD values.
According to an aspect of some embodiments of the present invention there is provided a method for preparing a characterizing set of AR parameters for use in detecting sounds of breaking glass within an audio signal, the method including capturing an audio signal belonging to the specific class of sounds, digitizing the audio signal, producing a digitized audio signal, producing a plurality of AR parameters based, at least in part, on the digitized audio signal, storing the plurality of AR parameters as a characterizing set of AR parameters associated with sounds of breaking glass.
According to some embodiments of the invention, further including calculating an upper limit value and a lower limit value for each one of the AR parameters, and the storing including storing the upper and lower limits associated with the specific class of sounds.
According to some embodiments of the invention, further including calculating a central value for each one of the AR parameters and a maximum allowed delta from the central value for each one of the AR parameters, the central value and the maximum allowed delta defining a range within which a single AR parameter is within the characterizing set, and the storing including storing the central value and the maximum allowed delta associated with the specific class of sounds.
According to some embodiments of the invention, further including producing Power Spectrum Density (PSD) values based, at least in part, on the digitized audio signal, and in which the storing further includes storing the PSD values to characterizing PSD values associated with the specific class of sounds.
According to some embodiments of the invention, further including calculating an upper limit value and a lower limit value for the PSD values, and the storing including storing the upper and lower limits associated with the specific class of sounds.
According to some embodiments of the invention, further including calculating a central value for the PSD values and a maximum allowed delta from the central value for the PSD values, the central value and the maximum allowed delta defining a range within which a PSD value is within the characterizing PSD values, and the storing including storing the central value and the maximum allowed delta associated with the specific class of sounds.
According to some embodiments of the invention, the capturing and the digitizing are performed at a first location, further including transmitting the digitized audio signal to a second location, and the producing the plurality of AR parameters is performed at the second location.
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.
Implementation of the method and/or system of embodiments of the invention can involve performing or completing selected tasks manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of embodiments of the method and/or system of the invention, several selected tasks could be implemented by hardware, by software or by firmware or by a combination thereof using an operating system.
For example, hardware for performing selected tasks according to embodiments of the invention could be implemented as a chip or a circuit. As software, selected tasks according to embodiments of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In an exemplary embodiment of the invention, one or more tasks according to exemplary embodiments of method and/or system as described herein are performed by a data processor, such as a computing platform for executing a plurality of instructions. Optionally, the data processor includes a volatile memory for storing instructions and/or data and/or a non-volatile storage, for example, a magnetic hard-disk and/or removable media, for storing instructions and/or data. Optionally, a network connection is provided as well. A display and/or a user input device such as a keyboard or mouse are optionally provided as well.
BRIEF DESCRIPTION OF THE DRAWINGS
Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.
In the drawings:
FIG. 1 is a graph showing AR parameters of various sounds and upper and lower limits to each one of the AR parameters according to an example embodiment of the invention in use for detecting sounds of glass breaking;
FIG. 2 is a graph showing optional PSD values of various sounds and upper and lower limits to the PSD values according to the example embodiment of FIG. 1;
FIG. 3 is a simplified flow chart representation of a method for detecting sounds of breaking glass, according to an example embodiment of the invention;
FIG. 4 is a simplified flow chart representation of a method for preparing a characterizing set of AR parameters for use in detecting sounds of breaking glass within an audio signal according to an example embodiment of the invention; and
FIG. 5 is a simplified flow chart representation of a method of using an AR filter method to classify sounds according to an example embodiment of the invention.
DESCRIPTION OF SPECIFIC EMBODIMENTS OF THE INVENTION
The present invention, in some embodiments thereof, relates to audio signal analysis for detection of glass breakage and, more particularly, but not exclusively, to a method and device allowing computationally efficient audio signal analysis for detection of glass breakage.
The present invention, in some embodiments thereof, relates to applying a pre-whitening Auto Regressive (AR) filter model to an audio signal. The AR-model condenses an audio signal possibly consisting of tens, hundreds, or many thousands of samples into about 30 AR parameters, and may include an optional computation of the Power Spectral Density (PSD) of the audio signal. Classification of the audio signal into, for example, a glass break event or a non-glass break event is based on the various measures of the AR-parameters and/or the PSD.
In some embodiments of the invention, an audio signal is captured and digitized. An Auto Regressive (AR) filter model is used to condense the captured audio signal, which may include tens or even hundreds of thousands of samples, to a smaller number of parameters, for example about 20, 30, 40, 50 or more parameters. The AR filter model also enables optionally computing a Power Spectral Density (PSD) of the captured signal. The AR parameters, and optionally also the PSD parameters, are compared to a set of parameters which characterize a specific class of sounds, for example sounds of glass breaking. The comparison enables a determination whether the audio signal belongs to the specific class of sounds, and/or whether the audio signal contains sounds which belong to the specific class of sounds.
For example, classification of an audio signal into glass break, and non-glass break, sound events is optionally based on various measures of the PSDs and on AR-parameters of the sound events. An efficient feature of a method according to an embodiment of the present invention is that the method enables computation of time-varying AR-parameters, such as one AR-model for each sub-interval of a sound signal, and optionally also time-varying PSDs, to form additional classification measures.
In some embodiments of the invention post-processing of the AR-parameters and the PSDs is tuned so as to distinguish between glass break, and non-glass break events.
In some embodiments of the invention the comparison includes determining whether values of the computed parameters are between upper and lower bounds for a class of sounds. If the values of the computed parameters are in a range between the upper and the lower bounds for a specific class of sounds, the audio signal may belong to the specific class of sounds.
In some embodiments of the invention the comparison includes determining whether values of the computed parameters are less than a specific delta away from characterizing values of the parameters for the class of sounds. If the values of the computed parameters are less than the specific delta away from characterizing values of the parameters for the class of sounds, the audio signal may belong to the specific class of sounds.
In some embodiments of the invention the comparison includes determining whether each one of the computed parameters are within the characterizing set, optionally as determined by the above-described comparisons. In some embodiments of the invention the audio signal is considered to belong to the specific class of sounds if each one of the computed parameters is within the characterizing set.
In some embodiments of the invention the comparison includes determining what percentage of the computed parameters are within the characterizing set, optionally as determined by the above-described comparisons. In some embodiments of the invention the audio signal is considered to belong to the specific class of sounds if the percentage is above a specific threshold percentage, such as 99%, 98%, 95%, 90%, 80%, 75%, 66%, and 50%.
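The percentage rule can be sketched as follows; the names `fraction_within` and `classify` are hypothetical, and the bounds in the demo are made up for illustration:

```python
import numpy as np

def fraction_within(params, lower, upper):
    """Fraction of parameters that fall inside their characterizing ranges."""
    p = np.asarray(params, dtype=float)
    inside = (p >= np.asarray(lower)) & (p <= np.asarray(upper))
    return float(np.mean(inside))

def classify(params, lower, upper, threshold=0.90):
    """Declare a class match when at least `threshold` of the parameters
    lie inside the characterizing set."""
    return fraction_within(params, lower, upper) >= threshold

lower = np.zeros(10)
upper = np.ones(10)
params = np.full(10, 0.5)
params[0] = 2.0                  # one of ten parameters out of range -> 90% inside
print(classify(params, lower, upper, threshold=0.90))  # True
print(classify(params, lower, upper, threshold=0.95))  # False
```

Raising `threshold` makes the detector more conservative, which connects directly to the error trade-off discussed next.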
In some embodiments of the invention the threshold is set to correspond to a user's trade-off between type I errors (false alarms) and type II errors (missed detections) that the user is willing to suffer. In applications where a false alarm is much costlier than a missed detection, the threshold percentage is set to be high.
The present invention, in some embodiments thereof, includes a method for preparing a characterizing set of AR parameters for use in detecting a specific class of sounds.
In some embodiments of the invention, the preparing includes capturing an audio signal belonging to the specific class of sounds, for example breaking glass, digitizing the audio signal, producing AR parameters based on the digitized audio signal, and optionally also producing PSD parameters of the audio signal, and storing the parameters as characterizing parameters for the specific class of sounds.
In some embodiments of the invention several samples of the specific class of sounds are captured, even many samples. Each one of the samples is used to produce AR parameters, and optionally PSD parameters.
In some embodiments of the invention, the several, or many, parameters are used to determine a high limit and a low limit to each of the parameters in the class of sounds. The limits are saved as associated with the specific class of sounds.
In some embodiments of the invention, the several, or many, parameters are used to determine a central value for each of the parameters in the class of sounds, and a delta value of a variation of each of the parameters in the class of sounds. The central value and the delta value are saved as associated with the specific class of sounds. The central value is optionally, for example, an average or a median. The variation is optionally, for example, a standard deviation, a multiple of a standard deviation such as three, four, or six standard deviations, or a maximum delta of a parameter value from the parameter's central value.
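The central-value-plus-delta training step can be sketched like this; the function name `characterizing_set`, the choice of mean and three standard deviations, and the tiny training matrix are illustrative assumptions:

```python
import numpy as np

def characterizing_set(training_params, k=3.0):
    """Per-parameter central value (mean) and allowed delta (k standard
    deviations) from a training matrix whose rows are the parameter
    vectors of individual training sounds."""
    P = np.asarray(training_params, dtype=float)
    center = P.mean(axis=0)
    delta = k * P.std(axis=0)
    return center, delta

# Hypothetical training matrix: 4 recordings, 3 AR parameters each
P = np.array([[1.0, 2.0, 3.0],
              [1.2, 2.1, 2.9],
              [0.9, 1.9, 3.1],
              [1.1, 2.0, 3.0]])
center, delta = characterizing_set(P)
print(center)        # per-parameter central values
```

An equivalent upper/lower-limit representation is simply `center - delta` and `center + delta`, which matches the bounded-range embodiments described earlier.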
In some embodiments of the invention, an entire sound signal segment which is to be analyzed is stored for the analysis.
In some embodiments of the invention, a sound signal segment which is to be analyzed is captured, digitized, and analyzed, producing parameters such as AR- parameters and/or PSD parameters, and only the parameters are kept, either short term or stored for further analysis.
In some embodiments of the invention, an initial portion of the sound signal, from a triggering event to a decision whether to continue capturing the sound for analysis, is stored. The stored signal is optionally classified by methods as described below, optionally followed sequentially by a remainder of the sound signal to be analyzed, optionally in real time.
In some embodiments of the invention, the triggering event is optionally a noise loud enough to produce sounds which can be analyzed, that is, sounds having a maximum amplitude large enough to be suspected as glass breaking, and/or having sufficient low-frequency power to be suspected as glass breaking. The triggering event corresponds to what a simple loudness test of a captured signal would provide.
An example embodiment of the invention is now described with reference to identifying sounds of glass breaking.
The example embodiment has been performed under the following, non-limiting technical parameters:
sampling frequency: Fs=44100 Hz;
maximal signal excerpt length for analysis: 29,000 samples, corresponding to 0.6576 seconds;
a trigger level, for start of excerpt for analysis: 1/20 of maximal absolute signal amplitude;
frequency vector for PSD calculation: [1, 2, ..., 128]·Fs·π/128 rad/s;
dead time length after triggering, representing samples not to be included in the excerpt for analysis: 309 samples, corresponding to approximately 7 ms; and
a word length of each sample: 16 bits.
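Using the parameters above, excerpt selection (trigger at 1/20 of the maximal absolute amplitude, a 309-sample dead time, then up to 29,000 samples) can be sketched in Python; the name `select_excerpt` and the demo signal are assumptions for illustration:

```python
import numpy as np

FS = 44100          # sampling frequency [Hz]
MAX_LEN = 29000     # maximal excerpt length [samples] (~0.6576 s)
DEAD_TIME = 309     # samples skipped after triggering (~7 ms)

def select_excerpt(signal):
    """Find the first sample reaching 1/20 of the maximal absolute
    amplitude, skip the dead time, then keep up to MAX_LEN samples."""
    signal = np.asarray(signal, dtype=float)
    trigger_level = np.max(np.abs(signal)) / 20.0
    trigger_idx = int(np.argmax(np.abs(signal) >= trigger_level))
    start = trigger_idx + DEAD_TIME
    return signal[start : start + MAX_LEN]

# Demo: silence followed by a sustained loud sound
sig = np.concatenate([np.zeros(500), np.ones(35000)])
excerpt = select_excerpt(sig)
print(len(excerpt))      # 29000
```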
An example embodiment of the invention as implemented in Matlab code
Matlab code which may be used to implement the example embodiment described below is provided in Appendix 1 below. The process implemented by the Matlab code is now generally described as a process which includes:
1. Read an audio signal.
2. Find a maximum absolute amplitude of the audio signal = maxamp.
3. Select an excerpt of the audio signal that starts 309 samples after the trigger level, for example maxamp/20, has been reached, and let the signal encompass 29,000 samples, or less if the captured audio signal is too short for that.
4. Normalize the excerpt such that its mean is 0 and its standard deviation is 1. It is noted that such a normalization is not done in a real time setting, and is also unnecessary when an AR-model is to be used, since the AR-model is normalized even when based on non-normalized data. However, in a real time setting, if an estimate of the signal strength is desired, the mean and standard deviation may be calculated in a recursive fashion.
5. Compute a 30th order (for example) AR-model by a least squares method, for example the Matlab function M30 = arx (data, 30) as described in Appendix 1 below.
Note: a most suitable model order is optionally found. In the presently described example embodiment an AR-model of order 30 gave rise to a computed signal spectrum (PSD) sufficiently similar to a PSD computed by direct application of a suitably windowed Fast Fourier Transform (etfe(data, 100,128) in Appendix 1).
In the example embodiment of the invention an AR-model is defined as follows: Let y(k) be the kth sample, and z(k) a model prediction of y(k), given measurements y(k- 1), y(k-2), y(k-30). The AR-model is defined as:
z(k) = -a1y(k-1) - a2y(k-2) - ... - a30y(k-30)
By the least squares (LS) method, the coefficients a1, ..., a30 are those that minimize a sum of squared prediction errors = (y(1)-z(1))^2 + (y(2)-z(2))^2 + ... + (y(N)-z(N))^2, where N is the number of samples.
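The batch least-squares fit just described can be sketched in Python as an illustrative analogue of the Matlab arx call (not a drop-in replacement for it); the function name fit_ar_ls is an assumption made here.

```python
import numpy as np

def fit_ar_ls(y, order=30):
    """Estimate AR coefficients a1..a_order by batch least squares,
    minimizing sum_k (y(k) - z(k))^2 where
    z(k) = -a1*y(k-1) - ... - a_order*y(k-order)."""
    N = len(y)
    # Regressor row for sample k: [-y(k-1), -y(k-2), ..., -y(k-order)]
    Phi = np.column_stack([-y[order - i:N - i] for i in range(1, order + 1)])
    # Solve the overdetermined system Phi @ a ~= y[order:] in the LS sense
    a, *_ = np.linalg.lstsq(Phi, y[order:], rcond=None)
    return a
```

The returned vector holds a1, ..., a_order; the leading coefficient of the AR polynomial is 1 by definition.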
6. Noting that the AR-model is independent of scaling of the signal y, the AR-model is used to compute a normalized PSD as follows, where A(w) stands for an amplitude of a sinus component of the Fourier transform, or Fourier series, of the modeled normalized glass-break signal at a frequency w [rad/s], with w ∈ [0, (Fs/2)·2π], where Fs/2 [Hz] is the Nyquist frequency, and with s = e^(jw/Fs),
A(w) = 1/(s^30 + a1s^29 + a2s^28 + ... + a29s + a30)
In some embodiments of the invention, in order to find out the PSD, a power versus frequency, for example |A(w)|^2 vs. w, or using a log-log scale (Bode diagram) 10·log10(|A(w)|^2) vs. log10(w), is calculated and plotted. To find out power content over a specific frequency band, for example from w1 to w2 [rad/s], a numerical integration of |A(w)|^2 from w1 to w2 is performed, for example by the Euler approximation.
In Appendix 1 below, a frequency vector for the PSD calculation, whose elements correspond to w, is F = [1,2,...,128]·Fs·π/128 [rad/s]. A Matlab command to compute amplitudes of the Fourier series is mag30F = abs(1./polyval(M30.a,exp(j*F/Fs)));. A subsequent PSD computation and display is optionally performed normalized relative to its maximum value.
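The PSD computation from the AR polynomial can be illustrated in Python; this mirrors the Matlab polyval line quoted above, and the function name ar_psd is an assumption made here.

```python
import numpy as np

def ar_psd(a, Fs, n_freqs=128):
    """Amplitude spectrum A(w) of an AR model, evaluated on the
    frequency grid F = [1,...,n_freqs]*Fs*pi/n_freqs [rad/s].
    a is the full polynomial coefficient vector [1, a1, ..., an]."""
    F = np.arange(1, n_freqs + 1) * Fs * np.pi / n_freqs
    # A(w) = 1/(s^n + a1*s^(n-1) + ... + an), with s = exp(j*w/Fs)
    mag = np.abs(1.0 / np.polyval(a, np.exp(1j * F / Fs)))
    return F, mag / np.max(mag)   # normalized relative to its maximum
```

For a model with a single pole near s = 0.9, for example, the normalized amplitude peaks at the lowest frequency of the grid.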
Reference is now made to Figure 1, which is a graph 100 showing AR parameters of various sounds and upper and lower limits to each one of the AR parameters according to an example embodiment of the invention in use for detecting sounds of glass breaking.
Figure 1 depicts parameter sets of AR-models for 14 non-glass-break events, each one of which includes at least one parameter which falls outside an envelope produced from parameter sets of AR-models for 33 glass break events. The sounds of the 33 glass break events were produced by breaking glass panes embedded in window frames. The sounds were produced in a Glass Break laboratory.
Figure 1 has a Y-axis 110 depicting AR moving average (ARMA) parameter values, which are unit-less amplification and/or attenuation factors, and an X-axis 105 depicting an AR parameter number.
Figure 1 depicts 14 thin lines such as an example thin line 112. Each of the thin lines connects AR values of one captured non-glass-break audio signal from which the AR parameters were calculated. Figure 1 also depicts thick lines, which are a lower limit 115 and an upper limit 120. The lower limit 115 and the upper limit 120 are lower and upper limits for each AR parameter between which the AR parameters are classified as belonging to the specific class of sounds of glass breaking, according to the example embodiment of Figure 1. The lower limit 115 and the upper limit 120 were produced from upper and lower limits of parameters measured for 33 glass break events.
It is noted that the first ARMA parameter (number zero) is by definition equal to 1, so for the first ARMA parameter, the upper and lower limits coincide.
In some embodiments of the invention, a central value is defined for each AR parameter, and a delta value, or range by which a computed AR parameter may differ from the central value, are defined.
The central value and the delta also define lower and upper limits between which AR parameters are classified as belonging to the specific class of sounds of glass breaking, according to another example embodiment of the invention. In some embodiments of the invention, the lower limit 115 and the upper limit 120 are produced off-line, for example in a sound lab. Sound events of a specific class of sounds are produced, digitized, and a number of AR parameters are produced for each sound event. In some embodiments of the invention a number of sound events are produced, so as to provide a variety of sound events from the specific class.
In some embodiments of the invention choosing a good number of AR parameters for use in sound classification is optionally done by adding sound events of the class until the limits of a sound class change by less than a threshold amount, and/or until limits are defined to differentiate between a class and some other sound class.
By way of a non-limiting example, 30 AR parameters have been found to be suitable for classifying glass break sounds, providing a large enough number for classifying and a small enough number which does not require an undue computational load on a relatively simple and inexpensive processing unit, as described above in the Summary of the Invention.
In some embodiments of the invention choosing a suitable duration of a sound signal to digitize in order to classify the sound signal is optionally done according to known characteristics of the class of sounds. For example, panes of glass are broken in a laboratory, and the duration of the sound of glass breaking is optionally measured by an operator defining the duration of the sound of glass breaking by looking at a spectrum of the sound, optionally in a recording of the sound. In some embodiments of the invention, a duration of an audio sample for digitizing is set to be a maximal duration measured during the laboratory glass breakings.
In some embodiments of the invention, the number of parameters is 30, and the following equations are used as an Auto-Regressive (AR) model of a captured audio signal:
z(k) = -a1y(k-1) - a2y(k-2) - ... - a30y(k-30)    Equation 1
where k = a sample number; z(k) = a sample value predicted by the model; and y(k) = a sample of the audio signal.
The parameters a1, a2, ..., a30 are optionally found off-line (batch), optionally using a Least Squares method by minimizing:
Σ_{k=1}^{N} (y(k) - z(k))^2    Equation 2
Equation 2 may optionally be performed by using a standard least squares function of any one of the standard mathematical packages.
For each parameter ai, the lower limit 115 and the upper limit 120 are optionally a minimal ai and a maximal ai of parameter i in the samples.
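The per-parameter limits and the corresponding membership test can be sketched as follows; this is an illustrative Python sketch, and the function names are assumptions made here.

```python
import numpy as np

def parameter_envelope(param_matrix):
    """Given an (n_events x n_params) matrix of AR parameters from the
    learning set, return the per-parameter lower and upper limits
    (the analogue of limits 115 and 120 in Figure 1)."""
    return np.min(param_matrix, axis=0), np.max(param_matrix, axis=0)

def within_envelope(params, lower, upper):
    """True if every parameter falls inside the envelope."""
    return bool(np.all((params >= lower) & (params <= upper)))
```

A central-value-plus-delta representation, as described above, defines the same kind of range: lower = central - delta, upper = central + delta.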
It is noted that AR-modeling is possible with a standard quality, off the shelf microphone. It is also noted that quality of classification can depend on, inter alia, the dynamic range of the microphone.
The above description details how a set of upper and lower limits may be produced for AR parameters, so that sounds producing AR parameters between the upper and the lower limits may optionally be classified as belonging to a specific class of sounds.
The above described process of finding out sound classification parameters is termed herein a learning mode. Using the learned classification parameters to classify sounds is termed herein classification mode.
The learning mode may be performed based on glass-break sounds produced in a laboratory; on glass-break sounds recorded in a laboratory; on glass-break sounds received by a glass-break detector placed in a laboratory for learning; on glass-break sounds received by a glass-break detector located in-situ, in the location where it will act as a glass-break detector.
In some embodiments of the invention it is desirable to classify sounds as they happen. For example, classifying a sound as a glass breaking sound is optionally useful in security devices, where a sound of glass breaking may be interpreted as a window breaking, and an alarm may be initiated.
In order to detect sound classes, such as a glass-break sound, as they happen, an on-line implementation of the AR model is optionally desired. An example embodiment of an on-line implementation of an iterative Least Squares implementation is now described, which minimizes the sum of squares of a difference between a recorded sound level and a model output at each sample, as per Equation 2 above.
An example embodiment of the invention as implemented in real-time as an iterative "Recursive" Least Squares (RLS) method
In some embodiments the AR parameters are computed using an iterative least squares method.
Matlab also includes a command for an iterative, so-called recursive, least squares procedure called rarx. The rarx command, with the 'kf' option, is optionally used for comparison with a real time implementation described below, as well as with results obtained by an off-line least-squares fit using arx, as described in item 5 of the section titled "An example embodiment of the invention as implemented in Matlab code" above. The code of rarx.m is based on a description in the above-mentioned text book "System Identification - Theory for the User", found as open source code in the Matlab Identification Toolbox, and can be easily modified and adapted for a real time implementation of an embodiment of the invention.
A description of the Recursive Least Squares method in Kalman Filter form is now provided. The following are the notations used in the description:
Model order: n
Measured scalar signal at time k: y(k), k = 0, 1, 2, ..., N-1
AR-model: z(k) = -a1(k)y(k-1) - a2(k)y(k-2) - ... - an(k)y(k-n) = Φ^T(k)Θ(k)
where Φ(k) = [-y(k-1) -y(k-2) ... -y(k-n)]^T, and
Θ(k) = [a1(k) a2(k) ... an(k)]^T, and
z(k) = model output (prediction) for time k.
"System noise" covariance matrix: R, which is a user-given positive definite and symmetric (n×n) matrix
Parameter estimate covariance: P(k), which is a positive definite and symmetric (n×n) matrix
Kalman filter gain matrix (n×1): K(k)
An initialization of the above variables at time k=0 is:
Φ(0) = [0 0 ... 0]^T
Θ(0) = [0 0 ... 0]^T
R = I, the n×n identity matrix (other choices are possible, such as R = αI, where α>0)
P(0) = βI, β>0. For example: β=100,000.
k=0
In some embodiments of the invention the following pseudo-code describes iterative calculations optionally used as an Auto-Regressive (AR) model of a captured audio signal:
for k = 0, 1, 2, ..., N-1
K(k) = P(k)Φ(k)/(1 + Φ^T(k)P(k)Φ(k))
Θ(k+1) = Θ(k) + K(k)·(y(k) - Φ^T(k)Θ(k))
P(k+1) = P(k) + R - K(k)Φ^T(k)P(k)
k := k+1
The above iterative process produces, after N iterations, a result Θ(N), where:
previous measurement samples are: Φ(k) = [-y(k-1) -y(k-2) ... -y(k-n)]^T;
a variance of the estimated AR parameters is given by the diagonal elements of P(N);
current parameters are: Θ(k) = [a1(k) a2(k) ... an(k)]^T;
a system noise covariance matrix, optionally provided by a user, is marked: R;
a parameter estimate covariance matrix is marked: P(k); and
a Kalman filter gain matrix, computed in the first line of the iterative pseudo-code above, is marked: K(k).
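The iterative pseudo-code above can be rendered as a runnable Python sketch. The parameters alpha and beta correspond to α and β in the initialization; treating y(j) as zero for j<0, and the function name rls_ar, are assumptions made here for illustration.

```python
import numpy as np

def rls_ar(y, n=30, alpha=1.0, beta=1e5):
    """Recursive Least Squares in Kalman filter form, following the
    pseudo-code above. Returns the final estimate Theta(N) and the
    parameter estimate covariance P(N). R = alpha*I, P(0) = beta*I."""
    R = alpha * np.eye(n)
    P = beta * np.eye(n)
    theta = np.zeros(n)
    for k in range(len(y)):
        # Phi(k) = [-y(k-1), ..., -y(k-n)]^T, with y(j) = 0 for j < 0
        phi = np.array([-y[k - i] if k - i >= 0 else 0.0
                        for i in range(1, n + 1)])
        K = P @ phi / (1.0 + phi @ P @ phi)          # gain
        theta = theta + K * (y[k] - phi @ theta)     # parameter update
        P = P + R - np.outer(K, phi) @ P             # covariance update
    return theta, P
```

The diagonal of the returned P gives the variance of the estimated AR parameters, as noted above.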
In some embodiments of the invention, the system noise covariance matrix is chosen as a variance of the noise when what is practically silence is recorded, using appropriate units, which are a square of the units of y. The system noise covariance matrix R should be positive definite.
The matrix sizes correspond to the number of AR-parameters to be identified.
In some embodiments, at each k, P(k) is optionally checked to be positive definite. For example, the Matlab routine rarx does not perform such a check. The check is computationally expensive. The check is optionally done by computing eigenvalues of P(k), or, for example, by Gauss elimination of P(k) to a triangular form and checking that all diagonal elements of the triangular form are positive. If, during off-line experimentation with rarx, a non-positive definite P(k) appears, a remedy is to increase α in R = αI. One potential advantage of recursive modeling is that several models may be found, one for each time interval, for example for the first 0.3 seconds, for the next 0.3 seconds, for the following 0.3 seconds. In some embodiments, in order to produce several models, the P-matrix is optionally reset at a beginning of each interval, for example P(i·0.3 sec) = βI, β>0, i=0, 1, ..., (m-1), where the captured audio signal is divided into m sub-intervals.
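The positive-definiteness check by elimination to triangular form might look as follows; this is a sketch, not a numerically hardened test, and the function name is an assumption made here. For a symmetric matrix, all pivots being positive after Gauss elimination without pivoting is equivalent to positive definiteness.

```python
import numpy as np

def is_positive_definite(P):
    """Gauss-eliminate the symmetric matrix P to upper triangular form
    and check that every pivot (diagonal element) is positive."""
    A = np.array(P, dtype=float)
    n = A.shape[0]
    for i in range(n):
        if A[i, i] <= 0.0:
            return False          # non-positive pivot: not positive definite
        for r in range(i + 1, n):
            A[r, i:] -= (A[r, i] / A[i, i]) * A[i, i:]
    return True
```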
In some embodiments, more indicators are optionally computed for use in classification, in that a matrix of AR-parameters is computed, [Θ1, Θ2, ..., Θm], where Θi equals an AR-parameter vector for a sub-interval i.
In some embodiments two RLS-routines are computed in parallel, for example one to compute an AR-vector Θ valid for the whole interval, and one, with P-reset, to compute sub-interval AR-vectors Θi. It is noted that optionally it is possible to run only the latter sub-interval AR-routine to compute Θi, and then compute an estimate of Θ by an averaging procedure.
The above iterative method provides an advantage of not requiring saving an entire stream of digitized audio samples in memory.
Another potential advantage of the above iterative method is that an optional calculation of Power Spectral Density (PSD) of the audio samples is also enabled.
Noting that the AR-model is independent of scaling of the signal y, in some embodiments of the invention a normalized PSD is computed as follows:
A(w) = 1/(s^30 + a1s^29 + a2s^28 + ... + a29s + a30)    Equation 3
where A(w) is an amplitude of a sinus component of a Fourier transform, or Fourier series, of a normalized signal at a frequency w [rad/s], with w ∈ [0, (Fs/2)·2π], where Fs/2 [Hz] is the Nyquist frequency, and with s = e^(jw/Fs).
In some embodiments of the invention, in order to find out the PSD, a power versus frequency, for example |A(w)|^2 vs. w, or using a log-log scale (Bode diagram) 10·log10(|A(w)|^2) vs. log10(w), is calculated and plotted. To find out power content over a specific frequency band, for example from w1 to w2 [rad/s], a numerical integration of |A(w)|^2 from w1 to w2 is performed, for example by the Euler approximation. PSD is typically computed by using an FFT, yet the above-mentioned method uses less memory than an FFT would use.
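The Euler (rectangle) approximation for the band power mentioned above can be sketched as follows; this assumes a uniformly spaced frequency grid, and the function name band_power is an assumption made here.

```python
import numpy as np

def band_power(F, psd, w1, w2):
    """Integrate psd = |A(w)|^2 over [w1, w2] by the Euler (rectangle)
    approximation, on a uniformly spaced frequency grid F [rad/s]."""
    dw = F[1] - F[0]                 # uniform grid spacing
    mask = (F >= w1) & (F <= w2)     # samples inside the band
    return np.sum(psd[mask]) * dw
```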
In an example embodiment of the invention, used for detecting breaking glass in a security setting, audio samples are digitized to capture audio frequencies up to approximately 20 KHz, at a rate of approximately 40,000 samples per second.
Yet, in some embodiments, at any time only approximately 280 samples are kept in memory, corresponding to approximately 7 milliseconds of sound. In some embodiments more or fewer samples may be kept in memory, for example 1024 samples, or 100 samples.
In some embodiments of the invention the 7 milliseconds include an initial sample, which optionally serves to detect if there is a possible glass break or not, and a decision is optionally taken if further analysis, of additional sound captured after the initial 7 milliseconds, should be done.
In some embodiments of the invention, such as a glass break detector in a security device, a relatively simple and inexpensive processing unit, such as an ARM-7, or a Cortex M3, may be used. Using a simple, inexpensive processing unit, especially such as described above, which are already produced in large numbers for use in mobile phones, provides a benefit of lowering cost of the security device, while still providing smart processing and classification of sound.
Reference is now made to Figure 2, which is a graph 200 showing optional PSD values of various sounds and upper and lower limits to the PSD values according to the example embodiment of Figure 1.
Figure 2 has a Y-axis 210 depicting PSD values, in units of power spectrum (sound amplitude squared and divided by 2) and an X-axis 205 depicting sound frequency in terms of rad/s.
Figure 2 depicts several thin lines such as example thin lines 212, each of which graphs optional PSD values of one captured non-glass-break audio signal. Figure 2 also depicts thick lines, which are a lower limit 215 and an upper limit 220. The lower limit 215 and the upper limit 220 are lower and upper limits between which the optional PSD values are classified as belonging to the specific class of sounds of glass breaking, according to the example embodiment of Figure 1. Figure 2 is a Bode diagram of 14 power normalized non-glass event AR- computed PSDs, depicted by the thin lines 212, all of which, at least at some frequencies, fall outside an envelope of the lower limit 215 and the upper limit 220. The lower limit 215 and the upper limit 220 were produced by taking an outer envelope of 33 AR-computed power-normalized PSDs of 33 glass-break sound events. The sound events were produced in the Crow Glass Break laboratory on 8 August 2011.
Tuning for classification
In some embodiments of the invention tuning for classification is performed by capturing two sets of audio measurements; one set of glass break events, and one set of non-glass break events. Capturing both glass break events and non-glass break events optionally enables tuning for the glass break sounds, and then optionally verifying that the non-glass break events do not produce an undue number of false alarms, beyond a certain acceptable limit.
It is noted that instructions for capturing such sound sets in a standardized way may be found in, for example, above mentioned reference BSI British Standards, DD CLC/TS 50131-2-7-1:2009, Alarm systems— Intrusion and hold-up systems— Part 2- 7-1: Intrusion detectors— Glass break detectors (acoustic).
A set of glass break events produce a set of AR-parameter vectors, or matrices, as described above. From the AR-parameters, PSD-values are optionally computed at a desired number of frequencies, also as described above.
For each sound recording, M classification values {cij} are optionally computed, where i is a recording index, and j=1,...,M. In the example depicted in Figure 2, 128 PSD-values were computed for 128 frequencies for each sound recording, and for the same recordings, in Figure 1, 31 AR-values for each recording were computed, so that for these recordings M=159.
A definition glass break set is defined as a set of glass break events which can characterize sounds of glass breaking. A definition non-glass break set is defined as a set of non-glass break events, used in a tuning procedure.
For the definition glass break set, and for j=1,...,M, [mini(cij), maxi(cij)] defines a glass break envelope of allowed glass break classification values. In Figures 1 and 2, the envelopes are marked with black bold face. Tuning is optionally performed such that for the definition non-glass break set sufficiently many classification values r (r < M) fall outside the glass break envelope. A determination of r, and a specification of R = a set of classification values that are allowed to fall inside the glass break envelope, is an essential part of tuning. A new sound event is classified as a glass break event or as a non-glass break event according to a number of classification values inside the glass break envelope, r, and R.
In the example embodiment depicted in Figures 1 and 2, a non-glass event is defined as one where at least one PSD-value, and at least one AR-parameter value fall outside the envelope.
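The envelope construction and the classification rule of Figures 1 and 2 can be sketched as follows. The split of classification values into an AR part and a PSD part via n_ar, and the function names, are illustrative assumptions made here.

```python
import numpy as np

def build_envelope(glass_values):
    """glass_values: (n_events x M) matrix of classification values
    {cij} from the definition glass break set. The envelope is the
    per-column [min, max] over the set."""
    return np.min(glass_values, axis=0), np.max(glass_values, axis=0)

def classify_event(c, lower, upper, n_ar=31):
    """Classify a new event from its M classification values c
    (first n_ar AR values, remaining PSD values). Per the rule of
    Figures 1 and 2: a NON-glass event has at least one AR value AND
    at least one PSD value outside the envelope. Returns True when
    the event is classified as a glass break."""
    outside = (c < lower) | (c > upper)
    ar_out = bool(np.any(outside[:n_ar]))
    psd_out = bool(np.any(outside[n_ar:]))
    return not (ar_out and psd_out)
```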
Factory-set tuning
In some embodiments of the invention, the definition glass break set and the definition non-glass break set are optionally defined according to sounds produced at a factory or laboratory in physical settings defined by appropriate European standards.
In some embodiments of the invention, the definition sets are optionally defined together with other appropriate glass-break and non-glass break events from other sources, such as field recordings from cases where prior art glass break detectors have failed.
In some embodiments of the invention, a representative partial set of glass break detectors produced according to an embodiment of the invention is selected for discovering tuning parameters, being subjected to true glass break sounds, and/or to high-fidelity replay of these sounds, optionally in the physical setting defined by the European standards. Classification values are computed according to the procedure in the preceding paragraph, and the tuning parameters envelope, r, and R are determined. The tuning parameters are then optionally programmed into other glass break detectors which have not been selected for participating in the tuning.
In situ tuning
At a customer's site, the following procedure is optionally used to tune, and/or to further tune the envelope, r, and R.
An installation or service technician optionally provides tuning sounds at the site. The tuning sounds are optionally sufficiently loud to activate the glass break detector. The tuning sounds optionally include one or more of: a glass-break simulation such as a recording of glass breaking; one or more knocks, without breaking, on window panes in a room; and a non-glass-break simulator.
The tuning sounds are produced while optionally indicating to a tuning system, what sort of sound (glass-break simulation, glass pane knocks, and non-glass-break simulation) was produced.
In some embodiments of the invention the tuning system is built into a glass break detector.
In some embodiments of the invention the tuning system is a remote system. The remote system is optionally provided with sounds captured at the site, by providing sounds recorded at the site, and/or by using sound transmission capabilities of some security detector to transmit sounds captured on site to the remote tuning system. Remotely, an analysis of the classification values can be done, and the envelope, r, R of the local sensors may optionally be modified remotely.
Reference is now made to Figure 3, which is a simplified flow chart representation of a method for detecting sounds of breaking glass within an audio signal, according to an example embodiment of the invention.
The method depicted by Figure 3 includes:
receiving an audio signal (310);
digitizing the audio signal (315), producing a digitized audio signal;
producing a plurality of AR parameters (320) based, at least in part, on the digitized audio signal;
comparing the plurality of AR parameters to a characterizing set of AR parameters associated with sounds of breaking glass (325); and
determining whether the audio signal includes sounds of breaking glass (330).
In some embodiments of the invention, the method of Figure 3 is applied where the specific class of sounds comprises sounds of breaking glass.
In some embodiments of the invention the characterizing set of AR parameters comprises an upper limit value and a lower limit value for each one of the AR parameters, the upper and lower limits defining a range within which a single AR parameter is within the characterizing set. In some embodiments of the invention the characterizing set of AR parameters comprises a central value for each one of the AR parameters and a maximum allowed delta from the central value for each one of the AR parameters, the central value and the maximum allowed delta defining a range within which a single AR parameter is within the characterizing set.
In some embodiments of the invention the determining includes determining that each one of the plurality of AR parameters is within the characterizing set.
In some embodiments of the invention the determining includes determining that a threshold percentage of the plurality of AR parameters is within the characterizing set.
In some embodiments of the invention Power Spectrum Density (PSD) values are computed, based on the digitized audio signal, and the comparing includes comparing the PSD values to characterizing PSD values associated with the specific class of sounds.
In some embodiments of the invention the characterizing PSD values include an upper limit to PSD values and a lower limit to PSD values for at least some of a range of PSD values, the upper and lower limits defining a range within which PSD values are within the characterizing PSD values.
In some embodiments of the invention the characterizing PSD values include a central PSD value and a maximum allowed delta from the central PSD value for at least some of the range of PSD values, the central PSD value and the maximum allowed delta defining a range within which PSD values are within the characterizing PSD values.
In some embodiments of the invention the determining includes determining that all of the PSD values are within the characterizing PSD values.
In some embodiments of the invention the determining includes determining that a threshold percentage of the PSD values are within the characterizing PSD values.
Reference is now made to Figure 4, which is a simplified flow chart representation of a method for preparing a characterizing set of AR parameters for use in detecting sounds of breaking glass within an audio signal according to an example embodiment of the invention.
The method of Figure 4 includes:
capturing an audio signal belonging to the specific class of sounds (410);
digitizing the audio signal (415), producing a digitized audio signal;
producing a plurality of AR parameters (420) based, at least in part, on the digitized audio signal; and
storing the plurality of AR parameters as a characterizing set of AR parameters associated with sounds of breaking glass (425).
In some embodiments of the invention, the specific class of sounds comprises sounds of breaking glass.
In some embodiments of the invention, an upper limit value and a lower limit value for each one of the AR parameters are calculated, and the upper and lower limits are stored associated with the specific class of sounds.
In some embodiments of the invention, a central value for each one of the AR parameters is calculated, and optionally also a maximum allowed delta from the central value for each one of the AR parameters. The central value and the optional maximum allowed delta define a range within which a single AR parameter is within the characterizing set.
In some embodiments of the invention the storing includes storing the central value and the optional maximum allowed delta associated with the specific class of sounds.
In some embodiments of the invention Power Spectrum Density (PSD) values are computed, based on the digitized audio signal.
In some embodiments of the invention the PSD values are stored associated with the specific class of sounds.
In some embodiments of the invention an upper limit value and a lower limit value for the PSD values are calculated and stored associated with the specific class of sounds.
In some embodiments of the invention a central value for the PSD values and a maximum allowed delta from the central value for the PSD values are calculated, the central value and the maximum allowed delta defining a range within which a PSD value is within the characterizing PSD values, and stored associated with the specific class of sounds.
Reference is now made to Figure 5, which is a simplified flow chart representation of a method of using an AR filter method to classify sounds according to an example embodiment of the invention. The method of Figure 5 includes:
receiving an audio signal (510);
digitizing the audio signal (515), producing a digitized audio signal;
producing a plurality of AR parameters (520) based, at least in part, on the digitized audio signal;
comparing the plurality of AR parameters to a plurality of stored sets of AR parameters, each set associated with a class of sounds (525); and
classifying the audio signal as belonging to a class of sounds if the plurality of AR parameters is similar to one of the plurality of stored sets of AR parameters (530).
Several methods of producing classifying limits for a specific class of sounds, according to an example embodiment of the invention, are now described. The process of producing the limits is termed herein calibration.
In a first example embodiment sound events belonging to the specific class of sounds are produced in a laboratory. The sounds are picked up and AR parameters, and optionally also PSD values, are calculated as described above. Based on the AR parameters, and optionally on the PSD values, upper and lower limits to the class of sounds, such as glass breaking sounds, are produced. Alternatively or additionally, central values and delta limits may be produced.
In some embodiments of the invention, especially if microphones with differing sensitivities are expected to be used in a classifying mode, calibration may be done per each microphone type, or even per specific microphone.
In some embodiments of the invention, especially if a housing of a microphone is expected to affect sound pickup, calibration may be done per each housing type, or even per specific housing and/or device.
In some embodiments of the invention, calibration is optionally done on- location, that is, at a site where the sound classification will take place. Such calibration is optionally typical for security devices for detecting sounds of breaking glass, which may be done on location, with breaking glass sounds for calibration optionally produced at locations of actual windows, possibly at different directions from a security device, possibly even in different rooms. In some embodiments of the invention, calibration is optionally done on- location, optionally by a security device placed in learning mode. After a learning session has ended, the security device may optionally be switched to classifying mode.
It is expected that during the life of a patent maturing from this application many relevant audio pick-up and digitizing devices, and many relevant processing units will be developed and the scope of the terms microphone, digitizer, and processing unit is intended to include all such new technologies a priori.
As used herein the terms "about" and "approximately" refer to ± 30 %.
The terms "comprising", "including", "having" and their conjugates mean "including but not limited to".
The term "consisting of" is intended to mean "including and limited to".
The term "consisting essentially of" means that the composition, method or structure may include additional ingredients, steps and/or parts, but only if the additional ingredients, steps and/or parts do not materially alter the basic and novel characteristics of the claimed composition, method or structure.
As used herein, the singular form "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a unit" or "at least one unit" may include a plurality of units, including combinations thereof.
The words "example" and "exemplary" are used herein to mean "serving as an example, instance or illustration". Any embodiment described as an "example" or "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments.
The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". Any particular embodiment of the invention may include a plurality of "optional" features unless such features conflict.
Throughout this application, various embodiments of this invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases "ranging/ranges between" a first indicate number and a second indicate number and "ranging/ranges from" a first indicate number "to" a second indicate number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.
All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting.

Appendix 1
Appendix 1 includes sample Matlab code, which was used in an example embodiment of the invention as described above.
% File D:\extra\crow\Per_ld\Pld15.m
% to create spectra and arx-parameters for professionally recorded glassbreak
% and non-glassbreak sounds from experiments in lab, found in directory
% D:\extra\crow\GBD_08_11_basement\STEREO.
% Further compares these models against COMPOSITE
% glassbreak id AND ver bounds found in
% {d:\extra\crow\per_id\glassid.mat Union
% {d:\extra\crow\per_id\glassver.mat}, from library of glass break sounds in d:\extra\crow\data
% In this m-file NON GLASS BREAK EVENTS are treated.
% (See file Pid1.m for glass break events)
% Output: D:\extra\crow\GBD_08_11_basement\STEREO\non0811stereo.mat
% and documentation Pld15.html
cd D:\extra\crow\GBD_08_11_basement\STEREO
format compact
clear all
close all
scrsz = get(0,'ScreenSize'); %left bottom width height
%%%%%%%%%%%%%% read and treat new experiments
Var_length = 29000; % signal length restricted to ca 0.5 s.
Signl_PWR_Param = 20; % Find the signal according to this factor.
PsD = []; %column = PSD
MAG30F = []; %psd from arx
F = []; %rad/s
POL = []; %each col is a polynomial
ROO = [];
Fs = 44100; %Hz %professional recording
F = [1:128]'*Fs*pi/128; %rad/s freq vector for spectrum calc below
% Fs=32000; %Hz % microphones
Nbits = 16; %bits
ms_7_smpl= round(0.007*Fs); %7 ms, ms_7_smpl=309
%Professional recording of glassbreak
stringg = '0';
suffg = char('01','02','03','04','05','06','06-02','07','08','09','10',...
    '11','12','13','14','15','15-01','16','17','18','19','20','21','22',...
    '23','24','25','26','27','28','29','30','30-01');
%Professional recording of glass not breaking, or other sounds
stringn = '0';
suffn = char('06-01','17-01','27-01','31','31-01','32','32-01','33','34',...
    '35','35-01','36','36-01','36-02');
%33 should be subdivided into several sound portions
%%%% non glassbreak %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[n,m] = size(suffn);
for k = 1:n,
name=deblank([stringn,suffn(k,:)]); % F, and Fs given above
% for microphone readings
% fid = fopen(name,'r');
% String = fscanf(fid,'%s');
% y1 = sscanf(String, ' %4x ',[1 inf]);
[y1,Fss,Nbits] = wavread([name,'.wav']);
if max(abs(Fss-Fs))>0, disp(['k=',num2str(k),...
'. Sampling frequency error']); end
%Fs=44100, Nbits=16
y1 = y1(:,1)'; %row vec
y2 = (y1-mean(y1))/max(abs(y1-mean(y1))); %Normalize
figure('Position', ...
[0.59*scrsz(3) 0.35*scrsz(4) 0.41*scrsz(3) 0.55*scrsz(4) ]);
plot(y2), grid on, xlabel('sample no'),
ylabel('normalized signal value \in [-1,1]')
title([name,' at 44100 Hz. std = ',num2str(std(y2)) ])
sound(y2,Fs)
%pause
Max_amp = max(abs(y2)); %Max Value found = 1
silent_smpl = 1;
while abs(y2(silent_smpl)) < (Max_amp/Signl_PWR_Param)
    silent_smpl = silent_smpl+1;
end
%disp(['silent_smpl = ',num2str(silent_smpl)]);
title([name,' at 44100 Hz. std=',num2str(std(y2)),...
' silent\_smpl=',num2str(silent_smpl) ])
hold on
plot(silent_smpl,y2(silent_smpl),'ro');
%pause
y3 = y2([(silent_smpl+ms_7_smpl):length(y2)]);
y4 = y3(1:min(Var_length,length(y3)));
plot(silent_smpl+ms_7_smpl,y2(silent_smpl+ms_7_smpl),'r*')
plot(silent_smpl+ms_7_smpl+min(Var_length,length(y3)),...
    y2(silent_smpl+ms_7_smpl+min(Var_length,length(y3))),'r*')
%pause
y5 = (y4-mean(y4))/std(y4); %mean=0, std=1, normalized signal excerpt
data = iddata(y5',[],1/Fs); %preparing data for matlab handling
S=etfe(data,100,128); %direct PSD computation by fft
PsD =[PsD,S.SpectrumData(:)];
%F = S.Frequency(:);
M30=arx(data,30); %AR-model computation
figure('Position',...
[0.59*scrsz(3) 0.35*scrsz(4) 0.41*scrsz(3) 0.55*scrsz(4)]);
bode(S, M30), grid, %displaying Bode diagram of PSD, and of AR-model
title(['PSD norm 0.6 sec of ',name,' etfe(data,100,128), arx(data,30)'])
sound(y5/max(abs(y5)),Fs); %sound accepts only values \in [-1,1]
%pause
[MAG30,PHASE30,F30] = bode(M30);
mag30alt = abs(1./polyval(M30.a,exp(j*F30/Fs))); %arx freq vector
mag30F = abs(1./polyval(M30.a,exp(j*F/Fs))); %orig freq vector
%computation of PSD from AR-model
MAG30F = [MAG30F,mag30F];
figure('Position', [0.59*scrsz(3) 0.35*scrsz(4) 0.41*scrsz(3) 0.55*scrsz(4)]);
plot(F,sqrt(PsD(:,end))/max(sqrt(PsD(:,end))),F,mag30F/max(mag30F),...
F30,mag30alt/max(mag30alt),F30,MAG30(:)/max(MAG30(:)))
legend('sqrt(PSD)','mag30F=1/a at Fs','mag30alt=1/a at F30','MAG30=bode(M30)'), grid on
xlabel('rad/s'), ylabel('amp')
title(['normalized sqrt spectra of ',name,' computed differently'])
%pause
figure('Position', [0.59*scrsz(3) 0.35*scrsz(4) 0.41*scrsz(3) 0.55*scrsz(4)]);
loglog(F,sqrt(PsD(:,end))/max(sqrt(PsD(:,end))),F,mag30F/max(mag30F),...
F30,mag30alt/max(mag30alt),F30,MAG30(:)/max(MAG30(:)))
legend('sqrt(PSD)','mag30F','mag30alt','MAG30'), grid
xlabel('rad/s'), ylabel('amp')
title(['normalized sqrt spectra of ',name,' computed differently'])
%pause
pol = M30.a; %AR-polynomial coefficient values
dev=M30.da; % standard deviations of AR-polynomial coefficient values
POL = [POL, pol'];
%' arx model A-polynomial and std'
%[pol',dev']; %'poles'
figure('Position', [0.59*scrsz(3) 0.35*scrsz(4) 0.41*scrsz(3) 0.55*scrsz(4)]);
%plot([pol+M30.da;pol;pol-M30.da]'),grid on, hold on
stairs([0.5:1:31.5], [[pol+M30.da,pol(end)+M30.da(end)];...
    [pol,pol(end)];[pol-M30.da,pol(end)-M30.da(end)]]'), hold on, grid on
xlabel('parameter number'), ylabel('value')
title(['arx parameters +- 1 sigma for ',name])
%pause
[Wn,Zn] = damp(roots(pol),1/Fs);
figure('Position', [0.59*scrsz(3) 0.35*scrsz(4) 0.41*scrsz(3) 0.55*scrsz(4)]);
plot(Wn,log10(Zn),'*') %shows freq of dominant peaks!
xlabel('rad/s'), ylabel('log10(damping coefficient)'), grid
title([name,' log damping coefficient as a function of nat freq of A-poles'])
%pause
figure('Position', [0.59*scrsz(3) 0.35*scrsz(4) 0.41*scrsz(3) 0.55*scrsz(4)]);
plot(roots(pol),'*'), title(['poles of arx model for ',name]), zgrid
ROO=[ROO,roots(pol)];
%pause
end
%save glassbreak data from D:\extra\crow\GBD_08_11_basement\STEREO
save non0811stereo F PsD MAG30F POL ROO

Claims

WHAT IS CLAIMED IS:
1. A method for detecting sounds of breaking glass within an audio signal, the method comprising:
receiving an audio signal;
digitizing the audio signal, producing a digitized audio signal;
producing a plurality of autoregressive (AR) model parameters based, at least in part, on the digitized audio signal;
comparing the plurality of AR parameters to a characterizing set of AR parameters associated with sounds of breaking glass; and
determining whether the audio signal includes sounds of breaking glass based on a result of the comparison.
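The detection steps of claim 1 can be sketched in a few lines. The following Python sketch is illustrative only: it estimates AR parameters with a Yule-Walker (autocorrelation) fit, whereas the example embodiment in Appendix 1 uses Matlab's least-squares `arx` estimator; the two yield the same polynomial asymptotically. All function names and the bound values are hypothetical.

```python
import numpy as np

def ar_parameters(signal, order=30):
    """Estimate AR(order) polynomial coefficients [1, a1, ..., ap]
    via the Yule-Walker (autocorrelation) equations."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz system R * phi = r[1:], phi are the one-step prediction coefficients
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    phi = np.linalg.solve(R, r[1:])
    # A(q) = 1 - phi1*q^-1 - ... , analogous to the arx model's A-polynomial
    return np.concatenate(([1.0], -phi))

def detect_glass_break(signal, lower, upper, order=30):
    """Claim 1: declare a detection when the estimated AR parameters
    fall within the characterizing set (here, per-parameter bounds)."""
    a = ar_parameters(signal, order)
    return bool(np.all((a >= lower) & (a <= upper)))
```

In this form the characterizing set is represented directly as per-parameter lower and upper limit vectors, anticipating the limit representation of claim 4.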
2. The method of claim 1 in which the receiving, the digitizing, the producing, the comparing, and the determining are performed by an alarm device.
3. The method of any one of the above claims in which the producing a plurality of AR parameters is performed by an iterative process.
4. The method of any one of the above claims in which the characterizing set of AR parameters comprises an upper limit value and a lower limit value for each one of the AR parameters, the upper and lower limits defining a range within which a single AR parameter is within the characterizing set.
5. The method of any one of claims 1-4 in which the characterizing set of AR parameters comprises a central value for each one of the AR parameters and a range of departure from the central value for each one of the AR parameters, the central value and the range of departure defining whether an AR parameter is within the characterizing set.
6. The method of any one of the above claims in which the determining comprises determining that each one of the plurality of AR parameters is within the characterizing set.
7. The method of any one of claims 1-5 in which the determining comprises determining that a threshold percentage of the plurality of AR parameters is within the characterizing set.
8. The method of any one of the above claims and further comprising producing Power Spectrum Density (PSD) values based, at least in part, on the digitized audio signal, and in which the comparing further comprises comparing the PSD values to characterizing PSD values associated with sounds of breaking glass.
9. The method of claim 8 in which the characterizing PSD values comprise an upper limit to PSD values and a lower limit to PSD values for at least some of a range of PSD values, the upper and lower limits defining a range within which PSD values are within the characterizing PSD values.
10. The method of claim 8 in which the characterizing PSD values comprise a central PSD value and a maximum allowed delta from the central PSD value for at least some of the range of PSD values, the central PSD value and the maximum allowed delta defining a range within which PSD values are within the characterizing PSD values.
11. The method of any one of claims 8-10 in which the determining comprises determining that all of the PSD values are within the characterizing PSD values.
12. The method of any one of claims 8-10 in which the determining comprises determining that a threshold percentage of the PSD values are within the characterizing PSD values.
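Claims 8 through 12 apply the same banded membership test to Power Spectrum Density values. The sketch below uses a plain periodogram for illustration; the example embodiment in Appendix 1 instead uses Matlab's `etfe`, a smoothed spectral estimate. Names and defaults here are hypothetical.

```python
import numpy as np

def psd_values(signal, nfft=256):
    """Raw periodogram estimate of the power spectral density."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    X = np.fft.rfft(x, nfft)
    return (np.abs(X) ** 2) / len(x)

def psd_matches(psd, lower, upper, threshold=1.0):
    """Claims 11/12: all (threshold=1.0) or a threshold percentage
    of the PSD values lie within the characterizing band."""
    p, lo, hi = (np.asarray(v, dtype=float) for v in (psd, lower, upper))
    return float(np.mean((p >= lo) & (p <= hi))) >= threshold
```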
13. A method for preparing a characterizing set of AR parameters for use in detecting sounds of breaking glass within an audio signal, the method comprising:
capturing an audio signal comprising sounds of breaking glass;
digitizing the audio signal, producing a digitized audio signal;
producing a plurality of AR parameters based, at least in part, on the digitized audio signal; and
storing the plurality of AR parameters as a characterizing set of AR parameters associated with sounds of breaking glass.
14. The method of claim 13 and further comprising calculating an upper limit value and a lower limit value for each one of the AR parameters, and the storing comprising storing the upper and lower limits associated with sounds of breaking glass.
15. The method of claim 13 and further comprising calculating a central value for each one of the AR parameters and a maximum allowed delta from the central value for each one of the AR parameters, the central value and the maximum allowed delta defining a range within which a single AR parameter is within the characterizing set, and the storing comprising storing the central value and the maximum allowed delta associated with sounds of breaking glass.
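For claims 13 to 15, the central value and maximum allowed delta of the characterizing set can be derived from AR parameter vectors extracted from a library of recorded glass-break sounds (such as the recordings processed in Appendix 1). The 3-sigma width below is an illustrative assumption, not taken from the specification:

```python
import numpy as np

def characterizing_set(training_params, n_sigma=3.0):
    """Per-parameter central value and maximum allowed delta.

    training_params: (n_examples, n_params) array, one AR coefficient
    vector per recorded glass-break sound.
    """
    P = np.asarray(training_params, dtype=float)
    central = P.mean(axis=0)
    delta = n_sigma * P.std(axis=0)  # assumed 3-sigma tolerance
    # Equivalent limit form (claim 14):
    #   lower = central - delta, upper = central + delta
    return central, delta
```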
16. The method of any one of claims 13-15 and further comprising producing Power Spectrum Density (PSD) values based, at least in part, on the digitized audio signal, and in which the storing further comprises storing the PSD values as characterizing PSD values associated with sounds of breaking glass.
17. The method of claim 16 and further comprising calculating an upper limit value and a lower limit value for the PSD values, and the storing comprising storing the upper and lower limits associated with sounds of breaking glass.
18. The method of claim 16 and further comprising calculating a central value for the PSD values and a maximum allowed delta from the central value for the PSD values, the central value and the maximum allowed delta defining a range within which a PSD value is within the characterizing PSD values, and the storing comprising storing the central value and the maximum allowed delta associated with sounds of breaking glass.
19. The method of any one of claims 13-18 in which:
the capturing and the digitizing are performed at a first location; further comprising transmitting the digitized audio signal to a second location; and
the producing the plurality of AR parameters is performed at the second location.
EP13742753.0A 2012-06-21 2013-06-18 Method of classifying glass break sounds in an audio signal Withdrawn EP2864969A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261662439P 2012-06-21 2012-06-21
PCT/IL2013/050522 WO2013190551A1 (en) 2012-06-21 2013-06-18 Method of classifying glass break sounds in an audio signal

Publications (1)

Publication Number Publication Date
EP2864969A1 true EP2864969A1 (en) 2015-04-29

Family

ID=48906466

Family Applications (1)

Application Number Title Priority Date Filing Date
EP13742753.0A Withdrawn EP2864969A1 (en) 2012-06-21 2013-06-18 Method of classifying glass break sounds in an audio signal

Country Status (2)

Country Link
EP (1) EP2864969A1 (en)
WO (1) WO2013190551A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180074162A1 (en) * 2016-09-13 2018-03-15 Wal-Mart Stores, Inc. System and Methods for Identifying an Action Based on Sound Detection
WO2018052776A1 (en) 2016-09-13 2018-03-22 Walmart Apollo, Llc System and methods for identifying an action of a forklift based on sound detection
US10656266B2 (en) 2016-09-13 2020-05-19 Walmart Apollo, Llc System and methods for estimating storage capacity and identifying actions based on sound detection
CN111895344A (en) * 2020-07-27 2020-11-06 韶关学院 Mosquito repelling lamp

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7479115B2 (en) * 2006-08-25 2009-01-20 Savic Research, Llc Computer aided diagnosis of lung disease
US20090312660A1 (en) * 2008-06-17 2009-12-17 Biorics Nv Recognition and localisation of pathologic animal and human sounds

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
None *
See also references of WO2013190551A1 *

Also Published As

Publication number Publication date
WO2013190551A1 (en) 2013-12-27

Similar Documents

Publication Publication Date Title
US11678013B2 (en) Methods and apparatus to determine a state of a media presentation device
Marchi et al. A novel approach for automatic acoustic novelty detection using a denoising autoencoder with bidirectional LSTM neural networks
JP6377592B2 (en) Abnormal sound detection device, abnormal sound detection learning device, method and program thereof
Huang et al. Scream detection for home applications
RU2488815C2 (en) Method and apparatus for classifying sound-generating processes
US20120185418A1 (en) System and method for detecting abnormal audio events
CN107305774A (en) Speech detection method and device
KR100580643B1 (en) Appratuses and methods for detecting and discriminating acoustical impact
CA2382122A1 (en) Sound source classification
CN106683687B (en) Abnormal sound classification method and device
JP2012048689A (en) Abnormality detection apparatus
WO2013190551A1 (en) Method of classifying glass break sounds in an audio signal
Kiktova et al. Comparison of different feature types for acoustic event detection system
CN110890087A (en) Voice recognition method and device based on cosine similarity
KR102314824B1 (en) Acoustic event detection method based on deep learning
Kiapuchinski et al. Spectral noise gate technique applied to birdsong preprocessing on embedded unit
EP3446296B1 (en) Glass breakage detection system
KR20190046569A (en) Acoustic Tunnel Accident Detection System
JP5627962B2 (en) Anomaly detection device
CN105810222A (en) Defect detection method, device and system for audio equipment
JP4926588B2 (en) Insulation discharge sound discrimination method and apparatus
Singh et al. Linear Prediction Residual based Short-term Cepstral Features for Replay Attacks Detection.
CN102789780B (en) Method for identifying environment sound events based on time spectrum amplitude scaling vectors
CN116364108A (en) Transformer voiceprint detection method and device, electronic equipment and storage medium
JP2010071773A (en) Apparatus and method for recognizing opening/closing motion of opening/closing member

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20150121

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAX Request for extension of the european patent (deleted)
17Q First examination report despatched

Effective date: 20170731

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20190615