GB2494511A - Digital sound identification - Google Patents


Info

Publication number
GB2494511A
Authority
GB
United Kingdom
Prior art keywords
sound
text
data
markov model
frequency domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB201215240A
Other versions
GB2494511B (en)
GB201215240D0 (en)
Inventor
Christopher James Mitchell
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Audio Analytic Ltd
Original Assignee
Audio Analytic Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Audio Analytic Ltd filed Critical Audio Analytic Ltd
Priority to GB201215240A priority Critical patent/GB2494511B/en
Publication of GB201215240D0 publication Critical patent/GB201215240D0/en
Publication of GB2494511A publication Critical patent/GB2494511A/en
Application granted granted Critical
Publication of GB2494511B publication Critical patent/GB2494511B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • G10L17/26Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • GPHYSICS
    • G08SIGNALLING
    • G08BSIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B13/00Burglar, theft or intruder alarms
    • G08B13/16Actuation by interference with mechanical vibrations in air or other fluid
    • G08B13/1654Actuation by interference with mechanical vibrations in air or other fluid using passive vibration detection systems
    • G08B13/1672Actuation by interference with mechanical vibrations in air or other fluid using passive vibration detection systems using sonic detecting means, e.g. a microphone operating in the audio frequency range
    • GPHYSICS
    • G08SIGNALLING
    • G08BSIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B13/00Burglar, theft or intruder alarms
    • G08B13/18Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength
    • G08B13/189Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems
    • G08B13/194Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
    • G08B13/196Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
    • G08B13/19695Arrangements wherein non-video detectors start video recording or forwarding but do not generate an alarm themselves
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • G10L17/16Hidden Markov models [HMM]
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • G10L17/20Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions

Abstract

A digital sound identification system is disclosed, comprising: non-volatile memory for storing a Markov model; stored program memory storing processor control code; a sound data input; a processor coupled to said sound data input, to said working memory, and to said stored program memory for executing the processor control code, and wherein the processor control code comprises code to: input, from said sound data input, first sample sound data for a first sound to be identified, said first sample sound data defining first sample frequency domain data, said first sample frequency domain data defining an energy of said first sample in a plurality of frequency ranges; generate a first set of mean and variance values for at least a first Markov model of said first sample sound from said first sample frequency domain data; store said first Markov model in said non-volatile memory; input interference sound data defining interference frequency domain data; adjust said mean and variance values of said first Markov model using said interference frequency domain data; input third sound data defining third sound frequency domain data; determine a probability of said third sound frequency domain data fitting at least said first Markov model; and output sound identification data dependent on said probability.

Description

Sound Identification Systems
FIELD OF THE INVENTION
This invention relates to systems, methods and computer program code for identifying sounds, and to related applications of such techniques.
BACKGROUND TO THE INVENTION
Basic sound identification systems are known but there is a need for improved techniques. Background prior art can be found in: US2006/227237; WO99/56214; WO2008/016360; US5,276,629; Goldmann, L. (2006), "Extracting high level semantics by means of speech, audio and image primitives in surveillance", 2006 IEEE International Conference on Image Processing, pages 2397-2400; Gonzalez, R. (2007), "Enhancing video surveillance with audio events", 9th Biennial Conference of the Australian Pattern Recognition Society on Digital Image Computing Techniques and Applications, pages 61-66; and Mitchell, C.J. (2007), "Aligning Music Genre Taxonomies", PhD Thesis, Anglia Ruskin University, United Kingdom.
SUMMARY OF THE INVENTION
According to a first aspect of the invention there is therefore provided a digital sound identification system, the system comprising: non-volatile memory for storing a Markov model; stored program memory storing processor control code; a sound data input; a processor coupled to said sound data input, to said working memory, and to said stored program memory for executing said processor control code, and wherein said processor control code comprises code to: input, from said sound data input, first sample sound data for a first sound to be identified, said first sample sound data defining first sample frequency domain data, said first sample frequency domain data defining an energy of said first sample in a plurality of frequency ranges; generate a first set of mean and variance values for at least a first Markov model of said first sample sound from said first sample frequency domain data; store said first Markov model in said non-volatile memory; input interference sound data defining interference frequency domain data; adjust said mean and variance values of said first Markov model using said interference frequency domain data; input third sound data defining third sound frequency domain data; determine a probability of said third sound frequency domain data fitting at least said first Markov model; and output sound identification data dependent on said probability.
The mean and variance data for the Markov model is generated from frequency domain data. In some embodiments the system inputs sound data in the time domain, for example, from an analogue-to-digital converter, and then converts this to the frequency domain. Additionally or alternatively, however, the system may operate on compressed sound data, for example MP3 data, which is already encoded into frequency domain information.
Where a time-to-frequency conversion is performed, in embodiments this may employ a series of time-to-frequency conversions, for example fast Fourier transforms, operating on successive, preferably overlapping time frames or windows, for example, 20 millisecond windows with a 10 millisecond overlap. These perform a transform into a plurality of frequency ranges or bands, for example, a quarter-octave sub-band decomposition over a range which depends upon the sound(s) to be identified, for example, over a range of approximately 1-10 kHz. Optionally, following the time-frequency transformation a normalisation step may be performed to reduce effective changes in amplitude of the input signal. In embodiments an optional statistical decomposition may also be applied prior to generating the mean and variance values for the Markov model, in particular to simplify the modelled features. This statistical decomposition may comprise, for example, a principal component analysis (PCA) or independent component analysis (ICA). This is particularly helpful where poor quality microphones are being employed and/or complex sounds are being identified.
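By way of illustration only (not part of the claimed embodiments; the function name and parameter defaults are assumptions), the overlapping-window time-to-frequency conversion described above might be sketched as:

```python
import numpy as np

def stft_frames(x, sample_rate=44100, n_fft=1024, hop=512):
    """Decompose a signal into overlapping windowed frames and
    return the magnitude spectrum of each frame.

    At 44.1 kHz a 1024-point FFT with a 512-sample hop gives
    roughly 20 ms windows with 10 ms overlap, as in the text.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.empty((n_frames, n_fft // 2 + 1))
    for l in range(n_frames):
        segment = x[l * hop : l * hop + n_fft] * window
        frames[l] = np.abs(np.fft.rfft(segment))
    return frames

# Example: one second of noise -> ~85 frames of 513 magnitude bins each
x = np.random.randn(44100)
X = stft_frames(x)
print(X.shape)
```

The resulting per-frame magnitudes would then be grouped into sub-bands and normalised as described later in the specification.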
In embodiments the Markov model comprises a continuous hidden Markov model described by a plurality of states and transitions, with associated probabilities for the transitions between the states. This representation is resilient to changes in audio sampling rate, the use of compression, and input of relatively poor quality sound data.
In embodiments of the system, second sample sound data, defining second sample frequency domain data, may be employed to generate a second set of mean and variance values for a second Markov model, and interference sound data (which may be different to the interference sound data for the first Markov model) may be employed to update this second model. In this way a plurality of different Markov models may be defined, each corresponding to a sound to be identified, and then the input (third) sound data may, effectively, be fitted to each of the stored Markov models to identify the most closely matching model. In embodiments the mean and covariance values may be expressed and processed in the form of a covariance matrix.
The interference sound data may comprise, for example, typical background sound to the sound to be identified; this will in general depend upon the sound to be identified.
For example, in an application to identify when a person is becoming aggressive, for example in a hospital accident and emergency facility, typical background noise of such a facility may be employed as the interference sound data. Alternatively, in an application to detect when a car window has been smashed in a car park (when the sound to be identified may comprise the sound of the breaking window and/or a car alarm) the interference sound data may comprise, for example, car engine or traffic noise. In general the interference sound data may comprise either or both of general background noise for the sound to be identified and one or more specific expected interfering sounds, for example, in an outdoor location, an emergency vehicle siren. In a straightforward implementation the mean and variance values determined from the interference sound sample, more particularly from frequency domain data for the interference sound sample, may simply be subtracted from the mean and variance values from the first or each sample sound.
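The straightforward subtraction just described might be sketched as follows (an illustrative sketch only; the function name and the variance floor are assumptions, and the specification leaves the exact adjustment method open):

```python
import numpy as np

def adjust_for_interference(model_means, model_vars, interference_frames):
    """Subtract per-band statistics of an interference sample from a
    model's mean and variance parameters.

    interference_frames: array of shape (frames, bands) holding
    frequency-domain data for the interference sound sample.
    """
    interference_mean = interference_frames.mean(axis=0)
    interference_var = interference_frames.var(axis=0)
    adjusted_means = model_means - interference_mean
    # Floor the variances so they stay positive after subtraction.
    adjusted_vars = np.maximum(model_vars - interference_var, 1e-8)
    return adjusted_means, adjusted_vars
```

In practice a more sophisticated update (such as the method illustrated in Figure 4) may be preferred.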
In embodiments of the system a state of the Markov model is defined by a set of frequency ranges or bands comprising the frequency ranges or bands defined by the frequency domain data. Each state may be represented by one or more Gaussian distributions, each distribution being characterised by a mean and a variance value. The system may include a user interface to enable a user, for example at system setup, to define one or both of a number of states of the model, and a number of Gaussian distributions employed per state.
Broadly speaking an input sample sound is processed by decomposition into frequency bands, and optionally de-correlated, for example, using PCA/ICA, and then this data is compared to each Markov model to generate log likelihood ratio (LLR) data for the input sound to be identified. A (hard) confidence threshold may then be employed to determine whether or not a sound has been identified; if a "fit" is detected to two or more stored Markov models then preferably the system picks the most probable. As the skilled person will understand, a sound is "fitted" to a model by effectively comparing the sound to be identified with expected frequency domain data predicted by the Markov model. False positives are reduced by correcting/updating means and variances in the model based on interference (which includes background) noise.
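The thresholded model-selection step described above might be sketched as follows (purely illustrative; the function name, dictionary structure and threshold value are assumptions):

```python
def identify(llr_per_model, threshold=0.0):
    """Pick the best-fitting model given per-model log likelihood
    ratios, applying a hard confidence threshold.

    Returns the name of the most probable model, or None when no
    model's LLR exceeds the threshold (no sound identified).
    """
    best = max(llr_per_model, key=llr_per_model.get)
    if llr_per_model[best] < threshold:
        return None
    return best

# Two candidate models; the higher LLR wins.
print(identify({"glass_break": 3.2, "car_alarm": 1.1}))
```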
Therefore according to a related aspect the invention provides a method of identifying a sound, the method comprising: configuring a Markov model using a sample of the sound; inputting a sample of background or interfering sound for a said sound to be identified; adjusting said Markov model to reduce false positive identification by using said sample of background or interfering sound to adjust mean and variance parameters characterising probability distributions associated with states of said Markov model; and using said adjusted Markov model to identify a said sound by comparing a sound to be identified with said adjusted model to determine a probability of said sound to be identified fitting said model by comparing said sound to be identified with expected frequency domain data predicted by said Markov model.
We also describe a method of producing a Markov model of a sound to be identified, comprising: configuring a Markov model using a sample of the sound; inputting a sample of background or interfering sound for a said sound to be identified; and adjusting said Markov model to reduce false positive identification by using said sample of background or interfering sound to adjust mean and variance parameters characterising probability distributions associated with states of said Markov model.
The invention in a complementary aspect provides a system for identifying a sound, the system comprising: means for configuring a Markov model using a sample of the sound; means for inputting a sample of background or interfering sound for a said sound to be identified; means for adjusting said Markov model to reduce false positive identification by using said sample of background or interfering sound to adjust mean and variance parameters characterising probability distributions associated with states of said Markov model; and means for using said adjusted Markov model to identify a said sound by comparing a sound to be identified with said adjusted model to determine a probability of said sound to be identified fitting said model by comparing said sound to be identified with expected frequency domain data predicted by said Markov model.
The invention further provides a sound amplification or transmission system, the system comprising: a sound input to input a sound; a sound identification system to identify whether said sound fits a model; and an amplifier or transmitter to selectively amplify or transmit said sound responsive to said identification.
Preferably the model comprises a Markov model generated as described above, although depending on the application correction for interfering noise may not be needed.
One preferred application is in a baby monitor or alarm in which the volume of the baby monitor is adjusted when specific sounds are identified, especially relatively quiet sounds as compared with the baby crying. Such sounds may include, for example, cessation of breathing and/or choking (and thus the "sound" to be identified may, in embodiments, be an absence of an expected sound). In implementations of such a system a delay is included between the sound input and the amplified and/or transmitted sound, albeit only a relatively short delay, to enable a sound to be identified and hence the volume or amplitude of the identified sound to be adjusted.
In another application a Markov model, in particular, as described above, is employed in a security system including at least one camera, to enable the camera to be controlled or moved responsive to identification of a particular type of sound, for example, breaking glass, a car alarm, or a gunshot.
Thus in a further aspect there is provided a security camera system having at least one controllable camera, at least one microphone to capture sound, and including a sound identification system to identify whether said sound fits a Markov model, wherein said system comprises a processor configured to change said camera information to be displayed on said system in response to said sound and/or to draw attention to said sound by an alert.
In some preferred embodiments the microphone employed is incorporated within one or more cameras of the system. Such microphones are generally of poor quality, but in embodiments the improved sound identification techniques we employ enable poor quality input data to be employed whilst nonetheless accurately identifying one or more sounds. In response a camera can be, for example, caused to perform a pan movement or swivelled towards the audio (in the latter case it is preferable to employ two or more cameras/microphones to triangulate the source of the identified noise).
In a further related aspect there is provided a phone network having a plurality of phones coupled to an exchange or network controller, each of said phones having a microphone to listen to a local sound, the phone network including a sound identification system to identify whether said sound fits a model, the system further comprising a controller to enable and disable said microphones of said phones as input to said sound identification system for monitoring sounds in locations of said phones.
Broadly speaking, in embodiments a set of phones connected to a PBX (private branch exchange) may be employed as a distributed break-in/intruder detection system by controlling the phones to enable their respective microphones to detect sound when a building is unoccupied. A sound identification system, in particular as described above, may then be employed to identify sounds characteristic of an intruder, and to provide an alarm or alert either locally or remotely.
In a further aspect there is provided an aid for providing in-home assistance, the aid having a microphone to capture sound, and a sound identification system to identify whether said sound fits a model, and a controller coupled to said sound identification system to identify a sound designating a request for aid and to instigate a communication to a helper in response to said identification.
An aid as described above has particular applications in a system of the type employing a central in-house controller with a communication link, typically via a phone line, to enable a user, typically an elderly person, carrying a mobile unit with an alert button, to call for help from a friend, neighbour, relative or health care worker by pressing the button. Typically the mobile unit is provided on a lanyard, and may often be taken off. A sound identification system, in particular of the type described above, may be incorporated into the base unit to detect one or more trigger words or sounds, for example, claps or a cry for help, and then to alert a helper. In this way the base unit may provide backup in the event that the mobile unit has been removed or accidentally forgotten.
The skilled person will recognise that in embodiments of the above described sound identification system one or more Markov models may be generated and corrected for background noise/interference prior to being used in a target application. Thus the target application system need not include code to set up and configure the system.
Thus in a further aspect the invention provides a digital sound identification system, the system comprising: non-volatile memory for storing a Markov model; stored program memory storing processor control code; a sound data input; a processor coupled to said sound data input, to said working memory, and to said stored program memory for executing said processor control code, and wherein said processor control code comprises code to: input sound data defining sound frequency domain data; determine a probability of said sound frequency domain data fitting at least said first Markov model; and output sound identification data dependent on said probability.
The skilled person will also recognise that in some applications correction of the Markov model for interference may not be required.
Thus the invention still further provides a digital sound identification system, the system comprising: non-volatile memory for storing a Markov model; stored program memory storing processor control code; a sound data input; a processor coupled to said sound data input, to said working memory, and to said stored program memory for executing said processor control code, and wherein said processor control code comprises code to: input first sample sound data for a first sound to be identified from said sound data input, said first sample sound data defining first sample frequency domain data, said first sample frequency domain data defining an energy of said first sample in a plurality of frequency ranges; generate a first set of mean and variance values for at least a first Markov model of said first sample sound from said first sample frequency domain data; store said first Markov model in said non-volatile memory; input third sound data defining third sound frequency domain data; determine a probability of said third sound frequency domain data fitting at least said first Markov model; and output sound identification data dependent on said probability; wherein said sound data comprises compressed sound data representing a sound in said frequency domain.
The invention also provides processor control code to implement the above-described systems and methods, in particular on a data carrier such as a disk, CD- or DVD-ROM, or programmed memory such as read-only memory (firmware). Code (and/or data) to implement embodiments of the invention may comprise source, object or executable code in a conventional programming language (interpreted or compiled) such as C, or assembly code, code for setting up or controlling an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array), or code for a hardware description language such as Verilog (Trade Mark) or VHDL (Very high speed integrated circuit Hardware Description Language). As the skilled person will appreciate, such code and/or data may be distributed between a plurality of coupled components in communication with one another.
The invention further provides a data carrier storing one or more Markov models with means and variances generated and updated to compensate for background/interference as described above.
Aspects and embodiments of the system, methods and code we describe may be used for non-audible sound signals, in particular infrasound and ultrasound, and references to sound in this specification are not limited to audible sound.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other aspects of the invention will now be further described, by way of example only, with reference to the accompanying figures in which: Figure 1 illustrates the system setup to generate the Markov models.
Figure 2 illustrates how the Markov models can be generated from an example of compressed data.
Figure 3 illustrates a system setup to use generated Markov models for sound identification.
Figure 4 shows how to update a Markov model being used as part of the sound identification system.
Figure 5 shows an example of a preferred embodiment of the system in a CCTV system.
Figure 6 shows an example of a preferred embodiment of the system in a baby monitor system.
Figure 7 shows an example of a preferred embodiment of the system in an in-home assistance system.
Figure 8 shows an example of a preferred embodiment of the system in an office phone PBX system.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Figures 1-8 illustrate the system and various preferred embodiments of the system.
Referring to Figure 1, this shows the main system flow diagram and related equations for generating the Markov models.
Figure 1 is split into two sections that describe the two main parts of the system. The parameters of the models are generated by passing many examples of the same type of sound into the system. The two main parts are now defined, as well as their related mathematics. The left part is interchangeable with the data that can be obtained from a compressed audio format's spectral coefficients. The advantage of this is to improve execution time and improve system scalability.
Referring now to Figure 2, this shows a system diagram of a typical audio compression system working with a sound identification system. Thus Figure 2 shows an example of this system working with a generic MPEG audio compression stream or file. The frame analyser is used to obtain the frequency coefficients stored in a frame of compressed audio; in this example the frequency components are stored as modified discrete cosine transform (MDCT) coefficients derived from a set of pre-filtered PCM audio data. These frequency coefficients can then be mapped onto the grouped frequency bands defined in the left part of Figure 1, whereby the main system takes over the operation.
The complementary, but parallel, system for uncompressed spectral decomposition will now be described. In Figure 1 are definitions of the various components of the spectral decomposition, where w is the window, l is the number of the frame and H is the advance of the window (hop size) in samples. Although there are many window types, the Hanning window w(n) = 0.5 - 0.5 cos(2*pi*n/N) is used here as a good general purpose window. A time-frequency matrix X[k, l] is then constructed, and a normalisation applied to this time-frequency matrix. This takes the form of a norm, where r_l is the norm gain value, I is the unit spectral vector, K is the number of spectral coefficients and L is the frame count.
To illustrate further in practical terms, the time-frequency matrix for a 44.1 kHz signal might be a 1024-point FFT with a 512-sample overlap. This is approximately a 20 millisecond window with 10 millisecond overlap. The resulting 512 frequency bins are then grouped into sub-bands, for example quarter-octave bands ranging between 62.5 and 8000 Hz, giving 30 sub-bands; listed below are the lower limits of the frequency bands: 0, 62.5, 74.3254447, 88.3883476, 105.1120519, 125, 148.6508894, 176.7766953, 210.2241038, 250, 297.3017788, 353.5533906, 420.4482076, 500, 594.6035575, 707.1067812, 840.8964153, 1000, 1189.207115, 1414.213562, 1681.792831, 2000, 2378.41423, 2828.427125, 3363.585661, 4000, 4756.82846, 5656.854249, 6727.1713. A lookup table is used to map from the compressed or uncompressed frequency bands to the new sub-band representation bands. For the sample rate and STFT size example given, the array might comprise a (bin size ÷ 2) x 6 array for each sampling-rate/bin-number pair supported. The rows correspond to the bin number (centre) of the STFT size or number of frequency coefficients. The first two columns determine the lower and upper quarter-octave bin index numbers. The following four columns determine the proportion of the bin's magnitude that should be placed in the corresponding quarter-octave bin, starting from the lower quarter-octave bin defined in the first column to the upper quarter-octave bin defined in the second column. For example, if a bin overlaps two quarter-octave ranges, columns 3 and 4 will have proportional values that sum to 1 and columns 5 and 6 will have zeros; if a bin overlaps more than one sub-band boundary, more columns will have proportional magnitude values. This example models the critical bands in the human auditory system. This reduced time/frequency representation is then processed by the normalisation method outlined. This process is repeated for all frames, incrementally moving the frame position by a hop size of 10 ms.
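The quarter-octave lower limits listed above follow a simple rule: each edge is the previous one multiplied by the fourth root of two. As an illustrative check (variable names are assumptions, not part of the specification):

```python
# Quarter-octave sub-band lower limits: 0 Hz for the lowest band, then
# 62.5 Hz multiplied repeatedly by 2**0.25 (one quarter octave) up to
# ~6727 Hz, the final band extending to 8000 Hz.
edges = [0.0] + [62.5 * 2 ** (k / 4) for k in range(28)]
print(len(edges))                       # 29 listed lower limits
print(edges[5], round(edges[-1], 4))    # 125.0 and ~6727.1713
```

A production system would precompute such edges once per supported sampling-rate/FFT-size pair when building the lookup table.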
The overlapping window (hop size not equal to window size) improves the time-resolution of the system.
This is taken as an adequate representation of the frequencies of the signal which can be used to summarise the perceptual characteristics of the sound. The normalisation stage then takes each frame in the sub-band decomposition and divides by the square root of the average power in each sub-band. The average is calculated as the total power in all frequency bands divided by the number of frequency bands. This normalised time-frequency matrix is then passed to the next section of the system, where its means, variances and transitions can be generated to fully characterise the sound's frequency distribution and temporal trends. The next stage of the sound characterisation requires further definitions. A continuous hidden Markov model is used to obtain the mean, variance and transitions needed for the model. A Markov model can be completely characterised by λ = (A, B, Π), where A is the state transition probability matrix, B is the observation probability matrix and Π is the state initialisation probability matrix.
In more formal terms:

A = [a_ij], where a_ij = P(q_t+1 = S_j | q_t = S_i)
B = [b_j(m)], where b_j(m) = P(o_t = v_m | q_t = S_j)
Π = [π_i], where π_i = P(q_1 = S_i)

where q is the state value and o is the observation value. In Figure 1 the system needs to generate the state transition probability matrix; a state in our model is actually the frequency distribution characterised by a set of mean and variance data, however the formal definitions for this will be introduced later. Generating the model parameters is a matter of maximising the probability of an observation sequence. The Baum-Welch algorithm is an expectation-maximisation procedure that has been used for doing just that. It is an iterative algorithm where each iteration is made up of two parts, the expectation, which computes ξ_t(i, j) and γ_t(i), and the maximisation. In the expectation part, ξ_t(i, j) and γ_t(i) are computed given λ, the current model values, and then in the maximisation step λ is recalculated. These two steps alternate until convergence occurs. It has been shown that during this alternation process P(O | λ) never decreases. Indicator variables z are assumed:

Expectation:
ξ_t(i, j) = α_t(i) a_ij b_j(O_t+1) β_t+1(j) / Σ_k Σ_l α_t(k) a_kl b_l(O_t+1) β_t+1(l)
γ_t(i) = Σ_j ξ_t(i, j)
E[z_t^i] = γ_t(i) and E[z_t^ij] = ξ_t(i, j), where
z_t^i = 1 if q_t = S_i, 0 otherwise
z_t^ij = 1 if q_t = S_i and q_t+1 = S_j, 0 otherwise

Maximisation:
â_ij = Σ_t ξ_t(i, j) / Σ_t γ_t(i)
b̂_j(m) = Σ_t γ_t(j) 1(O_t = v_m) / Σ_t γ_t(j)
π̂_i = γ_1(i)

Gaussian mixture models can be used to represent the continuous frequency values, and expectation-maximisation equations can then be derived for the component parameters (with suitable regularisation to keep the number of parameters in check) and the mixture proportions. Assume a scalar continuous frequency value O_t ∈ ℝ with a normal distribution:

p(O_t | q_t = S_j, λ) ~ N(μ_j, σ_j²)

This implies that in state S_j the frequency distribution is drawn from a normal distribution with mean μ_j and variance σ_j². The maximisation step equations are then:

μ̂_j = Σ_t γ_t(j) O_t / Σ_t γ_t(j)
σ̂_j² = Σ_t γ_t(j) (O_t − μ̂_j)² / Σ_t γ_t(j)

The use of Gaussians enables the characterisation of the time-frequency matrix's features. In the case of a single Gaussian per state, they become the states.
The tnmsition matrix oF the hidden Markov model can he obtained using the Baum-Welch algorithm to characterise how the frequency distribution of the signal change over time.
The Gaussians can be initialised using K-Means, with the starting points for the clusters being a random frequency distribution chosen from the sample data.
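A minimal sketch of this initialisation step, assuming frames of quarter-octave band energies as the "frequency distributions". The function name, array layout, and iteration count are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def kmeans_init_states(frames, n_states, n_iter=20, seed=0):
    """Initialise per-state Gaussians by K-Means over frequency-distribution
    frames. `frames` is a (T, n_bands) array; cluster centres start as
    randomly chosen frames, as described in the text."""
    rng = np.random.default_rng(seed)
    centres = frames[rng.choice(len(frames), n_states, replace=False)].astype(float)
    labels = np.zeros(len(frames), dtype=int)
    for _ in range(n_iter):
        # Assign each frame to its nearest centre (squared Euclidean distance)
        d = ((frames[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centres; keep the old centre if a cluster empties
        for k in range(n_states):
            if np.any(labels == k):
                centres[k] = frames[labels == k].mean(axis=0)
    means = centres
    variances = np.array([
        frames[labels == k].var(axis=0) if np.any(labels == k)
        else np.ones(frames.shape[1])
        for k in range(n_states)
    ])
    return means, variances
```

The returned per-cluster means and variances would then seed the per-state Gaussians before Baum-Welch refinement.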
Figure 3 shows the main system flow diagram and related supporting equations for classification of a new sound. Thus Figure 3 explains the system's operations for classifying new input and adapting to changes in the acoustic conditions, i.e. interference.
The first part operates the same as previously explained. After this previously explained part of the system, a forward algorithm can be used to determine the most likely state path of an observation sequence and produce a probability, in terms of a log likelihood, that can be used to classify an incoming signal. The forward and backward procedures can be used to obtain this value from the previously calculated model parameters; in fact only the forward part is needed. The forward variable $\alpha_t(i)$ is defined as the probability of observing the partial sequence $\{o_1 \ldots o_t\}$ until time $t$ and being in $S_i$ at time $t$, given the model $\lambda$.
$\alpha_t(i) = P(o_1 \ldots o_t, q_t = S_i \mid \lambda)$

This can be calculated by accumulating results and has two steps, initialisation and recursion, as seen in Figure 3. $\alpha_t(i)$ explains the first $t$ observations and ends in state $S_i$. This is multiplied by the probability $a_{ij}$ of moving to state $S_j$, and because there are $N$ possible previous states, there is a need to sum over all such possible previous $S_i$. The term $b_j(o_{t+1})$ is then the probability of generating the next observation frequency distribution while in state $S_j$ at time $t + 1$. With these variables it is then straightforward to calculate the probability of a frequency distribution sequence.
$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$

Computing $\alpha_T(i)$ has order $O(N^2 T)$ and avoids the complexity issues of calculating the probability of the sequence directly. The models will operate in many different acoustic conditions, and as it is practically restrictive to present examples that are representative of all the acoustic conditions the system will come into contact with, internal adjustment of the models will be performed to enable the system to operate in all these different acoustic conditions. In Figure 3 this is shown as the background box. Many different methods can be used for this update. A simplistic illustrative method is shown in Figure 4, which illustrates adjusting the Markov models for different acoustic conditions.
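The forward recursion above can be sketched as follows for a model with one diagonal Gaussian per state. This is an illustrative sketch, not the patent's prescribed formulation: it works with normalised forward variables and accumulates the log likelihood to avoid numerical underflow over long sequences.

```python
import numpy as np

def log_likelihood(A, pi, means, variances, frames):
    """Scaled forward pass returning log P(O | lambda) for a hidden Markov
    model with one diagonal Gaussian per state. `means` and `variances`
    are (N, n_bands) arrays; `frames` is a (T, n_bands) observation array."""
    def emission(frame):
        # Per-state diagonal-Gaussian log-density of one frequency frame,
        # returned as scaled probabilities plus the log-scale shift
        z = (frame[None, :] - means) ** 2 / (2.0 * variances)
        log_b = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1) - z.sum(axis=1)
        return np.exp(log_b - log_b.max()), log_b.max()

    b, shift = emission(frames[0])
    alpha = pi * b                      # initialisation step
    log_lik = shift + np.log(alpha.sum())
    alpha /= alpha.sum()
    for frame in frames[1:]:            # recursion step
        b, shift = emission(frame)
        alpha = (alpha @ A) * b
        log_lik += shift + np.log(alpha.sum())
        alpha /= alpha.sum()
    return log_lik
```

An incoming signal would be scored against each stored model this way, and classified by comparing the resulting log likelihoods (or a log likelihood against a threshold).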
Figure 4 takes an average value for the sub-bands, in this case the quarter-octave frequency values for the last T seconds; these averages are added to the model values to update the internal model of the sound in that acoustic environment.
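As an illustrative sketch of this simplistic update, with the function name and array shapes being our assumptions:

```python
import numpy as np

def adapt_model_means(means, recent_frames):
    """Simplistic acoustic adaptation in the spirit of Figure 4: average the
    quarter-octave sub-band values over the recent frames (the last T seconds
    of audio) and add the averages to the model's state means, shifting the
    internal model toward the current acoustic environment."""
    offsets = recent_frames.mean(axis=0)   # per-band average background level
    return means + offsets[None, :]        # broadcast the offsets over states
```

The state variances are left untouched here; only the mean background level of each sub-band is folded into the model.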
Figure 5 shows a system operation diagram for a CCTV camera system, more particularly a typical set-up for the CCTV system. This may comprise a compressed feed coming from a camera, which contains the coefficients that can be extracted using a frame analyser. This data is sent to a processor with the pre-derived model parameters, and the output and a set of threshold values are used for classifying a set of audio inputs. This is then used to switch which camera's information is displayed or drawn attention to by an alert.
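The threshold-based switching step might look like the following hypothetical sketch; the camera identifiers, the log-likelihood score convention, and the function name are all illustrative, not from the patent.

```python
def cameras_to_flag(scores, thresholds):
    """Hypothetical alert logic: `scores` maps camera id -> log likelihood of
    a target sound in that camera's audio feed; a camera whose score exceeds
    its threshold is flagged for display or operator attention."""
    return [cam for cam, s in scores.items()
            if s > thresholds.get(cam, float("inf"))]
```

Flagged cameras would then be switched to the main display or highlighted with an alert, as described above.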
Referring now to Figure 6, this shows system operation in use with a baby monitor amplitude control. A system like the one described, or similar in its ability to identify sounds, can be used as an amplitude control input or light control input for drawing attention to sounds that would otherwise be lost. For example, sounds that are quiet and would otherwise be missed, such as breathing difficulties, cooing or choking, can be detected and used to control the playback volume. This would function like a normal baby monitor but have the additional feature of drawing attention to itself when sounds of concern are detected or not detected.
Figure 7 shows system operation in use with a home care help system. A sound recognition system like the one described, or similar, can be used as a home alert in substitution for a panic button, to detect calls for help triggered by specific input, for example 3 hand claps or 3 shoe bangs on the floor. This would operate by a repeated pattern being detected, followed by a pause whereby the system would confirm it had detected the pattern and ask for confirmation. Once the pattern had been reconfirmed, the system would send an alert to an operator or a message to another individual.
Figure 8 shows system operation in use with an office phone system. A sound recognition system similar to the one described can be used to turn an office or home telephone network into an intruder detection system, where a microphone is installed in the base of the phone for hands-free communication and can be activated remotely based on a timer. This timer would activate the phone's microphone at certain times, e.g. at night, to detect sounds synonymous with break-ins, for example windows breaking or general movement. An alert is then triggered and sent to a camera system to point a camera at that location, as a message to an operator or security guard, or to a control system to turn on lights.
No doubt many other effective alternatives will occur to the skilled person. It will be understood that the invention is not limited to the described embodiments and encompasses modifications apparent to those skilled in the art lying within the spirit and scope of the claims appended hereto.

Claims (1)

  1. <claim-text>CLAIMS: 1. A digital sound identification system, the system comprising: non-volatile memory for storing a Markov model; stored program memory storing processor control code; a sound data input; a processor coupled to said sound data input, to said working memory, and to said stored program memory for executing said processor control code, and wherein said processor control code comprises code to: input, from said sound data input, first sample sound data for a first sound to be identified, said first sample sound data defining first sample frequency domain data, said first sample frequency domain data defining an energy of said first sample in a plurality of frequency ranges; generate a first set of mean and variance values for at least a first Markov model of said first sample sound from said first sample frequency domain data; store said first Markov model in said non-volatile memory; input interference sound data defining interference frequency domain data; adjust said mean and variance values of said first Markov model using said interference frequency domain data; input third sound data defining third sound frequency domain data; determine a probability of said third sound frequency domain data fitting at least said first Markov model; and output sound identification data dependent on said probability.</claim-text> <claim-text>2. A digital sound identification system as claimed in claim 1 wherein said processor control code comprises code to: input a plurality of different sample sound data sets for a plurality of different sounds to be identified; generate a corresponding plurality of said Markov models; determine a probability of said third sound frequency domain data fitting each of said Markov models.</claim-text> <claim-text>3.
A digital sound identification system as claimed in claim 1 or 2 wherein said processor control code comprises code to: convert said sound data from a time domain to a frequency domain to generate said frequency domain data.</claim-text> <claim-text>4. A digital sound identification system as claimed in claim 1 or 2 wherein said sound data comprises compressed sound data representing a sound in said frequency domain.</claim-text> <claim-text>5. A digital sound identification system as claimed in any preceding claim wherein said processor control code comprises code to: generate a second set of mean and variance values for said first Markov model from said interference frequency domain data; and adjust said mean and variance values of said first Markov model using said second set of mean and variance values.</claim-text> <claim-text>6. A digital sound identification system as claimed in claim 5 wherein said code to adjust said mean and variance values of said first Markov model comprises code to subtract said second set of mean and variance values from said first set of mean and variance values.</claim-text> <claim-text>7. A digital sound identification system as claimed in any preceding claim wherein a state of said Markov model is defined by a set of frequency ranges comprising said plurality of frequency ranges and is represented by one or more distributions each characterised by a pair of mean and variance values.</claim-text> <claim-text>8. A digital sound identification system as claimed in claim 7 wherein a said distribution comprises a Gaussian distribution and wherein said processor control code comprises code to: enable a user to define one or both of a number of states and a number of Gaussian distributions per state of said Markov model, wherein said mean and variance values comprise mean and variance values of a said Gaussian distribution.</claim-text> <claim-text>9.
A method of identifying a sound, the method comprising: configuring a Markov model using a sample of the sound; inputting a sample of background or interfering sound for a said sound to be identified; adjusting said Markov model to reduce false positive identification by using said sample of background or interfering sound to adjust mean and variance parameters characterising probability distributions associated with states of said Markov model; and using said adjusted Markov model to identify a said sound by comparing a sound to be identified with said adjusted model to determine a probability of said sound to be identified fitting said model by comparing said sound to be identified with expected frequency domain data predicted by said Markov model.</claim-text> <claim-text>10. A system for identifying a sound, the system comprising: means for configuring a Markov model using a sample of the sound; means for inputting a sample of background or interfering sound for a said sound to be identified; means for adjusting said Markov model to reduce false positive identification by using said sample of background or interfering sound to adjust mean and variance parameters characterising probability distributions associated with states of said Markov model; and means for using said adjusted Markov model to identify a said sound by comparing a sound to be identified with said adjusted model to determine a probability of said sound to be identified fitting said model by comparing said sound to be identified with expected frequency domain data predicted by said Markov model.</claim-text> <claim-text>11. A sound amplification or transmission system, the system comprising: a sound input to input a sound; a sound identification system to identify whether said sound fits a model; and an amplifier or transmitter to selectively amplify or transmit said sound responsive to said identification.</claim-text> <claim-text>12.
A sound amplification or transmission system as claimed in claim 11 further comprising a digital delay to match a processing time of said identification.</claim-text> <claim-text>13. A sound amplification or transmission system as claimed in claim 11 or 12 wherein said sound identification system is as recited in any one of claims 1 to 8 and 10.</claim-text> <claim-text>14. A baby monitor or alarm comprising the system of claim 11, 12 or 13, wherein said sound comprises a sound produced by a baby.</claim-text> <claim-text>15. A security camera system having at least one controllable camera, at least one microphone to capture sound, and including a sound identification system to identify whether said sound fits a Markov model, in particular as recited in any one of claims 1 to 8 and 10, wherein said system comprises a processor configured to change said camera information to be displayed on said system in response to said sound and/or to draw attention to said sound by an alert.</claim-text> <claim-text>16. A phone network having a plurality of phones coupled to an exchange or network controller, each of said phones having a microphone to listen to a local sound, the phone network including a sound identification system to identify whether said sound fits a model, in particular as recited in any one of claims 1 to 8 and 10, the system further comprising a controller to enable and disable said microphones of said phones as input to said sound identification system for monitoring sounds in locations of said phones.</claim-text> <claim-text>17. A security system comprising a phone network as claimed in claim 16, wherein a said sound comprises a sound identifying an intruder.</claim-text> <claim-text>18.
An aid for providing in-home assistance, the aid having a microphone to capture sound, and a sound identification system to identify whether said sound fits a model, in particular as recited in any one of claims 1 to 8 and 10, and a controller coupled to said sound identification system to identify a sound designating a request for aid and to instigate a communication to a helper in response to said identification.</claim-text> <claim-text>19. A digital sound identification system, the system comprising: non-volatile memory for storing a Markov model; stored program memory storing processor control code; a sound data input; a processor coupled to said sound data input, to said working memory, and to said stored program memory for executing said processor control code and wherein said processor control code comprises code to: input sound data defining sound frequency domain data; determine a probability of said sound frequency domain data fitting at least said first Markov model; and output sound identification data dependent on said probability.</claim-text> <claim-text>20.
A digital sound identification system, the system comprising: non-volatile memory for storing a Markov model; stored program memory storing processor control code; a sound data input; a processor coupled to said sound data input, to said working memory, and to said stored program memory for executing said processor control code and wherein said processor control code comprises code to: input first sample sound data for a first sound to be identified from said sound data input, said first sample sound data defining first sample frequency domain data, said first sample frequency domain data defining an energy of said first sample in a plurality of frequency ranges; generate a first set of mean and variance values for at least a first Markov model of said first sample sound from said first sample frequency domain data; store said first Markov model in said non-volatile memory; input third sound data defining third sound frequency domain data; determine a probability of said third sound frequency domain data fitting at least said first Markov model; and output sound identification data dependent on said probability; wherein said sound data comprises compressed sound data representing a sound in said frequency domain.</claim-text> <claim-text>21. A data carrier carrying processor control code to implement the method of claim 9.</claim-text> <claim-text>22. A data carrier carrying a Markov model configured and adjusted using the method of claim 9.</claim-text>
GB201215240A 2008-12-15 2008-12-15 Sound identification systems Active GB2494511B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB201215240A GB2494511B (en) 2008-12-15 2008-12-15 Sound identification systems

Publications (3)

Publication Number Publication Date
GB201215240D0 GB201215240D0 (en) 2012-10-10
GB2494511A true GB2494511A (en) 2013-03-13
GB2494511B GB2494511B (en) 2013-09-04

Family

ID=47045458

Family Applications (1)

Application Number Title Priority Date Filing Date
GB201215240A Active GB2494511B (en) 2008-12-15 2008-12-15 Sound identification systems

Country Status (1)

Country Link
GB (1) GB2494511B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11735202B2 (en) 2019-01-23 2023-08-22 Sound Genetics, Inc. Systems and methods for pre-filtering audio content based on prominence of frequency content

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2628118C2 (en) * 2015-02-16 2017-08-15 Limited Liability Company "Yandex" Method for forming and using an inverted index of an audio recording and machine-readable information storage medium
US9558272B2 (en) 2014-08-14 2017-01-31 Yandex Europe Ag Method of and a system for matching audio tracks using chromaprints with a fast candidate selection routine
US9881083B2 (en) 2014-08-14 2018-01-30 Yandex Europe Ag Method of and a system for indexing audio tracks using chromaprints

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006075352A1 (en) * 2005-01-14 2006-07-20 Franco Valentini Surveillance method and surveillance device operating with said method
WO2008139203A2 (en) * 2007-05-15 2008-11-20 Ipsotek Ltd Data processing apparatus
GB2466242A (en) * 2008-12-15 2010-06-23 Audio Analytic Ltd Identifying sounds and reducing false positive identification of sound


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Camera control system using speech recognition technology. World conference on educational multimedia, hypermedia and telecommunications. XP002561489. *
'LISTENIN' to domestic environments from remote locations. International conference on auditory display, Boston, MA USA. 6 July 2003. XP002561488. *
Robust Speech Activity Detection in the presence of noise. 1 October 1998. page P922, XP007000673. *
Sound analysis using MPEG compressed audio. Acoustics, Speech and Signal Processing 2000, ICASSP '00, Proceedings, 2000 IEEE International Conference on, 5-9 June 2000, Piscataway, NJ, USA, IEEE, vol. 2, pages 761-764. XP010504834. ISBN 978-0-7803-6293-2 *



Similar Documents

Publication Publication Date Title
US8918343B2 (en) Sound identification systems
US9286911B2 (en) Sound identification systems
US10224019B2 (en) Wearable audio device
US9609441B2 (en) Smart hearing aid
US6895098B2 (en) Method for operating a hearing device, and hearing device
CA2501989C (en) Isolating speech signals utilizing neural networks
US20030112987A1 (en) Hearing prosthesis with automatic classification of the listening environment
US20040155770A1 (en) Audible alarm relay system
CN109298642B (en) Method and device for monitoring by adopting intelligent sound box
US20080240458A1 (en) Method and device configured for sound signature detection
Nordqvist et al. An efficient robust sound classification algorithm for hearing aids
US11558699B2 (en) Hearing device component, hearing device, computer-readable medium and method for processing an audio-signal for a hearing device
JP2011237865A (en) Living space monitoring system
JP2004500750A (en) Hearing aid adjustment method and hearing aid to which this method is applied
US10955287B2 (en) System and method of signal processing for use in gunshot detection
GB2494511A (en) Digital sound identification
US20200251120A1 (en) Method and system for individualized signal processing of an audio signal of a hearing device
JP2020524300A (en) Method and device for obtaining event designations based on audio data
CN112700765A (en) Assistance techniques
CA2432751A1 (en) Enhanced method and apparatus for integrated alarm monitoring system based on sound related events
WO2019002417A1 (en) Sound responsive device and method
JP2004500592A (en) Method for determining instantaneous acoustic environment condition, method for adjusting hearing aid and language recognition method using the same, and hearing aid to which the method is applied
GB2534027A (en) Sound capturing and identifying devices
CN108632692B (en) Intelligent control method of microphone equipment and microphone equipment
WO2022120085A1 (en) Subband domain acoustic echo canceller based acoustic state estimator

Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)

Free format text: REGISTERED BETWEEN 20230202 AND 20230208