EP2454725A1 - Method and system for remotely guarding an area by means of cameras and microphones

Method and system for remotely guarding an area by means of cameras and microphones

Info

Publication number
EP2454725A1
EP2454725A1
Authority
EP
European Patent Office
Prior art keywords
sound
sound sources
area
location
intelligibility
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP10736847A
Other languages
German (de)
French (fr)
Inventor
Frank Leonard Kooi
Kim Kranenborg
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nederlandse Organisatie voor Toegepast Natuurwetenschappelijk Onderzoek TNO
Original Assignee
Nederlandse Organisatie voor Toegepast Natuurwetenschappelijk Onderzoek TNO
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nederlandse Organisatie voor Toegepast Natuurwetenschappelijk Onderzoek TNO
Priority to EP10736847A
Publication of EP2454725A1
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B 13/00 Burglar, theft or intruder alarms
    • G08B 13/16 Actuation by interference with mechanical vibrations in air or other fluid
    • G08B 13/1654 Actuation by interference with mechanical vibrations in air or other fluid using passive vibration detection systems
    • G08B 13/1672 Actuation by interference with mechanical vibrations in air or other fluid using passive vibration detection systems using sonic detecting means, e.g. a microphone operating in the audio frequency range
    • G08B 13/18 Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength
    • G08B 13/189 Actuation by interference with heat, light, or radiation of shorter wavelength using passive radiation detection systems
    • G08B 13/194 Actuation by interference with heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
    • G08B 13/196 Actuation by interference with heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
    • G08B 13/19678 User interface
    • G08B 13/19686 Interfaces masking personal details for privacy, e.g. blurring faces, vehicle license plates
    • G08B 13/19691 Signalling events for better perception by user, e.g. indicating alarms by making display brighter, adding text, creating a sound
    • G08B 13/19693 Signalling events for better perception by user using multiple video sources viewed on a single or compound screen
    • G08B 13/19697 Arrangements wherein non-video detectors generate an alarm themselves

Abstract

Method and system for remotely guarding an area by means of cameras and microphones at several locations within that area, which are connected to a central surveillance post, comprising the steps of: displaying, at an observation screen, the various camera and microphone locations on a map of said area; enabling selective activation, e.g. by an operator, of camera images for zooming in; deriving, per microphone or group of microphones, an attention value based on the sound picked up by that sound source; and rendering, when the attention value passes a predetermined threshold value, a sound signal of limited duration from the sound source causing the threshold passage. Sound processing may be used to further reduce the intelligibility of the sound. An audible and/or visual representation of the location of the sound source causing the threshold passage is also given. The location representation may be performed by means of spatial audible reproduction of the relevant sound representation in the vicinity of said observation screen and/or by means of visual display of the location of the sound source causing said threshold passage.

Description

Title: Method and system for remotely guarding an area by means of cameras and microphones
Field of the invention
The present invention relates to a method and system for remotely guarding an area by means of cameras and microphones at several locations within that area, which are connected to a central surveillance post.
Background
Surveillance cameras for monitoring public areas have widespread applications, especially in urban areas. For example, GB 2408880 discloses a camera surveillance system with a plurality of cameras and a user interface that allows cameras to be prioritized based on detected activity. Although the use of such cameras is very useful in guarding such areas, the effectiveness of such systems could be improved.
It is known to use audio input from the location of the camera to improve the effectiveness. US 2007/182819 proposes to digitize known analog camera surveillance systems. It mentions the possibility of using sensors for fire, smoke, sound, glass breakage, motion, panic buttons and the like to trigger a camera activation event to switch a camera into transmission mode, or to alert a server. WO 2007/095994 discloses a camera surveillance system that shows a mosaic of images and renders related sound, using stereophonic sound to associate sound with the display position of the corresponding image. When an event occurs in one image, sound corresponding to the other images may be switched off.
However, in most countries legal privacy regulations forbid eavesdropping (except under special conditions). Conventionally, surveillance camera systems for areas where such prohibitions are in force comply with the prohibition because they do not provide for audio output. But this way of addressing the prohibition on eavesdropping limits the effectiveness of the surveillance system.
Summary
It is one aim to improve the effectiveness of such a system by combining it with audible information in a way that prevents unlimited eavesdropping. Thus it can be made possible to use audible information to improve the effectiveness without infringing privacy regulations.
A method according to claim 1 is provided. Herein cameras and sound sources at a plurality of locations within an area are used to support remote guarding of the area. Each sound source comprises a microphone or a group of microphones, the cameras and sound sources being coupled to a surveillance post. The method comprises the steps of: deriving, per sound source, an attention value based on the sound picked up by that sound source; comparing the attention values with a predetermined threshold value; and, in response to detection that the attention value for a particular one of the sound sources has passed the predetermined threshold value, audibly rendering a sound representation of the sound picked up by the particular one of the sound sources causing the threshold passage, limited to a time interval of at most a predetermined length. By rendering the sound in response to detection for at most a time interval of predetermined length, the intelligibility of the sound for eavesdropping is limited. In this way it is made possible to use audio to improve the effectiveness without infringing privacy regulations.
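Purely as an illustration beyond the patent text, the gating logic of this method might be sketched as follows in Python. The attention measure, the threshold value and the names guard_loop and render_audio are assumptions for the sketch, not part of the disclosure.

```python
import time

MAX_RENDER_SECONDS = 10.0   # "at most a predetermined length"
ATTENTION_THRESHOLD = 0.8   # predetermined threshold value (illustrative)

def attention_value(block):
    """Placeholder attention measure: mean absolute amplitude of a block."""
    return sum(abs(s) for s in block) / len(block)

def guard_loop(frames, render_audio):
    """frames: iterable of (source_id, sample_block) tuples in arrival order.
    render_audio: callback that plays a block audibly for the operator."""
    render_until = {}  # source_id -> time at which its rendering window closes
    for source_id, block in frames:
        now = time.monotonic()
        if (attention_value(block) > ATTENTION_THRESHOLD
                and source_id not in render_until):
            # Threshold passage detected: open a time-limited rendering window.
            render_until[source_id] = now + MAX_RENDER_SECONDS
        if source_id in render_until:
            if now < render_until[source_id]:
                render_audio(source_id, block)  # audible, within the window
            else:
                # Window expired: go silent again (a real system would likely
                # also add a refractory period before re-triggering).
                del render_until[source_id]
```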
In an embodiment the time interval in which the audio is rendered has a length of at most ten seconds. It has been found that this usually limits intelligibility. The time interval may for example be at least two seconds long, and more preferably at least five seconds. This makes it possible to observe emotions in the sound. When emotions can be observed, the operator can semantically comprehend the emotional components (in particular fear, anger, excitement etc.) in the audible signals picked up in the vicinity of the cameras. These components should, after transfer to the operator, attract his or her attention in a natural way and trigger him or her to pay attention to the location at which such an (e.g. excited) audible signal originated or was recorded.
In an embodiment the various camera and microphone locations are displayed on a map of said area, and a representation of the location of the sound source where the attention value exceeded the threshold is indicated in relation to said map in response to the detection. The indication may be realized visually, for example by activating display of a predetermined color at a location on the map that corresponds with the sound source, and/or auditively, for example by means of a synthetic stereophonic signal that suggests that the sound comes from that location. This helps to keep the operator's focus on the location of the source of the sound even after the time interval in which the sound is rendered has expired.
In an embodiment, the visual indication of the map location is given at least following the time interval in which the sound from the sound source is rendered. This helps the operator identify the location even if the operator did not identify it during the time interval, without allowing eavesdropping. The visual indication may also be given during the time interval. During the time interval an audio indication of the location may be given by rendering the sound from the sound source stereophonically. After the interval, other sound, that is, sound not picked up by the sound source, may be rendered in this way to indicate the location.
In an embodiment sound from the sound source is first processed before it is rendered, such that intelligibility is reduced. The processing may involve time and/or frequency domain filtering, adding echoes, and/or scrambling or fragmenting of the sound representation, reducing the overall semantic or linguistic intelligibility of the sound to a level which complies with the relevant privacy regulations related to eavesdropping. Preferably a form of processing is used that results in rendering of at least the lowest frequencies of speech sounds. This provides for recognition of emotions.
In an embodiment a value of a measure of intelligibility of the sound is determined and the processing is controlled dependent on the value. In an embodiment said processing may be enabled or disabled dependent on whether the value is above or below a predetermined threshold. In an embodiment said processing may be adapted so as to reduce the value of the measure of intelligibility. The Speech Transmission Index (abbreviated STI; see for its definition e.g. en.wikipedia.org/wiki/Speech_Transmission_Index) may be used as a measure of intelligibility, for example. The STI of the processed sound is reduced, e.g. by means of signal scrambling or addition of noise, to a maximum of e.g. 0.35 or less.
To comply with the aim to provide that the relevant audible signals, picked up in the vicinity of the cameras and processed as indicated above, will attract the operator's attention and guide him to the location on his observation screen where the (e.g. excited) sound originated, the location representation of that sound may preferably be performed by spatial (two- or three-dimensional) audible reproduction of that sound representation in the vicinity of the observation screen. As such an observation screen (which may be formed by a group of cooperating display screens) will normally have rather large dimensions, the operator's attention can be attracted when the sound representations originating at several microphone locations are reproduced (i.e. when the attention value of the sound passes a predetermined threshold value) via a spatial audio reproduction system. It should be noted that the sounds as such may be picked up by single-channel microphones; their sound representations, however, are reproduced via a spatial audio system in the vicinity of the observation screen in such a way that, in the operator's perception, they come from the direction of the location, as mapped on the observation screen, where the sound was produced or recorded.
Additionally or optionally, the sound originating location may be represented by means of a visual display of the location where the sound has been produced, e.g. by means of any form of highlighting of that location on the area map on the observation screen.
Brief description of the drawing
These and other objects and advantageous aspects will become apparent from a description of exemplary embodiments, using the following figures.
Figure 1 shows an exemplary embodiment of a surveillance system; Figure 2 shows the diagram of an exemplary embodiment of a subsystem for sound processing.
Detailed description of exemplary embodiments
Figure 1 shows use of a system for remotely guarding an area (the centre of a city in the example) using cameras and microphones at several locations within that area, which are connected to a central surveillance post.
Figure 2 shows the system, including an observation screen 1, a microphone (MIC), an event detector (ED) 2, an intelligibility reductor (IR) 3, a 2D renderer (2DR) 4, a set of loudspeakers 5, and a video screen driver (VD) 6. Event detector (ED) 2 and intelligibility reductor (IR) 3 have inputs coupled to the microphone MIC. Event detector (ED) 2 has an output coupled to a control input of intelligibility reductor (IR) 3. Intelligibility reductor (IR) 3 has an output coupled to 2D renderer (2DR) 4 and a video screen driver (VD) 6. In another embodiment event detector (ED) 2 has an output coupled to control inputs of intelligibility reductor (IR) 3 and video screen driver (VD) 6, and optionally 2D renderer (2DR) 4, intelligibility reductor (IR) 3 not being coupled to video screen driver (VD) 6. 2D renderer (2DR) 4 and video screen driver (VD) 6 are coupled to observation screen 1. Although only one microphone MIC is shown in the figure, it should be appreciated that a plurality of microphones or groups of microphones may be used. Furthermore, the system comprises a plurality of cameras (not shown). The system may comprise auxiliary screens 10 (figure 1) coupled to the cameras, for showing images from different cameras.
In operation the system is used for remotely guarding an area by means of cameras and microphones at several locations within that area, which are connected to a central surveillance post. At observation screen 1, the various camera and microphone locations are displayed on a map of the area, enabling selective activation, e.g. by a screen observing operator 9, of one or more camera images for zooming in. Per microphone or group of microphones, called sound source hereinafter, an attention value is derived based on the sound picked up by that sound source. When the attention value passes a predetermined threshold value, a representation of the sound picked up by the sound source causing the threshold passage, called sound representation hereinafter, is output. The output includes an audible and/or visual representation of the location of the sound source causing the threshold passage, called location representation hereinafter.
When event detector (ED) 2 detects an event, it causes intelligibility reductor (IR) 3 to pass a sound signal from the microphone during a limited time interval, optionally after applying a form of processing that further reduces intelligibility. In addition, the signal from event detector (ED) 2 is used to control 2D renderer (2DR) 4 and video screen driver (VD) 6 to select a position representation that is rendered by means of synthetic stereophony and displayed at a selected location on observation screen 1, according to the position of the microphone MIC for which an event is detected. The observation screen 1 is arranged for displaying the various camera (Cam) and microphone (Mic) locations on a map of the area. Event detector (ED) 2 and intelligibility reductor (IR) 3 function as means for executing the method as discussed hereinbefore, including processing means and means for the reproduction of the sound representations. 2D renderer (2DR) 4 functions as means for the reproduction of the relevant location representations. The set of loudspeakers 5 provides acoustic location representation, and the video screen driver (VD) 6 provides visual location representation at the observation screen 1.
The relevant area thus can be monitored by means of cameras and microphones at several locations within the area, which are connected to a central surveillance post which accommodates the components shown in figures 1 and 2. By means of the observation screen 1, the various camera and microphone locations are displayed on a map image of the area to be monitored. A screen observing operator 9 is able, e.g. by means of a keyboard, mouse, joystick (not shown) or touch screen, to select and activate cameras and/or camera images to zoom in and out; besides, the operator may be able to move the cameras into different positions. In the vicinity of each camera, microphones are installed, picking up the sound present in the camera's vicinity. In this way the sounds which are present in the vicinity of each camera are transmitted to the surveillance post, which accommodates the system. In the event detector 2, per microphone or group of microphones (sound source), an attention value is derived based on the sound picked up by that sound source. The event detector 2 analyzes the incoming sound and decides, e.g. based on the results of a frequency spectrum and energy level analysis, whether the incoming sound comprises elements like fear or excitement (e.g. screaming), or uncommon noise such as breaking glass. In such cases the attention value should pass a predetermined threshold value, indicating that there might be an event which should be investigated. The attention value may be based on a level of signal power in a selected frequency band, or the steepness of an increase of such a level, or a deviation from a range of spectral distributions of standard sounds. Known sound recognition algorithms may be used to detect specific types of sound.
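As an illustration of one way such an attention value could be computed (the sample rate, band edges and weighting below are assumptions, not taken from the patent), a minimal sketch combining band power with the steepness of its increase might look like this:

```python
import numpy as np

FS = 16000              # sample rate in Hz (assumed)
BAND = (300.0, 3400.0)  # selected frequency band, here the speech band (assumed)

def band_power(block):
    """Mean signal power of the block within BAND, computed via an FFT."""
    spectrum = np.abs(np.fft.rfft(block)) ** 2
    freqs = np.fft.rfftfreq(len(block), d=1.0 / FS)
    mask = (freqs >= BAND[0]) & (freqs <= BAND[1])
    return float(spectrum[mask].mean())

def attention(block, prev_power):
    """Return (attention value, current band power): the value combines the
    band level with the steepness of its increase since the previous block."""
    power = band_power(block)
    rise = max(0.0, power - prev_power)  # steepness of the increase
    return 0.5 * power + 0.5 * rise, power
```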
When the attention value passes a predetermined threshold value, detected in the event detector 2, this detector gives an "on" signal to the intelligibility reductor 3 to pass a representation of the sound picked up by the sound source causing the threshold passage, i.e. a sound representation having a reduced intelligibility. In an embodiment the intelligibility reductor 3 reduces intelligibility by passing no more than a predetermined time interval of sound, for example at most ten seconds. In an embodiment intelligibility reductor 3 may comprise a buffer memory to buffer sound of the sound source before it is rendered. Thus, for example, intelligibility reductor 3 may render a part of buffered sound for which it was subsequently determined that the attention value exceeded the threshold. In addition, an audible representation of the location of the possibly buffered sample of the event sound source causing the threshold passage (location representation) is performed, viz. by reproducing the sound representation (having a reduced intelligibility) by means of a 2D sound rendering subsystem (2DR) 4 and loudspeakers 5 which, by means of audio phase manipulation causing pseudo stereo/quadraphonic sound reproduction (see en.wikipedia.org/wiki/Quadraphonic_sound) and/or sound reproduction via a selected set of loudspeakers 5a and 5b, provides that, in the perception of the operator 9 standing or sitting before his (widescreen) observation screen 1, the sound representation comes from the corresponding location at that observation screen (the lower right corner in figure 1).
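As a sketch of the synthetic-stereophony idea (the constant-power pan law and the function name are assumptions; the patent only requires that the perceived direction match the mapped location), a mono microphone signal could be spread over two loudspeakers like this:

```python
import numpy as np

def pan_to_screen_position(mono, x):
    """Spread a mono block over two loudspeakers so that it is perceived as
    coming from horizontal map position x, where 0.0 is the left edge and
    1.0 the right edge of the observation screen."""
    angle = x * np.pi / 2.0        # constant-power pan law (assumed)
    left = np.cos(angle) * mono
    right = np.sin(angle) * mono
    return np.stack([left, right], axis=1)  # (n_samples, 2) stereo block
```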
In addition to the audible location representation, a visual location representation is presented to the operator, viz. in the form of an image, e.g. as shown in figure 1 (again in the lower right corner), where the relevant microphone location and the neighbouring camera location have been accentuated by (bold) encircling of the relevant location. In this way the operator 9 will be guided, in a natural and intuitive way, to pay attention to the location in which, according to the sound picked up by the microphone(s), something might be wrong. Then the operator may activate the relevant camera (e.g. by using a touch screen or keyboard function) to zoom in, which may be made visible via the same observation screen 1 or, as is suggested in figure 1, via one or more auxiliary screens. In the illustrated example, the operator may have heard (the sound representation of) breaking glass and/or voices crying "Stop thief!!", is guided by that sound to the highlighted location at his screen 1, activates the relevant camera and sees at the auxiliary screen 10 a thief running away. The operator then may contact and inform the police. The display of the visual location representation may continue until it is switched off by the operator, or it may be switched off automatically after a time interval that is longer than the time interval during which sound is rendered from the sound source.
Concerning the sound representation, made in the IR module 3, this may involve producing separate fragmented parts of the sound picked up by the sound source (the microphone(s)), the fragmentation being such that the overall semantic intelligibility of the sound is reduced to a level which complies with the relevant privacy regulations related to eavesdropping. When the length of each fragmented part is limited (e.g. to 10 seconds or less), the intelligibility will be decreased and thus relating a spoken phrase to a particular individual will be made infeasible.
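A minimal sketch of such fragmentation, assuming illustrative fragment and gap lengths (the patent specifies only an upper bound on fragment length):

```python
import numpy as np

def fragment(sound, fs, frag_s=2.0, gap_s=3.0):
    """Pass frag_s seconds of every (frag_s + gap_s)-second period and mute
    the rest, so that no continuous phrase longer than frag_s survives."""
    out = sound.astype(float).copy()
    period = int((frag_s + gap_s) * fs)
    keep = int(frag_s * fs)
    for start in range(0, len(out), period):
        out[start + keep:start + period] = 0.0  # mute the gap
    return out
```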
Another or an additional method for intelligibility reduction is to process (e.g. by scrambling and/or distortion) the sound from the originating sound source such that the intelligibility of the sound is reduced to a level which complies with the relevant privacy regulations related to eavesdropping. In practice it has been found that reducing the Speech Transmission Index of the processed sound to a maximum of 0.35 achieves the desired low intelligibility. The Speech Transmission Index (STI) is a measure for the intelligibility (understanding) of speech, whose value varies from 0 (completely unintelligible) to 1 (perfect intelligibility). On this scale, an STI of at least 0.5 is desirable for most applications (Steeneken, H. J. M., & Houtgast, T. (1980). A physical method for measuring speech-transmission quality. Journal of the Acoustical Society of America, 67, 318-326).
Intelligibility reduction may be realized by processing such as one or more of time and/or frequency domain scrambling, adding echoes, distorting, filtering, or addition of noise. The addition of echoes, for example, is a simple and effective way of reducing intelligibility. Scrambling may involve changing the relative sequence of a series of fragments of the sound. Distortion may involve applying a non-linear function to sound sample values. Filtering may involve reducing the strength of high-frequency components. In an embodiment, the frequency of a lowest sound (e.g. speech) component such as a formant is determined and the relative strength of frequency components at frequencies above this component is reduced. Noise may be added at selected frequencies, for example. Each of these measures reduces the intelligibility of the sound.
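Two of these techniques, echo addition and fragment scrambling, might be sketched as follows; the delays, gain and fragment length are illustrative assumptions:

```python
import random
import numpy as np

def add_echoes(sound, fs, delays_s=(0.12, 0.27), gain=0.7):
    """Add delayed, attenuated copies of the signal to itself."""
    out = sound.astype(float).copy()
    for delay in delays_s:
        n = int(delay * fs)
        out[n:] += gain * sound[:-n]  # echo at the given delay
    return out

def scramble(sound, fs, frag_s=0.25, seed=None):
    """Change the relative sequence of short fragments of the sound."""
    n = int(frag_s * fs)
    fragments = [sound[i:i + n] for i in range(0, len(sound), n)]
    random.Random(seed).shuffle(fragments)
    return np.concatenate(fragments)
```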
The degree of reduction may be increased, for example, by using shorter fragments or increased reordering, adding more echoes, using a non-linear function that deviates more from a linear function, reducing high-frequency components more strongly, reducing more high-frequency components, adding more noise, etc. Preferably a method of reduction is used that preserves the amplitude variation at the lowest speech frequencies more than at higher frequencies. This makes it easier to recognize emotions.
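One way to preserve the low-frequency amplitude variation that carries emotion while suppressing phonetic detail is to strongly attenuate everything above a low cutoff; the cutoff and attenuation values below are assumptions for the sketch:

```python
import numpy as np

def preserve_low_speech(sound, fs, cutoff_hz=500.0, attenuation=0.1):
    """FFT-based filter: leave components below cutoff_hz untouched and
    strongly attenuate everything above, so that prosody (and with it
    emotion) survives while most phonetic detail is lost."""
    spectrum = np.fft.rfft(sound)
    freqs = np.fft.rfftfreq(len(sound), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] *= attenuation
    return np.fft.irfft(spectrum, n=len(sound))
```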
In an embodiment the reduction of intelligibility may be performed in a control loop, wherein the Speech Transmission Index is determined and the degree of reduction is controlled dependent on the Speech Transmission Index. Processing to reduce intelligibility may be switched on or off dependent on the value of the Speech Transmission Index, for example. Thus, in an embodiment, processing to reduce intelligibility may be switched on only if the received audio is sufficiently intelligible speech.
The system may contain respective event detectors (ED) 2 and/or intelligibility reductors (IR) 3 for respective microphones (MIC) or groups of microphones. Alternatively an event detector (ED) 2 may process sound from different microphones on a time multiplexing basis. In another embodiment intelligibility reductor (IR) 3 may comprise a memory for storing recent sound input from different microphones or groups of microphones, intelligibility reductor (IR) 3 outputting, and optionally processing, sound for a microphone or group of microphones that is selected by an event detector (ED) 2 or event detectors (ED) 2. An intelligibility reductor (IR) 3 may be configured to supply an output signal identifying the selected microphone or group of microphones that is the source of the sound that it passes.
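The control loop described above might be sketched as follows; estimate_sti and strengthen are hypothetical placeholders (a real STI estimate follows IEC 60268-16 and is not reproduced here):

```python
STI_MAX = 0.35  # target maximum for the processed sound, from the description

def reduce_until_compliant(sound, fs, estimate_sti, strengthen, max_level=10):
    """Strengthen the intelligibility reduction step by step until the
    estimated Speech Transmission Index is at or below STI_MAX.
    estimate_sti(sound, fs) and strengthen(sound, fs, level) are assumed
    callables; strengthen applies any of the reduction techniques above
    at the given strength level."""
    level = 1
    processed = strengthen(sound, fs, level)
    while estimate_sti(processed, fs) > STI_MAX and level < max_level:
        level += 1
        processed = strengthen(sound, fs, level)
    return processed
```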
Part or all of event detector (ED) 2, intelligibility reductor (IR) 3, 2D renderer (2DR) 4 and video screen driver (VD) 6 may be implemented using a programmable computer or set of computers and a computer program or set of programs to perform the functions as described. Such a computer or set of computers, or dedicated hardware to perform the functions of event detector (ED) 2, intelligibility reductor (IR) 3, 2D renderer (2DR) 4 and video screen driver (VD) 6, will be referred to as a processing system. When it is described that the system is configured to perform a function, it should be understood that this covers hardware implementations, software implementations using a programmed computer, and mixtures of both.
Although an embodiment has been described wherein the position of a microphone is indicated by both a visual and an audio representation, it should be appreciated that one of these, for example a visual representation by the activation of a light at a selected location, may be sufficient. Although an embodiment has been described that combines rendering of audio with a representation of position, it should be appreciated that in some cases it may suffice to render only the sound, without representing the position. Representation of the position facilitates determination of the location of the microphone by the operator, in combination with monitoring, when a limited time interval of sound is rendered. When the intelligibility of the sound is reduced, it may not be necessary to limit the time interval in which audio is rendered after event detection. But when many cameras are used it may be advantageous to limit the time interval even in that case. The microphones may be mounted in camera assemblies together with the cameras. Thus, each sound event can be associated with a camera, and indication of the location of the microphone of a sound event can involve showing the images from the camera associated with that microphone. Alternatively, or in addition, microphones may be used that are at a distance from the cameras.
In an embodiment a method is provided for remotely guarding an area by means of cameras and microphones at several locations within that area, which are connected to a central surveillance post, comprising the steps of: displaying, at an observation screen (1), the various camera and microphone locations on a map of said area; enabling selective activation, e.g. by a screen observing operator (9), of one or more camera images for zooming in; deriving, per microphone or group of microphones, called sound source hereinafter, an attention value based on the sound picked up by that sound source; outputting, when the attention value passes a predetermined threshold value, a
representation of the sound picked up by the sound source causing the threshold passage, called sound representation hereinafter, including an audible and/or visual representation of the location of the sound source causing the threshold passage, called location representation hereinafter.
Optionally said sound representation includes fragmented parts of the sound picked up by the sound source, the fragmentation being such that the overall semantic intelligibility of the sound is reduced to a level which complies with the relevant privacy regulations related to eavesdropping. The length of each fragmented part may have a maximum of 10 seconds. In an embodiment said sound representation may include at least part of the sound picked up by the relevant sound source, processed, however, e.g. by means of time and/or frequency domain scrambling, distorting, filtering etc., such that the intelligibility of the sound is reduced to a level which complies with the relevant privacy regulations related to eavesdropping. In an embodiment the Speech Transmission Index of the processed sound has a maximum of 0.35. Said location representation may be performed by means of spatial audible reproduction of the relevant sound representation in the vicinity of said observation screen. In an embodiment said location representation is performed by means of visual display of the location of the sound source causing said threshold passage.
A system is provided for remotely guarding an area using cameras and microphones at several locations within that area, which are connected to a central surveillance post, including an observation screen (1) arranged for displaying the various camera and microphone locations on a map of said area; the system including means for executing the method according to any of the preceding claims, including processing means and means for the reproduction of said sound representations and location representations respectively.
An advantage of at least some of the embodiments is to provide a system which makes remote monitoring of (urban) areas more lively for the operator (e.g. guardsman), as the visual information offered by the video cameras is supplemented by accompanying "real live audio", however without passing on (private) conversations etc. in a way that their content could be followed, i.e. understood, by the operator.

Claims

1. A method for supporting remote guarding of an area by means of cameras and sound sources at a plurality of locations within that area, each sound source comprising a microphone or a group of microphones, the cameras and sound sources being coupled to a surveillance post, the method comprising the steps of:
deriving, per sound source, an attention value based on the sound picked up by that sound source;
comparing the attention values with a predetermined threshold value; and in response to detection that the attention value for a particular one of the sound sources has passed the predetermined threshold value,
audibly rendering a sound representation of the sound picked up by the particular one of the sound sources causing the threshold passage, limited to a time interval of at most a predetermined length.
2. A method according to claim 1, wherein the predetermined length is ten seconds.
3. A method according to claim 1 or 2, comprising
displaying, at an observation screen (1), the various camera and microphone locations on a map of said area;
outputting a representation of a location of the particular one of the sound sources in relation to said map in response to said detection.
4. A method according to claim 3, wherein said representation of the location of the particular one of the sound sources is performed by means of visual display of the location of the particular one of the sound sources.
5. A method according to any of the preceding claims, comprising processing the sound picked up by the particular one of the sound sources by means of at least one of time- and/or frequency-domain scrambling, adding echoes, distorting, filtering, or addition of noise, thereby reducing the intelligibility of the sound.
6. A method according to claim 5, comprising
- determining a value of a measure of intelligibility of the sound picked up by the particular one of the sound sources and
- applying said processing dependent on said value of the measure of intelligibility.
7. A method according to claim 5 or 6, wherein said processing reduces the Speech Transmission Index of the processed sound to less than or equal to 0.35.
8. A system for supporting remote guarding of an area using cameras and sound sources at a plurality of locations within that area, each sound source comprising a microphone or a group of microphones, comprising
- an audio output device;
- a processing system with an input for audio information from the sound sources, the processing system being configured to derive, per sound source, an attention value based on the sound picked up by that sound source, to compare the attention values with a predetermined threshold value, and, in response to detection that the attention value for a particular one of the sound sources has passed the predetermined threshold value, to audibly render a sound representation of the sound picked up by the particular one of the sound sources causing the threshold passage, limited to a time interval of at most a predetermined length.
9. A system according to claim 8, comprising a display for displaying a map of the area, wherein the processing system is configured to cause output of a signal representing a location of the particular one of the sound sources in relation to said map, in response to said detection.
10. A system according to claim 9, comprising an image display screen, wherein the signal representing the location of the particular one of the sound sources causes a visual display of the location of the particular one of the sound sources on the image display screen.
11. A system according to any of claims 8-10, comprising an audio processor configured to process the sound picked up by the particular one of the sound sources by at least one of time- and/or frequency-domain scrambling, adding echoes, distorting, filtering, or addition of noise, reducing the intelligibility of the sound.
12. A system according to claim 11, wherein the audio processor is configured to determine a value of a measure of intelligibility of the sound picked up by the particular one of the sound sources and to apply said processing dependent on said value of the measure of intelligibility.
13. A system for remotely guarding an area using cameras and sound sources at a plurality of locations within that area, each sound source comprising a microphone or a group of microphones, the cameras and sound sources being coupled to a surveillance post, the system comprising means for executing the method according to any of claims 1-7, including processing means and means for the reproduction of said sound representations and location representations respectively.
14. A computer program product comprising a program of instructions for a programmable computer in a system for remotely guarding an area, the program being configured to cause the computer, when the program is executed by the programmable computer, to execute the method according to any of claims 1-7.
15. A system for supporting remote guarding of an area using cameras and sound sources at a plurality of locations within that area, each sound source comprising a microphone or a group of microphones, comprising
- an audio output device;
- a processing system with an input for audio information from the sound sources, the processing system being configured to derive, per sound source, an attention value based on the sound picked up by that sound source, to compare the attention values with a predetermined threshold value, and, in response to detection that the attention value for a particular one of the sound sources has passed the predetermined threshold value, to audibly render a sound representation of the sound picked up by the particular one of the sound sources causing the threshold passage, after processing the sound picked up by the particular one of the sound sources by at least one of time- and/or frequency-domain scrambling, adding echoes, distorting, filtering, or addition of noise, reducing the intelligibility of the sound to at most a predetermined level.
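For illustration, a minimal sketch of the control flow recited in claims 6 and 12 follows: a value of an intelligibility measure is determined and the reduction processing is applied dependent on that value. The estimator and the processing step are passed in as callables, since an actual Speech Transmission Index measurement (IEC 60268-16) is outside the scope of this sketch; the stand-in callables in the example are purely hypothetical.

```python
import numpy as np

def reduce_until_private(audio, sample_rate, estimate_sti, scramble_step,
                         sti_target=0.35, max_rounds=8):
    """Re-apply the reduction step while the intelligibility measure exceeds
    the target (0.35 per claim 7), with a round limit as a safety stop."""
    for _ in range(max_rounds):
        if estimate_sti(audio, sample_rate) <= sti_target:
            break
        audio = scramble_step(audio, sample_rate)
    return audio

# Toy demonstration with stand-in callables (not a real STI estimator or
# scrambler): pretended measurements fall from 0.8 to 0.3 over two rounds.
measurements = iter([0.8, 0.5, 0.3])
result = reduce_until_private(
    np.zeros(16000), 16000,
    estimate_sti=lambda a, sr: next(measurements),
    scramble_step=lambda a, sr: a,
)
```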
EP10736847A 2009-07-17 2010-07-19 Method and system for remotely guarding an area by means of cameras and microphones Withdrawn EP2454725A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP10736847A EP2454725A1 (en) 2009-07-17 2010-07-19 Method and system for remotely guarding an area by means of cameras and microphones

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP09165782A EP2276007A1 (en) 2009-07-17 2009-07-17 Method and system for remotely guarding an area by means of cameras and microphones.
EP10736847A EP2454725A1 (en) 2009-07-17 2010-07-19 Method and system for remotely guarding an area by means of cameras and microphones
PCT/NL2010/050466 WO2011008099A1 (en) 2009-07-17 2010-07-19 Method and system for remotely guarding an area by means of cameras and microphones

Publications (1)

Publication Number Publication Date
EP2454725A1 true EP2454725A1 (en) 2012-05-23

Family

ID=41110692

Family Applications (2)

Application Number Title Priority Date Filing Date
EP09165782A Withdrawn EP2276007A1 (en) 2009-07-17 2009-07-17 Method and system for remotely guarding an area by means of cameras and microphones.
EP10736847A Withdrawn EP2454725A1 (en) 2009-07-17 2010-07-19 Method and system for remotely guarding an area by means of cameras and microphones

Family Applications Before (1)

Application Number Title Priority Date Filing Date
EP09165782A Withdrawn EP2276007A1 (en) 2009-07-17 2009-07-17 Method and system for remotely guarding an area by means of cameras and microphones.

Country Status (2)

Country Link
EP (2) EP2276007A1 (en)
WO (1) WO2011008099A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2822053B1 (en) 2001-03-15 2003-06-20 Stryker Spine Sa ANCHORING MEMBER WITH SAFETY RING FOR SPINAL OSTEOSYNTHESIS SYSTEM
KR102127640B1 (en) * 2013-03-28 2020-06-30 삼성전자주식회사 Portable teriminal and sound output apparatus and method for providing locations of sound sources in the portable teriminal
WO2014199263A1 (en) 2013-06-10 2014-12-18 Honeywell International Inc. Frameworks, devices and methods configured for enabling display of facility information and surveillance data via a map-based user interface
JP5958833B2 (en) 2013-06-24 2016-08-02 パナソニックIpマネジメント株式会社 Directional control system
US10134422B2 (en) 2015-12-01 2018-11-20 Qualcomm Incorporated Determining audio event based on location information

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5666157A (en) * 1995-01-03 1997-09-09 Arc Incorporated Abnormality detection and surveillance system
US20020110264A1 (en) * 2001-01-30 2002-08-15 David Sharoni Video and audio content analysis system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4060803A (en) * 1976-02-09 1977-11-29 Audio Alert, Inc. Security alarm system with audio monitoring capability
US7023913B1 (en) * 2000-06-14 2006-04-04 Monroe David A Digital security multimedia sensor
GB2408881B (en) * 2003-12-03 2009-04-01 Safehouse Internat Inc Monitoring an environment to produce graphical output data representing events of interest
US20050225634A1 (en) * 2004-04-05 2005-10-13 Sam Brunetti Closed circuit TV security system
JP2008502228A (en) * 2004-06-01 2008-01-24 エル‐3 コミュニケーションズ コーポレイション Method and system for performing a video flashlight
US8624975B2 (en) * 2006-02-23 2014-01-07 Robert Bosch Gmbh Audio module for a video surveillance system, video surveillance system and method for keeping a plurality of locations under surveillance
GB0709329D0 (en) * 2007-05-15 2007-06-20 Ipsotek Ltd Data processing apparatus


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of WO2011008099A1 *

Also Published As

Publication number Publication date
EP2276007A1 (en) 2011-01-19
WO2011008099A1 (en) 2011-01-20

Similar Documents

Publication Publication Date Title
US10848872B2 (en) Binaural recording for processing audio signals to enable alerts
US7936885B2 (en) Audio/video reproducing systems, methods and computer program products that modify audio/video electrical signals in response to specific sounds/images
EP2454725A1 (en) Method and system for remotely guarding an area by means of cameras and microphones
US20110092249A1 (en) Portable blind aid device
WO2015162645A1 (en) Audio processing apparatus, audio processing system, and audio processing method
EP3640935B1 (en) Notification information output method, server and monitoring system
US9652961B2 (en) Alarm notifying system
US8704893B2 (en) Ambient presentation of surveillance data
US20170354796A1 (en) Selective amplification of an acoustic signal
JP2006331388A (en) Crime prevention system
JPH0686295A (en) Monitor camera device
CN111275909B (en) Security early warning method and device
WO2011000113A1 (en) Multiple sound and voice detector for hearing- impaired or deaf person
CN105474665A (en) Sound processing apparatus, sound processing system, and sound processing method
JP2008500603A (en) Monitoring system and monitoring method
CN100483471C (en) Signalling system with imaging sensor
KR101882309B1 (en) safety light and safety system using voice recognition
JP2002297199A (en) Method and device for discriminating synthesized voice and voice synthesizer
KR101578108B1 (en) Scream detecting device for surveillance systems based on audio data and, the method thereof
US20190371146A1 (en) Burglary deterrent solution
JP5136074B2 (en) Earthquake early warning assistance device
JP2005184592A (en) Intercom system
US20230125575A1 (en) Fire panel audio interface
JP2014011609A (en) Information transmission system, transmitter, receiver, information transmission method, and program
DE102017011315B3 (en) Alarm-enabled microphone

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20120118

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR

DAX Request for extension of the european patent (deleted)
RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: NEDERLANDSE ORGANISATIE VOOR TOEGEPAST-NATUURWETENSCHAPPELIJK ONDERZOEK TNO

17Q First examination report despatched

Effective date: 20151110

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20160521