CN101501564A - Video surveillance system and method with combined video and audio recognition - Google Patents

Info

Publication number
CN101501564A
Authority
CN
China
Prior art keywords
video
audio
particular event
signal
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2006800555140A
Other languages
Chinese (zh)
Other versions
CN101501564B (en)
Inventor
M·G·基恩兹勒
V·舍伊宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of CN101501564A
Application granted
Publication of CN101501564B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B13/00 Burglar, theft or intruder alarms
    • G08B13/16 Actuation by interference with mechanical vibrations in air or other fluid
    • G08B13/1654 Actuation by interference with mechanical vibrations in air or other fluid using passive vibration detection systems
    • G08B13/1672 Actuation by interference with mechanical vibrations in air or other fluid using passive vibration detection systems using sonic detecting means, e.g. a microphone operating in the audio frequency range
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B13/00 Burglar, theft or intruder alarms
    • G08B13/18 Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength
    • G08B13/189 Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems
    • G08B13/194 Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
    • G08B13/196 Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B19/00 Alarms responsive to two or more different undesired or abnormal conditions, e.g. burglary and fire, abnormal temperature and abnormal rate of flow
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B31/00 Predictive alarm systems characterised by extrapolation or other computation using updated historic data

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Emergency Management (AREA)
  • Signal Processing (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Burglar Alarm Systems (AREA)
  • Alarm Systems (AREA)

Abstract

A novel video surveillance system comprises a video and audio compression engine, a storage device, and a video and audio recognition engine. The video recognition engine performs tasks such as face recognition and motion detection, whereas the audio recognition engine detects voice and other sound signatures indicating a potential alarm situation, e.g., panic voices such as screaming and yelling, or sounds such as gunshots and explosions. Combined recognition of audio and video signals gives the surveillance system a higher rate of true alarms and a lower rate of false alarms. Additionally, the audio recognition engine provides information for steering video cameras toward the direction of interest, allowing better capture of the scene of interest.

Description

Video surveillance system and method with combined video and audio recognition
Technical field
The present invention relates generally to surveillance systems and methods for providing security and, more particularly, to a novel online (real-time) video and audio recognition system for surveillance and to processing for surveillance systems.
Background art
Conventional video surveillance systems generally include no function or provision for monitoring audio; that is, the surveillance system has no audio input. At most, a typical video surveillance system, such as those described in U.S. Patent Nos. 6,724,421 and 6,175,382, records audio information alongside the visual information. In the two video surveillance systems described in these references, the video data is analyzed by an intelligent surveillance engine and compressed for digital storage. These engines implement various recognition algorithms, such as face recognition, motion detection, panic detection, stabbing-motion detection, and so on. For example, when monitoring the entrance of a high-rise building, one alarm condition involves a sudden, rapid movement of one person toward another, which suggests a possible robbery, assault, or similar act. In this case, the intelligent surveillance engine will recognize the sudden rapid movement (with a success rate below 100%) and raise an alarm at the monitoring station. As a result of the alarm, police may be dispatched to the monitored location. Clearly, a sudden rapid movement may also be produced by a child running to a parent or friend, in which case the alarm is a false alarm that wastes a police dispatch. Another consequence of misdetection by the intelligent surveillance engine is that no alarm is raised in a real emergency. This can happen, for example, when more than one person is present at the scene. Failing to dispatch police when a real emergency occurs is a further shortcoming of current surveillance systems.
Fig. 1 depicts a prior-art video-only surveillance system. A video camera array 10 feeds video information to a video compression engine 12 over a video link 11. The video information is compressed and sent over a link 16 to a storage device 14 for long-term storage. The video information is also fed over the same video link 11 to a video recognition engine 13. The video recognition engine 13 performs video recognition tasks, such as face recognition, motion detection, and so on, and generates events and alarms that are sent over a link 17 to an event database 15 and a monitoring station 18. The monitoring station 18 may include a manned monitoring station at which an operator visually monitors a given number of cameras in real time. When the operator believes an emergency is occurring, it is the operator's decision whether to dispatch police or another emergency response team to the monitored area. As the above description makes clear, no audio information is used, even though such information is usually available in the monitored area.
Fig. 2 shows an existing video surveillance system with audio recording. A video camera array 20 feeds video information to a video and audio compression engine 22 over a video link 21. At the same time, audio information is fed from a microphone array 29 to the video and audio compression engine 22 over an audio link 30. The video and audio information is compressed and sent over a link 26 to a storage device 24 for long-term storage. As before, the video information is fed over the same video link 21 to a video recognition engine 23. The video recognition engine 23 performs video recognition tasks, such as face recognition, motion detection, and so on, and generates events and alarms that are sent over a link 27 to a database 25 and a monitoring station 28. The monitoring station 28 is a manned monitoring station at which an operator visually monitors a given number of cameras. When the operator believes an emergency is occurring, it is the operator's decision whether to dispatch police or another emergency response team to the monitored area. As the above description makes clear, no useful information is extracted from the audio input, even though such information is usually present in the audio signals obtained from the monitored area.
As noted above, the second type of surveillance system records video and audio information simultaneously and implements an intelligent surveillance engine for various video recognition tasks. At present, the audio information in these systems is compressed and recorded but not analyzed.
When analyzing the video input, current surveillance systems make no use of the highly valuable audio information. Clearly, this audio information is useful and could be applied in many surveillance scenarios.
It is therefore highly desirable to bring the use of audio information into video surveillance systems. Using audio information is expected to reduce the number of false alarms that the surveillance system generates and to raise the percentage of true alarms detected, while giving the person who evaluates the alarm more information. Moreover, some events that cannot be detected from video information alone can be detected using audio and video information together.
Summary of the invention
Accordingly, it is an object of the present invention to provide a video surveillance system and method that use video information in combination with audio information obtained from the monitored area.
The surveillance system of the present invention includes both a video signal input and an audio signal input. The video signal originates from digital or analog video cameras, and the audio input is received from microphones installed in the monitored area. The video and audio information is compressed and sent to a digital storage device. Compressing the audio and video information is preferred in order to reduce the amount of digital storage required for all the cameras and microphones deployed. Simultaneously with the recording, the video and audio inputs are fed to an intelligent recognition engine, which performs video recognition and audio recognition and immediately correlates the results of the video and audio recognition in order to detect or recognize a set of particular events indicating a panic situation, such as high-pitched screams, explosions, gunshots, and the like. Alarms generated by the intelligent recognition engine can be sent to a monitoring station, where an operator decides whether to dispatch police or emergency personnel to the monitored area.
According to one aspect of the present invention, the intelligent recognition engine executes available video recognition algorithms, such as face recognition, motion detection, and so on, as well as audio/speech recognition algorithms for recognizing specific vocabulary ("help", "robbery", etc.). The audio recognition engine can be trained to recognize particular sound signals, such as gunshots and explosions, as well as high pitch and other voice signatures that indicate an alarm or emergency situation.
By using a microphone array arranged along particular orientations, the direction of a sound can be determined. The directional audio information can then be passed to a camera control unit so that one or more cameras are pointed in the direction of interest. Further video/audio recognition can then be performed more effectively. Thus, for example, using the microphone array in the monitored area, the audio recognition engine can detect the sound of an explosion. As a result, the cameras will be pointed in the direction of the explosion, and subsequent actions, namely scene recognition/understanding and alerting the monitoring station, will be carried out in the video engine. Immediate results from video and audio recognition are used to guide further evaluation of the recorded audio and video and to guide improved recording of new video and audio input, which advantageously improves detection accuracy, shortens the time needed to determine the nature of an alarm, and gives more information to the operator assessing the situation.
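The description does not say how the direction of a sound is computed from the microphone array. One common approach, shown here only as a minimal sketch, is to estimate the time difference of arrival between two microphones by cross-correlation and convert it to a far-field bearing. The function names, the two-microphone geometry, and the speed-of-sound constant are illustrative assumptions and do not come from the patent.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees Celsius


def best_lag(ref, other, max_lag):
    """Return the lag in samples by which `other` trails `ref`.

    A brute-force cross-correlation: a positive result means the sound
    reached the `ref` microphone first.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def bearing_deg(lag_samples, sample_rate_hz, mic_spacing_m):
    """Far-field bearing in degrees, measured from the array broadside."""
    delay_s = lag_samples / sample_rate_hz
    s = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # clamp against noise and rounding
    return math.degrees(math.asin(s))
```

A two-microphone pair only resolves the angle relative to its own axis; a real array would presumably combine several pairs to remove the front/back ambiguity.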
The outputs of the video recognition engine and the audio recognition engine are analyzed by a mutual recognition engine, and the resulting final alarm is transmitted to the monitoring station.
To achieve these and other objects, according to a preferred aspect of the present invention, there are provided a surveillance system, a method, and a computer program product, wherein the system comprises:
means for generating a real-time video signal, the real-time video signal including video information obtained in a monitored area;
means for obtaining a real-time audio signal, the real-time audio signal including audio information from the monitored area;
means for simultaneously receiving the video signal and the audio signal, determining therefrom relevant video and audio recognition information, and correlating the real-time audio and video information with each other to determine the likelihood that a particular event is occurring; and
means for generating an alarm condition according to the occurrence of the particular event.
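The correlation of video and audio recognition results listed above can be pictured as pairing detections that occur close together in time. The following is a minimal sketch under stated assumptions: the Detection record, the two-second window, the product of confidences, and the 0.5 threshold are all hypothetical choices made for illustration, not details given in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    source: str        # "video" or "audio"
    label: str         # e.g. "sudden_motion" or "scream" (illustrative labels)
    time_s: float      # time of the detection in seconds
    confidence: float  # recognizer confidence in [0, 1]


def correlate(video, audio, window_s=2.0):
    """Pair every video detection with audio detections within window_s."""
    pairs = []
    for v in video:
        for a in audio:
            if abs(v.time_s - a.time_s) <= window_s:
                pairs.append((v, a))
    return pairs


def should_alarm(pairs, threshold=0.5):
    """Raise an alarm when any time-aligned pair is jointly confident enough."""
    return any(v.confidence * a.confidence >= threshold for v, a in pairs)
```

For example, a sudden motion at 10.0 s paired with a scream at 10.5 s would be correlated, whereas a scream at 30.0 s would not be matched to that motion.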
Brief description of the drawings
Further features, aspects, and advantages of the structures and methods of the present invention will be better understood from the following description, the appended claims, and the accompanying drawings, in which:
Fig. 1 illustrates a video-only surveillance system according to the prior art;
Fig. 2 illustrates a video surveillance system with audio recording capability according to the prior art;
Fig. 3 illustrates a video surveillance system with video and audio recognition according to the present invention; and
Fig. 4 illustrates details of the intelligent recognition engine according to the present invention.
Detailed description
Fig. 3 illustrates a video surveillance system with video and audio recognition according to the present invention. As shown in Fig. 3, a video camera array 40, comprising one or more color or monochrome still or video electronic cameras, for example CCD or CMOS cameras, or an equivalent combination of components for capturing the monitored area, feeds video signals to a digital video and audio compression engine 42 over a video communication link 41. The movement and operation of each camera device of the video camera array 40 can be controlled by received control signals, for example under computer and/or software control. In addition, the operating parameters of each camera in the video camera array 40, including its pan/tilt mirror, lens system, focus motor, pan motor, and tilt motor controls, are controlled by the received control signals, as described in more detail below. Before the digital video signal is output, a number of signal processing techniques may be applied, for example to reduce noise or to provide filtering/image enhancement.
At the same time, a microphone array 49, comprising microphone sensor devices (omnidirectional and/or highly directional microphones) capable of converting sound pressure into electrical signals, feeds audio information to the digital video and audio compression engine 42 over an audio communication link 50. As those skilled in the art know, the degree of directivity of a microphone array varies with sound frequency, so the number of microphones and the spacing between them can be chosen, with the required frequency range in mind, to provide any given degree of directivity. For example, the microphones in the array can be controlled under software control to achieve these goals, and the microphones in the array include transducers configured with pickup patterns that distinctly favor reception at frequencies in the range of, for example, human speech, explosions, gunshots, and the like. This ensures that the microphone array is sensitive to the sound field and responds to sound events with higher accuracy. Other audio signal conditioning techniques may be applied, for example digitizing the acquired analog audio signals with an A/D converter and providing, for example, gain control and noise reduction/filtering. The digitized video and audio information is digitally compressed and sent over a link 46 to a storage device 44 for long-term storage, for example a database, hard disk drive, or magnetic or optical media, including but not limited to CD-ROM, DVD, tape, disk, disk array, and the like. The output of each camera of the video camera array 40 is stored on the storage medium in a compressed format, such as MPEG1, MPEG2, etc. In addition, the output of each camera of the video camera array 40 may be stored at a specific location on the storage medium associated with that camera, or may be stored together with an indication of which camera each stored output corresponds to.
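The remark above that microphone number and spacing depend on the required frequency range can be illustrated with a standard rule of thumb for uniform linear arrays, which is not stated in the patent: to avoid spatial aliasing, adjacent microphones should be no more than half a wavelength apart at the highest frequency of interest. The sketch below assumes a uniform linear array and a fixed speed of sound; the function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees Celsius


def max_spacing_m(highest_freq_hz):
    """Largest spacing that avoids spatial aliasing: half a wavelength."""
    return SPEED_OF_SOUND_M_S / (2.0 * highest_freq_hz)


def microphones_needed(aperture_m, highest_freq_hz):
    """Microphone count for a uniform line of the given total length."""
    spacing = max_spacing_m(highest_freq_hz)
    return math.ceil(aperture_m / spacing) + 1
```

A longer aperture narrows the beam at low frequencies, while the spacing limit is set by the highest frequency, which is why both the count and the distance between microphones follow from the target frequency range.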
As further shown in Fig. 3, the same video and audio information is simultaneously fed to an intelligent recognition engine 43 over the respective video link 41 and audio link 50. It should be understood that the communication links 41 and 50 between the respective video camera array and audio microphone array and the video and audio compression engine 42 and intelligent recognition engine 43 may be hard-wired or may be wireless links. It is also within the scope of the invention for these communication links to take the form of cable, satellite, RF and microwave transmission, optical fiber, and the like.
As described in more detail below and as shown in Fig. 4, the intelligent recognition engine 43 includes a video recognition engine 62, an audio recognition engine 63, and a mutual recognition engine and alarm generation module 64. The intelligent recognition engine 43 implements software that controls computing equipment to carry out the methods and processes of video recognition algorithms and face recognition algorithms. These algorithms may be executed together with motion detection algorithms (for example, patch correlation or tracking algorithms that track individual points and estimate the motion of known features in an image stream). In addition, the intelligent recognition engine 43 implements software that controls computing equipment to carry out the methods and processes of audio recognition and speech recognition algorithms. Speech recognition algorithms, embodied as computer-readable instructions, data structures, program modules, and the like, can be used to recognize specific spoken words ("help", "robbery", etc.) that potentially indicate an emergency or a condition warranting an alarm.
The audio recognition engine 63, comprising computer-readable instructions, data structures, program modules, or other data, can be trained to recognize particular audio signals such as gunshots and explosions, as well as higher-pitched sounds, for example screams and cries of fear, and other sound and voice signatures known to be associated with events that may warrant an alarm. It will be appreciated, however, that various recognition algorithms that require no prior training may also be employed in accordance with the invention.
The computing equipment used includes general-purpose computing devices such as PCs, laptops, and mobile devices, with components including, but not limited to, a processing unit, a system memory, and a system bus that couples the various system components, including the system memory, to the processing unit. The computing equipment uses these components to execute the intelligent recognition engine and the audio recognition engine stored on known computer-readable media, which include any available media that can be accessed by the computing equipment, including removable, non-removable, volatile, and non-volatile media. The computer-readable recording medium may, for example, be located in one place or distributed across computer systems connected by a network, and the computer-readable recognition programs may be stored on the computer-readable recording medium and executed in a distributed manner.
Returning to Fig. 3, by using the specifically oriented microphone array 49, the direction of a sound can be determined. Directional information about a sensed audio event is passed to a camera/microphone control module 52 over a wired or wireless communication link 53. The camera/microphone control module 52 includes all of the software needed to control, by means of control signals 54, the positions of one or more cameras of the array 40 and the motor positions of the microphone array 49. For example, control signals can be input to the video camera array 40 to adjust or control the camera pan/tilt mirror, lens system, focus motor, pan motor, and tilt motor components and subsystems. These control signals are also used to automatically aim the camera's field of view so as to obtain more information about the actual alarm or alert event, or a better-centered, more magnified, better-focused, or sharper image. In one non-limiting example, in response to audio recognition of a gunshot sound signal by the intelligent recognition engine, control signals can be generated that aim one or more cameras of the video camera array at the scene so as to "look" in the direction of the gunshot. If the video camera array is aimed at the crime scene on the basis of audio recognition of the gunshot, recognition of the "crime scene" can improve, because more information about the gunshot can be obtained. Alternatively or additionally, these control signals can be generated to automatically adjust the orientation of the microphones and the spacing between them so as to better receive the accompanying audio information. Furthermore, the microphone orientation can be adjusted with a view to detecting audio signals in a required frequency range or to providing any given degree of directivity. Thus, for example, in response to a video recognition event, one or more microphones can be redirected so as to "listen" in a particular direction.
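The camera aiming step described above can be sketched as a small geometric computation. The patent does not specify the form of the control signals 54, so the following assumes, purely for illustration, that the estimated position of the sound source and the camera position are known in a shared coordinate frame and that the control signal is a pan/tilt angle pair. The function names and the mechanical limits are hypothetical.

```python
import math


def pan_tilt_deg(camera_xyz, target_xyz):
    """Pan and tilt angles (degrees) that point a camera at a target.

    Pan is measured in the horizontal plane from the +x axis; tilt is
    measured upward from the horizontal plane.
    """
    dx = target_xyz[0] - camera_xyz[0]
    dy = target_xyz[1] - camera_xyz[1]
    dz = target_xyz[2] - camera_xyz[2]
    pan = math.degrees(math.atan2(dy, dx))
    tilt = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return pan, tilt


def clamp_command(pan, tilt, pan_limits=(-170.0, 170.0), tilt_limits=(-30.0, 90.0)):
    """Keep the command inside assumed mechanical limits of the mount."""
    pan = max(pan_limits[0], min(pan_limits[1], pan))
    tilt = max(tilt_limits[0], min(tilt_limits[1], tilt))
    return pan, tilt
```

The same calculation could be applied to a steerable microphone when a video recognition event supplies the target position instead.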
More specifically, as shown in Fig. 4, the outputs of the video recognition engine 62 and the audio recognition engine 63 are analyzed by the mutual recognition engine 64, which processes the video and audio recognition information received simultaneously and makes the final determination of whether an alarm condition exists. In this way, an alarm can be generated and transmitted to a manned monitoring station 48 over a communication link 47. That is, the recognition process employed in the mutual recognition engine 64, in the form of computer-readable instructions, data structures, program modules, and the like, is generally based on pattern matching and/or hypothesis evaluation. In the evaluation phase, estimates of the probabilities of various events are determined. This can be done by determining, from the real-time video recognition information and the audio signal, the degree of correlation between each recognized video scene and the accompanying recognized speech or audio signature. In one example recognition event, to recognize a stabbing, the video information is used to try to assess the probability of each video scene. If such a scene is known to be accompanied by higher-pitched sounds (screams, etc.), then detecting high pitch in the audio input increases the probability that the scene captured in the video signal is the result of a stabbing. An operator visually monitors the particular areas covered by the video camera array 40, and when the alarm generation unit issues an alarm indication, it is at the operator's discretion whether to dispatch police or emergency personnel to the monitored area. As the above description makes clear, useful information is extracted from the audio input and, combined with video recognition events, it improves the overall operation of the surveillance system.
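The way a detected scream raises the probability of a stabbing hypothesis can be illustrated with Bayes' rule. The patent speaks only of hypothesis evaluation and probability estimates, so this is one possible reading rather than the disclosed method; the function name and all numeric values are assumptions chosen for illustration.

```python
def posterior(prior, p_audio_given_event, p_audio_given_no_event):
    """Update the video-based probability of an event with audio evidence.

    prior: probability of the event estimated from video alone.
    p_audio_given_event: chance of hearing the signature if the event is real.
    p_audio_given_no_event: chance of hearing it otherwise (false positives).
    """
    numerator = p_audio_given_event * prior
    denominator = numerator + p_audio_given_no_event * (1.0 - prior)
    return numerator / denominator if denominator > 0 else prior


# Illustrative numbers only: video alone suggests a 20% chance of a stabbing,
# and a scream is assumed far more likely during one than otherwise.
p_after_scream = posterior(0.2, 0.9, 0.1)
```

With these assumed numbers the estimate rises from 0.2 to roughly 0.69, while audio evidence that is equally likely in both cases leaves the video-based estimate unchanged.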
As also shown in Fig. 4, the communication link 60 between the video recognition engine 62 and the mutual recognition engine 64 is bidirectional, as is the communication link 61 between the audio recognition engine 63 and the mutual recognition engine 64. The bidirectionality of links 60 and 61 allows the video and audio recognition algorithms to influence each other in the manner described above, so that video and audio are recognized better and particular events that could not be detected until now may become detectable.
While the invention has been particularly shown and described with respect to illustrative and preferred embodiments thereof, those skilled in the art will understand that the foregoing and other changes in form and detail may be made without departing from the spirit and scope of the invention, which should be limited only by the scope of the appended claims.

Claims (27)

1, a kind of surveillance of utilizing video and audio identification comprises:
Generate the device of real time video signals, described real time video signals is included in the video information that is subjected to acquisition in the surveillance zone;
Obtain the device of real-time audio signal, described real-time audio signal comprises the audio-frequency information that is subjected to surveillance zone from described;
Receive described vision signal and sound signal simultaneously, therefrom determine relevant video and audio identification information, and real-time audio and video information are relative to each other with the device of the possibility occurrence of determining particular event; With
According to the generation of described particular event, produce the device of alarm condition.
2, according to the described system of claim 1, wherein said treating apparatus comprises first recognition engine and is used to handle described vision signal, to be used for determining described video identification information.
3, according to the described system of claim 2, wherein said treating apparatus comprises second recognition engine and is used to handle described sound signal, to be used for determining described audio identification information.
4, according to the described system of claim 1, wherein said treating apparatus comprises mutual recognition device so that described Voice ﹠ Video identifying information is relevant and the ability of the generation of raising detection particular event.
5, according to the described system of claim 4, the device of wherein said generation real time video signals comprises one or more camera systems, described mutual recognition device also comprises in response to the generation that identifies this particular event according to the described audio identification of particular event, generates control signal is caught vision signal in the direction of this particular event with the one or more video cameras in the guiding camera system device.
6, according to the described system of claim 5, wherein each described video camera apparatus comprises the described control signal of response one or more with in one or more panorama/tilt mirrors, lens combination, focusing motor, panorama motor and the pitching motor assembly in the panorama of adjusting video camera apparatus, pitching, convergent-divergent, rotation, passing, the translation controlled variable.
7, according to the described system of claim 4, the device of wherein said generation real-time audio signal comprises one or more microphone apparatus, described mutual recognition device comprises that also response identifies the generation of possibility incident according to the described video identification of particular event, generate control signal with the one or more microphones in the guiding microphone apparatus, thereby can catch the device of audio identification information in the direction of this particular event.
8, according to the described system of claim 7, wherein each described microphone apparatus responds described control signal, considers that the sound signal that detects required frequency range adjusts the orientation of microphone automatically.
9, according to the described system of claim 7, wherein each described microphone apparatus responds described control signal, considers the orientation of adjusting microphone with the directivity received audio signal of any given degree automatically.
10,, also comprise the device of preserving described Voice ﹠ Video data according to the described system of claim 1.
11, according to the described system of claim 10, also be included in be kept at described Voice ﹠ Video data in the described memory storage before, compress the device of described Voice ﹠ Video data.
12. A monitoring method using video and audio recognition, comprising the steps of:
receiving a real-time video signal and a real-time audio signal simultaneously at a processing device, the real-time video signal comprising video information obtained in a monitored area and the real-time audio signal comprising audio information from the monitored area;
determining relevant video recognition information and audio recognition information from the received video and audio signals;
correlating the real-time audio and video recognition information with each other to determine the likelihood that a particular event has occurred; and
generating an alarm condition according to the occurrence of the particular event.
13. The monitoring method according to claim 12, wherein the processing device comprises a first recognition engine that performs the processing step of determining the video recognition information from the video signal.
14. The monitoring method according to claim 13, wherein the processing device comprises a second recognition engine that performs the processing step of determining the audio recognition information from the audio signal.
15. The monitoring method according to claim 12, wherein the processing device comprises a mutual recognition device for correlating the audio and video recognition information and improving the ability to detect the occurrence of the particular event.
16. The monitoring method according to claim 15, further comprising, concurrently with the receiving step, the step of obtaining the real-time video signal by one or more video camera devices, wherein the mutual recognition device further comprises means for generating a control signal in response to identifying a likely occurrence of the particular event from the audio recognition of the particular event, the control signal being adapted to direct one or more video cameras in the camera system to capture video signals in the direction of the particular event.
17. The monitoring method according to claim 16, wherein each of the one or more video camera devices comprises one or more of a pan/tilt mirror, a lens system, a focus motor, a pan motor, and a tilt motor assembly that respond to the control signal to adjust one or more of the pan, tilt, zoom, rotation, dolly, and translation control parameters of the camera system.
18. The monitoring method according to claim 15, further comprising, concurrently with the receiving step, the step of obtaining the real-time audio signal by one or more microphone devices, wherein the mutual recognition device further comprises means for generating a control signal in response to identifying a likely occurrence of the event from the video recognition of the particular event, the control signal being adapted to direct one or more of the microphones to capture audio signals in the direction of the particular event.
19. The monitoring method according to claim 18, wherein each microphone device, in response to the control signal, automatically adjusts the orientation of the microphone so as to detect audio signals in a desired frequency range.
20. The monitoring method according to claim 18, wherein each microphone device, in response to the control signal, automatically adjusts the orientation of the microphone so as to receive audio signals with any given degree of directivity.
21. The monitoring method according to claim 12, further comprising the step of storing the audio and video data in a data storage device.
22. The monitoring method according to claim 21, further comprising the step of compressing the audio and video data before the audio and video data are stored in the data storage device.
23. A machine-readable program storage device tangibly embodying a program of instructions executable by the machine to perform method steps for area monitoring using video and audio recognition, the method steps comprising:
receiving a real-time video signal and a real-time audio signal simultaneously at a processing device, the real-time video signal comprising video information obtained in a monitored area and the real-time audio signal comprising audio information from the monitored area;
determining relevant video recognition information and audio recognition information from the received video and audio signals;
correlating the real-time audio and video recognition information with each other to determine the likelihood that a particular event has occurred; and
generating an alarm condition according to the occurrence of the particular event.
24. The machine-readable program storage device according to claim 23, wherein the processing device comprises a first recognition engine that performs the processing step of determining the video recognition information from the video signal, and a second recognition engine that performs the processing step of determining the audio recognition information from the audio signal.
25. The machine-readable program storage device according to claim 24, wherein the processing device comprises a mutual recognition device that correlates the audio and video recognition information and improves the ability to detect the occurrence of the particular event.
26. The machine-readable program storage device according to claim 25, wherein the method steps further comprise, concurrently with the receiving step, the step of obtaining the real-time video signal by one or more video camera devices, and the mutual recognition device further comprises means for generating a control signal in response to identifying a likely occurrence of the particular event from the audio recognition of the particular event, the control signal being adapted to direct one or more video cameras in the camera system to capture video signals in the direction of the particular event.
27. The machine-readable program storage device according to claim 25, wherein the method steps further comprise, concurrently with the receiving step, the step of obtaining the real-time audio signal by one or more microphone devices, and the mutual recognition device further comprises means for generating a control signal in response to identifying a likely occurrence of the particular event from the video recognition of the particular event, the control signal being adapted to direct one or more of the microphones to capture audio signals in the direction of the particular event.
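
As an illustration only, the correlation and alarm steps recited in the method claims, together with the control signals that steer a camera toward an audio-detected event or a microphone toward a video-detected event, might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `Recognition` record, the noisy-OR combination rule, the two threshold values, and every function name are hypothetical and do not come from the claims.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recognition:
    """Output of one recognition engine (hypothetical structure)."""
    event: str          # label of the particular event, e.g. "glass_break"
    score: float        # recognizer confidence in [0, 1]
    bearing_deg: float  # estimated direction of the event


def same_event(a: Optional[Recognition], b: Optional[Recognition]) -> bool:
    return a is not None and b is not None and a.event == b.event


def correlate(video: Optional[Recognition], audio: Optional[Recognition]) -> float:
    """Combine both recognition results into one likelihood.

    Assumption: a noisy-OR rule when both engines report the same event;
    otherwise the stronger single-engine score is used.
    """
    if same_event(video, audio):
        return 1.0 - (1.0 - video.score) * (1.0 - audio.score)
    return max((r.score for r in (video, audio) if r is not None), default=0.0)


def monitor_step(video, audio, alarm_threshold=0.8, steer_threshold=0.5):
    """One pass of correlate -> alarm -> control-signal generation."""
    likelihood = correlate(video, audio)
    actions = {"likelihood": likelihood, "alarm": likelihood >= alarm_threshold}
    confirmed = same_event(video, audio)
    # Audio suggests an event the camera has not confirmed: steer the camera.
    if audio is not None and audio.score >= steer_threshold and not confirmed:
        actions["steer_camera_to_deg"] = audio.bearing_deg
    # Video suggests an event the microphone has not confirmed: steer the microphone.
    if video is not None and video.score >= steer_threshold and not confirmed:
        actions["steer_microphone_to_deg"] = video.bearing_deg
    return actions
```

With these assumed thresholds, two moderate detections of the same event (scores 0.6 and 0.7) combine to a likelihood of about 0.88 and raise the alarm, whereas an audio-only detection with score 0.6 raises no alarm but yields a camera steering signal toward the sound.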
CN2006800555140A 2006-08-03 2006-08-03 video surveillance system and method with combined video and audio recognition Active CN101501564B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2006/030560 WO2008016360A1 (en) 2006-08-03 2006-08-03 Video surveillance system and method with combined video and audio recognition

Publications (2)

Publication Number Publication Date
CN101501564A (en) 2009-08-05
CN101501564B (en) 2012-02-08

Family

ID=38997456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2006800555140A Active CN101501564B (en) 2006-08-03 2006-08-03 video surveillance system and method with combined video and audio recognition

Country Status (6)

Country Link
JP (1) JP5043940B2 (en)
CN (1) CN101501564B (en)
BR (1) BRPI0621897B1 (en)
CA (1) CA2656268A1 (en)
MX (1) MX2009001254A (en)
WO (1) WO2008016360A1 (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102082948B (en) * 2009-11-30 2012-07-25 中国移动通信集团北京有限公司 System, method and equipment for acquiring video information
CN103067655A (en) * 2011-10-24 2013-04-24 鸿富锦精密工业(深圳)有限公司 System and method of controlling video camera device
CN103136899A (en) * 2013-01-23 2013-06-05 宁凯 Intelligent alarming monitoring method based on Kinect somatosensory equipment
CN103747217A (en) * 2014-01-26 2014-04-23 国家电网公司 Video monitoring method and device
CN104269016A (en) * 2014-09-22 2015-01-07 北京奇艺世纪科技有限公司 Alarm method and device
CN104333686A (en) * 2014-11-27 2015-02-04 天津天地伟业数码科技有限公司 Intelligent monitoring camera based on face and voiceprint recognition and control method of intelligent monitoring camera
CN105338294A (en) * 2014-08-07 2016-02-17 富士通株式会社 Monitoring device and method
CN106028217A (en) * 2016-06-20 2016-10-12 咻羞科技(深圳)有限公司 Intelligent device interacting system and method based on audio identification technology
CN106600876A (en) * 2017-01-24 2017-04-26 赵亮 Automatic machine room duty alarming system and alarming method
CN107031624A (en) * 2015-10-22 2017-08-11 福特全球技术公司 Detection of lane-splitting motorcycles
CN109089087A (en) * 2018-10-18 2018-12-25 广州市盛光微电子有限公司 The audio-visual linkage of multichannel
CN109543538A (en) * 2018-10-23 2019-03-29 深圳壹账通智能科技有限公司 Obtain method, apparatus, computer equipment and the storage medium of the track of alert object
CN110336976A (en) * 2019-06-13 2019-10-15 长江大学 A kind of intelligent monitoring probe and system
TWI687753B (en) * 2018-12-06 2020-03-11 宏碁股份有限公司 Panoramic camera and panoramic photography system
CN111091073A (en) * 2019-11-29 2020-05-01 清华大学 Abnormal event monitoring equipment and method combining video and audio
CN111460907A (en) * 2020-03-05 2020-07-28 浙江大华技术股份有限公司 Malicious behavior identification method, system and storage medium
CN112396801A (en) * 2020-11-16 2021-02-23 苏州思必驰信息科技有限公司 Monitoring alarm method, monitoring alarm device and storage medium
CN112425157A (en) * 2018-07-24 2021-02-26 索尼公司 Information processing apparatus and method, and program
CN113920660A (en) * 2021-09-30 2022-01-11 中国工商银行股份有限公司 Safety monitoring method and system suitable for safety storage equipment
WO2022016573A1 (en) * 2020-07-21 2022-01-27 南京智金科技创新服务中心 Video monitoring analysis system and method

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9286911B2 (en) 2008-12-15 2016-03-15 Audio Analytic Ltd Sound identification systems
GB2466242B (en) * 2008-12-15 2013-01-02 Audio Analytic Ltd Sound identification systems
JP5958833B2 (en) 2013-06-24 2016-08-02 パナソニックIpマネジメント株式会社 Directional control system
EP2927885A1 (en) * 2014-03-31 2015-10-07 Panasonic Corporation Sound processing apparatus, sound processing system and sound processing method
US10182280B2 (en) 2014-04-23 2019-01-15 Panasonic Intellectual Property Management Co., Ltd. Sound processing apparatus, sound processing system and sound processing method
EP2938097B1 (en) * 2014-04-24 2017-12-27 Panasonic Corporation Sound processing apparatus, sound processing system and sound processing method
US9813484B2 (en) 2014-12-31 2017-11-07 Motorola Solutions, Inc. Method and apparatus analysis of event-related media
US20160241818A1 (en) * 2015-02-18 2016-08-18 Honeywell International Inc. Automatic alerts for video surveillance systems
JP6682222B2 (en) * 2015-09-24 2020-04-15 キヤノン株式会社 Detecting device, control method thereof, and computer program
CN105491336B (en) * 2015-12-08 2018-07-06 成都芯软科技发展有限公司 A kind of low power image identification module
CN106023515A (en) * 2016-07-06 2016-10-12 中警科技(江苏)开发有限公司 Remote automatic alarm police kiosk
WO2018075068A1 (en) 2016-10-21 2018-04-26 Empire Technology Development Llc Selecting media from mass social monitoring devices
US10810854B1 (en) 2017-12-13 2020-10-20 Alarm.Com Incorporated Enhanced audiovisual analytics
CN109033997A (en) * 2018-07-02 2018-12-18 厦门快商通信息技术有限公司 A kind of lumbering event detecting method and system
EP3839909A1 (en) * 2019-12-18 2021-06-23 Koninklijke Philips N.V. Detecting the presence of an object in a monitored environment
DE102020209025A1 (en) 2020-07-20 2022-01-20 Robert Bosch Gesellschaft mit beschränkter Haftung Method for determining a conspicuous partial sequence of a surveillance image sequence
GB202019713D0 (en) * 2020-12-14 2021-01-27 Vaion Ltd Security system
CN112929372A (en) * 2021-02-06 2021-06-08 北京第七九七音响股份有限公司 Network intelligent audio terminal, monitoring method and monitoring system
GB2620594B (en) * 2022-07-12 2024-09-25 Ava Video Security Ltd Computer-implemented method, security system, video-surveillance camera, and server

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3381343B2 (en) * 1993-12-03 2003-02-24 株式会社日立製作所 Monitoring device
JPH0983856A (en) * 1995-09-07 1997-03-28 Nippon Telegr & Teleph Corp <Ntt> Intelligent camera equipment
US6175382B1 (en) * 1997-11-24 2001-01-16 Shell Oil Company Unmanned fueling facility
US6611206B2 (en) * 2001-03-15 2003-08-26 Koninklijke Philips Electronics N.V. Automatic system for monitoring independent person requiring occasional assistance
CN1186923C (en) * 2003-04-03 2005-01-26 上海交通大学 Abnormal object automatic finding and tracking video camera system
JP4175180B2 (en) * 2003-05-29 2008-11-05 松下電工株式会社 Monitoring and reporting system
CN1716329A (en) * 2004-06-29 2006-01-04 乐金电子(沈阳)有限公司 Baby monitoring system and its method using baby's crying frequency
CN200966113Y (en) * 2006-11-08 2007-10-24 天津三星电子有限公司 A monitor with the audio locking functions

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102082948B (en) * 2009-11-30 2012-07-25 中国移动通信集团北京有限公司 System, method and equipment for acquiring video information
CN103067655A (en) * 2011-10-24 2013-04-24 鸿富锦精密工业(深圳)有限公司 System and method of controlling video camera device
CN103136899B (en) * 2013-01-23 2016-01-20 宁凯 Based on the intelligent alarm method for supervising of Kinect somatosensory device
CN103136899A (en) * 2013-01-23 2013-06-05 宁凯 Intelligent alarming monitoring method based on Kinect somatosensory equipment
CN103747217A (en) * 2014-01-26 2014-04-23 国家电网公司 Video monitoring method and device
CN105338294A (en) * 2014-08-07 2016-02-17 富士通株式会社 Monitoring device and method
CN104269016A (en) * 2014-09-22 2015-01-07 北京奇艺世纪科技有限公司 Alarm method and device
CN104333686A (en) * 2014-11-27 2015-02-04 天津天地伟业数码科技有限公司 Intelligent monitoring camera based on face and voiceprint recognition and control method of intelligent monitoring camera
CN104333686B (en) * 2014-11-27 2018-03-27 天地伟业技术有限公司 Intelligent monitoring camera and its control method based on face and Application on Voiceprint Recognition
CN107031624A (en) * 2015-10-22 2017-08-11 福特全球技术公司 Detection of lane-splitting motorcycles
CN107031624B (en) * 2015-10-22 2021-10-15 福特全球技术公司 Detection of lane-splitting motorcycles
CN106028217A (en) * 2016-06-20 2016-10-12 咻羞科技(深圳)有限公司 Intelligent device interacting system and method based on audio identification technology
CN106028217B (en) * 2016-06-20 2020-01-21 咻羞科技(深圳)有限公司 Intelligent equipment interaction system and method based on audio recognition technology
CN106600876A (en) * 2017-01-24 2017-04-26 赵亮 Automatic machine room duty alarming system and alarming method
CN112425157A (en) * 2018-07-24 2021-02-26 索尼公司 Information processing apparatus and method, and program
CN109089087A (en) * 2018-10-18 2018-12-25 广州市盛光微电子有限公司 The audio-visual linkage of multichannel
CN109089087B (en) * 2018-10-18 2020-09-29 广州市盛光微电子有限公司 Multi-channel audio-video linkage device
CN109543538A (en) * 2018-10-23 2019-03-29 深圳壹账通智能科技有限公司 Obtain method, apparatus, computer equipment and the storage medium of the track of alert object
TWI687753B (en) * 2018-12-06 2020-03-11 宏碁股份有限公司 Panoramic camera and panoramic photography system
CN110336976A (en) * 2019-06-13 2019-10-15 长江大学 A kind of intelligent monitoring probe and system
CN111091073A (en) * 2019-11-29 2020-05-01 清华大学 Abnormal event monitoring equipment and method combining video and audio
CN111460907A (en) * 2020-03-05 2020-07-28 浙江大华技术股份有限公司 Malicious behavior identification method, system and storage medium
WO2022016573A1 (en) * 2020-07-21 2022-01-27 南京智金科技创新服务中心 Video monitoring analysis system and method
CN112396801A (en) * 2020-11-16 2021-02-23 苏州思必驰信息科技有限公司 Monitoring alarm method, monitoring alarm device and storage medium
CN113920660A (en) * 2021-09-30 2022-01-11 中国工商银行股份有限公司 Safety monitoring method and system suitable for safety storage equipment
CN113920660B (en) * 2021-09-30 2023-04-18 中国工商银行股份有限公司 Safety monitoring method and system suitable for safety storage equipment

Also Published As

Publication number Publication date
BRPI0621897B1 (en) 2018-03-20
BRPI0621897A2 (en) 2011-03-29
CN101501564B (en) 2012-02-08
JP2009545911A (en) 2009-12-24
MX2009001254A (en) 2009-02-11
JP5043940B2 (en) 2012-10-10
WO2008016360A1 (en) 2008-02-07
CA2656268A1 (en) 2008-02-07

Similar Documents

Publication Publication Date Title
CN101501564B (en) video surveillance system and method with combined video and audio recognition
US20060227237A1 (en) Video surveillance system and method with combined video and audio recognition
CN109300471B (en) Intelligent video monitoring method, device and system for field area integrating sound collection and identification
KR101445367B1 (en) Intelligent cctv system to recognize emergency using unusual sound source detection and emergency recognition method
CN101119482B (en) Overall view monitoring method and apparatus
CN102737480B (en) Abnormal voice monitoring system and method based on intelligent video
CN111601074A (en) Security monitoring method and device, robot and storage medium
CN109616140B (en) Abnormal sound analysis system
WO1997008896A1 (en) Open area security system
KR101864388B1 (en) Peculiar sound detection system and method to be cancelled out the noise by using array microphone in CCTV camera system
KR101899436B1 (en) Safety Sensor Based on Scream Detection
CN102176746A (en) Intelligent monitoring system used for safe access of local cell region and realization method thereof
KR101444843B1 (en) System for monitoring image and thereof method
CN116129490A (en) Monitoring device and monitoring method for complex environment behavior recognition
CN201830388U (en) Video content collecting and processing device
CN113781702B (en) Cash box management method and system based on internet of things
CN117830053A (en) Perimeter security alarm system and method
CN213042656U (en) Information processing apparatus
KR101093022B1 (en) System for monitoring in passenger car
CN103945049A (en) Device used for collecting evidence and automatically giving alarm in mobile terminal and method
KR101589823B1 (en) Cctv monitoring system providing variable display environment to search event situation efficiently
CN115393798A (en) Early warning method and device, electronic equipment and storage medium
KR100902275B1 (en) Cctv system for intelligent security and method thereof
CN111627178A (en) Sound identification positioning warning system and method thereof
JP2014011609A (en) Information transmission system, transmitter, receiver, information transmission method, and program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant