CN105474665A - Sound processing apparatus, sound processing system, and sound processing method

Info

Publication number: CN105474665A
Application number: CN201480044954.0A
Authority: CN
Inventors: 岸本伦典; 渡边泰章; 高桑诚; 中村学; 首藤秀树; 田村健二; 山崎龙次; 齐藤宽夫; 秋山明宽
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2014-03-31
Filing date: 2014-03-31
Publication date: 2016-04-06
Also published as: JPWO2015151130A1; WO2015151130A1

Abstract

Provided is a sound processing apparatus that can promote the utilization of sound data and image data, thereby improving convenience. The sound processing apparatus comprises: a data acquisition unit for acquiring the sound data of sounds picked up by a sound pick-up unit including a plurality of microphones and for also acquiring the image data of images captured by an image capture unit; a designation unit for designating a plurality of directions with reference to the sound pick-up unit that correspond to designated points on an image displayed on the basis of the image data; and a directivity processing unit for emphasizing the sound components of the sound data in the plurality of directions designated by the designation unit.

Description

Sound processing apparatus, sound processing system and sound processing method

Technical field

The present invention relates to a sound processing apparatus, a sound processing system, and a sound processing method.

Background art

Conventionally, monitoring systems have been used to monitor conditions at specific indoor or remote locations such as factories, stores, and public places. A monitoring system includes, for example, a camera that captures images, a microphone that picks up sound, and a recorder device that stores predetermined data (for example, the captured images and the picked-up sound). With a monitoring system, when an incident or accident occurs, for example, past data recorded by the recorder device can be played back, and the recorded images or sound help in understanding the situation at that past time.

As a conventional monitoring system, a system that combines an omnidirectional camera with a microphone array is known. This system uses an array microphone made up of multiple microphones and forms a beam by filtering, so that only sound from a specific direction is extracted (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2004-32782

Summary of the invention

Sound data picked up with an array microphone may contain a variety of useful information. In the monitoring system of Patent Document 1, sound data and image data are not utilized sufficiently, and improved convenience for users of the monitoring system is desired.

The present invention has been made in view of the above circumstances, and provides a sound processing apparatus, a sound processing system, and a sound processing method that can promote the effective utilization of sound data and image data and improve convenience.

A sound processing apparatus according to one aspect of the present invention includes: a data acquisition unit that acquires sound data picked up by a sound pick-up unit including a plurality of microphones and image data captured by an image capture unit; a designation unit that designates a plurality of directions, referenced to the sound pick-up unit, that correspond to designated points on an image displayed on the basis of the image data; and a directivity processing unit that emphasizes the sound components of the sound data in the plurality of directions designated by the designation unit.

A sound processing system according to another aspect of the present invention includes: a sound pick-up device including a sound pick-up unit that picks up sound using a plurality of microphones; an image capture device including an image capture unit that captures images; and a sound processing apparatus that processes the sound data picked up by the sound pick-up unit. The sound processing apparatus includes: a data acquisition unit that acquires the sound data picked up by the sound pick-up unit and the image data captured by the image capture unit; a designation unit that designates a plurality of directions, referenced to the sound pick-up unit, that correspond to designated points on an image displayed on the basis of the image data; and a directivity processing unit that emphasizes the sound components of the sound data in the plurality of directions designated by the designation unit.

A sound processing method according to yet another aspect of the present invention includes the steps of: acquiring sound data picked up by a sound pick-up unit including a plurality of microphones and image data captured by an image capture unit; designating a plurality of directions, referenced to the sound pick-up unit, that correspond to designated points on an image displayed on the basis of the image data; and emphasizing the sound components of the sound data in the designated plurality of directions.

Effect of the invention

According to the present invention, the effective utilization of sound data and image data can be promoted and convenience can be improved.

Brief description of the drawings

Fig. 1 is a schematic diagram of the monitoring system of the first embodiment.

Fig. 2 is a block diagram showing a configuration example of the monitoring system of the first embodiment.

Fig. 3 is a plan view showing an example of the arrangement of the array microphone, the camera, and the sound sources in the first embodiment.

Fig. 4 is a flowchart showing an operation example of the monitoring control device of the first embodiment.

Fig. 5 is a schematic diagram of the directivity processing of the first embodiment.

Fig. 6 is a schematic diagram of the monitoring system of the second embodiment.

Fig. 7 is a block diagram showing a configuration example of the monitoring system of the second embodiment.

Fig. 8 is a flowchart showing an operation example of the monitoring control device of the second embodiment.

Detailed description of embodiments

Embodiments of the present invention are described below with reference to the drawings.

(First embodiment)

Fig. 1 is a schematic diagram of the monitoring system 100 of the first embodiment. In the monitoring system 100, an array microphone 10, a camera 20, and a monitoring control device 30 are connected via a wired or wireless network 50.

The array microphone 10 is an example of the sound pick-up unit and of the sound pick-up device. The camera 20 is an example of the image capture unit and of the image capture device. The monitoring control device 30 is an example of the sound processing apparatus. The monitoring system 100 is an example of the sound processing system.

The array microphone 10 includes a plurality of microphones 11 (11A, 11B, 11C, ...), picks up sound around the array microphone 10, and obtains sound data. The camera 20 captures a predetermined area that it is able to image, and obtains image data. The image data includes, for example, moving images or still images. The monitoring control device 30 performs various monitoring-related processing according to the sound pick-up result of the array microphone 10 and the image capture result of the camera 20.

In the monitoring system 100, one camera 20 and the 16 microphones 11 (11A, 11B, 11C, ...) of the array microphone 10 are mounted integrally in a unit housing 91 to form a sound collection unit 90. The number of microphones in the array microphone 10 may be 15 or fewer, or 17 or more. The array microphone 10 and the camera 20 may also be formed separately rather than as the sound collection unit 90.

The camera 20 is arranged at the approximate center of the unit housing 91 with, for example, the center of its imaging direction (the optical axis direction) pointing vertically downward. The plurality of microphones 11 of the array microphone 10 are arranged at regular intervals on a single circumference along the mounting surface of the unit housing 91 so as to surround the camera 20. The microphones 11 need not be arranged on a circle and may instead be arranged on, for example, a rectangle. This positional relationship and arrangement shape of the camera 20 and the microphones 11 are examples; other positional relationships and arrangement shapes may be used.

The camera 20 is configured to be able to capture subjects over a wide range (for example, in all directions) simultaneously. Each microphone 11 is configured to be able to detect sound waves arriving from a wide range (for example, from all directions).

Fig. 2 is a block diagram showing a configuration example of the monitoring system 100.

The monitoring system 100 includes the array microphone 10, the camera 20, and the monitoring control device 30. These are connected via the network 50 so that they can exchange data with one another. In addition, for example, a monitor 61, a touch panel 62, and a speaker 63 are connected to the monitoring control device 30.

The configuration of Fig. 2 assumes, for example, a case in which the monitoring control device 30 records image data and sound data while images and sound are monitored in real time. Alternatively, the camera 20 may record the image data and the array microphone 10 may record the sound data, so that the image data and sound data can be referenced after recording.

In the following, three microphones 11A, 11B, and 11C are mainly described as representatives of the plurality of microphones 11 included in the array microphone 10. The microphones other than the three microphones 11A to 11C have the same structure and function as the microphones 11A to 11C.

The array microphone 10 is formed by arranging the plurality of microphones 11A, 11B, and 11C regularly (for example, on a circumference) and close to one another. The microphones 11A to 11C are transducers that convert sound into electrical signals (sound data). In the array microphone 10, the microphones 11A, 11B, and 11C need not be arranged regularly. In that case, for example, the monitoring system 100 may hold information on the position of each of the microphones 11A to 11C and perform directivity processing using it.

Amplifiers 12A to 12C, A/D converters (ADC: Analog to Digital Converter) 13A to 13C, and sound encoders 14A to 14C are connected to the outputs of the microphones 11A to 11C. A network processing unit 15 is connected to the outputs of the sound encoders 14A to 14C.

The microphones 11A to 11C generate sound data corresponding to the vibration of sound arriving from various directions. This sound data is analog sound data. The amplifiers 12A to 12C amplify the sound data output by the microphones 11A to 11C. The A/D converters (ADCs) 13A to 13C periodically sample the sound data output by the amplifiers 12A to 12C and convert it into digital data. The sound encoders 14A to 14C encode the sound data (the time-series variation of the sound data waveform) output by the A/D converters 13A to 13C and generate sound data in a predetermined format suitable for transmission.

In this embodiment, "sound" covers not only voices produced by human speech but also general sounds and noise components, such as those produced by mechanical vibration. "Sound" may also include sounds other than the sound to be monitored. That is, signals picked up by the microphones 11A to 11C are sometimes referred to simply as "sound" without distinguishing the type of sound.

The network processing unit 15 acquires the sound data generated by the sound encoders 14A to 14C and sends it to the network 50. For example, the sound encoders 14A to 14C generate independent sound data for the sound picked up by each of the microphones 11A to 11C. The network processing unit 15 thus sends sound data of multiple channels, one per microphone 11A to 11C, to the network 50.

The camera 20 includes a lens 21, a sensor 22, an image encoder 23, and a network processing unit 24.

The lens 21 is, for example, an omnidirectional lens or a fisheye lens. The sensor 22 is an imaging device and includes, for example, a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) image sensor. The sensor 22 generates image data from the optical image of the subject that is incident on its imaging surface through the lens 21.

The image encoder 23 sequentially processes the image data output by the sensor 22 and generates image data conforming to a predetermined standard. The network processing unit 24 sends the image data generated by the image encoder 23 to the network 50.

The monitoring control device 30 is, for example, a PC (Personal Computer). It includes, for example, a CPU (Central Processing Unit) or a DSP (Digital Signal Processor), and a ROM (Read Only Memory) or a RAM (Random Access Memory).

The monitoring control device 30 realizes its various functions by, for example, having the CPU or DSP execute a control program (for example, an application program or a program in ActiveX form) recorded in the ROM or RAM. The ROM or RAM forms a memory (not shown).

The monitoring control device 30 includes a network processing unit 31, an image decoder 32, an image output unit 33, an image recognition unit 34, a sound pick-up coordinate designation unit 35, a sound decoder 36, and a directivity processing unit 37. It also includes a sound pick-up angle calculation unit 38, a detection unit 39, a sound source estimation unit 40, a sound synthesis unit 41, a sound output unit 42, and a data recording unit 43.

The network processing unit 31 exchanges data with the array microphone 10 and the camera 20 via the network 50. Through this data communication, the network processing unit 31 acquires sound data of multiple channels from the array microphone 10 and image data from the camera 20. The network processing unit 31 is an example of the data acquisition unit.

The network processing unit 31 may acquire the sound data sent by the array microphone 10 and the image data sent by the camera 20 directly from the array microphone 10 and the camera 20. It may also read, at any time, the sound data or image data (at least the sound data) recorded in the data recording unit 43. It may likewise record the sound data or image data acquired directly from the array microphone 10 and the camera 20 in the data recording unit 43 at any time.

The image decoder 32 decodes the image data from the network processing unit 31 and generates image data that can be played back.

The image output unit 33 converts the image data from the image decoder 32 into image data in a format that the monitor 61 can display, and sends it to the monitor 61. The image output unit 33 may also control the display of the monitor 61, and may send image data corresponding to detection information from the detection unit 39 to the monitor 61.

The monitor 61 displays various image data. For example, it displays an image according to the image data from the image output unit 33, so that the image captured by the camera 20 appears on the monitor 61. The monitor 61 is an example of a presentation unit.

The image recognition unit 34 may perform predetermined image processing on the image data from the image output unit 33 and recognize whether it matches, for example, any of various patterns registered in advance in the memory (not shown). For example, it performs pattern matching and extracts, from the various objects in the image, a pattern similar to a predetermined person or to the face of a predetermined person. Patterns of objects other than persons may also be extracted.

The image recognition unit 34 may also, for example, identify the type of an object contained in the image data (for example, person, man, woman). It may further have a VMD (Video Motion Detector) function and detect the presence or absence of movement in the image data.

The sound pick-up coordinate designation unit 35 accepts, for example, multiple inputs from the touch panel 62 or the image recognition unit 34 and derives multiple coordinates corresponding to the input positions or input ranges. For example, it accepts the coordinates of multiple positions in the image displayed on the screen of the monitor 61 to which the operator 60 should pay attention (for example, reference signs P1 and P2 in Fig. 1) as multiple sound pick-up coordinates (x, y). The sound pick-up coordinate designation unit 35 is an example of the designation unit, which designates a plurality of directions, referenced to the sound pick-up unit (for example, the array microphone 10), that correspond to designated points (for example, sound pick-up coordinates) on the image displayed on the basis of the image data.

The operator 60 operates the touch panel 62 while, for example, watching the monitor 61. Because the position of a pointer (not shown) displayed on the screen moves with a movement operation (for example, a drag operation) on the touch panel 62, the operator 60 can change the sound pick-up coordinates within the display range of the screen. For example, when the operator 60 touches the touch panel 62, the coordinates of the pointer are supplied to the sound pick-up coordinate designation unit 35 as sound pick-up coordinates. The operator 60 is an example of an observer who uses the monitoring system 100 for monitoring.

An input device other than the touch panel 62 may be used to designate the sound pick-up coordinates. For example, a mouse may be connected to the monitoring control device 30, and the operator 60 may use the mouse to designate the desired image range.

When the image recognition unit 34 recognizes that the image data contains a pattern registered in advance, it may supply the coordinates of the multiple positions on the monitor 61 where the recognized patterns are located (for example, reference signs P1 and P2 in Fig. 1) to the sound pick-up coordinate designation unit 35 as sound pick-up coordinates. The recognized pattern is, for example, the whole body of a person or the face of a person.

The sound decoder 36 receives and decodes the multi-channel sound data from the network processing unit 15. The sound decoder 36 may contain separate decoders, one for the sound data of each channel. In that case, the multi-channel sound data picked up by the microphones 11A to 11C of the array microphone 10 can be processed simultaneously.

The sound pick-up angle calculation unit 38 derives (for example, calculates) a sound pick-up angle θ representing the direction of directivity of the array microphone 10, based on the sound pick-up coordinates determined by the sound pick-up coordinate designation unit 35. The derived sound pick-up angle θ is input to the directivity processing unit 37 as a parameter. For example, sound pick-up coordinates and sound pick-up angles θ may correspond one to one, with a conversion table containing this correspondence stored in the memory (not shown). The sound pick-up angle calculation unit 38 may derive the sound pick-up angle θ by referring to this conversion table.
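The coordinate-to-angle conversion can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify the lens model or the table layout, so the equidistant fisheye projection, the downward-pointing optical axis through the image center, and the names `pickup_angle` and `build_table` are all hypothetical.

```python
import math

def pickup_angle(x, y, cx, cy, radius_px, max_polar_deg=90.0):
    """Map an image coordinate to (azimuth, polar) angles in degrees.

    Assumptions (not from the source): the optical axis passes through
    (cx, cy), and the lens follows an equidistant fisheye projection, so the
    distance from the image center grows linearly with the polar angle.
    """
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r > radius_px:
        raise ValueError("coordinate lies outside the image circle")
    azimuth = math.degrees(math.atan2(dy, dx)) % 360.0
    polar = max_polar_deg * r / radius_px
    return azimuth, polar

def build_table(cx, cy, radius_px, step=1):
    """Precompute a coordinate -> angle conversion table (hypothetical layout).

    cx, cy and radius_px must be integers because they drive range().
    """
    table = {}
    for y in range(cy - radius_px, cy + radius_px + 1, step):
        for x in range(cx - radius_px, cx + radius_px + 1, step):
            if math.hypot(x - cx, y - cy) <= radius_px:
                table[(x, y)] = pickup_angle(x, y, cx, cy, radius_px)
    return table
```

With a precomputed table, deriving θ for a touched point is a single dictionary lookup, which corresponds to the conversion-table variant described above; calling `pickup_angle` directly corresponds to the calculation variant.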

The directivity processing unit 37 acquires the sound pick-up angle θ from the sound pick-up angle calculation unit 38 and the sound data from the sound decoder 36. According to the sound pick-up angle θ, it combines the multi-channel sound data output from the sound decoder 36 using a predetermined algorithm and thereby forms directivity (directivity processing).

For example, the directivity processing unit 37 raises the signal level of the sound components in the direction (the direction of directivity) of the place (point of interest) where a person to be monitored is located, and lowers the signal level of the sound components in other directions. When the operator 60 designates the positions of multiple monitoring targets, there are multiple directions of directivity. The directivity processing unit 37 outputs the multiple sets of sound data after directivity processing to the detection unit 39 and the sound synthesis unit 41.
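One common way to form directivity from multi-channel data is delay-and-sum beamforming. The source only says "a predetermined algorithm", so the sketch below is an assumed stand-in, not the method of the patent. It further assumes a far-field plane wave in the horizontal plane, whole-sample delays, and a speed of sound of 343 m/s; all function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def circular_array(num_mics, radius_m):
    """Positions (x, y) of microphones spaced evenly on a circle."""
    return [(radius_m * math.cos(2 * math.pi * m / num_mics),
             radius_m * math.sin(2 * math.pi * m / num_mics))
            for m in range(num_mics)]

def arrival_shifts(mic_positions, azimuth_deg, sample_rate):
    """Arrival offset of the wavefront at each microphone, in whole samples,
    relative to the array center. Negative means the sound arrives earlier."""
    az = math.radians(azimuth_deg)
    ux, uy = math.cos(az), math.sin(az)  # unit vector pointing at the source
    return [round(-(px * ux + py * uy) / SPEED_OF_SOUND * sample_rate)
            for px, py in mic_positions]

def delay_and_sum(channels, mic_positions, azimuth_deg, sample_rate):
    """Align all channels for the given direction and average them."""
    shifts = arrival_shifts(mic_positions, azimuth_deg, sample_rate)
    length = len(channels[0])
    output = []
    for n in range(length):
        acc = 0.0
        for samples, shift in zip(channels, shifts):
            k = n + shift  # sample that corresponds to time n at the center
            if 0 <= k < length:
                acc += samples[k]
        output.append(acc / len(channels))
    return output
```

Calling `delay_and_sum` once per designated direction yields one emphasized stream per direction, which matches the case above where the operator designates several monitoring targets.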

The directivity processing unit 37 may perform directivity processing according to the position of a sound source (for example, a person to be monitored or an abnormal sound) estimated by the sound source estimation unit 40. For example, it may acquire the estimated position of the sound source from the sound source estimation unit 40 multiple times and change (for example, switch) the direction of directivity each time. In this way, the position of the sound source can be tracked and monitored even when the sound source moves. That is, in tracking the position of the sound source, directivity is steered toward the estimated position of the sound source.

The detection unit 39 acquires the multiple sets of sound data that have undergone directivity processing by the directivity processing unit 37. This sound data includes, for example, first sound data in which the sound components in the direction of a first directivity are emphasized and second sound data in which the sound components in the direction of a second directivity are emphasized. The detection unit 39 detects a monitored sound (an example of a predetermined sound) from at least one of the acquired sets of sound data. That is, the detection unit 39 functions as a sound detection unit. Emphasizing a sound component means, for example, extracting only the sound from a specific direction by filtering, using an array microphone formed of multiple microphones.

When the monitored sound is detected, the detection unit 39 performs various processing, the details of which are described later. The detection unit 39 is an example of a processing unit that performs predetermined processing when the monitored sound is detected.

The sound source estimation unit 40 acquires the sound data from the sound decoder 36 and estimates the position of the sound source that emitted the monitored sound detected by the detection unit 39. Sound sources broadly include, for example, a person who is talking, a person who utters a sound, a specific person (a man, a woman), an object (for example, an emergency vehicle), the source of an abnormal sound (for example, an alarm bell or a siren), the source of a specific ambient sound, and other sound sources. The sound source estimation unit 40 is an example of an estimation unit.

The sound source estimation unit 40 estimates the position of the sound source using, for example, a known sound source estimation technique. The estimation result is used, for example, for tracking an abnormal sound and switching directivity in the directivity processing unit 37.
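The source does not name the estimation technique. As one well-known possibility, the arrival direction can be estimated from the time difference between two microphones, found by cross-correlation. The sketch below assumes a two-microphone pair, a far-field source, and a speed of sound of 343 m/s; the function names are hypothetical, and a real system would typically combine several pairs.

```python
import math

def best_lag(a, b, max_lag):
    """Lag (in samples) by which channel b trails channel a, chosen as the
    lag that maximizes the cross-correlation of the two channels."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, value in enumerate(a):
            k = n + lag
            if 0 <= k < len(b):
                score += value * b[k]
        if score > best_score:
            best, best_score = lag, score
    return best

def arrival_angle_deg(lag, sample_rate, mic_distance_m, speed=343.0):
    """Angle from the broadside of the microphone pair, in degrees.
    The sine is clamped to [-1, 1] to absorb rounding of the lag."""
    s = speed * lag / sample_rate / mic_distance_m
    s = max(-1.0, min(1.0, s))
    return math.degrees(math.asin(s))
```

The resulting angle could then be passed to the directivity processing as the new steering direction each time a fresh estimate is available.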

The sound source estimation unit 40 may output the estimated position of the sound source to, for example, the image output unit 33 or the sound output unit 42. When the image output unit 33 or the sound output unit 42 presents the estimated position, the operator 60 can easily grasp where the sound source is.

The sound synthesis unit 41 acquires the multiple sets of sound data that have undergone directivity processing from the directivity processing unit 37 and combines them. For example, it may combine the sound data by simply adding the signal levels of the acquired sets of sound data, and output the combined sound data to the sound output unit 42. Alternatively, it may have an audio switch that selectively passes the sound data whose signal level is at or above a predetermined level, or the sound data with the highest signal level, and outputs it to the sound output unit 42.
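Both combining strategies can be sketched in a few lines. The sketch assumes equal-length sample lists and uses the root-mean-square value as the "signal level"; the source does not define the level measure, and the function names are hypothetical.

```python
import math

def mix_by_addition(streams):
    """Combine streams by adding them sample by sample."""
    return [sum(samples) for samples in zip(*streams)]

def signal_level(stream):
    """Root-mean-square level (assumed measure of signal level)."""
    if not stream:
        return 0.0
    return math.sqrt(sum(s * s for s in stream) / len(stream))

def switch_output(streams, threshold=None):
    """Audio-switch behavior for a non-empty list of streams.

    With a threshold, pass every stream whose level is at or above it.
    Without one, pass only the stream with the highest level.
    """
    levels = [signal_level(s) for s in streams]
    if threshold is None:
        return [streams[levels.index(max(levels))]]
    return [s for s, level in zip(streams, levels) if level >= threshold]
```

Simple addition keeps every designated direction audible at once, whereas the switch variant favors intelligibility by passing only the streams that are currently active.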

The sound output unit 42, for example, converts the sound data from the sound synthesis unit 41 from digital sound data into analog sound data, amplifies it, and supplies it to the speaker 63.

The speaker 63 outputs sound corresponding to the sound data from the sound output unit 42. The operator 60 can therefore hear, from the speaker 63, the sound obtained by processing the sound data picked up by the array microphone 10. The speaker 63 is an example of a presentation unit.

The data recording unit 43 may include, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive), and sequentially records the multi-channel sound data or the image data acquired by the network processing unit 31. When recording sound data and image data, the data recording unit 43 records the generation time of the sound data and the generation time of the image data in association with each other. Information on these generation times may be recorded together with the sound data or image data. The data recording unit 43 may be provided inside the monitoring control device 30, or outside it as an external recording medium.

The data recording unit 43 also records, for example, information on search tags used to search the recorded sound data or image data. The search tags recorded in the data recording unit 43 are referenced as appropriate by the other components of the monitoring control device 30.

Next, the detection unit 39 is described in detail.

The detection unit 39 detects sound data as the monitored sound when, for example, the signal level of the sound data after directivity processing is at or above a first predetermined threshold or at or below a second predetermined threshold. Information on the thresholds compared with the signal level is held, for example, in the memory (not shown). A signal level at or below the second predetermined threshold covers, for example, a case in which a machine that had been producing operating noise stops and the operating noise disappears.
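The two-threshold test can be illustrated as below. The root-mean-square value is an assumed stand-in for the "signal level", and the parameter names `upper_threshold` and `lower_threshold` are hypothetical labels for the first and second predetermined thresholds.

```python
import math

def rms_level(samples):
    """Root-mean-square level of a block of samples (assumed measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_monitored_sound(samples, upper_threshold=None, lower_threshold=None):
    """True when the level is at or above the first (upper) threshold
    or at or below the second (lower) threshold."""
    level = rms_level(samples)
    too_loud = upper_threshold is not None and level >= upper_threshold
    too_quiet = lower_threshold is not None and level <= lower_threshold
    return too_loud or too_quiet
```

The lower threshold is what lets the detector react to silence, such as a machine that stops running, and not only to loud events.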

The detection unit 39 may, for example, detect an abnormal sound contained in the sound data after directivity processing as the monitored sound. For example, patterns of abnormal sounds are stored in the memory (not shown), and the detection unit 39 detects an abnormal sound when the sound data contains one of these patterns.

The detection unit 39 may, for example, detect a predetermined keyword contained in the sound data after directivity processing as the monitored sound. For example, keyword information is stored in the memory (not shown), and the detection unit 39 detects a keyword when the sound data contains a keyword recorded in the memory. A known speech recognition technique may be used to detect keywords; in that case, the detection unit 39 has a known speech recognition function.

The monitored sound may be set in advance. For example, the detection unit 39 may set at least one of the following as the monitored sound: a sound whose signal level is at or above the first predetermined threshold or at or below the second predetermined threshold, an abnormal sound, and a keyword. This setting information is stored, for example, in the memory (not shown).

When it detects the monitored sound, the detection unit 39 sends information indicating that the monitored sound has been detected (detection information) to at least one of the image output unit 33 and the sound output unit 42. The detection information includes, for example, warning information (an alarm) indicating that an abnormal sound, a sound with a signal level at or above the first predetermined threshold or at or below the second predetermined threshold, or a predetermined keyword has been detected.

The detection unit 39 also sends predetermined information to the data recording unit 43 when it detects the monitored sound. For example, it may send search tag information to the data recording unit 43 to be held there. A search tag is a tag used to retrieve, from the data recording unit 43, the sound data containing the monitored sound or the image data corresponding to that sound data.

A search tag may, for example, be recorded in the data recording unit 43 at the same timing as the sound data or image data acquired in real time. A search tag may also be recorded in the data recording unit 43 in association with sound data or image data already recorded there.

For example, when the operator 60 inputs, via the touch panel 62, information that matches or corresponds to a search tag, the image decoder 32 or the sound decoder 36 searches the sound data or image data recorded in the data recording unit 43 and acquires the data that matches or corresponds to the search tag. Therefore, even when sound data or image data has been recorded over a long period, the search time can be shortened.

The operator 60 may also, for example, select a specific search tag via the touch panel 62 from a list in which multiple search tags are arranged in chronological order. In this case, the operator 60 may select specific search tags in order, starting from the search tag with the oldest or the newest generation time. The operator 60 may also, for example, select via the touch panel 62, as the specific search tag, a search tag generated at a time corresponding to the time kept by a clock unit (not shown). The image decoder 32 or the sound decoder 36 searches the sound data or image data recorded in the data recording unit 43 and acquires the data that matches or corresponds to the specific search tag. The list is recorded, for example, in the data recording unit 43.

A search tag includes, for example, the time at which the detection unit 39 detected the monitored sound. A search tag includes, for example, the direction (the direction of directivity) of the sound source that produced the monitored sound. A search tag includes, for example, the category of the monitored sound (abnormal sound, sound containing a keyword, or sound whose signal level is at or above a predetermined threshold or at or below a predetermined threshold). The category of the sound is determined by the detection unit 39 using, for example, a known speech recognition technique.

A search tag includes, for example, information on whether the sound source of the monitored sound moved, or on its direction of movement, as detected by the VMD function. The sound source whose movement is detected is, for example, contained in the image data captured by the camera 20 at the time, or during the time period, when the monitored sound occurred. The information detected by the VMD function is sent from the image recognition unit 34 to the detection unit 39, for example, each time movement is detected.

A search tag includes, for example, the category of the sound source of the monitored sound as identified by image recognition in the image recognition unit 34. The image data used to identify the category of the sound source is, for example, the image data captured by the camera 20 at the time, or during the time period, when the monitored sound occurred. The information on the category of the sound source is sent from the image recognition unit 34 to the detection unit 39.

A search tag includes, for example, a thumbnail image (a still image). The thumbnail image is, for example, at least part of the image data captured by the camera 20 at the time, or during the time period, when the monitored sound occurred. The thumbnail image is sent from the image recognition unit 34 to the detection unit 39.
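The fields listed above suggest a record such as the following. This is a hypothetical layout for illustration: the class name `SearchTag`, the field names, the category strings, and the helper `find_tags` are assumptions, since the source does not define a storage format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class SearchTag:
    detected_at: datetime                  # time the monitored sound was detected
    direction_deg: Optional[float] = None  # direction of directivity of the source
    sound_category: Optional[str] = None   # e.g. "abnormal", "keyword", "level"
    source_moved: Optional[bool] = None    # movement reported by the VMD function
    source_category: Optional[str] = None  # e.g. "person", from image recognition
    thumbnail: Optional[bytes] = None      # still image captured around detection

def find_tags(tags, start=None, end=None, sound_category=None):
    """Return matching tags in chronological order (oldest first)."""
    result = []
    for tag in sorted(tags, key=lambda t: t.detected_at):
        if start is not None and tag.detected_at < start:
            continue
        if end is not None and tag.detected_at > end:
            continue
        if sound_category is not None and tag.sound_category != sound_category:
            continue
        result.append(tag)
    return result
```

Because each tag carries its detection time, the recorded sound or image data for that moment can be located without scanning the whole recording, which is how the search time is shortened.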

The detection unit 39 may start recording of the sound data or image data received by the network processing unit 31 when it detects the monitored sound. For example, the network processing unit 31 temporarily accumulates sound data or image data for a predetermined period (for example, 30 seconds) and discards the accumulated data if the detection unit 39 does not detect the monitored sound. When the monitored sound is detected, the detection unit 39 instructs the network processing unit 31 so that the data recording unit 43 records the sound data or image data, including the temporarily accumulated data (pre-recording of sound and video). The data recording unit 43 records the sound data or image data from the network processing unit 31. The pre-recording may end after a predetermined time has elapsed.
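Pre-recording of this kind is often built on a fixed-size ring buffer. The sketch below is a minimal illustration using `collections.deque`; the class name `PreRecorder`, the frame-count parameters, and the `recorded` list standing in for the data recording unit 43 are hypothetical.

```python
from collections import deque

class PreRecorder:
    """Keeps the most recent frames and records them once a detection occurs."""

    def __init__(self, pre_frames, post_frames):
        self._buffer = deque(maxlen=pre_frames)  # oldest frames drop out automatically
        self._post_frames = post_frames
        self._remaining = 0      # frames still to record after a detection
        self.recorded = []       # stand-in for the data recording unit

    def push(self, frame, detected=False):
        if detected:
            if self._remaining == 0:
                # Recording starts: keep the temporarily accumulated frames.
                self.recorded.extend(self._buffer)
                self._buffer.clear()
            self._remaining = self._post_frames
        if self._remaining > 0:
            self.recorded.append(frame)
            self._remaining -= 1
        else:
            self._buffer.append(frame)
```

A buffer sized for 30 seconds of frames would reproduce the example period above; frames that age out of the deque without a detection are simply discarded.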

When it detects a predetermined keyword as the monitored object sound, test section 39 can erase the voice data containing the keyword without recording it to data recording section 43. Alternatively, when it detects a predetermined keyword as the monitored object sound, test section 39 can erase the part of the voice data containing the keyword, or replace that part with a sound other than the keyword. Test section 39 can record to data recording section 43 the voice data in which the keyword part has been erased or replaced. Thus, when the keyword is information that should be hidden, secret information or privacy can be protected. Such processing relating to the erasure or replacement of a keyword is also called "keyword processing". Keyword processing can also be applied to voice data already recorded in data recording section 43.
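As an illustration of keyword processing, the sketch below erases or replaces a span of samples. It assumes that the start and end sample indices of the keyword are already known from the detection step, that "erasing" means overwriting with silence so that the timing of the recording is preserved, and that the replacement sound is a short sample sequence repeated to fill the span. The function name `process_keyword` and its parameters are hypothetical.

```python
def process_keyword(samples, start, end, mode="erase", tone=None):
    """Return a copy of samples with the span [start, end) erased or replaced."""
    if not 0 <= start <= end <= len(samples):
        raise ValueError("keyword span is out of range")
    length = end - start
    if mode == "erase":
        fill = [0] * length  # silence keeps the timing of the recording intact
    elif mode == "replace":
        if not tone:
            raise ValueError("replace mode needs a non-empty replacement sound")
        # Repeat the replacement sound until it covers the keyword span.
        fill = [tone[i % len(tone)] for i in range(length)]
    else:
        raise ValueError("mode must be 'erase' or 'replace'")
    return samples[:start] + fill + samples[end:]
```

Because the function returns a new list, the same routine could be applied either before recording or to voice data that has already been recorded.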

When it detects the monitored object sound, test section 39 can instruct directive property handling part 37 to switch the direction of directive property. In this case, directive property handling part 37 can switch the direction of directive property to a predetermined direction. For example, information on multiple places (place A, place B) included in the coverage of camera 20 can be registered in advance in a not-shown memory. When the monitored object sound is detected in the direction of place A, directive property handling part 37 can switch the direction of directive property from the direction of place A to the direction of a place other than place A (for example, place B).
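The switching between registered places can be sketched as a lookup in a small table. The table contents, the angle values, and the function name `switch_direction` below are hypothetical; the embodiment only states that the places are registered in advance.

```python
def switch_direction(places, detected_place):
    """Return (name, angle) of the first registered place other than the detected one."""
    for name, angle in places.items():
        if name != detected_place:
            return name, angle
    raise ValueError("no alternative place is registered")


# Hypothetical registration: place name -> radio reception angle in degrees.
registered = {"A": 30.0, "B": 60.0}
```

Calling `switch_direction(registered, "A")` selects place B and its registered angle.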

When it detects a predetermined keyword as the monitored object sound, test section 39 can record the voice data containing the keyword to data recording section 43. This recording can include audio pre-recording and video pre-recording. Thus, by registering in advance the keywords to be monitored, operator 60 can have recording start with a keyword as the trigger, which can improve supervision precision.

Next, the configuration status of array microphone 10, camera 20 and each sound source is described.

Fig. 3 is the schematic diagram of an example of the configuration status representing array microphone 10, camera 20 and each sound source.

In Fig. 3, for example, radio unit 90 is fixed to indoor ceiling face 101. In Fig. 3, the multiple microphones 11A ~ 11C included in array microphone 10 are arranged along ceiling face 101 (the installation surface of radio unit 90). Reference numeral PA represents a sound source.

Radio unit 90 is installed on ceiling face 101 so that the reference direction of array microphone 10 coincides with the reference direction (for example, the optical axis direction) of camera 20. The horizontal and vertical directions relative to the reference direction of array microphone 10 coincide with the horizontal and vertical directions relative to the reference direction of camera 20. Here, the horizontal directions are the x-axis and y-axis directions, and the vertical direction is the z-axis direction.

The reference direction of array microphone 10 is, for example, the direction in which the microphones 11 of array microphone 10 are arranged. Radio reception angle θ is the angle formed by the reference direction of array microphone 10 and the direction of directive property. The horizontal component of radio reception angle θ is horizontal angle θh. The vertical component of radio reception angle θ is vertical angle θv.

In radio unit 90, the microphones 11 of array microphone 10 are arranged at regular intervals on a circumference, so the frequency characteristic of the voice data is the same in every horizontal direction along the arrangement plane (x-y plane). Therefore, in the example of Fig. 3, radio reception angle θ depends substantially on vertical angle θv. In the following description, radio reception angle θ is therefore discussed mainly without considering horizontal angle θh.

As shown in Fig. 3, radio reception angle θ (vertical angle θv) of array microphone 10 in radio unit 90 is the angle formed by a direction parallel to the arrangement plane of microphones 11A ~ 11C (x-axis, y-axis) and the direction in which the sensitivity of directive property is maximum.

Microphones 11A ~ 11C pick up the sound arriving at them. Camera 20 takes the direction directly below (z-axis direction) as its reference direction (optical axis direction) and photographs, for example, all directions around camera 20.

In addition, the radio reception object of array microphone 10 or the photographing object of camera 20 need not be all directions and may be limited to some directions. Array microphone 10 or Monitor and Control device 30 can synthesize voice data picked up with the radio reception object limited to some directions, to generate voice data equivalent to that obtained when the radio reception object is all directions. Likewise, camera 20 or Monitor and Control device 30 can synthesize picture signals photographed with the photographing object limited to some directions, to generate a picture signal equivalent to that obtained when the photographing object is all directions.

In addition, horizontal angle θh can be taken into account, for example, when the reference direction of array microphone 10 and the reference direction of camera 20 do not coincide. In this case, both horizontal angle θh and vertical angle θv are considered, and directive property is formed according to, for example, a three-dimensional position (x, y, z) or direction.

Next, the action case of Monitor and Control device 30 is described.

Fig. 4 is the flow chart of the action case representing Monitor and Control device 30.

Fig. 4 illustrates a real-time action example. The real-time action is, for example, the action in the case where operator 60 uses Monitor and Control device 30 to monitor, in real time, the voice data picked up by array microphone 10 and the images photographed by camera 20.

In Fig. 4, first, network processes portion 31 receives, via network 50, the view data sent from camera 20. Network processes portion 31 also receives, via network 50, the voice data of multiple sound channels sent from array microphone 10 (S11).

The view data received by network processes portion 31 is decoded by picture decoder 32 and sent to image output unit 33. Image output unit 33 outputs the decoded view data to monitor 61 and controls monitor 61 to display the image (S12). Network processes portion 31 can also record the view data and voice data to data recording section 43.

Next, radio reception coordinate specifying part 35 accepts the input of multiple coordinates from, for example, touch panel 62 (S13). For example, operator 60 visually checks the image displayed on monitor 61 and operates touch panel 62 to specify the image ranges that should be paid attention to.

Radio reception coordinate specifying part 35 derives the radio reception coordinates corresponding to the specified image ranges. For example, operator 60 touches the positions of specific persons included in the image displayed on monitor 61 (for example, reference numerals P1 and P2 of Fig. 1), and radio reception coordinate specifying part 35 thereby obtains multiple radio reception coordinates. Each such image range is an example of a monitor area that the observer should monitor.

Instead of having operator 60 specify the image ranges, radio reception coordinate specifying part 35 can have Image recognizing section 34 identify multiple predetermined patterns in the image and obtain the coordinates at which the predetermined patterns are located as the radio reception coordinates.

Radio reception angle operational part 38 derives radio reception angle θ from the radio reception coordinates obtained by radio reception coordinate specifying part 35, for example by referring to a conversion table or by performing a known calculation (S14).
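The embodiment does not specify the calculation used in S14. As one possible illustration only, the sketch below assumes an omnidirectional camera that faces straight down, an equidistant fish-eye projection whose image circle is centered on the optical axis, and a vertical angle θv measured from the arrangement plane. The function name `pixel_to_angles` and all of its parameters are hypothetical.

```python
import math


def pixel_to_angles(px, py, cx, cy, r_max, half_fov_deg=90.0):
    """Map an image coordinate to (theta_h, theta_v) in degrees.

    (cx, cy) is the assumed image center on the optical axis, and r_max is
    the assumed radius of the image circle in pixels.
    """
    dx, dy = px - cx, py - cy
    r = math.hypot(dx, dy)
    if r > r_max:
        raise ValueError("coordinate lies outside the image circle")
    # Equidistant projection: the off-axis angle grows linearly with the radius.
    off_axis = (r / r_max) * half_fov_deg
    theta_v = 90.0 - off_axis                   # angle from the arrangement plane
    theta_h = math.degrees(math.atan2(dy, dx))  # direction within the x-y plane
    return theta_h, theta_v
```

Under these assumptions a touch at the image center maps to θv = 90 degrees (directly below the unit), and a touch on the edge of the image circle maps to θv = 0 degrees. A conversion table could be built by evaluating the same function over a grid of coordinates.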

The multiple radio reception angles θ derived by radio reception angle operational part 38 are input to directive property handling part 37. Directive property handling part 37 derives the parameters for the directive property process of array microphone 10 according to radio reception angle θ. Directive property handling part 37 then applies the directive property process, using the derived parameters, to the voice data from sound decoder 36 (S15). As a result, in the voice data output by directive property handling part 37, the radio reception sensitivity of array microphone 10 is maximum, for example, in the direction of radio reception angle θ.

Next, test section 39 detects the monitored object sound (for example, an abnormal sound, a predetermined keyword, or a sound whose signal level is at or above a first predetermined threshold or at or below a second predetermined threshold) from the voice data that has undergone the directive property process (S16). The process waits at S16 until the monitored object sound is detected.
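The level-based part of this detection can be sketched as a comparison against two thresholds. The sketch assumes that the signal level is measured as the root mean square of one frame of samples. The function names and the threshold values used in any call are hypothetical, and detection of abnormal sounds or keywords is outside the scope of this example.

```python
import math


def rms_level(frame):
    """Root-mean-square level of one frame of samples."""
    if not frame:
        raise ValueError("frame must not be empty")
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_monitored_sound(frame, first_threshold, second_threshold):
    """True when the level is at or above the first threshold (loud sound)
    or at or below the second threshold (unusual silence)."""
    level = rms_level(frame)
    return level >= first_threshold or level <= second_threshold
```

A caller would evaluate each frame of the voice data after the directive property process and keep waiting while the function returns `False`.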

Next, Image recognizing section 34 can, for example, perform image recognition on the view data containing the sound source of the detected monitored object sound and identify the classification of that sound source (for example, person, male, female, object, or other sound source) (S17). Operator 60 can thus easily decide, from the classification of the sound source, whether monitoring is needed, which reduces the burden on operator 60 and can improve supervision precision.

Image recognizing section 34 can also, for example, use the VMD function to detect movement of the sound source of the monitored object sound (S17). Operator 60 can thus easily pay attention to the movement of the sound source, which reduces the burden on operator 60 and can improve supervision precision.

Image recognizing section 34 can send the result of image recognition (for example, information on the classification of the sound source of the monitored object sound, or information on the movement of that sound source) to test section 39.

In addition, the process of S17 can be omitted. For example, the user can set, via touch panel 62, information on whether to omit the process of S17, or a not-shown control part can set that information according to the supervision grade. The information on whether to omit the process of S17 is held, for example, in a not-shown memory.

Next, Monitor and Control device 30, according at least one party in the image recognition result of the testing result of test section 39 and Image recognizing section 34, carries out predetermined process (action) (S18).

For example, when the monitored object sound is detected, when the classification of the sound source is recognized, or when movement of the sound source is detected, that is, when a monitoring trigger occurs, test section 39 can instruct image output unit 33 to give notification of warning information by image. When a monitoring trigger occurs, test section 39 can also instruct audio output unit 42 to give notification of warning information by sound (S18). Test section 39 can sound a different warning tone or display different warning information according to the type of monitoring trigger. Operator 60 of Monitor and Control device 30 can thus easily recognize, for example, the occurrence of the monitored object sound, which reduces the burden on operator 60 and can improve supervision precision.

For example, when a monitoring trigger occurs, test section 39 can record the information of the retrieval mark to data recording section 43 (S18). Thus, even when operator 60 reviews the voice data or view data later, the desired part of the voice data or view data can be retrieved easily, which can shorten, for example, the verification time.

For example, when a monitoring trigger occurs, test section 39 can instruct network processes portion 31 to perform at least one of audio pre-recording and video pre-recording (S18). Because no sound or video is recorded to data recording section 43 before a monitoring trigger occurs, data recording section 43 can be used more efficiently. In addition, when a monitoring trigger occurs, the voice data or view data at the time the trigger occurred can be reliably recorded and, for example, checked later as verification material.

For example, when a predetermined keyword is detected as the monitored object sound, test section 39 can perform keyword processing (S18). Thus, even when the keyword is secret information, the secret information can be protected. Moreover, when the voice data containing the keyword is recorded with the keyword erased or replaced, the voice data can be preserved while the secret information is protected.

For example, when a monitoring trigger occurs, test section 39 can instruct directive property handling part 37 to switch the direction of directive property (S18). Thus, for example, by changing the direction of directive property toward a preset direction, the possibility of tracking the monitored object sound can be improved when movement of the sound source is anticipated.

Next, sound source presumption unit 40 estimates the position (S19) of the sound source of monitored object sound.Thereby, it is possible to improve the supervision precision of operator 60.

Next, at a predetermined timing (for example, every predetermined time), directive property handling part 37 obtains the information on the position of the sound source of the monitored object sound estimated by sound source presumption unit 40 and switches the direction of directive property so that directive property points toward the position of this sound source (S20). The sound source of the monitored object sound can thus be tracked, operator 60 can easily monitor the movement of the sound source, and supervision precision can be improved.

In addition, S19, S20 can omit.

According to the action example of Fig. 4, operator 60 can simultaneously monitor the current image and sound in the monitor area via monitor 61 and loud speaker 63. In particular, operator 60 can monitor the monitored object sound together with an image containing its sound source. Operator 60 can also specify any number of monitor areas as monitored objects while checking the image. Furthermore, performing various processes in response to detection of the monitored object sound increases the usefulness of the voice data and can improve convenience.

Next, the details of the directive property process of surveillance 100 are described.

Fig. 5 is a schematic diagram of a basic configuration example of the directive property process. In Fig. 5, directive property handling part 37 includes multiple delayers 37bA, 37bB, 37bC and adder calculator 37c, and can form directive property through the processing of these components.

A/D converters 13A, 13B, 13C convert the analog sound data output by microphones 11A ~ 11C into digital audio data, and directive property handling part 37 applies the directive property process to the converted digital audio data. The number (n) of microphones included in array microphone 10, the number (n) of A/D converters, and the number (n) of delayers included in directive property handling part 37 are increased or decreased as required.

In Fig. 5, the multiple microphones 11A ~ 11C are placed at positions separated from one another by a predetermined distance, so the times taken for a sound wave emitted from one sound source 80 to reach the respective microphones 11A ~ 11C differ (arrival time difference). Sound source 80 is, for example, the sound source of the monitored object sound.

Because of this arrival time difference, if the voice data detected by the multiple microphones 11A ~ 11C are added directly, the signal level sometimes decays, since voice data with phase differences are being added. Therefore, delayers 37bA ~ 37bC give respective time delays to the multiple voice data to adjust their phases, and adder calculator 37c adds the phase-adjusted voice data together. The multiple voice data are thus added with their phases aligned, and the signal level increases.

In Fig. 5, the arrival time difference changes according to the direction of arrival (equivalent to radio reception angle θ) of the sound wave incident from sound source 80 on framework plane of incidence 121 of array microphone 10. For example, when the multiple microphones 11A ~ 11C detect a sound wave arriving from a specific direction (θ), the phases of the multiple voice data input to adder calculator 37c coincide, and the signal level of the voice data output by adder calculator 37c increases. For a sound wave arriving from a direction other than the specific direction (θ), on the other hand, the multiple voice data input to adder calculator 37c have phase differences, and the signal level of the voice data output by adder calculator 37c decays. The directive property of array microphone 10 can thus be formed so as to raise the sensitivity to the sound wave arriving from the specific direction (θ).
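The combination of delayers and an adder described above corresponds to what is commonly called delay-and-sum processing. The sketch below illustrates it with whole-sample delays, which is a simplifying assumption (a real implementation would typically need fractional delays). The function name `delay_and_sum` and the pulse signals in the usage note are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Delay each channel by a whole number of samples and add the results.

    channels: list of equal-length sample lists, one per microphone.
    delays:   non-negative integer delay for each channel, in samples.
    """
    if len(channels) != len(delays):
        raise ValueError("one delay is needed per channel")
    length = len(channels[0])
    out = [0.0] * length
    for samples, delay in zip(channels, delays):
        if len(samples) != length:
            raise ValueError("all channels must have the same length")
        # Role of the delayer: shift the channel later by `delay` samples.
        # Role of the adder: accumulate the shifted channel into the output.
        for i in range(delay, length):
            out[i] += samples[i - delay]
    return out
```

For example, if a pulse reaches three microphones at sample indices 2, 3, and 4, delays of 2, 1, and 0 samples align the three pulses at index 4 and the output there is three times the single-channel level. With all delays set to 0, the pulses stay spread over indices 2 to 4 and no sample exceeds the single-channel level.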

When the sound wave of the monitored object sound arrives at framework plane of incidence 121 from the direction of radio reception angle θ, the delay times D1, D2, D3 given by (formula 1) are assigned as the delay times of delayers 37bA, 37bB, 37bC, respectively.

D1 = L1/Vs = d(n-1)cosθ/Vs

D2 = L2/Vs = d(n-2)cosθ/Vs ... (formula 1)

D3 = L3/Vs = d(n-3)cosθ/Vs

where

L1: difference in sound-wave arrival distance between the 1st microphone and the n-th microphone (known constant)

L2: difference in sound-wave arrival distance between the 2nd microphone and the n-th microphone (known constant)

L3: difference in sound-wave arrival distance between the 3rd microphone and the n-th microphone (known constant)

Vs: velocity of sound (known constant)

d: arrangement interval of the microphones (known constant)

In addition, as one example, n=3 in the case of the system configuration shown in Fig. 2, and n=16 in the case of radio unit 90 shown in Fig. 1.

As shown in (formula 1), when directive property is matched to the sound wave arriving at array microphone 10 from specific direction θ, delay times D1 ~ D3 are assigned to delayers 37bA, 37bB, 37bC according to the arrival time differences of the sound wave at the respective microphones 11A ~ 11C on framework plane of incidence 121.

For example, directive property handling part 37 obtains delay times D1 ~ D3 from (formula 1) and radio reception angle θ supplied by radio reception angle operational part 38, and assigns delay times D1 ~ D3 to delayers 37bA ~ 37bC. The voice data of the sound wave arriving at framework plane of incidence 121 from the direction of radio reception angle θ can thus be emphasized, forming the directive property of array microphone 10.
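The calculation of the delay times can be sketched as follows. The sketch generalizes the pattern of (formula 1) to the k-th microphone as Dk = d(n-k)cosθ/Vs, which is an interpretation of the three listed expressions. The function names, the default sound velocity of 343 m/s (air at roughly 20 degrees Celsius), and the sampling rate in the usage note are assumptions, and θ is assumed to lie between 0 and 90 degrees so that no delay is negative.

```python
import math


def delay_times(n, d, theta_deg, vs=343.0):
    """Return [D1, ..., Dn] in seconds, with Dk = d * (n - k) * cos(theta) / Vs.

    n: number of microphones, d: microphone interval in metres,
    theta_deg: radio reception angle in degrees, vs: sound velocity in m/s.
    """
    cos_theta = math.cos(math.radians(theta_deg))
    return [d * (n - k) * cos_theta / vs for k in range(1, n + 1)]


def to_samples(delays, sample_rate):
    """Convert delay times in seconds to whole-sample delays (rounded)."""
    return [round(t * sample_rate) for t in delays]
```

With n = 3, d = 0.343 m, and θ = 0 degrees, the delays are approximately 2 ms, 1 ms, and 0 ms, which correspond to 32, 16, and 0 samples at an assumed sampling rate of 16 kHz. At θ = 90 degrees the cosine term vanishes and all delays are effectively zero.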

In addition, the assigned delay times D1 ~ D3 and the known constants in (formula 1) are stored, for example, in a not-shown memory in Monitor and Control device 30.

According to surveillance 100, for example, multiple monitor areas specified by operator 60 of Monitor and Control device 30 are accepted on view data received in real time, and the presence or absence of an abnormality can be monitored with directive property directed toward the directions corresponding to the monitor areas. When a monitoring trigger occurs, Monitor and Control device 30 performs various processes, which promotes effective utilization of the voice data picked up by array microphone 10 and the view data photographed by camera 20 and can improve the convenience of operator 60.

(Second embodiment)

In the second embodiment, a case is assumed in which a register that records voice data or view data is provided in the surveillance separately from the Monitor and Control device.

Fig. 6 is a schematic diagram of surveillance 100B of the second embodiment. Fig. 6 differs from Fig. 1 in that surveillance 100B includes register 70. Register 70 is connected to network 50. Register 70 is an example of a storage device. Register 70 stores, for example, the voice data picked up by array microphone 10 and the view data photographed by camera 20.

Fig. 7 is a block diagram of a configuration example of surveillance 100B. In surveillance 100B of Fig. 7, structures that are the same as those of surveillance 100 shown in Fig. 2 are given the same reference numerals, and their description is omitted or simplified.

Surveillance 100B possesses array microphone 10, camera 20, Monitor and Control device 30B and register 70.

Compared with Monitor and Control device 30 shown in Fig. 2, Monitor and Control device 30B does not have data recording section 43. Instead of recording data to or reading data from data recording section 43, Monitor and Control device 30B accesses data recording section 72 of register 70 to record or read data. When data are communicated between Monitor and Control device 30B and register 70, they pass through network processes portion 31 of Monitor and Control device 30B, network 50, and network processes portion 71 of register 70.

Register 70 includes network processes portion 71 and data recording section 72. Register 70 includes, for example, a CPU, a DSP, a ROM, or a RAM, and realizes various functions by having the CPU or DSP execute a control program recorded in the ROM or RAM.

Network processes portion 71 obtains, for example, the voice data of multiple sound channels sent from array microphone 10, or the view data sent from camera 20, via network 50. Network processes portion 71 also sends, for example, the voice data or view data recorded in data recording section 72 to network 50.

Data recording section 72 has the same structure and function as data recording section 43 of Monitor and Control device 30 shown in Fig. 2. Data recording section 72 records the same kinds of data as data recording section 43 (for example, voice data, view data, and retrieval mark information).

For example, when network processes portion 71 receives voice data, view data, and retrieval mark information from Monitor and Control device 30B, data recording section 72 can record the received data in association with one another. Also, when network processes portion 71 receives retrieval mark information from Monitor and Control device 30B and data recording section 72 has already recorded voice data or view data, data recording section 72 can record the retrieval mark information in association with that voice data or view data.

Also, for example, when the CPU executes a predetermined command, the voice data, view data, and retrieval mark information recorded in data recording section 72 are read from data recording section 72 and sent to Monitor and Control device 30B via network processes portion 71 and network 50.

For example, when predetermined information is received from Monitor and Control device 30B via network 50, data recording section 72 determines whether the information recorded as a retrieval mark matches or corresponds to the received predetermined information. When it determines that the two match, data recording section 72 retrieves the voice data or view data associated with that retrieval mark and sends the retrieved voice data or view data to network 50.
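The retrieval by mark can be sketched as a filter over recorded entries. The record layout, the field names `kind` and `source`, the clip identifiers, and the function name `find_by_mark` are all hypothetical. The sketch treats "matches" as exact equality of every queried field, which is only one possible reading of "matches or corresponds".

```python
def find_by_mark(records, query):
    """Return the data of every record whose retrieval mark matches the query.

    records: list of dicts with keys 'data' and 'mark' (a dict of mark fields).
    query:   dict of mark fields that must all be present and equal.
    """
    return [
        rec["data"]
        for rec in records
        if all(key in rec["mark"] and rec["mark"][key] == value
               for key, value in query.items())
    ]


# Hypothetical recorded entries standing in for data recording section 72.
records = [
    {"data": "clip-001", "mark": {"kind": "keyword", "source": "person"}},
    {"data": "clip-002", "mark": {"kind": "abnormal sound", "source": "object"}},
    {"data": "clip-003", "mark": {"kind": "keyword", "source": "object"}},
]
```

A query such as `{"kind": "keyword"}` returns the first and third entries, and adding `"source": "object"` narrows the result to the third entry only.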

In this way, by using the retrieval marks recorded in register 70, previously recorded voice data or view data can be retrieved easily, the retrieval time can be shortened, and the convenience of operator 60 can be improved.

Next, the action case of Monitor and Control device 30B is described.

Fig. 8 is the flow chart of the action case representing Monitor and Control device 30B.

Fig. 8 illustrates a register-output replay action example. The register-output replay action is, for example, the action in the case where operator 60 uses Monitor and Control device 30B to analyze past voice data and picture signals recorded in register 70. In Fig. 8, steps that perform the same processes as in Fig. 4 are given the same step numbers, and their description is omitted or simplified.

In surveillance 100B, when the view data photographed in the past by camera 20 and the voice data of multiple sound channels picked up in the past by array microphone 10 are recorded in register 70, the recorded view data and voice data can be read from register 70.

Monitor and Control device 30B instructs register 70, for example in response to an input operation by operator 60, to read specific view data and voice data recorded in register 70. In this case, the specific view data and voice data are read from register 70 and received by network processes portion 31 via network 50 (S21).

Next, the process of the S12 ~ S20 of Fig. 8 is carried out.In addition, the process of S17, S19, S20 can be omitted.

According to the action example of Fig. 8, operator 60 can simultaneously monitor past images and sound in the monitor area via monitor 61 and loud speaker 63. In particular, operator 60 can monitor the monitored object sound together with an image containing its sound source. Operator 60 can also specify any number of monitor areas as monitored objects while checking the image. Furthermore, performing various processes in response to detection of the monitored object sound increases the usefulness of the voice data and can improve convenience.

In addition, because the retrieval mark is recorded in association with the recorded view data or voice data, data relating to the monitored object sound can be retrieved quickly, for example, when it is searched for later. In this way, performing various processes in response to detection of the monitored object sound increases the usefulness of the voice data and can improve convenience.

In addition, the register-output replay action example of Fig. 8 can also be applied to the case of processing data recorded in data recording section 43 in the first embodiment.

According to surveillance 100B, for example, multiple monitor areas specified by operator 60 of Monitor and Control device 30B are accepted on view data recorded in the past, and the presence or absence of an abnormality can be monitored with directive property directed toward the directions corresponding to the monitor areas. When a monitoring trigger occurs, Monitor and Control device 30B performs various processes, which promotes effective utilization of the voice data picked up by array microphone 10 and the view data photographed by camera 20 and can improve the convenience of operator 60.

In addition, the present invention is not limited to the structures of the above embodiments, and can be applied to any structure that can realize the functions recited in the claims or the functions of the structures of the present embodiments.

For example, in the above embodiments, array microphone 10 or camera 20 can include some of the structural portions relating to sound processing that Monitor and Control device 30, 30B includes. Array microphone 10 can, for example, have part or all of Image recognizing section 34, radio reception coordinate specifying part 35, radio reception angle operational part 38, directive property handling part 37, test section 39, sound source presumption unit 40, and speech synthesiser 41. The processing load of Monitor and Control device 30, 30B can thus be reduced. When array microphone 10 has some of the structural portions relating to sound processing, the necessary data are communicated as appropriate between Monitor and Control device 30, 30B and array microphone 10 via network 50.

For example, the above embodiments illustrate array microphone 10 in which multiple microphones 11 are arranged at regular intervals on one circumference, but the microphones 11 can be arranged differently. For example, they can be arranged in a row at regular intervals along one direction (for example, the x-axis direction). They can be arranged in a cross shape at regular intervals along two directions (for example, the x-axis and y-axis directions). They can also be arranged at regular intervals on two circumferences of different diameters.

For example, in the above embodiments, Monitor and Control device 30, 30B can, without using camera 20, associate monitoring ranges in real space with radio reception angles θ of array microphone 10 and preset them. That is, a not-shown memory of Monitor and Control device 30, 30B can hold correspondence information between the monitoring ranges and radio reception angles θ. In this case, for example, when the user specifies multiple predetermined monitoring ranges via touch panel 62 or the like, radio reception angle operational part 38 can derive multiple radio reception angles θ by referring to the correspondence information held in the memory. Alternatively, for example, the user can directly specify radio reception angles via touch panel 62 or the like, and the specified data can be handled as data derived by radio reception angle operational part 38. The directions of multiple directive properties can thus be determined without using camera 20.

For example, in the above embodiments, surveillance 100, 100B can be a system that performs monitoring using sound without using images. In this case, camera 20 or the structural portions for realizing display-related functions can be omitted from surveillance 100, 100B.

For example, the above embodiments illustrate the case where radio unit 90 is fixed to indoor ceiling face 101, but radio unit 90 can also be fixed at another indoor position (for example, a wall). Multiple cameras 20 can be provided in surveillance 100, 100B. Monitor 61, touch panel 62, and loud speaker 63 can be included in Monitor and Control device 30, 30B.

For example, in the above embodiments, a soft keyboard (on-screen keyboard) for adjusting volume can be displayed on monitor 61. By operating the soft keyboard through touch panel 62, the volume of, for example, the voice data after the directive property process can be adjusted.

For example, in the above embodiments, a not-shown control part of Monitor and Control device 30, 30B can correct distortion of the voice data that arises from the environment in which radio unit 90 is installed. The not-shown control part can also correct distortion of the view data photographed by camera 20 (for example, a camera with a fish-eye lens).

For example, in the above embodiments, when a monitor area is touched through touch panel 62, directive property is directed toward this monitor area, and the monitor area is then touched again through touch panel 62, radio reception coordinate specifying part 35 can exclude this monitor area from the monitored objects. That is, when the same position or region in the view data displayed on monitor 61 is touched repeatedly, radio reception coordinate specifying part 35 ends the derivation of the radio reception coordinate and ends the directive property process by directive property handling part 37. Radio reception coordinate specifying part 35 can exclude multiple monitor areas from the monitored objects simultaneously.
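The touch-again-to-exclude behavior can be sketched as a toggle over the list of active monitor areas. The sketch assumes that each monitor area is represented by its center coordinate and that a touch within a fixed radius of that center counts as touching the same area. The function name `toggle_monitor_area` and the radius are hypothetical.

```python
import math


def toggle_monitor_area(areas, touch, radius):
    """Return the new list of monitor areas after a touch at `touch`.

    areas:  list of (x, y) centers of the currently monitored areas.
    touch:  (x, y) coordinate reported by the touch panel.
    radius: distance within which a touch counts as hitting an existing area.
    """
    for index, (x, y) in enumerate(areas):
        if math.hypot(x - touch[0], y - touch[1]) <= radius:
            # Touching an existing area again excludes it from monitoring.
            return areas[:index] + areas[index + 1:]
    # Otherwise the touch designates a new monitor area.
    return areas + [touch]
```

Each call returns a new list, so the caller can start or end the directive property process for exactly the areas that were added or removed.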

In the above embodiments, for example, when a drag operation is received while a monitor area is being touched through touch panel 62, radio reception coordinate specifying part 35 can move the monitor area. Radio reception coordinate specifying part 35 can accept movement of multiple monitor areas simultaneously.

In the above embodiments, the monitoring systems 100 and 100B may include a plurality of sound pick-up units 90. In this case, the sound pick-up units 90 may cooperate to form the image data and the sound data. The images captured by the cameras 20 of the respective sound pick-up units 90 may be displayed simultaneously on split screens of the monitor 61. Even when a drag operation spanning a plurality of split screens is received through the touch panel 62, the monitoring control devices 30 and 30B may perform directivity processing for each split screen using the sound data picked up by the corresponding sound pick-up unit 90. Such drag operations may be accepted simultaneously in a plurality of different regions of the monitor 61.

(Summary of one aspect of the present invention)

According to this configuration, an observer operating the sound processing apparatus can designate a plurality of monitored areas and monitor for abnormalities with directivity formed in the directions corresponding to those areas. This promotes effective use of the picked-up sound data and the captured image data, and improves convenience for the observer.

The sound processing apparatus according to one aspect of the present invention may be configured such that the designation unit designates a plurality of image ranges in the image data acquired by the data acquisition unit, and the directivity processing unit emphasizes, in the sound data, the sound components arriving from the directions of the image ranges designated by the designation unit.

According to this configuration, because image data is used, the observer can designate a plurality of monitored areas intuitively. The observer can therefore easily monitor a plurality of monitored areas at the same time by listening for predetermined sounds. Monitoring the image data together with the sound data can also improve monitoring accuracy.
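One common way to emphasize sound arriving from several designated directions is delay-and-sum beamforming. The sketch below is an illustration under stated assumptions, not the method of the embodiments: it assumes a linear microphone array, far-field plane waves, whole-sample delays, and a speed of sound of 343 m/s. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air (assumed)

def arrival_delays(mic_positions, angle_deg, sample_rate):
    """Delay, in whole samples, at which a plane wave from angle_deg reaches
    each microphone of a linear array, relative to the earliest microphone.
    Assumed convention: 0 degrees is broadside, and for a positive angle the
    wavefront reaches microphones at larger x later."""
    s = math.sin(math.radians(angle_deg))
    raw = [x * s / SPEED_OF_SOUND * sample_rate for x in mic_positions]
    earliest = min(raw)
    return [round(d - earliest) for d in raw]

def delay_and_sum(channels, delays):
    """Advance each channel by its arrival delay and average the results."""
    n = len(channels[0])
    out = [0.0] * n
    for channel, d in zip(channels, delays):
        for t in range(n - d):
            out[t] += channel[t + d]
    return [v / len(channels) for v in out]

def emphasize_directions(channels, mic_positions, angles_deg, sample_rate):
    """Return one emphasized signal per designated direction."""
    return {
        angle: delay_and_sum(
            channels, arrival_delays(mic_positions, angle, sample_rate)
        )
        for angle in angles_deg
    }
```

Sound from a designated direction adds coherently across the channels, while sound from other directions is averaged down; each designated image range would contribute one entry to `angles_deg`.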

The sound processing apparatus according to one aspect of the present invention may be configured to further include: a sound detection unit that detects a predetermined sound from at least one of the sound components in the plurality of directions emphasized by the directivity processing unit; and a processing unit that performs predetermined processing when the predetermined sound is detected by the sound detection unit.

According to this configuration, the sound processing apparatus performs various actions in response to detection of the predetermined sound, which promotes effective use of the picked-up sound data and the captured image data and improves convenience for the observer.

The sound processing apparatus according to one aspect of the present invention may be configured such that, when the predetermined sound is detected, the processing unit records a search tag in a recording unit, the search tag being used to retrieve, from the recording unit in which the sound data and the image data are recorded, sound data containing the predetermined sound or image data containing the sound source of the predetermined sound.

According to this configuration, even when the observer reviews the sound data or image data later, the desired sound data or image data, or a specific position within it, can be retrieved easily, which can shorten, for example, verification time.

The sound processing apparatus according to one aspect of the present invention may be configured such that the processing unit acquires sound data or image data recorded in the recording unit that corresponds to a predetermined search tag included among the search tags recorded in the recording unit.

According to this configuration, for example, a search tag specified by the user through the touch panel can be used to retrieve past sound data or image data in which the same event occurred. This promotes effective use of the sound data or image data, improves convenience for the observer, and can also improve monitoring accuracy.

The sound processing apparatus according to one aspect of the present invention may be configured such that the search tag includes at least one of: the type of the predetermined sound, the direction of the sound source of the predetermined sound with reference to the sound pick-up unit, and the time at which the predetermined sound was detected by the sound detection unit.

According to this configuration, various kinds of information can be recorded as search tags associated with the sound data, so that the desired sound data can be retrieved quickly when needed.
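A search tag carrying these three items, and a lookup over recorded tags, might be sketched as follows. The field names, the `find_tags` helper, and the example sound types are hypothetical; the source does not specify a storage format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SearchTag:
    """Hypothetical search tag; each field is optional because the tag
    includes at least one of the three items."""
    sound_type: Optional[str] = None       # type of the predetermined sound
    direction_deg: Optional[float] = None  # direction relative to the pick-up unit
    detected_at: Optional[float] = None    # detection time, seconds from start

def find_tags(tags, **criteria):
    """Return the tags whose fields equal every given criterion."""
    return [
        tag for tag in tags
        if all(getattr(tag, name) == value for name, value in criteria.items())
    ]
```

For example, `find_tags(tags, sound_type="scream")` would return the tags for every recorded event of that (hypothetical) type, whose `detected_at` values then point into the recorded sound data or image data.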

The sound processing apparatus according to one aspect of the present invention may be configured such that, when the predetermined sound is detected, the processing unit causes a presentation unit to present warning information indicating that the predetermined sound has been detected.

According to this configuration, the observer is notified that, for example, a sound accompanying an abnormality has occurred, and can be prompted to, for example, strengthen monitoring.

The sound processing apparatus according to one aspect of the present invention may be configured such that, when the predetermined sound is detected, the processing unit records sound data containing the predetermined sound in a recording unit.

According to this configuration, sound data is not recorded while the predetermined sound is not detected, which allows the recording unit to be used more efficiently. When the predetermined sound is detected, the sound data at the time the sound occurred can be reliably recorded and, for example, reviewed later as verification material. When image data is recorded in the same way, monitoring accuracy can be improved further.
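Detection-triggered recording can be sketched as below. The short pre-roll buffer, which keeps the frames just before the detection so that the onset of the sound is not lost, is an added assumption and is not described in the source; `record_on_detection` and `detect` are hypothetical names.

```python
from collections import deque

def record_on_detection(frames, detect, pre_roll=2):
    """Keep only frames in which detect(frame) is true, plus up to pre_roll
    frames immediately preceding each detection (assumed behavior)."""
    recent = deque(maxlen=pre_roll)  # most recent undetected frames
    recorded = []
    for frame in frames:
        if detect(frame):
            recorded.extend(recent)  # flush the pre-roll, oldest first
            recent.clear()
            recorded.append(frame)
        else:
            recent.append(frame)     # not recorded unless a detection follows
    return recorded
```

Frames that are never followed by a detection are discarded, which is what saves space in the recording unit.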

The sound processing apparatus according to one aspect of the present invention may be configured such that, when the predetermined sound is detected, the processing unit changes the direction in which the directivity processing unit emphasizes sound components.

According to this configuration, for example, when the sound source of the predetermined sound moves and its movement can be anticipated, changing the direction of directivity toward a preset direction increases the likelihood that the predetermined sound can be tracked.

The sound processing apparatus according to one aspect of the present invention may be configured to include an estimation unit that estimates the position of the sound source emitting the predetermined sound and causes a presentation unit to present information on the estimated position.

According to this configuration, the observer can easily confirm the estimated position of the sound source, which reduces the burden on the observer and can improve monitoring accuracy.

The sound processing apparatus according to one aspect of the present invention may be configured to include an estimation unit that estimates the position of the sound source emitting the predetermined sound, with the directivity processing unit emphasizing the sound component arriving from the direction of the sound source position estimated by the estimation unit.

According to this configuration, the observer can track the sound source of the predetermined sound by means of the sound data, which makes it easy to monitor the movement of the sound source and can improve monitoring accuracy.

The sound processing apparatus according to one aspect of the present invention may be configured such that, when the signal level of a sound component emphasized by the directivity processing unit is at or above a first predetermined signal level, or at or below a second predetermined signal level, the sound detection unit detects that sound component as the predetermined sound.

According to this configuration, for example, by setting the threshold so that sounds louder than the everyday sounds normally expected are detected, the likelihood of detecting an abnormal sound can be increased.
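The two-threshold test can be sketched as follows. Using the RMS amplitude of a frame as the signal level is an assumption, since the source does not say how the level is measured, and the function names are hypothetical.

```python
import math

def rms_level(samples):
    """Root-mean-square amplitude of one frame (assumed level measure)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_predetermined_sound(samples, first_level, second_level):
    """True when the level is at or above first_level (unusually loud)
    or at or below second_level (unusually quiet)."""
    level = rms_level(samples)
    return level >= first_level or level <= second_level
```

With `first_level` set above the loudest expected everyday sound, a loud event in an emphasized direction is flagged; the lower threshold could, for instance, flag a source that has unexpectedly gone silent.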

The sound processing apparatus according to one aspect of the present invention may be configured such that the sound detection unit detects a predetermined keyword as the predetermined sound from at least one of the sound components emphasized by the directivity processing unit.

According to this configuration, for example, a highly confidential keyword can be detected in the sound component in the direction of directivity, and various kinds of processing can be applied to the keyword, which improves convenience for the user.

The sound processing apparatus according to one aspect of the present invention may be configured such that the processing unit modifies the portion corresponding to the predetermined keyword in the sound data that contains the detected keyword.

According to this configuration, even when the keyword is confidential information, that information can be protected. For example, when sound data containing the keyword is recorded with the keyword erased or replaced, the confidential information is protected while the sound data is preserved.
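Erasing or replacing the keyword portion can be sketched as overwriting a sample range. The sketch assumes the keyword's start and end sample indices are already known from the detector; `mask_keyword` and its parameters are hypothetical.

```python
def mask_keyword(samples, start, end, replacement=None):
    """Return a copy of samples with [start, end) erased (silence) or
    replaced by the given samples, repeated or cut to the same length."""
    length = end - start
    if replacement is None:
        fill = [0.0] * length  # erase: substitute silence
    else:
        # replace: tile the substitute sound to cover the keyword span
        fill = [replacement[i % len(replacement)] for i in range(length)]
    return samples[:start] + fill + samples[end:]
```

The total length is unchanged, so timestamps and any associated image data stay aligned with the masked sound data.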

The sound processing apparatus according to one aspect of the present invention may be configured such that the processing unit records sound data containing the detected predetermined keyword in a recording unit.

According to this configuration, recording of sound data can be started with a keyword uttered in the monitored area as the trigger. Sound data is therefore not recorded while no keyword is detected, which allows the recording unit to be used more efficiently. When a keyword is detected, the sound data containing the keyword can be recorded and, for example, reviewed later as verification material.

The sound processing apparatus according to one aspect of the present invention may be configured such that the sound detection unit detects, as the predetermined sound, a predetermined abnormal sound contained in at least one of the sound components emphasized by the directivity processing unit.

According to this configuration, for example, an abnormal sound indicating an abnormality can be detected in the sound component in the direction of directivity, and various kinds of processing can be performed using the abnormal sound, which improves convenience for the user.

The sound processing apparatus according to one aspect of the present invention may be configured to include an image recognition unit that performs image recognition on the image data, with the processing unit performing the predetermined processing according to the image recognition result of the image recognition unit.

According to this configuration, taking the image recognition result into account together with the detection of the predetermined sound can improve, for example, the accuracy of abnormality detection in the monitored area. The observer can therefore respond appropriately, and monitoring accuracy can be improved.

The sound processing apparatus according to one aspect of the present invention may be configured such that the image recognition unit recognizes, in the image data, the type of the sound source of the predetermined sound.

According to this configuration, the observer can easily judge from the type of sound source whether monitoring is needed, which reduces the burden on the observer and can improve monitoring accuracy. When a search tag indicating the type of sound source is attached, retrieval by sound source type is easy even when the sound data is reviewed later, which can shorten verification time.

The sound processing apparatus according to one aspect of the present invention may be configured such that the image recognition unit recognizes, in the image data, whether the sound source of the predetermined sound has moved.

According to this configuration, the observer can easily pay attention to the movement of the sound source, which reduces the burden on the observer and can improve monitoring accuracy. When a search tag indicating whether the sound source has moved is attached, retrieval by the presence or absence of movement is easy even when the sound data is reviewed later, which can shorten verification time.

The sound processing apparatus according to one aspect of the present invention may be configured such that, when the image recognition unit has performed image recognition on the image data, the processing unit records a search tag in a recording unit, the search tag being used to retrieve, from the recording unit in which the sound data and the image data are recorded, sound data containing the predetermined sound or image data containing the sound source of the predetermined sound.

According to this configuration, even when the observer reviews the image data later, the desired image data, or a specific position within it, can be retrieved easily, which can shorten, for example, verification time.

The sound processing apparatus according to one aspect of the present invention may be configured such that the processing unit uses a predetermined search tag included among the search tags recorded in the recording unit to acquire the sound data or image data recorded in the recording unit in association with that predetermined search tag.

According to this configuration, for example, a search tag specified by the user through the touch panel can be used to retrieve past image data in which the same event occurred. This promotes effective use of the image data associated with the sound data, improves convenience for the observer, and can also improve monitoring accuracy.

The sound processing apparatus according to one aspect of the present invention may be configured such that the search tag includes at least one of: the type of the sound source, whether the sound source has moved, and a thumbnail image containing the sound source.

According to this configuration, various kinds of information are recorded as search tags associated with the image data, so that the desired image data can be retrieved quickly when needed.

The sound processing apparatus according to one aspect of the present invention may be configured such that, when the predetermined sound is detected, the processing unit causes a presentation unit to present, according to the image recognition result of the image recognition unit, warning information indicating that the predetermined sound has been detected.

According to this configuration, the image recognition result is used to notify the observer that, for example, a sound accompanying an abnormality has occurred, and the observer can be prompted to, for example, strengthen monitoring.

The sound processing apparatus according to one aspect of the present invention may be configured such that, when the predetermined sound is detected, the processing unit records sound data containing the predetermined sound in a recording unit according to the image recognition result of the image recognition unit.

According to this configuration, sound data is not recorded while the predetermined sound is not detected, which allows the recording unit to be used more efficiently. When the predetermined sound is detected, the image recognition result is used so that the sound data at the time the sound occurred can be reliably recorded and, for example, reviewed later as verification material. When image data is recorded in the same way, monitoring accuracy can be improved further.

The sound processing apparatus according to one aspect of the present invention may also be configured such that, when the predetermined sound is detected, the processing unit changes, according to the image recognition result of the image recognition unit, the direction in which the directivity processing unit emphasizes sound components.

According to this configuration, for example, when the sound source of the predetermined sound moves and its movement can be anticipated, using the image recognition result to change the direction of directivity toward a preset direction increases the likelihood that the predetermined sound can be tracked.

A sound processing system according to one aspect of the present invention includes: a sound pick-up device including a sound pick-up unit that picks up sound using a plurality of microphones; an image capture device including an image capture unit that captures images; and a sound processing apparatus that processes the sound data picked up by the sound pick-up unit. The sound processing apparatus includes: a data acquisition unit that acquires the sound data picked up by the sound pick-up unit and the image data captured by the image capture unit; a designation unit that designates a plurality of directions, with reference to the sound pick-up unit, corresponding to designated points on an image displayed on the basis of the image data; and a directivity processing unit that emphasizes the sound components of the sound data in the plurality of directions designated by the designation unit.

According to this configuration, the observer can designate a plurality of monitored areas and monitor for abnormalities with directivity formed in the directions corresponding to those areas. This promotes effective use of the picked-up sound data and improves convenience for the observer.

The sound processing system according to one aspect of the present invention may be configured such that the designation unit designates a plurality of image ranges in the image data acquired by the data acquisition unit, and the directivity processing unit emphasizes, in the sound data, the sound components arriving from the directions of the image ranges designated by the designation unit.

The sound processing system according to one aspect of the present invention may be configured such that the sound processing apparatus further includes: a sound detection unit that detects a predetermined sound from at least one of the sound components in the plurality of directions emphasized by the directivity processing unit; and a processing unit that performs predetermined processing when the predetermined sound is detected by the sound detection unit.

According to this configuration, the sound processing apparatus performs various actions in response to detection of the predetermined sound, which promotes effective use of the picked-up sound data and the captured image data and can improve convenience for the observer.

The sound processing system according to one aspect of the present invention may be configured such that the data acquisition unit acquires the sound data from the sound pick-up device and the image data from the image capture device, and the sound processing apparatus includes a recording unit that records the sound data, the image data, and a search tag for retrieving sound data containing the predetermined sound.

According to this configuration, for example, the sound of a monitored area can be heard clearly using real-time sound data and image data, which can improve the accuracy of real-time monitoring. In addition, recording the real-time sound data or image data together with a search tag makes it easy to retrieve the desired sound data or image data later, which can shorten verification time.

The sound processing system according to one aspect of the present invention may be configured to include a recording device that records data, the recording device including a recording unit that records the sound data picked up by the sound pick-up unit in association with the image data captured by the image capture unit and that records a search tag for retrieving sound data containing the predetermined sound, with the data acquisition unit acquiring the sound data, the image data, and the search tag from the recording unit.

According to this configuration, the sound of a monitored area in a past image can be heard clearly. This can improve, for example, the accuracy of sound analysis of a monitored area using past images. Attaching a search tag to the recorded sound data or image data makes later retrieval of the desired sound data or image data easy. When a search tag has already been recorded, it can be used to retrieve the desired sound data or image data easily.

A sound processing method according to one aspect of the present invention includes the steps of: acquiring sound data picked up by a sound pick-up unit including a plurality of microphones and image data captured by an image capture unit; designating a plurality of directions, with reference to the sound pick-up unit, corresponding to designated points on an image displayed on the basis of the image data; and emphasizing the sound components of the sound data in the designated plurality of directions.

According to this method, the observer can designate a plurality of monitored areas and monitor for abnormalities with directivity formed in the directions corresponding to those areas. This promotes effective use of the picked-up sound data and the captured image data, and improves convenience for the observer.
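The designating step, which maps a point on the displayed image to a direction, can be sketched under a simple pinhole-camera model. This is an assumption for illustration only: it presumes the camera is co-located with the sound pick-up unit, has no lens distortion, and has a known horizontal field of view; a fisheye lens would need a different mapping. `pixel_to_direction` is a hypothetical name.

```python
import math

def pixel_to_direction(px, py, width, height, hfov_deg):
    """Return (azimuth, elevation) in degrees of the ray through pixel
    (px, py), measured from the optical axis of an ideal pinhole camera."""
    # Focal length in pixels, derived from the horizontal field of view.
    focal = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    dx = px - width / 2   # pixels to the right of the image center
    dy = height / 2 - py  # pixels above the image center
    azimuth = math.degrees(math.atan2(dx, focal))
    elevation = math.degrees(math.atan2(dy, math.hypot(dx, focal)))
    return azimuth, elevation
```

Each touched point would be converted this way, and the resulting directions passed to the emphasizing step.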

Industrial applicability

The present invention is useful for a sound processing apparatus, a sound processing system, a sound processing method, and the like that can promote effective use of sound data and image data and improve convenience.

Description of reference numerals

10 array microphone

11A, 11B, 11C microphone

12A, 12B, 12C amplifier

13A, 13B, 13C A/D converter

14A, 14B, 14C sound encoder

15 network processing unit

20 camera

21 lens

22 sensor

23 image encoder

24 network processing unit

30, 30B monitoring control device

31 network processing unit

32 image decoder

33 image output unit

34 image recognition unit

35 sound pick-up coordinate designation unit

36 sound decoder

37 directivity processing unit

38 sound pick-up angle calculation unit

39 detection unit

40 sound source estimation unit

41 sound synthesis unit

42 sound output unit

43 data recording unit

50 network

60 operation unit

61 monitor

62 touch panel

63 speaker

70 recorder

71 network processing unit

72 data recording unit

80 sound source

90 sound pick-up unit

91 unit housing

100, 100B monitoring system

101 ceiling surface

102 floor surface

121 housing incidence surface

PA sound source

Claims

1. A sound processing apparatus, comprising:

a data acquisition unit that acquires sound data picked up by a sound pick-up unit including a plurality of microphones and image data captured by an image capture unit;

a designation unit that designates a plurality of directions, with reference to the sound pick-up unit, corresponding to designated points on an image displayed on the basis of the image data; and

a directivity processing unit that emphasizes the sound components of the sound data in the plurality of directions designated by the designation unit.

2. The sound processing apparatus according to claim 1, wherein

the designation unit designates a plurality of image ranges in the image data acquired by the data acquisition unit, and

the directivity processing unit emphasizes, in the sound data, the sound components arriving from the directions of the plurality of image ranges designated by the designation unit.

3. The sound processing apparatus according to claim 1 or 2, wherein

the sound processing apparatus further comprises:

a sound detection unit that detects a predetermined sound from at least one of the sound components in the plurality of directions emphasized by the directivity processing unit; and

a processing unit that performs predetermined processing when the predetermined sound is detected by the sound detection unit.

4. The sound processing apparatus according to claim 3, wherein,

when the predetermined sound is detected, the processing unit records a search tag in a recording unit, the search tag being used to retrieve, from the recording unit in which the sound data and the image data are recorded, sound data containing the predetermined sound or image data containing the sound source of the predetermined sound.

5. The sound processing apparatus according to claim 4, wherein

the processing unit acquires sound data or image data recorded in the recording unit that corresponds to a predetermined search tag included among the search tags recorded in the recording unit.

6. The sound processing apparatus according to claim 4 or 5, wherein

the search tag includes at least one of: the type of the predetermined sound, the direction of the sound source of the predetermined sound with reference to the sound pick-up unit, and the time at which the predetermined sound was detected by the sound detection unit.

7. The sound processing apparatus according to any one of claims 3 to 6, wherein,

when the predetermined sound is detected, the processing unit causes a presentation unit to present warning information indicating that the predetermined sound has been detected.

8. The sound processing apparatus according to any one of claims 3 to 7, wherein,

when the predetermined sound is detected, the processing unit records sound data containing the predetermined sound in a recording unit.

9. The sound processing apparatus according to any one of claims 3 to 8, wherein,

when the predetermined sound is detected, the processing unit changes the direction in which the directivity processing unit emphasizes sound components.

10. The sound processing apparatus according to any one of claims 3 to 9, wherein

the sound processing apparatus further comprises an estimation unit that estimates the position of the sound source emitting the predetermined sound and causes a presentation unit to present information on the estimated position.

11. The sound processing apparatus according to any one of claims 3 to 10, wherein

the sound processing apparatus further comprises an estimation unit that estimates the position of the sound source emitting the predetermined sound, and

the directivity processing unit emphasizes the sound component arriving from the direction of the position of the sound source estimated by the estimation unit.

12. The sound processing apparatus according to any one of claims 3 to 11, wherein,

when the signal level of a sound component emphasized by the directivity processing unit is at or above a first predetermined signal level, or at or below a second predetermined signal level, the sound detection unit detects that sound component as the predetermined sound.

13. The sound processing apparatus according to any one of claims 3 to 12, wherein

the sound detection unit detects a predetermined keyword as the predetermined sound from at least one of the sound components emphasized by the directivity processing unit.

14. The sound processing apparatus according to claim 13, wherein

the processing unit modifies the portion corresponding to the predetermined keyword in the sound data that contains the detected predetermined keyword.

15. The sound processing apparatus according to claim 13, wherein

the processing unit records sound data containing the detected predetermined keyword in a recording unit.

16. The sound processing apparatus according to any one of claims 3 to 11, wherein

the sound detection unit detects, as the predetermined sound, a predetermined abnormal sound contained in at least one of the sound components emphasized by the directivity processing unit.

17. The sound processing apparatus according to any one of claims 3 to 6, wherein

the sound processing apparatus comprises an image recognition unit that performs image recognition on the image data, and

the processing unit performs the predetermined processing according to the image recognition result of the image recognition unit.

18. The sound processing apparatus according to claim 17, wherein

the image recognition unit recognizes, in the image data, the type of the sound source of the predetermined sound.

19. The sound processing apparatus according to claim 17 or 18, wherein

the image recognition unit recognizes, in the image data, whether the sound source of the predetermined sound has moved.

20. The sound processing apparatus according to any one of claims 17 to 19, wherein,

when the image recognition unit has performed image recognition on the image data, the processing unit records a search tag in a recording unit, the search tag being used to retrieve, from the recording unit in which the sound data and the image data are recorded, sound data containing the predetermined sound or image data containing the sound source of the predetermined sound.

21. The sound processing apparatus according to claim 20, wherein

the processing unit uses a predetermined search tag included among the search tags recorded in the recording unit to acquire the sound data or image data recorded in the recording unit in association with that predetermined search tag.

22. The sound processing apparatus according to claim 20 or 21, wherein

the search tag includes at least one of: the type of the sound source, whether the sound source has moved, and a thumbnail image containing the sound source.

23. sound processing apparatus according to any one of claim 17 ~ 22, wherein,

When described predetermined sound being detected, described handling part, according to the image recognition result of described Image recognizing section, makes prompting part point out the warning message comprising the content described predetermined sound being detected.

24. The sound processing apparatus according to any one of claims 17 to 23, wherein,

When the predetermined sound is detected, the processing unit, according to the image recognition result of the image recognition unit, records sound data including the predetermined sound in a recording unit.

25. The sound processing apparatus according to any one of claims 17 to 24, wherein,

When the predetermined sound is detected, the processing unit, according to the image recognition result of the image recognition unit, changes the direction in which the directivity processing unit emphasizes sound components.

26. A sound processing system, comprising:

A sound pick-up device including a sound pick-up unit that picks up sound using a plurality of microphones;

An image capture device including an image capture unit that captures images; and

A sound processing apparatus that processes the sound data picked up by the sound pick-up unit,

The sound processing apparatus comprises:

A data acquisition unit that acquires the sound data picked up by the sound pick-up unit and the image data captured by the image capture unit;

27. The sound processing system according to claim 26, wherein,

In the sound processing apparatus,

28. The sound processing system according to claim 26 or 27, wherein,

The sound processing apparatus further comprises:

29. The sound processing system according to claim 28, wherein,

The data acquisition unit acquires the sound data from the sound pick-up device and acquires the image data from the image capture device,

The sound processing apparatus comprises a recording unit, and the recording unit records the sound data, the image data, and a search tag for retrieving sound data including the predetermined sound.

30. The sound processing system according to claim 28, wherein,

The sound processing system further comprises a recording device that records data,

The recording device comprises a recording unit, the recording unit records the sound data picked up by the sound pick-up unit in association with the image data captured by the image capture unit, and records a search tag for retrieving sound data including the predetermined sound,

The data acquisition unit acquires the sound data, the image data, and the search tag from the recording unit.

31. A sound processing method for a sound processing apparatus, the sound processing method comprising the following steps:

Acquiring sound data picked up by a sound pick-up unit including a plurality of microphones and image data captured by an image capture unit;

Designating a plurality of directions, with reference to the sound pick-up unit, that correspond to designated points on an image displayed based on the image data; and

Emphasizing the sound components of the sound data in the designated plurality of directions.
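
Illustrative note (not part of the claims): the claims do not state how the sound components in each designated direction are emphasized. As one possible reading, the sketch below uses a simple delay-and-sum beamformer, which is an assumption made here for illustration. It further assumes a linear microphone array, a far-field source, a sound speed of 343 m/s, whole-sample delays, and directions already converted from image points to angles measured from the array's broadside. All function and parameter names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the document


def steering_delays(mic_x, angle_deg, sample_rate):
    """Arrival delay of each microphone in whole samples (assumed model).

    mic_x holds microphone positions along a line, in metres. A far-field
    source at angle_deg from broadside reaches microphones with larger
    x * sin(angle) earlier, so their relative delay is smaller.
    """
    s = math.sin(math.radians(angle_deg))
    raw = [-x * s / SPEED_OF_SOUND * sample_rate for x in mic_x]
    base = min(raw)
    return [round(r - base) for r in raw]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay and average across channels."""
    n = len(channels[0])
    out = []
    for i in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            j = i + d
            if j < n:  # samples beyond the buffer are treated as silence
                acc += ch[j]
        out.append(acc / len(channels))
    return out


def emphasize_directions(channels, mic_x, angles_deg, sample_rate):
    """Return one emphasized signal per designated direction."""
    return {
        angle: delay_and_sum(channels, steering_delays(mic_x, angle, sample_rate))
        for angle in angles_deg
    }
```

Calling `emphasize_directions` with several angles yields one output signal per designated direction. Sound arriving from a steered direction adds coherently across channels, while sound from other directions is averaged down.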