JP2007295484A

JP2007295484A - Imaging apparatus

Info

Publication number: JP2007295484A
Application number: JP2006123597A
Authority: JP
Inventors: Takahiro Funahashi; 孝博舟橋
Original assignee: Hitachi Kokusai Electric Inc
Current assignee: Hitachi Kokusai Electric Inc
Priority date: 2006-04-27
Filing date: 2006-04-27
Publication date: 2007-11-08

Abstract

PROBLEM TO BE SOLVED: To provide an imaging apparatus that achieves efficient data transmission using audio recognition and enables automatic surveillance and observation, by solving two problems of conventional video transmission: transmission triggered by an image-processing alarm, such as moving-object detection, reacts to changes in sunlight and is therefore inefficient; and audio cannot be recognized and used as an alarm condition for video transmission.

SOLUTION: An audio pattern waveform is recorded in advance, and several decision conditions are combined based on the recorded waveform. This makes it possible to transmit and record video when an abnormal sound occurs, for example in building surveillance, and to react to the sounds of animal behavior during outdoor observation.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an imaging apparatus that has a function of collecting sound and a function of converting video from an imaging device, such as a camera using an image sensor, into digital data and transmitting it to a network such as a LAN (Local Area Network) or WAN (Wide Area Network).

An example of the conventional approach is described with reference to FIG. 2, which is a block diagram of the configuration of a conventional imaging apparatus. The reference numerals denote the following: 100, the imaging apparatus; 101, an imaging unit such as a TV camera; 102, an analog-to-digital conversion unit (A/D) that converts the analog video signal into a digital signal; 103, a video data recording unit that stores the digitized video data; 104, a sound collecting device (MIC) such as a microphone; 105, an amplification unit (AMP) that electrically amplifies the audio signal input from the sound collecting device 104; 106, an analog-to-digital conversion unit (A/D) that converts the analog audio signal into a digital signal; 107, an audio data recording unit that stores the digitally sampled audio data; 110, a protocol conversion unit that converts the video data to conform to the network standard; 111, a protocol conversion unit that converts the audio data to conform to the network standard; 112, a transmission unit that transmits to the network; and 120, a CPU (Central Processing Unit) that controls the components of the imaging apparatus 100 (for example, the imaging unit 101, the analog-to-digital conversion unit 102, the video data recording unit 103, the sound collecting device 104, the amplification unit 105, the analog-to-digital conversion unit 106, the audio data recording unit 107, the protocol conversion units 110 and 111, and the transmission unit 112).
Further, 150 denotes a network and 170 a terminal. There are one or more terminals 170.

In FIG. 2, the imaging unit 101 captures the scene within its field of view, converts the acquired subject image into an analog video signal, and outputs it to the analog-to-digital conversion unit 102. The analog-to-digital conversion unit 102 digitizes the input analog video signal and outputs it as video data to the video data recording unit 103. The video data recording unit 103 temporarily stores the input video data, and may store multiple frames on a frame-by-frame basis.

The sound collecting device 104 picks up ambient sound and outputs it as an analog audio signal to the amplification unit 105. The amplification unit 105 amplifies the input analog audio signal to an audible level and outputs it to the analog-to-digital conversion unit 106. The analog-to-digital conversion unit 106 digitally samples the input analog audio signal and outputs it as digital audio data to the audio data recording unit 107.
The audio data recording unit 107 temporarily stores the input digitized audio data.

The protocol conversion unit 110 reads video data from the video data recording unit 103 as needed, converts it according to the protocol of the network 150 over which it is to be transmitted, and outputs it to the transmission unit 112. Similarly, the protocol conversion unit 111 reads audio data from the audio data recording unit 107 as needed, converts it according to the protocol of the network, and outputs it to the transmission unit 112.
The protocol conversion units 110 and 111 convert the video data and the audio data, respectively, as needed so that video and audio can be synchronized after transmission.

The transmission unit 112 sends the video data and audio data converted by the protocol conversion units 110 and 111 to the network 150.
The terminal 170 receives the video data and audio data from the imaging apparatus 100 via the network 150. The received data is presented to the terminal operator by well-known means.

The CPU 120 controls the components of the imaging apparatus 100 (for example, the imaging unit 101, the analog-to-digital conversion unit 102, the video data recording unit 103, the sound collecting device 104, the amplification unit 105, the analog-to-digital conversion unit 106, the audio data recording unit 107, the protocol conversion units 110 and 111, and the transmission unit 112) in accordance with an execution program in the imaging apparatus 100.

As described above, the imaging apparatus 100 of FIG. 2 acquires video and audio in the monitored area, converts them into a video signal and an audio signal, digitizes these into video data and audio data, and outputs them to the network (see, for example, Patent Document 1 or 2). A terminal connected to the network can decode the video and audio signals in synchronization via the network. However, the apparatus cannot detect and collate audio in order to transmit only the necessary data.
That is, conventional video transmission triggered by an image-processing alarm, such as moving-object detection, reacts to changes in sunlight and the like, so efficient video transmission was not possible. Nor was it possible to recognize audio and use it as an alarm condition for video transmission.

Patent Document 1: JP 2000-268265 A
Patent Document 2: JP 2004-289246 A

As described above, in the prior art the video and audio signals can be decoded in synchronization via the network, but audio cannot be detected and collated so that only the necessary data is transmitted.
An object of the present invention is to solve the above problems and to provide an imaging apparatus that achieves efficient data transmission using audio recognition and enables automatic monitoring and observation.

To achieve the above object, the imaging apparatus of the present invention collates (or compares) input audio data with audio data registered in advance (reference audio data). If the audio pattern or sound pressure level of the input audio data approximates the reference audio data to the extent that a predetermined condition is satisfied, the apparatus treats this as a match and outputs a trigger signal, identifies the video data and audio data corresponding to that trigger signal, and transmits the identified video and audio data.
That is, an audio pattern waveform is recorded in advance, and several decision conditions are combined based on the recorded waveform. This makes it possible, for example, to transmit and record video when an abnormal sound occurs in building surveillance, and to react to the sounds of animal activity in outdoor observation.

To achieve the above object, the imaging apparatus of the present invention comprises: an imaging unit that captures images within a field of view; a video digital conversion unit that processes the video signal from the imaging unit; a video data recording unit that records the video data from the video digital conversion unit; an audio input unit that picks up sound within the field of view; an audio digital conversion unit that processes the audio signal from the audio input unit; an audio data recording unit that records the audio data from the audio digital conversion unit; a reference audio data recording unit that records reference audio data; and a comparison unit that compares the reference audio data with the audio data in the audio data recording unit based on a predetermined condition and determines whether they approximate each other. When they are determined to approximate each other, the apparatus transmits the audio data so determined, together with the video data corresponding to that audio data, to the network.

According to the imaging apparatus of the present invention, only the video and audio data identified by audio recognition are transmitted, so video can be obtained only when needed in applications such as monitoring and observation.
Conventional triggers based on image processing, known as motion detectors or moving-object detection, react to brightness and pick up every kind of change, such as changes in sunlight and automobile headlights, so video often cannot be collected efficiently. In contrast, as another effect of the present invention, the necessary video data can be identified by treating automobile engine sound as an exclusion condition and reacting to sounds such as breaking glass.
As a further effect, the apparatus can be used efficiently for observation by excluding the sound of vegetation swaying in wind, rain, and other weather and, more preferably, by reacting to sounds such as animals moving.

An embodiment of the present invention is described with reference to FIG. 1, which is a block diagram of the configuration of an embodiment of the imaging apparatus of the present invention. Components with the same reference numerals as in FIG. 2 have the same functions. In addition, 200 denotes the imaging apparatus; 108, a reference audio data recording unit that stores the reference audio data; 109, a comparator (CMP) that collates the audio data output from the audio data recording unit 107 with the reference audio data output from the reference audio data recording unit 108 and generates a trigger for transmission when the matching rate is within a predetermined tolerance; and 120′, a CPU.

In FIG. 1, the imaging unit 101 captures the scene within its field of view, converts the acquired subject image into an analog video signal, and outputs it to the analog-to-digital conversion unit 102. The analog-to-digital conversion unit 102 digitizes the input analog video signal and outputs it as video data to the video data recording unit 103. The video data recording unit 103 temporarily stores the input video data, and may store multiple frames on a frame-by-frame basis.

The sound collecting device 104 picks up ambient sound and outputs it as an analog audio signal to the amplification unit 105. The amplification unit 105 amplifies the input analog audio signal to an audible level and outputs it to the analog-to-digital conversion unit 106. The analog-to-digital conversion unit 106 digitally samples the input analog audio signal and outputs it as digital audio data to the audio data recording unit 107. If the audio signal from the sound collecting device 104 is already a sufficient electrical signal for sampling in the analog-to-digital conversion unit 106, the amplification unit 105 may be omitted.
The audio data recording unit 107 temporarily stores the input digitized audio data.

The reference audio data recording unit 108 records in advance, as reference audio data, a sound chosen beforehand to serve as a trigger. That is, the predetermined trigger sound is converted into digital audio data by digitally sampling it at a reference sampling frequency, and the converted audio data is recorded. The reference audio data recording unit 108 may register audio data for multiple waveform patterns.
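
Registering reference audio data can be sketched as follows. This is only an illustration under stated assumptions: the `ReferenceStore` class, its `register` method, the 8 kHz rate, the 16-bit quantization, and the synthetic test signals are all hypothetical and do not come from the specification, which states only that a trigger sound is sampled at a reference sampling frequency and that multiple waveform patterns may be registered.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ReferenceStore:
    """Hypothetical stand-in for the reference audio data recording unit 108."""
    sample_rate_hz: int                            # reference sampling frequency (assumed)
    patterns: dict = field(default_factory=dict)   # name -> list of 16-bit PCM samples

    def register(self, name, analog_signal, duration_s):
        """Sample analog_signal(t) at the reference rate and store it under name."""
        n = round(self.sample_rate_hz * duration_s)
        samples = []
        for i in range(n):
            value = analog_signal(i / self.sample_rate_hz)  # expected in [-1.0, 1.0]
            value = max(-1.0, min(1.0, value))              # clip before quantizing
            samples.append(int(round(value * 32767)))       # 16-bit quantization (assumed)
        self.patterns[name] = samples
        return samples


# Register two waveform patterns; real reference sounds would come from recordings.
store = ReferenceStore(sample_rate_hz=8000)
store.register("tone_1khz", lambda t: math.sin(2 * math.pi * 1000 * t), 0.01)
store.register("silence", lambda t: 0.0, 0.01)
```

Keeping the sampling rate on the store reflects the requirement that reference and live audio be sampled on a common basis before they are compared.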

The comparator 109 compares the waveform pattern of the audio data output from the audio data recording unit 107 with that of the reference audio data output from the reference audio data recording unit 108, and generates a trigger signal based on predetermined decision criteria.
The decision criteria are, for example, the matching rate of the amplitude pattern, the matching rate of the frequency pattern, and the range of amplitude magnitude. Alternatively, the trigger signal may be generated on the exclusion condition that the audio data does not match the waveform pattern of the reference audio data.
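
The decision criteria listed for the comparator 109 can be illustrated with a small sketch. The specification does not define how a matching rate is computed, so everything here is an assumption: the frame-averaged envelope as the "amplitude pattern", a naive DFT magnitude as the "frequency pattern", cosine similarity as the matching rate, and the threshold values. The function names are hypothetical.

```python
import cmath
import math


def envelope(samples, frame=8):
    """Mean absolute amplitude per frame: a crude amplitude pattern."""
    return [sum(abs(s) for s in samples[i:i + frame]) / frame
            for i in range(0, len(samples) - frame + 1, frame)]


def spectrum(samples):
    """Magnitudes of a naive O(n^2) DFT, lower half only: a crude frequency pattern."""
    n = len(samples)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples)))
            for k in range(n // 2)]


def match_rate(a, b):
    """Cosine similarity of two non-negative patterns, between 0 and 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def comparator(audio, reference, min_rate=0.9, amp_range=(0.1, 1.0), exclusive=False):
    """Return True when a trigger should be generated.

    Assumes audio and reference are equal-length windows of samples in [-1, 1].
    With exclusive=True the trigger fires when the audio does NOT match.
    """
    if not audio:
        return False
    peak = max(abs(s) for s in audio)
    in_range = amp_range[0] <= peak <= amp_range[1]          # amplitude magnitude range
    matched = (match_rate(envelope(audio), envelope(reference)) >= min_rate
               and match_rate(spectrum(audio), spectrum(reference)) >= min_rate)
    if exclusive:
        return in_range and not matched
    return in_range and matched


reference = [math.sin(2 * math.pi * 4 * i / 64) for i in range(64)]
same_sound = [0.5 * s for s in reference]                    # quieter copy of the reference
other_sound = [0.5 * math.sin(2 * math.pi * 12 * i / 64) for i in range(64)]
print(comparator(same_sound, reference), comparator(other_sound, reference))
```

Cosine similarity ignores overall loudness, which is why the amplitude range is checked as a separate criterion in this sketch.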

When the trigger signal from the comparator 109 is input to the protocol conversion unit 110, the protocol conversion units 110 and 111 read the video data and the audio data from the video data recording unit 103 and the audio data recording unit 107, respectively, convert them according to the protocol of the network over which they are to be transmitted, and output them to the transmission unit 112.
The protocol conversion units 110 and 111 convert the video data and the audio data, respectively, as needed so that video and audio can be synchronized after transmission.
The protocol conversion units 110 and 111 may be integrated into a single unit.
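
The trigger-driven readout can be sketched as follows. This is a minimal illustration, not the patented implementation: the `TimedBuffer` class, the `on_trigger` function, the packet dictionaries, the millisecond timestamps, and the pre- and post-trigger window are all assumptions introduced here. The specification states only that video and audio data are read out on a trigger and converted so that they can be synchronized after transmission.

```python
from collections import deque


class TimedBuffer:
    """Hypothetical ring buffer standing in for recording units 103 and 107."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)   # oldest entries are dropped automatically

    def store(self, timestamp, payload):
        self._items.append((timestamp, payload))

    def read(self, start, end):
        """Return buffered (timestamp, payload) pairs with start <= timestamp <= end."""
        return [(t, p) for t, p in self._items if start <= t <= end]


def on_trigger(trigger_time, video, audio, pre=100, post=100):
    """Collect video and audio around the trigger and tag each item for sending."""
    start, end = trigger_time - pre, trigger_time + post
    packets = [{"kind": "video", "ts": t, "data": p} for t, p in video.read(start, end)]
    packets += [{"kind": "audio", "ts": t, "data": p} for t, p in audio.read(start, end)]
    # Sorting by the shared timestamp lets a receiver replay both streams in sync.
    return sorted(packets, key=lambda pkt: pkt["ts"])


video = TimedBuffer(capacity=10)
audio = TimedBuffer(capacity=20)
for t in range(0, 1000, 100):      # one video frame every 100 ms (assumed rate)
    video.store(t, f"frame@{t}")
for t in range(0, 1000, 50):       # one audio chunk every 50 ms (assumed rate)
    audio.store(t, f"pcm@{t}")

packets = on_trigger(500, video, audio)
print(len(packets))
```

Carrying a common timestamp in every packet is one simple way to satisfy the synchronization requirement; a real system would rely on whatever timing mechanism its network protocol provides.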

The transmission unit 112 sends the video data and audio data converted by the protocol conversion units 110 and 111 to the network 150.
The terminal 170 receives, via the network 150, the video data and audio data from the imaging apparatus 200 that correspond to the sound identified by the reference audio pattern. The received data is presented to the terminal operator by well-known means, so that the operator can view or listen to it.

The CPU 120′ controls the components of the imaging apparatus 200 (for example, the imaging unit 101, the analog-to-digital conversion unit 102, the video data recording unit 103, the sound collecting device 104, the amplification unit 105, the analog-to-digital conversion unit 106, the audio data recording unit 107, the reference audio data recording unit 108, the comparator 109, the protocol conversion units 110 and 111, and the transmission unit 112) in accordance with an execution program in the imaging apparatus 200.

As described above, according to the embodiment of FIG. 1, specific video data and audio data can be transmitted based on audio recognition, so only the necessary video is obtained.

Judgment based on conventional image processing, known as motion detectors or moving-object detection, reacts to image brightness (luminance values) and therefore picks up every luminance change, such as changes in sunlight and automobile headlights, so video often could not be collected efficiently. In the present embodiment, by contrast, automobile engine sound can be treated as an exclusion condition while the apparatus reacts to sounds such as breaking glass. For example, when the automobile engine sound is below a predetermined level and an audio waveform corresponding to breaking glass is judged to match the reference audio data (audio data recorded in advance in the reference audio data recording unit 108 as the sound of breaking glass), the apparatus recognizes that an automobile has collided with something, its engine has stopped, and glass such as the windshield has broken. It judges this to be a collision accident and can output the video data from that moment together with the audio data.
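
The collision example combines two conditions, which can be written as a short rule. This is a sketch only: the function name, the level and matching-rate scales, and both threshold values are hypothetical, and the inputs are assumed to have been computed elsewhere.

```python
def collision_trigger(engine_level, glass_match_rate,
                      engine_max_level=0.2, glass_min_rate=0.8):
    """Return True when the engine is quiet AND the breaking-glass pattern matches.

    engine_level: estimated engine-sound level, arbitrary units (assumed scale).
    glass_match_rate: matching rate against the breaking-glass reference, 0 to 1.
    """
    engine_quiet = engine_level < engine_max_level      # engine sound below set level
    glass_broken = glass_match_rate >= glass_min_rate   # waveform matches the reference
    return engine_quiet and glass_broken


print(collision_trigger(0.05, 0.93))   # quiet engine, strong glass match
print(collision_trigger(0.60, 0.93))   # engine still running
```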

As another embodiment, two audio patterns are recorded as reference audio data: the sound produced solely by vegetation swaying in wind and rain, and the sound produced when an animal moves (a sound caused by the animal's movement that differs, for example, from vegetation swaying in wind and rain). If an input sound is judged to be vegetation swaying in wind and rain, it is excluded under an exclusion condition; if it is judged to be the sound of an animal moving, a trigger signal is generated. This allows efficient use for observation as well.
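
With several registered patterns, the decision can be framed as picking the best-matching label and suppressing the trigger when that label is an excluded one. The sketch below assumes matching rates between 0 and 1 have already been computed per pattern; the `classify` function, the label names, and the threshold are hypothetical.

```python
def classify(match_rates, excluded, min_rate=0.7):
    """Return the best-matching label, or None when no trigger should fire.

    match_rates: dict mapping a pattern label to its matching rate (0 to 1).
    excluded: set of labels treated as exclusion conditions.
    """
    if not match_rates:
        return None
    label, rate = max(match_rates.items(), key=lambda item: item[1])
    if rate < min_rate or label in excluded:
        return None           # nothing matched well enough, or the sound is excluded
    return label


print(classify({"wind": 0.90, "animal": 0.40}, excluded={"wind"}))   # wind: no trigger
print(classify({"wind": 0.30, "animal": 0.85}, excluded={"wind"}))   # animal: trigger
```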

FIG. 1 is a block diagram for explaining the configuration of an embodiment of the present invention.
FIG. 2 is a block diagram for explaining the conventional configuration.

Explanation of symbols

100: imaging apparatus; 101: imaging unit; 102: analog-to-digital conversion unit; 103: video data recording unit; 104: sound collecting device; 105: amplification unit; 106: analog-to-digital conversion unit; 107: audio data recording unit; 108: reference audio data recording unit; 109: comparator; 110, 111: protocol conversion units; 112: transmission unit; 120, 120′: CPU; 150: network; 170: terminal; 200: imaging apparatus.

Claims

An imaging apparatus comprising: an imaging unit that captures images within a field of view; a video digital conversion unit that processes a video signal from the imaging unit; a video data recording unit that records video data from the video digital conversion unit; an audio input unit that picks up sound within the field of view; an audio digital conversion unit that processes an audio signal from the audio input unit; an audio data recording unit that records audio data from the audio digital conversion unit; a reference audio data recording unit that records reference audio data; and a comparison unit that compares the reference audio data with the audio data in the audio data recording unit based on a predetermined condition and determines whether they approximate each other, wherein, when they are determined to approximate each other, the imaging apparatus transmits the audio data so determined and the video data corresponding to that audio data to a network.