JP2008278323A - Network camera

Info

Publication number: JP2008278323A
Application number: JP2007121043A
Authority: JP
Inventors: Kenji Tanaka; 健二田中; Masaaki Oyama; 将明大山; Osamu Rokkaku; 修六角
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2007-05-01
Filing date: 2007-05-01
Publication date: 2008-11-13

Abstract

PROBLEM TO BE SOLVED: To obtain a network camera which can record the image and sound from before a change occurs, by sending sound change and image change information ahead of the corresponding image and sound, and which eliminates the need for a buffer in the server when storing.

SOLUTION: An image encoding part 4 encodes an input image, and a sound encoding part 5 encodes sound. A sound change recognition processing part 7 outputs sound change information when the input sound changes by more than a threshold, in order to notify a server that stores the encoded image and sound of the storing timing. An IP converting part 6 performs IP conversion of the encoded image and sound data and of the sound change information, and transmits the sound change information to the outside ahead of the encoded image and sound data.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a network camera capable of delivering, for example, images (video) and audio.

A conventional network camera encodes and outputs input images and audio. For example, the camera of Patent Document 1 comprises: a video input unit that images a subject and generates an analog video signal; an audio input unit that receives sound and generates an analog audio signal; an A/D conversion unit that digitizes the analog video signal and the analog audio signal into digital data; an information compression unit that compresses the digital data into compressed digital data; a packet generation unit that places the compressed digital data in the payload of packets conforming to a predetermined protocol; a transmission control unit that performs various kinds of transmission control on the packetized compressed digital data by referring to transmission control information, including the congestion state of the network using the predetermined protocol, the bandwidth of the network, and whether transmission is possible; a network interface unit that transmits the packets whose transmission the transmission control unit has permitted; and a control unit that controls each of these units.

Some cameras also provide image motion detection as an additional function. For example, Patent Document 2 describes a television camera that obtains a noise-reduced video signal by performing inter-field signal processing on the captured video signal. This camera has a motion detection circuit that determines whether the imaged subject is moving or stationary, and a noise suppression processing circuit that is controlled according to the detection signal from the motion detection circuit.

Some conventional network cameras also monitor whether audio is present or silent. Patent Document 3 describes a terminal that, on receiving audio data via a network, temporarily stores it in an audio reception buffer unit, decodes the audio data output from the buffer with audio processing means, and outputs sound after D/A conversion. The terminal comprises buffer control means that controls the input and output of audio data to and from the audio reception buffer unit, and reception buffer level determination means that judges the audio data in the buffer to be no-data or silent when it stays at or below a predetermined peak value for a certain time, and judges it to be sound when it exceeds that peak value. The buffer control means discards the audio data judged to be no-data or silent, closes up the gaps between the remaining audio data, and outputs the result to the audio processing means.

A server stores and records the images and audio from the network camera. Because of storage capacity constraints, the server sometimes stores only the images and audio captured when an environmental change occurs, rather than recording continuously for 24 hours.

Patent Document 1: JP 2004-147262 A (paragraph 0010)
Patent Document 2: JP 2002-77704 A (paragraph 0005)
Patent Document 3: JP 2006-14150 A (paragraph 0012)

The conventional network camera is configured as described above. In a system that uses a motion detection or audio change signaled by the network camera as the trigger for storage, starting to record images and audio when the change notification is received cannot capture what happened before the change. The server therefore has to keep a certain amount of data in a buffer at all times.
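
For contrast with the invention, the conventional server-side approach described above can be sketched as a rolling pre-trigger buffer. This is a minimal illustration only: the class name `PreTriggerRecorder`, its methods, and the `pre_frames` parameter are hypothetical and do not appear in the source.

```python
from collections import deque


class PreTriggerRecorder:
    """Conventional recorder: keeps recent frames so pre-change data survives."""

    def __init__(self, pre_frames):
        # Ring buffer that holds only the most recent frames while idle.
        self.buffer = deque(maxlen=pre_frames)
        self.recording = False
        self.stored = []

    def on_frame(self, frame):
        if self.recording:
            self.stored.append(frame)
        else:
            # The oldest frame is dropped automatically once the buffer is full.
            self.buffer.append(frame)

    def on_change(self):
        # The notification arrives after the frames it refers to, so the
        # pre-change frames have to be recovered from the buffer.
        if not self.recording:
            self.stored.extend(self.buffer)
            self.buffer.clear()
            self.recording = True
```

The buffer has to be fed continuously even when nothing happens, which is the server-side cost the invention aims to remove.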

The present invention has been made to solve the above problem. Its object is to obtain a network camera that, by sending audio change and image change information ahead of the corresponding images and audio, allows the images and audio from before the change to be recorded and eliminates the need for a buffer in the server when storing.

The network camera according to the present invention comprises: an image encoding unit that encodes an input image; an audio encoding unit that encodes audio input together with the image; an audio change recognition processing unit that, in order to notify a server that stores the encoded images and audio of when to store them, outputs audio change information indicating a change when the input audio changes by a predetermined threshold or more; and an IP conversion unit that IP-converts the encoded image data, the encoded audio data, and the audio change information, and transmits the audio change information, which is converted before the encoded image and audio data, to the outside ahead of the encoded image and audio data.

According to the present invention, sending audio change and image change information ahead of the corresponding images and audio allows the images and audio from before the change to be recorded, and eliminates the need for a buffer in the server when storing.

An embodiment of the present invention will be described below.
Embodiment 1.
FIG. 1 is a configuration diagram of a network camera system including a network camera 1 according to Embodiment 1 of the present invention.
In FIG. 1, the network camera system is one in which, for example, the network camera 1 transmits IP packets to a server 2 via a network 3. The network camera 1 includes an image encoding unit 4, an audio encoding unit 5, an IP conversion unit 6, and an audio change recognition processing unit 7. The server 2 includes a packet determination unit 8 and a storage unit 9.

Here, the image encoding unit 4 A/D-converts and encodes the input image, and the audio encoding unit 5 A/D-converts and encodes the input audio.

The audio change recognition processing unit 7 outputs audio change information, indicating that a change has occurred, when the input audio changes by a predetermined threshold or more. The purpose is to notify the server 2, which stores the encoded images and audio, of when to store them. The unit monitors the level of the input audio. When the level exceeds a threshold level specified in advance, the unit judges that an environmental change has occurred because the audio changed, and reports the audio change to the IP conversion unit 6. Audio input to the network camera 1 is passed to the audio change recognition processing unit 7 at the same time as to the audio encoding unit 5.

The condition under which the audio change recognition processing unit 7 causes an audio change IP packet to be created can be set freely. Besides the condition above, examples include the input audio level falling below a threshold level specified in advance, the level falling within a specific range, and the level remaining within a specific range for a certain time.
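
The configurable trigger conditions listed above for unit 7 can be sketched as a small detector. This is an illustration under assumptions, not the patented implementation: the class `AudioChangeDetector`, its field names, and the choice to count the hold time in samples are all hypothetical, and the source does not say how the audio level is measured.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AudioChangeDetector:
    upper: Optional[float] = None                # trigger when level exceeds this
    lower: Optional[float] = None                # trigger when level falls below this
    band: Optional[Tuple[float, float]] = None   # (low, high) range to watch
    band_hold: int = 1                           # consecutive in-band samples required
    _run: int = 0                                # current in-band run length

    def update(self, level: float) -> bool:
        """Feed one audio level sample; return True if a change is signaled."""
        triggered = False
        if self.upper is not None and level > self.upper:
            triggered = True
        if self.lower is not None and level < self.lower:
            triggered = True
        if self.band is not None:
            low, high = self.band
            # Count how long the level has stayed inside the band.
            self._run = self._run + 1 if low <= level <= high else 0
            if self._run >= self.band_hold:
                triggered = True
        return triggered
```

Setting `band_hold=1` gives the "falls within a specific range" condition, while a larger value approximates "remains within the range for a certain time".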

The IP conversion unit 6 IP-converts the encoded image data, the encoded audio data, and the audio change information, and transmits the audio change information, which is converted before the encoded image and audio data, to the outside ahead of that data. It creates the image, audio, and audio change IP packets and delivers them to the network 3.

In the server 2, the packet determination unit 8 determines whether the IP packets delivered over the network 3 and received by the server 2 should be stored. When the packet determination unit 8 recognizes the audio change information in an audio change IP packet, storage of images and audio starts. Storage ends when an arbitrarily set time has elapsed.
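
The behavior of the packet determination unit 8 described above can be sketched as follows. The class `PacketJudge`, the packet kind strings, and the explicit `now` timestamp argument are hypothetical names chosen for the illustration. Restarting the recording window on a repeated change packet is also an assumption, since the source does not cover that case.

```python
class PacketJudge:
    """Stores media packets only inside a window opened by a change packet."""

    def __init__(self, record_seconds):
        self.record_seconds = record_seconds  # arbitrarily set recording time
        self.stop_at = None                   # None means "not recording"
        self.stored = []                      # stands in for storage unit 9

    def on_packet(self, kind, payload, now):
        if kind == "audio_change":
            # Start (or restart) the recording window; nothing is stored
            # for the notification itself.
            self.stop_at = now + self.record_seconds
            return
        if self.stop_at is not None and now < self.stop_at:
            self.stored.append((kind, payload))
        else:
            # Window closed or never opened: drop the packet.
            self.stop_at = None
```

Note that no packets are held while idle: anything that arrives outside a recording window is simply dropped.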

The storage unit 9 stores the packets that the packet determination unit 8 has determined should be stored. To save storage capacity, images and audio are not recorded continuously; storage is limited to times when an environmental change occurs around the network camera 1. The user can read out the data stored in the storage unit 9 at any time.

Because an audio change IP packet carries little information, the network camera 1 can create it in less time than it takes to encode an image. The delivered audio change IP packet can therefore reach the receiving server 2 earlier than the encoded-image IP packet for the moment the change was detected. The audio change recognition processing unit 7 can also be given processing priority.
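
One way to picture giving change notifications priority is a send queue in which change packets always leave before pending media packets. This is only a sketch of the idea: the source attributes the head start mainly to the short creation time of the change packet, and the `SendQueue` class and its priority constants are hypothetical.

```python
import heapq
import itertools


class SendQueue:
    """Outgoing packet queue in which change packets overtake media packets."""

    CHANGE = 0  # highest priority: audio or image change notifications
    MEDIA = 1   # encoded image and audio data

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order per priority

    def put(self, priority, packet):
        heapq.heappush(self._heap, (priority, next(self._seq), packet))

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

For example, if "image A" and "audio A" are queued as `MEDIA` and a change packet is then queued as `CHANGE`, `get()` returns the change packet first and the media packets afterwards in their original order.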

Audio can be encoded in less time than images. However, audio delivery is generally delayed so that images and audio can be synchronized on the receiving side, so the encoded-image IP packets and the encoded-audio IP packets can be regarded as reaching the receiving server 2 at the same time.

Next, the operation will be described.
FIG. 2 is a diagram showing an example of the difference between the time required for audio change recognition and the time required for image encoding in Embodiment 1 of the present invention.
Taking into account the image and audio encoding delay inside the network camera 1, the arrival times at the server of the image, audio, and audio change IP packets can be laid out in time series as shown, for example, in FIG. 2. Suppose that images and audio are input to the network camera 1 in the order image A (audio A), image B (audio B), image C (audio C), and so on, and that the audio change recognition processing unit 7 recognizes a change at audio B. The network camera 1 can then deliver the IP packet carrying the audio change information triggered by audio B before image A, which precedes audio B, has been encoded and delivered. The packets therefore reach the server 2 in the order: audio change IP packet triggered by audio B, then encoded image (audio) packet A. The chronological order within the network camera 1 is thus reversed.
In FIG. 2 the image encoding delay is one frame, but the amount of delay depends on the structure of the network camera 1. (A delay of about 300 ms gives a time difference of 10 frames.)
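
The relationship between encoding delay and the number of frames the change packet overtakes can be checked with a small timing model. Everything here is an assumption made for illustration: the function name, the integer-millisecond parameters, the 33 ms frame period (roughly 30 frames per second), and the 5 ms detection delay are not taken from the source.

```python
def pre_change_frames(latency_ms, detect_delay_ms, frame_period_ms, detect_time_ms):
    """Return indices of frames captured before the change was detected whose
    encoded packets leave the camera after the change packet does.

    latency_ms      -- time from capture of a frame to departure of its packet
    detect_delay_ms -- time from detection to departure of the change packet
    """
    if frame_period_ms <= 0:
        raise ValueError("frame_period_ms must be positive")
    change_sent_ms = detect_time_ms + detect_delay_ms
    frames = []
    k = 0
    while k * frame_period_ms < detect_time_ms:
        # Frame k is still inside the encoder when the change packet leaves.
        if k * frame_period_ms + latency_ms > change_sent_ms:
            frames.append(k)
        k += 1
    return frames


frames = pre_change_frames(latency_ms=300, detect_delay_ms=5,
                           frame_period_ms=33, detect_time_ms=1000)
print(len(frames), frames[0], frames[-1])
```

With these assumed numbers the model gives 9 frames (frames 22 to 30), which is in line with the figure of 10 frames quoted for a delay of about 300 ms.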

In the receiving server 2, the packet determination unit 8 classifies each received IP packet. When an audio change IP packet arrives, the server judges that an environmental change has occurred near the network camera 1 and starts storing the encoded-image IP packets and encoded-audio IP packets it receives in the storage unit 9 as image and audio data.

As shown in FIG. 2, the encoded data of image A (audio A), from before the audio change, reaches the receiving server 2 after the audio change IP packet has arrived. Because storage of images and audio starts when the audio change IP packet arrives, storage can begin from image A (audio A), which precedes image B (audio B) where the environmental change occurred. The storage unit 9 alone can thus store images and audio from before the environmental change, so no buffer needs to be provided. The number of pre-change frames that can be captured depends on the image encoding delay in the network camera 1: the longer the delay, the more image frames can be captured.

In the prior art, recording starts after the audio change is detected, so recording the images and audio from before the detection (image A and audio A in FIG. 2) requires data to be held temporarily in a buffer in the server 2 beforehand. In Embodiment 1, storage of images and audio is triggered by detection of the audio change IP packet delivered from the network camera 1, so the server 2 can store the images and audio from before the environmental change without having a buffer.

Embodiment 1 has been described on the assumption that the audio is delayed to synchronize it with the images. It is also possible to skip synchronization, encode the audio ahead of the images, and deliver it from the network camera 1.
In this case the audio encoding unit 5 no longer needs to apply a delay for synchronization with the images, so the internal structure of the network camera 1 becomes simpler.

As described above, according to Embodiment 1, the network camera 1 comprises: the image encoding unit 4 that encodes an input image; the audio encoding unit 5 that encodes audio input together with the image; the audio change recognition processing unit 7 that, in order to notify the server that stores the encoded images and audio of when to store them, outputs audio change information indicating a change when the input audio changes by a predetermined threshold or more; and the IP conversion unit 6 that IP-converts the encoded image data, the encoded audio data, and the audio change information, and transmits the audio change information, which is converted before the encoded image and audio data, to the outside ahead of that data. Sending the audio change and image change information ahead of the corresponding images and audio in this way allows the images and audio from before the change to be recorded, and eliminates the need for a buffer in the server when storing.

Embodiment 1 also has the effect of making the use of storage capacity on the server 2 side more efficient.

Embodiment 2.
FIG. 3 is a configuration diagram of a network camera system including the network camera 1 according to Embodiment 2 of the present invention.
The network camera 1 of Embodiment 2 adds an image change recognition processing unit 10 to the configuration of the network camera 1 of Embodiment 1.
The input image is sent to the image change recognition processing unit 10 at the same time as to the image encoding unit 4.

When the image change recognition processing unit 10 detects a change in the image, it recognizes that an environmental change has occurred and reports this to the IP conversion unit 6. The IP conversion unit 6 generates an image change IP packet and delivers it to the network 3. Apart from the image change recognition processing unit 10, operation is the same as in Embodiment 1: images and audio are encoded and audio changes are recognized.
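
The source does not specify how unit 10 measures image change, only that a change at or above a threshold is reported. As one possible illustration, the sketch below compares consecutive grayscale frames by mean absolute pixel difference. The function `image_changed` and this choice of metric are assumptions, not the method of the patent.

```python
def image_changed(prev, curr, threshold):
    """Return True if two frames differ by at least `threshold` on average.

    prev, curr -- equal-length sequences of grayscale pixel values
    """
    if len(prev) != len(curr) or len(prev) == 0:
        raise ValueError("frames must be non-empty and the same size")
    # Mean absolute difference between corresponding pixels.
    mad = sum(abs(a - b) for a, b in zip(prev, curr)) / len(prev)
    return mad >= threshold
```

A real camera would typically work on two-dimensional frames and might restrict detection to regions of interest. The flat pixel list here just keeps the example self-contained.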

In Embodiment 2, when the images and audio change near the network camera 1, an audio change IP packet and an image change IP packet are each delivered. The server records to the storage unit 9 only when both an audio change and an image change have occurred, which suppresses unnecessary recording and saves storage capacity.

When both an audio change and an image change occur around the network camera 1, the server 2 first starts recording images and audio on arrival of the audio change IP packet. The image change IP packet then arrives within the set time, so the data in the storage unit 9 is not discarded and storage continues.

When both an image change and an audio change occur around the network camera 1 and the image change IP packet is issued before the audio change IP packet, the stored data is not discarded and storage in the storage unit 9 continues, provided that the audio change IP packet is received within an arbitrarily set time after the image change IP packet.

When only an audio change occurs around the network camera 1, the network camera 1 sends an audio change IP packet but delivers no image change IP packet. The packet determination unit 8 of the server 2 starts recording images and audio on arrival of the audio change IP packet. If no image change IP packet arrives within the arbitrarily set time, the image and audio data recorded in the storage unit 9 up to that point is discarded and further storage stops.

When only an image change occurs around the network camera 1, an image change IP packet is issued but no audio change IP packet. The server 2 starts recording images and audio on receiving the image change IP packet. If no audio change IP packet arrives within the arbitrarily set time, the image and audio data recorded in the storage unit 9 up to that point is discarded and further storage stops.
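
The four cases above amount to a small state machine on the server: the first change packet starts provisional recording, and the other kind of change packet must arrive within the set time for the data to be kept. The sketch below is an illustration under assumptions: the class `DualTriggerJudge`, its method names, and the explicit `now` timestamps are hypothetical, and how a confirmed recording ends is left out because the source does not describe it for Embodiment 2.

```python
class DualTriggerJudge:
    """Keeps recorded data only if audio and image changes are both reported."""

    def __init__(self, confirm_seconds):
        self.confirm_seconds = confirm_seconds  # arbitrarily set waiting time
        self.stored = []        # stands in for storage unit 9
        self._first = None      # kind of the first change packet, if any
        self._deadline = None   # time by which the other kind must arrive
        self._mark = 0          # index in `stored` where this recording began
        self._confirmed = False

    def _expire(self, now):
        # No second change packet in time: discard what was recorded and stop.
        if self._first is not None and not self._confirmed and now >= self._deadline:
            del self.stored[self._mark:]
            self._first = None

    def on_change(self, kind, now):
        """kind is "audio" or "image"."""
        self._expire(now)
        if self._first is None:
            self._first = kind
            self._deadline = now + self.confirm_seconds
            self._mark = len(self.stored)
        elif kind != self._first:
            self._confirmed = True

    def on_media(self, payload, now):
        self._expire(now)
        if self._first is not None:
            self.stored.append(payload)
```

The same logic covers both arrival orders, since only the fact that the second packet is of the other kind matters.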

Because storage is performed only when an environmental change is detected in both images and audio, unnecessary data storage can be avoided.
The same holds when the image change IP packet is issued first: storage is performed only when an environmental change is detected in both images and audio, so unnecessary data storage can be avoided. Alternatively, storage in the storage unit 9 can be triggered by the arrival of the image change IP packet alone.

As described above, according to Embodiment 2, the network camera comprises the image change recognition processing unit 10, which outputs image change information indicating a change when the input image changes by a predetermined threshold or more. The IP conversion unit 6 IP-converts the image change information and transmits the image change information and the audio change information, which are converted before the encoded image and audio data, to the outside ahead of that data. Sending the audio change and image change information ahead of the corresponding images and audio in this way allows the images and audio from before the change to be recorded, and eliminates the need for a buffer in the server when storing.

FIG. 1 is a configuration diagram of a network camera system including the network camera according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing an example of the difference between the time required for audio change recognition and the time required for image encoding in Embodiment 1 of the present invention.
FIG. 3 is a configuration diagram of a network camera system including the network camera according to Embodiment 2 of the present invention.

Explanation of symbols

1 network camera, 2 server, 3 network, 4 image encoding unit, 5 audio encoding unit, 6 IP conversion unit, 7 audio change recognition processing unit, 8 packet determination unit, 9 storage unit, 10 image change recognition processing unit.

Claims

1. A network camera comprising:
an image encoding unit that encodes an input image;
an audio encoding unit that encodes audio input together with the image;
an audio change recognition processing unit that, in order to notify a server that stores the encoded images and audio of when to store them, outputs audio change information indicating a change when the input audio changes by a predetermined threshold or more; and
an IP conversion unit that IP-converts the encoded image data, the encoded audio data, and the audio change information, and transmits the audio change information, which is converted before the encoded image and audio data, to the outside ahead of the encoded image and audio data.

2. The network camera according to claim 1, further comprising an image change recognition processing unit that outputs image change information indicating a change when the input image changes by a predetermined threshold or more,
wherein the IP conversion unit IP-converts the image change information and transmits the image change information and the audio change information, which are converted before the encoded image and audio data, to the outside ahead of the encoded image and audio data.