JP2023069116A

JP2023069116A - Information processing apparatus, imaging apparatus, control method thereof, image processing system and program

Info

Publication number: JP2023069116A
Application number: JP2021180751A
Authority: JP
Inventors: 孝一水谷; Koichi Mizutani
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2021-11-05
Filing date: 2021-11-05
Publication date: 2023-05-18

Abstract

To provide an information processing apparatus, a control method thereof, an imaging apparatus, an image processing system having the imaging apparatus and the information processing apparatus, and a program with which a user can easily understand the characteristics of detected vibration. SOLUTION: In an image processing system in which a network camera can be connected to a client device through a network, the client device, which is an information processing apparatus, acquires from an imaging apparatus a captured image of a subject, area information indicating the area of a portion of the subject in which vibration occurs, and vibration sound information representing a sound corresponding to the vibration of that portion. The client device superimposes the area indicated by the area information on the captured image, displays the result on a display unit, and, when a user's selection operation on the displayed area is accepted, outputs the vibration sound from a sound emitting unit on the basis of the vibration sound information. SELECTED DRAWING: Figure 7

Description

The present invention relates to vibration detection by an imaging sensor.

In the field of factory automation (FA) and the like, vibration sensors are used to detect vibration so that vibration of devices and parts in work processes can be analyzed. Patent Document 1 discloses, as such a vibration sensor, an event-driven image sensor (an event-based image sensor, hereinafter an event-based sensor) in an imaging apparatus.

Patent Document 1: JP 2019-134271 A

The event-based sensor disclosed in Patent Document 1 can detect vibration, but when an analyst (user) analyzes the detected vibration, it has been difficult to determine what characteristics the vibration has.

The present invention has been made in view of the above problem, and an object thereof is to provide a technique that enables a user to easily understand the characteristics of detected vibration.

As one means for achieving the above object, an information processing apparatus of the present invention has the following configuration. That is, the information processing apparatus includes: acquisition means for acquiring, from an imaging apparatus, a captured image of a subject, area information indicating the area of a portion of the subject in which vibration has occurred, and vibration sound information representing a sound corresponding to the vibration of the portion; display control means for superimposing the area indicated by the area information on the captured image and displaying it on a display unit; reception means for receiving a user operation on the display unit; and sound control means for causing a sound emitting unit to output the vibration sound, based on the vibration sound information, when the reception means receives a selection operation on the area.

According to the present invention, a technique is provided that enables a user to easily understand the characteristics of detected vibration.

FIG. 1 shows a schematic diagram of an image processing system.
FIG. 2 is a block diagram showing an example of the hardware configuration of a network camera.
FIG. 3 is a block diagram showing an example of the hardware configuration of a client device.
FIG. 4 shows a conceptual diagram of the functional configuration of a sound generation unit.
FIG. 5 is a diagram for explaining the process of generating vibration sound ((a) is white noise, (b) is the frequency characteristic of the detected vibration, and (c) is the frequency characteristic of the vibration sound).
FIG. 6 is a flowchart of exemplary processing performed by the network camera.
FIG. 7 is a flowchart of exemplary processing performed by a client device according to the first embodiment.
FIG. 8 shows (a) an example of a captured image and (b) an example of a screen displayed on the client device according to the first embodiment.
FIG. 9 is a flowchart of exemplary processing performed by a client device according to the second embodiment.
FIG. 10 shows an example of a screen displayed on the client device according to the second embodiment.

Embodiments for carrying out the present invention will be described in detail below with reference to the accompanying drawings. The embodiments described below are examples of means for realizing the present invention and should be modified or changed as appropriate according to the configuration of the apparatus to which the present invention is applied and various conditions; the present invention is not limited to the following embodiments. Furthermore, not all combinations of the features described in the embodiments are essential to the solution of the present invention.

[First Embodiment]
(System configuration)
FIG. 1(a) shows a schematic diagram of the configuration of an image processing system according to the present embodiment. The image processing system includes a network camera 1 and a client device 2, which are configured to be connectable to each other via a network 3.

The network camera 1 includes an imaging unit (imaging unit 203), as described later, and is an imaging apparatus capable of generating a captured image from a signal obtained by imaging an arbitrary subject with the imaging unit. The client device 2 is any information processing apparatus, such as a personal computer (PC), a mobile phone, a smartphone, a PDA, or a tablet terminal.

(Configuration of network camera 1)
A configuration example of the network camera 1 will be described with reference to FIG. 2. First, the hardware configuration of the network camera 1 will be described. FIG. 2(a) is a block diagram showing an example of the hardware configuration of the network camera 1. As an example of its hardware configuration, the network camera 1 has a storage unit 201, a control unit 202, an imaging unit 203, a sound collection unit 204, an input unit 205, a display unit 206, a sound emitting unit 207, and a communication unit 208.

The storage unit 201 is composed of memory such as ROM (Read Only Memory) and RAM (Random Access Memory), and stores programs for performing the various operations described later as well as various information such as communication parameters. In addition to memory such as ROM and RAM, a storage medium such as a flexible disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, magnetic tape, non-volatile memory card, or DVD may be used as the storage unit 201. The storage unit 201 may also include a plurality of memories or the like.

The control unit 202 is composed of, for example, one or more CPUs (Central Processing Units), and controls the network camera 1 by executing the programs stored in the storage unit 201.

The imaging unit 203 is composed of a lens and an image sensor, and performs photoelectric conversion to convert a subject image into an electrical signal. The electrical signal obtained by imaging and photoelectric conversion in the imaging unit 203 is transmitted to the image processing unit 211 (FIG. 2(b)).
In the present embodiment, the image sensor of the imaging unit 203 is an event-based sensor configured to output luminance change information of a pixel (amount of change, direction of change, and so on), the address (X-Y coordinates) of the pixel, and time information of the change (timing or time of the change, and so on). This allows the imaging unit 203 to output the movement of the subject as a change in luminance. A typical event-based sensor has a detection rate of about 1 G (1 × 10^9) events/sec, in which case sensing up to a maximum of 50 kHz is possible in a region of 100 × 100 pixels. The luminance change information of the pixel, the address (X-Y coordinates) of the pixel, and the time information are transmitted to the vibration detection unit 213 (FIG. 2(b)).
Note that the imaging unit 203 may have pixels formed of, for example, avalanche photodiodes (APDs). Specifically, the imaging unit 203 counts the number of photons incident on each APD (pixel) and, using a clock supplied from outside the sensor (or a clock generated inside the sensor from that clock), measures the time taken for the photon count to reach a fixed number. The imaging unit 203 detects a change in the luminance of the pixel according to the result of comparing a first time, which is the time required for the measured photon count to exceed a threshold for the N-th time, with a second time, which is the time required for the photon count to exceed the threshold for the (N+1)-th time. Alternatively, the imaging unit 203 may count photons at a fixed period and detect a change in luminance according to the change between the N-th photon count and the (N+1)-th photon count. Because APD pixels have low readout noise, using them in the imaging unit 203 allows vibration to be detected accurately.
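The APD-based comparison described above can be illustrated with a minimal Python sketch. It is not taken from the source: the function names, the list of photon arrival times, the assumption that counting starts at t = 0, and the relative tolerance `rel_tol` are all hypothetical choices made for illustration.

```python
from typing import List, Optional


def times_to_threshold(arrivals: List[float], threshold: int) -> List[float]:
    """Split photon arrival times into consecutive groups of `threshold`
    photons and return how long each group took to fill."""
    durations = []
    start = 0.0  # assumption: counting for the first group starts at t = 0
    for i in range(threshold - 1, len(arrivals), threshold):
        end = arrivals[i]
        durations.append(end - start)
        start = end
    return durations


def luminance_change(t_n: float, t_n1: float, rel_tol: float = 0.2) -> Optional[str]:
    """Compare the N-th and (N+1)-th time-to-threshold.

    A shorter time means more photons per unit time, i.e. a brighter pixel.
    `rel_tol` is a hypothetical dead band that suppresses small fluctuations.
    """
    if t_n1 < t_n * (1.0 - rel_tol):
        return "brighter"
    if t_n1 > t_n * (1.0 + rel_tol):
        return "darker"
    return None  # no luminance-change event


# Four photons arrive in 4 time units, then the next four in only 2.
durations = times_to_threshold([1.0, 2.0, 3.0, 4.0, 4.5, 5.0, 5.5, 6.0], 4)
print(durations, luminance_change(durations[0], durations[1]))
```

The alternative scheme in the text (counting photons over a fixed period) would compare counts instead of durations, with the inequality directions reversed.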

The sound collection unit 204 collects sound present outside the network camera 1 (external sound) and converts it into an electrical signal as an audio signal. The audio signal (electrical signal) produced by the sound collection unit 204 is transmitted to the sound processing unit 212 (FIG. 2(b)). Specific examples of external sound include environmental sound around the network camera 1 and the sound of people talking.

The input unit 205 receives, for example, various operations from the user. The display unit 206 outputs various displays. Note that the input unit 205 and the display unit 206 may both be realized by a single module, as in a touch panel.
The sound emitting unit 207 emits various audio signals as audible sound.
The communication unit 208 is an interface that controls wired or wireless communication with external devices.

Next, the functional configuration of the network camera 1 will be described. FIG. 2(b) is a block diagram showing an example of the functional configuration of the network camera 1. As an example of its functional configuration, the network camera 1 has an image processing unit 211, a sound processing unit 212, a vibration detection unit 213, an attribute information generation unit 214, a sound generation unit 215, a communication control unit 216, and an output control unit 217.

The image processing unit 211 receives the electrical signal from the imaging unit 203 (FIG. 2(a)) and generates a captured image by applying predetermined image processing to the signal. The generated captured image is transmitted to the communication control unit 216.
The sound processing unit 212 receives the audio signal (external sound) converted into an electrical signal by the sound collection unit 204 (FIG. 2(a)), applies amplification, band limiting, and analog-to-digital conversion to it, and generates a digital audio signal of the external sound (external sound information). The generated digital audio signal is transmitted to the communication control unit 216.

The vibration detection unit 213 detects periodic movement of the subject as vibration of the subject, based on the information from the imaging unit 203 (pixel luminance change information, the address (X-Y coordinates) of the pixel, and time information). For example, the vibration detection unit 213 can detect vibration when pixels with a luminance change remain within a predetermined address range and repeatedly undergo address displacement.
As described above, the event-based sensor serving as the image sensor of the imaging unit 203 generally has a detection rate of about 1 G (1 × 10^9) events/sec, in which case vibrations of up to 50 kHz can be detected (sensed) in a region of 100 × 100 pixels.
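The source does not spell out how the 50 kHz figure follows from the event rate. One plausible reading, stated here as an assumption, is that the sensor-wide event budget is shared evenly across the pixels of the region and that the sampling theorem then limits the detectable frequency to half of the per-pixel event rate:

```latex
\[
\frac{1\times10^{9}\ \text{events/s}}{100\times100\ \text{pixels}}
= 1\times10^{5}\ \text{events/s per pixel},
\qquad
f_{\max} \approx \frac{1\times10^{5}}{2}\ \text{Hz} = 50\ \text{kHz}.
\]
```

Under the same assumption, a smaller region would leave more events per pixel and raise the detectable frequency, while a larger region would lower it.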

For a subject (object) in which vibration has been detected by the vibration detection unit 213, the attribute information generation unit 214 generates (derives) attribute information indicating the characteristics of the vibration by performing arithmetic processing on the information from the imaging unit 203 (luminance change information, the address (X-Y coordinates) of the pixel, and time information). The attribute information includes, for example, any of the fundamental frequency, intensity, degree of modulation, and degree of intermittence of the vibration. The address of the pixel at which vibration was detected (hereinafter referred to as the detection address) is also appended to, or included in, the attribute information as area information indicating the position (area) of the portion where vibration was detected. The generated attribute information is stored in the storage unit 201.
The fundamental frequency is the frequency of the lowest frequency component making up the vibration; the intensity is the intensity of each frequency component making up the vibration; the degree of modulation is the ratio between the multiple frequency components making up the vibration; and the degree of intermittence indicates how intermittent the vibration is.

This calculation uses, for example, an FFT (Fast Fourier Transform), which yields the frequency distribution of the vibration intensity. The fundamental frequency is then derived from the low-order frequency components, the intensity is derived from the average signal strength of the frequency components, and the degree of modulation is derived from the ratio between the fundamental frequency component and the other frequency components. The degree of intermittence is calculated from the degree of data discontinuity between frames, where a frame is the unit period over which the FFT calculation is performed. Note that the FFT calculation, and audio processing that derives the degree of modulation and the degree of intermittence from it, are known techniques. The generated vibration attribute information is transmitted to the sound generation unit 215 and the communication control unit 216.
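As a rough illustration of the derivation described above, the following Python sketch computes a fundamental frequency, an intensity, and a degree of modulation from a sampled vibration signal. It is only one possible interpretation: the function names, the `rel_floor` threshold used to decide which components count as significant, and the exact ratio used for the degree of modulation are assumptions, and a naive DFT stands in for the FFT so that the example needs only the standard library. The degree of intermittence is omitted.

```python
import cmath
import math
from typing import Dict, List


def dft_magnitudes(x: List[float]) -> List[float]:
    """Naive DFT magnitude spectrum for bins 0..N/2 (an FFT yields the same values)."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(s) / n)
    return mags


def vibration_attributes(x: List[float], fs: float, rel_floor: float = 0.1) -> Dict[str, float]:
    """Derive hypothetical attribute values from samples `x` taken at rate `fs` (Hz)."""
    mags = dft_magnitudes(x)
    n = len(x)
    peak = max(mags[1:])  # ignore the DC bin
    if peak == 0.0:
        raise ValueError("signal contains no vibration component")
    # Significant components: bins at or above a fraction of the strongest one.
    comps = [k for k in range(1, len(mags)) if mags[k] >= rel_floor * peak]
    k0 = comps[0]  # lowest significant component is taken as the fundamental
    others = sum(mags[k] for k in comps[1:])
    return {
        "fundamental_hz": k0 * fs / n,
        "intensity": sum(mags[k] for k in comps) / len(comps),  # average strength
        "modulation": others / mags[k0],  # other components relative to the fundamental
    }


fs = 1000.0
signal = [math.sin(2 * math.pi * 50 * t / fs) + 0.5 * math.sin(2 * math.pi * 150 * t / fs)
          for t in range(100)]
print(vibration_attributes(signal, fs))
```

For this two-tone test signal the lowest significant component lies at 50 Hz, and the 150 Hz component at half the amplitude gives a modulation ratio of about 0.5.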

The sound generation unit 215 generates, from the attribute information generated by the attribute information generation unit 214, a vibration sound, that is, a sound representing the vibration. Information on the generated vibration sound is transmitted to the communication control unit 216. The generated vibration sound will now be described with reference to FIGS. 4 and 5. FIG. 4 shows a conceptual diagram of the configuration of the sound generation unit 215, and FIG. 5 is a diagram for explaining the process of generating the vibration sound.

As shown in FIG. 4, the sound generation unit 215 is composed of, as an example, a white noise generation unit 41, a digital filter 42, and a setting unit 43. First, the white noise generation unit 41 generates white noise, which is a signal containing all frequency components of the vibration that the subject can produce (FIG. 5(a)). Next, the setting unit 43 derives, based on the attribute information, a frequency characteristic corresponding to that of the detected vibration and sets it in the digital filter 42. The setting unit 43 can derive the frequency characteristic of the vibration from the fundamental frequency, the intensity (corresponding to level), and the degree of modulation derived by the attribute information generation unit 214, and set it in the digital filter 42. Note that the derivation method is not limited to this.

FIG. 5(b) shows an example of the frequency characteristic derived and set by the setting unit 43. The digital filter 42 generates a vibration sound reflecting the attribute information by filtering (synthesizing) the white noise generated by the white noise generation unit 41 with a filter having the frequency characteristic shown in FIG. 5(b). FIG. 5(c) shows the frequency characteristic of the generated (synthesized) vibration sound. In this way, filtering by the digital filter 42 produces a vibration sound having the same frequency components as the detected vibration. The vibration sound generated by the sound generation unit 215 is transmitted to the communication control unit 216 with the detection address appended.
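The white-noise-plus-filter scheme of FIGS. 4 and 5 can be sketched as follows. This is a minimal illustration, not the filter design used in the source: the filtering is done in the frequency domain with a naive DFT, and the function names, the `gain` callback, the fixed random seed, and the two-band example characteristic are all assumptions.

```python
import cmath
import math
import random
from typing import Callable, List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def vibration_sound(n: int, fs: float, gain: Callable[[float], float], seed: int = 0) -> List[float]:
    """Shape white noise with a frequency characteristic given by `gain(f)`."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]  # white noise, as in FIG. 5(a)
    shaped = []
    for k, value in enumerate(dft(noise)):
        f = min(k, n - k) * fs / n       # bin frequency, mirrored above Nyquist
        shaped.append(value * gain(f))   # apply the characteristic, as in FIG. 5(b)
    # The gain is symmetric, so the result is real up to rounding error.
    return [v.real for v in idft(shaped)]  # vibration sound, as in FIG. 5(c)


def two_tone_gain(f: float) -> float:
    """Hypothetical characteristic: narrow pass bands around 50 Hz and 150 Hz."""
    if abs(f - 50.0) <= 5.0:
        return 1.0
    if abs(f - 150.0) <= 5.0:
        return 0.5
    return 0.0


samples = vibration_sound(200, 1000.0, two_tone_gain)
print(len(samples))
```

In practice the setting unit 43 would build the gain function from the fundamental frequency, intensity, and degree of modulation in the attribute information, and a time-domain FIR or IIR filter could replace the frequency-domain multiplication.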

The communication control unit 216 converts the captured image, audio signal, attribute information, vibration sound information, and so on generated as described above into a network signal (for example, an Ethernet signal) for transmission via the network 3, and transmits (distributes) the network signal to the client device 2 via the network 3.
The communication control unit 216 according to the present embodiment also has a server function. When started by the control unit 202, it connects to the client device 2 via the network 3 and, after connection, transmits to the client device 2 initial screen data for distributing captured images.
The output control unit 217 performs display control for the display unit 206 and sound emission control for the sound emitting unit 207.

(Configuration of client device 2)
Next, a configuration example of the client device 2 will be described with reference to FIG. 3. First, the hardware configuration of the client device 2 will be described. FIG. 3(a) is a block diagram showing an example of the hardware configuration of the client device 2. As an example of its hardware configuration, the client device 2 has a storage unit 301, a control unit 302, an input unit 303, a display unit 304, a sound emitting unit 305, and a communication unit 306. These have the same configurations as the storage unit 201, the control unit 202, the input unit 205, the display unit 206, the sound emitting unit 207, and the communication unit 208 of the network camera 1 in FIG. 2(a), respectively, so their description is omitted.

Next, the functional configuration of the client device 2 will be described. FIG. 3(b) is a block diagram showing an example of the functional configuration of the client device 2. As an example of its functional configuration, the client device 2 has a communication control unit 311, an information restoration unit 312, a display information generation unit 313, and an output control unit 314.

The communication control unit 311 receives the network signal transmitted by the network camera 1 via the network 3. For example, the communication control unit 311 receives signals (information) for the captured image, the attribute information, the vibration sound, and the audio signal (external sound).
The information restoration unit 312 performs restoration processing on the signal received by the communication control unit 311. For example, the information restoration unit 312 restores, from the received signal, the captured image (image signal), audio signal (external sound), attribute information, and vibration sound generated by the network camera 1.

The display information generation unit 313 generates information to be displayed on the display unit 304 (display information) from the received vibration attribute information. For example, the display information generation unit 313 generates text information from the vibration attribute information. The text information expresses, in text form, what kind of sound the vibration corresponds to (for example, "continuous sound", "inaudible", or "modulated sound") and the fundamental frequency of the vibration (multiple frequencies in the case of a modulated sound). The display information generation unit 313 also generates, from the received detection address (appended to the attribute information), a figure indicating the area of the vibrating portion as a vibration area.
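A minimal sketch of how such display information might be produced is shown below. The label strings come from the examples in the text, but the thresholds for "modulated" and "continuous", the 20 Hz to 20 kHz audible range, the function names, and the use of an axis-aligned bounding rectangle as the vibration area figure are all assumptions made for illustration.

```python
from typing import List, Tuple


def describe_vibration(fundamental_hz: float, modulation: float, intermittence: float) -> str:
    """Build a short text label from hypothetical attribute values."""
    labels = []
    if intermittence < 0.5:                      # hypothetical threshold
        labels.append("continuous sound")
    if modulation > 0.3:                         # hypothetical threshold
        labels.append("modulated sound")
    if not 20.0 <= fundamental_hz <= 20000.0:    # assumed audible range
        labels.append("inaudible")
    summary = ", ".join(labels) or "vibration"
    return f"{summary} ({fundamental_hz:.0f} Hz)"


def vibration_area(addresses: List[Tuple[int, int]]) -> Tuple[int, int, int, int]:
    """Bounding rectangle (x_min, y_min, x_max, y_max) of the detection addresses."""
    xs = [x for x, _ in addresses]
    ys = [y for _, y in addresses]
    return min(xs), min(ys), max(xs), max(ys)


print(describe_vibration(120.0, 0.1, 0.0))
print(vibration_area([(10, 20), (12, 25), (8, 22)]))
```

The rectangle returned by `vibration_area` is what would be drawn over the captured image, and the text label would be placed next to it.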

The output control unit 314 performs display control for the display unit 304 of the client device 2. For example, the output control unit 314 superimposes the various information generated by the display information generation unit 313 on the captured image received by the communication control unit 311 and restored by the information restoration unit 312 (that is, so that the information is displayed on top of the captured image), and displays the result on the display unit 304. The output control unit 314 also controls sound emission from the sound emitting unit 305.

(Processing flow)
Next, the flow of processing by the network camera 1 and the client device 2 will be described. First, the processing of the network camera 1 will be described. FIG. 6 is a flowchart of exemplary processing performed by the network camera 1 according to the present embodiment. The flowchart shown in FIG. 6 can be realized by the control unit 202 of the network camera 1 executing a control program stored in the storage unit 201 to compute and process information and to control each piece of hardware.

When the network camera 1 is installed at a desired location and powered on (S61), the control unit 202 starts the server function of the communication control unit 216 of the network camera 1. This makes the network camera 1 connectable to the client device 2 via the network 3. The communication control unit 216 then transmits the initial screen data to the client device 2 via the network 3.

After the initial screen data has been transmitted, when the communication control unit 216 receives a setting completion notification from the client device 2 (S62), the network camera 1 starts distributing captured images. That is, the imaging unit 203 images one or more subjects (objects) within the imaging angle of view of the network camera 1 and outputs the resulting electrical signal to the image processing unit 211. The image processing unit 211 then generates a captured image from the electrical signal. The communication control unit 216 transmits (distributes) the captured image generated by the image processing unit 211 to the client device 2 (S63). At this time, the communication control unit 216 may also transmit (distribute) the audio signal (external sound) generated by the sound processing unit 212 to the client device 2. FIG. 8(a) shows an example of a captured image transmitted to the client device 2.

While the captured image is being distributed to the client device 2, the vibration detection unit 213 detects vibration of the subject (object) from the information output by the imaging unit 203. The detection processing is as described above. When the vibration detection unit 213 detects vibration (Yes in S64), the attribute information generation unit 214 generates attribute information of the vibration (S65). The attribute information generation processing is as described above. The communication control unit 216 transmits the generated attribute information to the client device 2 via the network 3 (S66).

The attribute information generated in S65 is also input to the sound generation unit 215, which generates a vibration sound reflecting the attribute information (S67). The vibration sound generation processing is as described above. The communication control unit 216 transmits the generated vibration sound to the client device 2 via the network 3 (S68).
Thereafter, the control unit 202 determines whether to end distribution of the captured image. If distribution is not to end (No in S69), the processing returns to S64; if it is to end (Yes in S69), the processing of FIG. 6 ends. Whether to end distribution of the captured image can be determined, for example, by whether the communication control unit 216 has received a distribution end request from the client device 2. In this case, when the communication control unit 216 receives the request, the control unit 202 can determine that distribution of the captured image is to end.
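The loop formed by steps S64 to S69 can be summarized in a short Python sketch. It is a simplified illustration under stated assumptions: the callbacks `send`, `make_sound`, and `stop_requested` are hypothetical stand-ins for the communication control unit 216, the sound generation unit 215, and the end-of-distribution check, and each item of `detections` is assumed to be either `None` (no vibration) or already-derived attribute information, so S64 and S65 are collapsed into one step.

```python
from typing import Callable, Iterable, Optional


def camera_main_loop(
    detections: Iterable[Optional[dict]],
    send: Callable[[str, object], None],
    make_sound: Callable[[dict], list],
    stop_requested: Callable[[], bool],
) -> None:
    """Hypothetical control loop mirroring steps S64 to S69 of FIG. 6."""
    for attributes in detections:            # one item per detection cycle
        if attributes is not None:           # S64: vibration detected? (S65 assumed done)
            send("attributes", attributes)   # S66: send attribute information
            sound = make_sound(attributes)   # S67: generate vibration sound
            send("vibration_sound", sound)   # S68: send vibration sound
        if stop_requested():                 # S69: end distribution?
            break


sent = []
camera_main_loop(
    [None, {"fundamental_hz": 50.0}],
    lambda kind, payload: sent.append(kind),
    lambda attributes: [0.0],
    lambda: False,
)
print(sent)
```

Captured-image distribution (S63) continues in parallel in the source and is left out of the sketch.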

Next, processing of the client device 2 will be described. FIG. 7 is a flowchart of exemplary processing executed by the client device 2 according to this embodiment. The flowchart shown in FIG. 7 can be realized by the control unit 302 of the client device 2 executing the control program stored in the storage unit 301 to perform calculation and processing of information and control of each piece of hardware. It is assumed that the client device 2 is powered on. It is also assumed here that the communication control unit 311 has the function of the information restoration unit 312.

When initial screen data is transmitted in response to the network camera 1 being powered on (S61 in FIG. 6), the communication control unit 311 of the client device 2 receives the data, and the display control unit 314 displays an initial screen on the display unit 304 based on the data (S71). After that, when the user of the client device 2 performs a predetermined operation on the initial screen (for example, accessing a predetermined URL) and a browser for displaying the captured image opens on the display unit 304, the communication control unit 311 transmits a setting completion notification to the network camera 1 via the network 3 (S72).

After the setting completion notification is transmitted, the communication control unit 311 receives (acquires) the captured image (and the external sound), and the display control unit 314 displays the captured image on the display unit 304 (S73). An example of the captured image displayed here is shown in FIG. 8(a).
When the network camera 1 detects vibration of the subject while the captured image is displayed on the display unit 304, the communication control unit 311 receives the attribute information of the vibration and the vibration sound information generated by the network camera 1 (Yes in S74). The received attribute information and vibration sound are stored in the storage unit 301. Subsequently, the display information generation unit 313 generates information to be displayed on the display unit 304 (display information) based on the received attribute information (S75), and the display control unit 314 displays the generated display information on the captured image displayed on the display unit 304 (S76).

As described above, the display information generated by the display information generation unit 313 includes, for example, a figure indicating the region of the vibrating portion of the subject (vibration region) and text information of the attribute information. In addition to these, the display information generation unit 313 may generate an icon indicating that sound can be played.

FIG. 8(b) shows an example of a screen displayed on the display unit 304 of the client device according to this embodiment. In the example of FIG. 8(b), the network camera 1 has detected vibrations of a plurality of portions of one or more subjects and has generated and transmitted attribute information and a vibration sound for each vibration. In FIG. 8(b), regions 81, 83, and 85 are figures indicating vibration regions (boundary lines surrounding the vibrating portions), and regions 82, 84, and 86 show the text information of the attribute information for the vibrations of regions 81, 83, and 85, respectively.
"1 kHz continuous sound" in region 82 indicates that the vibration sound is a continuous sound with a frequency of 1 kHz. "22 kHz inaudible" in region 84 indicates that the vibration sound is a 22 kHz sound, outside the audible band (which extends to about 20 kHz). "2 kHz/600 Hz modulated sound" in region 86 indicates that the vibration sound is a modulated sound composed of the sum of 2 kHz and 600 Hz frequency components.
FIG. 8(b) also shows an icon 87, here in the shape of a trumpet, indicating that sound can be played for region 81. Although not shown, an icon indicating that sound can be played may also be displayed for region 85.
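As an illustration of how text labels such as those in regions 82, 84, and 86 could be derived from attribute information, the following minimal sketch formats a list of frequency components. The function names, the 20 kHz threshold constant, and the "intermittent sound" wording are assumptions made for this example; the description does not specify how the text is produced.

```python
AUDIBLE_LIMIT_HZ = 20_000  # assumed upper limit, from "about 20 kHz" in the description


def format_hz(freq_hz):
    """Render a frequency as '600 Hz' or '2 kHz'."""
    if freq_hz >= 1000:
        return f"{freq_hz / 1000:g} kHz"
    return f"{freq_hz:g} Hz"


def attribute_label(frequencies_hz, continuous=True):
    """Build a display label from the frequency components of one vibration."""
    parts = "/".join(format_hz(f) for f in frequencies_hz)
    if len(frequencies_hz) > 1:
        return f"{parts} modulated sound"
    if frequencies_hz[0] > AUDIBLE_LIMIT_HZ:
        return f"{parts} inaudible"
    kind = "continuous" if continuous else "intermittent"
    return f"{parts} {kind} sound"
```

For example, `attribute_label([2000, 600])` yields the label shown for region 86.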

When the user of the client device 2 selects, via the input unit 303, one of the vibration regions displayed on the display unit 304 (Yes in S77), the output control unit 314 outputs the vibration sound corresponding to the selected vibration region from the sound emitting unit 305 (S78). Instead of the vibration region, the user may select the text or the icon of the attribute information, and the vibration sound corresponding to the selection may be output in response.
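The selection in S77 and S78 amounts to a hit test of the selected position against the displayed vibration regions. A minimal sketch, assuming rectangular regions in image pixel coordinates (the `VibrationRegion` type and `sound_for_click` function are hypothetical names, not part of the described apparatus):

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VibrationRegion:
    """One vibration region overlaid on the captured image (hypothetical layout)."""
    label: int    # e.g. 81, 83, 85 in FIG. 8(b)
    x: int        # left edge in image pixels
    y: int        # top edge in image pixels
    width: int
    height: int
    sound: bytes  # vibration sound data received for this region

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def sound_for_click(regions: Sequence[VibrationRegion], px: int, py: int) -> Optional[bytes]:
    """Return the sound of the first region containing the click, else None."""
    for region in regions:
        if region.contains(px, py):
            return region.sound
    return None
```

Returning `None` for a click outside every region corresponds to the "No" branch of S77.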

After that, the processing of S74 to S78 continues until the user of the client device ends display of the captured image distributed from the network camera 1. When the user inputs a distribution end request via the input unit 303 to end the display and the communication control unit 311 transmits the request to the network camera 1 (Yes in S79), the process of FIG. 7 ends. The procedure for ending distribution (display) is not limited to this.

As described above, the display unit of the client device 2 displays the attribute information of the portion where the network camera 1 detected vibration, and when the user of the client device 2 selects the region of that portion on the display screen, the vibration sound corresponding to the selected portion is output. This allows the user to grasp the characteristics of the vibration by sound, making them easier to understand.

In this embodiment, as shown in FIG. 8(b), the display unit 304 of the client device 2 displays the text information of the attribute information (regions 82, 84, and 86). Alternatively, only the vibration regions may be displayed, and the vibration sound corresponding to a region may be output in response to selection of that region. In this case, for example, the attribute information transmitted by the network camera 1 includes only the vibration detection address (region information), and the client device 2 superimposes the vibration region on the captured image on the display unit 304 based on the detection address. Then, when the user selects the region, the client device 2 can output the vibration sound corresponding to it.

In this embodiment, the sound generation unit 215 of the network camera 1 generates the vibration sound after the vibration detection unit 213 detects vibration. Alternatively, the sound generation unit 215 may generate the vibration sound in response to a notification from the client device 2 that a vibration region has been selected. In this case, information on the user's selection of the vibration region is transmitted to the network camera 1 via the network 3, and the control unit 202 of the network camera 1 passes the information to the sound generation unit 215. The information includes an address for specifying the position, such as the address of a pixel corresponding to the selected vibration region. The sound generation unit 215 looks up the received address among the attribute information stored in the storage unit 201, starts computation on the attribute information whose detection address corresponds to that address, and generates the vibration sound.

In S67 of FIG. 6, if the vibration sound (synthesized sound) generated by the sound generation unit 215 is outside the audible band, the output control unit 217 may output a predetermined sound within the audible band from the sound emitting unit 207 as a warning sound.
In such a case, the network camera 1 may also send a predetermined notification to the client device 2. For example, the attribute information generation unit 214 of the network camera 1 includes, in the attribute information, information indicating that the vibration sound is outside the audible band or frequency information, and the output control unit 314 of the client device 2, on receiving the information, determines from it that the vibration sound is not in the audible band. The output control unit 314 may then convert the vibration sound into a predetermined sound within the audible band and output it from the sound emitting unit 305 as a warning sound, or the display control unit 314 may display a predetermined warning on the display unit 304.
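The audible-band handling described above can be sketched as a small decision function. The band limits, the 1 kHz warning tone, and the names `is_audible` and `playback_plan` are assumptions for illustration; the description only says that the audible band extends to about 20 kHz and that a predetermined audible sound or a warning display may be used.

```python
AUDIBLE_MIN_HZ = 20.0       # assumed lower limit of hearing
AUDIBLE_MAX_HZ = 20_000.0   # "about 20 kHz" per the description
WARNING_TONE_HZ = 1_000.0   # hypothetical predetermined warning tone


def is_audible(freq_hz):
    """True if the frequency lies within the assumed audible band."""
    return AUDIBLE_MIN_HZ <= freq_hz <= AUDIBLE_MAX_HZ


def playback_plan(freq_hz):
    """Decide what to play and whether to show a warning for one vibration."""
    if is_audible(freq_hz):
        return {"play_hz": freq_hz, "warning": None}
    return {
        "play_hz": WARNING_TONE_HZ,  # substitute an audible warning sound
        "warning": f"Vibration at {freq_hz:g} Hz is outside the audible band",
    }
```

A client could use the `warning` field for the on-screen warning and `play_hz` for the substitute sound, or only one of the two, matching the alternatives in the text.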

In the client device 2, until the user selects a vibration region in S77 of FIG. 7, or while no vibration sound is being output, the output control unit 314 may output the external sound of the network camera 1, received from the network camera 1, from the sound emitting unit 305.

In the client device 2, when a plurality of vibration regions are displayed as shown in FIG. 8(b) and the user selects another region while a vibration sound is being output in response to selection of one region in S77, the output control unit 314 may output the vibration sound corresponding to the most recently selected region (that is, the other region). Alternatively, the output control unit 314 may output the vibration sounds corresponding to the selected regions alternately.
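The two policies for handling a second selection, playing only the most recently selected region or alternating between the selected regions as recited in the claims, can be compared with a short sketch. The function name, the mode strings, and the idea of fixed playback "slots" are hypothetical simplifications.

```python
from itertools import cycle, islice


def playback_sequence(selected_regions, mode, slots):
    """Return which region's sound is played in each of `slots` time slots.

    selected_regions: region labels in the order the user selected them.
    mode: "latest" plays only the most recent selection;
          "alternate" cycles through all selected regions.
    """
    if not selected_regions:
        return []
    if mode == "latest":
        return [selected_regions[-1]] * slots
    if mode == "alternate":
        return list(islice(cycle(selected_regions), slots))
    raise ValueError(f"unknown mode: {mode}")
```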

When a vibration region is selected in S77, the output control unit 314 may display on the display unit 304 a plurality of options (a selection menu) for the length of time the vibration sound is output, as shown in FIG. 8(b). When the user selects one of the options via the input unit 303, the output control unit 314 may output the vibration sound for the time corresponding to the selected option and stop the output once that time has elapsed.
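A minimal sketch of the timed output follows. The menu values in `DURATION_OPTIONS_S` and the function name `is_playing` are hypothetical; the description does not give the actual options.

```python
DURATION_OPTIONS_S = (5.0, 10.0, 30.0)  # hypothetical menu entries, in seconds


def is_playing(now_s, start_s, option_index):
    """True while the vibration sound should still be output.

    now_s and start_s are timestamps in seconds; option_index selects
    one entry of the duration menu.
    """
    duration = DURATION_OPTIONS_S[option_index]
    return start_s <= now_s < start_s + duration
```

A playback loop would poll this check, or schedule a stop at `start_s + duration`, to end the output once the selected time has elapsed.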

In this embodiment, the sound generation unit 215 of the network camera 1 generates the vibration sound by applying a digital filter to white noise, but the generation method is not limited to this. For example, the sound generation unit 215 may create and store a plurality of digital sound patterns in advance and generate the vibration sound by selecting a pattern according to the attribute information.
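To make the white-noise approach concrete, the sketch below filters uniform white noise with a second-order (biquad) band-pass filter centered on the vibration's fundamental frequency. The filter type is an assumption: the description does not say which digital filter is used, and the coefficients here follow the widely used "Audio EQ Cookbook" band-pass form with 0 dB peak gain. The names `bandpass` and `vibration_sound`, the sample rate, and the Q value are likewise illustrative.

```python
import math
import random


def bandpass(samples, center_hz, q, sample_rate):
    """Second-order band-pass filter with unity gain at center_hz."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this filter
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def vibration_sound(fundamental_hz, duration_s=0.5, sample_rate=8000, q=10.0, seed=0):
    """Shape white noise so its energy concentrates around fundamental_hz."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    noise = [rng.uniform(-1.0, 1.0) for _ in range(int(duration_s * sample_rate))]
    return bandpass(noise, fundamental_hz, q, sample_rate)
```

A vibration with several frequency components could be approximated by summing the outputs of several such filters, weighted by the component intensities carried in the attribute information.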

In this embodiment, the sound generation unit 215 generates a vibration sound having the same frequency characteristics as the detected vibration. However, simplification may be applied, for example setting the frequency to 2.0 kHz when the frequency of the vibration sound is 1.9 kHz; the same effect is obtained in that case as well.
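One possible form of this simplification is to snap the frequency to a fixed grid. The 500 Hz step and the function name below are assumptions; the description gives only the 1.9 kHz to 2.0 kHz example.

```python
import math


def simplify_frequency(freq_hz, step_hz=500.0):
    """Snap a frequency to the nearest multiple of step_hz (ties round up)."""
    return math.floor(freq_hz / step_hz + 0.5) * step_hz
```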

In this embodiment, the display unit 304 and the sound emitting unit 305 of the client device 2 are configured separately, but they may be integrated.
The sound emitting unit 305 may also be configured to generate vibration instead of, or in addition to, sound. In this case, the display unit 304 and the sound emitting unit 305 may be integrated into a single touch-panel-like thin-film device, and the sound emitting unit 305 may be configured to physically generate vibration based on the vibration sound corresponding to the subject selected by the user on the display unit 304.

[Second embodiment]
Next, an embodiment with a plurality of network cameras will be described as a second embodiment. Differences from the first embodiment are described below, and descriptions of common features are omitted.
FIG. 1(b) shows a schematic diagram of the configuration of the image processing system according to this embodiment. The image processing system includes network cameras 1 and 4 and a client device 2, which are connectable to one another via a network 3.

In this embodiment, the configurations of the network cameras 1 and 4 and of the client device 2 are as described in the first embodiment with reference to FIGS. 2 and 3, respectively, and their description is omitted here.
In this embodiment, the network cameras 1 and 4 are assigned different identification numbers (IDs) so that each camera can be identified. As shown in FIG. 9, the network camera 1 is assumed to be ID1 and the network camera 4 to be ID2.

Next, the flow of processing in this embodiment will be described. The network cameras 1 and 4 each execute the processing shown in FIG. 6 described in the first embodiment. FIG. 9 shows a flowchart of exemplary processing executed by the client device 2 according to this embodiment. Steps in common with FIG. 7 are given the same reference numerals, and their description is omitted.

After the communication control unit 311 of the client device 2 receives the initial screen data and the display control unit 314 displays the initial screen on the display unit 304 based on the data (S71), the communication control unit 311 transmits a setting completion notification to the network cameras 1 and 4 in response to a predetermined operation by the user of the client device 2 (S91). Further, the output control unit 314 determines the arrangement, on the display unit 304, of the two captured images distributed from the network cameras 1 and 4 (S91).

FIG. 10 shows an example of a screen displayed on the client device according to this embodiment. In the example of FIG. 10, the images are arranged, camera by camera, in order of network camera identification number, starting from the upper left of the screen of the display unit 304. That is, the image captured by the network camera 1 (ID1) is displayed on the left, and the image captured by the network camera 4 (ID2) is displayed on the right. The identification numbers of the network cameras 1 and 4 (ID1, ID2) are also displayed.
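The ID-ordered arrangement can be sketched as computing a horizontal offset for each camera's image within the wide-area image. The function name and the single-row, left-to-right layout are assumptions based on the two-camera example of FIG. 10.

```python
def layout_by_camera_id(image_sizes):
    """Place images left to right in ascending order of camera ID.

    image_sizes: dict mapping camera ID -> (width, height) in pixels.
    Returns a dict mapping camera ID -> (x_offset, y_offset) of the
    image's upper-left corner within the wide-area image.
    """
    offsets = {}
    x = 0
    for cam_id in sorted(image_sizes):
        offsets[cam_id] = (x, 0)
        x += image_sizes[cam_id][0]  # next image starts where this one ends
    return offsets
```

Because the function sorts the IDs itself, the same code covers three or more cameras without change.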

Subsequently, the communication control unit 311 receives the captured images (and external sounds) from the network cameras 1 and 4, and the display control unit 314 combines the two captured images into one wide-area image according to the image arrangement determined in S91 and displays it on the display unit 304 (S92).
When the network cameras 1 and 4 detect vibration of a subject while the wide-area image is displayed on the display unit 304, the communication control unit 311 receives the attribute information of the vibration and the vibration sound information generated by the network cameras 1 and 4 (Yes in S74). This information is stored and managed in association with the camera identification number and the detection address (region information). Subsequently, the display information generation unit 313 generates display information (S75), and the display control unit 314 displays the generated display information on the wide-area image displayed on the display unit 304, according to the image arrangement determined in S91 (S93). In FIG. 10, text information generated from the attribute information, figures indicating the regions of the vibrating portions (vibration regions), and the like are displayed on the images captured by the network cameras 1 and 4.

When the user of the client device 2 selects, via the input unit 303, one of the vibration regions displayed on the display unit 304 (Yes in S77), the output control unit 314 identifies the identification number of the network camera from the selected vibration region and the image arrangement determined in S91. The output control unit 314 then outputs the vibration sound corresponding to the vibration region and the camera identification number from the sound emitting unit 305 (S94).
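The lookup in S94 can be sketched in two steps: map the selected position in the wide-area image back to a camera and a camera-local position using the image arrangement, then find the vibration region and its stored sound. All names and data shapes below are hypothetical; the description states only that sounds are managed by camera identification number and detection address.

```python
def locate_camera(layout, image_sizes, px, py):
    """Return (camera_id, (local_x, local_y)) for a point in the wide-area image."""
    for cam_id, (ox, oy) in layout.items():
        w, h = image_sizes[cam_id]
        if ox <= px < ox + w and oy <= py < oy + h:
            return cam_id, (px - ox, py - oy)
    return None


def select_sound(px, py, layout, image_sizes, regions, sounds):
    """Find the vibration sound for a selection at (px, py), else None.

    regions: camera ID -> list of (region_id, x, y, width, height),
             in that camera's own image coordinates.
    sounds:  (camera ID, region_id) -> stored vibration sound.
    """
    hit = locate_camera(layout, image_sizes, px, py)
    if hit is None:
        return None
    cam_id, (lx, ly) = hit
    for region_id, x, y, w, h in regions.get(cam_id, []):
        if x <= lx < x + w and y <= ly < y + h:
            return sounds[(cam_id, region_id)]
    return None
```

Keying the sounds by the pair of camera ID and region keeps regions with the same local address on different cameras distinct.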

As described above, in a wide-area image composed of images captured by a plurality of network cameras, information such as region information about the portions where vibration was detected is displayed, and when the user selects a vibration region, the vibration sound corresponding to that region is output.

In this embodiment, the output control unit 314 of the client device 2 arranges the images captured by the network cameras within the wide-area image in numerical order of camera identification number, but the arrangement may be determined randomly. The output control unit 314 may also determine the arrangement according to a user operation via the input unit 303.
Although this embodiment describes an example with two network cameras, the same description applies when there are three or more. In that case, the output control unit 314 can combine the images from all the cameras to generate a wide-area image and display it on the display unit 304.

As described above, the embodiments allow the user to easily understand the characteristics of the vibration of a subject both visually and aurally, and make it possible to analyze and evaluate, more quantitatively, vibration occurring in any device serving as the subject.

The present invention can also be realized by supplying a program that implements one or more functions of the above-described embodiments to a system or apparatus via a network or a storage medium, and having one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

1, 4: network camera; 2: client device; 3: network

Claims

1. An information processing apparatus comprising:
acquisition means for acquiring, from an imaging apparatus, a captured image of a subject, region information indicating a region of a portion of the subject in which vibration has occurred, and vibration sound information representing a sound corresponding to the vibration of the portion;
display control means for superimposing the region indicated by the region information on the captured image and displaying it on a display unit;
accepting means for accepting a user operation on the display unit; and
sound control means for outputting the vibration sound from a sound emitting unit based on the vibration sound information when the accepting means accepts an operation selecting the region.

2. The information processing apparatus according to claim 1, wherein the acquisition means further acquires, from the imaging apparatus, information on an external sound of the imaging apparatus, and
the sound control means outputs the external sound from the sound emitting unit based on the information on the external sound when the vibration sound is not being output from the sound emitting unit.

3. The information processing apparatus according to claim 1, wherein the acquisition means further acquires, from the imaging apparatus, information on the frequency of the vibration of the portion, and
the display control means superimposes the frequency information on the captured image and displays it on the display unit.

4. The information processing apparatus according to claim 3, wherein the sound control means determines, based on the frequency information, whether the vibration sound is in the audible band, and, when it determines that the vibration sound is not in the audible band, converts the vibration sound into a sound in the audible band and outputs it from the sound emitting unit.

5. The information processing apparatus according to claim 3, wherein the sound control means determines, based on the frequency information, whether the vibration sound is in the audible band, and, when it is determined that the vibration sound is not in the audible band, the display control means displays a warning on the display unit.

6. The information processing apparatus according to any one of claims 1 to 5, wherein the display control means displays, on the display unit, a plurality of options for the length of time the vibration sound is output, in response to the user selecting the region on the captured image, and
when the accepting means accepts an operation selecting one of the plurality of options, the sound control means outputs the vibration sound from the sound emitting unit for the time corresponding to the selected option.

7. The information processing apparatus according to any one of claims 1 to 6, wherein the acquisition means acquires other region information indicating a region of another portion of the subject, different from the portion, in which vibration has occurred, and information on another vibration sound representing a sound corresponding to the vibration of the other portion,
the display control means superimposes the region and the other region indicated by the other region information on the captured image and displays them on the display unit, and
when the accepting means accepts an operation selecting the other region while the sound control means is outputting the vibration sound from the sound emitting unit, the sound control means stops the output of the vibration sound and outputs the other vibration sound from the sound emitting unit based on the information on the other vibration sound.

8. The information processing apparatus according to claim 7, wherein, when the accepting means accepts an operation selecting the other region while the sound control means is outputting the vibration sound from the sound emitting unit, the sound control means alternately outputs, from the sound emitting unit, the vibration sound and the other vibration sound based on the information on the other vibration sound.

9. The information processing apparatus according to any one of claims 1 to 8, wherein the acquisition means acquires, from each of a plurality of imaging apparatuses, a captured image of a subject, region information indicating a region of a portion of the subject in which vibration has occurred, and vibration sound information representing a sound corresponding to the vibration of the portion, and
the display control means, for each of the plurality of imaging apparatuses, superimposes the region indicated by the region information on the captured image and displays it on the display unit.

10. An imaging apparatus comprising:
imaging means for capturing a subject and generating a captured image of the subject;
detection means for detecting vibration in the subject;
derivation means for deriving region information indicating a region of the portion of the subject in which the detection means detected the vibration, and attribute information indicating characteristics of the vibration;
generation means for generating, based on the attribute information, a vibration sound representing a sound corresponding to the vibration; and
transmission means for transmitting the captured image, the region information, and information on the vibration sound to another apparatus.

11. The imaging apparatus according to claim 10, wherein the attribute information includes any of: a fundamental frequency, which is the frequency of the lowest frequency component of the vibration; the intensity of each frequency component of the vibration; the ratio of a plurality of frequency components of the vibration; and a degree indicating the intermittence of the vibration.

12. The imaging apparatus according to claim 10, wherein the generation means includes a white noise generation unit, a digital filter, and a setting unit,
the setting unit sets the frequency characteristics of the digital filter, based on the attribute information, so that they match the frequency characteristics of the vibration, and
the digital filter filters the white noise with those frequency characteristics to generate the vibration sound.

13. A control method for an information processing apparatus having a display unit and a sound emitting unit, the method comprising:
an acquisition step of acquiring, from an imaging apparatus, a captured image of a subject, region information indicating a region of a portion of the subject in which vibration has occurred, and vibration sound information representing a sound corresponding to the vibration of the portion;
a display control step of superimposing the region indicated by the region information on the captured image and displaying it on the display unit; and
a sound control step of outputting the vibration sound from the sound emitting unit based on the vibration sound information when a user operation selecting the region displayed on the display unit is accepted.

14. A control method for an imaging apparatus having imaging means for capturing a subject and generating a captured image of the subject, the method comprising:
a detection step of detecting vibration in the subject;
a derivation step of deriving region information indicating a region of the portion of the subject in which the vibration was detected, and attribute information indicating characteristics of the vibration;
a generation step of generating, based on the attribute information, a vibration sound representing a sound corresponding to the vibration; and
a transmission step of transmitting the captured image, the region information, and information on the vibration sound to another apparatus.

15. An image processing system comprising an imaging apparatus and an information processing apparatus, wherein
the imaging apparatus includes:
imaging means for capturing a subject and generating a captured image of the subject;
detection means for detecting vibration in the subject;
derivation means for deriving region information indicating a region of the portion of the subject in which the detection means detected the vibration, and attribute information indicating characteristics of the vibration;
generation means for generating, based on the attribute information, a vibration sound representing a sound corresponding to the vibration; and
transmission means for transmitting the captured image, the region information, and information on the vibration sound to another apparatus,
and the information processing apparatus includes:
acquisition means for acquiring the captured image, the region information, and the information on the vibration sound from the imaging apparatus;
display control means for superimposing the region indicated by the region information on the captured image and displaying it on a display unit;
accepting means for accepting a user operation on the display unit; and
sound control means for outputting the vibration sound from a sound emitting unit based on the information on the vibration sound when the accepting means accepts an operation selecting the region.

16. A program for causing a computer to function as each means of the information processing apparatus according to any one of claims 1 to 9.

17. A program for causing a computer to function as each means of the imaging apparatus according to any one of claims 10 to 12.