JP6853909B1 - Image processing apparatus, image processing method, and program

Info

Publication number: JP6853909B1
Application number: JP2020154705A
Authority: JP
Inventors: 政明厚地; 隆一郎林; 純一鶴見
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-09-15
Filing date: 2020-09-15
Publication date: 2021-03-31
Anticipated expiration: 2040-09-15
Also published as: JP2022048722A

Abstract

Problem to be solved: To provide an image processing apparatus, an image processing method, and a program that allow a user to view a plurality of moving images in a display mode suited to the content of the moving images.
Solution: A server 1 has a data acquisition unit 131 that acquires, in association with each other, a plurality of captured image data generated by a plurality of cameras photographing a predetermined area and a plurality of sound data acquired at the plurality of positions where the plurality of cameras are photographing the predetermined area; a display mode determination unit 132 that determines, based on the state of the sound indicated by at least some of the plurality of sound data, the display mode of specific captured image data, which is particular captured image data among the plurality of captured image data; and a display control unit 133 that causes the display of a user terminal to display the plurality of captured image data such that the specific captured image data is displayed in the display mode determined by the display mode determination unit 132.
Selected drawing: FIG. 4

Description

The present invention relates to an image processing apparatus, an image processing method, and a program.

Conventionally, a technique is known in which a plurality of moving images captured by a plurality of cameras are delivered to a terminal capable of displaying images, and the terminal displays the plurality of moving images (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. H11-289531

A conventional terminal displays a plurality of moving images in a predetermined arrangement or in an arrangement arbitrarily set by the user. However, even when a noteworthy scene that would attract the interest of many users is delivered in one of the moving images, the user may miss that scene if the user is viewing another moving image.

The present invention has been made in view of these points, and its object is to enable a user to view a plurality of moving images in a display mode suited to the content of the moving images.

An image processing apparatus according to a first aspect of the present invention has: a data acquisition unit that acquires, in association with each other, a plurality of captured image data generated by a plurality of imaging devices photographing a predetermined area and a plurality of sound data acquired at the plurality of positions where the plurality of imaging devices are photographing the predetermined area; a display mode determination unit that determines, based on the state of the sound indicated by at least some of the plurality of sound data, the display mode of specific captured image data, which is particular captured image data among the plurality of captured image data; and a display control unit that causes a display unit to display the plurality of captured image data such that the specific captured image data is displayed in the display mode determined by the display mode determination unit.

The display mode determination unit may determine the display mode based on the loudness of the sound corresponding to the at least some sound data.

The display mode determination unit may determine the display mode such that, among the plurality of sound data, the display size on the display unit of second captured image data corresponding to second sound data, which corresponds to a louder sound than first sound data, is larger than the display size of first captured image data corresponding to the first sound data.

The display mode determination unit may determine the display mode based on the content of the sound corresponding to the at least some sound data.

The captured image data may include imaging device identification information for identifying the imaging device that acquired the captured image data, the sound data may include audio device identification information for identifying the device that acquired the sound data, and the display mode determination unit may determine the display mode by referring to association information in which the audio device identification information is associated with the imaging device identification information for identifying each of the plurality of imaging devices.

The captured image data may include first position information indicating the position at which the captured image data was acquired, the sound data may include second position information indicating the position at which the sound data was acquired, and the display mode determination unit may determine the display mode based on the state of the sound data whose second position information corresponds to the position closest to the position indicated by the first position information.

The display mode determination unit may determine the display mode based on the state of sounds other than a predetermined sound among the sounds included in the sound data.

The data acquisition unit may acquire the plurality of sound data acquired by directional microphones provided on each of the plurality of imaging devices.

In an image processing method according to a second aspect of the present invention, a computer executes the steps of: acquiring, in association with each other, a plurality of captured image data generated by a plurality of imaging devices photographing a predetermined area and a plurality of sound data acquired at the plurality of positions where the plurality of imaging devices are photographing the predetermined area; determining, based on the state of the sound indicated by at least some of the plurality of sound data, the display mode of specific captured image data, which is particular captured image data among the plurality of captured image data; and causing a display unit to display the plurality of captured image data such that the specific captured image data is displayed in the determined display mode.

A program according to a third aspect of the present invention causes a computer to function as: a data acquisition unit that acquires, in association with each other, a plurality of captured image data generated by a plurality of imaging devices photographing a predetermined area and a plurality of sound data acquired at the plurality of positions where the plurality of imaging devices are photographing the predetermined area; a display mode determination unit that determines, based on the state of the sound indicated by at least some of the plurality of sound data, the display mode of specific captured image data, which is particular captured image data among the plurality of captured image data; and a display control unit that causes a display unit to display the plurality of captured image data such that the specific captured image data is displayed in the display mode determined by the display mode determination unit.

The present invention has the effect of enabling a user to view a plurality of moving images in a display mode suited to the content of the moving images.

FIG. 1 is a schematic diagram showing a state in which a plurality of cameras are installed in a stadium.
FIG. 2 is a diagram showing a state in which images based on a plurality of captured image data created by the plurality of cameras are displayed on a display.
FIG. 3 is a system configuration diagram of the image processing system.
FIG. 4 is a diagram showing the configuration of the server.
FIG. 5 is a diagram showing the operation sequence of the plurality of cameras, the server, and the user terminal.
FIG. 6 is a diagram showing the configuration of the user terminal.
FIG. 7 is a diagram showing the operation sequence of the plurality of cameras, the server, and the user terminal in the second embodiment.

[Overview of image processing system S]
FIGS. 1 and 2 are diagrams for explaining the outline of the image processing system S. The image processing system S is a system that simultaneously displays, on a display used by a person viewing the images (hereinafter, the "user"), images based on a plurality of captured image data output by cameras C (C1 to C5 in FIG. 1), which are a plurality of imaging devices that each photograph a predetermined area from a different angle. The cameras C are installed, for example, in a stadium where a competition is being held or at a venue where an event is being held, and the predetermined area is the area that each camera C can photograph. In this specification, when the cameras C1 to C5 need not be distinguished from one another, each may be referred to as camera C.

FIG. 1 is a schematic diagram showing a state in which the plurality of cameras C1 to C5 are installed in a stadium. FIG. 2 is a diagram showing a state in which images based on the plurality of captured image data created by the cameras C1 to C5 are displayed on the display. The captured image data is, for example, a moving image, but may be a still image.

The image processing system S is characterized in that it can change the mode in which images are displayed on the display based on the state of the sound acquired at the position of each of the cameras C1 to C5. The state of the sound is represented by information related to the state of the area that the camera C is photographing, such as the loudness, content, or frequency of the sound. In the image processing system S, for example, an image captured by the camera C at a position where a loud sound was acquired is displayed more prominently on the display than the other images: it may be displayed larger than the other images or surrounded by a special frame.

FIG. 2(a) shows a state in which the images captured by the cameras C1 to C5 are displayed in the respective areas R1 to R5 of the display. In FIG. 2(a), the image captured by the camera C1 is displayed in the largest area R1, and the other images are displayed in the areas R2 to R5, which are smaller than the area R1.

FIG. 2(b) shows an example of the images displayed on the display when, after the state of FIG. 2(a), the sound acquired at the camera C5 is louder than the sounds acquired at the other cameras C1 to C4. In FIG. 2(b), the image captured by the camera C5 is displayed in the area R1 in place of the image captured by the camera C1 that was displayed there in FIG. 2(a). In the image processing system S, the image that is displayed large thus changes according to the state of the sound acquired at the position where the image was captured, so that, for example, an image captured near the position where a loud cheer arose is displayed large. As a result, the user is less likely to miss a noteworthy scene that would attract the interest of many users.
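The transition from FIG. 2(a) to FIG. 2(b) can be sketched as a small layout update. This is only an illustration: the function name `promote_to_main` and the dictionary layout are hypothetical, and moving the displaced camera into the vacated area is an assumption, since the description only states that the camera C5 image takes over the area R1.

```python
def promote_to_main(layout: dict[str, str], camera: str, main: str = "R1") -> dict[str, str]:
    """Return a new area -> camera layout with `camera` shown in the main area."""
    new_layout = dict(layout)
    # Find the area that currently shows the camera to be promoted.
    source = next(area for area, cam in layout.items() if cam == camera)
    # Assumption: the camera previously in the main area takes the vacated area.
    new_layout[main], new_layout[source] = camera, layout[main]
    return new_layout


# FIG. 2(a): camera C1 occupies the largest area R1.
layout_a = {"R1": "C1", "R2": "C2", "R3": "C3", "R4": "C4", "R5": "C5"}
# FIG. 2(b): the loudest sound was acquired at camera C5.
layout_b = promote_to_main(layout_a, "C5")
```

The function returns a new dictionary instead of mutating its input, so the previous layout remains available for comparison.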

FIG. 3 is a system configuration diagram of the image processing system S. The image processing system S has a plurality of cameras C (C1 to C5), a server 1, and a user terminal 2. The cameras C, the server 1, and the user terminal 2 transmit and receive various data via a network N. The network N includes the Internet, a mobile phone network, or the like.

The server 1 acquires captured image data and sound data from each of the cameras C via the network N. Based on the state of the sound indicated by the acquired sound data, the server 1 determines which captured image data should be displayed on the user terminal 2 in a display mode different from that of the other images. The server 1 is an example of an image processing apparatus that controls the user terminal 2 so that the user terminal 2 displays images based on the plurality of captured image data in the determined display mode.

The user terminal 2 is a terminal having a display that displays a plurality of images based on the plurality of captured image data delivered from the server 1, and is, for example, a smartphone, a tablet, or a personal computer. The number of user terminals 2 is arbitrary. In the image processing system S, the user terminal 2, for example, receives the plurality of captured image data and at least one piece of sound data from the server 1 and simultaneously displays the delivered captured image data. Alternatively, the user terminal 2 may acquire the plurality of captured image data and at least one piece of sound data from the cameras C without going through the server 1, and display images based on the plurality of captured image data on the display in a display mode based on the state of the sound indicated by the at least one piece of sound data.

<First Embodiment>
First, an embodiment in which the server 1 determines the display mode will be described. In this case, the user terminal 2 displays a plurality of images based on the plurality of captured image data on the display in accordance with the display mode instructed by the server 1.

[Configuration of server 1]
FIG. 4 is a diagram showing the configuration of the server 1. The server 1 has a communication unit 11, a storage unit 12, and a control unit 13. The control unit 13 has a data acquisition unit 131, a display mode determination unit 132, and a display control unit 133.

The communication unit 11 has a communication interface for transmitting and receiving data to and from the cameras C and the user terminal 2 via the network N. The communication unit 11 inputs the plurality of captured image data and sound data received from the cameras C to the data acquisition unit 131. The communication unit 11 also transmits the captured image data and sound data for delivery, input from the display control unit 133, to the user terminal 2 via the network N.

The storage unit 12 has storage media such as a ROM (Read Only Memory), a RAM (Random Access Memory), and a hard disk. The storage unit 12 stores the program executed by the control unit 13. The storage unit 12 also temporarily stores the plurality of captured image data received by the communication unit 11.

The control unit 13 has, for example, a CPU (Central Processing Unit), and functions as the data acquisition unit 131, the display mode determination unit 132, and the display control unit 133 by executing the program stored in the storage unit 12.

The data acquisition unit 131 acquires, via the communication unit 11 and in association with each other, the plurality of captured image data generated by the cameras C photographing the predetermined area and the plurality of sound data acquired at the plurality of positions where the imaging devices are photographing the predetermined area. The data acquisition unit 131 acquires, for example, a plurality of sound data acquired by directional microphones provided on each of the cameras C. When the cameras C are installed in a stadium, the data acquisition unit 131 may acquire a plurality of sound data acquired by microphones whose directivity points in the direction opposite to the direction in which the camera C shoots, so that the captured image data to be displayed prominently is selected based on the cheers from the spectator seats.

The data acquisition unit 131 may acquire the captured image data output by a camera C and the sound data corresponding to the sound acquired at the position of that camera C at the same time, or may separately acquire captured image data and sound data to which common identification information has been attached. The data acquisition unit 131 inputs the acquired captured image data and sound data to the display mode determination unit 132. The data acquisition unit 131 may also store the acquired captured image data and sound data in the storage unit 12.
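When image and sound data arrive separately, the association by common identification information amounts to a join on that identifier. The following is a minimal sketch under assumed names: `ImageData`, `SoundData`, the `stream_id` field, and `pair_by_id` are all hypothetical and do not appear in the description.

```python
from dataclasses import dataclass


@dataclass
class ImageData:
    stream_id: str  # hypothetical common identification information
    frame: bytes


@dataclass
class SoundData:
    stream_id: str  # same identifier as the matching image data
    samples: list[float]


def pair_by_id(images: list[ImageData], sounds: list[SoundData]) -> list[tuple[ImageData, SoundData]]:
    """Associate each image with the sound data carrying the same identifier."""
    sound_by_id = {s.stream_id: s for s in sounds}
    # Images without matching sound data are skipped in this sketch.
    return [(img, sound_by_id[img.stream_id]) for img in images if img.stream_id in sound_by_id]
```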

The display mode determination unit 132 determines, based on the state of the sound indicated by at least some of the plurality of sound data, the display mode of specific captured image data, which is particular captured image data among the plurality of captured image data. The specific captured image data is captured image data for which the state of the sound corresponding to the sound data that the data acquisition unit 131 acquired in association with it has a characteristic that differs from the state of the sound corresponding to the sound data associated with the other captured image data.

The display mode determination unit 132 sets the display mode of such specific captured image data so that the user is more likely to look at it than at the other captured image data. For example, the display mode determination unit 132 decides on a display mode in which the specific captured image data is shown larger than the other captured image data, or one in which it is shown with a special frame. In the example shown in FIG. 2, the display mode determination unit 132 determines the display mode so that the specific captured image data is displayed in the area R1. The display mode determination unit 132 may cause the area displaying the specific captured image data to show a frame whose color differs from that of the areas displaying the other captured image data, or may cause the frame to blink. The display mode determination unit 132 may also cause the area displaying the specific captured image data to show a mark that is not shown in the areas displaying the other captured image data.

The display mode determination unit 132 determines the display mode based on the loudness of the sound corresponding to at least some of the sound data. In this case, the state of the sound corresponding to the specific captured image data is, for example, a state that includes the sound of loud cheering and whose level is higher than that of the sounds corresponding to the other captured image data. As an example, the display mode determination unit 132 determines the display mode such that, among the plurality of sound data, the display size of second captured image data corresponding to second sound data, which corresponds to a louder sound than first sound data, is larger than the display size of first captured image data corresponding to the first sound data. Because the display mode determination unit 132 operates in this way, the user can more easily see the captured image data output by the camera C at the position where the loud cheering was acquired, and is therefore less likely to miss a scene that deserves attention.
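One way to picture loudness-based ordering is to compute a level per camera and rank the cameras so that louder ones receive larger display sizes. The sketch below assumes root-mean-square (RMS) amplitude as the loudness measure; the description does not specify how loudness is measured, and the names `rms` and `rank_by_loudness` are hypothetical.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level of a block of audio samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rank_by_loudness(sound_by_camera: dict[str, list[float]]) -> list[str]:
    """Camera IDs ordered from loudest to quietest; earlier entries get larger areas."""
    return sorted(sound_by_camera, key=lambda cam: rms(sound_by_camera[cam]), reverse=True)


# Illustrative sample values only.
ranking = rank_by_loudness({
    "C1": [0.1, -0.1],
    "C2": [0.3, 0.2],
    "C5": [0.8, -0.9],
})
```

The first entry of `ranking` would be shown in the largest area, which corresponds to the second captured image data being displayed larger than the first.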

The display mode determination unit 132 may treat captured image data corresponding to a volume at or above a predetermined threshold as specific captured image data and give it a display mode different from that of the other captured image data. The predetermined threshold is, for example, a value set by the user or a value corresponding to the volume of loud cheering. When a plurality of captured image data qualify as specific captured image data, the display mode determination unit 132 may give the plurality of specific captured image data a display mode different from that of the other captured image data. Because the display mode determination unit 132 operates in this way, the user can see all of the images when there are several that deserve attention.
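The threshold variant can be sketched as a simple filter that may return several cameras at once. The function name `select_specific` and the numeric levels are hypothetical; the description does not define units for the volume or the threshold.

```python
def select_specific(levels: dict[str, float], threshold: float) -> set[str]:
    """Camera IDs whose sound level is at or above the threshold."""
    return {cam for cam, level in levels.items() if level >= threshold}


# Illustrative levels; two cameras qualify, so both would be highlighted.
specific = select_specific({"C1": 0.2, "C2": 0.7, "C3": 0.9}, threshold=0.7)
```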

The display mode determination unit 132 may determine the display mode based on the content of the sound corresponding to at least some of the sound data. For example, the display mode determination unit 132 determines the display mode based on the state of sounds other than a predetermined sound among the sounds included in the sound data. By performing speech recognition or analyzing the properties of the sound, the display mode determination unit 132 determines whether the content of the sound indicated by the sound data is a predetermined sound unrelated to the position where a scene deserving the user's attention was captured (for example, the voice of a venue announcement or an extraneous sound). When the display mode determination unit 132 determines that the content of the sound indicated by the sound data is unrelated to the position where such a scene was captured, it does not give the captured image data corresponding to that sound a display mode different from that of the other captured image data, even if the sound is louder than the sounds acquired at other positions.

On the other hand, when the content of a particular sound is related to the position where a scene deserving the user's attention was captured (for example, loud cheering), the display mode determination unit 132 gives the captured image data corresponding to that sound a display mode more prominent than that of the other captured image data, even if the sound is quieter than non-cheering sounds acquired at other positions. Because the display mode determination unit 132 operates in this way, images that the user need not pay attention to are prevented from being displayed prominently.
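The content-based rules above can be sketched as a two-stage selection. The sketch assumes that some upstream classifier (not shown, and not specified in the description) has already labeled each camera's sound; the label strings, the `IGNORED_LABELS` set, and the function `pick_specific` are hypothetical.

```python
from typing import Optional

# Hypothetical labels for predetermined sounds that must never trigger highlighting.
IGNORED_LABELS = {"announcement", "external_noise"}


def pick_specific(observations: dict[str, tuple[str, float]]) -> Optional[str]:
    """Choose the camera to highlight from {camera: (label, level)} observations."""
    # Cheering takes priority even when it is quieter than other sounds.
    cheering = {cam: level for cam, (label, level) in observations.items() if label == "cheer"}
    if cheering:
        return max(cheering, key=cheering.get)
    # Otherwise fall back to the loudest sound that is not a predetermined sound.
    others = {cam: level for cam, (label, level) in observations.items() if label not in IGNORED_LABELS}
    return max(others, key=others.get) if others else None


chosen = pick_specific({
    "C1": ("announcement", 0.9),  # loudest, but ignored
    "C2": ("cheer", 0.4),         # quieter, but relevant
    "C3": ("crowd", 0.6),
})
```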

The captured image data may include imaging device identification information for identifying the camera C that acquired the captured image data, and the sound data may include audio device identification information for identifying the audio device (for example, a microphone) that acquired the sound data. In this case, the display mode determination unit 132 identifies the sound data corresponding to the captured image data by referring to association information, stored in the storage unit 12, in which the audio device identification information and the imaging device identification information are associated with each other. It then determines the display mode based on the state of the sound indicated by the identified sound data. Because the display mode determination unit 132 operates in this way, a display mode based on the state of the sound can be used even when the sound data is acquired by an external microphone that is not built into the camera C.
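The association information can be pictured as a lookup table from audio device IDs to imaging device IDs. The table contents, the ID strings, and the function `sound_for_camera` below are hypothetical placeholders for whatever is stored in the storage unit 12.

```python
from typing import Optional

# Hypothetical association information: audio device ID -> imaging device ID.
ASSOCIATION = {"MIC-A": "C1", "MIC-B": "C2"}


def sound_for_camera(camera_id: str, sound_by_mic: dict[str, list[float]]) -> Optional[list[float]]:
    """Return the sound data from the microphone associated with the given camera."""
    for mic_id, cam_id in ASSOCIATION.items():
        if cam_id == camera_id and mic_id in sound_by_mic:
            return sound_by_mic[mic_id]
    return None  # no associated microphone delivered sound data
```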

The captured image data may include first position information indicating the position at which the captured image data was acquired, and the sound data may include second position information indicating the position at which the sound data was acquired. In this case, the display mode determination unit 132 determines the display mode based on the state of the sound data whose second position information corresponds to the position closest to the position indicated by the first position information.

That is, when the data acquisition unit 131 acquires a plurality of sound data that include second position information, the display mode determination unit 132 determines the display mode based on the state of the sound indicated by the sound data acquired by the microphone installed closest to the camera C that output the captured image data. Because the display mode determination unit 132 operates in this way, an appropriate display mode based on the most suitable sound state can be used even when sound data is acquired by a plurality of external microphones.
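Choosing the closest microphone reduces to a nearest-neighbor search over positions. The sketch below assumes two-dimensional coordinates and Euclidean distance, neither of which is specified in the description; `nearest_mic` and the coordinate values are hypothetical.

```python
import math


def nearest_mic(camera_pos: tuple[float, float], mic_positions: dict[str, tuple[float, float]]) -> str:
    """ID of the microphone whose position is closest to the camera position."""
    return min(mic_positions, key=lambda mic: math.dist(camera_pos, mic_positions[mic]))


# Illustrative coordinates for one camera and three external microphones.
mic = nearest_mic((0.0, 0.0), {"M1": (10.0, 0.0), "M2": (3.0, 4.0), "M3": (-1.0, -1.0)})
```

The sound data from the returned microphone would then be used as the sound state for that camera's captured image data.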

The display control unit 133 controls the user terminal 2 so that the display of the user terminal 2 displays the plurality of captured image data such that the specific captured image data, which is particular captured image data among the plurality of captured image data, is displayed in the display mode determined by the display mode determination unit 132. For example, the display control unit 133 transmits the captured image data corresponding to the sound data indicating the loudest cheering to the user terminal 2 with a flag attached indicating that it is specific captured image data, which enables the user terminal 2 to display the image corresponding to the specific captured image data so that the user readily notices it. The display control unit 133 may also cause the user terminal 2 to display the specific captured image data in the display mode determined by the display mode determination unit 132 by transmitting to the user terminal 2 the data of a screen composed in that display mode.
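The flag-based option can be sketched as attaching a boolean to each outgoing item so that the terminal decides how to highlight it. The payload shape, the field names, and `build_payloads` are hypothetical; the description does not define a message format.

```python
def build_payloads(frames: dict[str, bytes], specific: set[str]) -> list[dict]:
    """Build one payload per camera, flagging the specific captured image data."""
    return [
        {"camera_id": cam, "frame": frame, "is_specific": cam in specific}
        for cam, frame in frames.items()
    ]


payloads = build_payloads({"C1": b"\x00", "C5": b"\x01"}, specific={"C5"})
```

A terminal receiving these payloads could, for example, place the flagged item in the largest area or draw a frame around it.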

［画像処理システムＳにおける動作シーケンス］ [Operation sequence in image processing system S]
図５は、複数のカメラＣ、サーバ１及びユーザ端末２の動作シーケンスを示す図である。複数のカメラＣは、サーバ１に対して撮像画像データ及び音データを送信する。表示態様決定部１３２は、データ取得部１３１が取得した音データが示す音の状態を特定し（Ｓ１１）、特定した音の状態に基づいて、複数の撮像画像データに対応する複数の画像の表示態様を決定する（Ｓ１２）。表示制御部１３３は、表示態様決定部１３２が決定した表示態様に対応する画面データを作成し（Ｓ１３）、作成した画面データと音データとを関連付けてユーザ端末２に送信する。ユーザ端末２は、サーバ１から受信した画面データに基づく画面をディスプレイに表示する（Ｓ１４）。 FIG. 5 is a diagram showing an operation sequence of the plurality of cameras C, the server 1, and the user terminal 2. The plurality of cameras C transmit captured image data and sound data to the server 1. The display mode determination unit 132 identifies the sound state indicated by the sound data acquired by the data acquisition unit 131 (S11) and, based on the identified sound state, determines the display mode of the plurality of images corresponding to the plurality of captured image data (S12). The display control unit 133 creates screen data corresponding to the display mode determined by the display mode determination unit 132 (S13), associates the created screen data with the sound data, and transmits them to the user terminal 2. The user terminal 2 displays a screen based on the screen data received from the server 1 on its display (S14).
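
As a non-limiting illustration, one pass through steps S11 to S13 on the server 1 might be sketched as follows. The `CameraInput` class, the function names, the use of peak amplitude as the sound state, and the two display modes `"large"` and `"small"` are all assumptions introduced for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CameraInput:
    """Hypothetical bundle of data received from one camera C."""
    camera_id: str
    frame: bytes          # captured image data, treated as opaque here
    samples: List[float]  # sound data as samples in the range [-1, 1]


def identify_sound_state(inp: CameraInput) -> float:
    """S11: use peak absolute amplitude as a stand-in for the sound state."""
    return max((abs(s) for s in inp.samples), default=0.0)


def decide_display_modes(states: Dict[str, float]) -> Dict[str, str]:
    """S12: show the loudest camera large and every other camera small."""
    if not states:
        return {}
    loudest = max(states, key=states.get)
    return {cid: "large" if cid == loudest else "small" for cid in states}


def build_screen_data(inputs: List[CameraInput],
                      modes: Dict[str, str]) -> List[dict]:
    """S13: assemble per-camera screen entries in input order."""
    return [{"camera_id": i.camera_id, "mode": modes[i.camera_id],
             "frame": i.frame} for i in inputs]


def process_once(inputs: List[CameraInput]) -> List[dict]:
    """Run S11 to S13 once; the result would be sent to the user terminal."""
    states = {i.camera_id: identify_sound_state(i) for i in inputs}
    return build_screen_data(inputs, decide_display_modes(states))
```

Calling `process_once` repeatedly while the cameras keep transmitting would correspond to repeating the sequence of FIG. 5.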

複数のカメラＣ、サーバ１及びユーザ端末２は、複数のカメラＣが撮像画像データ及び音データを送信している間、図５に示す処理を繰り返す。複数のカメラＣ、サーバ１及びユーザ端末２がこのように動作することで、ユーザ端末２が、図２（ｂ）に示したように、多くのユーザの関心を惹きつけるような注目シーンをユーザが見やすいようにディスプレイに表示することができる。 The plurality of cameras C, the server 1, and the user terminal 2 repeat the process shown in FIG. 5 while the plurality of cameras C are transmitting captured image data and sound data. Because the plurality of cameras C, the server 1, and the user terminal 2 operate in this way, the user terminal 2 can, as shown in FIG. 2(b), display on its display a noteworthy scene that is likely to attract the interest of many users in a way that is easy for the user to see.

＜第２の実施形態＞ <Second embodiment>
第１の実施形態においては、サーバ１が音の状態に基づいて表示態様を決定したが、ユーザ端末２が表示態様を決定してもよい。この場合、サーバ１は、複数のカメラＣとユーザ端末２との間で撮像画像データを中継する中継装置として機能する。 In the first embodiment, the server 1 determines the display mode based on the sound state, but the user terminal 2 may determine the display mode instead. In this case, the server 1 functions as a relay device that relays captured image data between the plurality of cameras C and the user terminal 2.

図６は、ユーザ端末２の構成を示す図である。ユーザ端末２は、表示部２０と、通信部２１と、記憶部２２と、制御部２３とを有する。制御部２３は、データ取得部２３１、表示態様決定部２３２及び表示制御部２３３を有する。 FIG. 6 is a diagram showing the configuration of the user terminal 2. The user terminal 2 has a display unit 20, a communication unit 21, a storage unit 22, and a control unit 23. The control unit 23 includes a data acquisition unit 231, a display mode determination unit 232, and a display control unit 233.

表示部２０は、複数の撮像画像データに基づく画像を表示するディスプレイである。表示部２０は、表示制御部２３３が作成した画面データを表示する。 The display unit 20 is a display that displays an image based on a plurality of captured image data. The display unit 20 displays the screen data created by the display control unit 233.

通信部２１は、ネットワークＮを介してサーバ１から撮像画像データ及び音データを受信するための通信インターフェースである。通信部２１は、受信した撮像画像データ及び音データをデータ取得部２３１に入力する。 The communication unit 21 is a communication interface for receiving captured image data and sound data from the server 1 via the network N. The communication unit 21 inputs the received captured image data and sound data to the data acquisition unit 231.

記憶部２２は、ＲＯＭ及びＲＡＭを含む記憶媒体である。記憶部２２は、制御部２３が実行するプログラムを記憶している。記憶部２２は、通信部２１が受信した撮像画像データ及び音データを一時的に記憶してもよい。 The storage unit 22 is a storage medium including a ROM and a RAM. The storage unit 22 stores a program executed by the control unit 23. The storage unit 22 may temporarily store the captured image data and the sound data received by the communication unit 21.

制御部２３は、例えばＣＰＵを有しており、記憶部２２に記憶されたプログラムを実行することにより、データ取得部２３１、表示態様決定部２３２及び表示制御部２３３として機能する。 The control unit 23 has, for example, a CPU, and functions as a data acquisition unit 231, a display mode determination unit 232, and a display control unit 233 by executing a program stored in the storage unit 22.

データ取得部２３１は、第１実施形態に係るサーバ１が有するデータ取得部１３１と同等の動作を実行する。例えば、データ取得部２３１は、通信部２１を介して、複数のカメラＣが所定の領域を撮影することにより生成した複数の撮像画像データと、複数の撮像装置が所定の領域を撮影している複数の位置で取得された複数の音データと、を関連付けて取得する。データ取得部２３１は、取得した撮像画像データ及び音データを表示態様決定部２３２に入力する。 The data acquisition unit 231 executes the same operation as the data acquisition unit 131 of the server 1 according to the first embodiment. For example, the data acquisition unit 231 acquires, via the communication unit 21 and in association with each other, a plurality of captured image data generated by the plurality of cameras C photographing a predetermined area and a plurality of sound data acquired at a plurality of positions where the plurality of imaging devices are photographing the predetermined area. The data acquisition unit 231 inputs the acquired captured image data and sound data to the display mode determination unit 232.

表示態様決定部２３２は、第１実施形態に係るサーバ１が有する表示態様決定部１３２と同等の動作を実行する。例えば、表示態様決定部２３２は、複数の音データのうち少なくとも一部の音データが示す音の状態に基づいて、複数の撮像画像データのうち特定の撮像画像データである特定撮像画像データの表示態様を決定する。表示態様決定部２３２は、データ取得部１３１が実行する他の動作も実行することができる。表示態様決定部２３２は、決定した表示態様を表示制御部２３３に通知する。 The display mode determination unit 232 executes the same operation as the display mode determination unit 132 of the server 1 according to the first embodiment. For example, the display mode determination unit 232 determines the display mode of the specific captured image data, which is particular captured image data among the plurality of captured image data, based on the sound state indicated by at least a part of the plurality of sound data. The display mode determination unit 232 can also execute other operations executed by the data acquisition unit 131. The display mode determination unit 232 notifies the display control unit 233 of the determined display mode.

表示制御部２３３は、第１実施形態に係るサーバ１が有する１３３と同等の動作を実行する。例えば、表示制御部２３３は、表示態様決定部２３２が決定した表示態様で、複数の撮像画像データのうち特定の撮像画像データである特定撮像画像データを表示させるように複数の撮像画像データを表示部２０に表示させる。表示制御部２３３は、例えば、表示態様決定部２３２が決定した表示態様に基づいて図２に示したような画面のデータを作成し、作成したデータを表示部２０に送信することにより、図２に示したような画面を表示部２０に表示させる。 The display control unit 233 executes an operation equivalent to that of the display control unit 133 of the server 1 according to the first embodiment. For example, the display control unit 233 causes the display unit 20 to display the plurality of captured image data so that the specific captured image data, which is particular captured image data among the plurality of captured image data, is displayed in the display mode determined by the display mode determination unit 232. For example, the display control unit 233 creates data of a screen such as that shown in FIG. 2 based on the display mode determined by the display mode determination unit 232 and transmits the created data to the display unit 20, thereby causing the display unit 20 to display a screen such as that shown in FIG. 2.

図７は、第２の実施形態における複数のカメラＣ、サーバ１及びユーザ端末２の動作シーケンスを示す図である。複数のカメラＣは、サーバ１に対して撮像画像データ及び音データを送信する。サーバ１は、受信した撮像画像データ及び音データを、予め登録された一以上のユーザ端末２に対して配信する。 FIG. 7 is a diagram showing an operation sequence of the plurality of cameras C, the server 1, and the user terminal 2 in the second embodiment. The plurality of cameras C transmit captured image data and sound data to the server 1. The server 1 distributes the received captured image data and sound data to one or more user terminals 2 registered in advance.
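
As a non-limiting illustration, the relay role of the server 1 in the second embodiment, in which received data is distributed unchanged to terminals registered in advance, might be sketched as follows. The `RelayServer` class and the callback-based registration are assumptions of this sketch; the embodiment does not specify how user terminals are registered or how data is transported over the network N.

```python
from typing import Callable, List

# A receiver stands in for one registered user terminal 2. It is called
# with (camera_id, image_data, sound_data). This signature is hypothetical.
Receiver = Callable[[str, bytes, bytes], None]


class RelayServer:
    """Sketch of server 1 acting purely as a relay device."""

    def __init__(self) -> None:
        self._receivers: List[Receiver] = []

    def register(self, receiver: Receiver) -> None:
        """Register a user terminal in advance of distribution."""
        self._receivers.append(receiver)

    def relay(self, camera_id: str, image_data: bytes,
              sound_data: bytes) -> int:
        """Forward one camera's data to every registered terminal.

        Returns the number of terminals the data was delivered to.
        """
        for receiver in self._receivers:
            receiver(camera_id, image_data, sound_data)
        return len(self._receivers)
```

In this arrangement the server makes no display decision; each terminal determines the display mode from the sound data it receives.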

ユーザ端末２の表示態様決定部２３２は、データ取得部２３１が取得した音データが示す音の状態を特定し（Ｓ２１）、特定した音の状態に基づいて、複数の撮像画像データに対応する複数の画像の表示態様を決定する（Ｓ２２）。表示制御部２３３は、表示態様決定部２３２が決定した表示態様に対応する画面データを作成し（Ｓ２３）、作成した画面データを表示部２０に表示させる（Ｓ２４）。 The display mode determination unit 232 of the user terminal 2 identifies the sound state indicated by the sound data acquired by the data acquisition unit 231 (S21) and, based on the identified sound state, determines the display mode of the plurality of images corresponding to the plurality of captured image data (S22). The display control unit 233 creates screen data corresponding to the display mode determined by the display mode determination unit 232 (S23) and causes the display unit 20 to display the created screen data (S24).

複数のカメラＣ、サーバ１及びユーザ端末２は、複数のカメラＣが撮像画像データ及び音データを送信している間、図７に示す処理を繰り返す。複数のカメラＣ、サーバ１及びユーザ端末２がこのように動作することで、ユーザ端末２が、図２（ｂ）に示したように、多くのユーザの関心を惹きつけるような注目シーンをユーザが見やすいようにディスプレイに表示することができる。 The plurality of cameras C, the server 1, and the user terminal 2 repeat the process shown in FIG. 7 while the plurality of cameras C are transmitting captured image data and sound data. Because the plurality of cameras C, the server 1, and the user terminal 2 operate in this way, the user terminal 2 can, as shown in FIG. 2(b), display on its display a noteworthy scene that is likely to attract the interest of many users in a way that is easy for the user to see.

［画像処理システムＳによる効果］ [Effect of image processing system S]
以上説明したように、画像処理システムＳにおいては、サーバ１又はユーザ端末２が、複数のカメラＣが出力した複数の撮像画像データ及び複数の音データを取得すると、複数の音データのうち少なくとも一部の音データが示す音の状態に基づいて、複数の撮像画像データのうち特定の撮像画像データである特定撮像画像データの表示態様を決定する。そして、ユーザ端末２は、図２（ｂ）に示したように、例えば大きな歓声を含む音データに対応する撮像画像データを、他の撮像画像データよりも大きく表示する。画像処理システムＳがこのように構成されていることで、例えばユーザが注目すべき画像をユーザが見逃すことがないように、動画の内容に適した表示態様でユーザが複数の動画を閲覧できるようになる。 As described above, in the image processing system S, when the server 1 or the user terminal 2 acquires a plurality of captured image data and a plurality of sound data output by the plurality of cameras C, it determines the display mode of the specific captured image data, which is particular captured image data among the plurality of captured image data, based on the sound state indicated by at least a part of the plurality of sound data. Then, as shown in FIG. 2(b), the user terminal 2 displays, for example, the captured image data corresponding to sound data containing a loud cheer larger than the other captured image data. Because the image processing system S is configured in this way, the user can view a plurality of moving images in a display mode suited to their content, so that, for example, the user does not miss an image that deserves attention.
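
As a non-limiting illustration, displaying the image for a louder sound at a larger size might be sketched by mapping each camera's loudness to a scale factor. The function name `tile_scales`, the linear mapping, and the default scale range of 1.0 to 2.0 are assumptions of this sketch; the embodiment only requires that the image corresponding to the louder sound be displayed larger.

```python
from typing import Dict


def tile_scales(loudness: Dict[str, float],
                min_scale: float = 1.0,
                max_scale: float = 2.0) -> Dict[str, float]:
    """Map per-camera loudness to a display scale factor.

    The quietest camera gets min_scale and the loudest gets max_scale,
    with linear interpolation in between. If every camera is equally
    loud, all cameras get min_scale so that no image is emphasized.
    """
    if not loudness:
        return {}
    lo, hi = min(loudness.values()), max(loudness.values())
    if hi == lo:
        return {cid: min_scale for cid in loudness}
    span = hi - lo
    return {cid: min_scale + (value - lo) / span * (max_scale - min_scale)
            for cid, value in loudness.items()}
```

Because the mapping is monotonic, a camera with louder sound data never receives a smaller scale than a camera with quieter sound data.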

以上、本発明を実施の形態を用いて説明したが、本発明の技術的範囲は上記実施の形態に記載の範囲には限定されず、その要旨の範囲内で種々の変形及び変更が可能である。例えば、装置の全部又は一部は、任意の単位で機能的又は物理的に分散・統合して構成することができる。また、複数の実施の形態の任意の組み合わせによって生じる新たな実施の形態も、本発明の実施の形態に含まれる。組み合わせによって生じる新たな実施の形態の効果は、もとの実施の形態の効果を併せ持つ。 Although the present invention has been described above using the embodiments, the technical scope of the present invention is not limited to the scope described in the above embodiments, and various modifications and changes are possible within the scope of its gist. For example, all or part of the device can be functionally or physically distributed or integrated in any unit. New embodiments resulting from any combination of the plurality of embodiments are also included in the embodiments of the present invention. A new embodiment produced by such a combination has the effects of the original embodiments together.

１サーバ
２ユーザ端末
１１通信部
１２記憶部
１３制御部
２０表示部
２１通信部
２２記憶部
２３制御部
１３１データ取得部
１３２表示態様決定部
１３３表示制御部
２３１データ取得部
２３２表示態様決定部
２３３表示制御部
1 Server
2 User terminal
11 Communication unit
12 Storage unit
13 Control unit
20 Display unit
21 Communication unit
22 Storage unit
23 Control unit
131 Data acquisition unit
132 Display mode determination unit
133 Display control unit
231 Data acquisition unit
232 Display mode determination unit
233 Display control unit

Claims

An image processing device comprising:
a data acquisition unit that acquires, in association with each other, a plurality of captured image data generated by a plurality of imaging devices photographing a predetermined area and a plurality of sound data acquired at a plurality of positions where the plurality of imaging devices are photographing the predetermined area;
a display mode determination unit that determines a display mode of specific captured image data, which is particular captured image data among the plurality of captured image data, based on the state of sounds other than a predetermined sound among the sounds included in at least a part of the plurality of sound data; and
a display control unit that causes a display unit to display the plurality of captured image data so that the specific captured image data is displayed in the display mode determined by the display mode determination unit.

The display mode determining unit determines the display mode based on the loudness of the sound corresponding to at least a part of the sound data.
The image processing apparatus according to claim 1.

The display mode determining unit determines the display mode so that the display size of second captured image data, which corresponds to second sound data that is louder than first sound data among the plurality of sound data, is larger than the display size on the display unit of first captured image data corresponding to the first sound data.
The image processing apparatus according to claim 2.

The display mode determining unit determines the display mode based on the content of the sound corresponding to at least a part of the sound data.
The image processing apparatus according to any one of claims 1 to 3.

The captured image data includes image pickup device identification information for identifying the image pickup device from which the captured image data was acquired, and the sound data includes audio device identification information for identifying the device from which the sound data was acquired, and
the display mode determining unit determines the display mode by referring to related information in which the audio device identification information is associated with the image pickup device identification information for identifying each of the plurality of image pickup devices.
The image processing apparatus according to any one of claims 1 to 4.

The captured image data includes first position information indicating the position where the captured image data was acquired, and the sound data includes second position information indicating the position where the sound data was acquired, and
the display mode determining unit determines the display mode based on the state of the sound data corresponding to the second position information that corresponds to the position closest to the position indicated by the first position information.
The image processing apparatus according to any one of claims 1 to 5.

The data acquisition unit acquires the plurality of sound data acquired by the directional microphones provided in each of the plurality of imaging devices.
The image processing apparatus according to any one of claims 1 to 6.

An image processing device comprising:
a data acquisition unit that acquires, in association with each other, a plurality of captured image data generated by a plurality of imaging devices photographing a predetermined area and a plurality of sound data acquired at a plurality of positions where the plurality of imaging devices are photographing the predetermined area;
a display mode determination unit that determines a display mode of specific captured image data, which is particular captured image data among the plurality of captured image data, based on a sound state indicated by at least a part of the plurality of sound data; and
a display control unit that causes a display unit to display the plurality of captured image data so that the specific captured image data is displayed in the display mode determined by the display mode determination unit,
wherein the captured image data includes first position information indicating the position where the captured image data was acquired, and the sound data includes second position information indicating the position where the sound data was acquired, and
the display mode determination unit determines the display mode based on the state of the sound data corresponding to the second position information that corresponds to the position closest to the position indicated by the first position information.

An image processing method in which a computer executes:
a step of acquiring, in association with each other, a plurality of captured image data generated by a plurality of imaging devices photographing a predetermined area and a plurality of sound data acquired at a plurality of positions where the plurality of imaging devices are photographing the predetermined area;
a step of determining a display mode of specific captured image data, which is particular captured image data among the plurality of captured image data, based on the state of sounds other than a predetermined sound among the sounds included in at least a part of the plurality of sound data; and
a step of causing a display unit to display the plurality of captured image data so that the specific captured image data is displayed in the determined display mode.

A program for causing a computer to function as:
a data acquisition unit that acquires, in association with each other, a plurality of captured image data generated by a plurality of imaging devices photographing a predetermined area and a plurality of sound data acquired at a plurality of positions where the plurality of imaging devices are photographing the predetermined area;
a display mode determination unit that determines a display mode of specific captured image data, which is particular captured image data among the plurality of captured image data, based on the state of sounds other than a predetermined sound among the sounds included in at least a part of the plurality of sound data; and
a display control unit that causes a display unit to display the plurality of captured image data so that the specific captured image data is displayed in the display mode determined by the display mode determination unit.