JP2021125789A

JP2021125789A - Image processing device, image processing system, image processing method, and computer program

Info

Publication number: JP2021125789A
Application number: JP2020017776A
Authority: JP
Inventors: 麗岳; Rei Gaku
Original assignee: Sumitomo Electric Industries Ltd
Current assignee: Sumitomo Electric Industries Ltd
Priority date: 2020-02-05
Filing date: 2020-02-05
Publication date: 2021-08-30

Abstract

To provide an image processing device capable of generating an image related to a user without forcing an excessive burden on the user. SOLUTION: The image processing device comprises: an acquisition unit which acquires a plurality of images captured by a plurality of cameras; an orientation detection unit which detects the orientation of an object person; a position identification unit which identifies the position of the object person; and an image generation unit which, on the basis of the detected orientation of the object person and the identified position of the object person, generates an image related to the object person from the images. SELECTED DRAWING: Figure 3

Description

The present disclosure relates to a video processing device, a video processing system, a video processing method, and a computer program.

Conventionally, a hands-free videophone that allows video calls to be made empty-handed has been proposed (see, for example, Non-Patent Document 1).

The hands-free videophone described in Non-Patent Document 1 uses a glasses-type wearable terminal device to make video calls. Five ultra-wide-angle cameras are arranged on the terminal device facing the face. A face image is generated by combining the partial face images captured by those cameras and is transmitted to the other party's terminal device.

Non-Patent Document 1: "[CEATEC JAPAN 2012] NTT Docomo booth, reference exhibit of a 'hands-free videophone' that allows video calls empty-handed", [online], October 3, 2012, Livedoor News, Gadget Tsushin, [retrieved November 22, 2019], Internet <URL: https://getnews.jp/archives/257957>
Non-Patent Document 2: "Basic Principles of Gaze Detection Technology", [online], April 23, 2013, Fujitsu Laboratories, [retrieved January 6, 2020], Internet <URL: https://www.fujitsu.com/jp/group/labs/resources/tech/techguide/list/eye-movements/p03.html>
Non-Patent Document 3: Kazuki Okami, Kota Takeuchi, Ai Isogai, Hideaki Kimata, "Free-viewpoint video synthesis technology that realizes a new video viewing experience", NTT Technical Journal, October 2017, pp. 10-14

However, a conventional glasses-type wearable terminal device is fixed at a position from which the user's face can be captured. It therefore cannot transmit video of anything other than the face of the user wearing it to the other party's terminal device.

In addition, wearing a terminal device fitted with multiple cameras burdens the user because of its weight and other factors, and is not practical. Because multiple videos must be captured, long battery-powered operation is difficult, and heat generation is also a concern.

The present disclosure has been made in view of these circumstances. Its object is to provide a video processing device, a video processing system, a video processing method, and a computer program capable of generating video related to a user without imposing an excessive burden on the user.

A video processing device according to one aspect of the present disclosure includes: an acquisition unit that acquires a plurality of videos captured by a plurality of cameras; an orientation detection unit that detects the orientation of a target person based on at least one of the acquired videos; a position specifying unit that specifies the position of the target person; and a video generation unit that generates video related to the target person from the plurality of videos, based on the detected orientation and the specified position of the target person.

A video processing system according to another aspect of the present disclosure includes a plurality of cameras and the above-described video processing device, which generates video based on a plurality of videos captured by the plurality of cameras.

A video processing method according to another aspect of the present disclosure includes the steps of: acquiring a plurality of videos captured by a plurality of cameras; detecting the orientation of a target person based on at least one of the acquired videos; specifying the position of the target person; and generating video related to the target person from the plurality of videos, based on the detected orientation and the specified position of the target person.

A computer program according to another aspect of the present disclosure causes a computer to function as: an acquisition unit that acquires a plurality of videos captured by a plurality of cameras; an orientation detection unit that detects the orientation of a target person based on at least one of the acquired videos; a position specifying unit that specifies the position of the target person; and a video generation unit that generates video related to the target person from the plurality of videos, based on the detected orientation and the specified position of the target person.

Needless to say, the computer program can be distributed via a computer-readable non-transitory recording medium such as a CD-ROM (Compact Disc-Read Only Memory) or via a communication network such as the Internet. The present disclosure can also be implemented as a semiconductor integrated circuit that realizes part or all of the video processing device.

According to the present disclosure, video related to a user can be generated without imposing an excessive burden on the user.

FIG. 1 is a diagram showing the configuration of a video processing system according to Embodiment 1 of the present disclosure.
FIG. 2 is a block diagram showing the functional configuration of the video processing device.
FIG. 3 is a block diagram showing the detailed configuration of the video processing unit included in the video processing device.
FIG. 4 is a diagram for explaining an example of the target person video selection process performed by the target person video selection unit.
FIG. 5 is a diagram for explaining an example of the surrounding video selection process performed by the surrounding video selection unit.
FIG. 6 is a diagram for explaining an example of the surrounding video selection process performed by the surrounding video selection unit.
FIG. 7 is a diagram for explaining an example of the surrounding video selection process performed by the surrounding video selection unit.
FIG. 8 is a flowchart showing an example of the processing procedure of the video processing device.
FIG. 9 is a flowchart showing details of the selfie mode video processing (step S4 of FIG. 8).
FIG. 10 is a flowchart showing details of the viewpoint sharing mode video processing (step S6 of FIG. 8).
FIG. 11 is a block diagram showing the detailed configuration of the video processing unit included in the video processing device.
FIG. 12 is a diagram for explaining an example of the selfie video generation process performed by the selfie video generation unit.
FIG. 13 is a diagram for explaining an example of the target person viewpoint video generation process performed by the target person viewpoint video generation unit.
FIG. 14 is a flowchart showing details of the selfie mode video processing (step S4 of FIG. 8).
FIG. 15 is a flowchart showing details of the viewpoint sharing mode video processing (step S6 of FIG. 8).

[Overview of Embodiments of the Present Disclosure]
First, an overview of the embodiments of the present disclosure is listed and described.

(1) A video processing device according to an embodiment of the present disclosure includes: an acquisition unit that acquires a plurality of videos captured by a plurality of cameras; an orientation detection unit that detects the orientation of a target person; a position specifying unit that specifies the position of the target person; and a video generation unit that generates video related to the target person from the plurality of videos, based on the detected orientation and the specified position of the target person.

With this configuration, video related to the target person can be generated, based on the target person's orientation and position, from a plurality of videos acquired, for example, from a plurality of cameras via a network. Video related to the target person can therefore be generated without the target person wearing a dedicated terminal device fitted with cameras. As a result, video related to a user can be generated without imposing an excessive burden on the user.

(2) Preferably, the plurality of cameras includes at least one of a camera installed outdoors, a camera installed indoors, and an in-vehicle camera.

With this configuration, video related to the target person can be generated using cameras that already exist around the target person, without a dedicated camera. In other words, video related to the target person can be generated without the target person having to worry about a camera's remaining battery level, heat generation, and the like.

(3) More preferably, the orientation detection unit detects the orientation of the target person based on at least one of the acquired videos.

With this configuration, the orientation of the target person can be detected from video. The target person therefore does not need to carry a device for detecting their orientation.

(4) More preferably, the video generation unit generates, as the video related to the target person, video of the target person captured from a position facing the target person, along the direction of the target person's orientation.

With this configuration, a so-called selfie video, in which the target person is captured from a position facing the target person, can be generated.

(5) The video generation unit may extract, from the plurality of videos, the video captured from the direction closest to the target person's orientation as the video related to the target person.

With this configuration, the video that captures the target person from the direction closest to the target person's orientation can be taken from among the plurality of videos and used as a so-called selfie video. The selfie video can therefore be generated at high speed.

(6) The video generation unit may detect, from the plurality of videos, videos that include an image of the target person, and combine the detected videos to generate video of the target person captured from a position facing the target person, along the direction of the target person's orientation.

With this configuration, a so-called selfie video can be generated by combining videos that include an image of the target person. Therefore, even when no camera provides video of the target person captured along the target person's orientation, such video can still be generated.

(7) The video generation unit may generate, as the video related to the target person, video of the target person's surroundings as seen from the target person's position in the direction of the target person's orientation.

With this configuration, it is possible to generate, for example, video of an object that lies in the target person's line of sight, that is, video of the same object the target person is looking at.

(8) The video generation unit may select, from the plurality of videos, video of the target person's surroundings as seen from the target person's position in the direction of the target person's orientation.

With this configuration, video of, for example, an object that lies in the target person's line of sight can be selected from the plurality of videos. Video of the surroundings as seen by the target person can therefore be generated at high speed.

(9) The video generation unit may generate video of the target person's surroundings as seen from the target person's position in the direction of the target person's orientation by combining the plurality of videos.

With this configuration, video of, for example, an object that lies in the target person's line of sight can be generated by combining a plurality of videos. Therefore, even when such video cannot be generated from a single video, video of the same object the target person is looking at can be generated.

(10) The orientation of the target person may include at least one of the direction of the target person's line of sight, the direction of the target person's face, and the direction of the target person's body.

With this configuration, it is possible to generate video of the target person viewed from the direction of the target person's line of sight, face, or body, or video of the surroundings viewed in that direction from the target person's position.

(11) The video generation unit may further apply privacy protection processing to images of persons other than the target person that are included in the video related to the target person.

With this configuration, video that protects the privacy of persons other than the target person can be generated.

(12) The above-described video processing device may further include a video transmission unit that transmits the video related to the target person, generated by the video generation unit, to a terminal device.

With this configuration, the generated video can be transmitted to the terminal device. The target person can therefore hold a video call or video conference with a person using the terminal device without being conscious of any camera.

(13) A video processing system according to another embodiment of the present disclosure includes a plurality of cameras and the above-described video processing device, which generates video based on a plurality of videos captured by the plurality of cameras.

This configuration includes the same configuration as the above-described video processing device. It therefore provides the same operations and effects as the above-described video processing device.

(14) A video processing method according to another embodiment of the present disclosure includes the steps of: acquiring a plurality of videos captured by a plurality of cameras; detecting the orientation of a target person; specifying the position of the target person; and generating video related to the target person from the plurality of videos, based on the detected orientation and the specified position of the target person.

This configuration includes steps corresponding to the characteristic processing units of the above-described video processing device. It therefore provides the same operations and effects as the above-described video processing device.

(15) A computer program according to another embodiment of the present disclosure causes a computer to function as: an acquisition unit that acquires a plurality of videos captured by a plurality of cameras; an orientation detection unit that detects the orientation of a target person; a position specifying unit that specifies the position of the target person; and a video generation unit that generates video related to the target person from the plurality of videos, based on the detected orientation and the specified position of the target person.

With this configuration, a computer can be made to function as the above-described video processing device. The same operations and effects as those of the above-described video processing device can therefore be obtained.
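Item (11) above does not say which privacy protection processing is applied. As one illustration only, the following Python sketch pixelates (mosaics) the regions of persons other than the target person. The frame representation (a grayscale grid), the bounding boxes, and the function names `pixelate_region` and `protect_privacy` are assumptions made for this sketch and do not come from the disclosure; person detection itself is assumed to have been done elsewhere.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]           # grayscale pixels, frame[y][x] in 0..255
Box = Tuple[int, int, int, int]   # (left, top, right, bottom); right/bottom exclusive


def pixelate_region(frame: Frame, box: Box, block: int = 8) -> None:
    """Replace each block x block tile inside `box` with its mean value, in place."""
    if block < 1:
        raise ValueError("block must be at least 1")
    height = len(frame)
    width = len(frame[0]) if height else 0
    # Clip the box to the frame so partially visible persons are handled.
    left, top = max(box[0], 0), max(box[1], 0)
    right, bottom = min(box[2], width), min(box[3], height)
    for y0 in range(top, bottom, block):
        for x0 in range(left, right, block):
            ys = range(y0, min(y0 + block, bottom))
            xs = range(x0, min(x0 + block, right))
            mean = sum(frame[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    frame[y][x] = mean


def protect_privacy(frame: Frame, person_boxes: Sequence[Box],
                    target_index: int, block: int = 8) -> Frame:
    """Return a copy of `frame` with every person box except the target's pixelated."""
    out = [row[:] for row in frame]
    for i, box in enumerate(person_boxes):
        if i != target_index:
            pixelate_region(out, box, block)
    return out
```

In this sketch a box that overlaps the target person's box would also obscure part of the target person; a fuller implementation would need to decide how to handle such overlaps.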

[Details of Embodiments of the Present Disclosure]
Hereinafter, embodiments of the present disclosure will be described with reference to the drawings. Each of the embodiments described below shows one specific example of the present disclosure. The numerical values, shapes, materials, components, arrangement positions and connection forms of the components, steps, order of steps, and the like shown in the following embodiments are examples and do not limit the present disclosure. Among the components in the following embodiments, those not recited in the independent claims are components that may be added optionally. Each figure is a schematic view and is not necessarily drawn precisely.

The same components are denoted by the same reference numerals. Their functions and names are also the same, so their description is omitted as appropriate.

<Embodiment 1>
[Overall configuration of the video processing system]
FIG. 1 is a diagram showing the configuration of a video processing system according to Embodiment 1 of the present disclosure.

The video processing system 100 transmits video of a target person 3, or video that the target person 3 is presumably seeing, to a user 4 other than the target person 3. The user 4 is, for example, a person in a call with the target person 3.

The video processing system 100 includes a plurality of cameras 2, a video processing device 1, and a terminal device 5.

The cameras 2 are, for example, surveillance cameras installed in the city or inside or outside buildings, cameras installed in rooms, and in-vehicle cameras installed in vehicles. Each camera 2 is connected, directly or indirectly and by wire or wirelessly, to a network 7 such as the Internet or a 5G (fifth-generation mobile communication system) network.

The video processing device 1 is connected to the network 7, acquires videos from the plurality of cameras 2, and generates, based on the acquired videos, video of the target person 3 or video that the target person 3 is presumably seeing. The video processing device 1 transmits the generated video via the network 7 to the terminal device 5, such as a smartphone, used by the user 4.

It is assumed that the target person 3 is, for example, wearing a hands-free phone 6 and talking with the user 4, who holds the terminal device 5. The user 4 can talk with the target person 3 while watching, on the screen of the terminal device 5, video of the target person 3 or video that the target person 3 is presumably seeing.

[Configuration of the video processing device 1]
FIG. 2 is a block diagram showing the functional configuration of the video processing device 1. The video processing device 1 can be configured by, for example, a computer including a CPU (Central Processing Unit), RAM (Random Access Memory), ROM (Read Only Memory), an HDD (Hard Disk Drive), a communication interface, an input/output interface, and the like.

The video processing device 1 includes a video acquisition unit 10, a target person information acquisition unit 20, a mode reception unit 30, a video processing unit 40, and a video transmission unit 50. For example, the functions of the processing units 10 to 50 are realized by loading a computer program stored on the computer's HDD into RAM and executing it on the CPU.

The video acquisition unit 10 functions as the acquisition unit. It receives video data captured by a camera 2 (hereinafter referred to as "video") from that camera 2 via the network 7 and outputs the video to the video processing unit 40. It is assumed that information indicating the identifier of the camera 2, the position of the camera 2, and the direction of the optical axis of the camera 2 is attached to the video. Alternatively, the position and optical-axis direction may be stored in a storage device such as the HDD. In that case, as long as the identifier of the camera 2 is attached to the video, the position of the camera 2 and the direction of its optical axis can be determined from the identifier.
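The two ways of obtaining a camera's position and optical-axis direction described above (attached to the video, or looked up by camera identifier) can be sketched as follows. This is a minimal Python illustration under assumptions: the class names, field names, the 2D coordinate convention, and the in-memory `CAMERA_REGISTRY` standing in for the HDD-backed storage are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class CameraInfo:
    """Pose of one camera 2 (field names are hypothetical)."""
    position: Tuple[float, float]   # (x, y) in an assumed shared ground-plane frame
    optical_axis_deg: float         # heading of the optical axis, in degrees


@dataclass
class VideoFrame:
    """One received unit of video, tagged with its source camera."""
    camera_id: str
    pixels: bytes
    info: Optional[CameraInfo] = None   # set when the pose is attached to the video


# Stand-in for the table kept in a storage device such as the HDD.
CAMERA_REGISTRY: Dict[str, CameraInfo] = {
    "cam-001": CameraInfo(position=(0.0, 10.0), optical_axis_deg=270.0),
    "cam-002": CameraInfo(position=(5.0, 0.0), optical_axis_deg=180.0),
}


def resolve_camera_info(frame: VideoFrame) -> CameraInfo:
    """Use the pose attached to the video if present, else look it up by identifier."""
    if frame.info is not None:
        return frame.info
    try:
        return CAMERA_REGISTRY[frame.camera_id]
    except KeyError:
        raise LookupError(f"unknown camera identifier: {frame.camera_id}") from None
```

Keeping the pose in a registry means fixed cameras only need to send their identifier, while a moving camera such as an in-vehicle camera would more plausibly attach its current pose to each video.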

The target person information acquisition unit 20 acquires target person information, which is information about the target person 3 for whom video is to be generated. The target person information is the information needed to identify the image of the target person 3 and is, for example, a face image of the target person 3. Before starting a conversation with the user 4, the target person 3 takes a picture of their own face (a selfie) using a terminal device such as a smartphone that they carry and sends it to the video processing device 1. The video processing device 1 receives the face image of the target person 3 as the target person information. The target person information acquisition unit 20 outputs the received face image to the video processing unit 40.

The terminal device carried by the target person 3 may instead extract image feature amounts such as SIFT (Scale-Invariant Feature Transform) or SURF (Speeded-Up Robust Features) from the selfie face image. In that case, the terminal device transmits the extracted image feature amounts to the video processing device 1 as the target person information, and the video processing device 1 receives them as the target person information. The target person information acquisition unit 20 outputs the received image feature amounts to the video processing unit 40.

Alternatively, the identifier of the target person 3 and the face image or image feature amounts of the target person 3 may be stored in advance in a storage device such as the HDD of the video processing device 1. In this case, the target person information acquisition unit 20 acquires the identifier of the target person 3 from the terminal device carried by the target person 3 and, based on the identifier, acquires the face image or image feature amounts of the target person 3 from the storage device. The identifier of the target person 3 can be, for example, the telephone number of the target person 3. The video processing device 1 acquires the telephone number when the target person 3 starts a call and acquires the face image or image feature amounts of the target person 3 based on that telephone number.
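The acquisition paths described above (information sent by the terminal device, or a lookup keyed by an identifier such as a telephone number) might be organized as in the following Python sketch. The names `TargetInfo`, `STORED_TARGETS`, and `acquire_target_info`, the placeholder telephone number, and the feature values are all hypothetical and serve only to illustrate the lookup.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TargetInfo:
    """Information used to find target person 3 in the videos."""
    face_image: Optional[bytes] = None       # raw face image, if supplied
    features: Optional[List[float]] = None   # pre-extracted image feature amounts


# Stand-in for records stored in advance, keyed by a placeholder telephone number.
STORED_TARGETS: Dict[str, TargetInfo] = {
    "000-0000-0000": TargetInfo(features=[0.12, 0.80, 0.33]),
}


def acquire_target_info(identifier: Optional[str] = None,
                        uploaded: Optional[TargetInfo] = None) -> TargetInfo:
    """Prefer information sent by the terminal; otherwise look up by identifier."""
    if uploaded is not None:
        if uploaded.face_image is None and uploaded.features is None:
            raise ValueError("uploaded target info carries no image or features")
        return uploaded
    if identifier is not None and identifier in STORED_TARGETS:
        return STORED_TARGETS[identifier]
    raise LookupError("no target person information available")
```

For example, `acquire_target_info(identifier="000-0000-0000")` returns the stored record, which corresponds to the case where the telephone number is obtained at the start of a call.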

The mode reception unit 30 accepts the mode of the video to be generated. The modes include a selfie mode and a viewpoint sharing mode.

The selfie mode is a mode for generating video of the target person 3. More specifically, the selfie mode generates video of the target person 3 captured from a position facing the target person 3, along the direction of the orientation of the target person 3. Here, the orientation of the target person 3 includes, for example, at least one of the direction of the line of sight of the target person 3, the direction of the face of the target person 3, and the direction of the body of the target person 3.
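One simple reading of "a position facing the target person" is to pick the camera whose bearing from the target person is closest to the target person's orientation, in the spirit of summary item (5). The Python sketch below illustrates that reading only. It assumes a 2D ground plane, headings in degrees measured counter-clockwise from the +x axis, and hypothetical function names; it does not check occlusion or whether the camera actually points at the target person, and it is not the selection process of FIG. 4.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def angle_diff_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two headings, in degrees (0..180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def bearing_deg(src: Point, dst: Point) -> float:
    """Heading of the vector src -> dst, degrees counter-clockwise from +x."""
    return math.degrees(math.atan2(dst[1] - src[1], dst[0] - src[0])) % 360.0


def select_selfie_camera(target_pos: Point, target_heading_deg: float,
                         camera_positions: Dict[str, Point]) -> str:
    """Return the id of the camera located most directly in front of the target."""
    if not camera_positions:
        raise ValueError("no cameras available")
    return min(
        camera_positions,
        key=lambda cid: angle_diff_deg(
            bearing_deg(target_pos, camera_positions[cid]), target_heading_deg
        ),
    )
```

With a target person at the origin facing the +y direction (heading 90 degrees), a camera at (0, 10) would be preferred over cameras at (10, 0) or (0, -10).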

The viewpoint sharing mode is a mode for generating video that the target person 3 is presumably seeing. More specifically, the viewpoint sharing mode generates video of the surroundings of the target person 3 as seen from the position of the target person 3 in the direction of the orientation of the target person 3. The orientation of the target person 3 is the same as in the selfie mode.
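For the viewpoint sharing mode, one plausible selection rule is to prefer a nearby camera whose optical axis points in nearly the same direction as the target person's orientation. The Python sketch below shows that rule under stated assumptions: a 2D ground plane, headings in degrees, a hypothetical `max_distance` cutoff, and hypothetical names throughout. The selection process actually illustrated in FIGS. 5 to 7 may differ.

```python
import math
from typing import List, NamedTuple, Optional, Tuple


class Camera(NamedTuple):
    camera_id: str
    position: Tuple[float, float]   # (x, y) on an assumed ground plane
    optical_axis_deg: float         # heading of the optical axis, in degrees


def angle_diff_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two headings, in degrees (0..180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def select_viewpoint_camera(target_pos: Tuple[float, float],
                            target_heading_deg: float,
                            cameras: List[Camera],
                            max_distance: float = 20.0) -> Optional[Camera]:
    """Pick the nearby camera whose optical axis best matches the target's heading."""
    nearby = [c for c in cameras if math.dist(c.position, target_pos) <= max_distance]
    if not nearby:
        return None   # no single camera qualifies; synthesis would be needed instead
    return min(
        nearby,
        key=lambda c: (
            angle_diff_deg(c.optical_axis_deg, target_heading_deg),
            math.dist(c.position, target_pos),   # tie-break: prefer the closer camera
        ),
    )
```

Returning `None` when no camera qualifies leaves room for the alternative in summary item (9), in which the view is produced by combining several videos.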

Before starting a conversation with the user 4, the target person 3 selects a mode using a terminal device such as a smartphone that they carry and sends the selected mode to the mode reception unit 30. The mode reception unit 30 receives the mode from the terminal device and outputs it to the video processing unit 40.

映像処理部４０は、映像取得部１０から複数のカメラ２で撮影された映像と、対象者情報取得部２０から対象者３の対象者情報と、モード受付部３０からモードとを受け付ける。映像処理部４０は、複数の映像と対象者３の対象者情報とに基づいて、モードに応じた映像を生成する。つまり、映像処理部４０は、モードが自撮りモードの場合には対象者３の映像を生成し、モードが視点共有モードの場合には対象者３が見ているであろう映像を生成する。映像処理部４０は、生成した映像を映像送信部５０に出力する。なお、映像処理部４０の詳細については後述する。 The video processing unit 40 receives the videos captured by the plurality of cameras 2 from the image acquisition unit 10, the target person information of the target person 3 from the target person information acquisition unit 20, and the mode from the mode reception unit 30. The video processing unit 40 generates a video according to the mode based on the plurality of videos and the target person information of the target person 3. That is, the video processing unit 40 generates a video of the target person 3 when the mode is the selfie mode, and generates a video of what the target person 3 is presumably seeing when the mode is the viewpoint sharing mode. The video processing unit 40 outputs the generated video to the video transmission unit 50. The details of the video processing unit 40 will be described later.

映像送信部５０は、映像処理部４０から映像を受け、当該映像を対象者３の通話相手であるユーザ４の利用する端末装置５に送信する。 The video transmission unit 50 receives the video from the video processing unit 40 and transmits it to the terminal device 5 used by the user 4, who is the call partner of the target person 3.

〔映像処理部４０の構成〕 [Configuration of video processing unit 40]
図３は、図２に示した映像処理装置１が備える映像処理部４０の詳細な構成を示すブロック図である。 FIG. 3 is a block diagram showing the detailed configuration of the video processing unit 40 included in the video processing device 1 shown in FIG. 2.

映像処理部４０は、対象者映像検出部４１と、対象者向き検出部４２と、対象者映像選択部４３と、プライバシー保護処理部４４と、対象者位置特定部４５と、周囲映像検出部４６と、周囲映像選択部４７とを備える。 The video processing unit 40 includes a target person image detection unit 41, a target person orientation detection unit 42, a target person image selection unit 43, a privacy protection processing unit 44, a target person position specifying unit 45, a surrounding image detection unit 46, and a surrounding image selection unit 47.

対象者映像検出部４１は、対象者情報取得部２０から受けた対象者情報に基づいて、映像取得部１０から受けた複数の映像の内、対象者３が映っている映像（以下、「対象者映像」という）を検出する。 Based on the target person information received from the target person information acquisition unit 20, the target person image detection unit 41 detects, among the plurality of videos received from the image acquisition unit 10, the videos in which the target person 3 appears (hereinafter referred to as "target person images").

具体的には、対象者映像検出部４１は、映像取得部１０から受けた各映像からＳＩＦＴ又はＳＵＲＦなどの画像特徴量を抽出する。また、対象者映像検出部４１は、対象者情報取得部２０から受けた対象者情報が対象者３の顔画像である場合には、当該顔画像から対象者３の画像特徴量を抽出する。なお、対象者映像検出部４１は、対象者情報取得部２０から受けた対象者情報が対象者３の画像特徴量である場合には、当該画像特徴量に対する処理は行わない。 Specifically, the target person image detection unit 41 extracts an image feature amount such as SIFT or SURF from each image received from the image acquisition unit 10. Further, when the target person information received from the target person information acquisition unit 20 is a face image of the target person 3, the target person image detection unit 41 extracts the image feature amount of the target person 3 from the face image. If the target person information received from the target person information acquisition unit 20 is the image feature amount of the target person 3, the target person image detection unit 41 does not perform processing on the image feature amount.

対象者映像検出部４１は、複数の映像の中から、対象者３の画像特徴量に類似する画像特徴量を有する領域を探索し、当該領域を有する映像を、対象者映像として検出する。つまり、対象者映像検出部４１は、複数の映像の中から、対象者３の画像特徴量と類似度が所定閾値以上の画像特徴量の領域を有する映像を、対象者映像として検出する。対象者映像検出部４１は、検出した対象者映像を対象者向き検出部４２、対象者映像選択部４３、及び対象者位置特定部４５に出力する。また、対象者映像検出部４１は、類似度が所定閾値以上の画像特徴量を有する領域の位置を、対象者映像中での対象者３の位置としてプライバシー保護処理部４４及び対象者位置特定部４５に出力する。 The target person image detection unit 41 searches the plurality of videos for a region having an image feature amount similar to that of the target person 3, and detects a video containing such a region as a target person image. That is, the target person image detection unit 41 detects, as a target person image, a video containing a region whose image feature amount has a similarity to the image feature amount of the target person 3 that is equal to or greater than a predetermined threshold. The target person image detection unit 41 outputs the detected target person images to the target person orientation detection unit 42, the target person image selection unit 43, and the target person position specifying unit 45. In addition, the target person image detection unit 41 outputs the position of the region whose image feature amount has a similarity equal to or greater than the predetermined threshold, as the position of the target person 3 in the target person image, to the privacy protection processing unit 44 and the target person position specifying unit 45.
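
The threshold test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the disclosure: the feature amounts are treated as plain numeric vectors, cosine similarity is assumed as the similarity measure (the text does not name one), and the threshold value 0.9 and the names `cosine_similarity` and `detect_target_person_images` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def detect_target_person_images(videos, target_feature, threshold=0.9):
    """Return (video_id, region_position) for every video that contains a
    region whose feature is at least `threshold`-similar to the target's.

    `videos` maps a video id to a list of (region_position, feature) pairs.
    Both the data layout and the threshold are assumptions of this sketch.
    """
    hits = []
    for video_id, regions in videos.items():
        # Keep the best-matching region of each video, if there is one.
        best = max(
            regions,
            key=lambda r: cosine_similarity(r[1], target_feature),
            default=None,
        )
        if best is not None and cosine_similarity(best[1], target_feature) >= threshold:
            hits.append((video_id, best[0]))
    return hits
```

The returned region position corresponds to what the text calls the position of the target person 3 in the target person image, which is passed on to the later units.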

対象者向き検出部４２は、向き検出部として機能し、対象者映像検出部４１から対象者映像を受け、当該対象者映像に基づいて、対象者３の向きを検出する処理部である。対象者３の向きの検出には公知の技術を用いることができる。例えば、対象者３の向きを、対象者３の視線の向きとした場合には、対象者向き検出部４２は、対象者映像から目の動かない部分（基準点）と動く部分（動点）とを検出する。ここで、基準点を対象者３の目頭、動点を対象者３の虹彩とする。対象者向き検出部４２は、基準点に対する動点の位置に基づいて、カメラ２の光軸の向きを基準とした場合の対象者３の視線の向きを検出する（例えば、非特許文献２参照）。 The target person orientation detection unit 42 is a processing unit that functions as an orientation detection unit, receives the target person images from the target person image detection unit 41, and detects the orientation of the target person 3 based on those images. A known technique can be used to detect the orientation of the target person 3. For example, when the orientation of the target person 3 is taken to be the direction of the line of sight of the target person 3, the target person orientation detection unit 42 detects, from the target person image, a part of the eye that does not move (reference point) and a part that moves (moving point). Here, the reference point is the inner corner of the eye of the target person 3, and the moving point is the iris of the target person 3. Based on the position of the moving point relative to the reference point, the target person orientation detection unit 42 detects the direction of the line of sight of the target person 3 relative to the direction of the optical axis of the camera 2 (see, for example, Non-Patent Document 2).

対象者向き検出部４２は、対象者映像に付加されたカメラ２の位置及びカメラ２の光軸の向きを示す情報と、検出した対象者３の向きとに基づいて、３次元空間中での対象者３の向きを検出する。つまり、カメラ２の位置及びカメラ２の光軸の向きが分かっており、カメラ２の光軸の向きを基準としたときの対象者３の視線の向きが分かっているため、対象者向き検出部４２は、これらから、３次元空間中での対象者３の向きを計算する。対象者向き検出部４２は、対象者３の向きの情報を対象者映像選択部４３及び周囲映像選択部４７に出力する。 The target person orientation detection unit 42 detects the orientation of the target person 3 in three-dimensional space based on the information indicating the position of the camera 2 and the direction of the optical axis of the camera 2, which is attached to the target person image, and on the detected orientation of the target person 3. That is, since the position of the camera 2 and the direction of its optical axis are known, and the direction of the line of sight of the target person 3 relative to the direction of the optical axis of the camera 2 is also known, the target person orientation detection unit 42 calculates the orientation of the target person 3 in three-dimensional space from these. The target person orientation detection unit 42 outputs the information on the orientation of the target person 3 to the target person image selection unit 43 and the surrounding image selection unit 47.
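
The conversion from a camera-relative orientation to an orientation in space can be sketched as below. To keep the example short it is reduced to a plan view (two dimensions), whereas the disclosure works in three-dimensional space. The angle convention (degrees, counterclockwise positive, relative angle measured from the optical axis) and the name `world_orientation` are assumptions of this sketch.

```python
import math


def world_orientation(camera_axis_yaw_deg, relative_yaw_deg):
    """Return the target person's orientation in the world frame.

    camera_axis_yaw_deg: yaw of the camera's optical axis in the world frame.
    relative_yaw_deg: yaw of the detected orientation, measured from the
        optical axis (counterclockwise positive).
    Returns (yaw in degrees within [0, 360), unit vector (x, y)).
    """
    yaw_deg = (camera_axis_yaw_deg + relative_yaw_deg) % 360.0
    yaw = math.radians(yaw_deg)
    return yaw_deg, (math.cos(yaw), math.sin(yaw))
```

Under this convention, a person looking straight into a camera has a relative yaw of 180 degrees, so a camera whose optical axis points at 90 degrees yields a world orientation of 270 degrees.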

対象者映像選択部４３は、対象者映像検出部４１から対象者映像を受け、対象者向き検出部４２から対象者３の向きの情報を受ける。対象者映像選択部４３は、映像生成部として機能し、モードが自撮りモードの場合に、対象者映像の中から、対象者３の向きと最も近い向きから対象者３を撮影した映像を、対象者３に関連する映像として選択する。つまり、対象者映像選択部４３は、対象者映像ごとに３次元空間中でのカメラ２の光軸の向きと対象者３の向きとの差分を算出する。対象者映像選択部４３は、差分が最小の対象者映像を対象者３に関連する映像として選択する。対象者映像選択部４３は、選択した対象者３に関連する映像をプライバシー保護処理部４４に出力する。 The target person image selection unit 43 receives the target person images from the target person image detection unit 41, and receives the information on the orientation of the target person 3 from the target person orientation detection unit 42. The target person image selection unit 43 functions as a video generation unit and, when the mode is the selfie mode, selects from among the target person images the one in which the target person 3 is captured from the direction closest to the orientation of the target person 3, as the video related to the target person 3. That is, the target person image selection unit 43 calculates, for each target person image, the difference between the direction of the optical axis of the camera 2 and the orientation of the target person 3 in three-dimensional space. The target person image selection unit 43 selects the target person image with the smallest difference as the video related to the target person 3. The target person image selection unit 43 outputs the selected video related to the target person 3 to the privacy protection processing unit 44.

図４は、対象者映像選択部４３による対象者映像の選択処理の一例を説明するための図である。図４の（Ａ）〜（Ｃ）は、対象者映像検出部４１から受けた対象者映像を示している。ここで、対象者３の向きを、例えば、対象者３の顔の向きとした場合には、図４の（Ｂ）に示す対象者映像が示す対象者３の顔の向きが、対象者映像を撮影したカメラ２の光軸の向きと最も近い。つまり、上記差分が最も小さい。このため、対象者映像選択部４３は、図４の（Ｂ）に示す対象者映像を、対象者３に関連する映像として選択する。図４の（Ｄ）に対象者３に関連する映像を示している。 FIG. 4 is a diagram for explaining an example of the target person image selection process performed by the target person image selection unit 43. (A) to (C) of FIG. 4 show the target person images received from the target person image detection unit 41. Here, when the orientation of the target person 3 is taken to be, for example, the orientation of the face of the target person 3, the orientation of the face of the target person 3 in the target person image shown in (B) of FIG. 4 is the closest to the direction of the optical axis of the camera 2 that captured that image. That is, the above difference is the smallest. Therefore, the target person image selection unit 43 selects the target person image shown in (B) of FIG. 4 as the video related to the target person 3. (D) of FIG. 4 shows the video related to the target person 3.

対象者位置特定部４５は、対象者映像検出部４１から対象者映像及び対象者３の位置の情報を受ける。対象者位置特定部４５は、位置特定部として機能し、対象者映像に付加されたカメラ２の位置及びカメラ２の光軸の向きを示す情報と、対象者映像中の対象者３の位置とから、３次元空間中での対象者３の位置を特定する。対象者位置特定部４５は、対象者映像に付加されたカメラ２の位置及びカメラ２の光軸の向きを示す情報から、カメラ２の３次元空間中での光軸方向を特定することができる。また、対象者映像中の対象者３の位置が分かっているため、対象者位置特定部４５は、３次元空間中でカメラ２の位置を基準として対象者３がどの方向に存在しているのかを計算により求めることができる。対象者位置特定部４５は、対象者映像が複数ある場合には、各対象者映像から算出された対象者３の方向の交点を算出することにより、３次元空間中での対象者３の位置を特定することができる。 The target person position specifying unit 45 receives the target person images and the information on the position of the target person 3 from the target person image detection unit 41. The target person position specifying unit 45 functions as a position specifying unit, and specifies the position of the target person 3 in three-dimensional space from the information indicating the position of the camera 2 and the direction of the optical axis of the camera 2, which is attached to the target person image, and from the position of the target person 3 in the target person image. From the information attached to the target person image, the target person position specifying unit 45 can specify the optical axis direction of the camera 2 in three-dimensional space. Further, since the position of the target person 3 in the target person image is known, the target person position specifying unit 45 can calculate in which direction the target person 3 lies relative to the position of the camera 2 in three-dimensional space. When there are a plurality of target person images, the target person position specifying unit 45 can specify the position of the target person 3 in three-dimensional space by calculating the intersection of the directions toward the target person 3 calculated from the respective target person images.

なお、対象者位置特定部４５は、対象者映像が１枚であっても、カメラ２から対象者３までの距離をあらかじめ定めた所定距離とみなすことにより、３次元空間中での対象者３の位置を特定することができる。 Note that even when there is only one target person image, the target person position specifying unit 45 can specify the position of the target person 3 in three-dimensional space by regarding the distance from the camera 2 to the target person 3 as a predetermined distance.
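
The two cases above (intersection of directions for a plurality of images, a predetermined distance for a single image) can be sketched as follows. This is an illustration under assumptions, not the computation of the disclosure: each image is assumed to have already been reduced to a ray (camera position plus direction toward the target person), only the first two rays are used, and because measured rays rarely meet exactly the sketch returns the midpoint of their closest approach. The name `locate_target` and the default distance of 3.0 are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def locate_target(rays, default_distance=3.0):
    """Estimate the target person's 3-D position from camera rays.

    rays: list of (camera_position, direction_toward_target) pairs, each a
        3-tuple in world coordinates; directions must be non-zero but need
        not be normalized.
    Returns a 3-tuple, or None if there is no ray or the rays are parallel.
    """
    if not rays:
        return None
    if len(rays) == 1:
        # Single image: assume the target is `default_distance` away.
        origin, direction = rays[0]
        norm = math.sqrt(_dot(direction, direction))
        return tuple(o + default_distance * d / norm
                     for o, d in zip(origin, direction))
    # Two or more images: this sketch uses only the first two rays.
    (p1, d1), (p2, d2) = rays[0], rays[1]
    w = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    dw1, dw2 = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None  # parallel rays have no single closest point
    s = (b * dw2 - c * dw1) / denom
    t = (a * dw2 - b * dw1) / denom
    q1 = tuple(p + s * d for p, d in zip(p1, d1))
    q2 = tuple(p + t * d for p, d in zip(p2, d2))
    # Midpoint of the closest points on the two rays.
    return tuple((x + y) / 2 for x, y in zip(q1, q2))
```

With three or more images, a least-squares fit over all rays would be a natural extension; that is left out here to keep the sketch short.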

対象者位置特定部４５は、特定した３次元空間中での対象者３の位置の情報を周囲映像検出部４６に出力する。 The target person position specifying unit 45 outputs information on the position of the target person 3 in the specified three-dimensional space to the surrounding image detection unit 46.

周囲映像検出部４６は、映像取得部１０から複数のカメラ２で撮影された映像を受け、対象者位置特定部４５から３次元空間中での対象者３の位置の情報を受ける。周囲映像検出部４６は、３次元空間中での対象者３の位置に基づいて、複数の映像の中から対象者３の周囲を撮影した映像（以下、「周囲映像」という）を検出する。例えば、周囲映像検出部４６は、対象者３の位置とカメラ２の位置とに基づいて、対象者３までの距離が所定の距離閾値以下のカメラ２によって撮影された映像を周囲映像として検出する。周囲映像検出部４６は、検出した周囲映像を周囲映像選択部４７に出力する。 The surrounding image detection unit 46 receives the videos captured by the plurality of cameras 2 from the image acquisition unit 10, and receives the information on the position of the target person 3 in three-dimensional space from the target person position specifying unit 45. Based on the position of the target person 3 in three-dimensional space, the surrounding image detection unit 46 detects, from among the plurality of videos, the videos that capture the surroundings of the target person 3 (hereinafter referred to as "surrounding images"). For example, based on the position of the target person 3 and the positions of the cameras 2, the surrounding image detection unit 46 detects, as a surrounding image, a video captured by a camera 2 whose distance to the target person 3 is equal to or less than a predetermined distance threshold. The surrounding image detection unit 46 outputs the detected surrounding images to the surrounding image selection unit 47.
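
The distance-threshold example above amounts to a simple filter over the camera positions. The sketch below assumes that camera positions are available as 3-D coordinates keyed by a camera id; the name `detect_surrounding_cameras` and the sample values are hypothetical.

```python
import math


def detect_surrounding_cameras(camera_positions, target_position, max_distance):
    """Return the ids of cameras whose distance to the target person is at
    most `max_distance` (the predetermined distance threshold).

    camera_positions: dict mapping a camera id to an (x, y, z) position.
    """
    return [
        camera_id
        for camera_id, position in camera_positions.items()
        if math.dist(position, target_position) <= max_distance
    ]
```

For example, with cameras at distances 0, 5 and 10 from the target person and a threshold of 5, the first two cameras are kept, since the comparison is inclusive ("equal to or less than").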

周囲映像選択部４７は、周囲映像検出部４６から周囲映像を受け、対象者向き検出部４２から対象者３の向きの情報を受ける。周囲映像選択部４７は、映像生成部として機能し、モードが視点共有モードの場合に、周囲映像の中から、対象者３の位置から対象者３の向きに対象者３の周囲を見た映像を、対象者３に関連する映像として選択する。つまり、周囲映像選択部４７は、周囲映像ごとに３次元空間中でのカメラ２の光軸の向きと対象者３の向きとの差分を算出する。周囲映像選択部４７は、差分が最小の周囲映像を対象者３に関連する映像として選択する。周囲映像選択部４７は、選択した対象者３に関連する映像をプライバシー保護処理部４４に出力する。 The surrounding image selection unit 47 receives the surrounding images from the surrounding image detection unit 46, and receives the information on the orientation of the target person 3 from the target person orientation detection unit 42. The surrounding image selection unit 47 functions as a video generation unit and, when the mode is the viewpoint sharing mode, selects from among the surrounding images the one showing the surroundings of the target person 3 as viewed from the position of the target person 3 in the orientation of the target person 3, as the video related to the target person 3. That is, the surrounding image selection unit 47 calculates, for each surrounding image, the difference between the direction of the optical axis of the camera 2 and the orientation of the target person 3 in three-dimensional space. The surrounding image selection unit 47 selects the surrounding image with the smallest difference as the video related to the target person 3. The surrounding image selection unit 47 outputs the selected video related to the target person 3 to the privacy protection processing unit 44.
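
The minimum-difference selection can be sketched as below. The text does not state how the difference between two directions is measured; this sketch assumes it is the angle between the optical axis vector and the orientation vector. The names `angle_between` and `select_surrounding_image` are hypothetical.

```python
import math


def angle_between(u, v):
    """Angle in radians between two non-zero 3-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_angle)


def select_surrounding_image(optical_axes, target_orientation):
    """Return the id of the camera whose optical axis is closest in
    direction to the target person's orientation.

    optical_axes: non-empty dict mapping a camera id to its optical axis
        vector in world coordinates.
    """
    return min(
        optical_axes,
        key=lambda camera_id: angle_between(optical_axes[camera_id],
                                            target_orientation),
    )
```

The function expects at least one candidate; the cases of zero or one surrounding image are handled separately in the processing procedure, so the selection only needs to run when two or more candidates exist.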

図５から図７は、周囲映像選択部４７による周囲映像の選択処理の一例を説明するための図である。 FIGS. 5 to 7 are diagrams for explaining an example of the surrounding image selection process performed by the surrounding image selection unit 47.

図５は、３次元空間中に位置する対象物を示す図である。例えば、３次元空間中に円錐８１及び球８２が配置されているものとする。矢印７０は、対象者３の向きを示しており、対象者３は、矢印７０の方向に円錐８１及び球８２を見ているものとする。また、矢印７１から矢印７３は、周囲映像を撮影する３台のカメラ２の光軸の向きを示しており、３台のカメラ２は、矢印７１から矢印７３のいずれかの方向に円錐８１及び球８２を撮影しているものとする。 FIG. 5 is a diagram showing objects located in three-dimensional space. For example, assume that a cone 81 and a sphere 82 are arranged in the three-dimensional space. The arrow 70 indicates the orientation of the target person 3, and the target person 3 is assumed to be looking at the cone 81 and the sphere 82 in the direction of the arrow 70. The arrows 71 to 73 indicate the directions of the optical axes of three cameras 2 that capture the surrounding images, and each of the three cameras 2 is assumed to be capturing the cone 81 and the sphere 82 in the direction of one of the arrows 71 to 73.

図６は、対象者３が見ている映像の一例を示す図である。対象者３は、左側に配置された円錐８１と、円錐８１との間に空間を開けて右側に配置された球８２とを見ている。 FIG. 6 is a diagram showing an example of the view seen by the target person 3. The target person 3 sees the cone 81 on the left side and the sphere 82 on the right side, with a gap between the sphere 82 and the cone 81.

図７の（Ａ）は、光軸の向きが矢印７１のカメラ２により撮影された映像を示しており、当該映像において球８２は円錐８１の背面に位置している。図７の（Ｂ）は、光軸の向きが矢印７２のカメラ２により撮影された映像を示しており、当該映像において、円錐８１と球８２とが空間を開けて配置されている。図７の（Ｃ）は、光軸の向きが矢印７３のカメラ２により撮影された映像を示しており、当該映像において円錐８１は球８２の背面に位置している。 (A) of FIG. 7 shows a video captured by the camera 2 whose optical axis points in the direction of the arrow 71; in this video, the sphere 82 is located behind the cone 81. (B) of FIG. 7 shows a video captured by the camera 2 whose optical axis points in the direction of the arrow 72; in this video, the cone 81 and the sphere 82 are arranged with a gap between them. (C) of FIG. 7 shows a video captured by the camera 2 whose optical axis points in the direction of the arrow 73; in this video, the cone 81 is located behind the sphere 82.

図５に示すように対象者３の向き（矢印７０）に最も近い光軸の向きは、矢印７２である。このため、周囲映像選択部４７は、図７の（Ｂ）に示す周囲映像を、対象者３に関連する映像として選択する。図７の（Ｄ）に対象者３に関連する映像を示している。 As shown in FIG. 5, the direction of the optical axis closest to the direction of the subject 3 (arrow 70) is arrow 72. Therefore, the ambient image selection unit 47 selects the ambient image shown in FIG. 7 (B) as the image related to the target person 3. FIG. 7 (D) shows an image related to the subject 3.

プライバシー保護処理部４４は、映像生成部として機能する。プライバシー保護処理部４４は、対象者映像検出部４１から対象者映像中での対象者３の位置の情報を受け、対象者映像選択部４３から対象者３に関連する映像として対象者映像を受け、周囲映像選択部４７から対象者３に関連する映像として周囲映像を受ける。 The privacy protection processing unit 44 functions as a video generation unit. The privacy protection processing unit 44 receives the information on the position of the target person 3 in the target person image from the target person image detection unit 41, receives a target person image as the video related to the target person 3 from the target person image selection unit 43, and receives a surrounding image as the video related to the target person 3 from the surrounding image selection unit 47.

プライバシー保護処理部４４は、モードが自撮りモードの場合には、対象者映像選択部４３から受けた対象者映像の中から、人物の像を検出する。人物の像の検出には、例えば、映像から顔画像を検出する顔画像検出技術を用いることができる。 When the mode is the selfie mode, the privacy protection processing unit 44 detects images of persons in the target person image received from the target person image selection unit 43. To detect images of persons, for example, a face image detection technique that detects face images in a video can be used.

プライバシー保護処理部４４は、対象者３の位置に存在する人物の像、つまり対象者３の像は残し、それ以外の人物の像に対してモザイクを掛けるモザイク処理や、所定の映像で当該像をマスクするマスク処理等のプライバシー保護処理を施す。これにより、対象者３以外の人物を特定しにくくする。 The privacy protection processing unit 44 leaves intact the image of the person located at the position of the target person 3, that is, the image of the target person 3, and applies privacy protection processing to the images of all other persons, such as mosaic processing that pixelates the image or mask processing that covers the image with a predetermined picture. This makes it difficult to identify persons other than the target person 3.

プライバシー保護処理部４４は、モードが視点共有モードの場合には、周囲映像選択部４７から受けた周囲映像の中から、人物の像を検出する。プライバシー保護処理部４４は、検出した人物の像に対してプライバシー保護処理を施す。これにより、対象者３以外の人物を特定しにくくする。 When the mode is the viewpoint sharing mode, the privacy protection processing unit 44 detects an image of a person from the surrounding images received from the surrounding image selection unit 47. The privacy protection processing unit 44 performs privacy protection processing on the detected image of the person. This makes it difficult to identify a person other than the target person 3.

プライバシー保護処理部４４は、プライバシー保護処理が施された対象者３に関連する映像を映像送信部５０に出力する。 The privacy protection processing unit 44 outputs a video related to the subject 3 to which the privacy protection processing has been performed to the video transmission unit 50.
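
The privacy protection step for both modes can be sketched as follows. The sketch assumes that person detection has already produced bounding boxes, represents a frame as a list of pixel rows, and uses a flat fill as the mask; actual mosaic processing would pixelate the region instead. The names `regions_to_protect` and `mask_regions` are hypothetical.

```python
def regions_to_protect(person_boxes, target_position=None):
    """Return the person boxes that must be obscured.

    person_boxes: list of (left, top, right, bottom) boxes, right and bottom
        exclusive.
    target_position: (x, y) of the target person in the frame in selfie
        mode, or None in viewpoint sharing mode, where every detected
        person is obscured.
    """
    def contains(box, point):
        left, top, right, bottom = box
        return left <= point[0] < right and top <= point[1] < bottom

    return [
        box for box in person_boxes
        if target_position is None or not contains(box, target_position)
    ]


def mask_regions(frame, boxes, fill=0):
    """Return a copy of `frame` (a list of pixel rows) with each box filled."""
    masked = [row[:] for row in frame]
    for left, top, right, bottom in boxes:
        for y in range(max(top, 0), min(bottom, len(masked))):
            for x in range(max(left, 0), min(right, len(masked[y]))):
                masked[y][x] = fill
    return masked
```

In selfie mode the box that contains the position of the target person 3 is left untouched, and in viewpoint sharing mode no box is exempt.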

映像送信部５０は、プライバシー保護処理部４４から対象者３に関連する映像を受け、当該映像を端末装置５に送信する。 The video transmission unit 50 receives the video related to the target person 3 from the privacy protection processing unit 44 and transmits the video to the terminal device 5.

〔映像処理装置１の処理手順〕 [Processing procedure of video processing device 1]
図８は、映像処理装置１の処理手順の一例を示すフローチャートである。 FIG. 8 is a flowchart showing an example of the processing procedure of the video processing device 1.

対象者情報取得部２０は、対象者３の所持する端末装置から、対象者情報として対象者３の顔画像を取得する（ステップＳ１）。 The target person information acquisition unit 20 acquires the face image of the target person 3 as the target person information from the terminal device possessed by the target person 3 (step S1).

モード受付部３０は、対象者３の所持する端末装置から対象者３が選択したモードを受け付ける（ステップＳ２）。 The mode reception unit 30 receives the mode selected by the target person 3 from the terminal device possessed by the target person 3 (step S2).

映像処理部４０は、受け付けたモードが自撮りモードの場合には（ステップＳ３においてＹＥＳ）、自撮りモード映像処理を実行する（ステップＳ４）。自撮りモード映像処理（ステップＳ４）の詳細については後述する。 When the received mode is the selfie mode (YES in step S3), the video processing unit 40 executes the selfie mode video processing (step S4). The details of the selfie mode video processing (step S4) will be described later.

映像処理部４０は、受け付けたモードが視点共有モードの場合には（ステップＳ３においてＮＯ、ステップＳ５においてＹＥＳ）、視点共有モード映像処理を実行する（ステップＳ６）。視点共有モード映像処理（ステップＳ６）については後述する。 When the received mode is the viewpoint sharing mode (NO in step S3, YES in step S5), the video processing unit 40 executes the viewpoint sharing mode video processing (step S6). The viewpoint sharing mode video processing (step S6) will be described later.

自撮りモード及び視点共有モード以外の誤ったモードを受け付けた場合、又は、モードを受け付けていない場合には（ステップＳ３においてＮＯ、ステップＳ５においてＮＯ）、映像処理装置１は、処理を終了する。 If an invalid mode other than the selfie mode and the viewpoint sharing mode is received, or if no mode is received (NO in step S3, NO in step S5), the video processing device 1 ends the process.
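
The branching of steps S3 to S6 can be sketched as a small dispatcher. The enum `Mode`, the function `dispatch`, and the callables passed to it are hypothetical names introduced only for this illustration; the two callables stand in for the selfie mode video processing (step S4) and the viewpoint sharing mode video processing (step S6).

```python
from enum import Enum


class Mode(Enum):
    SELFIE = "selfie"
    VIEWPOINT_SHARING = "viewpoint_sharing"


def dispatch(mode, run_selfie, run_viewpoint_sharing):
    """Run the processing that matches `mode`.

    Returns the result of the selected processing, or None when the mode is
    invalid or missing, in which case the procedure simply ends.
    """
    if mode is Mode.SELFIE:                # step S3: YES
        return run_selfie()                # step S4
    if mode is Mode.VIEWPOINT_SHARING:     # step S5: YES
        return run_viewpoint_sharing()     # step S6
    return None                            # step S3: NO, step S5: NO
```

Anything that is not one of the two enum members, including `None`, falls through to the final branch.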

〔自撮りモード映像処理について〕 [Selfie mode video processing]
図９は、自撮りモード映像処理（図８のステップＳ４）の詳細を示すフローチャートである。 FIG. 9 is a flowchart showing the details of the selfie mode video processing (step S4 in FIG. 8).

映像取得部１０は、複数のカメラ２から、当該カメラ２で撮影された映像を取得する（ステップＳ１１）。 The image acquisition unit 10 acquires images captured by the cameras 2 from the plurality of cameras 2 (step S11).

対象者映像検出部４１は、対象者情報取得処理（図８のステップＳ１）で取得された対象者情報としての対象者３の顔画像に基づいて、ステップＳ１１で取得された複数の映像の中から、対象者３が映っている対象者映像を検出する（ステップＳ１２）。 Based on the face image of the target person 3 acquired as the target person information in the target person information acquisition process (step S1 in FIG. 8), the target person image detection unit 41 detects, from among the plurality of videos acquired in step S11, the target person images in which the target person 3 appears (step S12).

映像処理部４０は、ステップＳ１２において検出された対象者映像の数を判定する（ステップＳ１３）。 The image processing unit 40 determines the number of target person images detected in step S12 (step S13).

対象者映像の数が０の場合には（ステップＳ１３において０）、ステップＳ１４以降の処理は実行されない。 When the number of target person images is 0 (0 in step S13), the processing after step S14 is not executed.

対象者映像の数が２以上の場合には（ステップＳ１３において２以上）、対象者向き検出部４２は、ステップＳ１２において検出された対象者映像ごとに、対象者３の向きを検出する（ステップＳ１４）。 When the number of target person images is 2 or more (2 or more in step S13), the target person orientation detection unit 42 detects the orientation of the target person 3 for each target person image detected in step S12 (step S14).

対象者映像選択部４３は、ステップＳ１４において検出された対象者３の向きごとに、対象者３の向きとカメラ２の光軸の向きとの差分を算出する（ステップＳ１５）。 The target person image selection unit 43 calculates the difference between the direction of the target person 3 and the direction of the optical axis of the camera 2 for each direction of the target person 3 detected in step S14 (step S15).

対象者映像選択部４３は、ステップＳ１２において検出された対象者映像の内、ステップＳ１５において算出された差分が最小となる対象者映像を選択する（ステップＳ１６）。 The target person image selection unit 43 selects the target person image having the smallest difference calculated in step S15 from the target person images detected in step S12 (step S16).

プライバシー保護処理部４４は、ステップＳ１６において選択された対象者映像に対し、対象者３以外の人物の像にプライバシー保護処理を施す（ステップＳ１７）。 The privacy protection processing unit 44 performs privacy protection processing on the image of a person other than the target person 3 with respect to the target person video selected in step S16 (step S17).

映像送信部５０は、ステップＳ１７においてプライバシー保護処理が施された対象者映像を、ユーザ４の利用する端末装置５に送信する（ステップＳ１８）。 The video transmission unit 50 transmits the target person video to which the privacy protection processing has been performed in step S17 to the terminal device 5 used by the user 4 (step S18).

ステップＳ１２において検出された対象者映像の数が１の場合には（ステップＳ１３において１）、当該対象者映像に対してプライバシー保護処理（ステップＳ１７）及び映像送信処理（ステップＳ１８）が行われる。 When the number of target person images detected in step S12 is 1 (1 in step S13), the privacy protection processing (step S17) and the video transmission processing (step S18) are performed on that target person image.

映像処理部４０は、ステップＳ１１からステップＳ１８までの処理を対象者３とユーザ４との間の通話が終了するまで繰り返し実行する。 The video processing unit 40 repeatedly executes the processes from step S11 to step S18 until the call between the target person 3 and the user 4 is completed.
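
The three-way branch at step S13 can be summarized in a short sketch. It assumes that each detected target person image has been reduced to an id and, where computed, the difference between the optical axis direction and the orientation of the target person 3 (steps S14 and S15); the name `choose_selfie_image` and this data layout are hypothetical.

```python
def choose_selfie_image(target_person_images):
    """Pick the image to send in one pass of the selfie mode loop.

    target_person_images: list of (image_id, difference) pairs; the
        difference may be None when only one image was detected, because
        steps S14 to S16 are skipped in that case.
    Returns the chosen image id, or None when no image was detected.
    """
    if not target_person_images:
        return None                            # step S13: 0, nothing to send
    if len(target_person_images) == 1:
        return target_person_images[0][0]      # step S13: 1, go to step S17
    # Step S13: 2 or more. Select the smallest difference (step S16).
    return min(target_person_images, key=lambda item: item[1])[0]
```

The chosen image then goes through the privacy protection processing (step S17) and transmission (step S18), and the whole pass repeats until the call ends.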

〔視点共有モード映像処理について〕 [Viewpoint sharing mode video processing]
図１０は、視点共有モード映像処理（図８のステップＳ６）の詳細を示すフローチャートである。 FIG. 10 is a flowchart showing the details of the viewpoint sharing mode video processing (step S6 in FIG. 8).

映像取得部１０は、複数のカメラ２から、当該カメラ２で撮影された映像を取得する（ステップＳ２１）。 The image acquisition unit 10 acquires images captured by the cameras 2 from the plurality of cameras 2 (step S21).

対象者映像検出部４１は、対象者情報取得処理（図８のステップＳ１）で取得された対象者情報としての対象者３の顔画像に基づいて、ステップＳ２１で取得された複数の映像の中から、対象者３が映っている対象者映像と、対象者映像中の対象者３の位置とを検出する（ステップＳ２２）。 Based on the face image of the target person 3 acquired as the target person information in the target person information acquisition process (step S1 in FIG. 8), the target person image detection unit 41 detects, from among the plurality of videos acquired in step S21, the target person images in which the target person 3 appears and the position of the target person 3 in each target person image (step S22).

映像処理部４０は、ステップＳ２２において対象者映像が検出されたか否かを判定する（ステップＳ２３）。 The image processing unit 40 determines whether or not the target person image is detected in step S22 (step S23).

対象者映像が検出されなかった場合には（ステップＳ２３においてＮＯ）、ステップＳ２４以降の処理は実行されない。 If the subject image is not detected (NO in step S23), the processing after step S24 is not executed.

対象者映像が検出された場合には（ステップＳ２３においてＹＥＳ）、対象者向き検出部４２は、ステップＳ２２において検出された対象者映像ごとに、対象者３の向きを検出する（ステップＳ２４）。 When the target person image is detected (YES in step S23), the target person orientation detection unit 42 detects the orientation of the target person 3 for each target person image detected in step S22 (step S24).

対象者位置特定部４５は、ステップＳ２２において検出された対象者映像及び対象者３の位置の情報に基づいて、３次元空間中での対象者３の位置を特定する（ステップＳ２５）。 The target person position specifying unit 45 identifies the position of the target person 3 in the three-dimensional space based on the target person image detected in step S22 and the position information of the target person 3 (step S25).

周囲映像検出部４６は、ステップＳ２５において特定された３次元空間中での対象者３の位置に基づいて、ステップＳ２１において取得された映像の中から、対象者３の周囲を撮影した周囲映像を検出する（ステップＳ２６）。 Based on the position of the target person 3 in three-dimensional space specified in step S25, the surrounding image detection unit 46 detects, from among the videos acquired in step S21, the surrounding images that capture the surroundings of the target person 3 (step S26).

映像処理部４０は、ステップＳ２６において検出された周囲映像の数を判定する（ステップＳ２７）。 The image processing unit 40 determines the number of ambient images detected in step S26 (step S27).

周囲映像の数が０の場合には（ステップＳ２７において０）、ステップＳ２８以降の処理は実行されない。 When the number of surrounding images is 0 (0 in step S27), the processing after step S28 is not executed.

周囲映像の数が２以上の場合には（ステップＳ２７において２以上）、周囲映像選択部４７は、ステップＳ２６において検出された周囲映像ごとに、３次元空間中でのカメラ２の光軸の向きと対象者３の向きとの差分を算出する（ステップＳ２８）。 When the number of surrounding images is 2 or more (2 or more in step S27), the surrounding image selection unit 47 calculates, for each surrounding image detected in step S26, the difference between the direction of the optical axis of the camera 2 and the orientation of the target person 3 in three-dimensional space (step S28).

周囲映像選択部４７は、ステップＳ２６において検出された周囲映像の内、ステップＳ２８において算出された差分が最小となる周囲映像を選択する（ステップＳ２９）。 The ambient image selection unit 47 selects the ambient image having the smallest difference calculated in step S28 from the ambient images detected in step S26 (step S29).

プライバシー保護処理部４４は、ステップＳ２９において選択された周囲映像に対し、人物の像にプライバシー保護処理を施す（ステップＳ３０）。 The privacy protection processing unit 44 applies privacy protection processing to the image of a person with respect to the surrounding image selected in step S29 (step S30).

映像送信部５０は、ステップＳ３０においてプライバシー保護処理が施された周囲映像を、ユーザ４の利用する端末装置５に送信する（ステップＳ３１）。 The video transmission unit 50 transmits the surrounding video to which the privacy protection processing has been performed in step S30 to the terminal device 5 used by the user 4 (step S31).

ステップＳ２６において検出された周囲映像の数が１の場合には（ステップＳ２７において１）、当該周囲映像に対してプライバシー保護処理（ステップＳ３０）及び映像送信処理（ステップＳ３１）が行われる。 When the number of ambient images detected in step S26 is 1 (1 in step S27), privacy protection processing (step S30) and video transmission processing (step S31) are performed on the ambient video.

映像処理部４０は、ステップＳ２１からステップＳ３１までの処理を対象者３とユーザ４との間の通話が終了するまで繰り返し実行する。 The video processing unit 40 repeatedly executes the processes from step S21 to step S31 until the call between the target person 3 and the user 4 is completed.

〔実施形態１の効果〕 [Effects of Embodiment 1]
以上説明したように、本開示の実施形態１によると、映像処理装置１は、ネットワーク７を介して複数のカメラ２から取得された複数の映像から、対象者３の向き及び位置に基づいて、対象者３に関連する映像を生成することができる。このため、対象者３がカメラ２が配置された専用端末装置を装着せずとも、対象者３に関連する映像を生成することができる。これにより、対象者３に過度な負担を強いることなく、対象者３に関連する映像を生成することができる。 As described above, according to the first embodiment of the present disclosure, the video processing device 1 can generate a video related to the target person 3 from the plurality of videos acquired from the plurality of cameras 2 via the network 7, based on the orientation and position of the target person 3. Therefore, a video related to the target person 3 can be generated even if the target person 3 does not wear a dedicated terminal device equipped with a camera 2. As a result, a video related to the target person 3 can be generated without imposing an excessive burden on the target person 3.

また、映像処理装置１は、専用のカメラ２を用いなくても、対象者３の周囲に存在するカメラ２を用いて、対象者３に関連する映像を生成することができる。つまり、対象者３がカメラ２のバッテリー残量や発熱等を気にする必要なく、対象者３に関連する映像を生成することができる。 Further, the video processing device 1 can generate a video related to the target person 3 by using the cameras 2 present around the target person 3, without using a dedicated camera 2. That is, a video related to the target person 3 can be generated without the target person 3 having to worry about the remaining battery level, heat generation, and the like of a camera 2.

また、モードが自撮りモードの場合には、対象者３に対向する位置から対象者３を撮影した、いわゆる自撮り映像を生成することができる。 Further, when the mode is the selfie mode, a so-called selfie video, in which the target person 3 is captured from a position facing the target person 3, can be generated.

また、対象者３の像を含む映像の中から、対象者３の向きに最も近い向きから対象者３を撮影した映像を、いわゆる自撮り映像として生成することができる。このため、高速に自撮り映像を生成することができる。 Further, from among the videos containing the image of the target person 3, the video in which the target person 3 is captured from the direction closest to the orientation of the target person 3 can be generated as a so-called selfie video. Therefore, a selfie video can be generated at high speed.

また、モードが視点共有モードの場合には、例えば、対象者３の視線の先にある対象の映像、つまり、対象者３が見ているのと同じ対象の映像を生成することができる。 Further, when the mode is the viewpoint sharing mode, it is possible to generate, for example, a video of the object that lies in the line of sight of the target person 3, that is, a video of the same object that the target person 3 is looking at.

また、複数の映像から、例えば、対象者３の視線の先にある対象の映像を選択することができる。このため、対象者３から周囲を見た映像を高速に生成することができる。 Further, from a plurality of images, for example, an image of the target located in front of the line of sight of the target person 3 can be selected. Therefore, it is possible to generate an image of the surroundings viewed from the subject 3 at high speed.

また、対象者３の向きは、対象者３の視線の向き、対象者３の顔の向き、及び対象者３の体の向きの少なくとも１つを含む。このため、対象者３の視線の向き、顔の向き又は体の向きから対象者３を見た映像、又は対象者３の位置から当該向きに周囲を見た映像を生成することができる。 The orientation of the subject 3 includes at least one of the orientation of the subject 3's line of sight, the orientation of the subject 3's face, and the orientation of the subject 3's body. Therefore, it is possible to generate an image of the target person 3 viewed from the direction of the line of sight, the direction of the face, or the direction of the body of the target person 3, or an image of the surroundings viewed from the position of the target person 3 in that direction.

Further, privacy protection processing is applied to the images of persons other than the target person 3. Therefore, a video that protects the privacy of those persons can be generated.

Further, the video processing device 1 can transmit the generated video to the terminal device 5. Therefore, the target person 3 can hold a video call or a video conference with the user 4 of the terminal device 5 without being conscious of the cameras 2.

<Embodiment 2>
In the first embodiment, a video related to the target person 3 is generated by selecting one video from among the videos captured by the cameras 2. In the second embodiment, an example is described in which a video related to the target person 3 is generated by synthesizing a plurality of videos captured by the cameras 2.

The overall configuration of the video processing system according to the second embodiment of the present disclosure is the same as that shown in FIG. 1.

Further, the functional configuration of the video processing device 1 according to the second embodiment of the present disclosure is the same as that shown in FIG. 2. However, the detailed configuration of the video processing unit 40 differs from that of the first embodiment.

[Configuration of video processing unit 40]
FIG. 11 is a block diagram showing the detailed configuration of the video processing unit 40 included in the video processing device 1 shown in FIG. 2.

The video processing unit 40 includes a target person video detection unit 41, a target person orientation detection unit 42, a self-portrait video generation unit 48, a privacy protection processing unit 44, a target person position identification unit 45, a surrounding video detection unit 46, and a target person viewpoint video generation unit 49.

The processing executed by the target person video detection unit 41, the target person orientation detection unit 42, the target person position identification unit 45, and the surrounding video detection unit 46 is the same as in the first embodiment.

However, the target person video detection unit 41 outputs the detected target person videos to the self-portrait video generation unit 48. The target person orientation detection unit 42 outputs the orientation information of the target person 3 to the self-portrait video generation unit 48 and the target person viewpoint video generation unit 49. The surrounding video detection unit 46 outputs the detected surrounding videos to the target person viewpoint video generation unit 49.

The self-portrait video generation unit 48 receives the target person videos from the target person video detection unit 41 and receives the orientation information of the target person 3 from the target person orientation detection unit 42. The self-portrait video generation unit 48 functions as a video generation unit and, when the mode is the self-portrait mode, synthesizes the target person videos to generate a video in which the target person 3 is captured in the orientation of the target person 3 from a position facing the target person 3. This video is what is called a self-portrait video. Specifically, the self-portrait video generation unit 48 synthesizes the self-portrait video by interpolation, performing viewpoint conversion processing based on the plurality of target person videos. Viewpoint conversion processing is disclosed in, for example, Non-Patent Document 3. The self-portrait video generation unit 48 outputs the generated self-portrait video to the privacy protection processing unit 44 as the video related to the target person 3.

FIG. 12 is a diagram for explaining an example of the self-portrait video generation processing performed by the self-portrait video generation unit 48. (A) to (C) of FIG. 12 show the target person videos received from the target person video detection unit 41. Here, when the orientation of the target person 3 is taken to be, for example, the orientation of the face of the target person 3, the self-portrait video generation unit 48 generates the self-portrait video by synthesizing, through interpolation, a video of the target person 3 captured from the front from the three target person videos shown in (A) to (C) of FIG. 12. (D) of FIG. 12 shows the generated self-portrait video.
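
The viewpoint conversion processing itself is left to Non-Patent Document 3 and is not reproduced here. Purely to illustrate the idea of interpolating between several source views, the following Python sketch weights each source view by the inverse of its angular distance from the desired virtual viewpoint and blends the views pixel by pixel. It assumes that the source views have already been geometrically aligned to the virtual viewpoint, which is a strong simplification; the function names, the weighting rule, and the grayscale list-of-rows image format are assumptions introduced for this example.

```python
def blend_weights(view_angles_deg, target_angle_deg=0.0, eps=1e-6):
    """Inverse-angular-distance weights, normalised so that they sum to 1.

    view_angles_deg: angle of each source view relative to a common reference,
    assumed not to wrap around (for example, all within -180 to 180 degrees).
    """
    raw = [1.0 / (abs(a - target_angle_deg) + eps) for a in view_angles_deg]
    total = sum(raw)
    return [r / total for r in raw]

def blend_pixels(images, weights):
    """Weighted per-pixel average of equally sized grayscale images (lists of rows)."""
    height, width = len(images[0]), len(images[0][0])
    return [
        [sum(w * img[y][x] for img, w in zip(images, weights)) for x in range(width)]
        for y in range(height)
    ]

# Three pre-aligned 1x2 source views taken at -30, +10 and +40 degrees from the front.
weights = blend_weights([-30.0, 10.0, 40.0])
views = [[[0.0, 0.0]], [[100.0, 100.0]], [[200.0, 200.0]]]
synthesized = blend_pixels(views, weights)
```

The view closest to the front (+10 degrees) receives the largest weight, so it dominates the synthesized frame.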

The target person viewpoint video generation unit 49 receives the surrounding videos from the surrounding video detection unit 46 and receives the orientation information of the target person 3 from the target person orientation detection unit 42. The target person viewpoint video generation unit 49 functions as a video generation unit and, when the mode is the viewpoint sharing mode, synthesizes the surrounding videos to generate a video in which the surroundings of the target person 3 are viewed in the orientation of the target person 3 from the position of the target person 3 (hereinafter referred to as the "target person viewpoint video"). Specifically, the target person viewpoint video generation unit 49 synthesizes the target person viewpoint video by interpolation, performing viewpoint conversion processing based on the plurality of surrounding videos. Viewpoint conversion processing is disclosed in, for example, Non-Patent Document 3. The target person viewpoint video generation unit 49 outputs the generated target person viewpoint video to the privacy protection processing unit 44 as the video related to the target person 3.

FIG. 13 is a diagram for explaining an example of the target person viewpoint video generation processing performed by the target person viewpoint video generation unit 49. (A) to (C) of FIG. 13 show the same surrounding videos as those shown in (A) to (C) of FIG. 7, respectively. That is, with reference to FIG. 5, (A) of FIG. 13 shows the surrounding video captured by the camera 2 whose optical axis points in the direction of arrow 71, (B) of FIG. 13 shows the surrounding video captured by the camera 2 whose optical axis points in the direction of arrow 72, and (C) of FIG. 13 shows the surrounding video captured by the camera 2 whose optical axis points in the direction of arrow 73. The target person viewpoint video generation unit 49 synthesizes the target person viewpoint video by interpolation from the three surrounding videos shown in (A) to (C) of FIG. 13. (D) of FIG. 13 shows the generated target person viewpoint video. The target person viewpoint video is similar to the view seen by the target person 3 shown in FIG. 6.

The privacy protection processing unit 44 receives, from the target person video detection unit 41, information on the position of the target person 3 in the target person videos; receives the self-portrait video from the self-portrait video generation unit 48 as the video related to the target person 3; and receives the target person viewpoint video from the target person viewpoint video generation unit 49 as the video related to the target person 3.

When the mode is the self-portrait mode, the privacy protection processing unit 44 detects images of persons in the self-portrait video received from the self-portrait video generation unit 48. The privacy protection processing unit 44 leaves the image of the person located at the position of the target person 3, that is, the image of the target person 3, unchanged and applies privacy protection processing to the images of the other persons. This makes persons other than the target person 3 difficult to identify.

When the mode is the viewpoint sharing mode, the privacy protection processing unit 44 detects images of persons in the target person viewpoint video received from the target person viewpoint video generation unit 49. The privacy protection processing unit 44 applies privacy protection processing to the detected images of persons. This makes persons other than the target person 3 difficult to identify.
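
The two cases handled by the privacy protection processing unit 44 can be sketched as follows, assuming that person detection has already produced bounding boxes and that the frame is a grayscale image stored as a list of rows. Replacing a region with its mean value stands in for whatever privacy protection processing is actually used; the function names and data layout are hypothetical.

```python
def contains(box, point):
    """True if point (x, y) lies inside box (left, top, right, bottom); right/bottom exclusive."""
    left, top, right, bottom = box
    return left <= point[0] < right and top <= point[1] < bottom

def mask_region(image, box):
    """Replace every pixel in a non-empty box with the box's mean value (crude anonymisation)."""
    left, top, right, bottom = box
    values = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
    mean = sum(values) / len(values)
    for y in range(top, bottom):
        for x in range(left, right):
            image[y][x] = mean

def protect_privacy(image, person_boxes, target_point=None):
    """Mask every detected person except the one located at target_point.

    Self-portrait mode: pass the target person's position so that their image is kept.
    Viewpoint sharing mode: pass target_point=None so that every detected person is masked.
    """
    for box in person_boxes:
        if target_point is not None and contains(box, target_point):
            continue  # keep the target person's image untouched
        mask_region(image, box)
    return image

frame = [[float(x + 10 * y) for x in range(6)] for y in range(4)]
boxes = [(0, 0, 2, 2), (3, 1, 5, 3)]
protected = protect_privacy(frame, boxes, target_point=(1, 1))
```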

[Processing procedure of video processing device 1]
The processing procedure of the video processing device 1 is the same as that of the video processing device 1 according to the first embodiment shown in FIG. 8. However, the details of the self-portrait mode video processing (S4) and the viewpoint sharing mode video processing (S6) differ from those of the first embodiment.

[Self-portrait mode video processing]
FIG. 14 is a flowchart showing details of the self-portrait mode video processing (step S4 in FIG. 8).

The processing of steps S11 to S14, step S17, and step S18 is the same as that shown in FIG. 9.

In the second embodiment, the processing of step S101 is executed in place of the processing of steps S15 and S16 shown in FIG. 9.

That is, the self-portrait video generation unit 48 synthesizes the target person videos to generate a self-portrait video in which the target person 3 is captured in the orientation of the target person 3 from a position facing the target person 3 (step S101).

In the privacy protection processing (step S17), privacy protection processing is applied to the self-portrait video, and in the video transmission processing (step S18), the self-portrait video to which the privacy protection processing has been applied is transmitted to the terminal device 5.

[Viewpoint sharing mode video processing]
FIG. 15 is a flowchart showing details of the viewpoint sharing mode video processing (step S6 in FIG. 8).

The processing of steps S21 to S27, step S30, and step S31 is the same as that shown in FIG. 10.

In the second embodiment, the processing of step S102 is executed in place of the processing of steps S28 and S29 shown in FIG. 10.

That is, the target person viewpoint video generation unit 49 generates the target person viewpoint video by synthesizing the surrounding videos (step S102).

In the privacy protection processing (step S30), privacy protection processing is applied to the target person viewpoint video, and in the video transmission processing (step S31), the target person viewpoint video to which the privacy protection processing has been applied is transmitted to the terminal device 5.
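
The way the second embodiment substitutes a single synthesis step for the two selection-related steps of the first embodiment can be summarized by the following Python sketch. It only lists step labels and assumes that the steps run in numerical order and that steps S11 to S14 and S21 to S27 are numbered consecutively; the function names are hypothetical.

```python
def self_portrait_steps(embodiment):
    """Step labels of the self-portrait mode video processing (FIG. 9 or FIG. 14)."""
    # Embodiment 1 uses S15 and S16; embodiment 2 replaces them with S101.
    middle = ["S15", "S16"] if embodiment == 1 else ["S101"]
    return ["S11", "S12", "S13", "S14"] + middle + ["S17", "S18"]

def viewpoint_sharing_steps(embodiment):
    """Step labels of the viewpoint sharing mode video processing (FIG. 10 or FIG. 15)."""
    common = [f"S{n}" for n in range(21, 28)]  # S21 to S27
    # Embodiment 1 uses S28 and S29; embodiment 2 replaces them with S102.
    middle = ["S28", "S29"] if embodiment == 1 else ["S102"]
    return common + middle + ["S30", "S31"]

embodiment_2_self_portrait = self_portrait_steps(2)
embodiment_2_viewpoint = viewpoint_sharing_steps(2)
```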

[Effects of Embodiment 2]
As described above, according to the second embodiment of the present disclosure, a so-called self-portrait video can be generated by synthesizing videos that include an image of the target person 3. Therefore, even when no camera 2 provides a video of the target person 3 captured in the orientation of the target person 3, a video of the target person captured in that orientation can be generated.

Further, by synthesizing a plurality of surrounding videos, it is possible to generate, for example, a video of the object that lies ahead of the line of sight of the target person 3. Therefore, even when such a video cannot be generated from a single surrounding video, a video of the same object that the target person 3 is looking at can be generated.

[Additional Notes]
Although the video processing device 1 according to the embodiments of the present disclosure has been described above, the present disclosure is not limited to these embodiments.

For example, in the embodiments described above, the video related to the target person 3 is transmitted to the terminal device 5 used by the user 4; however, the video processing device 1 may also generate a video related to the user 4 in the same manner as the video related to the target person 3 and transmit it to a terminal device used by the target person 3. This makes a two-way videophone call between the target person 3 and the user 4 possible.

Further, the application of the video processing device 1 is not limited to videophones. For example, the video processing device 1 can also be applied to video conferencing. Suppose, for example, that a plurality of cameras 2 installed in a first conference room at a first site (hereinafter referred to as "first cameras") capture the inside of the first conference room, and a plurality of cameras 2 installed in a second conference room at a second site (hereinafter referred to as "second cameras") capture the inside of the second conference room. The video processing device 1 generates a video related to a user in the first conference room from the videos captured by the plurality of first cameras, and transmits it to a second terminal device for video conferencing installed in the second conference room. The second terminal device receives the video and displays it on a display device, thereby showing the video related to the user in the first conference room to a user in the second conference room. Similarly, the video processing device 1 generates a video related to a user in the second conference room from the videos captured by the plurality of second cameras, and transmits it to a first terminal device for video conferencing installed in the first conference room. The first terminal device receives the video and displays it on a display device, thereby showing the video related to the user in the second conference room to a user in the first conference room.

Further, the target person position identification unit 45 of the video processing unit 40 identifies the position of the target person 3 in three-dimensional space from the position information of the cameras 2, the position of the target person 3 in the videos, and the like; however, when the position of the target person 3 can be obtained by another method, that position information may be used instead. For example, when a terminal device carried by the target person 3 has a GPS (Global Positioning System) sensor function, the target person position identification unit 45 may receive, from the terminal device, the position information of the target person 3 acquired by the terminal device and use it as the position information of the target person 3 in three-dimensional space.
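
How a GPS fix is mapped into the three-dimensional space used by the video processing device 1 is not specified above. As one possible sketch, the Python code below converts latitude, longitude, and altitude into east, north, and up offsets in meters from a reference point, using the equirectangular approximation, which is adequate only over short distances. The choice of reference point, the coordinate convention, and the function name are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

def gps_to_local(lat_deg, lon_deg, alt_m, origin):
    """Convert a GPS fix to (east, north, up) meters relative to origin.

    origin: (latitude in degrees, longitude in degrees, altitude in meters) of a
    hypothetical reference point, for example a known camera position.
    Uses the equirectangular approximation, valid for small areas only.
    """
    lat0, lon0, alt0 = origin
    d_lat = math.radians(lat_deg - lat0)
    d_lon = math.radians(lon_deg - lon0)
    east = EARTH_RADIUS_M * d_lon * math.cos(math.radians(lat0))
    north = EARTH_RADIUS_M * d_lat
    up = alt_m - alt0
    return east, north, up

origin = (35.0, 135.0, 10.0)
position = gps_to_local(35.0001, 135.0, 11.5, origin)
```

In this example a latitude difference of 0.0001 degrees corresponds to roughly 11 m to the north of the reference point.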

Further, the target person orientation detection unit 42 of the video processing unit 40 detects the orientation of the target person 3 based on the target person videos; however, when the orientation of the target person 3 can be obtained by another method, that orientation information may be used instead. For example, when the target person 3 wears a terminal device, such as a smartphone, that is equipped with a geomagnetic sensor (electronic compass) or with a combination of a geomagnetic sensor and a gyro sensor, for example by carrying it in a pocket of their clothing, the target person orientation detection unit 42 may detect the orientation of the target person based on the sensor output and use the detection result as the orientation information of the target person. Note that the terminal device may be a dedicated sensor for detecting orientation, such as the geomagnetic sensor described above, that can be worn by the target person 3. The terminal device may also be a wearable device (for example, smart glasses) equipped with the geomagnetic sensor or the like described above.
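
As a rough illustration of deriving an orientation from such sensor output, the Python sketch below computes a compass heading from the horizontal components of a geomagnetic sensor reading and optionally combines it with a gyro yaw rate using a simple complementary filter. It assumes that the device is held level, that mx is measured along the forward axis of the device and my along its right-hand axis, and that the yaw rate is positive clockwise; tilt compensation, magnetic declination, and sensor calibration are ignored. The function names and the filter coefficient are hypothetical.

```python
import math

def heading_from_magnetometer(mx, my):
    """Heading in degrees clockwise from magnetic north (0 to 360).

    mx: horizontal field along the device's forward axis.
    my: horizontal field along the device's right-hand axis.
    Assumes the device is level.
    """
    return math.degrees(math.atan2(-my, mx)) % 360.0

def fuse_heading(prev_deg, gyro_rate_dps, dt_s, mag_deg, alpha=0.98):
    """Complementary filter: integrate the gyro, then nudge toward the magnetometer heading."""
    predicted = prev_deg + gyro_rate_dps * dt_s
    # Wrap the correction into -180..180 so that 350 and 10 degrees are treated as 20 apart.
    error = (mag_deg - predicted + 180.0) % 360.0 - 180.0
    return (predicted + (1.0 - alpha) * error) % 360.0

facing_east = heading_from_magnetometer(0.0, -1.0)
fused = fuse_heading(350.0, 0.0, 0.1, 10.0, alpha=0.5)
```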

Further, some or all of the components of the video processing device 1 may be constituted by one or more semiconductor devices such as a system LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array).

Further, the computer program described above may be recorded on and distributed via a non-transitory computer-readable recording medium such as an HDD, a CD-ROM, or a semiconductor memory.

Further, the video processing device 1 may be implemented by a plurality of computers.

Further, some or all of the functions of the video processing device 1 may be provided by cloud computing. That is, some or all of the functions of the video processing device 1 may be implemented by a cloud server.

Furthermore, at least some of the above embodiments may be combined as desired.

The embodiments disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present disclosure is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

1 Video processing device
2 Camera
3 Target person
4 User
5 Terminal device
6 Hands-free phone
7 Network
10 Video acquisition unit
20 Target person information acquisition unit
30 Mode reception unit
40 Video processing unit
41 Target person video detection unit
42 Target person orientation detection unit
43 Target person video selection unit
44 Privacy protection processing unit
45 Target person position identification unit
46 Surrounding video detection unit
47 Surrounding video selection unit
48 Self-portrait video generation unit
49 Target person viewpoint video generation unit
50 Video transmission unit
70 Arrow
71 Arrow
72 Arrow
73 Arrow
81 Cone
82 Sphere
100 Video processing system

Claims

1. A video processing device comprising:
an acquisition unit that acquires a plurality of videos captured by a plurality of cameras;
an orientation detection unit that detects an orientation of a target person;
a position identification unit that identifies a position of the target person; and
a video generation unit that generates a video related to the target person from the plurality of videos based on the detected orientation of the target person and the identified position of the target person.

2. The video processing device according to claim 1, wherein the plurality of cameras include at least one of a camera installed outdoors, a camera installed indoors, and an in-vehicle camera.

3. The video processing device according to claim 1 or 2, wherein the orientation detection unit detects the orientation of the target person based on at least one of the acquired plurality of videos.

4. The video processing device according to any one of claims 1 to 3, wherein the video generation unit generates, as the video related to the target person, a video of the target person captured in the orientation of the target person from a position facing the target person.

5. The video processing device according to claim 4, wherein the video generation unit extracts, from the plurality of videos, a video in which the target person is captured from the direction closest to the orientation of the target person, as the video related to the target person.

6. The video processing device according to claim 4, wherein the video generation unit
detects, from the plurality of videos, videos each including an image of the target person, and
synthesizes the detected videos including the image of the target person to generate the video of the target person captured in the orientation of the target person from the position facing the target person.

7. The video processing device according to any one of claims 1 to 6, wherein the video generation unit generates, as the video related to the target person, a video in which the surroundings of the target person are viewed in the orientation of the target person from the position of the target person.

8. The video processing device according to claim 7, wherein the video generation unit selects, from the plurality of videos, the video in which the surroundings of the target person are viewed in the orientation of the target person from the position of the target person.

9. The video processing device according to claim 7, wherein the video generation unit generates, by synthesizing the plurality of videos, the video in which the surroundings of the target person are viewed in the orientation of the target person from the position of the target person.

10. The video processing device according to any one of claims 1 to 9, wherein the orientation of the target person includes at least one of a direction of a line of sight of the target person, an orientation of a face of the target person, and an orientation of a body of the target person.

11. The video processing device according to any one of claims 1 to 10, wherein the video generation unit further applies privacy protection processing to an image of a person other than the target person included in the video related to the target person.

12. The video processing device according to any one of claims 1 to 11, further comprising a video transmission unit that transmits the video related to the target person generated by the video generation unit to a terminal device.

13. A video processing system comprising:
a plurality of cameras; and
the video processing device according to any one of claims 1 to 12, which generates a video based on a plurality of videos captured by the plurality of cameras.

14. A video processing method comprising:
a step of acquiring a plurality of videos captured by a plurality of cameras;
a step of detecting an orientation of a target person;
a step of identifying a position of the target person; and
a step of generating a video related to the target person from the plurality of videos based on the detected orientation of the target person and the identified position of the target person.

15. A computer program for causing a computer to function as:
an acquisition unit that acquires a plurality of videos captured by a plurality of cameras;
an orientation detection unit that detects an orientation of a target person;
a position identification unit that identifies a position of the target person; and
a video generation unit that generates a video related to the target person from the plurality of videos based on the detected orientation of the target person and the identified position of the target person.