JP2020086869A

JP2020086869A - Image generation device, image generation method, and computer program

Info

Publication number: JP2020086869A
Application number: JP2018219656A
Authority: JP
Inventors: 周平田良島; Shuhei Tarashima; 啓仁野村; Keiji Nomura; 和彦太田; Kazuhiko Ota
Original assignee: NTT Communications Corp
Current assignee: NTT Communications Corp
Priority date: 2018-11-22
Filing date: 2018-11-22
Publication date: 2020-06-04
Anticipated expiration: 2038-11-22
Also published as: JP7199931B2

Abstract

PROBLEM TO BE SOLVED: To improve recognition accuracy, for a technique that uses images to recognize a living thing or a robot, by generating new images. SOLUTION: An image generation device includes an image generation unit configured to newly generate an image of a specific subject based on parameters of an image generator obtained by performing learning processing using, as teacher data, a combination of each image of a sub-learning image group, which is a group of a plurality of images in which the specific subject that is a common subject satisfying a specific criterion is captured, and posture information of the specific subject in each image of the sub-learning image group. Using given posture information, the image generation unit generates an image in which the specific subject takes the posture indicated by that posture information. SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for generating new images similar to an existing group of images.

Accurate person recognition using images generally requires a large number of person images as teacher data. Recognizing a specific person requires a large amount of teacher data for that specific person. Accurate person recognition has therefore required enormous effort to acquire a large amount of teacher data.
In response to this problem, image data augmentation techniques have been proposed in recent years (see, for example, Non-Patent Document 1 and Non-Patent Document 2). In image data augmentation, new images related to the teacher data are generated based on teacher data that has already been acquired. Generating new images in this way makes it possible to increase the amount of teacher data.

Non-Patent Document 1: A. G. Howard, Some Improvements on Deep Convolutional Neural Network Based Image Classification, in arXiv preprint, 2013.
Non-Patent Document 2: L. Ma et al., Disentangled Person Image Generation, in Proc. CVPR, 2018.

However, simply having a large amount of teacher data yields only a limited improvement in the accuracy of person recognition. This problem is not limited to image-based person recognition; it is common to techniques in general that use images to recognize a living thing or a robot.
In view of the above circumstances, an object of the present invention is to provide a technique that can improve recognition accuracy, for techniques that use images to recognize a living thing or a robot, by generating new images.

One aspect of the present invention is an image generation device including an image generation unit that newly generates an image of a specific subject based on parameters of an image generator obtained by performing learning processing using, as teacher data, a combination of each image of a sub-learning image group, which is a group of a plurality of images in which the specific subject that is a common subject satisfying a specific criterion is captured, and posture information of the specific subject in each image of the sub-learning image group, wherein the image generation unit uses given posture information to generate an image in which the specific subject takes the posture indicated by the posture information.

One aspect of the present invention is the above image generation device, further including a storage unit that stores a plurality of pieces of posture information serving as selection candidates, and a posture information selection unit that reads the posture information from the storage unit and selects, from among the read candidates, posture information indicating a posture that is dissimilar, by a predetermined criterion, to the postures taken by the specific subject in the sub-learning image group, wherein the image generation unit generates the image by using the posture information selected by the posture information selection unit as the given posture information.

One aspect of the present invention is the above image generation device, further including a posture information acquisition unit that acquires, for each image in the sub-learning image group, posture information indicating the posture of the specific subject, wherein the posture information selection unit uses the posture information acquired by the posture information acquisition unit as selection candidates.

One aspect of the present invention is an image generation method including an image generation step of newly generating an image of a specific subject based on parameters of an image generator obtained by performing learning processing using, as teacher data, a combination of each image of a sub-learning image group, which is a group of a plurality of images in which the specific subject that is a common subject satisfying a specific criterion is captured, and posture information of the specific subject in each image of the sub-learning image group, wherein, in the image generation step, given posture information is used to generate an image in which the specific subject takes the posture indicated by the posture information.

One aspect of the present invention is a computer program for causing a computer to function as the above image generation device.

According to the present invention, recognition accuracy can be improved, for techniques that use images to recognize a living thing or a robot, by generating new images.

A schematic block diagram showing a configuration example of the image generation device of the present invention.
A diagram showing a specific example of posture information.
A diagram showing a specific example of a posture image.
A flowchart showing a specific example of the flow of processing when the image generation device generates image generator parameters.
A flowchart showing a specific example of the flow of processing when the image generation device generates posture information generator parameters.
A flowchart showing a specific example of the flow of processing when the image generation device generates an image.

Hereinafter, a specific configuration example of the present invention will be described with reference to the drawings.
FIG. 1 is a schematic block diagram showing a configuration example of an image generation device 10 of the present invention. The image generation device 10 is configured using an information processing device such as a personal computer, a server, or a workstation. The image generation device 10 includes an image input unit 11, an image output unit 12, an instruction input unit 13, a posture information storage unit 14, a posture information generator storage unit 15, an image generator storage unit 16, and a control unit 17. The image generation device 10 is described below.

The image input unit 11 receives the data of a learning image group input to the image generation device 10. The learning image group is a group of a plurality of images that have already been obtained. The learning image group may include one or more sub-learning image groups. A sub-learning image group is a group of a plurality of images in which a common subject satisfying a specific criterion (hereinafter referred to as “specific subject”) is captured. The specific criterion may be, for example, being a specific person, being an athlete in a specific sport, being a person with a specific attribute (sex, age, race, etc.), being a living thing of a specific species, or being a robot of a specific type. The specific criterion may also be being a specific person who is wearing specific clothes. The specific criterion may also be being a specific person who is performing a specific action (for example, exercise in a specific sport or behavior of a specific type). For example, a group of a plurality of images in which a certain player is captured playing basketball may be formed as a sub-learning image group.

The image input unit 11 may receive the data of the learning image group from another device by performing data communication via wired or wireless communication. In this case, the image input unit 11 may be configured using a communication interface. The image input unit 11 may read the data of the learning image group from a recording medium such as a CD-ROM or a USB memory (Universal Serial Bus Memory) on which the data is recorded. In this case, the image input unit 11 may be configured using a device such as a CD-ROM drive or a USB interface. The image input unit 11 may receive, from a camera, a learning image group captured by a still camera or a video camera. In this case, the image input unit 11 may be configured using an interface of a communication protocol capable of data communication with the camera. When the image generation device 10 is built into a still camera, a video camera, or an information processing device equipped with a camera (such as a smartphone), the image input unit 11 may receive a captured image, or an image before capture, from a bus. The image input unit 11 may be configured in any manner as long as it can receive input of the data of the learning image group. Further, the images need not already form a learning image group at the time they are input to the image input unit 11; a learning image group may be input to the image generation device 10 as a result of a plurality of images being input individually.

The image output unit 12 outputs the data of images generated by the control unit 17. The image output unit 12 may transmit an image to another device (for example, another information processing device or another storage device) by performing data communication via wired or wireless communication. In this case, the image output unit 12 may be configured using a communication interface. The image output unit 12 may record an image on a recording medium such as a DVD-ROM or a USB memory. In this case, the image output unit 12 may be configured using a device such as a DVD-R drive or a USB interface. The image output unit 12 may record the image in a storage device provided in the image generation device 10. The image output unit 12 may be configured in any manner as long as it can output image data.

The instruction input unit 13 may be configured using an existing input device such as a keyboard, a pointing device (mouse, tablet, etc.), a button, or a touch panel. In this case, the instruction input unit 13 is operated by the user when the user's instruction is input to the image generation device 10. The input device described above may be configured using a microphone and a voice recognition device for accepting voice input. The instruction input unit 13 may be an interface for connecting an input device to the image generation device 10. In this case, the instruction input unit 13 inputs, to the image generation device 10, the input signal generated by the input device in response to the user's input. The instruction input unit 13 may receive a user instruction from another device by performing data communication via wired or wireless communication. In this case, the instruction input unit 13 may be configured using a communication interface.

The posture information storage unit 14 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The posture information storage unit 14 stores posture information obtained from the image data included in the learning image group in association with image identification information indicating the image from which the posture information was obtained. The posture information storage unit 14 may store the posture information in association with sub-learning image group identification information indicating the sub-learning image group that includes the image from which the posture information was obtained.

The posture information is information indicating the posture taken by the subject of each image included in the learning image group. The posture information may be, for example, information indicating the positions of a plurality of characteristic parts predetermined for the subject. Such characteristic parts may be predetermined parts of the human body. As specific examples of such characteristic parts, the head, torso, right shoulder, left shoulder, right foot, and left foot may be defined. As other specific examples, the right eye, left eye, nose, right shoulder, left shoulder, right elbow, left elbow, right wrist, left wrist, right fingertips, left fingertips, neck, waist, right knee, left knee, right ankle, left ankle, right toes, and left toes may be defined.

The posture information may be defined as information having, for each of these characteristic parts, coordinates (for example, an x coordinate and a y coordinate) indicating its position on the image. FIG. 2 is a diagram showing a specific example of the posture information. In the example of FIG. 2, the image coordinates of each characteristic part, such as the head, torso, right shoulder, left shoulder, and left foot, are defined as a combination of x and y values.
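As an illustration only, posture information of this coordinate form could be held in a structure like the following minimal Python sketch. The class name `PostureInfo`, the part names in `PARTS`, the identifiers, and all coordinate values are assumptions made for this example; they are not taken from FIG. 2.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical part names, following the first example list of characteristic parts.
PARTS = ("head", "torso", "right_shoulder", "left_shoulder", "right_foot", "left_foot")


@dataclass(frozen=True)
class PostureInfo:
    """Posture information for one image: part name -> (x, y) image coordinates."""
    image_id: str   # image identification information (assumed to be a string)
    group_id: str   # sub-learning image group identification information
    points: Dict[str, Tuple[float, float]]

    def as_vector(self) -> List[float]:
        """Flatten to [x0, y0, x1, y1, ...] in the fixed PARTS order."""
        return [c for part in PARTS for c in self.points[part]]


# Illustrative values only.
example = PostureInfo(
    image_id="img_0001",
    group_id="player_A",
    points={
        "head": (96.0, 20.0),
        "torso": (96.0, 80.0),
        "right_shoulder": (70.0, 50.0),
        "left_shoulder": (122.0, 50.0),
        "right_foot": (80.0, 170.0),
        "left_foot": (112.0, 170.0),
    },
)
vector = example.as_vector()
```

Keeping the image and group identifiers next to the coordinates mirrors the association that the posture information storage unit 14 is described as maintaining.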

The posture information may be defined as an image showing the position of each characteristic part (hereinafter referred to as “posture image”). For example, the posture image may be defined as an image using nodes indicating the characteristic parts and links connecting the nodes. FIG. 3 is a diagram showing a specific example of the posture image. In FIG. 3, a posture image indicating the posture information of a person is formed by combining images of nodes 21 representing the characteristic parts with images of links 22 connecting the nodes 21. In the posture image, each link may be drawn in a different color, and each node may be drawn in a different color.
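A posture image of this node-and-link kind could be rasterized roughly as in the sketch below. This is a hedged illustration in plain Python, not the drawing method of the specification: the skeleton topology in `LINKS`, the colors, the 3x3 node size, the coordinates, and the 192-pixel canvas are all assumptions chosen for the example.

```python
# Hypothetical coordinates, links, and colors for illustration.
POINTS = {
    "head": (96, 20), "torso": (96, 80),
    "right_shoulder": (70, 50), "left_shoulder": (122, 50),
    "right_foot": (80, 170), "left_foot": (112, 170),
}
LINKS = [("head", "torso"), ("torso", "right_shoulder"), ("torso", "left_shoulder"),
         ("torso", "right_foot"), ("torso", "left_foot")]
LINK_COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0), (0, 255, 255)]


def render_posture_image(points, links, size=192):
    """Return a size x size x 3 image (nested lists) with links and nodes drawn."""
    img = [[[0, 0, 0] for _ in range(size)] for _ in range(size)]

    def put(x, y, color):
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < size and 0 <= yi < size:
            img[yi][xi] = list(color)

    # Links: sample points along each segment; each link gets its own color.
    for i, (a, b) in enumerate(links):
        (x0, y0), (x1, y1) = points[a], points[b]
        color = LINK_COLORS[i % len(LINK_COLORS)]
        steps = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
        for s in range(steps + 1):
            t = s / steps
            put(x0 + t * (x1 - x0), y0 + t * (y1 - y0), color)

    # Nodes: 3x3 white squares drawn on top of the links.
    for x, y in points.values():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                put(x + dx, y + dy, (255, 255, 255))
    return img


posture_image = render_posture_image(POINTS, LINKS)
```

Drawing each link in its own color is one way to let a downstream image generator tell limbs apart, which matches the option of coloring links differently.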

Returning to the description of FIG. 1. The posture information generator storage unit 15 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The posture information generator storage unit 15 stores the information required for a posture information generator, which generates posture information, to operate (hereinafter referred to as “posture information generator parameters”). The posture information generator parameters are obtained through processing by the control unit 17.

The image generator storage unit 16 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The image generator storage unit 16 stores the information required for an image generator, which generates images, to operate (hereinafter referred to as “image generator parameters”). The image generator parameters are obtained through processing by the control unit 17.

The control unit 17 includes a memory and a processor such as a CPU (Central Processing Unit) connected by a bus. By executing an image generation program, the control unit 17 operates as a posture information acquisition unit 171, an image generator learning unit 172, a posture information generator learning unit 173, a posture information generation unit 174, a posture information selection unit 175, and an image generation unit 176. All or part of each of these components may be realized using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). All or part of each component may also be realized by a dedicated processor such as a GPU (Graphics Processing Unit) executing a program. The image generation program may be recorded on a computer-readable recording medium. The computer-readable recording medium is, for example, a portable medium such as a flexible disk, a magneto-optical disk, a ROM, a CD-ROM, or a semiconductor storage device (for example, an SSD: Solid State Drive), or a storage device such as a hard disk or a semiconductor storage device built into a computer system. The image generation program may be transmitted via a telecommunication line.

The posture information acquisition unit 171 estimates the posture of the subject in each image included in the learning image group input via the image input unit 11. The posture information acquisition unit 171 generates posture information as the result of the posture estimation. For example, the techniques described in the references below may be applied to the posture information acquisition unit 171. The posture information acquisition unit 171 records the generated posture information in the posture information storage unit 14.

Reference 1: Z. Cao et al., Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, in Proc. CVPR, 2017.
Reference 2: S.-E. Wei et al., Convolutional Pose Machines, in Proc. CVPR, 2016.

The image generator learning unit 172 acquires image generator parameters for each sub-learning image group. The image generator learning unit 172 may acquire the image generator parameters by, for example, executing machine learning that uses, as teacher data, combinations of the plurality of images included in the sub-learning image group to be processed and the posture information estimated for each image. The image generator represented by the image generator parameters takes a posture image as input and generates an image of a living thing or a robot taking the posture indicated by that posture image. Which living thing or robot appears in the generated image is determined by the sub-learning image group. That is, the generated image shows a subject that satisfies the specific criterion defined for the sub-learning image group being processed, taking the posture indicated by the input posture image. The image generator learning unit 172 may be configured to acquire the image generator parameters by adversarial learning such as a GAN. For example, the techniques described in the references below may be applied to the image generator learning unit 172. The image generator learning unit 172 records the generated image generator parameters in the image generator storage unit 16.

Reference 3: X. Mao et al., Least Squares Generative Adversarial Networks, in Proc. ICCV, 2017.
Reference 4: I. Gulrajani, Improved Training of Wasserstein GANs, in Proc. ICLR, 2018.

The posture information generator learning unit 173 acquires posture information generator parameters. The posture information generator learning unit 173 may acquire the posture information generator parameters by, for example, executing machine learning that uses, as teacher data, the posture information obtained by the posture information acquisition unit 171 from the plurality of images included in the learning image group to be processed. The teacher data used for learning the posture information generator parameters may be all of the posture information obtained from the images included in the learning image group, or only the posture information obtained from the images of a specific sub-learning image group.

The posture information generator represented by the posture information generator parameters is configured to take as input, for example, a random numerical sequence with a predetermined number of dimensions, and to generate a posture image of a predetermined size with a predetermined number of channels. For example, a posture image 192 pixels high and 192 pixels wide with 3 channels may be generated. The posture information generator learning unit 173 may be configured to acquire the posture information generator parameters by using, for example, a variational autoencoder (VAE: see Reference 5 below) or a generative adversarial network (GAN: see Reference 6 below). For example, the techniques described in the references below may be applied to the posture information generator learning unit 173. The posture information generator learning unit 173 records the generated posture information generator parameters in the posture information generator storage unit 15.

Reference 5: D. P. Kingma et al., Auto-Encoding Variational Bayes, in Proc. ICLR, 2014.
Reference 6: I. Goodfellow et al., Generative Adversarial Networks, in NIPS, 2014.

The posture information generation unit 174 operates as a posture information generator based on the posture information generator parameters stored in the posture information generator storage unit 15. By operating as a posture information generator, the posture information generation unit 174 generates posture information. The posture information generation unit 174 is given the predetermined input parameters that the posture information generator learning unit 173 assumed would be given to the posture information generator. The predetermined input parameters may be, for example, a numerical sequence with a predetermined number of dimensions. The predetermined input parameters may be generated by any method. For example, when the number of dimensions is set to “10”, a random numerical sequence may be generated by drawing the value of each dimension from a normal distribution. The posture information generation unit 174 outputs the generated posture information to the posture information selection unit 175.
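The input sampling in the 10-dimensional example can be sketched as follows. This is a minimal illustration that assumes a standard normal distribution (mean 0, standard deviation 1), since the text only says "a normal distribution"; the function name `sample_generator_input` and the `seed` argument are hypothetical.

```python
import random


def sample_generator_input(num_dims=10, seed=None):
    """Draw one random input sequence, one normal sample per dimension.

    Mean 0 and standard deviation 1 are assumptions; the description
    does not fix the parameters of the normal distribution.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_dims)]


# One input sequence for the (separately trained) posture information generator.
z = sample_generator_input(num_dims=10, seed=0)
```

Each distinct sequence would be fed to the trained posture information generator to obtain one new candidate posture, so drawing many sequences yields many candidates.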

The posture information selection unit 175 selects the posture information to be processed from among the posture information stored in the posture information storage unit 14 and the posture information generated by the posture information generation unit 174. In the following description, the posture information stored in the posture information storage unit 14 and the posture information generated by the posture information generation unit 174 are collectively referred to as “candidate posture information”.

The posture information selection unit 175 selects, from the candidate posture information, posture information that has not yet been obtained as posture information of the specific subject targeted for image generation by the image generation unit 176 (hereinafter referred to as "specific posture information"). More specifically, for each piece of posture information under evaluation taken from the candidate posture information, the posture information selection unit 175 calculates a similarity to each piece of specific posture information already obtained, and selects the posture information under evaluation if every calculated similarity indicates that the postures are not similar to at least a predetermined degree. The posture information selection unit 175 may select a single piece of posture information, a predetermined number of pieces, or the number of pieces indicated by an instruction input via the instruction input unit 13. When selecting a predetermined number of pieces, or the number indicated by an instruction input via the instruction input unit 13, the posture information selection unit 175 may select posture information in order, starting from the piece whose similarity value indicates the least similarity. The posture information selection unit 175 outputs the selected posture information to the image generation unit 176.
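The selection rule described above can be illustrated with a minimal sketch. The description does not fix a posture representation or a similarity measure here, so the following are assumptions made only for illustration: a posture is a flat vector of keypoint coordinates, similarity is `1 / (1 + Euclidean distance)`, and the names `similarity`, `select_novel_poses`, `threshold`, and `count` are hypothetical.

```python
import math
from typing import List, Optional, Sequence

# Assumption: a posture is a flat keypoint vector (x0, y0, x1, y1, ...).
Pose = Sequence[float]


def similarity(a: Pose, b: Pose) -> float:
    """Assumed similarity measure: 1.0 for identical postures, approaching 0 as they diverge."""
    return 1.0 / (1.0 + math.dist(a, b))


def select_novel_poses(
    candidates: Sequence[Pose],
    known: Sequence[Pose],
    threshold: float,
    count: Optional[int] = None,
) -> List[Pose]:
    """Keep candidates whose similarity to every known posture is below the threshold.

    The result is ordered from least similar to most similar; if `count` is
    given, only the first `count` postures are returned.
    """
    scored = []
    for pose in candidates:
        # Highest similarity between this candidate and any already-obtained posture.
        highest = max((similarity(pose, k) for k in known), default=0.0)
        if highest < threshold:
            scored.append((highest, pose))
    # Least similar first; sorting on the score alone avoids comparing postures on ties.
    scored.sort(key=lambda item: item[0])
    chosen = [pose for _, pose in scored]
    return chosen if count is None else chosen[:count]
```

Comparing the highest similarity against the threshold is equivalent to requiring that all similarities fall below it, which matches the condition that every calculated similarity must indicate dissimilarity.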

The image generation unit 176 operates as an image generator by using, among the image generator parameters stored in the image generator storage unit 16, the image generator parameters of the specific subject to be processed. When generating an image, the image generation unit 176 uses the posture information selected by the posture information selection unit 175. By operating as an image generator with this posture information, the image generation unit 176 generates an image in which the specific subject takes the posture indicated by the selected posture information. The image generation unit 176 outputs the generated image data to the image output unit 12.

FIG. 4 is a flowchart showing a specific example of the processing flow when the image generation device 10 generates image generator parameters. First, the image input unit 11 inputs the image data of the sub-learning image group to be processed (step S101). The posture information acquisition unit 171 estimates the posture information of the subject in each piece of image data of the sub-learning image group to be processed (step S102). The image generator learning unit 172 acquires image generator parameters by executing a learning process that uses, as teacher data, a plurality of combinations of image data and posture information of the sub-learning image group to be processed (step S103). By repeating the processing of steps S101 to S103 for each sub-learning image group, the image generator learning unit 172 acquires image generator parameters for each sub-learning image group. The image generator learning unit 172 records the acquired image generator parameters in the image generator storage unit 16 in association with the corresponding sub-learning image group.
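As a rough sketch of the flow in FIG. 4, the loop over sub-learning image groups might be organized as follows. The posture estimator and the learning procedure are not specified in this passage, so `estimate_pose` and `train_generator` are hypothetical callables supplied by the caller, and the dictionary returned stands in for the image generator storage unit 16.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def learn_generators_per_group(
    groups: Dict[str, Sequence[Any]],
    estimate_pose: Callable[[Any], Any],
    train_generator: Callable[[List[Tuple[Any, Any]]], Any],
) -> Dict[str, Any]:
    """Return image generator parameters keyed by sub-learning image group.

    `groups` maps a hypothetical group identifier (one per specific subject)
    to the images of that sub-learning image group.
    """
    store: Dict[str, Any] = {}
    for group_id, images in groups.items():
        # S101: the images of one sub-learning image group are the input.
        # S102: estimate the posture of the subject in each image.
        teacher_data = [(image, estimate_pose(image)) for image in images]
        # S103: learn from the (image, posture) pairs to obtain parameters.
        params = train_generator(teacher_data)
        # Record the parameters in association with the group.
        store[group_id] = params
    return store
```

Keeping one parameter set per group reflects the point made in the description that a generator is learned for each specific subject rather than for subjects in general.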

FIG. 5 is a flowchart showing a specific example of the processing flow when the image generation device 10 generates posture information generator parameters. First, the image input unit 11 inputs the image data of the learning image group to be processed (step S201). The posture information acquisition unit 171 estimates the posture information of the subject in each piece of image data of the learning image group to be processed (step S202). The posture information generator learning unit 173 acquires posture information generator parameters by executing a learning process that uses, as teacher data, the plurality of pieces of posture information obtained from the learning image group to be processed (step S203). The posture information generator learning unit 173 records the acquired posture information generator parameters in the posture information generator storage unit 15 in association with the learning image group.

FIG. 6 is a flowchart showing a specific example of the processing flow when the image generation device 10 generates an image. First, the posture information selection unit 175 acquires a plurality of pieces of posture information (step S301). For example, the posture information selection unit 175 may acquire the posture information stored in the posture information storage unit 14 and the posture information generated by the posture information generation unit 174. The posture information selection unit 175 selects posture information from among the acquired pieces of posture information (the candidate posture information) (step S302). The image generation unit 176 generates an image based on the posture information selected by the posture information selection unit 175 and the image generator parameters corresponding to the sub-learning image group to be processed (step S303).
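The three steps of FIG. 6 compose naturally into a small pipeline. The sketch below is only an illustration under stated assumptions: `select` is a hypothetical function implementing the selection of step S302, and `generator` is a hypothetical callable representing the image generator already configured with the parameters of the sub-learning image group being processed.

```python
from typing import Any, Callable, List, Sequence


def generate_images(
    stored_poses: Sequence[Any],
    generated_poses: Sequence[Any],
    select: Callable[[List[Any]], List[Any]],
    generator: Callable[[Any], Any],
) -> List[Any]:
    """Produce one image per selected posture for a single specific subject."""
    # S301: gather the candidate posture information from both sources.
    candidates = list(stored_poses) + list(generated_poses)
    # S302: choose the postures to render (for example, ones not yet observed).
    selected = select(candidates)
    # S303: render each selected posture with the subject-specific generator.
    return [generator(pose) for pose in selected]
```

Passing the selection rule and the generator in as callables keeps the sketch independent of any particular similarity measure or generator architecture, neither of which is fixed by this passage.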

According to the image generation device 10 configured as described above, recognition accuracy can be improved, for techniques that recognize a living thing or a robot using images, by generating new images. The details are as follows.

A technique for recognizing a living thing or a robot using images requires an enormous amount of teacher data, yet the improvement in recognition accuracy is limited when the teacher data contains many similar images. To address this problem, the image generation device 10 described above generates images of new postures that did not previously exist among the images of the subject. Using such images of new postures as teacher data makes it possible to improve recognition accuracy.

In addition, with the image generation device 10, simply inputting a learning image group causes the posture of each specific subject included in it to be estimated, posture information indicating a new posture that does not yet exist for each specific subject to be selected, and an image of the specific subject taking the posture indicated by the selected posture information to be generated. The user therefore does not need to determine and input postures manually, which reduces the user's effort.

Furthermore, in the image generation device 10, image generator parameters are generated for each specific subject, that is, for each subject that satisfies the specific criterion common to a sub-learning image group. Compared with generating image generator parameters for a living thing or species in general, this makes it possible to generate images that exhibit the features of each specific subject more distinctly. Performing a learning process using images generated in this way makes it possible to improve the recognition accuracy for the specific subject.

(Modification)
The posture information acquisition unit 171 may be configured to acquire, from outside, posture information determined in advance by another device or by a person, instead of estimating the posture information from the images. In this case, the posture information acquisition unit 171 acquires from outside the posture information determined in advance for each image, and records it in the posture information storage unit 14.

The image generation device 10 may be configured without the image generator learning unit 172. In this case, the image generation unit 176 can execute its processing if image generator parameters obtained in advance by an image generator learning unit 172 implemented in another device are recorded in the image generator storage unit 16.

The image generation device 10 may be configured without the posture information generator learning unit 173. In this case, the posture information generation unit 174 can execute its processing if posture information generator parameters obtained in advance by a posture information generator learning unit 173 implemented in another device are recorded in the posture information generator storage unit 15.

The image generation device 10 may be configured without the posture information generation unit 174. In this case, the posture information selection unit 175 selects posture information from the posture information stored in the posture information storage unit 14, that is, from the posture information obtained from the images of the learning image group. Because this stored posture information also includes posture information of other specific subjects, the posture information selection unit 175 can still select posture information.

Although an embodiment of the present invention has been described in detail above with reference to the drawings, the specific configuration is not limited to this embodiment, and also includes designs and the like within a range that does not depart from the gist of the present invention.

10... image generation device, 11... image input unit, 12... image output unit, 13... instruction input unit, 14... posture information storage unit, 15... posture information generator storage unit, 16... image generator storage unit, 17... control unit, 171... posture information acquisition unit, 172... image generator learning unit, 173... posture information generator learning unit, 174... posture information generation unit, 175... posture information selection unit, 176... image generation unit

Claims

An image generation device comprising an image generation unit that newly generates an image of a specific subject based on parameters of an image generator obtained by performing a learning process using, as teacher data, combinations of each image of a sub-learning image group, which is a group of a plurality of images capturing the specific subject, the specific subject being a common subject that satisfies a specific criterion, and posture information of the specific subject in each image of the sub-learning image group,
wherein the image generation unit uses given posture information to generate an image in which the specific subject takes the posture indicated by that posture information.

The image generation device according to claim 1, further comprising:
a storage unit that stores a plurality of pieces of posture information serving as selection candidates; and
a posture information selection unit that reads the posture information from the storage unit and selects, from the read candidates, posture information indicating a posture that is dissimilar, based on a predetermined criterion, to the postures of the specific subject in the sub-learning image group,
wherein the image generation unit generates the image by using the posture information selected by the posture information selection unit as the given posture information.

The image generation device according to claim 2, further comprising a posture information acquisition unit that acquires, for each image of the sub-learning image group, posture information indicating the posture of the specific subject,
wherein the posture information selection unit uses the posture information acquired by the posture information acquisition unit as selection candidates.

An image generation method comprising an image generation step of newly generating an image of a specific subject based on parameters of an image generator obtained by performing a learning process using, as teacher data, combinations of each image of a sub-learning image group, which is a group of a plurality of images capturing the specific subject, the specific subject being a common subject that satisfies a specific criterion, and posture information of the specific subject in each image of the sub-learning image group,
wherein, in the image generation step, given posture information is used to generate an image in which the specific subject takes the posture indicated by that posture information.

A computer program for causing a computer to function as the image generation device according to claim 1.