JP2018205885A

JP2018205885A - Image generation device and image generation method

Info

Publication number: JP2018205885A
Application number: JP2017107945A
Authority: JP
Inventors: 林 隆介 (Ryusuke Hayashi)
Original assignee: National Institute of Advanced Industrial Science and Technology AIST
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 2017-05-31
Filing date: 2017-05-31
Publication date: 2018-12-27
Anticipated expiration: 2037-05-31
Also published as: JP6853535B2

Abstract

To provide a device and a method that generate an image visualizing a user's visual experience more accurately. SOLUTION: An image generation device 1 comprises: a conversion switching unit 3 that converts brain activity data D into a brain activity feature vector Vb; an image recognition deep neural network unit 2 that generates an image feature vector from image data DG, generates a caption feature vector from caption data DL (DT), and generates a common feature vector Vc corresponding to the image feature vector and the caption feature vector; and an image generation adversarial neural network unit 4 that creates generated image data corresponding to the common feature vector Vc. SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for generating images.

In recent years, as shown in Non-Patent Documents 1 and 2, methods have been proposed that generate an image from a caption (a descriptive sentence) using an adversarial neural network.

Non-Patent Documents 3 and 4 propose methods that jointly train a deep neural network for image recognition and an adversarial neural network for image generation.

Among patent documents, Patent Document 1, for example, discloses an image layout device that uses a neural network.

Patent Document 1: JP 2003-274139 A

Non-Patent Document 1: S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, “Generative Adversarial Text to Image Synthesis”, arXiv:1605.05396, 2016
Non-Patent Document 2: H. Zhang, T. Xu, H. Li, S. Zhang, X. Huang, X. Wang, and D. Metaxas, “StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks”, arXiv:1612.03242, 2016
Non-Patent Document 3: V. Dumoulin, I. Belghazi, B. Poole, O. Mastropietro, A. Lamb, M. Arjovsky, and A. Courville, “Adversarially Learned Inference”, arXiv:1606.00704, 2016
Non-Patent Document 4: J. Donahue, P. Krahenbuhl, and T. Darrell, “Adversarial Feature Learning”, arXiv:1605.09782, 2016

The methods described in Non-Patent Documents 1 and 2 assume natural-language input (sentences), but they cannot necessarily use information that corresponds uniquely to the input sentence. As a result, when visual information derived from brain activity is to be visualized as images, these methods cannot sufficiently reproduce the subject's visual experience.

The present invention was made to solve this problem, and its object is to provide an apparatus and a method that generate images visualizing a subject's visual experience more accurately.

To solve the above problem, the present invention provides an image generation apparatus comprising: image feature vector generation means for generating an image feature vector from image data; caption feature vector generation means for generating a caption feature vector from text data corresponding to an image; common feature vector generation means for generating a common feature vector from the caption feature vector and the image feature vector; image generation means for creating generated image data from the common feature vector; conversion means for converting brain activity data into a brain activity feature vector; and selection means for comparing the brain activity feature vector with the vectors generated by the image feature vector generation means and with the common feature vector, and for selectively supplying the brain activity feature vector to the channel that generates the vector most highly correlated with it.

To solve the above problem, the present invention also provides an image generation method comprising: a first step of generating an image feature vector from image data; a second step of generating a caption feature vector from language data corresponding to the image data; a third step of generating a common feature vector from the image feature vector and the caption feature vector; and a fourth step of creating generated image data from the common feature vector.

ADVANTAGE OF THE INVENTION: According to the present invention, it is possible to provide an apparatus and a method that generate images visualizing a user's visual experience more accurately.

FIG. 1 is a block diagram showing the configuration of an image generation apparatus 1 according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing the configuration of the image recognition deep neural network unit 2 shown in FIG. 1.
FIG. 3 is a block diagram showing the configuration of the image generation adversarial neural network unit 4 shown in FIG. 1.
FIG. 4 is a first flowchart showing the image generation method according to Embodiment 1 of the present invention.
FIG. 5 is a second flowchart showing the image generation method according to Embodiment 1 of the present invention.
FIG. 6 is a third flowchart showing the image generation method according to Embodiment 1 of the present invention.
FIG. 7 is a diagram showing the method of generating the caption feature vector in step S12 shown in FIG. 4.
FIG. 8 is a block diagram showing the configuration of an image generation apparatus 10 according to Embodiment 2 of the present invention.
FIG. 9 is a block diagram showing the configuration of the image encoding deep neural network unit 12 shown in FIG. 8.
FIG. 10 is a block diagram showing the configuration of the image decoding deep neural network unit 14 shown in FIG. 8.
FIG. 11 is a block diagram showing the configuration of an image generation apparatus 20 according to Embodiment 3 of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. In the drawings, the same reference numerals denote the same or corresponding parts.

The image generation apparatus and image generation method according to the embodiments of the present invention generate an image that visualizes the visual experience a user perceives, as reflected in the user's brain (neural) activity. In generating this image, simple information indicating visual elements is extracted exhaustively, and linguistic information related to the extracted information is used as an auxiliary cue. In other words, using the image perceived by the user as a query, a similar image is generated automatically with the help of linguistic information. The image generation apparatus and image generation method are described in detail below.

[Embodiment 1]
FIG. 1 is a block diagram showing the configuration of an image generation apparatus 1 according to Embodiment 1 of the present invention. As shown in FIG. 1, the image generation apparatus 1 includes an image recognition deep neural network unit 2, a conversion switching unit 3, an image generation adversarial neural network unit 4, a random number generation unit 5, a display unit 6, and a storage unit 7.

The image recognition deep neural network unit 2 is connected to the conversion switching unit 3, and the image generation adversarial neural network unit 4 is connected to the image recognition deep neural network unit 2 and the random number generation unit 5. The display unit 6 is connected to the image generation adversarial neural network unit 4 and the storage unit 7, and the storage unit 7 is also connected to the image generation adversarial neural network unit 4.

The image recognition deep neural network unit 2 generates a language/image common feature vector Vc from input image data DG and from caption data DL (and other text data DT) corresponding to the image. The conversion switching unit 3 converts input brain activity data D, recorded while the image is being viewed, into a brain activity feature vector Vb. In accordance with a regression result signal R supplied from the image recognition deep neural network unit 2, it then selects the channel with the highest correlation with the brain activity data D and supplies the brain activity feature vector Vb to that channel.

The image generation adversarial neural network unit 4 creates a generated image from the image data DG, the language/image common feature vector Vc, and a random number vector Vr.

The random number generation unit 5 generates the random number vector Vr, and the display unit 6 displays a generated image that was created by the image generation adversarial neural network unit 4 or stored in the storage unit 7. The storage unit 7 stores generated images created by the image generation adversarial neural network unit 4.

FIG. 2 is a block diagram showing the configuration of the image recognition deep neural network unit 2 shown in FIG. 1. As shown in FIG. 2, the image recognition deep neural network unit 2 includes an image recognition deep neural network 21, a caption feature vector generation unit 22, and a common feature vector generation unit 23.

The image recognition deep neural network 21 is connected to the conversion switching unit 3, and the common feature vector generation unit 23 is connected to the image recognition deep neural network 21, the caption feature vector generation unit 22, and the conversion switching unit 3.

The image recognition deep neural network 21 is a neural network of n layers Lv1 to Lvn, made up of convolutional layers and fully connected layers, and generates an image feature vector Vg. The caption feature vector generation unit 22 generates a caption feature vector Vs from the caption data DL. The common feature vector generation unit 23 is trained in advance, by machine learning, on a transformation that makes the vector space representing the caption features of an image coincide with the vector space representing its image features; this enables it to generate the language/image common feature vector Vc.

The conversion switching unit 3 shown in FIG. 1 applies a method such as regression or canonical correlation analysis between the brain activity feature vector Vb and each of the following: the outputs of the layers Lv1 to Lvn, the image feature vector Vg generated by the image recognition deep neural network 21, and the language/image common feature vector Vc generated by the common feature vector generation unit 23. In accordance with the regression result signal R indicating these results, it selects the channel that generates the vector with the best regression result, that is, the vector most highly correlated with the brain activity data D. In the execution phase described later, it supplies the brain activity feature vector Vb directly to that channel, which makes it possible to generate the common feature vector Vc from the brain activity data D.

For example, when the output of layer Lv2 is most highly correlated with the brain activity data D, the conversion switching unit 3 treats layer Lv2 as the optimum channel; when the language/image common feature vector Vc is most highly correlated with D, it treats the common feature vector generation unit 23 as the optimum channel. In either case, it supplies the brain activity feature vector Vb to the optimum channel.
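The channel selection described above can be illustrated with a small sketch. It assumes that, for each candidate channel, a regression from Vb has already produced predicted features on held-out trials, and it scores each channel by the Pearson correlation between predicted and actual features. The function names and channel labels ("Lv1", "Lv2", "Vc") are hypothetical, and Pearson correlation stands in for whatever regression or canonical correlation score the apparatus actually uses.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / (sx * sy)


def select_channel(predicted, actual):
    """Pick the channel whose features are best predicted from Vb.

    predicted[ch]: features for channel ch decoded from brain activity
                   (assumed to come from a regression fitted beforehand)
    actual[ch]:    features the network itself produced at channel ch
    Returns the best channel label and the per-channel scores.
    """
    scores = {ch: pearson(predicted[ch], actual[ch]) for ch in actual}
    best = max(scores, key=scores.get)
    return best, scores


# Toy example with hypothetical channel labels and made-up numbers.
actual = {"Lv1": [1, 2, 3, 4], "Lv2": [1, 2, 3, 4], "Vc": [1, 2, 3, 4]}
predicted = {"Lv1": [4, 3, 2, 1], "Lv2": [1.1, 2.0, 2.9, 4.2], "Vc": [1, 3, 2, 4]}
best, scores = select_channel(predicted, actual)
print(best)  # Lv2
```

In this toy data, "Lv2" scores about 0.995, "Vc" 0.8, and "Lv1" -1.0, so "Lv2" would be treated as the optimum channel.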

FIG. 3 is a block diagram showing the configuration of the image generation adversarial neural network unit 4 shown in FIG. 1. As shown in FIG. 3, the image generation adversarial neural network unit 4 includes an image generation deep neural network 41, a discrimination deep neural network 42, and an error detector 43.

The image generation deep neural network 41 is connected to the common feature vector generation unit 23, the random number generation unit 5, and the error detector 43. The discrimination deep neural network 42 is connected to the common feature vector generation unit 23, the image generation deep neural network 41, and the error detector 43.

The image generation deep neural network 41 is a neural network of n′ layers Lg1 to Lgn′, made up of convolutional layers and fully connected layers. It synthesizes an image from the random number vector Vr while making use of the information in the language/image common feature vector Vc.
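As a hedged illustration of how Vr and Vc might be combined at the input of the image generation deep neural network 41, the sketch below concatenates a Gaussian noise vector with Vc and applies one fully connected ReLU layer. Concatenation and Gaussian noise are assumptions borrowed from common conditional GAN practice; the description does not specify how Vc is injected. All names and sizes are hypothetical.

```python
import random


def generator_input(vc, noise_dim, rng=None):
    """Concatenate a Gaussian noise vector Vr with the common feature vector Vc.

    Returns (z, vr), where z = Vr followed by Vc is the generator's input.
    """
    rng = rng or random.Random()
    vr = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return vr + list(vc), vr


def dense(x, weights, bias):
    """One fully connected layer with ReLU: y_j = max(0, sum_i W[j][i] * x[i] + b[j])."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


# Hypothetical sizes: 3-D noise, 2-D Vc, a 2-unit first layer.
z, vr = generator_input([0.5, -0.5], noise_dim=3, rng=random.Random(0))
w = [[0.1] * 5, [-0.1] * 5]
h = dense(z, w, [0.0, 0.0])
print(len(z), len(h))  # 5 2
```

A real generator would follow this first layer with further fully connected and (transposed) convolutional layers that map to image pixels.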

The discrimination deep neural network 42 is a neural network of n″ layers Ld1 to Ldn″, likewise made up of convolutional layers and fully connected layers. It receives either image data DG (real images) from an image database or the like, or a generated image created by the image generation deep neural network 41, and outputs a discrimination signal indicating whether the input is a real image or a generated image.

When the discrimination deep neural network 42 outputs a discrimination signal indicating a generated image although a real image was input, or a signal indicating a real image although a generated image was input, the error detector 43 regards the discrimination signal as erroneous. It then supplies a signal err to the image generation deep neural network 41 and the discrimination deep neural network 42 for correcting the weighting coefficients used in their computations.

In this way, within the image generation adversarial neural network unit 4, the image generation deep neural network 41 learns to output generated images that the discrimination deep neural network 42 cannot distinguish from real images, while the discrimination deep neural network 42 learns to distinguish real images from generated images correctly.
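The adversarial training described above is usually expressed through binary cross-entropy losses rather than a hard right/wrong signal. The sketch below shows the standard GAN discriminator loss and the non-saturating generator loss computed on discriminator output probabilities, plus a helper that loosely mimics the error detector 43 by flagging misclassified outputs. This is the textbook GAN formulation swapped in for illustration, not the exact form of the signal err, which the description does not give.

```python
import math

EPS = 1e-12  # floor for probabilities so that log() never sees 0


def _log(p):
    return math.log(max(p, EPS))


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: reward D(real) -> 1 and D(generated) -> 0."""
    real_term = -sum(_log(p) for p in d_real) / len(d_real)
    fake_term = -sum(_log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: reward D(generated) -> 1."""
    return -sum(_log(p) for p in d_fake) / len(d_fake)


def misclassified(d_out, is_real, threshold=0.5):
    """Flag outputs on the wrong side of the threshold (cf. error detector 43)."""
    return [(p >= threshold) != real for p, real in zip(d_out, is_real)]


print(misclassified([0.9, 0.2, 0.7], [True, True, False]))  # [False, True, True]
```

Minimizing the first loss trains the discrimination network; minimizing the second trains the generation network, which gives the opposing objectives described in the paragraph above.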

FIGS. 4 to 6 are flowcharts showing the image generation method according to Embodiment 1 of the present invention. As described in detail below, FIG. 4 shows a first learning phase, in which the image recognition deep neural network unit 2 is optimized using the image data DG and the image caption data DL; FIG. 5 shows a second learning phase, in which the conversion switching unit 3 is optimized using the image data DG and the brain activity data D; and FIG. 6 shows an execution phase that uses only the brain activity data D.

The following describes, as an example, the case in which the image generation method shown in FIGS. 4 to 6 is executed by the image generation apparatus 1 shown in FIG. 1. The method is not limited to the image generation apparatus 1, however, and may of course be carried out by other means.

In the first step S11 shown in FIG. 4, the image recognition deep neural network 21 generates an image feature vector Vg from the image data DG perceived by the brain. The image feature vector Vg can be generated, for example, with VGG19 (“Very Deep Convolutional Networks for Large-Scale Image Recognition”, K. Simonyan, A. Zisserman, arXiv:1409.1556), which is widely used in computer vision research.

In the next step S12, the caption feature vector generation unit 22 generates a caption feature vector Vs from the caption data DL corresponding to the image. The method of generating the caption feature vector Vs is described in detail below with reference to FIG. 7.

FIG. 7 shows the case in which the caption feature vector Vs is generated from the caption data DL corresponding to the image data DG.

As shown in FIG. 7, the caption data DL corresponding to the image is first checked against a word list W prepared in advance; each listed word w_1 to w_k that appears is extracted and its occurrences are counted, producing a list word occurrence count vector w_o.

Separately, text data DT unrelated to the image is also checked against the word list W to generate a list word feature matrix A, the set of feature vectors of all the listed words. The matrix A is generated by a natural language processing method such as Word2vec, which represents each word as a vector by learning the co-occurrence relations of words in text data. A list word similarity matrix C is then calculated from the list word feature matrix A.

Next, a list word occurrence weight vector w_o′ is generated by multiplying the list word occurrence count vector w_o by the list word similarity matrix C.

Multiplying the count vector by the similarity matrix C means that words which do not appear in the caption itself still contribute, in proportion to their similarity to the caption's words as learned from other text.

Finally, the caption feature vector Vs is generated by multiplying the list word occurrence weight vector w_o′ by the list word feature matrix A.
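The caption feature computation of FIG. 7 reduces to a few matrix operations, Vs = Aᵀ(C·w_o). The sketch below assumes that the rows of A are the word vectors (for example, from Word2vec), that C is the cosine similarity between those rows, and that tokenization is simple lowercase whitespace splitting; all three are assumptions, since the description does not fix them. The function names and the toy 2-D word vectors are invented for illustration.

```python
import math


def count_vector(caption, word_list):
    """w_o: how often each listed word occurs in the caption."""
    tokens = caption.lower().split()
    return [tokens.count(w) for w in word_list]


def cosine_similarity_matrix(A):
    """C[i][j] = cosine similarity of word vectors i and j (rows of A).

    Assumes no word vector is all zeros.
    """
    norms = [math.sqrt(sum(v * v for v in row)) for row in A]
    return [[sum(a * b for a, b in zip(A[i], A[j])) / (norms[i] * norms[j])
             for j in range(len(A))] for i in range(len(A))]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def caption_feature(caption, word_list, A):
    w_o = count_vector(caption, word_list)   # list word occurrence counts
    C = cosine_similarity_matrix(A)          # k x k word similarity
    w_o_prime = matvec(C, w_o)               # spread weight to similar words
    # Vs = w_o'^T A: a weighted sum of the word vectors (rows of A)
    dim = len(A[0])
    return [sum(w_o_prime[i] * A[i][d] for i in range(len(A))) for d in range(dim)]


words = ["dog", "puppy", "car"]
A = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]     # toy word vectors
vs = caption_feature("A dog on the grass", words, A)
print([round(v, 2) for v in vs])  # [1.64, 0.48]
```

Here "puppy" never appears in the caption, yet it contributes with weight 0.8 through C, which is the effect the description attributes to the similarity matrix.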

Then, in step S13, the common feature vector generation unit 23 generates the language/image common feature vector Vc from the brain activity feature vector Vb generated by the conversion switching unit 3, the image feature vector Vg generated by the image recognition deep neural network 21, and the caption feature vector Vs generated by the caption feature vector generation unit 22.

The common feature vector generation unit 23 is trained in advance so that the image feature vector and the caption feature vector are converted into matching vectors, using regression analysis, canonical correlation analysis, or the method described in a well-known paper widely cited in natural language processing (“Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models”, Kiros, Salakhutdinov, Zemel, arXiv:1411.2539).
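Of the methods listed, Kiros et al. learn the common space with a pairwise hinge ranking loss that pushes each matching image/caption pair to score higher, by a margin, than mismatched pairs. The sketch below computes that loss for vectors assumed to be already projected into the common space; the learned projections themselves are omitted, the margin of 0.2 is an arbitrary choice, and the description does not state which of the listed methods the apparatus uses.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def ranking_loss(images, captions, margin=0.2):
    """Hinge ranking loss; images[i] and captions[i] are a matching pair,
    both already mapped into the common space."""
    n = len(images)
    s = [[cosine(images[i], captions[j]) for j in range(n)] for i in range(n)]
    loss = 0.0
    for i in range(n):
        for k in range(n):
            if k == i:
                continue
            loss += max(0.0, margin - s[i][i] + s[i][k])  # wrong caption for image i
            loss += max(0.0, margin - s[i][i] + s[k][i])  # wrong image for caption i
    return loss


images = [[1.0, 0.0], [0.0, 1.0]]
print(ranking_loss(images, [[1.0, 0.0], [0.0, 1.0]]))  # 0.0 (pairs well aligned)
```

Training would adjust the projections so that this loss falls, which makes image and caption vectors of the same content coincide in the common space.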

Next, in step S14, the image generation adversarial neural network unit 4 creates generated image data from the language/image common feature vector Vc and the random number vector Vr. The generated image data is displayed on the display unit 6 or stored in the storage unit 7 in accordance with a user operation.

In the second learning phase, in step S21 shown in FIG. 5, brain activity data D is acquired; it consists of data measured from each region (site) of the brain with a brain/neural activity measurement device.

Next, in step S22, the conversion switching unit 3 converts the brain activity data D into a brain activity feature vector Vb.

Next, in step S23, the conversion switching unit 3 compares the brain activity feature vector Vb with the output vector of each layer of the image recognition deep neural network 21 and with the common feature vector Vc. It performs optimization learning according to the conversion with the best regression result, and selects as the optimum channel the channel that generates the vector most highly correlated with Vb.

In the execution phase, brain activity data D is acquired in step S31 shown in FIG. 6; in step S32, the conversion switching unit 3 converts D into a brain activity feature vector Vb; and in step S33, the conversion switching unit 3 supplies Vb to the optimum channel of the image recognition deep neural network unit 2.

Then, in step S34, the image generation adversarial neural network unit 4 creates generated image data from the language/image common feature vector Vc that the image recognition deep neural network unit 2 generated on the basis of the brain activity feature vector Vb.
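One reading of "supplying Vb to the optimum channel" is that a vector decoded from Vb replaces the activations at that channel, and only the downstream part of the network is run to obtain Vc. The sketch below shows that routing with toy layers; the interpretation, the function name, and the channel indexing are assumptions rather than details stated in the description.

```python
def run_from_channel(layers, channel, injected):
    """Run only the layers downstream of `channel`, starting from `injected`.

    layers[i] maps channel i-1 to channel i (channel -1 is the raw image input),
    so a vector injected at channel k is processed by layers[k + 1:]. If k is
    the last index, the injected vector is returned unchanged and used as Vc.
    """
    x = list(injected)
    for layer in layers[channel + 1:]:
        x = layer(x)
    return x


# Toy network: two "feature" layers followed by a stand-in for the common-feature map.
layers = [
    lambda v: [2 * a for a in v],   # channel 0
    lambda v: [a + 1 for a in v],   # channel 1
    lambda v: [sum(v)],             # channel 2 (plays the role of Vc)
]
print(run_from_channel(layers, 0, [1.0, 2.0]))  # [5.0]
```

The result would then be passed, together with Vr, to the image generation adversarial neural network unit 4.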

As described above, the image generation apparatus 1 according to Embodiment 1 of the present invention generates images using feature representations of linguistic information as well as visual information, and can therefore create images that visualize the user's visual experience more accurately.

[Embodiment 2]
In the following, only the differences from Embodiment 1 are described; description of common points is omitted.

FIG. 8 is a block diagram showing the configuration of an image generation apparatus 10 according to Embodiment 2 of the present invention. As shown in FIG. 8, the image generation apparatus 10 according to Embodiment 2 has a configuration similar to that of the image generation apparatus 1 according to Embodiment 1. However, in place of the image recognition deep neural network unit 2 and the image generation adversarial neural network unit 4, it includes: a caption feature vector generation unit 22, which receives language data DL such as captions corresponding to an image and generates a caption feature vector Vs; an image encoding deep neural network unit 12; and an image decoding deep neural network unit 14, to which the image data DG and the caption feature vector Vs are supplied.

The conversion switching unit 13 shown in FIG. 8 has the same function as the conversion switching unit 3 shown in FIG. 1. In accordance with a regression result signal R, which indicates the regression results calculated between the brain activity feature vector Vb and each of the outputs of the layers Le1 to Len, the estimated random number vector EVr, and the estimated caption feature vector EVs (the latter two generated by the vector generation unit 16), it selects the channel that generates the vector with the best regression result, that is, the highest correlation, and supplies Vb to the selected channel in the execution phase.

FIG. 9 is a block diagram showing the configuration of the image encoding deep neural network unit 12 shown in FIG. 8. As shown in FIG. 9, the image encoding deep neural network unit 12 includes an image encoding deep neural network 15 and a vector generation unit 16.

The image encoding deep neural network 15 is connected to the conversion switching unit 13 and an error detector 17, and the vector generation unit 16 is connected to the image encoding deep neural network 15 and the conversion switching unit 13.

The image encoding deep neural network 15 is a neural network of n layers Le1 to Len, made up of convolutional layers and fully connected layers, and generates an image feature vector Vg. The vector generation unit 16 receives the image feature vector Vg and outputs an estimated caption feature vector EVs and an estimated random number vector EVr.

FIG. 10 is a block diagram showing the configuration of the image decoding deep neural network unit 14 shown in FIG. 8. As shown in FIG. 10, the image decoding deep neural network unit 14 has a configuration similar to that of the image generation adversarial neural network unit 4 shown in FIG. 3, except that its image decoding deep neural network 18 generates an image from the random number vector Vr and the caption feature vector Vs generated by the caption feature vector generation unit 22.

The discrimination deep neural network 19 receives as input either the caption feature vector Vs, the random number vector Vr, and the generated image (hereinafter "generated-image data"), or the estimated random number vector EVr and estimated caption feature vector EVs generated by the vector generation unit 16 together with the image data DG (hereinafter "real-image data"). It determines whether the input is generated-image data or real-image data.

The image generation device 10, which uses the estimated caption feature vector EVs and the estimated random number vector EVr as described above, can be implemented by the methods described in the known papers "Adversarial Feature Learning" (Jeff Donahue, Philipp Krahenbuhl, Trevor Darrell, arXiv:1605.09782) and "Adversarially Learned Inference" (Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, Aaron Courville, arXiv:1606.00704).
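For reference, the adversarial objective used in the cited BiGAN/ALI papers can be written in this document's notation. The mapping of symbols is an assumption: E denotes the encoder formed by networks 15 and 16, returning (EVr, EVs); G denotes the image decoding network 18; D denotes the discrimination network 19; and x is an image from the image data DG. The patent text does not itself state a loss function.

```latex
\min_{G,E}\;\max_{D}\;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D\bigl(x,\, E(x)\bigr)\right]
+ \mathbb{E}_{(V_r, V_s)}\!\left[\log\Bigl(1 - D\bigl(G(V_r, V_s),\, (V_r, V_s)\bigr)\Bigr)\right],
\qquad E(x) = (EV_r,\, EV_s)
```

Here Vr is drawn from the random-number prior and Vs is the caption feature vector produced by unit 22. The BiGAN paper shows that, at the global optimum of this game, the encoder and generator invert each other, which is why EVs and EVr can serve as estimates of the caption and random number vectors for a given image.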

With the image generation apparatus 10 according to Embodiment 2 of the present invention described above, an image is generated with reference not only to the image data DG but also to language data, such as the caption data DL, corresponding to the image. Therefore, in an environment where such language data is available, an image closer to the real image recognized by the brain can be created.

[Embodiment 3]
FIG. 11 is a block diagram showing the configuration of an image generation apparatus 20 according to Embodiment 3 of the present invention. As shown in FIG. 11, the image generation apparatus 20 has a configuration in which the image recognition deep neural network unit 2 of Embodiment 1 is added to the image generation apparatus 10 of Embodiment 2. However, instead of the caption feature vector Vs generated by the caption feature vector generation unit 22, the language/image common feature vector Vc generated in the image recognition deep neural network unit 2 is input to the image decoding deep neural network unit 14.

With the image generation apparatus 20 according to Embodiment 3 having this configuration, the functions of both the image generation apparatus 1 according to Embodiment 1 and the image generation apparatus 10 according to Embodiment 2 can be realized in a single apparatus.

1, 10, 20 Image generation apparatus
2 Image recognition deep neural network unit
3, 13 Conversion switching unit
4 Image generation adversarial neural network unit
12 Image encoding deep neural network unit
14 Image decoding deep neural network unit
15 Image encoding deep neural network
16 Vector generation unit
18 Image decoding deep neural network
21 Image recognition deep neural network
22 Caption feature vector generation unit
23 Common feature vector generation unit
41 Image generation deep neural network
19, 42 Discrimination deep neural network
17, 43 Error detector

Claims

An image generation apparatus comprising:
image feature vector generation means for generating an image feature vector from image data;
caption feature vector generation means for generating a caption feature vector from text data corresponding to an image;
common feature vector generation means for generating a common feature vector according to the caption feature vector and the image feature vector;
image generation means for generating generated image data according to the common feature vector;
conversion means for converting brain activity data into a brain activity feature vector; and
selection means for comparing the brain activity feature vector with the vectors generated by the image feature vector generation means and with the common feature vector, and for selectively supplying the brain activity feature vector to the channel that generates the vector having the highest correlation.

The image generation apparatus according to claim 1, wherein the image feature vector generation means is constituted by a deep neural network, and the image generation means is constituted by an adversarial neural network.

The image generation apparatus according to claim 1, wherein the caption feature vector generation means generates the caption feature vector in consideration of similarity between languages.

The image generation apparatus according to claim 1, wherein the image generation means generates the generated image data further in accordance with the caption feature vector.

The image generation apparatus according to claim 4, wherein the image feature vector generation means is constituted by a deep neural network, and the image generation means is constituted by an adversarial neural network.

An image generation method comprising:
a first step of generating an image feature vector from image data;
a second step of generating a caption feature vector from language data corresponding to the image data;
a third step of generating a common feature vector according to the image feature vector and the caption feature vector; and
a fourth step of generating generated image data according to the common feature vector.

The image generation method according to claim 6, wherein in the second step, the caption feature vector is generated in consideration of similarity between words.

The image generation method according to claim 6, wherein in the fourth step, the generated image data is generated further based on the image data and the language data.
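The selection step recited in claim 1 (compare the brain activity feature vector with each candidate vector and route it to the channel whose vector correlates best) can be sketched as follows. Pearson correlation is used as the similarity measure; this is an assumption, since the claim says only "highest correlation". The channel names, the example values, and the requirement that all vectors have the same length are also assumptions for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def pearson(a: Vector, b: Vector) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a constant vector has no defined correlation; treat it as uncorrelated
    return cov / (norm_a * norm_b)


def select_channel(vb: Vector, candidates: Dict[str, Vector]) -> str:
    """Return the channel whose generated vector correlates most strongly with Vb."""
    return max(candidates, key=lambda name: pearson(vb, candidates[name]))


# Hypothetical example: one image-feature channel and the common feature vector Vc.
vb = [0.2, 0.9, 0.4, 0.7]
candidates = {
    "image_feature_layer": [0.8, 0.1, 0.6, 0.3],
    "common_feature_vc": [0.1, 1.0, 0.5, 0.6],
}
print(select_channel(vb, candidates))  # common_feature_vc
```

The signed correlation is maximized here, so a channel whose vector is anti-correlated with the brain activity feature vector is never preferred; whether the absolute value should be used instead is not stated in the claim.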