JP2016085579A

JP2016085579A - Image processing apparatus and method for interactive device, and the interactive device

Info

Publication number: JP2016085579A
Application number: JP2014217541A
Authority: JP
Inventors: Gene Cheung; Xiaming Liu; Hiroshi Sanko; Hitoshi Naito
Original assignee: KDDI Corp; Research Organization of Information and Systems
Current assignee: KDDI Corp; Research Organization of Information and Systems
Priority date: 2014-10-24
Filing date: 2014-10-24
Publication date: 2016-05-19

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus for an interactive device that corrects gaze so that the lines of sight of the interlocutors match, while simultaneously solving the "occlusion" problem and performing face beautification.

SOLUTION: An image processing apparatus for an interactive device includes a first dictionary memory that stores a first dictionary matrix (Φ) for a face reconstruction process, which fills the holes produced by gaze correction, and a second dictionary memory that stores a second dictionary matrix (Ψ) for a face beautification process. An objective function includes a first sparse code vector (α) for the first dictionary matrix (Φ), a second sparse code vector (β) for the second dictionary matrix (Ψ), and a linear transformation (L) from the reconstructed face image data to the face-beautified simple image data. Based on a face image vector (x), the apparatus obtains the values of α, β, and L that minimize this objective function, and generates and outputs face image data that has undergone both the face reconstruction process and the face beautification process.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an image processing apparatus and method for an interactive device used for videophone calls, video conferencing, and the like, and to the interactive device itself.

With the advent of imaging and network technologies that enable inexpensive, high-quality video capture and highly reliable broadband transmission, video conferencing systems that connect two parties separated by long physical distances have become ubiquitous and are easy to set up with tools such as Skype and Google Plus Hangouts.

A problem common to these tools is gaze mismatch. The capture camera is typically placed above or below the display monitor, but each user looks at the other party's face on the screen while talking. The users therefore cannot make eye contact, which degrades the quality of visual communication.

Drawing on recent advances in 3D imaging (see, for example, Non-Patent Document 1), one common approach to the gaze mismatch problem is to synthesize the viewpoint image as seen from the center of the screen (a virtual image) using depth-image-based rendering (hereinafter, DIBR) (see, for example, Non-Patent Documents 6 to 8), under the condition that texture and depth images of a viewpoint are available from the capture camera (see, for example, Non-Patent Documents 3 to 5).
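To illustrate why DIBR synthesis leaves missing pixels, the following is a minimal sketch, assuming a rectified setup in which the virtual viewpoint differs from the captured one by a shift along a single image axis, so that each pixel moves by a disparity inversely proportional to its depth. The function name `dibr_forward_warp`, the parameter `baseline_focal`, and the toy data are hypothetical and do not come from the patent text.

```python
import numpy as np

def dibr_forward_warp(texture, depth, baseline_focal):
    """Forward-warp a texture image to a virtual view shifted along the column axis.

    texture: (H, W) array of intensities from the capture camera
    depth: (H, W) array of positive depths for the same pixels
    baseline_focal: assumed product of focal length and viewpoint shift (hypothetical)
    Returns the synthesized image and a boolean mask marking holes.
    """
    h, w = texture.shape
    virtual = np.zeros_like(texture, dtype=float)
    zbuf = np.full((h, w), np.inf)  # nearest depth written to each target pixel
    disparity = np.rint(baseline_focal / depth).astype(int)
    for r in range(h):
        for c in range(w):
            c_new = c + disparity[r, c]
            # Keep only the closest surface when several pixels land on one target.
            if 0 <= c_new < w and depth[r, c] < zbuf[r, c_new]:
                zbuf[r, c_new] = depth[r, c]
                virtual[r, c_new] = texture[r, c]
    holes = np.isinf(zbuf)  # target pixels that no source pixel mapped to
    return virtual, holes

# Toy scene: a near object (depth 1) at columns 2-3 in front of a far background.
texture = np.arange(1.0, 9.0).reshape(1, 8)
depth = np.full((1, 8), 100.0)
depth[0, 2:4] = 1.0
virtual, holes = dibr_forward_warp(texture, depth, baseline_focal=2.0)
print(holes[0])  # columns 2 and 3 are holes: the background there was never observed
```

The near object shifts by two columns while the background stays put, so the columns it vacates have no source pixel. These are the disocclusion holes that a later stage has to fill.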

In Patent Document 1, to provide an interactive device that can easily realize a natural conversation environment in which the interlocutors' lines of sight match during interactive communication, an image processing module changes the positions of the iris and pupil by modifying the pixels of the palpebral fissure region (the exposed region of the eyeball) of a person in the input image data so that the person's gaze faces the front of the image. The image processing module has a face recognition unit, a gaze determination unit, and a gaze changing unit. The face recognition unit extracts, from the result of the face recognition process, a vector (face vector) that defines the orientation of the whole face. When the face is determined, from the orientation of the face vector, to be facing the display screen, the gaze changing unit modifies the pixels within the palpebral fissure region so that the person's gaze faces the front of the image.

Patent Documents 2 and 3 also disclose methods of correcting the gaze of callers so that their lines of sight match.

Patent Document 1: JP 2009-246408 A
Patent Document 2: JP 2005-117106 A
Patent Document 3: JP 2013-182616 A
Patent Document 4: JP 2004-147288 A

Non-Patent Document 1: M. Tanimoto, M. P. Tehrani, T. Fujii and T. Yendo, "Free-Viewpoint TV," IEEE Signal Processing Magazine, vol. 28, no. 1, January 2011.
Non-Patent Document 2: D. Tian, P.-L. Lai, P. Lopez and C. Gomila, "View Synthesis Techniques for 3D Video," Applications of Digital Image Processing XXXII, Proceedings of the SPIE, 7443, 2009.
Non-Patent Document 3: S.-B. Lee, I.-Y. Shin and Y.-S. Ho, "Gaze-corrected View Generation Using Stereo Camera System for Immersive Videoconferencing," IEEE Transactions on Consumer Electronics, vol. 57, no. 3, pp. 1033-1040, August 2011.
Non-Patent Document 4: C. Kuster, T. Popa, J.-C. Bazin, C. Gotsman and M. Gross, "Gaze Correction for Home Video Conferencing," ACM SIGGRAPH Asia, vol. 31, no. 6, November 2012.
Non-Patent Document 5: W. Eng, D. Min, V.-A. Nguyen, J. Lu and M. Do, "Gaze Correction for 3D Tele-immersive Communication System," 11th IEEE IVMSP Workshop: 3D Image/Video Technologies and Applications, Seoul, Korea, June 2013.
Non-Patent Document 6: K.-J. Oh, S. Yea and Y.-S. Ho, "Hole-Filling Method Using Depth Based In-Painting for View Synthesis in Free Viewpoint Television FTV and 3D Video," Picture Coding Symposium, Chicago, IL, May 2009.
Non-Patent Document 7: S. Reel, G. Cheung, P. Wong and L. Dooley, "Joint Texture-Depth Pixel Inpainting of Dis-occlusion Holes in Virtual View Synthesis," APSIPA ASC, Kaohsiung, Taiwan, October 2013.
Non-Patent Document 8: B. Macchiavello, C. Dorea, E. M. Hung, G. Cheung and I. Bajic, "Low-Saliency Prior for Disocclusion Hole Filling in DIBR-Synthesized Images," IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, Italy, May 2014.
Non-Patent Document 9: T. Leyvand, D. Cohen-Or, G. Dror and D. Lischinski, "Data-Driven Enhancement of Facial Attractiveness," ACM SIGGRAPH, vol. 27, no. 3, August 2008.
Non-Patent Document 10: H. Yue, X. Sun, J. Yang and F. Wu, "Cloud-based Image Coding for Mobile Devices-Towards Thousands To One Compression," IEEE Transactions on Multimedia, vol. 15, no. 4, pp. 845-857, June 2013.
Non-Patent Document 11: J. Gemmell, K. Toyama, C. L. Zitnick, T. Kang and S. Seitz, "Gaze awareness for video conferencing: A software approach," IEEE Multimedia, vol. 7, no. 4, pp. 26-35, Oct-Dec 2000.
Non-Patent Document 12: A. Criminisi, J. Shotton, A. Blake and P. Toor, "Gaze manipulation for one-to-one teleconferencing," IEEE International Conference on Computer Vision, Nice, France, October 2003.
Non-Patent Document 13: R. Yang and Z. Zhang, "Eye gaze correction with stereovision for video-teleconferencing," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 7, pp. 956-960, July 2004.
Non-Patent Document 14: G. Cheung, J. Ishida, A. Kubota and A. Ortega, "Transform Domain Sparsification of Depth Maps using Iterative Quadratic Programming," IEEE International Conference on Image Processing, Brussels, Belgium, September 2011.
Non-Patent Document 15: W. Hu, G. Cheung, X. Li and O. Au, "Depth Map Compression using Multi-resolution Graph-based Transform for Depth-image based Rendering," IEEE International Conference on Image Processing, Orlando, FL, September 2012.
Non-Patent Document 16: Y. Eisenthal, G. Dror and E. Ruppin, "Facial Attractiveness: Beauty and the Machine," Neural Computation, vol. 18, no. 1, pp. 119-142, 2006.
Non-Patent Document 17: M. Aharon, M. Elad and A. M. Bruckstein, "The K-SVD: An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311-4322, November 2006.
Non-Patent Document 18: X. Liu, D. Zhai, D. Zhao and W. Gao, "Image Super-Resolution via Hierarchical and Collaborative Sparse Representation," Proceedings of the Data Compression Conference, pp. 93-102, 2013.
Non-Patent Document 19: W. Dong, L. Zhang, G. Shi and X. Wu, "Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization," IEEE Transactions on Image Processing, vol. 20, no. 7, pp. 1838-1857, July 2011.
Non-Patent Document 20: A. Y. Yang, Z. Zhou, A. Ganesh, S. Sastry and Y. Ma, "Fast ℓ1-minimization algorithms for robust face recognition," IEEE Transactions on Image Processing, vol. 22, no. 8, pp. 3234-3246, August 2013.
Non-Patent Document 21: X. Zhu and D. Ramanan, "Face detection, pose estimation, and landmark localization in the wild," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2879-2886, June 2012.
Non-Patent Document 22: I. Ahn and C. Kim, "Depth-based disocclusion filling for virtual view synthesis," IEEE International Conference on Multimedia and Expo, Melbourne, Australia, July 2012.
Non-Patent Document 23: T. Leyvand, D. Cohen-Or, G. Dror and D. Lischinski, "Data-driven enhancement of facial attractiveness," ACM SIGGRAPH, vol. 27, no. 3, August 2008.
Non-Patent Document 24: M. Dumont, S. Maesen, S. Rogmans and P. Bekaert, "A prototype for practical eye-gaze corrected video chat on graphics hardware," International Conference on Signal Processing and Multimedia Applications, Porto, Portugal, July 2008.
Non-Patent Document 25: L. Wolf, Z. Freund and S. Avidan, "An eye for an eye: A single camera gaze-replacement method," 11th European Conference on Computer Vision, Crete, Greece, September 2010.
Non-Patent Document 26: S.-B. Lee and Y.-S. Ho, "Generation of eye contact image using depth camera for realistic telepresence," APSIPA ASC, Kaohsiung, Taiwan, October 2013.
Non-Patent Document 27: D. Guo and T. Sim, "Digital face makeup by example," IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, June 2007.
Non-Patent Document 28: C.-Y. Yang, S. Liu and M.-H. Yang, "Structured face hallucination," IEEE International Conference on Computer Vision and Pattern Recognition, Portland, OR, June 2013.
Non-Patent Document 29: J. Mairal, F. Bach, J. Ponce, G. Sapiro and A. Zisserman, "Supervised dictionary learning," 2008.
Non-Patent Document 30: J. Mairal, F. Bach, J. Ponce and G. Sapiro, "Online dictionary learning for sparse coding," Proceedings of the 26th Annual International Conference on Machine Learning, ser. ICML '09, New York, NY, USA: ACM, 2009, pp. 689-696.
Non-Patent Document 31: J. Yang, Z. Wang, Z. Lin, S. Cohen and T. Huang, "Coupled dictionary training for image super-resolution," IEEE Transactions on Image Processing, vol. 21, no. 8, pp. 3467-3478, August 2012.
Non-Patent Document 32: S. Wang, D. Zhang, Y. Liang and Q. Pan, "Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2216-2223, June 2012.
Non-Patent Document 33: J. Wright, A. Yang, A. Ganesh, S. Sastry and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, February 2009.
Non-Patent Document 34: D. Lowe, "Object recognition from local scale-invariant features," IEEE International Conference on Computer Vision, September 1999.
Non-Patent Document 35: Y. Yang, B. Geng, Y. Cai, A. Hanjalic and X.-S. Hua, "Object retrieval using visual query context," IEEE Transactions on Multimedia, vol. 13, no. 6, pp. 1295-1307, December 2011.
Non-Patent Document 36: I. Daubechies, R. Devore, M. Fornasier and S. Gunturk, "Iteratively reweighted least squares minimization for sparse recovery," Commun. Pure Appl. Math., 2009.
Non-Patent Document 37: S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge, 2004.
Non-Patent Document 38: M. Taylor and C. Creelman, "PEST: Efficient estimates on probability functions," The Journal of the Acoustical Society of America, vol. 41, 1988, pp. 782-787.
Non-Patent Document 39: H. Hadizadeh, I. Bajic and G. Cheung, "Video error concealment using a computation-efficient low saliency prior," IEEE Transactions on Multimedia, vol. 15, no. 8, pp. 2099-2113, December 2013.
Non-Patent Document 40: D. J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, Chapman & Hall/CRC, 2007.

However, a key problem with the DIBR approach described above is the disocclusion hole-filling problem: how to fill the occluded regions that arise from so-called "occlusion", in which subjects overlap one another when the texture of a subject is generated. Spatial regions of the virtual image that were hidden by foreground elements at the viewpoint of the digital camera contain missing pixels and must be completed satisfactorily. Hole filling for DIBR-synthesized images has been addressed in Non-Patent Documents 2 to 4, but to the inventors' knowledge no one has yet proposed an algorithm for filling occlusion holes in general human faces.

An object of the present invention is to solve the above problems and to provide an image processing apparatus and method for an interactive device, and the interactive device, that correct gaze so that the interlocutors' lines of sight match while simultaneously solving the "occlusion" problem and performing face beautification.

An image processing apparatus according to a first invention comprises:
a first dictionary memory that stores a first dictionary matrix (Φ), constructed by extracting a plurality of patches from a plurality of face image data and then clustering them, for a face reconstruction process that fills the holes produced by gaze correction;
a second dictionary memory that stores a second dictionary matrix (Ψ), constructed by clustering a plurality of face image data for each of a plurality of components of beautiful faces, for a face beautification process; and
image processing means that operates on an objective function for the face reconstruction and the face beautification, the objective function including a first sparse code vector (α) for the first dictionary matrix (Φ), a second sparse code vector (β) for the second dictionary matrix (Ψ), and a linear transformation matrix (L) from the reconstructed face image data to the face-beautified simple image data. Based on an input face image vector (x), the image processing means obtains the optimal values of the first sparse code vector (α), the second sparse code vector (β), and the linear transformation matrix (L) that minimize the objective function, and thereby generates and outputs face image data in which the face reconstruction process and the face beautification process have been applied to the input face image vector (x).
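The first dictionary is described as being built by extracting patches from many face images and clustering them. The following is a minimal sketch of that idea, assuming non-overlapping square patches, plain k-means clustering, and unit-norm cluster centroids as dictionary atoms. The patent text quoted here does not specify the clustering algorithm, patch size, or normalization, and the names `extract_patches` and `kmeans_dictionary` are hypothetical.

```python
import numpy as np

def extract_patches(image, size, stride):
    """Collect size x size patches from a 2-D image as flattened row vectors."""
    h, w = image.shape
    patches = [image[r:r + size, c:c + size].ravel()
               for r in range(0, h - size + 1, stride)
               for c in range(0, w - size + 1, stride)]
    return np.array(patches, dtype=float)

def kmeans_dictionary(patches, n_atoms, n_iter=20, seed=0):
    """Cluster patches with plain k-means; return a matrix whose columns are unit-norm centroids."""
    rng = np.random.default_rng(seed)
    centers = patches[rng.choice(len(patches), n_atoms, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every patch to every center: shape (n_patches, n_atoms).
        d2 = ((patches[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(n_atoms):
            members = patches[labels == k]
            if len(members) > 0:  # leave a center unchanged if its cluster is empty
                centers[k] = members.mean(axis=0)
    norms = np.linalg.norm(centers, axis=1, keepdims=True)
    return (centers / np.maximum(norms, 1e-12)).T

# Stand-in data: random arrays in place of real face images (an assumption for illustration).
rng = np.random.default_rng(1)
faces = [rng.random((16, 16)) for _ in range(4)]
patches = np.vstack([extract_patches(f, size=4, stride=4) for f in faces])
Phi = kmeans_dictionary(patches, n_atoms=8)
print(Phi.shape)  # (16, 8): 16-dimensional patches, 8 atoms
```

Storing atoms as columns matches the usual convention in which a patch is approximated as Φα for a sparse code vector α.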

In the image processing apparatus, the objective function further includes, in place of the linear transformation matrix (L), a warping operator W(A, B) that warps input face image data (A) to face-beautified face image data (B), and a mapping function S(·) that maps a vector in the pixel domain to a vector in a selected feature space using the scale-invariant feature transform. The image processing means obtains the optimal values of the first sparse code vector (α) and the second sparse code vector (β) that minimize the objective function for the face reconstruction and the face beautification, and thereby generates and outputs face image data in which the face reconstruction process and the face beautification process have been applied to the input face image vector (x).

Also, in the image processing apparatus, the objective function is expressed by the following equation:
[equation not reproduced in the source text]
where μ is a predetermined constant and u is a weight vector that modulates the second sparse code vector (β) so as to satisfy a predetermined linear constraint. The image processing means obtains the first sparse code vector (α) and the second sparse code vector (β) using the above equation, and generates and outputs face image data in which the face reconstruction process and the face beautification process have been applied to the input face image vector (x).

The image processing apparatus further comprises gaze correction processing means, provided upstream of the image processing means, that corrects the gaze of captured face image data so that the line of sight of the face in the data matches that of the interlocutor, and outputs the face image vector (x).

The image processing apparatus further comprises face contour adjusting means, provided downstream of the image processing means, that adjusts the contour of the face in the face image data from the image processing means using a plurality of face contour data, and outputs the adjusted face image data.

The image processing apparatus further comprises image warping processing means, provided downstream of the face contour adjusting means, that warps the face image data from the face contour adjusting means so as to include the background of the captured face image data, and outputs the resulting face image data.

An interactive device according to a second invention comprises:
transmitting/receiving means that receives face image data from a counterpart interactive device and transmits face image data of this interactive device to the counterpart interactive device; and
a display unit that displays each of the face image data,
the interactive device further comprising the image processing apparatus according to any one of claims 1 to 8,
wherein the transmitting/receiving means transmits the face image data from the image processing apparatus to the counterpart interactive device.

An image processing method according to a third invention is a method for an image processing apparatus that comprises:
a first dictionary memory that stores a first dictionary matrix (Φ), constructed by extracting a plurality of patches from a plurality of face image data and then clustering them, for a face reconstruction process that fills the holes produced by gaze correction; and
a second dictionary memory that stores a second dictionary matrix (Ψ), constructed by clustering a plurality of face image data for each of a plurality of components of beautiful faces, for a face beautification process.
The method includes a step in which the image processing apparatus operates on an objective function for the face reconstruction and the face beautification, the objective function including a first sparse code vector (α) for the first dictionary matrix (Φ), a second sparse code vector (β) for the second dictionary matrix (Ψ), and a linear transformation (L) from the reconstructed face image data to the face-beautified simple image data. Based on an input face image vector (x), the apparatus obtains the optimal values of the first sparse code vector (α), the second sparse code vector (β), and the linear transformation (L) that minimize the objective function, and thereby generates and outputs face image data in which the face reconstruction process and the face beautification process have been applied to the input face image vector (x).

According to the present invention, an image processing apparatus for an interactive device can correct gaze so that the interlocutors' lines of sight match while simultaneously solving the "occlusion" problem and performing face beautification.

A block diagram showing the configuration of an interactive device provided with an image processing unit 15 according to Embodiment 1 of the present invention.
A block diagram showing the configuration of an interactive device provided with an image processing unit 15A according to Embodiment 2 of the present invention.
A diagram for explaining the dual code vector search executed in the image processing unit 15A of FIG. 2.
A diagram showing an example of the landmarks used in typical image alignment executed in the image processing unit 15A of FIG. 2.
Photographs of image processing results from the image processing unit 15A of FIG. 2 for an expressionless frame at a small imaging angle; the upper row shows the results of hole filling only, and the lower row shows the results of image reconstruction and face beautification.
Photographs of image processing results from the image processing unit 15A of FIG. 2 for a slightly smiling frame at a small imaging angle; the upper row shows the results of hole filling only, and the lower row shows the results of image reconstruction and face beautification.
A table showing the voting results for the frontal images beautified by the image processing unit 15A of FIG. 2.
Image processing results from the image processing unit 15A of FIG. 2 in which the beautified face image is integrated into the captured image; the upper row shows the original captured face images, and the lower row shows the integrated face images.

Hereinafter, embodiments according to the present invention will be described with reference to the drawings. In the following embodiments, the same reference numerals are given to the same components.

Embodiment 1.
1-1. Preface.
This embodiment proposes to solve the disocclusion hole-filling problem for human faces jointly with the face beautification problem in a unified sparse coding framework. Face beautification is a process that subtly modifies facial features to enhance the attractiveness of the rendered face (see, for example, Non-Patent Document 9 and Patent Document 4). Unlike conventional hole filling (see, for example, Non-Patent Documents 6 to 8), which attempts to fill missing pixels using only the data available in the currently captured image, the proposed approach learns two corresponding dictionaries offline from a large corpus that contains both images of the target subject to be reconstructed and many beautiful faces. Such a corpus has become available in the era of big data, in which huge image databases are easily accessible through social networks and search engines. During image synthesis, the following two code vectors are obtained simultaneously.
(1) One is a sparse code vector with respect to the first dictionary, which accounts for the available DIBR-synthesized pixel data.
(2) The other is a sparse code vector with respect to the second dictionary, which matches the first vector well up to a constrained linear transformation.

Here, constraining the linear transformation to a predetermined set makes it possible to strike a good balance between the recognizability of the captured subject and the beautiful facial features used to improve attractiveness. The experimental results of this embodiment, described later, show naturally rendered human faces with noticeably improved attractiveness.

1-2. Related art, etc.
Gaze mismatch is a well-known problem in video conferencing systems (see, for example, Non-Patent Document 1), and many solutions exist in the prior art (see, for example, Non-Patent Documents 12 to 13 and 3 to 5, and Patent Documents 1 to 3). Early solutions (see, for example, Non-Patent Documents 12 and 13) perform image-based rendering on stereo-captured viewpoint images, which tends to be computationally intensive. Taking advantage of recent advances in depth sensing, more recent proposals (see, for example, Non-Patent Documents 3, 4, and 5) assume a capture camera that provides both texture and depth images for DIBR synthesis of the gaze-corrected image. This texture-plus-depth format requires compression of the depth images at the sender, which has been studied recently (see, for example, Non-Patent Documents 14 and 15).

Because they aim at real-time implementation, the solutions these proposals offer for the disocclusion hole-filling problem are simple and ad hoc, and they tend to work well when the camera viewpoint and the target virtual viewpoint are not far apart. However, large displays are often used for enhanced immersion during video conferencing, so the capture camera may be located far from the center of the screen. The resulting holes are large, and the inventors' more sophisticated dual sparse coding scheme, which uses a dictionary tailored to the specific subject, may then be more beneficial.

Since the dictionaries required by the inventors' approach are learned offline from an available corpus of images, the only computation required during the conference is that of the appropriate sparse code vectors. For dictionaries of moderate size, this is considered feasible in real time using a state-of-the-art optimization tool such as Lasso.

Face beautification has been studied in the literature for individual still images (see, for example, Non-Patent Document 9); performing it on human face images in a video conferencing setting raises new challenges of temporal consistency and real-time implementation. Although no experiments on this point were conducted for this embodiment, the sparse coding approach is expected to address both challenges by initializing the sparse code vector at each new time instant with the optimized vector from the previous instant.

1-3. System configuration of the interactive device
FIG. 1 is a block diagram showing the configuration of an interactive device provided with the image processing unit 15 according to Embodiment 1 of the present invention. In FIG. 1, the interactive device comprises a left camera 11, a right camera 12, an image data acquisition unit 13 having a temporary memory 13m, a gaze correction processing unit 14, the image processing unit 15, an image communication unit 16, a display control unit 17, and a display 18. The image processing unit 15 comprises a reconstruction processing unit 21, a dictionary memory 31 connected to it, a face beautification processing unit 22, and a dictionary memory 32 connected to it. The dictionary memories 31 and 32 store the dictionary matrices Φ and Ψ, respectively, which are learned offline as described in detail later. As also described in detail later, the processing of the reconstruction processing unit 21 and the face beautification processing unit 22 is carried out jointly within the image processing unit 15, so that image processing is performed substantially in real time.

The interactive device uses two cameras 11 and 12 that face the user from the left and right sides of the display 18. The image data acquisition unit 13 acquires image data from the two cameras 11 and 12, stores it temporarily in the temporary memory 13m, and then outputs it to the gaze correction processing unit 14. The gaze correction processing unit 14 uses a known DIBR-based gaze correction method, which corrects the gaze of a user looking toward the display so that it appears to be directed toward the camera (substantially the image center), thereby matching the gaze of the remote party with the gaze of the user of this interactive device. Assuming that the initial texture and depth maps are available from the viewpoint image data captured by the digital cameras, the gaze correction processing unit 14 synthesizes gaze-corrected image data and outputs it to the reconstruction processing unit 21 of the image processing unit 15. Because of self-occlusion, the image synthesized by the DIBR gaze correction method contains missing pixels that need to be filled. Filling pixels that do not belong to the human face is an orthogonal problem that can be solved with general hole-filling techniques (see, for example, Non-Patent Documents 6 to 8).

Although the interactive device according to this embodiment includes two cameras 11 and 12, the present invention is not limited to this; images may be captured with a single camera and then processed.

In this embodiment, the main component is the image processing unit 15, which jointly solves the self-occlusion hole-filling problem and the face beautification problem for human faces using a unified dual sparse coding framework; it comprises the reconstruction processing unit 21 and the face beautification processing unit 22. The image processing unit 15 uses the dictionary memories 31 and 32, which respectively hold the two dictionary matrices Φ and Ψ learned from two offline training sets that can be collected a priori.

The dictionary matrix Φ in the dictionary memory 31, used for reconstructing the face image data, is learned from face photographs of the specific person currently taking part in the video conference. This choice is motivated by the observation that the space spanned by facial appearance is relatively small, and the space spanned by one specific face is much smaller still. Using such a dictionary matrix Φ therefore yields a useful prior for obtaining a high-quality face reconstruction.

The dictionary matrix Ψ in the dictionary memory 32, used for face beautification, is learned in this embodiment from a general set of beautiful faces, namely beautiful frontal face images of Asian movie stars collected from the Internet. This makes it possible to exploit the discriminative nature of sparse representation for beautification: the atoms (the elementary units of image data) that yield the most compact representation are preferred as candidates for beautification. In other words, the beautified face can be expressed as a linear combination of these atoms. In addition, a restricted linear transformation matrix L is used to constrain the level of beautification, that is, the modification should be subtle. The reconstruction and face beautification processes can be executed iteratively to obtain better results.

Details of the reconstruction and face beautification processing of the image processing unit 15 are described later. The image data processed by the image processing unit 15 is output to the image communication unit 16, converted into predetermined video data, and transmitted to the other interactive device. The image communication unit 16 also receives video data from the other party's interactive device, and both sets of video data are displayed on the display 18 by the display control unit 17.

1-4. Joint hole filling and face beautification
The hole filling and face beautification processing executed by the image processing unit 15 is described in detail below. The inventors propose that a successful transformation must satisfy the following three criteria.
(1) Fidelity: the holes in the face image synthesized by the DIBR gaze correction method must be filled so that the result looks like a natural human face.
(2) Attractiveness: the rendered face must have enhanced attractiveness, with local features that are "beautiful".
(3) Recognizability: only subtle modifications may be applied to the original face image data, so that the target subject remains unmistakably recognizable.

The first and second criteria can be formulated as a dual sparse coding problem. A patch-based overcomplete dictionary is learned from example faces of the current person to characterize the structural domain of the current face, and a feature-based overcomplete dictionary is learned from a set of beautiful faces of movie stars to characterize the feature domains of beautiful faces. The third criterion can be satisfied by restricting the types of linear transformation applied to the different facial features, which limits the degree of beautification. All of these criteria can be cast into a unified optimization framework to obtain the final result.
1-4-1. Dual dictionary learning.
How to derive appropriate dictionary matrices Φ and Ψ for dual sparse coding is a key issue in the proposed scheme. Two independent dictionary matrices Φ and Ψ are learned for face reconstruction and face beautification, respectively.
1-4-1-1. Dictionary learning for reconstruction.
For face reconstruction, the aim of this embodiment is to fill the holes exposed in the face image synthesized by the DIBR gaze correction method. In the literature, image reconstruction using local patches has become very popular and has been shown to be highly effective (see, for example, Non-Patent Documents 17 and 18). A patch-based, locally adaptive method is therefore proposed for learning the reconstruction dictionary matrix Φ. Given a number of example face images of the current person, all overlapping patches are extracted. The collected patches are then grouped by K-means clustering into clusters of similar geometric structure, and each cluster is modeled by learning a compact sub-dictionary. Specifically, for a cluster i containing n_i image patches to be coded, the patch vectors are stored as a matrix x_i. Principal component analysis (PCA) is then applied to x_i to learn the adaptive sub-dictionary matrix Φ_i most relevant to x_i. PCA yields a sub-dictionary matrix Φ_i whose atoms are the eigenvectors of the covariance matrix of x_i.
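The patch extraction, K-means clustering, and per-cluster PCA steps described above can be sketched as follows. This is a minimal illustration rather than the inventors' implementation: the function names (`extract_patches`, `kmeans`, `learn_sub_dictionaries`), the patch size, the stride, the number of clusters, the number of atoms kept per cluster, and the small hand-written K-means are all assumptions made for brevity, and random arrays stand in for the example face images.

```python
import numpy as np

def extract_patches(img, size=8, stride=4):
    """Collect overlapping size x size patches as flattened row vectors."""
    h, w = img.shape
    rows = [img[r:r + size, c:c + size].ravel()
            for r in range(0, h - size + 1, stride)
            for c in range(0, w - size + 1, stride)]
    return np.asarray(rows, dtype=float)

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd iterations; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def learn_sub_dictionaries(X, labels, k, n_atoms=32):
    """PCA per cluster: atoms are the leading eigenvectors of the covariance."""
    subs = []
    for j in range(k):
        Xj = X[labels == j]
        if len(Xj) < 2:                      # too few patches to estimate a covariance
            continue
        cov = np.cov(Xj, rowvar=False)
        _, vecs = np.linalg.eigh(cov)        # eigenvalues come back in ascending order
        subs.append(vecs[:, ::-1][:, :n_atoms])
    return subs

rng = np.random.default_rng(1)
faces = [rng.random((32, 32)) for _ in range(3)]   # stand-ins for example face images
X = np.vstack([extract_patches(f) for f in faces])
labels = kmeans(X, k=4)
subs = learn_sub_dictionaries(X, labels, k=4)
Phi = np.hstack(subs)                              # concatenation of all sub-dictionaries
```

Each sub-dictionary has orthonormal columns and is not overcomplete on its own; the concatenated `Phi` becomes overcomplete only once the total number of atoms exceeds the patch dimension (64 in this sketch).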

Because the patches within one cluster are similar to one another, each sub-dictionary need not be overcomplete; all the sub-dictionaries are, however, combined to build a large overcomplete dictionary that characterizes all possible local structures of the face (see, for example, Non-Patent Document 19).

1-4-1-2. Dictionary learning for face beautification.
Although faces are objects with great diversity, they are all composed of a few basic features such as the eyes, eyebrows, nose, and mouth. The literature has shown the importance of facial features in judging beauty (see, for example, Non-Patent Document 16). For face beautification, a feature-based dictionary learning approach is taken. Given a frontal face as input, a set of facial landmarks is first identified (see, for example, Non-Patent Document 21). Then, according to the landmark positions, six classes of facial features are extracted: the left eye, right eye, left eyebrow, right eyebrow, nose, and mouth. In a practical implementation, the mouth is not beautified, because it usually moves quickly and frequently during a video conference. A sub-dictionary is then constructed for each feature.

Specifically, for a feature vector x_i of the current face, a sufficient number of example features are collected from the beautiful-face data set as training samples, forming the feature matrix A_i given by the following equation.

Light normalization is applied to the training samples to cancel the effects of different illumination conditions. The inventors propose the following model for light normalization.
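The normalization formula itself is not reproduced in this text. One common normalization that uses only the minimum and maximum operators is min-max rescaling, and the sketch below assumes that form purely for illustration; the function name `normalize_light` and the guard value `eps` are hypothetical, and the inventors' actual model may differ.

```python
import numpy as np

def normalize_light(sample, eps=1e-12):
    """Rescale intensities linearly so the smallest maps to 0 and the largest to 1.

    Assumed min-max form; not confirmed by the source text.
    """
    sample = np.asarray(sample, dtype=float)
    lo, hi = sample.min(), sample.max()
    return (sample - lo) / max(hi - lo, eps)   # eps avoids division by zero on flat input

dark = np.array([10.0, 20.0, 30.0])
bright = dark * 3.0 + 50.0    # the same feature under a different gain and offset
```

Under this assumed form, `normalize_light(dark)` and `normalize_light(bright)` coincide, which is the kind of invariance to illumination gain and offset that the normalization step is meant to provide.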

Here, min() and max() denote the minimum and maximum operators, respectively. After light normalization, the training samples themselves are used as the basic elements (atoms) of the beautification dictionary matrix Ψ in the dictionary memory 32.
1-4-2. Joint sparse coding.
In this embodiment, the overall objective is achieved by finding the sparse code vectors α and β with respect to the two learned dictionary matrices Φ and Ψ, together with the linear transformation matrix L, through minimization of the following energy function.

Here, the first two terms of the objective function constitute standard sparse coding for reconstructing the face vector x with respect to the person-specific dictionary matrix Φ. The third and fourth terms perform sparse coding to represent the reconstructed face image vector Φα with respect to the beautiful-face dictionary matrix Ψ for the purpose of face beautification. The relationship between the two sparse code vectors α and β is established through the linear transformation matrix L, which represents the transformation from the reconstructed face to the synthesized beautiful face. By constraining L, the degree and type of beautification can be limited so that the recognizability criterion is satisfied.
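The energy function is not reproduced in this text. A four-term form consistent with the description above (a reconstruction term and an l1 penalty on α, followed by a beautification term and an l1 penalty on β, coupled through L) would be the following. This is an inferred sketch, not the equation as filed: the exact norms, the placement of L, the weights, and any mask restricting the first term to the available (non-hole) pixels are assumptions.

```latex
\min_{\alpha,\,\beta,\,L \in \mathcal{L}_s}\;
  \| x - \Phi\alpha \|_2^2 + \lambda_1 \|\alpha\|_1
  + \| L\,\Phi\alpha - \Psi\beta \|_2^2 + \lambda_2 \|\beta\|_1
```

Here \mathcal{L}_s stands for the restricted set of admissible linear transformations, and λ_1, λ_2 are the weights on the two sparsity penalties.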

The objective function is not jointly convex in the sparse code vectors α and β and the linear transformation matrix L, but it is convex in each of these variables when the others are held fixed. The variables are therefore optimized by an alternating procedure that is applied iteratively. Specifically, to handle the objective function in equation (2), it is decomposed into three sub-problems:
(1) sparse coding for the current face,
(2) sparse coding for the beautiful face, and
(3) updating the mapping function.
This procedure is repeated until convergence or until a predetermined maximum number of iterations T is reached. The initialization, the three sub-problems, and their optimal solutions are described below.
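The equations for the individual sub-problems are not reproduced in this text. Assuming the objective has the four-term form \| x - \Phi\alpha \|_2^2 + \lambda_1\|\alpha\|_1 + \| L\Phi\alpha - \Psi\beta \|_2^2 + \lambda_2\|\beta\|_1 (an inference from the verbal description, not confirmed by the source), one pass of the alternation, in the order in which the sub-problems are presented in the subsections that follow, could be written as below. The iteration superscript (t) is notation introduced here for illustration.

```latex
\begin{align*}
\beta^{(t+1)}  &= \arg\min_{\beta}\;
   \| L^{(t)}\Phi\alpha^{(t)} - \Psi\beta \|_2^2 + \lambda_2 \|\beta\|_1, \\
L^{(t+1)}      &= \arg\min_{L \in \mathcal{L}_s}\;
   \| L\,\Phi\alpha^{(t)} - \Psi\beta^{(t+1)} \|_2^2, \\
\alpha^{(t+1)} &= \arg\min_{\alpha}\;
   \| x - \Phi\alpha \|_2^2
   + \| L^{(t+1)}\Phi\alpha - \Psi\beta^{(t+1)} \|_2^2
   + \lambda_1 \|\alpha\|_1 .
\end{align*}
```

Each step only ever decreases (or leaves unchanged) the assumed objective, which is why stopping at convergence or after T passes is a natural termination rule.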

1-4-2-1. Initialization.
The initial sparse-representation coefficient vector α is obtained by solving the following problem.

Here, λ_1 is a Lagrange multiplier, and the optimal coefficient vector α can be found effectively and efficiently with a fast l1-minimization algorithm known as the augmented Lagrangian method (ALM) (see, for example, Non-Patent Document 20). The linear transformation matrix L is likewise initialized, as a scalar matrix, so as to form a global linear combination of the current facial features and the stars' facial features. Specifically, to limit the level of beautification, L is initialized as L = 0.4I, where I is the identity matrix.
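The initialization can be illustrated with a small l1-regularized least-squares solver. Note that the sketch below deliberately swaps in ISTA (iterative soft-thresholding), a simpler proximal-gradient method, in place of the ALM solver named in the text; the function names, the synthetic dictionary, the value of `lam`, and the 0.5 factor on the data term (which only rescales the weight) are assumptions for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, y, lam, iters=500):
    """Minimise 0.5*||y - D a||_2^2 + lam*||a||_1 by proximal gradient steps."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2     # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = D.T @ (D @ a - y)
        a = soft_threshold(a - step * grad, step * lam)
    return a

rng = np.random.default_rng(0)
Phi = rng.standard_normal((64, 128))           # stand-in for the learned dictionary
Phi /= np.linalg.norm(Phi, axis=0)             # unit-norm atoms
alpha_true = np.zeros(128)
alpha_true[[3, 40, 99]] = [1.5, -2.0, 1.0]
x = Phi @ alpha_true                           # synthetic face vector

alpha0 = ista(Phi, x, lam=0.05)                # initial sparse code for alpha
L0 = 0.4 * np.eye(Phi.shape[0])                # L = 0.4 I, as stated in the text
```

Any l1 solver could be substituted here; ISTA is used only because it fits in a few lines.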

1-4-2-2. Optimization with respect to β.
Using the initialized matrix L, a diagonal matrix that performs the linear transformation, and the coefficient vector α derived above, the optimization problem for the coefficient vector β is expressed by the following equation.

Here, λ_2 is a Lagrange multiplier. The sparse-representation coefficient vector β can then be obtained with ALM, in the same way as for the coefficient vector α.

1-4-2-3. Optimization with respect to L.
Using the coefficient vectors α and β derived above, the linear transformation matrix L is updated by solving the following minimization problem.

In a practical implementation, L is restricted to simple linear transformations, such as rotation, scaling, and shift, which constitute a set Ls. The problem of optimizing L then reduces to a simple search over the transformations so defined. In this way, the degree and type of beautification are constrained, and recognizability is preserved during face beautification.
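The search over a restricted set of transformations can be sketched as an exhaustive comparison of candidates. The sketch assumes that the quantity being minimized is the squared distance between the transformed reconstruction LΦα and the beautified representation Ψβ, which is inferred from the description rather than taken from the missing equation. Only scaling candidates are shown; rotations and shifts of the feature image would be added to the candidate list in the same way. The function name `best_transform` and all numeric values are hypothetical.

```python
import numpy as np

def best_transform(candidates, recon, target):
    """Return the candidate L minimising ||L @ recon - target||_2^2 and its error."""
    errors = [float(np.sum((L @ recon - target) ** 2)) for L in candidates]
    k = int(np.argmin(errors))
    return candidates[k], errors[k]

n = 4
recon = np.array([1.0, 2.0, 3.0, 4.0])          # stands in for Phi @ alpha
target = 0.6 * recon                            # stands in for Psi @ beta
scales = [0.2, 0.4, 0.6, 0.8]
candidates = [s * np.eye(n) for s in scales]    # scaling-only candidate set
L_best, err = best_transform(candidates, recon, target)
```

Because the candidate set is small and fixed in advance, the update costs one matrix-vector product per candidate, which keeps this step compatible with real-time use.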

1-4-2-4. Optimization with respect to the code vector α.
With the coefficient vector β and the linear transformation matrix L fixed, the sub-problem for the code vector α is expressed by the following equation.

After simple algebraic manipulation and removal of constant terms, this objective function can be reformulated as a standard sparse coding problem, as in the following equation.
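The reformulated equation is not reproduced in this text. Assuming the α sub-problem consists of the two quadratic terms \| x - \Phi\alpha \|_2^2 and \| L\Phi\alpha - \Psi\beta \|_2^2 plus the penalty \lambda_1\|\alpha\|_1 (an inference from the description, with the term \lambda_2\|\beta\|_1 dropped because it is constant in α), the algebra amounts to stacking the two residuals into one. The symbols \tilde{x} and \tilde{\Phi} are introduced here for illustration only.

```latex
\begin{align*}
\| x - \Phi\alpha \|_2^2 + \| L\Phi\alpha - \Psi\beta \|_2^2 + \lambda_1 \|\alpha\|_1
 &= \left\| \begin{bmatrix} x \\ \Psi\beta \end{bmatrix}
          - \begin{bmatrix} \Phi \\ L\Phi \end{bmatrix} \alpha \right\|_2^2
    + \lambda_1 \|\alpha\|_1 \\
 &= \| \tilde{x} - \tilde{\Phi}\alpha \|_2^2 + \lambda_1 \|\alpha\|_1 ,
\end{align*}
\qquad
\tilde{x} = \begin{bmatrix} x \\ \Psi\beta \end{bmatrix}, \quad
\tilde{\Phi} = \begin{bmatrix} \Phi \\ L\Phi \end{bmatrix}.
```

The identity holds because the squared norm of a stacked vector is the sum of the squared norms of its blocks, so the same l1 solver used for the initialization applies unchanged to the stacked dictionary.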

This problem can also be solved efficiently by the ALM algorithm. The image processing unit 15 then generates and outputs the output face image data y using the following equation.

1-6. Conclusion.
Gaze mismatch is a known problem in video conferencing. This embodiment proposes to perform hole filling of DIBR-synthesized gaze-corrected images and face beautification within one unified sparse coding framework. Assuming the availability of a large corpus, a reasonable assumption in the big-data era, the gist of this embodiment is to learn two offline dictionary matrices Φ and Ψ, one for the target human subject and the other for generating beautiful faces. As a result, two sparse code vectors can be computed in real time: one describes the pixels synthesized from the camera-captured images, and the other describes approximating features of beautiful faces. The linear transformation that maps one reconstruction to the other is carefully chosen to ensure that the video conference participant remains recognizable. According to the inventors' experimental results, naturally rendered face images with improved attractiveness were obtained.

Embodiment 2.
2-1. Preface
This embodiment is an improved and extended version of Embodiment 1. The main improvements are as follows.
(1) Instead of the constrained linear transformation, a more general matching criterion between the first and second code vectors, based on distance in feature space, is introduced to ensure that the video conference participant remains recognizable.
(2) For added attractiveness, a new sparse neighbor selection procedure is added to modify the participant's facial contour.

In the approach of this embodiment, the necessary dictionaries are learned offline from an available corpus of images, so that the only computation during the video conference is that of the appropriate sparse code vectors. As described later, this is considered feasible in real time for dictionaries of moderate size using state-of-the-art l_0-norm optimization tools.

2-2. Related art, etc.
2-2-1. Face beautification
It has been shown that machine learning techniques such as support vector regression (SVR) can predict the beauty scores that human raters assign to human faces (see, for example, Non-Patent Document 16), yet a precise technical definition of human facial beauty remains elusive. Non-Patent Documents 9 and 23 use a similar SVR technique to determine an appropriate transformation that changes a face into a more attractive one. As an alternative to SVR, a newer method called interactive evolutionary computing (IEC), based on genetic algorithms, has been used. Instead of face beautification as such, Non-Patent Document 27, for example, devised an algorithm that applies makeup effects to female faces to enhance attractiveness.

Compared with the related techniques of Non-Patent Documents 9 and 23, this embodiment differs in the following two main respects.
(1) First, the vast databases of images on the Internet are exploited to construct two dictionary matrices Φ and Ψ offline: Φ targets the face of the video conference participant, and Ψ relates to beautiful faces. For the latter, the beautiful faces of, for example, Chinese, Japanese, and Korean television and movie stars in their twenties to forties are used as training data. This means that beauty scores need not be estimated in real time during the actual video conference; only two sparse code vectors need to be identified, so that the pixels of the rendered face image are mapped to beautiful features in the second dictionary matrix Ψ.
(2) Second, the purpose of face beautification in this embodiment is not to process a single image but to process video that is encoded and transmitted in real time during a video conference. An overly complex beautification process that is unsuitable for real-time implementation is therefore unsuitable for this application. An efficient implementation of the sparse code vector search is described in detail later.

2-2-2. Sparse representation
Sparse representation means the decomposition of a signal into a sparse linear combination of atoms from an overcomplete dictionary such as the dictionary matrices Φ and Ψ. Let D be the dictionary matrix given by the following equation.

Here, n < K, and each vector d_i is an atom of the dictionary matrix D. A signal vector x (x ∈ R^n) can be expressed as a linear combination of atoms of D plus some perturbation ε, that is, by the following equation.

Here, the representation of formula (8) can be said to be "sparse" if the coefficient vector has only a few nonzero entries and, at the same time, the perturbation ε remains small.
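The expressions referred to in this subsection do not appear in the present text. Assuming the conventional notation of sparse coding, with the coefficient vector written as α (a symbol chosen here for illustration) and an l_2 bound on the perturbation (also an assumption), they can be sketched as:

```latex
\begin{align*}
D &= [\,d_1, d_2, \ldots, d_K\,] \in \mathbb{R}^{n \times K}, \quad n < K, \\
x &= D\alpha + \varepsilon, \quad \alpha \in \mathbb{R}^{K}, \tag{8} \\
&\|\alpha\|_0 \ll K \quad \text{and} \quad \|\varepsilon\|_2 \ \text{small}.
\end{align*}
```

Only the number (8) is confirmed by the text; the other two lines are left unnumbered.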

The dictionary matrix plays an important role in sparse representation, and dictionary learning has attracted strong interest in the prior art literature, where machine learning techniques are used to adapt an overcomplete dictionary to training data. Examples include the K-SVD method (see, for example, Non-Patent Document 17), supervised dictionary learning (see, for example, Non-Patent Document 29), and online learning of large-scale dictionaries (see, for example, Non-Patent Document 18). These methods mainly focus on training an overcomplete dictionary in a single feature space. More recently, some researchers have proposed learning dictionaries from coupled sparse feature spaces. For example, Yang et al. (see, for example, Non-Patent Document 31) proposed a joint dictionary learning method that learns two dictionaries, one for the high-resolution (HR) and one for the low-resolution (LR) image patch space, and forces each pair of HR and LR patches to have the same sparse representation. This method and its successors (see, for example, Non-Patent Document 32) require the two dictionaries used for sparse representation to be coupled or semi-coupled. That is, the atoms of the two dictionaries, which come from the observation space and the target space, must share the same content while differing in form, as in LR-HR patch pairs or photo-sketch patch pairs. When a new observed image is input, the sparse code with respect to the observation dictionary is obtained first, and that sparse code is then mapped to the target space on the basis of an identity or linear relationship to obtain the final result.

In contrast, the method of the present embodiment is not subject to these restrictions. Two uncoupled dictionaries are constructed in order to exploit both the modeling capability and the discriminative capability of sparse representation. The first dictionary matrix Φ is used for face reconstruction and is learned from photographs of the specific face of the current video conference subject. The second dictionary matrix Ψ is used for face beautification and is learned from a general set of beautiful faces. Notably, in the present embodiment both dictionary matrices Φ and Ψ come from the target space, and there is no direct relationship between the two sparse codes. Instead, a feature-space constraint is used to establish the relationship between the reconstructed patch and the beautified patch. These features clearly distinguish the present embodiment from coupled dictionary learning according to the prior art.

2-3. System configuration
FIG. 2 is a block diagram showing the configuration of an interactive apparatus provided with an image processing unit 15A according to Embodiment 2 of the present invention. Compared with the interactive apparatus of FIG. 1, the interactive apparatus of FIG. 2 includes image processing units 15A and 19 in place of the image processing unit 15, and the image processing unit 19 includes a face contour adjustment unit 23, a face contour memory 33 connected to it, and an image warping processing unit 24. As described in detail later, the image processing in the image processing unit 15A differs slightly from that in the image processing unit 15: the processing of the reconstruction processing unit 21 and the face beautification processing unit 22 is performed jointly in the combined image processing unit 21, so that image processing is carried out substantially in real time.

With the goals of low cost and ease of deployment, the interactive apparatus of the present embodiment employs a single depth-sensing camera 11A, such as the Microsoft Kinect, which can capture a texture (color) image and a depth image simultaneously and is placed at the bottom of the display 18. Image data captured by the camera 11A are input to the image processing unit 15A via the image data acquisition unit 13, which has a temporary memory 13m, and the line-of-sight correction processing unit 14, both of which operate as in Embodiment 1.

It is assumed that, before a video conference session starts, a large image set of frontal images of beautiful people with various facial expressions is available for offline dictionary learning. The two dictionary matrices Φ and Ψ in the dictionary memories 31 and 32 are learned as follows. The dictionaries in the dictionary memories 31 and 32 operate first on a per-pixel-patch basis, for image completion, and then on a per-facial-component basis (for example, eyes and eyebrows), for beautification of the facial components. It is difficult to define the concept of human facial beauty technically. In the present embodiment, therefore, publicly available face images of men and women in their twenties to forties, for example Chinese, Japanese and Korean television and movie stars, are simply collected via the Internet. Facial components of these images are used for learning the two dictionary matrices Φ and Ψ. Through the reconstruction processing by the reconstruction processing unit 21 and the face beautification processing by the face beautification processing unit 22, facial components such as the eyes and eyebrows are reconstructed so as to be beautified.

During the video conference, each pair of a texture image and a depth image captured at the same instant is then processed as follows. First, foreground/background segmentation is performed by thresholding the depth image: pixels whose depth is smaller than a preselected value τ are regarded as foreground. Second, DIBR-based line-of-sight correction is applied to the image data of the pixels identified by the segmentation, with the center of the display 18 as the desired virtual viewpoint. Given the virtual viewpoint, texture pixels from the captured camera viewpoint can be mapped to the virtual view by using the corresponding available depth pixels. The intrinsic and extrinsic matrices required for this DIBR mapping can be derived for the camera 11A in the standard manner. Because of self-occlusion and missing depth pixels in the captured depth image, the foreground pixels synthesized by DIBR contain holes that need to be filled. Instead of using a general hole-filling (inpainting) technique such as those disclosed in Non-Patent Documents 6, 9 and 10, the present embodiment performs hole filling and facial component beautification jointly in order to reconstruct a visually pleasing face. Next, the face contour adjustment unit 23 optionally adjusts the face contour by a sparse neighbor selection procedure using the face contour memory 33. The image warping processing unit 24 then repositions and warps the reconstructed, line-of-sight-corrected face so that it replaces the gaze-mismatched face in the original captured image, which keeps the original background. Because the background of the captured image is reused, there is no need to complete background regions that become disoccluded in the virtual view. The face image data output from the image warping processing unit 24 are passed to the image communication unit 16, and the subsequent processing is the same as in Embodiment 1.
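The depth-thresholding step of the foreground/background segmentation can be sketched as follows. This is a minimal illustration, assuming the depth image is a NumPy array in arbitrary units; the function name `segment_foreground` and the treatment of a zero reading as "missing depth" are assumptions made for the example and are not stated in the text.

```python
import numpy as np

def segment_foreground(depth, tau):
    """Return a boolean mask that is True where a pixel is foreground.

    A pixel is foreground when its depth is smaller than the preselected
    value tau. Zero readings are treated as missing depth and excluded
    (an assumption of this sketch, not a statement from the source).
    """
    depth = np.asarray(depth, dtype=float)
    valid = depth > 0
    return valid & (depth < tau)

# Hypothetical 2x3 depth map: near values belong to the speaker
depth = np.array([[0.8, 0.9, 3.0],
                  [0.0, 1.0, 2.5]])
mask = segment_foreground(depth, tau=1.5)
```

Only the pixels selected by `mask` would then be passed on to the DIBR-based line-of-sight correction.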

The key technical contribution of the present embodiment is the reconstruction of the face image data after line-of-sight correction together with the beautification of facial components, carried out through a unified dual sparse coding framework. Given the two dictionary matrices Φ and Ψ learned offline, the two sparse code vectors α and β for Φ and Ψ are searched for jointly. Here, Φ is used to describe the pixels of the observed DIBR-synthesized face image, and Ψ is used to increase the proximity to beautiful facial components.

FIG. 3 is a diagram for explaining the search for the dual code vectors executed in the image processing unit 15A of FIG. 2; it illustrates the joint optimization geometrically. As shown in FIG. 3(a), given the observation vector x, the face reconstruction vector Φα must lie close to x in the l_2-norm sense, that is, within the black circle 41. In addition, the face beautification vector Ψβ must lie close to the vector Φα in terms of feature-space distance (region 42). If the two problems are solved separately, the search region for the sparse code vector β is the single region 42 determined by one given sparse code vector α, rather than the union of all regions 42 whose centers lie within the circle 41. As shown in FIG. 3(b), one irregularly shaped region 42 can be approximated by a quadratic function.

In FIG. 3, given an incomplete and noise-corrupted observation vector x, a sparse face reconstruction close to x in the l_2-norm sense is sought; in a two-dimensional space this neighborhood is a circle. At the same time, a feature-space distance constraint S is imposed to limit the amount of face beautification, so that the recognizability of the video conference subject is kept at a desired level. Mathematically, this means that, for a given α, the face beautification vector Ψβ must lie close to the reconstruction vector Φα in the defined feature space, that is, within the region 42.

In the present embodiment, disocclusion hole filling and facial component beautification are performed jointly, so that an overall objective function can be minimized and the optimal pair of code vectors α and β can be identified. The objective function takes into account the fidelity of the hole filling, the amount of face beautification, and the sparsity of both vectors. Geometrically, the search space for this pair of vectors consists of the entire circle 41 for the code vector α and the union of all regions 42 whose centers lie within that circle for the code vector β. In contrast, when hole filling and face beautification are performed separately, α is solved for first and then fixed, and only the single region 42 centered on the reconstruction vector Φα is available as the search space for β, which is far more restrictive. By performing the joint optimization, the present embodiment can reconstruct facial components with a higher beauty score than separate optimization can.

2-4. Joint face reconstruction and face beautification using dual sparse representation
The processing of the reconstruction processing unit 21 and the face beautification processing unit 22 is described in detail below. It is proposed that a successful algorithm must satisfy the following three criteria.

(1) Fidelity: the holes in the DIBR-synthesized face image must be filled satisfactorily so that a natural human face is obtained.
(2) Attractiveness: the rendered face must have beautiful facial components that enhance its attractiveness.
(3) Recognizability: only subtle modifications may be made to the original face, so that the subject remains unmistakably recognizable.

The reconstruction processing unit 21 and the face beautification processing unit 22 of FIG. 1 are applied so as to satisfy the first and second criteria, and the third criterion can be satisfied by introducing an appropriate feature-space constraint that limits the amount of face beautification in the face beautification processing unit 22.

2-4-1. Typical image alignment (registration)
FIG. 4 is a diagram showing an example of the landmarks used in the typical image alignment executed in the image processing unit 15A of FIG. 2. FIG. 4(b) shows a frontal face image from which 103 landmarks were detected, and FIG. 4(a) shows a two-dimensional plot of those landmarks.

Every face, whether a test image or a training image, is annotated with a set of landmarks so that the contours of all facial components can be identified accurately. As shown in FIG. 4, 103 landmarks are identified using, for example, the procedure described in Non-Patent Document 21. To avoid the effects of scale, shift and rotation, each exemplar face image in the training set is aligned to a predetermined default plane.

2-4-2. Sparse representation in face reconstruction
Because of self-occlusion, pixels are missing in the DIBR-synthesized face image, and an effective face reconstruction procedure is needed to fill them well. The goal of face reconstruction is to obtain a high-quality face vector y from its degraded observation vector x, which is modeled by the following equation.

Here, H is a degradation matrix indicating the positions of the holes, and v is a vector modeling the synthesis error of the DIBR-based processing, which arises from rounding to the two-dimensional pixel grid after projection and from inaccuracies in the captured depth pixels.

From a statistical point of view, reconstructing a face from its degraded form is inherently an ill-posed inverse problem. The performance of a face reconstruction algorithm depends mainly on how well prior knowledge can be exploited when the problem is solved numerically. One common technique for incorporating prior knowledge about images is the so-called sparse model, in which the face image is approximated by a sparse linear combination of elements of a suitably chosen dictionary matrix Φ, as in the following equation.

Here, ε is the approximation error. To account jointly for the degradation matrix H that generates the observation vector x and for the prior that the image vector y is sparsely represented over the dictionary matrix Φ, the face reconstruction problem is formulated as the following equation.
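The three relations of this subsection do not appear in the present text. From the definitions of H, v, Φ, ε and λ given around them, they plausibly take the following form; the l_1 penalty is an assumption (an l_0 penalty would be equally consistent with the wording), and the equation numbers are omitted because they cannot be confirmed.

```latex
\begin{align*}
x &= Hy + v, \\
y &= \Phi\alpha + \varepsilon, \\
\hat{\alpha} &= \arg\min_{\alpha} \; \|x - H\Phi\alpha\|_2^2 + \lambda \|\alpha\|_1 ,
\qquad \hat{y} = \Phi\hat{\alpha}.
\end{align*}
```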

Here, λ is a Lagrange multiplier.
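The sparse hole-filling formulation can be illustrated numerically. The text does not state which solver is used at this point; the sketch below uses ISTA (iterative soft-thresholding), a standard method for l_1-regularized least squares, and assumes that H is a diagonal 0/1 mask. The objective carries a factor 0.5 for convenience, and all function names and the toy data are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Element-wise shrinkage operator used by ISTA."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(x, mask, Phi, a, lam):
    """0.5 * ||H(x - Phi a)||_2^2 + lam * ||a||_1 with H = diag(mask)."""
    r = mask * (x - Phi @ a)
    return 0.5 * float(r @ r) + lam * float(np.abs(a).sum())

def ista_fill(x, mask, Phi, lam=0.01, n_iter=300):
    """Estimate a sparse code from the observed pixels and return Phi @ a."""
    A = Phi * mask[:, None]                  # H Phi: rows of hole pixels are zeroed
    b = x * mask                             # observed pixels only
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ a - b)             # gradient of the quadratic term
        a = soft_threshold(a - step * grad, lam * step)
    return Phi @ a, a

# Toy data (hypothetical): a 32-pixel patch that is 3-sparse over a random dictionary
rng = np.random.default_rng(0)
Phi = rng.standard_normal((32, 64))
Phi /= np.linalg.norm(Phi, axis=0)           # unit-norm atoms
a_true = np.zeros(64)
a_true[[3, 17, 40]] = [1.5, -2.0, 1.0]
y = Phi @ a_true
mask = np.ones(32)
mask[[2, 9, 15, 20, 27, 30]] = 0.0           # 1 = observed pixel, 0 = hole
y_hat, a_hat = ista_fill(y * mask, mask, Phi)
```

The hole pixels of `y_hat` are filled in by the dictionary atoms that best explain the observed pixels; how closely they match `y` depends on the dictionary, the mask and `lam`.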

The key question in the sparse reconstruction above is how to find the space, spanned by the atoms of a dictionary matrix, in which face images exhibit high sparsity. Face reconstruction can be regarded as an image reconstruction problem, and Non-Patent Documents 17 and 18, for example, show that image reconstruction using local patches is widely used and highly effective. Because face images are non-stationary, with statistics and sparse representations that vary spatially, the present embodiment uses a sparsity-based reconstruction method with adaptive patch-level dictionaries that learn and represent local statistics. All overlapping patches are extracted from the given example face images of the video conference participant. Although facial features vary considerably across the image plane, it was found that the micro-structures of a face can be represented by a small number of structural primitives, which are qualitatively similar in form to the receptive fields of simple cells. The collected patches are therefore clustered, using the K-means clustering method (see, for example, Non-Patent Document 19), into a plurality of clusters with similar geometric structure, and each cluster is modeled by learning a compact sub-dictionary. Specifically, for a cluster i containing n_i image patches, the patch vectors are stacked into a matrix denoted Y_i. Principal component analysis (PCA) is then applied to Y_i to learn the adaptive sub-dictionary matrix Φ_i that is most relevant to Y_i. PCA yields Φ_i, whose atoms are the eigenvectors of the covariance matrix of Y_i.
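The clustering and per-cluster PCA described above can be sketched as follows. This is a minimal illustration with a small hand-written K-means loop so that only NumPy is needed; the function names, the number of clusters and the random stand-in patches are assumptions made for the example, and patches are stored as rows rather than columns.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd iterations; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):                  # keep the old centre if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels

def pca_subdictionary(Y):
    """Atoms (columns) are eigenvectors of the covariance of the patches in Y."""
    Yc = Y - Y.mean(axis=0)
    cov = Yc.T @ Yc / max(len(Y) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)    # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # strongest component first
    return eigvecs[:, order]

def learn_dictionary(patches, k):
    """Concatenate the per-cluster sub-dictionaries into one dictionary."""
    labels = kmeans(patches, k)
    subs = [pca_subdictionary(patches[labels == c])
            for c in range(k) if np.any(labels == c)]
    return np.hstack(subs)

# Hypothetical stand-in for vectorised 4x4 face patches
rng = np.random.default_rng(1)
patches = rng.standard_normal((200, 16))
Phi = learn_dictionary(patches, k=4)
sub = pca_subdictionary(patches)
```

Each sub-dictionary is square and orthonormal, so it is not overcomplete on its own; the concatenated `Phi` has more columns than rows as soon as more than one cluster is non-empty.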

Because the patches within a cluster are similar to one another, the sub-dictionary matrix Φ_i does not need to be overcomplete. All the sub-dictionary matrices Φ_i are, however, concatenated to build a large overcomplete dictionary matrix Φ that characterizes all possible local structures of the face.

2-4-3. Sparse representation in face beautification
The structure of a face image can be interpreted as a combination of the contours of the facial components and smooth regions (see, for example, Non-Patent Document 28). The face beautification processing of the face beautification processing unit 22 comprises the following three parts.
(1) Facial component beautification (for example, eye enlargement),
(2) face contour adjustment (for example, adjustment of the face shape), and
(3) skin smoothing.

Face reconstruction already achieves a skin-smoothing effect to some degree, because it is based on overlapping patches. Facial component beautification by the face beautification processing unit 22 and face contour adjustment by the face contour adjustment unit 23 are described below.

Facial component beautification is described first. Although faces are objects with great diversity, they are composed of a few basic components such as the eyes, eyebrows, nose and mouth. Previous studies have shown the importance of facial components in judging beauty (see, for example, Non-Patent Document 16). When the reconstructed face is input, four classes of facial components, namely the left eye, right eye, left eyebrow and right eyebrow, are extracted according to the landmark positions. One sub-dictionary is then constructed separately for each component as follows.

From an aesthetic point of view, the beauty of a facial component is reflected in its shape and size, which are described by the positions of its landmarks. It is therefore assumed that beautiful facial components have a more beautiful landmark distribution. Accordingly, the present embodiment adopts a parameter-based dictionary learning approach to facial component beautification. Specifically, for the current facial component x_i, a sufficient number of exemplar components X_i are collected from the beautiful face data set as training samples, where X_i is given by the following equation.

The landmark positions are extracted from both the test sample and the training samples; they are denoted by v_i for the test sample and by a corresponding set of vectors for the training samples. The position parameters of the beautiful training components are used as the basic elements of the face beautification dictionary matrix Ψ_i, which is given by the following equation.

Face beautification amounts to approaching the atoms (vectors) of the face beautification dictionary matrix Ψ_i: the closer the reconstructed component is to its nearest atoms, the more beautiful the reconstruction becomes. Mathematically, the optimal reconstructed landmark distribution β_i^* is obtained by the following equation.

Here, β_i^(j) denotes the j-th element of the landmark distribution β_i. The regularization term ensures that the closest training samples are the ones used for the beautifying reconstruction. The positions of the beautified landmarks can then be expressed by the following equation.

The landmark distribution computed above is then used to warp the input component into its beautified version, as in the following equation.

Here, W(·,·) is the warping operator, and Φ_i α_i is the reconstructed facial component corresponding to the component x_i.

Another challenge in the present embodiment is to achieve face beautification by introducing only minute, subtle modifications to the original face, so that the resulting beautified face remains robustly and unmistakably recognizable as the original. In the present embodiment, the recognizability of each facial component is considered separately, and the recognizability constraint is written as follows.

Here, S(·) is a function that maps a vector in the pixel domain to a vector in a selected feature space, for example by using the scale-invariant feature transform (SIFT) (see, for example, Non-Patent Document 34). SIFT is invariant, to a certain degree, to scaling, rotation and translation, small amounts of which do not affect recognizability. γ is a constraint parameter that specifies the desired degree of recognizability. In other words, formula (15) states that the l_2-norm of the difference between the feature vectors of a patch before and after face beautification must not be too large. Using the l_2-norm of the feature vector difference as the recognizability criterion is consistent with the object retrieval literature (see, for example, Non-Patent Document 33).
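The check expressed by formula (15), namely that the l_2 distance between the feature vectors of a patch before and after beautification stays within γ, can be sketched as follows. SIFT itself is not reimplemented here: the feature map `orientation_histogram` is a deliberately simple stand-in (a magnitude-weighted histogram of gradient orientations) chosen only so that the example is self-contained, and the function names are hypothetical.

```python
import numpy as np

def orientation_histogram(patch, bins=8):
    """Stand-in for S(.): normalised histogram of gradient orientations."""
    gy, gx = np.gradient(np.asarray(patch, dtype=float))
    mag = np.hypot(gx, gy)                    # gradient magnitude per pixel
    ang = np.arctan2(gy, gx)                  # orientation in [-pi, pi]
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def within_recognizability(before, after, gamma, feature=orientation_histogram):
    """True when ||S(after) - S(before)||_2 <= gamma."""
    return np.linalg.norm(feature(after) - feature(before)) <= gamma

# Hypothetical patches: a horizontal ramp and the same ramp turned by 90 degrees
p = np.tile(np.arange(8.0), (8, 1))
same = within_recognizability(p, p, gamma=0.0)
rotated = within_recognizability(p, p.T, gamma=0.5)
```

With this crude feature map a 90-degree turn moves all gradient energy into a different bin, so `rotated` fails the test, whereas SIFT would be considerably more tolerant of small rotations and shifts.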

With the recognizability constraint, the problems of face reconstruction and facial component beautification can be combined into one unified dual sparse coding framework, as shown in the following equation.

Here, λ_1 and λ_2 are Lagrange multipliers, and μ is a predetermined iteration constant.

The first two terms of the objective function are the usual ones in sparse coding, representing the face vector x_i over the face-specific dictionary matrix Φ_i. The third term establishes the relationship between the two sparse code vectors α_i and β_i. The image processing unit 15A then generates and outputs the output face image data y using the following equation.

How formula (16) can be computed efficiently is described in detail later.

2-4-5. Face contour adjustment
Face contour adjustment is executed by the face contour adjustment unit 23 of FIG. 2; the face contour strongly affects the attractiveness of a human face. In the present embodiment the procedure comprises the following four steps, whose basic idea is as follows.

(1) For both the test face and the beautiful training faces, a planar graph is constructed with the detected landmarks as vertices, and a distance vector whose entries are the edge lengths of the graph is extracted.
(2) For the test face, a modified distance vector corresponding to a more beautiful contour is found using the distance vectors of the beautiful training faces.
(3) From the modified distance vector, a new planar graph is generated whose edge lengths are as close as possible to the modified distances.
(4) The beautified face contour is obtained by warping the test face to the resulting new landmark positions.

In the first step, distances are not computed between all pairs of landmarks as in, for example, Non-Patent Document 23; instead, distances are computed only between pairs of landmarks along the face contour, which are indexed 1 to 25 in FIG. 4. The main issue is how to find the modified distance vector. Non-Patent Document 23, for example, proposes modifying the distance vector by a beauty-weighted average of the K nearest neighbors (KNN) of the face in face space. However, there is no guarantee that a fixed number K of neighbors are in fact the K samples most relevant to the input vector. If fewer than K truly similar neighbors exist, KNN introduces irrelevant samples, which limits the beautification performance. In contrast, the present embodiment proposes an adaptive and flexible sparse neighbor selection method, which resolves this problem by finding the smallest set of closely related training distance vectors needed to reconstruct the modified one.
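The distance vector of the first step can be sketched as follows. The text only says that distances are taken between pairs of contour landmarks; this sketch assumes, for illustration, that the pairs are consecutive landmarks along the contour, so that each entry is one edge length. The function name and the toy coordinates are hypothetical.

```python
import numpy as np

def contour_distance_vector(landmarks, contour_idx):
    """Edge lengths between consecutive landmarks along the face contour.

    landmarks   : (N, 2) array of landmark (x, y) positions
    contour_idx : indices of the contour landmarks, in contour order
    """
    pts = np.asarray(landmarks, dtype=float)[contour_idx]
    return np.linalg.norm(np.diff(pts, axis=0), axis=1)

# Hypothetical three-landmark contour
landmarks = [[0.0, 0.0], [3.0, 4.0], [3.0, 10.0]]
d = contour_distance_vector(landmarks, [0, 1, 2])
```

For the 25 contour landmarks of FIG. 4 this convention would give a 24-entry distance vector; a different pairing scheme would change only how `contour_idx` is turned into pairs.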

Let Θ denote a distance dictionary whose atoms are the distance vectors of the beautiful training faces. The sparse representation of the test distance vector d is then given by the following equation.

Here, the solution of the above equation is the coefficient vector of the sparse representation, and n is the number of atoms in the dictionary matrix. Neighbor selection is then performed according to the following criterion.

Here, N(d) is the neighborhood of d, and δ is a neighbor selection function defined as follows.

Here, τ is a threshold. Once the neighborhood is determined, the weights of all neighbors in N(d) are computed as follows.

Finally, the modified distance vector is derived as follows.
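As a non-limiting illustration, the neighbor selection and weighting steps above can be sketched in Python as follows. The sketch assumes that the sparse coefficients have already been computed by some sparse solver, that an atom is selected as a neighbor when the magnitude of its coefficient exceeds τ, and that the neighbor weights are the selected magnitudes normalized to sum to one. These rules, the function name `modified_distance_vector`, and the example values are assumptions made for illustration and are not taken from the equations of this embodiment.

```python
import numpy as np

def modified_distance_vector(theta, coeffs, tau):
    """Combine the selected dictionary atoms into a modified distance vector.

    theta  : (m, n) distance dictionary, one beautiful-face distance vector per column.
    coeffs : (n,) sparse coefficients of the test distance vector d.
    tau    : threshold used for neighbor selection.

    Assumed rules (hypothetical): atom i is a neighbor when |coeffs[i]| > tau,
    and the neighbor weights are the normalized coefficient magnitudes.
    """
    theta = np.asarray(theta, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    selected = np.abs(coeffs) > tau            # neighborhood N(d)
    if not selected.any():
        raise ValueError("no atom exceeds the threshold tau")
    weights = np.abs(coeffs[selected])
    weights /= weights.sum()                   # weights sum to one
    return theta[:, selected] @ weights, np.flatnonzero(selected)

# Hypothetical dictionary with three atoms of dimension two.
theta = np.array([[1.0, 2.0, 10.0],
                  [1.0, 4.0, 10.0]])
d_new, neighbors = modified_distance_vector(theta, [0.6, 0.2, 0.01], tau=0.1)
```

In this example the third atom is discarded because its coefficient falls below the threshold, so the number of neighbors adapts to the input instead of being fixed in advance as in KNN.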

2-5. Optimization Algorithm
In this embodiment, as formulated in Equation (16), the occlusion-removal hole-filling problem for the human face is solved jointly with the beautification problem for the facial components within a unified sparse coding framework. However, the objective function in Equation (16) is non-convex (a unique global extremum is not guaranteed) and carries a constraint on the sparse code vector β_i', resulting in a constrained non-convex optimization problem that is difficult to solve directly.

The net effect of the third regularization term in Equation (16) is to limit the number of principal training samples used in face beautification; that is, it promotes sparsity when the sparse code vector β_i is constructed. Its effect is thus similar to that of the l_0-norm in sparse coding. The optimization problem of Equation (16) is therefore relaxed as follows.

Here, the subscript i is dropped for brevity, since all facial components share the same objective function. HΦ is written as Φ, and u is a vector of weights u_i that modulate the elements of the sparse code vector β so that the linear constraint in Equation (16) is approximately satisfied. The vector u must be updated because Equation (15) is solved iteratively, as described next.

The above objective function is not jointly convex in the sparse code vectors α and β. Instead of optimizing it directly, an alternating procedure is adopted to solve the problem more effectively. Specifically, to address the objective function of Equation (22), the problem is split into two sub-problems: optimization with respect to the sparse code vector α and optimization with respect to the sparse code vector β. This procedure is repeated until convergence or until a predetermined number of iterations T is reached, whichever comes first. The initialization, the two sub-problems, and their solutions are described below.
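The alternating procedure can be sketched, under stated assumptions, as the following Python loop. The two update callables stand in for the solvers of the two sub-problems, and the toy objective is purely hypothetical (it is not the objective function of this embodiment); `max_iters` plays the role of the iteration limit T, and the convergence test on the change of the objective value is an assumed stopping rule.

```python
def alternating_minimization(update_alpha, update_beta, objective,
                             alpha, max_iters=50, tol=1e-12):
    """Alternate between the two sub-problems until the objective value
    changes by less than tol or max_iters iterations have been run."""
    beta = update_beta(alpha)             # beta sub-problem with the initial alpha fixed
    prev = objective(alpha, beta)
    iters = 0
    for iters in range(1, max_iters + 1):
        alpha = update_alpha(beta)        # alpha sub-problem, beta fixed
        beta = update_beta(alpha)         # beta sub-problem, alpha fixed
        cur = objective(alpha, beta)
        if abs(prev - cur) < tol:         # converged before reaching the limit
            break
        prev = cur
    return alpha, beta, iters

# Hypothetical toy objective whose sub-problems have exact minimizers.
def objective(a, b):
    return (a - b) ** 2 + (a - 3) ** 2 + (b + 1) ** 2

def update_alpha(b):
    return (b + 3) / 2                    # argmin over a for fixed b

def update_beta(a):
    return (a - 1) / 2                    # argmin over b for fixed a

alpha, beta, iters = alternating_minimization(update_alpha, update_beta,
                                              objective, alpha=0.0)
```

For this toy problem the iterates approach a = 5/3 and b = 1/3, the joint minimizer, well before the iteration limit is reached.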

2-5-1. Initialization
The initial sparse code vector (sparse representation coefficients) α is obtained, with the sparse code vector β held fixed, by solving the following problem.

The optimal sparse code vector α can be computed effectively and efficiently by a fast l1-minimization algorithm known as the augmented Lagrangian method (ALM) (see, for example, Non-Patent Document 34).

2-5-2. Optimization with Respect to the Sparse Code Vector β
Using the sparse code vector α derived from Equation (23), the optimization problem for the sparse code vector β is expressed by the following equation.

Here, S() is a general nonlinear mapping, so the face beautification penalty term (the first term) in Equation (24) describes a feasible space whose shape in the pixel domain is irregular, as illustrated by region 42 in FIG. 3. This makes the optimization in Equation (24) difficult to solve.

The face beautification penalty in Equation (24) is therefore approximated by a quadratic penalty term centered on the vector Φα.

FIG. 3 is a diagram illustrating the search for the dual code vectors executed in the image processing unit 15A of FIG. 2. As shown in FIG. 3(b), the irregularly shaped region 42 is approximated by an ellipse (a cross-section of the quadratic penalty function). Specifically, instead of writing the quadratic penalty function with a full matrix, the quadratic penalty term is written as a simpler weighted l_2-norm with a "change of coordinates" via the linear transformation matrix L, as follows.

Here, W is a diagonal matrix whose (i, i) element is the weight w_i.

The "change of coordinates" above relies on Sylvester's law of inertia, which states that, via an appropriate change of basis, a real quadratic form Q in n variables can be transformed into a diagonal form, that is, the following weighted sum.
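To illustrate the change of coordinates numerically, the following Python sketch diagonalizes a small real symmetric matrix and checks that the full quadratic form equals a weighted sum of squared coordinates. It uses an orthogonal eigendecomposition, which is only one admissible change of basis and is an assumption of this sketch; the function name and the example matrix are hypothetical.

```python
import numpy as np

def diagonalize_quadratic_form(Q):
    """Return (L, w) with Q = L.T @ diag(w) @ L for a real symmetric Q,
    so that v @ Q @ v equals sum_i w[i] * (L @ v)[i] ** 2."""
    Q = np.asarray(Q, dtype=float)
    w, V = np.linalg.eigh(Q)      # Q = V @ diag(w) @ V.T with orthogonal V
    return V.T, w

# Hypothetical 2 x 2 quadratic form.
Q = np.array([[2.0, 1.0],
              [1.0, 2.0]])
L, w = diagonalize_quadratic_form(Q)

v = np.array([0.5, -1.5])
full_form = v @ Q @ v                       # quadratic form with the full matrix
weighted_sum = np.sum(w * (L @ v) ** 2)     # weighted sum in the new coordinates
```

Both expressions evaluate to the same number, which is why the full-matrix penalty can be replaced by a diagonal weight matrix W once the coordinates are changed by L.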

The optimization problem can then be reformulated as follows.

Since the sparse code vector α is now fixed and the sparse code vector β is unknown, for computational convenience the vector Φα is instead warped back into the parameter space, giving the following equivalent optimization problem.

Here, W^−1() denotes the inverse warping operator.

The weighted sparsity term of the sparse code vector β, an l_0-norm, is still difficult to handle. Given the approximated quadratic form of the optimization above, iteratively reweighted least squares (IRLS) (see, for example, Non-Patent Document 36) is used to replace the weighted l_0-norm with a sparsity-promoting weighted l_2-norm β^T U β, where U is another diagonal matrix whose (i, i) element is the weight u_i. The optimization can now be written as follows.

Here, U^t is the IRLS weight matrix at iteration t. For a given iteration t, the minimization of Equation (21) is now an unconstrained quadratic programming problem with a closed-form solution, given by the following equation (see, for example, Non-Patent Document 37).

Here, Λ and Ξ are two matrices.

For the weighted l_2-norm β^T U β to promote sparsity, Equation (28) must be solved multiple times, and the weights in the IRLS weight matrix U^t must be adjusted at each iteration t. Specifically, each weight u_i is updated as follows, using the solution β^(t−1) of the previous iteration t−1 and a parameter ε > 0 chosen for numerical stability (for details, see, for example, Non-Patent Document 36).
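The IRLS iteration can be sketched in Python as follows for a simplified problem. The sketch assumes a plain least-squares data term ||y − Ψβ||² in place of the full objective of this embodiment, and assumes the commonly used weight rule u_i = 1 / (β_i² + ε), computed from the previous iterate; the function name, the parameter values, and the example data are hypothetical.

```python
import numpy as np

def irls_sparse_code(Psi, y, mu=0.1, eps=1e-6, iters=30):
    """Approximately minimize ||y - Psi @ beta||^2 + mu * ||beta||_0 by
    iteratively reweighted least squares.

    Assumed weight rule: u_i = 1 / (beta_i ** 2 + eps), taken from the
    previous iterate, so that small entries receive large penalties.
    """
    Psi = np.asarray(Psi, dtype=float)
    y = np.asarray(y, dtype=float)
    n = Psi.shape[1]
    gram = Psi.T @ Psi
    rhs = Psi.T @ y
    u = np.ones(n)                        # first pass is an ordinary ridge penalty
    beta = np.zeros(n)
    for _ in range(iters):
        # Closed-form minimizer of the unconstrained quadratic for fixed weights.
        beta = np.linalg.solve(gram + mu * np.diag(u), rhs)
        u = 1.0 / (beta ** 2 + eps)       # reweighting step
    return beta

# Hypothetical example: identity dictionary, one large and one small target entry.
beta = irls_sparse_code(np.eye(3), [1.0, 0.0, 0.05])
```

In this example the large entry is shrunk only slightly while the small entry is driven close to zero, which is the sparsity-promoting behavior that the reweighting is intended to produce.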

The only remaining issue is how to choose the linear transformation matrix L and the weights w_i so that the approximation in Equation (25) is good. For ease of explanation, with the sparse code vector α fixed, let v denote the vector of deviation from the vector Φα. The linear transformation matrix L changes the coordinate system so that the quadratic penalty function P(v) becomes a simple weighted sum over the individual components, as expressed by the following equation.

Here, ()_i denotes the i-th element of a vector.

Given a fixed vector α_0, suppose that it is perturbed by a small amount to give a vector α_1. The change in the pixel domain from the vector Φα_0 to the vector Φα_1 is denoted by the vector v_1, and the corresponding change in the feature vector is given by the following equation.

This is essentially a sample point at which the quadratic penalty function P() is evaluated. Given K such samples, with K < N, the following equation is obtained.

Assuming that these K vectors v_k are linearly independent, iterative projection can be used to construct a set of K orthogonal basis vectors spanning a K-dimensional space. Given these basis vectors, a K-dimensional quadratic penalty function P(K) can be designed to pass through the sample points. If the remaining (N−K) dimensions are further assumed to have orthogonal basis vectors, each with a weight w_i equal to the average of the K computed weights, a complete penalty function is obtained.
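One way to read the "iterative projection" above is as Gram-Schmidt orthogonalization; this reading is an assumption of the following Python sketch, and the function name and example vectors are hypothetical. Each new vector has its projections onto the basis vectors found so far subtracted, and the remainder is normalized.

```python
import math

def gram_schmidt(vectors, tol=1e-12):
    """Build an orthonormal basis from linearly independent vectors by
    repeatedly subtracting projections onto the basis found so far."""
    basis = []
    for v in vectors:
        r = [float(x) for x in v]
        for b in basis:
            coeff = sum(ri * bi for ri, bi in zip(r, b))   # projection coefficient
            r = [ri - coeff * bi for ri, bi in zip(r, b)]  # remove that component
        norm = math.sqrt(sum(ri * ri for ri in r))
        if norm < tol:
            raise ValueError("input vectors are not linearly independent")
        basis.append([ri / norm for ri in r])
    return basis

# Hypothetical example: K = 2 sample directions in an N = 3 dimensional space.
basis = gram_schmidt([[1, 1, 0], [1, 0, 1]])
dot = sum(a * b for a, b in zip(basis[0], basis[1]))
```

The two returned vectors have unit length and are mutually orthogonal, so they can serve as the first K coordinate axes of the penalty function.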

2-5-3. Optimization with Respect to the Sparse Code Vector α
With the sparse code vector β fixed, the sub-problem of optimizing the sparse code vector α is expressed by the following equation.

Removing the constants by simple algebra, the above objective function can be reformulated as follows.

This in fact has the same form as Equation (29). The problem can therefore be solved effectively by the iteratively reweighted least squares algorithm.

D. Speed-Up and Temporal Consistency
The inventors intend the optimization algorithm to be applied to a video-conferencing interactive device, not merely to a single image. In this approach, the solution vectors α^(t−1) and β^(t−1) computed for the previous frame can be used to initialize the weight matrix U^t during the IRLS optimization that solves for the vectors α^t and β^t of the current frame t. The quadratic function parameters L and w in Equation (18) can likewise be reused or refined for greater accuracy. Reusing these parameters has two clear advantages:
(1) First, the number of IRLS iterations required during optimization can be greatly reduced, which speeds up face reconstruction and beautification.
(2) Second, as the experiments show, selecting similar beautified facial components achieves temporal consistency across all frames.

2-6. Experiments
The experiments and their results are presented to demonstrate the system performance of the proposed image processing unit 15A of FIG. 2.

FIG. 4 is a diagram showing an example of the landmarks used in the typical image alignment executed in the image processing unit 15A of FIG. 2. In the experimental setup, the inventors used the Kinect camera 11A placed at the bottom of the display 18 to capture face images of eight people, as shown in FIG. 4, and used them as test samples. As is clear from FIG. 4, the angular difference between the face image captured by the camera 11A and the gaze-corrected face image as observed from the center of the display 18 is much larger than in the prior-art system (see, for example, Non-Patent Document 2).

First, dictionary learning is described. In the experiments, face reconstruction is based on a person-specific face data set consisting of face regions captured by a digital camera from the capture viewpoint and other frontal photographs of the same person. The dictionary matrix Φ for face reconstruction is patch-based: in the inventors' experiments, small overlapping 5×5 patches were extracted for learning the dictionary matrix Φ. The face beautification process relies on an offline data set of beautiful faces, collected from the web and containing 600 samples each for men and women. To reduce the influence of age, skin color, and other unrelated factors, the inventors built the beautiful-face training set from frontal face images, without glasses, of Chinese, Japanese, and Korean television and film stars aged from their twenties to their forties. As shown in FIG. 3, the inventors used an algorithm (see, for example, Non-Patent Document 21) to detect the face region and identify 103 landmarks. All star faces were normalized by the interpupillary distance of the test face and aligned so that the two eyes were at the same positions. The male and female beautiful faces were used as separate dictionary matrices Ψ.
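The overlapping patch extraction can be sketched in Python as follows. The 5×5 patch size comes from the description above, whereas the stride of one pixel, the function name, and the synthetic image are assumptions made for illustration; the subsequent dictionary learning step is not shown.

```python
import numpy as np

def extract_patches(image, size=5, stride=1):
    """Return every size x size patch of a 2-D image as a flattened row.
    A stride smaller than size yields overlapping patches."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    patches = [image[r:r + size, c:c + size].ravel()
               for r in range(0, h - size + 1, stride)
               for c in range(0, w - size + 1, stride)]
    return np.array(patches)

# Hypothetical 8 x 8 grayscale image with pixel values 0..63.
image = np.arange(64, dtype=float).reshape(8, 8)
patches = extract_patches(image)
```

For an 8×8 image and a stride of one, this produces 4 × 4 = 16 patches of 25 pixels each, which would then serve as the training vectors for the dictionary matrix Φ.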

The dictionary matrix Ψ for face beautification is based on facial components. For the eyes and eyebrows, masks were generated from the convex polygons formed by adjacent landmarks. The nose landmarks, however, do not form a closed polygon, so the nose mask is defined by a triangle covering the upper and lower landmark points.

Next, face images captured at a small viewing angle are described. For completeness, the inventors also examined the smaller-angle case in their experiments. Here, self-occluded pixels were simulated by randomly inserting impulse noise into the screen-center view image. In the actual experiments, noise was randomly inserted at 30% of the pixels.
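The simulation of self-occluded pixels can be sketched in Python as follows. The 30% ratio comes from the description above; modeling the impulse noise as salt-and-pepper values (0 or 255), the fixed random seed, the function name, and the flat test image are assumptions of this sketch.

```python
import random

def insert_impulse_noise(image, fraction=0.3, seed=0):
    """Return a noisy copy of image (a list of rows of 0-255 integers) in which
    the given fraction of pixels is replaced by 0 or 255, together with a
    boolean mask marking the corrupted pixels."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    coords = [(r, c) for r in range(h) for c in range(w)]
    corrupted = rng.sample(coords, round(fraction * h * w))
    noisy = [row[:] for row in image]          # leave the input untouched
    mask = [[False] * w for _ in range(h)]
    for r, c in corrupted:
        noisy[r][c] = rng.choice((0, 255))     # assumed salt-and-pepper impulse
        mask[r][c] = True
    return noisy, mask

# Hypothetical flat gray 10 x 10 image.
image = [[128] * 10 for _ in range(10)]
noisy, mask = insert_impulse_noise(image)
num_corrupted = sum(sum(row) for row in mask)
```

The mask records which pixels would be treated as holes by the subsequent hole-filling step.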

FIG. 5A shows image processing results of the image processing unit 15A of FIG. 2 at a small imaging angle for frames with a neutral expression; the upper row shows the results of hole filling only, and the lower row shows the results of image reconstruction and face beautification. FIG. 5B shows image processing results of the image processing unit 15A of FIG. 2 at a small imaging angle for frames with a slight smile; the upper row shows the results of hole filling only, and the lower row shows the results of image reconstruction and face beautification.

For the perceptual evaluation, the faces gaze-corrected by DIBR were reconstructed and beautified; the results are shown in FIGS. 5A and 5B, which include two male and three female test samples. As is clear from FIGS. 5A and 5B, the proposed method visibly achieves satisfactory reconstruction: all holes are filled, and fine facial details such as contours are reconstructed. The good reconstruction may be attributed to the useful priors that can be derived from the person-specific dictionary matrix Φ constructed by the inventors. Although facial features can vary considerably across face images, the fine structure of a face is found to be representable by a small number of basis images, which qualitatively resemble the receptive fields of simple cells. Therefore, using many patches extracted from a few training face images, a dictionary matrix Φ that represents the test face images well can be learned.

From the lower rows of FIGS. 5A and 5B, it can be seen that the facial components of all test samples have been beautified. All samples have enlarged eyes, which make them look more beautiful. The first four samples have more prominent eyebrows. The second to fourth test samples have a more beautiful face shape as a result of contour adjustment, and the wrinkles around the eyes of the second test sample have been removed. The reconstruction-only faces have dark shading around the eyes because the nose and hair block the light; the beautified faces remove this effect, so the subjects look more lively. More importantly, a small gaze mismatch still remains in the reconstruction-only faces, whereas the beautification process achieves perfect gaze correction. The differences between an original face and its beautified version are very subtle, so the resemblance between the two faces is unmistakable; nevertheless, these subtle changes clearly have a noticeable effect on the attractiveness of the faces. In a video conferencing system, the issue of temporal consistency should also be considered.

Next, the statistical evaluation is described.

Because beauty is subjective, the inventors posted pairs of images before and after beautification on a web page and asked observers to vote. Specifically, each pair showed the same person before and after beautification in random order, and the user was asked to choose the preferred image by selecting "left" or "right". This is essentially the two-alternative forced choice (2AFC) method described in, for example, Non-Patent Document 38. This way of comparing images is less sensitive to noise than quality ratings such as the mean opinion score (MOS).

FIG. 6 is a table showing the voting results for the beautified frontal images produced by the image processing unit 15A of FIG. 2. The inventors collected votes from 37 observers; the results are summarized in FIG. 6. As in, for example, Non-Patent Document 39, the inventors took as the null hypothesis that the choice between a person's original face image and the beautified version is uniformly random, performed Pearson's chi-square test (see, for example, Non-Patent Document 40), and computed the p-value for each image pair. As a rule of thumb, the null hypothesis is rejected if p < 0.05. As shown in FIG. 6, viewers tended to choose the beautified images of the three female subjects (B1, B2, C1, C2, D1, D2) with high probability (small p-values), so the null hypothesis that viewers have no preference between the original and beautified images can be rejected. For the male subject images (A1, A2), no clear viewer preference can be distinguished. The beautification of men is more subjective, which makes consensus harder to reach.
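The test for a single image pair can be sketched in Python as follows. The vote splits used in the example are hypothetical and are not the values of FIG. 6; the function name is likewise an assumption. With two categories the test has one degree of freedom, for which the chi-square tail probability equals erfc(sqrt(χ²/2)).

```python
import math

def two_afc_p_value(votes_beautified, votes_original):
    """Pearson chi-square goodness-of-fit test of a two-way vote split against
    the null hypothesis of a uniformly random (50/50) choice."""
    total = votes_beautified + votes_original
    expected = total / 2.0
    chi2 = ((votes_beautified - expected) ** 2
            + (votes_original - expected) ** 2) / expected
    # Tail probability of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical splits of 37 votes.
chi2_a, p_a = two_afc_p_value(30, 7)     # strong preference for the beautified image
chi2_b, p_b = two_afc_p_value(19, 18)    # nearly even split
```

Under the rule of thumb above, the first split would lead to rejecting the null hypothesis (p < 0.05), whereas the second would not.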

FIG. 7 shows image processing results of the image processing unit 15A of FIG. 2 in which the beautified face image is integrated into the captured image; the upper row shows the original captured face images, and the lower row shows the integrated face images. FIG. 7 shows the integration results for two test samples. The reconstructed complete face images achieve gaze matching while preserving consistency and facial expression before and after integration.

As described above, gaze mismatch is a known problem in video conferencing systems. The present embodiment uses dual sparse code vectors to jointly perform hole filling and face beautification on gaze-corrected face images synthesized by DIBR, with the aim of generating virtual-reality images for immersive communication that feel like a real experience and are perceptually better than reality itself. Assuming the availability of large corpora of data, the key idea is to learn two dictionary matrices Φ and Ψ offline (one for the target video-conference participant and one for beautiful faces), so that the two sparse code vectors can be obtained in real time. One sparse code vector represents the available DIBR-synthesized image; the other closely matches the first vector in terms of feature-space distance, so as to preserve an acceptable level of recognizability while approaching a beautified face. In addition, the contour of the subject's face is refined as needed via the sparse neighbor selection procedure. Through extensive experiments, the inventors have shown that consistently natural-looking faces with enhanced attractiveness can be synthesized.

As described in detail above, according to the present invention, an image processing apparatus for an interactive device can correct the image so that the gazes of the conversation participants match, while simultaneously solving the "occlusion" problem and performing face beautification.

11: left camera
11A: camera
12: right camera
13: image data acquisition unit
14: gaze correction unit
15, 15A, 19, 20: image processing unit
16: image communication unit
17: display control unit
18: display
21: reconstruction processing unit
22: face beautification processing unit
23: face contour adjustment unit
24: image warping processing unit
31, 32: dictionary memory
33: face contour memory

Claims

An image processing apparatus comprising:
a first dictionary memory that stores a first dictionary matrix (Φ), constructed by extracting a plurality of patches from a plurality of face image data and then clustering them, for face reconstruction processing that fills the holes produced by gaze correction;
a second dictionary memory that stores a second dictionary matrix (Ψ), constructed by clustering a plurality of components of beautiful faces from a plurality of face image data, for face beautification processing; and
image processing means that, based on an input face image vector (x), obtains the respective optimum values of a first sparse code vector (α) for the first dictionary matrix (Φ), a second sparse code vector (β) for the second dictionary matrix (Ψ), and a linear transformation matrix (L) from the reconstructed face image data to the simple image data subjected to the face beautification, when minimizing an objective function for the face reconstruction and the face beautification that includes the first sparse code vector (α), the second sparse code vector (β), and the linear transformation matrix (L), and thereby generates and outputs face image data obtained by applying the face reconstruction processing and the face beautification processing to the input face image vector (x).

The image processing apparatus according to claim 1, wherein the objective function includes, in place of the linear transformation matrix (L), a warping operator W(A, B) that warps input face image data (A) to beautified face image data (B), and a mapping function S(·) that maps a pixel-domain vector to a vector in a selected feature space using a scale-invariant feature transform method, and
the image processing means obtains the respective optimum values of the first sparse code vector (α) and the second sparse code vector (β) when minimizing the objective function for the face reconstruction and the face beautification, and thereby generates and outputs face image data obtained by applying the face reconstruction processing and the face beautification processing to the input face image vector (x).

The image processing apparatus according to claim 2, wherein the objective function is expressed by the following formula,
where μ is a predetermined constant and u is a weight vector for modulating the second sparse code vector (β) so as to satisfy a predetermined linear constraint, and
the image processing means obtains the first sparse code vector (α) and the second sparse code vector (β) using the above formula, and thereby generates and outputs face image data obtained by applying the face reconstruction processing and the face beautification processing to the input face image vector (x).

The image processing apparatus according to claim 1, further comprising gaze correction processing means that corrects the gaze of captured face image data so that the gaze of the face in the face image data matches the gaze of the conversation partner, and outputs the face image vector (x).

The image processing apparatus according to claim 1, further comprising face contour adjusting means that is provided downstream of the image processing means, adjusts the face contour of the face image data from the image processing means using a plurality of face contour data, and outputs the adjusted face image data.

The image processing apparatus according to claim 5, further comprising image warping processing means that is provided downstream of the face contour adjusting means, warps the face image data from the face contour adjusting means so as to include the background of the captured face image data, and outputs the warped face image data.

An interactive device comprising:
transmitting/receiving means that receives face image data from the other party's interactive device and transmits the face image data of the interactive device to the other party's interactive device;
a display unit that displays each face image data; and
the image processing apparatus according to any one of claims 1 to 6,
wherein the transmitting/receiving means transmits the face image data from the image processing apparatus to the other party's interactive device.

An image processing method for an image processing apparatus comprising:
a first dictionary memory that stores a first dictionary matrix (Φ), constructed by extracting a plurality of patches from a plurality of face image data and then clustering them, for face reconstruction processing that fills the holes produced by gaze correction; and
a second dictionary memory that stores a second dictionary matrix (Ψ), constructed by clustering a plurality of components of beautiful faces from a plurality of face image data, for face beautification processing,
the method comprising a step in which the image processing apparatus, based on an input face image vector (x), obtains the respective optimum values of a first sparse code vector (α) for the first dictionary matrix (Φ), a second sparse code vector (β) for the second dictionary matrix (Ψ), and a linear transformation (L) from the reconstructed face image data to the simplified face data subjected to face beautification, when minimizing an objective function for the face reconstruction and the face beautification that includes the first sparse code vector (α), the second sparse code vector (β), and the linear transformation (L), and thereby generates and outputs face image data obtained by applying the face reconstruction processing and the face beautification processing to the input face image vector (x).