JP7408562B2

JP7408562B2 - Program, information processing device, quantification method, and information processing system

Info

Publication number: JP7408562B2
Application number: JP2020552524A
Authority: JP
Inventors: 瞳遠藤; 成人豊田; 雄一郎森; 義満青木
Original assignee: Shiseido Co Ltd
Current assignee: Shiseido Co Ltd
Priority date: 2018-10-24
Filing date: 2019-07-08
Publication date: 2024-01-05
Anticipated expiration: 2039-07-08
Also published as: JPWO2020084842A1; WO2020084842A1; CN112912925A; TW202016881A

Description

The present invention relates to a program, an information processing device, a quantification method, and an information processing system.

It is known that the finish of makeup varies depending on the cosmetics and makeup techniques used. Until recently, most people who wear makeup had few opportunities to see how others apply it. In recent years, however, many videos of people applying makeup (makeup videos) have been posted on websites such as video-sharing sites, giving users more opportunities to study makeup techniques (see, for example, Non-Patent Document 1).

Non-Patent Document 1: "Lessons with videos" [online], Shiseido Co., Ltd., [retrieved April 20, 2018], Internet <URL: https://www.shiseido.co.jp/beauty/dictionary/lesson/index.html>

By watching such makeup videos, a user can see the cosmetics used and the makeup actions performed by the person in the video. However, simply watching the person's makeup actions has not been enough for users to imitate them well. If makeup actions could be quantified from makeup videos, it would become easy to compare the makeup actions of a user imitating the video with those of the person in the video.

An object of an embodiment of the present invention is to provide a program capable of quantifying the makeup actions of a person appearing in video data.

In order to solve the above problem, one embodiment of the present invention is a program for causing a computer to function so as to analyze the makeup actions of a person appearing in makeup or skin care video data, the program causing the computer to function as: a first detection means that detects, from the video data, a face region in which the person's face appears; a second detection means that detects, from the video data, a hand region in which the person's hand appears; and an output means that, based on the detected face region and hand region, quantifies and calculates, from the coordinates and velocity of the hand and the coordinates and velocity of the face of the person in the video data during the makeup action, acquired from the movements of the face region and the hand region, either the fineness of hand movements and hand-elbow coordination, or the fineness of hand movements and facial movement, and outputs the result.

According to an embodiment of the present invention, the makeup actions of a person appearing in video data can be quantified.

FIG. 1A is a configuration diagram of an example of an information processing system according to an embodiment.
FIG. 1B is a configuration diagram of an example of an information processing system according to an embodiment.
FIG. 2 is a hardware configuration diagram of an example of a computer according to the embodiment.
FIG. 3A is a scatter diagram of an example of measurement results.
FIG. 3B is a scatter diagram of an example of measurement results.
FIG. 4 is a functional block diagram of an example of the information processing system according to the embodiment.
FIG. 5 is a configuration diagram of an example of a region detection unit.
FIG. 6 is a conceptual diagram of an example of processing for detecting the face region of a person in a frame image.
FIG. 7 is a flowchart of an example of facial feature point detection processing.
FIG. 8 is a conceptual diagram of an example of processing for detecting the hand region of a person in a frame image.
FIG. 9 is a configuration diagram of an example of a region detection unit.
FIG. 10 is a conceptual diagram of an example of processing by a face region detection unit and a hand region detection unit.
FIG. 11 is a conceptual diagram of an example of processing by a force field calculation unit and a channel calculation unit.

Next, embodiments of the present invention will be described in detail.

[First embodiment]
<System configuration>
FIGS. 1A and 1B are configuration diagrams of examples of the information processing system according to this embodiment. The information processing system in FIG. 1A includes a single information processing device 1. The information processing device 1 is a PC, smartphone, or tablet operated by a user, or a dedicated home or professional device for quantifying makeup actions.

In the information processing system in FIG. 1B, one or more client terminals 2 and a server device 3 are connected via a network 4 such as the Internet. The client terminal 2 is a terminal device operated by a user, such as a PC, smartphone, or tablet, or a dedicated home or professional device for quantifying makeup actions. The server device 3 performs processing related to the quantification of makeup actions carried out at the client terminal 2.

Thus, the present invention is applicable both to the client-server information processing system shown in FIG. 1B and to the standalone information processing device 1 shown in FIG. 1A. The information processing systems in FIGS. 1A and 1B are merely examples; various other system configurations are possible depending on the application and purpose. For example, the server device 3 in FIG. 1B may be distributed across multiple computers.

<Hardware configuration>
The information processing device 1, the client terminal 2, and the server device 3 in FIGS. 1A and 1B are implemented, for example, by a computer with the hardware configuration shown in FIG. 2. FIG. 2 is a hardware configuration diagram of an example of a computer according to this embodiment.

The computer in FIG. 2 includes an input device 501, an output device 502, an external I/F 503, a RAM 504, a ROM 505, a CPU 506, a communication I/F 507, an HDD 508, and so on, all interconnected by a bus B.

The input device 501 is a keyboard, mouse, or other device used for input. The output device 502 includes a display, such as a liquid crystal or organic EL display, that shows screens, and a speaker that outputs sound data such as voice and music. The communication I/F 507 is an interface that connects the computer to the network 4. The HDD 508 is an example of a nonvolatile storage device that stores programs and data.

The external I/F 503 is an interface with external devices. Through the external I/F 503, the computer can read from and/or write to a recording medium 503a, such as a DVD, an SD memory card, or a USB memory.

The CPU 506 is an arithmetic unit that controls the entire computer and implements its functions by loading programs and data from storage devices such as the ROM 505 and the HDD 508 into the RAM 504 and executing processing. The information processing device 1, the client terminal 2, and the server device 3 according to this embodiment implement their various functions by executing programs on a computer with the hardware configuration described above.

The hardware configuration in FIG. 2 is one example; various other configurations are possible depending on the application and purpose. For example, the input device 501 of the computer in FIG. 2 may include a camera capable of recording video.

<Study on quantifying makeup actions>
One way to quantify makeup actions is to use sensors: the person applying makeup wears sensors while performing the makeup actions, and the actions are quantified from the data output by the sensors. This approach, however, cannot quantify the makeup actions of a person in a makeup video that has already been recorded.

If recorded makeup videos could be analyzed to quantify the makeup actions of the people in them, makeup videos on websites such as video-sharing sites could be used, and because no sensors need to be worn, natural makeup actions could be quantified. Therefore, in this embodiment, to quantify the makeup actions of people in recorded makeup videos, we identified what should be acquired from the makeup videos and studied how to quantify makeup actions using the acquired data.

《Identifying what to acquire from makeup videos》
To identify what should be acquired from makeup videos, the movements of subjects during makeup actions were measured by motion capture. Six sites were measured: the tip of the right middle finger, the base of the middle finger, the center of the back of the hand, the center of the wrist, the elbow, and the forehead. The coordinates (displacement), velocity, acceleration, and angular velocity of each measurement site were analyzed. The principal components of the subjects' movements obtained from this analysis can be regarded as the main movement elements of makeup actions.
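The principal-component step can be illustrated with a small pure-Python sketch based on power iteration. It assumes each row is one time sample and each column one motion feature (for example, a coordinate or speed of one measurement site); the feature layout, standardization, and iteration count are assumptions for illustration, not the analysis protocol used in the study.

```python
import math
import random
from typing import List, Sequence, Tuple


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each column so features with different units (mm, mm/s, rad/s) are comparable."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column has zero spread; divide by 1.0 instead to avoid division by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0 for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def covariance(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Covariance matrix of column-centred data (population normalisation)."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]


def first_principal_component(rows: Sequence[Sequence[float]], iters: int = 200,
                              seed: int = 0) -> Tuple[float, List[float]]:
    """Leading eigenvalue and unit eigenvector of the standardized covariance, by power iteration."""
    cov = covariance(standardize(rows))
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # strictly positive start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v: the variance explained by this component (v has unit length).
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v
```

With two perfectly correlated features and one uncorrelated feature, the leading component loads equally on the correlated pair and explains a variance of 2 (in standardized units). Further components could be obtained by deflating the covariance matrix; a linear-algebra library would normally be used for real motion-capture data.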

FIGS. 3A and 3B are scatter diagrams of examples of the measurement results. FIG. 3A is a scatter diagram showing an example of the fineness of hand movements and hand-elbow coordination for each makeup action. FIG. 3B is a scatter diagram showing an example of the fineness of hand movements and facial movement for each makeup action. As shown in FIGS. 3A and 3B, the principal components of the subjects' movements were the fineness of hand movements, hand-elbow coordination, and facial movement. Accordingly, in this embodiment, the data acquired from makeup videos are the coordinates and velocity of the hand and the coordinates and velocity of the face.

《Quantifying makeup actions using the acquired data》
One method for acquiring the coordinates and velocity of the hand and of the face of a person during makeup actions in a makeup video is image recognition using a convolutional neural network (hereinafter, CNN). Because CNN-based image recognition can detect face regions and hand regions in two-dimensional images, tracking the face and hand regions detected in the frame images of a makeup video yields the coordinates and velocity of the person's hand and face during the makeup actions. Details of the CNN-based image recognition in this embodiment are described later.
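The step from per-frame detections to coordinates and velocity can be sketched as follows. This is a minimal illustration assuming each detector output is an axis-aligned rectangle in pixels, that the region's position is its rectangle center, and that speed is measured between consecutive frames with a detection; `Box` and `track_kinematics` are hypothetical names, not part of the described system.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Box:
    """Axis-aligned detection rectangle in pixels: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2.0, self.y + self.h / 2.0)


Kinematics = Tuple[Optional[Tuple[float, float]], Optional[float]]


def track_kinematics(boxes: List[Optional[Box]], fps: float) -> List[Kinematics]:
    """Per-frame (centre, speed) for one tracked region (e.g. the face or one hand).

    Speed is the centre displacement since the last frame with a detection,
    divided by the elapsed time, in pixels per second. Frames with no
    detection (None) yield (None, None); the first detection gets speed 0.0.
    """
    out: List[Kinematics] = []
    prev_c: Optional[Tuple[float, float]] = None
    prev_i = 0
    for i, box in enumerate(boxes):
        if box is None:
            out.append((None, None))
            continue
        c = box.center()
        if prev_c is None:
            speed = 0.0
        else:
            dt = (i - prev_i) / fps
            speed = math.hypot(c[0] - prev_c[0], c[1] - prev_c[1]) / dt
        out.append((c, speed))
        prev_c, prev_i = c, i
    return out
```

Dividing by the actual frame gap keeps the speed consistent when the detector misses a frame, for example while the hand briefly occludes the face.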

<Software configuration>
《Functional blocks》
The software configuration of the information processing system according to this embodiment is described below, taking the information processing device 1 in FIG. 1A as an example. FIG. 4 is a functional block diagram of an example of the information processing system according to this embodiment. By executing programs, the information processing device 1 implements an operation reception unit 10, a region detection unit 12, a quantification unit 14, a post-processing unit 16, and a video data storage unit 18.

The operation reception unit 10 receives various operations from the user. The video data storage unit 18 stores makeup videos; it may also be provided outside the information processing device 1. The region detection unit 12 receives as input a makeup video stored in the video data storage unit 18 or a makeup video recorded with the camera function. For each frame image of the input makeup video, the region detection unit 12 detects the face region and hand region of the person in that frame image, as described later.

The quantification unit 14 quantifies the makeup actions of the person in the makeup video by acquiring, from the face region and hand region detected by the region detection unit 12, the coordinates and velocity of the person's hand and the coordinates and velocity of the person's face during the makeup actions. The post-processing unit 16 post-processes the results of the region detection unit 12 and the quantification unit 14 and outputs them to the output device 502 or the like.
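The text does not specify how the quantification unit 14 turns coordinates and velocities into the reported measures. One plausible preprocessing step, shown here purely as a hypothetical sketch, is to express the hand position relative to the detected face so that videos shot at different distances or resolutions become comparable, and to summarize speeds while skipping frames without a detection. The function names and the normalization scheme are assumptions.

```python
from typing import Optional, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, w, h) in pixels


def hand_in_face_coords(hand_center: Tuple[float, float], face_box: Rect) -> Tuple[float, float]:
    """Hand centre relative to the face: origin at the face-box centre, units of face width/height."""
    fx, fy, fw, fh = face_box
    cx, cy = fx + fw / 2.0, fy + fh / 2.0
    return ((hand_center[0] - cx) / fw, (hand_center[1] - cy) / fh)


def summarize_speeds(speeds: Sequence[Optional[float]]) -> Tuple[float, float]:
    """Mean and maximum speed, skipping frames without a detection (None)."""
    valid = [s for s in speeds if s is not None]
    if not valid:
        raise ValueError("no frames with a detection")
    return sum(valid) / len(valid), max(valid)
```

For example, a hand centered 100 px to the right of a 200 px-wide face maps to (0.5, 0.0), regardless of the video's resolution.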

For example, the post-processing unit 16 encloses the face region and hand region of the person during makeup actions in rectangles. The post-processing unit 16 also performs post-processing such as visually representing the fineness of hand movements, hand-elbow coordination, and facial movement derived from the coordinates and velocity of the person's hand and face during the makeup actions.

The post-processing unit 16 can also quantify and compare the makeup actions of the people in two makeup videos and output the comparison result. For example, a user of the information processing system according to this embodiment can quantify his or her own makeup actions and compare them with those of a person skilled at makeup, such as a makeup artist, which makes it easier to understand how his or her own technique differs. The information processing system according to this embodiment can thus provide a service that helps users improve their makeup technique.

The region detection unit 12 in FIG. 4, which detects the face region and hand region of a person in the frame images of a makeup video, is configured, for example, as shown in FIG. 5. FIG. 5 is a configuration diagram of an example of the region detection unit. The region detection unit 12 in FIG. 5 includes a framing unit 20, a face region detection unit 22, a hand region detection unit 24, and a facial feature point detection unit 26.

The framing unit 20 supplies the input makeup video, frame image by frame image, to the face region detection unit 22 and the hand region detection unit 24. The face region detection unit 22 has a face region learning model that includes a face part region learning model.

The face region learning model of the face region detection unit 22 is created by machine learning using, as training data, two-dimensional images in which a hand region overlaps the face region. Such training data is created from two-dimensional images in which the hand appears in front of the face. Alternatively, it may be created by pasting images from an annotated face region training dataset onto an annotated hand region training dataset (that is, datasets prepared as training data) so that the hand is in the foreground.
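The compositing approach to building occluded training data can be sketched as below. This is a minimal illustration assuming grayscale images stored as nested lists, a binary hand mask, and (x, y, w, h) rectangles; the function names are hypothetical, and a real pipeline would also handle color, blending, and image-library formats.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[int]]            # grayscale pixels, row-major
Rect = Tuple[int, int, int, int]   # (x, y, w, h)


def paste_hand_over_face(face_img: Sequence[Sequence[int]], face_rect: Rect,
                         hand_patch: Sequence[Sequence[int]], hand_mask: Sequence[Sequence[int]],
                         top_left: Tuple[int, int]) -> Tuple[Image, Dict[str, Rect]]:
    """Composite masked hand pixels over a face image so the hand is in the foreground.

    Returns the new image plus both annotations: the face rectangle is kept
    unchanged (the face is still there, only partly hidden), and the hand
    rectangle is the pasted patch clipped to the image bounds.
    """
    out = [list(row) for row in face_img]  # copy; the input image is not modified
    img_h, img_w = len(out), len(out[0])
    ox, oy = top_left
    ph, pw = len(hand_patch), len(hand_patch[0])
    for r in range(ph):
        for c in range(pw):
            y, x = oy + r, ox + c
            if hand_mask[r][c] and 0 <= y < img_h and 0 <= x < img_w:
                out[y][x] = hand_patch[r][c]
    x0, y0 = max(ox, 0), max(oy, 0)
    x1, y1 = min(ox + pw, img_w), min(oy + ph, img_h)
    return out, {"face": face_rect, "hand": (x0, y0, x1 - x0, y1 - y0)}


def occluded_fraction(face_rect: Rect, hand_mask: Sequence[Sequence[int]],
                      top_left: Tuple[int, int]) -> float:
    """Fraction of the face rectangle covered by hand-mask pixels after pasting."""
    fx, fy, fw, fh = face_rect
    ox, oy = top_left
    covered = sum(
        1
        for r, row in enumerate(hand_mask)
        for c, m in enumerate(row)
        if m and fx <= ox + c < fx + fw and fy <= oy + r < fy + fh
    )
    return covered / float(fw * fh)
```

Keeping the full face rectangle as the label, rather than shrinking it to the visible part, is what teaches the detector to report the whole face even when a hand covers part of it; the occlusion fraction can be used to balance how heavily occluded the generated samples are.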

The face region detection unit 22 uses a face region learning model trained by a CNN on a training dataset of images in which part of the face region is hidden by a hand region (an occluded environment). This makes face region detection robust to overlap between the face region and the hand region.

The hand region detection unit 24 has a hand region learning model that includes a fingertip position region learning model. This hand region learning model is created using two-dimensional images of hands applying makeup as training data.

The training data of hands applying makeup is created from an annotated hand region training dataset specialized for hand shapes during makeup and an annotated fingertip position region training dataset specialized for fingertip positions during makeup. By using a hand region learning model trained by a CNN on this training data, the hand region detection unit 24 achieves hand region detection that accurately locates hands and fingertips, whose shapes vary widely during makeup actions.

The facial feature point detection unit 26 has a facial feature point learning model that includes a face part feature point learning model. The facial feature point detection unit 26 first uses the facial feature point learning model to detect facial feature points over the entire face, and then uses the face part feature point learning model to detect the facial feature points of each face part. Specifically, it detects the feature points of the eyes, which are among the face parts, and corrects the positions of the feature points of the other parts (including the contour) based on the positions of the eye feature points. This achieves highly accurate facial feature point detection even at low resolution or under occlusion.

<Processing>
《Face region detection and facial feature point detection》
The region detection unit 12 detects the face region of a person in a frame image, for example, as shown in FIG. 6. FIG. 6 is a conceptual diagram of an example of processing for detecting the face region of a person in a frame image.

Using the face region learning model described above, the face region detection unit 22 of the region detection unit 12 detects the face region of the person in a frame image 1000 as a rectangle 1002. From the region of the rectangle 1002, the face region detection unit 22 then detects face part regions using the face part region learning model described above, and corrects the aspect ratio of the rectangle 1002, centered on the detected nose, to obtain a rectangle 1004.
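The rectangle correction from 1002 to 1004 can be sketched as re-centering the box on the nose with a fixed aspect ratio. The target ratio (square by default) and the rule of growing rather than shrinking the box are assumptions made for illustration; the text states only that the aspect ratio is corrected around the nose.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x, y, w, h)


def recenter_on_nose(rect: Rect, nose: Tuple[float, float], aspect: float = 1.0) -> Rect:
    """Refit a face rectangle so it is centred on the nose with width / height == aspect.

    The new box is grown (never shrunk) along one axis so that it is at least
    as wide and as tall as the original, which reduces the risk of cutting off
    the chin or forehead before feature point detection.
    """
    _, _, w, h = rect
    nx, ny = nose
    new_h = max(h, w / aspect)
    new_w = aspect * new_h
    return (nx - new_w / 2.0, ny - new_h / 2.0, new_w, new_h)
```

A fixed aspect ratio gives the downstream feature point model inputs of consistent shape, and centering on the nose compensates for a detector box that is offset, for example, by a hand overlapping one side of the face.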

Using the facial feature point learning model described above, the facial feature point detection unit 26 detects facial feature points in the region of the rectangle 1004, as shown in a rectangular region image 1006. Then, using the face part feature point learning model described above, it detects the feature points of each face part from the rectangular region image 1006, as shown in a rectangular region image 1008.

By performing the processing shown in the flowchart of FIG. 7, the region detection unit 12 in FIG. 5 can detect facial feature points with high accuracy even at low resolution or under occlusion. FIG. 7 is a flowchart of an example of the facial feature point detection processing.

In step S11, the facial feature point detection unit 26 of the region detection unit 12 detects facial feature points in the face region (the entire face) detected by the face region detection unit 22, using the facial feature point learning model described above, and estimates the head pose.

In step S12, taking into account the head pose estimated in step S11, the facial feature point detection unit 26 detects the eyes in the face region detected by the face region detection unit 22 using the face part feature point learning model, thereby correcting the estimated eye positions and improving their accuracy.

In step S13, taking into account the eye positions corrected in step S12, the facial feature point detection unit 26 detects the feature points (including the contour) of face parts other than the eyes using the face part feature point learning model described above, and corrects the estimated positions of those face parts. The processing in the flowchart of FIG. 7 is effective, for example, when the contour of the face is hidden by a hand.
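One simple way to let the corrected eye positions from step S12 inform the other landmarks in step S13 is to move them with the similarity transform (rotation, uniform scale, translation) that maps the coarse eye centers onto the refined ones. This is a hypothetical sketch of such a prior adjustment, not the correction the embodiment performs; in the described flow, the face part model then re-detects the non-eye feature points.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def eye_anchored_correction(coarse_eyes: Sequence[Point], refined_eyes: Sequence[Point],
                            landmarks: Sequence[Point]) -> List[Point]:
    """Move non-eye landmarks with the similarity transform mapping coarse eyes onto refined eyes.

    Points are treated as complex numbers z = x + iy; the map z -> a*z + b with
    a = (r2 - r1) / (c2 - c1) and b = r1 - a*c1 is the unique rotation +
    uniform scale + translation that sends both coarse eye centres onto the
    refined ones.
    """
    c1, c2 = (complex(*p) for p in coarse_eyes)
    r1, r2 = (complex(*p) for p in refined_eyes)
    if c1 == c2:
        raise ValueError("the two coarse eye positions must differ")
    a = (r2 - r1) / (c2 - c1)  # rotation and scale
    b = r1 - a * c1            # translation
    out: List[Point] = []
    for x, y in landmarks:
        z = a * complex(x, y) + b
        out.append((z.real, z.imag))
    return out
```

Anchoring on the eyes makes sense because they are usually the most reliably detected parts, while contour points are the ones most often hidden by a hand during makeup.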

《Hand region detection》
The region detection unit 12 detects the hand region of a person in a frame image, for example, as shown in FIG. 8. FIG. 8 is a conceptual diagram of an example of processing for detecting the hand region of a person in a frame image.

Using the hand region learning model described above, the hand region detection unit 24 of the region detection unit 12 detects, as a rectangle 1102, the hand region in which the left and right hands of the person in a frame image 1100 appear. From the region of the rectangle 1102, the hand region detection unit 24 then uses the fingertip position region learning model described above to detect the left and right hand regions 1112 and 1114 and the left and right fingertip positions 1116 and 1118 of the person, as shown in a frame image 1110.

《Output》
The information processing device 1 according to this embodiment can calculate and output, for example, the fineness of hand movements, hand-elbow coordination, and facial movement from the coordinates and velocity of the hand and face of a person during makeup actions in a makeup video. Such output is useful, for example, for research on makeup actions.

Because the information processing device 1 according to this embodiment can quantify and compare the makeup actions of the people in two makeup videos, it can compare the makeup actions of a user who wants to learn makeup techniques with those of a person skilled at makeup, such as a makeup artist. The comparison result may be presented to the user as a score, or the differences in makeup actions between the user and the skilled person may be presented visually. Further, based on the comparison result, the information processing device 1 may suggest makeup techniques that bring the user's makeup actions closer to those of the skilled person.
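Presenting the comparison as a score could look like the following hypothetical sketch: both quantified profiles (for example, per-frame hand speed in face-normalized units) are linearly resampled to the same length, and the mean absolute difference is mapped to a 0-100 score. The resampling scheme, the `tolerance` parameter, and the scoring formula are assumptions; the text does not specify how the score is computed.

```python
from typing import List, Sequence


def resample(series: Sequence[float], n: int) -> List[float]:
    """Linearly resample a non-empty series to n >= 2 evenly spaced points."""
    if len(series) == 1:
        return [float(series[0])] * n
    last = len(series) - 1
    out: List[float] = []
    for k in range(n):
        t = k * last / (n - 1)  # fractional index into the original series
        i = int(t)
        if i >= last:
            out.append(float(series[last]))
        else:
            frac = t - i
            out.append(series[i] * (1.0 - frac) + series[i + 1] * frac)
    return out


def similarity_score(user: Sequence[float], expert: Sequence[float],
                     n: int = 50, tolerance: float = 1.0) -> float:
    """Score 0-100: 100 for identical time-normalised profiles, 0 once the mean gap reaches tolerance."""
    u, e = resample(user, n), resample(expert, n)
    mean_abs_diff = sum(abs(a - b) for a, b in zip(u, e)) / n
    return 100.0 * max(0.0, 1.0 - mean_abs_diff / tolerance)
```

Linear time normalization assumes both videos cover the same action from start to finish at roughly proportional pace; if the user pauses midway, an elastic alignment such as dynamic time warping would be a more forgiving choice.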

For example, when the user applies makeup products that cover a large area, such as blush or foundation, the information processing device 1 according to this embodiment can assess the application area and coach the user on the application range. When the user applies makeup products that require difficult techniques, such as eyeliner, eyeshadow, or concealer, the device can judge whether the user's movements are correct and coach the user accordingly. The device can also coach the user on how to apply hair wax (a hair product), how to apply skin care products, or how to perform a massage.

In addition, because the appropriate way to apply makeup, such as how to apply blush (cheek color), may differ between users with angular bone structure and users with rounder features, the information processing device 1 according to this embodiment may recommend makeup suited to the user's facial features and coach the user on techniques for achieving it.

[Second embodiment]
The information processing system according to the second embodiment is the same as that of the first embodiment except for some parts, so duplicate description is omitted where appropriate. The information processing system of the second embodiment includes the region detection unit 12 in FIG. 9 instead of the region detection unit 12 in FIG. 5. FIG. 9 is a configuration diagram of an example of the region detection unit. The region detection unit 12 in FIG. 9 includes a framing unit 50, a skin color region extraction unit 52, a region division unit 54, a face region detection unit 56, a hand region detection unit 58, a force field calculation unit 60, and a channel calculation unit 62.

The framing unit 50 supplies the input makeup video to the skin color region extraction unit 52 frame by frame. The skin color region extraction unit 52 extracts skin color regions from each frame image. The region division unit 54 divides the extracted skin color regions into candidate blobs and labels each candidate blob. The region division unit 54 then supplies the labeled candidate blobs to the face region detection unit 56 and the hand region detection unit 58.
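The extraction and labeling steps of units 52 and 54 can be illustrated with a minimal pure-Python sketch. The patent does not state which color space or thresholds the skin color region extraction unit 52 uses; the YCrCb ranges below (Cr 133 to 173, Cb 77 to 127) are a commonly cited heuristic adopted here as an assumption, and 4-connected component labeling stands in for the division into labeled candidate blobs. All function names are hypothetical.

```python
from collections import deque


def is_skin(r, g, b):
    """Heuristic skin test in YCrCb space (assumed thresholds, not from the patent)."""
    # Full-range ITU-R BT.601 conversion from 8-bit RGB to chroma components.
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    return 133 <= cr <= 173 and 77 <= cb <= 127


def extract_skin_mask(frame):
    """frame: list of rows, each row a list of (r, g, b) tuples -> boolean mask."""
    return [[is_skin(*px) for px in row] for row in frame]


def label_blobs(mask):
    """4-connected component labeling of the skin mask.

    Returns (labels, count): labels is a grid where 0 is background and
    1..count identify candidate blobs.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                # Breadth-first flood fill over the 4-neighbourhood.
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

In practice an image library would replace these loops; the pure-Python form only makes the order of operations (threshold, then label) explicit.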

The face region detection unit 56 classifies candidate blobs as the face region (that is, detects the face region) based on the labels of the supplied candidate blobs (the features of the divided skin color regions). The hand region detection unit 58 then classifies candidate blobs as the hand region (that is, detects the hand region) based on the labels of the candidate blobs supplied from the region division unit 54 and on the face region candidate blobs classified by the face region detection unit 56.

As shown in FIG. 10, the face region detection unit 56 in FIG. 9 first classifies the face region candidate blobs; those blobs are then excluded, and the hand region detection unit 58 classifies the hand region candidate blobs from the remainder. Because blobs already assigned to the face are removed before hand classification, the present embodiment prevents false detection of the hand region.
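The face-first ordering can be sketched as follows, assuming each candidate blob is summarized by its pixel area and centroid. The classification rules here are hypothetical placeholders (face = largest blob above a minimum area; hands = remaining blobs above a smaller minimum area); the patent states only that the face is classified first and excluded before hand classification, not which features or rules are used.

```python
from dataclasses import dataclass


@dataclass
class Blob:
    label: int
    area: int    # number of pixels in the blob
    cy: float    # centroid row
    cx: float    # centroid column


def blob_features(labels, count):
    """Compute area and centroid for blobs 1..count from a label grid."""
    sums = {k: [0, 0.0, 0.0] for k in range(1, count + 1)}
    for y, row in enumerate(labels):
        for x, k in enumerate(row):
            if k:
                acc = sums[k]
                acc[0] += 1
                acc[1] += y
                acc[2] += x
    return [Blob(k, a, sy / a, sx / a) for k, (a, sy, sx) in sums.items() if a]


def classify_blobs(blobs, min_face_area, min_hand_area):
    """Classify the face first, exclude it, then classify hands (hypothetical rules).

    Returns (face, hands). face is None when no blob qualifies; in the second
    embodiment that case is handed to the force field path (units 60 and 62).
    """
    face_candidates = [b for b in blobs if b.area >= min_face_area]
    face = max(face_candidates, key=lambda b: b.area, default=None)
    # Exclude the face blob before looking for hands.
    remaining = [b for b in blobs if face is None or b.label != face.label]
    hands = [b for b in remaining if b.area >= min_hand_area]
    return face, hands
```

The thresholds `min_face_area` and `min_hand_area` are illustrative parameters; a real system would more likely use shape and position features learned from data.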

When the face region detection unit 56 cannot classify any candidate blob as the face region, the force field calculation unit 60 determines that interference between the face region and the hand region (the hand overlapping the face) has occurred and performs the following processing. The force field calculation unit 60 receives the face region and hand region candidate blobs of the previous frame (t-1) and the labeled candidate blobs of the current frame (t).

As shown in FIG. 11, the force field calculation unit 60 uses a force field to set a large number of channels within the candidate blob image. The channel calculation unit 62 calculates the movement distance of each channel from the previous frame and assumes that candidate blobs in channels with a large movement distance are candidate blobs of the moving hand. In this way, the hand region and face region candidate blobs can be classified.
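The movement-distance step of the channel calculation unit 62 can be sketched as below. This covers only the classification by displacement, not the force field computation that creates the channels. It assumes each channel is represented by one point whose position is known in both frames and that a fixed distance threshold separates hand from face; the function name, the point representation, and the threshold are all hypothetical.

```python
import math


def classify_by_motion(prev_pos, curr_pos, threshold):
    """Split channels into moving (hand) and static (face) by displacement.

    prev_pos, curr_pos: dict mapping a channel id to its (x, y) position in
    frame t-1 and frame t. Channels missing from frame t-1 are skipped.
    Returns (hand_channels, face_channels) as sorted lists of ids.
    """
    hand, face = [], []
    for ch, (x1, y1) in curr_pos.items():
        if ch not in prev_pos:
            continue
        x0, y0 = prev_pos[ch]
        distance = math.hypot(x1 - x0, y1 - y0)
        # A large movement between frames suggests the moving hand.
        (hand if distance > threshold else face).append(ch)
    return sorted(hand), sorted(face)
```

For example, with `prev_pos = {1: (0, 0), 2: (10, 10), 3: (5, 5)}`, `curr_pos = {1: (0, 1), 2: (20, 10), 3: (5, 5)}` and `threshold = 3`, only channel 2 (displacement 10) is assigned to the hand.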

In addition to using the magnitude of the movement distance, the force field calculation unit 60 and the channel calculation unit 62 of the region detection unit 12 can further prevent false detection of hand region and face region candidate blobs by clustering movements characteristic of the hand region in advance.
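One simple way to replace a fixed distance threshold with a data-driven split is to cluster the per-channel movement distances into a low-motion and a high-motion group. The sketch below uses one-dimensional k-means with k = 2; this is an assumed stand-in chosen for illustration, since the patent does not say which clustering method or which motion features are used.

```python
def two_means_split(distances, iters=20):
    """1-D k-means (k = 2) on movement distances.

    Returns the set of indices assigned to the high-motion cluster, which
    would be treated as hand channels. Returns an empty set when all
    distances are equal (no motion contrast).
    """
    c_lo, c_hi = min(distances), max(distances)
    if c_lo == c_hi:
        return set()
    high = set()
    for _ in range(iters):
        # Assign each distance to the nearer centroid (ties go to low).
        high = {i for i, d in enumerate(distances) if abs(d - c_hi) < abs(d - c_lo)}
        low = set(range(len(distances))) - high
        new_hi = sum(distances[i] for i in high) / len(high)
        new_lo = sum(distances[i] for i in low) / len(low)
        if (new_lo, new_hi) == (c_lo, c_hi):
            break  # centroids stopped moving
        c_lo, c_hi = new_lo, new_hi
    return high
```

Initializing the centroids at the minimum and maximum distance guarantees that neither cluster is ever empty, so the mean updates never divide by zero.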

(Summary)
As described above, the present embodiment makes it possible to quantify the makeup actions of a person in a previously recorded makeup video without the person wearing sensors or similar devices, and thus to provide and teach makeup techniques. The present invention is not limited to the specifically disclosed embodiments described above, and various modifications and changes can be made without departing from the scope of the claims. For example, although two-dimensional video data has been described in this embodiment, three-dimensional video data may also be used. In that case, the makeup actions of a person in three-dimensional video data can be quantified either by the same data analysis as for two-dimensional video data or by an analysis that combines two-dimensional video analysis with three-dimensional information, again making it possible to provide and teach makeup techniques.
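The claims quantify makeup actions from the coordinates and speed of the hand and face. As a minimal sketch of the speed part only, the function below converts per-frame coordinates into per-interval speeds. Because it relies only on point-to-point distance, the same code handles 2D (x, y) and 3D (x, y, z) points, which mirrors the note above that the 2D analysis can carry over to 3D data. The input format and the frame-rate parameter `fps` are assumptions; the patent does not specify how speed is computed.

```python
import math


def speeds(points, fps):
    """Per-interval speed from per-frame coordinates.

    points: sequence of equal-length coordinate tuples (2-D or 3-D), one per frame.
    fps: frame rate of the video. Returns speeds in coordinate units per second.
    """
    return [math.dist(p, q) * fps for p, q in zip(points, points[1:])]
```

For instance, `speeds([(0, 0), (3, 4), (3, 4)], 30)` gives one fast interval followed by a stationary one; `math.dist` requires Python 3.8 or later.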

Although the present invention has been described above based on examples, the present invention is not limited to these examples, and various modifications can be made within the scope of the claims. This application claims priority to basic application No. 2018-199739, filed with the Japan Patent Office on October 24, 2018, the entire contents of which are incorporated herein by reference.

1 Information processing device
2 Client terminal
3 Server device
4 Network
10 Operation reception unit
12 Region detection unit
14 Quantification unit
16 Post-processing unit
18 Video data storage unit
20 Framing unit
22 Face region detection unit
24 Hand region detection unit
26 Facial feature point detection unit
50 Framing unit
52 Skin color region extraction unit
54 Region division unit
56 Face region detection unit
58 Hand region detection unit
60 Force field calculation unit
62 Channel calculation unit

Claims

1. A program for causing a computer to analyze the makeup actions of a person in makeup or skin care video data, the program causing the computer to function as:
first detection means for detecting, from the video data, a face region in which the person's face is captured;
second detection means for detecting, from the video data, a hand region in which the person's hand is captured; and
output means for quantifying and outputting, based on the detected face region and hand region, the fineness of the hand movement and the coordination between the hand and the elbow, or the fineness of the hand movement and the movement of the face, from the coordinates and speed of the person's hand in the video data, acquired from the movements of the face region and the hand region, and from the coordinates and speed of the face during the makeup action.

2. The program according to claim 1, wherein
the first detection means includes:
face region detection means for detecting the face region of the person from the video data using a face region learning model; and
facial part region detection means for detecting facial part regions from the face region using a facial part feature point learning model; and
the second detection means includes:
hand region detection means for detecting the hand region of the person from the video data using a hand region learning model for makeup actions; and
fingertip position region detection means for detecting a fingertip position region from the hand region using a fingertip position region learning model for makeup actions.

3. The program according to claim 2, wherein the face region learning model is a convolutional neural network trained using, as a face region learning data set, teacher data in which part of the face region is hidden by a hand region.

4. The program according to claim 2, wherein the hand region learning model is a convolutional neural network trained using, as a hand region learning data set, teacher data on the shape of the hand during makeup application, and the fingertip position region learning model is a convolutional neural network trained using, as a fingertip position region learning data set, teacher data on the position of the fingertip during makeup application.

5. The program according to claim 1, further causing the computer to function as:
extraction means for extracting a skin color region from the video data; and
division means for dividing the skin color region into divided regions,
wherein the first detection means detects, from the skin color region and based on feature amounts of the divided regions, a face region in which the person's face is captured, and
the second detection means detects, based on the feature amounts of the divided regions, a hand region in which the person's hand is captured from the skin color region excluding the part detected as the face region.

6. The program according to claim 5, further causing the computer to function as third detection means that, when the first detection means cannot detect from the skin color region a face region in which the person's face is captured, detects the movement distance of each divided region between the frame images constituting the video data and assumes that a divided region with a large movement distance is a hand region in which the person's hand is captured.

7. The program according to claim 1, wherein the output means outputs an image that visually represents differences between the makeup actions of the persons in two sets of video data, based on a comparison of outputs calculated by quantifying the makeup actions of the persons in the two sets of video data.

8. An information processing device that analyzes the makeup actions of a person in makeup or skin care video data, the information processing device comprising:
first detection means for detecting, from the video data, a face region in which the person's face is captured;
second detection means for detecting, from the video data, a hand region in which the person's hand is captured; and
output means for quantifying and outputting, based on the detected face region and hand region, the fineness of the hand movement and the coordination between the hand and the elbow, or the fineness of the hand movement and the movement of the face, from the coordinates and speed of the person's hand in the video data, acquired from the movements of the face region and the hand region, and from the coordinates and speed of the face during the makeup action.

9. A quantification method executed by an information processing device that analyzes the makeup actions of a person in makeup or skin care video data, the method comprising:
a first detection step of detecting, from the video data, a face region in which the person's face is captured;
a second detection step of detecting, from the video data, a hand region in which the person's hand is captured; and
an output step of quantifying and outputting, based on the detected face region and hand region, the fineness of the hand movement and the coordination between the hand and the elbow, or the fineness of the hand movement and the movement of the face, from the coordinates and speed of the person's hand in the video data, acquired from the movements of the face region and the hand region, and from the coordinates and speed of the face during the makeup action.

10. An information processing system comprising a client terminal that accepts operations from a user and a server device that analyzes the makeup actions of a person in makeup or skin care video data based on the operations the client terminal accepts from the user, wherein the server device comprises:
receiving means for receiving information on an operation that the client terminal has accepted from the user;
first detection means for detecting, from the video data and based on the operation accepted from the user, a face region in which the person's face is captured;
second detection means for detecting, from the video data and based on the operation accepted from the user, a hand region in which the person's hand is captured;
output means for quantifying and outputting, based on the detected face region and hand region, the fineness of the hand movement and the coordination between the hand and the elbow, or the fineness of the hand movement and the movement of the face, from the coordinates and speed of the person's hand in the video data, acquired from the movements of the face region and the hand region, and from the coordinates and speed of the face during the makeup action; and
transmitting means for transmitting the output of the output means to the client terminal.