JPWO2020113326A5

Info

Publication number: JPWO2020113326A5
Application number: JP2021554782A
Authority: JP
Publication date: 2022-09-14
Anticipated expiration: 2039-12-03

Description

This application claims, in respect of the United States, Paris Convention priority to U.S. Provisional Patent Application No. 62/775,117, filed December 4, 2018, the contents of which are incorporated herein by reference where permissible.

This specification relates to skin diagnostics, such as in dermatology, and to the monitoring of skin treatment, and more particularly to systems and methods for automatic image-based skin diagnostics using deep learning.

Accurate skin analysis is important in both the medical and cosmetic fields. Images of the skin may be generated and analyzed to determine one or more skin conditions. It is desirable to solve the skin analysis problem using computer technology solely by observing the skin through images (the task of apparent skin diagnostics). A successful solution would make skin analysis faster and cheaper, because a dermatologist would no longer need to examine the person directly.

Images, such as images of a face, present one or more skin conditions in an encoded manner within the pixels of the image. It is desirable to provide a computer-implemented method, a computing device, and other aspects that perform, or enable the performance of, automatic image-based skin diagnostics using deep learning to decode the one or more skin conditions from an image.

Deep learning based systems and methods for skin diagnostics are disclosed and described, together with test metrics showing that such deep learning based systems outperform human experts on apparent skin diagnostic tasks. Also disclosed and described are systems and methods for monitoring a skin treatment regime using the deep learning based systems and methods for skin diagnostics.

A skin diagnostic device is provided, comprising a storage unit and a processing unit coupled to the storage unit. The storage unit stores and provides a convolutional neural network (CNN) that classifies pixels of an image to determine, for each of a plurality (N) of respective skin signs, a respective one of N skin sign diagnoses. The CNN is a deep neural network for image classification configured to generate the N respective skin sign diagnoses, and it is trained using skin sign data for each of the N respective skin signs. The processing unit receives the image and processes it using the CNN to generate the N respective skin sign diagnoses.

The CNN may comprise: an encoder stage defined from a pre-trained network for image classification and configured to encode features into a final encoder stage feature net; and a decoder stage configured to receive the final encoder stage feature net for decoding by a plurality (N) of respective parallel skin sign branches to generate the N respective skin sign diagnoses. The decoder stage comprises a global pooling operation that processes the final encoder stage feature net and provides the result to each of the N parallel skin sign branches. The CNN may be configured to classify the pixels to determine an ethnicity vector, in which case the CNN is trained using skin sign data for each of the N respective skin signs and a plurality of respective ethnicities. The decoder stage may comprise a further parallel branch for ethnicity that generates the ethnicity vector.

Each of the N parallel skin sign branches may comprise, in succession, a first fully connected layer followed by a first activation layer, a second fully connected layer, a second activation layer, and a final activation layer, and outputs a final value comprising one of the N respective skin sign diagnoses and the ethnicity vector. The final activation layer may be defined according to the function of Equation 1 for an input score x received from the second activation layer, where α is the slope, a is the lower bound, and b is the upper bound of the respective score range for each of the N skin sign diagnoses.

The CNN may be trained using a plurality of samples of the form (x_i, y_i), where x_i is the i-th training image and y_i is the corresponding vector of ground-truth skin sign diagnoses, and may be trained to minimize a loss function for each of the N parallel skin sign branches and for the parallel ethnicity branch. The CNN may be trained to minimize a loss function L that is a weighted combination of the L2 loss function for each of the N parallel skin sign branches and a standard cross-entropy classification loss L_{ethnicity} for the parallel ethnicity branch, according to Equation 3, where λ controls the balance between the score regression loss and the ethnicity classification loss.

The storage unit may store a face and landmark detector for pre-processing the image, and the processing unit may be configured to generate a normalized image from the image using the face and landmark detector and to use the normalized image when using the CNN.

The CNN may be pre-configured from a pre-trained network for image classification that is adapted to generate the N respective skin sign diagnoses by removing the fully connected layers of the pre-trained network and defining N respective groups of layers that decode the same feature net in parallel, one group for each of the N skin sign diagnoses.

The skin diagnostic device may be configured as one of: a personal computing device comprising a mobile device; and a server providing skin diagnostic services via a communication network.

The storage unit may store code that, when executed by the processing unit, provides a treatment product selector that, responsive to at least some of the N respective skin sign diagnoses, obtains a recommendation for at least one of a product and a treatment plan.

The storage unit may store code that, when executed by the processing unit, provides an image acquisition function for receiving the image.

The storage unit may store code that, when executed by the processing unit, provides a treatment monitor for monitoring treatment of at least one skin sign.

The processing unit may be configured to do at least one of reminding about, instructing, and/or recording treatment activities associated with applying a product in each treatment session.

The processing unit may be configured to process, using the CNN, a second image received after a treatment session in order to generate a subsequent skin diagnosis. The storage unit may store code that, when executed by the processing unit, presents a comparison result using the subsequent skin diagnosis.

A computer-implemented method of skin diagnosis is provided. The method comprises providing a storage unit that stores a convolutional neural network (CNN) configured to classify pixels of an image to determine, for each of a plurality (N) of respective skin signs, a respective one of N skin sign diagnoses, where the CNN is a deep neural network for image classification configured to generate the N respective skin sign diagnoses and is trained using skin sign data for each of the N respective skin signs. The method further comprises performing, by a processing unit coupled to the storage unit: receiving the image; and processing the image using the CNN to generate the N respective skin sign diagnoses.

A second method is provided, comprising training a convolutional neural network (CNN) configured to classify pixels of an image to determine, for each of a plurality (N) of respective skin signs, a respective one of N skin sign diagnoses, where the CNN is a deep neural network for image classification configured to generate the N respective skin sign diagnoses and the training is performed using skin sign data for each of the N respective skin signs.

It will be apparent to a person skilled in the art that these and other aspects include computer program product aspects in which a (non-transitory) storage device stores instructions that, when executed by a processing unit, configure the operation of a computing device to perform any of the computer-implemented methods described herein.

FIG. 1 is a composite of photographs showing skin conditions.
FIG. 2 is a schematic diagram of a deep learning system according to one embodiment or example herein.
FIG. 3 is a more detailed schematic diagram of the deep learning system of FIG. 2.
FIG. 4 is a diagram of a computer network providing an environment for various aspects according to one embodiment herein.
FIG. 5 is a block diagram of a computing device of the computer network of FIG. 4.
FIGS. 6(a) and 6(b) are flowcharts of operations of a computing device according to one embodiment herein.
FIGS. 7(a) and 7(b) are flowcharts of operations of a computing device according to one embodiment herein.

The inventive concept is best described through the specific embodiments set out herein with reference to the accompanying drawings, in which like reference numerals refer to like features throughout. It should be understood that the term "invention", as used herein, is intended to connote the inventive concept underlying the embodiments described below and not merely the embodiments themselves. It should further be understood that the general inventive concept is not limited to the illustrative embodiments described below, and the following description should be read in that light.

<Preface>
As used herein, the term "skin sign" or "sign" refers to a specific skin condition such as (but not limited to) nasolabial folds, wrinkles at various locations, ptosis of the lower part of the face, sebaceous pores, whole-face pigmentation, and vascular disorders. FIG. 1 is a photo composite 100 showing skin signs such as forehead wrinkles 102, glabellar wrinkles 104, under-eye wrinkles 106, nasolabial folds 108, wrinkles at the corners of the lips 110, and ptosis of the lower part of the face 112. The appearance of the human face undergoes structural changes caused by a variety of factors, including chronological aging, photo-aging, eating habits (anorexia or obesity), and lifestyle (sleep problems, smoking, alcohol dependence, etc.). These structural changes are most obvious as localized wrinkles (forehead, glabella, upper lip, etc.), but they are often accompanied by overall sagging of the face (drooping eyelids, eye bags, sagging of the neck, etc.) and by enlarged cheek pores. The changes may progress subtly over years or decades, and they manifest differently according to sex and ethnicity. They are also accentuated by varied (repeated) exposure to sunlight (ultraviolet radiation), in addition to its well-known effects (whether or not related to air pollution) on skin pigmentation (moles, dark spots, darkening of the skin) and on the cutaneous vascular network (redness, dilated capillaries, etc.).

Grading the severity of certain facial signs is an important need for different purposes, whether dermatological (skin peeling, corrective surgery, etc.), cosmetic (skin care, anti-aging products), or as possible assistance or advice to consumers. Such a need not only responds to a primary scientific objective but can also help detect false product claims. This grading objective was met by the multi-volume reference skin atlases of L'Oreal S.A. (R. Bazin, E. Doublet, in: P. E. Med'Com (Ed.), Skin Aging Atlas. Volume 1, Caucasian Type, 2007. R. Bazin, F. Flament, in: P. E. Med'Com (Ed.), Skin Aging Atlas. Volume 2, Asian Type, 2010. R. Bazin, F. Flament, F. Giron, in: P. E. Med'Com (Ed.), Skin Aging Atlas. Volume 3, Afro-American Type, 2012. R. Bazin, F. Flament, V. Rubert, in: P. E. Med'Com (Ed.), Skin Aging Atlas. Volume 4, Indian Type, 2015. And, F. Flament, R. Bazin, H. Qiu, in: P. E. Med'Com (Ed.), Skin Aging Atlas. Volume 5, Photo-aging Face & Body, 2017). Through professionally processed photographs, these atlases standardized the visual grading (with respective severity scales running from 0 to 4, 5, 6, or 7) of more than 20 facial signs, in both sexes of four ethnicities, with age. By zooming in on a given sign irrespective of the overall facial appearance, a skin expert can grade the severity of each facial sign blindly. The atlases showed that aging affects people differently according to sex, but that within the same sex the effects are similar. Some changes in facial signs, however, were specific to ethnicity. This approach not only made it possible to describe accurately the age-related changes in facial signs in both sexes of four ethnicities, but also led to the finding that, in Caucasian or Chinese women, some facial signs are related to, or associated with, fatigue from a day's work. One further difficult and important step nevertheless remained: whether an automatic process, free of human assessment, could be developed to grade structural facial signs either from standardized photographs or from photographs taken by mobile phone (e.g., "selfies" and selfie videos) under a variety of real-world lighting conditions and during human activities (work, sport, riding on transport, etc.). In short, obtaining quantified data from a "blind/neutral" automatic system is desired in many applications.

Accordingly, a deep learning approach to skin diagnostics, developed using data from women of various ages and ethnicities, is described, including the technical aspects of the approach and the results obtained. A comparison with data obtained by expert grading (using the skin atlases) is also presented.

The apparent skin diagnostics problem, in which skin signs are assessed from an image alone, is cast as a supervised regression problem for computer implementation using deep learning. As represented by the schematic diagram of FIG. 2 showing a deep learning system 200, given a facial image x 202, the neural network 204 of system 200 returns a vector of scores y 206, where y = f_θ(x) and f_θ is the neural network 204 parameterized by θ. Each component of y 206 corresponds to a different skin sign. Other factors, such as ethnicity, and other skin ratings may also be determined, as described further below.

Although a separate neural network could be designed for each skin sign, the low-level features learned are similar across signs, so an implementation of the above approach can estimate all signs jointly with a single network. A side benefit is higher computational efficiency.

Rather than designing a neural network from scratch, an architecture that has proven to work well on a variety of tasks can be adapted. In particular, the architectures of ResNet50 (a 50-layer residual network from Microsoft Research Asia, described in K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778, incorporated herein in its entirety) and MobileNetV2 (the second version of the depthwise separable convolutional neural network from Google, described in M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen, Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation, arXiv preprint arXiv:1801.04381, 13 Jan. 2018, incorporated herein in its entirety) can be adapted.

ResNet50 and MobileNetV2 are convolutional neural networks trained on ImageNet (an open-source database of image data for classification). ResNet50 is used as the backbone of many state-of-the-art systems, while MobileNetV2 is a more efficient network that can be used, at a moderate cost in accuracy, when execution time and storage size are a concern. When used for classification, each of these networks contains a large fully convolutional part that yields a low-resolution but powerful set of CNN features (e.g., the encoder stage), followed by a global max pooling or global average pooling layer, fully connected layers, and a final classification layer (the decoder stage). Each is a good candidate for adaptation.

FIG. 3 is a schematic diagram of the deep learning system 200 showing the neural network 204 in more detail. The network comprises encoder components 302, defined by layers (e.g., components having respective operations) from a source network such as ResNet50 or MobileNetV2, and decoder components 304. The decoder components 304 comprise a global max pooling layer 306 and respective parallel branches for decoding each of the N skin signs in the output vector 206 and an ethnicity factor (output 314). For simplicity only branches 308, 310, and 312 are shown; it is understood that there are N+1 parallel branches for N skin signs.

Rather than replacing only the final classification layer, each source network is cropped after its pooling layer to construct the feature net (neural network 204). Specifically, ResNet50 is cropped after its average pooling layer, and the average pooling layer is replaced by a global max pooling layer (e.g., 306), yielding a 1×1×2048 feature vector. Similarly, for MobileNetV2 the fully connected layers are cropped and the average pooling layer is replaced by a global max pooling layer, so that the new feature net outputs a 1×1×1280 feature vector. Each of the parallel branches 308, 310, and 312 receives the output of the global max pooling layer 306.
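
As an illustration of the pooling step only, the following is a minimal sketch in plain Python of how a global max pooling layer reduces an H×W×C feature map to a 1×1×C vector (C would be 2048 for ResNet50 or 1280 for MobileNetV2). The function name `global_max_pool` and the toy values are hypothetical and are not taken from the specification.

```python
from typing import List

# Feature map indexed as [row][col][channel]; a stand-in for the encoder output.
FeatureMap = List[List[List[float]]]


def global_max_pool(feature_map: FeatureMap) -> List[float]:
    """Collapse an H x W x C feature map to a length-C vector (1 x 1 x C)."""
    channels = len(feature_map[0][0])
    return [
        max(pixel[c] for row in feature_map for pixel in row)
        for c in range(channels)
    ]


# Toy 2 x 2 x 3 feature map (hypothetical values, for illustration only).
toy_map = [
    [[0.1, 2.0, -1.0], [0.5, 0.0, -3.0]],
    [[0.9, 1.0, -2.0], [0.2, 4.0, -0.5]],
]
pooled = global_max_pool(toy_map)  # one maximum per channel
```

Each parallel branch would then receive this same pooled vector as its input.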

This early branching is chosen because different skin signs potentially depend on different image features, and the choice is validated through experiment. Each skin sign branch (one of the respective parallel branches 308, 310) consists of two fully connected layers, each followed by an activation layer. First, the feature net (ResNet50 or MobileNet) is connected by a fully connected layer whose input size is the feature size after pooling (e.g., 1×1×2048 or 1×1×1280, respectively) and whose output size is 50, followed by a ReLU activation layer (i.e., a rectified linear activation unit). Next, a second fully connected layer with input size 50 and output size 1 is followed by a customized activation layer that outputs the final score.
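
The layer sizes of one skin sign branch can be sketched as below. This is a minimal, dependency-free illustration with randomly initialized placeholder weights, not trained parameters. The helper names (`dense`, `relu`, `make_layer`, `sign_branch`) are hypothetical, and the customized final activation is deliberately left out, so the function returns the raw score that would be passed to it.

```python
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, bias)


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: List[float]) -> List[float]:
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in x]


def make_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random placeholder weights standing in for trained parameters."""
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def sign_branch(features: List[float], fc1: Layer, fc2: Layer) -> float:
    """FC(feature_size -> 50) -> ReLU -> FC(50 -> 1); returns the raw score."""
    hidden = relu(dense(features, *fc1))
    return dense(hidden, *fc2)[0]


rng = random.Random(0)
feature_size = 2048  # pooled ResNet50 features; 1280 for the MobileNetV2 variant
fc1 = make_layer(feature_size, 50, rng)
fc2 = make_layer(50, 1, rng)
features = [rng.random() for _ in range(feature_size)]
raw_score = sign_branch(features, fc1, fc2)
```

In a full model there would be one such branch per skin sign, all reading the same pooled feature vector.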

The system conforms to the internationally accepted skin score atlases maintained by L'Oreal, referenced above. As a result, each skin sign has its own scale depending on its type and on the person's ethnicity and sex. Because the score for each skin sign is therefore bounded, the last layer uses a custom activation function resembling Leaky ReLU, called LeakyClamp, rather than a pure linear regression layer or another activation function. Leaky ReLU is described in A. L. Maas, A. Y. Hannun, A. Y. Ng, Rectifier Nonlinearities Improve Neural Network Acoustic Models, in: Proc. International Conference on Machine Learning, Vol. 30, 2013, p. 3, incorporated herein by reference. Leaky ReLU attempts to address the "dying ReLU" problem that arises when x < 0: whereas the standard ReLU function is 0 for x < 0, Leaky ReLU has a small slope in the negative region (e.g., a near-zero slope of about 0.01).

LeakyClamp has a near-zero slope below the min-activation and above the max-activation, where the max-activation differs from sign to sign, as given by Equation 1. Here α is the slope, a is the lower bound, and b is the upper bound of the score range. In training, α is chosen to be 0.01, and a and b are chosen to be the score range for each sign.
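
Equation 1 itself is not reproduced in this text. The sketch below shows one piecewise form that is consistent with the description: identity inside the score range [a, b] and a small slope α outside it. The exact expression, and the function name `leaky_clamp`, are assumptions made for illustration.

```python
def leaky_clamp(x: float, a: float, b: float, alpha: float = 0.01) -> float:
    """Assumed LeakyClamp: identity on [a, b], slope alpha below a and above b."""
    if x < a:
        return a + alpha * (x - a)  # leaks slightly below the lower bound
    if x > b:
        return b + alpha * (x - b)  # leaks slightly above the upper bound
    return x


# Example with a hypothetical score range of 0 to 5.
inside = leaky_clamp(3.0, a=0.0, b=5.0)
below = leaky_clamp(-2.0, a=0.0, b=5.0)
above = leaky_clamp(7.0, a=0.0, b=5.0)
```

Unlike a hard clamp, the non-zero slope outside the range keeps a gradient available during training when a raw score falls outside the bounds.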

As further described in the evaluation section herein, the deep learning network is trained by obtaining and using a large number of samples of the form (x_i, y_i), where x_i is the i-th training image and y_i is the corresponding vector of scores. A loss function is minimized to find the optimal set of parameters θ. Experiments were conducted with several loss functions, but none was found to be better than the others.

Accordingly, the standard L2 loss of Equation 2 is minimized, and it is the loss used for the data presented herein.
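
Equation 2 is not reproduced in this text. Using the notation already introduced (training samples (x_i, y_i) and network f_θ), a standard L2 regression loss would take the following form. This is a sketch under that assumption; whether the original sums or averages over samples is not stated here.

```latex
L_2(\theta) = \sum_{i} \left\lVert y_i - f_\theta(x_i) \right\rVert_2^2
```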

Furthermore, because the skin scores depend on ethnicity, a separate ethnicity prediction branch (the parallel branch 312) is defined, with its own component structure and a standard cross-entropy classification loss L_{ethnicity}. The ethnicity branch (312) has a single fully connected layer whose input size is the feature size and whose output size is the number of ethnicities. The additional loss L_{ethnicity} helps guide the training in the right direction, and it is also useful at test time, because using the individual's ethnic group allows the output scores to be interpreted correctly. The L2 loss and the cross-entropy classification loss L_{ethnicity} are combined with a weight λ into the loss L, as given by Equation 3. λ controls the balance between the score regression loss and the ethnicity classification loss. λ = 0.002 was used in training.
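
The combined objective can be sketched as follows for a single sample. The form L = L2 + λ·L_{ethnicity}, with the weight applied to the ethnicity term, is an assumption consistent with the description, since the equation itself is not reproduced in this text. All function names are hypothetical.

```python
import math
from typing import List


def l2_loss(pred: List[float], target: List[float]) -> float:
    """Sum of squared differences between predicted and ground-truth scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def cross_entropy(logits: List[float], true_class: int) -> float:
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_class]


def combined_loss(
    pred_scores: List[float],
    true_scores: List[float],
    ethnicity_logits: List[float],
    true_ethnicity: int,
    lam: float = 0.002,  # value of lambda reported for training
) -> float:
    """Assumed form: L = L2 + lambda * L_ethnicity."""
    return l2_loss(pred_scores, true_scores) + lam * cross_entropy(
        ethnicity_logits, true_ethnicity
    )


# Two skin sign scores and three ethnicity classes (hypothetical values).
loss = combined_loss([1.0, 2.0], [1.0, 4.0], [0.0, 0.0, 0.0], true_ethnicity=0)
```

With λ this small, the regression term dominates and the ethnicity term acts as a light auxiliary signal.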

Following common transfer learning practice, the network is pre-trained on ImageNet and then fine-tuned on the skin diagnostic data using (e.g., minimizing) the loss described above. As in the ImageNet pre-training procedure, image normalization is applied, centered at [0.485, 0.456, 0.406] with a standard deviation of [0.229, 0.224, 0.225]. Fine-tuning uses the Adam optimizer, which performs first-order gradient-based optimization of stochastic objective functions, with a learning rate of 0.0001 and a batch size of 16. The Adam optimizer is described in D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, CoRR abs/1412.6980, arXiv:1412.6980, as early as 22 Dec. 2014, incorporated herein by reference.
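
The per-channel normalization can be sketched as below. The sketch assumes RGB channel values already scaled to the range [0, 1], which is the usual ImageNet convention but is not stated in the text. The names `normalize_pixel` and `TRAIN_CONFIG` are hypothetical.

```python
from typing import Sequence, Tuple

# Per-channel statistics quoted in the text (RGB order assumed).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

# Fine-tuning hyperparameters quoted in the text.
TRAIN_CONFIG = {"optimizer": "adam", "learning_rate": 1e-4, "batch_size": 16}


def normalize_pixel(rgb: Sequence[float]) -> Tuple[float, ...]:
    """Subtract the channel mean and divide by the channel standard deviation."""
    return tuple((v - m) / s for v, m, s in zip(rgb, MEAN, STD))


white = normalize_pixel((1.0, 1.0, 1.0))
centered = normalize_pixel(MEAN)  # a pixel equal to the mean maps to zero
```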

Apparent skin diagnostics has many scientific, commercial, and other applications, including consumer applications. For some such applications the imaging conditions can be controlled by capturing images under controlled lighting and using standardized poses, but this may not be feasible, particularly in consumer applications. It is therefore desirable for a deep learning system to handle a variety of lighting conditions and facial poses. Still referring to FIG. 3, as one example of addressing the latter, a facial landmark detector 316 (described in V. Kazemi, J. Sullivan, One millisecond face alignment with an ensemble of regression trees, in: IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1867-1874, incorporated herein by reference) may be applied to the source input image 318 to pre-process the image so that the face (output image 202) is normalized based on the detected landmarks. In this way, the input facial image x is always an upright frontal image of the face at a fixed scale.

During training, the training data may be augmented with crops at various scales (randomly selected from 0.8 to 1.0) to accommodate any scale variation that remains even after landmark-based cropping. After random cropping, each input image is resized to a resolution of 448 pixels by 334 pixels (e.g., to match the expected input resolution of the source network). In addition, selected images are randomly flipped horizontally with a probability of 0.5 during training. To cope with lighting variation, training is performed on images captured under various lighting conditions, as described in the Evaluation section of this specification.

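The augmentation steps described above (random-scale crop, resize, random horizontal flip) can be sketched as follows. This is an illustration under assumptions: the same random scale is applied to both sides, the crop position is uniform, and nearest-neighbor resizing stands in for whatever interpolation the authors used. The text gives the target resolution as 448 by 334 pixels without saying which number is the width, so the output size is left as a parameter. Images are plain nested lists here.

```python
import random


def random_scale_crop_box(width, height, rng, lo=0.8, hi=1.0):
    """Pick a crop covering a random fraction (lo..hi) of each side."""
    scale = rng.uniform(lo, hi)
    cw = max(1, round(width * scale))
    ch = max(1, round(height * scale))
    left = rng.randint(0, width - cw)
    top = rng.randint(0, height - ch)
    return left, top, cw, ch


def crop(image, box):
    left, top, cw, ch = box
    return [row[left:left + cw] for row in image[top:top + ch]]


def resize_nearest(image, out_w, out_h):
    """Nearest-neighbor resize of an image stored as a list of rows."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w]
             for x in range(out_w)] for y in range(out_h)]


def hflip(image):
    return [row[::-1] for row in image]


def augment(image, rng, out_w, out_h, flip_prob=0.5):
    """Random-scale crop, resize to (out_w, out_h), flip with flip_prob."""
    h, w = len(image), len(image[0])
    out = crop(image, random_scale_crop_box(w, h, rng))
    out = resize_nearest(out, out_w, out_h)
    if rng.random() < flip_prob:
        out = hflip(out)
    return out
```

Passing a seeded `random.Random` instance makes the augmentation reproducible, which is convenient when debugging a training pipeline.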
<Evaluation>

The model was trained and evaluated on two datasets of female images with respect to the following nine skin signs:
・Nasolabial folds
・Glabellar wrinkles
・Forehead wrinkles
・Underneath-the-eye wrinkles
・Corner-of-the-lips wrinkles
・Ptosis of the lower part of the face
・Cheek sebaceous pores
・Whole-face pigmentation
・Vascular disorders

Note that the last two skin signs are defined only for the Caucasian and Asian ethnicities. The first dataset consists of 5834 images of women captured with a professional camera under controlled laboratory conditions with ideal lighting and facial pose (hereinafter the "analysis dataset"). Note that not every image in this dataset has ground truth for all nine signs. The second dataset consists of selfie images taken with a mobile phone under uncontrolled lighting conditions (hereinafter the "selfie dataset"). It contains images of 380 women of three ethnicities (Caucasian, Asian, and African), each photographed under four different lighting conditions: outdoor daylight, indoor daylight, indoor artificial diffuse light, and indoor artificial direct light. This yields 4560 images in total. For both datasets, 90% of the data was used for training and 10% for testing. In both cases the same face normalization framework is applied, even though it is unnecessary for some images in the analysis dataset. This framework fails to detect the face or the facial landmarks in some images, which slightly reduces the amount of training and test data.

Both datasets were manually annotated by expert dermatologists, with each image annotated by 10 to 12 experts. The mean of the experts' predictions is taken as the ground truth.

Training on images of men may also be performed. Imaging conditions, such as the absence of facial hair, may be imposed to obtain clear images. Facial hair not only strongly affects the scores of signs in the skin regions it covers but, because the features are learned jointly for all signs, also affects training as a whole. The skin signs are the same for men and women.

Several measures are used to evaluate the trained deep learning system 200, which comprises neural network 202. For ethnicity prediction, the percentage of correctly classified samples is measured. The test accuracies on the analysis dataset and the selfie dataset are 99.7% and 98.2%, respectively. For skin scores, two kinds of measures are used. The first is the mean absolute error (MAE), which is the average over all samples of the absolute difference between the predicted value and the ground-truth value. A more meaningful error measure, however, is the percentage of samples whose absolute error is below a given threshold T, written %(MAE<T). Depending on the application, this threshold may be more or less strict, so this error measure is reported for several different thresholds. The results for both the analysis dataset and the selfie dataset follow.
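The two score measures described above can be written down directly. The following is a minimal sketch with hypothetical function names; it uses a strict "<" comparison to match the %(MAE<T) notation, and it also shows the ground truth as the mean of the expert scores for one image, as described for the annotation procedure.

```python
def ground_truth(expert_scores):
    """Ground truth for one image and one sign: the mean expert score."""
    return sum(expert_scores) / len(expert_scores)


def mean_absolute_error(predicted, truth):
    """Average absolute difference between predictions and ground truth."""
    errors = [abs(p - t) for p, t in zip(predicted, truth)]
    return sum(errors) / len(errors)


def pct_within(predicted, truth, threshold):
    """Percentage of samples whose absolute error is below the threshold."""
    hits = sum(1 for p, t in zip(predicted, truth) if abs(p - t) < threshold)
    return 100.0 * hits / len(predicted)
```

Reporting `pct_within` at several thresholds (for example 0.5, 1, and 2 score units; these values are only illustrative) lets a reader pick the strictness that suits a given application.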

Table 1 shows the results for the analysis dataset, and Table 2 shows the results for the selfie dataset. Note, for example, that although the typical range of scores is from 0 to 5-10, the deep learning system 200 predicts a score within an absolute error of 1 in more than 90% of cases for any skin sign (and is more accurate still for some signs).

On the selfie dataset (Table 2), the results are in most cases even better, despite the uncontrolled lighting conditions. However, a very large variation in scores is also observed among the experts themselves, and even for the same expert across different lighting conditions. The ground truth is therefore biased, and it is likely that system 200 internally learns to predict the lighting condition in order to predict the scores better. Collecting ground truth that is more consistent across lighting conditions could help.

At present, however, scoring skin signs from "in-the-wild" images is a difficult task even for expert dermatologists, and system 200 is shown to outperform them on this task. This is shown in Table 3, in which the absolute error for each image was computed by comparing each expert's prediction with the mean expert prediction for that image; each image was scored by 12 experts on average. Comparing Tables 2 and 3 shows that system 200 is more accurate than the experts for every sign except whole-face pigmentation.
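The expert-side error described above can be sketched in the same style as the model-side error. This is an illustration only: it assumes the mean expert prediction includes the expert being evaluated (the text does not say whether that expert is left out), and the function names are hypothetical.

```python
def expert_errors(scores):
    """Absolute error of each expert against the mean score for one image."""
    consensus = sum(scores) / len(scores)
    return [abs(s - consensus) for s in scores]


def expert_mae(images):
    """Mean absolute expert error over a list of per-image score lists."""
    all_errors = [e for scores in images for e in expert_errors(scores)]
    return sum(all_errors) / len(all_errors)
```

Comparing this value with the model's mean absolute error on the same images gives the kind of model-versus-expert comparison the text describes.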

In addition to validating the model against image-based scores of the selfie data, validation was also carried out on a subset of subjects whose skin signs dermatologists were able to score in person. Expert dermatologists received visits from 68 subjects (about 12 experts per subject) and assessed them live, without regard to the subjects' image-based scores. As in the image-based analysis, the mean absolute error was computed for each skin sign 1) for the model of system 200, by comparing the model's prediction with the mean expert score for that sign for the particular subject, and 2) for the in-person expert assessment, by comparing each expert's score vector with the mean expert score vector for that subject. Two tables, for model performance (Table 4) and expert performance (Table 5), are given below. As with image-based scoring, for in-person expert scoring too the automatic score predictions of system 200 are more accurate than those of the expert dermatologists, and this holds for all signs.

To better understand the results in Tables 4 and 5, the same validation analysis as in Tables 2 and 3 was performed using only the subset of 68 subjects who were actually assessed in person. The results are shown in Tables 6 and 7 below. Here again, the model score predictions of system 200 are significantly more accurate than the expert scoring.

FIG. 4 is a block diagram of an exemplary computer network 400 in which a computing device 402 for personal use, operated by a user 404, communicates via a communications network 404 with remotely located server computing devices, namely server 406 and server 408. User 404 may be a consumer and/or a patient of a dermatologist. Also shown are a second user 410 and a second computing device 412 configured to communicate via communications network 404. Second user 410 may be a dermatologist. Computing device 402 is for personal use by a user and, unlike services from a server, is not available to the public. Here, the public includes registered users and/or customers.

Briefly, computing device 402 is configured to perform skin diagnostics as described herein. Neural network 200 may be stored and used on board computing device 402, or it may be provided from server 406, such as via a cloud service, web service, etc., operating on image(s) received from computing device 402.

Computing device 402 is configured to communicate with server 408, for example, to provide skin diagnostic information and to receive product/treatment recommendations responsive to the skin diagnosis and/or other information about the user (e.g., age, gender, etc.). Computing device 402 may be configured to communicate skin diagnostic information (which may include image data) to either or both of servers 406 and 408, for example, for storage in a data store (not shown). Server 408 (or another service provider, not shown) may provide an e-commerce service for selling the recommended product(s).

Computing device 402 is shown as a handheld mobile device (e.g., a smartphone or tablet). However, it may be another computing device such as a laptop, desktop, workstation, etc. The skin diagnostics described herein may be implemented on other types of computing device. Computing device 402 may be configured using, for example, one or more native applications or browser-based applications.

Computing device 402 may comprise a user device that, for example, acquires one or more images, such as photographs of skin, particularly of a face, and processes the images to provide a skin diagnosis. The skin diagnosis may be performed in association with a skin treatment plan in which images are periodically acquired and analyzed to determine skin scores for one or more skin signs. The scores may be stored (locally, remotely, or both) and compared between sessions, for example, to show trends, improvement, etc. The skin scores and/or skin images may be accessible to user 404 of computing device 402 and may be made available (e.g., via server 406, or communicated (electronically) in another manner via communications network 404) to another user of computer system 400 (e.g., second user 410), such as a dermatologist. Second computing device 412 may also perform the skin diagnostics described above. It may receive images from a remote source (e.g., computing device 402, server 406, server 408, etc.) and/or capture images via an optical sensor (e.g., a camera) coupled thereto, or in any other manner. As described, neural network 200 may be stored on and used from second computing device 412 or from server 406.
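Comparing stored scores between sessions, as described above, could be as simple as the following sketch. The data layout (a date paired with a dictionary of per-sign scores), the sign names used as keys, and the function name are all hypothetical; the specification does not define a storage format.

```python
from datetime import date


def score_changes(sessions):
    """Return {sign: latest score - earliest score} for signs in both sessions.

    sessions is a list of (date, {sign_name: score}) pairs in any order.
    """
    ordered = sorted(sessions, key=lambda s: s[0])
    first, last = ordered[0][1], ordered[-1][1]
    return {sign: last[sign] - first[sign] for sign in first if sign in last}


# Hypothetical example data: two sessions one month apart.
example_sessions = [
    (date(2020, 2, 1), {"forehead wrinkles": 2.5}),
    (date(2020, 1, 1), {"forehead wrinkles": 3.0, "nasolabial folds": 1.0}),
]
example_changes = score_changes(example_sessions)
```

A negative change means the score for that sign decreased between the first and the most recent session; signs that were not scored in both sessions are omitted.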

An application may be provided to perform the skin diagnosis, suggest one or more products, and monitor skin changes over a period of time following one or more applications of the product(s) (which may define treatment sessions in a treatment plan). The computer application may provide a workflow, such as a series of instructive graphical user interfaces (GUIs) and/or other user interfaces (typically interactive and receiving user input), to perform any of the following activities:
・Skin diagnosis
・Product recommendation, such as for a treatment plan
・Product purchase or other acquisition
・Reminding, instructing, and/or recording (e.g., logging) product use for each treatment session
・Subsequent (e.g., one or more follow-up) skin diagnoses
・Presentation of results (e.g., comparative results)
For example, data may be generated in accordance with a treatment plan schedule to monitor the progress of a skin treatment plan. Any of these activities may generate data that is stored remotely, for example, for review by user 410, for review by another individual, for aggregation with other users' data to measure the efficacy of the treatment plan, etc.

Comparative results (e.g., before-and-after results) may be presented via computing device 402, whether during and/or at the completion of a treatment plan, etc. As noted, aspects of the skin diagnostics may be performed on computing device 402 or by a remotely coupled device (e.g., a server in the cloud or another arrangement).

FIG. 5 is a block diagram of computing device 402 in accordance with one or more aspects of the present invention. Computing device 402 comprises one or more processors 502, one or more input devices 504, a gesture-based I/O device 506, one or more communication units 508, and one or more output devices 510. Computing device 402 also includes one or more storage devices 512 storing one or more modules and/or data. The modules may include a deep neural network model 514, an application 516 having components for a graphical user interface (GUI 518) and/or a workflow for treatment monitoring (e.g., treatment monitor 520), image acquisition 522 (e.g., an interface), and a treatment/product selector 530 (e.g., an interface). The data may include one or more images for processing (e.g., image 524), skin diagnosis data (e.g., respective scores, ethnicity, or other user data), and treatment data such as log data relating to specific treatments and a treatment plan with a schedule, such as for reminders.

Application 516 provides functionality to acquire one or more images, such as video, and to process the images to determine the skin diagnosis of the deep neural network provided by neural network model 514. The network model may be configured as the model shown in FIGS. 2 and 3. In another embodiment, the network model is located remotely, and computing device 402, via application 516, can communicate the images for processing and return of skin diagnosis data. Application 516 may be configured to perform the activities described above.

Storage device(s) 512 may store additional modules such as a processing system 532 and other modules (not shown), including a communication module, an image processing module (e.g., for a GPU of processors 502), a map module, a contacts module, a calendar module, a photos/gallery module, a photo (image/media) editor, a media player and/or streaming module, social media applications, a browser module, etc. Storage devices may be referred to herein as storage units.

Communication channels 538 may couple each of the components 502, 504, 506, 508, 510, and 512, and any modules 514, 516, and 532, for inter-component communication, whether communicatively, physically, and/or operatively. In some examples, communication channels 538 may include a system bus, a network connection, an inter-process communication data structure, or any other method for communicating data.

One or more processors 502 may implement functionality and/or execute instructions within computing device 402. For example, processors 502 may be configured to receive instructions and/or data from storage devices 512 to execute the functionality of the modules shown in FIG. 5 (e.g., the processing system, applications, etc.). Computing device 402 may store data/information to storage devices 512. Some of the functionality is described further herein below. It is understood that processing may not fall exactly within modules 514, 516, and 532 of FIG. 5, and that one module may assist with the functionality of another.

Computer program code for carrying out the processes may be written in any combination of one or more programming languages, e.g., an object-oriented programming language such as Java, Smalltalk, or C++, or a conventional procedural programming language such as the "C" programming language or similar programming languages.

Computing device 402 may generate output for display on a screen of gesture-based I/O device 506 or, in some examples, for display by a projector, monitor, or other display device. It will be understood that gesture-based I/O device 506 may be configured using a variety of technologies (e.g., a resistive touchscreen, a surface acoustic wave touchscreen, a capacitive touchscreen, a projective capacitance touchscreen, a pressure-sensitive screen, an acoustic pulse recognition touchscreen, or another touch-sensitive screen technology; and, for output capability, a liquid crystal display (LCD), a light emitting diode (LED) display, an organic light-emitting diode (OLED) display, a dot matrix display, e-ink, or a similar monochrome or color display).

In the examples described herein, gesture-based I/O device 506 includes a touchscreen device capable of receiving, as input, tactile interactions or gestures from a user interacting with the touchscreen. Such gestures may include tap gestures, dragging or swiping gestures, flicking gestures, and pausing gestures (e.g., where the user touches the same location of the screen for at least a threshold period of time), in which the user touches or points to one or more locations of gesture-based I/O device 506. Gesture-based I/O device 506 may also accept non-tap gestures. Gesture-based I/O device 506 may output or display information, such as a graphical user interface, to the user. Gesture-based I/O device 506 may present various applications, functions, and capabilities of computing device 402, including, for example, application 516 for acquiring images, viewing images, processing images, and displaying new images, as well as messaging applications, telephone communications, contact and calendar applications, web browsing applications, game applications, e-book applications, and financial, payment, and other applications or functions.

Although the present invention is described primarily with a gesture-based I/O device 506 in the form of a display screen device with I/O capabilities (e.g., a touchscreen), other gesture-based I/O devices that can detect movement and that do not comprise a screen per se may be used. In such a case, computing device 402 includes a display screen or is coupled to a display apparatus to present the new images and GUIs of application 516. Computing device 402 may receive gesture-based input from a track pad/touch pad, one or more cameras, or another presence- or gesture-sensitive input device, where presence means a presence aspect of the user, including, for example, motion of all or part of the user.

One or more communication units 508 may communicate with external devices (e.g., server 406, server 408, second computing device 412), such as via communications network 404, for the purposes described and/or for other purposes (e.g., printing), by transmitting and/or receiving network signals on one or more networks. The communication units may include various antennas and/or network interface cards, chips (e.g., Global Positioning Satellite (GPS)), etc. for wireless and/or wired communications.

Input devices 504 and output devices 510 may include any of one or more buttons, switches, pointing devices, cameras, a keyboard, a microphone, one or more sensors (e.g., biometric, etc.), a speaker, a bell, one or more lights, a haptic (vibrating) device, etc. One or more of these may be coupled via a universal serial bus (USB) or another communication channel (e.g., 538). A camera (an input device 504) may be front-facing (i.e., on the same side as the screen) so that the user can capture image(s) with the camera while looking at gesture-based I/O device 506 to take a "selfie".

The one or more storage devices 512 may take different forms and/or configurations, for example, as short-term memory or long-term memory. Storage devices 512 may be configured for short-term storage of information as volatile memory, which does not retain stored contents when power is removed. Examples of volatile memory include random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), etc. Storage devices 512, in some examples, also include one or more computer-readable storage media, for example, to store larger amounts of information than volatile memory and/or to store such information for the long term, retaining it when power is removed. Examples of non-volatile memory include magnetic hard disks, optical discs, floppy disks, flash memory, and forms of electrically programmable memory (EPROM) or electrically erasable and programmable memory (EEPROM).

Although not shown, a computing device may be configured as a training environment to train neural network model 514, for example, using a network such as that shown in FIG. 3 together with appropriate training and/or test data.

The deep neural network may be adapted to a light architecture for a computing device that is a mobile device (e.g., a smartphone or tablet) having fewer processing resources than a "larger" device such as a laptop, desktop, workstation, server, or other computing device of a comparable generation.

In one aspect, the deep neural network model may be configured as a depthwise separable convolutional neural network, comprising convolutions in which each standard convolution is factorized into a depthwise convolution and a pointwise convolution. The depthwise convolution is limited to applying a single filter to each input channel, and the pointwise convolution is limited to combining the outputs of the depthwise convolution.
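The factorization described above can be made concrete with a small sketch on nested lists. It is illustrative only: it assumes stride 1, no padding, and no bias terms, and the function names are hypothetical. The last function compares weight counts for a standard K x K convolution against its depthwise separable counterpart, which is where the savings for a light architecture come from.

```python
def depthwise_conv(x, dw_kernels):
    """Apply one K x K filter to each input channel (no padding, stride 1).

    x is [C][H][W]; dw_kernels is [C][K][K]; the result is [C][H-K+1][W-K+1].
    """
    out = []
    for chan, kern in zip(x, dw_kernels):
        k = len(kern)
        h, w = len(chan), len(chan[0])
        out.append([[sum(chan[i + a][j + b] * kern[a][b]
                         for a in range(k) for b in range(k))
                     for j in range(w - k + 1)]
                    for i in range(h - k + 1)])
    return out


def pointwise_conv(x, pw_weights):
    """Combine channels with a 1 x 1 convolution.

    x is [C_in][H][W]; pw_weights is [C_out][C_in]; the result is [C_out][H][W].
    """
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wt * x[c][i][j] for c, wt in enumerate(row))
              for j in range(w)] for i in range(h)]
            for row in pw_weights]


def param_counts(k, c_in, c_out):
    """Weights in a standard conv vs. a depthwise separable conv (no bias)."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable
```

For a 3 x 3 kernel with 32 input and 64 output channels, `param_counts(3, 32, 64)` gives 18432 weights for the standard convolution and 2336 for the separable one.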

It is understood that second computing device 412 may be configured similarly to computing device 402. Second computing device 412 may have GUIs such as to request and display images and skin sign diagnoses, for various users, from data stored at server 406.

FIGS. 6 and 7 are flowcharts of respective processes 600, 610, 620, and 630, for example, of computing device 402 (or 410). Process 600 relates to a user of computing device 402 using an application, such as application 516, to take a selfie comprising an image of the user's face and perform a skin diagnosis for each of a plurality (N) of skin signs. At 601, the image is received at the processor, such as via a camera or in another manner (e.g., from a message attachment).

At 602, the image is preprocessed to define a normalized image to present to the CNN. The image may be centered and cropped to a predetermined size (resolution) so that the CNN is presented with images of similar size, in keeping with its training. At 603, the normalized image is processed using the CNN (neural network model 514) to generate the N skin sign diagnoses. The ethnicity vector is also generated. At 604, the N skin sign diagnoses and the ethnicity vector (or a single value thereof) are presented, such as via a GUI, which may also present the image and/or the normalized image.

Presenting the image may comprise segmenting the image (or normalized image) for each (or at least one) of the N skin signs, indicating which region(s) of the face relate to which skin sign. An extract may be made from the image, for example using a bounding box and/or mask, to isolate a region for which a skin sign diagnosis was prepared for presentation in the GUI. The CNN may be configured to output segmentation-related data comprising the bounding box and/or mask for each (or at least one) particular region. The image may be annotated, such as via augmented reality or virtual reality techniques, to highlight a region. For example, the relevant pixels of a region in the image may be highlighted. A GUI showing the image (or normalized image) may be provided. Input may be received, such as from a pointing device or gesture, to indicate or select one or more pixels of a region for which a skin sign diagnosis was generated by the CNN. Pixels outside the indicated region may be blurred, using the mask and/or bounding box for the region, to emphasize the pixels of the selected region. Rather than blurring, pixels outside the region, such as those within a border (e.g., 1 to X pixels wide), may be colored with a highlighting color to encircle the region, creating a halo effect. Pixels immediately adjacent the region may be darker (deeper in color) and pixels further from the region (within the border) may be lighter in color. Different signs may have different colored borders.

The skin sign diagnosis for the region may be displayed. Color may be used to indicate a severity that is proportional to the skin sign diagnosis, such as by using a scaling factor. A single color may be used for a particular skin sign diagnosis, with its depth of color (e.g., light to dark) adjusted in proportion to the scale of the skin sign diagnosis. In another example, a different color may be used for each level in the scale of the skin sign diagnosis. Whether the GUI shows a single color varied by depth or uses different colors, a color legend showing the relationship to the scale may be provided. A user toggle control may be provided to turn on and off the augmented reality or virtual reality applied to the image, e.g., to turn the highlighting on and off. Example analysis images (or extracts of particular affected regions) showing representative images of other persons exhibiting each of the skin sign diagnoses (e.g., one for each severity and for each skin sign) may be presented as a comparator, and such examples may be shown in a manner that respects the privacy of others. As further described below, product and/or treatment recommendations may be presented. As also further described below, before and after treatment images may be presented (e.g., where the after image is a subsequent image taken following one or more treatments and may have a subsequent skin sign diagnosis prepared as a comparator).

Although gestures input via the image have been described as selecting or indicating a region, a GUI may be provided that selects a region automatically, such as by receiving input for a particular skin sign. For example, a GUI may present each skin sign and/or skin sign diagnosis in a table or other form of output. Selecting a particular item from the table or other form may invoke the GUI to present the image (or normalized image) with the region(s) associated with that skin sign diagnosis highlighted. It is understood that a voice-activated GUI may be used in any of the examples herein, in addition to or in place of a gesture-activated GUI (and/or other input-activated GUI, e.g., text command).
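The halo effect described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: it assumes a binary mask (1 inside the region) on a small grid, measures each outside pixel's Chebyshev distance to the region by brute force, and assigns a highlight strength that is strongest next to the region and fades linearly across a border `border` pixels wide. The function name `halo_strength` and the linear fade are hypothetical choices.

```python
def halo_strength(mask, border):
    """Return a grid of highlight strengths in [0, 1] forming a halo around a region.

    mask   -- list of rows; 1 marks a pixel inside the region, 0 a pixel outside
    border -- halo width in pixels (the "1 to X pixels" border)
    """
    rows, cols = len(mask), len(mask[0])
    region = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] or not region:
                continue  # pixels of the region itself are left untouched
            # Chebyshev distance to the nearest region pixel (brute force).
            d = min(max(abs(r - rr), abs(c - cc)) for rr, cc in region)
            if d <= border:
                # d == 1 gives strength 1.0 (deepest color); it fades with distance.
                out[r][c] = (border - d + 1) / border
    return out


mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
halo = halo_strength(mask, border=2)
```

A GUI could multiply the strength by a per-sign highlight color, so that different signs receive differently colored borders.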

FIG. 6(b) shows process 610. At 611, a GUI is presented (note that a GUI may be presented for any of processes 600, 610, 620, and 630) to initiate a product and/or treatment recommendation. Input may be received to invoke the operation. At 612, a recommendation is obtained; this may comprise communicating skin diagnostic information (e.g., scores, ethnicity vector, image, user information, etc.) to a remote server, such as server 408, to receive the recommendation. The recommendation may include one or more products and a treatment plan having a method of application to an area of skin and a schedule. At 613, the recommendation is presented, such as via the GUI. More than one recommendation may be received and presented. At 614, a selection is made indicating acceptance of a recommendation. The selection may be stored (logged) and may, for example, initiate a treatment monitoring feature of computing device 402. At 615, a product purchase may be facilitated, such as via server 408 or another server.

FIG. 7(a) shows process 620, such as for monitoring. Monitoring may be responsive to a treatment plan (e.g., described in data) received by computing device 402 or accessible to it, such as via a browser. A treatment plan may have a schedule (e.g., morning and evening applications of a product, or a once-a-week application of a second product). Reminders of the schedule may be provided (e.g., at 621), such as via notifications from a native application or via another means such as a calendar application. At 622, a GUI is provided to facilitate a treatment activity, for example, to record its occurrence and/or to provide guidance for performing the activity. At 623, input is received, such as a confirmation that the activity was performed. An image may be included to record the activity. The data may be logged. Monitoring may measure how closely the treatment plan is followed. At 624, a product repurchase may be facilitated; for example, in response to treatment monitoring, it may be determined that the quantity of product on hand may be running low.
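One way to measure how closely a treatment plan is followed is sketched below. It is an illustration only: the text does not specify a measure, so the ratio of confirmed to scheduled applications, the function name `adherence`, and the slot labels are all assumptions.

```python
from datetime import date, timedelta


def adherence(start, days, per_day, log):
    """Return the fraction of scheduled applications that were confirmed.

    start   -- first day of the treatment plan
    days    -- number of days in the monitoring window
    per_day -- scheduled slots for each day, e.g. ("morning", "evening")
    log     -- set of (date, slot) pairs confirmed by the user (as at 623)
    """
    scheduled = {(start + timedelta(days=d), slot)
                 for d in range(days) for slot in per_day}
    done = scheduled & set(log)  # ignore confirmations outside the schedule
    return len(done) / len(scheduled) if scheduled else 0.0


start = date(2020, 1, 1)
log = {
    (start, "morning"),
    (start, "evening"),
    (start + timedelta(days=1), "morning"),
}
rate = adherence(start, days=2, per_day=("morning", "evening"), log=log)
```

Here three of four scheduled applications were confirmed, so `rate` is 0.75.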

FIG. 7(b) shows process 630, for example for performing a comparison, which may be performed as a monitoring activity. At 631, a GUI for the comparison is provided to instruct the user, etc. At 632, a new image (e.g., to be compared with the initial image received at 601) is (optionally) stored. At 633, a subsequent skin sign diagnosis is performed using the CNN on the new image (e.g., normalized, etc., as in process 600). At 634, the GUI presents a comparison using the initial and subsequent skin sign diagnoses, optionally together with the initial image and the new image.
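The comparison at 634 can be sketched as a per-sign difference between the two sets of scores. This is a hedged example: the sign names and score values are hypothetical, and whether a lower score means improvement depends on the scale of each sign.

```python
def compare_diagnoses(initial, subsequent):
    """Return the change (subsequent minus initial) for each sign present in both."""
    return {sign: subsequent[sign] - initial[sign]
            for sign in initial if sign in subsequent}


# Hypothetical scores for two skin signs, before and after treatment.
initial = {"forehead wrinkles": 4.0, "cheek pores": 2.5}
subsequent = {"forehead wrinkles": 3.0, "cheek pores": 2.5}
changes = compare_diagnoses(initial, subsequent)
```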

Although not shown in FIGS. 6 and 7, any data received or generated may be communicated for remote storage, such as to server 406.

The skin sign diagnoses, the subsequent skin sign diagnoses (optionally with other monitoring), and the provision of data for aggregation may enable studies of product and treatment efficacy and/or of fraudulent claims made for products and treatments. The data may be gathered, analyzed, and presented to dermatologists and/or other professionals and/or users. Thus the systems and methods herein may facilitate a distributed study model for skin treatment.

The teachings herein include the ability to link local to global (e.g., specific conditions in a region of the face while processing the entire face) and the ability to map the face exhaustively, targeting all the key areas (for example, wrinkles present in each tier of the face from forehead to mouth).

A combination of local skin signs may be used to predict (classify) global appearance (e.g., apparent age, radiance, tiredness, etc.). Appearance may also be determined and compared by performing skin analysis in the presence of make-up. The skin diagnostics herein are sufficiently exhaustive as to the nature and location of facial signs to explain the perception formed when other people look at the subject. The skin sign diagnoses can be used to draw further conclusions regarding apparent age, such as based on more than 95% of perception from others. In the presence of make-up, the skin diagnostics and further predictions/classifications regarding global appearance or attractiveness may be used to measure the effect of products such as foundation in masking signs of skin aging and to determine how they restore the lines and structure of the face.

The skin diagnostic methods and techniques herein measure five analysis clusters of the face (wrinkles/texture, sagging, pigmentation disorders, vascular disorders, cheek pores), which facilitates data describing all the impacts of the aging process, environmental conditions (sun exposure, chronic urban pollution exposure, etc.), or lifestyle (stress, tiredness, quality of sleep, smoking, alcohol, etc.). By measuring these over time, in motion, or in comparison with the consumer's age, the method, computing device, etc. may be configured to provide information about acceleration of aging and distinct environmental impact (some signs affect certain clusters and not others), as well as:
• Recommendations in terms of cosmetics and/or therapeutic or preventive agents (for example, in the case of sun exposure, which kinds of filters, antioxidants, desquamating agents, etc. are needed in a given geography);
• Recommendations in terms of diet, lifestyle, sport/exercise, etc. that may positively affect the damage to, or the specificity of, facial signs. It is known, for example, that facial signs are affected by daily activities, and some strategies may be proposed on that basis.

The skin diagnostic methods and techniques thus described may be employed to dynamically follow consumers/patients in all dimensions in a highly accurate manner. Evaluation may be performed at different times and/or on different areas to assess day/seasonal/hormonal/rest impacts and treatment/cosmetic/health benefits. Such evaluation provides a more accurate diagnosis and enables better recommendation of solutions.

The skin diagnostic methods and techniques thus described may be employed to perform evaluation on images of a user in motion, such as from a selfie or other video. The method and computing device may be configured to evaluate each frame, or selected frames, of a video and to record the face score(s) while the face is in motion. A dynamic curve of wrinkles, sagging, etc. may be defined. The video may capture specific facial positions and transitions that induce stress in the face, to assist with analysis of specific signs.

Instructions may be provided to have a user perform specific gestures, poses, etc. that highlight features and apply stress to the face. In one example, the instructions (e.g., via a graphical or other user interface) may request the user to perform a specific gesture, such as pinching the cheek. Such evaluation provides a more accurate diagnosis and enables better recommendation of solutions.

Other stresses may be instructed, such as functional stress through body position, e.g., with the body upright or supine. Functional stresses are very important for young consumers, since they can record wrinkles that are not seen in a very neutral, standard ID picture. Small wrinkles at the corners of the eyes can be seen when a user smiles or has specific emotions.

Hence the skin diagnostic methods and techniques can be enabled to receive video of a face in motion and then assess a number of images from it. For example, a video has frames 1, 2, ..., N, and each frame can generate 20 scores for 20 signs. The system can instruct the user to perform a gesture (e.g., a face pinch) and record the results. Images before and after the pinch can be analyzed to draw conclusions about skin behavior before and after stress and, for example, water mobility (publication DermoTrace: Flament F, Bazin R. Influences of age, ethnic group, and skin sites on a provisory skin marking, experimentally induced, in vivo. Skin Res Technol 24, 180-186 (2018)). Only two frames need be used.
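The per-frame evaluation described above can be sketched as follows. This is an illustration under assumptions: `fake_diagnose` is a hypothetical stand-in for the CNN (each frame here already carries its scores), and the sign name and values are invented for the example.

```python
def score_frames(frames, diagnose, step=1):
    """Diagnose every `step`-th frame; return a list of (frame_index, scores)."""
    return [(i, diagnose(frame)) for i, frame in enumerate(frames) if i % step == 0]


def dynamic_curve(scored, sign):
    """Extract one sign's score over time as (frame_index, score) pairs."""
    return [(i, scores[sign]) for i, scores in scored]


def stress_response(scored, sign, before, after):
    """Change in one sign's score between a frame before and a frame after a gesture."""
    by_index = dict(scored)
    return by_index[after][sign] - by_index[before][sign]


def fake_diagnose(frame):
    # Hypothetical stand-in for the CNN: the frame is already a dict of scores.
    return frame


frames = [{"cheek fold": 1.0}, {"cheek fold": 2.5}, {"cheek fold": 1.5}]
scored = score_frames(frames, fake_diagnose)
curve = dynamic_curve(scored, "cheek fold")
delta = stress_response(scored, "cheek fold", before=0, after=1)
```

The before and after comparison uses just two frames, while `dynamic_curve` uses as many frames as were scored.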

The skin diagnostic methods and techniques thus described may be employed to further enhance the performance of emotion analysis through the development of analytical features. A global evaluation of the face may enable the assessment (e.g., classification) of emotions by linking combinations of skin signs to specific visual signs such as joy, fear, and disgust.

The skin diagnostic methods and techniques thus described may be employed to further the delivery of health care, using the classification of signs as indicative of specific emotions, such as in persons who are unable to communicate orally or in the same language. Patients in pain can exhibit associated emotions, which can be analyzed and the results used, for example, to administer medicine. The combination of signs and their severity (in the glabella, for example) could be an important clue in the health field, especially in hospitals, for patients in pain who have difficulty communicating. By accurately reading their faces, one could administer medicine and design specific treatments.

The skin diagnostic methods and techniques thus described may be employed to further enhance the performance of characterization (e.g., classification) of environmental or lifestyle impact. Extrinsic and intrinsic aging can be defined by comparison with a database: based on our knowledge database, the impact of environmental conditions (UV, pollution, etc.) or lifestyle (stress, diet, alcohol, smoking, sports, etc.) is known both in terms of quantification (percentage of severity) and qualification (nature and location of the facial signs). The skin diagnostic evaluation described herein may be enhanced with information from the database to give consumers more accurate and personalized feedback on key cosmetic topics such as urban aging.

The skin diagnostic methods and techniques thus described may be employed to enhance the performance of other medical diagnostics for other conditions. Combinations of skin signs may be linked to specific conditions based on research that correlates certain facial signs with certain conditions or diseases. For example, forehead wrinkles have been linked with cardiac disease.

It will be understood that the skin diagnostic methods and techniques thus described, including product and/or application recommendations, may be performed in relation to skin signs that occur naturally and normally and are not typically associated with a disease (e.g., non-disease skin signs, such as those related to aging and/or environmental exposure, that are not indicative of a disease state). However, the onset and/or progression of such non-disease skin signs may be responsive to respective pharmaceutical products and respective plans of application (broadly, a treatment, though not a medical treatment per se). Thus, provided herein are skin diagnostic devices and methods for non-disease skin signs. There is provided a device and method for recommending a product for non-disease skin signs. The device comprises a storage unit and a processing unit coupled to the storage unit. The storage unit stores and provides a convolutional neural network (CNN) that classifies pixels of an image to determine N respective skin sign diagnoses for each of a plurality (N) of respective non-disease skin signs, wherein the CNN is a deep neural network for image classification configured to generate the N respective non-disease skin sign diagnoses, and wherein the CNN is trained using non-disease skin sign data for each of the N respective non-disease skin signs. The processing unit receives the image and processes the image using the CNN to generate the N respective non-disease skin sign diagnoses. The processing unit may further be configured to generate a product recommendation for at least one of the N respective non-disease skin sign diagnoses, such as by using a product recommendation component (e.g., a rules-based system or other system that selects one or more products and, optionally, a respective plan of application for each product associated with a respective non-disease skin sign). The product recommendation component, and thus the product recommendation, may be responsive to other factors such as gender and ethnicity. Associated training methods and systems for training a CNN, or for defining a system having a CNN, to generate the N respective skin sign diagnoses will also be apparent.
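A rules-based product recommendation component of the kind mentioned above might look like the following sketch. Everything specific in it is hypothetical: the sign names, thresholds, product names, and plans of application are placeholders, and a threshold rule is only one possible kind of rule.

```python
# Hypothetical rules: (skin sign, minimum score that triggers the rule,
# product, plan of application).
RULES = [
    ("pigmentation", 3.0, "product A", "apply every morning"),
    ("forehead wrinkles", 2.0, "product B", "apply every evening"),
    ("forehead wrinkles", 4.0, "product C", "apply once a week"),
]


def recommend(diagnoses, rules=RULES):
    """Return (sign, product, plan) for every rule whose threshold the score meets."""
    picks = []
    for sign, threshold, product, plan in rules:
        score = diagnoses.get(sign)
        if score is not None and score >= threshold:
            picks.append((sign, product, plan))
    return picks


picks = recommend({"pigmentation": 1.0, "forehead wrinkles": 4.5})
```

Additional factors such as gender or ethnicity could be added as further conditions on each rule.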

In addition to computing device aspects, a person of ordinary skill will understand that computer program product aspects are disclosed, in which instructions are stored in a non-transitory storage device (e.g., a memory, CD-ROM, DVD-ROM, disc, etc.) to configure a computing device to perform any of the method aspects described herein.

Practical implementation may include any or all of the features described herein. These and other aspects, features, and various combinations may be expressed as methods, apparatus, systems, means for performing functions, program products, and in other ways, combining the features described herein. A number of embodiments have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the processes and techniques described herein. In addition, other steps may be provided, or steps may be eliminated, from the described methods, and other components may be added to, or removed from, the described systems. Accordingly, other embodiments are within the scope of the claims.

Throughout the description and claims of this specification, the words "comprise" and "contain" and variations of them mean "including but not limited to", and they are not intended to (and do not) exclude other components, integers, or steps. Throughout this specification, the singular encompasses the plural unless the context requires otherwise. That is, the specification is to be understood as contemplating the plural as well as the singular, unless the context requires otherwise.

本発明の特定の態様、実施形態または例に関連して記載される特徴、整数特性、化合物、化学部分または基は、それらと非互換でない限り、任意の他の態様、実施形態または例に適用可能であると理解されるべきである。本明細書に開示された特徴（添付の特許請求の範囲、要約書、及び、図面を含む）の全て、或いはそのように開示された任意の方法または処理のステップの全ては、そのような特徴或いはステップの少なくともいくつかが相互に排他的である組み合わせを除いて、任意の組合せで組み合わせることができる。本発明は、前述の例または実施形態の詳細に限定されない。本発明は、本明細書（添付の特許請求の範囲、要約書、及び、図面を含む）に開示された特徴の任意の新規なもの、又は任意の新規な組み合わせ、又は開示された任意の手法または処理のステップの任意の新規なもの、又は任意の新規な組み合わせに拡張される。
＜その他＞
＜手段＞
技術的思想１の皮膚の診断装置は、記憶部と、その記憶部に結合される処理部とを備える皮膚の診断装置であって、前記記憶部は、複数であるＮ個の各皮膚の徴候について、Ｎ個の各皮膚の徴候診断を判定するために画像のピクセルを分類する畳み込みニューラルネットワークであるＣＮＮを記憶して提供し、前記ＣＮＮは、前記Ｎ個の各皮膚の徴候診断を生成するように構成された画像分類のための深層ニューラルネットワークであり、前記ＣＮＮは、前記Ｎ個の各皮膚の徴候のそれぞれに関する皮膚の徴候データを使用して学習され、前記処理部は、前記画像を受信し、前記ＣＮＮを用いて前記画像を処理し、前記Ｎ個の各皮膚の徴候診断を生成する。
技術的思想２の皮膚の診断装置は、技術的思想１記載の皮膚の診断装置において、前記ＣＮＮは、画像分類のための学習済ネットワークから定義され、最終のエンコーダ段階の特徴ネットに特徴を符号化するように構成されたエンコーダ段階と、前記Ｎ個の各皮膚の徴候診断を生成するために、複数であるＮ個の並列の皮膚徴候の分岐によって復号化するための前記最終のエンコーダ段階の特徴ネットを受信するように構成されたデコーダ段階と、を備える。
技術的思想３の皮膚の診断装置は、技術的思想２記載の皮膚の診断装置において、前記デコーダ段階は、前記最終のエンコーダ段階の特徴ネットを処理して前記Ｎ個の並列の皮膚徴候の分岐の各々に提供するグローバルプーリング処理を備える。
技術的思想４の皮膚の診断装置は、技術的思想２又は３に記載の皮膚の診断装置において、前記ＣＮＮは、前記ピクセルを分類して民族性ベクトルを判定するように構成され、記ＣＮＮは、前記Ｎ個の各皮膚の徴候および複数の民族性に関する皮膚の徴候データを使用して学習される。
技術的思想５の皮膚の診断装置は、技術的思想４記載の皮膚の診断装置において、前記デコーダ段階は、前記民族性ベクトルを生成するための民族性に関する並列の分岐を備える。
技術的思想６の皮膚の診断装置は、技術的思想２から５のいずれかに記載の皮膚の診断装置において、前記Ｎ個の並列の皮膚徴候の分岐の各分岐は、第１の全結合層と、それに続く第１の活性化層と、第２の全結合層と、第２の活性化層と、最終の活性化層と、を連続して備え、前記Ｎ個の各皮膚の徴候診断および前記民族性ベクトルのうちの１つを含む最終値を出力する。
技術的思想７の皮膚の診断装置は、技術的思想６記載の皮膚の診断装置において、前記最終の活性化層は、前記第２の活性化層から受信した入力スコアｘに関する以下の数式１の関数に従って定義され、αは傾き、ａは下限、ｂは前記Ｎ個の各皮膚の徴候診断の各々のスコア範囲の上限である。

技術的思想８の皮膚の診断装置は、技術的思想４から７のいずれかに記載の皮膚の診断装置において、記ＣＮＮは、（ｘ _ｉ、ｙ _ｉ）形式の複数のサンプルを用いて学習され、ｘ _ｉは、ｉ番目の学習画像であり、ｙ _ｉは、グランドトゥルースの皮膚の徴候診断に対応するベクトルであり、前記ＣＮＮは、前記Ｎ個の並列の皮膚徴候の分岐、及び前記民族性に関する並列の分岐の各分岐に対する損失関数を最小化するように学習される。
技術的思想９の皮膚の診断装置は、技術的思想８記載の皮膚の診断装置において、前記ＣＮＮは、前記Ｎ個の並列の皮膚徴候の分岐のそれぞれについての損失関数Ｌ２に、前記民族性に関する並列の分岐についての標準交差エントロピー分類損失Ｌ _{ｅｔｈｎｉｃｉｔｙ} を重み付けして組み合わせた損失関数Ｌを、以下の数式３に従って最小化するように学習され、λは、スコア回帰と民族性の分類損失の間のバランスを制御する。

技術的思想１０の皮膚の診断装置は、技術的思想１から９のいずれかに記載の皮膚の診断装置において、前記記憶部は、前記画像を前処理するための顔およびランドマークの検出器を記憶し、前記処理部は、前記顔およびランドマークの検出器を用いて前記画像から正規化された画像を生成し、前記ＣＮＮを使用する際に前記正規化された画像を使用するように構成される。
技術的思想１１の皮膚の診断装置は、技術的思想１から１０のいずれかに記載の皮膚の診断装置において、前記ＣＮＮは、前記Ｎ個の各皮膚の徴候診断を生成するように適合された画像分類のための学習済ネットワークを備え、前記学習済ネットワークの全結合層が省略され、Ｎ個の各層のグループは、前記Ｎ個の各皮膚の徴候診断のそれぞれについて同じ特徴ネットを並列に復号するように定義される。
技術的思想１２の皮膚の診断装置は、技術的思想１から１１のいずれかに記載の皮膚の診断装置において、モバイル端末からなる個人用のコンピュータ装置と、通信ネットワークを介して皮膚の診断サービスを提供するサーバと、のいずれかから構成される。
技術的思想１３の皮膚の診断装置は、技術的思想１から１２のいずれかに記載の皮膚の診断装置において、前記記憶部は、前記処理部によって実行されると、前記Ｎ個の各皮膚の徴候診断の少なくともいくつかに対応して、製品および治療計画のうちの少なくとも１つに関する推奨事項を取得するための治療製品セレクタを提供するコードを記憶する。
技術的思想１４の皮膚の診断装置は、技術的思想１から１３のいずれかに記載の皮膚の診断装置において、前記記憶部は、前記処理部によって実行されると、前記画像を受信するための画像取得機能を提供するコードを記憶する。
技術的思想１５の皮膚の診断装置は、技術的思想１から１４のいずれかに記載の皮膚の診断装置において、前記記憶部は、前記処理部によって実行されると、少なくとも１つの皮膚の徴候に対する治療をモニタするための治療モニタを提供するコードを記憶する。
技術的思想１６の皮膚の診断装置は、技術的思想１５記載の皮膚の診断装置において、前記処理部は、各治療セッションに対する製品の適用に関連する治療活動を、思い出させる、指示する、及び／又は記録する、のうちの少なくとも１つを行うように構成される。
技術的思想１７の皮膚の診断装置は、技術的思想１から１６のいずれかに記載の皮膚の診断装置において、前記処理部は、前記ＣＮＮを用いて第２の画像を処理し、治療セッション後に受信した後続の皮膚の診断を生成するように構成される。
技術的思想１８の皮膚の診断装置は、技術的思想１７記載の皮膚の診断装置において、前記記憶部は、前記処理部によって実行されると、前記後続の皮膚の診断を用いた比較結果の提示を行うコードを記憶する。
技術的思想１９のコンピュータ実装方法は、皮膚診断のコンピュータ実装方法であって、画像のピクセルを分類して、複数であるＮ個の各皮膚の徴候の各々についてＮ個の各皮膚の徴候診断を判定するように構成された畳み込みニューラルネットワークであるＣＮＮを記憶する記憶部を提供し、前記ＣＮＮは、前記Ｎ個の各皮膚の徴候診断を生成するように構成された画像分類のための深層ニューラルネットワークであり、前記ＣＮＮは、前記Ｎ個の各皮膚の徴候についての皮膚の徴候データを使用して学習され、前記記憶部に結合された処理部によって、前記画像を受信することと、前記ＣＮＮを用いて前記画像を処理して前記Ｎ個の各皮膚の徴候診断を生成することと、を実行する。
技術的思想２０のコンピュータ実装方法は、技術的思想１９記載のコンピュータ実装方法において、前記ＣＮＮは、画像分類のための学習済ネットワークから定義され、最終のエンコーダ段階の特徴ネットに特徴を符号化するように構成されたエンコーダ段階と、前記Ｎ個の各皮膚の徴候診断を生成するために、複数であるＮ個の並列の皮膚徴候の分岐によって復号化するための前記最終のエンコーダ段階の特徴ネットを受信するように構成されたデコーダ段階と、を備える。
技術的思想２１のコンピュータ実装方法は、技術的思想２０に記載のコンピュータ実装方法において、前記デコーダ段階は、前記最終のエンコーダ段階の特徴ネットを処理して前記Ｎ個の並列の皮膚徴候の分岐の各々に提供するグローバルプーリング処理を備える。
技術的思想２２のコンピュータ実装方法は、技術的思想２０又は２１に記載のコンピュータ実装方法において、前記ＣＮＮは、前記ピクセルを分類して民族性ベクトルを判定するように構成され、前記ＣＮＮは、前記Ｎ個の各皮膚の徴候および複数の民族性に関する皮膚の徴候データを使用して学習され、前記ＣＮＮによる前記画像の処理は、前記民族性ベクトルを生成する。
技術的思想２３のコンピュータ実装方法は、技術的思想２２記載のコンピュータ実装方法において、前記デコーダ段階は、前記民族性ベクトルを生成するための民族性に関する並列の分岐を備える。
技術的思想２４のコンピュータ実装方法は、技術的思想２０から２３のいずれかに記載のコンピュータ実装方法において、前記Ｎ個の並列の皮膚徴候の分岐の各分岐は、第１の全結合層と、それに続く第１の活性化層と、第２の全結合層と、第２の活性化層と、最終の活性化層と、を連続して備え、前記Ｎ個の各皮膚の徴候診断および前記民族性ベクトルのうちの１つを含む最終値を出力する。
技術的思想２５のコンピュータ実装方法は、技術的思想２４記載のコンピュータ実装方法において、前記最終の活性化層は、前記第２の活性化層から受信した入力スコアｘに関する以下の数式１の関数に従って定義され、αは傾き、ａは下限、ｂは前記Ｎ個の各皮膚の徴候診断の各々のスコア範囲の上限である。

技術的思想２６のコンピュータ実装方法は、技術的思想２２から２５のいずれかに記載のコンピュータ実装方法において、前記ＣＮＮは、（ｘ _ｉ、ｙ _ｉ）形式の複数のサンプルを用いて学習され、ｘ _ｉは、ｉ番目の学習画像であり、ｙ _ｉは、グランドトゥルースの皮膚の徴候診断に対応するベクトルであり、前記ＣＮＮは、前記Ｎ個の並列の皮膚徴候の分岐、及び前記民族性に関する並列の分岐の各分岐に対する損失関数を最小化するように学習される。
技術的思想２７のコンピュータ実装方法は、技術的思想２６記載のコンピュータ実装方法において、前記ＣＮＮは、前記Ｎ個の並列の皮膚徴候の分岐のそれぞれについての損失関数Ｌ２に、前記民族性に関する並列の分岐についての標準交差エントロピー分類損失Ｌ _{ｅｔｈｎｉｃｉｔｙ} を重み付けして組み合わせた損失関数Ｌを、以下の数式３に従って最小化するように学習され、λは、スコア回帰と民族性の分類損失の間のバランスを制御する。

技術的思想２８のコンピュータ実装方法は、技術的思想１９から２７のいずれかに記載のコンピュータ実装方法において、前記記憶部は、前記画像を前処理するための顔およびランドマークの検出器を記憶し、前記方法は、前記処理部が前記顔およびランドマークの検出器を用いて記画像を前処理して前記画像から正規化された画像を生成し、前記ＣＮＮを使用する際に前記正規化された画像を使用する。
技術的思想２９のコンピュータ実装方法は、技術的思想１９から２８のいずれかに記載のコンピュータ実装方法において、前記ＣＮＮは、前記Ｎ個の各皮膚の徴候診断を生成するように適合された画像分類のための学習済ネットワークを備え、前記学習済ネットワークの全結合層が省略され、Ｎ個の各層のグループは、前記Ｎ個の各皮膚の徴候診断のそれぞれについて同じ特徴ネットを並列に復号するように定義される
技術的思想３０のコンピュータ実装方法は、技術的思想１９から２９のいずれかに記載のコンピュータ実装方法において、前記記憶部および前記処理部は、モバイル端末からなる個人用のコンピュータ装置と、通信ネットワークを介して皮膚の診断サービスを提供するサーバと、のいずれかの構成要素である。
技術的思想３１のコンピュータ実装方法は、技術的思想１９から３０のいずれかに記載のコンピュータ実装方法において、前記記憶部は、前記処理部によって実行されると、前記Ｎ個の各皮膚の徴候診断の少なくともいくつかに対応して、製品および治療計画のうちの少なくとも１つに関する推奨事項を取得するための治療製品セレクタを提供するコードを記憶し、前記方法は、前記処理部によって前記治療製品セレクタの前記コードを実行して、製品および治療計画のうちの少なくとも１つに関する推奨事項を取得する
技術的思想３２のコンピュータ実装方法は、技術的思想１９から３１のいずれかに記載のコンピュータ実装方法において、前記記憶部は、前記処理部によって実行されると、前記画像を受信するための画像取得機能を提供するコードを記憶し、前記方法は、前記画像を受信するために、前記処理部によって前記画像取得機能の前記コードを実行する。
技術的思想３３のコンピュータ実装方法は、技術的思想１９から３２のいずれかに記載のコンピュータ実装方法において、前記記憶部は、前記処理部によって実行されると、少なくとも１つの皮膚の徴候に対する治療をモニタするための治療モニタを提供するコードを記憶し、前記方法は、少なくとも１つの皮膚の徴候に対する治療をモニタするために、前記処理部よって前記治療モニタのコードを実行する。
技術的思想３４のコンピュータ実装方法は、技術的思想３３記載のコンピュータ実装方法において、前記方法は、前記処理部によって、各治療セッションに対する製品の適用に関連する治療活動を、思い出させる、指示する、及び／又は記録する、のうちの少なくとも１つを行う。
技術的思想３５のコンピュータ実装方法は、技術的思想１９から３４のいずれかに記載のコンピュータ実装方法において、前記処理部によって、前記ＣＮＮを用いて第２の画像を処理し、治療セッション後に受信した後続の皮膚の診断を生成する。
技術的思想３６のコンピュータ実装方法は、技術的思想３５記載のコンピュータ実装方法において、処理部によって、前記後続の皮膚の診断を用いて比較結果の提示を提供する。
技術的思想３７の方法は、画像のピクセルを分類して、複数であるＮ個の各皮膚の徴候の各々についてＮ個の各皮膚の徴候診断を判定するように構成された畳み込みニューラルネットワークであるＣＮＮを学習させ、前記ＣＮＮは、前記Ｎ個の各皮膚の徴候診断を生成するように構成された画像分類のための深層ニューラルネットワークであり、前記学習は、前記Ｎ個の各皮膚の徴候についての皮膚の徴候データを用いて実行される、方法。
技術的思想３８の方法は、技術的思想３７記載の方法において、前記ＣＮＮは、画像分類のための学習済ネットワークから定義され、最終のエンコーダ段階の特徴ネットに特徴を符号化するように構成されたエンコーダ段階と、前記Ｎ個の各皮膚の徴候診断を生成するために、複数であるＮ個の並列の皮膚徴候の分岐によって復号化するための前記最終のエンコーダ段階の特徴ネットを受信するように構成されたデコーダ段階と、を備える。
技術的思想３９の方法は、技術的思想３８記載の方法において、前記デコーダ段階は、前記最終のエンコーダ段階の特徴ネットを処理して前記Ｎ個の並列の皮膚徴候の分岐の各々に提供するグローバルプーリング処理を備える。
技術的思想４０の方法は、技術的思想３８又は３９に記載の方法において、前記ＣＮＮは、前記ピクセルを分類して民族性ベクトルを判定するように構成され、前記方法は、前記Ｎ個の各皮膚の徴候および複数の民族性に関する皮膚の徴候データを使用して前記ＣＮＮを学習させる。
技術的思想４１の方法は、技術的思想４０記載の方法において、前記デコーダ段階は、前記民族性ベクトルを生成するための民族性に関する並列の分岐を備える。
技術的思想４２の方法は、技術的思想３８から４１のいずれかに記載の方法において、前記Ｎ個の並列の皮膚徴候の分岐の各分岐は、第１の全結合層と、それに続く第１の活性化層と、第２の全結合層と、第２の活性化層と、最終の活性化層と、を連続して備え、前記Ｎ個の各皮膚の徴候診断および前記民族性ベクトルのうちの１つを含む最終値を出力する。
技術的思想４３の方法は、技術的思想４２記載の方法において、前記最終の活性化層は、前記第２の活性化層から受信した入力スコアｘに関する以下の数式１の関数に従って定義され、αは傾き、ａは下限、ｂは前記Ｎ個の各皮膚の徴候診断の各々のスコア範囲の上限である。

技術的思想４４の方法は、技術的思想４０から４４のいずれかに記載の方法において、前記学習は、（ｘ _ｉ、ｙ _ｉ）形式の複数のサンプルを用いて前記ＣＮＮを学習させ、ｘ _ｉは、ｉ番目の学習画像であり、ｙ _ｉは、グランドトゥルースの皮膚の徴候診断に対応するベクトルであり、前記学習は、前記Ｎ個の並列の皮膚徴候の分岐、及び前記民族性に関する並列の分岐の各分岐に対する損失関数を最小化するように前記ＣＮＮを学習させる。
技術的思想４５の方法は、技術的思想４４記載の方法において、前記学習は、前記Ｎ個の並列の皮膚徴候の分岐のそれぞれについての損失関数Ｌ２に、前記民族性に関する並列の分岐についての標準交差エントロピー分類損失Ｌ _{ｅｔｈｎｉｃｉｔｙ} を重み付けして組み合わせた損失関数Ｌを、以下の数式３に従って最小化するように前記ＣＮＮを学習させ、λは、スコア回帰と民族性の分類損失の間のバランスを制御する。

技術的思想４６のコンピュータ実装方法は、技術的思想１９から２７のいずれかに記載のコンピュータ実装方法において、前記ＣＮＮは、顔およびランドマークの検出によって前処理された正規化された画像を受信するように構成される。
技術的思想４７のコンピュータ実装方法は、技術的思想１９から２８のいずれかに記載のコンピュータ実装方法において、前記ＣＮＮは、前記Ｎ個の各皮膚の徴候診断を生成するように適合された画像分類のための学習済ネットワークを予め備え、前記学習済ネットワークの全結合層が省略され、Ｎ個の各層のグループは、前記Ｎ個の各皮膚の徴候診断のそれぞれについて同じ特徴ネットを並列に復号するように定義される。 Features, integral properties, compounds, chemical moieties or groups described in connection with a particular aspect, embodiment or example of the invention apply to any other aspect, embodiment or example unless incompatible with them. It should be understood that it is possible. Any and all features disclosed in this specification (including any appended claims, abstract, and drawings) or any method or process step so disclosed may be referred to as such features. Alternatively, they may be combined in any combination, except combinations where at least some of the steps are mutually exclusive. The invention is not limited to the details of the foregoing examples or embodiments. The present invention resides in any novel invention or any novel combination of features disclosed in this specification (including the appended claims, abstract and drawings) or any technique disclosed. or extend to any novel one or any novel combination of process steps.
<Others>
<Means>
The skin diagnostic device of technical idea 1 comprises a storage unit and a processing unit coupled to the storage unit, wherein the storage unit stores and provides a CNN, which is a convolutional neural network that classifies pixels of an image to determine N respective skin sign diagnoses for each of a plurality of N skin signs; the CNN is a deep neural network for image classification configured to generate the N respective skin sign diagnoses; the CNN is trained using skin sign data for each of the N skin signs; and the processing unit receives the image and processes the image using the CNN to generate the N respective skin sign diagnoses.
The skin diagnostic device of technical idea 2 is the skin diagnostic device according to technical idea 1, wherein the CNN comprises an encoder stage defined from a trained network for image classification and configured to encode features into a final encoder stage feature net, and a decoder stage configured to receive the final encoder stage feature net for decoding by a plurality of N parallel skin sign branches to generate the N respective skin sign diagnoses.
The skin diagnostic device of technical idea 3 is the skin diagnostic device according to technical idea 2, wherein the decoder stage includes a global pooling operation that processes the final encoder stage feature net and provides the result to each of the N parallel skin sign branches.
The skin diagnostic device of technical idea 4 is the skin diagnostic device according to technical idea 2 or 3, wherein the CNN is configured to classify the pixels to determine an ethnicity vector, and the CNN is trained using skin sign data for each of the N skin signs and a plurality of ethnicities.
The skin diagnostic device of technical idea 5 is the skin diagnostic device according to technical idea 4, wherein the decoder stage comprises a parallel branch for ethnicity for generating the ethnicity vector.
The skin diagnostic device of technical idea 6 is the skin diagnostic device according to any one of technical ideas 2 to 5, wherein each of the N parallel skin sign branches comprises, in succession, a first fully connected layer followed by a first activation layer, a second fully connected layer, a second activation layer, and a final activation layer, and outputs a final value comprising one of the N respective skin sign diagnoses and the ethnicity vector.
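The decoder structure described in technical ideas 2 to 6 can be sketched as follows. This is a minimal illustration in plain Python, not the implementation of the specification: the function names (`global_average_pool`, `dense`, `relu`, `run_branch`, `decode`), the choice of average pooling and ReLU, the identity placeholder for the final activation, and the toy weights and sign names are all assumptions. The text only specifies global pooling followed by N parallel branches, each with two fully connected layers, two activation layers, and a final activation layer.

```python
from typing import Callable, Dict, List

Vector = List[float]
Matrix = List[List[float]]


def global_average_pool(feature_net: List[Matrix]) -> Vector:
    """Collapse each channel (an H x W grid) of the feature net to its mean."""
    pooled = []
    for channel in feature_net:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    """Assumed activation; the text does not name the activation used."""
    return [max(0.0, v) for v in x]


def run_branch(pooled: Vector, params: Dict[str, list],
               final_activation: Callable[[float], float]) -> float:
    """FC -> activation -> FC -> activation -> final activation for one branch."""
    hidden = relu(dense(pooled, params["w1"], params["b1"]))
    score = relu(dense(hidden, params["w2"], params["b2"]))
    return final_activation(score[0])


def decode(feature_net, branches, final_activation=lambda s: s) -> Dict[str, float]:
    """Feed the same pooled feature vector to every parallel skin sign branch."""
    pooled = global_average_pool(feature_net)
    return {name: run_branch(pooled, p, final_activation) for name, p in branches.items()}


# Toy example: 2 channels of 2x2 features and two hypothetical skin signs.
feature_net = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 2.0], [2.0, 4.0]]]
branches = {
    "sign_a": {"w1": [[1.0, 0.0], [0.0, 1.0]], "b1": [0.0, 0.0],
               "w2": [[1.0, 1.0]], "b2": [0.0]},
    "sign_b": {"w1": [[0.5, 0.5], [1.0, -1.0]], "b1": [0.0, 0.0],
               "w2": [[1.0, 0.0]], "b2": [1.0]},
}
scores = decode(feature_net, branches)
```

The sketch covers only scalar skin sign scores. A branch for ethnicity would return a vector rather than a single value and is left out here for brevity.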
The skin diagnostic device of technical idea 7 is the skin diagnostic device according to technical idea 6, wherein the final activation layer is defined according to the function of Equation 1 below for an input score x received from the second activation layer, where α is the slope, a is the lower bound, and b is the upper bound of the score range for each of the N respective skin sign diagnoses.
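Equation 1 itself is not reproduced in this text. One function consistent with the stated parameters (slope α, lower bound a, upper bound b) is a "leaky" clamp that passes scores inside [a, b] unchanged and continues with slope α outside that range. The sketch below assumes that form; the name `leaky_clamp`, the default slope, and the example score range are illustrative only.

```python
def leaky_clamp(x: float, a: float, b: float, alpha: float = 0.01) -> float:
    """Assumed form of the final activation layer.

    Returns x unchanged inside [a, b]. Outside the range the output keeps
    changing with the small slope alpha, so the function still has a
    nonzero gradient there.
    """
    if x < a:
        return a + alpha * (x - a)
    if x > b:
        return b + alpha * (x - b)
    return x


# Example with a hypothetical score range of 0 to 5 and slope 0.1.
inside = leaky_clamp(2.5, a=0.0, b=5.0, alpha=0.1)
below = leaky_clamp(-2.0, a=0.0, b=5.0, alpha=0.1)
above = leaky_clamp(7.0, a=0.0, b=5.0, alpha=0.1)
```

With these inputs a score inside the range is returned as is, while scores below a or above b are pulled close to the nearest bound instead of being cut off exactly at it.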

The skin diagnostic device of technical idea 8 is the skin diagnostic device according to any one of technical ideas 4 to 7, wherein the CNN is trained using a plurality of samples of the form (x_i, y_i), where x_i is the i-th training image and y_i is the corresponding vector of ground truth skin sign diagnoses, and the CNN is trained to minimize a loss function for each of the N parallel skin sign branches and the parallel branch for ethnicity.
The skin diagnostic device of technical idea 9 is the skin diagnostic device according to technical idea 8, wherein the CNN is trained to minimize, according to Equation 3 below, a loss function L that is a weighted combination of an L2 loss function for each of the N parallel skin sign branches and a standard cross-entropy classification loss L_ethnicity for the parallel branch for ethnicity, where λ controls the balance between the score regression and ethnicity classification losses.
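Equation 3 is likewise not reproduced in this text. Read literally, the description suggests a sum of the form L = L2 + λ·L_ethnicity. The sketch below assumes that form, takes L2 to be a sum of squared errors over the N skin sign scores, and uses the usual softmax cross-entropy for the ethnicity branch. The function names and the sample numbers are hypothetical.

```python
import math
from typing import Sequence


def l2_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Sum of squared errors over the N skin sign scores (assumed form of L2)."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


def cross_entropy(logits: Sequence[float], true_class: int) -> float:
    """Standard softmax cross-entropy for one sample, computed stably."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[true_class]


def combined_loss(pred_scores, true_scores, ethnicity_logits, true_ethnicity, lam=1.0):
    """Assumed Equation 3: L = L2 + lambda * L_ethnicity."""
    return (l2_loss(pred_scores, true_scores)
            + lam * cross_entropy(ethnicity_logits, true_ethnicity))


# One hypothetical sample: three skin sign scores and three ethnicity classes.
loss = combined_loss(
    pred_scores=[2.0, 3.5, 1.0],
    true_scores=[2.0, 3.0, 2.0],
    ethnicity_logits=[0.0, 0.0, 0.0],
    true_ethnicity=1,
    lam=0.5,
)
```

Setting `lam` to 0 reduces the objective to pure score regression, while larger values give the ethnicity classification term more weight.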

The skin diagnostic device of technical idea 10 is the skin diagnostic device according to any one of technical ideas 1 to 9, wherein the storage unit stores a face and landmark detector for preprocessing the image, and the processing unit is configured to generate a normalized image from the image using the face and landmark detector and to use the normalized image when using the CNN.
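The text does not say how the normalized image is derived from the detected landmarks. One common approach, assumed here purely for illustration, is to compute a similarity transform (rotation, uniform scale, translation) that moves two eye landmarks onto fixed target positions. The function name `alignment_transform`, the target coordinates, and the sample landmarks are hypothetical, and the sketch only maps coordinates; warping the pixels themselves would require an image library.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def alignment_transform(left_eye: Point, right_eye: Point,
                        target_left: Point = (30.0, 40.0),
                        target_right: Point = (70.0, 40.0)):
    """Return (scale, angle, apply) for the transform mapping the eyes onto the targets."""
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    tx, ty = target_right[0] - target_left[0], target_right[1] - target_left[1]
    scale = math.hypot(tx, ty) / math.hypot(dx, dy)
    angle = math.atan2(ty, tx) - math.atan2(dy, dx)
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    def apply(p: Point) -> Point:
        # Rotate and scale about the left eye, then move it to its target position.
        x, y = p[0] - left_eye[0], p[1] - left_eye[1]
        return (target_left[0] + scale * (cos_a * x - sin_a * y),
                target_left[1] + scale * (sin_a * x + cos_a * y))

    return scale, angle, apply


# Hypothetical landmarks for a face rotated by a quarter turn.
scale, angle, to_normalized = alignment_transform((10.0, 10.0), (10.0, 50.0))
mapped_left_eye = to_normalized((10.0, 10.0))
mapped_right_eye = to_normalized((10.0, 50.0))
```

After the transform both eyes sit at the target positions, so every image presented to the CNN would share the same scale and orientation under this assumption.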
The skin diagnostic device of technical idea 11 is the skin diagnostic device according to any one of technical ideas 1 to 10, wherein the CNN comprises a trained network for image classification adapted to generate the N respective skin sign diagnoses, in which the fully connected layers of the trained network are omitted and N respective groups of layers are defined to decode the same feature net in parallel for each of the N respective skin sign diagnoses.
The skin diagnostic device of technical idea 12 is the skin diagnostic device according to any one of technical ideas 1 to 11, wherein the device is configured as one of a personal computing device comprising a mobile terminal, and a server that provides skin diagnostic services via a communication network.
The skin diagnostic device of technical idea 13 is the skin diagnostic device according to any one of technical ideas 1 to 12, wherein the storage unit stores code that, when executed by the processing unit, provides a treatment product selector that responds to at least some of the N respective skin sign diagnoses to obtain a recommendation for at least one of a product and a treatment plan.
The skin diagnostic device of technical idea 14 is the skin diagnostic device according to any one of technical ideas 1 to 13, wherein the storage unit stores code that, when executed by the processing unit, provides an image acquisition function for receiving the image.
The skin diagnostic device of technical idea 15 is the skin diagnostic device according to any one of technical ideas 1 to 14, wherein the storage unit stores code that, when executed by the processing unit, provides a treatment monitor for monitoring treatment for at least one skin sign.
The skin diagnostic device of technical idea 16 is the skin diagnostic device according to technical idea 15, wherein the processing unit is configured to at least one of remind, instruct, and record treatment activities associated with product application for each treatment session.
The skin diagnostic device of technical idea 17 is the skin diagnostic device according to any one of technical ideas 1 to 16, wherein the processing unit is configured to process a second image, received after a treatment session, using the CNN to generate a subsequent skin diagnosis.
The skin diagnostic device of technical idea 18 is the skin diagnostic device according to technical idea 17, wherein the storage unit stores code that, when executed by the processing unit, presents comparison results using the subsequent skin diagnosis.
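The comparison in technical ideas 17 and 18 can be pictured as a per-sign difference between the diagnosis from the first image and the subsequent diagnosis. The sketch below is an assumption about one simple way to do this: the function name `compare_diagnoses`, the sign names, and the scores are all hypothetical, and the text does not specify how comparison results are computed or presented.

```python
from typing import Dict


def compare_diagnoses(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Per-sign score change (after minus before) for signs present in both diagnoses."""
    return {sign: after[sign] - before[sign] for sign in before if sign in after}


# Hypothetical scores from the first image and from an image taken after a treatment session.
first = {"forehead_wrinkles": 3.0, "pigmentation": 2.5}
second = {"forehead_wrinkles": 2.0, "pigmentation": 2.5}
changes = compare_diagnoses(first, second)
```

Whether a negative change counts as an improvement depends on the scoring scale used for each skin sign, which this sketch does not assume.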
The computer-implemented method of technical idea 19 is a computer-implemented method of skin diagnosis comprising: providing a storage unit that stores a CNN, which is a convolutional neural network configured to classify pixels of an image to determine N respective skin sign diagnoses for each of a plurality of N skin signs, the CNN being a deep neural network for image classification configured to generate the N respective skin sign diagnoses and being trained using skin sign data for each of the N skin signs; and, by a processing unit coupled to the storage unit, receiving the image and processing the image using the CNN to generate the N respective skin sign diagnoses.
The computer-implemented method of technical idea 20 is the computer-implemented method according to technical idea 19, wherein the CNN comprises an encoder stage defined from a trained network for image classification and configured to encode features into a final encoder stage feature net, and a decoder stage configured to receive the final encoder stage feature net for decoding by a plurality of N parallel skin sign branches to generate the N respective skin sign diagnoses.
The computer-implemented method of technical idea 21 is the computer-implemented method according to technical idea 20, wherein the decoder stage includes a global pooling operation that processes the final encoder stage feature net and provides the result to each of the N parallel skin sign branches.
The computer-implemented method of technical idea 22 is the computer-implemented method according to technical idea 20 or 21, wherein the CNN is configured to classify the pixels to determine an ethnicity vector, the CNN is trained using skin sign data for each of the N skin signs and a plurality of ethnicities, and the processing of the image by the CNN generates the ethnicity vector.
The computer-implemented method of technical idea 23 is the computer-implemented method according to technical idea 22, wherein the decoder stage comprises a parallel branch for ethnicity for generating the ethnicity vector.
The computer-implemented method of technical idea 24 is the computer-implemented method according to any one of technical ideas 20 to 23, wherein each of the N parallel skin sign branches comprises, in succession, a first fully connected layer followed by a first activation layer, a second fully connected layer, a second activation layer, and a final activation layer, and outputs a final value comprising one of the N respective skin sign diagnoses and the ethnicity vector.
The computer-implemented method of technical idea 25 is the computer-implemented method according to technical idea 24, wherein the final activation layer is defined according to the function of Equation 1 below for an input score x received from the second activation layer, where α is the slope, a is the lower bound, and b is the upper bound of the score range for each of the N respective skin sign diagnoses.

The computer-implemented method of technical idea 26 is the computer-implemented method according to any one of technical ideas 22 to 25, wherein the CNN is trained using a plurality of samples of the form (x_i, y_i), where x_i is the i-th training image and y_i is the corresponding vector of ground truth skin sign diagnoses, and the CNN is trained to minimize a loss function for each of the N parallel skin sign branches and the parallel branch for ethnicity.
The computer-implemented method of technical idea 27 is the computer-implemented method according to technical idea 26, wherein the CNN is trained to minimize, according to Equation 3 below, a loss function L that is a weighted combination of an L2 loss function for each of the N parallel skin sign branches and a standard cross-entropy classification loss L_ethnicity for the parallel branch for ethnicity, where λ controls the balance between the score regression and ethnicity classification losses.

The computer-implemented method of technical idea 28 is the computer-implemented method according to any one of technical ideas 19 to 27, wherein the storage unit stores a face and landmark detector for preprocessing the image, and the method comprises preprocessing the image by the processing unit using the face and landmark detector to generate a normalized image from the image, and using the normalized image when using the CNN.
The computer-implemented method of technical idea 29 is the computer-implemented method according to any one of technical ideas 19 to 28, wherein the CNN comprises a trained network for image classification adapted to generate the N respective skin sign diagnoses, in which the fully connected layers of the trained network are omitted and N respective groups of layers are defined to decode the same feature net in parallel for each of the N respective skin sign diagnoses.
The computer-implemented method of technical idea 30 is the computer-implemented method according to any one of technical ideas 19 to 29, wherein the storage unit and the processing unit are components of one of a personal computing device comprising a mobile terminal, and a server that provides skin diagnostic services via a communication network.
The computer-implemented method of technical idea 31 is the computer-implemented method according to any one of technical ideas 19 to 30, wherein the storage unit stores code that, when executed by the processing unit, provides a treatment product selector that responds to at least some of the N respective skin sign diagnoses to obtain a recommendation for at least one of a product and a treatment plan, and the method comprises executing the code of the treatment product selector by the processing unit to obtain the recommendation for at least one of a product and a treatment plan.
The computer-implemented method of technical idea 32 is the computer-implemented method according to any one of technical ideas 19 to 31, wherein the storage unit stores code that, when executed by the processing unit, provides an image acquisition function for receiving the image, and the method comprises executing the code of the image acquisition function by the processing unit to receive the image.
The computer-implemented method of technical idea 33 is the computer-implemented method according to any one of technical ideas 19 to 32, wherein the storage unit stores code that, when executed by the processing unit, provides a treatment monitor for monitoring treatment for at least one skin sign, and the method comprises executing the code of the treatment monitor by the processing unit to monitor treatment for at least one skin sign.
The computer-implemented method of technical idea 34 is the computer-implemented method according to technical idea 33, wherein the method comprises, by the processing unit, at least one of reminding, instructing, and recording treatment activities associated with product application for each treatment session.
The computer-implemented method of technical idea 35 is the computer-implemented method according to any one of technical ideas 19 to 34, comprising processing, by the processing unit, a second image received after a treatment session using the CNN to generate a subsequent skin diagnosis.
The computer-implemented method of technical idea 36 is the computer-implemented method according to technical idea 35, comprising providing, by the processing unit, a presentation of comparison results using the subsequent skin diagnosis.
The method of technical idea 37 comprises training a CNN, which is a convolutional neural network configured to classify pixels of an image to determine N respective skin sign diagnoses for each of a plurality of N skin signs, wherein the CNN is a deep neural network for image classification configured to generate the N respective skin sign diagnoses, and the training is performed using skin sign data for each of the N skin signs.
The method of technical idea 38 is the method according to technical idea 37, wherein the CNN comprises an encoder stage defined from a trained network for image classification and configured to encode features into a final encoder stage feature net, and a decoder stage configured to receive the final encoder stage feature net for decoding by a plurality of N parallel skin sign branches to generate the N respective skin sign diagnoses.
The method of technical idea 39 is the method according to technical idea 38, wherein the decoder stage includes a global pooling operation that processes the final encoder stage feature net and provides the result to each of the N parallel skin sign branches.
The method of technical idea 40 is the method according to technical idea 38 or 39, wherein the CNN is configured to classify the pixels to determine an ethnicity vector, and the method comprises training the CNN using skin sign data for each of the N skin signs and a plurality of ethnicities.
The method of technical idea 41 is the method according to technical idea 40, wherein the decoder stage comprises a parallel branch for ethnicity for generating the ethnicity vector.
The method of technical idea 42 is the method according to any one of technical ideas 38 to 41, wherein each of the N parallel skin sign branches comprises, in succession, a first fully connected layer followed by a first activation layer, a second fully connected layer, a second activation layer, and a final activation layer, and outputs a final value comprising one of the N respective skin sign diagnoses and the ethnicity vector.
The method of technical idea 43 is the method according to technical idea 42, wherein the final activation layer is defined according to the function of Equation 1 below for an input score x received from the second activation layer, where α is the slope, a is the lower bound, and b is the upper bound of the score range for each of the N respective skin sign diagnoses.

The method of technical idea 44 is the method according to any one of technical ideas 40 to 43, wherein the training trains the CNN using a plurality of samples of the form (x_i, y_i), where x_i is the i-th training image and y_i is the corresponding vector of ground truth skin sign diagnoses, and the training trains the CNN to minimize a loss function for each of the N parallel skin sign branches and the parallel branch for ethnicity.
The method of technical idea 45 is the method according to technical idea 44, wherein the training trains the CNN to minimize, according to Equation 3 below, a loss function L that is a weighted combination of an L2 loss function for each of the N parallel skin sign branches and a standard cross-entropy classification loss L_ethnicity for the parallel branch for ethnicity, where λ controls the balance between the score regression and ethnicity classification losses.

The computer-implemented method of technical idea 46 is the computer-implemented method according to any one of technical ideas 19 to 27, wherein the CNN is configured to receive a normalized image preprocessed by face and landmark detection.
The computer-implemented method of technical idea 47 is the computer-implemented method according to any one of technical ideas 19 to 28, wherein the CNN comprises a trained network for image classification adapted to generate the N respective skin sign diagnoses, in which the fully connected layers of the trained network are omitted and N respective groups of layers are defined to decode the same feature net in parallel for each of the N respective skin sign diagnoses.

Claims

1. A skin diagnostic device comprising a storage unit and a processing unit coupled to the storage unit, wherein:
the storage unit stores and provides a CNN, which is a convolutional neural network that classifies pixels of an image to determine N respective skin sign diagnoses for each of a plurality of N skin signs;
the CNN is a deep neural network for image classification configured to generate the N respective skin sign diagnoses;
the CNN is trained using skin sign data for each of the N skin signs; and
the processing unit receives the image and processes the image using the CNN to generate the N respective skin sign diagnoses.

2. The skin diagnostic device of claim 1, wherein the CNN comprises an encoder stage defined from a trained network for image classification and configured to encode features into a final encoder stage feature net, and a decoder stage configured to receive the final encoder stage feature net for decoding by a plurality of N parallel skin sign branches to generate the N respective skin sign diagnoses.

3. The skin diagnostic device of claim 2, wherein the decoder stage includes a global pooling operation that processes the final encoder stage feature net and provides the result to each of the N parallel skin sign branches.

4. The skin diagnostic device of claim 2 or 3, wherein:
the CNN is configured to classify the pixels to determine an ethnicity vector; and
the CNN is trained using skin sign data for each of the N skin signs and a plurality of ethnicities.

5. The skin diagnostic device of claim 4, wherein the decoder stage comprises a parallel branch for ethnicity for generating the ethnicity vector.

6. The skin diagnostic device of any one of claims 2 to 5, wherein each of the N parallel skin sign branches comprises, in succession, a first fully connected layer followed by a first activation layer, a second fully connected layer, a second activation layer, and a final activation layer, and outputs a final value comprising one of the N respective skin sign diagnoses and the ethnicity vector.

7. The skin diagnostic device of claim 6, wherein:
the final activation layer is defined according to the function of Equation 1 for an input score x received from the second activation layer; and
α is the slope, a is the lower bound, and b is the upper bound of the score range for each of the N respective skin sign diagnoses.

8. The skin diagnostic device of any one of claims 4 to 7, wherein:
the CNN is trained using a plurality of samples of the form (x_i, y_i);
x_i is the i-th training image;
y_i is the corresponding vector of ground truth skin sign diagnoses; and
the CNN is trained to minimize a loss function for each of the N parallel skin sign branches and the parallel branch for ethnicity.

9. The skin diagnostic device of claim 8, wherein:
the CNN is trained to minimize, according to Equation 3, a loss function L that is a weighted combination of an L2 loss function for each of the N parallel skin sign branches and a standard cross-entropy classification loss L_ethnicity for the parallel branch for ethnicity; and
λ controls the balance between the score regression and ethnicity classification losses.

10. The skin diagnostic device of any one of claims 1 to 9, wherein:
the storage unit stores a face and landmark detector for preprocessing the image; and
the processing unit is configured to generate a normalized image from the image using the face and landmark detector and to use the normalized image when using the CNN.

11. The skin diagnostic device of any preceding claim, wherein:
the CNN comprises a trained network for image classification adapted to generate the N respective skin sign diagnoses;
the fully connected layers of the trained network are omitted; and
N respective groups of layers are defined to decode the same feature net in parallel for each of the N respective skin sign diagnoses.

12. The skin diagnostic device of any preceding claim, wherein the storage unit stores code that, when executed by the processing unit, provides a treatment product selector that responds to at least some of the N respective skin sign diagnoses to obtain a recommendation for at least one of a product and a treatment plan.

13. The skin diagnostic device of any one of claims 1 to 12, wherein the storage unit stores code that, when executed by the processing unit, provides an image acquisition function for receiving the image.

14. The skin diagnostic device of any one of claims 1 to 13, wherein the storage unit stores code that, when executed by the processing unit, provides a treatment monitor for monitoring treatment for at least one skin sign.

15. The skin diagnostic device of claim 14, wherein the processing unit is configured to at least one of remind, instruct, and record treatment activities associated with product application for each treatment session.

16. The skin diagnostic device of any one of claims 1 to 15, wherein the processing unit is configured to process a second image, received after a treatment session, using the CNN to generate a subsequent skin diagnosis.

17. The skin diagnostic device of claim 16, wherein the storage unit stores code that, when executed by the processing unit, presents comparison results using the subsequent skin diagnosis.

18. The skin diagnostic device of any preceding claim, wherein the CNN is configured to receive a normalized image preprocessed by face and landmark detection.

19. A skin diagnostic method using a storage unit and a processing unit coupled to the storage unit, comprising:
storing and providing in the storage unit a CNN, which is a convolutional neural network that classifies pixels of an image to determine N respective skin sign diagnoses for each of a plurality of N skin signs; and
generating the N respective skin sign diagnoses by causing the processing unit to process a received image using the CNN,
wherein the CNN is a deep neural network for image classification configured to generate the N respective skin sign diagnoses, and
the CNN is trained using skin sign data for each of the N skin signs.

20. A skin diagnostic program for causing a computer comprising a storage unit and a processing unit coupled to the storage unit to perform skin diagnostic processing, the program causing the computer to execute the steps of:
storing in the storage unit a CNN, which is a convolutional neural network that classifies pixels of an image to determine N respective skin sign diagnoses for each of a plurality of N skin signs; and
causing the processing unit to process a received image using the CNN to generate the N respective skin sign diagnoses,
wherein the CNN is a deep neural network for image classification configured to generate the N respective skin sign diagnoses, and
the CNN is trained using skin sign data for each of the N skin signs.