KR102543172B1

KR102543172B1 - Method and system for collecting data for skin diagnosis based on artificial intelligence through user terminal

Info

Publication number: KR102543172B1
Application number: KR1020220091351A
Authority: KR
Inventors: 노현우
Original assignee: 주식회사 메타스킨큐어
Priority date: 2022-07-22
Filing date: 2022-07-22
Publication date: 2023-06-14

Abstract

A skin diagnosis system according to an embodiment includes a terminal and a central server. The terminal includes a lighting unit capable of irradiating a user with light and a camera unit configured to capture a face image of the user. The central server trains an artificial neural network model using training 2D face images of an unspecified number of people, which include predetermined quality information, together with 3D-modeled training 3D face images corresponding to those training 2D face images, and, based on the trained artificial neural network model, converts the user's face image received from the terminal into a three-dimensional (3D) face image. Based on the artificial neural network model, the terminal detects quality information of the user's face image in the video data input to the camera unit in real time and, when the quality information satisfies a predetermined criterion, captures the user's face image at a specific viewpoint.

Description

Method and system for collecting data for skin diagnosis based on artificial intelligence through user terminals

The present invention relates to a method and system for collecting data for artificial-intelligence-based skin diagnosis through a user terminal and, more particularly, to a method and system that use deep-learning-based object recognition technology to acquire high-quality face images suitable for skin diagnosis.

With recent advances in electronic communication technology related to the Internet of Things (IoT) and smart homes, and because skin care has become possible at relatively low cost, a growing number of people want to diagnose their skin at home with a skin diagnosis device and then receive recommendations for customized cosmetics.

Reflecting this trend, various skin diagnosis devices, including smart mirrors, have recently been released. A smart mirror is a device that uses a mirror capable of recognizing a user's motion or voice as a display, and it is used to diagnose the user's skin by analyzing a face image captured by a camera.

Smart mirrors with a lighting function have recently been released to strengthen this diagnostic capability. However, unless the device is a professional skin diagnosis device, light bleed or reflection caused by external light limits how accurately the skin condition can be diagnosed. Various smart mirrors with controllable lighting have been released to address this, but lighting control remains difficult as external conditions and the environment change, and because lighting specifications differ, skin diagnosis results inevitably vary with the manufacturer or specification of the smart mirror.

In terms of accuracy as well, such devices can identify overall skin conditions such as rashes, but precisely diagnosing lesions such as localized swelling or water warts (molluscum contagiosum) is difficult. Accordingly, accurate skin diagnosis based on face images taken at home requires that high-quality face images be collected at the central-server level, and a system is needed that can, on that basis, perform uniform skin diagnosis regardless of the type of smart mirror.

Republic of Korea Patent Publication No. 10-2022-0088219, June 27, 2022

One object of the present invention is to make it possible, regardless of the type of user terminal (including a smart mirror), to acquire from the smart mirror a high-quality face image in which the position, size, brightness, focus, and the like of the face match the quality of the learned images, based on a face image acquisition model managed by a central server.

Another object of the present invention is to collect, from a smart mirror, high-quality face images containing predetermined face directions, based on a face image acquisition model trained on face images containing those predetermined face directions, and to obtain a three-dimensional face image from a plurality of high-quality face images containing different face directions.

In one embodiment of the present invention, the predetermined quality information may include at least one of the position, size, brightness, and focus information of the face in the face image.
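As a minimal illustrative sketch, and not part of the disclosed embodiments, a check of such quality information against a predetermined criterion could look like the following. The field names, the normalization of position and size, and every threshold value are hypothetical assumptions chosen only for illustration; the specification does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceQuality:
    """Hypothetical per-frame measurements for a detected face."""
    center_x: float    # face-box center, as a fraction of frame width (0..1)
    center_y: float    # face-box center, as a fraction of frame height (0..1)
    size: float        # face-box height as a fraction of frame height
    brightness: float  # mean luminance of the face region (0..255)
    sharpness: float   # focus score (assumed: higher means sharper)


@dataclass(frozen=True)
class QualityCriteria:
    """Hypothetical thresholds standing in for the 'predetermined criterion'."""
    max_center_offset: float = 0.10
    min_size: float = 0.40
    max_size: float = 0.80
    min_brightness: float = 90.0
    max_brightness: float = 200.0
    min_sharpness: float = 100.0


def failed_checks(q: FaceQuality, c: QualityCriteria = QualityCriteria()) -> list:
    """Return the names of the quality items that do not meet the criteria."""
    failures = []
    off_center = (abs(q.center_x - 0.5) > c.max_center_offset
                  or abs(q.center_y - 0.5) > c.max_center_offset)
    if off_center:
        failures.append("position")
    if not (c.min_size <= q.size <= c.max_size):
        failures.append("size")
    if not (c.min_brightness <= q.brightness <= c.max_brightness):
        failures.append("brightness")
    if q.sharpness < c.min_sharpness:
        failures.append("focus")
    return failures


def meets_criteria(q: FaceQuality, c: QualityCriteria = QualityCriteria()) -> bool:
    """True when every quality item satisfies the criteria."""
    return not failed_checks(q, c)
```

Returning the list of failed items, rather than a single flag, is a design choice of this sketch: it lets a caller decide which corrective action (for example, adjusting the lighting) is relevant.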

In one embodiment of the present invention, the lighting unit is configured so that at least one of its light amount and irradiation angle is controllable, and when the quality information does not satisfy the predetermined criterion, the lighting unit may be controlled to adjust the light amount or the irradiation angle.
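The lighting adjustment described above could be sketched, under stated assumptions, as a simple step controller for the light amount. The function name, the normalized light level, the brightness band, and the step size are all hypothetical; the specification does not disclose a particular control law, and control of the irradiation angle is omitted here.

```python
def next_light_level(current_level: float,
                     measured_brightness: float,
                     target_low: float = 90.0,
                     target_high: float = 200.0,
                     step: float = 0.1) -> float:
    """Return an adjusted light level in [0.0, 1.0].

    The level is raised when the face region is too dark, lowered when it
    is too bright, and left unchanged when the measured brightness already
    lies inside the (assumed) acceptable band.
    """
    if measured_brightness < target_low:
        level = current_level + step
    elif measured_brightness > target_high:
        level = current_level - step
    else:
        level = current_level
    # Clamp to the physical range of the (hypothetical) LED driver.
    return min(1.0, max(0.0, level))
```

In use, such a function would be called once per analyzed frame until the brightness item of the quality information satisfies the criterion.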

In one embodiment of the present invention, the training 3D face image may be generated from the training 2D face images taken from different viewpoints.

In one embodiment of the present invention, the training 3D face image is a 3D image obtained from a three-dimensional precision skin diagnosis device that keeps the subject's face fixed and rotates a camera to acquire continuous 3D face images from multiple viewpoints, and the training 2D face images may be extracted from the training 3D face image for each viewpoint.

In one embodiment of the present invention, the artificial neural network model is additionally trained on lesion information of the faces included in the training 2D face images, and the terminal may detect a lesion in the user's face image and determine the specific viewpoint based on position information of the lesion.
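One way to picture how lesion position information might drive the choice of viewpoint is the sketch below. It is an assumption-laden illustration only: the viewpoint labels, the normalized horizontal coordinate, and the `side_margin` threshold are hypothetical and do not come from the specification.

```python
def viewpoints_for_lesions(lesion_xs, side_margin: float = 0.3) -> list:
    """Map lesion positions to the viewpoints from which they should be shot.

    lesion_xs: horizontal lesion positions inside the detected face box,
               normalized so that 0.0 is the image-left edge of the face
               and 1.0 is the image-right edge (assumed convention).
    Returns viewpoint labels in first-seen order, without duplicates.
    """
    views = []
    for x in lesion_xs:
        if x < side_margin:
            view = "left_side"
        elif x > 1.0 - side_margin:
            view = "right_side"
        else:
            view = "front"
        if view not in views:
            views.append(view)
    return views
```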

In one embodiment of the present invention, the specific viewpoint may include a plurality of different viewpoints.

In one embodiment of the present invention, the specific viewpoints may be the front and both sides of the user's face.

In one embodiment of the present invention, the artificial neural network model may be trained on training 2D face images of different viewpoints, including the specific viewpoint.

In one embodiment of the present invention, the artificial neural network model is additionally trained on viewpoint information of the training 2D face images. The terminal may detect viewpoint information of the user's face image in the video data and, when the detected viewpoint is within a predetermined error range of the specific viewpoint, control the camera unit to start video recording. When a video frame captured while the user changes the angle of the face contains the face image at the specific viewpoint, the terminal may stop video recording and extract the video frame containing that face image.
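The start-recording, stop-recording, and frame-extraction behaviour described above can be sketched as a small scan over per-frame viewpoint estimates. This is a hedged illustration: representing the viewpoint as a yaw angle in degrees, and the two tolerance values, are assumptions of the sketch rather than details given in the specification.

```python
def extract_viewpoint_frame(yaw_per_frame, target_yaw: float,
                            start_tolerance: float = 15.0,
                            match_tolerance: float = 2.0):
    """Find where recording would start and which frame would be extracted.

    yaw_per_frame: estimated face yaw (degrees) for each incoming frame.
    Recording starts at the first frame whose yaw is within
    start_tolerance of target_yaw (the assumed 'error range'); it stops at
    the first recorded frame within match_tolerance, which is treated as
    containing the face at the specific viewpoint.

    Returns (start_index, match_index), or None if no frame matches.
    """
    start_index = None
    for i, yaw in enumerate(yaw_per_frame):
        error = abs(yaw - target_yaw)
        if start_index is None and error <= start_tolerance:
            start_index = i  # recording would begin here
        if start_index is not None and error <= match_tolerance:
            return start_index, i  # recording stops; frame i is extracted
    return None
```

For example, with yaw estimates of 0, 20, 32, 40, 44, and 46 degrees and a target of 45 degrees, recording would start at the third frame and the fifth frame would be extracted.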

According to an embodiment of the present invention, regardless of the type of user terminal, a high-quality face image in which the position, size, brightness, focus, and the like of the face match the quality of the learned images can be acquired from a smart mirror, based on a face image acquisition model managed by a central server.

In addition, according to an embodiment of the present invention, high-quality face images containing predetermined face directions can be collected from a smart mirror, based on a face image acquisition model trained on face images containing those predetermined face directions, and a three-dimensional face image can be obtained from a plurality of high-quality face images containing different face directions. As a result, the condition of a lesion in a local area can be analyzed precisely in a 3D image, enabling accurate skin diagnosis.

FIG. 1 is a block diagram illustrating a skin diagnosis system according to an embodiment of the present invention.
FIG. 2 shows block diagrams illustrating the skin diagnosis terminal of FIG. 1.
FIG. 3 is an exemplary diagram illustrating acquisition of a user's face image based on a face image acquisition model trained using deep learning.
FIG. 4 is another exemplary diagram illustrating acquisition of a user's face image based on a face image acquisition model trained using deep learning.
FIG. 5 is a block diagram illustrating a central server that converts a received 2D face image into a 3D face image using deep learning technology.
FIG. 6 is a block diagram illustrating generation of a face image acquisition model using deep learning technology.
FIG. 7 is an exemplary diagram of converting a 2D face image into a 3D face image.
FIG. 8 is a flowchart illustrating a method of obtaining a 3D face image for skin diagnosis according to an embodiment of the present invention.
FIG. 9 is a flowchart illustrating a method of capturing a face image for skin diagnosis according to an embodiment of the present invention.

Hereinafter, the embodiments disclosed in this specification are described in detail with reference to the accompanying drawings. The same or similar components are given the same reference numerals regardless of the drawing in which they appear, and redundant descriptions of them are omitted. The suffixes "module" and "unit" used for components in the following description are assigned or used interchangeably only for ease of drafting the specification and do not by themselves have distinct meanings or roles. In describing the embodiments disclosed in this specification, detailed descriptions of related known technology are omitted when they are judged likely to obscure the gist of the embodiments. The accompanying drawings are provided only to aid understanding of the embodiments disclosed in this specification; the technical idea disclosed herein is not limited by the drawings and should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms including ordinal numbers, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that intervening components may also be present. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no intervening components are present.

In this application, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by a single piece of hardware.

In this specification, some of the operations or functions described as being performed by a terminal, apparatus, or device may instead be performed by a server connected to that terminal, apparatus, or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal, apparatus, or device connected to that server.

The present invention can be implemented as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium includes all types of recording devices that store data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed over computer systems connected through a network, so that the computer-readable code is stored and executed in a distributed manner.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating a skin diagnosis system according to an embodiment of the present invention.

The skin diagnosis system 1000 according to an embodiment may be a system that, when a user's face image is captured using the skin diagnosis terminal 150, captures a face image of the desired quality based on an image acquisition model trained in advance on face images of the desired quality. That is, the skin diagnosis terminal 150 is configured to use object recognition technology based on a deep learning algorithm to detect the quality and viewpoint (view point) information of a face image in the video data input through the camera. For example, as shown in FIG. 1, when the quality information of the previously learned face images is detected through the camera, the skin diagnosis terminal 150 captures the user's face image at different viewpoints and transmits the acquired face images to the central server 300. For precise diagnosis of the user's skin, the skin diagnosis system 1000 is configured to convert the 2D face images obtained through the skin diagnosis terminal 150 into a 3D face image.

The skin diagnosis system 1000 includes a user terminal 100 and a central server 300. However, the skin diagnosis system 1000 of FIG. 1 is only one embodiment of the present invention, and the present invention is not to be construed as limited by FIG. 1.

For example, the components of FIG. 1 are generally connected through a network 10. That is, as shown in FIG. 1, at least one user terminal 100 may be connected to the central server 300 through the network 10. The skin diagnosis system 1000 according to this embodiment may further include a skin diagnosis terminal 150 for capturing the user's face image. For example, the skin diagnosis terminal 150 may be a smart mirror. The smart mirror is a device that uses a mirror capable of recognizing a user's motion or voice as a display, and it may be configured to capture the user's face image through a camera. Although the user terminal 100 and the skin diagnosis terminal 150 are described as separate, independent components, the invention is not limited to this. For example, the user terminal 100 may itself be the skin diagnosis terminal 150, or one may be included in the other.

Here, the network means a connection structure that allows information to be exchanged between nodes such as a plurality of terminals and servers. Examples of such a network include, but are not limited to, an RF network, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a 5GPP (5th Generation Partnership Project) network, a WIMAX (World Interoperability for Microwave Access) network, the Internet, a LAN (Local Area Network), a Wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Bluetooth network, an NFC network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network.

In the following, the term "at least one" is defined as covering both the singular and the plural. Even where the term "at least one" does not appear, it is evident that each component may be present in the singular or the plural and may be meant in either sense. Whether each component is provided in the singular or the plural may vary by embodiment.

The user terminal 100 is a computer or portable terminal owned by a person who intends to have the skin of his or her face diagnosed through the skin diagnosis system 1000 (hereinafter, the 'user'). It may be a terminal through which the user connects to the central server 300 over the network 10 from a client provided as a web page, an application (app), or a web app, registers as a member, and then enters user information. The user terminal 100 is connected to the skin diagnosis terminal 150 by wire or wirelessly, manages the face images captured by the skin diagnosis terminal 150, and transmits the user's face images to the central server 300 through communication with the central server 300.

Here, the user terminal 100 may be implemented as a computer capable of accessing a remote server or terminal through a network. The computer may include, for example, a navigation device, or a notebook, desktop, or laptop equipped with a web browser. The user terminal 100 may also be, for example, a wireless communication device that ensures portability and mobility, including all types of handheld wireless communication devices such as navigation devices, PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), and Wibro (Wireless Broadband Internet) terminals, smartphones, smart pads, and tablet PCs.

The skin diagnosis terminal 150 may be connected to the user terminal 100 through wired or wireless communication, and is configured to detect the quality of the user's face image based on the face image acquisition model generated by the central server 300. The skin diagnosis terminal 150 is also configured to detect, based on the face image acquisition model, the user's face at the different learned viewpoints and to capture a plurality of face images of the user taken from those different viewpoints. Because the skin diagnosis terminal 150 captures the user's face image based on the face image acquisition model generated by the central server 300, the type of product does not matter. For example, the skin diagnosis system 1000 can also be applied to different skin diagnosis terminals 150a and 150b.

In addition, the plurality of face images captured by the skin diagnosis terminal 150 are transmitted to the central server 300 via the user terminal 100, and the central server 300 converts them into a three-dimensional (3D) face image for precise analysis of the user's skin. Hereinafter, the skin diagnosis terminal 150 is described in detail with reference to FIGS. 2 and 3.

FIG. 2 shows block diagrams illustrating the skin diagnosis terminal of FIG. 1. FIG. 3 is an exemplary diagram illustrating acquisition of a user's face image based on a face image acquisition model trained using deep learning.

Referring to FIGS. 1 to 3, the skin diagnosis terminal 150 may include a camera unit 151, a lighting unit 152, a detection unit 153, an output unit 154, and a control unit 155.

The camera unit 151 is configured to capture at least one face image of the user. Specifically, the camera unit 151 is configured to process a plurality of frames (images) per second, to acquire video data containing at least one frame, and to output the acquired video data to the detection unit 153 connected to the camera unit 151.

As for the terms used in this specification, relative to the time point of image capture, the image shown on the camera before capture begins is called the "preview image", and the image that is captured is called the "captured image". After capture ends, the "captured image" is followed again by the "preview image". The term "camera image" is used to refer collectively to the "preview image" and the "captured image".

The camera unit 151 is a camera module configured to capture the user's face image, and may include one or more sensors (e.g., a front sensor or a rear sensor), a lens, an ISP (image signal processor), or a flash (e.g., an LED or xenon lamp). For example, the camera unit 151 may include a plurality of cameras, which may have different focal points or viewing angles. The plurality of cameras of the camera unit 151 may be arranged on the left and right and/or at the top and bottom of the skin diagnosis terminal 150 so as to compensate for one another's lens aberrations.

The camera unit 151 may also include a camera having a zoom function (e.g., zoom-in and/or zoom-out). In addition, the camera unit 151 may be configured as a general camera, a depth camera capable of capturing depth images, or a high-frame-rate camera capable of high-speed capture.

The camera unit 151 is configured to convert the preview image seen through the camera into video data and transmit it to the detection unit 153 in real time. Using the face image acquisition model trained in advance for capturing the user's face image, the detection unit 153 analyzes the video data input in real time. When the learned face image quality is recognized and a face image at a pre-learned specific viewpoint is detected, this is treated as a trigger, and the control unit 155 controls the camera unit 151 to capture an image. That is, the camera unit 151 captures the subject, namely the user's face, under the control of the control unit 155.
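The trigger behaviour described above can be sketched as a loop over preview frames. This is only an illustrative sketch under stated assumptions: the `detect` callable stands in for the detection unit's model inference, the viewpoint labels are hypothetical, and storing the frame in a dictionary stands in for the control unit instructing the camera unit to capture.

```python
def capture_on_trigger(preview_frames, detect, targets):
    """Capture one frame per target viewpoint when the trigger fires.

    preview_frames: iterable of preview frames (any objects).
    detect:         hypothetical callable, frame -> (quality_ok, viewpoint),
                    standing in for inference with the acquisition model.
    targets:        viewpoint labels that must each be captured once.

    Returns a dict mapping each captured viewpoint to its frame.
    """
    wanted = set(targets)
    captured = {}
    for frame in preview_frames:
        quality_ok, viewpoint = detect(frame)
        # Trigger: learned quality recognized AND a target viewpoint detected.
        if quality_ok and viewpoint in wanted and viewpoint not in captured:
            captured[viewpoint] = frame
        if len(captured) == len(wanted):
            break  # every target viewpoint has been captured
    return captured
```

Frames that fail the quality check, or that show a viewpoint that is not a target, simply pass through as preview frames without triggering a capture.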

In addition, the camera unit 151 may be configured to include an auto-focus function and an auto-flash function in order to obtain sharp image data. Alternatively, the camera unit 151 may be configured to focus or fire the flash based on the confidence of the object detected in the image data transmitted to the detection unit 153. Accordingly, the camera unit 151, together with the lighting unit 152 described below, can minimize environmental influence when obtaining sharp image data, detect objects in the image data with high confidence, and obtain a face image of the desired quality.

The lighting unit 152 is configured to emit light so that the user's face image can be captured at the desired quality. To this end, the lighting unit 152 may include a plurality of LEDs. For example, the LEDs of the lighting unit 152 may be arranged along the perimeter of the skin diagnosis terminal 150.

The lighting unit 152 optimizes the image quality required for skin diagnosis and provides lighting under constant conditions despite external changes, so that an accurate skin diagnosis can be made. In addition, the lighting unit 152 may be provided with adjustable light intensity and angle so that the LED light can be directed at a desired position on the user's face. Accordingly, shadows cast on the user's skin can be taken into account and removed by the light emitted from the lighting unit 152.

The detection unit 153 is configured to detect an object using a deep learning algorithm. For example, the detection unit 153 may automatically detect objects at regular time intervals in the image data input from the camera unit 151 and apply deep-network-based tracking to the detected objects to achieve real-time processing; the deep network may be designed and trained using a method that minimizes the region in which objects are searched. The deep network may be a neural network following a DNN (Deep Neural Network)-based deep learning algorithm, consisting of an input layer, one or more hidden layers, and an output layer. Neural networks other than a DNN may also be used as the deep network, for example a CNN (Convolutional Neural Network) or an RNN (Recurrent Neural Network).

The detection unit 153 is configured to detect an object from at least one frame of the input image data. For example, the detection unit 153 may detect an object by applying a video-based deep network to a plurality of consecutive frames of the image data in the preview image, or to at least a certain number of frames randomly sampled from them. In this case, the motion of the object detected across adjacent frames can be tracked. Alternatively, the detection unit 153 may select a representative frame from the input image data and detect an object by applying an image-based deep network to that frame. For example, the representative frame may be a frame acquired when a face is detected in the preview image of the camera unit 151 by a separate sensor or the like.

In addition, although not shown, the image data obtained from the camera unit 151 may be subjected to grayscale conversion and equalization before being input to the detection unit 153. Grayscale conversion turns a three-channel color image into a grayscale image so that equalization can be performed effectively. Equalization spreads a brightness distribution that has been skewed by lighting or flash over a wider range. This allows the face image to be detected in the image data with high confidence. Various other known image processing techniques may also be applied to the image data; a detailed description is omitted. Although processing before input to the detection unit 153 has been described as an example, the invention is not limited thereto.
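The preprocessing described above can be illustrated with a minimal sketch. It assumes that "equalization" means ordinary histogram equalization of an 8-bit image and that grayscale conversion uses the common BT.601 luma weights; neither detail is specified in this description, and the function names `to_gray` and `equalize` are hypothetical.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples to 8-bit luminance values."""
    # BT.601 luma weights: an assumption, the description names no weights.
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb_image]


def equalize(gray_image):
    """Histogram-equalize an 8-bit grayscale image given as rows of ints."""
    pixels = [p for row in gray_image for p in row]
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of brightness levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # single brightness level: nothing to spread
        return [row[:] for row in gray_image]
    # Lookup table that stretches the occupied range onto 0..255.
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * 255)) for c in cdf]
    return [[lut[p] for p in row] for row in gray_image]


# A dark 2x2 image whose brightness is crowded into the range 10..40.
dark = [[(10, 10, 10), (20, 20, 20)], [(30, 30, 30), (40, 40, 40)]]
gray = to_gray(dark)
flat = equalize(gray)
```

In this toy input the four brightness levels are spread across the full 0 to 255 range, which is the widening of the brightness distribution that the paragraph describes.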

For example, when the quality information of a face image learned in advance for generating the face image acquisition model is detected in the image data, and that quality information satisfies a predetermined criterion, the detection unit 153 is configured to detect the user's face images at different viewpoints. Specifically, the detection unit 153 may include a quality detection unit 153a that detects the quality of the face image and a face detection unit 153b that detects face images at the different pre-learned viewpoints.

The quality detection unit 153a detects the quality of image data containing a face image so that an accurate skin diagnosis can be made. Using a deep network and based on the face image acquisition model, the quality detection unit 153a is configured to automatically extract quality information such as the position, size, brightness, and focus of the face from the image data input from the camera unit 151. That is, the quality detection unit 153a can detect the quality of the face image in the received image data based on the quality of the face images used as training data for the face image acquisition model.

For example, the quality detection unit 153a is configured to detect, based on the detected face image, the position of the face, such as whether the face lies at the center of the image data or whether part of the face outline is cut off. It is also configured to detect, based on the size of the training face images, whether the face in the image data has the desired size. For example, the quality detection unit 153a may extract size information from the distance between the camera and the face, which is the object to be detected. In addition, based on the per-region illuminance and sharpness of the training face images, the quality detection unit 153a is configured to detect the light level of the face image in the image data under the external environment, as well as whether and where shadows, light smearing, and light reflection occur.
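As a rough illustration of the position and size checks, the following sketch tests a detected face bounding box against a frame. In the description these judgments come from the trained model; the rule-based function `check_framing`, its thresholds, and the issue labels are hypothetical stand-ins chosen only to make the criteria concrete.

```python
def check_framing(box, frame_w, frame_h,
                  center_tol=0.1, min_ratio=0.3, max_ratio=0.7):
    """Return a list of framing issues for a face box (x, y, w, h) in pixels.

    All thresholds are illustrative assumptions, not values from the patent.
    """
    x, y, w, h = box
    issues = []
    # Part of the face outline falls outside the frame.
    if x < 0 or y < 0 or x + w > frame_w or y + h > frame_h:
        issues.append("cropped")
    # Face center is too far from the frame center.
    cx, cy = x + w / 2, y + h / 2
    if (abs(cx - frame_w / 2) > center_tol * frame_w
            or abs(cy - frame_h / 2) > center_tol * frame_h):
        issues.append("off_center")
    # Face height relative to the frame stands in for camera distance.
    ratio = h / frame_h
    if ratio < min_ratio:
        issues.append("too_far")
    elif ratio > max_ratio:
        issues.append("too_close")
    return issues


good = check_framing((220, 140, 200, 200), 640, 480)
corner = check_framing((0, 0, 100, 100), 640, 480)
```

An empty list means the framing criteria are met; a non-empty list names the conditions for which feedback would be issued to the user.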

Here, light smearing is caused by oil on the user's face, while light reflection refers to light from an external light source being reflected, independent of the illumination of the lighting unit 152. When light smearing occurs in the face image, the affected region loses sharpness; when light reflection occurs, the affected region appears white. The quality detection unit 153a is thus configured to measure the light level due to the external environment or conditions and to detect whether light smearing or light reflection occurs in the face image. It may also detect, based on the training data, whether the face image in the image data is in focus. However, the invention is not limited thereto. For example, instead of the quality detection unit 153a detecting whether the image is in focus, focus may be set automatically by the auto-focus function of the camera unit 151.

In addition, the quality detection unit 153a may detect whether conditions required for precise skin diagnosis are satisfied, for example whether the bangs are lifted so that the forehead is exposed, or whether there is foreign matter on the face in the face image. That is, the quality detection unit 153a can detect the quality of the face image in the image data based on the training face images.

The face detection unit 153b is configured to detect a face in the image data. Using a deep network and based on the face image acquisition model, the face detection unit 153b is configured to automatically extract, from the image data input from the camera unit 151, whether a face is present and the viewpoint information of the face image in the image data. That is, the face detection unit 153b can detect the viewpoint information of the face image in the received image data based on face images at specific viewpoints, which are the training data of the face image acquisition model.

For example, the face detection unit 153b is configured to detect, based on the face image detected according to the specific viewpoint information in the training data, the viewpoint information of the face image in the image data, for example whether it is a front view, a left side view, a right side view, or a view tilted by 45 degrees.

The output unit 154 is configured to provide feedback to the user under the control of the control unit 155 described below. For example, the output unit 154 may include a display unit or a speaker unit. When the skin diagnosis terminal 150 is, for example, a smart mirror, feedback may be displayed on the mirror or output as speech through a speaker. For example, to capture the user's face image at a specific viewpoint, the output unit 154 may output feedback on how to rotate the face, based on the viewpoint information of the face image recognized by the face detection unit 153b. Because it is not possible for users to perceive their own face angle and pose accordingly, the user instead finely adjusts the face angle based on the feedback from the output unit, so that a face image at the desired viewpoint can be captured. Here, the face angle covers both the up-down and the left-right angle. The face angle is a value defined by a specific criterion obtained through training on the training face images; it does not refer to the angle the user consciously adopts. For example, the up-down angle at which the forehead and jawline are vertical when viewed from the side may be taken as the reference angle, and how far the head must be moved to reach it differs from user to user. Thus, regardless of physical characteristics such as forward head posture ("turtle neck"), the face image at the intended viewpoint is detected and captured based on artificial intelligence, yielding analysis data of a common shape for precise skin diagnosis.

The control unit 155 is configured to output a feedback signal to the output unit 154 based on the information detected by the detection unit 153.

For example, when the quality information detected by the quality detection unit 153a does not satisfy the predetermined criterion, for instance when the face is not located at the center of the image data, the control unit 155 may output, through the output unit 154, feedback requesting the user to move the face. Based on the size of the face image in the image data or the distance to the face, it may also output feedback to move closer or farther away.

In addition, when the quality detection unit 153a detects a low light level or a shadow in the face image in the image data, the control unit 155 is configured to control the lighting unit 152, adjusting the light intensity or the illumination angle so that LED light reaches the desired position on the user's face. For example, when a plurality of LEDs are installed at different positions, the intensity of each LED may be controlled individually according to where the shadow occurs. Which LEDs to control may be programmed in advance for each shadow location. Accordingly, insufficient light or shadows on the user's face image can be compensated by controlling the light. However, the invention is not limited thereto. The control unit 155 may instead output feedback asking the user to adjust the light intensity or the like manually.
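The pre-programmed, per-location LED control can be sketched as a lookup table from shadow location to LED indices. The region names, the LED layout in `LEDS_FOR_REGION`, the normalized 0.0 to 1.0 intensity scale, and the step size are all hypothetical; the description only states that the LEDs to control may be programmed in advance according to where the shadow occurs.

```python
# Hypothetical mapping from a shadowed face region to the LEDs that light it.
LEDS_FOR_REGION = {
    "left_cheek": [0, 1],
    "right_cheek": [2, 3],
    "forehead": [4, 5],
}


def adjust_leds(levels, shadow_regions, step=0.2):
    """Return new LED intensities (0.0 to 1.0) with shadowed regions boosted.

    `levels` is left unmodified; unknown region names are ignored.
    """
    new_levels = list(levels)
    for region in shadow_regions:
        for index in LEDS_FOR_REGION.get(region, []):
            # Raise the intensity by one step, clamped to full brightness.
            new_levels[index] = min(1.0, round(new_levels[index] + step, 3))
    return new_levels


boosted = adjust_leds([0.5] * 6, ["left_cheek"])
```

Only the LEDs assigned to the shadowed region change; the others keep their level, which corresponds to the individual control described above.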

In addition, when the quality detection unit 153a detects light smearing in the face image in the image data, the control unit 155 may output, through the output unit 154, feedback suggesting that the user wash their face before the diagnosis. When the quality detection unit 153a detects light reflection in the face image, the control unit 155 may output, through the output unit 154, feedback suggesting that the user move to avoid the position where the external light source is reflected.

In addition, when the face detection unit 153b detects the viewpoint information of the user's face image in the image data, the control unit 155 may output feedback instructing the user to rotate the face to a viewpoint corresponding to a specific viewpoint in the training data. Two or more specific viewpoints are assumed; for example, they may comprise three viewpoints: the user's front, left side, and right side. The control unit 155 is configured to control the camera unit 151 to capture an image when the viewpoint of the user's face image corresponds to a predetermined specific viewpoint. When there are multiple specific viewpoints, the control unit 155 may capture an image at each specific viewpoint in sequence while outputting feedback that guides the user to rotate the face to the next viewpoint. Accordingly, as shown in FIG. 3, three face images of the user's front and both sides can be obtained.
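The sequential capture over several viewpoints can be sketched as a small state machine. This is an illustration under assumptions: the viewpoint is reduced to a single yaw angle in degrees (positive meaning the head is turned to the right), the tolerance is arbitrary, and the class name `ViewpointSequencer` is hypothetical. In the description, the viewpoint comes from the trained model rather than from an explicit angle.

```python
class ViewpointSequencer:
    """Walk through target viewpoints, capturing at each one in order."""

    def __init__(self, targets, tolerance=5.0):
        self.targets = list(targets)  # target yaw angles in degrees
        self.tolerance = tolerance    # assumed acceptance window
        self.captured = []

    @property
    def done(self):
        return len(self.captured) == len(self.targets)

    def update(self, yaw):
        """Process one detected yaw and return an (action, detail) pair."""
        if self.done:
            return ("done", None)
        target = self.targets[len(self.captured)]
        if abs(yaw - target) <= self.tolerance:
            # The viewpoint matches: this is the capture trigger.
            self.captured.append(target)
            return ("capture", target)
        # Otherwise guide the user toward the current target.
        return ("rotate", "right" if yaw < target else "left")


seq = ViewpointSequencer([0, -90, 90])  # front, then the two side views
events = [seq.update(yaw) for yaw in [20, 3, -40, -88, 10, 91, 0]]
```

Each `("rotate", ...)` result corresponds to feedback on the output unit, and each `("capture", ...)` result to the control unit triggering the camera before moving on to the next viewpoint.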

The control unit 155 may also control the overall operation of the skin diagnosis terminal 150. For example, the control unit 155 controls the detection unit 153 so that, based on the artificial neural network model for face image acquisition, when the desired face image quality is detected in the image data of the preview image received in real time from the camera unit 151, it determines whether the face in the camera image is at the specific face angle that serves as the preset trigger for capturing the user's face image, and controls the camera unit 151 accordingly.

Specifically, the control unit 155 extracts features from a plurality of frames of the image data to generate feature maps and performs computations on them to track, in real time, the position of the user's face and the quality of the face image in the camera image. When the quality criterion is met, it determines, based on the viewpoint information of the user's face image, whether the face is at the specific angle that serves as the preset trigger, and controls the camera unit 151 accordingly. Thus the user does not need to hold the face still at a specific angle; as the user moves the face following the feedback from the output unit, reaching the specific face angle triggers the capture. However, the invention is not limited thereto. Alternatively, the viewpoint information of the user's face image in the image data may be detected and, when it is within a predetermined error range of the specific viewpoint, the camera unit may be controlled to start video recording; once a video frame recorded while the user moves the face contains the face image at the specific viewpoint, recording is stopped and the frame containing that face image is extracted.
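The video-based alternative can be sketched as follows, again under the assumption that the viewpoint is summarized as a per-frame yaw angle. The two thresholds are hypothetical: `start_range` stands for the predetermined error range that starts recording, and `hit_range` is an added assumption for deciding that a frame shows the specific viewpoint.

```python
def extract_trigger_frame(yaws, target, start_range=15.0, hit_range=2.0):
    """Return the index of the frame to extract, or None if none qualifies.

    `yaws` holds the detected yaw of each successive preview frame.
    """
    recording = False
    for index, yaw in enumerate(yaws):
        error = abs(yaw - target)
        if not recording and error <= start_range:
            recording = True  # close enough: start video recording
        if recording and error <= hit_range:
            return index      # stop recording and extract this frame
    return None


frame_index = extract_trigger_frame([40, 30, 14, 8, 1, -5], target=0)
```

Recording here starts at the third frame and the fifth frame is extracted, so the user can sweep through the target angle without having to stop on it.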

The control unit 155 may extract a region of interest (ROI) from the feature map of a frame of the image data, and may set a region of interest covering at least part of the frame as the main region for determining whether the preset trigger for face image capture is present. For example, the control unit 155 may set the main region so that it includes the region of the frame in which the user's face is recognized.

The generation and computation of the feature maps may be implemented by a computation model comprising various computation layers: layers based on a DNN (Deep Neural Network) consisting of an input layer, one or more hidden layers, and an output layer, for example layers based on a convolutional neural network (CNN) and/or a recurrent neural network (RNN), as well as rectified linear unit (ReLU) layers, pooling layers, bias-add layers, softmax layers, and the like.
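For readers unfamiliar with the named layers, the following minimal sketch shows what bias-add, ReLU, max pooling, and softmax compute on a one-dimensional list of feature values. It illustrates the standard operations only; it is not the model of this embodiment, and the input values are made up.

```python
import math


def bias_add(values, bias):
    """Add a constant bias to every feature value."""
    return [v + bias for v in values]


def relu(values):
    """Rectified linear unit: negative values become zero."""
    return [max(0.0, v) for v in values]


def max_pool(values, size=2):
    """Keep the maximum of each consecutive window of `size` values."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


def softmax(values):
    """Turn scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


features = [-1.0, 2.0, 0.5, -3.0]
activated = relu(bias_add(features, 0.5))
pooled = max_pool(activated)
probs = softmax(pooled)
```

A real feature map is two-dimensional with many channels, but each layer applies the same element-wise or window-wise rule shown here.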

Here, an artificial neural network (ANN) is a computation model, implemented in software or hardware, that imitates the computational capability of a biological system using a large number of artificial neurons connected by connection lines. The artificial neural network uses artificial neurons that simplify the function of biological neurons, interconnected through connection lines having connection strengths, to perform human-like cognition or learning. The computation model used by the detection unit 153 may be an artificial neural network model trained on the quality and viewpoint information of face images.

That is, when the quality information of the pre-learned face image is detected through the camera and satisfies the predetermined criterion, the control unit 155 captures the user's face images at different viewpoints and transmits the acquired face images to the central server 300. In this embodiment, the quality and the viewpoint information of the face image are both detected based on the face image acquisition model, but the invention is not limited thereto. For example, the model may be divided into a quality learning model trained on face image quality and a viewpoint learning model trained on face image viewpoint information. That is, using an ensemble technique, the quality of the face image may first be detected based on the quality learning model, after which camera capture is controlled at the specific viewpoint learned by the viewpoint learning model. A separate learning model may also be used for capturing the face image at each viewpoint.

In addition, the control unit 155 of this embodiment has been described as controlling the camera unit 151 to capture an image when the viewpoint of the user's face image corresponds to a predetermined specific viewpoint from the training data, but the invention is not limited thereto. For example, when a lesion is detected in the user's face image, the control unit 155 may determine the viewpoints to capture according to the detected location and control capture accordingly. Referring to FIG. 4, when a lesion A is detected on the user's left cheek, the viewpoints to capture may be determined as a viewpoint rotated to the right by θ degrees, for example 45 degrees, so that the left cheek faces the camera, and a viewpoint rotated a further 90 degrees to the right, that is, 135 degrees. This serves to determine the size (width) and protrusion (height) of the lesion A for precise skin diagnosis, and makes it easier to assess the condition at the lesion location in the 3D face image conversion performed by the central server 300 described below. When there are lesions at several locations, the viewpoints to capture may be determined according to the number of lesions; lesions clustered in a given region may be grouped, and the viewpoint determined with respect to the center of the grouped region.
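The lesion-based choice of viewpoints can be sketched as below. The sketch assumes each lesion location is reduced to an azimuth in degrees around the head, measured from the nose and positive toward the user's left, so that rotating the head to the right by that angle turns the lesion toward the camera. The grouping rule (`max_gap`) and the function names are hypothetical; the description only says that clustered lesions may be grouped and the viewpoint set from the group center.

```python
import math


def group_lesions(azimuths, max_gap=10.0):
    """Group sorted lesion azimuths whose neighbors lie within `max_gap` degrees."""
    groups = []
    for azimuth in sorted(azimuths):
        if groups and azimuth - groups[-1][-1] <= max_gap:
            groups[-1].append(azimuth)
        else:
            groups.append([azimuth])
    return groups


def capture_viewpoints(azimuths, max_gap=10.0):
    """Return (facing, profile) rotation pairs, one per lesion group.

    `facing` turns the group center toward the camera; `profile` rotates a
    further 90 degrees in the same direction to show the lesion's height.
    """
    views = []
    for group in group_lesions(azimuths, max_gap):
        center = sum(group) / len(group)
        views.append((center, center + math.copysign(90.0, center)))
    return views


single = capture_viewpoints([45.0])
several = capture_viewpoints([40.0, 45.0, 50.0, -30.0])
```

For a single lesion at 45 degrees this reproduces the 45-degree and 135-degree viewpoints of the example above; three lesions clustered around the left cheek share one pair of viewpoints.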

The location of the lesion A can be detected using the face image acquisition model by additionally training the model on lesion information in its training stage. Alternatively, for precise diagnosis of the forehead and the left and right cheeks, where lesions commonly occur, the viewpoints to capture may be set to seven in total: the front of the face, both sides, the views rotated 45 and 135 degrees to the right, and the views rotated 45 and 135 degrees to the left.

FIG. 4 has been described assuming a camera on the front of the skin diagnosis terminal 150 facing the user's face, on the premise that no lens aberration occurs, but the invention is not limited thereto. For example, for a skin diagnosis terminal 150 operating with a single camera at its center, the face rotation angle required to measure the height of a lesion may be determined differently depending on the distance from the user. Alternatively, at a predetermined distance, feedback may be provided so that the user shifts the face left or right until the lesion is at the shortest distance from the camera, after which the image is captured.

The skin diagnosis terminal 150 of this embodiment has been described as including the camera unit 151, the lighting unit 152, the detection unit 153, the output unit 154, and the control unit 155, but the invention is not limited thereto. Because the skin diagnosis system 1000 of this embodiment captures the user's face image based on the face image acquisition model generated by the central server 300, any type of skin diagnosis terminal may be used. For example, the first skin diagnosis terminal 150a may include only the camera unit 151 and the lighting unit 152, with the functions of the detection unit 153, the output unit 154, and the control unit 155 performed by the user terminal 100. Alternatively, the second skin diagnosis terminal 150b may include only the camera unit 151, the lighting unit 152, and the output unit 154, with the functions of the detection unit 153 and the control unit 155 performed by the user terminal 100. Alternatively, the user terminal 100 may perform the functions of all components without a separate skin diagnosis terminal 150. In this case, the user terminal 100 is assumed to be a smartphone.

Hereinafter, the conversion of the user's 2D face images (VIMG) into a 3D face image (3DVIMG) in the central server 300 will be described with reference to FIGS. 5 to 7.

FIG. 5 is a block diagram illustrating the central server, which converts received 2D face images into a 3D face image using deep learning. FIG. 6 is a block diagram illustrating the generation of the face image acquisition model using deep learning. FIG. 7 illustrates an example of converting a 2D face image into a 3D face image.

Referring to FIG. 5, the central server 300 includes an input buffer 310, at least one processing element 320, and an output buffer 330. The central server 300 may further include a parameter buffer 340 and a memory 350.
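The block structure described above can be pictured as a small data-flow object. The following is only a minimal sketch under stated assumptions: the class name `CentralServerSketch`, its method names, and the `convert` callback signature are hypothetical and do not appear in the specification; the fields merely mirror the roles of the input buffer 310, processing element 320, output buffer 330, and parameter buffer 340.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Deque, Dict, List


@dataclass
class CentralServerSketch:
    """Hypothetical stand-in for the central server 300 of FIG. 5."""

    # Plays the role of the processing element 320: takes a list of 2D
    # views plus model parameters and returns some 3D result.
    convert: Callable[[List[Any], Dict[str, float]], Any]
    # Input buffer 310: queued sets of 2D face images from different viewpoints.
    input_buffer: Deque[List[Any]] = field(default_factory=deque)
    # Output buffer 330: converted results waiting to be read out.
    output_buffer: Deque[Any] = field(default_factory=deque)
    # Parameter buffer 340: parameters of the trained model.
    parameter_buffer: Dict[str, float] = field(default_factory=dict)

    def submit(self, views: List[Any]) -> None:
        """Queue one set of 2D views for conversion."""
        self.input_buffer.append(views)

    def step(self) -> bool:
        """Convert one queued set of views; return False if nothing was queued."""
        if not self.input_buffer:
            return False
        views = self.input_buffer.popleft()
        self.output_buffer.append(self.convert(views, self.parameter_buffer))
        return True
```

In this sketch the conversion logic is injected as a callback, so the buffering can be exercised independently of any particular neural network.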

The input buffer 310 may receive a plurality of face images (VIMG) of the user captured from different viewpoints.

The processing element 320 converts the plurality of 2D face images (VIMG) from different viewpoints into a 3D face image (3DVIMG) according to embodiments of the present invention.

Specifically, the 2D face image (VIMG) is an image of the user's face, which is the object being photographed, captured with an ordinary camera. It may contain the x-coordinate, the y-coordinate, and the grayscale value (or, for a color image, the R, G, and B values) of the object, that is, f(x, y). This is because a captured face image is represented on a two-dimensional plane. In other words, the 2D face image (VIMG) contains the coordinate values of the object on a two-dimensional plane but does not contain the depth of the object (that is, three-dimensional position information).

The processing element 320 may include an image conversion unit 321. Using a pre-trained artificial neural network model, the image conversion unit 321 may infer, from the 2D face images (VIMG) received from the input buffer 310, the coordinate value in three-dimensional space that is not contained in any individual 2D face image, namely the depth value.

Accordingly, a 3D image (3DVIMG) containing the three-dimensional position f(x, y, z) of the object can be generated from an image containing only the two-dimensional position f(x, y) of the object.
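The step from f(x, y) to f(x, y, z) can be illustrated by attaching an already inferred depth value to each pixel. The sketch below is illustrative only: the function name `lift_to_3d` and the list-of-rows image layout are assumptions, and the depth map is taken as given rather than produced by the neural network described in the text.

```python
def lift_to_3d(image, depth):
    """Combine a 2D intensity image f(x, y) with a depth map z(x, y).

    Both arguments are lists of rows with identical shape. The result is
    a list of (x, y, z, value) tuples, one per pixel, in row-major order.
    """
    if len(image) != len(depth) or any(
        len(row) != len(drow) for row, drow in zip(image, depth)
    ):
        raise ValueError("image and depth map must have the same shape")
    points = []
    for y, (row, drow) in enumerate(zip(image, depth)):
        for x, (value, z) in enumerate(zip(row, drow)):
            points.append((x, y, z, value))
    return points
```

For a 2x2 image this yields four points, each carrying the original pixel value together with its inferred z-coordinate.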

That is, the processing element 320 can generate the 3D face image (3DVIMG), which contains the three-dimensional coordinates of the object, from the 2D face images (VIMG), in which each frame contains only the two-dimensional coordinates of the object.

For example, the processing element 320 may extract features from each of the 2D face images (VIMG) to generate feature maps, and may operate on these feature maps to infer the depth of each frame. The generation of and operations on the feature maps may be implemented by a computation model that includes various computation layers, such as layers based on a convolutional neural network (CNN) and/or a recurrent neural network (RNN), a rectified linear unit (ReLU) layer, a pooling layer, a bias-add layer, and a softmax layer.
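The layer types named above can each be written in a few lines. The following is a minimal pure-Python sketch of the individual operations, not the patented model: the function names are hypothetical, the convolution is the "valid" cross-correlation form commonly used in deep learning frameworks, and no trained weights are implied.

```python
import math


def conv2d_valid(image, kernel, bias=0.0):
    """Slide the kernel over the image (no padding) and add a bias term."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            + bias
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Rectified linear unit: clamp negative responses to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    return [
        [
            max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
            for j in range(0, len(fmap[0]) - 1, 2)
        ]
        for i in range(0, len(fmap) - 1, 2)
    ]


def softmax(values):
    """Numerically stable softmax over a flat list of scores."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```

A feature map in this sense is simply the output of `conv2d_valid` followed by `relu`; stacking several such stages and pooling between them gives the kind of layered computation model the paragraph refers to.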

The 3D face image (3DVIMG) is a 3D image obtained by three-dimensional modeling using the 2D face images (VIMG) from different viewpoints. Accordingly, the 3D face image (3DVIMG) may contain three-dimensional coordinates (x-, y-, and z-coordinates) at each viewpoint, that is, f(x, y, z).

Because the 3D face image (3DVIMG) contains the three-dimensional coordinates of the photographed object, the user's face image can be rotated and viewed from any angle the user wishes, as shown in FIG. 7. In other words, the viewpoint of the 3D face image (3DVIMG) can be changed freely to any position and direction the user desires.
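Changing the viewpoint of a set of 3D coordinates amounts to applying a rotation. As a hedged illustration, the sketch below rotates points about the vertical (y) axis using the standard rotation matrix; the function name `rotate_about_y` and the (x, y, z) tuple layout are assumptions made for this example.

```python
import math


def rotate_about_y(points, degrees):
    """Rotate (x, y, z) points about the y-axis by the given angle.

    Uses the standard right-handed rotation matrix
        [ cos t, 0, sin t]
        [     0, 1,     0]
        [-sin t, 0, cos t]
    """
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]
```

Rotating the face point set by a left or right angle in this way corresponds to viewing the same 3D face from a different horizontal viewpoint.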

Conventionally, a 3D image was generated by combining a plurality of images of the same object captured from different viewpoints, using techniques such as stitching. According to this embodiment, by contrast, a 3D face image can be generated from 2D face images using the processing element 320, which implements a pre-trained computation model.

As described above, this is possible because, much as a person infers the overall shape of an object, including the parts that are not visible, from an image taken at a single viewpoint, a pre-trained artificial neural network model can infer the overall facial contour (that is, infer depth) from an image of the face at a particular viewpoint. A plurality of 2D face images from different viewpoints are input because the more viewpoints are provided, the more precise the conversion into a 3D face image becomes. In other words, the present invention receives 2D face images, which are easy to request from the user, and on that basis obtains a 3D face image that allows precise diagnosis from any viewpoint. As a result, precise skin diagnosis is possible even with a relatively inexpensive skin diagnosis terminal, and the effects of shaking caused by changes in face angle and of external light can be minimized. In addition, data can be collected in a uniform manner for precise skin diagnosis and data utilization.

In one embodiment, the processing element 320 may be implemented with at least one of various processing devices, such as a central processing unit (CPU), a graphics processing unit (GPU), a neural processing unit (NPU), a digital signal processor (DSP), or an image signal processor (ISP), in order to perform the operations described above. Depending on the embodiment, the processing element 320 may include a plurality of processing devices of the same type (homogeneous) or a plurality of processing devices of different types (heterogeneous).

In one embodiment, the processing element 320 may include a plurality of processor cores in order to process the operations described above in parallel.

The output buffer 330 may store and output the 3D face image (3DVIMG) produced as the result of the operations of the processing element 320. For example, the output buffer 330 may include at least one register.

The parameter buffer 340 may store a plurality of parameters and/or hyperparameters used by the processing element 320 to perform the operations described above. For example, the parameter buffer 340 may store the parameters of the artificial neural network model obtained through training.

The memory 350 may temporarily or persistently store data that has been or will be processed by the processing element 320. For example, the memory 350 may include at least one of volatile memory, such as dynamic random access memory (DRAM) or static random access memory (SRAM), and non-volatile memory, such as flash memory, phase-change random access memory (PRAM), resistive random access memory (RRAM), nano floating gate memory (NFGM), polymer random access memory (PoRAM), magnetic random access memory (MRAM), or ferroelectric random access memory (FRAM). Depending on the embodiment, the memory 350 may also be implemented as a mass storage device such as a solid state drive (SSD), an embedded multimedia card (eMMC), or universal flash storage (UFS).

Although not shown, the central server may further include a control unit that controls the overall operation of the components involved in converting the 2D face image (VIMG) into the 3D face image (3DVIMG), a task manager that manages the assignment of specific tasks, and the like.

FIG. 6 is a block diagram illustrating the generation of a face image acquisition model using deep learning technology.

Referring to FIGS. 5 and 6, the processing element 320 may further include a learning model generation unit 322. The learning model generation unit 322 is configured to learn, using artificial intelligence, the training 2D face images (TVIMG) and the corresponding training 3D face images (3DTVIMG).

Here, the training 2D face images (TVIMG) and training 3D face images (3DTVIMG) may consist of training images of the quality required for precise skin diagnosis. That is, the training images may include high-quality face images in which the position, size, brightness, focus, and so on of the face are suitable for precise skin diagnosis. By selecting, during pre-processing before training, high-quality training data that permits precise skin diagnosis, the quality of the user's face image can later be compared against it. Training data that serves as a control group for comparing each quality item may also be learned. For example, supervised learning may be applied to data that can be used to give feedback on face image quality, such as cases in which the amount of light is adequate over the whole face image and cases in which shadows occur in part of it. In addition, information about lesions may be learned, under supervision, for each face image in the training data. Accordingly, a lesion can be detected in the detection step, and an image can be captured after calculating the viewpoint from which the lesion can be examined closely.

The corresponding training 3D face image (3DTVIMG) may be obtained from training 2D face images (TVIMG) of the training subject's face captured from several viewpoints. For example, the training 2D face images (TVIMG) may be obtained using a plurality of cameras positioned at different viewpoints. The training 2D face images (TVIMG) may include two or more sets of images of the same training subject captured over the same period from different viewpoints using at least two cameras. For example, when four cameras are used, the training 2D face images (TVIMG) include first to fourth training images (TVIMG1, TVIMG2, TVIMG3, TVIMG4) of the same training face captured from different viewpoints. When the viewpoints from which the user's face image is to be captured are predetermined, those viewpoints may be included among the viewpoints of the training images. A capture viewpoint corresponds to a face angle, and the face angle includes a left-right angle and an up-down angle. For example, the up-down angle of the face in the training 2D face images (TVIMG) may be fixed at a reference angle.

That is, the training 2D face images (TVIMG) may be obtained using at least two cameras. For example, the training 2D face images (TVIMG) may include a first training 2D face image (TVIMG1) captured with a first camera, a second training 2D face image (TVIMG2) captured with a second camera, a third training 2D face image (TVIMG3) captured with a third camera, and a fourth training 2D face image (TVIMG4) captured with a fourth camera.

The training subject may be photographed simultaneously with the first to fourth cameras to obtain training images of the subject from different angles. If the axes of the image plane of the first camera are defined as the x-axis and the y-axis, then, viewed in the xz-plane and centered on the training subject, the angle between adjacent cameras among the first to fourth cameras may be set to 45, 90, or 135 degrees, with the first and second cameras facing each other and the third and fourth cameras facing each other.
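One reading of this arrangement is that the first and second cameras sit on opposite sides of the subject, the third and fourth likewise, and the second pair is offset from the first by 45, 90, or 135 degrees. The sketch below checks the resulting angles between adjacent cameras under that assumption; the function names and the choice of 0 degrees for the first camera are hypothetical.

```python
def camera_azimuths(offset_deg):
    """Azimuths (degrees, in the xz-plane) of four cameras around the subject.

    cam1/cam2 face each other (180 degrees apart), as do cam3/cam4.
    offset_deg is the assumed angle from cam1 to cam3.
    """
    if offset_deg not in (45, 90, 135):
        raise ValueError("offset must be 45, 90, or 135 degrees")
    return {
        "cam1": 0.0,
        "cam2": 180.0,
        "cam3": float(offset_deg),
        "cam4": (offset_deg + 180.0) % 360.0,
    }


def adjacent_gaps(azimuths):
    """Angles between neighbouring cameras, walking once around the circle."""
    ordered = sorted(azimuths.values())
    n = len(ordered)
    return [(ordered[(i + 1) % n] - ordered[i]) % 360.0 for i in range(n)]
```

With a 90-degree offset all four gaps are 90 degrees; with a 45- or 135-degree offset the gaps alternate between 45 and 135 degrees, which matches the three values listed in the text.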

The image conversion unit of this embodiment, which converts an image into a 3D image, repeatedly learns the same motions of a plurality of people, and can thereby infer, from a single 2D face image containing only two-dimensional coordinates, a 3D face image in which a single frame contains three-dimensional coordinates. That is, a 3D face image can be inferred from a frontal 2D face image alone.

However, for precise skin diagnosis, the skin diagnosis system 1000 of this embodiment is configured to acquire a plurality of 2D face images of the user from different viewpoints. In the 3D conversion, the 3D face image is inferred from 2D face images input at specific, mutually different viewpoints, and at the viewpoints that were input, a more precise 3D face image is obtained through post-processing in which the face image information is combined as a weighted sum. Precise skin diagnosis therefore becomes possible in the 3D face image at each of the input viewpoints.
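The specification does not state how the weights of this weighted sum are chosen. Purely as an illustration, the sketch below fuses several per-view estimates of the same quantity (for example, the depth of one surface point) with a normalized weighted sum; the function name `fuse_views` and the idea of weighting by per-view confidence are assumptions.

```python
def fuse_views(estimates, weights):
    """Normalized weighted sum of per-view estimates of one quantity.

    estimates: one value per input viewpoint.
    weights:   non-negative weights (assumed here to reflect how well
               each viewpoint sees the point); they need not sum to 1.
    """
    if not estimates or len(estimates) != len(weights):
        raise ValueError("need one weight per estimate")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(e * w for e, w in zip(estimates, weights)) / total
```

For instance, `fuse_views([10.0, 20.0], [1.0, 3.0])` gives 17.5, pulling the result toward the view with the larger weight.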

For example, to make it easier to assess the condition at the location of a lesion in the 3D face image conversion, when the location of a lesion is detected on the user's face image in the face image detection step of the skin diagnosis terminal, the viewpoint to be captured may be determined according to the detected location, and capture may be controlled accordingly. That is, the viewpoints required as input can be determined so as to provide precise skin diagnosis tailored to the user.

The training 3D face image (3DTVIMG) may be obtained by using the pre-trained artificial neural network model of the image conversion unit 321 to infer the coordinate value in three-dimensional space that is not contained in the 2D face image, namely the depth value.

Here, the viewpoints from which the training face images of the same subject were captured may be managed as specific viewpoints. Because the 2D face images at those specific viewpoints are what is learned as training data, 2D face images of the user from the same specific viewpoints may be required for conversion into the user's 3D face image. As described above, the skin diagnosis terminal 150 may control the camera to capture an image, using the detection of the face at such a specific viewpoint as the trigger. However, the invention is not limited thereto. When training 2D face images and training 3D face images have been learned at many viewpoints, capture of the face image by the skin diagnosis terminal 150 is not restricted to specific viewpoints.
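The trigger described here can be thought of as a comparison between the currently detected face angle and a required viewpoint. The following is a minimal sketch under assumed conventions: the function name `should_capture`, the yaw/pitch representation in degrees, and the 5-degree tolerance are all hypothetical and are not taken from the specification.

```python
def should_capture(yaw, pitch, target_yaw, target_pitch=0.0,
                   tol=5.0, quality_ok=True):
    """Return True when the detected face angle matches the target viewpoint.

    yaw / pitch:   detected left-right and up-down face angles (degrees).
    target_*:      the specific viewpoint required for the 3D conversion.
    tol:           assumed angular tolerance, in degrees.
    quality_ok:    result of a separate image quality check.
    """
    return (
        quality_ok
        and abs(yaw - target_yaw) <= tol
        and abs(pitch - target_pitch) <= tol
    )
```

A terminal could evaluate such a predicate on each incoming frame and fire the shutter on the first frame for which it holds.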

The training 3D face image (3DTVIMG) is a 3D image obtained by three-dimensional modeling using the training 2D face images (TVIMG), and may contain the three-dimensional coordinates (x-, y-, and z-coordinates) of the face image at several viewpoints.

The training 3D face image (3DTVIMG) may be generated using conventional techniques for converting two or more images captured from different angles into a 3D image, for example a stitching technique that joins images captured by several cameras so that they can be viewed from any direction over 360 degrees.

Alternatively, the training 3D face image (3DTVIMG) may be obtained with a three-dimensional precision skin diagnostic device, a piece of specialized equipment that rotates a camera around a fixed face to acquire continuous 3D face images from many viewpoints. Such equipment may, for example, be the expensive equipment used in dermatology clinics for precise skin diagnosis, in which a lighting device keeps the entire skin surface uniformly illuminated at a constant light level during capture. That is, 3D face images, as professional data, may be provided as training data from an external server connected to a dermatology clinic or the like. Once such professional training 3D face images are obtained, training 2D face images can be extracted from them for each viewpoint, so that training can be carried out at every viewpoint.

That is, the learning model generation unit 322 may train the artificial neural network model using the training 2D face images (TVIMG) and the training 3D face images (3DTVIMG). In other words, supervised learning may be performed using the training 2D face images (TVIMG) and the training 3D face images (3DTVIMG) as the training data set. Training data sets may be obtained for a plurality of training subjects, and training may be repeated over them. The face image acquisition model is generated in this way.
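Supervised learning on paired inputs and targets can be illustrated with a deliberately tiny model. The sketch below fits a single linear unit, depth = w * x + b, to (input, target depth) pairs by gradient descent on the mean squared error. It only stands in for the idea of training on paired 2D and 3D data; the function name, the scalar input, the learning rate, and the epoch count are assumptions, and the actual model in the specification is a neural network.

```python
def train_linear_depth(samples, lr=0.3, epochs=2000):
    """Fit depth = w * x + b to (x, depth) pairs by batch gradient descent.

    samples: list of (x, z) pairs, where x is a scalar feature taken from
             a 2D image and z is the target depth from the paired 3D data.
    Returns the learned (w, b).
    """
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, z in samples:
            err = (w * x + b) - z          # prediction error for this pair
            grad_w += 2.0 * err * x / n    # d(MSE)/dw
            grad_b += 2.0 * err / n        # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

The returned pair (w, b) plays the role of the learned parameters that, in the architecture of FIG. 5, would be held in the parameter buffer 340.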

At least one of the components of the central server 300 of this embodiment, and the components that control them, may be implemented in a recording medium readable by a computer or similar device, using a hardware component, a software component executed by a hardware component such as a processor, or a combination of the two.

In a hardware implementation, at least one of application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, and other electrical units for performing functions may be used. In particular, the detection unit 153 may be manufactured as a dedicated hardware chip for artificial intelligence (AI), or may be manufactured as part of an existing general-purpose processor (for example, a CPU or application processor) or a graphics-only processor (for example, a GPU) and mounted on the vision server 15.

In a software implementation, the components may be implemented as separate software modules. Each of the software modules may perform one or more of the functions and operations described in this specification, and the software code may be implemented as a software application written in a suitable programming language. The software code may be stored in the memory of the vision server 15 and executed by a separate control unit. In particular, when the detection unit 153 is implemented as a software module (or a program module containing instructions), the software module may be stored in a non-transitory computer-readable recording medium. In this case, the software module may be provided by an operating system (OS) or by a given application. Alternatively, part of the software module may be provided by the OS and the remainder by a given application.

FIG. 8 is a flowchart illustrating a method of converting a 2D face image (VIMG) into a 3D face image (3DVIMG) according to embodiments of the present invention.

Referring to FIG. 8, the method of converting a 2D face image (VIMG) into a 3D face image (3DVIMG) may include a training face image (TVIMG) collection step (S100), a model generation step (S200), a user 2D face image (VIMG) input step (S300), an image conversion step (S400), and a 3D face image (3DVIMG) output step (S500).

In the training face image (TVIMG) collection step (S100), training face images (TVIMG) may be collected using at least two cameras. The training subject may be photographed simultaneously with the first to fourth cameras to obtain training face images (TVIMG) of the subject from different angles. Here, centered on the training subject, the angle between adjacent cameras among the first to fourth cameras may be 45, 90, or 135 degrees.

Training 3D face images (3DTVIMG) may also be collected in this step.

Alternatively, the training 3D face image (3DTVIMG) may be obtained with a three-dimensional precision skin diagnostic device, a piece of specialized equipment that rotates a camera around a fixed face to acquire continuous 3D face images from many viewpoints. Such equipment may, for example, be the expensive equipment used in dermatology clinics for precise skin diagnosis, in which a lighting device keeps the entire skin surface uniformly illuminated at a constant light level during capture. That is, 3D face images, as professional data, may be provided as training data from an external server connected to a dermatology clinic or the like. Once such professional training 3D face images are obtained, a training 2D face image at any specific viewpoint can be obtained from them, so that training can be carried out at every viewpoint.

In the model generation step (S200), a 3D-modeled training 3D face image is generated using the training face images (TVIMG), and the artificial neural network model may be trained using the training face images (TVIMG) and the training 3D face images (3DTVIMG). For example, the same motions of a plurality of people may be learned repeatedly, as described in detail with reference to FIGS. 1 to 7. The face image acquisition model for precise skin diagnosis is generated in this step.

In the user 2D face image (VIMG) input step (S300), the face image (VIMG) to be converted to 3D may be input to the artificial neural network model. The face image (VIMG) may be a plurality of face images captured from different viewpoints, each of which may contain the two-dimensional coordinates of the photographed object.

In the image conversion step (S400), the 3D face image (3DVIMG) may be generated from the 2D face image (VIMG) using the trained artificial neural network model. The 3D face image (3DVIMG) may contain the three-dimensional position, that is, the three-dimensional coordinates, of the object photographed in the face image (VIMG). Accordingly, the viewpoint of the 3D face image (3DVIMG) can be varied as the user requires.

In the output step (S500), the 3D face image (3DVIMG) may be output.

FIG. 9 is a flowchart illustrating a method of capturing a face image for skin diagnosis according to an embodiment of the present invention.

Referring to FIGS. 1 to 9, the method of capturing a face image for skin diagnosis may include a face image quality detection step (S210), an image quality control step (S220), a face image acquisition step (S230), and a face image transmission step (S240).

The method of capturing a face image for skin diagnosis is configured to capture the user's face image based on the face image acquisition model described with reference to FIG. 8.

In the face image quality detection step (S210), the quality of the image data containing the face image is detected for accurate skin diagnosis. Using a deep network and based on the face image acquisition model, quality information such as the position, size, brightness, and focus of the face is automatically extracted from the image data input from the camera unit 151. That is, the quality of the face image in the received image data may be detected based on the quality information of the face images that served as training data for the face image acquisition model.

For example, based on the detected face image, the position of the face is detected, such as whether the face lies at the center of the image data and whether part of the facial outline is cut off. In addition, based on the size of the face images in the training data, it is detected whether the face in the image data has the desired size. For example, size information may be extracted from the distance between the camera and the face, which is the object to be detected. In addition, based on the per-region illuminance and sharpness of the face images in the training data, the light level of the face image in the image data under the external environment is detected, together with whether and where shading (shadows), light blur, or light reflection occurs.

In addition, in this step, it may be detected whether the conditions required for precise skin diagnosis are met, for example, whether the bangs are lifted so that the forehead is exposed, or whether foreign matter is present on the face in the face image. That is, the quality of the face image in the image data may be detected based on the face images used as training data.
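
As a non-limiting illustration, the position and size checks described above could be sketched as follows. The sketch assumes that a face detector has already produced a bounding box for the face; the `Box` type, the function name, and all threshold values are hypothetical and do not appear in the embodiment, which derives these criteria from the trained model rather than from fixed numbers.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Face bounding box in pixels (hypothetical detector output)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def check_face_geometry(box, frame_w, frame_h,
                        center_tol=0.15, min_ratio=0.30, max_ratio=0.70):
    """Return the list of geometry problems found in one frame.

    All thresholds are illustrative assumptions, not values from the source.
    """
    problems = []
    # Cropped: some edge of the face box falls outside the frame.
    if (box.x < 0 or box.y < 0
            or box.x + box.w > frame_w or box.y + box.h > frame_h):
        problems.append("cropped")
    # Off-center: normalized face center is too far from the frame center.
    dx = abs((box.x + box.w / 2) / frame_w - 0.5)
    dy = abs((box.y + box.h / 2) / frame_h - 0.5)
    if dx > center_tol or dy > center_tol:
        problems.append("off_center")
    # Size: face height relative to frame height, used as a distance proxy.
    ratio = box.h / frame_h
    if ratio < min_ratio:
        problems.append("too_far")
    elif ratio > max_ratio:
        problems.append("too_close")
    return problems
```

An empty list would indicate that the face is centered, uncropped, and within the assumed size range; brightness, focus, and occlusion checks would be handled separately.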

In the image quality control step (S220), a feedback signal is output to the output unit 154 based on the information detected in the quality detection step (S210).

For example, in this step, if the detected face image shows that the face is not located at the center of the image data, feedback requesting the user to move the face may be output through the output unit 154. In addition, based on the size of the face image in the image data or the distance to the face, feedback instructing the user to move closer or farther away may be output.

In addition, when a low light level or shading (a shadow) is detected in the face image in the image data, the lighting unit 152 is controlled to adjust the light amount or the irradiation angle so that LED light is directed at the desired location on the user's face. For example, when a plurality of LEDs are installed at different positions, the light amount of each LED may be controlled individually according to where the shadow occurs. The LEDs to be controlled individually may be programmed in advance according to the shadow location. Accordingly, insufficient light or shadows on the user's face image can be corrected by controlling the light. However, the invention is not limited thereto; feedback requesting the user to adjust the light amount or the like manually may be output instead.
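
The pre-programmed, per-location LED control described above might be sketched as a lookup table, as in the following non-limiting example. The region names, the LED indices, the six-LED layout, and the step size are all assumptions made for illustration; the embodiment does not specify them.

```python
# Hypothetical table: which LED indices to brighten for each shadowed region.
SHADOW_TO_LEDS = {
    "forehead": [0, 1],
    "left_cheek": [2],
    "right_cheek": [3],
    "chin": [4, 5],
}


def adjust_leds(levels, shadow_regions, step=0.2):
    """Return new LED levels (0.0 to 1.0) with shadowed regions brightened.

    levels: current level of each LED, indexed as in SHADOW_TO_LEDS.
    shadow_regions: region names in which a shadow was detected.
    step: assumed brightness increment per adjustment.
    """
    new_levels = list(levels)  # leave the caller's list unchanged
    for region in shadow_regions:
        for idx in SHADOW_TO_LEDS.get(region, []):
            # Raise only the LEDs mapped to this region, clamped at full power.
            new_levels[idx] = min(1.0, new_levels[idx] + step)
    return new_levels
```

Regions missing from the table are ignored, so unknown detections leave the lighting unchanged rather than raising an error.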

In addition, when light blur is detected in the face image in the image data, feedback suggesting that the user wash their face before the diagnosis may be output through the output unit 154. When light reflection is detected in the face image, feedback suggesting that the user move to avoid the position where the external light source is reflected may be output through the output unit 154.
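
By way of example only, the mapping from detected quality issues to feedback in step S220 could be organized as below. The issue labels, the message wording, and the function name are hypothetical; only the pairing of each issue with its type of response follows the description above.

```python
# Hypothetical issue labels mapped to user-facing feedback messages.
FEEDBACK = {
    "off_center": "Move your face to the center of the frame.",
    "too_far": "Move closer to the camera.",
    "too_close": "Move farther from the camera.",
    "light_blur": "Wash your face before the diagnosis.",
    "reflection": "Move to avoid the reflection of the external light source.",
}

# Issues handled by controlling the lighting unit instead of prompting the user.
LIGHTING_ISSUES = {"low_light", "shadow"}


def plan_feedback(issues):
    """Split detected issues into user messages and a lighting-control flag."""
    messages = [FEEDBACK[i] for i in issues if i in FEEDBACK]
    adjust_lighting = any(i in LIGHTING_ISSUES for i in issues)
    return messages, adjust_lighting
```

Keeping the lighting issues separate reflects the description, in which low light and shadows are corrected by the lighting unit 152 while the remaining issues are reported through the output unit 154.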

In the face image acquisition step (S230), when the quality information of the user's face image meets the predetermined criterion, the user's face image is captured from different viewpoints based on the face image acquisition model.

That is, when viewpoint information of the user's face image in the image data is detected, feedback asking the user to rotate the face to a viewpoint corresponding to a specific viewpoint of the training data may be output. There are assumed to be two or more specific viewpoints; for example, they may include three viewpoints in total: the user's front, left side, and right side. When the viewpoint of the user's face image corresponds to a predetermined specific viewpoint, the camera unit 151 is controlled to capture an image. When there are a plurality of specific viewpoints, the control unit 155 may capture an image at each specific viewpoint in sequence while outputting feedback that guides the user to rotate the face to the next viewpoint. Accordingly, as shown in FIG. 3, three face images of the user, from the front and from both sides, may be obtained.
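
The sequential capture described above may be illustrated by the following minimal sketch. It assumes that a per-frame yaw estimate of the face is already available, that yaw is expressed in degrees with the front view at 0, and that a fixed tolerance decides when a viewpoint "corresponds"; the sign convention, the tolerance, and the function name are assumptions rather than features of the embodiment.

```python
def capture_sequence(yaw_stream, targets=(0.0, -90.0, 90.0), tol=5.0):
    """Step through the target viewpoints in order over a stream of yaw values.

    yaw_stream: per-frame face yaw estimates in degrees (assumed input).
    targets: viewpoints to capture, in order (front, then both sides).
    tol: assumed angular tolerance for treating a viewpoint as reached.
    Returns (captured, prompts): the (frame index, target) pairs at which
    a capture was triggered, and the targets the user was guided to.
    """
    pending = list(targets)
    captured = []
    prompts = []
    if pending:
        prompts.append(pending[0])  # guide the user to the first viewpoint
    for i, yaw in enumerate(yaw_stream):
        if not pending:
            break  # every viewpoint has been captured
        if abs(yaw - pending[0]) <= tol:
            captured.append((i, pending.pop(0)))  # trigger the camera here
            if pending:
                prompts.append(pending[0])  # guide to the next viewpoint
    return captured, prompts
```

For instance, a yaw stream of `[0, -30, -60, -88, -40, 30, 92]` would trigger captures at frames 0, 3, and 6 under these assumptions.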

In addition, face image acquisition is configured such that, when the desired face image quality is detected, based on the artificial neural network model, in the image data of the preview video received in real time from the camera unit 151, it is determined whether the user's face in the camera image is at a specific angle, which is the preset trigger for capturing the face image, and the camera unit 151 is controlled accordingly.

Specifically, features are extracted from a plurality of frames of the image data to generate feature maps, and these are processed to track the position of the user's face and the quality of the face image within the camera image in real time. When the quality criterion is met, it is determined, based on the viewpoint information of the user's face image, whether the face is at the specific angle that serves as the preset trigger, and the camera unit 151 is controlled accordingly.

For example, a region of interest (ROI) may be extracted from the feature map of a frame of the image data, and the region of interest, which is at least part of the frame, may be set as the main region for determining whether the preset trigger for capturing the face image is present. For example, the control unit 155 may set the main region to include the area of the frame in which the user's face is recognized. This is as described in detail with reference to FIGS. 1 to 7.

In addition, although this step has been described as capturing an image by controlling the camera unit 151 when the viewpoint of the user's face image corresponds to a predetermined specific viewpoint from the training data, the invention is not limited thereto. For example, when the location of a lesion is detected on the user's face image, capture may be controlled by determining the viewpoints to be captured according to the detected location. For example, when a lesion is detected on the user's left cheek, the viewpoints to be captured may be determined as a viewpoint rotated 45 degrees to the right, so that the user's left cheek faces the camera, and a viewpoint rotated 135 degrees to the right. This serves to determine the size (width) and degree of protrusion of the lesion for precise skin diagnosis, so that the condition at the lesion location can be assessed more easily in the 3D face image conversion performed by the central server 300, described later. Meanwhile, the location of the lesion may be detected using the face image acquisition model by additionally learning lesion information in the training stage of the face image acquisition model. Alternatively, for precise diagnosis of the forehead and the left and right cheeks, where lesions commonly occur, the viewpoints to be captured may be determined as seven viewpoints in total: the front of the face, both sides, the sides rotated 45 and 135 degrees to the right, and the sides rotated 45 and 135 degrees to the left.
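
As a non-limiting sketch, the lesion-dependent choice of viewpoints could be expressed as follows. The left-cheek entry (45 and 135 degrees to the right) and the seven-viewpoint preset follow the description above; the mirrored right-cheek entry, the sign convention (positive yaw meaning rotation to the right), and the choice to append lesion views to the three default views are assumptions made for illustration.

```python
# Yaw in degrees; positive means rotated to the right (assumed convention).
DEFAULT_VIEWS = [0, -90, 90]  # front and both sides

LESION_VIEWS = {
    "left_cheek": [45, 135],      # from the description
    "right_cheek": [-45, -135],   # assumed mirror image of the left cheek
}

# Fixed preset from the description: front, both sides, and the 45 and
# 135 degree views on each side, seven viewpoints in total.
SEVEN_VIEW_PRESET = [0, -90, 90, 45, 135, -45, -135]


def viewpoints_for(lesion_regions):
    """Return the viewpoints to capture given the regions with lesions."""
    views = list(DEFAULT_VIEWS)
    for region in lesion_regions:
        for yaw in LESION_VIEWS.get(region, []):
            if yaw not in views:  # avoid capturing the same view twice
                views.append(yaw)
    return views
```

With lesions on both cheeks, this sketch yields the same set of viewpoints as the seven-viewpoint preset.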

In the face image transmission step (S240), when face image quality information matching what was learned in advance is detected through the camera, the user's face images are captured from different viewpoints, and the captured face images from the different viewpoints are transmitted to the central server 300.

The above-described method of capturing a face image according to an embodiment of the present invention may be executed by an application installed by default on the terminal (which may include a program included in a platform or operating system installed by default on the terminal), or by an application (that is, a program) that a participant installs directly on the master terminal through an application providing server such as an application store server or a web server associated with the application or the service. In this sense, the above-described usability test matching method according to an embodiment of the present invention may be implemented as an application (that is, a program) installed by default on the terminal or installed directly by the participant, and may be recorded on a computer-readable recording medium of the terminal or the like.

The foregoing description of the present invention is for illustrative purposes, and those of ordinary skill in the art to which the present invention pertains will understand that it can be readily modified into other specific forms without changing the technical spirit or essential features of the present invention. Therefore, the embodiments described above should be understood as illustrative in all respects and not limiting. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

The present invention described above can be implemented as computer-readable code (or an application or software) on a medium on which a program is recorded. The above-described method of capturing a face image may be realized by code stored in a memory or the like.

The computer-readable medium includes all types of recording devices that store data readable by a computer system. Examples of the computer-readable medium include a hard disk drive (HDD), a solid state disk (SSD), a silicon disk drive (SDD), ROM, RAM, a CD-ROM, magnetic tape, a floppy disk, and an optical data storage device, and also include implementations in the form of a carrier wave (for example, transmission over the Internet). The computer may also include a processor. Accordingly, the above detailed description should not be construed as limiting in any respect and should be considered illustrative. The scope of the present invention should be determined by reasonable interpretation of the appended claims, and all changes within the equivalent scope of the present invention are included in the scope of the present invention.

100: user terminal
150: skin diagnosis terminal
151: camera unit
152: lighting unit
153: detection unit
154: output unit
155: control unit
300: central server

Claims

A skin diagnosis system comprising:
a terminal including a lighting unit capable of irradiating a user with light and a camera unit configured to capture a face image of the user; and
a central server configured to train an artificial neural network model using a plurality of unspecified training 2D face images including predetermined quality information and 3D-modeled training 3D face images corresponding to the training 2D face images, and to convert the user's face image received from the terminal into a three-dimensional (3D) face image based on the trained artificial neural network model,
wherein the terminal is configured to detect, based on the artificial neural network model, quality information of the user's face image in image data input to the camera unit in real time and, when the quality information meets a predetermined criterion, to output feedback guiding the user to rotate the face angle and thereby capture the user's face image at a specific viewpoint, and
wherein the artificial neural network model additionally learns lesion information of the faces included in the training 2D face images, and the terminal detects a lesion included in the user's face image and determines the specific viewpoint based on location information of the lesion.

The skin diagnosis system of claim 1,
wherein the predetermined quality information includes at least one of position, size, brightness, and focus information of the face in the face image.

The skin diagnosis system of claim 1,
wherein the lighting unit is configured such that at least one of a light amount and an irradiation angle is controllable, and, when the quality information does not meet the predetermined criterion, the lighting unit is controlled to adjust the light amount or the irradiation angle.

The skin diagnosis system of claim 1,
wherein the training 3D face image is generated from the training 2D face images of different viewpoints.

The skin diagnosis system of claim 1,
wherein the training 3D face image is a 3D image obtained from a 3D precision skin diagnosis device that acquires continuous 3D face images from various viewpoints by rotating a camera while the face of the training subject is held fixed, and the training 2D face images are extracted from the training 3D face image for each viewpoint.

(Deleted)

The skin diagnosis system of claim 1,
wherein the specific viewpoint includes a plurality of different viewpoints.

The skin diagnosis system of claim 7,
wherein, when lesions are clustered in a certain area, the specific viewpoint is determined based on the center of the area in which the lesions are clustered.

The skin diagnosis system of claim 1,
wherein the artificial neural network model is trained on training 2D face images of different viewpoints including the specific viewpoint.

The skin diagnosis system of claim 1,
wherein the artificial neural network model additionally learns viewpoint information of the training 2D face images, and
wherein the terminal detects viewpoint information of the user's face image in the image data; when the detected viewpoint is within a predetermined error range of the specific viewpoint, controls the camera unit to start video recording; and, when a face image at the specific viewpoint is included in an image frame while the user changes the face angle, stops the video recording and extracts the image frame containing the face image at the specific viewpoint.