KR102100973B1

KR102100973B1 - Machine learning system using joint learning and the method thereof

Info

Publication number: KR102100973B1
Application number: KR1020180008604A
Authority: KR
Inventors: 노용만; 김성태; 이홍주; 위삼 자랄 알하즈 바다르; 김학구; 이영복; 이재욱; 유대훈; 박주경
Original assignee: Genesis Lab, Inc. (주식회사 제네시스랩); Korea Advanced Institute of Science and Technology (한국과학기술원)
Priority date: 2018-01-24
Filing date: 2018-01-24
Publication date: 2020-04-14
Also published as: KR20190090141A

Abstract

The present invention relates to a technique for jointly training a teacher learning module and a student learning module. The method includes: minimizing, by means of loss functions, the errors of, and the feature differences between, a teacher learning module and a student learning module that branch from a shared engine extracting features of input data; and jointly training the teacher learning module and the student learning module with the loss functions.

Description

MACHINE LEARNING SYSTEM USING JOINT LEARNING AND THE METHOD THEREOF

The present invention relates to a machine learning system using joint learning and a method thereof, and more particularly, to a technique for jointly training a teacher learning module and a student learning module.

Facial landmark detection (FLD) is a method of localizing key points around facial parts such as the eyes, nose, nose bridge, mouth, eyebrows, and facial contour. Because FLD plays an important role in many face-related applications, it has received considerable attention in face analysis. For example, facial landmarks are used for face verification and pose composition in face recognition and head-pose estimation. In addition, recently studied generative adversarial network (GAN) models generate face images from facial landmarks.

Most previous studies were based on hand-crafted FLD methods. For example, in the paper by Liang et al. (Liang, L., Xiao, R., Wen, F., Sun, J.: Face alignment via component-based discriminative search. Computer Vision – ECCV 2008, pp. 72–85 (2008)), facial landmark points were refined through iterative optimization. Burgos-Artizzu et al. (Burgos-Artizzu, X.P., Perona, P., Dollár, P.: Robust face landmark estimation under occlusion. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1513–1520 (2013)) proposed a regression-based approach called RCPR.

More recent FLD research has focused on convolutional neural networks (CNNs). For example, Sun et al. (Sun, Y., Wang, X., Tang, X.: Deep convolutional network cascade for facial point detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3476–3483 (2013)) proposed a three-stage cascaded CNN for FLD, and Zhang et al. (Zhang, Z., Luo, P., Loy, C.C., Tang, X.: Learning deep representation for face alignment with auxiliary attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 918–930 (2016)) proposed a multi-task learning method called TCDCN. However, most CNN-based FLD methods require a large number of network parameters and a correspondingly high computational cost, which makes them ill-suited to mobile applications. Such applications call for compact neural networks with few parameters that can run in resource-constrained environments.

Recent research has focused on the memory and computational cost of CNNs. For example, Howard et al. (Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)) proposed MobileNets, which are built from depthwise separable convolutions. The paper showed that factoring a standard convolution into a depthwise convolution followed by a 1×1 pointwise convolution reduces both the number of convolution filter parameters and the computational complexity.
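The parameter saving can be checked with simple arithmetic. This minimal sketch counts convolution weights (biases omitted) for one illustrative layer; the layer size is an assumption, not a value taken from the cited paper.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (biases omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise separable convolution (biases omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 32, 64  # illustrative layer size (assumption)
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(std, sep, f"{std / sep:.1f}x fewer weights")
```

For this layer the separable version needs 2,336 weights instead of 18,432, roughly an 8x reduction, in line with the 1/c_out + 1/k² ratio reported for MobileNets.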

In addition, Lee et al. (Lee, T.K., Baddar, W.J., Kim, S.T., Ro, Y.M.: Convolution with logarithmic filter groups for efficient shallow CNN. arXiv preprint arXiv:1707.09855 (2017)) proposed a compact convolutional layer with logarithmic filter groups. Lin et al. (Lin, M., Chen, Q., Yan, S.: Network in network. arXiv preprint arXiv:1312.4400 (2013)) proposed global average pooling to replace the fully connected layer, thereby reducing the number of parameters.
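Here is a back-of-the-envelope comparison of the parameter cost of a fully connected head and a global-average-pooling head. This sketch assumes a hypothetical 7 × 7 × 256 final feature map regressing 68 (x, y) landmarks (136 outputs). In Network in Network, the pooled maps feed the output directly, so the c → n_out linear map used here stands in for the preceding 1 × 1 convolution.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Average each channel over its spatial positions (no learnable parameters)."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]


def fc_head_params(h: int, w: int, c: int, n_out: int) -> int:
    """Dense layer on the flattened h*w*c feature map (weights + biases)."""
    return h * w * c * n_out + n_out


def gap_head_params(c: int, n_out: int) -> int:
    """GAP down to c values, then a c -> n_out linear map (weights + biases)."""
    return c * n_out + n_out


if __name__ == "__main__":
    print(global_average_pool([[[1.0, 2.0], [3.0, 4.0]]]))  # one channel -> [2.5]
    # Hypothetical head: 7x7x256 features regressing 136 landmark coordinates.
    print(fc_head_params(7, 7, 256, 136), gap_head_params(256, 136))
```

Because pooling removes the spatial dimensions before any weights are applied, the head shrinks from about 1.7 million to about 35 thousand parameters in this example.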

Besides convolution techniques such as replacing the fully connected layer with global average pooling, as described above, knowledge distillation is also useful for reducing the number of network parameters.

In knowledge distillation, a 'student' network is trained using the knowledge acquired by an original 'teacher' network; that is, the student network is trained to mimic a teacher network that has already been trained. The method is therefore limited by its need for a pre-trained teacher network. Furthermore, when no pre-trained teacher network is available, training the teacher network and then the student network takes too much time.
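For contrast with the joint scheme, this sketch shows the widely used soft-target distillation loss of Hinton et al. (2015), one common way to train a student to mimic an already-trained teacher. The document does not specify this formulation. The sketch assumes a classification teacher whose logits are fixed, and the temperature value is illustrative.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 4.0) -> float:
    """KL(teacher || student) between temperature-softened outputs, scaled by T^2.

    teacher_logits come from a network trained beforehand and kept frozen;
    this two-stage dependency is what the joint scheme avoids.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    return temperature ** 2 * kl
```

The loss is zero only when the student reproduces the teacher's softened distribution. That target only exists after the teacher has been fully trained, which is the source of the extra training time described above.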

An object of the present invention is to provide a system and method that jointly train a teacher learning module and a student learning module in order to reduce the size of a machine learning model.

Another object of the present invention is to propose a new compact machine learning model targeting mobile applications, and to provide a new teacher-student joint learning method that can be applied to such a compact model.

A machine learning method using joint learning according to an embodiment of the present invention includes: minimizing, by means of loss functions, the errors of, and the feature differences between, a teacher learning module and a student learning module that branch from a shared engine extracting features of input data; and jointly training the teacher learning module and the student learning module with the loss functions.

The machine learning method may further include inferring classification and regression values for the input data using the network parameters of the trained student learning module and of the shared engine.

Minimizing the errors and feature differences between the branched teacher and student learning modules may include minimizing the landmark errors of the teacher and student learning modules with the first and second of three loss functions, and minimizing the feature difference between the two modules with the third loss function.

Jointly training the teacher and student learning modules with the loss functions may train the student learning module to mimic the teacher learning module through the final loss function of the three loss functions.

Jointly training the teacher and student learning modules with the loss functions may train the student learning module, which has a compact architecture, together with the teacher learning module, which includes a fully connected layer, so that the two mimic each other.

Inferring the classification and regression values may be performed through a compact machine learning model configured with the network parameters received from the shared engine and the student learning module.

The machine learning model may omit the teacher learning module and include the shared engine and a compact learning module constructed from the student learning module.

A machine learning system using joint learning according to an embodiment of the present invention includes: a processing unit that minimizes, by means of loss functions, the errors of, and the feature differences between, a teacher learning module and a student learning module that branch from a shared engine extracting features of input data; and a joint learning unit that jointly trains the teacher learning module and the student learning module with the loss functions.

The machine learning system may further include an inference unit that infers classification and regression values for the input data using the network parameters of the trained student learning module and of the shared engine.

The processing unit may minimize the landmark errors of the teacher and student learning modules using the first and second of three loss functions, and minimize the feature difference between the two modules using the third loss function.

The joint learning unit may train the student learning module to mimic the teacher learning module through the final loss function of the three loss functions.

The joint learning unit may jointly train the student learning module, which has a compact architecture, and the teacher learning module, which includes a fully connected layer, so that the two mimic each other.

The inference unit may infer classification and regression values for the input data through a compact machine learning model configured with the network parameters received from the shared engine and the student learning module.

A machine learning system using joint learning according to another embodiment of the present invention includes: a training unit that jointly trains a teacher learning module and a student learning module branching from a shared engine that extracts features of input data, so that the student learning module is trained to mimic the teacher learning module; and a test unit that builds a compact machine learning model comprising the shared engine and a compact learning module and infers classification and regression values for the input data.

The training unit supplies the features encoded by a shared layer in the shared engine identically to the teacher learning module and the student learning module, and each module converts the encoded features into classification and regression values.

The training unit jointly trains the teacher and student learning modules using three loss functions: the first and second loss functions minimize the landmark errors of the teacher and student learning modules, and the third loss function minimizes the feature difference between the two modules.

The test unit may remove the teacher learning module, construct the compact learning module from the student learning module, and infer classification and regression values for the input data through the resulting compact machine learning model.

According to embodiments of the present invention, the compact architecture of the student learning module is trained jointly with the fully connected layer of the teacher learning module so that the two mimic each other, which makes the method applicable to compact machine learning models.

FIG. 1 is a block diagram of the detailed configuration of a machine learning system using joint learning according to an embodiment of the present invention.
FIG. 2 shows an example of applying the proposed teacher-student joint learning method to a compact machine learning model according to another embodiment of the present invention.
FIG. 3 shows details of the joint learning scheme between the teacher learning module and the student learning module according to an embodiment of the present invention.
FIGS. 4A and 4B show experimental results verifying the performance of the machine learning system using joint learning according to an embodiment of the present invention.
FIG. 5 is a flowchart of a machine learning method using joint learning according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. However, the present invention is not limited or restricted by these embodiments. The same reference numerals in the drawings denote the same members.

The terms used in this specification were chosen to properly describe preferred embodiments of the present invention and may vary according to the intentions of viewers or operators or the conventions of the field to which the invention pertains. Accordingly, these terms should be defined based on the content of the entire specification.

Today's widely used mobile applications require compact neural networks that operate under limited memory and computation. As one example application of the machine learning system and method using joint learning according to an embodiment of the present invention, a compact facial landmark detection (compact FLD) network is built. The aim is to overcome the problems and limitations of the prior art, to provide a neural network with few parameters for resource-constrained environments, and to reduce the size of the machine learning model.

Facial landmark detection is an essential front-end module for face analysis applications. The present invention proposes a new teacher-student joint learning method that can be applied to a compact facial landmark detection network.

Although the following description focuses on landmark detection in the context of facial landmark detection, the proposed teacher-student joint learning method is not limited to it. It can be applied to any inference model built on deep networks, for example object recognition, location inference, and emotion recognition.

The machine learning system and method using joint learning according to embodiments of the present invention are described in more detail below with reference to FIGS. 1 to 5.

FIG. 1 is a block diagram of the detailed configuration of a machine learning system using joint learning according to an embodiment of the present invention, and FIG. 5 is a flowchart of a machine learning method using joint learning according to an embodiment of the present invention.

The machine learning system 100 using joint learning according to an embodiment of the present invention jointly trains a teacher learning module and a student learning module.

To this end, the machine learning system 100 includes a processing unit 110 and a joint learning unit 120, which may be configured to perform steps 510 and 520 of FIG. 5.

In step 510, the processing unit 110 uses loss functions to minimize the errors of, and the feature differences between, a teacher learning module (or teacher regression network) and a student learning module (or student regression network) that branch from a shared engine (or shared convolution network) extracting features of the input data. The input data is not limited to any particular type and may be image data, audio data, content data, or other data.

The processing unit 110 may minimize the errors and feature differences between the teacher and student learning modules using three loss functions. For example, the first and second of the three loss functions minimize the landmark errors of the teacher and student learning modules, and the third loss function minimizes the feature difference between the two modules. The student learning module can then mimic the teacher learning module through the final loss function of the three.
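The text does not give formulas for the three losses. As a minimal sketch, assume the first two are mean squared errors between each module's predicted landmark coordinates and the ground truth, the third is a mean squared error between intermediate features of the two modules, and the final loss is their sum with a hypothetical weight `lam` on the feature term. All function names are illustrative.

```python
from typing import Sequence, Tuple


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_loss(teacher_pred: Sequence[float], student_pred: Sequence[float],
               target: Sequence[float], teacher_feat: Sequence[float],
               student_feat: Sequence[float],
               lam: float = 1.0) -> Tuple[float, Tuple[float, float, float]]:
    loss_teacher = mse(teacher_pred, target)      # 1st loss: teacher landmark error
    loss_student = mse(student_pred, target)      # 2nd loss: student landmark error
    loss_mimic = mse(student_feat, teacher_feat)  # 3rd loss: teacher/student feature gap
    # Assumed combination; the source does not state how the terms are weighted.
    total = loss_teacher + loss_student + lam * loss_mimic
    return total, (loss_teacher, loss_student, loss_mimic)
```

Returning the individual terms alongside the total makes it easy to monitor whether the student is closing the feature gap while both heads still fit the landmarks.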

In other words, two of the three loss functions minimize the landmark errors of the teacher and student learning modules, and the remaining loss function minimizes the feature difference between them.

In step 520, the joint learning unit 120 jointly trains the teacher and student learning modules with the loss functions.

For example, the joint learning unit 120 may train the student learning module to mimic the teacher learning module through the final loss function of the three loss functions.

In the machine learning system 100, the teacher and student learning modules are connected through the shared engine and then branch off. The student learning module has a compact architecture, whereas the teacher learning module includes a fully connected layer with a large number of parameters; the fully connected layer of the teacher and the compact architecture of the student are trained jointly so that they mimic each other. In other words, the teacher and student learning modules are trained simultaneously with the shared engine. This differs from conventional teacher-student learning, in which the student network is trained separately from a pre-trained teacher network.

The shared engine extracts features of the input data, and the teacher and student learning modules each estimate points from the features encoded by the shared engine. Because the features encoded by the shared layer in the shared engine are supplied identically to both modules, each module converts the same encoded features into classification and regression values.
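To illustrate the branching, here is a toy, dependency-free forward pass. It assumes dense layers throughout, although the document's shared engine is convolutional. `JointFLDNet`, the layer sizes, and the use of each head's penultimate activation as the "feature" compared by a mimic loss are all hypothetical. The key point is that the shared encoding `feat` is computed once and passed unchanged to both heads.

```python
import random
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, biases)


def dense(x: Vector, w: List[Vector], b: Vector, relu: bool = True) -> Vector:
    out = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [max(0.0, v) for v in out] if relu else out


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


def count_params(layers: List[Layer]) -> int:
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


class JointFLDNet:
    """Hypothetical toy: one shared engine branching into a teacher and a student head."""

    def __init__(self, n_in=32, n_feat=24, n_hidden=8, n_points=5, seed=0):
        rng = random.Random(seed)
        n_out = 2 * n_points                                  # (x, y) per landmark
        self.shared = init_layer(n_in, n_feat, rng)
        self.teacher = [init_layer(n_feat, 64, rng),          # wide, FC-heavy head
                        init_layer(64, n_hidden, rng),
                        init_layer(n_hidden, n_out, rng)]
        self.student = [init_layer(n_feat, n_hidden, rng),    # compact head
                        init_layer(n_hidden, n_out, rng)]

    @staticmethod
    def _head(x: Vector, layers: List[Layer]) -> Tuple[Vector, Vector]:
        for w, b in layers[:-1]:
            x = dense(x, w, b)
        w, b = layers[-1]
        return x, dense(x, w, b, relu=False)                  # (feature, regression output)

    def forward(self, x: Vector):
        feat = dense(x, *self.shared)                         # encoded once...
        teacher_out = self._head(feat, self.teacher)          # ...and fed unchanged
        student_out = self._head(feat, self.student)          # ...to both heads
        return feat, teacher_out, student_out


if __name__ == "__main__":
    net = JointFLDNet()
    _, (t_feat, t_pred), (s_feat, s_pred) = net.forward([0.1] * 32)
    print(len(t_pred), len(s_pred), count_params(net.teacher), count_params(net.student))
```

Both heads output the same number of coordinates, but in this toy the teacher head carries 2,210 parameters against the student's 290.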

Referring to FIGS. 1 and 5, the machine learning system 100 may further include an inference unit 130, which may be configured to perform step 530 of FIG. 5.

The inference unit 130 may infer classification and regression values for the input data using the network parameters of the trained student learning module and of the shared engine.

For example, the inference unit 130 may infer classification and regression values for the input data through a compact machine learning model that includes the shared engine and a compact learning module (compact regression network). Depending on the embodiment, the machine learning model may be a compact facial landmark detection (compact FLD) network.

The machine learning model consists of the shared engine and the compact learning module: the teacher learning module is removed, and the student learning module transfers its network parameters to the compact learning module. That is, the shared engine and the student learning module each transfer their network parameters to the compact machine learning model, through which the inference unit 130 infers classification and regression values for the input data.
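A minimal sketch of this deployment step, assuming parameters are stored under hypothetical `shared.`, `teacher.`, and `student.` name prefixes: the shared-engine tensors are kept, the student tensors are copied into the compact learning module, and the teacher tensors are discarded.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def build_compact_model(trained: Params) -> Params:
    """Keep the shared engine, rename student parameters, drop the teacher."""
    compact: Params = {}
    for name, values in trained.items():
        if name.startswith("shared."):
            compact[name] = list(values)
        elif name.startswith("student."):
            compact["compact." + name[len("student."):]] = list(values)
        # "teacher.*" entries are intentionally not copied
    return compact


def n_params(params: Params) -> int:
    return sum(len(v) for v in params.values())


if __name__ == "__main__":
    trained = {                       # hypothetical flattened parameter tensors
        "shared.conv1": [0.1] * 100,
        "teacher.fc1": [0.0] * 5000,
        "teacher.fc2": [0.0] * 500,
        "student.fc": [0.2] * 300,
    }
    compact = build_compact_model(trained)
    print(sorted(compact), n_params(trained), "->", n_params(compact))
```

Because training never depends on a separately stored teacher, the teacher's (typically large) fully connected parameters simply disappear at export time, and only the shared engine and compact module are shipped.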

Because every part of the framework in the machine learning system 100 is trainable, no existing pre-trained teacher network is required, and the student learning module of the present invention can achieve performance similar to that of the teacher learning module.

FIG. 2 shows an example of applying the proposed teacher-student joint learning method to a compact machine learning model according to another embodiment of the present invention.

Referring to FIG. 2, a machine learning system using joint learning according to another embodiment of the present invention includes a training unit (or training phase) 210 and a test unit (or test phase) 220.

In the training unit 210, the network consists of three sub-modules 211, 212, and 213.

The first is a shared engine (or shared convolution network) 211 that extracts features of the input data. The second is a teacher learning module (or teacher regression network) 212, and the third is a student learning module (or student regression network) 213. The shared engine 211 extracts feature maps from the input data, and the training unit 210 jointly trains the teacher learning module 212 and the student learning module 213, both branching from the shared engine 211, so that the student learning module 213 learns to mimic the teacher learning module 212.

In the example of FIG. 2, the shared engine 211 may extract feature maps of facial components from the input data (input image), and the teacher learning module 212 and the student learning module 213 may use these feature maps to estimate facial landmark coordinates. In both modules, the facial landmark points (the green dots on the images in FIG. 2) are estimated from the features encoded by the shared engine 211. The input data is not limited to images, however, and may be image data, audio data, content data, or other data.
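If the regression heads output landmark coordinates as a flat vector, decoding them into points is a reshape plus a rescale. The interleaved `[x1, y1, x2, y2, ...]` layout and the normalization to [0, 1] relative to the input image are assumptions; the document does not specify the output format, and `decode_landmarks` is a hypothetical helper.

```python
from typing import List, Tuple


def decode_landmarks(pred: List[float], width: int,
                     height: int) -> List[Tuple[float, float]]:
    """Turn [x1, y1, x2, y2, ...] in [0, 1] into pixel (x, y) points."""
    if len(pred) % 2:
        raise ValueError("expected (x, y) pairs")
    return [(pred[i] * width, pred[i + 1] * height) for i in range(0, len(pred), 2)]


if __name__ == "__main__":
    # e.g. two eye centres predicted on a 40 x 40 face crop
    print(decode_landmarks([0.25, 0.5, 0.75, 0.5], 40, 40))
```

Predicting normalized coordinates keeps the regression targets independent of the input resolution, so the same head can be reused with differently sized face crops.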

In the training unit 210, the teacher learning module 212 and the student learning module 213 are trained simultaneously with the shared engine 211, jointly, using three loss functions. For example, two of the three loss functions minimize the landmark estimation errors of the teacher learning module 212 and of the student learning module 213, and the remaining loss function minimizes the difference between the features of the teacher learning module 212 and those of the student learning module 213. The resulting final loss function enables the student learning module 213 to mimic the teacher learning module 212.

In other words, because the teacher learning module 212 and the student learning module 213 in the training unit 210 use the same features from the shared engine 211, the student learning module 213 can easily mimic the teacher learning module 212. Through this learning process, the compact learning module 221 can maintain high performance with a small number of parameters, and because the student learning module 213 learns to mimic the teacher learning module 212 during joint training, no pre-trained teacher network (or teacher learning module) is required.

Referring again to FIG. 2, in the test unit 220, the compact machine learning model consists of two sub-modules, 211 and 221. The machine learning model may be a compact facial landmark detection network (compact FLD).

The first network is the shared engine 211, and the second is the compact learning module (or compact regression network) 221. The shared engine 211 receives its network parameters from the shared engine 211 of the training unit 210, and the compact learning module 221 uses the network parameters received from the student learning module 213 of the training unit 210.
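The hand-over from the training unit to the test unit can be pictured as keeping two of the three trained parameter sets and discarding the teacher. The sketch below assumes a hypothetical layout in which trained parameters are stored in plain dicts keyed by module name; the names `shared`, `teacher`, `student` and `compact_head` are illustrative, not taken from the source.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # layer name -> flat list of weights (illustrative layout)


def build_compact_model(trained: Dict[str, Params]) -> Dict[str, Params]:
    """Assemble the test-time compact model from jointly trained parameters.

    The shared engine keeps its own parameters, the compact regression head
    reuses the student head's parameters, and the teacher head is dropped.
    """
    return {
        "shared": {name: list(w) for name, w in trained["shared"].items()},
        "compact_head": {name: list(w) for name, w in trained["student"].items()},
    }


def count_params(model: Dict[str, Params]) -> int:
    """Total number of scalar parameters across all modules."""
    return sum(len(w) for module in model.values() for w in module.values())


trained = {
    "shared": {"conv1": [0.1] * 10},
    "teacher": {"fc": [0.2] * 100},
    "student": {"conv1x1": [0.3] * 8},
}
compact = build_compact_model(trained)
print(sorted(compact))        # ['compact_head', 'shared']
print(count_params(compact))  # 18
```

The teacher's (typically large) fully connected weights never reach the deployed model, which is where the parameter saving at test time comes from.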

The share of parameters (parameter occupancy) in existing facial landmark detection (FLD) networks is examined below with reference to [Table 1].

[Table 1]

[Table 1] shows the number of parameters in two recently reported facial landmark detection (FLD) networks: TCDCN and DCNNI/C. TCDCN was reported in Zhang, Z., Luo, P., Loy, C.C., Tang, X.: Learning deep representation for face alignment with auxiliary attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 918-930 (2016), and DCNNI/C was reported in Baddar, W.J., Son, J., Kim, D.H., Kim, S.T., Ro, Y.M.: A deep facial landmarks detection with facial contour and facial components constraint. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 3209-3213. IEEE (2016).

TCDCN has a simple structure consisting of four convolutional layers, one fully connected layer, and one output layer. DCNNI/C is an FLD network composed of two sub-networks: one (DCNNC) detects the facial contour landmarks, and the other (DCNNI) detects the inner facial components. DCNNC consists of four convolutional layers, one fully connected layer, and one output layer. DCNNI consists of three common convolutional layers followed by five inner facial component convolutional layers, each of which has its own fully connected layer and output layer.

As [Table 1] shows, more than half of all parameters reside in the fully connected layers. For this reason, the machine learning system using joint learning according to an embodiment of the present invention builds a compact facial landmark detection network that minimizes the number of parameters in the fully connected layer.

Referring again to FIG. 2, in the test unit 220 of the machine learning system using joint learning according to an embodiment of the present invention, the number of network parameters is greatly reduced because the student learning module 213 uses a 1×1 convolution and global average pooling instead of a flatten layer and a fully connected layer. In addition, because the student learning module 213 has been trained to mimic the teacher learning module 212, it can deliver performance comparable to that of the teacher learning module 212. Accordingly, the compact facial landmark detection network (compact FLD) built from the shared engine 211 and the compact learning module 221 can significantly reduce the number of network parameters.
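A rough parameter count shows why replacing the flatten + fully connected head with a 1×1 convolution followed by global average pooling shrinks the network. The feature-map size (80 channels, 5×5) and output width (256) below are hypothetical values chosen for illustration, not figures from [Table 1].

```python
def fc_head_params(channels: int, height: int, width: int, out_dim: int) -> int:
    """Flatten + fully connected layer: one weight per (input value, output unit), plus biases."""
    return channels * height * width * out_dim + out_dim


def conv1x1_gap_head_params(channels: int, out_dim: int) -> int:
    """1x1 convolution: one weight per (input channel, output channel), plus biases.
    Global average pooling itself has no parameters."""
    return channels * out_dim + out_dim


C, H, W, D = 80, 5, 5, 256  # hypothetical feature-map shape and output width
fc = fc_head_params(C, H, W, D)
conv = conv1x1_gap_head_params(C, D)
print(fc, conv)  # 512256 20736
```

The fully connected head's weight count scales with H×W, whereas the 1×1 convolution head's does not, so the saving grows with the spatial size of the last shared feature map.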

FIG. 3 shows the details of the joint learning scheme between the teacher learning module and the student learning module according to an embodiment of the present invention.

Referring to FIG. 3, at the end of the shared layers (or shared convolutional layers) 310 of the shared engine, the network branches into two networks: the teacher learning module 320 and the student learning module 330. The fully connected layer of the teacher learning module 320 is referred to as the TFC layer 322, and the corresponding layer of the student learning module 330 is referred to as the STDFC layer 332. The STDFC layer 332 consists of a 1×1 convolution followed by global average pooling (GAP) 333. The STDFC layer 332 and the TFC layer 322 have the same dimension.

The features encoded by the shared layers 310 are fed simultaneously to the teacher learning module 320 and the student learning module 330, which convert them into classification and regression values. In other words, the teacher learning module 320 and the student learning module 330 receive identical inputs: the features encoded by the shared layers 310.
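The branching in FIG. 3 can be sketched in plain Python: both heads read the same shared feature map and return vectors of the same length, as the TFC and STDFC layers do. The feature map, weights, and function names below are hypothetical toy values; a real implementation would use a deep learning framework.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def teacher_fc_head(features: FeatureMap, weights: List[List[float]], bias: List[float]) -> List[float]:
    """Teacher branch: flatten the feature map, then apply a fully connected layer."""
    flat = [v for channel in features for row in channel for v in row]
    return [sum(w * v for w, v in zip(w_row, flat)) + b for w_row, b in zip(weights, bias)]


def student_conv1x1_gap_head(features: FeatureMap, kernel: List[List[float]], bias: List[float]) -> List[float]:
    """Student branch: 1x1 convolution (a per-pixel linear map over channels),
    followed by global average pooling over all spatial positions."""
    channels, height, width = len(features), len(features[0]), len(features[0][0])
    out = []
    for k_row, b in zip(kernel, bias):
        total = 0.0
        for y in range(height):
            for x in range(width):
                total += sum(k_row[c] * features[c][y][x] for c in range(channels)) + b
        out.append(total / (height * width))  # global average pooling
    return out


# Shared features: 2 channels on a 2x2 spatial grid (toy values).
shared = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 4.0]]]
out_dim = 3
t_fc = teacher_fc_head(shared, [[1.0] * 8] * out_dim, [0.0] * out_dim)
std_fc = student_conv1x1_gap_head(shared, [[1.0, 1.0]] * out_dim, [0.0] * out_dim)
print(t_fc)    # [14.0, 14.0, 14.0]
print(std_fc)  # [3.5, 3.5, 3.5]
```

Both branches produce 3 values here, but the teacher head needs C×H×W = 8 weights per output unit while the student head needs only C = 2.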

Because the fully connected layer converts the CNN (convolutional neural network) feature map into landmark coordinates, it plays an essential role in a machine learning system that uses the compact machine learning model according to an embodiment of the present invention (e.g., a compact facial landmark detection network, or compact FLD).

Accordingly, a loss function designed for joint learning is needed so that the STDFC layer 332 mimics the TFC layer 322. To this end, three loss functions were devised for the machine learning system using joint learning according to an embodiment of the present invention.

Of the three loss functions, the first loss function L¹ minimizes the estimation error of the teacher regression output 321, that is, the facial landmark detection error of the teacher learning module 320. The second loss function L² minimizes the estimation error of the student regression output 331, that is, the facial landmark detection error of the student learning module 330. The third loss function L³ represents the difference between the output vectors of the TFC layer 322 and the STDFC layer 332. The three loss terms L¹, L², and L³ can be formulated as in [Equation 1] below.

[Equation 1]

In [Equation 1], N denotes the number of training images and i the index of an input image; the equation also uses the ground-truth facial landmark coordinates of the i-th image and the i-th input image itself. h(·) denotes the function of the shared engine parameterized by W_CNN, f(·) the function of the teacher learning module 320 parameterized by W_T, and g(·) the function of the student learning module 330 parameterized by W_STD. Finally, the equation uses the output vector of the TFC layer 322 and the output vector of the STDFC layer 332.

The total loss at the branch point is defined as a weighted combination of the three losses, in which each weight indicates the importance of the corresponding loss. Because every loss affects all of the networks jointly, these weights must be set appropriately.

For example, in the training phase the teacher learning module 320 must be trained continuously, because it needs to reach its best possible performance: if the teacher learning module 320 does not reach its best performance, neither can the student learning module 330. To achieve this, the weight of the teacher loss L¹ is kept fixed from the beginning to the end of training. The weights that jointly affect the teacher learning module 320, the student learning module 330, and the shared engine (or shared convolution network) are then gradually increased from 0 to 1 as a function of the iteration t. In some embodiments, if these weights are instead set high from the outset, the performance of the teacher learning module 320 saturates before reaching its best. Therefore, at the start of training the shared engine and the teacher learning module 320 are trained intensively, and the student learning module 330 is then trained progressively.

[Algorithm 1] below describes the teacher-student joint learning method proposed above.

[Algorithm 1]
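Since the body of [Algorithm 1] is not reproduced here, the toy loop below illustrates only the general idea of teacher-student joint learning under strong simplifying assumptions: the shared engine, teacher head, and student head are each reduced to a single scalar weight (a, b, c), the regression target is y = 2x, the losses are squared errors, L³ is taken between the teacher and student outputs, and the exponential ramp for the student-related weights is hypothetical. All three parameters are updated together at every iteration, and no pre-trained teacher is used.

```python
import math
from typing import List, Tuple


def ramp_weight(t: int, total_steps: int, k: float = 5.0) -> float:
    """Hypothetical exponential ramp: 0.0 at t=0, reaching 1.0 at t=total_steps."""
    return (math.exp(k * t / total_steps) - 1.0) / (math.exp(k) - 1.0)


def joint_train(xs: List[float], ys: List[float], steps: int = 3000,
                lr: float = 0.05) -> Tuple[float, float, float]:
    """Jointly train scalar stand-ins for the shared engine (a), teacher head (b)
    and student head (c) on the weighted loss lam1*L1 + lam2*L2 + lam3*L3."""
    a, b, c = 1.0, 0.5, 0.1
    n = len(xs)
    for t in range(steps):
        lam1 = 1.0                            # teacher weight kept fixed
        lam2 = lam3 = ramp_weight(t, steps)   # student-related weights ramp up
        grad_a = grad_b = grad_c = 0.0
        for x, y in zip(xs, ys):
            h = a * x   # shared feature
            f = b * h   # teacher output (also serves as its "FC" vector here)
            g = c * h   # student output
            # Gradients of L1=(f-y)^2, L2=(g-y)^2, L3=(f-g)^2 for this sample.
            grad_a += (lam1 * 2 * (f - y) * b * x
                       + lam2 * 2 * (g - y) * c * x
                       + lam3 * 2 * (f - g) * (b - c) * x)
            grad_b += lam1 * 2 * (f - y) * h + lam3 * 2 * (f - g) * h
            grad_c += lam2 * 2 * (g - y) * h - lam3 * 2 * (f - g) * h
        # One joint gradient-descent step on all three parameters (sample mean).
        a -= lr * grad_a / n
        b -= lr * grad_b / n
        c -= lr * grad_c / n
    return a, b, c


xs = [-1.0, -0.5, 0.5, 1.0]
ys = [2.0 * x for x in xs]   # target mapping y = 2x
a, b, c = joint_train(xs, ys)
print(f"teacher a*b = {a * b:.3f}, student a*c = {a * c:.3f}")
```

Both products approach 2, so the student ends up reproducing the teacher's mapping even though the teacher was never pre-trained. In a real network, a, b and c would be weight tensors updated by backpropagation.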

FIGS. 4A and 4B show experimental results verifying the performance of the machine learning system using joint learning according to an embodiment of the present invention.

Before the experimental results of FIGS. 4A and 4B are explained, the facial landmark detection (FLD) networks used as machine learning models in the experiments and the experimental setup are described.

To verify the effectiveness of the machine learning system using joint learning through the compact facial landmark detection network (compact FLD) proposed in the present invention, the method was applied to existing FLD networks. Two FLD networks were used: TCDCN and a modified DCNNI/C (MDCNNI/C).

In TCDCN, the four convolutional layers are treated as the shared engine, and the features of the last convolutional layer are fed to both the teacher learning module and the student learning module. The teacher learning module consists of a flatten layer, a fully connected layer, and a teacher regression output layer; the student learning module consists of a 1×1 convolutional layer, a global average pooling layer, and a student regression output layer.

MDCNNI/C integrates the inner-face part and the facial contour part. In addition, the eye and eyebrow convolutional layers of the DCNNI/C architecture are split into left-eye/right-eye and left-eyebrow/right-eyebrow convolutional layers. MDCNNI/C therefore consists of three convolutional layers and eight facial component convolutional layers (left eyebrow, right eyebrow, left eye, right eye, nose bridge, nose, mouth, and facial contour). In MDCNNI/C, the shared engine comprises the three convolutional layers and the facial component convolutional layers, and the convolutional layers of each facial component branch into their own teacher learning module and student learning module.

The experiments of FIGS. 4A and 4B were performed on 300W, a benchmark dataset widely used in recent studies. This dataset provides 68 facial landmark points and a face bounding box. The data used in the present invention were collected from subsets of AFW, HELEN, LFPW, and IBUG. Specifically, the training images comprise 337 images from AFW, 2,000 from HELEN, and 811 from LFPW, for a total of 3,148 training images; the total number of test images is 689.

The test data consist of two types of test sets. One is the common test set, made up of LFPW (224 test images) and HELEN (330 test images); the other is the challenging set from IBUG (135 test images), which includes difficult conditions such as occlusions, illumination variations, head poses, and expressions. To prevent overfitting and to increase the variation in the training data, data augmentation was performed: image translation, rotation, and scaling were applied to increase the number of training images.
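The image counts quoted above add up as follows; the dictionaries simply restate the numbers from the text.

```python
train = {"AFW": 337, "HELEN": 2000, "LFPW": 811}
test = {"LFPW (common)": 224, "HELEN (common)": 330, "IBUG (challenging)": 135}

common_set = test["LFPW (common)"] + test["HELEN (common)"]
print(sum(train.values()), sum(test.values()), common_set)  # 3148 689 554
```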

Experiments were conducted with the setup described above, and network performance was evaluated to verify the performance of the compact facial landmark detection network (compact FLD) of the present invention. The mean error, computed by [Equation 2] below, serves as a measure of the distance between the estimated landmarks and the ground truth.

[Equation 2]

In [Equation 2], N denotes the number of images, M the number of landmarks, and j the index of a landmark point. In addition, o denotes the output, g the ground truth, and l and r the coordinates of the left eye and the right eye, respectively.
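A common way to compute an eye-distance-normalized mean error is sketched below. It assumes the per-image error is the mean Euclidean distance between output points o_j and ground-truth points g_j over the M landmarks, divided by the distance between the ground-truth left-eye and right-eye coordinates g_l and g_r, and then averaged over the N images. This is a sketch of the usual inter-ocular normalization, not a verbatim transcription of [Equation 2]; how the eye coordinates are derived from the 68 points is left to the caller.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def normalized_mean_error(outputs: Sequence[List[Point]], truths: Sequence[List[Point]],
                          left_eyes: Sequence[Point], right_eyes: Sequence[Point]) -> float:
    """Mean over images of (mean landmark distance) / (ground-truth eye distance)."""
    total = 0.0
    for o, g, gl, gr in zip(outputs, truths, left_eyes, right_eyes):
        eye_distance = math.dist(gl, gr)  # normalizing term from ground-truth eyes
        point_error = sum(math.dist(oj, gj) for oj, gj in zip(o, g)) / len(g)  # mean over M
        total += point_error / eye_distance
    return total / len(outputs)  # mean over N images


truth = [[(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]]
output = [[(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]]
nme = normalized_mean_error(output, truth, [(0.0, 0.0)], [(2.0, 0.0)])
print(f"{100 * nme:.2f}%")  # 16.67%
```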

To evaluate the performance of the compact facial landmark detection network (compact FLD) proposed in the present invention, the teacher-student joint learning method through compact FLD according to an embodiment of the present invention was compared with various existing FLD methods, including hand-crafted methods and deep learning based methods.

[Table 2] below compares the mean error of existing FLD methods with that of the compact facial landmark detection network according to an embodiment of the present invention on the 300W dataset with 68 facial landmark points.

[Table 2]

As [Table 2] shows, hand-crafted methods have a mean error of 6.3% or higher, whereas deep learning based methods achieve a lower mean error than hand-crafted methods on the 300W dataset.

In [Table 2], TCDCNC is the compact version of the TCDCN network trained with the teacher-student joint learning method through compact FLD according to an embodiment of the present invention, and MDCNNI/CC is the compact version of MDCNNI/C trained with the same method. TCDCNC (5.40%) performs on par with, and in fact slightly better than, TCDCN (5.54%). MDCNNI/CC (4.95%) achieves performance comparable to MDCNNI/C (4.92%), while the number of parameters is greatly reduced (see [Table 3] below).

[Table 3]

These results show that the teacher-student joint learning method through compact FLD according to an embodiment of the present invention can be applied effectively to different networks.

As [Table 3] shows, the total number of network parameters of TCDCN is reduced from 312,244 to 148,404 (a reduction of about 52%). In particular, MDCNNI/C has a large number of network parameters (4,331,444), but after the teacher-student joint learning method through compact FLD according to an embodiment of the present invention is applied, the total drops markedly from 4,331,444 to 399,284 (a reduction of about 90%). Furthermore, MDCNNI/CC not only has fewer parameters than DCNNI/C (3,298,152 in [Table 1]) but also a lower error rate than DCNNI/C (6.12% in [Table 2]).
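The reduction percentages can be checked directly from the parameter counts in the text; the computation gives about 52.5% for TCDCN and 90.8% for MDCNNI/C, consistent with the rounded figures above.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage of parameters removed when going from `before` to `after`."""
    return 100.0 * (before - after) / before


for name, before, after in [("TCDCN", 312_244, 148_404), ("MDCNNI/C", 4_331_444, 399_284)]:
    print(f"{name}: {reduction_percent(before, after):.1f}% fewer parameters")
```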

Based on the experimental setup and performance comparison described above, the teacher-student joint learning method through compact FLD according to an embodiment of the present invention is evaluated below.

In the machine learning system using joint learning according to an embodiment of the present invention, the student learning module learns to mimic the teacher learning module through teacher-student joint learning, without a pre-trained teacher network model. To confirm this, the mean error over all test images was measured at every epoch of the training phase. During training, the loss-balancing weights were set as described above, with the weights of the two student-related losses, L² and L³, increasing exponentially from 0 to 1.

Accordingly, at the beginning of the training phase, the teacher learning module loss L¹ was predominantly propagated, and after a certain number of epochs the student learning module was progressively trained with L² and L³. In the latter half of the training phase, all loss functions were used to train the entire network.

FIGS. 4A and 4B show the experimental results for the mean error of the two network structures, TCDCN and MDCNNI/C: FIG. 4A shows the results for TCDCN, and FIG. 4B shows the results for MDCNNI/C.

Referring to FIGS. 4A and 4B, four FLD networks are compared. 'Compact FLD with the conventional teacher student learning' is a compact network (student model) trained with the conventional teacher-student learning method: the teacher model (TCDCN or MDCNNI/C) and the student model (composed of the shared engine and the student learning module) were trained separately rather than jointly, that is, the student model was trained to mimic a pre-trained teacher model.

In contrast, 'FLD with teacher regression network' is an FLD network composed of the shared engine and the teacher learning module, trained jointly with the student learning module.

Referring to FIGS. 4A and 4B, the student learning module worked well for building a compact facial landmark detection network (compact FLD) for both TCDCN and MDCNNI/C. 'FLD with teacher regression network' delivered excellent performance for both TCDCN and MDCNNI/C. The performance of TCDCNC closely tracks that of 'FLD with teacher regression network', and for MDCNNI/C, the performance of MDCNNI/CC is almost identical to that of 'FLD with teacher regression network' and of MDCNNI/C.

That is, the experimental results of FIGS. 4A and 4B confirm that the machine learning system using joint learning according to an embodiment of the present invention is effective for building a compact facial landmark detection network (compact FLD). The results further show that the proposed compact FLD significantly reduces the number of network parameters and achieves a much lower error rate than the conventional teacher-student networks. These results indicate that the proposed compact FLD can provide high facial landmark detection (FLD) performance with a small number of network parameters in mobile applications.

The devices described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will recognize that a processing device may include multiple processing elements and/or multiple types of processing elements. For example, a processing device may include multiple processors, or a processor and a controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or instruct the processing device independently or collectively. Software and/or data may be permanently or temporarily embodied in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited number of embodiments and drawings, those skilled in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components of systems, structures, devices, circuits, and the like are combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

100: machine learning system using joint learning
210: training unit (or training phase)
220: test unit (or test phase)
211: shared engine
212, 320: teacher learning module
213, 330: student learning module
221: compact learning module
310: shared layer (or shared convolutional layer)
321: teacher learning module output
322: TFC layer
323: flatten layer
331: student learning module output
332: STDFC layer
333: global average pooling

Claims

1. A machine learning method using joint learning, performed by a machine learning system including a processing unit and a joint learning unit, the method comprising:
minimizing, by the processing unit, errors and characteristic differences between a teacher learning module and a student learning module branched from a shared engine that extracts features of input data, using a loss function; and
jointly training, by the joint learning unit, the teacher learning module and the student learning module by the loss function.

2. The machine learning method of claim 1, further comprising:
inferring, by an inference unit further included in the machine learning system, classification and regression values for the input data using the network parameters of each of the trained student learning module and the shared engine.

3. The machine learning method of claim 1, wherein the step in which the processing unit minimizes errors and characteristic differences between the branched teacher learning module and student learning module comprises:
minimizing the landmark errors of the teacher learning module and the student learning module using a first loss function and a second loss function among three loss functions, and minimizing the characteristic difference between the teacher learning module and the student learning module using a third loss function.
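The claims do not specify the form of the three loss functions. The sketch below is a minimal illustration only, assuming mean squared error for all three terms, landmarks flattened as `[x1, y1, x2, y2, ...]`, and a third loss that compares equal-length feature vectors taken from the two branches. The function names and the data layout are hypothetical.

```python
from typing import Sequence, Tuple


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two non-empty, equal-length vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def three_losses(
    teacher_landmarks: Sequence[float],
    student_landmarks: Sequence[float],
    true_landmarks: Sequence[float],
    teacher_features: Sequence[float],
    student_features: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (first, second, third) loss under the MSE assumption."""
    loss1 = mse(teacher_landmarks, true_landmarks)   # first loss: teacher landmark error
    loss2 = mse(student_landmarks, true_landmarks)   # second loss: student landmark error
    loss3 = mse(teacher_features, student_features)  # third loss: teacher-student feature difference
    return loss1, loss2, loss3


print(three_losses([1.0, 2.0], [2.0, 2.0], [1.0, 2.0], [0.0, 1.0], [1.0, 1.0]))
# prints (0.0, 0.5, 0.5)
```

Here the teacher hits the ground-truth landmarks exactly, so only the student's landmark error and the teacher-student feature gap are non-zero.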

4. The machine learning method of claim 3, wherein the step in which the joint learning unit jointly trains the teacher learning module and the student learning module by the loss function comprises:
training the student learning module to imitate the teacher learning module by a final loss function based on the three loss functions.
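To show what "jointly training" means mechanically, the toy sketch below replaces each module with a single scalar weight so that the gradients can be written by hand. It assumes, hypothetically, that the final loss is the sum of the teacher and student landmark errors plus a weighted imitation term (weight `lam`), and that the branch outputs double as the compared features. One update changes the shared weight using gradients from both branches, while the imitation term pulls the two outputs toward each other.

```python
from dataclasses import dataclass


@dataclass
class Params:
    w_shared: float   # stand-in for the shared engine
    w_teacher: float  # stand-in for the teacher learning module
    w_student: float  # stand-in for the student learning module


def final_loss(p: Params, x: float, y: float, lam: float = 1.0) -> float:
    """Assumed final loss: teacher error + student error + lam * imitation term."""
    h = p.w_shared * x          # shared feature
    t_out = p.w_teacher * h     # teacher prediction
    s_out = p.w_student * h     # student prediction
    return (t_out - y) ** 2 + (s_out - y) ** 2 + lam * (t_out - s_out) ** 2


def joint_step(p: Params, x: float, y: float, lr: float = 0.05, lam: float = 1.0) -> Params:
    """One gradient-descent step on all three weights at once (hand-derived gradients)."""
    h = p.w_shared * x
    t_out, s_out = p.w_teacher * h, p.w_student * h
    e_t, e_s, d = t_out - y, s_out - y, t_out - s_out
    g_teacher = 2 * e_t * h + 2 * lam * d * h
    g_student = 2 * e_s * h - 2 * lam * d * h
    # The shared weight collects gradient from both branches and from the imitation term.
    g_shared = (2 * e_t * p.w_teacher + 2 * e_s * p.w_student
                + 2 * lam * d * (p.w_teacher - p.w_student)) * x
    return Params(p.w_shared - lr * g_shared,
                  p.w_teacher - lr * g_teacher,
                  p.w_student - lr * g_student)


p0 = Params(w_shared=1.0, w_teacher=0.5, w_student=1.5)
p1 = joint_step(p0, x=1.0, y=2.0)
print(final_loss(p0, 1.0, 2.0), final_loss(p1, 1.0, 2.0))
```

With these values the loss falls from 3.5 to roughly 2.24 after one step. In a real network an autodiff framework would compute the same kind of gradients; the hand-derived form is used here only to keep the example dependency-free.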

5. The machine learning method of claim 1, wherein the step in which the joint learning unit jointly trains the teacher learning module and the student learning module by the loss function comprises:
jointly training the student learning module, which has a compact architecture, and the teacher learning module, which includes a fully connected layer, so that the two modules imitate each other.

6. The machine learning method of claim 2, wherein the step in which the inference unit infers classification and regression values for the input data comprises:
inferring the classification and regression values for the input data through a compact machine learning model built according to the network parameters received from the shared engine and the student learning module.

7. The machine learning method of claim 6, wherein the machine learning model excludes the teacher learning module and includes the shared engine and a compact learning module constructed from the student learning module.
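Building the compact model amounts to discarding the teacher branch after training. The sketch below assumes, hypothetically, that trained weights are stored in a dictionary whose keys carry a `teacher.`, `shared.`, or `student.` prefix, and uses made-up layer shapes to show how much the parameter count shrinks once the teacher's flatten-plus-fully-connected head is removed.

```python
from typing import Any, Dict


def count_scalars(value: Any) -> int:
    """Count the scalar weights in a (possibly nested) list."""
    if isinstance(value, list):
        return sum(count_scalars(v) for v in value)
    return 1


def build_compact_params(trained: Dict[str, Any], teacher_prefix: str = "teacher.") -> Dict[str, Any]:
    """Drop every teacher-branch parameter; keep the shared engine and the student branch."""
    return {name: w for name, w in trained.items() if not name.startswith(teacher_prefix)}


# Hypothetical shapes: 8-channel shared features on a 4x4 grid, 10 outputs.
trained = {
    "shared.conv.w": [[0.0] * 3 for _ in range(8)],
    "teacher.fc.w": [[0.0] * (8 * 16) for _ in range(10)],  # flatten + fully connected head
    "student.fc.w": [[0.0] * 8 for _ in range(10)],         # global average pooling + small FC head
}
compact = build_compact_params(trained)
print(sorted(compact))
print(sum(count_scalars(w) for w in trained.values()), sum(count_scalars(w) for w in compact.values()))
```

This prints `['shared.conv.w', 'student.fc.w']` followed by `1384 104`. The filter returns a new dictionary, so the full trained model remains available if the teacher is needed again.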

8. A computer program stored in a computer-readable recording medium for performing the method of claim 1.

9. A machine learning system using joint learning, comprising:
a processing unit that minimizes errors and characteristic differences between a teacher learning module and a student learning module branched from a shared engine that extracts features of input data, using a loss function; and
a joint learning unit that jointly trains the teacher learning module and the student learning module by the loss function.

10. The machine learning system of claim 9, further comprising:
an inference unit that infers classification and regression values for the input data using the network parameters of each of the trained student learning module and the shared engine.

11. The machine learning system of claim 9, wherein the processing unit minimizes the landmark errors of the teacher learning module and the student learning module using a first loss function and a second loss function among three loss functions, and minimizes the characteristic difference between the teacher learning module and the student learning module using a third loss function.

12. The machine learning system of claim 11, wherein the joint learning unit trains the student learning module to imitate the teacher learning module by a final loss function based on the three loss functions.

13. The machine learning system of claim 9, wherein the joint learning unit jointly trains the student learning module, which has a compact architecture, and the teacher learning module, which includes a fully connected layer, so that the two modules imitate each other.

14. The machine learning system of claim 10, wherein the inference unit infers the classification and regression values for the input data through a compact machine learning model built according to the network parameters received from the shared engine and the student learning module.

15. A machine learning system using joint learning, comprising:
a training unit that trains a student learning module to imitate a teacher learning module by jointly training the teacher learning module and the student learning module, which are branched from a shared engine that extracts features of input data; and
a test unit that constructs a compact machine learning model including the shared engine and a compact learning module, and infers classification and regression values for the input data.

16. The machine learning system of claim 15, wherein the training unit supplies the features encoded by the shared layer in the shared engine equally to the teacher learning module and the student learning module, and converts the encoded features into classification and regression values through the teacher learning module and the student learning module.
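The feature flow can be sketched as a forward pass in which one shared layer feeds both branches. Following the grouping of the reference signs, the teacher branch is modeled as a flatten layer followed by a fully connected (TFC) layer, and the student branch as global average pooling followed by a small fully connected (STDFC) layer. The 1x1 convolution with ReLU standing in for the shared convolutional layer, the parameter names, and all sizes are assumptions for illustration.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(v: Vector, w: Matrix, b: Vector) -> Vector:
    """Fully connected layer; w holds one row of weights per output unit."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


def shared_layer(x: Matrix, w: Matrix, b: Vector) -> Matrix:
    """Hypothetical 1x1 convolution + ReLU: [in_ch][positions] -> [out_ch][positions]."""
    n_pos = len(x[0])
    return [
        [max(0.0, sum(wk * x[k][p] for k, wk in enumerate(row)) + bias) for p in range(n_pos)]
        for row, bias in zip(w, b)
    ]


def teacher_branch(feat: Matrix, w: Matrix, b: Vector) -> Vector:
    flat = [v for channel in feat for v in channel]  # flatten layer
    return linear(flat, w, b)                        # teacher fully connected (TFC) layer


def student_branch(feat: Matrix, w: Matrix, b: Vector) -> Vector:
    pooled = [sum(ch) / len(ch) for ch in feat]      # global average pooling, one value per channel
    return linear(pooled, w, b)                      # student fully connected (STDFC) layer


def forward(x: Matrix, params: dict) -> Tuple[Vector, Vector]:
    feat = shared_layer(x, params["shared_w"], params["shared_b"])  # same features feed both branches
    return (teacher_branch(feat, params["t_w"], params["t_b"]),
            student_branch(feat, params["s_w"], params["s_b"]))


params = {
    "shared_w": [[1.0, 0.0], [0.0, 1.0]], "shared_b": [0.0, 0.0],  # identity, for readability
    "t_w": [[1.0, 1.0, 1.0, 1.0]], "t_b": [0.0],                     # 1 output x (2 ch * 2 positions)
    "s_w": [[1.0, 1.0]], "s_b": [0.0],                               # 1 output x 2 channels
}
print(forward([[1.0, 2.0], [3.0, 4.0]], params))
# prints ([10.0], [5.0])
```

The teacher's fully connected weights scale with channels times spatial positions, while the student's scale only with channels. This is one plausible reading of why the student branch is the "compact architecture" kept for inference.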

17. The machine learning system of claim 15, wherein the training unit jointly trains the teacher learning module and the student learning module using three loss functions, and
wherein, among the three loss functions, a first loss function and a second loss function are used to minimize the landmark errors of the teacher learning module and the student learning module, and a third loss function is used to minimize the characteristic difference between the teacher learning module and the student learning module.

18. The machine learning system of claim 15, wherein the test unit removes the teacher learning module and infers classification and regression values for the input data through the compact machine learning model, which is constructed by building the compact learning module from the student learning module.