KR102605220B1 - System and method for optimization of hyperparameter

Info

Publication number: KR102605220B1
Application number: KR1020180061277A
Authority: KR
Inventors: 최영준; 박민아; 김유진; 권일환
Original assignee: Samsung SDS Co., Ltd. (삼성에스디에스 주식회사)
Priority date: 2018-04-11
Filing date: 2018-05-29
Publication date: 2023-11-23
Also published as: KR20190118937A

Abstract

A hyperparameter optimization system and method are provided. A hyperparameter optimization system according to an embodiment of the present invention includes: a data selection unit that extracts a feature vector from each of a plurality of previously prepared training data and, using the feature vectors and a K-NN (K-Nearest Neighbor) algorithm, selects from the plurality of training data one or more training data to be used in searching for an optimal hyperparameter; a parameter search unit that repeatedly performs a basin hopping algorithm based on the selected training data to limit the search range of the hyperparameter, and performs a Bayesian optimization algorithm within the limited search range to recommend one hyperparameter; and a model generation unit that generates a new model by training a target model and evaluating its performance based on the recommended hyperparameter.

Description

Hyperparameter optimization system and method {SYSTEM AND METHOD FOR OPTIMIZATION OF HYPERPARAMETER}

Embodiments of the present invention relate to technology for optimizing hyperparameters, which strongly affect the training speed and performance of a target model such as a deep learning model.

The training speed and performance of a deep learning model vary with its hyperparameters. A hyperparameter is not a primary variable that is tuned or optimized through training; rather, it is a variable, such as a learning rate or a regularization parameter, that a person sets from a priori knowledge or that is set automatically through an external model mechanism. Hyperparameters may include, for example, the learning rate, the regularization parameter, the number of training iterations, and the number of hidden units.

Conventionally, hyperparameters were selected by methods such as grid search or random search before training was performed. This approach can take a long time, depending on the amount of training data used to select the hyperparameters, the depth of the network architecture, and so on. More recently, the Bayesian optimization algorithm has been used to optimize hyperparameters, but its practical use in business is limited by search times of several hours to several days and by the possibility of settling on a local optimum.

Korean Registered Patent Publication No. 10-1075824 (2011.10.25)

Embodiments of the present invention are intended to shorten the time required to construct and train a target model and to provide a means of finding the global optimum of a hyperparameter.

According to an exemplary embodiment, there is provided a hyperparameter optimization system including: a data selection unit that extracts a feature vector from each of a plurality of previously prepared training data and, using the feature vectors and a K-NN (K-Nearest Neighbor) algorithm, selects from the plurality of training data one or more training data to be used in searching for an optimal hyperparameter; a parameter search unit that repeatedly performs a basin hopping algorithm based on the selected training data to limit the search range of the hyperparameter, and performs a Bayesian optimization algorithm within the limited search range to recommend one hyperparameter; and a model generation unit that generates a new model by training a target model and evaluating its performance based on the recommended hyperparameter.

The data selection unit may apply the K-NN algorithm to the feature vector of each training data to determine the class of target data, which is one of the plurality of training data, and the representative class of the region to which the target data belongs, and, when the class of the target data differs from the representative class, may select the target data as training data to be used in searching for the optimal hyperparameter.

The data selection unit may select the one or more training data in consideration of the number of training data of each class in the region and the distances between the target data and the training data in the region.

The parameter search unit may store a loss index and a performance evaluation index of the target model for each of a plurality of candidate hyperparameters derived from the basin hopping algorithm or the Bayesian optimization algorithm; when evaluating the performance of the target model for a k-th candidate hyperparameter, calculate a moving average of n loss indices, among the stored loss indices, that correspond to performance evaluation indices equal to or greater than a set value; and compare the loss index of the target model for the k-th candidate hyperparameter with the moving average.

When the loss index of the target model for the k-th candidate hyperparameter is less than the moving average, the parameter search unit may terminate training of the target model for the k-th candidate hyperparameter before determining whether to apply early stopping; when the loss index for the k-th candidate hyperparameter is equal to or greater than the moving average, the parameter search unit may continue training the target model for the k-th candidate hyperparameter.

When the loss index for the k-th candidate hyperparameter is equal to or greater than the moving average and the performance evaluation index for the k-th candidate hyperparameter is greater than one of the stored performance evaluation indices, the parameter search unit may recalculate the moving average by replacing, among the n loss indices, the loss index corresponding to the lowest performance evaluation index with the loss index for the k-th candidate hyperparameter.
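The moving-average comparison described in the three preceding paragraphs might be sketched as follows. This is only an illustration under stated assumptions: the class name `LossGate`, its fields, and the choice to fill the window with the first n qualifying records are hypothetical, and the comparison direction is taken exactly as stated in the text.

```python
from dataclasses import dataclass, field


@dataclass
class LossGate:
    """Hypothetical helper: stores (loss, score) pairs and gates training."""
    score_threshold: float  # the "set value" for the performance evaluation index
    n: int                  # number of loss indices in the moving average
    window: list = field(default_factory=list)  # (loss, score) pairs

    def record(self, loss: float, score: float) -> None:
        # Only candidates whose score reaches the set value feed the average.
        # Assumption: the window is filled with the first n such records.
        if score >= self.score_threshold and len(self.window) < self.n:
            self.window.append((loss, score))

    def moving_average(self) -> float:
        return sum(loss for loss, _ in self.window) / len(self.window)

    def should_continue(self, loss_k: float) -> bool:
        # As stated in the text: below the average -> terminate training,
        # at or above the average -> continue training.
        return loss_k >= self.moving_average()

    def update(self, loss_k: float, score_k: float) -> None:
        # Replace the entry with the lowest score by the k-th candidate,
        # which changes the moving average on the next call.
        worst = min(range(len(self.window)), key=lambda i: self.window[i][1])
        if loss_k >= self.moving_average() and score_k > self.window[worst][1]:
            self.window[worst] = (loss_k, score_k)


gate = LossGate(score_threshold=0.8, n=3)
for loss, score in [(2.0, 0.90), (3.0, 0.85), (4.0, 0.95), (10.0, 0.50)]:
    gate.record(loss, score)
```

A score greater than the lowest stored score is, by definition, greater than at least one stored score, which is why the sketch compares against the lowest entry only.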

The loss index may be the Nat (natural unit of information) value of the loss output during training of the target model.
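For illustration, a loss expressed in nats is one computed with the natural logarithm. The sketch below assumes a cross-entropy loss for a single sample, which the text does not specify; the function names are hypothetical.

```python
import math


def cross_entropy_nats(true_index: int, probs: list) -> float:
    """Cross-entropy of one sample in nats (natural logarithm).

    Assumption: the target model outputs class probabilities; the source
    only says the loss index may be a Nat value, not which loss is used.
    """
    return -math.log(probs[true_index])


def nats_to_bits(nats: float) -> float:
    """Convert nats to bits: 1 nat equals 1 / ln(2) bits."""
    return nats / math.log(2)


loss = cross_entropy_nats(0, [0.5, 0.5])  # ln(2) nats, which is 1 bit
```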

The performance evaluation index may include one or more of the accuracy, error rate, sensitivity, precision, specificity, and false positive rate of the target model.
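For a binary classifier, the listed indices can all be derived from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name is hypothetical, and it assumes every denominator is non-zero.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn: true positives, false positives, true negatives,
    false negatives. Assumes no denominator is zero.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "sensitivity": tp / (tp + fn),          # true positive rate (recall)
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
    }


metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```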

The parameter search unit may repeatedly perform the basin hopping algorithm based on the selected training data to derive a plurality of first candidate hyperparameters, and may limit the search range of the hyperparameter to the minimum and maximum values of the first candidate hyperparameters.

The parameter search unit may repeatedly perform the Bayesian optimization algorithm within the limited search range to derive a plurality of second candidate hyperparameters, and may recommend, from among the plurality of first candidate hyperparameters and the plurality of second candidate hyperparameters, the one hyperparameter that yields the best performance evaluation index.
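The range limitation and the final recommendation might be sketched as below. The sketch assumes each candidate is a dictionary of numeric hyperparameter values and that a higher performance evaluation index is better; the function names and the sample values are hypothetical.

```python
def narrow_search_range(first_candidates: list) -> dict:
    """Limit each hyperparameter to the [min, max] seen among the
    first candidates (those derived from basin hopping)."""
    names = first_candidates[0].keys()
    return {
        name: (
            min(c[name] for c in first_candidates),
            max(c[name] for c in first_candidates),
        )
        for name in names
    }


def recommend(scored_candidates: list) -> dict:
    """Pick the candidate with the best score among first and second
    candidates. Assumption: a higher score is better."""
    return max(scored_candidates, key=lambda pair: pair[1])[0]


# Illustrative values only.
first = [
    {"learning_rate": 0.010, "hidden_units": 64},
    {"learning_rate": 0.003, "hidden_units": 128},
    {"learning_rate": 0.020, "hidden_units": 96},
]
bounds = narrow_search_range(first)
best = recommend([(first[0], 0.91), (first[1], 0.94), (first[2], 0.89)])
```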

According to another exemplary embodiment, there is provided a hyperparameter optimization method including: extracting, in a data selection unit, a feature vector from each of a plurality of previously prepared training data; selecting, in the data selection unit, using the feature vectors and a K-NN (K-Nearest Neighbor) algorithm, one or more training data from the plurality of training data to be used in searching for an optimal hyperparameter; limiting, in a parameter search unit, the search range of the hyperparameter by repeatedly performing a basin hopping algorithm based on the selected training data; recommending, in the parameter search unit, one hyperparameter by performing a Bayesian optimization algorithm within the limited search range; and generating, in a model generation unit, a new model by training a target model and evaluating the performance of the target model based on the recommended hyperparameter.

Selecting the one or more training data may include applying the K-NN algorithm to the feature vector of each training data to determine the class of target data, which is one of the plurality of training data, and the representative class of the region to which the target data belongs, and, when the class of the target data differs from the representative class, selecting the target data as training data to be used in searching for the optimal hyperparameter.

Selecting the one or more training data may include selecting the one or more training data in consideration of the number of training data of each class in the region and the distances between the target data and the training data in the region.

The hyperparameter optimization method may further include: storing, in the parameter search unit, a loss index and a performance evaluation index of the target model for each of a plurality of candidate hyperparameters derived from the basin hopping algorithm or the Bayesian optimization algorithm; calculating, in the parameter search unit, when evaluating the performance of the target model for a k-th candidate hyperparameter, a moving average of n loss indices, among the stored loss indices, that correspond to performance evaluation indices equal to or greater than a set value; and comparing, in the parameter search unit, the loss index of the target model for the k-th candidate hyperparameter with the moving average.

The hyperparameter optimization method may further include, after the comparing: terminating, in the parameter search unit, training of the target model for the k-th candidate hyperparameter before determining whether to apply early stopping, when the loss index of the target model for the k-th candidate hyperparameter is less than the moving average; or continuing, in the parameter search unit, training of the target model for the k-th candidate hyperparameter when the loss index for the k-th candidate hyperparameter is equal to or greater than the moving average.

The hyperparameter optimization method may further include, after the comparing: recalculating the moving average, in the parameter search unit, by replacing, among the n loss indices, the loss index corresponding to the lowest performance evaluation index with the loss index for the k-th candidate hyperparameter, when the loss index for the k-th candidate hyperparameter is equal to or greater than the moving average and the performance evaluation index for the k-th candidate hyperparameter is greater than one of the stored performance evaluation indices.

The performance evaluation index may include one or more of the accuracy, error rate, sensitivity, precision, specificity, and false positive rate of the target model.

Limiting the search range of the hyperparameter may include setting an initial search range of the hyperparameter and limiting the search range of the hyperparameter to the minimum and maximum values of the first candidate hyperparameters derived from the basin hopping algorithm in the course of training the target model.

Recommending the one hyperparameter may include evaluating the performance of the target model for each first candidate hyperparameter and for each second candidate hyperparameter derived from the Bayesian optimization algorithm in the course of training the target model, and recommending, from among the first candidate hyperparameters and the second candidate hyperparameters, the one hyperparameter that yields the best performance evaluation index.

According to embodiments of the present invention, the search range of the hyperparameter is limited to the minimum and maximum values of the candidate hyperparameters derived by repeatedly performing the basin hopping algorithm, and the Bayesian optimization algorithm is performed within that search range, so that the global optimum can be found efficiently while the hyperparameter search time is reduced.

In addition, according to embodiments of the present invention, whether to perform further training is decided during the hyperparameter search before deciding whether to apply early stopping, which shortens the training time and thus the overall search time.

FIG. 1 is a block diagram showing the detailed configuration of an optimization system according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method of selecting training data in a data selection unit according to a first embodiment of the present invention.
FIG. 3 is a flowchart illustrating a method of selecting training data in a data selection unit according to a second embodiment of the present invention.
FIG. 4 is an example illustrating a training data selection process according to embodiments of the present invention.
FIG. 5 is a flowchart illustrating a method of performing a first search process in a parameter search unit according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a method of performing a second search process in a parameter search unit according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a method of generating a new model in a model generation unit according to an embodiment of the present invention.
FIG. 8 is a block diagram illustrating a computing environment including a computing device suitable for use in exemplary embodiments.

Hereinafter, specific embodiments of the present invention will be described with reference to the drawings. The following detailed description is provided to assist in a comprehensive understanding of the methods, devices, and/or systems described herein. However, it is only an example, and the present invention is not limited thereto.

In describing the embodiments of the present invention, a detailed description of known technology related to the present invention is omitted where it would unnecessarily obscure the gist of the present invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of a user or operator; their definitions should therefore be based on the content of this specification as a whole. The terminology used in the detailed description is only for describing embodiments of the present invention and should in no way be limiting. Unless clearly used otherwise, a singular expression includes the plural. In this description, expressions such as "including" or "having" indicate certain features, numbers, steps, operations, elements, or parts or combinations thereof, and should not be construed to exclude the presence or possibility of one or more other features, numbers, steps, operations, elements, or parts or combinations thereof.

FIG. 1 is a block diagram showing the detailed configuration of an optimization system 100 according to an embodiment of the present invention. The optimization system 100 is a system for optimizing hyperparameters, which strongly affect the training speed and performance of a target model.

In these embodiments, the target model is an objective function that is the subject of training and performance evaluation; examples include a deep learning model and an SVM (Support Vector Machine) model. A hyperparameter is not a primary variable that is tuned or optimized through training; rather, it is a variable, such as a learning rate or a regularization parameter, that a person sets from a priori knowledge or that is set automatically through an external model mechanism. Hyperparameters may include, for example, the learning rate, the regularization parameter, the number of training iterations, and the number of hidden units. Training the target model means minimizing a loss function calculated from the training data, and optimizing the hyperparameters means searching for the parameters that make the value of the loss function as low as possible (or that maximize or minimize the value of the objective function), that is, searching for the optimal weights that minimize the error.

As shown in FIG. 1, the optimization system 100 according to an embodiment of the present invention includes a data selection unit 102, a parameter search unit 104, and a model generation unit 106.

The data selection unit 102 extracts a feature vector from each of a plurality of previously prepared training data and, using the feature vectors and a K-NN (K-Nearest Neighbor) algorithm, selects from the plurality of training data one or more training data to be used in searching for an optimal hyperparameter. Conventionally, either the entire training data or training data selected by random sampling was used to find the optimal hyperparameter, but this either took a long time to train or yielded somewhat lower training performance. Accordingly, in embodiments of the present invention, training data with a high influence on (or high uncertainty for) training performance is selected instead of the entire training data, and the hyperparameter search is performed on the selected training data.

To this end, the data selection unit 102 may first extract a feature vector from each of the plurality of previously prepared training data. The data selection unit 102 may then apply the K-NN algorithm to the feature vector of each training data to determine the class of target data, which is one of the plurality of training data, and the representative class of the region to which the target data belongs. Specifically, the data selection unit 102 may use the feature vectors to calculate the Euclidean distance between the training data, determine the class of the target data and the classes of the training data around the target data, and from these determine the representative class of the region to which the target data belongs, that is, the class that occurs most often in the region. The boundary of the region, that is, K of the K-NN algorithm, may be calculated, for example, as in Equation 1 below.

[Equation 1]

Here, the terms of Equation 1 denote, respectively, the total number of training data, the total number of classes to which the training data belong, and the noise coefficient s.

If the class of the target data differs from the representative class, the data selection unit 102 may select the target data as training data to be used in searching for the optimal hyperparameter.

As an example, if the class of the target data is a and the representative class of the region to which the target data belongs is b, the target data can be regarded as training data with a high influence on training performance (or classification performance). Accordingly, the data selection unit 102 may select the target data as training data to be used in searching for the optimal hyperparameter.
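The selection rule above might be sketched as follows, assuming each training datum already has a feature vector and a class label. The function names are hypothetical, K is passed in directly rather than computed from Equation 1, and the representative class is taken as a plain majority vote over the K nearest neighbours.

```python
import math
from collections import Counter


def euclidean(a, b) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_mismatched_samples(features, labels, k):
    """Return indices of samples whose own class differs from the most
    common class among their k nearest neighbours."""
    selected = []
    for i, target in enumerate(features):
        # Distances from the target to every other sample, nearest first.
        neighbours = sorted(
            (euclidean(target, other), j)
            for j, other in enumerate(features)
            if j != i
        )[:k]
        votes = Counter(labels[j] for _, j in neighbours)
        representative = votes.most_common(1)[0][0]
        if labels[i] != representative:
            selected.append(i)
    return selected


# One 'b' sample sits in the middle of a cluster of 'a' samples.
features = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
            (5, 5), (5, 6), (6, 5), (6, 6)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]
hard = select_mismatched_samples(features, labels, k=3)  # -> [4]
```

Only the sample at index 4 is selected, since it is the only one whose class disagrees with its neighbourhood.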

The data selection unit 102 may also select the one or more training data in consideration of the number of training data of each class in the region and the distances between the target data and the training data in the region. Specifically, the data selection unit 102 may use the feature vectors and the K-NN algorithm to obtain the number of training data of each class in the region, and may apply a Borda count algorithm to each class according to the distances between the target data and the training data in the region to derive a voting result for each class. In doing so, the data selection unit 102 may assign a higher weight to training data located closer to the target data.

In the example above, suppose that three training data have the same class as the target data (for example, a), that four training data have the same class as the representative class of the region to which the target data belongs (for example, b), and that the class-a training data are located closer to the target data than the class-b training data. In this case, the data selection unit 102 may not select the target data as training data to be used in searching for the optimal hyperparameter: although the training data sharing the class of the target data are relatively few, they are relatively clustered, so classifying them is not very difficult.
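The three-versus-four example can be reproduced with a rank-based Borda count. The text names the Borda count but does not give the scoring scheme, so the sketch assumes the nearest of K neighbours earns K points, the next K - 1, and so on; the function names and distances are hypothetical.

```python
from collections import defaultdict


def borda_winner(neighbours):
    """neighbours: list of (distance_to_target, class_label) in the region.

    Assumed scoring: with K neighbours, the nearest gets K points and
    the farthest gets 1 point; points are summed per class.
    """
    ranked = sorted(neighbours, key=lambda pair: pair[0])
    points = defaultdict(int)
    k = len(ranked)
    for rank, (_, label) in enumerate(ranked):
        points[label] += k - rank
    winner = max(points, key=points.get)
    return winner, dict(points)


def is_selected(target_label, neighbours) -> bool:
    """Select the target only if its class loses the weighted vote."""
    winner, _ = borda_winner(neighbours)
    return target_label != winner


# Three class-a neighbours close to the target, four class-b farther away.
region = [(0.1, "a"), (0.2, "a"), (0.3, "a"),
          (1.0, "b"), (1.1, "b"), (1.2, "b"), (1.3, "b")]
winner, points = borda_winner(region)  # a: 7+6+5 = 18, b: 4+3+2+1 = 10
```

A plain majority vote would name b the representative class, but the distance-weighted vote favours a, so a class-a target is not selected.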

In this way, the data selection unit 102 may use the K-NN algorithm, the Borda count, and the like to select training data with a high influence on training performance instead of the entire training data. However, the manner in which the data selection unit 102 selects training data is not limited to this, and the data selection unit 102 may select the training data in various ways. For example, the data selection unit 102 may select training data using a product quantization technique. Specifically, the data selection unit 102 may divide the feature vector of each training data into M equal parts to generate subvectors, and may apply a technique such as k-means clustering to each subvector to generate a plurality of centroids. The data selection unit 102 calculates the Euclidean distance from each subvector to its nearest centroid, so that the feature vector of each training data has distance values computable from the distances to the centroids. To extract the same number of training data for each class, the data selection unit 102 may select, for each class, the training data having the smallest distance value, the largest distance value, and a mid-range distance value, and the training data selected in this way may later be used in searching for the hyperparameter.
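The product quantization variant might be sketched as below. Several details are assumptions not stated in the text: the per-sample distance value is the sum of the subvector-to-nearest-centroid distances, the mid-range sample is the median by rank, and the tiny k-means is seeded with the first k points. All function names are hypothetical.

```python
import math


def euclidean(a, b) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=10):
    """Tiny Lloyd's k-means; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: euclidean(p, centroids[c]))
            groups[nearest].append(p)
        for c, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(group) for col in zip(*group))
    return centroids


def pq_distance_values(features, m, k):
    """Sum over the M subvectors of the distance to the nearest centroid."""
    width = len(features[0]) // m
    totals = [0.0] * len(features)
    for part in range(m):
        subs = [tuple(f[part * width:(part + 1) * width]) for f in features]
        centroids = kmeans(subs, k)
        for i, sub in enumerate(subs):
            totals[i] += min(euclidean(sub, c) for c in centroids)
    return totals


def select_per_class(features, labels, m, k):
    """Per class, pick the smallest, median, and largest distance values."""
    values = pq_distance_values(features, m, k)
    chosen = []
    for cls in sorted(set(labels)):
        ranked = sorted(
            (i for i, label in enumerate(labels) if label == cls),
            key=lambda i: values[i],
        )
        picks = {ranked[0], ranked[len(ranked) // 2], ranked[-1]}
        chosen.extend(sorted(picks))
    return chosen
```

Picking by rank within each class keeps the number of selected samples equal across classes as long as each class has at least three samples.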

The parameter search unit 104 searches for the optimal hyperparameter based on the selected training data. Conventionally, to find the optimal hyperparameter, an initial search range was set and a Bayesian optimization algorithm was performed over that search range. In general, a Bayesian optimization algorithm finds the global optimum using the exploration-exploitation trade-off. However, because the Bayesian optimization algorithm is relatively weak at exploration, it may converge to a local optimum if the initial search range or the first candidate hyperparameter is chosen poorly. Accordingly, in embodiments of the present invention, a basin hopping algorithm is performed before the Bayesian optimization algorithm to narrow the search range of the hyperparameter, and the Bayesian optimization algorithm is then performed within the narrowed search range.

More specifically, the parameter search unit 104 may determine the optimal hyperparameter through the first search process and the second search process described below. Here, the first search process is a process of searching for hyperparameters through the basin hopping algorithm, and the second search process is a process of searching for hyperparameters through the Bayesian optimization algorithm.

* First search process

The parameter search unit 104 may repeatedly perform the basin hopping algorithm based on the selected training data to derive a plurality of first candidate hyperparameters. The basin hopping algorithm is an algorithm that stochastically searches for the global optimum of an objective function and has the advantage of being strong at exploration. When the plurality of first candidate hyperparameters have been derived, the parameter search unit 104 may limit (or set) the search range of the hyperparameter to the range between the minimum value and the maximum value of the derived first candidate hyperparameters. Meanwhile, the method of deriving the first candidate hyperparameters through the basin hopping algorithm is generally well known in the technical field to which the present invention pertains, so a detailed description thereof is omitted.
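
For readers unfamiliar with basin hopping, the general idea can be sketched as follows. This is a toy one-dimensional sketch, not the implementation of the embodiment: the objective `f` stands in for "train the target model with hyperparameter x and return a value to minimize", and the names `local_minimize` and `basin_hopping`, the crude derivative-free local search, and the parameters `hop` and `temperature` are all assumptions made for illustration.

```python
import math
import random


def local_minimize(f, x, step=0.1, tol=1e-6):
    """Crude derivative-free descent: move while it helps, otherwise halve the step."""
    fx = f(x)
    while step > tol:
        for candidate in (x - step, x + step):
            fc = f(candidate)
            if fc < fx:
                x, fx = candidate, fc
                break
        else:
            step /= 2
    return x, fx


def basin_hopping(f, x0, n_iter=50, hop=1.0, temperature=1.0, seed=0):
    """Return every local minimum visited; these play the role of first candidates."""
    rng = random.Random(seed)
    x, fx = local_minimize(f, x0)
    visited = [(x, fx)]
    for _ in range(n_iter):
        # Random perturbation followed by a local minimization.
        trial, f_trial = local_minimize(f, x + rng.uniform(-hop, hop))
        visited.append((trial, f_trial))
        # Metropolis criterion: always accept improvements, sometimes accept worse points.
        if f_trial < fx or rng.random() < math.exp(-(f_trial - fx) / temperature):
            x, fx = trial, f_trial
    return visited
```

Because worse points are occasionally accepted, the search can leave one basin and land in another, which is the exploration property the description relies on.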

* Second search process

The parameter search unit 104 repeatedly performs the Bayesian optimization algorithm within the search range limited in the first search process to derive a plurality of second candidate hyperparameters. Like the basin hopping algorithm, the Bayesian optimization algorithm searches for the global optimum of an objective function. Compared with the basin hopping algorithm, however, the Bayesian optimization algorithm has the advantage of being strong at exploitation. The parameter search unit 104 may perform the Bayesian optimization algorithm within the search range, so that each of the second candidate hyperparameters lies within the search range. Meanwhile, the method of deriving the second candidate hyperparameters through the Bayesian optimization algorithm is generally well known in the technical field to which the present invention pertains, so a detailed description thereof is omitted.

Thereafter, the parameter search unit 104 may recommend, from among the plurality of first candidate hyperparameters and the plurality of second candidate hyperparameters, the one hyperparameter that yields the best performance evaluation index.

In this way, the parameter search unit 104 may repeatedly perform the basin hopping algorithm a set number of times to limit the search range, and may repeatedly perform the Bayesian optimization algorithm a set number of times within that search range. The number of searches may, for example, be preset by a user. Here, for convenience of explanation, it is assumed that the number of searches in the first search process and the number of searches in the second search process are each 50.

As an example, the parameter search unit 104 may perform the basin hopping algorithm 50 times to derive h₁ to h₅₀, and may limit the search range to the range between the minimum value (h_min) and the maximum value (h_max) of h₁ to h₅₀ (that is, h_min < search range < h_max). Thereafter, the parameter search unit 104 may perform the Bayesian optimization algorithm 50 times within the search range to derive h₅₁ to h₁₀₀. The parameter search unit 104 may then determine, as the optimal hyperparameter, the one hyperparameter among h₁ to h₁₀₀ that yields the best performance evaluation index. Hereinafter, for convenience of explanation, the hyperparameters derived through the basin hopping algorithm are referred to as first candidate hyperparameters, and the hyperparameters derived through the Bayesian optimization algorithm are referred to as second candidate hyperparameters.
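
The range limitation and the final recommendation in this example can be sketched as follows. The sketch assumes that each candidate is a dictionary of named hyperparameters and that a higher performance evaluation index is better; the names `narrow_search_range` and `recommend` and the numeric values are hypothetical and chosen only for illustration.

```python
def narrow_search_range(first_candidates):
    """Return {name: (min, max)} over the first candidate hyperparameters."""
    names = first_candidates[0].keys()
    return {
        name: (min(c[name] for c in first_candidates),
               max(c[name] for c in first_candidates))
        for name in names
    }


def recommend(candidates, metrics):
    """Return the candidate whose performance evaluation index is highest."""
    best_index = max(range(len(candidates)), key=lambda i: metrics[i])
    return candidates[best_index]


# Hypothetical first-stage results (three candidates instead of fifty).
first = [
    {"learning_rate": 0.010, "dropout": 0.30},
    {"learning_rate": 0.002, "dropout": 0.50},
    {"learning_rate": 0.050, "dropout": 0.20},
]
search_range = narrow_search_range(first)
# search_range == {"learning_rate": (0.002, 0.05), "dropout": (0.2, 0.5)}
```

The second search process would then sample only inside `search_range`, and `recommend` would be applied to the union of the first and second candidates.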

In addition, the parameter search unit 104 may train the target model for each first candidate hyperparameter and each second candidate hyperparameter, and in this process may obtain a loss index and a performance evaluation index of the target model for each first candidate hyperparameter and each second candidate hyperparameter. The parameter search unit 104 may use the loss index and the performance evaluation index to reduce the training time in the first search process and the second search process.

To this end, the parameter search unit 104 may store the loss index and the performance evaluation index of the target model for each of the plurality of candidate hyperparameters derived from the basin hopping algorithm or the Bayesian optimization algorithm. Here, the loss index indicates the degree of difference (error) between the value predicted by the target model and the actual value when a candidate hyperparameter is used, and may be, for example, the nat (natural unit of information) value of the loss observed during training of the target model. The loss index (nats of loss) may be expressed, for example, as in Equation 2 below.

[Equation 2]

Here, the batch size means the number of training data included in one batch. In addition, since the loss value has little meaning in the early stage of training, the loss value used in Equation 2 may be a loss value obtained after training has progressed to some extent. The loss value used in Equation 2 may be, for example, the loss value in the p-th to q-th epochs (where 1 < p < q).

In addition, the performance evaluation index may include, for example, one or more of the accuracy, error rate, sensitivity, precision, specificity, and false positive rate of the target model.
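
For a binary classifier, the performance evaluation indices listed above can be computed from confusion-matrix counts using their standard definitions, as in the following sketch. The function name `evaluation_metrics` is hypothetical, and the sketch assumes that every denominator is non-zero.

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),          # true positive rate (recall)
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
    }
```

For instance, with the hypothetical counts tp=40, fp=10, tn=45, fn=5, the accuracy is 0.85 and the precision is 0.8.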

As an example, the parameter search unit 104 may store the loss index and the performance evaluation index for each of h₁ to h₂₀.

Thereafter, when evaluating the performance of the target model for the k-th candidate hyperparameter, the parameter search unit 104 may calculate a moving average of the n stored loss indices that correspond to performance evaluation indices greater than or equal to a set value.

In the above example, when evaluating the performance of the target model for the 21st candidate hyperparameter, that is, h₂₁, the parameter search unit 104 may calculate a moving average of the n stored loss indices that correspond to performance evaluation indices greater than or equal to the set value. For example, of the 20 loss indices stored so far, the parameter search unit 104 may calculate a moving average of the 15 loss indices that correspond to candidate hyperparameters yielding a performance evaluation index greater than or equal to the set value.

Thereafter, the parameter search unit 104 may compare the loss index of the target model for the k-th candidate hyperparameter with the moving average.

If the loss index of the target model for the k-th candidate hyperparameter is less than the moving average, the parameter search unit 104 may terminate further training of the target model for the k-th candidate hyperparameter before determining whether to stop the training early. In this case, the loss is too large for additional training to be meaningful, so the parameter search unit 104 may evaluate the k-th candidate hyperparameter without performing any further training.

If the loss index for the k-th candidate hyperparameter is greater than or equal to the moving average, the parameter search unit 104 may continue training the target model for the k-th candidate hyperparameter. The parameter search unit 104 may then determine whether to stop the training early according to the subsequent training results. As an example, if the change in loss or the change in accuracy during training remains within a set value (for example, 1%) over consecutive measurements, the parameter search unit 104 may terminate the training early.
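
The early stopping condition in this example can be sketched as a plateau check. The description states only that the change stays within a set value (for example, 1%) consecutively; interpreting the change as a relative epoch-to-epoch change and requiring `patience` consecutive small changes are assumptions of this sketch, as is the function name `should_stop_early`.

```python
def should_stop_early(history, tolerance=0.01, patience=3):
    """Return True when the last `patience` epoch-to-epoch relative changes
    of the tracked value (loss or accuracy) are all within `tolerance`."""
    if len(history) < patience + 1:
        return False
    recent = history[-(patience + 1):]
    changes = [
        abs(curr - prev) / max(abs(prev), 1e-12)  # guard against division by zero
        for prev, curr in zip(recent, recent[1:])
    ]
    return all(change <= tolerance for change in changes)
```

A training loop would append the latest loss (or accuracy) to `history` after each epoch and stop as soon as the function returns True.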

Further, when the loss index for the k-th candidate hyperparameter is greater than or equal to the moving average and the performance evaluation index for the k-th candidate hyperparameter is greater than one of the stored performance evaluation indices, the parameter search unit 104 may recalculate the moving average by replacing, among the n loss indices, the loss index corresponding to the lowest performance evaluation index with the loss index for the k-th candidate hyperparameter.

In the above example, when the loss index for the 21st candidate hyperparameter is greater than or equal to the moving average and the performance evaluation index for the 21st candidate hyperparameter is greater than one of the stored performance evaluation indices, the parameter search unit 104 may newly calculate the moving average by replacing, among the 15 stored loss indices, the loss index corresponding to the lowest performance evaluation index with the loss index for the 21st candidate hyperparameter. In this case, the moving average may become higher than before.
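
The moving-average comparison and replacement described above can be sketched as follows. This is a sketch under stated assumptions: the class name `LossGate` and its methods are hypothetical, the comparison of the new performance evaluation index is made against the n reference entries only, at least one stored record is assumed to meet the threshold, and the loss index is treated as an opaque number for which "greater than or equal to the moving average" means "continue training", as stated in the description.

```python
class LossGate:
    """Keeps the n reference (loss_index, metric) pairs and their moving average."""

    def __init__(self, records, metric_threshold):
        # Keep only records whose performance metric meets the set value.
        self.reference = [r for r in records if r[1] >= metric_threshold]

    def moving_average(self):
        return sum(loss for loss, _ in self.reference) / len(self.reference)

    def should_continue(self, loss_index):
        """Training continues only if the loss index is at least the moving average."""
        return loss_index >= self.moving_average()

    def update(self, loss_index, metric):
        """Replace the reference entry with the lowest metric when the new
        candidate passed the gate and has a higher metric. Returns True if replaced."""
        if not self.should_continue(loss_index):
            return False
        worst = min(range(len(self.reference)), key=lambda i: self.reference[i][1])
        if metric > self.reference[worst][1]:
            self.reference[worst] = (loss_index, metric)
            return True
        return False
```

With hypothetical records `[(2.0, 0.9), (1.0, 0.5), (3.0, 0.8), (4.0, 0.7)]` and a threshold of 0.7, three records are kept and the moving average is 3.0, so a candidate with loss index 2.5 is cut off while one with loss index 3.5 continues training.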

As described above, according to embodiments of the present invention, whether to perform additional training is determined before whether to stop the training early is determined in the hyperparameter search process, which shortens the training time and thereby reduces the overall search time.

The model generation unit 106 additionally performs training and performance evaluation on the target model based on the hyperparameter recommended by the parameter search unit 104, and generates a new model therefrom.

FIG. 2 is a flowchart illustrating a method of selecting training data in the data selection unit 102 according to a first embodiment of the present invention. In the illustrated flowchart, the method is described as being divided into a plurality of steps, but at least some of the steps may be performed in a different order, performed in combination with other steps, omitted, performed as divided sub-steps, or performed with one or more steps not shown being added.

In step S102, the data selection unit 102 extracts a feature vector for each of a plurality of previously prepared training data.

In step S104, the data selection unit 102 selects arbitrary target data and calculates the distances (for example, Euclidean distances) between the target data and the surrounding training data.

In step S106, the data selection unit 102 determines the class of the target data and the representative class of the surrounding training data through the K-NN algorithm.

In step S108, the data selection unit 102 compares the class of the target data with the representative class.

In step S110, if the comparison in step S108 shows that the class of the target data does not match the representative class, the data selection unit 102 selects the target data as training data to be used for the hyperparameter search. If the comparison in step S108 shows that the class of the target data matches the representative class, the data selection unit 102 returns to step S104 and selects other target data. Thereafter, the data selection unit 102 repeats the above-described process, performing steps S106 and S108 on the newly selected target data.
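
Steps S102 to S110 can be sketched as follows. The sketch assumes that the feature vectors have already been extracted, that the target itself is excluded from its own neighbours, and that the representative class is the most frequent class among the K nearest neighbours (ties are resolved by whichever tied class is encountered first, which the description does not specify). The function name `select_hard_samples` is hypothetical.

```python
import math
from collections import Counter


def select_hard_samples(features, labels, k):
    """Return indices whose class differs from the majority class of their K nearest neighbours."""
    selected = []
    for i, target in enumerate(features):
        # S104: Euclidean distance from the target to every other sample.
        others = [j for j in range(len(features)) if j != i]
        neighbours = sorted(others, key=lambda j: math.dist(target, features[j]))[:k]
        # S106: representative class of the neighbourhood.
        representative, _ = Counter(labels[j] for j in neighbours).most_common(1)[0]
        # S108/S110: keep the target only when the classes disagree.
        if representative != labels[i]:
            selected.append(i)
    return selected


features = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (0.5, 0.5)]
labels = ["a", "a", "a", "b", "b", "b"]
hard = select_hard_samples(features, labels, k=3)  # [5]
```

In this small hypothetical data set only the last point is selected, because it carries class b but sits in the middle of the class-a cluster.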

FIG. 3 is a flowchart illustrating a method of selecting training data in the data selection unit 102 according to a second embodiment of the present invention. In the illustrated flowchart, the method is described as being divided into a plurality of steps, but at least some of the steps may be performed in a different order, performed in combination with other steps, omitted, performed as divided sub-steps, or performed with one or more steps not shown being added.

In step S202, the data selection unit 102 extracts a feature vector for each of a plurality of previously prepared training data.

In step S204, the data selection unit 102 selects arbitrary target data and calculates the distances (for example, Euclidean distances) between the target data and the surrounding training data.

In step S206, the data selection unit 102 determines the class of the target data and the representative class of the surrounding training data through the K-NN algorithm.

In step S208, the data selection unit 102 applies the Borda count algorithm to each class according to the distances between the target data and the training data within the region, and derives a voting result, that is, a score, for each class.

In step S210, the data selection unit 102 determines whether the score corresponding to the same class as the class of the target data is less than or equal to the score corresponding to the representative class.

In step S212, if the score corresponding to the same class as the class of the target data is less than or equal to the score corresponding to the representative class, the data selection unit 102 selects the target data as training data to be used for the hyperparameter search. If the comparison in step S210 shows that the score corresponding to the same class as the class of the target data is greater than the score corresponding to the representative class, the data selection unit 102 returns to step S204 and selects other target data. Thereafter, the data selection unit 102 repeats the above-described process, performing steps S206 to S210 on the newly selected target data.

FIG. 4 is an example showing a training data selection process according to embodiments of the present invention.

As described above, the data selection unit 102 may extract a feature vector for each of a plurality of previously prepared training data, and may use the feature vectors and the K-NN algorithm to select, from among the plurality of training data, the training data to be used for the hyperparameter search. Here, for convenience of explanation, it is assumed that the training data are images.

Referring to FIG. 4, a feature vector may be extracted for each of a plurality of images based on a pre-trained model, and the distances between the images may be calculated based on the feature vectors. Parts (a) and (b) of FIG. 4 each show a target image and the other images in the region to which the target image belongs. Here, FIG. 4(a) is an example showing the training data selection process according to the first embodiment of the present invention, and FIG. 4(b) is an example showing the training data selection process according to the second embodiment of the present invention. In FIGS. 4(a) and 4(b), images belonging to the same class are shown with the same hatching pattern.

First, referring to FIG. 4(a), the data selection unit 102 may compare the class of the target image with the class of each of the other images surrounding the target image. The comparison shows that three images have the same class as the target image (for example, a), and that the classes different from the class of the target image (for example, b, c, and d) have four images each. In this case, since the class of the target image differs from the representative class of the region to which the target image belongs (that is, the number of images having the same class as the target image is smaller than the number of images of each different class), the data selection unit 102 may select the target image as training data to be used for the hyperparameter search.

Next, referring to FIG. 4(b), the data selection unit 102 may apply the Borda count algorithm to each class according to the distances between the target image and the other images within the region, and may derive a voting result, that is, a score, for each class. In doing so, the data selection unit 102 may assign a higher weight to the other images that lie closer to the target image. That is, the data selection unit 102 may assign weights of k, k-1, k-2, ..., 2, 1 to the other images in order of proximity to the target image. Accordingly, the score for the same class as the target image, that is, class a, is 15 + 14 + 13 = 42, and the scores for the classes different from the class of the target image, that is, classes b, c, and d, are 12 + 11 + 5 + 2 = 30, 10 + 9 + 8 + 4 = 31, and 7 + 6 + 3 + 1 = 17, respectively. Thereafter, the data selection unit 102 may determine whether the score corresponding to the same class as the class of the target image is less than or equal to the score corresponding to the representative class. In the above example, since the score corresponding to the same class as the target image (that is, 42 points) is greater than the scores corresponding to the representative class (that is, 30 points, 31 points, and 17 points), the data selection unit 102 does not select the target image as training data for the hyperparameter search.
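
The score calculation in this example can be reproduced with the short sketch below. The nearest-first ordering of the 15 neighbouring images is an assumption: it is one ordering that is consistent with the per-class sums given above, since the figure itself is not reproduced here. The function name `borda_scores` is hypothetical, and every class other than the target's class is treated as a possible representative class, as in the comparison above.

```python
from collections import defaultdict


def borda_scores(neighbour_classes):
    """neighbour_classes is ordered nearest-first; the nearest image gets
    weight k, the next k-1, and so on down to 1 for the farthest."""
    k = len(neighbour_classes)
    scores = defaultdict(int)
    for rank, cls in enumerate(neighbour_classes):
        scores[cls] += k - rank
    return dict(scores)


# Assumed nearest-first ordering of the 15 neighbours (weights 15 down to 1).
order = list("aaabbcccddbcdbd")
scores = borda_scores(order)  # {'a': 42, 'b': 30, 'c': 31, 'd': 17}

target_class = "a"
best_other = max(v for c, v in scores.items() if c != target_class)
selected = scores[target_class] <= best_other  # False: 42 > 31, so not selected
```

Although class a has the fewest neighbours (three versus four for each other class), its neighbours are the closest ones, so the distance-weighted vote reverses the outcome of the simple count used in the first embodiment.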

That is, even the same target data may or may not be selected as training data for the hyperparameter search, depending on which of the above embodiments is applied.

FIG. 5 is a flowchart illustrating a method of performing the first search process in the parameter search unit 104 according to an embodiment of the present invention. In the illustrated flowchart, the method is described as being divided into a plurality of steps, but at least some of the steps may be performed in a different order, performed in combination with other steps, omitted, performed as divided sub-steps, or performed with one or more steps not shown being added.

In step S302, the parameter search unit 104 sets an initial search range for the hyperparameter. The parameter search unit 104 may set the initial search range of the hyperparameter using, for example, any of various known statistical techniques.

In step S304, the parameter search unit 104 performs the basin hopping algorithm within the initial search range based on the selected training data. Accordingly, a first candidate hyperparameter (for example, h₁) may be derived.

In step S306, the parameter search unit 104 trains the target model for the first candidate hyperparameter (for example, h₁) derived from the basin hopping algorithm.

In step S308, the parameter search unit 104 stores the loss index and the performance evaluation index of the target model for the first candidate hyperparameter (for example, h₁) derived from the basin hopping algorithm.

Thereafter, steps S304 through S308 are repeated for the next first candidate hyperparameter. Accordingly, the loss index and the performance evaluation index of the target model are stored for each of a plurality of first candidate hyperparameters (for example, h₁ to h₂₀).

Thereafter, in step S310, when evaluating the performance of the target model for the k-th candidate hyperparameter (for example, h₂₁), the parameter search unit 104 calculates a moving average of the n stored loss indices that correspond to performance evaluation indices greater than or equal to a set value.

In step S312, the parameter search unit 104 compares the loss index of the target model for the k-th candidate hyperparameter (for example, h₂₁) with the moving average.

In step S314, if the comparison in step S312 shows that the loss index of the target model for the k-th candidate hyperparameter (for example, h₂₁) is greater than or equal to the moving average, the parameter search unit 104 continues the training for the k-th candidate hyperparameter (for example, h₂₁) and subsequently determines whether to stop the training early.

For example, if the change in loss or the change in accuracy during training remains within a set value (for example, 1%) over consecutive measurements, the parameter search unit 104 may determine to stop the training early, in which case it may return to step S304 and repeat the preceding process for the next first candidate hyperparameter (for example, h₂₂) derived from the basin hopping algorithm. If the change in loss or the change in accuracy during training does not remain within the set value (for example, 1%) over consecutive measurements, the parameter search unit 104 may continue training the target model (S306).

In addition, if the comparison in step S312 shows that the loss index of the target model for the k-th candidate hyperparameter (for example, h₂₁) is less than the moving average, the parameter search unit 104 terminates the training of the target model for the k-th candidate hyperparameter (for example, h₂₁). Thereafter, the parameter search unit 104 may return to step S304 and repeat the preceding process for the next first candidate hyperparameter (for example, h₂₂) derived from the basin hopping algorithm.

In this way, the parameter search unit 104 may repeat steps S304 to S314 for each of the plurality of first candidate hyperparameters (for example, h₁ to h₅₀) derived through the basin hopping algorithm.

In step S316, the parameter search unit 104 may limit the search range of the hyperparameter to the span between the minimum and maximum values of the first candidate hyperparameters (e.g., h₁ to h₅₀) derived through the basin hopping algorithm.
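A minimal sketch of the range restriction in step S316 is given below. It assumes each first candidate is a mapping from hyperparameter name to value and that the bounds are taken per hyperparameter across all candidates. The function name and the hyperparameter names in the usage example are illustrative, not taken from the source.

```python
from typing import Dict, List, Tuple


def restrict_search_range(
    candidates: List[Dict[str, float]],
) -> Dict[str, Tuple[float, float]]:
    """Return (min, max) bounds per hyperparameter over all candidates."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    names = candidates[0].keys()
    return {
        name: (min(c[name] for c in candidates),
               max(c[name] for c in candidates))
        for name in names
    }


# Usage with three hypothetical first candidates.
first_candidates = [
    {"learning_rate": 0.01, "dropout": 0.5},
    {"learning_rate": 0.001, "dropout": 0.2},
    {"learning_rate": 0.05, "dropout": 0.3},
]
bounds = restrict_search_range(first_candidates)
```

The resulting bounds would then serve as the domain handed to the Bayesian optimization step.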

FIG. 6 is a flowchart illustrating how the parameter search unit 104 recommends one hyperparameter according to an embodiment of the present invention. Although the flowchart presents the method as a series of discrete steps, at least some of the steps may be performed in a different order, combined with other steps, omitted, divided into sub-steps, or supplemented by one or more steps not shown.

In step S402, the parameter search unit 104 performs a Bayesian optimization algorithm within the limited search range, from which a second candidate hyperparameter (e.g., h₅₁) is derived.

Thereafter, the parameter search unit 104 performs steps S404 to S412 in the same manner as in FIG. 5, except that it uses the second candidate hyperparameters (e.g., h₅₁ to h₁₀₀) in place of the first candidate hyperparameters.

In step S414, the parameter search unit 104 recommends the one hyperparameter that yields the best performance evaluation index among the first candidate hyperparameters (e.g., h₁ to h₅₀) and the second candidate hyperparameters (e.g., h₅₁ to h₁₀₀).
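The selection in step S414 amounts to taking the best-scoring entry from the combined pool of candidates. The sketch below assumes that a higher performance evaluation index is better (as with accuracy). For an index such as error rate the comparison would be reversed. The `Candidate` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Candidate:
    name: str                 # label such as "h72" (illustrative)
    params: Dict[str, float]  # hyperparameter values
    score: float              # performance evaluation index, higher is better


def recommend_hyperparameter(first: Iterable[Candidate],
                             second: Iterable[Candidate]) -> Candidate:
    """Return the candidate with the highest score across both pools."""
    pool = list(first) + list(second)
    if not pool:
        raise ValueError("no candidates to choose from")
    return max(pool, key=lambda c: c.score)


# Usage with made-up scores.
first_pool = [Candidate("h1", {"learning_rate": 0.1}, 0.81),
              Candidate("h2", {"learning_rate": 0.05}, 0.84)]
second_pool = [Candidate("h51", {"learning_rate": 0.02}, 0.88),
               Candidate("h72", {"learning_rate": 0.01}, 0.93)]
best = recommend_hyperparameter(first_pool, second_pool)
```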

FIG. 7 is a flowchart illustrating how the model creation unit 106 generates a new model according to an embodiment of the present invention.

In step S502, the model creation unit 106 receives the optimal hyperparameter (e.g., h₇₂) from the parameter search unit 104.

In step S504, the model creation unit 106 generates a new model by performing additional training of the target model based on the optimal hyperparameter (e.g., h₇₂).

In step S506, the model creation unit 106 evaluates the performance of the newly generated model.

FIG. 8 is a block diagram illustrating a computing environment that includes a computing device suitable for use in example embodiments. In the illustrated embodiment, each component may have functions and capabilities different from those described below, and additional components beyond those described below may be included.

The illustrated computing environment 10 includes a computing device 12. In one embodiment, the computing device 12 may be the optimization system 100, or one or more components included in the optimization system 100.

The computing device 12 includes at least one processor 14, a computer-readable storage medium 16, and a communication bus 18. The processor 14 may cause the computing device 12 to operate in accordance with the example embodiments described above. For example, the processor 14 may execute one or more programs stored on the computer-readable storage medium 16. The one or more programs may include one or more computer-executable instructions which, when executed by the processor 14, cause the computing device 12 to perform operations according to the example embodiments.

The computer-readable storage medium 16 is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. The program 20 stored in the computer-readable storage medium 16 includes a set of instructions executable by the processor 14. In one embodiment, the computer-readable storage medium 16 may be memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, any other form of storage medium that can be accessed by the computing device 12 and can store the desired information, or a suitable combination thereof.

The communication bus 18 interconnects the various components of the computing device 12, including the processor 14 and the computer-readable storage medium 16.

The computing device 12 may also include one or more input/output interfaces 22, which provide an interface for one or more input/output devices 24, and one or more network communication interfaces 26. The input/output interface 22 and the network communication interface 26 are connected to the communication bus 18. The input/output device 24 may be connected to other components of the computing device 12 through the input/output interface 22. Exemplary input/output devices 24 may include input devices such as a pointing device (a mouse or trackpad), a keyboard, a touch input device (a touchpad or touch screen), a voice or sound input device, various types of sensor devices, and/or imaging devices, and/or output devices such as a display device, a printer, speakers, and/or a network card. An exemplary input/output device 24 may be included inside the computing device 12 as one of its components, or may be connected to the computing device 12 as a separate device distinct from it.

Although the present invention has been described in detail above through representative embodiments, those of ordinary skill in the art to which the present invention pertains will understand that various modifications can be made to the above-described embodiments without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the claims set forth below and their equivalents.

10: Computing environment
12: Computing device
14: Processor
16: Computer-readable storage medium
18: Communication bus
20: Program
22: Input/output interface
24: Input/output device
26: Network communication interface
100: Optimization system
102: Data selection unit
104: Parameter search unit
106: Model creation unit

Claims

A hyperparameter optimization system comprising:
a data selection unit that extracts a feature vector for each of a plurality of training data and, using the feature vectors and a K-NN (K-Nearest Neighbor) algorithm, selects from the plurality of training data one or more training data to be used to search for an optimal hyperparameter;
a parameter search unit that repeatedly performs a basin hopping algorithm based on the selected training data to limit the search range of the hyperparameter, and performs a Bayesian optimization algorithm within the limited search range to recommend one hyperparameter; and
a model creation unit that generates a new model by training a target model and evaluating its performance based on the recommended hyperparameter,
wherein the parameter search unit repeatedly performs the basin hopping algorithm based on the selected training data to derive a plurality of first candidate hyperparameters, and limits the search range of the hyperparameter to the minimum and maximum values of the first candidate hyperparameters.

In claim 1,
The hyperparameter optimization system, wherein the data selection unit applies the K-NN algorithm to the feature vector of each training data to determine the class of target data, which is one of the plurality of training data, and the representative class of the region to which the target data belongs, and selects the target data as training data to be used to search for the optimal hyperparameter when the class of the target data differs from the representative class.

In claim 2,
The hyperparameter optimization system, wherein the data selection unit selects the one or more training data in consideration of the number of training data of each class in the region and the distance between the target data and the training data in the region.

In claim 1,
The hyperparameter optimization system, wherein the parameter search unit stores a loss index and a performance evaluation index of the target model for each of a plurality of candidate hyperparameters derived from the basin hopping algorithm or the Bayesian optimization algorithm, and, when evaluating the performance of the target model for a k-th candidate hyperparameter, calculates a moving average of n loss indexes, among the stored loss indexes, that correspond to performance evaluation indexes greater than a set value, and compares the loss index of the target model for the k-th candidate hyperparameter with the moving average.

In claim 4,
The hyperparameter optimization system, wherein the parameter search unit terminates training of the target model for the k-th candidate hyperparameter, before determining whether to stop the training early, when the loss index of the target model for the k-th candidate hyperparameter is less than the moving average, and continues training of the target model for the k-th candidate hyperparameter when the loss index for the k-th candidate hyperparameter is greater than or equal to the moving average.

In claim 5,
The hyperparameter optimization system, wherein, when the loss index for the k-th candidate hyperparameter is greater than or equal to the moving average and the performance evaluation index for the k-th candidate hyperparameter is greater than one of the stored performance evaluation indexes, the parameter search unit recalculates the moving average by replacing, among the n loss indexes, the loss index corresponding to the lowest performance evaluation index with the loss index for the k-th candidate hyperparameter.

In claim 4,
The hyperparameter optimization system, wherein the loss index is a nat (natural unit of information) value of the loss output during training of the target model.

In claim 4,
The hyperparameter optimization system, wherein the performance evaluation index includes one or more of the accuracy, error rate, sensitivity, precision, specificity, and false positive rate of the target model.

(Deleted)

In claim 1,
The hyperparameter optimization system, wherein the parameter search unit repeatedly performs the Bayesian optimization algorithm within the limited search range to derive a plurality of second candidate hyperparameters, and recommends the one hyperparameter that yields the best performance evaluation index among the plurality of first candidate hyperparameters and the plurality of second candidate hyperparameters.

A hyperparameter optimization method comprising:
extracting, in a data selection unit, a feature vector for each of a plurality of previously prepared training data;
selecting, in the data selection unit, from the plurality of training data one or more training data to be used to search for an optimal hyperparameter, using the feature vectors and a K-NN (K-Nearest Neighbor) algorithm;
limiting, in a parameter search unit, the search range of the hyperparameter by repeatedly performing a basin hopping algorithm based on the selected training data;
recommending, in the parameter search unit, one hyperparameter by performing a Bayesian optimization algorithm within the limited search range; and
generating, in a model creation unit, a new model by training a target model and evaluating its performance based on the recommended hyperparameter,
wherein limiting the search range of the hyperparameter includes setting an initial search range of the hyperparameter, and limiting the search range of the hyperparameter to the minimum and maximum values of the first candidate hyperparameters derived from the basin hopping algorithm in the course of training the target model.

In claim 11,
The hyperparameter optimization method, wherein selecting the one or more training data includes applying the K-NN algorithm to the feature vector of each training data to determine the class of target data, which is one of the plurality of training data, and the representative class of the region to which the target data belongs, and selecting the target data as training data to be used to search for the optimal hyperparameter when the class of the target data differs from the representative class.

In claim 12,
The hyperparameter optimization method, wherein selecting the one or more training data includes selecting the one or more training data in consideration of the number of training data of each class in the region and the distance between the target data and the training data in the region.

In claim 11,
The hyperparameter optimization method, further comprising:
storing, in the parameter search unit, a loss index and a performance evaluation index of the target model for each of a plurality of candidate hyperparameters derived from the basin hopping algorithm or the Bayesian optimization algorithm;
calculating, in the parameter search unit, when evaluating the performance of the target model for a k-th candidate hyperparameter, a moving average of n loss indexes, among the stored loss indexes, that correspond to performance evaluation indexes greater than a set value; and
comparing, in the parameter search unit, the loss index of the target model for the k-th candidate hyperparameter with the moving average.

In claim 14,
The hyperparameter optimization method, further comprising, after the comparing:
terminating, in the parameter search unit, training of the target model for the k-th candidate hyperparameter, before determining whether to stop the training early, when the loss index of the target model for the k-th candidate hyperparameter is less than the moving average; or
continuing, in the parameter search unit, training of the target model for the k-th candidate hyperparameter when the loss index for the k-th candidate hyperparameter is greater than or equal to the moving average.

In claim 15,
The hyperparameter optimization method, further comprising, after the comparing:
recalculating, in the parameter search unit, the moving average by replacing, among the n loss indexes, the loss index corresponding to the lowest performance evaluation index with the loss index for the k-th candidate hyperparameter, when the loss index for the k-th candidate hyperparameter is greater than the moving average and the performance evaluation index for the k-th candidate hyperparameter is greater than one of the stored performance evaluation indexes.

In claim 14,
The hyperparameter optimization method, wherein the loss index is a nat (natural unit of information) value of the loss output during training of the target model.

In claim 14,
The hyperparameter optimization method, wherein the performance evaluation index includes one or more of the accuracy, error rate, sensitivity, precision, specificity, and false positive rate of the target model.

(Deleted)

In claim 11,
The hyperparameter optimization method, wherein recommending one hyperparameter includes evaluating, in the course of training the target model, the performance of the target model for each first candidate hyperparameter and for each second candidate hyperparameter derived from the Bayesian optimization algorithm, and recommending the one hyperparameter that yields the best performance evaluation index among the first candidate hyperparameters and the second candidate hyperparameters.