KR20210024872A

KR20210024872A - Method for evaluating test fitness of input data for neural network and apparatus thereof

Info

Publication number: KR20210024872A
Application number: KR1020190104591A
Authority: KR
Inventors: 유신; 김진한; 댄 로버트 펠트
Original assignee: 한국과학기술원; 액셀레란디움 에이비
Priority date: 2019-08-26
Filing date: 2019-08-26
Publication date: 2021-03-08
Also published as: KR102287430B1

Abstract

Disclosed are a method and apparatus for evaluating the test fitness of input data for a neural network. The method for evaluating the test fitness of input data for a neural network according to one embodiment of the present invention comprises the steps of: receiving input data; acquiring an evaluation model corresponding to an output pattern of neurons while the neural network is trained; inferring a first result among inferable results by applying the input data to the neural network; inferring a second result among the inferable results based on the evaluation model and the output pattern of the neurons in response to the input data; and evaluating the test fitness of the input data by comparing the first result and the second result.

Description

METHOD FOR EVALUATING TEST FITNESS OF INPUT DATA FOR NEURAL NETWORK AND APPARATUS THEREOF

The present disclosure relates to a method and apparatus for evaluating the test fitness of input data for a neural network, for example in a deep learning system.

Deep learning (hereinafter, DL) systems have made significant advances in various fields, including image recognition, speech recognition, and machine translation. With performance that matches or even surpasses that of humans, DL systems are being adopted in safety- and security-critical domains such as autonomous driving and malware detection.

Safety- and security-critical domains may require systems that are accurate and predictable. Although DL systems perform very well, they are also known to behave unexpectedly in certain situations. For example, an autonomous vehicle equipped with a DL system may collide with another vehicle when it expected that vehicle to yield but the vehicle did not.

For this reason, the behavior and validity of DL systems need to be verified. However, existing software testing techniques may be difficult to apply directly to DL systems. For example, conventional white-box testing techniques that aim to increase structural coverage may not be useful for DL systems, because the behavior of a DL system is not explicitly encoded in its control flow structure.

Two assumptions may be made for the testing and verification of DL systems. The first is that if two inputs to a DL system are similar in some human sense, their outputs should also be similar. This may be regarded as a generalization of the essence of metamorphic testing. The second is that the more diverse a set of inputs is, the more effectively the DL system can be tested.

Testing and verification based on these two assumptions is an advance over manual, ad hoc testing, but limitations remain. Simply counting the number of neurons whose activation values satisfy a particular condition makes it possible to quantify the testing effectiveness of a given input set, but it conveys little information about individual inputs. For example, it may be difficult to explain why an input with a higher NC (neuron coverage) should be considered better than an input with a lower NC, or why certain inputs naturally activate more neurons above the threshold than others. For a test adequacy criterion to be useful in practice, it should be able to guide the selection of individual inputs.

The present invention may propose a new test adequacy criterion for DL systems, called Surprise Adequacy for Deep Learning (SADL). A set of test inputs suitable for a DL system should be systematically diversified so that it ranges from inputs similar to the training data to inputs significantly different from the training data.

At the granularity of individual inputs, SADL can measure how surprising an input is to a DL system. The actual measure of surprise may be based on the likelihood that the system saw a similar input during training (for example, with respect to a probability density distribution estimated from the training process using kernel density estimation). Alternatively, it may be based on the distance between the vector representing the neuron activation trace of the given input and the training data (for example, the Euclidean distance). Consequently, the Surprise Adequacy (SA) of a set of test inputs can be measured through the surprise values of the individual inputs in the set.

The present invention may propose SADL, a surprise adequacy framework for DL systems that can quantitatively measure the relative surprise of each input with respect to the training data, referred to as Surprise Adequacy (SA). In addition, instead of counting neurons with particular activation characteristics, Surprise Coverage (SC) may be proposed, which uses SA to measure the coverage of discretized ranges of input surprise. SA and SC can accurately capture the surprise of inputs and can be good indicators of how a DL system will react to unknown inputs. SA correlates with how difficult the DL system finds an input, and can be used to accurately classify adversarial examples. SC can also be used to guide the selection of inputs for more effective retraining of the DL system, for adversarial examples as well as for synthesized inputs.

A method for evaluating the test fitness of input data for a neural network according to an embodiment includes: receiving the input data; acquiring, for each of the results that can be inferred by the neural network comprising a plurality of neurons, an evaluation model corresponding to the output pattern of the neurons while the neural network is being trained; inferring a first result among the inferable results by applying the input data to the neural network; inferring a second result among the inferable results based on the evaluation model and the output pattern of the neurons in response to the input data; and evaluating the test fitness of the input data by comparing the first result with the second result.

According to an embodiment, the method may further include determining how well the neural network has been trained, based on the result of evaluating the test fitness.

According to an embodiment, when the test fitness is less than a predetermined threshold, the method may further include: obtaining a training set; and retraining the neural network based on the obtained training set.

According to an embodiment, obtaining the training set may include determining the training set such that at least some of the data it contains is associated with the input data.

According to an embodiment, inferring the first result may include obtaining the result output by the neural network in response to the input data.

According to an embodiment, inferring the second result may include determining, based on a probability density distribution formed by the data included in the evaluation model, where the output pattern of the neurons in response to the input data lies on that probability density distribution.

According to an embodiment, inferring the second result may include: determining, among the data included in the evaluation model, the data having the highest predetermined similarity to the output pattern of the neurons in response to the input data; and determining, as the second result, a value corresponding to the result that the neural network outputs when it receives the data having the highest similarity.

According to an embodiment, for first data and second data to be compared, the similarity may be calculated by comparing each element of the first data with the corresponding element of the second data.

According to an embodiment, for first data and second data to be compared, the similarity may correspond to the Euclidean distance calculated by comparing each element of the first data with the corresponding element of the second data.
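The similarity-based embodiment can be sketched in a few lines of Python. This is a minimal illustration only: the representation of the evaluation model as a list of (trace, label) pairs, the helper names `infer_second_result` and `results_agree`, and the sample values are assumptions, and the text does not say how the comparison is turned into a numeric fitness value.

```python
import math

def euclidean(a, b):
    """Euclidean distance computed element by element over two vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def infer_second_result(evaluation_model, trace):
    """Return the label stored with the recorded trace closest to `trace`.

    `evaluation_model` is a hypothetical list of (trace, label) pairs
    recorded while the network was trained.
    """
    _, label = min(evaluation_model, key=lambda entry: euclidean(entry[0], trace))
    return label

def results_agree(first_result, evaluation_model, trace):
    """Compare the network's own output with the model-based inference."""
    return first_result == infer_second_result(evaluation_model, trace)

# Hypothetical recorded neuron outputs and their labels.
evaluation_model = [
    ([0.6, 0.2, 0.1], "dog"),
    ([0.1, 0.3, 0.5], "cat"),
]
trace = [0.5, 0.25, 0.1]  # neuron outputs observed for the new input

print(results_agree("dog", evaluation_model, trace))
print(results_agree("cat", evaluation_model, trace))
```

Here the smallest distance stands in for the highest similarity, so the second result is the label of the nearest recorded trace.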

An apparatus for evaluating the test fitness of input data for a neural network according to an embodiment includes: a memory in which a program is recorded; and a processor that executes the program, wherein the program includes: receiving the input data; acquiring, for each of the results that can be inferred by the neural network comprising a plurality of neurons, an evaluation model corresponding to the output pattern of the neurons while the neural network is being trained; inferring a first result among the inferable results by applying the input data to the neural network; inferring a second result among the inferable results based on the evaluation model and the output pattern of the neurons in response to the input data; and evaluating the test fitness of the input data by comparing the first result with the second result.

FIG. 1 is a diagram illustrating an example of the output pattern of a plurality of neurons included in a neural network according to an embodiment.
FIG. 2 is a diagram illustrating surprise adequacy, and likelihood-based surprise adequacy in particular, according to an embodiment.
FIG. 3 is a diagram illustrating distance-based surprise adequacy according to an embodiment.
FIG. 4 is a flowchart illustrating a method of evaluating the test fitness of input data for a neural network according to an embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only, and the embodiments may be modified and implemented in various forms. Accordingly, the embodiments are not limited to the specific forms disclosed, and the scope of this specification includes modifications, equivalents, and substitutes that fall within the technical idea.

Although terms such as "first" and "second" may be used to describe various components, these terms should be interpreted only as distinguishing one component from another. For example, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component.

When a component is referred to as being "connected" to another component, it may be directly connected or coupled to that other component, but it should be understood that intervening components may also be present.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to specify the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should not be understood to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the relevant technical field. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in this specification.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote the same elements.

FIG. 1 is a diagram illustrating an example of the output pattern of a plurality of neurons included in a neural network according to an embodiment.

Referring to FIG. 1, a neural network includes one or more layers, and each layer may include a plurality of neurons. According to an embodiment, the neural network may include a first layer 110 and a second layer 120.

The neurons included in the first layer 110 and the second layer 120 may each output a value in response to an individual input $x$. For example, the neurons in the first layer 110 may output the values 0.6, 0.2, and 0.1, and the neurons in the second layer 120 may output the values 0.1, 0.3, and 0.5.

The neural network may output a classification result for the input $x$. For example, the result output for the input $x$ may be 'dog'. In this case, the output may mean that, among the predetermined output candidates, the input $x$ most closely matches an image of a dog.

The values output by the plurality of neurons can be used to evaluate test fitness. Hereinafter, in the description of FIGS. 1 to 3, for convenience the values output by the plurality of neurons are referred to as an activation trace (AT), and the value serving as the criterion for evaluating test fitness is referred to as Surprise Adequacy (SA). In the description of FIG. 4, however, the expressions "values output by a plurality of neurons" and "test fitness evaluation" are used as they are, in keeping with the wording of the claims, to describe the flow of operations.

Because DL systems are error-prone for unfamiliar inputs, the diversity of test inputs is more meaningful when it is measured with respect to training. The aim of the present invention is to define a criterion that quantitatively measures the behavioral differences observed in a given input set relative to the training data.

A. Activation Trace and Surprise Adequacy

Let $\mathbf{N} = \{n_1, n_2, \ldots\}$ be the set of neurons that constitutes a DL system D, and let $X = \{x_1, x_2, \ldots\}$ be a set of inputs. Let $\alpha_n(x)$ denote the activation value of a single neuron $n$ for an input $x$. For an ordered subset of neurons $N \subseteq \mathbf{N}$, $\alpha_N(x)$ denotes a vector of activation values in which each element corresponds to an individual neuron in $N$; the cardinality of $\alpha_N(x)$ is equal to $|N|$. $\alpha_N(x)$ is the activation trace (hereinafter, AT) of $x$ over the neurons in $N$. Similarly, $A_N(X)$ denotes the set of ATs observed over the neurons in $N$ for a set of inputs $X$ (that is, $A_N(X) = \{\alpha_N(x) \mid x \in X\}$). An AT is available whenever the network is run on a given input.
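The definition of an activation trace can be illustrated with a minimal Python sketch. The tiny two-layer network, its weights, the ReLU activation, and the helper names `dense` and `activation_trace` are all hypothetical choices made for illustration; they are not taken from the patent.

```python
def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]

def dense(weights, bias, inputs):
    """Fully connected layer: one row of `weights` per output neuron."""
    return [sum(w * i for w, i in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def activation_trace(layers, x):
    """Concatenate the activation values of every neuron for input `x`."""
    trace = []
    out = x
    for weights, bias in layers:
        out = relu(dense(weights, bias, out))
        trace.extend(out)  # one element per neuron, in a fixed order
    return trace

# Hypothetical toy network: 2 inputs -> 3 neurons -> 2 neurons.
layers = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]], [0.0, 0.0, 0.0]),
    ([[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0]),
]

at = activation_trace(layers, [0.6, 0.2])
print(len(at))  # equals the number of neurons observed
```

The length of the resulting vector equals the number of neurons observed, which mirrors the statement that the cardinality of the trace equals the size of the neuron subset.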

Because the behavior of a DL system is driven by data flow rather than control flow, it can be assumed that the ATs $A_{\mathbf{N}}(X)$, observed over all neurons in $\mathbf{N}$ for the inputs in $X$, fully capture the behavior of the DL system under investigation when it is executed using $X$.

FIG. 2 is a diagram illustrating surprise adequacy, and likelihood-based surprise adequacy in particular, according to an embodiment.

Referring to FIG. 2, ATs may be represented as in caption 230 and caption 240. They are drawn in two dimensions only for convenience of description; an actual AT may have three or more dimensions.

Assuming that ATs and inputs can be drawn in two dimensions, inputs may be depicted as in caption 210 and caption 220. Caption 210 may represent a surprising input, and caption 220 may represent a non-surprising input.

Surprise may correspond to the relative novelty of a given new input with respect to the inputs used for training. Surprise adequacy (hereinafter, SA) aims to measure this relative novelty. Given a training set $T$, one first computes $A_{\mathbf{N}}(T)$ by recording the activation values of all neurons for every input in the training data set. Then, given a new input $x$, the AT of $x$ is compared with $A_{\mathbf{N}}(T)$ to measure how surprising $x$ is relative to $T$. The result of this quantitative similarity measurement is the surprise adequacy.

Two variants of SA, which differ in how they measure the similarity between $x$ and $A_{\mathbf{N}}(T)$, are introduced below. One is described with reference to FIG. 2, and the other with reference to FIG. 3.

B. Likelihood-based Surprise Adequacy (LSA)

Kernel density estimation (KDE), a method for estimating the probability density function of a random variable, may be used. The resulting density function makes it possible to estimate the relative likelihood of a specific value of the random variable. Likelihood-based SA (LSA) uses KDE to estimate the probability density of each activation value in $A_{\mathbf{N}}(T)$ and obtains the surprise of a new input with respect to the estimated density. This may be regarded as an extension of existing work that uses KDE to detect adversarial examples. To reduce dimensionality and computational cost, only the neurons in a selected layer $N_L \subseteq \mathbf{N}$ may be considered, which yields the AT set $A_{N_L}(T)$.

To reduce the computational cost further, neurons whose activation values show a variance lower than a predefined threshold may be filtered out, since such neurons contribute little information to KDE. The cardinality of each trace is then $|N_L|$. Given a bandwidth matrix $H$, a Gaussian kernel function $K$, the AT of a new input $x$, and $x_i \in T$, KDE produces the density function $\hat{f}$ as in Equation 1 below.
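A kernel density estimate consistent with the quantities named above can be written as follows. This is the standard KDE form; the notation ($\hat{f}$ for the density, $K_H$ for the Gaussian kernel scaled by the bandwidth matrix $H$, $\alpha_{N_L}$ for the AT over the selected layer, and $A_{N_L}(T)$ for the set of training ATs) is assumed here rather than quoted from the text.

$$\hat{f}(x) = \frac{1}{|A_{N_L}(T)|} \sum_{x_i \in T} K_H\big(\alpha_{N_L}(x) - \alpha_{N_L}(x_i)\big) \tag{1}$$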

To measure the surprise of the input x, a metric is needed that increases when the probability density decreases (that is, when the input is rare compared with the training data) and decreases when the probability density increases (that is, when the input is similar to the training data).

According to an embodiment, a common approach of converting the probability density into a measure of rareness may be adopted. The definition of LSA is not necessarily limited to this example; for convenience of description, however, the embodiments below adopt this approach.

In this case, LSA may be defined as the negative of the logarithm of the density, as in Equation 2 below.
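Writing $\hat{f}$ for the estimated density (assumed notation), the definition just described takes the form:

$$LSA(x) = -\log\big(\hat{f}(x)\big) \tag{2}$$

As a worked example, and assuming the natural logarithm, a density of $\hat{f}(x) = 0.5$ gives $LSA(x) \approx 0.69$, whereas $\hat{f}(x) = 0.01$ gives $LSA(x) \approx 4.61$, so rarer inputs receive higher surprise.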

LSA can be made more precise by using additional information about the input type. For example, for a DL classifier D, inputs that share the same class label can be expected to have similar ATs. This can be exploited by computing LSA per class, replacing $T$ with $\{x \in T \mid D(x) = c\}$ for class $c$. According to an embodiment, per-class LSA may be used for DL classifiers.
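Per-class LSA can be sketched with the Python standard library alone. The sketch below makes several simplifying assumptions that are not in the text: a single scalar `bandwidth` replaces the bandwidth matrix, the kernel is an isotropic Gaussian, and the function names and sample traces are hypothetical.

```python
import math

def kde_density(trace, training_traces, bandwidth=0.5):
    """Isotropic Gaussian KDE evaluated at `trace` (scalar bandwidth assumed)."""
    dim = len(trace)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (dim / 2.0)
    total = 0.0
    for t in training_traces:
        sq_dist = sum((a - b) ** 2 for a, b in zip(trace, t))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2)) / norm
    return total / len(training_traces)

def lsa(trace, training_traces, labels, predicted_class, bandwidth=0.5):
    """Per-class LSA: negative log density over traces of one class only."""
    same_class = [t for t, c in zip(training_traces, labels)
                  if c == predicted_class]
    if not same_class:
        raise ValueError("no training traces for the predicted class")
    density = kde_density(trace, same_class, bandwidth)
    # Guard against underflow to zero for traces far from all training data.
    return -math.log(max(density, 1e-300))

# Hypothetical training traces and their class labels.
training_traces = [[0.6, 0.2], [0.5, 0.3], [0.1, 0.9]]
labels = ["dog", "dog", "cat"]

familiar = lsa([0.55, 0.25], training_traces, labels, "dog")
surprising = lsa([3.0, 3.0], training_traces, labels, "dog")
print(familiar < surprising)
```

A trace that lies among the training traces of its class yields a lower value than one far away from them, which is the behavior the metric is meant to have.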

For certain types of DL tasks, SA can be measured more accurately and meaningfully by focusing on a part of the training set $T$. For example, when a classifier is tested with a new input $x$, the input $x$ may be classified as class $c$ by the DL system under investigation. In this case, the surprise of $x$ can be measured more meaningfully against $A_{\mathbf{N}}(T_c)$, where $T_c$ is the subset of $T$ whose members are classified as $c$. In other words, even if an input is not surprising with respect to the training examples as a whole, it may still be surprising as an example of class $c$.

FIG. 3 is a diagram illustrating distance-based surprise adequacy according to an embodiment.

Referring to FIG. 3, assuming that ATs and inputs can be drawn in two dimensions, a surprising input may be depicted like caption 210 of FIG. 2 and a non-surprising input like caption 220 of FIG. 2. Suppose that caption 210 of FIG. 2 corresponds to a new input $x_1$ and caption 220 of FIG. 2 corresponds to a new input $x_2$. In this case, compared with the distance from $x_{1a}$ to class $c_2$ and the distance from $x_{2a}$ to class $c_2$, the AT of $x_1$ may be farther from class $c_1$ than the AT of $x_2$ (that is, $a_1 / b_1 > a_2 / b_2$). Consequently, with respect to class $c_1$, $x_1$ may be determined to be more surprising than $x_2$.

As an alternative to LSA, the distance between ATs may simply be used as the measure of surprise. Here, Distance-based SA (DSA) may be defined using the Euclidean distance between the AT of a new input $x$ and the ATs observed during training. As a distance-based criterion, DSA can be effective in exploiting the boundaries between inputs. Comparing the distances $a_1$ and $a_2$ (that is, the distance between the AT of a new input and its reference point, which is the nearest AT among the training data of $c_1$) with the distances $b_1$ and $b_2$ (that is, the distance to $c_2$ measured from the reference point) indicates how close the new input is to the class boundary.

For classification problems, inputs closer to class boundaries may be more surprising and more valuable in terms of test input diversity. On the other hand, DSA may be difficult to apply to tasks that have no boundaries between inputs, such as predicting an appropriate steering angle for an autonomous vehicle. When no class boundaries exist, a new input whose AT is far from the AT of another training input is not guaranteed to be surprising, because its AT may still be located in a crowded part of the AT space. For classification tasks, however, applying DSA alone may still be more effective than applying LSA.

According to an embodiment, a DL system D composed of a set of neurons N may be trained for a classification task over a set of classes C using a training data set T. Given the set of activation traces observed during training, a new input, and the class predicted for the new input, a reference point may be defined as the nearest neighbor of the new input that shares the same class.

The reference point may be calculated as in Equation 3 below.

The distance between the AT of the new input and the AT of the reference point may be calculated as in Equation 4 below.

Next, the nearest neighbor of the reference point may be found among the classes other than the predicted class.

This nearest neighbor may be calculated as in Equation 5 below.

The distance between the AT of the reference point and the AT of this nearest neighbor may be calculated as in Equation 6 below.

Intuitively, DSA may aim to compare the distance from the AT of a new input to the known ATs belonging to its own class with the distance between the ATs of that class and the known ATs of the other classes. If the former is greater than the latter, the new input may be a surprising input for that class with respect to the classification DL system D.

DSA according to an embodiment may be calculated as in Equation 7 below.
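The definitions referred to as Equations 3 to 7 can be sketched as follows. The notation here is an assumption rather than taken from the source: $\alpha_N(x)$ denotes the AT of input $x$ over the neurons $N$, $c_x$ the class predicted for $x$, $x_a$ the reference point, and $x_b$ the nearest neighbor of the reference point in another class. The ratio in the last line is one formulation consistent with the comparison of the two distances described above, not necessarily the exact form of Equation 7.

```latex
\begin{align}
x_a &= \operatorname*{argmin}_{x_i \in T,\; D(x_i) = c_x}
       \lVert \alpha_N(x) - \alpha_N(x_i) \rVert \tag{3} \\
\mathit{dist}_a &= \lVert \alpha_N(x) - \alpha_N(x_a) \rVert \tag{4} \\
x_b &= \operatorname*{argmin}_{x_i \in T,\; D(x_i) \in C \setminus \{c_x\}}
       \lVert \alpha_N(x_a) - \alpha_N(x_i) \rVert \tag{5} \\
\mathit{dist}_b &= \lVert \alpha_N(x_a) - \alpha_N(x_b) \rVert \tag{6} \\
\mathit{DSA}(x) &= \frac{\mathit{dist}_a}{\mathit{dist}_b} \tag{7}
\end{align}
```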

However, in addition to this embodiment, there may be various other ways to formulate DSA. For example, the two distances may be calculated using a measure other than the Euclidean distance. Alternatively, DSA may be calculated as the value obtained by subtracting one of the two distances from the other.
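The DSA computation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the embodiment's implementation: ATs are plain tuples of floats, the function name `dsa` and its parameters are hypothetical, and the score is taken to be the ratio of the two distances, which is only one of the possible formulations.

```python
import math


def dsa(new_at, predicted_class, train_ats, train_classes):
    """Distance-based surprise of one input (ratio form, an assumption).

    new_at          -- AT of the new input (tuple of floats)
    predicted_class -- class the DL system predicts for the new input
    train_ats       -- ATs observed during training
    train_classes   -- class of each training AT, in the same order
    """
    same = [at for at, c in zip(train_ats, train_classes) if c == predicted_class]
    other = [at for at, c in zip(train_ats, train_classes) if c != predicted_class]
    if not same or not other:
        raise ValueError("need training ATs inside and outside the predicted class")

    # Reference point: nearest training AT that shares the predicted class.
    reference = min(same, key=lambda at: math.dist(new_at, at))
    dist_a = math.dist(new_at, reference)

    # Nearest neighbor of the reference point among all other classes.
    nearest_other = min(other, key=lambda at: math.dist(reference, at))
    dist_b = math.dist(reference, nearest_other)

    # Identical ATs in two classes would give a zero denominator.
    return math.inf if dist_b == 0.0 else dist_a / dist_b
```

For example, with training ATs (0, 0) and (1, 0) in class 0 and (4, 0) in class 1, an input with AT (2, 0) predicted as class 0 has the reference point (1, 0), so the two distances are 1 and 3 and the ratio is 1/3.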

FIG. 4 is a flowchart illustrating a method of evaluating the test fitness of input data for a neural network according to an embodiment.

Referring to FIG. 4, an apparatus for evaluating the test fitness of input data for a neural network receives input data (410).

The apparatus acquires, for each result inferable by a neural network including a plurality of neurons, an evaluation model corresponding to an output pattern of the neurons (420).

By applying the input data to the neural network, the apparatus infers a first result from among the inferable results (430). The apparatus may infer the first result by acquiring the result output from the neural network in response to the input data.

Based on the evaluation model and the output pattern of the neurons in response to the input data, the apparatus infers a second result from among the inferable results (440). According to an embodiment, the apparatus may infer the second result by determining, based on the probability density distribution formed by the data included in the evaluation model, where on that distribution the output pattern of the neurons in response to the input data is located.
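One way to read the density-based inference of the second result is sketched below. The interpretation is an assumption: the evaluation model is taken to hold the training ATs grouped by class, a Gaussian kernel density estimate with a fixed bandwidth is built per class, and the second result is the class under which the new AT has the highest density. The names `gaussian_kde_density`, `infer_second_result`, and `bandwidth` are hypothetical.

```python
import math


def gaussian_kde_density(point, samples, bandwidth=1.0):
    """Gaussian kernel density estimate at `point` from the given sample ATs."""
    dim = len(point)
    norm = (2.0 * math.pi) ** (dim / 2.0) * bandwidth ** dim
    total = 0.0
    for sample in samples:
        squared = sum((p - s) ** 2 for p, s in zip(point, sample))
        total += math.exp(-squared / (2.0 * bandwidth ** 2))
    return total / (len(samples) * norm)


def infer_second_result(at, ats_by_class, bandwidth=1.0):
    """Return the class whose density estimate is highest at the given AT.

    ats_by_class maps each inferable result (class) to its training ATs.
    """
    return max(
        ats_by_class,
        key=lambda c: gaussian_kde_density(at, ats_by_class[c], bandwidth),
    )
```

A fixed bandwidth keeps the sketch short; in practice the bandwidth would normally be chosen per data set.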

According to an embodiment, the apparatus may determine, from among the data included in the evaluation model, the data having the highest predetermined similarity to the output pattern of the neurons in response to the input data, and may determine, as the second result, a value corresponding to the result that the neural network outputs when it receives that data. In this case, the similarity between first data and second data to be compared may be calculated by comparing each element of the first data with the corresponding element of the second data. For example, the similarity may correspond to the Euclidean distance calculated by comparing each element of the first data with the corresponding element of the second data.

By comparing the first result and the second result, the apparatus evaluates the test fitness of the input data (450).
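Steps 410 to 450 can be sketched as a single function. This is a hedged illustration, not the apparatus itself: `network` and `activation_trace` are hypothetical callables standing in for the neural network and for the extraction of the neurons' output pattern, the evaluation model is assumed to be a list of training ATs with their classes, and the comparison in step 450 is reduced to a simple agreement check.

```python
import math


def nearest_class(at, model_ats, model_classes):
    """Class of the stored AT with the smallest Euclidean distance to `at`."""
    best = min(range(len(model_ats)), key=lambda i: math.dist(at, model_ats[i]))
    return model_classes[best]


def evaluate_test_fitness(input_data, network, activation_trace,
                          model_ats, model_classes):
    """Return (agreement, first_result, second_result) for one input."""
    # Step 430: first result is what the network itself outputs.
    first_result = network(input_data)
    # Step 440: second result comes from the evaluation model and the AT.
    at = activation_trace(input_data)
    second_result = nearest_class(at, model_ats, model_classes)
    # Step 450: compare the two results (agreement check is an assumption).
    return first_result == second_result, first_result, second_result
```

With a toy network that predicts class 0 for values below 2 and class 1 otherwise, and stored ATs (0,), (1,) for class 0 and (4,) for class 1, the input (2.2,) yields a first result of 1 but a second result of 0, so the two results disagree.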

The apparatus may further determine how well the neural network has been trained based on the result of evaluating the test fitness. According to an embodiment, when the test fitness is less than a predetermined threshold value, the apparatus may acquire a training set and retrain the neural network based on the acquired training set. In this case, the training set may be determined such that at least some of the data included in the training set is associated with the input data.

Hereinafter, matters not described with reference to FIGS. 1 to 4 are further described.

D. Surprise Coverage (SC)

Given a set of inputs, the range of SA values that the set covers, that is, its Surprise Coverage (SC), can also be measured. Since LSA and DSA are defined in continuous spaces, bucketing can be used to discretize the space of surprise and to define LSC (Likelihood-based Surprise Coverage) and DSC (Distance-based Surprise Coverage).

Given an upper bound U and a set of buckets that divide the range of SA values up to U into n SA segments, SC(X), the SC for a set of inputs X, may be defined as in Equation 8 below.
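The bucketing described above can be sketched as follows. The details are assumptions rather than the exact form of Equation 8: the n buckets are taken to be equal-width, half-open intervals (U(i-1)/n, Ui/n], SC is the fraction of buckets hit by at least one input, and SA values outside (0, U] are ignored. The name `surprise_coverage` is hypothetical.

```python
import math


def surprise_coverage(sa_values, upper_bound, n_buckets):
    """Fraction of the n SA buckets covered by at least one SA value."""
    if upper_bound <= 0 or n_buckets <= 0:
        raise ValueError("upper_bound and n_buckets must be positive")
    covered = set()
    for sa in sa_values:
        if 0 < sa <= upper_bound:
            # Bucket i (1-based) holds values in (U*(i-1)/n, U*i/n].
            index = min(math.ceil(sa * n_buckets / upper_bound), n_buckets)
            covered.add(index)
    return len(covered) / n_buckets
```

With U = 2 and n = 4, the SA values 0.1, 0.2, and 1.9 fall into the first and fourth buckets, so half of the buckets are covered.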

A set of inputs with high SC may be a set that includes diverse inputs, ranging from inputs similar to the training data (i.e., low SA) to inputs dissimilar to the training data (i.e., high SA). The more systematically the set of inputs for a DL system is diversified with respect to SA, the more effectively the training result of the network can be checked using that set.

Although terms such as coverage and cover are used, as in the C of SC, the meaning of SA-based coverage may differ from that of plain structural coverage. First, unlike most structural coverage criteria, there may not be a finite number of targets to cover, as there is in statement or branch coverage. At least in theory, the degree of surprise of an input can be arbitrarily high. However, inputs with arbitrarily high SA values may be irrelevant to the problem domain or less interesting (e.g., traffic sign images may be irrelevant to testing an animal photo classifier). SC can therefore be measured only with respect to a predefined upper bound, in the same way that theoretically infinite path coverage is bounded by a parameter.

Second, SC is not a combinatorial set cover problem of the kind into which test suite minimization is often formulated. This may be because a single input yields only a single SA value and cannot belong to multiple SA buckets. The notion of redundancy with respect to SC as a coverage criterion is weaker than for structural coverage, under which a single input can cover multiple targets.

III. Research Questions

In connection with the present invention, the following questions may be raised.

RQ1. Surprise: Can SADL capture the relative surprise of an input to a DL system?

Answers to RQ1 can be provided from several perspectives. First, the SA of each test input included in the original data set can be calculated to check whether the DL classifier finds inputs with higher SA harder to classify correctly; more surprising inputs can be expected to be harder to classify correctly. Second, since adversarial examples are expected to be more surprising and to cause different behaviors of the DL system, it can be evaluated whether adversarial examples can be detected based on SA values. Several sets of adversarial examples can be generated using different techniques and compared based on their SA values.

Finally, adversarial example classifiers may be trained using logistic regression on SA values. As an example, for each adversarial attack strategy, 10,000 adversarial examples may be generated from the 10,000 original test images provided by MNIST and CIFAR-10, and logistic regression classifiers may be trained using 1,000 randomly selected original test images and 1,000 adversarial examples. The trained classifiers may then be evaluated using the remaining 9,000 original test images and 9,000 adversarial examples. If SA values correctly capture the behavior of the DL system, the SA-based classifiers can be expected to detect adversarial examples successfully. According to an embodiment, the area under the Receiver Operating Characteristic curve (ROC-AUC) may be used for the evaluation so that both the true positive rate and the false positive rate are captured.
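The SA-based detector described above can be sketched with a one-feature logistic regression and a pairwise ROC-AUC computation. This is a minimal stand-in written with the standard library only, not the tooling used in the evaluation: the function names, the learning rate, and the number of epochs are assumptions, and label 1 is taken to mean "adversarial".

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_1d(sa_values, labels, lr=0.1, epochs=2000):
    """Fit p(adversarial | SA) = sigmoid(w * SA + b) by gradient descent."""
    w, b = 0.0, 0.0
    n = len(sa_values)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(sa_values, labels):
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def roc_auc(scores, labels):
    """Probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In use, the classifier would be fitted on the SA values of the 1,000 original and 1,000 adversarial training images, and `roc_auc` would then be applied to its scores on the held-out 9,000 plus 9,000 images.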

RQ2. Layer Sensitivity: Does the choice of the neuron layer used for SA computation affect how accurately SA reflects the behavior of the DL system?

According to an embodiment, following Bengio et al., when a KDE-based adversarial example detection technique is introduced, the deepest (i.e., last hidden) layer may be assumed to contain the most information useful for detection. This assumption can be evaluated in the context of SA by calculating the LSA and DSA of every individual layer and then comparing the adversarial example classifiers trained on the SA from each layer.

RQ3. Correlation: Is SC correlated with the existing coverage criteria for DL systems?

In addition to capturing input surprise, SC should ideally be consistent with existing coverage criteria in aggregate. Otherwise, there may be a risk that SC measures something other than input diversity. To this end, it can be checked whether SC is correlated with other criteria. Specifically, input diversity can be controlled by accumulating inputs generated by different methods (i.e., different adversarial example generation techniques or input synthesis techniques), the DL system can be executed with these inputs, and the changes in various coverage criteria, including SC and a plurality of existing coverage criteria, can be observed and compared. The plurality of existing coverage criteria according to an embodiment may include at least some of DeepXplore's Neuron Coverage (NC), the Neuron-Level Coverage (NLC) introduced by DeepGauge, k-multisection Neuron Coverage (KMNC), Neuron Boundary Coverage (NBC), and Strong Neuron Activation Coverage (SNAC).

As an example, for MNIST and CIFAR-10, starting from the original test data provided with the data set (10,000 images), 1,000 adversarial examples generated by each of FGSM, BIM-A, BIM-B, JSMA, and C&W may be added at each step. As another example, for Dave-2, starting from the original test data (5,614 images), 700 synthetic images generated by DeepXplore may be added at each step. As another example, for Chauffeur, 1,000 synthetic images (Set1 to Set3) generated by applying an arbitrary number of DeepTest transformations may be added at each step.
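The step-by-step accumulation described above can be sketched as follows, tracking SC together with one existing criterion as batches of inputs are added. The sketch rests on assumptions: SC uses equal-width buckets over (0, U], Neuron Coverage is taken as the fraction of neurons whose activation exceeds a threshold for at least one input, and each batch is a hypothetical pair of SA values and per-input activation lists.

```python
import math


def surprise_coverage(sa_values, upper_bound, n_buckets):
    """Fraction of equal-width SA buckets over (0, U] hit by some input."""
    covered = {min(math.ceil(sa * n_buckets / upper_bound), n_buckets)
               for sa in sa_values if 0 < sa <= upper_bound}
    return len(covered) / n_buckets


def neuron_coverage(activations, threshold):
    """Fraction of neurons activated above `threshold` by some input."""
    n_neurons = len(activations[0])
    covered = {j for row in activations
               for j, value in enumerate(row) if value > threshold}
    return len(covered) / n_neurons


def coverage_trend(batches, upper_bound, n_buckets, threshold):
    """Return [(SC, NC), ...] after each batch is added to the input set."""
    sa_seen, activations_seen, trend = [], [], []
    for sa_values, activations in batches:
        sa_seen.extend(sa_values)
        activations_seen.extend(activations)
        trend.append((surprise_coverage(sa_seen, upper_bound, n_buckets),
                      neuron_coverage(activations_seen, threshold)))
    return trend
```

Comparing the two columns of the returned trend shows whether SC rises together with the existing criterion as more diverse inputs are accumulated.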

RQ4. Guidance: Can SA guide the retraining of DL systems to improve their accuracy against adversarial examples and the synthetic test inputs generated by DeepXplore?

To evaluate whether SADL can guide the additional training of DL systems to improve their accuracy against adversarial examples, it may be important whether SA can guide the selection of inputs for the additional training. As an example, from the adversarial examples and synthesized inputs for these models, four sets of 100 images each may be selected from four different SA ranges. Assuming U to be the upper bound used in RQ3 to calculate SC, the SA range [0, U] may be divided into four overlapping subsets. Specifically, according to the SA value, the subsets may cover the lowest 25% of SA values ([0, U/4]), the lowest 50% of SA values ([0, 2U/4]), the lowest 75% of SA values ([0, 3U/4]), and the entire SA range ([0, U]).

The four subsets can be expected to represent increasingly diverse sets of inputs. According to an embodiment, the range R may be set to one of the four subsets, 100 images may be randomly sampled from each R, and the existing models may be trained for additional epochs (e.g., five additional epochs). Finally, the performance of each model (e.g., accuracy for MNIST and CIFAR-10, and MSE for Dave-2) may be measured against all adversarial inputs and synthetic inputs. Retraining with a more diverse subset can be expected to improve performance.
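The selection of retraining inputs from the four overlapping SA ranges can be sketched as follows. It is an illustration under assumptions: each candidate input is a hypothetical `(item, sa)` pair, the ranges are [0, U/4], [0, 2U/4], [0, 3U/4], and [0, U], and when a range holds fewer candidates than requested, all of them are returned.

```python
import random


def sample_by_sa_range(inputs_with_sa, upper_bound, sample_size, seed=0):
    """Randomly sample inputs from each of the four overlapping SA ranges.

    Returns a dict mapping k in {1, 2, 3, 4} to a sample drawn from the
    inputs whose SA lies in [0, k * U / 4].
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    subsets = {}
    for k in (1, 2, 3, 4):
        limit = upper_bound * k / 4
        candidates = [item for item, sa in inputs_with_sa if 0 <= sa <= limit]
        subsets[k] = rng.sample(candidates, min(sample_size, len(candidates)))
    return subsets
```

Each of the four samples would then be used to retrain a copy of the existing model for the additional epochs, after which the models are compared on the full set of adversarial and synthetic inputs.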

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware components and software components. For example, the apparatuses, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications executed on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as a parallel processor, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiment, and vice versa.

Although the embodiments have been described above with reference to a limited set of drawings, a person of ordinary skill in the art can apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A method for evaluating the test fitness of input data for a neural network, the method comprising:
receiving the input data;
acquiring, for each of the results inferable by the neural network including a plurality of neurons, an evaluation model corresponding to an output pattern of the neurons while the neural network is being trained;
inferring a first result from among the inferable results by applying the input data to the neural network;
inferring a second result from among the inferable results based on the evaluation model and an output pattern of the neurons in response to the input data; and
evaluating the test fitness of the input data by comparing the first result and the second result.

The method of claim 1, further comprising:
determining how well the neural network has been trained based on a result of evaluating the test fitness.

The method of claim 1, further comprising, when the test fitness is less than a predetermined threshold value:
obtaining a training set; and
retraining the neural network based on the obtained training set.

The method of claim 3, wherein the obtaining of the training set comprises:
determining the training set such that at least some of the data included in the training set is associated with the input data.

The method of claim 1, wherein the inferring of the first result comprises:
obtaining a result output from the neural network in response to the input data.

The method of claim 1, wherein the inferring of the second result comprises:
determining, based on a probability density distribution formed by data included in the evaluation model, where on the probability density distribution the output pattern of the neurons in response to the input data is located.

The method of claim 1, wherein the inferring of the second result comprises:
determining, from among the data included in the evaluation model, the data having the highest similarity to the output pattern of the neurons in response to the input data; and
determining, as the second result, a value corresponding to the result that the neural network outputs when it receives the data having the highest similarity.

The method of claim 7, wherein the similarity between first data and second data to be compared is calculated by comparing each element of the first data with the corresponding element of the second data.

The method of claim 7, wherein the similarity between first data and second data to be compared corresponds to the Euclidean distance calculated by comparing each element of the first data with the corresponding element of the second data.

A computer-readable recording medium containing a program for performing the method of claim 1.

An apparatus for evaluating the test fitness of input data for a neural network, the apparatus comprising:
a memory in which a program is recorded; and
a processor configured to execute the program,
wherein the program causes the processor to perform:
receiving the input data;
acquiring, for each of the results inferable by the neural network including a plurality of neurons, an evaluation model corresponding to an output pattern of the neurons while the neural network is being trained;
inferring a first result from among the inferable results by applying the input data to the neural network;
inferring a second result from among the inferable results based on the evaluation model and an output pattern of the neurons in response to the input data; and
evaluating the test fitness of the input data by comparing the first result and the second result.

The apparatus of claim 11, wherein the program further causes the processor to perform:
determining how well the neural network has been trained based on a result of evaluating the test fitness.

The apparatus of claim 11, wherein the program further causes the processor to perform, when the test fitness is less than a predetermined threshold value:
obtaining a training set; and
retraining the neural network based on the obtained training set.

The apparatus of claim 13, wherein the obtaining of the training set comprises:
determining the training set such that at least some of the data included in the training set is associated with the input data.

The apparatus of claim 11, wherein the inferring of the first result comprises:
obtaining a result output from the neural network in response to the input data.

The apparatus of claim 11, wherein the inferring of the second result comprises:
determining, based on a probability density distribution formed by data included in the evaluation model, where on the probability density distribution the output pattern of the neurons in response to the input data is located.

The apparatus of claim 11, wherein the inferring of the second result comprises:
determining, from among the data included in the evaluation model, the data having the highest similarity to the output pattern of the neurons in response to the input data; and
determining, as the second result, a value corresponding to the result that the neural network outputs when it receives the data having the highest similarity.

The apparatus of claim 17, wherein the similarity between first data and second data to be compared is calculated by comparing each element of the first data with the corresponding element of the second data.

The apparatus of claim 17, wherein the similarity between first data and second data to be compared corresponds to the Euclidean distance calculated by comparing each element of the first data with the corresponding element of the second data.