KR102287430B1

KR102287430B1 - Method for evaluating test fitness of input data for neural network and apparatus thereof

Info

Publication number: KR102287430B1
Application number: KR1020190104591A
Authority: KR
Inventors: 유신; 김진한; 댄 로버트 펠트
Original assignee: 한국과학기술원; 액셀레란디움 에이비
Priority date: 2019-08-26
Filing date: 2019-08-26
Publication date: 2021-08-09
Also published as: KR20210024872A

Abstract

A method and apparatus for evaluating the test suitability of input data for a neural network are disclosed. According to an embodiment, the method includes: receiving input data; obtaining an activation trace corresponding to the output patterns of neurons observed while the neural network was being trained; inferring a first result, from among the inferable results, by applying the input data to the neural network; inferring a second result, from among the inferable results, based on the activation trace and on the output pattern of the neurons in response to the input data; and evaluating the test suitability of the input data by comparing the first result with the second result.

Description

Method for evaluating test fit of input data for neural network and device therefor

The present disclosure relates to a method and an apparatus for evaluating the test suitability of input data for a neural network, for example in connection with a deep learning system.

Deep learning (hereinafter, DL) systems have made significant advances in various fields, including image recognition, speech recognition, and machine translation. With performance that matches or even exceeds that of humans, DL systems are being adopted in safety- and security-critical domains such as autonomous driving and malware detection.

Safety- and security-critical domains may require behavior that is accurate and predictable. Although DL systems perform very well, they are known to behave unexpectedly in certain situations. For example, an autonomous vehicle equipped with a DL system may expect another vehicle to yield and, when that vehicle does not, collide with it.

For this reason, the behavior and validity of DL systems need to be verified. However, existing software testing techniques can be difficult to apply directly to DL systems. For example, existing white-box testing techniques that aim to increase structural coverage may not be useful for DL systems, because the behavior of a DL system is not explicitly encoded in its control flow structure.

Two assumptions can be put forward for testing and verifying DL systems. The first is that if two inputs to a DL system are similar in some human sense, their outputs should also be similar. This can be seen as a generalization of the essence of metamorphic testing. The second is that the more diverse a set of inputs is, the more effectively the DL system can be tested.

Testing and verification under these two assumptions improve on manual ad hoc testing, but limitations remain. Simply counting the neurons whose activation values satisfy a certain condition makes it possible to quantify the testing effectiveness of a given input set, but conveys little information about individual inputs. For example, it can be difficult to explain why an input with higher neuron coverage (NC) should be considered better than another input with lower NC, or why some inputs naturally activate more neurons above the threshold than others. For a test adequacy criterion to be useful in practice, it may need to be able to guide the selection of individual inputs.

The present invention may propose a new test adequacy criterion for DL systems: Surprise Adequacy for Deep Learning systems (SADL). A test input set that is adequate for a DL system may need to be systematically diversified so that it ranges from inputs similar to the training data to inputs significantly different from the training data.

At the granularity of individual inputs, SADL can measure how surprising an input is to a DL system. The actual measure of surprise may be based on the likelihood that the system saw a similar input during training (for example, with respect to a probability density distribution estimated from the training process using kernel density estimation). Alternatively, it may be based on the distance (for example, the Euclidean distance) between the vector representing the neuron activation trace of a given input and those of the training data. Consequently, the Surprise Adequacy (SA) of a set of test inputs can be measured through the surprise values of the individual inputs that the set contains.

The present invention may propose SADL, a surprise adequacy framework for DL systems that can quantitatively measure the relative surprise (SA, Surprise Adequacy) of each input with respect to the training data. In addition, instead of counting neurons with specific activation characteristics, Surprise Coverage (SC) may be proposed, which uses SA to measure how much of the discretized range of input surprise is covered. SA and SC can accurately capture the surprise of inputs and can be good indicators of how a DL system reacts to unknown inputs. SA correlates with how a DL system perceives an input, and can be used to accurately classify adversarial examples. SC can also be used to guide the selection of inputs for more effective retraining of the DL system, for adversarial examples as well as synthesized inputs.
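As a rough illustration of Surprise Coverage, the sketch below splits a surprise range into equal buckets and reports the fraction of buckets hit by the SA value of at least one input. The bucketing scheme, the function name `surprise_coverage`, and the parameters `upper_bound` and `n_buckets` are assumptions made for illustration; the text above states only that SC measures coverage of a discretized surprise range.

```python
import math


def surprise_coverage(sa_values, upper_bound, n_buckets):
    """Fraction of equal-width buckets over (0, upper_bound] hit by at least one SA value.

    Assumption: bucket i (0-based) covers (upper_bound * i / n, upper_bound * (i + 1) / n].
    SA values outside (0, upper_bound] are ignored.
    """
    covered = set()
    for sa in sa_values:
        if 0.0 < sa <= upper_bound:
            index = math.ceil(sa * n_buckets / upper_bound) - 1
            # Clamp to guard against floating-point rounding at the range edges.
            covered.add(max(0, min(index, n_buckets - 1)))
    return len(covered) / n_buckets


# Hypothetical SA values for four test inputs; 5.0 lies outside the range.
coverage = surprise_coverage([0.1, 0.2, 1.9, 5.0], upper_bound=2.0, n_buckets=4)
print(coverage)
```

Under this scheme, a test set whose SA values all fall into one bucket scores low, which matches the stated aim of diversifying inputs across the surprise range.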

According to an embodiment, a method for evaluating the test suitability of input data for a neural network includes: receiving the input data; obtaining, for each result that can be inferred by the neural network including a plurality of neurons, an activation trace corresponding to the output patterns of the neurons observed while the neural network was being trained; inferring a first result from among the inferable results by applying the input data to the neural network; inferring a second result from among the inferable results based on the activation trace and on the output pattern of the neurons in response to the input data; and evaluating the test suitability of the input data by comparing the first result with the second result.

According to an embodiment, the method may further include determining how well the neural network has been trained, based on the result of evaluating the test suitability.

According to an embodiment, when the test suitability is less than a predetermined threshold, the method may further include: obtaining a training set; and retraining the neural network based on the obtained training set.

According to an embodiment, obtaining the training set may include determining the training set such that at least part of the data included in the training set is associated with the input data.

According to an embodiment, inferring the first result may include obtaining the result output by the neural network in response to the input data.

According to an embodiment, inferring the second result may include determining, based on a probability density distribution formed by the data included in the activation trace, where on the probability density distribution the output pattern of the neurons in response to the input data is located.

According to an embodiment, inferring the second result may include: determining, from among the data included in the activation trace, the data having the highest predetermined similarity to the output pattern of the neurons in response to the input data; and determining, as the second result, a value corresponding to the result that the neural network outputs when it receives the data having the highest similarity.

According to an embodiment, the similarity may be calculated, for first data and second data to be compared, by comparing each element of the first data with the corresponding element of the second data.

According to an embodiment, the similarity may correspond to a Euclidean distance calculated, for first data and second data to be compared, by comparing each element of the first data with the corresponding element of the second data.
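The similarity-based embodiment above can be sketched as follows: the second result is taken as the label of the stored training trace closest (in Euclidean distance) to the new input's neuron outputs, and it is then compared with the network's own output. This is only a minimal sketch under stated assumptions: the `(trace, label)` list layout, the function names, and the "cat" label are hypothetical, and how the comparison is turned into a suitability value is not specified here.

```python
import math


def euclidean(a, b):
    """Compare two equal-length vectors element by element and return the Euclidean distance."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def second_result(new_trace, training_traces):
    """Return the label of the stored trace most similar (closest) to new_trace.

    training_traces is a list of (trace, label) pairs; this layout is an assumption.
    """
    _, label = min(training_traces, key=lambda item: euclidean(new_trace, item[0]))
    return label


def results_agree(first_result, new_trace, training_traces):
    """Compare the network's own output (first result) with the trace-based second result."""
    return first_result == second_result(new_trace, training_traces)


# Hypothetical stored traces recorded during training, keyed by the result they led to.
training_traces = [
    ([0.6, 0.2, 0.1], "dog"),
    ([0.1, 0.3, 0.5], "cat"),
]

# The network classified the new input as "dog"; its neuron outputs are close to the "dog" trace.
agree = results_agree("dog", [0.5, 0.25, 0.1], training_traces)
print(agree)
```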

According to an embodiment, an apparatus for evaluating the test suitability of input data for a neural network includes: a memory in which a program is recorded; and a processor that executes the program, wherein the program includes: receiving the input data; obtaining, for each result that can be inferred by the neural network including a plurality of neurons, an activation trace corresponding to the output patterns of the neurons observed while the neural network was being trained; inferring a first result from among the inferable results by applying the input data to the neural network; inferring a second result from among the inferable results based on the activation trace and on the output pattern of the neurons in response to the input data; and evaluating the test suitability of the input data by comparing the first result with the second result.

FIG. 1 is a diagram illustrating, by way of example, the output patterns of a plurality of neurons included in a neural network according to an embodiment.
FIG. 2 is a diagram for describing Surprise Adequacy, and Likelihood-based Surprise Adequacy, according to an embodiment.
FIG. 3 is a diagram for describing Distance-based Surprise Adequacy according to an embodiment.
FIG. 4 is a flowchart illustrating a method for evaluating the test suitability of input data for a neural network according to an embodiment.

The specific structural and functional descriptions of the embodiments are disclosed for illustrative purposes only, and the embodiments may be modified and implemented in various forms. Accordingly, the embodiments are not limited to the specific forms disclosed, and the scope of this specification includes modifications, equivalents, and substitutes that fall within its technical spirit.

Terms such as "first" and "second" may be used to describe various components, but such terms should be interpreted only as distinguishing one component from another. For example, a first component may be termed a second component, and similarly, a second component may be termed a first component.

When a component is referred to as being "connected" to another component, it may be directly connected or coupled to that other component, but it should be understood that intervening components may also be present.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to indicate that the described features, numbers, steps, operations, components, parts, or combinations thereof exist, and should be understood as not precluding the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the relevant art. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. Like reference numerals in the drawings denote like elements.

FIG. 1 is a diagram illustrating, by way of example, the output patterns of a plurality of neurons included in a neural network according to an embodiment.

Referring to FIG. 1, a neural network may include one or more layers, and each layer may include a plurality of neurons. According to an embodiment, the neural network may include a first layer 110 and a second layer 120.

The neurons included in the first layer 110 and the second layer 120 may each output a value for an individual input $x$. For example, the neurons in the first layer 110 may output the values 0.6, 0.2, and 0.1, and the neurons in the second layer 120 may output the values 0.1, 0.3, and 0.5.

The neural network may output a classification result for the input $x$. For example, the result output for the input $x$ may be 'dog'. In this case, the output may mean that, among the predetermined output candidates, the input $x$ most closely matches an image of a dog.

The values output by the plurality of neurons may be used to evaluate test suitability. Hereinafter, in FIGS. 1 to 3, for convenience of description, the values output by the plurality of neurons are referred to as an Activation Trace (AT), and the value that serves as the criterion for evaluating test suitability is referred to as Surprise Adequacy (SA). In FIG. 4, however, the flow of operations is described using the wording of the claims, so the expressions 'values output by the plurality of neurons' and 'test suitability evaluation' may be used as they are.

Because DL systems are prone to errors on unfamiliar inputs, the diversity of test inputs for a DL system may be more meaningful when it is measured with respect to the training data. The goal of the present invention is to define a criterion that quantitatively measures the behavioral difference observed for a given set of inputs relative to the training data.

A. Activation Trace and Surprise Adequacy

Let $\mathbf{N} = \{n_1, n_2, \ldots\}$ be the set of neurons that constitutes the DL system $\mathbf{D}$, and let $X = \{x_1, x_2, \ldots\}$ be a set of inputs. Let $\alpha_n(x)$ denote the activation value of a single neuron $n$ for an input $x$. For an ordered subset of neurons $N \subseteq \mathbf{N}$, $\alpha_N(x)$ denotes the vector of activation values, each element of which corresponds to an individual neuron in $N$; the cardinality of $\alpha_N(x)$ may be equal to $|N|$. $\alpha_N(x)$ may be the Activation Trace (AT) of $x$ over the neurons in $N$ (hereinafter, AT). Similarly, let $A_N(X)$ be the set of ATs observed over the neurons in $N$ for a set of inputs $X$ ($A_N(X) = \{\alpha_N(x) \mid x \in X\}$). An AT is available whenever the network is executed on a given input.

Since the behavior of a DL system is driven by data flow rather than control flow, it can be assumed that the ATs observed over all of $\mathbf{N}$ with respect to $X$ fully capture the behavior of the DL system under investigation when it is executed using $X$.

FIG. 2 is a diagram for describing Surprise Adequacy, and Likelihood-based Surprise Adequacy, according to an embodiment.

Referring to FIG. 2, ATs may be represented as shown by captions 230 and 240. These are drawn in two dimensions only for convenience of description; actual ATs may have three or more dimensions.

Assuming that ATs and inputs can be shown in two dimensions, inputs may be shown as in captions 210 and 220. Caption 210 may represent a surprising input, and caption 220 may represent an input that is not surprising.

Surprise may correspond to the relative novelty of a given new input with respect to the inputs used for training. Surprise Adequacy (hereinafter, SA) may aim to measure this relative novelty (that is, the surprise). Given a training set $T$, $A_{\mathbf{N}}(T)$ can first be computed by recording the activation values of all neurons for every input in the training data set. Then, given a new input $x$, how surprising $x$ is relative to $T$ can be measured by comparing the AT of $x$ with $A_{\mathbf{N}}(T)$. The result of this quantitative similarity measurement can serve as the Surprise Adequacy.

Below, two variants of SA are introduced, which differ in how they measure the similarity between $x$ and $A_{\mathbf{N}}(T)$. One variant is described with reference to FIG. 2, and the other with reference to FIG. 3.

B. Likelihood-based Surprise Adequacy (LSA)

Kernel Density Estimation (KDE), a method for estimating the probability density function of a random variable, may be used to estimate the probability density function. With the resulting density function, the relative likelihood of a particular value of the random variable can be estimated. Likelihood-based SA (LSA) may be a method that uses KDE to estimate the probability density of each activation value in $A_{\mathbf{N}}(T)$ and obtains the surprise of a new input with respect to the estimated density. This may be an extension of existing work that uses KDE to detect adversarial examples. To reduce dimensionality and computational cost, only the neurons of a selected layer $N_L \subseteq \mathbf{N}$ may be considered. In this case, a set of ATs $A_{N_L}(X)$ may be generated.

To reduce the computational cost further, neurons whose activation values show a variance lower than a predefined threshold may be filtered out, since such neurons may contribute little information to the KDE. The cardinality of each trace may be $|N_L|$. Given a bandwidth matrix $H$, a Gaussian kernel function $K$, the AT of a new input $x$, and $x_i \in T$, KDE may produce the density function $\hat{f}$ as in Equation 1 below.

[Equation 1]

$$\hat{f}(x) = \frac{1}{|A_{N_L}(T)|} \sum_{x_i \in T} K_H\big(\alpha_{N_L}(x) - \alpha_{N_L}(x_i)\big)$$
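The variance filter mentioned in this paragraph can be sketched as follows. This is an illustrative sketch only: the function names, the threshold value, and the toy traces are assumptions, and population variance is used here simply as one reasonable choice.

```python
from statistics import pvariance


def informative_neurons(traces, threshold):
    """Return the indices of neurons whose activation variance across traces is at least threshold."""
    n_neurons = len(traces[0])
    keep = []
    for j in range(n_neurons):
        column = [trace[j] for trace in traces]
        if pvariance(column) >= threshold:
            keep.append(j)
    return keep


def project(trace, keep):
    """Restrict a trace to the kept neuron indices."""
    return [trace[j] for j in keep]


# Hypothetical training traces; the middle neuron barely changes across inputs.
training_traces = [
    [0.60, 0.20, 0.10],
    [0.10, 0.21, 0.50],
    [0.90, 0.19, 0.30],
]
keep = informative_neurons(training_traces, threshold=1e-3)
filtered = [project(t, keep) for t in training_traces]
print(keep, filtered)
```

The same `keep` indices would have to be applied to the trace of any new input before its density is estimated, so that all traces have matching dimensions.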

To measure the surprise of an input $x$, a metric may be needed that increases as the probability density decreases (that is, when the input is rare compared to the training data) and decreases as the probability density increases (that is, when the input is similar to the training data).

According to an embodiment, a common approach of converting probability density into a measure of rareness may be adopted. The definition of LSA is not necessarily limited to this example; for convenience of description, the embodiments below adopt this approach.

In this case, LSA may be defined as the negative of the logarithm of the density, where $\hat{f}(x)$ is the estimated density for the input $x$. The result may be as in Equation 2 below.

[Equation 2]

$$LSA(x) = -\log\big(\hat{f}(x)\big)$$
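The density estimate and the negative-log definition can be sketched in a few lines. This sketch assumes an isotropic bandwidth, that is, a bandwidth matrix of the form $H = h^2 I$ with a single scalar `bandwidth`; the function names and the toy traces are hypothetical, and a real implementation would more likely rely on an existing KDE library.

```python
import math


def gaussian_kernel(diff, bandwidth):
    """Isotropic Gaussian kernel K_H(diff), assuming H = bandwidth**2 * I."""
    d = len(diff)
    sq_norm = sum(v * v for v in diff)
    norm_const = (2.0 * math.pi) ** (-d / 2.0) * bandwidth ** (-d)
    return norm_const * math.exp(-sq_norm / (2.0 * bandwidth ** 2))


def kde_density(new_trace, training_traces, bandwidth):
    """Average kernel value between the new trace and every training trace."""
    total = 0.0
    for trace in training_traces:
        diff = [a - b for a, b in zip(new_trace, trace)]
        total += gaussian_kernel(diff, bandwidth)
    return total / len(training_traces)


def lsa(new_trace, training_traces, bandwidth=0.2):
    """Negative log of the estimated density; larger means more surprising.

    Note: math.log raises ValueError if the density underflows to 0.0.
    """
    return -math.log(kde_density(new_trace, training_traces, bandwidth))


# Hypothetical two-neuron traces from training, plus one nearby and one distant new trace.
training_traces = [[0.6, 0.2], [0.5, 0.3], [0.55, 0.25]]
near, far = [0.55, 0.25], [0.0, 0.9]
print(lsa(near, training_traces), lsa(far, training_traces))
```

The distant trace receives the larger value, in line with the requirement that the metric increase as the density decreases.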

LSA can be made more precise by using additional information about the input type. For example, for a DL classifier $\mathbf{D}$, inputs that share the same class label can be expected to have similar ATs. This can be exploited by computing LSA per class, replacing $T$ with $\{x \in T \mid \mathbf{D}(x) = c\}$ for class $c$. According to an embodiment, per-class LSA may be used for DL classifiers.

For certain types of DL tasks, SA can be measured more accurately and meaningfully by focusing on a part of the training set $T$. For example, when a classifier is tested with a new input $x$, the input $x$ may be classified as class $c$ by the DL system under investigation. In this case, the surprise of $x$ can be measured more meaningfully with respect to $A_{\mathbf{N}}(T_c)$, where $T_c$ is the subset of $T$ whose members are classified as $c$. In essence, an input may be surprising as an example of class $c$ even if it is not surprising with respect to the training examples as a whole.
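The per-class restriction can be sketched independently of the surprise measure itself. In the sketch below, `nearest_distance` is only a simple stand-in for a surprise measure (it is not LSA), and the `(trace, label)` layout, the labels, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nearest_distance(new_trace, traces):
    """Stand-in surprise measure: distance to the closest trace (not LSA itself)."""
    return min(euclidean(new_trace, t) for t in traces)


def class_subset(labelled_traces, predicted_class):
    """Traces of the training inputs that the system classifies as predicted_class."""
    return [trace for trace, label in labelled_traces if label == predicted_class]


def per_class_surprise(new_trace, predicted_class, labelled_traces, surprise_fn):
    """Measure surprise against the traces of the predicted class only."""
    subset = class_subset(labelled_traces, predicted_class)
    if not subset:
        raise ValueError("no training traces for class %r" % (predicted_class,))
    return surprise_fn(new_trace, subset)


labelled_traces = [
    ([0.6, 0.2], "dog"), ([0.5, 0.3], "dog"),
    ([0.1, 0.8], "cat"), ([0.2, 0.9], "cat"),
]
new_trace = [0.15, 0.85]  # lies among the "cat" traces but was classified as "dog"
overall = nearest_distance(new_trace, [t for t, _ in labelled_traces])
as_dog = per_class_surprise(new_trace, "dog", labelled_traces, nearest_distance)
print(overall, as_dog)
```

In this toy case the input is close to the training data as a whole yet far from every trace of its predicted class, which is the situation the paragraph above describes.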

FIG. 3 is a diagram for describing Distance-based Surprise Adequacy according to an embodiment.

Referring to FIG. 3, assuming that ATs and inputs can be shown in two dimensions, a surprising input may be depicted like caption 210 of FIG. 2 and an unsurprising input like caption 220 of FIG. 2. Suppose that caption 210 of FIG. 2 corresponds to a new input $x_1$ and caption 220 of FIG. 2 corresponds to a new input $x_2$. In this case, relative to the distances from $x_{1a}$ to class $c_2$ and from $x_{2a}$ to class $c_2$, the AT of $x_1$ may lie farther from class $c_1$ than the AT of $x_2$ does (that is, $a_1 / b_1 > a_2 / b_2$). Consequently, $x_1$ may be determined to be more surprising than $x_2$ with respect to class $c_1$.

As an alternative to LSA, the distance between ATs may simply be used as the measure of surprise. Here, Distance-based SA (DSA) may be defined using the Euclidean distance between the AT of a new input $x$ and the ATs observed during training. As a distance metric, DSA can be effective at exploiting the boundaries between inputs. Comparing the distances $a_1$ and $a_2$ (that is, the distances between the AT of a new input and its reference point, which is the nearest AT of the training data in $c_1$) with the distances $b_1$ and $b_2$ (that is, the distances to $c_2$ measured from the reference point) can indicate how close a new input is to the class boundary.

For classification problems, inputs closer to class boundaries may be more surprising and more valuable in terms of test input diversity. On the other hand, for tasks without boundaries between inputs, such as predicting the appropriate steering angle for an autonomous vehicle, DSA may be difficult to apply. When there are no class boundaries, the fact that the AT of a new input is far from the AT of another training input does not guarantee that the new input is surprising, because its AT may still lie in a crowded part of the AT space. For classification tasks, however, applying DSA alone may still be more effective than applying LSA.

According to an embodiment, a DL system $\mathbf{D}$ consisting of a set of neurons $\mathbf{N}$ may be trained, using a training data set $T$, for a classification task over a set of classes $C$. Given the set of activation traces $A_{\mathbf{N}}(T)$, a new input $x$, and the predicted class $c_x \in C$ of the new input, the reference point $x_a$ may be defined as the nearest neighbor of $x$ that shares the same class.

The reference point x_a may be calculated as in Equation 3 below.

The distance dist_a between x and x_a may be calculated as in Equation 4 below.
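Written out from the definitions above, Equations 3 and 4 may take the following form. This is a sketch under assumed notation: \alpha_N(x) stands for the AT of input x over the neurons N, D(x_i) for the class associated with training input x_i, and \lVert \cdot \rVert for the Euclidean norm.

```latex
x_a = \operatorname*{argmin}_{x_i \in T,\; D(x_i) = c_x}
      \lVert \alpha_N(x) - \alpha_N(x_i) \rVert \tag{3}

\mathit{dist}_a = \lVert \alpha_N(x) - \alpha_N(x_a) \rVert \tag{4}
```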

Next, the nearest neighbor of x_a may be found among the training inputs of classes other than c_x.

The nearest neighbor x_b may be calculated as in Equation 5 below.

The distance dist_b between x_a and x_b may be calculated as in Equation 6 below.
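Under the same assumed notation (\alpha_N(x) for the AT of x, D(x_i) for the class of training input x_i), Equations 5 and 6 may be sketched as follows. Note that both are measured from the reference point x_a, not from the new input x.

```latex
x_b = \operatorname*{argmin}_{x_i \in T,\; D(x_i) \in C \setminus \{c_x\}}
      \lVert \alpha_N(x_a) - \alpha_N(x_i) \rVert \tag{5}

\mathit{dist}_b = \lVert \alpha_N(x_a) - \alpha_N(x_b) \rVert \tag{6}
```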

Intuitively, DSA aims to compare two distances: the distance from the AT of a new input x to the known ATs belonging to its own class c_x, and the distance between the known ATs of class c_x and the known ATs of the other classes. If the former is larger than the latter, x may be a surprising input for class c_x of the classifying DL system D.

DSA according to an embodiment may be calculated as in Equation 7 below.
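One formulation of Equation 7 that fits the comparison described above is the ratio of the two distances, so that the value exceeds 1 when the new input is farther from its own class than that class is from its nearest neighboring class. The ratio form and the symbol names dist_a and dist_b are assumptions of this sketch.

```latex
DSA(x) = \frac{\mathit{dist}_a}{\mathit{dist}_b} \tag{7}
```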

However, in addition to this embodiment, there may be various other ways to formulate DSA. For example, dist_a and dist_b may be calculated using a measure other than the Euclidean distance. Alternatively, DSA may be calculated as the difference between the two distances.
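The DSA computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `dsa`, its parameters, the `mode` switch, and the choice of dist_a minus dist_b for the difference variant are all assumptions made here, and ATs are taken to be plain lists of floats.

```python
import math


def dsa(at_x, class_x, train_ats, train_classes, mode="ratio"):
    """Distance-based surprise adequacy of one input (illustrative sketch).

    at_x          -- AT of the new input (sequence of floats)
    class_x       -- class predicted for the new input
    train_ats     -- ATs recorded for the training inputs
    train_classes -- class of each training input
    mode          -- "ratio" (dist_a / dist_b) or "difference" (dist_a - dist_b)
    """
    same = [a for a, c in zip(train_ats, train_classes) if c == class_x]
    other = [a for a, c in zip(train_ats, train_classes) if c != class_x]
    if not same or not other:
        raise ValueError("need training ATs inside and outside the predicted class")

    # Reference point: nearest training AT of the same class, and its distance.
    ref = min(same, key=lambda a: math.dist(at_x, a))
    dist_a = math.dist(at_x, ref)

    # Nearest training AT of any other class, measured from the reference point.
    dist_b = min(math.dist(ref, a) for a in other)

    if mode == "difference":
        return dist_a - dist_b
    # dist_b is zero only if the reference point coincides with an
    # other-class AT; that degenerate case is not handled in this sketch.
    return dist_a / dist_b
```

Swapping `math.dist` for another distance function would give the non-Euclidean variant mentioned in the text.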

FIG. 4 is a flowchart illustrating a method for evaluating the test suitability of input data for a neural network according to an embodiment.

Referring to FIG. 4, the apparatus for evaluating the test suitability of input data for a neural network receives input data (410).

The apparatus obtains, for each result that can be inferred by a neural network including a plurality of neurons, an activation trajectory corresponding to the output pattern of the neurons (420).

By applying the input data to the neural network, the apparatus infers a first result among the inferable results (430). The apparatus may infer the first result by obtaining the result that the neural network outputs in response to the input data.

Based on the output pattern of the neurons in response to the input data and on the activation trajectory, the apparatus infers a second result among the inferable results (440). According to an embodiment, the apparatus may infer the second result by determining, based on the probability density distribution formed by the data included in the activation trajectory, where on that distribution the output pattern of the neurons in response to the input data is located.

According to an embodiment, the apparatus may determine, among the data included in the activation trajectory, the data having the highest predetermined similarity to the output pattern of the neurons in response to the input data, and may determine as the second result a value corresponding to the result that the neural network outputs when it receives the data having the highest similarity. In this case, the similarity may be calculated, for first data and second data to be compared, by comparing each element of the first data with the corresponding element of the second data. For example, the similarity may correspond to the Euclidean distance calculated by comparing each element of the first data with the corresponding element of the second data.

By comparing the first result with the second result, the apparatus evaluates the test suitability of the input data (450).
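Steps 440 and 450 in their similarity-based form can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, Euclidean distance is used as the similarity (smaller distance meaning higher similarity), and the comparison is reduced to a simple agreement flag, since the text does not fix how the comparison is turned into a suitability value.

```python
import math


def nearest_training_result(at_x, train_ats, train_results):
    """Second result: the output associated with the training AT closest to at_x."""
    distances = [math.dist(at_x, a) for a in train_ats]
    return train_results[distances.index(min(distances))]


def evaluate_input(first_result, at_x, train_ats, train_results):
    """Compare the network's own output (first result) with the AT-based second result."""
    second_result = nearest_training_result(at_x, train_ats, train_results)
    return {
        "first": first_result,
        "second": second_result,
        "agree": first_result == second_result,
    }
```

A disagreement between the two results flags an input whose activation pattern resembles training data of a class other than the one the network predicted.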

The apparatus may further determine how well the neural network has been trained based on the result of evaluating the test suitability. According to an embodiment, when the test suitability is less than a predetermined threshold, the apparatus may obtain a training set and retrain the neural network based on the obtained training set. In this case, the training set may be determined such that at least some of the data it includes is associated with the input data.

Hereinafter, matters not described with reference to FIGS. 1 to 4 are further described.

D. Surprise Coverage (SC)

Given a set of inputs, the range of SA values that the set covers, that is, its surprise coverage (SC), may also be measured. Because LSA and DSA are defined in a continuous space, bucketing may be used to discretize the space of surprise and to define likelihood-based surprise coverage (LSC) and distance-based surprise coverage (DSC).

Given an upper bound U and buckets B = {b_1, b_2, ..., b_n} that divide (0, U] into n SA segments, the SC of a set of inputs X, SC(X), may be defined as in Equation 8 below.
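With an upper bound U and n equal-width buckets b_1, ..., b_n, Equation 8 may be sketched as the fraction of buckets that contain the SA value of at least one input. The half-open bucket boundaries below are an assumption of this sketch.

```latex
SC(X) = \frac{\left| \left\{\, b_i \;\middle|\; \exists x \in X :
        SA(x) \in \left( U \cdot \tfrac{i-1}{n},\; U \cdot \tfrac{i}{n} \right] \right\} \right|}{n} \tag{8}
```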

A set of inputs with high SC may be a diverse set, ranging from inputs similar to the training data (i.e., low SA) to inputs dissimilar to the training data (i.e., high SA). The more systematically the input set for a DL system is diversified with respect to SA, the more effectively the training result of the network can be checked using that input set.
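The bucketing described above can be sketched in Python as follows. This is an illustration only: the function name and parameters are hypothetical, the buckets are assumed to be n equal-width half-open intervals over (0, U], and SA values outside that range are simply ignored.

```python
import math


def surprise_coverage(sa_values, upper_bound, n_buckets):
    """Fraction of SA buckets over (0, upper_bound] hit by at least one input."""
    covered = set()
    for sa in sa_values:
        if 0 < sa <= upper_bound:
            # Bucket i (1-based) holds values in (U*(i-1)/n, U*i/n].
            i = math.ceil(sa * n_buckets / upper_bound)
            covered.add(min(max(i, 1), n_buckets))
    return len(covered) / n_buckets
```

Because each input yields a single SA value, it can add at most one bucket to the covered set, which is why adding many near-identical inputs does not raise SC.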

Although terms such as coverage or cover are used, as in the C of SC, the meaning of SA-based coverage may differ from that of ordinary structural coverage. First, unlike most structural coverage criteria, such as statement coverage or branch coverage, there may not be a finite number of targets to cover. At least in theory, an input may be arbitrarily surprising. However, an input with an arbitrarily high SA value may be irrelevant to the problem domain, or at least less interesting (e.g., an image of a traffic sign may be irrelevant to testing an animal photo classifier). SC can therefore be measured only with respect to a predefined upper bound, in the same way that path coverage is bounded by a parameter.

Second, SC does not lend itself to the combinatorial set cover problem into which test suite minimization is often formulated. This may be because a single input yields only a single SA value and cannot belong to multiple SA buckets. The notion of redundancy with respect to SC as a coverage criterion is therefore weaker than that of structural coverage, under which a single input can cover multiple targets.

III. Research Questions

In connection with the present invention, the following questions may be posed.

RQ1. Surprise: Can SADL capture the relative surprise of an input to a DL system?

RQ1 may be answered from different perspectives. First, the SA of each test input included in the original data set may be calculated, and it may be checked whether the DL classifier finds inputs with higher SA harder to classify correctly. A more surprising input can be expected to be harder to classify correctly. Second, because adversarial examples are expected to be more surprising and to cause different behavior of the DL system, it may be evaluated whether adversarial examples can be detected based on SA values. Multiple sets of adversarial examples may be generated using different techniques and compared based on their SA values.

Finally, adversarial example classifiers may be trained using logistic regression on SA values. As an example, for each adversarial attack strategy, 10,000 adversarial examples are generated from the 10,000 original test images provided by MNIST and CIFAR-10, and a logistic regression classifier is trained on 1,000 randomly selected original test images and 1,000 adversarial examples. The trained classifier may then be evaluated on the remaining 9,000 original test images and 9,000 adversarial examples. If SA values correctly capture the behavior of the DL system, the SA-based classifier can be expected to detect adversarial examples successfully. According to an embodiment, the area under the receiver operating characteristic curve (ROC-AUC) may be used for the evaluation, so that both the true positive rate and the false positive rate are captured.
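The ROC-AUC evaluation can be illustrated with a short Python sketch. Note the substitution: instead of training a logistic regression classifier, this sketch scores each input by its raw SA value. A logistic regression on a single feature is monotone in that feature, so when its learned weight is positive it ranks inputs exactly as the raw SA does and gives the same ROC-AUC. The function name and argument names are hypothetical.

```python
def roc_auc(scores_original, scores_adversarial):
    """ROC-AUC of a detector that flags inputs with higher scores as adversarial.

    Computed as the probability that a randomly chosen adversarial input
    scores above a randomly chosen original input, counting ties as half.
    """
    wins = 0.0
    for adv in scores_adversarial:
        for orig in scores_original:
            if adv > orig:
                wins += 1.0
            elif adv == orig:
                wins += 0.5
    return wins / (len(scores_adversarial) * len(scores_original))
```

A value near 1.0 means SA separates adversarial from original inputs almost perfectly, while 0.5 corresponds to chance. The pairwise loop is quadratic, which is acceptable for a sketch but would be replaced by a rank-based computation for sets of 9,000 inputs each.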

RQ2. Layer Sensitivity: Does the choice of the neuron layer used to calculate SA affect how accurately SA reflects the behavior of the DL system?

According to an embodiment, when the KDE-based adversarial example detection technique of Bengio et al. is adopted, the deepest (i.e., last hidden) layer may be assumed to contain the most information useful for detection. This assumption may be evaluated in the context of SA by calculating the LSA and DSA of every individual layer and then comparing the adversarial example classifiers trained on the SA from each layer.

RQ3. Correlation: Is SC correlated with existing coverage criteria for DL systems?

In addition to capturing input surprise, SC should, in aggregate, be consistent with existing coverage criteria. Otherwise, there is a risk that SC measures something other than input diversity. To this end, it may be checked whether SC is correlated with other criteria. Specifically, input diversity may be controlled by accumulating inputs generated by different methods (i.e., different adversarial example generation techniques or input synthesis techniques), the DL system may be executed on these inputs, and the changes in various coverage criteria, including SC and a plurality of existing coverage criteria, may be observed and compared. The plurality of existing coverage criteria according to an embodiment may include at least some of DeepXplore's neuron coverage (NC) and the neuron-level coverage (NLC), k-multisection neuron coverage (KMNC), neuron boundary coverage (NBC), and strong neuron activation coverage (SNAC) introduced by DeepGauge.

As an example, for MNIST and CIFAR-10, starting from the original test data provided by the data set (10,000 images), 1,000 adversarial examples generated by FGSM, BIM-A, BIM-B, JSMA, and C&W may be added at each step. As another example, for Dave-2, starting from the original test data (5,614 images), 700 synthetic images generated by DeepXplore may be added at each step. As another example, for Chauffeur, 1,000 synthetic images (Set1 to Set3) generated by applying a random number of DeepTest transformations may be added at each step.
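The accumulation procedure above can be sketched generically: inputs are added step by step, and a coverage criterion is evaluated on the cumulative set after each step. The function below is a hypothetical helper, not part of any of the named tools; `coverage_fn` stands for any criterion (SC, NC, KMNC, and so on) supplied by the caller.

```python
def coverage_over_steps(input_steps, coverage_fn):
    """Evaluate coverage_fn on the cumulative input set after each step.

    input_steps -- list of lists; each inner list holds the inputs added at one step
    coverage_fn -- callable taking a list of inputs and returning a coverage value
    """
    accumulated = []
    history = []
    for step in input_steps:
        accumulated.extend(step)
        # Pass a copy so coverage_fn cannot modify the running set.
        history.append(coverage_fn(list(accumulated)))
    return history
```

Running this once per criterion yields one series per criterion over the same steps, and the series can then be compared to see whether SC rises and plateaus together with the existing criteria.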

RQ4. Guidance: Can SA guide the retraining of DL systems to improve their accuracy against adversarial examples and against synthetic test inputs generated by DeepXplore?

To evaluate whether SADL can guide additional training of DL systems to improve their accuracy against adversarial examples, it may be important whether SA can guide the selection of inputs for the additional training. As an example, from the adversarial examples and synthesized inputs for these models, four sets of 100 images each may be selected from four different SA ranges. Assuming U to be the upper bound used in RQ3 to calculate SC, the SA range [0, U] may be divided into four overlapping subsets: the lowest 25% of SA values ([0, U/4]), the lowest 50% ([0, 2U/4]), the lowest 75% ([0, 3U/4]), and the full SA range ([0, U]).

The four subsets can be expected to represent increasingly diverse sets of inputs. According to an embodiment, the range R is set to one of the four subsets, 100 images are randomly sampled from each R, and the existing models are trained for additional epochs (for example, five additional epochs). Finally, the performance of each model (for example, accuracy for MNIST and CIFAR-10, and MSE for Dave-2) may be measured against all adversarial inputs and synthetic inputs. Retraining with a more diverse subset can be expected to yield better performance.
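The range-based sampling can be sketched as follows. This is a minimal illustration with hypothetical names: inputs are assumed to arrive as (input, SA) pairs, each range is taken to be [0, fraction * U], and a fixed seed is used only to make the sketch reproducible.

```python
import random

# Fractions of the upper bound U that define the four overlapping SA ranges.
SA_RANGE_FRACTIONS = (0.25, 0.5, 0.75, 1.0)


def sample_by_sa_range(inputs_with_sa, upper_bound, fraction, k, seed=0):
    """Randomly pick up to k inputs whose SA lies in [0, fraction * upper_bound]."""
    limit = fraction * upper_bound
    pool = [x for x, sa in inputs_with_sa if 0 <= sa <= limit]
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


def build_retraining_sets(inputs_with_sa, upper_bound, k=100):
    """One sampled set per SA range, keyed by the range's fraction of U."""
    return {
        f: sample_by_sa_range(inputs_with_sa, upper_bound, f, k)
        for f in SA_RANGE_FRACTIONS
    }
```

Each returned set would then be used to continue training a copy of the existing model, so that the effect of the SA range on the retrained model's performance can be compared.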

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications executed on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored in one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of drawings, a person of ordinary skill in the art may apply various technical modifications and variations based on the above. For example, an appropriate result may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components of the system, structure, apparatus, circuit, and the like are combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the following claims.

Claims

1. A method, performed in a processor, for evaluating the test suitability of input data for a neural network, the method comprising:
receiving the input data;
obtaining, for each result that can be inferred by the neural network including a plurality of neurons, an activation trajectory corresponding to an output pattern of the neurons while the neural network is being trained;
inferring a first result among the inferable results by applying the input data to the neural network;
inferring a second result from among the inferable results based on an output pattern of the neurons in response to the input data and the activation trajectory; and
evaluating the test suitability of the input data by comparing the first result with the second result.

2. The method of claim 1, further comprising:
determining how well the neural network has been trained based on a result of evaluating the test suitability.

3. The method of claim 1, further comprising, when the test suitability is less than a predetermined threshold:
obtaining a training set; and
retraining the neural network based on the obtained training set.

4. The method of claim 3, wherein obtaining the training set comprises:
determining the training set such that at least some of the data included in the training set is associated with the input data.

5. The method of claim 1, wherein inferring the first result comprises:
obtaining a result output from the neural network in response to the input data.

6. The method of claim 1, wherein inferring the second result comprises:
determining, based on a probability density distribution formed by data included in the activation trajectory, where on the probability density distribution the output pattern of the neurons in response to the input data is located.

7. The method of claim 1, wherein inferring the second result comprises:
determining, among data included in the activation trajectory, data having the highest predetermined similarity to the output pattern of the neurons in response to the input data; and
determining, as the second result, a value corresponding to a result that the neural network outputs when it receives the data having the highest similarity.

8. The method of claim 7, wherein the similarity is calculated, with respect to first data and second data to be compared, by comparing each element of the first data with the corresponding element of the second data.

9. The method of claim 7, wherein the similarity corresponds to a Euclidean distance calculated, with respect to first data and second data to be compared, by comparing each element of the first data with the corresponding element of the second data.

10. A computer-readable recording medium storing a program for executing the method of any one of claims 1 to 9 on a computer.

11. An apparatus for evaluating the test suitability of input data for a neural network, the apparatus comprising:
a memory in which a program is recorded; and
a processor that executes the program,
wherein the program performs:
receiving the input data;
obtaining, for each result that can be inferred by the neural network including a plurality of neurons, an activation trajectory corresponding to an output pattern of the neurons while the neural network is being trained;
inferring a first result among the inferable results by applying the input data to the neural network;
inferring a second result from among the inferable results based on an output pattern of the neurons in response to the input data and the activation trajectory; and
evaluating the test suitability of the input data by comparing the first result with the second result.

12. The apparatus of claim 11, wherein the program further performs:
determining how well the neural network has been trained based on a result of evaluating the test suitability.

13. The apparatus of claim 11, wherein the program further performs, when the test suitability is less than a predetermined threshold:
obtaining a training set; and
retraining the neural network based on the obtained training set.

14. The apparatus of claim 13, wherein obtaining the training set comprises:
determining the training set such that at least some of the data included in the training set is associated with the input data.

15. The apparatus of claim 11, wherein inferring the first result comprises:
obtaining a result output from the neural network in response to the input data.

16. The apparatus of claim 11, wherein inferring the second result comprises:
determining, based on a probability density distribution formed by data included in the activation trajectory, where on the probability density distribution the output pattern of the neurons in response to the input data is located.

17. The apparatus of claim 11, wherein inferring the second result comprises:
determining, among data included in the activation trajectory, data having the highest predetermined similarity to the output pattern of the neurons in response to the input data; and
determining, as the second result, a value corresponding to a result that the neural network outputs when it receives the data having the highest similarity.

18. The apparatus of claim 17, wherein the similarity is calculated, with respect to first data and second data to be compared, by comparing each element of the first data with the corresponding element of the second data.

19. The apparatus of claim 17, wherein the similarity corresponds to a Euclidean distance calculated, with respect to first data and second data to be compared, by comparing each element of the first data with the corresponding element of the second data.