KR101149002B1 - Adaptive brain-computer interface device

Info

Publication number: KR101149002B1
Application number: KR1020110055735A
Authority: KR
Inventors: 김기응; 김진형; 박재영
Original assignee: 한국과학기술원 (Korea Advanced Institute of Science and Technology, KAIST)
Priority date: 2011-06-09
Filing date: 2011-06-09
Publication date: 2012-05-23
Also published as: KR20110085952A

Abstract

The present invention relates to an adaptive brain-computer interface device. The technical problem to be solved is to provide an adaptive brain-computer interface device that determines the order of stimuli adaptively, according to the situation, so that the command the user intends can be identified accurately and as quickly as possible.
To this end, the adaptive brain-computer interface device according to the present invention comprises: a stimulus generating device that generates stimuli and applies them to the user; a signal collection device that records the user's EEG signal evoked by the stimuli; a preprocessing device that extracts P300 features for each stimulus from the EEG signal; an analysis device that determines from the extracted signal whether a P300 is present; and a stimulus order determination device that infers the current state from the observations of the analysis device, selects the optimal stimulus for that state, and thereby determines the stimulus order of the stimulus generating device.

Description

ADAPTIVE BRAIN-COMPUTER INTERFACE DEVICE

The present invention relates to an adaptive brain-computer interface device, and more particularly to an adaptive brain-computer interface device that increases the number of commands and messages that can be input per unit time.

In general, a brain-computer interface is a device that issues commands and messages to an external device using brain activity rather than physical action. It can be of great help to people with physical disabilities in expressing themselves and controlling devices.

Two approaches are used to build such an interface: an invasive method, which measures brain activity through electrodes inserted directly into the brain, and a non-invasive method, which measures brain activity from brain waves observed outside the skull.

The electroencephalogram (EEG) measured by the non-invasive method is an electrical signal generated by neurons in the brain.

This signal represents the sum of the simultaneous electrical activity of thousands to millions of neurons located near the measuring electrode.

When a somatic stimulus that the user is attending to is presented, event-related potentials (ERPs) corresponding to that stimulus appear in the EEG.

The P300 is a positive-going peak in the event-related potential that occurs about 300 ms after a stimulus is presented, and it is known as a reliable brain response for building brain-computer interfaces.

The most actively studied P300-based brain-computer interface is the P300 speller, a device for keyboard input using brain waves.

In the P300 speller, characters are arranged in a 6x6 matrix, and the user gazes at the one character among the 36 that he or she wants to enter. The stimulus consists of the characters of each row and column lighting up briefly and then darkening, in random order.

When the character the user is gazing at is stimulated, an event-related potential appears in the EEG, and a P300 occurs about 300 ms after the stimulus.

By recognizing this P300 signal, the brain-computer interface can identify the character the user is gazing at. That is, if a P300 is present in the brain signal corresponding to a stimulus, the user can be interpreted as wanting to enter the character corresponding to that stimulus; if no P300 is detected, the user can be interpreted as not wanting to enter it.

The number of enterable characters can be changed by adjusting the size of the matrix in the P300 speller, and the characters can be replaced with commands so that the scheme can be applied to controlling devices such as wheelchairs.

FIG. 4 is a block diagram of a conventional adaptive brain-computer interface device.

The typical structure of a P300-based brain-computer interface device is shown in FIG. 4.

The stimulus generating device 10 generates stimuli so that a P300 occurs under the condition the user intends, and the signal collection device 20 records the user's EEG in response to each stimulus. The preprocessing device 30 performs preprocessing, such as P300 extraction, on the EEG signal. The analysis device 40 determines whether a P300 is present in the extracted signal supplied by the preprocessing device and issues commands to the external device 50 connected to the brain-computer interface.

To find the command the user wants to enter, the conventional brain-computer interface device described above presents every type of stimulus the same number of times, and the order in which the stimulus types are presented is random.

Presenting every type of stimulus the same number of times is unnecessary, and choosing the stimulus order at random also limits how far the performance of the brain-computer interface can be improved.

For example, in the P300 speller, if the stimuli presented so far and the EEG signals collected for them indicate that the user's intended character is very unlikely to be in the first row, there is no reason to stimulate the first row again.

Likewise, if the user's intended character is likely to be in the second or third column, it is better to stimulate those two columns repeatedly to reduce the uncertainty.

In other words, if the order of stimuli presented to the user can be determined effectively, the user's intention can be identified with only a small number of stimuli.

The present invention was devised to solve the problems described above. Its object is to provide an adaptive brain-computer interface device that determines the order of stimuli adaptively, according to the situation, so that the command the user intends can be identified accurately and as quickly as possible.

Another object of the present invention is to provide an adaptive brain-computer interface device that identifies a large number of commands and messages per unit time and can therefore be applied to controlling external devices by people with physical disabilities or in situations where the hands and feet are difficult to use.

To achieve the above objects, the adaptive brain-computer interface device according to the present invention comprises: a stimulus generating device that generates stimuli and applies them to the user; a signal collection device that records the user's EEG signal evoked by the stimuli; a preprocessing device that extracts P300 features for each stimulus from the EEG signal; an analysis device that determines from the extracted signal whether a P300 is present; and a stimulus order determination device that infers the current state from the observations of the analysis device, selects the optimal stimulus for that state, and thereby determines the stimulus order of the stimulus generating device.

The adaptive brain-computer interface device may determine an optimal action policy by applying a partially observable Markov decision process (POMDP).

The stimulus order determination device may include a belief state update unit, which infers a probability distribution over the current state from the stimuli previously presented to the user and the observation corresponding to each stimulus, and an optimal stimulus selection unit, which selects and executes the optimal stimulus corresponding to the belief state of the belief state update unit and delivers a command or message to an external device.

The adaptive brain-computer interface device may determine the optimal action policy by applying a delayed-observation POMDP.

The adaptive brain-computer interface device may determine the optimal action policy using only the actions that remain after excluding those performed within the time during which repetition blindness, caused by repeated stimuli, occurs.

The optimal action policy may be computed from a value function defined by the following equation.

(Here, A is the set of actions and A' is the set of actions performed within the preceding 500 ms.)
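
The restriction of the policy to actions outside A' can be illustrated with a small sketch. The function name `allowed_actions`, the `(timestamp, action)` log format, and the inclusive treatment of the 500 ms boundary are assumptions made for illustration; only the sets A and A' and the 500 ms window come from the text.

```python
def allowed_actions(actions, history, now_ms, window_ms=500):
    """Return A minus A': actions not performed within the last window_ms.

    actions : iterable of action identifiers (the set A)
    history : list of (timestamp_ms, action) pairs (hypothetical log format)
    """
    # A' = actions executed at most window_ms ago (boundary treated as inclusive)
    recent = {a for t, a in history if now_ms - t <= window_ms}
    return [a for a in actions if a not in recent]


# Hypothetical example: four flash actions, one presented every 250 ms
A = ["flash_0", "flash_1", "flash_2", "flash_3"]
log = [(0, "flash_0"), (250, "flash_1"), (500, "flash_2")]
candidates = allowed_actions(A, log, now_ms=750)
```

With these assumed timings, the two most recent flashes fall inside the window, so only the remaining two actions are offered to the policy at the next step.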

The preprocessing device may remove noise by averaging the EEG signal, or may use a P300 extraction algorithm such as a spatial filter algorithm or the Mexican hat wavelet.

The analysis device may consist of a P300 classifier that uses a classification algorithm such as Fisher's linear discriminant, stepwise linear discriminant analysis, or a support vector machine.

As described above, the adaptive brain-computer interface device according to the present invention has the effect of determining the order of stimuli adaptively, according to the situation, so that the command the user intends can be identified accurately and as quickly as possible.

It also has the effect of identifying a large number of commands and messages per unit time, so that it can be applied to controlling external devices by people with physical disabilities or in situations where the hands and feet are difficult to use.

FIG. 1 is a block diagram of an adaptive brain-computer interface device according to an embodiment of the present invention.
FIG. 2 is another block diagram of the adaptive brain-computer interface device according to an embodiment of the present invention.
FIG. 3a shows experimental results for the success rate with a [2x2] matrix.
FIG. 3b shows experimental results for the success rate with a [2x3] matrix.
FIG. 4 is a block diagram of a conventional adaptive brain-computer interface device.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. Note first that identical components or parts are denoted by the same reference numerals throughout the drawings wherever possible. In describing the present invention, detailed descriptions of related well-known functions or configurations are omitted so as not to obscure the gist of the invention.

FIG. 1 is a block diagram of an adaptive brain-computer interface device according to an embodiment of the present invention.

As shown in FIG. 1, the adaptive brain-computer interface device according to an embodiment of the present invention includes a stimulus generating device 100, a signal collection device 200, a preprocessing device 300, an analysis device 400, and a stimulus order determination device 500.

The stimulus generating device 100 generates stimuli and applies them to the user.

Specifically, the stimulus generating device 100 generates and presents the stimulus corresponding to each command so that a P300, that is, a positive-going peak in the event-related potential occurring about 300 ms after the stimulus, arises under the condition the user intends.

The stimulus generating device 100 presents stimuli in the manner of a typical P300 speller system.

That is, the user gazes at the character to be selected, and the characters of the matrix flash one at a time for 250 ms each. During the first 125 ms of the 250 ms the character is brightened, and during the remaining 125 ms it is darkened.

If the task of identifying one character intended by the user is called a test, a pause of 2.5 seconds may be inserted between two consecutive tests.

The signal collection device 200 records the user's EEG signal evoked by the stimuli.

The EEG signal may be acquired from 16 channels at 1 kHz using a Biopac MP 150 system.

Because the P300 occurs about 300 ms after a stimulus, the EEG signal corresponding to a single flash may consist of the data from 200 ms to 450 ms after the stimulus.
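
The window arithmetic above can be illustrated as follows. This is a minimal sketch: the function names, the list-of-channels layout, and the all-zero recording are assumptions for illustration; only the 1 kHz rate, the 16 channels, and the 200 to 450 ms window come from the text.

```python
FS_HZ = 1000            # sampling rate stated in the text (1 kHz)
WINDOW_MS = (200, 450)  # post-stimulus window stated in the text


def epoch_indices(onset_sample, fs_hz=FS_HZ, window_ms=WINDOW_MS):
    """Return (start, stop) sample indices of the post-stimulus window."""
    start = onset_sample + window_ms[0] * fs_hz // 1000
    stop = onset_sample + window_ms[1] * fs_hz // 1000
    return start, stop


def extract_epoch(recording, onset_sample):
    """Slice one epoch from a recording stored as a list of channels."""
    start, stop = epoch_indices(onset_sample)
    return [channel[start:stop] for channel in recording]


# Hypothetical 16-channel recording: 2 s of zeros, stimulus onset at sample 1000
recording = [[0.0] * 2000 for _ in range(16)]
epoch = extract_epoch(recording, onset_sample=1000)
```

At 1 kHz the window spans 250 samples per channel; after down-sampling to 100 Hz the same window would span 25 samples.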

The preprocessing device 300 extracts P300 features for each stimulus from the EEG signal.

Because the EEG signal contains brain activity arising from many different sources, the P300 is masked by EEG components from those other sources and is not directly apparent. In other words, the EEG signal is so noisy that measuring the P300 directly is impractical.

Various machine learning techniques can be used to address this.

Obtaining an EEG signal refined with filters so that the P300 is easier to detect is called P300 feature extraction. For this purpose, noise may be removed by averaging the EEG, or by using a P300 extraction algorithm such as a spatial filter algorithm or the Mexican hat wavelet.

The analysis device 400 determines from the signal extracted by the preprocessing device 300 whether a P300 is present.

The analysis device 400 may consist of a P300 classifier, which may be Fisher's linear discriminant, stepwise linear discriminant analysis (SWLDA), or a support vector machine (SVM).

To build the preprocessor 300 and the classifier, training data are first collected. The training data consist of EEG signals for the target case, in which the character the user is gazing at is stimulated, and for the non-target case, in which a character the user is not gazing at is stimulated.

Because raw EEG signals contain a great deal of noise, they may be band-pass filtered (0.5 to 30 Hz) with a 6th-order Butterworth filter and down-sampled to 100 Hz.

A spatial projection algorithm may be used to extract features from the EEG signals collected in this way.

This algorithm computes filters that transform the EEG signals collected from the individual channels into a signal that is, in effect, collected from a single virtual channel, chosen so that EEG signals corresponding to targets and non-targets are separated as much as possible.

To determine whether a feature extracted by the preprocessor 300 corresponds to a target, in which a P300 is present, the LIBLINEAR package may be used as the classifier, and L2-regularized logistic regression may be used to obtain a real number between 0 and 1 instead of a binary output.

This real number is the probability that the feature is a target. The classifier's parameters may be determined from the training data by 5-fold cross-validation.
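
How a trained logistic regression model turns a feature vector into a value between 0 and 1 can be sketched as below. This does not call LIBLINEAR; it only shows the logistic function applied to a linear score. The function name, the weights, and the feature values are hypothetical and chosen purely for illustration.

```python
import math


def target_probability(features, weights, bias=0.0):
    """Logistic-regression output: probability that the epoch is a target."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    # Numerically stable logistic function for either sign of the score
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


# Hypothetical weights and feature vector, for illustration only
w = [0.8, -0.5, 1.2]
p = target_probability([1.0, 0.2, 0.5], w)
```

A positive linear score yields a probability above 0.5 and a negative score a probability below 0.5, which is the graded output the stimulus order determination device uses as its observation.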

The stimulus order determination device 500 infers the current state from the observations of the analysis device 400, selects the optimal stimulus for that state, and thereby determines the stimulus order of the stimulus generating device 100.

A partially observable Markov decision process (POMDP) is a general decision-making framework for partially observable problems.

It is well suited to real-world problems involving uncertainty; it assumes an accurate model of the real world and finds the optimal action policy for the given model.

A POMDP is defined by the eight elements <S, A, Z, b₀, T, O, R, γ>.

S is the set of states, A is the set of actions, and Z is the set of observations. b₀ is the initial belief state, where b₀(s) is the probability that the initial state of the environment is s.

T is the state transition probability, where T(s,a,s') is the probability of moving from state s to state s' by action a. O is the observation probability, where O(s,a,z) is the probability of observing z upon reaching state s by action a.

R is the reward function, where R(s,a) is the reward obtained by taking action a in state s, and γ is the discount factor, a real number between 0 and 1.

The agent cannot know the current state of the environment directly; it receives only observations. The agent therefore maintains a probability distribution over the current state of the environment based on all actions taken so far and the corresponding observations. This distribution is called the belief state, and b_t(s) is the probability that the state at time t is s.

When action a_t is performed in the belief state b_t at time t and observation z_{t+1} is obtained, the belief state for the next time step, b_{t+1} = τ(b_t, a_t, z_{t+1}), is computed by Bayes' rule as in [Equation 1] below.

Here, P(z_{t+1} | b_t, a_t) is the normalization constant that makes the values b_{t+1}(s) sum to 1 over all states s.
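
Written with the symbols T, O, and b_t defined above, the standard Bayes belief update for a POMDP takes the following form. This is the textbook expression, offered as a reading of [Equation 1] under the stated definitions rather than a reproduction of the original figure.

```latex
b_{t+1}(s') \;=\; \tau(b_t, a_t, z_{t+1})(s')
  \;=\; \frac{O(s', a_t, z_{t+1}) \sum_{s \in S} T(s, a_t, s')\, b_t(s)}
             {P(z_{t+1} \mid b_t, a_t)},
\qquad
P(z_{t+1} \mid b_t, a_t) \;=\; \sum_{s' \in S} O(s', a_t, z_{t+1}) \sum_{s \in S} T(s, a_t, s')\, b_t(s).
```

Dividing by P(z_{t+1} | b_t, a_t) is what guarantees that the updated belief sums to 1 over all states.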

An action policy determines which action the agent performs and is expressed as a mapping from belief states to actions.

Every action policy has a corresponding value function, which gives the expected discounted reward obtained by acting indefinitely according to that policy starting from a given belief state.

Solving a POMDP therefore means computing the action policy that attains the maximum value in every belief state.

The maximum value attainable from an arbitrary belief state is computed recursively as in [Equation 2] below.
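
With the reward function R, the discount factor γ, and the belief update τ defined above, the standard recursive (Bellman) form of the optimal value function over belief states is shown below. It is the textbook expression and is given as a plausible reading of [Equation 2], not as a reproduction of the original figure; P(z | b, a) denotes the probability of observing z after taking action a in belief state b.

```latex
V^{*}(b) \;=\; \max_{a \in A} \left[ \sum_{s \in S} b(s)\, R(s, a)
  \;+\; \gamma \sum_{z \in Z} P(z \mid b, a)\, V^{*}\!\big(\tau(b, a, z)\big) \right]
```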

The optimal action policy corresponding to a given optimal value function is computed as in [Equation 3] below.
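
Given an optimal value function V* over belief states, the standard way to read off the corresponding policy is to take the maximizing action in each belief state. The expression below uses the symbols R, γ, and τ defined earlier and is the textbook form, given as a plausible reading of [Equation 3] rather than a reproduction of the original figure.

```latex
\pi^{*}(b) \;=\; \operatorname*{arg\,max}_{a \in A} \left[ \sum_{s \in S} b(s)\, R(s, a)
  \;+\; \gamma \sum_{z \in Z} P(z \mid b, a)\, V^{*}\!\big(\tau(b, a, z)\big) \right]
```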

In practice, computing the exact optimal action policy takes too long to be feasible, so algorithms that approximate the optimal policy, such as Point-Based Value Iteration, may be used.

FIG. 2 is a block diagram of an adaptive brain-computer interface device according to an embodiment of the present invention.

The stimulus order determination device 500 may determine the optimal action policy by applying a partially observable Markov decision process (POMDP).

In this case, the environment of the given brain-computer interface is modeled as a POMDP, and the optimal action policy is computed from that POMDP model. The belief state update unit 510 and the optimal stimulus selection unit 520 can then be configured, as shown in FIG. 2, using the POMDP model and its optimal action policy.

The optimal action policy corresponds to the optimal sequence of actions in the brain-computer interface: it determines the order of stimuli that identifies the command the user intends with maximum accuracy and in the shortest possible time, even when the observations are very noisy.

The belief state update unit 510 may infer the probability distribution over the current state from the stimuli previously presented to the user and the observation corresponding to each stimulus, using the standard POMDP belief state update.

That is, the belief state update unit 510 infers, for each command, the probability that it is the one the user currently wants to enter.
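
The role of the unit 510 can be sketched as a single Bayes update step. This is a minimal sketch under stated assumptions: the nested-list layout `T[s][a][s_next]` and `O[s_next][a][z]`, the two-command example, and the 0.8 / 0.2 observation probabilities are hypothetical and not taken from the patent.

```python
def belief_update(belief, action, obs, T, O):
    """One Bayes step: b'(s') is proportional to
    O[s'][action][obs] * sum over s of T[s][action][s'] * b(s)."""
    n = len(belief)
    unnorm = []
    for s_next in range(n):
        predicted = sum(T[s][action][s_next] * belief[s] for s in range(n))
        unnorm.append(O[s_next][action][obs] * predicted)
    total = sum(unnorm)  # equals P(obs | belief, action)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnorm]


# Hypothetical model: 2 commands, action a = flash command a,
# observation 1 = "P300 detected", observation 0 = "no P300".
T = [[[1.0, 0.0], [1.0, 0.0]],   # flashing never changes the target
     [[0.0, 1.0], [0.0, 1.0]]]
O = [[[0.2, 0.8], [0.8, 0.2]],   # target 0: P300 likely when command 0 flashes
     [[0.8, 0.2], [0.2, 0.8]]]   # target 1: P300 likely when command 1 flashes

b1 = belief_update([0.5, 0.5], action=0, obs=1, T=T, O=O)
```

Starting from a uniform belief, a detected P300 after flashing command 0 shifts the probability mass toward command 0, which is the behaviour the text describes for the belief state update unit.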

The optimal stimulus selection unit 520 may select and execute the optimal stimulus corresponding to the belief state of the belief state update unit 510, and may deliver a command or message to the external device 600.

That is, the optimal stimulus selection unit 520 selects and performs the action corresponding to the current belief state according to a precomputed optimal or approximately optimal action policy.

As described above, when the environment of the brain-computer interface is modeled as a POMDP, the model must yield an optimal action policy that accurately identifies the command the user intends with as few stimuli as possible.

This is very similar to the tiger problem in the POMDP literature and can be regarded as a tiger problem in which the number of doors is extended to the number of commands in the brain-computer interface.

Let N be the number of commands. Each command can be a state of the POMDP (N states). Presenting the stimulus corresponding to each command and selecting each command can be the actions (2*N actions). The real number output by the classifier can be used as a continuous observation, or it can be discretized so that each value is an observation (K observations).

A low reward can be assigned to stimulus actions, a high reward to the action of selecting the command the user is gazing at, and a very low reward to the action of selecting a command the user is not gazing at.

The state transition function can be defined so that a stimulus action leaves the state unchanged, while an action that selects a command as the user's intended one causes a transition to every character with equal probability.

As for the observation function, the EEG contains so much noise that it cannot be modeled to match the real operating environment of the brain-computer interface exactly. Instead, data can be collected in preliminary experiments with an existing non-invasive brain-computer interface to which no POMDP is applied, and fitted with a beta distribution, an exponential distribution, or any other probability distribution.

The discount factor is a real number between 0 and 1 that can be tuned so that the agent behaves appropriately, and the initial belief state is a probability distribution that can be set to match the initial state of the environment.
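
The model construction described above (N states, 2*N actions, three reward levels, and the two kinds of transition) can be sketched as follows. The function name, the action numbering, the uniform initial belief, and the numeric reward values are assumptions for illustration; the text specifies only that the rewards are low, high, and very low. The observation function is omitted because it is fitted from experimental data.

```python
def build_bci_pomdp(n_commands, r_flash=-1.0, r_correct=10.0, r_wrong=-100.0):
    """Build states, actions, reward, transition and initial belief."""
    states = list(range(n_commands))
    # Actions 0..N-1 flash command i; actions N..2N-1 select command a - N
    actions = list(range(2 * n_commands))

    def is_select(a):
        return a >= n_commands

    def reward(s, a):
        if not is_select(a):
            return r_flash                      # low reward for a stimulus
        return r_correct if a - n_commands == s else r_wrong

    def transition(s, a, s_next):
        if is_select(a):
            return 1.0 / n_commands             # next target is equally likely
        return 1.0 if s_next == s else 0.0      # a flash leaves the target unchanged

    b0 = [1.0 / n_commands] * n_commands        # assumed uniform initial belief
    return {"S": states, "A": actions, "R": reward, "T": transition, "b0": b0}


model = build_bci_pomdp(4)
```

For four commands this yields eight actions: four that flash a command and four that select one.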

한편, POMDP에서의 상태는 현재 사용자가 선택하기 원하는 명령어로 생각할 수 있으나, 뇌-컴퓨터 인터페이스는 이러한 상태를 직접적으로 알지 못한다.On the other hand, the state in the POMDP can be thought of as a command that the current user wants to select, but the brain-computer interface does not know this state directly.

Therefore, from the stimuli presented to the user and the observation corresponding to each stimulus, the interface must infer which command the user intends to enter as a probability distribution over the states, that is, as a belief state.

When the brain-computer interface presents a stimulus and that stimulus corresponds to the command the user wants to enter, the resulting observation is more likely to be one of those that occur frequently when the stimulus for the intended command is presented.

Conversely, when the presented stimulus does not correspond to the command the user wants to enter, the resulting observation is more likely not to be one of those that occur frequently when the stimulus for the intended command is presented.

That is, from a single stimulus-presenting action and its observation, the probability that each command is the one the user wants can be inferred as the POMDP belief state. When a particular command has a relatively high probability of being the intended one, the brain-computer interface presents further stimuli for that command in order to determine whether it is in fact the command the user wants.
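The inference described above is a Bayes update. Because a stimulus-presenting action leaves the state unchanged, the new belief is proportional to the old belief times the likelihood of the observation. The sketch below assumes discretized observations and two hypothetical likelihood tables, `p_target[o]` (stimulus matches the intended command) and `p_nontarget[o]` (it does not); these names and the function name are illustrative only.

```python
def update_belief(belief, stim, obs, p_target, p_nontarget):
    """Bayes update of the belief after presenting stimulus `stim`
    and receiving the discretized observation `obs`."""
    unnorm = [
        b * (p_target[obs] if s == stim else p_nontarget[obs])
        for s, b in enumerate(belief)
    ]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnorm]
```

For example, with four commands, a uniform belief, and an observation that is more likely after a target stimulus than after a non-target one, the probability of the stimulated command rises above 0.25 while the other three fall by the same factor.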

Then, once the probability that a particular command is the intended one rises above a certain level, the interface selects that command instead of presenting another stimulus, so that it acts to maximize the expected value obtainable under the given POMDP model.
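The behavior described in the last two paragraphs can be imitated by a simple threshold rule. This is only a stand-in for the policy computed from the POMDP, which decides when to select by maximizing expected value rather than by a fixed cutoff. The threshold value of 0.95 and the function name `choose_action` are hypothetical, and the sketch ignores any timing constraints on repeated stimuli.

```python
def choose_action(belief, threshold=0.95):
    """Select the most probable command once its belief is high enough,
    otherwise present its stimulus again to gather more evidence."""
    best = max(range(len(belief)), key=belief.__getitem__)
    if belief[best] >= threshold:
        return ("select", best)
    return ("stim", best)
```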

Meanwhile, the optimal policy for the POMDP model described above can be computed with existing optimal-policy algorithms.

However, the problem of finding the optimal policy of a POMDP is in general known to be PSPACE-complete, and for problems with many states, actions, and observations, computing the exact policy takes so long that it is infeasible in practice. To address this, approximate algorithms such as Point-Based Value Iteration and Heuristic Search Value Iteration can be used.

With such approximate algorithms, an approximately optimal policy can be computed for a POMDP model of moderate size.

Meanwhile, the stimulation order determination device (500) may determine the optimal policy by applying a POMDP with delayed observations.

A basic POMDP assumes that, after an action is performed, its observation is received before the next action is performed. In a real brain-computer interface this assumption may not hold.

For example, when stimuli are presented to the user at 250 ms intervals, the EEG segment used as the observation for the first stimulus may be longer than 250 ms. In particular, in a P300-based brain-computer interface the P300 can be detected only after 300 ms have passed since the stimulus, so the observation for the first stimulus becomes available only after at least the second stimulus has been presented.

This is called delayed observation. The constraint does not affect the POMDP model itself, but it can cause problems when computing the optimal policy for that model. It can be handled by using a POMDP with delayed observations.
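The timing in the 250 ms example can be checked with a short calculation. The sketch below counts how many later stimuli have already been presented by the time the EEG segment for a given stimulus is complete, assuming stimuli are presented at a fixed interval. The function name `stimuli_before_observation` and its parameters are hypothetical.

```python
import math


def stimuli_before_observation(epoch_ms, isi_ms):
    """Number of later stimuli presented strictly before an EEG segment
    of length `epoch_ms` is complete, with one stimulus every `isi_ms`."""
    if epoch_ms <= 0 or isi_ms <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(epoch_ms / isi_ms) - 1
```

With a 300 ms segment and a 250 ms interval the result is 1, which matches the statement that the first observation arrives only after the second stimulus.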

In addition, the stimulation order determination device (500) may determine the optimal policy using only actions other than those performed within the time window in which repetition blindness caused by a repeated stimulus occurs.

Repetition blindness refers to the following phenomenon. Suppose the command the user intends is "A". If the stimulus for "A" is presented to the user and then presented again within 500 ms, no P300 appears in the EEG for the second stimulus.

A simple remedy is not to present again any stimulus that was presented to the user within the last 500 ms.
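This simple remedy amounts to masking recently used stimuli. The sketch below assumes a hypothetical history of `(time_ms, stimulus_index)` pairs and treats a stimulus presented exactly 500 ms ago as still excluded. The function name and the inclusive boundary are assumptions.

```python
def allowed_stimuli(n, history, now_ms, window_ms=500):
    """Return the stimulus indices that were not presented within the
    last `window_ms` milliseconds."""
    recent = {stim for t, stim in history if now_ms - t <= window_ms}
    return [i for i in range(n) if i not in recent]
```

With one stimulus every 250 ms, this excludes the two most recent stimuli.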

This restriction also does not affect the POMDP model itself, but it causes a problem when computing the optimal policy for that model. It can be handled by defining a value function as in [Equation 4] below, where A is the set of actions and A' is the set of actions performed within the last 500 ms.

An experimental example using the adaptive brain-computer interface device according to an embodiment of the present invention is described below.

In the experimental example below, the experiment was conducted with nine participants who had no physical or mental disorders. Baseline denotes a device identical to the present invention except that the stimulus order is random, and PWSA denotes the device to which the POMDP is applied.

For a fair comparison between Baseline and PWSA, the baseline avoided repetition blindness by choosing each stimulus at random from among the stimuli that differ from the previous two stimuli.

<Experimental Example>

First, nine POMDP models with different observation probabilities were built for the [2x2] matrix and eleven for the [2x3] matrix, and the policy of each model was computed before the experiment.

In this experiment, training data were first collected to build a preprocessor and a classifier for each participant.

Baseline and PWSA experiments were then carried out for the [2x2] and [2x3] matrices. For PWSA, the POMDP model whose observation distribution was most similar to that of the participant was identified, and the policy of that model was used to determine the stimulus order.
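The model-matching step can be sketched as a nearest-neighbor search over the observation distributions of the precomputed models. The text does not say how similarity was measured. The sketch below assumes discretized observation histograms and uses the L1 distance, and the function name `closest_model` is hypothetical.

```python
def closest_model(subject_dist, model_dists):
    """Return the index of the model whose observation distribution
    has the smallest L1 distance to the participant's distribution."""
    def l1(p, q):
        return sum(abs(a - b) for a, b in zip(p, q))

    return min(range(len(model_dists)),
               key=lambda i: l1(subject_dist, model_dists[i]))
```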

Baseline and PWSA were compared in terms of success rate and bit rate. Here, accuracy (the success rate) is defined as the rate at which the user's intent is identified correctly for a given number of stimuli, and the bit rate is the amount of information transmitted per unit time.
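The text does not state which formula was used for the bit rate. A common choice in the brain-computer interface literature is Wolpaw's definition, which gives the bits conveyed per selection from the number of commands and the accuracy. The sketch below uses that definition as an assumption and is not claimed to reproduce the reported figures. The function names and the `seconds_per_selection` parameter are hypothetical.

```python
import math


def bits_per_selection(n, p):
    """Wolpaw bits per selection for n equally likely commands and
    accuracy p, with errors spread evenly over the other n - 1 commands."""
    if n < 2 or not 0.0 <= p <= 1.0:
        raise ValueError("need n >= 2 and 0 <= p <= 1")
    bits = math.log2(n)
    if p > 0.0:
        bits += p * math.log2(p)
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n - 1))
    return bits


def bits_per_minute(n, p, seconds_per_selection):
    return bits_per_selection(n, p) * 60.0 / seconds_per_selection
```

Under this definition a perfectly accurate four-command interface conveys 2 bits per selection, and one that performs at chance level conveys none.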

Nine participants actually took part, but two of them were excluded from the results because their observation probabilities differed greatly from the pre-built POMDP models.

FIG. 3A shows the experimental success rates for the [2x2] matrix, and FIG. 3B shows the experimental success rates for the [2x3] matrix.

As shown in FIGS. 3A and 3B, the success rate of PWSA is higher than that of the baseline for any number of stimuli, and the gap in accuracy grows as the matrix becomes larger. PWSA also converges to high accuracy more quickly.

The experimental bit rates are shown in Table 1 below.

[Table 1] Bit rate in bits/min (accuracy in parentheses)

            [2x2] Matrix       [2x3] Matrix
Baseline    10.065 (96.4%)     8.052 (92.9%)
PWSA        24.368 (98.2%)     21.367 (97.6%)

As shown in [Table 1], PWSA achieved a bit rate of 24.368 bits/min at 98.2% accuracy for the [2x2] matrix and 21.367 bits/min at 97.6% accuracy for the [2x3] matrix.

For the respective matrices, the baseline reached at most 96.4% and 92.9% accuracy, and the bit rates at those accuracies were only 10.065 bits/min and 8.052 bits/min.

The adaptive brain-computer interface device according to the present invention has been described above with reference to the accompanying drawings. The present invention is not limited by the embodiments and drawings disclosed in this specification, and those skilled in the art can of course make various modifications within the scope of the technical idea of the present invention.

100, 10: stimulus generating device        200, 20: signal collection device
300, 30: preprocessing device              400, 40: analysis device
410: classifier                            500: stimulation order determination device
510: belief state update unit              520: optimal stimulus selection unit
600, 50: external device

Claims

1. An adaptive brain-computer interface device comprising:
a stimulus generating device that generates a stimulus and applies it to a user;
a signal collection device that records the user's EEG signal evoked by the stimulus;
a preprocessing device that extracts features of the P300 for the stimulus from the EEG signal;
an analysis device that determines whether the P300 is present from the signal extracted by the preprocessing device; and
a stimulation order determination device that infers the current state from the observations of the analysis device and selects the optimal stimulus for the current state to determine the stimulation order of the stimulus generating device.

2. The adaptive brain-computer interface device of claim 1,
wherein the stimulation order determination device determines an optimal policy by applying a partially observable Markov decision process (POMDP).

3. The adaptive brain-computer interface device of claim 2,
wherein the stimulation order determination device comprises:
a belief state update unit that infers a probability distribution over the current state from the stimuli previously presented to the user and the observation corresponding to each stimulus; and
an optimal stimulus selection unit that selects and performs the optimal stimulus for the belief state provided by the belief state update unit, and transmits a command or message to an external device.

4. The adaptive brain-computer interface device of claim 1,
wherein the stimulation order determination device determines an optimal policy by applying a POMDP with delayed observations.

5. The adaptive brain-computer interface device of claim 1,
wherein the stimulation order determination device determines an optimal policy using only actions other than those performed within the time window in which repetition blindness caused by a repeated stimulus occurs.

6. The adaptive brain-computer interface device of claim 5,
wherein the optimal policy is calculated using a value function defined by the following equation.

(where A is the set of actions and A' is the set of actions performed within the last 500 ms)

7. The adaptive brain-computer interface device of claim 1,
wherein the preprocessing device removes noise by averaging the EEG signal, or uses a P300 extraction algorithm such as a spatial filter algorithm or a Mexican hat wavelet.

8. The adaptive brain-computer interface device of claim 1,
wherein the analysis device comprises a P300 classifier that uses a classification algorithm such as Fisher's linear discriminant, stepwise linear discriminant analysis, or a support vector machine.