WO2023047519A1 - Learning device, estimation device, learning method, estimation method, and program - Google Patents

Learning device, estimation device, learning method, estimation method, and program

Info

Publication number
WO2023047519A1
Authority
WO
WIPO (PCT)
Prior art keywords
learning
estimation
microsaccade
attention
attentional
Prior art date
Application number
PCT/JP2021/035019
Other languages
French (fr)
Japanese (ja)
Inventor
Shimpei Yamagishi
Shigeto Furukawa
Original Assignee
Nippon Telegraph and Telephone Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corporation
Priority to PCT/JP2021/035019 priority Critical patent/WO2023047519A1/en
Publication of WO2023047519A1 publication Critical patent/WO2023047519A1/en

Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 3/00: Apparatus for testing the eyes; Instruments for examining the eyes
    • A61B 3/10: Objective types, i.e. instruments for examining the eyes independent of the patients' perceptions or reactions
    • A61B 3/113: Objective types, for determining or recording eye movement
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00: Measuring for diagnostic purposes; Identification of persons
    • A61B 5/16: Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state


Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Veterinary Medicine (AREA)
  • Biophysics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Ophthalmology & Optometry (AREA)
  • Human Computer Interaction (AREA)
  • Child & Adolescent Psychology (AREA)
  • Developmental Disabilities (AREA)
  • Educational Technology (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Psychology (AREA)
  • Social Psychology (AREA)
  • Pathology (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The purpose of the present invention is to provide a learning device that learns an estimation model for estimating the state of attention to auditory stimuli on the basis of the characteristics of microsaccades. The learning device comprises: a feature extraction unit that extracts microsaccade features from time-series data of eyeball positions included in learning data; and a learning unit that learns the estimation model using the microsaccade features and information, included in the learning data, indicating the state of attention to auditory stimuli. The estimation model estimates a subject's state of attention to an auditory stimulus from the direction of microsaccades generated in response to the auditory stimulus.

Description

Learning device, estimation device, learning method, estimation method, and program
 The present invention relates to a learning device for an estimation model concerning attention that uses minute eye movements (microsaccades), an estimation device using the estimation model, a learning method, an estimation method, and a program.
 Information about attention, such as the direction of attention and the attention level, leads to techniques for estimating a person's objects of interest and degree of concentration on a task. Here, the direction of attention indicates the direction to which a subject directs attention, and the attention level indicates how much attention the subject is paying. In vision research, it has been reported that involuntary minute eye movements called microsaccades reflect the direction of attention, and techniques for estimating the direction of attention from non-invasively measured eye data are expected to develop. On the other hand, evidence on the relationship between microsaccades and auditory attention is sparse, and evidence on the relationship with attention levels that fluctuate over time has not been established.
 According to Non-Patent Document 1, it may be possible to estimate the direction of attention not only to visual stimuli but also to auditory stimuli from the characteristics of microsaccades.
 However, it is not clear whether information related to human behavior and psychological state, such as attention-task performance that fluctuates over time with psychological state, can be estimated from the characteristics of microsaccades. An object of the present invention is to estimate the state of attention to auditory stimuli based on the characteristics of microsaccades.
 The present invention aims to provide a learning device that learns an estimation model for estimating the state of attention to auditory stimuli based on the characteristics of microsaccades, an estimation device that uses the estimation model, a learning method, an estimation method, and a program.
 To solve the above problems, according to one aspect of the present invention, a learning device includes a feature extraction unit that extracts microsaccade features from time-series data of eyeball positions included in learning data, and a learning unit that learns an estimation model using the microsaccade features and information, included in the learning data, indicating a state of attention to an auditory stimulus. The estimation model is a model that estimates a subject's state of attention to the auditory stimulus from the direction of microsaccades generated in response to the auditory stimulus.
 To solve the above problems, according to another aspect of the present invention, an estimation device includes a feature extraction unit that extracts microsaccade features from time-series data of a subject's eyeball positions, and an estimation unit that uses an estimation model to estimate the subject's state of attention to an auditory stimulus from the microsaccade features. The estimation model is a model trained using time-series data of eyeball positions included in learning data and information indicating a state of attention to an auditory stimulus.
 According to the present invention, the state of attention to auditory stimuli can be estimated by measuring microsaccades.
FIG. 1 is a diagram showing temporal changes in eyeball position and an example of microsaccade occurrence.
FIG. 2 is a diagram showing the relationship between the direction of attention and the temporal change in the direction of generated microsaccades.
FIG. 3 is a diagram showing the relationship between the attention level and the temporal change in the direction of generated microsaccades.
FIG. 4 is a diagram showing a configuration example of the estimation system according to the first embodiment.
FIG. 5 is a functional block diagram of the learning device according to the first embodiment.
FIG. 6 is a diagram showing an example of the processing flow of the learning device according to the first embodiment.
FIG. 7 is a functional block diagram of the estimation device according to the first embodiment.
FIG. 8 is a diagram showing an example of the processing flow of the estimation device according to the first embodiment.
FIG. 9 is a diagram showing a configuration example of a computer to which the present method is applied.
 Embodiments of the present invention are described below. In the drawings used in the following description, components having the same function and steps performing the same processing are given the same reference numerals, and redundant description is omitted.
<Points of the First Embodiment>
 This embodiment is based on the finding that the temporal characteristics of microsaccades while a subject performs a left-right auditory attention task differ according to task performance (slow versus fast reaction times, failure versus success), and it estimates the direction of attention and the attention level at that time from microsaccades obtained by measuring the subject's eye movements.
 First, the experiment underlying this embodiment, which led to the discovery of the relationship between the temporal characteristics of microsaccades during an attention task and task performance, and its results are described.
<Background Experiment>
1. An eye tracker records microsaccades while a specific sound (standard sound) is repeatedly presented to the subject's left and right ears. The subject directs attention to either the left or the right sound and performs a task of detecting a sound with characteristics different from the standard sound (oddball sound) that occasionally appears in the attended sound sequence. For example, the subject receives the following instructions (i) to (iii):
 (i) Direct attention to either the left or the right sound.
 (ii) Press the button upon detecting an oddball sound from the indicated direction.
 (iii) Ignore oddball sounds presented to the unattended ear.
2. The occurrence timing and amplitude of microsaccades are computed from the captured images. The temporal change in the microsaccade amplitude during the task is used as a feature.
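The patent does not specify how microsaccade timing and amplitude are computed from the recorded images. Purely as a hedged illustration, the sketch below uses a velocity-threshold detector in the spirit of Engbert and Kliegl's widely used method; the function name, the threshold multiplier `lam`, the minimum duration, and the 500 Hz default are assumptions, not values taken from the patent.

```python
import numpy as np

def detect_microsaccades(x, fs=500.0, lam=6.0, min_samples=3):
    """Detect microsaccades in a 1-D horizontal eye-position trace x (degrees).

    Returns a list of (onset_index, amplitude, direction) tuples,
    where direction is +1 (rightward) or -1 (leftward).
    """
    v = np.gradient(x) * fs                        # velocity in deg/s
    # Robust, median-based velocity SD, as in velocity-threshold detectors
    sigma = np.sqrt(np.median(v ** 2) - np.median(v) ** 2)
    above = np.abs(v) > lam * sigma                # samples above threshold
    events, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i                              # candidate onset
        elif not flag and start is not None:
            if i - start >= min_samples:           # discard very short bursts
                amp = x[i - 1] - x[start]          # signed displacement
                events.append((start, abs(amp), 1 if amp >= 0 else -1))
            start = None
    return events
```

The signed direction returned here (+1 rightward, -1 leftward) is what the feature-extraction sketch later in this document consumes.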
 FIG. 1 shows temporal changes in eyeball position and an example of microsaccade occurrence. The experiment yielded two findings: (1) the temporal characteristics of the direction of generated microsaccades differ according to the direction of attention to the auditory stimulus (see FIG. 2), and (2) the temporal characteristics of the microsaccade direction differ according to performance on the auditory attention task, that is, the attention level (see FIG. 3). Specifically, finding (1) is that microsaccades occur in the direction opposite to the direction of attention to the auditory stimulus. In FIG. 2, time zero is the onset of the oddball sound presentation. Finding (2) is that, when detection is slow (when the attention level is low), a microsaccade occurs in the same direction as the oddball sound within 0 to 0.8 seconds after the oddball sound is presented, followed by a microsaccade in the direction opposite to the oddball sound. In FIG. 3, time zero is likewise the onset of the oddball sound presentation.
 This embodiment uses this relationship between the temporal characteristics of microsaccades and task performance to estimate the state of attention to auditory stimuli (the direction of attention and the attention level) from time-series data of eyeball positions.
<First Embodiment>
 FIG. 4 shows a configuration example of the estimation system according to the first embodiment.
 The estimation system includes a learning device 100 and an estimation device 200.
 The processing of the estimation system consists of a learning stage and an estimation stage.
 In the learning stage, the learning device 100 takes learning data N as input, trains an estimation model, and outputs the trained estimation model M.
 The estimation device 200 receives the estimation model M prior to the estimation processing.
 In the estimation stage, the estimation device 200 takes time-series data S of eyeball positions as input, estimates the state of attention to the auditory stimulus (in this embodiment, the direction of attention and the attention level), and outputs an estimation result T.
 The learning device 100 and the estimation device 200 are each a special device configured by loading a special program into a known or dedicated computer having, for example, a central processing unit (CPU) and a main storage device (RAM: Random Access Memory). The learning device 100 and the estimation device 200 execute each process under the control of, for example, the central processing unit. Data input to the learning device 100 and the estimation device 200 and data obtained in each process are stored, for example, in the main storage device, and the data stored in the main storage device are read out to the central processing unit as needed and used for other processing. At least part of each processing unit of the learning device 100 and the estimation device 200 may be implemented by hardware such as an integrated circuit. Each storage unit included in the learning device 100 and the estimation device 200 can be implemented, for example, by a main storage device such as RAM, or by middleware such as a relational database or a key-value store. However, each storage unit does not necessarily have to be provided inside the learning device 100 and the estimation device 200; it may be implemented by an auxiliary storage device such as a hard disk, an optical disc, or a semiconductor memory element such as flash memory, and provided outside the learning device 100 and the estimation device 200.
 First, the learning stage is described.
<Learning Device 100>
 FIG. 5 is a functional block diagram of the learning device 100 according to the first embodiment, and FIG. 6 shows its processing flow.
 The learning device 100 includes a feature extraction unit 110 and a learning unit 120.
 The learning device 100 takes learning data N as input, trains the estimation model, and outputs the trained estimation model M.
 To create the learning data, the subject is repeatedly presented with the standard sound, and occasionally with the oddball sound, via the sound generator 80. In addition, the eye movement acquisition unit 90 acquires the eye movements of the subject to whom the auditory stimulus is presented. The same processing is performed for multiple subjects to create learning data for those subjects. For example, as in the background experiment, the subject is instructed to direct attention to either the left or the right sound, to press a button upon detecting an oddball sound from the indicated direction, and to ignore oddball sounds presented to the unattended ear.
 The sound generator 80 is, for example, an earphone or a loudspeaker that presents the auditory stimulus to the subject. The reproduction signal played by the sound generator 80 repeatedly presents the standard sound and occasionally presents the oddball sound. In the background experiment, earphones were used as the sound generator, but any device that can present sounds perceived at different spatial locations will suffice.
 The eye movement acquisition unit 90 captures the subject's eye movements with a camera or eye tracker while the auditory stimulus is presented, and computes and outputs the eyeball position. Because microsaccades are very fast eye movements, a camera or eye tracker with a high sampling rate (at least 500 Hz) is desirable. The eye movement acquisition unit 90 may be a device separate from the learning device 100, or may consist of a camera separate from the learning device 100 and a video processing unit provided in the learning device 100.
 The learning data include pairs of time-series data of eyeball positions acquired by the eye movement acquisition unit 90 and information indicating the state of attention to the auditory stimulus.
 The information indicating the state of attention to the auditory stimulus includes, for example, the direction to which the subject directed attention and information indicating the reaction time. As the information indicating the reaction time, for example, the difference between the time at which the oddball sound was presented and the time at which the subject pressed the button upon detecting it can be used. This reaction time may be computed within the learning device 100, taking as inputs the signal presenting the auditory stimulus to the subject and the output signal of the button. In that case, the information indicating the reaction time includes, for example, the time at which the oddball sound was presented and the time at which the subject pressed the button upon detecting it. The signal presenting the auditory stimulus to the subject is the reproduction signal played by the sound generator 80. Alternatively, the reaction time may be computed by an external device, which then outputs it to the learning device 100; in that case, the information indicating the reaction time is the reaction time itself output by the external device. A sketch of this computation appears below.
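As an illustration of the reaction-time computation just described, the hedged sketch below pairs each oddball onset with the first subsequent button press; the function name, the `max_rt` cutoff, and treating unanswered oddballs as misses are assumptions, not details fixed by the patent.

```python
def reaction_times(oddball_onsets, button_presses, max_rt=2.0):
    """Reaction time = button-press time minus oddball-onset time.

    oddball_onsets, button_presses: event times in seconds; a press more than
    max_rt seconds after an onset is treated as a miss (None).
    """
    presses = sorted(button_presses)
    rts = []
    for onset in sorted(oddball_onsets):
        hit = next((p for p in presses if onset < p <= onset + max_rt), None)
        rts.append(None if hit is None else hit - onset)
    return rts
```

A fuller implementation would consume each press at most once; this sketch keeps the pairing logic minimal.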
 Each unit is described below.
<Feature Extraction Unit 110>
 The feature extraction unit 110 takes as input the time-series data of eyeball positions, obtained by the eye movement acquisition unit 90 and included in the learning data N, extracts microsaccade features from the time-series data (S110), and outputs them. For example, the features are extracted by the following procedure.
1. Remove the portions lost due to blinking from the time-series data of eyeball positions.
2. Compute discrete values of the amplitude, or of binary direction data, at the occurrence timings of microsaccades during oddball-sound presentation.
3. Compute a moving average of the discrete values with a fixed window width (0.5 seconds to several seconds) and extract the temporal change in the direction of microsaccade occurrence as the feature. Alternatively, use the average over a fixed interval (for example, the entire interval of oddball-sound presentation) as the feature.
 The features include at least the microsaccade amplitude information described above, and may additionally include characteristics such as microsaccade velocity, oscillation characteristics, and occurrence frequency. A sketch of the extraction procedure appears below.
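The following is a hedged sketch of steps 1 to 3 above: it reuses the hypothetical `detect_microsaccades` function from the earlier sketch, places each event's sign (+1/-1) at its onset sample, and smooths with a moving average. The one-second window and all names are illustrative assumptions.

```python
import numpy as np

def microsaccade_feature(positions, blink_mask, fs=500.0, win_s=1.0):
    """positions: eye-position samples; blink_mask: True where data were lost."""
    x = positions[~blink_mask]                    # 1. drop blink-corrupted samples
    # (a fuller version would interpolate across blinks rather than concatenate)
    direction = np.zeros(len(x))                  # 2. signed event train
    for onset, amp, sign in detect_microsaccades(x, fs=fs):
        direction[onset] = sign                   #    +1 rightward, -1 leftward
    win = max(1, int(win_s * fs))                 # 3. moving average of the train
    trace = np.convolve(direction, np.ones(win) / win, mode="same")
    return trace                                  # temporal change in direction
```

The single-value variant of step 3 corresponds to averaging the event train over a fixed interval instead of convolving.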
<Learning Unit 120>
 The learning unit 120 takes as input the microsaccade features extracted by the feature extraction unit 110 and the information, included in the learning data, indicating the state of attention to the auditory stimulus, trains the estimation model (S120), and outputs the trained estimation model. The estimation model takes microsaccade features as input and estimates the state of attention to the auditory stimulus. The estimation model can also be described as a model that estimates the subject's state of attention to the auditory stimulus from the direction of microsaccades generated in response to the auditory stimulus.
 For example, the learning unit 120 prepares an estimation model initialized with appropriate values and trains it by machine learning so that its estimate for the microsaccade features extracted by the feature extraction unit 110 matches the information, included in the learning data, indicating the state of attention to the auditory stimulus. The learning unit 120 repeatedly updates the parameters of the estimation model until a predetermined condition is satisfied, and when it is satisfied, outputs the model at that point as the trained estimation model. The predetermined condition is a condition for judging whether training of the estimation model has converged, for example that the number of iterations exceeds a predetermined number, or that the difference between the model parameters before and after an update is at most a predetermined value. A sketch of such a training loop appears below.
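The patent leaves the model family and learning algorithm open. Purely as a sketch under that caveat, the following trains a logistic-regression stand-in for the estimation model and implements both convergence conditions named above; the learning rate, tolerance, and iteration cap are assumed values.

```python
import numpy as np

def train_estimation_model(feats, labels, lr=0.1, max_iter=1000, tol=1e-6):
    """feats: (n, d) microsaccade feature vectors; labels: (n,) 0/1 attention labels."""
    w = np.zeros(feats.shape[1])                  # appropriate initial values
    for _ in range(max_iter):                     # condition 1: iteration count
        p = 1.0 / (1.0 + np.exp(-(feats @ w)))    # model's estimated attention state
        grad = feats.T @ (p - labels) / len(labels)
        w_next = w - lr * grad                    # parameter update
        if np.max(np.abs(w_next - w)) <= tol:     # condition 2: parameter delta
            return w_next                         # converged: trained model M
        w = w_next
    return w                                      # iteration cap reached
```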
 When results such as those in FIGS. 2 and 3 are obtained, the estimation model trained in this way outputs as its estimates, for example: (a) information indicating that the subject is paying attention in the direction opposite to the microsaccade direction contained in the input microsaccade features (information indicating the direction of attention); and (b) information indicating that the attention level has decreased (information indicating the attention level) when the microsaccade amplitude or direction contained in the input features arises in the same direction as the oddball within 0 to 0.8 seconds after oddball-sound presentation. When a threshold is used to judge the direction of attention or the attention level, the threshold may, for example, be set to a value similar to that observed when attention is directed to a sound source other than the target (preferably in the opposite direction), or simply to an amplitude value of 0 (microsaccades occurring equally in both directions). A rule-based sketch of these judgments appears below.
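Judgments (a) and (b) can also be illustrated directly as rules, without a learned model. In this hedged sketch, `direction_trace` is the moving-average feature from the earlier sketch, `oddball_sign` marks the ear of oddball presentation, and the zero threshold is the simple option just mentioned; every name here is hypothetical.

```python
import numpy as np

def rule_based_attention(direction_trace, t, oddball_onset, oddball_sign, thresh=0.0):
    """direction_trace: moving-average direction feature (+ = rightward);
    t: sample times in seconds; oddball_sign: +1 right-ear oddball, -1 left-ear."""
    t = np.asarray(t)
    # (a) Attention direction: opposite to the net microsaccade direction.
    attended = "left" if np.mean(direction_trace) > thresh else "right"
    # (b) Attention level: low if microsaccades within 0-0.8 s after oddball
    #     onset lean toward the oddball side (threshold 0 = equal occurrence).
    win = (t >= oddball_onset) & (t < oddball_onset + 0.8)
    low_attention = bool(win.any()) and \
        float(np.mean(direction_trace[win])) * oddball_sign > thresh
    return attended, low_attention
```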
 Next, the estimation stage is described.
<Estimation Device 200>
 FIG. 7 is a functional block diagram of the estimation device 200 according to the first embodiment, and FIG. 8 shows its processing flow.
 The estimation device 200 includes a feature extraction unit 210 and an estimation unit 220.
 The estimation device 200 receives the estimation model M prior to the estimation processing.
 In the estimation stage, the estimation device 200 takes time-series data S of eyeball positions as input, estimates the state of attention to the auditory stimulus, and outputs the estimation result T.
 In the estimation stage, the subject is repeatedly presented with the standard sound, and occasionally with the oddball sound, via the sound generator 80. In addition, the eye movement acquisition unit 90 acquires the eye movements of the subject to whom the auditory stimulus is presented. Unlike the learning stage, no instructions need to be given to the subject in the estimation stage.
<Feature Extraction Unit 210>
 The feature extraction unit 210 takes as input the time-series data S of eyeball positions obtained by the eye movement acquisition unit 90, extracts microsaccade features from them (S210), and outputs the features. The feature extraction unit 210 may extract the same features by the same method as the feature extraction unit 110.
<Estimation Unit 220>
 The estimation unit 220 takes as input the microsaccade features extracted by the feature extraction unit 210, estimates the state of attention to the auditory stimulus using the estimation model (S220), and outputs the estimation result T. For example, in this embodiment, the estimation result T includes the direction of attention to the auditory stimulus and the attention level. A sketch of this inference step appears below.
<Effects>
 With this configuration, the state of attention to auditory stimuli can be estimated by measuring microsaccades. Prior work reported techniques for extracting the latent direction of attention to visual stimuli from microsaccades; this embodiment extends them to a method for estimating the direction of attention to auditory stimuli and further presents a method for estimating the attention level.
<Modification>
 In this embodiment, the "state of attention to the auditory stimulus" to be estimated comprises the direction of attention and the attention level, but it may be either one alone. It suffices to prepare learning data matching the estimation target and to train the estimation model accordingly.
 In this embodiment, the eye movement acquisition unit 90 includes a camera, but any other configuration capable of acquiring time-series data of eyeball positions may be used. For example, instead of a camera, the eyeball position may be obtained by placing electrodes around the eye and measuring the electric potentials generated by the muscles that move the eyeball.
<Other Modifications>
 The present invention is not limited to the above embodiment and modifications. For example, the various processes described above may be executed not only in chronological order as described but also in parallel or individually, depending on the processing capacity of the executing device or as needed. Other modifications are possible as appropriate without departing from the gist of the present invention.
<Program and Recording Medium>
 The various processes described above can be carried out by loading a program for executing each step of the above methods into the storage unit 2020 of the computer shown in FIG. 9 and operating the control unit 2010, the input unit 2030, the output unit 2040, and so on.
 The program describing these processes can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.
 The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in the storage device of a server computer and transferring it from the server computer to other computers over a network.
 A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or transferred from the server computer in its own storage device. When executing a process, the computer reads the program stored in its own recording medium and executes processing according to the read program. As another mode of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may sequentially execute processing according to the received program each time the program is transferred to it from the server computer. The above processing may also be executed by a so-called ASP (Application Service Provider) service, which realizes the processing functions through execution instructions and result acquisition alone, without transferring the program from the server computer to the computer. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).
 In this embodiment, the device is configured by executing a predetermined program on a computer, but at least part of the processing may be implemented in hardware.

Claims (7)

  1. A learning device comprising:
     a feature extraction unit that extracts microsaccade features from time-series data of eyeball positions included in learning data; and
     a learning unit that learns an estimation model using the microsaccade features and information, included in the learning data, indicating a state of attention to an auditory stimulus,
     wherein the estimation model is a model that estimates a subject's state of attention to the auditory stimulus from a direction of microsaccades generated in response to the auditory stimulus.
  2. The learning device according to claim 1, wherein the state of attention to the auditory stimulus includes at least one of a direction of attention and an attention level.
  3. An estimation device comprising:
     a feature extraction unit that extracts microsaccade features from time-series data of eyeball positions of a subject; and
     an estimation unit that estimates, using an estimation model, the subject's state of attention to an auditory stimulus from the microsaccade features,
     wherein the estimation model is a model trained using time-series data of eyeball positions included in learning data and information indicating a state of attention to an auditory stimulus.
  4. The estimation device according to claim 3, wherein the state of attention to the auditory stimulus includes at least one of a direction of attention and an attention level.
  5. A learning method comprising:
     a feature extraction step of extracting microsaccade features from time-series data of eyeball positions included in learning data; and
     a learning step of learning an estimation model using the microsaccade features and information, included in the learning data, indicating a state of attention to an auditory stimulus,
     wherein the estimation model is a model that estimates a subject's state of attention to the auditory stimulus from a direction of microsaccades generated in response to the auditory stimulus.
  6. An estimation method comprising:
     a feature extraction step of extracting microsaccade features from time-series data of eyeball positions of a subject; and
     an estimation step of estimating, using an estimation model, the subject's state of attention to an auditory stimulus from the microsaccade features,
     wherein the estimation model is a model trained using time-series data of eyeball positions included in learning data and information indicating a state of attention to an auditory stimulus.
  7. A program for causing a computer to function as the learning device according to claim 1 or 2, or as the estimation device according to claim 3 or 4.
PCT/JP2021/035019 2021-09-24 2021-09-24 Learning device, estimation device, learning method, estimation method, and program WO2023047519A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/035019 WO2023047519A1 (en) 2021-09-24 2021-09-24 Learning device, estimation device, learning method, estimation method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/035019 WO2023047519A1 (en) 2021-09-24 2021-09-24 Learning device, estimation device, learning method, estimation method, and program

Publications (1)

Publication Number Publication Date
WO2023047519A1

Family

ID=85719380

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/035019 WO2023047519A1 (en) 2021-09-24 2021-09-24 Learning device, estimation device, learning method, estimation method, and program

Country Status (1)

Country Link
WO (1) WO2023047519A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180116507A1 (en) * 2011-11-22 2018-05-03 Dignity Health System and method for using microsaccade dynamics to measure attentional response to a stimulus
JP2019111291A (en) * 2017-12-26 2019-07-11 日本電信電話株式会社 Attention level estimation device, attention level estimation model learning device, attention level estimation method, attention level estimation model learning method, and program
JP2020000266A (en) * 2018-06-25 2020-01-09 日本電信電話株式会社 Reflectivity determination device, reflectivity determination method and program
US20200329316A1 (en) * 2019-04-09 2020-10-15 Gn Hearing A/S Hearing devices with eye movement detection

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180116507A1 (en) * 2011-11-22 2018-05-03 Dignity Health System and method for using microsaccade dynamics to measure attentional response to a stimulus
JP2019111291A (en) * 2017-12-26 2019-07-11 日本電信電話株式会社 Attention level estimation device, attention level estimation model learning device, attention level estimation method, attention level estimation model learning method, and program
JP2020000266A (en) * 2018-06-25 2020-01-09 日本電信電話株式会社 Reflectivity determination device, reflectivity determination method and program
US20200329316A1 (en) * 2019-04-09 2020-10-15 Gn Hearing A/S Hearing devices with eye movement detection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIAO, HSIN-I ET AL.: "Evaluating auditory spatial attention through the eyes", LECTURE PROCEEDINGS OF THE 2019 AUTUMN MEETING OF THE ACOUSTICAL SOCIETY OF JAPAN; SHIGA-KEN, JAPAN; SEPTEMBER 4-6, 2019, vol. 2019, 21 August 2019 (2019-08-21) - 6 September 2019 (2019-09-06), pages 1209 - 1210, XP009545079 *
MARTIN ROLFS ; RALF ENGBERT ; REINHOLD KLIEGL: "Crossmodal coupling of oculomotor control and spatial attention in vision and audition", EXPERIMENTAL BRAIN RESEARCH, SPRINGER, BERLIN, DE, vol. 166, no. 3-4, 1 October 2005 (2005-10-01), Berlin, DE , pages 427 - 439, XP019329039, ISSN: 1432-1106, DOI: 10.1007/s00221-005-2382-y *
YAMAGASHI, SHIMPEI ET AL.: "2-P-6 Characteristics of microsaccades during warning sound presentation", PROCEEDINGS OF THE 2018 SPRING MEETING OF THE ACOUSTICAL SOCIETY OF JAPAN; SAITAMA, JAPAN; MARCH 13-15, 2018, vol. 2018, 28 February 2018 (2018-02-28) - 15 March 2018 (2018-03-15), pages 667 - 668, XP009545069 *

Similar Documents

Publication Publication Date Title
Stopczynski et al. The smartphone brain scanner: a portable real-time neuroimaging system
US8808195B2 (en) Eye-tracking method and system for screening human diseases
KR101267637B1 (en) Apparatus for measuring 3D display-induced visual discomfort, Apparatus for extracting 3D visual discomfort-inducing components automatically, and Method thereof
Zhao et al. Rapid ocular responses are modulated by bottom-up-driven auditory salience
US10817057B2 (en) Information processing device, information processing method, and program
JP6755839B2 (en) Exercise performance estimator, method, and program
JP2015132783A (en) Remarkableness estimation device for sound, and method and program thereof
JP6479708B2 (en) Feature amount extraction apparatus, estimation apparatus, method thereof, and program
US11531394B2 (en) Systems and methods for emotional-imaging composer
JP5718494B1 (en) Impression estimation device, method thereof, and program
Ki et al. Visually evoked responses are enhanced when engaging in a video game
Bœkgaard et al. In the twinkling of an eye: Synchronization of EEG and eye tracking based on blink signatures
WO2023047519A1 (en) Learning device, estimation device, learning method, estimation method, and program
JP7214986B2 (en) Reflectivity determination device, reflectivity determination method, and program
CN117271985A (en) Dream reconstruction method and device
JP5718495B1 (en) Impression estimation device, method thereof, and program
KR20210135378A (en) Method for understanding emotion dynamics in daily life and system therefore
JP6904269B2 (en) Auditory attention estimation device, auditory attention estimation method, program
CN109862822A (en) Identification influences the sense organ input of the working memory load of individual
JP5718492B1 (en) Sound saliency estimating apparatus, method and program thereof
JP2019080809A (en) State estimation apparatus
CN111568368B (en) Eyeball movement abnormality detection method, device and equipment
JP2017202048A (en) Feature amount extraction device, estimation device, method for the same and program
KR20200015169A (en) A method for analysing user&#39;s visual function who using vr contents
WO2023181931A1 (en) Storage auxiliary device, storage auxiliary method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21958395

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21958395

Country of ref document: EP

Kind code of ref document: A1