WO2022264269A1 - Training device, estimation device, methods therefor, and program - Google Patents

Training device, estimation device, methods therefor, and program Download PDF

Info

Publication number
WO2022264269A1
Authority
WO
WIPO (PCT)
Prior art keywords
model parameter
line
visual field
estimated
sight angle
Prior art date
Application number
PCT/JP2021/022704
Other languages
French (fr)
Japanese (ja)
Inventor
瑛彦 高島
亮 増村
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社 filed Critical 日本電信電話株式会社
Priority to PCT/JP2021/022704 priority Critical patent/WO2022264269A1/en
Priority to JP2023528808A priority patent/JPWO2022264269A1/ja
Publication of WO2022264269A1 publication Critical patent/WO2022264269A1/en

Links

Images

Classifications

    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B3/00Apparatus for testing the eyes; Instruments for examining the eyes
    • A61B3/10Objective types, i.e. instruments for examining the eyes independent of the patients' perceptions or reactions
    • A61B3/113Objective types, i.e. instruments for examining the eyes independent of the patients' perceptions or reactions for determining or recording eye movement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis

Definitions

  • the present invention relates to an estimation technique for estimating a line-of-sight angle and an effective field of view from a face image, and a technique for learning parameters used for estimation.
  • the line-of-sight angle is the rotation angle that represents the orientation of the pupils of the left and right eyeballs. If the line-of-sight angle can be estimated from an image, the state of eye movement, such as what the person is gazing at or whether the person is looking around, can be understood, and the person's state and inner condition can be analyzed. Line-of-sight angle estimation is generally performed using a neural network. In the prior art, only the line-of-sight angle is learned by regression from an eye-region image using a neural network model. The correct label of the line-of-sight angle is vector data consisting of continuous values of the rotation angles of the horizontal and vertical components of the eyeball.
  • in learning the parameters used for estimating the line-of-sight angle with a neural network, convolution layers and pooling layers, which are widely used in image recognition, are used to extract image features, and the subsequent fully connected layers perform a regression onto the line-of-sight angle vector data, so that the line-of-sight angle can be estimated.
  • Non-Patent Document 1 describes a method of estimating the line-of-sight angle using a neural network.
  • the effective field of view refers to the range of eyeball angles from -15 to 15 degrees horizontally and from -12 to 8 degrees vertically (see FIG. 1). Humans can clearly recognize objects within this range; conversely, the ability to recognize objects outside the effective field of view is extremely reduced. For example, taking a robot as the object in a use case, the effective field of view can be used as a scale to determine whether a camera-equipped robot is recognized by the person captured by the camera.
  • FIG. 1 is a diagram showing the human visual field and visual field characteristics.
  • FIG. 2 is a diagram showing the inside of the effective field of view (the range in which the presence of an object can be recognized) and the outside of the effective field of view (the range in which the presence of an object cannot be recognized).
  • An object of the present invention is to provide an estimating device, a learning device, a method thereof, and a program capable of performing high-accuracy line-of-sight angle estimation and effective field-of-view determination that capture the characteristics of both line-of-sight angle and effective field-of-view determination.
  • a learning device includes: a shared network unit that converts a learning face image S_b into an intermediate feature v by a neural network function using a model parameter ^θ_g or ^θ_p; a line-of-sight angle network unit that converts the intermediate feature v into an estimated line-of-sight angle vector Z'_g by a neural network function using the model parameter ^θ_g; a line-of-sight angle model parameter optimization unit that updates the model parameter ^θ_g using the estimated line-of-sight angle vector Z'_g and a line-of-sight angle correct label for the learning face image S_b; an effective visual field network unit that converts the intermediate feature v into an estimated effective visual field probability vector Z'_p by a neural network function using the model parameter ^θ_p; and an effective visual field model parameter optimization unit that updates the model parameter ^θ_p using the estimated effective visual field probability vector Z'_p and an effective visual field correct label for the learning face image S_b, and the learning device acquires a learned model parameter θ_g corresponding to ^θ_g and a learned model parameter θ_p corresponding to ^θ_p.
  • FIG. 3 is a diagram for explaining the estimated angle error.
  • FIG. 5 is a functional block diagram of the learning device according to the first embodiment, and FIG. 6 is a diagram showing an example of the processing flow of the learning device according to the first embodiment.
  • a learning device simultaneously learns the model parameters used for estimating the line-of-sight angle and the model parameters used for determining the effective visual field.
  • the learning data includes a learning face image, a correct label Y g for the line-of-sight angle, and a correct label Y p for the effective visual field.
  • the configuration of the neural network is a shared network section, and a line-of-sight angle network section and an effective visual field network section that branch off after that.
  • the shared network unit receives the learning face images as input and can be expected to learn features of both the line-of-sight angle and the effective field of view; the subsequent line-of-sight angle network unit regresses the line-of-sight angle, the effective visual field network unit calculates the probability of being within the effective field of view, and each correct label is used to minimize the error of the corresponding estimate and update the model parameters.
  • at the time of estimation, the line-of-sight angle estimation unit and the effective visual field determination unit are used, respectively.
  • the line-of-sight angle estimation unit estimates the line-of-sight angle using the network architecture of the shared network unit and the subsequent line-of-sight angle network unit, and the model parameters learned by the learning device.
  • the effective field of view determination section uses the network architecture of the shared network section and the subsequent effective field of view network section, and the model parameters learned by the learning device to determine the effective field of view.
  • FIG. 4 shows a configuration example of an estimation system according to the first embodiment.
  • the estimation system includes a learning device 100 and an estimation device 200.
  • the learning device 100 and the estimation device 200 are special devices configured by loading a special program into a known or dedicated computer having, for example, a central processing unit (CPU) and a main storage device (RAM), and they execute each process under the control of, for example, the central processing unit.
  • the data input to the learning device 100 and the estimation device 200 and the data obtained in each process are stored, for example, in the main storage device, and the data stored in the main storage device are read out to the central processing unit as needed and used for other processing.
  • At least a part of each processing unit of learning device 100 and estimation device 200 may be configured by hardware such as an integrated circuit.
  • Each storage unit included in the learning device 100 and the estimating device 200 can be configured by, for example, a main storage device such as RAM (Random Access Memory), or middleware such as a relational database or key-value store.
  • each storage unit does not necessarily have to be provided inside the learning device 100 and the estimation device 200; it may be configured by an auxiliary storage device such as a hard disk, an optical disk, or a semiconductor memory element such as a flash memory, and may be provided outside the learning device 100 and the estimation device 200.
  • M is the data size of the learning data D.
  • the learning face image may be, for example, an image obtained by cutting out only the face or an image obtained by cutting out only the eye region.
  • the training face image has a resolution of 224x224 pixels and three RGB channels.
  • FIG. 5 shows a functional block diagram of the learning device 100
  • FIG. 6 shows its processing flow.
  • the learning device 100 includes a shared network unit 120 , a line-of-sight angle network unit 130 , a line-of-sight angle model parameter optimization unit 140 , an effective visual field network unit 150 , and an effective visual field model parameter optimization unit 160 .
  • the shared network unit 120 receives the learning face image, learns the features of both the line-of-sight angle and the effective field of view, and acquires intermediate features using an arbitrary neural network that outputs intermediate features.
  • the line-of-sight angle network unit 130 acquires the line-of-sight angle estimate using an arbitrary neural network that inputs intermediate features and outputs the line-of-sight angle estimate.
  • the line-of-sight angle model parameter optimization unit 140 receives the line-of-sight angle estimates and the line-of-sight angle correct labels, calculates the error of the line-of-sight angle estimates, and updates the model parameter ^θ_g based on the error.
  • the effective field of view network unit 150 acquires an estimated effective field of view probability using an arbitrary neural network that receives intermediate features as input and outputs a probability of being within the effective field of view (estimated effective field of view probability).
  • the effective visual field model parameter optimization unit 160 receives the estimated effective visual field probabilities and the effective visual field correct labels as inputs, calculates the error of the estimated effective visual field probabilities, and updates the model parameter ^θ_p based on the error.
  • the above process shows the learning procedure for one batch (a block of data partially selected from the learning data), and by repeating this, all data can be learned any number of times.
  • the shared network unit 120 is composed of an arbitrary neural network, and is composed of, for example, four convolutional layers.
  • prior to the conversion processing, the shared network unit 120 receives, among the updated model parameters ^θ_g (output values of the line-of-sight angle model parameter optimization unit 140) or ^θ_p (output values of the effective visual field model parameter optimization unit 160), the parameters corresponding to the neural network constituting the shared network unit 120.
  • the shared network unit 120 uses, among the updated model parameters ^θ_g or ^θ_p, the parameters corresponding to the neural network constituting the shared network unit 120 to convert the learning face images S_b into intermediate features v by the function of the neural network (S120).
  • the learning face images S_b are obtained by dividing the learning face images S_1, ..., S_|M| into batches.
  • the line-of-sight angle network unit 130 is composed of an arbitrary neural network, such as two fully connected layers.
  • prior to the conversion processing, the line-of-sight angle network unit 130 receives, among the updated model parameters ^θ_g (output values of the line-of-sight angle model parameter optimization unit 140), the parameters corresponding to the neural network constituting the line-of-sight angle network unit 130.
  • the line-of-sight angle network unit 130 uses the received parameters to transform the intermediate feature v into an estimated line-of-sight angle vector Z' g by an arbitrary neural network function (S130).
  • the estimated line-of-sight angle vector is vector data that stores an estimated value of the horizontal rotation angle and an estimated value of the vertical rotation angle of the line-of-sight angle.
  • the estimated line-of-sight angle vector is an estimate of the line-of-sight angle vector.
  • the line-of-sight angle vector is vector data that stores the horizontal rotation angle and vertical rotation angle of the line-of-sight angle.
  • the rotation angle range is from -180 degrees to 180 degrees.
  • the line-of-sight angle vector is a vector such as [-35,56].
  • let Z'_{g,q} be the estimated line-of-sight angle vector corresponding to the q-th intermediate feature v_q of a batch, and Z'_g = [Z'_{g,1}, Z'_{g,2}, ..., Z'_{g,Q}]; each estimated line-of-sight angle vector Z'_{g,q} is a vector such as [-35,56].
  • the line-of-sight angle model parameter optimization unit 140 receives as input the estimated line-of-sight angle vectors Z'_g and the line-of-sight angle correct labels T^g_1, ..., T^g_|M|, and outputs the updated model parameter ^θ_g or the learned model parameter θ_g.
  • the model parameters ^θ_g and θ_g are obtained by concatenating the parameters corresponding to the neural network constituting the shared network unit 120 and the parameters corresponding to the neural network constituting the line-of-sight angle network unit 130.
  • the line-of-sight angle model parameter optimization unit 140 updates the model parameter ^θ_g using the estimated line-of-sight angle vectors Z'_g and the line-of-sight angle correct labels T^g_1, ..., T^g_|M| (S140); for example, it calculates the error between the estimated line-of-sight angle vectors Z'_g and the correct labels and updates ^θ_g so as to minimize that error.
  • an MSE error or an MAE error can be used as the error, and a gradient descent method or the like can be used as a parameter update method.
  • prior to the conversion processing, the effective visual field network unit 150 receives, among the updated model parameters ^θ_p (output values of the effective visual field model parameter optimization unit 160), the parameters corresponding to the neural network constituting the effective visual field network unit 150.
  • the effective field of view network unit 150 uses the received parameters to convert one batch of intermediate features v into an estimated effective field of view probability vector Z′ p by an arbitrary neural network function (S150).
  • the estimated effective visual field probability vector is a vector consisting of estimated values of the effective visual field probability (estimated effective visual field probabilities), where the effective visual field probability is the probability that the gaze direction of a learning face image lies within the effective field of view.
  • the effective field of view probability is the probability of whether or not the camera that captures the learning face image exists within the effective field of view of the subject of the learning face image.
  • let Z'_{p,q} be the estimated effective visual field probability corresponding to the q-th intermediate feature v_q of a batch, and Z'_p = [Z'_{p,1}, Z'_{p,2}, ..., Z'_{p,Q}].
  • the model parameters ^θ_p and θ_p are obtained by concatenating the parameters corresponding to the neural network constituting the shared network unit 120 and the parameters corresponding to the neural network constituting the effective visual field network unit 150.
  • the effective visual field model parameter optimization unit 160 updates the model parameter ^θ_p using the estimated effective visual field probability vector Z'_p and the effective visual field correct labels T^p_1, ..., T^p_|M| (S160); for example, it calculates the error between the estimated effective visual field probability vector Z'_p and the correct labels and updates ^θ_p so as to minimize that error.
  • the above processes S120 to S160 are repeated until a predetermined condition is satisfied (S170).
  • the predetermined condition is a condition for determining whether or not the parameter updates have converged; for example, it may be (i) that the number of updates has exceeded a predetermined number, or (ii) that the difference between the parameters before and after the update is smaller than a predetermined value.
  • the above-described processes S120 to S170 are performed for all of the batch data (learning data); for example, it is determined whether or not there is unprocessed batch data (S180), and if there is (NO in S180), the processes S120 to S170 are performed again, and if there is none (YES in S180), the processing ends.
  • the estimation device 200 receives the learned model parameters θ_g and θ_p prior to the estimation process.
  • the estimation device 200 receives a face image S to be estimated as input, estimates the line-of-sight angle using the learned model parameter θ_g, estimates the effective visual field probability using the learned model parameter θ_p, and outputs the estimated line-of-sight angle vector Z_g and the effective visual field determination result Z_p.
  • FIG. 7 shows a functional block diagram of the estimation device 200
  • FIG. 8 shows its processing flow.
  • the estimation device 200 includes a line-of-sight angle estimation unit 210 and an effective visual field determination unit 220.
  • the line-of-sight angle estimation unit 210 receives as input the face image S and the model parameter θ_g, outputs the estimated line-of-sight angle vector Z_g for the face image S, and receives the model parameter θ_g prior to the estimation process.
  • the line-of-sight angle estimation unit 210 estimates the line-of-sight angle from the face image S using the network architecture of the shared network unit 120 and the subsequent line-of-sight angle network unit 130 together with the model parameter θ_g (S210), and obtains the estimated value (the estimated line-of-sight angle vector Z_g).
  • for example, the line-of-sight angle estimation unit 210 is configured as a neural network consisting of four convolution layers and two fully connected layers corresponding to the shared network unit 120 and the line-of-sight angle network unit 130, and uses the model parameter θ_g in this neural network.
  • the effective visual field determination unit 220 receives the model parameter θ_p prior to the estimation process.
  • the effective visual field determination unit 220 estimates the effective visual field probability from the face image S using the network architecture of the shared network unit 120 and the subsequent effective visual field network unit 150 together with the model parameter θ_p, and determines whether or not the gaze is within the effective field of view based on the estimated value of the effective visual field probability (S220); for example, the determination is based on the magnitude relationship between the estimated value and a predetermined threshold value.
  • for example, the effective visual field determination unit 220 outputs a determination result indicating that the gaze is within the effective field of view when the estimated value of the effective visual field probability is equal to or greater than a predetermined threshold value (for example, 0.5), and outputs a determination result indicating that the gaze is outside the effective field of view when the estimated value is less than the threshold value.
  • the effective visual field determination section 220 may output the estimated value of the effective visual field probability itself as the determination result.
  • for example, the effective visual field determination unit 220 is configured as a neural network consisting of four convolutional layers and two fully connected layers corresponding to the shared network unit 120 and the effective visual field network unit 150, and uses the model parameter θ_p in this neural network.
  • the estimation device 200 estimates both the line-of-sight angle and the effective visual field probability, but it may be configured to estimate only one of them. Even in that case, since the model parameters used for estimating the line-of-sight angle and the effective field of view are learned in a single neural network system during learning, line-of-sight angle estimation or effective field of view determination can be performed with high accuracy by capturing the features of both.
  • the present invention is not limited to the above embodiments and modifications.
  • the various types of processing described above may not only be executed in chronological order according to the description, but may also be executed in parallel or individually according to the processing capacity of the device that executes the processing or as necessary.
  • appropriate modifications are possible without departing from the gist of the present invention.
  • a program that describes this process can be recorded on a computer-readable recording medium.
  • Any computer-readable recording medium may be used, for example, a magnetic recording device, an optical disk, a magneto-optical recording medium, a semiconductor memory, or the like.
  • distribution of this program is carried out, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded.
  • the program may be distributed by storing the program in the storage device of the server computer and transferring the program from the server computer to other computers via the network.
  • a computer that executes such a program, for example, first stores the program recorded on a portable recording medium, or the program transferred from a server computer, in its own storage device; when executing the processing, it reads the program stored in its own recording medium and executes processing according to the read program. As another execution form, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may sequentially execute processing according to the received program each time the program is transferred from the server computer. The above-described processing may also be executed by a so-called ASP (Application Service Provider) type service that realizes the processing functions only through execution instructions and result acquisition, without transferring the program from the server computer to the computer. The program in this embodiment includes information that is used for processing by a computer and that conforms to a program (such as data that is not a direct instruction to the computer but has the property of prescribing the processing of the computer).
  • the device is configured by executing a predetermined program on a computer, but at least part of these processing contents may be implemented by hardware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Ophthalmology & Optometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Image Analysis (AREA)

Abstract

Provided are an estimation device and related apparatus capable of estimating a line-of-sight angle and determining an effective visual field with high accuracy by capturing the features of both the line-of-sight angle and the effective visual field determination. This training device converts a face image for training into an intermediate feature by a neural network function using a model parameter ^θ_g or ^θ_p, converts the intermediate feature into an estimated line-of-sight angle vector by a neural network function using the model parameter ^θ_g, updates the model parameter ^θ_g using the estimated line-of-sight angle vector and a line-of-sight angle correct label for the face image for training, converts the intermediate feature v into an estimated effective visual field probability vector by a neural network function using the model parameter ^θ_p, and updates the model parameter ^θ_p using the estimated effective visual field probability vector and an effective visual field correct label for the face image for training.

Description

LEARNING APPARATUS, ESTIMATION APPARATUS, METHODS THEREFOR, AND PROGRAM
 The present invention relates to an estimation technique for estimating a line-of-sight angle and an effective field of view from a face image, and to a technique for learning the parameters used for the estimation.
 The line-of-sight angle is the rotation angle that represents the orientation of the pupils of the left and right eyeballs. If the line-of-sight angle can be estimated from an image, the state of eye movement, such as what the person is gazing at or whether the person is looking around, can be understood, and the person's state and inner condition can be analyzed. Line-of-sight angle estimation is generally performed using a neural network. In the prior art, only the line-of-sight angle is learned by regression from an eye-region image using a neural network model. The correct label of the line-of-sight angle is vector data consisting of continuous values of the rotation angles of the horizontal and vertical components of the eyeball. In learning the parameters used for line-of-sight angle estimation with a neural network, convolution layers and pooling layers, which are widely used in image recognition, are used to extract image features, and the subsequent fully connected layers perform a regression onto the line-of-sight angle vector data, so that the line-of-sight angle can be estimated.
 Non-Patent Document 1 describes a method of estimating the line-of-sight angle using a neural network.
 On the other hand, there is a method that uses the effective field of view as a measure for determining whether or not a person recognizes an object. The effective field of view refers to the range of eyeball angles from -15 to 15 degrees horizontally and from -12 to 8 degrees vertically (see FIG. 1). Humans can clearly recognize objects within this range; conversely, the ability to recognize objects outside the effective field of view is extremely reduced. For example, taking a robot as the object in a use case, the effective field of view can be used as a scale to determine whether a camera-equipped robot is recognized by the person captured by the camera. Specifically, even when the person faces the robot, it can be determined that the person does not recognize the robot if the line-of-sight angle is outside the effective field of view, and that the person recognizes the robot if it is within the effective field of view. FIG. 1 is a diagram showing the human visual field and visual field characteristics. FIG. 2 is a diagram showing the inside of the effective field of view (the range in which the presence of an object can be recognized) and the outside of the effective field of view (the range in which the presence of an object cannot be recognized).
 In conventional learning of the model parameters used for line-of-sight angle estimation, only the line-of-sight angle is regressed with a neural network, and whether the line-of-sight angle is inside or outside the effective field of view is determined by a rule applied to the estimated line-of-sight angle. In addition, the conventional technique only minimizes the error of the line-of-sight angle, and the learning does not take into account whether the gaze is inside or outside the effective field of view. As for the tendency of the line-of-sight angle estimation error, the larger the absolute value of the correct angle, the larger the error, and the error also tends to be large near the effective field of view boundaries, i.e., near horizontal angles of -15 and 15 degrees (the portion enclosed by the broken lines in FIG. 3) and near vertical angles of -12 and 8 degrees. For this reason, the method of determining the effective field of view by a rule from the estimated line-of-sight angle has the problem that the determination accuracy of the effective field of view decreases near the effective field of view boundary.
 An object of the present invention is to provide an estimation device, a learning device, methods therefor, and a program capable of performing high-accuracy line-of-sight angle estimation and effective field of view determination that capture the features of both the line-of-sight angle and the effective field of view determination.
 To solve the above problem, according to one aspect of the present invention, a learning device includes: a shared network unit that converts a learning face image S_b into an intermediate feature v by a neural network function using a model parameter ^θ_g or ^θ_p; a line-of-sight angle network unit that converts the intermediate feature v into an estimated line-of-sight angle vector Z'_g by a neural network function using the model parameter ^θ_g; a line-of-sight angle model parameter optimization unit that updates the model parameter ^θ_g using the estimated line-of-sight angle vector Z'_g and a line-of-sight angle correct label for the learning face image S_b; an effective visual field network unit that converts the intermediate feature v into an estimated effective visual field probability vector Z'_p by a neural network function using the model parameter ^θ_p; and an effective visual field model parameter optimization unit that updates the model parameter ^θ_p using the estimated effective visual field probability vector Z'_p and an effective visual field correct label for the learning face image S_b, and the learning device acquires a learned model parameter θ_g corresponding to the model parameter ^θ_g and a learned model parameter θ_p corresponding to the model parameter ^θ_p.
 According to the present invention, highly accurate line-of-sight angle estimation and effective field of view determination that capture the features of both the line-of-sight angle and the effective field of view determination can be performed.
 FIG. 1 is a diagram showing the human visual field and visual field characteristics. FIG. 2 is a diagram showing the inside and the outside of the effective field of view. FIG. 3 is a diagram for explaining the estimated angle error. FIG. 4 is a diagram showing a configuration example of the estimation system according to the first embodiment. FIG. 5 is a functional block diagram of the learning device according to the first embodiment. FIG. 6 is a diagram showing an example of the processing flow of the learning device according to the first embodiment. FIG. 7 is a functional block diagram of the estimation device according to the first embodiment. FIG. 8 is a diagram showing an example of the processing flow of the estimation device according to the first embodiment. FIG. 9 is a diagram showing a configuration example of a computer to which the present method is applied.
 Embodiments of the present invention are described below. In the drawings used for the following description, components having the same functions and steps performing the same processing are given the same reference numerals, and redundant description is omitted. In the following description, the symbol "^", which should originally be written directly above the character it modifies, is written adjacent to that character (as in ^θ) because of the limitations of text notation; in the formulas, it is written in its original position. Unless otherwise specified, processing performed for each element of a vector or matrix is applied to all elements of that vector or matrix.
<Points of the first embodiment>
 In this embodiment, the model parameters used for estimating the line-of-sight angle and the model parameters used for determining the effective field of view are learned simultaneously by a learning device. The learning data consist of learning face images, the corresponding line-of-sight angle correct labels Y_g, and the effective visual field correct labels Y_p. The neural network consists of a shared network unit followed by a line-of-sight angle network unit and an effective visual field network unit that branch off after it. The shared network unit takes the learning face images as input and can be expected to learn features of both the line-of-sight angle and the effective field of view; the subsequent line-of-sight angle network unit regresses the line-of-sight angle, the effective visual field network unit calculates the probability of being within the effective field of view, and each correct label is used to minimize the error of the corresponding estimate and update the model parameters. At estimation time, a line-of-sight angle estimation unit and an effective visual field determination unit are used, respectively. The line-of-sight angle estimation unit estimates the line-of-sight angle using the network architecture of the shared network unit and the subsequent line-of-sight angle network unit together with the model parameters learned by the learning device. Similarly, the effective visual field determination unit determines the effective field of view using the network architecture of the shared network unit and the subsequent effective visual field network unit together with the model parameters learned by the learning device.
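 The branching layout described above can be made concrete with a short sketch. The following is a minimal PyTorch sketch of a shared trunk followed by a gaze-angle head and an effective-field-of-view head; the use of PyTorch, the channel counts, and the hidden sizes are illustrative assumptions, since the embodiment only specifies "for example, four convolutional layers" and "for example, two fully connected layers".

```python
import torch
import torch.nn as nn

class GazeFovNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared network unit 120: e.g. four convolutional layers producing the
        # intermediate feature v from a 3x224x224 face image.
        self.shared = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),                          # 128 * 14 * 14 = 25088 features
        )
        # Line-of-sight angle network unit 130: e.g. two fully connected layers
        # regressing the horizontal/vertical rotation angles (Z'_g).
        self.gaze_head = nn.Sequential(
            nn.Linear(128 * 14 * 14, 256), nn.ReLU(),
            nn.Linear(256, 2),
        )
        # Effective visual field network unit 150: e.g. two fully connected layers
        # producing the probability of being within the effective field of view (Z'_p).
        self.fov_head = nn.Sequential(
            nn.Linear(128 * 14 * 14, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, s):
        v = self.shared(s)                         # intermediate feature v
        return self.gaze_head(v), self.fov_head(v)
```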
<First embodiment>
 FIG. 4 shows a configuration example of the estimation system according to the first embodiment.
 The estimation system includes a learning device 100 and an estimation device 200.
 The learning device 100 and the estimation device 200 are special devices configured by loading a special program into a known or dedicated computer having, for example, a central processing unit (CPU) and a main storage device (RAM). The learning device 100 and the estimation device 200 execute each process under the control of, for example, the central processing unit. The data input to the learning device 100 and the estimation device 200 and the data obtained in each process are stored, for example, in the main storage device, and the data stored in the main storage device are read out to the central processing unit as needed and used for other processing. At least a part of each processing unit of the learning device 100 and the estimation device 200 may be configured by hardware such as an integrated circuit. Each storage unit included in the learning device 100 and the estimation device 200 can be configured by, for example, a main storage device such as a RAM, or by middleware such as a relational database or a key-value store. However, each storage unit does not necessarily have to be provided inside the learning device 100 and the estimation device 200; it may be configured by an auxiliary storage device such as a hard disk, an optical disk, or a semiconductor memory element such as a flash memory, and may be provided outside the learning device 100 and the estimation device 200.
 First, the learning device 100 will be described.
<Learning device 100>
 The learning device 100 takes learning data D = (S_1, T^g_1, T^p_1), ..., (S_|M|, T^g_|M|, T^p_|M|) as input, learns the model parameters θ_g and θ_p using the learning data D, and outputs the learned model parameters θ_g and θ_p. Here, M is the data size of the learning data D. S_m (m = 1, ..., M) is a learning face image; it may be, for example, either an image in which only the face is cut out or an image in which only the eye region is cut out. For example, a learning face image has a resolution of 224x224 pixels and three RGB channels. T^g_m (m = 1, ..., M) is the correct label of the line-of-sight angle (hereinafter also referred to as the "line-of-sight angle correct label"); it is, for example, vector data storing the horizontal and vertical rotation angles of the line-of-sight angle, and takes a data format such as [-35,56]. T^p_m (m = 1, ..., M) is the correct label of the effective field of view (hereinafter also referred to as the "effective visual field correct label"); for example, it is [1] when the gaze is within the effective field of view and [0] when it is outside the effective field of view.
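 As an illustration of the data formats just described, one training sample (S_m, T^g_m, T^p_m) could be encoded as follows; the tensor layout and the concrete label values are assumed examples, not part of the embodiment.

```python
import torch

S_m = torch.rand(3, 224, 224)        # learning face image: RGB, 224x224 (assumed tensor layout)
T_g = torch.tensor([-35.0, 56.0])    # line-of-sight angle correct label: [horizontal, vertical] in degrees
T_p = torch.tensor([1.0])            # effective visual field correct label: [1] inside, [0] outside
sample = (S_m, T_g, T_p)             # one element of the learning data D
```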
 FIG. 5 shows a functional block diagram of the learning device 100, and FIG. 6 shows its processing flow.
 The learning device 100 includes a shared network unit 120, a line-of-sight angle network unit 130, a line-of-sight angle model parameter optimization unit 140, an effective visual field network unit 150, and an effective visual field model parameter optimization unit 160.
 An overview of each unit is given below.
 The shared network unit 120 takes the learning face images as input and obtains intermediate features using an arbitrary neural network that learns features of both the line-of-sight angle and the effective field of view and outputs intermediate features.
 The line-of-sight angle network unit 130 obtains estimates of the line-of-sight angle using an arbitrary neural network that takes intermediate features as input and outputs line-of-sight angle estimates.
 The line-of-sight angle model parameter optimization unit 140 takes the line-of-sight angle estimates and the line-of-sight angle correct labels as input, calculates the error of the line-of-sight angle estimates, and updates the model parameter ^θ_g based on the error.
 The effective visual field network unit 150 obtains estimated effective visual field probabilities using an arbitrary neural network that takes intermediate features as input and outputs the probability of being within the effective field of view (the estimated effective visual field probability).
 The effective visual field model parameter optimization unit 160 takes the estimated effective visual field probabilities and the effective visual field correct labels as input, calculates the error of the estimated effective visual field probabilities, and updates the model parameter ^θ_p based on the error.
 The above processing describes the learning procedure for one batch (a block of data partially selected from the learning data); by repeating it, all of the data can be learned any number of times.
 The details of each unit are described below.
<Shared network unit 120>
Input: learning face images S_1, ..., S_|M|, and, among the updated model parameters ^θ_g or ^θ_p, the parameters corresponding to the neural network constituting the shared network unit 120
Output: intermediate features v for one batch
 The shared network unit 120 is composed of an arbitrary neural network, for example, four convolutional layers.
 Prior to the conversion processing, the shared network unit 120 receives, among the updated model parameters ^θ_g (output values of the line-of-sight angle model parameter optimization unit 140) or ^θ_p (output values of the effective visual field model parameter optimization unit 160), the parameters corresponding to the neural network constituting the shared network unit 120.
 The shared network unit 120 uses those parameters to convert the learning face images S_b into intermediate features v by the function of the neural network (S120). The learning face images S_b are obtained by dividing the learning face images S_1, ..., S_|M| into batches; one batch of learning face images S_b consists of, for example, 16 images. For example, when a batch of learning face images S_b consists of Q images, the intermediate feature corresponding to the q-th image is denoted by v_q, and v = [v_1, v_2, ..., v_Q], where q = 1, 2, ..., Q.
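 Continuing the hypothetical GazeFovNet sketch given earlier, step S120 for one batch of Q = 16 images could look as follows; the batch size and shapes are the illustrative values from the text.

```python
import torch

model = GazeFovNet()                  # architecture from the earlier sketch (assumption)
S_b = torch.rand(16, 3, 224, 224)     # one batch S_b of learning face images (Q = 16)
v = model.shared(S_b)                 # intermediate features v = [v_1, ..., v_Q], shape (16, 25088)
```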
<Line-of-sight angle network unit 130>
Input: intermediate features v for one batch, and, among the updated model parameters ^θ_g, the parameters corresponding to the neural network constituting the line-of-sight angle network unit 130
Output: estimated line-of-sight angle vectors Z'_g
 The line-of-sight angle network unit 130 is composed of an arbitrary neural network, for example, two fully connected layers.
 Prior to the conversion processing, the line-of-sight angle network unit 130 receives, among the updated model parameters ^θ_g (output values of the line-of-sight angle model parameter optimization unit 140), the parameters corresponding to the neural network constituting the line-of-sight angle network unit 130.
 The line-of-sight angle network unit 130 uses the received parameters to convert the intermediate features v into estimated line-of-sight angle vectors Z'_g by the function of the neural network (S130). An estimated line-of-sight angle vector is vector data storing the estimated value of the horizontal rotation angle and the estimated value of the vertical rotation angle of the line-of-sight angle; in other words, it is an estimate of the line-of-sight angle vector. A line-of-sight angle vector is vector data storing the horizontal and vertical rotation angles of the line-of-sight angle; the rotation angles range from -180 degrees to 180 degrees, so a line-of-sight angle vector is a vector such as [-35,56]. For example, let Z'_{g,q} be the estimated line-of-sight angle vector corresponding to the q-th intermediate feature v_q of a batch, and Z'_g = [Z'_{g,1}, Z'_{g,2}, ..., Z'_{g,Q}]; each estimated line-of-sight angle vector Z'_{g,q} is a vector such as [-35,56].
<Line-of-sight angle model parameter optimization unit 140>
Input: estimated line-of-sight angle vectors Z'_g, line-of-sight angle correct labels T^g_1, ..., T^g_|M|
Output: updated model parameter ^θ_g or learned model parameter θ_g
 The model parameters ^θ_g and θ_g are obtained by concatenating the parameters corresponding to the neural network constituting the shared network unit 120 and the parameters corresponding to the neural network constituting the line-of-sight angle network unit 130.
 The line-of-sight angle model parameter optimization unit 140 updates the model parameter ^θ_g using the estimated line-of-sight angle vectors Z'_g and the line-of-sight angle correct labels T^g_1, ..., T^g_|M| (S140), and performs the optimization. For example, the line-of-sight angle model parameter optimization unit 140 calculates the error between the estimated line-of-sight angle vectors Z'_g and the line-of-sight angle correct labels T^g_1, ..., T^g_|M| and updates the model parameter ^θ_g so as to minimize the error. For example, an MSE error or an MAE error can be used as the error, and a gradient descent method or the like can be used as the parameter update method.
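 A sketch of step S140 under the assumptions of the earlier GazeFovNet sketch: the MSE error between Z'_g and the correct labels drives a gradient-descent update of ^θ_g, i.e. the concatenated parameters of the shared trunk and the gaze head. The optimizer choice, learning rate, and label batch are assumptions; model and S_b come from the previous snippet.

```python
import torch
import torch.nn as nn

criterion_g = nn.MSELoss()                         # MAE would be nn.L1Loss()
theta_g = list(model.shared.parameters()) + list(model.gaze_head.parameters())
optimizer_g = torch.optim.SGD(theta_g, lr=1e-3)    # gradient descent over ^θ_g

T_g_b = torch.stack([T_g] * 16)                    # correct labels for the batch, shape (16, 2), illustrative
Z_g_est = model.gaze_head(model.shared(S_b))       # estimated line-of-sight angle vectors Z'_g
loss_g = criterion_g(Z_g_est, T_g_b)               # error between Z'_g and the correct labels
optimizer_g.zero_grad()
loss_g.backward()
optimizer_g.step()                                 # updated model parameter ^θ_g
```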
<Effective visual field network unit 150>
Input: intermediate features v for one batch, and, among the updated model parameters ^θ_p, the parameters corresponding to the neural network constituting the effective visual field network unit 150
Output: estimated effective visual field probability vector Z'_p
 The effective visual field network unit 150 is composed of an arbitrary neural network, for example, two fully connected layers.
 Prior to the conversion processing, the effective visual field network unit 150 receives, among the updated model parameters ^θ_p (output values of the effective visual field model parameter optimization unit 160), the parameters corresponding to the neural network constituting the effective visual field network unit 150.
 The effective visual field network unit 150 uses the received parameters to convert one batch of intermediate features v into an estimated effective visual field probability vector Z'_p by the function of the neural network (S150). The estimated effective visual field probability vector is a vector consisting of estimated values of the effective visual field probability (estimated effective visual field probabilities), where the effective visual field probability is the probability that the gaze direction of a learning face image lies within the effective field of view; in other words, it is the probability that the camera that captured the learning face image is within the effective field of view of the subject of the learning face image. For example, let Z'_{p,q} be the estimated effective visual field probability corresponding to the q-th intermediate feature v_q of a batch, and Z'_p = [Z'_{p,1}, Z'_{p,2}, ..., Z'_{p,Q}].
<Effective visual field model parameter optimization unit 160>
Input: estimated effective visual field probability vector Z'_p, effective visual field correct labels T^p_1, ..., T^p_|M|
Output: updated model parameter ^θ_p or learned model parameter θ_p
 The model parameters ^θ_p and θ_p are obtained by concatenating the parameters corresponding to the neural network constituting the shared network unit 120 and the parameters corresponding to the neural network constituting the effective visual field network unit 150.
 The effective visual field model parameter optimization unit 160 updates the model parameter ^θ_p using the estimated effective visual field probability vector Z'_p and the effective visual field correct labels T^p_1, ..., T^p_|M| (S160), and performs the optimization. For example, the effective visual field model parameter optimization unit 160 calculates the error between the estimated effective visual field probability vector Z'_p and the effective visual field correct labels T^p_1, ..., T^p_|M| and updates the model parameter ^θ_p so as to minimize the error. For example, a cross-entropy error can be used as the error, and a gradient descent method or the like can be used as the parameter update method.
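 The corresponding sketch for step S160, again under the earlier assumptions: the binary cross-entropy error between Z'_p and the effective visual field correct labels drives a gradient-descent update of ^θ_p (shared trunk plus effective-field head). The label batch is an assumed placeholder.

```python
import torch
import torch.nn as nn

criterion_p = nn.BCELoss()                         # binary cross-entropy; fov_head already applies a sigmoid
theta_p = list(model.shared.parameters()) + list(model.fov_head.parameters())
optimizer_p = torch.optim.SGD(theta_p, lr=1e-3)    # gradient descent over ^θ_p

T_p_b = torch.ones(16, 1)                          # effective visual field correct labels for the batch (illustrative)
Z_p_est = model.fov_head(model.shared(S_b))        # estimated effective visual field probabilities Z'_p
loss_p = criterion_p(Z_p_est, T_p_b)
optimizer_p.zero_grad()
loss_p.backward()
optimizer_p.step()                                 # updated model parameter ^θ_p
```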
 The above processes S120 to S160 are repeated until a predetermined condition is satisfied (S170). The predetermined condition is a condition for determining whether or not the parameter updates have converged; for example, it may be (i) that the number of updates has exceeded a predetermined number, or (ii) that the difference between the parameters before and after the update is smaller than a predetermined value.
 Furthermore, the above processes S120 to S170 are performed for all of the batch data (learning data). For example, it is determined whether or not there is unprocessed batch data (S180); if there is (NO in S180), the processes S120 to S170 are performed again, and if there is none (YES in S180), the processing ends.
 After the above processing has been performed for all of the batch data, the finally obtained updated model parameters ^θ_g and ^θ_p are output as the learned model parameters θ_g and θ_p.
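 Putting the two updates together, the control flow of S120 to S180 could be sketched as follows, building on the earlier snippets (criterion_g, criterion_p, optimizer_g, optimizer_p, model, and the batch tensors are the assumed objects defined there). The inner-iteration limit and the convergence tolerance are assumptions standing in for the "predetermined condition".

```python
def max_param_change(old, new):
    """Largest absolute change over all parameters, used as a convergence measure."""
    return max((n - o).abs().max().item() for o, n in zip(old, new))

batches = [(S_b, T_g_b, T_p_b)]                    # illustrative; a real loader would yield every batch of D
for S_b, T_g_b, T_p_b in batches:                  # S180: visit every batch
    for _ in range(100):                           # upper bound on the number of updates per batch
        old = [p.detach().clone() for p in model.parameters()]
        # S120-S140: gaze-angle branch
        loss_g = criterion_g(model.gaze_head(model.shared(S_b)), T_g_b)
        optimizer_g.zero_grad()
        loss_g.backward()
        optimizer_g.step()
        # S150-S160: effective-field branch
        loss_p = criterion_p(model.fov_head(model.shared(S_b)), T_p_b)
        optimizer_p.zero_grad()
        loss_p.backward()
        optimizer_p.step()
        # S170: stop when the parameter difference falls below a threshold
        if max_param_change(old, list(model.parameters())) < 1e-5:
            break

theta = model.state_dict()                         # final parameters, i.e. the learned θ_g and θ_p together
```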
 Next, the estimation device 200 will be described.
<Estimation device 200>
 The estimation device 200 receives the learned model parameters θ_g and θ_p prior to the estimation process. The estimation device 200 takes a face image S to be estimated as input, estimates the line-of-sight angle using the learned model parameter θ_g, estimates the effective visual field probability using the learned model parameter θ_p, and outputs the estimated line-of-sight angle vector Z_g and the effective visual field determination result Z_p.
 FIG. 7 shows a functional block diagram of the estimation device 200, and FIG. 8 shows its processing flow.
 The estimation device 200 includes a line-of-sight angle estimation unit 210 and an effective visual field determination unit 220.
 The details of each unit are described below.
<Line-of-sight angle estimation unit 210>
Input: face image S, model parameter θ_g
Output: estimated line-of-sight angle vector Z_g for the face image S
 The line-of-sight angle estimation unit 210 receives the model parameter θ_g prior to the estimation process.
 The line-of-sight angle estimation unit 210 estimates the line-of-sight angle from the face image S using the network architecture of the shared network unit 120 and the subsequent line-of-sight angle network unit 130 together with the model parameter θ_g (S210), and obtains the estimated value (the estimated line-of-sight angle vector Z_g). For example, when the shared network unit 120 is composed of a neural network with four convolutional layers and the line-of-sight angle network unit 130 is composed of a neural network with two fully connected layers, the line-of-sight angle estimation unit 210 is composed of a neural network with four convolutional layers and two fully connected layers corresponding to the shared network unit 120 and the line-of-sight angle network unit 130, and uses the model parameter θ_g in this neural network.
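 An inference sketch for the line-of-sight angle estimation unit 210, assuming the GazeFovNet architecture sketched earlier and a hypothetical file "theta.pt" in which the learned parameters were saved as a state dict:

```python
import torch

model = GazeFovNet()                               # shared network unit 120 + both heads
model.load_state_dict(torch.load("theta.pt"))      # learned parameters θ_g (and θ_p)
model.eval()

S = torch.rand(1, 3, 224, 224)                     # face image S to be estimated (assumed tensor layout)
with torch.no_grad():
    Z_g = model.gaze_head(model.shared(S))         # estimated line-of-sight angle vector, e.g. [[-35., 56.]]
```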
<Effective visual field determination unit 220>
Input: face image S, model parameter θp
Output: effective visual field determination result Zp for the face image S
 The effective visual field determination unit 220 receives the model parameter θp prior to the estimation processing.
 The effective visual field determination unit 220 estimates the effective visual field probability from the face image S using the network architecture of the shared network unit 120 followed by the effective visual field network unit 150 together with the model parameter θp, and determines whether the line of sight is within the effective visual field based on the estimated value of the effective visual field probability (S220). For example, the effective visual field determination unit 220 makes the determination based on the magnitude relationship between the estimated value of the effective visual field probability and a predetermined threshold. For example, when the estimated value of the effective visual field probability is equal to or greater than a predetermined threshold (for example, 0.5), the effective visual field determination unit 220 outputs a determination result indicating that it is within the effective visual field, and when the estimated value is less than the predetermined threshold, it outputs a determination result indicating that it is outside the effective visual field. However, the effective visual field determination unit 220 may output the estimated value of the effective visual field probability itself as the determination result. For example, when the shared network unit 120 is configured as an arbitrary neural network consisting of four convolutional layers and the effective visual field network unit 150 is configured as an arbitrary neural network consisting of two fully connected layers, the effective visual field determination unit 220 is configured as a neural network consisting of the four convolutional layers and the two fully connected layers corresponding to the shared network unit 120 and the effective visual field network unit 150, and the model parameter θp is used in this neural network.
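 A corresponding illustrative sketch of the effective visual field determination unit 220 is shown below, reusing the SharedNetwork class from the previous sketch. The two fully connected layers and the 0.5 threshold follow the text; the layer widths and the sigmoid used to obtain a probability-valued output are assumptions introduced here.

```python
import torch
import torch.nn as nn

class EffectiveFovNetwork(nn.Module):
    """Effective visual field network unit 150: two fully connected layers,
    ending in a sigmoid so the output can be read as a probability (assumption)."""
    def __init__(self, in_dim=128 * 4 * 4):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                  nn.Linear(256, 1))
    def forward(self, v):
        return torch.sigmoid(self.head(v))        # estimated effective visual field probability

def judge_effective_fov(prob, threshold=0.5):
    """S220: inside the effective visual field iff the probability >= threshold."""
    return prob >= threshold

# Usage sketch: `shared` is a SharedNetwork instance holding its part of θp.
# fov = EffectiveFovNetwork(); fov.load_state_dict(...)    # load θp (hypothetical state dict)
# z_p = judge_effective_fov(fov(shared(face_image)))
```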
<Effect>
 With the above configuration, highly accurate line-of-sight angle estimation and effective visual field determination that capture the characteristics of both tasks can be performed.
<Modification>
 In this embodiment, the estimation device 200 estimates both the line-of-sight angle and the effective visual field probability, but it may be configured to estimate only one of them. Even in that case, since the model parameters used for estimating the line-of-sight angle and the effective visual field are learned in a single neural network system at training time, line-of-sight angle estimation or effective visual field determination can still be performed with high accuracy, capturing the characteristics of both (see the sketch after this paragraph).
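 Continuing the illustrative sketches above (and under the same assumptions), this modification simply amounts to instantiating the shared network together with only the head that is needed, for example:

```python
# Gaze-angle-only estimation (illustrative): only the parts of the learned
# parameters belonging to the shared network and the gaze head are loaded.
shared, gaze = SharedNetwork(), GazeAngleNetwork()
# shared.load_state_dict(theta_g_shared); gaze.load_state_dict(theta_g_head)  # hypothetical names
# z_g = gaze(shared(face_image))
```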
<Effect>
 With the above configuration as well, line-of-sight angle estimation or effective visual field determination can be performed with high accuracy, capturing the characteristics of both tasks.
<Other Modifications>
 The present invention is not limited to the above embodiment and modifications. For example, the various kinds of processing described above may be executed not only in time series in the order described, but also in parallel or individually in accordance with the processing capability of the device executing the processing or as necessary. Other modifications can be made as appropriate without departing from the spirit of the present invention.
<Program and recording medium>
 The various kinds of processing described above can be implemented by loading a program for executing each step of the above methods into the storage unit 2020 of the computer shown in FIG. 9 and causing the control unit 2010, the input unit 2030, the output unit 2040, and the like to operate.
 A program describing the contents of this processing can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any type, for example, a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.
 This program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be distributed by storing it in a storage device of a server computer and transferring it from the server computer to another computer via a network.
 A computer that executes such a program, for example, first temporarily stores the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes the processing in accordance with the read program. As another form of executing the program, the computer may read the program directly from the portable recording medium and execute the processing in accordance with the program, or the computer may sequentially execute processing in accordance with a received program each time the program is transferred from the server computer to the computer. Alternatively, the above-described processing may be executed by a so-called ASP (Application Service Provider) type service that realizes the processing functions only through execution instructions and result acquisition, without transferring the program from the server computer to the computer. Note that the program in this embodiment includes information that is provided for processing by an electronic computer and that is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the processing of the computer).
 In this embodiment, the present device is configured by executing a predetermined program on a computer, but at least part of the processing contents may be realized by hardware.

Claims (8)

  1.  A learning device comprising:
     a shared network unit that converts a learning face image Sb into an intermediate feature v by a neural network function using a model parameter ^θg or ^θp;
     a line-of-sight angle network unit that converts the intermediate feature v into an estimated line-of-sight angle vector Z'g by a neural network function using the model parameter ^θg;
     a line-of-sight angle model parameter optimization unit that updates the model parameter ^θg using the estimated line-of-sight angle vector Z'g and a line-of-sight angle correct label for the learning face image Sb;
     an effective visual field network unit that converts the intermediate feature v into an estimated effective visual field probability vector Z'p by a neural network function using the model parameter ^θp; and
     an effective visual field model parameter optimization unit that updates the model parameter ^θp using the estimated effective visual field probability vector Z'p and an effective visual field correct label for the learning face image Sb,
     wherein a learned model parameter θg corresponding to the model parameter ^θg and a learned model parameter θp corresponding to the model parameter ^θp are obtained.
  2.  An estimation device that uses the model parameter θg learned by the learning device according to claim 1, the estimation device comprising:
     a line-of-sight angle estimation unit that estimates a line-of-sight angle from a face image S to be estimated, using the network architecture of the shared network unit followed by the line-of-sight angle network unit and the model parameter θg.
  3.  An estimation device that uses the model parameter θp learned by the learning device according to claim 1, the estimation device comprising:
     an effective visual field determination unit that estimates an effective visual field probability from a face image S to be estimated, using the network architecture of the shared network unit followed by the effective visual field network unit and the model parameter θp, and determines whether it is within the effective visual field based on an estimated value of the effective visual field probability.
  4.  An estimation device that uses the model parameter θg and the model parameter θp learned by the learning device according to claim 1, the estimation device comprising:
     a line-of-sight angle estimation unit that estimates a line-of-sight angle from a face image S to be estimated, using the network architecture of the shared network unit followed by the line-of-sight angle network unit and the model parameter θg; and
     an effective visual field determination unit that estimates an effective visual field probability from the face image S, using the network architecture of the shared network unit followed by the effective visual field network unit and the model parameter θp, and determines whether it is within the effective visual field based on an estimated value of the effective visual field probability.
  5.  A learning method comprising:
     a shared network step of converting a learning face image Sb into an intermediate feature v by a neural network function using a model parameter ^θg or ^θp;
     a line-of-sight angle network step of converting the intermediate feature v into an estimated line-of-sight angle vector Z'g by a neural network function using the model parameter ^θg;
     a line-of-sight angle model parameter optimization step of updating the model parameter ^θg using the estimated line-of-sight angle vector Z'g and a line-of-sight angle correct label for the learning face image Sb;
     an effective visual field network step of converting the intermediate feature v into an estimated effective visual field probability vector Z'p by a neural network function using the model parameter ^θp; and
     an effective visual field model parameter optimization step of updating the model parameter ^θp using the estimated effective visual field probability vector Z'p and an effective visual field correct label for the learning face image Sb,
     wherein a learned model parameter θg corresponding to the model parameter ^θg and a learned model parameter θp corresponding to the model parameter ^θp are obtained.
  6.  An estimation method that uses the model parameter θg learned by the learning method according to claim 5, the estimation method comprising:
     a line-of-sight angle estimation step of estimating a line-of-sight angle from a face image S to be estimated, using the network architecture of the shared network step followed by the line-of-sight angle network step and the model parameter θg.
  7.  An estimation method that uses the model parameter θp learned by the learning method according to claim 5, the estimation method comprising:
     an effective visual field determination step of estimating an effective visual field probability from a face image S to be estimated, using the network architecture of the shared network step followed by the effective visual field network step and the model parameter θp, and determining whether it is within the effective visual field based on an estimated value of the effective visual field probability.
  8.  A program for causing a computer to function as the learning device according to claim 1 or the estimation device according to any one of claims 2 to 4.
PCT/JP2021/022704 2021-06-15 2021-06-15 Training device, estimation device, methods therefor, and program WO2022264269A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2021/022704 WO2022264269A1 (en) 2021-06-15 2021-06-15 Training device, estimation device, methods therefor, and program
JP2023528808A JPWO2022264269A1 (en) 2021-06-15 2021-06-15

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/022704 WO2022264269A1 (en) 2021-06-15 2021-06-15 Training device, estimation device, methods therefor, and program

Publications (1)

Publication Number Publication Date
WO2022264269A1 true WO2022264269A1 (en) 2022-12-22

Family

ID=84526357

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/022704 WO2022264269A1 (en) 2021-06-15 2021-06-15 Training device, estimation device, methods therefor, and program

Country Status (2)

Country Link
JP (1) JPWO2022264269A1 (en)
WO (1) WO2022264269A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019159518A (en) * 2018-03-09 2019-09-19 株式会社国際電気通信基礎技術研究所 Visual state detection apparatus, visual state detection method, and visual state detection program
US20190377409A1 (en) * 2018-06-11 2019-12-12 Fotonation Limited Neural network image processing apparatus

Also Published As

Publication number Publication date
JPWO2022264269A1 (en) 2022-12-22

Similar Documents

Publication Publication Date Title
JP6807471B2 (en) Semantic segmentation model training methods and equipment, electronics, and storage media
CN111950453B (en) Random shape text recognition method based on selective attention mechanism
CN105447498B (en) Client device, system and server system configured with neural network
JP6345276B2 (en) Face authentication method and system
JP6943291B2 (en) Learning device, learning method, and program
EP3745309A1 (en) Training a generative adversarial network
US20210064989A1 (en) Continual learning of artificial intelligence systems based on bi-level optimization
CN113159283A (en) Model training method based on federal transfer learning and computing node
JP7086878B2 (en) Learning device, learning method, program and recognition device
CN111985458A (en) Method for detecting multiple targets, electronic equipment and storage medium
JP2019152964A (en) Learning method and learning device
JP7472471B2 (en) Estimation system, estimation device, and estimation method
JP6927410B2 (en) Image classification system, image classification method and image classification program
WO2021090777A1 (en) Behavior recognition learning device, behavior recognition learning method, behavior recognition device, and program
JP2020177582A (en) Leaning device, learning method, program, and recognition device
WO2022264269A1 (en) Training device, estimation device, methods therefor, and program
WO2021100184A1 (en) Learning device, estimation device, learning method, and learning program
JP2022509564A (en) Neural network active training system and image processing system
US20220351533A1 (en) Methods and systems for the automated quality assurance of annotated images
CN114119970B (en) Target tracking method and device
WO2022264268A1 (en) Learning device, estimation device, method for these, and program
CN112651467B (en) Training method and system and prediction method and system for convolutional neural network
US20200175376A1 (en) Learning Method, Learning Device, Program, and Recording Medium
KR20210074205A (en) System and method for image classification based positioning
WO2020040007A1 (en) Learning device, learning method, and learning program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21945948

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023528808

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE