WO2022264269A1 - Training device, estimation device, methods therefor, and program - Google Patents

Training device, estimation device, methods therefor, and program Download PDF

Info

Publication number
WO2022264269A1
Authority
WO
WIPO (PCT)
Prior art keywords
model parameter
line
visual field
estimated
sight angle
Prior art date
Application number
PCT/JP2021/022704
Other languages
French (fr)
Japanese (ja)
Inventor
瑛彦 高島
亮 増村
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社 filed Critical 日本電信電話株式会社
Priority to PCT/JP2021/022704 priority Critical patent/WO2022264269A1/en
Priority to JP2023528808A priority patent/JPWO2022264269A1/ja
Publication of WO2022264269A1 publication Critical patent/WO2022264269A1/en

Links

Images

Classifications

    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B3/00Apparatus for testing the eyes; Instruments for examining the eyes
    • A61B3/10Objective types, i.e. instruments for examining the eyes independent of the patients' perceptions or reactions
    • A61B3/113Objective types, i.e. instruments for examining the eyes independent of the patients' perceptions or reactions for determining or recording eye movement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis

Definitions

  • the present invention relates to an estimation technique for estimating a line-of-sight angle and an effective field of view from a face image, and a technique for learning parameters used for estimation.
  • the line-of-sight angle is the rotation angle that represents the orientation of the pupils of the left and right eyeballs. If the line-of-sight angle can be estimated from an image, the state of eye movement, such as what the person is gazing at or whether the person is looking around, can be understood, and the person's state and inner condition can be analyzed. Line-of-sight angle estimation is generally performed using a neural network. In the prior art, only the line-of-sight angle is learned by regression from an eye-region image using a neural network model. The correct label of the line-of-sight angle is vector data consisting of continuous values of the rotation angles of the horizontal and vertical components of the eyeball.
  • in learning the parameters used for estimating the line-of-sight angle with a neural network, convolution layers and pooling layers, which are widely used in image recognition, are used to extract image features, and the subsequent fully connected layers perform a regression onto the line-of-sight angle vector data, so that the line-of-sight angle can be estimated.
  • Non-Patent Document 1 describes a method of estimating the line-of-sight angle using a neural network.
  • the effective field of view refers to the range of eyeball angles from -15 to 15 degrees horizontally and from -12 to 8 degrees vertically (see FIG. 1). Humans can clearly recognize objects within this range; conversely, the ability to recognize objects outside the effective field of view is extremely reduced. For example, taking a robot as the object in a use case, the effective field of view can be used as a scale to determine whether a camera-equipped robot is recognized by the person captured by the camera.
  • FIG. 1 is a diagram showing the human visual field and visual field characteristics.
  • FIG. 2 is a diagram showing the inside of the effective field of view (the range in which the presence of an object can be recognized) and the outside of the effective field of view (the range in which the presence of an object cannot be recognized).
  • An object of the present invention is to provide an estimating device, a learning device, a method thereof, and a program capable of performing high-accuracy line-of-sight angle estimation and effective field-of-view determination that capture the characteristics of both line-of-sight angle and effective field-of-view determination.
  • a learning device includes: a shared network unit that converts a learning face image S_b into an intermediate feature v by a neural network function using a model parameter ^θ_g or ^θ_p; a line-of-sight angle network unit that converts the intermediate feature v into an estimated line-of-sight angle vector Z'_g by a neural network function using the model parameter ^θ_g; a line-of-sight angle model parameter optimization unit that updates the model parameter ^θ_g using the estimated line-of-sight angle vector Z'_g and a line-of-sight angle correct label for the learning face image S_b; an effective visual field network unit that converts the intermediate feature v into an estimated effective visual field probability vector Z'_p by a neural network function using the model parameter ^θ_p; and an effective visual field model parameter optimization unit that updates the model parameter ^θ_p using the estimated effective visual field probability vector Z'_p and an effective visual field correct label for the learning face image S_b, and the learning device acquires a learned model parameter θ_g corresponding to ^θ_g and a learned model parameter θ_p corresponding to ^θ_p.
  • FIG. 3 is a diagram for explaining the estimated angle error.
  • FIG. 5 is a functional block diagram of the learning device according to the first embodiment, and FIG. 6 is a diagram showing an example of the processing flow of the learning device according to the first embodiment.
  • a learning device simultaneously learns the model parameters used for estimating the line-of-sight angle and the model parameters used for determining the effective visual field.
  • the learning data includes a learning face image, a correct label Y g for the line-of-sight angle, and a correct label Y p for the effective visual field.
  • the configuration of the neural network is a shared network section, and a line-of-sight angle network section and an effective visual field network section that branch off after that.
  • the shared network unit receives the learning face images as input and can be expected to learn features of both the line-of-sight angle and the effective field of view; the subsequent line-of-sight angle network unit regresses the line-of-sight angle, the effective visual field network unit calculates the probability of being within the effective field of view, and each correct label is used to minimize the error of the corresponding estimate and update the model parameters.
  • at the time of estimation, the line-of-sight angle estimation unit and the effective visual field determination unit are used, respectively.
  • the line-of-sight angle estimation unit estimates the line-of-sight angle using the network architecture of the shared network unit and the subsequent line-of-sight angle network unit, and the model parameters learned by the learning device.
  • the effective field of view determination section uses the network architecture of the shared network section and the subsequent effective field of view network section, and the model parameters learned by the learning device to determine the effective field of view.
  • FIG. 4 shows a configuration example of an estimation system according to the first embodiment.
  • the estimation system includes a learning device 100 and an estimation device 200.
  • the learning device 100 and the estimation device 200 are special devices configured by loading a special program into a known or dedicated computer having, for example, a central processing unit (CPU) and a main storage device (RAM), and they execute each process under the control of, for example, the central processing unit.
  • the data input to the learning device 100 and the estimation device 200 and the data obtained in each process are stored, for example, in the main storage device, and the data stored in the main storage device are read out to the central processing unit as needed and used for other processing.
  • At least a part of each processing unit of learning device 100 and estimation device 200 may be configured by hardware such as an integrated circuit.
  • Each storage unit included in the learning device 100 and the estimating device 200 can be configured by, for example, a main storage device such as RAM (Random Access Memory), or middleware such as a relational database or key-value store.
  • each storage unit does not necessarily have to be provided inside the learning device 100 and the estimation device 200; it may be configured by an auxiliary storage device such as a hard disk, an optical disk, or a semiconductor memory element such as a flash memory, and may be provided outside the learning device 100 and the estimation device 200.
  • M is the data size of the learning data D.
  • the learning face image may be, for example, an image obtained by cutting out only the face or an image obtained by cutting out only the eye region.
  • the training face image has a resolution of 224x224 pixels and three RGB channels.
  • FIG. 5 shows a functional block diagram of the learning device 100
  • FIG. 6 shows its processing flow.
  • the learning device 100 includes a shared network unit 120 , a line-of-sight angle network unit 130 , a line-of-sight angle model parameter optimization unit 140 , an effective visual field network unit 150 , and an effective visual field model parameter optimization unit 160 .
  • the shared network unit 120 receives the learning face image, learns the features of both the line-of-sight angle and the effective field of view, and acquires intermediate features using an arbitrary neural network that outputs intermediate features.
  • the line-of-sight angle network unit 130 acquires the line-of-sight angle estimate using an arbitrary neural network that inputs intermediate features and outputs the line-of-sight angle estimate.
  • the line-of-sight angle model parameter optimization unit 140 receives the line-of-sight angle estimates and the line-of-sight angle correct labels, calculates the error of the line-of-sight angle estimates, and updates the model parameter ^θ_g based on the error.
  • the effective field of view network unit 150 acquires an estimated effective field of view probability using an arbitrary neural network that receives intermediate features as input and outputs a probability of being within the effective field of view (estimated effective field of view probability).
  • the effective visual field model parameter optimization unit 160 receives the estimated effective visual field probabilities and the effective visual field correct labels as inputs, calculates the error of the estimated effective visual field probabilities, and updates the model parameter ^θ_p based on the error.
  • the above process shows the learning procedure for one batch (a block of data partially selected from the learning data), and by repeating this, all data can be learned any number of times.
  • the shared network unit 120 is composed of an arbitrary neural network, and is composed of, for example, four convolutional layers.
  • prior to the conversion processing, the shared network unit 120 receives, among the updated model parameters ^θ_g (output values of the line-of-sight angle model parameter optimization unit 140) or ^θ_p (output values of the effective visual field model parameter optimization unit 160), the parameters corresponding to the neural network constituting the shared network unit 120.
  • the shared network unit 120 uses, among the updated model parameters ^θ_g or ^θ_p, the parameters corresponding to the neural network constituting the shared network unit 120 to convert the learning face images S_b into intermediate features v by the function of the neural network (S120).
  • the learning face images S_b are obtained by dividing the learning face images S_1, ..., S_|M| into batches.
  • the line-of-sight angle network unit 130 is composed of an arbitrary neural network, such as two fully connected layers.
  • prior to the conversion processing, the line-of-sight angle network unit 130 receives, among the updated model parameters ^θ_g (output values of the line-of-sight angle model parameter optimization unit 140), the parameters corresponding to the neural network constituting the line-of-sight angle network unit 130.
  • the line-of-sight angle network unit 130 uses the received parameters to transform the intermediate feature v into an estimated line-of-sight angle vector Z' g by an arbitrary neural network function (S130).
  • the estimated line-of-sight angle vector is vector data that stores an estimated value of the horizontal rotation angle and an estimated value of the vertical rotation angle of the line-of-sight angle.
  • the estimated line-of-sight angle vector is an estimate of the line-of-sight angle vector.
  • the line-of-sight angle vector is vector data that stores the horizontal rotation angle and vertical rotation angle of the line-of-sight angle.
  • the rotation angle range is from -180 degrees to 180 degrees.
  • the line-of-sight angle vector is a vector such as [-35,56].
  • let Z'_{g,q} be the estimated line-of-sight angle vector corresponding to the q-th intermediate feature v_q of a batch, and Z'_g = [Z'_{g,1}, Z'_{g,2}, ..., Z'_{g,Q}]; each estimated line-of-sight angle vector Z'_{g,q} is a vector such as [-35,56].
  • the line-of-sight angle model parameter optimization unit 140 receives as input the estimated line-of-sight angle vectors Z'_g and the line-of-sight angle correct labels T^g_1, ..., T^g_|M|, and outputs the updated model parameter ^θ_g or the learned model parameter θ_g.
  • the model parameters ^θ_g and θ_g are obtained by concatenating the parameters corresponding to the neural network constituting the shared network unit 120 and the parameters corresponding to the neural network constituting the line-of-sight angle network unit 130.
  • the line-of-sight angle model parameter optimization unit 140 updates the model parameter ^θ_g using the estimated line-of-sight angle vectors Z'_g and the line-of-sight angle correct labels T^g_1, ..., T^g_|M| (S140); for example, it calculates the error between the estimated line-of-sight angle vectors Z'_g and the correct labels and updates ^θ_g so as to minimize that error.
  • an MSE error or an MAE error can be used as the error, and a gradient descent method or the like can be used as a parameter update method.
  • prior to the conversion processing, the effective visual field network unit 150 receives, among the updated model parameters ^θ_p (output values of the effective visual field model parameter optimization unit 160), the parameters corresponding to the neural network constituting the effective visual field network unit 150.
  • the effective field of view network unit 150 uses the received parameters to convert one batch of intermediate features v into an estimated effective field of view probability vector Z′ p by an arbitrary neural network function (S150).
  • the estimated effective visual field probability vector is a vector consisting of estimated values of the effective visual field probability (estimated effective visual field probabilities), where the effective visual field probability is the probability that the gaze direction of a learning face image lies within the effective field of view.
  • the effective field of view probability is the probability of whether or not the camera that captures the learning face image exists within the effective field of view of the subject of the learning face image.
  • let Z'_{p,q} be the estimated effective visual field probability corresponding to the q-th intermediate feature v_q of a batch, and Z'_p = [Z'_{p,1}, Z'_{p,2}, ..., Z'_{p,Q}].
  • the model parameters ^θ_p and θ_p are obtained by concatenating the parameters corresponding to the neural network constituting the shared network unit 120 and the parameters corresponding to the neural network constituting the effective visual field network unit 150.
  • the effective visual field model parameter optimization unit 160 updates the model parameter ^θ_p using the estimated effective visual field probability vector Z'_p and the effective visual field correct labels T^p_1, ..., T^p_|M| (S160); for example, it calculates the error between the estimated effective visual field probability vector Z'_p and the correct labels and updates ^θ_p so as to minimize that error.
  • the above processes S120 to S160 are repeated until a predetermined condition is satisfied (S170).
  • the predetermined condition is a condition for determining whether or not the parameter updates have converged; for example, it may be (i) that the number of updates has exceeded a predetermined number, or (ii) that the difference between the parameters before and after the update is smaller than a predetermined value.
  • the above-described processes S120 to S170 are performed for all of the batch data (learning data); for example, it is determined whether or not there is unprocessed batch data (S180), and if there is (NO in S180), the processes S120 to S170 are performed again, and if there is none (YES in S180), the processing ends.
  • the estimation device 200 receives the learned model parameters θ_g and θ_p prior to the estimation process.
  • the estimation device 200 receives a face image S to be estimated as input, estimates the line-of-sight angle using the learned model parameter θ_g, estimates the effective visual field probability using the learned model parameter θ_p, and outputs the estimated line-of-sight angle vector Z_g and the effective visual field determination result Z_p.
  • FIG. 7 shows a functional block diagram of the estimation device 200
  • FIG. 8 shows its processing flow.
  • the estimation device 200 includes a line-of-sight angle estimation unit 210 and an effective visual field determination unit 220.
  • the line-of-sight angle estimation unit 210 receives as input the face image S and the model parameter θ_g, outputs the estimated line-of-sight angle vector Z_g for the face image S, and receives the model parameter θ_g prior to the estimation process.
  • the line-of-sight angle estimation unit 210 estimates the line-of-sight angle from the face image S using the network architecture of the shared network unit 120 and the subsequent line-of-sight angle network unit 130 together with the model parameter θ_g (S210), and obtains the estimated value (the estimated line-of-sight angle vector Z_g).
  • for example, the line-of-sight angle estimation unit 210 is configured as a neural network consisting of four convolution layers and two fully connected layers corresponding to the shared network unit 120 and the line-of-sight angle network unit 130, and uses the model parameter θ_g in this neural network.
  • the effective visual field determination unit 220 receives the model parameter θ_p prior to the estimation process.
  • the effective visual field determination unit 220 estimates the effective visual field probability from the face image S using the network architecture of the shared network unit 120 and the subsequent effective visual field network unit 150 together with the model parameter θ_p, and determines whether or not the gaze is within the effective field of view based on the estimated value of the effective visual field probability (S220); for example, the determination is based on the magnitude relationship between the estimated value and a predetermined threshold value.
  • for example, the effective visual field determination unit 220 outputs a determination result indicating that the gaze is within the effective field of view when the estimated value of the effective visual field probability is equal to or greater than a predetermined threshold value (for example, 0.5), and outputs a determination result indicating that the gaze is outside the effective field of view when the estimated value is less than the threshold value.
  • the effective visual field determination section 220 may output the estimated value of the effective visual field probability itself as the determination result.
  • for example, the effective visual field determination unit 220 is configured as a neural network consisting of four convolutional layers and two fully connected layers corresponding to the shared network unit 120 and the effective visual field network unit 150, and uses the model parameter θ_p in this neural network.
  • the estimation device 200 estimates both the line-of-sight angle and the effective visual field probability, but it may be configured to estimate only one of them. Even in that case, since the model parameters used for estimating the line-of-sight angle and the effective field of view are learned in a single neural network system during learning, line-of-sight angle estimation or effective field of view determination can be performed with high accuracy by capturing the features of both.
  • the present invention is not limited to the above embodiments and modifications.
  • the various types of processing described above may not only be executed in chronological order according to the description, but may also be executed in parallel or individually according to the processing capacity of the device that executes the processing or as necessary.
  • appropriate modifications are possible without departing from the gist of the present invention.
  • a program that describes this process can be recorded on a computer-readable recording medium.
  • Any computer-readable recording medium may be used, for example, a magnetic recording device, an optical disk, a magneto-optical recording medium, a semiconductor memory, or the like.
  • distribution of this program is carried out, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded.
  • the program may be distributed by storing the program in the storage device of the server computer and transferring the program from the server computer to other computers via the network.
  • a computer that executes such a program, for example, first stores the program recorded on a portable recording medium, or the program transferred from a server computer, in its own storage device; when executing the processing, it reads the program stored in its own recording medium and executes processing according to the read program. As another execution form, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may sequentially execute processing according to the received program each time the program is transferred from the server computer. The above-described processing may also be executed by a so-called ASP (Application Service Provider) type service that realizes the processing functions only through execution instructions and result acquisition, without transferring the program from the server computer to the computer. The program in this embodiment includes information that is used for processing by a computer and that conforms to a program (such as data that is not a direct instruction to the computer but has the property of prescribing the processing of the computer).
  • the device is configured by executing a predetermined program on a computer, but at least part of these processing contents may be implemented by hardware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Ophthalmology & Optometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Image Analysis (AREA)

Abstract

Provided are an estimation device and related apparatus capable of estimating a line-of-sight angle and determining an effective visual field with high accuracy by capturing the features of both the line-of-sight angle and the effective visual field determination. This training device converts a face image for training into an intermediate feature by a neural network function using a model parameter ^θ_g or ^θ_p, converts the intermediate feature into an estimated line-of-sight angle vector by a neural network function using the model parameter ^θ_g, updates the model parameter ^θ_g using the estimated line-of-sight angle vector and a line-of-sight angle correct label for the face image for training, converts the intermediate feature v into an estimated effective visual field probability vector by a neural network function using the model parameter ^θ_p, and updates the model parameter ^θ_p using the estimated effective visual field probability vector and an effective visual field correct label for the face image for training.

Description

LEARNING APPARATUS, ESTIMATION APPARATUS, METHODS THEREFOR, AND PROGRAM
 The present invention relates to an estimation technique for estimating a line-of-sight angle and an effective field of view from a face image, and to a technique for learning the parameters used for the estimation.
 The line-of-sight angle is the rotation angle that represents the orientation of the pupils of the left and right eyeballs. If the line-of-sight angle can be estimated from an image, the state of eye movement, such as what the person is gazing at or whether the person is looking around, can be understood, and the person's state and inner condition can be analyzed. Line-of-sight angle estimation is generally performed using a neural network. In the prior art, only the line-of-sight angle is learned by regression from an eye-region image using a neural network model. The correct label of the line-of-sight angle is vector data consisting of continuous values of the rotation angles of the horizontal and vertical components of the eyeball. In learning the parameters used for line-of-sight angle estimation with a neural network, convolution layers and pooling layers, which are widely used in image recognition, are used to extract image features, and the subsequent fully connected layers perform a regression onto the line-of-sight angle vector data, so that the line-of-sight angle can be estimated.
 Non-Patent Document 1 describes a method of estimating the line-of-sight angle using a neural network.
 On the other hand, there is a method that uses the effective field of view as a measure for determining whether or not a person recognizes an object. The effective field of view refers to the range of eyeball angles from -15 to 15 degrees horizontally and from -12 to 8 degrees vertically (see FIG. 1). Humans can clearly recognize objects within this range; conversely, the ability to recognize objects outside the effective field of view is extremely reduced. For example, taking a robot as the object in a use case, the effective field of view can be used as a scale to determine whether a camera-equipped robot is recognized by the person captured by the camera. Specifically, even when the person faces the robot, it can be determined that the person does not recognize the robot if the line-of-sight angle is outside the effective field of view, and that the person recognizes the robot if it is within the effective field of view. FIG. 1 is a diagram showing the human visual field and visual field characteristics. FIG. 2 is a diagram showing the inside of the effective field of view (the range in which the presence of an object can be recognized) and the outside of the effective field of view (the range in which the presence of an object cannot be recognized).
 In conventional learning of the model parameters used for line-of-sight angle estimation, only the line-of-sight angle is regressed with a neural network, and whether the line-of-sight angle is inside or outside the effective field of view is determined by a rule applied to the estimated line-of-sight angle. In addition, the conventional technique only minimizes the error of the line-of-sight angle, and the learning does not take into account whether the gaze is inside or outside the effective field of view. As for the tendency of the line-of-sight angle estimation error, the larger the absolute value of the correct angle, the larger the error, and the error also tends to be large near the effective field of view boundaries, i.e., near horizontal angles of -15 and 15 degrees (the portion enclosed by the broken lines in FIG. 3) and near vertical angles of -12 and 8 degrees. For this reason, the method of determining the effective field of view by a rule from the estimated line-of-sight angle has the problem that the determination accuracy of the effective field of view decreases near the effective field of view boundary.
 An object of the present invention is to provide an estimation device, a learning device, methods therefor, and a program capable of performing high-accuracy line-of-sight angle estimation and effective field of view determination that capture the features of both the line-of-sight angle and the effective field of view determination.
 To solve the above problem, according to one aspect of the present invention, a learning device includes: a shared network unit that converts a learning face image S_b into an intermediate feature v by a neural network function using a model parameter ^θ_g or ^θ_p; a line-of-sight angle network unit that converts the intermediate feature v into an estimated line-of-sight angle vector Z'_g by a neural network function using the model parameter ^θ_g; a line-of-sight angle model parameter optimization unit that updates the model parameter ^θ_g using the estimated line-of-sight angle vector Z'_g and a line-of-sight angle correct label for the learning face image S_b; an effective visual field network unit that converts the intermediate feature v into an estimated effective visual field probability vector Z'_p by a neural network function using the model parameter ^θ_p; and an effective visual field model parameter optimization unit that updates the model parameter ^θ_p using the estimated effective visual field probability vector Z'_p and an effective visual field correct label for the learning face image S_b, and the learning device acquires a learned model parameter θ_g corresponding to the model parameter ^θ_g and a learned model parameter θ_p corresponding to the model parameter ^θ_p.
 According to the present invention, highly accurate line-of-sight angle estimation and effective field of view determination that capture the features of both the line-of-sight angle and the effective field of view determination can be performed.
 FIG. 1 is a diagram showing the human visual field and visual field characteristics. FIG. 2 is a diagram showing the inside and the outside of the effective field of view. FIG. 3 is a diagram for explaining the estimated angle error. FIG. 4 is a diagram showing a configuration example of the estimation system according to the first embodiment. FIG. 5 is a functional block diagram of the learning device according to the first embodiment. FIG. 6 is a diagram showing an example of the processing flow of the learning device according to the first embodiment. FIG. 7 is a functional block diagram of the estimation device according to the first embodiment. FIG. 8 is a diagram showing an example of the processing flow of the estimation device according to the first embodiment. FIG. 9 is a diagram showing a configuration example of a computer to which the present method is applied.
 Embodiments of the present invention are described below. In the drawings used for the following description, components having the same functions and steps performing the same processing are given the same reference numerals, and redundant description is omitted. In the following description, the symbol "^", which should originally be written directly above the character it modifies, is written adjacent to that character (as in ^θ) because of the limitations of text notation; in the formulas, it is written in its original position. Unless otherwise specified, processing performed for each element of a vector or matrix is applied to all elements of that vector or matrix.
<Points of the first embodiment>
 In this embodiment, the model parameters used for estimating the line-of-sight angle and the model parameters used for determining the effective field of view are learned simultaneously by a learning device. The learning data consist of learning face images, the corresponding line-of-sight angle correct labels Y_g, and the effective visual field correct labels Y_p. The neural network consists of a shared network unit followed by a line-of-sight angle network unit and an effective visual field network unit that branch off after it. The shared network unit takes the learning face images as input and can be expected to learn features of both the line-of-sight angle and the effective field of view; the subsequent line-of-sight angle network unit regresses the line-of-sight angle, the effective visual field network unit calculates the probability of being within the effective field of view, and each correct label is used to minimize the error of the corresponding estimate and update the model parameters. At estimation time, a line-of-sight angle estimation unit and an effective visual field determination unit are used, respectively. The line-of-sight angle estimation unit estimates the line-of-sight angle using the network architecture of the shared network unit and the subsequent line-of-sight angle network unit together with the model parameters learned by the learning device. Similarly, the effective visual field determination unit determines the effective field of view using the network architecture of the shared network unit and the subsequent effective visual field network unit together with the model parameters learned by the learning device.
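 The branching layout described above can be made concrete with a short sketch. The following is a minimal PyTorch sketch of a shared trunk followed by a gaze-angle head and an effective-field-of-view head; the use of PyTorch, the channel counts, and the hidden sizes are illustrative assumptions, since the embodiment only specifies "for example, four convolutional layers" and "for example, two fully connected layers".

```python
import torch
import torch.nn as nn

class GazeFovNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared network unit 120: e.g. four convolutional layers producing the
        # intermediate feature v from a 3x224x224 face image.
        self.shared = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),                          # 128 * 14 * 14 = 25088 features
        )
        # Line-of-sight angle network unit 130: e.g. two fully connected layers
        # regressing the horizontal/vertical rotation angles (Z'_g).
        self.gaze_head = nn.Sequential(
            nn.Linear(128 * 14 * 14, 256), nn.ReLU(),
            nn.Linear(256, 2),
        )
        # Effective visual field network unit 150: e.g. two fully connected layers
        # producing the probability of being within the effective field of view (Z'_p).
        self.fov_head = nn.Sequential(
            nn.Linear(128 * 14 * 14, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, s):
        v = self.shared(s)                         # intermediate feature v
        return self.gaze_head(v), self.fov_head(v)
```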
<First embodiment>
 FIG. 4 shows a configuration example of the estimation system according to the first embodiment.
 The estimation system includes a learning device 100 and an estimation device 200.
 The learning device 100 and the estimation device 200 are special devices configured by loading a special program into a known or dedicated computer having, for example, a central processing unit (CPU) and a main storage device (RAM). The learning device 100 and the estimation device 200 execute each process under the control of, for example, the central processing unit. The data input to the learning device 100 and the estimation device 200 and the data obtained in each process are stored, for example, in the main storage device, and the data stored in the main storage device are read out to the central processing unit as needed and used for other processing. At least a part of each processing unit of the learning device 100 and the estimation device 200 may be configured by hardware such as an integrated circuit. Each storage unit included in the learning device 100 and the estimation device 200 can be configured by, for example, a main storage device such as a RAM, or by middleware such as a relational database or a key-value store. However, each storage unit does not necessarily have to be provided inside the learning device 100 and the estimation device 200; it may be configured by an auxiliary storage device such as a hard disk, an optical disk, or a semiconductor memory element such as a flash memory, and may be provided outside the learning device 100 and the estimation device 200.
 First, the learning device 100 will be described.
<Learning device 100>
 The learning device 100 takes learning data D = (S_1, T^g_1, T^p_1), ..., (S_|M|, T^g_|M|, T^p_|M|) as input, learns the model parameters θ_g and θ_p using the learning data D, and outputs the learned model parameters θ_g and θ_p. Here, M is the data size of the learning data D. S_m (m = 1, ..., M) is a learning face image; it may be, for example, either an image in which only the face is cut out or an image in which only the eye region is cut out. For example, a learning face image has a resolution of 224x224 pixels and three RGB channels. T^g_m (m = 1, ..., M) is the correct label of the line-of-sight angle (hereinafter also referred to as the "line-of-sight angle correct label"); it is, for example, vector data storing the horizontal and vertical rotation angles of the line-of-sight angle, and takes a data format such as [-35,56]. T^p_m (m = 1, ..., M) is the correct label of the effective field of view (hereinafter also referred to as the "effective visual field correct label"); for example, it is [1] when the gaze is within the effective field of view and [0] when it is outside the effective field of view.
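 As an illustration of the data formats just described, one training sample (S_m, T^g_m, T^p_m) could be encoded as follows; the tensor layout and the concrete label values are assumed examples, not part of the embodiment.

```python
import torch

S_m = torch.rand(3, 224, 224)        # learning face image: RGB, 224x224 (assumed tensor layout)
T_g = torch.tensor([-35.0, 56.0])    # line-of-sight angle correct label: [horizontal, vertical] in degrees
T_p = torch.tensor([1.0])            # effective visual field correct label: [1] inside, [0] outside
sample = (S_m, T_g, T_p)             # one element of the learning data D
```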
 FIG. 5 shows a functional block diagram of the learning device 100, and FIG. 6 shows its processing flow.
 The learning device 100 includes a shared network unit 120, a line-of-sight angle network unit 130, a line-of-sight angle model parameter optimization unit 140, an effective visual field network unit 150, and an effective visual field model parameter optimization unit 160.
 An overview of each unit is given below.
 The shared network unit 120 takes the learning face images as input and obtains intermediate features using an arbitrary neural network that learns features of both the line-of-sight angle and the effective field of view and outputs intermediate features.
 The line-of-sight angle network unit 130 obtains estimates of the line-of-sight angle using an arbitrary neural network that takes intermediate features as input and outputs line-of-sight angle estimates.
 The line-of-sight angle model parameter optimization unit 140 takes the line-of-sight angle estimates and the line-of-sight angle correct labels as input, calculates the error of the line-of-sight angle estimates, and updates the model parameter ^θ_g based on the error.
 The effective visual field network unit 150 obtains estimated effective visual field probabilities using an arbitrary neural network that takes intermediate features as input and outputs the probability of being within the effective field of view (the estimated effective visual field probability).
 The effective visual field model parameter optimization unit 160 takes the estimated effective visual field probabilities and the effective visual field correct labels as input, calculates the error of the estimated effective visual field probabilities, and updates the model parameter ^θ_p based on the error.
 The above processing describes the learning procedure for one batch (a block of data partially selected from the learning data); by repeating it, all of the data can be learned any number of times.
 The details of each unit are described below.
<Shared network unit 120>
Input: learning face images S_1, ..., S_|M|, and, among the updated model parameters ^θ_g or ^θ_p, the parameters corresponding to the neural network constituting the shared network unit 120
Output: intermediate features v for one batch
 The shared network unit 120 is composed of an arbitrary neural network, for example, four convolutional layers.
 Prior to the conversion processing, the shared network unit 120 receives, among the updated model parameters ^θ_g (output values of the line-of-sight angle model parameter optimization unit 140) or ^θ_p (output values of the effective visual field model parameter optimization unit 160), the parameters corresponding to the neural network constituting the shared network unit 120.
 The shared network unit 120 uses those parameters to convert the learning face images S_b into intermediate features v by the function of the neural network (S120). The learning face images S_b are obtained by dividing the learning face images S_1, ..., S_|M| into batches; one batch of learning face images S_b consists of, for example, 16 images. For example, when a batch of learning face images S_b consists of Q images, the intermediate feature corresponding to the q-th image is denoted by v_q, and v = [v_1, v_2, ..., v_Q], where q = 1, 2, ..., Q.
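 Continuing the hypothetical GazeFovNet sketch given earlier, step S120 for one batch of Q = 16 images could look as follows; the batch size and shapes are the illustrative values from the text.

```python
import torch

model = GazeFovNet()                  # architecture from the earlier sketch (assumption)
S_b = torch.rand(16, 3, 224, 224)     # one batch S_b of learning face images (Q = 16)
v = model.shared(S_b)                 # intermediate features v = [v_1, ..., v_Q], shape (16, 25088)
```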
<Line-of-sight angle network unit 130>
Input: intermediate features v for one batch, and, among the updated model parameters ^θ_g, the parameters corresponding to the neural network constituting the line-of-sight angle network unit 130
Output: estimated line-of-sight angle vectors Z'_g
 The line-of-sight angle network unit 130 is composed of an arbitrary neural network, for example, two fully connected layers.
 Prior to the conversion processing, the line-of-sight angle network unit 130 receives, among the updated model parameters ^θ_g (output values of the line-of-sight angle model parameter optimization unit 140), the parameters corresponding to the neural network constituting the line-of-sight angle network unit 130.
 The line-of-sight angle network unit 130 uses the received parameters to convert the intermediate features v into estimated line-of-sight angle vectors Z'_g by the function of the neural network (S130). An estimated line-of-sight angle vector is vector data storing the estimated value of the horizontal rotation angle and the estimated value of the vertical rotation angle of the line-of-sight angle; in other words, it is an estimate of the line-of-sight angle vector. A line-of-sight angle vector is vector data storing the horizontal and vertical rotation angles of the line-of-sight angle; the rotation angles range from -180 degrees to 180 degrees, so a line-of-sight angle vector is a vector such as [-35,56]. For example, let Z'_{g,q} be the estimated line-of-sight angle vector corresponding to the q-th intermediate feature v_q of a batch, and Z'_g = [Z'_{g,1}, Z'_{g,2}, ..., Z'_{g,Q}]; each estimated line-of-sight angle vector Z'_{g,q} is a vector such as [-35,56].
<Line-of-sight angle model parameter optimization unit 140>
Input: estimated line-of-sight angle vectors Z'_g, line-of-sight angle correct labels T^g_1, ..., T^g_|M|
Output: updated model parameter ^θ_g or learned model parameter θ_g
 The model parameters ^θ_g and θ_g are obtained by concatenating the parameters corresponding to the neural network constituting the shared network unit 120 and the parameters corresponding to the neural network constituting the line-of-sight angle network unit 130.
 The line-of-sight angle model parameter optimization unit 140 updates the model parameter ^θ_g using the estimated line-of-sight angle vectors Z'_g and the line-of-sight angle correct labels T^g_1, ..., T^g_|M| (S140), and performs the optimization. For example, the line-of-sight angle model parameter optimization unit 140 calculates the error between the estimated line-of-sight angle vectors Z'_g and the line-of-sight angle correct labels T^g_1, ..., T^g_|M| and updates the model parameter ^θ_g so as to minimize the error. For example, an MSE error or an MAE error can be used as the error, and a gradient descent method or the like can be used as the parameter update method.
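 A sketch of step S140 under the assumptions of the earlier GazeFovNet sketch: the MSE error between Z'_g and the correct labels drives a gradient-descent update of ^θ_g, i.e. the concatenated parameters of the shared trunk and the gaze head. The optimizer choice, learning rate, and label batch are assumptions; model and S_b come from the previous snippet.

```python
import torch
import torch.nn as nn

criterion_g = nn.MSELoss()                         # MAE would be nn.L1Loss()
theta_g = list(model.shared.parameters()) + list(model.gaze_head.parameters())
optimizer_g = torch.optim.SGD(theta_g, lr=1e-3)    # gradient descent over ^θ_g

T_g_b = torch.stack([T_g] * 16)                    # correct labels for the batch, shape (16, 2), illustrative
Z_g_est = model.gaze_head(model.shared(S_b))       # estimated line-of-sight angle vectors Z'_g
loss_g = criterion_g(Z_g_est, T_g_b)               # error between Z'_g and the correct labels
optimizer_g.zero_grad()
loss_g.backward()
optimizer_g.step()                                 # updated model parameter ^θ_g
```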
<Effective visual field network unit 150>
Input: intermediate features v for one batch, and, among the updated model parameters ^θ_p, the parameters corresponding to the neural network constituting the effective visual field network unit 150
Output: estimated effective visual field probability vector Z'_p
 The effective visual field network unit 150 is composed of an arbitrary neural network, for example, two fully connected layers.
 Prior to the conversion processing, the effective visual field network unit 150 receives, among the updated model parameters ^θ_p (output values of the effective visual field model parameter optimization unit 160), the parameters corresponding to the neural network constituting the effective visual field network unit 150.
 The effective visual field network unit 150 uses the received parameters to convert one batch of intermediate features v into an estimated effective visual field probability vector Z'_p by the function of the neural network (S150). The estimated effective visual field probability vector is a vector consisting of estimated values of the effective visual field probability (estimated effective visual field probabilities), where the effective visual field probability is the probability that the gaze direction of a learning face image lies within the effective field of view; in other words, it is the probability that the camera that captured the learning face image is within the effective field of view of the subject of the learning face image. For example, let Z'_{p,q} be the estimated effective visual field probability corresponding to the q-th intermediate feature v_q of a batch, and Z'_p = [Z'_{p,1}, Z'_{p,2}, ..., Z'_{p,Q}].
<Effective visual field model parameter optimization unit 160>
Input: estimated effective visual field probability vector Z'_p, effective visual field correct labels T^p_1, ..., T^p_|M|
Output: updated model parameter ^θ_p or learned model parameter θ_p
 The model parameters ^θ_p and θ_p are obtained by concatenating the parameters corresponding to the neural network constituting the shared network unit 120 and the parameters corresponding to the neural network constituting the effective visual field network unit 150.
 The effective visual field model parameter optimization unit 160 updates the model parameter ^θ_p using the estimated effective visual field probability vector Z'_p and the effective visual field correct labels T^p_1, ..., T^p_|M| (S160), and performs the optimization. For example, the effective visual field model parameter optimization unit 160 calculates the error between the estimated effective visual field probability vector Z'_p and the effective visual field correct labels T^p_1, ..., T^p_|M| and updates the model parameter ^θ_p so as to minimize the error. For example, a cross-entropy error can be used as the error, and a gradient descent method or the like can be used as the parameter update method.
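 The corresponding sketch for step S160, again under the earlier assumptions: the binary cross-entropy error between Z'_p and the effective visual field correct labels drives a gradient-descent update of ^θ_p (shared trunk plus effective-field head). The label batch is an assumed placeholder.

```python
import torch
import torch.nn as nn

criterion_p = nn.BCELoss()                         # binary cross-entropy; fov_head already applies a sigmoid
theta_p = list(model.shared.parameters()) + list(model.fov_head.parameters())
optimizer_p = torch.optim.SGD(theta_p, lr=1e-3)    # gradient descent over ^θ_p

T_p_b = torch.ones(16, 1)                          # effective visual field correct labels for the batch (illustrative)
Z_p_est = model.fov_head(model.shared(S_b))        # estimated effective visual field probabilities Z'_p
loss_p = criterion_p(Z_p_est, T_p_b)
optimizer_p.zero_grad()
loss_p.backward()
optimizer_p.step()                                 # updated model parameter ^θ_p
```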
 The above processes S120 to S160 are repeated until a predetermined condition is satisfied (S170). The predetermined condition is a condition for determining whether or not the parameter updates have converged; for example, it may be (i) that the number of updates has exceeded a predetermined number, or (ii) that the difference between the parameters before and after the update is smaller than a predetermined value.
 Furthermore, the above processes S120 to S170 are performed for all of the batch data (learning data). For example, it is determined whether or not there is unprocessed batch data (S180); if there is (NO in S180), the processes S120 to S170 are performed again, and if there is none (YES in S180), the processing ends.
 After the above processing has been performed for all of the batch data, the finally obtained updated model parameters ^θ_g and ^θ_p are output as the learned model parameters θ_g and θ_p.
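 Putting the two updates together, the control flow of S120 to S180 could be sketched as follows, building on the earlier snippets (criterion_g, criterion_p, optimizer_g, optimizer_p, model, and the batch tensors are the assumed objects defined there). The inner-iteration limit and the convergence tolerance are assumptions standing in for the "predetermined condition".

```python
def max_param_change(old, new):
    """Largest absolute change over all parameters, used as a convergence measure."""
    return max((n - o).abs().max().item() for o, n in zip(old, new))

batches = [(S_b, T_g_b, T_p_b)]                    # illustrative; a real loader would yield every batch of D
for S_b, T_g_b, T_p_b in batches:                  # S180: visit every batch
    for _ in range(100):                           # upper bound on the number of updates per batch
        old = [p.detach().clone() for p in model.parameters()]
        # S120-S140: gaze-angle branch
        loss_g = criterion_g(model.gaze_head(model.shared(S_b)), T_g_b)
        optimizer_g.zero_grad()
        loss_g.backward()
        optimizer_g.step()
        # S150-S160: effective-field branch
        loss_p = criterion_p(model.fov_head(model.shared(S_b)), T_p_b)
        optimizer_p.zero_grad()
        loss_p.backward()
        optimizer_p.step()
        # S170: stop when the parameter difference falls below a threshold
        if max_param_change(old, list(model.parameters())) < 1e-5:
            break

theta = model.state_dict()                         # final parameters, i.e. the learned θ_g and θ_p together
```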
 Next, the estimation device 200 will be described.
<Estimation device 200>
 The estimation device 200 receives the learned model parameters θ_g and θ_p prior to the estimation process. The estimation device 200 takes a face image S to be estimated as input, estimates the line-of-sight angle using the learned model parameter θ_g, estimates the effective visual field probability using the learned model parameter θ_p, and outputs the estimated line-of-sight angle vector Z_g and the effective visual field determination result Z_p.
 FIG. 7 shows a functional block diagram of the estimation device 200, and FIG. 8 shows its processing flow.
 The estimation device 200 includes a line-of-sight angle estimation unit 210 and an effective visual field determination unit 220.
 The details of each unit are described below.
<Line-of-sight angle estimation unit 210>
Input: face image S, model parameter θ_g
Output: estimated line-of-sight angle vector Z_g for the face image S
 The line-of-sight angle estimation unit 210 receives the model parameter θ_g prior to the estimation process.
 The line-of-sight angle estimation unit 210 estimates the line-of-sight angle from the face image S using the network architecture of the shared network unit 120 and the subsequent line-of-sight angle network unit 130 together with the model parameter θ_g (S210), and obtains the estimated value (the estimated line-of-sight angle vector Z_g). For example, when the shared network unit 120 is composed of a neural network with four convolutional layers and the line-of-sight angle network unit 130 is composed of a neural network with two fully connected layers, the line-of-sight angle estimation unit 210 is composed of a neural network with four convolutional layers and two fully connected layers corresponding to the shared network unit 120 and the line-of-sight angle network unit 130, and uses the model parameter θ_g in this neural network.
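 An inference sketch for the line-of-sight angle estimation unit 210, assuming the GazeFovNet architecture sketched earlier and a hypothetical file "theta.pt" in which the learned parameters were saved as a state dict:

```python
import torch

model = GazeFovNet()                               # shared network unit 120 + both heads
model.load_state_dict(torch.load("theta.pt"))      # learned parameters θ_g (and θ_p)
model.eval()

S = torch.rand(1, 3, 224, 224)                     # face image S to be estimated (assumed tensor layout)
with torch.no_grad():
    Z_g = model.gaze_head(model.shared(S))         # estimated line-of-sight angle vector, e.g. [[-35., 56.]]
```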
<Effective visual field determination unit 220>
Input: face image S, model parameter θp
Output: effective visual field determination result Zp for the face image S
 The effective visual field determination unit 220 receives the model parameter θp prior to the estimation processing.
 The effective visual field determination unit 220 estimates the effective visual field probability from the face image S using the network architecture of the shared network unit 120 followed by the effective visual field network unit 150 together with the model parameter θp, and determines whether the line of sight is within the effective visual field based on the estimated value of the effective visual field probability (S220). For example, the effective visual field determination unit 220 makes the determination based on the magnitude relationship between the estimated value of the effective visual field probability and a predetermined threshold. For example, when the estimated value of the effective visual field probability is equal to or greater than a predetermined threshold (for example, 0.5), the effective visual field determination unit 220 outputs a determination result indicating that it is within the effective visual field, and when the estimated value is less than the predetermined threshold, it outputs a determination result indicating that it is outside the effective visual field. However, the effective visual field determination unit 220 may output the estimated value of the effective visual field probability itself as the determination result. For example, when the shared network unit 120 is configured as an arbitrary neural network consisting of four convolutional layers and the effective visual field network unit 150 is configured as an arbitrary neural network consisting of two fully connected layers, the effective visual field determination unit 220 is configured as a neural network consisting of the four convolutional layers and the two fully connected layers corresponding to the shared network unit 120 and the effective visual field network unit 150, and the model parameter θp is used in this neural network.
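 A corresponding illustrative sketch of the effective visual field determination unit 220 is shown below, reusing the SharedNetwork class from the previous sketch. The two fully connected layers and the 0.5 threshold follow the text; the layer widths and the sigmoid used to obtain a probability-valued output are assumptions introduced here.

```python
import torch
import torch.nn as nn

class EffectiveFovNetwork(nn.Module):
    """Effective visual field network unit 150: two fully connected layers,
    ending in a sigmoid so the output can be read as a probability (assumption)."""
    def __init__(self, in_dim=128 * 4 * 4):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                  nn.Linear(256, 1))
    def forward(self, v):
        return torch.sigmoid(self.head(v))        # estimated effective visual field probability

def judge_effective_fov(prob, threshold=0.5):
    """S220: inside the effective visual field iff the probability >= threshold."""
    return prob >= threshold

# Usage sketch: `shared` is a SharedNetwork instance holding its part of θp.
# fov = EffectiveFovNetwork(); fov.load_state_dict(...)    # load θp (hypothetical state dict)
# z_p = judge_effective_fov(fov(shared(face_image)))
```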
<Effect>
 With the above configuration, highly accurate line-of-sight angle estimation and effective visual field determination that capture the characteristics of both tasks can be performed.
<Modification>
 In this embodiment, the estimation device 200 estimates both the line-of-sight angle and the effective visual field probability, but it may be configured to estimate only one of them. Even in that case, since the model parameters used for estimating the line-of-sight angle and the effective visual field are learned in a single neural network system at training time, line-of-sight angle estimation or effective visual field determination can still be performed with high accuracy, capturing the characteristics of both (see the sketch after this paragraph).
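 Continuing the illustrative sketches above (and under the same assumptions), this modification simply amounts to instantiating the shared network together with only the head that is needed, for example:

```python
# Gaze-angle-only estimation (illustrative): only the parts of the learned
# parameters belonging to the shared network and the gaze head are loaded.
shared, gaze = SharedNetwork(), GazeAngleNetwork()
# shared.load_state_dict(theta_g_shared); gaze.load_state_dict(theta_g_head)  # hypothetical names
# z_g = gaze(shared(face_image))
```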
<Effect>
 With the above configuration as well, line-of-sight angle estimation or effective visual field determination can be performed with high accuracy, capturing the characteristics of both tasks.
<Other Modifications>
 The present invention is not limited to the above embodiment and modifications. For example, the various kinds of processing described above may be executed not only in time series in the order described, but also in parallel or individually in accordance with the processing capability of the device executing the processing or as necessary. Other modifications can be made as appropriate without departing from the spirit of the present invention.
<Program and recording medium>
 The various kinds of processing described above can be implemented by loading a program for executing each step of the above methods into the storage unit 2020 of the computer shown in FIG. 9 and causing the control unit 2010, the input unit 2030, the output unit 2040, and the like to operate.
 A program describing the contents of this processing can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any type, for example, a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.
 This program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be distributed by storing it in a storage device of a server computer and transferring it from the server computer to another computer via a network.
 A computer that executes such a program, for example, first temporarily stores the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes the processing in accordance with the read program. As another form of executing the program, the computer may read the program directly from the portable recording medium and execute the processing in accordance with the program, or the computer may sequentially execute processing in accordance with a received program each time the program is transferred from the server computer to the computer. Alternatively, the above-described processing may be executed by a so-called ASP (Application Service Provider) type service that realizes the processing functions only through execution instructions and result acquisition, without transferring the program from the server computer to the computer. Note that the program in this embodiment includes information that is provided for processing by an electronic computer and that is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the processing of the computer).
 In this embodiment, the present device is configured by executing a predetermined program on a computer, but at least part of the processing contents may be realized by hardware.

Claims (8)

  1.  A learning device comprising:
     a shared network unit that converts a learning face image Sb into an intermediate feature v by a neural network function using a model parameter ^θg or ^θp;
     a line-of-sight angle network unit that converts the intermediate feature v into an estimated line-of-sight angle vector Z'g by a neural network function using the model parameter ^θg;
     a line-of-sight angle model parameter optimization unit that updates the model parameter ^θg using the estimated line-of-sight angle vector Z'g and a line-of-sight angle correct label for the learning face image Sb;
     an effective visual field network unit that converts the intermediate feature v into an estimated effective visual field probability vector Z'p by a neural network function using the model parameter ^θp; and
     an effective visual field model parameter optimization unit that updates the model parameter ^θp using the estimated effective visual field probability vector Z'p and an effective visual field correct label for the learning face image Sb,
     wherein a learned model parameter θg corresponding to the model parameter ^θg and a learned model parameter θp corresponding to the model parameter ^θp are obtained.
  2.  An estimation device that uses the model parameter θg learned by the learning device according to claim 1, the estimation device comprising:
     a line-of-sight angle estimation unit that estimates a line-of-sight angle from a face image S to be estimated, using the network architecture of the shared network unit followed by the line-of-sight angle network unit and the model parameter θg.
  3.  An estimation device that uses the model parameter θp learned by the learning device according to claim 1, the estimation device comprising:
     an effective visual field determination unit that estimates an effective visual field probability from a face image S to be estimated, using the network architecture of the shared network unit followed by the effective visual field network unit and the model parameter θp, and determines whether it is within the effective visual field based on an estimated value of the effective visual field probability.
  4.  An estimation device that uses the model parameter θg and the model parameter θp learned by the learning device according to claim 1, the estimation device comprising:
     a line-of-sight angle estimation unit that estimates a line-of-sight angle from a face image S to be estimated, using the network architecture of the shared network unit followed by the line-of-sight angle network unit and the model parameter θg; and
     an effective visual field determination unit that estimates an effective visual field probability from the face image S, using the network architecture of the shared network unit followed by the effective visual field network unit and the model parameter θp, and determines whether it is within the effective visual field based on an estimated value of the effective visual field probability.
  5.  A learning method comprising:
     a shared network step of converting a learning face image Sb into an intermediate feature v by a neural network function using a model parameter ^θg or ^θp;
     a line-of-sight angle network step of converting the intermediate feature v into an estimated line-of-sight angle vector Z'g by a neural network function using the model parameter ^θg;
     a line-of-sight angle model parameter optimization step of updating the model parameter ^θg using the estimated line-of-sight angle vector Z'g and a line-of-sight angle correct label for the learning face image Sb;
     an effective visual field network step of converting the intermediate feature v into an estimated effective visual field probability vector Z'p by a neural network function using the model parameter ^θp; and
     an effective visual field model parameter optimization step of updating the model parameter ^θp using the estimated effective visual field probability vector Z'p and an effective visual field correct label for the learning face image Sb,
     wherein a learned model parameter θg corresponding to the model parameter ^θg and a learned model parameter θp corresponding to the model parameter ^θp are obtained.
  6.  An estimation method that uses the model parameter θg learned by the learning method according to claim 5, the estimation method comprising:
     a line-of-sight angle estimation step of estimating a line-of-sight angle from a face image S to be estimated, using the network architecture of the shared network step followed by the line-of-sight angle network step and the model parameter θg.
  7.  An estimation method that uses the model parameter θp learned by the learning method according to claim 5, the estimation method comprising:
     an effective visual field determination step of estimating an effective visual field probability from a face image S to be estimated, using the network architecture of the shared network step followed by the effective visual field network step and the model parameter θp, and determining whether it is within the effective visual field based on an estimated value of the effective visual field probability.
  8.  A program for causing a computer to function as the learning device according to claim 1 or the estimation device according to any one of claims 2 to 4.
PCT/JP2021/022704 2021-06-15 2021-06-15 Training device, estimation device, methods therefor, and program WO2022264269A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2021/022704 WO2022264269A1 (en) 2021-06-15 2021-06-15 Training device, estimation device, methods therefor, and program
JP2023528808A JPWO2022264269A1 (en) 2021-06-15 2021-06-15

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/022704 WO2022264269A1 (en) 2021-06-15 2021-06-15 Training device, estimation device, methods therefor, and program

Publications (1)

Publication Number Publication Date
WO2022264269A1 true WO2022264269A1 (en) 2022-12-22

Family

ID=84526357

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/022704 WO2022264269A1 (en) 2021-06-15 2021-06-15 Training device, estimation device, methods therefor, and program

Country Status (2)

Country Link
JP (1) JPWO2022264269A1 (en)
WO (1) WO2022264269A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019159518A (en) * 2018-03-09 2019-09-19 株式会社国際電気通信基礎技術研究所 Visual state detection apparatus, visual state detection method, and visual state detection program
US20190377409A1 (en) * 2018-06-11 2019-12-12 Fotonation Limited Neural network image processing apparatus

Also Published As

Publication number Publication date
JPWO2022264269A1 (en) 2022-12-22

Similar Documents

Publication Publication Date Title
JP6807471B2 (en) Semantic segmentation model training methods and equipment, electronics, and storage media
CN111950453B (en) Random shape text recognition method based on selective attention mechanism
CN105447498B (en) Client device, system and server system configured with neural network
JP6345276B2 (en) Face authentication method and system
JP6943291B2 (en) Learning device, learning method, and program
EP3745309A1 (en) Training a generative adversarial network
US20210064989A1 (en) Continual learning of artificial intelligence systems based on bi-level optimization
CN113159283A (en) Model training method based on federal transfer learning and computing node
JP7086878B2 (en) Learning device, learning method, program and recognition device
CN111985458A (en) Method for detecting multiple targets, electronic equipment and storage medium
JP2019152964A (en) Learning method and learning device
JP7472471B2 (en) Estimation system, estimation device, and estimation method
JP6927410B2 (en) Image classification system, image classification method and image classification program
WO2021090777A1 (en) Behavior recognition learning device, behavior recognition learning method, behavior recognition device, and program
JP2020177582A (en) Leaning device, learning method, program, and recognition device
WO2022264269A1 (en) Training device, estimation device, methods therefor, and program
WO2021100184A1 (en) Learning device, estimation device, learning method, and learning program
JP2022509564A (en) Neural network active training system and image processing system
US20220351533A1 (en) Methods and systems for the automated quality assurance of annotated images
CN114119970B (en) Target tracking method and device
WO2022264268A1 (en) Learning device, estimation device, method for these, and program
CN112651467B (en) Training method and system and prediction method and system for convolutional neural network
US20200175376A1 (en) Learning Method, Learning Device, Program, and Recording Medium
KR20210074205A (en) System and method for image classification based positioning
WO2020040007A1 (en) Learning device, learning method, and learning program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21945948

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023528808

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE