WO2023032014A1 - Estimation method, estimation device, and estimation program - Google Patents
- Publication number
- WO2023032014A1 (PCT/JP2021/031791)
- Authority
- WO
- WIPO (PCT)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- the present invention relates to an estimation method, an estimation device, and an estimation program.
- Estimation of the state of mind appearing in such nonverbal/paralinguistic information is generally formulated as supervised learning that, given inputs such as feature values extracted from speech or video (or the data itself), outputs the posterior probability of each label representing a defined state of mind (see Non-Patent Document 1).
- the present invention has been made in view of the above, and aims to accurately estimate a label representing a state of mind appearing in nonverbal/paralinguistic information.
- an estimation method according to the present invention is an estimation method executed by an estimation device, comprising: an acquisition step of acquiring learning data including nonverbal information or paralinguistic information and a correct label representing a state of mind appearing in the nonverbal information or paralinguistic information; a calculation step of calculating, using the respective feature values of input data and reference data in the acquired learning data, an embedded representation of the state of mind of the input data and an embedded representation of the state of mind of the reference data; and an estimation step of estimating the state of mind of the input data using a comparison result between the embedded representation calculated from the input data and the embedded representation calculated from the reference data.
- FIG. 1 is a schematic diagram illustrating a schematic configuration of an estimation device.
- FIG. 2 is a diagram for explaining the processing of the estimation device.
- FIG. 3 is a diagram illustrating a data configuration of learning data.
- FIG. 4 is a diagram for explaining the processing of the calculator and the estimator.
- FIG. 5 is a flowchart showing an estimation processing procedure.
- FIG. 6 is a diagram for explaining the embodiment.
- FIG. 7 is a diagram illustrating a computer that executes an estimation program.
- the estimation device 10 of the present embodiment applies a neural network to a moving image showing the upper body of a subject, which is nonverbal/paralinguistic information, and estimates the degree of comprehension, as the state of mind appearing in the nonverbal/paralinguistic information, on a five-point scale. The degree of comprehension is defined, for example, as: 1. do not understand; 2. somewhat do not understand; 3. normal state; 4. somewhat understand; 5. understand, where a higher number indicates better understanding.
- the estimation device 10 of the present embodiment is realized by a general-purpose computer such as a personal computer and includes an input unit 11, an output unit 12, a communication control unit 13, a storage unit 14, and a control unit 15.
- the input unit 11 is implemented using input devices such as a keyboard and a mouse, and inputs various instruction information, such as an instruction to start processing, to the control unit 15 in response to input operations by the operator.
- the output unit 12 is implemented by a display device such as a liquid crystal display, a printing device such as a printer, an information communication device, or the like.
- the communication control unit 13 is realized by a NIC (Network Interface Card) or the like, and controls communication between the control unit 15 and an external device such as a server or a device for managing learning data via a network.
- the storage unit 14 is implemented by semiconductor memory devices such as RAM (Random Access Memory) and flash memory, or storage devices such as hard disks and optical disks. Note that the storage unit 14 may be configured to communicate with the control unit 15 via the communication control unit 13 . In the present embodiment, the storage unit 14 stores, for example, learning data 14a used for estimation processing, which will be described later, model parameters 14d generated and updated in the estimation processing, and the like.
- the learning data 14a of this embodiment includes the input data 14b and the reference data 14c, which share the same data configuration.
- FIG. 3 is a diagram illustrating a data configuration of learning data.
- the learning data 14a includes at least moving image data showing the upper body of a subject as nonverbal/paralinguistic information, a data ID identifying each piece of moving image data, a personal ID identifying the subject, and a correct label representing the state of mind, such as the degree of comprehension, appearing in each piece of moving image data.
- the learning data 14a may include labels representing attributes of a person such as age and gender.
- the learning data 14a may be divided into training, development, and evaluation sets, or augmented (data expansion), as necessary.
- preprocessing such as contrast normalization and face detection may be performed, and only the detected face region of the video data may be used.
- the codec of the input data is not particularly limited.
- for example, H.264 video data recorded by a web camera at 30 frames per second may be resized so that one side is 224 pixels.
- Each of the X pieces of moving image data is provided with the individual IDs of S subjects and the correct label of the degree of comprehension.
- the input data and the reference data should not contain the same data.
- the input data 14b and the reference data 14c may be generated by any combination of the learning data 14a so as to avoid mixing of the same data.
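As an illustrative sketch of this constraint (the helper below is not from the patent; the function name and split ratio are assumptions), the input and reference sets can be drawn from one subject's clips while guaranteeing the two sets stay disjoint:

```python
import random

def split_input_reference(samples, n_ref, seed=0):
    """Split one subject's samples into reference and input sets
    so that no sample appears in both sets."""
    rng = random.Random(seed)
    pool = list(samples)
    rng.shuffle(pool)
    reference = pool[:n_ref]      # e.g. the N reference clips
    inputs = pool[n_ref:]         # remaining clips become input data
    assert not set(reference) & set(inputs)  # disjoint by construction
    return inputs, reference

# Example: 10 clip IDs for one subject, 3 of them used as reference data.
inputs, reference = split_input_reference(range(10), n_ref=3)
```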
- the control unit 15 is implemented using a CPU (Central Processing Unit), NP (Network Processor), FPGA (Field Programmable Gate Array), etc., and executes a processing program stored in memory. Thereby, the control unit 15 functions as an acquisition unit 15a, a calculation unit 15b, an estimation unit 15c, and a learning unit 15d, as illustrated in FIG. Note that these functional units may be implemented in different hardware. For example, the acquisition unit 15a may be implemented in hardware different from other functional units. Also, the control unit 15 may include other functional units.
- the acquisition unit 15a acquires learning data including nonverbal information or paralinguistic information and correct labels representing states of mind appearing in the nonverbal information or paralinguistic information. Specifically, the acquisition unit 15a acquires, via the input unit 11 or from a device that generates learning data via the communication control unit 13, the learning data 14a including video data showing the upper body of the subject as nonverbal/paralinguistic information, a data ID identifying each piece of moving image data, and a correct label representing the state of mind, such as the degree of comprehension, appearing in each piece of moving image data.
- the acquisition unit 15a causes the storage unit 14 to store learning data 14a acquired in advance prior to the following processing.
- the acquiring unit 15a may transfer the acquired learning data 14a to the estimating unit 15c described below without storing it in the storage unit 14.
- the calculation unit 15b uses the respective feature values of the input data 14b and the reference data 14c in the acquired learning data 14a to calculate an embedded representation of the state of mind of the input data and an embedded representation of the state of mind of the reference data.
- note that the processing is not limited to the neural network configuration described below; other configurations may be used as well.
- the calculation unit 15b first extracts feature values from the input data 14b and the reference data 14c for the same subject. For example, the calculation unit 15b extracts the log mel-filter bank of the audio, MFCC (mel-frequency cepstral coefficients), HOG (Histogram of Oriented Gradients), HOF (Histogram of Optical Flow), and the like for each frame of video as feature values.
- the moving image itself may be used as the feature quantity.
- calculation unit 15b may perform preprocessing such as voice enhancement, noise removal, contrast normalization, cutout of the face peripheral region, and feature amount normalization as necessary.
- the calculation unit 15b may perform data augmentation processing such as superimposing noise and reverberation, rotating the moving image, and adding noise before extracting the feature values.
- for the video data x1:T of the input data 14b having frame length T and the video data y1:T(1, ..., N) of the N pieces of reference data 14c, the calculation unit 15b crops only the face peripheral region into a square and resizes it again so that one side is 224 pixels. In addition, the calculation unit 15b normalizes each pixel value to the range 0.0 to 1.0. All of the moving image data of the N pieces of reference data 14c may be given the correct label of understanding level 3, or labels of understanding levels 1 to 5 may be mixed.
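A minimal sketch of this preprocessing step, assuming a raw RGB frame as a NumPy array; the face detector that would choose the crop region is omitted, and a simple center crop with nearest-neighbor resizing stands in for a real resizer:

```python
import numpy as np

def preprocess_frame(frame, size=224):
    """Crop the largest centered square, resize to size x size
    (nearest-neighbor), then scale pixel values to [0.0, 1.0]."""
    h, w = frame.shape[:2]
    s = min(h, w)
    top, left = (h - s) // 2, (w - s) // 2
    square = frame[top:top + s, left:left + s]
    idx = np.arange(size) * s // size          # nearest-neighbor indices
    resized = square[idx][:, idx]
    return resized.astype(np.float32) / 255.0  # normalize to 0.0-1.0

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
x = preprocess_frame(frame)   # shape (224, 224, 3), values in [0, 1]
```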
- the input data 14b and the reference data 14c are selected so as not to contain the same moving image data.
- mixing of the same moving image data may also be avoided by preprocessing such as checking or transforming metadata.
- the calculation unit 15b calculates an embedded expression from the feature amounts of the input data 14b and the reference data 14c. For example, the calculation unit 15b calculates the embedded representation H for each time using a 2D CNN (Convolutional Neural Network) or RNN (Recurrent Neural Network). Note that the calculation unit 15b may replace the 2D CNN with a 3D CNN, or replace the RNN with a Transformer.
- 2D CNN Convolutional Neural Network
- RNN Recurrent Neural Network
- the model parameters 14d may include parameters pre-trained on any other task, or the initial values may be generated with arbitrary random numbers. When pre-trained model parameters 14d are used, whether to update them may be decided arbitrarily.
- for example, the calculation unit 15b uses a 2D CNN followed by an RNN with D output dimensions to compute the embedded representation tensor Hx from the input video data x1:T, as shown in the following equation (1).
- here, the two parameter sets in equation (1) are the CNN parameter set and the RNN parameter set, respectively.
- the calculation unit 15b also calculates an embedded expression tensor H y from the reference moving image data y 1:T (1, . . . , N) as shown in the following equation (2).
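The per-time embeddings of equations (1) and (2) can be sketched with a toy Elman RNN over stubbed per-frame CNN features; all weights below are random placeholders, not trained parameters, and the real 2D CNN is replaced by a plain feature matrix:

```python
import numpy as np

def rnn_embed(features, Wx, Wh, b):
    """Toy Elman RNN over per-frame features: produces one
    D-dimensional embedding per time step, i.e. a tensor H of shape (T, D)."""
    T, D = features.shape[0], Wh.shape[0]
    H = np.zeros((T, D))
    h = np.zeros(D)
    for t in range(T):
        h = np.tanh(features[t] @ Wx + h @ Wh + b)  # recurrent update
        H[t] = h
    return H

rng = np.random.default_rng(1)
T, F, D = 6, 16, 8            # frames, stubbed CNN feature dims, embedding dims
feats = rng.normal(size=(T, F))
Hx = rnn_embed(feats, rng.normal(size=(F, D)), rng.normal(size=(D, D)), np.zeros(D))
```

The same function applied to each reference clip's features would yield the corresponding Hy tensors.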
- the calculation unit 15b compares the embedded representation calculated from the input moving image data x1:T with the embedded representation calculated from the reference moving image data y1:T(1, ..., N). Specifically, as shown in FIG. 4, the calculation unit 15b compares the embedded representation tensor Hx calculated from the input data 14b with the embedded representation tensor Hy calculated from the reference data 14c to obtain e(1, ..., N).
- the calculation unit 15b makes a comparison using a source-target attention mechanism.
- the calculator 15b calculates the comparison result vector e (1,...,N) as shown in the following equation (3).
- the calculation unit 15b calculates the attention weights from the query Qi(1, ..., N), key Ki(1, ..., N), and value Vi(1, ..., N) in each attention head.
- d1 is the number of attention heads
- i is each attention head
- W i Q , W i K , and W i V are weights for Query, key, and value in each attention head.
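A single-head, NumPy-only sketch of the source-target attention comparison of equation (3): the query is taken from Hx and the key/value from one reference embedding Hy. The multi-head bookkeeping is omitted, and random matrices stand in for the trained weights Wq, Wk, Wv:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def source_target_attention(Hx, Hy, Wq, Wk, Wv):
    """Single-head source-target attention: query from the input
    embedding Hx, key and value from a reference embedding Hy,
    yielding a comparison result aligned with the input frames."""
    Q, K, V = Hx @ Wq, Hy @ Wk, Hy @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # scaled dot product
    return softmax(scores, axis=-1) @ V       # (T, D) comparison result

rng = np.random.default_rng(0)
T, D = 5, 8                                   # frames x embedding dims
Hx, Hy = rng.normal(size=(T, D)), rng.normal(size=(T, D))
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
e = source_target_attention(Hx, Hy, Wq, Wk, Wv)  # one such e per reference clip
```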
- the calculation unit 15b is not limited to the source-target attention mechanism, and may perform the comparison using predetermined arithmetic operations, concatenation, or the like between embedded representations.
- the calculator 15b then compares the e(1, ..., N) with each other to calculate a comparison result vector v, as shown in FIG. 4.
- the calculation unit 15b may add arbitrary information such as metadata of y1:T (1,...,N) to e( 1,...,N) .
- for example, the calculation unit 15b combines e(1, ..., N) with the metadata m(1, ..., N) of y1:T(1, ..., N) and generates a tensor E1:T combining them.
- calculation unit 15b calculates v from E 1:T as shown in the following equation (5) using the multi-head self attention mechanism.
- d2 is the number of attention heads
- j is each attention head
- WjQ , WjK , and WjV are weights for Query, key, and value in each attention head.
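The condensation of equation (5) can be sketched as single-head self-attention over the combined tensor followed by mean pooling; this is a simplified stand-in for the multi-head version, with random placeholder weights:

```python
import numpy as np

def self_attention_pool(E, Wq, Wk, Wv):
    """Single-head self-attention over the combined tensor E, then mean
    pooling: condenses the N comparison results into one vector v."""
    Q, K, V = E @ Wq, E @ Wk, E @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)        # attention weights
    return (A @ V).mean(axis=0)               # pooled comparison vector v

rng = np.random.default_rng(3)
N, D = 4, 8                                   # reference count x dims
E = rng.normal(size=(N, D))
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
v = self_attention_pool(E, Wq, Wk, Wv)        # shape (D,)
```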
- the estimating unit 15c estimates the state of mind of the input data 14b using the result of comparison between the embedded representation calculated from the input data 14b and the embedded representation calculated from the reference data 14c.
- specifically, the estimation unit 15c estimates the state of mind of the input data 14b using the result of comparing, with the multi-head self-attention mechanism, the embedded representation calculated from the input data 14b and the embedded representation calculated from the reference data 14c.
- the estimation unit 15c estimates the state of mind of the input data 14b from the comparison result vector v calculated by the calculation unit 15b. At that time, the estimating unit 15c may calculate the posterior probability for each class as a classification problem of an arbitrary number of classes to estimate the state of mind. That is, the estimating unit 15c may estimate the state of mind by calculating the posterior probability for each class of the state of mind classification. Alternatively, the estimation unit 15c may estimate a numerical value representing the state of mind as a regression problem.
- for example, the estimating unit 15c uses two fully connected layers to determine the posterior probability p(C|v) for each class.
- W 1 FC and W 2 FC represent the weights of the two fully connected layers
- D FC represents the number of output dimensions of the first fully connected layer
- a ReLU function is used as the activation function of the first fully connected layer.
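The two-layer head described above can be sketched as follows; the weights are random placeholders, biases are omitted for brevity, and the softmax output plays the role of the per-class posterior:

```python
import numpy as np

def posterior(v, W1, W2):
    """Two fully connected layers: ReLU after the first layer,
    softmax over the five comprehension classes at the output."""
    h = np.maximum(v @ W1, 0.0)          # first FC layer + ReLU
    logits = h @ W2                      # second FC layer
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return p                             # posterior over classes

rng = np.random.default_rng(2)
D, D_fc, C = 8, 16, 5                    # input dims, hidden dims, classes
v = rng.normal(size=D)                   # comparison result vector
p = posterior(v, rng.normal(size=(D, D_fc)), rng.normal(size=(D_fc, C)))
pred = int(np.argmax(p)) + 1             # estimated comprehension level, 1-5
```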
- the learning unit 15d uses the input data 14b and the estimated state of mind of the input data 14b to learn the model parameters 14d of a model that estimates the state of mind appearing in input nonverbal information or paralinguistic information.
- the learning unit 15d updates the model parameter set and acquires the learned (updated) model parameter set.
- the learning unit 15d can apply well-known loss functions and update methods.
- the model parameter set may include parameters pre-trained on any other task, initial values may be generated with arbitrary random numbers, and some model parameters may be left without updating.
- for example, the learning unit 15d uses stochastic gradient descent (SGD) to update the model parameter set, with the cross entropy L shown in the following equation (7) as the loss function.
- mx is the correct distribution of the input moving image data x 1:T .
- the method of expressing the correct answer distribution is not particularly limited, and may be expressed as a one-hot vector, for example.
- the correct distribution may be represented by approximating a normal distribution centered on the correct class.
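Both label-distribution options can be sketched together with the cross-entropy loss of equation (7); the sigma value below is an illustrative choice, not taken from the patent:

```python
import numpy as np

def soft_label(correct_class, n_classes=5, sigma=0.7):
    """Correct distribution m_x: a discretized normal centered on the
    correct class; a very small sigma approaches a one-hot vector."""
    c = np.arange(1, n_classes + 1)
    m = np.exp(-0.5 * ((c - correct_class) / sigma) ** 2)
    return m / m.sum()

def cross_entropy(m, p, eps=1e-12):
    """Loss between correct distribution m and predicted posterior p."""
    return float(-(m * np.log(p + eps)).sum())

m = soft_label(correct_class=4)          # peak at comprehension level 4
p = np.full(5, 0.2)                      # uniform prediction for the demo
loss = cross_entropy(m, p)
```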
- the learning unit 15d causes the storage unit 14 to store the acquired learned model parameter set as the model parameters 14d.
- after learning, the calculation unit 15b uses the learned model parameters 14d to calculate the state-of-mind embedded representation of the input data 14b and the state-of-mind embedded representation of the reference data 14c.
- FIG. 5 is a flowchart showing an estimation processing procedure.
- the flowchart of FIG. 5 is started, for example, when an input instructing the start of the estimation process is received.
- the acquisition unit 15a acquires the learning data 14a including nonverbal information or paralinguistic information and correct labels representing states of mind appearing in the nonverbal information or paralinguistic information (step S1).
- the calculation unit 15b uses the respective feature values of the input data 14b and the reference data 14c in the acquired learning data 14a to calculate the embedded representation of the state of mind of the input data and the embedded representation of the state of mind of the reference data (step S2).
- the calculation unit 15b also compares the embedding expression calculated from the input data 14b and the embedding expression calculated from the reference data 14c (step S3).
- the estimating unit 15c estimates the state of mind of the input data 14b using the result of comparison between the embedded expression calculated from the input data 14b and the embedded expression calculated from the reference data 14c (step S4). This completes a series of estimation processes.
- as described above, the acquisition unit 15a acquires learning data including nonverbal information or paralinguistic information and correct labels representing states of mind appearing in the nonverbal information or paralinguistic information.
- the calculation unit 15b uses the respective feature values of the input data 14b and the reference data 14c in the acquired learning data 14a to calculate the state-of-mind embedded representation of the input data 14b and the state-of-mind embedded representation of the reference data 14c.
- the estimation unit 15c estimates the state of mind of the input data 14b by using the result of comparison between the embedded representation calculated from the input data 14b and the embedded representation calculated from the reference data 14c.
- the estimation unit 15c compares the embedding expression calculated from the input data 14b and the embedding expression calculated from the reference data 14c using a multi-head self-attention mechanism.
- the estimating unit 15c also calculates the posterior probability for each class of the state of mind classification to estimate the state of mind.
- the estimation unit 15c estimates a numerical value representing the state of mind as a regression problem.
- the estimation device 10 estimates the state of mind using pre-registered reference information with a plurality of correct labels other than the normal state. As a result, even when the variance of the labels is so large that individual differences cannot be sufficiently normalized or absorbed by the normal state alone, the label representing the state of mind appearing in the nonverbal/paralinguistic information can be estimated accurately.
- the learning unit 15d uses the input data 14b and the estimated state of mind of the input data 14b to learn the model parameters 14d of the model for estimating the state of mind appearing in input nonverbal information or paralinguistic information.
- then, the calculation unit 15b uses the learned model parameters 14d to calculate the state-of-mind embedded representation of the input data 14b and the state-of-mind embedded representation of the reference data 14c. As a result, the label representing the state of mind appearing in the nonverbal/paralinguistic information can be estimated with even higher accuracy.
- FIG. 6 is a diagram for explaining the embodiment.
- FIG. 6 shows the accuracy of each of three methods, including the present invention, when estimating the five comprehension levels for unknown moving image data of the same person.
- a general method that does not use reference data (none)
- a method that absorbs individual differences using reference data of the normal state only (understanding level 3 only)
- a method applying the present invention (understanding levels 2, 3, and 4)
- in the method without reference data, comprehension was estimated by applying the self-attention mechanism and the fully connected layers to the embedded representation Hx.
- the estimating device 10 can be implemented by installing an estimating program that executes the above estimating process as package software or online software on a desired computer.
- the information processing device can function as the estimation device 10 by causing the information processing device to execute the above estimation program.
- information processing devices include mobile communication terminals such as smartphones, mobile phones and PHS (Personal Handyphone Systems), and slate terminals such as PDAs (Personal Digital Assistants).
- the functions of the estimation device 10 may be implemented in a cloud server.
- FIG. 7 is a diagram showing an example of a computer that executes an estimation program.
- Computer 1000 includes, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.
- the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012 .
- the ROM 1011 stores a boot program such as BIOS (Basic Input Output System).
- Hard disk drive interface 1030 is connected to hard disk drive 1031 .
- Disk drive interface 1040 is connected to disk drive 1041 .
- a removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1041, for example.
- a mouse 1051 and a keyboard 1052 are connected to the serial port interface 1050, for example.
- a display 1061 is connected to the video adapter 1060 .
- the hard disk drive 1031 stores an OS 1091, application programs 1092, program modules 1093 and program data 1094, for example. Each piece of information described in the above embodiment is stored in the hard disk drive 1031 or the memory 1010, for example.
- the estimation program is stored in the hard disk drive 1031 as a program module 1093 in which instructions to be executed by the computer 1000 are written, for example.
- the hard disk drive 1031 stores a program module 1093 that describes each process executed by the estimation device 10 described in the above embodiment.
- data used for information processing by the estimation program is stored as program data 1094 in the hard disk drive 1031, for example. Then, the CPU 1020 reads out the program module 1093 and the program data 1094 stored in the hard disk drive 1031 to the RAM 1012 as necessary, and executes each procedure described above.
- program module 1093 and program data 1094 related to the estimation program are not limited to being stored in the hard disk drive 1031.
- for example, they may be stored in a removable storage medium and read by the CPU 1020 via the disk drive 1041 or the like.
- alternatively, the program module 1093 and program data 1094 related to the estimation program may be stored in another computer connected via a network such as a LAN (Local Area Network) or WAN (Wide Area Network), and read by the CPU 1020 via the network interface 1070.
- 10 estimation device, 11 input unit, 12 output unit, 13 communication control unit, 14 storage unit, 14a learning data, 14b input data, 14c reference data, 14d model parameter, 15 control unit, 15a acquisition unit, 15b calculation unit, 15c estimation unit, 15d learning unit
Abstract
An acquisition unit (15a) acquires learning data that includes nonverbal information or paralinguistic information and correct labels representing the states of mind indicated in the nonverbal information or paralinguistic information. A calculation unit (15b) uses a feature quantity of each of input data (14b) and reference data (14c) in the acquired learning data (14a) to calculate an embedded representation of the state of mind indicated by the input data (14b) and an embedded representation of the state of mind indicated by the reference data (14c). An estimation unit (15c) estimates the state of mind indicated by the input data (14b), using the result of a comparison between the embedded representation calculated from the input data (14b) and the embedded representation calculated from the reference data (14c).
Description
The present invention relates to an estimation method, an estimation device, and an estimation program.
Conventionally, research and development has been conducted on technology that automatically estimates the state of mind appearing in nonverbal and paralinguistic information such as a person's voice, face, and gestures. For example, such estimation is expected to be used to reflect the mental state of a dialogue partner when generating the responses of agents and robots, to apply the estimation results as part of mental health care, and to quantify the mental states of participants in web conferences and the like so that they are easier to grasp.
Estimation of the state of mind appearing in such nonverbal/paralinguistic information is generally formulated as supervised learning that, given inputs such as feature values extracted from speech or video (or the data itself), outputs the posterior probability of each label representing a defined state of mind (see Non-Patent Document 1).
Here, it is said that there are individual differences in states of mind and their expression. To deal with this, a technique is generally used in which data from many people are collected so that individual differences are absorbed by a machine learning model. Methods are also known that normalize or absorb individual differences using pre-registered reference audio or video (see Non-Patent Documents 2 and 3).
However, with conventional techniques it is difficult to accurately estimate labels representing states of mind appearing in nonverbal/paralinguistic information. For example, when labels become more fine-grained, as in general emotion recognition (neutral, joy, sadness, surprise, fear, hatred, anger, contempt, and so on) or in graded levels of a specific index such as comprehension, the variance of the corresponding labels increases, and the normal state alone may not be sufficient to absorb or normalize individual differences. In addition, when retraining a model, it is difficult to secure enough data to stably adapt the model parameters to the person being evaluated and to ensure reliable learning.
The present invention has been made in view of the above, and aims to accurately estimate a label representing a state of mind appearing in nonverbal/paralinguistic information.
In order to solve the above-described problems and achieve the object, an estimation method according to the present invention is an estimation method executed by an estimation device, comprising: an acquisition step of acquiring learning data including nonverbal information or paralinguistic information and a correct label representing a state of mind appearing in the nonverbal information or paralinguistic information; a calculation step of calculating, using the respective feature values of input data and reference data in the acquired learning data, an embedded representation of the state of mind of the input data and an embedded representation of the state of mind of the reference data; and an estimation step of estimating the state of mind of the input data using a comparison result between the embedded representation calculated from the input data and the embedded representation calculated from the reference data.
According to the present invention, it is possible to accurately estimate a label representing a state of mind appearing in nonverbal/paralinguistic information.
An embodiment of the present invention will be described in detail below with reference to the drawings. Note that the present invention is not limited by this embodiment. In the drawings, the same parts are denoted by the same reference numerals.
[Configuration of estimation device]
FIG. 1 is a schematic diagram illustrating the overall configuration of an estimation device, and FIG. 2 is a diagram for explaining the processing of the estimation device. The estimation device 10 of the present embodiment applies a neural network to a moving image showing the upper body of a subject, which serves as nonverbal/paralinguistic information, and estimates the degree of understanding, as the state of mind appearing in that information, on a five-level scale. The degree of understanding is defined, for example, as: 1. does not understand; 2. somewhat does not understand; 3. normal state; 4. somewhat understands; 5. understands, so that a larger number indicates a greater degree of understanding.
First, as illustrated in FIG. 1, the estimation device 10 of the present embodiment is realized by a general-purpose computer such as a personal computer, and includes an input unit 11, an output unit 12, a communication control unit 13, a storage unit 14, and a control unit 15.
The input unit 11 is implemented using input devices such as a keyboard and a mouse, and, in response to input operations by an operator, inputs various kinds of instruction information, such as an instruction to start processing, to the control unit 15. The output unit 12 is implemented by a display device such as a liquid crystal display, a printing device such as a printer, an information communication device, or the like. The communication control unit 13 is realized by an NIC (Network Interface Card) or the like, and controls communication, via a network, between the control unit 15 and external devices such as a server or a device that manages learning data.
The storage unit 14 is implemented by a semiconductor memory device such as a RAM (Random Access Memory) or flash memory, or by a storage device such as a hard disk or an optical disk. Note that the storage unit 14 may be configured to communicate with the control unit 15 via the communication control unit 13. In the present embodiment, the storage unit 14 stores, for example, learning data 14a used in the estimation processing described later, and model parameters 14d generated and updated in that processing.
Here, as shown in FIG. 1, the learning data 14a of this embodiment includes input data 14b and reference data 14c, both of which have the same data configuration. FIG. 3 is a diagram illustrating the data configuration of the learning data. As shown in FIG. 3, the learning data 14a includes at least video data showing the upper body of a subject as nonverbal/paralinguistic information, a data ID identifying each piece of video data, a personal ID identifying the subject, and a correct label representing the state of mind, such as the degree of understanding, appearing in each piece of video data. The learning data 14a may also include labels representing attributes of the person, such as age and gender. In addition, as necessary, the learning data 14a may be split into training, development, and evaluation sets, or subjected to data augmentation.
Note that preprocessing such as contrast normalization and face detection may be performed so that only a certain region of the video data is used. The codec and other properties of the input data (video data) are not particularly limited.
Specifically, when the degree of understanding is estimated from video data in the estimation processing described later, H.264-format video data recorded by a web camera at 30 frames per second, for example, may be resized so that each side is 224 pixels. Each of the X pieces of video data is assigned the personal ID of one of the S subjects and a correct label for the degree of understanding.
The input data and the reference data need only be selected so that no identical data appears in both. During the estimation processing described later, the input data 14b and the reference data 14c may be generated from arbitrary combinations of the learning data 14a while avoiding any overlap of identical data.
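The disjoint-split requirement above can be sketched as a simple partitioning step. This is an illustrative sketch only; the record fields and function name are hypothetical and not taken from the patent:

```python
import random

def split_input_reference(records, n_reference, seed=0):
    """Randomly partition learning-data records (dicts with a unique
    'data_id') into a reference set and an input set so that no
    record appears on both sides."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    reference = shuffled[:n_reference]
    input_data = shuffled[n_reference:]
    return input_data, reference

# Toy learning data: 10 videos of one subject with 5-level labels.
records = [{"data_id": i, "person_id": "S1", "label": 1 + i % 5}
           for i in range(10)]
inp, ref = split_input_reference(records, n_reference=3)
```

Because each record is placed on exactly one side of the cut, identical data can never be mixed between the input and reference sets.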
The control unit 15 is implemented using a CPU (Central Processing Unit), an NP (Network Processor), an FPGA (Field Programmable Gate Array), or the like, and executes a processing program stored in memory. The control unit 15 thereby functions as an acquisition unit 15a, a calculation unit 15b, an estimation unit 15c, and a learning unit 15d, as illustrated in FIG. 1. Note that these functional units may each be implemented on different hardware; for example, the acquisition unit 15a may be implemented on hardware different from that of the other functional units. The control unit 15 may also include other functional units.
The acquisition unit 15a acquires learning data that includes nonverbal information or paralinguistic information and a correct label representing the state of mind appearing in that information. Specifically, the acquisition unit 15a acquires, via the input unit 11, or via the communication control unit 13 from a device that generates learning data, the learning data 14a including video data showing the upper body of a subject as nonverbal/paralinguistic information, a data ID identifying each piece of video data, and a correct label representing the state of mind, such as the degree of understanding, appearing in each piece of video data.
The acquisition unit 15a also stores, in the storage unit 14, the learning data 14a acquired in advance of the processing described below. Alternatively, the acquisition unit 15a may transfer the acquired learning data 14a to the estimation unit 15c described below without storing it in the storage unit 14.
Returning to the description of FIG. 1, the calculation unit 15b uses the respective feature amounts of the input data 14b and the reference data 14c in the acquired learning data 14a to calculate an embedded representation of the state of mind of the input data and an embedded representation of the state of mind of the reference data.
Note that the processing using neural networks described below is not limited to this embodiment; for example, elements of well-known techniques such as batch normalization, dropout, and L1/L2 regularization may be applied at arbitrary points.
Specifically, the calculation unit 15b first extracts feature amounts from the input data 14b and the reference data 14c for the same subject. For example, the calculation unit 15b extracts, as feature amounts, the log mel-filterbank or MFCC (mel-frequency cepstral coefficients) of the audio, or the HOG (Histogram of Oriented Gradients) and HOF (Histogram of Optical Flow) of each video frame. The video itself may also be used as a feature amount.
The calculation unit 15b may also perform, as necessary, preprocessing such as speech enhancement, noise removal, contrast normalization, cropping of the region around the face, and feature normalization. Furthermore, before extracting the feature amounts, the calculation unit 15b may perform data augmentation such as superimposing noise or reverberation, rotating the video, or adding noise.
Specifically, for the video data x1:T of the input data 14b with frame length T and the video data y1:T(1,…,N) of the N pieces of reference data 14c, the calculation unit 15b crops only the region around the face into a square and resizes it again so that each side is 224 pixels. The calculation unit 15b also normalizes the value of each pixel into the range 0.0 to 1.0. All of the video data of the N pieces of reference data 14c may be assigned the correct label of understanding level 3, or labels of understanding levels 1 to 5 may be mixed.
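As a hedged illustration of the cropping, resizing, and 0.0-1.0 normalization described above (nearest-neighbour resizing with NumPy only; the helper name and the (top, left, side) box format are assumptions, not the patent's actual pipeline):

```python
import numpy as np

def preprocess_frame(frame, box, size=224):
    """Crop the face region `box` = (top, left, side) to a square,
    resize it to `size` x `size` by nearest-neighbour sampling, and
    scale pixel values from 0-255 into the 0.0-1.0 range."""
    top, left, side = box
    crop = frame[top:top + side, left:left + side]
    idx = np.arange(size) * side // size  # nearest-neighbour sampling grid
    resized = crop[idx][:, idx]
    return resized.astype(np.float32) / 255.0

# Toy 480x640 RGB frame with a hypothetical face box.
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
out = preprocess_frame(frame, box=(100, 200, 300))
# out.shape == (224, 224, 3); values lie in [0.0, 1.0]
```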
It is also assumed that the input data 14b and the reference data 14c are selected so that the same video data is not mixed into both. For the video data of the reference data 14c, mixing of identical video data may be avoided by preprocessing that deletes or transforms the metadata.
Next, the calculation unit 15b calculates embedded representations from the feature amounts of the input data 14b and the reference data 14c. For example, the calculation unit 15b calculates a time-wise embedded representation H using a 2D CNN (Convolutional Neural Network) and an RNN (Recurrent Neural Network). The calculation unit 15b may replace the 2D CNN with a 3D CNN, or the RNN with a Transformer.
The model parameters 14d may include parameters pre-trained on any other task, or their initial values may be generated from arbitrary random numbers. When trained model parameters 14d are used, whether or not to update them may be decided arbitrarily.
Specifically, as shown in FIG. 4, the calculation unit 15b uses a 2D CNN and an RNN with a D-dimensional output to calculate the embedded representation tensor Hx from the input video data x1:T, as shown in the following equation (1). Here, θ is the parameter set of the CNN and φ is the parameter set of the RNN.
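Equation (1) composes a CNN with parameter set θ and an RNN with parameter set φ. A minimal numerical sketch of that composition, with toy stand-ins for both networks (all names, sizes, and the linear/Elman stand-ins are illustrative, not the patent's actual model):

```python
import numpy as np

D = 8  # output dimension of the RNN

def cnn_features(frames, theta):
    """Stand-in for the 2D CNN: one linear map per flattened frame."""
    return frames.reshape(len(frames), -1) @ theta

def rnn_embed(feats, phi):
    """Minimal Elman-style RNN producing a time-wise embedding H (T x D)."""
    W_in, W_rec = phi
    h = np.zeros(D)
    H = []
    for f in feats:
        h = np.tanh(f @ W_in + h @ W_rec)
        H.append(h)
    return np.stack(H)  # embedded representation tensor, one row per frame

rng = np.random.default_rng(0)
T, P = 5, 16                            # frame count, pixels per flattened frame
theta = rng.normal(size=(P, 4))         # CNN parameter set (toy)
phi = (rng.normal(size=(4, D)), rng.normal(size=(D, D)))  # RNN parameter set
x = rng.normal(size=(T, 4, 4))          # toy input video x_{1:T}
H_x = rnn_embed(cnn_features(x, theta), phi)
```

The same pipeline applied to each reference video y(n) would yield the tensor Hy of equation (2).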
The calculation unit 15b also calculates an embedded representation tensor Hy from the reference video data y1:T(1,…,N), as shown in the following equation (2).
Next, the calculation unit 15b compares the embedded representation calculated from the input video data x1:T with the embedded representations calculated from the reference video data y1:T(1,…,N). Specifically, as shown in FIG. 4, the calculation unit 15b compares the embedded representation tensor Hx calculated from the input data 14b with the embedded representation tensor Hy calculated from the reference data 14c to calculate e(1,…,N).
For example, the calculation unit 15b performs the comparison using a source-target attention mechanism. In this case, the calculation unit 15b calculates the comparison result vectors e(1,…,N) as shown in the following equation (3).
In the above equation (3), the calculation unit 15b calculates attention weights from the queries Qi(1,…,N) and the keys Ki, applies them to the values Vi, and finally computes the sum over the time dimension.
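A single-head sketch of this source-target attention step, under the assumption that queries come from the reference embedding and keys/values from the input embedding; the patent's multi-head form and exact query/key roles may differ, so treat this as illustrative only:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def source_target_attention(H_x, H_y, W_q, W_k, W_v):
    """Queries from the reference embedding H_y, keys/values from the
    input embedding H_x; the attended output is summed over the time
    dimension to give one comparison vector e."""
    Q, K, V = H_y @ W_q, H_x @ W_k, H_x @ W_v
    w = softmax(Q @ K.T / np.sqrt(K.shape[1]))  # attention weights
    return (w @ V).sum(axis=0)                  # sum over time -> e

rng = np.random.default_rng(1)
T, D = 6, 8
H_x, H_y = rng.normal(size=(T, D)), rng.normal(size=(T, D))
W_q, W_k, W_v = (rng.normal(size=(D, D)) for _ in range(3))
e = source_target_attention(H_x, H_y, W_q, W_k, W_v)
```

Repeating this for each of the N reference embeddings yields the vectors e(1,…,N).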
Here, d1 is the number of attention heads, i indexes the attention heads, and WiQ, WiK, and WiV represent the weights for the query, key, and value in each attention head.
Note that the calculation unit 15b is not limited to a source-target attention mechanism, and may perform the comparison using predetermined arithmetic operations, concatenation, or the like between the embedded representations.
Next, as shown in FIG. 4, the calculation unit 15b compares the vectors e(1,…,N) with one another to calculate a comparison result vector v. At this time, the calculation unit 15b may add arbitrary information, such as the metadata of y1:T(1,…,N), to e(1,…,N).
For example, using a multi-head self-attention mechanism, the calculation unit 15b concatenates the metadata m(1,…,N) of y1:T(1,…,N) with e(1,…,N), as shown in the following equation (4), and further generates a tensor E1:T that combines them. Here, the metadata m(1,…,N) is a one-hot vector representation of the C-level understanding label (C = 5).
Further, the calculation unit 15b calculates v from E1:T using the multi-head self-attention mechanism, as shown in the following equation (5).
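Equations (4) and (5) can be illustrated with a single-head sketch: each comparison vector e(n) is concatenated with the one-hot understanding label of its reference video, and the resulting rows are pooled by self-attention into one comparison result vector v. The head count, the mean pooling, and all names here are simplifying assumptions:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def one_hot(label, num_classes=5):
    m = np.zeros(num_classes)
    m[label - 1] = 1.0          # understanding labels run 1..C
    return m

def pool_comparisons(es, labels, W_q, W_k, W_v):
    """Concatenate each comparison vector e^(n) with the one-hot label
    of its reference video (eq. 4), then pool the rows with a
    single-head self-attention into one vector v (eq. 5, simplified)."""
    E = np.stack([np.concatenate([e, one_hot(l)]) for e, l in zip(es, labels)])
    Q, K, V = E @ W_q, E @ W_k, E @ W_v
    w = softmax(Q @ K.T / np.sqrt(K.shape[1]))
    return (w @ V).mean(axis=0)  # comparison result vector v

rng = np.random.default_rng(2)
N, D, C = 3, 8, 5
es = [rng.normal(size=D) for _ in range(N)]
labels = [2, 3, 4]               # understanding labels of the references
W_q, W_k, W_v = (rng.normal(size=(D + C, D + C)) for _ in range(3))
v = pool_comparisons(es, labels, W_q, W_k, W_v)
```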
Here, d2 is the number of attention heads, j indexes the attention heads, and WjQ, WjK, and WjV represent the weights for the query, key, and value in each attention head.
Returning to the description of FIG. 1, the estimation unit 15c estimates the state of mind of the input data 14b using the result of comparing the embedded representation calculated from the input data 14b with the embedded representation calculated from the reference data 14c.
For example, as described above, the estimation unit 15c estimates the state of mind of the input data 14b using the result of comparing, with the multi-head self-attention mechanism, the embedded representation calculated by the calculation unit 15b from the input data 14b with the embedded representation calculated from the reference data 14c.
Specifically, as shown in FIG. 4, the estimation unit 15c estimates the state of mind of the input data 14b from the comparison result vector v calculated by the calculation unit 15b. At this time, the estimation unit 15c may treat the estimation as a classification problem with an arbitrary number of classes and calculate the posterior probability for each class; that is, the estimation unit 15c may estimate the state of mind by calculating the posterior probability for each class of the state-of-mind classification. Alternatively, the estimation unit 15c may estimate a numerical value representing the state of mind as a regression problem.
For example, as shown in the following equation (6), the estimation unit 15c uses two fully connected layers to calculate the posterior probability p(C|x1:T, y1:T(1,…,N)) for each of the five levels of understanding.
Here, W1FC and W2FC represent the weights of the two fully connected layers, DFC represents the number of output dimensions of the first fully connected layer, and C represents the number of predicted labels (C = 5 in this embodiment). A ReLU function is used as the activation function of the first fully connected layer.
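A hedged sketch of equation (6): two fully connected layers with a ReLU between them and a softmax at the output, giving posterior probabilities over the C = 5 understanding levels. The weights here are random placeholders, not trained parameters:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def posterior(v, W1, W2):
    """Two fully connected layers: ReLU after the first, softmax after
    the second, yielding the posterior over the C understanding classes."""
    hidden = np.maximum(v @ W1, 0.0)   # ReLU activation
    return softmax(hidden @ W2)

rng = np.random.default_rng(3)
D_in, D_fc, C = 13, 16, 5
W1 = rng.normal(size=(D_in, D_fc))     # first fully connected layer
W2 = rng.normal(size=(D_fc, C))        # second fully connected layer
p = posterior(rng.normal(size=D_in), W1, W2)
# p has C entries summing to 1; argmax(p) + 1 is the estimated level
```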
Returning to the description of FIG. 1, the learning unit 15d uses the input data 14b and the estimated state of mind of the input data 14b to learn the model parameters 14d of a model that estimates the state of mind appearing in input nonverbal information or paralinguistic information.
Specifically, the learning unit 15d updates the model parameter set Ω and obtains the trained model parameter set Ω'. The learning unit 15d can apply well-known loss functions and update methods. For example, the model parameter set Ω may include parameters pre-trained on any other task, its initial values may be generated from arbitrary random numbers, and some of the model parameters may be left un-updated.
For example, the learning unit 15d uses stochastic gradient descent (SGD) to update the model parameter set Ω with the cross entropy L shown in the following equation (7) as the loss function.
Here, mx is the correct-answer distribution of the input video data x1:T. The method of expressing the correct-answer distribution is not particularly limited; it may be expressed, for example, as a one-hot vector, or it may be represented by approximating a normal distribution centered on the correct class.
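With a one-hot correct-answer distribution mx, the cross entropy of equation (7) reduces to the negative log-probability of the correct class, as this small sketch shows:

```python
import numpy as np

def cross_entropy(p, m_x, eps=1e-12):
    """L = -sum_c m_x[c] * log p[c]; with a one-hot m_x this is the
    negative log-probability assigned to the correct class."""
    return -np.sum(m_x * np.log(p + eps))

p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])    # posterior over 5 levels
m_x = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # one-hot: correct level is 3
L = cross_entropy(p, m_x)                  # = -log(0.4)
```

SGD would then update Ω along the negative gradient of L with respect to each parameter.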
Note that the learning unit 15d stores the obtained trained model parameter set Ω' in the storage unit 14 as the model parameters 14d.
In this case, as described above, the calculation unit 15b uses the trained model parameters 14d to calculate the embedded representation of the state of mind of the input data 14b and the embedded representation of the state of mind of the reference data 14c.
[Estimation process]
Next, the estimation processing performed by the estimation device 10 will be described. FIG. 5 is a flowchart showing the estimation processing procedure. The flowchart of FIG. 5 is started, for example, when an input instructing the start of the estimation processing is received.
First, the acquisition unit 15a acquires the learning data 14a including nonverbal information or paralinguistic information and a correct label representing the state of mind appearing in that information (step S1).
Next, the calculation unit 15b uses the respective feature amounts of the input data 14b and the reference data 14c in the acquired learning data 14a to calculate an embedded representation of the state of mind of the input data and an embedded representation of the state of mind of the reference data (step S2).
The calculation unit 15b then compares the embedded representation calculated from the input data 14b with the embedded representation calculated from the reference data 14c (step S3).
Then, the estimation unit 15c estimates the state of mind of the input data 14b using the result of comparing the embedded representation calculated from the input data 14b with the embedded representation calculated from the reference data 14c (step S4). This completes the series of estimation processes.
[Effect]
As described above, in the estimation device 10 of the present embodiment, the acquisition unit 15a acquires learning data including nonverbal information or paralinguistic information and a correct label representing the state of mind appearing in that information. The calculation unit 15b uses the respective feature amounts of the input data 14b and the reference data 14c in the acquired learning data 14a to calculate an embedded representation of the state of mind of the input data 14b and an embedded representation of the state of mind of the reference data 14c. The estimation unit 15c then estimates the state of mind of the input data 14b using the result of comparing the embedded representation calculated from the input data 14b with the embedded representation calculated from the reference data 14c.
Specifically, the estimation unit 15c compares the embedded representation calculated from the input data 14b with the embedded representation calculated from the reference data 14c using a multi-head self-attention mechanism. The estimation unit 15c also estimates the state of mind by calculating the posterior probability for each class of the state-of-mind classification. Alternatively, the estimation unit 15c estimates a numerical value representing the state of mind as a regression problem.
In this way, the estimation device 10 estimates the state of mind using, as reference information, a plurality of pre-registered correct labels other than the normal state. This makes it possible to accurately estimate the label representing the state of mind appearing in nonverbal/paralinguistic information even when the variance of the labels is too large for individual differences to be fully normalized or absorbed.
Moreover, since no retraining of the model is required, there is no need to collect an appropriate, well-balanced amount of data for each classification class or to monitor the learning, so processing is possible with few resources.
The learning unit 15d also uses the input data 14b and the estimated state of mind of the input data 14b to learn the model parameters 14d of a model that estimates the state of mind appearing in input nonverbal information or paralinguistic information. In this case, the calculation unit 15b uses the trained model parameters 14d to calculate the embedded representation of the state of mind of the input data 14b and the embedded representation of the state of mind of the reference data 14c. This makes it possible to estimate the label representing the state of mind appearing in nonverbal/paralinguistic information with even higher accuracy.
[Example]
FIG. 6 is a diagram for explaining an example. FIG. 6 shows the accuracy of each of three methods, including the present invention, when estimating five levels of understanding for unknown video data of the same person. Here, a general method that does not use reference data ("none"), a method that absorbs individual differences using reference data of the normal state only ("understanding level 3 only"), and a method applying the present invention ("understanding levels 2, 3, 4") were applied. With "none", the degree of understanding was estimated by applying a self-attention mechanism and a fully connected layer to the embedded representation Hx without using reference data. With "understanding level 3 only", the degree of understanding was estimated using only reference data of understanding level 3 (N = 3). With "understanding levels 2, 3, 4", the degree of understanding was estimated using reference data of understanding levels 2 to 4 (N = 3).
As shown in FIG. 6, it was confirmed that, compared with the general method that does not use reference data ("none") and the method that uses reference data of the normal state only ("understanding level 3 only"), the method of the present invention, which uses reference data of multiple states ("understanding levels 2, 3, 4"), stably improves the accuracy of estimating the degree of understanding appearing in video data.
[Program]
It is also possible to create a program in which the processing executed by the estimation device 10 according to the above embodiment is written in a computer-executable language. As one embodiment, the estimation device 10 can be implemented by installing, on a desired computer, an estimation program that executes the above estimation processing as packaged or online software. For example, an information processing device can be made to function as the estimation device 10 by having it execute the above estimation program. Information processing devices here also include mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) devices, as well as slate terminals such as PDAs (Personal Digital Assistants). The functions of the estimation device 10 may also be implemented on a cloud server.
FIG. 7 is a diagram showing an example of a computer that executes the estimation program. The computer 1000 includes, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.
The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1031. The disk drive interface 1040 is connected to a disk drive 1041, into which a removable storage medium such as a magnetic disk or an optical disk is inserted. A mouse 1051 and a keyboard 1052 are connected to the serial port interface 1050, for example. A display 1061 is connected to the video adapter 1060, for example.
The hard disk drive 1031 stores, for example, an OS 1091, application programs 1092, program modules 1093, and program data 1094. Each piece of information described in the above embodiment is stored, for example, in the hard disk drive 1031 or the memory 1010.
The estimation program is stored in the hard disk drive 1031, for example, as a program module 1093 in which instructions to be executed by the computer 1000 are written. Specifically, the hard disk drive 1031 stores a program module 1093 describing each process executed by the estimation device 10 of the above embodiment.
Data used for information processing by the estimation program is stored as program data 1094, for example, in the hard disk drive 1031. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the hard disk drive 1031 into the RAM 1012 as necessary and executes each of the procedures described above.
The program module 1093 and the program data 1094 related to the estimation program are not limited to being stored in the hard disk drive 1031; for example, they may be stored in a removable storage medium and read by the CPU 1020 via the disk drive 1041 or the like. Alternatively, the program module 1093 and the program data 1094 related to the estimation program may be stored in another computer connected via a network such as a LAN (Local Area Network) or a WAN (Wide Area Network) and read by the CPU 1020 via the network interface 1070.
Although an embodiment applying the invention made by the present inventor has been described above, the present invention is not limited by the descriptions and drawings that form part of this disclosure. That is, other embodiments, examples, operational techniques, and the like made by those skilled in the art on the basis of this embodiment are all included within the scope of the present invention.
[Reference Signs List]
10 Estimation device
11 Input unit
12 Output unit
13 Communication control unit
14 Storage unit
14a Learning data
14b Input data
14c Reference data
14d Model parameters
15 Control unit
15a Acquisition unit
15b Calculation unit
15c Estimation unit
15d Learning unit
Claims (7)
- An estimation method executed by an estimation device, the method comprising:
an acquisition step of acquiring learning data including nonverbal information or paralinguistic information and a correct label representing a state of mind appearing in the nonverbal information or paralinguistic information;
a calculation step of calculating, using respective feature values of input data and reference data in the acquired learning data, an embedded representation of the state of mind of the input data and an embedded representation of the state of mind of the reference data; and
an estimation step of estimating the state of mind of the input data using a result of comparing the embedded representation calculated from the input data with the embedded representation calculated from the reference data.
- The estimation method according to claim 1, wherein the estimation step compares the embedded representation calculated from the input data with the embedded representation calculated from the reference data using a multi-head self-attention mechanism.
- The estimation method according to claim 1, wherein the estimation step estimates the state of mind by calculating a posterior probability for each class of the state-of-mind classification.
- The estimation method according to claim 1, wherein the estimation step estimates a numerical value representing the state of mind as a regression problem.
- The estimation method according to claim 1, further comprising a learning step of learning, using the input data and the estimated state of mind of the input data, model parameters of a model that estimates the state of mind appearing in input nonverbal information or paralinguistic information,
wherein the calculation step calculates the embedded representation of the state of mind of the input data and the embedded representation of the state of mind of the reference data using the learned model parameters.
- An estimation device comprising:
an acquisition unit that acquires learning data including nonverbal information or paralinguistic information and a correct label representing a state of mind appearing in the nonverbal information or paralinguistic information;
a calculation unit that calculates, using respective feature values of input data and reference data in the acquired learning data, an embedded representation of the state of mind of the input data and an embedded representation of the state of mind of the reference data; and
an estimation unit that estimates the state of mind of the input data using a result of comparing the embedded representation calculated from the input data with the embedded representation calculated from the reference data.
- An estimation program for causing a computer to execute:
an acquisition step of acquiring learning data including nonverbal information or paralinguistic information and a correct label representing a state of mind appearing in the nonverbal information or paralinguistic information;
a calculation step of calculating, using respective feature values of input data and reference data in the acquired learning data, an embedded representation of the state of mind of the input data and an embedded representation of the state of mind of the reference data; and
an estimation step of estimating the state of mind of the input data using a result of comparing the embedded representation calculated from the input data with the embedded representation calculated from the reference data.
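The claims above describe a three-stage pipeline: embed the input data and the reference data, compare the embeddings (claim 2 names a multi-head self-attention mechanism for the comparison), and derive a posterior probability for each state-of-mind class (claim 3). The NumPy sketch below is a minimal, hypothetical illustration of that pipeline, not the patented implementation: the random embeddings, the identity head projections, and the classifier weights `W` are stand-ins for parameters that, per claim 5, would be learned from labeled nonverbal/paralinguistic data.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def compare_embeddings(query, refs, num_heads=2):
    """Compare one input embedding against reference embeddings with
    multi-head scaled dot-product attention (identity projections, for
    illustration only)."""
    d = query.shape[-1]                                   # embedding dim
    dh = d // num_heads                                   # dim per head
    q = query.reshape(num_heads, dh)                      # (heads, dh)
    k = refs.reshape(refs.shape[0], num_heads, dh)        # (n_ref, heads, dh)
    scores = np.einsum("hd,nhd->hn", q, k) / np.sqrt(dh)  # (heads, n_ref)
    weights = softmax(scores, axis=-1)                    # attend over refs
    context = np.einsum("hn,nhd->hd", weights, k).reshape(d)
    return context, weights


rng = np.random.default_rng(0)
x_emb = rng.normal(size=8)          # embedding of the input data (hypothetical)
ref_embs = rng.normal(size=(3, 8))  # embeddings of 3 reference states

context, attn = compare_embeddings(x_emb, ref_embs)

# Posterior over state-of-mind classes from the comparison result (claim 3);
# W is a hypothetical classifier weight matrix, normally learned (claim 5).
W = rng.normal(size=(2 * 8, 3))
posterior = softmax(np.concatenate([x_emb, context]) @ W)
```

In a trained system, the attention weights would indicate which reference state the input most resembles, and `posterior` would give the estimated class probabilities; claim 4's regression variant would replace the final softmax with a scalar output head.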
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2023544819A JPWO2023032014A1 (en) | 2021-08-30 | 2021-08-30 | |
PCT/JP2021/031791 WO2023032014A1 (en) | 2021-08-30 | 2021-08-30 | Estimation method, estimation device, and estimation program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/031791 WO2023032014A1 (en) | 2021-08-30 | 2021-08-30 | Estimation method, estimation device, and estimation program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023032014A1 (en) | 2023-03-09 |
Family
ID=85412275
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/031791 WO2023032014A1 (en) | 2021-08-30 | 2021-08-30 | Estimation method, estimation device, and estimation program |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPWO2023032014A1 (en) |
WO (1) | WO2023032014A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180114057A1 (en) * | 2016-10-21 | 2018-04-26 | Samsung Electronics Co., Ltd. | Method and apparatus for recognizing facial expression |
WO2021166207A1 (en) * | 2020-02-21 | 2021-08-26 | 日本電信電話株式会社 | Recognition device, learning device, method for same, and program |
- 2021-08-30: JP application JP2023544819A filed (status: Pending)
- 2021-08-30: WO application PCT/JP2021/031791 filed (Application Filing)
Non-Patent Citations (1)
Title |
---|
MIMURA ATSUSHI, HAGIWARA MASAFUMI: "Understanding Presumption System from Facial Images", DENKI GAKKAI RONBUNSHI. C, EREKUTORONIKUSU, JOHO KOGAKU, SHISUTEMU, THE INSTITUTE OF ELECTRICAL ENGINEERS OF JAPAN, 1 February 2000 (2000-02-01), pages 273 - 278, XP093043173, Retrieved from the Internet <URL:https://www.jstage.jst.go.jp/article/ieejeiss1987/120/2/120_2_273/_pdf/-char/ja> [retrieved on 20230501], DOI: 10.1541/ieejeiss1987.120.2_273 * |
Also Published As
Publication number | Publication date |
---|---|
JPWO2023032014A1 (en) | 2023-03-09 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21955909; Country of ref document: EP; Kind code of ref document: A1
| WWE | Wipo information: entry into national phase | Ref document number: 2023544819; Country of ref document: JP
| NENP | Non-entry into the national phase | Ref country code: DE