WO2022064660A1 - Machine learning program, machine learning method, and estimation device - Google Patents

Machine learning program, machine learning method, and estimation device

Info

Publication number
WO2022064660A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
machine learning
model
subject
movement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2020/036456
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
淳哉 斎藤
昭嘉 内田
健太郎 村瀬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to CN202080105040.6A (CN116018613A)
Priority to EP20955254.6A (EP4220546A4)
Priority to JP2022551068A (JP7396509B2)
Priority to PCT/JP2020/036456 (WO2022064660A1)
Publication of WO2022064660A1
Priority to US18/119,342 (US20230237845A1)
Anticipated expiration
Legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G06V40/176 Dynamic expression
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G06V40/175 Static expression
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Definitions

  • An embodiment of the present invention relates to a facial expression estimation technique.
  • AU: Action Unit.
  • A typical AU estimation engine, which estimates AUs, is built by machine learning on a large amount of teacher data. The teacher data consists of facial expression image data together with the Occurrence (presence or absence) and Intensity (strength of occurrence) of each AU.
  • The Occurrence and Intensity in the teacher data are annotated by a specialist called a coder. In the following, only Intensity may be described, but the same applies to Occurrence.
  • An AU's Intensity is uniformly defined as the movement of facial muscles.
  • However, what can be observed from the outside is the amount of movement of the skin surface and changes in appearance, which vary with age, skeletal structure, degree of obesity, and how the skin and facial muscles are connected. It is therefore difficult to define a unified Intensity boundary standard (hereinafter sometimes simply called the "standard") for all people, and the boundary criteria of externally observable Intensity are inevitably ambiguous.
  • The coder infers the movement of the facial muscles while watching a video of the subject to be labeled, and assigns an Intensity to each frame image as the correct label.
  • Because the boundary standard of externally observable Intensity is ambiguous and not uniform across people, the correct label given by the coder may deviate from the boundary standard depending on the subject.
  • One aspect is to provide a machine learning program, a machine learning method, and an estimation device that can improve the accuracy of facial expression estimation.
  • The machine learning program causes a computer to execute a process of generating a trained model and a process of generating a third model.
  • The process of generating the trained model uses training data that includes a pair of a first image and a second image, and a first label indicating which of the first image and the second image shows the larger movement of the subject's facial muscles.
  • A first output value is obtained by inputting the first image into a first model, and a second output value is obtained by inputting the second image into a second model that shares parameters with the first model.
  • The trained model is generated by performing machine learning of the first model based on the first output value, the second output value, and the first label.
  • The process of generating the third model obtains a third output value by inputting a third image into the trained model.
  • The third model is generated by machine learning based on the third output value and a second label indicating the intensity, or the presence or absence of occurrence, of the movement of the subject's facial muscles included in the third image.
  • the accuracy of facial expression estimation can be improved.
  • FIG. 1 is an explanatory diagram illustrating an outline of an embodiment.
  • FIG. 2 is a block diagram showing a functional configuration example of the information processing apparatus according to the first embodiment.
  • FIG. 3 is a flowchart showing an operation example of the information processing apparatus according to the first embodiment.
  • FIG. 4 is a flowchart showing an operation example of the information processing apparatus according to the first embodiment.
  • FIG. 5 is a block diagram showing a functional configuration example of the information processing apparatus according to the second embodiment.
  • FIG. 6 is a block diagram showing a functional configuration example of the information processing apparatus according to the third embodiment.
  • FIG. 7 is a block diagram showing a functional configuration example of the information processing apparatus according to the fourth embodiment.
  • FIG. 8 is an explanatory diagram illustrating the processing content of the information processing apparatus according to the fourth embodiment.
  • FIG. 9 is a block diagram showing an example of a computer configuration.
  • In the embodiment, the information used for learning is selected from the learning data set (training data) so that learning excludes information affected by changes in the boundary criteria.
  • FIG. 1 is an explanatory diagram illustrating an outline of an embodiment.
  • The training data set D1 consists of an image pair (a1, a2), which is a set of images of the same subject a, and a correct answer label indicating which image of the pair shows the larger movement of the facial muscles of subject a.
  • The correct answer label is obtained by comparing the Intensity values that the coder assigned to each image as correct answer flags.
  • If the Intensity in image a1 is larger than the Intensity in image a2, "1" is used as the correct label. If the Intensity in image a1 is smaller than the Intensity in image a2, "0" is used as the correct label.
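  • The labeling rule above can be sketched as a small function. This is a minimal illustration, not the patent's implementation: the name `pair_label` is hypothetical, and returning `None` for equal Intensity values is an assumption, since the text does not say how ties are handled.

```python
from typing import Optional


def pair_label(intensity_a1: float, intensity_a2: float) -> Optional[int]:
    """Return 1 if image a1 shows the larger movement, 0 if image a2 does.

    Assumption: pairs with equal Intensity return None so the caller can
    skip them; the source text does not define a label for ties.
    """
    if intensity_a1 > intensity_a2:
        return 1
    if intensity_a1 < intensity_a2:
        return 0
    return None
```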
  • the learning data set D1 is prepared for each of a plurality of subjects.
  • The neural network NN is trained on the prepared training data set D1 so that it can calculate an order Intensity whose order relationship is guaranteed.
  • The neural network NN consists of the pre-learning models M1 and M2, which are trained in advance on a general object recognition data set and share their parameters (weights) with each other, and a loss function F1 that evaluates the output values of the pre-learning models M1 and M2.
  • From the pre-learning model M1, the order Intensity (I_a1) is obtained as the output value for the input of image a1 in the training data set D1. Likewise, from the pre-learning model M2, the order Intensity (I_a2) is obtained as the output value for the input of image a2 in the training data set D1.
  • The loss function F1 is computed from the correct label L1 and the output values (I_a1 and I_a2) of the pre-learning models M1 and M2. If the correct label is 1 and I_a1 > I_a2, or the correct label is 0 and I_a1 < I_a2, the order is correct, so a small loss value is calculated. Conversely, if the correct label is 1 and I_a1 < I_a2, or the correct label is 0 and I_a1 > I_a2, the order is incorrect, so a large loss value is output.
  • The parameters of the neural network NN (the parameters of the pre-learning model M1) are relearned so that the loss value output by the loss function F1 becomes small.
  • With the model that uses the relearned parameters (the trained model), an order Intensity whose order relationship is guaranteed can be calculated for an input image.
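  • To make the behavior of such a loss concrete, the sketch below uses a logistic pairwise ranking loss. This is an assumed stand-in chosen for illustration, not the source's loss function F1; the function name is hypothetical. It is small when the two scores are ordered as the label says and large when they are not.

```python
import math


def pairwise_order_loss(i_a1: float, i_a2: float, label: int) -> float:
    """Logistic pairwise ranking loss (assumed stand-in for F1).

    label == 1 means image a1 should score higher; label == 0 means a2 should.
    """
    diff = i_a1 - i_a2
    # Flip the sign so that z > 0 always means "the order is correct".
    z = diff if label == 1 else -diff
    # log(1 + exp(-z)), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

  • Because both images pass through models with shared parameters, a single scoring function produces `i_a1` and `i_a2`, and minimizing this loss pushes the score of the image with the larger movement above the other.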
  • Next, a conversion function is learned that converts the order Intensity estimated by the generated model into Intensity, using the correct answer flags (for example, Intensity values in the range 0 to 5) given by the coder in the training data. This conversion function may instead convert the order Intensity into Occurrence, using the correct answer flags for Occurrence given by the coder.
  • Noise is included in both the order Intensity and the coder's correct answer flags (for example, Intensity values), but because the conversion function takes the form of a simple monotonically increasing function, it can be learned without being affected by the noise.
  • In this way, machine learning is divided into two stages that generate a model and then a conversion function, so that learning excludes information affected by changes in the boundary criteria of Intensity.
  • Through this machine learning, the present embodiment can generate a model that correctly captures features relevant to facial expression estimation, such as the movement of facial muscles, and can improve the accuracy of facial expression estimation in the AU estimation engine.
  • FIG. 2 is a block diagram showing a functional configuration example of the information processing apparatus according to the first embodiment.
  • the information processing apparatus 1 includes a pair data set creation unit 10, an order score learning unit 11, a conversion data set creation unit 12, a conversion function learning unit 13, an order score estimation unit 30, and a conversion processing unit 31.
  • The pair data set creation unit 10, the order score learning unit 11, the conversion data set creation unit 12, and the conversion function learning unit 13 are functional units that mainly perform processing for the learning phase (S1), in which machine learning is executed. The order score estimation unit 30 and the conversion processing unit 31 are functional units that mainly perform processing for the estimation phase (S2), in which the label 51 is estimated from the image 50 to be estimated using the model generated by machine learning.
  • The pair data set creation unit 10 is a processing unit that creates, from the image/label DB 40 storing training data for machine learning, a learning data set D1 that includes an image pair (a1, a2) of the same subject a and a correct answer label indicating which image of the pair shows the larger movement of the facial muscles of subject a.
  • The image/label DB 40 stores as training data, for example, images of a plurality of subjects a, each annotated with a correct answer flag (for example, Intensity) given by a coder or the like and with metadata such as subject information identifying the subject a.
  • the training data stored in the image / label DB 40 may include a moving image in which each image is one frame.
  • The pair data set creation unit 10 obtains, for example, image pairs, each a set of images of the same subject a, based on the metadata of each image in the image/label DB 40. The pair data set creation unit 10 may also obtain from the image/label DB 40 only those image pairs in which the difference in facial muscle movement of subject a between the two images (for example, the difference in Intensity) is equal to or greater than a specific value.
  • In the present embodiment, the same subject a is, for example, a person with the same subject ID in the subject information. However, subjects whose attributes (for example, age, gender, and race) are such that the boundary standard of Intensity does not change between them may also be treated as the same subject a.
  • When the coder assigns correct answer flags per divided video, the standard may differ from video to video, even for the same person, because of the ambiguity of the Intensity boundary standard. In such a case, images may be treated as belonging to the same subject a only when they come from the same video.
  • the pair data set creation unit 10 obtains a correct answer label indicating which image of the image pair has the larger movement of the facial muscle of the subject a by comparing the correct answer flags included in the metadata of the image pair. By repeating the above processing, the pair data set creation unit 10 creates a learning data set D1 for each of the plurality of subjects a.
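  • The pair creation described above can be sketched as follows. This is an illustrative outline under stated assumptions: `Sample`, `build_pair_dataset`, and the `min_diff` default of 1.0 are hypothetical names and values, and the in-memory list stands in for the image/label DB 40.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Sample(NamedTuple):
    image_id: str     # hypothetical image identifier
    subject_id: str   # subject information from the metadata
    intensity: float  # correct answer flag given by the coder


def build_pair_dataset(samples, min_diff=1.0):
    """Return (image_id_1, image_id_2, label) triples, pairing within a subject.

    min_diff must be positive so that pairs with equal Intensity are dropped.
    """
    by_subject = defaultdict(list)
    for s in samples:
        by_subject[s.subject_id].append(s)

    pairs = []
    for group in by_subject.values():
        for s1, s2 in combinations(group, 2):
            # Keep only pairs whose Intensity difference is large enough.
            if abs(s1.intensity - s2.intensity) < min_diff:
                continue
            label = 1 if s1.intensity > s2.intensity else 0
            pairs.append((s1.image_id, s2.image_id, label))
    return pairs
```

  • Grouping by a different key, for example a video identifier instead of the subject ID, would give the stricter "same video only" treatment mentioned above.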
  • The order score learning unit 11 is a processing unit that trains the neural network NN, using the learning data set D1 created by the pair data set creation unit 10, so that it can calculate an order Intensity with a guaranteed order relationship.
  • The order score learning unit 11 constructs the pre-learning models M1 and M2 (which share parameters with each other) from the parameters acquired from the pre-learning model DB 41, which stores the parameters (weights) of the pre-learning models M1 and M2.
  • The pre-learning models M1 and M2 are, for example, VGG16 pre-trained on the ImageNet dataset, with a one-dimensional final-layer output.
  • the order score learning unit 11 obtains the loss value by the loss function F1 based on the output values of the pre-learning models M1 and M2 and the correct answer label L1.
  • As the loss function F1, for example, a function such as the following equation (1) is assumed to be used.
  • The order score learning unit 11 relearns the parameters of the neural network NN (the parameters of the pre-learning model M1) so that the loss value output by the loss function F1 becomes small.
  • The neural network NN may be ResNet.
  • A model trained in advance on a face image data set may also be used.
  • Alternatively, a model whose weights are initialized with random numbers may be used without pre-learning.
  • the order score learning unit 11 stores the parameters obtained by the re-learning in the learning model DB 20.
  • The conversion data set creation unit 12 is a processing unit that creates the data set with which the conversion function learning unit 13 learns the conversion function. Specifically, the conversion data set creation unit 12 reads an image included in the training data of the image/label DB 40 and the correct answer flag (for example, Intensity or Occurrence) given to that image. Next, the conversion data set creation unit 12 inputs the read image to the order score estimation unit 30 and acquires the output (order Intensity) of the trained model generated by the learning of the order score learning unit 11. Next, the conversion data set creation unit 12 creates a data set in which the acquired order Intensity is paired with the Intensity (or Occurrence) of the correct answer flag given to the image as the correct answer label.
  • The conversion function learning unit 13 is a processing unit that uses the data set created by the conversion data set creation unit 12 to machine-learn a conversion function that converts an order Intensity into an Intensity (or Occurrence).
  • The conversion function learning unit 13 stores the parameters of the conversion function obtained by machine learning in the conversion model DB 21.
  • For example, the conversion function learning unit 13 obtains the conversion function by regression learning on the data set created by the conversion data set creation unit 12, so that Intensity values in the range 0 to 5 are output as continuous values. Alternatively, the conversion function learning unit 13 may obtain the conversion function by classification learning, so that the discrete values [0, 1, 2, 3, 4, 5] (or the discrete values 0 and 1 in the case of Occurrence) are output. In either case, the conversion function learning unit 13 can obtain the conversion function with a known machine learning method such as a neural network or an SVM (Support Vector Machine).
  • The order score estimation unit 30 is a processing unit that estimates the order Intensity from an input image (for example, an image 50 input as an estimation target).
  • The order score estimation unit 30 reads the parameters (weights) of the model learned and generated by the order score learning unit 11 from the learning model DB 20 and constructs the model. Next, the order score estimation unit 30 estimates the order Intensity by inputting the image 50 to be estimated into the model.
  • The conversion processing unit 31 is a processing unit that converts the order Intensity estimated by the order score estimation unit 30 into an Intensity (or Occurrence) with the conversion function.
  • The conversion processing unit 31 outputs the converted Intensity (or Occurrence) as the label 51, which is the estimation result for the image 50.
  • The conversion processing unit 31 reads the parameters of the conversion function learned by the conversion function learning unit 13 from the conversion model DB 21 and constructs the conversion function. Next, the conversion processing unit 31 converts the order Intensity estimated by the order score estimation unit 30 into an Intensity (or Occurrence) with the constructed conversion function and outputs it as the label 51.
  • When the conversion function has been learned by regression so as to output continuous values, the conversion processing unit 31 may correct the output to the valid range (0 to 5), using 5 as the upper limit and 0 as the lower limit. The conversion processing unit 31 may also discretize the output value into the six levels 0, 1, 2, 3, 4, and 5 by rounding to the nearest whole number.
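  • The range correction and discretization can be sketched as below. The function name and the `discretize` flag are hypothetical, and rounding halves upward is an assumption about what "rounding to the nearest whole number" means here; note that Python's built-in `round()` rounds halves to the nearest even number instead.

```python
import math


def postprocess_intensity(value: float, lower: float = 0.0, upper: float = 5.0,
                          discretize: bool = False):
    """Clip a regressed Intensity to [lower, upper] and optionally discretize it."""
    clipped = min(max(value, lower), upper)
    if not discretize:
        return clipped
    # Round half up (assumption); built-in round() would round 2.5 down to 2.
    return int(math.floor(clipped + 0.5))
```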
  • FIG. 3 is a flowchart showing an operation example of the information processing apparatus 1 according to the first embodiment. More specifically, FIG. 3 is a flowchart showing an operation example of the learning phase (S1) in the information processing apparatus 1.
  • The pair data set creation unit 10 creates, from the image/label DB 40, a learning data set D1 that includes image pairs of the same subject a and correct answer labels indicating the order relationship of their Intensity (S11).
  • the order score learning unit 11 relearns the neural network NN from the created learning data set D1 (S12), and stores the learned neural network NN parameters (pre-learning model M1 parameters) in the learning model DB 20. (S13).
  • the conversion data set creation unit 12 reads the image and the correct answer flag (for example, Intensity) given to the image from the image / label DB 40 (S14).
  • The conversion data set creation unit 12 inputs the read image to the order score estimation unit 30 and acquires the output (order Intensity) of the trained model generated by the learning of the order score learning unit 11.
  • The conversion data set creation unit 12 creates a learning data set in which the acquired order Intensity is paired with the Intensity of the correct answer flag given to the image as the correct answer label (S15).
  • the conversion function learning unit 13 learns the conversion function from the learning data set created by the conversion data set creation unit 12 (S16).
  • The conversion function learning unit 13 stores the learned conversion function parameters in the conversion model DB 21 (S17) and ends the process.
  • FIG. 4 is a flowchart showing an operation example of the information processing apparatus 1 according to the first embodiment. More specifically, FIG. 4 is a flowchart showing an operation example of the estimation phase (S2) in the information processing apparatus 1.
  • the order score estimation unit 30 acquires the image 50 to be estimated (S21). Next, the order score estimation unit 30 acquires parameters from the learning model DB 20 and constructs a neural network (trained model) (S22).
  • The order score estimation unit 30 estimates the order Intensity for the image 50 by inputting the image 50 into the constructed neural network (trained model) and obtaining an output value (S23).
  • The conversion processing unit 31 acquires parameters from the conversion model DB 21 and constructs the conversion function (S24). Next, the conversion processing unit 31 converts the order Intensity estimated by the order score estimation unit 30 into Intensity with the constructed conversion function (S25). Next, the conversion processing unit 31 outputs the converted Intensity as the estimated label 51 (S26).
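  • The two-step estimation flow (S23, then S25 and S26) can be outlined as a composition of two callables. This is only a sketch: `estimate_label`, `order_model`, and `conversion_fn` are hypothetical names standing in for the trained model and the constructed conversion function.

```python
from typing import Any, Callable


def estimate_label(image: Any,
                   order_model: Callable[[Any], float],
                   conversion_fn: Callable[[float], float]) -> float:
    """Estimate the label for one image in two stages."""
    order_intensity = order_model(image)    # S23: image -> order Intensity
    return conversion_fn(order_intensity)   # S25/S26: order Intensity -> label
```

  • Keeping the two stages as separate callables mirrors the separation between the learning model DB 20 and the conversion model DB 21: either stage can be rebuilt from its stored parameters without touching the other.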
  • The correct answer flags attached to the training data include not only those given by the coder but also the measurement results (Intensity or amount of facial muscle movement) of a measuring device that measures the movement of the subject's facial muscles.
  • FIG. 5 is a block diagram showing a functional configuration example of the information processing apparatus according to the second embodiment.
  • The image/label DB 40a stores as training data images of a plurality of subjects a, each annotated with a correct answer flag taken from the measurement result (Intensity or amount of facial muscle movement) of the measuring device and with metadata such as subject information identifying the subject a.
  • The pair data set creation unit 10a of the information processing apparatus 1a refers to the image/label DBs 40 and 40a and creates a training data set D1 that includes an image pair (a1, a2) of the same subject a and a correct label indicating which image of the pair shows the larger movement of the facial muscles of subject a.
  • The correct answer flag (for example, Intensity) given by the coder may contain noise due to human error, even for the same subject a.
  • In contrast, the measurement result of the measuring device contains no noise due to human error. Therefore, including the measurement results of the measuring device in the correct answer flags can be expected to improve the learning accuracy of the order relationship.
  • For the first-stage learning, training data whose correct answer flags were given by the coder is not used; instead, training data whose correct answer flags are the measurement results of the measuring device is used.
  • For the second-stage learning, the training data to which the coder has added the correct answer flags is used.
  • FIG. 6 is a block diagram showing a functional configuration example of the information processing apparatus according to the third embodiment.
  • the pair data set creation unit 10a of the information processing apparatus 1b does not refer to the image / label DB 40, but creates the learning data set D1 by referring to the image / label DB 40a.
  • the conversion data set creation unit 12 creates a data set for the conversion function learning unit 13 to learn the conversion function with reference to the image / label DB 40.
  • the information processing device 1b can be expected to have the effect of further improving the learning accuracy of the order relationship by learning the order relationship using the measurement result of the measuring device without using the correct answer flag given by the coder.
  • The conversion function is learned by machine learning that also includes feature amounts of the moving image containing the image as one of its frames.
  • In the estimation phase (S2), the label 51 is estimated by the conversion function based in part on the feature amounts of the original moving image that contains the image 50 to be estimated as one frame.
  • FIG. 7 is a block diagram showing a functional configuration example of the information processing apparatus according to the fourth embodiment.
  • the conversion data set creation unit 12a of the information processing apparatus 1c acquires a moving image having the image as one frame together with the image included in the image / label DB 40.
  • the conversion data set creation unit 12a acquires the feature amount of the moving image by analyzing the acquired moving image, and includes the acquired feature amount in the data set for learning the conversion function.
  • The conversion function learning unit 13a of the information processing apparatus 1c uses the data set created by the conversion data set creation unit 12a and, when machine-learning the conversion function that converts an order Intensity into an Intensity (or Occurrence), includes the feature amounts based on the moving image.
  • FIG. 8 is an explanatory diagram illustrating the processing content of the information processing apparatus 1c according to the fourth embodiment.
  • S30 in FIG. 8 shows a specific process executed by the conversion data set creation unit 12a.
  • S40 in FIG. 8 shows a configuration related to processing in the conversion function learning unit 13a.
  • The conversion data set creation unit 12a inputs the image 42 of the image/label DB 40 into the order score estimation unit 30 and acquires the output value (order Intensity) from the trained model (neural network (VGG)) (S31).
  • The conversion data set creation unit 12a inputs a moving image 43 containing the image 42 as one frame into the order score estimation unit 30 and acquires time-series output values (time-series data of order Intensity) from the trained model (neural network (VGG)) (S32).
  • the conversion data set creation unit 12a extracts the feature amount related to the distribution of the time series data (S33).
  • Specifically, the conversion data set creation unit 12a obtains a histogram based on the time-series data (for example, the frequency of order Intensity of -10 or less, the frequency of -10 to -9, the frequency of -9 to -8, ..., the frequency of 9 to 10, and the frequency of 10 or more). The conversion data set creation unit 12a also obtains percentiles (0th percentile, 10th percentile, ..., 100th percentile) based on the time-series data.
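  • The distribution features can be sketched in plain Python as follows. The function names are hypothetical, and two details are assumptions because the text does not fix them: which bin a value exactly on a boundary falls into, and the use of linear interpolation between sorted values for percentiles.

```python
import math


def histogram_features(series, low=-10.0, high=10.0, width=1.0):
    """Counts for one underflow bin, the inner bins, and one overflow bin.

    Assumption: values <= low go to the first bin, values >= high go to the
    last bin, and inner bins are closed on the left.
    """
    n_inner = int(round((high - low) / width))
    counts = [0] * (n_inner + 2)
    for x in series:
        if x <= low:
            counts[0] += 1
        elif x >= high:
            counts[-1] += 1
        else:
            counts[1 + int((x - low) // width)] += 1
    return counts


def percentile_features(series, step=10):
    """0th, 10th, ..., 100th percentiles of a non-empty series."""
    ordered = sorted(series)
    feats = []
    for p in range(0, 101, step):
        pos = (len(ordered) - 1) * p / 100.0
        lo, hi = math.floor(pos), math.ceil(pos)
        frac = pos - lo
        # Linear interpolation between the two neighbouring sorted values.
        feats.append(ordered[lo] * (1 - frac) + ordered[hi] * frac)
    return feats
```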
  • The conversion data set creation unit 12a performs image selection processing based on the time-series data (S34). Specifically, the conversion data set creation unit 12a selects from the time-series data one or more images whose order Intensity satisfies a predetermined condition (for example, the image with the minimum order Intensity).
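  • For the example condition given above (the minimum order Intensity), the selection step reduces to an argmin over the time series. The function name is hypothetical, and returning the first index when several frames tie is an assumption.

```python
def select_min_frame(order_intensities):
    """Return the index of the frame with the smallest order Intensity.

    If several frames share the minimum, the earliest one is returned.
    """
    if not order_intensities:
        raise ValueError("time-series data is empty")
    return min(range(len(order_intensities)), key=order_intensities.__getitem__)
```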
  • the conversion data set creation unit 12a performs extraction processing such as an image feature amount and a face feature amount on the selected image (S35). Specifically, the conversion data set creation unit 12a extracts the SIFT feature amount of the image, the landmark of the image, and the like.
  • The conversion function learning unit 13a inputs the data set created in S30 into the conversion function model, which consists of the LSTM 22, the VGG 23, and the fully connected neural network 24, and obtains the Intensity 25.
  • The fully connected neural network 24 is configured to receive the order Intensity of the image obtained in S31 and the other feature amounts.
  • A network for time-series data, such as the LSTM 22, is used where the time-series data itself is input.
  • A network for image data, such as the VGG 23, is used where the image data itself is input.
  • The output of the LSTM 22 and the output of the VGG 23 are connected to the fully connected neural network 24.
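The data flow into the fully connected neural network 24 can be sketched as below. This is only an illustration of how the inputs are concatenated: the two encoders are trivial plain-Python stand-ins for the LSTM 22 and the VGG 23 (a real system would use a deep learning framework), and every function name, layer size, and parameter here is hypothetical.

```python
def encode_sequence(series):
    """Stand-in for the LSTM 22: summarize a time series as [mean, last value]."""
    return [sum(series) / len(series), series[-1]]


def encode_image(pixels):
    """Stand-in for the VGG 23: summarize a flat pixel list as [mean, max]."""
    return [sum(pixels) / len(pixels), max(pixels)]


def dense(inputs, weights, bias):
    """One fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def estimate_intensity(order_intensity, dist_features, series, pixels, params):
    """Concatenate all inputs and pass them through a two-layer dense head."""
    fused = ([order_intensity] + list(dist_features)
             + encode_sequence(series) + encode_image(pixels))
    hidden = relu(dense(fused, params["w1"], params["b1"]))
    return dense(hidden, params["w2"], params["b2"])[0]
```

The point of the sketch is the concatenation step: scalar and hand-crafted features enter the dense layers directly, while raw time-series and image data first pass through their own encoders.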
  • The conversion processing unit 31a estimates the label 51 from the order Intensity estimated by the order score estimation unit 30 and the feature amount of the original moving image 50a, using the conversion function constructed based on the parameters of the conversion model DB 21. Specifically, the conversion processing unit 31a obtains the feature amount of the original moving image 50a in the same manner as the conversion data set creation unit 12a and inputs it to the conversion function together with the order Intensity to estimate the label 51.
  • The information processing apparatus 1 creates a pair data set that includes an image pair included in the image/label DB 40 and a correct label indicating which image of the pair shows the larger movement of the facial muscles of the subject.
  • The information processing apparatus 1 generates a trained model by performing machine learning of the pre-learning model M1 based on the output value obtained by inputting the first image of the image pair into the pre-learning model M1, the output value obtained by inputting the second image of the image pair into the pre-learning model M2, which shares parameters with the pre-learning model M1, and the first label.
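Training two models with shared parameters on an image pair can be sketched as follows. This is a minimal sketch under stated assumptions: the model is a linear scorer over precomputed feature vectors rather than a neural network over images, and a RankNet-style logistic loss is assumed in place of the loss function F1, whose exact form may differ. All names are hypothetical.

```python
import math


def score(weights, features):
    """Shared scoring model: the same weights score both images of a pair."""
    return sum(w * x for w, x in zip(weights, features))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pair_loss(weights, first, second, label):
    """Logistic ranking loss; label is 1.0 if the first image moves more, else 0.0."""
    diff = score(weights, first) - score(weights, second)
    return softplus(diff) - label * diff


def train(pairs, dim, lr=0.1, epochs=200):
    """Plain stochastic gradient descent on the shared weights."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for first, second, label in pairs:
            diff = score(weights, first) - score(weights, second)
            p = sigmoid(diff)
            # d(loss)/d(weights) = (p - label) * (first - second)
            grad = [(p - label) * (a - b) for a, b in zip(first, second)]
            weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights
```

Because both images are scored with the same weights, only the difference of the two scores enters the loss, so the model learns an ordering rather than an absolute scale.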
  • The information processing apparatus 1 generates a model (conversion function) by machine learning based on the output value obtained by inputting a third image included in the image/label DB 40 into the trained model and a label indicating the intensity, or the presence or absence of occurrence, of the movement of the facial muscles of the subject included in that image.
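As a rough illustration of learning a conversion function from the trained model's output value to an intensity label, the sketch below fits a one-dimensional linear map by least squares. The linear form, the function names, and the toy data are assumptions made for illustration; the conversion function in the source may take a different form.

```python
def fit_conversion(order_scores, labels):
    """Least-squares fit of label ~ a * order_score + b; returns (a, b)."""
    n = len(order_scores)
    if n == 0 or n != len(labels):
        raise ValueError("order_scores and labels must be non-empty and equal length")
    mean_x = sum(order_scores) / n
    mean_y = sum(labels) / n
    var_x = sum((x - mean_x) ** 2 for x in order_scores)
    if var_x == 0.0:
        raise ValueError("order scores are all identical")
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(order_scores, labels))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def convert(order_score, a, b):
    """Apply the fitted conversion function to one order score."""
    return a * order_score + b
```

For example, `a, b = fit_conversion(scores, labels)` followed by `convert(new_score, a, b)` would give an estimated intensity for a new image; a presence-or-absence label could be obtained by thresholding that value, which is likewise an assumption.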
  • As a result, the information processing apparatus 1 can generate a model that correctly captures the features relevant to facial expression estimation, such as the movement of the facial muscles. Further, by using the model generated in this way to estimate the label 51 for the image 50 to be estimated, that is, to perform facial expression estimation, the information processing apparatus 1 can improve the accuracy of facial expression estimation.
  • The image pair related to the generation of the trained model is an image pair of the same subject. For example, even if the criterion varies between the subjects of the images included in the image/label DB 40 depending on age, skeleton, degree of obesity, how the skin connects to the facial muscles, and so on, the criterion does not change within the same subject. Therefore, by generating the trained model with image pairs of the same subject, the information processing apparatus 1 can generate a trained model that more appropriately estimates which movement of the facial muscles of the subject is larger (the order relationship of the Intensity).
  • The correct label in the pair data set is given based on the measurement result (for example, Intensity) of a measuring device that measures the movement of the facial muscles of the subject. Even for the same facial muscle movement, the amount of movement differs from person to person, so the criterion in the measurement result of the measuring device varies by subject, just as it does when labels are given by an expert coder. Even when correct labels based on the measurement results of such a measuring device are given, the information processing apparatus 1 can generate a model that correctly captures the features relevant to facial expression estimation.
  • In the image pair related to the generation of the trained model, the difference in the movement of the facial muscles of the subject between the two images is equal to or greater than a specific value. By using image pairs in which the difference is equal to or greater than a specific value, so that the movement of the facial muscles of the subject is clearly different, a more accurate model can be generated.
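The two conditions described above, same subject and a difference of at least a specific value, can be sketched as a simple pair-creation routine. The record layout `(subject_id, image_id, intensity)`, the function name, and the default threshold are hypothetical.

```python
from itertools import combinations


def create_pair_dataset(samples, min_diff=1.0):
    """Build (first_image, second_image, label) triples from labeled images.

    samples: iterable of (subject_id, image_id, intensity).
    label is 1 if the first image shows the larger movement, otherwise 0.
    Only images of the same subject are paired, and only when their
    intensities differ by at least min_diff.
    """
    by_subject = {}
    for subject, image, intensity in samples:
        by_subject.setdefault(subject, []).append((image, intensity))

    pairs = []
    for items in by_subject.values():
        for (img_a, int_a), (img_b, int_b) in combinations(items, 2):
            if abs(int_a - int_b) >= min_diff:
                pairs.append((img_a, img_b, 1 if int_a > int_b else 0))
    return pairs
```

Pairs whose intensities are closer than `min_diff` are simply skipped in this sketch; handling pairs of equal magnitude would require a different label and loss.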
  • The image pairs related to the generation of the trained model may include a pair in which the magnitude of the movement of the facial muscles is the same.
  • In that case, equation (2) is used as the loss function F1.
  • By also using pairs in which the magnitude of the movement of the facial muscles is the same, a more accurate model can be generated.
  • The information processing apparatus generates the model (conversion function) by machine learning that also includes a feature amount based on a moving image including the third image.
  • The feature amount based on the moving image including the third image may be at least one of: the time-series data obtained from the image group included in the moving image, a feature amount related to the distribution of the time-series data, one or more images selected from the image group based on the distribution of the time-series data, and a feature amount of the one or more images. By performing machine learning that includes such feature amounts, the information processing apparatus can generate a more accurate model.
  • Each component of each of the illustrated devices does not necessarily have to be physically configured as shown in the figures. That is, the specific form of distribution and integration of each device is not limited to the one shown in the figures, and all or part of the devices may be functionally or physically distributed or integrated in any unit according to various loads and usage conditions.
  • The functional configuration for performing the learning phase (S1) and the functional configuration for performing the estimation phase (S2) in the information processing apparatuses 1, 1a, 1b, and 1c may be separate, and each may be realized as an independent device.
  • All or any part of the various processing functions of the pair data set creation units 10, 10a, the order score learning unit 11, the conversion data set creation units 12, 12a, the conversion function learning units 13, 13a, the order score estimation unit 30, and the conversion processing units 31, 31a may be executed on a CPU (or a microcomputer such as an MPU or MCU (Micro Controller Unit), or a GPU (Graphics Processing Unit)).
  • Needless to say, all or any part of the various processing functions may be executed on a program analyzed and executed by a CPU (or a microcomputer such as an MPU or MCU, or a GPU) or on hardware using wired logic.
  • The various processing functions performed by the information processing apparatuses 1, 1a, 1b, and 1c may also be executed by a plurality of computers in cooperation through cloud computing.
  • FIG. 9 is a block diagram showing an example of a computer configuration.
  • The computer 200 has a CPU 201 that executes various arithmetic processes, an input device 202 that accepts data input, a monitor 203, and a speaker 204. Further, the computer 200 has a medium reading device 205 for reading a program or the like from a storage medium, an interface device 206 for connecting to various devices, and a communication device 207 for communicating with an external device by wire or wirelessly. Further, the computer 200 has a RAM 208 for temporarily storing various information and a hard disk device 209. Each part (201 to 209) in the computer 200 is connected to the bus 210.
  • The hard disk device 209 stores a program 211 for executing various processes in the functional configurations described in each of the above embodiments (for example, the pair data set creation units 10, 10a, the order score learning unit 11, the conversion data set creation units 12, 12a, the conversion function learning units 13, 13a, the order score estimation unit 30, and the conversion processing units 31, 31a). The hard disk device 209 also stores various data 212 referred to by the program 211.
  • The input device 202 receives, for example, an input of operation information from an operator.
  • The monitor 203 displays, for example, various screens operated by the operator. A printing device or the like, for example, is connected to the interface device 206.
  • The communication device 207 is connected to a communication network such as a LAN (Local Area Network) and exchanges various information with an external device via the communication network.
  • The CPU 201 reads out the program 211 stored in the hard disk device 209, expands it into the RAM 208, and executes it, thereby performing various processes related to the above-mentioned functional configurations (for example, the pair data set creation units 10, 10a, the order score learning unit 11, the conversion data set creation units 12, 12a, the conversion function learning units 13, 13a, the order score estimation unit 30, and the conversion processing units 31, 31a).
  • The program 211 does not have to be stored in the hard disk device 209.
  • For example, the computer 200 may read and execute the program 211 stored in a storage medium readable by the computer 200.
  • The storage medium readable by the computer 200 is, for example, a portable recording medium such as a CD-ROM, a DVD disc, or a USB (Universal Serial Bus) memory, a semiconductor memory such as a flash memory, or a hard disk drive.
  • The program 211 may also be stored in a device connected to a public line, the Internet, a LAN, or the like, and the computer 200 may read the program 211 from that device and execute it.
  • 207 ... Communication device; 208 ... RAM; 209 ... Hard disk device; 210 ... Bus; 211 ... Program; 212 ... Various data; a ... Subject; D1 ... Learning data set; F1 ... Loss function; L1 ... Correct label; M1, M2 ... Pre-learning model; NN ... Neural network

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
PCT/JP2020/036456 2020-09-25 2020-09-25 機械学習プログラム、機械学習方法および推定装置 Ceased WO2022064660A1 (ja)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN202080105040.6A CN116018613A (zh) 2020-09-25 2020-09-25 机器学习程序、机器学习方法以及推定装置
EP20955254.6A EP4220546A4 (en) 2020-09-25 2020-09-25 MACHINE LEARNING PROGRAM, MACHINE LEARNING METHOD AND INFERENCE APPARATUS
JP2022551068A JP7396509B2 (ja) 2020-09-25 2020-09-25 機械学習プログラム、機械学習方法および推定装置
PCT/JP2020/036456 WO2022064660A1 (ja) 2020-09-25 2020-09-25 機械学習プログラム、機械学習方法および推定装置
US18/119,342 US20230237845A1 (en) 2020-09-25 2023-03-09 Machine learning program, machine learning method, and estimation apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/036456 WO2022064660A1 (ja) 2020-09-25 2020-09-25 機械学習プログラム、機械学習方法および推定装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/119,342 Continuation US20230237845A1 (en) 2020-09-25 2023-03-09 Machine learning program, machine learning method, and estimation apparatus

Publications (1)

Publication Number Publication Date
WO2022064660A1 (ja) 2022-03-31

Family

ID=80846435

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/036456 Ceased WO2022064660A1 (ja) 2020-09-25 2020-09-25 機械学習プログラム、機械学習方法および推定装置

Country Status (5)

Country Link
US (1) US20230237845A1
EP (1) EP4220546A4
JP (1) JP7396509B2
CN (1) CN116018613A
WO (1) WO2022064660A1

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230368409A1 (en) * 2022-05-13 2023-11-16 Fujitsu Limited Storage medium, model training method, and model training device
JP2025067257A (ja) * 2023-10-12 2025-04-24 株式会社Ridge-i 情報処理装置、画像評価方法及び画像評価プログラム、教師データ生成装置、教師データ生成方法、教師データ生成プログラム

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014119879A (ja) * 2012-12-14 2014-06-30 Nippon Hoso Kyokai <Nhk> 顔表情評価結果平滑化装置および顔表情評価結果平滑化プログラム
JP2018036734A (ja) 2016-08-29 2018-03-08 日本放送協会 表情変化検出装置及びプログラム
CN109657586A (zh) * 2018-12-10 2019-04-19 华中师范大学 一种基于排序卷积神经网络的人脸表情分析方法及系统
US20190294868A1 (en) 2016-06-01 2019-09-26 Ohio State Innovation Foundation System and method for recognition and annotation of facial expressions
JP2020057111A (ja) 2018-09-28 2020-04-09 パナソニックIpマネジメント株式会社 表情判定システム、プログラム及び表情判定方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102564854B1 (ko) * 2017-12-29 2023-08-08 삼성전자주식회사 정규화된 표현력에 기초한 표정 인식 방법, 표정 인식 장치 및 표정 인식을 위한 학습 방법
CN110188615B (zh) * 2019-04-30 2021-08-06 中国科学院计算技术研究所 一种人脸表情识别方法、装置、介质及系统
CN110765873B (zh) * 2019-09-19 2022-08-16 华中师范大学 一种基于表情强度标签分布的面部表情识别方法与装置
CN111582067B (zh) * 2020-04-22 2022-11-29 西南大学 人脸表情识别方法、系统、存储介质、计算机程序、终端

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014119879A (ja) * 2012-12-14 2014-06-30 Nippon Hoso Kyokai <Nhk> 顔表情評価結果平滑化装置および顔表情評価結果平滑化プログラム
US20190294868A1 (en) 2016-06-01 2019-09-26 Ohio State Innovation Foundation System and method for recognition and annotation of facial expressions
JP2018036734A (ja) 2016-08-29 2018-03-08 日本放送協会 表情変化検出装置及びプログラム
JP2020057111A (ja) 2018-09-28 2020-04-09 パナソニックIpマネジメント株式会社 表情判定システム、プログラム及び表情判定方法
CN109657586A (zh) * 2018-12-10 2019-04-19 华中师范大学 一种基于排序卷积神经网络的人脸表情分析方法及系统

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
See also references of EP4220546A4
SU LUMEI, SATO YOICHI: "Early facial expression recognition using early RankBoost", 2013 10TH IEEE INTERNATIONAL CONFERENCE AND WORKSHOPS ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG), IEEE, 1 April 2013 (2013-04-01), pages 1 - 7, XP055922220, ISBN: 978-1-4673-5545-2, DOI: 10.1109/FG.2013.6553740 *
YANG, PENG ET AL.: "RankBoost with l1 regularization for Facial Expression Recognition and Intensity Estimation", 2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), pages 1018 - 1025, XP031672701, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5459371&tag=1> [retrieved on 20201130] *

Also Published As

Publication number Publication date
EP4220546A1 (en) 2023-08-02
JPWO2022064660A1 2022-03-31
EP4220546A4 (en) 2023-10-25
US20230237845A1 (en) 2023-07-27
CN116018613A (zh) 2023-04-25
JP7396509B2 (ja) 2023-12-12

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20955254; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2022551068; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2020955254; Country of ref document: EP; Effective date: 20230425)