WO2023013075A1 - Learning device, estimation device, learning method, and learning program

Learning device, estimation device, learning method, and learning program

Info

Publication number
WO2023013075A1
WO2023013075A1 (PCT/JP2021/029440)
Authority
WO
WIPO (PCT)
Prior art keywords
age
data
posterior probability
source data
neural network
Prior art date
Application number
PCT/JP2021/029440
Other languages
English (en)
Japanese (ja)
Inventor
直弘 俵
厚徳 小川
佑樹 北岸
歩相名 神山
Original Assignee
日本電信電話株式会社
Priority date
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to PCT/JP2021/029440 (WO2023013075A1)
Priority to JP2023539586A (JPWO2023013075A1)
Publication of WO2023013075A1 publication Critical patent/WO2023013075A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods

Definitions

  • The present invention relates to a learning device, an estimation device, a learning method, and a learning program.
  • Conventionally, methods for estimating a person's age using a neural network (NN) are known in the fields of voice processing and image processing (for example, Non-Patent Document 1).
  • Non-Patent Document 1 describes that age can be estimated with high accuracy by connecting an NN that converts a speech signal into a feature amount vector with an NN that estimates the posterior probability of an age label from that feature amount vector, and training these two NNs simultaneously so as to maximize the posterior probability of the correct age value.
  • The degradation of NN performance caused by differences in the properties of the input data is a well-known phenomenon in the field of image processing, and several solutions have been proposed (for example, Non-Patent Documents 2 and 3).
  • Non-Patent Document 2 describes a method that solves this problem by training the NN with both training data (source data) collected in an environment different from the operating environment and given teacher labels, and data (target data) collected in the same environment as the operating environment but not given teacher labels.
  • Specifically, the source data and the target data are input, the distance between the distributions of the resulting NN intermediate outputs is calculated according to the Maximum Mean Discrepancy (MMD) criterion, and the NN is trained so as to minimize this distance; a minimal sketch of this criterion follows.
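  • For concreteness, the MMD criterion between two batches of NN intermediate outputs can be sketched as follows. This is a minimal PyTorch illustration; the RBF kernel and its bandwidth are assumptions, not choices specified in this text.

```python
import torch

def rbf_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel values between the rows of x and y.
    sq_dist = torch.cdist(x, y) ** 2
    return torch.exp(-sq_dist / (2 * sigma ** 2))

def mmd(source_feats, target_feats, sigma=1.0):
    """Squared Maximum Mean Discrepancy between two feature batches."""
    k_ss = rbf_kernel(source_feats, source_feats, sigma).mean()
    k_tt = rbf_kernel(target_feats, target_feats, sigma).mean()
    k_st = rbf_kernel(source_feats, target_feats, sigma).mean()
    return k_ss + k_tt - 2 * k_st
```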
  • In the method of Non-Patent Document 2, however, the distributions of the data are brought closer regardless of the class to be estimated, so there is a problem that the classification accuracy that is the original objective deteriorates.
  • Non-Patent Document 3 describes that this problem can be solved by introducing a technique called Local MMD, which first estimates the class of the target data and brings the distributions closer for each class based on the estimated class posterior probabilities.
  • However, the methods of Non-Patent Documents 2 and 3 were developed for problems in which the labels to be estimated are independent of each other, such as image classification, and do not work well for problems with ordered labels, such as age.
  • In these methods, the label of the target data is first estimated, and based on the estimated label, the distributions of the source data and the target data are brought closer to resolve the distribution mismatch.
  • In the age estimation problem, however, the labels are ordered: the difference between ages 20 and 25 is smaller than the difference between ages 20 and 80. There is a problem that the performance of the method of Non-Patent Document 3 deteriorates because it brings the two distributions closer independently for each class, ignoring this order.
  • An object of the present invention is to provide a learning device, an estimation device, a learning method, and a learning program that solve this problem.
  • To solve the above problem, a learning device according to the embodiment includes: a conversion unit that uses a first neural network to convert age-labeled source data, and target data to which no age label has been given, into feature amount vectors; a first estimation unit that uses a second neural network to estimate the posterior probability for the age of the target person from the feature amount vector of the source data converted by the conversion unit; a second estimation unit that uses a third neural network to estimate the posterior probability for the age class of the target person from the feature amount vector of the source data and from the feature amount vector of the target data; and an updating unit that updates the parameters of the first neural network, the second neural network, and the third neural network so that the distributions of the feature amount vectors of the source data and the feature amount vectors of the target data are brought closer to each other according to an inter-distribution distance criterion defined in advance for each age class.
  • An estimation device according to the embodiment includes a conversion unit that converts data into a feature amount vector using a first neural network, and an estimation unit that estimates the age of a person from the feature amount vector converted by the conversion unit using a second neural network. The first neural network and the second neural network have been trained, together with a third neural network that estimates the posterior probability for the age class of a person, so as to maximize the posterior probabilities of the correct age and the correct age class of the target person with respect to the per-age and per-age-class posterior probabilities estimated for the age-labeled source data, and so as to bring the distributions of the feature amount vectors of the source data and the target data closer for each age class.
  • FIG. 1 is a diagram schematically showing an example of the configuration of a learning device according to an embodiment.
  • FIG. 2 is a diagram for explaining the flow of processing in the learning device shown in FIG. 1.
  • FIG. 3 is a diagram illustrating an example of the configuration of the first NN.
  • FIG. 4 is a diagram illustrating an example of the configuration of the first NN.
  • FIG. 5 is a diagram illustrating an example of the configuration of the second NN.
  • FIG. 6 is a diagram illustrating an example of the configuration of the third NN.
  • FIG. 7 is a flow chart showing a processing procedure of learning processing according to the embodiment.
  • FIG. 8 is a diagram schematically illustrating an example of the configuration of an estimation device according to the embodiment.
  • FIG. 9 is a flowchart showing the estimation processing procedure executed by the estimation device shown in FIG. 8.
  • FIG. 10 is a diagram illustrating an example of a computer that implements a learning device and an estimation device by executing a program.
  • FIG. 1 is a diagram schematically showing an example of the configuration of a learning device according to an embodiment.
  • FIG. 2 is a diagram for explaining the flow of processing in the learning device shown in FIG. 1.
  • The learning device 10 is realized, for example, by loading a predetermined program into a computer that includes a ROM (Read Only Memory), a RAM (Random Access Memory), a CPU (Central Processing Unit), and the like, and having the CPU execute the program.
  • The learning device 10 also has a communication interface for transmitting and receiving various information to and from other devices connected via a wired connection, a network, or the like.
  • The learning device 10 has a data selection unit 11, an estimation unit 12, an update unit 13, and a control processing unit 14.
  • The learning device 10 performs training using source data with age labels collected under recording conditions different from those during actual operation, and data without age labels recorded in the same environment as during operation (target data).
  • The source data and the target data are face image data or voice data.
  • The data selection unit 11 selects one piece of source data from the learning data (source data group), which contains a plurality of source data, as input to the feature amount conversion unit 121 (described later), and randomly selects target data from the target data group.
  • The data selection unit 11 also outputs the correct age of the selected source data, and the correct age class obtained from that correct age, to the updating unit 13.
  • The estimation unit 12 estimates the age of a target person from that person's face image data or voice data.
  • The estimation unit 12 has a feature amount conversion unit 121 (conversion unit), an age estimation unit 122 (first estimation unit), and an age class estimation unit 123 (second estimation unit).
  • The feature amount conversion unit 121 uses the first NN 1211 to convert face image data or voice data into a feature amount vector for age estimation.
  • Specifically, the feature amount conversion unit 121 receives the source data and target data selected from the source data group and the target data group, and extracts a feature amount vector from each.
  • The first NN 1211 is an NN that converts the face image data or voice data of a person, selected as source data or target data by the data selection unit 11, into a feature amount vector: it converts the selected source data into a feature amount vector, and likewise converts the selected target data into a feature amount vector.
  • For voice data, the first NN 1211 is implemented, for example, by an NN that converts speech data into feature vectors using the technique described in Non-Patent Document 1.
  • FIG. 3 is a diagram illustrating an example of the configuration of the first NN 1211.
  • In this case, the first NN 1211 is implemented, for example, by an NN having the structure shown in FIG. 3.
  • That is, the first NN 1211 is realized by a convolutional NN consisting of multiple time-delay layers and a statistics pooling layer.
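  • As an illustration of such a structure, a minimal PyTorch sketch of a time-delay (dilated 1-D convolution) network with statistics pooling follows. The input dimension, layer widths, and dilations are assumptions for illustration, not values taken from FIG. 3.

```python
import torch
import torch.nn as nn

class StatsPooling(nn.Module):
    """Statistics pooling: concatenate mean and std over the time axis."""
    def forward(self, x):  # x: (batch, channels, time)
        return torch.cat([x.mean(dim=2), x.std(dim=2)], dim=1)

class TDNNFeatureExtractor(nn.Module):
    """Sketch of a speech-side first NN: time-delay layers + statistics pooling."""
    def __init__(self, in_dim=40, hidden=512, feat_dim=512):
        super().__init__()
        self.tdnn = nn.Sequential(
            nn.Conv1d(in_dim, hidden, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=3), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=1), nn.ReLU(),
        )
        self.pool = StatsPooling()
        self.proj = nn.Linear(2 * hidden, feat_dim)

    def forward(self, x):  # x: (batch, in_dim, time)
        return self.proj(self.pool(self.tdnn(x)))
```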
  • For face image data, the first NN 1211 is implemented, for example, by an NN that converts facial image data into feature vectors using the technique described in Non-Patent Document 2.
  • FIG. 4 is a diagram illustrating an example of the configuration of the first NN 1211 in this case.
  • The first NN 1211 is then implemented, for example, by an NN having the structure shown in FIG. 4: a convolutional NN consisting of multiple residual blocks employing squeeze-and-excitation.
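  • For reference, the squeeze-and-excitation mechanism that such residual blocks employ can be sketched as below (a minimal PyTorch illustration; the channel count and reduction factor are assumptions).

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels with a learned gate."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # global average pool per channel
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):  # x: (batch, channels, height, width)
        w = self.excite(self.squeeze(x).flatten(1))
        return x * w.view(x.size(0), -1, 1, 1)
```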
  • The age estimation unit 122 uses the second NN 1221 to estimate the posterior probability for the age of the target person from the feature amount vector of the source data converted by the feature amount conversion unit 121.
  • The second NN 1221 is an NN that estimates the age of the target person from the feature amount vector transformed by the first NN 1211.
  • The second NN 1221 is implemented, for example, by an NN that estimates the age value of the target person from the feature amount vector using the technology described in Non-Patent Document 1.
  • FIG. 5 is a diagram illustrating an example of the configuration of the second NN 1221, which is implemented, for example, by an NN having the structure shown in FIG. 5.
  • That is, the second NN 1221 is realized by a plurality of 512-dimensional fully connected layers followed by a fully connected layer with the same number of dimensions as the number of age classes to be estimated (for example, 101 classes for ages 0 to 100).
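  • A sketch matching this description might look as follows (PyTorch; the number of hidden layers is an assumption). The third NN 1231 described next has the same overall shape, only with fewer output classes.

```python
import torch.nn as nn

class AgePosteriorNet(nn.Module):
    """Sketch of the second NN: 512-dim fully connected layers followed by
    an output layer with one unit per age class (e.g. 101 for ages 0-100)."""
    def __init__(self, feat_dim=512, num_classes=101):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, num_classes),  # logits; softmax gives posteriors
        )

    def forward(self, feats):
        return self.net(feats)
```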
  • The age class estimation unit 123 uses the third NN 1231 to estimate the posterior probabilities for the age class of the target person from the feature amount vector of the source data and from the feature amount vector of the target data converted by the feature amount conversion unit 121.
  • The third NN 1231 is an NN that estimates the age class of the target person from the feature vectors converted by the first NN 1211: it estimates the posterior probability for the age class of the target person from the feature vector of the source data, and likewise from the feature vector of the target data.
  • FIG. 6 is a diagram illustrating an example of the configuration of the third NN 1231, which is implemented, for example, by an NN having the structure shown in FIG. 6.
  • That is, the third NN 1231 is implemented, for example, by a plurality of 512-dimensional fully connected layers and a fully connected layer with the same number of dimensions as the number of predefined age classes.
  • The age classes should be coarser than the ages originally to be estimated. For example, if the second NN 1221 estimates ages from 0 to 100 in 1-year increments, the third NN 1231 should estimate age classes in 10-year increments (0s, 10s, 20s, and so on), as in the mapping sketched below.
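  • The correspondence between the 1-year age label and the coarse age-class label can be illustrated with a hypothetical helper; the bin width of 10 follows the example above.

```python
def age_to_age_class(age: int, bin_width: int = 10) -> int:
    """Map a 1-year age label (0-100) to a 10-year age-class label."""
    return age // bin_width  # e.g. age_to_age_class(23) == 2 ("twenties")
```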
  • The update unit 13 updates the parameters of the first NN 1211, the second NN 1221, and the third NN 1231 so as to maximize the posterior probabilities of the correct age and the correct age class of the target person.
  • Specifically, the update unit 13 inputs the feature amount vector of the target data to the third NN 1231 to obtain the posterior probability of the age class of the target person of the target data.
  • The update unit 13 then calculates the conditional inter-distribution distance between the source data feature amount vectors and the target data feature amount vectors output from the first NN 1211, weighted by the correct age class of each source data and the estimated age-class posterior probabilities of the target data.
  • The update unit 13 updates each parameter of the first NN 1211, the second NN 1221, and the third NN 1231 so as to minimize this conditional inter-distribution distance.
  • In other words, the update unit 13 updates the parameters of the first NN 1211, the second NN 1221, and the third NN 1231 so that the posterior probability of the correct age of the target person is maximized with respect to the per-age posterior probabilities of the source data estimated by the age estimation unit 122, so that the posterior probability of the correct age class is maximized with respect to the per-age-class posterior probabilities of the source data estimated by the age class estimation unit 123, and so that the distributions of the source data feature amount vectors and the target data feature amount vectors converted by the feature amount conversion unit 121 are brought closer to each other according to an inter-distribution distance criterion defined in advance for each age class, conditioned on the age-class posterior probabilities of the target data estimated by the age class estimation unit 123 and the correct age classes of the source data.
  • cross_entropy(ŷ^s_i, y^s_i), the first term in Equation (1), is the cross entropy between the age posterior probability estimated by the second NN 1221 and the correct age label.
  • cross_entropy(v̂^s_i, v^s_i), the second term in Equation (1), is the cross entropy between the age-class posterior probability estimated by the third NN 1231 and the correct age-class label.
  • d_H(p_s, p_t) represents the conditional inter-distribution distance between the feature vectors of the source data and the feature vectors of the target data.
  • d_H(p_s, p_t) is computed, for example, as follows. Let x^t_1, x^t_2, ..., x^t_{n_t} be the feature vectors obtained by applying the first NN 1211 to the target data, and let v̂^t_1, v̂^t_2, ..., v̂^t_{n_t} be the age-class estimation results obtained by applying the third NN 1231 to those feature vectors.
  • C represents the number of age classes.
  • w^s_{c,i} and w^t_{c,j} represent the contributions of the i-th source data and the j-th target data to age class c (the weights with which each sample contributes to the conditional inter-distribution distance of that age class).
  • Let p(v̂^t_j = c | x^t_j) be the posterior probability for the c-th age class of the j-th target data, obtained from the third NN 1231.
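  • Based on these definitions, the conditional inter-distribution distance can be sketched as a per-age-class weighted MMD in the spirit of the Local MMD described earlier. The RBF kernel, its bandwidth, and the per-class normalization of the weights are assumptions, since the exact formulas are not reproduced in this text.

```python
import torch

def conditional_mmd(xs, xt, ys_onehot, pt_post, sigma=1.0):
    """Per-age-class weighted MMD between source features xs (ns, d) and
    target features xt (nt, d). The i-th source sample is weighted by its
    one-hot correct age class; the j-th target sample by the posterior
    p(v_t_j = c | x_t_j) from the third NN."""
    # Normalise the weights within each age class so they sum to 1.
    ws = ys_onehot / ys_onehot.sum(dim=0, keepdim=True).clamp(min=1e-8)  # (ns, C)
    wt = pt_post / pt_post.sum(dim=0, keepdim=True).clamp(min=1e-8)      # (nt, C)

    k_ss = torch.exp(-torch.cdist(xs, xs) ** 2 / (2 * sigma ** 2))
    k_tt = torch.exp(-torch.cdist(xt, xt) ** 2 / (2 * sigma ** 2))
    k_st = torch.exp(-torch.cdist(xs, xt) ** 2 / (2 * sigma ** 2))

    d = xs.new_zeros(())
    for c in range(ys_onehot.size(1)):  # sum the weighted MMD over age classes
        a, b = ws[:, c], wt[:, c]
        d = d + a @ k_ss @ a + b @ k_tt @ b - 2 * (a @ k_st @ b)
    return d
```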
  • The updating unit 13 updates the parameters of the first NN 1211, the second NN 1221, and the third NN 1231 according to Equation (4).
  • The learning weight in Equation (4) is preset and is a positive constant.
  • The learning weight of the conditional inter-distribution distance is designed to take a small value close to 0 at the beginning of training and to gradually approach 1 as training progresses. For example, if the maximum number of iterations of the updating unit 13 is I, the weight λ_i at the i-th iteration can be calculated by Equation (5).
  • The parameter in Equation (5) that determines how quickly the learning weight increases is also preset and is a positive constant.
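  • Equation (5) itself is not reproduced in this text. One common schedule with exactly the behaviour described (close to 0 at the start, approaching 1 as i approaches I) is the sigmoid ramp below, offered purely as an illustrative assumption; gamma plays the role of the positive constant that sets the speed.

```python
import math

def mmd_weight(i: int, max_iter: int, gamma: float = 10.0) -> float:
    """Hypothetical stand-in for Equation (5): rises from ~0 toward 1."""
    progress = i / max_iter
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0
```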
  • The control processing unit 14 causes the feature amount conversion unit 121, the age estimation unit 122, the age class estimation unit 123, and the updating unit 13 to repeat their processing until a predetermined condition is satisfied.
  • In other words, the control processing unit 14 causes the updating unit 13 to repeatedly update the parameters of the first NN 1211, the second NN 1221, and the third NN 1231 until a predetermined condition is satisfied.
  • The predetermined condition is, for example, that a predetermined number of iterations has been reached, that the total update amount of the parameters of the first NN 1211, the second NN 1221, and the third NN 1231 has fallen below a predetermined threshold, or that the first NN 1211, the second NN 1221, and the third NN 1231 are otherwise judged to be sufficiently trained.
  • FIG. 7 is a flow chart showing a processing procedure of learning processing according to the embodiment.
  • First, the data selection unit 11 selects source data and target data (step S1). The data selection unit 11 selects the target data at random.
  • Next, the feature amount conversion unit 121 uses the first NN 1211 to convert the source data and the target data selected by the data selection unit 11 into feature amount vectors (step S2).
  • The age estimation unit 122 then uses the second NN 1221 to estimate the posterior probability for the age of the target person from the feature amount vector of the source data converted by the feature amount conversion unit 121 (step S3).
  • The age class estimation unit 123 uses the third NN 1231 to estimate the posterior probability for the age class of the target person from the feature amount vector of the source data converted by the feature amount conversion unit 121, and likewise from the feature amount vector of the target data (step S4).
  • The update unit 13 then updates the parameters of the first NN 1211, the second NN 1221, and the third NN 1231 so that the posterior probabilities of the correct age and the correct age class of the target person are maximized with respect to the per-age posterior probabilities estimated by the age estimation unit 122 and the per-age-class posterior probabilities estimated by the age class estimation unit 123, and so that the distributions of the feature amount vectors converted by the feature amount conversion unit 121 are brought closer for each age class, conditioned on the estimated age-class posterior probabilities of the target data and the correct age classes of the source data (step S5). A sketch of one such iteration follows below.
  • The control processing unit 14 then determines whether or not a predetermined condition is satisfied (step S6). If the predetermined condition is not satisfied (step S6: No), the learning device 10 returns to step S2 and repeats the feature conversion, age estimation, and parameter update processes. If the predetermined condition is satisfied (step S6: Yes), the learning device 10 ends the learning process.
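  • Combining the above, one iteration of steps S2 to S5 might be sketched as follows. It reuses the conditional_mmd and mmd_weight sketches above; the module and variable names are illustrative assumptions, not identifiers from the patent.

```python
import torch.nn.functional as F

def training_step(first_nn, second_nn, third_nn, optimizer,
                  xs, ys_age, ys_cls, xt, i, max_iter, num_classes=11):
    """One parameter update: cross entropy on the source age and age-class
    posteriors plus the weighted conditional inter-distribution distance."""
    feat_s = first_nn(xs)                # step S2: feature vectors
    feat_t = first_nn(xt)

    age_logits = second_nn(feat_s)       # step S3: age posteriors (source)
    cls_logits_s = third_nn(feat_s)      # step S4: age-class posteriors
    cls_logits_t = third_nn(feat_t)

    lam = mmd_weight(i, max_iter)        # schedule from the sketch above
    loss = (F.cross_entropy(age_logits, ys_age)      # ys_age: LongTensor of ages
            + F.cross_entropy(cls_logits_s, ys_cls)  # ys_cls: LongTensor of classes
            + lam * conditional_mmd(
                feat_s, feat_t,
                F.one_hot(ys_cls, num_classes).float(),
                F.softmax(cls_logits_t, dim=1)))

    optimizer.zero_grad()                # step S5: update all three NNs
    loss.backward()
    optimizer.step()
    return loss.item()
```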
  • FIG. 8 is a diagram schematically illustrating an example of the configuration of an estimation device according to the embodiment.
  • FIG. 9 is a flowchart showing the estimation processing procedure executed by the estimation device shown in FIG. 8.
  • The estimation device 20 shown in FIG. 8 has a feature amount conversion unit 221 (conversion unit) having the first NN 1211 and an age estimation unit 222 (estimation unit) having the second NN 1221.
  • The first NN 1211 and the second NN 1221 are NNs that have been trained by the learning device 10.
  • When the feature amount conversion unit 221 receives input of face image data or voice data (step S11 in FIG. 9), it uses the first NN 1211 to convert the face image data or voice data into a feature amount vector (step S12).
  • The age estimation unit 222 uses the second NN 1221 to estimate the age of the target person from the feature amount vector converted by the feature amount conversion unit 221 (step S13), and outputs the estimated age (step S14).
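  • The estimation procedure thus reduces to a forward pass through the two trained NNs. A minimal sketch, assuming the second NN's output classes are the ages 0 to 100:

```python
import torch

@torch.no_grad()
def estimate_age(first_nn, second_nn, data):
    """Steps S11-S14: convert input to a feature vector, then take the
    most probable age from the second NN's posterior distribution."""
    posteriors = torch.softmax(second_nn(first_nn(data)), dim=1)
    return posteriors.argmax(dim=1)  # estimated age for each input
```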
  • In an evaluation experiment, the average absolute error between the correct age value and the speaker's age estimated by the first NN 1211 and the second NN 1221 was 8.02 years.
  • The correlation coefficient between the correct age value and the estimated speaker's age was 0.84.
  • In the comparative condition, the average absolute error of the estimated speaker's age was 11.76 years, and the correlation coefficient was 0.71. It was therefore confirmed that training the first NN 1211, the second NN 1221, and the third NN 1231 so that the distributions of the source data and the target data are brought closer together for each age class, as in the learning device 10, functions effectively.
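  • The two reported figures, the mean absolute error and the correlation coefficient, can be computed as in the routine sketch below (not part of the patent).

```python
import numpy as np

def evaluate(pred_ages, true_ages):
    """Mean absolute error (in years) and Pearson correlation coefficient."""
    pred = np.asarray(pred_ages, dtype=float)
    true = np.asarray(true_ages, dtype=float)
    mae = np.abs(pred - true).mean()
    corr = np.corrcoef(pred, true)[0, 1]
    return mae, corr
```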
  • In this way, an accurate age estimator (the second NN 1221) can be obtained.
  • In other words, when training NNs that estimate age from face image data or voice data, the distributions are brought closer for age groups coarser than those originally to be estimated (for example, in 10-year increments).
  • As a result, an NN capable of outputting highly accurate age estimation results could be realized even when the data recording conditions differ between training and operation.
  • The learning device 10 estimates the label of the target data and brings the distributions of the source data and the target data closer based on this estimation result to solve the data mismatch problem. In this respect it is similar to the technology of Non-Patent Document 3, but it differs in the following points.
  • The technology described in Non-Patent Document 3 targets only discrete labels that are independent of each other, as in general classification problems.
  • The present embodiment differs in that ordered age labels are targeted for estimation.
  • Also, the technology described in Non-Patent Document 3 brings the distributions closer using the estimation result of the label that is the original estimation target.
  • In the present embodiment, by contrast, learning is performed so that the distributions are brought closer for each age class of coarser granularity.
  • In the present embodiment, source data with age labels recorded in an environment different from the operating environment, and target data recorded in the same environment as the operating environment but without age labels, are used.
  • As a result, the performance of the feature converter and the age estimator can be significantly improved compared to learning with source data alone.
  • Bringing the distributions closer for each class to be estimated is similar to Non-Patent Document 3, but the technology described there does not assume a task such as age estimation and has little effect when applied as-is.
  • The present embodiment exploits the facts that age is an ordered value and that it can also be defined at a coarser granularity, such as age groups, and brings the distributions closer at this coarser granularity, thereby realizing a highly accurate age estimator.
  • The first NN 1211 may be changed to one suitable for each type of input data.
  • Each component of the learning device 10 and the estimation device 20 is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific forms of distribution and integration of the functions of the learning device 10 and the estimation device 20 are not limited to those illustrated, and all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions.
  • Each process performed in the learning device 10 and the estimation device 20 may be realized by a CPU, a GPU (Graphics Processing Unit), and a program analyzed and executed by the CPU and GPU, or may be implemented as hardware based on wired logic.
  • FIG. 10 is a diagram showing an example of a computer that implements the learning device 10 and the estimation device 20 by executing programs.
  • The computer 1000 has, for example, a memory 1010 and a CPU 1020.
  • The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.
  • The memory 1010 includes a ROM 1011 and a RAM 1012.
  • The ROM 1011 stores a boot program such as a BIOS (Basic Input Output System).
  • The hard disk drive interface 1030 is connected to a hard disk drive 1090.
  • The disk drive interface 1040 is connected to a disk drive 1100.
  • A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100.
  • The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120.
  • The video adapter 1060 is connected to, for example, a display 1130.
  • The hard disk drive 1090 stores, for example, an OS (Operating System) 1091, application programs 1092, program modules 1093, and program data 1094. That is, a program that defines each process of the learning device 10 and the estimation device 20 is implemented as a program module 1093 in which code executable by the computer 1000 is described. Program modules 1093 are stored, for example, on the hard disk drive 1090.
  • For example, the hard disk drive 1090 stores a program module 1093 for executing processing similar to the functional configurations of the learning device 10 and the estimation device 20.
  • The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
  • The setting data used in the processing of the above-described embodiment is stored as program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. The CPU 1020 reads out the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1090 to the RAM 1012 as necessary and executes them.
  • The program modules 1093 and program data 1094 are not limited to being stored in the hard disk drive 1090; they may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program modules 1093 and program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.) and read by the CPU 1020 through the network interface 1070.


Abstract

The present invention relates to a learning device (10) that includes: a feature amount conversion unit (121) that uses a first neural network (NN) (1211) to convert labeled source data and unlabeled target data into feature amount vectors; an age estimation unit (122) that uses a second NN (1221) to estimate a posterior probability of the age of a subject from the feature amount vector of the source data; an age class estimation unit (123) that uses a third NN (1231) to estimate a posterior probability of the age class of the subject from the feature amount vectors of the source data and the target data; and an updating unit (13) that updates the parameters of the first NN (1211), the second NN (1221), and the third NN (1231) so that the distributions of the feature amount vectors of the source data and the target data are brought closer to each other, conditioned on the posterior probability of the age class of the target data and the correct age class of the source data, while maximizing the posterior probabilities of the correct age and the correct age class with respect to the estimated age and age-class posterior probabilities of the source data.
PCT/JP2021/029440 2021-08-06 2021-08-06 Learning device, estimation device, learning method, and learning program WO2023013075A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2021/029440 WO2023013075A1 (fr) 2021-08-06 2021-08-06 Learning device, estimation device, learning method, and learning program
JP2023539586A JPWO2023013075A1 (fr) 2021-08-06 2021-08-06

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/029440 WO2023013075A1 (fr) 2021-08-06 2021-08-06 Learning device, estimation device, learning method, and learning program

Publications (1)

Publication Number Publication Date
WO2023013075A1 true WO2023013075A1 (fr) 2023-02-09

Family

ID=85154110

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/029440 WO2023013075A1 (fr) 2021-08-06 2021-08-06 Learning device, estimation device, learning method, and learning program

Country Status (2)

Country Link
JP (1) JPWO2023013075A1 (fr)
WO (1) WO2023013075A1 (fr)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021095509A1 (fr) * 2019-11-14 2021-05-20 オムロン株式会社 Système d'inférence, dispositif d'inférence et procédé d'inférence

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TAWARA NAOHIRO; OGAWA ATSUNORI; KITAGISHI YUKI; KAMIYAMA HOSANA: "Age-VOX-Celeb: Multi-Modal Corpus for Facial and Speech Estimation", ICASSP 2021 - 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 6 June 2021 (2021-06-06), pages 6963 - 6967, XP033955338, DOI: 10.1109/ICASSP39728.2021.9414272 *
YONGCHUN ZHU; FUZHEN ZHUANG; JINDONG WANG; GUOLIN KE; JINGWU CHEN; JIANG BIAN; HUI XIONG; QING HE: "Deep Subdomain Adaptation Network for Image Classification", arXiv.org, Cornell University Library, 17 June 2021 (2021-06-17), XP081991391, DOI: 10.1109/TNNLS.2020.2988928 *

Also Published As

Publication number Publication date
JPWO2023013075A1 (fr) 2023-02-09


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21952898

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023539586

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE