WO2019194128A1 - Model learning device, model learning method, and program - Google Patents

Model learning device, model learning method, and program Download PDF

Info

Publication number
WO2019194128A1
WO2019194128A1 (PCT/JP2019/014476)
Authority
WO
WIPO (PCT)
Prior art keywords
model
model parameter
output
learning
model learning
Prior art date
Application number
PCT/JP2019/014476
Other languages
French (fr)
Japanese (ja)
Inventor
崇史 森谷
山口 義和
Original Assignee
Nippon Telegraph and Telephone Corporation (日本電信電話株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corporation
Publication of WO2019194128A1 publication Critical patent/WO2019194128A1/en

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Definitions

  • the present invention relates to a model learning technique using a neural network.
  • Non-Patent Document 1 discloses a method of learning an acoustic model used for speech recognition using a neural network. In particular, the details are disclosed in Section II, “TRAINING DEEP NEURAL NETWORKS”, of Non-Patent Document 1.
  • FIG. 5 is a block diagram illustrating a configuration of the model learning apparatus 900.
  • FIG. 6 is a flowchart showing the operation of the model learning apparatus 900.
  • the model learning apparatus 900 includes a feature amount processing unit 920, a model learning unit 930, and a recording unit 990.
  • the recording unit 990 is a component that appropriately records information necessary for processing of the model learning device 900.
  • the initial value of the model parameter ⁇ is recorded in advance.
  • the model parameter ⁇ generated in the learning process is recorded as appropriate.
  • the initial value of the model parameter ⁇ may be generated using a random number, or a model parameter generated using data different from the data used for the current learning may be used.
  • the feature quantity processing unit 920 includes an intermediate feature quantity calculation unit 921 and an output probability distribution calculation unit 922.
  • before learning starts, feature quantities are extracted from the input data serving as learning data (speech data in Non-Patent Document 1) and prepared.
  • each feature quantity is expressed as a real-valued vector.
  • when the input data is speech data, an example of the feature quantity is the FBANK (filter-bank log power) extracted for each frame (usually about 20 ms to 40 ms) into which the speech data is divided.
  • a correct output number, i.e. a number identifying the correct output corresponding to the feature quantity, is also prepared.
  • a pair of a feature quantity and its correct output number is an input to the model learning apparatus 900.
  • such a pair of a feature quantity and a correct output number is called training data.
  • the number of output types corresponding to a feature quantity is M (M is an integer of 1 or more); each output type is assigned a number (hereinafter, an output number) from 1 to M, and an output is identified using the output number m (1 ≤ m ≤ M, i.e. m is an index representing the output number).
  • the model learning device 900 learns the model parameter Ω from the training data (that is, pairs of a feature quantity and a correct output number).
  • when a deep neural network (DNN) is used, the model parameter Ω consists of the weights and biases of each layer.
  • the intermediate feature amount calculation unit 921 is a configuration unit that executes calculation in each layer from the input layer to the final hidden layer.
  • the output probability distribution calculation unit 922 is a component that executes output calculation in the output layer. Therefore, in this case, the model parameter ⁇ learned by the model learning apparatus 900 is a DNN model parameter that characterizes the intermediate feature amount calculation unit 921 and the output probability distribution calculation unit 922.
  • before learning starts, the model learning apparatus 900 sets the initial value of the model parameter Ω recorded in the recording unit 990 in the intermediate feature amount calculation unit 921 and the output probability distribution calculation unit 922. During learning, each time the model learning unit 930 performs the optimization calculation (that is, updates the model parameter Ω so as to optimize it), the model learning apparatus 900 sets the calculated model parameter Ω in these two units. The next training data is thus processed using the intermediate feature amount calculation unit 921 and the output probability distribution calculation unit 922 characterized by the newly calculated model parameter Ω.
  • using the model parameter Ω, the feature quantity processing unit 920 calculates, from the feature quantity extracted from the input data, the output probability distribution p = (p1, ..., pM), where pm is the probability that the output corresponding to the feature quantity is the output with output number m (1 ≤ m ≤ M) (S920).
  • the intermediate feature amount calculation unit 921 calculates an intermediate feature amount from the input feature amount (S921).
  • the processing here corresponds to the calculation of Equation (1) in Non-Patent Document 1.
  • the intermediate feature amount corresponds to the output feature amount of the final hidden layer of the DNN being learned.
  • the output probability distribution calculation unit 922 calculates the output probability distribution p from the intermediate feature amount calculated in S921 (S922).
  • the processing here corresponds to the calculation of Equation (2) in Non-Patent Document 1.
  • the output probability distribution p corresponds to the output feature amount of the output layer of the DNN being learned.
  • the model learning unit 930 learns the model parameter Ω using the output probability distribution p calculated in S920 and the correct output number, i.e. the number identifying the correct output corresponding to the feature quantity input in S920 (S930). For example, the model parameter Ω is optimized so as to decrease the value of a loss function C defined from the output probability distribution p and the correct probability distribution d.
  • the processing here corresponds to the calculation of Equation (3) or Equation (4) in Non-Patent Document 1.
  • d = (d1, ..., dM) is the correct probability distribution, whose element dm is 1 when m equals the correct output number and 0 otherwise.
  • the model learning apparatus 900 repeats the processes of S920 to S930 for the number of training data (generally a very large number of tens of millions to hundreds of millions).
  • the model learning device 900 outputs the model parameter ⁇ at the time when this repetition is completed.
  • Non-Patent Document 2 discloses a learning method that can reduce the model size (number of model parameters) in a neural network.
  • the model learning apparatus 901 corresponding to the model learning of Non-Patent Document 2 will be described below with reference to FIGS. 5 and 6.
  • FIG. 5 is a block diagram illustrating a configuration of the model learning device 901.
  • FIG. 6 is a flowchart showing the operation of the model learning device 901.
  • the model learning device 901 includes a feature amount processing unit 920, a model learning unit 931, and a recording unit 990.
  • model learning device 901 differs from the model learning device 900 only in that it includes a model learning unit 931 instead of the model learning unit 930.
  • the model learning unit 931 learns the model parameter ⁇ using the output probability distribution p calculated in S920 and the correct output number that is a number for identifying the correct output corresponding to the feature quantity that is the input in S920. (S931).
  • the model parameter ⁇ is optimized using a loss function L ( ⁇ ) defined by the following equation.
  • E ( ⁇ ) is an error term indicating an error between the output probability distribution calculated from the feature value using the model parameter ⁇ and the correct output, and is a term corresponding to the above-described loss function C.
  • R ( ⁇ ) is a regular parameter
  • the real number ⁇ is a hyperparameter for adjusting the influence of the regularization term R ( ⁇ ).
  • the model learning unit 931 learns the model parameter ⁇ by using the loss function L ( ⁇ ) obtained by adding the regularization term R ( ⁇ ) (scalar multiple) to the error term E ( ⁇ ), so that the model parameter ⁇ Learning is performed so that the values of some elements are close to 0 (the model becomes sparse).
  • model parameter ⁇ when a part of the elements of the model parameter ⁇ is 0 or a value close to 0, the model parameter ⁇ is said to have sparsity.
  • the model learning unit 931 learns the model parameter ⁇ having sparsity using the loss function L ( ⁇ ) including the regularization term R ( ⁇ ).
  • in Non-Patent Document 2, regularization terms called Ridge (L2) and Group Lasso are used.
  • for example, when only the weight parameter Wl in layer l (l is an integer identifying a layer of the neural network) is updated, the Ridge (L2) regularization term RL2(Wl) is the sum of the squares of all elements of the weight parameter between the l-th layer and the (l-1)-th layer, and the Group Lasso regularization term Rgroup(Wl) represents the sum of (the absolute values of) the weights connecting one unit of the l-th layer with all the units (j = 1, ..., Nl-1) of the (l-1)-th layer.
  • when Group Lasso is used, the model parameter Ω can be grouped arbitrarily for learning. In Non-Patent Document 2, when the model parameter Ω is expressed as a matrix, the rows or columns of the matrix are used as the grouping units (groups). Furthermore, by learning with the rows of the matrix as the grouping unit and deleting, from the model parameter Ω at the end of learning, the elements of every group whose row-wise norm is smaller than a predetermined threshold, the model size is reduced.
  • a regularization term is originally used to avoid overfitting, but, depending on the purpose, various regularization terms other than the regularization term RL2(Wl) and the regularization term Rgroup(Wl) of Non-Patent Document 2 can also be defined and used.
  • the learning method of Non-Patent Document 1 assumes that the model is learned in a single domain (for example, in the case of speech recognition, learning is performed using speech data collected on the premise that conditions such as background noise, recording equipment, and speaking style are the same).
  • therefore, if a model learned using data of a certain domain (domain 1) is taken as the initial model and additional learning is performed using data of another domain (domain 2), and the resulting model is then used to perform recognition on domain-1 data, the accuracy may deteriorate significantly.
  • this property of neural network learning is called catastrophic forgetting. In general, to prevent catastrophic forgetting (i.e., to learn additionally without impairing the performance of a learned model corresponding to existing knowledge), the model must be re-learned using the data of both domain 1 and domain 2, so the cost in terms of learning time is very high.
  • an object of the present invention is therefore to provide a model learning technique that can additionally learn using data of another domain without impairing the performance of a model learned using data of a certain domain.
  • One aspect of the present invention includes: a setup unit that generates a mask from a learned model parameter that is the initial value of a model parameter Ω to be learned; a feature quantity processing unit that uses the model parameter Ω to calculate, from a feature quantity extracted from input data in a domain different from the domain used for learning the learned model parameter, an output probability distribution, i.e. the distribution of the probability pm that the output corresponding to the feature quantity is the output with output number m (1 ≤ m ≤ M); and a model learning unit that learns the model parameter Ω using the mask, the output probability distribution, and a correct output number, i.e. a number identifying the correct output corresponding to the feature quantity.
  • let L(Ω) be the loss function used when learning the model parameter Ω and μ be a real number. The setup unit calculates the mask element γ corresponding to an element ω of the model parameter Ω from a threshold θ as γ = 1 if |ω(0)| < θ and γ = 0 otherwise, where ω(0) is the initial value of the element ω.
  • the model learning unit calculates the update difference δ(ω) = -μ · γ · ∂L(Ω)/∂ω of the element ω of the model parameter Ω and updates the element ω as ω ← ω + δ(ω).
  • FIG. 3 is a diagram illustrating an example of the configuration of the setup unit 110.
  • FIG. 5 is a diagram illustrating an example of the configuration of the model learning apparatus 900/901.
  • FIG. 1 is a block diagram illustrating a configuration of the model learning device 100.
  • FIG. 2 is a flowchart showing the operation of the model learning device 100.
  • the model learning device 100 includes a setup unit 110, a feature amount processing unit 920, a model learning unit 130, and a recording unit 990.
  • the recording unit 990 is a component that appropriately records information necessary for processing of the model learning device 100.
  • the initial value of the model parameter ⁇ is recorded in advance.
  • the initial value of the model parameter Ω is a learned model parameter, for example one learned by the model learning device 900 or the model learning device 901, using training data consisting of pairs of a feature quantity extracted from input data in a certain domain (hereinafter, domain 1) and the correct output number, i.e. the number identifying the correct output corresponding to that feature quantity. Therefore, when a learned model parameter learned by the model learning device 901 is used, the learned model parameter has sparsity.
  • hereinafter, the learned model parameter is denoted Ω(0) and its elements ω(0).
  • the model learning device 100 learns the model parameter Ω from training data consisting of pairs of a feature quantity extracted from input data in a domain (hereinafter, domain 2) different from the domain used for learning the learned model parameter (that is, domain 1) and the correct output number, i.e. the number identifying the correct output corresponding to that feature quantity.
  • before learning starts, the model learning device 100 sets the initial value of the model parameter Ω (that is, the learned model parameter) recorded in the recording unit 990 in the feature amount processing unit 920 (the intermediate feature amount calculation unit 921 and the output probability distribution calculation unit 922). During learning, each time the model learning unit 130 performs the optimization calculation (that is, updates the model parameter Ω so as to optimize it), the model learning device 100 sets the calculated model parameter Ω in the feature amount processing unit 920.
  • the setup unit 110 generates a mask from the learned model parameter that is the initial value of the model parameter ⁇ to be learned, which is recorded in the recording unit 990 (S110).
  • the setup unit 110 will be described below with reference to FIGS. 3 and 4.
  • FIG. 3 is a block diagram illustrating a configuration of the setup unit 110.
  • FIG. 4 is a flowchart showing the operation of the setup unit 110.
  • the setup unit 110 includes a threshold value determination unit 111 and a mask generation unit 112. The operation of the setup unit 110 will be described with reference to FIG.
  • the threshold determination unit 111 determines the threshold θ from the learned model parameter (S111). Any determination method may be used as long as it determines the threshold θ so that a predetermined number of elements whose absolute values are close to 0 are extracted from the elements of the learned model parameter. For example, a frequency distribution of the values of the learned model parameter elements is created, and the threshold θ is determined so that the proportion of model parameter elements whose absolute value is close to 0 becomes, say, 25% (hereinafter, determination method 1). Alternatively, a frequency distribution of values calculated for each group into which the learned model parameter elements are grouped is created, and a value between two of those values (for example, the average of the two values) is determined as the threshold θ (hereinafter, determination method 2).
  • for example, when the learned model parameter is expressed as a matrix, the rows (or columns) of the matrix are taken as groups, a frequency distribution of the norms of the row vectors (or column vectors) of the groups is created, and a value between two of the norm values can be determined as the threshold θ. That is, determination method 1 determines the threshold θ based on the frequency distribution of the values of the learned model parameter elements, while determination method 2 determines the threshold θ based on the frequency distribution of values calculated for each group into which the learned model parameter elements are grouped.
  • the mask generation unit 112 generates a mask Γ from the learned model parameter using the threshold θ determined in S111 (S112). The mask Γ is generated as follows.
  • the element γ of the mask Γ corresponding to an element ω of the model parameter Ω is set to 1 when the absolute value of the learned model parameter element ω(0) is smaller than the threshold θ (at most the threshold θ), and to 0 otherwise. That is, the mask element γ corresponding to the element ω of the model parameter Ω is computed from the threshold θ as γ = 1 if |ω(0)| < θ and γ = 0 otherwise.
  • when the model parameter Ω is expressed as a matrix, the mask Γ is expressed as a matrix of the same size as the matrix representing the model parameter Ω, in which every element is 0 or 1.
  • using the model parameter Ω, the feature quantity processing unit 920 calculates, from the feature quantity extracted from the input data in domain 2, the output probability distribution p = (p1, ..., pM), where pm is the probability that the output corresponding to the feature quantity is the output with output number m (1 ≤ m ≤ M) (S920).
  • the model learning unit 130 learns the model parameter Ω using the mask Γ generated in S110, the output probability distribution p calculated in S920, and the correct output number, i.e. the number identifying the correct output corresponding to the feature quantity input in S920 (S130).
  • the model parameter ⁇ is optimized using the loss function L ( ⁇ ) defined by the formula (1) or the formula (2).
  • the update difference ⁇ ( ⁇ ) of the element ⁇ of the model parameter ⁇ is calculated by the equation (3), and the element ⁇ is updated by the equation (4).
  • μ is a (positive) real number representing the learning rate, a parameter that adjusts the degree to which the model parameter is updated.
  • ∂L(Ω)/∂ω denotes the gradient of the loss function L(Ω) with respect to the element ω. Note that the gradient ∂L(Ω)/∂ω is also used for learning in the model learning device 900 and the model learning device 901.
  • model parameter ⁇ and the mask ⁇ are represented by a matrix
  • the optimization calculation of the model parameter ⁇ is performed as follows using a Hadamard product. Can be represented.
  • each element of the model parameter ⁇ can be effectively set to a value close to 0.
  • the model learning device 100 repeats the processing of S920 to S130 as many times as the number of training data, and outputs the finally calculated model parameter ⁇ .
  • the apparatus of the present invention has, for example, as a single hardware entity, an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display or the like can be connected, a communication unit to which a communication device (for example, a communication cable) capable of communicating with the outside of the hardware entity can be connected, a CPU (Central Processing Unit; it may include a cache memory, registers, and the like), a RAM and a ROM as memories, an external storage device such as a hard disk, and a bus connecting these input unit, output unit, communication unit, CPU, RAM, ROM, and external storage device so that data can be exchanged among them.
  • if necessary, the hardware entity may also be provided with a device (drive) capable of reading from and writing to a recording medium such as a CD-ROM.
  • a physical entity having such hardware resources is, for example, a general-purpose computer.
  • the external storage device of the hardware entity stores the programs necessary for realizing the functions described above and the data necessary for processing these programs (the storage is not limited to the external storage device; for example, the programs may be stored in a ROM, which is a read-only storage device). Data obtained by the processing of these programs is stored as appropriate in the RAM, the external storage device, or the like.
  • in the hardware entity, each program stored in the external storage device (or the ROM or the like) and the data necessary for processing each program are read into memory as necessary and interpreted, executed, and processed by the CPU as appropriate.
  • as a result, the CPU realizes predetermined functions (the constituent elements expressed above as ...unit, ...means, and the like).
  • when the processing functions of the hardware entity (the apparatus of the present invention) described in the above embodiment are realized by a computer, the processing contents of the functions that the hardware entity should have are described by a program, and by executing this program on the computer, the processing functions of the hardware entity are realized on the computer.
  • the program describing the processing contents can be recorded on a computer-readable recording medium.
  • a computer-readable recording medium for example, any recording medium such as a magnetic recording device, an optical disk, a magneto-optical recording medium, and a semiconductor memory may be used.
  • as a magnetic recording device, a hard disk device, a flexible disk, a magnetic tape, or the like can be used; as an optical disc, a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), a CD-R (Recordable)/RW (ReWritable), or the like; as a magneto-optical recording medium, an MO (Magneto-Optical disc) or the like; and as a semiconductor memory, an EEP-ROM (Electronically Erasable and Programmable Read Only Memory) or the like.
  • this program is distributed by, for example, selling, transferring, or lending a portable recording medium such as a DVD or a CD-ROM on which the program is recorded. Furthermore, the program may be distributed by storing it in a storage device of a server computer and transferring it from the server computer to another computer via a network.
  • a computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing processing, the computer reads the program stored in its own storage device and executes processing according to the read program.
  • as other execution forms of the program, the computer may read the program directly from the portable recording medium and execute processing according to the program, or, each time a program is transferred from the server computer to the computer, processing according to the received program may be executed sequentially.
  • alternatively, the above-described processing may be executed by a so-called ASP (Application Service Provider) type service that realizes the processing functions only through execution instructions and result acquisition, without transferring the program from the server computer to the computer.
  • the program in the present embodiment includes information that is used for processing by an electronic computer and that conforms to a program (such as data that is not a direct command to the computer but has a property that defines the processing of the computer).
  • in this embodiment, the hardware entity is configured by executing a predetermined program on a computer, but at least a part of the processing contents may be realized as hardware.

Abstract

Provided is a model learning technique with which it is possible, without impairing the performance of a model learned using the data of a given domain, to additionally learn using the data of another domain. The present invention includes: a setup unit for generating a mask from a learned model parameter that is the initial value of a model parameter Ω; a feature quantity processing unit for calculating an output probability distribution, that is, the distribution of the probability that an output corresponding to a feature quantity extracted from input data in a domain different from the domain used in the learning of the learned model parameter is the output of an output number m; and a model learning unit for learning the model parameter Ω using the mask, the output probability distribution, and a correct output number, that is, a number for identifying the correct output that corresponds to the feature quantity. The model learning unit calculates an update difference δ(ω) for an element ω of the model parameter Ω by a prescribed expression that uses a loss function L(Ω) and the mask element γ corresponding to the element ω of the model parameter Ω, and updates the element ω.

Description

Model learning device, model learning method, and program

The present invention relates to a model learning technique using a neural network.

A conventional method of learning a model (model parameters) using a neural network will first be described. Non-Patent Document 1 discloses a method of learning an acoustic model used for speech recognition using a neural network. In particular, the details are disclosed in Section II, “TRAINING DEEP NEURAL NETWORKS”, of Non-Patent Document 1.
A model learning apparatus 900 corresponding to the model learning of Non-Patent Document 1 will be described below with reference to FIGS. 5 and 6. FIG. 5 is a block diagram showing the configuration of the model learning apparatus 900, and FIG. 6 is a flowchart showing its operation. As shown in FIG. 5, the model learning apparatus 900 includes a feature quantity processing unit 920, a model learning unit 930, and a recording unit 990.

The recording unit 990 records, as appropriate, the information necessary for the processing of the model learning apparatus 900. For example, the initial value of the model parameter Ω is recorded in advance, and the model parameters Ω generated during the learning process are recorded as appropriate. The initial value of the model parameter Ω may be generated using random numbers, or a model parameter generated using data different from the data used for the current learning may be used.

As shown in FIG. 7, the feature quantity processing unit 920 includes an intermediate feature quantity calculation unit 921 and an output probability distribution calculation unit 922.
Before learning starts, feature quantities are extracted from the input data serving as learning data (speech data in Non-Patent Document 1) and prepared. Each feature quantity is expressed as a real-valued vector. When the input data is speech data, an example of the feature quantity is the FBANK (filter-bank log power) extracted for each frame (usually about 20 ms to 40 ms) into which the speech data is divided. A correct output number, i.e. a number identifying the correct output corresponding to the feature quantity, is also prepared. A pair of a feature quantity and its correct output number is an input to the model learning apparatus 900; such a pair is called training data.

In the following, the number of output types corresponding to a feature quantity is M (M is an integer of 1 or more), each output type is assigned a number (hereinafter, an output number) from 1 to M, and an output is identified using the output number m (1 ≤ m ≤ M, i.e. m is an index representing the output number).

The model learning apparatus 900 learns the model parameter Ω from the training data (that is, pairs of a feature quantity and a correct output number). When a deep neural network (DNN) is used, the model parameter Ω consists of the weights and biases of each layer.

Each constituent unit will be described taking the case of a DNN as an example. The intermediate feature quantity calculation unit 921 executes the computation of each layer from the input layer to the final hidden layer, and the output probability distribution calculation unit 922 executes the computation of the output in the output layer. In this case, therefore, the model parameter Ω learned by the model learning apparatus 900 is the DNN model parameter that characterizes the intermediate feature quantity calculation unit 921 and the output probability distribution calculation unit 922.

Before learning starts, the model learning apparatus 900 sets the initial value of the model parameter Ω recorded in the recording unit 990 in the intermediate feature quantity calculation unit 921 and the output probability distribution calculation unit 922. During learning, each time the model learning unit 930 performs the optimization calculation (that is, updates the model parameter Ω so as to optimize it), the model learning apparatus 900 sets the calculated model parameter Ω in these two units. The next training data is thus processed using the intermediate feature quantity calculation unit 921 and the output probability distribution calculation unit 922 characterized by the newly calculated model parameter Ω.
The operation of the model learning apparatus 900 will be described with reference to FIG. 6. Using the model parameter Ω, the feature quantity processing unit 920 calculates, from the feature quantity extracted from the input data, the output probability distribution p = (p1, ..., pM), where pm is the probability that the output corresponding to the feature quantity is the output with output number m (1 ≤ m ≤ M) (S920). The operation of the feature quantity processing unit 920 is as follows (see FIG. 8). The intermediate feature quantity calculation unit 921 calculates an intermediate feature quantity from the input feature quantity (S921); the intermediate feature quantity is the feature quantity used to calculate the output probability distribution p. This processing corresponds to the calculation of Equation (1) in Non-Patent Document 1. When a DNN is used, the intermediate feature quantity corresponds to the output feature quantity of the final hidden layer of the DNN being learned.

The output probability distribution calculation unit 922 calculates the output probability distribution p from the intermediate feature quantity calculated in S921 (S922). This processing corresponds to the calculation of Equation (2) in Non-Patent Document 1. When a DNN is used, the output probability distribution p corresponds to the output feature quantity of the output layer of the DNN being learned.

The model learning unit 930 learns the model parameter Ω using the output probability distribution p calculated in S920 and the correct output number, i.e. the number identifying the correct output corresponding to the feature quantity input in S920 (S930). For example, the optimization calculation of the model parameter Ω is performed so as to decrease the value of the loss function C defined by the following equation. This processing corresponds to the calculation of Equation (3) or Equation (4) in Non-Patent Document 1.
    C = -Σm dm log pm    (sum over m = 1, ..., M)
Here, d = (d1, ..., dM) is the correct probability distribution defined by the following equation.
    dm = 1 (when m is the correct output number),    dm = 0 (otherwise)
The model learning apparatus 900 repeats the processing of S920 to S930 as many times as the number of training data (generally a very large number, on the order of tens of millions to hundreds of millions), and outputs the model parameter Ω obtained when this repetition is completed.
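As a concrete illustration of S920 to S930, the following is a minimal Python/NumPy sketch of one training step. The two-layer network, the tanh activation, the layer sizes, and the plain gradient-descent update are illustrative assumptions rather than the configuration of Non-Patent Document 1; the loss is taken to be the cross-entropy between the one-hot correct probability distribution d and the output probability distribution p, as in the equations above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): feature dimension 40 (e.g. FBANK), hidden 256, M = 100 outputs.
D, H, M = 40, 256, 100
params = {
    "W1": rng.normal(0, 0.1, (H, D)), "b1": np.zeros(H),  # input layer -> final hidden layer
    "W2": rng.normal(0, 0.1, (M, H)), "b2": np.zeros(M),  # final hidden layer -> output layer
}

def forward(params, x):
    """S921/S922: intermediate feature quantity and output probability distribution p."""
    h = np.tanh(params["W1"] @ x + params["b1"])           # intermediate feature quantity
    z = params["W2"] @ h + params["b2"]
    p = np.exp(z - z.max()); p /= p.sum()                  # softmax -> p = (p1, ..., pM)
    return h, p

def train_step(params, x, correct_m, lr=0.1):
    """S930: decrease C = -sum_m d_m log p_m for one training sample by gradient descent."""
    h, p = forward(params, x)
    d = np.zeros(M); d[correct_m] = 1.0                    # correct probability distribution (one-hot)
    loss = -np.sum(d * np.log(p + 1e-12))
    dz = p - d                                             # gradient of C w.r.t. the pre-softmax output
    dh = params["W2"].T @ dz
    grads = {
        "W2": np.outer(dz, h), "b2": dz,
        "W1": np.outer(dh * (1 - h**2), x), "b1": dh * (1 - h**2),
    }
    for k in params:                                       # plain gradient-descent update of the model parameter
        params[k] -= lr * grads[k]
    return loss

print(train_step(params, rng.normal(size=D), correct_m=3))
```

In practice this step is simply repeated for every training sample (or mini-batch), which is the loop described above.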
Non-Patent Document 2 discloses a learning method that can reduce the model size (the number of model parameters) of a neural network. A model learning apparatus 901 corresponding to the model learning of Non-Patent Document 2 will be described below with reference to FIGS. 5 and 6. FIG. 5 is a block diagram showing the configuration of the model learning apparatus 901, and FIG. 6 is a flowchart showing its operation. As shown in FIG. 5, the model learning apparatus 901 includes a feature quantity processing unit 920, a model learning unit 931, and a recording unit 990.

That is, the model learning apparatus 901 differs from the model learning apparatus 900 only in that it includes a model learning unit 931 instead of the model learning unit 930. The operation of the model learning unit 931 is therefore described below (see FIG. 6). The model learning unit 931 learns the model parameter Ω using the output probability distribution p calculated in S920 and the correct output number, i.e. the number identifying the correct output corresponding to the feature quantity input in S920 (S931). For example, the model parameter Ω is optimized using the loss function L(Ω) defined by the following equation.
    L(Ω) = E(Ω) + λR(Ω)
Here, E(Ω) is an error term indicating the error between the output probability distribution calculated from the feature quantity using the model parameter Ω and the correct output, and corresponds to the loss function C described above. R(Ω) is a regularization term, and the real number λ is a hyperparameter for adjusting the influence of the regularization term R(Ω).

By learning the model parameter Ω with the loss function L(Ω) obtained by adding (a scalar multiple of) the regularization term R(Ω) to the error term E(Ω), the model learning unit 931 performs learning such that the values of some elements of the model parameter Ω become close to 0 (the model becomes sparse). Here, when some of the elements of the model parameter Ω are 0 or values close to 0, the model parameter Ω is said to have sparsity. The model learning unit 931 thus learns a model parameter Ω having sparsity using the loss function L(Ω) including the regularization term R(Ω).

In Non-Patent Document 2, regularization terms called Ridge (L2) and Group Lasso are used. For example, when only the weight parameter Wl in layer l (l is an integer identifying a layer of the neural network) is updated, the Ridge (L2) regularization term RL2(Wl) and the Group Lasso regularization term Rgroup(Wl) are given by the following equations.
    [Equations defining the Ridge (L2) regularization term RL2(Wl) and the Group Lasso regularization term Rgroup(Wl); see Non-Patent Document 2]
That is, RL2(Wl) is the sum of the squares of all the elements of the weight parameter between the l-th layer and the (l-1)-th layer, and Rgroup(Wl) represents the sum of (the absolute values of) the weights connecting one unit of the l-th layer with all the units (j = 1, ..., Nl-1) of the (l-1)-th layer.

When Group Lasso is used as the regularization term, the model parameter Ω can be grouped arbitrarily for learning. For example, in Non-Patent Document 2, when the model parameter Ω is expressed as a matrix, the rows or columns of the matrix are used as the grouping units (groups). Furthermore, by learning with the rows of the matrix as the grouping unit and deleting, from the model parameter Ω at the end of learning, the elements of every group whose row-wise norm is smaller than a predetermined threshold, the model size is reduced.

A regularization term is originally used to avoid overfitting, but, depending on the purpose, various regularization terms other than the regularization term RL2(Wl) and the regularization term Rgroup(Wl) of Non-Patent Document 2 can also be defined and used.
The learning method of Non-Patent Document 1 assumes that the model is learned in a single domain (for example, in the case of speech recognition, learning is performed using speech data collected on the premise that conditions such as background noise, recording equipment, and speaking style are the same). Therefore, if a model learned using data of a certain domain (domain 1) is taken as the initial model and additional learning is performed using data of another domain (domain 2), and the resulting model is then used to perform recognition on domain-1 data, the accuracy may deteriorate significantly. This property of neural network learning is called catastrophic forgetting. In general, to prevent catastrophic forgetting (that is, to learn additionally without impairing the performance of a learned model corresponding to existing knowledge), the model must be re-learned using the data of both domain 1 and domain 2, so there is a problem that the cost in terms of learning time is very high.

An object of the present invention is therefore to provide a model learning technique that can additionally learn using data of another domain without impairing the performance of a model learned using data of a certain domain.

One aspect of the present invention includes: a setup unit that generates a mask from a learned model parameter that is the initial value of a model parameter Ω to be learned; a feature quantity processing unit that uses the model parameter Ω to calculate, from a feature quantity extracted from input data in a domain different from the domain used for learning the learned model parameter, an output probability distribution, i.e. the distribution of the probability pm that the output corresponding to the feature quantity is the output with output number m (1 ≤ m ≤ M); and a model learning unit that learns the model parameter Ω using the mask, the output probability distribution, and a correct output number, i.e. a number identifying the correct output corresponding to the feature quantity. Let L(Ω) be the loss function used when learning the model parameter Ω and μ be a real number. The setup unit calculates the mask element γ corresponding to an element ω of the model parameter Ω from a threshold θ by the following equation,
    γ = 1 (if |ω(0)| < θ),    γ = 0 (otherwise)
(where ω(0) is the initial value of the element ω), and the model learning unit calculates the update difference δ(ω) of the element ω of the model parameter Ω by the following equation and updates the element ω:
    δ(ω) = -μ · γ · ∂L(Ω)/∂ω,    ω ← ω + δ(ω)
(where ∂L(Ω)/∂ω is the gradient of the loss function L(Ω) with respect to the element ω).

According to the present invention, it is possible to learn additionally using data of another domain without impairing the performance of a model learned using data of a certain domain.
Brief description of the drawings: FIG. 1 shows an example of the configuration of the model learning device 100; FIG. 2 shows an example of its operation; FIG. 3 shows an example of the configuration of the setup unit 110; FIG. 4 shows an example of its operation; FIG. 5 shows an example of the configuration of the model learning apparatus 900/901; FIG. 6 shows an example of its operation; FIG. 7 shows an example of the configuration of the feature quantity processing unit 920; and FIG. 8 shows an example of its operation.
Embodiments of the present invention will now be described in detail. Constituent units having the same function are given the same reference numeral, and duplicate description is omitted.

<First embodiment>

The model learning device 100 will be described below with reference to FIGS. 1 and 2. FIG. 1 is a block diagram showing the configuration of the model learning device 100, and FIG. 2 is a flowchart showing its operation. As shown in FIG. 1, the model learning device 100 includes a setup unit 110, a feature quantity processing unit 920, a model learning unit 130, and a recording unit 990.
The recording unit 990 records, as appropriate, the information necessary for the processing of the model learning device 100. For example, the initial value of the model parameter Ω is recorded in advance. This initial value of the model parameter Ω is a learned model parameter, for example one learned by the model learning device 900 or the model learning device 901, using training data consisting of pairs of a feature quantity extracted from input data in a certain domain (hereinafter, domain 1) and the correct output number, i.e. the number identifying the correct output corresponding to that feature quantity. Therefore, when a learned model parameter learned by the model learning device 901 is used, the learned model parameter has sparsity. Hereinafter, the learned model parameter is denoted Ω(0) and its elements ω(0).

The model learning device 100 learns the model parameter Ω from training data consisting of pairs of a feature quantity extracted from input data in a domain (hereinafter, domain 2) different from the domain used for learning the learned model parameter (that is, domain 1) and the correct output number, i.e. the number identifying the correct output corresponding to that feature quantity.

Before learning starts, the model learning device 100 sets the initial value of the model parameter Ω (that is, the learned model parameter) recorded in the recording unit 990 in the feature quantity processing unit 920 (the intermediate feature quantity calculation unit 921 and the output probability distribution calculation unit 922). During learning, each time the model learning unit 130 performs the optimization calculation (that is, updates the model parameter Ω so as to optimize it), the model learning device 100 sets the calculated model parameter Ω in the feature quantity processing unit 920.
The operation of the model learning device 100 will be described with reference to FIG. 2. The setup unit 110 generates a mask from the learned model parameter, recorded in the recording unit 990, that is the initial value of the model parameter Ω to be learned (S110). The setup unit 110 is described below with reference to FIGS. 3 and 4. FIG. 3 is a block diagram showing the configuration of the setup unit 110, and FIG. 4 is a flowchart showing its operation. As shown in FIG. 3, the setup unit 110 includes a threshold determination unit 111 and a mask generation unit 112; its operation follows FIG. 4.

The threshold determination unit 111 determines the threshold θ from the learned model parameter (S111). Any determination method may be used as long as it determines the threshold θ so that a predetermined number of elements whose absolute values are close to 0 are extracted from the elements of the learned model parameter. For example, a frequency distribution of the values of the learned model parameter elements is created, and the threshold θ is determined so that the proportion of model parameter elements whose absolute value is close to 0 becomes, say, 25% (hereinafter, determination method 1). Alternatively, a frequency distribution of values calculated for each group into which the learned model parameter elements are grouped is created, and a value between two of those values (for example, the average of the two values) is determined as the threshold θ (hereinafter, determination method 2). For example, when the learned model parameter is expressed as a matrix, the rows (or columns) of the matrix are taken as groups, a frequency distribution of the norms of the row vectors (or column vectors) of the groups is created, and a value between two of the norm values can be determined as the threshold θ. That is, determination method 1 determines the threshold θ based on the frequency distribution of the values of the learned model parameter elements, while determination method 2 determines the threshold θ based on the frequency distribution of values calculated for each group into which the learned model parameter elements are grouped.
The mask generation unit 112 generates the mask Γ from the learned model parameter using the threshold θ determined in S111 (S112). The mask Γ is generated as follows: the element γ of the mask Γ corresponding to an element ω of the model parameter Ω is set to 1 when the absolute value of the learned model parameter element ω(0) is smaller than the threshold θ (at most the threshold θ), and to 0 otherwise. That is, the mask element γ corresponding to the element ω of the model parameter Ω is calculated from the threshold θ by the following equation,
    γ = 1 (if |ω(0)| < θ),    γ = 0 (otherwise)
(where ω(0) is the initial value of the element ω). When the model parameter Ω is expressed as a matrix, the mask Γ is expressed as a matrix of the same size as the matrix representing the model parameter Ω, in which every element is 0 or 1.
Using the model parameter Ω, the feature quantity processing unit 920 calculates, from the feature quantity extracted from the input data in domain 2, the output probability distribution p = (p1, ..., pM), where pm is the probability that the output corresponding to the feature quantity is the output with output number m (1 ≤ m ≤ M) (S920).

The model learning unit 130 learns the model parameter Ω using the mask Γ generated in S110, the output probability distribution p calculated in S920, and the correct output number, i.e. the number identifying the correct output corresponding to the feature quantity input in S920 (S130). For example, the model parameter Ω is optimized using the loss function L(Ω) defined by equation (1) or equation (2):
    L(Ω) = E(Ω)    ...(1)
    L(Ω) = E(Ω) + λR(Ω)    ...(2)
Specifically, the update difference δ(ω) of the element ω of the model parameter Ω is calculated by equation (3), and the element ω is updated by equation (4):
    δ(ω) = -μ · γ · ∂L(Ω)/∂ω    ...(3)
    ω ← ω + δ(ω)    ...(4)
Here, μ is a (positive) real number representing the learning rate, a parameter that adjusts the degree to which the model parameter is updated, and ∂L(Ω)/∂ω denotes the gradient of the loss function L(Ω) with respect to the element ω. Note that the gradient ∂L(Ω)/∂ω is also used for learning in the model learning device 900 and the model learning device 901.

With this update difference, only the elements of the model parameter Ω that are to be learned, that is, the elements whose initial absolute value is smaller than the threshold θ (at most the threshold θ), are updated selectively.

When the model parameter Ω and the mask Γ are represented by matrices as described above (denoting these matrices themselves by Ω and Γ), the optimization calculation of the model parameter Ω can be expressed using the Hadamard product as follows:
    Ω ← Ω - μ (Γ ∘ ∂L(Ω)/∂Ω)    (∘ denotes the Hadamard product)
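The masked update can then be sketched as follows, assuming it takes the masked gradient-descent form of equations (3) and (4) and the Hadamard form above; the gradient is supplied by the caller, and the function name and sample values are illustrative.

```python
import numpy as np

def masked_update(W, Gamma, grad_L, mu=0.01):
    """One optimization step: Omega <- Omega - mu * (Gamma (Hadamard product) dL/dOmega).
    Only elements whose mask value is 1 (initial magnitude below theta) change."""
    return W - mu * (Gamma * grad_L)

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 8))
Gamma = (np.abs(W) < 0.5).astype(W.dtype)
grad_L = rng.normal(size=W.shape)          # stand-in for dL(Omega)/dOmega from back-propagation
W_new = masked_update(W, Gamma, grad_L)
assert np.allclose(W_new[Gamma == 0], W[Gamma == 0])   # masked-out (large) elements are untouched
```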
Note that when the loss function L(Ω) including the regularization term R(Ω) (equation (2)) is used, each element of the model parameter Ω can be driven efficiently toward a value close to 0.

The model learning device 100 repeats the processing of S920 to S130 as many times as the number of training data, and outputs the finally calculated model parameter Ω.
According to the invention of this embodiment, it is possible to learn additionally using data of another domain without impairing the performance of a model learned using data of a certain domain. A model that processes the input data of both domain 1 and domain 2 with good accuracy can thus be learned using only the input data of domain 2, with the learned model trained on the input data of domain 1 as the initial model, so the cost in terms of learning time can be reduced.
<Supplementary Note>
The apparatus of the present invention has, for example as a single hardware entity, an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display or the like can be connected, a communication unit to which a communication device (for example, a communication cable) capable of communicating with the outside of the hardware entity can be connected, a CPU (Central Processing Unit, which may include a cache memory, registers, and the like), a RAM and a ROM as memories, an external storage device which is a hard disk, and a bus that connects the input unit, the output unit, the communication unit, the CPU, the RAM, the ROM, and the external storage device so that data can be exchanged among them. If necessary, the hardware entity may be provided with a device (drive) capable of reading from and writing to a recording medium such as a CD-ROM. A physical entity having such hardware resources includes a general-purpose computer.
The external storage device of the hardware entity stores the programs necessary for realizing the above functions and the data necessary for processing these programs (not limited to the external storage device; for example, the programs may be stored in a ROM, which is a read-only storage device). Data obtained by the processing of these programs is stored as appropriate in the RAM, the external storage device, or the like.
In the hardware entity, each program stored in the external storage device (or the ROM, etc.) and the data necessary for processing each program are read into the memory as needed, and are interpreted, executed, and processed by the CPU as appropriate. As a result, the CPU realizes predetermined functions (the components described above as units, means, and so on).
The present invention is not limited to the above-described embodiment, and can be modified as appropriate without departing from the spirit of the present invention. The processes described in the above embodiment may be executed not only in time series in the order described, but also in parallel or individually according to the processing capability of the apparatus that executes the processes or as necessary.
As described above, when the processing functions of the hardware entity (the apparatus of the present invention) described in the above embodiment are realized by a computer, the processing contents of the functions that the hardware entity should have are described by a program. By executing this program on the computer, the processing functions of the hardware entity are realized on the computer.
The program describing the processing contents can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example, a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, a magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), a CD-R (Recordable)/RW (ReWritable), or the like as the optical disc; an MO (Magneto-Optical disc) or the like as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like as the semiconductor memory.
This program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or a CD-ROM on which the program is recorded. Furthermore, the program may be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers via a network.
A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes the processing according to the read program. As another execution form, the computer may read the program directly from the portable recording medium and execute the processing according to it, or the computer may sequentially execute the processing according to the received program each time the program is transferred from the server computer. The above-described processing may also be executed by a so-called ASP (Application Service Provider) type service, which realizes the processing functions only through execution instructions and result acquisition, without transferring the program from the server computer to the computer. Note that the program in this embodiment includes information that is used for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has a property that defines the processing of the computer).
In this embodiment, the hardware entity is configured by executing a predetermined program on the computer; however, at least a part of these processing contents may be realized by hardware.

Claims (8)

  1.  A model learning device comprising:
     a setup unit that generates a mask from a trained model parameter, which is the initial value of a model parameter Ω to be learned;
     a feature quantity processing unit that, using the model parameter Ω, computes, from a feature quantity extracted from input data in a domain different from the domain used to learn the trained model parameter, an output probability distribution, which is a distribution of the probability pm that the output corresponding to the feature quantity is the output with output number m (1 ≤ m ≤ M, where M denotes the number of types of output corresponding to the feature quantity); and
     a model learning unit that learns the model parameter Ω using the mask, the output probability distribution, and a correct output number, which is a number for identifying the correct output corresponding to the feature quantity,
     wherein, with L(Ω) being a loss function used when learning the model parameter Ω and μ being a real number,
     the setup unit computes the mask element γ corresponding to an element ω of the model parameter Ω, using the threshold θ, by the following formula
       γ = 1 if |ω(0)| < θ, and γ = 0 otherwise
     (where ω(0) is the initial value of the element ω), and
     the model learning unit computes the update difference δ(ω) of the element ω of the model parameter Ω by the following formula and updates the element ω
       δ(ω) = -μγ ∂L(Ω)/∂ω,  ω ← ω + δ(ω)
     (where ∂L(Ω)/∂ω is the gradient of the loss function L(Ω) with respect to the element ω).
  2.  The model learning device according to claim 1, wherein the trained model parameter has sparsity.
  3.  The model learning device according to claim 2, wherein the trained model parameter has been learned using a loss function L(Ω) given by the following formula
       L(Ω) = E(Ω) + λR(Ω)
     (where E(Ω) is an error term indicating the error between the correct output and the output probability distribution computed from the feature quantity using the model parameter Ω, R(Ω) is a regularization term, and λ is a real number).
  4.  The model learning device according to any one of claims 1 to 3, wherein the threshold θ is determined based on a frequency distribution of the values of the elements of the trained model parameter.
  5.  The model learning device according to any one of claims 1 to 3, wherein the threshold θ is determined based on a frequency distribution of values computed for each group obtained by grouping the elements of the trained model parameter.
  6.  A model learning method comprising:
     a setup step in which a model learning device generates a mask from a trained model parameter, which is the initial value of a model parameter Ω to be learned;
     a feature quantity processing step in which the model learning device, using the model parameter Ω, computes, from a feature quantity extracted from input data in a domain different from the domain used to learn the trained model parameter, an output probability distribution, which is a distribution of the probability pm that the output corresponding to the feature quantity is the output with output number m (1 ≤ m ≤ M); and
     a model learning step in which the model learning device learns the model parameter Ω using the mask, the output probability distribution, and a correct output number, which is a number for identifying the correct output corresponding to the feature quantity,
     wherein, with L(Ω) being a loss function used when learning the model parameter Ω and μ being a real number,
     in the setup step, the mask element γ corresponding to an element ω of the model parameter Ω is computed, using the threshold θ, by the following formula
       γ = 1 if |ω(0)| < θ, and γ = 0 otherwise
     (where ω(0) is the initial value of the element ω), and
     in the model learning step, the update difference δ(ω) of the element ω of the model parameter Ω is computed by the following formula and the element ω is updated
       δ(ω) = -μγ ∂L(Ω)/∂ω,  ω ← ω + δ(ω)
     (where ∂L(Ω)/∂ω is the gradient of the loss function L(Ω) with respect to the element ω).
  7.  The model learning method according to claim 6, wherein the trained model parameter has sparsity.
  8.  A program for causing a computer to function as the model learning device according to any one of claims 1 to 5.
PCT/JP2019/014476 2018-04-04 2019-04-01 Model learning device, model learning method, and program WO2019194128A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-072225 2018-04-04
JP2018072225A JP2019185207A (en) 2018-04-04 2018-04-04 Model learning device, model learning method and program

Publications (1)

Publication Number Publication Date
WO2019194128A1 true WO2019194128A1 (en) 2019-10-10

Family

ID=68100579

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/014476 WO2019194128A1 (en) 2018-04-04 2019-04-01 Model learning device, model learning method, and program

Country Status (2)

Country Link
JP (1) JP2019185207A (en)
WO (1) WO2019194128A1 (en)



Also Published As

Publication number Publication date
JP2019185207A (en) 2019-10-24


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19780519; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19780519; Country of ref document: EP; Kind code of ref document: A1)