WO2019182059A1 - Model generation device, model generation method, and program - Google Patents

Model generation device, model generation method, and program

Info

Publication number
WO2019182059A1
Authority
WO
WIPO (PCT)
Prior art keywords
error
model
training data
data
model generation
Prior art date
Application number
PCT/JP2019/011865
Other languages
English (en)
Japanese (ja)
Inventor
裕也 海野
Original Assignee
株式会社 Preferred Networks
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社 Preferred Networks
Publication of WO2019182059A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present invention relates to a model generation device, a model generation method, and a program.
  • an embodiment of the present invention proposes a model generation device, a model generation method, and a program that perform machine learning with a memory usage amount that does not depend on the mini-batch size.
  • a model generation apparatus includes an input unit, an error output unit, an error calculation unit, and a model generation unit.
  • the input means divides the training data and inputs it to the model.
  • the error output means outputs a first error representing a difference between the data acquired by inputting the divided training data into the model and the correct answer label of the divided training data.
  • the error calculation means calculates a second error representing a difference between the data acquired by inputting the training data into the model and the correct answer label of the training data.
  • the model generation means generates the learned model in which the weight of at least one layer of the neural network is updated by back propagation based on the second error.
  • Diagrams showing the concept of a prediction problem (FIG. 1A and FIG. 1B).
  • A block diagram showing the functions of the machine learning device according to one embodiment (FIG. 2).
  • A flowchart showing the flow of the processing according to one embodiment (FIG. 5).
  • FIG. 1A is a diagram illustrating an example of a concept of an input / output state in forward propagation for a word prediction problem.
  • The input x is a D-dimensional vector.
  • In the hidden layer, the product xW^T of the input x and the weight matrix W is calculated.
  • W^T denotes the transpose of W.
  • The weight matrix W is a V × D matrix.
  • V indicates the vocabulary size of the model, that is, the number of words to be predicted in the model.
  • an output element predicted in the model at that time point is obtained by inputting the calculation result to, for example, a softmax function in the output layer.
  • the predicted result is compared with teacher data to calculate a loss.
  • By back propagating this loss, the elements of the hidden layer are optimized.
  • The model is optimized by repeating this forward propagation and back propagation to learn the hidden-layer weight matrix W.
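  • As a concrete illustration of the forward pass just described, a minimal sketch in Python/NumPy follows; the dimensions, the random initialization, and the cross-entropy loss are assumptions for illustration only and are not taken from the patent.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max before exponentiating, for numerical stability.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

D, V = 256, 50_000                      # input dimension and vocabulary size (assumed)
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.01, size=(V, D))  # hidden-layer weight matrix W (V x D)

x = rng.normal(size=D)                  # one training input, a D-dimensional vector
t = 123                                 # index of the correct word (teacher label)

logits = x @ W.T                        # the product x W^T computed in the hidden layer
y = softmax(logits)                     # output layer: predicted distribution over V words
loss = -np.log(y[t])                    # loss from comparing the prediction with the teacher
```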
  • While FIG. 1A described above uses a single piece of training data, at the time of error back propagation the loss is calculated for one or more pieces of partial data within the training data.
  • In batch learning, all of the training data is used for error back propagation at once.
  • In practice, mini-batch learning, which uses a subset of the data at a time, is generally used.
  • The number of data items computed at one time is called the batch size.
  • With a highly parallel processor such as a GPU, reducing the batch size prevents parallel computation from being performed efficiently, so calculation efficiency drops significantly. Calculation is therefore performed with a certain batch size in order to increase the utilization efficiency of the processor.
  • In FIG. 1B, X denotes the matrix whose rows are the inputs x fed to the input layer.
  • That is, X is a matrix containing B inputs x_1, x_2, ..., x_i, ..., x_B, one per row.
  • Likewise, the outputs for these inputs x_i are collected into Y: a matrix containing the B outputs, that is, the output y_1 for the input x_1, ..., the output y_i for the input x_i, ..., the output y_B for the input x_B, one per row.
  • The computation for a mini-batch of size B can be processed efficiently by applying the softmax function to each row of the hidden-layer output. The convergence and accuracy of the model can be improved by averaging over the rows of Y, calculating the loss from the error with respect to the correct labels, and optimizing by back propagating that loss.
  • The mini-batch size B can be increased to further raise the calculation efficiency.
  • By devising how the input and hidden-layer computations are performed, the present embodiment makes it possible to perform the calculation on data whose mini-batch size is large, for example a size that would normally be difficult to store in the accelerator memory.
  • the memory usage method, the loss calculation method, and the like will be described in detail.
  • FIG. 2 is a block diagram illustrating functions of the machine learning device according to the present embodiment.
  • the machine learning device 1 includes an input unit 10, a control unit 12, a storage unit 14, a learning unit 16, and an output unit 18, and performs machine learning.
  • the machine learning device 1 functions as a model generation device that generates a model by machine learning.
  • The input unit 10 is an interface through which data from outside the machine learning device 1 is input, and receives input of training data, hyperparameters, and the like.
  • the input data is transmitted to the control unit 12 and processed. Further, the input data may be transmitted to the storage unit 14 and temporarily stored.
  • the control unit 12 performs control for the learning unit 16 to learn the model and control for storing data in the storage unit 14.
  • the learning unit 16 learns the model based on the input training data.
  • the learning unit 16 performs processing by transmitting and receiving data stored in the storage unit 14 at an appropriate timing.
  • the storage unit 14 stores training data input from the input unit 10.
  • the storage unit 14 may include a main storage device and an auxiliary storage device as a hardware configuration.
  • a program for operating the control unit 12, the learning unit 16, and the like may be stored.
  • The training data may first be stored in the auxiliary storage device, which has a large capacity but low input/output speed, and transferred as necessary to the main storage device, which has a smaller capacity than the auxiliary storage device but can input and output data at high speed.
  • the output unit 18 outputs the model learned by the learning unit 16 to the outside.
  • a database (not shown) may be provided inside the machine learning device 1 and a model may be output to the database.
  • This database may be provided in the storage unit 14.
  • the machine learning device 1 may be a processing device that performs natural language processing or the like. In this case, the learned model may be stored in a necessary place as appropriate.
  • FIG. 3 is a block diagram showing the internal functions of the learning unit 16.
  • The function of the learning unit 16 is not limited to RNNs in natural language processing, and can be applied in the same way to other models in which the loss calculation is performed on relatively large mini-batches by dividing them.
  • the learning unit 16 includes a data selection unit 160, a forward propagation unit 162, an error output unit 164, an error calculation unit 166, and a back propagation unit 168.
  • The data selection unit 160 receives a control signal from the control unit 12 and selects the training data to be used for learning in a mini-batch. For example, the training data is randomly allocated to mini-batches of B elements. The randomly allocated data is output to the forward propagation unit 162. Alternatively, the forward propagation unit 162 may merely be notified of which data were allocated, and may acquire the mini-batch training data from the storage unit 14 when performing the calculation.
  • the forward propagation unit 162 performs forward propagation in model generation and calculates a numerical value for calculating a loss in each layer.
  • The error output unit 164 refers to the output of each layer obtained by the forward propagation unit 162. This output is computed as a vector whose dimension depends on the layer, or in a matrix format in which a predetermined number of such vectors are aggregated as rows.
  • For the output vector or matrix of each layer produced by the forward propagation unit 162, the error output unit 164 calculates the difference (first difference) from the correct label, applying, for example, a softmax function, and thereby calculates the loss (first error) for a predetermined number (b) of input data within the mini-batch, that is, within at least a part (B pieces) of the training data.
  • The error calculation unit 166 calculates the mini-batch loss (second error), which is the second difference, by taking the sum or average of the first errors. That is, a first error indicates the difference between the model outputs and the label values for b pieces of data, and the second error indicates the loss over the mini-batch (B pieces).
  • FIG. 4 is a diagram showing an outline of the mini-batch processing in the present embodiment.
  • The input corresponding to the mini-batch size B is divided into N groups of a predetermined size b smaller than B.
  • B = N × b
  • All submatrices X_n are b × D matrices.
  • b may be defined as a predetermined number smaller than B that does not depend on B.
  • V is the vocabulary size and has a value of about 50,000 to 100,000. For this reason, when the batch size B is increased, it is difficult to store in memory at the same time all the elements of the matrices shown in FIG. 1B, in particular the matrix Y containing the B × V elements to be obtained in the output layer, and it is therefore difficult to make the batch size B sufficiently large.
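  • As an illustrative calculation (the numbers are assumed, not from the patent): with V = 100,000, B = 10,000, and 32-bit (4-byte) elements, the full matrix Y alone would occupy 10,000 × 100,000 × 4 bytes = 4 GB, rivaling or exceeding the memory of many accelerators, whereas a partial matrix Y_n with b = 128 occupies only 128 × 100,000 × 4 bytes ≈ 51 MB.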
  • By performing the computation sequentially for n = 1, 2, ..., N, the error output unit 164 calculates the first errors L_1, L_2, ..., L_N.
  • the error calculation unit 166 calculates a second error that is a loss in the mini-batch by calculating the sum or average of the first errors.
  • Once the first error L_n has been calculated, the partial matrix Y_n can be discarded; thus the second error corresponding to the matrix Y can be calculated without ever directly obtaining the full B × V matrix Y, even when the mini-batch size B is large.
  • The submatrix Y_n need not literally be discarded; for example, a region of b × V × (the number of element bits) may be reserved and reused. Furthermore, there is no need to store all of the first errors L_1, L_2, ...: a variable loss L may be prepared, and the first errors L_n calculated from the partial matrices Y_n may be summed into it sequentially.
  • Alternatively, the loss L may be obtained by computing each partial matrix Y_n on a separate computation core such as a GPU, obtaining the first errors L_n by parallel computation, and taking their sum.
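  • A minimal sketch of this chunked loss computation follows, reusing softmax from the earlier sketch; the chunk size, the cross-entropy loss, and the averaging over the mini-batch are illustrative assumptions, not prescriptions from the patent.

```python
def minibatch_loss(X, labels, W, b=128):
    """Mini-batch loss without materializing the full B x V matrix Y.

    X: (B, D) mini-batch inputs; labels: (B,) correct word indices;
    W: (V, D) weight matrix; b: chunk size with b << B.
    """
    B = X.shape[0]
    loss = 0.0                              # the running variable "loss L"
    for start in range(0, B, b):            # n = 1, 2, ..., N
        Xn = X[start:start + b]             # partial matrix X_n (b x D)
        tn = labels[start:start + b]
        Yn = softmax(Xn @ W.T)              # partial matrix Y_n (b x V)
        # First error L_n: cross-entropy summed over the b rows of Y_n.
        Ln = -np.sum(np.log(Yn[np.arange(len(tn)), tn]))
        loss += Ln                          # accumulate; Y_n may now be discarded
    return loss / B                         # second error: loss averaged over the mini-batch
```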
  • The back propagation unit 168 optimizes the model by performing error back propagation based on the second error calculated by the error calculation unit 166.
  • The error calculation unit 166 recalculates and outputs the first error (a partial error with respect to the submatrix X_n) for each predetermined number b within the mini-batch, at the timing it is required during the execution of error back propagation.
  • The back propagation unit 168 may perform back propagation for each partial matrix X_n; in that case, the partial matrix Y_n is again required in order to calculate the first error, which is the error for that partial matrix X_n.
  • By recalculating Y_n, the error calculation unit 166 obtains the Y_n from which the error output unit 164 calculates the first error.
  • The amount of memory required at the time of recalculation is modest: it suffices to secure an area that allows the calculation for each partial computation, and as with the calculation described above, the recalculation can be executed if an area of b × V × (the number of element bits) is secured. Since b ≪ B, the amount of memory used is reduced.
  • Even taking the cost of recalculating Y_n into account, the overall calculation cost becomes higher when the batch size B cannot be made sufficiently large. Therefore, recalculating Y_n, that is, recomputing the partial error (first error) from b pieces of data, adds only a small calculation cost for obtaining the first errors within the mini-batch, in exchange for the memory savings.
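  • Under the same assumptions, the recalculation during back propagation can be sketched as below: the gradient with respect to W is accumulated chunk by chunk, recomputing each Y_n instead of keeping it from the forward pass (the same idea as gradient checkpointing). The closed-form gradient used here, softmax output minus the one-hot label, is the standard result for softmax with cross-entropy, not a detail specified by the patent.

```python
def minibatch_grad_W(X, labels, W, b=128):
    """Accumulate dLoss/dW over chunks of b rows, recomputing each Y_n on the fly."""
    B = X.shape[0]
    dW = np.zeros_like(W)                   # (V, D) gradient accumulator
    for start in range(0, B, b):
        Xn = X[start:start + b]             # partial matrix X_n
        tn = labels[start:start + b]
        Yn = softmax(Xn @ W.T)              # recalculated partial matrix Y_n
        G = Yn                              # dLoss/dlogits = softmax minus one-hot label
        G[np.arange(len(tn)), tn] -= 1.0
        dW += G.T @ Xn                      # chain rule through logits = X_n W^T
    return dW / B                           # gradient of the mini-batch-averaged loss

# One plain SGD step (illustrative): W -= learning_rate * minibatch_grad_W(X, labels, W)
```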
  • The error calculation unit 166 need not be provided independently; the error output unit 164 may take over the function of the error calculation unit 166, in which case a separate error calculation unit 166 may be omitted.
  • The forward propagation unit 162 and the back propagation unit 168 may also function as a single unit, serving as a model generation unit that generates a model by performing forward propagation and back propagation. Further, this error back propagation can be applied in the same way not only to the output layer but also to intermediate layers.
  • That is, the second error may be calculated for a partial matrix X_n using the result of forward propagation up to an intermediate layer and the result of back propagation down to that layer, and the back propagation of that layer may then be executed. In this case, it is not always necessary to use a softmax function or the like in calculating the first error.
  • the error calculation of the present embodiment can be executed for at least one layer constituting the network.
  • After back propagation, processing returns to the forward propagation unit 162 and the calculation continues with the next mini-batch, or, when moving to the next epoch, to the data selection unit 160.
  • FIG. 5 is a flowchart illustrating an example of a processing flow according to the present embodiment.
  • training data is input to the machine learning device 1 via the input unit 10 (S100).
  • the input training data is stored in the storage unit 14 as necessary.
  • Necessary information such as the number of data is output to the control unit 12. Thereafter, the control unit 12 controls processing necessary for learning, such as learning of the learning unit 16 and data transmission of the storage unit 14.
  • the data selection unit 160 of the learning unit 16 randomly distributes data for each predetermined mini-batch size B and selects data for which a mini-batch is generated (S102).
  • The mini-batch size B may be a preset parameter, or may be input to the control unit 12 via the input unit 10 as a hyperparameter.
  • When it is input as a hyperparameter, the control unit 12 controls the data selection unit 160 so that data is selected for each mini-batch of size B.
  • This step may be a step of distributing data in advance for each mini-batch instead of selecting data for each mini-batch.
  • the data selection may be, for example, a process of reading the selected data into an easily accessible memory, or a process of outputting the index of the selected data to the forward propagation unit 162 and the back propagation unit 168.
  • the data necessary for the calculation may be read into an easily accessible memory at the timing when the processing is performed by the forward propagation unit 162 or the like.
  • other optimization processes such as loop unrolling and software pipelining may be performed.
  • Next, the training data in the mini-batch is forward propagated part by part by the forward propagation unit 162, the error output unit 164 calculates a first error for each forward-propagated piece of partial data, and the second error of the entire mini-batch is calculated based on these first errors (S104).
  • the back propagation unit 168 performs error back propagation using the first error or the second error calculated in S104, and the model is optimized (S106).
  • recalculation may be performed for each back propagation process of the partial data.
  • The first error may be calculated by performing this recalculation, and error back propagation may be performed using it. That is, in each layer, the difference between the output of the layer and the error propagated back from the next layer may be obtained as a first error, and that first error may be propagated back to the preceding layer. In this way, a first error may be obtained in each layer.
  • Based on the first errors, the second error for the whole mini-batch may then be calculated. In this way, the back propagation unit 168 performs back propagation using the second error together with a process (function) that obtains, by forward propagation, the first errors corresponding to the second error.
  • the output unit 18 outputs the learned model and finishes the process (S112).
  • The end of learning is judged according to conditions such as: the value of the loss, for example the second error in the output layer, becoming smaller than a predetermined value; the calculation of a predetermined number of epochs being completed; or the evaluation value in validation becoming larger than a predetermined value.
  • The learned model may be stored in the storage unit 14 so that the machine learning device 1 functions, for example, as a natural language processing device using the model. Further, the output unit 18 may output the model stored in the storage unit 14.
  • FIG. 6 is a diagram illustrating a hardware implementation example of the present embodiment.
  • the machine learning device 1 includes a CPU 200, an accelerator 202, a main storage device 204, an auxiliary storage device 206, a network interface 208, and a device interface 210. Each of these devices is connected by a bus 212.
  • a CPU (Central Processing Unit) 200 is a processor that operates the machine learning device 1 and operates the machine learning device 1 based on a program stored in the main storage device 204, for example.
  • the accelerator 202 is a device for assisting arithmetic processing, and includes, for example, a GPU.
  • the GPU speeds up the numerical calculation by GPGPU (General-Purpose computing on GPU).
  • The accelerator 202 may be provided with a memory of its own and may be capable of accessing data stored in that memory at high speed. When exchanging data with the main storage device 204 or the auxiliary storage device 206, necessary data may be prefetched from these storage devices into the memory on the accelerator 202 so that this high-speed access can be exploited to the fullest.
  • the main storage device 204 is directly connected to the CPU 200 via a main bus or the like, and mainly stores programs and the like necessary for the operation of the machine learning device 1.
  • the main storage device 204 includes, for example, DRAM (Dynamic Random Access Memory) or SRAM (Static Random Access Memory).
  • The auxiliary storage device 206 has a lower throughput than the main storage device 204 but a larger memory capacity.
  • the auxiliary storage device 206 does not need to be in the same computer as the computer in which the machine learning device 1 is configured, and may be installed outside.
  • the training data may be stored in the auxiliary storage device 206, transferred to the CPU 200, the accelerator 202, and the main storage device 204 via the bus 212 and used.
  • the network interface 208 is an interface that connects the external network 300 and the machine learning device 1.
  • the device interface 210 is an interface that connects the external device 400 and the machine learning device 1.
  • the external device 400 may be connected to the machine learning device 1 via the network 300 and the network interface 208.
  • the calculation in this embodiment is mainly executed on the accelerator 202.
  • While the arithmetic processing speed of the accelerator 202 is higher than that of the CPU 200, the capacity of the memory mounted on the accelerator 202 is often smaller than that of the main storage device 204 and the auxiliary storage device 206.
  • Data in that memory can be accessed at high speed by the processors in the accelerator 202, whereas access from those processors to the main storage device 204 and the auxiliary storage device 206 is often slow.
  • the number of vectors to be processed at the same timing in the partial data may be determined based on the memory capacity on the accelerator 202.
  • For example, b may be set so that the memory capacity remaining after excluding the capacity of the program needed to operate the processor of the accelerator 202 and the buffer capacity for input data and the like can hold b × V elements.
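  • A hedged sketch of this sizing rule follows; the memory figures and the helper function are invented for illustration and are not taken from the patent.

```python
def choose_b(total_mem_bytes, reserved_bytes, V, bytes_per_element=4):
    """Pick b so that one b x V partial matrix fits in the accelerator memory
    left over after the program code and input buffers are accounted for."""
    usable = total_mem_bytes - reserved_bytes
    return max(1, usable // (V * bytes_per_element))

# Example: a 16 GB accelerator with 2 GB reserved and a vocabulary of 100,000 words.
b = choose_b(16 * 1024**3, 2 * 1024**3, V=100_000)
print(b)  # 37580 -- in practice b would also be capped well below the mini-batch size B
```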
  • As described above, according to the present embodiment, an accurate model is learned by calculating a partial loss (first error or second error) for each predetermined number of data items that does not depend on the mini-batch size. It is therefore possible to secure a high calculation speed without reducing the mini-batch size. Since the loss calculation and error back propagation can be performed within a memory capacity that does not depend on the mini-batch size, the computing resources of the computer can be used efficiently.
  • In the above description, the data selection unit 160 inputs data of the mini-batch size to the model generation apparatus comprising the forward propagation unit 162 and the back propagation unit 168, and the forward propagation unit 162 and the back propagation unit 168 process the data in units of the predetermined size.
  • the present invention is not limited to this.
  • For example, the data selection unit 160 may include a data division unit that divides the data into groups of b pieces, and this data division unit may operate as input means that inputs b pieces of data at a time to the model generation unit.
  • the forward propagation unit 162 obtains the output from the output layer by calculation for the b pieces of data input from the input unit to the network.
  • the error output unit 164 outputs a first error based on the first difference between the output corresponding to the b pieces of data and the correct answer label. This error is stored, and the first error is similarly output for the next b pieces inputted from the input means.
  • the error calculation unit 166 calculates a second error, which is an error of the mini-batch.
  • In this case, for the output of each layer and the back-propagated output, the back propagation unit 168 outputs a first error via the error output unit 164 based on the b pieces of data input from the input means. Then, when the first errors have been output for all the data of the mini-batch, the error calculation unit 166 calculates the second error in the layer in question.
  • In this way, a data division unit that divides the data into groups of b pieces may be provided as the input means.
  • The division may, for example, allow a certain amount of fluctuation around the predetermined number b, or, as another example, may be changed dynamically according to the memory usage rate. Thus, the division is not limited to exactly the predetermined number each time, and may be changed appropriately according to the situation or performed by various other division methods.
  • RNN learning in natural language processing has been described as an example.
  • However, the present invention is not limited to this, and is also applicable to learning in other neural networks that require a large data area when performing loss calculations.
  • For example, it can be used not only for MLPs and CNNs but also for LSTMs (Long Short-Term Memory).
  • the generated model is a model that performs natural language processing.
  • However, the present invention is not limited to this, and the machine learning device 1 may generate a model that processes various other kinds of data for other purposes.
  • The softmax function and the like in the above description are shown as examples, and other implementations may be used.
  • Instead of the softmax function, another function suitable for obtaining gradients, such as a sigmoid function or ReLU (Rectified Linear Unit), may be used, and an appropriate function may likewise be selected wherever functions are used elsewhere.
  • The control unit 12 may be a control circuit implemented by analog circuitry, digital circuitry, an FPGA (Field Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), or the like.
  • learning unit 16 may be implemented by a circuit.
  • The machine learning device 1 may be configured by hardware, or may be configured by software, with a CPU or the like carrying out its functions through the information processing of the software.
  • A program that realizes at least a part of the functions of the machine learning device 1 may be stored in a storage medium such as a flexible disk or a CD-ROM, and read and executed by a computer.
  • the storage medium is not limited to a removable medium such as a magnetic disk or an optical disk, but may be a fixed storage medium such as a hard disk device or a memory. That is, information processing by software may be specifically implemented using hardware resources.
  • the processing by software may be implemented in a circuit such as an FPGA and executed by hardware.
  • The generation of the model and the processing using the generated model may be performed using an accelerator such as a GPU, for example.
  • A processing circuit such as a CPU, a storage device such as a memory, and other necessary hardware may each be provided singly, or a plurality of at least one of them may be provided.
  • the model generated by the machine learning device 1 can be used as a program module that is a part of the artificial intelligence software.
  • In that case, the CPU of the computer operates based on the model stored in the storage unit, performing computations and outputting results.
  • the present invention can be applied to natural language processing as an example.
  • Besides natural language processing, it is also possible to handle neural networks that take images as input.
  • a plurality of input images can be divided into mini-batches and applied in the same manner.
  • the pixels in the image may be divided and the calculation may be performed for each of the divided pixels as described above.
  • the present invention can be applied to other types of data as long as mini-batch processing and division within the data can be appropriately performed.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Machine Translation (AREA)

Abstract

The present invention performs machine learning with memory usage amounts independent of the mini-batch size. This model generation device generates a learned model comprising a neural network model, and comprises input means, error output means, error calculation means, and model generation means. The input means divides training data and inputs it into the model. The error output means outputs first errors representing the difference between data acquired by inputting the divided training data into the model and the correct-answer labels of the divided training data. On the basis of the first errors, the error calculation means calculates second errors representing the difference between data acquired by inputting the training data into the model and the correct-answer labels of the training data. On the basis of the second errors, the model generation means generates the learned model in which the weight of at least one layer of the neural network has been updated by error back propagation.
PCT/JP2019/011865 2018-03-22 2019-03-20 Model generation device, model generation method, and program WO2019182059A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-055087 2018-03-22
JP2018055087A JP2021119425A (ja) 2018-03-22 Model generation device, model generation method, and program

Publications (1)

Publication Number Publication Date
WO2019182059A1 true WO2019182059A1 (fr) 2019-09-26

Family

ID=67986223

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/011865 WO2019182059A1 (fr) 2019-03-20 Model generation device, model generation method, and program

Country Status (2)

Country Link
JP (1) JP2021119425A (fr)
WO (1) WO2019182059A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200151510A1 (en) * 2018-11-12 2020-05-14 Advanced Micro Devices, Inc. Adaptive batch reuse on deep memories
CN114118449B (zh) * 2022-01-28 2022-10-04 深圳佑驾创新科技有限公司 Image label recognition method, medium, and device based on a partial-label learning model

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018160180A (ja) * 2017-03-23 2018-10-11 富士通株式会社 情報処理システム、情報処理装置および情報処理システムの制御方法

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018160180A (ja) * 2017-03-23 2018-10-11 富士通株式会社 情報処理システム、情報処理装置および情報処理システムの制御方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ASHITANI, TATSUJI: "To try Data-Parallel with Distributed TensorFlow", Qiita, 2016, XP055640683, Retrieved from the Internet <URL:https://qiita.com/ashitani/items/dbe76cb9194d60ead9de> [retrieved on 20190613] *
MORINAGA, YUYA ET AL.: "Development of hybrid type operation method of mathematical programming and machine learning for a thermal grid", Documents of Research Group of the Institute of Electrical Engineers of Japan, 11 June 2017 (2017-06-11), pages 7-12 *

Also Published As

Publication number Publication date
JP2021119425A (ja) 2021-08-12

Similar Documents

Publication Publication Date Title
US11308398B2 (en) Computation method
KR101959376B1 System and method for a multi-core optimized recurrent neural network
US20180260709A1 (en) Calculating device and method for a sparsely connected artificial neural network
US20170061279A1 (en) Updating an artificial neural network using flexible fixed point representation
US11983616B2 (en) Methods and apparatus for constructing digital circuits for performing matrix operations
US20190130268A1 (en) Tensor radix point calculation in a neural network
JP7410395B2 Optimization device and optimization method
WO2019182059A1 Model generation device, model generation method, and program
KR102290531B1 Reorganizable neural network computing device
CN114662646A Method and apparatus for implementing a neural network
US20190130276A1 (en) Tensor manipulation within a neural network
US20210294784A1 (en) Method and apparatus with softmax approximation
US11709783B1 (en) Tensor data distribution using grid direct-memory access (DMA) controller
WO2023125857A1 Model training method based on a machine learning infrastructure system and related device
CN116090518A Feature map processing method and device based on a systolic operation array, and storage medium
KR20230132369A Resource reduction in quantum circuits
JP2020080048A Parallel processing device and program
US11704562B1 (en) Architecture for virtual instructions
CN114330682A Hardware architecture applied to the Fastformer neural network and calculation method thereof
Pochelu et al. An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks
JP2020191017A Information processing device, information processing method, and information processing program
KR20200023155A Method for accelerating learning of a neural network, and neural network system
JP7470019B2 Information processing system
US20240095493A1 (en) Desparsified convolution for sparse tensors
TWI844228B Training a neural network to perform a machine learning task

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19771724

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19771724

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP