JP2024040614A - Technology to reduce switching time of learning models and networks in deep learning - Google Patents

Technology to reduce switching time of learning models and networks in deep learning

Info

Publication number
JP2024040614A
Authority
JP
Japan
Prior art keywords
inference
switching
trained
networks
computer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2022145070A
Other languages
Japanese (ja)
Inventor
上野由紀
Original Assignee
株式会社ピノー
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社ピノー filed Critical 株式会社ピノー
Priority to JP2022145070A priority Critical patent/JP2024040614A/en
Publication of JP2024040614A publication Critical patent/JP2024040614A/en
Pending legal-status Critical Current

Landscapes

  • Stored Programmes (AREA)

Abstract

[Problem] To provide a method that shortens service time for users while limiting growth in the number of computers, by switching between trained models and inference networks with the desired characteristics on a single computer without delay. [Solution] The method loads multiple pairs of inference networks and trained data, which differ per algorithm, into the CPU/GPU memory of a single computer, and performs inference on a single arithmetic unit by switching between the inference networks and trained data. [Selected figure] Figure 2

Description

The present invention is a software technique for switching between trained models and inference networks in deep learning (hereinafter abbreviated as DL) without loss of time.

In DL-based inference on images, audio, and text, a trained model produces outputs that reflect the characteristics of that model. If the training data differs, the characteristics of the trained model also differ. To obtain the characteristics needed at inference time, a trained model with the desired characteristics is selected and run on a computer together with an inference network that can process it. When the trained model or inference network is changed in order to change the desired characteristics, one must either reload the trained model or inference network onto the computer, or prepare many computers, each holding a single trained model and inference network. In the former case, a switching delay arises every time the desired characteristics change; in the latter case, the number of computers must grow as the number of trained models grows.

In this invention, we developed a method that shortens service time for users while limiting growth in the number of computers, by switching between trained models and inference networks with the desired characteristics on a single computer without delay.

The size of a trained model varies greatly with the size of the training data and the algorithm, and the inference network is larger still, so switching incurs a large delay. Figure 1 shows the processing flow of the conventional methods. With the reload method, switching is easily achieved by reloading the trained model or inference network onto the computer, but a delay results. The conventional alternative for reducing this delay is to prepare one computer per trained model or inference network and switch between computers; no reload occurs and switching is delay-free, but a computer is needed for every model and network, so computing resources balloon. This invention takes note of the rapid growth in computing power and the large expansion of resources per machine: the necessary trained models and inference networks are read into the expanded memory resources of a single computer in advance and classified, and the processing software designates which to use, so that trained models and inference networks are switched without any reload (Figure 2).
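The contrast between the two flows can be made concrete with a short sketch. The following is an illustrative assumption in PyTorch, not code from the application: the conventional path reloads weights from storage on every switch, while the proposed path pays the load cost once and thereafter switches by an in-memory lookup.

```python
# Illustrative sketch (PyTorch assumed): reload-per-switch vs. preload-once.
# Network classes and weight paths are hypothetical placeholders.
import torch

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Conventional flow (Figure 1): every switch reloads weights from storage.
def infer_with_reload(net_cls, weight_path, x):
    net = net_cls().to(DEVICE).eval()
    net.load_state_dict(torch.load(weight_path, map_location=DEVICE))
    with torch.no_grad():
        return net(x)

# Proposed flow (Figure 2): load every pair once, then switch in memory.
def preload_pairs(specs):
    """specs: {name: (network_class, weight_path)} -> {name: ready network}"""
    pairs = {}
    for name, (net_cls, weight_path) in specs.items():
        net = net_cls().to(DEVICE).eval()
        net.load_state_dict(torch.load(weight_path, map_location=DEVICE))
        pairs[name] = net
    return pairs

def infer_preloaded(pairs, name, x):
    with torch.no_grad():
        return pairs[name](x)  # selecting a pair is a dict lookup, no reload
```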

By using this invention, the delay in switching trained models and inference networks to obtain inference data with the required characteristics is reduced, and multiple trained models and inference networks can be switched without delay on limited computing resources, without resorting to a large computer cluster. This contributes greatly to improved operability, shorter user waiting times when providing services, and shorter processing times in industrial computer systems that use deep learning models.

FIG. 1 is an explanatory diagram of the conventional methods of using a trained model and an inference network.
FIG. 2 is an explanatory diagram of the improved method of using a trained model and an inference network according to the present invention. (Example 1)
FIG. 3 is an explanatory diagram of a usage in which only the trained model is switched, within the trained model and inference network improved by the present invention. (Example 2)

The invention consists of program code as a means of reading trained models and a means of classifying them; computer resources capable of holding all the necessary trained models, namely CPU main memory or GPU memory; and software (a selector) for switching between trained models and inference networks.

FIG. 2 (Example 1) shows a concrete data-processing flow for using the present invention. A trained model and its inference network form a pair, and inference runs when both are present on the computer. Furthermore, the inference network differs per learning algorithm, and a trained model exists for each inference network. In the present invention, a single computer is given large computing resources, multiple (trained model, inference network) combinations are read in in advance, and the arithmetic unit uses a selector to choose among the trained models and inference networks within the computer, thereby processing multiple inference networks and trained models.
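As a hypothetical usage of the sketch above (the two network classes, weight files, and input tensors are assumptions for illustration), the selector reduces to the key used to pick a preloaded pair on the single computer:

```python
# Hypothetical usage: two algorithm-specific (network, weights) pairs,
# selected per request on one computer with no reloading.
pairs = preload_pairs({
    "face":        (FaceNetArch,        "face_weights.pt"),
    "fingerprint": (FingerprintNetArch, "fingerprint_weights.pt"),
})

face_result = infer_preloaded(pairs, "face", face_image_tensor)
fingerprint_result = infer_preloaded(pairs, "fingerprint", fingerprint_tensor)
```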

FIG. 3 (Example 2): trained models based on a specific algorithm can be generated in unlimited numbers, one per set of training data. Since there is only one inference network per algorithm, a single inference network and multiple sets of trained data are read in and classified, and the arithmetic unit uses a selector to switch between trained models for inference. Trained models can thus be switched for inference without using multiple computers. In this example there is no waste in the operation of the computer's arithmetic unit, and no switching delay occurs.
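A corresponding sketch for this embodiment, under the same PyTorch assumption and with illustrative names, keeps one network instance and preloads only the weight sets, so a switch copies an already-resident state dict instead of touching storage:

```python
# Example 2 sketch: one inference network, many preloaded weight sets.
import torch

def preload_weight_sets(weight_paths, device):
    """weight_paths: {name: path} -> {name: state_dict resident in memory}"""
    return {name: torch.load(path, map_location=device)
            for name, path in weight_paths.items()}

def infer_with_weights(net, weight_sets, name, x):
    net.load_state_dict(weight_sets[name])  # in-memory copy, no disk access
    net.eval()
    with torch.no_grad():
        return net(x)
```

Note that load_state_dict still copies tensors within memory, so in this sketch the per-switch cost is a memory copy rather than disk I/O.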

Conventionally, a comparatively large system was needed to handle multiple inference networks and multiple trained models; with this invention, many complex trained models can be handled on a small computer system, broadening the applications of deep learning. For example, in a face- and fingerprint-authentication processing system, face and fingerprint authentication become possible at the same time without adding computers.

1 Reload-based method
2 Computer switching
3 Trained model
4 Inference network
5 Selector
6 CPU/GPU
7 Main memory
8 GPU memory

Claims (2)

1. Loading, for pairs of an inference network and trained data that differ per algorithm, multiple such pairs into the CPU/GPU memory of a single computer, and performing inference on a single arithmetic unit by switching between the inference networks and trained data.
2. Loading a single inference network and multiple sets of trained data into the CPU/GPU memory of a single computer, and performing inference on a single arithmetic unit by switching between the sets of trained data.
JP2022145070A 2022-09-13 2022-09-13 Technology to reduce switching time of learning models and networks in deep learning Pending JP2024040614A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022145070A JP2024040614A (en) 2022-09-13 2022-09-13 Technology to reduce switching time of learning models and networks in deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2022145070A JP2024040614A (en) 2022-09-13 2022-09-13 Technology to reduce switching time of learning models and networks in deep learning

Publications (1)

Publication Number Publication Date
JP2024040614A true JP2024040614A (en) 2024-03-26

Family

ID=90369131

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2022145070A Pending JP2024040614A (en) 2022-09-13 2022-09-13 Technology to reduce switching time of learning models and networks in deep learning

Country Status (1)

Country Link
JP (1) JP2024040614A (en)
