WO2024135867A1 - Procédé d'apprentissage à transfert efficace pour réseau d'apprentissage profond de petite échelle - Google Patents

Procédé d'apprentissage à transfert efficace pour réseau d'apprentissage profond de petite échelle

Info

Publication number
WO2024135867A1
Authority
WO
WIPO (PCT)
Prior art keywords
deep learning
learning network
network
layers
weights
Prior art date
Application number
PCT/KR2022/020680
Other languages
English (en)
Korean (ko)
Inventor
이상설
이은총
김경호
Original Assignee
한국전자기술연구원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 한국전자기술연구원
Priority claimed from KR1020220177741A (published as KR20240096951A)
Publication of WO2024135867A1

Definitions

  • the present invention relates to deep learning network training and, more specifically, to a method for efficiently performing transfer learning on a small-scale deep learning network using a pre-trained deep learning network.
  • Transfer learning is a method used in a variety of applications based on CNNs (Convolutional Neural Networks). Transfer learning allows a CNN to be trained even with limited datasets.
  • the present invention was devised to solve the above problems, and its purpose is to provide an efficient transfer learning method for small-scale deep learning networks in limited-resource environments, such as mobile edge computing, where only a small number of training datasets can be used.
  • a deep learning network training method for achieving the above object includes a step of transferring weights of a pre-trained deep learning network to some of the layers constituting the deep learning network, and a step of fine-tuning the deep learning network to which the weights have been transferred.
  • in the transfer step, the pre-trained weights may be transferred only for some channels of some of the layers constituting the deep learning network.
  • in the transfer step, the weights of those layers may be set to 0 for the remaining channels.
  • these layers may include the first and second layers of the deep learning network.
  • the deep learning network training method according to the present invention may further include randomly initializing weights for the remaining layers among the layers constituting the deep learning network.
  • the remaining layers may include layers from the third layer to the final layer of the deep learning network.
  • the transferred weights for some channels in some layers may be fixed.
  • the deep learning network may be a CNN (Convolutional Neural Network) that is smaller than the pre-trained deep learning network.
  • the learning dataset of the deep learning network may be smaller than the learning dataset of the pre-trained deep learning network.
  • according to an embodiment of the present invention, a deep learning network computing device is provided that includes a deep learning operator, which transfers the weights of a pre-trained deep learning network to some of the layers constituting the deep learning network and fine-tunes the deep learning network to which the weights have been transferred, and a memory that provides the storage space required by the deep learning operator.
  • according to another embodiment of the present invention, a deep learning network training method is provided that includes a step of randomly initializing weights for the remaining layers among the layers constituting the deep learning network.
  • according to another embodiment of the present invention, a deep learning network computing device is provided that includes a deep learning operator, which transfers the weights of the pre-trained deep learning network to some of the layers constituting the deep learning network and randomly initializes the weights of the remaining layers, and a memory that provides the storage space required by the deep learning operator.
  • according to embodiments of the present invention, efficient transfer learning for a small-scale deep learning network is achieved in limited-resource environments such as mobile edge computing, where only a small number of training datasets can be used, so that accuracy can be improved and training time can be reduced.
  • weight initialization is an important factor in learning that determines the accuracy and learning speed of the neural network.
  • a poor approach to weight initialization can cause output activations and gradients to explode or vanish.
  • transfer learning is a very simple and efficient method of learning another dataset through a pre-trained network.
  • Transfer learning allows you to achieve high accuracy in a short training time with a limited dataset compared to randomly initialized models.
  • Most CNN (Convolutional Neural Network)-based applications use transfer learning because of its simplicity and strong performance.
  • transfer learning generally assumes a CNN model that is large enough to learn. Therefore, small-scale CNNs with limited resources and limited datasets require a different approach from large-scale model-based transfer learning.
  • An embodiment of the present invention presents a transfer learning method for limited-resource settings such as mobile edge computing, where only a small number of images can be used with a small-scale CNN.
  • the effect of transferring each layer is quantified through experiments on multiple datasets, and the relative magnitude of the output channels of the pre-trained model is calculated.
  • the experimental results suggest that a model that activates only a few important channels in the front layers can match the performance of existing methods.
  • embodiments of the present invention present several models applying a hybrid selection between pre-trained weights and randomly initialized weights, which showed better performance than existing methods.
  • Transfer learning is based on the concept that a deep learning network learns a feature representation at each layer. The characteristics expressed differ with layer depth: lower layers represent general features, while upper layers represent specific features.
  • the proposed fine-tuning strategies use a) a large-scale different dataset, b) a large-scale similar dataset, c) a small-scale (limited) different dataset, and d) a small-scale similar dataset.
  • the third strategy, which is advantageous for learning from a limited, small dataset, is used because it showed the best performance in the experiments.
  • the first group includes the first and second layers, showing nearly 90% channel similarity.
  • the second group includes layers from the third layer to the final layer.
  • the experimental model was constructed so that the initial model could be applied differently for each stage.
  • CIFAR-100 was used for the pre-trained model. Seven multiple datasets were used for learning, including Oxford Flower, LFW, Stanford Dogs, CUB, Food-101, FGVC-Aircraft, and CIFAR-10.
  • in Model-1, the first group of layers fixed the transferred weights during training to prevent them from being updated.
  • Figure 5 shows the average accuracy and training time (epochs to achieve the highest accuracy) of eight models trained on seven multiple datasets.
  • Model-1, which used randomly initialized weights, showed significantly lower performance compared to the other models. Through experiments with various initial weights for the hybrid models, the following meaningful results were derived.
  • the hybrid models that loaded pre-trained weights for only the top 20% of channels in the first group (Model-5, Model-7, and Model-8) showed relatively short training times.
  • Model-8, which combines both advantages, showed the best performance.
  • Model-8 achieved the highest average accuracy of 79.97% at the shortest average epoch.
  • the hybrid model showed an increase in accuracy of 12.33% with a learning speed 1.6 times faster than Model-1, the existing randomly initialized model. Additionally, it showed an accuracy increase of 1.2% with a learning speed that was 1.1 times faster than Model-2, the existing transfer learning model.
  • Figure 6 is a diagram showing the configuration of a small-scale deep learning network computing device according to an embodiment of the present invention.
  • the lightweight deep learning computing device includes a communication interface 110, a deep learning operator 120, and a memory 130.
  • the communication interface 110 communicates with an external host system and receives datasets, parameters (weights, biases) of a pre-trained deep learning network, and the like.
  • the deep learning operator 120 performs training of the installed deep learning network and performs inference using the trained deep learning network.
  • the memory 130 provides the storage space necessary for the deep learning operator 120 to operate and function.
  • the deep learning operator 120 transfers the weights of the pre-trained deep learning network only to the first group of layers among the layers constituting the deep learning network.
  • the first group of layers includes the first layer and the second layer.
  • the deep learning operator 120 transfers the pre-trained weights only for the important channels of the first-group layers and zero-pads the remaining channels, setting their weights to 0. Important channels can be selected by calculating the average magnitude of each channel (see the sketches after this list).
  • the deep learning operator 120 randomly initializes weights for the layers of the second group.
  • the second group of layers includes layers from the third layer to the final layer.
  • the deep learning operator 120 fine-tunes the deep learning network that has been transfer-learned by the above method.
  • the transferred weights are fixed during the fine-tuning process (a fine-tuning sketch also follows this list).
  • a computer-readable recording medium can be any data storage device that can be read by a computer and store data.
  • computer-readable recording media can be ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, hard disk drive, etc.
  • computer-readable codes or programs stored on a computer-readable recording medium may be transmitted through a network connected between computers.
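As a concrete illustration of the channel-selection step referenced above, the following is a minimal sketch assuming a PyTorch implementation (the publication does not specify a framework). It ranks the output channels of a pre-trained convolution layer by the mean absolute magnitude of their weights and keeps the top fraction; the name `pretrained_conv` and the 20% ratio (taken from the Model-5/7/8 description) are illustrative.

```python
import torch
import torch.nn as nn

def select_important_channels(conv: nn.Conv2d, keep_ratio: float = 0.2) -> torch.Tensor:
    """Return indices of the top `keep_ratio` output channels by mean |weight|."""
    # conv.weight has shape (out_channels, in_channels, kH, kW)
    channel_scores = conv.weight.detach().abs().mean(dim=(1, 2, 3))
    num_keep = max(1, int(round(keep_ratio * channel_scores.numel())))
    return torch.topk(channel_scores, num_keep).indices

# hypothetical stand-in for one layer of the pre-trained network
pretrained_conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
print(select_important_channels(pretrained_conv, keep_ratio=0.2))
```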
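The hybrid initialization itself (pre-trained weights only for the selected channels of the first-group layers, zeros for the remaining channels, random initialization elsewhere) could then look like the sketch below. This is an assumed PyTorch implementation: `small_net`, `pretrained_net`, and `first_group` are hypothetical names, and the first-group layers of both networks are assumed to have matching shapes.

```python
import torch
import torch.nn as nn

def hybrid_transfer(small_net: nn.Module, pretrained_net: nn.Module,
                    first_group: list, keep_ratio: float = 0.2) -> dict:
    """Copy the top `keep_ratio` channels of each first-group layer; zero the rest.

    Layers outside `first_group` keep their (random) default initialization.
    Returns {layer name: indices of the transferred channels} for later freezing.
    """
    transferred = {}
    small_layers = dict(small_net.named_modules())
    pre_layers = dict(pretrained_net.named_modules())
    with torch.no_grad():
        for name in first_group:
            src, dst = pre_layers[name], small_layers[name]
            scores = src.weight.abs().mean(dim=(1, 2, 3))
            num_keep = max(1, int(round(keep_ratio * scores.numel())))
            idx = torch.topk(scores, num_keep).indices
            dst.weight.zero_()                   # remaining channels are set to 0
            dst.weight[idx] = src.weight[idx]    # transfer only the important channels
            if dst.bias is not None and src.bias is not None:
                dst.bias.zero_()
                dst.bias[idx] = src.bias[idx]
            transferred[name] = idx
    return transferred

# hypothetical usage: first_group names the first two conv layers of both networks
# transferred = hybrid_transfer(small_net, pretrained_net, ["conv1", "conv2"], keep_ratio=0.2)
```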
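Finally, because only some channels of the first-group layers are transferred, freezing them during fine-tuning cannot be done by setting `requires_grad=False` on the whole parameter; one way, sketched below under the same PyTorch assumption, is to mask the gradients of the transferred channels with a hook. The names `transferred` and `train_loader` are hypothetical and would come from the previous sketch and the user's data pipeline.

```python
import torch
import torch.nn as nn

def freeze_transferred_channels(net: nn.Module, transferred: dict) -> None:
    """Zero the gradients of the transferred output channels so they stay fixed."""
    layers = dict(net.named_modules())
    for name, idx in transferred.items():
        conv = layers[name]

        def make_hook(frozen_idx):
            def hook(grad):
                grad = grad.clone()
                grad[frozen_idx] = 0.0   # no update for the transferred channels
                return grad
            return hook

        conv.weight.register_hook(make_hook(idx))
        if conv.bias is not None:
            conv.bias.register_hook(make_hook(idx))

def fine_tune(net: nn.Module, train_loader, epochs: int = 10, lr: float = 1e-3,
              device: str = "cpu") -> None:
    """Plain fine-tuning loop; the frozen channels receive zero gradient via the hooks."""
    net.to(device).train()
    optimizer = torch.optim.SGD(net.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss_fn(net(images), labels).backward()
            optimizer.step()
```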

Landscapes

  • Image Analysis (AREA)

Abstract

An efficient transfer learning method for a small-scale deep learning network is provided. The deep learning network training method according to an embodiment of the present invention comprises: transferring weights of a pre-trained deep learning network to some of the layers constituting the deep learning network; and fine-tuning the deep learning network to which the weights have been transferred. Accordingly, in a resource-limited environment such as mobile edge computing, which has no choice but to use a small number of training datasets with a small-scale deep learning network, accuracy can be improved and training time can be reduced through an efficient transfer learning method for a small-scale deep learning network.
PCT/KR2022/020680 2022-12-19 2022-12-19 Procédé d'apprentissage à transfert efficace pour réseau d'apprentissage profond de petite échelle WO2024135867A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020220177741A KR20240096951A (ko) 2022-12-19 소규모 딥러닝 네트워크를 위한 효율적인 전이 학습 방법
KR10-2022-0177741 2022-12-19

Publications (1)

Publication Number Publication Date
WO2024135867A1 true WO2024135867A1 (fr) 2024-06-27

Family

ID=91589094

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/020680 WO2024135867A1 (fr) 2022-12-19 2022-12-19 Procédé d'apprentissage à transfert efficace pour réseau d'apprentissage profond de petite échelle

Country Status (1)

Country Link
WO (1) WO2024135867A1 (fr)

Similar Documents

Publication Publication Date Title
Noroozi et al. Boosting self-supervised learning via knowledge transfer
KR102139740B1 (ko) 전자 장치 및 학습 모델 최적화 방법
WO2024135867A1 (fr) Procédé d'apprentissage à transfert efficace pour réseau d'apprentissage profond de petite échelle
CN111309921A (zh) 一种文本三元组抽取方法及抽取系统
WO2020153597A1 (fr) Procédé et appareil de génération de modèle de classification à plusieurs niveaux
WO2023033194A1 (fr) Procédé et système de distillation de connaissances spécialisés pour l'éclaircissement de réseau neuronal profond à base d'élagage
WO2023277448A1 (fr) Procédé et système d'entraînement de modèle de réseau neuronal artificiel pour traitement d'image
WO2023136417A1 (fr) Procédé et dispositif de construction d'un modèle de transformateur pour répondre à une question d'histoire vidéo
Madhumani et al. Learning not to discriminate: task agnostic learning for improving monolingual and code-switched speech recognition
WO2021020848A2 (fr) Opérateur matriciel et procédé de calcul matriciel pour réseau de neurones artificiels
WO2022107925A1 (fr) Dispositif de traitement de détection d'objet à apprentissage profond
WO2021107231A1 (fr) Procédé et dispositif de codage de phrases au moyen d'informations de mots hiérarchiques
KR20240096951A (ko) 소규모 딥러닝 네트워크를 위한 효율적인 전이 학습 방법
WO2024090600A1 (fr) Procédé d'entrainement de modèle d'apprentissage profond et appareil de calcul d'apprentissage profond appliqué à celui-ci
CN116090538A (zh) 一种模型权重获取方法以及相关系统
CN113626826A (zh) 智能合约安全检测方法、系统、设备、终端及应用
WO2020213757A1 (fr) Procédé de détermination de similarité de mots
WO2021125431A1 (fr) Procédé et dispositif d'initialisation de modèle d'apprentissage profond par l'intermédiaire d'une égalisation distribuée
WO2023090499A1 (fr) Procédé d'élagage de filtre basé sur l'apprentissage de la rareté pour réseaux neuronaux profonds
WO2022145713A1 (fr) Procédé et système d'allègement de modèle de réseau neuronal artificiel, et support d'enregistrement lisible par ordinateur non transitoire
CN113850078A (zh) 基于机器学习的多意图识别方法、设备及可读存储介质
WO2024135860A1 (fr) Procédé d'élagage de données pour dispositif matériel léger d'apprentissage profond
Sinaga et al. Klasifikasi Data Penduduk Pada Pemilihan Umum Di Kota Binjai Menggunakan Algoritma K-Means (Studi Kasus: KPU Kota Binjai)
WO2022107951A1 (fr) Procédé de formation d'un réseau d'apprentissage profond ultra-léger
WO2024010200A1 (fr) Procédé et dispositif d'inférence de modèle d'ia