WO2022027806A1 - 深度学习模型的参数重用方法、装置、终端及存储介质 - Google Patents

深度学习模型的参数重用方法、装置、终端及存储介质 (Method, device, terminal and storage medium for parameter reuse of a deep learning model)

Info

Publication number
WO2022027806A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
parameters
target model
original
training
Prior art date
Application number
PCT/CN2020/117656
Other languages
English (en)
French (fr)
Inventor
梁栋
朱燕杰
王位
刘新
郑海荣
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院 filed Critical 深圳先进技术研究院
Publication of WO2022027806A1 publication Critical patent/WO2022027806A1/zh
Priority to US18/106,988 priority Critical patent/US20230196120A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0895Weakly supervised learning, e.g. semi-supervised or self-supervised learning

Definitions

  • the present application relates to the technical field of deep learning models, and in particular, to a method, device, terminal and storage medium for reusing parameters of a deep learning model.
  • Transfer learning: a machine learning method that uses the model parameters developed for one task as the starting point for training a second model's parameters.
  • Network-based deep transfer learning refers to reusing part of the pre-trained network in the original domain, including its network structure and parameters, as part of the deep neural network used in the target domain.
  • Semi-supervised learning: an approach that combines supervised and unsupervised learning, using both labeled and unlabeled data. The most common practice in deep learning applications is unsupervised pre-training: train a reconstructing autoencoder network on all of the data, then use the autoencoder's parameters as the initial parameters and fine-tune them with the labeled data.
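  • As a brief illustration of the unsupervised pre-training practice described above, the following is a minimal PyTorch-style sketch; it is not taken from the present application, and the network sizes, layer layout, and training details are illustrative assumptions. An autoencoder is assumed to have been trained by reconstruction on all of the data, and its encoder parameters are then used as the initial parameters of a classifier that is fine-tuned on the labeled data.

```python
import torch.nn as nn

# Tiny autoencoder assumed to have been trained (elsewhere) by reconstruction
# on all data, labeled and unlabeled.
class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, in_dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Classifier that reuses the encoder as its feature extractor.
class Classifier(nn.Module):
    def __init__(self, in_dim=784, hidden=128, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        return self.head(self.encoder(x))

ae = AutoEncoder()    # assume this has already been trained by reconstruction
clf = Classifier()
# Use the autoencoder's encoder parameters as the classifier's initial
# parameters, then fine-tune the whole classifier on the labeled data.
clf.encoder.load_state_dict(ae.encoder.state_dict())
```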
  • the present application provides a parameter reuse method, device, terminal and storage medium for a deep learning model, so as to address the problem that existing parameter reuse approaches cannot avoid selecting the parameters to be reused blindly.
  • a technical solution adopted in this application is to provide a method for reusing parameters of a deep learning model, including: training a target model on a pre-configured data set, the data set including a training set and a verification set; obtaining a pre-trained original model, where part or all of the network structure of the target model is the same as that of the original model; obtaining the correspondence between the layers with the same network structure in the target model and the original model, and the parameter correspondence of the corresponding layers; extracting multiple original model parameters from the layers with the same network structure in the original model; according to the parameter correspondence, replacing the corresponding parameter in the target model with each original model parameter one by one, verifying the replaced target model on the verification set, and recording that the original model parameter is reusable when the verification passes; and replacing the corresponding parameters in the target model with all reusable original model parameters to obtain a new target model, and then training the new target model.
  • verifying the replaced target model and, when the verification passes, recording that the original model parameters are reusable includes: obtaining a first result from training the target model on the training set; verifying the replaced target model on the verification set and recording a second result of the verification; judging whether the difference between the first result and the second result is within a preset range; and, when the difference between the first result and the second result is within the preset range, determining that the verification passes and recording that the original model parameter is reusable.
  • training a new target model includes: directly using the training set to train the new target model.
  • training a new target model includes: freezing the reusable original model parameters in the new target model, and then using the training set to train the new target model.
  • before the target model is trained on the pre-configured data set, the method further includes: preprocessing the data set.
  • another technical solution adopted in this application is to provide a parameter reusing device for a deep learning model, including: a training module for training a target model on a pre-configured data set, the data set including a training set and a validation set; a first acquisition module for acquiring a pre-trained original model, where part or all of the network structure of the target model is the same as that of the original model; a second acquisition module for acquiring the correspondence between the layers with the same network structure in the target model and the original model, and the parameter correspondence of the corresponding layers; an extraction module for extracting multiple original model parameters from the layers with the same network structure in the original model; a verification module for replacing, according to the parameter correspondence, the corresponding parameter in the target model with each original model parameter one by one, verifying the replaced target model on the validation set, and recording that the original model parameter is reusable when the verification passes; and a migration module for replacing the corresponding parameters in the target model with all reusable original model parameters to obtain a new target model and then training the new target model.
  • another technical solution adopted in this application is to provide a method for reusing parameters of a deep learning model, including: training a target model on a pre-configured first data set, and training an original model on a pre-configured second data set, where part or all of the network structure of the target model is the same as that of the original model, and the first data set includes a first training set and a first verification set; obtaining the correspondence between the layers with the same network structure in the target model and the original model, and the parameter correspondence of the corresponding layers; extracting multiple original model parameters from the layers with the same network structure in the original model; according to the parameter correspondence, replacing the corresponding parameter in the target model with each original model parameter one by one, verifying the replaced target model on the first verification set, and recording that the original model parameter is reusable when the verification passes; and replacing the corresponding parameters in the target model with all reusable original model parameters to obtain a new target model, and then training the new target model.
  • a further technical solution adopted in this application is to provide a parameter reusing device for a deep learning model, including: a training module for training a target model on a pre-configured first data set and training an original model on a pre-configured second data set, where part or all of the network structure of the target model is the same as that of the original model, and the first data set includes a first training set and a first verification set; an acquisition module for acquiring the correspondence between the layers with the same network structure in the target model and the original model, and the parameter correspondence of the corresponding layers; an extraction module for extracting multiple original model parameters from the layers with the same network structure in the original model; a verification module for replacing, according to the parameter correspondence, the corresponding parameter in the target model with each original model parameter one by one, verifying the replaced target model on the first verification set, and recording that the original model parameter is reusable when the verification passes; and a migration module for replacing the corresponding parameters in the target model with all reusable original model parameters to obtain a new target model and then training the new target model.
  • a further technical solution adopted in this application is to provide a terminal, where the terminal includes a processor and a memory coupled to the processor; the memory stores program instructions for implementing the above-mentioned method for reusing parameters of a deep learning model, and the processor is configured to execute the program instructions stored in the memory to achieve parameter reuse between different deep learning models.
  • another technical solution adopted in the present application is to provide a storage medium storing a program file capable of implementing the above-mentioned method for reusing parameters of a deep learning model.
  • the beneficial effect of the present application is as follows: the method for reusing parameters of a deep learning model first performs initial training on a preset data set to obtain a target model, then obtains a pre-trained original model whose network structure is partly or entirely the same as that of the target model, replaces the parameters of the matching layers of the original model into the target model one by one, and verifies the replaced target model on the verification set; if the verification passes, the parameter is considered reusable from the original model to the target model. Once all parameters have been verified, all reusable parameters are loaded into the target model, which is then trained to obtain a new target model. This makes it possible to obtain a well-performing model through parameter reuse even if the data set for training the target model is insufficient, and verifying the parameters one by one makes the selection of reusable parameters purposeful rather than blind, helping to select the most suitable reusable parameters.
  • FIG. 1 is a schematic flowchart of a method for reusing parameters of a deep learning model according to the first embodiment of the present invention
  • FIG. 2 is a schematic diagram of functional modules of a parameter reusing device for a deep learning model according to the first embodiment of the present invention
  • FIG. 3 is a schematic flowchart of a method for reusing parameters of a deep learning model according to a second embodiment of the present invention
  • FIG. 4 is a schematic diagram of functional modules of a parameter reusing device for a deep learning model according to a second embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a storage medium according to an embodiment of the present invention.
  • the terms “first”, “second” and “third” in this application are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature defined with “first”, “second” or “third” may expressly or implicitly include at least one such feature.
  • "a plurality of” means at least two, such as two, three, etc., unless otherwise expressly and specifically defined. All directional indications (such as up, down, left, right, front, rear%) in the embodiments of the present application are only used to explain the relative positional relationship between components under a certain posture (as shown in the accompanying drawings).
  • FIG. 1 is a schematic flowchart of a method for reusing parameters of a deep learning model according to the first embodiment of the present invention. It should be noted that, if there is substantially the same result, the method of the present invention is not limited to the sequence of the processes shown in FIG. 1 . As shown in Figure 1, the method includes the steps:
  • Step S101 training a target model according to a preconfigured data set, where the data set includes a training set and a verification set.
  • the data set is collected according to the task requirements. For example, when the task is to realize cat and dog image recognition, multiple images of cats and dogs need to be prepared in advance.
  • the data set includes a training set and a validation set, the training set is used for model training, and the validation set is used to validate the model after training.
  • in step S101, after the pre-configured data set is acquired, deep learning training is performed using this data set to obtain the target model.
  • further, in order to ensure the training effect of the model, before the target model is trained on the pre-configured data set, the method further includes: preprocessing the data set.
  • specifically, the preprocessing of the data set includes data normalization, standardization, and the like; if there is too little data, the data set can also be expanded by means such as image rotation and cropping, as sketched below.
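  • For instance, for an image data set such as the cat-and-dog example above, the preprocessing and expansion could be sketched as follows with torchvision; the normalization statistics, crop sizes, and augmentations are assumptions chosen for illustration, not values prescribed by the application.

```python
import torchvision.transforms as T

# Possible preprocessing for an image data set; statistics and augmentations
# are illustrative, not values given in the application.
train_transform = T.Compose([
    T.RandomRotation(15),        # expand a small data set by rotation ...
    T.RandomResizedCrop(224),    # ... and by random cropping
    T.ToTensor(),                # convert to a tensor in [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],   # per-channel normalization
                std=[0.229, 0.224, 0.225]),
])

val_transform = T.Compose([      # the validation set is only normalized
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])
```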
  • Step S102 Obtain a pre-trained original model, and the target model has the same network structure as part or all of the original model.
  • it should be noted that the original model is a model that has already been trained, and the target model and the original model must have partly or entirely the same network structure; otherwise parameter reuse is not possible. For example, a deep learning model usually includes activation function layers, convolutional layers, fully connected layers, pooling layers, BN (Batch Normalization) layers and so on, and some or all of the convolutional layers or BN layers must have the same network structure.
  • Step S103 Obtain the correspondence between the target model and the layers with the same network structure in the original model, and the parameter correspondence of the corresponding layers.
  • in step S103, after the original model is acquired, the layers of the target model whose network structure is the same as that of the original model are identified; these layers are then placed in one-to-one correspondence, and the parameters within the corresponding layers are likewise placed in one-to-one correspondence.
  • Step S104 Extract multiple original model parameters from layers with the same network structure in the original model.
  • in step S104, after the layers with the same network structure have been confirmed, all parameters of those layers are extracted from the original model to obtain a plurality of original model parameters.
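  • A minimal sketch of steps S103 and S104 in PyTorch terms is given below. The two models, their layer sizes, and the key-matching rule are assumptions introduced for illustration: the models share an identically structured convolutional/BN backbone and differ only in their fully connected heads, so only the backbone parameters end up in the correspondence.

```python
import torch.nn as nn

# Hypothetical original and target models sharing a conv + BN backbone.
original_model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1000))  # e.g. 1000-class source task
target_model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2))     # e.g. cat-vs-dog target task

orig_state = original_model.state_dict()
tgt_state = target_model.state_dict()

# Step S103: parameter correspondence between layers with the same structure.
# Here it is the identity mapping over keys that exist in both models with the
# same tensor shape; the differently shaped Linear heads are excluded.
correspondence = {
    key: key                                  # target key -> original key
    for key, tensor in tgt_state.items()
    if key in orig_state and orig_state[key].shape == tensor.shape
}

# Step S104: extract the original model's parameters for those matching layers.
original_params = {key: orig_state[key].clone() for key in correspondence.values()}
```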
  • Step S105 According to the parameter correspondence, each original model parameter is used to replace the corresponding parameters in the target model one by one, and the replaced target model is verified on the verification set, and when the verification is passed, the original model parameters are recorded for reuse.
  • in step S105, after the plurality of original model parameters are extracted from the original model, each original model parameter is used, based on the parameter correspondence, to replace the corresponding parameter in the target model, yielding a replaced target model. Without retraining the replaced target model, the verification set is used directly to verify it: when the verification passes, the original model parameter is considered reusable in the target model; when the verification fails, the original model parameter is considered not reusable in the target model.
  • the above steps are performed cyclically until every original model parameter has been verified, finally yielding all original model parameters that can be reused in the target model. It should be noted that after each original model parameter is verified, the target model is restored to its original state before the next parameter is verified, so that only one variable in the target model changes at a time and each parameter's reusability can be verified effectively.
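  • The replace-verify-restore loop of step S105 could be sketched as follows. This is an illustration rather than the application's reference implementation, and the helper names (evaluate, val_loader, correspondence, original_params, first_result, preset_range) are assumptions carried over from or introduced alongside the previous sketch.

```python
import copy

def select_reusable_params(target_model, correspondence, original_params,
                           evaluate, val_loader, first_result, preset_range):
    """Try each original model parameter in the target model one at a time,
    verify on the validation set without retraining, and record the reusable
    ones. Returns a list of (target key, original key) pairs."""
    reusable = []
    baseline = copy.deepcopy(target_model.state_dict())    # state to restore after each trial
    for tgt_key, orig_key in correspondence.items():
        trial_state = copy.deepcopy(baseline)
        trial_state[tgt_key] = original_params[orig_key]    # replace this one parameter only
        target_model.load_state_dict(trial_state)           # no retraining here
        second_result = evaluate(target_model, val_loader)  # verify on the validation set
        if abs(first_result - second_result) <= preset_range:
            reusable.append((tgt_key, orig_key))            # verification passed: record it
        target_model.load_state_dict(baseline)              # restore before the next trial
    return reusable
```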
  • for example, the BN layer parameters running mean, running variance, weight, and bias are denoted RM, RV, RW, and RB, respectively, and the convolutional layer parameters weight and bias are denoted W and B, respectively. Suppose the target model and the original model contain a BN layer and a convolutional layer with the same network structure, and the original model's BN layer parameters RM1, RV1, RW1, RB1 and convolutional layer parameters W1, B1 are extracted. First, for the BN layer, the BN layer of the target model corresponding to the BN layer of the original model is located in the target model, and the target model parameters RM2, RV2, RW2, RB2 are found in it. RM1 is then used to replace RM2; without retraining the replaced target model, the replaced target model is verified directly on the verification set, and when the verification passes, RM1 is recorded as reusable. The replaced target model is then restored to its original state, RV1 is used to replace RV2, and the verification is performed again, until all four parameters RM1, RV1, RW1, and RB1 have been verified. Then, for the convolutional layer, the convolutional layer of the target model corresponding to the convolutional layer of the original model is located in the target model, and the target model parameters W2, B2 are found in it. W1 is used to replace W2; without retraining the replaced target model, the replaced target model is verified directly on the verification set, and when the verification passes, W1 is recorded as reusable. The replaced target model is then restored to its original state, B1 is used to replace B2, and the verification is performed again, until both parameters W1 and B1 have been verified. In this way, the parameters are verified layer by layer and one by one, which helps to select the most suitable reusable parameters.
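  • In PyTorch terms, a single trial from this walk-through might look like the following micro-example, where RM corresponds to the BN layer's running_mean buffer; the model and layer shapes are hypothetical, and the verification itself is only indicated by a comment.

```python
import copy
import torch
import torch.nn as nn

# Hypothetical target model and original BN layer with the same structure.
original_bn = nn.BatchNorm2d(16)   # stands in for the original model's BN layer
target_model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16))

backup = copy.deepcopy(target_model.state_dict())   # keep the target model's original state
with torch.no_grad():
    # RM1 replaces RM2: only the running mean of the target BN layer changes.
    target_model[1].running_mean.copy_(original_bn.running_mean)

# ... verify the replaced target model on the verification set here, without
# retraining, and record RM1 as reusable if the verification passes ...

target_model.load_state_dict(backup)   # restore the target model before trying RV1
```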
  • specifically, verifying the replaced target model and recording that the original model parameters are reusable when the verification passes includes the following steps. First, the first result obtained by training the target model on the training set is acquired: after the data set is obtained, it is divided into a training set and a verification set, the training set is used to train the target model, and the first result obtained during training is recorded. Second, the replaced target model is verified on the verification set, and the second result of the verification is recorded. Third, it is judged whether the difference between the first result and the second result is within a preset range. Fourth, when the difference between the first result and the second result is within the preset range, the verification passes and the original model parameter is recorded as reusable; when the difference is not within the preset range, the verification fails and the original model parameter is not reusable.
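  • The acceptance rule itself reduces to a small check; the tolerance value used below is an assumption, since the application only requires the difference between the two results to fall within a preset range.

```python
def verification_passes(first_result, second_result, preset_range):
    """True when the replaced target model's validation result stays within
    the preset range of the result recorded while training the target model."""
    return abs(first_result - second_result) <= preset_range

# e.g. training result 0.91, validation result after replacement 0.89,
# assumed tolerance 0.03 -> this parameter is recorded as reusable.
assert verification_passes(0.91, 0.89, 0.03)
```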
  • Step S106 Use all reusable original model parameters to replace the corresponding parameters in the target model, and then train the new target model after obtaining a new target model.
  • in step S106, after the reusable original model parameters have been obtained through verification, all of them are used to replace the corresponding parameters in the target model, yielding a new target model, and the new target model is then trained using the data set.
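  • A sketch of step S106 under the same assumptions as the earlier examples: every recorded reusable original parameter is loaded into the target model at once to form the new target model, which would then be trained on the data set (the train routine in the comment is hypothetical).

```python
import copy

def build_new_target_model(target_model, reusable, original_params):
    """Load all recorded reusable original parameters into the target model at
    once and return it as the new target model to be trained."""
    new_state = copy.deepcopy(target_model.state_dict())
    for tgt_key, orig_key in reusable:            # all reusable parameters together
        new_state[tgt_key] = original_params[orig_key]
    target_model.load_state_dict(new_state)
    return target_model

# new_target_model = build_new_target_model(target_model, reusable, original_params)
# train(new_target_model, train_loader)   # hypothetical training routine on the data set
```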
  • the step of training the new target model includes: directly using the training set to train the new target model.
  • specifically, when the new target model is trained with the data set, the parameters reused in the new target model can be fine-tuned, so that the training effect of the new target model is better.
  • the step of training the new target model may further include: freezing reusable original model parameters in the new target model, and then using the training set to train the new target model.
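  • If the freezing variant is used, the reused parameters can be marked as non-trainable before fine-tuning, as in the sketch below; the reusable list is the assumed output of the earlier selection sketch, and buffers such as BN running statistics have no gradients and are simply left as loaded.

```python
def freeze_reused_parameters(new_target_model, reusable):
    """Mark the reused parameters as non-trainable so that only the remaining
    parameters are updated during training. Buffers (e.g. BN running stats)
    carry no gradients and are simply left as loaded."""
    reused_keys = {tgt_key for tgt_key, _ in reusable}
    for name, param in new_target_model.named_parameters():
        if name in reused_keys:
            param.requires_grad = False            # frozen: kept exactly as reused
    return new_target_model

# Only the unfrozen parameters are handed to the optimizer, e.g.:
# optimizer = torch.optim.SGD(
#     (p for p in new_target_model.parameters() if p.requires_grad), lr=1e-3)
```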
  • it should be understood that although this embodiment only describes parameter reuse between two models, the same approach applies to parameter reuse among multiple models, based on the same principle. The method for reusing parameters of a deep learning model according to the first embodiment of the present invention performs initial training on a preset data set to obtain a target model, then obtains a pre-trained original model whose network structure is partly or entirely the same as that of the target model, replaces the parameters of the layers of the original model with the same network structure as the target model into the target model one by one, and verifies the replaced target model on the validation set; when the verification passes, the parameter is considered reusable from the original model to the target model. Once all parameters have been verified, all reusable parameters are loaded into the target model, which is then trained to obtain a new target model. Thus, even if the amount of data for training the target model is insufficient, a well-performing model can still be obtained through parameter reuse, and screening reusable parameters by verifying them one by one makes the selection of reusable parameters more purposeful, helping to select the most suitable reusable parameters and avoiding blindly chosen reusable parameters.
  • FIG. 2 is a schematic diagram of functional modules of a parameter reusing device for a deep learning model according to the first embodiment of the present invention.
  • the apparatus 20 includes a training module 21 , a first acquisition module 22 , a second acquisition module 23 , an extraction module 24 , a verification module 25 and a migration module 26 .
  • the training module 21 is used for training a target model according to a preconfigured data set, and the data set includes a training set and a verification set.
  • the first obtaining module 22 is used to obtain a pre-trained original model, and the target model has the same network structure as part or all of the original model.
  • the second obtaining module 23 is configured to obtain the corresponding relationship between the target model and the layer with the same network structure in the original model, and the parameter corresponding relationship of the corresponding layer.
  • the extraction module 24 is used for extracting a plurality of original model parameters from layers with the same network structure in the original model.
  • the verification module 25 is used to replace the corresponding parameters in the target model with each original model parameter one by one according to the parameter correspondence, and verify the replaced target model on the verification set, and when the verification passes, record the original model parameters for reuse .
  • the migration module 26 is configured to replace the corresponding parameters in the target model with all reusable original model parameters, and then train the new target model after obtaining a new target model.
  • the training module 21 is further configured to preprocess the data set before training to obtain the target model according to the preconfigured data set.
  • optionally, the operation in which the verification module 25 verifies the replaced target model and, when the verification passes, records that the original model parameters are reusable may also be: obtaining the first result from training the target model on the training set; verifying the replaced target model on the verification set and recording the second result of the verification; judging whether the difference between the first result and the second result is within the preset range; and, when the difference is within the preset range, determining that the verification passes and recording that the original model parameter is reusable.
  • the operation of the migration module 26 to train the new target model may be to directly use the training set to train the new target model.
  • the operation of the migration module 26 to train the new target model may also be to freeze the reusable original model parameters in the new target model, and then use the training set to train the new target model.
  • FIG. 3 is a schematic flowchart of a method for reusing parameters of a deep learning model according to a second embodiment of the present invention. It should be noted that, if there is substantially the same result, the method of the present invention is not limited to the flow sequence shown in FIG. 3 . As shown in Figure 3, the method includes the steps:
  • Step S301 The target model is trained on the pre-configured first data set, and the original model is trained on the pre-configured second data set; part or all of the network structure of the target model and the original model is the same, and the first data set includes a first training set and a first verification set.
  • in step S301, the first data set and the second data set may be exactly the same data set; when they are the same, the target model and the original model may be two models for different tasks on the same data set. Alternatively, the first data set and the second data set may be two different data sets, in which case the target model and the original model may be two models, built from different data sets, for the same task or for different tasks.
  • the data volume of the first data set is smaller than the data volume of the second data set.
  • in this embodiment, when the first data set contains too little data to train a well-performing model, while the second data set contains enough data to train one, the original model can first be trained on the second data set, and the parameters of the original model can then be reused in the target model trained on the first data set, thereby improving the training effect of the target model.
  • Step S302 Obtain the correspondence between the target model and the layers with the same network structure in the original model, and the parameter correspondence of the corresponding layers.
  • step S302 in FIG. 3 is similar to step S103 in FIG. 1, and for the sake of brevity, details are not repeated here.
  • Step S303 Extract multiple original model parameters from layers with the same network structure in the original model.
  • step S303 in FIG. 3 is similar to step S104 in FIG. 1 , and for the sake of brevity, details are not repeated here.
  • Step S304 According to the parameter correspondence, each original model parameter is used to replace the corresponding parameters in the target model one by one, and the replaced target model is verified on the first verification set, and when the verification is passed, the original model parameters are recorded for reuse.
  • step S304 in FIG. 3 is similar to step S105 in FIG. 1 , and for the sake of brevity, details are not repeated here.
  • Step S305 Use all reusable original model parameters to replace the corresponding parameters in the target model, and then train the new target model after obtaining a new target model.
  • step S305 in FIG. 3 is similar to step S106 in FIG. 1 , and for the sake of brevity, details are not repeated here.
  • building on the first embodiment, the method for reusing parameters of a deep learning model according to the second embodiment of the present invention allows, when there is no already-trained model available for parameter reuse, a similar data set with a larger amount of data to be selected and used to train a model that can provide reusable parameters; parameters are then reused between the models, thereby avoiding the problem that a well-performing model is difficult to train because of an insufficient amount of data.
  • FIG. 4 is a schematic diagram of functional modules of a parameter reusing device for a deep learning model according to a second embodiment of the present invention.
  • the apparatus 40 includes a training module 41 , an acquisition module 42 , an extraction module 43 , a verification module 44 and a migration module 45 .
  • the training module 41 is used to train the target model on the pre-configured first data set and to train the original model on the pre-configured second data set; part or all of the network structure of the target model and the original model is the same, the data volume of the first data set is smaller than that of the second data set, and the first data set includes a first training set and a first validation set.
  • the obtaining module 42 is configured to obtain the corresponding relationship between the target model and the layer with the same network structure in the original model, and the parameter corresponding relationship of the corresponding layer.
  • the extraction module 43 is used for extracting a plurality of original model parameters from layers with the same network structure in the original model.
  • the verification module 44 is used to replace the corresponding parameter in the target model with each original model parameter one by one according to the parameter correspondence, and to verify the replaced target model on the first verification set; when the verification passes, it records that the original model parameter is reusable.
  • the migration module 45 is configured to replace the corresponding parameters in the target model with all reusable original model parameters, and then train the new target model after obtaining a new target model.
  • FIG. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • the terminal 60 includes a processor 61 and a memory 62 coupled to the processor 61 .
  • the memory 62 stores program instructions for implementing the parameter reuse method of the deep learning model described in any of the above embodiments.
  • the processor 61 is configured to execute program instructions stored in the memory 62 to realize parameter reuse between different deep learning models.
  • the processor 61 may also be referred to as a CPU (Central Processing Unit).
  • the processor 61 may be an integrated circuit chip with signal processing capability.
  • the processor 61 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • FIG. 6 is a schematic structural diagram of a storage medium according to an embodiment of the present invention.
  • the storage medium of the embodiment of the present invention stores a program file 71 capable of implementing all of the above methods, wherein the program file 71 may be stored in the above-mentioned storage medium in the form of a software product and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, as well as terminal devices such as computers, servers, mobile phones, and tablets.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • the division of units is only a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

A parameter reuse method for a deep learning model, and a corresponding device, terminal, and storage medium. The method includes: training a target model on a preset training set (S101); obtaining a pre-trained original model, part or all of whose network structure is the same as that of the target model (S102); obtaining the correspondence between the layers with the same network structure in the target model and the original model, and the parameter correspondence of the corresponding layers (S103); extracting multiple original model parameters from the layers with the same network structure in the original model (S104); according to the parameter correspondence, replacing the corresponding parameter in the target model with each original model parameter one by one, verifying the replaced target model on a preset verification set, and recording that the original model parameter is reusable when the verification passes (S105); and replacing the corresponding parameters in the target model with all reusable original model parameters to obtain a new target model, and then training the new target model (S106). In this way, parameter reuse between models is achieved while blind selection of the reused parameters is avoided.

Description

深度学习模型的参数重用方法、装置、终端及存储介质 技术领域
本申请涉及深度学习模型技术领域,特别是涉及一种深度学习模型的参数重用方法、装置、终端及存储介质。
背景技术
众所周知深度学习需要大量标注数据进行训练,但是有些数据很难获取,而标注数据又要花费大量人力。所以如何能够用尽量少的数据达到目的是目前深度学习前沿方向之一,而参数重用是解决这一问题的重要策略。
针对如何利用少量数据训练的问题目前主要有两种方式:
1.迁移学习:一种机器学习方法,他将为一个任务开发的模型参数作为第二个模型参数训练的起点。基于网络的深度迁移学习是指将原领域中预先训练好的部分网络,包括其网络结构和参数,重用为用于目标领域的深度神经网络的一部分。
2.半监督学习:半监督学习是一种结合监督学习和无监督学习的算法,他同时利用有标签数据和无标签数据进行学习的一种方法。目前深度学习应用中比较流行的一直做法是无监督预训练:用所有数据训练重构自编码网络,然后把自编码网络的参数,作为初始参数,用有标签数据微调。
但是,目前迁移学习和半监督学习都有一个相同的问题:重用参数选取的盲目性,其暂时无法做到很好的选取可以重用的参数,导致模型重用效果较差。
发明内容
本申请提供一种深度学习模型的参数重用方法、装置、终端及存储介质,以解决现有参数重用方式无法避免重用参数选取的盲目性的问题。
为解决上述技术问题,本申请采用的一个技术方案是:提供一种深度学习模型的参数重用方法,包括:根据预先配置好的数据集训练得到目标模型,数据集包括训练集和验证集;获取预先训练好的原模型,目标模型与原模型的部分或全部网络结构相同;获取目标模型和原模型中网络结构相同的层的对应关系,以及对应层的参数对应关系;从原模型中网络结构相同的层提取得到多个 原模型参数;根据参数对应关系,逐个利用每个原模型参数替换目标模型中的对应参数,并在验证集上验证替换后的目标模型,且当验证通过时,记录原模型参数可重用;利用所有可重用的原模型参数替换掉目标模型中的对应参数,得到新的目标模型后,再训练新的目标模型。
作为本申请的进一步改进,验证替换后的目标模型,且当验证通过时,记录原模型参数可重用,包括:获取根据训练集训练目标模型得到的第一结果;根据验证集验证替换后的目标模型,记录验证的第二结果;判断第一结果与第二结果的差值是否在预设范围内;当第一结果与第二结果的差值在预设范围内时,验证通过,记录原模型参数可重用。
作为本申请的进一步改进,训练新的目标模型,包括:直接利用训练集训练新的目标模型。
作为本申请的进一步改进,训练新的目标模型,包括:冻结新的目标模型中可重用的原模型参数,再利用训练集训练新的目标模型。
作为本申请的进一步改进,根据预先配置好的数据集训练得到目标模型之前,还包括:对数据集进行预处理。
为解决上述技术问题,本申请采用的另一个技术方案是:提供一种深度学习模型的参数重用装置,包括:训练模块,用于根据预先配置好的数据集训练得到目标模型,数据集包括训练集和验证集;第一获取模块,用于获取预先训练好的原模型,目标模型与原模型的部分或全部网络结构相同;第二获取模块,用于获取目标模型和原模型中网络结构相同的层的对应关系,以及对应层的参数对应关系;提取模块,用于从原模型中网络结构相同的层提取得到多个原模型参数;验证模块,用于根据参数对应关系,逐个利用每个原模型参数替换目标模型中的对应参数,并在验证集上验证替换后的目标模型,且当验证通过时,记录原模型参数可重用;迁移模块,用于利用所有可重用的原模型参数替换掉目标模型中的对应参数,得到新的目标模型后,再训练新的目标模型。
为解决上述技术问题,本申请采用的另一个技术方案是:提供一种深度学习模型的参数重用方法,包括:根据预先配置好的第一数据集训练得到目标模型,并根据预先配置好的第二数据集训练得到原模型,目标模型和原模型的部分或全部网络结构相同,第一数据集包括第一训练集和第一验证集;获取目标模型和原模型中网络结构相同的层的对应关系,以及对应层的参数对应关系; 从原模型中网络结构相同的层提取得到多个原模型参数;根据参数对应关系,逐个利用每个原模型参数替换目标模型中的对应参数,并在第一验证集上验证替换后的目标模型,且当验证通过时,记录原模型参数可重用;利用所有可重用的原模型参数替换掉目标模型中的对应参数,得到新的目标模型后,再训练新的目标模型。
为解决上述技术问题,本申请采用的另一个技术方案是:提供一种深度学习模型的参数重用装置,包括:训练模块,用于根据预先配置好的第一数据集训练得到目标模型,并根据预先配置好的第二数据集训练得到原模型,目标模型和原模型的部分或全部网络结构相同,第一数据集包括第一训练集和第一验证集;获取模块,用于获取目标模型和原模型中网络结构相同的层的对应关系,以及对应层的参数对应关系;提取模块,用于从原模型中网络结构相同的层提取得到多个原模型参数;验证模块,用于根据参数对应关系,逐个利用每个原模型参数替换目标模型中的对应参数,并在第一验证集上验证替换后的目标模型,且当验证通过时,记录原模型参数可重用;迁移模块,用于利用所有可重用的原模型参数替换掉目标模型中的对应参数,得到新的目标模型后,再训练新的目标模型。
为解决上述技术问题,本申请采用的再一个技术方案是:提供一种终端,该终端包括处理器、与处理器耦接的存储器,其中,存储器存储有用于实现上述深度学习模型的参数重用方法的程序指令;处理器用于执行存储器存储的程序指令以实现不同深度学习模型之间的参数重用。
为解决上述技术问题,本申请采用的再一个技术方案是:提供一种存储介质,存储有能够实现上述深度学习模型的参数重用方法的程序文件。
本申请的有益效果是:本申请的深度学习模型的参数重用方法通过根据预设的数据集进行初始的训练,得到目标模型后,再获取预先训练好的与该目标模型在网络结构上部分或全部相同的原模型,然后通过将原模型的网络结构与目标模型相同的层的参数逐个替换至目标模型上,再在验证集上对替换后的目标模型进行验证,验证通过则认为该参数可从原模型重用至目标模型,直至所有参数均验证完成后,将所有的可重用参数加载至目标模型再进行训练,得到新的目标模型,其使得即使训练目标模型的数据集的数据量不足,也能够通过参数重用的方式得到一个效果良好的模型,并且,通过逐个参数进行验证筛选 出可重用参数的方式,使得可重用参数的选取更具有目的性,能够帮助选取出最合适的可重用参数,从而避免盲目选择可重用参数。
附图说明
图1是本发明第一实施例的深度学习模型的参数重用方法的流程示意图;
图2是本发明第一实施例的深度学习模型的参数重用装置的功能模块示意图;
图3是本发明第二实施例的深度学习模型的参数重用方法的流程示意图;
图4是本发明第二实施例的深度学习模型的参数重用装置的功能模块示意图;
图5是本发明实施例的终端的结构示意图;
图6是本发明实施例的存储介质的结构示意图。
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅用以解释本申请,并不用于限定本申请。
本申请中的术语“第一”、“第二”、“第三”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”、“第三”的特征可以明示或者隐含地包括至少一个该特征。本申请的描述中,“多个”的含义是至少两个,例如两个,三个等,除非另有明确具体的限定。本申请实施例中所有方向性指示(诸如上、下、左、右、前、后……)仅用于解释在某一特定姿态(如附图所示)下各部件之间的相对位置关系、运动情况等,如果该特定姿态发生改变时,则该方向性指示也相应地随之改变。此外,术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。
在本文中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本申请的至少一个实施例中。在说明书中的各个位置出现该短语并 不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本文所描述的实施例可以与其它实施例相结合。
图1是本发明第一实施例的深度学习模型的参数重用方法的流程示意图。需注意的是,若有实质上相同的结果,本发明的方法并不以图1所示的流程顺序为限。如图1所示,该方法包括步骤:
步骤S101:根据预先配置好的数据集训练得到目标模型,数据集包括训练集和验证集。
需要说明的是,该数据集是根据任务要求收集到的数据,例如,当任务是要实现猫狗图像识别,则需要预先准备多张猫和狗的图像。其中,数据集包括训练集和验证集,训练集用于进行模型训练,验证集用于对训练之后的模型进行验证。
在步骤S101中,在获取到预先配置好的数据集之后,利用该数据集进行深度学习训练,从而得到该目标模型。
进一步的,为了保证模型的训练效果,在根据预先配置好的数据集训练得到目标模型之前,还包括:对数据集进行预处理。
具体地,对数据集进行预处理具体包括:数据的归一化、标准化等,若数据过少,还可通过图形旋转、裁剪等方式进行数据集扩充。
步骤S102:获取预先训练好的原模型,目标模型与原模型的部分或全部网络结构相同。
需要说明的是,该原模型是预先已经训练好的模型,并且,目标模型和原模型必须部分或全部网络结构相同,否则,不可进行参数重用。例如,一个深度学习模型通常包括有激活函数层、卷积层、全连接层、池化层、BN(Batch Normalization)层等,其中部分或全部的卷积层或BN层的网络结构必须相同。
步骤S103:获取目标模型和原模型中网络结构相同的层的对应关系,以及对应层的参数对应关系。
在步骤S103中,在获取到原模型之后,确认目标模型中与原模型中网络结构相同的层,然后将这些层进行一一对应,再将层内的参数也进行一一对应。
步骤S104:从原模型中网络结构相同的层提取得到多个原模型参数。
在步骤S104中,确认网络结构相同的层之后,从原模型中提取网络结构 相同的层的所有参数,得到多个原模型参数。
步骤S105:根据参数对应关系,逐个利用每个原模型参数替换目标模型中的对应参数,并在验证集上验证替换后的目标模型,且当验证通过时,记录原模型参数可重用。
在步骤S105中,从原模型中提取到多个原模型参数后,再基于参数对应关系,利用原模型参数替换掉目标模型中与该原模型参数对应的参数,得到替换后的目标模型,在不重新训练该替换后的目标模型的基础上,直接使用验证集对该替换后的目标模型进行验证,当验证通过时,则认为该原模型参数可重用至目标模型中,当验证不通过,则认为该原模型参数不可重用至目标模型中。循环执行上述步骤,直至每一个原模型参数均被验证,最终得到所有可重用至目标模型的原模型参数。
需要说明的是,在每验证一个原模型参数后,需要将目标模型恢复原样之后,再进行下一个原模型参数的验证,始终保持目标模型中只有一个变量,从而有效地验证原模型参数是否可以重用。
例如,BN层的参数Runningmean、Runningvar、weight、bias分别用RM、RV、RW、RB表示,卷积层的参数weight、bias分别用W、B表示。假设目标模型和原模型之间存在网络结构相同的BN层和卷积层,提取原模型的BN层的原模型参数RM 1、RV 1、RW 1、RB 1和卷积层的原模型参数W 1、B 1,首先,针对于该BN层,在目标模型中找出与原模型BN层对应的目标模型的BN层,并从中找出目标模型参数RM 2、RV 2、RW 2、RB 2,再利用RM 1替换掉RM 2后,不重新训练替换后的目标模型,直接在验证集上对替换后的目标模型进行验证,当验证通过时,记录该RM 1可重用,然后将替换后的目标模型恢复原样,再利用RV 1替换RV 2,再次进行验证,直至RM 1、RV 1、RW 1、RB 1四个参数均以验证完成;然后,针对于卷积层,在目标模型中找出找出与原模型卷积层层对应的目标模型的卷积层,并从中找出目标模型参数W 2、B 2,再利用W 1替换掉W 2后,不重新训练替换后的目标模型,直接在验证集上对替换后的目标模型进行验证,当验证通过时,记录该W 1可重用,然后将替换后的目标模型恢复原样,再利用B 1替换B 2,再次进行验证,直至W 1、B 1两个参数均以验证完成。由此实现逐层逐个参数验证,以帮助选取出最合适的可重用参数。
进一步的,本实施例中,验证替换后的目标模型,且当验证通过时,记录 原模型参数可重用的步骤,具体包括:
1、获取根据训练集训练目标模型得到的第一结果。
具体地,在得到数据集之后,将数据集划分为训练集和验证集,再利用训练集训练得到目标模型,并且,记录训练目标模型时得到的第一结果。
2、根据验证集验证替换后的目标模型,记录验证的第二结果。
具体地,在利用原模型参数替换掉目标模型中的对应参数之后,利用验证集验证该替换后的目标模型,记录验证得到的第二结果。
3、判断第一结果与第二结果的差值是否在预设范围内。
4、当第一结果与第二结果的差值在预设范围内时,验证通过,记录原模型参数可重用。
具体地,通过比较第一结构和第二结果之间的差值,当第一结果与第二结果的差值在预设范围内时,则验证通过,记录原模型参数可重用;当第一结果与第二结果的差值不在预设范围内时,则验证不通过,该原模型参数不可重用。
步骤S106:利用所有可重用的原模型参数替换掉目标模型中的对应参数,得到新的目标模型后,再训练新的目标模型。
在步骤S106中,通过验证得到所用可重用的原模型参数之后,利用所有可重用的原模型参数换掉目标模型中的对应参数,得到一个新的目标模型,再利用数据集对该新的目标模型进行训练。
其中,在一些实施例中,训练新的目标模型的步骤包括:直接利用训练集训练新的目标模型。
具体地,在利用数据集训练新的目标模型时,能够对新的目标模型中重用的参数进行微调,使得新的目标模型的训练效果更好。
在另一些实施例中,训练新的目标模型的步骤还可以包括:冻结新的目标模型中可重用的原模型参数,再利用训练集训练新的目标模型。
应当理解的是,本实施例中仅列举了两个模型之间的参数重用,其同样适用于多个模型之间的参数重用,原理与两个模型之间的参数重用原理相同,均属于本发明的保护范围之内。
本发明第一实施例的深度学习模型的参数重用方法通过根据预设的数据集进行初始的训练,得到目标模型后,再获取预先训练好的与该目标模型在网络结构上部分或全部相同的原模型,然后通过将原模型的网络结构与目标模型 相同的层的参数逐个替换至目标模型上,再在验证集上对替换后的目标模型进行验证,验证通过则认为该参数可从原模型重用至目标模型,直至所有参数均验证完成后,将所有的可重用参数加载至目标模型再进行训练,得到新的目标模型,其使得即使训练目标模型的数据集的数据量不足,也能够通过参数重用的方式得到一个效果良好的模型,并且,通过逐个参数进行验证筛选出可重用参数的方式,使得可重用参数的选取更具有目的性,能够帮助选取出最合适的可重用参数,从而避免盲目选择可重用参数。
图2是本发明第一实施例的深度学习模型的参数重用装置的功能模块示意图。如图2所示,该装置20包括训练模块21、第一获取模块22、第二获取模块23、提取模块24、验证模块25和迁移模块26。
训练模块21,用于根据预先配置好的数据集训练得到目标模型,数据集包括训练集和验证集。
第一获取模块22,用于获取预先训练好的原模型,目标模型与原模型的部分或全部网络结构相同。
第二获取模块23,用于获取目标模型和原模型中网络结构相同的层的对应关系,以及对应层的参数对应关系。
提取模块24,用于从原模型中网络结构相同的层提取得到多个原模型参数。
验证模块25,用于根据参数对应关系,逐个利用每个原模型参数替换目标模型中的对应参数,并在验证集上验证替换后的目标模型,且当验证通过时,记录原模型参数可重用。
迁移模块26,用于利用所有可重用的原模型参数替换掉目标模型中的对应参数,得到新的目标模型后,再训练新的目标模型。
可选地,训练模块21根据预先配置好的数据集训练得到目标模型的操作之前,还用于对数据集进行预处理。
可选地,验证模块25验证替换后的目标模型,且当验证通过时,记录原模型参数可重用对的操作还可以为:获取根据训练集训练目标模型得到的第一结果;根据验证集验证替换后的目标模型,记录验证的第二结果;判断第一结果与第二结果的差值是否在预设范围内;当第一结果与第二结果的差值在预设范围内时,验证通过,记录原模型参数可重用。
可选地,迁移模块26训练新的目标模型的操作可以为直接利用训练集训练新的目标模型。
可选地,迁移模块26训练新的目标模型的操作还可以为冻结新的目标模型中可重用的原模型参数,再利用训练集训练新的目标模型。
关于上述第一实施例的深度学习模型的参数重用装置中各模块实现技术方案的其他细节,可参见上述第一实施例的深度学习模型的参数重用方法中的描述,此处不再赘述。
需要说明的是,本说明书中的各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似的部分互相参见即可。对于装置类实施例而言,由于其与方法实施例基本相似,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
图3是本发明第二实施例的深度学习模型的参数重用方法的流程示意图。需注意的是,若有实质上相同的结果,本发明的方法并不以图3所示的流程顺序为限。如图3所示,该方法包括步骤:
步骤S301:根据预先配置好的第一数据集训练得到目标模型,并根据预先配置好的第二数据集训练得到原模型,目标模型和原模型的部分或全部网络结构相同,第一数据集包括第一训练集和第一验证集。
在步骤S301中,该第一数据集和第二数据集可以为完全相同的数据集,当第一数据集和第二数据集相同时,则目标模型和原模型可以为同一数据集的针对不同任务的两个模型。此外,该第一数据集和第二数据集也可以为两个不同的数据集,目标模型和原模型可以为不同数据集针对同一任务或不同任务的两个模型。优选地,本实施例中,第一数据集的数据量小于第二数据集的数据量。
本实施例中,当第一数据集的数据量较少以致难以训练得到一个效果好的模型,而第二数据集的数据量多且可以训练得到一个效果好的模型时,则可利用第二数据集训练得到原模型之后,再将原模型的参数重用至基于第一数据集训练得到的目标模型中,从而提升目标模型的训练效果。
步骤S302:获取目标模型和原模型中网络结构相同的层的对应关系,以及对应层的参数对应关系。
在本实施例中,图3中的步骤S302和图1中的步骤S103类似,为简约 起见,在此不再赘述。
步骤S303:从原模型中网络结构相同的层提取得到多个原模型参数。
在本实施例中,图3中的步骤S303和图1中的步骤S104类似,为简约起见,在此不再赘述。
步骤S304:根据参数对应关系,逐个利用每个原模型参数替换目标模型中的对应参数,并在第一验证集上验证替换后的目标模型,且当验证通过时,记录原模型参数可重用。
在本实施例中,图3中的步骤S304和图1中的步骤S105类似,为简约起见,在此不再赘述。
步骤S305:利用所有可重用的原模型参数替换掉目标模型中的对应参数,得到新的目标模型后,再训练新的目标模型。
在本实施例中,图3中的步骤S305和图1中的步骤S106类似,为简约起见,在此不再赘述。
本发明第二实施例的深度学习模型的参数重用方法在第一实施例的基础上,通过在没有训练好的模型进行参数重用时,也可选取相似且数据量较大的数据集进行训练得到可以提供可重用参数的模型,再进行模型之间的参数重用,从而避免因数据量不足导致难以训练得到效果较好的模型的问题。
图4是本发明第二实施例的深度学习模型的参数重用装置的功能模块示意图。如图4所示,该装置40包括训练模块41、获取模块42、提取模块43、验证模块44和迁移模块45。
训练模块41,用于根据预先配置好的第一数据集训练得到目标模型,并根据预先配置好的第二数据集训练得到原模型,目标模型和原模型的部分或全部网络结构相同,第一数据集的数据量小于第二数据集的数据量,第一数据集包括第一训练集和第一验证集。
获取模块42,用于获取目标模型和原模型中网络结构相同的层的对应关系,以及对应层的参数对应关系。
提取模块43,用于从原模型中网络结构相同的层提取得到多个原模型参数。
验证模块44,用于根据参数对应关系,逐个利用每个原模型参数替换目标模型中的对应参数,并在第一验证集上验证替换后的目标模型,且当验证通 过时,记录原模型参数可重用。
迁移模块45,用于利用所有可重用的原模型参数替换掉目标模型中的对应参数,得到新的目标模型后,再训练新的目标模型。
关于上述第二实施例的深度学习模型的参数重用装置中各模块实现技术方案的其他细节,可参见上述第二实施例的深度学习模型的参数重用方法中的描述,此处不再赘述。
需要说明的是,本说明书中的各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似的部分互相参见即可。对于装置类实施例而言,由于其与方法实施例基本相似,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
请参阅图5,图5为本发明实施例的终端的结构示意图。如图5所示,该终端60包括处理器61及和处理器61耦接的存储器62。
存储器62存储有用于实现上述任一实施例所述的深度学习模型的参数重用方法的程序指令。
处理器61用于执行存储器62存储的程序指令以实现不同深度学习模型之间的参数重用。
其中,处理器61还可以称为CPU(Central Processing Unit,中央处理单元)。处理器61可能是一种集成电路芯片,具有信号的处理能力。处理器61还可以是通用处理器、数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
参阅图6,图6为本发明实施例的存储介质的结构示意图。本发明实施例的存储介质存储有能够实现上述所有方法的程序文件71,其中,该程序文件71可以以软件产品的形式存储在上述存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器(processor)执行本申请各个实施方式所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质,或者是计算机、服务器、手机、平板等终端设备。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。以上仅为本申请的实施方式,并非因此限制本申请的专利范围,凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本申请的专利保护范围内。

Claims (10)

  1. 一种深度学习模型的参数重用方法,其特征在于,包括:
    根据预先配置好的数据集训练得到目标模型,所述数据集包括训练集和验证集;
    获取预先训练好的原模型,所述目标模型与所述原模型的部分或全部网络结构相同;
    获取所述目标模型和所述原模型中网络结构相同的层的对应关系,以及对应层的参数对应关系;
    从所述原模型中所述网络结构相同的层提取得到多个原模型参数;
    根据所述参数对应关系,逐个利用每个所述原模型参数替换所述目标模型中的对应参数,并在所述验证集上验证替换后的目标模型,且当验证通过时,记录所述原模型参数可重用;
    利用所有可重用的原模型参数替换掉所述目标模型中的对应参数,得到新的目标模型后,再训练所述新的目标模型。
  2. 根据权利要求1所述的深度学习模型的参数重用方法,其特征在于,所述验证替换后的目标模型,且当验证通过时,记录所述原模型参数可重用,包括:
    获取根据所述训练集训练所述目标模型得到的第一结果;
    根据所述验证集验证所述替换后的目标模型,记录验证的第二结果;
    判断所述第一结果与所述第二结果的差值是否在预设范围内;
    当所述第一结果与所述第二结果的差值在预设范围内时,验证通过,记录所述原模型参数可重用。
  3. 根据权利要求1所述的深度学习模型的参数重用方法,其特征在于,所述训练所述新的目标模型,包括:
    直接利用所述训练集训练所述新的目标模型。
  4. 根据权利要求1所述的深度学习模型的参数重用方法,其特征在于,所述训练所述新的目标模型,包括:
    冻结所述新的目标模型中可重用的原模型参数,再利用所述训练集训练所述新的目标模型。
  5. 根据权利要求1所述的深度学习模型的参数重用方法,其特征在于,所述根据预先配置好的数据集训练得到目标模型之前,还包括:
    对所述数据集进行预处理。
  6. 一种深度学习模型的参数重用装置,其特征在于,包括:
    训练模块,用于根据预先配置好的数据集训练得到目标模型,所述数据集包括训练集和验证集;
    第一获取模块,用于获取预先训练好的原模型,所述目标模型与所述原模型的部分或全部网络结构相同;
    第二获取模块,用于获取所述目标模型和所述原模型中网络结构相同的层的对应关系,以及对应层的参数对应关系;
    提取模块,用于从所述原模型中所述网络结构相同的层提取得到多个原模型参数;
    验证模块,用于根据所述参数对应关系,逐个利用每个所述原模型参数替换所述目标模型中的对应参数,并在所述验证集上验证替换后的目标模型,且当验证通过时,记录所述原模型参数可重用;
    迁移模块,用于利用所有可重用的原模型参数替换掉所述目标模型中的对应参数,得到新的目标模型后,再训练所述新的目标模型。
  7. 一种深度学习模型的参数重用方法,其特征在于,包括:
    根据预先配置好的第一数据集训练得到目标模型,并根据预先配置好的第二数据集训练得到原模型,所述目标模型和所述原模型的部分或全部网络结构相同,所述第一数据集包括第一训练集和第一验证集;
    获取所述目标模型和所述原模型中网络结构相同的层的对应关系,以及对应层的参数对应关系;
    从所述原模型中所述网络结构相同的层提取得到多个原模型参数;
    根据所述参数对应关系,逐个利用每个所述原模型参数替换所述目标模型中的对应参数,并在所述第一验证集上验证替换后的目标模型,且当验证通过时,记录所述原模型参数可重用;
    利用所有可重用的原模型参数替换掉所述目标模型中的对应参数,得到新的目标模型后,再训练所述新的目标模型。
  8. 一种深度学习模型的参数重用装置,其特征在于,包括:
    训练模块,用于根据预先配置好的第一数据集训练得到目标模型,并根据 预先配置好的第二数据集训练得到原模型,所述目标模型和所述原模型的部分或全部网络结构相同,所述第一数据集包括第一训练集和第一验证集;
    获取模块,用于获取所述目标模型和所述原模型中网络结构相同的层的对应关系,以及对应层的参数对应关系;
    提取模块,用于从所述原模型中所述网络结构相同的层提取得到多个原模型参数;
    验证模块,用于根据所述参数对应关系,逐个利用每个所述原模型参数替换所述目标模型中的对应参数,并在所述第一验证集上验证替换后的目标模型,且当验证通过时,记录所述原模型参数可重用;
    迁移模块,用于利用所有可重用的原模型参数替换掉所述目标模型中的对应参数,得到新的目标模型后,再训练所述新的目标模型。
  9. 一种终端,其特征在于,所述终端包括处理器、与所述处理器耦接的存储器,其中,
    所述存储器存储有用于实现如权利要求1-5或权利要求7中任一项所述的深度学习模型的参数重用方法的程序指令;
    所述处理器用于执行所述存储器存储的所述程序指令以实现不同深度学习模型之间的参数重用。
  10. 一种存储介质,其特征在于,存储有能够实现如权利要求1-5或权利要求7中任一项所述的深度学习模型的参数重用方法的程序文件。
PCT/CN2020/117656 2020-08-07 2020-09-25 深度学习模型的参数重用方法、装置、终端及存储介质 WO2022027806A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/106,988 US20230196120A1 (en) 2020-08-07 2023-02-07 Method, device, terminal, and storage medium for reusing parameters of a deep learning model

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010786350.0 2020-08-07
CN202010786350.0A CN114065903A (zh) 2020-08-07 2020-08-07 深度学习模型的参数重用方法、装置、终端及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/106,988 Continuation US20230196120A1 (en) 2020-08-07 2023-02-07 Method, device, terminal, and storage medium for reusing parameters of a deep learning model

Publications (1)

Publication Number Publication Date
WO2022027806A1 true WO2022027806A1 (zh) 2022-02-10

Family

ID=80118613

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117656 WO2022027806A1 (zh) 2020-08-07 2020-09-25 深度学习模型的参数重用方法、装置、终端及存储介质

Country Status (3)

Country Link
US (1) US20230196120A1 (zh)
CN (1) CN114065903A (zh)
WO (1) WO2022027806A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998893A (zh) * 2022-06-14 2022-09-02 江南大学 基于半监督迁移学习的食品品质无损检测模型构建方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583594A (zh) * 2018-11-16 2019-04-05 东软集团股份有限公司 深度学习训练方法、装置、设备及可读存储介质
CN110378487A (zh) * 2019-07-18 2019-10-25 深圳前海微众银行股份有限公司 横向联邦学习中模型参数验证方法、装置、设备及介质
CN110782043A (zh) * 2019-10-29 2020-02-11 腾讯科技(深圳)有限公司 模型优化方法、装置、存储介质及服务器
US20200104706A1 (en) * 2018-09-27 2020-04-02 Google Llc Parameter-Efficient Multi-Task and Transfer Learning


Also Published As

Publication number Publication date
CN114065903A (zh) 2022-02-18
US20230196120A1 (en) 2023-06-22

Similar Documents

Publication Publication Date Title
Wu et al. Deep incremental hashing network for efficient image retrieval
US10691909B2 (en) User authentication method using fingerprint image and method of generating coded model for user authentication
US20200387755A1 (en) Optimizing training data for image classification
EP3370188B1 (en) Facial verification method, device, and computer storage medium
US9275307B2 (en) Method and system for automatic selection of one or more image processing algorithm
Bai et al. CNN feature boosted SeqSLAM for real‐time loop closure detection
TW201926148A (zh) 機器學習模型的訓練方法、裝置以及電子設備
CN110276406B (zh) 表情分类方法、装置、计算机设备及存储介质
WO2019082165A1 (en) GENERATION OF NEURAL NETWORKS WITH COMPRESSED REPRESENTATION HAVING A HIGH DEGREE OF PRECISION
WO2020134099A1 (zh) 物品识别方法、设备和系统
CN109086697A (zh) 一种人脸数据处理方法、装置及存储介质
WO2022027806A1 (zh) 深度学习模型的参数重用方法、装置、终端及存储介质
Lu et al. Combining context, consistency, and diversity cues for interactive image categorization
CN110737648B (zh) 性能特征降维方法及装置、电子设备及存储介质
CN108154120A (zh) 视频分类模型训练方法、装置、存储介质及电子设备
CN104573737A (zh) 特征点定位的方法及装置
EP3166022A1 (en) Method and apparatus for image search using sparsifying analysis operators
JP2022548341A (ja) 目標モデルの取得
JP2008009548A (ja) モデル作成装置および識別装置
Le et al. City-scale visual place recognition with deep local features based on multi-scale ordered VLAD pooling
WO2023010701A1 (en) Image generation method, apparatus, and electronic device
WO2023060575A1 (zh) 图像识别方法、装置、电子设备及存储介质
US20230196073A1 (en) Method for secure use of a first neural network on an input datum and method for learning parameters of a second neural network
US20230196831A1 (en) Image Group Classifier in a User Device
Zhang et al. Motion Field Consensus with Locality Preservation: A Geometric Confirmation Strategy for Loop Closure Detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20948197

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20948197

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.07.2023)
