WO2022027806A1 - Parameter reuse method, device, terminal and storage medium for deep learning model
- Publication number
- WO2022027806A1 (PCT/CN2020/117656)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- model
- parameters
- target model
- original
- training
- Prior art date
Classifications
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06N3/096 — Transfer learning
- G06N3/0895 — Weakly supervised learning, e.g. semi-supervised or self-supervised learning
Definitions
- The present application relates to the technical field of deep learning models, and in particular to a method, device, terminal and storage medium for reusing parameters of a deep learning model.
- Transfer learning: a machine learning method that uses the model parameters developed for one task as the starting point for training the parameters of a second model.
- Network-based deep transfer learning refers to reusing part of the network pre-trained in the original domain, including its network structure and parameters, as part of the deep neural network used in the target domain.
- Semi-supervised learning: an algorithm that combines supervised learning and unsupervised learning, using both labeled and unlabeled data. The most popular practice in deep learning applications is unsupervised pre-training: use all data to train an auto-encoding network for reconstruction, and then use the parameters of the auto-encoding network as initial parameters for fine-tuning with labeled data.
- The present application provides a parameter reuse method, device, terminal and storage medium for a deep learning model, to solve the problem that existing parameter reuse methods cannot avoid blind selection of the parameters to reuse.
- A technical solution adopted in this application is a method for reusing parameters of a deep learning model, including: training a target model according to a pre-configured data set, the data set including a training set and a verification set; obtaining a pre-trained original model, where the target model and the original model have part or all of their network structure in common; obtaining the correspondence between the layers of the target model and the original model that have the same network structure, and the parameter correspondence of the corresponding layers; extracting multiple original model parameters from the layers of the original model with the same network structure; according to the parameter correspondence, using each original model parameter to replace the corresponding parameter in the target model one by one, verifying the replaced target model on the verification set, and, when the verification passes, recording that the original model parameter is reusable; and using all reusable original model parameters to replace the corresponding parameters in the target model to obtain a new target model, and then training the new target model.
- Verifying the replaced target model and, when the verification passes, recording that the original model parameters are reusable includes: obtaining a first result from training the target model on the training set; verifying the replaced target model on the verification set and recording a second result of the verification; determining whether the difference between the first result and the second result is within a preset range; and, when the difference is within the preset range, passing the verification and recording that the original model parameters are reusable.
- Optionally, training the new target model includes: directly using the training set to train the new target model.
- Alternatively, training the new target model includes: freezing the reusable original model parameters in the new target model, and then using the training set to train the new target model.
- Before the target model is trained according to the preconfigured data set, the method further includes: preprocessing the data set.
- A parameter reuse device for a deep learning model includes: a training module for training a target model according to a preconfigured data set, the data set including a training set and a validation set; a first acquisition module for acquiring a pre-trained original model, where the target model and the original model have part or all of their network structure in common; a second acquisition module for acquiring the correspondence between the layers of the target model and the original model with the same network structure, and the parameter correspondence of the corresponding layers; an extraction module for extracting multiple original model parameters from the layers of the original model with the same network structure; a verification module for replacing the corresponding parameters in the target model with each original model parameter one by one according to the parameter correspondence, verifying the replaced target model on the validation set, and recording that the original model parameters are reusable when the verification passes; and a migration module for replacing the corresponding parameters in the target model with all reusable original model parameters to obtain a new target model, and then training the new target model.
- Another technical solution adopted in this application is a method for reusing parameters of a deep learning model, including: training a target model according to a pre-configured first data set, and training an original model according to a pre-configured second data set, where the target model and the original model have part or all of their network structure in common, and the first data set includes a first training set and a first verification set; obtaining the correspondence between the layers of the target model and the original model with the same network structure, and the parameter correspondence of the corresponding layers; extracting multiple original model parameters from the layers of the original model with the same network structure; according to the parameter correspondence, using each original model parameter to replace the corresponding parameter in the target model one by one, verifying the replaced target model on the first verification set, and, when the verification passes, recording that the original model parameter is reusable; and using all reusable original model parameters to replace the corresponding parameters in the target model to obtain a new target model, and then training the new target model.
- Correspondingly, a parameter reuse device for a deep learning model includes: a training module for training the target model according to the preconfigured first data set and training the original model according to the preconfigured second data set, where the target model and the original model have part or all of their network structure in common, and the first data set includes a first training set and a first verification set; an acquisition module for obtaining the correspondence between the layers of the target model and the original model with the same network structure, and the parameter correspondence of the corresponding layers; an extraction module for extracting multiple original model parameters from the layers of the original model with the same network structure; a verification module for replacing the corresponding parameters in the target model with each original model parameter one by one according to the parameter correspondence, verifying the replaced target model on the first verification set, and recording that the original model parameters are reusable when the verification passes; and a migration module for replacing the corresponding parameters in the target model with all reusable original model parameters to obtain a new target model, and then training the new target model.
- Another technical solution adopted in this application is a terminal, including a processor and a memory coupled to the processor, where the memory stores program instructions for implementing the above method for reusing parameters of a deep learning model, and the processor is configured to execute the program instructions stored in the memory to achieve parameter reuse between different deep learning models.
- Another technical solution adopted in the present application is a storage medium storing a program file capable of implementing the above method for reusing parameters of a deep learning model.
- The method for reusing parameters of a deep learning model of the present application first trains a target model according to a preset data set, then obtains a pre-trained original model whose network structure is partly or wholly the same as that of the target model, replaces the parameters of the layers the original model shares with the target model into the target model one by one, and verifies each replaced target model on the verification set to decide which parameters of the original model can be reused in the target model. Once all parameters have been verified, all reusable parameters are loaded into the target model, which is then trained to obtain a new target model. This makes it possible to train a well-performing target model even when the amount of data available for the target model is insufficient.
- FIG. 1 is a schematic flowchart of a method for reusing parameters of a deep learning model according to the first embodiment of the present invention.
- FIG. 2 is a schematic diagram of the functional modules of a parameter reuse device for a deep learning model according to the first embodiment of the present invention.
- FIG. 3 is a schematic flowchart of a method for reusing parameters of a deep learning model according to the second embodiment of the present invention.
- FIG. 4 is a schematic diagram of the functional modules of a parameter reuse device for a deep learning model according to the second embodiment of the present invention.
- FIG. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
- FIG. 6 is a schematic structural diagram of a storage medium according to an embodiment of the present invention.
- The terms “first”, “second” and “third” in this application are used for descriptive purposes only, and should not be construed as indicating or implying relative importance or implying the number of indicated technical features. Thus, a feature defined as “first”, “second” or “third” may expressly or implicitly include at least one such feature.
- “A plurality of” means at least two, such as two, three, etc., unless otherwise expressly and specifically defined. All directional indications (such as up, down, left, right, front, rear, etc.) in the embodiments of the present application are only used to explain the relative positional relationship between components under a certain posture (as shown in the accompanying drawings).
- FIG. 1 is a schematic flowchart of a method for reusing parameters of a deep learning model according to the first embodiment of the present invention. It should be noted that, provided substantially the same result is obtained, the method of the present invention is not limited to the sequence of steps shown in FIG. 1. As shown in FIG. 1, the method includes the following steps:
- Step S101: Train a target model according to a preconfigured data set, where the data set includes a training set and a verification set.
- The data set is collected according to the task requirements. For example, when the task is cat and dog image recognition, multiple images of cats and dogs need to be prepared in advance.
- The data set includes a training set and a validation set; the training set is used for model training, and the validation set is used to validate the model after training.
- In step S101, after a pre-configured data set is acquired, deep learning training is performed using the data set to obtain the target model.
- Optionally, before the target model is trained according to the preconfigured data set, the method further includes: preprocessing the data set.
- The preprocessing of the data set includes data normalization, standardization, etc. If the amount of data is too small, the data set can also be expanded by means such as image rotation and cropping.
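- As an illustrative sketch only (none of the specific transforms or statistics below are specified by the disclosure), such preprocessing could be expressed with torchvision:

```python
# Hypothetical preprocessing pipeline (a sketch, not part of the disclosure):
# normalization plus rotation/crop augmentation to expand a small data set.
import torchvision.transforms as T

train_transform = T.Compose([
    T.RandomRotation(degrees=15),    # rotation to expand the data set
    T.RandomResizedCrop(224),        # random cropping
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],  # assumed ImageNet statistics
                std=[0.229, 0.224, 0.225]),
])

val_transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])
```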
- Step S102: Obtain a pre-trained original model, where the target model and the original model have part or all of their network structure in common.
- A deep learning model usually includes activation function layers, convolutional layers, fully connected layers, pooling layers, BN (Batch Normalization) layers, etc. Here, some or all of the convolutional layers or BN layers of the target model and the original model have the same network structure.
- Step S103: Obtain the correspondence between the layers of the target model and the original model that have the same network structure, and the parameter correspondence of the corresponding layers.
- In step S103, after the original model is acquired, the layers in the target model that have the same network structure as the original model are identified, a one-to-one correspondence is established between these layers, and then a one-to-one correspondence is established between the parameters within the corresponding layers.
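- One possible realization of this correspondence, sketched here under the assumption of a PyTorch-style state_dict interface (the disclosure fixes no framework), is to match parameters whose names and tensor shapes coincide; when the two models name their layers differently, an explicit mapping would have to be supplied instead:

```python
import torch

def build_param_correspondence(target_model: torch.nn.Module,
                               original_model: torch.nn.Module) -> dict:
    """Map each original-model parameter name to the corresponding
    target-model parameter name. Correspondence is assumed here to mean the
    same state_dict key with the same tensor shape (layers with identical
    network structure). Note that state_dict covers weights and biases as
    well as BN running statistics (running_mean, running_var)."""
    target_sd = target_model.state_dict()
    correspondence = {}
    for name, tensor in original_model.state_dict().items():
        if name in target_sd and target_sd[name].shape == tensor.shape:
            correspondence[name] = name  # original name -> target name
    return correspondence
```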
- Step S104: Extract multiple original model parameters from the layers of the original model with the same network structure.
- In step S104, after the layers with the same network structure are confirmed, all parameters of those layers are extracted from the original model to obtain a plurality of original model parameters.
- Step S105: According to the parameter correspondence, use each original model parameter to replace the corresponding parameter in the target model one by one, verify the replaced target model on the verification set, and, when the verification passes, record that the original model parameter is reusable.
- In step S105, after a plurality of original model parameters are extracted from the original model, each original model parameter is used, based on the parameter correspondence, to replace the parameter in the target model corresponding to it, yielding a replaced target model. Without retraining, the replaced target model is verified directly on the verification set. When the verification passes, the original model parameter is considered reusable in the target model; when the verification fails, it is considered not reusable.
- The above steps are performed cyclically until every original model parameter has been verified, finally yielding all original model parameters that can be reused in the target model.
- As an example, let the BN-layer parameters running mean, running variance, weight, and bias be denoted RM, RV, RW, and RB, respectively, and the convolutional-layer parameters weight and bias be denoted W and B.
- Denote the original-model parameters RM1, RV1, RW1, RB1, W1, B1. First, for the BN layer: find the BN layer of the target model corresponding to the BN layer of the original model, and read out the target model parameters RM2, RV2, RW2, RB2. Use RM1 to replace RM2; without retraining the replaced target model, verify it directly on the validation set; when the verification passes, record that RM1 is reusable. Then restore the target model to its original state, use RV1 to replace RV2, and verify again, until all four parameters RM1, RV1, RW1, RB1 have been verified. Next, for the convolutional layer: find the convolution layer of the target model corresponding to the convolution layer of the original model, and read out the target model parameters W2, B2. Use W1 to replace W2 without retraining the replaced target model, verify the replaced target model directly on the verification set, and, when the verification passes, record that W1 is reusable; proceed in the same way for B1.
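- A minimal sketch of this replace-verify-restore loop follows, assuming a PyTorch-style state_dict interface, a user-supplied validate function returning an accuracy-like score, and the first-result/tolerance criterion described below; all three are assumptions, not fixed by the disclosure:

```python
import copy
import torch

def find_reusable_params(target_model, original_model, correspondence,
                         validate, first_result, tolerance):
    """For each corresponding parameter (e.g. RM, RV, RW, RB of a BN layer,
    or W, B of a convolutional layer): replace it in the target model,
    verify on the validation set WITHOUT retraining, record it as reusable
    when the second result stays within `tolerance` of `first_result`, and
    restore the target model to its original state before trying the next
    parameter."""
    baseline = copy.deepcopy(target_model.state_dict())
    original_sd = original_model.state_dict()
    reusable = []
    for orig_name, target_name in correspondence.items():
        sd = copy.deepcopy(baseline)
        sd[target_name] = original_sd[orig_name].clone()  # replace one parameter
        target_model.load_state_dict(sd)
        second_result = validate(target_model)            # verify, no retraining
        if abs(first_result - second_result) <= tolerance:
            reusable.append(orig_name)                    # record as reusable
        target_model.load_state_dict(baseline)            # restore original state
    return reusable
```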
- Specifically, the data set is divided into a training set and a verification set; the training set is used to obtain the target model, and the first result obtained when training the target model is recorded.
- The replaced target model is then verified using the verification set, and the second result obtained by the verification is recorded.
- If the difference between the first result and the second result is within the preset range, the verification passes, and the original model parameters are recorded as reusable; if the difference between the first result and the second result is not within the preset range, the verification fails, and the original model parameters cannot be reused.
- Step S106: Use all reusable original model parameters to replace the corresponding parameters in the target model to obtain a new target model, and then train the new target model.
- In step S106, after the reusable original model parameters have been identified through verification, all of them are used to replace the corresponding parameters in the target model to obtain a new target model, and the new target model is then trained using the data set.
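- Continuing the sketch under the same assumptions as above, loading all verified parameters at once could look like:

```python
def build_new_target_model(target_model, original_model,
                           reusable, correspondence):
    """Replace all parameters verified as reusable in one pass, yielding the
    new target model that is subsequently trained on the data set."""
    sd = target_model.state_dict()
    original_sd = original_model.state_dict()
    for orig_name in reusable:
        sd[correspondence[orig_name]] = original_sd[orig_name].clone()
    target_model.load_state_dict(sd)
    return target_model
```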
- Optionally, the step of training the new target model includes: directly using the training set to train the new target model.
- In this way, the parameters reused in the new target model can also be fine-tuned, so that the training effect of the new target model is better.
- Alternatively, the step of training the new target model may include: freezing the reusable original model parameters in the new target model, and then using the training set to train the new target model.
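- In PyTorch-style code (again a sketch under the same assumptions), freezing the reused parameters amounts to excluding them from gradient updates:

```python
import torch

def freeze_reused_params(new_target_model, reused_target_names):
    """Freeze the reused parameters so that training on the training set only
    updates the remaining parameters. BN running statistics are buffers
    rather than parameters, so they are never touched by the optimizer in
    the first place."""
    for name, param in new_target_model.named_parameters():
        if name in reused_target_names:
            param.requires_grad = False  # excluded from gradient updates

# the optimizer is then given only the still-trainable parameters, e.g.:
# optimizer = torch.optim.SGD(
#     (p for p in new_target_model.parameters() if p.requires_grad), lr=1e-3)
```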
- In summary, the method for reusing parameters of a deep learning model of the first embodiment first trains a target model according to a preset data set, then obtains a pre-trained original model whose network structure is partly or wholly the same as that of the target model, replaces the parameters of the shared layers of the original model into the target model one by one, and verifies each replaced target model on the validation set.
- Through parameter reuse, a model with good effect is obtained even with limited data, and filtering out the reusable parameters by verifying them one by one makes the selection of reusable parameters purposeful, helps select the most suitable reusable parameters, and avoids choosing reusable parameters blindly.
- FIG. 2 is a schematic diagram of the functional modules of a parameter reuse device for a deep learning model according to the first embodiment of the present invention.
- The apparatus 20 includes a training module 21, a first acquisition module 22, a second acquisition module 23, an extraction module 24, a verification module 25 and a migration module 26.
- The training module 21 is used for training a target model according to a preconfigured data set, where the data set includes a training set and a verification set.
- The first acquisition module 22 is used to acquire a pre-trained original model, where the target model and the original model have part or all of their network structure in common.
- The second acquisition module 23 is configured to obtain the correspondence between the layers of the target model and the original model with the same network structure, and the parameter correspondence of the corresponding layers.
- The extraction module 24 is used for extracting a plurality of original model parameters from the layers of the original model with the same network structure.
- The verification module 25 is used to replace the corresponding parameters in the target model with each original model parameter one by one according to the parameter correspondence, verify the replaced target model on the verification set, and record that the original model parameters are reusable when the verification passes.
- The migration module 26 is configured to replace the corresponding parameters in the target model with all reusable original model parameters to obtain a new target model, and then train the new target model.
- The training module 21 is further configured to preprocess the data set before training the target model according to the preconfigured data set.
- The operation by which the verification module 25 verifies the replaced target model and, when the verification passes, records that the original model parameters are reusable may also be: obtaining a first result from training the target model on the training set; verifying the replaced target model on the verification set and recording a second result; determining whether the difference between the first result and the second result is within a preset range; and, when the difference is within the preset range, passing the verification and recording that the original model parameters are reusable.
- The operation of the migration module 26 to train the new target model may be to directly use the training set to train the new target model.
- The operation of the migration module 26 to train the new target model may also be to freeze the reusable original model parameters in the new target model, and then use the training set to train the new target model.
- FIG. 3 is a schematic flowchart of a method for reusing parameters of a deep learning model according to the second embodiment of the present invention. It should be noted that, provided substantially the same result is obtained, the method of the present invention is not limited to the sequence of steps shown in FIG. 3. As shown in FIG. 3, the method includes the following steps:
- Step S301: Train the target model according to the pre-configured first data set, and train the original model according to the pre-configured second data set.
- The target model and the original model have part or all of their network structure in common, and the first data set includes a first training set and a first validation set.
- The first data set and the second data set may be the same data set; in that case, the target model and the original model may be two models trained on the same data set for different tasks.
- The first data set and the second data set may also be two different data sets, in which case the target model and the original model may be two models from different data sets for the same task or for different tasks.
- Optionally, the data volume of the first data set is smaller than that of the second data set.
- Because its data volume is large, the second data set can be used to train an original model with good effect.
- The parameters of this original model are then reused in the target model trained on the first data set, thereby improving the training effect of the target model. A sketch of this flow is given below.
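- Under the assumption that hypothetical helpers make_model, train, and evaluate exist (none are specified by the disclosure), the flow of the second embodiment could be sketched as follows, reusing the functions from the earlier sketches:

```python
# Illustrative end-to-end flow for the second embodiment. The helper names
# make_model, train, evaluate and all data-set names are assumptions, not
# part of the disclosure.
original_model = train(make_model(), second_train_set)   # large data set
target_model = train(make_model(), first_train_set)      # small data set
first_result = evaluate(target_model, first_train_set)   # first result, from training

corr = build_param_correspondence(target_model, original_model)
reusable = find_reusable_params(
    target_model, original_model, corr,
    validate=lambda m: evaluate(m, first_val_set),  # second result on first validation set
    first_result=first_result,
    tolerance=0.02,                                 # assumed preset range
)
new_target_model = build_new_target_model(target_model, original_model,
                                          reusable, corr)
freeze_reused_params(new_target_model, {corr[n] for n in reusable})
# finally, train new_target_model on first_train_set
```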
- Step S302: Obtain the correspondence between the layers of the target model and the original model with the same network structure, and the parameter correspondence of the corresponding layers.
- Step S302 in FIG. 3 is similar to step S103 in FIG. 1; for brevity, details are not repeated here.
- Step S303: Extract multiple original model parameters from the layers of the original model with the same network structure.
- Step S303 in FIG. 3 is similar to step S104 in FIG. 1; for brevity, details are not repeated here.
- Step S304: According to the parameter correspondence, use each original model parameter to replace the corresponding parameter in the target model one by one, verify the replaced target model on the first verification set, and, when the verification passes, record that the original model parameter is reusable.
- Step S304 in FIG. 3 is similar to step S105 in FIG. 1; for brevity, details are not repeated here.
- Step S305: Use all reusable original model parameters to replace the corresponding parameters in the target model to obtain a new target model, and then train the new target model.
- Step S305 in FIG. 3 is similar to step S106 in FIG. 1; for brevity, details are not repeated here.
- In the method for reusing parameters of a deep learning model according to the second embodiment of the present invention, when no trained model is available for parameter reuse, a similar data set with a large amount of data is selected and used to train a model that can provide reusable parameters; parameters are then reused between the models, avoiding the difficulty of training a well-performing model when the data volume is insufficient.
- FIG. 4 is a schematic diagram of the functional modules of a parameter reuse device for a deep learning model according to the second embodiment of the present invention.
- The apparatus 40 includes a training module 41, an acquisition module 42, an extraction module 43, a verification module 44 and a migration module 45.
- The training module 41 is used to train the target model according to the pre-configured first data set, and to train the original model according to the pre-configured second data set.
- Optionally, the data volume of the first data set is smaller than that of the second data set, and the first data set includes a first training set and a first validation set.
- The acquisition module 42 is configured to obtain the correspondence between the layers of the target model and the original model with the same network structure, and the parameter correspondence of the corresponding layers.
- The extraction module 43 is used for extracting a plurality of original model parameters from the layers of the original model with the same network structure.
- The verification module 44 is used to replace the corresponding parameters in the target model with each original model parameter one by one according to the parameter correspondence, verify the replaced target model on the first verification set, and record that the original model parameters are reusable when the verification passes.
- The migration module 45 is configured to replace the corresponding parameters in the target model with all reusable original model parameters to obtain a new target model, and then train the new target model.
- FIG. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
- The terminal 60 includes a processor 61 and a memory 62 coupled to the processor 61.
- The memory 62 stores program instructions for implementing the parameter reuse method of the deep learning model described in any of the above embodiments.
- The processor 61 is configured to execute the program instructions stored in the memory 62 to realize parameter reuse between different deep learning models.
- The processor 61 may also be referred to as a CPU (Central Processing Unit).
- The processor 61 may be an integrated circuit chip with signal processing capability.
- The processor 61 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
- A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
- FIG. 6 is a schematic structural diagram of a storage medium according to an embodiment of the present invention.
- The storage medium of the embodiment of the present invention stores a program file 71 capable of implementing all of the above methods. The program file 71 may be stored in the storage medium in the form of a software product, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
- The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code, or terminal devices such as computers, servers, mobile phones, and tablets.
- The disclosed system, apparatus and method may be implemented in other manners.
- The device embodiments described above are only illustrative.
- The division of units is only a logical function division; in actual implementation, there may be other division methods.
- For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- The mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- Each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- The above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
Claims (10)
- 1. A method for reusing parameters of a deep learning model, characterized by comprising: training a target model according to a preconfigured data set, the data set comprising a training set and a verification set; acquiring a pre-trained original model, the target model and the original model having part or all of their network structure in common; acquiring the correspondence between the layers of the target model and the original model that have the same network structure, and the parameter correspondence of the corresponding layers; extracting a plurality of original model parameters from the layers of the original model with the same network structure; according to the parameter correspondence, using each of the original model parameters one by one to replace the corresponding parameter in the target model, verifying the replaced target model on the verification set, and, when the verification passes, recording that the original model parameter is reusable; and replacing the corresponding parameters in the target model with all reusable original model parameters to obtain a new target model, and then training the new target model.
- 2. The method for reusing parameters of a deep learning model according to claim 1, wherein verifying the replaced target model and, when the verification passes, recording that the original model parameters are reusable comprises: obtaining a first result from training the target model on the training set; verifying the replaced target model on the verification set and recording a second result of the verification; determining whether the difference between the first result and the second result is within a preset range; and, when the difference between the first result and the second result is within the preset range, passing the verification and recording that the original model parameters are reusable.
- 3. The method for reusing parameters of a deep learning model according to claim 1, wherein training the new target model comprises: directly using the training set to train the new target model.
- 4. The method for reusing parameters of a deep learning model according to claim 1, wherein training the new target model comprises: freezing the reusable original model parameters in the new target model, and then using the training set to train the new target model.
- 5. The method for reusing parameters of a deep learning model according to claim 1, wherein before the target model is trained according to the preconfigured data set, the method further comprises: preprocessing the data set.
- 6. A parameter reuse device for a deep learning model, characterized by comprising: a training module for training a target model according to a preconfigured data set, the data set comprising a training set and a verification set; a first acquisition module for acquiring a pre-trained original model, the target model and the original model having part or all of their network structure in common; a second acquisition module for acquiring the correspondence between the layers of the target model and the original model with the same network structure, and the parameter correspondence of the corresponding layers; an extraction module for extracting a plurality of original model parameters from the layers of the original model with the same network structure; a verification module for replacing the corresponding parameters in the target model with each original model parameter one by one according to the parameter correspondence, verifying the replaced target model on the verification set, and recording that the original model parameters are reusable when the verification passes; and a migration module for replacing the corresponding parameters in the target model with all reusable original model parameters to obtain a new target model, and then training the new target model.
- 7. A method for reusing parameters of a deep learning model, characterized by comprising: training a target model according to a preconfigured first data set, and training an original model according to a preconfigured second data set, the target model and the original model having part or all of their network structure in common, the first data set comprising a first training set and a first verification set; acquiring the correspondence between the layers of the target model and the original model with the same network structure, and the parameter correspondence of the corresponding layers; extracting a plurality of original model parameters from the layers of the original model with the same network structure; according to the parameter correspondence, using each of the original model parameters one by one to replace the corresponding parameter in the target model, verifying the replaced target model on the first verification set, and, when the verification passes, recording that the original model parameter is reusable; and replacing the corresponding parameters in the target model with all reusable original model parameters to obtain a new target model, and then training the new target model.
- 8. A parameter reuse device for a deep learning model, characterized by comprising: a training module for training a target model according to a preconfigured first data set and training an original model according to a preconfigured second data set, the target model and the original model having part or all of their network structure in common, the first data set comprising a first training set and a first verification set; an acquisition module for acquiring the correspondence between the layers of the target model and the original model with the same network structure, and the parameter correspondence of the corresponding layers; an extraction module for extracting a plurality of original model parameters from the layers of the original model with the same network structure; a verification module for replacing the corresponding parameters in the target model with each original model parameter one by one according to the parameter correspondence, verifying the replaced target model on the first verification set, and recording that the original model parameters are reusable when the verification passes; and a migration module for replacing the corresponding parameters in the target model with all reusable original model parameters to obtain a new target model, and then training the new target model.
- 9. A terminal, characterized in that the terminal comprises a processor and a memory coupled to the processor, wherein the memory stores program instructions for implementing the method for reusing parameters of a deep learning model according to any one of claims 1-5 or claim 7, and the processor is configured to execute the program instructions stored in the memory to realize parameter reuse between different deep learning models.
- 10. A storage medium, characterized in that it stores a program file capable of implementing the method for reusing parameters of a deep learning model according to any one of claims 1-5 or claim 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/106,988 US20230196120A1 (en) | 2020-08-07 | 2023-02-07 | Method, device, terminal, and storage medium for reusing parameters of a deep learning model |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010786350.0 | 2020-08-07 | ||
CN202010786350.0A CN114065903A (zh) | 2020-08-07 | Parameter reuse method, device, terminal and storage medium for deep learning model
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/106,988 Continuation US20230196120A1 (en) | 2020-08-07 | 2023-02-07 | Method, device, terminal, and storage medium for reusing parameters of a deep learning model |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022027806A1 (zh) | 2022-02-10 |
Family
ID=80118613
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/117656 WO2022027806A1 (zh) | 2020-08-07 | 2020-09-25 | Parameter reuse method, device, terminal and storage medium for deep learning model |
Country Status (3)
Country | Link |
---|---|
US (1) | US20230196120A1 (zh) |
CN (1) | CN114065903A (zh) |
WO (1) | WO2022027806A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114998893A (zh) * | 2022-06-14 | 2022-09-02 | Jiangnan University | Method for constructing a non-destructive food quality detection model based on semi-supervised transfer learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109583594A (zh) * | 2018-11-16 | 2019-04-05 | Neusoft Corporation | Deep learning training method, apparatus, device and readable storage medium |
CN110378487A (zh) * | 2019-07-18 | 2019-10-25 | Shenzhen Qianhai WeBank Co., Ltd. | Model parameter verification method, apparatus, device and medium in horizontal federated learning |
CN110782043A (zh) * | 2019-10-29 | 2020-02-11 | Tencent Technology (Shenzhen) Co., Ltd. | Model optimization method and apparatus, storage medium and server |
US20200104706A1 (en) * | 2018-09-27 | 2020-04-02 | Google Llc | Parameter-Efficient Multi-Task and Transfer Learning |
- 2020-08-07: CN202010786350.0A filed in China; published as CN114065903A (status: pending).
- 2020-09-25: PCT/CN2020/117656 filed; published as WO2022027806A1.
- 2023-02-07: US18/106,988 filed in the United States; published as US20230196120A1 (status: pending).
Also Published As
Publication number | Publication date |
---|---|
CN114065903A (zh) | 2022-02-18 |
US20230196120A1 (en) | 2023-06-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wu et al. | Deep incremental hashing network for efficient image retrieval | |
US10691909B2 (en) | User authentication method using fingerprint image and method of generating coded model for user authentication | |
US20200387755A1 (en) | Optimizing training data for image classification | |
EP3370188B1 (en) | Facial verification method, device, and computer storage medium | |
US9275307B2 (en) | Method and system for automatic selection of one or more image processing algorithm | |
Bai et al. | CNN feature boosted SeqSLAM for real‐time loop closure detection | |
TW201926148A (zh) | Machine learning model training method and apparatus, and electronic device | |
CN110276406B (zh) | Expression classification method and apparatus, computer device and storage medium | |
WO2019082165A1 (en) | GENERATION OF NEURAL NETWORKS WITH COMPRESSED REPRESENTATION HAVING A HIGH DEGREE OF PRECISION | |
WO2020134099A1 (zh) | Article identification method, device and system | |
CN109086697A (zh) | Face data processing method and apparatus, and storage medium | |
WO2022027806A1 (zh) | Parameter reuse method, device, terminal and storage medium for deep learning model | |
Lu et al. | Combining context, consistency, and diversity cues for interactive image categorization | |
CN110737648B (zh) | Performance feature dimensionality reduction method and apparatus, electronic device and storage medium | |
CN108154120A (zh) | Video classification model training method and apparatus, storage medium and electronic device | |
CN104573737A (zh) | Feature point localization method and apparatus | |
EP3166022A1 (en) | Method and apparatus for image search using sparsifying analysis operators | |
JP2022548341A (ja) | 目標モデルの取得 | |
JP2008009548A (ja) | モデル作成装置および識別装置 | |
Le et al. | City-scale visual place recognition with deep local features based on multi-scale ordered VLAD pooling | |
WO2023010701A1 (en) | Image generation method, apparatus, and electronic device | |
WO2023060575A1 (zh) | Image recognition method and apparatus, electronic device and storage medium | |
US20230196073A1 (en) | Method for secure use of a first neural network on an input datum and method for learning parameters of a second neural network | |
US20230196831A1 (en) | Image Group Classifier in a User Device | |
Zhang et al. | Motion Field Consensus with Locality Preservation: A Geometric Confirmation Strategy for Loop Closure Detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20948197 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20948197 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.07.2023) |
|