CN111444958B - Model migration training method, device, equipment and storage medium - Google Patents


Info

Publication number
CN111444958B
CN111444958B (Application CN202010219449.2A)
Authority
CN
China
Prior art keywords
migration
model
training
layer
layers
Prior art date
Legal status
Active
Application number
CN202010219449.2A
Other languages
Chinese (zh)
Other versions
CN111444958A (en)
Inventor
卢阳
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010219449.2A
Publication of CN111444958A
Application granted
Publication of CN111444958B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning


Abstract

The application discloses a model migration training method, device, equipment and storage medium, and relates to the field of artificial intelligence. The specific implementation scheme is as follows: taking network parameters of at least two migration layers in a source model as initial parameters of the associated migration layers in a target model; constructing an objective function according to the distances between the training parameters associated with the at least two migration layers and the initial parameters; and training the target model comprising the initial parameters based on the objective function. By introducing the distance between each migration layer's training parameters and its initial parameters when constructing the objective function, both the migration and the training state of each migration layer are considered during model training, so that the target model inherits information from the source model while adapting itself to the target domain; overfitting during migration training is avoided, and the generalization capability of the target model is improved.

Description

Model migration training method, device, equipment and storage medium
Technical Field
The present application relates to computer technologies, and in particular, to the field of artificial intelligence, and in particular, to a model migration training method, apparatus, device, and storage medium.
Background
Transfer learning exploits the similarity among data, tasks, or models to apply a source model trained in a source domain (i.e., the old domain) to a target model in a target domain (i.e., the new domain), thereby reducing the demand for massive data resources and avoiding the high cost of training the target model from scratch.
In the prior art, when training the target model, the network parameters of the target model are initialized with the network parameters of the source model in place of random initialization, and the initialized target model is then retrained.
However, a target model trained in this manner has poor generalization capability and is prone to overfitting.
Disclosure of Invention
The embodiments of the present application provide a model migration training method, device, equipment and storage medium, so as to improve the generalization capability of a target model and avoid overfitting.
In a first aspect, an embodiment of the present application provides a model migration training method, including:
taking network parameters of at least two migration layers in the source model as initial parameters of associated migration layers in the target model;
constructing an objective function according to the distance between the training parameters associated with the at least two migration layers and the initial parameters;
and training the target model comprising the initial parameters based on the objective function.
According to the embodiment of the application, network parameters of at least two migration layers in a source model are used as initial parameters of the associated migration layers in a target model; an objective function is constructed according to the distances between the training parameters associated with the at least two migration layers and the initial parameters; and the target model comprising the initial parameters is trained based on the objective function. In this technical scheme, the distance between each migration layer's training parameters and its initial parameters is introduced when constructing the objective function, so that both the migration and the training state of each migration layer are considered during model training; the target model thus inherits information from the source model while adapting itself to the target domain, overfitting during migration training is avoided, and the generalization capability of the target model is improved.
Optionally, constructing an objective function according to a distance between the training parameters associated with the at least two migration layers and the initial parameters, including:
and constructing an objective function according to the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters.
In an optional implementation manner of the above application, the weights of the migration layers are introduced in the construction of the objective function, so that the information-retention capacities of different migration layers are considered at the network level, and the accuracy of transfer learning is effectively improved on the premise of preventing overfitting during migration.
Optionally, before constructing the objective function according to the weights of the at least two migration layers and the distances between the training parameters associated with the at least two migration layers and the initial parameters, the method further includes:
and determining the weight of the migration layer according to the network level information of the migration layer in the model.
According to an optional implementation manner of the application, before the objective function is constructed, the weight of each migration layer is determined according to the migration layer's network level information in the model. This perfects the mechanism for determining migration-layer weights, avoids destroying shallow feature information through overly strong learning during training, and avoids losing deep feature information through overly weak learning during training.
Optionally, determining the weight of the migration layer according to the network level information of the migration layer in the model includes:
Determining the sequence number of the network block to which the migration layer belongs;
and determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs.
In an optional implementation manner of the above application, the process of determining the weight of a migration layer is refined into determining the sequence number of the network block to which the migration layer belongs, and then determining the weight of the migration layer according to that sequence number. This perfects the weight-determination mechanism of the migration layer, ensures the extraction of both shallow general information and deep semantic information, and takes into account the importance of information extracted at different levels.
Optionally, determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs includes:
based on a weight function, determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs; wherein the weight function is a decreasing function of the sequence number of the network block.
In an optional implementation manner in the above application, the weight determining process of the migration layer is refined into a decreasing function based on the sequence number of the network block, and the weight of the migration layer is determined according to the sequence number of the network block to which the migration layer belongs, so that the weight determining mechanism of the migration layer is further perfected.
Optionally, constructing an objective function according to the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters, including:
determining regularization items of the migration layers according to weighted sums of distances between training parameters and the initial parameters of the migration layers;
and constructing the objective function according to the regularization item and the loss function of the migration layer.
In an optional implementation manner in the above application, the construction process of the objective function is refined to construct the objective function according to the regularization term and the loss function of the migration layer, so that the construction mechanism of the objective function is perfected.
Optionally, constructing the objective function according to the regularization term and the loss function of the migration layer includes:
determining regularization items of the full-connection layer according to training parameters of the full-connection layer;
and constructing the objective function according to the regularization item of the migration layer, the regularization item of the full-connection layer and the loss function.
According to an optional implementation manner of the application, a regularization term of the fully connected layer is introduced in the process of constructing the objective function, so that poor training results caused by constraining the fully connected layer toward the source model can be avoided, and the target model can match the target data set of the target domain as closely as possible during training.
Optionally, the migration layer is an image feature extraction layer.
In an optional implementation manner in the above application, the migration layer is refined into the image feature extraction layer, so that the model migration training method is applied to the migration process of the network model including the image feature extraction class, and the application scene of the model migration training method is enriched.
In a second aspect, an embodiment of the present application further provides a model migration training apparatus, including:
the initial parameter determining module is used for taking network parameters of at least two migration layers in the source model as initial parameters of associated migration layers in the target model;
the objective function construction module is used for constructing an objective function according to the distance between the training parameters associated with the at least two migration layers and the initial parameters;
and the target model training module is used for training a target model comprising initial parameters based on the target function.
In a third aspect, an embodiment of the present application further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a model migration training method as provided by embodiments of the first aspect.
In a fourth aspect, embodiments of the present application further provide a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform a model migration training method provided by the embodiments of the first aspect.
Other effects of the above alternatives will be described below in conjunction with specific embodiments.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a flow chart of a model migration training method in accordance with a first embodiment of the present application;
FIG. 2 is a flow chart of a model migration training method in a second embodiment of the present application;
FIG. 3A is a flow chart of a model migration training method in a third embodiment of the present application;
FIG. 3B is a schematic diagram of an objective function construction process in a third embodiment of the present application;
FIG. 4 is a block diagram of a model migration training apparatus according to a fourth embodiment of the present application;
FIG. 5 is a block diagram of an electronic device for implementing a model migration training method of an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Example 1
Fig. 1 is a flowchart of a model migration training method in a first embodiment of the present application, which is applicable to a case of migrating a source model in a source domain to a target model in a target domain and training the target model. The method is executed by a model migration training device which is realized by software and/or hardware and is specifically configured in the electronic equipment.
A model migration training method as shown in fig. 1, comprising:
s101, taking network parameters of at least two migration layers in a source model as initial parameters of associated migration layers in a target model.
The source model is understood to be a stable network model after being successfully trained by a large number of source training samples in the source field. The target model may be understood as a model to be trained in a target domain similar to the source domain, and the number of target training samples in the model is far smaller than the number of source training samples when training the source model.
In order to avoid overfitting of the target model caused by training with only a small number of target training samples, the network parameters of the already-trained source model can be migrated into the target model as its initial parameters, and the target model containing the initial parameters is then trained.
The migration layer may be understood as a network layer capable of migrating from a source model to a target model. The migration layer is part or all of the network layers in the source model. Illustratively, the migration layer includes at least a convolution layer. The convolution layer may be a feature extraction layer. For example, the source model or the target model may be at least one of an image classification, an image segmentation, and an image detection class network, and the migration layer may include an image feature extraction layer, respectively.
S102, constructing an objective function according to the distance between the training parameters associated with the at least two migration layers and the initial parameters.
The training parameters are network parameters corresponding to each migration layer when training the target model.
The distance between the training parameters and the initial parameters associated with each migration layer may be computed as a norm. The order of the norm can be determined by a technician according to need or empirical values; illustratively, it may be the zero-norm, the one-norm, or the two-norm.
In an optional implementation manner of the embodiment of the present application, the constructing the objective function according to the distance between the training parameters associated with the at least two migration layers and the initial parameters may be: determining regularization items of the migration layers according to the distances between training parameters and initial parameters of at least two migration layers; and constructing an objective function according to the regularization term and the loss function of the migration layer.
The loss function is used for measuring the degree of inconsistency between the predicted value and the true value of the model and measuring the robustness of the model so as to reduce the experience risk of the model. The smaller the loss function, the better the robustness of the characterization model. Illustratively, the loss function may include at least one of a hinge loss function, a cross entropy loss function, a square loss function, an exponential loss function, and an absolute value loss function, among others.
Wherein the regularization term is used to gauge the complexity of the model, thereby reducing the structural risk of the model.
Illustratively, the objective function may be constructed using the following formula:
θ = Loss + α·‖ω_f − ω_f⁰‖₂²
where θ is the objective function; ω_f is the training parameter matrix formed by combining the training parameters of each migration layer; ω_f⁰ is the initial parameter matrix formed by combining the initial parameters of each migration layer; α is a hyperparameter, which can be determined by a technician according to need or empirical values; and ‖·‖₂ is the two-norm.
It can be appreciated that by introducing regularization terms of the migration layer, features of the source model can be retained in the target model as much as possible, so that constraint control is performed on the target model in the process of training the target model.
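A minimal Python sketch (illustrative, not part of the patent) of the objective above, assuming the migration-layer parameters are flattened into plain lists; the function names are hypothetical:

```python
def sp_regularization(train_params, init_params, alpha):
    # alpha * ||w_f - w_f0||_2^2: squared two-norm distance between the
    # migration layers' training parameters and the initial parameters
    # migrated from the source model
    return alpha * sum((w - w0) ** 2 for w, w0 in zip(train_params, init_params))

def objective(loss, train_params, init_params, alpha):
    # theta = Loss + alpha * ||w_f - w_f0||_2^2
    return loss + sp_regularization(train_params, init_params, alpha)

# e.g. loss 1.0, parameters drifted to [1.0, 2.0] from initial [0.0, 0.0]:
# objective(1.0, [1.0, 2.0], [0.0, 0.0], 0.1) -> 1.0 + 0.1 * 5.0 = 1.5
```

The regularization term grows as training parameters drift away from the migrated initial values, which is exactly the constraint control described above.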
The fully connected layer is usually the last layer in a network model. Because different network models serve different roles, the network parameters of their fully connected layers vary considerably; therefore, in order to let the target model fit the target data set of the corresponding target domain as closely as possible, the migration layers usually do not include the fully connected layer.
In order to constrain the network parameters of the fully connected layer during the training process of the objective model, in an optional implementation manner of the embodiment of the present application, regularization terms of the fully connected layer may also be introduced into the objective function. Correspondingly, when an objective function is constructed according to the regularization term and the loss function of the migration layer, the objective function is constructed according to the regularization term of the migration layer, the regularization term of the full connection layer and the loss function. Illustratively, the regularization term of the fully connected layer may be determined from training parameters of the fully connected layer of the target model.
By way of example, the objective function may be constructed using the following formula:
θ = Loss + α·‖ω_f − ω_f⁰‖₂² + β·‖ω_fc‖₂²
where β is a hyperparameter and ω_fc is the training parameter of the fully connected layer of the target model.
Because different migration layers differ in their ability to extract information, the meaning of the extracted information also differs; for example, shallow migration layers extract general feature information, while deep migration layers extract semantic feature information. In order to account for the information extraction capability of different migration layers and further ensure the comprehensiveness of the information extracted in the target model, in another optional implementation manner of the embodiment of the present application, constructing the objective function according to the distance between the training parameters associated with the at least two migration layers and the initial parameters may be: constructing the objective function according to the weights of the at least two migration layers and the distances between the training parameters associated with the at least two migration layers and the initial parameters. The weight of each migration layer can be determined by a technician according to need or empirical values, or determined through repeated experiments.
S103, training the target model comprising the initial parameters based on the objective function.
The target model comprising the initial parameters is trained based on the objective function until the trained target model meets the set training conditions.
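The effect of training against such an objective can be shown numerically. The toy example below is not from the patent: it uses a hypothetical one-parameter linear "model" and runs gradient descent on θ = Loss + α·(w − w⁰)², where w⁰ is the parameter migrated from the source model. The loss term pulls the parameter toward the target data, while the distance term pulls it back toward the migrated initial value, so the trained parameter settles in between.

```python
def train(w0, xs, ys, alpha=0.1, lr=0.05, steps=200):
    w = w0  # initialize the target model with the migrated source parameter
    for _ in range(steps):
        # gradient of the mean-squared-error loss of the predictor w * x
        grad_loss = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # gradient of the distance term alpha * (w - w0)^2
        grad_reg = 2 * alpha * (w - w0)
        w -= lr * (grad_loss + grad_reg)
    return w

# With data generated by slope 2.0 and a migrated initial value of 1.0,
# the result settles between w0 = 1.0 and the data-optimal slope 2.0.
```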
According to the embodiment of the application, network parameters of at least two migration layers in a source model are used as initial parameters of the associated migration layers in a target model; an objective function is constructed according to the distances between the training parameters associated with the at least two migration layers and the initial parameters; and the target model comprising the initial parameters is trained based on the objective function. In this technical scheme, the distance between each migration layer's training parameters and its initial parameters is introduced when constructing the objective function, so that both the migration and the training state of each migration layer are considered during model training; the target model thus inherits information from the source model while adapting itself to the target domain, overfitting during migration training is avoided, and the generalization capability of the target model is improved.
Example two
Fig. 2 is a flowchart of a model migration training method in a second embodiment of the present application, where the embodiment of the present application performs optimization and improvement based on the technical solutions of the foregoing embodiments.
Further, the operation of constructing an objective function according to the distance between the training parameters associated with the at least two migration layers and the initial parameters is refined into constructing the objective function according to the weights of the at least two migration layers and the distances between the training parameters associated with the at least two migration layers and the initial parameters, so as to perfect the construction mechanism of the objective function.
A model migration training method as shown in fig. 2, comprising:
s201, taking network parameters of at least two migration layers in the source model as initial parameters of associated migration layers in the target model.
S202, constructing an objective function according to the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters.
The weight of each migration layer can be determined by a technician according to the needs or experience values, and can also be repeatedly determined through a large number of experiments.
Optionally, the weight of the migration layer is determined according to the network level information of the migration layer in the model. For example, the weight of the migration layer may be determined based on network level information of the migration layer in the source model and/or the target model.
When the weight of the migration layer is determined according to its network level information in the model, the migration layers may be numbered sequentially and the sequence number of each migration layer used as its network level information, so that the weight of the migration layer is determined according to its sequence number.
In order to improve the training efficiency of the target model, the migration layers may be divided into network blocks in advance, where each network block contains at least one migration layer and different network blocks contain the same or different numbers of migration layers. Correspondingly, determining the weight of the migration layer according to its network level information in the model may be: determining the sequence number of the network block to which the migration layer belongs, and determining the weight of the migration layer according to that sequence number.
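The block division above can be sketched in Python. This is not from the patent: the even split and the function name are illustrative assumptions; the patent only requires that each network block contain at least one migration layer, with block sizes the same or different.

```python
def split_into_blocks(layer_names, n_blocks):
    # Evenly divide the migration layers into n_blocks consecutive network
    # blocks; when the division is not exact, earlier blocks get one extra
    # layer, so every block contains at least one migration layer.
    size, rem = divmod(len(layer_names), n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        end = start + size + (1 if i < rem else 0)
        blocks.append(layer_names[start:end])
        start = end
    return blocks

# e.g. split_into_blocks(['conv1', 'conv2', 'conv3', 'conv4', 'conv5'], 2)
# -> [['conv1', 'conv2', 'conv3'], ['conv4', 'conv5']]
```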
Illustratively, based on the weight function, determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs; wherein the weight function is a decreasing function of the sequence number of the network block.
It can be understood that by introducing the decreasing function, different weights are assigned to the migration layers contained in different network blocks. A smaller network-block sequence number indicates a shallower migration layer, which therefore receives a larger weight; this avoids damaging the general feature information extracted by shallow network layers during training of the target model. A larger network-block sequence number indicates a deeper migration layer, which therefore receives a smaller weight; this allows the network parameters of deep layers to be modified during training to adapt to the deep semantic features of the target data set in the target domain, avoiding poor accuracy of the target model caused by the loss of deep semantic information.
Illustratively, the regularization term of the migration layer may be determined according to a weighted sum of the distances between the training parameters and the initial parameters of each migration layer; and constructing an objective function according to the regularization term and the loss function of the migration layer.
In order to constrain network parameters of the fully connected layer in the training process of the target model, in an optional implementation manner of the embodiment of the application, a regularization item of the fully connected layer may be determined according to the training parameters of the fully connected layer; and constructing the objective function according to the regularization item of the migration layer, the regularization item of the full-connection layer and the loss function.
S203, training the target model comprising the initial parameters based on the objective function.
It can be understood that, because weights are introduced to differentiate the migration layers in the objective function, on the one hand all network parameters still participate in retraining, and on the other hand the importance of the features extracted at different layers is taken into account, achieving to a certain extent a compromise between fine-tuning and feature extraction. Fine-tuning initializes the target model with the network parameters of the source model in place of the original random initialization; feature extraction uses the target model as a feature extractor for the target task, freezing the earlier layers and training only the final fully connected layer.
According to the embodiment of the application, the construction of the objective function is refined into constructing the objective function according to the weights of the at least two migration layers and the distances between the training parameters associated with the at least two migration layers and the initial parameters. In this way, during training, the network parameters of migration layers at different levels are treated differently and the information-retention capacities of different migration layers are considered, so that the accuracy of transfer learning is effectively improved on the premise of preventing overfitting during migration.
Example III
Fig. 3A is a flowchart of a model migration training method in a third embodiment of the present application, where a preferred implementation manner is provided based on the technical solutions of the foregoing embodiments.
A model migration training method as shown in fig. 3A, comprising:
s301, taking network parameters of each migration layer in the source model as initial parameters of corresponding migration layers in the target model. The migration layer is an image feature extraction layer.
S302, dividing a migration layer into a plurality of network blocks.
S303, determining the weight of the migration layer according to the sequence number of the network block to which each migration layer belongs based on the weight function.
Specifically, the weight of each migration layer is determined according to the following formula:

W_i = softmax(N − i)

where W_i is the weight of every migration layer in the i-th network block, N is the total number of network blocks, and softmax(N − i) is a decreasing function of i.
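The weight formula above can be sketched as follows; the function name and the softmax normalization over the N blocks are illustrative assumptions:

```python
import math

def block_weights(n_blocks):
    # W_i = softmax(N - i) for i = 1..N: every migration layer in the
    # i-th network block shares the weight W_i, and shallower blocks
    # (smaller i) receive larger weights.
    logits = [n_blocks - i for i in range(1, n_blocks + 1)]
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With N = 3 this yields strictly decreasing weights that sum to 1, matching the requirement that the weight function decrease with the block sequence number.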
S304, determining regularization items of the migration layers according to the weights of the migration layers and the distance between the training parameters and the initial parameters of the migration layers.
Specifically, the regularization term of the migration layers is determined according to the following formula:

Ω_f = α · Σ(i=1..N) W_i · ||ω_fi − ω_fi⁰||₂²

where ω_fi is the training-parameter matrix obtained by combining the training parameters of the migration layers in the i-th network block; ω_fi⁰ is the initial-parameter matrix obtained by combining the initial parameters of the migration layers in the i-th network block; α is a hyperparameter that can be determined by a technician according to need or empirical values; and ||·||₂ is the two-norm.
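A minimal sketch of this regularization term follows; the flat-list parameter representation, the squared two-norm, and the placement of α outside the sum are assumptions not fully fixed by the patent text:

```python
def migration_regularizer(train_blocks, init_blocks, weights, alpha=0.1):
    # alpha * sum_i W_i * ||omega_fi - omega_fi0||_2^2, where each block
    # is given as a flat list of parameter values.
    reg = 0.0
    for w_i, omega, omega0 in zip(weights, train_blocks, init_blocks):
        dist_sq = sum((a - b) ** 2 for a, b in zip(omega, omega0))
        reg += w_i * dist_sq
    return alpha * reg
```

When the training parameters coincide with the source-model parameters the term vanishes, so the penalty only grows as the target model drifts away from the source model.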
S305, determining regularization items of the full-connection layer according to training parameters of the full-connection layer in the target model.
Specifically, the regularization term of the fully connected layer is determined according to the following formula:

Ω_fc = β · ||ω_fc||₂²

where ω_fc is the training parameter of the fully connected layer; β is a hyperparameter that can be determined by a technician according to need or empirical values; and ||·||₂ is the two-norm.
S306, constructing an objective function according to the regularization item of the migration layer, the regularization item of the full-connection layer and the loss function.
Specifically, the objective function is constructed according to the following formula:

θ = Loss + Ω_f + Ω_fc

where Loss is the loss function of the target model, Ω_f is the regularization term of the migration layers, Ω_fc is the regularization term of the fully connected layer, and θ is the objective function.
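The full objective can be sketched as below, combining the loss value with the two regularization terms; the function name, default hyperparameter values, and squared two-norms are illustrative assumptions:

```python
def objective(loss, train_blocks, init_blocks, weights, fc_params,
              alpha=0.1, beta=0.01):
    # theta = Loss
    #       + alpha * sum_i W_i * ||omega_fi - omega_fi0||_2^2   (migration layers)
    #       + beta  * ||omega_fc||_2^2                           (fully connected layer)
    mig = sum(
        w * sum((a - b) ** 2 for a, b in zip(omega, omega0))
        for w, omega, omega0 in zip(weights, train_blocks, init_blocks)
    )
    fc = sum(p ** 2 for p in fc_params)
    return loss + alpha * mig + beta * fc
```

Note the asymmetry: migration-layer parameters are penalized for their distance from the source model, while the fully connected layer, which has no source counterpart for the target task, gets an ordinary weight-decay penalty toward zero.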
S307, training the target network comprising the initial parameters based on the target function.
The objective function construction process shown in fig. 3B is illustrated with 3 network blocks (network block-1, network block-2, and network block-3).
The source model is trained by adopting a source training sample of a source data set in the source field; the target model is trained by using target training samples of a target data set in the target field. Wherein the number of source training samples is substantially greater than the number of target training samples; the source domain and the target domain are similar domains. The source model is a trained network model, and the target model is a network model to be trained.
Wherein, the network parameters of the network blocks in the source model are ω_f1⁰ to ω_f3⁰; the training parameters of the network blocks in the target model are ω_f1 to ω_f3; and the weights of the network blocks are W_1 to W_3.
In migration based on a source model, the distance between the network parameters learned by the target model and the network parameters of the source model is multiplied by the network-block weight W to obtain a weighted distance term, and the distance terms of the network blocks are summed to obtain the regularization term of the migration layers. The regularization term of the fully connected layer is obtained from the network parameters ω_fc of the fully connected layer of the target model; the regularization term of the migration layers and the regularization term of the fully connected layer are added to obtain the final regularization term, which is introduced into the loss function of the target model to construct the objective function.
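Because the weighted distance term is differentiable, a single gradient-descent step on the regularized objective for one network block can be sketched as below; the squared two-norm (whose gradient with respect to ω is 2αW(ω − ω⁰)) and all names are assumptions for illustration:

```python
def sgd_step(params, init_params, loss_grads, block_weight, alpha, lr):
    # One descent step for one network block: the gradient of
    # alpha * W * ||omega - omega0||_2^2 is 2 * alpha * W * (omega - omega0),
    # which pulls the parameters back toward the source-model values in
    # proportion to the block weight.
    updated = []
    for p, p0, g in zip(params, init_params, loss_grads):
        g_total = g + 2.0 * alpha * block_weight * (p - p0)
        updated.append(p - lr * g_total)
    return updated
```

With a zero loss gradient the step moves the parameters strictly toward the source model, which is the "inheritance" behavior the paragraph describes; shallower blocks, having larger weights, are pulled back harder.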
In this way, inheritance of source-model information and self-adaptation of the target model are realized. On the one hand, the training information of the general shallow features of the source model is retained as much as possible; on the other hand, the weight differentiation of the deep, task-specific features is increased. Retaining the source model to different degrees at different levels and integrating this into the loss function effectively improves the accuracy of transfer learning while preventing overfitting during migration.
Example IV
Fig. 4 is a block diagram of a model migration training apparatus in a fourth embodiment of the present application, which is applicable to a case where a source model in a source domain is migrated to a target model in a target domain and the target model is trained. The device is realized by software and/or hardware and is specifically configured in the electronic equipment.
A model migration training apparatus 400 as shown in fig. 4, comprising: an initial parameter determination module 401, an objective function construction module 402, and an objective model training module 403. Wherein,
An initial parameter determining module 401, configured to take network parameters of at least two migration layers in the source model as initial parameters of associated migration layers in the target model;
an objective function construction module 402, configured to construct an objective function according to a distance between the training parameters associated with the at least two migration layers and the initial parameters;
the object model training module 403 is configured to train an object model including initial parameters based on the object function.
According to the embodiment of the application, network parameters of at least two migration layers in a source model are used as initial parameters of associated migration layers in a target model through an initial parameter determination module; constructing an objective function according to the distance between the training parameters and the initial parameters associated with at least two migration layers by an objective function construction module; the target model including the initial parameters is trained based on the target function by a target model training module. According to the technical scheme, the distance between the training parameters of the migration layers and the initial parameters is introduced when the objective function is constructed, so that model migration and training conditions of each migration layer are considered in the model training process, inheritance of information of a source model and self-adaption of the objective model are realized, an overfitting phenomenon in the model migration training process is avoided, and generalization capability of the objective model is improved.
Further, the objective function construction module 402 includes:
and the objective function construction unit is used for constructing an objective function according to the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters.
Further, the apparatus further includes a weight determining module configured to:
before an objective function is constructed according to the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters, the weights of the migration layers are determined according to network level information of the migration layers in a model.
Further, the weight determining module includes:
a network block sequence number determining unit, configured to determine a sequence number of a network block to which the migration layer belongs;
and the migration layer weight determining unit is used for determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs.
Further, the migration layer weight determining unit includes:
a migration layer weight determining subunit, configured to determine a weight of the migration layer according to a sequence number of a network block to which the migration layer belongs, based on a weight function; wherein the weight function is a decreasing function of the sequence number of the network block.
Further, the objective function construction unit includes:
a migration layer regularization item determining subunit, configured to determine a regularization item of a migration layer according to a weighted sum of distances between training parameters and the initial parameters of each migration layer;
and the objective function construction subunit is used for constructing the objective function according to the regularization item and the loss function of the migration layer.
Further, the objective function construction unit further includes:
a full-connection layer regularization item determining subunit, configured to determine a full-connection layer regularization item according to training parameters of the full-connection layer;
correspondingly, the objective function construction subunit is configured to construct the objective function according to the regularization term of the migration layer, the regularization term of the full connection layer, and the loss function.
Further, the migration layer is an image feature extraction layer.
The model migration training device can execute the model migration training method provided by any embodiment of the application, and has the corresponding functional modules and beneficial effects of executing the model migration training method.
Example five
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 5, a block diagram of an electronic device implementing the model migration training method according to an embodiment of the present application is shown. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 5, the electronic device includes: one or more processors 501, memory 502, and interfaces for connecting components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executing within the electronic device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 501 is illustrated in fig. 5.
Memory 502 is a non-transitory computer readable storage medium provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the model migration training method provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the model migration training method provided by the present application.
The memory 502, which is a non-transitory computer readable storage medium, may be used to store a non-transitory software program, a non-transitory computer executable program, and modules, such as program instructions/modules corresponding to the model migration training method in the embodiments of the present application (e.g., the initial parameter determination module 401, the objective function construction module 402, and the objective model training module 403 shown in fig. 4). The processor 501 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 502, i.e., implements the model migration training method in the method embodiments described above.
Memory 502 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created by use of an electronic device implementing the model migration training method, and the like. In addition, memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 502 may optionally include memory located remotely from processor 501, which may be connected via a network to an electronic device implementing the model migration training method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for implementing the model migration training method may further include: an input device 503 and an output device 504. The processor 501, memory 502, input devices 503 and output devices 504 may be connected by a bus or otherwise, for example in fig. 5.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device implementing the model migration training method, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, one or more mouse buttons, a track ball, a joystick, and the like. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the embodiment of the application, network parameters of at least two migration layers in a source model are used as initial parameters of associated migration layers in a target model; constructing an objective function according to the distance between the training parameters and the initial parameters associated with at least two migration layers; the object model including the initial parameters is trained based on the object function. According to the technical scheme, the distance between the training parameters of the migration layers and the initial parameters is introduced when the objective function is constructed, so that model migration and training conditions of each migration layer are considered in the model training process, inheritance of information of a source model and self-adaption of the objective model are realized, an overfitting phenomenon in the model migration training process is avoided, and generalization capability of the objective model is improved.
It should be appreciated that various forms of the flows shown above may be used to re-order, add or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, and are not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (16)

1. A model migration training method, comprising:
taking network parameters of at least two migration layers in the source model as initial parameters of associated migration layers in the target model; the source model or the target model is at least one of image classification, image segmentation and image detection networks, and the migration layer comprises an image feature extraction layer correspondingly; the source model is a stable network model after the source training sample is successfully trained in the source field, and the target model is a model to be trained in a target field similar to the source field;
Constructing an objective function according to the distance between the training parameters associated with the at least two migration layers and the initial parameters;
training a target model comprising initial parameters based on the target function;
wherein said constructing an objective function based on a distance between the training parameters associated with the at least two migration layers and the initial parameters comprises: determining regularization terms of the migration layers according to the distances between the training parameters and the initial parameters of the at least two migration layers; and constructing an objective function according to the regularization terms of the migration layers and a loss function, wherein the regularization terms are used to measure the complexity of the model, and the distance between the training parameters and the initial parameters of the at least two migration layers is computed by solving a norm.
2. The method of claim 1, wherein constructing an objective function based on a distance between training parameters associated with the at least two migration layers and the initial parameters comprises:
and constructing an objective function according to the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters.
3. The method of claim 2, wherein constructing an objective function based on the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters, further comprises:
And determining the weight of the migration layer according to the network level information of the migration layer in the model.
4. A method according to claim 3, wherein determining the weight of the migration layer based on the network level information of the migration layer in the model comprises:
determining the sequence number of the network block to which the migration layer belongs;
and determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs.
5. The method of claim 4, wherein determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs comprises:
based on a weight function, determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs; wherein the weight function is a decreasing function of the sequence number of the network block.
6. The method of claim 2, wherein constructing an objective function based on the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters comprises:
determining regularization items of the migration layers according to weighted sums of distances between training parameters and the initial parameters of the migration layers;
And constructing the objective function according to the regularization item and the loss function of the migration layer.
7. The method of claim 6, wherein constructing the objective function from regularized items and loss functions of the migration layer comprises:
determining regularization items of the full-connection layer according to training parameters of the full-connection layer;
and constructing the objective function according to the regularization item of the migration layer, the regularization item of the full-connection layer and the loss function.
8. A model migration training apparatus, comprising:
the initial parameter determining module is used for taking network parameters of at least two migration layers in the source model as initial parameters of associated migration layers in the target model; the source model or the target model is at least one of image classification, image segmentation and image detection networks, and the migration layer comprises an image feature extraction layer correspondingly; the source model is a stable network model after the source training sample is successfully trained in the source field, and the target model is a model to be trained in a target field similar to the source field;
the objective function construction module is specifically configured to determine regularization terms of the migration layers according to the distances between the training parameters and the initial parameters of the at least two migration layers, and to construct an objective function according to the regularization terms of the migration layers and a loss function, wherein the regularization terms are used to measure the complexity of the model, and the distance between the training parameters and the initial parameters of the at least two migration layers is computed by solving a norm;
And the target model training module is used for training a target model comprising initial parameters based on the target function.
9. The apparatus of claim 8, wherein the objective function construction module comprises:
and the objective function construction unit is used for constructing an objective function according to the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters.
10. The apparatus of claim 9, wherein the apparatus further comprises:
the weight determining module is used for determining the weight of the migration layer according to the network level information of the migration layer in the model before constructing an objective function according to the weight of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters.
11. The apparatus of claim 10, wherein the weight determination module comprises:
a network block sequence number determining unit, configured to determine a sequence number of a network block to which the migration layer belongs;
and the migration layer weight determining unit is used for determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs.
12. The apparatus according to claim 11, wherein the migration layer weight determining unit comprises:
a migration layer weight determining subunit, configured to determine a weight of the migration layer according to a sequence number of a network block to which the migration layer belongs, based on a weight function; wherein the weight function is a decreasing function of the sequence number of the network block.
13. The apparatus according to claim 9, wherein the objective function construction unit includes:
a migration layer regularization item determining subunit, configured to determine a regularization item of a migration layer according to a weighted sum of distances between training parameters and the initial parameters of each migration layer;
and the objective function construction subunit is used for constructing the objective function according to the regularization item and the loss function of the migration layer.
14. The apparatus according to claim 13, wherein the objective function construction unit further comprises:
a full-connection layer regularization item determining subunit, configured to determine a full-connection layer regularization item according to training parameters of the full-connection layer;
correspondingly, the objective function construction subunit is configured to construct the objective function according to the regularization term of the migration layer, the regularization term of the full connection layer, and the loss function.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a model migration training method according to any one of claims 1-7.
16. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a model migration training method according to any one of claims 1-7.
CN202010219449.2A 2020-03-25 2020-03-25 Model migration training method, device, equipment and storage medium Active CN111444958B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010219449.2A CN111444958B (en) 2020-03-25 2020-03-25 Model migration training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010219449.2A CN111444958B (en) 2020-03-25 2020-03-25 Model migration training method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111444958A CN111444958A (en) 2020-07-24
CN111444958B true CN111444958B (en) 2024-02-13

Family

ID=71648761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010219449.2A Active CN111444958B (en) 2020-03-25 2020-03-25 Model migration training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111444958B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114065903A (en) * 2020-08-07 2022-02-18 深圳先进技术研究院 Parameter reuse method and device of deep learning model, terminal and storage medium
CN112016622A (en) * 2020-08-28 2020-12-01 中移(杭州)信息技术有限公司 Method, electronic device, and computer-readable storage medium for model training
CN115396831A (en) * 2021-05-08 2022-11-25 中国移动通信集团浙江有限公司 Interaction model generation method, device, equipment and storage medium
CN113283481A (en) * 2021-05-14 2021-08-20 群智未来人工智能科技研究院(无锡)有限公司 Intelligent membrane pollution decision-making method based on knowledge type-two fuzzy
CN113222976B (en) * 2021-05-31 2022-08-05 河海大学 Space-time image texture direction detection method and system based on DCNN and transfer learning
CN114005015B (en) * 2021-12-28 2022-05-31 合肥的卢深视科技有限公司 Training method of image recognition model, electronic device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832711A (en) * 2017-11-13 2018-03-23 常州大学 A kind of recognition methods again of the pedestrian based on transfer learning
WO2018227800A1 (en) * 2017-06-15 2018-12-20 北京图森未来科技有限公司 Neural network training method and device
CN109902798A (en) * 2018-05-31 2019-06-18 华为技术有限公司 The training method and device of deep neural network
CN110580496A (en) * 2019-07-11 2019-12-17 南京邮电大学 Deep migration learning system and method based on entropy minimization
CN110647920A (en) * 2019-08-29 2020-01-03 北京百度网讯科技有限公司 Transfer learning method and device in machine learning, equipment and readable medium
CN110782883A (en) * 2019-11-12 2020-02-11 百度在线网络技术(北京)有限公司 Model training method and device, electronic equipment and storage medium
WO2020035075A1 (en) * 2018-08-17 2020-02-20 第四范式(北京)技术有限公司 Method and system for carrying out machine learning under data privacy protection

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018227800A1 (en) * 2017-06-15 2018-12-20 北京图森未来科技有限公司 Neural network training method and device
CN107832711A (en) * 2017-11-13 2018-03-23 常州大学 A kind of recognition methods again of the pedestrian based on transfer learning
CN109902798A (en) * 2018-05-31 2019-06-18 华为技术有限公司 The training method and device of deep neural network
WO2020035075A1 (en) * 2018-08-17 2020-02-20 第四范式(北京)技术有限公司 Method and system for carrying out machine learning under data privacy protection
CN110580496A (en) * 2019-07-11 2019-12-17 南京邮电大学 Deep migration learning system and method based on entropy minimization
CN110647920A (en) * 2019-08-29 2020-01-03 北京百度网讯科技有限公司 Transfer learning method and device in machine learning, equipment and readable medium
CN110782883A (en) * 2019-11-12 2020-02-11 百度在线网络技术(北京)有限公司 Model training method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
General table of contents for 2009 (Vol. 30), Computer Engineering and Design; Computer Engineering and Design (No. 24); all pages *
Cross-domain image classification based on deep sparse discrimination; Yang Hanfang; Zhou Xiangdong; Computer Engineering (No. 04); all pages *

Also Published As

Publication number Publication date
CN111444958A (en) 2020-07-24

Similar Documents

Publication Publication Date Title
CN111444958B (en) Model migration training method, device, equipment and storage medium
CN111539223B (en) Language model training method and device, electronic equipment and readable storage medium
CN110909877B (en) Neural network model structure searching method and device, electronic equipment and storage medium
US11914964B2 (en) Method and apparatus for training semantic representation model, device and computer storage medium
US11556715B2 (en) Method for training language model based on various word vectors, device and medium
US11521370B2 (en) Point cloud data processing method, apparatus, electronic device and computer readable storage medium
CN111738414B (en) Recommendation model generation method, content recommendation method, device, equipment and medium
CN111737994A (en) Method, device and equipment for obtaining word vector based on language model and storage medium
US20210374343A1 (en) Method and apparatus for obtaining word vectors based on language model, device and storage medium
CN112270711B (en) Model training and posture prediction method, device, equipment and storage medium
CN111783601B (en) Training method and device of face recognition model, electronic equipment and storage medium
CN111582452B (en) Method and device for generating neural network model
CN112149741B (en) Training method and device for image recognition model, electronic equipment and storage medium
CN108205581A (en) The compact video features generated in digital media environment represent
CN112529180B (en) Method and apparatus for model distillation
EP3910526A1 (en) Method, apparatus, electronic device and storage medium for training semantic similarity model
CN112329453B (en) Method, device, equipment and storage medium for generating sample chapter
CN111582477A (en) Training method and device of neural network model
CN111966361A (en) Method, device and equipment for determining model to be deployed and storage medium thereof
CN114386503A (en) Method and apparatus for training a model
CN112101552A (en) Method, apparatus, device and storage medium for training a model
CN112215243A (en) Image feature extraction method, device, equipment and storage medium
CN116245749A (en) Image generation method and device
CN112529181B (en) Method and apparatus for model distillation
CN111738325B (en) Image recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant