CN111444958B - Model migration training method, device, equipment and storage medium - Google Patents


Info

Publication number
CN111444958B
CN111444958B (Application CN202010219449.2A)
Authority
CN
China
Prior art keywords
migration
model
training
layer
layers
Prior art date
Legal status
Active
Application number
CN202010219449.2A
Other languages
Chinese (zh)
Other versions
CN111444958A (en)
Inventor
卢阳
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010219449.2A
Publication of CN111444958A
Application granted
Publication of CN111444958B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning


Abstract

The application discloses a model migration training method, device, equipment and storage medium, and relates to the field of artificial intelligence. The specific implementation scheme is as follows: taking network parameters of at least two migration layers in a source model as initial parameters of the associated migration layers in a target model; constructing an objective function according to the distances between the training parameters associated with the at least two migration layers and the initial parameters; and training the target model comprising the initial parameters based on the objective function. By introducing the distance between each migration layer's training parameters and its initial parameters when constructing the objective function, both the migration and the training state of each migration layer are considered during model training, so that the target model inherits information from the source model while adapting itself to the target domain; overfitting during migration training is avoided, and the generalization capability of the target model is improved.

Description

Model migration training method, device, equipment and storage medium
Technical Field
The present application relates to computer technologies, and in particular, to the field of artificial intelligence, and in particular, to a model migration training method, apparatus, device, and storage medium.
Background
Transfer learning exploits the similarity among data, tasks, or models to apply a source model trained in a source domain (i.e., the old domain) to a target model in a target domain (i.e., the new domain), thereby reducing the demand for massive data resources and avoiding the high cost of training the target model from scratch.
In the prior art, when training the target model, the network parameters of the target model are initialized with the network parameters of the source model in place of random initialization, and the initialized target model is then retrained.
However, a target model trained in this manner has poor generalization capability and is prone to overfitting.
Disclosure of Invention
The embodiments of the present application provide a model migration training method, device, equipment and storage medium, so as to improve the generalization capability of a target model and avoid overfitting.
In a first aspect, an embodiment of the present application provides a model migration training method, including:
taking network parameters of at least two migration layers in the source model as initial parameters of associated migration layers in the target model;
constructing an objective function according to the distance between the training parameters associated with the at least two migration layers and the initial parameters;
and training the target model comprising the initial parameters based on the objective function.
According to the embodiment of the application, network parameters of at least two migration layers in a source model are used as initial parameters of the associated migration layers in a target model; an objective function is constructed according to the distances between the training parameters associated with the at least two migration layers and the initial parameters; and the target model comprising the initial parameters is trained based on the objective function. In this technical scheme, the distance between each migration layer's training parameters and its initial parameters is introduced when constructing the objective function, so that both the migration and the training state of each migration layer are considered during model training; the target model thus inherits information from the source model while adapting itself to the target domain, overfitting during migration training is avoided, and the generalization capability of the target model is improved.
Optionally, constructing an objective function according to a distance between the training parameters associated with the at least two migration layers and the initial parameters, including:
and constructing an objective function according to the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters.
In an optional implementation manner of the above application, the weights of the migration layers are introduced in the construction of the objective function, so that the information-retention capacities of different migration layers are considered at the network level, and the accuracy of transfer learning is effectively improved on the premise of preventing overfitting during migration.
Optionally, before constructing the objective function according to the weights of the at least two migration layers and the distances between the training parameters associated with the at least two migration layers and the initial parameters, the method further includes:
and determining the weight of the migration layer according to the network level information of the migration layer in the model.
According to an optional implementation manner of the application, before the objective function is constructed, the weight of each migration layer is determined according to the migration layer's network level information in the model. This perfects the mechanism for determining migration-layer weights, avoids destroying shallow feature information through overly strong learning during training, and avoids losing deep feature information through overly weak learning during training.
Optionally, determining the weight of the migration layer according to the network level information of the migration layer in the model includes:
Determining the sequence number of the network block to which the migration layer belongs;
and determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs.
In an optional implementation manner of the above application, the process of determining the weight of a migration layer is refined into determining the sequence number of the network block to which the migration layer belongs, and then determining the weight of the migration layer according to that sequence number. This perfects the weight-determination mechanism of the migration layer, ensures the extraction of both shallow general information and deep semantic information, and takes into account the importance of information extracted at different levels.
Optionally, determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs includes:
based on a weight function, determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs; wherein the weight function is a decreasing function of the sequence number of the network block.
In an optional implementation manner in the above application, the weight determining process of the migration layer is refined into a decreasing function based on the sequence number of the network block, and the weight of the migration layer is determined according to the sequence number of the network block to which the migration layer belongs, so that the weight determining mechanism of the migration layer is further perfected.
Optionally, constructing an objective function according to the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters, including:
determining regularization items of the migration layers according to weighted sums of distances between training parameters and the initial parameters of the migration layers;
and constructing the objective function according to the regularization item and the loss function of the migration layer.
In an optional implementation manner in the above application, the construction process of the objective function is refined to construct the objective function according to the regularization term and the loss function of the migration layer, so that the construction mechanism of the objective function is perfected.
Optionally, constructing the objective function according to the regularization term and the loss function of the migration layer includes:
determining regularization items of the full-connection layer according to training parameters of the full-connection layer;
and constructing the objective function according to the regularization item of the migration layer, the regularization item of the full-connection layer and the loss function.
According to an optional implementation manner of the application, a regularization term of the fully connected layer is introduced in the process of constructing the objective function, so that poor training results caused by constraining the fully connected layer toward the source model can be avoided, and the target model can match the target data set of the target domain as closely as possible during training.
Optionally, the migration layer is an image feature extraction layer.
In an optional implementation manner in the above application, the migration layer is refined into the image feature extraction layer, so that the model migration training method is applied to the migration process of the network model including the image feature extraction class, and the application scene of the model migration training method is enriched.
In a second aspect, an embodiment of the present application further provides a model migration training apparatus, including:
the initial parameter determining module is used for taking network parameters of at least two migration layers in the source model as initial parameters of associated migration layers in the target model;
the objective function construction module is used for constructing an objective function according to the distance between the training parameters associated with the at least two migration layers and the initial parameters;
and the target model training module is used for training a target model comprising initial parameters based on the target function.
In a third aspect, an embodiment of the present application further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a model migration training method as provided by embodiments of the first aspect.
In a fourth aspect, embodiments of the present application further provide a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform a model migration training method provided by the embodiments of the first aspect.
Other effects of the above alternatives will be described below in conjunction with specific embodiments.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a flow chart of a model migration training method in accordance with a first embodiment of the present application;
FIG. 2 is a flow chart of a model migration training method in a second embodiment of the present application;
FIG. 3A is a flow chart of a model migration training method in a third embodiment of the present application;
FIG. 3B is a schematic diagram of an objective function construction process in a third embodiment of the present application;
FIG. 4 is a block diagram of a model migration training apparatus according to a fourth embodiment of the present application;
FIG. 5 is a block diagram of an electronic device for implementing a model migration training method of an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Example 1
Fig. 1 is a flowchart of a model migration training method in a first embodiment of the present application, which is applicable to a case of migrating a source model in a source domain to a target model in a target domain and training the target model. The method is executed by a model migration training device which is realized by software and/or hardware and is specifically configured in the electronic equipment.
A model migration training method as shown in fig. 1, comprising:
s101, taking network parameters of at least two migration layers in a source model as initial parameters of associated migration layers in a target model.
The source model is understood to be a stable network model after being successfully trained by a large number of source training samples in the source field. The target model may be understood as a model to be trained in a target domain similar to the source domain, and the number of target training samples in the model is far smaller than the number of source training samples when training the source model.
In order to avoid overfitting of the target model caused by training with only a small number of target training samples, the network parameters of the already-trained source model can be migrated into the target model as its initial parameters, and the target model containing the initial parameters is then trained.
The migration layer may be understood as a network layer capable of migrating from a source model to a target model. The migration layer is part or all of the network layers in the source model. Illustratively, the migration layer includes at least a convolution layer. The convolution layer may be a feature extraction layer. For example, the source model or the target model may be at least one of an image classification, an image segmentation, and an image detection class network, and the migration layer may include an image feature extraction layer, respectively.
S102, constructing an objective function according to the distance between the training parameters associated with the at least two migration layers and the initial parameters.
The training parameters are network parameters corresponding to each migration layer when training the target model.
The distance between the training parameters and the initial parameters associated with each migration layer may be computed as a norm. The order of the norm can be determined by a technician according to need or empirical values; illustratively, it may be the zero-norm, the one-norm, or the two-norm.
In an optional implementation manner of the embodiment of the present application, the constructing the objective function according to the distance between the training parameters associated with the at least two migration layers and the initial parameters may be: determining regularization items of the migration layers according to the distances between training parameters and initial parameters of at least two migration layers; and constructing an objective function according to the regularization term and the loss function of the migration layer.
The loss function is used for measuring the degree of inconsistency between the predicted value and the true value of the model and measuring the robustness of the model so as to reduce the experience risk of the model. The smaller the loss function, the better the robustness of the characterization model. Illustratively, the loss function may include at least one of a hinge loss function, a cross entropy loss function, a square loss function, an exponential loss function, and an absolute value loss function, among others.
Wherein the regularization term is used to gauge the complexity of the model, thereby reducing the structural risk of the model.
Illustratively, the objective function may be constructed using the following formula:
θ = Loss + α·‖ω_f − ω_f⁰‖₂²
where θ is the objective function; ω_f is the training parameter matrix formed by combining the training parameters of each migration layer; ω_f⁰ is the initial parameter matrix formed by combining the initial parameters of each migration layer; α is a hyperparameter, which can be determined by a technician according to need or empirical values; and ‖·‖₂ is the two-norm.
It can be appreciated that by introducing regularization terms of the migration layer, features of the source model can be retained in the target model as much as possible, so that constraint control is performed on the target model in the process of training the target model.
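A minimal Python sketch (illustrative, not part of the patent) of the objective above, assuming the migration-layer parameters are flattened into plain lists; the function names are hypothetical:

```python
def sp_regularization(train_params, init_params, alpha):
    # alpha * ||w_f - w_f0||_2^2: squared two-norm distance between the
    # migration layers' training parameters and the initial parameters
    # migrated from the source model
    return alpha * sum((w - w0) ** 2 for w, w0 in zip(train_params, init_params))

def objective(loss, train_params, init_params, alpha):
    # theta = Loss + alpha * ||w_f - w_f0||_2^2
    return loss + sp_regularization(train_params, init_params, alpha)

# e.g. loss 1.0, parameters drifted to [1.0, 2.0] from initial [0.0, 0.0]:
# objective(1.0, [1.0, 2.0], [0.0, 0.0], 0.1) -> 1.0 + 0.1 * 5.0 = 1.5
```

The regularization term grows as training parameters drift away from the migrated initial values, which is exactly the constraint control described above.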
The fully connected layer is usually the last layer in a network model. Because different network models serve different roles, the network parameters of their fully connected layers vary considerably; therefore, in order to let the target model fit the target data set of the corresponding target domain as closely as possible, the migration layers usually do not include the fully connected layer.
In order to constrain the network parameters of the fully connected layer during the training process of the objective model, in an optional implementation manner of the embodiment of the present application, regularization terms of the fully connected layer may also be introduced into the objective function. Correspondingly, when an objective function is constructed according to the regularization term and the loss function of the migration layer, the objective function is constructed according to the regularization term of the migration layer, the regularization term of the full connection layer and the loss function. Illustratively, the regularization term of the fully connected layer may be determined from training parameters of the fully connected layer of the target model.
By way of example, the objective function may be constructed using the following formula:
θ = Loss + α·‖ω_f − ω_f⁰‖₂² + β·‖ω_fc‖₂²
where β is a hyperparameter and ω_fc is the training parameter of the fully connected layer of the target model.
Because different migration layers differ in their ability to extract information, the meaning of the extracted information also differs; for example, shallow migration layers extract general feature information, while deep migration layers extract semantic feature information. In order to account for the information extraction capability of different migration layers and further ensure the comprehensiveness of the information extracted in the target model, in another optional implementation manner of the embodiment of the present application, constructing the objective function according to the distance between the training parameters associated with the at least two migration layers and the initial parameters may be: constructing the objective function according to the weights of the at least two migration layers and the distances between the training parameters associated with the at least two migration layers and the initial parameters. The weight of each migration layer can be determined by a technician according to need or empirical values, or determined through repeated experiments.
S103, training the target model comprising the initial parameters based on the objective function.
The target model comprising the initial parameters is trained based on the objective function until the trained target model meets the set training conditions.
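The effect of training against such an objective can be shown numerically. The toy example below is not from the patent: it uses a hypothetical one-parameter linear "model" and runs gradient descent on θ = Loss + α·(w − w⁰)², where w⁰ is the parameter migrated from the source model. The loss term pulls the parameter toward the target data, while the distance term pulls it back toward the migrated initial value, so the trained parameter settles in between.

```python
def train(w0, xs, ys, alpha=0.1, lr=0.05, steps=200):
    w = w0  # initialize the target model with the migrated source parameter
    for _ in range(steps):
        # gradient of the mean-squared-error loss of the predictor w * x
        grad_loss = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # gradient of the distance term alpha * (w - w0)^2
        grad_reg = 2 * alpha * (w - w0)
        w -= lr * (grad_loss + grad_reg)
    return w

# With data generated by slope 2.0 and a migrated initial value of 1.0,
# the result settles between w0 = 1.0 and the data-optimal slope 2.0.
```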
According to the embodiment of the application, network parameters of at least two migration layers in a source model are used as initial parameters of the associated migration layers in a target model; an objective function is constructed according to the distances between the training parameters associated with the at least two migration layers and the initial parameters; and the target model comprising the initial parameters is trained based on the objective function. In this technical scheme, the distance between each migration layer's training parameters and its initial parameters is introduced when constructing the objective function, so that both the migration and the training state of each migration layer are considered during model training; the target model thus inherits information from the source model while adapting itself to the target domain, overfitting during migration training is avoided, and the generalization capability of the target model is improved.
Example two
Fig. 2 is a flowchart of a model migration training method in a second embodiment of the present application, where the embodiment of the present application performs optimization and improvement based on the technical solutions of the foregoing embodiments.
Further, the operation of constructing an objective function according to the distance between the training parameters associated with the at least two migration layers and the initial parameters is refined into constructing the objective function according to the weights of the at least two migration layers and the distances between the training parameters associated with the at least two migration layers and the initial parameters, so as to perfect the construction mechanism of the objective function.
A model migration training method as shown in fig. 2, comprising:
s201, taking network parameters of at least two migration layers in the source model as initial parameters of associated migration layers in the target model.
S202, constructing an objective function according to the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters.
The weight of each migration layer can be determined by a technician according to the needs or experience values, and can also be repeatedly determined through a large number of experiments.
Optionally, the weight of the migration layer is determined according to the network level information of the migration layer in the model. For example, the weight of the migration layer may be determined based on network level information of the migration layer in the source model and/or the target model.
When the weight of the migration layer is determined according to its network level information in the model, the migration layers may be numbered sequentially and the sequence number of each migration layer used as its network level information, so that the weight of the migration layer is determined according to its sequence number.
In order to improve the training efficiency of the target model, the migration layers may be divided into network blocks in advance, where each network block contains at least one migration layer and different network blocks contain the same or different numbers of migration layers. Correspondingly, determining the weight of the migration layer according to its network level information in the model may be: determining the sequence number of the network block to which the migration layer belongs, and determining the weight of the migration layer according to that sequence number.
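The block division above can be sketched in Python. This is not from the patent: the even split and the function name are illustrative assumptions; the patent only requires that each network block contain at least one migration layer, with block sizes the same or different.

```python
def split_into_blocks(layer_names, n_blocks):
    # Evenly divide the migration layers into n_blocks consecutive network
    # blocks; when the division is not exact, earlier blocks get one extra
    # layer, so every block contains at least one migration layer.
    size, rem = divmod(len(layer_names), n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        end = start + size + (1 if i < rem else 0)
        blocks.append(layer_names[start:end])
        start = end
    return blocks

# e.g. split_into_blocks(['conv1', 'conv2', 'conv3', 'conv4', 'conv5'], 2)
# -> [['conv1', 'conv2', 'conv3'], ['conv4', 'conv5']]
```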
Illustratively, based on the weight function, determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs; wherein the weight function is a decreasing function of the sequence number of the network block.
It can be understood that by introducing the decreasing function, different weights are assigned to the migration layers contained in different network blocks. A smaller network-block sequence number indicates a shallower migration layer, which therefore receives a larger weight; this avoids damaging the general feature information extracted by shallow network layers during training of the target model. A larger network-block sequence number indicates a deeper migration layer, which therefore receives a smaller weight; this allows the network parameters of deep layers to be modified during training to adapt to the deep semantic features of the target data set in the target domain, avoiding poor accuracy of the target model caused by the loss of deep semantic information.
Illustratively, the regularization term of the migration layer may be determined according to a weighted sum of the distances between the training parameters and the initial parameters of each migration layer; and constructing an objective function according to the regularization term and the loss function of the migration layer.
In order to constrain network parameters of the fully connected layer in the training process of the target model, in an optional implementation manner of the embodiment of the application, a regularization item of the fully connected layer may be determined according to the training parameters of the fully connected layer; and constructing the objective function according to the regularization item of the migration layer, the regularization item of the full-connection layer and the loss function.
S203, training the target model comprising the initial parameters based on the objective function.
It can be understood that, because weights are introduced to differentiate the migration layers in the objective function, on the one hand all network parameters still participate in retraining, and on the other hand the importance of the features extracted at different layers is taken into account, achieving to a certain extent a compromise between fine-tuning and feature extraction. Fine-tuning initializes the target model with the network parameters of the source model in place of the original random initialization; feature extraction uses the target model as a feature extractor for the target task, freezing the earlier layers and training only the final fully connected layer.
According to the embodiment of the application, the construction of the objective function is refined into constructing the objective function according to the weights of the at least two migration layers and the distances between the training parameters associated with the at least two migration layers and the initial parameters. In this way, during training, the network parameters of migration layers at different levels are treated differently and the information-retention capacities of different migration layers are considered, so that the accuracy of transfer learning is effectively improved on the premise of preventing overfitting during migration.
Example III
Fig. 3A is a flowchart of a model migration training method in a third embodiment of the present application, where a preferred implementation manner is provided based on the technical solutions of the foregoing embodiments.
A model migration training method as shown in fig. 3A, comprising:
s301, taking network parameters of each migration layer in the source model as initial parameters of corresponding migration layers in the target model. The migration layer is an image feature extraction layer.
S302, dividing a migration layer into a plurality of network blocks.
S303, determining the weight of the migration layer according to the sequence number of the network block to which each migration layer belongs based on the weight function.
Specifically, the weight of each migration layer is determined according to the following formula:

W_i = softmax(N − i)

where W_i is the weight of every migration layer in the i-th network block, N is the total number of network blocks, and softmax(N − i) is a decreasing function of i.
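The weight formula above can be sketched as follows; the function name and the softmax normalization over the N blocks are illustrative assumptions:

```python
import math

def block_weights(n_blocks):
    # W_i = softmax(N - i) for i = 1..N: every migration layer in the
    # i-th network block shares the weight W_i, and shallower blocks
    # (smaller i) receive larger weights.
    logits = [n_blocks - i for i in range(1, n_blocks + 1)]
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With N = 3 this yields strictly decreasing weights that sum to 1, matching the requirement that the weight function decrease with the block sequence number.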
S304, determining regularization items of the migration layers according to the weights of the migration layers and the distance between the training parameters and the initial parameters of the migration layers.
Specifically, the regularization term of the migration layers is determined according to the following formula:

Ω_f = α · Σ(i=1..N) W_i · ||ω_fi − ω_fi⁰||₂²

where ω_fi is the training-parameter matrix obtained by combining the training parameters of the migration layers in the i-th network block; ω_fi⁰ is the initial-parameter matrix obtained by combining the initial parameters of the migration layers in the i-th network block; α is a hyperparameter that can be determined by a technician according to need or empirical values; and ||·||₂ is the two-norm.
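A minimal sketch of this regularization term follows; the flat-list parameter representation, the squared two-norm, and the placement of α outside the sum are assumptions not fully fixed by the patent text:

```python
def migration_regularizer(train_blocks, init_blocks, weights, alpha=0.1):
    # alpha * sum_i W_i * ||omega_fi - omega_fi0||_2^2, where each block
    # is given as a flat list of parameter values.
    reg = 0.0
    for w_i, omega, omega0 in zip(weights, train_blocks, init_blocks):
        dist_sq = sum((a - b) ** 2 for a, b in zip(omega, omega0))
        reg += w_i * dist_sq
    return alpha * reg
```

When the training parameters coincide with the source-model parameters the term vanishes, so the penalty only grows as the target model drifts away from the source model.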
S305, determining regularization items of the full-connection layer according to training parameters of the full-connection layer in the target model.
Specifically, the regularization term of the fully connected layer is determined according to the following formula:

Ω_fc = β · ||ω_fc||₂²

where ω_fc is the training parameter of the fully connected layer; β is a hyperparameter that can be determined by a technician according to need or empirical values; and ||·||₂ is the two-norm.
S306, constructing an objective function according to the regularization item of the migration layer, the regularization item of the full-connection layer and the loss function.
Specifically, the objective function is constructed according to the following formula:

θ = Loss + Ω_f + Ω_fc

where Loss is the loss function of the target model, Ω_f is the regularization term of the migration layers, Ω_fc is the regularization term of the fully connected layer, and θ is the objective function.
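The full objective can be sketched as below, combining the loss value with the two regularization terms; the function name, default hyperparameter values, and squared two-norms are illustrative assumptions:

```python
def objective(loss, train_blocks, init_blocks, weights, fc_params,
              alpha=0.1, beta=0.01):
    # theta = Loss
    #       + alpha * sum_i W_i * ||omega_fi - omega_fi0||_2^2   (migration layers)
    #       + beta  * ||omega_fc||_2^2                           (fully connected layer)
    mig = sum(
        w * sum((a - b) ** 2 for a, b in zip(omega, omega0))
        for w, omega, omega0 in zip(weights, train_blocks, init_blocks)
    )
    fc = sum(p ** 2 for p in fc_params)
    return loss + alpha * mig + beta * fc
```

Note the asymmetry: migration-layer parameters are penalized for their distance from the source model, while the fully connected layer, which has no source counterpart for the target task, gets an ordinary weight-decay penalty toward zero.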
S307, training the target network comprising the initial parameters based on the target function.
The objective function construction process shown in fig. 3B is illustrated with 3 network blocks (network block-1, network block-2, and network block-3).
The source model is trained by adopting a source training sample of a source data set in the source field; the target model is trained by using target training samples of a target data set in the target field. Wherein the number of source training samples is substantially greater than the number of target training samples; the source domain and the target domain are similar domains. The source model is a trained network model, and the target model is a network model to be trained.
Wherein, the network parameters of the network blocks in the source model are ω_f1⁰ to ω_f3⁰; the training parameters of the network blocks in the target model are ω_f1 to ω_f3; and the weights of the network blocks are W_1 to W_3.
In migration based on a source model, the distance between the network parameters learned by the target model and the network parameters of the source model is multiplied by the network-block weight W to obtain a weighted distance term, and the distance terms of the network blocks are summed to obtain the regularization term of the migration layers. The regularization term of the fully connected layer is obtained from the network parameters ω_fc of the fully connected layer of the target model; the regularization term of the migration layers and the regularization term of the fully connected layer are added to obtain the final regularization term, which is introduced into the loss function of the target model to construct the objective function.
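Because the weighted distance term is differentiable, a single gradient-descent step on the regularized objective for one network block can be sketched as below; the squared two-norm (whose gradient with respect to ω is 2αW(ω − ω⁰)) and all names are assumptions for illustration:

```python
def sgd_step(params, init_params, loss_grads, block_weight, alpha, lr):
    # One descent step for one network block: the gradient of
    # alpha * W * ||omega - omega0||_2^2 is 2 * alpha * W * (omega - omega0),
    # which pulls the parameters back toward the source-model values in
    # proportion to the block weight.
    updated = []
    for p, p0, g in zip(params, init_params, loss_grads):
        g_total = g + 2.0 * alpha * block_weight * (p - p0)
        updated.append(p - lr * g_total)
    return updated
```

With a zero loss gradient the step moves the parameters strictly toward the source model, which is the "inheritance" behavior the paragraph describes; shallower blocks, having larger weights, are pulled back harder.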
In this way, inheritance of source-model information and self-adaptation of the target model are realized. On the one hand, the training information of the general shallow features of the source model is retained as much as possible; on the other hand, the weight differentiation of the deep, task-specific features is increased. Retaining the source model to different degrees at different levels and integrating this into the loss function effectively improves the accuracy of transfer learning while preventing overfitting during migration.
Example IV
Fig. 4 is a block diagram of a model migration training apparatus in a fourth embodiment of the present application, which is applicable to a case where a source model in a source domain is migrated to a target model in a target domain and the target model is trained. The device is realized by software and/or hardware and is specifically configured in the electronic equipment.
A model migration training apparatus 400 as shown in fig. 4, comprising: an initial parameter determination module 401, an objective function construction module 402, and an objective model training module 403. Wherein,
An initial parameter determining module 401, configured to take network parameters of at least two migration layers in the source model as initial parameters of associated migration layers in the target model;
an objective function construction module 402, configured to construct an objective function according to a distance between the training parameters associated with the at least two migration layers and the initial parameters;
the object model training module 403 is configured to train an object model including initial parameters based on the object function.
According to the embodiment of the application, network parameters of at least two migration layers in a source model are used as initial parameters of associated migration layers in a target model through an initial parameter determination module; constructing an objective function according to the distance between the training parameters and the initial parameters associated with at least two migration layers by an objective function construction module; the target model including the initial parameters is trained based on the target function by a target model training module. According to the technical scheme, the distance between the training parameters of the migration layers and the initial parameters is introduced when the objective function is constructed, so that model migration and training conditions of each migration layer are considered in the model training process, inheritance of information of a source model and self-adaption of the objective model are realized, an overfitting phenomenon in the model migration training process is avoided, and generalization capability of the objective model is improved.
Further, the objective function construction module 402 includes:
and the objective function construction unit is used for constructing an objective function according to the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters.
Further, the apparatus further includes a weight determining module configured to:
before an objective function is constructed according to the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters, the weights of the migration layers are determined according to network level information of the migration layers in a model.
Further, the weight determining module includes:
a network block sequence number determining unit, configured to determine a sequence number of a network block to which the migration layer belongs;
and the migration layer weight determining unit is used for determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs.
Further, the migration layer weight determining unit includes:
a migration layer weight determining subunit, configured to determine a weight of the migration layer according to a sequence number of a network block to which the migration layer belongs, based on a weight function; wherein the weight function is a decreasing function of the sequence number of the network block.
Further, the objective function construction unit includes:
a migration layer regularization item determining subunit, configured to determine a regularization item of a migration layer according to a weighted sum of distances between training parameters and the initial parameters of each migration layer;
and the objective function construction subunit is used for constructing the objective function according to the regularization item and the loss function of the migration layer.
Further, the objective function construction unit further includes:
a full-connection layer regularization item determining subunit, configured to determine a full-connection layer regularization item according to training parameters of the full-connection layer;
correspondingly, the objective function construction subunit is configured to construct the objective function according to the regularization term of the migration layer, the regularization term of the full connection layer, and the loss function.
Further, the migration layer is an image feature extraction layer.
The model migration training device can execute the model migration training method provided by any embodiment of the application, and has the corresponding functional modules and beneficial effects of executing the model migration training method.
Example five
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 5, a block diagram of an electronic device implementing the model migration training method according to an embodiment of the present application is shown. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 5, the electronic device includes: one or more processors 501, memory 502, and interfaces for connecting components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executing within the electronic device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 501 is illustrated in fig. 5.
Memory 502 is a non-transitory computer readable storage medium provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the model migration training method provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the model migration training method provided by the present application.
The memory 502, which is a non-transitory computer readable storage medium, may be used to store a non-transitory software program, a non-transitory computer executable program, and modules, such as program instructions/modules corresponding to the model migration training method in the embodiments of the present application (e.g., the initial parameter determination module 401, the objective function construction module 402, and the objective model training module 403 shown in fig. 4). The processor 501 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 502, i.e., implements the model migration training method in the method embodiments described above.
Memory 502 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created by use of an electronic device implementing the model migration training method, and the like. In addition, memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 502 may optionally include memory located remotely from processor 501, which may be connected via a network to an electronic device implementing the model migration training method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for implementing the model migration training method may further include: an input device 503 and an output device 504. The processor 501, memory 502, input devices 503 and output devices 504 may be connected by a bus or otherwise, for example in fig. 5.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device implementing the model migration training method, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, one or more mouse buttons, a track ball, a joystick, and the like. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the embodiment of the application, network parameters of at least two migration layers in a source model are used as initial parameters of associated migration layers in a target model; constructing an objective function according to the distance between the training parameters and the initial parameters associated with at least two migration layers; the object model including the initial parameters is trained based on the object function. According to the technical scheme, the distance between the training parameters of the migration layers and the initial parameters is introduced when the objective function is constructed, so that model migration and training conditions of each migration layer are considered in the model training process, inheritance of information of a source model and self-adaption of the objective model are realized, an overfitting phenomenon in the model migration training process is avoided, and generalization capability of the objective model is improved.
It should be appreciated that various forms of the flows shown above may be used to re-order, add or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, and are not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (16)

1. A model migration training method, comprising:
taking network parameters of at least two migration layers in the source model as initial parameters of associated migration layers in the target model; the source model or the target model is at least one of image classification, image segmentation and image detection networks, and the migration layer comprises an image feature extraction layer correspondingly; the source model is a stable network model after the source training sample is successfully trained in the source field, and the target model is a model to be trained in a target field similar to the source field;
Constructing an objective function according to the distance between the training parameters associated with the at least two migration layers and the initial parameters;
training a target model comprising initial parameters based on the target function;
wherein said constructing an objective function based on a distance between the training parameters associated with the at least two migration layers and the initial parameters comprises: determining regularization terms of the migration layers according to the distances between the training parameters and the initial parameters of the at least two migration layers; and constructing an objective function according to the regularization terms of the migration layers and a loss function, wherein the regularization terms are used to measure the complexity of the model, and the distance between the training parameters and the initial parameters of the at least two migration layers is computed by solving a norm.
2. The method of claim 1, wherein constructing an objective function based on a distance between training parameters associated with the at least two migration layers and the initial parameters comprises:
and constructing an objective function according to the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters.
3. The method of claim 2, wherein constructing an objective function based on the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters, further comprises:
And determining the weight of the migration layer according to the network level information of the migration layer in the model.
4. A method according to claim 3, wherein determining the weight of the migration layer based on the network level information of the migration layer in the model comprises:
determining the sequence number of the network block to which the migration layer belongs;
and determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs.
5. The method of claim 4, wherein determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs comprises:
based on a weight function, determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs; wherein the weight function is a decreasing function of the sequence number of the network block.
6. The method of claim 2, wherein constructing an objective function based on the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters comprises:
determining regularization items of the migration layers according to weighted sums of distances between training parameters and the initial parameters of the migration layers;
And constructing the objective function according to the regularization item and the loss function of the migration layer.
7. The method of claim 6, wherein constructing the objective function from regularized items and loss functions of the migration layer comprises:
determining regularization items of the full-connection layer according to training parameters of the full-connection layer;
and constructing the objective function according to the regularization item of the migration layer, the regularization item of the full-connection layer and the loss function.
8. A model migration training apparatus, comprising:
the initial parameter determining module is used for taking network parameters of at least two migration layers in the source model as initial parameters of associated migration layers in the target model; the source model or the target model is at least one of image classification, image segmentation and image detection networks, and the migration layer comprises an image feature extraction layer correspondingly; the source model is a stable network model after the source training sample is successfully trained in the source field, and the target model is a model to be trained in a target field similar to the source field;
the objective function construction module is specifically configured to determine regularization terms of the migration layers according to the distances between the training parameters and the initial parameters of the at least two migration layers, and to construct an objective function according to the regularization terms of the migration layers and a loss function, wherein the regularization terms are used to measure the complexity of the model, and the distance between the training parameters and the initial parameters of the at least two migration layers is computed by solving a norm;
And the target model training module is used for training a target model comprising initial parameters based on the target function.
9. The apparatus of claim 8, wherein the objective function construction module comprises:
and the objective function construction unit is used for constructing an objective function according to the weights of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters.
10. The apparatus of claim 9, wherein the apparatus further comprises:
the weight determining module is used for determining the weight of the migration layer according to the network level information of the migration layer in the model before constructing an objective function according to the weight of the at least two migration layers and the distance between the training parameters associated with the at least two migration layers and the initial parameters.
11. The apparatus of claim 10, wherein the weight determination module comprises:
a network block sequence number determining unit, configured to determine a sequence number of a network block to which the migration layer belongs;
and the migration layer weight determining unit is used for determining the weight of the migration layer according to the sequence number of the network block to which the migration layer belongs.
12. The apparatus according to claim 11, wherein the migration layer weight determining unit comprises:
a migration layer weight determining subunit, configured to determine a weight of the migration layer according to a sequence number of a network block to which the migration layer belongs, based on a weight function; wherein the weight function is a decreasing function of the sequence number of the network block.
13. The apparatus according to claim 9, wherein the objective function construction unit includes:
a migration layer regularization item determining subunit, configured to determine a regularization item of a migration layer according to a weighted sum of distances between training parameters and the initial parameters of each migration layer;
and the objective function construction subunit is used for constructing the objective function according to the regularization item and the loss function of the migration layer.
14. The apparatus according to claim 13, wherein the objective function construction unit further comprises:
a full-connection layer regularization item determining subunit, configured to determine a full-connection layer regularization item according to training parameters of the full-connection layer;
correspondingly, the objective function construction subunit is configured to construct the objective function according to the regularization term of the migration layer, the regularization term of the full connection layer, and the loss function.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a model migration training method according to any one of claims 1-7.
16. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a model migration training method according to any one of claims 1-7.
CN202010219449.2A 2020-03-25 2020-03-25 Model migration training method, device, equipment and storage medium Active CN111444958B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010219449.2A CN111444958B (en) 2020-03-25 2020-03-25 Model migration training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010219449.2A CN111444958B (en) 2020-03-25 2020-03-25 Model migration training method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111444958A CN111444958A (en) 2020-07-24
CN111444958B true CN111444958B (en) 2024-02-13

Family

ID=71648761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010219449.2A Active CN111444958B (en) 2020-03-25 2020-03-25 Model migration training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111444958B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114065903A (en) * 2020-08-07 2022-02-18 深圳先进技术研究院 Parameter reuse method and device of deep learning model, terminal and storage medium
CN112016622A (en) * 2020-08-28 2020-12-01 中移(杭州)信息技术有限公司 Method, electronic device, and computer-readable storage medium for model training
CN115396831A (en) * 2021-05-08 2022-11-25 中国移动通信集团浙江有限公司 Interaction model generation method, device, equipment and storage medium
CN113283481A (en) * 2021-05-14 2021-08-20 群智未来人工智能科技研究院(无锡)有限公司 Intelligent membrane pollution decision-making method based on knowledge type-two fuzzy
CN113222976B (en) * 2021-05-31 2022-08-05 河海大学 Space-time image texture direction detection method and system based on DCNN and transfer learning
CN114005015B (en) * 2021-12-28 2022-05-31 合肥的卢深视科技有限公司 Training method of image recognition model, electronic device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832711A (en) * 2017-11-13 2018-03-23 常州大学 A kind of recognition methods again of the pedestrian based on transfer learning
WO2018227800A1 (en) * 2017-06-15 2018-12-20 北京图森未来科技有限公司 Neural network training method and device
CN109902798A (en) * 2018-05-31 2019-06-18 华为技术有限公司 The training method and device of deep neural network
CN110580496A (en) * 2019-07-11 2019-12-17 南京邮电大学 Deep migration learning system and method based on entropy minimization
CN110647920A (en) * 2019-08-29 2020-01-03 北京百度网讯科技有限公司 Transfer learning method and device in machine learning, equipment and readable medium
CN110782883A (en) * 2019-11-12 2020-02-11 百度在线网络技术(北京)有限公司 Model training method and device, electronic equipment and storage medium
WO2020035075A1 (en) * 2018-08-17 2020-02-20 第四范式(北京)技术有限公司 Method and system for carrying out machine learning under data privacy protection

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018227800A1 (en) * 2017-06-15 2018-12-20 北京图森未来科技有限公司 Neural network training method and device
CN107832711A (en) * 2017-11-13 2018-03-23 常州大学 A kind of recognition methods again of the pedestrian based on transfer learning
CN109902798A (en) * 2018-05-31 2019-06-18 华为技术有限公司 The training method and device of deep neural network
WO2020035075A1 (en) * 2018-08-17 2020-02-20 第四范式(北京)技术有限公司 Method and system for carrying out machine learning under data privacy protection
CN110580496A (en) * 2019-07-11 2019-12-17 南京邮电大学 Deep migration learning system and method based on entropy minimization
CN110647920A (en) * 2019-08-29 2020-01-03 北京百度网讯科技有限公司 Transfer learning method and device in machine learning, equipment and readable medium
CN110782883A (en) * 2019-11-12 2020-02-11 百度在线网络技术(北京)有限公司 Model training method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
General table of contents for 2009 (Vol. 30), Computer Engineering and Design; Computer Engineering and Design (No. 24); all pages *
Cross-domain image classification based on deep sparse discrimination; Yang Hanfang; Zhou Xiangdong; Computer Engineering (No. 04); all pages *

Also Published As

Publication number Publication date
CN111444958A (en) 2020-07-24

Similar Documents

Publication Publication Date Title
CN111444958B (en) Model migration training method, device, equipment and storage medium
CN111539223B (en) Language model training method and device, electronic equipment and readable storage medium
CN110909877B (en) Neural network model structure searching method and device, electronic equipment and storage medium
US11914964B2 (en) Method and apparatus for training semantic representation model, device and computer storage medium
US11556715B2 (en) Method for training language model based on various word vectors, device and medium
US11521370B2 (en) Point cloud data processing method, apparatus, electronic device and computer readable storage medium
CN111738414B (en) Recommendation model generation method, content recommendation method, device, equipment and medium
CN111737994A (en) Method, device and equipment for obtaining word vector based on language model and storage medium
US20210374343A1 (en) Method and apparatus for obtaining word vectors based on language model, device and storage medium
CN112270711B (en) Model training and posture prediction method, device, equipment and storage medium
CN111783601B (en) Training method and device of face recognition model, electronic equipment and storage medium
CN111582452B (en) Method and device for generating neural network model
CN112149741B (en) Training method and device for image recognition model, electronic equipment and storage medium
CN108205581A (en) The compact video features generated in digital media environment represent
CN112529180B (en) Method and apparatus for model distillation
EP3910526A1 (en) Method, apparatus, electronic device and storage medium for training semantic similarity model
CN112329453B (en) Method, device, equipment and storage medium for generating sample chapter
CN111582477A (en) Training method and device of neural network model
CN111966361A (en) Method, device and equipment for determining model to be deployed and storage medium thereof
CN114386503A (en) Method and apparatus for training a model
CN112101552A (en) Method, apparatus, device and storage medium for training a model
CN112215243A (en) Image feature extraction method, device, equipment and storage medium
CN116245749A (en) Image generation method and device
CN112529181B (en) Method and apparatus for model distillation
CN111738325B (en) Image recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant