CN115115050A - Method, device and equipment for determining transfer learning model and storage medium - Google Patents

Method, device and equipment for determining transfer learning model and storage medium

Info

Publication number
CN115115050A
CN115115050A
Authority
CN
China
Prior art keywords
candidate
network layer
learning model
training sample
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210757206.3A
Other languages
Chinese (zh)
Inventor
黄隆锴
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210757206.3A
Publication of CN115115050A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

The application discloses a method, an apparatus, a device, and a storage medium for determining a transfer learning model, belonging to the technical field of computers and the Internet. The method comprises the following steps: determining a plurality of candidate network layers from at least one candidate transfer learning model; processing a training sample set based on each candidate network layer to obtain a sample coding information entropy and class coding information entropies respectively corresponding to a plurality of classes; determining the transferability of the candidate network layer according to the sample coding information entropy and the plurality of class coding information entropies; and, based on the transferabilities respectively corresponding to the candidate network layers, constructing a transfer learning model for the training sample set from the candidate network layers whose transferability satisfies a first condition. In this way, the transfer learning effect of a candidate network layer on the training sample set can be evaluated without actually performing transfer learning, so that the optimal candidate network layer for the training sample set can be determined quickly and accurately.

Description

Method, device and equipment for determining transfer learning model and storage medium
Technical Field
The present application relates to the field of computer and internet technologies, and in particular, to a method, an apparatus, a device, and a storage medium for determining a transfer learning model.
Background
In the model learning process, through transfer learning, a model developed for one task can be applied to the training of models for other tasks.
At present, transfer learning proceeds as follows: one or more transfer learning models suitable for a target task are determined according to the association between the target task and a source task; then, when a plurality of transfer learning models exist, each transfer learning model is trained separately with the training sample set of the target task to obtain a plurality of deep learning models suitable for the target task. Next, the accuracy of each deep learning model's output is measured by testing, so as to determine the transfer learning effect of each model after transfer learning, and the deep learning model with the highest accuracy, i.e., the one with the best transfer learning effect, is selected as the final training model for the target task.
However, in this approach the transfer learning effect can only be evaluated after a deep learning model has been obtained through transfer learning, so when a plurality of transfer learning models exist, the optimal transfer learning model for the target task cannot be determined quickly.
Disclosure of Invention
The embodiments of the present application provide a method, an apparatus, a device, and a storage medium for determining a transfer learning model, offering a way to evaluate the transfer learning effect before transfer learning is performed, so that the optimal candidate network layer for a training sample set can be determined quickly and accurately. The technical scheme is as follows.
According to an aspect of an embodiment of the present application, there is provided a method for determining a migration learning model, the method including the steps of:
determining a plurality of candidate network layers from at least one candidate transfer learning model, wherein one candidate transfer learning model corresponds to at least one candidate network layer, and different candidate transfer learning models are models trained on different training data;
processing a training sample set based on the candidate network layer to obtain a sample coding information entropy and class coding information entropies respectively corresponding to a plurality of classes, wherein the training sample set comprises training samples belonging to different classes; the sample coding information entropy is used for indicating the information amount contained in the training samples in the training sample set after coding, and the class coding information entropy corresponding to the class is used for indicating the information amount contained in the training samples belonging to the class in the training sample set after coding;
determining the transferability of the candidate network layer according to the sample coding information entropy and the plurality of class coding information entropies, wherein the transferability indicates the transfer learning effect of the candidate network layer on the training sample set;
and constructing, based on the transferabilities respectively corresponding to the candidate network layers, a transfer learning model for the training sample set from the candidate network layers whose transferability satisfies a first condition.
According to an aspect of the embodiments of the present application, there is provided an apparatus for determining a migration learning model, the apparatus including the following modules:
the network layer determining module is used for determining a plurality of candidate network layers from at least one candidate transfer learning model, wherein one candidate transfer learning model corresponds to at least one candidate network layer, and different candidate transfer learning models are models trained on different training data;
the sample processing module is used for processing a training sample set based on the candidate network layer to obtain a sample coding information entropy and class coding information entropies respectively corresponding to a plurality of classes, and the training sample set comprises training samples belonging to different classes; the sample coding information entropy is used for indicating the information content contained in the training samples in the training sample set after being coded, and the class coding information entropy corresponding to the class is used for indicating the information content contained in the training samples belonging to the class in the training sample set after being coded;
a transferability determining module, configured to determine, according to the sample coding information entropy and the plurality of class coding information entropies, the transferability of the candidate network layer, where the transferability indicates the transfer learning effect of the candidate network layer on the training sample set;
and the model construction module is used for constructing, based on the transferabilities respectively corresponding to the candidate network layers, a transfer learning model for the training sample set from the candidate network layers whose transferability satisfies the first condition.
According to an aspect of an embodiment of the present application, there is provided a computer device, including a processor and a memory, where the memory stores at least one program, and the at least one program is loaded and executed by the processor to implement the determination method of the migration learning model described above.
According to an aspect of the embodiments of the present application, a computer-readable storage medium is provided, where at least one program is stored in the storage medium, and the at least one program is loaded by a processor and executed to implement the method for determining a transfer learning model described above.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read by a processor of the computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to execute the method for determining the transfer learning model.
The technical scheme provided by the embodiment of the application can bring the following beneficial effects:
Determining the transferability of a candidate network layer from the sample coding information entropy and the class coding information entropies yields the transfer learning effect of that candidate network layer on the training sample set, providing a way to evaluate the transfer learning effect before transfer learning is performed: the effect of a candidate network layer on the training sample set can be evaluated without actually carrying out transfer learning, so the optimal candidate network layer for the training sample set can be determined quickly and accurately. Moreover, because the transfer learning effect is judged with the network layer as the basic unit, the judgment precision of transferability is improved, and the network layer within a single candidate transfer learning model that is best suited to transfer learning can be identified; the transfer learning model can subsequently be built with the network layer as the basic unit, which indirectly reduces the difference between the initially constructed transfer learning model and the finally trained one and improves training efficiency.
Drawings
FIG. 1 is a schematic diagram of a system for determining a transfer learning model provided by one embodiment of the present application;
FIG. 2 illustrates a schematic diagram of a determination system for a transfer learning model;
FIG. 3 is a flow chart of a method for determining a transfer learning model provided by an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a manner of obtaining the transferability of a candidate network layer;
FIG. 5 is a diagram illustrating an example of a construction method of a transfer learning model;
FIG. 6 is a schematic diagram illustrating another manner of constructing a transfer learning model;
FIG. 7 is a diagram illustrating the flow of the determination method for a transfer learning model as applied to an image classification task;
FIG. 8 is a diagram illustrating the flow of the determination method for a transfer learning model as applied to a chemical molecular structure classification task;
FIG. 9 is a block diagram of an apparatus for determining a transfer learning model according to an embodiment of the present application;
FIG. 10 is a block diagram of an apparatus for determining a transfer learning model according to another embodiment of the present application;
FIG. 11 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, embodiments of the present application are described in detail below with reference to the accompanying drawings.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines acquire the capabilities of perception, reasoning, and decision-making.
Artificial intelligence is a comprehensive discipline involving a wide range of fields, covering both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning.
The scheme provided by the embodiments of the present application relates to machine learning and other artificial intelligence technologies, and is explained in detail through the following embodiments.
Referring to fig. 1, a schematic diagram of a determination system of a migration learning model according to an embodiment of the present application is shown. The determination system of the transfer learning model may include the terminal device 10 and the server 20.
The terminal device 10 may be an electronic device such as a mobile phone, a tablet computer, a PC (Personal Computer), an intelligent voice interaction device, a smart appliance, a vehicle-mounted terminal, or an aircraft, which is not limited in the embodiments of the present application.
The server 20 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform.
The terminal device 10 and the server 20 may communicate via a network.
In some embodiments, the above determination system for the transfer learning model is applied in the transfer learning process. Illustratively, as shown in fig. 2, the terminal device 10 provides the server 20 with the training sample set for the current transfer learning and at least one candidate transfer learning model for that training sample set; the server 20 then determines a plurality of candidate network layers from the at least one candidate transfer learning model, where one candidate transfer learning model corresponds to at least one candidate network layer. Next, for each candidate network layer, the server 20 processes each training sample in the training sample set with the feature extraction function of the candidate network layer to obtain a feature vector for each training sample, constructs a sample feature matrix from the feature vectors of all training samples, and constructs, for each class, a class feature matrix from the feature vectors of the training samples belonging to that class. The server 20 then computes the sample coding information entropy from the sample feature matrix and the class coding information entropy of each class from the corresponding class feature matrix, and finally determines the transferability of the candidate network layer from the sample coding information entropy and the class coding information entropies.
The server 20 then determines the candidate network layer whose transferability satisfies the first condition as the target network layer, and the candidate transfer learning model to which it belongs as the target transfer learning model. Within the target transfer learning model, the target network layer and the network layers before it are kept unchanged, while the parameters of the network layers after it are randomly initialized, so as to construct a transfer learning model for the training sample set. The transfer learning model is then trained with the training sample set.
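The construction step above can be sketched as follows. This is an illustrative simplification, not the patent's implementation: a "model" is reduced to an ordered list of (name, weight matrix) pairs, and layers after the target index get freshly drawn random weights while earlier layers keep their pretrained values.

```python
import numpy as np

def build_transfer_model(pretrained, target_index, rng=None):
    """pretrained: ordered list of (layer_name, weight_matrix) pairs.
    Layers up to and including target_index keep pretrained weights;
    later layers are randomly re-initialized."""
    rng = rng or np.random.default_rng(0)
    new_model = []
    for i, (name, w) in enumerate(pretrained):
        if i <= target_index:
            new_model.append((name, w.copy()))  # keep pretrained weights
        else:
            new_model.append((name, rng.normal(0.0, 0.01, w.shape)))  # re-init
    return new_model

# Hypothetical pretrained model with three layers; target layer is index 1.
pretrained = [("l0", np.ones((4, 4))), ("l1", np.ones((4, 4))), ("l2", np.ones((4, 2)))]
model = build_transfer_model(pretrained, target_index=1)
```

After construction, only the re-initialized tail layers need substantial training on the target task's training sample set.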
It should be noted that the above descriptions of the terminal device 10 and the server 20 are merely exemplary and explanatory; in exemplary embodiments, their functions can be flexibly configured and adjusted. Illustratively, if the load capacity of the server 20 allows, the server 20 may perform data collection, data processing, and model training on its own in the transfer learning process, without depending on the terminal device 10. Alternatively, the terminal device 10 may provide a visual configuration interface in which staff configure the relevant training parameters, such as the training sample set and the candidate transfer learning models, after which the server 20 performs data collection, data processing, and model training based on those parameters.
Referring to fig. 3, a flowchart of a method for determining a transfer learning model according to an embodiment of the present application is shown. The steps of the method may be performed by the terminal device 10 and/or the server 20 (hereinafter collectively referred to as "computer devices") of fig. 1 described above. The method may comprise at least one of the following steps (301-304):
step 301, a plurality of candidate network layers is determined from at least one candidate transfer learning model.
A candidate transfer learning model refers to a deep learning model whose transfer learning can be applied to the training sample set. Different candidate transfer learning models are models trained on different training data. In some embodiments, the training tasks corresponding to different candidate transfer learning models may be the same or different. In one possible implementation, different candidate transfer learning models correspond to the same training task; that is, they are trained on different training data but are deep learning models for the same task. In another possible implementation, different candidate transfer learning models correspond to different training tasks; illustratively, they are trained on different training data and are deep learning models for similar tasks, where similar tasks may also be referred to as related tasks.
In the embodiments of the present application, prior to transfer learning, the computer device obtains at least one candidate transfer learning model and determines a plurality of candidate network layers from it, where one candidate transfer learning model corresponds to at least one candidate network layer. In some embodiments, after obtaining the at least one candidate transfer learning model, the computer device samples network layers from each candidate transfer learning model to obtain the candidate network layers.
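The sampling of candidate network layers described above can be sketched as below. The model and layer names are hypothetical; the `stride` parameter is an assumed sampling policy (take every stride-th layer), since the patent only says layers are sampled from each candidate model.

```python
def candidate_layers(models, stride=1):
    """models: dict mapping model name -> ordered list of layer names.
    Returns (model_name, layer_index, layer_name) tuples, taking every
    `stride`-th layer of each candidate transfer learning model."""
    candidates = []
    for model_name, layers in models.items():
        for idx in range(0, len(layers), stride):
            candidates.append((model_name, idx, layers[idx]))
    return candidates

# Two hypothetical pretrained models with named layers.
models = {
    "pretrained_a": ["conv1", "conv2", "conv3", "fc"],
    "pretrained_b": ["block1", "block2", "head"],
}
cands = candidate_layers(models, stride=2)
```

Each resulting tuple is one candidate network layer to be scored for transferability in the later steps.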
Step 302, processing the training sample set based on the candidate network layer to obtain a sample coding information entropy and category coding information entropies respectively corresponding to a plurality of categories.
In the embodiment of the present application, after the computer device obtains the candidate network layers, the training sample set is processed based on the candidate network layers, so as to obtain a sample coding information entropy and category coding information entropies corresponding to a plurality of categories, respectively. The sample coding information entropy is used for indicating the information content contained in the training samples in the training sample set after coding, and the category coding information entropy corresponding to the category is used for indicating the information content contained in the training samples belonging to the category in the training sample set after coding.
In some embodiments, the training sample set includes training samples belonging to different classes. The training sample set may correspond to one or more classes, and the number of training samples in each class may be any number; neither is limited in the embodiments of the present application. In the embodiments of the present application, after the computer device obtains the candidate network layer, the sample coding information entropy is obtained based on all training samples contained in the training sample set, and the class coding information entropy of a given class is obtained based on the training samples belonging to that class; this step is then repeated for each class to obtain the class coding information entropy corresponding to each class.
It should be noted that, in the embodiment of the present application, the labels represent the classes, and one class corresponds to one label, that is, in the training sample set, training samples belonging to the same class correspond to the same label.
Step 303, determining the transferability of the candidate network layer according to the sample coding information entropy and the plurality of class coding information entropies.
In the embodiments of the present application, after obtaining the sample coding information entropy and the plurality of class coding information entropies, the computer device determines from them the transferability of the candidate network layer. The transferability indicates the transfer learning effect of the candidate network layer on the training sample set.
Illustratively, assume a training sample set

$$D = \{(x_i, y_i)\}_{i=1}^{N},$$

where $x_i$ is the $i$-th training sample and $y_i$ is its label, with $y_i \in \{1, 2, \ldots, C\}$; that is, the training sample set contains training samples of $C$ classes. Generally, cross entropy is used as the loss function of a model on the training sample set, and when the loss function is optimized to its minimum, the value of the cross entropy is approximately equal to the mutual information $\mathrm{MI}(Z, Y)$ between the feature vector $z_i$ of $x_i$ and the label $y_i$:

$$\mathrm{MI}(Z, Y) = H(Z) - H(Z \mid Y),$$

where $H(Z)$ is the information entropy of the feature vectors of the training sample set, and $H(Z \mid Y)$ is the information entropy of the feature vectors belonging to the class indicated by label $Y$.
However, due to the high dimensionality of the feature vectors, algorithms for estimating the information entropy associated with them are either inaccurate or computationally complex and consume a large amount of computing resources. Therefore, in the embodiments of the present application, the mutual information is approximately estimated by means of the coding information entropy.
Physically, the sample coding information entropy $R(Z)$ approximates $H(Z)$, and likewise the class coding information entropy $R(Z_c)$ approximates $H(Z \mid Y = c)$. The transferability $T_k^l$ of the candidate network layer $l$ of the candidate transfer learning model $k$ is then:

$$T_k^l = R(Z) - \sum_{c=1}^{C} \frac{n_c}{N} R(Z_c),$$

where $n_c$ is the number of training samples of the $c$-th class in the training sample set and $N$ is the total number of training samples, that is:

$$N = \sum_{c=1}^{C} n_c.$$
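A numeric sketch of the transferability score $T_k^l = R(Z) - \sum_c (n_c/N)\,R(Z_c)$ follows. The coding-rate estimator used for $R$ here is a standard rate-distortion closed form adopted by coding-rate-based transferability measures; it is an assumption on our part, since this excerpt of the patent describes only the inputs to the coding length, not its exact formula.

```python
import numpy as np

def coding_rate(Z, eps=0.1):
    """Assumed rate-distortion coding rate of a d x n feature matrix Z
    (columns are feature vectors): 0.5 * logdet(I + d/(n*eps^2) * Z Z^T)."""
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps ** 2)) * (Z @ Z.T))
    return 0.5 * logdet

def transferability(Z, labels, eps=0.1):
    """T = R(Z) - sum_c (n_c / N) * R(Z_c), per the formula above."""
    labels = np.asarray(labels)
    n_total = Z.shape[1]
    score = coding_rate(Z, eps)
    for c in np.unique(labels):
        Zc = Z[:, labels == c]
        score -= (Zc.shape[1] / n_total) * coding_rate(Zc, eps)
    return score

# Well-separated classes (each class on its own axis) should score higher
# than features that are identical across classes (inseparable).
Z_sep = np.array([[1.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0]])
Z_flat = np.ones((2, 4))
labels = [0, 0, 1, 1]
t_sep = transferability(Z_sep, labels)
t_flat = transferability(Z_flat, labels)
```

A high score means the layer's features carry more label-relevant information, i.e., a better expected transfer learning effect on this training sample set.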
Step 304, constructing, based on the transferabilities respectively corresponding to the candidate network layers, a transfer learning model for the training sample set from the candidate network layers whose transferability satisfies the first condition.
In the embodiments of the present application, after obtaining the transferabilities respectively corresponding to the candidate network layers, the computer device determines, from among the candidate network layers, those whose transferability satisfies the first condition, and then constructs a transfer learning model for the training sample set from those candidate network layers.
The first condition is the judgment condition on the candidate network layers, and it can be flexibly set and adjusted according to the actual situation, which is not limited in the embodiments of the present application. Illustratively, the first condition includes, but is not limited to, at least one of the following: having the highest transferability, having a transferability greater than a target value, and so on. The target value may be any value and may likewise be flexibly set and adjusted according to the actual situation.
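The two example forms of the first condition above can be sketched as a simple selection routine. Layer identifiers and scores are illustrative.

```python
def select_candidates(scores, condition="max", target_value=None):
    """scores: dict mapping a layer id (e.g. (model, layer_index)) to its
    transferability. Returns the layer ids satisfying the first condition:
    either the single highest-scoring layer, or all layers above a target value."""
    if condition == "max":
        return [max(scores, key=scores.get)]
    if condition == "threshold":
        return [k for k, v in scores.items() if v > target_value]
    raise ValueError(f"unknown condition: {condition}")

# Hypothetical transferability scores for three candidate layers.
scores = {("model_a", 3): 1.9, ("model_a", 5): 2.4, ("model_b", 2): 2.1}
best = select_candidates(scores)
above = select_candidates(scores, "threshold", target_value=2.0)
```

The "max" form yields one target network layer; the "threshold" form may yield several, from which a transfer learning model can be assembled.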
In summary, in the technical solution provided by the embodiments of the present application, the transferability of a candidate network layer is determined from the sample coding information entropy and the class coding information entropies, which yields the transfer learning effect of the candidate network layer on the training sample set and provides a way to evaluate that effect before transfer learning is performed; the transfer learning effect of a candidate network layer on the training sample set can thus be evaluated without actually performing transfer learning, and the optimal candidate network layer for the training sample set can be determined quickly and accurately. Moreover, since the transfer learning effect is judged with the network layer as the basic unit, the judgment precision of transferability is improved, and the network layer within a single candidate transfer learning model best suited to transfer learning can be identified; the transfer learning model can subsequently be built with the network layer as the basic unit, which indirectly reduces the difference between the initially constructed transfer learning model and the finally trained one and improves training efficiency.
In the following, taking a certain candidate network layer as an example, the manner of obtaining the sample coding information entropy and the category coding information entropy is described.
In an exemplary embodiment, the step 302 includes at least one of:
1. Processing the training sample set based on the candidate network layer to obtain a sample feature matrix and class feature matrices respectively corresponding to a plurality of classes.
In this embodiment, after obtaining the candidate network layer, the computer device processes the training sample set based on the candidate network layer to obtain a sample feature matrix and a category feature matrix corresponding to each of a plurality of categories. In some embodiments, different candidate network layers correspond to different feature extraction functions, and the computer device processes the training sample set based on the feature extraction functions of the candidate network layers to obtain the sample feature matrix and category feature matrices corresponding to the multiple categories, respectively.
In some embodiments, after obtaining the candidate network layer, the computer device determines the feature extraction function of the candidate network layer and then processes each training sample in the training sample set with that function to obtain the feature vector corresponding to each training sample. Illustratively, if the feature extraction function of candidate network layer $l$ of candidate transfer learning model $k$ is $f_k^l(x)$, then the feature vector of training sample $x_i$ is $z_i = f_k^l(x_i)$.
In some embodiments, after obtaining the feature vector corresponding to each training sample, the computer device constructs a sample feature matrix according to the feature vector corresponding to each training sample. The data of the first target column in the sample feature matrix is a feature vector of the first target training sample, and the first target column refers to any column in the sample feature matrix, which is not limited in this embodiment of the present application.
In some embodiments, after obtaining the feature vectors corresponding to the training samples, the computer device constructs a class feature matrix corresponding to each class according to the class to which the training samples included in the training sample set belong. Taking the target class as an example, for at least one training sample belonging to the target class in the training sample set, according to the feature vectors respectively corresponding to the training samples belonging to the target class, a class feature matrix corresponding to the target class is constructed. The data of the second target column in the class feature matrix corresponding to the target class is a feature vector of a second target training sample belonging to the target class, and the second target column refers to any column in the class feature matrix corresponding to the target class, which is not limited in the embodiment of the present application.
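The matrix construction described above can be sketched as follows: the feature vectors become the columns of the sample feature matrix, and each class feature matrix keeps only the columns of samples carrying that class's label. Shapes and labels here are illustrative.

```python
import numpy as np

def build_matrices(features, labels):
    """features: list of 1-D feature vectors (one per training sample);
    labels: parallel list of class ids.
    Returns the sample feature matrix Z (columns = feature vectors) and a
    dict mapping each class c to its class feature matrix Z_c."""
    Z = np.stack(features, axis=1)  # each column is one sample's feature vector
    class_mats = {}
    for c in sorted(set(labels)):
        cols = [f for f, y in zip(features, labels) if y == c]
        class_mats[c] = np.stack(cols, axis=1)
    return Z, class_mats

feats = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
labs = [0, 1, 0]
Z, Zc = build_matrices(feats, labs)
```

With these matrices in hand, the sample coding information entropy is computed from Z and each class coding information entropy from the corresponding Z_c.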
In the embodiment of the present application, the classification of each training sample in the training sample set may be performed before or after the feature vector is obtained, and the embodiment of the present application does not limit this.
In a possible implementation manner, after obtaining the training sample set, the computer device uses the labels as a reference, one label corresponds to one class, classifies the training samples based on the labels corresponding to the training samples, and further directly constructs the sample feature matrix and the class feature matrix after obtaining the feature vectors corresponding to the training samples.
In another possible implementation, after obtaining the feature vector, the computer device classifies training samples based on labels corresponding to training samples in a training sample set, where one label corresponds to one class, and further constructs the sample feature matrix and the class feature matrix.
Of course, in other possible embodiments, in order to reduce the classification time, the training samples may be collected and stored separately by category when the training sample set is collected, so that the generated training sample set is already classified.
2. Determining the sample coding information entropy according to the sample feature matrix.
In the embodiment of the present application, after obtaining the sample feature matrix, the computer device determines the sample coding information entropy according to the sample feature matrix. In some embodiments, the computer device obtains the dimension of the feature vector corresponding to a training sample and the encoding precision for the training samples; further, according to this dimension and encoding precision, it determines the coding length required to compress the sample feature matrix into an encoding with the indicated precision, and determines the sample coding information entropy based on the coding length. The encoding precision may be any value preset by an operator, which is not limited in the embodiment of the present application.
For example, assuming that the dimension of the feature vector corresponding to each training sample is d, the number of training samples is n, and the encoding precision for the training samples is ε, the sample coding information entropy R(Z) is:

R(Z) = (1/2) · log det( I_d + (d / (n·ε²)) · Z·Zᵀ )

where I_d is the d-dimensional identity matrix, Z is the sample feature matrix, and Zᵀ is the transpose of the sample feature matrix.
As can be seen from the above equation, the sample coding information entropy R(Z) can be understood as the log of the coding length required to compress Z into an encoding with precision ε.
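Under this reading, the sample coding information entropy is a log-determinant coding rate of the feature matrix. The pure-Python sketch below assumes the formula R(Z) = ½ log det(I_d + (d/(n·ε²)) Z Zᵀ) with Z stored as d rows of n entries; the helper names are hypothetical, and a real implementation would use a linear-algebra library instead of the hand-rolled Cholesky factorization:

```python
import math

def logdet_psd(a):
    """Log-determinant of a symmetric positive-definite matrix via
    Cholesky: a = L·Lᵀ, so log det(a) = 2·Σ log L[i][i]."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))

def coding_rate(Z, eps=0.5):
    """Sample coding information entropy of Z (a list of d rows, each of
    length n) under encoding precision eps, per the formula above."""
    d, n = len(Z), len(Z[0])
    scale = d / (n * eps * eps)
    # Form I_d + scale · Z·Zᵀ, then take half its log-determinant.
    m = [[(1.0 if i == j else 0.0) +
          scale * sum(Z[i][k] * Z[j][k] for k in range(n))
          for j in range(d)] for i in range(d)]
    return 0.5 * logdet_psd(m)
```

A zero feature matrix yields a rate of zero, and the rate grows as the features spread out, consistent with the coding-length interpretation.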
3. Determining the class coding information entropy corresponding to each class according to the class feature matrix corresponding to that class.
In the embodiment of the application, after obtaining the class feature matrices, the computer device determines the class coding information entropy corresponding to each class according to the class feature matrix corresponding to that class. Taking the target class as an example, the computer device obtains the dimension of the feature vector corresponding to a training sample and the encoding precision for the training samples; further, according to this dimension and encoding precision, it determines the coding length required to compress the class feature matrix corresponding to the target class into an encoding with the indicated precision, and determines the class coding information entropy corresponding to the target class based on the coding length.
Illustratively, for class c with n_c training samples, the class coding information entropy R(Z_c) is:

R(Z_c) = (1/2) · log det( I_d + (d / (n_c·ε²)) · Z_c·Z_cᵀ )

where Z_c is the class feature matrix corresponding to class c and Z_cᵀ is its transpose.
In addition, as can be seen from the above formula, the class coding information entropy R(Z_c) can be understood as the log of the coding length required to compress Z_c into an encoding with precision ε.
Illustratively, as shown in fig. 4, the training sample set includes training samples belonging to class 1, class 2, and class 3. Based on the feature extraction function f_l^k(x) of the candidate network layer l of the candidate transfer learning model k, the feature vector of each training sample is obtained; a sample feature matrix is then constructed from the feature vectors of all training samples, and a class feature matrix corresponding to each class is constructed from the feature vectors of the training samples of that class. Then, the sample coding information entropy is determined according to the sample feature matrix, the class coding information entropies are determined according to the class feature matrices, and the mobility of the candidate network layer l for the training sample set is further determined.
In summary, in the technical scheme provided in the embodiment of the present application, the mobility of the candidate network layer is determined through the sample coding information entropy and the category coding information entropy, the information entropy of the feature vector of the training sample set is evaluated through the sample coding information entropy, and the information entropy of the feature vector belonging to the category corresponding to the label in the training sample set is evaluated through the category coding information entropy.
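The application does not spell out here how the two entropies are combined into a mobility. One natural reading, consistent with the coding-rate interpretation above, is the sample coding information entropy minus the sample-weighted average of the class coding information entropies; the sketch below is an assumption for illustration, not the claimed formula:

```python
def mobility(sample_entropy, class_entropies, class_counts):
    """Hypothetical mobility: the coding-rate reduction between the whole
    sample set and its per-class parts (an assumption for illustration).

    sample_entropy:  R(Z) for the full sample feature matrix
    class_entropies: R(Z_c) for each class feature matrix
    class_counts:    number of training samples in each class
    A larger value suggests the candidate network layer separates the
    classes better, i.e. transfers better to this training sample set.
    """
    n = sum(class_counts)
    avg_class = sum((nc / n) * rc
                    for rc, nc in zip(class_entropies, class_counts))
    return sample_entropy - avg_class
```

With this combination, a candidate network layer whose features encode each class compactly while keeping the whole set spread out receives a high mobility.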
It should be noted that the above describes the manner of obtaining the sample coding information entropy and the class coding information entropies taking a certain candidate network layer as an example; in the present application, the above-described steps need to be performed for each candidate network layer.
Next, a description is given of a determination method of the candidate network layer.
In an exemplary embodiment, the above step 301 comprises at least one of:
1. candidate transfer learning models are determined based on a set of training samples.
In the embodiment of the present application, after obtaining the training sample set, the computer device determines, based on the training sample set, a candidate transfer learning model applicable to the training sample set.
In one possible implementation, the computer device determines the candidate transfer learning model based on the training task of the training sample set. In some embodiments, after obtaining the training sample set, the computer device determines the training task of the training sample set, and then determines, based on that training task, the training model corresponding to an associated task associated with the training task as the candidate transfer learning model. The associated tasks may also be referred to as similar tasks. For example, if the training task of the training sample set is to recognize a facial image of a user, the corresponding associated task may be recognizing a person image, recognizing a pedestrian image, recognizing a moving person image, or the like.
In another possible implementation, the computer device determines the candidate transfer learning model based on a training task of a training sample set and content included in the training sample set. In some embodiments, after obtaining the training sample set, the computer device determines a training task of the training sample set, further determines, based on the training task of the training sample set, a training model corresponding to an associated task associated with the training task as a candidate migration learning model to be selected, and then determines, based on contents included in training data respectively corresponding to each candidate migration learning model to be selected, a model similar to the contents included in the training sample set from among the candidate migration learning models to be selected as the candidate migration learning model.
2. Sampling the network layers contained in the candidate transfer learning model to obtain at least one candidate network layer corresponding to the candidate transfer learning model.
In this embodiment of the present application, after obtaining the candidate transfer learning model, the computer device samples the network layers included in the candidate transfer learning model to obtain at least one candidate network layer corresponding to the candidate transfer learning model. When sampling, the computer device may randomly sample the network layers included in the candidate transfer learning model, or may sample them according to a sampling basis, which is not limited in the embodiment of the present application. Illustratively, the sampling basis for the candidate transfer learning model includes, but is not limited to, at least one of: the position distance between the network layer and the output layer, the importance degree of the network layer in the model, a preset number of candidate network layers, a preset sampling interval, and the like, which are not limited in the embodiment of the present application. In some embodiments, different types of candidate transfer learning models correspond to different sampling bases.
In one possible embodiment, in order to evaluate the migration learning effect of the model as a whole as much as possible, the candidate network layer is determined based on the position distance between the network layer and the output layer. In some embodiments, after obtaining the candidate migration learning model, the computer device determines, from the candidate migration learning model, a network layer whose position distance from the output layer is smaller than a threshold value as a network layer to be sampled, and then samples and obtains the candidate network layer from the network layer to be sampled. The threshold may be any value, and the threshold may be flexibly set and adjusted according to an actual situation, which is not limited in the embodiment of the present application. In some embodiments, when the candidate network layer is obtained by sampling from the network layer to be sampled, the sampling may be performed randomly, or may be performed based on other sampling bases except for a position between the network layer and the output layer, which is not limited in this application.
In another possible implementation, in order to make the candidate network layer representative of the candidate migration learning model, the candidate network layer is determined based on the importance degree of the network layer in the model. In some embodiments, after obtaining the candidate migration learning model, the computer device obtains at least one network layer included in the candidate migration learning model and the importance degree of each network layer in the candidate migration learning model, further determines, from the candidate migration learning model, a network layer whose importance degree satisfies a second condition as a network layer to be sampled based on the importance degree, and further samples and obtains the candidate network layer from the network layer to be sampled. The importance degree can be determined during model training, and the second condition can be flexibly set and adjusted according to actual conditions. For example, the second condition is that the importance degree is greater than a threshold value, and the threshold value may be any value, which is not limited in this embodiment of the application. In some embodiments, when the candidate network layer is obtained by sampling from the network layer to be sampled, the sampling may be performed randomly, or may be performed based on other sampling bases except for the importance degree of the network layer in the model, which is not limited in the embodiment of the present application.
In yet another possible implementation, the candidate network layers are determined based on a preset number of candidate network layers. In some embodiments, after obtaining the candidate migration learning model, the computer device obtains the total number of network layers included in the candidate migration learning model, determines a sampling interval based on a preset number of candidate network layers, performs average sampling based on the sampling interval to obtain a network layer to be sampled, and samples from the network layer to be sampled to obtain the candidate network layer. The number of the preset candidate network layers may be any number, and may be flexibly set and adjusted according to actual conditions, which is not limited in the embodiments of the present application.
In some embodiments, when the candidate network layers are obtained by sampling from the network layers to be sampled, the sampling may be performed randomly, or may be performed based on other criteria except for the preset number of candidate network layers in the sampling criteria, which is not limited in the embodiment of the present application.
In other possible embodiments, the candidate network layers are determined based on a preset sampling interval. In some embodiments, after obtaining the candidate transfer learning model, the computer device obtains the total number of network layers included in the candidate transfer learning model, determines the number of candidate network layers based on the preset sampling interval, and samples the candidate transfer learning model accordingly to obtain the candidate network layers.
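The two counting-based sampling bases described above (a preset number of candidate layers, or a preset sampling interval) can be sketched as follows. The function name and the exact rounding choices are assumptions for illustration only; the application does not prescribe a specific scheme:

```python
def sample_layer_indices(total_layers, num_candidates=None, interval=None):
    """Pick candidate network-layer indices (0-based) by even sampling.

    Either derive a sampling interval from a preset number of candidate
    layers, or derive the number of candidates from a preset interval.
    """
    if num_candidates is not None:
        step = max(1, total_layers // num_candidates)  # interval from count
        return list(range(step - 1, total_layers, step))[:num_candidates]
    if interval is not None:
        return list(range(interval - 1, total_layers, interval))
    raise ValueError("provide num_candidates or interval")
```

For a 12-layer model with 3 preset candidates, this yields every fourth layer; with a preset interval of 2 on a 10-layer model, it yields every second layer.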
It should be noted that, the above description is given of an acquisition method of a candidate network layer by taking a certain candidate migration learning model as an example, and in the present application, the above described steps need to be performed for each candidate migration learning model.
In summary, in the technical scheme provided by the embodiment of the application, the candidate transfer learning model is determined through the training task, and candidate network layers are selected from the candidate transfer learning model so that the transfer learning capability is evaluated at the granularity of the network layer, which improves the precision of this evaluation. For the same candidate transfer learning model, the network layer best suited for transfer learning can be determined, which indirectly reduces the difference between the initially constructed transfer learning model and the finally trained transfer learning model and improves the training efficiency of the transfer learning model.
Next, a method of constructing the migration learning model will be described.
In an exemplary embodiment, the step 304 includes at least one of:
1. Determining the candidate network layer whose mobility satisfies the first condition as the target network layer based on the mobility corresponding to each candidate network layer.
In this embodiment, after obtaining the mobility, the computer device determines, as the target network layer, the candidate network layer whose mobility satisfies the first condition based on the mobility corresponding to each candidate network layer.
In one possible embodiment, the first condition is that the mobility is highest. In some embodiments, after obtaining the mobility of each candidate network layer, the computer device ranks the mobility from high to low, determines the candidate network layer with the highest mobility as a target network layer, and constructs a migration learning model according to the target network layer.
In another possible embodiment, the first condition is that the mobility is greater than a target value. In some embodiments, after obtaining the mobility of each candidate network layer, the computer device determines the candidate network layer whose mobility is greater than the target value as the target network layer. Of course, in other possible embodiments, the first condition is that the mobility is greater than the target value and the position distance from the output layer is the smallest. Exemplarily, after obtaining the mobility of each candidate network layer, the computer device determines the candidate network layers whose mobility is greater than the target value as candidate target network layers, of which there may be more than one; it then determines the candidate transfer learning model to which each candidate target network layer belongs, determines the position distance between each candidate target network layer and the output layer in the corresponding candidate transfer learning model, and determines the candidate target network layer with the smallest position distance as the target network layer.
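The two variants of the first condition (highest mobility, or mobility above a target value with the smallest distance to the output layer as a tie-breaker) can be sketched as below; the tuple layout and function name are hypothetical:

```python
def select_target_layer(candidates, target_value=None):
    """Select the target network layer from scored candidates.

    candidates: list of (layer_id, mobility, distance_to_output) tuples.
    With no target_value, pick the candidate with the highest mobility;
    otherwise, among candidates whose mobility exceeds target_value,
    pick the one closest to the output layer.
    """
    if target_value is None:
        return max(candidates, key=lambda c: c[1])[0]  # highest mobility
    eligible = [c for c in candidates if c[1] > target_value]
    return min(eligible, key=lambda c: c[2])[0]  # closest to the output layer
```

Note that with a target value set, a layer with slightly lower mobility but a smaller distance to the output layer can win, which matches the second variant described above.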
2. Constructing a transfer learning model for the training sample set according to the target network layer.
In the embodiment of the present application, after determining the target network layer, the computer device constructs a transfer learning model for the training sample set according to the target network layer.
In one possible implementation, in order to improve the efficiency of building the migration learning model, the computer device builds the migration learning model based on the candidate migration learning model to which the target network layer belongs. In some embodiments, the computer device determines a candidate transfer learning model to which the target network layer belongs as a target transfer learning model for the set of training samples; further, based on the position of the target network layer in the target migration learning model, determining the target network layer and other network layers positioned in front of the target network layer as first network layers, and determining other network layers positioned behind the target network layer as second network layers; and then, initializing the parameters of the second network layer to obtain a third network layer, and constructing a transfer learning model aiming at the training sample set according to the first network layer and the third network layer. Exemplarily, as shown in fig. 5, the computer device obtains a candidate transfer learning model to which the target network layer belongs, directly transfers the target network layer and other network layers located before the target network layer to the transfer learning model for the training sample set, and randomly initializes and transfers other network layers located after the target network layer to the transfer learning model for the training sample set.
In another possible implementation, the computer device builds the above-described migration learning model based solely on the target network layer. In some embodiments, after determining the target network layer, the computer device determines a candidate transfer learning model to which the target network layer belongs as a target transfer learning model for the training sample set, determines a target network layer and other networks located before the target network layer in the target transfer learning model as a first network layer, further constructs a fourth network layer, and constructs a transfer learning model for the training sample set according to the first network layer and the fourth network layer. Illustratively, as shown in fig. 6, after determining the target network layer, the computer device determines a candidate transfer learning model to which the target network layer belongs as a target transfer learning model for the training sample set, and directly transfers the target network layer and other network layers located before the target network layer in the target transfer learning model to the transfer learning model for the training sample set, and constructs other network layers located after the target network layer in the transfer learning model.
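Both construction variants split the source model at the target network layer: the target layer and the layers before it are transferred as-is, while the layers after it are either re-initialized (fig. 5) or newly built (fig. 6). A framework-agnostic sketch of the first variant follows, with hypothetical names; in practice the layers would be framework modules rather than opaque values:

```python
def build_transfer_model(source_layers, target_index, reinit):
    """Construct the transfer learning model from a source model.

    source_layers: ordered layers of the candidate transfer learning
                   model to which the target network layer belongs
    target_index:  position of the target network layer in that model
    reinit:        function re-initializing one layer's parameters
    """
    first = source_layers[:target_index + 1]   # first network layer: kept
    third = [reinit(layer)                     # third network layer: the
             for layer in source_layers[target_index + 1:]]  # re-initialized rest
    return first + third
```

The second variant differs only in replacing the re-initialized tail with a freshly constructed fourth network layer of a possibly different shape.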
In summary, in the technical scheme provided in the embodiment of the present application, the transfer learning effect is evaluated with the network layer as the basic unit according to the mobility of the candidate network layer, which improves the accuracy of the mobility determination; the transfer learning model is subsequently constructed with the network layer as the basic unit, which improves the construction efficiency of the transfer learning model, indirectly reduces the difference between the initially constructed transfer learning model and the finally trained transfer learning model, and improves the training efficiency of the transfer learning model.
Next, a method for determining a migration learning model in the present application will be described by taking an image classification task as an example.
Referring to fig. 7, a flowchart of a method for determining a migration learning model according to another embodiment of the present application is shown. The steps of the method may be performed by the terminal device 10 and/or the server 20 (hereinafter collectively referred to as "computer devices") of fig. 1 described above. The method may comprise at least one of the following steps (701-712):
step 701, determining at least one first candidate transfer learning model based on a training task of a first training sample set; the first training sample set comprises training sample images belonging to different categories.
In some embodiments, the training sample image may be an image capable of representing a disease in a medical field, an image of a commodity in a shopping field, an image of a book cover in an education field, and the like, which is not limited in this application.
Step 702, respectively sampling the network layers included in each first candidate transfer learning model to obtain a plurality of first candidate network layers. Wherein one first candidate transfer learning model corresponds to one or more first candidate network layers.
Step 703, processing each training sample image of the first training sample set based on the feature extraction function of the first candidate network layer to obtain a feature vector corresponding to each training sample image.
Step 704, determining a first sample feature matrix according to the feature vectors respectively corresponding to the training sample images.
Step 705, determining a first sample coding information entropy according to the first sample feature matrix.
Step 706, determining a first class feature matrix corresponding to each class according to the feature vector of the training sample image corresponding to each class.
And 707, determining the first category coding information entropy corresponding to each category according to the first category feature matrix corresponding to each category.
Step 708, determining the mobility of the first candidate network layer according to the first sample coding information entropy and the first class coding information entropy corresponding to each class.
Step 709, based on the mobility rates respectively corresponding to the first candidate network layers, determining the first candidate network layer whose mobility rate satisfies the first condition as the first target network layer.
Step 710, determining a first candidate transfer learning model to which the first target network layer belongs as a target transfer learning model for the first training sample set.
In step 711, based on the position of the first target network layer in the target migration learning model, the first target network layer and other network layers located before the first target network layer are determined as first network layers, and other network layers located after the first target network layer are determined as second network layers.
Step 712, the parameters of the second network layer are initialized to obtain a third network layer, and a transfer learning model for the first training sample set is constructed according to the first network layer and the third network layer.
In addition, a determination method of the migration learning model in the present application is described by taking a classification task of a chemical molecular structure as an example.
Referring to fig. 8, a flowchart of a method for determining a migration learning model according to another embodiment of the present application is shown. The steps of the method may be performed by the terminal device 10 and/or the server 20 (hereinafter collectively referred to as "computer devices") of fig. 1 described above. The method may comprise at least one of the following steps (801-812):
step 801, determining at least one second candidate transfer learning model based on the training tasks of the second training sample set; wherein the second set of training samples comprises chemical molecules belonging to different structural classes.
Step 802, respectively sampling the network layers included in each second candidate transfer learning model to obtain a plurality of second candidate network layers. Wherein one second candidate transfer learning model corresponds to one or more second candidate network layers.
Step 803, processing each training sample of the second training sample set based on the feature extraction function of the second candidate network layer to obtain a feature vector corresponding to each training sample.

Step 804, determining a second sample feature matrix according to the feature vectors respectively corresponding to the training samples.
And step 805, determining the second sample coding information entropy according to the second sample feature matrix.
Step 806, determining a second class feature matrix corresponding to each structure class according to the feature vectors of the training samples corresponding to that structure class.
In step 807, second category coding information entropies respectively corresponding to the structure categories are determined according to the second category feature matrices respectively corresponding to the structure categories.
And 808, determining the mobility of the second candidate network layer according to the second sample coding information entropy and the second category coding information entropy corresponding to each structure category.
Step 809, based on the mobility rates respectively corresponding to the second candidate network layers, determining the second candidate network layer whose mobility rate satisfies the first condition as the second target network layer.
Step 810, determining a second candidate transfer learning model to which the second target network layer belongs as a target transfer learning model for the second training sample set.
Step 811, based on the position of the second target network layer in the target migration learning model, the second target network layer and other network layers located before the second target network layer are determined as the first network layer, and other network layers located after the second target network layer are determined as the second network layer.
Step 812, the parameters of the second network layer are initialized to obtain a third network layer, and a transfer learning model for the second training sample set is constructed according to the first network layer and the third network layer.
It should be noted that, the foregoing describes a method for determining a migration learning model in the present application by taking an image classification task and a classification task of a chemical molecular structure as examples, and in an exemplary embodiment, the method for determining a migration learning model may also be used for text recognition, keyword extraction, and the like, which is not limited in the present application.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 9, a block diagram of a determining apparatus of a migration learning model according to an embodiment of the present application is shown. The device has the function of realizing the determination method of the transfer learning model, and the function can be realized by hardware or by hardware executing corresponding software. The device can be computer equipment, and can also be arranged in the computer equipment. The apparatus 900 may include: a network layer determination module 910, a sample processing module 920, a mobility determination module 930, and a model construction module 940.
The network layer determining module 910 is configured to determine a plurality of candidate network layers from at least one candidate transfer learning model, where one candidate transfer learning model corresponds to at least one candidate network layer; wherein, the different candidate migration learning models are models obtained by training based on different training data.
The sample processing module 920 is configured to process a training sample set based on the candidate network layer to obtain a sample coding information entropy and category coding information entropies corresponding to multiple categories, where the training sample set includes training samples belonging to different categories; the sample coding information entropy is used for indicating the information content contained in the training samples in the training sample set after coding, and the class coding information entropy corresponding to the class is used for indicating the information content contained in the training samples belonging to the class in the training sample set after coding.
The mobility determining module 930 is configured to determine, according to the sample coding information entropy and the plurality of class coding information entropies, a mobility of the candidate network layer, where the mobility is used to indicate a transfer learning effect of the candidate network layer on the training sample set.
The model building module 940 is configured to build a transfer learning model for the training sample set according to the candidate network layers whose mobility satisfies a first condition based on the mobility corresponding to each of the candidate network layers.
In some embodiments, as shown in fig. 10, the sample processing module 920 includes: a matrix acquisition sub-module 921 and an information entropy determination sub-module 922.
The matrix obtaining sub-module 921 is configured to process a training sample set based on the candidate network layer to obtain a sample feature matrix and a category feature matrix corresponding to each of a plurality of categories.
The information entropy determining submodule 922 is configured to determine a sample encoding information entropy according to the sample feature matrix.
The information entropy determining submodule 922 is further configured to determine, according to the category feature matrix corresponding to each category, a category coding information entropy corresponding to each category.
In some embodiments, as shown in fig. 10, the matrix obtaining sub-module 921 is configured to:
based on the feature extraction function of the candidate network layer, processing each training sample in the training sample set respectively to obtain a feature vector corresponding to each training sample;
constructing the sample feature matrix according to the feature vectors respectively corresponding to the training samples; the data of a first target column in the sample feature matrix is a feature vector of a first target training sample;
for at least one training sample belonging to a target class in the training sample set, constructing a class feature matrix corresponding to the target class according to feature vectors corresponding to the training samples belonging to the target class respectively; and the data of a second target column in the class characteristic matrix corresponding to the target class is the characteristic vector of a second target training sample belonging to the target class.
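The two matrix constructions described above (one matrix over all samples, one per class, with feature vectors as columns) can be sketched roughly as follows; all function and variable names here are hypothetical, since the patent prescribes no concrete implementation:

```python
import numpy as np

def build_feature_matrices(features, labels):
    """Arrange per-sample feature vectors into matrices whose columns
    are feature vectors, as described above.

    features: list of 1-D numpy arrays (one d-dimensional vector per sample)
    labels:   list of class labels, aligned with `features`
    Returns (sample_matrix, {label: class_matrix}).
    """
    # Sample feature matrix: column j is the feature vector of sample j.
    sample_matrix = np.stack(features, axis=1)          # shape (d, n)

    # One class feature matrix per class, with columns restricted to the
    # training samples belonging to that class.
    class_matrices = {}
    for label in set(labels):
        cols = [f for f, y in zip(features, labels) if y == label]
        class_matrices[label] = np.stack(cols, axis=1)  # shape (d, n_label)
    return sample_matrix, class_matrices
```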
In some embodiments, as shown in fig. 10, the information entropy determination sub-module 922 is configured to:
obtaining the dimensionality of the feature vector corresponding to the training sample and the coding precision for the training sample;
determining, according to the dimensionality of the feature vector corresponding to the training sample and the coding precision for the training sample, the coding length required to encode the sample feature matrix at the precision indicated by the coding precision;
determining the sample coding information entropy based on the coding length.
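One plausible instantiation of this coding-length-based entropy is the rate-distortion coding rate used in the transferability-estimation literature, where the coding length of a feature matrix grows as the required precision tightens. This specific formula is an assumption for illustration, not quoted from the patent:

```python
import numpy as np

def coding_entropy(Z, eps=1e-4):
    """Coding-length-based entropy for a feature matrix Z whose columns
    are d-dimensional feature vectors (shape (d, n)).

    `eps` plays the role of the coding precision: a smaller eps demands a
    longer code. This rate-distortion formula is one plausible choice,
    not necessarily the patent's exact one.
    """
    d, n = Z.shape
    # Bits needed to code Z up to distortion eps, normalized per sample:
    # half the log-determinant of (I + d/(n*eps^2) * Z Z^T).
    cov = (d / (n * eps ** 2)) * (Z @ Z.T)
    sign, logdet = np.linalg.slogdet(np.eye(d) + cov)
    return 0.5 * logdet / n
```

A matrix of all-zero features needs no code (entropy 0), while spread-out features require a longer code, matching the intuition that the entropy measures the information content of the coded samples.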
In some embodiments, as shown in fig. 10, the network layer determining module 910 includes: a model determination sub-module 911 and a network layer acquisition sub-module 912.
The model determining submodule 911 is configured to determine, based on the training task of the training sample set, a training model corresponding to an associated task that is associated with the training task as the candidate transfer learning model;
the network layer obtaining sub-module 912 is configured to sample network layers included in the candidate transfer learning model to obtain at least one candidate network layer corresponding to the candidate transfer learning model.
In some embodiments, as shown in fig. 10, the network layer obtaining sub-module 912 is configured to:
determining a network layer whose position distance is smaller than a threshold in the candidate transfer learning model as a network layer to be sampled, and sampling from the network layers to be sampled to obtain the candidate network layer;
or,
acquiring at least one network layer contained in the candidate transfer learning model and the importance degree of each network layer in the candidate transfer learning model; and determining, based on the importance degree, a network layer whose importance degree satisfies a second condition in the candidate transfer learning model as the candidate network layer.
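The two alternative sampling rules above (position-based and importance-based) can be sketched as follows. The thresholds, the interpretation of "position distance" as distance from the output end, and all names are illustrative assumptions:

```python
def sample_candidate_layers(layers, importances=None,
                            depth_threshold=3, second_condition=0.5):
    """Pick candidate network layers from an ordered layer list.

    layers:      ordered list of layer names (last entry is the output end).
    importances: optional {layer_name: score}; when given, the
                 importance-based rule is applied, otherwise the
                 position-based rule.
    """
    if importances is None:
        # Position-based rule: keep layers whose distance from the output
        # end is below the threshold (later layers tend to be more
        # task-specific, so they are natural candidates to replace).
        return layers[-depth_threshold:]
    # Importance-based rule: keep layers whose importance degree
    # satisfies the second condition (here: score >= second_condition).
    return [name for name in layers
            if importances.get(name, 0.0) >= second_condition]
```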
In some embodiments, the model building module 940 is configured to:
determining a candidate network layer whose mobility satisfies the first condition as a target network layer based on the mobility corresponding to each candidate network layer;
determining a candidate transfer learning model to which the target network layer belongs as a target transfer learning model for the training sample set;
determining the target network layer and other network layers positioned in front of the target network layer as first network layers and determining other network layers positioned behind the target network layer as second network layers based on the position of the target network layer in the target transfer learning model;
initializing the parameters of the second network layer to obtain a third network layer;
constructing a transfer learning model for the set of training samples according to the first network layer and the third network layer.
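The split-and-reinitialize construction above (keep trained parameters up to and including the target network layer, re-initialize everything after it) can be sketched generically; layer objects and the initialization callable are placeholders for whatever framework is actually used:

```python
import copy

def build_transfer_model(source_layers, target_index, init_fn):
    """Build a transfer learning model from a source model's layer list.

    Layers up to and including the target layer (the "first network
    layer") keep their trained parameters; layers after it (the "second
    network layer") are re-initialized, yielding the "third network
    layer" that completes the new model.

    source_layers: ordered list of layer objects holding parameters
    target_index:  index of the target network layer
    init_fn:       callable returning a freshly initialized copy of a layer
    """
    first = [copy.deepcopy(l) for l in source_layers[: target_index + 1]]
    third = [init_fn(l) for l in source_layers[target_index + 1 :]]
    return first + third
```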
In summary, in the technical solution provided by the embodiments of the present application, the mobility of a candidate network layer is determined from the sample coding information entropy and the class coding information entropies, thereby determining the transfer learning effect of the candidate network layer on the training sample set. This provides a way to evaluate the transfer learning effect before transfer learning is performed, so that the transfer learning effect of a candidate network layer on the training sample set can be evaluated without actually carrying out transfer learning, and the candidate network layer best suited to the training sample set can be determined quickly and accurately. Moreover, because mobility is evaluated with the network layer as the basic unit, the judgment precision of the mobility is improved: even within a single candidate transfer learning model, the network layer best suited to transfer learning can be identified. The transfer learning model can then be built with the network layer as the basic unit, which indirectly reduces the gap between the initially built transfer learning model and the finally trained one and improves the training efficiency of the transfer learning model.
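The combination of the sample coding information entropy and the class coding information entropies into a single mobility score can be sketched as below. Using the difference between the overall entropy and the class-weighted conditional entropy is an assumption drawn from common transferability-estimation practice; the patent only states that the mobility is determined from these entropies:

```python
def mobility(sample_entropy, class_entropies, class_counts):
    """Combine the sample coding entropy with per-class coding entropies
    into a mobility score for a candidate network layer.

    sample_entropy:  coding entropy of all training samples together
    class_entropies: {class_label: coding entropy of that class's samples}
    class_counts:    {class_label: number of samples in that class}
    """
    n = sum(class_counts.values())
    # Class-weighted conditional entropy: how much code the samples need
    # once their class labels are known.
    conditional = sum(
        (class_counts[c] / n) * class_entropies[c] for c in class_entropies
    )
    # A higher score means features are compact within each class but
    # spread out overall, suggesting the candidate layer separates the
    # new task's classes well.
    return sample_entropy - conditional
```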
It should be noted that, when the apparatus provided by the foregoing embodiments implements its functions, the division into the functional modules described above is merely illustrative; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for their specific implementation, refer to the method embodiments, and details are not repeated here.
Referring to fig. 11, a block diagram of a computer device according to an embodiment of the present application is shown. The computer device may be configured to implement the functionality of the determination method of the migration learning model described above. Specifically, the method comprises the following steps:
the computer apparatus 1100 includes a Central Processing Unit (CPU) 1101, a system Memory 1104 including a Random Access Memory (RAM) 1102 and a Read Only Memory (ROM) 1103, and a system bus 1105 connecting the system Memory 1104 and the Central Processing Unit 1101. The computer device 1100 also includes a basic Input/Output system (I/O system) 1106, which facilitates transfer of information between devices within the computer, and a mass storage device 1107 for storing an operating system 1113, application programs 1114, and other program modules 1115.
The basic input/output system 1106 includes a display 1108 for displaying information and an input device 1109 such as a mouse, keyboard, etc. for user input of information. Wherein the display 1108 and the input device 1109 are connected to the central processing unit 1101 through an input output controller 1110 connected to the system bus 1105. The basic input/output system 1106 may also include an input/output controller 1110 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 1110 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 1107 is connected to the central processing unit 1101 through a mass storage controller (not shown) that is connected to the system bus 1105. The mass storage device 1107 and its associated computer-readable media provide non-volatile storage for the computer device 1100. That is, the mass storage device 1107 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact disk Read-Only Memory) drive.
Without loss of generality, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash Memory or other solid state Memory, CD-ROM, DVD (Digital Video Disc) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media is not limited to the foregoing. The system memory 1104 and mass storage device 1107 described above may be collectively referred to as memory.
According to various embodiments of the present application, the computer device 1100 may also operate through a remote computer connected to a network, such as the Internet. That is, the computer device 1100 may connect to the network 1112 through the network interface unit 1111 connected to the system bus 1105, or may use the network interface unit 1111 to connect to other types of networks or remote computer systems (not shown).
The memory further stores a computer program configured to be executed by one or more processors to implement the above method for determining a transfer learning model.
In an exemplary embodiment, a computer-readable storage medium is also provided, in which at least one instruction, at least one program, a code set, or an instruction set is stored; when executed by a processor, it implements the above method for determining a transfer learning model.
Optionally, the computer-readable storage medium may include: ROM (Read Only Memory), RAM (Random Access Memory), SSD (Solid State drive), or optical disc. The Random Access Memory may include a ReRAM (resistive Random Access Memory) and a DRAM (Dynamic Random Access Memory).
In an exemplary embodiment, a computer program product or computer program is also provided, comprising computer instructions stored in a computer-readable storage medium. A processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the above method for determining a transfer learning model.
It should be understood that reference to "a plurality" herein means two or more. "And/or" describes the association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate that A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the preceding and following associated objects. In addition, the step numbers described herein merely show one possible execution order of the steps; in some other embodiments, the steps may be executed out of the numbered order, for example, two differently numbered steps may be executed simultaneously, or in an order opposite to that shown in the figure, which is not limited by the embodiments of the present application.
The above description is only exemplary of the application and should not be taken as limiting the application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the application should be included in the protection scope of the application.

Claims (10)

1. A method for determining a transfer learning model, the method comprising:
determining a plurality of candidate network layers from at least one candidate transfer learning model, wherein one candidate transfer learning model corresponds to at least one candidate network layer, and different candidate transfer learning models are models obtained by training based on different training data;
processing a training sample set based on the candidate network layer to obtain a sample coding information entropy and class coding information entropies respectively corresponding to a plurality of classes, wherein the training sample set comprises training samples belonging to different classes; the sample coding information entropy is used for indicating the information content contained in the training samples in the training sample set after being coded, and the class coding information entropy corresponding to the class is used for indicating the information content contained in the training samples belonging to the class in the training sample set after being coded;
determining the mobility of the candidate network layer according to the sample coding information entropy and the plurality of category coding information entropies, wherein the mobility is used for indicating the transfer learning effect of the candidate network layer on the training sample set;
and constructing, based on the mobility corresponding to each of the candidate network layers, a transfer learning model for the training sample set according to the candidate network layer whose mobility satisfies a first condition.
2. The method of claim 1, wherein the processing a training sample set based on the candidate network layer to obtain a sample coding information entropy and class coding information entropies corresponding to a plurality of classes respectively comprises:
processing the training sample set based on the candidate network layer to obtain a sample characteristic matrix and a category characteristic matrix corresponding to each of a plurality of categories;
determining sample coding information entropy according to the sample characteristic matrix;
and determining the class coding information entropy respectively corresponding to each class according to the class characteristic matrix corresponding to each class.
3. The method of claim 2, wherein the processing the training sample set based on the candidate network layer to obtain a sample feature matrix and a category feature matrix corresponding to each of a plurality of categories comprises:
based on the feature extraction function of the candidate network layer, processing each training sample in the training sample set respectively to obtain a feature vector corresponding to each training sample;
constructing the sample feature matrix according to the feature vectors respectively corresponding to the training samples; the data of a first target column in the sample feature matrix is a feature vector of a first target training sample;
for at least one training sample belonging to a target class in the training sample set, constructing a class feature matrix corresponding to the target class according to feature vectors corresponding to the training samples belonging to the target class respectively; and the data of a second target column in the class characteristic matrix corresponding to the target class is the characteristic vector of a second target training sample belonging to the target class.
4. The method according to claim 2, wherein the determining sample coding information entropy according to the sample feature matrix comprises:
obtaining the dimensionality of the feature vector corresponding to the training sample and the coding precision for the training sample;
determining, according to the dimensionality of the feature vector corresponding to the training sample and the coding precision for the training sample, the coding length required to encode the sample feature matrix at the precision indicated by the coding precision;
determining the sample coding information entropy based on the coding length.
5. The method of claim 1, wherein determining a plurality of candidate network layers from at least one candidate transfer learning model comprises:
determining a training model corresponding to an associated task associated with the training task as the candidate transfer learning model based on the training tasks of the training sample set;
and sampling network layers contained in the candidate transfer learning model to obtain at least one candidate network layer corresponding to the candidate transfer learning model.
6. The method according to claim 5, wherein the sampling network layers included in the candidate transfer learning model to obtain at least one candidate network layer corresponding to the candidate transfer learning model comprises:
determining a network layer with a position distance smaller than a threshold value from the candidate transfer learning model as a network layer to be sampled; sampling from the network layer to be sampled to obtain the candidate network layer;
or,
acquiring at least one network layer contained in the candidate transfer learning model and the importance degree of each network layer in the candidate transfer learning model; determining a network layer with the importance degree meeting a second condition from the candidate transfer learning model as a network layer to be sampled based on the importance degree; and sampling from the network layers to be sampled to obtain the candidate network layers.
7. The method according to any one of claims 1 to 6, wherein the constructing a transfer learning model for the training sample set according to candidate network layers whose mobility satisfies a first condition based on the mobility corresponding to each of the candidate network layers includes:
determining the candidate network layer with the mobility meeting the first condition as a target network layer based on the mobility corresponding to each candidate network layer;
determining a candidate transfer learning model to which the target network layer belongs as a target transfer learning model for the training sample set;
determining the target network layer and other network layers positioned in front of the target network layer as first network layers and determining other network layers positioned behind the target network layer as second network layers based on the position of the target network layer in the target transfer learning model;
initializing the parameters of the second network layer to obtain a third network layer;
constructing a transfer learning model for the set of training samples according to the first network layer and the third network layer.
8. An apparatus for determining a transfer learning model, the apparatus comprising:
the network layer determining module is used for determining a plurality of candidate network layers from at least one candidate transfer learning model, and one candidate transfer learning model corresponds to at least one candidate network layer; wherein different candidate transfer learning models are models obtained by training based on different training data;
the sample processing module is used for processing a training sample set based on the candidate network layer to obtain a sample coding information entropy and class coding information entropies respectively corresponding to a plurality of classes, and the training sample set comprises training samples belonging to different classes; the sample coding information entropy is used for indicating the information content contained in the training samples in the training sample set after being coded, and the class coding information entropy corresponding to the class is used for indicating the information content contained in the training samples belonging to the class in the training sample set after being coded;
a mobility determining module, configured to determine, according to the sample coding information entropy and the multiple category coding information entropies, a mobility of the candidate network layer, where the mobility is used to indicate a transfer learning effect of the candidate network layer on the training sample set;
and the model construction module is used for constructing, based on the mobility corresponding to each of the candidate network layers, a transfer learning model for the training sample set according to the candidate network layer whose mobility satisfies a first condition.
9. A computer device comprising a processor and a memory, wherein at least one program is stored in the memory, and the at least one program is loaded and executed by the processor to implement the method for determining a transfer learning model according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which at least one program is stored, the at least one program being loaded and executed by a processor to implement the method for determining a transfer learning model according to any one of claims 1 to 7.
CN202210757206.3A 2022-06-29 2022-06-29 Method, device and equipment for determining transfer learning model and storage medium Pending CN115115050A (en)

Publication: CN115115050A, published 2022-09-27.


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination