CN112052949B - Image processing method, device, equipment and storage medium based on transfer learning
- Publication number: CN112052949B
- Application number: CN202010852192.4A
- Authority: CN (China)
- Prior art keywords: network, sub, feature extraction, model, target
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/045 Combinations of networks
- G06N3/08 Learning methods
- G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06N3/096 Transfer learning
- Y02T10/40 Engine management systems
Abstract
The application discloses a method and a device for acquiring a target model, an electronic device, and a storage medium. The method for acquiring the target model comprises the following steps: pre-training an original model by using a first training sample to adjust network parameters of the original model, wherein the original model comprises a first sub-network for feature extraction; obtaining a target model by using a second sub-network and at least a partial structure of the pre-trained first sub-network, wherein the second sub-network is used for executing a target task based on the features extracted by the first sub-network; and training the target model by using a second training sample corresponding to the target task so as to adjust network parameters of the target model. By means of this scheme, the performance of the target model can be improved.
Description
Technical Field
The present application relates to the field of information technologies, and in particular, to an image processing method, apparatus, device, and storage medium based on transfer learning.
Background
Transfer learning aims at adapting an original model built for one task so as to obtain a target model that can be applied to a target task. With the rapid development of deep learning, computer vision, and other technologies, transfer learning has been applied in many scenarios. Before the target model is applied to the target task, it needs to be trained so that the trained target model can be smoothly applied to image processing; the more accurate the target model, the better the image processing result.
Therefore, how to improve the accuracy of the target model's image processing results is a subject of considerable value.
Disclosure of Invention
The application provides an image processing method, device, equipment and storage medium based on transfer learning.
The first aspect of the present application provides an image processing method based on transfer learning, comprising: pre-training an original model by using a first training sample to adjust network parameters of the original model, wherein the original model comprises a first sub-network for feature extraction; obtaining a target model by using a second sub-network and at least a partial structure of the pre-trained first sub-network, wherein the second sub-network is used for executing a target task based on the features extracted by the first sub-network; and training the target model by using a second training sample corresponding to the target task so as to adjust network parameters of the target model.
Therefore, the original model is pre-trained with the first training sample to adjust its network parameters, where the original model includes a first sub-network for feature extraction; a target model is then obtained by using the second sub-network and at least a partial structure of the pre-trained first sub-network, where the second sub-network is used to execute a target task based on the features extracted by the first sub-network; and the target model is trained with a second training sample corresponding to the target task to adjust the network parameters of the target model. In this way, the second training sample corresponding to the target task adjusts the original model not only in the network-parameter dimension but also in the network-structure dimension, which greatly increases the degree of freedom of network adjustment, fully exploits the potential of the pre-trained original model in both dimensions, and thereby improves the performance of the target model.
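As an illustration only, the following minimal Python sketch lays out the three stages described above; the helper callables (pretrain, search_optimal_subnetwork, connect, finetune) are hypothetical placeholders for the routines detailed later and are not part of the disclosure.

```python
def transfer_learning_pipeline(original_model, second_subnetwork,
                               first_samples, second_samples,
                               pretrain, search_optimal_subnetwork,
                               connect, finetune):
    """Hypothetical orchestration of the three stages; the four callables stand in
    for the training and search routines described in the text."""
    # Stage 1: pre-train the original model (containing the first sub-network
    # used for feature extraction) on the first training samples.
    pretrain(original_model, first_samples)

    # Stage 2: assemble the target model from the second sub-network and at least
    # a partial structure of the pre-trained first sub-network.
    optimal_subnetwork = search_optimal_subnetwork(original_model, second_subnetwork)
    target_model = connect(optimal_subnetwork, second_subnetwork)

    # Stage 3: train the target model on the second training samples that
    # correspond to the target task, adjusting its network parameters.
    finetune(target_model, second_samples)
    return target_model
```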
Wherein obtaining the target model using the second sub-network and at least part of the structure of the pre-trained first sub-network comprises: obtaining at least one candidate sub-network by utilizing different partial structures of the first sub-network, and selecting the candidate sub-network meeting the preset condition as an optimal sub-network; and obtaining a target model by utilizing the optimal sub-network and the second sub-network.
Therefore, by utilizing different partial structures of the first sub-network, at least one candidate sub-network is obtained, and the candidate sub-network meeting the preset condition is selected as the optimal sub-network, so that the optimal sub-network and the second sub-network are utilized to obtain the target model, the adjustment space of the network structure dimension can be expanded, and further the performance of the target model can be improved.
Wherein the preset conditions include at least one of: the number of the feature extraction units in the candidate sub-network reaches a preset number, and the candidate model obtained by the candidate sub-network and the second sub-network meets preset performance conditions.
Accordingly, by setting the preset condition to include at least one of the number of feature extraction units in the candidate sub-network reaching a preset number and the candidate model obtained with the candidate sub-network and the second sub-network meeting a preset performance condition, the selection of the optimal sub-network can take both the complexity and the performance of the target model into account.
The first sub-network comprises at least one branch network, each branch network comprises a plurality of network sections which are connected in sequence, and each network section comprises at least one feature extraction unit which is connected in sequence; the candidate sub-networks include at least one feature extraction unit in each network segment in the same branch network, and the feature extraction units in different candidate sub-networks are at least partially different.
Therefore, the first sub-network is set to include at least one branch network, each branch network includes a plurality of network segments connected in sequence, and each network segment includes at least one feature extraction unit connected in sequence; the candidate sub-networks each include at least one feature extraction unit in each network segment of the same branch network, and the feature extraction units in different candidate sub-networks are at least partially different. In this way, the first sub-network may be set either as a single-chain, single-branch network or as a multi-chain, multi-branch network, so the target model can be obtained from both multi-branch and single-branch networks, which helps expand the application range.
The first sub-network comprises one branch network, and obtaining at least one candidate sub-network by using different partial structures of the first sub-network and selecting a candidate sub-network meeting the preset condition as the optimal sub-network comprises the following steps: obtaining an initial at least one candidate sub-network by using at least one feature extraction unit in each network segment; selecting, from the at least one candidate sub-network, a candidate sub-network whose candidate model formed with the second sub-network satisfies the preset performance condition, as a selected sub-network; when the number of feature extraction units in the selected sub-network is smaller than the preset number, obtaining a new candidate sub-network by using the selected sub-network and at least one feature extraction unit that is not in the selected sub-network, and repeating the step of selecting a candidate sub-network whose candidate model formed with the second sub-network satisfies the preset performance condition and the subsequent steps; and when the number of feature extraction units in the selected sub-network is equal to the preset number, taking the selected sub-network as the optimal sub-network.
Therefore, an initial set of candidate sub-networks is obtained by using at least one feature extraction unit in each network segment, and the candidate sub-network whose candidate model formed with the second sub-network satisfies the preset performance condition is selected as the selected sub-network. When the number of feature extraction units in the selected sub-network is smaller than the preset number, a new candidate sub-network is obtained by using the selected sub-network and at least one feature extraction unit not in the selected sub-network, and the selection step and its subsequent steps are repeated; when the number of feature extraction units in the selected sub-network equals the preset number, the selected sub-network is taken as the optimal sub-network. In this way, candidate sub-networks satisfying the preset performance condition are selected while the number of feature extraction units gradually approaches the preset number, so the target model can be constrained in terms of model complexity while its model performance is improved.
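One way to picture the iteration described above is as a greedy search that repeatedly keeps the best-scoring candidate and grows it by one feature extraction unit until the preset number is reached. The sketch below is an illustrative reading of that procedure, with candidates represented as per-segment unit counts and with expand and evaluate as hypothetical callables; it is not the claimed implementation.

```python
def select_optimal_subnetwork(initial_candidates, expand, evaluate, preset_number):
    """Greedy search sketch.

    initial_candidates: candidate sub-networks as per-segment unit counts, e.g. [2, 1]
    expand(selected):    new candidates built from `selected` plus one more unit
    evaluate(candidate): performance score of the candidate model
                         (candidate sub-network connected to the second sub-network)
    """
    candidates = list(initial_candidates)
    selected = max(candidates, key=evaluate)      # best candidate model so far
    while sum(selected) < preset_number:          # fewer units than the preset number
        candidates = expand(selected)             # add a unit not yet in the sub-network
        if not candidates:                        # nothing left to add
            break
        selected = max(candidates, key=evaluate)
    return selected                               # taken as the optimal sub-network
```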
Wherein obtaining the initial at least one candidate sub-network using the at least one feature extraction unit in each network segment comprises: taking each network segment as a target segment, and obtaining the initial candidate sub-network corresponding to each target segment by using the first two feature extraction units in that target segment and the first feature extraction unit in the remaining network segments; and obtaining a new candidate sub-network by using the selected sub-network and at least one feature extraction unit not in the selected sub-network comprises: in the selected sub-network, determining the feature extraction unit located at the last position in each network segment as the target unit of the corresponding network segment; and obtaining new, different candidate sub-networks by using the selected sub-network together with, respectively, the first feature extraction unit located after the target unit in each of the different network segments.
Therefore, each network segment is taken as a target segment, and the first two feature extraction units in each target segment, together with the first feature extraction unit in the remaining network segments, are used to obtain the initial candidate sub-network corresponding to that target segment, which helps the network-structure adjustment start from the head of each network segment of the first sub-network. In the selected sub-network, the last feature extraction unit in each network segment is determined as the target unit of the corresponding network segment, and new, different candidate sub-networks are obtained by using the selected sub-network together with the first feature extraction unit located after the target unit in each of the different network segments. Thus, in the subsequent adjustment process, the feature extraction units of different network segments can be adjusted one by one, which helps improve the accuracy of network adjustment.
Wherein, in the case that the preset condition includes that the candidate model obtained by using the candidate sub-network and the second sub-network satisfies the preset performance condition, selecting a candidate sub-network meeting the preset condition as the optimal sub-network comprises: verifying the candidate model obtained by using the candidate sub-network and the second sub-network with a verification sample corresponding to the target task, to obtain a performance score of the candidate model for executing the target task; and determining, based on the performance score, whether the candidate model satisfies the preset performance condition. And/or, in the case that the preset condition includes that the number of feature extraction units in the candidate sub-network reaches a preset number, the first sub-network comprises a first number of feature extraction units and a second number of network segments, and the preset number is smaller than the first number and greater than or equal to the second number.
Therefore, under the condition that the preset condition comprises that the candidate model obtained by utilizing the candidate sub-network and the second sub-network meets the preset performance condition, the candidate model obtained by utilizing the candidate sub-network and the second sub-network is verified by utilizing a verification sample corresponding to the target task, the performance score of the candidate model for executing the target task is obtained, and whether the candidate model meets the preset performance condition is determined based on the performance score, so that the accuracy of selecting the optimal sub-network can be improved; in addition, when the preset condition includes that the number of feature extraction units in the candidate sub-network reaches the preset number, the first sub-network includes a first number of feature extraction units, and the first sub-network includes a second number of network segments, and the preset number is smaller than the first number and larger than or equal to the second number, which can be beneficial to reducing complexity of the target model.
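As one possible reading of the verification step, the performance score could be computed by running the candidate model over the verification samples of the target task, for example as top-1 accuracy; the PyTorch sketch below is an assumption for illustration, since the disclosure does not fix a particular metric.

```python
import torch

@torch.no_grad()
def performance_score(candidate_model, verification_loader):
    """Illustrative score: top-1 accuracy of the candidate model on verification
    samples corresponding to the target task."""
    candidate_model.eval()
    correct, total = 0, 0
    for images, labels in verification_loader:
        logits = candidate_model(images)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        total += labels.numel()
    return correct / max(total, 1)

def meets_preset_performance_condition(score, all_scores):
    # e.g. the preset performance condition: the score is the highest among all candidates
    return score >= max(all_scores)
```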
The first sub-network comprises at least one branch network, each branch network comprises a plurality of network segments connected in sequence, and each network segment comprises at least one feature extraction unit connected in sequence; and pre-training the original model with the first training sample to adjust the network parameters of the original model comprises: before each training, selecting one branch network by using a preset selection strategy, and selecting a feature extraction unit in each network segment of the selected branch network; and training, with the first training samples, the portion of each network segment that is located before the selected feature extraction unit, so as to adjust the network parameters of that portion.
Therefore, the first sub-network is set to include at least one branch network, each branch network includes a plurality of network segments connected in sequence, and each network segment includes at least one feature extraction unit connected in sequence; before each training, one branch network is selected and a feature extraction unit is selected in each network segment of the selected branch network, and the first training sample is used to train the portion of each network segment located before the selected feature extraction unit, so as to adjust the network parameters of that portion. This helps fully train every part of the first sub-network over multiple rounds of training and improves pre-training efficiency.
The feature extraction unit comprises a convolution layer, an activation layer and a batch normalization layer which are sequentially connected; and/or the first sub-network further comprises a downsampling layer located between adjacent network segments.
Therefore, the feature extraction unit is arranged to comprise a convolution layer, an activation layer and a batch normalization layer which are sequentially connected, which can improve the learning effect of the feature extraction unit during training; and arranging a downsampling layer between adjacent network segments in the first sub-network helps reduce feature dimensionality, compress the amount of data and parameters, reduce overfitting, and improve fault tolerance.
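By way of illustration, a feature extraction unit of this kind might be written in PyTorch as below; the concrete kernel size, channel widths, the choice of ReLU as the activation, and max pooling as the down-sampling layer are assumptions, not requirements of the scheme.

```python
import torch.nn as nn

class FeatureExtractionUnit(nn.Module):
    """Convolution layer -> activation layer -> batch normalization layer,
    connected in sequence."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)   # sigmoid or tanh would also fit the description
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        return self.bn(self.act(self.conv(x)))

# A down-sampling layer that may sit between adjacent network segments.
downsample = nn.MaxPool2d(kernel_size=2, stride=2)
```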
Wherein after pre-training the original model with the first training sample to adjust network parameters of the original model, and before obtaining the target model with the second sub-network and at least part of the structure of the pre-trained first sub-network, the method further comprises: the original model is trained using the second training samples to adjust network parameters of the original model.
Therefore, after pre-training, the original model is trained by using the second training sample corresponding to the target task so as to adjust the network parameters of the original model, which can be beneficial to improving the accuracy of subsequent network structure dimension adjustment.
Wherein after deriving the target model using the second sub-network and at least part of the structure of the pre-trained first sub-network, and before training the target model using the second training samples corresponding to the target tasks to adjust network parameters of the target model, the method further comprises: the target model is trained using the first training sample to adjust network parameters of the target model.
Therefore, after the dimension adjustment of the network structure is completed, the first training sample is utilized to train the target model, and then the second training sample corresponding to the target task is utilized to train the target model again, so that the performance of the target model can be improved.
The original model further comprises a third sub-network, wherein the third sub-network is used for executing a preset task based on the extracted features, and the preset task is the same as or different from the target task.
Therefore, by setting the original model to include the third sub-network for performing the preset task based on the extracted features, and the preset task being the same as or different from the target task, it is possible to facilitate further expansion of the range suitable for acquiring the target model.
Wherein the number of first training samples is greater than the number of second training samples.
Therefore, by setting the number of first training samples to be larger than the number of second training samples, it is possible to advantageously reduce the workload of sample labeling on the target task.
A second aspect of the present application provides an image processing apparatus based on transfer learning, comprising a first training module, a model acquisition module, and a second training module. The first training module is used for pre-training an original model by using a first training sample so as to adjust network parameters of the original model, wherein the original model comprises a first sub-network for feature extraction; the model acquisition module is used for obtaining a target model by using a second sub-network and at least a partial structure of the pre-trained first sub-network, wherein the second sub-network is used for executing a target task based on the features extracted by the first sub-network; and the second training module is used for training the target model by using a second training sample corresponding to the target task so as to adjust network parameters of the target model.
A third aspect of the present application provides an electronic device, including a memory and a processor coupled to each other, where the processor is configured to execute program instructions stored in the memory, so as to implement the image processing method based on transfer learning in the first aspect.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon program instructions which, when executed by a processor, implement the image processing method based on transfer learning in the first aspect described above.
According to the above scheme, the original model is pre-trained with the first training sample to adjust its network parameters, where the original model includes the first sub-network for feature extraction; the target model is obtained by using the second sub-network and at least a partial structure of the pre-trained first sub-network, where the second sub-network is used to execute the target task based on the features extracted by the first sub-network; and the target model is trained with the second training sample corresponding to the target task to adjust the network parameters of the target model. In this way, the second training sample corresponding to the target task adjusts the original model in both the network-parameter dimension and the network-structure dimension, which greatly increases the degree of freedom of network adjustment, fully exploits the potential of the pre-trained original model from both dimensions, and thereby improves the performance of the target model.
Drawings
FIG. 1 is a flow chart of an embodiment of an image processing method based on transfer learning according to the present application;
FIG. 2 is a schematic diagram of a framework of an embodiment of a first sub-network;
FIG. 3 is a schematic diagram of a framework of another embodiment of a first sub-network;
FIG. 4 is a flow chart of an embodiment of step S12 in FIG. 1;
FIG. 5 is a schematic diagram of a framework of an embodiment of a selected sub-network;
FIG. 6 is a flow chart of another embodiment of an image processing method based on transfer learning according to the present application;
FIG. 7 is a schematic diagram of a framework of an image processing apparatus based on transfer learning according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a framework of an embodiment of an electronic device of the present application;
FIG. 9 is a schematic diagram of a framework of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes embodiments of the present application in detail with reference to the drawings.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association relationship between associated objects, meaning that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship. Further, "a plurality" herein means two or more than two.
Referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of an image processing method based on transfer learning according to the present application. Specifically, the method may include the steps of:
step S11: pre-training an original model by using a first training sample to adjust network parameters of the original model; wherein the original model comprises a first subnetwork for feature extraction.
In one implementation scenario, the first sub-network may specifically comprise a plurality of network segments, and each network segment may comprise at least one feature extraction unit connected in sequence. Specifically, the feature extraction unit is configured to perform feature extraction and may include a convolution layer, an activation layer, and a batch normalization layer (Batch Normalization, BN) that are sequentially connected. The convolution layer may include a plurality of convolution kernels and is used to extract features; the activation layer may be sigmoid, tanh, ReLU, etc., and is used to introduce nonlinear factors; and the batch normalization layer is used to perform a normalization operation. Through the sequentially connected convolution layer, activation layer, and batch normalization layer, the learning effect of the feature extraction unit during training can be improved. In addition, in the feature extraction unit, a pooling layer may further be connected after the convolution layer and is configured to downsample the features extracted by the convolution layer. Furthermore, the first sub-network may further comprise a downsampling layer located between adjacent network segments, which helps reduce feature dimensionality, compress the amount of data and parameters, reduce overfitting, and improve fault tolerance. Moreover, the number of feature extraction units included in each network segment may be the same, e.g., each network segment includes 3, 4, or 5 feature extraction units; or the number of feature extraction units included in each network segment may be entirely different, e.g., a first network segment includes 3 feature extraction units, a second network segment includes 4 feature extraction units, and a third network segment includes 5 feature extraction units; or the number of feature extraction units included in each network segment may be not exactly the same, e.g., the first network segment includes 3 feature extraction units, the second network segment also includes 3 feature extraction units, and the third network segment includes 5 feature extraction units. This may be specifically set according to practical application needs and is not limited herein.
In a specific implementation scenario, several of the above-mentioned network segments may have the same input node, in which case the first sub-network may have multiple branch networks, each of which may specifically comprise a plurality of sequentially connected network segments. Referring to fig. 2 in combination, fig. 2 is a schematic diagram of a frame of an embodiment of a first sub-network, as shown in fig. 2, in which a dashed rectangle indicates network segments, each network segment includes 4 feature extraction units, the first sub-network includes three branch networks, the first branch network is a first row of network segments (for simplifying the schematic illustration, the first branch network schematically depicts only one network segment), the second branch network is a second row of network segments, the third branch network is a third row of network segments (for simplifying the schematic illustration, the third branch network schematically depicts only one network segment), and the three branch networks have the same input nodes. In other scenarios, the first sub-network including other numbers of branch networks may be set according to actual application needs, for example, the first sub-network may be set to include a 2-way branch network, a 4-way branch network, and the like, and specifically, the first sub-network may be similar, which is not limited herein.
In another specific implementation scenario, the plurality of network segments may also be connected sequentially, in which case the first sub-network may have only one branch network, and the branch network includes a plurality of network segments connected sequentially. Referring to fig. 3 in combination, fig. 3 is a schematic diagram of a frame of another embodiment of the first subnetwork, as shown in fig. 3, the dashed rectangle indicates network segments, each network segment includes 4 feature extraction units, and the first subnetwork includes two network segments connected in sequence, in which case the first subnetwork includes only one branch network.
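For a single-branch first sub-network such as the one in fig. 3, the segment and unit layout might be expressed as in the following PyTorch sketch; the two-segment, four-unit configuration and the channel width are illustrative assumptions, and the optional depths argument anticipates the partial-structure selection described later.

```python
import torch.nn as nn

def make_unit(channels):
    # one feature extraction unit: convolution -> activation -> batch normalization
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(channels),
    )

class SingleBranchFirstSubnetwork(nn.Module):
    """Two network segments of four units each (as in fig. 3), with a
    down-sampling layer between adjacent segments. Sizes are illustrative."""
    def __init__(self, channels=64, units_per_segment=(4, 4)):
        super().__init__()
        self.segments = nn.ModuleList(
            nn.ModuleList(make_unit(channels) for _ in range(n))
            for n in units_per_segment
        )
        self.downsample = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x, depths=None):
        # depths[i] = number of units of segment i to run (None -> all units), so a
        # partial structure of the first sub-network can be selected at run time.
        for i, segment in enumerate(self.segments):
            k = len(segment) if depths is None else depths[i]
            for unit in list(segment)[:k]:
                x = unit(x)
            if i < len(self.segments) - 1:
                x = self.downsample(x)
        return x
```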
In yet another specific implementation scenario, the original model may include, in addition to the first sub-network, another sub-network for performing a preset task based on the extracted features. The preset task may specifically be a target detection task, an image classification task, a scene segmentation task, and the like, which are not limited herein. The target detection task means that a target object is detected in an image, for example, a vehicle, a pedestrian, or the like is detected in the image; the image classification task means classifying an image into a certain category, for example, classifying an image as a cat, a dog, a tortoise, or the like; and the scene segmentation task means detecting the category to which each pixel point in the image belongs, for example, the pixel points respectively belonging to the lane, the vehicle, the green belt, and the sky in the image. These examples of the preset task are only possible use cases in practical applications and do not limit the scope of use. The specific structure of the other sub-network may be set according to practical application requirements and is not limited herein. For example, in the case that the preset task is a target detection task or an image classification task, the other sub-network may specifically include a plurality of (e.g., 2, 3, etc.) sequentially connected fully connected layers, softmax layers, and the like, which are not limited herein; or, in the case that the preset task is scene segmentation, the other sub-network may specifically include a fully connected layer and a softmax layer, which is not limited herein.
In one implementation scenario, to improve accuracy of the pre-training, the first training samples may specifically be a large-scale data set, i.e., the number of first training samples is greater than a preset value (e.g., 1000, 5000, 10000, etc.). Therefore, the first training sample can be used to fully pre-train the original model, which improves the accuracy of the target model obtained later. In one implementation scenario, to increase the efficiency of pre-training, the first sub-network may include only one branch network, and the branch network may specifically include a plurality of sequentially connected network segments. In this case, a feature extraction unit may be selected in each network segment using a preset selection strategy prior to each training, so that the portion of each network segment that precedes the selected feature extraction unit may be trained using the first training samples to adjust the network parameters of that portion. In this way, the pre-training efficiency can be improved. Specifically, the preset selection policy may include: a feature extraction unit is randomly selected within each network segment.
In a specific implementation scenario, the preset selection policy may specifically include: randomly sampling within a preset numerical range corresponding to each network segment, where the upper limit of the preset numerical range is the number of feature extraction units contained in the corresponding network segment (for convenience of description, denoted n_i for the i-th network segment) and the lower limit of the preset numerical range may be 1. An integer value may be obtained by random sampling within the preset numerical range from 1 to n_i; for convenience of description, the integer value obtained by randomly sampling for the i-th network segment may be denoted k_i. The first training sample may then be used to train the first k_i (i.e., the 1st to k_i-th) feature extraction units of each network segment, so as to adjust the network parameters of the first k_i (i.e., the 1st to k_i-th) feature extraction units of that network segment.
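A minimal sketch of this sampling strategy, assuming the SingleBranchFirstSubnetwork module sketched above, is shown below; the helper name sample_depths is hypothetical.

```python
import random

def sample_depths(units_per_segment):
    # draw k_i uniformly from 1..n_i for every network segment
    return [random.randint(1, n) for n in units_per_segment]

# Usage sketch for one pre-training step (net is a SingleBranchFirstSubnetwork):
#   depths = sample_depths([len(seg) for seg in net.segments])
#   features = net(images, depths=depths)   # only the first k_i units of each segment run
#   ... compute the loss with the task head and update the parameters of those units ...
```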
In another specific implementation scenario, as indicated by the dashed arrow in fig. 3, when the portion of each network segment located before the selected feature extraction unit is trained using the first training sample, the output result of the selected feature extraction unit within each network segment may be used as the input data of the next network segment. Referring to fig. 3 in combination, as shown in the drawing, when the feature extraction unit selected in the first network segment is the 2nd feature extraction unit, the output result of the 2nd feature extraction unit can be used as the input data of the next network segment; alternatively, when the feature extraction unit selected in the first network segment is the 3rd feature extraction unit, the output result of the 3rd feature extraction unit may be used as the input data of the next network segment, and the other cases may be deduced similarly, which are not exemplified here.
In one implementation scenario, where the first subnetwork includes multiple branch networks, a path of the branch network may be selected in the first subnetwork prior to each training using a preset selection policy, and a feature extraction unit may be selected in each network segment included in the selected branch network, such that a portion of each network segment included in the selected branch network that is located before the selected feature extraction unit may be trained using the first training samples to adjust network parameters of the portion of each network segment included in the selected branch network that is located before the selected feature extraction unit. The specific manner of selecting the feature extraction unit in the network segment can be referred to in the foregoing related description, and will not be described herein.
In a specific implementation scenario, the manner in which the branch network is selected in the first sub-network may refer to the manner in which the feature extraction unit is selected in a network segment. Specifically, random sampling may be performed within a preset numerical range, where the upper limit of the preset numerical range is the number of branch networks included in the first sub-network (for convenience of description, denoted N, indicating that the first sub-network includes N branch networks) and the lower limit may be 1. An integer value may be obtained by random sampling within the preset numerical range from 1 to N; for convenience of description, the integer value obtained by random sampling may be denoted S, indicating that the S-th branch network is selected in the first sub-network. Referring to fig. 2 in combination, the first sub-network shown in fig. 2 includes 3 branch networks, so random sampling may be performed within 1 to 3 to obtain an integer value; for example, if 2 is obtained, the 2nd branch network may be selected, and a feature extraction unit is then selected in each network segment included in the 2nd branch network, for which reference may be made to the above description, and details are not repeated herein.
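For the multi-branch case, one possible reading is that the branch index is sampled first and the per-segment depths are then sampled inside the chosen branch; the sketch below is only an illustration under that assumption, with the branch/segment shapes passed in as plain lists.

```python
import random

def sample_branch_and_depths(units_per_segment_per_branch):
    """units_per_segment_per_branch[b][i] = number of feature extraction units in
    segment i of branch b. Returns the selected branch index S (0-based here)
    and one depth k_i per segment of that branch."""
    num_branches = len(units_per_segment_per_branch)
    s = random.randint(1, num_branches) - 1                 # pick the S-th branch
    depths = [random.randint(1, n) for n in units_per_segment_per_branch[s]]
    return s, depths

# e.g. a three-branch first sub-network, each drawn segment having 4 units
# (shapes are illustrative only):
# branch, depths = sample_branch_and_depths([[4], [4, 4, 4], [4]])
```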
In one implementation scenario, the pre-training of the original model may be ended when a preset end condition is met. Specifically, the preset end condition may include: the number of times each first training sample participates in training reaches a preset number of times threshold, and the preset number of times threshold can be set according to practical application requirements, for example, can be set to 100, 120, 150, etc., and is not limited herein.
Step S12: a target model is obtained using at least part of the structure of the second sub-network and the pre-trained first sub-network.
In an embodiment of the disclosure, the second sub-network is configured to perform the target task based on the features extracted by the first sub-network. In one implementation scenario, the target task may specifically include any of the following: the specific meanings of the object detection task, the image classification task, and the scene segmentation task may be referred to the foregoing description, and will not be repeated herein. In addition, as described above, the original model may further include a third sub-network for performing a preset task based on the extracted features, which may be the same as or different from the target task. For example, the preset task and the target task may be both image classification tasks or both target detection tasks; alternatively, the preset task is a target detection task, and the target task is an image classification task, which is not limited herein. The third subnetwork may be the same as the second subnetwork or may be different from the second subnetwork. For example, the third subnetwork may include a fully-connected layer and a softmax layer, the second subnetwork may include two fully-connected layers connected in series and a softmax layer connected after the two fully-connected layers, or the second subnetwork may include a fully-connected layer and a softmax layer as with the third subnetwork, without limitation. Furthermore, in the embodiments of the present disclosure, at least part of the structure of the first sub-network includes trained network parameters, i.e. both the network parameters and the network structure are adjusted during the transfer learning process.
In one implementation scenario, different partial structures of the first sub-network may be utilized to obtain at least one candidate sub-network, and a candidate sub-network satisfying a preset condition is selected as an optimal sub-network, so that the optimal sub-network and the second sub-network may be utilized to obtain the target model.
In a specific implementation scenario, the candidate subnetworks include at least one feature extraction unit in each network segment in the same branch network, and the feature extraction units in different candidate subnetworks are at least partially different, so that each time the partial structure of the first subnetwork is selected, a branch network may be selected, and a feature extraction unit is selected in each network segment of the selected branch network, so that a combination of the parts of each network segment that precede the selected feature extraction unit may be used as a partial structure of the first subnetwork, and thus different partial structures of the first subnetwork may be obtained. Taking the example that the first sub-network includes only one branch network, referring to fig. 3, when the partial structure of the first sub-network is selected for the first time, a third feature extraction unit may be selected randomly in the first network section, and a second feature extraction unit may be selected randomly in the second network section, where a combination of a portion of the first network section located before the third feature extraction unit and a portion of the second network section located before the second feature extraction unit may be used as the partial structure of the first sub-network; when the partial structure of the first sub-network is selected for the second time, the second feature extraction unit may be selected randomly in the first network section, and the third feature extraction unit may be selected randomly in the second network section, then a combination of a portion of the first network section located before the second feature extraction unit and a portion of the second network section located before the third feature extraction unit may be used as the partial structure of the first sub-network, and so on, and will not be exemplified herein. Specifically, the number of times of selection may be set according to practical application requirements, for example, 10, 15, 20, or the like may be selected according to the computational complexity, which is not limited herein.
In another specific implementation scenario, the preset condition may specifically include: and the candidate model obtained by utilizing the candidate sub-network and the second sub-network meets the preset performance condition. Specifically, a verification sample corresponding to the target task may be used to verify the candidate model obtained by using the candidate sub-network and the second sub-network, so as to obtain a performance score of the candidate model for executing the target task, thereby determining whether the candidate model meets a preset performance condition based on the performance score. For example, when the performance score is at the highest value of the performance scores of all candidate models, the corresponding candidate model may be considered to satisfy the preset performance condition.
In yet another specific implementation scenario, the preset condition may specifically further include: the number of feature extraction units in the candidate subnetwork is not greater than a preset number. The preset number may be set according to practical application requirements, for example, may be set to 4, 5, 6, etc., which is not limited herein. In the above manner, the complexity of the target model may be constrained.
In yet another specific implementation scenario, the preset condition may specifically include: the candidate model obtained by the candidate sub-network and the second sub-network meets the preset performance condition, and the number of the feature extraction units in the candidate sub-network is not more than the preset number. Reference may be made specifically to the foregoing description, and details are not repeated here.
In yet another specific implementation scenario, the optimal sub-network and the second sub-network may be connected sequentially to obtain the target model.
In another implementation scenario, the candidate subnetworks include at least one feature extraction unit in each network segment of the same network branch, and feature extraction units in different candidate subnetworks are at least partially different, so that in order to obtain a global optimal solution, all different partial structures in the first subnetwork can be exhausted, each partial structure is used as a corresponding candidate subnetwork, and a candidate subnetwork meeting a preset condition is selected as an optimal subnetwork, so that the optimal subnetwork and the second subnetwork can be utilized to obtain the target model. Specifically, the setting manner of the preset condition may refer to the foregoing description, and will not be repeated herein.
In a specific implementation scenario, taking the case where the first sub-network includes only one branch network as an example, when all the different partial structures in the first sub-network are exhausted, a count value may be assigned to each network segment in the first sub-network in order to avoid repeated selection; for convenience of description, the count value of the i-th network segment may be denoted c_i, and it is used to select the first c_i feature extraction units in that network segment each time a partial structure is selected, where the number of network segments included in the first sub-network may be denoted M. The initial value of each count value c_i may be set to 1, and after each selection of a partial structure the count is incremented by 1, until all the different partial structures in the first sub-network are exhausted. Taking the example that the first sub-network includes 4 network segments and each network segment also includes 4 feature extraction units: when a partial structure is selected for the first time, the count values of the 4 network segments may be recorded as 1111, and the combination of the first feature extraction unit in each network segment may be used as the partial structure selected this time; when a partial structure is selected for the second time, the count values may be recorded as 1112, and the combination of the first feature extraction unit of the first 3 network segments and the first 2 feature extraction units of the last network segment may be used as the partial structure selected this time; when a partial structure is selected for the third time, the count values may be recorded as 1113, and the combination of the first feature extraction unit of the first 3 network segments and the first 3 feature extraction units of the last network segment may be used as the partial structure selected this time; when a partial structure is selected for the fourth time, the count values may be recorded as 1114, and the first feature extraction unit of the first 3 network segments together with all feature extraction units of the last network segment may be used as the partial structure selected this time; when a partial structure is selected for the fifth time, the count values may be recorded as 1121, and the combination of the first feature extraction unit of the first 2 network segments, the first 2 feature extraction units of the third network segment, and the first feature extraction unit of the last network segment may be used as the partial structure selected this time; when a partial structure is selected for the sixth time, the count values may be recorded as 1122, and the combination of the first feature extraction unit of the first 2 network segments and the first 2 feature extraction units of the last 2 network segments may be used as the partial structure selected this time; and so on, which is not described herein.
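Walking the per-segment count values through every combination is equivalent to enumerating the Cartesian product of the per-segment unit counts; the sketch below illustrates this with itertools.product and is not the claimed implementation.

```python
from itertools import product

def enumerate_partial_structures(units_per_segment):
    """Yield every partial structure of a single-branch first sub-network as a
    tuple of count values (c_1, ..., c_M), e.g. (1, 1, 1, 2) means the first unit
    of segments 1-3 plus the first two units of segment 4."""
    ranges = [range(1, n + 1) for n in units_per_segment]
    yield from product(*ranges)

# For 4 segments of 4 units each this yields 1111, 1112, 1113, 1114, 1121, ...
# in the same order as the count-value description above (last segment varies fastest).
for counts in enumerate_partial_structures([4, 4, 4, 4]):
    pass  # build the candidate sub-network from `counts` and evaluate it
```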
In one implementation scenario, in order to improve the accuracy of the network-structure-dimension adjustment, before the target model is obtained by using the second sub-network and at least a partial structure of the pre-trained first sub-network, the original model may be trained with the second training sample to adjust its network parameters, so that the original model first undergoes network-parameter adjustment based on the target task, which can further improve the accuracy of the subsequent network-structure-dimension adjustment. In this case, the at least partial structure of the first sub-network includes the network parameters adjusted by the training on the second training sample; that is, both the network parameters and the network structure are adjusted during the transfer learning process.
In a specific implementation scenario, the number of the first training samples may be greater than the number of the second training samples, so according to the embodiment disclosed by the application, a target model suitable for a target task can be obtained on the basis of a large-scale data sample (i.e., the first training sample) and a small-scale data sample (i.e., the second training sample) corresponding to the target task, thereby being beneficial to reducing the difficulty of collecting the small-scale data sample and the workload of labeling the second training sample, and further being beneficial to further improving the efficiency of obtaining the target model. Specifically, the number of the first training samples may be 5000, 10000, 15000, etc., and correspondingly, the number of the second training samples may be 100, 200, 300, etc., which may be specifically set according to actual use conditions, and is not limited herein.
Step S13: and training the target model by using a second training sample corresponding to the target task so as to adjust network parameters of the target model.
In one implementation scenario, the target model may be trained using only the second training samples corresponding to the target task to adjust network parameters of the target model. Specifically, the number of the second training samples is not greater than the number of the first training samples, for example, the second training samples are small-scale data samples, and the first training samples are large-scale data samples, which can be referred to in the foregoing description, and will not be described herein.
In another implementation scenario, in order to improve accuracy of the target model, the target model may be trained by using a first training sample to adjust network parameters of the target model, and then training the target model by using a second training sample corresponding to the target task to adjust network parameters of the target model again.
According to the above scheme, the original model is pre-trained with the first training sample to adjust its network parameters, where the original model includes the first sub-network for feature extraction; the target model is obtained by using the second sub-network and at least a partial structure of the trained first sub-network, where the second sub-network is used to execute the target task based on the features extracted by the first sub-network; and the target model is trained with the second training sample corresponding to the target task to adjust the network parameters of the target model. In this way, the second training sample corresponding to the target task adjusts the original model in both the network-parameter dimension and the network-structure dimension, which greatly increases the degree of freedom of network adjustment, fully exploits the potential of the pre-trained original model from both dimensions, and thereby improves the performance of the target model.
Referring to fig. 4, fig. 4 is a flowchart illustrating an embodiment of step S12 in fig. 1. Specifically, in the embodiment of the present disclosure, the original model has a single-branch network structure; that is, the first sub-network includes one branch network, the branch network includes a plurality of network segments connected in sequence, each network segment includes at least one feature extraction unit connected in sequence, the candidate sub-networks each include at least one feature extraction unit in each network segment, and the feature extraction units in different candidate sub-networks are at least partially different. The method specifically comprises the following steps:
step S41: at least one candidate sub-network is initially obtained using at least one feature extraction unit in each network segment.
Specifically, each network segment may be taken as a target segment, and the first two feature extraction units in each target segment and the first feature extraction unit in the remaining network segments may be used to obtain the initial candidate sub-network corresponding to that target segment. Referring to fig. 3 in combination, the first network segment and the second network segment may each be used as a target segment. The first two feature extraction units in the first target segment and the first feature extraction unit in the second network segment may be used to obtain the initial candidate sub-network of the first target segment; specifically, as shown by the dashed arrow in fig. 3, the output result of the second feature extraction unit in the first target segment may be used as the input data of the second network segment. Likewise, the first two feature extraction units in the second target segment and the first feature extraction unit in the first network segment may be used to obtain the initial candidate sub-network of the second target segment; specifically, as shown by the dashed arrow in fig. 3, the output result of the first feature extraction unit in the first network segment may be used as the input data of the second target segment. With continued reference to fig. 3, for convenience of description, the initial candidate sub-network corresponding to the first target segment may be denoted as [2,1], which indicates that it is composed of the first two feature extraction units of the first network segment and the first feature extraction unit of the second network segment, and the initial candidate sub-network corresponding to the second target segment may be denoted as [1,2], which indicates that it is composed of the first feature extraction unit of the first network segment and the first two feature extraction units of the second network segment; other cases may be deduced similarly and are not exemplified herein.
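Representing a candidate sub-network by its per-segment unit counts, as in the [2,1] notation above, the initial candidates could be generated as follows; this is only a sketch under that representation.

```python
def initial_candidates(num_segments):
    """One initial candidate per target segment: two units in the target segment,
    one unit in every other segment. For two segments this gives [2, 1] and [1, 2]."""
    candidates = []
    for target in range(num_segments):
        counts = [1] * num_segments
        counts[target] = 2
        candidates.append(counts)
    return candidates

print(initial_candidates(2))  # [[2, 1], [1, 2]]
```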
Step S42: select, from the at least one candidate sub-network, a candidate sub-network whose candidate model formed with the second sub-network satisfies the preset performance condition, as the selected sub-network.
Specifically, a verification sample corresponding to the target task may be used to verify the candidate model obtained from the candidate sub-network and the second sub-network, so as to obtain a performance score of the candidate model for executing the target task, and whether the candidate model satisfies the preset performance condition may be determined based on the performance score. Reference may be made specifically to the relevant descriptions in the foregoing disclosed embodiments, and details are not repeated here. For example, the candidate sub-network [2,1] may be taken as the selected sub-network after verification with the verification sample. In other cases, other candidate sub-networks may be selected as the selected sub-network according to the actual situation, which is not limited herein.
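A hedged sketch of the selection in step S42 follows; the helper names `build_model` and `evaluate` are placeholders supplied by the caller, not an API defined in the disclosure. Each candidate sub-network is assembled with the second sub-network into a candidate model, scored on the verification samples, and the best candidate that passes the preset performance condition is kept.

```python
def select_subnetwork(candidates, build_model, evaluate, min_score):
    """Sketch of step S42. build_model(code) assembles the candidate sub-network with
    the second sub-network; evaluate(model) returns a performance score on the
    verification samples of the target task; min_score stands in for the preset
    performance condition. Assumes at least one candidate passes the condition."""
    best_code, best_score = None, float("-inf")
    for code in candidates:
        score = evaluate(build_model(code))
        if score >= min_score and score > best_score:
            best_code, best_score = code, score
    return best_code, best_score
```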
Step S43: judge whether the number of feature extraction units in the selected sub-network is smaller than the preset number; if so, execute step S44, otherwise execute step S46.
In the embodiment of the disclosure, the preset number may be set in advance according to the needs of the actual application, and in particular according to the desired complexity of the target model; for example, it may be set to 4, 5, 6, etc., which is not limited herein. For example, when the preset number is 4, since the number of feature extraction units in the selected sub-network [2,1] is 3, which is smaller than the preset number, step S44 may be executed; or, when the preset number is 3, since the number of feature extraction units in the selected sub-network [2,1] is 3 and thus not smaller than the preset number, step S46 may be executed, i.e., the selected sub-network [2,1] is directly taken as the optimal sub-network. Other cases may be deduced in the same way and are not enumerated here.
In one implementation scenario, the first sub-network may include a first number of feature extraction units and a second number of network segments, and the preset number may be set to be smaller than the first number and greater than or equal to the second number.
Step S44: obtain a new candidate sub-network by using the selected sub-network and at least one feature extraction unit not in the selected sub-network.
In the embodiment of the present disclosure, the at least one feature extraction unit may be one feature extraction unit, two feature extraction units, or the like, which is not limited herein. For example, when the accuracy requirement on network structure adjustment is relatively low, the at least one feature extraction unit may be two, three, or more feature extraction units; when the accuracy requirement on network structure adjustment is higher, it may be a single feature extraction unit. The number may be set according to the needs of the actual application and is not limited herein.
In one implementation scenario, in order to improve the accuracy of network structure adjustment, the feature extraction unit located at the last position of each network segment in the selected sub-network may be determined as the target unit of the corresponding network segment, so that the selected sub-network, together with the first feature extraction unit located after the target unit in each different network segment, is used to obtain new, different candidate sub-networks.
In one implementation scenario, a new, different candidate sub-network may be obtained for each network segment by appending, after the target unit of that segment, the first feature extraction unit located after the target unit. With continued reference to fig. 3, taking the selected sub-network [2,1] as an example: in the first network segment, the first feature extraction unit located after the target unit (i.e., the second feature extraction unit) is appended after that target unit to obtain a new candidate sub-network, which may be denoted as [3,1] for convenience of description and is composed of the first three feature extraction units of the first network segment and the first feature extraction unit of the second network segment; and in the second network segment, the first feature extraction unit located after the target unit (i.e., the first feature extraction unit) is appended after that target unit to obtain a new candidate sub-network, which may be denoted as [2,2] for convenience of description. Other cases may be deduced in the same way and are not enumerated here.
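As a sketch of step S44 under the same assumed encoding, each new candidate appends, in one network segment, the first feature extraction unit after that segment's target unit, which reproduces the [2,1] -> [3,1] / [2,2] expansion described above:

```python
def expand(selected, units_per_segment):
    """Sketch of step S44: grow the selected sub-network by one unit per segment,
    yielding one new candidate for every segment that still has unused units."""
    new_candidates = []
    for i, kept in enumerate(selected):
        if kept < units_per_segment[i]:   # a unit located after the target unit exists
            grown = list(selected)
            grown[i] = kept + 1
            new_candidates.append(grown)
    return new_candidates

print(expand([2, 1], units_per_segment=[3, 3]))  # [[3, 1], [2, 2]]
```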
Step S45: step S42 and the subsequent steps are re-executed.
Specifically, when the number of feature extraction units in the selected sub-network is smaller than the preset number, the above step S42 may be re-executed: a candidate sub-network satisfying the preset performance condition is again selected as the selected sub-network, and it is then judged whether the number of feature extraction units in the selected sub-network is smaller than the preset number. Referring to fig. 5 in combination, fig. 5 is a schematic framework diagram of an embodiment of a selected sub-network. As shown in fig. 5, feature extraction units drawn with solid rectangular frames represent selected feature extraction units, and feature extraction units drawn with dotted rectangular frames represent unselected feature extraction units; the selected sub-network shown in fig. 5 is [2,2]. Taking the selected sub-network [2,2] and a preset number of 4 as an example, since the number of feature extraction units in the selected sub-network is equal to 4, the following step S46 can be executed, i.e., the selected sub-network [2,2] can be taken as the optimal sub-network.
Step S46: and taking the selected sub-network as an optimal sub-network.
Specifically, when the number of feature extraction units in the selected sub-network is not smaller than the preset number, the selected sub-network may be taken as the optimal sub-network. In this way, the sub-network with the best performance can be obtained under the model complexity constrained by the preset number.
Step S47: and obtaining a target model by utilizing the optimal sub-network and the second sub-network.
Specifically, the optimal sub-network and the second sub-network may be sequentially connected to obtain the target model.
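Putting steps S41 to S47 together, the overall search can be sketched as the loop below, reusing the helpers sketched earlier; all names are assumptions for illustration, and the stopping test mirrors steps S43/S46 (stop once the selected sub-network reaches the preset number of feature extraction units).

```python
def search_optimal_subnetwork(units_per_segment, build_model, evaluate,
                              min_score, preset_number):
    """Illustrative sketch of steps S41-S46; the returned code describes the
    optimal sub-network, which is then connected in sequence with the second
    sub-network to obtain the target model (step S47)."""
    candidates = initial_candidates(len(units_per_segment))           # step S41
    while True:
        selected, _ = select_subnetwork(candidates, build_model,      # step S42
                                        evaluate, min_score)
        if sum(selected) >= preset_number:                            # steps S43 / S46
            return selected
        candidates = expand(selected, units_per_segment)              # step S44, back to S42
```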
Different from the foregoing embodiment, an initial at least one candidate sub-network is obtained by using at least one feature extraction unit in each network segment, and a candidate sub-network whose candidate model formed with the second sub-network satisfies the preset performance condition is selected from the at least one candidate sub-network as the selected sub-network; when the number of feature extraction units in the selected sub-network is smaller than the preset number, a new candidate sub-network is obtained by using the selected sub-network and at least one feature extraction unit not in the selected sub-network, and the selecting step and its subsequent steps are re-executed; when the number of feature extraction units in the selected sub-network is equal to the preset number, the selected sub-network is taken as the optimal sub-network. In this way, a candidate sub-network satisfying the preset performance condition is selected while the number of feature extraction units gradually approaches the preset number, so that the efficiency of obtaining the target model can be improved, and the target model can be constrained not only at the level of model complexity but also at the level of model performance.
Referring to fig. 6, fig. 6 is a flowchart illustrating another embodiment of an image processing method based on transfer learning according to the present application. Specifically, in this embodiment of the disclosure, the original model includes a first sub-network for feature extraction, the first sub-network includes one branch network, and the branch network includes a plurality of network segments connected in sequence, each network segment including at least one feature extraction unit connected in sequence. The method specifically includes the following steps:
Step S601: before each training of the original model, select a feature extraction unit in each network segment by using a preset selection strategy.
Reference may be made specifically to the relevant descriptions in the foregoing disclosed embodiments, and details are not repeated here.
Step S602: the portion of each network segment that is located before the selected feature extraction unit is trained using the first training samples to adjust network parameters of the portion of each network segment that is located before the selected feature extraction unit.
Reference may be made specifically to the relevant descriptions in the foregoing disclosed embodiments, and details are not repeated here.
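A minimal sketch of steps S601 and S602 follows, assuming a uniform random choice as the "preset selection strategy" and reading "the portion located before the selected feature extraction unit" as the prefix up to and including that unit (under a stricter reading the selected unit itself would be excluded); only the sampled prefix is run and updated in that training step.

```python
import random

def sample_subpath(segments, rng=random):
    """segments: list of network segments, each a list of feature extraction units
    (callables/modules). Returns the flattened prefix trained in this step."""
    subpath = []
    for units in segments:
        selected = rng.randrange(len(units))   # assumed selection strategy: uniform
        subpath.extend(units[:selected + 1])   # prefix of the segment up to the selected unit
    return subpath
```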
Step S603: the original model is trained by using a second training sample corresponding to the target task to adjust network parameters of the original model.
Reference may be made specifically to the relevant descriptions in the foregoing disclosed embodiments, and details are not repeated here.
Step S604: at least one candidate sub-network is initially obtained using at least one feature extraction unit in each network segment.
Reference may be made specifically to the relevant descriptions in the foregoing disclosed embodiments, and details are not repeated here.
Step S605: select, from the at least one candidate sub-network, a candidate sub-network whose candidate model formed with the second sub-network satisfies the preset performance condition, as the selected sub-network.
In an embodiment of the disclosure, the second sub-network is configured to perform the target task based on the features extracted by the first sub-network. Reference may be made specifically to the relevant descriptions in the foregoing disclosed embodiments, and details are not repeated here.
Step S606: judge whether the number of feature extraction units in the selected sub-network is smaller than the preset number; if so, execute step S607, otherwise execute step S609.
Reference may be made specifically to the relevant descriptions in the foregoing disclosed embodiments, and details are not repeated here.
Step S607: and obtaining a new candidate sub-network by using the selected sub-network and at least one feature extraction unit not in the selected sub-network.
Reference may be made specifically to the relevant descriptions in the foregoing disclosed embodiments, and details are not repeated here.
Step S608: step S605 and the subsequent steps are re-executed.
Reference may be made specifically to the relevant descriptions in the foregoing disclosed embodiments, and details are not repeated here.
Step S609: and taking the selected sub-network as an optimal sub-network.
Reference may be made specifically to the relevant descriptions in the foregoing disclosed embodiments, and details are not repeated here.
Step S610: and obtaining a target model by utilizing the optimal sub-network and the second sub-network.
Referring to fig. 5 in combination, when the selected subnetwork is subnetwork [2,2] shown in fig. 5 and the preset number is 4, the selected subnetwork [2,2] can be used as an optimal subnetwork, and the optimal subnetwork and the second subnetwork are utilized to obtain the target model. Specifically, the optimal sub-network and the second sub-network may be sequentially connected to obtain the target model.
Reference may be made specifically to the relevant descriptions in the foregoing disclosed embodiments, and details are not repeated here.
Step S611: the target model is trained using the first training sample to adjust network parameters of the target model.
With continued reference to fig. 5, when the optimal subnetwork is the selected subnetwork [2,2] shown in fig. 5, the first training sample may be used to train the target model formed by the selected subnetwork [2,2] and the second subnetwork, so as to adjust the network parameters of the target model.
Reference may be made specifically to the relevant descriptions in the foregoing disclosed embodiments, and details are not repeated here.
Step S612: and training the target model by using a second training sample corresponding to the target task so as to adjust network parameters of the target model.
With continued reference to fig. 5, when the optimal subnetwork is the selected subnetwork [2,2] shown in fig. 5, the second training sample may be further used to train the target model formed by the selected subnetwork [2,2] and the second subnetwork, so as to adjust the network parameters of the target model.
Reference may be made specifically to the relevant descriptions in the foregoing disclosed embodiments, and details are not repeated here.
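The two training passes over the assembled target model (steps S611 and S612) can be sketched as below; `train_step` is an assumed caller-supplied update routine, not an interface defined in the disclosure.

```python
def finetune(target_model, train_step, first_samples, second_samples,
             epochs_first=1, epochs_second=1):
    """Sketch: first adjust the target model on the first training samples (S611),
    then on the second training samples of the target task (S612)."""
    for _ in range(epochs_first):
        for batch in first_samples:
            train_step(target_model, batch)    # step S611
    for _ in range(epochs_second):
        for batch in second_samples:
            train_step(target_model, batch)    # step S612
    return target_model
```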
In contrast to the previous embodiment, the first sub-network for feature extraction in the original model is set to include a plurality of network segments connected in sequence, each network segment including at least one feature extraction unit connected in sequence; before each training, a feature extraction unit is selected in each network segment by using a preset selection strategy, and the portion of each network segment located before the selected feature extraction unit is trained with the first training sample to adjust its network parameters, which helps improve the efficiency of pre-training the original model at the "network parameter adjustment" level. Further, an initial at least one candidate sub-network is obtained by using at least one feature extraction unit in each network segment, and a candidate sub-network whose candidate model formed with the second sub-network satisfies the preset performance condition is selected as the selected sub-network; when the number of feature extraction units in the selected sub-network is smaller than the preset number, a new candidate sub-network is obtained by using the selected sub-network and at least one feature extraction unit not in the selected sub-network, and the selecting step and its subsequent steps are re-executed; when the number of feature extraction units in the selected sub-network is equal to the preset number, the selected sub-network is taken as the optimal sub-network, and the target model is obtained from the optimal sub-network and the second sub-network, so that the original model is adjusted at the "network structure adjustment" level under the constraints of both model complexity and model performance. Therefore, the method can greatly improve the degree of freedom of network adjustment, fully mine the potential of the pre-trained original model from both the network parameter dimension and the network structure dimension, and is beneficial to improving the performance of the target model.
Referring to fig. 7, fig. 7 is a schematic framework diagram of an embodiment of an image processing apparatus 70 based on transfer learning according to the present application. The image processing apparatus 70 based on transfer learning includes: a first training module 71, a model acquisition module 72 and a second training module 73. The first training module 71 is configured to pre-train the original model by using a first training sample so as to adjust network parameters of the original model, wherein the original model includes a first sub-network for feature extraction; the model acquisition module 72 is configured to obtain a target model by using a second sub-network and at least part of the structure of the pre-trained first sub-network, wherein the second sub-network is used for executing a target task based on the features extracted by the first sub-network; and the second training module 73 is configured to train the target model with a second training sample corresponding to the target task so as to adjust network parameters of the target model.
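For orientation only, the cooperation of the three modules can be sketched as a thin wrapper class; the class and attribute names are hypothetical and simply mirror the description above.

```python
class TransferLearningImageProcessor:
    """Hypothetical sketch of apparatus 70: three callables standing in for the
    first training module 71, model acquisition module 72 and second training module 73."""

    def __init__(self, first_training, model_acquisition, second_training):
        self.first_training = first_training        # pre-trains the original model
        self.model_acquisition = model_acquisition  # assembles the target model
        self.second_training = second_training      # fine-tunes on the target task

    def run(self, original_model, first_samples, second_samples):
        pretrained = self.first_training(original_model, first_samples)
        target_model = self.model_acquisition(pretrained)
        return self.second_training(target_model, second_samples)
```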
According to the above scheme, the original model, which includes the first sub-network for feature extraction, is pre-trained with the first training sample to adjust its network parameters; the target model is then obtained by using the second sub-network together with at least part of the structure of the pre-trained first sub-network, the second sub-network being used to execute the target task based on the features extracted by the first sub-network; and the target model is trained with the second training sample corresponding to the target task to adjust the network parameters of the target model. Therefore, the original model can be adjusted not only in the network parameter dimension by using the second training sample corresponding to the target task, but also in the network structure dimension, so that the degree of freedom of network adjustment can be greatly improved, the potential of the pre-trained original model can be fully mined from both the network parameter dimension and the network structure dimension, and the performance of the target model can be improved.
In some disclosed embodiments, the model acquisition module 72 includes a structure search sub-module configured to obtain at least one candidate sub-network by using different partial structures of the first sub-network and to select a candidate sub-network satisfying the preset condition as the optimal sub-network, and a model construction sub-module configured to obtain the target model by using the optimal sub-network and the second sub-network.
Different from the foregoing embodiment, at least one candidate sub-network is obtained by using different partial structures of the first sub-network, and a candidate sub-network satisfying a preset condition is selected as an optimal sub-network, so that the optimal sub-network and the second sub-network are used to obtain the target model, which can be beneficial to expanding an adjustment space of a "network structure dimension", and further can be beneficial to improving performance of the target model.
In some disclosed embodiments, the preset conditions include at least one of: the number of the feature extraction units in the candidate sub-network reaches a preset number, and the candidate model obtained by the candidate sub-network and the second sub-network meets preset performance conditions.
Unlike the foregoing embodiments, the preset conditions are set to include at least one of: the number of the feature extraction units in the candidate sub-network reaches a preset number, and the candidate model obtained by utilizing the candidate sub-network and the second sub-network meets a preset performance condition.
In some disclosed embodiments, the first sub-network comprises at least one branch network, each branch network comprising a plurality of network segments connected in sequence, each network segment comprising at least one feature extraction unit connected in sequence; the candidate sub-networks include at least one feature extraction unit in each network segment in the same branch network, and the feature extraction units in different candidate sub-networks are at least partially different.
Unlike the foregoing embodiments, by setting the first subnetwork to include at least one branch network, each branch network including a plurality of network segments connected in sequence, each network segment including at least one feature extraction unit connected in sequence, and the candidate subnetwork including at least one feature extraction unit in each network segment in the same branch network, and the feature extraction units in different candidate subnetworks being at least partially different, the first subnetwork can be set to a single-branch network of "single-chain" or to a multi-branch network of "multi-chain", so that the target model can be acquired in both the multi-branch network and the single-branch network, and thus the range of use can be advantageously expanded.
In some disclosed embodiments, the structure search sub-module includes an initializing unit configured to obtain an initial at least one candidate sub-network by using at least one feature extraction unit in each network segment; a performance evaluation unit configured to select, from the at least one candidate sub-network, a candidate sub-network whose candidate model formed with the second sub-network satisfies the preset performance condition, as the selected sub-network; a repeated search unit configured to, when the number of feature extraction units in the selected sub-network is smaller than the preset number, obtain a new candidate sub-network by using the selected sub-network and at least one feature extraction unit not in the selected sub-network, and to re-execute, in combination with the performance evaluation unit, the selecting step and its subsequent steps; and an optimal acquisition unit configured to take the selected sub-network as the optimal sub-network when the number of feature extraction units in the selected sub-network is equal to the preset number.
Different from the foregoing embodiment, an initial at least one candidate sub-network is obtained by using at least one feature extraction unit in each network segment, and a candidate sub-network whose candidate model formed with the second sub-network satisfies the preset performance condition is selected from the at least one candidate sub-network as the selected sub-network; when the number of feature extraction units in the selected sub-network is smaller than the preset number, a new candidate sub-network is obtained by using the selected sub-network and at least one feature extraction unit not in the selected sub-network, and the selecting step and its subsequent steps are re-executed; when the number of feature extraction units in the selected sub-network is equal to the preset number, the selected sub-network is taken as the optimal sub-network. In this way, a candidate sub-network satisfying the preset performance condition is selected while the number of feature extraction units gradually approaches the preset number, so that the efficiency of acquiring the target model can be improved, and the target model can be constrained not only at the level of model complexity but also at the level of model performance.
In some disclosed embodiments, the initializing unit is specifically configured to take each network segment as a target segment, and obtain an initial candidate sub-network corresponding to the target segment by using the first two feature extraction units in each target segment and the first feature extraction unit in the rest of the network segments, and the repeated searching unit is specifically configured to determine, in the selected sub-network, the feature extraction unit located at the last position in each network segment, as the target unit of the corresponding network segment, and obtain new different candidate sub-networks by using the selected sub-network and the first feature extraction unit located after the target unit in different network segments.
Different from the foregoing embodiment, by taking each network segment as a target segment and using the first two feature extraction units in each target segment together with the first feature extraction unit in the remaining network segments to obtain the initial candidate sub-network corresponding to that target segment, network structure adjustment can advantageously start from the head of each network segment of the first sub-network; and by determining, in the selected sub-network, the feature extraction unit located at the last position of each network segment as the target unit of the corresponding network segment, and using the selected sub-network together with the first feature extraction unit located after the target unit in different network segments to obtain new, different candidate sub-networks, the feature extraction units of different network segments can be adjusted one by one in the subsequent adjustment process, which is beneficial to improving the accuracy of network adjustment.
In some disclosed embodiments, in the case that the preset condition includes that the candidate model obtained by using the candidate sub-network and the second sub-network satisfies the preset performance condition, the performance evaluation unit is specifically configured to verify, with a verification sample corresponding to the target task, the candidate model obtained by using the candidate sub-network and the second sub-network, so as to obtain a performance score of the candidate model for executing the target task, and to determine, based on the performance score, whether the candidate model satisfies the preset performance condition; and/or, in the case that the preset condition includes that the number of feature extraction units in the candidate sub-network reaches the preset number, the first sub-network includes a first number of feature extraction units, the first sub-network includes a second number of network segments, and the preset number is smaller than the first number and greater than or equal to the second number.
Different from the foregoing embodiment, in the case that the preset condition includes that the candidate model obtained by using the candidate sub-network and the second sub-network meets the preset performance condition, the candidate model obtained by using the candidate sub-network and the second sub-network is verified by using the verification sample corresponding to the target task, so as to obtain the performance score of the candidate model for executing the target task, and whether the candidate model meets the preset performance condition is determined based on the performance score, so that the accuracy of selecting the optimal sub-network can be improved; in addition, when the preset condition includes that the number of feature extraction units in the candidate sub-network reaches the preset number, the first sub-network includes a first number of feature extraction units, and the first sub-network includes a second number of network segments, and the preset number is smaller than the first number and larger than or equal to the second number, which can be beneficial to reducing complexity of the target model.
In some disclosed embodiments, the first sub-network comprises at least one branch network, each branch network comprising a plurality of network segments connected in sequence, each network segment comprising at least one feature extraction unit connected in sequence, the first training module 71 comprises a unit selection sub-module for selecting one branch network with a preset selection policy before each training, and selecting a feature extraction unit in each network segment of the selected branch network, the first training module 71 comprises a sample training sub-module for training, with a first training sample, the portion of each network segment preceding the selected feature extraction unit to adjust network parameters of the portion of each network segment preceding the selected feature extraction unit.
Different from the foregoing embodiment, the first sub-network is configured to include at least one branch network, and each branch network includes a plurality of network sections sequentially connected, and each network section includes at least one feature extraction unit sequentially connected, so that before each training, one branch network is selected, and a feature extraction unit is selected in each network section of the selected branch network, and a first training sample is used to train a portion of each network section located before the selected feature extraction unit, so as to adjust network parameters of a portion of each network section located before the selected feature extraction unit, thereby facilitating full training of each portion of the first sub-network after multiple training, and improving pre-training efficiency.
In some disclosed embodiments, the feature extraction unit includes a convolution layer, an activation layer, and a batch layer connected in sequence; and/or the first subnetwork further comprises a downsampling layer located between adjacent network segments.
Different from the previous embodiment, the feature extraction unit is set to include a convolution layer, an activation layer and a batch processing layer connected in sequence, so that the learning effect of the feature extraction unit in the training process can be improved; and a down-sampling layer located between adjacent network segments is arranged in the first sub-network, so that feature dimension reduction can be realized, the amount of data and the number of parameters can be compressed, overfitting can be reduced, and fault tolerance can be improved.
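As an illustration of this layer layout (PyTorch is chosen here for concreteness and is not mandated by the text; the channel sizes, kernel sizes and unit counts are assumptions), a feature extraction unit and a two-segment single-branch first sub-network with a down-sampling layer between the segments might look like:

```python
import torch.nn as nn

def feature_extraction_unit(in_ch, out_ch):
    # convolution layer -> activation layer -> batch processing layer, in sequence
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(out_ch),
    )

def network_segment(in_ch, out_ch, num_units):
    units = [feature_extraction_unit(in_ch, out_ch)]
    units += [feature_extraction_unit(out_ch, out_ch) for _ in range(num_units - 1)]
    return nn.Sequential(*units)

# Two network segments connected in sequence, with a down-sampling layer in between.
first_subnetwork = nn.Sequential(
    network_segment(3, 64, num_units=3),
    nn.MaxPool2d(kernel_size=2),
    network_segment(64, 128, num_units=3),
)
```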
In some disclosed embodiments, the image processing apparatus 70 based on the transfer learning further includes a third training module for training the original model with the second training samples to adjust network parameters of the original model.
Different from the foregoing embodiment, after pre-training, the original model is trained by using the second training sample corresponding to the target task, so as to adjust the network parameters of the original model, which can be beneficial to improving the accuracy of subsequent network structure dimension adjustment.
In some disclosed embodiments, the image processing apparatus 70 based on transfer learning further includes a fourth training module for training the target model with the first training sample to adjust network parameters of the target model.
In contrast to the foregoing embodiments, after the network structure dimension adjustment is completed, the target model is trained by using the first training sample, and then the target model is trained again by using the second training sample corresponding to the target task, which can be beneficial to improving the performance of the target model.
In some disclosed embodiments, the original model further comprises a third sub-network for performing a preset task based on the extracted features, wherein the preset task is the same as or different from the target task.
Different from the foregoing embodiment, by setting the original model to include the third sub-network for performing the preset task based on the extracted features, and the preset task being the same as or different from the target task, it is possible to facilitate further expansion of the range suitable for acquiring the target model.
In some disclosed embodiments, the number of first training samples is greater than the number of second training samples.
Unlike the foregoing embodiments, by setting the number of first training samples to be larger than the number of second training samples, it is possible to advantageously reduce the workload of sample labeling on a target task.
Referring to fig. 8, fig. 8 is a schematic framework diagram of an electronic device 80 according to an embodiment of the application. The electronic device 80 includes a memory 81 and a processor 82 coupled to each other, and the processor 82 is configured to execute program instructions stored in the memory 81 to implement the steps of any of the above-described embodiments of the image processing method based on transfer learning. In one particular implementation scenario, the electronic device 80 may include, but is not limited to, a microcomputer and a server; the electronic device 80 may also include mobile devices such as a notebook computer and a tablet computer, which is not limited herein.
Specifically, the processor 82 is configured to control itself and the memory 81 to implement the steps of any of the above-described embodiments of the image processing method based on the transfer learning. The processor 82 may also be referred to as a CPU (Central Processing Unit ). The processor 82 may be an integrated circuit chip having signal processing capabilities. The processor 82 may also be a general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a Field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. In addition, the processor 82 may be commonly implemented by an integrated circuit chip.
According to the scheme, the second training samples corresponding to the target tasks can be utilized for adjustment in the network parameter dimension, the original model can be adjusted in the network structure dimension, the degree of freedom of network adjustment can be greatly improved, the potential of the pre-trained original model can be fully mined from the network parameter dimension and the network structure dimension, and the performance of the target model can be improved.
Referring to fig. 9, fig. 9 is a schematic diagram of a frame of an embodiment of a computer readable storage medium 90 according to the present application. The computer-readable storage medium 90 stores program instructions 901 executable by a processor, the program instructions 901 for implementing the steps of any of the above-described embodiments of the image processing method based on transfer learning.
According to the scheme, the second training samples corresponding to the target tasks can be utilized for adjustment in the network parameter dimension, the original model can be adjusted in the network structure dimension, the degree of freedom of network adjustment can be greatly improved, the potential of the pre-trained original model can be fully mined from the network parameter dimension and the network structure dimension, and the performance of the target model can be improved.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The foregoing description of the various embodiments focuses on the differences between the embodiments; for the parts that are the same as or similar to one another, reference may be made to each other, and details are not repeated herein for brevity.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of modules or units is merely a logical functional division, and there may be other division manners in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling or communication connection shown or discussed between the components may be an indirect coupling or communication connection via some interfaces, devices or units, and may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to execute all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Claims (16)
1. An image processing method based on transfer learning, comprising the steps of:
pre-training an original model by using a first training sample image to adjust network parameters of the original model; wherein the original model includes a first sub-network for image feature extraction, and the original model is used for performing one of target detection, image classification and scene segmentation based on the extracted image features;
obtaining a target model by using a second sub-network and at least part of the structure of the pre-trained first sub-network; wherein the second sub-network is configured to perform a target task based on the image features extracted by the first sub-network, the target task including one of target detection, image classification and scene segmentation;
training the target model by using a second training sample image corresponding to the target task so as to adjust network parameters of the target model;
and extracting the characteristics of the image to be detected by utilizing at least part of the structure of the first subnetwork in the target model, and executing the target task on the characteristics of the image to be detected by utilizing the second subnetwork in the target model to obtain an image processing result of the image to be detected.
2. The method of claim 1, wherein the deriving the object model using the second sub-network and at least part of the structure of the first sub-network that is pre-trained comprises:
obtaining at least one candidate sub-network by utilizing different partial structures of the first sub-network, and selecting the candidate sub-network meeting preset conditions as an optimal sub-network;
And obtaining the target model by utilizing the optimal sub-network and the second sub-network.
3. The method of claim 2, wherein the preset conditions include at least one of: the number of the image feature extraction units in the candidate sub-network reaches a preset number, and the candidate model obtained by utilizing the candidate sub-network and the second sub-network meets preset performance conditions.
4. A method according to claim 3, wherein the first sub-network comprises at least one branch network, and each branch network comprises a plurality of network segments connected in sequence, each network segment comprising at least one image feature extraction unit connected in sequence; the candidate subnetworks include at least one image feature extraction unit in each of the network segments in the same branched network, and the image feature extraction units in different ones of the candidate subnetworks are at least partially different.
5. The method according to claim 4, wherein the first sub-network includes one path of the branch network, the obtaining at least one candidate sub-network by using different partial structures of the first sub-network, and selecting the candidate sub-network satisfying a preset condition as an optimal sub-network includes:
obtaining an initial at least one candidate sub-network by using at least one image feature extraction unit in each network segment;
selecting a candidate sub-network which meets a preset performance condition with a candidate model formed by the second sub-network from the at least one candidate sub-network as a selected sub-network;
when the number of image feature extraction units in the selected sub-network is smaller than a preset number, obtaining a new candidate sub-network by using the selected sub-network and at least one image feature extraction unit not in the selected sub-network, and repeatedly executing the step of selecting a candidate sub-network which meets a preset performance condition with a candidate model formed by the second sub-network and the subsequent steps;
and under the condition that the number of the image feature extraction units in the selected sub-network is equal to the preset number, taking the selected sub-network as the optimal sub-network.
6. The method of claim 5, wherein the obtaining an initial at least one candidate sub-network by using at least one image feature extraction unit in each network segment comprises:
taking each network segment as a target segment, and obtaining an initial candidate sub-network corresponding to the target segment by using the first two image feature extraction units in each target segment and the first image feature extraction unit in the remaining network segments;
the obtaining a new candidate sub-network by using the selected sub-network and at least one image feature extraction unit not in the selected sub-network comprises:
in the selected sub-network, respectively determining the image feature extraction unit located at the last position in each network segment as the target unit of the corresponding network segment;
and obtaining new, different candidate sub-networks by respectively using the selected sub-network and the first image feature extraction unit located after the target unit in different network segments.
7. A method according to claim 3, wherein in case the preset condition comprises that a candidate model obtained using the candidate sub-network and the second sub-network meets a preset performance condition; the selecting the candidate sub-network meeting the preset condition as the optimal sub-network comprises the following steps:
verifying the candidate model obtained by utilizing the candidate sub-network and the second sub-network by utilizing a verification sample image corresponding to the target task to obtain a performance score of the candidate model for executing the target task;
determining whether the candidate model meets the preset performance condition based on the performance score;
And/or, in the case that the preset condition includes that the number of image feature extraction units in the candidate sub-network reaches a preset number; the first sub-network includes a first number of image feature extraction units, the first sub-network includes a second number of network segments, the preset number is less than the first number and greater than or equal to the second number.
8. The method according to any one of claims 1 to 7, wherein the first sub-network comprises at least one branch network, and each branch network comprises a plurality of network segments connected in sequence, each network segment comprising at least one image feature extraction unit connected in sequence; the pre-training the original model with the first training sample image to adjust network parameters of the original model includes:
before each training, selecting one branch network by using a preset selection strategy, and selecting one image feature extraction unit in each network segment of the selected branch network;
and training the part of each network segment located before the selected image feature extraction unit by using the first training sample image, so as to adjust network parameters of the part of each network segment located before the selected image feature extraction unit.
9. The method of claim 8, wherein the image feature extraction unit comprises a convolution layer, an activation layer, and a batch layer connected in sequence;
and/or the first subnetwork further comprises a downsampling layer located between adjacent said network segments.
10. The method according to any one of claims 1 to 7, wherein after said pre-training an original model with a first training sample image to adjust network parameters of said original model, and before said obtaining said target model with a second sub-network and at least part of the structure of said pre-trained first sub-network, the method further comprises:
and training the original model by using the second training sample image so as to adjust network parameters of the original model.
11. The method according to any one of claims 1 to 7, wherein after the obtaining the target model using the second sub-network and at least part of the structure of the pre-trained first sub-network, and before the training the target model using the second training sample image corresponding to the target task to adjust network parameters of the target model, the method further comprises:
And training the target model by using the first training sample image so as to adjust network parameters of the target model.
12. The method according to any one of claims 1 to 7, wherein,
the original model further includes a third sub-network for performing a preset task based on the extracted image features, wherein the preset task is the same as or different from the target task.
13. The method of any one of claims 1 to 7, wherein the number of first training sample images is greater than the number of second training sample images.
14. An image processing apparatus based on transfer learning, comprising:
the first training module is used for pre-training an original model by using a first training sample image so as to adjust network parameters of the original model; wherein the original model includes a first sub-network for image feature extraction, and the original model is used for performing one of target detection, image classification and scene segmentation based on the extracted image features;
the model acquisition module is used for obtaining a target model by using a second sub-network and at least part of the structure of the pre-trained first sub-network; wherein the second sub-network is used for executing a target task based on the image features extracted by the first sub-network, the target task including one of target detection, image classification and scene segmentation;
The second training module is used for training the target model by using a second training sample image corresponding to the target task so as to adjust network parameters of the target model;
and the application module is used for extracting the characteristics of the image to be detected by utilizing at least part of the structure of the first sub-network in the target model, and executing the target task on the characteristics of the image to be detected by utilizing the second sub-network in the target model to obtain an image processing result of the image to be detected.
15. An electronic device comprising a memory and a processor coupled to each other, the processor configured to execute program instructions stored in the memory to implement the transfer learning based image processing method of any one of claims 1 to 13.
16. A computer-readable storage medium having stored thereon program instructions, which when executed by a processor, implement the image processing method based on transfer learning of any one of claims 1 to 13.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010852192.4A CN112052949B (en) | 2020-08-21 | 2020-08-21 | Image processing method, device, equipment and storage medium based on transfer learning |
JP2021569395A JP2022548341A (en) | 2020-08-21 | 2020-11-30 | Get the target model |
KR1020217038252A KR20220023825A (en) | 2020-08-21 | 2020-11-30 | Method and apparatus for acquiring a target model, electronic device and storage medium |
PCT/CN2020/132785 WO2022036921A1 (en) | 2020-08-21 | 2020-11-30 | Acquisition of target model |
TW110130162A TWI785739B (en) | 2020-08-21 | 2021-08-16 | Method of acquiring target model, electronic device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010852192.4A CN112052949B (en) | 2020-08-21 | 2020-08-21 | Image processing method, device, equipment and storage medium based on transfer learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112052949A CN112052949A (en) | 2020-12-08 |
CN112052949B true CN112052949B (en) | 2023-09-08 |
Family
ID=73599559
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010852192.4A Active CN112052949B (en) | 2020-08-21 | 2020-08-21 | Image processing method, device, equipment and storage medium based on transfer learning |
Country Status (5)
Country | Link |
---|---|
JP (1) | JP2022548341A (en) |
KR (1) | KR20220023825A (en) |
CN (1) | CN112052949B (en) |
TW (1) | TWI785739B (en) |
WO (1) | WO2022036921A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112633156B (en) * | 2020-12-22 | 2024-05-31 | 浙江大华技术股份有限公司 | Vehicle detection method, image processing device, and computer-readable storage medium |
CN112634992A (en) * | 2020-12-29 | 2021-04-09 | 上海商汤智能科技有限公司 | Molecular property prediction method, training method of model thereof, and related device and equipment |
CN112784912A (en) * | 2021-01-29 | 2021-05-11 | 北京百度网讯科技有限公司 | Image recognition method and device, and training method and device of neural network model |
CN118411128B (en) * | 2024-07-01 | 2024-10-18 | 一智科技有限公司 | Engineering construction acceptance sheet generation method, system and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110321964A (en) * | 2019-07-10 | 2019-10-11 | 重庆电子工程职业学院 | Identification model update method and relevant apparatus |
CN110443286A (en) * | 2019-07-18 | 2019-11-12 | 广州华多网络科技有限公司 | Training method, image-recognizing method and the device of neural network model |
CN111368998A (en) * | 2020-03-04 | 2020-07-03 | 深圳前海微众银行股份有限公司 | Spark cluster-based model training method, device, equipment and storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201709672D0 (en) * | 2017-06-16 | 2017-08-02 | Ucl Business Plc | A system and computer-implemented method for segmenting an image |
CN107679525B (en) * | 2017-11-01 | 2022-11-29 | 腾讯科技(深圳)有限公司 | Image classification method and device and computer readable storage medium |
US10769432B2 (en) * | 2018-10-10 | 2020-09-08 | Drvision Technologies Llc | Automated parameterization image pattern recognition method |
CN110163234B (en) * | 2018-10-10 | 2023-04-18 | 腾讯科技(深圳)有限公司 | Model training method and device and storage medium |
CN110363233B (en) * | 2019-06-28 | 2021-05-28 | 西安交通大学 | Fine-grained image recognition method and system of convolutional neural network based on block detector and feature fusion |
CN111507985A (en) * | 2020-03-19 | 2020-08-07 | 北京市威富安防科技有限公司 | Image instance segmentation optimization processing method and device and computer equipment |
CN111522944B (en) * | 2020-04-10 | 2023-11-14 | 北京百度网讯科技有限公司 | Method, apparatus, device and storage medium for outputting information |
- 2020-08-21: CN CN202010852192.4A (CN112052949B, Active)
- 2020-11-30: WO PCT/CN2020/132785 (WO2022036921A1, Application Filing)
- 2020-11-30: JP JP2021569395A (JP2022548341A, Pending)
- 2020-11-30: KR KR1020217038252A (KR20220023825A, Application Discontinuation)
- 2021-08-16: TW TW110130162A (TWI785739B, IP Right Cessation)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110321964A (en) * | 2019-07-10 | 2019-10-11 | 重庆电子工程职业学院 | Identification model update method and relevant apparatus |
CN110443286A (en) * | 2019-07-18 | 2019-11-12 | 广州华多网络科技有限公司 | Training method, image-recognizing method and the device of neural network model |
CN111368998A (en) * | 2020-03-04 | 2020-07-03 | 深圳前海微众银行股份有限公司 | Spark cluster-based model training method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
TW202209194A (en) | 2022-03-01 |
JP2022548341A (en) | 2022-11-18 |
WO2022036921A1 (en) | 2022-02-24 |
CN112052949A (en) | 2020-12-08 |
KR20220023825A (en) | 2022-03-02 |
TWI785739B (en) | 2022-12-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112052949B (en) | Image processing method, device, equipment and storage medium based on transfer learning | |
CN107704857B (en) | End-to-end lightweight license plate recognition method and device | |
CN111161311A (en) | Visual multi-target tracking method and device based on deep learning | |
CN108537824B (en) | Feature map enhanced network structure optimization method based on alternating deconvolution and convolution | |
CN108764039B (en) | Neural network, building extraction method of remote sensing image, medium and computing equipment | |
CN111489366B (en) | Training and image semantic segmentation method and device for neural network | |
CN112561027A (en) | Neural network architecture searching method, image processing method, device and storage medium | |
CN111767962B (en) | One-stage target detection method, system and device based on generation countermeasure network | |
CN111881741B (en) | License plate recognition method, license plate recognition device, computer equipment and computer readable storage medium | |
CN112597964B (en) | Method for counting layered multi-scale crowd | |
CN112381208B (en) | Picture classification method and system based on neural network architecture search | |
US11461653B2 (en) | Learning method and learning device for CNN using 1xK or Kx1 convolution to be used for hardware optimization, and testing method and testing device using the same | |
CN111027347A (en) | Video identification method and device and computer equipment | |
CN111291631A (en) | Video analysis method and related model training method, device and apparatus | |
CN115761888A (en) | Tower crane operator abnormal behavior detection method based on NL-C3D model | |
CN113361567B (en) | Image processing method, device, electronic equipment and storage medium | |
CN113255766B (en) | Image classification method, device, equipment and storage medium | |
CN110135428A (en) | Image segmentation processing method and device | |
Dong et al. | Densely connected convolutional neural network based polarimetric SAR image classification | |
CN110490876B (en) | Image segmentation method based on lightweight neural network | |
CN111797737A (en) | Remote sensing target detection method and device | |
CN114219757B (en) | Intelligent damage assessment method for vehicle based on improved Mask R-CNN | |
CN114818920A (en) | Weak supervision target detection method based on double attention erasing and attention information aggregation | |
CN114359572A (en) | Training method and device of multi-task detection model and terminal equipment | |
CN106778886A (en) | The localization method and positioner of a kind of car plate |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40032409; Country of ref document: HK |
GR01 | Patent grant | |