CN113505848A - Model training method and device - Google Patents

Model training method and device

Info

Publication number
CN113505848A
Authority
CN
China
Prior art keywords
image
network
feature extraction
enhancement
enhanced
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110848100.XA
Other languages
Chinese (zh)
Other versions
CN113505848B (en)
Inventor
白亚龙
张炜
梅涛
周伯文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd filed Critical Jingdong Technology Holding Co Ltd
Priority to CN202110848100.XA priority Critical patent/CN113505848B/en
Publication of CN113505848A publication Critical patent/CN113505848A/en
Priority to PCT/CN2022/094879 priority patent/WO2023005386A1/en
Application granted granted Critical
Publication of CN113505848B publication Critical patent/CN113505848B/en
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Abstract

The embodiment of the disclosure discloses a model training method and a model training device. One embodiment of the method comprises: acquiring enhanced image sets corresponding to at least one original image respectively, and acquiring an image processing result corresponding to each enhanced image set, wherein each enhanced image set comprises enhanced images of at least two enhancement levels; acquiring an initial model, wherein the initial model comprises processing networks respectively corresponding to different enhancement levels, the processing network corresponding to each enhancement level comprises an output network and a feature extraction network respectively corresponding to each enhancement level above the enhancement level, and the output network is used for generating an image processing result according to a feature extraction result output by each feature extraction network; and inputting the enhanced images in the enhanced image set into the initial model, taking the image processing result corresponding to the input enhanced image set as an expected output result, and training the initial model by using a preset loss function. The embodiment improves the adaptability of the model to be trained to various data enhancement methods.

Description

Model training method and device
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a model training method and device.
Background
With the rapid development of Artificial Neural Networks (ANNs), they have been widely applied to various image processing tasks such as image recognition, image classification, image retrieval, semantic segmentation, and multi-modal image processing. In the training of various image processing models (such as convolutional neural networks), problems such as noise signals in the training samples, a limited number of training samples, and overfitting to the training samples are frequently encountered.
Currently, data enhancement technology is widely applied in the training of various image processing models as a low-cost strategy for expanding training samples to mitigate these problems. Types of data enhancement include, but are not limited to, random image flipping, image cropping, random occlusion of images, and the like. Different types of data enhancement affect the training of an image processing model differently, and the same type of data enhancement affects models trained for different image processing tasks differently, so how to select an appropriate data enhancement type is a problem that must be considered.
Disclosure of Invention
The embodiment of the disclosure provides a model training method and device.
In a first aspect, an embodiment of the present disclosure provides a model training method, including: acquiring enhanced image sets corresponding to at least one original image respectively, and acquiring an image processing result corresponding to each enhanced image set, wherein each enhanced image set comprises enhanced images of at least two enhancement levels; acquiring an initial model, wherein the initial model comprises processing networks respectively corresponding to different enhancement levels, the processing network corresponding to each enhancement level comprises an output network and a feature extraction network respectively corresponding to each enhancement level above the enhancement level, and the output network is used for generating an image processing result according to a feature extraction result output by each feature extraction network; and inputting the enhanced images in the enhanced image set into the initial model, taking the image processing result corresponding to the input enhanced image set as an expected output result, and training the initial model by using a preset loss function.
In a second aspect, an embodiment of the present disclosure provides an image processing method, including: acquiring an image to be processed; and inputting the image to be processed into an image processing model to obtain an image processing result, wherein the image processing model is a processing network corresponding to the lowest enhancement level in the trained initial model, and the initial model is obtained by training by using the method described in any implementation manner of the first aspect.
In a third aspect, an embodiment of the present disclosure provides a model training apparatus, including: the image processing device comprises an enhanced image set acquisition unit, a processing unit and a processing unit, wherein the enhanced image set acquisition unit is configured to acquire enhanced image sets corresponding to at least one original image respectively, and acquire an image processing result corresponding to each enhanced image set, and each enhanced image set comprises enhanced images of at least two enhancement levels; the model acquisition unit is configured to acquire an initial model, wherein the initial model comprises processing networks respectively corresponding to different enhancement levels, each processing network corresponding to each enhancement level comprises an output network and a feature extraction network respectively corresponding to each enhancement level above the enhancement level, and the output network is used for generating an image processing result according to the feature extraction result output by each feature extraction network; and the training unit is configured to input the enhanced images in the enhanced image set to the initial model, and train the initial model by using a preset loss function, wherein the image processing result corresponding to the input enhanced image set is used as an expected output result.
In a fourth aspect, an embodiment of the present disclosure provides an image processing apparatus including: a to-be-processed image acquisition unit configured to acquire a to-be-processed image; and the processing unit is configured to input the image to be processed into an image processing model to obtain an image processing result, wherein the image processing model is a processing network corresponding to the lowest enhancement level in the trained initial model, and the initial model is obtained by training with the method described in any one implementation manner of the first aspect.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; storage means for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a sixth aspect, embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, which computer program, when executed by a processor, implements the method as described in any of the implementations of the first aspect.
According to the model training method and device provided by the embodiments of the present disclosure, enhanced images corresponding to different enhancement levels are distinguished during model training and respectively input to the processing networks corresponding to their enhancement levels, so that the processing network corresponding to each enhancement level extracts features of the input enhanced image through the feature extraction networks corresponding to that enhancement level and above, and generates the image processing result according to the obtained feature extraction results. In this way, the features contained in enhanced images of higher enhancement levels that benefit the image processing task can be learned during model training, while the influence on model training of the features contained in such images that are unfavorable to the task is reduced, thereby improving the model training effect.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a model training method according to the present disclosure;
FIG. 3a is a schematic diagram of a network structure of an initial model used in a conventional model training method;
FIG. 3b is a schematic diagram of a network structure of an initial model in the model training method according to the present embodiment;
FIG. 3c is a schematic diagram of the feature extraction operation of the initial model employed by the prior art model training method;
FIG. 3d is a diagram illustrating a feature extraction operation of an initial model in the model training method according to the present embodiment;
FIG. 4 is still another schematic diagram of a network structure of an initial model in the model training method according to the embodiment;
FIG. 5 is a flow diagram for one embodiment of an image processing method according to the present disclosure;
FIG. 6 is a schematic block diagram of one embodiment of a model training apparatus according to the present disclosure;
FIG. 7 is a schematic block diagram of one embodiment of an image processing apparatus according to the present disclosure;
FIG. 8 is a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates an exemplary architecture 100 to which embodiments of the model training method or model training apparatus of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, 103 interact with a server 105 via a network 104 to receive or send messages or the like. Various client applications may be installed on the terminal devices 101, 102, 103. Such as browser-type applications, search-type applications, instant messaging tools, image processing-type applications, and so forth.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server that provides various services, such as a server that provides back-end support for the terminal devices 101, 102, 103. The server 105 may obtain at least one enhanced image set of the original image, an image processing result and an initial model corresponding to each enhanced image set, as training data, and then complete training of the initial model by using a preset loss function to obtain an image processing model. Then, the server 105 may receive the image processing request sent by the terminal device 101, 102, 103, and process the image indicated by the image processing request by using the trained image processing model to obtain an image processing result.
It should be noted that the model training method provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the model training apparatus is generally disposed in the server 105. In some cases, terminal devices 101, 102, 103 and network 104 may not be present.
It should also be noted that the terminal devices 101, 102, 103 may also have a model training class application installed therein, and the terminal devices 101, 102, 103 may complete model training based on the model training class application. In this case, the model training method may be executed by the terminal devices 101, 102, and 103, and accordingly, the model training apparatus may be provided in the terminal devices 101, 102, and 103. At this point, the exemplary system architecture 100 may not have the server 105 and the network 104.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a model training method according to the present disclosure is shown. The model training method comprises the following steps:
step 201, obtaining enhanced image sets corresponding to at least one original image respectively, and obtaining an image processing result corresponding to each enhanced image set.
In the present embodiment, the original image may be an arbitrarily designated image. Each of the at least one original image may be different. After an original image is designated, data enhancement processing can be performed on the original image to obtain an enhanced image corresponding to the original image. Different enhanced images can be obtained by processing the same original image by adopting different data enhancement methods.
Specifically, various data enhancement methods may be applied to the original image to achieve image enhancement. By way of example, data enhancement methods include, but are not limited to, blurring (Blur), flipping (Flip), normalization (Normalize), transposition (Transpose), random cropping (RandomCrop), random gamma adjustment (RandomGamma), rotation (Rotate), optical distortion (OpticalDistortion), grid distortion (GridDistortion), elastic transformation (ElasticTransform), random grid shuffling (RandomGridShuffle), random erasure (CutOut), graying (Gray), and the like.
The enhanced image set corresponding to each original image may be composed of the enhanced images corresponding to that original image. Each enhanced image set may include enhanced images of at least two enhancement levels. The enhancement level may be used to characterize the information loss, relative to the original image, of an enhanced image formed using the corresponding data enhancement method. The information of an image includes various kinds of information, such as color and object structure. In general, the greater the information loss caused, the higher the corresponding enhancement level.
As an example, flipping typically causes the original image to lose little information, whereas random grid shuffling typically causes the original image to lose a great deal of important information. Thus, the enhancement level corresponding to random grid shuffling is higher than the enhancement level corresponding to flipping.
The division of enhancement levels can be set flexibly according to the actual application scenario. For example, the enhancement levels may be divided into two types: weak data enhancement and strong data enhancement. As another example, strong data enhancement may be further divided into level-one and level-two strong enhancement, where the level-two enhancement level is higher than the level-one enhancement level. In that case, the enhancement levels include three types: weak data enhancement, level-one strong data enhancement, and level-two strong data enhancement. One possible grouping is sketched below.
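For illustration only, the following Python sketch shows one way to group data enhancement methods into two enhancement levels and build an enhanced image set for one original image. The use of torchvision and the assignment of specific transforms to levels are assumptions for this example, not requirements of the present disclosure.

```python
# A minimal sketch (not part of the disclosure) of grouping data
# enhancement methods into two enhancement levels. The disclosure only
# requires that greater information loss map to a higher enhancement level.
from torchvision import transforms

# Weak enhancement: little information loss relative to the original image.
weak_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(224, padding=16),
])

# Strong enhancement: heavier information loss (CutOut-style occlusion).
strong_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(224, padding=16),
    transforms.RandomErasing(p=1.0, scale=(0.1, 0.3)),
])

def build_enhanced_image_set(original, n_per_level=2):
    """Return {enhancement_level: [enhanced images]} for one original image,
    where `original` is a tensor image of shape (C, H, W)."""
    return {
        "weak": [weak_augment(original) for _ in range(n_per_level)],
        "strong": [strong_augment(original) for _ in range(n_per_level)],
    }
```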
It should be noted that each enhanced image set may include any number of enhanced images for each enhancement level.
The image processing result corresponding to the enhanced image set may refer to an image processing result corresponding to the original image. In general, each enhanced image in the enhanced image set corresponds to the same image processing result. Specifically, the image processing result corresponding to the enhanced image set can be flexibly determined according to the actual application scene. For example, in a scenario where an image processing model for image classification is trained, the image processing result may refer to a class to which the image belongs. For another example, in a scenario of training an image processing model for image detection, the image processing result may refer to a position of an object to be detected in an image.
An executing subject of the model training method (e.g., the server 105 shown in fig. 1, etc.) may obtain the image processing result corresponding to each of the at least one enhanced image set and each enhanced image set from a local, connected database, a third-party data platform, or a storage device (e.g., the terminal devices 101, 102, 103 shown in fig. 1, etc.). It should be noted that the image processing results corresponding to at least one enhanced image set and each enhanced image set may be obtained from the same data source, or may be obtained from different data sources.
Step 202, an initial model is obtained.
In this embodiment, the execution agent may obtain the pre-constructed initial model from a local or other data source. Wherein the initial model may comprise processing networks respectively corresponding to different levels of enhancement. The processing network corresponding to each enhancement level may include an output network and a feature extraction network corresponding to each enhancement level above the enhancement level.
As an example, the processing network corresponding to enhancement level "N" may include an output network and feature extraction networks respectively corresponding to the enhancement levels not lower than "N" (i.e., enhancement levels "N", "N+1", "N+2", ...).
The feature extraction network in each processing network may be configured to extract features of an image, and the output network may be configured to generate an image processing result according to the feature extraction results output by the feature extraction networks included in that processing network. It should be noted that the enhancement level corresponding to an enhanced image input to a feature extraction network is generally not higher than the enhancement level corresponding to that feature extraction network.
After each feature extraction network in each processing network outputs the extracted features, the output network may generate image processing results according to the features output by each feature extraction network by various methods (e.g., sequential stitching, fusion, etc.).
Specifically, the feature extraction network may employ various existing network structures for extracting image features. For example, a feature extraction network may be composed of several convolutional layers and pooling layers. The output network can be flexibly constructed according to the actual application scenario. For example, in the context of image classification, the output network may be various classifiers or the like. A concrete structural sketch is given below.
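As an illustration of this structure, the following PyTorch sketch builds an initial model with two enhancement levels (weak and strong) for image classification. The backbone layers, the channel counts n and m, and all names are illustrative assumptions; the disclosure does not prescribe a particular architecture.

```python
# A minimal two-level initial model sketch. The level-2 (strong) feature
# extraction network is a single module used by both processing networks,
# realizing the parameter sharing described above.
import torch
import torch.nn as nn

def conv_backbone(out_channels):
    # A toy feature extraction network: two conv layers plus global pooling.
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, out_channels, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class InitialModel(nn.Module):
    def __init__(self, num_classes, n=96, m=32):
        super().__init__()
        assert n > m
        self.feat_weak = conv_backbone(n - m)    # level-1 feature extraction network
        self.feat_strong = conv_backbone(m)      # level-2 network, shared by both pathways
        self.out_weak = nn.Linear(n, num_classes)    # first output network (sees n channels)
        self.out_strong = nn.Linear(m, num_classes)  # second output network (sees m channels)

    def forward_weak(self, x):
        # A weak-level image goes through BOTH feature extraction networks;
        # the results are spliced before the first output network.
        f1, f2 = self.feat_weak(x), self.feat_strong(x)
        return self.out_weak(torch.cat([f1, f2], dim=1)), (f1, f2)

    def forward_strong(self, x):
        # A strong-level image goes through the shared level-2 network only.
        return self.out_strong(self.feat_strong(x))
```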
In some optional implementations of this embodiment, the feature extraction networks corresponding to the same enhancement level in the initial model may share network parameters.
As an example, when the enhancement level includes a level one and a level two, and the level one is lower than the level two, the processing network corresponding to the level one may include a feature extraction network corresponding to the level one and a feature extraction network corresponding to the level two, and the processing network corresponding to the level two may include a feature extraction network corresponding to the level two, and at this time, the feature extraction network corresponding to the level two included in the processing network corresponding to the level one and the feature extraction network corresponding to the level two included in the processing network corresponding to the level two may share network parameters.
The network structure can be simplified by sharing the network parameters, the processing processes of the enhanced images with different enhancement levels can be associated, the influence of noise generated by the enhanced images with stronger enhancement levels in the training process on the model training effect is avoided, and the model training efficiency and the training effect are improved.
In some optional implementation manners of this embodiment, the scales of the feature extraction results output by the feature extraction networks included in the processing networks corresponding to different enhancement levels may be different. Specifically, the scale of the feature extraction result output by the feature extraction network included in the processing network corresponding to each enhancement level may be flexibly set by a technician in advance.
It should be noted that, for a processing network including two or more feature extraction networks, the scale of the feature extraction result corresponding to the processing network may refer to the scale of the concatenation or fusion result of the feature extraction results corresponding to each of the feature extraction networks included in the processing network.
As an example, suppose the enhancement levels include level one and level two, with level one lower than level two, and the processing network corresponding to level one includes the feature extraction network corresponding to level one and the feature extraction network corresponding to level two. The feature extraction result output by the feature extraction network corresponding to level one has length H, width W, and (N-M) channels, while the feature extraction result output by the feature extraction network corresponding to level two has length H, width W, and M channels, where N is greater than M. The scale of the sequential concatenation of the feature extraction results of the two feature extraction networks included in the processing network corresponding to level one is then: length H, width W, and N channels. The processing network corresponding to level two may include only the feature extraction network corresponding to level two, whose feature extraction result has length H, width W, and M channels.
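This bookkeeping can be checked directly; in the following snippet the values of H, W, N, and M are illustrative only.

```python
# Shape check for the channel-split example above.
import torch

H, W, N, M = 8, 8, 96, 32
f_level1 = torch.randn(1, N - M, H, W)  # level-one network output
f_level2 = torch.randn(1, M, H, W)      # level-two network output
spliced = torch.cat([f_level1, f_level2], dim=1)
assert spliced.shape == (1, N, H, W)    # level-one pathway sees N channels
```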
Reference is now made to fig. 3a and 3 b. Fig. 3a is a schematic diagram of a network structure of an initial model used in a conventional model training method. Fig. 3b is a schematic diagram of a network structure of an initial model in the model training method according to the present embodiment.
As shown in fig. 3a, the initial model adopted by the existing model training method generally comprises a feature extraction network and an output network. The feature extraction network may include a plurality of convolution layers to extract image features, and the output network is configured to generate an image processing result according to the image features extracted by the feature extraction network.
The specific training process usually includes obtaining enhanced image sets corresponding to at least one original image, and obtaining image processing results (such as image categories) corresponding to each enhanced image set. The enhanced image set corresponding to each original image can be obtained by performing various enhancement processing on the original image, and the enhanced images in each enhanced image set are not distinguished by enhancement levels. Then, inputting the enhanced images in each enhanced image set into the initial model, comparing the difference between the result output by the initial model and the image processing result obtained in advance, adjusting the network parameters of the initial model according to the difference, and repeating the training process until the training of the initial model is completed.
As shown in fig. 3b, the enhancement level is divided into two types of weak data enhancement and strong data enhancement. The initial model includes a first processing network corresponding to weak data enhancements and a second processing network corresponding to strong data enhancements. The first processing network includes a first feature extraction network corresponding to a weak data enhancement, a second feature extraction network corresponding to a strong data enhancement, and a first output network corresponding to a weak data enhancement. The second processing network includes a second feature extraction network corresponding to the strong data enhancement and a second output network corresponding to the strong data enhancement.
Specifically, the weak data enhanced image is simultaneously input into a first feature extraction network and a second feature extraction network, and then a first image processing result (such as an image category) is generated by a first output network according to features respectively extracted by the first feature extraction network and the second feature extraction network. The strong data enhanced image is only input to the second feature extraction network, and then a second image processing result (such as an image category) is generated by the second output network according to the features extracted by the second feature extraction network.
Therefore, the existing model training method is to directly input the enhanced images into the initial model for training without performing enhancement grade distinction, and the initial model has no difference in the processing process of different enhanced images. The model training method proposed in this embodiment is to perform enhancement level differentiation on the enhanced image and input the enhanced image into the processing network corresponding to the enhancement level in the initial model for processing. That is, the existing model training method and the model training method proposed in this embodiment have essential differences in feature extraction operations of input images.
Step 203, inputting the enhanced images in the enhanced image set to the initial model, taking the image processing result corresponding to the input enhanced image set as an expected output result, and training the initial model by using a preset loss function.
In this embodiment, an enhanced image set may be selected from the at least one obtained enhanced image set, and the enhanced images in the selected set are input to the initial model to obtain the image processing result generated by the initial model. The image processing result generated by the initial model is then compared with the pre-obtained image processing result, a loss value is calculated by using a preset loss function, and the parameters of the initial model are adjusted according to the loss value by using algorithms such as back propagation and gradient descent. It is then determined whether training of the adjusted initial model is complete; if not, an unselected enhanced image set is selected and input to the adjusted initial model to continue training, until training of the initial model is completed.
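The following sketch of one training step follows this procedure, reusing the InitialModel sketch given earlier. The optimizer handling and the concrete similarity measure used for the level loss (cosine similarity between pooled features, standing in for the KL-divergence-based term described below) are assumptions for the example.

```python
# A hedged sketch of one training step for the two-level model.
import torch.nn.functional as F

def train_step(model, optimizer, weak_imgs, strong_imgs, labels, lam=0.5):
    optimizer.zero_grad()
    logits_weak, (f1, f2) = model.forward_weak(weak_imgs)
    logits_strong = model.forward_strong(strong_imgs)

    # Cross-entropy terms against the expected output of the enhanced image set.
    loss = F.cross_entropy(logits_weak, labels) + F.cross_entropy(logits_strong, labels)

    # Level loss S: penalize similarity between the two feature extraction
    # networks' outputs on the weak-level image, so that each network
    # specializes. Cosine similarity over pooled features is an assumption;
    # the disclosure's formula uses a KL-divergence-based term.
    k = min(f1.shape[1], f2.shape[1])
    p1 = F.adaptive_avg_pool1d(f1.unsqueeze(1), k).squeeze(1)
    p2 = F.adaptive_avg_pool1d(f2.unsqueeze(1), k).squeeze(1)
    loss = loss + lam * F.cosine_similarity(p1, p2, dim=1).mean()

    loss.backward()   # backpropagation
    optimizer.step()  # gradient descent update
    return loss.item()
```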
The loss function can be flexibly set by a technician in advance according to actual application requirements. For example, the loss function may characterize the sum of the differences between the image processing results of the respective processing networks comprised by the initial model and the desired output results; in this case, the parameters of the initial model are adjusted by minimizing the loss function.
Optionally, the loss function may comprise a level loss function. The level loss function may be used to characterize the similarity between the output results of the feature extraction networks included in the processing network corresponding to each enhancement level. In this case, by minimizing the level loss function (i.e., making the differences between the output results of the feature extraction networks as large as possible), the feature extraction network corresponding to the lowest enhancement level can concentrate on extracting the features of enhanced images at the lowest enhancement level, while the feature extraction network corresponding to a higher enhancement level can concentrate on extracting the common features of the enhanced images at that enhancement level and below.
Alternatively, the loss function may include a loss function corresponding to each processing network of each enhancement level. The loss functions of the processing networks corresponding to different enhancement levels may be designed in the same way or in different ways. For example, the penalty function for the processing network for each enhancement level may be used to characterize the difference between the image processing results actually output by the processing network during the training process and the corresponding expected output results.
Based on the above description, the loss function can also be designed as the sum of the level loss function and the loss functions corresponding to the processing networks of the respective enhancement levels, so as to comprehensively control the adjustment of the network parameters during model training from multiple aspects.
As an example, reference is continued to Figs. 3c and 3d. Fig. 3c shows a schematic diagram of the feature extraction operation of the initial model adopted by the existing model training method. Fig. 3d is a schematic diagram of the feature extraction operation of the initial model in the model training method according to the present embodiment.

Specifically, as shown in Fig. 3c, let $x$ denote the input image, and let $\theta_{t-1}$ and $\theta_t$ denote the convolution operations of the $(t-1)$-th and $t$-th network layers of the feature extraction network, where $h_t$ denotes the height of the convolution kernel, $w_t$ denotes the width of the convolution kernel, and $n_{t-1}$ and $n_t$ denote the numbers of channels of the feature extraction results output after the convolution processing of the $(t-1)$-th and $t$-th convolutional layers, respectively.

At this time, taking an initial model for image classification as an example, the forward computation and the loss function of the initial model can be written as:

$$x_t = \theta_t\left(x_{t-1}; W_t, b_t\right), \quad t = 1, \dots, N, \qquad \mathcal{L} = \mathcal{L}_{ce}\left(f\left(x_N\right), l\right)$$

where $\mathcal{L}$ denotes the loss function, $\mathcal{L}_{ce}$ denotes the cross-entropy loss function, $x_{t-1}$ denotes the input of the $t$-th convolutional layer, $N$ denotes the total number of convolutional layers, $l$ denotes the class label of the input image, and $W_t$ and $b_t$ denote the network parameters to be learned.
As shown in Fig. 3d, let $\phi$ denote an input enhanced image corresponding to the weak enhancement level and $\tilde{\phi}$ denote an input enhanced image corresponding to the strong enhancement level, and let $\theta_{t-1}$ and $\theta_t$ denote the $(t-1)$-th and $t$-th convolutional layers. Let $\theta_t^{w}$ denote the convolution operation of the feature extraction network corresponding to the weak enhancement level on the enhanced image corresponding to the weak enhancement level, and let $\theta_t^{s}$ denote the convolution operation of the feature extraction network corresponding to the strong enhancement level (applied both to the enhanced image corresponding to the weak enhancement level and to the enhanced image corresponding to the strong enhancement level). $m_{t-1}$ and $m_t$ respectively denote the numbers of input and output channels of the $t$-th convolutional layer when processing the enhanced image corresponding to the strong enhancement level, and $n_{t-1}$ and $n_t$ respectively denote the numbers of input and output channels when processing the enhanced image corresponding to the weak enhancement level, where $n_t > m_t$.

At this time, the convolution operation for the input enhanced image corresponding to the weak enhancement level proceeds as follows:

$$\phi_t = \left[\, \theta_t^{w}\left(\phi_{t-1}; W, b\right),\ \theta_t^{s}\left(\phi_{t-1}; W, b\right) \,\right]$$

and the convolution operation for the input enhanced image corresponding to the strong enhancement level is as follows:

$$\tilde{\phi}_t = \theta_t^{s}\left(\tilde{\phi}_{t-1}; W, b\right)$$

where $[\,\cdot\,,\,\cdot\,]$ denotes the splicing (channel-wise concatenation) operation, and $W$ and $b$ denote the network parameters to be learned.

Taking the initial model for image classification as an example, the loss function of the initial model can be written as:

$$\mathcal{L} = \mathcal{L}_{ce}\left(f_{\phi}\left(\phi\right), l\right) + \mathcal{L}_{ce}\left(f_{\tilde{\phi}}\left(\tilde{\phi}\right), l\right) + \lambda S$$

where $f_{\phi}$ and $f_{\tilde{\phi}}$ respectively denote the classifiers corresponding to the weak enhancement level and the strong enhancement level, and $\langle\,\cdot\,,\,\cdot\,\rangle$ denotes the Kullback-Leibler divergence (KL divergence), also known as relative entropy or information divergence. $S$ denotes the level loss function, which uses the KL divergence to represent the similarity of the feature extraction results generated after the enhanced image corresponding to the weak enhancement level is subjected to the convolution processing of the different feature extraction networks. $\lambda$ is an adjustment parameter with a value between 0 and 1.
Figs. 3b and 3d illustrate two enhancement levels as an example; it should be noted that the model training method provided in this embodiment can be extended to three or more enhancement levels according to actual requirements. As an example, Fig. 4 shows yet another schematic diagram of a network structure of an initial model in the model training method according to the present embodiment.
As shown in Fig. 4, the enhancement levels are divided into three levels: level one, level two, and level three, where level one is lower than level two and level two is lower than level three. In this case, the initial model includes processing networks respectively corresponding to the three enhancement levels.
Specifically, the processing network corresponding to level one includes a first feature extraction network corresponding to level one, a second feature extraction network corresponding to level two, a third feature extraction network corresponding to level three, and a first output network corresponding to level one. The processing network corresponding to level two includes the second feature extraction network corresponding to level two, the third feature extraction network corresponding to level three, and a second output network corresponding to level two. The processing network corresponding to level three includes the third feature extraction network corresponding to level three and a third output network corresponding to level three.
The enhanced image corresponding to the level one is simultaneously input to the first feature extraction network, the second feature extraction network and the third feature extraction network, and then a first image processing result (such as an image category and the like) is generated by the first output network according to the features respectively extracted by the first feature extraction network, the second feature extraction network and the third feature extraction network.
And the enhanced images corresponding to the second level are simultaneously input into a second feature extraction network and a third feature extraction network, and then a second image processing result (such as image category and the like) is generated by a second output network according to features respectively extracted by the second feature extraction network and the third feature extraction network.
The enhanced image corresponding to the level three is only input to the third feature extraction network, and then the third output network generates a third image processing result (such as image category) according to the features extracted by the third feature extraction network.
At this time, the loss function may be computed as the sum of the cross-entropy loss functions of the processing networks corresponding to the three enhancement levels and the level loss functions. The level loss functions may comprise a first level loss function and a second level loss function. The first level loss function may represent the similarity between the feature extraction results produced for the input level-one enhanced image by the first, second, and third feature extraction networks included in the processing network corresponding to level one. The second level loss function may represent the similarity between the feature extraction results produced for the input level-two enhanced image by the second and third feature extraction networks included in the processing network corresponding to level two.
The parameters of the initial model can be adjusted by using this loss function, so that the first feature extraction network corresponding to level one specializes in extracting the features of the level-one enhanced images; the second feature extraction network corresponding to level two specializes in extracting the common features of the level-one and level-two enhanced images; and the third feature extraction network corresponding to level three specializes in extracting the common features of the level-one, level-two, and level-three enhanced images. This helps improve the sensitivity of the initial model to the features contained in enhanced images of stronger enhancement levels that benefit the image processing task, and its robustness to the noise contained in enhanced images of stronger enhancement levels. The routing pattern is summarized in the sketch below.
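An enhanced image of level k traverses the feature extraction networks of level k and all higher levels; the following trivial Python sketch (the numbering scheme is an assumption for illustration) makes the routing explicit.

```python
# Routing sketch for three enhancement levels: the level-k output network
# sees the spliced results of the feature extraction networks listed here.
def feature_networks_for_level(level, num_levels=3):
    """Indices of the feature extraction networks traversed by a level-k image."""
    return list(range(level, num_levels + 1))

for lvl in (1, 2, 3):
    print(f"level-{lvl} enhanced image -> feature extraction networks "
          f"{feature_networks_for_level(lvl)}")
# level-1 enhanced image -> feature extraction networks [1, 2, 3]
# level-2 enhanced image -> feature extraction networks [2, 3]
# level-3 enhanced image -> feature extraction networks [3]
```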
In the model training method provided by the above embodiment of the present disclosure, enhancement levels are assigned to the different data enhancement methods, and the enhanced images corresponding to different enhancement levels are processed differently during training. The dependency relationships between enhanced images formed by data enhancement methods of different levels are thereby fully considered, and enhanced images of higher enhancement levels are given a processing path that is independent of that of enhanced images of relatively lower enhancement levels. This ensures that the features contained in enhanced images of higher enhancement levels that benefit the image processing task are learned, while reducing the influence on model training of the features contained in those images that are unfavorable to the image processing task.
In addition, in the existing model training method, the initial model processes enhanced images formed by data enhancement methods of different enhancement levels without distinction, so the influence of a given data enhancement method on model training is unstable: it may improve image processing models with some network structures while negatively affecting image processing models with other network structures.
For this case, methods based on meta-learning or search have been proposed, aiming to automatically match the optimal data enhancement methods for an image processing task on a given data set or for an image processing model of a given network structure; however, executing these methods usually consumes large amounts of computational resources.
In contrast to these prior-art methods for automatically matching data enhancement methods to image processing models or tasks, the model training method provided in the embodiments of the present disclosure adapts to various data enhancement methods through the design of the network structure of the initial model, thereby avoiding the heavy consumption of computing resources caused by automatic matching and reducing the model training cost.
With further reference to fig. 5, a flow 500 of yet another embodiment of an image processing method is shown. The flow 500 of the image processing method includes the following steps:
step 501, acquiring an image to be processed.
In the present embodiment, the image to be processed may be an arbitrary image. The execution subject of the image processing method may acquire the image to be processed from a local or other data source.
It should be noted that the execution subject of the image processing method may be the same as or different from the execution subject of the model training method described in the embodiment corresponding to fig. 2.
Step 502, inputting the image to be processed into the image processing model to obtain the image processing result.
In this embodiment, the executing subject of the image processing method may input the image to be processed into the image processing model obtained by training in advance, so as to obtain the image processing result. The image processing result corresponds to the image processing task corresponding to the image processing model. For example, if the image processing model is used for image classification, the image processing result is used to indicate the category to which the image to be processed belongs.
The image processing model may be the processing network corresponding to the lowest enhancement level in the initial model trained by the model training method described in the embodiment corresponding to fig. 2. As an example, suppose the initial model includes a processing network corresponding to a weak enhancement level and a processing network corresponding to a strong enhancement level. After training of the initial model is completed, the processing network corresponding to the weak enhancement level in the trained initial model may be determined as the image processing model and used for subsequent image processing. The processing network corresponding to the weak enhancement level in the trained initial model comprises the feature extraction network corresponding to the weak enhancement level, the feature extraction network corresponding to the strong enhancement level, and the output network corresponding to the weak enhancement level.
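As an illustration, deployment might look like the following sketch, reusing the InitialModel example from earlier (an assumption; the disclosure does not prescribe an implementation). The retained lowest-level processing network keeps both feature extraction networks together with the weak-level output network.

```python
# A hedged deployment sketch: only the lowest-enhancement-level processing
# network of the trained initial model is used for inference.
import torch

@torch.no_grad()
def infer(trained_model, image_batch):
    trained_model.eval()
    # The weak (lowest-level) pathway uses all feature extraction networks.
    logits, _ = trained_model.forward_weak(image_batch)
    return logits.argmax(dim=1)  # predicted class indices
```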
For a specific training process of the initial model, reference may be made to the related description in the corresponding embodiment of fig. 2, which is not described herein again.
The image processing method provided by the above embodiment of the present disclosure utilizes the processing network with the lowest enhancement level after training as an image processing model for subsequent image processing, thereby improving the image processing efficiency. In addition, the processing network with the lowest enhancement level comprises a feature extraction network corresponding to each enhancement level and an output network corresponding to the lowest enhancement level, so that the features which are beneficial to image processing in the image to be processed can be extracted from the angles of different enhancement levels by utilizing the processing network with the lowest enhancement level to perform image processing, and the robustness of an image processing result is improved.
With further reference to fig. 6, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of a model training apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 6, the present embodiment provides a model training apparatus 600 including an enhanced image set acquisition unit 601, a model acquisition unit 602, and a training unit 603. The enhanced image set obtaining unit 601 is configured to obtain enhanced image sets corresponding to at least one original image respectively, and obtain an image processing result corresponding to each enhanced image set, where each enhanced image set includes enhanced images of at least two enhancement levels; the model obtaining unit 602 is configured to obtain an initial model, where the initial model includes processing networks respectively corresponding to different enhancement levels, each enhancement level corresponding processing network includes an output network and a feature extraction network respectively corresponding to each enhancement level above the enhancement level, and the output network is used for generating an image processing result according to a feature extraction result output by each feature extraction network; the training unit 603 is configured to input the enhanced images in the enhanced image set to the initial model, and train the initial model with a preset loss function using the image processing result corresponding to the input enhanced image set as an expected output result.
In the present embodiment, in the model training apparatus 600: the detailed processing of the enhanced image set obtaining unit 601, the model obtaining unit 602, and the training unit 603 and the technical effects thereof can refer to the related descriptions of step 201, step 202, and step 203 in the corresponding embodiment of fig. 2, which are not repeated herein.
In some optional implementations of this embodiment, the feature extraction networks corresponding to the same enhancement level in the initial model share network parameters.
In some optional implementation manners of this embodiment, the scales of the feature extraction results output by the feature extraction networks included in the processing networks corresponding to different enhancement levels are different.
In some optional implementations of this embodiment, the loss function includes a rank loss function, where the rank loss function is used to characterize similarity between output results of the feature extraction networks included in the processing network corresponding to each enhancement rank.
In some optional implementations of this embodiment, the loss function includes a loss function corresponding to each processing network of each enhancement level.
In the model training device provided by the above embodiment of the present disclosure, the enhanced image set obtaining unit obtains the enhanced image sets corresponding to at least one original image respectively, and obtains the image processing result corresponding to each enhanced image set, where each enhanced image set includes enhanced images of at least two enhancement levels; the model obtaining unit obtains an initial model, wherein the initial model comprises processing networks respectively corresponding to different enhancement levels, the processing network corresponding to each enhancement level comprises an output network and a feature extraction network respectively corresponding to each enhancement level above the enhancement level, and the output network is used for generating an image processing result according to the feature extraction result output by each feature extraction network; the training unit inputs the enhanced images in the enhanced image set to the initial model, takes the image processing results corresponding to the input enhanced image set as expected output results, trains the initial model by using a preset loss function, realizes the enhancement grade division of different data enhancement methods, and respectively performs different processing on the enhanced images corresponding to different enhancement grades in the training process, so as to ensure that the features which are contained in the enhanced images with higher enhancement grades and are beneficial to the image processing task are learned, and reduce the influence of the features which are contained in the enhanced images with higher enhancement grades and are not beneficial to the image processing task on the model training.
With further reference to fig. 7, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of an image processing apparatus, which corresponds to the embodiment of the method shown in fig. 5, and which is particularly applicable in various electronic devices.
As shown in fig. 7, the image processing apparatus 700 provided in the present embodiment includes an image to be processed acquisition unit 701 and a processing unit 702. Wherein the to-be-processed image acquiring unit 701 is configured to acquire an image to be processed; the processing unit 702 is configured to input the image to be processed to an image processing model, and obtain an image processing result, where the image processing model is a processing network corresponding to a lowest enhancement level in a trained initial model, and the initial model is trained by using the method described in the embodiment of fig. 2.
In the present embodiment, in the image processing apparatus 700: the specific processing of the to-be-processed image obtaining unit 701 and the processing unit 702 and the technical effects thereof can refer to the related descriptions of step 501 and step 502 in the corresponding embodiment of fig. 5, which are not repeated herein.
The image processing apparatus provided by the above embodiment of the present disclosure acquires an image to be processed by the image to be processed acquiring unit; the processing unit inputs the image to be processed into the image processing model to obtain an image processing result, wherein the image processing model is a processing network corresponding to the lowest enhancement level in the trained initial model, so that the features which are beneficial to image processing in the image to be processed can be respectively extracted from the angles of different enhancement levels, and the robustness of the image processing result is improved.
Referring now to FIG. 8, a block diagram of an electronic device (e.g., the server of FIG. 1) 800 suitable for use in implementing embodiments of the present disclosure is shown. The server shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 8, an electronic device 800 may include a processing means (e.g., central processing unit, graphics processor, etc.) 801 that may perform various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 802 or a program loaded from a storage means 808 into a Random Access Memory (RAM) 803. The RAM 803 also stores various programs and data necessary for the operation of the electronic device 800. The processing apparatus 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Generally, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 807 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage 808 including, for example, magnetic tape, hard disk, etc.; and a communication device 809. The communication means 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 illustrates an electronic device 800 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 8 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 809, or installed from the storage means 808, or installed from the ROM 802. The computer program, when executed by the processing apparatus 801, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the embodiments of the present disclosure, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including, but not limited to: an electrical wire, an optical cable, RF (radio frequency), or the like, or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring enhanced image sets corresponding to at least one original image respectively, and acquiring an image processing result corresponding to each enhanced image set, wherein each enhanced image set comprises enhanced images of at least two enhancement levels; acquiring an initial model, wherein the initial model comprises processing networks respectively corresponding to different enhancement levels, the processing network corresponding to each enhancement level comprises an output network and a feature extraction network respectively corresponding to each enhancement level above the enhancement level, and the output network is used for generating an image processing result according to a feature extraction result output by each feature extraction network; and inputting the enhanced images in the enhanced image set into the initial model, taking the image processing result corresponding to the input enhanced image set as an expected output result, and training the initial model by using a preset loss function.
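For orientation, the following is a compact sketch of the training procedure just summarized, again assuming an image classification task. The enhancement policy (additive noise whose strength grows with the level), the network shapes, the number of levels, the optimizer, and the use of a plain cross-entropy term per level are illustrative assumptions rather than the disclosure's prescribed choices; in particular, sharing each feature extraction network across the processing networks of all lower levels is one plausible reading of the architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LEVELS = 3  # number of enhancement levels (illustrative assumption)

def make_extractor() -> nn.Module:
    # toy feature extraction network producing a 128-dim feature vector
    return nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(4), nn.Flatten(),  # -> (batch, 8*4*4) = (batch, 128)
    )

# one feature extraction network per enhancement level, shared across the
# processing networks of all levels at or below it (a plausible reading)
extractors = nn.ModuleList(make_extractor() for _ in range(LEVELS))
# the processing network of level i fuses extractors i..LEVELS-1 via its own output network
output_nets = nn.ModuleList(nn.Linear(128 * (LEVELS - i), 10) for i in range(LEVELS))

opt = torch.optim.SGD(
    list(extractors.parameters()) + list(output_nets.parameters()), lr=0.01
)

def enhance(img: torch.Tensor, level: int) -> torch.Tensor:
    # stand-in enhancement whose strength grows with the level (assumption)
    return img + 0.1 * (level + 1) * torch.randn_like(img)

for step in range(100):                      # toy loop over random "originals"
    original = torch.randn(4, 3, 32, 32)     # batch of original images
    label = torch.randint(0, 10, (4,))       # expected result for the whole enhanced image set
    loss = torch.zeros(())
    for i in range(LEVELS):                  # one enhanced image per enhancement level
        x = enhance(original, i)
        feats = [extractors[j](x) for j in range(i, LEVELS)]
        logits = output_nets[i](torch.cat(feats, dim=1))
        loss = loss + F.cross_entropy(logits, label)  # per-level loss term
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because extractors[j] appears in every processing network whose enhancement level is at most j, gradients from all of those per-level losses update it; under this reading, the lowest-level processing network retained for deployment benefits from supervision at every enhancement level. (A companion sketch of the rank loss term appears after the claims below.)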
Computer program code for carrying out operations of the embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor comprising an enhanced image set acquisition unit, a model acquisition unit, and a training unit. In some cases, the names of these units do not constitute a limitation on the units themselves; for example, the enhanced image set acquisition unit may also be described as a unit for acquiring enhanced image sets respectively corresponding to at least one original image and acquiring an image processing result corresponding to each enhanced image set, wherein each enhanced image set comprises enhanced images of at least two enhancement levels.
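As a hedged illustration of this unit decomposition (the unit names follow the paragraph above, but all interfaces below are assumptions; the disclosure only names the units), the apparatus might be organized as follows:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class ModelTrainingApparatusSketch:
    """Units per the description above; every signature here is an assumption."""
    enhanced_image_set_acquisition_unit: Callable[[], Tuple[List, List]]
    model_acquisition_unit: Callable[[], object]
    training_unit: Callable[[object, List, List], object]

    def run(self) -> object:
        image_sets, results = self.enhanced_image_set_acquisition_unit()
        initial_model = self.model_acquisition_unit()
        return self.training_unit(initial_model, image_sets, results)
```

Any of the three callables may, as the paragraph above notes, be realized in software or in hardware.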
The foregoing description is merely a description of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention referred to in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (10)

1. A model training method, comprising:
acquiring enhanced image sets corresponding to at least one original image respectively, and acquiring an image processing result corresponding to each enhanced image set, wherein each enhanced image set comprises enhanced images of at least two enhancement levels;
acquiring an initial model, wherein the initial model comprises processing networks respectively corresponding to different enhancement levels, the processing network corresponding to each enhancement level comprises an output network and a feature extraction network respectively corresponding to each enhancement level above the enhancement level, and the output network is used for generating an image processing result according to the feature extraction result output by each feature extraction network;
and inputting the enhanced images in the enhanced image set into the initial model, taking an image processing result corresponding to the input enhanced image set as an expected output result, and training the initial model by using a preset loss function.
2. The method of claim 1, wherein the feature extraction results output by the feature extraction networks included in the processing networks corresponding to different enhancement levels differ in scale.
3. The method of claim 2, wherein the feature extraction results output by the feature extraction networks corresponding to the same enhancement level in the initial model differ in scale.
4. The method according to any one of claims 1 to 3, wherein the loss function comprises a rank loss function, the rank loss function being used for characterizing the similarity between the output results of the feature extraction networks comprised in the processing network corresponding to each enhancement level.
5. The method according to any one of claims 1 to 3, wherein the loss function comprises a loss function corresponding to the processing network of each enhancement level.
6. An image processing method comprising:
acquiring an image to be processed;
inputting the image to be processed into an image processing model to obtain an image processing result, wherein the image processing model is a processing network corresponding to the lowest enhancement level in a trained initial model, and the initial model is trained by using the method of any one of claims 1 to 5.
7. A model training apparatus comprising:
an enhanced image set acquisition unit configured to acquire enhanced image sets respectively corresponding to at least one original image, and to acquire an image processing result corresponding to each enhanced image set, wherein each enhanced image set comprises enhanced images of at least two enhancement levels;
a model acquisition unit configured to acquire an initial model, wherein the initial model comprises processing networks respectively corresponding to different enhancement levels, the processing network corresponding to each enhancement level comprises an output network and a feature extraction network respectively corresponding to each enhancement level above the enhancement level, and the output network is used for generating an image processing result according to the feature extraction results output by the feature extraction networks; and
a training unit configured to input the enhanced images in the enhanced image set into the initial model, take the image processing result corresponding to the input enhanced image set as an expected output result, and train the initial model by using a preset loss function.
8. An image processing apparatus comprising:
a to-be-processed image acquisition unit configured to acquire a to-be-processed image;
a processing unit configured to input the image to be processed into an image processing model to obtain an image processing result, wherein the image processing model is a processing network corresponding to the lowest enhancement level in a trained initial model, and the initial model is trained by using the method according to any one of claims 1 to 5.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
10. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-6.
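Purely as an illustration of claims 4 and 5 (not the granted formulation), the rank loss characterizing the similarity between the output results of the feature extraction networks inside one processing network might be sketched as a mean pairwise cosine distance; the pairwise formulation and any weighting factor are assumptions:

```python
import torch
import torch.nn.functional as F

def rank_loss(feature_outputs):
    """Hedged reading of claim 4: penalize dissimilarity between the feature
    extraction results produced inside a single processing network, here via
    mean pairwise cosine distance (an assumption, not the granted wording)."""
    loss = feature_outputs[0].new_zeros(())
    for a in range(len(feature_outputs)):
        for b in range(a + 1, len(feature_outputs)):
            sim = F.cosine_similarity(feature_outputs[a], feature_outputs[b], dim=1)
            loss = loss + (1.0 - sim).mean()
    return loss

# e.g. three (batch, feature) tensors produced inside one processing network
feats = [torch.randn(4, 128) for _ in range(3)]
print(rank_loss(feats))  # scalar tensor >= 0
```

Under this reading, the overall objective of claim 5 would add such a term per processing network to the per-level supervised losses, e.g. total = sum(task_losses) + lam * sum(rank_losses), with lam a tuning assumption.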
CN202110848100.XA 2021-07-27 2021-07-27 Model training method and device Active CN113505848B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110848100.XA CN113505848B (en) 2021-07-27 2021-07-27 Model training method and device
PCT/CN2022/094879 WO2023005386A1 (en) 2021-07-27 2022-05-25 Model training method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110848100.XA CN113505848B (en) 2021-07-27 2021-07-27 Model training method and device

Publications (2)

Publication Number Publication Date
CN113505848A true CN113505848A (en) 2021-10-15
CN113505848B CN113505848B (en) 2023-09-26

Family

ID=78014069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110848100.XA Active CN113505848B (en) 2021-07-27 2021-07-27 Model training method and device

Country Status (2)

Country Link
CN (1) CN113505848B (en)
WO (1) WO2023005386A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385813B (en) * 2023-06-07 2023-08-29 南京隼眼电子科技有限公司 ISAR image space target classification method, device and storage medium based on unsupervised contrast learning

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7933454B2 (en) * 2007-06-25 2011-04-26 Xerox Corporation Class-based image enhancement system
CN109934776B (en) * 2018-12-25 2021-05-25 北京奇艺世纪科技有限公司 Model generation method, video enhancement method, device and computer-readable storage medium
CN109859113B (en) * 2018-12-25 2021-08-20 北京奇艺世纪科技有限公司 Model generation method, image enhancement method, device and computer-readable storage medium
CN109919869B (en) * 2019-02-28 2021-06-04 腾讯科技(深圳)有限公司 Image enhancement method and device and storage medium
CN112435198A (en) * 2020-12-03 2021-03-02 西安交通大学 Welding seam radiographic inspection negative image enhancement method, storage medium and equipment
CN112801918A (en) * 2021-03-11 2021-05-14 苏州科达科技股份有限公司 Training method of image enhancement model, image enhancement method and electronic equipment
CN113159300B (en) * 2021-05-15 2024-02-27 南京逸智网络空间技术创新研究院有限公司 Image detection neural network model, training method thereof and image detection method
CN113505848B (en) * 2021-07-27 2023-09-26 京东科技控股股份有限公司 Model training method and device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190297276A1 (en) * 2018-03-20 2019-09-26 EndoVigilant, LLC Endoscopy Video Feature Enhancement Platform
CN108876745A (en) * 2018-06-27 2018-11-23 厦门美图之家科技有限公司 Image processing method and device
CN109859152A (en) * 2018-12-25 2019-06-07 北京奇艺世纪科技有限公司 Model generating method, image enchancing method, device and computer readable storage medium
CN109872276A (en) * 2019-01-29 2019-06-11 北京字节跳动网络技术有限公司 Method and apparatus for generating image super-resolution model
WO2020239196A1 (en) * 2019-05-27 2020-12-03 Toyota Motor Europe System and method for training a generative adversarial model generating image samples of different brightness levels
WO2021102655A1 (en) * 2019-11-25 2021-06-03 深圳市欢太科技有限公司 Network model training method, image property recognition method and apparatus, and electronic device
CN112257738A (en) * 2020-07-31 2021-01-22 北京京东尚科信息技术有限公司 Training method and device of machine learning model and classification method and device of image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG DEXING; QIN ENQIAN; YUAN HONGCHUN: "Aquatic animal classification method based on DCGAN data augmentation", Fishery Modernization, No. 06, pages 68-75 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023005386A1 (en) * 2021-07-27 2023-02-02 京东科技控股股份有限公司 Model training method and apparatus
CN114511562A (en) * 2022-04-19 2022-05-17 深圳市疾病预防控制中心(深圳市卫生检验中心、深圳市预防医学研究所) System, method and equipment for predicting risk of chronic obstructive pneumonia based on big data

Also Published As

Publication number Publication date
CN113505848B (en) 2023-09-26
WO2023005386A1 (en) 2023-02-02

Similar Documents

Publication Publication Date Title
CN109816589B (en) Method and apparatus for generating cartoon style conversion model
CN109800732B (en) Method and device for generating cartoon head portrait generation model
CN108520220B (en) Model generation method and device
CN109858445B (en) Method and apparatus for generating a model
CN110288049B (en) Method and apparatus for generating image recognition model
CN107633218B (en) Method and apparatus for generating image
CN108830235B (en) Method and apparatus for generating information
CN107622240B (en) Face detection method and device
CN112699991A (en) Method, electronic device, and computer-readable medium for accelerating information processing for neural network training
CN113505848B (en) Model training method and device
CN109101919B (en) Method and apparatus for generating information
CN111523640B (en) Training method and device for neural network model
CN110516678B (en) Image processing method and device
CN109829432B (en) Method and apparatus for generating information
CN109800730B (en) Method and device for generating head portrait generation model
CN109981787B (en) Method and device for displaying information
CN109961032B (en) Method and apparatus for generating classification model
CN109934142B (en) Method and apparatus for generating feature vectors of video
CN110288625B (en) Method and apparatus for processing image
CN109977905B (en) Method and apparatus for processing fundus images
CN111311480A (en) Image fusion method and device
CN109241930B (en) Method and apparatus for processing eyebrow image
CN110929564A (en) Fingerprint model generation method based on countermeasure network and related device
CN110335237B (en) Method and device for generating model and method and device for recognizing image
CN109034085B (en) Method and apparatus for generating information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant