CN113505848B - Model training method and device


Info

Publication number
CN113505848B
CN113505848B (application number CN202110848100.XA)
Authority
CN
China
Prior art keywords
image
enhancement
network
feature extraction
image processing
Prior art date
Legal status
Active
Application number
CN202110848100.XA
Other languages
Chinese (zh)
Other versions
CN113505848A (en)
Inventor
白亚龙 (Bai Yalong)
张炜 (Zhang Wei)
梅涛 (Mei Tao)
周伯文 (Zhou Bowen)
Current Assignee
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd
Priority to CN202110848100.XA
Publication of CN113505848A
Priority to PCT/CN2022/094879 (WO2023005386A1)
Application granted
Publication of CN113505848B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the disclosure disclose a model training method and device. One embodiment of the method comprises the following steps: acquiring enhanced image sets respectively corresponding to at least one original image, and acquiring an image processing result corresponding to each enhanced image set, wherein each enhanced image set comprises enhanced images of at least two enhancement levels; acquiring an initial model, wherein the initial model comprises processing networks respectively corresponding to different enhancement levels, the processing network corresponding to each enhancement level comprises an output network and a feature extraction network respectively corresponding to each enhancement level above that enhancement level, and the output network is used for generating an image processing result according to the feature extraction results output by each feature extraction network; and inputting the enhanced images in the enhanced image sets into the initial model, taking the image processing result corresponding to the input enhanced image set as an expected output result, and training the initial model by using a preset loss function. The embodiment improves the adaptability of the model to be trained to various data enhancement methods.

Description

Model training method and device
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a model training method and device.
Background
With the rapid development of artificial neural networks (ANNs, Artificial Neural Networks), they have been widely used in various image processing tasks such as image recognition, image classification, image retrieval, semantic segmentation, multi-modal image processing, and the like. Problems such as noise signals contained in training samples, a limited number of training samples, and overfitting to training samples are often encountered in the training process of various image processing models (such as convolutional neural networks).
Currently, data enhancement techniques are widely used in the training of various image processing models as a low-cost strategy for expanding training samples to ameliorate some of the above problems. Types of data enhancement include, but are not limited to, random image flipping, image cropping, random image occlusion, and the like. Different types of data enhancement influence the training effect of an image processing model differently, and the same type of data enhancement also influences the image processing models corresponding to different image processing tasks differently, so selecting a proper data enhancement type is a problem that deserves careful consideration.
Disclosure of Invention
The embodiment of the disclosure provides a model training method and device.
In a first aspect, embodiments of the present disclosure provide a model training method, the method comprising: acquiring enhanced image sets respectively corresponding to at least one original image, and acquiring an image processing result corresponding to each enhanced image set, wherein each enhanced image set comprises enhanced images of at least two enhancement levels; acquiring an initial model, wherein the initial model comprises processing networks respectively corresponding to different enhancement levels, the processing network corresponding to each enhancement level comprises an output network and a feature extraction network respectively corresponding to each enhancement level above that enhancement level, and the output network is used for generating an image processing result according to the feature extraction results output by each feature extraction network; and inputting the enhanced images in the enhanced image sets into the initial model, taking the image processing result corresponding to the input enhanced image set as an expected output result, and training the initial model by using a preset loss function.
In a second aspect, embodiments of the present disclosure provide an image processing method, the method including: acquiring an image to be processed; inputting an image to be processed into an image processing model to obtain an image processing result, wherein the image processing model is a processing network corresponding to the lowest enhancement level in a trained initial model, and the initial model is trained by the method described in any implementation mode of the first aspect.
In a third aspect, embodiments of the present disclosure provide a model training apparatus, the apparatus comprising: an enhanced image set obtaining unit configured to obtain enhanced image sets corresponding to at least one original image respectively, and obtain an image processing result corresponding to each enhanced image set, wherein each enhanced image set includes enhanced images of at least two enhancement levels; the model acquisition unit is configured to acquire an initial model, wherein the initial model comprises processing networks respectively corresponding to different enhancement levels, the processing network corresponding to each enhancement level comprises an output network and a feature extraction network respectively corresponding to each enhancement level above the enhancement level, and the output network is used for generating an image processing result according to the feature extraction result output by each feature extraction network; and the training unit is configured to input the enhanced images in the enhanced image set into the initial model, take the image processing results corresponding to the input enhanced image set as expected output results, and train the initial model by utilizing a preset loss function.
In a fourth aspect, embodiments of the present disclosure provide an image processing apparatus, including: a to-be-processed image acquisition unit configured to acquire an image to be processed; and a processing unit configured to input the image to be processed into an image processing model to obtain an image processing result, wherein the image processing model is the processing network corresponding to the lowest enhancement level in the trained initial model, and the initial model is trained by the method described in any implementation manner of the first aspect.
In a fifth aspect, embodiments of the present disclosure provide an electronic device comprising: one or more processors; a storage means for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.
In a sixth aspect, embodiments of the present disclosure provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
According to the model training method and device provided by the embodiments of the present disclosure, in the model training process, the enhanced images corresponding to different enhancement levels are distinguished and respectively input into the processing networks corresponding to those enhancement levels, so that the processing network corresponding to each enhancement level extracts the features of the enhanced images at and above that enhancement level and generates an image processing result according to the obtained feature extraction results. The model training process can thereby learn the features contained in the enhanced images of higher enhancement levels that are favorable for the image processing task, while the influence on model training of the features contained in those images that are unfavorable for the image processing task is reduced, further improving the model training effect.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings:
FIG. 1 is an exemplary system architecture diagram in which an embodiment of the present disclosure may be applied;
FIG. 2 is a flow chart of one embodiment of a model training method according to the present disclosure;
FIG. 3a is a schematic diagram of a network structure of an initial model used in the prior model training method;
FIG. 3b is a schematic diagram of the network structure of the initial model in the model training method according to the present embodiment;
FIG. 3c is a schematic diagram of a feature extraction operation of an initial model employed by a prior model training method;
FIG. 3d is a schematic diagram of a feature extraction operation of an initial model in the model training method according to the present embodiment;
FIG. 4 is a further schematic diagram of a network structure of an initial model in the model training method according to the present embodiment;
FIG. 5 is a flow chart of one embodiment of an image processing method according to the present disclosure;
FIG. 6 is a schematic structural view of one embodiment of a model training apparatus according to the present disclosure;
fig. 7 is a schematic structural view of an embodiment of an image processing apparatus according to the present disclosure;
Fig. 8 is a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the drawings and embodiments. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and do not limit it. It should be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates an exemplary architecture 100 to which embodiments of the model training method or model training apparatus of the present disclosure may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The terminal devices 101, 102, 103 interact with the server 105 via the network 104 to receive or send messages or the like. Various client applications can be installed on the terminal devices 101, 102, 103. Such as browser-like applications, search-like applications, instant messaging tools, image processing-like applications, and the like.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smartphones, tablet computers, electronic book readers, laptop and desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the above-listed electronic devices. Which may be implemented as multiple software or software modules (e.g., multiple software or software modules for providing distributed services) or as a single software or software module. The present invention is not particularly limited herein.
The server 105 may be a server providing various services, such as a server providing back-end support for the terminal devices 101, 102, 103. The server 105 may acquire at least one enhanced image set of the original image, an image processing result corresponding to each enhanced image set, and an initial model as training data, and then complete training of the initial model by using a preset loss function, so as to obtain an image processing model. Then, the server 105 may receive the image processing requests sent by the terminal devices 101, 102, 103, and process the images indicated by the image processing requests with the trained image processing model, to obtain image processing results.
It should be noted that, the model training method provided by the embodiments of the present disclosure is generally performed by the server 105, and accordingly, the model training apparatus is generally disposed in the server 105. In some cases, the terminal devices 101, 102, 103 and the network 104 may not be present.
It is further noted that the model training class application may also be installed in the terminal device 101, 102, 103, and the terminal device 101, 102, 103 may complete model training based on the model training class application. In this case, the model training method may be performed by the terminal apparatuses 101, 102, 103, and the model training device may be provided in the terminal apparatuses 101, 102, 103. At this point, the exemplary system architecture 100 may not have the server 105 and network 104 present.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When server 105 is software, it may be implemented as multiple software or software modules (e.g., multiple software or software modules for providing distributed services), or as a single software or software module. The present invention is not particularly limited herein.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a model training method according to the present disclosure is shown. The model training method comprises the following steps:
step 201, obtaining at least one enhanced image set corresponding to the original image respectively, and obtaining an image processing result corresponding to each enhanced image set.
In the present embodiment, the original image may be an arbitrarily specified image. Each of the at least one original image may be different. After an original image is specified, data enhancement processing can be performed on the original image to obtain an enhanced image corresponding to the original image. Different enhancement images can be obtained by processing the same original image by adopting different data enhancement methods.
Specifically, various data enhancement methods may be employed on the original image to achieve image enhancement. As examples, data enhancement methods include, but are not limited to, blurring (Blur), flipping (Flip), normalization (Normalize), transposition (Transpose), random cropping (RandomCrop), random gamma transformation (RandomGamma), rotation (Rotate), optical distortion (OpticalDistortion), grid distortion (GridDistortion), elastic transformation (ElasticTransform), random grid shuffling (RandomGridShuffle), cutout (random erasing), graying, and the like.
Each enhanced image set corresponding to the original image may be composed of enhanced images corresponding to the original image. Each enhanced image set may include enhanced images of at least two enhancement levels. Wherein the enhancement level may be used to characterize the enhanced image formed using the corresponding data enhancement method, as compared to the information loss level of the original image. The information of the image includes various information such as color, object structure, and the like. Generally, the more information is lost, the higher the corresponding enhancement level.
As an example, flipping typically does not cause the original image to lose much information, whereas random grid shuffling typically causes the original image to lose a large amount of important information. Thus, random grid shuffling corresponds to a higher enhancement level than flipping.
The division mode of the enhancement level can be flexibly set according to the actual application scene. For example, enhancement levels may be classified into two types, weak data enhancement and strong data enhancement. For another example, the strong data enhancement may be further divided into a first-level strong data enhancement and a second-level strong data enhancement on the basis of the two enhancement levels, and the second-level enhancement level is higher than the first-level enhancement level. At this time, the enhancement level includes three kinds of weak data enhancement, strong data enhancement of level one, and strong data enhancement of level two.
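By way of illustration only, such a two-level division might be sketched in Python as follows. The albumentations library, the particular transforms assigned to each level, and the parameter values are all assumptions of this sketch (using older albumentations argument names), not part of the claimed method:

```python
# A minimal sketch of grouping data enhancement methods into levels.
# The transforms chosen for each level are illustrative assumptions.
import albumentations as A

# Weak data enhancement: little information loss (e.g. flipping, cropping).
weak_enhancement = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomCrop(height=224, width=224),
])

# Strong data enhancement: heavier information loss
# (e.g. random grid shuffling, random occlusion).
strong_enhancement = A.Compose([
    A.RandomCrop(height=224, width=224),
    A.RandomGridShuffle(grid=(3, 3), p=1.0),
    A.CoarseDropout(max_holes=8, p=1.0),
])

def build_enhanced_image_set(original_image):
    """Return one enhanced image per enhancement level for an original image."""
    return {
        "weak": weak_enhancement(image=original_image)["image"],
        "strong": strong_enhancement(image=original_image)["image"],
    }
```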
It should be noted that, for each enhancement level, each enhancement image set may include any number of enhancement images for that enhancement level.
The image processing result corresponding to the enhanced image set may refer to the image processing result corresponding to the original image. In general, each enhanced image in the enhanced image set corresponds to the same image processing result. Specifically, the image processing result corresponding to the enhanced image set can be flexibly determined according to the actual application scene. For example, in a scenario in which an image processing model for image classification is trained, the image processing result may refer to the category to which the image belongs. For another example, for a scenario in which an image processing model for image detection is trained, the image processing result may refer to the position of the object to be detected in the image.
The execution body of the model training method (such as the server 105 shown in fig. 1) may obtain at least one enhanced image set and image processing results corresponding to each enhanced image set respectively from a local, connected database, a third party data platform or a storage device (such as the terminal devices 101, 102, 103 shown in fig. 1). It should be noted that, at least one enhanced image set and the image processing results corresponding to each enhanced image set may be obtained from the same data source, or may be obtained from different data sources.
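A hypothetical PyTorch dataset wrapping this acquisition step is sketched below; pairing each enhanced image set with a single class label reflects the image classification scenario used as an example above, and all names here are illustrative:

```python
import torch
from torch.utils.data import Dataset

class EnhancedImageSetDataset(Dataset):
    """Illustrative sketch: yields one enhanced image per enhancement level
    plus the image processing result (here a class label) shared by the set."""

    def __init__(self, originals, labels, level_transforms):
        # originals: list of HxWxC uint8 arrays; labels: list of ints;
        # level_transforms: dict mapping level name -> transform pipeline.
        self.originals = originals
        self.labels = labels
        self.level_transforms = level_transforms

    def __len__(self):
        return len(self.originals)

    def __getitem__(self, idx):
        image = self.originals[idx]
        enhanced = {
            level: torch.from_numpy(
                t(image=image)["image"]).permute(2, 0, 1).float() / 255.0
            for level, t in self.level_transforms.items()
        }
        # Every enhanced image in the set maps to the same expected result.
        return enhanced, self.labels[idx]
```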
Step 202, an initial model is obtained.
In this embodiment, the execution body may obtain the pre-built initial model from a local or other data source. Wherein the initial model may comprise processing networks corresponding to different enhancement levels, respectively. The processing network corresponding to each enhancement level may include an output network and a feature extraction network corresponding to each enhancement level above the enhancement level.
As an example, the processing network corresponding to the enhancement level "N" may include an output network and a feature extraction network respectively corresponding to each enhancement level whose level is not less than "N" (i.e., enhancement levels "N", "N+1", "N+2", ...).
Wherein the feature extraction network in each processing network may be used to extract features of the image, and the output network may be used to generate image processing results according to the feature extraction results output by the feature extraction networks included in the processing network. It should be noted that, the enhancement level corresponding to the enhanced image input to each feature extraction network may be generally the same as the enhancement level corresponding to the feature extraction network.
After each feature extraction network in each processing network outputs the extracted features respectively, the output network can generate image processing results according to the features respectively output by each feature extraction network through various methods (such as sequential splicing, fusion and the like).
Specifically, the feature extraction network may employ various existing network structures for extracting image features. For example, the feature extraction network may be composed of several convolution layers and pooling layers. The output network can be flexibly constructed according to the actual application scene. For example, in the context of image classification, the output network may be various classifiers.
In some alternative implementations of the present embodiment, the network parameters may be shared by feature extraction networks in the initial model that correspond to the same enhancement level.
As an example, when the enhancement level includes a level one and a level two, and the level one is lower than the level two, the processing network corresponding to the level one may include a feature extraction network corresponding to the level one and a feature extraction network corresponding to the level two, and the processing network corresponding to the level two may include a feature extraction network corresponding to the level two, and at this time, the feature extraction network corresponding to the level two included in the processing network corresponding to the level one and the feature extraction network corresponding to the level two included in the processing network corresponding to the level two may share network parameters.
The network structure can be simplified by sharing network parameters, the processing processes of the enhanced images with different enhancement levels can be related, the influence of noise generated in the training process of the enhanced images with stronger enhancement levels on the model training effect is avoided, and the model training efficiency and the training effect are improved.
In some optional implementations of this embodiment, the dimensions of the feature extraction results output by the feature extraction network included in the processing network corresponding to the different enhancement levels may be different. Specifically, the scale of the feature extraction result output by the feature extraction network included in the processing network corresponding to each enhancement level may be flexibly set by a technician in advance.
It should be noted that, for a processing network including more than two feature extraction networks, the scale of the feature extraction result corresponding to the processing network may refer to the scale of the splicing or merging result of the feature extraction result corresponding to each feature extraction network included in the processing network.
As an example, when the enhancement levels include level one and level two, and level one is lower than level two, the processing network corresponding to level one includes a feature extraction network corresponding to level one and a feature extraction network corresponding to level two. The feature extraction network corresponding to level one outputs a feature extraction result of length H, width W, and (N - M) channels, while the feature extraction network corresponding to level two outputs a feature extraction result of length H, width W, and M channels, where N is greater than M. The scale of the result obtained by sequentially splicing the feature extraction results of the two feature extraction networks included in the processing network corresponding to level one is therefore: length H, width W, N channels. The processing network corresponding to level two includes only the feature extraction network corresponding to level two, whose feature extraction result has length H, width W, and M channels.
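Combining the points above (one processing network per enhancement level, a shared strong-level feature extraction network, and the (N - M)/M channel split), a minimal two-level initial model might be sketched in PyTorch as follows. The backbone layers, channel counts, and classifier heads are assumptions of this sketch, not a definitive implementation:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Illustrative feature extraction network: one conv layer plus pooling.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
    )

class TwoLevelInitialModel(nn.Module):
    def __init__(self, num_classes, n=256, m=64):
        super().__init__()
        assert n > m
        # Feature extraction network of the weak level: (N - M) channels.
        self.extract_weak = conv_block(3, n - m)
        # Feature extraction network of the strong level: M channels.
        # A single instance, so its parameters are shared between the
        # weak-level and strong-level processing networks.
        self.extract_strong = conv_block(3, m)
        # Output networks (classifiers, per the image classification example).
        self.output_weak = nn.Linear(n, num_classes)    # sees spliced features
        self.output_strong = nn.Linear(m, num_classes)  # sees strong features only

    def forward(self, weak_img, strong_img):
        f_w = self.extract_weak(weak_img)      # (B, N - M)
        f_sw = self.extract_strong(weak_img)   # (B, M), strong net on weak image
        f_s = self.extract_strong(strong_img)  # (B, M), strong net on strong image
        # Weak-level processing network: splice, then classify.
        logits_weak = self.output_weak(torch.cat([f_w, f_sw], dim=1))
        # Strong-level processing network: strong features only.
        logits_strong = self.output_strong(f_s)
        return logits_weak, logits_strong, f_w, f_sw
```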
Reference is now made to figures 3a and 3b. Fig. 3a is a schematic diagram of a network structure of an initial model used in the existing model training method. Fig. 3b is a network structure diagram of an initial model in the model training method according to the present embodiment.
As shown in fig. 3a, the initial model employed by the existing model training method typically includes a feature extraction network and an output network. The feature extraction network may include a plurality of convolution layers to extract image features, and the output network is configured to generate an image processing result according to the image features extracted by the feature extraction network.
The specific training process is usually to acquire at least one enhanced image set corresponding to each original image, and acquire an image processing result (such as an image category) corresponding to each enhanced image set. The enhanced image set corresponding to each original image can be obtained by performing various enhancement processes on the original image, and the enhanced images in each enhanced image set are not distinguished by enhancement levels. Then, the enhanced images in each enhanced image set are input into an initial model, the difference between the output result of the initial model and the pre-acquired image processing result is compared, the network parameters of the initial model are regulated according to the difference, and the training process is repeated until the training of the initial model is completed.
As shown in fig. 3b, the enhancement levels are divided into two types, weak data enhancement and strong data enhancement. The initial model includes a first processing network corresponding to the weak data enhancement and a second processing network corresponding to the strong data enhancement. The first processing network includes a first feature extraction network corresponding to a weak data enhancement, a second feature extraction network corresponding to a strong data enhancement, and a first output network corresponding to a weak data enhancement. The second processing network includes a second feature extraction network corresponding to the strong data enhancement and a second output network corresponding to the strong data enhancement.
Specifically, the weak data enhanced image is input into the first feature extraction network and the second feature extraction network at the same time, and then the first output network generates a first image processing result (such as an image category and the like) according to the features respectively extracted by the first feature extraction network and the second feature extraction network. The strong data enhanced image is input only to the second feature extraction network, and then a second image processing result (such as an image category and the like) is generated by the second output network according to the features extracted by the second feature extraction network.
It can be seen that the existing model training method directly inputs the enhanced images into the initial model for training without distinguishing enhancement levels, and the initial model processes different enhanced images identically. The model training method provided by this embodiment instead inputs each enhanced image, according to its enhancement level, into the processing network corresponding to that level for processing. That is, the existing model training method and the model training method proposed by this embodiment differ essentially in the feature extraction operation on the input image.
Step 203, inputting the enhanced image in the enhanced image set into the initial model, taking the image processing result corresponding to the input enhanced image set as an expected output result, and training the initial model by using a preset loss function.
In this embodiment, an enhanced image set may be selected from the acquired at least one enhanced image set, and the enhanced images in the selected set are input into the initial model to obtain the image processing result generated by the initial model. Then, a loss value is calculated with a preset loss function by comparing the image processing result generated by the initial model with the pre-acquired image processing result, and the parameters of the initial model are adjusted according to the loss value using algorithms such as back propagation and gradient descent. It is then determined whether training of the adjusted initial model is complete; if not, enhanced image sets continue to be selected and input into the adjusted initial model until training of the initial model is completed.
The loss function can be flexibly set by technicians in advance according to actual application requirements. For example, the loss function may characterize the sum of differences between the image processing results of each processing network comprised by the processing model and the desired output results, respectively, at which time the parameters of the initial model are adjusted by minimizing the loss function.
Alternatively, the loss function may comprise a level loss function. The level loss function may be used to characterize the similarity between the output results of the feature extraction networks included in the processing network corresponding to each enhancement level. At this time, the feature extraction network corresponding to the lowest enhancement level can be made to concentrate on extracting the features of the enhanced image of the lowest enhancement level by minimizing the level loss function (i.e., making the difference between the output results of the feature extraction networks as large as possible), while the feature extraction network corresponding to the higher enhancement level can be made to concentrate on extracting the common features of the enhanced images respectively corresponding to the enhancement levels above the enhancement level.
Alternatively, the penalty function may comprise a penalty function corresponding to each enhancement level processing network. The design of the loss function of the processing network corresponding to different enhancement levels can be the same or different. For example, a loss function of the processing network corresponding to each enhancement level may be used to characterize the difference between the image processing results actually output by the processing network during the training process and the corresponding desired output results.
Based on the above description, the loss function may also be designed as a sum of the class loss function and the loss function corresponding to each processing network of each enhancement class, so as to comprehensively control the adjustment of the network parameters trained by the model from multiple aspects.
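Under the same illustrative assumptions as the model sketch above, one possible training step combining the per-network losses with a level loss term is shown below. Measuring the similarity between the two feature extraction results of the weak-level image with a KL divergence over softmax-normalized, size-matched feature vectors is one concrete choice assumed here, not the only possible realization:

```python
import torch.nn.functional as F

def training_step(model, optimizer, weak_img, strong_img, label, lam=0.5):
    logits_w, logits_s, f_w, f_sw = model(weak_img, strong_img)

    # Loss of each processing network: difference between its actual
    # output and the expected output result.
    loss_weak = F.cross_entropy(logits_w, label)
    loss_strong = F.cross_entropy(logits_s, label)

    # Level loss: relates the outputs of the two feature extraction
    # networks for the same weak-level image. Here: KL divergence over
    # softmax-normalized features (an illustrative, assumed choice).
    # f_w has N - M dims and f_sw has M dims, so f_w is first pooled
    # down to a matching size.
    f_w_red = F.adaptive_avg_pool1d(f_w.unsqueeze(1), f_sw.size(1)).squeeze(1)
    level_loss = F.kl_div(
        F.log_softmax(f_w_red, dim=1),
        F.softmax(f_sw, dim=1),
        reduction="batchmean",
    )

    loss = loss_weak + loss_strong + lam * level_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```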
As an example, reference is continued to fig. 3c and 3d. FIG. 3c shows a schematic diagram of a feature extraction operation of an initial model employed by a prior model training method. Fig. 3d is a schematic diagram of the feature extraction operation of the initial model in the model training method according to the present embodiment.
In particular, as shown in FIG. 3c, $\phi$ represents the input image, and $\mathcal{F}^{t-1}$ and $\mathcal{F}^{t}$ represent the convolution operations of the $(t-1)$-th and $t$-th network layers of the feature extraction network. $h_t$ represents the height of the convolution kernel, $w_t$ represents the width of the convolution kernel, and $n_{t-1}$ and $n_t$ represent the numbers of channels of the feature extraction results output by the $(t-1)$-th and $t$-th convolution layers after convolution processing.

At this time, taking an initial model for image classification as an example, the loss function of the initial model can be expressed as follows:

$$\mathcal{L} = \mathcal{L}_{CE}\left( f\left( \phi_i;\ \{W_t, b_t\}_{t=1}^{N} \right),\ l_i \right)$$

wherein $\mathcal{L}$ represents the loss function and $\mathcal{L}_{CE}$ represents the cross-entropy loss function, $\phi_i$ represents the input image of the $i$-th convolution layer, $N$ represents the total number of convolution layers, $l_i$ represents the category label of the input image, and $W_t$ and $b_t$ represent the network parameters to be learned.

As shown in FIG. 3d, $\phi$ represents the input enhanced image corresponding to the weak enhancement level, and $\tilde{\phi}$ represents the input enhanced image corresponding to the strong enhancement level. $\mathcal{F}^{t-1}$ and $\mathcal{F}^{t}$ represent the $(t-1)$-th and $t$-th convolution layers, respectively. $\mathcal{F}^{t}_{w}$ represents the convolution operation of the feature extraction network corresponding to the weak enhancement level on the enhanced image corresponding to the weak enhancement level, $\mathcal{F}^{t}_{s \to w}$ represents the convolution operation of the feature extraction network corresponding to the strong enhancement level on the enhanced image corresponding to the weak enhancement level, and $\mathcal{F}^{t}_{s}$ represents the convolution operation of the feature extraction network corresponding to the strong enhancement level on the enhanced image corresponding to the strong enhancement level. $m_{t-1}$ and $m_t$ respectively represent the numbers of channels input and output by the convolution layer when processing the enhanced image corresponding to the strong enhancement level, and $n_{t-1}$ and $n_t$ respectively represent the numbers of channels input and output by the convolution layer when processing the enhanced image corresponding to the weak enhancement level, wherein $n_t$ is greater than $m_t$.

At this time, the specific procedure of the convolution operation on the enhanced image corresponding to the input weak enhancement level is as follows:

$$\phi^{t} = \mathcal{F}^{t}_{w}\left( \left[ \phi^{t-1}_{w},\ \phi^{t-1}_{s \to w} \right];\ W_t,\ b_t \right)$$

and the specific procedure of the convolution operation on the enhanced image corresponding to the input strong enhancement level is as follows:

$$\tilde{\phi}^{t} = \mathcal{F}^{t}_{s}\left( \tilde{\phi}^{t-1};\ \widetilde{W}_t,\ \tilde{b}_t \right)$$

wherein $[\,\cdot\,,\,\cdot\,]$ represents the splicing operation, and $W$ and $b$ represent the network parameters to be learned.

Taking an initial model for image classification as an example, the loss function of the initial model can be expressed as follows:

$$\mathcal{L} = \mathcal{L}_{CE}\left( f_{\phi}(\phi),\ l \right) + \mathcal{L}_{CE}\left( f_{\tilde{\phi}}(\tilde{\phi}),\ l \right) + \lambda\, S, \qquad S = \left\langle \phi^{N}_{w},\ \phi^{N}_{s \to w} \right\rangle$$

wherein $f_{\phi}$ and $f_{\tilde{\phi}}$ respectively represent the classifiers corresponding to the weak enhancement level and the strong enhancement level, $\langle\,\cdot\,,\,\cdot\,\rangle$ represents the Kullback-Leibler divergence (KL divergence), also known as relative entropy or information divergence, $S$ represents the level loss function used to characterize the similarity of the feature extraction results generated by the convolution processing of the enhanced image corresponding to the weak enhancement level through the different feature extraction networks, and $\lambda$ is an adjustment parameter with a value between 0 and 1.
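One way to realize the shared kernels implied by $n_t > m_t$ above, where the strong-level convolution reuses a sub-tensor of the weak-level kernel bank, is sketched below; taking the first $m_t$ output channels and the first $m_{t-1}$ input channels as the shared slice is an assumption of this sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedPathwayConv(nn.Module):
    """Illustrative layer: one kernel bank of n_out x n_in filters; the
    weak pathway uses all of it, the strong pathway only the first
    m_out x m_in slice, so the strong-level parameters are shared."""

    def __init__(self, n_in, n_out, m_in, m_out, k=3):
        super().__init__()
        assert n_out > m_out and n_in >= m_in
        self.m_in, self.m_out = m_in, m_out
        self.weight = nn.Parameter(torch.randn(n_out, n_in, k, k) * 0.01)
        self.bias = nn.Parameter(torch.zeros(n_out))

    def forward_weak(self, x):    # x: (B, n_in, H, W)
        return F.conv2d(x, self.weight, self.bias, padding=1)

    def forward_strong(self, x):  # x: (B, m_in, H, W)
        # Slicing the Parameter yields a view, so gradients flow into
        # the shared kernel bank from both pathways.
        return F.conv2d(x, self.weight[: self.m_out, : self.m_in],
                        self.bias[: self.m_out], padding=1)
```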
Fig. 3b and 3d are examples of two enhancement levels, and it should be noted that, the model training method provided in this embodiment may be extended to more than three enhancement levels according to actual requirements. As an example, fig. 4 shows still another schematic diagram of the network structure of the initial model in the model training method according to the present embodiment.
As shown in fig. 4. The enhancement levels are classified into three of level one, level two and level three, and level one is lower than level two, and level two is lower than level three. At this time, the initial model includes processing networks to which the three enhancement levels respectively correspond.
Specifically, the processing network corresponding to level one includes a first feature extraction network corresponding to level one, a second feature extraction network corresponding to level two, a third feature extraction network corresponding to level three, and a first output network corresponding to level one. The processing network corresponding to level two comprises the second feature extraction network corresponding to level two, the third feature extraction network corresponding to level three, and a second output network corresponding to level two. The processing network corresponding to level three includes the third feature extraction network corresponding to level three and a third output network corresponding to level three.
The enhanced images corresponding to the first level are simultaneously input into a first feature extraction network, a second feature extraction network and a third feature extraction network, and then a first output network generates a first image processing result (such as an image category and the like) according to the features respectively extracted by the first feature extraction network, the second feature extraction network and the third feature extraction network.
The enhanced images corresponding to the level two are simultaneously input into a second feature extraction network and a third feature extraction network, and then a second output network generates a second image processing result (such as an image category and the like) according to the features respectively extracted by the second feature extraction network and the third feature extraction network.
The enhanced image corresponding to the level three is only input to the third feature extraction network, and then the third output network generates a third image processing result (such as an image category and the like) according to the features extracted by the third feature extraction network.
At this time, the loss function may calculate the sum of the cross entropy loss function and the level loss function of the processing network to which the three enhancement levels respectively correspond. Wherein the level loss function may comprise a first level loss function and a second level loss function. The first level loss function may represent a degree of similarity between feature extraction results of the first, second, and third feature extraction networks included in the processing network corresponding to the level one, respectively, for the input level one enhanced image. The second level loss function may represent a similarity between feature extraction results of the second feature extraction network and the third feature extraction network included in the processing network corresponding to the level two, respectively, for the input level two enhanced image.
The parameters of the initial model can be adjusted by utilizing the loss function, so that the first feature extraction network corresponding to the first level can be focused on extracting the features of the enhanced image corresponding to the first level, the second feature extraction network corresponding to the second level can be focused on extracting the common features of the enhanced image corresponding to the first level and the enhanced image corresponding to the second level, and the third feature extraction network corresponding to the third level can be focused on extracting the common features of the enhanced image corresponding to the first level, the enhanced image corresponding to the second level and the enhanced image corresponding to the third level, thereby being beneficial to improving the sensitivity of the initial model to the features which are included in the enhanced image with the stronger enhanced level and are beneficial to the image processing task, and improving the robustness to the noise included in the enhanced image with the stronger enhanced level.
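The same pattern generalizes to an arbitrary number of levels: the level-i input passes through the feature extraction networks of every level not lower than i, and the level-i output network consumes the spliced features. A schematic forward pass (module lists and head sizes are assumptions of this sketch) might read:

```python
import torch
import torch.nn as nn

class MultiLevelInitialModel(nn.Module):
    """Sketch of an initial model with K enhancement levels: the processing
    network of level i uses the feature extraction networks of levels i..K-1."""

    def __init__(self, extractors, heads):
        # extractors[j]: feature extraction network of level j (shared
        # instances, each assumed to return a flattened (B, C_j) feature);
        # heads[i]: output network of level i, sized for the spliced features.
        super().__init__()
        self.extractors = nn.ModuleList(extractors)
        self.heads = nn.ModuleList(heads)

    def forward(self, images_by_level):
        # images_by_level[i]: batch of enhanced images of level i
        # (level 0 = lowest enhancement level).
        outputs = []
        for i, image in enumerate(images_by_level):
            # Splice features from every level not lower than i.
            feats = [self.extractors[j](image)
                     for j in range(i, len(self.extractors))]
            outputs.append(self.heads[i](torch.cat(feats, dim=1)))
        return outputs
```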
According to the model training method provided by the embodiments of the present disclosure, different data enhancement methods are divided into enhancement levels, and in the training process the enhanced images corresponding to different enhancement levels are processed differently. This fully accounts for the dependency relationships among the enhanced images formed by data enhancement methods of different levels, while a processing path independent of the enhanced images of relatively lower enhancement levels is set for the enhanced images of higher enhancement levels, so that the features contained in the higher-level enhanced images that are favorable for the image processing task are learned, and the influence on model training of the features contained in the higher-level enhanced images that are unfavorable for the image processing task is reduced.
In addition, in the existing model training method, the initial model processes enhanced images formed by data enhancement methods of different enhancement levels without distinction, and the influence of different data enhancement methods on model training is unstable: some data enhancement methods may improve image processing models of certain network structures but negatively affect image processing models of other network structures.
For this situation, meta-learning or search based methods have been proposed that aim to automatically match the optimal data enhancement method for an image processing model of a given network structure or for image processing tasks on a given data set, but executing these methods typically requires a large amount of computational resources.
In contrast to such prior-art methods for automatically matching data enhancement to an image processing model or task, the model training method provided by the foregoing embodiments of the present disclosure adapts to various data enhancement methods through the design of the network structure of the initial model, avoiding the large consumption of computing resources incurred by automatic matching and thus reducing the model training cost.
With further reference to fig. 5, a flow 500 of yet another embodiment of an image processing method is shown. The flow 500 of the image processing method comprises the steps of:
Step 501, an image to be processed is acquired.
In this embodiment, the image to be processed may be an arbitrary image. The subject of execution of the image processing method may obtain the image to be processed from a local or other data source.
Note that, the execution subject of the image processing method may be the same as or different from the execution subject of the model training method described in the embodiment corresponding to fig. 2.
Step 502, inputting an image to be processed into an image processing model to obtain an image processing result.
In this embodiment, the execution subject of the image processing method may input the image to be processed into the image processing model obtained by training in advance, to obtain the image processing result. The image processing result corresponds to an image processing task to which the image processing model corresponds. For example, if the image processing model is used for image classification, the image processing result is used to represent the category to which the image to be processed belongs.
The image processing model may be the processing network corresponding to the lowest enhancement level in the initial model trained by the model training method described in the embodiment corresponding to fig. 2. As an example, if the initial model includes a processing network corresponding to a weak enhancement level and a processing network corresponding to a strong enhancement level, then after the initial model training is completed, the processing network corresponding to the weak enhancement level in the trained initial model may be determined as the image processing model and used for subsequent image processing. The processing network corresponding to the weak enhancement level in the trained initial model comprises the feature extraction network corresponding to the weak enhancement level, the feature extraction network corresponding to the strong enhancement level, and the output network corresponding to the weak enhancement level.
For specific training process of the initial model, reference may be made to the related description in the corresponding embodiment of fig. 2, which is not repeated here.
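In code terms, retaining only the lowest-level processing network of the trained initial model for deployment could look like the following sketch, reusing the hypothetical TwoLevelInitialModel from the training section above:

```python
import torch
import torch.nn as nn

class DeployedImageProcessor(nn.Module):
    """Sketch: wraps only the weak-level processing network of a trained
    TwoLevelInitialModel, i.e. both feature extraction networks plus the
    weak-level output network."""

    def __init__(self, trained_model):
        super().__init__()
        self.extract_weak = trained_model.extract_weak
        self.extract_strong = trained_model.extract_strong
        self.output_weak = trained_model.output_weak

    @torch.no_grad()
    def forward(self, image):
        feats = torch.cat(
            [self.extract_weak(image), self.extract_strong(image)], dim=1)
        return self.output_weak(feats)  # e.g. class logits
```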
The image processing method provided by the embodiment of the present disclosure uses the processing network with the lowest enhancement level after training as an image processing model for subsequent image processing, thereby improving the image processing efficiency. In addition, the processing network of the lowest enhancement level comprises a feature extraction network corresponding to each enhancement level and an output network corresponding to the lowest enhancement level, so that the features which are favorable for image processing in the image to be processed can be extracted from the angles of different enhancement levels by utilizing the processing network of the lowest enhancement level to perform image processing, and the robustness of the image processing result can be improved.
With further reference to fig. 6, as an implementation of the method illustrated in the above figures, the present disclosure provides an embodiment of a model training apparatus, which corresponds to the method embodiment illustrated in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 6, the model training apparatus 600 provided in the present embodiment includes an enhanced image set acquisition unit 601, a model acquisition unit 602, and a training unit 603. The enhanced image set obtaining unit 601 is configured to obtain enhanced image sets corresponding to at least one original image respectively, and obtain an image processing result corresponding to each enhanced image set, where each enhanced image set includes enhanced images of at least two enhancement levels; the model obtaining unit 602 is configured to obtain an initial model, where the initial model includes processing networks respectively corresponding to different enhancement levels, the processing network corresponding to each enhancement level includes an output network and a feature extraction network respectively corresponding to each enhancement level above the enhancement level, and the output network is configured to generate an image processing result according to a feature extraction result output by each feature extraction network; the training unit 603 is configured to input the enhanced images in the enhanced image set to the initial model, and train the initial model with a preset loss function with the image processing results corresponding to the input enhanced image set as desired output results.
In the present embodiment, in the model training apparatus 600: the specific processes of the enhanced image set acquiring unit 601, the model acquiring unit 602, and the training unit 603 and the technical effects thereof may refer to the descriptions related to step 201, step 202, and step 203 in the corresponding embodiment of fig. 2, and are not described herein.
In some alternative implementations of the present embodiment, the feature extraction networks in the initial model that correspond to the same enhancement level share network parameters.
In some optional implementations of this embodiment, the scales of the feature extraction results output by the feature extraction networks included in the processing networks corresponding to different enhancement levels are different.
In some optional implementations of this embodiment, the loss function includes a level loss function, where the level loss function is used to characterize a similarity between output results of feature extraction networks included in the processing network corresponding to each enhancement level.
In some alternative implementations of this embodiment, the loss function includes a loss function corresponding to each enhancement level processing network.
The model training device provided by the embodiment of the present disclosure obtains, through an enhanced image set obtaining unit, enhanced image sets corresponding to at least one original image respectively, and obtains an image processing result corresponding to each enhanced image set, where each enhanced image set includes enhanced images of at least two enhancement levels; the method comprises the steps that a model acquisition unit acquires an initial model, wherein the initial model comprises processing networks respectively corresponding to different enhancement levels, the processing network corresponding to each enhancement level comprises an output network and a feature extraction network respectively corresponding to each enhancement level above the enhancement level, and the output network is used for generating an image processing result according to the feature extraction result output by each feature extraction network; the training unit inputs the enhanced images in the enhanced image set to the initial model, takes the image processing results corresponding to the input enhanced image set as expected output results, trains the initial model by using a preset loss function, realizes the enhancement level division of different data enhancement methods, and respectively carries out different processing on the enhanced images corresponding to different enhancement levels in the training process so as to ensure that the features which are contained in the enhanced images with higher enhancement levels and are favorable for the image processing tasks are learned, and reduce the influence of the features which are contained in the enhanced images with higher enhancement levels and are unfavorable for the image processing tasks on the model training.
With further reference to fig. 7, as an implementation of the method shown in the foregoing figures, the present disclosure provides an embodiment of an image processing apparatus, which corresponds to the method embodiment shown in fig. 5, and which is particularly applicable to various electronic devices.
As shown in fig. 7, the image processing apparatus 700 provided in the present embodiment includes a to-be-processed image acquisition unit 701 and a processing unit 702. Wherein the image to be processed acquisition unit 701 is configured to acquire an image to be processed; the processing unit 702 is configured to input the image to be processed into an image processing model, where the image processing model is a processing network corresponding to the lowest enhancement level in the trained initial model, and the initial model is trained by the method described in the embodiment of fig. 2.
In the present embodiment, in the image processing apparatus 700: the specific processing of the to-be-processed image obtaining unit 701 and the processing unit 702 and the technical effects thereof may refer to the related descriptions of step 501 and step 502 in the corresponding embodiment of fig. 5, and are not described herein again.
The image processing device provided by the embodiment of the present disclosure acquires an image to be processed through an image to be processed acquisition unit; the processing unit inputs the image to be processed into the image processing model to obtain an image processing result, wherein the image processing model is a processing network corresponding to the lowest enhancement level in the initial model after training is completed, so that the characteristics of the image to be processed, which are beneficial to image processing, can be extracted from the angles of different enhancement levels, and the robustness of the image processing result can be improved.
Referring now to fig. 8, a schematic diagram of an electronic device (e.g., server in fig. 1) 800 suitable for use in implementing embodiments of the present disclosure is shown. The server illustrated in fig. 8 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present disclosure in any way.
As shown in fig. 8, the electronic device 800 may include a processing means (e.g., a central processor, a graphics processor, etc.) 801, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage means 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the electronic device 800 are also stored. The processing device 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
In general, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 807 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, etc.; storage 808 including, for example, magnetic tape, hard disk, etc.; communication means 809. The communication means 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 shows an electronic device 800 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 8 may represent one device or a plurality of devices as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via communication device 809, or installed from storage device 808, or installed from ROM 802. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 801.
It should be noted that the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the embodiments of the present disclosure, by contrast, a computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: an electrical wire, an optical fiber cable, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer-readable medium may be contained in the electronic device, or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire enhanced image sets respectively corresponding to at least one original image, and acquire an image processing result corresponding to each enhanced image set, wherein each enhanced image set comprises enhanced images of at least two enhancement levels; acquire an initial model, wherein the initial model comprises processing networks respectively corresponding to different enhancement levels, the processing network corresponding to each enhancement level comprises an output network and a feature extraction network respectively corresponding to each enhancement level above the enhancement level, and the output network is used for generating an image processing result according to the feature extraction results output by the feature extraction networks; and input the enhanced images in the enhanced image set into the initial model, take the image processing result corresponding to the input enhanced image set as an expected output result, and train the initial model by using a preset loss function.
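For illustration only (not part of the patent text), the following Python/PyTorch sketch gives one simplified reading of the procedure just summarized: the initial model holds one processing network per enhancement level, the processing network for level k holds a feature extraction network for each enhancement level at or above k plus an output network, and training treats the image processing result of the input enhanced image set as the expected output for every level. Image classification and cross-entropy are assumptions standing in for the unspecified image processing task and preset loss function; all names are hypothetical.

import torch
import torch.nn as nn

class ProcessingNetwork(nn.Module):
    # Processing network for one enhancement level: one feature extraction
    # network per enhancement level at or above `level`, plus an output
    # network that fuses the feature extraction results into a prediction.
    def __init__(self, level, num_levels, num_classes, feat_dim=64):
        super().__init__()
        self.extractors = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(3, feat_dim, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
            )
            for _ in range(level, num_levels)
        )
        self.output_network = nn.Linear(feat_dim * len(self.extractors), num_classes)

    def forward(self, x):
        features = [extractor(x) for extractor in self.extractors]
        return self.output_network(torch.cat(features, dim=1))

class InitialModel(nn.Module):
    # Initial model: processing networks respectively corresponding to the
    # different enhancement levels.
    def __init__(self, num_levels, num_classes):
        super().__init__()
        self.processing_networks = nn.ModuleList(
            ProcessingNetwork(k, num_levels, num_classes) for k in range(num_levels)
        )

    def forward(self, enhanced_images):
        # enhanced_images[k] is a batch of enhanced images at level k.
        return [net(x) for net, x in zip(self.processing_networks, enhanced_images)]

def train_step(model, optimizer, enhanced_images, labels):
    # One training step: the image processing result corresponding to the
    # input enhanced image set serves as the expected output for every
    # enhancement level; the per-level losses are summed.
    criterion = nn.CrossEntropyLoss()  # assumed stand-in for the preset loss
    optimizer.zero_grad()
    outputs = model(enhanced_images)
    loss = sum(criterion(out, labels) for out in outputs)
    loss.backward()
    optimizer.step()
    return loss.item()

After training, processing_networks[0] (the lowest enhancement level, which by construction contains a feature extraction network for every level) would be retained as the image processing model, matching the apparatus sketch given earlier. A level loss term in the sense of claim 4 below, penalizing dissimilarity between the outputs of the feature extraction networks across processing networks, could be added to the summed loss; it is omitted here for brevity.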
Computer program code for carrying out operations of the embodiments of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments described in the present disclosure may be implemented by means of software or by means of hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor comprising an enhanced image set acquisition unit, a model acquisition unit, and a training unit. In some cases, the names of these units do not constitute a limitation on the units themselves; for example, the enhanced image set acquisition unit may also be described as "a unit that acquires enhanced image sets respectively corresponding to at least one original image, and acquires an image processing result corresponding to each enhanced image set, wherein each enhanced image set includes enhanced images of at least two enhancement levels".
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by substituting the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims (10)

1. A model training method, comprising:
acquiring enhanced image sets respectively corresponding to at least one original image, and acquiring an image processing result corresponding to each enhanced image set, wherein each enhanced image set comprises enhanced images of at least two enhancement levels;
acquiring an initial model, wherein the initial model comprises processing networks respectively corresponding to different enhancement levels, the processing network corresponding to each enhancement level comprises an output network and a feature extraction network respectively corresponding to each enhancement level above the enhancement level, and the output network is used for generating an image processing result according to the feature extraction result output by each feature extraction network;
and inputting the enhanced images in the enhanced image set into the initial model, taking the image processing result corresponding to the input enhanced image set as an expected output result, and training the initial model by using a preset loss function.
2. The method of claim 1, wherein the feature extraction results output by the feature extraction networks included in the processing networks corresponding to different enhancement levels are of different scales.
3. The method of claim 2, wherein the feature extraction results output by the feature extraction networks corresponding to the same enhancement level in the initial model are of different scales.
4. The method according to any one of claims 1-3, wherein the loss function comprises a level loss function, and the level loss function is used to characterize the similarity between the output results of the feature extraction networks included in the processing network corresponding to each enhancement level.
5. The method according to any one of claims 1-3, wherein the loss function comprises a loss function respectively corresponding to the processing network of each enhancement level.
6. An image processing method, comprising:
acquiring an image to be processed;
inputting the image to be processed into an image processing model to obtain an image processing result, wherein the image processing model is a processing network corresponding to the lowest enhancement level in a trained initial model, and the initial model is trained using the method according to any one of claims 1-5.
7. A model training apparatus comprising:
an enhanced image set obtaining unit configured to obtain enhanced image sets corresponding to at least one original image respectively, and obtain an image processing result corresponding to each enhanced image set, wherein each enhanced image set includes enhanced images of at least two enhancement levels;
a model acquisition unit configured to acquire an initial model, wherein the initial model includes processing networks respectively corresponding to different enhancement levels, the processing network corresponding to each enhancement level includes an output network and a feature extraction network respectively corresponding to each enhancement level above the enhancement level, and the output network is used for generating an image processing result according to the feature extraction result output by each feature extraction network;
and a training unit configured to input the enhanced images in the enhanced image set into the initial model, take the image processing result corresponding to the input enhanced image set as an expected output result, and train the initial model by using a preset loss function.
8. An image processing apparatus comprising:
a to-be-processed image acquisition unit configured to acquire a to-be-processed image;
a processing unit configured to input the image to be processed into an image processing model to obtain an image processing result, wherein the image processing model is a processing network corresponding to the lowest enhancement level in a trained initial model, and the initial model is trained using the method according to any one of claims 1-5.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-6.
10. A computer readable medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-6.
CN202110848100.XA 2021-07-27 2021-07-27 Model training method and device Active CN113505848B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110848100.XA CN113505848B (en) 2021-07-27 2021-07-27 Model training method and device
PCT/CN2022/094879 WO2023005386A1 (en) 2021-07-27 2022-05-25 Model training method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110848100.XA CN113505848B (en) 2021-07-27 2021-07-27 Model training method and device

Publications (2)

Publication Number Publication Date
CN113505848A CN113505848A (en) 2021-10-15
CN113505848B true CN113505848B (en) 2023-09-26

Family

ID=78014069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110848100.XA Active CN113505848B (en) 2021-07-27 2021-07-27 Model training method and device

Country Status (2)

Country Link
CN (1) CN113505848B (en)
WO (1) WO2023005386A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505848B (en) * 2021-07-27 2023-09-26 京东科技控股股份有限公司 Model training method and device
CN114511562B (en) * 2022-04-19 2022-07-15 深圳市疾病预防控制中心(深圳市卫生检验中心、深圳市预防医学研究所) Big data-based chronic obstructive pneumonia risk prediction system, method and equipment
CN116503614A (en) * 2023-04-27 2023-07-28 杭州食方科技有限公司 Dinner plate shape feature extraction network training method and dinner plate shape information generation method
CN116385813B (en) * 2023-06-07 2023-08-29 南京隼眼电子科技有限公司 ISAR image space target classification method, device and storage medium based on unsupervised contrast learning
CN117058555A (en) * 2023-06-29 2023-11-14 北京空间飞行器总体设计部 Method and device for hierarchical management of remote sensing satellite images

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7933454B2 (en) * 2007-06-25 2011-04-26 Xerox Corporation Class-based image enhancement system
US10841514B2 (en) * 2018-03-20 2020-11-17 Endovigilant Inc Endoscopy video feature enhancement platform
CN109859113B (en) * 2018-12-25 2021-08-20 北京奇艺世纪科技有限公司 Model generation method, image enhancement method, device and computer-readable storage medium
CN109934776B (en) * 2018-12-25 2021-05-25 北京奇艺世纪科技有限公司 Model generation method, video enhancement method, device and computer-readable storage medium
CN109919869B (en) * 2019-02-28 2021-06-04 腾讯科技(深圳)有限公司 Image enhancement method and device and storage medium
CN112435198A (en) * 2020-12-03 2021-03-02 西安交通大学 Welding seam radiographic inspection negative image enhancement method, storage medium and equipment
CN112801918A (en) * 2021-03-11 2021-05-14 苏州科达科技股份有限公司 Training method of image enhancement model, image enhancement method and electronic equipment
CN113159300B (en) * 2021-05-15 2024-02-27 南京逸智网络空间技术创新研究院有限公司 Image detection neural network model, training method thereof and image detection method
CN113505848B (en) * 2021-07-27 2023-09-26 京东科技控股股份有限公司 Model training method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876745A (en) * 2018-06-27 2018-11-23 厦门美图之家科技有限公司 Image processing method and device
CN109859152A (en) * 2018-12-25 2019-06-07 北京奇艺世纪科技有限公司 Model generating method, image enchancing method, device and computer readable storage medium
CN109872276A (en) * 2019-01-29 2019-06-11 北京字节跳动网络技术有限公司 Method and apparatus for generating image super-resolution model
WO2020239196A1 (en) * 2019-05-27 2020-12-03 Toyota Motor Europe System and method for training a generative adversarial model generating image samples of different brightness levels
WO2021102655A1 (en) * 2019-11-25 2021-06-03 深圳市欢太科技有限公司 Network model training method, image property recognition method and apparatus, and electronic device
CN112257738A (en) * 2020-07-31 2021-01-22 北京京东尚科信息技术有限公司 Training method and device of machine learning model and classification method and device of image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Aquatic animal classification method based on DCGAN data augmentation; Wang Dexing; Qin Enqian; Yuan Hongchun; Fishery Modernization (Issue 06); 68-75 *

Also Published As

Publication number Publication date
WO2023005386A1 (en) 2023-02-02
CN113505848A (en) 2021-10-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant