CN110222705B - Training method of network model and related device

Info

Publication number: CN110222705B (granted); published earlier as CN110222705A
Application number: CN201910331313.8A
Authority: CN (China)
Prior art keywords: network, converted image, image, generation, discrimination
Legal status: Active (granted)
Inventors: 陈汉亭, 舒晗, 王云鹤, 许春景
Current and original assignee: Huawei Technologies Co Ltd
Original language: Chinese (zh)


Classifications

    • G06F18/214: Pattern recognition; Analysing; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22: Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • Y02T10/40: Climate change mitigation technologies related to transportation; Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a training method for a network model, applicable to a server that trains the model: the server acquires a first generation network and a second generation network, and iteratively trains the first generation network using a second converted image output by the second generation network.

Description

Training method of network model and related device
Technical Field
The application relates to the field of artificial intelligence, in particular to a training method of a network model and a related device.
Background
With the development of deep learning technology, generative adversarial networks have been successfully applied in the field of image processing, especially in scenarios such as image style migration and portrait rendering on mobile terminals, and they have enormous application prospects. Specifically, a generative adversarial network includes a generation network and a discrimination network. After a generative adversarial network is initialized, the generation network and the discrimination network can be trained against each other iteratively, so that a mature generation network is obtained, and a terminal device then uses the trained generation network to perform image processing.
However, a generation network in the prior art occupies a large amount of memory and typically requires a large amount of computation, while a terminal device often lacks a processor with good computing performance and a large cache, so the existing generation network is difficult to run on the terminal device. To improve compatibility between the generation network and the terminal device, the existing generation network needs to be compressed to a smaller size and its computation amount needs to be reduced.
However, existing compression and acceleration algorithms are designed for classification, detection, and similar types of neural networks, and a generation network differs greatly from networks of those types. A compression and acceleration scheme dedicated to generation networks therefore needs to be proposed.
Disclosure of Invention
The embodiments of the application provide a training method for a network model and a related device, which compress and accelerate a generation network by reducing its parameter amount, train a small generation network to be trained by using a large network on which training has already been performed, and ensure that the compressed generation network can still output images of high quality.
In a first aspect, an embodiment of the present application provides a method for training a network model. The server may initialize a first generative adversarial network and acquire the first generation network from it, and may acquire the second generation network from a second generative adversarial network, where the first generation network is a small network to be trained, the second generation network is a large network on which training has been performed, and the parameter amount of the first generation network is smaller than the parameter amount of the second generation network; for example, the parameter amount of the first generation network may be one fourth, one third, or one sixth of the parameter amount of the second generation network. The image to be converted is input into the first generation network to obtain a first converted image, and the image to be converted is input into the second generation network to obtain a second converted image. The first generation network may then be iteratively trained according to the first converted image and the second converted image until the similarity between the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold, and a third generation network is the network obtained after the first generation network has undergone this iterative training.
In the application, after the first generation network and the second generation network are acquired, the image to be converted is input into the first generation network to obtain a first converted image, the image to be converted is input into the second generation network to obtain a second converted image, and the first generation network is iteratively trained according to the two converted images until the similarity between the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold; the third generation network obtained after this iterative training is then acquired. Because the parameter amount of the first generation network (and hence of the third generation network) is smaller than that of the second generation network, the memory occupied by the third generation network is reduced and its computation amount is reduced, which realizes compression and acceleration of the generation network. In addition, because the first generation network is a small network to be trained and the second generation network is a large trained network, iteratively training the first generation network with the images output by the second generation network until the similarity threshold is reached gives the trained third generation network good image processing capability; that is, the compressed generation network can still output images of high quality.
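For intuition, the overall procedure of the first aspect can be summarised as the following minimal training-loop sketch (assuming PyTorch; the generator architecture, channel widths, learning rate, and threshold value are hypothetical placeholders, not the patent's implementation):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in generator: any image-to-image network works, as long as
# the student (first generation network) has fewer parameters than the teacher.
def make_generator(channels: int) -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
        nn.Conv2d(channels, 3, 3, padding=1), nn.Tanh(),
    )

teacher = make_generator(64)   # second generation network (already trained, frozen)
student = make_generator(16)   # first generation network (smaller parameter amount)
teacher.eval()

optimizer = torch.optim.Adam(student.parameters(), lr=2e-4)
threshold = 0.01               # preset similarity threshold (hypothetical value)

for step in range(10_000):
    x = torch.rand(8, 3, 64, 64)            # images to be converted (dummy batch)
    with torch.no_grad():
        second_converted = teacher(x)       # second converted image
    first_converted = student(x)            # first converted image
    # Mean absolute pixel difference as a simple similarity measure; the
    # patent's actual loss functions are detailed later in the text.
    loss = (first_converted - second_converted).abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if loss.item() < threshold:             # similarity reached the threshold
        break
# After convergence, `student` is the "third generation network" of the patent.
```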
In one possible implementation, the server iteratively trains the first generation network according to the first converted image and the second converted image until the similarity between the first converted image output by the first generation network and the second converted image output by the second generation network reaches the preset threshold, which includes: the server may preset a first preset threshold for a first loss function, acquire a first image matrix of the first converted image and a second image matrix of the second converted image, compare each pixel point in the first image matrix with the corresponding pixel point in the second image matrix one by one, and iteratively train the first generation network using the first loss function. The higher the similarity between the first converted image and the second converted image, the smaller the output value of the first loss function; training continues until the similarity between the two converted images reaches the preset threshold, that is, until the output value of the first loss function is smaller than the first preset threshold.
In the application, the server can judge the similarity between the first converted image and the second converted image by using the first image matrix of the first converted image and the second image matrix of the second converted image, and then train the first generation network by backward iteration, so that the images output by the first generation network become more and more similar to the images output by the second generation network until the similarity reaches the preset threshold. This not only ensures the quality of the images output by the third generation network (namely, the first generation network after the training operation), but also relies on the first and second image matrices, which are easy to obtain, thereby improving the feasibility of the scheme.
In one possible implementation, the first loss function may be expressed in particular as:
L_{L1}(G_S) = \left\| G_T(x) - G_S(x) \right\|_1^2

where G_S(x) denotes the first image matrix (of the first converted image output by the first generation network), G_T(x) denotes the second image matrix (of the second converted image output by the second generation network), \| G_T(x) - G_S(x) \|_1 denotes a norm of the difference between the two image matrices, whose square is taken, and L_{L1}(G_S) denotes the first loss function.
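As a sketch of how this formula might be evaluated (assuming PyTorch tensors for the image matrices; whether the norm is additionally averaged over the n training images is left open by the text):

```python
import torch

def first_loss(first_image_matrix: torch.Tensor,
               second_image_matrix: torch.Tensor) -> torch.Tensor:
    """L_L1(G_S) = ||G_T(x) - G_S(x)||_1 ** 2: the one-norm (sum of absolute
    pixel differences) between the two image matrices, squared."""
    diff_norm = (second_image_matrix - first_image_matrix).abs().sum()
    return diff_norm ** 2
```

A call such as first_loss(student(x), teacher(x)) then yields the value that training drives below the first preset threshold.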
In one possible implementation, the server iteratively trains the first generation network according to the first converted image and the second converted image until the similarity between them reaches the preset threshold, which may include: the server acquires a first discrimination network and a second discrimination network, where the first discrimination network is a network to be trained, the second discrimination network is a network on which training has been performed, the first discrimination network and the first generation network belong to the same first generative adversarial network, and the second discrimination network and the second generation network belong to the same second generative adversarial network; the first converted image is input into the first discrimination network to obtain a first characteristic information set of the first converted image, the second converted image is input into the second discrimination network to obtain a second characteristic information set of the second converted image, and the first generation network is iteratively trained with a second loss function according to the first characteristic information set and the second characteristic information set until the similarity between the first converted image output by the first generation network and the second converted image output by the second generation network reaches the preset threshold. The first characteristic information set and the second characteristic information set may include attribute information of an object in the image, color information of the object, and texture information of the object.
In the application, because a higher similarity between the first converted image and the second converted image yields a smaller output value of the second loss function, the server can preset a second preset threshold for the second loss function and train the first generation network by backward iteration according to the output value of the second loss function until that output value is smaller than the second preset threshold, that is, until the similarity between the first converted image output by the first generation network and the second converted image output by the second generation network reaches the preset threshold. The first characteristic information set and the second characteristic information set are characteristic information obtained by analysing the images, so they can reflect a user's visual perception of the images; the images generated by the third generation network (namely, the first generation network after the training operation) trained with these sets are therefore more realistic in visual effect, the accuracy of the discrimination process is improved, and the original functions of the first and second discrimination networks are reused, which improves the practicability of the scheme and ensures the quality of the images output by the third generation network.
In one possible implementation, the second loss function is embodied as:
L_{prec}(G_S) = \left\| \tilde{D}_T(G_T(x)) - \tilde{D}_S(G_S(x)) \right\|_1^2

where D_T(G_T(x)) denotes the value obtained after the second converted image is input into the second discrimination network, \tilde{D}_T(G_T(x)) denotes the second characteristic information set obtained from D_T(G_T(x)) with the last layer removed, D_S(G_S(x)) denotes the value obtained after the first converted image is input into the first discrimination network, \tilde{D}_S(G_S(x)) denotes the first characteristic information set obtained from D_S(G_S(x)) with the last layer removed, \| \tilde{D}_T(G_T(x)) - \tilde{D}_S(G_S(x)) \|_1 denotes a norm of the difference between the second characteristic information set and the first characteristic information set, whose square is taken, and L_{prec}(G_S) denotes the second loss function.
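A sketch of this feature-matching loss, assuming both discriminators are built as nn.Sequential stacks (so dropping the last layer is a simple slice) and produce feature maps of matching shape:

```python
import torch
import torch.nn as nn

def characteristic_set(discriminator: nn.Sequential,
                       image: torch.Tensor) -> torch.Tensor:
    # The "characteristic information set": the discriminator's activations
    # with its final judgement layer removed.
    return discriminator[:-1](image)

def second_loss(d_teacher: nn.Sequential, d_student: nn.Sequential,
                second_converted: torch.Tensor,
                first_converted: torch.Tensor) -> torch.Tensor:
    second_set = characteristic_set(d_teacher, second_converted)  # D~_T(G_T(x))
    first_set = characteristic_set(d_student, first_converted)    # D~_S(G_S(x))
    return (second_set - first_set).abs().sum() ** 2
```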
In one possible implementation, the first generation network belongs to a first generative adversarial network that further includes a first discrimination network, and the first generation network and the first discrimination network are trained iteratively by being pitted against each other. The server iteratively training the first generation network may include: iteratively training the first generation network with a third loss function, which is specifically expressed as:
L(G_S) = L_{L1}(G_S) + L_{prec}(G_S) + L_{GAN}(G_S);
where L_{L1}(G_S) denotes the first loss function, L_{prec}(G_S) denotes the second loss function, L_{GAN}(G_S) denotes the loss function adopted by the first generation network during the adversarial training between the first generation network and the first discrimination network, and L(G_S) denotes the third loss function;
where the first loss function is specifically expressed as:

L_{L1}(G_S) = \left\| G_T(x) - G_S(x) \right\|_1^2

with G_S(x) denoting the first image matrix, G_T(x) denoting the second image matrix, \| G_T(x) - G_S(x) \|_1 denoting a norm of the difference between the two image matrices (squared in the loss), and L_{L1}(G_S) denoting the first loss function;
and the second loss function is specifically expressed as:

L_{prec}(G_S) = \left\| \tilde{D}_T(G_T(x)) - \tilde{D}_S(G_S(x)) \right\|_1^2

with D_T(G_T(x)) denoting the value obtained after the second converted image is input into the second discrimination network, \tilde{D}_T(G_T(x)) denoting the second characteristic information set obtained with the last layer of D_T removed, D_S(G_S(x)) denoting the value obtained after the first converted image is input into the first discrimination network, \tilde{D}_S(G_S(x)) denoting the first characteristic information set obtained with the last layer of D_S removed, \| \tilde{D}_T(G_T(x)) - \tilde{D}_S(G_S(x)) \|_1 denoting a norm of the difference between the second and first characteristic information sets (squared in the loss), and L_{prec}(G_S) denoting the second loss function.
In the application, the first generation network is iteratively trained with the first loss function, the second loss function, and the loss function adopted by the first generation network during the adversarial training between the first generation network and the first discrimination network. Through the adversarial training against the first discrimination network, the images output by the first generation network can at least confuse the first discrimination network, that is, they are judged as real images as far as possible. In addition, because the second generation network is a mature large network, its output can be regarded as images of good quality; iteratively training the first generation network with the second converted image output by the second generation network ensures that the images output by the third generation network (namely, the first generation network after the training operation) are similar to the images output by the second generation network both at each pixel point and at the characteristic points, so the final third generation network can also output images of good quality.
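Combining the pieces, the generator-side update might look as follows (a sketch reusing first_loss and second_loss from the snippets above, with L_GAN written in the "1 - D" form of formula (1) given later in this document):

```python
import torch

def generator_objective(student, teacher, d_student, d_teacher, x):
    """Third loss L(G_S) = L_L1(G_S) + L_prec(G_S) + L_GAN(G_S); equal
    weighting of the three terms, as written in the patent."""
    first_converted = student(x)
    with torch.no_grad():
        second_converted = teacher(x)        # teacher is fixed during this step
    l_l1 = first_loss(first_converted, second_converted)
    l_prec = second_loss(d_teacher, d_student, second_converted, first_converted)
    l_gan = (1.0 - d_student(first_converted)).mean()
    return l_l1 + l_prec + l_gan
```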
In one possible implementation, the training method of the network model provided by the application may further include: the server iteratively trains the first discrimination network using the second converted image, where the first discrimination network and the first generation network belong to the same first generative adversarial network.
In the application, because the first generation network and the first discrimination network can be pitted against each other to iteratively train both, after acquiring the first converted image the server can input it into the first discrimination network and iteratively train the first discrimination network using the loss function adopted by the first discrimination network during the adversarial training between the two networks. In addition, because the first generation network adopts fewer parameters, the images it generates are more easily judged as fake, so the server can also iteratively train the first discrimination network with the second converted image to help obtain a better-trained first discrimination network, which in turn, through adversarial training against the first generation network, produces a first generation network of better quality, ensuring that the finally obtained third generation network (namely, the first generation network after the training operation) can output images of good quality.
In one possible implementation, the server iteratively training the first discrimination network using the second converted image includes: the server inputs the second converted image into the first discrimination network to obtain a discrimination value output by the first discrimination network, and iteratively trains the first discrimination network with a fourth loss function according to that discrimination value.
In the application, because the second generation network is a mature large generation network on which training has been performed, the images it outputs can be regarded as real images, so the fourth loss function can be used to train the first discrimination network to judge the second converted image as a real image.
In one possible implementation, the fourth loss function is embodied as:
L_{GAN_T}(D_S) = 1 - D_S(G_T(x))

where D_S(G_T(x)) denotes the value obtained after the second converted image is input into the first discrimination network, and L_{GAN_T}(D_S) denotes the fourth loss function.
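In code, a sketch of this discriminator update term (the "1 - D" form mirrors formula (1) given later in the document; the function name is illustrative):

```python
import torch

def fourth_loss(d_student, second_converted: torch.Tensor) -> torch.Tensor:
    # Train D_S to judge the teacher's output (second converted image) as real:
    # minimising 1 - D_S(G_T(x)) pushes the discrimination value toward 1.
    return (1.0 - d_student(second_converted)).mean()
```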
In one possible implementation, the server iteratively training the first discrimination network using the second converted image may include: the server inputs a test image into the first discrimination network to obtain a third characteristic information set of the test image, where the test image is a real image of a style similar to that of the second converted image; inputs the second converted image into the first discrimination network to obtain a fourth characteristic information set of the second converted image; and inputs the first converted image into the first discrimination network to obtain the first characteristic information set of the first converted image. A first distance between the test image and the second converted image is determined using the third and fourth characteristic information sets, and a second distance between the test image and the first converted image is determined using the third and first characteristic information sets, where a distance here means expressing the difference between two images in a numeric form such as a value, a vector, or a matrix. A fifth loss function reflects the difference between the first distance and the second distance, and the server can iteratively train the first discrimination network with the fifth loss function until the difference between the first distance and the second distance reaches a preset threshold, that is, until the output value of the fifth loss function reaches a fourth preset threshold.
In the application, the first distance can reflect the difference, as judged by the first discrimination network, between the test image and the second converted image, and the second distance can reflect the difference between the test image and the first converted image, so together they embody the discrimination capability of the first discrimination network. Because the second converted image can be regarded as a real image, the first discrimination network should judge the first distance to be smaller than the second distance, with a certain gap between them. Iteratively training the first discrimination network in this way can further improve its discrimination capability, which in turn helps improve the capability of the first generation network and helps the third generation network (namely, the first generation network after the training operation) output images of good effect.
In one possible implementation, the fifth loss function is embodied as:
L_{tri}(D_S) = \max\left( \left\| \tilde{D}_S(y) - \tilde{D}_S(G_T(x)) \right\|_1 - \left\| \tilde{D}_S(y) - \tilde{D}_S(G_S(x)) \right\|_1 + \alpha,\ 0 \right)

where D_S(y) denotes the value obtained after the test image is input into the first discrimination network, \tilde{D}_S(y) denotes the third characteristic information set obtained from D_S(y) with the last layer removed, D_S(G_T(x)) denotes the value obtained after the second converted image is input into the first discrimination network, \tilde{D}_S(G_T(x)) denotes the fourth characteristic information set obtained with the last layer removed, \| \tilde{D}_S(y) - \tilde{D}_S(G_T(x)) \|_1 denotes a norm of the difference between the third and fourth characteristic information sets, D_S(G_S(x)) denotes the value obtained after the first converted image is input into the first discrimination network, \tilde{D}_S(G_S(x)) denotes the first characteristic information set obtained with the last layer removed, \| \tilde{D}_S(y) - \tilde{D}_S(G_S(x)) \|_1 denotes a norm of the difference between the third and first characteristic information sets, \alpha denotes a preset distance value between the two norms, and L_{tri}(D_S) denotes the fifth loss function.
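A sketch of this triplet-style objective, again assuming an nn.Sequential discriminator whose last layer can be sliced off; the hinge max(., 0) and the default margin value are conventional triplet-loss choices assumed here:

```python
import torch
import torch.nn as nn

def fifth_loss(d_student: nn.Sequential, test_image: torch.Tensor,
               second_converted: torch.Tensor, first_converted: torch.Tensor,
               alpha: float = 0.2) -> torch.Tensor:
    feats = lambda img: d_student[:-1](img).flatten(1)  # last layer removed
    third_set = feats(test_image)         # from the test image y
    fourth_set = feats(second_converted)  # from G_T(x), treated as real
    first_set = feats(first_converted)    # from G_S(x), the student output
    first_distance = (third_set - fourth_set).abs().sum(dim=1)
    second_distance = (third_set - first_set).abs().sum(dim=1)
    # Push the real image closer to the teacher output than to the student
    # output by at least the preset margin alpha.
    return torch.clamp(first_distance - second_distance + alpha, min=0).mean()
```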
In one possible implementation, the first generation network and the first discrimination network are pitted against each other to iteratively train both, and the server iteratively training the first discrimination network using the second converted image may include: the server iteratively trains the first discrimination network using the second converted image with a sixth loss function, which is specifically expressed as:
in the process, representing a fourth loss function, ">Represents a fifth loss function, L GAN (D S ) Representing a loss function adopted by the first discrimination network in the process of countermeasure training between the first generation network and the first discrimination network, L (D S ) Representing sixthA loss function;
where the fourth loss function is specifically expressed as:

L_{GAN_T}(D_S) = 1 - D_S(G_T(x))

with D_S(G_T(x)) denoting the value obtained after the second converted image is input into the first discrimination network and L_{GAN_T}(D_S) denoting the fourth loss function;
and the fifth loss function is specifically expressed as:

L_{tri}(D_S) = \max\left( \left\| \tilde{D}_S(y) - \tilde{D}_S(G_T(x)) \right\|_1 - \left\| \tilde{D}_S(y) - \tilde{D}_S(G_S(x)) \right\|_1 + \alpha,\ 0 \right)

with D_S(y) denoting the value obtained after the test image is input into the first discrimination network, \tilde{D}_S(y) the third characteristic information set obtained with the last layer removed, D_S(G_T(x)) the value obtained after the second converted image is input into the first discrimination network, \tilde{D}_S(G_T(x)) the fourth characteristic information set obtained with the last layer removed, D_S(G_S(x)) the value obtained after the first converted image is input into the first discrimination network, \tilde{D}_S(G_S(x)) the first characteristic information set obtained with the last layer removed, \alpha a preset distance value between the two norms, and L_{tri}(D_S) the fifth loss function.
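Putting the discriminator side together (a sketch reusing fourth_loss and fifth_loss from above; l_gan_d stands for whatever adversarial loss the chosen GAN type prescribes for its discriminator):

```python
import torch

def discriminator_objective(d_student, first_converted, second_converted,
                            test_image, l_gan_d: torch.Tensor) -> torch.Tensor:
    # Sixth loss: L(D_S) = L_GAN(D_S) + fourth loss + fifth loss.
    return (l_gan_d
            + fourth_loss(d_student, second_converted)
            + fifth_loss(d_student, test_image, second_converted, first_converted))
```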
In a second aspect, an embodiment of the present application provides a training apparatus for a network model. The apparatus includes an acquisition unit, an input unit, and a processing unit. The acquisition unit is configured to acquire a first generation network and a second generation network, where the first generation network is a network to be trained, the second generation network is a mature network on which training has been performed, and the parameter amount of the first generation network is smaller than that of the second generation network. The input unit is configured to input an image to be converted into the first generation network acquired by the acquisition unit to obtain a first converted image, and to input the image to be converted into the second generation network acquired by the acquisition unit to obtain a second converted image. The processing unit is configured to iteratively train the first generation network according to the first converted image and the second converted image obtained through the input unit until the similarity between the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold. The acquisition unit is further configured to acquire a third generation network, namely the network obtained after the processing unit iteratively trains the first generation network.
In the second aspect of the present application, the constituent modules of the training apparatus may also perform the steps described in the foregoing first aspect and its possible implementations; for details, see the foregoing description of the first aspect.
In a third aspect, an embodiment of the present application provides a server.
The server comprises: memory, transceiver, processor, and bus system; wherein the memory is used for storing programs; the processor is used for executing the program in the memory, and comprises the following steps:
acquiring a first generation network and a second generation network, where the first generation network is a network to be trained, the second generation network is a mature network on which training has been performed, and the parameter amount of the first generation network is smaller than that of the second generation network;
inputting the image to be converted into a first generation network to obtain a first converted image;
inputting the image to be converted into a second generation network to obtain a second converted image;
according to the first converted image and the second converted image, performing iterative training on the first generation network until the similarity of the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold;
acquiring a third generation network, where the third generation network is the network obtained after the first generation network is iteratively trained;
the bus system is used to connect the memory and the processor to communicate the memory and the processor.
In the third aspect of the present application, the constituent modules of the server may also perform the steps described in the foregoing first aspect and its possible implementations; for details, see the foregoing description of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having instructions stored therein, which when run on a computer or processor, cause the computer or processor to perform the method of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising instructions which, when run on a computer or processor, cause the computer or processor to perform the method of the first aspect described above.
For the advantageous effects of the second to fifth aspects of the present application, reference may be made to the first aspect.
Drawings
FIG. 1 is a schematic diagram of a training system for a network model according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a training method of a network model according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of the mutual adversarial training between the generation network and the discrimination network of a generative adversarial network according to an embodiment of the present application;
FIG. 4 is another flow chart of a training method of a network model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the conversion effects output by three different generation networks for the same image to be converted;
FIG. 6 is a schematic structural diagram of a training device for a network model according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The embodiments of the application provide a training method for a network model and a related device, which compress and accelerate a generation network by reducing its parameter amount, train a small generation network to be trained by using a large network on which training has been performed, and ensure that the compressed generation network can still output images of high quality. The terms "first," "second," "third," "fourth," and the like in the description, the claims, and the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequence or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments of the application described herein can be implemented, for example, in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "includes," and any variations thereof, are intended to cover a non-exclusive inclusion, so that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to the steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In order to better understand the technical solution in the embodiments of the present application and make the above objects, features and advantages of the embodiments of the present application more comprehensible, the technical solution in the embodiments of the present application is described in further detail below with reference to the accompanying drawings. Before the technical scheme of the embodiment of the application is described, an application scene of the embodiment of the application is described with reference to the attached drawings.
It should be understood that the training method of the network model provided by the embodiments of the application belongs to the field of artificial intelligence (AI) and is particularly suitable for the field of image processing. For example, it is applicable to image style migration: when a user takes or retouches a photo through a client, an image style can be set, and the client is required to convert the image acquired by the image acquisition device according to the set style to generate an image in that style. As another example, it is applicable to image beautification: while retouching a photo through a client, a user may perform operations such as lengthening legs, enlarging eyes, or heightening the nose bridge, and the client is required to convert the original image according to the modification operations input by the user to generate the corresponding image. As yet another example, it is applicable to road-scene generation for automatic driving: training an automatic driving model requires a large number of road-scene pictures, but actually collecting road scenes with vehicles in the relevant environments is very expensive, so a client may generate realistic road-scene pictures from synthetic scene pictures (for example, hand-drawn pictures) in place of really captured ones. Other application scenarios of the method are not listed one by one here.
Referring to fig. 1, fig. 1 is a network architecture diagram of a training system of a network model provided by an embodiment of the present application; the training system includes a server and a terminal device. In conjunction with the above description of the application environment, the training method provided by the embodiments may be applied not only to portrait processing scenarios but also to animal-image processing scenarios and to road-scene generation for automatic driving, so the terminal device shown in fig. 1 includes, but is not limited to, an unmanned vehicle, a notebook computer, a tablet computer, a palmtop computer, a personal computer, a mobile phone, and so on, and the server and the terminal device are connected in communication. It should be understood that the terminal device shown in fig. 1 is only for understanding the present application and does not limit it; the terminal device may also be embodied in other forms, which are not illustrated further here. It should be noted that a client is deployed on the terminal device; specifically, the client may be deployed on the terminal device as a browser or as an independent application, which is not limited here.
More specifically, the server is configured to generate the generative adversarial network that includes the generation network; the generative adversarial network includes a generation network and a discrimination network, and mutual adversarial training between the generation network and the discrimination network can be performed on the server, so that the two networks iteratively optimize each other and a mature, trained generation network is obtained. The meanings of the generation network and the discrimination network may be understood in connection with the prior art in the field and are not explained in detail here.
After obtaining the trained generation network, the server can send it to the client on the terminal device, so that the client can perform image conversion using the mature generation network. Because the cache capacity of the processor on the terminal device is small and the amount of computation the processor can support is limited, the server is required to generate a generation network of smaller size and smaller computation amount.
In the training method of the network model provided by the embodiment of the application, the server acquires a first generation network and a second generation network, where the first generation network is a small network to be trained, the second generation network is a large network on which training has been performed, and the parameter amount of the first generation network is smaller than that of the second generation network. The server can thus iteratively train the first generation network and the first discrimination network through the adversarial training between them, and can also iteratively train the first generation network using the second converted image generated by the second generation network together with the first converted image generated by the first generation network, thereby improving the quality of the images output by the first generation network. Furthermore, the server can iteratively train the first discrimination network using the second converted image generated by the second generation network, so that the quality of the images output by the first generation network is further improved through the adversarial training between the first generation network and the first discrimination network. Because the loss functions adopted in iteratively training the first generation network with the second converted image and in iteratively training the first discrimination network with it differ greatly, the two training processes are explained separately through different embodiments.
With reference to the foregoing description, the process of iteratively training the first generation network by using the second converted image in the training method of the network model provided by the present application is described first. Referring to fig. 2, one embodiment of the training method of the network model provided by an embodiment of the present application may include:
201. The server obtains a first generation network and a second generation network.
In this embodiment, because a generation network and a discrimination network exist in pairs, the server may acquire the second generation network from a mature second generative adversarial network on which training has been performed, where the second generation network is a large network that has completed training. The server may also initialize a small first generative adversarial network and acquire the first generation network from it, where the first generative adversarial network includes the first generation network and a first discrimination network, both of which are networks on which training has not yet been performed.
Because the first generation network and the second generation network are networks of the same type, the second generation network is a large mature network, and the parameter amount of the first generation network is smaller than that of the second generation network, in one implementation the parameters adopted in the first generation network may be determined according to the parameters adopted in the second generation network. Specifically, for example, the number of channels of the convolution kernels of the first generation network may be one fourth of that of the second generation network, that is, the number of parameters adopted in the first generation network is one fourth of the number adopted in the second generation network; this may be done by classifying the parameters adopted in the second generation network and selecting one fourth of the parameters of each type as the parameters of the first generation network. More specifically, the parameters selected for the first generation network may be the most important parameters in the second generation network, or parameters extracted arbitrarily from the second generation network, and so on; of course, the number of channels of the convolution kernels of the first generation network may also be one half, one sixth, or another fraction of that of the second generation network. In another implementation, the parameters adopted in the first generation network may not be determined according to those of the second generation network at all, that is, the two parameter sets may differ, as long as the parameter amount of the first generation network is smaller than that of the second generation network; how the parameters of the first generation network are determined is not limited here. Because the discrimination network does not need to be sent to the client, the parameter amount of the first discrimination network may be equal to that of the second discrimination network; of course, to make the first discrimination network correspond fully to the first generation network, the parameter amount of the first discrimination network may also be smaller than that of the second discrimination network, and the two parameter amounts may be determined flexibly according to the actual situation.
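As a sketch of the channel-reduction idea (the encoder-decoder layout below is a toy assumption; only the relative channel widths matter), note that narrowing every layer compounds across layer pairs, so the measured parameter ratio depends on the architecture:

```python
import torch.nn as nn

def make_generator(base_channels: int) -> nn.Module:
    # Toy encoder-decoder generator; the layer pattern stays identical between
    # teacher and student, only the channel count changes.
    c = base_channels
    return nn.Sequential(
        nn.Conv2d(3, c, 4, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(c, 2 * c, 4, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(2 * c, c, 4, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(c, 3, 4, stride=2, padding=1), nn.Tanh(),
    )

teacher = make_generator(64)  # second generation network
student = make_generator(16)  # first generation network: 1/4 of the channels

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(student) / count(teacher))  # ~0.07 here: middle layers shrink ~16x
```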
Still further, the types of the first generative adversarial network and the second generative adversarial network each include, but are not limited to, the paired image-translation generative adversarial network (pix2pix), the cycle-consistent generative adversarial network (CycleGAN), the multi-domain image-translation generative adversarial network (StarGAN), and other types of generative adversarial networks, as long as the first and second generative adversarial networks are of the same type.
202. The server inputs the image to be converted into a first generation network to obtain a first converted image.
In this embodiment, after acquiring the first generation network, the server may input the image to be converted into the first generation network, so that the first generation network outputs the first converted image.
203. The server inputs the image to be converted into the second generation network to obtain a second converted image.
In this embodiment, after acquiring the second generation network, the server may input the image to be converted into the second generation network, so that the second generation network outputs the second converted image.
It should be understood that, in the embodiment of the present application, the execution order of step 202 and step 203 is not limited, and step 202 and step 203 may be executed simultaneously; step 202 may be performed first, and then step 203 may be performed; step 203 may be performed first, and then step 202 may be performed.
204. The server carries out iterative training on the first generation network according to the first converted image and the second converted image until the similarity of the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold.
In this embodiment of the application, because the first generation network and the first discrimination network can be pitted against each other to iteratively train both, after acquiring the first converted image the server can input it into the first discrimination network and iteratively train the first generation network using the loss function adopted by the first generation network during the adversarial training between the two networks. Because the first generation network adopts fewer parameters, the quality of the images generated by a third generation network obtained through adversarial training alone might not be good, so the server can also iteratively train the first generation network using the second converted image generated by the second generation network until the similarity between the first converted image output by the first generation network and the second converted image output by the second generation network reaches the preset threshold, thereby ensuring that the finally obtained third generation network (namely, the first generation network after the training operation) can output images of good quality.
Specifically, the loss function adopted by the first generation network in the mutual adversarial training between the first generation network and the first discrimination network is first described, taking a common type of generative adversarial network as an example. For convenience of understanding, refer to fig. 3, which is a schematic diagram of the adversarial training between the generation network and the discrimination network provided by an embodiment of the present application. The image to be converted is an image of a hand-drawn shoe (namely, the line image of a shoe on the left side of the broken line in fig. 3). A first generation network is initialized and converts the hand-drawn shoe image into a first converted image (namely, the shoe image on the left side of the broken line in fig. 3). To train the first generation network, the first converted image and a test image (namely, the shoe image on the right side of the broken line in fig. 3) are respectively input into the first discrimination network, where the test image is a real image of the same style as the converted image. The goal of the first discrimination network is to judge that the first converted image is a generated image rather than a real image, that is, to judge the first converted image as fake rather than real, and to judge that the test image is a real image rather than a generated image, that is, as shown in fig. 3, to judge the test image as real rather than fake. Further, the final output of the first discrimination network is a value between 0 and 1: the more the first discrimination network considers the first converted image to be a real picture, the closer the output value is to 1; the more it considers the first converted image to be a generated picture, the closer the output value is to 0. The goal of the first generation network is to make the first discrimination network judge the generated first converted image (namely, G_S(x)) to be a real picture, so the loss function adopted by the first generation network may be:
L_{GAN}(G_S) = 1 - D_S(G_S(x));    (1)
where D_S(G_S(x)) denotes the value output by the first discrimination network after the first converted image is input into it, and L_{GAN}(G_S) denotes, when the first generative adversarial network is a common type of generative adversarial network, the loss function adopted by the first generation network during the adversarial training between the first generation network and the first discrimination network. It should be understood that the above example applies only to a common type of generative adversarial network; when the type of the first generative adversarial network is pix2pix, the loss function adopted by the first generation network may instead be:
L_{GAN}(G_S) = 1 - D_S(G_S(x)) + \left\| G_S(x) - y \right\|_1^2;    (2)

where 1 - D_S(G_S(x)) is as in formula (1) and is not described again, G_S(x) denotes the image matrix of the first converted image, y denotes the image matrix of the test image, \| G_S(x) - y \|_1 denotes a norm of the difference between the image matrix of the first converted image and the image matrix of the test image (squared in the loss), and L_{GAN}(G_S) denotes the loss function adopted by the first generation network during the adversarial training when the network type of the generative adversarial network is pix2pix. Whether formula (1) or formula (2) is used, the training goal of the first generation network is to drive the value of L_{GAN}(G_S) close to 0. It should be noted that the two examples above are given only for convenience of understanding the scheme; the loss functions adopted by the first generation network and the first discrimination network during adversarial training should be determined according to the type of the first generative adversarial network, and the loss functions adopted in first generative adversarial networks of other types, such as CycleGAN or StarGAN, can be found in the prior art, so they are not exemplified one by one here.
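The two formulas above might be sketched as follows (assuming the discriminator outputs one value in [0, 1] per image; a weighting factor between the two terms of formula (2) would be used in practice but is omitted to match the formula as written):

```python
import torch

def gan_loss_common(d_student, first_converted: torch.Tensor) -> torch.Tensor:
    # Formula (1): L_GAN(G_S) = 1 - D_S(G_S(x)).
    return (1.0 - d_student(first_converted)).mean()

def gan_loss_pix2pix(d_student, first_converted: torch.Tensor,
                     test_image: torch.Tensor) -> torch.Tensor:
    # Formula (2): the adversarial term of formula (1) plus the squared
    # one-norm between G_S(x) and the paired test image y.
    adversarial = (1.0 - d_student(first_converted)).mean()
    paired_l1 = (first_converted - test_image).abs().sum() ** 2
    return adversarial + paired_l1
```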
Secondly, the process of iteratively training the first generation network using the second converted image generated by the second generation network is introduced; the server can start from multiple dimensions to realize this training. As one implementation, the server iteratively trains the first generation network according to the first converted image and the second converted image until the similarity between the first converted image output by the first generation network and the second converted image output by the second generation network reaches the preset threshold, which may include: the server acquires a first image matrix of the first converted image and a second image matrix of the second converted image, and iteratively trains the first generation network with the first loss function according to the two matrices until the similarity between the two converted images reaches the preset threshold.
In this embodiment, after obtaining the first converted image and the second converted image, the server may obtain the first image matrix of the first converted image and the second image matrix of the second converted image respectively. Because both converted images are obtained from the same image to be converted, a pixel point corresponding to each pixel point in the first image matrix can be found in the second image matrix, so the server may compare each pixel point in the first image matrix with the corresponding pixel point in the second image matrix one by one and determine the similarity between the two converted images at the pixel level using the first loss function, that is, obtain the output value of the first loss function; the higher the similarity between the first converted image and the second converted image, the smaller that output value.
The server may preset the first preset threshold of the first loss function and train the first generation network by backward iteration according to the output value of the first loss function until that output value is smaller than the first preset threshold, that is, until the similarity between the first converted image output by the first generation network and the second converted image output by the second generation network reaches the preset threshold. In other words, the server can judge the similarity between the two converted images from their image matrices and thereby make the images output by the first generation network more and more similar to those output by the second generation network, which ensures the quality of the images output by the third generation network (namely, the first generation network after the training operation); moreover, the two image matrices are easy to obtain, which improves the feasibility of the scheme.
Further, the first loss function may be expressed specifically as:

L_L1(G_S) = ||G_T(x) - G_S(x)||_1^2; (3)

wherein G_T(x) represents the first image matrix, G_S(x) represents the second image matrix, ||G_T(x) - G_S(x)||_1 represents a norm of the difference between the first image matrix and the second image matrix, ||G_T(x) - G_S(x)||_1^2 represents its square, and L_L1(G_S) represents the first loss function. ||G_T(x) - G_S(x)||_1 may also represent the sum of absolute values of the differences between n first image matrices and n second image matrices, where each first image matrix corresponds to one first converted image and each second image matrix corresponds to one second converted image.
In this embodiment, after obtaining a norm of the difference between the first image matrix and the second image matrix, the server squares it to obtain the output value of the first loss function. Since an output value of 0 (i.e., the first converted image and the second converted image being completely consistent) is an ideal case, the first preset threshold is generally set to a value slightly greater than 0; specifically, it may be 0.47, 0.4, or another value, and may be determined in combination with the value of n mentioned above, the performance requirements of the first generation network, and other factors, which is not limited here. It should be appreciated that the first loss function may also be expressed simply as ||G_T(x) - G_S(x)||_1, or in other forms, as long as it reflects the per-pixel difference between the first converted image and the second converted image; the specific form of the first loss function is not limited here.
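For ease of understanding only, the pixel-level objective of formula (3) and the reverse-iteration update described above can be sketched in PyTorch as follows; the names student_gen (the first generation network), teacher_gen (the second generation network), and the threshold value are illustrative assumptions, not part of the present application:

```python
import torch

def first_loss(student_img: torch.Tensor, teacher_img: torch.Tensor) -> torch.Tensor:
    # Formula (3): square of the 1-norm of the difference between the two
    # image matrices (sum of absolute pixel differences, then squared).
    return torch.sum(torch.abs(student_img - teacher_img)) ** 2

def distill_step(student_gen, teacher_gen, x, optimizer, threshold=0.47):
    # One reverse-iteration training step of the first generation network.
    with torch.no_grad():
        teacher_img = teacher_gen(x)       # second converted image
    student_img = student_gen(x)           # first converted image
    loss = first_loss(student_img, teacher_img)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item() < threshold         # True once below the first preset threshold
```

In practice the sum in formula (3) may run over a batch of n image pairs, matching the n-matrix reading given above.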
As another implementation manner, iteratively training the first generation network according to the first converted image and the second converted image until the similarity between the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold may include: acquiring a first discrimination network and a second discrimination network, wherein the first discrimination network is a network to be trained, the second discrimination network is a network that has performed the training operation, the first discrimination network and the first generation network belong to the same first generative adversarial network, and the second discrimination network and the second generation network belong to the same second generative adversarial network; inputting the first converted image into the first discrimination network to obtain a first feature information set of the first converted image; inputting the second converted image into the second discrimination network to obtain a second feature information set of the second converted image; and iteratively training the first generation network with a second loss function according to the first feature information set and the second feature information set until the similarity between the first converted image output by the first generation network and the second converted image output by the second generation network reaches the preset threshold.
In this embodiment, a discrimination network judges whether an image is a real image or a generated image by collecting feature information from the image and judging according to that feature information. Therefore, after obtaining the first converted image and the second converted image, the server may acquire the first discrimination network and the second discrimination network, input the first converted image into the first discrimination network to obtain a first feature information set of the first converted image, and input the second converted image into the second discrimination network to obtain a second feature information set of the second converted image. The server then compares the first feature information set with the second feature information set using the second loss function to determine the similarity between the first converted image output by the first generation network and the second converted image output by the second generation network, that is, to obtain the output value of the second loss function. The first feature information set and the second feature information set include feature information of the first converted image and of the second converted image, respectively; with reference to fig. 3, for example, the first feature information set may include color information of the shoelaces, color information of the shoe body, and the like, and similarly for the second feature information set.
The server may preset a second preset threshold for the second loss function and train the first generation network by reverse iteration according to the output value of the second loss function, until the output value of the second loss function is smaller than the second preset threshold, that is, until the similarity between the first converted image output by the first generation network and the second converted image output by the second generation network reaches the preset threshold. Since the first feature information set and the second feature information set are obtained by analyzing the images, they can reflect a user's visual perception of the images; training against them makes the images generated by the third generation network (i.e., the first generation network after the training operation) more realistic in visual effect and improves the accuracy of the judging process. Moreover, this reuses the native feature-extraction function of the first and second discrimination networks, which improves the executability of the solution while guaranteeing the quality of the images output by the third generation network.
Further, the second loss function may be expressed specifically as:

L_prec(G_S) = ||D̂_T(G_T(x)) - D̂_S(G_S(x))||^2; (4)

wherein D_T(G_T(x)) represents the value obtained after inputting the second converted image into the second discrimination network; D̂_T(G_T(x)) represents the second feature information set, obtained by removing the last layer of D_T; D_S(G_S(x)) represents the value obtained after inputting the first converted image into the first discrimination network; D̂_S(G_S(x)) represents the first feature information set, obtained by removing the last layer of D_S; ||D̂_T(G_T(x)) - D̂_S(G_S(x))|| represents a norm of the difference between the second feature information set and the first feature information set; ||D̂_T(G_T(x)) - D̂_S(G_S(x))||^2 represents its square; and L_prec(G_S) represents the second loss function.
In this embodiment, since an output value of 0 for the second loss function (i.e., the first converted image and the second converted image being completely consistent) is an ideal case, the second preset threshold is generally set to a value slightly greater than 0; specifically, it may be 0.28, 0.3, or another value, which is not limited here. The discrimination network is a neural network, and the value at its penultimate layer (i.e., with the last layer removed) is an intermediate value computed within the whole network; D̂_S(G_S(x)) and D̂_T(G_T(x)) are obtained in this way. It should be appreciated that in other embodiments, the first and second feature information sets may also take the values of the third-from-last or other layers of the discrimination network, to be determined in combination with the type of the generative adversarial network, which is not limited here.
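A rough sketch of formula (4) under the same illustrative naming, assuming each discrimination network is an nn.Sequential stack so that "removing the last layer" is a matter of slicing its children; the choice of norm is likewise an assumption:

```python
import torch
import torch.nn as nn

def penultimate_features(disc: nn.Sequential, img: torch.Tensor) -> torch.Tensor:
    # Evaluate the discrimination network with its last layer removed:
    # the penultimate-layer activations serve as the feature information set.
    trunk = nn.Sequential(*list(disc.children())[:-1])
    return trunk(img)

def second_loss(student_img, teacher_img, student_disc, teacher_disc):
    # Formula (4): squared norm of the difference between the second feature
    # information set (teacher side) and the first (student side).
    f_teacher = penultimate_features(teacher_disc, teacher_img)
    f_student = penultimate_features(student_disc, student_img)
    return torch.norm(f_teacher - f_student) ** 2
```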
Optionally, the server performs iterative training on the first generation network, including: the server adopts a third loss function to carry out iterative training on the first generation network, wherein the third loss function is specifically expressed as follows:
L(G S )=L L1 (G S )+L prec (G S )+L GAN (G S ); (5)
wherein L_L1(G_S) represents the first loss function, L_prec(G_S) represents the second loss function, L_GAN(G_S) represents the loss function adopted by the first generation network during the adversarial training between the first generation network and the first discrimination network, and L(G_S) represents the third loss function. Since the first loss function and the second loss function have been described in detail in the above embodiments, they are not repeated here; L_GAN(G_S) may take the form of formula (1) or formula (2) above, or other forms as the type of the generative adversarial network changes, and has likewise been described in detail above.
In this embodiment, the first generation network is iteratively trained using the first loss function, the second loss function, and the loss function adopted by the first generation network during the adversarial training between the first generation network and the first discrimination network. Through the adversarial training against the first discrimination network, the images output by the first generation network can at least confuse the first discrimination network, that is, the network outputs images judged to be real as far as possible. In addition, since the second generation network is a mature large network, its output can be regarded as images of good quality; iteratively training the first generation network with the second converted images output by the second generation network ensures that the images output by the third generation network (i.e., the first generation network after the training operation) are similar to those of the second generation network both at each pixel and at the feature level, so the final third generation network can likewise output images of good quality.
It should be appreciated that the server may also iteratively train the first generation network with other loss functions, specifically with a loss function formed by any combination of the above formulas (1) to (5): for example, a loss function formed by adding formulas (3) and (4), or a loss function formed by adding formulas (3) and (1), and so on; other combinations are not listed here.
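Putting the pieces together, a sketch of the combined objective of formula (5), reusing the helper functions from the sketches above; the adversarial term is written here in the pix2pix style of formula (2) (an assumption; other types of generative adversarial networks would substitute their own L_GAN term):

```python
def third_loss(x, y, student_gen, teacher_gen, student_disc, teacher_disc):
    # Formula (5): L(G_S) = L_L1(G_S) + L_prec(G_S) + L_GAN(G_S).
    with torch.no_grad():
        teacher_img = teacher_gen(x)               # second converted image
    student_img = student_gen(x)                   # first converted image
    l_l1 = first_loss(student_img, teacher_img)    # formula (3)
    l_prec = second_loss(student_img, teacher_img,
                         student_disc, teacher_disc)  # formula (4)
    # Adversarial term in the spirit of formula (2); y is the test image paired with x.
    l_gan = 1 - student_disc(student_img).mean() \
            + torch.sum(torch.abs(student_img - y)) ** 2
    return l_l1 + l_prec + l_gan
```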
205. The server acquires a third generation network, and the third generation network is a network obtained after iterative training is performed on the first generation network.
In this embodiment, after the server performs iterative training on the first generation network through step 204, a third generation network may be obtained, so that the server may obtain the third generation network and send the third generation network to the client on the terminal device, so that the client performs an image processing operation by using the third generation network.
In this embodiment, after acquiring the first generation network and the second generation network, the server inputs the image to be converted into the first generation network to obtain a first converted image, inputs the image to be converted into the second generation network to obtain a second converted image, and iteratively trains the first generation network according to the first converted image and the second converted image until the similarity between the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold; the server then acquires the third generation network obtained after the first generation network has undergone the iterative training. Because the parameter quantity of the third generation network (i.e., the trained first generation network) is smaller than that of the second generation network, the memory occupied by the third generation network is reduced and its operation amount is lowered, realizing compression and acceleration of the generation network. In addition, since the first generation network is a small network to be trained and the second generation network is a large network that has performed the training operation, iteratively training the first generation network with the images output by the second generation network, until the similarity between the first converted image and the second converted image reaches the preset threshold, gives the trained third generation network better image processing capability; that is, the compressed generation network can still output images of high quality.
Referring to fig. 4, and building on the foregoing description of the process of iteratively training the first generation network using the second converted image, another embodiment of the training method of the network model provided by the embodiment of the present application may include:
401. the server obtains a first generation network and a second generation network.
402. The server inputs the image to be converted and the test image into a first generation network to obtain a first converted image.
403. And the server inputs the image to be converted and the test image into a second generation network to obtain a second converted image.
404. The server carries out iterative training on the first generation network according to the first converted image and the second converted image until the similarity of the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold.
In the embodiment of the present application, steps 401 to 404 are similar to steps 201 to 204 in the embodiment shown in fig. 2, and are not repeated here.
405. The server performs iterative training on the first discrimination network using the second converted image.
In this embodiment, the first generation network and the first discrimination network can be trained against each other, thereby iteratively training both. After the server obtains the first converted image, it may input the first converted image into the first discrimination network and iteratively train the first discrimination network with the loss function adopted by the first discrimination network during the adversarial training between the first generation network and the first discrimination network. In addition, since the first generation network uses fewer parameters, the images it generates are more easily judged as fake; the server may therefore also iteratively train the first discrimination network with the second converted image, so as to help train a better first discrimination network, which in turn yields a better first generation network in the adversarial learning between the two, thereby ensuring that the finally obtained third generation network (i.e., the first generation network after the training operation) can output images of better quality.
Specifically, the loss function adopted by the first discrimination network in the mutual adversarial training between the first generation network and the first discrimination network is described, for ease of understanding, by taking a generative adversarial network of a common type as an example; refer to the description of fig. 3 in step 204. The first discrimination network outputs a value between 0 and 1: the more it considers the first converted image to be a real picture, the closer the output value is to 1; the more it considers the first converted image to be a generated picture, the closer the output value is to 0. The goal of the first discrimination network is to judge the first converted image (i.e., the color-filled shoe image on the left of the dotted line in fig. 3) as a generated picture and the test image (i.e., the color-filled shoe image on the right of the dotted line in fig. 3) as a real image; the loss function it adopts may be:
L GAN (D S )=D S (G S (x))+(1-D S (y)); (6)
wherein D_S(G_S(x)) represents the value obtained after inputting the first converted image into the first discrimination network, D_S(y) represents the value obtained after inputting the test image into the first discrimination network, and L_GAN(D_S) represents the loss function adopted by the first discrimination network during the mutual adversarial training between the first generation network and the first discrimination network.
It should be understood that the above example applies only when the first generative adversarial network is of a common type; when the type of the first generative adversarial network is pix2pix, CycleGAN, StarGAN, or another type, the loss function adopted by the first discrimination network during the mutual adversarial training between the first generation network and the first discrimination network may differ, but it can be obtained in combination with the prior art and is not repeated here.
Next, the process of iteratively training the first discrimination network using the second converted image generated by the second generation network is introduced; likewise, the server can approach this from multiple dimensions. As one implementation, the server iteratively training the first discrimination network using the second converted image includes: the server inputs the second converted image into the first discrimination network to obtain the discrimination value output by the first discrimination network, and iteratively trains the first discrimination network with a fourth loss function according to that discrimination value.
In this embodiment, the server may input the second converted image into the first discrimination network to obtain the discrimination value output by the first discrimination network, and iteratively train the first discrimination network according to that value. Because the second generation network is a mature large generation network that has undergone the training operation, its output images can be regarded as real images, so the fourth loss function may be used to drive the first discrimination network to judge the second converted image as a real image.
Specifically, the fourth loss function may be expressed as:

L_T(D_S) = 1 - D_S(G_T(x)); (7)

wherein D_S(G_T(x)) represents the value obtained after inputting the second converted image into the first discrimination network, and L_T(D_S) represents the fourth loss function.
In this embodiment, the server may input the second converted image into the first discrimination network to obtain the output value of the first discrimination network, substitute that value into the fourth loss function, and consider the convergence condition of the fourth loss function satisfied once the output value reaches the third preset threshold. Since an output value of 0 for the fourth loss function is an ideal case, the third preset threshold is generally set to a value slightly greater than 0; specifically, it may be 0.35, 0.38, or another value, which is not limited here.
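Continuing the same illustrative sketch, and assuming formula (7) takes the reconstructed form above, the update that teaches the first discrimination network to treat the teacher's output as real might look like:

```python
def fourth_loss(student_disc, teacher_img):
    # Formula (7): drive D_S to judge the second converted image as real,
    # i.e. push D_S(G_T(x)) toward 1.
    return 1 - student_disc(teacher_img).mean()

def disc_step(student_disc, teacher_gen, x, optimizer):
    with torch.no_grad():
        teacher_img = teacher_gen(x)   # second converted image, treated as real
    loss = fourth_loss(student_disc, teacher_img)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```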
As another implementation, the server iteratively training the first discrimination network using the second converted image includes: the server inputs the test image into the first discrimination network to obtain a third feature information set of the test image; inputs the second converted image into the first discrimination network to obtain a fourth feature information set of the second converted image; inputs the first converted image into the first discrimination network to obtain a first feature information set of the first converted image; and iteratively trains the first discrimination network with a fifth loss function according to the third feature information set, the fourth feature information set, and the first feature information set.
In this embodiment, since the feature information set of an image can be obtained through the discrimination network, the server may input the test image, the second converted image, and the first converted image into the first discrimination network respectively, obtaining the third feature information set of the test image, the fourth feature information set of the second converted image, and the first feature information set of the first converted image output by the first discrimination network. The server determines a first distance between the test image and the second converted image using the third and fourth feature information sets, and a second distance between the test image and the first converted image using the third and first feature information sets. Similar to the first and second feature information sets, the third and fourth feature information sets may be attribute information of objects contained in the image, color information of objects contained in the image, or other image feature information; both the first distance and the second distance are numerical representations (for example a value, a vector, or a matrix) of the difference between the corresponding feature information sets. The fifth loss function reflects the difference between the first distance and the second distance, and the server may iteratively train the first discrimination network with the fifth loss function until that difference reaches a preset threshold, that is, until the output value of the fifth loss function reaches a fourth preset threshold.
In this embodiment, the first distance reflects the difference, as judged by the first discrimination network, between the test image and the second converted image, and the second distance reflects the difference between the test image and the first converted image; together they reflect the discriminating capability of the first discrimination network. Since the second converted image can be regarded as a real image, the first distance should be smaller than the second distance, with a certain margin between them. Iteratively training the first discrimination network in this way further improves its discriminating capability, which in turn helps the first generation network improve its image conversion capability and helps the third generation network (i.e., the first generation network after the training operation) output images of good effect.
Specifically, the fifth loss function may be expressed as:

L_tri(D_S) = ||D̂_S(y) - D̂_S(G_T(x))|| - ||D̂_S(y) - D̂_S(G_S(x))|| + α; (8)

wherein D_S(y) represents the value obtained after inputting the test image into the first discrimination network; D̂_S(y) represents the third feature information set, obtained by removing the last layer of D_S; D_S(G_T(x)) represents the value obtained after inputting the second converted image into the first discrimination network; D̂_S(G_T(x)) represents the fourth feature information set, obtained by removing the last layer; ||D̂_S(y) - D̂_S(G_T(x))|| represents a norm of the difference between the third feature information set and the fourth feature information set; D_S(G_S(x)) represents the value obtained after inputting the first converted image into the first discrimination network; D̂_S(G_S(x)) represents the first feature information set, obtained by removing the last layer; ||D̂_S(y) - D̂_S(G_S(x))|| represents a norm of the difference between the third feature information set and the first feature information set; α represents a preset distance value between ||D̂_S(y) - D̂_S(G_T(x))|| and ||D̂_S(y) - D̂_S(G_S(x))||; and L_tri(D_S) represents the fifth loss function.
Specifically, since an output value of 0 for the fifth loss function is an ideal case, the fourth preset threshold is generally set to a value slightly greater than 0; specifically, it may be 0.53, 0.5, or another value, which is not limited here. The value of α may be determined in combination with the image output capability of the first generation network and may be, for example, 0.5, 0.52, or another value, which is also not limited here.
The discrimination network is a neural network, and the value at its penultimate layer (i.e., with the last layer removed) is an intermediate value computed within the whole network; D̂_S(y), D̂_S(G_T(x)), and D̂_S(G_S(x)) are obtained in this way. It should be appreciated that, similar to the first and second feature information sets, in other embodiments the third and fourth feature information sets may also take the values of the third-from-last or other layers of the discrimination network, to be determined in combination with the type of the generative adversarial network, which is not limited here.
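A sketch of the triplet-style objective of formula (8), reusing the penultimate_features helper above; the margin value and the choice of norm are assumptions:

```python
def fifth_loss(student_disc, y, teacher_img, student_img, alpha=0.5):
    # Formula (8): the feature distance from the test image to the second
    # converted image should undercut the distance to the first converted
    # image by the preset margin alpha.
    f_test = penultimate_features(student_disc, y)               # third set
    f_teacher = penultimate_features(student_disc, teacher_img)  # fourth set
    f_student = penultimate_features(student_disc, student_img)  # first set
    first_distance = torch.norm(f_test - f_teacher)
    second_distance = torch.norm(f_test - f_student)
    return first_distance - second_distance + alpha
```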
Optionally, the server iteratively training the first discrimination network using the second converted image may further include: the server iteratively trains the first discrimination network using the second converted image with a sixth loss function, which may be expressed specifically as:

L(D_S) = L_T(D_S) + L_tri(D_S) + L_GAN(D_S); (9)

wherein L_T(D_S) represents the fourth loss function, L_tri(D_S) represents the fifth loss function, L_GAN(D_S) represents the loss function adopted by the first discrimination network during the adversarial training between the first generation network and the first discrimination network, and L(D_S) represents the sixth loss function. Specifically, L_T(D_S) may be expressed as formula (7), L_tri(D_S) as formula (8), and L_GAN(D_S) as formula (6), or in other forms as the type of the generative adversarial network changes; formulas (6), (7), and (8) have been described in detail in the above embodiments and are not repeated here.
It should be noted that the server may also iteratively train the first discrimination network with other loss functions, specifically with a loss function formed by any combination of the above formulas (6) to (8): for example, a loss function formed by adding formulas (6) and (7), or one formed by adding formulas (6) and (8), and so on; other combinations are not listed here.
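And a matching sketch of the combined discriminator objective of formula (9), with the L_GAN(D_S) term written per formula (6) and the other terms reusing the helpers above:

```python
def sixth_loss(student_disc, y, teacher_img, student_img, alpha=0.5):
    # Formula (9): L(D_S) = L_T(D_S) + L_tri(D_S) + L_GAN(D_S).
    l_t = fourth_loss(student_disc, teacher_img)                          # formula (7)
    l_tri = fifth_loss(student_disc, y, teacher_img, student_img, alpha)  # formula (8)
    # Formula (6): judge the first converted image as generated, the test image as real.
    l_gan = student_disc(student_img).mean() + (1 - student_disc(y).mean())
    return l_t + l_tri + l_gan
```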
It should be understood that, in the embodiment of the present application, the execution order of step 404 and step 405 is not limited, and step 404 and step 405 may be executed simultaneously; step 404 may be performed first, and then step 405 may be performed; step 405 may be performed before step 404 is performed.
406. The server acquires a third generation network, and the third generation network is a network obtained after iterative training is performed on the first generation network.
In the embodiment of the present application, step 406 is similar to step 205 in the embodiment shown in fig. 2, and will not be described here again.
To further illustrate the beneficial effects of the present application, experimental data are presented. First, the image conversion effect of the generation network provided by the embodiment of the present application is shown. Referring to fig. 5, fig. 5 is a schematic diagram of the effect of three different generation networks converting the same image to be converted; fig. 5 includes four sub-diagrams (a), (b), (c), and (d), wherein (a) is the image to be converted; (b) is the image output, after converting the image to be converted, by a fourth generation network obtained by applying only the adversarial training between a generation network and a discrimination network to the first generation network with its small parameter quantity; (c) is the image output by the third generation network provided by the present application after converting the image to be converted; and (d) is the image output by the second generation network after converting the image to be converted. Comparison in fig. 5 readily shows that the images finally presented in sub-diagrams (c) and (d) are better, while the image effect finally presented in sub-diagram (b) is worse.
Next, the small third generation network provided by the present application and the large second generation network are compared in terms of occupied memory size, operation amount, and per-image processing time. Tests were performed on a system with an Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz. In this embodiment, the first generation network was iteratively trained using the above formula (5) and the first discrimination network using the above formula (9), with the preset threshold of the first loss function set to 0.47, that of the second loss function set to 0.28, that of the fourth loss function set to 0.35, that of the fifth loss function set to 0.53, and the value of α in the fifth loss function set to 0.5. The test data obtained are shown in table 1 below and described in conjunction with it.
TABLE 1

Network                                         Occupied memory size    Number of floating-point operations    Processing time per image
Large-scale generation network                  43.4MB                  47.19G                                 2.26s
Generation network in the present application   2.8MB                   3.19G                                  0.32s
Referring to table 1, the memory occupied by the large second generation network is 43.4MB, while the small third generation network provided by the present application occupies 2.8MB; the compressed third generation network obtained by the model training method provided by the embodiment of the present application thus occupies less than one tenth of the memory occupied before compression. The operation amount of the large second generation network is about 47.19 billion floating-point operations, while that of the small third generation network provided by the present application is about 3.19 billion, a large reduction. The processing time of the large second generation network for each image is 2.26s, while that of the small third generation network provided by the present application is 0.32s, a large shortening. Taking fig. 5 and table 1 together, the third generation network obtained by the embodiment of the present application occupies little memory, requires little computation, runs quickly, and delivers a good conversion effect, making it well suited for deployment on terminal devices.
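As a back-of-the-envelope check of the memory column (a generic sketch, not the measurement code used for table 1):

```python
import torch.nn as nn

def model_size_mb(model: nn.Module) -> float:
    # Parameters stored as 32-bit floats occupy 4 bytes each.
    n_params = sum(p.numel() for p in model.parameters())
    return n_params * 4 / (1024 ** 2)
```

Under this float32 accounting, 43.4MB corresponds to roughly 11 million parameters and 2.8MB to roughly 0.7 million.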
In order to better implement the above-described aspects of the embodiments of the present application, the following provides related apparatuses for implementing the above-described aspects. Referring specifically to fig. 6, fig. 6 is a schematic structural diagram of a network model training apparatus according to an embodiment of the present application, and the network model training apparatus 60 includes an obtaining unit 601, an input unit 602, and a processing unit 603.
An obtaining unit 601, configured to obtain a first generation network and a second generation network, where the first generation network is a network to be trained, the second generation network is a mature network that performs training operation, and a parameter amount in the first generation network is smaller than a parameter amount in the second generation network;
an input unit 602, configured to input the image to be converted and the test image into the first generation network acquired by the acquisition unit 601, to obtain a first converted image;
the input unit 602 is further configured to input the image to be converted and the test image into the second generation network acquired by the acquisition unit 601, to obtain a second converted image;
a processing unit 603, configured to iteratively train the first generating network according to the first converted image and the second converted image obtained through the input unit 602 until a similarity between the first converted image output by the first generating network and the second converted image output by the second generating network reaches a preset threshold;
The obtaining unit 601 is further configured to obtain a third generation network, where the third generation network is a network obtained by performing iterative training on the first generation network by the processing unit 603.
In the embodiment of the present application, after the acquiring unit 601 acquires the first generation network and the second generation network, the input unit 602 inputs the image to be converted into the first generation network acquired by the acquiring unit 601 to obtain a first converted image, and inputs the image to be converted into the second generation network acquired by the acquiring unit 601 to obtain a second converted image; the processing unit 603 iteratively trains the first generation network according to the first converted image and the second converted image obtained through the input unit 602 until the similarity between the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold, whereupon the third generation network, obtained after the first generation network has undergone the iterative training, is acquired. Because the parameter quantity of the third generation network (i.e., the trained first generation network) is smaller than that of the second generation network, the memory occupied by the third generation network is reduced and its operation amount is lowered, realizing compression and acceleration of the generation network. In addition, since the first generation network is a small network to be trained and the second generation network is a large network that has performed the training operation, iteratively training the first generation network with the images output by the second generation network, until the similarity between the first converted image and the second converted image reaches the preset threshold, gives the trained third generation network better image processing capability; that is, the compressed generation network can still output images of high quality.
In one possible design, the processing unit 603 is specifically configured to:
acquiring a first image matrix of a first converted image; acquiring a second image matrix of a second converted image; and performing iterative training on the first generation network by adopting a first loss function according to the first image matrix and the second image matrix until the similarity of the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold.
In one possible design, the processing unit 603 is specifically configured to:
acquiring a first discrimination network and a second discrimination network, wherein the first discrimination network is a network to be trained, the second discrimination network is a network that has performed the training operation, the first discrimination network and the first generation network belong to the same generative adversarial network, and the second discrimination network and the second generation network belong to the same generative adversarial network; inputting the first converted image into the first discrimination network to obtain a first characteristic information set of the first converted image; inputting the second converted image into the second discrimination network to obtain a second characteristic information set of the second converted image; and iteratively training the first generation network with a second loss function according to the first characteristic information set and the second characteristic information set until the similarity between the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold.
In a possible design, the processing unit 603 is further configured to iteratively train the first discrimination network using the second converted image.
In one possible design, the processing unit 603 is specifically configured to:
inputting the second converted image into a first discrimination network to obtain a discrimination value output by the first discrimination network; and carrying out iterative training on the first discrimination network by adopting a fourth loss function according to the discrimination value output by the first discrimination network.
In one possible design, the processing unit 603 is specifically configured to:
inputting the test image into the first discrimination network to obtain a third characteristic information set of the test image; inputting the second converted image into the first discrimination network to obtain a fourth characteristic information set of the second converted image; inputting the first converted image into the first discrimination network to obtain a first characteristic information set of the first converted image; and iteratively training the first discrimination network with a fifth loss function according to the third characteristic information set, the fourth characteristic information set, and the first characteristic information set.
It should be noted that, because the content of information interaction and execution process between the modules/units of the above-mentioned device is based on the same concept as the method embodiment of the present application, the technical effects brought by the content are the same as the method embodiment of the present application, and the specific content can be referred to the description in the foregoing illustrated method embodiment of the present application, which is not repeated herein.
Next, referring to fig. 7, a schematic structural diagram of a server according to an embodiment of the present application is shown, where the server includes:
receiver 701, transmitter 702, processor 703 and memory 704 (where the number of processors 703 may be one or more, one processor is illustrated in fig. 7), where processor 703 may include modem processor 7031 and application processor 7032. In some embodiments of the application, the receiver 701, transmitter 702, processor 703, and memory 704 may be connected by a bus or other means.
Memory 704 may include read-only memory and random access memory, and provides instructions and data to the processor 703. A portion of memory 704 may also include non-volatile random access memory (NVRAM). The memory 704 stores programs and operating instructions, executable modules, or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
The processor 703 controls the operation of the terminal device. In a specific application, the individual components of the terminal device are coupled together by a bus system, which may comprise, in addition to a data bus, a power bus, a control bus, a status signal bus, etc. For clarity of illustration, however, the various buses are referred to in the figures as bus systems.
The method disclosed in the above embodiment of the present application may be applied to the processor 703 or implemented by the processor 703. The processor 703 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 703 or by instructions in the form of software. The processor 703 may be a general purpose processor, a digital signal processor (digital signal processing, DSP), an application specific integrated circuit (application specific integrated circuit, ASIC), a field-programmable gate array (field-programmable gate array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in the memory 704, and the processor 703 reads information in the memory 704 and, in combination with its hardware, performs the steps of the method described above.
The receiver 701 may be used to receive input digital or character information and to generate signal inputs related to the relevant settings and function control of the terminal device. The transmitter 702 may be used to output numeric or character information via a first interface; the transmitter 702 may also be configured to send instructions to the disk group via the first interface to modify data in the disk group; the transmitter 702 may also include a display device such as a display screen.
In an embodiment of the present application, the processor 703 is configured to perform the method for model training performed by the foregoing server.
It should be noted that, because the content of information interaction and execution process between the modules/units of the above-mentioned device is based on the same concept as the method embodiment of the present application, the technical effects brought by the content are the same as the method embodiment of the present application, and the specific content can be referred to the description in the foregoing illustrated method embodiment of the present application, which is not repeated herein.
There is also provided in an embodiment of the present application a computer program product including instructions which, when run on a computer, cause the computer to perform the steps performed by the server in the methods described in the embodiments of fig. 2 to 5 above.
Embodiments of the present application also provide a computer-readable storage medium having instructions stored therein which, when executed on a computer, cause the computer to perform the steps performed by the server in the methods described in the embodiments of fig. 2 to 5 above.
The processor mentioned in any of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the method of the first aspect.
It should be further noted that the above-described apparatus embodiments are merely illustrative, and that the units described as separate units may or may not be physically separate, and that units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the device provided by the application, the connection relation between the modules represents that the modules have communication connection, and can be specifically implemented as one or more communication buses or signal lines.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus the necessary general-purpose hardware, or of course by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memories, special-purpose components, and the like. Generally, functions performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function can vary, such as analog circuits, digital circuits, or dedicated circuits. However, a software implementation is the preferred embodiment for the present application in most cases. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk of a computer, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method according to the embodiments of the present application.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can store, or a data storage device, such as a server or data center, integrating one or more available media. The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state disk (SSD)), etc.

Claims (18)

1. A method for training a network model, the method comprising:
acquiring a first generation network and a second generation network, wherein the first generation network is a small-sized network to be trained, the second generation network is a large-sized network for executing training operation, and the parameter quantity in the first generation network is smaller than the parameter quantity in the second generation network;
inputting an image to be converted into the first generation network to obtain a first converted image;
inputting the image to be converted into the second generation network to obtain a second converted image;
inputting the second converted image into a first discrimination network to perform iterative training on the first discrimination network by using the second converted image, wherein the first discrimination network and the first generation network belong to the same first generative adversarial network;
performing iterative training on the first generation network according to the first converted image and the second converted image until the similarity of the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold, wherein performing iterative training on the first generation network further comprises mutually opposing the first generation network and the first discrimination network so as to realize iterative training on the first generation network and the first discrimination network;
And acquiring a third generation network, wherein the third generation network is a network obtained after the iterative training is executed for the first generation network.
2. The method according to claim 1, wherein the iteratively training the first generation network based on the first converted image and the second converted image until a similarity of the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold, comprises:
acquiring a first image matrix of the first converted image;
acquiring a second image matrix of the second converted image;
and performing iterative training on the first generation network by adopting a first loss function according to the first image matrix and the second image matrix until the similarity of the first converted image output by the first generation network and the second converted image output by the second generation network reaches the preset threshold value.
3. The method according to claim 2, wherein the first loss function is embodied as:

L_L1(G_S) = ||G_T(x) - G_S(x)||_1^2;

wherein G_T(x) represents the first image matrix, G_S(x) represents the second image matrix, ||G_T(x) - G_S(x)||_1 represents a norm of the difference between the first image matrix and the second image matrix, ||G_T(x) - G_S(x)||_1^2 represents its square, and L_L1(G_S) represents the first loss function.
4. The method according to claim 1, wherein the iteratively training the first generation network based on the first converted image and the second converted image until a similarity of the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold, comprises:
acquiring the first discrimination network and a second discrimination network, wherein the first discrimination network is a network to be trained, the second discrimination network is a network that has performed a training operation, and the second discrimination network and the second generation network belong to the same second generative adversarial network;
inputting the first converted image into the first discrimination network to obtain a first characteristic information set of the first converted image;
inputting the second converted image into the second discrimination network to obtain a second characteristic information set of the second converted image;
And performing iterative training on the first generation network by adopting a second loss function according to the first characteristic information set and the second characteristic information set until the similarity of the first converted image output by the first generation network and the second converted image output by the second generation network reaches the preset threshold value.
5. The method according to claim 4, wherein the second loss function is embodied as:

L_prec(G_S) = ||D̂_T(G_T(x)) - D̂_S(G_S(x))||^2;

wherein D_T(G_T(x)) represents the value obtained after inputting the second converted image into the second discrimination network; D̂_T(G_T(x)) represents the second characteristic information set, obtained by removing the last layer of D_T; D_S(G_S(x)) represents the value obtained after inputting the first converted image into the first discrimination network; D̂_S(G_S(x)) represents the first characteristic information set, obtained by removing the last layer of D_S; ||D̂_T(G_T(x)) - D̂_S(G_S(x))|| represents a norm of the difference between the second characteristic information set and the first characteristic information set, and ||D̂_T(G_T(x)) - D̂_S(G_S(x))||^2 represents its square; and L_prec(G_S) represents the second loss function.
6. The method of claim 2, wherein the iteratively training the first generation network comprises:
Performing iterative training on the first generation network by adopting a third loss function, wherein the third loss function is specifically expressed as follows:
L(G S )=L L1 (G S )+L prec (G S )+L GAN (G S );
wherein L_L1(G_S) represents the first loss function, L_prec(G_S) represents the second loss function, L_GAN(G_S) represents the loss function adopted by the first generation network during the adversarial training between the first generation network and the first discrimination network, and L(G_S) represents the third loss function;
wherein the first loss function is embodied as:

L_L1(G_S) = ||G_T(x) - G_S(x)||_1^2;

wherein G_T(x) represents the first image matrix, G_S(x) represents the second image matrix, ||G_T(x) - G_S(x)||_1 represents a norm of the difference between the first image matrix and the second image matrix, ||G_T(x) - G_S(x)||_1^2 represents its square, and L_L1(G_S) represents the first loss function;
and the second loss function is embodied as:

L_prec(G_S) = ||D̂_T(G_T(x)) - D̂_S(G_S(x))||^2;

wherein D_T(G_T(x)) represents the value obtained after inputting the second converted image into a second discrimination network, the second discrimination network belonging to the same second generative adversarial network as the second generation network; D̂_T(G_T(x)) represents a second characteristic information set, obtained by removing the last layer of D_T; D_S(G_S(x)) represents the value obtained after inputting the first converted image into the first discrimination network; D̂_S(G_S(x)) represents a first characteristic information set, obtained by removing the last layer of D_S; ||D̂_T(G_T(x)) - D̂_S(G_S(x))|| represents a norm of the difference between the second characteristic information set and the first characteristic information set, and ||D̂_T(G_T(x)) - D̂_S(G_S(x))||^2 represents its square; and L_prec(G_S) represents the second loss function.
7. The method of any one of claims 1 to 5, wherein inputting the second converted image into a first discrimination network to iteratively train the first discrimination network with the second converted image comprises:
inputting the second converted image into the first discrimination network to obtain a discrimination value output by the first discrimination network;
and performing iterative training on the first discrimination network by adopting a fourth loss function according to the discrimination value output by the first discrimination network.
8. The method according to claim 7, wherein the fourth loss function is defined over the discrimination value D_S(G_T(x)), where D_S(G_T(x)) represents the value obtained after inputting the second converted image into the first discrimination network.
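The closed form of the fourth loss function is not reproduced here, but the claim pins it to the discrimination value D_S(G_T(x)). One common choice consistent with that reading, treating the teacher's output as a "real" sample for the student discriminator, is sketched below; the LSGAN form is an assumption, not claim language.

    def fourth_loss(d_s, teacher_img):
        score = d_s(teacher_img.detach())   # discrimination value D_S(G_T(x))
        # assumed LSGAN-style objective: push D_S to score the teacher output as real (= 1)
        return torch.mean((score - 1) ** 2)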
9. The method of any one of claims 1 to 5, wherein inputting the second converted image into a first discrimination network to iteratively train the first discrimination network with the second converted image comprises:
Inputting a test image into the first discrimination network to obtain a third characteristic information set of the test image, wherein the test image is a real image with similar style to the second converted image;
inputting the second converted image into the first discrimination network to obtain a fourth characteristic information set of the second converted image;
inputting the first converted image into the first discrimination network to obtain a first characteristic information set of the first converted image;
and performing iterative training on the first discrimination network by adopting a fifth loss function according to the third characteristic information set, the fourth characteristic information set and the first characteristic information set.
10. The method according to claim 9, wherein the fifth loss function is embodied as:

L_{tri}(D_S) = \max\{0, \|\hat{D}_S(y) - \hat{D}_S(G_T(x))\| - \|\hat{D}_S(y) - \hat{D}_S(G_S(x))\| + \alpha\}

wherein D_S(y) represents the value obtained after inputting the test image into the first discrimination network, \hat{D}_S(y) represents the third characteristic information set obtained after the last layer is removed, D_S(G_T(x)) represents the value obtained after inputting the second converted image into the first discrimination network, \hat{D}_S(G_T(x)) represents the fourth characteristic information set obtained after the last layer is removed, \|\hat{D}_S(y) - \hat{D}_S(G_T(x))\| represents a norm of the difference between the third characteristic information set and the fourth characteristic information set, D_S(G_S(x)) represents the value obtained after inputting the first converted image into the first discrimination network, \hat{D}_S(G_S(x)) represents the first characteristic information set obtained after the last layer is removed, \|\hat{D}_S(y) - \hat{D}_S(G_S(x))\| represents a norm of the difference between the third characteristic information set and the first characteristic information set, \alpha represents a preset distance value (margin) between the two norms, and L_{tri}(D_S) represents the fifth loss function.
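Read as a triplet loss over discriminator features (anchor: real test image; positive: teacher output; negative: student output), the fifth loss can be sketched as follows, reusing the `features` convention from the earlier sketch; the margin handling mirrors the reconstructed formula above.

    def fifth_loss(d_s, real_img, teacher_img, student_img, alpha=1.0):
        anchor = d_s.features(real_img)         # third characteristic information set
        positive = d_s.features(teacher_img)    # fourth characteristic information set
        negative = d_s.features(student_img)    # first characteristic information set
        d_pos = torch.norm(anchor - positive)   # distance real <-> teacher output
        d_neg = torch.norm(anchor - negative)   # distance real <-> student output
        # keep teacher outputs close to real features and student outputs
        # at least alpha further away
        return torch.clamp(d_pos - d_neg + alpha, min=0)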
11. The method of claim 9, wherein inputting the second converted image into a first discrimination network to iteratively train the first discrimination network using the second converted image comprises:
inputting the second converted image into a first discrimination network to perform iterative training on the first discrimination network by using the second converted image and adopting a sixth loss function, wherein the sixth loss function is specifically expressed as follows:
wherein, represents a fourth loss function, L tri (D S ) Representing the fifth loss function, L GAN (D S ) Representing a loss function, L (D) S ) Representing the sixth loss function;
wherein the fourth loss function is embodied as:
wherein D is S (G T (x) A value obtained after inputting the second converted image into the first discrimination network,representing the fourth loss function;
the fifth loss function is embodied as:
wherein D is S (y) represents a value obtained after inputting the test image into the first discrimination network,representing D S (y) removing the third feature information set obtained from the last layer, D S (G T (x) A value representing the value obtained after inputting said second converted image into said first discrimination network, -a second discrimination network>Representing D S (G T (x) The fourth characteristic information set obtained after the last layer is removed,/for>A norm representing a difference between the third set of characteristic information and the fourth set of characteristic information, D S (G S (x) A) represents a value obtained after inputting the first converted image into the first discrimination network, (-) ->Representing D S (G S (x) A first set of characteristic information obtained after removal of the last layer,a norm representing the difference between said third set of characteristic information and said first set of characteristic information, alpha representing +.>And->A preset distance value between L tri (D S ) Representing the fifth loss function.
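Putting claim 11 together (again a sketch under the same assumptions): the sixth loss sums the fourth loss, the fifth loss and a standard adversarial discriminator term, whose LSGAN form is assumed rather than taken from the claim. The helper reuses `fourth_loss` and `fifth_loss` from the sketches above.

    def sixth_loss(d_s, real_img, teacher_img, student_img, alpha=1.0):
        l4 = fourth_loss(d_s, teacher_img)
        l5 = fifth_loss(d_s, real_img, teacher_img, student_img, alpha)
        # assumed adversarial term: real images scored as 1, student outputs as 0
        l_gan = torch.mean((d_s(real_img) - 1) ** 2) + \
                torch.mean(d_s(student_img.detach()) ** 2)
        return l4 + l5 + l_gan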
12. A network model training apparatus, the apparatus comprising:
the system comprises an acquisition unit, configured to acquire a first generation network and a second generation network, wherein the first generation network is a network to be trained, the second generation network is a mature network on which training has been performed, and the quantity of parameters in the first generation network is smaller than the quantity of parameters in the second generation network;
the input unit is used for inputting the image to be converted into the first generation network acquired by the acquisition unit to acquire a first converted image;
the input unit is further configured to input the image to be converted into the second generation network acquired by the acquisition unit, so as to obtain a second converted image;
the processing unit is used for inputting the second converted image into a first discrimination network so as to perform iterative training on the first discrimination network by utilizing the second converted image, wherein the first discrimination network and the first generation network belong to the same first generative adversarial network;
the processing unit is further configured to perform iterative training on the first generation network according to the first converted image and the second converted image obtained through the input unit, until a similarity between the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold, wherein the iterative training of the first generation network further comprises causing the first generation network and the first discrimination network to oppose each other, so as to implement iterative training of the first generation network and the first discrimination network;
the acquisition unit is further configured to acquire a third generation network, where the third generation network is a network obtained after the processing unit performs the iterative training on the first generation network.
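The apparatus of claim 12 can be pictured as three cooperating units. The class below is an illustrative mapping onto the loss sketches above; the class name, the parameter-count check, the similarity of method names to the claimed units, and the optimizer plumbing are all assumptions, not claim language.

    class TrainingApparatus:
        def __init__(self, g_s, g_t, d_s, d_t):
            # acquisition unit: the first (student) generation network must carry
            # fewer parameters than the second (teacher) generation network
            n_s = sum(p.numel() for p in g_s.parameters())
            n_t = sum(p.numel() for p in g_t.parameters())
            assert n_s < n_t, "first generation network must be the smaller one"
            self.g_s, self.g_t, self.d_s, self.d_t = g_s, g_t, d_s, d_t

        def convert(self, x):
            # input unit: run the image to be converted through both generation networks
            return self.g_s(x), self.g_t(x).detach()

        def step(self, x, real_img, opt_g, opt_d):
            # processing unit: one adversarial distillation iteration
            fake_s, fake_t = self.convert(x)
            opt_d.zero_grad()
            sixth_loss(self.d_s, real_img, fake_t, fake_s.detach()).backward()
            opt_d.step()
            opt_g.zero_grad()
            third_loss(self.g_s, self.g_t, self.d_s, self.d_t, x).backward()
            opt_g.step()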
13. The apparatus according to claim 12, wherein the processing unit is specifically configured to:
acquiring a first image matrix of the first converted image;
acquiring a second image matrix of the second converted image;
and performing iterative training on the first generation network by adopting a first loss function according to the first image matrix and the second image matrix until the similarity of the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold.
14. The apparatus according to claim 12, wherein the processing unit is specifically configured to:
acquiring the first discrimination network and the second discrimination network, wherein the first discrimination network is a network to be trained, the second discrimination network is a network on which training has been performed, and the second discrimination network and the second generation network belong to the same generative adversarial network;
inputting the first converted image into the first discrimination network to obtain a first characteristic information set of the first converted image;
inputting the second converted image into the second discrimination network to obtain a second characteristic information set of the second converted image;
and performing iterative training on the first generation network by adopting a second loss function according to the first characteristic information set and the second characteristic information set until the similarity of the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold.
15. The apparatus according to any one of claims 12 to 14, wherein the processing unit is specifically configured to:
inputting the second converted image into the first discrimination network to obtain a discrimination value output by the first discrimination network;
and performing iterative training on the first discrimination network by adopting a fourth loss function according to the discrimination value output by the first discrimination network.
16. The apparatus according to any one of claims 12 to 14, wherein the processing unit is specifically configured to:
inputting a test image into the first discrimination network to obtain a third characteristic information set of the test image, wherein the test image is a real image whose style is similar to that of the second converted image;
Inputting the second converted image into the first discrimination network to obtain a fourth characteristic information set of the second converted image;
inputting the first converted image into the first discrimination network to obtain a first characteristic information set of the first converted image;
and performing iterative training on the first discrimination network by adopting a fifth loss function according to the third characteristic information set, the fourth characteristic information set and the first characteristic information set.
17. A server, comprising: memory, transceiver, processor, and bus system;
wherein the memory is used for storing programs;
the processor is used for executing the program in the memory, and comprises the following steps:
acquiring a first generation network and a second generation network, wherein the first generation network is a network to be trained, the second generation network is a mature network on which training has been performed, and the quantity of parameters in the first generation network is smaller than the quantity of parameters in the second generation network;
inputting an image to be converted and a test image into the first generation network to obtain a first converted image;
inputting the image to be converted and the test image into the second generation network to obtain a second converted image;
inputting the second converted image into a first discrimination network to perform iterative training on the first discrimination network by using the second converted image, wherein the first discrimination network and the first generation network belong to the same first generative adversarial network;
performing iterative training on the first generation network according to the first converted image and the second converted image until the similarity of the first converted image output by the first generation network and the second converted image output by the second generation network reaches a preset threshold, wherein the iterative training of the first generation network further comprises causing the first generation network and the first discrimination network to oppose each other, so as to implement iterative training of the first generation network and the first discrimination network;
acquiring a third generation network, wherein the third generation network is a network obtained after the iterative training is executed for the first generation network;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
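To close the loop on claim 17 (sketch only, building on the classes above): the processor iterates until the student's output is sufficiently similar to the teacher's. The claim leaves the similarity measure open, so a negative mean-L1 proxy is assumed below; the returned network plays the role of the third generation network.

    def similarity(img_a, img_b):
        # assumed proxy: higher when the two converted images agree
        return -torch.mean(torch.abs(img_a - img_b)).item()

    def train(apparatus, loader, opt_g, opt_d, threshold, max_iters=100_000):
        for i, (x, real_img) in enumerate(loader):
            apparatus.step(x, real_img, opt_g, opt_d)
            with torch.no_grad():
                fake_s, fake_t = apparatus.convert(x)
            if similarity(fake_s, fake_t) >= threshold or i >= max_iters:
                break
        return apparatus.g_s   # the third generation network (trained student)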
18. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 11.
CN201910331313.8A 2019-04-23 2019-04-23 Training method of network model and related device Active CN110222705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910331313.8A 2019-04-23 2019-04-23 Training method of network model and related device


Publications (2)

Publication Number Publication Date
CN110222705A (en) 2019-09-10
CN110222705B (en) 2023-10-24

Family

ID=67820135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910331313.8A Active CN110222705B (en) 2019-04-23 2019-04-23 Training method of network model and related device

Country Status (1)

Country Link
CN (1) CN110222705B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021102697A1 (en) * 2019-11-26 2021-06-03 驭势(上海)汽车科技有限公司 Method and system for training generative adversarial network, and electronic device and storage medium
CN111161239B (en) * 2019-12-27 2024-02-27 上海联影智能医疗科技有限公司 Medical image analysis method, device, storage medium and computer equipment
CN111553587B (en) * 2020-04-26 2023-04-18 中国电力科学研究院有限公司 New energy scene generation method and system based on confrontation learning model
CN111950608B (en) * 2020-06-12 2021-05-04 中国科学院大学 Domain self-adaptive object detection method based on contrast loss
CN112288032B (en) * 2020-11-18 2022-01-14 上海依图网络科技有限公司 Method and device for quantitative model training based on generation of confrontation network
CN113283521B (en) * 2021-06-03 2023-09-01 光大科技有限公司 Condition generation countermeasure network generation method and device
JP2023001431A (en) * 2021-06-21 2023-01-06 コニカミノルタ株式会社 Learning apparatus, learning system, learning method of machine learning model and program


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108898168A (en) * 2018-06-19 2018-11-27 清华大学 The compression method and system of convolutional neural networks model for target detection
CN109271537A (en) * 2018-08-10 2019-01-25 北京大学 A kind of text based on distillation study is to image generating method and system
CN109165738A (en) * 2018-09-19 2019-01-08 北京市商汤科技开发有限公司 Optimization method and device, electronic equipment and the storage medium of neural network model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Compressing GANs using Knowledge Distillation; Angeline Aguinaldo et al.; arXiv; 2019-02-01; see Abstract and Section 3 *
KDGAN: Knowledge Distillation with Generative Adversarial Networks; Xiaojie Wang et al.; 32nd Conference on Neural Information Processing Systems (NeurIPS 2018); 2018-12-31; pp. 1-12 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant