CN110197229B - Training method and device of image processing model and storage medium - Google Patents


Info

Publication number
CN110197229B
Authority
CN
China
Prior art keywords
image
domain
generator
group
frame image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910470449.7A
Other languages
Chinese (zh)
Other versions
CN110197229A (en)
Inventor
陈嘉伟
李悦翔
郑冶枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Healthcare Shenzhen Co Ltd
Original Assignee
Tencent Healthcare Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Healthcare Shenzhen Co Ltd filed Critical Tencent Healthcare Shenzhen Co Ltd
Priority to CN201910470449.7A
Publication of CN110197229A
Application granted
Publication of CN110197229B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/90: Determination of colour characteristics
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00: ICT specially adapted for the handling or processing of medical images
    • G16H30/40: ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing

Abstract

The invention provides a training method and apparatus for an image processing model, and a storage medium. The image processing model comprises at least a first generator and a color checker. The method comprises: performing feature fusion and image conversion on a first image group of a source domain through the first generator to obtain a second image group of a corresponding target domain carrying fusion features; processing the second image group through the color checker, determining the relative relationship of color modes between images in the second image group, and determining the accuracy of the second image group relative to a target image group of the target domain; determining a value of a loss function of the color checker based on the determined relative relationship and accuracy; and updating model parameters of the image processing model based on the value of the loss function. In this way, training of an image processing model that performs image conversion between different domains can be achieved.

Description

Training method and device of image processing model and storage medium
Technical Field
The invention relates to the technical field of machine learning, in particular to a training method and a training device for an image processing model and a storage medium.
Background
Machine Learning (ML) is a branch of artificial intelligence, and aims to make a machine learn according to a priori knowledge, so that the machine has logical capability of classification and judgment. Machine learning models represented by neural networks are continuously developed and are gradually applied to auxiliary diagnosis of rectal cancer and gastric cancer in the medical field.
In the related art, the neural network models used for auxiliary diagnosis of rectal cancer and gastric cancer are trained on endoscope video data from hospitals. Because the imaging devices and imaging parameters of different hospitals are not uniform, endoscope videos from different hospitals have different color modes (such as color distribution and illumination intensity), so a neural network model trained on the video data of one hospital is not applicable to other hospitals.
Disclosure of Invention
Embodiments of the present invention provide a method and an apparatus for training an image processing model, and a storage medium, which can implement training of an image processing model for completing image conversion between different domains.
The embodiment of the invention provides a training method of an image processing model, wherein the image processing model at least comprises the following steps: a first generator and a color checker, the method comprising:
performing feature fusion on the first image group of the source domain through the first generator to obtain fusion features, and performing image conversion based on the fusion features to obtain a second image group of the corresponding target domain carrying the fusion features;
processing the second image group through the color checker, determining the relative relation of color modes among the images in the second image group, and determining the accuracy of the second image group relative to the target image group of the target domain;
determining a value of a loss function of the color checker based on the relative relationship and the accuracy;
updating model parameters of the image processing model based on the value of the loss function.
In the foregoing solution, the image processing model further includes a second generator, and the apparatus further includes: a second conversion unit and a first training unit;
the second conversion unit is configured to perform image reconstruction on the second image group through the second generator to obtain a third image group corresponding to the source domain, and perform image conversion on a fourth image group of the target domain to obtain a fifth image group corresponding to the source domain;
the first conversion unit is further configured to perform image reconstruction on the fifth image group through the first generator to obtain a sixth image group corresponding to the target domain;
the first training unit is used for training the first generator based on the difference between the first image group and the third image group and the difference between the fifth image group and the sixth image group.
In the above solution, the first training unit is further configured to determine a value of a loss function of the first generator based on a difference between the first image group and the third image group and a difference between the fifth image group and the sixth image group;
determining a respective first error signal based on the loss function of the first generator when the value of the loss function of the first generator reaches a first threshold;
and reversely propagating the first error signal in the first generator, and updating the model parameters of each layer of the first generator in the process of propagation.
In the above solution, the first conversion unit is further configured to perform feature extraction on a source frame image and a reference frame image respectively in response to that the first image group includes the source frame image and the reference frame image, so as to obtain an image feature of the source frame image and an image feature of the reference frame image;
fusing the image characteristics of the source frame image and the image characteristics of the reference frame image to obtain fused characteristics;
and performing image conversion on the first image group based on the fusion characteristics to obtain a second image group carrying the target color mode of the corresponding target domain of the fusion characteristics.
In the above scheme, the first conversion unit is further configured to decode the fusion feature in combination with the image feature of the source frame image, so as to obtain a source frame image of a target color mode of a corresponding target domain carrying the fusion feature;
decoding the fusion features by combining the image features of the reference frame image to obtain a reference frame image of a target color mode of a corresponding target domain carrying the fusion features;
and the source frame image of the target color mode of the target domain and the reference frame image of the target color mode of the target domain form a second image group of the target domain.
In the above scheme, the reference frame image is the first frame image in the video corresponding to the source domain, and the source frame image is a frame image different from the first frame image in the video corresponding to the source domain.
In the above scheme, the color verification unit is further configured to obtain a gray level histogram of each frame of image in the second image group;
and determining the gray level histogram distance between the images in the second image group, and taking the gray level histogram distance as the relative relation of the color modes.
In the above scheme, the loss determining unit is further configured to obtain a first difference between the relative relationship and a target relative relationship, and a second difference between the accuracy and a target accuracy;
determining a value of a loss function of the color checker based on the first difference value and the second difference value.
In the foregoing solution, the parameter updating unit is further configured to determine a corresponding second error signal based on the loss function of the color checker when the value of the loss function exceeds a second threshold;
and propagating the second error signal in the image processing model in a reverse direction, and updating the model parameters of the image processing model in the process of propagation.
In the foregoing solution, the image processing model further includes a discriminator, and the apparatus further includes:
and the judging unit is used for processing the second image group through the discriminator, determining the accuracy of the pixel point level of each frame of image in the second image group, wherein the accuracy of the pixel point level indicates the matching degree of the pixel point in the image and the pixel point of the corresponding position of the corresponding image in the target image group.
The embodiment of the present invention further provides a training device for an image processing model, where the image processing model at least includes: a first generator and a color checker, the apparatus comprising:
the first conversion unit is used for performing feature fusion on the first image group of the source domain through the first generator to obtain fusion features, and performing image conversion based on the fusion features to obtain a second image group of the corresponding target domain carrying the fusion features;
the color checking unit is used for processing the second image group through the color checker, determining the relative relation of color modes among the images in the second image group and determining the accuracy of the second image group relative to the target image group of the target domain;
a loss determination unit for determining a value of a loss function of the color checker based on the relative relationship and the accuracy;
a parameter updating unit for updating the model parameters of the image processing model based on the value of the loss function.
The embodiment of the invention also provides an image processing method, which comprises the following steps:
obtaining a first image group of a source domain;
performing feature fusion on the first image group through a first generator included in an image processing model to obtain fusion features, and performing image conversion based on the fusion features to obtain a second image group of a corresponding target domain carrying the fusion features;
the image processing model is obtained by training based on the training method of the image processing model provided by the embodiment of the invention.
An embodiment of the present invention further provides an image processing apparatus, where the apparatus includes:
an acquisition unit configured to acquire a first image group of a source domain;
the processing unit is used for carrying out feature fusion on the first image group through a first generator included in an image processing model to obtain fusion features, and carrying out image conversion based on the fusion features to obtain a second image group of a corresponding target domain carrying the fusion features;
the image processing model is trained based on the training method of the image processing model provided by the embodiment of the invention.
An embodiment of the present invention further provides a training apparatus for an image processing model, where the apparatus includes:
a memory for storing executable instructions;
and the processor is used for realizing the training method of the image processing model provided by the embodiment of the invention when the executable instructions stored in the memory are executed.
The embodiment of the invention also provides a storage medium, wherein the storage medium stores executable instructions for causing a processor to execute, so that the training method of the image processing model provided by the embodiment of the invention is realized.
An embodiment of the present invention further provides an image processing apparatus, where the apparatus includes:
a memory for storing executable instructions;
and the processor is used for realizing the image processing method provided by the embodiment of the invention when executing the executable instructions stored in the memory.
The embodiment of the invention also provides a storage medium, wherein the storage medium stores executable instructions for causing a processor to execute the executable instructions so as to realize the image processing method provided by the embodiment of the invention.
The application of the embodiment of the invention has the following beneficial effects:
1) performing feature fusion and image conversion on the first image group of the source domain through a first generator to obtain a second image group of a corresponding target domain carrying fusion features; therefore, the conversion of the image from the source domain to the target domain is realized, namely, the unification of the images in different domains is realized, and further, the neural network model obtained by training the image data based on the target domain can be applied to the image data of the source domain;
meanwhile, the second image group obtained by conversion carries the fusion characteristic, so that the image content obtained by domain conversion is ensured not to be distorted and changed, and the usability of the converted image data is enhanced.
2) Determining the value of a loss function of the color checker based on the relative relationship of the color modes between the images in the second image group and the accuracy of the second image group relative to a target image group of a target domain, and updating the model parameters of the image processing model based on the value of the loss function; therefore, the value of the loss function of the color checker is determined based on the relative relationship of the color modes among the images in the second image group, so that the relative consistency of the color modes among the images in the second image group obtained by conversion is ensured, meanwhile, the model parameters of the image processing model are updated through the value of the loss function of the color checker, the constraint of the value of the loss function of the color checker on the first generator is realized, and the accuracy of the second image group of the target domain obtained by conversion of the first generator is higher.
Drawings
FIG. 1 is a schematic diagram of a CycleGAN model provided in the related art;
fig. 2 is a schematic diagram of AugGAN provided in the related art;
FIG. 3 is a schematic diagram of an architecture of a training system for an image processing model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a structure of a training apparatus for an image processing model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a structure of an image processing model according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating a method for training an image processing model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an X-shape generator according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a residual block according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a dense fusion module according to an embodiment of the present invention;
FIG. 10 is a schematic diagram illustrating the operation of a color checker according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a model of an arbiter provided in an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of an image processing model according to an embodiment of the present invention;
FIG. 13 is a flowchart illustrating a method for training an image processing model according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of a training data set provided by an embodiment of the present invention;
FIG. 15 is a schematic diagram of an application scenario of an image processing model according to an embodiment of the present invention;
fig. 16 is a schematic structural diagram of a training apparatus for an image processing model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail with reference to the accompanying drawings, the described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first", "second", etc. are merely used to distinguish similar objects and do not denote a particular order or sequence; it is to be understood that "first", "second", etc. may be interchanged in a specific order or sequence where appropriate, so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before the embodiments of the present invention are described in further detail, the terms and expressions referred to in the embodiments of the present invention are explained; the terms and expressions referred to in the embodiments of the present invention are subject to the following interpretations.
1) The domain corresponds to an image data set, the image data set in the same domain is acquired by the same image acquisition device/imaging device, correspondingly, the source domain is a first image data set corresponding to the first video data, the target domain is a second image data set corresponding to the second video data, and the images in different domains have different color modes, such as different color distribution and different illumination intensity;
for example, the image or image group of the source domain is an image dataset corresponding to an endoscopic video acquired by a medical imaging device of a first hospital, and the image or image group of the target domain is an image dataset corresponding to an endoscopic video acquired by a medical imaging device of a second hospital.
2) The gray histogram is a statistic of gray level distribution in an image, and the occurrence frequency of all pixels in a digital image is counted according to the gray level, which represents the number of pixels with a certain gray level in the image and reflects the occurrence frequency of the certain gray level in the image.
3) A color mode corresponds to a digital representation of the visual colors of an image, including the brightness, color distribution, color temperature, etc. of the image; the color distribution can be represented by a color histogram, etc.; different color modes correspond to different brightness and/or color distributions, e.g., the first image is reddish relative to the second image, and the second image is brighter relative to the first image.
4) "In response to" indicates the condition or state on which a performed operation depends; when the dependent condition or state is satisfied, the one or more operations performed may be executed in real time or with a set delay; unless otherwise specified, there is no restriction on the order in which the operations are performed.
In order to realize conversion from source-domain images to target-domain images, the related art provides a Cycle-Consistent Generative Adversarial Network (CycleGAN) model. Fig. 1 is a schematic diagram of the CycleGAN model provided by the related art. Referring to Fig. 1, CycleGAN is essentially two mirror-symmetric GANs that form a ring network; the two GANs share two generators and each is provided with its own discriminator, i.e. there are two discriminators and two generators in total. The network structure of the CycleGAN model removes the need to establish a one-to-one pairing mapping between the training data of the source domain and the target domain, realizing unpaired image-to-image translation. A one-way GAN is shown in part (a) of Fig. 1, where X is an image sample of the source domain, Y is an image sample of the target domain, D_X is the discriminator of the source domain, G is the generator of the source domain, D_Y is the discriminator of the target domain, and F is the generator of the target domain. The generator G of the source domain and the discriminator D_Y of the target domain are adversaries: G tries to produce samples from the desired distribution, while D_Y tries to distinguish the original image from the generated image. Meanwhile, CycleGAN also introduces a cycle-consistency constraint, namely that a real source-domain image sample X_real follows X_real → Y_fake → X_fake, with the aim that an input source-domain image X_real can be mapped to the desired target domain Y_fake.
This method can realize image style transformation from source-domain images to target-domain images. However, although CycleGAN introduces the cycle-consistency constraint, it is ambiguous with respect to geometric transformations of the image: the image content may be distorted after conversion from the source domain to the target domain, yet CycleGAN can still restore the distorted image to the original source-domain image according to the learned mapping. Since the image content of the training data is not constrained, such geometric transformation does not affect the realism of the generated image, so the discriminator can easily be fooled; for domain adaptation, however, distorted image content reduces the usability of the converted data and degrades the performance of the transferred model.
For domain adaptation of video data, the distribution of video data within the same domain contains several color modes, with diversity in color distribution and brightness, so keeping the color distribution of images within the same video consistent is a necessary condition. Because CycleGAN only provides image-to-image conversion, performing domain-adaptive conversion on a video frame by frame (two or more frames) makes it difficult to guarantee that every frame is mapped to the same color mode; that is, images in multiple color modes are likely to appear in a multi-frame sequence after domain conversion.
The related art also provides a GAN-based data augmentation network (AugGAN). Fig. 2 is a schematic diagram of the principle of AugGAN provided in the related art. Referring to Fig. 2, AugGAN introduces sub-modules on the basis of CycleGAN: by adding a segmentation task (the P_x network and P_y network in the figure) to extract structure-aware information, it ensures that generated samples can be successfully converted to the target domain while the consistency of the image content is well maintained, thereby achieving data augmentation based on domain adaptation.
With continued reference to FIG. 2, X and Y each represent a different domain, e.g., X represents the source domain and Y represents the target domain. The image data of the source domain is processed by E_x through down-sampling and residual blocks to realize feature dimension reduction, obtaining a feature domain Z1; similarly, the image data of the target domain is processed by E_y through down-sampling and residual blocks to obtain a feature domain Z2. Each of the two feature domains is connected to two decoders. Taking Z1 as an example: on the one hand, Z1 is processed by the decoder G_x through residual blocks and up-sampling to obtain the image conversion result, i.e. the conversion of the image data from the source domain to the target domain (data disguise); the converted image is then processed by E_y and G_y to realize image reconstruction, obtaining reconstructed image data X_rec corresponding to the source domain. On the other hand, Z1 is decoded by P_x through residual block processing and up-sampling to obtain a decoded mask, which makes the structural information of the image explicit during conversion, i.e. it indicates what the different positions in the image (objects, backgrounds, etc.) are.
On the one hand, applying the AugGAN model to image conversion can avoid distortion of the converted image content; however, the AugGAN model depends on segmentation labels, i.e. the training data must include corresponding pixel-level annotation information, so the cost of sourcing data is high and practical application is difficult to scale. On the other hand, when the AugGAN model is used for image conversion, it still suffers, like the CycleGAN model, from the problem that images in multiple color modes exist among the converted images.
The image processing model provided by the embodiment of the invention, once trained, can realize the conversion of images from the source domain to the target domain, i.e. unify images of different domains, so that a neural network model trained on image data of the target domain can be applied to image data of the source domain; moreover, the converted image content is neither distorted nor changed, and the relative consistency of color modes among the images of the source domain is maintained.
First, a training system of an image processing model according to an embodiment of the present invention is described, fig. 3 is a schematic structural diagram of the training system of the image processing model according to an embodiment of the present invention, and referring to fig. 3, in order to support an exemplary application, the training system 100 of the image processing model includes a terminal (including a terminal 40-1 and a terminal 40-2) and a server 200, the terminal is connected to the server 200 through a network 300, the network 300 may be a wide area network or a local area network, or a combination of the two, and data transmission is implemented using a wireless link.
A terminal (terminal 40-1 and/or terminal 40-2) for transmitting an image set corresponding to a video of a source domain to the server 200;
the server 200 is configured to perform feature fusion and image conversion on the first image group of the source domain through a first generator included in the image processing model to obtain a second image group of a corresponding target domain carrying fusion features;
processing the second image group through a color checker included in the image processing model, determining a relative relationship of color patterns among the images in the second image group, and determining accuracy of the second image group relative to a target image group of a target domain;
determining a value of a loss function of the color checker based on the relative relationship and the accuracy, and updating a model parameter of the image processing model based on the value of the loss function;
the terminal (terminal 40-1 and/or terminal 40-2) is further configured to send an image conversion request to the server 200, where the image conversion request carries an image set to be converted corresponding to a video of the source domain;
and the server 200 is configured to perform image conversion on the image set to be converted by using the first generator in the image processing model obtained through training, so as to obtain an image set corresponding to the target domain.
In some embodiments, the terminal is provided with an image processing client, the image processing client sends the image of the source domain to the server, and the server performs domain conversion on the image of the source domain sent by the image processing client by using a first generator in the trained image processing model to obtain an image corresponding to the target domain, and returns the image to the image processing client.
Next, an image processing model training apparatus and an image processing apparatus according to an embodiment of the present invention will be described. The training apparatus of the image processing model and the image processing apparatus of the embodiment of the present invention can be implemented in various forms, such as: the method is implemented independently by terminals such as a smart phone, a tablet computer and a desktop computer, or implemented cooperatively by the terminals and a server. The training apparatus for image processing models and the image processing apparatus provided in the embodiments of the present invention may be implemented as hardware or a combination of hardware and software, and various exemplary implementations of the apparatus provided in the embodiments of the present invention will be described below by taking the training apparatus for image processing models in the embodiments of the present invention as an example.
The hardware structure of the training apparatus for an image processing model according to the embodiment of the present invention is described in detail below, and fig. 4 is a schematic structural diagram of the training apparatus for an image processing model according to the embodiment of the present invention, it should be understood that fig. 4 only shows an exemplary structure of the training apparatus for an image processing model, and not all structures, and a part of the structures or all structures shown in fig. 4 may be implemented as needed.
The training device of the image processing model provided by the embodiment of the invention comprises: at least one processor 401, memory 402, a user interface 403, and at least one network interface 404. The various components of the training apparatus of the image processing model are coupled together by a bus system 405. It will be appreciated that the bus system 405 is used to enable communications among the components. The bus system 405 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 405 in fig. 4.
The user interface 403 may include, among other things, a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad, or a touch screen.
It will be appreciated that the memory 402 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. Among them, the nonvolatile Memory may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), a Flash Memory (Flash Memory), and the like. Volatile Memory can be Random Access Memory (RAM), which acts as external cache Memory. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM). The memory 402 described in connection with embodiments of the invention is intended to comprise these and any other suitable types of memory.
The memory 402 in embodiments of the present invention is capable of storing data to support operation of the terminal (e.g., 40-1). Examples of such data include: any computer program, such as an operating system and application programs, for operating on a terminal (e.g., 40-1). The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application program may include various application programs.
As an example of the image processing model training apparatus implemented by combining software and hardware, the image processing model training apparatus provided by the embodiment of the present invention may be directly embodied as a combination of software modules executed by the processor 401, where the software modules may be located in a storage medium, the storage medium is located in the memory 402, the processor 401 reads executable instructions included in the software modules in the memory 402, and the image processing model training method provided by the embodiment of the present invention is completed in combination with necessary hardware (for example, including the processor 401 and other components connected to the bus 405).
By way of example, the Processor 401 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor or the like.
As an example of a hardware implementation of the training apparatus for the image processing model provided in the embodiment of the present invention, the apparatus provided in the embodiment of the present invention may be implemented directly by the processor 401 in the form of a hardware decoding processor; for example, the training method of the image processing model provided by the embodiment of the present invention may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
The memory 402 in embodiments of the present invention is used to store various types of data to support the operation of the training apparatus 40 of the image processing model. Examples of such data include: any executable instructions for operating on the training apparatus 40 of the image processing model, such as executable instructions, may be included in the executable instructions, and the program implementing the training method of the image processing model according to the embodiment of the present invention may be included in the executable instructions.
Before explaining a training method of an image processing model provided by an embodiment of the present invention, a structure of the image processing model provided by the embodiment of the present invention is explained, fig. 5 is a schematic structural diagram of the image processing model provided by the embodiment of the present invention, and referring to fig. 5, the image processing model provided by the embodiment of the present invention includes:
a first generator 51, configured to perform feature fusion and image conversion on an input first image group of a source domain to obtain a second image group of a corresponding target domain carrying a fusion feature;
a first color checker 52 for processing the second image group output from the first generator 51, determining a relative relationship of color patterns between images in the second image group, and determining an accuracy of the second image group with respect to a target image group of a target domain; the relative relationship between the color modes of the images can be the gray histogram distance.
In some embodiments, the image processing model may further include:
a second generator 53, configured to perform image reconstruction on the second image group output by the first generator 51, so as to obtain a third image group corresponding to the source domain; and the image conversion module is used for carrying out image conversion on the fourth image group of the target domain to obtain a fifth image group corresponding to the source domain.
In some embodiments, the image processing model may further include:
the first discriminator 54 is configured to process the second image group output by the first generator 51, and determine the accuracy of the pixel point level of each frame of image in the second image group, where the accuracy of the pixel point level indicates the matching degree between the pixel point in the image and the pixel point at the corresponding position of the corresponding image in the target image group.
The first color checker 52 and the first discriminator 54 included in the image processing model, and the first generator 51 and the second generator 53 can perform a competing training, and the first generator 51 in the trained image processing model can be used to perform the conversion from the image of the source domain to the target domain.
In some embodiments, to support the conversion of images from the target domain to the source domain, the image processing model may further include:
a third generator model 55, configured to perform feature fusion and image conversion on the input sixth image group of the target domain to obtain a seventh image group of the corresponding source domain carrying the fusion features;
a second color checker 56 for processing the seventh image group output by the third generator model, determining the relative relationship of the color patterns between the images in the seventh image group, and determining the accuracy of the seventh image group relative to the target image group of the source domain; wherein, the relative relationship of the color modes between the images can be the distance of the gray level histogram;
the fourth generator model 57 is configured to perform image reconstruction on the seventh image group output by the third generator model to obtain an eighth image group corresponding to the target domain; and the image conversion module is used for carrying out image conversion on the ninth image group of the source domain to obtain a tenth image group corresponding to the target domain.
The second discriminator 58 is configured to process the seventh image group output by the third generator model 55, determine the accuracy of the pixel point level of each frame of image in the seventh image group, where the accuracy of the pixel point level indicates the matching degree between the pixel point in the image and the pixel point at the corresponding position of the corresponding image in the target image group.
The second color checker 56 and the second discriminator 58, which are included in the image processing model, may implement a confrontation training with the third generator model 55 and the fourth generator model 57, and further may implement the conversion from the image of the target domain to the source domain by using the third generator model 55 in the trained image processing model.
Based on the above description of the structure of the image processing model, a method for training an image processing model according to an embodiment of the present invention will be described below. Fig. 6 is a flowchart illustrating a method for training an image processing model according to an embodiment of the present invention, where in some embodiments, the method may be implemented by a server or a terminal, or implemented by a server and a terminal in a cooperative manner, and taking the server as an example, referring to fig. 6, the method for training an image processing model according to an embodiment of the present invention includes:
step 601: and the server performs feature fusion and image conversion on the first image group of the source domain through a first generator included in the image processing model to obtain a second image group of the corresponding target domain carrying the fusion features.
In some embodiments, before the server trains the image processing model, i.e. before performing step 601, the training data (the image dataset of the source domain) may be preprocessed. In some embodiments, the server may process each source-domain image as follows: resize each frame of image, e.g., to 286 x 286; then normalize it, e.g., to between -1 and 1; and then randomly crop it (e.g., to 256 x 256) and/or randomly flip it upside down for data enhancement, as in the sketch below.
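For illustration only, a minimal preprocessing sketch along these lines, written with torchvision transforms, could look as follows; the exact transform choices and parameter values are assumptions based on this paragraph, not part of the patent text.

```python
import torchvision.transforms as T

# Hypothetical preprocessing pipeline for source-domain frames (assumed values:
# resize to 286 x 286, random 256 x 256 crop, random vertical flip, normalize to [-1, 1]).
preprocess = T.Compose([
    T.Resize((286, 286)),
    T.RandomCrop(256),
    T.RandomVerticalFlip(p=0.5),                              # "randomly flipped upside down"
    T.ToTensor(),                                             # scales pixel values to [0, 1]
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),   # maps [0, 1] to [-1, 1]
])
```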
In some embodiments, the first image group includes a source frame image and a reference frame image, in practical applications, the reference frame image may be a first frame image in the video of the source domain, that is, a first frame image of the video, and the source frame image is any frame image in the video of the source domain except the first frame image.
In some embodiments, the server may implement the conversion of the first image into the second image group by: respectively extracting the features of the source frame image and the reference frame image to obtain the image features of the source frame image and the image features of the reference frame image; fusing the image characteristics of the source frame image and the image characteristics of the reference frame image to obtain fused characteristics; and based on the fusion characteristics, performing image conversion on the first image group to obtain a second image group carrying the fusion characteristics and corresponding to the target color mode of the target domain.
In some embodiments, the server may obtain the second set of images carrying the target color patterns of the corresponding target gamut of the fused feature by: decoding the fusion characteristics by combining the image characteristics of the source frame image to obtain a source frame image of a target color mode of a corresponding target domain carrying the fusion characteristics; decoding the fusion characteristics by combining the image characteristics of the reference frame image to obtain a reference frame image of a target color mode of a corresponding target domain carrying the fusion characteristics; the source frame image of the target color mode of the target domain and the reference frame image of the target color mode of the target domain constitute a second image group of the target domain.
In practical implementation, the first generator may be implemented as an X-shape Generator. Fig. 7 is a schematic structural diagram of the X-shape generator according to the embodiment of the present invention. As shown in fig. 7, the X-shape generator according to the embodiment of the present invention includes: a convolution layer (Convolution), a max pooling layer (Max Pooling), residual blocks, a Dense Fusion Block, and up-sampling modules (UM). For the convolution layer, the convolution kernel size is 7 × 7, the number of channels is 64, and the down-sampling factor is 2; for the max pooling layer, the pooling kernel size is 2 × 2; for the four residual blocks, the dimensions of the output data are 64 for the first residual block, 128 for the second, 256 for the third, and 512 for the fourth.
Still referring to fig. 7, the convolution layer, the max pooling layer, and the residual blocks constitute the encoding portion of the X-shape generator, and the up-sampling modules and the convolution layer constitute the decoding portion. The image features of the source frame image and of the reference frame image extracted by the encoding portion are input into the dense fusion module, which fuses them to obtain the fusion feature; the up-sampling modules then up-sample the features of the source frame image and of the reference frame image respectively, and decode the fusion feature in combination with the respective up-sampled image features, obtaining the reference frame image and the source frame image in the target color mode of the corresponding target domain, carrying the fusion feature. A sketch of this forward flow is given below.
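The following is a minimal sketch of this encode-fuse-decode flow, assuming hypothetical encoder, fusion, and decoder sub-modules; the class name, module interfaces, and feature shapes are illustrative stand-ins based on the description above, not the patent's own code.

```python
import torch
import torch.nn as nn

class XShapeGenerator(nn.Module):
    """Hypothetical sketch of the X-shape generator flow: encode both frames,
    fuse their features, then decode each frame from the fused feature together
    with its own (up-sampled) features."""

    def __init__(self, encoder: nn.Module, fusion: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder   # conv 7x7 + max pooling + residual blocks
        self.fusion = fusion     # dense fusion module
        self.decoder = decoder   # up-sampling modules + convolution layer

    def forward(self, source_frame: torch.Tensor, reference_frame: torch.Tensor):
        feat_src = self.encoder(source_frame)      # e.g. B x 512 x 8 x 8
        feat_ref = self.encoder(reference_frame)   # e.g. B x 512 x 8 x 8
        fused = self.fusion(feat_src, feat_ref)    # fusion feature, B x 512 x 8 x 8
        # Decode the fused feature together with each frame's own features so
        # that per-frame image information is not lost.
        converted_src = self.decoder(fused, feat_src)
        converted_ref = self.decoder(fused, feat_ref)
        return converted_src, converted_ref
```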
With the first generator of the embodiment of the invention, the image group input to the first generator is composed of the first frame image of the source-domain video and any frame image other than the first frame, which constrains the remaining video frames to be generated in a consistent direction. The feature fusion process can be regarded as the two images mutually adding a watermark to each other, which solves the problem of image deformation and distortion caused by the inherent characteristics of cyclic reconstruction: once either image is distorted, the latent watermark is damaged and cycle consistency cannot be maintained. In addition, decoding the fusion feature in combination with the up-sampled image features avoids loss of image information.
Next, the residual block in the first generator is described. Fig. 8 is a schematic structural diagram of the residual block provided in the embodiment of the present invention. Referring to fig. 8, the residual block provided in the embodiment of the present invention includes three convolution layers. It can be seen from fig. 8 that, for one residual block, the data flow is divided into two branches: one branch goes through the three convolution layers, and the other branch adds the input directly to the output of the convolution layers, which avoids the loss of image feature information and facilitates gradient computation during model training. A sketch of such a block is given below.
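A minimal sketch of such a residual block follows; the kernel sizes, activations, and the 1x1 shortcut projection are assumptions not stated in the text.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Three convolution layers on one branch; the other branch adds the input
    directly to the convolution output (shortcut connection)."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
        )
        # Project the shortcut with a 1x1 convolution when the channel count changes.
        self.shortcut = (nn.Identity() if in_channels == out_channels
                         else nn.Conv2d(in_channels, out_channels, kernel_size=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + self.shortcut(x)
```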
Next, the dense fusion module in the first generator is described. Fig. 9 is a schematic structural diagram of the dense fusion module provided in the embodiment of the present invention. Referring to fig. 9, the dense fusion module provided in the embodiment of the present invention includes an average pooling layer, a splicing layer, a convolution layer, and an adjusting layer, where B is the batch size. The two B × 512 × 8 × 8 feature maps output by the encoding portion of the first generator become B × 512 × 1 × 1, i.e. B × 512, after the average pooling layer; the splicing layer concatenates the two along the feature dimension to obtain B × 1024; the convolution layer then produces B × 32768; and finally the adjusting layer reshapes this to B × 512 × 8 × 8. In this way the content information of the two frames of images is fused, which is equivalent to each image adding a watermark to the other. A sketch following these shapes is given below.
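The sketch below follows the shapes just described; using a fully connected layer in place of the "convolution layer" over the B x 1024 vector is an assumption made for brevity.

```python
import torch
import torch.nn as nn

class DenseFusionModule(nn.Module):
    """Average-pool two B x 512 x 8 x 8 feature maps to B x 512 each, concatenate
    to B x 1024, expand to B x 32768, and reshape back to B x 512 x 8 x 8."""

    def __init__(self, channels: int = 512, spatial: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                            # -> B x 512 x 1 x 1
        self.expand = nn.Linear(2 * channels, channels * spatial * spatial)
        self.channels, self.spatial = channels, spatial

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        a = self.pool(feat_a).flatten(1)           # B x 512
        b = self.pool(feat_b).flatten(1)           # B x 512
        fused = torch.cat([a, b], dim=1)           # B x 1024
        fused = self.expand(fused)                 # B x 32768
        return fused.view(-1, self.channels, self.spatial, self.spatial)  # B x 512 x 8 x 8
```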
In some embodiments, the image processing model further includes a second generator, and the second image group is subjected to image reconstruction by the second generator to obtain a third image group corresponding to the source domain, and the fourth image group of the target domain is subjected to image conversion to obtain a fifth image group corresponding to the source domain; carrying out image reconstruction on the fifth image group through a first generator to obtain a sixth image group corresponding to the target domain; the first generator is trained based on the difference between the first image set and the third image set and the difference between the fifth image set and the sixth image set.
Here, in practical applications, taking the source domain as the A domain and the target domain as the B domain as an example, the loss function corresponding to the first generator and the second generator may be the following cyclic reconstruction loss function:

L_cyc = ||G_BA(G_AB(x_A)) - x_A||_1 + ||G_AB(G_BA(x_B)) - x_B||_1    (1)

where G_AB and G_BA are both generators: G_AB is responsible for converting a real A-domain image into the B domain, and G_BA converts a real B-domain image into the A domain; x_A is a real A-domain image and x_B is a real B-domain image; G_AB(x_A) is the fake B-domain image obtained by converting the real A-domain image through G_AB; G_BA(G_AB(x_A)) is the fake A-domain image generated from that fake B-domain image through G_BA, i.e. the fake A-domain image obtained by cyclic reconstruction; ||·||_1 denotes the L_1 norm. The term ||G_BA(G_AB(x_A)) - x_A||_1 aims to make the fake A-domain image obtained by cyclic reconstruction and the real A-domain image close to the same mode, that is, their similarity is greater than a preset threshold; in some embodiments, the histogram distance between the two frame images is smaller than a preset threshold. The meaning of each part of ||G_AB(G_BA(x_B)) - x_B||_1 in formula (1) can be understood in the same way.
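As an illustration only, a minimal sketch of this cyclic reconstruction loss could look as follows; the function and argument names are hypothetical, and the mean absolute error is used as the L_1 term.

```python
import torch

def cycle_reconstruction_loss(x_a, x_b, g_ab, g_ba):
    """Sketch of the cyclic reconstruction loss (1), assuming g_ab and g_ba are
    callables mapping images from domain A to B and from B to A respectively."""
    # ||G_BA(G_AB(x_A)) - x_A||_1, computed as the mean absolute error
    loss_a = torch.mean(torch.abs(g_ba(g_ab(x_a)) - x_a))
    # ||G_AB(G_BA(x_B)) - x_B||_1
    loss_b = torch.mean(torch.abs(g_ab(g_ba(x_b)) - x_b))
    return loss_a + loss_b
```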
In some embodiments, the server may implement training of the first generator by:
determining a value of a loss function of the first generator based on a difference between the first image group and the third image group and a difference between the fifth image group and the sixth image group; determining a corresponding first error signal based on the loss function of the first generator when the value of the loss function of the first generator reaches a first threshold; the first error signal is propagated back in the first generator and model parameters of the layers of the first generator are updated during the propagation.
To describe back propagation: training sample data is input to the input layer of the neural network model, passes through the hidden layers, and finally reaches the output layer, which outputs the result; this is the forward propagation process of the neural network model. Because there is an error between the output result of the model and the actual result, the error between the output result and the actual value is calculated and propagated backwards from the output layer toward the hidden layers until it reaches the input layer; during back propagation, the values of the model parameters are adjusted according to the error. This process is iterated until convergence.
Taking the loss function (1) as an example, the server determines the first error signal based on the loss function; the first error signal is back-propagated layer by layer from the output layer of the first generator or the second generator; when the first error signal reaches a layer, the gradient (i.e. the partial derivative of the loss function with respect to that layer's parameters) is computed from the propagated error signal, and the layer's parameters are updated with the corresponding gradient values, as in the sketch below.
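A minimal sketch of one such update step is given below; the optimizer choice is an assumption, since the text only specifies that the error signal is back-propagated and the layer parameters are updated with the resulting gradients.

```python
import torch

# Assumed optimizer; the text does not name one.
# optimizer = torch.optim.Adam(first_generator.parameters(), lr=2e-4)

def train_step(optimizer: torch.optim.Optimizer, loss_value: torch.Tensor) -> float:
    """Back-propagate the error signal and update the layer parameters."""
    optimizer.zero_grad()
    loss_value.backward()   # propagate the error signal layer by layer
    optimizer.step()        # update each layer's parameters with its gradient
    return loss_value.item()
```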
Step 602: processing the second image group by a color checker included in the image processing model, determining a relative relationship of color patterns between images in the second image group, and determining an accuracy of the second image group with respect to a target image group of the target domain.
Step 603: determining a value of a loss function of the color checker based on the relative relationship and the accuracy.
In some embodiments, a Color checker (Color Validator) may determine the relative relationship of Color patterns between images in the second image set by: respectively acquiring a gray level histogram of each frame of image in the second image group; and determining the distance of the gray level histogram between the images in the second image group, and taking the determined distance of the gray level histogram as the relative relation of the color modes.
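As a sketch only, the gray-level histogram and the histogram distance could be computed as below; the choice of L_1 distance between normalized histograms is an assumption, since the text only states that a gray histogram distance is used.

```python
import torch

def gray_histogram(image: torch.Tensor, bins: int = 256) -> torch.Tensor:
    """Normalized gray-level histogram of an image tensor with values in [0, 1]."""
    gray = image.mean(dim=0) if image.dim() == 3 else image   # average the color channels
    hist = torch.histc(gray, bins=bins, min=0.0, max=1.0)
    return hist / hist.sum()

def gray_histogram_distance(img_1: torch.Tensor, img_2: torch.Tensor) -> torch.Tensor:
    """Gray histogram distance between two frames of the second image group."""
    return torch.sum(torch.abs(gray_histogram(img_1) - gray_histogram(img_2)))
```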
Fig. 10 is a schematic diagram of the operating principle of the color checker according to an embodiment of the present invention. Referring to fig. 10, in an actual implementation, after the first generator outputs the converted reference frame image and source frame image corresponding to the target domain, the converted reference frame image and source frame image are input to the color checker in a stacked or spliced manner; meanwhile, the real reference frame image and source frame image of the B domain corresponding to them (i.e. the target image group) are also input to the color checker. On the one hand, the color checker calculates the gray-level histogram distance between the converted reference frame image and source frame image; on the other hand, acting as a discriminator, the color checker calculates, based on the real reference frame image and source frame image (i.e. the target image group), the accuracy of the second image group relative to the target image group of the target domain, that is, the trueness of the converted reference frame image and source frame image relative to the real B-domain reference frame image and source frame image. It should be noted here that, since the reference frame image and the source frame image are stacked or spliced before being input to the color checker, they are judged as a whole when determining authenticity.
In practical implementation, the color checker is trained based on its loss function. In some embodiments, the server may determine the value of the loss function of the color checker by: acquiring a first difference value between the relative relationship and the target relative relationship, and a second difference value between the accuracy and the target accuracy; and determining the value of the loss function of the color checker based on the first difference value and the second difference value. In this way, the first generator is constrained both by the relative relationship of the color modes between the images in the first image group and by the accuracy of the converted image group of the target domain, so that the images converted by the first generator still keep the color-mode change trend of the images in the source domain. For example, if the reference frame in the source domain is brighter than the source frame, the converted reference frame is still brighter than the converted source frame; that is, the gray-histogram distance between the reference frame and the source frame is kept as unchanged as possible.
Illustratively, the server obtains a first difference value between the first gray-histogram distance (between the converted reference frame image and the converted source frame image) and the second gray-histogram distance (between the reference frame image and the source frame image of the real source domain), and a second difference value between the accuracy of the converted reference frame image and source frame image (e.g., 0.4) and the accuracy of the real target-domain reference frame image and source frame image (e.g., 1); the value of the loss function of the color checker is then determined based on the first difference value and the second difference value.
Here, the loss function of the color checker is explained. In practical applications, the loss function of the color checker consists of two terms: a color histogram loss function (color histogram loss) and an intra-video loss function (intra-video loss); that is, the loss of the color checker is: Loss = color histogram loss + intra-video loss.
First, the color histogram loss function is described. In actual implementation, the color histogram loss function may be as follows:

$$L_{ch} = \left\| \mathrm{hist}_{rcd}\big(G_{BA}(x_B^{ref}),\, G_{BA}(x_B^{src})\big) - \mathrm{hist}_{rcd}\big(x_B^{ref},\, x_B^{src}\big) \right\|_1 \quad (2)$$

wherein $G_{BA}$ is as previously described; $x_B^{src}$ is the real source frame image in the B domain and $x_B^{ref}$ is the real reference frame image in the B domain; $G_{BA}(x_B^{ref})$ is the false A-domain reference frame image obtained by converting the real B-domain reference frame image through $G_{BA}$, and $G_{BA}(x_B^{src})$ is the false A-domain source frame image obtained by converting the real B-domain source frame image through $G_{BA}$; $\mathrm{hist}_{rcd}$ (relative color distribution) computes the gray-level histogram distance of two images, so the first term is the gray-histogram distance between the two converted A-domain images and the second term is the gray-histogram distance between the two real B-domain images; $\|\cdot\|_1$ denotes the $L_1$ norm. The aim of this loss is that the gray-histogram distance between the converted images remains consistent with the gray-histogram distance between the real images, i.e., the change trend of the color mode is kept unchanged.
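The following Python sketch illustrates one possible reading of formula (2); the helper names, bin count and the hard histogram are assumptions of this sketch. Note that a hard histogram (torch.histc) is not differentiable, so a practical implementation would need a soft/differentiable histogram for this loss to back-propagate into the generator.

```python
import torch

def _gray_hist(img: torch.Tensor, bins: int = 256) -> torch.Tensor:
    # Normalized gray-level histogram of a (C, H, W) image with values in [-1, 1].
    hist = torch.histc(img.mean(dim=0), bins=bins, min=-1.0, max=1.0)
    return hist / hist.sum()

def color_histogram_loss(ref_false, src_false, ref_real, src_real) -> torch.Tensor:
    """One possible reading of formula (2): the gray-histogram distance between
    the two converted frames should match the distance between the two real frames."""
    d_false = (_gray_hist(ref_false) - _gray_hist(src_false)).abs().sum()
    d_real = (_gray_hist(ref_real) - _gray_hist(src_real)).abs().sum()
    return (d_false - d_real).abs()
```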
Next, the intra-video loss function of the color checker is described. In actual implementation, the intra-video loss function may be as follows:

$$L_{iv} = \mathbb{E}_{x_A}\big[\log D_A(x_A)\big] + \mathbb{E}_{x_B}\big[\log\big(1 - D_A(G_{BA}(x_B))\big)\big] \quad (3)$$

wherein $D_A$ serves as a discriminator and is responsible for judging the authenticity of A-domain images; $x_A$ is a real A-domain image and $x_B$ is a real B-domain image; $G_{BA}(x_B)$ is the false A-domain image generated from the real B-domain image through $G_{BA}$. The second expectation term handles the false A-domain image, i.e., the closer the judgment result of $D_A(G_{BA}(x_B))$ is to 0, the better, meaning the color checker can easily judge whether a sample is real or false.
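As an illustration of formula (3) only, the following sketch expresses the adversarial objective in a numerically stable binary cross-entropy form rather than the literal log terms; the validator interface and score shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def intra_video_loss(validator, real_pair, false_pair) -> torch.Tensor:
    """Adversarial objective of formula (3) in binary cross-entropy form:
    the color checker should score the real stacked pair as 1 and the
    converted (false) stacked pair as 0. `validator` is any module that
    maps a channel-stacked frame pair to real/false logits."""
    logits_real = validator(real_pair)
    logits_false = validator(false_pair)
    return (F.binary_cross_entropy_with_logits(logits_real, torch.ones_like(logits_real))
            + F.binary_cross_entropy_with_logits(logits_false, torch.zeros_like(logits_false)))
```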
Here, it should be noted that, since the color checker judges the reference frame image and the source frame image as a whole, in practical applications its input may be regarded as an image of size 6 × 256 × 256 (two 3-channel 256 × 256 frames stacked along the channel dimension).
Step 604: based on the value of the loss function, model parameters of the image processing model are updated.
In some embodiments, the server may update the model parameters of the image processing model by:
determining a corresponding second error signal based on the loss function of the color checker when the value of the loss function exceeds a second threshold; the second error signal is propagated back in the image processing model, and model parameters of the image processing model are updated in the process of propagation. In this way, the constraint and adjustment of the model parameters of the first generator based on the loss of the color checker is achieved.
In some embodiments, the image processing model further includes a discriminator. Each frame image in the second image group output by the first generator is input to the discriminator, which processes each frame image in the second image group and determines its pixel-level accuracy; the pixel-level accuracy indicates the matching degree between a pixel in the image and the pixel at the corresponding position of the corresponding image in the target image group.
Fig. 11 is a schematic model diagram of the discriminator according to an embodiment of the present invention. Referring to fig. 11, in actual implementation the discriminator may use an encoder-decoder fully convolutional neural network: each frame image converted by the first generator is judged for authenticity through this network, and the output judgment result has the same size as the input image, i.e., every pixel of the frame is judged to be real or false, so as to constrain the generation effect of the generator.
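For illustration, a minimal encoder-decoder fully convolutional discriminator of this kind could look as follows; the layer widths and kernel sizes are assumptions, and the only property taken from the description is that the output score map has the same spatial size as the input frame.

```python
import torch
import torch.nn as nn

class PixelDiscriminator(nn.Module):
    """Illustrative encoder-decoder fully convolutional discriminator: maps a
    3 x 256 x 256 frame to a 1 x 256 x 256 real/false score map, so every
    pixel of the converted frame is judged individually."""
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

# Example: per-pixel authenticity map for one converted frame.
# scores = PixelDiscriminator()(torch.randn(1, 3, 256, 256))  # -> (1, 1, 256, 256)
```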
In actual implementation, the server trains the discriminator based on its loss function, which is described next. In practical applications, the loss function of the discriminator may be an adversarial loss function (adversarial loss) as follows:
$$L_{adv} = \mathbb{E}_{x_A}\big[\log D_A(x_A)\big] + \mathbb{E}_{x_B}\big[\log\big(1 - D_A(G_{BA}(x_B))\big)\big]$$

wherein $D_A$ serves as the discriminator and is responsible for judging the authenticity of A-domain images; $x_A$ is a real A-domain image and $x_B$ is a real B-domain image; $G_{BA}(x_B)$ is the false A-domain image generated from the real B-domain image through $G_{BA}$. The second expectation term handles the false A-domain image, i.e., the closer the judgment result of $D_A(G_{BA}(x_B))$ is to 0, the better, meaning the discriminator can easily discriminate real and false samples.
It can be seen that the loss function of the discriminator has the same functional form as the intra-video loss function of the color checker; the difference lies in the size of the model's input image: taking the color checker's input as an image of size 6 × 256 × 256, the discriminator's input is an image of size 3 × 256 × 256.
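The difference in input size simply reflects channel-wise stacking of the two converted frames, e.g. (assumed tensor layout):

```python
import torch

ref_false = torch.randn(1, 3, 256, 256)   # converted reference frame
src_false = torch.randn(1, 3, 256, 256)   # converted source frame

pair = torch.cat([ref_false, src_false], dim=1)   # 1 x 6 x 256 x 256 -> color checker
frame = src_false                                 # 1 x 3 x 256 x 256 -> discriminator
print(pair.shape, frame.shape)
```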
In practical implementation, after the server calculates the loss based on the loss function of the discriminator, the discriminator is updated layer by layer using the back-propagation algorithm until its loss function converges.
Based on the above description, the color checker and the discriminator included in the image processing model are each used for adversarial training against the first generator; the first generator in the trained image processing model can then be used to convert images of the source domain into images of the target domain.
The description of the training method of the image processing model provided by the embodiment of the present invention continues. Fig. 12 is a schematic structural diagram of an image processing model according to an embodiment of the present invention. Referring to fig. 12, the image processing model includes two parts: the first part implements the conversion/migration of A-domain frame images to the target color mode of the B domain, and the second part implements the conversion/migration of B-domain frame images to the target color mode of the A domain. Fig. 13 is a schematic flowchart of a method for training an image processing model according to an embodiment of the present invention; the method may be implemented by a terminal. With reference to fig. 12 and 13, the method for training an image processing model according to an embodiment of the present invention includes:
Step 701: the terminal constructs a frame image set of the A domain based on the video corresponding to the A domain.
All frame images are obtained from the video file corresponding to the A domain, and the obtained frame images are then preprocessed to form the frame image set of the A domain. The video file corresponding to the A domain may be, for example, a colonoscopy video file from a specific hospital.
In practical implementation, the terminal may perform the following preprocessing on the obtained frame image:
each frame image is resized (e.g., to 286 × 286), its pixel values are normalized (e.g., to between -1 and 1), and it is then randomly cropped (e.g., to 256 × 256) and/or randomly flipped vertically for data enhancement.
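A possible torchvision-based sketch of this preprocessing is shown below; the exact ordering of crop/flip relative to normalization is a choice of this sketch.

```python
import torchvision.transforms as T

# Illustrative per-frame preprocessing: resize to 286 x 286, random 256 x 256
# crop and random vertical flip for data enhancement, then scale pixel values
# to [-1, 1].
preprocess = T.Compose([
    T.Resize((286, 286)),
    T.RandomCrop(256),
    T.RandomVerticalFlip(p=0.5),
    T.ToTensor(),                                            # [0, 1]
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # -> [-1, 1]
])
# frame_tensor = preprocess(pil_frame)   # pil_frame: one extracted PIL.Image frame
```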
Step 702: determining a real A-domain source frame and a real A-domain reference frame based on the A-domain frame image set.
Here, in practical applications, the reference frame of the A domain is the first frame of the video corresponding to the A domain, and the source frame is any frame other than the first frame.
Step 703: inputting the real A-domain source frame and reference frame into the generator model G_AB, and outputting the converted false B-domain source frame and reference frame.
Step 704a: splicing the false B-domain source frame and reference frame, inputting the spliced image into the color checker C_B, and obtaining the gray-histogram distance between the false B-domain source frame and reference frame as well as the accuracy of the false B-domain source frame and reference frame relative to the real B-domain source frame and reference frame.
Step 704b: inputting the false B-domain source frame and reference frame frame by frame into the discriminator D_B, and respectively obtaining the accuracy of the false B-domain source frame relative to the real B-domain source frame and the accuracy of the false B-domain reference frame relative to the real B-domain reference frame.
Step 704c: inputting the false B-domain source frame and reference frame into the generator model G_BA for image reconstruction, and outputting a false A-domain source frame and reference frame.
Here, in actual implementation, step 704a, step 704b and step 704c are not restricted to any particular execution order and may be executed simultaneously.
Step 705: calculating the loss of the color checker C_B based on the gray-histogram distance and accuracy obtained in step 704a, and, based on the calculated loss, updating the model parameters of the generator model G_AB using the back-propagation algorithm.
Step 706: calculating the loss of the discriminator D_B based on the accuracy obtained in step 704b, and, based on the calculated loss, updating the model parameters of the generator model G_AB using the back-propagation algorithm.
Step 707: calculating the loss of the generator model G_AB based on the difference between the false A-domain source frame and reference frame and the real A-domain source frame and reference frame, and, based on the calculated loss, updating the model parameters of the generator model G_AB using the back-propagation algorithm.
Here, it should be noted that the execution order of step 705, step 706 and step 707 is interchangeable, and the embodiment of the present invention is not limited in this respect; for example, the steps may be executed in the order step 706, step 705, step 707.
In practical application, the training of each model in the second part of the image processing model follows the same principle as the training process of the first part, and details are not repeated here.
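For orientation only, the following sketch outlines one generator-update iteration corresponding to steps 703-707 of the first part; every module and loss-function interface here is an assumed placeholder, not an identifier of the embodiment.

```python
def train_generator_step(g_ab, g_ba, color_checker_b, discriminator_b, opt_g,
                         real_src_a, real_ref_a, real_src_b, real_ref_b,
                         color_loss_fn, adv_loss_fn, recon_loss_fn):
    """Illustrative generator update for steps 703-707; all interfaces are assumed."""
    # Step 703: convert the real A-domain pair into a false B-domain pair.
    false_src_b, false_ref_b = g_ab(real_src_a, real_ref_a)

    # Steps 704a/705: color-checker loss on the spliced (stacked) false pair.
    loss_color = color_loss_fn(color_checker_b, false_ref_b, false_src_b,
                               real_ref_b, real_src_b)

    # Steps 704b/706: per-frame adversarial loss from the B-domain discriminator.
    loss_adv = adv_loss_fn(discriminator_b, false_src_b) \
             + adv_loss_fn(discriminator_b, false_ref_b)

    # Steps 704c/707: reconstruct back to the A domain and penalize the difference.
    rec_src_a, rec_ref_a = g_ba(false_src_b, false_ref_b)
    loss_cyc = recon_loss_fn(rec_src_a, real_src_a) + recon_loss_fn(rec_ref_a, real_ref_a)

    # Back-propagate the combined loss and update the parameters of G_AB.
    opt_g.zero_grad()
    (loss_color + loss_adv + loss_cyc).backward()
    opt_g.step()
```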
In practical application, for the trained image processing model, the generator model G_AB included in it can be used to migrate/convert source-domain images to the target domain. For example, the reference frame image and source frame image of the source domain are input into the generator model G_AB, which performs feature fusion and image conversion on them to obtain the corresponding reference frame image and source frame image of the target domain. The trained generator model G_AB ensures that, without distorting the image content, the converted reference frame image and source frame image of the target domain still maintain the relative relationship of the color mode between the reference frame image and source frame image of the source domain, and that their accuracy relative to the real reference frame image and source frame image of the target domain is high.
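At inference time only the trained generator is required; the following is a minimal sketch, assuming a generator interface g_ab(source, reference) that returns the converted (source, reference) pair.

```python
import torch

@torch.no_grad()
def convert_video_frames(g_ab, frames):
    """Illustrative inference with the trained generator: the first frame of the
    source-domain video is the fixed reference; each remaining frame is converted
    together with it. The g_ab(source, reference) interface is an assumption."""
    reference = frames[0]
    converted = []
    for source in frames[1:]:
        src_b, _ref_b = g_ab(source.unsqueeze(0), reference.unsqueeze(0))
        converted.append(src_b.squeeze(0))
    return converted
```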
The image processing model and the training method thereof provided by the embodiment of the invention are explained. In practical implementation, the method for training the image processing model provided by the embodiment of the invention may include the following operations:
First, before training the image processing model, a training data set is established and preprocessed to realize data enhancement, i.e., expansion of the original data.
Fig. 14 is a schematic diagram of a training data set provided by an embodiment of the present invention. Referring to fig. 14, in practical application the MICCAI 2015 colonoscopy video polyp detection challenge data may be used as the training data set, which includes video databases from two hospitals (domains): CVC-Clinic (domain A) and ETIS-Larib (domain B). Because the endoscopic video acquisition equipment of the two hospitals differs, the data distributions of the two domains differ greatly. Furthermore, even within the same video data set there are multiple color modes (modes); for example, in the CVC-Clinic data set the color distribution and brightness distribution of different videos are not completely consistent (more yellow or more red, brighter or darker), and the same holds for the ETIS-Larib data set.
To establish the training data set, all frame images are first extracted from the videos, the size of each frame image is adjusted to 286 × 286, and the pixel values of each image are normalized to between -1 and 1; the normalized frame images are then subjected to online random cropping (crops of size 256 × 256 from the 286 × 286 images) and/or online random vertical flipping.
Next, the image processing model is described. Referring to fig. 12, in some embodiments the image processing model includes two X-shape generators, namely the generator models G_AB and G_BA; two Color Validators, namely C_A and C_B; and two discriminators, D_A and D_B.
The X-shape Generator is a generator model comprising convolution layers, a max-pooling layer, residual blocks, a dense fusion module and an up-sampling module (UM, consisting of a 3 × 3 convolution layer and an up-sampling layer). In order to constrain the generated images of the same video to belong to the same mode in the target domain, the input of the generator is set to two frame images (real images), namely a reference frame (the first frame of each video) and a source frame (any of the remaining frames), and the X-shape generator simultaneously outputs the two corresponding converted frames of the target domain (fake images). The reference frame (serving as an anchor point) is constant throughout the conversion of the video, so it appears many times during training and receives more weight than the other source frames, which constrains the generation direction of the remaining video frames to be consistent. Furthermore, the features of the reference frame and the source frame extracted by the respective encoding sections are embedded into a new feature space by the dense fusion module; this fusion process can be regarded as adding a watermark of each image to the other, which prevents the image distortion problems caused by the inherent characteristics of cycle reconstruction, because once any image is distorted the latent watermark is destroyed and cycle consistency cannot be maintained.
Here, different backbone network structures can be used for the encoder part of the X-shape Generator to achieve similar effects, such as Inception-v3, ResNet, DenseNet, etc.
The dense fusion module included in the generator model is described. The dense fusion module ensures that source frames are mapped to the same mode in the target domain and prevents image content distortion by embedding the geometric content information of the reference frame into the source frame. The dense fusion module comprises: an average pooling layer, a splicing (concatenation) layer, a 1 × 1 convolution layer with a channel size of 32768, and a feature-map resizing layer. The fusion module reduces the size of the feature maps to 1 × 1 through the pooling layer, and then performs feature fusion using splicing and convolution, thereby effectively avoiding mutual interference of the spatial information of the two feature maps.
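A minimal sketch of such a fusion step is given below; the channel width and the nearest-neighbour broadcast back to the feature-map size are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseFusion(nn.Module):
    """Illustrative dense fusion: pool both feature maps to 1 x 1, splice them,
    mix with a 1 x 1 convolution, then broadcast the fused vector back to the
    original spatial size (channel sizes are assumptions)."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_src: torch.Tensor, feat_ref: torch.Tensor) -> torch.Tensor:
        fused = self.mix(torch.cat([self.pool(feat_src), self.pool(feat_ref)], dim=1))
        # Broadcast the 1 x 1 fused feature back to the feature-map size.
        return F.interpolate(fused, size=feat_src.shape[-2:], mode="nearest")
```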
The Color Validator included in the image processing model is described. The Color Validator is a new type of discriminator whose input is the two converted frames stacked together, and it constrains the generator through two loss functions. First, the color change trend of the frame images within the same video is regulated through the color histogram loss: for example, if the source frame is brighter than the reference frame, it is desirable to maintain this trend after the source frame is mapped to the target domain. Second, it acts as a second discriminator, evaluating the authenticity of two frames belonging to the same video. Because the validator is trained on arbitrary pairs of frames from the same video in the target domain, the result of the generator's conversion must be close to two frame images belonging to the same video in order to fool it.
The Discriminator included in the image processing model is described. An encoder-decoder fully convolutional network is used to judge the authenticity of each converted frame separately. The input of the network is one frame image converted by the generator, and the output is a judgment result with the same size as the input image; that is, the Discriminator judges whether every pixel of the frame is real or false, so as to constrain the generation effect of the generator.
Next, training of the image processing model is explained. The Color Validator and the X-shape Generator are trained adversarially in alternation, and the Discriminator and the X-shape Generator are likewise trained adversarially in alternation; the Color Validator and the Discriminator can be trained interactively at the same time. Based on the loss function of the Color Validator and the loss function of the Discriminator, the back-propagation algorithm is run to update the model parameters layer by layer until the loss functions converge, so that the Color Validator and the Discriminator are trained. When the X-shape Generator is trained, the model parameters of the Color Validator and the Discriminator are kept unchanged, and the model parameters of the X-shape Generator are updated layer by layer by running the back-propagation algorithm based on the loss function of the X-shape Generator until its loss function converges.
Here, the loss function of the X-shape Generator is shown in formula (1), the loss function of the Color Validator is the sum of formula (2) and formula (3), and the loss function of the Discriminator is shown in formula (3).
Next, an application scenario of the trained image processing model is described. In some embodiments, referring to fig. 15, which is a schematic diagram of an application scene of the image processing model provided by an embodiment of the present invention, front end A (i.e., terminal A) acquires video data (taking the first captured frame of the video as the reference frame and pairing it with each of the remaining frames as input), and uploads the video data to a background server (the back end); the back end performs domain-adaptive conversion on the video data using the X-shape Generator of the trained image processing model and outputs the conversion result to front end B. Front end A and front end B may be the same or different.
The description continues with the training apparatus for an image processing model provided by the embodiment of the present invention. In some embodiments, the training apparatus for the image processing model may be implemented by software modules. Fig. 16 is a schematic structural diagram of the training apparatus for an image processing model provided by an embodiment of the present invention, where the image processing model at least includes a first generator and a color checker. Referring to fig. 16, the training apparatus for an image processing model according to an embodiment of the present invention includes:
a first converting unit 161, configured to perform feature fusion and image conversion on the first image group of the source domain through the first generator to obtain a second image group of the corresponding target domain carrying the fusion features;
a color checking unit 162, configured to process the second image group through the color checker, determine a relative relationship between color patterns in the images in the second image group, and determine an accuracy of the second image group with respect to a target image group of the target domain;
a loss determining unit 163 for determining a value of a loss function of the color checker based on the relative relationship and the accuracy;
a parameter updating unit 164 for updating model parameters of the image processing model based on the values of the loss function.
In some embodiments, the image processing model further comprises a second generator, the apparatus further comprising: a second conversion unit and a first training unit;
the second conversion unit is configured to perform image reconstruction on the second image group through the second generator to obtain a third image group corresponding to the source domain, and perform image conversion on a fourth image group of the target domain to obtain a fifth image group corresponding to the source domain;
the first conversion unit is further configured to perform image reconstruction on the fifth image group through the first generator to obtain a sixth image group corresponding to the target domain;
the first training unit is used for training the first generator based on the difference between the first image group and the third image group and the difference between the fifth image group and the sixth image group.
In some embodiments, the first training unit is further configured to determine a value of a loss function of the first generator based on differences between the first image set and the third image set and differences between the fifth image set and the sixth image set;
determining a respective first error signal based on the loss function of the first generator when the value of the loss function of the first generator reaches a first threshold;
and reversely propagating the first error signal in the first generator, and updating the model parameters of each layer of the first generator in the process of propagation.
In some embodiments, the first conversion unit is further configured to, in response to that the first image group includes a source frame image and a reference frame image, perform feature extraction on the source frame image and the reference frame image respectively to obtain an image feature of the source frame image and an image feature of the reference frame image;
fusing the image characteristics of the source frame image and the image characteristics of the reference frame image to obtain fused characteristics;
and performing image conversion on the first image group based on the fusion characteristics to obtain a second image group carrying the target color mode of the corresponding target domain of the fusion characteristics.
In some embodiments, the first conversion unit is further configured to decode the fusion feature in combination with an image feature of the source frame image, so as to obtain a source frame image of a target color mode of a corresponding target domain carrying the fusion feature;
decoding the fusion features by combining the image features of the reference frame image to obtain a reference frame image of a target color mode of a corresponding target domain carrying the fusion features;
and the source frame image of the target color mode of the target domain and the reference frame image of the target color mode of the target domain form a second image group of the target domain.
In some embodiments, the reference frame image is the first frame image in the video corresponding to the source domain, and the source frame image is a frame image different from the first frame image in the video corresponding to the source domain.
In some embodiments, the color checking unit is further configured to obtain a gray level histogram of each frame of image in the second image group;
determining the gray level histogram distance between the images in the second image group, and taking the gray level histogram distance as the relative relation of the color modes.
In some embodiments, the loss determining unit is further configured to obtain a first difference between the relative relationship and a target relative relationship, and a second difference between the accuracy and a target accuracy;
determining a value of a loss function of the color checker based on the first difference value and the second difference value.
In some embodiments, the parameter updating unit is further configured to determine a corresponding second error signal based on the loss function of the color checker when the value of the loss function exceeds a second threshold;
and propagating the second error signal in the image processing model in a reverse direction, and updating the model parameters of the image processing model in the process of propagation.
In some embodiments, the image processing model further comprises an arbiter, the apparatus further comprising:
and the judging unit is used for processing the second image group through the discriminator, determining the accuracy of the pixel point level of each frame of image in the second image group, wherein the accuracy of the pixel point level indicates the matching degree of the pixel point in the image and the pixel point of the corresponding position of the corresponding image in the target image group.
An embodiment of the present invention further provides an image processing apparatus, including:
an acquisition unit configured to acquire a first image group of a source domain;
the processing unit is used for carrying out feature fusion and image conversion on the first image group through a first generator included in an image processing model to obtain a second image group of a corresponding target domain carrying fusion features;
the image processing model further comprises a color checker, which is used for processing the second image group, determining the relative relationship of color modes between the images in the second image group, and determining the accuracy of the second image group relative to the target image group of the target domain;
and updating the model parameters of the image processing model based on the value of the loss function of the color checker, wherein the value of the loss function is obtained based on the relative relation and the accuracy.
Here, it should be noted that: the above description related to the apparatus is similar to the above description of the method, and for the technical details not disclosed in the apparatus according to the embodiment of the present invention, please refer to the description of the method embodiment of the present invention.
The embodiment of the invention also provides a training device of the image processing model, which comprises:
a memory for storing executable instructions;
and the processor is used for realizing the training method of the image processing model provided by the embodiment of the invention when the executable instructions stored in the memory are executed.
The embodiment of the invention also provides a storage medium, wherein the storage medium stores executable instructions for causing a processor to execute, so that the training method of the image processing model provided by the embodiment of the invention is realized.
An embodiment of the present invention further provides an image processing apparatus, where the apparatus includes:
a memory for storing executable instructions;
and the processor is used for realizing the image processing method provided by the embodiment of the invention when executing the executable instructions stored in the memory.
The embodiment of the invention also provides a storage medium, wherein the storage medium stores executable instructions for causing a processor to execute the executable instructions so as to realize the image processing method provided by the embodiment of the invention.
In some embodiments, the storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, for example in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention are included in the protection scope of the present invention.

Claims (15)

1. A method for training an image processing model, wherein the image processing model comprises at least: a first generator and a color checker, the method comprising:
responding to that a first image group of a source domain comprises a source frame image and a reference frame image, and respectively extracting the features of the source frame image and the reference frame image through the first generator to obtain the image features of the source frame image and the image features of the reference frame image;
fusing the image characteristics of the source frame image and the image characteristics of the reference frame image to obtain fused characteristics;
performing image conversion based on the fusion features to obtain a second image group of the corresponding target domain carrying the fusion features;
processing the second image group through the color checker, determining the relative relation of color modes among the images in the second image group, and determining the accuracy of the second image group relative to the target image group of the target domain;
determining a value of a loss function of the color checker based on the relative relationship and the accuracy;
updating model parameters of the image processing model based on the value of the loss function.
2. The method of claim 1, wherein the image processing model further comprises a second generator, the method further comprising:
performing image reconstruction on the second image group through the second generator to obtain a third image group corresponding to the source domain, and performing image conversion on a fourth image group of the target domain to obtain a fifth image group corresponding to the source domain;
performing image reconstruction on the fifth image group through the first generator to obtain a sixth image group corresponding to the target domain;
training the first generator based on the difference between the first image set and the third image set and the difference between the fifth image set and the sixth image set.
3. The method of claim 2, wherein training the first generator based on the differences between the first image set and the third image set and the differences between the fifth image set and the sixth image set comprises:
determining a value of a loss function of the first generator based on a difference between the first image group and the third image group and a difference between the fifth image group and the sixth image group;
determining a respective first error signal based on the loss function of the first generator when the value of the loss function of the first generator reaches a first threshold;
and reversely propagating the first error signal in the first generator, and updating the model parameters of each layer of the first generator in the process of propagation.
4. The method of claim 1, wherein the performing image transformation based on the fused feature to obtain a second set of images of corresponding target domains carrying the fused feature comprises:
and performing image conversion on the first image group based on the fusion characteristics to obtain a second image group carrying the target color mode of the corresponding target domain of the fusion characteristics.
5. The method of claim 4, wherein said image transforming said first set of images based on said fused features to obtain a second set of images of corresponding target domains carrying fused features comprises:
decoding the fusion characteristics by combining the image characteristics of the source frame image to obtain a source frame image of a target color mode of a corresponding target domain carrying the fusion characteristics;
decoding the fusion features by combining the image features of the reference frame image to obtain a reference frame image of a target color mode of a corresponding target domain carrying the fusion features;
and the source frame image of the target color mode of the target domain and the reference frame image of the target color mode of the target domain form a second image group of the target domain.
6. The method of claim 4 or 5,
the reference frame image is a first frame image in the video corresponding to the source domain, and the source frame image is a frame image different from the first frame image in the video corresponding to the source domain.
7. The method of claim 1, wherein said processing said second set of images by said color checker to determine relative relationships between color patterns in said second set of images comprises:
respectively acquiring a gray level histogram of each frame of image in the second image group;
determining the gray level histogram distance between the images in the second image group, and taking the gray level histogram distance as the relative relation of the color modes.
8. The method of claim 1, wherein determining the value of the loss function of the color checker based on the relative relationship and the accuracy comprises:
acquiring a first difference value of the relative relation and a target relative relation and a second difference value between the accuracy and a target accuracy;
determining a value of a loss function of the color checker based on the first difference value and the second difference value.
9. The method of claim 1, wherein updating model parameters of the image processing model based on the values of the loss function comprises:
determining a corresponding second error signal based on the loss function of the color checker when the value of the loss function exceeds a second threshold;
and propagating the second error signal in the image processing model in a reverse direction, and updating the model parameters of the image processing model in the process of propagation.
10. The method of claim 1, wherein the image processing model further comprises an arbiter, the method further comprising:
and processing the second image group through the discriminator to determine the accuracy of the pixel point level of each frame of image in the second image group, wherein the accuracy of the pixel point level indicates the matching degree of the pixel point in the image and the pixel point at the corresponding position of the corresponding image in the target image group.
11. An apparatus for training an image processing model, wherein the image processing model comprises at least: a first generator and a color checker, the apparatus comprising:
the first conversion unit is used for responding to a first image group of a source domain including a source frame image and a reference frame image, and respectively extracting the features of the source frame image and the reference frame image through the first generator to obtain the image features of the source frame image and the image features of the reference frame image; fusing the image characteristics of the source frame image and the image characteristics of the reference frame image to obtain fused characteristics; performing image conversion based on the fusion features to obtain a second image group of the corresponding target domain carrying the fusion features;
the color checking unit is used for processing the second image group through the color checker, determining the relative relation of color modes among the images in the second image group and determining the accuracy of the second image group relative to the target image group of the target domain;
a loss determination unit for determining a value of a loss function of the color checker based on the relative relationship and the accuracy;
a parameter updating unit for updating model parameters of the image processing model based on the value of the loss function.
12. An image processing method, characterized in that the method comprises:
obtaining a first image group of a source domain;
responding to that the first image group comprises a source frame image and a reference frame image, and respectively extracting the features of the source frame image and the reference frame image through a first generator included in an image processing model to obtain the image features of the source frame image and the image features of the reference frame image;
fusing the image characteristics of the source frame image and the image characteristics of the reference frame image to obtain fusion characteristics;
performing image conversion based on the fusion features to obtain a second image group of the corresponding target domain carrying the fusion features;
wherein the image processing model is trained based on the method of any one of claims 1 to 10.
13. An image processing apparatus, characterized in that the apparatus comprises:
an acquisition unit configured to acquire a first image group of a source domain;
the processing unit is used for responding to the fact that the first image group comprises a source frame image and a reference frame image, respectively extracting the features of the source frame image and the reference frame image through a first generator included in an image processing model, and obtaining the image features of the source frame image and the image features of the reference frame image; fusing the image features of the source frame image and the image features of the reference frame image to obtain fused features, and performing image conversion based on the fused features to obtain a second image group of the corresponding target domain carrying the fused features;
wherein the image processing model is trained based on the method of any one of claims 1 to 10.
14. An apparatus for training an image processing model, the apparatus comprising:
a memory for storing executable instructions;
a processor for implementing the method of training an image processing model of any of claims 1 to 10 when executing executable instructions stored in the memory.
15. A storage medium storing executable instructions for causing a processor to perform a method of training an image processing model according to any one of claims 1 to 10 when executed.
CN201910470449.7A 2019-05-31 2019-05-31 Training method and device of image processing model and storage medium Active CN110197229B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910470449.7A CN110197229B (en) 2019-05-31 2019-05-31 Training method and device of image processing model and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910470449.7A CN110197229B (en) 2019-05-31 2019-05-31 Training method and device of image processing model and storage medium

Publications (2)

Publication Number Publication Date
CN110197229A CN110197229A (en) 2019-09-03
CN110197229B true CN110197229B (en) 2022-06-07

Family

ID=67753661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910470449.7A Active CN110197229B (en) 2019-05-31 2019-05-31 Training method and device of image processing model and storage medium

Country Status (1)

Country Link
CN (1) CN110197229B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705611A (en) * 2019-09-17 2020-01-17 平安科技(深圳)有限公司 Fundus image sample expansion method, device, medium, and electronic apparatus
US11120585B2 (en) * 2019-11-28 2021-09-14 Shanghai United Imaging Intelligence Co., Ltd. Systems and methods for image reconstruction
CN112950462B (en) * 2019-12-11 2024-03-08 北京金山云网络技术有限公司 Image processing method and device, electronic equipment and storage medium
CN111161239B (en) * 2019-12-27 2024-02-27 上海联影智能医疗科技有限公司 Medical image analysis method, device, storage medium and computer equipment
CN111340904B (en) * 2020-02-10 2023-09-29 深圳先进技术研究院 Image processing method, device and computer readable storage medium
CN111753657B (en) * 2020-05-20 2023-01-13 中国科学院信息工程研究所 Self-training-based text detector training method and system
CN112164026B (en) * 2020-09-01 2022-10-25 上海交通大学 Endoscope polyp real-time detection method, system and terminal
US20230410301A1 (en) * 2020-11-06 2023-12-21 The Regents Of The University Of California Machine learning techniques for tumor identification, classification, and grading
CN114666188A (en) * 2020-12-24 2022-06-24 华为技术有限公司 Information generation method and related device
CN112686205B (en) * 2021-01-14 2023-10-13 电子科技大学中山学院 Parameter updating method and device and multi-terminal network architecture
CN113221902B (en) * 2021-05-11 2021-10-15 中国科学院自动化研究所 Cross-domain self-adaptive semantic segmentation method and system based on data distribution expansion
CN113642684A (en) * 2021-10-18 2021-11-12 青岛根尖智能科技有限公司 Robust cross-domain attitude estimation method and system based on image mixing mechanism
CN115099293B (en) * 2022-03-09 2023-04-18 北京安德医智科技有限公司 Model training method and device, electronic equipment and storage medium
CN117495742A (en) * 2023-08-01 2024-02-02 西交利物浦大学 Method, device, equipment and medium for transferring image dyeing

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446667A (en) * 2018-04-04 2018-08-24 北京航空航天大学 Based on the facial expression recognizing method and device for generating confrontation network data enhancing
CN109753992A (en) * 2018-12-10 2019-05-14 南京师范大学 The unsupervised domain for generating confrontation network based on condition adapts to image classification method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11055989B2 (en) * 2017-08-31 2021-07-06 Nec Corporation Viewpoint invariant object recognition by synthesization and domain adaptation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446667A (en) * 2018-04-04 2018-08-24 北京航空航天大学 Based on the facial expression recognizing method and device for generating confrontation network data enhancing
CN109753992A (en) * 2018-12-10 2019-05-14 南京师范大学 The unsupervised domain for generating confrontation network based on condition adapts to image classification method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
AugGAN: Cross Domain Adaptation with GAN-based Data Augmentation;Sheng-Wei Huang etc.;《European Conference on Computer Vision 2018》;20181005;第731-744页 *
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks;Jun-Yan Zhu etc.;《2017 IEEE International Conference on Computer Vision (ICCV)》;20171225;第2242-2251页 *
Video-to-Video Synthesis;Ting-Chun Wang etc.;《NeurIPS 2018》;20181203;第1-14页 *
基于生成对抗网络的迁移学习算法研究;臧文华;《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》;20180915(第9期);第I140-10页 *
基于空间连续生成对抗网络的视频帧间图像生成;张猛;《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》;20190515(第5期);第I138-1294页 *

Also Published As

Publication number Publication date
CN110197229A (en) 2019-09-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210927

Address after: 518052 Room 201, building A, 1 front Bay Road, Shenzhen Qianhai cooperation zone, Shenzhen, Guangdong

Applicant after: Tencent Medical Health (Shenzhen) Co.,Ltd.

Address before: 518000 Tencent Building, No. 1 High-tech Zone, Nanshan District, Shenzhen City, Guangdong Province, 35 Floors

Applicant before: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.

GR01 Patent grant
GR01 Patent grant