CN112633103A

CN112633103A - Image processing method and device and electronic equipment

Info

Publication number: CN112633103A
Application number: CN202011478673.XA
Authority: CN
Inventors: 卜乐平; 王腾; 闫正军; 杨植凯; 侯新国; 欧阳继能; 周扬; 王灿
Original assignee: Naval University of Engineering PLA
Current assignee: Naval University of Engineering PLA
Priority date: 2020-12-15
Filing date: 2020-12-15
Publication date: 2021-04-09

Abstract

The invention provides an image processing method and device and electronic equipment. Wherein, the method comprises the following steps: acquiring a first image and a second image, wherein the second image contains a target object; inputting the first image and the second image into an image processing model which is trained in advance, and outputting a conversion image; wherein the converted image comprises a scene of the first image and the target object; the image processing model comprises a generator model, and the image processing of the first image and the second image is carried out through the generator model, and the generator model is constructed based on an attention mechanism and a U-net structure. According to the method, the target object can be extracted from the second image, the extracted target object is migrated into the first image to obtain the converted image, the target object is added into the given scene video, and the style information is fused, so that high-quality background details are kept, the diversity of the target object in the scene can be ensured, and the migrated scene has high visual reality.

Description

Image processing method and device and electronic equipment

Technical Field

The invention relates to the technical field of video fire detection, in particular to an image processing method, an image processing device and electronic equipment.

Background

Compared with the traditional video fire identification and fire detection method, the fire identification and detection method based on deep learning can achieve extremely high accuracy. However, the deep learning model is highly data-driven, and a fire disaster identification network with high accuracy and strong robustness needs a large amount of data support. In certain scenarios, the dangerous nature of a fire can pose a certain difficulty in the large number of acquisitions of fire images. Features suitable for describing flame movement and changes in large spaces are sought.

Since the method of deep learning depends on the number and quality of the samples of the training set, the imbalance between the positive samples and the negative samples affects the accuracy of deep neural network recognition. Due to safety constraints, ignition cannot be directly implemented in many special spatial scenarios, which will result in inconsistency between the training set and the test set. Data collected by a fixed test site has the defect of single background and interference condition, if the interference is similar to the characteristics of flame, a model or a network for identifying flame can cause false alarm due to failure of learning in advance to a great extent, and the action and the effect of a fire identification algorithm based on deep learning under a special space scene can be limited.

Disclosure of Invention

In view of the above, the present invention provides an image processing method, an image processing apparatus, and an electronic device, so as to improve the action and effect of a fire recognition algorithm based on deep learning in a special spatial scene.

In a first aspect, an embodiment of the present invention provides an image processing method, where the method includes: acquiring a first image and a second image, wherein the second image contains a target object; inputting the first image and the second image into an image processing model which is trained in advance, and outputting a conversion image; wherein the converted image comprises a scene of the first image and the target object; the image processing model comprises a generator model through which image processing of the first image and the second image is performed, the generator model being constructed based on an attention mechanism and a U-net structure.

In a preferred embodiment of the present invention, the step of inputting the first image and the second image into the image processing model trained in advance and outputting the converted image includes: extracting a target object image containing a target object from the second image; splicing the target object image and the first image to obtain an image to be converted; the image to be converted comprises a scene of the first image and a target object; and inputting the image to be converted into the generator model, and outputting the converted image.

In a preferred embodiment of the present invention, the target object is a flame.

In a preferred embodiment of the present invention, the generator model comprises a first convolution layer and a second convolution layer with the same depth; the U-net structure is in jump connection between the first convolution layer and the second convolution layer; the attention mechanism is loaded at a designated position of the generator model; wherein, the designated position at least comprises one of the following: front, middle, or end of the generator model.

In a preferred embodiment of the present invention, the image processing model further includes a discriminator model; the structure of the discriminator model comprises a conditional generation countermeasure network CGAN structure and a Markov discriminator PatchGAN structure.

In a preferred embodiment of the present invention, the image processing model is trained by the following steps: acquiring a first training sample, a second training sample and a real sample; the second image comprises a training object, and the real sample is an image which is shot in the scene of the first training sample and comprises a training sample; extracting a training object image containing a training object from the second training sample; splicing the training object image and the first training sample to obtain a sample to be converted; the sample to be converted comprises a scene of a first training sample and a training object; inputting a sample to be converted into a generator model, and outputting a conversion sample; wherein the conversion sample comprises a scene and a training object of the first training sample; inputting the conversion sample, the real sample and the sample to be converted into a discriminator model, and outputting a loss value of the image processing model; adjusting parameters of the image processing model based on the loss values; and continuously executing the steps of obtaining the first training sample, the second training sample and the real sample until the loss value meets the preset training end condition, and determining the image processing model obtained by the current training as the trained image processing model.

In a preferred embodiment of the present invention, the loss function of the image processing model includes a norm loss function of L1, a generator loss function, and a discriminator loss function.

In a second aspect, an embodiment of the present invention further provides an image processing apparatus, including: the image acquisition module is used for acquiring a first image and a second image, wherein the second image contains a target object; the image processing module is used for inputting the first image and the second image into an image processing model which is trained in advance and outputting a conversion image; wherein the converted image comprises a scene of the first image and the target object; the image processing model comprises a generator model, and the image processing of the first image and the second image is carried out through the generator model, and the generator model is constructed based on an attention mechanism and a U-net structure.

In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a processor and a memory, where the memory stores computer-executable instructions that can be executed by the processor, and the processor executes the computer-executable instructions to implement the steps of the image processing method described above.

In a fourth aspect, the embodiments of the present invention further provide a computer-readable storage medium, which stores computer-executable instructions, and when the computer-executable instructions are called and executed by a processor, the computer-executable instructions cause the processor to implement the steps of the image processing method described above.

The embodiment of the invention has the following beneficial effects:

according to the image processing method, the image processing device and the electronic equipment provided by the embodiment of the invention, the first image and the second image containing the target object are input into the image processing model which is trained in advance, and the converted image is output. According to the method, the target object can be extracted from the second image, the extracted target object is transferred into the first image to obtain the converted image, the target object is added into the given scene video and the fusion of the style information is completed, so that high-quality background details are kept, the diversity of the target object in the scene can be ensured, and the transferred scene has high visual reality.

Additional features and advantages of the disclosure will be set forth in the description which follows, or in part may be learned by the practice of the above-described techniques of the disclosure, or may be learned by practice of the disclosure.

In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

Fig. 1 is a flowchart of an image processing method according to an embodiment of the present invention;

FIG. 2 is a flow chart of another image processing method according to an embodiment of the present invention;

FIG. 3 is a schematic flow chart of flame scene migration according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a generator model according to an embodiment of the present invention;

fig. 5 is a schematic structural diagram of a discriminator model according to an embodiment of the present invention;

FIG. 6 is a parameter diagram of a discriminator model according to an embodiment of the present invention;

fig. 7 is a schematic diagram of a patch gan structure according to an embodiment of the present invention;

fig. 8 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention;

fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any creative effort, shall fall within the protection scope of the present invention.

Currently, in certain scenarios, the dangerous nature of fire can cause certain difficulties in the large-scale acquisition of fire images. The characteristics suitable for describing flame motion and change in the large space are searched, and a convenient and quick model is built according to the characteristics so as to generate flame images and continuous flame sequence data, so that the method has important meanings for displaying flame combustion scenes in the large space and reducing the manufacturing cost of flame combustion scene data sets.

Thanks to the development of generating confrontation networks, a large number of image generation models are proposed and used in the field of image generation of faces, scenes, animation head portraits and the like. The models can generate images with certain specified characteristics, such as sex, hair color, expression and other characteristics of the face images in large quantity according to requirements. And by deconstructing and recombining the image features, a completely new image that never appears in the dataset can be obtained. Therefore, the embodiment of the invention provides that the flame recorded in other similar scenes is transferred to a specific scene, so that the fire video data in the restrictive situation is increased. In consideration of information retention and transmission, the generation of the countermeasure network is adopted, the defect that the image information is unwound to the hidden space in the existing method is overcome, flames are added into a given scene video, and style information fusion is completed, so that high-quality background details are kept. Experiments show that the method can ensure the diversity of flame forms in the scene, and the migrated scene has higher visual reality.

Based on this, the image processing method, the image processing device and the electronic equipment provided by the embodiment of the invention can transfer flames recorded in other similar scenes to a specific scene, so that fire video data in a restrictive situation is increased. In consideration of information retention and transmission, the generation of the countermeasure network is adopted, the defect that the image information is unwound to the hidden space in the existing method is overcome, flames are added into a given scene video, style information fusion is completed, and high-quality background details are kept. Experiments show that the method provided by the embodiment can ensure the diversity of flame forms in the scene, and the scene after migration has higher visual reality.

To facilitate understanding of the present embodiment, a detailed description will be given of an image processing method disclosed in the present embodiment.

The first embodiment is as follows:

the present embodiment provides an image processing method, referring to a flowchart of an image processing method shown in fig. 1, the image processing method including the steps of:

step S102, a first image and a second image are obtained, wherein the second image comprises a target object.

The present embodiment provides a method, which aims to transfer a target object from a second image into a first image, wherein the target object may be a flame, the first image may also be referred to as a scene to be transferred, and the second image may also be referred to as an ignition scene.

Step S104, inputting the first image and the second image into an image processing model which is trained in advance, and outputting a conversion image; wherein the converted image comprises a scene of the first image and the target object; the image processing model includes a generator model by which image processing of the first image and the second image is performed, the generator model being constructed based on an attention mechanism and a U-net structure.

The image processing model in the present embodiment includes a generator model by which image processing of the first image and the second image is performed. Wherein the generator model is constructed based on an attention mechanism and a U-net structure.

The attention mechanism may enable the neural network to focus on a subset of its inputs (or features): a particular input is selected. Attention may be applied to any type of input regardless of its shape. In the case of limited computing power, an attention mechanism (attention mechanism) is a resource allocation scheme of a main means for solving the problem of information overload, and computing resources are allocated to more important tasks.

The U-net network structure is symmetrical and is similar to English letter U, so that the U-net network structure is called as Unet, and a full convolution network is adopted to classify images pixel by pixel, so that a good effect can be achieved in the field of image segmentation. The structure of the device consists of a convolution compression layer at the left end and a transposition convolution amplification layer at the right end; the left end and the right end are connected, and when the right end carries out transposition convolution operation, the results after front convolution of the left end for several times can be spliced, so that more information can be obtained.

Specifically, the flame image kernel can be extracted from the ignition scene, and the flame image kernel is transferred to the scene to be transferred to obtain the image to be converted. The reality of the image to be converted is poor, the image to be converted can be input into the image processing model, convolution calculation is carried out through the generator model, and the converted image with high reality is obtained.

According to the image processing method provided by the embodiment of the invention, the first image and the second image containing the target object are input into the image processing model which is trained in advance, and the converted image is output. According to the method, the target object can be extracted from the second image, the extracted target object is transferred into the first image to obtain the converted image, the target object is added into the given scene video, the fusion of style information is completed, high-quality background details are kept, the diversity of the target object in the scene can be guaranteed, and the transferred scene has high visual reality.

Example two:

the embodiment provides another image processing method, which is implemented on the basis of the above embodiment; the present embodiment focuses on a specific implementation in which the image processing model outputs the converted image. Referring to a flowchart of another image processing method shown in fig. 2, the image processing method in the present embodiment includes the steps of:

step S202, a first image and a second image are acquired, wherein the second image includes a target object.

Referring to fig. 3, a schematic flow chart of flame scene migration is shown, where the first image is the scene to be migrated in fig. 3, the second image is the ignition scene in fig. 3, and the target object is the flame in fig. 3.

In step S204, a target object image including the target object is extracted from the second image.

As shown in fig. 3, the image of the region to be migrated (scene to be migrated) is taken out from the scene without fire and is taken as I_nThen, a target object image (i.e., the flame image kernel C in fig. 3) containing the target object can be extracted from the ignition scene photographed under the other scenes under the same lighting condition_I)。

Step S206, splicing the target object image and the first image to obtain an image to be converted; the image to be converted includes a scene of the first image and a target object.

Kernel C of flame image_IImplant I_nTo obtain the image to be converted

The image to be converted comprises a scene of the first image and a target object.

Step S208, inputting the image to be converted into the generator model, and outputting the converted image.

The image to be converted can be processed through the generator model to obtain a converted image

Will be provided with

And splicing with the original scene to obtain the scene migrating into the flame.

Wherein the generator model comprises a first convolutional layer and a second convolutional layer with the same depth; the U-net structure is in jump connection between the first convolution layer and the second convolution layer; the attention mechanism is loaded at a designated position of the generator model; wherein, the designated position at least comprises one of the following: front, middle, or end of the generator model.

Referring to the structural diagram of a generator model shown in fig. 4, in order to avoid deformation of the background during the conversion process, the generator G adopts a U-net structure. The U-net has a jump connection between the convolutional layers of corresponding depths in the encoder and the decoder, which can well avoid blurring and warping of the output image, thereby ensuring the resolution of the background.

Through training, the attention mechanism in the generator can produce high response to the interested area, so that the effect of generating the image is improved in a targeted manner. According to the self function and the specific task characteristics, the attention mechanism can be realized in different modes and structures and can be loaded at the front end, the middle part and the tail end of the generator. The embodiment using image masksMode-fused pre-conversion images

And the converted image I_fThis process can be expressed as formula (1) and formula (2):

in formula (2), M is an attention mask layer obtained by 7X 7 winding layers; i is_fIn order to be a color mask,

is the input of G. As shown in fig. 2, the attention mechanism is located at the end of the flame conversion network G. Image picture

After U-net conversion, primary conversion images I are respectively formed_fAnd an attention mask M, the final converted image being obtained from equation (2)

The image processing model comprises a generator model and a discriminator model; the structure of the discriminator model includes a CGAN (Conditional generated countermeasure network) structure and a PatchGAN (markov discriminator) structure.

In the image conversion process, different conversion modes are provided for different objects, for example, information such as color and texture needs to be added in the vicinity of flames and flames, and other areas should be kept unchanged as much as possible. In order to enable the generator to learn more conversion modes of different objects, the embodiment of the invention combines two discriminator structures of CGAN and PatchGAN.

Referring to a schematic structural diagram of a discriminator model shown in fig. 5, an image to be converted and a target image or a generated image are first spliced (cat) according to channels, and the spliced image is used as an input of the discriminator, and the discriminator in the CGAN form can theoretically penalize any difference that may exist between the converted image and the target image.

Referring to a parameter diagram of a discriminator model shown in fig. 6, the discriminators in this embodiment have 4 layers of convolution, and all use convolution kernels with a size of 4 × 4, where the convolution step sizes of the first two layers are 2, and the convolution step sizes of the second two layers are 1. Each convolutional layer is followed by a batch normalization layer (Batchnorm) and a leakyreu activation layer (slope a 0.2) except for the last layer of convolution. After 4 layers of convolution, the output of the discriminator is a matrix of 30 × 30 × 1, and the matrix is expanded into a vector of 900 × 1, which is the output of the discriminator.

Referring to a schematic diagram of the principle of the patch gan structure shown in fig. 7, the patch gan structure in this embodiment does not evaluate the whole image, but divides the image into N × N small regions and determines the authenticity of each region. Thus, the generator can receive separate evaluations of the discriminator for images of different regions, rather than an overall evaluation of the entire picture.

The embodiment also provides another image processing method, which is realized on the basis of the embodiment; this embodiment focuses on a specific implementation of the training method of the image processing model, and the image processing model can be trained through steps 1 to 7:

step 1, obtaining a first training sample, a second training sample and a real sample; the second image contains a training object, and the real sample is an image which is shot in the scene of the first training sample and contains a training sample.

The first training sample is a scene to be migrated, the second training sample contains flames of a training object, and the real sample may be an image containing flames captured in the scene of the first training sample.

And 2, extracting a training object image containing a training object from the second training sample. First, a training object image of a flame may be included from a second training sample.

Step 3, splicing the training object image and the first training sample to obtain a sample to be converted; the sample to be converted includes a scene of the first training sample and a training object. After the training object images are extracted, the training object images can be spliced into the first training sample, and a sample to be converted with low authenticity is obtained.

Step 4, inputting a sample to be converted into a generator model, and outputting a conversion sample; wherein the transformation sample comprises a scene and a training object of the first training sample. And the generator model carries out image processing on the band conversion sample to obtain the conversion sample with stronger authenticity.

And 5, inputting the conversion sample, the real sample and the sample to be converted into the discriminator model, and outputting the loss value of the image processing model. The purpose of the discriminator model is to determine whether there is a difference between the transformed sample and the true sample.

And 6, adjusting parameters of the image processing model based on the loss value. The loss functions in this embodiment include the L1 norm loss function, the generator loss function, and the discriminator loss function. Joint Loss functions may be used in this embodiment, including L1Loss, antagonistic Loss, and Perceptual Loss (Perceptual Loss). The complete loss function is shown in equations (4.3.3) and (4.3.4), where

In order to generate the loss function of the generator,

this embodiment sets λ for the discriminator loss function₁＝λ₂＝1。

The L1Loss evaluates the similarity between two images by the average value of the L1 distance between corresponding pixel points of the two images. Converting images

With the actual image I_ycThe L1Loss between may be expressed as:

this example uses LSGAN, which is relatively stable in the training process, as the resistance loss. Inputs z-p to the generator_z(z) and target images x to p_data(x) LSGAN can be expressed as:

this embodiment adopts commonly used setting: a is 0, c is 1, and b is 1.

Because the CGAN structure is adopted, the loss functions of the final generator and the discriminator are respectively

Respectively mapping the generated image and the corresponding actual image in a pre-trained Deep convolutional neural network (Deep CNN, DCNN) feature space F by Perceptual Loss (Perceptual Loss), and mapping the feature values in the feature space through two imagesl₂The distance evaluation generates the similarity between the image and the actual image. A pre-trained VGG or inclusion is usually selected as the mapping network, and the present embodiment selects a VGG as the mapping network. The perceptual loss can be expressed as:

in the formula (10), H_l、W_lAnd C_lThe height, width and number of profiles for the l layers in the DCNN are respectively.

For two images x, x₀Characterization of the l-th layer in DCNN

Perceptual loss calculation

L2 distance. This example arrangement

And 7, continuing to execute the steps of obtaining the first training sample, the second training sample and the real sample until the loss value meets a preset training end condition, and determining the image processing model obtained by current training as a trained image processing model.

And after the parameters of the image processing model are adjusted each time, continuing to execute the next training operation until the loss value meets the preset training ending condition, ending the training, and determining the image processing model obtained by the current training as the trained image processing model. Wherein, the training end condition may be the convergence of the loss function.

The method provided by the embodiment comprises the following steps: a) the overall design of flame scene migration realizes the mapping from a fireless scene to a fire scene under the control of flame form conditions, and achieves the purpose of increasing the flame diversity by changing the flame form; b) the flame image scene migration network structure design based on the countermeasure network comprises a generator, a discriminator and a loss function. Experiments show that the flame combustion scene generated by the flame scene migration model provided by the embodiment of the invention is closer to the real situation, the background details can be better kept by the attention mechanism and the U-net structure, and the fusion between the flame adding area and the whole scene is natural, so that the flame scene migration model has higher visual reality and can meet the sample requirement of deep learning.

Example three:

corresponding to the above method embodiment, an embodiment of the present invention provides an image processing apparatus, referring to a schematic structural diagram of an image processing apparatus shown in fig. 8, the image processing apparatus including:

an image acquisition module 81 for acquiring a first image and a second image, wherein the second image contains a target object;

the image processing module 82 is used for inputting the first image and the second image into an image processing model which is trained in advance and outputting a conversion image; wherein the converted image comprises a scene of the first image and a target object; the image processing model comprises a generator model, and the image processing of the first image and the second image is carried out through the generator model, and the generator model is constructed based on an attention mechanism and a U-net structure.

In the image processing apparatus according to the embodiment of the present invention, the first image and the second image including the target object are input into the image processing model trained in advance, and the converted image is output. According to the method, the target object can be extracted from the second image, the extracted target object is transferred into the first image to obtain the converted image, the target object is added into the given scene video, the fusion of style information is completed, high-quality background details are kept, the diversity of the target object in the scene can be guaranteed, and the transferred scene has high visual reality.

The image processing module is configured to extract a target object image including a target object from the second image; splicing the target object image and the first image to obtain an image to be converted; the image to be converted comprises a scene of the first image and a target object; and inputting the image to be converted into the generator model, and outputting the converted image.

The target object is a flame.

The generator model comprises a first convolution layer and a second convolution layer with the same depth; the U-net structure is in jump connection between the first convolution layer and the second convolution layer; the attention mechanism is loaded at a designated position of the generator model; wherein, the designated position at least comprises one of the following: front, middle, or end of the generator model.

The image processing model also comprises a discriminator model; the structure of the discriminator model comprises a conditional generation antithetical network CGAN structure and a Markov discriminator PatchGAN structure.

The device also comprises a model training module, a first training sample, a second training sample and a real sample, wherein the model training module is used for acquiring the first training sample, the second training sample and the real sample; the second image contains a training object, and the real sample is an image which is shot in the scene of the first training sample and contains the training sample; extracting a training object image containing a training object from the second training sample; splicing the training object image and the first training sample to obtain a sample to be converted; the sample to be converted comprises a scene of a first training sample and a training object; inputting a sample to be converted into a generator model, and outputting a conversion sample; wherein the conversion sample comprises a scene and a training object of the first training sample; inputting the conversion sample, the real sample and the sample to be converted into a discriminator model, and outputting a loss value of the image processing model; adjusting parameters of the image processing model based on the loss values; and continuously executing the steps of obtaining the first training sample, the second training sample and the real sample until the loss value meets the preset training end condition, and determining the image processing model obtained by current training as the trained image processing model.

The loss function of the image processing model includes an L1 norm loss function, a generator loss function, and a discriminator loss function.

The image processing apparatus provided in the embodiment of the present invention has the same implementation principle and technical effect as those of the foregoing image processing method embodiment, and for brief description, reference may be made to corresponding contents in the foregoing image processing method embodiment for a part not mentioned in the embodiment of the image processing apparatus.

Example four:

the embodiment of the invention also provides electronic equipment, which is used for operating the image processing method; referring to fig. 9, an electronic device includes a memory 100 and a processor 101, where the memory 100 is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor 101 to implement the image processing method.

Further, the electronic device shown in fig. 9 further includes a bus 102 and a communication interface 103, and the processor 101, the communication interface 103, and the memory 100 are connected through the bus 102.

The Memory 100 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 103 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used. The bus 102 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 9, but this does not indicate only one bus or one type of bus.

The processor 101 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 101. The Processor 101 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the Integrated Circuit may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other Programmable logic device, discrete Gate or transistor logic device, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in ram, flash memory, rom, prom, or eprom, registers, among other storage media as is well known in the art. The storage medium is located in the memory 100, and the processor 101 reads the information in the memory 100, and completes the steps of the method of the foregoing embodiment in combination with the hardware thereof.

The embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, and when the computer-executable instructions are called and executed by a processor, the computer-executable instructions cause the processor to implement the image processing method.

The image processing method, the image processing apparatus, and the computer program product of the electronic device provided in the embodiments of the present invention include a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute the method in the foregoing method embodiments, and specific implementation may refer to the method embodiments, and will not be described herein again.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and/or the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through an intermediary, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in a specific case by those skilled in the art.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships indicated on the basis of the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element referred to must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

Finally, it should be noted that: the above-mentioned embodiments are merely specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: those skilled in the art can still make modifications or changes to the technical solutions described in the foregoing embodiments or make equivalent substitutions for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. An image processing method, characterized in that the method comprises:

acquiring a first image and a second image, wherein the second image contains a target object;

inputting the first image and the second image into an image processing model which is trained in advance, and outputting a conversion image; wherein the converted image comprises the scene of the first image and the target object; the image processing model includes a generator model by which image processing of the first image and the second image is performed, the generator model being constructed based on an attention mechanism and a U-net structure.

2. The method of claim 1, wherein the step of inputting the first image and the second image into a pre-trained image processing model and outputting a transformed image comprises:

extracting a target object image containing the target object from the second image;

splicing the target object image and the first image to obtain an image to be converted; the image to be converted comprises a scene of the first image and the target object;

and inputting the image to be converted into the generator model, and outputting a converted image.

3. The method of claim 1, wherein the target object is a flame.

4. The method of claim 1, wherein the generator model comprises a first convolutional layer and a second convolutional layer of the same depth; the U-net structure is in a jump connection between the first convolutional layer and the second convolutional layer;

the attention mechanism is loaded at a specified location of the generator model; wherein the designated location comprises at least one of: a front end, a middle end, or a tail end of the generator model.

5. The method of claim 1, wherein the image processing model further comprises a discriminator model; the structure of the discriminator model comprises a conditional generation countermeasure network CGAN structure and a Markov discriminator PatchGAN structure.

6. The method of claim 5, wherein the image processing model is trained by:

acquiring a first training sample, a second training sample and a real sample; wherein the second image contains a training object, and the real sample is an image containing the training sample taken under the scene of the first training sample;

extracting a training object image containing the training object from the second training sample;

splicing the training object image and the first training sample to obtain a sample to be converted; the sample to be converted comprises a scene of the first training sample and the training object;

inputting the sample to be converted into the generator model, and outputting a conversion sample; wherein the transformed sample comprises a scene of the first training sample and the training object;

inputting the conversion sample, the real sample and the sample to be converted into the discriminator model, and outputting a loss value of the image processing model;

adjusting parameters of the image processing model based on the loss values;

and continuing to execute the steps of obtaining the first training sample, the second training sample and the real sample until the loss value meets a preset training end condition, and determining the image processing model obtained by current training as a trained image processing model.

7. The method of claim 6, wherein the loss functions of the image processing model comprise an L1 norm loss function, a generator loss function, and a discriminator loss function.

8. An image processing apparatus, characterized in that the apparatus comprises:

the image acquisition module is used for acquiring a first image and a second image, wherein the second image contains a target object;

the image processing module is used for inputting the first image and the second image into an image processing model which is trained in advance and outputting a conversion image; wherein the converted image comprises the scene of the first image and the target object; the image processing model includes a generator model by which image processing of the first image and the second image is performed, the generator model being constructed based on an attention mechanism and a U-net structure.

9. An electronic system, characterized in that the electronic system comprises: the device comprises an image acquisition device, a processing device and a storage device;

the image acquisition equipment is used for acquiring an image;

the storage means has stored thereon a computer program which, when executed by the processing apparatus, performs the image processing method of any one of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processing device, carries out the steps of the image processing method according to any one of claims 1 to 7.