CN114418919B - Image fusion method and device, electronic equipment and storage medium - Google Patents

Image fusion method and device, electronic equipment and storage medium

Info

Publication number
CN114418919B
Authority
CN
China
Prior art keywords
image
weighted
hidden
dimensional vectors
style
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210298017.4A
Other languages
Chinese (zh)
Other versions
CN114418919A (en)
Inventor
林纯泽
王权
钱晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Datianmian White Sugar Technology Co ltd
Original Assignee
Beijing Datianmian White Sugar Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Datianmian White Sugar Technology Co ltd
Priority to CN202210298017.4A
Publication of CN114418919A
Application granted
Publication of CN114418919B
Priority to PCT/CN2022/134922 (WO2023179074A1)
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure relates to an image fusion method and apparatus, an electronic device, and a storage medium, the method including: acquiring a first image and a second image to be fused, wherein the first image and the second image have the same object; respectively coding the first image and the second image to obtain a first hidden variable corresponding to the first image and a second hidden variable corresponding to the second image; in response to the setting operation of the fusion weight of any object attribute of the same object, fusing the first hidden variable and the second hidden variable according to the set fusion weight to obtain a fused third hidden variable; and decoding the third hidden variable to obtain a fused target image. The embodiment of the disclosure can enable the fused target image to meet different fusion requirements of users.

Description

Image fusion method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an image fusion method and apparatus, an electronic device, and a storage medium.
Background
Attribute fusion of face images refers to fusing the attributes of the faces in two images. For example, a user may want to fuse image 1 and image 2 so that the face shape in the fused image is close to the face in image 1 while the face color is close to the face in image 2. However, in the related art, the two images can only be fused as a whole, so that the fusion degree of the face shape and the face color in the two images is the same; that is, the fusion of the two face attributes, face shape and face color, cannot be decoupled and controlled separately.
Disclosure of Invention
The present disclosure provides an image fusion technical scheme.
According to an aspect of the present disclosure, there is provided an image fusion method including: acquiring a first image and a second image to be fused, wherein the first image and the second image have the same object; respectively encoding the first image and the second image to obtain a first hidden variable corresponding to the first image and a second hidden variable corresponding to the second image; in response to the setting operation of the fusion weight of any object attribute of the same object, fusing the first hidden variable and the second hidden variable according to the set fusion weight to obtain a fused third hidden variable; and decoding the third hidden variable to obtain a fused target image.
In a possible implementation manner, the fusion weight includes a first weight corresponding to the first image and a second weight corresponding to the second image, where the fusing the first hidden variable and the second hidden variable according to the set fusion weight in response to the setting operation of the fusion weight for any object attribute of the same object to obtain a fused third hidden variable includes: determining a first weighted hidden variable between the first weight and the first hidden variable and a second weighted hidden variable between the second weight and the second hidden variable according to the type of the object attribute; and determining the third hidden variable according to the first weighted hidden variable and the second weighted hidden variable.
In one possible implementation, the first hidden variables are represented as M first N-dimensional vectors, the second hidden variables are represented as M second N-dimensional vectors, M and N are positive integers, the types of the object attributes include contour shapes of the object, the first weights include first sub-weights corresponding to contour shapes in the first image, and the second weights include second sub-weights corresponding to contour shapes in the second image; wherein the determining a first weighted hidden variable between the first weight and the first hidden variable and a second weighted hidden variable between the second weight and the second hidden variable according to the type of the object attribute comprises: under the condition that the object attribute comprises a contour shape, multiplying the first i first N-dimensional vectors in the M first N-dimensional vectors by the first sub-weight to obtain the first i first weighted N-dimensional vectors of the first weighted hidden variables; multiplying the first i second N-dimensional vectors in the M second N-dimensional vectors by the second sub-weight to obtain first i second weighted N-dimensional vectors of the second weighted hidden variables; wherein i ∈ [1, M).
In a possible implementation manner, the first hidden variables are represented as M first N-dimensional vectors, the second hidden variables are represented as M second N-dimensional vectors, M and N are positive integers, the types of the object attributes include appearance colors of the object, the first weights include third sub-weights corresponding to the appearance colors in the first image, and the second weights include fourth sub-weights corresponding to the appearance colors in the second image; wherein the determining a first weighted hidden variable between the first weight and the first hidden variable and a second weighted hidden variable between the second weight and the second hidden variable according to the type of the object attribute comprises: multiplying the last M-i first N-dimensional vectors of the M first N-dimensional vectors by the third sub-weight under the condition that the object attribute comprises an appearance color to obtain last M-i first weighted N-dimensional vectors of the first weighted hidden variables; multiplying the last M-i second N-dimensional vectors in the M second N-dimensional vectors by the fourth sub-weight to obtain last M-i second weighted N-dimensional vectors of the second weighted hidden variables; wherein i ∈ [1, M).
In one possible implementation, the determining the third hidden variable according to the first weighted hidden variable and the second weighted hidden variable is performed by: adding the first i first weighted N-dimensional vectors of the first weighted hidden variables to the first i second weighted N-dimensional vectors of the second weighted hidden variables to obtain first i third N-dimensional vectors of the third hidden variables; and adding the last M-i first weighted N-dimensional vectors of the first weighted hidden variables and the last M-i second weighted N-dimensional vectors of the second weighted hidden variables to obtain the last M-i third N-dimensional vectors of the third hidden variables.
In a possible implementation manner, the first weighted hidden variables are represented as M first weighted N-dimensional vectors, the second weighted hidden variables are represented as M second weighted N-dimensional vectors, the third hidden variables are represented as M third N-dimensional vectors, and the determining the third hidden variables according to the first weighted hidden variables and the second weighted hidden variables further includes: taking the last M-i first N-dimensional vectors of the first hidden variables corresponding to the first weighted hidden variables as the last M-i third N-dimensional vectors of the third hidden variables; or, the last M-i second N-dimensional vectors of the second hidden variables corresponding to the second weighted hidden variables are used as the last M-i third N-dimensional vectors of the third hidden variables.
In a possible implementation manner, the decoding processing on the third hidden variable to obtain a fused target image includes: in response to a style setting operation for an image style of the target image, determining a target generation network corresponding to the set image style, the target generation network being used for generating an image having the set image style; and decoding the third hidden variable by using the target generation network to obtain the target image.
In a possible implementation manner, the set image styles comprise a first image style and a second image style, the style types of the first image style and the second image style are different, the style setting operation is further used for setting a style fusion degree, and the style fusion degree is used for indicating the number of network layers fused between the first generation network and the second generation network; wherein, the determining a target generation network corresponding to the set image style comprises: determining a first generation network corresponding to the first image style and a second generation network corresponding to the second image style, the first generation network being used for generating images with the first image style, and the second generation network being used for generating images with the second image style; and performing network fusion on the first generation network and the second generation network according to the style fusion degree to obtain the target generation network.
In a possible implementation manner, the first generation network and the second generation network each have M network layers, where the network fusion of the first generation network and the second generation network according to the style fusion degree to obtain the target generation network includes: replacing the first I network layers of the first generation network with the first I network layers of the second generation network to obtain the target generation network; or, replacing the last I network layers of the first generation network with the last I network layers of the second generation network to obtain the target generation network; wherein the style proximity degree between the image style of the target image and the first image style is in negative correlation with the network layer number I, and the style proximity degree between the image style of the target image and the second image style is in positive correlation with the network layer number I.
In a possible implementation manner, the target generation network has M network layers, the third hidden variables are represented as M third N-dimensional vectors, and the decoding processing performed on the third hidden variables by using the target generation network to obtain the target image includes: inputting the 1st third N-dimensional vector to the 1st network layer of the target generation network to obtain the 1st intermediate graph output by the 1st network layer; inputting the mth third N-dimensional vector and the (m-1)th intermediate graph into the mth network layer of the target generation network to obtain the mth intermediate graph output by the mth network layer, wherein m ∈ [2, M); and inputting the Mth third N-dimensional vector and the (M-1)th intermediate graph into the Mth network layer of the target generation network to obtain a style fusion image output by the Mth network layer, wherein the target image comprises the style fusion image.
In one possible implementation, the target image further includes: and at least one of a first style image obtained by decoding the third hidden variable by using the first generation network and a second style image obtained by decoding the third hidden variable by using the second generation network.
According to an aspect of the present disclosure, there is provided an image fusion apparatus including: the device comprises an acquisition module, a fusion module and a fusion module, wherein the acquisition module is used for acquiring a first image and a second image to be fused, and the first image and the second image have the same object; the encoding module is used for respectively encoding the first image and the second image to obtain a first hidden variable corresponding to the first image and a second hidden variable corresponding to the second image; the fusion module is used for responding to the setting operation of the fusion weight of any object attribute of the same object, and fusing the first hidden variable and the second hidden variable according to the set fusion weight to obtain a fused third hidden variable; and the decoding module is used for decoding the third hidden variable to obtain a fused target image.
In a possible implementation manner, the fusion weight includes a first weight corresponding to the first image and a second weight corresponding to the second image, where the fusion module includes: a weighted hidden variable determining submodule, configured to determine, according to the type of the object attribute, a first weighted hidden variable between the first weight and the first hidden variable, and a second weighted hidden variable between the second weight and the second hidden variable; and the fusion submodule is used for determining the third hidden variable according to the first weighted hidden variable and the second weighted hidden variable.
In one possible implementation, the first hidden variables are represented as M first N-dimensional vectors, the second hidden variables are represented as M second N-dimensional vectors, M and N are positive integers, the types of the object attributes include contour shapes of the object, the first weights include first sub-weights corresponding to contour shapes in the first image, and the second weights include second sub-weights corresponding to contour shapes in the second image; wherein the determining a first weighted hidden variable between the first weight and the first hidden variable and a second weighted hidden variable between the second weight and the second hidden variable according to the type of the object attribute comprises: under the condition that the object attribute comprises a contour shape, multiplying the first i first N-dimensional vectors in the M first N-dimensional vectors by the first sub-weight to obtain the first i first weighted N-dimensional vectors of the first weighted hidden variables; multiplying the first i second N-dimensional vectors in the M second N-dimensional vectors by the second sub-weight to obtain first i second weighted N-dimensional vectors of the second weighted hidden variables; wherein i ∈ [1, M).
In a possible implementation manner, the first hidden variables are represented as M first N-dimensional vectors, the second hidden variables are represented as M second N-dimensional vectors, M and N are positive integers, the types of the object attributes include appearance colors of the object, the first weights include third sub-weights corresponding to appearance colors in the first image, and the second weights include fourth sub-weights corresponding to appearance colors in the second image; wherein the determining a first weighted hidden variable between the first weight and the first hidden variable and a second weighted hidden variable between the second weight and the second hidden variable according to the type of the object attribute comprises: multiplying the last M-i first N-dimensional vectors of the M first N-dimensional vectors by the third sub-weight under the condition that the object attribute comprises an appearance color to obtain last M-i first weighted N-dimensional vectors of the first weighted hidden variables; multiplying the last M-i second N-dimensional vectors in the M second N-dimensional vectors by the fourth sub-weight to obtain last M-i second weighted N-dimensional vectors of the second weighted hidden variables; wherein i ∈ [1, M).
In one possible implementation, the determining the third hidden variable according to the first weighted hidden variable and the second weighted hidden variable is performed by: adding the first i first weighted N-dimensional vectors of the first weighted hidden variables to the first i second weighted N-dimensional vectors of the second weighted hidden variables to obtain first i third N-dimensional vectors of the third hidden variables; and adding the last M-i first weighted N-dimensional vectors of the first weighted hidden variables and the last M-i second weighted N-dimensional vectors of the second weighted hidden variables to obtain the last M-i third N-dimensional vectors of the third hidden variables.
In a possible implementation manner, the first weighted hidden variables are represented as M first weighted N-dimensional vectors, the second weighted hidden variables are represented as M second weighted N-dimensional vectors, the third hidden variables are represented as M third N-dimensional vectors, and the determining the third hidden variables according to the first weighted hidden variables and the second weighted hidden variables further includes: taking the last M-i first N-dimensional vectors of the first hidden variables corresponding to the first weighted hidden variables as the last M-i third N-dimensional vectors of the third hidden variables; or, taking the last M-i second N-dimensional vectors of the second hidden variables corresponding to the second weighted hidden variables as the last M-i third N-dimensional vectors of the third hidden variables.
In one possible implementation, the decoding module includes: a network determination sub-module for determining a target generation network corresponding to a set image style in response to a style setting operation for the image style of the target image, the target generation network being for generating an image having the set image style; and the decoding submodule is used for decoding the third hidden variable by using the target generation network to obtain the target image.
In a possible implementation manner, the set image styles include a first image style and a second image style, the style types of the first image style and the second image style are different, the style setting operation is further used for setting a style fusion degree, and the style fusion degree is used for indicating the number of network layers fused between the first generation network and the second generation network; wherein, the determining a target generation network corresponding to the set image style comprises: determining a first generation network corresponding to the first image style and a second generation network corresponding to the second image style, the first generation network being used for generating images with the first image style, and the second generation network being used for generating images with the second image style; and performing network fusion on the first generation network and the second generation network according to the style fusion degree to obtain the target generation network.
In a possible implementation manner, the first generation network and the second generation network each have M network layers, where the network fusion of the first generation network and the second generation network according to the style fusion degree to obtain the target generation network includes: replacing the first I network layers of the first generation network with the first I network layers of the second generation network to obtain the target generation network; or, replacing the last I network layers of the first generation network with the last I network layers of the second generation network to obtain the target generation network; wherein the style proximity degree between the image style of the target image and the first image style is in negative correlation with the network layer number I, and the style proximity degree between the image style of the target image and the second image style is in positive correlation with the network layer number I.
In a possible implementation manner, the target generation network has M network layers, the third hidden variables are represented as M third N-dimensional vectors, and the decoding processing performed on the third hidden variables by using the target generation network to obtain the target image includes: inputting the 1st third N-dimensional vector to the 1st network layer of the target generation network to obtain the 1st intermediate graph output by the 1st network layer; inputting the mth third N-dimensional vector and the (m-1)th intermediate graph into the mth network layer of the target generation network to obtain the mth intermediate graph output by the mth network layer, wherein m ∈ [2, M); and inputting the Mth third N-dimensional vector and the (M-1)th intermediate graph into the Mth network layer of the target generation network to obtain a style fusion image output by the Mth network layer, wherein the target image comprises the style fusion image.
In one possible implementation, the target image further includes: and at least one of a first style image obtained by decoding the third hidden variable by using the first generation network and a second style image obtained by decoding the third hidden variable by using the second generation network.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiment of the disclosure, a first hidden variable corresponding to a first image and a second hidden variable corresponding to a second image are obtained by encoding the first image and the second image to be fused, the first hidden variable and the second hidden variable are fused according to the set fusion weight of any object attribute to obtain a fused third hidden variable, and the third hidden variable is decoded to obtain a target image. In this way, the fusion weight of each object attribute can be set separately, so that the fused target image can meet different fusion requirements of users.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flow chart of an image fusion method according to an embodiment of the present disclosure.
FIG. 2 shows a schematic diagram of a graphical interaction interface, in accordance with an embodiment of the present disclosure.
Fig. 3a shows a schematic diagram of a first image according to an embodiment of the present disclosure.
Fig. 3b shows a schematic diagram of a second image according to an embodiment of the present disclosure.
FIG. 4a shows a schematic diagram of a target image according to an embodiment of the present disclosure.
FIG. 4b shows a schematic diagram of a target image according to an embodiment of the present disclosure.
FIG. 5 shows a schematic diagram of a graphical user interface according to an embodiment of the present disclosure.
Fig. 6 shows a schematic diagram of an image fusion procedure according to an embodiment of the present disclosure.
Fig. 7 shows a schematic diagram of an image fusion process according to an embodiment of the present disclosure.
Fig. 8 shows a block diagram of an image fusion apparatus according to an embodiment of the present disclosure.
Fig. 9 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the subject matter of the present disclosure.
Fig. 1 shows a flowchart of an image fusion method according to an embodiment of the present disclosure, which may be performed by an electronic device such as a terminal device or a server, where the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor calling computer-readable instructions stored in a memory, or may be performed by a server. As shown in fig. 1, the image fusion method includes:
in step S11, a first image and a second image to be fused are acquired, and the first image and the second image have the same object therein.
The first image and the second image may be images acquired in real time by an image acquisition device, images extracted from a local storage, or images transmitted by other electronic devices; it should be understood that the user may upload the first image and the second image to be fused by themselves. The embodiment of the present disclosure does not limit the manner of acquiring the first image and the second image.
Among these, the object may include, but is not limited to: a human face, a human hand, a human body, an object, an animal, a plant, and the like. That the first image and the second image have the same object may be understood as meaning that the object in the first image and the object in the second image are of the same type but are not necessarily the same instance; for example, the first image and the second image may both contain human faces, but the face in the first image and the face in the second image are not the face of the same person, or the user desires to fuse two different faces in the first image and the second image.
In step S12, the first image and the second image are encoded to obtain a first hidden variable corresponding to the first image and a second hidden variable corresponding to the second image.
In a possible implementation manner, the first image and the second image may be encoded by image encoders corresponding to different objects, respectively, to obtain a first hidden variable corresponding to the first image and a second hidden variable corresponding to the second image. For example, when the object is a human face, a human face image encoder may be used, and when the object is a human body, a human body image encoder may be used.
For example, the image encoder may use a deep neural network to perform feature extraction on the first image and the second image respectively, use the first deep feature extracted from the first image as the first hidden variable, and use the second deep feature extracted from the second image as the second hidden variable. It should be understood that the embodiments of the present disclosure are not limited to the encoding manner of the first image and the second image.
In one possible implementation, the first hidden variables may be represented as M first N-dimensional vectors, the second hidden variables may be represented as M second N-dimensional vectors, and M and N are positive integers, for example, the face image encoder may encode the first image into 18 first 512-dimensional vectors and the second image into 18 second 512-dimensional vectors. In this way, the first hidden variable and the second hidden variable can be fused later.
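For illustration only, the following sketch shows how such hidden variables might be organized in code; the encoder object and its interface are hypothetical placeholders rather than part of the present disclosure:

    import numpy as np

    M, N = 18, 512  # e.g., a face image encoder producing 18 vectors of 512 dimensions

    def encode_to_latent(image, encoder):
        # Hypothetical wrapper: the encoder is assumed to map one image to M N-dimensional vectors.
        latent = encoder(image)                                    # assumed output shape: (M, N)
        return np.asarray(latent, dtype=np.float32).reshape(M, N)

    # first_latent  = encode_to_latent(first_image,  face_encoder)   # first hidden variable
    # second_latent = encode_to_latent(second_image, face_encoder)   # second hidden variable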
In step S13, in response to a setting operation of a fusion weight for any object attribute of the same object, the first hidden variable and the second hidden variable are fused according to the set fusion weight, and a fused third hidden variable is obtained.
It should be understood that fusing the two images can be roughly considered as fusing the contour shape and the appearance color of the same object in the two images, and thus the object attribute of the same object can at least include at least one of the contour shape and the appearance color. Of course, those skilled in the art may add the fusible object attribute according to the specific type of the object, for example, in the case that the object is a human face, the object attribute may further include a human face expression; in the case that the object is a human body, the object attribute may further include a human body posture, and the like, which is not limited in this embodiment of the present disclosure.
For example, in the case that the object is a human face, the first image and the second image are fused, and it can be considered that the shapes of faces and the colors of faces (including makeup color, skin color, pupil color, and the like) of two human faces in the first image and the second image are fused; when the object is a human hand, the first image and the second image are fused, and it can be considered that the hand shapes and skin colors of the two human hands in the first image and the second image are fused.
It should be understood that, a person skilled in the art may utilize software development technologies known in the art to design and implement an application program of the image fusion method of the embodiment of the present disclosure and a corresponding graphical interactive interface, where an operation control for setting a fusion weight may be provided in the graphical interactive interface, so as to implement a setting operation of the fusion weight of any object attribute by a user, and the embodiment of the present disclosure is not limited thereto.
Fig. 2 is a schematic diagram of a graphical interaction interface according to an embodiment of the present disclosure. As shown in fig. 2, a user may upload the first image "1.jpg" and the second image "01(2).jpg" by "dragging a file to this area" or "browsing a folder", and may set the fusion weight of any face attribute (e.g., "face shape", "face color") by adjusting the position of the "solid circle" on the line segment for that attribute.
In a possible implementation manner, a value range may be set for the fusion weight, for example, the value range of the fusion weight may be set to [0, 1]. In order to facilitate the fusion of the first hidden variable and the second hidden variable, the fusion weight may include a first weight corresponding to the first image and a second weight corresponding to the second image, where the first weight is applied to the first hidden variable and the second weight is applied to the second hidden variable.
Based on the value range of the fusion weight, the sum of the first weight and the second weight may be a designated value (e.g., 1), so that the user may set only the first weight to obtain the second weight, or set only the second weight to obtain the first weight, e.g., set the first weight to F, and then the second weight to 1-F.
The first weight may represent a degree of proximity of the object attribute in the fused target image and the first image, and the second weight may represent a degree of proximity of the object attribute in the fused target image and the second image. It should be understood that the greater the first weight (i.e., the smaller the second weight), the closer the object property in the target image is to the first image; conversely, the greater the second weight (i.e., the smaller the first weight), the closer the object property in the target image is to the second image. For example, in the case where the object is a human face, the greater the first weight, the closer the attribute of the human face in the target image is to the attribute of the human face in the first image.
It is found through experiments that, in the subsequent process of decoding the third hidden variable by the generation network to generate the target image, different network layers of the generation network have different degrees of sensitivity to (or different learning effects on) different object attributes: the low-resolution network layers (also called shallow network layers) of the generation network are sensitive to the contour shape, and the high-resolution network layers (also called high network layers) are sensitive to the appearance color. Therefore, the fusion weights of different object attributes may be applied to a part of the first N-dimensional vectors of the first hidden variable and a part of the second N-dimensional vectors of the second hidden variable; that is, according to the type of the object attribute, at least one first N-dimensional vector to which the first weight is applied and at least one second N-dimensional vector to which the second weight is applied are determined. In this way, the fusion degrees of different object attributes can be controlled separately, realizing decoupled fusion in which the fusion of different object attributes does not interfere with each other.
As described above, the first hidden variables may be represented as M first N-dimensional vectors, and the second hidden variables may be represented as M second N-dimensional vectors, and in a possible implementation manner, the fusing the first hidden variables and the second hidden variables according to the set fusion weight to obtain fused third hidden variables includes: multiplying the first weight by at least one first N-dimensional vector in the first hidden variable to obtain a first weighted hidden variable; multiplying the second weight by at least one second N-dimensional vector in the second hidden variable to obtain a second weighted hidden variable; and adding the first weighted hidden variable and the second weighted hidden variable to obtain a third hidden variable. In this way, the first hidden variable and the second hidden variable can be effectively fused according to the fusion weight.
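A minimal sketch of this weighted fusion, assuming the hidden variables are stored as (M, N) NumPy arrays as above and the weights have already been set by the user; the function name and the choice of taking the remaining vectors from the first hidden variable are illustrative assumptions, not part of the disclosure:

    def fuse_latents(first_latent, second_latent, first_weight, second_weight, indices):
        # first_latent, second_latent: (M, N) arrays; indices: which of the M vectors are weighted and fused
        third_latent = first_latent.copy()           # assumption: vectors outside `indices` come from the first image
        third_latent[indices] = (first_weight * first_latent[indices]
                                 + second_weight * second_latent[indices])
        return third_latent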
In step S14, the third hidden variable is decoded to obtain a fused target image.
In a possible implementation manner, the generation network may be used to decode the third hidden variable to obtain the target image. It should be understood that the embodiments of the present disclosure are not limited to the network structure, the network type, and the training manner of the generation network; for example, the generation network may be obtained by training a Generative Adversarial Network (GAN).
The generation network may be configured to generate an image with a specified image style according to the M N-dimensional vectors, where the image style may at least include a real style and a non-real style, and the non-real style may at least include a cartoon style, a European-American style, a sketch style, an oil painting style, a print style, and the like. It should be understood that the image styles of the target images obtained by decoding the third hidden variable through the generation networks corresponding to different image styles are different; for example, when the object is a face, the face in the target image may be a real-style face or a non-real-style face.
In a possible implementation manner, the user may set the image style of the target image, that is, the user may select the generation networks corresponding to different image styles to perform decoding processing on the third hidden variable. For example, in the graphical user interface shown in fig. 2, the user may set the image style at "style model 1" and "style model 2"; setting the image style is, in effect, selecting the generation network to be used.
Fig. 3a shows a schematic diagram of a first image, fig. 3b shows a schematic diagram of a second image, and fig. 4a and fig. 4b show schematic diagrams of target images according to embodiments of the present disclosure. The target image shown in fig. 4a may be a real-style target image obtained by fusing the first image shown in fig. 3a and the second image shown in fig. 3b by using the image fusion method according to the embodiment of the present disclosure; the target image shown in fig. 4b may be a cartoon-style target image obtained by fusing the first image shown in fig. 3a and the second image shown in fig. 3b by using the image fusion method according to the embodiment of the present disclosure.
In the embodiment of the disclosure, a first hidden variable corresponding to a first image and a second hidden variable corresponding to a second image are obtained by encoding the first image and the second image to be fused, the first hidden variable and the second hidden variable are fused according to the set fusion weight of any object attribute to obtain a fused third hidden variable, and the third hidden variable is decoded to obtain a target image. In this way, the fusion weight of each object attribute can be set separately, so that the fused target image can meet different fusion requirements of users.
As described above, in one possible implementation manner, the fusion weight includes a first weight corresponding to the first image and a second weight corresponding to the second image. In step S13, fusing the first hidden variable and the second hidden variable according to the set fusion weight, in response to the setting operation of the fusion weight for any object attribute of the same object, to obtain a fused third hidden variable includes:
step S131: and determining a first weighted hidden variable between the first weight and the first hidden variable and a second weighted hidden variable between the second weight and the second hidden variable according to the type of the object attribute.
In a possible implementation manner, the type of the object attribute includes at least one of a contour shape and an appearance color of the object, and in order to facilitate separately controlling a degree of fusion of the contour shape and the appearance color, that is, to implement decoupled fusion of the contour shape and the appearance color, the first weight includes at least one of a first sub-weight corresponding to the contour shape in the first image and a third sub-weight corresponding to the appearance color in the first image; the second weight comprises at least one of a second sub-weight corresponding to the outline shape in the second image and a fourth sub-weight corresponding to the appearance color in the second image.
As described above, a value range may be set for the fusion weight, for example, the value range of the fusion weight may be set to [0, 1]; and, based on the value range of the fusion weight, the sum of the first weight and the second weight may be a specified value (e.g., 1). Based on this, the sum of the first sub-weight and the second sub-weight is the specified value, and the sum of the third sub-weight and the fourth sub-weight is the specified value. Thus, the user may set only the first sub-weight to obtain the second sub-weight, or set only the second sub-weight to obtain the first sub-weight; similarly, the user may set only the third sub-weight to obtain the fourth sub-weight, or set only the fourth sub-weight to obtain the third sub-weight. For example, if the first sub-weight is set to F1, the second sub-weight is 1-F1, and if the third sub-weight is set to F2, the fourth sub-weight is 1-F2. In fig. 2, the "face shape" item may be set with a first sub-weight of 0.5 and a second sub-weight of 0.5, and the "face color" item may be set with a third sub-weight of 0.5 and a fourth sub-weight of 0.5.
The first sub-weight can represent the degree of closeness of the fused target image and the outline shape in the first image, the second sub-weight can represent the degree of closeness of the fused target image and the outline shape in the second image, the third sub-weight can represent the degree of closeness of the fused target image and the appearance color in the first image, and the fourth sub-weight can represent the degree of closeness of the fused target image and the appearance color in the second image.
It should be understood that the larger the first sub-weight (i.e., the smaller the second sub-weight), the closer the contour shape in the target image is to the first image; conversely, the larger the second sub-weight (i.e., the smaller the first sub-weight), the closer the contour shape in the target image is to the second image. The larger the third sub-weight (i.e., the smaller the fourth sub-weight), the closer the appearance color in the target image is to the first image; conversely, the larger the fourth sub-weight (i.e., the smaller the third sub-weight), the closer the appearance color in the target image is to the second image. For example, in the case where the object is a human face, the larger the first sub-weight is, the closer the face shape of the face in the target image is to the face shape in the first image, and the larger the fourth sub-weight is, the closer the face color of the face in the target image is to the face color in the second image.
As described above, different network layers of the generation network have different degrees of sensitivity to (or different learning effects on) different object attributes: the low-resolution network layers (also referred to as shallow network layers) of the generation network are sensitive to the contour shape, and the high-resolution network layers (also referred to as high network layers) are sensitive to the appearance color.
In one possible implementation manner, determining a first weighted hidden variable between a first weight and a first hidden variable and a second weighted hidden variable between a second weight and a second hidden variable according to the type of the object attribute includes: under the condition that the object attribute comprises a contour shape, multiplying the first i first N-dimensional vectors in the M first N-dimensional vectors by the first sub-weight to obtain the first i first weighted N-dimensional vectors of the first weighted hidden variables; multiplying the first i second N-dimensional vectors in the M second N-dimensional vectors by the second sub-weight to obtain first i second weighted N-dimensional vectors of the second weighted hidden variables; wherein i ∈ [1, M). By the method, the fusion degree of the outline shape of the object can be controlled, and the decoupling fusion of the outline shape and the appearance color is conveniently realized.
The first i first N-dimensional vectors in the M first N-dimensional vectors are multiplied by the first sub-weight to obtain first i first weighted N-dimensional vectors of the first weighted hidden variable, which can be understood as applying the first sub-weight to the first i first N-dimensional vectors of the first hidden variable; multiplying the first i second N-dimensional vectors of the M second N-dimensional vectors by the second sub-weights may be understood as applying the second sub-weights to the first i second N-dimensional vectors of the second hidden variables.
In a possible implementation manner, determining a first weighted hidden variable between a first weight and a first hidden variable and a second weighted hidden variable between a second weight and a second hidden variable according to a type of an object attribute includes: under the condition that the object attribute comprises an appearance color, multiplying the last M-i first N-dimensional vectors in the M first N-dimensional vectors by the third sub-weight to obtain the last M-i first weighted N-dimensional vectors of the first weighted hidden variables; multiplying the last M-i second N-dimensional vectors in the M second N-dimensional vectors by the fourth sub-weight to obtain last M-i second weighted N-dimensional vectors of the second weighted hidden variables; wherein i ∈ [1, M). By the method, the fusion degree of the appearance colors of the object can be controlled, and the decoupling fusion of the outline shape and the appearance colors is conveniently realized.
Multiplying the last M-i first N-dimensional vectors in the M first N-dimensional vectors by the third sub-weight to obtain the last M-i first weighted N-dimensional vectors of the first weighted hidden variables, wherein the third sub-weight is applied to the last M-i first N-dimensional vectors of the first hidden variables; multiplying the last M-i second N-dimensional vectors of the M second N-dimensional vectors by the fourth sub-weight, it can be understood that the fourth sub-weight is applied to the last M-i second N-dimensional vectors of the second hidden variable.
It should be understood that the value of i may be an empirical value determined by experimental testing based on the network structure of the generated network, and the disclosed embodiments are not limited thereto.
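As a rough sketch of the above, assuming the (M, N) array representation used earlier and a split index i chosen by experiment (all names here are illustrative only):

    import numpy as np

    def weighted_latents(first_latent, second_latent, i, shape_w1, shape_w2, color_w3, color_w4):
        # first_latent, second_latent: (M, N) arrays; i: split index with 1 <= i < M
        first_weighted = np.empty_like(first_latent)
        second_weighted = np.empty_like(second_latent)
        # contour shape: first and second sub-weights applied to the first i vectors
        first_weighted[:i] = shape_w1 * first_latent[:i]
        second_weighted[:i] = shape_w2 * second_latent[:i]
        # appearance color: third and fourth sub-weights applied to the last M - i vectors
        first_weighted[i:] = color_w3 * first_latent[i:]
        second_weighted[i:] = color_w4 * second_latent[i:]
        return first_weighted, second_weighted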
Step S132: and determining a third hidden variable according to the first weighted hidden variable and the second weighted hidden variable.
As described above, the first weighted hidden variables may be represented as M first weighted N-dimensional vectors, the second weighted hidden variables as M second weighted N-dimensional vectors, and the third hidden variables as M third N-dimensional vectors. In one possible implementation, the determining the third hidden variable according to the first weighted hidden variable and the second weighted hidden variable includes: adding the first i first weighted N-dimensional vectors of the first weighted hidden variables and the first i second weighted N-dimensional vectors of the second weighted hidden variables to obtain the first i third N-dimensional vectors of the third hidden variables; and adding the last M-i first weighted N-dimensional vectors of the first weighted hidden variables and the last M-i second weighted N-dimensional vectors of the second weighted hidden variables to obtain the last M-i third N-dimensional vectors of the third hidden variables. This approach can be understood as adding the first weighted hidden variable to the second weighted hidden variable to obtain the third hidden variable. By this method, the fused third hidden variable can be obtained effectively.
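Continuing the sketch above, the third hidden variable is then simply the element-wise sum of the two weighted hidden variables:

    def combine_weighted(first_weighted, second_weighted):
        # The first i vectors carry the shape-weighted terms and the last M - i carry the
        # color-weighted terms, so one element-wise sum fuses both attributes at once.
        return first_weighted + second_weighted

    # e.g., with shape_w2 = 1 - shape_w1 (F1 and 1 - F1) and color_w4 = 1 - color_w3 (F2 and 1 - F2)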
It is contemplated that the generation network may generate a target image with an unreal style, such as a cartoon style. In this case, the appearance colors of the objects in the original first image and second image have little or no effect on the appearance color of the object in the target image; in other words, the appearance color of the object in the target image depends on the unreal style corresponding to the generation network and is not affected by the appearance colors of the objects in the original first image and second image.
In a possible implementation manner, in a case where the generation network generates a target image with an unreal style, determining a third hidden variable according to the first weighted hidden variable and the second weighted hidden variable, further includes: taking the last M-i first N-dimensional vectors of the first hidden variables corresponding to the first weighted hidden variables as the last M-i third N-dimensional vectors of the third hidden variables; or, taking the last M-i second N-dimensional vectors of the second hidden variables corresponding to the second weighted hidden variables as the last M-i third N-dimensional vectors of the third hidden variables. This method can be understood as a method of selecting an appearance color in any one of the two images without fusing the appearance colors in the two images. By the method, the third hidden variable can be quickly obtained under the condition that the target image with the unreal style is generated by the generation network.
It should be noted that the appearance color of the object in the target image depends on the unreal style corresponding to the generation network, and is not affected by the appearance colors of the object in the first image and the second image, nor by the appearance color implied by the fused third hidden variable. Then, in the case that the generation network generates a target image with an unreal style, the last M-i third N-dimensional vectors of the third hidden variables may be the last M-i first N-dimensional vectors of the first hidden variables or the last M-i second N-dimensional vectors of the second hidden variables, or may be the sum of the last M-i first weighted N-dimensional vectors and the last M-i second weighted N-dimensional vectors. In the case that the generation network generates a target image with a real style, the last M-i third N-dimensional vectors of the third hidden variables are the sum of the last M-i first weighted N-dimensional vectors and the last M-i second weighted N-dimensional vectors, so that both the appearance color and the contour shape of the objects in the first image and the second image are fused in the target image.
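A sketch of this variant for an unreal style, in which the last M - i vectors are taken directly from one of the two hidden variables instead of being fused (the choice of which image supplies them is left to the user; names are illustrative):

    def combine_for_unreal_style(first_weighted, second_weighted,
                                 first_latent, second_latent, i, take_from="first"):
        third = first_weighted + second_weighted          # contour shape fused as before
        source = first_latent if take_from == "first" else second_latent
        third[i:] = source[i:]                            # appearance color vectors copied, not fused
        return third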
In the embodiment of the disclosure, fusion of different object attributes with different fusion degrees can be realized according to the types of the object attributes, the first weight, and the second weight, so that the target image obtained based on the fused third hidden variable can meet different fusion requirements of the user.
As described above, the user may set the image style of the target image, and different image styles correspond to different generation networks. In a possible implementation manner, in step S14, the decoding processing performed on the third hidden variable to obtain the fused target image includes:
step S141: in response to a style setting operation for an image style of a target image, a target generation network corresponding to the set image style is determined, the target generation network being used to generate an image having the set image style.
As described above, those skilled in the art may design and implement the application program of the image fusion method and the corresponding graphical interactive interface by using software development technologies known in the art, where the graphical interactive interface may provide an operation control for setting an image style, so as to implement a style setting operation for the image style by the user; the embodiment of the present disclosure is not limited to this. For example, in the graphical user interface shown in fig. 2, the user may set the image style at "style model 1" and "style model 2", that is, determine the target generation network to be used.
Considering that the user may desire the image style of the target image to be a blend of two image styles, for example a realistic style blended with a comic style, the user may set different image styles at "style model 1" and "style model 2" shown in fig. 2. When the user sets different image styles, in order to realize style fusion of the two image styles, the two generation networks corresponding to the two image styles can be fused at the network level to obtain a target generation network, so that a target image in which the two image styles are fused can be generated by using the fused target generation network. It should be understood that when the user sets only one image style, the target generation network is the generation network corresponding to the image style set by the user.
Fig. 5 shows a schematic diagram of a graphical user interface according to an embodiment of the present disclosure. As shown in fig. 5, the user sets different image styles at "style model 1" and "style model 2", and a style identifier "blend style 1" may be set at "style model" for the blended style of the two image styles, that is, a network identifier of the fused target generation network may be set, so that the fused target generation network is stored and the user may later invoke it directly by setting the blended style.
Step S142: and decoding the third hidden variable by using the target generation network to obtain a target image.
As described above, the third hidden variables may be represented as M third N-dimensional vectors. In a possible implementation manner, the target generation network has M network layers, and the decoding of the third hidden variables by using the target generation network to obtain the target image includes: inputting the 1st third N-dimensional vector to the 1st network layer of the target generation network to obtain the 1st intermediate map output by the 1st network layer; inputting the m-th third N-dimensional vector and the (m-1)-th intermediate map into the m-th network layer of the target generation network to obtain the m-th intermediate map output by the m-th network layer, wherein m belongs to [2, M); and inputting the M-th third N-dimensional vector and the (M-1)-th intermediate map into the M-th network layer of the target generation network to obtain a style fusion image output by the M-th network layer, wherein the target image comprises the style fusion image.
In one possible implementation, the target generation network may be used to generate images of progressively increasing resolution, and may also be referred to as a multi-layer transformation target generation network. The input of the first network layer of the target generation network is a third N-dimensional vector; thereafter, the input of each network layer comprises a third N-dimensional vector and the intermediate map output by the previous network layer, and the last network layer outputs the target image.
It is understood that the low-resolution network layers (also referred to as shallow network layers) of the target generation network first learn and generate a low-resolution intermediate map (e.g., 4 × 4 resolution); then, as the network depth increases, higher-resolution intermediate maps (e.g., 512 × 512 resolution) are gradually learned and generated; finally, the target image of the highest resolution (e.g., 1024 × 1024 resolution) is generated.
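As a hedged illustration of the layer-by-layer decoding described above, the following sketch assumes the target generation network is available as a plain list of M callable layer objects, each taking the current third N-dimensional vector and the previous intermediate map; this representation is an assumption for the sketch, not a structure defined by this disclosure.

def decode_with_target_network(layers, w3_vectors):
    # layers: the M network layers of the target generation network (callables).
    # w3_vectors: the M third N-dimensional vectors of the third hidden variable.
    assert len(layers) == len(w3_vectors)
    # The 1st network layer only receives the 1st third N-dimensional vector.
    intermediate = layers[0](w3_vectors[0], None)
    # Layers 2..M each receive the m-th vector and the (m-1)-th intermediate map,
    # producing intermediate maps of progressively higher resolution.
    for m in range(1, len(layers)):
        intermediate = layers[m](w3_vectors[m], intermediate)
    return intermediate  # the style fusion image output by the M-th network layer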
Fig. 6 is a schematic diagram illustrating an image fusion process according to an embodiment of the present disclosure. The image fusion process illustrated in fig. 6 may be an image fusion process in which the user sets one image style, and the first image, the second image and the target image in fig. 6 are all real-style images. The image fusion process illustrated in fig. 6 may include: inputting the first image and the second image into a face image encoder respectively to obtain a first hidden variable and a second hidden variable respectively; fusing the first hidden variable and the second hidden variable according to the fusion weights respectively set for the face shape and the face color to obtain a fused third hidden variable; and inputting the third hidden variable into the target generation network to obtain the fused target image.
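The flow of fig. 6 can be summarized by the hedged end-to-end sketch below; "encoder" and "generator" stand in for the face image encoder and the target generation network, and the names, array shapes and weight conventions are assumptions made for the sketch only.

import numpy as np

def fuse_face_images(encoder, generator, img1, img2,
                     shape_w1, shape_w2, color_w1, color_w2, i):
    w1 = encoder(img1)   # first hidden variable, shape (M, N)
    w2 = encoder(img2)   # second hidden variable, shape (M, N)
    w3 = np.empty_like(w1)
    # Face shape: fuse the first i vectors with the fusion weights set for the face shape.
    w3[:i] = shape_w1 * w1[:i] + shape_w2 * w2[:i]
    # Face color: fuse the last M - i vectors with the fusion weights set for the face color.
    w3[i:] = color_w1 * w1[i:] + color_w2 * w2[i:]
    return generator(w3)  # fused target image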
In the embodiment of the present disclosure, the target generation network corresponding to the set image style may be used to decode the third hidden variable, so as to effectively obtain a target image with the set image style.
As described above, the user may set two image styles, and perform network fusion on two generation networks corresponding to the two image styles to obtain the target generation network, so that the target generation network after network fusion is used to generate the target image with the two image styles being fused.
In a possible implementation manner, the set image styles include a first image style and a second image style, the first image style and the second image style are different in style type, and the style setting operation is further configured to set a style fusion degree, and the style fusion degree is used to indicate the number of network layers fused between the first generation network and the second generation network, wherein, in step S141, determining a target generation network corresponding to the set image style includes:
determining a first generation network corresponding to the first image style and a second generation network corresponding to the second image style, the first generation network being used for generating images with the first image style, and the second generation network being used for generating images with the second image style; and performing network fusion on the first generation network and the second generation network according to the style fusion degree to obtain the target generation network. In this way, network fusion between the first generation network and the second generation network can be realized according to the style fusion degree, so that the target generation network can generate a target image in which the two image styles are fused.
It should be understood that after the user sets the two image styles, the corresponding first generation network and second generation network may be invoked based on the set first image style and second image style, so as to perform network fusion on the first generation network and the second generation network.
The style fusion degree can control how close the image style of the target image is to the first image style, as well as how close it is to the second image style. Since the style fusion degree is used to indicate the number of network layers fused between the first generation network and the second generation network, the style fusion degree is smaller than the total number of network layers of the first generation network and of the second generation network.
In a possible implementation manner, the first generation network and the second generation network each have M network layers, where performing network fusion on the first generation network and the second generation network according to the style fusion degree to obtain the target generation network includes: replacing the front I network layers of the first generation network with the front I network layers of the second generation network to obtain the target generation network; or replacing the back I network layers of the first generation network with the back I network layers of the second generation network to obtain the target generation network; wherein I is the number of fused network layers, I belongs to [1, M), the style proximity degree between the image style of the target image and the first image style is negatively correlated with the number of network layers I, and the style proximity degree between the image style of the target image and the second image style is positively correlated with the number of network layers I. In this way, network fusion between the first generation network and the second generation network can be effectively realized, so that the target generation network can generate a target image in which the two image styles are fused.
Replacing the front I network layers of the first generation network with the front I network layers of the second generation network means splicing the front I network layers of the second generation network with the back M-I network layers of the first generation network. It should be understood that the value of I may be set by the user according to the style fusion requirement. For example, the style fusion degree may be set via the "face" operation control in the graphical user interface shown in fig. 5, that is, when the user sets two image styles, the fusion weight for the appearance color set by the user in the graphical user interface may be converted into the set style fusion degree; of course, an independent operation control may also be provided in the graphical user interface to set the style fusion degree, and the embodiment of the present disclosure is not limited in this respect.
Since the style proximity degree between the image style of the target image and the first image style is negatively correlated with the number of network layers I, and the style proximity degree between the image style of the target image and the second image style is positively correlated with the number of network layers I, the smaller the value of I, the larger the proportion of network layers of the first generation network in the target generation network, and the closer the target image generated by the target generation network is to the first image style (i.e., the farther it is from the second image style); the larger the value of I, the larger the proportion of network layers of the second generation network in the target generation network, and the closer the target image generated by the target generation network is to the second image style (i.e., the farther it is from the first image style).
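For illustration, the layer-replacement fusion described above can be sketched as follows, assuming each generation network is available as a plain list of its M network layers; this representation and the function name are assumptions made for the sketch only.

def fuse_generation_networks(first_layers, second_layers, I, replace="front"):
    # first_layers, second_layers: the M network layers of the first and second
    # generation networks; I in [1, M) is the style fusion degree.
    M = len(first_layers)
    assert len(second_layers) == M and 1 <= I < M
    if replace == "front":
        # Front I layers of the first network replaced by the second network's front I layers:
        # splice the second network's front I layers with the first network's back M - I layers.
        return second_layers[:I] + first_layers[I:]
    # Back I layers of the first network replaced by the second network's back I layers.
    return first_layers[:M - I] + second_layers[M - I:]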
As described above, in step S142, the target image may include the style fusion image output by the target generation network. In a possible implementation, the target image may further include at least one of: a first style image obtained by decoding the third hidden variable with the first generation network, and a second style image obtained by decoding the third hidden variable with the second generation network. The first style image and the second style image may be obtained with reference to the implementation in which the target generation network decodes the third hidden variable to obtain the style fusion image, and details are not repeated here.
Fig. 7 is a schematic diagram illustrating an image fusion process according to an embodiment of the present disclosure. The image fusion process illustrated in fig. 7 may be an image fusion process in which the user sets two image styles, and may include: inputting the first image and the second image into a face image encoder respectively to obtain a first hidden variable and a second hidden variable respectively; according to the fusion weight set for the face shape, fusing the first i first N-dimensional vectors of the first hidden variables with the first i second N-dimensional vectors of the second hidden variables to obtain the first i third N-dimensional vectors of the third hidden variables, and taking the last M-i first N-dimensional vectors of the first hidden variables or the last M-i second N-dimensional vectors of the second hidden variables as the last M-i third N-dimensional vectors of the third hidden variables; according to the set style fusion degree, performing network fusion on the first generation network corresponding to image style x and the second generation network corresponding to image style y to obtain a target generation network; and inputting the third hidden variable into the target generation network, the first generation network and the second generation network respectively to obtain a style fusion image output by the target generation network, a first style image output by the first generation network and a second style image output by the second generation network, wherein the target image comprises the style fusion image, the first style image and the second style image.
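As a brief sketch of the multi-output decoding in fig. 7, the same fused third hidden variable may be fed to the fused target generation network and to the two original generation networks; the generator objects below are placeholders assumed for illustration only.

def decode_all_styles(target_generator, first_generator, second_generator, w3):
    style_fusion_image = target_generator(w3)   # blended image style x + y
    first_style_image = first_generator(w3)     # image style x only
    second_style_image = second_generator(w3)   # image style y only
    return style_fusion_image, first_style_image, second_style_image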
According to the embodiment of the present disclosure, attribute fusion of the contour shape and the appearance color can be effectively decoupled, so that the user can set fusion weights for the contour shape and the appearance color respectively and perform fusion with different fusion degrees; fusion of images with different image styles can also be performed directly.
It is understood that the above-mentioned method embodiments of the present disclosure can be combined with each other to form combined embodiments without departing from the principle and logic; due to space limitations, details are not described in the present disclosure. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides an image fusion apparatus, an electronic device, a computer-readable storage medium and a program, all of which can be used to implement any image fusion method provided by the present disclosure; for the corresponding technical solutions and descriptions, reference may be made to the corresponding descriptions in the method section, and details are not repeated here for brevity.
Fig. 8 shows a block diagram of an image fusion apparatus according to an embodiment of the present disclosure. As shown in fig. 8, the apparatus includes:
an obtaining module 101, configured to obtain a first image and a second image to be fused, where the first image and the second image have a same object;
the encoding module 102 is configured to perform encoding processing on the first image and the second image respectively to obtain a first hidden variable corresponding to the first image and a second hidden variable corresponding to the second image;
the fusion module 103 is configured to, in response to a setting operation of a fusion weight for any object attribute of the same object, fuse the first hidden variable and the second hidden variable according to the set fusion weight to obtain a fused third hidden variable;
and the decoding module 104 is configured to perform decoding processing on the third hidden variable to obtain a fused target image.
In a possible implementation manner, the fusion weight includes a first weight corresponding to the first image and a second weight corresponding to the second image, where the fusion module 103 includes: a weighted hidden variable determining submodule, configured to determine, according to the type of the object attribute, a first weighted hidden variable between the first weight and the first hidden variable, and a second weighted hidden variable between the second weight and the second hidden variable; and the fusion submodule is used for determining the third hidden variable according to the first weighted hidden variable and the second weighted hidden variable.
In one possible implementation, the first hidden variables are represented as M first N-dimensional vectors, the second hidden variables are represented as M second N-dimensional vectors, M and N are positive integers, the types of the object attributes include contour shapes of the object, the first weights include first sub-weights corresponding to contour shapes in the first image, and the second weights include second sub-weights corresponding to contour shapes in the second image; wherein the determining, according to the type of the object attribute, a first weighted hidden variable between the first weight and the first hidden variable and a second weighted hidden variable between the second weight and the second hidden variable includes: under the condition that the object attribute comprises a contour shape, multiplying the first i first N-dimensional vectors in the M first N-dimensional vectors by the first sub-weight to obtain the first i first weighted N-dimensional vectors of the first weighted hidden variables; multiplying the first i second N-dimensional vectors in the M second N-dimensional vectors by the second sub-weight to obtain first i second weighted N-dimensional vectors of the second weighted hidden variables; wherein i ∈ [1, M).
In a possible implementation manner, the first hidden variables are represented as M first N-dimensional vectors, the second hidden variables are represented as M second N-dimensional vectors, M and N are positive integers, the types of the object attributes include appearance colors of the object, the first weights include third sub-weights corresponding to appearance colors in the first image, and the second weights include fourth sub-weights corresponding to appearance colors in the second image; wherein the determining, according to the type of the object attribute, a first weighted hidden variable between the first weight and the first hidden variable and a second weighted hidden variable between the second weight and the second hidden variable includes: under the condition that the object attribute comprises an appearance color, multiplying the last M-i first N-dimensional vectors in the M first N-dimensional vectors by the third sub-weight to obtain the last M-i first weighted N-dimensional vectors of the first weighted hidden variables; multiplying the last M-i second N-dimensional vectors in the M second N-dimensional vectors by the fourth sub-weight to obtain last M-i second weighted N-dimensional vectors of the second weighted hidden variables; wherein i ∈ [1, M).
In one possible implementation, the determining the third hidden variable according to the first weighted hidden variable and the second weighted hidden variable is performed by: adding the first i first weighted N-dimensional vectors of the first weighted hidden variables to the first i second weighted N-dimensional vectors of the second weighted hidden variables to obtain first i third N-dimensional vectors of the third hidden variables; and adding the last M-i first weighted N-dimensional vectors of the first weighted hidden variables and the last M-i second weighted N-dimensional vectors of the second weighted hidden variables to obtain the last M-i third N-dimensional vectors of the third hidden variables.
In a possible implementation manner, the first weighted hidden variables are represented as M first weighted N-dimensional vectors, the second weighted hidden variables are represented as M second weighted N-dimensional vectors, the third hidden variables are represented as M third N-dimensional vectors, and the determining the third hidden variables according to the first weighted hidden variables and the second weighted hidden variables further includes: taking the last M-i first N-dimensional vectors of the first hidden variables corresponding to the first weighted hidden variables as the last M-i third N-dimensional vectors of the third hidden variables; or, the last M-i second N-dimensional vectors of the second hidden variables corresponding to the second weighted hidden variables are used as the last M-i third N-dimensional vectors of the third hidden variables.
In a possible implementation manner, the decoding module 104 includes: a network determination sub-module for determining a target generation network corresponding to a set image style in response to a style setting operation for the image style of the target image, the target generation network being used for generating an image having the set image style; and the decoding submodule is used for decoding the third hidden variable by using the target generation network to obtain the target image.
In a possible implementation manner, the set image styles include a first image style and a second image style, the style types of the first image style and the second image style are different, the style setting operation is further used for setting a style fusion degree, and the style fusion degree is used for indicating the number of network layers fused between the first generation network and the second generation network; wherein the determining of the target generation network corresponding to the set image style comprises: determining a first generation network corresponding to the first image style and a second generation network corresponding to the second image style, the first generation network being used for generating images with the first image style, and the second generation network being used for generating images with the second image style; and performing network fusion on the first generation network and the second generation network according to the style fusion degree to obtain the target generation network.
In a possible implementation manner, the first generation network and the second generation network each have M network layers, where the performing network fusion on the first generation network and the second generation network according to the style fusion degree to obtain the target generation network includes: replacing the front I network layers of the first generation network with the front I network layers of the second generation network to obtain the target generation network; or replacing the back I network layers of the first generation network with the back I network layers of the second generation network to obtain the target generation network; wherein the style proximity degree between the image style of the target image and the first image style is negatively correlated with the number of network layers I, and the style proximity degree between the image style of the target image and the second image style is positively correlated with the number of network layers I.
In a possible implementation manner, the target generation network has M network layers, the third hidden variables are represented as M third N-dimensional vectors, and the decoding of the third hidden variables by using the target generation network to obtain the target image includes: inputting the 1st third N-dimensional vector to the 1st network layer of the target generation network to obtain the 1st intermediate map output by the 1st network layer; inputting the m-th third N-dimensional vector and the (m-1)-th intermediate map to the m-th network layer of the target generation network to obtain the m-th intermediate map output by the m-th network layer, wherein m belongs to [2, M); and inputting the M-th third N-dimensional vector and the (M-1)-th intermediate map into the M-th network layer of the target generation network to obtain a style fusion image output by the M-th network layer, wherein the target image comprises the style fusion image.
In one possible implementation, the target image further includes: and at least one of a first style image obtained by decoding the third hidden variable by using the first generation network and a second style image obtained by decoding the third hidden variable by using the second generation network.
In the embodiment of the present disclosure, a first hidden variable corresponding to the first image and a second hidden variable corresponding to the second image are obtained by encoding the first image and the second image to be fused, the first hidden variable and the second hidden variable are fused according to the fusion weight set for any object attribute to obtain a fused third hidden variable, and the third hidden variable is decoded to obtain the fused target image.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and for specific implementation, reference may be made to the description of the above method embodiments, and for brevity, details are not described here again.
Embodiments of the present disclosure also provide a computer-readable storage medium, on which computer program instructions are stored, and when executed by a processor, the computer program instructions implement the above method. The computer readable storage medium may be a volatile or non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
Embodiments of the present disclosure also provide a computer program product, which includes computer readable code or a non-volatile computer readable storage medium carrying computer readable code, when the computer readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 9 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server or terminal device. Referring to fig. 9, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, that are executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be interpreted as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or an electrical signal transmitted through an electrical wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK) or the like.
If the technical solution of the present application involves personal information, a product applying the technical solution of the present application clearly informs users of the personal information processing rules and obtains their separate consent before processing the personal information. If the technical solution of the present application involves sensitive personal information, a product applying the technical solution of the present application obtains the individual's separate consent and additionally satisfies the requirement of "explicit consent" before processing the sensitive personal information. For example, at a personal information collection device such as a camera, a clear and conspicuous notice is provided to inform people that they are entering the personal information collection range and that personal information will be collected; if a person voluntarily enters the collection range, the person is deemed to agree to the collection of his or her personal information. Alternatively, on a device that processes personal information, with the personal information processing rules announced by conspicuous identification or information, personal authorization is obtained by means of a pop-up message or by asking the person to upload his or her personal information by himself or herself; the personal information processing rules may include information such as the personal information processor, the purpose of personal information processing, the processing method, and the types of personal information to be processed.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (14)

1. An image fusion method, comprising:
acquiring a first image and a second image to be fused, wherein the first image and the second image have the same object;
respectively encoding the first image and the second image to obtain a first hidden variable corresponding to the first image and a second hidden variable corresponding to the second image;
in response to a setting operation of a fusion weight of any object attribute of the same object, fusing the first hidden variable and the second hidden variable according to the set fusion weight to obtain a fused third hidden variable, wherein the third hidden variable is obtained by applying the fusion weight of the object attribute to the first i first N-dimensional vectors of the first hidden variable and the first i second N-dimensional vectors of the second hidden variable, or to the last M-i first N-dimensional vectors of the first hidden variable and the last M-i second N-dimensional vectors of the second hidden variable, M and N are positive integers, and i belongs to [1, M);
and decoding the third hidden variable to obtain a fused target image.
2. The method of claim 1, wherein the fusion weight comprises a first weight corresponding to the first image and a second weight corresponding to the second image,
wherein, the fusing the first hidden variable and the second hidden variable according to the set fusion weight in response to the operation of setting the fusion weight for any object attribute of the same object to obtain a fused third hidden variable, includes:
determining a first weighted hidden variable between the first weight and the first hidden variable and a second weighted hidden variable between the second weight and the second hidden variable according to the type of the object attribute;
and determining the third hidden variable according to the first weighted hidden variable and the second weighted hidden variable.
3. The method according to claim 2, wherein the first hidden variables are represented as M first N-dimensional vectors, the second hidden variables are represented as M second N-dimensional vectors, M and N are positive integers, the class of the object property includes a contour shape of the object, the first weight includes a first sub-weight corresponding to a contour shape in the first image, and the second weight includes a second sub-weight corresponding to a contour shape in the second image;
wherein the determining a first weighted hidden variable between the first weight and the first hidden variable and a second weighted hidden variable between the second weight and the second hidden variable according to the type of the object attribute comprises:
under the condition that the object attribute comprises a contour shape, multiplying the first i first N-dimensional vectors in the M first N-dimensional vectors by the first sub-weight to obtain the first i first weighted N-dimensional vectors of the first weighted hidden variables; and,
multiplying the first i second N-dimensional vectors in the M second N-dimensional vectors by the second sub-weight to obtain first i second weighted N-dimensional vectors of the second weighted hidden variables; wherein i ∈ [1, M).
4. The method according to claim 2 or 3, wherein the first hidden variables are represented as M first N-dimensional vectors, the second hidden variables are represented as M second N-dimensional vectors, M and N are positive integers, the categories of the object attributes comprise appearance colors of the object, the first weights comprise third sub-weights corresponding to the appearance colors in the first image, and the second weights comprise fourth sub-weights corresponding to the appearance colors in the second image;
wherein the determining a first weighted hidden variable between the first weight and the first hidden variable and a second weighted hidden variable between the second weight and the second hidden variable according to the type of the object attribute comprises:
under the condition that the object attribute comprises an appearance color, multiplying the last M-i first N-dimensional vectors of the M first N-dimensional vectors by the third sub-weight to obtain the last M-i first weighted N-dimensional vectors of the first weighted hidden variables; and,
multiplying the last M-i second N-dimensional vectors in the M second N-dimensional vectors by the fourth sub-weight to obtain last M-i second weighted N-dimensional vectors of the second weighted hidden variables; wherein i ∈ [1, M).
5. The method of claim 2, wherein the first weighted hidden variables are represented as M first weighted N-dimensional vectors, the second weighted hidden variables are represented as M second weighted N-dimensional vectors, and the third hidden variables are represented as M third N-dimensional vectors, and wherein determining the third hidden variables from the first weighted hidden variables and the second weighted hidden variables comprises:
adding the first i first weighted N-dimensional vectors of the first weighted hidden variables to the first i second weighted N-dimensional vectors of the second weighted hidden variables to obtain first i third N-dimensional vectors of the third hidden variables;
and adding the last M-i first weighted N-dimensional vectors of the first weighted hidden variables and the last M-i second weighted N-dimensional vectors of the second weighted hidden variables to obtain the last M-i third N-dimensional vectors of the third hidden variables.
6. The method of claim 2 or 5, wherein the first weighted hidden variables are represented as M first weighted N-dimensional vectors, the second weighted hidden variables are represented as M second weighted N-dimensional vectors, the third hidden variables are represented as M third N-dimensional vectors, and the determining the third hidden variables from the first weighted hidden variables and the second weighted hidden variables further comprises:
taking the last M-i first N-dimensional vectors of the first hidden variables corresponding to the first weighted hidden variables as the last M-i third N-dimensional vectors of the third hidden variables; or,
and taking the last M-i second N-dimensional vectors of the second hidden variables corresponding to the second weighted hidden variables as the last M-i third N-dimensional vectors of the third hidden variables.
7. The method according to claim 1, wherein the decoding the third hidden variable to obtain a fused target image comprises:
in response to a style setting operation for an image style of the target image, determining a target generation network corresponding to the set image style, the target generation network being used for generating an image having the set image style;
and decoding the third hidden variable by using the target generation network to obtain the target image.
8. The method of claim 7, wherein the set image styles comprise a first image style and a second image style, the first image style and the second image style having different style types, and wherein the style setting operation is further configured to set a style fusion degree, the style fusion degree being used to indicate a number of fused network layers between the first generation network and the second generation network;
wherein, the determining a target generation network corresponding to the set image style comprises:
determining a first generation network corresponding to the first image style and a second generation network corresponding to the second image style, the first generation network being used for generating images with the first image style, and the second generation network being used for generating images with the second image style;
and performing network fusion on the first generation network and the second generation network according to the style fusion degree to obtain the target generation network.
9. The method according to claim 8, wherein the first generation network and the second generation network each have an M-layer network layer, and wherein the network merging the first generation network and the second generation network according to the style merging degree to obtain the target generation network comprises:
replacing a front I-layer network layer of the first generation network with a front I-layer network layer of the second generation network to obtain the target generation network; or,
replacing a posterior I-layer network layer of the first generation network with a posterior I-layer network layer of the second generation network to obtain the target generation network;
wherein the style proximity degree between the image style of the target image and the first image style is in negative correlation with the network layer number I, and the style proximity degree between the image style of the target image and the second image style is in positive correlation with the network layer number I.
10. The method according to claim 7, wherein the target generation network has an M-layer network layer, the third hidden variables are represented as M third N-dimensional vectors, and the decoding processing performed on the third hidden variables by using the target generation network to obtain the target image includes:
inputting the 1 st third N-dimensional vector to a layer 1 network layer of the target generation network to obtain a 1 st intermediate graph output by the layer 1 network layer;
inputting the m-th third N-dimensional vector and the (m-1)-th intermediate graph into an m-th network layer of the target generation network to obtain the m-th intermediate graph output by the m-th network layer, wherein m belongs to [2, M);
and inputting the Mth third N-dimensional vector and the M-1 th intermediate graph into the Mth layer network layer of the target generation network to obtain a style fusion image output by the Mth layer network layer, wherein the target image comprises the style fusion image.
11. The method of any of claims 8-10, wherein the target image further comprises: at least one of a first style image obtained by decoding the third hidden variable by using a first generation network and a second style image obtained by decoding the third hidden variable by using a second generation network;
wherein the first generation network is configured to generate images having the first image style, and the second generation network is configured to generate images having the second image style.
12. An image fusion apparatus, comprising:
the system comprises an acquisition module, a fusion module and a fusion module, wherein the acquisition module is used for acquiring a first image and a second image to be fused, and the first image and the second image have the same object;
the encoding module is used for respectively encoding the first image and the second image to obtain a first hidden variable corresponding to the first image and a second hidden variable corresponding to the second image;
a fusion module, configured to fuse, in response to a setting operation of a fusion weight for any object attribute of the same object, the first hidden variable and the second hidden variable according to the set fusion weight, to obtain a fused third hidden variable, where the third hidden variable is obtained by applying the fusion weight of the object attribute to first i first N-dimensional vectors of the first hidden variable and first i second N-dimensional vectors of the second hidden variable, or applying the fusion weight of the object attribute to last M-i first N-dimensional vectors of the first hidden variable and last M-i second N-dimensional vectors of the second hidden variable, where M and N are positive integers, and i belongs to [1, M);
and the decoding module is used for decoding the third hidden variable to obtain a fused target image.
13. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any of claims 1 to 11.
14. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 11.
CN202210298017.4A 2022-03-25 2022-03-25 Image fusion method and device, electronic equipment and storage medium Active CN114418919B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210298017.4A CN114418919B (en) 2022-03-25 2022-03-25 Image fusion method and device, electronic equipment and storage medium
PCT/CN2022/134922 WO2023179074A1 (en) 2022-03-25 2022-11-29 Image fusion method and apparatus, and electronic device, storage medium, computer program and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210298017.4A CN114418919B (en) 2022-03-25 2022-03-25 Image fusion method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114418919A CN114418919A (en) 2022-04-29
CN114418919B true CN114418919B (en) 2022-07-26

Family

ID=81263979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210298017.4A Active CN114418919B (en) 2022-03-25 2022-03-25 Image fusion method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114418919B (en)
WO (1) WO2023179074A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114418919B (en) * 2022-03-25 2022-07-26 北京大甜绵白糖科技有限公司 Image fusion method and device, electronic equipment and storage medium
CN116452466B (en) * 2023-06-14 2023-10-20 荣耀终端有限公司 Image processing method, device, equipment and computer readable storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020015167A1 (en) * 2018-07-17 2020-01-23 西安交通大学 Image super-resolution and non-uniform blur removal method based on fusion network
CN111583165A (en) * 2019-02-19 2020-08-25 京东方科技集团股份有限公司 Image processing method, device, equipment and storage medium
CN111652828A (en) * 2020-05-27 2020-09-11 北京百度网讯科技有限公司 Face image generation method, device, equipment and medium
CN111669587A (en) * 2020-04-17 2020-09-15 北京大学 Mimic compression method and device of video image, storage medium and terminal
US10916050B1 (en) * 2019-09-23 2021-02-09 Tencent America LLC Method and apparatus for synthesizing realistic hand poses based on blending generative adversarial networks
CN112884758A (en) * 2021-03-12 2021-06-01 国网四川省电力公司电力科学研究院 Defective insulator sample generation method and system based on style migration method
CN113255551A (en) * 2021-06-04 2021-08-13 广州虎牙科技有限公司 Training, face editing and live broadcasting method of face editor and related device
CN113706663A (en) * 2021-08-27 2021-11-26 脸萌有限公司 Image generation method, device, equipment and storage medium
CN113705316A (en) * 2021-04-13 2021-11-26 腾讯科技(深圳)有限公司 Method, device and equipment for acquiring virtual image and storage medium
CN113706577A (en) * 2021-04-08 2021-11-26 腾讯科技(深圳)有限公司 Image processing method and device and computer readable storage medium
CN113807265A (en) * 2021-09-18 2021-12-17 山东财经大学 Diversified human face image synthesis method and system
CN113850712A (en) * 2021-09-03 2021-12-28 北京达佳互联信息技术有限公司 Training method of image style conversion model, and image style conversion method and device

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10748376B2 (en) * 2017-09-21 2020-08-18 NEX Team Inc. Real-time game tracking with a mobile device using artificial intelligence
CN109993716B (en) * 2017-12-29 2023-04-14 微软技术许可有限责任公司 Image fusion transformation
US10970907B1 (en) * 2019-07-02 2021-04-06 Facebook Technologies, Llc System and method for applying an expression to an avatar
CN110796628B (en) * 2019-10-17 2022-06-07 浙江大华技术股份有限公司 Image fusion method and device, shooting device and storage medium
CN112784897B (en) * 2021-01-20 2024-03-26 北京百度网讯科技有限公司 Image processing method, device, equipment and storage medium
CN112767285B (en) * 2021-02-23 2023-03-10 北京市商汤科技开发有限公司 Image processing method and device, electronic device and storage medium
CN112766234B (en) * 2021-02-23 2023-05-12 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN112967261B (en) * 2021-03-17 2022-07-29 北京三快在线科技有限公司 Image fusion method, device, equipment and storage medium
CN113763535A (en) * 2021-09-02 2021-12-07 深圳数联天下智能科技有限公司 Characteristic latent code extraction method, computer equipment and storage medium
CN113850168A (en) * 2021-09-16 2021-12-28 百果园技术(新加坡)有限公司 Fusion method, device and equipment of face pictures and storage medium
CN114119348A (en) * 2021-09-30 2022-03-01 阿里巴巴云计算(北京)有限公司 Image generation method, apparatus and storage medium
CN114202456A (en) * 2021-11-18 2022-03-18 北京达佳互联信息技术有限公司 Image generation method, image generation device, electronic equipment and storage medium
CN114067162A (en) * 2021-11-24 2022-02-18 重庆邮电大学 Image reconstruction method and system based on multi-scale and multi-granularity feature decoupling
CN113837934B (en) * 2021-11-26 2022-02-22 北京市商汤科技开发有限公司 Image generation method and device, electronic equipment and storage medium
CN114418919B (en) * 2022-03-25 2022-07-26 北京大甜绵白糖科技有限公司 Image fusion method and device, electronic equipment and storage medium

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020015167A1 (en) * 2018-07-17 2020-01-23 西安交通大学 Image super-resolution and non-uniform blur removal method based on fusion network
CN111583165A (en) * 2019-02-19 2020-08-25 京东方科技集团股份有限公司 Image processing method, device, equipment and storage medium
US10916050B1 (en) * 2019-09-23 2021-02-09 Tencent America LLC Method and apparatus for synthesizing realistic hand poses based on blending generative adversarial networks
CN111669587A (en) * 2020-04-17 2020-09-15 北京大学 Mimic compression method and device of video image, storage medium and terminal
CN111652828A (en) * 2020-05-27 2020-09-11 北京百度网讯科技有限公司 Face image generation method, device, equipment and medium
CN112884758A (en) * 2021-03-12 2021-06-01 国网四川省电力公司电力科学研究院 Defective insulator sample generation method and system based on style migration method
CN113706577A (en) * 2021-04-08 2021-11-26 腾讯科技(深圳)有限公司 Image processing method and device and computer readable storage medium
CN113705316A (en) * 2021-04-13 2021-11-26 腾讯科技(深圳)有限公司 Method, device and equipment for acquiring virtual image and storage medium
CN113255551A (en) * 2021-06-04 2021-08-13 广州虎牙科技有限公司 Training, face editing and live broadcasting method of face editor and related device
CN113706663A (en) * 2021-08-27 2021-11-26 脸萌有限公司 Image generation method, device, equipment and storage medium
CN113850712A (en) * 2021-09-03 2021-12-28 北京达佳互联信息技术有限公司 Training method of image style conversion model, and image style conversion method and device
CN113807265A (en) * 2021-09-18 2021-12-17 山东财经大学 Diversified human face image synthesis method and system

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Chunze Lin et al. "Structure-Coherent Deep Feature Learning for Robust Face Alignment". IEEE Transactions on Image Processing. 2021, vol. 30, 1-8. *
Matthew Stewart. "Neural Style Transfer and Visualization of Convolutional Networks". Towards Data Science. 2019, 1-32. *
Xu K et al. "Multi-Exposure Image Fusion Algorithm Based on Improved Weight Function". Front Neurorobot. 2022, vol. 16, 1-11. *
Yaxu Wang et al. "A boundary finding-based spatiotemporal fusion model for vegetation index". Remote Sensing. 2021, 1-7. *
Sun Xiaofei et al. "Infrared Image Enhancement Algorithm Based on Feature Fusion". Optical Technology (online first). 2022, vol. 48, no. 2, 1-5. *
Yang Feixia. "Research on Feature Extraction and Fusion Methods for Remote Sensing Images". China Master's Theses Full-text Database, Engineering Science and Technology II. 2021, no. 2021-01, C028-45. *
Shen Shiqi. "Research on Low-Illumination Image Enhancement and Recognition Methods Based on Capsule Networks". China Excellent Master's Theses Full-text Database (Information Science and Technology). 2022, no. 2022-02, I138-661. *

Also Published As

Publication number Publication date
CN114418919A (en) 2022-04-29
WO2023179074A1 (en) 2023-09-28

Similar Documents

Publication Publication Date Title
CN114418919B (en) Image fusion method and device, electronic equipment and storage medium
KR20210057133A (en) Image processing method and apparatus, processor, electronic device and storage medium
CN109002852A (en) Image processing method, device, computer readable storage medium and computer equipment
CN112000819A (en) Multimedia resource recommendation method and device, electronic equipment and storage medium
CN111028142A (en) Image processing method, apparatus and storage medium
CN111814566A (en) Image editing method, image editing device, electronic equipment and storage medium
CN111598111B (en) Three-dimensional model generation method, device, computer equipment and storage medium
JP2020061173A (en) Answer learning device, answer learning method, answer generating device, answer generating method, and program
CN111861867B (en) Image background blurring method and device
CN116416416A (en) Training method of virtual fitting model, virtual fitting method and electronic equipment
CN112347361A (en) Method for recommending object, neural network and training method, equipment and medium thereof
CN113096001A (en) Image processing method, electronic device and readable storage medium
CN113160042B (en) Image style migration model training method and device and electronic equipment
CN114494543A (en) Action generation method and related device, electronic equipment and storage medium
CN113781164A (en) Virtual fitting model training method, virtual fitting method and related device
WO2022096944A1 (en) Method and apparatus for point cloud completion, network training method and apparatus, device, and storage medium
CN114373215A (en) Image processing method and device, electronic equipment and storage medium
CN111597401A (en) Data processing method, device, equipment and medium based on graph relation network
CN112801045B (en) Text region detection method, electronic equipment and computer storage medium
CN112116700B (en) Monocular view-based three-dimensional reconstruction method and device
CN114880709A (en) E-commerce data protection method and server applying artificial intelligence
CN114860232A (en) Page generation method and device
CN114092712A (en) Image generation method and device, readable medium and electronic equipment
CN113221794A (en) Training data set generation method, device, equipment and storage medium
CN116228895B (en) Video generation method, deep learning model training method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40067435

Country of ref document: HK