WO2023179074A1 - Image fusion method and apparatus, electronic device, storage medium, computer program, and computer program product - Google Patents

Image fusion method and apparatus, electronic device, storage medium, computer program, and computer program product

Info

Publication number
WO2023179074A1
WO2023179074A1 (PCT/CN2022/134922)
Authority
WO
WIPO (PCT)
Prior art keywords
image
latent variable
weighted
style
dimensional vectors
Prior art date
Application number
PCT/CN2022/134922
Other languages
English (en)
French (fr)
Inventor
林纯泽
王权
钱晨
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023179074A1

Classifications

    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06N3/04 Neural networks; Architecture, e.g. interconnection topology
    • G06N3/08 Neural networks; Learning methods
    • G06T9/002 Image coding using neural networks
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20221 Image fusion; Image merging
    • G06T2207/30201 Face

Definitions

  • the present disclosure relates to the field of computer technology, and in particular, to an image fusion method and device, electronic equipment, storage media, computer programs, and computer program products.
  • Attribute fusion of face images refers to the fusion of face attributes in two images.
  • For example, the user needs to fuse image 1 and image 2 such that the face shape of the face in the fused image is close to the face shape of the face in image 1, while the color of the face is close to the color of the face in image 2. The related technology can only fuse the two images as a whole, so that the degree of fusion of face shape and complexion is the same for both images; that is, it cannot achieve decoupled fusion of the two facial attributes of face shape and complexion between image 1 and image 2.
  • Embodiments of the present disclosure provide an image fusion method, which includes: acquiring a first image and a second image to be fused, where the first image and the second image contain the same type of object; encoding the first image and the second image respectively to obtain a first latent variable corresponding to the first image and a second latent variable corresponding to the second image; in response to a setting operation of a fusion weight for any object attribute of the same type of object, fusing the first latent variable and the second latent variable according to the set fusion weight to obtain a fused third latent variable; and decoding the third latent variable to obtain the fused target image.
  • An embodiment of the present disclosure provides an image fusion device, including: an acquisition module configured to acquire a first image and a second image to be fused, where the first image and the second image contain the same type of object; an encoding module configured to encode the first image and the second image respectively to obtain a first latent variable corresponding to the first image and a second latent variable corresponding to the second image; a fusion module configured to fuse, in response to a setting operation of a fusion weight for any object attribute of the same type of object, the first latent variable and the second latent variable according to the set fusion weight to obtain a fused third latent variable; and a decoding module configured to decode the third latent variable to obtain the fused target image.
  • An embodiment of the present disclosure provides an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to call instructions stored in the memory to execute the above method.
  • Embodiments of the present disclosure provide a computer-readable storage medium on which computer program instructions are stored. When the computer program instructions are executed by a processor, the above method is implemented.
  • Embodiments of the present disclosure provide a computer program that includes computer-readable code. When the computer-readable code is read and executed by a computer, some or all of the steps of the method in any embodiment of the present disclosure are implemented.
  • Embodiments of the present disclosure provide a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program. When the computer program is executed by a processor, some or all of the steps of the method in any embodiment of the present disclosure are implemented.
  • In the embodiments of the present disclosure, the first latent variable corresponding to the first image and the second latent variable corresponding to the second image are obtained; then, according to the set fusion weight of any object attribute, the first latent variable and the second latent variable are fused to obtain the fused third latent variable, and the third latent variable is decoded to obtain the target image. In this way, decoupled fusion of different object attributes can be realized based on the fusion weights that the user sets for different object attributes, and the degree of fusion of different object attributes can be controlled, so that the fused target image can meet the user's different fusion needs.
  • Figure 1 is a flow chart of an image fusion method provided by an embodiment of the present disclosure
  • Figure 2 is a schematic diagram of a graphical interactive interface provided by an embodiment of the present disclosure
  • Figure 3a is a schematic diagram of a first image provided by an embodiment of the present disclosure.
  • Figure 3b is a schematic diagram of a second image provided by an embodiment of the present disclosure.
  • Figure 4a is a schematic diagram 1 of a target image provided by an embodiment of the present disclosure.
  • Figure 4b is a schematic diagram 2 of a target image provided by an embodiment of the present disclosure.
  • Figure 5 is a schematic diagram of a graphical user interface provided by an embodiment of the present disclosure.
  • Figure 6 is a schematic diagram 1 of an image fusion process provided by an embodiment of the present disclosure.
  • Figure 7 is a schematic diagram 2 of an image fusion process provided by an embodiment of the present disclosure.
  • Figure 8 is a block diagram of an image fusion device provided by an embodiment of the present disclosure.
  • Figure 9 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
  • "Exemplary" herein means "serving as an example, instance, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or superior to other embodiments.
  • The term "A and/or B" can describe three situations: A exists alone, A and B exist simultaneously, or B exists alone.
  • The term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" can mean including any one or more elements selected from the set consisting of A, B, and C.
  • FIG. 1 shows a flow chart of an image fusion method provided by an embodiment of the present disclosure.
  • the image fusion method can be executed by an electronic device such as a terminal device or a server.
  • the terminal device can be a user equipment (User Equipment, UE), mobile equipment, user terminal, terminal, cellular phone, cordless phone, personal digital assistant (Personal Digital Assistant, PDA), handheld device, computing device, vehicle-mounted device, wearable device, etc.
  • The method may be implemented by the processor calling computer-readable instructions stored in the memory, or the method may be executed by a server.
  • the image fusion method includes:
  • Step S11 Obtain the first image and the second image to be fused, where the first image and the second image contain the same type of object.
  • The first image and the second image may be images collected in real time by an image acquisition device, images extracted from local storage, or images transmitted by other electronic devices. It should be understood that the user can upload customized first and second images to be fused.
  • the embodiment of the present disclosure does not limit the method of acquiring the first image and the second image.
  • objects may include but are not limited to: human faces, human hands, human bodies, objects, animals, plants, etc.
  • The first image and the second image contain the same type of object. It can be understood that the objects in the first image and the second image are of the same type, but they may not be the same object. For example, the first image and the second image may both contain a human face, but the faces in the two images are not the face of the same person; that is, the user expects to fuse two different faces in the first image and the second image.
  • Step S12 Encode the first image and the second image respectively to obtain the first latent variable corresponding to the first image and the second latent variable corresponding to the second image.
  • The first image and the second image can be encoded respectively through image encoders corresponding to different objects, to obtain the first latent variable corresponding to the first image and the second latent variable corresponding to the second image. For example, if the object is a human face, an image encoder for human faces can be used to encode the image; if the object is a human body, an image encoder for the human body can be used to encode the image, and so on.
  • the above-mentioned image encoder can be implemented using deep learning technology known in the art.
  • For example, the image encoder can use a deep neural network to extract features of the first image and the second image respectively; the first depth feature extracted from the first image is used as the first latent variable, and the second depth feature extracted from the second image is used as the second latent variable. It should be understood that the embodiments of the present disclosure do not limit the encoding method of the first image and the second image.
  • The first latent variable can be expressed as M first N-dimensional vectors, and the second latent variable can be expressed as M second N-dimensional vectors, where M and N are both positive integers. For example, when the object is a human face, the image encoder may encode the first image into 18 first 512-dimensional vectors and the second image into 18 second 512-dimensional vectors. In this way, the first latent variable and the second latent variable can be easily fused later.
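  • As a rough illustration of this representation (a hypothetical sketch: the stub encoder, the array shapes, and the use of NumPy are assumptions, not the patent's actual encoder), the two latent variables can be pictured as arrays of shape (M, N) = (18, 512):

```python
import numpy as np

M, N = 18, 512  # M vectors of dimension N per latent variable, as in the face example above

def encode_image(image: np.ndarray, seed: int) -> np.ndarray:
    """Stand-in for a face image encoder; a real encoder would be a deep network
    mapping the image to M N-dimensional feature vectors."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((M, N))

first_image = np.zeros((1024, 1024, 3))   # placeholder pixel data
second_image = np.zeros((1024, 1024, 3))

z1 = encode_image(first_image, seed=0)    # first latent variable: 18 first 512-dimensional vectors
z2 = encode_image(second_image, seed=1)   # second latent variable: 18 second 512-dimensional vectors
print(z1.shape, z2.shape)                 # (18, 512) (18, 512)
```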
  • Step S13 In response to the setting operation of the fusion weight for any object attribute of the same type of object, fuse the first latent variable and the second latent variable according to the set fusion weight to obtain a fused third latent variable.
  • The object attributes of the same type of object may include at least one of outline shape and appearance color. Fusing two images may be considered as fusing the outline shapes and the appearance colors of the same type of object in the two images, respectively. It should be understood that those skilled in the art can add fusible object attributes according to the type of object. For example, when the object is a human face, the object attributes can also include facial expressions; when the object is a human body, the object attributes can also include human body posture, and so on, which are not limited by the embodiments of the present disclosure.
  • For example, when the object is a human face, fusing the first image and the second image may be to fuse the face shapes and the complexions (including makeup color, skin color, pupil color, etc.) of the two faces in the first image and the second image, respectively; when the object is a human hand, fusing the first image and the second image may be to fuse the hand shapes and the skin colors of the two human hands in the first image and the second image, respectively.
  • The graphical interactive interface can provide an operation control for setting the fusion weight, which is used to implement the user's setting operation of the fusion weight for any object attribute; this is not limited by the embodiments of the present disclosure.
  • the fusion weight can be set to a certain value range, for example, the value range of the fusion weight can be set to [0,1].
  • The fusion weight may include a first weight corresponding to the first image and a second weight corresponding to the second image, where the first weight acts on the first latent variable and the second weight acts on the second latent variable. The sum of the first weight and the second weight can be a specified value (for example, 1), so that the user only needs to set the first weight, and the second weight can be obtained based on the set first weight and the specified value; or the user only needs to set the second weight, and the first weight can be obtained based on the set second weight and the specified value. For example, if the specified value is 1 and the first weight set by the user is F, the second weight can be obtained as 1-F, where F ∈ [0,1].
  • The first weight may represent the proximity between the fused target image and the object attributes in the first image, and the second weight may represent the proximity between the fused target image and the object attributes in the second image. It should be understood that the greater the first weight (that is, the smaller the second weight), the closer the object attributes in the target image are to those in the first image; conversely, the greater the second weight (that is, the smaller the first weight), the closer the object attributes in the target image are to those in the second image. For example, when the object is a human face, the greater the first weight, the closer the face attributes in the target image are to the face attributes in the first image.
  • Embodiments of the present disclosure can apply the fusion weights of different object attributes to part of the first N-dimensional vectors of the first latent variable and part of the second N-dimensional vectors of the second latent variable; that is, the vectors on which each fusion weight acts are determined according to the type of the object attribute, so that the fusion of different attributes does not interfere with each other, which ensures the fusion effect.
  • The first latent variable can be expressed as M first N-dimensional vectors, and the second latent variable can be expressed as M second N-dimensional vectors. Fusing the first latent variable and the second latent variable to obtain the fused third latent variable includes: multiplying the first weight by at least one first N-dimensional vector in the first latent variable to obtain a first weighted latent variable; multiplying the second weight by at least one second N-dimensional vector in the second latent variable to obtain a second weighted latent variable; and adding the first weighted latent variable and the second weighted latent variable to obtain the third latent variable. In this way, the first latent variable and the second latent variable can be effectively fused according to the fusion weight.
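  • A minimal sketch of this fusion step, assuming latent variables shaped as in the previous sketch and, for simplicity, a single user-set fusion weight applied to every vector (the function name and weight values are illustrative only):

```python
import numpy as np

def fuse_latents(z1: np.ndarray, z2: np.ndarray, first_weight: float) -> np.ndarray:
    """Fuse two latent variables of shape (M, N) according to a fusion weight in [0, 1]."""
    second_weight = 1.0 - first_weight   # the two weights sum to the specified value 1
    z1_weighted = first_weight * z1      # first weighted latent variable
    z2_weighted = second_weight * z2     # second weighted latent variable
    return z1_weighted + z2_weighted     # fused third latent variable

rng = np.random.default_rng(0)
z1 = rng.standard_normal((18, 512))      # first latent variable
z2 = rng.standard_normal((18, 512))      # second latent variable
z3 = fuse_latents(z1, z2, first_weight=0.7)   # object attributes lean toward the first image
```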
  • Step S14 Decode the third latent variable to obtain the fused target image.
  • the generation network can be used to decode the third latent variable to obtain the target image. It should be understood that the embodiments of the present disclosure do not limit the network structure, network type, and training method of the generating network.
  • the generating network can be obtained by training a generative adversarial network (GAN).
  • the generation network can be used to generate an image with a specified image style based on M N-dimensional vectors.
  • The image style can include, for example, at least a real style and a non-realistic style; the non-realistic style can include, for example, a comic style, a European and American style, a sketch style, an oil painting style, a print style, etc.
  • The image styles of the target images obtained by decoding the third latent variable using generation networks corresponding to different image styles are different. For example, when the object is a human face, the faces in the target image obtained by the generation network corresponding to the real style can be real-style faces, and the faces in the target image obtained by a generation network corresponding to a non-realistic style can be non-realistic-style faces.
  • In view of this, the user can set the image style of the target image; that is, the user can select generation networks corresponding to different image styles to decode the third latent variable. Based on the set image style of the target image, the corresponding target generation network is determined, and then the target generation network is used to decode the third latent variable to obtain the fused target image.
  • The user can also set a first image style of the first image and a second image style of the second image; based on the set first image style, a first generation network corresponding to the first image style is determined; based on the set second image style, a second generation network corresponding to the second image style is determined; the first generation network and the second generation network are network-fused to obtain the target generation network; and then the target generation network is used to decode the third latent variable to obtain the fused target image.
  • FIG. 2 shows a schematic diagram of a graphical interactive interface provided by an embodiment of the present disclosure.
  • As shown in Figure 2, users can upload the image "1.jpg" on control P2 by "dragging files to this area" or "browsing folders", and upload the image "01(2).jpg" on control P4 in the same way. The degree of face-shape fusion can be set by adjusting the position of the solid circle on the line segment of control P5, and the degree of face-color fusion can be set by adjusting the position of the solid circle on the line segment of control P6. The user can also select "Style Model 1" on control P1 and "Style Model 2" on control P3 to set the image style; setting the image style means selecting the image style to be adopted.
  • Among them, the image "1.jpg" corresponds to the aforementioned first image, the image "01(2).jpg" corresponds to the aforementioned second image, "face shape" and "face color" correspond to the aforementioned object attributes, Style Model 1 corresponds to the aforementioned first generation network, and Style Model 2 corresponds to the aforementioned second generation network.
  • Figure 3a shows a schematic diagram of a first image provided by an embodiment of the present disclosure.
  • Figure 3b shows a schematic diagram of a second image provided by an embodiment of the present disclosure.
  • Figure 4a shows a schematic diagram 1 of a target image provided by an embodiment of the present disclosure.
  • Figure 4b shows a schematic diagram 2 of a target image provided by an embodiment of the present disclosure.
  • the target image shown in Figure 4a may be a real-style target image obtained by fusing the first image shown in Figure 3a and the second image shown in Figure 3b according to the image fusion method of the embodiment of the present disclosure.
  • In Figure 4a, the face shape identified by S5 is obtained by fusing the face shape identified by S1 in Figure 3a and the face shape identified by S3 in Figure 3b, and the face color identified by S6 is obtained by fusing the face color identified by S2 in Figure 3a and the corresponding face color in Figure 3b.
  • the target image shown in Figure 4b may be a comic-style target image obtained by fusing the first image shown in Figure 3a and the second image shown in Figure 3b according to the image fusion method of the embodiment of the present disclosure.
  • In Figure 4b, the face shape identified by S7 is a comic-style face shape obtained by fusing the face shape identified by S1 in Figure 3a and the face shape identified by S3 in Figure 3b, and the face color identified by S8 is a comic-style face color obtained from the face color identified by S2 in Figure 3a and the corresponding face color in Figure 3b.
  • In the embodiments of the present disclosure, the first latent variable corresponding to the first image and the second latent variable corresponding to the second image are obtained; then, according to the set fusion weight of any object attribute, the first latent variable and the second latent variable are fused to obtain the fused third latent variable, and the third latent variable is decoded to obtain the target image. In this way, decoupled fusion of different object attributes can be realized based on the fusion weights that the user sets for different object attributes, and the degree of fusion of different object attributes can be controlled, so that the fused target image can meet the user's different fusion needs.
  • The fusion weight includes the first weight corresponding to the first image and the second weight corresponding to the second image. In response to the setting operation of the fusion weight for any object attribute of the same type of object, fusing the first latent variable and the second latent variable according to the set fusion weight to obtain the fused third latent variable includes:
  • Step S131 According to the type of the object attribute, determine the first weighted latent variable between the first weight and the first latent variable, and the second weighted latent variable between the second weight and the second latent variable.
  • the type of object attributes includes at least one of the outline shape and appearance color of the object.
  • The first weight includes at least one of a first sub-weight corresponding to the contour shape in the first image and a third sub-weight corresponding to the appearance color in the first image; the second weight includes at least one of a second sub-weight corresponding to the contour shape in the second image and a fourth sub-weight corresponding to the appearance color in the second image.
  • As mentioned above, the fusion weight can be set to a certain value range; for example, the value range of the fusion weight can be set to [0,1]. Based on the value range of the fusion weight, the sum of the first weight and the second weight can be a specified value (for example, 1). Based on this, the sum of the first sub-weight and the second sub-weight is the specified value, and the sum of the third sub-weight and the fourth sub-weight is also the specified value.
  • In this way, the user only needs to set the first sub-weight to obtain the second sub-weight, or only set the second sub-weight to obtain the first sub-weight; similarly, the user only needs to set the third sub-weight to obtain the fourth sub-weight, or only set the fourth sub-weight to obtain the third sub-weight.
  • Taking the specified value of 1 as an example, if the first sub-weight is set to F1, the second sub-weight is 1-F1; if the third sub-weight is set to F2, the fourth sub-weight is 1-F2, where F1, F2 ∈ [0,1]. For example, based on a first sub-weight of 0.5 set for the face shape, the second sub-weight can be obtained as 0.5; based on a third sub-weight of 0.5 set for the face color, the fourth sub-weight can be obtained as 0.5.
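  • The sub-weight arithmetic above amounts to the following (a hedged restatement; the variable names are illustrative):

```python
specified_value = 1.0
F1 = 0.5                                   # first sub-weight: contour shape, first image
F2 = 0.5                                   # third sub-weight: appearance color, first image
second_sub_weight = specified_value - F1   # contour shape, second image -> 0.5
fourth_sub_weight = specified_value - F2   # appearance color, second image -> 0.5
```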
  • The first sub-weight can represent the proximity between the fused target image and the contour shape in the first image, the second sub-weight can represent the proximity between the fused target image and the contour shape in the second image, the third sub-weight can represent the proximity between the fused target image and the appearance color in the first image, and the fourth sub-weight can represent the proximity between the fused target image and the appearance color in the second image.
  • It should be understood that the greater the first sub-weight (that is, the smaller the second sub-weight), the closer the contour shape in the target image is to the contour shape in the first image; conversely, the greater the second sub-weight (that is, the smaller the first sub-weight), the closer the contour shape in the target image is to the contour shape in the second image. Similarly, the greater the third sub-weight (that is, the smaller the fourth sub-weight), the closer the appearance color in the target image is to the appearance color in the first image; conversely, the greater the fourth sub-weight (that is, the smaller the third sub-weight), the closer the appearance color in the target image is to the appearance color in the second image. For example, when the object is a human face, the larger the first sub-weight is, the closer the face shape in the target image is to the face shape in the first image; the larger the fourth sub-weight is, the closer the face color in the target image is to the face color in the second image.
  • the low-resolution network layer of the generative network is more sensitive to the contour shape, and the high-resolution network layer is more sensitive to the appearance color.
  • Determining, according to the type of the object attribute, the first weighted latent variable between the first weight and the first latent variable, and the second weighted latent variable between the second weight and the second latent variable includes: when the object attribute includes the outline shape, multiplying the first i first N-dimensional vectors among the M first N-dimensional vectors by the first sub-weight to obtain the first i first weighted N-dimensional vectors of the first weighted latent variable; and multiplying the first i second N-dimensional vectors among the M second N-dimensional vectors by the second sub-weight to obtain the first i second weighted N-dimensional vectors of the second weighted latent variable, where i ∈ [1,M). In this way, the degree of fusion of the outline shape of the object can be controlled, which facilitates the decoupled fusion of the outline shape and the appearance color.
  • Multiplying the first i first N-dimensional vectors among the M first N-dimensional vectors by the first sub-weight to obtain the first i first weighted N-dimensional vectors of the first weighted latent variable can be understood as applying the first sub-weight to the first i first N-dimensional vectors of the first latent variable; likewise, multiplying the first i second N-dimensional vectors among the M second N-dimensional vectors by the second sub-weight can be understood as applying the second sub-weight to the first i second N-dimensional vectors of the second latent variable.
  • Determining, according to the type of the object attribute, the first weighted latent variable between the first weight and the first latent variable, and the second weighted latent variable between the second weight and the second latent variable includes: when the object attribute includes the appearance color, multiplying the last M-i first N-dimensional vectors among the M first N-dimensional vectors by the third sub-weight to obtain the last M-i first weighted N-dimensional vectors of the first weighted latent variable; and multiplying the last M-i second N-dimensional vectors among the M second N-dimensional vectors by the fourth sub-weight to obtain the last M-i second weighted N-dimensional vectors of the second weighted latent variable, where i ∈ [1,M). In this way, the degree of fusion of the object's appearance color can be controlled, which facilitates the decoupled fusion of the outline shape and the appearance color.
  • That is, multiplying the last M-i first N-dimensional vectors among the M first N-dimensional vectors by the third sub-weight to obtain the last M-i first weighted N-dimensional vectors of the first weighted latent variable means that the third sub-weight acts on the last M-i first N-dimensional vectors of the first latent variable; multiplying the last M-i second N-dimensional vectors among the M second N-dimensional vectors by the fourth sub-weight means that the fourth sub-weight acts on the last M-i second N-dimensional vectors of the second latent variable.
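  • Putting the shape case and the color case together, one possible sketch of how the sub-weights act on different parts of the latent variables (the split index i, the weight values, and the NumPy representation are assumptions for illustration):

```python
import numpy as np

M, N, i = 18, 512, 8                   # i is an empirical split point, i in [1, M)
rng = np.random.default_rng(0)
z1 = rng.standard_normal((M, N))       # first latent variable
z2 = rng.standard_normal((M, N))       # second latent variable

F1 = 0.7                               # first sub-weight (contour shape, first image)
F2 = 0.3                               # third sub-weight (appearance color, first image)

z1_weighted = np.empty_like(z1)        # first weighted latent variable
z2_weighted = np.empty_like(z2)        # second weighted latent variable

# contour shape: sub-weights act on the first i N-dimensional vectors
z1_weighted[:i] = F1 * z1[:i]
z2_weighted[:i] = (1 - F1) * z2[:i]

# appearance color: sub-weights act on the last M-i N-dimensional vectors
z1_weighted[i:] = F2 * z1[i:]
z2_weighted[i:] = (1 - F2) * z2[i:]
```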
  • The value of i may be an empirical value determined through experimental testing based on the network structure of the generation network, and the embodiments of the present disclosure do not limit this.
  • Step S132 Determine the third latent variable based on the first weighted latent variable and the second weighted latent variable.
  • The first weighted latent variable can be expressed as M first weighted N-dimensional vectors, the second weighted latent variable can be expressed as M second weighted N-dimensional vectors, and the third latent variable can be expressed as M third N-dimensional vectors.
  • Determining the third latent variable based on the first weighted latent variable and the second weighted latent variable includes: adding the first i first weighted N-dimensional vectors of the first weighted latent variable to the first i second weighted N-dimensional vectors of the second weighted latent variable to obtain the first i third N-dimensional vectors of the third latent variable, and adding the last M-i first weighted N-dimensional vectors to the last M-i second weighted N-dimensional vectors to obtain the last M-i third N-dimensional vectors of the third latent variable. This method can be understood as adding the first weighted latent variable and the second weighted latent variable to obtain the third latent variable. In this way, the fused third latent variable can be effectively obtained.
  • When the generation network generates target images with non-realistic styles, such as comic-style target images, the appearance color of the objects in the first image and the second image has little or even no impact on the appearance color of the objects in the target image, so the appearance color of the object in the target image can depend on the non-realistic style corresponding to the generation network, independent of the appearance color of the objects in the first image and the second image.
  • In this case, determining the third latent variable based on the first weighted latent variable and the second weighted latent variable may also include: using the last M-i first N-dimensional vectors of the first latent variable corresponding to the first weighted latent variable as the last M-i third N-dimensional vectors of the third latent variable; or, using the last M-i second N-dimensional vectors of the second latent variable corresponding to the second weighted latent variable as the last M-i third N-dimensional vectors of the third latent variable.
  • In other words, when the appearance color of the object in the target image depends on the non-realistic style corresponding to the generation network, and is affected neither by the appearance color of the objects in the first image and the second image nor by the appearance color implied by the fused third latent variable, the last M-i third N-dimensional vectors of the third latent variable can be the last M-i first N-dimensional vectors of the first latent variable, or the last M-i second N-dimensional vectors of the second latent variable, or the sum of the last M-i first weighted N-dimensional vectors and the last M-i second weighted N-dimensional vectors. When the target image is expected to fuse the appearance colors of the objects in the first image and the second image, the last M-i third N-dimensional vectors of the third latent variable are the sum of the above-mentioned last M-i first weighted N-dimensional vectors and the above-mentioned last M-i second weighted N-dimensional vectors. In this way, the target image can fuse the appearance colors and outline shapes of the objects in the first image and the second image.
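  • Continuing the previous sketch, the third latent variable could then be assembled as follows; the color_source switch is an assumed name standing in for the choices described above (summing the weighted color vectors, or copying them from one latent variable when a non-realistic style dictates the appearance color):

```python
import numpy as np

def build_third_latent(z1: np.ndarray, z2: np.ndarray,
                       z1_weighted: np.ndarray, z2_weighted: np.ndarray,
                       i: int, color_source: str = "sum") -> np.ndarray:
    """Assemble the third latent variable from the weighted latent variables."""
    z3 = np.empty_like(z1)
    # contour shape: the first i third N-dimensional vectors are the sum of the weighted vectors
    z3[:i] = z1_weighted[:i] + z2_weighted[:i]
    if color_source == "sum":
        # fuse the appearance colors of both images
        z3[i:] = z1_weighted[i:] + z2_weighted[i:]
    elif color_source == "first":
        # non-realistic style: the color comes from the generation network, so the
        # last M-i vectors can simply be copied from the first latent variable
        z3[i:] = z1[i:]
    else:
        # ... or copied from the second latent variable
        z3[i:] = z2[i:]
    return z3

# z3 = build_third_latent(z1, z2, z1_weighted, z2_weighted, i=8)
```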
  • In this way, fusion of different object attributes with different degrees of fusion can be realized according to the types of the object attributes as well as the first weight and the second weight, so that the target image obtained based on the fused third latent variable can satisfy users' different fusion needs.
  • the user can set the image style of the target image. Different image styles correspond to different generation networks.
  • Decoding the third latent variable to obtain the fused target image includes: Step S141: in response to a style setting operation for the image style of the target image, determining a target generation network corresponding to the set image style, where the target generation network is used to generate an image with the set image style.
  • The graphical interactive interface can provide an operation control for setting the image style, which is used to implement the user's style setting operation for the image style; this is not limited by the embodiments of the present disclosure.
  • the user can set the image style at "Style Model 1" and "Style Model 2", and the target generation network used can be determined based on the set image style.
  • When the image style of the target image is a fusion of two image styles, for example, a style that fuses a real style and a comic style, the user can set different image styles at "Style Model 1" and "Style Model 2" shown in Figure 2. In this case, the two generation networks corresponding to the two image styles can be network-fused to obtain the target generation network, and then the fused target generation network can be used to generate a target image that blends the two image styles. In other words, the target generation network is the generation network corresponding to the image style set by the user.
  • Figure 5 shows a schematic diagram of a graphical user interface provided by an embodiment of the present disclosure.
  • the user can set a style identifier that fuses two image styles at the control P7 corresponding to the "style model", such as "fusion Style 1", so that the network identifier of the fused target generation network can be determined, and the fused target generation network can be saved, so that the user can directly call the fused target generation network by setting the fused image style.
  • Step S142 Use the target generation network to decode the third latent variable to obtain the target image.
  • the third latent variable can be represented as M third N-dimensional vectors.
  • The target generation network has M network layers, and using the target generation network to decode the third latent variable to obtain the target image includes: inputting the first third N-dimensional vector into the first network layer of the target generation network to obtain the first intermediate image output by the first network layer; inputting the m-th third N-dimensional vector and the (m-1)-th intermediate image into the m-th network layer of the target generation network to obtain the m-th intermediate image output by the m-th network layer, where m ∈ [2,M); and inputting the M-th third N-dimensional vector and the (M-1)-th intermediate image into the M-th network layer of the target generation network to obtain the style fusion image output by the M-th network layer. The target image includes the style fusion image.
  • the target generation network can be used to generate images with gradually increasing resolutions, and the target generation network can also be called a multi-layer transformation target generation network.
  • The input of the first network layer of the target generation network is a third N-dimensional vector, the input of each subsequent network layer includes a third N-dimensional vector and the intermediate image output by the previous network layer, and the last network layer outputs the target image.
  • The low-resolution network layers of the target generation network (also called shallow network layers) first learn to generate a low-resolution (such as 4×4) intermediate image; then, as the depth of the network gradually increases, the network continues to learn to generate intermediate images with higher resolutions (such as 512×512), and finally generates the target image with the highest resolution (such as 1024×1024).
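  • A hedged sketch of this layer-by-layer decoding (the DummyLayer class is a placeholder; a StyleGAN-like synthesis network would be one plausible realization, which the patent itself does not name):

```python
import numpy as np
from typing import Optional

M, N = 18, 512

class DummyLayer:
    """Stand-in for one network layer of the target generation network."""
    def __init__(self, out_resolution: int):
        self.out_resolution = out_resolution

    def __call__(self, vector: np.ndarray, prev_image: Optional[np.ndarray]) -> np.ndarray:
        # A real layer would modulate convolutions with `vector` and upsample `prev_image`;
        # here we only return a placeholder image at this layer's resolution.
        return np.zeros((self.out_resolution, self.out_resolution, 3))

def decode(z3: np.ndarray, layers: list) -> np.ndarray:
    image = None
    for m, layer in enumerate(layers):   # m = 0 .. M-1
        # the m-th third N-dimensional vector plus the (m-1)-th intermediate image
        image = layer(z3[m], image)
    return image                         # style fusion image output by the M-th layer

# resolutions grow from 4x4 up to 1024x1024, two layers per resolution
resolutions = [4, 4, 8, 8, 16, 16, 32, 32, 64, 64, 128, 128, 256, 256, 512, 512, 1024, 1024]
layers = [DummyLayer(r) for r in resolutions]
target_image = decode(np.zeros((M, N)), layers)
print(target_image.shape)                # (1024, 1024, 3)
```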
  • Figure 6 shows a schematic diagram 1 of an image fusion process provided by an embodiment of the present disclosure.
  • the image fusion process shown in Figure 6 can be an image fusion process when the user sets an image style.
  • In Figure 6, the first image, the second image, and the target image are all real-style images. The image fusion process shown in Figure 6 may include: inputting the first image and the second image into the face image encoders identified by L1 and L2 respectively (corresponding to the aforementioned image encoder for human faces), and obtaining the first latent variable and the second latent variable respectively.
  • the target generation network corresponding to the set image style can be used to decode the third latent variable, so that the target image with the set image style can be effectively obtained.
  • the user can set two image styles and perform network fusion on the two generation networks corresponding to the two image styles to obtain a target generation network.
  • The target generation network after network fusion can be used to generate a target image with a style that fuses the characteristics of the two image styles.
  • The set image style includes a first image style and a second image style, the first image style and the second image style have different style types, and the style setting operation is also used to set the degree of style fusion. The degree of style fusion is used to indicate the number of network layers fused between the first generation network and the second generation network. Determining the target generation network corresponding to the set image style includes: determining a first generation network corresponding to the first image style and a second generation network corresponding to the second image style, where the first generation network is used to generate an image with the first image style and the second generation network is used to generate an image with the second image style; and performing network fusion on the first generation network and the second generation network according to the degree of style fusion to obtain the target generation network.
  • network fusion between the first generation network and the second generation network can be achieved according to the degree of style fusion, so that the target generation network can generate a target image with a fusion of the two image styles.
  • In this way, the corresponding first generation network and second generation network can be called based on the set first image style and second image style, so as to perform network fusion on the first generation network and the second generation network; by setting the degree of style fusion, the closeness of the image style of the target image to the first image style can be controlled, which also means controlling the closeness of the image style of the target image to the second image style.
  • the degree of style fusion is used to indicate the number of network layers fused between the first generating network and the second generating network, where the number of fused network layers is less than the total number of network layers of the first generating network and the second generating network.
  • The first generation network and the second generation network each have M network layers, and performing network fusion on the first generation network and the second generation network according to the degree of style fusion to obtain the target generation network includes: replacing the first I network layers of the first generation network with the first I network layers of the second generation network to obtain the target generation network; or, replacing the last I network layers of the first generation network with the last I network layers of the second generation network to obtain the target generation network, where I is the number of fused network layers, I ∈ [1,M). The style proximity between the image style of the target image and the first image style is negatively correlated with the number of network layers I, and the style proximity between the image style of the target image and the second image style is positively correlated with the number of network layers I.
  • network fusion between the first generation network and the second generation network can be effectively realized, so that the target generation network can generate a target image with a fusion of the two image styles.
  • Replacing the first I network layers of the first generation network with the first I network layers of the second generation network means splicing the first I network layers of the second generation network with the last M-I network layers of the first generation network.
  • the value of I can be customized by the user according to the style fusion requirements.
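  • A small sketch of this layer-replacement fusion, treating each generation network simply as an ordered list of M layer objects (the function and the placeholder layer names are illustrative, not from the patent):

```python
def fuse_generators(first_gen: list, second_gen: list, I: int, replace_front: bool = True) -> list:
    """Build the target generation network by swapping I network layers, I in [1, M)."""
    assert 1 <= I < len(first_gen) == len(second_gen)
    if replace_front:
        # first I layers from the second generation network, last M-I layers from the first
        return second_gen[:I] + first_gen[I:]
    # alternatively, replace the last I layers of the first generation network instead
    return first_gen[:-I] + second_gen[-I:]

first_gen = [f"A{m}" for m in range(18)]    # placeholder for the M layers of the first generation network
second_gen = [f"B{m}" for m in range(18)]   # placeholder for the M layers of the second generation network
target_gen = fuse_generators(first_gen, second_gen, I=6)
print(target_gen[:8])   # ['B0', ..., 'B5', 'A6', 'A7']; the larger I, the closer to the second image style
```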
  • For example, the degree of style fusion can be set through the "face color" operation control in the graphical interactive interface shown in Figure 5; that is, the fusion weight for the appearance color set by the user in the graphical interactive interface can be converted into the set degree of style fusion. Of course, an independent operation control can also be provided in the above graphical interactive interface to set the degree of style fusion, which is not limited by the embodiments of the present disclosure. As described above, the style proximity between the image style of the target image and the first image style is negatively correlated with the number of network layers I, and the style proximity between the image style of the target image and the second image style is positively correlated with the number of network layers I.
  • the target image may include a style fusion image output by the target generation network.
  • The target image may also include at least one of: a first style image obtained by decoding the third latent variable using the first generation network, and a second style image obtained by decoding the third latent variable using the second generation network. The implementation of decoding the third latent variable with the first generation network to obtain the first style image, and of decoding the third latent variable with the second generation network to obtain the second style image, can refer to the above-mentioned implementation of decoding the third latent variable with the target generation network to obtain the style fusion image.
  • FIG. 7 shows a second schematic diagram of an image fusion process provided by an embodiment of the present disclosure.
  • the image fusion process shown in Figure 7 can be an image fusion process when the user sets two image styles.
  • The image fusion process shown in Figure 7 may include: inputting the first image and the second image into the face image encoders identified by L6 and L7 respectively to obtain the first latent variable and the second latent variable; according to the fusion weight for the face shape identified by L8, fusing the first i first N-dimensional vectors of the first latent variable with the first i second N-dimensional vectors of the second latent variable to obtain the first i third N-dimensional vectors of the third latent variable, and using the last M-i first N-dimensional vectors of the first latent variable or the last M-i second N-dimensional vectors of the second latent variable as the last M-i third N-dimensional vectors of the third latent variable; and, according to the style fusion degree identified by L9, performing network fusion on the first generation network, identified by L10, corresponding to the first image style and the second generation network corresponding to the second image style.
  • In this way, the attribute fusion of the contour shape and the appearance color can be effectively decoupled, so that the user can set fusion weights for the contour shape and the appearance color respectively and perform fusion with different degrees of fusion; it can also directly act on the fusion of images with different image styles.
  • the present disclosure also provides image fusion devices, electronic equipment, computer-readable storage media, and programs, all of which can be used to implement any image fusion method provided by the present disclosure.
  • Figure 8 shows a block diagram of an image fusion device provided by an embodiment of the present disclosure.
  • As shown in Figure 8, the device includes: an acquisition module 101, configured to acquire a first image and a second image to be fused, where the first image and the second image contain the same type of object; an encoding module 102, configured to encode the first image and the second image respectively to obtain the first latent variable corresponding to the first image and the second latent variable corresponding to the second image; a fusion module 103, configured to fuse, in response to the setting operation of the fusion weight for any object attribute of the same type of object, the first latent variable and the second latent variable according to the set fusion weight to obtain a fused third latent variable; and a decoding module 104, configured to decode the third latent variable to obtain a fused target image.
  • the fusion weight includes a first weight corresponding to the first image, and a second weight corresponding to the second image;
  • The fusion module 103 includes: a weighted latent variable determination sub-module, configured to determine, according to the type of the object attribute, a first weighted latent variable between the first weight and the first latent variable, and a second weighted latent variable between the second weight and the second latent variable; and a fusion sub-module, configured to determine the third latent variable according to the first weighted latent variable and the second weighted latent variable.
  • The first latent variable is represented as M first N-dimensional vectors, the second latent variable is represented as M second N-dimensional vectors, M and N are positive integers, the type of the object attribute includes the outline shape of the object, the first weight includes a first sub-weight corresponding to the outline shape in the first image, and the second weight includes a second sub-weight corresponding to the outline shape in the second image; the weighted latent variable determination sub-module is configured to multiply the first i first N-dimensional vectors among the M first N-dimensional vectors by the first sub-weight to obtain the first i first weighted N-dimensional vectors of the first weighted latent variable, and to multiply the first i second N-dimensional vectors among the M second N-dimensional vectors by the second sub-weight to obtain the first i second weighted N-dimensional vectors of the second weighted latent variable, where i ∈ [1,M).
  • The first latent variable is represented as M first N-dimensional vectors, the second latent variable is represented as M second N-dimensional vectors, M and N are positive integers, the type of the object attribute includes the appearance color of the object, the first weight includes a third sub-weight corresponding to the appearance color in the first image, and the second weight includes a fourth sub-weight corresponding to the appearance color in the second image; the weighted latent variable determination sub-module is configured to multiply the last M-i first N-dimensional vectors among the M first N-dimensional vectors by the third sub-weight to obtain the last M-i first weighted N-dimensional vectors of the first weighted latent variable, and to multiply the last M-i second N-dimensional vectors among the M second N-dimensional vectors by the fourth sub-weight to obtain the last M-i second weighted N-dimensional vectors of the second weighted latent variable, where i ∈ [1,M).
  • The first weighted latent variable is represented as M first weighted N-dimensional vectors, the second weighted latent variable is represented as M second weighted N-dimensional vectors, and the third latent variable is represented as M third N-dimensional vectors; the fusion sub-module is configured to add the first i first weighted N-dimensional vectors of the first weighted latent variable to the first i second weighted N-dimensional vectors of the second weighted latent variable to obtain the first i third N-dimensional vectors of the third latent variable.
  • The first weighted latent variable is represented as M first weighted N-dimensional vectors, the second weighted latent variable is represented as M second weighted N-dimensional vectors, and the third latent variable is represented as M third N-dimensional vectors; the fusion sub-module is configured to use the last M-i first N-dimensional vectors of the first latent variable corresponding to the first weighted latent variable as the last M-i third N-dimensional vectors of the third latent variable, or to use the last M-i second N-dimensional vectors of the second latent variable corresponding to the second weighted latent variable as the last M-i third N-dimensional vectors of the third latent variable.
  • The decoding module 104 includes: a network determination sub-module, configured to determine, in response to a style setting operation for the image style of the target image, a target generation network corresponding to the set image style, where the target generation network is used to generate an image with the set image style; and a decoding sub-module, configured to use the target generation network to decode the third latent variable to obtain the target image.
  • The set image style includes a first image style and a second image style, the first image style and the second image style have different style types, the style setting operation is also used to set the degree of style fusion, and the degree of style fusion is used to indicate the number of network layers fused between the first generation network and the second generation network; the network determination sub-module is configured to determine a first generation network corresponding to the first image style and a second generation network corresponding to the second image style, where the first generation network is used to generate an image with the first image style and the second generation network is used to generate an image with the second image style, and to perform network fusion on the first generation network and the second generation network according to the degree of style fusion to obtain the target generation network.
  • The first generation network and the second generation network each have M network layers; the network determination sub-module is configured to replace the first I network layers of the first generation network with the first I network layers of the second generation network to obtain the target generation network, or to replace the last I network layers of the first generation network with the last I network layers of the second generation network to obtain the target generation network, where I is the number of fused network layers, I ∈ [1,M); the style proximity between the image style of the target image and the first image style is negatively correlated with the number of network layers I, and the style proximity between the image style of the target image and the second image style is positively correlated with the number of network layers I.
  • The target generation network has M network layers, and the third latent variable is represented as M third N-dimensional vectors; the decoding sub-module is configured to input the first third N-dimensional vector into the first network layer of the target generation network to obtain the first intermediate image output by the first network layer; input the m-th third N-dimensional vector and the (m-1)-th intermediate image into the m-th network layer of the target generation network to obtain the m-th intermediate image output by the m-th network layer, where m ∈ [2,M); and input the M-th third N-dimensional vector and the (M-1)-th intermediate image into the M-th network layer of the target generation network to obtain the style fusion image output by the M-th network layer, where the target image includes the style fusion image.
  • The target image further includes at least one of: a first style image obtained by decoding the third latent variable using the first generation network, and a second style image obtained by decoding the third latent variable using the second generation network.
  • In the embodiments of the present disclosure, the first latent variable corresponding to the first image and the second latent variable corresponding to the second image are obtained; then, according to the set fusion weight of any object attribute, the first latent variable and the second latent variable are fused to obtain the fused third latent variable, and the third latent variable is decoded to obtain the target image. In this way, decoupled fusion of different object attributes can be realized based on the fusion weights that the user sets for different object attributes, and the degree of fusion of different object attributes can be controlled, so that the fused target image can meet the user's different fusion needs.
  • The functions or modules included in the device provided by the embodiments of the present disclosure can be used to execute the method described in the above method embodiments.
  • Embodiments of the present disclosure also provide a computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the above method is implemented.
  • Computer-readable storage media may be volatile or non-volatile computer-readable storage media.
  • An embodiment of the present disclosure also provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to call instructions stored in the memory to execute the above method.
  • Embodiments of the present disclosure also provide a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying the computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
  • the electronic device may be provided as a terminal, a server, or other forms of equipment.
  • FIG. 9 shows a block diagram of an electronic device provided by an embodiment of the present disclosure.
  • the electronic device 1900 may be provided as a server or terminal device.
  • electronic device 1900 includes a processing component 1922, which may include one or more processors, and memory resources, represented by memory 1932, for storing instructions, such as application programs, executable by processing component 1922.
  • the application program stored in memory 1932 may include one or more modules, each corresponding to a set of instructions.
  • the processing component 1922 is configured to execute instructions to perform the above-described method.
  • Electronic device 1900 may also include a power supply component 1926 configured to perform power management of electronic device 1900, a wired or wireless network interface 1950 configured to connect electronic device 1900 to a network, and an input-output (I/O) interface 1958 .
  • a non-volatile computer-readable storage medium is also provided, such as a memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the above method.
  • the present disclosure may be a system, method, and/or computer program product.
  • a computer program product may include a computer-readable storage medium having thereon computer-readable program instructions for causing a processor to implement aspects of the present disclosure.
  • Computer-readable storage media may be tangible devices that can retain and store instructions for use by an instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above.
  • computer-readable storage media, as used herein, are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber-optic cables), or electrical signals transmitted through electrical wires.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to various computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage on a computer-readable storage medium in the respective computing/processing device.
  • the flowcharts and/or block diagrams in the figures describe aspects of the methods, apparatuses (systems), and computer program products according to embodiments of the present disclosure; each block, and combinations of blocks, can be implemented by computer-readable program instructions. These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • these computer-readable program instructions may also be stored in a computer-readable storage medium; the instructions cause a computer, a programmable data processing apparatus, and/or other devices to work in a specific manner, so that the computer-readable medium storing the instructions constitutes an article of manufacture that includes instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices, causing a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer-implemented process, so that the instructions executed on the computer, other programmable apparatus, or other devices implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in the flowcharts or block diagrams may represent a module, program segment, or portion of instructions that contains one or more executable instructions for implementing the specified logical function(s).
  • in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures; for example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • the computer program product may be implemented in hardware, software or a combination thereof.
  • the computer program product is embodied as a computer storage medium.
  • the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), etc.
  • if the technical solution of the present disclosure involves personal information, a product applying the technical solution clearly informs users of the personal information processing rules and obtains the individual's independent consent before processing personal information.
  • if the technical solution of the present disclosure involves sensitive personal information, a product applying the technical solution obtains the individual's separate consent before processing the sensitive personal information and, at the same time, meets the requirement of "express consent"; for example, a clear and conspicuous sign is set up at a personal information collection device such as a camera to inform individuals that they have entered the scope of personal information collection and that personal information will be collected.
  • the personal information processing rules may include information such as the personal information processor, the purpose of processing personal information, the processing method, and the types of personal information processed.
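
The bullets above summarize the full pipeline: encode both images, fuse the latent variables according to per-attribute fusion weights, then decode with a generation network. The following is a minimal, non-authoritative Python/NumPy sketch of that flow. The `encoder` and `generator` callables, the latent shape (M, N) = (18, 512), and the shape/colour row split `i` are assumptions drawn from the examples in the description, not a definitive implementation.

```python
import numpy as np

M, N = 18, 512  # example from the description: each image -> 18 vectors of 512 dims

def fuse_images(img1, img2, shape_weight, color_weight, i, encoder, generator):
    """End-to-end sketch: encode, fuse latents per attribute weights, decode.

    encoder:   stand-in for a pretrained face-image encoder returning an (M, N) latent
    generator: stand-in for a style-specific generation network taking an (M, N) latent
    shape_weight / color_weight: the first image's sub-weights in [0, 1];
    the second image's sub-weights are the complements (1 - weight).
    Rows [0, i) of the latent are treated as contour/shape, rows [i, M) as
    appearance colour, with i in [1, M).
    """
    w1, w2 = encoder(img1), encoder(img2)            # first / second latent, each (M, N)
    w3 = np.empty_like(w1)                           # fused third latent
    w3[:i] = shape_weight * w1[:i] + (1.0 - shape_weight) * w2[:i]
    w3[i:] = color_weight * w1[i:] + (1.0 - color_weight) * w2[i:]
    return generator(w3)                             # fused target image
```

A concrete system would substitute a trained face-image encoder and a trained, style-specific generation network for the two stand-ins.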

Abstract

The present disclosure relates to an image fusion method and apparatus, an electronic device, a storage medium, a computer program, and a computer program product. The method includes: acquiring a first image and a second image to be fused, the first image and the second image containing the same kind of object; encoding the first image and the second image respectively to obtain a first latent variable corresponding to the first image and a second latent variable corresponding to the second image; in response to a setting operation of a fusion weight for any object attribute of the same kind of object, fusing the first latent variable and the second latent variable according to the set fusion weight to obtain a fused third latent variable; and decoding the third latent variable to obtain a fused target image. The embodiments of the present disclosure enable the fused target image to meet different fusion needs of users.

Description

图像融合方法及装置、电子设备、存储介质、计算机程序、计算机程序产品
相关申请的交叉引用
本公开基于申请号为202210298017.4、申请日为2022年03月25日、申请名称为“图像融合方法及装置、电子设备和存储介质”的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本公开作为参考。
技术领域
本公开涉及计算机技术领域,尤其涉及一种图像融合方法及装置、电子设备、存储介质、计算机程序、计算机程序产品。
背景技术
人脸图像的属性融合指的是对两张图像中人脸属性进行融合,例如,用户需要融合图像1与图像2,且融合后图像中人脸的脸型接近图像1中的人脸的脸型,而人脸的脸色接近图像2中的人脸的脸色。但相关技术只能将两张图像整体进行融合,这样两张图像中脸型与脸色的融合程度是相同的,也即无法控制将图像1与图像2中的脸型与脸色这两种人脸属性解耦开融合。
发明内容
本公开实施例提供了一种图像融合方法,包括:获取待融合的第一图像与第二图像,所述第一图像与所述第二图像中有同一种对象;分别对所述第一图像与所述第二图像进行编码处理,得到所述第一图像对应的第一隐变量以及所述第二图像对应的第二隐变量;响应于针对所述同一种对象的任一对象属性的融合权重的设置操作,根据设置的融合权重,对所述第一隐变量与所述第二隐变量进行融合,得到融合后的第三隐变量;对所述第三隐变量进行解码处理,得到融合后的目标图像。
本公开实施例提供了一种图像融合装置,包括:获取模块,配置为获取待融合的第一图像与第二图像,所述第一图像与所述第二图像中有同一种对象;编码模块,配置为分别对所述第一图像与所述第二图像进行编码处理,得到所述第一图像对应的第一隐变量以及所述第二图像对应的第二隐变量;融合模块,配置为响应于针对所述同一种对象的任一对象属性的融合权重的设置操作,根据设置的融合权重,对所述第一隐变量与所述第二隐变量进行融合,得到融合后的第三隐变量;解码模块,配置为对所述第三隐变 量进行解码处理,得到融合后的目标图像。
本公开实施例提供了一种电子设备,包括:处理器;用于存储处理器可执行指令的存储器;其中,所述处理器被配置为调用所述存储器存储的指令,以执行上述方法。
本公开实施例提供了一种计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述方法。
本公开实施例提供一种计算机程序,所述计算机程序包括计算机可读代码,在所述计算机可读代码被计算机读取并执行的情况下,实现本公开任一实施例中的方法的部分或全部步骤。
本公开实施例提供一种计算机程序产品,所述计算机程序产品包括存储了计算机程序的非瞬时性计算机可读存储介质,所述计算机程序被计算机读取并执行时,实现本公开任一实施例中的方法的部分或全部步骤。
在本公开实施例中,通过对待融合的第一图像与第二图像进行编码处理,得到第一图像对应的第一隐变量以及第二图像对应的第二隐变量,再根据设置的任一对象属性的融合权重,对第一隐变量与第二隐变量进行融合,得到融合后的第三隐变量,并对第三隐变量进行解码处理,得到目标图像,可以实现基于用户设置的不同对象属性的融合权重,实现不同对象属性的解耦融合,并且还可控制不同对象属性的融合程度,使得融合后的目标图像满足用户的不同融合需求。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,而非限制本公开。根据下面参考附图对示例性实施例的详细说明,本公开的其它特征及方面将变得清楚。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开的技术方案。
图1为本公开实施例提供的一种图像融合方法的流程图;
图2为本公开实施例提供的一种图形交互界面的示意图;
图3a为本公开实施例提供的一种第一图像的示意图;
图3b为本公开实施例提供的一种第二图像的示意图;
图4a为本公开实施例提供的一种目标图像的示意图一;
图4b为本公开实施例提供的一种目标图像的示意图二;
图5为本公开实施例提供的一种图形用户界面的示意图;
图6为本公开实施例提供的一种图像融合流程的示意图一;
图7为本公开实施例提供的一种图像融合流程的示意图二;
图8为本公开实施例提供的一种图像融合装置的框图;
图9为本公开实施例提供的一种电子设备的框图。
具体实施方式
以下将参考附图详细说明本公开的各种示例性实施例、特征和方面。附图中相同的附图标记表示功能相同或相似的元件。尽管在附图中示出了实施例的各种方面,但是除非特别指出,不必按比例绘制附图。
在这里专用的词“示例性”意为“用作例子、实施例或说明性”。这里作为“示例性”所说明的任何实施例不必解释为优于或好于其它实施例。
本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中术语“至少一种”表示多种中的任意一种或多种中的至少两种的任意组合,例如,包括A、B、C中的至少一种,可以表示包括从A、B和C构成的集合中选择的任意一个或多个元素。
另外,为了更好地说明本公开,在下文的实施方式中给出了众多的细节。本领域技术人员应当理解,没有某些细节,本公开同样可以实施。在一些实例中,对于本领域技术人员熟知的方法、手段、元件和电路未作详细描述,以便于凸显本公开的主旨。
图1示出了本公开实施例提供的一种图像融合方法的流程图,所述图像融合方法可以由终端设备或服务器等电子设备执行,终端设备可以为用户设备(User Equipment,UE)、移动设备、用户终端、终端、蜂窝电话、无绳电话、个人数字助理(Personal Digital Assistant,PDA)、手持设备、计算设备、车载设备、可穿戴设备等,所述方法可以通过处理器调用存储器中存储的计算机可读指令的方式来实现,或者,可通过服务器执行所述方法。如图1所示,所述图像融合方法包括:
步骤S11:获取待融合的第一图像与第二图像,第一图像与第二图像中有同一种对象。其中,第一图像与第二图像可以是通过图像采集设备实时采集的图像,还可以是从本地存储中提取的图像,或还可以是由其它电子设备传输的图像,应理解的是,用户可以自定义上传待融合的第一图像与第二图像。对于第一图像与第二图像的获取方式,本公开实施例不作限制。
其中,对象可以包括但不限于:人脸、人手、人体、物品、动物、植物等。第一图像与第二图像中有同一种对象,可以理解为,第一图像与第二图像中的对象是同一种,但可能不是同一个,例如,第一图像与第二图像中可以都有人脸,但第一图像与第二图像中的人脸不是同一个人的人脸,或者说用户期望将第一图像与第二图像中的两张不同人脸进行融合。
步骤S12:分别对第一图像与第二图像进行编码处理,得到第一图像对应的第一隐变量以及第二图像对应的第二隐变量。
在一种可能的实现方式中,可以通过不同对象所对应的图像编码器,分别对第一图像与第二图像进行编码处理,得到第一图像对应的第一隐变量以及第二图像对应的第二隐变量。例如对象为人脸的情况下,可以采用针对人脸的图像编码器对图像进行编码处 理;对象为人体的情况下,可以采用针对人体的图像编码器对图像进行编码处理等。其中,可以采用本领域已知的深度学习技术实现上述图像编码器,例如,图像编码器可以采用深度神经网络,分别对第一图像与第二图像进行特征提取,将提取的第一图像中的第一深度特征作为第一隐变量,将提取的第二图像中的第二深度特征作为第二隐变量。应理解的是,针对第一图像与第二图像的编码方式,本公开实施例不作限制。
在一种可能的实现方式中,第一隐变量可以表示为M个第一N维向量,第二隐变量可以表示为M个第二N维向量,M与N为正整数,例如,人脸图像编码器可以将第一图像编码成18个第一512维向量,将第二图像编码成18个第二512维向量。其中,M、N均为正整数。通过该方式,可以便于之后融合第一隐变量与第二隐变量。
步骤S13:响应于针对同一种对象的任一对象属性的融合权重的设置操作,根据设置的融合权重,对第一隐变量与第二隐变量进行融合,得到融合后的第三隐变量。
在本公开实施例中,同一种对象的对象属性可以至少包括轮廓形状与外观颜色中的至少一种,将两张图像进行融合可以认为是将两张图像中该同一种对象的轮廓形状与外观颜色进行融合。应理解的是,本领域技术人员可以根据对象的种类增加可融合的对象属性,例如,对象为人脸的情况下,对象属性还可以包括人脸表情;对象为人体的情况下,对象属性还可以包括人体姿态等,对此本公开实施例不作限制。示例性的,在对象为人脸的情况下,将第一图像与第二图像进行融合,可以是将第一图像与第二图像中两张人脸的脸型与脸色(包括妆容颜色、皮肤颜色、瞳孔颜色等)分别进行融合;在对象为人手的情况下,将第一图像与第二图像进行融合,可以是将第一图像与第二图像中两张人手的手型与肤色分别进行融合。
应理解的是,本领域技术人员可以利用本领域已知的软件开发技术,设计并实现本公开实施例的图像融合方法的应用程序以及对应的图形交互界面,该图形交互界面中可以提供用于设置融合权重的操作控件,以实现用户对任一对象属性的融合权重的设置操作,对此本公开实施例不作限制。
在一种可能的实现方式中,融合权重可以设置一定取值范围,例如可以设置融合权重的取值范围为[0,1]。其中,为便于将第一隐变量与第二隐变量进行融合,融合权重可以包括第一图像对应的第一权重,以及第二图像对应的第二权重,第一权重作用于第一隐变量,第二权重作用于第二隐变量。
在本公开实施例中,基于融合权重的取值范围,第一权重与第二权重的和可以为指定数值(例如为1),这样用户可以仅设置第一权重,基于设置的第一权重以及指定数值可以得到第二权重;或用户可以仅设置第二权重,基于设置的第二权重以及指定数值可以得到第一权重。例如,若指定数值为1,用户设置的第一权重为F,可以得到第二权重为1-F,其中,F∈[0,1]。
在本公开实施例中,第一权重可以表征融合后的目标图像与第一图像中对象属性的接近程度,第二权重可以表征融合后目标图像与第二图像中对象属性的接近程度。应理解的是,第一权重越大(也即第二权重越小),目标图像中对象属性越接近第一图像; 反之,第二权重越大(也即第一权重越小),目标图像中对象属性越接近第二图像。例如,在对象为人脸的情况下,第一权重越大,目标图像中人脸属性越接近第一图像中人脸属性。
经实验发现,在后续利用生成网络对第三隐变量进行解码处理生成目标图像的过程中,生成网络的不同网络层对不同对象属性的敏感程度不同,或者说学习效果不同。生成网络的低分辨率网络层(也可以称为浅层网络层)对轮廓形状比较敏感,高分辨率网络层(也可以称为高层网络层)对外观颜色比较敏感。因此,本公开实施例可以将不同对象属性的融合权重作用于第一隐向量的部分第一N维向量以及第二隐向量的部分第二N维向量,也即根据对象属性的种类,确定第一权重作用于的至少一个第一N维向量以及第二权重作用于的至少一个第二N维向量,从而能够分别控制不同对象属性的融合程度,实现不同对象属性的解耦融合,兼顾不同对象属性之间的融合互不干扰以及融合效果。
如上所述,第一隐变量可以表示为M个第一N维向量,第二隐变量可以表示为M个第二N维向量,在一种可能的实现方式中,根据设置的融合权重,对第一隐变量与第二隐变量进行融合,得到融合后的第三隐变量,包括:将第一权重与第一隐变量中至少一个第一N维向量相乘,得到第一加权隐变量;将第二权重与第二隐变量中的至少一个第二N维向量相乘,得到第二加权隐变量;将第一加权隐变量与第二加权隐变量相加,得到第三隐变量。通过该方式,能够根据融合权重有效地将第一隐变量与第二隐变量进行融合。
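As a concrete illustration of the weighted fusion just described, the sketch below applies a single fusion weight to a chosen subset of the M N-dimensional vectors and adds the two weighted latents. It is a schematic reading of the text (NumPy arrays of shape (M, N), with the second weight taken as 1 minus the first), not the actual implementation.

```python
import numpy as np

def fuse_latents(w1: np.ndarray, w2: np.ndarray, weight: float, rows) -> np.ndarray:
    """Fuse two latents of shape (M, N) over the selected rows.

    `weight` is the first image's fusion weight in [0, 1]; the second image's
    weight is its complement.  Rows outside `rows` are copied from the first
    latent purely for illustration.
    """
    assert w1.shape == w2.shape
    fused = w1.copy()
    fused[rows] = weight * w1[rows] + (1.0 - weight) * w2[rows]
    return fused

# e.g. blend only the first 7 vectors (a shape-like subset) with weight 0.5:
# fused = fuse_latents(w1, w2, 0.5, slice(0, 7))
```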
步骤S14:对第三隐变量进行解码处理,得到融合后的目标图像。
在一种可能的实现方式中,可以利用生成网络对第三隐变量进行解码处理,得到目标图像。应理解的是,对于生成网络的网络结构、网络类型以及训练方式,本公开实施例不作限制,例如,生成网络可以是通过训练生成式对抗网络(Generative Adversarial Networks,GAN)得到的。
其中,生成网络可以用于根据M个N维向量生成具有指定图像风格的图像,图像风格例如可以至少包括真实风格与非真实风格,非真实风格例如可以至少包括漫画风格、欧美风格、素描风格,油画风格,版画风格等。应理解的是,采用不同图像风格对应的生成网络对第三隐变量进行解码处理所得到的目标图像的图像风格不同,例如,对象为人脸的情况下,真实风格对应的生成网络得到的目标图像中的人脸可以是真实风格的人脸,非真实风格对应的生成网络得到的人脸可以是非真实风格的人脸。
在一种可能的实现方式中,用户可以设置目标图像的图像风格,或者说用户可以选择不同图像风格对应的生成网络来对第三隐变量进行解码处理,基于设置的目标图像的图像风格,确定对应的目标生成网络;进而利用目标生成网络对第三隐变量进行解码处理,得到融合后的目标图像。
在一种可能的实现方式中,用户可以设置第一图像的第一图像风格以及第二图像的第二图像风格,基于设置的第一图像风格,确定与第一图像风格对应的第一生成网络; 基于设置的第二图像风格,确定与第二图像风格对应的第二生成网络;将第一生成网络与第二生成网络进行网络融合,得到目标生成网络;进而利用目标生成网络对第三隐变量进行解码处理,得到融合后的目标图像。
图2示出了本公开实施例提供的一种图形交互界面的示意图,如图2所示,用户可以在控件P2上通过“拖拽文件到此区域”或“浏览文件夹”等方式上传图像“1.jpg”,在控件P4上通过“拖拽文件到此区域”或“浏览文件夹”等方式上传图像“01(2).jpg”,还可以在控件P5上通过调节实心圆在线段上的位置来设置脸型融合程度,在控件P6上通过调节实心圆在线段上的位置来设置脸色融合程度。此外,图2示出的图形交互界面中,用户可以在控件P1上选择“风格模型1”以及在控件P3上选择“风格模型2”以设置图像风格,设置了图像风格也即选择了所采用的生成网络。其中,图像“1.jpg”对应前述第一图像,图像“01(2).jpg”对应前述第二图像,“脸型”、“脸色”对应前述对象属性,风格模型1对应前述第一生成网络,风格模型2对应前述第二生成网络。
图3a示出了本公开实施例提供的一种第一图像的示意图,图3b示出了本公开实施例提供的一种第二图像的示意图,图4a示出了本公开实施例提供的一种目标图像的示意图一,图4b示出了本公开实施例提供的一种目标图像的示意图二,。如图4a示出的目标图像可以是按照本公开实施例的图像融合方法,将图3a示出的第一图像与图3b示出的第二图像进行融合所得到真实风格的目标图像。示例性的,图4a中,S5标识的脸型是由图3a中S1标识的脸型与图3b中S3标识的脸型融合后得到的脸型,S6标识的脸色是由图3a中S2标识的脸色与图3b中S4标识的脸色融合后得到的脸色。如图4b示出的目标图像可以是按照本公开实施例的图像融合方法,将图3a示出的第一图像与图3b示出的第二图像进行融合所得到漫画风格的目标图像。示例性的,图4b中,S7标识的脸型是由图3a中S1标识的脸型与图3b中S3标识的脸型进行融合后得到的漫画风格的脸型,S8标识的脸色是由图3a中S2标识的脸色与图3b中S4标识的脸色融合后得到的漫画风格的脸色。
在本公开实施例中,通过对待融合的第一图像与第二图像进行编码处理,得到第一图像对应的第一隐变量以及第二图像对应的第二隐变量,再根据设置的任一对象属性的融合权重,对第一隐变量与第二隐变量进行融合,得到融合后的第三隐变量,并对第三隐变量进行解码处理,得到目标图像,可以实现基于用户设置的不同对象属性的融合权重,实现不同对象属性的解耦融合,并且还可控制不同对象属性的融合程度,使得融合后的目标图像满足用户的不同融合需求。
如上所述,融合权重包括第一图像对应的第一权重,以及第二图像对应的第二权重,在一种可能的实现方式中,在步骤S13中,响应于针对同一种对象的任一对象属性的融合权重的设置操作,根据设置的融合权重,对第一隐变量与第二隐变量进行融合,得到融合后的第三隐变量,包括:
步骤S131:根据对象属性的种类,确定第一权重与第一隐变量之间的第一加权隐变量,以及第二权重与第二隐变量之间的第二加权隐变量。
在一种可能的实现方式中,对象属性的种类包括对象的轮廓形状与外观颜色中的至少一种,为了便于分别控制轮廓形状与外观颜色的融合程度,也即实现轮廓形状与外观颜色的解耦融合,第一权重包括第一图像中轮廓形状对应的第一子权重,以及第一图像中外观颜色对应的第三子权重中的至少一种;第二权重包括第二图像中轮廓形状对应的第二子权重,以及第二图像中外观颜色对应的第四子权重中的至少一种。
如上所述,融合权重可以设置一定取值范围,例如可以设置融合权重的取值范围为[0,1];以及,基于融合权重的取值范围,第一权重与第二权重的和可以为指定数值(例如为1)。基于此,第一子权重与第二子权重的和为该指定数值,第三子权重与第四子权重的和也为该指定数值。通过该方案,用户可以只设置第一子权重,便得到第二子权重;或只设置第二子权重,便得到第一子权重;类似的,用户还可以只设置第三子权重,便得到第四子权重;或只设置第四子权重,便得到第三子权重。以指定数值为1为例,若设置第一子权重为F1,则第二子权重为1-F1,若设置第三子权重为F2,则第四子权重为1-F2,其中,F1、F2∈[0,1]。如图2所示的图形交互界面,基于脸型处设置的第一子权重为0.5,可以得到第二子权重为0.5;基于脸色处设置的第三子权重0.5,可以得到第四子权重为0.5。
其中,第一子权重可以表征融合后的目标图像与第一图像中轮廓形状的接近程度,第二子权重可以表征融合后的目标图像与第二图像中轮廓形状的接近程度,第三子权重可以表征融合后的目标图像与第一图像中外观颜色的接近程度,第四子权重可以表征融合后的目标图像与第二图像中外观颜色的接近程度。
应理解的是,第一子权重越大(也即第二子权重越小),目标图像中轮廓形状越接近第一图像中轮廓形状;反之,第二子权重越大(也即第一子权重越小),目标图像中轮廓形状越接近第二图像中轮廓形状;第三子权重越大(也即第四子权重越小),目标图像中外观颜色越接近第一图像中外观颜色;反之,第四子权重越大(也即第三子权重越小),目标图像中外观颜色越接近第二图像中外观颜色。例如,在对象为人脸的情况下,第一子权重越大,目标图像中人脸的脸型越接近第一图像中人脸的脸型,第四子权重越大,目标图像中人脸的脸色越接近第二图像中人脸的脸色。
如上所述,生成网络的不同网络层对不同对象属性的敏感程度不同,或者说学习效果不同。生成网络的低分辨率网络层对轮廓形状比较敏感,高分辨率网络层对外观颜色比较敏感。
在一种可能的实现方式中,根据对象属性的种类,确定第一权重与第一隐变量之间的第一加权隐变量,以及第二权重与第二隐变量之间的第二加权隐变量,包括:在对象属性包括轮廓形状的情况下,将M个第一N维向量中的前i个第一N维向量与第一子权重相乘,得到第一加权隐变量的前i个第一加权N维向量;以及,将M个第二N维向量中的前i个第二N维向量与第二子权重相乘,得到第二加权隐变量的前i个第二加权N维向量;其中,i∈[1,M)。通过该方式,能够控制对象的轮廓形状的融合程度,便于实现轮廓形状与外观颜色的解耦融合。
其中,将M个第一N维向量中的前i个第一N维向量与第一子权重相乘,得到第一加权隐变量的前i个第一加权N维向量,可以理解为,将第一子权重作用于第一隐变量的前i个第一N维向量;将M个第二N维向量中的前i个第二N维向量与第二子权重相乘,可以理解为,将第二子权重作用于第二隐变量的前i个第二N维向量。
在一种可能的实现方式中,根据对象属性的种类,确定第一权重与第一隐变量之间的第一加权隐变量,以及第二权重与第二隐变量之间的第二加权隐变量,包括:在对象属性包括外观颜色的情况下,将M个第一N维向量中的后M-i个第一N维向量与第三子权重相乘,得到第一加权隐变量的后M-i个第一加权N维向量;以及,将M个第二N维向量中的后M-i个第二N维向量与第四子权重相乘,得到第二加权隐变量的后M-i个第二加权N维向量;其中,i∈[1,M)。通过该方式,能够控制对象的外观颜色的融合程度,便于实现轮廓形状与外观颜色的解耦融合。其中,将M个第一N维向量中的后M-i个第一N维向量与第三子权重相乘,得到第一加权隐变量的后M-i个第一加权N维向量,即将第三子权重作用于第一隐变量的后M-i个第一N维向量;将M个第二N维向量中的后M-i个第二N维向量与第四子权重相乘,即将第四子权重作用于第二隐变量的后M-i个第二N维向量。应理解的是,i的取值可以是基于生成网络的网络结构进行实验测试所确定的经验值,对此本公开实施例不作限制。
步骤S132:根据第一加权隐变量与第二加权隐变量,确定第三隐变量。
如上所述,第一加权隐变量可以表示为M个第一加权N维向量,第二加权隐变量可以表示为M个第二加权N维向量,第三隐变量表示为M个第三N维向量。在一种可能的实现方式中,根据第一加权隐变量与第二加权隐变量,确定第三隐变量,包括:将第一加权隐变量的前i个第一加权N维向量与第二加权隐变量的前i个第二加权N维向量相加,得到第三隐变量的前i个第三N维向量;将第一加权隐变量的后M-i个第一加权N维向量与第二加权隐变量的后M-i个第二加权N维向量相加,得到第三隐变量的后M-i个第三N维向量。该方式可以理解为将第一加权隐变量与第二加权隐变量相加,得到第三隐变量。通过该方式,可以有效得到融合后的第三隐变量。
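The decoupled scheme of the preceding paragraphs can be written compactly as below. The (M, N) array layout and the complement relation between sub-weights follow the text; the function name and the NumPy formulation are illustrative assumptions.

```python
import numpy as np

def build_third_latent(w1, w2, shape_sub_weight, color_sub_weight, i):
    """Construct the third latent from two (M, N) latents.

    Rows [0, i) (contour/shape) are weighted by shape_sub_weight on the first
    latent and by 1 - shape_sub_weight on the second; rows [i, M) (appearance
    colour) use color_sub_weight and its complement, with i in [1, M).
    Adding the two weighted latents yields the third latent.
    """
    assert w1.shape == w2.shape and 1 <= i < w1.shape[0]
    w1_weighted = np.concatenate([shape_sub_weight * w1[:i],
                                  color_sub_weight * w1[i:]])
    w2_weighted = np.concatenate([(1.0 - shape_sub_weight) * w2[:i],
                                  (1.0 - color_sub_weight) * w2[i:]])
    return w1_weighted + w2_weighted
```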
由于生成网络可以生成具有非真实风格的目标图像,例如漫画风格的目标图像,在这种情况下,第一图像与第二图像中对象的外观颜色,对目标图像中对象的外观颜色影响较小甚至没有影响,因此,目标图像中对象的外观颜色可以取决于生成网络对应的非真实风格,而不受第一图像与第二图像中对象的外观颜色的影响。
在一种可能的实现方式中,在生成网络生成的是具有非真实风格的目标图像的情况下,根据第一加权隐变量与第二加权隐变量,确定第三隐变量,还包括:将与第一加权隐变量对应的第一隐变量的后M-i个第一N维向量,作为第三隐变量的后M-i个第三N维向量;或,将与第二加权隐变量对应的第二隐变量的后M-i个第二N维向量,作为第三隐变量的后M-i个第三N维向量。在本公开实施例中,无需对两个图像中的外观颜色进行融合,选择任一图像中的外观颜色即可。通过该方式,可以在生成网络生成的是具有非真实风格的目标图像的情况下,快捷地得到第三隐变量。
需要说明的是,目标图像中对象的外观颜色取决于生成网络对应的非真实风格,而不受第一图像与第二图像中对象的外观颜色的影响,同样也不受融合后的第三隐变量所隐含的外观颜色的影响。因此,在生成网络生成的是具有非真实风格的目标图像的情况下,第三隐向量的后M-i个第三N维向量可以是第一隐变量的后M-i个第一N维向量,或第二隐变量的后M-i个第二N维向量,也可以是上述后M-i个第一加权N维向量与上述后M-i个第二加权N维向量的和。而在生成网络生成的是具有真实风格的目标图像的情况下,第三隐向量的后M-i个第三N维向量是上述后M-i个第一加权N维向量与上述后M-i个第二加权N维向量的和。通过该方式,可以使目标图像中融合了第一图像与第二图像中对象的外观颜色以及轮廓形状。
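For the non-photorealistic case just described, the colour rows need not be blended at all. A minimal variant, assuming the same (M, N) layout as above, is:

```python
import numpy as np

def build_third_latent_stylised(w1, w2, shape_sub_weight, i):
    """Variant for non-photorealistic target styles: blend only the shape rows
    and copy the colour rows unchanged from the first latent (taking them from
    the second latent would serve equally well, per the text)."""
    fused = np.empty_like(w1)
    fused[:i] = shape_sub_weight * w1[:i] + (1.0 - shape_sub_weight) * w2[:i]
    fused[i:] = w1[i:]
    return fused
```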
在本公开实施例中,能够根据对象属性的种类,以及第一权重与第二权重,实现不同对象属性的不同融合程度的融合,使得基于融合后的第三隐变量所得到的目标图像能够满足用户的不同融合需求。如上所述,用户可以设置目标图像的图像风格,不同图像风格对应不同的生成网络,在一种可能的实现方式中,在步骤S14中,对第三隐变量进行解码处理,得到融合后的目标图像,包括:步骤S141:响应于针对目标图像的图像风格的风格设置操作,确定与设置的图像风格对应的目标生成网络,目标生成网络用于生成具有设置的图像风格的图像。如上所述,本领域技术人员可以利用本领域已知的软件开发技术,设计并实现本公开实施例的图像融合方法的应用程序以及对应的图形交互界面,该图形交互界面中可以提供用于设置图像风格的操作控件,以实现用户针对图像风格的风格设置操作,对此本公开实施例不作限制。例如,图2示出的图形交互界面中,用户可以在“风格模型1”与“风格模型2”处设置图像风格,基于设置的图像风格可以确定所采用的目标生成网络。
由于用户可能期望目标图像的图像风格是两种图像风格融合后的风格,例如,将真实风格与漫画风格进行融合后的风格,对此,用户还可以在图2示出的“风格模型1”与“风格模型2”处设置不同图像风格。当用户设置不同图像风格时,在实现图像风格的风格融合时,可以将两种图像风格对应的两个生成网络进行网络融合,得到目标生成网络,从而利用网络融合后的目标生成网络,生成具有融合两种图像风格的目标图像。应理解的是,当用户设置一种图像风格时,目标生成网络也即用户设置的该一种图像风格所对应的生成网络。
图5示出了本公开实施例提供的一种图形用户界面的示意图,如图5所示,用户可以在对应“风格模型”的控件P7处设置融合两种图像风格的风格标识,如“融合风格1”,从而可以确定融合后的目标生成网络的网络标识,并保存融合后的目标生成网络,便于用户之后通过设置该融合后的图像风格,可以直接调用该融合后的目标生成网络。图5中其他内容的理解可以参照前述对图2的说明。
步骤S142:利用目标生成网络对第三隐变量进行解码处理,得到目标图像。
如上所述,第三隐变量可以表示为M个第三N维向量,在一种可能的实现方式中,目标生成网络具有M层网络层,利用目标生成网络对第三隐变量进行解码处理,得到 目标图像,包括:将第1个第三N维向量输入至目标生成网络的第1层网络层,得到第1层网络层输出的第1个中间图;将第m个第三N维向量以及第m-1个中间图输入至目标生成网络的第m层网络层,得到第m层网络层输出的第m个中间图,n∈[2,M);将第M个第三N维向量以及第M-1个中间图输入至目标生成网络的第M层网络层,得到第M层网络层输出的风格融合图像,目标图像包括风格融合图像。
在一种可能的实现方式中,目标生成网络可以用于生成分辨率逐步递增的图像,目标生成网络也可以称为多层变换的目标生成网络。目标生成网络的第一层网络层的输入是一个第三N维向量,之后每层网络层的输入包括一个第三N维向量以及上层网络层输出的中间图,最后一层网络层输出目标图像。
可以理解的是,目标生成网络的低分辨率网络层(也可以称为浅层网络层)先学习并生成低分辨率(如4×4的分辨率)的中间图,之后逐渐随着网络深度的增加,继续学习并生成更高分辨率(如512×512的分辨率)的中间图,最后生成最高分辨率(如1024×1024的分辨率)的目标图像。
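A schematic rendering of this multi-layer decoding loop is given below. The `layers` list stands in for the actual network layers, each taking one latent vector and the previous intermediate image; the text does not specify their internals, so the class is only a sketch.

```python
class MultiLayerGenerator:
    """Schematic target generation network with M layers.

    Layer 1 consumes the 1st fused vector; layer m consumes the m-th fused
    vector plus the (m-1)-th intermediate image; layer M emits the final
    style-fused image.  Each element of `layers` is a placeholder callable
    (vector, previous_image_or_None) -> image.
    """

    def __init__(self, layers):
        self.layers = list(layers)

    def decode(self, w_fused):
        """w_fused has shape (M, N); returns the style-fused image."""
        assert len(w_fused) == len(self.layers)
        image = self.layers[0](w_fused[0], None)
        for m in range(1, len(self.layers)):
            image = self.layers[m](w_fused[m], image)
        return image
```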
图6示出了本公开实施例提供的一种图像融合流程的示意图一,如图6所示的图像融合流程可以是在用户设置一种图像风格下的图像融合流程,图6中第一图像、第二图像与目标图像均为真实风格的图像。图6所示的图像融合流程可以包括:将第一图像与第二图像分别输入至L1和L2标识的人脸图像编码器(对应前述针对人脸的图像编码器)中,分别得到第一隐变量与第二隐变量;根据L3标识的针对脸型设置的融合权重以及L4标识的针对脸色设置的融合权重,将第一隐变量与第二隐变量进行融合,得到融合后的第三隐变量;将第三隐变量输入至L5标识的目标生成网络中,得到融合后的目标图像。在本公开实施例中,可以利用与设置的图像风格对应的目标生成网络,对第三隐变量进行解码处理,从而可以有效得到具有该设置的图像风格的目标图像。
如上所述,用户可以设置两种图像风格,并将两种图像风格对应的两个生成网络进行网络融合,得到目标生成网络,这样利用网络融合后的目标生成网络,可以生成具有融合两种图像风格的目标图像。在一种可能的实现方式中,设置的图像风格包括第一图像风格与第二图像风格,第一图像风格与第二图像风格的风格类型不同,风格设置操作还用于设置风格融合程度,风格融合程度用于指示第一生成网络与第二生成网络之间融合的网络层数,其中,在步骤S141中,确定与设置的图像风格对应的目标生成网络,包括:确定与第一图像风格对应的第一生成网络,以及与第二图像风格对应的第二生成网络,第一生成网络用于生成具有第一图像风格的图像,第二生成网络用于生成具有第二图像风格的图像;根据风格融合程度,对第一生成网络与第二生成网络进行网络融合,得到目标生成网络。通过该方式,可以根据风格融合程度,实现第一生成网络与第二生成网络之间的网络融合,使目标生成网络能够生成具有融合两种图像风格的目标图像。
应理解的是,在用户设置两种图像风格后,可以基于设置的第一图像风格与第二图像风格调取对应的第一生成网络以及第二生成网络,以对第一生成网络与第二生成网络进行网络融合。其中,风格融合程度可以控制目标图像的图像风格与第一图像风格的接 近程度,也即可以控制目标图像的图像风格与第二图像风格的接近程度。风格融合程度用于指示第一生成网络与第二生成网络之间融合的网络层数,其中,融合的网络层数小于第一生成网络与第二生成网络的总网络层数。
在一种可能的实现方式中,第一生成网络与第二生成网络各自具有M层网络层,其中,根据风格融合程度,对第一生成网络与第二生成网络进行网络融合,得到目标生成网络,包括:将第一生成网络的前I层网络层,替换为第二生成网络的前I层网络层,得到目标生成网络;或,将第一生成网络的后I层网络层,替换为第二生成网络的后I层网络层,得到目标生成网络;其中,I为网络层数,I∈[1,M),目标图像的图像风格与第一图像风格之间的风格接近程度与网络层数I成负相关,目标图像的图像风格与第二图像风格之间的风格接近程度与网络层数I成正相关。通过该方式,可以有效实现第一生成网络与第二生成网络之间的网络融合,使目标生成网络能够生成具有融合两种图像风格的目标图像。
在本公开实施例中,将第一生成网络的前I层网络层,替换为第二生成网络的前I层网络层,也即,将第一生成网络的前I层网络层,与第二生成网络的后N-I层网络层进行拼接。应理解的是,I的值可以是用户根据风格融合需求自定义设置的,例如可以通过上述图5示出的图形交互界面中设置“脸色”的操作控件来设置风格融合程度,也即当用户设置了两种图像风格时,用户在图形交互界面中设置的针对外观颜色的融合权重,可以转换为设置的风格融合程度;当然也可以在上述图形交互界面中提供独立的操作控件来设置风格融合程度,对此本公开实施例不作限制。其中,目标图像的图像风格与第一图像风格之间的风格接近程度与网络层数I成负相关,以及目标图像的图像风格与第二图像风格之间的风格接近程度与网络层数I成正相关。其中,I的值越小,目标生成网络中第一生成网络的网络层占比越多,目标生成网络生成的目标图像越接近第一图像风格(也即越不接近第二图像风格);I的值越大,目标生成网络中第二生成网络的网络层占比越多,目标生成网络生成的目标图像越接近第二图像风格(也即越不接近第一图像风格)。
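Read literally, the layer replacement described here amounts to splicing two per-layer generators. A hedged sketch follows, reusing the schematic `MultiLayerGenerator` above, which is itself an assumption rather than the disclosed network structure.

```python
import copy

def fuse_generators(gen_first, gen_second, num_layers, replace_front=True):
    """Build the target generation network by layer replacement.

    With replace_front=True, the first `num_layers` layers of the first-style
    generator are replaced by those of the second-style generator; otherwise
    the last `num_layers` layers are replaced.  num_layers is in [1, M), and a
    larger value pulls the target image's style closer to the second style.
    """
    assert len(gen_first.layers) == len(gen_second.layers)
    assert 1 <= num_layers < len(gen_first.layers)
    target = copy.deepcopy(gen_first)
    if replace_front:
        target.layers[:num_layers] = copy.deepcopy(gen_second.layers[:num_layers])
    else:
        target.layers[-num_layers:] = copy.deepcopy(gen_second.layers[-num_layers:])
    return target
```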
如上所述,在步骤S142中目标图像可以包括目标生成网络输出的风格融合图像,在一种可能的实现方式中,目标图像还可以包括:利用第一生成网络对第三隐变量进行解码处理所得到的第一风格图像,以及利用第二生成网络对第三隐变量进行解码处理所得到的第二风格图像中的至少一种。其中,可以参照上述目标生成网络对第三隐变量进行解码处理得到风格融合图像的实现方式,实现利用第一生成网络对第三隐变量进行解码处理所得到的第一风格图像,以及利用第二生成网络对第三隐变量进行解码处理所得到的第二风格图像。
图7示出了本公开实施例提供的一种图像融合流程的示意图二,如图7所示的图像融合流程可以是在用户设置两种图像风格下的图像融合流程。图7所示的图像融合流程可以包括:将第一图像与第二图像分别输入至L6和L7分别标识的人脸图像编码器中,分别得到第一隐变量与第二隐变量;根据L8标识的针对脸型设置的融合权重,将第一 隐变量的前i个第一N维向量与第二隐变量的前i个第二N维向量进行融合,得到第三隐变量的前i个第三N维向量,并将第一隐变量的后M-i后第一N维向量或第二隐变量的后M-i个第二N维向量,作为第三隐变量的后M-i个第三N维向量;根据L9标识的设置的风格融合程度,将图像风格x对应的L10标识的第一生成网络与图像风格y对应的L11标识的第二生成网络进行网络融合,得到L12标识的目标生成网络;将第三隐变量分别输入至L12标识的目标生成网络、L10标识的第一生成网络以及L11标识的第二生成网络中,得到L12标识的目标生成网络输出的风格融合图像、L10标识的第一生成网络输出的第一风格图像以及L11标识的第二生成网络输出的第二风格图像,其中,目标图像包括风格融合图像、第一风格图像以及第二风格图像。根据本公开的实施例,能够有效地解耦轮廓形状和轮廓形状的属性融合,使得用户可以对轮廓形状和轮廓形状分别设置融合权重,并进行不同融合程度的融合;还可以直接作用于不同图像风格的图像融合。
可以理解的是,本公开提及的上述各个方法实施例,在不违背原理逻辑的情况下,均可以彼此相互结合形成结合后的实施例。在实施方式的上述方法中,各步骤的执行顺序应当以其功能和可能的内在逻辑确定。
此外,本公开还提供了图像融合装置、电子设备、计算机可读存储介质、程序,上述均可用来实现本公开提供的任一种图像融合方法,相应技术方案和描述可参见方法部分的相应记载。
图8示出了本公开实施例提供的一种图像融合装置的框图,如图8所示,所述装置包括:获取模块101,配置为获取待融合的第一图像与第二图像,所述第一图像与所述第二图像中有同一种对象;编码模块102,配置为分别对所述第一图像与所述第二图像进行编码处理,得到所述第一图像对应的第一隐变量以及所述第二图像对应的第二隐变量;融合模块103,配置为响应于针对所述同一种对象的任一对象属性的融合权重的设置操作,根据设置的融合权重,对所述第一隐变量与所述第二隐变量进行融合,得到融合后的第三隐变量;解码模块104,配置为对所述第三隐变量进行解码处理,得到融合后的目标图像。
在一种可能的实现方式中,所述融合权重包括所述第一图像对应的第一权重,以及所述第二图像对应的第二权重;所述融合模块103,包括:加权隐变量确定子模块,配置为根据所述对象属性的种类,确定所述第一权重与所述第一隐变量之间的第一加权隐变量,以及所述第二权重与所述第二隐变量之间的第二加权隐变量;融合子模块,配置为根据所述第一加权隐变量与所述第二加权隐变量,确定所述第三隐变量。
在一种可能的实现方式中,所述第一隐变量表示为M个第一N维向量,所述第二隐变量表示为M个第二N维向量,M与N为正整数,所述对象属性的种类包括所述对象的轮廓形状,所述第一权重包括所述第一图像中轮廓形状对应的第一子权重,所述第二权重包括所述第二图像中轮廓形状对应的第二子权重;所述加权隐变量确定子模块,配置为在所述对象属性包括轮廓形状的情况下,将所述M个第一N维向量中的前i个 第一N维向量与所述第一子权重相乘,得到所述第一加权隐变量的前i个第一加权N维向量;以及,将所述M个第二N维向量中的前i个第二N维向量与所述第二子权重相乘,得到所述第二加权隐变量的前i个第二加权N维向量;其中,i∈[1,M)。
在一种可能的实现方式中,所述第一隐变量表示为M个第一N维向量,所述第二隐变量表示为M个第二N维向量,M与N为正整数,所述对象属性的种类包括所述对象的外观颜色,所述第一权重包括所述第一图像中外观颜色对应的第三子权重,所述第二权重包括所述第二图像中外观颜色对应的第四子权重;所述加权隐变量确定子模块,配置为在所述对象属性包括外观颜色的情况下,将所述M个第一N维向量中的后M-i个第一N维向量与所述第三子权重相乘,得到所述第一加权隐变量的后M-i个第一加权N维向量;以及,将所述M个第二N维向量中的后M-i个第二N维向量与所述第四子权重相乘,得到所述第二加权隐变量的后M-i个第二加权N维向量;其中,i∈[1,M)。
在一种可能的实现方式中,所述第一加权隐变量表示为M个第一加权N维向量,所述第二加权隐变量表示为M个第二加权N维向量,所述第三隐变量表示为M个第三N维向量;所述融合子模块,配置为将所述第一加权隐变量的前i个第一加权N维向量与所述第二加权隐变量的前i个第二加权N维向量相加,得到所述第三隐变量的前i个第三N维向量;将所述第一加权隐变量的后M-i个第一加权N维向量与所述第二加权隐变量的后M-i个第二加权N维向量相加,得到所述第三隐变量的后M-i个第三N维向量。
在一种可能的实现方式中,所述第一加权隐变量表示为M个第一加权N维向量,所述第二加权隐变量表示为M个第二加权N维向量,所述第三隐变量表示为M个第三N维向量;所述融合子模块,配置为将与所述第一加权隐变量对应的第一隐变量的后M-i个第一N维向量,作为所述第三隐变量的后M-i个第三N维向量;或,将与所述第二加权隐变量对应的第二隐变量的后M-i个第二N维向量,作为所述第三隐变量的后M-i个第三N维向量。
在一种可能的实现方式中,所述解码模块104,包括:网络确定子模块,配置为响应于针对所述目标图像的图像风格的风格设置操作,确定与设置的图像风格对应的目标生成网络,所述目标生成网络用于生成具有所述设置的图像风格的图像;解码子模块,配置为利用所述目标生成网络对所述第三隐变量进行解码处理,得到所述目标图像。
在一种可能的实现方式中,设置的图像风格包括第一图像风格与第二图像风格,第一图像风格与第二图像风格的风格类型不同,所述风格设置操作还用于设置风格融合程度,所述风格融合程度用于指示第一生成网络与第二生成网络之间融合的网络层数;所述网络确定子模块,配置为确定与所述第一图像风格对应的第一生成网络,以及与所述第二图像风格对应的第二生成网络,所述第一生成网络用于生成具有所述第一图像风格的图像,所述第二生成网络用于生成具有所述第二图像风格的图像;根据所述风格融合程度,对所述第一生成网络与所述第二生成网络进行网络融合,得到所述目标生成网络。
在一种可能的实现方式中,所述第一生成网络与所述第二生成网络各自具有M层 网络层;所述网络确定子模块,配置为将所述第一生成网络的前I层网络层,替换为所述第二生成网络的前I层网络层,得到所述目标生成网络;或,将所述第一生成网络的后I层网络层,替换为所述第二生成网络的后I层网络层,得到所述目标生成网络;其中,I为所述网络层数,I∈[1,M),所述目标图像的图像风格与所述第一图像风格之间的风格接近程度与所述网络层数I成负相关,所述目标图像的图像风格与所述第二图像风格之间的风格接近程度与所述网络层数I成正相关。
在一种可能的实现方式中,所述目标生成网络具有M层网络层,所述第三隐变量表示为M个第三N维向量;所述解码子模块,配置为将第1个第三N维向量输入至所述目标生成网络的第1层网络层,得到所述第1层网络层输出的第1个中间图;将第m个第三N维向量以及第m-1个中间图输入至所述目标生成网络的第m层网络层,得到所述第m层网络层输出的第m个中间图,n∈[2,M);将第M个第三N维向量以及第M-1个中间图输入至所述目标生成网络的第M层网络层,得到所述第M层网络层输出的风格融合图像,所述目标图像包括所述风格融合图像。
在一种可能的实现方式中,所述目标图像还包括:利用所述第一生成网络对所述第三隐变量进行解码处理所得到的第一风格图像,以及利用所述第二生成网络对所述第三隐变量进行解码处理所得到的第二风格图像中的至少一种。
在本公开实施例中,通过对待融合的第一图像与第二图像进行编码处理,得到第一图像对应的第一隐变量以及第二图像对应的第二隐变量,再根据设置的任一对象属性的融合权重,对第一隐变量与第二隐变量进行融合,得到融合后的第三隐变量,并对第三隐变量进行解码处理,得到目标图像,可以实现基于用户设置的不同对象属性的融合权重,实现不同对象属性的解耦融合,并且还可控制不同对象属性的融合程度,使得融合后的目标图像满足用户的不同融合需求。
在一些实施例中,本公开实施例提供的装置具有的功能或包含的模块可以用于执行上文方法实施例描述的方法,其实现以及效果可以参照上文方法实施例的描述。
本公开实施例还提出一种计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述方法。计算机可读存储介质可以是易失性或非易失性计算机可读存储介质。
本公开实施例还提出一种电子设备,包括:处理器;用于存储处理器可执行指令的存储器;其中,所述处理器被配置为调用所述存储器存储的指令,以执行上述方法。
本公开实施例还提供了一种计算机程序产品,包括计算机可读代码,或者承载有计算机可读代码的非易失性计算机可读存储介质,当所述计算机可读代码在电子设备的处理器中运行时,所述电子设备中的处理器执行上述方法。电子设备可以被提供为终端、服务器或其它形态的设备。
图9示出了本公开实施例提供的一种电子设备的框图。例如,电子设备1900可以被提供为一服务器或终端设备。参照图9,电子设备1900包括处理组件1922,其可以包括一个或多个处理器,以及由存储器1932所代表的存储器资源,用于存储可由处理 组件1922的执行的指令,例如应用程序。存储器1932中存储的应用程序可以包括一个或一个以上的每一个对应于一组指令的模块。此外,处理组件1922被配置为执行指令,以执行上述方法。电子设备1900还可以包括一个电源组件1926被配置为执行电子设备1900的电源管理,一个有线或无线网络接口1950被配置为将电子设备1900连接到网络,和一个输入输出(I/O)接口1958。
在示例性实施例中,还提供了一种非易失性计算机可读存储介质,例如包括计算机程序指令的存储器1932,上述计算机程序指令可由电子设备1900的处理组件1922执行以完成上述方法。
本公开可以是系统、方法和/或计算机程序产品。计算机程序产品可以包括计算机可读存储介质,其上载有用于使处理器实现本公开的各个方面的计算机可读程序指令。
计算机可读存储介质可以是可以保持和存储由指令执行设备使用的指令的有形设备。计算机可读存储介质例如可以是(但不限于)电存储设备、磁存储设备、光存储设备、电磁存储设备、半导体存储设备或者上述的任意合适的组合。这里所使用的计算机可读存储介质不被解释为瞬时信号本身,诸如无线电波或者其他自由传播的电磁波、通过波导或其他传输媒介传播的电磁波(例如,通过光纤电缆的光脉冲)、或者通过电线传输的电信号。
这里所描述的计算机可读程序指令可以从计算机可读存储介质下载到各个计算/处理设备,或者通过网络、例如因特网、局域网、广域网和/或无线网下载到外部计算机或外部存储设备。网络可以包括铜传输电缆、光纤传输、无线传输、路由器、防火墙、交换机、网关计算机和/或边缘服务器。每个计算/处理设备中的网络适配卡或者网络接口从网络接收计算机可读程序指令,并转发计算机可读程序指令,供存储在各个计算/处理设备中的计算机可读存储介质中。根据本公开实施例的方法、装置(系统)和计算机程序产品的流程图和/或框图描述了本公开的各个方面。应当理解,流程图和/或框图的每个方框以及流程图和/或框图中各方框的组合,都可以由计算机可读程序指令实现。
这些计算机可读程序指令可以提供给通用计算机、专用计算机或其它可编程数据处理装置的处理器,从而生产出一种机器,使得这些指令在通过计算机或其它可编程数据处理装置的处理器执行时,产生了实现流程图和/或框图中的一个或多个方框中规定的功能/动作的装置。也可以把这些计算机可读程序指令存储在计算机可读存储介质中,这些指令使得计算机、可编程数据处理装置和/或其他设备以特定方式工作,从而,存储有指令的计算机可读介质则包括一个制造品,其包括实现流程图和/或框图中的一个或多个方框中规定的功能/动作的各个方面的指令。也可以把计算机可读程序指令加载到计算机、其它可编程数据处理装置、或其它设备上,使得在计算机、其它可编程数据处理装置或其它设备上执行一系列操作步骤,以产生计算机实现的过程,从而使得在计算机、其它可编程数据处理装置、或其它设备上执行的指令实现流程图和/或框图中的一个或多个方框中规定的功能/动作。
附图中的流程图和框图显示了根据本公开的多个实施例的系统、方法和计算机程序 产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或指令的一部分,所述模块、程序段或指令的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,可根据所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或动作的专用的硬件系统来实现,或者可以用专用的硬件与计算机指令的组合来实现。
该计算机程序产品可以通过硬件、软件或其结合的方式实现。在一个可选实施例中,所述计算机程序产品体现为计算机存储介质,在另一个可选实施例中,计算机程序产品体现为软件产品,例如软件开发包(Software Development Kit,SDK)等。
若本公开技术方案涉及个人信息,应用本公开技术方案的产品在处理个人信息前,已明确告知个人信息处理规则,并取得个人自主同意。若本公开技术方案涉及敏感个人信息,应用本公开技术方案的产品在处理敏感个人信息前,已取得个人单独同意,并且同时满足“明示同意”的要求。例如,在摄像头等个人信息采集装置处,设置明确显著的标识告知已进入个人信息采集范围,将会对个人信息进行采集,若个人自愿进入采集范围即视为同意对其个人信息进行采集;或者在个人信息处理的装置上,利用标识或信息告知个人信息处理规则的情况下,通过弹窗信息或请个人自行上传其个人信息等方式获得个人授权;其中,个人信息处理规则可包括个人信息处理者、个人信息处理目的、处理方式以及处理的个人信息种类等信息。
以上已经描述了本公开的各实施例,上述说明是示例性的,并非穷尽性的,并且也不限于所披露的各实施例。在不偏离所说明的各实施例的范围和精神的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择,旨在最好地解释各实施例的原理、实际应用或对市场中的技术的改进,或者使本技术领域的其它普通技术人员能理解本文披露的各实施例。

Claims (26)

  1. 一种图像融合方法,包括:
    获取待融合的第一图像与第二图像,所述第一图像与所述第二图像中有同一种对象;
    分别对所述第一图像与所述第二图像进行编码处理,得到所述第一图像对应的第一隐变量以及所述第二图像对应的第二隐变量;
    响应于针对所述同一种对象的任一对象属性的融合权重的设置操作,根据设置的融合权重,对所述第一隐变量与所述第二隐变量进行融合,得到融合后的第三隐变量;
    对所述第三隐变量进行解码处理,得到融合后的目标图像。
  2. 根据权利要求1所述的方法,其中,所述融合权重包括所述第一图像对应的第一权重,以及所述第二图像对应的第二权重;所述响应于针对所述同一种对象的任一对象属性的融合权重的设置操作,根据设置的融合权重,对所述第一隐变量与所述第二隐变量进行融合,得到融合后的第三隐变量,包括:
    根据所述对象属性的种类,确定所述第一权重与所述第一隐变量之间的第一加权隐变量,以及所述第二权重与所述第二隐变量之间的第二加权隐变量;
    根据所述第一加权隐变量与所述第二加权隐变量,确定所述第三隐变量。
  3. 根据权利要求2所述的方法,其中,所述第一隐变量表示为M个第一N维向量,所述第二隐变量表示为M个第二N维向量,M与N为正整数,所述对象属性的种类包括所述对象的轮廓形状,所述第一权重包括所述第一图像中轮廓形状对应的第一子权重,所述第二权重包括所述第二图像中轮廓形状对应的第二子权重;所述根据所述对象属性的种类,确定所述第一权重与所述第一隐变量之间的第一加权隐变量,以及所述第二权重与所述第二隐变量之间的第二加权隐变量,包括:
    在所述对象属性包括轮廓形状的情况下,将所述M个第一N维向量中的前i个第一N维向量与所述第一子权重相乘,得到所述第一加权隐变量的前i个第一加权N维向量;以及,
    将所述M个第二N维向量中的前i个第二N维向量与所述第二子权重相乘,得到所述第二加权隐变量的前i个第二加权N维向量;其中,i∈[1,M)。
  4. 根据权利要求2或3所述的方法,其中,所述第一隐变量表示为M个第一N维向量,所述第二隐变量表示为M个第二N维向量,M与N为正整数,所述对象属性的种类包括所述对象的外观颜色,所述第一权重包括所述第一图像中外观颜色对应的第三子权重,所述第二权重包括所述第二图像中外观颜色对应的第四子权重;所述根据所述对象属性的种类,确定所述第一权重与所述第一隐变量之间的第一加权隐变量,以及所述第二权重与所述第二隐变量之间的第二加权隐变量,包括:
    在所述对象属性包括外观颜色的情况下,将所述M个第一N维向量中的后M-i个第一N维向量与所述第三子权重相乘,得到所述第一加权隐变量的后M-i个第一 加权N维向量;以及,
    将所述M个第二N维向量中的后M-i个第二N维向量与所述第四子权重相乘,得到所述第二加权隐变量的后M-i个第二加权N维向量;其中,i∈[1,M)。
  5. 根据权利要求2至4中任意一项所述的方法,其中,所述第一加权隐变量表示为M个第一加权N维向量,所述第二加权隐变量表示为M个第二加权N维向量,所述第三隐变量表示为M个第三N维向量,M与N为正整数;所述根据所述第一加权隐变量与所述第二加权隐变量,确定所述第三隐变量,包括:
    将所述第一加权隐变量的前i个第一加权N维向量与所述第二加权隐变量的前i个第二加权N维向量相加,得到所述第三隐变量的前i个第三N维向量;
    将所述第一加权隐变量的后M-i个第一加权N维向量与所述第二加权隐变量的后M-i个第二加权N维向量相加,得到所述第三隐变量的后M-i个第三N维向量;其中,i∈[1,M)。
  6. 根据权利要求2至5中任意一项所述的方法,其中,所述第一加权隐变量表示为M个第一加权N维向量,所述第二加权隐变量表示为M个第二加权N维向量,所述第三隐变量表示为M个第三N维向量,M与N为正整数;所述根据所述第一加权隐变量与所述第二加权隐变量,确定所述第三隐变量,还包括:
    将与所述第一加权隐变量对应的第一隐变量的后M-i个第一N维向量,作为所述第三隐变量的后M-i个第三N维向量;或,将与所述第二加权隐变量对应的第二隐变量的后M-i个第二N维向量,作为所述第三隐变量的后M-i个第三N维向量;其中,i∈[1,M)。
  7. 根据权利要求1至6中任意一项所述的方法,其中,所述对所述第三隐变量进行解码处理,得到融合后的目标图像,包括:
    响应于针对所述目标图像的图像风格的风格设置操作,确定与设置的图像风格对应的目标生成网络,所述目标生成网络用于生成具有所述设置的图像风格的图像;
    利用所述目标生成网络对所述第三隐变量进行解码处理,得到所述目标图像。
  8. 根据权利要求7所述的方法,其中,所述设置的图像风格包括第一图像风格与第二图像风格,第一图像风格与第二图像风格的风格类型不同,所述风格设置操作还用于设置风格融合程度,所述风格融合程度用于指示第一生成网络与第二生成网络之间融合的网络层数;所述确定与设置的图像风格对应的目标生成网络,包括:
    确定与所述第一图像风格对应的第一生成网络,以及与所述第二图像风格对应的第二生成网络,所述第一生成网络用于生成具有所述第一图像风格的图像,所述第二生成网络用于生成具有所述第二图像风格的图像;
    根据所述风格融合程度,对所述第一生成网络与所述第二生成网络进行网络融合,得到所述目标生成网络。
  9. 根据权利要求8所述的方法,其中,所述第一生成网络与所述第二生成网络各自具有M层网络层;所述根据所述风格融合程度,对所述第一生成网络与所述第 二生成网络进行网络融合,得到所述目标生成网络,包括:
    将所述第一生成网络的前I层网络层,替换为所述第二生成网络的前I层网络层,得到所述目标生成网络;或,
    将所述第一生成网络的后I层网络层,替换为所述第二生成网络的后I层网络层,得到所述目标生成网络;其中,I为所述网络层数,I∈[1,M),所述目标图像的图像风格与所述第一图像风格之间的风格接近程度与所述网络层数I成负相关,所述目标图像的图像风格与所述第二图像风格之间的风格接近程度与所述网络层数I成正相关。
  10. 根据权利要求7至9中任意一项所述的方法,其中,所述目标生成网络具有M层网络层,所述第三隐变量表示为M个第三N维向量;所述利用所述目标生成网络对所述第三隐变量进行解码处理,得到所述目标图像,包括:
    将第1个第三N维向量输入至所述目标生成网络的第1层网络层,得到所述第1层网络层输出的第1个中间图;
    将第m个第三N维向量以及第m-1个中间图输入至所述目标生成网络的第m层网络层,得到所述第m层网络层输出的第m个中间图,n∈[2,M);
    将第M个第三N维向量以及第M-1个中间图输入至所述目标生成网络的第M层网络层,得到所述第M层网络层输出的风格融合图像,所述目标图像包括所述风格融合图像。
  11. 根据权利要求8至10中任意一项所述的方法,其中,所述目标图像还包括:利用所述第一生成网络对所述第三隐变量进行解码处理所得到的第一风格图像,以及利用所述第二生成网络对所述第三隐变量进行解码处理所得到的第二风格图像中的至少一种。
  12. 一种图像融合装置,包括:
    获取模块,配置为获取待融合的第一图像与第二图像,所述第一图像与所述第二图像中有同一种对象;编码模块,配置为分别对所述第一图像与所述第二图像进行编码处理,得到所述第一图像对应的第一隐变量以及所述第二图像对应的第二隐变量;融合模块,配置为响应于针对所述同一种对象的任一对象属性的融合权重的设置操作,根据设置的融合权重,对所述第一隐变量与所述第二隐变量进行融合,得到融合后的第三隐变量;解码模块,配置为对所述第三隐变量进行解码处理,得到融合后的目标图像。
  13. 根据权利要求12所述的装置,其中,所述融合权重包括所述第一图像对应的第一权重,以及所述第二图像对应的第二权重;所述融合模块,包括:加权隐变量确定子模块,配置为根据所述对象属性的种类,确定所述第一权重与所述第一隐变量之间的第一加权隐变量,以及所述第二权重与所述第二隐变量之间的第二加权隐变量;融合子模块,配置为根据所述第一加权隐变量与所述第二加权隐变量,确定所述第三隐变量。
  14. 根据权利要求13所述的装置,其中,所述第一隐变量表示为M个第一N维 向量,所述第二隐变量表示为M个第二N维向量,M与N为正整数,所述对象属性的种类包括所述对象的轮廓形状,所述第一权重包括所述第一图像中轮廓形状对应的第一子权重,所述第二权重包括所述第二图像中轮廓形状对应的第二子权重;所述加权隐变量确定子模块,配置为在所述对象属性包括轮廓形状的情况下,将所述M个第一N维向量中的前i个第一N维向量与所述第一子权重相乘,得到所述第一加权隐变量的前i个第一加权N维向量;以及,将所述M个第二N维向量中的前i个第二N维向量与所述第二子权重相乘,得到所述第二加权隐变量的前i个第二加权N维向量;其中,i∈[1,M)。
  15. 根据权利要求13或14所述的装置,其中,所述第一隐变量表示为M个第一N维向量,所述第二隐变量表示为M个第二N维向量,M与N为正整数,所述对象属性的种类包括所述对象的外观颜色,所述第一权重包括所述第一图像中外观颜色对应的第三子权重,所述第二权重包括所述第二图像中外观颜色对应的第四子权重;所述加权隐变量确定子模块,配置为在所述对象属性包括外观颜色的情况下,将所述M个第一N维向量中的后M-i个第一N维向量与所述第三子权重相乘,得到所述第一加权隐变量的后M-i个第一加权N维向量;以及,将所述M个第二N维向量中的后M-i个第二N维向量与所述第四子权重相乘,得到所述第二加权隐变量的后M-i个第二加权N维向量;其中,i∈[1,M)。
  16. 根据权利要求13至15中任意一项所述的装置,其中,所述第一加权隐变量表示为M个第一加权N维向量,所述第二加权隐变量表示为M个第二加权N维向量,所述第三隐变量表示为M个第三N维向量,M与N为正整数;所述融合子模块,配置为将所述第一加权隐变量的前i个第一加权N维向量与所述第二加权隐变量的前i个第二加权N维向量相加,得到所述第三隐变量的前i个第三N维向量;将所述第一加权隐变量的后M-i个第一加权N维向量与所述第二加权隐变量的后M-i个第二加权N维向量相加,得到所述第三隐变量的后M-i个第三N维向量;其中,i∈[1,M)。
  17. 根据权利要求13至16中任意一项所述的装置,其中,所述第一加权隐变量表示为M个第一加权N维向量,所述第二加权隐变量表示为M个第二加权N维向量,所述第三隐变量表示为M个第三N维向量,M与N为正整数;所述融合子模块,配置为将与所述第一加权隐变量对应的第一隐变量的后M-i个第一N维向量,作为所述第三隐变量的后M-i个第三N维向量;或,将与所述第二加权隐变量对应的第二隐变量的后M-i个第二N维向量,作为所述第三隐变量的后M-i个第三N维向量;其中,i∈[1,M)。
  18. 根据权利要求12至17中任意一项所述的装置,其中,所述解码模块,包括:网络确定子模块,配置为响应于针对所述目标图像的图像风格的风格设置操作,确定与设置的图像风格对应的目标生成网络,所述目标生成网络用于生成具有所述设置的图像风格的图像;解码子模块,配置为利用所述目标生成网络对所述第三隐变量进行解码处理,得到所述目标图像。
  19. 根据权利要求18所述的装置,其中,所述设置的图像风格包括第一图像风格与第二图像风格,第一图像风格与第二图像风格的风格类型不同,所述风格设置操作还用于设置风格融合程度,所述风格融合程度用于指示第一生成网络与第二生成网络之间融合的网络层数;所述网络确定子模块,配置为确定与所述第一图像风格对应的第一生成网络,以及与所述第二图像风格对应的第二生成网络,所述第一生成网络用于生成具有所述第一图像风格的图像,所述第二生成网络用于生成具有所述第二图像风格的图像;根据所述风格融合程度,对所述第一生成网络与所述第二生成网络进行网络融合,得到所述目标生成网络。
  20. 根据权利要求19所述的装置,其中,所述第一生成网络与所述第二生成网络各自具有M层网络层;所述网络确定子模块,配置为将所述第一生成网络的前I层网络层,替换为所述第二生成网络的前I层网络层,得到所述目标生成网络;或,将所述第一生成网络的后I层网络层,替换为所述第二生成网络的后I层网络层,得到所述目标生成网络;其中,I为所述网络层数,I∈[1,M),所述目标图像的图像风格与所述第一图像风格之间的风格接近程度与所述网络层数I成负相关,所述目标图像的图像风格与所述第二图像风格之间的风格接近程度与所述网络层数I成正相关。
  21. 根据权利要求18至20中任意一项所述的装置,其中,所述目标生成网络具有M层网络层,所述第三隐变量表示为M个第三N维向量;所述解码子模块,配置为将第1个第三N维向量输入至所述目标生成网络的第1层网络层,得到所述第1层网络层输出的第1个中间图;将第m个第三N维向量以及第m-1个中间图输入至所述目标生成网络的第m层网络层,得到所述第m层网络层输出的第m个中间图,n∈[2,M);将第M个第三N维向量以及第M-1个中间图输入至所述目标生成网络的第M层网络层,得到所述第M层网络层输出的风格融合图像,所述目标图像包括所述风格融合图像。
  22. 根据权利要求19至21中任意一项所述的装置,其中,所述目标图像还包括:利用所述第一生成网络对所述第三隐变量进行解码处理所得到的第一风格图像,以及利用所述第二生成网络对所述第三隐变量进行解码处理所得到的第二风格图像中的至少一种。
  23. 一种电子设备,包括:处理器,用于存储处理器可执行指令的存储器;其中,所述处理器被配置为调用所述存储器存储的指令,以执行权利要求1至11中任意一项所述的方法。
  24. 一种计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现权利要求1至11中任意一项所述的方法。
  25. 一种计算机程序,包括计算机可读代码,在计算机可读代码在设备上运行的情况下,设备中的处理器执行用于实现权利要求1至11中任意一项所述的方法。
  26. 一种计算机程序产品,配置为存储计算机可读指令,所述计算机可读指令被执行时使得计算机执行权利要求1至11中任意一项所述的方法。
PCT/CN2022/134922 2022-03-25 2022-11-29 图像融合方法及装置、电子设备、存储介质、计算机程序、计算机程序产品 WO2023179074A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210298017.4A CN114418919B (zh) 2022-03-25 2022-03-25 图像融合方法及装置、电子设备和存储介质
CN202210298017.4 2022-03-25

Publications (1)

Publication Number Publication Date
WO2023179074A1 true WO2023179074A1 (zh) 2023-09-28

Family

ID=81263979

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/134922 WO2023179074A1 (zh) 2022-03-25 2022-11-29 图像融合方法及装置、电子设备、存储介质、计算机程序、计算机程序产品

Country Status (2)

Country Link
CN (1) CN114418919B (zh)
WO (1) WO2023179074A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114418919B (zh) * 2022-03-25 2022-07-26 北京大甜绵白糖科技有限公司 图像融合方法及装置、电子设备和存储介质
CN116452466B (zh) * 2023-06-14 2023-10-20 荣耀终端有限公司 图像处理方法、装置、设备及计算机可读存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796628A (zh) * 2019-10-17 2020-02-14 浙江大华技术股份有限公司 图像融合方法、装置、拍摄装置及存储介质
US10970907B1 (en) * 2019-07-02 2021-04-06 Facebook Technologies, Llc System and method for applying an expression to an avatar
CN112767285A (zh) * 2021-02-23 2021-05-07 北京市商汤科技开发有限公司 图像处理方法及装置、电子设备和存储介质
CN112967261A (zh) * 2021-03-17 2021-06-15 北京三快在线科技有限公司 图像融合方法、装置、设备及存储介质
CN113850168A (zh) * 2021-09-16 2021-12-28 百果园技术(新加坡)有限公司 人脸图片的融合方法、装置、设备及存储介质
CN114119348A (zh) * 2021-09-30 2022-03-01 阿里巴巴云计算(北京)有限公司 图像生成方法、设备和存储介质
CN114418919A (zh) * 2022-03-25 2022-04-29 北京大甜绵白糖科技有限公司 图像融合方法及装置、电子设备和存储介质

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10748376B2 (en) * 2017-09-21 2020-08-18 NEX Team Inc. Real-time game tracking with a mobile device using artificial intelligence
CN109993716B (zh) * 2017-12-29 2023-04-14 微软技术许可有限责任公司 图像融合变换
CN109345449B (zh) * 2018-07-17 2020-11-10 西安交通大学 一种基于融合网络的图像超分辨率及去非均匀模糊方法
CN111583165B (zh) * 2019-02-19 2023-08-08 京东方科技集团股份有限公司 图像处理方法、装置、设备及存储介质
US10916050B1 (en) * 2019-09-23 2021-02-09 Tencent America LLC Method and apparatus for synthesizing realistic hand poses based on blending generative adversarial networks
CN111669587B (zh) * 2020-04-17 2021-07-20 北京大学 一种视频图像的拟态压缩方法、装置、存储介质及终端
CN111652828B (zh) * 2020-05-27 2023-08-08 北京百度网讯科技有限公司 人脸图像生成方法、装置、设备和介质
CN112784897B (zh) * 2021-01-20 2024-03-26 北京百度网讯科技有限公司 图像处理方法、装置、设备和存储介质
CN112766234B (zh) * 2021-02-23 2023-05-12 北京市商汤科技开发有限公司 图像处理方法及装置、电子设备和存储介质
CN112884758B (zh) * 2021-03-12 2023-01-10 国网四川省电力公司电力科学研究院 一种基于风格迁移方法的缺陷绝缘子样本生成方法及系统
CN113706577A (zh) * 2021-04-08 2021-11-26 腾讯科技(深圳)有限公司 一种图像处理方法、装置和计算机可读存储介质
CN113705316A (zh) * 2021-04-13 2021-11-26 腾讯科技(深圳)有限公司 获取虚拟图像的方法、装置、设备及存储介质
CN113255551A (zh) * 2021-06-04 2021-08-13 广州虎牙科技有限公司 一种人脸编辑器的训练、人脸编辑、直播方法及相关装置
CN113706663B (zh) * 2021-08-27 2024-02-02 脸萌有限公司 图像生成方法、装置、设备及存储介质
CN113763535A (zh) * 2021-09-02 2021-12-07 深圳数联天下智能科技有限公司 一种特征潜码提取方法、计算机设备及存储介质
CN113850712A (zh) * 2021-09-03 2021-12-28 北京达佳互联信息技术有限公司 图像风格转换模型的训练方法、图像风格转换方法及装置
CN113807265B (zh) * 2021-09-18 2022-05-06 山东财经大学 一种多样化的人脸图像合成方法及系统
CN114202456A (zh) * 2021-11-18 2022-03-18 北京达佳互联信息技术有限公司 图像生成方法、装置、电子设备及存储介质
CN114067162A (zh) * 2021-11-24 2022-02-18 重庆邮电大学 一种基于多尺度多粒度特征解耦的图像重构方法及系统
CN113837934B (zh) * 2021-11-26 2022-02-22 北京市商汤科技开发有限公司 图像生成方法及装置、电子设备和存储介质

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10970907B1 (en) * 2019-07-02 2021-04-06 Facebook Technologies, Llc System and method for applying an expression to an avatar
CN110796628A (zh) * 2019-10-17 2020-02-14 浙江大华技术股份有限公司 图像融合方法、装置、拍摄装置及存储介质
CN112767285A (zh) * 2021-02-23 2021-05-07 北京市商汤科技开发有限公司 图像处理方法及装置、电子设备和存储介质
CN112967261A (zh) * 2021-03-17 2021-06-15 北京三快在线科技有限公司 图像融合方法、装置、设备及存储介质
CN113850168A (zh) * 2021-09-16 2021-12-28 百果园技术(新加坡)有限公司 人脸图片的融合方法、装置、设备及存储介质
CN114119348A (zh) * 2021-09-30 2022-03-01 阿里巴巴云计算(北京)有限公司 图像生成方法、设备和存储介质
CN114418919A (zh) * 2022-03-25 2022-04-29 北京大甜绵白糖科技有限公司 图像融合方法及装置、电子设备和存储介质

Also Published As

Publication number Publication date
CN114418919A (zh) 2022-04-29
CN114418919B (zh) 2022-07-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22933125

Country of ref document: EP

Kind code of ref document: A1