WO2024051289A1 - Image background replacement method and related device - Google Patents

Image background replacement method and related device

Info

Publication number
WO2024051289A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
user
background
background replacement
portrait
Application number
PCT/CN2023/102696
Other languages
English (en)
French (fr)
Inventor
吕跃强
罗谌持
胡凯程
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2024051289A1


Classifications

    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL (under G: PHYSICS; G06: COMPUTING; CALCULATING OR COUNTING)
    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06T 19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/11: Region-based segmentation
    • G06T 7/194: Segmentation involving foreground-background segmentation

Definitions

  • the present application relates to the field of image processing technology, and in particular, to an image background replacement method and related equipment.
  • Image background replacement refers to replacing the background of the original image with a different background.
  • the original image is generally segmented into the foreground and background to obtain the foreground (i.e., the portrait) and the original background, and then the foreground and the virtual background are synthesized to obtain the background replacement image.
  • When the existing technology performs image background replacement, the human body parts of different users in the resulting background replacement images are often inconsistent, so the portraits after background replacement are untidy and the effect is poor.
  • Embodiments of the present application provide an image background replacement method and related equipment, which can perform image background replacement according to preset human body parts and improve the background replacement effect.
  • a first aspect of this application discloses an image background replacement method.
  • the method includes: acquiring a first image, which is a two-dimensional image or a three-dimensional model and includes upper body frontal image information of a first user; acquiring a second image, where the second image includes characteristics of the first user, the characteristics including posture and/or expression; synthesizing the first image and the second image to obtain a first composite image, where the first composite image includes the characteristics of the first user and the upper body frontal information of the first user; performing portrait segmentation on the first composite image according to preset human body parts to obtain a first target portrait corresponding to the first user; and synthesizing the first target portrait and a virtual background to obtain a background replacement image, where the background replacement image includes the first target portrait.
  • In the existing technology, the background cannot be replaced according to preset human body parts, so it cannot be guaranteed that the portrait in the background replacement image includes the preset human body parts, which affects the effect of background replacement.
  • For example, the user expects a background replacement image including the face, neck, and chest, but the actual background replacement image includes only the face and neck, which does not meet the user's needs.
  • In a multi-person scenario, the captured images may be inconsistent (for example, some only have faces, some include faces and necks, and some include faces, necks, and shoulders), so the human body parts in the background replacement images may also be inconsistent and the portraits in the background replacement image are untidy.
  • the image background replacement method provided by the embodiment of the present application can perform background replacement according to preset human body parts, and the obtained background replacement image includes the preset human body parts, which improves the background replacement effect.
  • performing portrait segmentation on the first composite image based on preset human body parts includes: determining the portrait size based on the preset human body parts and performing portrait segmentation on the first composite image based on the portrait size; or identifying the preset human body parts in the first composite image and performing portrait segmentation based on the identified human body parts.
  • the image background replacement method provided by the embodiment of the present application can perform portrait segmentation on a synthetic image by determining the size of the portrait or identifying human body parts, and can accurately segment the target portrait from the synthetic image.
  • compositing the first target portrait with the virtual background includes: scaling the first target portrait according to the size of the position in the virtual scene and/or the position of the first user in the virtual background, and synthesizing the scaled first target portrait with the virtual background.
  • Scaling the target portrait (e.g., the first target portrait) according to the size of the position in the virtual scene is to adjust the target portrait to fit the size of that position. The larger the position in the virtual scene, the larger the scaled target portrait; the smaller the position, the smaller the scaled target portrait.
  • the image background replacement method provided by the embodiment of the present application scales the target portrait according to the size of the position in the virtual scene, which can make the portraits in the background replacement image tidier and the background replacement effect better.
  • Scaling the target portrait according to the user's position in the virtual background is to adjust the target portrait to fit that position (satisfying the rule that near objects appear large and far objects appear small). The further forward the user is in the virtual background, the larger the scaled target portrait; the further back, the smaller the scaled target portrait.
  • scaling the target portrait according to the user's position in the virtual background can make the background replacement image look more realistic.
  • For example, the virtual background is a lecture theater, and the target portrait is scaled according to the user's position in the virtual background; the resulting background replacement image has larger portraits in the front and smaller portraits in the back, which is more realistic.
  • the first user's position in the virtual background is pre-selected by the first user, or is pre-assigned.
  • the image background replacement method provided by the embodiment of the present application can let the user select a position in the virtual background, or pre-allocate a position in the virtual background for the user, improving the flexibility of background replacement.
  • the characteristics of the first user include the posture of the first user, and synthesizing the first image and the second image includes: correcting the posture of the first user according to a preset posture, and obtaining a first composite image according to the corrected posture of the first user.
  • the first composite image includes the corrected posture of the first user and frontal information of the first user's upper body.
  • the image background replacement method provided by the embodiment of the present application corrects the user's posture according to the preset posture, which can meet the needs of the user's posture in background replacement.
  • the postures of the portraits in the obtained background replacement image will be consistent, the visual effect is tidier, and the effect of background replacement is further optimized.
  • the first image is a three-dimensional model
  • correcting the first user's posture according to the preset posture includes: performing face segmentation on the second image to obtain a face image; generating a facial texture based on the face image; performing facial texture replacement on the three-dimensional model according to the facial texture to obtain a texture-replaced three-dimensional model, where the texture-replaced three-dimensional model includes the posture of the first user; and correcting the texture-replaced three-dimensional model to the position corresponding to the preset posture to obtain a corrected three-dimensional model, where the corrected three-dimensional model includes the corrected posture of the first user.
  • Obtaining the first composite image according to the corrected posture of the first user includes: rendering the corrected three-dimensional model into a two-dimensional image to obtain the first composite image.
  • the image background replacement method provided by the embodiment of the present application can correct the user's posture through a three-dimensional model to achieve better correction effects.
  • obtaining the first image includes: selecting one from a plurality of pre-stored first images according to the environment of the virtual background.
  • the computing device may pre-store a plurality of first images, and select one from the pre-stored plurality of first images according to the environment of the virtual background.
  • the environment of the virtual background may include the light corresponding to the virtual background, place (such as classroom, conference room, bar, etc.), weather (such as sunny day, rainy day, etc.), season (such as summer, winter, etc.), etc.
  • For example, first images of the user can be captured under different lighting, and the first image captured under similar lighting is selected according to the lighting corresponding to the virtual background.
  • For another example, first images of the user wearing attire suited to different places can be captured, and the first image with the corresponding attire is selected according to the place corresponding to the virtual background.
  • first images of the user wearing different outfits in different seasons can be captured, and a first image of the corresponding outfit can be selected according to the season corresponding to the virtual background.
  • the image background replacement method provided by the embodiment of the present application selects one from multiple pre-stored first images according to the environment of the virtual background, which can improve the effect of background replacement.
  • the second image further includes characteristics of the second user
  • the method further includes: acquiring a third image, where the third image is a two-dimensional image or a three-dimensional model and includes upper body frontal image information of the second user; synthesizing the third image and the second image to obtain a second composite image, where the second composite image includes the characteristics of the second user and the upper body frontal information of the second user; and performing portrait segmentation on the second composite image according to the preset human body parts to obtain a second target portrait corresponding to the second user. Synthesizing the first target portrait and the virtual background to obtain a background replacement image includes: synthesizing the first target portrait, the second target portrait, and the virtual background to obtain a background replacement image that includes the first target portrait and the second target portrait.
  • the image background replacement method provided by the embodiment of the present application can realize background replacement in a multi-person scenario, and obtain a background replacement image of a virtual background shared by multiple people.
  • the portrait in the background replacement image includes the preset human body parts, which improves the background replacement effect in multi-person scenarios.
  • the background replacement image includes a first target portrait corresponding to the first user and a second target portrait corresponding to the second user.
  • the human body parts in the first target portrait and the human body parts in the second target portrait are basically the same; for example, both include the face, neck, and shoulders, or both include the face, neck, and chest.
  • the second image is acquired in real time
  • the user's characteristics include the user's real-time posture and/or expression.
  • the image background replacement method provided by the embodiment of the present application can obtain the second image in real time, realizing image background replacement of the real-time image.
  • obtaining the second image includes: obtaining a video containing the user's characteristics in real time, and obtaining the second image from the video.
  • obtaining the second image from the video includes: obtaining image frames from the video; performing portrait segmentation on the image frames to obtain the second image.
  • the image background replacement method provided by the embodiment of the present application can replace the image background according to the user's video.
  • If the image frame is a single-person image, a semantic segmentation algorithm is used to perform portrait segmentation on the image frame; if the image frame is a multi-person image, an instance segmentation algorithm is used to perform portrait segmentation on the image frame.
  • synthesizing the first image and the second image includes: extracting feature points from the first image to obtain a first feature point set; extracting feature points from the second image to obtain a second feature point set; aligning the first image and the second image according to the first feature point set and the second feature point set, and affine transforming the second image onto the first image to obtain an intermediate image; and generating the first composite image according to the intermediate image, the first image, and the second image.
  • generating the first composite image according to the intermediate image, the first image, and the second image includes: inputting the intermediate image, the first image, and the second image into a generative adversarial network model, and generating the first composite image through the generative adversarial network model.
  • the image background replacement method provided by the embodiment of the present application generates a synthetic image (such as a first synthetic image) through a generative adversarial network model, which can improve the picture quality of the synthetic image.
  • Before the intermediate image, the first image, and the second image are input to the generative adversarial network model, the method further includes: obtaining a training data set of the generative adversarial network model, where the training data set includes second images taken at different angles and/or in different postures and a frontal image of the first user's upper body taken when each second image was taken; and training the generative adversarial network model through the training data set.
  • the first image is a three-dimensional model.
  • Obtaining the first image includes: acquiring a video shot around the upper body of the first user; establishing a discrete feature point cloud based on the video; meshing the discrete feature point cloud to obtain an initial model; and performing texture mapping on the initial model to obtain the three-dimensional model.
  • the second aspect of this application discloses an image background replacement device, which has the function of realizing the above first aspect or any optional implementation of the first aspect.
  • the image background replacement device includes at least one unit, and at least one unit is used to implement the method provided by the above-mentioned first aspect or any optional implementation of the first aspect.
  • In one implementation, the units in the image background replacement device are implemented by software and are program modules. In another implementation, the units in the image background replacement device are implemented by hardware or firmware.
  • a third aspect of the present application discloses a computer-readable storage medium, which includes computer instructions.
  • When the computer instructions are run on a computing device, the computing device performs the image background replacement method of the first aspect.
  • a fourth aspect of this application discloses a computing device.
  • the computing device includes a processor and a memory.
  • the memory is used to store instructions, and the processor is used to call instructions in the memory, so that the computing device executes the image background replacement method as in the first aspect.
  • the fifth aspect of this application discloses a chip system for use in a computing device; the chip system includes an interface circuit and a processor interconnected through lines; the interface circuit is used to receive signals from the memory of the computing device and send the signals to the processor, where the signals include computer instructions stored in the memory; when the processor executes the computer instructions, the chip system executes the image background replacement method of the first aspect.
  • the image background replacement device of the second aspect, the computer-readable storage medium of the third aspect, the computing device of the fourth aspect, and the chip system of the fifth aspect provided above all correspond to the method of the first aspect; therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding method provided above, and details are not repeated here.
  • Figures 1-3 are schematic diagrams of application scenarios of the image background replacement method provided by embodiments of the present application.
  • Figure 4 is a flow chart of an image background replacement method provided by an embodiment of the present application.
  • Figures 5-6 are schematic diagrams of the effects of the image background replacement method provided by the embodiment of the present application.
  • Figure 7 is a flow chart of an image background replacement method provided by another embodiment of the present application.
  • FIG. 8A is a schematic diagram of a background replacement image obtained without correcting the user's posture according to the image background replacement method provided by an embodiment of the present application.
  • FIG. 8B is a schematic diagram of a background replacement image obtained by correcting the user's posture according to the image background replacement method provided by an embodiment of the present application.
  • FIG. 9 is a detailed flow chart for synthesizing the first image and the second image and correcting the user's posture according to an embodiment of the present application.
  • Figure 10 is a schematic structural diagram of an image background replacement device provided by an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • Background replacement: replacing the background of the original image with a different background.
  • Background replacement in a single-person scenario: background replacement is performed on an image that includes a single user, and the background replacement image includes that single user.
  • Background replacement in a multi-person centralized scenario: multiple users use one user device to take a photo of the users at the same location, obtaining an image that includes multiple users (a group photo); background replacement is performed on the image, and the background replacement image includes the multiple users.
  • Background replacement in a multi-person distributed scenario: multiple users use different user devices to take photos of themselves in different locations, obtaining each user's image; background replacement is performed based on each user's image (replacing to a common virtual background), and the background replacement image includes the multiple users.
  • In the existing technology, the background cannot be replaced according to preset human body parts, so it cannot be guaranteed that the portrait in the background replacement image includes the preset human body parts, which affects the effect of background replacement.
  • For example, the user expects a background replacement image including the face, neck, and chest, but the actual background replacement image includes only the face and neck, which does not meet the user's needs.
  • In a multi-person scenario, the captured images may be inconsistent (for example, some have only faces, some include faces and necks, and some include faces, necks, and shoulders), so the human body parts in the background replacement images obtained by background replacement may also be inconsistent, and the portraits in the image are untidy.
  • the embodiment of the present application provides an image background replacement method that can perform background replacement according to preset human body parts.
  • the obtained background replacement image includes the preset human body parts, which improves the background replacement effect.
  • Figures 1-3 are schematic diagrams of application scenarios of the image background replacement method provided by embodiments of the present application.
  • the image background replacement method provided by the embodiment of the present application can be applied to single-person scenes and multi-person scenes (including multi-person centralized scenes and multi-person distributed scenes).
  • Figure 1 is a schematic diagram of the application scenario in a single-person scenario
  • Figure 2 is a schematic diagram of an application scenario in a multi-person centralized scenario
  • Figure 3 is a schematic diagram of an application scenario in a multi-person distributed scenario.
  • Application scenarios of embodiments of the present application include computing devices (such as the computing device 101 in Figures 1-3) and user equipment (such as the user devices 102-105 in Figures 1-3).
  • User equipment includes cameras and displays.
  • the user equipment captures the user's original image (which may include a first image and a second image, see the relevant description in Figure 4) through the camera device, and sends the user's original image to the computing device.
  • the computing device performs background replacement according to the user's original image, generates a background replacement image, and returns the background replacement image to the user device.
  • the user device displays the background replacement image on the display.
  • the application scenario includes a computing device 101 and a user device 102 .
  • the user equipment 102 captures the original image of the user A through the camera device, and sends the original image of the user A to the computing device 101 .
  • the computing device 101 performs background replacement based on user A's original image, generates a background replacement image including user A's portrait, and returns the background replacement image to the user device 102 .
  • User device 102 displays the background replacement image on the display screen.
  • the application scenario includes a computing device 101 and a user device 102 .
  • the user equipment 102 takes an original image (group photo) including user A, user B, user C, and user D through a camera device, and sends the original image including user A, user B, user C, and user D to the computing device 101 .
  • the computing device 101 performs background replacement based on the original images including User A, User B, User C, and User D, generates a background replacement image of the virtual background shared by User A, User B, User C, and User D, and returns the background replacement image to the user device.
  • User device 102 displays the background replacement image on the display screen.
  • the application scenario includes computing device 101, user device 102, user device 103, user device 104, and user device 105.
  • the user device 102 captures the original image of user A through its own camera and sends it to the computing device 101; the user device 103 captures the original image of user B through its own camera and sends it to the computing device 101; the user device 104 captures the original image of user C through its own camera and sends it to the computing device 101; and the user device 105 captures the original image of user D through its own camera and sends it to the computing device 101.
  • the computing device 101 performs background replacement based on the original images of user A, user B, user C, and user D, generates a background replacement image of the virtual background shared by user A, user B, user C, and user D, and returns the background replacement image to the user devices 102-105.
  • the user devices 102-105 display the background replacement image on their display screens.
  • the image background replacement method provided by the embodiment of this application can be applied to video conferences, online classes, online gatherings, etc.
  • background replacement can be performed to put participants into the virtual conference background.
  • background replacement can be performed to put the lecturers into the virtual classroom background.
  • the background can be replaced and the party members can be placed into a virtual gathering background (such as a bar, coffee shop, etc.).
  • the user equipment 102-105 is installed with application software (such as video conferencing software, online classroom software, etc.).
  • the user devices 102-105 run application software, capture the user's original image through the application software, send the captured original image to the computing device 101, and display the background replacement image returned by the computing device 101 through the application software.
  • the computing device 101 includes, but is not limited to, an independent physical server, a server cluster composed of multiple physical servers, or a distributed system composed of multiple physical servers.
  • Computing device 101 may be located anywhere.
  • the computing device 101 may be located in the cloud and provide users with an image background replacement service in the form of a cloud service.
  • the computing device 101 may also be any user device.
  • the scenario with four user devices shown in Figures 2 and 3 is only an example; the number of user devices can be more or less, for example, 2, 3, or more. This embodiment does not limit the number of user devices.
  • user equipment includes, but is not limited to, personal computers, mobile phones, servers, laptops, IP phones, cameras, tablets, wearable devices, etc.
  • User devices are connected to computing devices through wired or wireless networks.
  • FIG 4 is a flow chart of an image background replacement method provided by an embodiment of the present application.
  • the image background replacement method is applied to computing devices (such as the computing device 101 in Figures 1-3).
  • the first image used for background replacement is a two-dimensional image.
  • the first image is a two-dimensional image, and the first image includes frontal image information of the user's upper body.
  • the number of the first image may be one (for example, the first image corresponding to the first user), or may be multiple (for example, the first image corresponding to the first user and the first image corresponding to the second user).
  • the first image corresponding to the second user may be called a third image.
  • the computing device acquires the first image, where the first image is a two-dimensional image and includes upper body frontal image information of the first user.
  • the computing device acquires a third image, the third image is a two-dimensional image, and the third image includes upper body frontal image information of the second user.
  • the number of first images acquired is one.
  • the first image of user A is obtained (the number of first images is one).
  • the number of acquired first images is multiple, each first image corresponds to one user, and different first images correspond to different users. For example, referring to FIGS. 2 and 3 , the first images of user A, user B, user C, and user D are obtained respectively (the number of first images is four).
  • the first image can be obtained by taking a picture of the user through a camera device of the user equipment (such as a camera of a mobile phone or a tablet computer, or a video conferencing camera).
  • the user device may capture the first image in advance and send it to the computing device.
  • the computing device stores the first image in the storage device, and when background replacement is required, the first image is obtained from the storage device.
  • the computing device may control the user device to capture the first image of the user and receive the first image sent by the user device.
  • the computing device may pre-store a plurality of first images, and select one from the pre-stored plurality of first images according to the environment of the virtual background.
  • the environment of the virtual background may include the light corresponding to the virtual background, place (such as classroom, conference room, bar, etc.), weather (such as sunny day, rainy day, etc.), season (such as summer, winter, etc.), etc.
  • the first image of the user can be captured under different lights, and a first image captured under similar lights can be selected based on the light corresponding to the virtual background.
  • multiple first images of the user corresponding to different attires in different places can be captured, and one first image corresponding to the attire is selected according to the place corresponding to the virtual background.
  • first images of the user wearing different outfits in different seasons can be captured, and a first image with the corresponding outfit can be selected according to the season corresponding to the virtual background. Selecting one from a plurality of pre-stored first images according to the environment of the virtual background can improve the effect of background replacement.
  • the second image includes the user's characteristics.
  • Characteristics of the user may include the user's posture and/or expression.
  • the number of second images may be one (for example, the second image corresponding to the first user), or may be multiple (for example, the second image corresponding to the first user and the second image corresponding to the second user).
  • the number of second images acquired is one. For example, as shown in Figure 1, a second image of user A is obtained (the number of second images is one).
  • In a multi-person scenario, the number of second images acquired may be one or multiple. If the number of acquired second images is one, the second image may correspond to multiple users and include the characteristics of multiple users (for example, the characteristics of the first user and the characteristics of the second user). If the number of acquired second images is multiple, each second image may correspond to one user, and different second images correspond to different users and include the characteristics of different users (for example, one second image corresponds to the first user and includes the characteristics of the first user; another second image corresponds to the second user and includes the characteristics of the second user). For example, referring to Figures 2 and 3, second images of user A, user B, user C, and user D can be obtained (the number of second images may be one or four).
  • the user device can capture a second image of the user and send the second image to the computing device.
  • the computing device can acquire the second image in real time.
  • the computing device can obtain the user's video in real time and obtain the second image from the video.
  • the user device captures the user's conference video in real time and sends the conference video to the computing device.
  • the computing device receives the conference video and obtains the second image from the conference video.
  • the user's device captures the user's class video in real time and sends the class video to the computing device.
  • the computing device receives the class video and obtains the second image from the class video.
  • the user device captures the user's party video in real time and sends the party video to the computing device.
  • the computing device receives the party video and obtains a second image from the party video.
  • the computing device can obtain an image frame from the video, perform portrait segmentation on the image frame, and obtain a second image.
  • the computing device can use a deep learning segmentation algorithm to perform portrait segmentation on image frames. If the image frame is a single-person image, a semantic segmentation algorithm such as Pyramid Scene Parsing Network (PSPNet) or Fully Convolutional Network (FCN) can be used to perform portrait segmentation on the image frame; if the image frame is a multi-person image, an instance segmentation algorithm such as Simultaneous Detection and Segmentation (SDS) or Mask Region-based Convolutional Neural Network (Mask RCNN) can be used to perform portrait segmentation on the image frame.
  • Semantic segmentation classifies each pixel in the image (categories include, for example, people, grass, trees, and sky); instance segmentation detects targets (such as people) in the image and distinguishes different individuals (such as different persons).
  • the application scenario of background replacement can be determined based on the video: if the application scenario is a single-person scenario, a semantic segmentation algorithm is used to perform portrait segmentation on the image frames; if the application scenario is a multi-person centralized scenario, an instance segmentation algorithm is used. It can be judged whether the duration for which the video includes multiple people exceeds a preset time (for example, 1 minute): if it does not exceed the preset time, the scenario is determined to be a single-person scenario; if it exceeds the preset time, the scenario is determined to be a multi-person centralized scenario.
  • If the image frame is a single-person image, portrait segmentation is performed on the image frame to obtain a second image. If the image frame is a multi-person image, portrait segmentation is performed on the image frame to obtain multiple second images; the multiple second images may have inconsistent human body parts (for example, some have shoulders and some do not) and inconsistent image sizes. A sketch of this step is shown below.
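  • For illustration only, the following is a minimal sketch of the frame-grab and multi-person portrait segmentation steps described above, using OpenCV and a pretrained torchvision Mask RCNN as stand-ins; the patent names no specific library, and the video path and score threshold are assumptions.

```python
import cv2
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Pretrained instance segmentation model as a stand-in for the Mask RCNN
# option mentioned above.
model = maskrcnn_resnet50_fpn(pretrained=True).eval()

cap = cv2.VideoCapture("user_video.mp4")  # hypothetical real-time video source
ok, frame = cap.read()                    # obtain one image frame from the video
cap.release()

rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0

with torch.no_grad():
    pred = model([tensor])[0]

# Keep confident "person" detections (COCO label 1); in a multi-person frame,
# each mask yields one candidate second image.
second_images = [
    (tensor * (mask > 0.5)).permute(1, 2, 0).numpy()
    for mask, label, score in zip(pred["masks"], pred["labels"], pred["scores"])
    if label == 1 and score > 0.8
]
```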
  • the composite image includes the user's characteristics (such as the user's real-time characteristics) and includes the user's upper body frontal information.
  • In a single-person scenario, the number of acquired first images and second images is one (for example, the first image and the second image corresponding to the first user), and the first image and the second image are synthesized to obtain one composite image (for example, the first composite image corresponding to the first user).
  • For example, as shown in Figure 1, the first image and the second image of user A are obtained (the number of each is one), and the first image and the second image of user A are synthesized to obtain the composite image of user A.
  • In a multi-person scenario, the number of first images may be multiple, and the number of second images may be one or multiple. If the number of second images is multiple, each first image and the corresponding second image are synthesized to obtain a composite image corresponding to each user (for example, a first composite image corresponding to the first user and a second composite image corresponding to the second user).
  • For example, referring to Figures 2 and 3, the first image and the second image corresponding to user A, user B, user C, and user D are obtained respectively (the number of first images and of second images is four); the first image and the second image corresponding to user A are synthesized to obtain the composite image of user A; the first image and the second image corresponding to user B are synthesized to obtain the composite image of user B; the first image and the second image corresponding to user C are synthesized to obtain the composite image of user C; and the first image and the second image corresponding to user D are synthesized to obtain the composite image of user D. If the number of first images is multiple and the number of second images is one, the corresponding portrait in the second image is synthesized with each first image to obtain a composite image corresponding to each user.
  • Specifically, feature points can be extracted from the first image to obtain a first feature point set, and feature points can be extracted from the second image to obtain a second feature point set; the first image and the second image are aligned according to the first feature point set and the second feature point set, and the second image is affine transformed onto the first image to obtain an intermediate image; a composite image is then generated based on the intermediate image, the first image, and the second image.
  • a scale-invariant feature transform (SIFT) algorithm can be used to extract feature points from the first image and the second image respectively, as in the sketch below.
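  • An illustrative sketch of the alignment described above; the file names and the Lowe ratio threshold are assumptions, as the text only specifies SIFT feature extraction, alignment, and affine transformation.

```python
import cv2
import numpy as np

first = cv2.imread("first_image.png", cv2.IMREAD_GRAYSCALE)    # hypothetical inputs
second = cv2.imread("second_image.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(first, None)   # first feature point set
kp2, des2 = sift.detectAndCompute(second, None)  # second feature point set

# Match descriptors and keep good matches (Lowe's ratio test).
matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des2, des1, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

src = np.float32([kp2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# Estimate a 2x3 affine transform and warp the second image onto the first,
# producing the intermediate image fed to the generative adversarial network.
M, _ = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
intermediate = cv2.warpAffine(second, M, (first.shape[1], first.shape[0]))
```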
  • the intermediate image, the first image and the second image can be input into the first generative adversarial network model, and the synthetic image is generated through the first generative adversarial network model.
  • Before the intermediate image, the first image, and the second image are input into the first generative adversarial network model, the first generative adversarial network model needs to be trained.
  • the training data set of the first generative adversarial network model can be obtained, and the first generative adversarial network model can be trained through the training data set.
  • Second images can be taken at different angles and in different postures, with a frontal image of the user's upper body taken at the time each second image is taken; the second images taken at different angles and in different postures, together with the corresponding upper body frontal images, are used as the training data set of the first generative adversarial network model.
  • When the image background replacement method provided by the embodiment of the present application is applied to a single-person scenario, the number of composite images is one, and the composite image is segmented according to the preset human body parts to obtain one target portrait (for example, the first target portrait corresponding to the first user).
  • For example, as shown in Figure 1, after the composite image of user A is obtained, portrait segmentation is performed on it based on the preset human body parts to obtain the target portrait of user A.
  • When applied to a multi-person scenario, portrait segmentation is performed on each composite image according to the preset human body parts to obtain the target portrait corresponding to each user (for example, the first target portrait corresponding to the first user and the second target portrait corresponding to the second user).
  • For example, referring to Figures 2 and 3, after the composite images of user A, user B, user C, and user D are obtained, portrait segmentation is performed on the composite image of user A according to the preset human body parts to obtain the target portrait of user A; portrait segmentation is performed on the composite image of user B to obtain the target portrait of user B; and similarly for user C and user D.
  • preset human body parts can be determined based on the virtual background. For example, if the virtual background is a classroom, the preset human body parts may include the face, neck, and chest. For another example, if the virtual background is a bar, the preset human body parts can include the face, neck and waist. The corresponding relationship between the virtual background and the preset human body parts can be set in advance.
  • the portrait size can be determined based on preset human body parts, and the target portrait can be segmented from the composite image based on the portrait size.
  • the portrait size may be a preset multiple of a reference size in the composite image (e.g., the distance from the eyes to the top of the head).
  • the preset human body parts include the face and neck, and the portrait size is 2.5 times the distance from the eyes to the top of the head in the composite image.
  • the preset human body parts include the face, neck and chest, and the portrait size is 5 times the distance from the eyes to the top of the head in the composite image.
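  • A minimal sketch of the size-based segmentation described above, assuming the head-top and eye rows come from some landmark detector; the detector and the treatment of the crop width are not specified by the text and are illustrative only.

```python
import numpy as np

# Preset multiples of the eye-to-top-of-head reference distance, as above.
PART_MULTIPLES = {
    ("face", "neck"): 2.5,
    ("face", "neck", "chest"): 5.0,
}

def segment_by_portrait_size(composite, head_top_y, eye_y, preset_parts):
    """Crop the target portrait from a composite image (an H x W x C array).

    head_top_y and eye_y are hypothetical landmark rows (e.g., from a face
    landmark detector); the text above only defines the reference distance
    and the multiples.
    """
    reference = eye_y - head_top_y                        # eyes to top of head
    height = int(PART_MULTIPLES[tuple(preset_parts)] * reference)
    bottom = min(head_top_y + height, composite.shape[0])
    return composite[head_top_y:bottom, :, :]

# Example: a face+neck portrait is 2.5x the eye-to-head-top distance.
composite = np.zeros((720, 480, 3), dtype=np.uint8)
portrait = segment_by_portrait_size(composite, head_top_y=80, eye_y=160,
                                    preset_parts=("face", "neck"))
```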
  • human body parts can be identified in a synthetic image based on preset human body parts, and a target portrait can be segmented from the synthetic image based on the recognized human body parts.
  • the preset human body parts include the face and neck, identify the user's face and neck from the synthesized image, and segment the target portrait from the synthesized image based on the recognized face and neck.
  • In a single-person scenario, the target portrait and the virtual background are synthesized to obtain a background replacement image including one target portrait (such as the first target portrait).
  • the target portrait of user A is synthesized with the virtual background to obtain a background replacement image including the target portrait of user A.
  • In a multi-person scenario, the multiple target portraits are synthesized with the virtual background to obtain a background replacement image including the multiple target portraits (for example, the first target portrait and the second target portrait).
  • For example, referring to Figures 2 and 3, the target portrait of user A, the target portrait of user B, the target portrait of user C, and the target portrait of user D are synthesized with the virtual background to obtain a background replacement image including the target portraits of user A, user B, user C, and user D.
  • the user's position in the virtual background can be obtained, and the target portrait and the virtual background are synthesized based on the user's position in the virtual background.
  • the user's location in the virtual background may be a location pre-selected by the user in the virtual background.
  • the user's location in the virtual background may be a pre-assigned location of the user in the virtual background.
  • Each position in the virtual background can be preset with a number. For example, if the virtual background is a classroom: the first position from the left in row 1 is numbered 1.1, the second 1.2, the third 1.3, and so on; the first position from the left in row 2 is numbered 2.1, the second 2.2, the third 2.3, and so on; the first position from the left in row 3 is numbered 3.1, the second 3.2, the third 3.3, and so on. If the user's position in the virtual background is 3.2, the user's target portrait is synthesized into the second position from the left in the third row of the virtual background. A sketch of such a numbering map is given below.
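  • For illustration, the numbering scheme above might be represented as a mapping from position numbers to compositing anchors; the pixel coordinates below are invented.

```python
# Position number "row.seat" -> (x, y) pixel anchor in the virtual background.
# Coordinates are illustrative only; rows further back sit higher in the image.
seat_anchors = {
    "1.1": (120, 420), "1.2": (320, 420), "1.3": (520, 420),
    "2.1": (140, 320), "2.2": (320, 320), "2.3": (500, 320),
    "3.1": (160, 230), "3.2": (320, 230), "3.3": (480, 230),
}

# A user assigned position 3.2 is composited at the second seat from the
# left in the third row.
x, y = seat_anchors["3.2"]
```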
  • When combining the target portrait with the virtual background, the target portrait can be scaled according to the size of the position in the virtual scene, and the scaled target portrait can then be combined with the virtual background.
  • Scaling the target portrait according to the size of the position in the virtual scene is to adjust the target portrait to fit the size of that position: the larger the position, the larger the scaled target portrait; the smaller the position, the smaller the scaled target portrait.
  • the target portrait can be scaled according to the user's position in the virtual background, and the scaled target portrait and the virtual background can be synthesized.
  • Scaling the target portrait according to the user's position in the virtual background is to adjust the target portrait to fit that position (satisfying the rule that near objects appear large and far objects appear small). The further forward the user is in the virtual background, the larger the scaled target portrait; the further back, the smaller the scaled target portrait.
  • scaling the target portrait according to the user's position in the virtual background can make the background replacement image look more realistic.
  • For example, the virtual background is a lecture theater, and the target portrait is scaled according to the user's position in the virtual background; the resulting background replacement image has larger portraits in the front and smaller portraits in the back, which is more realistic.
  • the target portrait can be scaled based on the size of the location in the virtual scene and the user's location in the virtual background.
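  • A hedged sketch combining both scaling rules above; the per-row falloff constant is an invented illustrative value, not taken from the patent.

```python
import cv2

def scale_target_portrait(portrait, slot_height, base_height, row):
    """Scale a target portrait before compositing it into the virtual background.

    slot_height / base_height fits the portrait to the size of the position
    in the virtual scene; the depth factor implements "near large, far small"
    (row 1 is the front row). The 0.8 falloff per row is an assumption.
    """
    factor = (slot_height / base_height) * (0.8 ** (row - 1))
    h, w = portrait.shape[:2]
    return cv2.resize(portrait, (int(w * factor), int(h * factor)))
```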
  • Figures 5-6 are schematic diagrams of the effects of the image background replacement method provided by the embodiment of the present application.
  • Figure 5 shows background replacement in a multi-person scenario in which the target portrait is not scaled during the background replacement process. A background replacement image including the preset human body parts can be obtained, but because the target portraits are not scaled, the portraits in the resulting background replacement image differ in size.
  • Figure 6 shows background replacement in a multi-person scenario in which the target portrait is scaled. A background replacement image including the preset human body parts can be obtained, and the target portrait is scaled according to the size of the position in the virtual scene and the user's position in the virtual background. The portraits in the same row are the same size, and the portraits in different rows differ in size (the portraits in front are larger, the portraits behind are smaller), so the background replacement image is more realistic.
  • the computing device synthesizes the first image and the second image to obtain a composite image.
  • the composite image includes the user's characteristics (such as the user's posture) and includes the user's upper body frontal information.
  • the computing device performs portrait segmentation on the synthesized image according to the preset human body parts to obtain a target portrait, where the target portrait includes the preset human body parts.
  • the computing device synthesizes the target portrait and the virtual background to obtain a background replacement image. Since the target portrait segmented from the composite image includes the preset human body parts, the background replacement image obtained by synthesizing the target portrait and the virtual background also includes the preset human body parts, thereby realizing background replacement according to the preset human body parts, ensuring that the obtained background replacement image includes the preset human body parts, and improving the background replacement effect. The overall flow is sketched below.
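  • The Figure 4 flow can be summarized with the following pseudocode-style sketch; every helper function is a placeholder for the steps numbered 401-405 above, not a real API.

```python
def replace_background(first_image, second_image, virtual_background,
                       preset_parts, position):
    # Steps 401-402: the first image and the second image have been acquired.
    composite = synthesize(first_image, second_image)        # step 403
    portrait = segment_portrait(composite, preset_parts)     # step 404
    portrait = scale_for_position(portrait, virtual_background, position)
    return composite_onto(portrait, virtual_background, position)  # step 405
```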
  • FIG 7 is a flow chart of an image background replacement method provided by another embodiment of the present application.
  • the image background replacement method is applied to computing devices (such as the computing device 101 in Figures 1-3).
  • the first image used for background replacement is a three-dimensional model.
  • the computing device corrects the user's posture during the background replacement process to further optimize the background replacement effect.
  • the first image is a three-dimensional model, and the first image includes frontal image information of the user's upper body.
  • Specifically, the user device can be rotated around the user's upper body to capture a video; a discrete feature point cloud is established based on the video; the discrete feature point cloud is meshed to obtain an initial model; and texture mapping is performed on the initial model to obtain the three-dimensional model.
  • Textures are images; texture mapping covers the initial model with textures, arranging the textures on the initial model.
  • the three-dimensional model obtained through texture mapping will be more realistic in visual effects.
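  • An illustrative sketch of the point-cloud-to-model steps above, with Open3D as a stand-in toolchain; the patent names no library, and the file names and Poisson depth are assumptions.

```python
import open3d as o3d

# Discrete feature point cloud reconstructed from the walk-around video
# (a structure-from-motion stage is assumed to have produced "points.ply").
pcd = o3d.io.read_point_cloud("points.ply")
pcd.estimate_normals()

# Meshing: Poisson surface reconstruction turns the point cloud into the
# initial model.
mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)

o3d.io.write_triangle_mesh("initial_model.ply", mesh)
# Texture mapping (projecting captured video frames onto the mesh to obtain
# the final three-dimensional model) is a separate step not shown here.
```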
  • the second image includes the user's characteristics.
  • Characteristics of the user may include the user's posture, and may also include the user's expression.
  • the computing device corrects the user's posture according to the preset posture during the process of synthesizing the first image and the second image.
  • Correcting the user's posture according to the preset posture means correcting the user's posture to the preset posture. For example, if the user's posture is an inclined sitting posture (for example, leaning to the left or right), the inclined sitting posture is corrected to an upright sitting posture. For another example, if the user's posture is a side posture (that is, the posture of turning the head), the side posture is corrected to a front posture. For another example, the user's posture is a head-down posture, and the head-down posture is corrected to a head-up posture.
  • the user's posture may not be corrected when synthesizing the first image and the second image.
  • the computing device synthesizes the first image and the second image to obtain a synthesized image.
  • the synthesized image includes the user's characteristics and includes the user's upper body frontal information.
  • the computing device performs portrait segmentation on the composite image according to the preset human body parts to obtain a target portrait, and synthesizes the target portrait and the virtual background to obtain a background replacement image. Since the target portrait segmented from the composite image includes the preset human body parts, the background replacement image also includes the preset human body parts, thereby realizing background replacement according to the preset human body parts and improving the background replacement effect. Moreover, when the image background replacement method shown in Figure 7 synthesizes the first image and the second image, the user's posture is corrected according to the preset posture, so a background replacement image with the preset posture can be obtained, further optimizing the background replacement effect.
  • FIG. 8A is a schematic diagram of a background replacement image obtained without correcting the user's posture according to the image background replacement method provided by an embodiment of the present application.
  • the postures of the users in the second image are different. Some are sitting upright, some are leaning to the left, and some are leaning to the right.
  • In the background replacement image obtained without correcting the user's posture, the human body parts of each portrait are consistent, but the postures of the portraits remain different.
  • FIG. 8B is a schematic diagram of a background replacement image obtained by correcting the user's posture according to the image background replacement method provided by an embodiment of the present application.
  • the postures of the users in the second image are different. Some are sitting upright, some are leaning to the left, and some are leaning to the right.
  • In the background replacement image obtained by correcting the user's posture, the human body parts of each portrait are consistent (including the face, neck, and shoulders), the postures of the portraits are consistent (all upright sitting postures), the visual effect is neater, and the background replacement effect is further optimized.
  • FIG. 9 is a detailed flow chart for synthesizing the first image and the second image and correcting the user's posture according to an embodiment of the present application.
  • the first image is a three-dimensional model.
  • the second image can be input into a pre-trained face segmentation network, and the face segmentation network outputs a face image.
  • a face image can be input into a face texture generation model, and the output of the face texture generation model is a face texture map.
  • the facial texture map includes the characteristics of the user.
  • the face texture generation model may be a second generative adversarial network model.
  • the three-dimensional model of the target user can be used to generate two-dimensional face images from different angles.
  • the two-dimensional face images from different angles and the face texture map of the three-dimensional model are used as training data to train a neural network and obtain the face texture generation model.
  • the input of the neural network is the two-dimensional face images from different angles, and the output is the face texture map of the three-dimensional model. Since both the input and the output of the neural network are two-dimensional images, the second generative adversarial network model can be used for training.
  • because the face texture generation model is trained specifically for the target person, it can generate high-quality face textures for that person.
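  • Since both the network input (a 2D face image rendered from the three-dimensional model at some angle) and the desired output (the model's face texture map) are ordinary two-dimensional images, an image-to-image GAN in the pix2pix style is one plausible form this training could take. The PyTorch sketch below illustrates a single training step under that assumption; the tiny network shapes, the L1 weight of 100 and the optimizer settings are placeholders rather than the design actually used.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy generator (face image -> texture map) and discriminator
# (face image + texture pair -> real/fake score). A real system would use
# deeper encoder-decoder nets; two layers only keep the sketch short.
G = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh())
D = nn.Sequential(
    nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(face: torch.Tensor, texture: torch.Tensor) -> None:
    """One adversarial update; face/texture are Bx3xHxW image batches."""
    fake = G(face)
    # Discriminator: real (face, texture) pairs vs. generated pairs.
    d_real = D(torch.cat([face, texture], dim=1))
    d_fake = D(torch.cat([face, fake.detach()], dim=1))
    loss_d = (bce(d_real, torch.ones_like(d_real))
              + bce(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: fool D while staying close to the ground-truth texture.
    d_fake = D(torch.cat([face, fake], dim=1))
    loss_g = (bce(d_fake, torch.ones_like(d_fake))
              + 100.0 * F.l1_loss(fake, texture))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

  • Calling train_step(torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)) exercises one update; in the flow described here, each batch would instead pair angle-varied 2D renders of the target user's three-dimensional model with that model's ground-truth face texture map.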
  • FIG 10 is a schematic structural diagram of an image background replacement device provided by an embodiment of the present application.
  • the image background replacement device 1000 shown in Figure 10 is used to implement the methods shown in Figures 4, 7 and 9.
  • the image background replacement device 1000 shown in Figure 10 can be provided on the computing device 101 in Figures 1-3.
  • the image background replacement device 1000 includes an acquisition unit 1001, a synthesis unit 1002 and a segmentation unit 1003.
  • the acquisition unit 1001 is used to support the image background replacement device 1000 in performing steps 401 and 402 in Figure 4 and steps 701 and 702 in Figure 7 .
  • the synthesis unit 1002 is used to support the image background replacement device 1000 in performing steps 403 and 405 in Figure 4 and steps 703 and 705 in Figure 7 .
  • the segmentation unit 1003 is used to support the image background replacement device 1000 in performing step 404 in Figure 4 and step 704 in Figure 7 .
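  • Read as code, this unit-to-step mapping amounts to a simple composition. The Python sketch below is one hypothetical wiring of the three logical units; the method names and the preset value are invented for illustration and are not an API defined by this application.

```python
class ImageBackgroundReplacementDevice:
    """Sketch of the logical division of device 1000 into three units."""

    def __init__(self, acquisition, synthesis, segmentation):
        self.acquisition = acquisition      # steps 401/402 and 701/702
        self.synthesis = synthesis          # steps 403/405 and 703/705
        self.segmentation = segmentation    # steps 404 and 704

    def replace_background(self, user_id, virtual_background):
        first = self.acquisition.get_first_image(user_id)
        second = self.acquisition.get_second_image(user_id)
        comp = self.synthesis.merge(first, second)
        portrait = self.segmentation.segment(comp, parts="face_neck_shoulders")
        return self.synthesis.composite(portrait, virtual_background)
```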
  • the device embodiment described in FIG. 10 is merely illustrative. The division into the above units is only a division by logical function, and other division manners are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • Each functional unit in various embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • Each unit in the image background replacement device 1000 is implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented in software, for example, the above acquisition unit 1001, synthesis unit 1002 and segmentation unit 1003 are software functional units generated after at least one processor 1101 in FIG. 11 reads the program code stored in the memory 1102.
  • When implemented in hardware, for example, the above units in FIG. 10 are implemented by different hardware in the computing device: the acquisition unit 1001 is implemented by a part of the processing resources of at least one processor 1101 in FIG. 11 (for example, one core or two cores of a multi-core processor), while the synthesis unit 1002 and the segmentation unit 1003 are implemented by the remaining processing resources of the at least one processor 1101 (for example, other cores of the multi-core processor), or by programmable devices such as a field-programmable gate array (FPGA) or a coprocessor.
  • When implemented using a combination of software and hardware, for example, the acquisition unit 1001 is implemented by a hardware programmable device, while the synthesis unit 1002 and the segmentation unit 1003 are software functional units generated after the CPU reads the program code stored in the memory.
  • the following describes, by way of example, the basic hardware structure of the computing device.
  • FIG. 11 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • the computing device 101 shown in Figure 11 is used to perform the methods shown in Figures 4, 7 and 9.
  • Computing device 101 includes at least one processor 1101, memory 1102, and at least one network interface 1103.
  • the processor 1101 is, for example, a general-purpose central processing unit (CPU), a network processor (NP), a graphics processing unit (GPU), a neural-network processing unit (NPU), a data processing unit (DPU), a microprocessor, or one or more integrated circuits used to implement the solution of this application.
  • the processor 1101 includes an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • PLD is, for example, a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a general array logic (GAL), or any combination thereof.
  • the memory 1102 is, for example, a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, without being limited thereto.
  • the memory 1102 exists independently and is connected to the processor 1101 through an internal connection 1104.
  • memory 1102 and processor 1101 are optionally integrated together.
  • Network interface 1103 uses any transceiver-like device for communicating with other devices or communications networks.
  • the network interface 1103 includes, for example, at least one of a wired network interface or a wireless network interface.
  • the wired network interface is, for example, an Ethernet interface.
  • the Ethernet interface is, for example, an optical interface, an electrical interface or a combination thereof.
  • the wireless network interface is, for example, a wireless local area networks (WLAN) interface, a cellular network interface or a combination thereof.
  • processor 1101 includes one or more CPUs, such as CPU0 and CPU1 as shown in Figure 11.
  • computing device 101 optionally includes multiple processors, such as processor 1101 and processor 1105 shown in FIG. 11 .
  • each of these processors is, for example, a single-core processor (single-CPU) or a multi-core processor (multi-CPU).
  • a processor here optionally refers to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • computing device 101 also includes internal connections 1104 .
  • the processor 1101, the memory 1102 and at least one network interface 1103 are connected through an internal connection 1104.
  • Internal connections 1104 include pathways that carry information between the components described above.
  • internal connection 1104 is a single board or bus.
  • the internal connections 1104 are divided into address bus, data bus, control bus, etc.
  • in some embodiments, the computing device 101 also includes an input/output interface 1106, which is connected to the internal connection 1104.
  • the processor 1101 implements the method in the above embodiment by reading the program code 1110 stored in the memory 1102, or the processor 1101 implements the method in the above embodiment by using the internally stored program code.
  • when the processor 1101 implements the methods of the foregoing embodiments by reading the program code 1110 stored in the memory 1102, the memory 1102 stores the program code that implements the methods provided by the embodiments of this application.
  • in this application, "A refers to B" means that A is the same as B or that A is a simple transformation of B.
  • the terms "first" and "second" in the specification and claims of the embodiments of this application are used to distinguish different objects rather than to describe a specific order of objects, and cannot be understood as indicating or implying relative importance.
  • for example, the first resource and the second resource are used to distinguish different resources rather than to describe a specific order of resources, nor can it be understood that the first resource is more important than the second resource.
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented using software, the above embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • when the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are generated in whole or in part.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired (for example, coaxial cable, optical fiber or digital subscriber line (DSL)) or wireless (for example, infrared, radio or microwave) manner.
  • the computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more available media.
  • the available media may be magnetic media (for example, a floppy disk, a hard disk or a magnetic tape), optical media (for example, a DVD), or semiconductor media (for example, a solid state disk (SSD)), and the like.

Abstract

Embodiments of the present application provide an image background replacement method and related devices. The method includes: acquiring a first image, where the first image is a two-dimensional image or a three-dimensional model and includes upper-body frontal image information of a first user; acquiring a second image, where the second image includes features of the first user; synthesizing the first image and the second image to obtain a first composite image, where the first composite image includes the features of the first user and includes the upper-body frontal information of the first user; performing portrait segmentation on the first composite image according to preset human body parts to obtain a first target portrait corresponding to the first user; and synthesizing the first target portrait with a virtual background to obtain a background replacement image, where the background replacement image includes the first target portrait. The method can perform image background replacement according to the preset human body parts, improving the background replacement effect.

Claims (25)

  1. An image background replacement method, characterized in that the method comprises:
    acquiring a first image, wherein the first image is a two-dimensional image or a three-dimensional model, and the first image comprises upper-body frontal image information of a first user;
    acquiring a second image, wherein the second image comprises features of the first user, and the features comprise a posture and/or an expression;
    synthesizing the first image and the second image to obtain a first composite image, wherein the first composite image comprises the features of the first user and comprises the upper-body frontal information of the first user;
    performing portrait segmentation on the first composite image according to preset human body parts to obtain a first target portrait corresponding to the first user; and
    synthesizing the first target portrait with a virtual background to obtain a background replacement image, wherein the background replacement image comprises the first target portrait.
  2. The image background replacement method according to claim 1, characterized in that the performing portrait segmentation on the first composite image according to preset human body parts comprises:
    determining a portrait size according to the preset human body parts, and performing portrait segmentation on the first composite image according to the portrait size; or
    performing human body part recognition in the first composite image according to the preset human body parts, and performing portrait segmentation on the first composite image according to the recognized human body parts.
  3. The image background replacement method according to claim 1 or 2, characterized in that the synthesizing the first target portrait with a virtual background comprises:
    scaling the first target portrait according to a size of a position in the virtual scene and/or a position of the first user in the virtual background; and
    synthesizing the scaled first target portrait with the virtual background.
  4. The image background replacement method according to claim 3, characterized in that the position of the first user in the virtual background is pre-selected by the first user or pre-assigned.
  5. The image background replacement method according to any one of claims 1 to 4, wherein the features of the first user comprise a posture of the first user, characterized in that the synthesizing the first image and the second image comprises:
    correcting the posture of the first user according to a preset posture; and
    obtaining the first composite image according to the corrected posture of the first user, wherein the first composite image comprises the corrected posture of the first user and comprises the upper-body frontal information of the first user.
  6. The image background replacement method according to claim 5, characterized in that the first image is a three-dimensional model, and the correcting the posture of the first user according to a preset posture comprises:
    performing face segmentation on the second image to obtain a face image;
    generating a face texture according to the face image;
    performing face texture replacement on the three-dimensional model according to the face texture to obtain a texture-replaced three-dimensional model, wherein the texture-replaced three-dimensional model comprises the posture of the first user; and
    correcting the texture-replaced three-dimensional model to a position corresponding to the preset posture to obtain a corrected three-dimensional model, wherein the corrected three-dimensional model comprises the corrected posture of the first user;
    the obtaining the first composite image according to the corrected posture of the first user comprises:
    rendering the corrected three-dimensional model into a two-dimensional image to obtain the first composite image.
  7. The image background replacement method according to any one of claims 1 to 6, characterized in that the acquiring a first image comprises:
    selecting one of a plurality of pre-stored first images according to an environment of the virtual background.
  8. The image background replacement method according to any one of claims 1 to 7, characterized in that the second image further comprises features of a second user, and the method further comprises:
    acquiring a third image, wherein the third image is a two-dimensional image or a three-dimensional model, and the third image comprises upper-body frontal image information of the second user;
    synthesizing the third image and the second image to obtain a second composite image, wherein the second composite image comprises the features of the second user and comprises the upper-body frontal information of the second user; and
    performing portrait segmentation on the second composite image according to preset human body parts to obtain a second target portrait corresponding to the second user;
    the synthesizing the first target portrait with a virtual background to obtain a background replacement image comprises:
    synthesizing the first target portrait, the second target portrait and the virtual background to obtain the background replacement image, wherein the background replacement image comprises the first target portrait and the second target portrait.
  9. The image background replacement method according to any one of claims 1 to 8, characterized in that the second image is acquired in real time, and the features of the user comprise a real-time posture and/or expression of the user.
  10. The image background replacement method according to claim 9, characterized in that the acquiring a second image comprises:
    acquiring, in real time, a video containing the features of the user; and
    obtaining the second image from the video.
  11. The image background replacement method according to claim 10, characterized in that the obtaining the second image from the video comprises:
    obtaining an image frame from the video; and
    performing portrait segmentation on the image frame to obtain the second image.
  12. An image background replacement device, characterized in that the device comprises:
    an acquisition unit, configured to acquire a first image, wherein the first image is a two-dimensional image or a three-dimensional model, and the first image comprises upper-body frontal image information of a first user;
    the acquisition unit being further configured to acquire a second image, wherein the second image comprises features of the first user, and the features comprise a posture and/or an expression;
    a synthesis unit, configured to synthesize the first image and the second image to obtain a first composite image, wherein the first composite image comprises the features of the first user and comprises the upper-body frontal information of the first user; and
    a segmentation unit, configured to perform portrait segmentation on the first composite image according to preset human body parts to obtain a first target portrait corresponding to the first user;
    the synthesis unit being further configured to synthesize the first target portrait with a virtual background to obtain a background replacement image, wherein the background replacement image comprises the first target portrait.
  13. The image background replacement device according to claim 12, characterized in that the segmentation unit is configured to:
    determine a portrait size according to the preset human body parts, and perform portrait segmentation on the first composite image according to the portrait size; or
    perform human body part recognition in the first composite image according to the preset human body parts, and perform portrait segmentation on the first composite image according to the recognized human body parts.
  14. The image background replacement device according to claim 12 or 13, characterized in that the synthesis unit is configured to:
    scale the first target portrait according to a size of a position in the virtual scene and/or a position of the first user in the virtual background; and
    synthesize the scaled first target portrait with the virtual background.
  15. The image background replacement device according to claim 14, characterized in that the position of the user in the virtual background is pre-selected by the user or pre-assigned.
  16. The image background replacement device according to any one of claims 12 to 15, wherein the features of the first user comprise a posture of the first user, characterized in that the synthesis unit is configured to:
    correct the posture of the first user according to a preset posture; and
    obtain the first composite image according to the corrected posture of the first user, wherein the first composite image comprises the corrected posture of the first user and comprises the upper-body frontal information of the first user.
  17. The image background replacement device according to claim 16, characterized in that the first image is a three-dimensional model, and the synthesis unit is configured to:
    perform face segmentation on the second image to obtain a face image;
    generate a face texture according to the face image;
    perform face texture replacement on the three-dimensional model according to the face texture to obtain a texture-replaced three-dimensional model, wherein the texture-replaced three-dimensional model comprises the posture of the first user;
    correct the texture-replaced three-dimensional model to a position corresponding to the preset posture to obtain a corrected three-dimensional model, wherein the corrected three-dimensional model comprises the corrected posture of the first user; and
    render the corrected three-dimensional model into a two-dimensional image to obtain the first composite image.
  18. The image background replacement device according to any one of claims 12 to 17, characterized in that the acquisition unit is configured to:
    select one of a plurality of pre-stored first images according to an environment of the virtual background.
  19. The image background replacement device according to any one of claims 12 to 18, characterized in that the second image further comprises features of a second user, and the acquisition unit is further configured to:
    acquire a third image, wherein the third image is a two-dimensional image or a three-dimensional model, and the third image comprises upper-body frontal image information of the second user;
    the synthesis unit is further configured to:
    synthesize the third image and the second image to obtain a second composite image, wherein the second composite image comprises the features of the second user and comprises the upper-body frontal information of the second user;
    the segmentation unit is further configured to:
    perform portrait segmentation on the second composite image according to preset human body parts to obtain a second target portrait corresponding to the second user;
    the synthesis unit is configured to:
    synthesize the first target portrait, the second target portrait and the virtual background to obtain the background replacement image, wherein the background replacement image comprises the first target portrait and the second target portrait.
  20. The image background replacement device according to any one of claims 12 to 19, characterized in that the second image is acquired in real time, and the features of the user comprise a real-time posture and/or expression of the user.
  21. The image background replacement device according to claim 20, characterized in that the acquisition unit is configured to:
    acquire, in real time, a video containing the features of the user, and obtain the second image from the video.
  22. The image background replacement device according to claim 21, characterized in that the acquisition unit is configured to:
    obtain an image frame from the video; and
    perform portrait segmentation on the image frame to obtain the second image.
  23. A computer-readable storage medium, characterized by comprising computer instructions which, when run on a computing device, cause the computing device to perform the image background replacement method according to any one of claims 1 to 11.
  24. A computing device, characterized in that the computing device comprises a processor and a memory, wherein the memory is configured to store instructions, and the processor is configured to invoke the instructions in the memory, so that the computing device performs the image background replacement method according to any one of claims 1 to 11.
  25. A chip system, characterized in that the chip system is applied to a computing device; the chip system comprises an interface circuit and a processor; the interface circuit and the processor are interconnected through a line; the interface circuit is configured to receive a signal from a memory of the computing device and send the signal to the processor, wherein the signal comprises computer instructions stored in the memory; and when the processor executes the computer instructions, the chip system performs the image background replacement method according to any one of claims 1 to 11.
PCT/CN2023/102696 2022-09-06 2023-06-27 Image background replacement method and related device WO2024051289A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211084989.X 2022-09-06
CN202211084989.XA CN117710496A (zh) Image background replacement method and related device

Publications (1)

Publication Number Publication Date
WO2024051289A1 (zh)

Family

ID=90159363

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/102696 2022-09-06 2023-06-27 Image background replacement method and related device

Country Status (2)

Country Link
CN (1) CN117710496A (zh)
WO (1) WO2024051289A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019083509A1 (en) * 2017-10-24 2019-05-02 Hewlett-Packard Development Company, L.P. PEOPLE SEGMENTATIONS FOR BACKGROUND REPLACEMENTS
CN110390704A (zh) * 2019-07-11 2019-10-29 深圳追一科技有限公司 图像处理方法、装置、终端设备及存储介质
CN113240702A (zh) * 2021-06-25 2021-08-10 北京市商汤科技开发有限公司 图像处理方法及装置、电子设备和存储介质

Also Published As

Publication number Publication date
CN117710496A (zh) 2024-03-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23861990

Country of ref document: EP

Kind code of ref document: A1