WO2022088750A1 - 图像生成方法和电子设备 - Google Patents

图像生成方法和电子设备 Download PDF

Info

Publication number
WO2022088750A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
area
model
dimensional
image
Prior art date
Application number
PCT/CN2021/105334
Other languages
English (en)
French (fr)
Inventor
申婷婷
赵松涛
郭益林
宋丛礼
Original Assignee
北京达佳互联信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京达佳互联信息技术有限公司 filed Critical 北京达佳互联信息技术有限公司
Publication of WO2022088750A1 publication Critical patent/WO2022088750A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular, to an image generation method and an electronic device.
  • Virtual manicure is a new feature of short video applications and camera applications; it refers to beautifying the nails in an image.
  • in the related art, current virtual nail art schemes usually perform beautification processing on the two-dimensional nails in two-dimensional images.
  • the present disclosure provides an image generation method and electronic device, and the technical solutions of the present disclosure are as follows:
  • an image generation method comprising:
  • acquiring a target image, the target image comprising a first target object;
  • acquiring position information of two-dimensional key points of the first target object from the target image;
  • projecting the three-dimensional model of the first target object to a target area based on the position information of the two-dimensional key points and the position information of three-dimensional key points of the first target object, to generate a special effect image, where the target area is the area where the first target object is located in the target image, and the three-dimensional key points are key points corresponding to the two-dimensional key points in the three-dimensional model of the first target object.
  • an image generation device comprising:
  • an acquisition module configured to acquire a target image, the target image comprising a first target object
  • a position information acquisition module configured to acquire the position information of the two-dimensional key points of the first target object from the target image
  • an image generation module configured to project the three-dimensional model to the target area based on the position information of the two-dimensional key points and the position information of the three-dimensional key points of the first target object to generate a special effect image, where the target area is the region where the first target object is located in the target image, and the three-dimensional key points are key points corresponding to the two-dimensional key points in the three-dimensional model of the first target object.
  • an electronic device comprising: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to perform the following steps:
  • acquiring a target image, the target image comprising a first target object;
  • acquiring position information of two-dimensional key points of the first target object from the target image;
  • projecting the three-dimensional model to a target area based on the position information of the two-dimensional key points and the position information of three-dimensional key points of the first target object, to generate a special effect image, where the target area is the area where the first target object is located in the target image, and the three-dimensional key points are key points corresponding to the two-dimensional key points in the three-dimensional model of the first target object.
  • a storage medium when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device can perform the following steps:
  • acquiring a target image, the target image comprising a first target object;
  • acquiring position information of two-dimensional key points of the first target object from the target image;
  • projecting the three-dimensional model to a target area based on the position information of the two-dimensional key points and the position information of three-dimensional key points of the first target object, to generate a special effect image, where the target area is the area where the first target object is located in the target image, and the three-dimensional key points are key points corresponding to the two-dimensional key points in the three-dimensional model of the first target object.
  • a computer program product comprising readable program code that can be executed by a processor of an electronic device to perform the following steps:
  • acquiring a target image, the target image comprising a first target object;
  • acquiring position information of two-dimensional key points of the first target object from the target image;
  • projecting the three-dimensional model to a target area based on the position information of the two-dimensional key points and the position information of three-dimensional key points of the first target object, to generate a special effect image, where the target area is the area where the first target object is located in the target image, and the three-dimensional key points are key points corresponding to the two-dimensional key points in the three-dimensional model of the first target object.
  • FIG. 1 is a flowchart of an image generation method.
  • Figure 2a is a schematic diagram of a nail makeup application scheme.
  • FIG. 2b is a schematic flowchart of projecting a three-dimensional model to a target image in a nail makeup application scheme.
  • Figure 3a is a target image in a nail application scheme.
  • Figure 3b is a hand area in a nail application scheme.
  • FIG. 3c is a schematic diagram of a segmentation result of semantic segmentation of hand regions using a nail segmentation model in a nail makeup scheme.
  • FIG. 3d is a composite image of segmentation results of semantic segmentation of hand regions using a nail segmentation model in a nail makeup scheme.
  • Figure 3e is a nail area in a make-up scheme of a nail.
  • FIG. 3f is a schematic diagram of key points of a nail in a nail makeup application scheme.
  • Figure 3g is a top view of a three-dimensional model in a nail application solution.
  • FIG. 3h is an effect diagram of the projection of the three-dimensional model in a nail makeup scheme onto the nail area.
  • FIG. 3i and FIG. 3j are two makeup effect diagrams in a nail makeup application scheme.
  • FIG. 4 is a block diagram of an image generating apparatus.
  • Figure 5 is a block diagram of an image generation electronic device.
  • FIG. 6 is a block diagram of an electronic device for processing image effects.
  • FIG. 1 is a flowchart of an image generation method. As shown in FIG. 1 , the image generation method can be applied to a terminal or a server. In the following description process, the image generation method is applied to a terminal as an example. The image generation method includes the following steps.
  • the target image is also called an image to be processed, and the target image can be understood as a two-dimensional target image, that is, the target image is a plane image.
  • the first target object is a nail, which includes fingernails and toenails.
  • the first target object may also be an eyeball, eyelashes, lips, etc., which are not limited in the embodiments of the present disclosure.
  • in the following description, the first target object being a nail is taken as an example.
  • the target image may contain one or more first target objects.
  • when the image generation method is applied in a live streaming scenario, the target image is the live broadcast image, the live broadcast image includes the host, and the first target object is the host's nail.
  • the terminal obtains live broadcast images through a shooting device, such as a camera built in the terminal or an external camera, and the terminal is also the terminal used by the host for live broadcast.
  • if the image generation method provided by the embodiments of the present disclosure is applied to a server, the terminal obtains the live image through the photographing device and sends the live image to the server, and the server obtains the live image.
  • when the image generation method is applied to short videos, the target image is a video frame of the short video, the video frame includes a person, and the first target object is that person's nail.
  • the terminal can shoot the short video through a shooting device, such as a camera built in the terminal or an external camera.
  • if the image generation method provided by the embodiments of the present disclosure is applied to a server, the terminal shoots a short video through a shooting device, sends the completed short video to the server, and the server obtains the short video and obtains video frames from the short video.
  • the terminal first segments the first target object from the target image, and then processes the first target object to obtain position information of two-dimensional key points.
  • the two-dimensional key point is a boundary point or a corner point of the first target object in the target image, which is not limited in this embodiment of the present disclosure.
  • when the terminal obtains the position information of the two-dimensional key points of the first target object from the target image, the terminal inputs the target image into the segmentation model to determine the target area, which is the area containing the first target object.
  • the target area is the minimum bounding rectangle containing the first target object, that is, the target area is a rectangular area that is tangent to the boundary of the first target object.
  • the terminal inputs the target area into the key point regression model to determine the location information of the two-dimensional key points.
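  • As an illustration of this step, the following is a minimal sketch (not code from the disclosure) of cropping the target area as the minimum bounding rectangle of a segmentation mask, assuming OpenCV and NumPy are available; the image and mask below are hypothetical placeholders for the segmentation model's output.
```python
import cv2
import numpy as np

def crop_target_area(image, mask):
    """Crop the minimum bounding rectangle of the mask (the target area)."""
    ys, xs = np.nonzero(mask)                       # pixels where the mask is 1
    pts = np.column_stack([xs, ys]).astype(np.int32)
    x, y, w, h = cv2.boundingRect(pts)              # rectangle tangent to the object boundary
    return image[y:y + h, x:x + w]

# Dummy image and mask; in practice the mask comes from the segmentation model
# and the cropped target area is fed to the key point regression model.
image = np.zeros((480, 640, 3), dtype=np.uint8)
mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:150, 200:260] = 1
target_area = crop_target_area(image, mask)
print(target_area.shape)                            # (50, 60, 3)
```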
  • the segmentation model includes a first segmentation sub-model and a second segmentation sub-model.
  • the first segmentation sub-model and the second segmentation sub-model are both pre-trained models.
  • the first segmentation sub-model is used to segment the candidate region from the target image, and the second segmentation sub-model is used to segment the target region from the candidate region; that is, a multi-level segmentation method is used when determining the target region from the target image.
  • the terminal inputs the target image into the first segmentation sub-model, and performs semantic segmentation on the target image through the first segmentation sub-model to obtain a candidate region, the candidate region includes the second target object, and the first target object belongs to the second target object.
  • the terminal inputs the candidate region into the second segmentation sub-model, and performs semantic segmentation on the candidate region through the second segmentation sub-model to obtain the target region.
  • the terminal inputs the target image into the first segmentation sub-model, and performs semantic segmentation on the target image through the first segmentation sub-model to obtain a first region segmentation mask.
  • the first region segmentation mask is a first mask image; the size of the first mask image is the same as that of the target image; the first mask image is a binary image in which the pixel value of each pixel is 0 or 1; and the pixels of the first mask image correspond one-to-one with the pixels of the target image.
  • the terminal obtains the candidate region by segmenting the target image according to the first region segmentation mask, that is, the terminal multiplies the target image by the first mask image to obtain the candidate region.
  • if the pixel value of a pixel of the first mask image is 1, the corresponding pixel of the target image retains its original pixel value after the multiplication; if the pixel value of a pixel of the first mask image is 0, the corresponding pixel of the target image becomes 0 after the multiplication. The area that finally retains its original pixel values is the candidate area.
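  • For illustration only, a minimal NumPy sketch of the mask multiplication described above; `first_mask` is a hypothetical stand-in for the first region segmentation mask produced by the first segmentation sub-model.
```python
import numpy as np

target_image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
first_mask = np.zeros((480, 640), dtype=np.uint8)
first_mask[120:360, 200:440] = 1    # illustrative "hand" region

# Pixels whose mask value is 1 keep their original values; pixels under 0 become 0.
candidate_region = target_image * first_mask[:, :, np.newaxis]
```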
  • the terminal inputs the candidate region into the second segmentation sub-model, and performs semantic segmentation on the candidate region through the second segmentation sub-model to obtain a second region segmentation mask.
  • the terminal obtains the target area by dividing the candidate area according to the second area dividing mask.
  • the terminal inputs the target area into the key point regression model, and extracts the features of the target area through the key point regression model to obtain the regional features of the target area, and obtains the location information of the two-dimensional key points of the first target object based on the regional features of the target area.
  • the second target object includes the first target object. If the first target object is a nail, the second target object is a hand or a foot. Correspondingly, the candidate area is an area including the hand or the foot.
  • the terminal performs semantic segmentation on the target image through the first segmentation sub-model to obtain the hand region, that is, the candidate region.
  • the terminal performs semantic segmentation on the hand region through the second segmentation sub-model to obtain the nail region, that is, the target region.
  • the terminal extracts the position information of the two-dimensional key points of each nail in the nail area through the key point regression model.
  • the target image is semantically segmented into target regions by the first segmentation sub-model and the second segmentation sub-model, and the position information of two-dimensional key points is extracted from the target region by the key point regression model.
  • the target image is divided into target regions step by step, which improves the accuracy of the extraction of two-dimensional key points of the first target object.
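  • The multi-level pipeline above can be sketched roughly as follows; `HandSegmenter`, `NailSegmenter`, and `KeypointRegressor` are hypothetical stand-ins for the pre-trained first segmentation sub-model, second segmentation sub-model, and key point regression model, not interfaces defined by this disclosure.
```python
import numpy as np

class HandSegmenter:
    def __call__(self, image):
        return np.ones(image.shape[:2], dtype=np.uint8)      # dummy hand mask

class NailSegmenter:
    def __call__(self, region):
        return np.ones(region.shape[:2], dtype=np.uint8)     # dummy nail mask

class KeypointRegressor:
    def __call__(self, region):
        return np.zeros((4, 2), dtype=np.float32)            # four 2D key points per nail

def extract_2d_keypoints(image):
    """Target image -> candidate (hand) region -> target (nail) region -> 2D key points."""
    hand_mask = HandSegmenter()(image)
    candidate_region = image * hand_mask[:, :, None]
    nail_mask = NailSegmenter()(candidate_region)
    target_region = candidate_region * nail_mask[:, :, None]
    return KeypointRegressor()(target_region)

keypoints_2d = extract_2d_keypoints(np.zeros((480, 640, 3), dtype=np.uint8))
print(keypoints_2d.shape)   # (4, 2)
```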
  • in order to make the outputs of the segmentation model and the key point regression model more stable, the terminal inputs the target image into the segmentation model to obtain a first initial area, where the first initial area includes the first target object.
  • the terminal performs time series smoothing processing on the first initial area to obtain a target area.
  • the terminal inputs the target area into the key point regression model, determines the position information of the initial two-dimensional key points of the first target object, and performs optical flow stabilization processing on the position information of the initial two-dimensional key points to obtain the position information of the two-dimensional key points.
  • the terminal performs semantic segmentation on the target image through the first segmentation sub-model to obtain a second initial area corresponding to the candidate area, and the terminal performs time series smoothing processing on the second initial area to obtain the candidate area.
  • the terminal performs semantic segmentation on the candidate area through the second segmentation sub-model to obtain the first initial area corresponding to the target area, and the terminal performs time series smoothing processing on the first initial area to obtain the target area.
  • when acquiring the position information of the two-dimensional key points, the terminal performs feature extraction on the target area according to the key point regression model and the optical flow algorithm to obtain the position information of the two-dimensional key points.
  • the optical flow algorithm may use the Lucas-Kanade optical flow algorithm (a two-frame difference optical flow estimation algorithm).
  • through time series smoothing, the stability of the output of the segmentation model, that is, of the target region, can be enhanced.
  • through the optical flow algorithm, the stability of the output of the key point regression model, that is, of the position information of the two-dimensional key points, can be enhanced.
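  • A hedged sketch of the two stabilization steps, assuming OpenCV: an exponential moving average is used here as one possible form of time series smoothing (the disclosure does not specify the exact smoothing), and Lucas-Kanade optical flow (cv2.calcOpticalFlowPyrLK) is used to stabilize the regressed key points between consecutive frames; the blending factors are illustrative.
```python
import cv2
import numpy as np

def smooth_mask(prev_mask, cur_mask, alpha=0.6):
    """Exponential moving average of consecutive segmentation masks (time series smoothing)."""
    blended = alpha * cur_mask.astype(np.float32) + (1.0 - alpha) * prev_mask.astype(np.float32)
    return (blended > 0.5).astype(np.uint8)

def stabilise_keypoints(prev_gray, cur_gray, prev_pts, cur_pts):
    """Average the regressed key points with points tracked by Lucas-Kanade optical flow.

    prev_gray/cur_gray are consecutive grayscale frames; prev_pts/cur_pts are the
    key points regressed on those frames, shape (N, 2).
    """
    tracked, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, cur_gray, prev_pts.reshape(-1, 1, 2).astype(np.float32), None)
    tracked = tracked.reshape(-1, 2)
    ok = status.reshape(-1).astype(bool)
    out = cur_pts.astype(np.float32).copy()
    out[ok] = 0.5 * out[ok] + 0.5 * tracked[ok]   # blend tracked and regressed positions
    return out
```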
  • the 3D key points are the key points corresponding to the 2D key points in the 3D model.
  • the three-dimensional model is a model that has been trained, and the embodiments of the present disclosure do not limit the training process of the three-dimensional model.
  • the three-dimensional model is also a three-dimensional model of the nail.
  • the terminal obtains a three-dimensional model by a three-dimensional reconstruction method.
  • the position information of the three-dimensional key points corresponding to the two-dimensional key points in the three-dimensional model may be preset key point position information.
  • a 2D key point corresponds to a 3D key point in the 3D model.
  • by combining the position information of the two-dimensional key points and the position information of the three-dimensional key points, the terminal can project the 3D model onto the target area of the first target object in the target image, based on the correspondence between the positions of the 2D key points in the target image and the positions of the 3D key points in the 3D model.
  • when the terminal projects the 3D model onto the target area of the first target object in the target image, the terminal may use a Perspective-n-Point (PnP) algorithm to determine the extrinsic parameter matrix of the camera based on the position information of the 2D key points and the position information of the 3D key points, where the camera is the camera that shoots the target image.
  • the purpose of the PnP algorithm is to solve for 3D-2D point-pair motion. Simply put, given the coordinates of n three-dimensional space points (relative to a specified coordinate system A) and their two-dimensional projected positions, it estimates the pose of the camera (that is, the camera's position and attitude in coordinate system A).
  • the extrinsic parameter matrix of the camera is used to describe the motion of the camera in a static scene, or the rigid motion of a moving object when the camera is fixed.
  • the extrinsic parameter matrix of the camera includes a rotation matrix and a translation matrix, wherein the rotation matrix describes the orientation of the axes of the world coordinate system relative to the camera coordinate axes, and the translation matrix describes the position of the spatial origin in the camera coordinate system.
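  • A minimal sketch of estimating the camera extrinsic parameter matrix from four 2D nail key points and the corresponding 3D model key points using OpenCV's PnP solver; the key point coordinates and the intrinsic matrix below are illustrative placeholders, not values from the disclosure.
```python
import cv2
import numpy as np

# Four 3D key points on the nail model (coplanar, in model units) and their
# four regressed 2D key points in the image; values are placeholders.
points_3d = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [1.0, 1.5, 0.0],
                      [0.0, 1.5, 0.0]], dtype=np.float32)
points_2d = np.array([[320.0, 240.0],
                      [360.0, 242.0],
                      [362.0, 300.0],
                      [322.0, 298.0]], dtype=np.float32)
K = np.array([[800.0, 0.0, 320.0],     # intrinsic matrix: focal lengths and principal point
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]], dtype=np.float32)

# SOLVEPNP_P3P works with exactly four correspondences, matching the four nail key points.
ok, rvec, tvec = cv2.solvePnP(points_3d, points_2d, K, None, flags=cv2.SOLVEPNP_P3P)
R, _ = cv2.Rodrigues(rvec)             # rotation vector -> rotation matrix
extrinsic = np.hstack([R, tvec])       # 3x4 extrinsic matrix [R | t]
print(extrinsic.shape)                 # (3, 4)
```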
  • the internal parameter matrix of the camera is determined by the hardware structure of the camera, including the focal length of the camera, the principal point offset, etc.
  • the principal axis of the camera is the line perpendicular to the image plane and passing through the camera center (the pinhole), and the intersection of the principal axis with the image plane is called the principal point.
  • the principal point offset is the position of the principal point relative to the image plane.
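  • For reference, the standard pinhole camera model (a textbook relation, not specific to this disclosure) shows how the intrinsic and extrinsic matrices combine to project a three-dimensional key point onto the image plane:

$$
s\begin{bmatrix}u\\ v\\ 1\end{bmatrix}
= K\,[\,R \mid t\,]\begin{bmatrix}X\\ Y\\ Z\\ 1\end{bmatrix},
\qquad
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
$$

where $(u, v)$ are the pixel coordinates of the projected key point, $(X, Y, Z)$ are its coordinates in the world (model) coordinate system, $f_x$ and $f_y$ are the focal lengths in pixels, $(c_x, c_y)$ is the principal point, $R$ and $t$ form the extrinsic parameter matrix, and $s$ is a scale factor.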
  • the terminal can use the blender tool (an open-source, cross-platform 3D animation production software that provides a full pipeline of solutions from modeling, animation, materials, and rendering to audio processing and video editing) to project the three-dimensional model onto the target area of the first target object in the target image to obtain a special effect image.
  • the terminal projects the three-dimensional model onto the target area of the first target object in the target image according to the camera's internal parameter matrix and the calculated camera's external parameter matrix, which can ensure the accuracy of the three-dimensional model projected to the target area.
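  • The disclosure performs this projection and rendering with the blender tool; as an alternative illustration only, the following sketch projects hypothetical model vertices into the image with OpenCV's cv2.projectPoints, using placeholder intrinsic and extrinsic parameters.
```python
import cv2
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]], dtype=np.float32)
rvec = np.zeros((3, 1), dtype=np.float32)                     # placeholder extrinsics
tvec = np.array([[0.0], [0.0], [20.0]], dtype=np.float32)

model_vertices = np.random.rand(200, 3).astype(np.float32)    # placeholder mesh vertices

projected, _ = cv2.projectPoints(model_vertices, rvec, tvec, K, None)
projected = projected.reshape(-1, 2)                          # (N, 2) pixel coordinates

# Draw the projected vertices onto the target image to visualise the overlay.
target_image = np.zeros((480, 640, 3), dtype=np.uint8)
for u, v in projected:
    u, v = int(round(u)), int(round(v))
    if 0 <= u < target_image.shape[1] and 0 <= v < target_image.shape[0]:
        cv2.circle(target_image, (u, v), 1, (0, 0, 255), -1)
```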
  • the terminal projects the three-dimensional model to the target area based on the position information of the two-dimensional key points and the position information of the three-dimensional key points of the first target object to obtain the projection area.
  • the terminal performs special effect processing on the projection area to obtain a special effect image.
  • the projection area includes the first target object, and the first target object exists in the projection area in a three-dimensional form, that is, in the form of a three-dimensional model of the first target object.
  • performing special effect processing on the projection area can be understood as beautifying the first target object that exists in three-dimensional form, that is, beautifying the three-dimensional model of the first target object, for example by changing the rendering parameters of the three-dimensional model; the special effect image then contains the beautified three-dimensional image of the first target object.
  • the beautification process may include changing the color of the first target object, changing the pattern of the first target object, and so on, that is, adjusting the rendering color and rendering texture in the rendering parameters of the three-dimensional model.
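  • A hedged sketch of one simple form of such special effect processing: alpha-blending a chosen nail polish color over the pixels covered by the projection, assuming NumPy; the mask, color, and blend factor are illustrative assumptions, not values from the disclosure.
```python
import numpy as np

def recolor_projection(image, projection_mask, color=(180, 105, 255), alpha=0.7):
    """Alpha-blend a nail polish colour over the pixels covered by the projection mask."""
    out = image.astype(np.float32)
    region = projection_mask.astype(bool)
    out[region] = alpha * np.array(color, dtype=np.float32) + (1.0 - alpha) * out[region]
    return out.astype(np.uint8)

frame = np.zeros((480, 640, 3), dtype=np.uint8)
proj_mask = np.zeros((480, 640), dtype=np.uint8)
proj_mask[200:230, 300:330] = 1                  # where the 3D nail model was projected
effect_image = recolor_projection(frame, proj_mask)
```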
  • the terminal can fine-tune the projection area to ensure the matching effect between the three-dimensional model and the target area.
  • the terminal acquires the mask area of the first target object from the target image, and the mask area is used to indicate the position of the first target object in the target image.
  • the terminal adjusts the projection position of the three-dimensional model in the projection area to obtain the adjusted projection area.
  • the terminal performs special effect processing on the adjusted projection area to obtain a special effect image.
  • the mask is a binary image consisting of 0s and 1s.
  • in the embodiments of the present disclosure, the three-dimensional model is projected onto the target area of the target image by combining the position information of the two-dimensional key points of the first target object with the position information of the corresponding three-dimensional key points in the three-dimensional model, so that special effect processing can be performed on the target area. The position information of the two-dimensional key points and the position information of the three-dimensional key points of the first target object is obtained first, and the three-dimensional model of the first target object is then projected onto the target area of the target image based on that position information, which improves the realism of the first target object in the target image.
  • the terminal may replace the target area of the target image with the projection area to obtain the final effect image.
  • the terminal may further fine-tune the final effect image.
  • the mask area of the first target object can be extracted from the final effect image; the mask area can be understood as the image region covered by the mask, where the mask is a binary image composed of 0s and 1s.
  • Image masks are defined by specifying data values, data ranges, finite or infinite values, regions of interest, and annotation files, or any combination of the above options can be applied as input to build the mask.
  • the mask area is matched with the position of the first target object in the final effect image, so that the 3D model has better robustness when the first target object is occluded.
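  • A minimal NumPy sketch of this compositing step: the target area is replaced by the projection area only where the mask indicates the visible nail, so occluding pixels keep their original values; all arrays are hypothetical placeholders of the same size.
```python
import numpy as np

target_image = np.zeros((480, 640, 3), dtype=np.uint8)      # original frame
projection_area = np.zeros((480, 640, 3), dtype=np.uint8)   # frame with the rendered 3D nails
nail_mask = np.zeros((480, 640), dtype=np.uint8)            # 1 only where the nail is visible

# Keep the projected (beautified) pixels where the nail is visible and the
# original pixels everywhere else, so occluding fingers stay untouched.
final_effect = np.where(nail_mask[:, :, None].astype(bool), projection_area, target_image)
```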
  • the nail makeup application scheme may involve a hand segmentation model, a nail segmentation model and a nail key point regression model.
  • the hand region is obtained by semantically segmenting the target image containing fingernails using the hand segmentation model.
  • the nail region is obtained by semantically segmenting the hand region using the nail segmentation model.
  • the key point coordinates of each nail in the nail area are extracted using the nail key point regression model. The PNP algorithm and the blender tool are then used to apply makeup to the nails, and the nails in the target image are replaced or covered to obtain the final nail image.
  • a schematic diagram of the process of projecting a three-dimensional model to a target image in a nail makeup scheme is shown.
  • according to the key point coordinates of the nails in the target image and the corresponding key point coordinates in the three-dimensional model, the P3P algorithm is used to generate the camera's extrinsic parameter matrix.
  • the three-dimensional model is projected onto the nail area of the target image by means of the blender tool.
  • referring to FIGS. 3a to 3j, schematic diagrams of a nail makeup application scheme are shown. FIG. 3a shows a target image, and the target image includes the nails.
  • Figure 3b shows the hand region.
  • Fig. 3c shows a schematic diagram of the segmentation result of semantic segmentation of the hand region using the nail segmentation model
  • Fig. 3d shows the synthetic graph of the segmentation result of the semantic segmentation of the hand region using the nail segmentation model.
  • Figure 3e shows the nail area.
  • Figure 3f shows a schematic diagram of the key points of the nail, and the key points are "0", "1", "2" and "3" respectively.
  • Figure 3g shows a top view of the three-dimensional model.
  • Figure 3h shows the rendering of the 3D model projected onto the nail area.
  • Figures 3i and 3j show two makeup effect maps.
  • FIG. 4 is a block diagram of an image generating apparatus.
  • the image generating apparatus may be applied to a terminal or a server, and the image generating apparatus may specifically include the following modules.
  • the acquiring module 41 is configured to acquire a target image, where the target image includes a first target object.
  • the position information obtaining module 42 is configured to obtain the position information of the two-dimensional key points of the first target object from the target image.
  • the image generation module 43 is configured to project the three-dimensional model of the first target object to the target area based on the position information of the two-dimensional key points and the position information of the three-dimensional key points of the first target object, and generate a special effect image, where the target area is the region where the first target object is located in the target image, and the three-dimensional key points are key points corresponding to the two-dimensional key points in the three-dimensional model of the first target object.
  • the image generation module 43 includes:
  • the extrinsic parameter matrix determination unit is configured to determine the extrinsic parameter matrix of the camera based on the position information of the two-dimensional key points and the position information of the three-dimensional key points, and the camera is the camera that shoots the target image.
  • the model projection unit is configured to project the three-dimensional model to the target area based on the extrinsic parameter matrix of the camera and the intrinsic parameter matrix of the camera to generate a special effect image.
  • the image generation module 43 is configured to project the three-dimensional model to the target area based on the position information of the two-dimensional key points and the position information of the three-dimensional key points of the first target object to obtain the projection area. Perform special effect processing on the projection area to obtain a special effect image.
  • the apparatus further includes a fine-tuning module configured to obtain a mask area of the first target object from the target image, the mask area being used to indicate the position of the first target object in the target image. Based on the mask area, the projection position of the three-dimensional model in the projection area is adjusted to obtain the adjusted projection area.
  • the image generation module 43 is further configured to perform special effect processing on the adjusted projection area to obtain a special effect image.
  • the location information acquisition module 42 includes:
  • a segmentation unit configured to input the target image into the segmentation model, to determine the target area.
  • the regression unit is configured to input the target area into the key point regression model to determine the position information of the two-dimensional key points.
  • the segmentation model includes a first segmentation sub-model and a second segmentation sub-model. The segmentation unit is configured to input the target image into the first segmentation sub-model and perform semantic segmentation on the target image through the first segmentation sub-model to obtain a candidate area, where the candidate area includes a second target object and the first target object belongs to the second target object.
  • the candidate region is input into the second segmentation sub-model, and the candidate region is semantically segmented through the second segmentation sub-model to obtain the target region.
  • the segmentation unit is configured to input the target image into the segmentation model to obtain a first initial region, the first initial region including the first target object.
  • a time series smoothing process is performed on the first initial region to obtain a target region.
  • the regression unit is configured to input the target area into a keypoint regression model to obtain position information of the initial two-dimensional keypoints of the first target object.
  • the optical flow stabilization process is performed on the position information of the initial two-dimensional key points, and the position information of the two-dimensional key points is obtained.
  • FIG. 5 is a block diagram of an image generation electronic device.
  • electronic device 500 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, personal digital assistant, and the like.
  • an electronic device 500 may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514 , and the communication component 516 .
  • the processing component 502 generally controls the overall operation of the electronic device 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processing component 502 may include one or more processors 520 to execute instructions to perform all or some of the steps of the image generation method described above. Additionally, processing component 502 may include one or more modules to facilitate interaction between processing component 502 and other components. For example, processing component 502 may include a multimedia module to facilitate interaction between multimedia component 508 and processing component 502.
  • Memory 504 is configured to store various types of data to support operation at electronic device 500 . Examples of such data include instructions for any application or method operating on electronic device 500, contact data, phonebook data, messages, images, videos, and the like. Memory 504 may be implemented by any type of volatile or non-volatile storage device or combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), Magnetic Memory, Flash Memory, Magnetic or Optical Disk.
  • Power supply assembly 506 provides power to various components of electronic device 500 .
  • Power supply components 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 500 .
  • Multimedia component 508 includes a screen that provides an output interface between the electronic device 500 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel. The touch sensor may not only sense the boundaries of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe action.
  • the multimedia component 508 includes a front-facing camera and/or a rear-facing camera. When the electronic device 500 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras can be a fixed optical lens system or have focal length and optical zoom capability.
  • Audio component 510 is configured to output and/or input audio signals.
  • audio component 510 includes a microphone (MIC) that is configured to receive external audio signals when electronic device 500 is in operating modes, such as calling mode, recording mode, and voice recognition mode.
  • the received audio signal may be further stored in memory 504 or transmitted via communication component 516 .
  • the audio component 510 also includes a speaker for outputting audio signals.
  • the I/O interface 512 provides an interface between the processing component 502 and a peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. These buttons may include, but are not limited to: a home button, volume buttons, a start button, and a lock button.
  • Sensor assembly 514 includes one or more sensors for providing status assessments of various aspects of electronic device 500 .
  • the sensor assembly 514 can detect the open/closed state of the electronic device 500 and the relative positioning of components, for example the display and the keypad of the electronic device 500; the sensor assembly 514 can also detect a change in the position of the electronic device 500 or of a component of the electronic device 500, the presence or absence of user contact with the electronic device 500, the orientation or acceleration/deceleration of the electronic device 500, and changes in the temperature of the electronic device 500.
  • Sensor assembly 514 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • Sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor assembly 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 516 is configured to facilitate wired or wireless communication between electronic device 500 and other devices.
  • Electronic device 500 may access wireless networks based on communication standards, such as WiFi, carrier networks (e.g., 2G, 3G, 4G, or 5G), or a combination thereof.
  • the communication component 516 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 516 also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • electronic device 500 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the image generation method described above.
  • a storage medium including instructions such as a memory 504 including instructions, is also provided, and the instructions can be executed by the processor 520 of the electronic device 500 to complete the image generation method described above.
  • the storage medium may be a non-transitory computer-readable storage medium, for example, the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, and optical disk data storage devices, etc.
  • a computer program product includes readable program code executable by the processor 520 of the electronic device 500 to complete the image generation method described above.
  • the program code may be stored in a storage medium of the electronic device 500, and the storage medium may be a non-transitory computer-readable storage medium; for example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, magnetic tapes, floppy disks, optical data storage devices, and the like.
  • FIG. 6 is a block diagram of an electronic device for processing image effects.
  • the electronic device 600 may be provided as a server.
  • electronic device 600 includes a processing component 622, which further includes one or more processors, and a memory resource, represented by memory 632, for storing instructions executable by processing component 622, such as applications.
  • An application program stored in memory 632 may include one or more modules, each corresponding to a set of instructions.
  • the processing component 622 is configured to execute instructions to perform the image generation method described above.
  • the electronic device 600 may also include a power supply assembly 626 configured for power management of the electronic device 600, a wired or wireless network interface 650 configured to connect the electronic device 600 to a network, and an input output (I/O) interface 658.
  • Electronic device 600 may operate based on an operating system stored in memory 632, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an image generation method, apparatus, electronic device, and storage medium. The method includes: acquiring a target image, the target image containing a first target object; acquiring position information of two-dimensional key points of the first target object from the target image; and projecting the three-dimensional model of the first target object to a target area based on the position information of the two-dimensional key points and position information of three-dimensional key points of the first target object, to generate a special effect image, where the target area is the area where the first target object is located in the target image, and the three-dimensional key points are key points in the three-dimensional model of the first target object that correspond to the two-dimensional key points.

Description

图像生成方法和电子设备
本公开要求于2020年10月29日提交、申请号为202011199693.3、发明名称为“图像特效的处理方法、装置、电子设备和存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本公开中。
技术领域
本公开涉及图像处理技术领域,尤其涉及一种图像生成方法和电子设备。
背景技术
虚拟美甲是短视频应用程序或者相机应用程序的一项新功能,虚拟美甲指对图像中的指甲进行美化处理。相关技术中,目前的虚拟美甲方案通常对二维图像中的二维指甲进行美化处理。
发明内容
本公开提供了一种图像生成方法和电子设备,本公开的技术方案如下:
一方面,提供了一种图像生成方法,包括:
获取目标图像,所述目标图像包含第一目标对象;
从所述目标图像中获取所述第一目标对象的二维关键点的位置信息;
基于所述二维关键点的位置信息,以及所述第一目标对象的三维关键点的位置信息,将所述三维模型投影至目标区域,生成特效图像,所述目标区域为所述第一目标对象在所述目标图像中所在的区域,所述三维关键点为所述第一目标对象的三维模型中与所述二维关键点对应的关键点。
一方面,提供了一种图像生成装置,包括:
获取模块,被配置为获取目标图像,所述目标图像包含第一目标对象;
位置信息获取模块,被配置为从所述目标图像中获取所述第一目标对象的二维关键点的位置信息;
图像生成模块,被配置为基于所述二维关键点的位置信息,以及所述第一目标对象的三维关键点的位置信息,将所述三维模型投影至目标区域,生成特效图像,所述目标区域为所述第一目标对象在所述目标图像中所在的区域,所述三维关键点为所述第一目标对象的三维模型中与所述二维关键点对应的关键点。
一方面,提供了一种电子设备,包括:处理器;用于存储所述处理器可执行指令的存储器;其中,所述处理器被配置为所述指令,以执行下述步骤:
获取目标图像,所述目标图像包含第一目标对象;
从所述目标图像中获取所述第一目标对象的二维关键点的位置信息;
基于所述二维关键点的位置信息,以及所述第一目标对象的三维关键点的位置信息,将所述三维模型投影至目标区域,生成特效图像,所述目标区域为所述第一目标对象在所述目标图像中所在的区域,所述三维关键点为所述第一目标对象的三维模型中与所述二维关键点对应的关键点。
一方面,提供了一种存储介质,当所述存储介质中的指令由电子设备的处理器执行时,使得所述电子设备能够执行下述步骤:
获取目标图像,所述目标图像包含第一目标对象;
从所述目标图像中获取所述第一目标对象的二维关键点的位置信息;
基于所述二维关键点的位置信息,以及所述第一目标对象的三维关键点的位置信息,将所述三维模型投影至目标区域,生成特效图像,所述目标区域为所述第一目标对象在所述目标图像中所在的区域,所述三维关键点为所述第一目标对象的三维模型中与所述二维关键点对应的关键点。
一方面,提供了一种计算机程序产品,包括可读性程序代码,所述可读性程序代码可由电子设备的处理器执行下述步骤:
获取目标图像,所述目标图像包含第一目标对象;
从所述目标图像中获取所述第一目标对象的二维关键点的位置信息;
基于所述二维关键点的位置信息,以及所述第一目标对象的三维关键点的位置信息,将所述三维模型投影至目标区域,生成特效图像,所述目标区域为所述第一目标对象在所述目标图像中所在的区域,所述三维关键点为所述第一目标对象的三维模型中与所述二维关键点对应的关键点。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本公开的实施例,并与说明书一起用于解释本公开的原理,并不构成对本公开的不当限定。
图1是一种图像生成方法的流程图。
图2a是一种指甲的上妆方案的示意图。
图2b是一种指甲的上妆方案中将三维模型投影到目标图像的流程示意图。
图3a是一种指甲的上妆方案中的目标图像。
图3b是一种指甲的上妆方案中的手部区域。
图3c是一种指甲的上妆方案中的利用指甲分割模型对手部区域进行语义分割的分割结果示意图。
图3d是一种指甲的上妆方案中的利用指甲分割模型对手部区域进行语义分割的分割结果合成图。
图3e是一种指甲的上妆方案中的指甲区域。
图3f是一种指甲的上妆方案中的指甲的关键点示意图。
图3g是一种指甲的上妆方案中的三维模型的俯视图。
图3h是一种指甲的上妆方案中的三维模型投影到指甲区域上的效果图。
图3i和图3j是一种指甲的上妆方案中的两种上妆效果图。
图4是一种图像生成装置的框图。
图5是一种图像生成电子设备的框图。
图6是一种用于对图像特效进行处理的电子设备的框图。
具体实施方式
图1是一种图像生成方法的流程图,如图1所示,该图像生成方法可以应用于终端或者服务器,在下述说明过程中,以该图像生成方法应用于终端为例。该图像生成方法包括以下步骤。
S11、获取目标图像,目标图像包含第一目标对象。
也即是获取包含第一目标对象的目标图像,在一些实施例中,目标图像也被称为待处理图像,目标图像可以理解为二维目标图像,也即是目标图像为一个平面图像。
在一些实施例中,第一目标对象为指甲,指甲包含手指甲和脚趾甲。除此之外,第一目标对象还可以为眼球、睫毛或者嘴唇等等,本公开实施例对此不做限定,在下述说明过程中,以第一目标对象为指甲为例进行说明。在一些实施例中,目标图像中可以包含一个或多个第一目标对象。
在一些实施例中,在本公开实施例提供的图像生成方法应用在直播场景的情况下,目标图像也即是直播图像,直播图像包括主播,第一目标对象也即是主播的指甲。在直播过程中,终端通过拍摄设备,比如终端自带的摄像头或者外接摄像头来获取直播图像,该终端也即是主播直播时使用的终端。另外,若本公开实施例提供的图像生成方法应用于服务器,那么终端通过拍摄设备获取直播图像,将直播图像发送给服务器,服务器获取直播图像。
在一些实施例中,在本公开实施例提供的图像生成方法应用在短视频的情况下,目标图像也即是短视频的视频帧,视频帧包括人物,第一目标对象也即是该人物的指甲。在短视频的拍摄过程中,终端能够通过拍摄设备,比如终端自带的摄像头或者外接摄像头来拍摄短视频。另外,若本公开实施例提供的图像生成方法应用于服务器,那么终端通过拍摄设备拍摄短视频,将拍摄完成的短视频发送给服务器,服务器获取短视频,从该短视频中获取视频帧。
S12、从目标图像中获取第一目标对象的二维关键点的位置信息。
在一些实施例中,终端先从目标图像中分割出第一目标对象,然后对第一目标对象进行处理,以获取二维关键点的位置信息。在一些实施例中,二维关键点为目标图像中,第一目标对象的边界点或者角点等,本公开实施例对此不做限定。
在一些实施例中,终端从目标图像中获取第一目标对象的二维关键点的位置信息时,将目标图像输入分割模型,确定目标区域,该目标区域也即是包含第一目标对象的区域。在一些实施例中,目标区域为包含第一目标对象的最小外包矩形的区域,也即是,目标区域为一个矩形区域,该矩形区域与第一目标对象的边界相切。终端将目标区域输入关键点 回归模型,确定二维关键点的位置信息。
在一些实施例中,分割模型包括第一分割子模型和第二分割子模型,第一分割子模型和第二分割子模型均为预先训练完毕的模型,在一些实施例中,第一分割子模型用于从目标图像中分割出候选区域,第二分割模型用于从候选区域中分割出目标区域,也即是,在从目标图像确定目标区域时,采用的是多级分割的方法。
在一些实施例中,终端将目标图像输入第一分割子模型,通过第一分割子模型对目标图像进行语义分割,得到候选区域,候选区域包括第二目标对象,第一目标对象属于第二目标对象。终端将候选区域输入第二分割子模型,通过第而分割子模型对候选区域进行语义分割,得到目标区域。
举例来说,终端将目标图像输入第一分割子模型,通过第一分割子模型对目标图像进行语义分割,得到第一区域分割掩码,在一些实施例中,第一区域分割掩码为一张第一掩码图像,该第一掩码图像的尺寸与目标图像相同,该第一掩码图像为一个二值图像,像素点的像素值为0或者1,第一掩码图像的像素点与目标图像的像素点一一对应。终端根据第一区域分割掩码从目标图像中分割得到候选区域,也即是终端将目标图像与第一掩码图像相乘,得到候选区域,在相乘过程中,若第一掩码图像的一个像素点的像素值为1,那么目标图像对应像素点相乘后也就能够保留原本的像素值;若第一掩码图像的一个像素点的像素值为0,那么目标图像对应像素点相乘后也就为0,最终保留原本像素值的区域,也即是候选区域。终端将候选区域输入第二分割子模型,通过第二分割子模型对候选区域进行语义分割,得到第二区域分割掩码。终端根据第二区域分割掩码从候选区域中分割得到目标区域。终端将目标区域输入关键点回归模型,通过关键点回归模型对目标区域进行特征提取,得到目标区域的区域特征,基于目标区域的区域特征,获取第一目标对象的二维关键点的位置信息。
其中,第二目标对象包括第一目标对象,若第一目标对象为指甲,那么第二目标对象为手或者脚,相应的,候选区域也即是包括手或者脚的区域,在一些实施例中,终端通过第一分割子模型对目标图像进行语义分割,得到手部区域,也即是候选区域。终端通过第二分割子模型对手部区域进行语义分割,得到指甲区域,也即是目标区域。终端通过关键点回归模型,提取指甲区域中的每个指甲的二维关键点的位置信息。
通过第一分割子模型和第二分割子模型将目标图像逐级语义分割为目标区域,再通过关键点回归模型从目标区域中提取出二维关键点的位置信息。将目标图像逐级分割为目标区域,提升了第一目标对象的二维关键点提取的准确率。
在一些实施例中,为了使得上述分割模型和关键点回归模型的输出项更加稳定,终端将目标图像输入分割模型,得到第一初始区域,第一初始区域包括第一目标对象。终端对第一初始区域进行时序平滑处理,得到目标区域。终端将目标区域输入关键点回归模型,确定第一目标对象的初始二维关键点的位置信息,对初始二维关键点的位置信息进行光流稳定处理得到二维关键点的位置信息。
举例来说,终端通过第一分割子模型对目标图像进行语义分割,得到候选区域对应的第二初始区域,终端对第二初始区域进行时序平滑处理,得到候选区域。终端通过第二分割子模型对候选区域进行语义分割,得到目标区域对应的第一初始区域,终端对第一初始 区域进行时序平滑处理得到目标区域。终端在获取二维关键点的位置信息时,根据关键点回归模型和光流算法对目标区域进行特征提取,得到二维关键点的位置信息。在一些实施例中,光流算法可以采用Lucas-Kanade光流算法(一种两帧差分的光流估计算法)。通过时序平滑处理可以对分割模型的输出项,也即是对目标区域的稳定性进行增强。通过光流算法可以对关键点回归模型的输出项,也即是对二维关键点的位置信息的稳定性进行增强。
S13、基于二维关键点的位置信息,以及第一目标对象的三维关键点的位置信息,将第一目标对象的三维模型投影至目标区域,生成特效图像,目标区域为第一目标对象在目标图像中所在的区域,三维关键点为三维模型中与二维关键点对应的关键点。
在一些实施例中,三维模型为已经训练完毕的模型,本公开的实施例对三维模型的训练过程不做限定。在一些实施例中,若第一目标对象为指甲,那么三维模型也即是指甲的三维模型。在一些实施例中,终端通过三维重建的方法来得到三维模型。
在一些实施例中,二维关键点的位置信息在三维模型中的三维关键点的位置信息可以为预先设定的关键点的位置信息。其中,一个二维关键点在三维模型中对应一个三维关键点。结合二维关键点的位置信息和三维关键点的位置信息,终端基于二维关键点在目标图像中的位置与三维关键点在三维模型中的位置的对应关系,能够将三维模型投影到第一目标对象在目标图像的目标区域上。
在一些实施例中,终端在将三维模型投影到第一目标对象在目标图像的目标区域上时,可以利用N点透视(Perspective NPoint,PNP)算法,基于二维关键点的位置信息和三维关键点的位置信息,确定相机的外参矩阵,相机为拍摄目标图像的相机。
其中,PNP算法的目的是求解三维-二维点对运动的方法。简单来说,就是在已知n个三维空间点坐标(相对于某个指定的坐标系A)及其二维投影位置的情况下,如何估计相机的位姿(即相机在坐标系A下的位置和姿态)。相机的外参矩阵用于描述相机在静态场景下的运动,或者在相机固定时,运动物体的刚性运动。在一些实施例中,相机的外参矩阵包含旋转矩阵和平移矩阵,其中,旋转矩阵描述了世界坐标系的坐标轴相对于相机坐标轴的方向,平移矩阵描述了相机坐标系下空间原点的位置。在一些实施例中,由于指甲的关键点的数量较少,因此终端采用P3P算法(即N=3)来生成相机的外参矩阵。终端基于相机的外参矩阵和相机的内参矩阵,将三维模型投影到第一目标对象在目标图像的目标区域上,得到特效图像。
其中,相机的内参矩阵由相机的硬件结构决定,包含相机的焦距、主点偏移等。相机的主轴是与图像平面垂直且穿过真空的线,主轴与图像平面的焦点称为主点。主点偏移就是主点位置相对于图像平面的位置。在一些实施例中,终端可以借助blender(是一款开源的跨平台全能三维动画制作软件,提供从建模、动画、材质、渲染、到音频处理、视频剪辑等一系列动画短片制作解决方案)工具将三维模型投影到第一目标对象在目标图像的目标区域,得到特效图像。终端根据相机的内参矩阵和计算得到的相机的外参矩阵将三维模型投影到第一目标对象在目标图像的目标区域上,可以保证三维模型投影到目标区域的准确程度。
在一些实施例中,终端基于二维关键点的位置信息,以及第一目标对象的三维关键点 的位置信息,将三维模型投影至目标区域,得到投影区域。终端对投影区域进行特效处理,得到特效图像。
在一些实施例中,投影区域包含第一目标对象,而且,第一目标对象在投影区域中以三维的形式存在,也即是以第一目标对象的三维模型的形式存在。对投影区域进行特效处理可以理解为对三维形式存在的第一目标对象进行美化处理,也即是对第一目标对象的三维模型进行美化处理,比如改变三维模型的渲染参数等,特效图像即包含美化后的三维的第一目标对象的图像。在一些实施例中,美化处理可以包含更换第一目标对象颜色、更换第一目标对象图案等等,也即是对三维模型的渲染参数中的渲染颜色,渲染纹理进行调整,本公开的实施例对美化处理的内容和采用的技术手段等不做具体限制。
在一些实施例中,在上述实施例的基础上,终端能够对投影区域进行微调,以保证三维模型与目标区域的匹配效果。终端从目标图像中获取第一目标对象的掩膜区域,掩膜区域用于指示第一目标对象在目标图像中的位置。终端基于掩膜区域,对投影区域中三维模型的投影位置进行调整,得到调整后的投影区域。终端对调整后的投影区域进行特效处理,得到特效图像。,其中,掩膜是由0和1组成的一个二进制图像。当在某一功能中应用掩模时,1值区域被处理,被屏蔽的0值区域不被包括在计算中。通过指定的数据值、数据范围、有限或无限值、感兴趣区和注释文件来定义图像掩模,也可以应用上述选项的任意组合作为输入来建立掩模。
本公开实施例结合第一目标对象的二维关键点的位置信息,以及,三维模型中对应的三维关键点的位置信息,将三维模型投影到目标图像的目标区域上,进而对目标区域进行特效处理。首先通过获取第一目标对象的二维关键点的位置信息和三维关键点的位置信息,然后基于二维关键点的位置信息和三维关键点的位置信息将第一目标对象的三维模型投影到目标图像的目标区域上,提高了第一目标对象在目标图像中的真实度。
在一些实施例中,终端可以将目标图像的目标区域替换为投影区域,得到最终效果图像。可选地,终端可以进一步对最终效果图像进行微调。在实际应用中,可以在最终效果图像中提取出第一目标对象的掩膜区域,掩膜区域可以理解为包含掩膜所在的图像,其中,掩膜是由0和1组成的一个二进制图像。当在某一功能中应用掩模时,1值区域被处理,被屏蔽的0值区域不被包括在计算中。通过指定的数据值、数据范围、有限或无限值、感兴趣区和注释文件来定义图像掩模,也可以应用上述选项的任意组合作为输入来建立掩模。将掩膜区域与第一目标对象在最终效果图像中的位置相匹配,使得三维模型遮挡第一目标对象的情况具有更好的鲁棒性。
基于上述关于一种图像生成方法的实施例的相关说明,下面介绍一种指甲的上妆方案,如图2a所示,该指甲的上妆方案可以涉及到手部分割模型、指甲分割模型和指甲关键点回归模型。利用手部分割模型对包含指甲的目标图像进行人手语义分割得到手部区域。利用指甲分割模型对手部区域进行指甲语义分割得到指甲区域。利用指甲关键点回归模型提取出指甲区域中每个指甲的关键点坐标。再运用PNP算法和blender工具等对指甲进行上妆,将上妆后的指甲替换或者覆盖目标图像中的指甲得到最终的美甲图像。
如图2b所示,示出了一种指甲的上妆方案中将三维模型投影到目标图像的流程示意图。根据目标图像中指甲的关键点坐标和三维模型中对应的关键点坐标,并利用P3P算法 生成相机的外参矩阵。再根据相机的外参矩阵、相机的内参矩阵和三维模型,借助于blender工具将三维模型投影到指甲在目标图像的指甲区域上。
参照图3a至图3j示出了一种指甲的上妆方案中各示意图,图3a示出了目标图像,该目标图像中包含了指甲。图3b示出了手部区域。图3c示出了利用指甲分割模型对手部区域进行语义分割的分割结果示意图,图3d示出了利用指甲分割模型对手部区域进行语义分割的分割结果合成图。图3e示出了指甲区域。图3f示出了指甲的关键点示意图,关键点分别为“0”、“1”、“2”和“3”。图3g示出了三维模型的俯视图。图3h示出了三维模型投影到指甲区域上的效果图。图3i和图3j示出了两种上妆效果图。
图4是一种图像生成装置的框图。该图像生成装置可以应用于终端或者服务器中,该图像生成装置具体可以包括如下模块。
获取模块41,被配置为获取目标图像,目标图像包含第一目标对象。
位置信息获取模块42,被配置为从目标图像中获取第一目标对象的二维关键点的位置信息。
图像生成模块43,被配置为基于二维关键点的位置信息,以及第一目标对象的三维关键点的位置信息,将第一目标对象的三维模型投影至目标区域,生成特效图像,目标区域为第一目标对象在目标图像中所在的区域,三维关键点为第一目标对象的三维模型中与二维关键点对应的关键点。
在一些实施例中,图像生成模块43,包括:
外参矩阵确定单元,被配置为基于二维关键点的位置信息和三维关键点的位置信息,确定相机的外参矩阵,相机为拍摄目标图像的相机。
模型投影单元,被配置为基于相机的外参矩阵和相机的内参矩阵,将三维模型投影至目标区域,生成特效图像。
在一些实施例中,图像生成模块43,被配置为基于二维关键点的位置信息,以及第一目标对象的三维关键点的位置信息,将三维模型投影至目标区域,得到投影区域。对投影区域进行特效处理,得到特效图像。
在一些实施例中,装置还包括微调模块,被配置为从目标图像中获取第一目标对象的掩膜区域,掩膜区域用于指示第一目标对象在目标图像中的位置。基于掩膜区域,对投影区域中三维模型的投影位置进行调整,得到调整后的投影区域。
图像生成模块43,还被配置为对调整后的投影区域进行特效处理,得到特效图像。
在一些实施例中,位置信息获取模块42,包括:
分割单元,被配置为将目标图像输入分割模型,确定目标区域。
回归单元,被配置为将目标区域输入关键点回归模型,确定二维关键点的位置信息。
在一些实施例中,分割模型包括第一分割子模型和第二分割子模型,分割单元,被配置为将目标图像输入第一分割子模型,通过第一分割子模型对目标图像进行语义分割,得到候选区域,候选区域包括第二目标对象,第一目标对象属于第二目标对象。将候选区域输入第二分割子模型,通过第而分割子模型对候选区域进行语义分割,得到目标区域。
在一些实施例中,分割单元,被配置为将目标图像输入分割模型,得到第一初始区域, 第一初始区域包括第一目标对象。对第一初始区域进行时序平滑处理,得到目标区域。
在一些实施例中,回归单元,被配置为将目标区域输入关键点回归模型,得到第一目标对象的初始二维关键点的位置信息。对初始二维关键点的位置信息进行光流稳定处理,得到二维关键点的位置信息。
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。
图5是一种图像生成电子设备的框图。例如,电子设备500可以是移动电话,计算机,数字广播终端,消息收发设备,游戏控制台,平板设备,医疗设备,健身设备,个人数字助理等。
参照图5,电子设备500可以包括以下一个或多个组件:处理组件502,存储器504,电力组件506,多媒体组件508,音频组件510,输入/输出(I/O)的接口512,传感器组件514,以及通信组件516。
处理组件502通常控制电子设备500的整体操作,诸如与显示,电话呼叫,数据通信,相机操作和记录操作相关联的操作。处理组件502可以包括一个或多个处理器520来执行指令,以完成上述图像生成方法的全部或部分步骤。此外,处理组件502可以包括一个或多个模块,便于处理组件502和其他组件之间的交互。例如,处理组件502可以包括多媒体模块,以方便多媒体组件508和处理组件502之间的交互。
存储器504被配置为存储各种类型的数据以支持在电子设备500的操作。这些数据的示例包括用于在电子设备500上操作的任何应用程序或方法的指令,联系人数据,电话簿数据,消息,图像,视频等。存储器504可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。
电源组件506为电子设备500的各种组件提供电力。电源组件506可以包括电源管理系统,一个或多个电源,及其他与为电子设备500生成、管理和分配电力相关联的组件。
多媒体组件508包括在所述电子设备500和用户之间的提供一个输出接口的屏幕。在一些实施例中,屏幕可以包括液晶显示器(LCD)和触摸面板(TP)。如果屏幕包括触摸面板,屏幕可以被实现为触摸屏,以接收来自用户的输入信号。触摸面板包括一个或多个触摸传感器以感测触摸、滑动和触摸面板上的手势。所述触摸传感器可以不仅感测触摸或滑动动作的边界,而且还检测与所述触摸或滑动操作相关的持续时间和压力。在一些实施例中,多媒体组件508包括一个前置摄像头和/或后置摄像头。当电子设备500处于操作模式,如拍摄模式或视频模式时,前置摄像头和/或后置摄像头可以接收外部的多媒体数据。每个前置摄像头和后置摄像头可以是一个固定的光学透镜系统或具有焦距和光学变焦能力。
音频组件510被配置为输出和/或输入音频信号。例如,音频组件510包括一个麦克风(MIC),当电子设备500处于操作模式,如呼叫模式、记录模式和语音识别模式时,麦克风被配置为接收外部音频信号。所接收的音频信号可以被进一步存储在存储器504 或经由通信组件516发送。在一些实施例中,音频组件510还包括一个扬声器,用于输出音频信号。
I/O接口412为处理组件502和外围接口模块之间提供接口,上述外围接口模块可以是键盘,点击轮,按钮等。这些按钮可包括但不限于:主页按钮、音量按钮、启动按钮和锁定按钮。
传感器组件514包括一个或多个传感器,用于为电子设备500提供各个方面的状态评估。例如,传感器组件514可以检测到电子设备500的打开/关闭状态,组件的相对定位,例如所述组件为电子设备500的显示器和小键盘,传感器组件514还可以检测电子设备500或电子设备500一个组件的位置改变,用户与电子设备500接触的存在或不存在,电子设备500方位或加速/减速和电子设备500的温度变化。传感器组件514可以包括接近传感器,被配置用来在没有任何的物理接触时检测附近物体的存在。传感器组件514还可以包括光传感器,如CMOS或CCD图像传感器,用于在成像应用中使用。在一些实施例中,该传感器组件514还可以包括加速度传感器,陀螺仪传感器,磁传感器,压力传感器或温度传感器。
通信组件516被配置为便于电子设备500和其他设备之间有线或无线方式的通信。电子设备500可以接入基于通信标准的无线网络,如WiFi,运营商网络(如2G、3G、4G或5G),或它们的组合。在一个示例性实施例中,通信组件516经由广播信道接收来自外部广播管理系统的广播信号或广播相关信息。在一个示例性实施例中,所述通信组件516还包括近场通信(NFC)模块,以促进短程通信。例如,在NFC模块可基于射频识别(RFID)技术,红外数据协会(IrDA)技术,超宽带(UWB)技术,蓝牙(BT)技术和其他技术来实现。
在示例性实施例中,电子设备500可以被一个或多个应用专用集成电路(ASIC)、数字信号处理器(DSP)、数字信号处理设备(DSPD)、可编程逻辑器件(PLD)、现场可编程门阵列(FPGA)、控制器、微控制器、微处理器或其他电子元件实现,用于执行上述图像生成方法。
在示例性实施例中,还提供了一种包括指令的存储介质,例如包括指令的存储器504,上述指令可由电子设备500的处理器520执行以完成上述图像生成方法。在一些实施例中,存储介质可以是非临时性计算机可读存储介质,例如,所述非临时性计算机可读存储介质可以是ROM、随机存取存储器(RAM)、CD-ROM、磁带、软盘和光数据存储设备等。
在示例性实施例中,还提供了一种计算机程序产品,该计算机程序产品包括可读性程序代码,该可读性程序代码可由电子设备500的处理器520执行以完成上述图像生成方法。在一些实施例中,该程序代码可以存储在电子设备500的存储介质中,该存储介质可以是非临时性计算机可读存储介质,例如,所述非临时性计算机可读存储介质可以是ROM、随机存取存储器(RAM)、CD-ROM、磁带、软盘和光数据存储设备等。
图6是一种用于对图像特效进行处理的电子设备的框图。例如,电子设备600可以被提供为一服务器。参照图6,电子设备600包括处理组件622,其进一步包括一个或多个处理器,以及由存储器632所代表的存储器资源,用于存储可由处理组件622的执行的指令,例如应用程序。存储器632中存储的应用程序可以包括一个或一个以上的每一个对应 于一组指令的模块。此外,处理组件622被配置为指令,以执行上述图像生成方法。
电子设备600还可以包括一个电源组件626被配置为电子设备600的电源管理,一个有线或无线网络接口650被配置为将电子设备600连接到网络,和一个输入输出(I/O)接口658。电子设备600可以操作基于存储在存储器632的操作系统,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM或类似。

Claims (26)

  1. 一种图像生成方法,包括:
    获取目标图像,所述目标图像包含第一目标对象;
    从所述目标图像中获取所述第一目标对象的二维关键点的位置信息;
    基于所述二维关键点的位置信息,以及所述第一目标对象的三维关键点的位置信息,将所述第一目标对象的三维模型投影至目标区域,生成特效图像,所述目标区域为所述第一目标对象在所述目标图像中所在的区域,所述三维关键点为所述三维模型中与所述二维关键点对应的关键点。
  2. 根据权利要求1所述的方法,其中,所述基于所述二维关键点的位置信息,以及所述第一目标对象的三维关键点的位置信息,将所述第一目标对象的三维模型投影至目标区域,生成特效图像,包括:
    基于所述二维关键点的位置信息和所述三维关键点的位置信息,确定相机的外参矩阵,所述相机为拍摄所述目标图像的相机;
    基于所述相机的外参矩阵和所述相机的内参矩阵,将所述三维模型投影至所述目标区域,生成所述特效图像。
  3. 根据权利要求1所述的方法,其中,所述基于所述二维关键点的位置信息,以及所述第一目标对象的三维关键点的位置信息,将所述第一目标对象的三维模型投影至目标区域,生成特效图像,包括:
    基于所述二维关键点的位置信息,以及所述第一目标对象的三维关键点的位置信息,将所述三维模型投影至所述目标区域,得到投影区域;
    对所述投影区域进行特效处理,得到所述特效图像。
  4. 根据权利要求3所述的方法,其中,所述方法还包括:
    从所述目标图像中获取所述第一目标对象的掩膜区域,所述掩膜区域用于指示所述第一目标对象在所述目标图像中的位置;
    基于所述掩膜区域,对所述投影区域中所述三维模型的投影位置进行调整,得到调整后的所述投影区域;
    所述对所述投影区域进行特效处理,得到所述特效图像包括:
    对所述调整后的所述投影区域进行特效处理,得到所述特效图像。
  5. 根据权利要求1所述的方法,其中,所述从所述目标图像中获取所述第一目标对象的二维关键点的位置信息,包括:
    将所述目标图像输入分割模型,确定所述目标区域;
    将所述目标区域输入关键点回归模型,确定所述二维关键点的位置信息。
  6. 根据权利要求5所述的方法,其中,所述分割模型包括第一分割子模型和第二分割 子模型,所述将所述目标图像输入分割模型,确定所述目标区域,包括:
    将所述目标图像输入所述第一分割子模型,通过所述第一分割子模型对所述目标图像进行语义分割,得到候选区域,所述候选区域包括第二目标对象,所述第一目标对象属于所述第二目标对象;
    将所述候选区域输入所述第二分割子模型,通过所述第而分割子模型对所述候选区域进行语义分割,得到所述目标区域。
  7. 根据权利要求5所述的方法,其中,所述将所述目标图像输入分割模型,确定所述目标区域,包括:
    将所述目标图像输入所述分割模型,得到第一初始区域,所述第一初始区域包括所述第一目标对象;
    对所述第一初始区域进行时序平滑处理,得到所述目标区域。
  8. 根据权利要求5所述的方法,其中,所述将所述目标区域输入关键点回归模型,确定所述二维关键点的位置信息,包括:
    将所述目标区域输入所述关键点回归模型,得到所述第一目标对象的初始二维关键点的位置信息;
    对所述初始二维关键点的位置信息进行光流稳定处理,得到所述二维关键点的位置信息。
  9. 一种图像生成装置,包括:
    获取模块,被配置为获取目标图像,所述目标图像包含第一目标对象;
    位置信息获取模块,被配置为从所述目标图像中获取所述第一目标对象的二维关键点的位置信息;
    图像生成模块,被配置为基于所述二维关键点的位置信息,以及所述第一目标对象的三维关键点的位置信息,将所述第一目标对象的三维模型投影至目标区域,生成特效图像,所述目标区域为所述第一目标对象在所述目标图像中所在的区域,所述三维关键点为所述第一目标对象的三维模型中与所述二维关键点对应的关键点。
  10. 根据权利要求9所述的装置,其中,所述图像生成模块,包括:
    外参矩阵确定单元,被配置为基于所述二维关键点的位置信息和所述三维关键点的位置信息,确定相机的外参矩阵,所述相机为拍摄所述目标图像的相机;
    模型投影单元,被配置为基于所述相机的外参矩阵和所述相机的内参矩阵,将所述三维模型投影至所述目标区域,生成所述特效图像。
  11. 根据权利要求9所述的装置,其中,所述图像生成模块,被配置为基于所述二维关键点的位置信息,以及所述第一目标对象的三维关键点的位置信息,将所述三维模型投影至所述目标区域,得到投影区域;对所述投影区域进行特效处理,得到所述特效图像。
  12. 根据权利要求9所述的装置,其中,所述装置还包括微调模块,被配置为从所述目标图像中获取所述第一目标对象的掩膜区域,所述掩膜区域用于指示所述第一目标对象在所述目标图像中的位置;基于所述掩膜区域,对所述投影区域中所述三维模型的投影位置进行调整,得到调整后的所述投影区域;
    所述图像生成模块,还被配置为对所述调整后的所述投影区域进行特效处理,得到所述特效图像。
  13. 根据权利要求9所述的装置,其中,所述位置信息获取模块,包括:
    分割单元,被配置为将所述目标图像输入分割模型,确定所述目标区域;
    回归单元,被配置为将所述目标区域输入关键点回归模型,确定所述二维关键点的位置信息。
  14. 根据权利要求13所述的装置,其中,所述分割模型包括第一分割子模型和第二分割子模型,所述分割单元,被配置为将所述目标图像输入所述第一分割子模型,通过所述第一分割子模型对所述目标图像进行语义分割,得到候选区域,所述候选区域包括第二目标对象,所述第一目标对象属于所述第二目标对象;将所述候选区域输入所述第二分割子模型,通过所述第而分割子模型对所述候选区域进行语义分割,得到所述目标区域。
  15. 根据权利要求13所述的装置,其中,所述分割单元,被配置为将所述目标图像输入所述分割模型,得到第一初始区域,所述第一初始区域包括所述第一目标对象;对所述第一初始区域进行时序平滑处理,得到所述目标区域。
  16. 根据权利要求13所述的装置,其中,所述回归单元,被配置为将所述目标区域输入所述关键点回归模型,得到所述第一目标对象的初始二维关键点的位置信息;对所述初始二维关键点的位置信息进行光流稳定处理,得到所述二维关键点的位置信息。
  17. 一种电子设备,包括:
    处理器;
    用于存储所述处理器可执行指令的存储器;
    其中,所述处理器被配置为所述指令,以执行下述步骤:
    获取目标图像,所述目标图像包含第一目标对象;
    从所述目标图像中获取所述第一目标对象的二维关键点的位置信息;
    基于所述二维关键点的位置信息,以及所述第一目标对象的三维关键点的位置信息,将所述第一目标对象的三维模型投影至目标区域,生成特效图像,所述目标区域为所述第一目标对象在所述目标图像中所在的区域,所述三维关键点为所述第一目标对象的三维模型中与所述二维关键点对应的关键点。
  18. 根据权利要求17所述的电子设备,其中,所述处理器被配置为下述步骤:
    基于所述二维关键点的位置信息和所述三维关键点的位置信息,确定相机的外参矩阵,所述相机为拍摄所述目标图像的相机;
    基于所述相机的外参矩阵和所述相机的内参矩阵,将所述三维模型投影至所述目标区域,生成所述特效图像。
  19. 根据权利要求17所述的电子设备,其中,所述处理器被配置为下述步骤:
    基于所述二维关键点的位置信息,以及所述第一目标对象的三维关键点的位置信息,将所述三维模型投影至所述目标区域,得到投影区域;
    对所述投影区域进行特效处理,得到所述特效图像。
  20. 根据权利要求19所述的电子设备,其中,所述处理器被配置为下述步骤:
    从所述目标图像中获取所述第一目标对象的掩膜区域,所述掩膜区域用于指示所述第一目标对象在所述目标图像中的位置;
    基于所述掩膜区域,对所述投影区域中所述三维模型的投影位置进行调整,得到调整后的所述投影区域;
    所述对所述投影区域进行特效处理,得到所述特效图像包括:
    对所述调整后的所述投影区域进行特效处理,得到所述特效图像。
  21. 根据权利要求17所述的电子设备,其中,所述处理器被配置为下述步骤:
    将所述目标图像输入分割模型,确定所述目标区域;
    将所述目标区域输入关键点回归模型,确定所述二维关键点的位置信息。
  22. 根据权利要求21所述的电子设备,其中,所述分割模型包括第一分割子模型和第二分割子模型,所述处理器被配置为下述步骤:
    将所述目标图像输入所述第一分割子模型,通过所述第一分割子模型对所述目标图像进行语义分割,得到候选区域,所述候选区域包括第二目标对象,所述第一目标对象属于所述第二目标对象;
    将所述候选区域输入所述第二分割子模型,通过所述第而分割子模型对所述候选区域进行语义分割,得到所述目标区域。
  23. 根据权利要求21所述的电子设备,其中,所述处理器被配置为下述步骤:
    将所述目标图像输入所述分割模型,得到第一初始区域,所述第一初始区域包括所述第一目标对象;
    对所述第一初始区域进行时序平滑处理,得到所述目标区域。
  24. 根据权利要求21所述的电子设备,其中,所述处理器被配置为下述步骤:
    将所述目标区域输入所述关键点回归模型,得到所述第一目标对象的初始二维关键点 的位置信息;
    对所述初始二维关键点的位置信息进行光流稳定处理,得到所述二维关键点的位置信息。
  25. 一种存储介质,当所述存储介质中的指令由电子设备的处理器执行时,使得所述电子设备能够执行下述步骤:
    获取目标图像,所述目标图像包含第一目标对象;
    从所述目标图像中获取所述第一目标对象的二维关键点的位置信息;
    基于所述二维关键点的位置信息,以及所述第一目标对象的三维关键点的位置信息,将所述三维模型投影至目标区域,生成特效图像,所述目标区域为所述第一目标对象在所述目标图像中所在的区域,所述三维关键点为所述第一目标对象的三维模型中与所述二维关键点对应的关键点。
  26. 一种计算机程序产品,包括可读性程序代码,所述可读性程序代码可由电子设备的处理器执行下述步骤:
    获取目标图像,所述目标图像包含第一目标对象;
    从所述目标图像中获取所述第一目标对象的二维关键点的位置信息;
    基于所述二维关键点的位置信息,以及所述第一目标对象的三维关键点的位置信息,将所述三维模型投影至目标区域,生成特效图像,所述目标区域为所述第一目标对象在所述目标图像中所在的区域,所述三维关键点为所述第一目标对象的三维模型中与所述二维关键点对应的关键点。
PCT/CN2021/105334 2020-10-29 2021-07-08 图像生成方法和电子设备 WO2022088750A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011199693.3 2020-10-29
CN202011199693.3A CN112669198A (zh) 2020-10-29 2020-10-29 图像特效的处理方法、装置、电子设备和存储介质

Publications (1)

Publication Number Publication Date
WO2022088750A1 true WO2022088750A1 (zh) 2022-05-05

Family

ID=75402841

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/105334 WO2022088750A1 (zh) 2020-10-29 2021-07-08 图像生成方法和电子设备

Country Status (2)

Country Link
CN (1) CN112669198A (zh)
WO (1) WO2022088750A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669198A (zh) * 2020-10-29 2021-04-16 北京达佳互联信息技术有限公司 图像特效的处理方法、装置、电子设备和存储介质
CN113421182B (zh) * 2021-05-20 2023-11-28 北京达佳互联信息技术有限公司 三维重建方法、装置、电子设备及存储介质
CN114359522B (zh) * 2021-12-23 2024-06-18 阿依瓦(北京)技术有限公司 Ar模型放置方法及装置
CN115358958A (zh) * 2022-08-26 2022-11-18 北京字跳网络技术有限公司 特效图生成方法、装置、设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090262989A1 (en) * 2007-07-12 2009-10-22 Tatsuo Kozakaya Image processing apparatus and method
CN109272579A (zh) * 2018-08-16 2019-01-25 Oppo广东移动通信有限公司 基于三维模型的美妆方法、装置、电子设备和存储介质
CN110675489A (zh) * 2019-09-25 2020-01-10 北京达佳互联信息技术有限公司 一种图像处理方法、装置、电子设备和存储介质
CN111047526A (zh) * 2019-11-22 2020-04-21 北京达佳互联信息技术有限公司 一种图像处理方法、装置、电子设备及存储介质
CN112669198A (zh) * 2020-10-29 2021-04-16 北京达佳互联信息技术有限公司 图像特效的处理方法、装置、电子设备和存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090262989A1 (en) * 2007-07-12 2009-10-22 Tatsuo Kozakaya Image processing apparatus and method
CN109272579A (zh) * 2018-08-16 2019-01-25 Oppo广东移动通信有限公司 基于三维模型的美妆方法、装置、电子设备和存储介质
CN110675489A (zh) * 2019-09-25 2020-01-10 北京达佳互联信息技术有限公司 一种图像处理方法、装置、电子设备和存储介质
CN111047526A (zh) * 2019-11-22 2020-04-21 北京达佳互联信息技术有限公司 一种图像处理方法、装置、电子设备及存储介质
CN112669198A (zh) * 2020-10-29 2021-04-16 北京达佳互联信息技术有限公司 图像特效的处理方法、装置、电子设备和存储介质

Also Published As

Publication number Publication date
CN112669198A (zh) 2021-04-16

Similar Documents

Publication Publication Date Title
WO2022088750A1 (zh) 图像生成方法和电子设备
US11114130B2 (en) Method and device for processing video
WO2022179026A1 (zh) 图像处理方法及装置、电子设备和存储介质
WO2020007241A1 (zh) 图像处理方法和装置、电子设备以及计算机可读存储介质
CN104639843B (zh) 图像处理方法及装置
WO2022179025A1 (zh) 图像处理方法及装置、电子设备和存储介质
CN110929651A (zh) 图像处理方法、装置、电子设备及存储介质
US11308692B2 (en) Method and device for processing image, and storage medium
WO2016011747A1 (zh) 肤色调整方法和装置
US11030733B2 (en) Method, electronic device and storage medium for processing image
CN104484858B (zh) 人物图像处理方法及装置
CN109840939B (zh) 三维重建方法、装置、电子设备及存储介质
CN107341777B (zh) 图片处理方法及装置
CN109857311A (zh) 生成人脸三维模型的方法、装置、终端及存储介质
WO2022110837A1 (zh) 图像处理方法及装置
WO2022077970A1 (zh) 特效添加方法及装置
WO2022193466A1 (zh) 图像处理方法及装置、电子设备和存储介质
US11252341B2 (en) Method and device for shooting image, and storage medium
CN113643356A (zh) 相机位姿确定、虚拟物体显示方法、装置及电子设备
CN110580688A (zh) 一种图像处理方法、装置、电子设备及存储介质
US20210118148A1 (en) Method and electronic device for changing faces of facial image
JP2022055302A (ja) 遮蔽された画像の検出方法、装置、及び媒体
CN114463212A (zh) 图像处理方法及装置、电子设备和存储介质
CN113570581A (zh) 图像处理方法及装置、电子设备和存储介质
CN113763286A (zh) 图像处理方法及装置、电子设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21884487

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21884487

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10.08.2023)

Kind code of ref document: A1