WO2022088750A1 - Image generation method and electronic device - Google Patents
Image generation method and electronic device
- Publication number
- WO2022088750A1 (PCT/CN2021/105334)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- area
- model
- dimensional
- image
- Prior art date
Classifications
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V40/107—Static hand or arm
Definitions
- the present disclosure relates to the technical field of image processing, and in particular, to an image generation method and an electronic device.
- Virtual manicure is a new feature of short-video and camera applications that beautifies the nails in an image.
- the current virtual nail art scheme usually performs beautification processing on the two-dimensional nails in the two-dimensional images.
- the present disclosure provides an image generation method and electronic device, and the technical solutions of the present disclosure are as follows:
- an image generation method comprising:
- acquiring a target image, the target image comprising a first target object;
- acquiring position information of two-dimensional key points of the first target object from the target image;
- projecting a three-dimensional model of the first target object to a target area based on the position information of the two-dimensional key points and position information of three-dimensional key points of the first target object to generate a special effect image, where the target area is the area where the first target object is located in the target image, and the three-dimensional key points are key points corresponding to the two-dimensional key points in the three-dimensional model of the first target object.
- an image generation device comprising:
- an acquisition module configured to acquire a target image, the target image comprising a first target object
- a position information acquisition module configured to acquire the position information of the two-dimensional key points of the first target object from the target image
- the image generation module is configured to project the three-dimensional model to the target area based on the position information of the two-dimensional key points and the position information of the three-dimensional key points of the first target object to generate a special effect image, where the target area is the region where the first target object is located in the target image
- the three-dimensional key point is the key point corresponding to the two-dimensional key point in the three-dimensional model of the first target object.
- an electronic device comprising: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to perform the following steps:
- acquiring a target image, the target image comprising a first target object;
- acquiring position information of two-dimensional key points of the first target object from the target image;
- projecting a three-dimensional model of the first target object to a target area based on the position information of the two-dimensional key points and position information of three-dimensional key points of the first target object to generate a special effect image, where the target area is the area where the first target object is located in the target image, and the three-dimensional key points are key points corresponding to the two-dimensional key points in the three-dimensional model of the first target object.
- a storage medium when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device can perform the following steps:
- acquiring a target image, the target image comprising a first target object;
- acquiring position information of two-dimensional key points of the first target object from the target image;
- projecting a three-dimensional model of the first target object to a target area based on the position information of the two-dimensional key points and position information of three-dimensional key points of the first target object to generate a special effect image, where the target area is the area where the first target object is located in the target image, and the three-dimensional key points are key points corresponding to the two-dimensional key points in the three-dimensional model of the first target object.
- a computer program product comprising readable program code that can be executed by a processor of an electronic device to perform the following steps:
- acquiring a target image, the target image comprising a first target object;
- acquiring position information of two-dimensional key points of the first target object from the target image;
- projecting a three-dimensional model of the first target object to a target area based on the position information of the two-dimensional key points and position information of three-dimensional key points of the first target object to generate a special effect image, where the target area is the area where the first target object is located in the target image, and the three-dimensional key points are key points corresponding to the two-dimensional key points in the three-dimensional model of the first target object.
- FIG. 1 is a flowchart of an image generation method.
- Figure 2a is a schematic diagram of a nail makeup application scheme.
- FIG. 2b is a schematic flowchart of projecting a three-dimensional model to a target image in a nail makeup application scheme.
- Figure 3a is a target image in a nail application scheme.
- Figure 3b is a hand area in a nail application scheme.
- FIG. 3c is a schematic diagram of a segmentation result of semantic segmentation of hand regions using a nail segmentation model in a nail makeup scheme.
- FIG. 3d is a composite image of segmentation results of semantic segmentation of hand regions using a nail segmentation model in a nail makeup scheme.
- Figure 3e is a nail area in a make-up scheme of a nail.
- FIG. 3f is a schematic diagram of key points of a nail in a nail makeup application scheme.
- Figure 3g is a top view of a three-dimensional model in a nail application solution.
- FIG. 3h is an effect diagram of the projection of the three-dimensional model in a nail makeup scheme onto the nail area.
- FIG. 3i and FIG. 3j are two makeup effect diagrams in a nail makeup application scheme.
- FIG. 4 is a block diagram of an image generating apparatus.
- Figure 5 is a block diagram of an image generation electronic device.
- FIG. 6 is a block diagram of an electronic device for processing image effects.
- FIG. 1 is a flowchart of an image generation method. As shown in FIG. 1 , the image generation method can be applied to a terminal or a server. In the following description process, the image generation method is applied to a terminal as an example. The image generation method includes the following steps.
- the target image is also called an image to be processed, and the target image can be understood as a two-dimensional target image, that is, the target image is a plane image.
- the first target object is a nail, including fingernails and toenails.
- the first target object may also be an eyeball, eyelashes, lips, etc., which are not limited in the embodiments of the present disclosure.
- the first target object is a nail as an example for description.
- the target image may contain one or more first target objects.
- the target image is the live broadcast image
- the live broadcast image includes the host
- the first target object is the host's nail.
- the terminal obtains live broadcast images through a shooting device, such as a camera built in the terminal or an external camera, and the terminal is also the terminal used by the host for live broadcast.
- the image generation method provided by the embodiment of the present disclosure is applied to the server, the terminal obtains the live image through the photographing device, sends the live image to the server, and the server obtains the live image.
- the target image is a video frame of the short video
- the video frame includes a character
- the first target object is the character's nail.
- the terminal can shoot the short video through a shooting device, such as a camera built in the terminal or an external camera.
- the image generation method provided by the embodiment of the present disclosure is applied to a server, the terminal shoots a short video through a shooting device, sends the shot completed short video to the server, and the server obtains the short video and obtains video frames from the short video.
- the terminal first segments the first target object from the target image, and then processes the first target object to obtain position information of two-dimensional key points.
- the two-dimensional key point is a boundary point or a corner point of the first target object in the target image, which is not limited in this embodiment of the present disclosure.
- when obtaining the position information of the two-dimensional key points of the first target object from the target image, the terminal inputs the target image into the segmentation model to determine the target area, that is, the area including the first target object.
- the target area is an area containing the smallest enclosing rectangle of the first target object, that is, the target area is a rectangular area that is tangent to the boundary of the first target object.
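The "smallest enclosing rectangle" above is simply the tight bounding box of the segmented object's pixels. A minimal sketch (function and mask names are illustrative, not from the patent; a real pipeline would typically use `cv2.boundingRect`):

```python
# Sketch: derive the target area as the smallest enclosing rectangle of the
# non-zero pixels of a binary segmentation mask. Pure-Python toy example.

def bounding_rect(mask):
    """Return (x_min, y_min, x_max, y_max) of the non-zero pixels in a binary mask."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    return min(xs), min(ys), max(xs), max(ys)

# A toy 5x5 mask with the segmented "nail" occupying a small blob.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(bounding_rect(mask))  # (1, 1, 3, 3)
```

The returned rectangle is tangent to the object's boundary on all four sides, matching the description of the target area.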
- the terminal inputs the target area into the key point regression model to determine the location information of the two-dimensional key points.
- the segmentation model includes a first segmentation sub-model and a second segmentation sub-model.
- the first segmentation sub-model and the second segmentation sub-model are both pre-trained models.
- the first segmentation sub-model is used to segment the candidate region from the target image
- the second segmentation sub-model is used to segment the target region from the candidate region; that is, a multi-level segmentation method is used when determining the target region from the target image.
- the terminal inputs the target image into the first segmentation sub-model, and performs semantic segmentation on the target image through the first segmentation sub-model to obtain a candidate region, the candidate region includes the second target object, and the first target object belongs to the second target object.
- the terminal inputs the candidate region into the second segmentation sub-model, and performs semantic segmentation on the candidate region through the second segmentation sub-model to obtain the target region.
- the terminal inputs the target image into the first segmentation sub-model, and performs semantic segmentation on the target image through the first segmentation sub-model to obtain a first region segmentation mask.
- the first region segmentation mask is a first mask image
- the size of the first mask image is the same as that of the target image
- the first mask image is a binary image
- the pixel value of each pixel point is 0 or 1
- the pixel points of the first mask image correspond one-to-one with the pixels of the target image.
- the terminal obtains the candidate region by segmenting the target image according to the first region segmentation mask, that is, the terminal multiplies the target image and the first mask image to obtain the candidate region.
- if the pixel value of a pixel point in the first mask image is 1, the corresponding pixel point of the target image retains its original value after multiplication; if the pixel value of a pixel point in the first mask image is 0, the corresponding pixel point of the target image becomes 0 after multiplication. The area where the original pixel values are retained is the candidate area.
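The element-wise multiplication step above can be sketched as follows (pure-Python toy with illustrative names; a real pipeline would use NumPy element-wise multiplication on image arrays):

```python
# Sketch of the mask-multiplication step: pixels under mask value 1 keep
# their original value, pixels under 0 are zeroed out.

def apply_mask(image, mask):
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]

image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
mask = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
print(apply_mask(image, mask))  # [[0, 20, 0], [40, 50, 60], [0, 80, 0]]
```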
- the terminal inputs the candidate region into the second segmentation sub-model, and performs semantic segmentation on the candidate region through the second segmentation sub-model to obtain a second region segmentation mask.
- the terminal obtains the target area by dividing the candidate area according to the second area dividing mask.
- the terminal inputs the target area into the key point regression model, and extracts the features of the target area through the key point regression model to obtain the regional features of the target area, and obtains the location information of the two-dimensional key points of the first target object based on the regional features of the target area.
- the second target object includes the first target object. If the first target object is a fingernail, the second target object is a hand or a foot. Correspondingly, the candidate area is an area including the hand or the foot.
- the terminal performs semantic segmentation on the target image through the first segmentation sub-model to obtain the hand region, that is, the candidate region.
- the terminal performs semantic segmentation on the hand region through the second segmentation sub-model to obtain the nail region, that is, the target region.
- the terminal extracts the position information of the two-dimensional key points of each nail in the nail area through the key point regression model.
- the target image is semantically segmented into target regions by the first segmentation sub-model and the second segmentation sub-model, and the position information of two-dimensional key points is extracted from the target region by the key point regression model.
- the target image is divided into target regions step by step, which improves the accuracy of the extraction of two-dimensional key points of the first target object.
- in order to make the outputs of the segmentation model and the key point regression model more stable, the terminal inputs the target image into the segmentation model to obtain a first initial area, where the first initial area includes the first target object.
- the terminal performs time series smoothing processing on the first initial area to obtain a target area.
- the terminal inputs the target area into the key point regression model, determines the position information of the initial two-dimensional key points of the first target object, and performs optical flow stabilization processing on the position information of the initial two-dimensional key points to obtain the position information of the two-dimensional key points.
- the terminal performs semantic segmentation on the target image through the first segmentation sub-model to obtain a second initial area corresponding to the candidate area, and the terminal performs time series smoothing processing on the second initial area to obtain the candidate area.
- the terminal performs semantic segmentation on the candidate area through the second segmentation sub-model to obtain the first initial area corresponding to the target area, and the terminal performs time series smoothing processing on the first initial area to obtain the target area.
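The patent does not name a specific time-series smoothing filter; an exponential moving average is one common choice, shown here purely as an assumption (names illustrative):

```python
# Illustrative time-series smoothing of a per-frame quantity, e.g. one
# coordinate of a region boundary. An exponential moving average (EMA) is
# used as a stand-in; the patent does not specify the filter.

def ema(values, alpha=0.5):
    """Smooth a sequence: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = []
    s = values[0]
    for x in values:
        s = alpha * x + (1 - alpha) * s
        smoothed.append(s)
    return smoothed

# Jittery per-frame x-coordinate of a region corner.
print(ema([100, 104, 96, 102], alpha=0.5))  # [100.0, 102.0, 99.0, 100.5]
```

The smoothed sequence varies less frame-to-frame, which is the stability effect the text attributes to time-series smoothing.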
- when acquiring the position information of the two-dimensional key points, the terminal performs feature extraction on the target area according to the key point regression model and the optical flow algorithm to obtain the position information of the two-dimensional key points.
- the optical flow algorithm may use the Lucas-Kanade optical flow algorithm (a two-frame difference optical flow estimation algorithm).
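A minimal 1-D illustration of the two-frame least-squares idea behind Lucas-Kanade (names and the toy signal are illustrative; real code would use `cv2.calcOpticalFlowPyrLK` on 2-D images):

```python
# 1-D Lucas-Kanade sketch: within a window, solve the least-squares
# brightness-constancy equation I_x * u = -I_t for a single displacement u.

def lk_flow_1d(frame0, frame1):
    # Spatial gradient (central difference) and temporal difference.
    ix = [(frame0[i + 1] - frame0[i - 1]) / 2 for i in range(1, len(frame0) - 1)]
    it = [frame1[i] - frame0[i] for i in range(1, len(frame0) - 1)]
    # Least-squares solution for one displacement over the whole window.
    num = -sum(g * t for g, t in zip(ix, it))
    den = sum(g * g for g in ix)
    return num / den

# A ramp signal shifted right by one sample between the two frames.
frame0 = [0, 1, 2, 3, 4, 5]
frame1 = [0, 0, 1, 2, 3, 4]
print(lk_flow_1d(frame0, frame1))  # 1.0
```

The recovered displacement of 1.0 matches the one-sample shift, which is the per-point motion estimate used to stabilize key points across frames.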
- the stability of the output of the segmentation model, that is, the target region, can be enhanced through time series smoothing.
- through optical flow stabilization, the stability of the output of the key point regression model, that is, the position information of the two-dimensional key points, can be enhanced.
- the 3D key points are the key points corresponding to the 2D key points in the 3D model.
- the three-dimensional model is a model that has been trained, and the embodiments of the present disclosure do not limit the training process of the three-dimensional model.
- the three-dimensional model is also a three-dimensional model of the nail.
- the terminal obtains a three-dimensional model by a three-dimensional reconstruction method.
- the position information of the two-dimensional key points in the three-dimensional model may be the position information of the preset key points.
- a 2D key point corresponds to a 3D key point in the 3D model.
- the terminal can project the 3D model onto the target area of the first target object in the target image based on the correspondence between the positions of the 2D key points in the target image and the positions of the 3D key points in the 3D model.
- when projecting the 3D model onto the target area of the first target object in the target image, the terminal may use a Perspective-n-Point (PnP) algorithm to determine the extrinsic parameter matrix of the camera based on the position information of the 2D key points and the position information of the 3D key points, where the camera is the camera that shoots the target image.
- the purpose of the PnP algorithm is to solve for 3D-2D point-pair motion; simply put, given the coordinates of n three-dimensional space points (relative to a specified coordinate system A) and their two-dimensional projection positions, it estimates the pose of the camera, that is, the camera's position and attitude in coordinate system A.
- the extrinsic parameter matrix of the camera is used to describe the motion of the camera in a static scene, or the rigid motion of a moving object when the camera is fixed.
- the extrinsic parameter matrix of the camera includes a rotation matrix and a translation matrix, wherein the rotation matrix describes the direction of the coordinate axis of the world coordinate system relative to the camera coordinate axis, and the translation matrix describes the position of the spatial origin in the camera coordinate system .
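What the extrinsic matrix encodes can be sketched as mapping a world-space point into camera space via X_cam = R·X_world + t (values and names are illustrative; recovering R and t from point pairs is what `cv2.solvePnP` does):

```python
# Applying the extrinsic parameters [R | t]: rotate a world point by the
# rotation matrix R, then shift it by the translation vector t.

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

# 90-degree rotation about the z-axis and a translation along x.
R = [[0, -1, 0],
     [1,  0, 0],
     [0,  0, 1]]
t = [2, 0, 0]

x_world = [1, 0, 0]
x_cam = [a + b for a, b in zip(mat_vec(R, x_world), t)]
print(x_cam)  # [2, 1, 0]
```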
- the internal parameter matrix of the camera is determined by the hardware structure of the camera, including the focal length of the camera, the principal point offset, etc.
- the principal axis of the camera is the line perpendicular to the image plane and passing through the optical center, and the intersection of the principal axis with the image plane is called the principal point.
- the principal point offset is the position of the principal point relative to the image plane.
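A toy pinhole projection using the intrinsic parameters just described, focal length and principal-point offset (all numeric values are illustrative assumptions, not from the patent):

```python
# Project a camera-space 3D point to pixel coordinates with the pinhole
# model: u = fx * x / z + cx, v = fy * y / z + cy.

fx = fy = 500.0          # focal length in pixels (illustrative)
cx, cy = 320.0, 240.0    # principal-point offset (image center here)

def project(point_cam):
    """Project a camera-space 3D point (x, y, z) to pixel coordinates (u, v)."""
    x, y, z = point_cam
    return (fx * x / z + cx, fy * y / z + cy)

print(project((0.1, -0.05, 1.0)))  # approximately (370.0, 215.0)
```

Combining this intrinsic projection with the extrinsic [R | t] transform is exactly how a 3D model point ends up at a pixel of the target area.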
- the terminal can use Blender (an open-source, cross-platform 3D animation suite providing modeling, animation, materials, rendering, audio processing, and video editing) to project the three-dimensional model to the target area of the first target object in the target image to obtain a special effect image.
- the terminal projects the three-dimensional model onto the target area of the first target object in the target image according to the camera's internal parameter matrix and the calculated camera's external parameter matrix, which can ensure the accuracy of the three-dimensional model projected to the target area.
- the terminal projects the three-dimensional model to the target area based on the position information of the two-dimensional key points and the position information of the three-dimensional key points of the first target object to obtain the projection area.
- the terminal performs special effect processing on the projection area to obtain a special effect image.
- the projection area includes the first target object, and the first target object exists in the projection area in a three-dimensional form, that is, in the form of a three-dimensional model of the first target object.
- performing special effects processing on the projection area can be understood as beautifying the first target object existing in three-dimensional form, that is, beautifying the three-dimensional model of the first target object, for example by changing the rendering parameters of the three-dimensional model.
- the special effect image contains the beautified three-dimensional image of the first target object.
- the beautification process may include changing the color of the first target object, changing the pattern of the first target object, etc., that is, adjusting the rendering color and rendering texture in the rendering parameters of the three-dimensional model.
- the terminal can fine-tune the projection area to ensure the matching effect between the three-dimensional model and the target area.
- the terminal acquires the mask area of the first target object from the target image, and the mask area is used to indicate the position of the first target object in the target image.
- based on the mask area, the terminal adjusts the projection position of the three-dimensional model in the projection area to obtain the adjusted projection area.
- the terminal performs special effect processing on the adjusted projection area to obtain a special effect image.
- the mask is a binary image consisting of 0s and 1s.
- the three-dimensional model is projected onto the target area of the target image by combining the position information of the two-dimensional key points of the first target object and the position information of the corresponding three-dimensional key points in the three-dimensional model, so as to perform special effects processing on the target area. Firstly, the position information of the two-dimensional key points and the position information of the three-dimensional key points of the first target object are obtained; then, based on that position information, the three-dimensional model of the first target object is projected onto the target area of the target image, which improves the authenticity of the first target object in the target image.
- the terminal may replace the target area of the target image with the projection area to obtain the final effect image.
- the terminal may further fine-tune the final effect image.
- the mask area of the first target object can be extracted from the final effect image, and the mask area can be understood as including the image where the mask is located, where the mask is a binary image composed of 0 and 1.
- Image masks are defined by specifying data values, data ranges, finite or infinite values, regions of interest, and annotation files; any combination of these options can be applied as input to build the mask.
- the mask area is matched with the position of the first target object in the final effect image, so that the 3D model has better robustness when the first target object is occluded.
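One way to read the occlusion-robustness claim is mask-guided compositing: the rendered projection is written only where the mask is 1, so occluding pixels keep the original image. A minimal sketch (names illustrative, not from the patent):

```python
# Mask-guided compositing: take the rendered pixel where the mask is 1,
# keep the original pixel where the mask is 0 (e.g. where the nail is
# occluded by another finger).

def composite(original, rendered, mask):
    return [[r if m else o for o, r, m in zip(orow, rrow, mrow)]
            for orow, rrow, mrow in zip(original, rendered, mask)]

original = [[1, 1], [1, 1]]
rendered = [[9, 9], [9, 9]]
mask     = [[1, 0], [0, 1]]
print(composite(original, rendered, mask))  # [[9, 1], [1, 9]]
```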
- the nail makeup application scheme may involve a hand segmentation model, a nail segmentation model and a nail key point regression model.
- the hand region is obtained by semantically segmenting the target image containing fingernails using the hand segmentation model.
- the nail region is obtained by semantically segmenting the hand region using the nail segmentation model.
- the key point coordinates of each nail in the nail area are extracted using the nail key point regression model. The PnP algorithm and the Blender tool are then used to apply makeup to the nails, and the nails in the target image are replaced or covered to obtain the final nail image.
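The end-to-end flow just described can be sketched as a function composition. All model calls below are stubs standing in for the hand segmentation, nail segmentation, key point regression, and projection steps; every name is illustrative, not from the patent:

```python
# End-to-end sketch of the nail-makeup pipeline: two-stage segmentation,
# key point regression, then projection/rendering. Stubs show the data flow.

def run_pipeline(target_image, hand_seg, nail_seg, kp_regress, project_fn):
    hand_region = hand_seg(target_image)      # 1st-stage semantic segmentation
    nail_region = nail_seg(hand_region)       # 2nd-stage semantic segmentation
    keypoints_2d = kp_regress(nail_region)    # per-nail 2D key points
    return project_fn(target_image, keypoints_2d)  # PnP + render + composite

# String-tagging stubs so the flow is visible in the output.
result = run_pipeline(
    "img",
    hand_seg=lambda img: img + ">hand",
    nail_seg=lambda r: r + ">nail",
    kp_regress=lambda r: r + ">kps",
    project_fn=lambda img, kps: (img, kps),
)
print(result)  # ('img', 'img>hand>nail>kps')
```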
- a schematic diagram of the process of projecting a three-dimensional model to a target image in a nail makeup scheme is shown.
- the P3P algorithm is used to generate the camera's external parameter matrix.
- the three-dimensional model is projected onto the nail area of the target image by means of the blender tool.
- FIGS. 3a to 3j each show a schematic diagram of a nail makeup application scheme; FIG. 3a shows a target image, and the target image includes the nail.
- Figure 3b shows the hand region.
- Fig. 3c shows a schematic diagram of the segmentation result of semantic segmentation of the hand region using the nail segmentation model
- Fig. 3d shows the synthetic graph of the segmentation result of the semantic segmentation of the hand region using the nail segmentation model.
- Figure 3e shows the nail area.
- Figure 3f shows a schematic diagram of the key points of the nail, and the key points are "0", "1", "2" and "3" respectively.
- Figure 3g shows a top view of the three-dimensional model.
- Figure 3h shows the rendering of the 3D model projected onto the nail area.
- Figures 3i and 3j show two makeup effect maps.
- FIG. 4 is a block diagram of an image generating apparatus.
- the image generating apparatus may be applied to a terminal or a server, and the image generating apparatus may specifically include the following modules.
- the acquiring module 41 is configured to acquire a target image, where the target image includes a first target object.
- the position information obtaining module 42 is configured to obtain the position information of the two-dimensional key points of the first target object from the target image.
- the image generation module 43 is configured to project the three-dimensional model of the first target object to the target area based on the position information of the two-dimensional key points and the position information of the three-dimensional key points of the first target object to generate a special effect image, where the target area is the region where the first target object is located in the target image, and the three-dimensional key points are key points corresponding to the two-dimensional key points in the three-dimensional model of the first target object.
- the image generation module 43 includes:
- the extrinsic parameter matrix determination unit is configured to determine the extrinsic parameter matrix of the camera based on the position information of the two-dimensional key points and the position information of the three-dimensional key points, and the camera is the camera that shoots the target image.
- the model projection unit is configured to project the three-dimensional model to the target area based on the extrinsic parameter matrix of the camera and the intrinsic parameter matrix of the camera to generate a special effect image.
- the image generation module 43 is configured to project the three-dimensional model to the target area based on the position information of the two-dimensional key points and the position information of the three-dimensional key points of the first target object to obtain the projection area, and to perform special effect processing on the projection area to obtain a special effect image.
- the apparatus further includes a fine-tuning module configured to obtain a mask area of the first target object from the target image, the mask area being used to indicate the position of the first target object in the target image. Based on the mask area, the projection position of the three-dimensional model in the projection area is adjusted to obtain the adjusted projection area.
- the image generation module 43 is further configured to perform special effect processing on the adjusted projection area to obtain a special effect image.
- the location information acquisition module 42 includes:
- a segmentation unit configured to input the target image into the segmentation model, to determine the target area.
- the regression unit is configured to input the target area into the key point regression model to determine the position information of the two-dimensional key points.
- the segmentation model includes a first segmentation sub-model and a second segmentation sub-model
- the segmentation unit is configured to input the target image into the first segmentation sub-model, and perform semantic segmentation on the target image by the first segmentation sub-model, A candidate area is obtained, the candidate area includes a second target object, and the first target object belongs to the second target object.
- the candidate region is input into the second segmentation sub-model, and the candidate region is semantically segmented through the second segmentation sub-model to obtain the target region.
- the segmentation unit is configured to input the target image into the segmentation model to obtain a first initial region, the first initial region including the first target object.
- a time series smoothing process is performed on the first initial region to obtain a target region.
- the regression unit is configured to input the target area into a keypoint regression model to obtain position information of the initial two-dimensional keypoints of the first target object.
- the optical flow stabilization process is performed on the position information of the initial two-dimensional key points, and the position information of the two-dimensional key points is obtained.
- FIG. 5 is a block diagram of an image generation electronic device.
- electronic device 500 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, personal digital assistant, and the like.
- an electronic device 500 may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 516.
- the processing component 502 generally controls the overall operation of the electronic device 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
- the processing component 502 may include one or more processors 520 to execute instructions to perform all or some of the steps of the image generation method described above. Additionally, processing component 502 may include one or more modules to facilitate interaction between processing component 502 and other components. For example, processing component 502 may include a multimedia module to facilitate interaction between multimedia component 508 and processing component 502.
- Memory 504 is configured to store various types of data to support operation at electronic device 500. Examples of such data include instructions for any application or method operating on electronic device 500, contact data, phonebook data, messages, images, videos, and the like. Memory 504 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
- Power supply assembly 506 provides power to various components of electronic device 500 .
- Power supply components 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 500 .
- Multimedia component 508 includes a screen that provides an output interface between the electronic device 500 and the user.
- the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
- the touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel. The touch sensor may not only sense the boundaries of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe action.
- the multimedia component 508 includes a front-facing camera and/or a rear-facing camera. When the electronic device 500 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras can be a fixed optical lens system or have focal length and optical zoom capability.
- Audio component 510 is configured to output and/or input audio signals.
- audio component 510 includes a microphone (MIC) that is configured to receive external audio signals when electronic device 500 is in operating modes, such as calling mode, recording mode, and voice recognition mode.
- the received audio signal may be further stored in memory 504 or transmitted via communication component 516 .
- the audio component 510 also includes a speaker for outputting audio signals.
- the I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, or the like. These buttons may include, but are not limited to: a home button, volume buttons, a start button, and a lock button.
- Sensor assembly 514 includes one or more sensors for providing status assessments of various aspects of electronic device 500 .
- the sensor assembly 514 can detect the open/closed state of the electronic device 500 and the relative positioning of components, such as the display and keypad of the electronic device 500; the sensor assembly 514 can also detect a change in the position of the electronic device 500 or of a component of the electronic device 500, the presence or absence of user contact with the electronic device 500, the orientation or acceleration/deceleration of the electronic device 500, and changes in the temperature of the electronic device 500.
- Sensor assembly 514 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
- Sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
- the sensor assembly 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
- Communication component 516 is configured to facilitate wired or wireless communication between electronic device 500 and other devices.
- Electronic device 500 may access wireless networks based on communication standards, such as WiFi, carrier networks (e.g., 2G, 3G, 4G, or 5G), or a combination thereof.
- the communication component 516 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
- the communication component 516 also includes a near field communication (NFC) module to facilitate short-range communication.
- the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
- electronic device 500 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the image generation method described above.
- a storage medium including instructions, such as the memory 504 including instructions, is also provided, and the instructions can be executed by the processor 520 of the electronic device 500 to complete the image generation method described above.
- the storage medium may be a non-transitory computer-readable storage medium; for example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
- a computer program product includes readable program code executable by the processor 520 of the electronic device 500 to complete the image generation method described above.
- the program code may be stored in a storage medium of the electronic device 500, and the storage medium may be a non-transitory computer-readable storage medium; for example, a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
- FIG. 6 is a block diagram of an electronic device for processing image effects.
- the electronic device 600 may be provided as a server.
- electronic device 600 includes a processing component 622, which further includes one or more processors, and a memory resource, represented by memory 632, for storing instructions executable by processing component 622, such as applications.
- An application program stored in memory 632 may include one or more modules, each corresponding to a set of instructions.
- the processing component 622 is configured to execute instructions to perform the image generation method described above.
- the electronic device 600 may also include a power supply assembly 626 configured to perform power management of the electronic device 600, a wired or wireless network interface 650 configured to connect the electronic device 600 to a network, and an input/output (I/O) interface 658.
- Electronic device 600 may operate based on an operating system stored in memory 632, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Human Computer Interaction (AREA)
- Processing Or Creating Images (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (26)
- An image generation method, comprising: acquiring a target image, the target image containing a first target object; acquiring position information of two-dimensional key points of the first target object from the target image; and projecting, based on the position information of the two-dimensional key points and position information of three-dimensional key points of the first target object, a three-dimensional model of the first target object onto a target region to generate a special-effect image, wherein the target region is the region in which the first target object is located in the target image, and the three-dimensional key points are key points in the three-dimensional model that correspond to the two-dimensional key points.
- The method according to claim 1, wherein projecting the three-dimensional model of the first target object onto the target region based on the position information of the two-dimensional key points and the position information of the three-dimensional key points of the first target object to generate the special-effect image comprises: determining an extrinsic parameter matrix of a camera based on the position information of the two-dimensional key points and the position information of the three-dimensional key points, the camera being the camera that captured the target image; and projecting the three-dimensional model onto the target region based on the extrinsic parameter matrix of the camera and an intrinsic parameter matrix of the camera to generate the special-effect image.
- The method according to claim 1, wherein projecting the three-dimensional model of the first target object onto the target region based on the position information of the two-dimensional key points and the position information of the three-dimensional key points of the first target object to generate the special-effect image comprises: projecting the three-dimensional model onto the target region based on the position information of the two-dimensional key points and the position information of the three-dimensional key points of the first target object to obtain a projection region; and performing special-effect processing on the projection region to obtain the special-effect image.
- The method according to claim 3, wherein the method further comprises: acquiring a mask region of the first target object from the target image, the mask region indicating the position of the first target object in the target image; and adjusting, based on the mask region, the projection position of the three-dimensional model in the projection region to obtain an adjusted projection region; wherein performing special-effect processing on the projection region to obtain the special-effect image comprises: performing special-effect processing on the adjusted projection region to obtain the special-effect image.
- The method according to claim 1, wherein acquiring the position information of the two-dimensional key points of the first target object from the target image comprises: inputting the target image into a segmentation model to determine the target region; and inputting the target region into a key point regression model to determine the position information of the two-dimensional key points.
- The method according to claim 5, wherein the segmentation model comprises a first segmentation sub-model and a second segmentation sub-model, and inputting the target image into the segmentation model to determine the target region comprises: inputting the target image into the first segmentation sub-model, and performing semantic segmentation on the target image through the first segmentation sub-model to obtain a candidate region, the candidate region comprising a second target object, the first target object belonging to the second target object; and inputting the candidate region into the second segmentation sub-model, and performing semantic segmentation on the candidate region through the second segmentation sub-model to obtain the target region.
- The method according to claim 5, wherein inputting the target image into the segmentation model to determine the target region comprises: inputting the target image into the segmentation model to obtain a first initial region, the first initial region comprising the first target object; and performing temporal smoothing on the first initial region to obtain the target region.
- The method according to claim 5, wherein inputting the target region into the key point regression model to determine the position information of the two-dimensional key points comprises: inputting the target region into the key point regression model to obtain position information of initial two-dimensional key points of the first target object; and performing optical-flow stabilization on the position information of the initial two-dimensional key points to obtain the position information of the two-dimensional key points.
- An image generation apparatus, comprising: an acquisition module configured to acquire a target image, the target image containing a first target object; a position information acquisition module configured to acquire position information of two-dimensional key points of the first target object from the target image; and an image generation module configured to project, based on the position information of the two-dimensional key points and position information of three-dimensional key points of the first target object, a three-dimensional model of the first target object onto a target region to generate a special-effect image, wherein the target region is the region in which the first target object is located in the target image, and the three-dimensional key points are key points in the three-dimensional model of the first target object that correspond to the two-dimensional key points.
- The apparatus according to claim 9, wherein the image generation module comprises: an extrinsic matrix determination unit configured to determine an extrinsic parameter matrix of a camera based on the position information of the two-dimensional key points and the position information of the three-dimensional key points, the camera being the camera that captured the target image; and a model projection unit configured to project the three-dimensional model onto the target region based on the extrinsic parameter matrix of the camera and an intrinsic parameter matrix of the camera to generate the special-effect image.
- The apparatus according to claim 9, wherein the image generation module is configured to: project the three-dimensional model onto the target region based on the position information of the two-dimensional key points and the position information of the three-dimensional key points of the first target object to obtain a projection region; and perform special-effect processing on the projection region to obtain the special-effect image.
- The apparatus according to claim 9, wherein the apparatus further comprises a fine-tuning module configured to: acquire a mask region of the first target object from the target image, the mask region indicating the position of the first target object in the target image; and adjust, based on the mask region, the projection position of the three-dimensional model in the projection region to obtain an adjusted projection region; and the image generation module is further configured to perform special-effect processing on the adjusted projection region to obtain the special-effect image.
- The apparatus according to claim 9, wherein the position information acquisition module comprises: a segmentation unit configured to input the target image into a segmentation model to determine the target region; and a regression unit configured to input the target region into a key point regression model to determine the position information of the two-dimensional key points.
- The apparatus according to claim 13, wherein the segmentation model comprises a first segmentation sub-model and a second segmentation sub-model, and the segmentation unit is configured to: input the target image into the first segmentation sub-model, and perform semantic segmentation on the target image through the first segmentation sub-model to obtain a candidate region, the candidate region comprising a second target object, the first target object belonging to the second target object; and input the candidate region into the second segmentation sub-model, and perform semantic segmentation on the candidate region through the second segmentation sub-model to obtain the target region.
- The apparatus according to claim 13, wherein the segmentation unit is configured to: input the target image into the segmentation model to obtain a first initial region, the first initial region comprising the first target object; and perform temporal smoothing on the first initial region to obtain the target region.
- The apparatus according to claim 13, wherein the regression unit is configured to: input the target region into the key point regression model to obtain position information of initial two-dimensional key points of the first target object; and perform optical-flow stabilization on the position information of the initial two-dimensional key points to obtain the position information of the two-dimensional key points.
- An electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to perform the following steps: acquiring a target image, the target image containing a first target object; acquiring position information of two-dimensional key points of the first target object from the target image; and projecting, based on the position information of the two-dimensional key points and position information of three-dimensional key points of the first target object, a three-dimensional model of the first target object onto a target region to generate a special-effect image, wherein the target region is the region in which the first target object is located in the target image, and the three-dimensional key points are key points in the three-dimensional model of the first target object that correspond to the two-dimensional key points.
- The electronic device according to claim 17, wherein the processor is configured to perform the following steps: determining an extrinsic parameter matrix of a camera based on the position information of the two-dimensional key points and the position information of the three-dimensional key points, the camera being the camera that captured the target image; and projecting the three-dimensional model onto the target region based on the extrinsic parameter matrix of the camera and an intrinsic parameter matrix of the camera to generate the special-effect image.
- The electronic device according to claim 17, wherein the processor is configured to perform the following steps: projecting the three-dimensional model onto the target region based on the position information of the two-dimensional key points and the position information of the three-dimensional key points of the first target object to obtain a projection region; and performing special-effect processing on the projection region to obtain the special-effect image.
- The electronic device according to claim 19, wherein the processor is configured to perform the following steps: acquiring a mask region of the first target object from the target image, the mask region indicating the position of the first target object in the target image; and adjusting, based on the mask region, the projection position of the three-dimensional model in the projection region to obtain an adjusted projection region; wherein performing special-effect processing on the projection region to obtain the special-effect image comprises: performing special-effect processing on the adjusted projection region to obtain the special-effect image.
- The electronic device according to claim 17, wherein the processor is configured to perform the following steps: inputting the target image into a segmentation model to determine the target region; and inputting the target region into a key point regression model to determine the position information of the two-dimensional key points.
- The electronic device according to claim 21, wherein the segmentation model comprises a first segmentation sub-model and a second segmentation sub-model, and the processor is configured to perform the following steps: inputting the target image into the first segmentation sub-model, and performing semantic segmentation on the target image through the first segmentation sub-model to obtain a candidate region, the candidate region comprising a second target object, the first target object belonging to the second target object; and inputting the candidate region into the second segmentation sub-model, and performing semantic segmentation on the candidate region through the second segmentation sub-model to obtain the target region.
- The electronic device according to claim 21, wherein the processor is configured to perform the following steps: inputting the target image into the segmentation model to obtain a first initial region, the first initial region comprising the first target object; and performing temporal smoothing on the first initial region to obtain the target region.
- The electronic device according to claim 21, wherein the processor is configured to perform the following steps: inputting the target region into the key point regression model to obtain position information of initial two-dimensional key points of the first target object; and performing optical-flow stabilization on the position information of the initial two-dimensional key points to obtain the position information of the two-dimensional key points.
- A storage medium, wherein when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is caused to perform the following steps: acquiring a target image, the target image containing a first target object; acquiring position information of two-dimensional key points of the first target object from the target image; and projecting, based on the position information of the two-dimensional key points and position information of three-dimensional key points of the first target object, the three-dimensional model onto a target region to generate a special-effect image, wherein the target region is the region in which the first target object is located in the target image, and the three-dimensional key points are key points in the three-dimensional model of the first target object that correspond to the two-dimensional key points.
- A computer program product comprising readable program code, the readable program code being executable by a processor of an electronic device to perform the following steps: acquiring a target image, the target image containing a first target object; acquiring position information of two-dimensional key points of the first target object from the target image; and projecting, based on the position information of the two-dimensional key points and position information of three-dimensional key points of the first target object, the three-dimensional model onto a target region to generate a special-effect image, wherein the target region is the region in which the first target object is located in the target image, and the three-dimensional key points are key points in the three-dimensional model of the first target object that correspond to the two-dimensional key points.
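The projection step in claim 2 — using the camera's extrinsic parameter matrix (which can be estimated from the 2D/3D key-point correspondences, e.g. by a PnP solver) together with its intrinsic parameter matrix — amounts to the standard pinhole projection. The sketch below shows that projection; the specific intrinsics, rotation, and translation values are illustrative assumptions, not values from the source.

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project 3D model key points into the image with the pinhole
    model: x ~ K [R | t] X. Returns (N, 2) pixel coordinates."""
    points_3d = np.asarray(points_3d, dtype=float)   # (N, 3) model points
    cam = points_3d @ R.T + t                        # camera coordinates
    uvw = cam @ K.T                                  # homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]                  # perspective divide

# Illustrative intrinsics: focal length 100 px, principal point (50, 50).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                  # identity rotation (camera-aligned model)
t = np.array([0.0, 0.0, 2.0])  # model placed 2 units in front of the camera
```

In a full pipeline, `R` and `t` would come from solving the perspective-n-point problem over the 2D key points and their corresponding 3D model key points, and the projected points would drive where the special-effect model is drawn in the target region.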
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011199693.3 | 2020-10-29 | ||
CN202011199693.3A CN112669198A (zh) | 2020-10-29 | 2020-10-29 | 图像特效的处理方法、装置、电子设备和存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022088750A1 true WO2022088750A1 (zh) | 2022-05-05 |
Family
ID=75402841
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/105334 WO2022088750A1 (zh) | 2020-10-29 | 2021-07-08 | 图像生成方法和电子设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112669198A (zh) |
WO (1) | WO2022088750A1 (zh) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112669198A (zh) * | 2020-10-29 | 2021-04-16 | 北京达佳互联信息技术有限公司 | 图像特效的处理方法、装置、电子设备和存储介质 |
CN113421182B (zh) * | 2021-05-20 | 2023-11-28 | 北京达佳互联信息技术有限公司 | 三维重建方法、装置、电子设备及存储介质 |
CN114359522B (zh) * | 2021-12-23 | 2024-06-18 | 阿依瓦(北京)技术有限公司 | Ar模型放置方法及装置 |
CN115358958A (zh) * | 2022-08-26 | 2022-11-18 | 北京字跳网络技术有限公司 | 特效图生成方法、装置、设备及存储介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090262989A1 (en) * | 2007-07-12 | 2009-10-22 | Tatsuo Kozakaya | Image processing apparatus and method |
CN109272579A (zh) * | 2018-08-16 | 2019-01-25 | Oppo广东移动通信有限公司 | 基于三维模型的美妆方法、装置、电子设备和存储介质 |
CN110675489A (zh) * | 2019-09-25 | 2020-01-10 | 北京达佳互联信息技术有限公司 | 一种图像处理方法、装置、电子设备和存储介质 |
CN111047526A (zh) * | 2019-11-22 | 2020-04-21 | 北京达佳互联信息技术有限公司 | 一种图像处理方法、装置、电子设备及存储介质 |
CN112669198A (zh) * | 2020-10-29 | 2021-04-16 | 北京达佳互联信息技术有限公司 | 图像特效的处理方法、装置、电子设备和存储介质 |
- 2020-10-29: CN application CN202011199693.3A filed; published as CN112669198A (status: active, Pending)
- 2021-07-08: PCT application PCT/CN2021/105334 filed; published as WO2022088750A1 (status: active, Application Filing)
Also Published As
Publication number | Publication date |
---|---|
CN112669198A (zh) | 2021-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022088750A1 (zh) | 图像生成方法和电子设备 | |
US11114130B2 (en) | Method and device for processing video | |
WO2022179026A1 (zh) | 图像处理方法及装置、电子设备和存储介质 | |
WO2020007241A1 (zh) | 图像处理方法和装置、电子设备以及计算机可读存储介质 | |
CN104639843B (zh) | 图像处理方法及装置 | |
WO2022179025A1 (zh) | 图像处理方法及装置、电子设备和存储介质 | |
CN110929651A (zh) | 图像处理方法、装置、电子设备及存储介质 | |
US11308692B2 (en) | Method and device for processing image, and storage medium | |
WO2016011747A1 (zh) | 肤色调整方法和装置 | |
US11030733B2 (en) | Method, electronic device and storage medium for processing image | |
CN104484858B (zh) | 人物图像处理方法及装置 | |
CN109840939B (zh) | 三维重建方法、装置、电子设备及存储介质 | |
CN107341777B (zh) | 图片处理方法及装置 | |
CN109857311A (zh) | 生成人脸三维模型的方法、装置、终端及存储介质 | |
WO2022110837A1 (zh) | 图像处理方法及装置 | |
WO2022077970A1 (zh) | 特效添加方法及装置 | |
WO2022193466A1 (zh) | 图像处理方法及装置、电子设备和存储介质 | |
US11252341B2 (en) | Method and device for shooting image, and storage medium | |
CN113643356A (zh) | 相机位姿确定、虚拟物体显示方法、装置及电子设备 | |
CN110580688A (zh) | 一种图像处理方法、装置、电子设备及存储介质 | |
US20210118148A1 (en) | Method and electronic device for changing faces of facial image | |
JP2022055302A (ja) | 遮蔽された画像の検出方法、装置、及び媒体 | |
CN114463212A (zh) | 图像处理方法及装置、电子设备和存储介质 | |
CN113570581A (zh) | 图像处理方法及装置、电子设备和存储介质 | |
CN113763286A (zh) | 图像处理方法及装置、电子设备和存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21884487; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21884487; Country of ref document: EP; Kind code of ref document: A1 |
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10.08.2023) |