WO2022188056A1 - Method and device for image processing, and storage medium - Google Patents

Method and device for image processing, and storage medium

Info

Publication number
WO2022188056A1
Authority
WO
WIPO (PCT)
Prior art keywords: image, template, target, foreground target, foreground
Application number: PCT/CN2021/079924
Other languages: French (fr), Chinese (zh)
Inventors: 聂谷洪, 胡晓翔, 施泽浩
Original Assignee: 深圳市大疆创新科技有限公司
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2021/079924
Publication of WO2022188056A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules

Definitions

  • the present application relates to the technical field of image processing, and in particular, to an image processing method, device, and storage medium.
  • With the development of intelligent devices, image processing has become an indispensable part of people's lives, whether it is professional image processing at work or recreational image processing in daily life.
  • One of the more popular processing methods is that the user manually extracts the foreground area in an image and then fills it into another template image to generate a composited ("PS") image. In this way, the pose of the person represented by the foreground area may well be unsightly and poorly suited to the template image, thereby reducing the quality of the composite image.
  • one of the objectives of the present application is to provide an image processing method, device and storage medium.
  • In a first aspect, an embodiment of the present application provides an image processing method, including: acquiring a material image and a template image; performing foreground segmentation processing on the material image to obtain a foreground target image; and matching the foreground target image with a preset posture template, and if the matching is successful, filling the foreground target image into the template image to fuse the template image and the foreground target image and generate a fusion image.
  • In a second aspect, an embodiment of the present application provides an image processing apparatus, including a processor and a memory storing a computer program.
  • When executing the computer program, the processor implements the following steps: acquiring a material image and a template image; performing foreground segmentation processing on the material image to obtain a foreground target image; and matching the foreground target image with a preset posture template, and if the matching is successful, filling the foreground target image into the template image to fuse the template image and the foreground target image and generate a fusion image.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements the method according to the first aspect.
  • The image processing method, device, and storage medium provided by the embodiments of the present application acquire a material image and a template image, where the material image is an image that provides a foreground target and the template image is an image that provides a background. After the foreground target image is obtained through foreground segmentation processing, it is matched with a preset posture template, and if the matching is successful, the template image and the foreground target image are fused to generate a fusion image.
  • In this embodiment, the posture template is used to determine whether the posture of the foreground target in the foreground target image meets preset requirements (such as the requirements of an attractive presentation or of certain other scenes). A successful match indicates that the posture of the foreground target meets the preset requirements and can be adapted to the template image.
  • Only in this case are the template image and the foreground target image fused, which helps to improve the quality of the fused image.
  • FIG. 1, FIG. 5 and FIG. 7 are different schematic flowcharts of the image processing method provided by an embodiment of the present application.
  • FIG. 2A, FIG. 3A and FIG. 6A are different schematic diagrams of template images provided by an embodiment of the present application.
  • FIG. 2B is a schematic diagram of an image including a first foreground target provided by an embodiment of the present application.
  • FIG. 2C is a schematic diagram of a target area of a template image provided by an embodiment of the present application.
  • FIG. 2D is a schematic diagram of a gesture template provided by an embodiment of the present application.
  • FIG. 2E is a schematic diagram of joint point detection provided by an embodiment of the present application.
  • FIG. 3B and FIG. 3C are different schematic diagrams of the correspondence between classification results and gesture templates provided by an embodiment of the present application.
  • FIG. 4A is a schematic diagram of a material image provided by an embodiment of the present application.
  • FIG. 4B is a schematic diagram of a foreground target image provided by an embodiment of the present application.
  • FIG. 4C is a schematic diagram of a fused image provided by an embodiment of the present application.
  • FIG. 6B is a schematic diagram of a material image set provided by an embodiment of the present application.
  • FIG. 6C is a schematic diagram of a gesture template set and its adapted target areas provided by an embodiment of the present application.
  • FIG. 6D is a schematic diagram of all fused images of a material image set provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • In view of the problems in the related art, an embodiment of the present application provides an image processing method that acquires a material image and a template image, where the material image is an image that provides a foreground target and the template image is an image that provides a background. After the foreground target image is obtained through foreground segmentation processing, it is matched with a preset posture template, and if the matching is successful, the template image and the foreground target image are fused to generate a fusion image.
  • In this embodiment, the posture template is used to determine whether the posture of the foreground target in the foreground target image meets preset requirements (such as the requirements of an attractive presentation). A successful match indicates that the posture of the foreground target in the foreground target image meets the preset requirements and can be adapted to the template image. Only in this case are the template image and the foreground target image fused, which helps to improve the quality of the fused image.
  • the image processing methods provided in the embodiments of the present application may be executed by an image processing apparatus.
  • The image processing device may be a computer chip or an integrated circuit with data processing capability, such as a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
  • The image processing apparatus may be installed in a movable platform, a terminal device, a server, or other equipment with image processing functions.
  • Alternatively, the image processing apparatus may itself be an entity device with image processing functions, such as a movable platform, a terminal device, or a server.
  • Examples of a movable platform include, but are not limited to, unmanned aerial vehicles, unmanned vehicles, gimbals, unmanned ships, and mobile robots.
  • Examples of terminal devices include, but are not limited to: smartphones/mobile phones, tablet computers, personal digital assistants (PDAs), laptop computers, desktop computers, media content players, video game stations/systems, virtual reality systems, augmented reality systems, wearable devices (e.g., watches, glasses, gloves, headwear such as hats, helmets, virtual reality headsets, augmented reality headsets, head-mounted devices (HMDs) and headbands, pendants, armbands, leg loops, shoes, and vests), remote controls, or any other type of device.
  • The user can select a template image based on actual needs; for example, multiple template images can be displayed on an interactive interface, and after the user selects a template image on the interactive interface, one or more gesture templates corresponding to that template image can be displayed on the interactive interface for the user to select.
  • Alternatively, the user can select a gesture template that matches the template image from a gesture database according to his or her own experience or needs; the gesture template can be used to provide guidance for collecting or selecting a material image.
  • The gesture template can be presented as an image, as text, as sound, or in another form. That is to say, the material image can be collected or selected according to the posture template; for example, the user's material image can be captured with a handheld gimbal, a mobile phone, an unmanned aerial vehicle, or a separate imaging device.
  • After acquiring the material image, the image processing device may perform foreground segmentation processing on the material image to obtain a foreground target image, and then match the foreground target image with the preset posture template, so that the posture template is used to evaluate whether the posture of the foreground target in the foreground target image meets the preset requirements. If the matching is successful, it indicates that the posture of the foreground target meets the preset requirements and that the foreground target image is suitable for presentation in combination with the template image; the foreground target image can then be filled into the template image to fuse the template image and the foreground target image and generate a fusion image, which helps to improve the quality of the fusion image and the user's visual experience.
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • The method may be executed by an image processing apparatus and includes the following steps:
  • In step S101, a material image and a template image are acquired.
  • In step S102, foreground segmentation processing is performed on the material image to obtain a foreground target image.
  • In step S103, the foreground target image is matched with a preset gesture template, and if the matching is successful, the foreground target image is filled into the template image, so that the template image and the foreground target image are fused to generate a fusion image.
  • The material image is used to provide the foreground target, and the template image is used to provide the background.
  • The fusion image is generated based on the foreground target provided by the material image and the background provided by the template image.
  • the material image and the template image may be selected by the user according to actual needs.
  • the material image may be obtained by a user based on an imaging device, and the material image may be an image that has been captured in advance, or may be an image captured by the imaging device in real time.
  • the gesture template may be determined according to the template image, the gesture template may be used to provide guidance for the user to collect or select material images, and the gesture template may be in the form of images, text, or sounds, etc. render.
  • the template image corresponds to one or more gesture templates
  • the preset gesture template may be at least one of all gesture templates corresponding to the template image.
  • The preset gesture template may be selected by the user from all gesture templates corresponding to the template image.
  • Alternatively, the preset gesture template may be at least one gesture template that the user, according to the template image selected by himself or herself and combined with his or her own experience or needs, selects from a gesture database that fits the template image; the gesture database includes several gesture templates.
  • one or more gesture templates corresponding to the template image can be determined at least in the following ways:
  • foreground segmentation processing may be performed on the template image to obtain a first foreground target, and then one or more gesture templates corresponding to the template image may be generated according to the first foreground target.
  • semantic segmentation processing may also be performed on the template image in advance to obtain a classification result of each pixel in the template image, and one or more gesture templates corresponding to the template image may be determined from the gesture database according to the classification result.
  • In this embodiment, the foreground target image and the template image are not fused directly.
  • Only when the foreground target image is successfully matched with the preset posture template is it determined that the posture of the foreground target meets the preset requirements (such as the requirements of an attractive presentation or of a certain scene) and can be adapted to the template image, so that the combination of the two has a good presentation effect.
  • Only then is the foreground target image filled into the template image to fuse the template image and the foreground target image and generate a fusion image, thereby helping to improve the quality of the fusion image.
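As a rough sketch of the control flow just described, the skeleton below wires the three steps together in Python. The helper functions are hypothetical placeholders for the concrete operations discussed later in this document; only the "fuse only when the match succeeds" logic is the point being illustrated.

```python
import numpy as np

def foreground_segmentation(material_image):
    # Placeholder for step S102: return a boolean mask selecting the foreground target.
    # A real implementation would use semantic segmentation (see the later sketches).
    return np.zeros(material_image.shape[:2], dtype=bool)

def pose_matches(foreground_mask, pose_template):
    # Placeholder for the matching half of step S103.
    return False

def fill_into_template(material_image, foreground_mask, template_image):
    # Placeholder for the fusion half of step S103: paste foreground pixels into the template.
    fused = template_image.copy()
    fused[foreground_mask] = material_image[foreground_mask]
    return fused

def process(material_image, template_image, pose_template):
    """Steps S101-S103: segment, match, and fuse only when the match succeeds."""
    mask = foreground_segmentation(material_image)     # step S102
    if not pose_matches(mask, pose_template):          # step S103, matching
        return None                                    # no fusion when the pose does not fit
    return fill_into_template(material_image, mask, template_image)

# Tiny demo with dummy 8x8 images; returns None because the placeholder match fails.
material = np.zeros((8, 8, 3), dtype=np.uint8)
template = np.full((8, 8, 3), 255, dtype=np.uint8)
print(process(material, template, pose_template=None))
```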
  • the template image includes a preset target area
  • Filling the foreground target image into the template image includes: filling the foreground target image into the target area.
  • the The target area can represent an area in the template image that has a high degree of adaptation to the foreground target image.
  • the foreground target image is filled in the target area, the final generated fusion image has a good and natural appearance.
  • Presentation effect in one example, the foreground template image may be an image of a "bicycle", the template image is an image containing a road, and the target area may be a road in the template image your region.
  • The image processing apparatus may perform the following processing on all template images in advance, or perform it on an acquired template image in response to a template image selection instruction from a user.
  • For example, the template image shown in FIG. 2A can be subjected to foreground segmentation processing to obtain the first foreground target shown in FIG. 2B (for ease of distinction, the foreground target obtained from the template image is called the first foreground target, and the foreground target obtained from the material image is called the second foreground target). The target area shown in FIG. 2C is then determined in the template image according to the first foreground target, and finally the gesture template is generated according to the target area and/or the first foreground target. In this embodiment, the target area and the gesture template are both determined based on the first foreground target in the same template image.
  • When it is determined that the foreground target image of the material image is successfully matched with the gesture template, the foreground target image of the material image is filled into the target area of the template image, so that the foreground target image of the material image and the background provided by the template image are well combined and rendered realistically, which helps to improve the quality of the fused image.
  • In one example, when the gesture template is presented in the form of an image, it can be as shown in FIG. 2D; in another example, when the gesture template is presented in the form of text, it can be the text "open hands and jump"; in a further example, when the gesture template is presented in the form of sound, it can be a speech signal of "open hands and jump".
  • FIG. 2A , FIG. 2B , FIG. 2C and FIG. 2D illustrate the template image, the first foreground target, the target area and the gesture template respectively, and do not constitute a limitation on the above processing procedure of the present application.
  • Specifically, the image processing apparatus may perform semantic segmentation processing on the template image to obtain a classification result for each pixel in the template image, and then obtain the first foreground target according to the classification result of each pixel. For example, after semantic segmentation is performed on the template image shown in FIG. 2A, a classification result indicating that each pixel belongs to one of {sky, tree, person} can be obtained, and the first foreground target can then be obtained from the pixels whose classification result is person.
  • each pixel in the template image can be accurately classified, thereby improving the accuracy of the obtained first foreground target.
  • the first foreground target may also be obtained in other manners, for example, a trimap algorithm is used to obtain the first foreground target, which is not limited in this embodiment.
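As an illustration of the semantic-segmentation route described above, the sketch below assumes a per-pixel label map has already been produced by some segmentation model (not shown) and simply extracts the pixels whose classification result is the foreground class; the {sky, tree, person} class list mirrors the example above and is otherwise arbitrary.

```python
import numpy as np

CLASS_NAMES = ["sky", "tree", "person"]        # example classes from the template image above
PERSON = CLASS_NAMES.index("person")

def extract_foreground(image, label_map, foreground_class=PERSON):
    """Keep only the pixels whose classification result is the foreground class.

    image:     H x W x 3 uint8 array (the template image or the material image)
    label_map: H x W integer array with one class index per pixel (segmentation output)
    Returns the boolean foreground mask and the masked foreground image.
    """
    mask = label_map == foreground_class
    foreground = np.zeros_like(image)
    foreground[mask] = image[mask]             # e.g. the first foreground target of FIG. 2B
    return mask, foreground

# Tiny synthetic example: a 4x4 "image" whose centre pixels are classified as person.
img = np.full((4, 4, 3), 200, dtype=np.uint8)
labels = np.zeros((4, 4), dtype=np.int64)
labels[1:3, 1:3] = PERSON
mask, fg = extract_foreground(img, labels)
print(int(mask.sum()))                         # -> 4 foreground pixels
```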
  • In a first example, the image processing apparatus may generate the gesture template according to the contour information of the target area. In a second example, referring to FIG. 2E, the image processing device may perform joint point detection on the first foreground target to obtain a joint point detection result, which may include at least one of the following: an angle between joint points, a joint point type, or a joint point distribution position; the pose template shown in FIG. 2D can then be generated according to the joint point detection result. In a third example, the image processing device can combine the contour information of the target area and the joint point detection result of the first foreground target to obtain the pose template.
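The joint point detection result mentioned above can be held in a small data structure. The sketch below assumes 2-D keypoint coordinates are already available from some pose estimator (not specified here) and derives the angle at each joint from its two neighbouring keypoints, which is one plausible way to encode the angles, types and distribution positions that make up a pose template.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (degrees) at joint b formed by points a-b-c, e.g. shoulder-elbow-wrist."""
    v1, v2 = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def pose_descriptor(keypoints, triplets):
    """Build a joint point detection result: joint types, positions and inter-joint angles.

    keypoints: dict of joint name -> (x, y) pixel position
    triplets:  list of (a, b, c) joint names; the angle is measured at b
    """
    return {
        "positions": keypoints,
        "angles": {b: joint_angle(keypoints[a], keypoints[b], keypoints[c])
                   for a, b, c in triplets},
    }

# Hypothetical keypoints for part of an "open hands and jump" pose.
kp = {"l_wrist": (10, 20), "l_elbow": (30, 40), "l_shoulder": (50, 40)}
desc = pose_descriptor(kp, [("l_wrist", "l_elbow", "l_shoulder")])
print(desc["angles"])   # angle at the left elbow, in degrees
```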
  • the gesture template includes, but is not limited to, presentation in the form of images, text, or sounds.
  • If the target area and the gesture template are both determined based on the first foreground target in the same template image, then a foreground target image of the material image that is successfully matched with the gesture template also has a high degree of fit with the target area.
  • In the case of a successful match, filling the foreground target image into the target area is therefore beneficial to the authenticity of the subsequently obtained fused image.
  • In some embodiments, multiple template images are displayed on the interactive interface, and the user can select a template image according to actual needs on the interactive interface, that is, select the background to be merged. In one example, the selected template image is the image shown in FIG. 2A, that is, an image for which the target area and the posture template have not yet been obtained through the above-mentioned processing; the image processing device can then, in response to the user's template image selection instruction, perform the above processing on the selected template image to obtain its target area and posture template. In another example, the above processing may already have been performed on the template image in advance.
  • Correspondingly, the gesture template obtained in the above manner can be displayed on the interactive interface, so as to provide guidance, through the gesture template, for the user to collect or select material images.
  • the gesture template may be acquired from a gesture database including several gesture templates according to the template image.
  • the gesture template may be obtained by the user from a gesture database including several gesture templates based on the user's own experience or needs according to the selected template image.
  • In other embodiments, the gesture template may be acquired by the image processing apparatus from a gesture database including several gesture templates according to the template image. The image processing apparatus can perform semantic segmentation processing on the template image, for example the template image shown in FIG. 3A, and obtain the classification result of each pixel in the template image.
  • The image processing apparatus may then obtain one or more gesture templates corresponding to the template image from the gesture database according to the classification result.
  • the user may select at least one gesture template to be referenced according to actual needs.
  • In one example, the image processing apparatus may obtain the one or more gesture templates corresponding to the template image from the gesture database according to a pre-stored correspondence between classification results and gesture templates and the classification result of each pixel in the template image.
  • For example, as shown in FIG. 3B and FIG. 3C, there are 3 pose templates corresponding to the classification result "beach" and 4 pose templates corresponding to the classification result "road", where the pose templates are shown as images by way of example.
  • In another example, a classification model may be pre-trained and used to obtain pose templates from the classification result. As an example, the classification model may be a deep learning model trained with a supervised learning method on several classification-result samples carrying pose template labels. The image processing apparatus may then input the classification result into the classification model and obtain, through the classification model, one or more gesture templates corresponding to the template image.
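The pre-stored correspondence between classification results and gesture templates (as in FIG. 3B and FIG. 3C) can be as simple as a lookup table. In the sketch below the template identifiers are placeholders standing in for actual gesture template images, and the counts simply mirror the beach/road example above.

```python
# Assumed correspondence table: scene class found in the template image -> gesture templates.
POSE_DB = {
    "beach": ["beach_pose_1", "beach_pose_2", "beach_pose_3"],                  # 3 templates
    "road":  ["road_pose_1", "road_pose_2", "road_pose_3", "road_pose_4"],      # 4 templates
}

def gesture_templates_for(pixel_classes, pose_db=POSE_DB):
    """Return every gesture template whose classification result appears in the template image.

    pixel_classes: iterable of per-pixel class names obtained from semantic segmentation.
    """
    present = set(pixel_classes)
    templates = []
    for scene_class, poses in pose_db.items():
        if scene_class in present:
            templates.extend(poses)
    return templates

print(gesture_templates_for(["sky", "road", "tree"]))   # -> the 4 road pose templates
```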
  • the target area may be generated in the template image according to the selected gesture template.
  • contour information of the gesture template may be acquired, and then the target area may be generated in the template image according to the contour information.
  • the outline information of the gesture template may be acquired, and then the target area is generated according to the outline information in the area specified by the user in the template image.
  • In the case that the gesture template is presented in the form of an image, edge detection may be performed on the image to obtain the outline information of the gesture template; in the case that the gesture template is presented in the form of text or sound, the outline information of the gesture template may be determined according to the semantic information obtained from the text or sound.
  • Considering that the classification result can well reflect the position in the template image that suits a foreground target image successfully matched with the gesture template (for example, foreground target images successfully matched with the 4 posture templates corresponding to the classification result "road" are suitable to be presented at the position of the road), the image processing device can obtain the outline information of the posture template, determine in the template image the region whose pixels belong to the classification result corresponding to the posture template, and then generate the target area, according to the outline information, in that region and/or an adjacent region.
  • For example, when the classification result corresponding to the selected gesture template is "road", the target area is generated in or adjacent to the road region of the template image.
  • In this way, a place in the template image suitable for generating the target area is determined, so that the final fusion image is more natural and less rigid, which helps to improve the realism of the fusion image and enhance the user's visual experience.
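One possible reading of "generate the target area in the region whose pixels carry the corresponding classification result" is sketched below: take the bounding box of those pixels and scale the gesture-template outline into it. The placement rule here is an assumption made for illustration, not the rule mandated by this document.

```python
import numpy as np
import cv2

def generate_target_area(label_map, scene_class_id, outline_mask):
    """Place the gesture-template outline inside the region of a given scene class.

    label_map:      H x W int array from semantic segmentation of the template image
    scene_class_id: class index (e.g. the id of "road") the target area should sit on
    outline_mask:   h x w uint8 mask (0/255) describing the gesture template outline
    Returns an H x W uint8 mask marking the target area in the template image.
    """
    ys, xs = np.where(label_map == scene_class_id)
    if len(xs) == 0:
        raise ValueError("scene class not present in template image")
    x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
    h, w = y1 - y0 + 1, x1 - x0 + 1
    resized = cv2.resize(outline_mask, (w, h), interpolation=cv2.INTER_NEAREST)
    target = np.zeros(label_map.shape, dtype=np.uint8)
    target[y0:y1 + 1, x0:x1 + 1] = resized
    return target
```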
  • the user may select a material image with reference to the gesture template or use an imaging device to collect the material image in real time.
  • After the material image is obtained, in step S102 the image processing device may perform foreground segmentation processing on the material image to obtain the foreground target image.
  • For example, the image processing apparatus may perform semantic segmentation processing on the material image to obtain the classification result of each pixel in the material image, and then obtain the foreground target image according to the classification result of each pixel.
  • For example, after semantic segmentation is performed on the material image shown in FIG. 4A, a classification result indicating that each pixel belongs to one of {buildings, plants, people} can be obtained, and the foreground target image shown in FIG. 4B can then be obtained from the pixels whose classification result is people.
  • each pixel in the material image can be accurately classified, thereby improving the accuracy of the obtained foreground target image.
  • FIG. 4A and FIG. 4B are examples of the material image and the foreground target image respectively, and are not construed as a limitation of the above processing manner.
  • In step S103, the image processing apparatus may match the foreground target image with the preset gesture template; in this embodiment, the gesture template is thus used to evaluate how well the foreground target image fits the template image.
  • The matching process determines whether the difference between the foreground target image and the gesture template satisfies a preset condition; if so, it is determined that the foreground target image and the gesture template are successfully matched, indicating that the foreground target image obtained from the material image is adapted to the template image.
  • In one example, the contour structure of the second foreground target indicated by the foreground target image and the contour structure of the gesture template may be acquired.
  • For example, edge detection can be performed on the foreground target image and the posture template respectively to obtain the two contour structures, and whether the matching is successful is then determined according to the similarity between the two contour structures; for instance, when the similarity between the two is greater than a preset threshold, it is determined that the matching is successful, indicating that the foreground target image obtained from the material image is suitable for filling into the target area of the template image.
  • the preset threshold may be specifically set according to the actual application scenario.
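A sketch of the contour-structure comparison, assuming both the second foreground target and the image-form gesture template are available as binary masks. Note that cv2.matchShapes returns a dissimilarity score, so "similarity greater than a preset threshold" is expressed here as "dissimilarity below a preset threshold"; the threshold value is arbitrary.

```python
import cv2
import numpy as np

def largest_contour(mask):
    """Largest external contour of a 0/255 binary mask (OpenCV 4 return convention)."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)

def contours_match(fg_mask, pose_mask, max_dissimilarity=0.2):
    """Compare the second foreground target's contour with the gesture template's contour."""
    d = cv2.matchShapes(largest_contour(fg_mask), largest_contour(pose_mask),
                        cv2.CONTOURS_MATCH_I1, 0.0)
    return d <= max_dissimilarity            # lower score = more similar contour structures

# Toy example: two filled rectangles with similar shapes.
a = np.zeros((100, 100), np.uint8); cv2.rectangle(a, (20, 30), (60, 90), 255, -1)
b = np.zeros((100, 100), np.uint8); cv2.rectangle(b, (10, 10), (50, 80), 255, -1)
print(contours_match(a, b))
```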
  • In another example, joint point detection may be performed on the second foreground target indicated by the foreground target image to obtain the joint point detection result of the second foreground target; and, when the posture template is presented in the form of an image, joint point detection is performed on the posture template to obtain the joint point detection result of the posture template. The joint point detection result includes at least one of the following: the angle between joint points, the type of the joint points, or the distribution position of the joint points. Whether the matching is successful is then determined according to the difference between the joint point detection result of the second foreground target and that of the pose template; for example, if the difference between the two is within a preset range, it is determined that the matching is successful, indicating that the foreground target image obtained from the material image is suitable for filling into the target area of the template image.
  • the preset range can be specifically set according to the actual application scenario.
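Correspondingly, when joint point detection results are available for both the second foreground target and the gesture template, the match can be decided by checking that the per-joint differences stay within the preset range. The sketch below compares only the joint angles produced by the earlier descriptor and is one possible distance measure among several.

```python
def joints_match(fg_angles, template_angles, max_angle_diff_deg=20.0):
    """True when every joint angle of the foreground target is within the preset
    range of the corresponding angle in the gesture template."""
    common = set(fg_angles) & set(template_angles)
    if not common:
        return False                        # nothing comparable, treat as a failed match
    return all(abs(fg_angles[j] - template_angles[j]) <= max_angle_diff_deg
               for j in common)

print(joints_match({"l_elbow": 130.0}, {"l_elbow": 140.0}))   # True, within 20 degrees
```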
  • If the matching is successful, the image processing apparatus may fill the foreground target image into the target area to fuse the template image and the foreground target image and generate a fusion image; for example, the foreground target image shown in FIG. 4B is filled into the target area shown in FIG. 2C, and the template image and the foreground target image are fused to generate the fusion image shown in FIG. 4C.
  • In this embodiment, the degree of adaptation between the foreground target image and the template image is determined by the gesture template, and the foreground target image is filled into the target area of the template image only when the matching is successful, so that the resulting fusion image is more natural and not rigid, which is conducive to improving the realism of the fusion image and the user's visual experience.
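A minimal compositing sketch for "fill the foreground target image into the target area": the foreground and its mask are resized to the bounding box of the target area and copied over the template pixels. Blending refinements such as deformation of the target area or colour mapping, discussed below, are deliberately omitted.

```python
import cv2
import numpy as np

def fuse(template_image, target_area_mask, material_image, fg_mask):
    """Paste the masked foreground target into the target area of the template image.

    template_image:   H x W x 3 uint8, provides the background
    target_area_mask: H x W uint8 (0/255), marks the target area in the template image
    material_image:   h x w x 3 uint8, provides the foreground target
    fg_mask:          h x w uint8 (0/255), second foreground target mask in the material image
    """
    fused = template_image.copy()
    ys, xs = np.nonzero(target_area_mask)           # bounding box of the target area
    x, y = xs.min(), ys.min()
    w, h = xs.max() - x + 1, ys.max() - y + 1
    fg = cv2.resize(material_image, (w, h))
    m = cv2.resize(fg_mask, (w, h), interpolation=cv2.INTER_NEAREST) > 0
    region = fused[y:y + h, x:x + w]                # a view into the fused image
    region[m] = fg[m]                               # copy only the foreground pixels
    return fused
```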
  • In some cases, the following deformation processing can be considered based on the actual scene, so that the foreground target image fully corresponds to the target area.
  • For example, according to the contour structure of the second foreground target indicated by the foreground target image, deformation processing may be performed on the target area so that the contour structure of the target area can be completely adapted to the second foreground target; the foreground target image is then filled into the deformed target area, so that a better fusion effect is obtained and the resulting fusion image is more natural.
  • In some embodiments, the target color gamut range of the fusion image to be generated can be determined according to the color gamut range of the posture image and the color gamut range of the template image, and color mapping is then performed, according to the target color gamut range, on the image preliminarily synthesized from the template image and the foreground target image to generate the fusion image, so that the fused result is less obtrusive, further increasing the realism of the fused image.
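The colour-mapping step is not specified in detail above; the sketch below shows one plausible reading in which the "colour gamut range" is approximated by per-channel minimum and maximum values and the foreground is linearly remapped toward the range observed in the template image before compositing.

```python
import numpy as np

def channel_range(img):
    """Per-channel (min, max) as a rough stand-in for a colour gamut range."""
    flat = img.reshape(-1, img.shape[-1]).astype(np.float32)
    return flat.min(axis=0), flat.max(axis=0)

def map_to_target_range(fg, fg_range, target_range):
    """Linearly remap foreground colours toward the target range so the
    preliminary composite looks less obtrusive (one possible interpretation)."""
    fg = fg.astype(np.float32)
    (f_lo, f_hi), (t_lo, t_hi) = fg_range, target_range
    scale = (t_hi - t_lo) / np.maximum(f_hi - f_lo, 1e-6)
    out = (fg - f_lo) * scale + t_lo
    return np.clip(out, 0, 255).astype(np.uint8)

# Usage: remap the foreground pixels before pasting them into the template image.
fg = np.random.randint(0, 120, (8, 8, 3), dtype=np.uint8)
tmpl = np.random.randint(60, 255, (8, 8, 3), dtype=np.uint8)
out = map_to_target_range(fg, channel_range(fg), channel_range(tmpl))
```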
  • FIG. 5 is a schematic flowchart of another image processing method provided by this embodiment of the present application.
  • In this embodiment, a fusion video is generated based on a single template image, where the material image may be one of a material image set and the gesture template may be one of a gesture template set.
  • The method may be performed by an image processing apparatus and includes the following steps:
  • In step S201, a material image set and a template image are acquired.
  • In step S202, foreground segmentation processing is performed on each of a plurality of material images included in the material image set, and a foreground target image corresponding to each material image is obtained.
  • In step S203, for the foreground target image corresponding to each material image, the foreground target image is matched with the gesture templates in the gesture template set, and if the matching is successful, the foreground target image is filled into the template image to fuse the template image and the foreground target image and generate a fusion image.
  • In step S204, a fused video is generated according to the fused images corresponding to the material image set.
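Steps S201 to S204 amount to a loop over the material image set followed by video assembly. The sketch below assumes per-image helper functions of the kind sketched earlier and uses OpenCV's VideoWriter, which is one convenient but not mandated way to produce the fused video.

```python
import cv2

def build_fused_video(material_images, template_image, pose_templates,
                      segment_fn, match_fn, fuse_fn, out_path="fused.mp4", fps=25):
    """material_images: list of H x W x 3 frames providing foreground targets.
    segment_fn/match_fn/fuse_fn are assumed callables implementing steps S202 and S203."""
    fused_frames = []
    for material in material_images:                      # step S202
        fg_img, fg_mask = segment_fn(material)
        for pose in pose_templates:                       # step S203: try the template set
            if match_fn(fg_img, pose):
                fused_frames.append(fuse_fn(template_image, pose, material, fg_mask))
                break                                     # unmatched material images are skipped
    if not fused_frames:
        return None
    h, w = fused_frames[0].shape[:2]                      # step S204: assemble the fused video
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in fused_frames:
        writer.write(frame)
    writer.release()
    return out_path
```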
  • the multiple material images included in the material image set may be multiple material images selected by the user according to actual needs, or may be multiple frames in a piece of material video.
  • the gesture template set may be obtained from a database including a plurality of gesture template sets according to the template image.
  • For example, the image processing apparatus may perform semantic segmentation processing on the template image to obtain the classification result of each pixel in the template image, and then obtain the pose template set corresponding to the template image from the pose database according to the classification result; for instance, the gesture template set corresponding to the template image may be obtained from the gesture database according to a pre-stored correspondence between classification results and pose template sets and the classification result of each pixel in the template image.
  • the target area may be generated in the template image according to the gesture template in the gesture template set, so that the foreground target image corresponding to the material image and the template image have good rendering results.
  • the target area may be generated in the template image according to the outline information of the gesture template (refer to the above description about generating the target area in the template image, which will not be repeated here).
  • That is, the target area in the template image is adaptively generated based on the different gesture templates in the gesture template set, so as to meet the fusion requirements of the different foreground target images matched by different gesture templates, and so that the foreground target image corresponding to each material image and the template image have a good presentation result after fusion.
  • each template image that only includes a target area corresponding to the gesture template may be acquired.
  • the image processing apparatus may perform semantic segmentation processing on a plurality of material images included in the material image set, respectively, to obtain a classification result of each pixel in each material image, Then, for each material image, a foreground target image corresponding to the material image is obtained according to the classification result of each pixel in the material image. Based on the semantic segmentation method in this embodiment, each pixel in the material image can be accurately classified, thereby improving the accuracy of the obtained foreground target image.
  • the image processing apparatus matches the foreground target image with the gesture templates in the gesture template set; for example, the foreground target image may be sequentially matched with the gesture templates in the gesture template set. Matching is performed until the matching is successful; or, in order to save computing resources, the foreground target image may be matched with one or more gesture templates specified in the gesture template set.
  • a target gesture template that is successfully matched with the foreground target image is determined, and then a target area can be generated in the template image according to the target gesture template, Then, the foreground target image is filled into the target area to fuse the template image and the foreground target image to generate a fusion image; finally, a fusion video is generated according to the fusion image corresponding to each material image.
  • the foreground target image obtained based on the material image set can be fused with the same template image, and the target area in the same template image is adaptively generated based on different posture templates, so that the The fusion video of different actions in the same scene further improves the user experience.
  • the target area may be generated in advance based on each gesture template in the gesture template set; or if the matching is successful, the target area may be generated in the template image in real time according to the target gesture template matched with the foreground target image, This embodiment does not impose any restrictions on the generation timing of the target area.
  • In a specific implementation, the user can select a template image and a material image set according to actual needs, and after acquiring the template image, the image processing apparatus can acquire the corresponding gesture template set from the gesture database according to the template image.
  • As an example, as shown in FIG. 6A, the template image may be an image with a road background; as shown in FIG. 6B (three material images are used as an example in FIG. 6B), the plurality of material images in the material image set can be obtained by photographing the user while running; and as shown in FIG. 6C (FIG. 6C shows three examples of posture templates in the form of images), the plurality of posture templates in the posture template set represent running postures.
  • The image processing apparatus may perform foreground segmentation processing on each of the material images included in the material image set to obtain the foreground target image corresponding to each material image. Then, for the foreground target image corresponding to each material image, the foreground target image is matched with the posture templates in the posture template set; if the matching is successful, the target posture template matching the foreground target image is obtained, a target area is generated in the template image according to the target posture template (as shown in FIG. 6C), and the foreground target image is filled into the target area to fuse the template image and the foreground target image and generate the fusion image shown in FIG. 6D. Finally, a fusion video is generated according to the fusion images corresponding to the material images.
  • the foreground target image obtained based on the material image set can be fused with the same template image, and the target area in the same template image is adaptively generated based on different posture templates, so that the The fusion video of different actions in the same scene further improves the user experience.
  • A fused video can also be generated in other ways.
  • An embodiment of the present application therefore provides another image processing method.
  • This embodiment generates a fusion video based on a template image set, where the material image is one of a material image set, the gesture template is one of a gesture template set, and the template image is one of the template image set. The method can be performed by an image processing apparatus and includes the following steps:
  • In step S301, a material image set and a template image set are acquired.
  • In step S302, foreground segmentation processing is performed on each of a plurality of material images included in the material image set, and a foreground target image corresponding to each material image is obtained.
  • In step S303, for the foreground target image corresponding to each material image, the foreground target image is matched with the gesture templates in the gesture template set; if the matching is successful, a target template image is determined in the template image set according to the target gesture template matched with the foreground target image, and the foreground target image and the target template image are fused to generate a fusion image.
  • In step S304, a fusion video is generated according to the fusion images corresponding to the material image set.
  • The gesture template set may be determined according to the template image set: each template image in the template image set corresponds to one or more gesture templates, and the gesture template set may be generated based on the one or more gesture templates corresponding to each template image in the template image set. That is, there is a correspondence between the gesture templates in the gesture template set and the template images in the template image set.
  • one or more gesture templates corresponding to the template image may be generated according to the first foreground target in the template image; in another example, one or more gesture templates corresponding to the template image , which may be obtained from a gesture database based on the template image.
  • the template images in the template image set include a preset target area
  • In this case, filling the foreground target image into the target template image includes: filling the foreground target image into the target area of the target template image.
  • foreground segmentation processing may be performed on the template images in the template image set to obtain a first foreground target, and the target area in the template image may be determined according to the first foreground target; and then according to the target region and/or the first foreground target, the gesture template of the template image is generated; finally, the gesture template set is obtained based on all gesture templates corresponding to all template images in the template image set.
  • both the target area and the gesture template are determined based on the first foreground target in the same template image, and when it is determined that the foreground target image of the material image is successfully matched with the gesture template, the The foreground target image of the material image is filled in the target area of the template image, so as to achieve a good combination and real presentation of the foreground target image of the material image and the background provided by the template image.
  • In other embodiments, semantic segmentation processing may be performed on each template image in the template image set to obtain the classification result of each pixel in the template image, and the posture template corresponding to each template image may then be obtained from the gesture database according to the classification result; in one example, the posture template corresponding to each template image can be obtained from the posture database according to a pre-stored correspondence between classification results and posture templates and the classification result of each pixel in the template image.
  • The posture template set can then be obtained based on all posture templates corresponding to all the template images in the template image set, and the target area can be generated in the template image according to the gesture template.
  • the image processing apparatus may perform semantic segmentation processing on a plurality of material images included in the material image set, respectively, to obtain a classification result of each pixel in each material image, Then, for each material image, a foreground target image corresponding to the material image is obtained according to the classification result of each pixel in the material image. Based on the semantic segmentation method in this embodiment, each pixel in the material image can be accurately classified, thereby improving the accuracy of the obtained foreground target image.
  • the image processing apparatus matches the foreground target image with the gesture templates in the gesture template set; for example, the foreground target image may be sequentially matched with the gesture templates in the gesture template set. Matching is performed until the matching is successful; or, in order to save computing resources, the foreground target image may be matched with one or more gesture templates specified in the gesture template set.
  • If the matching is successful, a target template image corresponding to the target gesture template may be determined in the template image set according to the target gesture template successfully matched with the foreground target image; the foreground target image is then filled into the target area of the target template image to fuse the foreground target image and the target template image and generate a fusion image. Finally, a fusion video is generated according to the fusion images corresponding to the material images.
  • the foreground target image obtained based on the material image set can be fused with the template image in the template image set, thereby generating a fusion video of performing the same or different actions in different scenes, further improving the user experience. user experience.
  • In some embodiments, template images adjacent to the target template image may also be determined in the template image set.
  • The template images adjacent to the target template image include template images adjacent in acquisition time; for example, when the template image set is a set of frames of a template video, the template images adjacent to the target template image may be the frames before and after the target template image. The foreground target image, the target template image, and the template images adjacent to the target template image are then fused.
  • Here, the template images adjacent to the target template image are used to fill the positions left vacant after the foreground target image and the target template image are preliminarily synthesized, so as to achieve a better fusion effect.
  • In one example, a partial image may be acquired at the same positions in a template image adjacent to the target template image, and the foreground target image, the target template image, and the partial image are then fused.
  • In this example, the partial image can be used to fill the positions left vacant after the foreground target image and the target template image are preliminarily synthesized, so as to achieve a better fusion effect.
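One hedged reading of the adjacent-frame trick described above: wherever the target area of the target template image is left uncovered by the pasted foreground, copy pixels from the same positions of a neighbouring template frame. The sketch assumes all images are aligned and equally sized.

```python
import numpy as np

def fill_vacancies(composite, covered_mask, target_area_mask, adjacent_frame):
    """Fill positions of the target area that the foreground did not cover, using the
    partial image taken at the same positions of an adjacent template frame.

    composite:        H x W x 3, target template image with the foreground already pasted
    covered_mask:     H x W bool, True where foreground pixels were pasted
    target_area_mask: H x W bool, True inside the target area
    adjacent_frame:   H x W x 3, template image adjacent in acquisition time
    """
    vacant = target_area_mask & ~covered_mask     # uncovered part of the target area
    out = composite.copy()
    out[vacant] = adjacent_frame[vacant]          # partial image from the adjacent frame
    return out
```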
  • an embodiment of the present application further provides an image processing apparatus, including a processor 401 and a memory 402 storing a computer program;
  • the processor 401 implements the following steps when executing the computer program:
  • Acquiring a material image and a template image; performing foreground segmentation processing on the material image to obtain a foreground target image; and matching the foreground target image with a preset posture template, and if the matching is successful, filling the foreground target image into the template image to fuse the template image and the foreground target image and generate a fusion image.
  • the template image includes a preset target area.
  • the processor 401 is further configured to: fill the foreground target image into the target area.
  • The processor 401 is further configured to: perform foreground segmentation processing on the template image to obtain a first foreground target; determine the target area in the template image according to the first foreground target; and generate the gesture template according to the target area and/or the first foreground target.
  • the processor 401 is further configured to: perform joint point detection on the first foreground target to obtain a joint point detection result; and generate the posture template according to the joint point detection result.
  • the gesture template is selected from a gesture database according to the template image; the gesture database includes several gesture templates.
  • the processor 401 is further configured to:
  • Semantic segmentation is performed on the template image to obtain a classification result of each pixel in the template image
  • the gesture template corresponding to the template image is obtained from the gesture database according to the pre-stored correspondence between the classification result and the gesture template, and the classification result of each pixel in the template image.
  • the processor 401 is further configured to: generate the target area in the template image according to the gesture template.
  • the processor 401 is further configured to: determine that the matching is successful if the difference between the foreground target image and the preset gesture template satisfies a preset condition.
  • the preset condition includes at least one of the following:
  • the similarity between the contour structure of the second foreground target indicated by the foreground target image and the contour structure of the posture template is greater than a preset threshold; or
  • the difference between the joint point detection result of the second foreground target indicated by the foreground target image and the joint point detection result of the gesture template is within a preset range.
  • the joint point detection result includes at least one of the following: an angle between joint points, a joint point type, or a distribution position of the joint points.
  • The processor 401 is further configured to: perform deformation processing on the target area according to the contour structure of the second foreground target indicated by the foreground target image; and fill the foreground target image into the deformed target area.
  • the processor 401 is further configured to: perform deformation processing on the foreground target image according to the size of the target area; and fill the deformed foreground target image into the target area.
  • The processor 401 is further configured to: perform semantic segmentation processing on the material image to obtain a classification result of each pixel in the material image; and obtain the foreground target image according to the classification result of each pixel in the material image.
  • the material image is one of a set of material images
  • the gesture template is one of a set of gesture templates
  • The processor 401 is further configured to: after acquiring the foreground target image of each material image in the material image set, if the foreground target image is successfully matched with a gesture template in the gesture template set, fuse the foreground target image with the template image to generate a fused image; and generate a fused video according to the fused images corresponding to the material image set.
  • the target area in the template image is generated according to a target pose template matching the foreground target image.
  • the processor is further configured to: fill the foreground target image into the target area of the template image.
  • the template image is one of a set of template images.
  • the processor 401 is also used for:
  • a target template image is determined in the template image set according to a target pose template matched with the foreground target image; the foreground target image and the target template image are fused.
  • The processor 401 is further configured to: determine, in the template image set, template images adjacent to the target template image, where the adjacent template images include template images adjacent in acquisition time; and fuse the foreground target image, the target template image, and the template images adjacent to the target template image.
  • the processor 401 is further configured to: perform color fusion processing on the template image and the foreground target image according to the color gamut range of the gesture image and the color gamut range of the template image.
  • the various embodiments described herein can be implemented using computer readable media such as computer software, hardware, or any combination thereof.
  • For hardware implementation, the embodiments described herein can be implemented using application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, or electronic units designed to perform the functions described herein.
  • embodiments such as procedures or functions may be implemented with separate software modules that allow the performance of at least one function or operation.
  • The software codes may be implemented by a software application (or program) written in any suitable programming language, which may be stored in a non-transitory computer-readable storage medium, such as a memory including instructions executable by a processor of an apparatus to perform the above-described method.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.
  • a non-transitory computer-readable storage medium when the instructions in the storage medium are executed by the processor of the terminal, enable the terminal to execute the above method.

Abstract

A method and device for image processing, and a storage medium. The method comprises: acquiring a stock image and a template image; performing foreground segmentation processing with respect to the stock image to acquire a foreground target image; matching the foreground target image with a preset posture template; and if successfully matched, filling the foreground target image into the template image, thus fusing the template image and the foreground target image to produce a fused image. The present embodiment increases the quality of the fused image.

Description

Image processing method, device and storage medium

Technical Field

The present application relates to the technical field of image processing, and in particular, to an image processing method, device, and storage medium.

Background
With the development of intelligent devices in today's society, image processing has become an indispensable part of people's lives, whether it is professional image processing at work or recreational image processing in daily life. One of the more popular processing methods is that the user manually extracts the foreground area in an image and then fills it into another template image to generate a composited ("PS") image. In this way, the pose of the person represented by the foreground area may well be unsightly and poorly suited to the template image, thereby reducing the quality of the composite image.
Summary of the Invention

In view of this, one of the objectives of the present application is to provide an image processing method, device and storage medium.
In a first aspect, an embodiment of the present application provides an image processing method, including:

acquiring a material image and a template image;

performing foreground segmentation processing on the material image to obtain a foreground target image;

matching the foreground target image with a preset posture template, and if the matching is successful, filling the foreground target image into the template image to fuse the template image and the foreground target image and generate a fusion image.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including a processor and a memory storing a computer program;

the processor implements the following steps when executing the computer program:

acquiring a material image and a template image;

performing foreground segmentation processing on the material image to obtain a foreground target image;

matching the foreground target image with a preset posture template, and if the matching is successful, filling the foreground target image into the template image to fuse the template image and the foreground target image and generate a fusion image.

In a third aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the method according to the first aspect is implemented.
The image processing method, device, and storage medium provided by the embodiments of the present application acquire a material image and a template image, where the material image is an image that provides a foreground target and the template image is an image that provides a background. After the foreground target image is obtained through foreground segmentation processing of the material image, the foreground target image is matched with a preset posture template, and if the matching is successful, the template image and the foreground target image are fused to generate a fusion image. In this embodiment, the posture template is used to determine whether the posture of the foreground target in the foreground target image meets preset requirements (such as the requirements of an attractive presentation or of certain other scenes). A successful match indicates that the posture of the foreground target in the foreground target image meets the preset requirements and can be adapted to the template image; only in this case are the template image and the foreground target image fused, which helps to improve the quality of the fused image.
DESCRIPTION OF THE DRAWINGS
In order to describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings used in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art may obtain other drawings from these drawings without creative effort.
FIG. 1, FIG. 5 and FIG. 7 are different schematic flowcharts of image processing methods provided by an embodiment of the present application;
FIG. 2A, FIG. 3A and FIG. 6A are different schematic diagrams of template images provided by an embodiment of the present application;
FIG. 2B is a schematic diagram of an image including a first foreground target provided by an embodiment of the present application;
FIG. 2C is a schematic diagram of a target area of a template image provided by an embodiment of the present application;
FIG. 2D is a schematic diagram of a posture template provided by an embodiment of the present application;
FIG. 2E is a schematic diagram of joint point detection provided by an embodiment of the present application;
FIG. 3B and FIG. 3C are different schematic diagrams of correspondences between classification results and posture templates provided by an embodiment of the present application;
FIG. 4A is a schematic diagram of a material image provided by an embodiment of the present application;
FIG. 4B is a schematic diagram of a foreground target image provided by an embodiment of the present application;
FIG. 4C is a schematic diagram of a fused image provided by an embodiment of the present application;
FIG. 6B is a schematic diagram of a material image set provided by an embodiment of the present application;
FIG. 6C is a schematic diagram of a posture template set and the target areas adapted to it provided by an embodiment of the present application;
FIG. 6D is a schematic diagram of all fused images of a material image set provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
DETAILED DESCRIPTION OF EMBODIMENTS
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In view of the problems in the related art, an embodiment of the present application provides an image processing method. A material image and a template image are acquired, where the material image provides a foreground target and the template image provides a background. After the foreground target image is obtained by performing foreground segmentation processing on the material image, the foreground target image is matched with a preset posture template; if the matching is successful, the template image and the foreground target image are fused to generate a fused image. In this embodiment, the posture template is used to determine whether the posture of the foreground target in the foreground target image meets preset requirements (for example, requirements for an attractive presentation). A successful match indicates that the posture of the foreground target meets the preset requirements and can be adapted to the template image, and only in this case are the template image and the foreground target image fused, which helps to improve the quality of the fused image.
The image processing method provided by the embodiments of the present application may be executed by an image processing apparatus.
In a possible implementation, the image processing apparatus may be a computer chip or an integrated circuit with data processing capability, such as a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA). The image processing apparatus may be installed in a device with image processing functionality, such as a movable platform, a terminal device, or a server.
In another possible implementation, the image processing apparatus may refer to a physical device with image processing functionality, such as a movable platform, a terminal device, or a server. Examples of the movable platform include, but are not limited to, an unmanned aerial vehicle, an unmanned vehicle, a gimbal, an unmanned vessel, or a mobile robot. Examples of the terminal device include, but are not limited to: a smartphone/mobile phone, a tablet computer, a personal digital assistant (PDA), a laptop computer, a desktop computer, a media content player, a video game station/system, a virtual reality system, an augmented reality system, a wearable device (for example, a watch, glasses, gloves, headwear (for example, a hat, a helmet, a virtual reality headset, an augmented reality headset, a head-mounted device (HMD), a headband), a pendant, an armband, a leg ring, shoes, or a vest), a remote controller, or any other type of device.
In an exemplary application scenario, the user may select a template image based on actual needs. For example, multiple template images may be displayed on an interactive interface; after the user selects a template image on the interactive interface, one or more posture templates corresponding to the template image may be displayed for the user to choose from. Alternatively, the user may select, from a posture database and according to his or her own experience or needs, a posture template that fits the template image. The posture template may be used to guide the user in capturing or selecting material images. It can be understood that the embodiments of the present application do not impose any restriction on the specific form of the posture template, which may be set according to the actual application scenario; for example, the posture template may be presented as an image, text, or sound. That is, material images may be captured, or already captured material images may be selected, according to the posture template; for example, material images of the user may be captured by a handheld gimbal, a mobile phone, or an unmanned aerial vehicle equipped with an imaging device, or by a standalone imaging device.
After acquiring the material image and the template image, the image processing apparatus may perform foreground segmentation processing on the material image to acquire a foreground target image, and then match the foreground target image with the preset posture template, so that the posture template is used to evaluate whether the posture of the foreground target in the foreground target image meets a preset standard. If the matching is successful, it indicates that the posture of the foreground target in the foreground target image meets the preset requirements and that the foreground target image is suitable to be presented in combination with the template image. The foreground target image can then be filled into the template image, so as to fuse the template image and the foreground target image to generate a fused image, which helps to improve the quality of the fused image and also improves the user's visual experience.
The image processing method provided by the embodiments of the present application is described next. Referring to FIG. 1, FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application. The method may be executed by an image processing apparatus and includes:
In step S101, a material image and a template image are acquired.
In step S102, foreground segmentation processing is performed on the material image to acquire a foreground target image.
In step S103, the foreground target image is matched with a preset posture template; if the matching is successful, the foreground target image is filled into the template image, so as to fuse the template image and the foreground target image to generate a fused image.
The material image is used to provide a foreground target, and the template image is used to provide a background. In this embodiment, the fused image is generated based on the foreground target provided by the material image and the background provided by the template image.
Exemplarily, the material image and the template image may be selected by the user according to actual needs. The material image may be obtained by the user with an imaging device, and may be an image captured in advance or an image captured by the imaging device in real time.
In some embodiments, the posture template may be determined according to the template image; the posture template may be used to guide the user in capturing or selecting material images, and may be presented as an image, text, or sound. Exemplarily, the template image corresponds to one or more posture templates, and the preset posture template may be at least one of all the posture templates corresponding to the template image; for example, the preset posture template may be selected by the user from all the posture templates corresponding to the template image. Exemplarily, the preset posture template may also be at least one posture template that fits the template image and is selected by the user from a posture database according to the chosen template image, combined with the user's own experience or needs; the posture database includes a number of posture templates.
The one or more posture templates corresponding to the template image may be determined at least in the following ways:
As an example, foreground segmentation processing may be performed on the template image to obtain a first foreground target, and then one or more posture templates corresponding to the template image may be generated according to the first foreground target.
As another example, semantic segmentation processing may be performed on the template image in advance to obtain a classification result for each pixel in the template image, and one or more posture templates corresponding to the template image may be determined from the posture database according to the classification results.
In this embodiment, after the foreground target image is obtained by performing foreground segmentation processing on the material image, the foreground target image is not directly fused with the template image. Instead, it is first evaluated whether the posture of the foreground target in the foreground target image meets preset requirements (for example, requirements for an attractive presentation or for certain scenes) by matching the foreground target image with the preset posture template. If the foreground target image is successfully matched with the preset posture template, it is determined that the posture of the foreground target in the foreground target image can be adapted to the template image and that the combination of the two will present well. Only then is the foreground target image filled into the template image, so as to fuse the template image and the foreground target image to generate a fused image, which helps to improve the quality of the fused image.
In some embodiments, the template image includes a preset target area, and filling the foreground target image into the template image includes: filling the foreground target image into the target area. The target area may represent an area of the template image that is highly compatible with the foreground target image; when the foreground target image is filled into the target area, the finally generated fused image has a good and natural appearance. In one example, the foreground target image may be an image of a "person riding a bicycle", the template image is an image containing a road, and the target area may be the area of the template image where the road is located.
The processes of determining the posture template and the target area are described in detail below:
In a first possible implementation, the image processing apparatus may perform the following processing on all template images in advance, or perform the following processing on an acquired template image in response to a template image selection instruction from the user: the image processing apparatus may perform foreground segmentation processing on the template image shown in FIG. 2A to obtain the first foreground target shown in FIG. 2B (for ease of distinction, the foreground target obtained from the template image is referred to as the first foreground target, and the foreground target obtained from the material image is referred to as the second foreground target), determine the target area of the template image shown in FIG. 2C according to the first foreground target, and finally generate the posture template according to the target area and/or the first foreground target. In this embodiment, both the target area and the posture template are determined based on the first foreground target in the same template image. When it is determined that the foreground target image of the material image is successfully matched with the posture template, the foreground target image of the material image is filled into the target area of the template image, so that the foreground target image combines well with the background provided by the template image and is presented realistically, which helps to improve the quality of the fused image.
Regarding the presentation form of the posture template: in one example, when presented as an image, the posture template may be as shown in FIG. 2D; in another example, when presented as text, the posture template may be the text "jump with both arms spread"; in yet another example, when presented as sound, the posture template may be a voice signal saying "jump with both arms spread". It can be understood that FIG. 2A, FIG. 2B, FIG. 2C and FIG. 2D are merely examples of a template image, a first foreground target, a target area, and a posture template respectively, and do not limit the above processing of the present application.
When acquiring the first foreground target, in one example, the image processing apparatus may perform semantic segmentation processing on the template image to obtain a classification result for each pixel in the template image, and then obtain the first foreground target according to the classification results; for example, after foreground segmentation is performed on the template image shown in FIG. 2A, a classification result indicating that each pixel belongs to one of {sky, tree, person} can be obtained, and the first foreground target can then be obtained from the pixels classified as person. In this embodiment, the semantic segmentation method allows each pixel in the template image to be classified accurately, thereby improving the accuracy of the obtained first foreground target.
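As an illustrative sketch only, the following Python code shows how a foreground target could be extracted once a per-pixel classification result (a label map from any semantic segmentation model) is available; the label scheme, function name, and use of an alpha channel are assumptions for illustration and are not fixed by the present application. The same extraction applies equally to the material image in step S102.
```python
import numpy as np

# Hypothetical label scheme, e.g. {0: sky, 1: tree, 2: person}; not fixed by this application.
PERSON_LABEL = 2

def extract_foreground(image: np.ndarray, label_map: np.ndarray, target_label: int = PERSON_LABEL):
    """Keep only the pixels whose classification result equals the target label.

    image:     H x W x 3 uint8 image (template image or material image)
    label_map: H x W integer array produced by any semantic segmentation model
    Returns (foreground_rgba, mask), where the alpha channel marks the foreground target.
    """
    mask = label_map == target_label
    foreground = np.zeros((*image.shape[:2], 4), dtype=np.uint8)
    foreground[..., :3] = image
    foreground[..., 3] = mask.astype(np.uint8) * 255
    return foreground, mask
```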
Of course, the first foreground target may also be obtained in other ways, for example by using a trimap algorithm, and this embodiment does not impose any restriction on this.
When acquiring the posture template: in a first example, the image processing apparatus may generate the posture template according to the contour information of the target area; in a second example, referring to FIG. 2E, the image processing apparatus may perform joint point detection on the first foreground target to obtain a joint point detection result, which may include at least one of the following: angles between joint points, joint point types, or the distribution positions of the joint points, and the posture template shown in FIG. 2D can then be generated according to the joint point detection result; in a third example, the image processing apparatus may combine the contour information of the target area and the joint point detection result of the first foreground target to obtain the posture template. The posture template may be presented as, but is not limited to, an image, text, or sound. In this embodiment, both the target area and the posture template are determined based on the first foreground target in the same template image, so a foreground target image of a material image that is successfully matched with the posture template also fits the target area well; filling the foreground target image into the target area when the matching is successful helps to improve the realism of the subsequently obtained fused image.
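A minimal sketch of how a joint point detection result could be summarized into angle features for a posture template, assuming joint positions have already been obtained from some pose estimator; the triplet convention and function name are illustrative assumptions only.
```python
import numpy as np

def joint_angles(keypoints: np.ndarray, triplets):
    """Compute the angle at the middle joint of each (a, b, c) keypoint triplet.

    keypoints: N x 2 array of (x, y) joint positions from any joint point detector
    triplets:  index triples such as (shoulder, elbow, wrist)
    Returns a mapping from triplet to angle in degrees, usable as part of a posture template.
    """
    angles = {}
    for a, b, c in triplets:
        v1 = keypoints[a] - keypoints[b]
        v2 = keypoints[c] - keypoints[b]
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
        angles[(a, b, c)] = float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return angles
```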
In an exemplary application scenario, multiple template images are displayed on an interactive interface, and the user may select a template image, i.e. the background to be fused, on the interactive interface according to actual needs. In one example, the template image may be the image shown in FIG. 2A, that is, an image for which the target area and the posture template have not yet been obtained through the above processing; after the user selects the template image, the image processing apparatus may, in response to the user's template image selection instruction, perform the above processing on the selected template image to obtain its target area and posture template. In another example, the template image may be the image shown in FIG. 2C, that is, the above processing has already been performed on the template image in advance to obtain the target area and the posture template; after the user selects the template image, the posture template obtained in the above manner may be displayed accordingly on the interactive interface, so that the posture template provides guidance for the user in capturing or selecting material images.
In a second possible implementation, the posture template may be acquired, according to the template image, from a posture database including a number of posture templates. Exemplarily, the posture template may be acquired by the user from the posture database according to the selected template image, combined with the user's own experience or needs. Exemplarily, the posture template may also be acquired by the image processing apparatus from the posture database according to the template image; specifically, after obtaining the template image selected by the user, the image processing apparatus may perform semantic segmentation processing on the template image to obtain a classification result for each pixel in the template image. For example, after semantic segmentation is performed on the template image shown in FIG. 3A, a classification result indicating that each pixel belongs to one of {beach, sea, mountain} can be obtained. The image processing apparatus may then obtain, from the posture database and according to the classification results, one or more posture templates corresponding to the template image. When multiple posture templates are acquired, the user may select at least one posture template to refer to according to actual needs.
In one example, the image processing apparatus may obtain the one or more posture templates corresponding to the template image from the posture database according to a pre-stored correspondence between classification results and posture templates as well as the classification result of each pixel in the template image; for example, as shown in FIG. 3B and FIG. 3C, the classification result "beach" corresponds to 3 posture templates and the classification result "road" corresponds to 4 posture templates, the posture templates here being image examples.
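The pre-stored correspondence could be as simple as a lookup table keyed by scene class; the class names and template identifiers below are illustrative assumptions, not data from the present application.
```python
# Illustrative pre-stored correspondence between classification results and posture templates.
POSE_TEMPLATE_DB = {
    "beach": ["beach_pose_1.png", "beach_pose_2.png", "beach_pose_3.png"],
    "road":  ["road_pose_1.png", "road_pose_2.png", "road_pose_3.png", "road_pose_4.png"],
}

def templates_for_classes(pixel_classes):
    """Collect every posture template whose scene class appears among the pixel classification results."""
    matches = []
    for cls in pixel_classes:
        matches.extend(POSE_TEMPLATE_DB.get(cls, []))
    return matches
```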
In another example, a classification model may be trained in advance and used to obtain posture templates from the classification results. As an example, the classification model may be a deep learning model trained in a supervised manner with a number of classification result samples carrying posture template labels. The image processing apparatus may then input the classification results into the classification model and obtain, through the classification model, the one or more posture templates corresponding to the template image.
Further, the target area may be generated in the template image according to the selected posture template. In one example, the contour information of the posture template may be acquired, and the target area may then be generated in the template image according to the contour information. In another example, the contour information of the posture template may be acquired, and the target area may then be generated according to the contour information in a region of the template image specified by the user. Exemplarily, when the posture template is presented as an image, edge detection may be performed on the image to acquire the contour information of the posture template; when the posture template is presented as text or sound, the contour information of the posture template may be determined according to the semantic information obtained from the text or sound.
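When the posture template is an image, its contour information can be turned into a target-area mask; the following sketch uses OpenCV thresholding as one possible way to obtain the silhouette, with the placement offset supplied externally (for example, a user-specified region). The thresholding choice and function name are assumptions, not the method fixed by the application.
```python
import cv2
import numpy as np

def target_area_from_template(pose_template_gray: np.ndarray, template_shape, offset=(0, 0)):
    """Derive a target-area mask for the template image from an image-form posture template.

    pose_template_gray: single-channel posture template (figure drawn dark on a light background)
    template_shape:     (H, W) of the template image the area is placed into
    offset:             (y, x) position where the area should appear, e.g. on the road region
    """
    _, silhouette = cv2.threshold(pose_template_gray, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    area = np.zeros(template_shape[:2], dtype=np.uint8)
    h, w = silhouette.shape
    y, x = offset
    area[y:y + h, x:x + w] = silhouette  # paste the silhouette as the target area
    return area
```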
In yet another example, considering that the correspondence between classification results and posture templates reflects well where, in the template image, a foreground target image successfully matched with a posture template is suitable to appear (for example, the classification result "road" corresponds to 4 posture templates, and the foreground target images that successfully match these 4 posture templates are all suitable to appear where the road is located), in order to further improve the realism of the finally generated fused image, the image processing apparatus may acquire the contour information of the posture template, determine the region of the template image whose pixels belong to the classification result corresponding to the posture template, and then generate the target area in that region and/or a region adjacent to it according to the contour information. For example, if the classification result corresponding to the selected posture template is "road", the target area may be generated, according to the contour information of the posture template, in the region of the template image where the pixels belonging to "road" are located and/or in a region adjacent to it. In this embodiment, the place in the template image suitable for generating the target area is determined according to the correspondence between classification results and posture templates, so that the finally generated fused image is natural rather than stiff, which helps to improve the realism of the fused image and the user's visual experience.
In some embodiments, the user may select a material image with reference to the posture template, or capture a material image in real time with an imaging device. After obtaining the material image, the image processing apparatus may, in step S102, perform foreground segmentation processing on the material image to acquire a foreground target image. In a possible implementation, the image processing apparatus may perform semantic segmentation processing on the material image to obtain a classification result for each pixel in the material image, and then acquire the foreground target image according to the classification results; for example, after foreground segmentation is performed on the material image shown in FIG. 4A, a classification result indicating that each pixel belongs to one of {building, plant, person} can be obtained, and the foreground target image shown in FIG. 4B can then be obtained from the pixels classified as person. In this embodiment, the semantic segmentation method allows each pixel in the material image to be classified accurately, thereby improving the accuracy of the obtained foreground target image.
It can be understood that FIG. 4A and FIG. 4B are merely examples of a material image and a foreground target image respectively, and do not limit the above processing.
After acquiring the foreground target image, in step S103, the image processing apparatus may match the foreground target image with the preset posture template; in this embodiment, the posture template is used to evaluate how well the foreground target image fits the template image. In some embodiments, the matching process is to determine whether the difference between the foreground target image and the posture template satisfies a preset condition. If it does, it is determined that the foreground target image and the posture template are successfully matched, indicating that the foreground target image obtained from the material image fits the template image. If it does not, it is determined that the matching is unsuccessful, indicating that the foreground target image obtained from the material image and the template image are not suitable to be presented together and the combination may look unnatural; optionally, a reminder indicating that fusion is not possible may be output, so that the user can select another material image or end the image fusion process.
In one example, in the process of determining whether the difference between the foreground target image and the posture template satisfies the preset condition, the contour structure of the second foreground target indicated by the foreground target image and the contour structure of the posture template may be acquired, for example by performing edge detection on the foreground target image and the posture template respectively; whether the matching is successful is then determined according to the similarity between the two contour structures. For example, the matching is determined to be successful when the similarity between the two is greater than a preset threshold, indicating that the foreground target image obtained from the material image is suitable to be filled into the target area of the template image. It can be understood that the preset threshold may be set specifically according to the actual application scenario.
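One concrete way to compare the two contour structures, offered only as an illustrative stand-in for the unspecified similarity measure, is a Hu-moment shape distance on the largest outline of each binary mask (OpenCV 4.x API assumed); because matchShapes returns a distance, a lower value means higher similarity, and the threshold value is an assumption.
```python
import cv2

def contours_match(foreground_mask, pose_template_mask, max_distance=0.2):
    """Compare the outline of the second foreground target with the posture template outline.

    Both inputs are binary uint8 masks. cv2.matchShapes returns a Hu-moment based
    distance (smaller means more similar); max_distance plays the role of the preset threshold.
    """
    fg_contours, _ = cv2.findContours(foreground_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    tpl_contours, _ = cv2.findContours(pose_template_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not fg_contours or not tpl_contours:
        return False
    fg = max(fg_contours, key=cv2.contourArea)
    tpl = max(tpl_contours, key=cv2.contourArea)
    distance = cv2.matchShapes(fg, tpl, cv2.CONTOURS_MATCH_I1, 0.0)
    return distance <= max_distance
```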
In another example, in the process of determining whether the difference between the foreground target image and the posture template satisfies the preset condition, joint point detection may be performed on the second foreground target indicated by the foreground target image to obtain a joint point detection result of the second foreground target; and, when the posture template is presented as an image, joint point detection may be performed on the posture template to obtain a joint point detection result of the posture template. The joint point detection result includes at least one of the following: angles between joint points, joint point types, or the distribution positions of the joint points. Whether the matching is successful is then determined according to the difference between the joint point detection result of the second foreground target and that of the posture template; for example, the matching is determined to be successful when the difference between the two is within a preset range, indicating that the foreground target image obtained from the material image is suitable to be filled into the target area of the template image. It can be understood that the preset range may be set specifically according to the actual application scenario.
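A minimal sketch of the joint-angle comparison, assuming both joint point detection results have been reduced to angle dictionaries (for example with the joint_angles sketch above); the tolerance is an illustrative stand-in for the preset range.
```python
def posture_match(fg_angles, template_angles, tolerance_deg=15.0):
    """Match succeeds when every shared joint angle differs by less than the tolerance.

    fg_angles / template_angles: mappings from joint triplets to angles in degrees
    for the second foreground target and the posture template respectively.
    """
    shared = set(fg_angles) & set(template_angles)
    if not shared:
        return False
    return all(abs(fg_angles[k] - template_angles[k]) < tolerance_deg for k in shared)
```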
When the foreground target image is successfully matched with the posture template, the image processing apparatus may fill the foreground target image into the target area, so as to fuse the template image and the foreground target image to generate a fused image; for example, the foreground image shown in FIG. 4B is filled into the target area shown in FIG. 2C, and the template image and the foreground target image are fused to generate the fused image shown in FIG. 4C. In this embodiment, the posture template is used to determine how well the foreground target image fits the template image, and the foreground target image is filled into the target area of the template image only when the matching is successful, so that the generated fused image looks natural rather than stiff, which helps to improve the realism of the fused image and the user's visual experience.
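A minimal compositing sketch for filling the foreground target image into the target area, assuming the foreground carries an alpha mask (as in the extraction sketch above) and the target area is given by its top-left corner; variable names are illustrative.
```python
import numpy as np

def fill_into_target_area(template: np.ndarray, foreground_rgba: np.ndarray, top_left=(0, 0)):
    """Paste the foreground target image (RGBA) into the template image at the target area.

    template:        H x W x 3 uint8 background image
    foreground_rgba: h x w x 4 uint8 foreground target image with an alpha mask
    top_left:        (y, x) corner of the target area inside the template image
    """
    fused = template.copy()
    h, w = foreground_rgba.shape[:2]
    y, x = top_left
    alpha = foreground_rgba[..., 3:4].astype(np.float32) / 255.0
    roi = fused[y:y + h, x:x + w].astype(np.float32)
    blended = alpha * foreground_rgba[..., :3].astype(np.float32) + (1.0 - alpha) * roi
    fused[y:y + h, x:x + w] = blended.astype(np.uint8)
    return fused
```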
During fusion, in order to achieve a better fusion effect, the following deformation processing may be considered based on the actual scene, so that the foreground target image corresponds exactly to the target area:
In a first possible implementation, considering that the second foreground target and the target area may not be exactly identical, in order to achieve a better fusion effect, the target area may first be deformed according to the contour structure of the second foreground target indicated by the foreground target image, so that the contour structure of the target area fully fits the second foreground target, and the foreground target image is then filled into the deformed target area, so that a better fusion effect is obtained and the generated fused image looks more natural.
In a second possible implementation, considering that the size of the second foreground target may not fit the target area, in order to achieve a better fusion effect, the foreground target image may first be deformed according to the size of the target area, so that the size of the second foreground target indicated by the foreground target image fits the target area, and the deformed foreground target image is then filled into the target area, so that a better fusion effect is obtained and the generated fused image looks more natural.
It can be understood that, in practice, before the foreground target image is filled into the target area, either of the above deformation approaches may be used, or both may be used together, and this embodiment does not impose any restriction on this.
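As a sketch of the second deformation approach only, the foreground target image can be rescaled to the bounding box of the target area; using cv2.resize and a bounding-box fit is an illustrative assumption rather than the only form of deformation covered above.
```python
import cv2

def fit_foreground_to_area(foreground_rgba, area_mask):
    """Scale the foreground target image so its size fits the target area.

    foreground_rgba: h x w x 4 foreground target image
    area_mask:       binary mask of the target area inside the template image
    Returns the resized foreground and the (y, x) corner where it should be pasted.
    """
    ys, xs = area_mask.nonzero()
    top, left = int(ys.min()), int(xs.min())
    height, width = int(ys.max()) - top + 1, int(xs.max()) - left + 1
    resized = cv2.resize(foreground_rgba, (width, height), interpolation=cv2.INTER_LINEAR)
    return resized, (top, left)
```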
In some embodiments, in order to further improve the fusion effect, when the template image and the foreground target image are fused, color fusion processing may be performed on them according to the color gamut range of the foreground target image and the color gamut range of the template image. For example, the target color gamut range of the fused image to be generated may be determined according to the color gamut range of the foreground target image and the color gamut range of the template image, and color mapping may then be performed, according to the target color gamut range, on the image preliminarily synthesized from the template image and the foreground target image to generate the fused image. In this embodiment, color fusion makes the colors of the generated fused image coherent overall rather than jarring, further increasing the realism of the fused image.
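The application does not fix a particular color mapping; as one illustrative stand-in, the sketch below transfers the per-channel mean and standard deviation of the template image's LAB representation onto the preliminarily synthesized image (a Reinhard-style statistics transfer), which keeps the fused image's colors consistent with the template's color range.
```python
import cv2
import numpy as np

def harmonize_colors(preliminary_bgr: np.ndarray, template_bgr: np.ndarray) -> np.ndarray:
    """Map the preliminarily synthesized image toward the template image's color statistics."""
    src = cv2.cvtColor(preliminary_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref = cv2.cvtColor(template_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    for c in range(3):
        s_mean, s_std = src[..., c].mean(), src[..., c].std() + 1e-6
        r_mean, r_std = ref[..., c].mean(), ref[..., c].std() + 1e-6
        src[..., c] = (src[..., c] - s_mean) / s_std * r_std + r_mean
    src = np.clip(src, 0, 255).astype(np.uint8)
    return cv2.cvtColor(src, cv2.COLOR_LAB2BGR)
```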
In addition to generating a single fused image, in the embodiment shown in FIG. 5 a fused video may also be generated. FIG. 5 is a schematic flowchart of another image processing method provided by an embodiment of the present application. In this embodiment, a fused video may be generated based on a single template image; the material image may be one of a material image set, and the posture template may be one of a posture template set. The method may be executed by an image processing apparatus and includes:
In step S201, a material image set and a template image are acquired.
In step S202, foreground segmentation processing is performed on each of the material images included in the material image set, and the foreground target image corresponding to each material image is acquired.
In step S203, for the foreground target image corresponding to each material image, the foreground target image is matched with the posture templates in the posture template set; if the matching is successful, the foreground target image is filled into the template image, so as to fuse the template image and the foreground target image to generate a fused image.
In step S204, a fused video is generated according to the fused images corresponding to the material image set.
In some embodiments, the material images included in the material image set may be multiple material images selected by the user according to actual needs, or multiple frames of a material video.
In some embodiments, the posture template set may be acquired, according to the template image, from a database including multiple posture template sets. Specifically, the image processing apparatus may perform semantic segmentation processing on the template image to obtain a classification result for each pixel in the template image, and then obtain the posture template set corresponding to the template image from the posture database according to the classification results; for example, the posture template set corresponding to the template image may be obtained from the posture database according to a pre-stored correspondence between classification results and posture template sets as well as the classification result of each pixel in the template image.
In some embodiments, target areas may be generated in the template image according to the posture templates in the posture template set, so that the foreground target image corresponding to each material image and the template image present well together. The target area may be generated in the template image according to the contour information of the posture template (see the above description of generating a target area in the template image, which is not repeated here). In this embodiment, considering that the foreground target images fused with the same template image may differ, the target areas in the template image are adaptively generated based on the different posture templates in the posture template set, so as to meet the fusion needs of the different foreground target images matched with different posture templates, and the foreground target image corresponding to each material image presents well with the template image after fusion.
Exemplarily, for each posture template in the posture template set, a template image that only includes the target area corresponding to that posture template may be obtained.
In some embodiments, after acquiring the material image set, the image processing apparatus may perform semantic segmentation processing on each of the material images included in the material image set to obtain a classification result for each pixel of each material image, and then, for each material image, acquire the foreground target image corresponding to that material image according to the classification results of its pixels. In this embodiment, the semantic segmentation method allows each pixel in a material image to be classified accurately, thereby improving the accuracy of the obtained foreground target images.
Then, for the foreground target image corresponding to each material image, the image processing apparatus matches the foreground target image with the posture templates in the posture template set; for example, the foreground target image may be matched with the posture templates in the posture template set one by one until a match succeeds; or, to save computing resources, the foreground target image may be matched with one or more specified posture templates in the posture template set.
If the foreground target image is successfully matched with a posture template in the posture template set, the target posture template that successfully matches the foreground target image is determined, a target area may then be generated in the template image according to the target posture template, and the foreground target image is filled into the target area, so as to fuse the template image and the foreground target image to generate a fused image. Finally, a fused video is generated according to the fused images corresponding to the material images. In this embodiment, when the matching is successful, the foreground target images obtained from the material image set can be fused with the same template image, and the target areas in the same template image are adaptively generated based on different posture templates, so that a fused video of different actions performed in the same scene is generated, further improving the user experience.
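A minimal sketch of step S204, assembling the per-material-image fused images into a fused video with OpenCV; the codec, frame rate, and file name are illustrative choices only.
```python
import cv2

def write_fused_video(fused_frames, path="fused.mp4", fps=25):
    """Write the fused images (all the same size, BGR uint8) out as a fused video."""
    h, w = fused_frames[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in fused_frames:
        writer.write(frame)
    writer.release()
```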
The target area may be generated in advance based on each posture template in the posture template set, or, when the matching is successful, generated in the template image in real time according to the target posture template matched with the foreground target image; this embodiment does not impose any restriction on when the target area is generated.
In an exemplary application scenario, the user may select a template image and a material image set according to actual needs, and after acquiring the template image, the image processing apparatus may acquire the corresponding posture template set from the posture database according to the template image. As an example, as shown in FIG. 6A, the template image may be an image with a road background; as shown in FIG. 6B (which uses 3 material images as an example), the material images in the material image set may be obtained by photographing the user's running process; as shown in FIG. 6C (which uses 3 posture templates presented as images as an example), the posture templates in the posture template set represent running posture templates of the running process.
The image processing apparatus may perform foreground segmentation processing on each of the material images included in the material image set to acquire the foreground target image corresponding to each material image. Then, for the foreground target image corresponding to each material image, the foreground target image is matched with the posture templates in the posture template set; if the matching is successful, the target posture template matching the foreground target image is obtained, a target area is then generated in the template image according to the target posture template (as shown in FIG. 6C), and the foreground target image is filled into the target area, so as to fuse the template image and the foreground target image to generate the fused image shown in FIG. 6D. Finally, a fused video is generated according to the fused images corresponding to the material images. In this embodiment, when the matching is successful, the foreground target images obtained from the material image set can be fused with the same template image, and the target areas in the same template image are adaptively generated based on different posture templates, so that a fused video of different actions performed in the same scene is generated, further improving the user experience.
Correspondingly, in addition to generating a single fused image, in the embodiment shown in FIG. 7 a fused video may also be generated. Referring to FIG. 7, an embodiment of the present application further provides another image processing method, in which a fused video may be generated based on a template image set; the material image is one of a material image set, the posture template is one of a posture template set, and the template image is one of a template image set. The method may be executed by an image processing apparatus and includes:
In step S301, a material image set and a template image set are acquired.
In step S302, foreground segmentation processing is performed on each of the material images included in the material image set, and the foreground target image corresponding to each material image is acquired.
In step S303, for the foreground target image corresponding to each material image, the foreground target image is matched with the posture templates in the posture template set; if the matching is successful, a target template image is determined in the template image set according to the target posture template matched with the foreground target image, and the foreground target image and the target template image are fused to generate a fused image.
In step S304, a fused video is generated according to the fused images corresponding to the material image set.
在一些实施例中,所述姿势模板集可以根据所述模板图像集确定;所述模板图像集中的每张模板图像对应有一个或者多个姿势模板,可以基于所述模板图像集中的每张模板图像对应的一个或者多个姿势模板来生成所述姿势模板集,即所述姿势模板集中的姿势模板与所述模板图像集中的模板图像存在对应关系。在一个例子中,所述模板图像对应的一个或者多个姿势模板,可以根据所述模板图像中的第一前景目标生成;在另一个例子中,所述模板图像对应的一个或者多个姿势模板,可以基于所述模板图像从姿势数据库中获取。In some embodiments, the gesture template set may be determined according to the template image set; each template image in the template image set corresponds to one or more gesture templates, which may be determined based on each template in the template image set One or more gesture templates corresponding to the images are used to generate the gesture template set, that is, there is a corresponding relationship between the gesture templates in the gesture template set and the template images in the template image set. In one example, one or more gesture templates corresponding to the template image may be generated according to the first foreground target in the template image; in another example, one or more gesture templates corresponding to the template image , which may be obtained from a gesture database based on the template image.
在一些实施例中,所述模板图像集中的模板图像包括有预设的目标区域,所述将所述前景目标图像填入所述目标模板图像,包括:将所述前景目标图像填入所述目标模板图像的目标区域中。In some embodiments, the template images in the template image set include a preset target area, and filling the foreground target image into the target template image includes: filling the foreground target image into the target template image. in the target area of the target template image.
示例性地,可以对所述模板图像集中的模板图像进行前景分割处理,获得第一前景目标,并根据所述第一前景目标确定所述模板图像中的所述目标区域;然后根据所述目标区域和/或所述第一前景目标,生成所述模板图像的所述姿势模板;最后基于所述模板图像集中的所有模板图像对应的所有姿势模板得到所述姿势模板集。本实施例中,所述目标区域以及所述姿势模板均基于同一张模板图像中的第一前景目标来确定,在确定所述素材图像的前景目标图像与姿势模板匹配成功的情况下,将所述素材图像的前景目标图像填入所述模板图像的所述目标区域中,实现所述素材图像的前景目标图像与所述模板图像所提供的背景的良好结合以及真实呈现。Exemplarily, foreground segmentation processing may be performed on the template images in the template image set to obtain a first foreground target, and the target area in the template image may be determined according to the first foreground target; and then according to the target region and/or the first foreground target, the gesture template of the template image is generated; finally, the gesture template set is obtained based on all gesture templates corresponding to all template images in the template image set. In this embodiment, both the target area and the gesture template are determined based on the first foreground target in the same template image, and when it is determined that the foreground target image of the material image is successfully matched with the gesture template, the The foreground target image of the material image is filled in the target area of the template image, so as to achieve a good combination and real presentation of the foreground target image of the material image and the background provided by the template image.
在另一些实施例中,可以对所述模板图像中的模板图像进行语义分割处理,获得所述模板图像中各个像素的分类结果;然后根据所述模板图像中各个像素的分类结果从所述姿势数据库中获得各个所述模板图像对应的姿势模板;在一个例子中,可以根据预存的分类结果与姿势模板的对应关系、以及所述模板图像中各个像素的分类结果,从所述姿势数据库中获得各个所述模板图像对应的姿势模板;接着可以基于所述模板图像集中的所有模板图像对应的所有姿势模板得到所述姿势模板集;进一步地,可以根据各个所述模板图像对应的姿势模板在所述模板图像中生成所述目标区域。In other embodiments, semantic segmentation processing may be performed on the template image in the template image to obtain a classification result of each pixel in the template image; Obtain the posture template corresponding to each of the template images in the database; in one example, it can be obtained from the posture database according to the correspondence between the pre-stored classification results and the posture templates, and the classification results of each pixel in the template image. The posture template corresponding to each of the template images; then the posture template set can be obtained based on all posture templates corresponding to all the template images in the template image set; The target area is generated in the template image.
在一些实施例中,所述图像处理装置在获取所述素材图像集之后,可以对所述素材图像集包括的多张素材图像分别进行语义分割处理,获得各个素材图像中各个像素的分类结果,然后对于各个素材图像,根据该素材图像中各个像素的分类结果获取该素材图像对应的前景目标图像。本实施例中基于语义分割方法可以实现对素材图像中的每个像素进行准确分类,进而提高获得的前景目标图像的准确性。In some embodiments, after acquiring the material image set, the image processing apparatus may perform semantic segmentation processing on a plurality of material images included in the material image set, respectively, to obtain a classification result of each pixel in each material image, Then, for each material image, a foreground target image corresponding to the material image is obtained according to the classification result of each pixel in the material image. Based on the semantic segmentation method in this embodiment, each pixel in the material image can be accurately classified, thereby improving the accuracy of the obtained foreground target image.
Then, for the foreground target image corresponding to each material image, the image processing apparatus matches the foreground target image against the posture templates in the posture template set. For example, the foreground target image may be matched against the posture templates in the set one by one until a match succeeds; alternatively, to save computing resources, the foreground target image may be matched only against one or more designated posture templates in the posture template set.
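The sequential matching loop could look roughly like the following; the thresholds and the particular contour/joint distance measures are illustrative choices rather than values taken from the application, and the foreground and template joints are assumed to be detected in the same order.

```python
import cv2
import numpy as np

def matches(fg_contour, fg_joints, pose_template, shape_thresh=0.3, joint_thresh=20.0):
    """One plausible match test: contour similarity or joint-point distance.

    The thresholds are illustrative, and fg_joints and pose_template["joints"]
    are assumed to list the same joints in the same order.
    """
    # cv2.matchShapes returns a dissimilarity score (smaller = more similar).
    shape_dist = cv2.matchShapes(fg_contour, pose_template["contour"],
                                 cv2.CONTOURS_MATCH_I1, 0.0)
    joint_dist = float(np.mean(np.linalg.norm(fg_joints - pose_template["joints"], axis=1)))
    return shape_dist < shape_thresh or joint_dist < joint_thresh

def find_matching_template(fg_contour, fg_joints, pose_template_set):
    """Walk the posture template set in order and stop at the first successful match."""
    for idx, pose_template in enumerate(pose_template_set):
        if matches(fg_contour, fg_joints, pose_template):
            return idx
    return None  # no posture template matched
```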
If the foreground target image matches a posture template in the posture template set, the target template image corresponding to the matched target posture template may be determined in the template image set; the foreground target image is then filled into the target area of the target template image so as to fuse the foreground target image with the target template image and generate a fusion image; finally, a fusion video is generated from the fusion images corresponding to the individual material images. In this embodiment, when the matching succeeds, the foreground target images obtained from the material image set can be fused with the template images in the template image set, thereby generating a fusion video in which the same or different actions are performed in different scenes and further improving the user experience.
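One simple way to realise the fill-and-fuse step, assuming a rectangular target area and a one-to-one mapping from matched posture templates to template images, is sketched below; the trailing comment shows how a fusion video could then be assembled with OpenCV.

```python
import cv2
import numpy as np

def fuse_into_target_area(foreground, fg_mask, target_template_img, target_area):
    """Paste the foreground target image into the target area of the target template image.

    target_area is (x, y, w, h); the foreground and its mask are resized to that
    box before compositing, which corresponds to deforming the foreground target
    image according to the size of the target area.
    """
    x, y, w, h = target_area
    fg = cv2.resize(foreground, (w, h))
    m = cv2.resize(fg_mask.astype(np.uint8), (w, h)) > 0

    fused = target_template_img.copy()
    roi = fused[y:y + h, x:x + w]   # view into the fused image
    roi[m] = fg[m]
    return fused

# One fused frame per material image can then be written out in order, e.g.:
#   writer = cv2.VideoWriter("fusion.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (W, H))
#   for frame in fused_frames:
#       writer.write(frame)
#   writer.release()
```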
In some embodiments, to further improve the fusion result, when the foreground target image is fused with the target template image, the template images in the template image set that are adjacent to the target template image may be determined, where the adjacent template images include template images adjacent in acquisition time; for example, when the template image set is a set of frames of a template video, the template images adjacent to the target template image may be the frames immediately before and after it. Finally, the foreground target image, the target template image and the adjacent template images are fused. In this embodiment, the adjacent template images can be used to fill the regions left vacant after the foreground target image and the target template image are preliminarily composited, thereby achieving a better fusion result.
In a possible implementation, according to the position information of the target area of the target template image, a local image may be extracted at the same position in a template image adjacent to the target template image, and the foreground target image, the target template image and the local image are then fused. In this embodiment, the local image can be used to fill the regions left vacant after the foreground target image and the target template image are preliminarily composited, thereby achieving a better fusion result.
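A sketch of the neighbour-frame patching described in the last two paragraphs is given below; vacancy_mask is assumed to mark the pixels inside the target area that remain uncovered after the preliminary composition.

```python
import numpy as np

def fill_vacancy_from_neighbor(fused, vacancy_mask, neighbor_template_img, target_area):
    """Fill pixels left vacant after the preliminary composition.

    The local image is taken at the same position (the target area) from a
    template frame that is adjacent in acquisition time.
    """
    x, y, w, h = target_area
    local = neighbor_template_img[y:y + h, x:x + w]

    patched = fused.copy()
    roi = patched[y:y + h, x:x + w]  # view into the patched image
    roi[vacancy_mask] = local[vacancy_mask]
    return patched
```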
Correspondingly, referring to FIG. 8, an embodiment of the present application further provides an image processing apparatus, including a processor 401 and a memory 402 storing a computer program.
The processor 401 implements the following steps when executing the computer program:
acquiring a material image and a template image;
performing foreground segmentation on the material image to obtain a foreground target image;
matching the foreground target image with a preset posture template, and if the matching succeeds, filling the foreground target image into the template image so as to fuse the template image with the foreground target image and generate a fusion image.
In an embodiment, the template image includes a preset target area.
The processor 401 is further configured to fill the foreground target image into the target area.
In an embodiment, the processor 401 is further configured to: perform foreground segmentation on the template image to obtain a first foreground target, and determine the target area in the template image according to the first foreground target; and generate the posture template according to the target area and/or the first foreground target.
In an embodiment, the processor 401 is further configured to: perform joint point detection on the first foreground target to obtain a joint point detection result; and generate the posture template according to the joint point detection result.
In an embodiment, the posture template is selected from a posture database according to the template image, and the posture database includes a number of posture templates.
In an embodiment, the processor 401 is further configured to:
perform semantic segmentation on the template image to obtain a classification result for each pixel in the template image; and
obtain the posture template corresponding to the template image from the posture database according to a pre-stored correspondence between classification results and posture templates and the classification results of the pixels in the template image.
In an embodiment, the processor 401 is further configured to generate the target area in the template image according to the posture template.
In an embodiment, the processor 401 is further configured to determine that the matching succeeds if the difference between the foreground target image and the preset posture template satisfies a preset condition.
In an embodiment, the preset condition includes at least one of the following:
the similarity between the contour structure of the second foreground target indicated by the foreground target image and the contour structure of the posture template is greater than a preset threshold; or, the difference between the joint point detection result of the second foreground target indicated by the foreground target image and the joint point detection result of the posture template is within a preset range.
In an embodiment, the joint point detection result includes at least one of the following: angles between joint points, joint point types, or the distribution positions of the joint points.
In an embodiment, the processor 401 is further configured to: deform the target area according to the contour structure of the second foreground target indicated by the foreground target image; and fill the foreground target image into the deformed target area.
In an embodiment, the processor 401 is further configured to: deform the foreground target image according to the size of the target area; and fill the deformed foreground target image into the target area.
In an embodiment, the processor 401 is further configured to: perform semantic segmentation on the material image to obtain a classification result for each pixel in the material image; and obtain the foreground target image according to the classification results of the pixels in the material image.
In an embodiment, the material image is one of a material image set, and the posture template is one of a posture template set.
The processor 401 is further configured to: after acquiring the foreground target image of a material image in the material image set, if the foreground target image matches a posture template in the posture template set, fuse the foreground target image with the template image to generate a fusion image; and generate a fusion video from the fusion images corresponding to the material image set.
In an embodiment, the target area in the template image is generated according to the target posture template matched with the foreground target image.
The processor is further configured to fill the foreground target image into the target area of the template image.
In an embodiment, the template image is one of a template image set.
The processor 401 is further configured to:
determine a target template image in the template image set according to the target posture template matched with the foreground target image; and fuse the foreground target image with the target template image.
In an embodiment, the processor 401 is further configured to:
determine the template images in the template image set that are adjacent to the target template image, wherein the template images adjacent to the target template image include template images adjacent in acquisition time; and
fuse the foreground target image, the target template image and the template images adjacent to the target template image.
In an embodiment, the processor 401 is further configured to perform color fusion processing on the template image and the foreground target image according to the color gamut range of the posture image and the color gamut range of the template image.
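Purely as an illustration of such color fusion, a Reinhard-style mean/standard-deviation transfer in Lab space could be used to bring the two color ranges together; the application does not prescribe this particular formula.

```python
import cv2
import numpy as np

def color_fuse(foreground, template_img):
    """Nudge the foreground's colours toward the template image's colour statistics.

    A Reinhard-style mean/standard-deviation transfer in Lab space is used here
    as an illustrative stand-in for matching the two colour ranges.
    """
    fg = cv2.cvtColor(foreground, cv2.COLOR_BGR2LAB).astype(np.float32)
    bg = cv2.cvtColor(template_img, cv2.COLOR_BGR2LAB).astype(np.float32)

    fg_mean, fg_std = fg.reshape(-1, 3).mean(0), fg.reshape(-1, 3).std(0) + 1e-6
    bg_mean, bg_std = bg.reshape(-1, 3).mean(0), bg.reshape(-1, 3).std(0) + 1e-6

    out = (fg - fg_mean) / fg_std * bg_std + bg_mean
    out = np.clip(out, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_LAB2BGR)
```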
As for the apparatus embodiments, since they substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant parts. The various embodiments described herein may be implemented using a computer-readable medium such as computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described herein may be implemented using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein. For a software implementation, embodiments such as procedures or functions may be implemented with separate software modules that perform at least one function or operation. The software code may be implemented by a software application (or program) written in any suitable programming language, and may be stored in a memory and executed by a controller.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory including instructions, where the instructions are executable by a processor of an apparatus to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium is provided, where the instructions in the storage medium, when executed by a processor of a terminal, enable the terminal to perform the above method.
It should be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. The terms "comprising", "including" or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The method and apparatus provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope based on the ideas of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (37)

  1. An image processing method, comprising:
    acquiring a material image and a template image;
    performing foreground segmentation on the material image to obtain a foreground target image; and
    matching the foreground target image with a preset posture template, and if the matching succeeds, filling the foreground target image into the template image, so as to fuse the template image with the foreground target image and generate a fusion image.
  2. The method according to claim 1, wherein the template image comprises a preset target area; and
    filling the foreground target image into the template image comprises:
    filling the foreground target image into the target area.
  3. The method according to claim 2, wherein the target area and the posture template are obtained by:
    performing foreground segmentation on the template image to obtain a first foreground target, and determining the target area in the template image according to the first foreground target; and
    generating the posture template according to the target area and/or the first foreground target.
  4. The method according to claim 3, wherein generating the posture template according to the first foreground target comprises:
    performing joint point detection on the first foreground target to obtain a joint point detection result; and
    generating the posture template according to the joint point detection result.
  5. The method according to claim 1, wherein the posture template is selected from a posture database according to the template image, and the posture database comprises a number of posture templates.
  6. The method according to claim 5, wherein the posture template is obtained by:
    performing semantic segmentation on the template image to obtain a classification result for each pixel in the template image; and
    obtaining the posture template corresponding to the template image from the posture database according to a pre-stored correspondence between classification results and posture templates and the classification results of the pixels in the template image.
  7. The method according to claim 6, further comprising:
    generating the target area in the template image according to the posture template.
  8. The method according to claim 1, wherein the matching is determined to be successful by:
    determining that the matching succeeds if the difference between the foreground target image and the preset posture template satisfies a preset condition.
  9. The method according to claim 8, wherein the preset condition comprises at least one of the following:
    the similarity between the contour structure of the second foreground target indicated by the foreground target image and the contour structure of the posture template is greater than a preset threshold;
    or, the difference between the joint point detection result of the second foreground target indicated by the foreground target image and the joint point detection result of the posture template is within a preset range.
  10. The method according to claim 4 or 9, wherein the joint point detection result comprises at least one of the following: angles between joint points, joint point types, or distribution positions of the joint points.
  11. The method according to claim 2, wherein filling the foreground target image into the target area comprises:
    deforming the target area according to the contour structure of the second foreground target indicated by the foreground target image; and
    filling the foreground target image into the deformed target area.
  12. The method according to claim 2, wherein filling the foreground target image into the target area comprises:
    deforming the foreground target image according to the size of the target area; and
    filling the deformed foreground target image into the target area.
  13. The method according to claim 1, wherein performing foreground segmentation on the material image comprises:
    performing semantic segmentation on the material image to obtain a classification result for each pixel in the material image; and
    obtaining the foreground target image according to the classification results of the pixels in the material image.
  14. The method according to claim 1, wherein the material image is one of a material image set, and the posture template is one of a posture template set;
    the method further comprising:
    after acquiring the foreground target image of a material image in the material image set, if the foreground target image is successfully matched with a posture template in the posture template set, fusing the foreground target image with the template image to generate a fusion image; and
    generating a fusion video according to the fusion images corresponding to the material image set.
  15. The method according to claim 14, wherein the target area in the template image is generated according to the target posture template matched with the foreground target image; and
    fusing the foreground target image with the template image further comprises: filling the foreground target image into the target area of the template image.
  16. The method according to claim 14, wherein the template image is one of a template image set; and
    fusing the foreground target image with the template image comprises:
    determining a target template image in the template image set according to the target posture template matched with the foreground target image; and
    fusing the foreground target image with the target template image.
  17. The method according to claim 16, wherein fusing the foreground target image with the target template image comprises:
    determining the template images in the template image set that are adjacent to the target template image, wherein the template images adjacent to the target template image include template images adjacent in acquisition time; and
    fusing the foreground target image, the target template image and the template images adjacent to the target template image.
  18. The method according to claim 1, wherein fusing the template image with the foreground target image further comprises:
    performing color fusion processing on the template image and the foreground target image according to the color gamut range of the posture image and the color gamut range of the template image.
  19. An image processing apparatus, comprising a processor and a memory storing a computer program, wherein
    the processor implements the following steps when executing the computer program:
    acquiring a material image and a template image;
    performing foreground segmentation on the material image to obtain a foreground target image; and
    matching the foreground target image with a preset posture template, and if the matching succeeds, filling the foreground target image into the template image, so as to fuse the template image with the foreground target image and generate a fusion image.
  20. The apparatus according to claim 19, wherein the template image comprises a preset target area; and
    the processor is further configured to fill the foreground target image into the target area.
  21. The apparatus according to claim 19, wherein the processor is further configured to: perform foreground segmentation on the template image to obtain a first foreground target, and determine the target area in the template image according to the first foreground target; and generate the posture template according to the target area and/or the first foreground target.
  22. The apparatus according to claim 21, wherein the processor is further configured to: perform joint point detection on the first foreground target to obtain a joint point detection result; and generate the posture template according to the joint point detection result.
  23. The apparatus according to claim 19, wherein the posture template is selected from a posture database according to the template image, and the posture database comprises a number of posture templates.
  24. The apparatus according to claim 23, wherein the processor is further configured to:
    perform semantic segmentation on the template image to obtain a classification result for each pixel in the template image; and
    obtain the posture template corresponding to the template image from the posture database according to a pre-stored correspondence between classification results and posture templates and the classification results of the pixels in the template image.
  25. The apparatus according to claim 24, wherein the processor is further configured to generate the target area in the template image according to the posture template.
  26. The apparatus according to claim 19, wherein the processor is further configured to determine that the matching succeeds if the difference between the foreground target image and the preset posture template satisfies a preset condition.
  27. The apparatus according to claim 26, wherein the preset condition comprises at least one of the following:
    the similarity between the contour structure of the second foreground target indicated by the foreground target image and the contour structure of the posture template is greater than a preset threshold;
    or, the difference between the joint point detection result of the second foreground target indicated by the foreground target image and the joint point detection result of the posture template is within a preset range.
  28. The apparatus according to claim 22 or 26, wherein the joint point detection result comprises at least one of the following: angles between joint points, joint point types, or distribution positions of the joint points.
  29. The apparatus according to claim 19, wherein the processor is further configured to: deform the target area according to the contour structure of the second foreground target indicated by the foreground target image; and fill the foreground target image into the deformed target area.
  30. The apparatus according to claim 19, wherein the processor is further configured to: deform the foreground target image according to the size of the target area; and fill the deformed foreground target image into the target area.
  31. The apparatus according to claim 19, wherein the processor is further configured to: perform semantic segmentation on the material image to obtain a classification result for each pixel in the material image; and obtain the foreground target image according to the classification results of the pixels in the material image.
  32. The apparatus according to claim 19, wherein the material image is one of a material image set, and the posture template is one of a posture template set; and
    the processor is further configured to:
    after acquiring the foreground target image of a material image in the material image set, if the foreground target image is successfully matched with a posture template in the posture template set, fuse the foreground target image with the template image to generate a fusion image; and
    generate a fusion video according to the fusion images corresponding to the material image set.
  33. The apparatus according to claim 32, wherein the target area in the template image is generated according to the target posture template matched with the foreground target image; and
    the processor is further configured to fill the foreground target image into the target area of the template image.
  34. The apparatus according to claim 32, wherein the template image is one of a template image set; and
    the processor is further configured to:
    determine a target template image in the template image set according to the target posture template matched with the foreground target image; and
    fuse the foreground target image with the target template image.
  35. The apparatus according to claim 34, wherein the processor is further configured to:
    determine the template images in the template image set that are adjacent to the target template image, wherein the template images adjacent to the target template image include template images adjacent in acquisition time; and
    fuse the foreground target image, the target template image and the template images adjacent to the target template image.
  36. The apparatus according to claim 19, wherein the processor is further configured to perform color fusion processing on the template image and the foreground target image according to the color gamut range of the posture image and the color gamut range of the template image.
  37. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the method according to any one of claims 1 to 18.
PCT/CN2021/079924 2021-03-10 2021-03-10 Method and device for image processing, and storage medium WO2022188056A1 (en)

Priority Applications / Applications Claiming Priority / Family Applications (1)

Application Number: PCT/CN2021/079924 (publication WO2022188056A1, en)
Priority Date / Filing Date: 2021-03-10
Title: Method and device for image processing, and storage medium

Publications (1)

Publication Number: WO2022188056A1

Family

ID=83226163

Country Status (1)

Country: WO (1)
Link: WO2022188056A1 (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130028517A1 (en) * 2011-07-27 2013-01-31 Samsung Electronics Co., Ltd. Apparatus, method, and medium detecting object pose
CN105120144A (en) * 2015-07-31 2015-12-02 小米科技有限责任公司 Image shooting method and device
CN107230182A (en) * 2017-08-03 2017-10-03 腾讯科技(深圳)有限公司 A kind of processing method of image, device and storage medium
CN107808373A (en) * 2017-11-15 2018-03-16 北京奇虎科技有限公司 Sample image synthetic method, device and computing device based on posture
CN109299659A (en) * 2018-08-21 2019-02-01 中国农业大学 A kind of human posture recognition method and system based on RGB camera and deep learning
CN109743504A (en) * 2019-01-22 2019-05-10 努比亚技术有限公司 A kind of auxiliary photo-taking method, mobile terminal and storage medium
CN110113523A (en) * 2019-03-15 2019-08-09 深圳壹账通智能科技有限公司 Intelligent photographing method, device, computer equipment and storage medium
CN110335277A (en) * 2019-05-07 2019-10-15 腾讯科技(深圳)有限公司 Image processing method, device, computer readable storage medium and computer equipment
CN110473266A (en) * 2019-07-08 2019-11-19 南京邮电大学盐城大数据研究院有限公司 A kind of reservation source scene figure action video generation method based on posture guidance
CN110602396A (en) * 2019-09-11 2019-12-20 腾讯科技(深圳)有限公司 Intelligent group photo method and device, electronic equipment and storage medium
CN111062276A (en) * 2019-12-03 2020-04-24 广州极泽科技有限公司 Human body posture recommendation method and device based on human-computer interaction, machine readable medium and equipment

Legal Events

Code Description
121  Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21929535; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122  Ep: pct application non-entry in european phase (Ref document number: 21929535; Country of ref document: EP; Kind code of ref document: A1)