WO2022188056A1 - Method and device for image processing, and storage medium - Google Patents

Method and device for image processing, and storage medium

Info

Publication number
WO2022188056A1
Authority
WO
WIPO (PCT)
Prior art keywords: image, template, target, foreground target, foreground
Application number: PCT/CN2021/079924
Other languages: French (fr), Chinese (zh)
Inventors: 聂谷洪, 胡晓翔, 施泽浩
Original Assignee: 深圳市大疆创新科技有限公司
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2021/079924
Publication of WO2022188056A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules

Definitions

  • the present application relates to the technical field of image processing, and in particular, to an image processing method, device, and storage medium.
  • With the development of intelligent devices, image processing has become an indispensable part of people's lives, whether it is professional image processing at work or recreational image processing in daily life.
  • One of the more popular processing methods is that the user manually extracts the foreground area in an image and then fills it into another template image to generate a composited ("PS") image. In this way, the pose of the person represented by the foreground area may well be unsightly and poorly suited to the template image, thereby reducing the quality of the composite image.
  • one of the objectives of the present application is to provide an image processing method, device and storage medium.
  • In a first aspect, an embodiment of the present application provides an image processing method, including: acquiring a material image and a template image; performing foreground segmentation processing on the material image to obtain a foreground target image; and matching the foreground target image with a preset posture template, and if the matching is successful, filling the foreground target image into the template image to fuse the template image and the foreground target image and generate a fusion image.
  • In a second aspect, an embodiment of the present application provides an image processing apparatus, including a processor and a memory storing a computer program.
  • When executing the computer program, the processor implements the following steps: acquiring a material image and a template image; performing foreground segmentation processing on the material image to obtain a foreground target image; and matching the foreground target image with a preset posture template, and if the matching is successful, filling the foreground target image into the template image to fuse the template image and the foreground target image and generate a fusion image.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements the method according to the first aspect.
  • The image processing method, device, and storage medium provided by the embodiments of the present application acquire a material image and a template image, where the material image is an image that provides a foreground target and the template image is an image that provides a background. After the foreground target image is obtained through foreground segmentation processing, it is matched with a preset posture template, and if the matching is successful, the template image and the foreground target image are fused to generate a fusion image.
  • In this embodiment, the posture template is used to determine whether the posture of the foreground target in the foreground target image meets preset requirements (such as the requirements of an attractive presentation or of certain other scenes). A successful match indicates that the posture of the foreground target meets the preset requirements and can be adapted to the template image.
  • Only in this case are the template image and the foreground target image fused, which helps to improve the quality of the fused image.
  • FIG. 1, FIG. 5 and FIG. 7 are different schematic flowcharts of the image processing method provided by an embodiment of the present application.
  • FIG. 2A, FIG. 3A and FIG. 6A are different schematic diagrams of template images provided by an embodiment of the present application.
  • FIG. 2B is a schematic diagram of an image including a first foreground target provided by an embodiment of the present application.
  • FIG. 2C is a schematic diagram of a target area of a template image provided by an embodiment of the present application.
  • FIG. 2D is a schematic diagram of a gesture template provided by an embodiment of the present application.
  • FIG. 2E is a schematic diagram of joint point detection provided by an embodiment of the present application.
  • FIG. 3B and FIG. 3C are different schematic diagrams of the correspondence between classification results and gesture templates provided by an embodiment of the present application.
  • FIG. 4A is a schematic diagram of a material image provided by an embodiment of the present application.
  • FIG. 4B is a schematic diagram of a foreground target image provided by an embodiment of the present application.
  • FIG. 4C is a schematic diagram of a fused image provided by an embodiment of the present application.
  • FIG. 6B is a schematic diagram of a material image set provided by an embodiment of the present application.
  • FIG. 6C is a schematic diagram of a gesture template set and its adapted target areas provided by an embodiment of the present application.
  • FIG. 6D is a schematic diagram of all fused images of a material image set provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • In view of the problems in the related art, an embodiment of the present application provides an image processing method that acquires a material image and a template image, where the material image is an image that provides a foreground target and the template image is an image that provides a background. After the foreground target image is obtained through foreground segmentation processing, it is matched with a preset posture template, and if the matching is successful, the template image and the foreground target image are fused to generate a fusion image.
  • In this embodiment, the posture template is used to determine whether the posture of the foreground target in the foreground target image meets preset requirements (such as the requirements of an attractive presentation). A successful match indicates that the posture of the foreground target in the foreground target image meets the preset requirements and can be adapted to the template image. Only in this case are the template image and the foreground target image fused, which helps to improve the quality of the fused image.
  • the image processing methods provided in the embodiments of the present application may be executed by an image processing apparatus.
  • The image processing device may be a computer chip or an integrated circuit with data processing capability, such as a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
  • The image processing apparatus may be installed in a movable platform, a terminal device, a server, or other equipment with image processing functions.
  • Alternatively, the image processing apparatus may itself be an entity device with image processing functions, such as a movable platform, a terminal device, or a server.
  • Examples of a movable platform include, but are not limited to, unmanned aerial vehicles, unmanned vehicles, gimbals, unmanned ships, and mobile robots.
  • Examples of terminal devices include, but are not limited to: smartphones/mobile phones, tablet computers, personal digital assistants (PDAs), laptop computers, desktop computers, media content players, video game stations/systems, virtual reality systems, augmented reality systems, wearable devices (e.g., watches, glasses, gloves, headwear such as hats, helmets, virtual reality headsets, augmented reality headsets, head-mounted devices (HMDs) and headbands, pendants, armbands, leg loops, shoes, and vests), remote controls, or any other type of device.
  • The user can select a template image based on actual needs; for example, multiple template images can be displayed on an interactive interface, and after the user selects a template image on the interactive interface, one or more gesture templates corresponding to that template image can be displayed on the interactive interface for the user to select.
  • Alternatively, the user can select a gesture template that matches the template image from a gesture database according to his or her own experience or needs; the gesture template can be used to provide guidance for collecting or selecting a material image.
  • The gesture template can be presented as an image, as text, as sound, or in another form. That is to say, the material image can be collected or selected according to the posture template; for example, the user's material image can be captured with a handheld gimbal, a mobile phone, an unmanned aerial vehicle, or a separate imaging device.
  • After acquiring the material image, the image processing device may perform foreground segmentation processing on the material image to obtain a foreground target image, and then match the foreground target image with the preset posture template, so that the posture template is used to evaluate whether the posture of the foreground target in the foreground target image meets the preset requirements. If the matching is successful, it indicates that the posture of the foreground target meets the preset requirements and that the foreground target image is suitable for presentation in combination with the template image; the foreground target image can then be filled into the template image to fuse the template image and the foreground target image and generate a fusion image, which helps to improve the quality of the fusion image and the user's visual experience.
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • The method may be executed by an image processing apparatus and includes the following steps:
  • In step S101, a material image and a template image are acquired.
  • In step S102, foreground segmentation processing is performed on the material image to obtain a foreground target image.
  • In step S103, the foreground target image is matched with a preset gesture template, and if the matching is successful, the foreground target image is filled into the template image, so that the template image and the foreground target image are fused to generate a fusion image.
  • The material image is used to provide the foreground target, and the template image is used to provide the background.
  • The fusion image is generated based on the foreground target provided by the material image and the background provided by the template image.
  • the material image and the template image may be selected by the user according to actual needs.
  • the material image may be obtained by a user based on an imaging device, and the material image may be an image that has been captured in advance, or may be an image captured by the imaging device in real time.
  • the gesture template may be determined according to the template image, the gesture template may be used to provide guidance for the user to collect or select material images, and the gesture template may be in the form of images, text, or sounds, etc. render.
  • the template image corresponds to one or more gesture templates
  • the preset gesture template may be at least one of all gesture templates corresponding to the template image.
  • The preset gesture template may be selected by the user from all gesture templates corresponding to the template image.
  • Alternatively, the preset gesture template may be at least one gesture template that the user, according to the template image selected by himself or herself and combined with his or her own experience or needs, selects from a gesture database that fits the template image; the gesture database includes several gesture templates.
  • one or more gesture templates corresponding to the template image can be determined at least in the following ways:
  • foreground segmentation processing may be performed on the template image to obtain a first foreground target, and then one or more gesture templates corresponding to the template image may be generated according to the first foreground target.
  • semantic segmentation processing may also be performed on the template image in advance to obtain a classification result of each pixel in the template image, and one or more gesture templates corresponding to the template image may be determined from the gesture database according to the classification result.
  • In this embodiment, the foreground target image and the template image are not fused directly.
  • Only when the foreground target image is successfully matched with the preset posture template is it determined that the posture of the foreground target meets the preset requirements (such as the requirements of an attractive presentation or of a certain scene) and can be adapted to the template image, so that the combination of the two has a good presentation effect.
  • Only then is the foreground target image filled into the template image to fuse the template image and the foreground target image and generate a fusion image, thereby helping to improve the quality of the fusion image.
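As a rough sketch of the control flow just described, the skeleton below wires the three steps together in Python. The helper functions are hypothetical placeholders for the concrete operations discussed later in this document; only the "fuse only when the match succeeds" logic is the point being illustrated.

```python
import numpy as np

def foreground_segmentation(material_image):
    # Placeholder for step S102: return a boolean mask selecting the foreground target.
    # A real implementation would use semantic segmentation (see the later sketches).
    return np.zeros(material_image.shape[:2], dtype=bool)

def pose_matches(foreground_mask, pose_template):
    # Placeholder for the matching half of step S103.
    return False

def fill_into_template(material_image, foreground_mask, template_image):
    # Placeholder for the fusion half of step S103: paste foreground pixels into the template.
    fused = template_image.copy()
    fused[foreground_mask] = material_image[foreground_mask]
    return fused

def process(material_image, template_image, pose_template):
    """Steps S101-S103: segment, match, and fuse only when the match succeeds."""
    mask = foreground_segmentation(material_image)     # step S102
    if not pose_matches(mask, pose_template):          # step S103, matching
        return None                                    # no fusion when the pose does not fit
    return fill_into_template(material_image, mask, template_image)

# Tiny demo with dummy 8x8 images; returns None because the placeholder match fails.
material = np.zeros((8, 8, 3), dtype=np.uint8)
template = np.full((8, 8, 3), 255, dtype=np.uint8)
print(process(material, template, pose_template=None))
```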
  • the template image includes a preset target area
  • Filling the foreground target image into the template image includes: filling the foreground target image into the target area.
  • the The target area can represent an area in the template image that has a high degree of adaptation to the foreground target image.
  • the foreground target image is filled in the target area, the final generated fusion image has a good and natural appearance.
  • Presentation effect in one example, the foreground template image may be an image of a "bicycle", the template image is an image containing a road, and the target area may be a road in the template image your region.
  • The image processing apparatus may perform the following processing on all template images in advance, or perform it on an acquired template image in response to a template image selection instruction from a user.
  • For example, the template image shown in FIG. 2A can be subjected to foreground segmentation processing to obtain the first foreground target shown in FIG. 2B (for ease of distinction, the foreground target obtained from the template image is called the first foreground target, and the foreground target obtained from the material image is called the second foreground target). The target area shown in FIG. 2C is then determined in the template image according to the first foreground target, and finally the gesture template is generated according to the target area and/or the first foreground target. In this embodiment, the target area and the gesture template are both determined based on the first foreground target in the same template image.
  • When it is determined that the foreground target image of the material image is successfully matched with the gesture template, the foreground target image of the material image is filled into the target area of the template image, so that the foreground target image of the material image and the background provided by the template image are well combined and rendered realistically, which helps to improve the quality of the fused image.
  • In one example, when the gesture template is presented in the form of an image, it can be as shown in FIG. 2D; in another example, when the gesture template is presented in the form of text, it can be the text "open hands and jump"; in a further example, when the gesture template is presented in the form of sound, it can be a speech signal of "open hands and jump".
  • FIG. 2A , FIG. 2B , FIG. 2C and FIG. 2D illustrate the template image, the first foreground target, the target area and the gesture template respectively, and do not constitute a limitation on the above processing procedure of the present application.
  • Specifically, the image processing apparatus may perform semantic segmentation processing on the template image to obtain a classification result for each pixel in the template image, and then obtain the first foreground target according to the classification result of each pixel. For example, after semantic segmentation is performed on the template image shown in FIG. 2A, a classification result indicating that each pixel belongs to one of {sky, tree, person} can be obtained, and the first foreground target can then be obtained from the pixels whose classification result is person.
  • each pixel in the template image can be accurately classified, thereby improving the accuracy of the obtained first foreground target.
  • the first foreground target may also be obtained in other manners, for example, a trimap algorithm is used to obtain the first foreground target, which is not limited in this embodiment.
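As an illustration of the semantic-segmentation route described above, the sketch below assumes a per-pixel label map has already been produced by some segmentation model (not shown) and simply extracts the pixels whose classification result is the foreground class; the {sky, tree, person} class list mirrors the example above and is otherwise arbitrary.

```python
import numpy as np

CLASS_NAMES = ["sky", "tree", "person"]        # example classes from the template image above
PERSON = CLASS_NAMES.index("person")

def extract_foreground(image, label_map, foreground_class=PERSON):
    """Keep only the pixels whose classification result is the foreground class.

    image:     H x W x 3 uint8 array (the template image or the material image)
    label_map: H x W integer array with one class index per pixel (segmentation output)
    Returns the boolean foreground mask and the masked foreground image.
    """
    mask = label_map == foreground_class
    foreground = np.zeros_like(image)
    foreground[mask] = image[mask]             # e.g. the first foreground target of FIG. 2B
    return mask, foreground

# Tiny synthetic example: a 4x4 "image" whose centre pixels are classified as person.
img = np.full((4, 4, 3), 200, dtype=np.uint8)
labels = np.zeros((4, 4), dtype=np.int64)
labels[1:3, 1:3] = PERSON
mask, fg = extract_foreground(img, labels)
print(int(mask.sum()))                         # -> 4 foreground pixels
```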
  • In a first example, the image processing apparatus may generate the gesture template according to the contour information of the target area. In a second example, referring to FIG. 2E, the image processing device may perform joint point detection on the first foreground target to obtain a joint point detection result, which may include at least one of the following: an angle between joint points, a joint point type, or a joint point distribution position; the pose template shown in FIG. 2D can then be generated according to the joint point detection result. In a third example, the image processing device can combine the contour information of the target area and the joint point detection result of the first foreground target to obtain the pose template.
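The joint point detection result mentioned above can be held in a small data structure. The sketch below assumes 2-D keypoint coordinates are already available from some pose estimator (not specified here) and derives the angle at each joint from its two neighbouring keypoints, which is one plausible way to encode the angles, types and distribution positions that make up a pose template.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (degrees) at joint b formed by points a-b-c, e.g. shoulder-elbow-wrist."""
    v1, v2 = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def pose_descriptor(keypoints, triplets):
    """Build a joint point detection result: joint types, positions and inter-joint angles.

    keypoints: dict of joint name -> (x, y) pixel position
    triplets:  list of (a, b, c) joint names; the angle is measured at b
    """
    return {
        "positions": keypoints,
        "angles": {b: joint_angle(keypoints[a], keypoints[b], keypoints[c])
                   for a, b, c in triplets},
    }

# Hypothetical keypoints for part of an "open hands and jump" pose.
kp = {"l_wrist": (10, 20), "l_elbow": (30, 40), "l_shoulder": (50, 40)}
desc = pose_descriptor(kp, [("l_wrist", "l_elbow", "l_shoulder")])
print(desc["angles"])   # angle at the left elbow, in degrees
```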
  • the gesture template includes, but is not limited to, presentation in the form of images, text, or sounds.
  • If the target area and the gesture template are both determined based on the first foreground target in the same template image, then a foreground target image of the material image that is successfully matched with the gesture template also has a high degree of fit with the target area.
  • In the case of a successful match, filling the foreground target image into the target area is therefore beneficial to the authenticity of the subsequently obtained fused image.
  • In some embodiments, multiple template images are displayed on the interactive interface, and the user can select a template image according to actual needs on the interactive interface, that is, select the background to be merged. In one example, the selected template image is the image shown in FIG. 2A, that is, an image for which the target area and the posture template have not yet been obtained through the above-mentioned processing; the image processing device can then, in response to the user's template image selection instruction, perform the above processing on the selected template image to obtain its target area and posture template. In another example, the above processing may already have been performed on the template image in advance.
  • Correspondingly, the gesture template obtained in the above manner can be displayed on the interactive interface, so as to provide guidance, through the gesture template, for the user to collect or select material images.
  • the gesture template may be acquired from a gesture database including several gesture templates according to the template image.
  • the gesture template may be obtained by the user from a gesture database including several gesture templates based on the user's own experience or needs according to the selected template image.
  • In other embodiments, the gesture template may be acquired by the image processing apparatus from a gesture database including several gesture templates according to the template image. The image processing apparatus can perform semantic segmentation processing on the template image, for example the template image shown in FIG. 3A, and obtain the classification result of each pixel in the template image.
  • The image processing apparatus may then obtain one or more gesture templates corresponding to the template image from the gesture database according to the classification result.
  • the user may select at least one gesture template to be referenced according to actual needs.
  • In one example, the image processing apparatus may obtain the one or more gesture templates corresponding to the template image from the gesture database according to a pre-stored correspondence between classification results and gesture templates and the classification result of each pixel in the template image.
  • For example, as shown in FIG. 3B and FIG. 3C, there are 3 pose templates corresponding to the classification result "beach" and 4 pose templates corresponding to the classification result "road", where the pose templates are shown as images by way of example.
  • In another example, a classification model may be pre-trained and used to obtain pose templates from the classification result. As an example, the classification model may be a deep learning model trained with a supervised learning method on several classification-result samples carrying pose template labels. The image processing apparatus may then input the classification result into the classification model and obtain, through the classification model, one or more gesture templates corresponding to the template image.
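The pre-stored correspondence between classification results and gesture templates (as in FIG. 3B and FIG. 3C) can be as simple as a lookup table. In the sketch below the template identifiers are placeholders standing in for actual gesture template images, and the counts simply mirror the beach/road example above.

```python
# Assumed correspondence table: scene class found in the template image -> gesture templates.
POSE_DB = {
    "beach": ["beach_pose_1", "beach_pose_2", "beach_pose_3"],                  # 3 templates
    "road":  ["road_pose_1", "road_pose_2", "road_pose_3", "road_pose_4"],      # 4 templates
}

def gesture_templates_for(pixel_classes, pose_db=POSE_DB):
    """Return every gesture template whose classification result appears in the template image.

    pixel_classes: iterable of per-pixel class names obtained from semantic segmentation.
    """
    present = set(pixel_classes)
    templates = []
    for scene_class, poses in pose_db.items():
        if scene_class in present:
            templates.extend(poses)
    return templates

print(gesture_templates_for(["sky", "road", "tree"]))   # -> the 4 road pose templates
```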
  • the target area may be generated in the template image according to the selected gesture template.
  • contour information of the gesture template may be acquired, and then the target area may be generated in the template image according to the contour information.
  • the outline information of the gesture template may be acquired, and then the target area is generated according to the outline information in the area specified by the user in the template image.
  • In the case that the gesture template is presented in the form of an image, edge detection may be performed on the image to obtain the outline information of the gesture template; in the case that the gesture template is presented in the form of text or sound, the outline information of the gesture template may be determined according to the semantic information obtained from the text or sound.
  • Considering that the classification result can well reflect the position in the template image that suits a foreground target image successfully matched with the gesture template (for example, foreground target images successfully matched with the 4 posture templates corresponding to the classification result "road" are suitable to be presented at the position of the road), the image processing device can obtain the outline information of the posture template, determine in the template image the region whose pixels belong to the classification result corresponding to the posture template, and then generate the target area, according to the outline information, in that region and/or an adjacent region.
  • For example, when the classification result corresponding to the selected gesture template is "road", the target area is generated in or adjacent to the road region of the template image.
  • In this way, a place in the template image suitable for generating the target area is determined, so that the final fusion image is more natural and less rigid, which helps to improve the realism of the fusion image and enhance the user's visual experience.
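One possible reading of "generate the target area in the region whose pixels carry the corresponding classification result" is sketched below: take the bounding box of those pixels and scale the gesture-template outline into it. The placement rule here is an assumption made for illustration, not the rule mandated by this document.

```python
import numpy as np
import cv2

def generate_target_area(label_map, scene_class_id, outline_mask):
    """Place the gesture-template outline inside the region of a given scene class.

    label_map:      H x W int array from semantic segmentation of the template image
    scene_class_id: class index (e.g. the id of "road") the target area should sit on
    outline_mask:   h x w uint8 mask (0/255) describing the gesture template outline
    Returns an H x W uint8 mask marking the target area in the template image.
    """
    ys, xs = np.where(label_map == scene_class_id)
    if len(xs) == 0:
        raise ValueError("scene class not present in template image")
    x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
    h, w = y1 - y0 + 1, x1 - x0 + 1
    resized = cv2.resize(outline_mask, (w, h), interpolation=cv2.INTER_NEAREST)
    target = np.zeros(label_map.shape, dtype=np.uint8)
    target[y0:y1 + 1, x0:x1 + 1] = resized
    return target
```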
  • the user may select a material image with reference to the gesture template or use an imaging device to collect the material image in real time.
  • After the material image is obtained, in step S102 the image processing device may perform foreground segmentation processing on the material image to obtain the foreground target image.
  • For example, the image processing apparatus may perform semantic segmentation processing on the material image to obtain the classification result of each pixel in the material image, and then obtain the foreground target image according to the classification result of each pixel.
  • For example, after semantic segmentation is performed on the material image shown in FIG. 4A, a classification result indicating that each pixel belongs to one of {buildings, plants, people} can be obtained, and the foreground target image shown in FIG. 4B can then be obtained from the pixels whose classification result is people.
  • each pixel in the material image can be accurately classified, thereby improving the accuracy of the obtained foreground target image.
  • FIG. 4A and FIG. 4B are examples of the material image and the foreground target image respectively, and are not construed as a limitation of the above processing manner.
  • In step S103, the image processing apparatus may match the foreground target image with the preset gesture template; in this embodiment, the gesture template is thus used to evaluate how well the foreground target image fits the template image.
  • The matching process determines whether the difference between the foreground target image and the gesture template satisfies a preset condition; if so, it is determined that the foreground target image and the gesture template are successfully matched, indicating that the foreground target image obtained from the material image is adapted to the template image.
  • In one example, the contour structure of the second foreground target indicated by the foreground target image and the contour structure of the gesture template may be acquired.
  • For example, edge detection can be performed on the foreground target image and the posture template respectively to obtain the two contour structures, and whether the matching is successful is then determined according to the similarity between the two contour structures; for instance, when the similarity between the two is greater than a preset threshold, it is determined that the matching is successful, indicating that the foreground target image obtained from the material image is suitable for filling into the target area of the template image.
  • the preset threshold may be specifically set according to the actual application scenario.
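A sketch of the contour-structure comparison, assuming both the second foreground target and the image-form gesture template are available as binary masks. Note that cv2.matchShapes returns a dissimilarity score, so "similarity greater than a preset threshold" is expressed here as "dissimilarity below a preset threshold"; the threshold value is arbitrary.

```python
import cv2
import numpy as np

def largest_contour(mask):
    """Largest external contour of a 0/255 binary mask (OpenCV 4 return convention)."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)

def contours_match(fg_mask, pose_mask, max_dissimilarity=0.2):
    """Compare the second foreground target's contour with the gesture template's contour."""
    d = cv2.matchShapes(largest_contour(fg_mask), largest_contour(pose_mask),
                        cv2.CONTOURS_MATCH_I1, 0.0)
    return d <= max_dissimilarity            # lower score = more similar contour structures

# Toy example: two filled rectangles with similar shapes.
a = np.zeros((100, 100), np.uint8); cv2.rectangle(a, (20, 30), (60, 90), 255, -1)
b = np.zeros((100, 100), np.uint8); cv2.rectangle(b, (10, 10), (50, 80), 255, -1)
print(contours_match(a, b))
```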
  • In another example, joint point detection may be performed on the second foreground target indicated by the foreground target image to obtain the joint point detection result of the second foreground target; and, when the posture template is presented in the form of an image, joint point detection is performed on the posture template to obtain the joint point detection result of the posture template. The joint point detection result includes at least one of the following: the angle between joint points, the type of the joint points, or the distribution position of the joint points. Whether the matching is successful is then determined according to the difference between the joint point detection result of the second foreground target and that of the pose template; for example, if the difference between the two is within a preset range, it is determined that the matching is successful, indicating that the foreground target image obtained from the material image is suitable for filling into the target area of the template image.
  • the preset range can be specifically set according to the actual application scenario.
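Correspondingly, when joint point detection results are available for both the second foreground target and the gesture template, the match can be decided by checking that the per-joint differences stay within the preset range. The sketch below compares only the joint angles produced by the earlier descriptor and is one possible distance measure among several.

```python
def joints_match(fg_angles, template_angles, max_angle_diff_deg=20.0):
    """True when every joint angle of the foreground target is within the preset
    range of the corresponding angle in the gesture template."""
    common = set(fg_angles) & set(template_angles)
    if not common:
        return False                        # nothing comparable, treat as a failed match
    return all(abs(fg_angles[j] - template_angles[j]) <= max_angle_diff_deg
               for j in common)

print(joints_match({"l_elbow": 130.0}, {"l_elbow": 140.0}))   # True, within 20 degrees
```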
  • If the matching is successful, the image processing apparatus may fill the foreground target image into the target area to fuse the template image and the foreground target image and generate a fusion image; for example, the foreground target image shown in FIG. 4B is filled into the target area shown in FIG. 2C, and the template image and the foreground target image are fused to generate the fusion image shown in FIG. 4C.
  • In this embodiment, the degree of adaptation between the foreground target image and the template image is determined by the gesture template, and the foreground target image is filled into the target area of the template image only when the matching is successful, so that the resulting fusion image is more natural and not rigid, which is conducive to improving the realism of the fusion image and the user's visual experience.
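A minimal compositing sketch for "fill the foreground target image into the target area": the foreground and its mask are resized to the bounding box of the target area and copied over the template pixels. Blending refinements such as deformation of the target area or colour mapping, discussed below, are deliberately omitted.

```python
import cv2
import numpy as np

def fuse(template_image, target_area_mask, material_image, fg_mask):
    """Paste the masked foreground target into the target area of the template image.

    template_image:   H x W x 3 uint8, provides the background
    target_area_mask: H x W uint8 (0/255), marks the target area in the template image
    material_image:   h x w x 3 uint8, provides the foreground target
    fg_mask:          h x w uint8 (0/255), second foreground target mask in the material image
    """
    fused = template_image.copy()
    ys, xs = np.nonzero(target_area_mask)           # bounding box of the target area
    x, y = xs.min(), ys.min()
    w, h = xs.max() - x + 1, ys.max() - y + 1
    fg = cv2.resize(material_image, (w, h))
    m = cv2.resize(fg_mask, (w, h), interpolation=cv2.INTER_NEAREST) > 0
    region = fused[y:y + h, x:x + w]                # a view into the fused image
    region[m] = fg[m]                               # copy only the foreground pixels
    return fused
```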
  • In some cases, the following deformation processing can be considered based on the actual scene, so that the foreground target image fully corresponds to the target area.
  • For example, according to the contour structure of the second foreground target indicated by the foreground target image, deformation processing may be performed on the target area so that the contour structure of the target area can be completely adapted to the second foreground target; the foreground target image is then filled into the deformed target area, so that a better fusion effect is obtained and the resulting fusion image is more natural.
  • In some embodiments, the target color gamut range of the fusion image to be generated can be determined according to the color gamut range of the posture image and the color gamut range of the template image, and color mapping is then performed, according to the target color gamut range, on the image preliminarily synthesized from the template image and the foreground target image to generate the fusion image, so that the fused result is less obtrusive, further increasing the realism of the fused image.
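The colour-mapping step is not specified in detail above; the sketch below shows one plausible reading in which the "colour gamut range" is approximated by per-channel minimum and maximum values and the foreground is linearly remapped toward the range observed in the template image before compositing.

```python
import numpy as np

def channel_range(img):
    """Per-channel (min, max) as a rough stand-in for a colour gamut range."""
    flat = img.reshape(-1, img.shape[-1]).astype(np.float32)
    return flat.min(axis=0), flat.max(axis=0)

def map_to_target_range(fg, fg_range, target_range):
    """Linearly remap foreground colours toward the target range so the
    preliminary composite looks less obtrusive (one possible interpretation)."""
    fg = fg.astype(np.float32)
    (f_lo, f_hi), (t_lo, t_hi) = fg_range, target_range
    scale = (t_hi - t_lo) / np.maximum(f_hi - f_lo, 1e-6)
    out = (fg - f_lo) * scale + t_lo
    return np.clip(out, 0, 255).astype(np.uint8)

# Usage: remap the foreground pixels before pasting them into the template image.
fg = np.random.randint(0, 120, (8, 8, 3), dtype=np.uint8)
tmpl = np.random.randint(60, 255, (8, 8, 3), dtype=np.uint8)
out = map_to_target_range(fg, channel_range(fg), channel_range(tmpl))
```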
  • FIG. 5 is a schematic flowchart of another image processing method provided by this embodiment of the present application.
  • In this embodiment, a fusion video is generated based on a single template image, where the material image may be one of a material image set and the gesture template may be one of a gesture template set.
  • The method may be performed by an image processing apparatus and includes the following steps:
  • In step S201, a material image set and a template image are acquired.
  • In step S202, foreground segmentation processing is performed on each of a plurality of material images included in the material image set, and a foreground target image corresponding to each material image is obtained.
  • In step S203, for the foreground target image corresponding to each material image, the foreground target image is matched with the gesture templates in the gesture template set, and if the matching is successful, the foreground target image is filled into the template image to fuse the template image and the foreground target image and generate a fusion image.
  • In step S204, a fused video is generated according to the fused images corresponding to the material image set.
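Steps S201 to S204 amount to a loop over the material image set followed by video assembly. The sketch below assumes per-image helper functions of the kind sketched earlier and uses OpenCV's VideoWriter, which is one convenient but not mandated way to produce the fused video.

```python
import cv2

def build_fused_video(material_images, template_image, pose_templates,
                      segment_fn, match_fn, fuse_fn, out_path="fused.mp4", fps=25):
    """material_images: list of H x W x 3 frames providing foreground targets.
    segment_fn/match_fn/fuse_fn are assumed callables implementing steps S202 and S203."""
    fused_frames = []
    for material in material_images:                      # step S202
        fg_img, fg_mask = segment_fn(material)
        for pose in pose_templates:                       # step S203: try the template set
            if match_fn(fg_img, pose):
                fused_frames.append(fuse_fn(template_image, pose, material, fg_mask))
                break                                     # unmatched material images are skipped
    if not fused_frames:
        return None
    h, w = fused_frames[0].shape[:2]                      # step S204: assemble the fused video
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in fused_frames:
        writer.write(frame)
    writer.release()
    return out_path
```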
  • the multiple material images included in the material image set may be multiple material images selected by the user according to actual needs, or may be multiple frames in a piece of material video.
  • the gesture template set may be obtained from a database including a plurality of gesture template sets according to the template image.
  • For example, the image processing apparatus may perform semantic segmentation processing on the template image to obtain the classification result of each pixel in the template image, and then obtain the pose template set corresponding to the template image from the pose database according to the classification result; for instance, the gesture template set corresponding to the template image may be obtained from the gesture database according to a pre-stored correspondence between classification results and pose template sets and the classification result of each pixel in the template image.
  • the target area may be generated in the template image according to the gesture template in the gesture template set, so that the foreground target image corresponding to the material image and the template image have good rendering results.
  • the target area may be generated in the template image according to the outline information of the gesture template (refer to the above description about generating the target area in the template image, which will not be repeated here).
  • That is, the target area in the template image is adaptively generated based on the different gesture templates in the gesture template set, so as to meet the fusion requirements of the different foreground target images matched by different gesture templates, and so that the foreground target image corresponding to each material image and the template image have a good presentation result after fusion.
  • each template image that only includes a target area corresponding to the gesture template may be acquired.
  • the image processing apparatus may perform semantic segmentation processing on a plurality of material images included in the material image set, respectively, to obtain a classification result of each pixel in each material image, Then, for each material image, a foreground target image corresponding to the material image is obtained according to the classification result of each pixel in the material image. Based on the semantic segmentation method in this embodiment, each pixel in the material image can be accurately classified, thereby improving the accuracy of the obtained foreground target image.
  • the image processing apparatus matches the foreground target image with the gesture templates in the gesture template set; for example, the foreground target image may be sequentially matched with the gesture templates in the gesture template set. Matching is performed until the matching is successful; or, in order to save computing resources, the foreground target image may be matched with one or more gesture templates specified in the gesture template set.
  • a target gesture template that is successfully matched with the foreground target image is determined, and then a target area can be generated in the template image according to the target gesture template, Then, the foreground target image is filled into the target area to fuse the template image and the foreground target image to generate a fusion image; finally, a fusion video is generated according to the fusion image corresponding to each material image.
  • the foreground target image obtained based on the material image set can be fused with the same template image, and the target area in the same template image is adaptively generated based on different posture templates, so that the The fusion video of different actions in the same scene further improves the user experience.
  • the target area may be generated in advance based on each gesture template in the gesture template set; or if the matching is successful, the target area may be generated in the template image in real time according to the target gesture template matched with the foreground target image, This embodiment does not impose any restrictions on the generation timing of the target area.
  • In a specific implementation, the user can select a template image and a material image set according to actual needs, and after acquiring the template image, the image processing apparatus can acquire the corresponding gesture template set from the gesture database according to the template image.
  • As an example, as shown in FIG. 6A, the template image may be an image with a road background; as shown in FIG. 6B (three material images are used as an example in FIG. 6B), the plurality of material images in the material image set can be obtained by photographing the user while running; and as shown in FIG. 6C (FIG. 6C shows three examples of posture templates in the form of images), the plurality of posture templates in the posture template set represent running postures.
  • The image processing apparatus may perform foreground segmentation processing on each of the material images included in the material image set to obtain the foreground target image corresponding to each material image. Then, for the foreground target image corresponding to each material image, the foreground target image is matched with the posture templates in the posture template set; if the matching is successful, the target posture template matching the foreground target image is obtained, a target area is generated in the template image according to the target posture template (as shown in FIG. 6C), and the foreground target image is filled into the target area to fuse the template image and the foreground target image and generate the fusion image shown in FIG. 6D. Finally, a fusion video is generated according to the fusion images corresponding to the material images.
  • the foreground target image obtained based on the material image set can be fused with the same template image, and the target area in the same template image is adaptively generated based on different posture templates, so that the The fusion video of different actions in the same scene further improves the user experience.
  • A fused video can also be generated in other ways.
  • An embodiment of the present application therefore provides another image processing method.
  • This embodiment generates a fusion video based on a template image set, where the material image is one of a material image set, the gesture template is one of a gesture template set, and the template image is one of the template image set. The method can be performed by an image processing apparatus and includes the following steps:
  • In step S301, a material image set and a template image set are acquired.
  • In step S302, foreground segmentation processing is performed on each of a plurality of material images included in the material image set, and a foreground target image corresponding to each material image is obtained.
  • In step S303, for the foreground target image corresponding to each material image, the foreground target image is matched with the gesture templates in the gesture template set; if the matching is successful, a target template image is determined in the template image set according to the target gesture template matched with the foreground target image, and the foreground target image and the target template image are fused to generate a fusion image.
  • In step S304, a fusion video is generated according to the fusion images corresponding to the material image set.
  • The gesture template set may be determined according to the template image set: each template image in the template image set corresponds to one or more gesture templates, and the gesture template set may be generated based on the one or more gesture templates corresponding to each template image in the template image set. That is, there is a correspondence between the gesture templates in the gesture template set and the template images in the template image set.
  • one or more gesture templates corresponding to the template image may be generated according to the first foreground target in the template image; in another example, one or more gesture templates corresponding to the template image , which may be obtained from a gesture database based on the template image.
  • the template images in the template image set include a preset target area
  • In this case, filling the foreground target image into the target template image includes: filling the foreground target image into the target area of the target template image.
  • foreground segmentation processing may be performed on the template images in the template image set to obtain a first foreground target, and the target area in the template image may be determined according to the first foreground target; and then according to the target region and/or the first foreground target, the gesture template of the template image is generated; finally, the gesture template set is obtained based on all gesture templates corresponding to all template images in the template image set.
  • both the target area and the gesture template are determined based on the first foreground target in the same template image, and when it is determined that the foreground target image of the material image is successfully matched with the gesture template, the The foreground target image of the material image is filled in the target area of the template image, so as to achieve a good combination and real presentation of the foreground target image of the material image and the background provided by the template image.
  • In other embodiments, semantic segmentation processing may be performed on each template image in the template image set to obtain the classification result of each pixel in the template image, and the posture template corresponding to each template image may then be obtained from the gesture database according to the classification result; in one example, the posture template corresponding to each template image can be obtained from the posture database according to a pre-stored correspondence between classification results and posture templates and the classification result of each pixel in the template image.
  • The posture template set can then be obtained based on all posture templates corresponding to all the template images in the template image set, and the target area can be generated in the template image according to the gesture template.
  • the image processing apparatus may perform semantic segmentation processing on a plurality of material images included in the material image set, respectively, to obtain a classification result of each pixel in each material image, Then, for each material image, a foreground target image corresponding to the material image is obtained according to the classification result of each pixel in the material image. Based on the semantic segmentation method in this embodiment, each pixel in the material image can be accurately classified, thereby improving the accuracy of the obtained foreground target image.
  • the image processing apparatus matches the foreground target image with the gesture templates in the gesture template set; for example, the foreground target image may be sequentially matched with the gesture templates in the gesture template set. Matching is performed until the matching is successful; or, in order to save computing resources, the foreground target image may be matched with one or more gesture templates specified in the gesture template set.
  • If the matching is successful, a target template image corresponding to the target gesture template may be determined in the template image set according to the target gesture template successfully matched with the foreground target image; the foreground target image is then filled into the target area of the target template image to fuse the foreground target image and the target template image and generate a fusion image. Finally, a fusion video is generated according to the fusion images corresponding to the material images.
  • the foreground target image obtained based on the material image set can be fused with the template image in the template image set, thereby generating a fusion video of performing the same or different actions in different scenes, further improving the user experience. user experience.
  • In some embodiments, template images adjacent to the target template image may also be determined in the template image set.
  • The template images adjacent to the target template image include template images adjacent in acquisition time; for example, when the template image set is a set of frames of a template video, the template images adjacent to the target template image may be the frames before and after the target template image. The foreground target image, the target template image, and the template images adjacent to the target template image are then fused.
  • Here, the template images adjacent to the target template image are used to fill the positions left vacant after the foreground target image and the target template image are preliminarily synthesized, so as to achieve a better fusion effect.
  • In one example, a partial image may be acquired at the same positions in a template image adjacent to the target template image, and the foreground target image, the target template image, and the partial image are then fused.
  • In this example, the partial image can be used to fill the positions left vacant after the foreground target image and the target template image are preliminarily synthesized, so as to achieve a better fusion effect.
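One hedged reading of the adjacent-frame trick described above: wherever the target area of the target template image is left uncovered by the pasted foreground, copy pixels from the same positions of a neighbouring template frame. The sketch assumes all images are aligned and equally sized.

```python
import numpy as np

def fill_vacancies(composite, covered_mask, target_area_mask, adjacent_frame):
    """Fill positions of the target area that the foreground did not cover, using the
    partial image taken at the same positions of an adjacent template frame.

    composite:        H x W x 3, target template image with the foreground already pasted
    covered_mask:     H x W bool, True where foreground pixels were pasted
    target_area_mask: H x W bool, True inside the target area
    adjacent_frame:   H x W x 3, template image adjacent in acquisition time
    """
    vacant = target_area_mask & ~covered_mask     # uncovered part of the target area
    out = composite.copy()
    out[vacant] = adjacent_frame[vacant]          # partial image from the adjacent frame
    return out
```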
  • an embodiment of the present application further provides an image processing apparatus, including a processor 401 and a memory 402 storing a computer program;
  • the processor 401 implements the following steps when executing the computer program:
  • Acquiring a material image and a template image; performing foreground segmentation processing on the material image to obtain a foreground target image; and matching the foreground target image with a preset posture template, and if the matching is successful, filling the foreground target image into the template image to fuse the template image and the foreground target image and generate a fusion image.
  • the template image includes a preset target area.
  • the processor 401 is further configured to: fill the foreground target image into the target area.
  • The processor 401 is further configured to: perform foreground segmentation processing on the template image to obtain a first foreground target; determine the target area in the template image according to the first foreground target; and generate the gesture template according to the target area and/or the first foreground target.
  • the processor 401 is further configured to: perform joint point detection on the first foreground target to obtain a joint point detection result; and generate the posture template according to the joint point detection result.
  • the gesture template is selected from a gesture database according to the template image; the gesture database includes several gesture templates.
  • the processor 401 is further configured to:
  • Semantic segmentation is performed on the template image to obtain a classification result of each pixel in the template image
  • the gesture template corresponding to the template image is obtained from the gesture database according to the pre-stored correspondence between the classification result and the gesture template, and the classification result of each pixel in the template image.
  • the processor 401 is further configured to: generate the target area in the template image according to the gesture template.
  • the processor 401 is further configured to: determine that the matching is successful if the difference between the foreground target image and the preset gesture template satisfies a preset condition.
  • the preset condition includes at least one of the following:
  • the similarity between the contour structure of the second foreground target indicated by the foreground target image and the contour structure of the posture template is greater than a preset threshold; or
  • the difference between the joint point detection result of the second foreground target indicated by the foreground target image and the joint point detection result of the gesture template is within a preset range.
  • the joint point detection result includes at least one of the following: an angle between joint points, a joint point type, or a distribution position of the joint points.
  • The processor 401 is further configured to: perform deformation processing on the target area according to the contour structure of the second foreground target indicated by the foreground target image; and fill the foreground target image into the deformed target area.
  • the processor 401 is further configured to: perform deformation processing on the foreground target image according to the size of the target area; and fill the deformed foreground target image into the target area.
  • The processor 401 is further configured to: perform semantic segmentation processing on the material image to obtain a classification result of each pixel in the material image; and obtain the foreground target image according to the classification result of each pixel in the material image.
  • the material image is one of a set of material images
  • the gesture template is one of a set of gesture templates
  • The processor 401 is further configured to: after acquiring the foreground target image of each material image in the material image set, if the foreground target image is successfully matched with a gesture template in the gesture template set, fuse the foreground target image with the template image to generate a fused image; and generate a fused video according to the fused images corresponding to the material image set.
  • the target area in the template image is generated according to a target pose template matching the foreground target image.
  • the processor is further configured to: fill the foreground target image into the target area of the template image.
  • the template image is one of a set of template images.
  • the processor 401 is also used for:
  • a target template image is determined in the template image set according to a target pose template matched with the foreground target image; the foreground target image and the target template image are fused.
  • The processor 401 is further configured to: determine, in the template image set, template images adjacent to the target template image, where the adjacent template images include template images adjacent in acquisition time; and fuse the foreground target image, the target template image, and the template images adjacent to the target template image.
  • the processor 401 is further configured to: perform color fusion processing on the template image and the foreground target image according to the color gamut range of the gesture image and the color gamut range of the template image.
  • the various embodiments described herein can be implemented using computer readable media such as computer software, hardware, or any combination thereof.
  • For hardware implementation, the embodiments described herein can be implemented using application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, or electronic units designed to perform the functions described herein.
  • embodiments such as procedures or functions may be implemented with separate software modules that allow the performance of at least one function or operation.
  • The software codes may be implemented by a software application (or program) written in any suitable programming language, which may be stored in a non-transitory computer-readable storage medium, such as a memory including instructions executable by a processor of an apparatus to perform the above-described method.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.
  • a non-transitory computer-readable storage medium when the instructions in the storage medium are executed by the processor of the terminal, enable the terminal to execute the above method.

Abstract

A method and device for image processing, and a storage medium. The method comprises: acquiring a stock image and a template image; performing foreground segmentation processing with respect to the stock image to acquire a foreground target image; matching the foreground target image with a preset posture template; and if successfully matched, filling the foreground target image into the template image, thus fusing the template image and the foreground target image to produce a fused image. The present embodiment increases the quality of the fused image.

Description

Image processing method, device and storage medium

Technical Field

The present application relates to the technical field of image processing, and in particular, to an image processing method, device, and storage medium.

Background
With the development of intelligent devices in today's society, image processing has become an indispensable part of people's lives, whether it is professional image processing at work or recreational image processing in daily life. One of the more popular processing methods is that the user manually extracts the foreground area in an image and then fills it into another template image to generate a composited ("PS") image. In this way, the pose of the person represented by the foreground area may well be unsightly and poorly suited to the template image, thereby reducing the quality of the composite image.
Summary of the Invention

In view of this, one of the objectives of the present application is to provide an image processing method, device and storage medium.
In a first aspect, an embodiment of the present application provides an image processing method, including:

acquiring a material image and a template image;

performing foreground segmentation processing on the material image to obtain a foreground target image;

matching the foreground target image with a preset posture template, and if the matching is successful, filling the foreground target image into the template image to fuse the template image and the foreground target image and generate a fusion image.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including a processor and a memory storing a computer program;

the processor implements the following steps when executing the computer program:

acquiring a material image and a template image;

performing foreground segmentation processing on the material image to obtain a foreground target image;

matching the foreground target image with a preset posture template, and if the matching is successful, filling the foreground target image into the template image to fuse the template image and the foreground target image and generate a fusion image.

In a third aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the method according to the first aspect is implemented.
The image processing method, device, and storage medium provided by the embodiments of the present application acquire a material image and a template image, where the material image is an image that provides a foreground target and the template image is an image that provides a background. After the foreground target image is obtained through foreground segmentation processing of the material image, the foreground target image is matched with a preset posture template, and if the matching is successful, the template image and the foreground target image are fused to generate a fusion image. In this embodiment, the posture template is used to determine whether the posture of the foreground target in the foreground target image meets preset requirements (such as the requirements of an attractive presentation or of certain other scenes). A successful match indicates that the posture of the foreground target in the foreground target image meets the preset requirements and can be adapted to the template image; only in this case are the template image and the foreground target image fused, which helps to improve the quality of the fused image.
DESCRIPTION OF THE DRAWINGS
In order to describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings used in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art may obtain other drawings from these drawings without creative effort.
FIG. 1, FIG. 5 and FIG. 7 are different schematic flowcharts of image processing methods provided by an embodiment of the present application;
FIG. 2A, FIG. 3A and FIG. 6A are different schematic diagrams of template images provided by an embodiment of the present application;
FIG. 2B is a schematic diagram of an image including a first foreground target provided by an embodiment of the present application;
FIG. 2C is a schematic diagram of a target area of a template image provided by an embodiment of the present application;
FIG. 2D is a schematic diagram of a posture template provided by an embodiment of the present application;
FIG. 2E is a schematic diagram of joint point detection provided by an embodiment of the present application;
FIG. 3B and FIG. 3C are different schematic diagrams of correspondences between classification results and posture templates provided by an embodiment of the present application;
FIG. 4A is a schematic diagram of a material image provided by an embodiment of the present application;
FIG. 4B is a schematic diagram of a foreground target image provided by an embodiment of the present application;
FIG. 4C is a schematic diagram of a fused image provided by an embodiment of the present application;
FIG. 6B is a schematic diagram of a material image set provided by an embodiment of the present application;
FIG. 6C is a schematic diagram of a posture template set and the target areas adapted to it provided by an embodiment of the present application;
FIG. 6D is a schematic diagram of all fused images of a material image set provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
DETAILED DESCRIPTION OF EMBODIMENTS
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In view of the problems in the related art, an embodiment of the present application provides an image processing method. A material image and a template image are acquired, where the material image provides a foreground target and the template image provides a background. After the foreground target image is obtained by performing foreground segmentation processing on the material image, the foreground target image is matched with a preset posture template; if the matching is successful, the template image and the foreground target image are fused to generate a fused image. In this embodiment, the posture template is used to determine whether the posture of the foreground target in the foreground target image meets preset requirements (for example, requirements for an attractive presentation). A successful match indicates that the posture of the foreground target meets the preset requirements and can be adapted to the template image, and only in this case are the template image and the foreground target image fused, which helps to improve the quality of the fused image.
The image processing method provided by the embodiments of the present application may be executed by an image processing apparatus.
In a possible implementation, the image processing apparatus may be a computer chip or an integrated circuit with data processing capability, such as a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA). The image processing apparatus may be installed in a device with image processing functionality, such as a movable platform, a terminal device, or a server.
In another possible implementation, the image processing apparatus may refer to a physical device with image processing functionality, such as a movable platform, a terminal device, or a server. Examples of the movable platform include, but are not limited to, an unmanned aerial vehicle, an unmanned vehicle, a gimbal, an unmanned vessel, or a mobile robot. Examples of the terminal device include, but are not limited to: a smartphone/mobile phone, a tablet computer, a personal digital assistant (PDA), a laptop computer, a desktop computer, a media content player, a video game station/system, a virtual reality system, an augmented reality system, a wearable device (for example, a watch, glasses, gloves, headwear (for example, a hat, a helmet, a virtual reality headset, an augmented reality headset, a head-mounted device (HMD), a headband), a pendant, an armband, a leg ring, shoes, or a vest), a remote controller, or any other type of device.
In an exemplary application scenario, the user may select a template image based on actual needs. For example, multiple template images may be displayed on an interactive interface; after the user selects a template image on the interactive interface, one or more posture templates corresponding to the template image may be displayed for the user to choose from. Alternatively, the user may select, from a posture database and according to his or her own experience or needs, a posture template that fits the template image. The posture template may be used to guide the user in capturing or selecting material images. It can be understood that the embodiments of the present application do not impose any restriction on the specific form of the posture template, which may be set according to the actual application scenario; for example, the posture template may be presented as an image, text, or sound. That is, material images may be captured, or already captured material images may be selected, according to the posture template; for example, material images of the user may be captured by a handheld gimbal, a mobile phone, or an unmanned aerial vehicle equipped with an imaging device, or by a standalone imaging device.
After acquiring the material image and the template image, the image processing apparatus may perform foreground segmentation processing on the material image to acquire a foreground target image, and then match the foreground target image with the preset posture template, so that the posture template is used to evaluate whether the posture of the foreground target in the foreground target image meets a preset standard. If the matching is successful, it indicates that the posture of the foreground target in the foreground target image meets the preset requirements and that the foreground target image is suitable to be presented in combination with the template image. The foreground target image can then be filled into the template image, so as to fuse the template image and the foreground target image to generate a fused image, which helps to improve the quality of the fused image and also improves the user's visual experience.
The image processing method provided by the embodiments of the present application is described next. Referring to FIG. 1, FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application. The method may be executed by an image processing apparatus and includes:
In step S101, a material image and a template image are acquired.
In step S102, foreground segmentation processing is performed on the material image to acquire a foreground target image.
In step S103, the foreground target image is matched with a preset posture template; if the matching is successful, the foreground target image is filled into the template image, so as to fuse the template image and the foreground target image to generate a fused image.
The material image is used to provide a foreground target, and the template image is used to provide a background. In this embodiment, the fused image is generated based on the foreground target provided by the material image and the background provided by the template image.
Exemplarily, the material image and the template image may be selected by the user according to actual needs. The material image may be obtained by the user with an imaging device, and may be an image captured in advance or an image captured by the imaging device in real time.
In some embodiments, the posture template may be determined according to the template image; the posture template may be used to guide the user in capturing or selecting material images, and may be presented as an image, text, or sound. Exemplarily, the template image corresponds to one or more posture templates, and the preset posture template may be at least one of all the posture templates corresponding to the template image; for example, the preset posture template may be selected by the user from all the posture templates corresponding to the template image. Exemplarily, the preset posture template may also be at least one posture template that fits the template image and is selected by the user from a posture database according to the chosen template image, combined with the user's own experience or needs; the posture database includes a number of posture templates.
The one or more posture templates corresponding to the template image may be determined at least in the following ways:
As an example, foreground segmentation processing may be performed on the template image to obtain a first foreground target, and then one or more posture templates corresponding to the template image may be generated according to the first foreground target.
As another example, semantic segmentation processing may be performed on the template image in advance to obtain a classification result for each pixel in the template image, and one or more posture templates corresponding to the template image may be determined from the posture database according to the classification results.
In this embodiment, after the foreground target image is obtained by performing foreground segmentation processing on the material image, the foreground target image is not directly fused with the template image. Instead, it is first evaluated whether the posture of the foreground target in the foreground target image meets preset requirements (for example, requirements for an attractive presentation or for certain scenes) by matching the foreground target image with the preset posture template. If the foreground target image is successfully matched with the preset posture template, it is determined that the posture of the foreground target in the foreground target image can be adapted to the template image and that the combination of the two will present well. Only then is the foreground target image filled into the template image, so as to fuse the template image and the foreground target image to generate a fused image, which helps to improve the quality of the fused image.
In some embodiments, the template image includes a preset target area, and filling the foreground target image into the template image includes: filling the foreground target image into the target area. The target area may represent an area of the template image that is highly compatible with the foreground target image; when the foreground target image is filled into the target area, the finally generated fused image has a good and natural appearance. In one example, the foreground target image may be an image of a "person riding a bicycle", the template image is an image containing a road, and the target area may be the area of the template image where the road is located.
The processes of determining the posture template and the target area are described in detail below:
In a first possible implementation, the image processing apparatus may perform the following processing on all template images in advance, or perform the following processing on an acquired template image in response to a template image selection instruction from the user: the image processing apparatus may perform foreground segmentation processing on the template image shown in FIG. 2A to obtain the first foreground target shown in FIG. 2B (for ease of distinction, the foreground target obtained from the template image is referred to as the first foreground target, and the foreground target obtained from the material image is referred to as the second foreground target), determine the target area of the template image shown in FIG. 2C according to the first foreground target, and finally generate the posture template according to the target area and/or the first foreground target. In this embodiment, both the target area and the posture template are determined based on the first foreground target in the same template image. When it is determined that the foreground target image of the material image is successfully matched with the posture template, the foreground target image of the material image is filled into the target area of the template image, so that the foreground target image combines well with the background provided by the template image and is presented realistically, which helps to improve the quality of the fused image.
Regarding the presentation form of the posture template: in one example, when presented as an image, the posture template may be as shown in FIG. 2D; in another example, when presented as text, the posture template may be the text "jump with both arms spread"; in yet another example, when presented as sound, the posture template may be a voice signal saying "jump with both arms spread". It can be understood that FIG. 2A, FIG. 2B, FIG. 2C and FIG. 2D are merely examples of a template image, a first foreground target, a target area, and a posture template respectively, and do not limit the above processing of the present application.
When acquiring the first foreground target, in one example, the image processing apparatus may perform semantic segmentation processing on the template image to obtain a classification result for each pixel in the template image, and then obtain the first foreground target according to the classification results; for example, after foreground segmentation is performed on the template image shown in FIG. 2A, a classification result indicating that each pixel belongs to one of {sky, tree, person} can be obtained, and the first foreground target can then be obtained from the pixels classified as person. In this embodiment, the semantic segmentation method allows each pixel in the template image to be classified accurately, thereby improving the accuracy of the obtained first foreground target.
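As an illustrative sketch only, the following Python code shows how a foreground target could be extracted once a per-pixel classification result (a label map from any semantic segmentation model) is available; the label scheme, function name, and use of an alpha channel are assumptions for illustration and are not fixed by the present application. The same extraction applies equally to the material image in step S102.
```python
import numpy as np

# Hypothetical label scheme, e.g. {0: sky, 1: tree, 2: person}; not fixed by this application.
PERSON_LABEL = 2

def extract_foreground(image: np.ndarray, label_map: np.ndarray, target_label: int = PERSON_LABEL):
    """Keep only the pixels whose classification result equals the target label.

    image:     H x W x 3 uint8 image (template image or material image)
    label_map: H x W integer array produced by any semantic segmentation model
    Returns (foreground_rgba, mask), where the alpha channel marks the foreground target.
    """
    mask = label_map == target_label
    foreground = np.zeros((*image.shape[:2], 4), dtype=np.uint8)
    foreground[..., :3] = image
    foreground[..., 3] = mask.astype(np.uint8) * 255
    return foreground, mask
```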
Of course, the first foreground target may also be obtained in other ways, for example by using a trimap algorithm, and this embodiment does not impose any restriction on this.
When acquiring the posture template: in a first example, the image processing apparatus may generate the posture template according to the contour information of the target area; in a second example, referring to FIG. 2E, the image processing apparatus may perform joint point detection on the first foreground target to obtain a joint point detection result, which may include at least one of the following: angles between joint points, joint point types, or the distribution positions of the joint points, and the posture template shown in FIG. 2D can then be generated according to the joint point detection result; in a third example, the image processing apparatus may combine the contour information of the target area and the joint point detection result of the first foreground target to obtain the posture template. The posture template may be presented as, but is not limited to, an image, text, or sound. In this embodiment, both the target area and the posture template are determined based on the first foreground target in the same template image, so a foreground target image of a material image that is successfully matched with the posture template also fits the target area well; filling the foreground target image into the target area when the matching is successful helps to improve the realism of the subsequently obtained fused image.
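A minimal sketch of how a joint point detection result could be summarized into angle features for a posture template, assuming joint positions have already been obtained from some pose estimator; the triplet convention and function name are illustrative assumptions only.
```python
import numpy as np

def joint_angles(keypoints: np.ndarray, triplets):
    """Compute the angle at the middle joint of each (a, b, c) keypoint triplet.

    keypoints: N x 2 array of (x, y) joint positions from any joint point detector
    triplets:  index triples such as (shoulder, elbow, wrist)
    Returns a mapping from triplet to angle in degrees, usable as part of a posture template.
    """
    angles = {}
    for a, b, c in triplets:
        v1 = keypoints[a] - keypoints[b]
        v2 = keypoints[c] - keypoints[b]
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
        angles[(a, b, c)] = float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return angles
```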
In an exemplary application scenario, multiple template images are displayed on an interactive interface, and the user may select a template image, i.e. the background to be fused, on the interactive interface according to actual needs. In one example, the template image may be the image shown in FIG. 2A, that is, an image for which the target area and the posture template have not yet been obtained through the above processing; after the user selects the template image, the image processing apparatus may, in response to the user's template image selection instruction, perform the above processing on the selected template image to obtain its target area and posture template. In another example, the template image may be the image shown in FIG. 2C, that is, the above processing has already been performed on the template image in advance to obtain the target area and the posture template; after the user selects the template image, the posture template obtained in the above manner may be displayed accordingly on the interactive interface, so that the posture template provides guidance for the user in capturing or selecting material images.
In a second possible implementation, the posture template may be acquired, according to the template image, from a posture database including a number of posture templates. Exemplarily, the posture template may be acquired by the user from the posture database according to the selected template image, combined with the user's own experience or needs. Exemplarily, the posture template may also be acquired by the image processing apparatus from the posture database according to the template image; specifically, after obtaining the template image selected by the user, the image processing apparatus may perform semantic segmentation processing on the template image to obtain a classification result for each pixel in the template image. For example, after semantic segmentation is performed on the template image shown in FIG. 3A, a classification result indicating that each pixel belongs to one of {beach, sea, mountain} can be obtained. The image processing apparatus may then obtain, from the posture database and according to the classification results, one or more posture templates corresponding to the template image. When multiple posture templates are acquired, the user may select at least one posture template to refer to according to actual needs.
In one example, the image processing apparatus may obtain the one or more posture templates corresponding to the template image from the posture database according to a pre-stored correspondence between classification results and posture templates as well as the classification result of each pixel in the template image; for example, as shown in FIG. 3B and FIG. 3C, the classification result "beach" corresponds to 3 posture templates and the classification result "road" corresponds to 4 posture templates, the posture templates here being image examples.
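The pre-stored correspondence could be as simple as a lookup table keyed by scene class; the class names and template identifiers below are illustrative assumptions, not data from the present application.
```python
# Illustrative pre-stored correspondence between classification results and posture templates.
POSE_TEMPLATE_DB = {
    "beach": ["beach_pose_1.png", "beach_pose_2.png", "beach_pose_3.png"],
    "road":  ["road_pose_1.png", "road_pose_2.png", "road_pose_3.png", "road_pose_4.png"],
}

def templates_for_classes(pixel_classes):
    """Collect every posture template whose scene class appears among the pixel classification results."""
    matches = []
    for cls in pixel_classes:
        matches.extend(POSE_TEMPLATE_DB.get(cls, []))
    return matches
```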
In another example, a classification model may be trained in advance and used to obtain posture templates from the classification results. As an example, the classification model may be a deep learning model trained in a supervised manner with a number of classification result samples carrying posture template labels. The image processing apparatus may then input the classification results into the classification model and obtain, through the classification model, the one or more posture templates corresponding to the template image.
Further, the target area may be generated in the template image according to the selected posture template. In one example, the contour information of the posture template may be acquired, and the target area may then be generated in the template image according to the contour information. In another example, the contour information of the posture template may be acquired, and the target area may then be generated according to the contour information in a region of the template image specified by the user. Exemplarily, when the posture template is presented as an image, edge detection may be performed on the image to acquire the contour information of the posture template; when the posture template is presented as text or sound, the contour information of the posture template may be determined according to the semantic information obtained from the text or sound.
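When the posture template is an image, its contour information can be turned into a target-area mask; the following sketch uses OpenCV thresholding as one possible way to obtain the silhouette, with the placement offset supplied externally (for example, a user-specified region). The thresholding choice and function name are assumptions, not the method fixed by the application.
```python
import cv2
import numpy as np

def target_area_from_template(pose_template_gray: np.ndarray, template_shape, offset=(0, 0)):
    """Derive a target-area mask for the template image from an image-form posture template.

    pose_template_gray: single-channel posture template (figure drawn dark on a light background)
    template_shape:     (H, W) of the template image the area is placed into
    offset:             (y, x) position where the area should appear, e.g. on the road region
    """
    _, silhouette = cv2.threshold(pose_template_gray, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    area = np.zeros(template_shape[:2], dtype=np.uint8)
    h, w = silhouette.shape
    y, x = offset
    area[y:y + h, x:x + w] = silhouette  # paste the silhouette as the target area
    return area
```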
In yet another example, considering that the correspondence between classification results and posture templates reflects well where, in the template image, a foreground target image successfully matched with a posture template is suitable to appear (for example, the classification result "road" corresponds to 4 posture templates, and the foreground target images that successfully match these 4 posture templates are all suitable to appear where the road is located), in order to further improve the realism of the finally generated fused image, the image processing apparatus may acquire the contour information of the posture template, determine the region of the template image whose pixels belong to the classification result corresponding to the posture template, and then generate the target area in that region and/or a region adjacent to it according to the contour information. For example, if the classification result corresponding to the selected posture template is "road", the target area may be generated, according to the contour information of the posture template, in the region of the template image where the pixels belonging to "road" are located and/or in a region adjacent to it. In this embodiment, the place in the template image suitable for generating the target area is determined according to the correspondence between classification results and posture templates, so that the finally generated fused image is natural rather than stiff, which helps to improve the realism of the fused image and the user's visual experience.
In some embodiments, the user may select a material image with reference to the posture template, or capture a material image in real time with an imaging device. After obtaining the material image, the image processing apparatus may, in step S102, perform foreground segmentation processing on the material image to acquire a foreground target image. In a possible implementation, the image processing apparatus may perform semantic segmentation processing on the material image to obtain a classification result for each pixel in the material image, and then acquire the foreground target image according to the classification results; for example, after foreground segmentation is performed on the material image shown in FIG. 4A, a classification result indicating that each pixel belongs to one of {building, plant, person} can be obtained, and the foreground target image shown in FIG. 4B can then be obtained from the pixels classified as person. In this embodiment, the semantic segmentation method allows each pixel in the material image to be classified accurately, thereby improving the accuracy of the obtained foreground target image.
It can be understood that FIG. 4A and FIG. 4B are merely examples of a material image and a foreground target image respectively, and do not limit the above processing.
After acquiring the foreground target image, in step S103, the image processing apparatus may match the foreground target image with the preset posture template; in this embodiment, the posture template is used to evaluate how well the foreground target image fits the template image. In some embodiments, the matching process is to determine whether the difference between the foreground target image and the posture template satisfies a preset condition. If it does, it is determined that the foreground target image and the posture template are successfully matched, indicating that the foreground target image obtained from the material image fits the template image. If it does not, it is determined that the matching is unsuccessful, indicating that the foreground target image obtained from the material image and the template image are not suitable to be presented together and the combination may look unnatural; optionally, a reminder indicating that fusion is not possible may be output, so that the user can select another material image or end the image fusion process.
In one example, in the process of determining whether the difference between the foreground target image and the posture template satisfies the preset condition, the contour structure of the second foreground target indicated by the foreground target image and the contour structure of the posture template may be acquired, for example by performing edge detection on the foreground target image and the posture template respectively; whether the matching is successful is then determined according to the similarity between the two contour structures. For example, the matching is determined to be successful when the similarity between the two is greater than a preset threshold, indicating that the foreground target image obtained from the material image is suitable to be filled into the target area of the template image. It can be understood that the preset threshold may be set specifically according to the actual application scenario.
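One concrete way to compare the two contour structures, offered only as an illustrative stand-in for the unspecified similarity measure, is a Hu-moment shape distance on the largest outline of each binary mask (OpenCV 4.x API assumed); because matchShapes returns a distance, a lower value means higher similarity, and the threshold value is an assumption.
```python
import cv2

def contours_match(foreground_mask, pose_template_mask, max_distance=0.2):
    """Compare the outline of the second foreground target with the posture template outline.

    Both inputs are binary uint8 masks. cv2.matchShapes returns a Hu-moment based
    distance (smaller means more similar); max_distance plays the role of the preset threshold.
    """
    fg_contours, _ = cv2.findContours(foreground_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    tpl_contours, _ = cv2.findContours(pose_template_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not fg_contours or not tpl_contours:
        return False
    fg = max(fg_contours, key=cv2.contourArea)
    tpl = max(tpl_contours, key=cv2.contourArea)
    distance = cv2.matchShapes(fg, tpl, cv2.CONTOURS_MATCH_I1, 0.0)
    return distance <= max_distance
```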
In another example, in the process of determining whether the difference between the foreground target image and the posture template satisfies the preset condition, joint point detection may be performed on the second foreground target indicated by the foreground target image to obtain a joint point detection result of the second foreground target; and, when the posture template is presented as an image, joint point detection may be performed on the posture template to obtain a joint point detection result of the posture template. The joint point detection result includes at least one of the following: angles between joint points, joint point types, or the distribution positions of the joint points. Whether the matching is successful is then determined according to the difference between the joint point detection result of the second foreground target and that of the posture template; for example, the matching is determined to be successful when the difference between the two is within a preset range, indicating that the foreground target image obtained from the material image is suitable to be filled into the target area of the template image. It can be understood that the preset range may be set specifically according to the actual application scenario.
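A minimal sketch of the joint-angle comparison, assuming both joint point detection results have been reduced to angle dictionaries (for example with the joint_angles sketch above); the tolerance is an illustrative stand-in for the preset range.
```python
def posture_match(fg_angles, template_angles, tolerance_deg=15.0):
    """Match succeeds when every shared joint angle differs by less than the tolerance.

    fg_angles / template_angles: mappings from joint triplets to angles in degrees
    for the second foreground target and the posture template respectively.
    """
    shared = set(fg_angles) & set(template_angles)
    if not shared:
        return False
    return all(abs(fg_angles[k] - template_angles[k]) < tolerance_deg for k in shared)
```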
When the foreground target image is successfully matched with the posture template, the image processing apparatus may fill the foreground target image into the target area, so as to fuse the template image and the foreground target image to generate a fused image; for example, the foreground image shown in FIG. 4B is filled into the target area shown in FIG. 2C, and the template image and the foreground target image are fused to generate the fused image shown in FIG. 4C. In this embodiment, the posture template is used to determine how well the foreground target image fits the template image, and the foreground target image is filled into the target area of the template image only when the matching is successful, so that the generated fused image looks natural rather than stiff, which helps to improve the realism of the fused image and the user's visual experience.
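A minimal compositing sketch for filling the foreground target image into the target area, assuming the foreground carries an alpha mask (as in the extraction sketch above) and the target area is given by its top-left corner; variable names are illustrative.
```python
import numpy as np

def fill_into_target_area(template: np.ndarray, foreground_rgba: np.ndarray, top_left=(0, 0)):
    """Paste the foreground target image (RGBA) into the template image at the target area.

    template:        H x W x 3 uint8 background image
    foreground_rgba: h x w x 4 uint8 foreground target image with an alpha mask
    top_left:        (y, x) corner of the target area inside the template image
    """
    fused = template.copy()
    h, w = foreground_rgba.shape[:2]
    y, x = top_left
    alpha = foreground_rgba[..., 3:4].astype(np.float32) / 255.0
    roi = fused[y:y + h, x:x + w].astype(np.float32)
    blended = alpha * foreground_rgba[..., :3].astype(np.float32) + (1.0 - alpha) * roi
    fused[y:y + h, x:x + w] = blended.astype(np.uint8)
    return fused
```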
During fusion, in order to achieve a better fusion effect, the following deformation processing may be considered based on the actual scene, so that the foreground target image corresponds exactly to the target area:
In a first possible implementation, considering that the second foreground target and the target area may not be exactly identical, in order to achieve a better fusion effect, the target area may first be deformed according to the contour structure of the second foreground target indicated by the foreground target image, so that the contour structure of the target area fully fits the second foreground target, and the foreground target image is then filled into the deformed target area, so that a better fusion effect is obtained and the generated fused image looks more natural.
In a second possible implementation, considering that the size of the second foreground target may not fit the target area, in order to achieve a better fusion effect, the foreground target image may first be deformed according to the size of the target area, so that the size of the second foreground target indicated by the foreground target image fits the target area, and the deformed foreground target image is then filled into the target area, so that a better fusion effect is obtained and the generated fused image looks more natural.
It can be understood that, in practice, before the foreground target image is filled into the target area, either of the above deformation approaches may be used, or both may be used together, and this embodiment does not impose any restriction on this.
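As a sketch of the second deformation approach only, the foreground target image can be rescaled to the bounding box of the target area; using cv2.resize and a bounding-box fit is an illustrative assumption rather than the only form of deformation covered above.
```python
import cv2

def fit_foreground_to_area(foreground_rgba, area_mask):
    """Scale the foreground target image so its size fits the target area.

    foreground_rgba: h x w x 4 foreground target image
    area_mask:       binary mask of the target area inside the template image
    Returns the resized foreground and the (y, x) corner where it should be pasted.
    """
    ys, xs = area_mask.nonzero()
    top, left = int(ys.min()), int(xs.min())
    height, width = int(ys.max()) - top + 1, int(xs.max()) - left + 1
    resized = cv2.resize(foreground_rgba, (width, height), interpolation=cv2.INTER_LINEAR)
    return resized, (top, left)
```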
In some embodiments, in order to further improve the fusion effect, when the template image and the foreground target image are fused, color fusion processing may be performed on them according to the color gamut range of the foreground target image and the color gamut range of the template image. For example, the target color gamut range of the fused image to be generated may be determined according to the color gamut range of the foreground target image and the color gamut range of the template image, and color mapping may then be performed, according to the target color gamut range, on the image preliminarily synthesized from the template image and the foreground target image to generate the fused image. In this embodiment, color fusion makes the colors of the generated fused image coherent overall rather than jarring, further increasing the realism of the fused image.
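The application does not fix a particular color mapping; as one illustrative stand-in, the sketch below transfers the per-channel mean and standard deviation of the template image's LAB representation onto the preliminarily synthesized image (a Reinhard-style statistics transfer), which keeps the fused image's colors consistent with the template's color range.
```python
import cv2
import numpy as np

def harmonize_colors(preliminary_bgr: np.ndarray, template_bgr: np.ndarray) -> np.ndarray:
    """Map the preliminarily synthesized image toward the template image's color statistics."""
    src = cv2.cvtColor(preliminary_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref = cv2.cvtColor(template_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    for c in range(3):
        s_mean, s_std = src[..., c].mean(), src[..., c].std() + 1e-6
        r_mean, r_std = ref[..., c].mean(), ref[..., c].std() + 1e-6
        src[..., c] = (src[..., c] - s_mean) / s_std * r_std + r_mean
    src = np.clip(src, 0, 255).astype(np.uint8)
    return cv2.cvtColor(src, cv2.COLOR_LAB2BGR)
```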
In addition to generating a single fused image, in the embodiment shown in FIG. 5 a fused video may also be generated. FIG. 5 is a schematic flowchart of another image processing method provided by an embodiment of the present application. In this embodiment, a fused video may be generated based on a single template image; the material image may be one of a material image set, and the posture template may be one of a posture template set. The method may be executed by an image processing apparatus and includes:
In step S201, a material image set and a template image are acquired.
In step S202, foreground segmentation processing is performed on each of the material images included in the material image set, and the foreground target image corresponding to each material image is acquired.
In step S203, for the foreground target image corresponding to each material image, the foreground target image is matched with the posture templates in the posture template set; if the matching is successful, the foreground target image is filled into the template image, so as to fuse the template image and the foreground target image to generate a fused image.
In step S204, a fused video is generated according to the fused images corresponding to the material image set.
In some embodiments, the material images included in the material image set may be multiple material images selected by the user according to actual needs, or multiple frames of a material video.
In some embodiments, the posture template set may be acquired, according to the template image, from a database including multiple posture template sets. Specifically, the image processing apparatus may perform semantic segmentation processing on the template image to obtain a classification result for each pixel in the template image, and then obtain the posture template set corresponding to the template image from the posture database according to the classification results; for example, the posture template set corresponding to the template image may be obtained from the posture database according to a pre-stored correspondence between classification results and posture template sets as well as the classification result of each pixel in the template image.
In some embodiments, target areas may be generated in the template image according to the posture templates in the posture template set, so that the foreground target image corresponding to each material image and the template image present well together. The target area may be generated in the template image according to the contour information of the posture template (see the above description of generating a target area in the template image, which is not repeated here). In this embodiment, considering that the foreground target images fused with the same template image may differ, the target areas in the template image are adaptively generated based on the different posture templates in the posture template set, so as to meet the fusion needs of the different foreground target images matched with different posture templates, and the foreground target image corresponding to each material image presents well with the template image after fusion.
Exemplarily, for each posture template in the posture template set, a template image that only includes the target area corresponding to that posture template may be obtained.
In some embodiments, after acquiring the material image set, the image processing apparatus may perform semantic segmentation processing on each of the material images included in the material image set to obtain a classification result for each pixel of each material image, and then, for each material image, acquire the foreground target image corresponding to that material image according to the classification results of its pixels. In this embodiment, the semantic segmentation method allows each pixel in a material image to be classified accurately, thereby improving the accuracy of the obtained foreground target images.
Then, for the foreground target image corresponding to each material image, the image processing apparatus matches the foreground target image with the posture templates in the posture template set; for example, the foreground target image may be matched with the posture templates in the posture template set one by one until a match succeeds; or, to save computing resources, the foreground target image may be matched with one or more specified posture templates in the posture template set.
If the foreground target image is successfully matched with a posture template in the posture template set, the target posture template that successfully matches the foreground target image is determined, a target area may then be generated in the template image according to the target posture template, and the foreground target image is filled into the target area, so as to fuse the template image and the foreground target image to generate a fused image. Finally, a fused video is generated according to the fused images corresponding to the material images. In this embodiment, when the matching is successful, the foreground target images obtained from the material image set can be fused with the same template image, and the target areas in the same template image are adaptively generated based on different posture templates, so that a fused video of different actions performed in the same scene is generated, further improving the user experience.
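A minimal sketch of step S204, assembling the per-material-image fused images into a fused video with OpenCV; the codec, frame rate, and file name are illustrative choices only.
```python
import cv2

def write_fused_video(fused_frames, path="fused.mp4", fps=25):
    """Write the fused images (all the same size, BGR uint8) out as a fused video."""
    h, w = fused_frames[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in fused_frames:
        writer.write(frame)
    writer.release()
```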
The target area may be generated in advance based on each posture template in the posture template set, or, when the matching is successful, generated in the template image in real time according to the target posture template matched with the foreground target image; this embodiment does not impose any restriction on when the target area is generated.
In an exemplary application scenario, the user may select a template image and a material image set according to actual needs, and after acquiring the template image, the image processing apparatus may acquire the corresponding posture template set from the posture database according to the template image. As an example, as shown in FIG. 6A, the template image may be an image with a road background; as shown in FIG. 6B (which uses 3 material images as an example), the material images in the material image set may be obtained by photographing the user's running process; as shown in FIG. 6C (which uses 3 posture templates presented as images as an example), the posture templates in the posture template set represent running posture templates of the running process.
The image processing apparatus may perform foreground segmentation processing on each of the material images included in the material image set to acquire the foreground target image corresponding to each material image. Then, for the foreground target image corresponding to each material image, the foreground target image is matched with the posture templates in the posture template set; if the matching is successful, the target posture template matching the foreground target image is obtained, a target area is then generated in the template image according to the target posture template (as shown in FIG. 6C), and the foreground target image is filled into the target area, so as to fuse the template image and the foreground target image to generate the fused image shown in FIG. 6D. Finally, a fused video is generated according to the fused images corresponding to the material images. In this embodiment, when the matching is successful, the foreground target images obtained from the material image set can be fused with the same template image, and the target areas in the same template image are adaptively generated based on different posture templates, so that a fused video of different actions performed in the same scene is generated, further improving the user experience.
Correspondingly, in addition to generating a single fused image, in the embodiment shown in FIG. 7 a fused video may also be generated. Referring to FIG. 7, an embodiment of the present application further provides another image processing method, in which a fused video may be generated based on a template image set; the material image is one of a material image set, the posture template is one of a posture template set, and the template image is one of a template image set. The method may be executed by an image processing apparatus and includes:
In step S301, a material image set and a template image set are acquired.
In step S302, foreground segmentation processing is performed on each of the material images included in the material image set, and the foreground target image corresponding to each material image is acquired.
In step S303, for the foreground target image corresponding to each material image, the foreground target image is matched with the posture templates in the posture template set; if the matching is successful, a target template image is determined in the template image set according to the target posture template matched with the foreground target image, and the foreground target image and the target template image are fused to generate a fused image.
In step S304, a fused video is generated according to the fused images corresponding to the material image set.
在一些实施例中,所述姿势模板集可以根据所述模板图像集确定;所述模板图像集中的每张模板图像对应有一个或者多个姿势模板,可以基于所述模板图像集中的每张模板图像对应的一个或者多个姿势模板来生成所述姿势模板集,即所述姿势模板集中的姿势模板与所述模板图像集中的模板图像存在对应关系。在一个例子中,所述模板图像对应的一个或者多个姿势模板,可以根据所述模板图像中的第一前景目标生成;在另一个例子中,所述模板图像对应的一个或者多个姿势模板,可以基于所述模板图像从姿势数据库中获取。In some embodiments, the gesture template set may be determined according to the template image set; each template image in the template image set corresponds to one or more gesture templates, which may be determined based on each template in the template image set One or more gesture templates corresponding to the images are used to generate the gesture template set, that is, there is a corresponding relationship between the gesture templates in the gesture template set and the template images in the template image set. In one example, one or more gesture templates corresponding to the template image may be generated according to the first foreground target in the template image; in another example, one or more gesture templates corresponding to the template image , which may be obtained from a gesture database based on the template image.
在一些实施例中,所述模板图像集中的模板图像包括有预设的目标区域,所述将所述前景目标图像填入所述目标模板图像,包括:将所述前景目标图像填入所述目标模板图像的目标区域中。In some embodiments, the template images in the template image set include a preset target area, and filling the foreground target image into the target template image includes: filling the foreground target image into the target template image. in the target area of the target template image.
示例性地,可以对所述模板图像集中的模板图像进行前景分割处理,获得第一前景目标,并根据所述第一前景目标确定所述模板图像中的所述目标区域;然后根据所述目标区域和/或所述第一前景目标,生成所述模板图像的所述姿势模板;最后基于所述模板图像集中的所有模板图像对应的所有姿势模板得到所述姿势模板集。本实施例中,所述目标区域以及所述姿势模板均基于同一张模板图像中的第一前景目标来确定,在确定所述素材图像的前景目标图像与姿势模板匹配成功的情况下,将所述素材图像的前景目标图像填入所述模板图像的所述目标区域中,实现所述素材图像的前景目标图像与所述模板图像所提供的背景的良好结合以及真实呈现。Exemplarily, foreground segmentation processing may be performed on the template images in the template image set to obtain a first foreground target, and the target area in the template image may be determined according to the first foreground target; and then according to the target region and/or the first foreground target, the gesture template of the template image is generated; finally, the gesture template set is obtained based on all gesture templates corresponding to all template images in the template image set. In this embodiment, both the target area and the gesture template are determined based on the first foreground target in the same template image, and when it is determined that the foreground target image of the material image is successfully matched with the gesture template, the The foreground target image of the material image is filled in the target area of the template image, so as to achieve a good combination and real presentation of the foreground target image of the material image and the background provided by the template image.
在另一些实施例中,可以对所述模板图像中的模板图像进行语义分割处理,获得所述模板图像中各个像素的分类结果;然后根据所述模板图像中各个像素的分类结果从所述姿势数据库中获得各个所述模板图像对应的姿势模板;在一个例子中,可以根据预存的分类结果与姿势模板的对应关系、以及所述模板图像中各个像素的分类结果,从所述姿势数据库中获得各个所述模板图像对应的姿势模板;接着可以基于所述模板图像集中的所有模板图像对应的所有姿势模板得到所述姿势模板集;进一步地,可以根据各个所述模板图像对应的姿势模板在所述模板图像中生成所述目标区域。In other embodiments, semantic segmentation processing may be performed on the template image in the template image to obtain a classification result of each pixel in the template image; Obtain the posture template corresponding to each of the template images in the database; in one example, it can be obtained from the posture database according to the correspondence between the pre-stored classification results and the posture templates, and the classification results of each pixel in the template image. The posture template corresponding to each of the template images; then the posture template set can be obtained based on all posture templates corresponding to all the template images in the template image set; The target area is generated in the template image.
在一些实施例中,所述图像处理装置在获取所述素材图像集之后,可以对所述素材图像集包括的多张素材图像分别进行语义分割处理,获得各个素材图像中各个像素的分类结果,然后对于各个素材图像,根据该素材图像中各个像素的分类结果获取该素材图像对应的前景目标图像。本实施例中基于语义分割方法可以实现对素材图像中的每个像素进行准确分类,进而提高获得的前景目标图像的准确性。In some embodiments, after acquiring the material image set, the image processing apparatus may perform semantic segmentation processing on a plurality of material images included in the material image set, respectively, to obtain a classification result of each pixel in each material image, Then, for each material image, a foreground target image corresponding to the material image is obtained according to the classification result of each pixel in the material image. Based on the semantic segmentation method in this embodiment, each pixel in the material image can be accurately classified, thereby improving the accuracy of the obtained foreground target image.
Then, for the foreground target image corresponding to each material image, the image processing apparatus matches the foreground target image against the posture templates in the posture template set. For example, the foreground target image may be matched against the posture templates in the set one by one until a match succeeds; alternatively, to save computing resources, the foreground target image may be matched only against one or more designated posture templates in the posture template set.
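The sequential matching loop could look roughly like the following; the thresholds and the particular contour/joint distance measures are illustrative choices rather than values taken from the application, and the foreground and template joints are assumed to be detected in the same order.

```python
import cv2
import numpy as np

def matches(fg_contour, fg_joints, pose_template, shape_thresh=0.3, joint_thresh=20.0):
    """One plausible match test: contour similarity or joint-point distance.

    The thresholds are illustrative, and fg_joints and pose_template["joints"]
    are assumed to list the same joints in the same order.
    """
    # cv2.matchShapes returns a dissimilarity score (smaller = more similar).
    shape_dist = cv2.matchShapes(fg_contour, pose_template["contour"],
                                 cv2.CONTOURS_MATCH_I1, 0.0)
    joint_dist = float(np.mean(np.linalg.norm(fg_joints - pose_template["joints"], axis=1)))
    return shape_dist < shape_thresh or joint_dist < joint_thresh

def find_matching_template(fg_contour, fg_joints, pose_template_set):
    """Walk the posture template set in order and stop at the first successful match."""
    for idx, pose_template in enumerate(pose_template_set):
        if matches(fg_contour, fg_joints, pose_template):
            return idx
    return None  # no posture template matched
```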
If the foreground target image matches a posture template in the posture template set, the target template image corresponding to the matched target posture template may be determined in the template image set; the foreground target image is then filled into the target area of the target template image so as to fuse the foreground target image with the target template image and generate a fusion image; finally, a fusion video is generated from the fusion images corresponding to the individual material images. In this embodiment, when the matching succeeds, the foreground target images obtained from the material image set can be fused with the template images in the template image set, thereby generating a fusion video in which the same or different actions are performed in different scenes and further improving the user experience.
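One simple way to realise the fill-and-fuse step, assuming a rectangular target area and a one-to-one mapping from matched posture templates to template images, is sketched below; the trailing comment shows how a fusion video could then be assembled with OpenCV.

```python
import cv2
import numpy as np

def fuse_into_target_area(foreground, fg_mask, target_template_img, target_area):
    """Paste the foreground target image into the target area of the target template image.

    target_area is (x, y, w, h); the foreground and its mask are resized to that
    box before compositing, which corresponds to deforming the foreground target
    image according to the size of the target area.
    """
    x, y, w, h = target_area
    fg = cv2.resize(foreground, (w, h))
    m = cv2.resize(fg_mask.astype(np.uint8), (w, h)) > 0

    fused = target_template_img.copy()
    roi = fused[y:y + h, x:x + w]   # view into the fused image
    roi[m] = fg[m]
    return fused

# One fused frame per material image can then be written out in order, e.g.:
#   writer = cv2.VideoWriter("fusion.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (W, H))
#   for frame in fused_frames:
#       writer.write(frame)
#   writer.release()
```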
In some embodiments, to further improve the fusion result, when the foreground target image is fused with the target template image, the template images in the template image set that are adjacent to the target template image may be determined, where the adjacent template images include template images adjacent in acquisition time; for example, when the template image set is a set of frames of a template video, the template images adjacent to the target template image may be the frames immediately before and after it. Finally, the foreground target image, the target template image and the adjacent template images are fused. In this embodiment, the adjacent template images can be used to fill the regions left vacant after the foreground target image and the target template image are preliminarily composited, thereby achieving a better fusion result.
In a possible implementation, according to the position information of the target area of the target template image, a local image may be extracted at the same position in a template image adjacent to the target template image, and the foreground target image, the target template image and the local image are then fused. In this embodiment, the local image can be used to fill the regions left vacant after the foreground target image and the target template image are preliminarily composited, thereby achieving a better fusion result.
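A sketch of the neighbour-frame patching described in the last two paragraphs is given below; vacancy_mask is assumed to mark the pixels inside the target area that remain uncovered after the preliminary composition.

```python
import numpy as np

def fill_vacancy_from_neighbor(fused, vacancy_mask, neighbor_template_img, target_area):
    """Fill pixels left vacant after the preliminary composition.

    The local image is taken at the same position (the target area) from a
    template frame that is adjacent in acquisition time.
    """
    x, y, w, h = target_area
    local = neighbor_template_img[y:y + h, x:x + w]

    patched = fused.copy()
    roi = patched[y:y + h, x:x + w]  # view into the patched image
    roi[vacancy_mask] = local[vacancy_mask]
    return patched
```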
Correspondingly, referring to FIG. 8, an embodiment of the present application further provides an image processing apparatus, including a processor 401 and a memory 402 storing a computer program.
The processor 401 implements the following steps when executing the computer program:
acquiring a material image and a template image;
performing foreground segmentation on the material image to obtain a foreground target image;
matching the foreground target image with a preset posture template, and if the matching succeeds, filling the foreground target image into the template image so as to fuse the template image with the foreground target image and generate a fusion image.
In an embodiment, the template image includes a preset target area.
The processor 401 is further configured to fill the foreground target image into the target area.
In an embodiment, the processor 401 is further configured to: perform foreground segmentation on the template image to obtain a first foreground target, and determine the target area in the template image according to the first foreground target; and generate the posture template according to the target area and/or the first foreground target.
In an embodiment, the processor 401 is further configured to: perform joint point detection on the first foreground target to obtain a joint point detection result; and generate the posture template according to the joint point detection result.
In an embodiment, the posture template is selected from a posture database according to the template image, and the posture database includes a number of posture templates.
In an embodiment, the processor 401 is further configured to:
perform semantic segmentation on the template image to obtain a classification result for each pixel in the template image; and
obtain the posture template corresponding to the template image from the posture database according to a pre-stored correspondence between classification results and posture templates and the classification results of the pixels in the template image.
In an embodiment, the processor 401 is further configured to generate the target area in the template image according to the posture template.
In an embodiment, the processor 401 is further configured to determine that the matching succeeds if the difference between the foreground target image and the preset posture template satisfies a preset condition.
In an embodiment, the preset condition includes at least one of the following:
the similarity between the contour structure of the second foreground target indicated by the foreground target image and the contour structure of the posture template is greater than a preset threshold; or, the difference between the joint point detection result of the second foreground target indicated by the foreground target image and the joint point detection result of the posture template is within a preset range.
In an embodiment, the joint point detection result includes at least one of the following: angles between joint points, joint point types, or the distribution positions of the joint points.
In an embodiment, the processor 401 is further configured to: deform the target area according to the contour structure of the second foreground target indicated by the foreground target image; and fill the foreground target image into the deformed target area.
In an embodiment, the processor 401 is further configured to: deform the foreground target image according to the size of the target area; and fill the deformed foreground target image into the target area.
In an embodiment, the processor 401 is further configured to: perform semantic segmentation on the material image to obtain a classification result for each pixel in the material image; and obtain the foreground target image according to the classification results of the pixels in the material image.
In an embodiment, the material image is one of a material image set, and the posture template is one of a posture template set.
The processor 401 is further configured to: after acquiring the foreground target image of a material image in the material image set, if the foreground target image matches a posture template in the posture template set, fuse the foreground target image with the template image to generate a fusion image; and generate a fusion video from the fusion images corresponding to the material image set.
In an embodiment, the target area in the template image is generated according to the target posture template matched with the foreground target image.
The processor is further configured to fill the foreground target image into the target area of the template image.
In an embodiment, the template image is one of a template image set.
The processor 401 is further configured to:
determine a target template image in the template image set according to the target posture template matched with the foreground target image; and fuse the foreground target image with the target template image.
In an embodiment, the processor 401 is further configured to:
determine the template images in the template image set that are adjacent to the target template image, wherein the template images adjacent to the target template image include template images adjacent in acquisition time; and
fuse the foreground target image, the target template image and the template images adjacent to the target template image.
In an embodiment, the processor 401 is further configured to perform color fusion processing on the template image and the foreground target image according to the color gamut range of the posture image and the color gamut range of the template image.
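Purely as an illustration of such color fusion, a Reinhard-style mean/standard-deviation transfer in Lab space could be used to bring the two color ranges together; the application does not prescribe this particular formula.

```python
import cv2
import numpy as np

def color_fuse(foreground, template_img):
    """Nudge the foreground's colours toward the template image's colour statistics.

    A Reinhard-style mean/standard-deviation transfer in Lab space is used here
    as an illustrative stand-in for matching the two colour ranges.
    """
    fg = cv2.cvtColor(foreground, cv2.COLOR_BGR2LAB).astype(np.float32)
    bg = cv2.cvtColor(template_img, cv2.COLOR_BGR2LAB).astype(np.float32)

    fg_mean, fg_std = fg.reshape(-1, 3).mean(0), fg.reshape(-1, 3).std(0) + 1e-6
    bg_mean, bg_std = bg.reshape(-1, 3).mean(0), bg.reshape(-1, 3).std(0) + 1e-6

    out = (fg - fg_mean) / fg_std * bg_std + bg_mean
    out = np.clip(out, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_LAB2BGR)
```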
As for the apparatus embodiments, since they substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant parts. The various embodiments described herein may be implemented using a computer-readable medium such as computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described herein may be implemented using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein. For a software implementation, embodiments such as procedures or functions may be implemented with separate software modules that perform at least one function or operation. The software code may be implemented by a software application (or program) written in any suitable programming language, and may be stored in a memory and executed by a controller.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory including instructions, where the instructions are executable by a processor of an apparatus to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium is provided, where the instructions in the storage medium, when executed by a processor of a terminal, enable the terminal to perform the above method.
It should be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. The terms "comprising", "including" or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The method and apparatus provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope based on the ideas of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (37)

  1. An image processing method, comprising:
    acquiring a material image and a template image;
    performing foreground segmentation on the material image to obtain a foreground target image; and
    matching the foreground target image with a preset posture template, and if the matching succeeds, filling the foreground target image into the template image, so as to fuse the template image with the foreground target image and generate a fusion image.
  2. The method according to claim 1, wherein the template image comprises a preset target area; and
    filling the foreground target image into the template image comprises:
    filling the foreground target image into the target area.
  3. The method according to claim 2, wherein the target area and the posture template are obtained by:
    performing foreground segmentation on the template image to obtain a first foreground target, and determining the target area in the template image according to the first foreground target; and
    generating the posture template according to the target area and/or the first foreground target.
  4. The method according to claim 3, wherein generating the posture template according to the first foreground target comprises:
    performing joint point detection on the first foreground target to obtain a joint point detection result; and
    generating the posture template according to the joint point detection result.
  5. The method according to claim 1, wherein the posture template is selected from a posture database according to the template image, and the posture database comprises a number of posture templates.
  6. The method according to claim 5, wherein the posture template is obtained by:
    performing semantic segmentation on the template image to obtain a classification result for each pixel in the template image; and
    obtaining the posture template corresponding to the template image from the posture database according to a pre-stored correspondence between classification results and posture templates and the classification results of the pixels in the template image.
  7. The method according to claim 6, further comprising:
    generating the target area in the template image according to the posture template.
  8. The method according to claim 1, wherein the matching is determined to be successful by:
    determining that the matching succeeds if the difference between the foreground target image and the preset posture template satisfies a preset condition.
  9. The method according to claim 8, wherein the preset condition comprises at least one of the following:
    the similarity between the contour structure of the second foreground target indicated by the foreground target image and the contour structure of the posture template is greater than a preset threshold;
    or, the difference between the joint point detection result of the second foreground target indicated by the foreground target image and the joint point detection result of the posture template is within a preset range.
  10. The method according to claim 4 or 9, wherein the joint point detection result comprises at least one of the following: angles between joint points, joint point types, or distribution positions of the joint points.
  11. The method according to claim 2, wherein filling the foreground target image into the target area comprises:
    deforming the target area according to the contour structure of the second foreground target indicated by the foreground target image; and
    filling the foreground target image into the deformed target area.
  12. The method according to claim 2, wherein filling the foreground target image into the target area comprises:
    deforming the foreground target image according to the size of the target area; and
    filling the deformed foreground target image into the target area.
  13. The method according to claim 1, wherein performing foreground segmentation on the material image comprises:
    performing semantic segmentation on the material image to obtain a classification result for each pixel in the material image; and
    obtaining the foreground target image according to the classification results of the pixels in the material image.
  14. The method according to claim 1, wherein the material image is one of a material image set, and the posture template is one of a posture template set;
    the method further comprising:
    after acquiring the foreground target image of a material image in the material image set, if the foreground target image is successfully matched with a posture template in the posture template set, fusing the foreground target image with the template image to generate a fusion image; and
    generating a fusion video according to the fusion images corresponding to the material image set.
  15. The method according to claim 14, wherein the target area in the template image is generated according to the target posture template matched with the foreground target image; and
    fusing the foreground target image with the template image further comprises: filling the foreground target image into the target area of the template image.
  16. The method according to claim 14, wherein the template image is one of a template image set; and
    fusing the foreground target image with the template image comprises:
    determining a target template image in the template image set according to the target posture template matched with the foreground target image; and
    fusing the foreground target image with the target template image.
  17. The method according to claim 16, wherein fusing the foreground target image with the target template image comprises:
    determining the template images in the template image set that are adjacent to the target template image, wherein the template images adjacent to the target template image include template images adjacent in acquisition time; and
    fusing the foreground target image, the target template image and the template images adjacent to the target template image.
  18. The method according to claim 1, wherein fusing the template image with the foreground target image further comprises:
    performing color fusion processing on the template image and the foreground target image according to the color gamut range of the posture image and the color gamut range of the template image.
  19. An image processing apparatus, comprising a processor and a memory storing a computer program, wherein
    the processor implements the following steps when executing the computer program:
    acquiring a material image and a template image;
    performing foreground segmentation on the material image to obtain a foreground target image; and
    matching the foreground target image with a preset posture template, and if the matching succeeds, filling the foreground target image into the template image, so as to fuse the template image with the foreground target image and generate a fusion image.
  20. The apparatus according to claim 19, wherein the template image comprises a preset target area; and
    the processor is further configured to fill the foreground target image into the target area.
  21. The apparatus according to claim 19, wherein the processor is further configured to: perform foreground segmentation on the template image to obtain a first foreground target, and determine the target area in the template image according to the first foreground target; and generate the posture template according to the target area and/or the first foreground target.
  22. The apparatus according to claim 21, wherein the processor is further configured to: perform joint point detection on the first foreground target to obtain a joint point detection result; and generate the posture template according to the joint point detection result.
  23. The apparatus according to claim 19, wherein the posture template is selected from a posture database according to the template image, and the posture database comprises a number of posture templates.
  24. The apparatus according to claim 23, wherein the processor is further configured to:
    perform semantic segmentation on the template image to obtain a classification result for each pixel in the template image; and
    obtain the posture template corresponding to the template image from the posture database according to a pre-stored correspondence between classification results and posture templates and the classification results of the pixels in the template image.
  25. The apparatus according to claim 24, wherein the processor is further configured to generate the target area in the template image according to the posture template.
  26. The apparatus according to claim 19, wherein the processor is further configured to determine that the matching succeeds if the difference between the foreground target image and the preset posture template satisfies a preset condition.
  27. The apparatus according to claim 26, wherein the preset condition comprises at least one of the following:
    the similarity between the contour structure of the second foreground target indicated by the foreground target image and the contour structure of the posture template is greater than a preset threshold;
    or, the difference between the joint point detection result of the second foreground target indicated by the foreground target image and the joint point detection result of the posture template is within a preset range.
  28. The apparatus according to claim 22 or 26, wherein the joint point detection result comprises at least one of the following: angles between joint points, joint point types, or distribution positions of the joint points.
  29. The apparatus according to claim 19, wherein the processor is further configured to: deform the target area according to the contour structure of the second foreground target indicated by the foreground target image; and fill the foreground target image into the deformed target area.
  30. The apparatus according to claim 19, wherein the processor is further configured to: deform the foreground target image according to the size of the target area; and fill the deformed foreground target image into the target area.
  31. The apparatus according to claim 19, wherein the processor is further configured to: perform semantic segmentation on the material image to obtain a classification result for each pixel in the material image; and obtain the foreground target image according to the classification results of the pixels in the material image.
  32. The apparatus according to claim 19, wherein the material image is one of a material image set, and the posture template is one of a posture template set; and
    the processor is further configured to:
    after acquiring the foreground target image of a material image in the material image set, if the foreground target image is successfully matched with a posture template in the posture template set, fuse the foreground target image with the template image to generate a fusion image; and
    generate a fusion video according to the fusion images corresponding to the material image set.
  33. The apparatus according to claim 32, wherein the target area in the template image is generated according to the target posture template matched with the foreground target image; and
    the processor is further configured to fill the foreground target image into the target area of the template image.
  34. The apparatus according to claim 32, wherein the template image is one of a template image set; and
    the processor is further configured to:
    determine a target template image in the template image set according to the target posture template matched with the foreground target image; and
    fuse the foreground target image with the target template image.
  35. The apparatus according to claim 34, wherein the processor is further configured to:
    determine the template images in the template image set that are adjacent to the target template image, wherein the template images adjacent to the target template image include template images adjacent in acquisition time; and
    fuse the foreground target image, the target template image and the template images adjacent to the target template image.
  36. The apparatus according to claim 19, wherein the processor is further configured to perform color fusion processing on the template image and the foreground target image according to the color gamut range of the posture image and the color gamut range of the template image.
  37. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the method according to any one of claims 1 to 18.
PCT/CN2021/079924 2021-03-10 2021-03-10 Method and device for image processing, and storage medium WO2022188056A1 (en)

Priority Applications / Applications Claiming Priority / Family Applications (1)

Application Number: PCT/CN2021/079924 (publication WO2022188056A1, en)
Priority Date / Filing Date: 2021-03-10
Title: Method and device for image processing, and storage medium

Publications (1)

Publication Number: WO2022188056A1

Family

ID=83226163

Country Status (1)

Country: WO (1)
Link: WO2022188056A1 (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130028517A1 (en) * 2011-07-27 2013-01-31 Samsung Electronics Co., Ltd. Apparatus, method, and medium detecting object pose
CN105120144A (en) * 2015-07-31 2015-12-02 小米科技有限责任公司 Image shooting method and device
CN107230182A (en) * 2017-08-03 2017-10-03 腾讯科技(深圳)有限公司 A kind of processing method of image, device and storage medium
CN107808373A (en) * 2017-11-15 2018-03-16 北京奇虎科技有限公司 Sample image synthetic method, device and computing device based on posture
CN109299659A (en) * 2018-08-21 2019-02-01 中国农业大学 A kind of human posture recognition method and system based on RGB camera and deep learning
CN109743504A (en) * 2019-01-22 2019-05-10 努比亚技术有限公司 A kind of auxiliary photo-taking method, mobile terminal and storage medium
CN110113523A (en) * 2019-03-15 2019-08-09 深圳壹账通智能科技有限公司 Intelligent photographing method, device, computer equipment and storage medium
CN110335277A (en) * 2019-05-07 2019-10-15 腾讯科技(深圳)有限公司 Image processing method, device, computer readable storage medium and computer equipment
CN110473266A (en) * 2019-07-08 2019-11-19 南京邮电大学盐城大数据研究院有限公司 A kind of reservation source scene figure action video generation method based on posture guidance
CN110602396A (en) * 2019-09-11 2019-12-20 腾讯科技(深圳)有限公司 Intelligent group photo method and device, electronic equipment and storage medium
CN111062276A (en) * 2019-12-03 2020-04-24 广州极泽科技有限公司 Human body posture recommendation method and device based on human-computer interaction, machine readable medium and equipment

Legal Events

Code Description
121  Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21929535; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122  Ep: pct application non-entry in european phase (Ref document number: 21929535; Country of ref document: EP; Kind code of ref document: A1)