CN112258435A - Image processing method and related product - Google Patents

Image processing method and related product

Info

Publication number
CN112258435A
CN112258435A (application CN202011117195.XA)
Authority
CN
China
Prior art keywords
image
target
silhouette
shadow
plane
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011117195.XA
Other languages
Chinese (zh)
Inventor
吴磊
姚超睿
曹恩丹
王元吉
彭南京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202011117195.XA
Publication of CN112258435A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G06T7/50 Depth or shape recovery
    • G06T7/507 Depth or shape recovery from shading
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiment of the application discloses an image processing method and a related product. The method includes: performing shadow estimation processing based on a first image and a second image to obtain a target silhouette of a target object in the first image, the first image being a foreground area image obtained from an original image; and performing image fusion processing on the first image, the second image and the target silhouette of the target object to obtain a target image, in which the target silhouette serves as a shadow of the target object in the first image. In the embodiment of the application, by performing image fusion processing on the first image, the second image and the target silhouette of the target object, a target image containing the shadow of the target object can be obtained, so the target image looks more realistic.

Description

Image processing method and related product
Technical Field
The present application relates to the field of image processing, and in particular, to an image processing method and related product.
Background
With the development of image processing technology, video background changing and image background changing are applied more and more widely. In the currently adopted video background-changing schemes and image background-changing schemes, the image fusion effect still needs to be improved.
Disclosure of Invention
The embodiment of the application discloses an image processing method and a related product, which can automatically generate the shadow of a target object in an image so that the image looks more realistic.
In a first aspect, an embodiment of the present application provides an image processing method, including: performing shadow estimation processing based on a first image and a second image to obtain a target silhouette of a target object in the first image; the first image is a foreground area image obtained from an original image; performing image fusion processing on the first image, the second image and the target silhouette of the target object to obtain a target image; the target silhouette in the target image is taken as a shadow of the target object in the first image.
In the embodiment of the application, image fusion processing is carried out on the first image, the second image and a target silhouette of a target object to obtain a target image; the target image containing the shadow of the target object can be obtained, and the target image is more real.
In one possible implementation manner, the performing the shadow estimation processing based on the first image and the second image to obtain the target silhouette includes: processing the first image to obtain an original silhouette of the target object; the original silhouette is used for describing the outline of the target object in the first image; utilizing the second image to estimate a shadow angle to obtain a target projection angle; performing affine transformation processing on the original silhouette of the first image based on the target projection angle to obtain the target silhouette; the boundary frame of the target silhouette is a parallelogram with the angle being the target projection angle.
In some embodiments, the outline of the target object in the first image may be understood as the outline of the first image, and the original silhouette may be understood as describing the outline of the first image. The boundary frame of the original silhouette is a rectangular frame whose corners are right angles.
In the implementation mode, affine transformation processing is carried out on an original silhouette of the first image based on a target projection angle to obtain a target silhouette; the shadow of the target object can be made more realistic.
In a possible implementation manner, the processing the first image to obtain an original silhouette of the target object includes: setting each pixel point in the first image to 0 to obtain the original silhouette.
In a possible implementation manner, the performing the shadow angle estimation by using the second image to obtain the target projection angle includes: performing depth estimation and normal vector estimation on the second image to obtain a target plane and a normal vector of the target plane; the target plane is used for describing a plane where the ground in the second image is located; determining the target projection angle based on the target plane and the normal vector.
In the implementation mode, the target projection angle can be accurately and quickly obtained.
In one possible implementation, the determining the target projection angle based on the target plane and the normal vector includes: taking, as the target projection angle, the angle between a coordinate axis of a target image coordinate system and the projection, in the target image coordinate system, of a straight line that passes through the origin of the target image coordinate system and is perpendicular to the normal vector; the target image coordinate system is the image coordinate system of the second image and coincides with the target plane.
In a possible implementation manner, the performing image fusion processing based on the first image, the second image, and the target silhouette to obtain a target image includes: taking the first image as a foreground image and the second image as a background image to perform front-background fusion processing to obtain an intermediate image; and obtaining the target image based on the intermediate image and the target silhouette.
In one possible implementation manner, the obtaining the target image based on the intermediate image and the target silhouette includes: performing person foot positioning processing on the intermediate image to obtain position information of a person foot region in the intermediate image; and performing image fusion processing on the target silhouette and the intermediate image based on the position information of the person foot region to obtain the target image; the foot region of the target silhouette in the target image overlaps the person foot region.
The overlap of the foot region in the target silhouette with the person foot region may be that at least a portion of the foot region in the target silhouette overlaps at least a portion of the person foot region.
In this implementation, shadows in the target image can be made more natural and realistic.
In one possible implementation manner, the performing the shadow estimation processing based on the first image and the second image to obtain the target silhouette of the target object in the first image includes: taking the first image as a foreground image and the second image as a background image to perform front-background fusion processing to obtain an intermediate image; performing depth estimation and normal vector estimation on the second image to obtain a target plane and a normal vector of the target plane; the target plane is used for describing a plane where the ground in the second image is located; determining a mapping of a calibration region in the target plane to a shadow region of the intermediate image based on the normal vector; the calibration area is a parallelogram area, and a first edge of the calibration area and a second edge of a boundary frame of the target object in the intermediate image are mapped to be overlapped with a target line segment of the target plane; obtaining the target silhouette based on the shadow region and the first image; and the graph corresponding to the boundary frame of the target silhouette is the same as the graph corresponding to the shadow area.
In this implementation, the shadow region in the intermediate image can be determined more accurately.
In one possible implementation, before determining that the calibration region in the target plane is mapped to the shadow region of the intermediate image based on the normal vector, the method further includes: obtaining the target line segment of the second edge of the bounding box of the target object mapped to the target plane in the intermediate image; the second edge is the edge closest to the feet of the person in the boundary frame of the target object; obtaining the calibration area in the target plane based on the target line segment; the target line segment is the first side of the calibration area.
In the implementation mode, the calibration area in the target plane can be determined reasonably, so that the shadow area generated based on the calibration area is more real and natural.
In one possible implementation manner, the angle of the calibration region is 45 degrees, and/or the length of the third side of the calibration region is two thirds of the length of the fourth side of the bounding box of the target object in the intermediate image.
In a possible implementation manner, the performing front-background fusion processing with the first image as a foreground image and the second image as a background image to obtain an intermediate image includes: processing the first image and the second image to obtain a target position of the first image relative to the second image; and performing image fusion processing on the first image and the second image based on the target position to obtain the intermediate image.
In this implementation, the image processing device automatically determines the position of the first image relative to the second image, and the determined relative position is more reasonable, so that front-background fusion of the first image and the second image yields an image with a better fusion effect.
In one possible implementation manner, the obtaining the target position of the first image relative to the second image by processing the first image and the second image includes: performing ground identification processing on the second image to obtain position information of a ground area in the second image; determining the target position of the first image relative to the second image as a first central region of the ground area in case a target boundary line in the second image is located in the ground area, and/or determining the target position of the first image relative to the second image as a second central region of the second image in case the target boundary line in the second image is not located in the ground area.
In this implementation, the position of the first image relative to the second image may be more reasonably determined.
In one possible implementation manner, the obtaining the target position of the first image relative to the second image by processing the first image and the second image includes: performing ground identification processing on the second image to obtain position information of a ground area in the second image; determining the target position of the first image relative to the second image based on location information of the ground area; the determination of the target position is such that a target area in the first image is located within the ground area relative to the second image, the target area containing an area in contact with the ground in the original image.
In this implementation, the area in contact with the ground in the original image may be located within the ground area relative to the second image, conforming to the real scene.
In one possible implementation manner, the obtaining the target position of the first image relative to the second image by processing the first image and the second image includes: processing the first image to obtain position information of a reference area in the first image; the reference area is any area in the first image; and obtaining the target position of the first image relative to the second image based on the position information of the reference area; the determination of the target position is such that a specific position of the reference area is located within a head-up region of the second image; the head-up region includes a vanishing line obtained from a plurality of vanishing points, any one of which is a point where two lines that are parallel in the real world intersect in the second image. Optionally, the reference area includes a boundary line of the first image dividing the first image into two parts having a height ratio of one to two. The reference area may be an image region between two straight lines in the first image determined according to the height of the first image.
In this implementation, the position of the first image relative to the second image may be more reasonably determined.
In one possible implementation manner, the obtaining the target position of the first image relative to the second image by processing the first image and the second image includes: processing the first image to obtain position information of a calibration line in the first image; and obtaining the target position of the first image relative to the second image based on the position information of the calibration line; the determination of the target position is such that the calibration line is located within a head-up region of the second image; the head-up region includes a vanishing line obtained from a plurality of vanishing points, any one of which is a point where two lines that are parallel in the real world intersect in the second image. The calibration line may be a boundary line dividing the first image into two parts with a height ratio of one to two, or may be another boundary line, which is not limited in this embodiment of the application.
In one possible implementation, the second image is a frame image in the target video, and the method further includes: identifying whether lens motion exists in the target video; under the condition that lens motion exists in the target video, identifying a path of the lens motion; translating and/or scaling the third image to obtain a fourth image based on the path of the lens movement; the third image is a frame of image in the target video and is shot later than the second image; performing image fusion processing on the first image, the fourth image and the target silhouette of the target object to obtain a fifth image; the target silhouette in the fifth image is taken as a shadow of the target object in the fifth image.
In the implementation manner, the third image in the target video is translated and/or zoomed based on the lens motion, so that the processing operation on the background image can be reduced, and the efficiency is improved.
In one possible implementation, the method further includes: adjusting the transparency of the target silhouette in the target image, and/or performing feathering on the target silhouette in the target image. Optionally, the transparency of the target silhouette in the target image is adjusted to 60%.
In one possible implementation manner, a ratio of a height of the target silhouette in the target image to a height of the foreground image in the target image is a target value. The target value may be one third, one quarter, etc.
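As an illustration of this height-ratio constraint, the following Python/OpenCV sketch resizes the silhouette to the target value; the function name, the choice of one third as the default, and the decision to preserve the aspect ratio are assumptions, not requirements of the embodiment.

```python
import cv2

def scale_silhouette_height(silhouette_rgba, foreground_height, target_value=1.0 / 3.0):
    """Resize the target silhouette so that its height is `target_value` times the
    height of the foreground image in the target image (e.g. one third)."""
    h, w = silhouette_rgba.shape[:2]
    new_h = max(1, int(foreground_height * target_value))
    new_w = max(1, int(w * new_h / h))                        # preserve the silhouette's aspect ratio
    return cv2.resize(silhouette_rgba, (new_w, new_h), interpolation=cv2.INTER_AREA)
```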
In a second aspect, an embodiment of the present application provides an image processing apparatus, including: the shadow estimation unit is used for carrying out shadow estimation processing on the basis of a first image and a second image to obtain a target silhouette of a target object in the first image; the first image is a foreground area image obtained from an original image; the image fusion unit is used for carrying out image fusion processing on the first image, the second image and the target silhouette of the target object to obtain a target image; the target silhouette in the target image is taken as a shadow of the target object in the first image.
In a possible implementation manner, the shadow estimation unit is specifically configured to process the first image to obtain an original silhouette of the target object; the original silhouette is used for describing the outline of the target object in the first image; utilizing the second image to estimate a shadow angle to obtain a target projection angle; performing affine transformation processing on the original silhouette of the first image based on the target projection angle to obtain the target silhouette; the boundary frame of the target silhouette is a parallelogram with the angle being the target projection angle.
In a possible implementation manner, the shadow estimation unit is specifically configured to adjust each pixel point in the first image to 0, so as to obtain the original silhouette.
In a possible implementation manner, the shadow estimation unit is specifically configured to perform depth estimation and normal vector estimation on the second image to obtain a target plane and a normal vector of the target plane; the target plane is used for describing a plane where the ground in the second image is located; determining the target projection angle based on the target plane and the normal vector.
In a possible implementation manner, the shadow estimation unit is specifically configured to use an angle between a projection of a straight line passing through an origin of a target image coordinate system and perpendicular to the normal vector in the target image coordinate system and a coordinate axis of the target image coordinate system as the target projection angle; the target image coordinate system is an image coordinate system in the second image and coincides with the target plane.
In a possible implementation manner, the image fusion unit is specifically configured to perform front-background fusion processing on the first image as a foreground image and the second image as a background image to obtain an intermediate image; and obtaining the target image based on the intermediate image and the target silhouette.
In a possible implementation manner, the image fusion unit is specifically configured to perform person foot positioning processing on the intermediate image to obtain position information of a person foot region in the intermediate image; performing image fusion processing on the target silhouette and the intermediate image based on the position information of the foot area of the person to obtain the target image; and overlapping the foot area in the target silhouette in the target image with the human foot area.
In a possible implementation manner, the shadow estimation unit is specifically configured to perform front-background fusion processing on the first image as a foreground image and the second image as a background image to obtain an intermediate image; performing depth estimation and normal vector estimation on the second image to obtain a target plane and a normal vector of the target plane; the target plane is used for describing a plane where the ground in the second image is located; determining a mapping of a calibration region in the target plane to a shadow region of the intermediate image based on the normal vector; the calibration area is a parallelogram area, and a first edge of the calibration area and a second edge of a boundary frame of the target object in the intermediate image are mapped to be overlapped with a target line segment of the target plane; obtaining the target silhouette based on the shadow region and the first image; and the graph corresponding to the boundary frame of the target silhouette is the same as the graph corresponding to the shadow area.
In a possible implementation manner, the shadow estimation unit is further configured to obtain the target line segment of the target plane to which the second edge of the bounding box of the target object in the intermediate image is mapped; the second edge is the edge closest to the feet of the person in the boundary frame of the target object; obtaining the calibration area in the target plane based on the target line segment; the target line segment is the first side of the calibration area.
In one possible implementation manner, the angle of the calibration region is 45 degrees, and/or the length of the third side of the calibration region is two thirds of the length of the fourth side of the bounding box of the target object in the intermediate image.
In a possible implementation manner, the image fusion unit is specifically configured to obtain a target position of the first image relative to the second image by processing the first image and the second image; and carrying out image fusion processing on the first image and the second image based on the target position to obtain the intermediate image.
In a possible implementation manner, the image fusion unit is specifically configured to perform ground identification processing on the second image to obtain location information of a ground area in the second image; determining the target position of the first image relative to the second image as a first central region of the ground area in case a target boundary line in the second image is located in the ground area, and/or determining the target position of the first image relative to the second image as a second central region of the second image in case the target boundary line in the second image is not located in the ground area.
In a possible implementation manner, the image fusion unit is specifically configured to perform ground identification processing on the second image to obtain location information of a ground area in the second image; determining the target position of the first image relative to the second image based on location information of the ground area; the determination of the target position is such that a target area in the first image is located within the ground area relative to the second image, the target area containing an area in contact with the ground in the original image.
In a possible implementation manner, the image fusion unit is specifically configured to process the first image to obtain position information of a reference area in the first image; the reference area is any area in the first image; and obtain the target position of the first image relative to the second image based on the position information of the reference area; the determination of the target position is such that a specific position of the reference area is located within a head-up region of the second image; the head-up region includes a vanishing line obtained from a plurality of vanishing points, any one of which is a point where two lines that are parallel in the real world intersect in the second image.
In a possible implementation manner, the image fusion unit is further configured to identify whether there is lens motion in the target video; under the condition that lens motion exists in the target video, identifying a path of the lens motion; translating and/or scaling the third image to obtain a fourth image based on the path of the lens movement; the third image is a frame of image in the target video and is shot later than the second image; performing image fusion processing on the first image, the fourth image and the target silhouette of the target object to obtain a fifth image; the target silhouette in the fifth image is taken as a shadow of the target object in the fifth image.
In a possible implementation manner, the image fusion unit is further configured to adjust a transparency of the target silhouette in the target image, and/or perform feathering on the target silhouette in the target image. Optionally, the transparency of the target silhouette in the target image is adjusted to 60%.
In one possible implementation manner, a ratio of a height of the target silhouette in the target image to a height of the foreground image in the target image is a target value. The target value may be one third, one quarter, etc.
With regard to the technical effects brought about by the second aspect or various alternative embodiments, reference may be made to the introduction of the technical effects of the first aspect or the corresponding implementation.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory, wherein the memory is configured to store instructions and the processor is configured to execute the instructions stored by the memory, so that the processor performs the method according to the first aspect and any possible implementation manner.
In a fourth aspect, an embodiment of the present application provides a chip, where the chip includes a data interface and a processor, where the processor is configured to execute the method in the first aspect or any possible implementation manner of the first aspect.
In a fifth aspect, the present application provides a computer-readable storage medium storing a computer program, where the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the method of the first aspect and any optional implementation manner.
In a sixth aspect, the present application provides a computer program product, which includes program instructions, and when executed by a processor, causes the processor to execute the method of the first aspect and any optional implementation manner.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating an example of a target image according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of another image processing method provided in the embodiments of the present application;
FIG. 4 is a schematic diagram of an example of an original silhouette provided by an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating an example of a target image coordinate system and normal vectors according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of a bounding box of an original silhouette and a bounding box of a target silhouette according to an embodiment of the present disclosure;
FIG. 7 is a flowchart of another image processing method provided in the embodiments of the present application;
FIG. 8 is a diagram illustrating an example of a calibration area in a target plane and a bounding box of a target object in an intermediate image according to an embodiment of the present disclosure;
FIG. 9 is a flowchart of another image processing method provided in the embodiments of the present application;
FIG. 10 is a schematic diagram of an example of a second image and a first center region provided by an embodiment of the present application;
fig. 11A is a schematic diagram of an example of a reference area of a first image according to an embodiment of the present disclosure;
fig. 11B is a schematic diagram of an example of a head-up region of a second image according to an embodiment of the present disclosure;
fig. 12 is a schematic diagram of an example of an image after processing a target silhouette in a target image according to an embodiment of the present application;
FIG. 13 is a flowchart of another image processing method provided in the embodiments of the present application;
FIG. 14 is a flowchart of another image processing method provided in the embodiments of the present application;
fig. 15 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 16 is a schematic structural diagram of a server provided in an embodiment of the present application;
fig. 17 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
The terms "first," "second," and "third," etc. in the description and claims of the present application and the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion, such as a list of steps or elements. A method, system, article, or apparatus is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, system, article, or apparatus.
As described in the background art, in the currently adopted video background-changing scheme and image background-changing scheme, in order to make the fused image more realistic, a user needs to manually create a shadow of at least one object (e.g., a person) in the fused image, which is complicated to operate and time-consuming. Therefore, there is a need for a scheme that can automatically generate the shadow of at least one object (e.g., a person) in a fused image. The embodiment of the present application provides an image processing method that automatically generates the shadow of at least one object (e.g., a person) in a fused image. The following briefly introduces scenes to which the image processing method provided by the embodiment of the present application is applicable.
Scene 1: the user performs image fusion processing on the foreground image and the background image through image processing software running on terminal equipment (such as a personal computer) to obtain a fused image. Wherein the fused image comprises a shadow of at least one object (i.e. an object in the foreground image).
Scene 2: the user performs image fusion processing on the foreground image and each frame of background image in the video respectively through image processing software running on terminal equipment (such as a personal computer) to obtain a fused video. Wherein each frame image in the fused video comprises a shadow of at least one object (i.e. an object in the foreground image).
Scene 3: a user uploads a foreground image (or an original image containing the foreground image) and a background image to a server through a network by a terminal device (such as a personal computer), and the server performs image fusion processing on the foreground image and the background image and sends an image (corresponding to a target image) obtained by the image fusion processing to the terminal device. The image obtained by the image fusion process contains the shadow of at least one object (i.e. the object in the foreground image).
Scene 4: a user uploads a foreground image and a video to a server through a network by a terminal device (such as a personal computer), the server performs image fusion processing on the foreground image and each frame of background image in the video respectively, and the video obtained through the image fusion processing is sent to the terminal device. Each frame of image in the video obtained by the image fusion processing contains the shadow of at least one object (namely, the object in the foreground image).
In the above scenario, by implementing the image processing method provided by the embodiment of the present application, the shadow of at least one object (for example, a person) in the fused image can be automatically generated, so that the fused image is more realistic.
The following describes an image processing method provided by an embodiment of the present application with reference to the drawings.
Referring to fig. 1, fig. 1 is a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in fig. 1, the method may include:
101. the image processing device performs shadow estimation processing based on the first image and the second image to obtain a target silhouette of the target object in the first image.
The first image is a foreground region image obtained from an original image. In some embodiments, the target objects are one or more persons. In some embodiments, the target object may include a person and other objects (i.e., all objects in the first image), such as an animal and any object. In some embodiments, the image processing apparatus may acquire the first image from the original image before performing step 101.
The image processing device may be a terminal device such as a mobile phone, a tablet computer, a wearable device, a notebook computer, a desktop computer, or a server. In some embodiments, the image processing apparatus is a terminal device such as a notebook computer, and the image processing software executed by the image processing apparatus executes the method flow of fig. 1. For example, the user uploads the original image and the second image to the image processing software run by the image processing apparatus, or the image processing software acquires the original image and the second image from a network or a specific storage area based on the instruction of the user, and the method flow of fig. 1 is executed by the image processing software. In some embodiments, the image processing apparatus is a server, and before performing step 101, the image processing apparatus may receive an original image and a second image from a terminal device (e.g., a mobile phone or a PC, etc.), and extract a first image from the original image. The second image may be a still image or an image frame extracted from a video stream, and the type of the second image is not limited in the embodiments of the present disclosure.
102. The image processing device performs image fusion processing on the first image, the second image and the target silhouette of the target object to obtain a target image.
The target silhouette in the target image is a shadow of the target object in the first image.
One possible implementation of step 102 is as follows: taking the first image as a foreground image and the second image as a background image to perform front-background fusion processing to obtain an intermediate image; and obtaining the target image based on the intermediate image and the target silhouette. In some embodiments, obtaining the target image based on the intermediate image and the target silhouette may be: performing person foot positioning processing on the intermediate image to obtain position information of a person foot area in the intermediate image; performing image fusion processing on the target silhouette and the intermediate image based on the position information of the human foot region to obtain the target image; the foot area in the target silhouette in the target image overlaps the person foot area. In some embodiments, the image processing apparatus performs object detection on the intermediate image, detects the feet of the person, and then locates the feet area of the person. It should be understood that the image processing apparatus may also obtain the position information of the foot region of the person in the intermediate image in other manners, and the application is not limited thereto. The overlap between the foot region of the target silhouette and the foot region of the person in the target image means that at least a part of the foot region of the target silhouette in the target image overlaps at least a part of the foot region of the person. In the real world, the foot area in the shadow of a person overlaps the person's foot. Fig. 2 is a schematic diagram of an example of a target image according to an embodiment of the present application. As shown in fig. 2, the foot area in the shadow (corresponding to the target silhouette) overlaps with the human foot area. In some embodiments, the image processing device determines a position of the first image relative to the second image according to an instruction input by a user, and the image processing device performs image fusion processing on the first image and the second image based on the position of the first image relative to the second image to obtain the intermediate image. In some embodiments, the image processing device may automatically determine a position of the first image relative to the second image, and the image processing device performs image fusion processing on the first image and the second image based on the position of the first image relative to the second image to obtain the intermediate image. For example, the image processing device obtains a target position of the first image relative to the second image by processing the first image and the second image; and performing image fusion processing on the first image and the second image based on the target position to obtain the intermediate image. The implementation of the image processing means for determining the target position of the first image relative to the second image is described in more detail later.
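As a concrete (and simplified) illustration of the foot-anchored fusion described above, the Python sketch below pastes an RGBA target silhouette onto the intermediate image so that its foot point overlaps the detected person-foot position; the function names, the RGBA layout, and the assumption that a foot detector has already produced `foot_rc` are illustrative only.

```python
import numpy as np

def alpha_composite(background_bgr, overlay_rgba, top_left_rc):
    """Blend an RGBA overlay into a BGR background at the given (row, col) position
    (boundary clipping omitted for brevity)."""
    r, c = top_left_rc
    h, w = overlay_rgba.shape[:2]
    roi = background_bgr[r:r + h, c:c + w].astype(np.float32)
    rgb = overlay_rgba[..., :3].astype(np.float32)
    alpha = overlay_rgba[..., 3:4].astype(np.float32) / 255.0
    background_bgr[r:r + h, c:c + w] = (alpha * rgb + (1.0 - alpha) * roi).astype(np.uint8)
    return background_bgr

def fuse_silhouette_at_feet(intermediate_bgr, silhouette_rgba, foot_rc, silhouette_foot_rc):
    """Place the silhouette so that its own foot point overlaps the detected person-foot point."""
    top_left = (foot_rc[0] - silhouette_foot_rc[0], foot_rc[1] - silhouette_foot_rc[1])
    return alpha_composite(intermediate_bgr, silhouette_rgba, top_left)
```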
In some embodiments, after the image processing apparatus performs step 102 to obtain the target image, the following operations may be performed: adjusting the transparency of the target silhouette in the target image, and/or performing feathering on the target silhouette in the target image. For example, the transparency of the target silhouette in the target image is adjusted to 60%. For example, the image processing apparatus sets the radius of feathering to 200 pixels when feathering the target silhouette in the target image. In some embodiments, the image processing apparatus may also blur the target silhouette in the target image. In the embodiments, the target silhouette in the target image can be made more real and natural by blurring, feathering, adjusting the transparency, and the like.
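The transparency and feathering adjustments mentioned above could be sketched as follows, assuming the target silhouette is an RGBA image whose alpha channel carries the shadow mask; the 60% factor and 200-pixel radius mirror the example values, and interpreting "60% transparency" as an opacity factor of 0.6 is an assumption.

```python
import cv2
import numpy as np

def soften_shadow(shadow_rgba, opacity=0.6, feather_radius=200):
    """Scale the shadow's alpha channel (transparency adjustment) and blur it
    (feathering) so that the target silhouette looks more natural."""
    out = shadow_rgba.copy()
    alpha = out[..., 3].astype(np.float32)
    alpha *= opacity                                  # e.g. the 60% example given above
    k = 2 * feather_radius + 1                        # GaussianBlur needs an odd kernel size
    alpha = cv2.GaussianBlur(alpha, (k, k), 0)        # soft edge falloff, radius 200 pixels
    out[..., 3] = np.clip(alpha, 0, 255).astype(np.uint8)
    return out
```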
In some embodiments, the image processing apparatus is a terminal device, and after the image processing apparatus performs step 102 to obtain the target image, the image processing apparatus may further display the target image through a display device (e.g., a display or a display screen). In some embodiments, the image processing apparatus is a server, and after the image processing apparatus performs step 102 to obtain the target image, the image processing apparatus may further transmit the target image to a terminal device (e.g., a mobile phone).
In some embodiments, the second image is a frame image of the target video, and the method further includes: identifying whether lens motion exists in the target video; under the condition that the lens motion exists in the target video, identifying the path of the lens motion; translating and/or scaling the third image to obtain a fourth image based on the path of the lens movement; the third image is a frame of image in the target video and is captured later than the second image; performing image fusion processing on the first image, the fourth image and the target silhouette of the target object to obtain a fifth image; the target silhouette in the fifth image is defined as a shadow of the target object in the fifth image. In this embodiment, the third image in the target video is translated and/or scaled based on the lens movement, so that the processing operation on the background image can be reduced, and the efficiency can be improved.
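One possible (purely illustrative) way to estimate a translational component of the lens motion between the second image and the third image and to apply it to obtain the fourth image is phase correlation; the embodiment does not specify how the motion path is identified, and scaling compensation is omitted here.

```python
import cv2
import numpy as np

def compensate_lens_motion(second_gray, third_bgr):
    """Estimate a global translation between two background frames of the target video
    and apply it to the third frame to obtain a fourth image (scaling is omitted)."""
    third_gray = cv2.cvtColor(third_bgr, cv2.COLOR_BGR2GRAY)
    # phaseCorrelate expects single-channel float32/float64 arrays of equal size.
    (dx, dy), _response = cv2.phaseCorrelate(np.float32(second_gray), np.float32(third_gray))
    # Translate the third frame by the estimated shift; the sign convention of the
    # returned shift should be checked against the OpenCV version in use.
    m = np.float32([[1, 0, -dx], [0, 1, -dy]])
    h, w = third_bgr.shape[:2]
    return cv2.warpAffine(third_bgr, m, (w, h))
```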
In the embodiment of the application, image fusion processing is carried out on the first image, the second image and a target silhouette of a target object to obtain a target image; the target image containing the shadow of the target object can be obtained, and the target image is more real.
Since the foregoing embodiments do not describe in detail how to perform the shadow estimation processing based on the first image and the second image to obtain the target silhouette, an implementation of performing the shadow estimation processing based on the first image and the second image to obtain the target silhouette will be described below with reference to the drawings.
Fig. 3 is a flowchart of another image processing method according to an embodiment of the present application. The method flow in fig. 3 is one possible implementation of step 101. As shown in fig. 3, the method includes:
301. the image processing device processes the first image to obtain an original silhouette of the target object in the first image.
The original silhouette is used to describe the outline of the target object in the first image. In some embodiments, the processing the first image to obtain the original silhouette of the target object may be: setting every pixel point in the first image to 0 to obtain the original silhouette. That is, the image processing apparatus sets the pixel value of each pixel point in the first image extracted from the original image to 0, so as to obtain the original silhouette. In some embodiments, the processing the first image to obtain the original silhouette of the target object may be: setting every pixel point in the person region in the first image to 0 to obtain the original silhouette. In some embodiments, the image processing apparatus may perform recognition processing on the first image to obtain the person region in the first image, and then set every pixel point in the person region in the first image to 0 to obtain the original silhouette. Fig. 4 is a schematic diagram of an example of an original silhouette provided by an embodiment of the present application.
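A minimal sketch of this silhouette step, assuming the first image is an RGBA foreground cut-out whose alpha channel is non-zero only on the target object; the function name is illustrative.

```python
import numpy as np

def make_original_silhouette(first_image_rgba):
    """Set every colour value of the foreground cut-out to 0, producing a solid black
    shape whose outline is the outline of the target object (cf. Fig. 4)."""
    silhouette = first_image_rgba.copy()
    silhouette[..., :3] = 0   # all pixel values set to 0; the alpha channel keeps the outline
    return silhouette
```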
302. And the image processing device carries out shadow angle estimation by utilizing the second image to obtain a target projection angle.
One possible implementation of step 302 is as follows: performing depth estimation and normal vector estimation on the second image to obtain a target plane and a normal vector of the target plane; the target plane is used for describing the plane where the ground in the second image is located; and determining the target projection angle based on the target plane and the normal vector. In some embodiments, the image processing apparatus may perform depth estimation and normal vector estimation on the second image (i.e., the background image) by using deep learning; because normal vectors of points on the same ground are similar, the ground can be identified. Since the main inventive point of the present application is not depth estimation and normal vector estimation for the second image, their implementation will not be described in detail here. The image processing device may obtain the target plane where the ground is located in the second image and the normal vector of the target plane in any manner.
In some embodiments, the image processing apparatus may determine the target projection angle based on the target plane and the normal vector by: setting an angle between a projection of a straight line passing through an origin of a target image coordinate system and perpendicular to the normal vector in the target image coordinate system and a coordinate axis of the target image coordinate system as the target projection angle; the target image coordinate system is an image coordinate system in the second image and coincides with the target plane. Fig. 5 is a schematic diagram of an example of a target image coordinate system and normal vectors according to an embodiment of the present application. As shown in fig. 5, 501 denotes an x-axis of an image coordinate system, 502 denotes a y-axis of the image coordinate system, 503 denotes an origin of the image coordinate system, 504 denotes a normal vector, 505 denotes a straight line passing through the origin of the target image coordinate system and perpendicular to the normal vector, 506 denotes a projection of a straight line passing through the origin of the target image coordinate system and perpendicular to the normal vector in the target image coordinate system, and 507 or 508 denotes a target projection angle. It is to be understood that the above-mentioned target image coordinate system can be understood as an image coordinate system of the above-mentioned second image constructed at the above-mentioned target plane. From the correspondence of the depth estimate (x, y, z [ depth ]) and the pixel (x, y) of the image, we can determine a mapping of 3D points (i.e. three-dimensional coordinate points) to 2D points (i.e. image coordinate points). Since the image coordinate system is two-dimensional, i.e. there is no z-axis, by making the image coordinate system in the second image coincide with the target plane, the normal vector can be linked to the image coordinate system in the second image, thereby calculating the target projection angle.
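The angle computation could be sketched as follows; the particular perpendicular direction chosen here (a cross product with the camera's z axis) is an assumption made for illustration, since the description only requires some line through the origin that is perpendicular to the normal vector.

```python
import numpy as np

def target_projection_angle(normal):
    """Return the angle (in degrees) between the x coordinate axis and the projection,
    onto the image (x-y) plane, of a direction through the origin that is
    perpendicular to the ground-plane normal."""
    n = np.asarray(normal, dtype=np.float64)
    d = np.cross(n, [0.0, 0.0, 1.0])      # one direction perpendicular to the normal (an assumed choice)
    if np.linalg.norm(d[:2]) < 1e-9:      # degenerate case: normal parallel to the optical axis
        return 0.0
    return float(np.degrees(np.arctan2(d[1], d[0])))
```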
303. The image processing device performs affine transformation processing on the original silhouette of the first image based on the target projection angle to obtain a target silhouette.
The bounding box of the target silhouette is a parallelogram whose angle is the target projection angle. The bounding box of the original silhouette is a rectangular box with right angles. An affine transformation maps two-dimensional coordinates to two-dimensional coordinates while preserving straight lines. In short, an affine transformation allows a figure to be tilted arbitrarily and to be stretched or compressed arbitrarily in two directions. An affine transformation can be achieved by composing a series of atomic transformations, including: translation, scaling, flipping, rotation, and shearing. Fig. 6 is a schematic diagram of a bounding box of an original silhouette and a bounding box of a target silhouette according to an embodiment of the present disclosure. As shown in fig. 6, the left-hand diagram represents the bounding box of the original silhouette, and the right-hand diagram represents the bounding box of the target silhouette. In some embodiments, the affine transformation processing performed by the image processing apparatus on the original silhouette of the first image based on the target projection angle may be: obtaining an affine transformation matrix based on the target projection angle; and calculating the product of the affine transformation matrix and the coordinates of each pixel point in the original silhouette of the first image to obtain the target silhouette. It is understood that the target silhouette is an image obtained by performing affine transformation processing on the original silhouette.
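One affine transformation with the stated property is a horizontal shear; the sketch below (assuming the projection angle lies strictly between 0 and 90 degrees) warps the original silhouette so that its bounding box becomes a parallelogram at that angle. The embodiment does not prescribe this particular matrix.

```python
import cv2
import numpy as np

def shear_silhouette(original_silhouette, projection_angle_deg):
    """Shear the original silhouette so that its bounding box becomes a parallelogram
    whose angle equals the target projection angle (assumes 0 < angle <= 90 degrees)."""
    h, w = original_silhouette.shape[:2]
    shear = 1.0 / np.tan(np.radians(projection_angle_deg))   # horizontal offset added per row
    m = np.float32([[1, shear, 0],
                    [0, 1,     0]])                          # affine matrix: x' = x + shear * y
    new_w = int(w + abs(shear) * h)                          # widen the canvas so nothing is cut off
    return cv2.warpAffine(original_silhouette, m, (new_w, h))
```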
In the embodiment of the application, affine transformation processing is carried out on the original silhouette of the first image based on the target projection angle to obtain a target silhouette; the shadow of the target object can be made more realistic.
Fig. 7 is a flowchart of another image processing method according to an embodiment of the present application. The method flow in fig. 7 is one possible implementation of step 101. As shown in fig. 7, the method includes:
701. the image processing device performs front-background fusion processing on the first image as a foreground image and the second image as a background image to obtain an intermediate image.
The first image is a foreground region image obtained from an original image.
702. The image processing device performs depth estimation and normal vector estimation on the second image to obtain a target plane and a normal vector of the target plane.
The target plane is used for describing a plane where the ground in the second image is located.
703. The image processing device determines, based on the normal vector, the shadow region of the intermediate image to which the calibration region in the target plane is mapped.
The calibration area is a parallelogram area and a first side of the calibration area coincides with a target line segment mapped to the target plane by a second side of the boundary frame of the target object in the intermediate image. It will be appreciated that the image processing apparatus, on obtaining the normal vector and the origin (e.g. the lower left corner of the intermediate image), can find an expression of the ground plane (i.e. an expression of the target plane), where each point in the target plane can be mapped onto the image coordinate system in which the intermediate image is located. That is, the image processing apparatus can map an arbitrary point on the object plane to the image coordinate system of the intermediate image based on the normal vector of the object plane. Therefore, the image processing apparatus can map the calibration area in the target plane to the intermediate image, resulting in the shadow area.
In some embodiments, the image processing apparatus may perform the following operations before performing step 703: obtaining the target line segment in the target plane to which the second edge of the bounding box of the target object in the intermediate image is mapped; the second side is the side closest to the feet of the person in the bounding box of the target object; and obtaining the calibration area in the target plane based on the target line segment; the target line segment is the first side of the calibration area. For example, the angle of the calibration area is 45 degrees, and/or the length of the third side of the calibration area is two thirds of the length of the fourth side of the bounding box of the target object in the intermediate image. In the embodiment of the application, the shape and the size of the calibration area can be set according to actual needs. Fig. 8 is a schematic diagram of an example of a calibration area in a target plane and a bounding box of a target object in an intermediate image according to an embodiment of the present application. As shown in fig. 8, the left rectangle represents the bounding box of the target object in the intermediate image, the right parallelogram represents the calibration area in the target plane, 801 represents the side (i.e., the second side) of the bounding box of the target object closest to the foot of the person, 802 represents the first side (corresponding to the target line segment) of the calibration area, 803 represents the fourth side of the bounding box of the target object, and 804 represents the third side of the calibration area. In these embodiments, the image processing apparatus may first determine the target line segment in the target plane to which the second edge of the bounding box of the target object in the intermediate image is mapped; then construct the calibration area (a parallelogram) in the target plane with the target line segment as one of its sides; and finally map the calibration area in the target plane to the intermediate image to obtain the shadow area.
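Constructing the calibration parallelogram from the target line segment could be sketched as follows, working in 2D coordinates within the target plane; the rotate-by-45-degrees construction and the choice of which bounding-box side supplies the length follow the example values above and are otherwise assumptions.

```python
import numpy as np

def calibration_parallelogram(seg_start, seg_end, bbox_side_length,
                              angle_deg=45.0, ratio=2.0 / 3.0):
    """Build the four corners of the calibration parallelogram in the target plane.

    The target line segment (seg_start -> seg_end) is the first side; the third side
    leaves it at `angle_deg` and has length `ratio * bbox_side_length`."""
    p0 = np.asarray(seg_start, dtype=np.float64)
    p1 = np.asarray(seg_end, dtype=np.float64)
    u = (p1 - p0) / np.linalg.norm(p1 - p0)                  # unit direction of the first side
    theta = np.radians(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    third_side = (rot @ u) * (ratio * bbox_side_length)      # direction of the third side
    return np.stack([p0, p1, p1 + third_side, p0 + third_side])
```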
704. And obtaining a target silhouette based on the shadow area and the first image.
The figure corresponding to the bounding box of the target silhouette is the same as the figure corresponding to the shadow region. In some embodiments, the image processing apparatus may perform affine transformation processing based on a bounding box (a rectangular box) of the first image and a bounding box corresponding to the shadow region, so as to obtain the target silhouette. For example, the image processing apparatus first determines an affine transformation formula (or an affine transformation matrix) for affine-transforming the bounding box of the first image into the bounding box corresponding to the shadow region; then, affine transformation processing is respectively carried out on each point in the first image by using the affine transformation formula, and a target silhouette is obtained.
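The affine step described here can be sketched with OpenCV, since an affine transform is fully determined by three point correspondences; the corner ordering assumed for `shadow_quad` (top-left, top-right, bottom-right, bottom-left traversal) is an illustrative convention.

```python
import cv2
import numpy as np

def warp_silhouette_to_shadow_region(first_image_rgba, shadow_quad, canvas_wh):
    """Affine-warp the (blacked-out) first image into the shadow region.

    Three corners of the first image's rectangular bounding box are mapped to the
    corresponding corners of the shadow parallelogram."""
    h, w = first_image_rgba.shape[:2]
    src = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1]])   # top-left, top-right, bottom-right
    dst = np.float32(shadow_quad[:3])                        # matching corners of the parallelogram
    m = cv2.getAffineTransform(src, dst)
    silhouette = first_image_rgba.copy()
    silhouette[..., :3] = 0                                  # silhouette: colour values set to 0
    return cv2.warpAffine(silhouette, m, canvas_wh)          # canvas_wh = (width, height) of the intermediate image
```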
The calibration area in the target plane can be interpreted as a real shadow, and therefore the shadow area in the intermediate image determined based on the calibration area in the target plane can be interpreted as a real shadow in the captured image.
In the embodiment of the application, the shadow area in the intermediate image can be determined more accurately so as to generate a more real target silhouette.
Referring to fig. 9, fig. 9 is a flowchart of another image processing method according to an embodiment of the present disclosure. The process flow in fig. 9 is a refinement and refinement of the process flow in fig. 1. As shown in fig. 9, the method may include:
901. the image processing device processes the first image to obtain an original silhouette of the target object in the first image.
The original silhouette is used to describe the outline of the target object in the first image. The implementation of step 901 may be the same as the implementation of step 301.
902. And the image processing device carries out shadow angle estimation by utilizing the second image to obtain a target projection angle.
The implementation of step 902 may be the same as the implementation of step 302.
903. The image processing device performs affine transformation processing on the original silhouette of the first image based on the target projection angle to obtain a target silhouette of the target object in the first image.
The implementation of step 903 may be the same as the implementation of step 303.
904. And processing the first image and the second image to obtain the target position of the first image relative to the second image.
One possible implementation of step 904 is as follows: performing ground identification processing on the second image to obtain position information of a ground area in the second image; determining that the target position of the first image relative to the second image is a first central area of the ground area when a target boundary line in the second image is located in the ground area, and/or determining that the target position of the first image relative to the second image is a second central area of the second image when the target boundary line in the second image is not located in the ground area. The target boundary line may be any horizontal line in the second image, for example, a horizontal line that divides the second image into two rectangular regions having an area ratio of 4 to 1. The first central area is the central area of the ground area, and the second central area is the central area of the second image. In some embodiments, the image processing device may determine the location of the ground area in the second image using monocular depth estimation, resulting in position information of the ground area. Fig. 10 is a schematic diagram of an example of a second image and a first central area provided in an embodiment of the present application. As shown in fig. 10, the target boundary line divides the second image into two parts having a height ratio of 4 to 1, the reference boundary line divides the second image into two parts having a height ratio of 1 to 4, 801 denotes the first central area, the first central area is located between the target boundary line and the reference boundary line, and the center of the first central area is equidistant from the left and right boundaries of the second image. Three images are included in fig. 10: the top image is a color image, the middle image shows the first central area, and the bottom image is a depth image (different depths are indicated by different colors). In this implementation, the image processing apparatus can quickly determine the position of the first image relative to the second image, which is simple to implement.
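A simplified sketch of this placement rule, assuming a boolean ground mask obtained from monocular depth estimation and taking the target boundary line to be the horizontal line at 4/5 of the background height (as in the example for Fig. 10); reducing each "central region" to a single anchor point is a simplification.

```python
import numpy as np

def choose_target_position(ground_mask, boundary_ratio=0.8):
    """Return a (row, col) anchor for placing the foreground relative to the background.

    If the target boundary line (here: the horizontal line at 4/5 of the background
    height) lies in the ground area, anchor at the centre of the ground area;
    otherwise anchor at the centre of the second image."""
    bg_h, bg_w = ground_mask.shape
    boundary_row = int(bg_h * boundary_ratio)
    if ground_mask[boundary_row, :].any():        # target boundary line falls in the ground area
        rows, cols = np.nonzero(ground_mask)
        return int(rows.mean()), int(cols.mean()) # first central area, simplified to its centre
    return bg_h // 2, bg_w // 2                   # second central area: centre of the second image
```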
Another possible implementation of step 904 is as follows: performing ground identification processing on the second image to obtain position information of a ground area in the second image; and determining the target position of the first image relative to the second image based on the position information of the ground area. The target position is determined such that a target area in the first image is located within the ground area relative to the second image, where the target area includes an area in contact with the ground in the original image. In some embodiments, determining the target position of the first image relative to the second image based on the position information of the ground area may be: determining the target position of the first image relative to the second image based on the target area, the position information of the ground area, and a first boundary line in the second image. Here, the distance between a second boundary line in the first image and the first boundary line, measured with respect to a target straight line in the second image, is smaller than a distance threshold. The second boundary line is the boundary line between a first sub-portion and a second sub-portion of the first image, and the ratio of the distance from the pixel point in the first sub-portion farthest from the second boundary line to the second boundary line, to the distance from the pixel point in the second sub-portion farthest from the second boundary line to the second boundary line, is a first value. The first boundary line is the boundary line between a third sub-portion and a fourth sub-portion of the second image, and the ratio of the distance from the pixel point in the third sub-portion farthest from the first boundary line to the first boundary line, to the distance from the pixel point in the fourth sub-portion farthest from the first boundary line to the first boundary line, is a second value. The difference between the first value and the second value is smaller than a first threshold. In some embodiments, the first value and the second value are equal (e.g., both are one half), and the first threshold is 0. In some embodiments, the first value and the second value are not equal, and the first threshold is not 0. In this implementation, the area in contact with the ground in the original image is located within the ground area relative to the second image, which conforms to the real scene.
Another possible implementation of step 904 is as follows: processing the first image to obtain position information of a reference area in the first image, the reference area being an arbitrary area in the first image; and obtaining the target position of the first image relative to the second image based on the position information of the reference area. The target position is determined such that the reference area is located within the head-up region of the second image. The head-up region includes a vanishing line obtained from a plurality of vanishing points, and any one of the plurality of vanishing points is a point at which two lines that are parallel in the real world intersect in the second image. Optionally, the reference area includes a boundary line of the first image that divides the first image into two parts having a height ratio of one to two. Fig. 11A is a schematic diagram of an example of a reference area of a first image according to an embodiment of the present application. The reference area may be an image area between two straight lines in the first image determined according to the height of the first image. In practical applications, the user may use the area between any two straight lines in the first image as the reference area of the first image; that is, the reference area in the first image can be configured according to actual requirements. In some embodiments, the head-up region of the second image is a preset region, for example, a region between two straight lines (horizontal lines) in the second image. Fig. 11B is a schematic diagram of an example of a head-up region of a second image according to an embodiment of the present application. In some embodiments, the image processing apparatus may regard a certain region in each background image as the head-up region, see fig. 11B. In some embodiments, the image processing device may perform head-up region estimation processing on the second image by using a neural network, so as to obtain position information of the head-up region in the second image. The neural network may be a network trained by using a plurality of training samples in which the head-up region is marked.
Another possible implementation of step 904 is as follows: processing the first image to obtain position information of a calibration line in the first image; and obtaining the target position of the first image relative to the second image based on the position information of the calibration line. The target position is determined such that the calibration line, relative to the second image, is located within the head-up region of the second image. The head-up region includes a vanishing line obtained from a plurality of vanishing points, and any one of the plurality of vanishing points is a point at which two lines that are parallel in the real world intersect in the second image. The calibration line may be a boundary line dividing the first image into two parts with a height ratio of one to two, or may be another boundary line, which is not limited in the embodiment of the present application.
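Because the head-up region is defined through vanishing points, the sketch below shows, purely for illustration, how such points could be obtained: two image line segments that are parallel in the real world intersect at a vanishing point (computed in homogeneous coordinates), and a line fitted through several vanishing points approximates the vanishing line around which a head-up region can be taken. The segment inputs and the least-squares fit are assumptions, not part of the described method.

```python
import numpy as np

def vanishing_point(segment_a, segment_b):
    """Intersection of two image lines, each given as ((x1, y1), (x2, y2)).

    The two segments are assumed to be the images of lines that are parallel in
    the real world, so their intersection approximates a vanishing point.
    """
    def homogeneous_line(segment):
        (x1, y1), (x2, y2) = segment
        return np.cross([x1, y1, 1.0], [x2, y2, 1.0])   # line through two points

    p = np.cross(homogeneous_line(segment_a), homogeneous_line(segment_b))
    if abs(p[2]) < 1e-9:
        return None                                      # parallel in the image too
    return p[0] / p[2], p[1] / p[2]

def vanishing_line(points):
    """Least-squares line y = a*x + b through two or more vanishing points;
    a head-up region can then be taken as a horizontal band around this line."""
    xs = np.array([p[0] for p in points], dtype=float)
    ys = np.array([p[1] for p in points], dtype=float)
    a, b = np.polyfit(xs, ys, 1)
    return a, b
```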
905. The image processing device performs image fusion processing on the first image and the second image based on the target position to obtain an intermediate image.
906. The image processing device performs person foot positioning processing on the intermediate image to obtain position information of the person foot area in the intermediate image.
907. The image processing device performs image fusion processing on the target silhouette and the intermediate image based on the position information of the person foot area to obtain a target image.
The foot area in the target silhouette in the target image overlaps the person foot area. In some embodiments, the ratio of the height of the target silhouette in the target image to the height of the foreground image in the target image is a target value. The target value may be one third, one fourth, etc., and the present application is not limited thereto. In some embodiments, the image processing apparatus may determine the position of the target silhouette relative to the intermediate image before performing the image fusion processing on the target silhouette and the intermediate image, and then perform the image fusion processing on the target silhouette and the intermediate image based on that position to obtain the target image. In some embodiments, the image processing apparatus determines the position of the target silhouette relative to the intermediate image such that the ratio of the height of the target silhouette in the target image to the height of the person region in the target image is the target value and the foot area in the target silhouette overlaps the person foot area in the target image. For example, the image processing apparatus determines the height of the person region in the intermediate image before performing the image fusion processing on the target silhouette and the intermediate image, and then scales the target silhouette so that the ratio of the height of the target silhouette to the height of the person region is the target value.
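A minimal sketch of the scaling and placement just described, under stated assumptions: the silhouette and the person region are given as masks, the target value defaults to one third, and the foot alignment is simplified to anchoring the bottom centre of the scaled silhouette at the person foot position. The helper name and these conventions are not from the source.

```python
import numpy as np
import cv2

def place_silhouette(silhouette: np.ndarray, person_mask: np.ndarray,
                     foot_xy: tuple, target_value: float = 1 / 3) -> np.ndarray:
    """Scale the target silhouette so that its height is `target_value` times the
    height of the person region, then paste it so that its bottom centre sits at
    the person foot position `foot_xy` (x, y) in the intermediate image.

    silhouette:  HxW uint8 mask of the target silhouette.
    person_mask: HxW uint8/bool mask of the person region in the intermediate image.
    Returns a full-size mask with the scaled silhouette placed.
    """
    rows, _ = np.nonzero(person_mask)
    person_height = rows.max() - rows.min() + 1

    scale = target_value * person_height / silhouette.shape[0]
    new_w = max(1, int(silhouette.shape[1] * scale))
    new_h = max(1, int(silhouette.shape[0] * scale))
    scaled = cv2.resize(silhouette, (new_w, new_h), interpolation=cv2.INTER_AREA)

    # Place the scaled silhouette so that its bottom centre overlaps the foot area.
    placed = np.zeros(person_mask.shape[:2], dtype=scaled.dtype)
    fx, fy = foot_xy
    x0, y0 = int(fx - new_w / 2), int(fy - new_h)
    x1, y1 = x0 + new_w, y0 + new_h

    # Clip to the image bounds.
    H, W = placed.shape
    x0c, y0c, x1c, y1c = max(0, x0), max(0, y0), min(W, x1), min(H, y1)
    placed[y0c:y1c, x0c:x1c] = scaled[y0c - y0:y1c - y0, x0c - x0:x1c - x0]
    return placed
```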
908. Adjusting the transparency of the target silhouette in the target image, and/or performing feathering on the target silhouette in the target image.
Optionally, the transparency of the target silhouette in the target image is adjusted to 60%. In some embodiments, the image processing apparatus may further perform blurring processing on the target silhouette in the target image. Fig. 12 is a schematic diagram of an example of an image obtained after processing the target silhouette in the target image according to an embodiment of the present application. Fig. 2 is a schematic diagram of an example of a target image according to an embodiment of the present application. Comparing fig. 2 and fig. 12, it can be seen that after the target silhouette in the target image is processed, the target silhouette in the target image appears more natural and realistic.
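The following sketch illustrates the transparency and feathering adjustments of step 908, assuming the target silhouette is available as a single-channel mask over the target image; the 60% transparency comes from the text above, while the Gaussian feathering radius and the pure-black shadow colour are arbitrary choices for the example.

```python
import numpy as np
import cv2

def composite_soft_shadow(image: np.ndarray, shadow_mask: np.ndarray,
                          transparency: float = 0.6, feather: int = 15) -> np.ndarray:
    """Darken `image` under `shadow_mask` with the given transparency, after
    feathering (softening) the mask edge with a Gaussian blur.

    image:       HxWx3 uint8 target image.
    shadow_mask: HxW uint8 mask, 255 inside the target silhouette.
    """
    # Feathering: blur the hard mask so the shadow edge fades out gradually.
    ksize = feather | 1                          # kernel size must be odd
    soft = cv2.GaussianBlur(shadow_mask.astype(np.float32) / 255.0, (ksize, ksize), 0)

    # Transparency: at 60% transparency the shadow keeps 40% of its strength.
    alpha = soft * (1.0 - transparency)

    shadow_colour = np.zeros_like(image, dtype=np.float32)   # pure black shadow
    blended = image.astype(np.float32) * (1.0 - alpha[..., None]) \
        + shadow_colour * alpha[..., None]
    return blended.astype(np.uint8)
```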
Referring to fig. 13, fig. 13 is a flowchart of another image processing method according to an embodiment of the present disclosure. The process flow in fig. 13 is a refinement of the process flow in fig. 1. As shown in fig. 13, the method may include:
1301. the image processing device performs front-background fusion processing on the first image as a foreground image and the second image as a background image to obtain an intermediate image.
1302. The image processing device performs depth estimation and normal vector estimation on the second image to obtain a target plane and a normal vector of the target plane.
1303. The image processing device obtains a target line segment to which the second edge of the bounding box of the target object in the intermediate image is mapped on the target plane.
The second side is a side of the bounding box of the target object closest to the feet of the person.
1304. The image processing device obtains a calibration area in the target plane based on the target line segment.
The target line segment is one side (i.e., a first side) of the calibration area.
1305. The image processing device determines, based on the normal vector, a shadow region in the intermediate image to which the calibration area in the target plane is mapped.
1306. The image processing device obtains a target silhouette based on the shadow region and the first image.
1307. The image processing device performs image fusion processing on the target silhouette and the intermediate image to obtain a target image.
The target silhouette in the target image is a shadow of the target object in the first image.
In some embodiments, the image processing apparatus may further perform the following operations: adjusting the transparency of the target silhouette in the target image, and/or performing feathering on the target silhouette in the target image.
In the embodiment of the application, the shadow area in the intermediate image can be accurately determined, so that a realistic shadow is generated; a sketch of the plane and normal vector estimation in step 1302 follows.
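A minimal sketch of step 1302 under stated assumptions: the depth map (e.g., from a monocular depth estimator), a ground mask, and pinhole camera intrinsics are taken as given, and the target plane is fitted to the back-projected ground pixels by least squares. The patent does not prescribe this particular fitting method.

```python
import numpy as np

def fit_ground_plane(depth: np.ndarray, ground_mask: np.ndarray,
                     fx: float, fy: float, cx: float, cy: float):
    """Fit the target plane n·X = d to the ground pixels of a depth map.

    depth:          HxW depth map in metres (e.g., from a monocular estimator).
    ground_mask:    HxW boolean array marking ground pixels.
    fx, fy, cx, cy: pinhole camera intrinsics (assumed known).
    Returns (normal, d), where `normal` is a unit normal vector of the plane.
    """
    v, u = np.nonzero(ground_mask)
    z = depth[v, u]
    # Back-project the ground pixels into 3D camera coordinates.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=1)

    # Least-squares plane: the normal is the right singular vector associated
    # with the smallest singular value of the centred point cloud.
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    if normal[1] > 0:            # orient the normal upward (image y axis points down)
        normal = -normal
    d = float(normal @ centroid)
    return normal, d
```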
The foregoing embodiments describe a scheme in which the image processing apparatus independently implements fusion of the foreground image and the background image. The following describes a scheme in which a server (i.e., an image processing apparatus) and a terminal device jointly perform fusion of a foreground image and a background image.
Fig. 14 is a flowchart of another image processing method according to an embodiment of the present application. As shown in fig. 14, the method includes:
1401. the terminal device acquires the original image and the second image.
The terminal device can be a mobile phone, a desktop computer, a notebook computer, a tablet computer and other devices with data transmission and image display functions. The terminal device may acquire the original image and the second image by receiving the original image and the second image uploaded by the user, or may acquire the original image and the second image from other devices or local storage in response to an instruction input by the user; the original image and the second image may also be obtained by other means, and the application is not limited.
1402. The terminal device sends the original image and the second image to the server.
1403. The server performs image fusion processing on the original image and the second image to obtain a target image.
The server (i.e., the image processing apparatus) may perform the method flow in fig. 1 or fig. 13. In some embodiments, the server may extract the first image from the original image and then perform the method flow of fig. 1 or fig. 13.
1404. The server sends the target image to the terminal device.
1405. The terminal device displays the target image.
In some embodiments, the user uploads the original image and the second image to image processing software running on the terminal device, the original image and the second image are sent to the server through the image processing software, and an interface of the image processing software displays the target image received from the server. It should be understood that the server often has processing capability far beyond that of the terminal device, and therefore the server can implement the image processing method provided by the embodiment of the present application more accurately and more quickly.
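As a rough illustration of steps 1401 to 1405, the snippet below uploads the two images to a server endpoint and displays the target image sent back; the endpoint URL, the form field names, and the use of the requests and Pillow libraries are assumptions for the example and are not specified by the embodiment.

```python
import requests
from io import BytesIO
from PIL import Image

# Hypothetical endpoint; the embodiment does not specify any API.
SERVER_URL = "http://example.com/api/fuse"

def fuse_on_server(original_path: str, background_path: str) -> Image.Image:
    """Upload the original image and the second (background) image, and return
    the fused target image sent back by the server (steps 1402 to 1404)."""
    with open(original_path, "rb") as original, open(background_path, "rb") as background:
        files = {"original": original, "background": background}
        response = requests.post(SERVER_URL, files=files, timeout=60)
    response.raise_for_status()
    return Image.open(BytesIO(response.content))

if __name__ == "__main__":
    target_image = fuse_on_server("original.jpg", "background.jpg")
    target_image.show()   # step 1405: the terminal device displays the target image
```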
In the embodiment of the application, the terminal device realizes the foreground and background fusion of the first image and the second image by means of the server, does not need to execute the complex image processing operations itself, and is simple to implement.
Having described the image processing method provided by the embodiments of the present application, the functions of the components of an image processing apparatus that can implement the image processing method of the embodiments of the present application are described below. Fig. 15 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. As shown in fig. 15, the image processing apparatus may include:
a shadow estimation unit 1501, configured to perform shadow estimation processing based on a first image and a second image, to obtain a target silhouette of a target object in the first image; the first image is a foreground area image obtained from an original image;
an image fusion unit 1502 configured to perform image fusion processing on the first image, the second image, and the target silhouette of the target object to obtain a target image; the target silhouette in the target image is a shadow of the target object in the first image.
In a possible implementation manner, the shadow estimation unit 1501 is specifically configured to process the first image to obtain an original silhouette of the target object; the original silhouette is used for describing the outline of the target object in the first image; utilizing the second image to estimate a shadow angle to obtain a target projection angle; performing affine transformation processing on the original silhouette of the first image based on the target projection angle to obtain the target silhouette; the bounding box of the target silhouette is a parallelogram with an angle of the target projection angle.
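One way to read this affine transformation is as a horizontal shear that keeps the feet of the original silhouette in place and slants it until its bounding box becomes a parallelogram whose angle equals the target projection angle. The sketch below implements that reading only as an illustration; the shear direction and the angle convention are assumptions.

```python
import numpy as np
import cv2

def shear_silhouette(original_silhouette: np.ndarray,
                     projection_angle_deg: float) -> np.ndarray:
    """Shear an upright silhouette mask so that its bounding box becomes a
    parallelogram whose base angle equals the given projection angle.

    Sketch only: assumes 0 < projection_angle_deg <= 90 and that the shadow
    leans to the right; the feet (bottom row) stay in place.
    """
    h, w = original_silhouette.shape[:2]
    shear = 1.0 / np.tan(np.radians(projection_angle_deg))
    offset = shear * h

    # Forward mapping: x' = x + shear * (h - y), y' = y.
    M = np.float32([[1.0, -shear, offset],
                    [0.0, 1.0, 0.0]])
    out_w = int(np.ceil(w + abs(offset)))
    return cv2.warpAffine(original_silhouette, M, (out_w, h))
```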
In a possible implementation manner, the shadow estimation unit 1501 is specifically configured to perform depth estimation and normal vector estimation on the second image to obtain a target plane and a normal vector of the target plane; the target plane is used for describing a plane where the ground in the second image is located; and determining the target projection angle based on the target plane and the normal vector.
In one possible implementation manner, the shadow estimation unit 1501 is specifically configured to take, as the target projection angle, the angle between the projection, in the target image coordinate system, of a straight line that passes through the origin of the target image coordinate system and is perpendicular to the normal vector, and a coordinate axis of the target image coordinate system; the target image coordinate system is an image coordinate system in the second image and coincides with the target plane.
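Read numerically, and under several assumptions the text leaves open (which coordinate axis is used, which of the many lines perpendicular to the normal is meant, and how the projection into the image coordinate system is taken), the angle could be computed as in the sketch below, where the line is taken as the optical axis projected into the target plane.

```python
import numpy as np

def target_projection_angle(normal: np.ndarray) -> float:
    """One plausible numerical reading of the definition above.

    `normal` is the unit normal of the target plane in camera coordinates. The
    line through the origin perpendicular to the normal is taken here as the
    optical axis projected into the target plane, and its projection into the
    image coordinate system as simply its x and y components; the angle is then
    measured against the x axis. All of these choices are assumptions.
    """
    n = normal / np.linalg.norm(normal)
    view = np.array([0.0, 0.0, 1.0])           # optical axis direction
    d = view - (view @ n) * n                  # in-plane component, perpendicular to n
    d2 = d[:2]                                 # projection onto the image axes
    if np.linalg.norm(d2) < 1e-9:
        return 90.0                            # degenerate: the plane faces the camera
    return float(np.degrees(np.arctan2(d2[1], d2[0])))
```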
In a possible implementation manner, the image fusion unit 1502 is specifically configured to perform front-background fusion processing on the first image as a foreground image and the second image as a background image to obtain an intermediate image; and obtaining the target image based on the intermediate image and the target silhouette.
In one possible implementation manner, the image fusion unit 1502 is specifically configured to perform a person foot positioning process on the intermediate image to obtain position information of a person foot region in the intermediate image;
performing image fusion processing on the target silhouette and the intermediate image based on the position information of the human foot region to obtain the target image; the foot area in the target silhouette in the target image overlaps the person foot area.
In a possible implementation manner, the shadow estimation unit 1501 is specifically configured to perform front-background fusion processing on the first image as a foreground image and the second image as a background image to obtain an intermediate image; perform depth estimation and normal vector estimation on the second image to obtain a target plane and a normal vector of the target plane, the target plane being used for describing a plane where the ground in the second image is located; determine, based on the normal vector, a shadow region in the intermediate image to which a calibration area in the target plane is mapped, the calibration area being a parallelogram area whose first side coincides with a target line segment to which a second side of the bounding box of the target object in the intermediate image is mapped on the target plane; and obtain the target silhouette based on the shadow region and the first image, the shape corresponding to the bounding box of the target silhouette being the same as the shape corresponding to the shadow region.
In a possible implementation manner, the shadow estimation unit 1501 is further configured to obtain the target line segment to which the second edge of the bounding box of the target object in the intermediate image is mapped on the target plane, the second edge being the edge of the bounding box of the target object closest to the feet of the person; and obtain the calibration area in the target plane based on the target line segment, the target line segment being the first side of the calibration area.
In one possible implementation, the angle of the calibration region is 45 degrees, and/or the length of the third side of the calibration region is two-thirds of the length of the fourth side of the bounding box of the target object in the intermediate image.
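For illustration, the sketch below constructs such a parallelogram calibration region from the target line segment, using the 45-degree angle and the two-thirds length mentioned here; working in a 2D in-plane coordinate frame and rotating the first side to obtain the third side are assumptions of the example.

```python
import numpy as np

def calibration_region(p0: np.ndarray, p1: np.ndarray,
                       fourth_side_length: float,
                       angle_deg: float = 45.0) -> np.ndarray:
    """Build a parallelogram calibration region in 2D in-plane coordinates.

    p0, p1:             endpoints of the target line segment (the first side).
    fourth_side_length: length of the fourth side of the bounding box of the
                        target object; the third side is two thirds of it.
    angle_deg:          angle of the parallelogram (45 degrees in the example).
    Returns the four vertices of the parallelogram in order.
    """
    first_side = p1 - p0
    unit = first_side / np.linalg.norm(first_side)
    a = np.radians(angle_deg)
    rotation = np.array([[np.cos(a), -np.sin(a)],
                         [np.sin(a),  np.cos(a)]])
    third_side = (rotation @ unit) * (2.0 / 3.0) * fourth_side_length
    return np.stack([p0, p1, p1 + third_side, p0 + third_side])
```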
In a possible implementation manner, the image fusion unit 1502 is specifically configured to obtain a target position of the first image relative to the second image by processing the first image and the second image;
and performing image fusion processing on the first image and the second image based on the target position to obtain the intermediate image.
In a possible implementation manner, the image fusion unit 1502 is specifically configured to perform ground identification processing on the second image to obtain location information of a ground area in the second image;
the method may further include determining that the target position of the first image with respect to the second image is a first center area of the ground area when a target boundary in the second image is located in the ground area, and/or determining that the target position of the first image with respect to the second image is a second center area of the second image when the target boundary in the second image is not located in the ground area.
In a possible implementation manner, the image fusion unit 1502 is specifically configured to perform ground identification processing on the second image to obtain location information of a ground area in the second image;
determining the target position of the first image relative to the second image based on the position information of the ground area; the determination of the target position causes a target area in the first image to be located within the ground area relative to the second image, the target area including an area in contact with the ground in the original image.
In a possible implementation manner, the image fusion unit 1502 is specifically configured to process the first image to obtain position information of a reference region in the first image; the reference area is an arbitrary area in the first image;
obtaining the target position of the first image relative to the second image based on the position information of the reference area; the target position is determined such that the reference area is located within the head-up region of the second image; the head-up region includes a vanishing line obtained from a plurality of vanishing points, and any one of the plurality of vanishing points is a point at which two lines that are parallel in the real world intersect in the second image.
In a possible implementation manner, the image fusion unit 1502 is further configured to identify whether there is lens motion in the target video;
under the condition that the lens motion exists in the target video, identifying the path of the lens motion;
translating and/or scaling the third image to obtain a fourth image based on the path of the lens movement; the third image is a frame of image in the target video and is captured later than the second image;
performing image fusion processing on the first image, the fourth image and the target silhouette of the target object to obtain a fifth image; the target silhouette in the fifth image is defined as a shadow of the target object in the fifth image.
In one possible implementation, the image processing apparatus further includes: an output unit 1503 configured to output the target image. In some embodiments, the entity corresponding to the output unit 1503 is a display screen, a display, or the like, and outputting the target image refers to displaying the target image. In some embodiments, the entity corresponding to the output unit 1503 is a communication interface, a transmitter, or the like, and outputting the target image refers to transmitting the target image to another device.
It should be understood that the above division of the units of the image processing apparatus is only a division of logical functions; in actual implementation, the units may be wholly or partially integrated into one physical entity, or may be physically separate. For example, the above units may be separately established processing elements, or may be integrated in the same chip, or may be stored in a storage element of the controller in the form of program code that a certain processing element of the processor calls to execute the functions of the above units. In addition, the units may be integrated together or implemented independently. The processing element may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method or the above units may be implemented by integrated logic circuits of hardware in the processor element or by instructions in the form of software. The processing element may be a general-purpose processor, such as a central processing unit (CPU), or may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs).
Fig. 16 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1600 may vary greatly due to different configurations or performance, and may include one or more central processing units (CPUs) 1622 (e.g., one or more processors), a memory 1632, and one or more storage media 1630 (e.g., one or more mass storage devices) storing an application program 1642 or data 1644. The memory 1632 and the storage medium 1630 may be transient storage or persistent storage. The program stored on the storage medium 1630 may include one or more modules (not shown), and each module may include a series of instruction operations on the server. Further, the central processing unit 1622 may be configured to communicate with the storage medium 1630 to execute, on the server 1600, the series of instruction operations in the storage medium 1630. The server 1600 may perform the image processing method provided by the present application.
The server 1600 may also include one or more power supplies 1626, one or more wired or wireless network interfaces 1650, one or more input-output interfaces 1658, and/or one or more operating systems 1641, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The steps performed by the image processing apparatus in the above-described embodiment may be based on the server configuration shown in fig. 16. Specifically, the central processing unit 1622 may implement the functions of the shadow estimation unit 1501 and the image fusion unit 1502 in fig. 15, and the input-output interface 1658 may implement the function of the output unit 1503 in fig. 15.
Fig. 17 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 17, the terminal device 170 includes a processor 1701, a memory 1702, and a communication interface 1703; the processor 1701, the memory 1702 and the communication interface 1703 are connected to each other by a bus. The terminal device in fig. 17 may be the image processing apparatus in the foregoing embodiment.
The memory 1702 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), and the memory 1702 is used for storing related instructions and data. The communication interface 1703 is used to receive and transmit data.
The processor 1701 may be one or more central processing units (CPUs). In the case where the processor 1701 is one CPU, the CPU may be a single-core CPU or a multi-core CPU. The steps performed by the image processing apparatus in the above-described embodiments may be based on the structure of the terminal device shown in fig. 17. Specifically, the processor 1701 may implement the functions of the shadow estimation unit 1501 and the image fusion unit 1502 in fig. 15, and the communication interface 1703 may implement the function of the output unit 1503 in fig. 15.
In an embodiment of the present application, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the image processing method provided by the foregoing embodiment.
The present application provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the image processing method provided by the foregoing embodiments.
While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (18)

1. An image processing method, comprising:
performing shadow estimation processing based on a first image and a second image to obtain a target silhouette of a target object in the first image; the first image is a foreground area image obtained from an original image;
performing image fusion processing on the first image, the second image and the target silhouette of the target object to obtain a target image; the target silhouette in the target image is taken as a shadow of the target object in the first image.
2. The method of claim 1, wherein performing the shadow estimation process based on the first image and the second image to obtain the target silhouette comprises:
processing the first image to obtain an original silhouette of the target object; the original silhouette is used for describing the outline of the target object in the first image;
utilizing the second image to estimate a shadow angle to obtain a target projection angle;
performing affine transformation processing on the original silhouette of the first image based on the target projection angle to obtain the target silhouette; the boundary frame of the target silhouette is a parallelogram with the angle being the target projection angle.
3. The method of claim 2, wherein the estimating the shadow angle using the second image to obtain the target projection angle comprises:
performing depth estimation and normal vector estimation on the second image to obtain a target plane and a normal vector of the target plane; the target plane is used for describing a plane where the ground in the second image is located;
determining the target projection angle based on the target plane and the normal vector.
4. The method of claim 3, wherein the determining the target projection angle based on the target plane and the normal vector comprises:
taking the angle between the projection of a straight line which passes through the origin of a target image coordinate system and is perpendicular to the normal vector in the target image coordinate system and the coordinate axis of the target image coordinate system as the target projection angle; the target image coordinate system is an image coordinate system in the second image and coincides with the target plane.
5. The method according to any one of claims 1 to 4, wherein the performing image fusion processing based on the first image, the second image and the target silhouette to obtain a target image comprises:
taking the first image as a foreground image and the second image as a background image to perform front-background fusion processing to obtain an intermediate image;
and obtaining the target image based on the intermediate image and the target silhouette.
6. The method of claim 5, wherein the deriving the target image based on the intermediate image and the target silhouette comprises:
carrying out person foot positioning processing on the intermediate image to obtain position information of a person foot area in the intermediate image;
performing image fusion processing on the target silhouette and the intermediate image based on the position information of the foot area of the person to obtain the target image; and overlapping the foot area in the target silhouette in the target image with the human foot area.
7. The method of claim 1, wherein performing the shadow estimation process based on the first image and the second image to obtain the target silhouette of the target object in the first image comprises:
taking the first image as a foreground image and the second image as a background image to perform front-background fusion processing to obtain an intermediate image;
performing depth estimation and normal vector estimation on the second image to obtain a target plane and a normal vector of the target plane; the target plane is used for describing a plane where the ground in the second image is located;
determining, based on the normal vector, a shadow region in the intermediate image to which a calibration region in the target plane is mapped; wherein the calibration region is a parallelogram region, and a first edge of the calibration region coincides with a target line segment to which a second edge of a bounding box of the target object in the intermediate image is mapped on the target plane;
obtaining the target silhouette based on the shadow region and the first image; wherein the shape corresponding to the bounding box of the target silhouette is the same as the shape corresponding to the shadow region.
8. The method of claim 7, wherein prior to determining, based on the normal vector, the shadow region in the intermediate image to which the calibration region in the target plane is mapped, the method further comprises:
obtaining the target line segment to which the second edge of the bounding box of the target object in the intermediate image is mapped on the target plane; wherein the second edge is the edge of the bounding box of the target object closest to the feet of the person;
obtaining the calibration area in the target plane based on the target line segment; the target line segment is the first side of the calibration area.
9. The method according to any one of claims 5 to 8, wherein the performing of the front-background fusion processing on the first image as a foreground image and the second image as a background image to obtain an intermediate image comprises:
processing the first image and the second image to obtain a target position of the first image relative to the second image;
and carrying out image fusion processing on the first image and the second image based on the target position to obtain the intermediate image.
10. The method of claim 9, wherein the obtaining the target position of the first image relative to the second image by processing the first image and the second image comprises:
performing ground identification processing on the second image to obtain position information of a ground area in the second image;
determining the target position of the first image relative to the second image as a first central region of the ground area in case a target boundary line in the second image is located in the ground area, and/or determining the target position of the first image relative to the second image as a second central region of the second image in case the target boundary line in the second image is not located in the ground area.
11. The method according to any one of claims 1 to 10, wherein the second image is a frame image in a target video, the method further comprising:
identifying whether lens motion exists in the target video;
under the condition that lens motion exists in the target video, identifying a path of the lens motion;
translating and/or scaling the third image to obtain a fourth image based on the path of the lens movement; the third image is a frame of image in the target video and is shot later than the second image;
performing image fusion processing on the first image, the fourth image and the target silhouette of the target object to obtain a fifth image; the target silhouette in the fifth image is taken as a shadow of the target object in the fifth image.
12. An image processing apparatus characterized by comprising:
the shadow estimation unit is used for carrying out shadow estimation processing on the basis of a first image and a second image to obtain a target silhouette of a target object in the first image; the first image is a foreground area image obtained from an original image;
the image fusion unit is used for carrying out image fusion processing on the first image, the second image and the target silhouette of the target object to obtain a target image; the target silhouette in the target image is taken as a shadow of the target object in the first image.
13. The image processing apparatus according to claim 12,
the shadow estimation unit is specifically configured to process the first image to obtain an original silhouette of the target object; the original silhouette is used for describing the outline of the target object in the first image;
utilizing the second image to estimate a shadow angle to obtain a target projection angle;
performing affine transformation processing on the original silhouette of the first image based on the target projection angle to obtain the target silhouette; the boundary frame of the target silhouette is a parallelogram with the angle being the target projection angle.
14. The image processing apparatus according to claim 13,
the shadow estimation unit is specifically configured to perform depth estimation and normal vector estimation on the second image to obtain a target plane and a normal vector of the target plane; the target plane is used for describing a plane where the ground in the second image is located;
determining the target projection angle based on the target plane and the normal vector.
15. The image processing apparatus according to claim 12,
the shadow estimation unit is specifically configured to perform front-background fusion processing on the first image serving as a foreground image and the second image serving as a background image to obtain an intermediate image;
performing depth estimation and normal vector estimation on the second image to obtain a target plane and a normal vector of the target plane; the target plane is used for describing a plane where the ground in the second image is located;
determining, based on the normal vector, a shadow region in the intermediate image to which a calibration region in the target plane is mapped; wherein the calibration region is a parallelogram region, and a first edge of the calibration region coincides with a target line segment to which a second edge of a bounding box of the target object in the intermediate image is mapped on the target plane;
obtaining the target silhouette based on the shadow region and the first image; wherein the shape corresponding to the bounding box of the target silhouette is the same as the shape corresponding to the shadow region.
16. The image processing apparatus according to claim 15,
the shadow estimation unit is further configured to obtain the target line segment of the target plane to which the second edge of the bounding box of the target object in the intermediate image is mapped; the second edge is the edge closest to the feet of the person in the boundary frame of the target object;
obtaining the calibration area in the target plane based on the target line segment; the target line segment is the first side of the calibration area.
17. An electronic device comprising a memory and a processor, wherein the memory is configured to store instructions and the processor is configured to execute the instructions stored by the memory, such that the processor performs the method of any of claims 1 to 11.
18. A computer-readable storage medium, in which a computer program is stored, the computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 11.