Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and not restrictive of it. It should also be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the method for processing images or the apparatus for processing images of the present disclosure may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired communication links, wireless communication links, or fiber optic cables.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as an image processing application, a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, and 103 are hardware, they may be various electronic devices with cameras, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like. When the terminal devices 101, 102, and 103 are software, they may be installed in the electronic devices listed above and implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server that provides various services, such as an image processing server that processes target object illumination images captured by the terminal devices 101, 102, 103. The image processing server may analyze and otherwise process received data such as the target object illumination image and obtain a processing result (e.g., a result image). In practice, the server may also feed the obtained processing result back to the terminal device.
It should be noted that the method for processing an image provided by the embodiments of the present disclosure may be executed by the server 105 or by the terminal devices 101, 102, and 103; accordingly, the apparatus for processing an image may be disposed in the server 105 or in the terminal devices 101, 102, and 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation. In the case where the data used to generate the result image need not be acquired remotely, the above system architecture may include no network and only a terminal device or a server.
With continued reference to Fig. 2, a flow 200 of one embodiment of a method for processing an image in accordance with the present disclosure is shown. The method for processing an image comprises the following steps:
Step 201, acquiring a target object illumination image and a target virtual object image.
In the present embodiment, an execution subject of the method for processing an image (e.g., the server 105 shown in Fig. 1) may acquire the target object illumination image and the target virtual object image locally or remotely via a wired or wireless connection. The target object illumination image is an image to be processed and comprises an object image and a shadow image corresponding to the object image. Specifically, the target object illumination image may be an image obtained by shooting an object in an illuminated scene, in which the light source is parallel light or sunlight. It is understood that, in an illuminated scene, a shadow is created when the object occludes the light source.
In the present embodiment, the target virtual object image is an image for processing the target object illumination image. The target virtual object image may be an image predetermined according to the shape of the virtual object. Specifically, it may be a pre-rendered image, or an image extracted in advance from an existing image according to the contour of the object. It should be noted that the "virtual" in the target virtual object image is relative to the target object illumination image, and means that the virtual object corresponding to the target virtual object image does not actually exist in the real scene used for capturing the target object illumination image.
Step 202, inputting the target object illumination image into a pre-trained shadow extraction model, and obtaining a result shadow image including distance information.
In this embodiment, based on the target object illumination image obtained in step 201, the execution subject may input the target object illumination image into a pre-trained shadow extraction model and obtain a result shadow image including distance information. The result shadow image may be a shadow image that is extracted from the target object illumination image and to which distance information is added. The distance information is used to represent the distance between a pixel point of the shadow image and the corresponding pixel point of the object image in the target object illumination image. Specifically, a point on the object produces a shadow point on a projection surface (e.g., the ground, a wall, or a desktop) by occluding the light source; that point may be regarded as the object point corresponding to the shadow point. The object point corresponds to a pixel point in the object image, and the shadow point corresponds to a pixel point in the shadow image. Accordingly, for each pixel point in the shadow image, the pixel point in the object image corresponding to the object point that produced the associated shadow point may be taken as its corresponding pixel point.
In this embodiment, the distance information may be embodied in the result shadow image in various forms. As an example, the distance information may be recorded in the result shadow image in numerical form. Specifically, each pixel point in the result shadow image may correspond to a number, and the number may be the distance between that pixel point and the corresponding pixel point in the object image.
In some optional implementations of this embodiment, the distance information may be the pixel values of the pixel points in the result shadow image. Specifically, the pixel values may characterize the distance in various ways. As an example, a convention may be adopted in which the larger the pixel value, the farther the distance; alternatively, the smaller the pixel value, the farther the distance.
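By way of a non-limiting illustration, the following Python sketch shows one way the "larger pixel value, farther distance" convention might be realized. The function name, the 8-bit encoding, and the use of the value 0 to mark non-shadow pixels are illustrative assumptions rather than details fixed by the disclosure.

```python
import numpy as np

def encode_distance_as_pixels(distances, shadow_mask):
    """Encode per-pixel shadow-to-object distances as pixel values in a
    result shadow image, using the 'larger value = farther' convention.

    distances:   float array, distance from each shadow pixel to its
                 corresponding object-image pixel (ignored elsewhere).
    shadow_mask: boolean array, True where a shadow pixel lies.
    """
    distances = np.asarray(distances, dtype=float)
    result = np.zeros(shadow_mask.shape, dtype=np.uint8)
    if not shadow_mask.any():
        return result
    max_distance = max(distances[shadow_mask].max(), 1e-6)
    # Map distances into 1..255 so that 0 still marks "not a shadow
    # pixel" and brighter pixels are farther from the object.
    result[shadow_mask] = (1 + distances[shadow_mask] / max_distance * 254).astype(np.uint8)
    return result
```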
In this embodiment, the shadow extraction model may be used to represent the correspondence between object illumination images and result shadow images. Specifically, as an example, the shadow extraction model may be a correspondence table, prepared in advance by a technician based on statistics of a large number of object illumination images and their corresponding result shadow images, that stores a plurality of object illumination images and the corresponding result shadow images; or it may be a model obtained by training an initial model (e.g., a neural network) with a machine learning method on preset training samples.
In some optional implementations of this embodiment, the shadow extraction model may be trained by the execution subject or another electronic device through the following steps:
First, a preset training sample set is acquired, wherein each training sample comprises a sample object illumination image and a sample result shadow image predetermined for the sample object illumination image.
Here, the sample object illumination image may be an image obtained by photographing a sample object in an illumination scene. The sample object illumination image may include a sample object image and a sample shadow image. The sample result shadow image may be an image obtained by extracting a sample shadow image from the sample object illumination image and adding sample distance information to the extracted sample shadow image.
Then, a pre-established generative adversarial network is acquired, wherein the generative adversarial network comprises a generation network and a discrimination network, the generation network is used for recognizing an input object illumination image and outputting a result shadow image, and the discrimination network is used for determining whether an input image is an image output by the generation network.
Here, the generative adversarial network may have any of various structures. For example, it may be a deep convolutional generative adversarial network (DCGAN). The generative adversarial network may be an untrained generative adversarial network with initialized parameters, or an already trained generative adversarial network.
Specifically, the generation network may be a convolutional neural network for image processing (e.g., one of various structures comprising convolutional layers, pooling layers, unpooling layers, and deconvolutional layers). The discrimination network may also be a convolutional neural network (e.g., one of various structures comprising a fully connected layer that performs a classification function). Alternatively, the discrimination network may be another model that implements a classification function, such as a support vector machine (SVM). Here, the discrimination network may output 1 (or 0) if it determines that the input image is an image output by the generation network, and 0 (or 1) if it determines that the input image is not. It should be noted that the discrimination network may output other preset information to represent the discrimination result and is not limited to the values 1 and 0.
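As a non-limiting illustration of such structures, the following PyTorch sketch pairs a small encoder-decoder generation network with a convolutional discrimination network that ends in a fully connected classification layer. The layer counts, channel widths, and class names are assumptions for illustration only; the disclosure does not fix any particular architecture.

```python
import torch.nn as nn

class Generator(nn.Module):
    """Encoder-decoder CNN: object illumination image -> result shadow image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),    # downsample
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # upsample
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1), nn.Sigmoid(), # 1-channel distance-coded shadow map
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """CNN classifier: is the input a generated or a sample result shadow image?"""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, 1)  # fully connected layer -> real/generated score

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```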
Finally, based on a machine learning method, the sample object illumination image included in a training sample in the training sample set is taken as the input of the generation network, the result shadow image output by the generation network and the sample result shadow image corresponding to the input sample object illumination image are taken as the input of the discrimination network, the generation network and the discrimination network are trained, and the trained generation network is determined as the shadow extraction model.
Specifically, the parameters of either of the generation network and the discrimination network (referred to as the first network) may be fixed first while the network with unfixed parameters (referred to as the second network) is optimized; then the parameters of the second network are fixed and the first network is optimized. The iterations continue until the discrimination network cannot distinguish whether an input image was output by the generation network. At that point, the result shadow images produced by the generation network are close to the sample result shadow images, the discrimination network cannot accurately distinguish real data from generated data (i.e., its accuracy is about 50%), and the generation network at that time may be determined as the shadow extraction model.
It should be noted that the execution subject or other electronic device may train the generation network and the discrimination network using existing back propagation and gradient descent algorithms. The parameters of the generation network and the discrimination network are adjusted after each round of training, and the networks obtained after each parameter adjustment are used for the next round of training.
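A minimal sketch of one round of the alternating optimization described above might look as follows, assuming the Generator and Discriminator sketched earlier, a standard binary cross-entropy GAN loss, and pre-constructed optimizers; none of these specifics are mandated by the disclosure.

```python
import torch
import torch.nn.functional as F

def train_step(gen, disc, opt_g, opt_d, sample_image, sample_shadow):
    """One round of alternating optimization for the GAN described above."""
    # Fix the generation network; optimize the discrimination network.
    fake_shadow = gen(sample_image).detach()
    real_logits = disc(sample_shadow)
    fake_logits = disc(fake_shadow)
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Fix the discrimination network; optimize the generation network so
    # that its output is judged to be a sample result shadow image.
    fake_logits = disc(gen(sample_image))
    g_loss = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

In practice, this step would be repeated over the training sample set until the discrimination network's accuracy approaches 50%.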
Step 203, generating illumination direction information corresponding to the target object illumination image based on the result shadow image.
In this embodiment, based on the result shadow image obtained in step 202, the execution subject may generate illumination direction information corresponding to the target object illumination image. The illumination direction information may be used to indicate the illumination direction and may include, but is not limited to, at least one of the following: characters, numbers, symbols, and images. Specifically, as an example, the illumination direction information may be an arrow marked in the result shadow image, where the direction in which the arrow points is the illumination direction; alternatively, it may be a two-dimensional vector, where the direction corresponding to the vector is the illumination direction.
In this embodiment, the illumination direction indicated by the illumination direction information is the projection of the actual illumination direction in the three-dimensional coordinate system onto the projection surface on which the shadow lies. It will be appreciated that, in practice, this projected illumination direction generally coincides with the extending direction of the shadow. Accordingly, the execution subject may determine the extending direction of the shadow based on the pixel points in the result shadow image and their corresponding distance information, and thereby determine the illumination direction. Specifically, as an example, the execution subject may select from the result shadow image the pixel point whose corresponding distance information represents the smallest distance as a first pixel point and the pixel point whose corresponding distance information represents the largest distance as a second pixel point, and then determine the direction from the first pixel point to the second pixel point as the illumination direction.
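The first/second pixel point rule can be illustrated with the following sketch, which assumes the optional implementation of step 202 in which pixel values encode distance with larger values meaning farther; the function name is hypothetical.

```python
import numpy as np

def estimate_illumination_direction(result_shadow):
    """Return a unit 2D vector (row, col) from the shadow pixel nearest
    its object pixel to the shadow pixel farthest from its object pixel,
    i.e., the projected illumination direction of step 203.
    Assumes the result shadow image contains at least two shadow pixels."""
    ys, xs = np.nonzero(result_shadow)            # shadow pixels (value > 0)
    vals = result_shadow[ys, xs]                  # encoded distances
    i_near, i_far = np.argmin(vals), np.argmax(vals)
    direction = np.array([ys[i_far] - ys[i_near],
                          xs[i_far] - xs[i_near]], dtype=float)
    return direction / (np.linalg.norm(direction) + 1e-8)
```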
Step 204, generating a virtual object illumination image corresponding to the target virtual object image based on the illumination direction information.
In this embodiment, based on the illumination direction information obtained in step 203, the execution subject may generate a virtual object illumination image corresponding to the target virtual object image. The virtual object illumination image comprises the target virtual object image and a virtual shadow image corresponding to the target virtual object image. The illumination direction corresponding to the virtual shadow image in the virtual object illumination image matches the illumination direction indicated by the illumination direction information. Here, matching means that the angular deviation between the illumination direction corresponding to the virtual shadow image and the illumination direction indicated by the illumination direction information is equal to or smaller than a preset angle.
Specifically, the execution subject may generate the virtual object illumination image corresponding to the target virtual object image in various ways based on the illumination direction information.
As an example, a light source may be constructed in a rendering engine based on the illumination direction indicated by the illumination direction information, and the target virtual object image may then be rendered with the constructed light source to obtain the virtual object illumination image. It should be noted that, since the illumination direction indicated by the illumination direction information is the projection of the actual illumination direction onto the projection surface where the shadow lies, the actual illumination direction must first be determined from the illumination direction information before the light source is constructed. In practice, the actual illumination direction may be determined from its projection onto the projection surface where the shadow lies together with its projection onto a plane perpendicular to that surface; in this embodiment, the latter projection may be predetermined.
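For illustration, one way to recover the actual illumination direction from its projection plus a predetermined perpendicular component (expressed here as an elevation angle above the projection surface) might look as follows; the coordinate and sign conventions are assumptions that would need to match the rendering engine in use.

```python
import numpy as np

def actual_light_direction(projected_dir, elevation_deg):
    """Recover a 3D light-travel direction from its 2D projection onto the
    shadow's projection surface plus a predetermined elevation angle of the
    light above that surface. A sketch only: here z points up and the light
    travels along the shadow's extension direction."""
    e = np.radians(elevation_deg)
    p = np.asarray(projected_dir, dtype=float)
    p = p / np.linalg.norm(p)
    # In-plane components scale with cos(elevation); the vertical
    # component is -sin(elevation) because the light travels downward.
    return np.array([p[0] * np.cos(e), p[1] * np.cos(e), -np.sin(e)])
```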
As another example, the execution subject may store in advance an initial virtual shadow image corresponding to the target virtual object image. The execution subject may adjust the initial virtual shadow image based on the illumination direction information to obtain the virtual shadow image, and then combine the virtual shadow image with the target virtual object image to generate the virtual object illumination image.
It should be noted that, since the light source corresponding to the target object illumination image is parallel light or sunlight, the illumination direction corresponding to the virtual shadow image may here be considered to match the illumination direction indicated by the illumination direction information regardless of the position at which the virtual object illumination image is added to the target object illumination image.
Step 205, fusing the virtual object illumination image and the target object illumination image to add the virtual object illumination image to the target object illumination image, and obtaining a result image.
In this embodiment, based on the virtual object illumination image obtained in step 204, the execution subject may fuse the virtual object illumination image and the target object illumination image to add the former to the latter, thereby obtaining a result image. The result image is the target object illumination image to which the virtual object illumination image has been added.
Here, the position at which the virtual object illumination image is added to the target object illumination image may be predetermined (e.g., the center of the image), or may be determined by recognizing the target object illumination image (e.g., after the object image and the shadow image in the target object illumination image are recognized, a region excluding both may be determined as the position for adding the virtual object illumination image).
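As a non-limiting illustration of the fusion in step 205, the following sketch alpha-composites the virtual object illumination image onto the target object illumination image at a given position; the alpha-mask representation is an assumption, and a production pipeline might additionally handle color matching and boundary blending.

```python
import numpy as np

def fuse_images(target, virtual, alpha, position):
    """Alpha-composite the virtual object illumination image onto the
    target object illumination image at `position` (top-left corner).

    target:   H x W x 3 array, the target object illumination image.
    virtual:  h x w x 3 array, the virtual object illumination image.
    alpha:    h x w array in [0, 1], opacity of the virtual image
              (0 outside the virtual object and its shadow).
    """
    y, x = position
    h, w = virtual.shape[:2]
    result = target.astype(float).copy()
    region = result[y:y + h, x:x + w]
    a = alpha[..., None]                     # broadcast over color channels
    result[y:y + h, x:x + w] = a * virtual + (1.0 - a) * region
    return result.astype(target.dtype)
```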
In some optional implementations of the embodiment, after obtaining the result image, the execution subject may display the obtained result image.
In some optional implementations of this embodiment, the execution subject may further send the obtained result image to a communicatively connected user terminal and control the user terminal to display the result image. The user terminal is a terminal used by the user and communicatively connected to the execution subject. Specifically, the execution subject may send a control signal to the user terminal to control it to display the result image.
Here, because the virtual object image in the result image has a corresponding virtual shadow image whose illumination direction matches that of the shadow image corresponding to the real object image, this implementation enables the user terminal to display a more realistic result image, thereby improving the display effect of the image.
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for processing an image according to the present embodiment. In the application scenario of Fig. 3, the server 301 first acquires an illumination image 302 of a cat (the target object illumination image) and a soccer ball image 303 (the target virtual object image), where the illumination image 302 of the cat includes an image of the cat (the object image) and a shadow image of the cat (the shadow image). Then, the server 301 may input the illumination image 302 of the cat into a pre-trained shadow extraction model 304 and obtain a shadow image 305 of the cat including distance information (the result shadow image), where the distance information represents the distance between a pixel point of the shadow image of the cat and the corresponding pixel point of the image of the cat in the illumination image 302. Next, the server 301 may generate illumination direction information 306 corresponding to the illumination image 302 of the cat based on the shadow image 305. Then, the server 301 may generate a soccer ball illumination image 307 (the virtual object illumination image) corresponding to the soccer ball image 303 based on the illumination direction information 306, wherein the illumination direction corresponding to the soccer ball shadow image (the virtual shadow image) in the soccer ball illumination image 307 matches the illumination direction indicated by the illumination direction information 306. Finally, the server 301 may fuse the soccer ball illumination image 307 with the illumination image 302 of the cat, adding the former to the latter, to obtain a result image 308.
Currently, when an object in an illuminated scene is photographed, the shadow of the object is usually captured as well. A virtual object image added to a real-scene image, however, usually does not include a shadow image, so adding such a virtual object image may reduce the realism of the image and degrade its display effect. The method provided by the embodiments of the present disclosure can generate a virtual object illumination image corresponding to the virtual object image, so that a corresponding virtual shadow image is added to the virtual object image; after the virtual object illumination image and the target object illumination image are fused, the realism of the generated result image is improved. In addition, the illumination direction of the virtual shadow image corresponding to the virtual object image is determined based on the illumination direction of the shadow image in the target object illumination image, so that the virtual object image blends better into the target object illumination image, further improving the realism of the result image and its display effect.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for processing an image is shown. The flow 400 of the method for processing an image comprises the steps of:
Step 401, acquiring a target object illumination image and a target virtual object image.
In the present embodiment, an execution subject of the method for processing an image (e.g., the server 105 shown in Fig. 1) may acquire the target object illumination image and the target virtual object image locally or remotely via a wired or wireless connection. The target object illumination image is an image to be processed and comprises an object image and a shadow image corresponding to the object image. The target virtual object image is an image for processing the target object illumination image and may be an image predetermined according to the shape of the virtual object.
Step 402, inputting the target object illumination image into a pre-trained shadow extraction model, and obtaining a result shadow image including distance information.
In this embodiment, based on the target object illumination image obtained in step 401, the execution subject may input the target object illumination image into a pre-trained shadow extraction model and obtain a result shadow image including distance information. The result shadow image may be a shadow image that is extracted from the target object illumination image and to which distance information is added. The distance information is used to represent the distance between a pixel point of the shadow image and the corresponding pixel point of the object image in the target object illumination image. The shadow extraction model may be used to represent the correspondence between object illumination images and result shadow images.
Step 403, inputting the result shadow image into a pre-trained illumination direction recognition model to obtain illumination direction information.
In this embodiment, based on the result shadow image obtained in step 402, the execution subject may input the result shadow image into a pre-trained illumination direction recognition model to obtain illumination direction information. The illumination direction information may be used to indicate the illumination direction and may include, but is not limited to, at least one of the following: characters, numbers, symbols, and images.
In this embodiment, the illumination direction recognition model may be used to represent the correspondence between result shadow images and illumination direction information. Specifically, as an example, the illumination direction recognition model may be a correspondence table, pre-established by a technician based on statistics of a large number of result shadow images and their corresponding illumination direction information, that stores a plurality of result shadow images and the corresponding illumination direction information; or it may be a model obtained by training an initial model (e.g., a neural network) with a machine learning method on preset training samples.
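Where the illumination direction recognition model is a trained neural network, it might, for example, be a small convolutional regressor such as the following sketch; the architecture and the output parameterization (a two-dimensional direction vector) are illustrative assumptions.

```python
import torch.nn as nn

class DirectionRecognizer(nn.Module):
    """Regresses a 2D illumination-direction vector from a one-channel
    result shadow image."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 2)  # (row, col) components of the direction

    def forward(self, shadow):
        return self.head(self.backbone(shadow))
```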
Step 404, based on the illumination direction information, generating a virtual object illumination image corresponding to the target virtual object image.
In this embodiment, based on the illumination direction information obtained in step 403, the execution subject may generate a virtual object illumination image corresponding to the target virtual object image. The virtual object illumination image comprises the target virtual object image and a virtual shadow image corresponding to the target virtual object image. The illumination direction corresponding to the virtual shadow image in the virtual object illumination image matches the illumination direction indicated by the illumination direction information. Here, matching means that the angular deviation between the illumination direction corresponding to the virtual shadow image and the illumination direction indicated by the illumination direction information is equal to or smaller than a preset angle.
Step 405, fusing the virtual object illumination image and the target object illumination image to add the virtual object illumination image to the target object illumination image, and obtaining a result image.
In this embodiment, based on the virtual object illumination image obtained in step 404, the execution subject may fuse the virtual object illumination image and the target object illumination image to add the former to the latter, thereby obtaining a result image. The result image is the target object illumination image to which the virtual object illumination image has been added.
Steps 401, 402, 404, and 405 may be performed in a manner similar to steps 201, 202, 204, and 205 in the foregoing embodiment, respectively; the above descriptions of steps 201, 202, 204, and 205 also apply here and are not repeated.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the method for processing an image in the present embodiment highlights the step of generating illumination direction information using the illumination direction recognition model. The scheme described in this embodiment can therefore determine the illumination direction corresponding to the target object illumination image more conveniently, generate the result image more quickly, and improve image processing efficiency.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for processing an image, which corresponds to the method embodiment shown in Fig. 2 and is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for processing an image of the present embodiment includes: an image acquisition unit 501, an image input unit 502, an information generation unit 503, an image generation unit 504, and an image fusion unit 505. The image acquiring unit 501 is configured to acquire a target object illumination image and a target virtual object image, where the target object illumination image includes an object image and a shadow image corresponding to the object image; the image input unit 502 is configured to input the target object illumination image into a pre-trained shadow extraction model, and obtain a result shadow image including distance information, where the distance information is used to represent distances between pixel points of the shadow image and pixel points corresponding to the object image in the target object illumination image; the information generating unit 503 is configured to generate illumination direction information corresponding to the target object illumination image based on the resultant shadow image; the image generating unit 504 is configured to generate a virtual object illumination image corresponding to the target virtual object image based on the illumination direction information, wherein an illumination direction corresponding to a virtual shadow image in the virtual object illumination image matches the illumination direction indicated by the illumination direction information; the image fusion unit 505 is configured to fuse the virtual object illumination image and the target object illumination image to add the virtual object illumination image to the target object illumination image, obtaining a result image.
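Purely for illustration, the unit composition of the apparatus 500 can be sketched as follows, treating each unit as a callable; the class and parameter names are hypothetical, and the disclosure does not prescribe any particular software realization.

```python
class ImageProcessingApparatus:
    """Composes the five units of apparatus 500, each assumed to be a
    callable implementing the corresponding step of flow 200."""

    def __init__(self, acquire, extract_shadow, gen_direction,
                 gen_virtual, fuse):
        self.image_acquisition_unit = acquire             # unit 501
        self.image_input_unit = extract_shadow            # unit 502
        self.information_generation_unit = gen_direction  # unit 503
        self.image_generation_unit = gen_virtual          # unit 504
        self.image_fusion_unit = fuse                     # unit 505

    def process(self):
        target, virtual = self.image_acquisition_unit()
        result_shadow = self.image_input_unit(target)
        direction_info = self.information_generation_unit(result_shadow)
        virtual_lit = self.image_generation_unit(virtual, direction_info)
        return self.image_fusion_unit(target, virtual_lit)
```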
In this embodiment, the image acquisition unit 501 of the apparatus 500 for processing an image may acquire the target object illumination image and the target virtual object image locally or remotely via a wired or wireless connection. The target object illumination image is an image to be processed and comprises an object image and a shadow image corresponding to the object image. The target virtual object image is an image for processing the target object illumination image and may be an image predetermined according to the shape of the virtual object.
In this embodiment, based on the target object illumination image obtained by the image obtaining unit 501, the image input unit 502 may input the target object illumination image into a pre-trained shadow extraction model, and obtain a resultant shadow image including distance information. The resulting shadow image may be a shadow image extracted from the target object illumination image and added with distance information. The distance information is used for representing the distance between the pixel point of the shadow image and the pixel point corresponding to the object image in the target object illumination image. The distance information may be embodied in the resulting shadow image in various forms. The shadow extraction model can be used for representing the corresponding relation between the object illumination image and the result shadow image.
In this embodiment, based on the resulting shadow image obtained by the image input unit 502, the information generating unit 503 may generate illumination direction information corresponding to the illumination image of the target object. Wherein, the illumination direction information may be used to indicate the illumination direction, which may include but is not limited to at least one of the following: characters, numbers, symbols, images.
In the present embodiment, based on the illumination direction information obtained by the information generating unit 503, the image generating unit 504 generates a virtual object illumination image corresponding to the target virtual object image. The virtual object illumination image comprises the target virtual object image and a virtual shadow image corresponding to the target virtual object image. The illumination direction corresponding to the virtual shadow image in the virtual object illumination image is matched with the illumination direction indicated by the illumination direction information. Here, the matching means that an angular deviation of the illumination direction corresponding to the virtual shadow image from the illumination direction indicated by the illumination direction information is equal to or smaller than a preset angle.
In this embodiment, based on the virtual object illumination image obtained by the image generation unit 504, the image fusion unit 505 may fuse the virtual object illumination image and the target object illumination image to add the virtual object illumination image to the target object illumination image, and obtain a result image. Wherein, the result image is the target object illumination image added with the virtual object illumination image.
In some optional implementations of this embodiment, the information generating unit 503 may be further configured to: and inputting the result shadow image into a pre-trained illumination direction recognition model to obtain illumination direction information.
In some optional implementations of this embodiment, the distance information is a pixel value of a pixel point in the resulting shadow image.
In some optional implementations of this embodiment, the shadow extraction model may be trained by: acquiring a preset training sample set, wherein each training sample comprises a sample object illumination image and a sample result shadow image predetermined for the sample object illumination image; acquiring a pre-established generative adversarial network, wherein the generative adversarial network comprises a generation network and a discrimination network, the generation network is used for recognizing an input object illumination image and outputting a result shadow image, and the discrimination network is used for determining whether an input image is an image output by the generation network; and, based on a machine learning method, taking the sample object illumination image included in a training sample in the training sample set as the input of the generation network, taking the result shadow image output by the generation network and the sample result shadow image corresponding to the input sample object illumination image as the input of the discrimination network, training the generation network and the discrimination network, and determining the trained generation network as the shadow extraction model.
In some optional implementations of this embodiment, the apparatus 500 may further include: an image display unit (not shown in the figure) configured to display the obtained result image.
In some optional implementations of this embodiment, the apparatus 500 may further include: an image transmitting unit (not shown in the figure) configured to transmit the obtained result image to a communicatively connected user terminal and control the user terminal to display the result image.
It will be understood that the elements described in the apparatus 500 correspond to various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 500 and the units included therein, and are not described herein again.
The apparatus 500 provided by the above embodiment of the present disclosure may generate a virtual object illumination image corresponding to a virtual object image, so that a corresponding virtual shadow image is added to the virtual object image; after the virtual object illumination image and the target object illumination image are fused, the realism of the generated result image is improved. In addition, the illumination direction of the virtual shadow image corresponding to the virtual object image may be determined based on the illumination direction of the shadow image in the target object illumination image, so that the virtual object image blends better into the target object illumination image, further improving the realism of the result image and its display effect.
Referring now to fig. 6, a schematic diagram of an electronic device (e.g., a terminal device or a server in fig. 1) 600 suitable for implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in Fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded from a storage means 608 into a random access memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored. The processing means 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a target object illumination image and a target virtual object image, wherein the target object illumination image comprises an object image and a shadow image corresponding to the object image; inputting the target object illumination image into a pre-trained shadow extraction model to obtain a result shadow image comprising distance information, wherein the distance information is used for representing the distance between a pixel point of the shadow image and a pixel point corresponding to the object image in the target object illumination image; generating illumination direction information corresponding to the target object illumination image based on the result shadow image; generating a virtual object illumination image corresponding to the target virtual object image based on the illumination direction information, wherein the illumination direction corresponding to the virtual shadow image in the virtual object illumination image is matched with the illumination direction indicated by the illumination direction information; and fusing the virtual object illumination image and the target object illumination image to add the virtual object illumination image to the target object illumination image to obtain a result image.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Here, the name of the unit does not constitute a limitation to the unit itself in some cases, and for example, the image acquisition unit may also be described as a "unit that acquires a target object illumination image and a virtual object image".
The foregoing description is only a description of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features disclosed in the present disclosure having similar functions.