WO2020211573A1

WO2020211573A1 - Method and device for processing image

Info

Publication number: WO2020211573A1
Application number: PCT/CN2020/078582
Authority: WO
Inventors: 王光伟
Original assignee: 北京字节跳动网络技术有限公司
Priority date: 2019-04-16
Filing date: 2020-03-10
Publication date: 2020-10-22
Also published as: CN110033423A; CN110033423B

Abstract

The embodiments of the present disclosure disclose a method and a device for processing an image. An embodiment of the method comprises: acquiring a target object illumination image and a target virtual object image, the target object illumination image including an object image and a shadow image corresponding to the object image; inputting the target object illumination image into a pre-trained shadow extraction model, and obtaining a resulting shadow image comprising distance information; generating, on the basis of the resulting shadow image, illumination direction information corresponding to the target object illumination image; generating, on the basis of the illumination direction information, a virtual object illumination image corresponding to the target virtual object image; and fusing the virtual object illumination image and the target object illumination image, so as to add the virtual object illumination image to the target object illumination image, so as to obtain a resulting image. According to the embodiment, a virtual object image can be better fused into a target object illumination image, improving the authenticity of the resulting image, and thus facilitating the improvement of the display effect of the image.

Description

Method and device for processing image

Cross references to related applications

This application is based on a Chinese patent application with the application number 201910302471.0, the application date being April 16, 2019, and the invention title "Method and Apparatus for Processing Images", and it claims the priority of the Chinese patent application. The Chinese patent The entire content of the application is hereby incorporated into this application as a reference.

Technical field

The embodiments of the present disclosure relate to the field of computer technology, and more particularly to methods and devices for processing images.

Background technique

With the development of image processing technology, people can add virtual object images to the captured images to enhance the display effect of the images.

At present, the virtual object image used for adding to the real scene image is usually an image preset by the technician according to the shape of the virtual object.

Summary of the invention

The embodiments of the present disclosure propose methods and devices for processing images.

In the first aspect, an embodiment of the present disclosure provides a method for processing an image, the method includes: acquiring a target object illumination image and a target virtual object image, wherein the target object illumination image includes the object image and the corresponding object image Shadow image; input the target object illumination image into the pre-trained shadow extraction model to obtain the resulting shadow image including distance information, where the distance information is used to represent the pixel points of the shadow image and the pixel corresponding to the object image in the target object illumination image The distance of the point; based on the resulting shadow image, generate the illumination direction information corresponding to the target object illumination image; based on the illumination direction information, generate the virtual object illumination image corresponding to the target virtual object image, where the virtual shadow image in the virtual object illumination image The corresponding light direction matches the light direction indicated by the light direction information; the virtual object light image and the target object light image are merged to add the virtual object light image to the target object light image to obtain the result image.

In some embodiments, based on the resulting shadow image, generating the illumination direction information corresponding to the target object illumination image includes: inputting the resulting shadow image into a pre-trained illumination direction recognition model to obtain the illumination direction information.

In some embodiments, the distance information is the pixel value of the pixel in the resulting shadow image.

In some embodiments, the shadow extraction model is obtained by training in the following steps: obtaining a preset training sample set, where the training samples include a sample object illumination image and a sample result shadow image predetermined for the sample object illumination image; obtaining a pre-established Generative confrontation network, where the generative confrontation network includes a generation network and a discriminant network. The generation network is used to identify the input object lighting image and output the resulting shadow image, and the discriminant network is used to determine whether the input image is a generation network The output image; based on the machine learning method, the sample object illumination image included in the training samples in the training sample set is used as the input of the generation network, and the resulting shadow image output by the network and the sample corresponding to the input sample object illumination image are generated As a result, the shadow image is used as the input of the discriminant network, the generation network and the discriminant network are trained, and the trained generation network is determined as the shadow extraction model.

In some embodiments, the method further includes: displaying the obtained result image.

In some embodiments, the method further includes: sending the obtained result image to a user terminal connected in communication, and controlling the user terminal to display the result image.

In a second aspect, an embodiment of the present disclosure provides an apparatus for processing an image. The apparatus includes: an image acquisition unit configured to acquire a target object illumination image and a target virtual object image, wherein the target object illumination image includes an object The shadow image corresponding to the image and the object image; the image input unit is configured to input the target object illumination image into the pre-trained shadow extraction model to obtain the resulting shadow image including distance information, where the distance information is used to characterize the illumination of the target object In the image, the distance between the pixel point of the shadow image and the pixel point corresponding to the object image; the information generation unit is configured to generate light direction information corresponding to the target object illumination image based on the resulting shadow image; the image generation unit is configured to be based on The illumination direction information generates the virtual object illumination image corresponding to the target virtual object image, wherein the illumination direction corresponding to the virtual shadow image in the virtual object illumination image matches the illumination direction indicated by the illumination direction information; the image fusion unit is It is configured to fuse the lighting image of the virtual object and the lighting image of the target object to add the lighting image of the virtual object to the lighting image of the target object to obtain a result image.

In some embodiments, the information generating unit is further configured to: input the resulting shadow image into a pre-trained light direction recognition model to obtain light direction information.

In some embodiments, the device further includes: an image display unit configured to display the obtained result image.

In some embodiments, the device further includes: an image sending unit configured to send the obtained result image to a user terminal connected in communication, and control the user terminal to display the result image.

In a third aspect, the embodiments of the present disclosure provide an electronic device, including: one or more processors; a storage device, on which one or more programs are stored, when one or more programs are processed by one or more The processor executes, so that one or more processors implement the method of any one of the foregoing methods for processing images.

In a fourth aspect, the embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, the method of any one of the above methods for processing an image is implemented.

The method and device for processing an image provided by the embodiments of the present disclosure acquire a target object illumination image and a target virtual object image, where the target object illumination image includes the object image and the shadow image corresponding to the object image, and then the target object The illumination image is input into the pre-trained shadow extraction model to obtain the resulting shadow image including distance information. The distance information is used to characterize the distance between the pixel point of the shadow image and the pixel point corresponding to the object image in the target object illumination image, and then based on As a result, the shadow image generates the illumination direction information corresponding to the illumination image of the target object, and then based on the illumination direction information, the virtual object illumination image corresponding to the target virtual object image is generated, where the illumination corresponding to the virtual shadow image in the virtual object illumination image The direction matches the light direction indicated by the light direction information, and finally the virtual object light image and the target object light image are merged to add the virtual object light image to the target object light image to obtain the result image, which can be used as the target When adding a virtual object image to the lighting image of the object, based on the determined light direction, the shadow image corresponding to the virtual object image is generated, so that the virtual object image can be better integrated into the target object lighting image, which improves the reality of the result image It can improve the display effect of the image.

Description of the drawings

By reading the detailed description of the non-limiting embodiments with reference to the following drawings, other features, purposes and advantages of the present disclosure will become more apparent:

FIG. 1 is an exemplary system architecture diagram in which an embodiment of the present disclosure can be applied;

Fig. 2 is a flowchart of an embodiment of a method for processing an image according to the present disclosure;

3 is a schematic diagram of an application scenario of the method for processing images according to an embodiment of the present disclosure;

FIG. 4 is a flowchart of another embodiment of a method for processing an image according to the present disclosure;

Fig. 5 is a schematic structural diagram of an embodiment of an image processing apparatus according to the present disclosure;

Fig. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device of an embodiment of the present disclosure.

detailed description

The present disclosure will be further described in detail below in conjunction with the drawings and embodiments. It can be understood that the specific embodiments described here are only used to explain the related invention, but not to limit the invention. In addition, it should be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.

It should be noted that the embodiments in the present disclosure and the features in the embodiments can be combined with each other if there is no conflict. Hereinafter, the present disclosure will be described in detail with reference to the drawings and in conjunction with embodiments.

FIG. 1 shows an exemplary system architecture 100 to which an embodiment of the method for processing images or the apparatus for processing images of the present disclosure can be applied.

As shown in FIG. 1, the system architecture 100 may include

terminal devices

101, 102, 103, a network 104, and a server 105. The network 104 is used to provide a medium for communication links between the

terminal devices

101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables.

The user can use the

terminal devices

101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and so on. Various communication client applications, such as image processing applications, web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software, can be installed on the

terminal devices

101, 102, and 103.

The

terminal devices

101, 102, and 103 may be hardware or software. When the

terminal devices

101, 102, 103 are hardware, they can be various electronic devices with cameras, including but not limited to smart phones, tablets, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III, dynamic Video experts compress standard audio layer 3), MP4 (Moving Picture Experts Group Audio Layer IV, dynamic image experts compress standard audio layer 4) players, laptop portable computers and desktop computers, etc. When the

terminal devices

101, 102, and 103 are software, they can be installed in the electronic devices listed above. It can be implemented as multiple software or software modules (for example, multiple software or software modules for providing distributed services), or as a single software or software module. There is no specific limitation here.

The server 105 may be a server that provides various services, for example, an image processing server that processes the illumination images of the target object obtained by shooting the

terminal devices

101, 102, and 103. The image processing server can analyze and process the received data such as the illumination image of the target object, and obtain the processing result (for example, the result image). In practice, the server can also feed back the obtained processing result to the terminal device.

It should be noted that the method for processing images provided by the embodiments of the present disclosure can be executed by the server 105, and can also be executed by the

terminal devices

101, 102, 103. Accordingly, the device for processing images can be set in the server. 105 can also be set in the

terminal devices

101, 102, 103.

It should be noted that the server can be hardware or software. When the server is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it can be implemented as multiple software or software modules (for example, multiple software or software modules for providing distributed services), or as a single software or software module. There is no specific limitation here.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there can be any number of terminal devices, networks and servers. In the case that the data used in the process of generating the result image does not need to be obtained remotely, the above system architecture may not include the network, but only include the terminal device or the server.

With continued reference to FIG. 2, there is shown a flow 200 of an embodiment of the method for processing an image according to the present disclosure. The method for processing images includes the following steps:

Step 201: Obtain a target object illumination image and a target virtual object image.

In this embodiment, the execution subject of the method for processing images (for example, the server 105 shown in FIG. 1) may remotely or locally obtain the target object illumination image and the target virtual object image through a wired connection or a wireless connection. Among them, the illumination image of the target object is the image to be processed. The illumination image of the target object includes the object image and the shadow image corresponding to the object image. Specifically, the illuminated image of the target object may be an image obtained by shooting an object in the illuminated scene. The light source in the illumination scene in which the illumination image of the target object is captured is parallel light or sunlight. It can be understood that in an illuminated scene, when an object blocks the light source, shadows will be generated.

In this embodiment, the target virtual object image is an image used to process the illumination image of the target object. The target virtual object image may be an image predetermined according to the shape of the virtual object. Specifically, it may be a pre-drawn image, or it may be an image pre-extracted from an existing image according to the contour of the object. It should be noted that, here, the "virtual" of the target virtual object image is relative to the target object illumination image, which means that the virtual object corresponding to the target virtual object image does not actually exist in the target virtual object image. Objects in the real scene of the illuminated image.

Step 202: Input the illumination image of the target object into a pre-trained shadow extraction model to obtain a resultant shadow image including distance information.

In this embodiment, based on the target object illumination image obtained in step 201, the above-mentioned execution subject may input the target object illumination image into a pre-trained shadow extraction model to obtain a resultant shadow image including distance information. Among them, the resulting shadow image may be a shadow image extracted from the illumination image of the target object and added with distance information. The distance information is used to characterize the distance between the pixel point of the shadow image and the pixel point corresponding to the object image in the target object illumination image. Specifically, because a certain point in the object blocks the light source, a shadow point will be generated on the projection surface (such as the ground, wall, desktop, etc.). Furthermore, here, the point on the object used to generate the shadow point can be used. As the object point corresponding to the shadow point, and the object point on the object corresponds to the pixel point in the object image, the shadow point in the shadow corresponds to the pixel point in the shadow image, which can then be used to generate the shadow image corresponding to the pixel point The pixel point corresponding to the object point of the shadow point is regarded as the pixel point corresponding to the pixel point in the shadow image.

In this embodiment, the distance information can be embodied in the resulting shadow image in various forms. As an example, the distance information can be recorded in the resulting shadow image in digital form. Specifically, each pixel in the resulting shadow image may correspond to a number, and the number may be the distance between the corresponding pixel and the corresponding pixel in the object image.

In some optional implementation manners of this embodiment, the distance information may be the pixel value of the pixel in the resulting shadow image. Specifically, various ways can be used to characterize the distance using pixel values. As an example, the larger the pixel value, the longer the distance; or the smaller the pixel value, the longer the distance.

In this embodiment, the shadow extraction model can be used to characterize the correspondence between the illumination image of the object and the resulting shadow image. Specifically, as an example, the shadow extraction model may be pre-made by technicians based on statistics of a large number of object illumination images and result shadow images corresponding to the object illumination image, and stores multiple object illumination images and corresponding result shadows. Correspondence table of the image; it can also be a model obtained after training an initial model (such as a neural network) using a machine learning method based on a preset training sample.

In some optional implementations of this embodiment, the shadow extraction model may be trained by the above-mentioned executive body or other electronic devices through the following steps:

First, obtain a preset training sample set, where the training sample includes a sample object illumination image and a sample result shadow image predetermined for the sample object illumination image.

Here, the illuminated image of the sample object may be an image obtained by shooting the sample object in an illuminated scene. The sample object illumination image may include a sample object image and a sample shadow image. The sample result shadow image may be an image obtained by extracting a sample shadow image from a sample object illumination image, and adding sample distance information to the extracted sample shadow image.

Then, a pre-established generative confrontation network is obtained, where the generative confrontation network includes a generation network and a discrimination network. The generation network is used to recognize the input object illumination image and output the resulting shadow image, and the discrimination network is used to determine the input Whether the image of is the image output by the generation network.

Here, the above-mentioned generative confrontation network may be a generative confrontation network of various structures. For example, the generative adversarial network may be a deep convolutional generative adversarial network (Deep Convolutional Generative Adversarial Network, DCGAN). It should be noted that the above-mentioned generative confrontation network may be an untrained generative confrontation network after initializing parameters, or a trained generative confrontation network.

Specifically, the generation network may be a convolutional neural network for image processing (for example, a convolutional neural network with various structures including a convolutional layer, a pooling layer, a depooling layer, and a deconvolutional layer). The above-mentioned discriminant network may also be a convolutional neural network (for example, a convolutional neural network of various structures including a fully connected layer, where the above-mentioned fully connected layer can implement a classification function). In addition, the discriminant network can also be other models used to implement classification functions, such as Support Vector Machine (SVM). Here, if the discriminant network determines that the image input to the discriminant network is an image output by the generation network, it can output 1 (or 0); if it determines that it is not an image output by the generation network, it can output 0 (or 1). It should be noted that the discrimination network can also output other preset information to characterize the discrimination result, which is not limited to the values 1 and 0.

Finally, based on the machine learning method, the sample object illumination image included in the training samples in the training sample set is used as the input of the generation network, and the resulting shadow image output by the network and the sample result shadow image corresponding to the input sample object illumination image are generated. As the input of the discriminant network, the generation network and the discriminant network are trained, and the trained generation network is determined as the shadow extraction model.

Specifically, the parameters of any one of the generating network and the discriminating network (which can be called the first network) can be fixed first, and the network with no fixed parameters (which can be called the second network) can be optimized; then the parameters of the second network can be fixed. Parameters to improve the first network. Continuously carry out the above iterations, so that the judgment network cannot distinguish whether the input image is output by the generation network. At this time, the result shadow image generated by the above generation network is close to the sample result shadow image, and the above discrimination network cannot accurately distinguish between the real data and the generated data (that is, the accuracy rate is 50%). The generation network at this time can be determined as the shadow extraction model.

It should be noted that the above-mentioned executive body or other electronic devices can use the existing back propagation algorithm and gradient descent algorithm to train the generation network and the discrimination network. The parameters of the generation network and the discrimination network after each training will be adjusted, and the generation network and the discrimination network obtained after each adjustment of the parameters are used as the generation network and the discrimination network used in the next training.

Step 203: Based on the resulting shadow image, generate light direction information corresponding to the light image of the target object.

In this embodiment, based on the resulting shadow image obtained in step 202, the above-mentioned execution subject may generate the illumination direction information corresponding to the illumination image of the target object. Wherein, the light direction information can be used to indicate the light direction, which can include but is not limited to at least one of the following: text, numbers, symbols, and images. Specifically, as an example, the illumination direction information may be an arrow marked in the resulting shadow image. Here, the direction of the arrow may be the illumination direction; or the illumination direction information may be a two-dimensional vector, where the two-dimensional vector corresponds to The direction can be the light direction.

It should be noted that, in this embodiment, the light direction indicated by the light direction information is the projection of the actual light direction in the three-dimensional coordinate system on the projection surface where the shadow in the three-dimensional coordinate system is located. It can be understood that in practice, the light direction (that is, the projection of the actual light direction on the projection surface of the shadow) is usually the same as the extension direction of the shadow. Furthermore, the above-mentioned execution subject may determine the extension direction of the shadow based on the pixel points in the resulting shadow image and the distance information corresponding to the pixel points, and then determine the light direction. Specifically, as an example, the above-mentioned execution subject may select, from the resulting shadow image, the pixel with the closest distance represented by the corresponding distance information as the first pixel, and select the pixel with the farthest distance represented by the corresponding distance information The dot is used as the second pixel, and further, the above-mentioned execution subject may determine the direction in which the first pixel points to the second pixel as the illumination direction.

Step 204, based on the illumination direction information, generate a virtual object illumination image corresponding to the target virtual object image.

In this embodiment, based on the illumination direction information obtained in step 203, the above-mentioned execution subject may generate the virtual object illumination image corresponding to the target virtual object image. The lighting image of the virtual object includes the above-mentioned target virtual object image and the virtual shadow image corresponding to the target virtual object image. The lighting direction corresponding to the virtual shadow image in the lighting image of the virtual object matches the lighting direction indicated by the lighting direction information. Here, the matching means that the angular deviation of the illumination direction corresponding to the virtual shadow image with respect to the illumination direction indicated by the illumination direction information is less than or equal to the preset angle.

Specifically, the above-mentioned execution subject may use various methods to generate the virtual object illumination image corresponding to the target virtual object image based on the illumination direction information.

As an example, a light source can be constructed in the rendering engine based on the light direction indicated by the light direction information, and then the target virtual object image can be rendered based on the constructed light source to obtain the virtual object light image. It should be noted that since the light direction indicated by the light direction information is the actual light direction projected on the projection surface where the shadow is located, in the process of constructing the light source, it is necessary to first determine the actual light direction based on the light direction information, and then based on the actual light The direction builds the light source. It should be noted that, in practice, the actual light direction can be determined by the light direction on the projection surface where the shadow is located and the light direction on the projection surface perpendicular to the projection surface where the shadow is located, and in this embodiment, it is perpendicular to the projection surface where the shadow is located. The direction of illumination on the projection surface can be predetermined.

As another example, the above-mentioned execution subject pre-stores an initial virtual shadow image corresponding to the target virtual object image. Then the execution subject can adjust the initial virtual shadow image based on the illumination direction information to obtain the virtual shadow image, and then combine the virtual shadow image and the target virtual object image to generate the virtual object illumination image.

It should be noted that since the light source corresponding to the illumination image of the target object is parallel light or sunlight, here, it can be considered that the illumination direction corresponding to the virtual shadow image in the illumination image of the virtual object is the same as the illumination direction indicated by the aforementioned illumination direction information. It does not need to consider the influence of the position of the virtual object illumination image added to the target object illumination image on the illumination direction corresponding to the virtual shadow image.

Step 205: Fusion of the lighting image of the virtual object and the lighting image of the target object to add the lighting image of the virtual object to the lighting image of the target object to obtain a result image.

In this embodiment, based on the virtual object illumination image obtained in step 204, the above-mentioned execution subject may fuse the virtual object illumination image and the target object illumination image to add the virtual object illumination image to the target object illumination image to obtain the result image. Among them, the result image is the target object illumination image with the virtual object illumination image added.

Here, the position where the virtual object illumination image is added to the target object illumination image can be predetermined (for example, the center position of the image), or it can be determined after recognizing the target object illumination image (for example, it can be After the object image and shadow image in the illumination image of the target object are obtained, the area in the illumination image of the target object that does not include the object image and shadow image is determined as the location for adding the virtual object illumination image).

In some optional implementation manners of this embodiment, after obtaining the result image, the execution subject may display the obtained result image.

In some optional implementation manners of this embodiment, the above-mentioned execution subject may also send the obtained result image to the user terminal connected in communication, and control the user terminal to display the result image. Wherein, the user terminal is a terminal used by the user to communicate with the execution subject. Specifically, the above-mentioned execution subject may send a control signal to the user terminal, thereby controlling the user terminal to display the result image.

Here, since the virtual object image in the result image corresponds to a virtual shadow image, and the illumination direction of the corresponding virtual shadow image matches the illumination direction of the shadow image corresponding to the real object image, this implementation can control the user terminal Display a more realistic result image to improve the display effect of the image.

Continue to refer to Fig. 3, which is a schematic diagram of an application scenario of the method for processing an image according to this embodiment. In the application scenario of FIG. 3, the server 301 first obtains the cat's lighting image 302 (target object lighting image) and the football image 303 (target virtual object image), where the cat lighting image 302 includes the cat's image (object image) and The shadow image of the cat (shadow image). Then, the server 301 can input the cat's light image 302 into the pre-trained shadow extraction model 304 to obtain the cat's shadow image 305 (resulting shadow image) including distance information, where the distance information is used to represent the cat's light image 302 , The distance between the pixel of the cat’s shadow image and the pixel of the cat’s image. Then, the server 301 may generate the light direction information 306 corresponding to the light image 302 of the cat based on the shadow image 305 of the cat including the distance information. Then, the server 301 can generate a football illumination image 307 (virtual object illumination image) corresponding to the football image 304 based on the illumination direction information 306, where the football shadow image (virtual shadow image) in the football illumination image 307 corresponds to the illumination direction It matches the light direction indicated by the light direction information 306. Finally, the server 301 may merge the football lighting image 307 and the cat lighting image 302 to add the football lighting image 307 to the cat lighting image 302 to obtain a result image 308.

Currently, when shooting objects in a illuminated scene, the shadows of the objects in the scene are usually shot. The virtual object image used to add to the real scene image usually does not include the shadow image. At this time, adding the virtual object image to the real scene image will reduce the authenticity of the image and affect the display effect of the image. The method provided by the above-mentioned embodiments of the present disclosure can generate the virtual object illumination image corresponding to the virtual object image, so that the corresponding virtual shadow image can be added to the virtual object image, and then the illumination image of the virtual object and the illumination image of the target object can be added. After the fusion, the authenticity of the generated result image can be improved; in addition, the present disclosure can determine the illumination direction of the virtual shadow image corresponding to the virtual object image based on the illumination direction of the shadow image in the target object illumination image. The virtual object image is better integrated into the target object illumination image, and the authenticity of the result image is further improved, which helps to improve the display effect of the result image.

With further reference to FIG. 4, it shows a flow 400 of another embodiment of a method for processing an image. The process 400 of the method for processing an image includes the following steps:

Step 401: Obtain a target object illumination image and a target virtual object image.

In this embodiment, the execution subject of the method for processing images (for example, the server 105 shown in FIG. 1) may remotely or locally obtain the target object illumination image and the target virtual object image through a wired connection or a wireless connection. Among them, the illumination image of the target object is the image to be processed. The illumination image of the target object includes the object image and the shadow image corresponding to the object image. The target virtual object image is an image used to process the illumination image of the target object. The target virtual object image may be an image predetermined according to the shape of the virtual object.

Step 402: Input the illumination image of the target object into a pre-trained shadow extraction model, and obtain a resulting shadow image including distance information.

In this embodiment, based on the target object illumination image obtained in step 401, the above-mentioned execution subject may input the target object illumination image into a pre-trained shadow extraction model to obtain a resulting shadow image including distance information. Among them, the resulting shadow image may be a shadow image extracted from the illumination image of the target object and added with distance information. The distance information is used to characterize the distance between the pixel point of the shadow image and the pixel point corresponding to the object image in the target object illumination image. The shadow extraction model can be used to characterize the correspondence between the illumination image of the object and the resulting shadow image.

Step 403: Input the resulting shadow image into a pre-trained light direction recognition model to obtain light direction information.

In this embodiment, based on the result shadow image obtained in step 402, the above-mentioned execution subject may input the result shadow image into a pre-trained light direction recognition model to obtain light direction information. Wherein, the light direction information can be used to indicate the light direction, which can include but is not limited to at least one of the following: text, numbers, symbols, and images.

In this embodiment, the light direction recognition model can be used to characterize the corresponding relationship between the resulting shadow image and light direction information. Specifically, as an example, the illumination direction recognition model may be pre-made by technicians based on statistics of a large number of result shadow images and the illumination direction information corresponding to the result shadow images, and store multiple result shadow images and corresponding illumination. Correspondence table of direction information; it can also be a model obtained after training an initial model (such as a neural network) using a machine learning method based on preset training samples.

Step 404, based on the illumination direction information, generate a virtual object illumination image corresponding to the target virtual object image.

In this embodiment, based on the illumination direction information obtained in step 403, the above-mentioned execution subject may generate the virtual object illumination image corresponding to the target virtual object image. The lighting image of the virtual object includes the above-mentioned target virtual object image and the virtual shadow image corresponding to the target virtual object image. The lighting direction corresponding to the virtual shadow image in the lighting image of the virtual object matches the lighting direction indicated by the lighting direction information. Here, the matching means that the angular deviation of the illumination direction corresponding to the virtual shadow image with respect to the illumination direction indicated by the illumination direction information is less than or equal to the preset angle.

Step 405: Fusion of the lighting image of the virtual object and the lighting image of the target object to add the lighting image of the virtual object to the lighting image of the target object to obtain a result image.

In this embodiment, based on the illumination image of the virtual object obtained in step 404, the above-mentioned execution subject may merge the illumination image of the virtual object and the illumination image of the target object to add the illumination image of the virtual object to the illumination image of the target object to obtain the result image. Among them, the result image is the target object illumination image with the virtual object illumination image added.

The above step 401, step 402, step 404, and step 405 can be respectively performed in a manner similar to step 201, step 202, step 204, and step 205 in the foregoing embodiment. The above is for step 201, step 202, step 204, and step 205. The description of 205 is also applicable to step 401, step 402, step 404, and step 405, and will not be repeated here.

It can be seen from FIG. 4 that, compared with the embodiment corresponding to FIG. 2, the process 400 of the method for processing an image in this embodiment highlights the step of using the light direction recognition model to generate light direction information. Therefore, the solution described in this embodiment can more conveniently determine the illumination direction corresponding to the illumination image of the target object, thereby can generate the result image more quickly, and improve the efficiency of image processing.

With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a device for processing images. The device embodiment corresponds to the method embodiment shown in FIG. The device can be specifically applied to various electronic devices.

As shown in FIG. 5, the apparatus 500 for processing images in this embodiment includes: an image acquisition unit 501, an image input unit 502, an information generation unit 503, an image generation unit 504, and an image fusion unit 505. The image acquisition unit 501 is configured to acquire a target object illumination image and a target virtual object image, where the target object illumination image includes the object image and a shadow image corresponding to the object image; the image input unit 502 is configured to illuminate the target object image Input the pre-trained shadow extraction model to obtain the resulting shadow image including distance information, where the distance information is used to represent the distance between the pixel point of the shadow image and the pixel point corresponding to the object image in the target object illumination image; information generation unit 503 Is configured to generate illumination direction information corresponding to the target object illumination image based on the resulting shadow image; the image generating unit 504 is configured to generate the virtual object illumination image corresponding to the target virtual object image based on the illumination direction information, wherein the virtual object illuminates The illumination direction corresponding to the virtual shadow image in the image matches the illumination direction indicated by the illumination direction information; the image fusion unit 505 is configured to fuse the virtual object illumination image and the target object illumination image to add the virtual object illumination image Into the illumination image of the target object, the result image is obtained.

In this embodiment, the image acquisition unit 501 of the apparatus 500 for processing images may remotely or locally acquire the target object illumination image and the target virtual object image through a wired connection or a wireless connection. Among them, the illumination image of the target object is the image to be processed. The illumination image of the target object includes the object image and the shadow image corresponding to the object image. The target virtual object image is an image used to process the illumination image of the target object. The target virtual object image may be an image predetermined according to the shape of the virtual object.

In this embodiment, based on the illumination image of the target object obtained by the image acquisition unit 501, the image input unit 502 may input the illumination image of the target object into a pre-trained shadow extraction model to obtain a resulting shadow image including distance information. Among them, the resulting shadow image may be a shadow image extracted from the illumination image of the target object and added with distance information. The distance information is used to characterize the distance between the pixel point of the shadow image and the pixel point corresponding to the object image in the target object illumination image. The distance information can be embodied in the resulting shadow image in various forms. The shadow extraction model can be used to characterize the correspondence between the illumination image of the object and the resulting shadow image.

In this embodiment, based on the resulting shadow image obtained by the image input unit 502, the information generating unit 503 may generate light direction information corresponding to the light image of the target object. Wherein, the light direction information can be used to indicate the light direction, which can include but is not limited to at least one of the following: text, numbers, symbols, and images.

In this embodiment, based on the lighting direction information obtained by the information generating unit 503, the image generating unit 504 generates a virtual object lighting image corresponding to the target virtual object image. The lighting image of the virtual object includes the above-mentioned target virtual object image and the virtual shadow image corresponding to the target virtual object image. The lighting direction corresponding to the virtual shadow image in the lighting image of the virtual object matches the lighting direction indicated by the lighting direction information. Here, the matching means that the angular deviation of the illumination direction corresponding to the virtual shadow image with respect to the illumination direction indicated by the illumination direction information is less than or equal to the preset angle.

In this embodiment, based on the virtual object illumination image obtained by the image generation unit 504, the image fusion unit 505 may fuse the virtual object illumination image and the target object illumination image to add the virtual object illumination image to the target object illumination image. Obtain the resulting image. Among them, the result image is the target object illumination image with the virtual object illumination image added.

In some optional implementations of this embodiment, the information generating unit 503 may be further configured to input the resulting shadow image into a pre-trained light direction recognition model to obtain light direction information.

In some optional implementations of this embodiment, the distance information is the pixel value of the pixel in the resulting shadow image.

In some optional implementations of this embodiment, the shadow extraction model can be obtained by training in the following steps: Obtain a preset training sample set, where the training samples include sample object illumination images and samples predetermined for the sample object illumination images Result shadow image; obtain a pre-established generative confrontation network, where the generative confrontation network includes a generation network and a discriminant network. The generation network is used to identify the input object illumination image and output the resulting shadow image, and the discriminant network is used to determine Whether the input image is the image output by the generation network; based on the machine learning method, the illumination image of the sample object included in the training sample in the training sample set is used as the input of the generation network, and the resulting shadow image output by the generation network is combined with the input The sample result shadow image corresponding to the illumination image of the sample object is used as the input of the discriminant network, the generation network and the discriminant network are trained, and the trained generation network is determined as the shadow extraction model.

In some optional implementation manners of this embodiment, the apparatus 500 may further include: an image display unit (not shown in the figure), configured to display the obtained result image.

In some optional implementation manners of this embodiment, the apparatus 500 may further include: an image sending unit (not shown in the figure) configured to send the obtained result image to the user terminal connected in communication, and to control the user The terminal displays the result image.

It can be understood that the units recorded in the device 500 correspond to the steps in the method described with reference to FIG. 2. Therefore, the operations, features, and beneficial effects produced by the method described above are also applicable to the device 500 and the units contained therein, and will not be repeated here.

The apparatus 500 provided by the above-mentioned embodiment of the present disclosure can generate the virtual object illumination image corresponding to the virtual object image, and thereby can add a corresponding virtual shadow image to the virtual object image, and then illuminate the virtual object illumination image and the target object illumination image After the fusion, the authenticity of the generated result image can be improved; in addition, the present disclosure can determine the illumination direction of the virtual shadow image corresponding to the virtual object image based on the illumination direction of the shadow image in the target object illumination image, so as to: The virtual object image can be better integrated into the target object illumination image, further improving the authenticity of the result image, and helping to improve the display effect of the result image.

Next, refer to FIG. 6, which shows a schematic structural diagram of an electronic device (such as the terminal device or the server in FIG. 1) 600 suitable for implementing the embodiments of the present disclosure. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), vehicle-mounted terminals (e.g. Mobile terminals such as car navigation terminals) and fixed terminals such as digital TVs, desktop computers, etc. The electronic device shown in FIG. 6 is only an example, and should not bring any limitation to the function and use range of the embodiments of the present disclosure.

As shown in FIG. 6, the electronic device 600 may include a processing device (such as a central processing unit, a graphics processor, etc.) 601, which can be loaded into a random access device according to a program stored in a read-only memory (ROM) 602 or from a storage device 608. The program in the memory (RAM) 603 executes various appropriate actions and processing. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Generally, the following devices can be connected to the I/O interface 605: including input devices 606 such as touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; including, for example, liquid crystal display (LCD), speakers, vibration An output device 607 such as a device; a storage device 608 such as a magnetic tape and a hard disk; and a communication device 609. The communication device 609 may allow the electronic device 600 to perform wireless or wired communication with other devices to exchange data. Although FIG. 6 shows an electronic device 600 having various devices, it should be understood that it is not required to implement or have all the illustrated devices. It may alternatively be implemented or provided with more or fewer devices.

In particular, according to an embodiment of the present disclosure, the process described above with reference to the flowchart can be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from the network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing device 601, the above-mentioned functions defined in the method of the embodiment of the present disclosure are executed.

It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium. The computer-readable signal medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device . The program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wire, optical cable, RF (Radio Frequency), etc., or any suitable combination of the above.

The above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device. The above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device: obtains the target object illumination image and the target virtual object image, where the target object illumination image includes The object image and the shadow image corresponding to the object image; input the target object illumination image into the pre-trained shadow extraction model to obtain the resulting shadow image including distance information, where the distance information is used to characterize the shadow image in the target object illumination image The distance between the pixel point and the pixel point corresponding to the object image; based on the resulting shadow image, the illumination direction information corresponding to the target object illumination image is generated; based on the illumination direction information, the virtual object illumination image corresponding to the target virtual object image is generated, where the virtual The lighting direction corresponding to the virtual shadow image in the object lighting image matches the lighting direction indicated by the lighting direction information; the virtual object lighting image and the target object lighting image are merged to add the virtual object lighting image to the target object lighting image , Get the result image.

The computer program code used to perform the operations of the present disclosure can be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages—such as Java, Smalltalk, C++, and also conventional Procedural programming language-such as "C" language or similar programming language. The program code can be executed entirely on the user's computer, partly on the user's computer, executed as an independent software package, partly on the user's computer and partly executed on a remote computer, or entirely executed on the remote computer or server. In the case of a remote computer, the remote computer can be connected to the user’s computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, using an Internet service provider to pass Internet connection).

The flowcharts and block diagrams in the accompanying drawings illustrate the possible implementation architecture, functions, and operations of the system, method, and computer program product according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagram can represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more for realizing the specified logical function Executable instructions. It should also be noted that, in some alternative implementations, the functions marked in the block may also occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagram and/or flowchart, and the combination of the blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations Or it can be realized by a combination of dedicated hardware and computer instructions.

The units involved in the embodiments described in the present disclosure may be implemented in a software manner, or may be implemented in a hardware manner. Among them, the name of the unit does not constitute a limitation on the unit itself under certain circumstances. For example, the image acquisition unit can also be described as "a unit that acquires the illumination image of the target object and the image of the virtual object".

The above description is only a preferred embodiment of the present disclosure and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of disclosure involved in this disclosure is not limited to the technical solutions formed by the specific combination of the above technical features, and should also cover the above technical features or technical solutions without departing from the above disclosed concept. Other technical solutions formed by any combination of its equivalent features. For example, the above-mentioned features and the technical features disclosed in the present disclosure (but not limited to) with similar functions are mutually replaced to form a technical solution.

Claims

A method for processing images, including:

Acquiring a target object illumination image and a target virtual object image, where the target object illumination image includes the object image and the shadow image corresponding to the object image;

The illumination image of the target object is input into a pre-trained shadow extraction model to obtain a resulting shadow image including distance information, where the distance information is used to represent the pixel points of the shadow image corresponding to the object image in the illumination image of the target object Pixel distance;

Based on the resulting shadow image, generating light direction information corresponding to the target object light image;

Based on the illumination direction information, a virtual object illumination image corresponding to the target virtual object image is generated, wherein the illumination direction corresponding to the virtual shadow image in the virtual object illumination image is the same as the illumination indicated by the illumination direction information Match the direction;

The virtual object illumination image and the target object illumination image are fused to add the virtual object illumination image to the target object illumination image to obtain a result image.
The method according to claim 1, wherein said generating the illumination direction information corresponding to the illumination image of the target object based on the resulting shadow image comprises:

The resultant shadow image is input into a pre-trained light direction recognition model to obtain light direction information.
The method according to claim 1, wherein the distance information is the pixel value of the pixel in the resulting shadow image.
The method according to claim 1, wherein the shadow extraction model is obtained by training in the following steps:

Acquiring a preset training sample set, where the training sample includes a sample object illumination image and a sample result shadow image predetermined for the sample object illumination image;

Obtain a pre-established generative confrontation network, where the generative confrontation network includes a generation network and a discrimination network. The generation network is used to recognize the input object illumination image and output the resulting shadow image, and the discrimination network is used to determine the input Whether the image of is the image output by the generation network;

Based on the machine learning method, the sample object illumination image included in the training samples in the training sample set is used as the input of the generation network, and the resulting shadow image output by the network and the sample result shadow image corresponding to the input sample object illumination image are generated As the input of the discriminant network, the generation network and the discriminant network are trained, and the trained generation network is determined as the shadow extraction model.
The method according to any one of claims 1-4, wherein the method further comprises:

Display the obtained result image.
The method according to any one of claims 1-4, wherein the method further comprises:

The obtained result image is sent to a user terminal connected in communication, and the user terminal is controlled to display the result image.
A device for processing images, including:

An image acquisition unit configured to acquire a target object illumination image and a target virtual object image, wherein the target object illumination image includes the object image and the shadow image corresponding to the object image;

The image input unit is configured to input the target object illumination image into a pre-trained shadow extraction model to obtain a resultant shadow image including distance information, wherein the distance information is used to characterize the shadow image in the target object illumination image The distance between the pixel point and the pixel point corresponding to the object image;

An information generating unit configured to generate light direction information corresponding to the target object light image based on the resulting shadow image;

The image generating unit is configured to generate a virtual object lighting image corresponding to the target virtual object image based on the lighting direction information, wherein the lighting direction corresponding to the virtual shadow image in the virtual object lighting image is the same as the Match the light direction indicated by the light direction information;

The image fusion unit is configured to fuse the illumination image of the virtual object and the illumination image of the target object to add the illumination image of the virtual object to the illumination image of the target object to obtain a result image.
The device according to claim 7, wherein the information generating unit is further configured to:

The resultant shadow image is input into a pre-trained light direction recognition model to obtain light direction information.
8. The device according to claim 7, wherein the distance information is the pixel value of the pixel in the resulting shadow image.
The device according to claim 7, wherein the shadow extraction model is obtained by training in the following steps:

Acquiring a preset training sample set, where the training sample includes a sample object illumination image and a sample result shadow image predetermined for the sample object illumination image;

Obtain a pre-established generative confrontation network, where the generative confrontation network includes a generation network and a discrimination network. The generation network is used to recognize the input object illumination image and output the resulting shadow image, and the discrimination network is used to determine the input Whether the image of is the image output by the generation network;

Based on the machine learning method, the sample object illumination image included in the training samples in the training sample set is used as the input of the generation network, and the resulting shadow image output by the network and the sample result shadow image corresponding to the input sample object illumination image are generated As the input of the discriminant network, the generation network and the discriminant network are trained, and the trained generation network is determined as the shadow extraction model.
The device according to any one of claims 7-10, wherein the device further comprises:

The image display unit is configured to display the obtained result image.
The device according to any one of claims 7-10, wherein the device further comprises:

The image sending unit is configured to send the obtained result image to a user terminal connected in communication, and control the user terminal to display the result image.
An electronic device including:

One or more processors;

A storage device on which one or more programs are stored,

When the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-6.
A computer-readable medium having a computer program stored thereon, wherein the program is executed by a processor to implement the method according to any one of claims 1-6.