CN112651881A - Image synthesis method, apparatus, device, storage medium, and program product - Google Patents
- Publication number
- CN112651881A (application number CN202011619097.6A)
- Authority
- CN
- China
- Prior art keywords
- image
- target object
- dimensional model
- dimensional
- visual angle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T3/06
- G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/08: Neural networks; learning methods
- G06T15/04: 3D [three-dimensional] image rendering; texture mapping
- G06T17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
- G06V20/54: Surveillance or monitoring of traffic, e.g. cars on the road, trains or boats
- G06V20/584: Recognition of traffic objects, e.g. vehicle lights or traffic lights
- G06V2201/08: Detecting or categorising vehicles
Abstract
The present disclosure provides an image synthesis method, apparatus, device, storage medium, and program product, relating to the technical field of image processing. The scheme is as follows: perform texture completion processing on an image of a first viewing angle that includes a first target object to obtain a texture map of the first target object; generate a three-dimensional model of the first target object using the texture map; project the three-dimensional model of the first target object according to the orientation information of a scene image of a second viewing angle to obtain a two-dimensional image of the first target object; and superimpose the two-dimensional image of the first target object onto the scene image to obtain a composite image of the second viewing angle. Embodiments of the disclosure can significantly reduce the cost of data synthesis, provide a large amount of training data for deep neural network training, and greatly reduce the consumption of manpower, material, and financial resources.
Description
Technical Field
The present disclosure relates to the field of computer technology, and more particularly, to the field of image processing technology.
Background
Machine learning model training typically requires a large number of labeled multi-view images as a training set. Taking vehicle-road cooperation as an example application, visual sensors can be mounted on vehicle roofs, utility poles, and traffic lights at intersections to perform multi-view detection, segmentation, and pose estimation of vehicles on the road. Vehicle-road cooperation is an important path toward autonomous driving: it can effectively resolve the problem of occluded vehicles and greatly improve the safety of autonomous driving technology. However, conventional methods require a large number of labeled multi-view images as a training set before network model training can proceed. In traffic scenes, multi-view training images are difficult to obtain, and the data are difficult to label.
Disclosure of Invention
The present disclosure provides an image synthesis method, apparatus, device, storage medium, and program product.
According to an aspect of the present disclosure, there is provided an image synthesis method including:
performing texture completion processing on an image of a first viewing angle including a first target object to obtain a texture map of the first target object;
generating a three-dimensional model of the first target object using the texture map;
projecting the three-dimensional model of the first target object according to orientation information of a scene image of a second viewing angle to obtain a two-dimensional image of the first target object; and
superimposing the two-dimensional image of the first target object onto the scene image to obtain a composite image of the second viewing angle.
According to another aspect of the present disclosure, there is provided an image synthesizing apparatus including:
a processing unit configured to perform texture completion processing on an image of a first viewing angle including a first target object to obtain a texture map of the first target object;
a generating unit configured to generate a three-dimensional model of the first target object using the texture map;
a projection unit configured to project the three-dimensional model of the first target object according to orientation information of a scene image of a second viewing angle to obtain a two-dimensional image of the first target object; and
a superposition unit configured to superimpose the two-dimensional image of the first target object onto the scene image to obtain a composite image of the second viewing angle.
According to still another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform a method provided by any one of the embodiments of the present disclosure.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method provided by any one of the embodiments of the present disclosure.
According to yet another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method provided by any one of the embodiments of the present disclosure.
One of the above embodiments has the following advantages or benefits: the cost of data synthesis can be significantly reduced, a large amount of training data can be provided for deep neural network training, and the consumption of manpower, material, and financial resources is greatly reduced. Taking a vehicle as the target object, embodiments of the disclosure can provide a large number of labeled multi-view images for network model training, improve the accuracy of vehicle-road cooperation tasks, improve the performance of environmental perception, and effectively improve the safety of autonomous vehicles.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flowchart of an image synthesis method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of texture completion in an image synthesis method according to another embodiment of the present disclosure;
FIG. 3 is a flowchart of three-dimensional model reconstruction in an image synthesis method according to another embodiment of the present disclosure;
FIG. 4 is a flowchart of image projection in an image synthesis method according to another embodiment of the present disclosure;
FIG. 5 is a flowchart of image inpainting in an image synthesis method according to another embodiment of the present disclosure;
FIG. 6 is a flowchart of an image synthesis method according to another embodiment of the present disclosure;
FIG. 7 is a schematic diagram of the data diversity effect of an image synthesis method according to another embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an image synthesis apparatus according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of an image synthesis apparatus according to another embodiment of the present disclosure;
FIG. 10 is a block diagram of an electronic device for implementing an image synthesis method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Machine learning model training typically requires a large number of labeled multi-view images as a training set. Taking vehicle-road cooperation as an example application, vehicle-road cooperation enables multi-view detection of vehicles on the road, effectively resolves the problem of vehicle occlusion, and greatly improves the safety of autonomous driving. However, conventional methods require a large number of labeled multi-view images as a training set before network model training can proceed. In traffic scenes, multi-view training images are difficult to obtain, and the data are difficult to label.
Taking an application scene of vehicle-road cooperation as an example, the method for generating the multi-angle image in the related technology mainly comprises the following technical schemes:
(1) Three-dimensional model rendering. This scheme requires constructing a large number of vehicle and city three-dimensional models. Texture maps, scene illumination, rendering parameters, and other model data must be adjusted, and images are rendered with software such as 3ds Max. This scheme is costly and inefficient, its quality is hard to guarantee, and the resulting image data are difficult to use for network training.
(2) Training pixel-level correspondences from multi-view images and predicting images at new viewing angles. This approach requires a large number of labeled multi-view images as a training set; such data are difficult to obtain and to label in traffic scenarios.
(3) Image synthesis with a generative adversarial network (GAN). This scheme requires two or more image pairs as training data, which are difficult to acquire. In addition, GAN networks are difficult to train, and their results are difficult to control. The biggest drawback of this scheme is that it cannot automatically generate the corresponding labeling results.
In view of this, the present disclosure provides a method for synthesizing multi-view images for vehicle-road cooperative tasks. Fig. 1 is a flowchart of an image synthesis method according to an embodiment of the present disclosure. Referring to fig. 1, the image synthesizing method includes:
step S110, performing texture completion processing on an image of a first viewing angle including a first target object to obtain a texture map of the first target object;
step S120, generating a three-dimensional model of the first target object using the texture map;
step S130, projecting the three-dimensional model of the first target object according to the orientation information of a scene image of a second viewing angle to obtain a two-dimensional image of the first target object;
step S140, superimposing the two-dimensional image of the first target object onto the scene image to obtain a composite image of the second viewing angle.
In step S110 and step S120, a three-dimensional model of the first target object is reconstructed, and in step S130 and step S140, the projection of the three-dimensional model of the first target object is superimposed on the scene image, so as to obtain a composite image of a new view angle.
In the task of reconstructing a three-dimensional model of the first target object, the texture map of the three-dimensional model generally needs to be reconstructed from a monocular image. Because a monocular image is captured from a single viewing angle, a complete texture map of the first target object cannot be acquired. Taking a vehicle as the first target object, if the vehicle is photographed from the front, its tail lights cannot be captured. Moreover, because the shooting angle is single, the image textures of some parts of the first target object may be incomplete. Therefore, the missing parts of the first target object need to be completed before its three-dimensional model can be reconstructed.
In step S110, a captured image of a first perspective including a first target object may be first acquired. For example, the image of the first perspective may be a front view taken from the front. And performing texture completion processing on the image of the first visual angle comprising the first target object by utilizing a pre-trained deep neural network to obtain a texture map of the first target object. In step S120, a three-dimensional model is reconstructed using the texture map obtained in step S110, and a three-dimensional model of the first target object is generated.
In step S130, the captured scene image of the second viewing angle and the orientation information of the scene image may first be acquired. For example, the scene image of the second viewing angle may be a top view of a road scene taken looking down from a high position. In one example, the orientation information of a scene image may be obtained from camera parameters. The orientation information may include three-dimensional geometric information of the road scene, such as plane equations and normals. According to the orientation information, the three-dimensional model of the first target object can be projected to obtain a two-dimensional image of the first target object. For example, the three-dimensional model may be projected onto the plane determined by the plane equation of the road scene. The projection operation makes the placement of the three-dimensional model in the road scene consistent with the three-dimensional geometric information of the road scene.
In step S140, the two-dimensional image of the first target object obtained in step S130 is superimposed on the scene image to obtain a composite image of the second viewing angle.
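The superposition of step S140 amounts to a masked composite of the projected object over the scene. A minimal sketch with synthetic arrays (illustrative only, not the patent's implementation):

```python
import numpy as np

def composite(scene, object_2d, mask):
    """S140-style overlay: copy object pixels into the scene where the
    projection mask is set; everywhere else the scene is untouched."""
    out = scene.copy()
    out[mask] = object_2d[mask]
    return out

scene = np.zeros((4, 4, 3), dtype=np.uint8)       # dark road scene
obj = np.full((4, 4, 3), 200, dtype=np.uint8)     # projected vehicle image
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                             # vehicle footprint
result = composite(scene, obj, mask)
```

In practice the mask would come from the projection of the three-dimensional model in step S130.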
In an application scenario of vehicle-road cooperation, most images captured by the various visual sensors may be from the first viewing angle, while images from the second viewing angle are few. Embodiments of the disclosure can synthesize an image of the second viewing angle from an image of the first viewing angle containing the first target object and a scene image of the second viewing angle. Using the image generation method of the embodiments of the disclosure, the cost of data synthesis can be significantly reduced, a large amount of training data can be provided for deep neural network training, and the consumption of manpower, material, and financial resources is greatly reduced. Taking a vehicle as the target object, the embodiments can provide a large number of labeled multi-view images for network model training, improve the accuracy of vehicle-road cooperation tasks, improve the performance of environmental perception, and effectively improve the safety of autonomous vehicles.
Fig. 2 is a flowchart of texture completion in an image synthesis method according to another embodiment of the present disclosure. The image synthesis method of this embodiment may include the steps of the above embodiments. In addition, as shown in fig. 2, in an embodiment, step S110 in fig. 1, performing texture completion processing on the image of the first viewing angle including the first target object to obtain a texture map of the first target object, may specifically include:
step S210, segmenting the image of the first viewing angle including the first target object to obtain a segmented image of at least one part of the first target object;
step S220, labeling the pose of the first target object in the image of the first viewing angle including the first target object to obtain pose labeling information;
step S230, projecting the segmented image according to the pose labeling information to obtain an image to be processed of the first target object;
step S240, performing texture completion processing on the image to be processed using a deep neural network to obtain the texture map of the first target object.
In step S210, an image of a first perspective including a first target object is first segmented to obtain a segmented image of at least one component including the first target object.
Taking a vehicle as a first target object as an example, a model object to be reconstructed is divided into a plurality of parts (components). For example, the vehicle may be divided into a plurality of members such as 4 wheels, a front cover, a rear cover, and a tail lamp. In one example, if the captured image of the vehicle is taken from the front, there may be only the front cover and 2 front wheels in the image, and no rear cover and tail lights. That is, some parts are visible in the captured image and some parts may not be visible in the captured image. In addition, due to the limitation of the shooting angle, the image textures of the front cover and the 2 front wheels in the image can also be incomplete. The captured image of the vehicle may be segmented to obtain a segmented image including various components in the image.
In one example, the segmented image may be a to-be-processed image of at least one component of the first target object.
In another example, in step S220, the pose of the first target object may further be labeled in the image including the first target object to obtain pose labeling information. Even for the same first target object, the pose presented in the image may differ with the shooting angle, and the images of the individual parts of the first target object may differ accordingly. Therefore, the pose of the first target object can be recognized with a recognition algorithm to obtain the pose labeling information. Alternatively, the pose labeling information can be obtained by manual annotation.
In one embodiment, the pose labeling information may include a six-degree-of-freedom spatial pose. The six degrees of freedom an object has in space comprise translation along three orthogonal axes x, y, z and rotation about those three axes, so the six-degree-of-freedom spatial pose determines the object's position and orientation.
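A six-degree-of-freedom spatial pose is commonly encoded as three rotation angles plus a three-axis translation, assembled into a single rigid transform. A sketch under that common convention (the z-y-x angle ordering is an assumption, not something the disclosure specifies):

```python
import numpy as np

def pose_matrix(rx, ry, rz, tx, ty, tz):
    """Build a 4x4 rigid transform from three rotation angles (radians,
    composed z-y-x) and a translation: one common 6-DoF encoding."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = (tx, ty, tz)
    return T

# With zero rotation, the pose simply translates a point.
p = pose_matrix(0, 0, 0, 1.0, 2.0, 3.0) @ np.array([0, 0, 0, 1.0])
```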
In step S230, the segmented image is projected according to the pose labeling information, and an image projection algorithm may be used to perform a projection operation on the segmented image to correct the deviation of the segmented image caused by different poses of the first target object, so as to obtain a to-be-processed image of at least one component of the first target object after projection.
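The projection correction of step S230 can be pictured as a coordinate warp applied to the segmented part image. The transform values below are hypothetical stand-ins for what the pose labeling would actually provide:

```python
import numpy as np

def warp_points(points, A, t):
    """Apply an affine correction x' = A x + t to 2-D part-image
    coordinates. A and t would be derived from the labeled pose;
    here they are illustrative."""
    return (np.asarray(A, float) @ np.asarray(points, float).T).T + t

# Undo a 90-degree in-plane rotation introduced by the shooting pose.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # inverse of a 90-degree rotation
pts = warp_points([[1.0, 0.0]], A, t=np.array([0.0, 0.0]))
```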
In step S240, a pre-trained deep neural network is used to perform texture completion processing on the image to be processed to obtain a texture map of the first target object. In one example, a graph neural network model may be used for the texture completion. Specifically, a data structure for an association graph of all the parts of the first target object may be constructed in advance, in which each node element represents one part of the first target object. In the vehicle example, the association graph may contain n nodes, each representing one vehicle part, such as a wheel, the front cover, or a tail light. When the image including the first target object is segmented in step S210, the segmentation is also performed according to the nodes defined in the association graph's data structure, so that each part in the segmented image to be processed can be matched to its corresponding node in the association graph.
For a part visible in the captured image of the first target object, the corresponding node can be found in the association graph, and the image of that part in the image to be processed is assigned to the corresponding node element. For a part not visible in the captured image, that is, a part that was not photographed, the corresponding node is assigned as a null node. Finally, the association graph of all parts of the first target object is constructed from the assigned node elements.
The constructed association graph of the first target object is input into the graph neural network model. In the input graph, the nodes represent images of the parts of the first target object; the image textures of some parts may be incomplete, and those of other parts may be entirely missing. The graph neural network model completes the incomplete or missing image textures and outputs texture-completed images of all parts of the first target object.
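At its simplest, the association graph described above can be modeled as a fixed set of named nodes with null entries for unseen parts. The part names below are illustrative, not taken from the disclosure:

```python
# Hypothetical part list for a vehicle; each node holds the segmented
# image crop for that part, or None when the part was not captured.
PART_NODES = ["front_wheel_l", "front_wheel_r", "rear_wheel_l",
              "rear_wheel_r", "front_cover", "rear_cover", "tail_lights"]

def build_association_graph(segmented_parts):
    """Map every predefined node to its crop; unseen parts become null nodes."""
    return {name: segmented_parts.get(name) for name in PART_NODES}

# A front view yields crops only for forward-facing parts.
graph = build_association_graph({"front_cover": "crop_0",
                                 "front_wheel_l": "crop_1"})
missing = [name for name, crop in graph.items() if crop is None]
```

The graph neural network would then fill in the textures of the null and incomplete nodes.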
Embodiments of the disclosure can generate a high-quality, complete three-dimensional texture map for the first target object, significantly reduce the cost of three-dimensional texture reconstruction, and enable all-around simulation rendering of the target object. Taking a vehicle as the first target object, reconstructing three-dimensional vehicle models can greatly enrich autonomous driving simulation databases and provide abundant resources for training perception systems.
Fig. 3 is a flowchart of three-dimensional model reconstruction in an image synthesis method according to another embodiment of the present disclosure. The image synthesis method of this embodiment may include the steps of the above embodiments. In addition, as shown in fig. 3, in an embodiment, step S120 in fig. 1, generating the three-dimensional model of the first target object using the texture map, may specifically include:
step S310, obtaining deformation parameters of a deformable template of the first target object, the deformation parameters corresponding to the appearance shape of the first target object;
step S320, generating the three-dimensional model of the first target object according to the deformation parameters of the deformable template and the texture map.
Taking a vehicle as the first target object, the deformable template is used to generate vehicles with different appearances, with the deformation parameters corresponding to different vehicle shapes. Different vehicle models differ in overall appearance, and the shapes of their constituent parts may differ as well, so corresponding deformable templates can be created from the part shapes of different vehicle models. The texture map in the deformable template is a predefined texture contour, and the texture inside the contour can be filled in by the texture completion. By adjusting the deformation parameters of the deformable template and combining the completed texture map, the three-dimensional model of the vehicle is generated.
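One common way to realize such a deformable template is a linear blend-shape model, where the deformation parameters weight predefined displacement fields over a base mesh. The disclosure does not specify the deformation model, so this sketch assumes a linear one:

```python
import numpy as np

def deform_template(base_vertices, shape_deltas, params):
    """Vertices = base + sum_i params[i] * delta_i (linear blend shapes).

    base_vertices: (V, 3) template mesh
    shape_deltas:  (K, V, 3) per-parameter displacement fields
    params:        (K,) deformation parameters for a target vehicle shape
    """
    offsets = np.tensordot(np.asarray(params, float),
                           np.asarray(shape_deltas, float), axes=1)
    return np.asarray(base_vertices, float) + offsets

base = np.zeros((4, 3))                                    # tiny toy mesh
deltas = np.stack([np.ones((4, 3)), np.full((4, 3), 2.0)])  # two shape modes
verts = deform_template(base, deltas, params=[0.5, 0.25])   # 0.5*1 + 0.25*2
```

The completed texture map would then be mapped onto the deformed mesh to finish the three-dimensional model.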
In the embodiment of the disclosure, the reconstruction of the three-dimensional model of the vehicle is realized through the completion of the deformable template and the texture, so that the automatic driving simulation database can be greatly enriched, and abundant resources are provided for the training of the perception system.
Fig. 4 is a flowchart of image projection in an image synthesis method according to another embodiment of the present disclosure. The image synthesis method of this embodiment may include the steps of the above embodiments. In addition, as shown in fig. 4, in an embodiment, step S130 in fig. 1, projecting the three-dimensional model of the first target object according to the orientation information of the scene image of the second viewing angle to obtain a two-dimensional image of the first target object, may specifically include:
step S410, obtaining the orientation information of the scene image from the shooting parameters of the scene image of the second viewing angle, the orientation information including a plane equation of the scene image;
step S420, adjusting the pose of the three-dimensional model of the first target object and placing the three-dimensional model on the plane determined by the plane equation;
step S430, projecting the placed three-dimensional model of the first target object to obtain a two-dimensional image of the first target object.
Wherein, the shooting parameters of the scene image of the second visual angle can comprise camera parameters. The camera parameters may include at least one of internal and external parameters of the camera. The internal parameters of the camera may include a focal length. The external parameters of the camera may include a camera position. In step S410, when the captured scene image of the second angle of view is acquired, the capturing parameters may be acquired at the same time. And obtaining the azimuth information of the scene image according to the shooting parameters. The orientation information may include three-dimensional geometric information of the road scene. The three-dimensional geometric information may include plane equations, normals, and the like.
Taking a vehicle as an example of the first target object, in step S420 the pose of the three-dimensional vehicle model is adjusted according to the orientation information of the scene image, and the model is placed on the plane determined by the plane equation. Adjusting the pose according to the orientation information ensures that the placement of the three-dimensional model in the road scene is consistent with the three-dimensional geometric information of the scene. In step S430, the placed three-dimensional vehicle model is projected to obtain a two-dimensional image of the vehicle.
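Steps S410 to S430 can be sketched as follows. This is a minimal numpy illustration under stated assumptions, not the disclosed implementation: the plane is assumed to be expressed in the camera frame as `[a, b, c, d]` with a·x + b·y + c·z + d = 0, and `K` is an assumed 3x3 intrinsic matrix.

```python
import numpy as np

def place_on_plane(vertices, plane):
    """Translate a model so its lowest vertex rests on the plane
    a*x + b*y + c*z + d = 0, given as [a, b, c, d] in the camera frame."""
    n = np.asarray(plane[:3], dtype=float)
    scale = np.linalg.norm(n)
    n, d = n / scale, plane[3] / scale
    # signed distance of each vertex to the plane
    dist = vertices @ n + d
    # shift along the normal so the closest vertex touches the plane
    return vertices - dist.min() * n

def project_points(vertices, K):
    """Pinhole projection of camera-frame points with intrinsics K."""
    uvw = vertices @ K.T
    return uvw[:, :2] / uvw[:, 2:3]
```

For example, a model placed on the ground plane z = 0 and projected with a focal length of 100 pixels yields pixel coordinates directly usable for compositing.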
In the embodiment of the disclosure, the three-dimensional model of the first target object is projected into a two-dimensional image according to the orientation information of the scene image of the second view angle, so that the placement of the three-dimensional model in the road scene is consistent with the scene's three-dimensional geometry, making the synthesized image more realistic.
Fig. 5 is a flowchart of image inpainting in an image synthesis method according to another embodiment of the present disclosure. The image synthesis method of this embodiment may include the steps of the above-described embodiments. Furthermore, as shown in fig. 5, in one embodiment, the method further comprises:
step S510, removing a second target object from the captured standby image of the second view angle by using an image restoration method;
step S520, using the standby image with the second target object removed as the scene image of the second view angle.
In this embodiment, an image captured from the second view angle is taken as a standby image. After the standby image is restored, the result is used as the scene image of the second view angle. Pedestrians and vehicles may be present in the road scene of the captured standby image. Taking the vehicle reconstructed from the three-dimensional model as the first target object, the pedestrians and vehicles in the standby image of the second view angle are treated as second target objects. The second target objects can be removed from the standby image by an image restoration method, and the resulting image is used as the scene image of the second view angle.
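A crude stand-in for this removal step can be written in plain numpy. This is only an illustrative diffusion-style fill, not the conventional image restoration method the disclosure refers to; in practice an established algorithm such as OpenCV's `cv2.inpaint` would be used instead.

```python
import numpy as np

def remove_object(image, mask, iterations=200):
    """Fill the masked region (mask == 1 marks the second target object)
    by repeatedly averaging each hole pixel's 4-neighbourhood."""
    out = image.astype(float).copy()
    hole = mask.astype(bool)
    out[hole] = 0.0
    for _ in range(iterations):
        # average of the four axis-aligned neighbours (edge-padded)
        padded = np.pad(out, 1, mode="edge")
        avg = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
               padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        out[hole] = avg[hole]
    return out
```

Pixels outside the mask are left untouched, so the background scene is preserved exactly where no object was removed.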
In vehicle-road cooperation scenarios, images of the second view angle may be scarce among the images captured by visual sensors. With the above method, a large number of second-view-angle images can be generated, providing abundant multi-view images for network model training and improving model robustness.
In one embodiment, the method further comprises:
and obtaining the labeling information of the synthetic image according to the position information of the first target object in the synthetic image.
The labeling information may include two-dimensional labeling information and three-dimensional labeling information. The two-dimensional labeling information may include at least one of a two-dimensional bounding box and instance-level segmentation. The two-dimensional bounding box labels the overall position of the vehicle; instance-level segmentation divides the vehicle into components and labels the position of each component. The three-dimensional labeling information may include at least one of a three-dimensional bounding box and a six-degree-of-freedom spatial pose.
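As an illustration, the two-dimensional bounding box and the corners of a three-dimensional bounding box can be derived directly from the object's known position. The following numpy sketch shows what such automatic labeling might compute; it is an assumed formulation, not the patented procedure.

```python
import numpy as np

def bbox_2d(points_2d):
    """Axis-aligned 2D bounding box [x_min, y_min, x_max, y_max]
    from the projected vertices of the composited object."""
    return np.array([points_2d[:, 0].min(), points_2d[:, 1].min(),
                     points_2d[:, 0].max(), points_2d[:, 1].max()])

def bbox_3d_corners(center, size):
    """Eight corners of an axis-aligned 3D box from its center and
    (length, width, height); a full 6-DOF pose would add a rotation."""
    offsets = np.array([[sx, sy, sz] for sx in (-0.5, 0.5)
                        for sy in (-0.5, 0.5) for sz in (-0.5, 0.5)])
    return np.asarray(center) + offsets * np.asarray(size)
```

Because the object's placement is chosen by the synthesis pipeline, these labels come for free, with no manual annotation.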
The image synthesis method of the embodiment of the disclosure can synthesize images from multiple view angles and automatically generate the corresponding two-dimensional and three-dimensional labeling information, thereby greatly reducing the cost of acquiring training data and effectively improving the robustness of deep learning models.
Fig. 6 is a flowchart of an image synthesis method according to another embodiment of the present disclosure. The respective reference numerals in fig. 6 denote the following:
reference numeral 1 denotes a Source image (Source), which is a Front View (Front View);
reference numeral 2 denotes a Target image (Target), which is a Top View (Top View);
reference numeral (a) denotes a deformable Vehicle Template and a six-degree-of-freedom spatial Pose (Vehicle Template & labeled 6-DOF Pose);
reference numeral (b) denotes Part based Texture Inpainting;
reference numeral (c) denotes Model-based View Synthesis (Model-based View Synthesis);
reference numeral (d) denotes a Background image with Camera Calibration (Background Images with Camera Calibration);
reference numeral (e) denotes Background image Inpainting;
reference numeral (f) denotes the three-dimensional structure of the background image (3D Structure of Background);
reference numeral (g) denotes synthesized results of new view angles with annotations (Novel-view Results with Ground-truth Annotations).
Referring to fig. 1 to 6, as indicated by reference numeral (a) in fig. 6, for a vehicle object, the input of the three-dimensional reconstruction task may include a single traffic scene image, the labeled six-degree-of-freedom spatial pose of each vehicle in the image, and a deformable template of a three-dimensional vehicle. The deformable template may contain texture maps. As shown by reference numeral (b), image pixels are projected onto the texture map according to the labeled six-degree-of-freedom spatial pose. A deep neural network is then trained to fill in the missing regions of the texture map. As indicated by reference numeral (c), the deformation parameters of the deformable template of the three-dimensional vehicle model are then adjusted to generate a number of different three-dimensional vehicle models, which are rendered with the generated texture maps to obtain two-dimensional images of the vehicles.
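The patent does not spell out the template's parameterization. One common choice, assumed here purely for illustration, is a linear shape basis, where each deformation parameter scales a basis displacement added to a mean shape:

```python
import numpy as np

def deform_template(mean_shape, basis, params):
    """Linear deformable template: vertices = mean + sum_k params[k] * basis[k].
    mean_shape: (V, 3) mean vertices; basis: (K, V, 3) deformation modes;
    params: (K,) deformation parameters controlling the vehicle's shape."""
    return mean_shape + np.tensordot(params, basis, axes=1)
```

Sampling different `params` vectors then yields the "number of different three-dimensional models" described at reference numeral (c).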
As shown by reference numeral (d), an image of an intersection can be acquired as the background image. The background image may be a scene image of the second view angle that serves as the background for the three-dimensional vehicle model. The internal and external parameters of the camera that captured the image are calibrated in advance. As shown by reference numeral (e), the vehicles in the background image are removed by a conventional image inpainting method. As shown by reference numeral (f), the three-dimensional geometric information of the intersection, including plane equations and normals, is recovered using the internal and external parameters of the camera. Finally, as shown by reference numeral (g), the textured vehicle generated at reference numeral (c) is placed at a random position on the background road surface, and images of multiple view angles are synthesized. At the same time, the two-dimensional and three-dimensional labeling information corresponding to the synthetic image is obtained.
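The final overlay at reference numeral (g) amounts to alpha-blending the rendered vehicle into the inpainted background. A minimal numpy sketch follows; the coverage mask `alpha` is assumed to come from the renderer and is not specified in the disclosure.

```python
import numpy as np

def composite(background, render, alpha):
    """Blend the rendered object into the scene image; alpha in [0, 1]
    is the object's per-pixel coverage from the renderer."""
    alpha = np.asarray(alpha, dtype=float)
    if background.ndim == 3:
        # broadcast the single-channel mask over the color channels
        alpha = alpha[..., None]
    return alpha * render + (1.0 - alpha) * background
```

With a soft (fractional) alpha at the object's silhouette, the blend also anti-aliases the boundary between vehicle and road surface.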
Fig. 7 is a schematic diagram of data diversity effect of an image synthesis method according to another embodiment of the present disclosure.
The respective reference numerals in fig. 7 denote the following:
reference numeral (a1) denotes the real images input to the automatic driving system (Input Real Images in AD);
reference numeral (b1) denotes texture map completion (Inpainted Texture Maps);
reference numeral (c1) denotes the three-dimensional deformable templates of the vehicle (3D Deformed Vehicle Models);
reference numeral (d1) denotes the output images with varied parameters (Output Images with Variable Params).
As indicated by reference numeral (a1), the real images input to the automatic driving system are first-view-angle images including the first target object. As indicated by reference numeral (b1), texture completion is performed on the first-view-angle image including the first target object, resulting in a texture map of the first target object. As indicated by reference numeral (c1), a three-dimensional model of the first target object may be generated from the deformation parameters of the deformable template and the texture map. The different vehicle shapes in the deformable template can be randomly combined with the texture maps to generate a large number of three-dimensional vehicle models with different shapes and textures. The output of this generation process is shown at reference numeral (d1).
The image synthesis method of the embodiment of the disclosure can ensure both the diversity and the realism of the generated data. As shown in figs. 6 and 7, the disclosed embodiment recovers the texture map of the vehicle from a real captured image of a traffic scene, adjusts the deformation parameters of the three-dimensional model to obtain a large number of three-dimensional vehicles with different shapes, and then randomly combines the texture maps with these vehicles for multi-view rendering. During rendering, the camera parameters (internal and external), the scene lighting, and the resolution of the generated image can also be varied. With the above method, the diversity of the data can be increased as much as possible while the image quality is preserved.
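The randomized combination described above might be sketched as follows. The parameter names and value ranges here are illustrative assumptions, not values from the disclosure.

```python
import random

def sample_variant(num_textures, num_shape_params, seed=None):
    """Randomly pair a recovered texture map with deformation parameters
    and rendering settings to diversify the synthesized data."""
    rng = random.Random(seed)
    return {
        "texture_id": rng.randrange(num_textures),
        "shape_params": [rng.uniform(-1.0, 1.0) for _ in range(num_shape_params)],
        "focal_length_px": rng.uniform(800.0, 1600.0),  # camera internal parameter
        "illumination": rng.uniform(0.5, 1.5),          # scene lighting scale
    }
```

Drawing many such variants and rendering each one yields a training set that varies shape, texture, camera, and lighting independently.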
Fig. 8 is a schematic diagram of an image synthesis apparatus according to an embodiment of the present disclosure. Referring to fig. 8, the image synthesizing apparatus includes:
a processing unit 100, configured to perform texture completion processing on an image of a first view including a first target object to obtain a texture map of the first target object;
a generating unit 200 for generating a three-dimensional model of the first target object using the texture map;
the projection unit 300 is configured to project the three-dimensional model of the first target object according to the orientation information of the scene image of the second view angle to obtain a two-dimensional image of the first target object;
the superimposing unit 400 is configured to superimpose the two-dimensional image of the first target object onto the scene image, so as to obtain a composite image of the second view angle.
In one embodiment, the processing unit 100 is configured to:
segmenting an image of a first perspective comprising a first target object to obtain a segmented image of at least one component comprising the first target object;
marking the pose of the first target object in an image of a first visual angle comprising the first target object to obtain pose marking information;
projecting the segmented image according to the pose marking information to obtain a to-be-processed image of the first target object;
and performing texture completion processing on the image to be processed by using the deep neural network to obtain a texture map of the first target object.
In one embodiment, the generating unit 200 is configured to:
acquiring deformation parameters of a deformable template of a first target object, wherein the deformation parameters correspond to the appearance shape of the first target object;
and generating the three-dimensional model of the first target object according to the deformation parameters of the deformable template and the texture map.
In one embodiment, the projection unit 300 is configured to:
obtaining orientation information of the scene image according to the shooting parameters of the scene image of the second view angle, wherein the orientation information comprises a plane equation of the scene image;
adjusting the pose of the three-dimensional model of the first target object, and placing the three-dimensional model of the first target object on the plane determined by the plane equation;
and projecting the placed three-dimensional model of the first target object to obtain a two-dimensional image of the first target object.
Fig. 9 is a schematic diagram of an image synthesis apparatus according to another embodiment of the present disclosure. As shown in fig. 9, in an embodiment, the apparatus further includes a repair unit 220, and the repair unit 220 is configured to:
removing a second target object from the captured standby image of the second view angle by using an image restoration method;
and using the standby image with the second target object removed as the scene image of the second view angle.
In one embodiment, the apparatus further includes a labeling unit 500, where the labeling unit 500 is configured to:
and obtaining the labeling information of the synthetic image according to the position information of the first target object in the synthetic image.
The functions of the units in the image synthesis apparatus according to the embodiment of the present disclosure may refer to the corresponding descriptions in the above methods, and are not described herein again.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 10 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random access memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.
Claims (15)
1. An image synthesis method comprising:
performing texture completion processing on an image of a first view angle including a first target object to obtain a texture map of the first target object;
generating a three-dimensional model of a first target object using the texture map;
projecting the three-dimensional model of the first target object according to the orientation information of a scene image of a second view angle to obtain a two-dimensional image of the first target object;
and superposing the two-dimensional image of the first target object on the scene image to obtain a composite image of the second view angle.
2. The method of claim 1, wherein the performing texture completion processing on the image including the first perspective of the first target object to obtain the texture map of the first target object comprises:
segmenting the image comprising the first view angle of the first target object to obtain a segmented image comprising at least one component of the first target object;
marking the pose of the first target object in the image of the first visual angle comprising the first target object to obtain pose marking information;
projecting the segmentation image according to the pose marking information to obtain a to-be-processed image of the first target object;
and performing texture completion processing on the image to be processed by utilizing the deep neural network to obtain a texture map of the first target object.
3. The method of claim 1, wherein said generating a three-dimensional model of a first target object using said texture map comprises:
acquiring deformation parameters of a deformable template of a first target object, wherein the deformation parameters correspond to the appearance shape of the first target object;
and generating a three-dimensional model of the first target object according to the deformation parameters of the deformable template and the texture map.
4. The method of any of claims 1 to 3, wherein projecting the three-dimensional model of the first target object into a two-dimensional image of the first target object based on the orientation information of the image of the scene from the second perspective comprises:
obtaining the orientation information of the scene image according to shooting parameters of the scene image of the second view angle, wherein the orientation information comprises a plane equation of the scene image;
adjusting the pose of the three-dimensional model of the first target object, and placing the three-dimensional model of the first target object on the plane determined by the plane equation;
and projecting the placed three-dimensional model of the first target object to obtain the two-dimensional image of the first target object.
5. The method of any of claims 1-3, further comprising:
removing a second target object from the captured standby image of the second view angle by using an image restoration method;
and using the standby image with the second target object removed as the scene image of the second view angle.
6. The method of any of claims 1-3, further comprising:
and obtaining the labeling information of the synthetic image according to the position information of the first target object in the synthetic image.
7. An image synthesizing apparatus comprising:
the processing unit is used for performing texture completion processing on the image of the first visual angle including the first target object to obtain a texture map of the first target object;
a generating unit for generating a three-dimensional model of the first target object using the texture map;
the projection unit is used for projecting the three-dimensional model of the first target object according to the orientation information of a scene image of a second view angle to obtain a two-dimensional image of the first target object;
and the superposition unit is used for superposing the two-dimensional image of the first target object on the scene image to obtain a composite image of the second view angle.
8. The apparatus of claim 7, wherein the processing unit is to:
segmenting the image comprising the first view angle of the first target object to obtain a segmented image comprising at least one component of the first target object;
marking the pose of the first target object in the image of the first visual angle comprising the first target object to obtain pose marking information;
projecting the segmentation image according to the pose marking information to obtain a to-be-processed image of the first target object;
and performing texture completion processing on the image to be processed by utilizing the deep neural network to obtain a texture map of the first target object.
9. The apparatus of claim 7, wherein the generating unit is to:
acquiring deformation parameters of a deformable template of a first target object, wherein the deformation parameters correspond to the appearance shape of the first target object;
and generating a three-dimensional model of the first target object according to the deformation parameters of the deformable template and the texture map.
10. The apparatus of any of claims 7 to 9, wherein the projection unit is to:
obtaining the orientation information of the scene image according to shooting parameters of the scene image of the second view angle, wherein the orientation information comprises a plane equation of the scene image;
adjusting the pose of the three-dimensional model of the first target object, and placing the three-dimensional model of the first target object on the plane determined by the plane equation;
and projecting the placed three-dimensional model of the first target object to obtain the two-dimensional image of the first target object.
11. The apparatus according to any one of claims 7 to 9, further comprising a repair unit for:
removing a second target object from the captured standby image of the second view angle by using an image restoration method;
and using the standby image with the second target object removed as the scene image of the second view angle.
12. The apparatus according to any one of claims 7 to 9, further comprising an annotation unit for:
and obtaining the labeling information of the synthetic image according to the position information of the first target object in the synthetic image.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011619097.6A CN112651881B (en) | 2020-12-30 | 2020-12-30 | Image synthesizing method, apparatus, device, storage medium, and program product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011619097.6A CN112651881B (en) | 2020-12-30 | 2020-12-30 | Image synthesizing method, apparatus, device, storage medium, and program product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112651881A true CN112651881A (en) | 2021-04-13 |
CN112651881B CN112651881B (en) | 2023-08-01 |
Family
ID=75366650
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011619097.6A Active CN112651881B (en) | 2020-12-30 | 2020-12-30 | Image synthesizing method, apparatus, device, storage medium, and program product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112651881B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113610968A (en) * | 2021-08-17 | 2021-11-05 | 北京京东乾石科技有限公司 | Target detection model updating method and device |
JP2022050689A (en) * | 2021-05-20 | 2022-03-30 | ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド | Radar point cloud data processing method, apparatus, electronic device, storage medium, and program |
CN114359312A (en) * | 2022-03-17 | 2022-04-15 | 荣耀终端有限公司 | Image processing method and device |
CN117078509A (en) * | 2023-10-18 | 2023-11-17 | 荣耀终端有限公司 | Model training method, photo generation method and related equipment |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2014071850A (en) * | 2012-10-02 | 2014-04-21 | Osaka Prefecture Univ | Image processing apparatus, terminal device, image processing method, and program |
CN104599243A (en) * | 2014-12-11 | 2015-05-06 | 北京航空航天大学 | Virtual and actual reality integration method of multiple video streams and three-dimensional scene |
CN106803286A (en) * | 2017-01-17 | 2017-06-06 | 湖南优象科技有限公司 | Mutual occlusion real-time processing method based on multi-view image |
CN107393017A (en) * | 2017-08-11 | 2017-11-24 | 北京铂石空间科技有限公司 | Image processing method, device, electronic equipment and storage medium |
CN108765537A (en) * | 2018-06-04 | 2018-11-06 | 北京旷视科技有限公司 | A kind of processing method of image, device, electronic equipment and computer-readable medium |
CN109697688A (en) * | 2017-10-20 | 2019-04-30 | 虹软科技股份有限公司 | A kind of method and apparatus for image procossing |
CN109767485A (en) * | 2019-01-15 | 2019-05-17 | 三星电子(中国)研发中心 | Image processing method and device |
CN109829969A (en) * | 2018-12-27 | 2019-05-31 | 北京奇艺世纪科技有限公司 | A kind of data capture method, device and storage medium |
CN110223370A (en) * | 2019-05-29 | 2019-09-10 | 南京大学 | A method of complete human body's texture mapping is generated from single view picture |
CN110223380A (en) * | 2019-06-11 | 2019-09-10 | 中国科学院自动化研究所 | Scene modeling method, system, and device fusing aerial and ground multi-view images |
CN110490960A (en) * | 2019-07-11 | 2019-11-22 | 阿里巴巴集团控股有限公司 | A kind of composograph generation method and device |
CN111783525A (en) * | 2020-05-20 | 2020-10-16 | 中国人民解放军93114部队 | Aerial photographic image target sample generation method based on style migration |
CN112150575A (en) * | 2020-10-30 | 2020-12-29 | 深圳市优必选科技股份有限公司 | Scene data acquisition method, model training method, device and computer equipment |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2022050689A (en) * | 2021-05-20 | 2022-03-30 | Beijing Baidu Netcom Science Technology Co., Ltd. | Radar point cloud data processing method, apparatus, electronic device, storage medium, and program |
JP7312866B2 (en) | 2021-05-20 | 2023-07-21 | Beijing Baidu Netcom Science Technology Co., Ltd. | Radar point cloud data processing method, apparatus, electronic device, storage medium, and program |
US11796670B2 (en) | 2021-05-20 | 2023-10-24 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Radar point cloud data processing method and device, apparatus, and storage medium |
CN113610968A (en) * | 2021-08-17 | 2021-11-05 | 北京京东乾石科技有限公司 | Target detection model updating method and device |
WO2023020103A1 (en) * | 2021-08-17 | 2023-02-23 | 北京京东乾石科技有限公司 | Method and apparatus for updating target detection model |
CN114359312A (en) * | 2022-03-17 | 2022-04-15 | 荣耀终端有限公司 | Image processing method and device |
CN117078509A (en) * | 2023-10-18 | 2023-11-17 | 荣耀终端有限公司 | Model training method, photo generation method and related equipment |
CN117078509B (en) * | 2023-10-18 | 2024-04-09 | 荣耀终端有限公司 | Model training method, photo generation method and related equipment |
Also Published As
Publication number | Publication date |
---|---|
CN112651881B (en) | 2023-08-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109003325B (en) | Three-dimensional reconstruction method, medium, device and computing equipment | |
CN112651881B (en) | Image synthesis method, apparatus, device, storage medium, and program product | |
Johnson‐Roberson et al. | Generation and visualization of large‐scale three‐dimensional reconstructions from underwater robotic surveys | |
US9117267B2 (en) | Systems and methods for marking images for three-dimensional image generation | |
WO2022165809A1 (en) | Method and apparatus for training deep learning model | |
CN108648194B (en) | Three-dimensional target identification segmentation and pose measurement method and device based on CAD model | |
Bruls et al. | The right (angled) perspective: Improving the understanding of road scenes using boosted inverse perspective mapping | |
US9437034B1 (en) | Multiview texturing for three-dimensional models | |
Bescos et al. | Empty cities: Image inpainting for a dynamic-object-invariant space | |
CN113052109A (en) | 3D target detection system and 3D target detection method thereof | |
EP4016473A1 (en) | Method, apparatus, and computer program product for training a signature encoding module and a query processing module to identify objects of interest within an image utilizing digital signatures | |
CN111524233A (en) | Three-dimensional reconstruction method for dynamic targets in a static scene | |
CN112991537B (en) | City scene reconstruction method and device, computer equipment and storage medium | |
CN111161398A (en) | Image generation method, device, equipment and storage medium | |
CN111382618A (en) | Illumination detection method, device, equipment and storage medium for face image | |
CN114677479A (en) | Natural landscape multi-view three-dimensional reconstruction method based on deep learning | |
Wu et al. | A new stereo dense matching benchmark dataset for deep learning | |
CN113763231A (en) | Model generation method, image perspective determination method, device, equipment, and medium | |
CN115008454A (en) | Robot online hand-eye calibration method based on multi-frame pseudo label data enhancement | |
US20140306953A1 (en) | 3D Rendering for Training Computer Vision Recognition | |
CN114299230A (en) | Data generation method and device, electronic equipment and storage medium | |
Jeon et al. | Struct-MDC: Mesh-refined unsupervised depth completion leveraging structural regularities from visual SLAM | |
CN104463962A (en) | Three-dimensional scene reconstruction method based on GPS information video | |
CN112669431B (en) | Image processing method, apparatus, device, storage medium, and program product | |
CN115375836A (en) | Point cloud fusion three-dimensional reconstruction method and system based on multivariate confidence filtering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||