CN112651881B - Image synthesizing method, apparatus, device, storage medium, and program product - Google Patents

Image synthesizing method, apparatus, device, storage medium, and program product

Info

Publication number
CN112651881B
Authority
CN
China
Prior art keywords
image
target object
dimensional model
dimensional
view angle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011619097.6A
Other languages
Chinese (zh)
Other versions
CN112651881A (en)
Inventor
卢飞翔
刘宗岱
张良俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Baidu USA LLC
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Baidu USA LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd, Baidu USA LLC filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011619097.6A priority Critical patent/CN112651881B/en
Publication of CN112651881A publication Critical patent/CN112651881A/en
Application granted granted Critical
Publication of CN112651881B publication Critical patent/CN112651881B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/06 Topological mapping of higher dimensional structures onto lower dimensional surfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/04 Texture mapping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 20/584 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/08 Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Geometry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present disclosure provides an image synthesis method, apparatus, device, storage medium, and program product, and relates to the technical field of image processing. The specific implementation scheme is as follows: performing texture complement processing on an image comprising a first view angle of a first target object to obtain a texture map of the first target object; generating a three-dimensional model of the first target object by using the texture map; projecting the three-dimensional model of the first target object according to azimuth information of the scene image of the second view angle to obtain a two-dimensional image of the first target object; and superimposing the two-dimensional image of the first target object onto the scene image to obtain a composite image of the second view angle. Embodiments of the present disclosure can significantly reduce the cost of data synthesis, provide a large amount of training data for training deep neural networks, and greatly reduce the consumption of manpower, material resources, and financial resources.

Description

Image synthesizing method, apparatus, device, storage medium, and program product
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to the field of image processing technologies.
Background
Machine learning model training typically requires a large number of annotated multi-view images as a training set. Taking the application scenario of vehicle-road cooperation as an example, vision sensors can be mounted on the tops of vehicles and on utility poles and traffic lights at intersections, so that vehicles on the road can be subjected to multi-view detection, segmentation, and pose estimation. Vehicle-road cooperation is an important way to achieve automatic driving. It can effectively address the problem of vehicle occlusion and greatly improve the safety of automatic driving technology. However, the conventional approach requires a large number of annotated multi-view images as a training set before network model training can be performed. In traffic scenes, such multi-view training data is difficult to obtain and difficult to annotate.
Disclosure of Invention
The present disclosure provides an image synthesizing method, apparatus, device, storage medium, and program product.
According to an aspect of the present disclosure, there is provided an image synthesizing method including:
performing texture complement processing on an image comprising a first view angle of a first target object to obtain a texture map of the first target object;
generating a three-dimensional model of the first target object by using the texture map;
according to azimuth information of the scene image of the second view angle, projecting the three-dimensional model of the first target object to obtain a two-dimensional image of the first target object;
and superposing the two-dimensional image of the first target object into the scene image to obtain a composite image of the second visual angle.
According to another aspect of the present disclosure, there is provided an image synthesizing apparatus including:
the processing unit is used for carrying out texture complement processing on the image comprising the first view angle of the first target object to obtain a texture map of the first target object;
a generation unit for generating a three-dimensional model of the first target object using the texture map;
the projection unit is used for projecting the three-dimensional model of the first target object according to the azimuth information of the scene image of the second view angle to obtain a two-dimensional image of the first target object;
and the superposition unit is used for superposing the two-dimensional image of the first target object into the scene image to obtain a composite image of the second visual angle.
According to still another aspect of the present disclosure, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods provided by any one of the embodiments of the present disclosure.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method provided by any one of the embodiments of the present disclosure.
According to yet another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method provided by any of the embodiments of the present disclosure.
Embodiments of the above application have the following advantages or benefits: the cost of data synthesis can be significantly reduced, a large amount of training data can be provided for training deep neural networks, and the consumption of manpower, material resources, and financial resources can be greatly reduced. Taking a vehicle as the target object as an example, embodiments of the present disclosure can provide a large number of annotated multi-view images for network model training, improve the accuracy of vehicle-road cooperation tasks, improve the performance of environment perception, and effectively improve the safety of automatic driving vehicles.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of an image compositing method according to an embodiment of the disclosure;
FIG. 2 is a flow chart of texture completion for an image synthesis method according to another embodiment of the present disclosure;
FIG. 3 is a flow chart of three-dimensional model reconstruction for an image synthesis method according to another embodiment of the present disclosure;
FIG. 4 is a flow chart of image projection of an image compositing method according to another embodiment of the disclosure;
FIG. 5 is a flow chart of image restoration of an image synthesis method according to another embodiment of the present disclosure;
FIG. 6 is a flow chart of an image compositing method according to another embodiment of the disclosure;
fig. 7 is a schematic view of data diversity effect of an image synthesizing method according to another embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an image compositing apparatus according to an embodiment of the disclosure;
fig. 9 is a schematic diagram of an image synthesizing apparatus according to another embodiment of the present disclosure;
fig. 10 is a block diagram of an electronic device for implementing an image compositing method according to an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Machine learning model training typically requires a large number of annotated multi-view images as a training set. Taking the application scenario of vehicle-road cooperation as an example, vehicle-road cooperation can be used to detect vehicles on the road from multiple view angles, so that the problem of vehicle occlusion is effectively addressed and the safety of automatic driving technology is greatly improved. However, the conventional approach requires a large number of annotated multi-view images as a training set before network model training can be performed. In traffic scenes, such multi-view training data is difficult to obtain and difficult to annotate.
Taking an application scene of vehicle-road cooperation as an example, the method for generating the multi-angle image in the related technology mainly comprises the following technical schemes:
(1) Three-dimensional model rendering. This approach requires the construction of a large number of three-dimensional models of vehicles and cities. Data such as the texture maps, scene illumination, and rendering parameters of the models need to be adjusted, and image rendering is performed using rendering software such as 3ds Max. This scheme is costly and inefficient, the effect is difficult to guarantee, and it is difficult to use the resulting image data for network training.
(2) Predicting the image at a new view angle using pixel-point correspondences learned from multi-view images. This approach requires a large number of annotated multi-view images as a training set. In traffic scenes, such training data is difficult to obtain and difficult to annotate.
(3) Image synthesis by means of a generative adversarial network (GAN). This scheme requires two or more paired images (image pairs) as training data, which are difficult to acquire. In addition, GANs are difficult to train and their results are difficult to control. The greatest drawback of this scheme is that corresponding annotation results cannot be generated automatically.
To address this, the present disclosure provides a multi-view image synthesis method oriented to the vehicle-road cooperation task. Fig. 1 is a flowchart of an image synthesizing method according to an embodiment of the present disclosure. Referring to fig. 1, the image synthesizing method includes:
step S110, performing texture complement processing on an image comprising a first view angle of a first target object to obtain a texture map of the first target object;
step S120, generating a three-dimensional model of the first target object by using the texture map;
step S130, according to azimuth information of the scene image of the second view angle, projecting the three-dimensional model of the first target object to obtain a two-dimensional image of the first target object;
step S140, the two-dimensional image of the first target object is superimposed on the scene image, so as to obtain a composite image of the second view angle.
Wherein, in step S110 and step S120, three-dimensional model reconstruction of the first target object is performed, and in step S130 and step S140, projection of the three-dimensional model of the first target object is superimposed into the scene image, resulting in a composite image of a new view angle.
In the task of reconstructing a three-dimensional model of the first target object, it is often necessary to reconstruct the texture map of the three-dimensional model from a monocular image. Because a monocular image is captured from a single viewpoint, a complete texture map of the first target object cannot be obtained from it. Taking a vehicle as the first target object as an example, if the vehicle is photographed from the front, its tail lights cannot be captured. In addition, because of the single shooting view angle, the image textures of some parts of the first target object may be incomplete. Therefore, the missing parts of the first target object need to be complemented in order to reconstruct the three-dimensional model of the first target object.
In step S110, a captured image including a first perspective of a first target object may be first acquired. For example, the image of the first angle of view may be a front view taken from the front. The texture complement processing can be performed on the image including the first view angle of the first target object by utilizing a pre-trained deep neural network, so as to obtain a texture map of the first target object. In step S120, three-dimensional model reconstruction is performed using the texture map obtained in step S110, and a three-dimensional model of the first target object is generated.
In step S130, the photographed scene image of the second view angle and the azimuth information of the scene image may first be acquired. For example, the scene image of the second view angle may be a top view of a road scene taken from a high position looking down. In one example, the azimuth information of the scene image may be obtained from camera parameters. The azimuth information may include three-dimensional geometric information of the road scene, such as plane equations and normal directions. According to the azimuth information, a projection operation can be performed on the three-dimensional model of the first target object to obtain a two-dimensional image of the first target object. For example, the three-dimensional model of the first target object may be projected onto the plane determined by the plane equation of the road scene, resulting in a two-dimensional image of the first target object. Through the projection operation, the placement of the three-dimensional model in the road scene is made consistent with the three-dimensional geometric information of the road scene.
In step S140, the two-dimensional image of the first target object obtained in step S130 is superimposed on the scene image, resulting in a composite image at the second view angle.
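A minimal sketch of this superposition step, assuming the projection in step S130 also yields a binary mask of the first target object and that the object fits inside the scene image at the chosen position; the helper below and its names are illustrative rather than the exact implementation of the disclosure.

```python
import numpy as np

def composite_object(scene_img, object_img, object_mask, top_left):
    """Paste the rendered 2D object into the scene image at a (row, col)
    position, keeping scene pixels wherever the object mask is zero."""
    out = scene_img.copy()
    h, w = object_img.shape[:2]
    r, c = top_left
    region = out[r:r + h, c:c + w]          # view into 'out'
    visible = object_mask.astype(bool)
    region[visible] = object_img[visible]   # copy only the object pixels
    return out
```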
In the application scene of the vehicle-road coordination, most of the images shot by various visual sensors may be images of a first visual angle, and the number of images of a second visual angle is smaller. Embodiments of the present disclosure may synthesize an image of a second perspective using an image of a first perspective of a first target object and a scene image of the second perspective. By using the embodiment of the invention to generate the image, the cost of data synthesis can be obviously reduced, a large amount of training data is provided for training the deep neural network, and the consumption of manpower, material resources and financial resources is greatly reduced. Taking a vehicle as a target object as an example, the embodiment of the disclosure can provide a plurality of marked multi-view images for network model training, can improve the accuracy of vehicle-road cooperative tasks, improve the performance of environmental perception, and can effectively improve the safety of an automatic driving vehicle.
Fig. 2 is a flow chart of texture completion for an image synthesis method according to another embodiment of the present disclosure. The image synthesizing method of this embodiment may include the steps of the above-described embodiments. In addition, as shown in fig. 2, in an embodiment, step S110 in fig. 1, performing texture complement processing on an image including a first view angle of a first target object to obtain a texture map of the first target object may specifically include:
step S210, dividing an image of a first view angle comprising a first target object to obtain a divided image comprising at least one component of the first target object;
step S220, marking the pose of the first target object in the image comprising the first view angle of the first target object to obtain pose marking information;
step S230, projecting the segmented image according to pose labeling information to obtain a to-be-processed image of the first target object;
and step S240, performing texture complement processing on the image to be processed by using the deep neural network to obtain a texture map of the first target object.
In step S210, an image including a first view angle of a first target object is first segmented to obtain a segmented image including at least one component of the first target object.
Taking a vehicle as a first target object as an example, a model object to be reconstructed is divided into a plurality of parts. For example, the vehicle may be divided into a plurality of parts such as 4 wheels, a front cover, a rear cover, and a tail lamp. In one example, if the captured image of the vehicle is taken from the front, there may be only a front cover and 2 front wheels in the image, and no rear cover and tail lights. That is, some parts may be visible in the captured image and some parts may not be visible in the captured image. In addition, due to the limitation of the shooting angle, the image textures of the front cover and the 2 front wheels in the image may also be incomplete. The captured image of the vehicle may be segmented to obtain a segmented image that includes the various components in the image.
In one example, the segmented image may be taken as a to-be-processed image of at least one component of the first target object.
In another example, in step S220, the pose of the first target object may also be labeled in the image including the first target object to obtain pose labeling information. Even for the same first target object, the pose presented in the image, and the appearance of its individual components, may differ depending on the shooting angle. Therefore, the pose of the first target object can be identified using a recognition algorithm to obtain the pose labeling information. The pose labeling information can also be obtained by manual labeling.
In one embodiment, the pose annotation information may include a six-degree-of-freedom spatial pose. The six degrees of freedom of an object in space include translation along the three orthogonal coordinate axes x, y, and z, and rotation about these three axes. Thus, the position and orientation of the object can be determined using the six-degree-of-freedom spatial pose.
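A six-degree-of-freedom spatial pose of this kind is commonly encoded as a 4x4 rigid transform built from three rotation angles and three translations. The sketch below shows one such encoding; the Euler-angle convention and function name are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pose_to_matrix(rx, ry, rz, tx, ty, tz):
    """Build a 4x4 rigid transform from three rotations (radians, about the
    x, y and z axes) and three translations - one encoding of a 6-DOF pose."""
    T = np.eye(4)
    T[:3, :3] = Rotation.from_euler("xyz", [rx, ry, rz]).as_matrix()
    T[:3, 3] = [tx, ty, tz]
    return T
```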
In step S230, the segmented image is projected according to the pose labeling information, and the image projection algorithm may be used to perform a projection operation on the segmented image, so as to correct the deviation of the segmented image caused by different poses of the first target object, and obtain the to-be-processed image of at least one component of the first target object after projection.
In step S240, a texture complement process is performed on the image to be processed by using the pre-trained deep neural network, so as to obtain a texture map of the first target object. In one example, a texture completion process may be performed on an image to be processed using a graph neural network model. Specifically, the data structure of the association graphs of all the components of the first target object may be constructed in advance. In the data structure of the association graph, each node element in the association graph is used to represent a component of the first target object. In an example where a vehicle is the first target object, n nodes may be included in the association graph, each node representing a component of the vehicle, such as a wheel, a front cover, a tail light, and the like. When the image including the first target object is segmented in step S210, the image segmentation is also performed according to the nodes defined in the data structure of the association graph. Each part in the segmented image to be processed can find the node corresponding to the part in the association graph.
For a component visible in the captured image comprising the first target object, the node corresponding to the component can be found in the association graph. The images of each component in the image to be processed can be assigned to the corresponding node elements in the association graph. For a component that is not visible in the captured image that includes the first target object, that is, a component that is not captured in the image, the node corresponding to the component is assigned as a null node in the association graph. And finally, constructing a correlation diagram of all the components of the first target object by using node elements corresponding to all the assigned components.
The constructed association graph of the first target object is input into the graph neural network model. In the input association graph, the nodes represent images of the components of the first target object; the image textures of some components may be incomplete, and the image textures of other components may be completely absent. The graph neural network model completes the incomplete or completely absent image textures in the input association graph and outputs texture-completed images of all components of the first target object.
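The disclosure does not specify the graph neural network architecture. The sketch below only illustrates the association-graph data structure described above (one node per component, with zero-valued "empty" nodes for components not captured in the image) together with a generic graph-convolution step; the part list, adjacency, and feature size are illustrative assumptions.

```python
import torch
import torch.nn as nn

PARTS = ["front_cover", "rear_cover", "tail_light",
         "wheel_fl", "wheel_fr", "wheel_rl", "wheel_rr"]   # assumed part list
EDGES = [(0, 2), (1, 2), (0, 3), (0, 4), (1, 5), (1, 6)]   # assumed adjacency

def build_association_graph(part_textures, feat_dim=1024):
    """One node per component: visible parts get their flattened texture patch
    as the node feature, unphotographed parts get a zero ('empty') node."""
    x = torch.zeros(len(PARTS), feat_dim)
    for i, name in enumerate(PARTS):
        patch = part_textures.get(name)          # tensor or None
        if patch is not None:
            flat = patch.flatten()
            n = min(feat_dim, flat.numel())
            x[i, :n] = flat[:n]
    adj = torch.eye(len(PARTS))
    for i, j in EDGES:
        adj[i, j] = adj[j, i] = 1.0
    return x, adj

class GraphConvLayer(nn.Module):
    """A generic graph-convolution step: average neighbour features through
    the row-normalised adjacency matrix, then apply a learned linear map."""
    def __init__(self, feat_dim):
        super().__init__()
        self.lin = nn.Linear(feat_dim, feat_dim)

    def forward(self, x, adj):
        adj = adj / adj.sum(dim=1, keepdim=True)
        return torch.relu(self.lin(adj @ x))
```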
According to the embodiments of the present disclosure, a high-quality, complete three-dimensional texture map can be generated for the first target object, the cost of three-dimensional texture reconstruction can be significantly reduced, and all-round simulated rendering of the target object can be realized. Taking a vehicle as the first target object as an example, three-dimensional model reconstruction of vehicles can greatly enrich automatic driving simulation databases and provide abundant resources for perception system training.
Fig. 3 is a flow chart of three-dimensional model reconstruction for an image synthesis method according to another embodiment of the present disclosure. The image synthesizing method of this embodiment may include the steps of the above-described embodiments. In addition, as shown in fig. 3, in an embodiment, step S120 in fig. 1, generating a three-dimensional model of the first target object using the texture map may specifically include:
step S310, obtaining deformation parameters of a deformable template of a first target object, wherein the deformation parameters correspond to the appearance shape of the first target object;
step S320, a three-dimensional model of the first target object is generated according to the deformation parameters and the texture map of the deformable template.
Taking a vehicle as the first target object as an example, the deformable template is used to generate vehicles with different appearance shapes. The deformation parameters of the deformable template correspond to different vehicle appearance shapes. The overall exterior shape may vary from vehicle to vehicle, and the shapes of the various components that make up the vehicle may also vary. Corresponding deformable templates can be created according to the component shapes of different vehicle models. The texture map in the deformable template is a predefined texture contour, and the texture inside the contour can be filled in during image texture completion. By adjusting the deformation parameters of the deformable template and combining them with the completed texture map, a three-dimensional model of the vehicle can be generated.
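The exact parametrization of the deformable template is not given in the disclosure. One common choice, sketched below, treats each deformation parameter as the weight of a per-vertex shape basis added to the template mesh; the array shapes and names are assumptions.

```python
import numpy as np

def deform_template(template_vertices, shape_bases, deform_params):
    """Apply deformation parameters to a deformable template mesh.
    template_vertices: (N, 3) base mesh, shape_bases: (K, N, 3) per-vertex
    offset bases, deform_params: (K,) weights - one per deformation parameter."""
    offsets = np.tensordot(deform_params, shape_bases, axes=1)   # -> (N, 3)
    return template_vertices + offsets
```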
In the embodiment of the disclosure, the three-dimensional model reconstruction of the vehicle is realized through the deformable template and the texture complement, so that an automatic driving simulation database can be greatly enriched, and abundant resources are provided for the training of a perception system.
Fig. 4 is a flowchart of image projection of an image composition method according to another embodiment of the present disclosure. The image synthesizing method of this embodiment may include the steps of the above-described embodiments. In addition, as shown in fig. 4, in an embodiment, step S130 in fig. 1, projecting the three-dimensional model of the first target object to obtain the two-dimensional image of the first target object according to the azimuth information of the scene image at the second viewing angle may specifically include:
step S410, according to shooting parameters of the scene image of the second view angle, azimuth information of the scene image is obtained, wherein the azimuth information comprises a plane equation of the scene image;
step S420, adjusting the pose of the three-dimensional model of the first target object, and putting the three-dimensional model of the first target object on a plane determined by a plane equation;
step S430, projecting the placed three-dimensional model of the first target object to obtain a two-dimensional image of the first target object.
Wherein the photographing parameters of the scene image of the second view angle may include camera parameters. The camera parameters may include at least one of internal and external parameters of the camera. The internal parameters of the camera may include the focal length. The external parameters of the camera may include the camera position. In step S410, when capturing the captured scene image at the second angle of view, the capturing parameters may be simultaneously acquired. And obtaining the azimuth information of the scene image according to the shooting parameters. The azimuth information may include three-dimensional geometric information of the road scene. The three-dimensional geometric information may include plane equations, normal, and the like.
Taking the vehicle as the first target object as an example, in step S420, the pose of the three-dimensional model of the vehicle is adjusted according to the azimuth information of the scene image, and the three-dimensional model of the vehicle is placed on the plane determined by the plane equation. Adjusting the pose of the three-dimensional model of the vehicle according to the azimuth information of the scene image makes the placement position of the three-dimensional model in the road scene consistent with the three-dimensional geometric information of the road scene. In step S430, the placed three-dimensional model of the vehicle is projected to obtain a two-dimensional image of the vehicle.
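A minimal sketch of the placement and projection described above, assuming the plane equation is given as n·X + d = 0 with a unit normal n and the camera is a pinhole model with intrinsics K and extrinsics R, t; the helper names are illustrative.

```python
import numpy as np

def place_on_plane(vertices, plane_n, plane_d):
    """Translate the model along the plane normal so that its lowest point
    rests on the plane n.X + d = 0 (plane_n is assumed to be a unit vector)."""
    heights = vertices @ plane_n + plane_d       # signed distances to the plane
    return vertices - heights.min() * plane_n

def project_points(vertices, K, R, t):
    """Pinhole projection of (N, 3) world-space vertices into pixel coordinates."""
    cam = vertices @ R.T + t                     # world -> camera coordinates
    uv = cam @ K.T                               # camera -> homogeneous image
    return uv[:, :2] / uv[:, 2:3]                # divide by depth
```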
By projecting the three-dimensional model of the first target object according to the azimuth information of the scene image at the second view angle to obtain the two-dimensional image, the placement position of the three-dimensional model in the road scene is kept consistent with the three-dimensional geometric information of the road scene, making the synthesized image more realistic.
Fig. 5 is a flowchart of image restoration of an image synthesis method according to another embodiment of the present disclosure. The image synthesizing method of this embodiment may include the steps of the above-described embodiments. Furthermore, as shown in fig. 5, in one embodiment, the method further includes:
step S510, removing a second target object in the photographed standby image at the second viewing angle by using an image restoration method;
in step S520, the standby image from which the second target object is removed is taken as the scene image at the second viewing angle.
In this embodiment, after photographing based on the second angle of view, the photographed image is used as the standby image. And after the standby image is subjected to restoration processing, taking the standby image after the restoration processing as a scene image of the second visual angle. Pedestrians and vehicles may be present in the road scene of the photographed standby image of the second view angle. Taking the vehicle reconstructed by the three-dimensional model as a first target object as an example, pedestrians and vehicles in the standby image of the second view angle are taken as second target objects. The second target object may be removed from the standby image using an image restoration method, and the standby image from which the second target object is removed may be used as the scene image at the second view angle.
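One readily available image restoration method for this step is OpenCV inpainting, as sketched below; the sketch assumes a binary mask covering the second target objects (for example, produced by a detector or segmentation model) is already available.

```python
import cv2

def remove_second_targets(standby_img, object_mask):
    """Fill the masked regions (e.g. detected vehicles and pedestrians)
    from the surrounding background pixels."""
    # Arguments: source image, 8-bit mask, inpainting radius, algorithm flag.
    return cv2.inpaint(standby_img, object_mask, 3, cv2.INPAINT_TELEA)
```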
In the application scene of the vehicle-road coordination, the number of possible images of the second visual angle in the images shot by the visual sensor is small. By using the method, a large number of images at the second view angle can be generated, a large number of multi-view images can be provided for training the network model, and the robustness of the model is improved.
In one embodiment, the method further comprises:
and obtaining the annotation information of the composite image according to the position information of the first target object in the composite image.
The annotation information may include two-dimensional annotation information and three-dimensional annotation information. The two-dimensional annotation information may include at least one of a "two-dimensional bounding box" and an "instance-level segmentation". The "two-dimensional bounding box" includes annotation of the overall position of the vehicle. The "instance-level segmentation" includes segmenting the vehicle into components and marking the location of each component. The three-dimensional annotation information includes at least one of a "three-dimensional bounding box" and a "six-degree-of-freedom spatial pose".
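Because the first target object is placed into the scene synthetically, its mask and placed vertices are already known, so this annotation information can be derived directly instead of being labeled by hand. A minimal sketch under that assumption:

```python
import numpy as np

def bbox_2d_from_mask(mask):
    """Axis-aligned 2D bounding box (x_min, y_min, x_max, y_max) of a binary mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def bbox_3d_from_vertices(vertices):
    """Axis-aligned 3D bounding box (min corner, max corner) of the placed
    model's (N, 3) vertices."""
    return vertices.min(axis=0), vertices.max(axis=0)
```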
The image synthesis method disclosed by the embodiment of the invention can synthesize images with multiple visual angles, and automatically generate corresponding two-dimensional annotation information and three-dimensional annotation information, thereby greatly reducing the cost of acquiring training data and effectively improving the robustness of the deep learning model.
Fig. 6 is a flowchart of an image synthesizing method according to another embodiment of the present disclosure. The various reference numerals in fig. 6 are as follows:
reference numeral 1 denotes a Source image (Source), which is a Front View;
reference numeral 2 denotes a Target image (Target), which is a Top View;
the reference numeral (a) denotes a deformable Vehicle Template and a six-degree-of-freedom space Pose notation (Vehicle Template & supported 6-DOF Pose);
reference numeral (b) denotes a component-based texture map completion (Part based Texture Inpainting);
reference numeral (c) denotes model-based view synthesis (Model based View Synthesis);
reference numeral (d) denotes a background image (Background Images with Camera Calibration) with camera calibration;
reference numeral (e) denotes a background image restoration (Background Inpainting);
the reference numeral (f) denotes the three-dimensional structure of the background image (3D Structure of Background);
the reference numeral (g) denotes the synthesis results at new view angles with ground-truth annotations (Novel-view Results with Ground-Truth Annotations).
Referring to fig. 1 to 6, as indicated by reference numeral (a) in fig. 6, for a vehicle object, the input information of the three-dimensional reconstruction task may include a single traffic scene image, the annotated six-degree-of-freedom spatial pose of each vehicle in the image, and a deformable template of the three-dimensional vehicle. The deformable template may contain texture maps. As indicated by reference numeral (b), image pixels are projected onto the texture map according to the annotated six-degree-of-freedom spatial pose. A deep neural network is then trained to fill in the missing areas of the texture map. As indicated by reference numeral (c), the deformation parameters of the deformable template of the three-dimensional vehicle model are then adjusted to produce a number of different three-dimensional vehicle models. The models are rendered in combination with the generated texture maps to obtain two-dimensional images of the vehicles.
As indicated by reference numeral (d), an image of an intersection can be acquired as the background image portion. The background image portion may be a scene image of the second view angle that serves as the background for the three-dimensional vehicle model. The internal and external parameters of the camera used to capture the image are calibrated in advance. As indicated by reference numeral (e), vehicles in the background image portion are removed using an existing image restoration (image inpainting) method. As indicated by reference numeral (f), the three-dimensional geometric information of the intersection is recovered using the internal and external parameters of the camera. The three-dimensional geometric information includes plane equations, normal directions, and the like. Finally, as indicated by reference numeral (g), the textured vehicles generated at reference numeral (c) are placed at random positions in the background image, that is, on the background road surface, and images of multiple view angles are synthesized. At the same time, the two-dimensional and three-dimensional annotation information corresponding to the synthesized images is obtained.
Fig. 7 is a schematic view of data diversity effect of an image synthesizing method according to another embodiment of the present disclosure.
The various reference numerals in fig. 7 are as follows:
The reference numeral (a1) denotes a real image (Input Real Images in AD) input in the automated driving system;
The reference numeral (b1) denotes texture map completion (Inpainted Texture Maps);
The reference numeral (c1) denotes a three-dimensional deformable template (3D Deformed Vehicle Models) of the vehicle;
The reference numeral (d1) denotes an output image (Output Images with various params) containing rich parameters.
As indicated by the reference numeral (a1), a real image input in the automated driving system is taken as the image including the first view angle of the first target object. As indicated by reference numeral (b1), texture complement processing is performed on the image including the first view angle of the first target object to obtain a texture map of the first target object. As indicated by reference numeral (c1), a three-dimensional model of the first target object may be generated from the deformation parameters of the deformable template and the texture map. Different appearance shapes of the vehicle produced by the deformable template can be randomly combined with the texture maps to generate a large number of three-dimensional vehicle models with different appearance shapes and textures. The output images are shown at (d1).
The image synthesis method of the embodiments of the present disclosure can ensure both the diversity and the realism of the generated data. As shown in figs. 6 and 7, embodiments of the present disclosure recover the texture map of a vehicle from images of a real traffic scene. The deformation parameters of the three-dimensional model are then adjusted to obtain a large number of three-dimensional vehicles with different shapes. The texture maps are then randomly combined with the three-dimensional vehicles of different shapes, and multi-view rendering is performed. During rendering, different camera parameters (internal and external), scene illumination, and the resolution of the generated images can also be adjusted. This method increases the diversity of the data as much as possible while ensuring image quality.
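A sketch of how such random combinations might be sampled for a single rendering pass; every parameter name and range below is an illustrative assumption rather than a value taken from the disclosure.

```python
import random

def sample_render_config(texture_maps, num_shape_params=10):
    """Randomly combine a texture map with shape, camera, lighting and
    resolution parameters for one synthetic rendering."""
    return {
        "texture": random.choice(texture_maps),
        "shape_params": [random.uniform(-1.0, 1.0) for _ in range(num_shape_params)],
        "focal_length_px": random.uniform(800, 1600),      # camera intrinsics
        "camera_height_m": random.uniform(4.0, 8.0),       # roadside-pole extrinsics
        "camera_pitch_deg": random.uniform(-60.0, -20.0),
        "light_intensity": random.uniform(0.5, 1.5),       # scene illumination
        "resolution": random.choice([(1280, 720), (1920, 1080)]),
    }
```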
Fig. 8 is a schematic diagram of an image synthesizing apparatus according to an embodiment of the present disclosure. Referring to fig. 8, the image synthesizing apparatus includes:
a processing unit 100, configured to perform texture complement processing on an image including a first view angle of a first target object, to obtain a texture map of the first target object;
a generating unit 200 for generating a three-dimensional model of the first target object using the texture map;
a projection unit 300, configured to project the three-dimensional model of the first target object according to the azimuth information of the scene image at the second view angle to obtain a two-dimensional image of the first target object;
and a superposition unit 400, configured to superimpose the two-dimensional image of the first target object onto the scene image, so as to obtain a composite image of the second viewing angle.
In one embodiment, the processing unit 100 is configured to:
segmenting an image comprising a first perspective of a first target object to obtain a segmented image comprising at least one component of the first target object;
marking the pose of the first target object in an image of a first visual angle comprising the first target object to obtain pose marking information;
projecting the segmented image according to the pose labeling information to obtain an image to be processed of the first target object;
and performing texture complement processing on the image to be processed by using the deep neural network to obtain a texture map of the first target object.
In one embodiment, the generating unit 200 is configured to:
obtaining deformation parameters of a deformable template of the first target object, wherein the deformation parameters correspond to the appearance shape of the first target object;
and generating a three-dimensional model of the first target object according to the deformation parameters of the deformable template and the texture map.
In one embodiment, the projection unit 300 is configured to:
obtaining azimuth information of the scene image according to shooting parameters of the scene image at the second view angle, wherein the azimuth information comprises a plane equation of the scene image;
adjusting the pose of the three-dimensional model of the first target object, and putting the three-dimensional model of the first target object on a plane determined by a plane equation;
and projecting the placed three-dimensional model of the first target object to obtain a two-dimensional image of the first target object.
Fig. 9 is a schematic diagram of an image synthesizing apparatus according to another embodiment of the present disclosure. As shown in fig. 9, in one embodiment, the apparatus further includes a repairing unit 220, where the repairing unit 220 is configured to:
removing a second target object in the photographed standby image at the second view angle by using an image restoration method;
and taking the standby image from which the second target object is removed as a scene image of the second visual angle.
In one embodiment, the apparatus further includes an labeling unit 500, where the labeling unit 500 is configured to:
and obtaining the annotation information of the composite image according to the position information of the first target object in the composite image.
The functions of each unit in the image synthesizing apparatus according to the embodiments of the present disclosure may be referred to the corresponding descriptions in the above methods, and are not described herein.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 10 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the respective methods and processes described above, such as the image synthesizing method. For example, in some embodiments, the image synthesis method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the image synthesis method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the image synthesis method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (12)

1. An image synthesis method, comprising:
performing texture complement processing on an image comprising a first view angle of a first target object to obtain a texture map of the first target object;
generating a three-dimensional model of the first target object by using the texture map;
according to azimuth information of the scene image of the second view angle, projecting the three-dimensional model of the first target object to obtain a two-dimensional image of the first target object;
overlapping the two-dimensional image of the first target object into the scene image to obtain a composite image of a second view angle;
the performing texture complement processing on the image including the first view angle of the first target object to obtain a texture map of the first target object includes: dividing the image comprising the first view angle of the first target object to obtain a divided image comprising at least one component of the first target object; marking the pose of the first target object in the image comprising the first visual angle of the first target object to obtain pose marking information; projecting the segmented image according to the pose labeling information to obtain a to-be-processed image of the first target object; and performing texture complement processing on the image to be processed by using the deep neural network to obtain a texture map of the first target object.
2. The method of claim 1, wherein the generating a three-dimensional model of a first target object using the texture map comprises:
obtaining deformation parameters of a deformable template of a first target object, wherein the deformation parameters correspond to the appearance shape of the first target object;
and generating a three-dimensional model of the first target object according to the deformation parameters of the deformable template and the texture map.
3. The method according to any one of claims 1 to 2, wherein projecting the three-dimensional model of the first target object from the orientation information of the scene image at the second perspective to obtain a two-dimensional image of the first target object comprises:
obtaining azimuth information of a scene image according to shooting parameters of the scene image at a second view angle, wherein the azimuth information comprises a plane equation of the scene image;
adjusting the pose of the three-dimensional model of the first target object, and putting the three-dimensional model of the first target object on a plane determined by the plane equation;
and projecting the placed three-dimensional model of the first target object to obtain a two-dimensional image of the first target object.
4. The method of any one of claims 1 to 2, the method further comprising:
removing a second target object in the photographed standby image at the second view angle by using an image restoration method;
and taking the standby image from which the second target object is removed as the scene image of the second visual angle.
5. The method of any one of claims 1 to 2, the method further comprising:
and obtaining the annotation information of the composite image according to the position information of the first target object in the composite image.
6. An image synthesizing apparatus comprising:
a processing unit, configured to perform texture completion processing on a first-view-angle image comprising a first target object to obtain a texture map of the first target object;
a generating unit, configured to generate a three-dimensional model of the first target object by using the texture map;
a projection unit, configured to project the three-dimensional model of the first target object according to orientation information of a scene image at a second view angle to obtain a two-dimensional image of the first target object; and
a superposition unit, configured to superimpose the two-dimensional image of the first target object onto the scene image to obtain a composite image at the second view angle;
wherein the processing unit is configured to: segment the first-view-angle image comprising the first target object to obtain a segmented image comprising at least one component of the first target object; annotate a pose of the first target object in the first-view-angle image to obtain pose annotation information; project the segmented image according to the pose annotation information to obtain a to-be-processed image of the first target object; and perform texture completion processing on the to-be-processed image by using a deep neural network to obtain the texture map of the first target object.
7. The apparatus of claim 6, wherein the generating unit is configured to:
obtain deformation parameters of a deformable template of the first target object, wherein the deformation parameters correspond to an external shape of the first target object; and
generate the three-dimensional model of the first target object according to the deformation parameters of the deformable template and the texture map.
8. The apparatus of claim 6 or 7, wherein the projection unit is configured to:
obtain the orientation information of the scene image according to shooting parameters of the scene image at the second view angle, wherein the orientation information comprises a plane equation of the scene image;
adjust a pose of the three-dimensional model of the first target object, and place the three-dimensional model of the first target object on a plane determined by the plane equation; and
project the placed three-dimensional model of the first target object to obtain the two-dimensional image of the first target object.
9. The apparatus of claim 6 or 7, further comprising a restoration unit configured to:
remove a second target object from a captured spare image at the second view angle by using an image restoration method; and
take the spare image from which the second target object has been removed as the scene image at the second view angle.
10. The apparatus of claim 6 or 7, further comprising an annotation unit configured to:
obtain annotation information of the composite image according to position information of the first target object in the composite image.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5.
CN202011619097.6A 2020-12-30 2020-12-30 Image synthesizing method, apparatus, device, storage medium, and program product Active CN112651881B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011619097.6A CN112651881B (en) 2020-12-30 2020-12-30 Image synthesizing method, apparatus, device, storage medium, and program product

Publications (2)

Publication Number Publication Date
CN112651881A CN112651881A (en) 2021-04-13
CN112651881B (en) 2023-08-01

Family

ID=75366650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011619097.6A Active CN112651881B (en) 2020-12-30 2020-12-30 Image synthesizing method, apparatus, device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN112651881B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11796670B2 (en) * 2021-05-20 2023-10-24 Beijing Baidu Netcom Science And Technology Co., Ltd. Radar point cloud data processing method and device, apparatus, and storage medium
CN113379763A (en) * 2021-06-01 2021-09-10 北京齐尔布莱特科技有限公司 Image data processing method, model generating method and image segmentation processing method
CN113610968B (en) * 2021-08-17 2024-09-20 北京京东乾石科技有限公司 Updating method and device of target detection model
CN114359312B (en) * 2022-03-17 2022-08-23 荣耀终端有限公司 Image processing method and device
CN117078509B (en) * 2023-10-18 2024-04-09 荣耀终端有限公司 Model training method, photo generation method and related equipment

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014071850A (en) * 2012-10-02 2014-04-21 Osaka Prefecture Univ Image processing apparatus, terminal device, image processing method, and program
CN104599243A (en) * 2014-12-11 2015-05-06 北京航空航天大学 Virtual and actual reality integration method of multiple video streams and three-dimensional scene
CN106803286A (en) * 2017-01-17 2017-06-06 湖南优象科技有限公司 Mutual occlusion real-time processing method based on multi-view image
CN107393017A (en) * 2017-08-11 2017-11-24 北京铂石空间科技有限公司 Image processing method, device, electronic equipment and storage medium
CN108765537A (en) * 2018-06-04 2018-11-06 北京旷视科技有限公司 A kind of processing method of image, device, electronic equipment and computer-readable medium
CN109697688A (en) * 2017-10-20 2019-04-30 虹软科技股份有限公司 A kind of method and apparatus for image procossing
CN109767485A (en) * 2019-01-15 2019-05-17 三星电子(中国)研发中心 Image processing method and device
CN109829969A (en) * 2018-12-27 2019-05-31 北京奇艺世纪科技有限公司 A kind of data capture method, device and storage medium
CN110223370A (en) * 2019-05-29 2019-09-10 南京大学 A method of complete human body's texture mapping is generated from single view picture
CN110223380A (en) * 2019-06-11 2019-09-10 中国科学院自动化研究所 Fusion is taken photo by plane and the scene modeling method of ground multi-view image, system, device
CN110490960A (en) * 2019-07-11 2019-11-22 阿里巴巴集团控股有限公司 A kind of composograph generation method and device
CN111783525A (en) * 2020-05-20 2020-10-16 中国人民解放军93114部队 Aerial photographic image target sample generation method based on style migration
CN112150575A (en) * 2020-10-30 2020-12-29 深圳市优必选科技股份有限公司 Scene data acquisition method, model training method, device and computer equipment

Also Published As

Publication number Publication date
CN112651881A (en) 2021-04-13

Similar Documents

Publication Title
CN112651881B (en) Image synthesizing method, apparatus, device, storage medium, and program product
CN111783820B (en) Image labeling method and device
CN109003325B (en) Three-dimensional reconstruction method, medium, device and computing equipment
CN109683699B (en) Method and device for realizing augmented reality based on deep learning and mobile terminal
US9437034B1 (en) Multiview texturing for three-dimensional models
WO2023093739A1 (en) Multi-view three-dimensional reconstruction method
CN111382618B (en) Illumination detection method, device, equipment and storage medium for face image
CN107203962B (en) Method for making pseudo-3D image by using 2D picture and electronic equipment
CN108734773A (en) A kind of three-dimensional rebuilding method and system for mixing picture
CN112330815A (en) Three-dimensional point cloud data processing method, device and equipment based on obstacle fusion
CN114022542A (en) Three-dimensional reconstruction-based 3D database manufacturing method
CN115008454A (en) Robot online hand-eye calibration method based on multi-frame pseudo label data enhancement
CN107203961B (en) Expression migration method and electronic equipment
CN113486941B (en) Live image training sample generation method, model training method and electronic equipment
CN117011474B (en) Fisheye image sample generation method, device, computer equipment and storage medium
US20140306953A1 (en) 3D Rendering for Training Computer Vision Recognition
CN114299230A (en) Data generation method and device, electronic equipment and storage medium
Guo et al. Full-automatic high-precision scene 3D reconstruction method with water-area intelligent complementation and mesh optimization for UAV images
CN115063485B (en) Three-dimensional reconstruction method, device and computer-readable storage medium
CN116468796A (en) Method for generating representation from bird's eye view, vehicle object recognition system, and storage medium
CN109089100B (en) Method for synthesizing binocular stereo video
Fechteler et al. Articulated 3D model tracking with on-the-fly texturing
JP6641313B2 (en) Region extraction device and program
US20240153207A1 (en) Systems, methods, and media for filtering points of a point cloud utilizing visibility factors to generate a model of a scene
Liu et al. Image-based rendering for large-scale outdoor scenes with fusion of monocular and multi-view stereo depth

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant