CN118097020A - Method, system, equipment and medium for three-dimensional synthesis of target object and application scene - Google Patents

Method, system, equipment and medium for three-dimensional synthesis of target object and application scene

Info

Publication number
CN118097020A
CN118097020A (Application No. CN202410289101.9A)
Authority
CN
China
Prior art keywords
target object
application scene
rgb image
image
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410289101.9A
Other languages
Chinese (zh)
Inventor
邵小飞
朱力
吕方璐
汪博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Guangjian Aoshen Technology Co ltd
Shenzhen Guangjian Technology Co Ltd
Original Assignee
Chongqing Guangjian Aoshen Technology Co ltd
Shenzhen Guangjian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Guangjian Aoshen Technology Co ltd and Shenzhen Guangjian Technology Co Ltd
Priority to CN202410289101.9A
Publication of CN118097020A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/04 Indexing scheme for image data processing or generation, in general involving 3D image data
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method, a system, equipment and a medium for three-dimensional synthesis of a target object and an application scene. The method comprises the following steps. Step S1: performing model training on multi-view RGBD images acquired by a camera using a deep learning network, to obtain a neural network model for the corresponding device. Step S2: acquiring multi-view RGBD images of a target object with a 3D camera and performing three-dimensional reconstruction with the neural network model, to obtain a plurality of nodes of the target object. Step S3: obtaining an RGB image and a depth image of an application scene with a depth camera, and determining a foreground placement range in the RGB image using a distance parameter. Step S4: placing the target object in the RGB image, determining a viewing angle and the fixed nodes of the target object, determining a placement plane according to the position of the target object in the RGB image, and determining a reference plane from the three fixed nodes nearest to the placement plane, so that the reference plane is made to coincide with the placement plane.

Description

Method, system, equipment and medium for three-dimensional synthesis of target object and application scene
Technical Field
The invention relates to the technical field of video processing, in particular to a method, a system, equipment and a medium for three-dimensional synthesis of a target object and an application scene.
Background
Live streaming is a real-time, interactive form of video distribution: users watch a host's live video content through an online live-streaming platform and interact with the host. Live content takes many forms, including but not limited to game streaming, educational streaming, entertainment streaming and e-commerce streaming.
During a live broadcast, the host presents real-time video and audio through devices such as a camera and a microphone and communicates with the audience in real time. The audience can interact with the host through bullet comments, comments, likes and so on to express their own opinions and feelings.
Live streaming is immediate, interactive and social, which lets the audience understand the host and the content more directly and deeply and makes resonance and interaction easier to achieve. Live streaming has therefore become a popular form of internet entertainment and marketing.
However, owing to space and other limitations, existing live-streaming techniques cannot present oversized equipment inside a live-streaming room, and some equipment is simply inconvenient to display there.
At present, the prior art also lacks a technique for displaying a target object in the live-streaming room with high precision, which would greatly expand the variety of objects that can be shown in a live-streaming room.
The foregoing background is provided only to aid understanding of the inventive concept and technical solution of the present application. It does not necessarily constitute prior art to the present application and, absent clear evidence that the above content was publicly disclosed before the filing date of the present application, it shall not be used to assess the novelty or inventiveness of the present application.
Disclosure of Invention
In view of the defects in the prior art, the object of the present invention is to provide a method, a system, equipment and a medium for three-dimensional synthesis of a target object and an application scene.
The method for three-dimensional synthesis of a target object and an application scene provided by the invention comprises the following steps:
Step S1: performing model training on multi-view RGBD images acquired by a camera using a deep learning network, to obtain a neural network model for the corresponding device;
Step S2: acquiring multi-view RGBD images of a target object with a 3D camera and performing three-dimensional reconstruction with the neural network model, to obtain a plurality of nodes of the target object; wherein the three-dimensional features include shape and size, and the nodes include movable nodes and fixed nodes;
Step S3: obtaining an RGB image and a depth image of an application scene with a depth camera, and determining a foreground placement range in the RGB image using a distance parameter; wherein the distance parameter is a preset value;
Step S4: placing the target object in the RGB image, determining a viewing angle and the fixed nodes of the target object, determining a placement plane according to the position of the target object in the RGB image, determining a reference plane from the three fixed nodes nearest to the placement plane, making the reference plane coincide with the placement plane, obtaining a corrected viewing angle and the two-dimensional image of the target object corresponding to the corrected viewing angle, and fusing that two-dimensional image with the RGB image.
Optionally, the method for three-dimensional synthesis of the target object and the application scene further comprises:
Step S5: if, at the same pixel point, the depth value of the target object is smaller than the depth value of the application scene, judging that the corresponding part is occluded, and rotating that part of the target object about a movable node so as to minimize the occluded part of the target object.
Optionally, in the method for three-dimensional synthesis of the target object and the application scene, when the movable part of the target object is adjusted, a plurality of poses obtained by a deep learning algorithm are matched against the application scene so that the occluded part of the target object is minimized.
Optionally, in the method for three-dimensional synthesis of the target object and the application scene, step S3 comprises:
Step S31: obtaining an RGB image and a depth image of an application scene with a depth camera;
Step S32: determining a first range according to the relationship between the depth image and the distance parameter;
Step S33: segmenting the portion of the RGB image corresponding to the first range to obtain a plurality of placement planes;
Step S34: forming the foreground placement range from the plurality of placement planes.
Optionally, in the method for three-dimensional synthesis of the target object and the application scene, step S4 comprises:
Step S41: placing the target object in the RGB image to obtain the viewing angle and the fixed nodes of the target object;
Step S42: determining a placement plane according to the position of the target object in the RGB image, and determining a reference plane from the three fixed nodes nearest to the placement plane;
Step S43: adjusting the viewing angle and position of the target object so that the reference plane coincides with the placement plane and the center of gravity of the target object lies above the placement plane.
Optionally, the method for three-dimensional synthesis of the target object and the application scene further comprises:
Step S45: calculating the depth values of the target object according to the depth values of the fixed nodes;
Step S46: comparing the depth values of the target object with the depth values of the corresponding pixel points of the RGB image to obtain a boundary line along which the depth values are equal;
Step S47: dividing the region of the target object on the RGB image into an occluded region and a non-occluded region by the boundary line, and setting the occluded region to the content of the RGB image.
Optionally, in the method for three-dimensional synthesis of the target object and the application scene, step S46 comprises:
Step S461: comparing the depth values of the target object with the depth values of the corresponding pixel points of the RGB image to obtain first boundary points with equal depth values, thereby obtaining a plurality of first boundary lines;
Step S462: if the distance between two first boundary lines is smaller than a first threshold, merging the two first boundary lines into a second boundary line;
Step S463: if the distance between the end points of adjacent first boundary lines or second boundary lines is smaller than the first threshold, connecting those end points with a straight line;
Step S464: repeating steps S462 and S463 until no new merging or connection occurs, and confirming the final boundary line.
The invention provides a system for three-dimensional synthesis of a target object and an application scene, comprising the following modules:
a training module, configured to perform model training on multi-view RGBD images acquired by a camera using a deep learning network, to obtain a neural network model for the corresponding device;
an acquisition module, configured to acquire multi-view RGBD images of a target object with a 3D camera and perform three-dimensional reconstruction with the neural network model, to obtain a plurality of nodes of the target object; wherein the three-dimensional features include shape and size, and the nodes include movable nodes and fixed nodes;
a foreground module, configured to obtain an RGB image and a depth image of an application scene with a depth camera and determine a foreground placement range in the RGB image using a distance parameter; wherein the distance parameter is a preset value; and
a fusion module, configured to place the target object in the RGB image, determine a viewing angle and the fixed nodes of the target object, determine a placement plane according to the position of the target object in the RGB image, determine a reference plane from the three fixed nodes nearest to the placement plane, make the reference plane coincide with the placement plane, obtain a corrected viewing angle and the two-dimensional image of the target object corresponding to the corrected viewing angle, and perform the fusion.
The invention provides equipment for three-dimensional synthesis of a target object and an application scene, comprising:
a processor; and
a memory in which executable instructions of the processor are stored;
wherein the processor is configured to perform, by executing the executable instructions, the steps of the method for three-dimensional synthesis of a target object and an application scene.
The invention further provides a computer-readable storage medium storing a program which, when executed, implements the steps of the method for three-dimensional synthesis of a target object and an application scene.
Compared with the prior art, the invention has the following beneficial effects:
Through model training the invention obtains a neural network model for the corresponding device; the model adapts well to a wide range of objects, can effectively recognize various target objects and greatly improves efficiency.
The invention obtains the three-dimensional features of the target object, so that its size, depth values and the like are more accurate and the fused image is more plausible and realistic.
The invention determines the foreground placement range using the distance parameter, which better fits the application scene, so that the target object always sits at a position the user can view intuitively and is always well presented.
The invention fine-tunes the target object according to the placement plane in the RGB image and the reference plane determined by the fixed nodes, so that the target object blends better into the application scene and is presented more realistically, as if it were actually present in the video.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention, and a person skilled in the art can derive other drawings from them without inventive effort. Other features, objects and advantages of the present invention will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of steps of a method for three-dimensional synthesis of a target object and an application scene in an embodiment of the invention;
FIG. 2 is a schematic diagram of an embodiment of the present invention;
FIG. 3 is a flowchart illustrating steps of another method for three-dimensional synthesis of a target object and an application scene according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating the steps of determining the foreground placement range according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating the steps of locating the fixed nodes on a placement plane according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating further steps of locating the fixed nodes on a placement plane according to an embodiment of the present invention;
FIG. 7 is a flowchart illustrating further steps of locating the fixed nodes on a placement plane according to an embodiment of the present invention;
FIG. 8 is a flowchart illustrating steps for calculating boundary lines according to an embodiment of the present invention;
FIG. 9 is a schematic block diagram of a system for three-dimensional synthesis of a target object and an application scene according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a device for three-dimensional synthesis of a target object and an application scene in an embodiment of the present invention; and
Fig. 11 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present invention.
1 - target object;
2 - fixed node;
3 - placement plane.
Detailed Description
The present invention will now be described in detail with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any way. It should be noted that a person skilled in the art could make several variations and improvements without departing from the inventive concept, and these all fall within the scope of protection of the present invention.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
The invention provides a method for three-dimensional synthesis of a target object and an application scene, which aims to solve the problems in the prior art.
The technical solution of the present application, and how it solves the above technical problems, are described in detail below through specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described again in some of them. Embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 1 is a flow chart of the steps of a method for three-dimensional synthesis of a target object and an application scene in an embodiment of the present invention. As shown in Fig. 1, the method for three-dimensional synthesis of a target object and an application scene provided by the present invention comprises the following steps.
Step S1: performing model training on the multi-view RGBD images acquired by the camera using a deep learning network, to obtain a neural network model for the corresponding device.
In this step, a neural network model is trained mainly by deep learning techniques. The model is capable of learning and extracting three-dimensional features of a target object from multi-view RGBD (color with depth information) images. RGBD images contain not only color information but also depth information of the object, which is important for accurately reconstructing the three-dimensional shape of the object.
Data preparation: a plurality of multi-view RGBD image datasets of different target objects are collected. These data should contain images of the object from different angles, different distances and different illumination conditions to ensure generalization of the model.
Network design: a suitable deep learning network structure, such as a Convolutional Neural Network (CNN) or a Generative Adversarial Network (GAN), is selected for extracting features from the RGBD images.
Training process: the neural network model is trained using the collected RGBD image dataset. During the training process, the model learns how to extract useful features from the input image and generates a three-dimensional representation of the target object.
Model optimization: the performance of the model is improved by adjusting network parameters, applying regularization techniques or optimization algorithms, and so on, so that three-dimensional features are extracted from the RGBD images more accurately.
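By way of non-limiting illustration, step S1 can be sketched roughly as follows. The framework (PyTorch), the toy architecture, the fusion of views by averaging and the loss function are all assumptions made for exposition; the embodiment does not prescribe a particular network design.

```python
import torch
import torch.nn as nn

class RGBDEncoder(nn.Module):
    """Toy CNN that maps a 4-channel RGBD view to a feature vector."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, feat_dim)

    def forward(self, rgbd):
        return self.head(self.backbone(rgbd).flatten(1))

def train_step(encoder, decoder, optimizer, rgbd_views, target_points):
    """One optimization step: encode the multi-view RGBD images, average the
    per-view features, decode a 3D point set and regress it against the
    reference points of the object (decoder is a hypothetical module)."""
    optimizer.zero_grad()
    feats = torch.stack([encoder(v) for v in rgbd_views]).mean(dim=0)
    loss = nn.functional.mse_loss(decoder(feats), target_points)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice the decoder and the supervision signal would be chosen to match the three-dimensional representation used in step S2 (point cloud, mesh or voxel grid).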
Step S2: acquiring multi-view RGBD images of a target object with a 3D camera and performing three-dimensional reconstruction with the neural network model; a plurality of nodes of the target object are obtained.
In this step, the 3D camera captures multi-view RGBD images of the target object and the previously trained neural network model is applied to reconstruct the target object in three dimensions, so that a two-dimensional image at any viewing angle can then be generated from the reconstruction.
Image acquisition: the target object is photographed from multiple angles with the 3D camera to acquire RGBD images from multiple viewpoints. Shooting from different angles and distances ensures that the complete three-dimensional structure of the object is captured.
Model application: the acquired RGBD images are input into a previously trained neural network model. The model processes the images and outputs three-dimensional information of the target object, such as point cloud data, a grid model, or three-dimensional coordinates.
Three-dimensional information extraction: three-dimensional features of the target object, such as shape, size, surface texture, etc., are extracted from the output of the model. These features may be used for subsequent analysis, processing or visualization.
Accuracy evaluation: the extracted three-dimensional information is evaluated for accuracy to ensure that it matches the three-dimensional structure of the real object. If necessary, the model is further adjusted and optimized to improve the accuracy of the three-dimensional information.
The three-dimensional features mainly include the shape, color and size of the object, and the nodes include movable nodes and fixed nodes. A movable node is a point about which the target object can rotate; a fixed node is a corner point that supports the target object. Some nodes are both movable and fixed: the hinge of a notebook computer, for example, both allows the screen to rotate and supports the notebook. The three-dimensional features may be obtained in various ways, for example with a three-dimensional scanner, a stereo camera system or another three-dimensional measuring device; such devices can capture the three-dimensional shape of an object and generate a corresponding three-dimensional model or data. When acquiring the three-dimensional features of the target object, a suitable three-dimensional measuring device must be selected so that the shape and size of the target object can be measured accurately. Detailed three-dimensional data of the target object are acquired by scanning or measuring it, and the acquired data are then processed to remove noise, optimize the model or make other necessary adjustments. The nodes are preset.
The data used in this step may be obtained directly from the measuring device or indirectly from a storage device. This means that the target object may be a real, existing object or merely a designed product. For example, a designed three-dimensional cartoon character can be integrated into the video from its parameters alone, without any physical product, which greatly reduces the cost of data acquisition.
In this step, the viewing angle of the target object is acquired with the 3D camera, so that the acquisition parameters match those of the video. In particular, when the 3D camera used in this embodiment and the depth camera used in step S3 are of the same model, the consistency of the hardware and software gives the captured application scene the best possible consistency with the target object, and the fused image looks more realistic.
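For illustration only, the sketch below shows how a reconstructed, colored point cloud can be re-projected into a two-dimensional image at an arbitrary viewing angle with a simple pinhole model and a z-buffer; the intrinsic matrix, the array shapes and the use of NumPy are assumptions made for exposition and not part of the embodiment.

```python
import numpy as np

def render_view(points_xyz, colors_rgb, R, t, K, hw=(480, 640)):
    """Project colored 3D points (N,3) into an image using rotation R (3,3),
    translation t (3,) and intrinsic matrix K (3,3); the nearest point wins."""
    h, w = hw
    cam = points_xyz @ R.T + t                  # world -> camera coordinates
    z = cam[:, 2]
    valid = z > 1e-6
    uv = cam[valid] @ K.T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)   # perspective division
    img = np.zeros((h, w, 3), dtype=np.uint8)
    depth = np.full((h, w), np.inf)
    for (u, v), zi, c in zip(uv, z[valid], colors_rgb[valid]):
        if 0 <= u < w and 0 <= v < h and zi < depth[v, u]:
            depth[v, u] = zi                    # simple z-buffer
            img[v, u] = c
    return img, depth
```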
Step S3: obtaining an RGB image and a depth image of the application scene with a depth camera, and determining a foreground placement range in the RGB image using a distance parameter.
In this step, the distance parameter is a preset value. A depth camera is used to capture the RGB image (color image) and the depth image (representing the distance between points in the scene and the camera) of the application scene, and the foreground placement range, i.e. the region close to the camera, is then determined in the RGB image using the preset distance parameter. The depth camera is first set up so that it can capture a complete view of the application scene; it outputs an RGB image and a depth image at the same time. The RGB image provides the visual information of the scene, while the depth image provides the depth of each point in the scene. The extent of the foreground region is determined in the RGB image according to the preset distance parameter (for example, the region within a few meters of the camera is considered foreground); this typically involves thresholding the depth image to identify the pixel regions that meet the distance condition.
It should be noted that the foreground placement range may be a single continuous region or several disconnected regions of the RGB image.
Step S4: placing the target object in the RGB image, determining a viewing angle and the fixed nodes of the target object, determining a placement plane according to the position of the target object in the RGB image, determining a reference plane from the three fixed nodes nearest to the placement plane, making the reference plane coincide with the placement plane, obtaining a corrected viewing angle and the two-dimensional image of the target object corresponding to the corrected viewing angle, and fusing that two-dimensional image with the RGB image.
In this step, the target object is placed in the previously obtained RGB image. As shown in Fig. 2, at least three fixed nodes (typically corner points of the target object) and one placement plane (a surface within the foreground placement range that can support the target object) need to be determined, and the position and angle of the target object are adjusted so that the determined fixed nodes lie on at least one placement plane. The target object is placed at an initial position in the RGB image by image processing or manually, and the position is then adjusted manually or by moving the object. Based on this positional relationship, the placement plane of the application scene, typically a flat surface on which objects can be placed, is identified in the RGB image. The fixed nodes of the object are identified and their distances to the placement plane are calculated to obtain the three fixed nodes closest to the placement plane. The plane through these three fixed nodes is taken as the reference plane, and the position and orientation of the target object in the image are adjusted so that the reference plane coincides with the placement plane; the corrected viewing angle of the target object is thus obtained. From the corrected viewing angle, the corresponding two-dimensional image of the target object can be generated and then fused with the RGB image of the application scene.
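As a non-limiting sketch of the geometric core of this step, the reference plane can be built from the three selected fixed nodes and the object rotated so that the normal of the reference plane matches the normal of the placement plane. The data layout, helper names and use of NumPy below are assumptions made for illustration.

```python
import numpy as np

def plane_normal(p0, p1, p2):
    """Unit normal of the plane through three 3D points."""
    n = np.cross(p1 - p0, p2 - p0)
    return n / np.linalg.norm(n)

def rotation_between(a, b):
    """Rotation matrix that maps unit vector a onto unit vector b (Rodrigues)."""
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if np.isclose(c, 1.0):
        return np.eye(3)
    if np.isclose(c, -1.0):
        # 180-degree flip about any axis orthogonal to a
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + vx + vx @ vx * (1.0 / (1.0 + c))

def align_object(nodes_xyz, fixed_idx, placement_normal):
    """Rotate all object nodes so that the reference-plane normal matches the
    placement-plane normal; fixed_idx holds the three selected fixed nodes."""
    p0, p1, p2 = nodes_xyz[fixed_idx]
    R = rotation_between(plane_normal(p0, p1, p2), placement_normal)
    return nodes_xyz @ R.T
```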
Fig. 3 is a flowchart of the steps of another method for three-dimensional synthesis of a target object and an application scene according to an embodiment of the present invention. As shown in Fig. 3, the method for three-dimensional synthesis of a target object and an application scene in this embodiment of the present invention further includes:
Step S5: if, at the same pixel point, the depth value of the target object is smaller than the depth value of the application scene, judging that the corresponding part is occluded, and rotating that part of the target object about a movable node so as to minimize the occluded part of the target object.
In this step, the viewing angle of the target object in the application scene is adjusted about a movable node while the reference plane is kept coincident with the placement plane, so that their relative position remains unchanged. At the same time, the depth values of the target object and of the foreground placement range are analyzed together with the previously acquired depth image to judge whether the target object is at least partially occluded by other objects in the foreground.
When the movable part of the target object is adjusted, a plurality of poses obtained by a deep learning algorithm are matched against the application scene so that the occluded part of the target object is minimized.
When the depth values are analyzed to determine occlusion, the depth values of the region corresponding to the target object are extracted from the depth image; they represent the distance between the target object and the depth camera. At the same time, the distribution of depth values within the foreground placement range is analyzed; these values reflect the distances between the different foreground objects and the camera. The depth values of the target object are compared with those of the foreground region, and if some or all of the target object's depth values are smaller than some of the values in the foreground region, it can be determined that the target object is occluded by objects in the foreground. The occlusion is then handled according to the user's settings: if the target object is set to be displayed, it is shown directly, or its position is further adjusted to reduce or eliminate the occlusion; if realistic display is set, at least part of the target object is left occluded, so that the characteristics of the target object are presented more faithfully.
In this embodiment, whether occlusion exists is further judged from the depth values of the target object and of the foreground placement range, and the articulation of the target object is adjusted accordingly, so that the display effect is diversified and a more realistic presentation can be obtained.
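The pose selection of step S5 can be sketched, purely for illustration, as an exhaustive search over candidate articulations. render_object_depth() is a hypothetical helper (for example the z-buffer renderer sketched earlier), and the comparison direction simply follows the wording of step S5 above.

```python
import numpy as np

def occluded_pixel_count(object_depth, scene_depth):
    """Count pixels judged occluded; the comparison direction follows the
    wording of step S5 (object depth smaller than scene depth)."""
    on_object = np.isfinite(object_depth)
    return int(np.count_nonzero(on_object & (object_depth < scene_depth)))

def least_occluded_pose(candidate_poses, scene_depth, render_object_depth):
    """Try each candidate rotation about a movable node and keep the pose
    whose rendered depth map yields the fewest occluded pixels."""
    best_pose, best_count = None, None
    for pose in candidate_poses:
        count = occluded_pixel_count(render_object_depth(pose), scene_depth)
        if best_count is None or count < best_count:
            best_pose, best_count = pose, count
    return best_pose
```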
FIG. 4 is a flowchart illustrating the steps of determining the foreground placement range according to an embodiment of the present invention. As shown in Fig. 4, the steps of determining the foreground placement range in the embodiment of the present invention include:
Step S31: an RGB image and a depth image of the application scene are obtained using a depth camera.
In this step, a depth camera is used to capture the RGB image and the depth image of the application scene. The RGB image provides the color information of the scene, while the depth image provides the distance between each point in the scene and the depth camera.
When setting up the depth camera, it must be properly installed in the application scene and its parameters adjusted to obtain high-quality RGB and depth images. The depth camera is then started and captures the RGB image and depth image of the application scene; the camera's field of view should cover the entire application scene, with particular attention to the foreground region. The captured images are saved to a computer or other storage medium for subsequent processing and analysis.
Step S32: a first range is determined according to the relationship between the depth image and the distance parameter.
In this step, the depth information in the depth image is compared with the preset distance parameter to determine a first range. This range typically represents the region closer to the depth camera, i.e. the foreground region.
A suitable distance parameter is set by the user, or by the system as a default, according to the actual requirements of the application scene and the characteristics of the camera; this parameter represents the maximum distance between the foreground region and the depth camera. Each pixel of the depth image is traversed, its depth value (the distance between that point and the camera) is read and compared with the distance parameter, and pixels whose depth value is smaller than or equal to the distance parameter are marked as part of the foreground region, thereby determining the first range; the first range appears as one or more connected regions in the depth image. In some embodiments, a connected region is considered part of the valid range only if its area is greater than a specified area.
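A minimal sketch of steps S31 and S32, assuming OpenCV and a depth image in meters, is shown below; the threshold values are illustrative only.

```python
import cv2
import numpy as np

def first_range_mask(depth_m: np.ndarray,
                     max_distance_m: float = 2.0,
                     min_area_px: int = 500) -> np.ndarray:
    """Boolean mask of the 'first range' (foreground) in the depth image:
    pixels closer than the distance parameter, kept only when they belong to
    a sufficiently large connected region."""
    near = ((depth_m > 0) & (depth_m <= max_distance_m)).astype(np.uint8)
    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(near, connectivity=8)
    mask = np.zeros_like(near, dtype=bool)
    for lbl in range(1, n_labels):                 # label 0 is the background
        if stats[lbl, cv2.CC_STAT_AREA] >= min_area_px:
            mask |= labels == lbl
    return mask
```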
Step S33: segmenting the portion of the RGB image corresponding to the first range to obtain a plurality of placement planes.
In this step, the region of the RGB image corresponding to the first range is segmented to identify a plurality of placement planes. These placement planes are flat surfaces in the application scene on which objects can be placed. During segmentation, image processing techniques (such as edge detection, region growing or deep learning methods) are applied to the region of the RGB image corresponding to the first range; the goal is to identify continuous, flat areas that may represent placement planes. During feature extraction, features such as area, shape and texture are extracted from the segmented regions to further confirm whether they are placement planes. Based on the extracted features, each segmented region is judged to be a valid placement plane or not; this typically involves assessing the region's size, shape regularity and texture consistency. Regions meeting the conditions are regarded as placement planes.
Step S34: the foreground placement range is formed from the plurality of placement planes.
In this step, the identified placement planes are combined to form the complete foreground placement range, which is used later as the reference when the target object is placed in the application scene. The finally determined foreground placement range includes all identified placement planes and forms a clear boundary; this boundary defines which regions of the application scene are foreground regions suitable for placing objects. The placement planes may differ in area. For each placement plane, its angle is determined from the plane formed by its largest inscribed triangle; if there is more than one largest inscribed triangle, the average of the planes of those triangles is taken as the angle of the placement plane.
In this embodiment, a clearly defined foreground placement range is obtained through placement-plane recognition and is used in subsequent steps to guide the placement of the target object in the application scene.
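For illustration, one simple way to obtain the orientation of a candidate placement plane from a segmented region is a least-squares plane fit to the back-projected 3D points of that region. Note that this is a substitute for, not an implementation of, the largest-inscribed-triangle construction described above; the back-projection and array layout are likewise assumptions.

```python
import numpy as np

def backproject(depth_m, mask, K):
    """3D points (camera frame) for the masked pixels of a depth image, using
    pinhole intrinsics K."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    v, u = np.nonzero(mask)
    z = depth_m[v, u]
    return np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)

def fit_plane(points_xyz):
    """Least-squares plane through a point set: returns (centroid, unit normal)."""
    centroid = points_xyz.mean(axis=0)
    _, _, vt = np.linalg.svd(points_xyz - centroid, full_matrices=False)
    normal = vt[-1]                       # direction of least variance
    return centroid, normal / np.linalg.norm(normal)
```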
FIG. 5 is a flowchart illustrating the steps of locating the fixed nodes on a placement plane according to an embodiment of the present invention. As shown in Fig. 5, in an embodiment of the present invention, the steps of locating the fixed nodes on the placement plane include:
Step S41: placing the target object in the RGB image to obtain the viewing angle and the fixed nodes of the target object.
In this step, the target object is placed in the RGB image of the application scene and the placement plane closest to the target object is found. This involves searching and analyzing the foreground placement range in the RGB image to find the most appropriate placement location; the distances between the center point of the target object and the placement planes are calculated to obtain the nearest placement plane. The viewing angle obtained in this step is the initial viewing angle, i.e. the approximate viewing angle desired by the operator.
Step S42: determining a placement plane according to the position of the target object in the RGB image, and determining a reference plane from the three fixed nodes nearest to the placement plane.
In this step, the distances between the fixed nodes and the placement plane are calculated to obtain the three fixed nodes closest to the placement plane; these three fixed nodes define the reference plane. The fixed nodes typically lie on the bottom or main supporting surface of the target object and determine its stable orientation when placed. This helps to ensure that the target object keeps the proper angle and orientation with respect to the placement plane when placed, producing a more realistic visual effect.
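For illustration, selecting the three fixed nodes nearest to the placement plane reduces to sorting point-to-plane distances; the (point, normal) plane representation used below is an assumption.

```python
import numpy as np

def nearest_fixed_nodes(fixed_nodes_xyz: np.ndarray,
                        plane_point: np.ndarray,
                        plane_normal: np.ndarray,
                        k: int = 3) -> np.ndarray:
    """Indices of the k fixed nodes closest to the placement plane."""
    n = plane_normal / np.linalg.norm(plane_normal)
    dist = np.abs((fixed_nodes_xyz - plane_point) @ n)   # unsigned point-plane distances
    return np.argsort(dist)[:k]
```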
Step S43: adjusting the viewing angle and position of the target object so that the reference plane coincides with the placement plane and the center of gravity of the target object lies above the placement plane.
In this step, the viewing angle of the target object in the RGB image is adjusted so that its fixed nodes lie on the placement plane. This involves both rotating and translating the target object in the image to ensure correct placement. Because the three-dimensional features of the target object are known, an image at an arbitrary angle can be generated and displayed in the RGB image. When the viewing angle is adjusted, the position and angle of the target object in the RGB image are changed using image processing techniques or an interactive tool; this may include rotating the object to change its orientation and translating it to change its position. Through continuous adjustment and observation, the fixed nodes of the target object are kept on the placement plane, the reference plane is made to coincide with the placement plane, and the position of the target object is adjusted so that its center of gravity lies above the placement plane. The position of the center of gravity is not fixed; it is determined by the particular form of the target object.
In this embodiment, the placement plane closest to the target object is determined, the fixed nodes are then selected according to their distances to the placement plane, and the viewing angle of the target object is adjusted accordingly, so that the viewing angle is realistic, meets the user's requirements and is ready for the subsequent display.
FIG. 6 is a flowchart illustrating further steps of locating the fixed nodes on a placement plane according to an embodiment of the present invention. As shown in Fig. 6, in an embodiment of the present invention, the steps of locating the fixed nodes on the placement plane further include:
Step S44: scaling the target object according to a preset ratio, so that the ratio between the size of the target object and the size of the foreground placement range equals the preset ratio.
In this step, the size of the target object in the application scene is adjusted according to the preset ratio to ensure that it matches the size of the foreground placement range. This helps to achieve visual harmony and balance between the target object and the application scene.
The preset ratio is chosen according to the actual needs of the application scene and the required visual effect; it may be determined from factors such as the relative sizes of the target object and the foreground placement range, the viewing angle and so on. For example, if the target object is very large, much larger than the scene shown in the video, it must be scaled down so that its full appearance can be shown; conversely, if the target object is small, it must be enlarged so that the user can see it clearly. Several preset ratios may be provided, and the ratio may be adjusted during the video.
When the target object is resized, its size in the application scene is adjusted according to the preset ratio. This may be achieved by a scaling operation in image processing, ensuring that the size of the target object is coordinated with the size of the foreground placement range.
Verifying the ratio: after the target object has been resized, it must be verified that the size ratio between the target object and the foreground placement range meets the preset requirement. This may be done by comparing their relative sizes, observing the visual effect, and so on.
Care must be taken to preserve the shape and proportions of the target object while resizing it, to avoid distortion. The relative size and position of the target object with respect to other objects in the application scene must also be considered to keep the whole scene coordinated and consistent.
In this embodiment, the target object is scaled according to the preset ratio and placed in the application scene at an appropriate size, the preset ratio is maintained between the target object and the size of the foreground placement range, and a good visual effect is provided for the subsequent video.
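A minimal sketch of the scaling rule of step S44, assuming image-space bounding boxes as the size measure, is given below; the choice of the bounding-box diagonal and the default ratio are illustrative assumptions.

```python
import numpy as np

def scale_factor(object_bbox_wh, foreground_bbox_wh, preset_ratio=0.3):
    """Factor by which to scale the object so that its size relative to the
    foreground placement range equals the preset ratio (sizes measured here
    as bounding-box diagonals, an illustrative assumption)."""
    obj = np.hypot(*object_bbox_wh)
    fg = np.hypot(*foreground_bbox_wh)
    return preset_ratio * fg / obj
```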
FIG. 7 is a flowchart illustrating further steps of locating the fixed nodes on a placement plane according to an embodiment of the present invention. As shown in Fig. 7, in an embodiment of the present invention, the steps of locating the fixed nodes on the placement plane further include:
Step S45: calculating the depth values of the target object according to the depth values of the fixed nodes.
In this step, the depth values of the entire surface of the target object are calculated from the determined depth values of the fixed nodes; at least two fixed nodes are used in the calculation. Because the fixed nodes lie on the placement plane, their depth values can be determined, and since parameters such as the size of the target object are known, the depth values of the object's surface can be calculated.
Step S46: comparing the depth values of the target object with the depth values of the corresponding pixel points of the RGB image to obtain a boundary line along which the depth values are equal.
In this step, the depth values of the target object are compared with the depth values of the corresponding pixel points of the RGB image to find the boundary lines along which the depth values are equal. These boundary lines are used to resolve the occlusion relationship between the target object and the background or other objects. During pixel matching, for each pixel of the target object in the RGB image, the corresponding pixel in the depth image is found, and the depth value of the target object is compared pixel by pixel with the depth value of that corresponding pixel. Pixels with equal depth values are recorded; they form the boundary lines between the target object and the background or other objects.
Step S47: dividing the region of the target object on the RGB image into an occluded region and a non-occluded region by the boundary line, and setting the occluded region to the content of the RGB image.
In this step, the region of the target object on the RGB image is divided by the boundary line into an occluded region and a non-occluded region. The occluded region is the part of the target object hidden by other objects, while the non-occluded region is the visible part. The pixel values of the occluded region are then replaced with the content of the corresponding positions in the RGB image (i.e. the background or other objects); this can be done by simple pixel copying or by more elaborate image synthesis techniques. The pixel values of the non-occluded region remain unchanged, since they represent the visible part of the target object.
Through this processing, an updated RGB image is obtained in which the occlusion relationship between the target object and the background or other objects is handled correctly, making the whole scene visually more realistic and accurate.
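Purely as an illustration of steps S45 to S47, the sketch below derives a per-pixel depth for the object from the plane through its fixed nodes, compares it with the scene depth and composites accordingly. The pinhole intrinsics, the array layouts and the comparison direction (taken from the description above) are assumptions.

```python
import numpy as np

def object_depth_from_plane(object_mask, plane_point, plane_normal, K):
    """Depth of the plane through the fixed nodes at every pixel of the object
    mask, using a pinhole camera with intrinsics K (assumed known)."""
    h, w = object_mask.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    v, u = np.nonzero(object_mask)
    rays = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u, dtype=float)], axis=1)
    t = (plane_point @ plane_normal) / (rays @ plane_normal)   # ray-plane intersection
    depth = np.full((h, w), np.nan)
    depth[v, u] = t * rays[:, 2]          # z component equals the depth
    return depth

def composite(scene_rgb, object_rgb, object_mask, object_depth, scene_depth):
    """Keep the scene's pixels wherever the object is judged occluded
    (comparison direction follows the wording of steps S5/S47 above)."""
    occluded = object_mask & (object_depth < scene_depth)
    out = scene_rgb.copy()
    visible = object_mask & ~occluded
    out[visible] = object_rgb[visible]
    return out
```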
Fig. 8 is a flowchart illustrating the steps of calculating the boundary line according to an embodiment of the present invention. As shown in Fig. 8, the steps of calculating the boundary line in the embodiment of the present invention include:
Step S461: comparing the depth values of the target object with the depth values of the corresponding pixel points of the RGB image to obtain first boundary points with equal depth values, thereby obtaining a plurality of first boundary lines;
Step S462: if the distance between two first boundary lines is smaller than a first threshold, merging the two first boundary lines into a second boundary line;
This step is iterated until no new second boundary line is generated; second boundary lines may themselves be merged into a new second boundary line. Two first boundary lines may, for example, be one long and one short, and this step combines them into a single longer boundary line, eliminating the interference caused by errors in the depth values.
Step S463: if the distance between the end points of adjacent first boundary lines or second boundary lines is smaller than the first threshold, connecting those end points with a straight line;
This step connects adjacent end points, which eliminates partial data errors or missing data caused by a low density of depth values and ensures that the boundary is identified completely. With structured-light data, for example, there is not one depth value per pixel; the processing of this step ensures that depth data obtained with structured light, and even with sparse structured light, can be identified and processed effectively.
Step S464: repeating steps S462 and S463 until no new merging or connection occurs, and confirming the final boundary line.
The boundary-line processing steps are clear and apply equally well to TOF and structured-light technologies; they suit scenes with dense depth values as well as scenes with sparse depth values, make the boundary division more accurate and complete, and guarantee the accuracy and integrity of the boundary.
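The merge-and-connect loop of steps S462 to S464 can be sketched as follows; representing boundary lines as ordered lists of (x, y) points and the specific threshold are assumptions made for illustration.

```python
import numpy as np

def endpoint_gap(seg_a, seg_b):
    """Smallest distance between any endpoint of seg_a and any endpoint of seg_b."""
    ends_a = np.array([seg_a[0], seg_a[-1]], dtype=float)
    ends_b = np.array([seg_b[0], seg_b[-1]], dtype=float)
    return min(np.linalg.norm(ea - eb) for ea in ends_a for eb in ends_b)

def merge_boundaries(segments, threshold=3.0):
    """Merge/connect segments whose endpoints lie within `threshold` pixels,
    repeating until no further change occurs (the fixpoint of step S464)."""
    segs = [list(s) for s in segments]
    changed = True
    while changed:
        changed = False
        for i in range(len(segs)):
            for j in range(i + 1, len(segs)):
                if endpoint_gap(segs[i], segs[j]) < threshold:
                    segs[i] = segs[i] + segs[j]   # join the two segments
                    del segs[j]
                    changed = True
                    break
            if changed:
                break
    return segs
```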
Fig. 9 is a schematic block diagram of a system for three-dimensional synthesis of a target object and an application scene in an embodiment of the present invention. As shown in Fig. 9, the system for three-dimensional synthesis of a target object and an application scene provided by the present invention includes the following modules:
an acquisition module, configured to acquire the three-dimensional features of the target object and a plurality of nodes; wherein the three-dimensional features include shape and size, and the nodes include movable nodes and fixed nodes;
a foreground module, configured to obtain an RGB image and a depth image of the application scene with a depth camera and determine the foreground placement range in the RGB image using a distance parameter; wherein the distance parameter is a preset value; and
a fusion module, configured to place the target object in the RGB image, determine a viewing angle and the fixed nodes of the target object, determine a placement plane according to the position of the target object in the RGB image, determine a reference plane from the three fixed nodes nearest to the placement plane, make the reference plane coincide with the placement plane, obtain a corrected viewing angle and the two-dimensional image of the target object corresponding to the corrected viewing angle, and perform the fusion.
In this embodiment, the three-dimensional features of the target object are determined, the placement plane is identified in the application scene, and the target object is positioned accurately in the application scene by matching the fixed nodes with the placement plane, so that the user enjoys a more realistic viewing experience, the realism of the fusion is greatly improved, and a better video display effect is achieved.
An embodiment of the invention also provides equipment for three-dimensional synthesis of a target object and an application scene, comprising a processor and a memory in which executable instructions of the processor are stored, wherein the processor is configured to perform, by executing the executable instructions, the steps of the method for three-dimensional synthesis of a target object and an application scene.
As described above, in this embodiment the three-dimensional features of the target object are determined, the placement plane is identified in the application scene, and the target object is positioned accurately in the application scene by matching the fixed nodes with the placement plane, so that the user enjoys a more realistic viewing experience, the realism of the fusion is greatly improved, and a better video display effect is achieved.
Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, a method or a program product. Accordingly, aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," a "module" or a "platform."
Fig. 10 is a schematic structural diagram of an apparatus for three-dimensional synthesis of a target object and an application scene in an embodiment of the present invention. An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 10. The electronic device 600 shown in fig. 10 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 10, the electronic device 600 is in the form of a general purpose computing device. Components of electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one memory unit 620, a bus 630 connecting the different platform components (including memory unit 620 and processing unit 610), a display unit 640, etc.
The storage unit stores program code executable by the processing unit 610, so that the processing unit 610 performs the steps of the method for three-dimensional synthesis of a target object and an application scene according to the various exemplary embodiments of the present invention described above. For example, the processing unit 610 may perform the steps shown in Fig. 1.
The storage unit 620 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 6201 and/or cache memory unit 6202, and may further include Read Only Memory (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 630 may be a local bus representing one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 600, and/or any device (e.g., router, modem, etc.) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 650. Also, electronic device 600 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 over the bus 630. It should be appreciated that although not shown in fig. 10, other hardware and/or software modules may be used in connection with electronic device 600, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage platforms, and the like.
An embodiment of the invention also provides a computer-readable storage medium storing a program which, when executed, implements the steps of the method for stereoscopic synthesis of a target object and an application scene. In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product comprising program code which, when the program product is run on a terminal device, causes the terminal device to carry out the steps according to the various exemplary embodiments of the present invention described in the method section above for stereoscopic synthesis of a target object and an application scene.
As described above, in this embodiment, the three-dimensional features of the target object are determined, the object placement surface is identified in the application scene, and accurate positioning of the target object in the application scene is achieved by matching the fixed nodes with the object placement surface, so that the user obtains a more realistic viewing experience, the realism of the fusion is greatly improved, and a better video display effect is achieved.
Fig. 11 is a schematic structural view of a computer-readable storage medium in an embodiment of the present invention. Referring to fig. 11, a program product 800 for implementing the above-described method according to an embodiment of the present invention is described; it may employ a portable compact disc read-only memory (CD-ROM), include program code, and be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).
In this embodiment, the three-dimensional characteristics of the target object are determined, the object placement surface is identified in the application scene, and accurate positioning of the target object in the application scene is achieved by matching the fixed nodes with the object placement surface, so that the user obtains a more realistic viewing experience, the realism of the fusion is greatly improved, and a better video display effect is achieved.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts among the embodiments may be referred to one another. The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the invention.

Claims (10)

1. A method for stereoscopic synthesis of a target object and an application scene, characterized by comprising the following steps:
Step S1: performing model training on multi-view RGBD images acquired by a camera using a deep learning network, to obtain a neural network model for the corresponding device;
Step S2: acquiring multi-view RGBD images of a target object with a 3D camera and performing three-dimensional reconstruction using the neural network model, to obtain a plurality of nodes of the target object; wherein a two-dimensional image at any view angle can be obtained from the three-dimensional reconstruction, and the nodes comprise movable nodes and fixed nodes;
Step S3: obtaining an RGB image and a depth image of an application scene by using a depth camera, and determining a foreground object range in the RGB image by using a distance parameter; wherein the distance parameter is a preset value;
Step S4: placing the target object in the RGB image, determining a view angle and fixed nodes of the target object, determining an object placement plane according to the position of the target object in the RGB image, determining a reference plane from the three fixed nodes nearest to the object placement plane, making the reference plane coincide with the object placement plane, obtaining a corrected view angle and the two-dimensional image of the target object corresponding to the corrected view angle, and performing fusion.
2. The method for stereoscopic synthesis of a target object and an application scene according to claim 1, further comprising:
Step S5: if the depth value of the target object is smaller than the depth value of the application scene at the same pixel point, judging that the part is occluded, and rotating that part of the target object about a movable node as an axis so as to minimize the occluded part of the target object.
3. The method for stereoscopic synthesis of a target object and an application scene according to claim 2, wherein, when adjusting the movable part of the target object so as to minimize the occluded part of the target object, various postures obtained by a deep learning algorithm are matched against the application scene.
4. The method for stereoscopic synthesis of a target object and an application scene according to claim 1, wherein step S3 comprises:
Step S31: obtaining an RGB image and a depth image of an application scene by using a depth camera;
Step S32: determining a first range according to the relation between the depth image and the distance parameter;
Step S33: dividing the RGB image corresponding to the first range to obtain a plurality of object placing planes;
Step S34: and forming a foreground object placing range by a plurality of object placing surfaces.
5. The method for stereoscopic synthesis of a target object and an application scene according to claim 1, wherein step S4 comprises:
Step S41: placing the target object in the RGB image to obtain the view angle and the fixed nodes of the target object;
Step S42: determining an object placement plane according to the position of the target object in the RGB image, and determining a reference plane from the three fixed nodes nearest to the object placement plane;
Step S43: adjusting the view angle and the position of the target object so that the reference plane coincides with the object placement plane and the center of gravity of the target object is located above the object placement plane.
6. The method for stereoscopic synthesis of a target object and an application scene according to claim 5, further comprising:
Step S45: calculating the depth value of the target object according to the depth value of the fixed node;
Step S46: comparing the depth value of the target object with the depth value of the corresponding pixel point on the RGB image to obtain a boundary line along which the depth values are equal;
Step S47: dividing the area of the target object on the RGB image into an occluded area and a non-occluded area by the boundary line, and setting the occluded area to the content of the RGB image.
7. The method for stereoscopic synthesis of a target object and an application scene according to claim 6, wherein step S46 comprises:
Step S461: comparing the depth value of the target object with the depth values of the corresponding pixel points on the RGB image to obtain first boundary points at which the depth values are equal, thereby obtaining a plurality of first boundary lines;
Step S462: if the distance between two first boundary lines is smaller than a first threshold value, merging the two first boundary lines to obtain a second boundary line;
Step S463: if the distance between the end points of adjacent first boundary lines or second boundary lines is smaller than the first threshold value, connecting those end points with a straight line;
Step S464: repeating steps S462 and S463 until no new merging or connection occurs, and confirming the final boundary line.
8. A system for stereoscopic synthesis of a target object and an application scene, characterized by comprising the following modules:
a training module, used for performing model training on multi-view RGBD images acquired by a camera using a deep learning network, to obtain a neural network model for the corresponding device;
an acquisition module, used for acquiring multi-view RGBD images of the target object with a 3D camera and performing three-dimensional reconstruction using the neural network model, to obtain a plurality of nodes of the target object; wherein the three-dimensional features include shape and size, and the nodes comprise movable nodes and fixed nodes;
a foreground module, used for obtaining an RGB image and a depth image of the application scene with a depth camera, and determining a foreground object range in the RGB image using a distance parameter; wherein the distance parameter is a preset value;
a merging module, used for placing the target object in the RGB image, determining the view angle and fixed nodes of the target object, determining the object placement plane according to the position of the target object in the RGB image, determining the reference plane from the three fixed nodes nearest to the object placement plane, making the reference plane coincide with the object placement plane, obtaining the corrected view angle and the two-dimensional image of the target object corresponding to the corrected view angle, and performing fusion.
9. A device for three-dimensional synthesis of a target object and an application scene, comprising:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the method for stereoscopic synthesis of a target object and an application scene according to any one of claims 1 to 7 via execution of the executable instructions.
10. A computer-readable storage medium storing a program, wherein the program, when executed, implements the steps of the method for stereoscopic synthesis of a target object and an application scene according to any one of claims 1 to 7.
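
The following sketches are illustrative only; they are not part of the claims and use hypothetical function names and common Python libraries (NumPy, SciPy) purely to make the claimed steps concrete. First, a minimal reading of claim 4: pixels nearer than the preset distance parameter form the first range, which is then split into candidate object placement surfaces; connected-component labelling stands in here for whatever segmentation an actual implementation might use.

    import numpy as np
    from scipy import ndimage  # used only for simple connected-component labelling

    def foreground_placement_range(depth_image, distance_param):
        """Claim-4-style sketch covering steps S32-S34."""
        # Step S32: first range = pixels with valid depth below the preset distance parameter
        first_range = (depth_image > 0) & (depth_image < distance_param)
        # Step S33: segment the region covered by the first range into placement surfaces
        labels, num_surfaces = ndimage.label(first_range)
        surfaces = [labels == i for i in range(1, num_surfaces + 1)]
        # Step S34: the foreground placement range is the union of the surfaces
        placement_range = np.zeros_like(first_range)
        for s in surfaces:
            placement_range |= s
        return surfaces, placement_range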
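
Next, a sketch of the per-pixel depth comparison behind claims 2 and 6. It assumes the usual z-buffer convention, in which an object pixel is hidden when the scene surface at that pixel is closer to the camera; the composited frame keeps the scene content in the occluded area, as in step S47. The convention is an assumption of this sketch, not a statement about the claims.

    import numpy as np

    def occlusion_split(object_depth, scene_depth, object_mask):
        """Split the object's footprint into occluded and visible pixels (cf. steps S46-S47)."""
        occluded = object_mask & (scene_depth > 0) & (scene_depth < object_depth)
        visible = object_mask & ~occluded
        return occluded, visible

    def composite(scene_rgb, object_rgb, occluded, visible):
        """Occluded pixels keep the scene content; visible pixels take the object's colour."""
        out = scene_rgb.copy()
        out[visible] = object_rgb[visible]
        return out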
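
Finally, a sketch of the boundary-line merging loop of claim 7 (steps S462-S464). Each boundary line is modelled as an (N, 2) array of pixel coordinates along a locus of equal object and scene depth; lines whose closest end points are nearer than the first threshold are joined, and the loop repeats until nothing new is merged. The data layout and the greedy strategy are assumptions made for illustration, not requirements of the claim.

    import numpy as np

    def merge_boundary_lines(lines, first_threshold):
        """Claim-7-style sketch: merge or connect boundary lines until stable (step S464)."""
        lines = [np.asarray(line, dtype=float) for line in lines]
        changed = True
        while changed:
            changed = False
            for i in range(len(lines)):
                for j in range(i + 1, len(lines)):
                    a, b = lines[i], lines[j]
                    ends_a, ends_b = a[[0, -1]], b[[0, -1]]
                    # pairwise distances between the four end points (steps S462/S463)
                    d = np.linalg.norm(ends_a[:, None, :] - ends_b[None, :, :], axis=-1)
                    if d.min() < first_threshold:
                        ia, ib = np.unravel_index(d.argmin(), d.shape)
                        a = a if ia == 1 else a[::-1]   # put the closest end of a last
                        b = b if ib == 0 else b[::-1]   # put the closest end of b first
                        lines[i] = np.vstack([a, b])
                        del lines[j]
                        changed = True
                        break
                if changed:
                    break
        return lines  # the remaining line(s) form the final boundary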
CN202410289101.9A 2024-03-14 2024-03-14 Method, system, equipment and medium for three-dimensional synthesis of target object and application scene Pending CN118097020A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410289101.9A CN118097020A (en) 2024-03-14 2024-03-14 Method, system, equipment and medium for three-dimensional synthesis of target object and application scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410289101.9A CN118097020A (en) 2024-03-14 2024-03-14 Method, system, equipment and medium for three-dimensional synthesis of target object and application scene

Publications (1)

Publication Number Publication Date
CN118097020A 2024-05-28

Family

ID=91143749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410289101.9A Pending CN118097020A (en) 2024-03-14 2024-03-14 Method, system, equipment and medium for three-dimensional synthesis of target object and application scene

Country Status (1)

Country Link
CN (1) CN118097020A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination