CN116405642A - Method and device for fusing video and live-action three-dimensional model

Method and device for fusing video and live-action three-dimensional model

Info

Publication number
CN116405642A
Authority
CN
China
Prior art keywords
live
video
model
action
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310421991.XA
Other languages
Chinese (zh)
Inventor
隗刚
李鑫
孙士欣
王晓辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Daoheng Software Co ltd
Original Assignee
Beijing Daoheng Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Daoheng Software Co ltd
Priority to CN202310421991.XA
Publication of CN116405642A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N 7/181 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/006 Mixed reality
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H04N 23/695 Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H04N 23/698 Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/95 Computational photography systems, e.g. light-field imaging systems
    • H04N 23/951 Computational photography systems, e.g. light-field imaging systems by using two or more images to influence resolution, frame rate or aspect ratio
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses a method and a device for fusing video with a live-action three-dimensional model, relating to the technical field of video fusion. The method comprises: acquiring a JSON registration file generated at the back end; analyzing the JSON registration file to obtain the monitoring camera parameters and the model projection surface parameters; rendering and loading the live-action model at the front end of the three-dimensional GIS; analyzing a plurality of monitoring videos and outputting video streams; placing the video streams on the projection surfaces according to the monitoring camera parameters and the model projection surface parameters; and fusing and displaying the live-action model and the video streams. The method and device can project real-time camera pictures onto a three-dimensional live-action model or a VR panorama, can stitch and fuse adjacent pictures into a single picture of higher resolution, do not deform or become misaligned when the three-dimensional model is tilted or rotated, avoid picture distortion, and support simultaneous access to multiple video channels.

Description

Method and device for fusing video and live-action three-dimensional model
Technical Field
The application relates to the technical field of video fusion, in particular to a method and a device for fusing video and a live-action three-dimensional model.
Background
Video fusion technology integrates information from multiple videos or images into a single video or image. However, existing video fusion methods have the following defects:
1. Viewing-angle problem. When a video is mapped onto the model surface, the result looks correct only when the user's viewing angle is consistent with that of the monitoring camera; if the two viewing angles differ greatly, the picture becomes severely distorted during browsing.
2. Limited number of simultaneously accessible videos. Even a high-performance graphics card can decode only about ten video channels at the same time, while a three-dimensional scene usually contains many more monitoring cameras, so it is difficult to access many video channels simultaneously.
Disclosure of Invention
Therefore, the application provides a method and a device for fusing video with a live-action three-dimensional model, which address the prior-art problems of severe picture distortion when the user's viewing angle differs greatly from the camera's viewing angle and of the limited number of videos that can be accessed simultaneously.
In order to achieve the above object, the present application provides the following technical solutions:
in a first aspect, a method for fusing video with a live-action three-dimensional model includes:
acquiring a JSON registration file generated at the back end;
analyzing the JSON registration file and acquiring parameters of a monitoring camera and parameters of a model projection surface;
rendering and loading a live-action model at the front end of the three-dimensional GIS;
analyzing a plurality of monitoring videos and outputting video streams;
placing the video stream on the projection surface according to the monitoring camera parameters and the model projection surface parameters;
and fusing and displaying the live-action model and the video stream.
Preferably, the specific process of generating the JSON registration file at the back end is as follows:
acquiring a monitoring live-action range under a monitoring camera;
delimiting the monitoring range of the live action within the live-action model in combination with the model, and creating a plurality of projection surfaces;
adjusting parameters of the monitoring camera and cutting out a plurality of projection surfaces;
adjusting video effects, and selecting the best effect by matching with the projection surface to obtain a registration result;
and saving the registration result as a JSON registration file.
Preferably, the monitoring live-action range is determined by the viewing angle and the installation position of the monitoring camera.
Preferably, adjusting the parameters of the monitoring camera includes up, down, left and right adjustment.
Preferably, adjusting the video effect includes adjusting brightness, saturation and lightness.
Preferably, the RTSP method is adopted to parse the plurality of monitoring videos and output the video stream.
Preferably, the JSON registration file includes a clipping polygon attribute parameter and a polygon region boundary coordinate parameter.
In a second aspect, an apparatus for fusing video with a live-action three-dimensional model, includes:
the data acquisition module is used for acquiring the JSON registration file generated at the back end;
the JSON registration file analysis module is used for analyzing the JSON registration file and acquiring parameters of the monitoring camera and parameters of the model projection surface;
the loading module is used for rendering and loading the live-action model at the front end of the three-dimensional GIS;
the video stream analysis module is used for analyzing a plurality of monitoring videos and outputting video streams;
the video delivery module is used for delivering the video on the projection surface according to the monitoring camera parameters and the model projection surface parameters;
and the fusion display module is used for fusing and displaying the live-action model and the video on the projection surface.
In a third aspect, a computer device comprises a memory storing a computer program and a processor that implements the steps of the method for fusing video with a live-action three-dimensional model when executing the computer program.
In a fourth aspect, a computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the method for fusing video with a live-action three-dimensional model.
Compared with the prior art, the application has the following beneficial effects:
the application provides a method and a device for fusing video and a live-action three-dimensional model, wherein the method comprises the following steps: acquiring a JSON registration file generated at the rear end; analyzing the JSON registration file and obtaining parameters of the monitoring camera and parameters of the model projection surface; rendering and loading a live-action model at the front end of the three-dimensional GIS; analyzing a plurality of monitoring videos and outputting video streams; according to the monitoring camera parameters and the model projection plane parameters, video streams are put in the projection plane; and fusing and displaying the live-action model and the video stream. The method and the device can project the real-time picture of the camera onto the three-dimensional live-action model or the VR panorama, can splice and fuse adjacent pictures, form a picture with larger resolution after splicing, can not deform or misplace along with the operations such as tilting and rotating the three-dimensional model, avoid the situation that the picture is distorted, and can be simultaneously accessed into multi-path videos.
Drawings
For a more intuitive illustration of the prior art and of the present application, several exemplary drawings are presented below. It should be understood that the specific shapes and configurations shown in the drawings should not, in general, be considered as limiting the practice of the present application; for example, based on the technical concepts and exemplary drawings disclosed herein, those skilled in the art can easily make conventional adjustments or further optimizations to the addition, omission or division of certain units (components), their specific shapes, positional relationships, connection modes, dimensional proportions, and so on.
Fig. 1 is a flowchart of a method for fusing video and a live-action three-dimensional model according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a front-end video fusion display according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of back-end video registration according to an embodiment of the present application.
Detailed Description
The present application is further described in detail below with reference to the attached drawings.
In the description of the present application: unless otherwise indicated, the meaning of "a plurality" is two or more. The terms "first," "second," "third," and the like in this application are intended to distinguish between the referenced objects without a special meaning in terms of technical connotation (e.g., should not be construed as emphasis on degree or order of importance, etc.). The expressions "comprising", "including", "having", etc. also mean "not limited to" (certain units, components, materials, steps, etc.).
The terms such as "upper", "lower", "left", "right", "middle", and the like, as referred to in this application, are generally used for convenience in visual understanding with reference to the drawings, and are not intended to be an absolute limitation of the positional relationship in actual products. Such changes in relative positional relationship are considered to be within the scope of the present description without departing from the technical concepts disclosed herein.
Example 1
Referring to fig. 1 and 2, the present embodiment provides a method for fusing video with a live-action three-dimensional model (i.e., front-end video fusion display), which includes:
S1: acquiring a JSON registration file generated at the back end;
Specifically, the JSON registration file generated by the back end can be obtained using a JSON library and read into memory with its load/parse function.
S2: analyzing the JSON registration file and obtaining the monitoring camera parameters and the model projection surface parameters;
S201: analyzing the JSON registration file;
Specifically, after the JSON registration file has been read, it needs to be parsed to obtain the required parameter information. A dictionary or list structure can be used during parsing to access the key-value pairs or elements of the JSON.
S202: acquiring the monitoring camera parameters;
Specifically, parameter information of the monitoring camera, such as its position, orientation and field of view, can be obtained from the JSON registration file; these parameters are typically stored under a key such as "camera".
S203: obtaining model projection plane parameters;
in particular, parameter information of the model projection plane, such as the parameter equation, position, orientation, etc. of the projection plane, can be obtained from the JSON registration file, and these parameters are typically stored under the "projection plane" or "model" key.
S204: performing parameter matching;
Specifically, after the parameters of the monitoring camera and of the model projection surface have been acquired, they need to be matched to determine the relationship between them, and the projection of the model projection surface under the monitoring camera is calculated.
S205: outputting the parameter results.
Specifically, the matched parameter results of the monitoring camera and of the model projection surface are output for subsequent monitoring and data processing; a minimal sketch of steps S1 and S2 follows.
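By way of non-limiting illustration, the following JavaScript (three.js) sketch shows how steps S1 and S2 might look at the front end: the registration file is fetched and parsed, and the registered camera parameters are mapped onto a perspective camera. The file layout and key names (cameras, projectionSurfaces, position, target, fov, aspect) are assumptions made for this sketch and are not prescribed by the application.

```javascript
// Sketch of S1/S2: fetch the back-end registration file and map its parameters
// onto a three.js perspective camera. Key names such as "cameras",
// "projectionSurfaces", "position", "target" and "fov" are illustrative assumptions.
import * as THREE from 'three';

async function loadRegistration(url) {
  const response = await fetch(url);            // S1: obtain the JSON registration file
  if (!response.ok) throw new Error(`registration file not found: ${url}`);
  return response.json();                       // parse the JSON into an object
}

function buildMonitoringCamera(camParams) {
  // S202/S204: turn registered camera parameters into a projector-style camera
  const cam = new THREE.PerspectiveCamera(
    camParams.fov ?? 45,                        // field of view in degrees
    camParams.aspect ?? 16 / 9,                 // aspect ratio of the video frame
    0.1,
    5000
  );
  cam.position.fromArray(camParams.position);   // [x, y, z] in scene coordinates
  cam.lookAt(new THREE.Vector3().fromArray(camParams.target));
  cam.updateMatrixWorld();
  return cam;
}

// Usage: one camera object per registered monitoring camera.
const registration = await loadRegistration('/config/registration.json');
const monitoringCameras = registration.cameras.map(buildMonitoringCamera);
const surfaces = registration.projectionSurfaces; // S203: projection surface parameters
```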
S3: rendering and loading a live-action model at the front end of the three-dimensional GIS;
S301: obtaining the live-action model data;
Specifically, the live-action model data is obtained through the GIS system and generally comprises the geometric information and texture information of the model.
S302: converting the data format of the live-action model;
in particular, the acquired live-action model data typically requires format conversion for loading and rendering at the front end, which may be implemented using three-dimensional modeling software, model conversion tools, or open source frameworks.
S303: front-end rendering engine selection;
Specifically, a suitable front-end rendering engine such as three.js, Babylon.js or Cesium is selected; these engines provide rich rendering functions and API interfaces, making it convenient to load and render the live-action model.
S304: model loading and scene construction;
specifically, an API interface provided by a front-end rendering engine is used for loading geometric information and texture information of a live-action model and constructing a scene. Corresponding cameras, lights, materials, etc. can be established to facilitate the display and interaction of the models.
S305: model position and orientation calibration;
specifically, during the process of loading the live-action model, the position and orientation thereof need to be calibrated so as to be aligned with the GIS map. The geographic position and orientation information on the GIS map and the coordinate system and direction information of the live-action model can be obtained for calculation and conversion.
S306: terrain and elevation data processing;
In particular, in practical applications, terrain and elevation data also need to be processed in order to display terrain undulations and elevations accurately. The scene may be processed and adjusted using DEM data, lidar data or other terrain information.
S307: preprocessing and optimizing.
Specifically, to improve loading and rendering efficiency, the live-action model data may be preprocessed and optimized, for example by meshing, compression and clipping; a loading sketch with three.js follows.
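As a non-limiting sketch of step S3, the following JavaScript code loads a live-action model with three.js. The glTF format and the file path are assumptions for illustration; any format supported by the chosen rendering engine can be used.

```javascript
// Sketch of S3: load a live-action model at the front end with three.js.
// The glTF export and the file path are assumptions made for this sketch.
import * as THREE from 'three';
import { GLTFLoader } from 'three/addons/loaders/GLTFLoader.js';

const scene = new THREE.Scene();
const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

const viewCamera = new THREE.PerspectiveCamera(60, window.innerWidth / window.innerHeight, 0.1, 10000);
viewCamera.position.set(0, 200, 400);

scene.add(new THREE.AmbientLight(0xffffff, 0.6));   // basic lighting (S304)
scene.add(new THREE.DirectionalLight(0xffffff, 0.8));

new GLTFLoader().load('/models/reality-mesh.glb', (gltf) => {
  const model = gltf.scene;
  // S305: calibrate position/orientation so the model lines up with the GIS map
  model.position.set(0, 0, 0);
  model.rotation.y = 0;
  scene.add(model);
});
```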
S4: analyzing a plurality of monitoring videos via RTSP and outputting video streams;
S401: acquiring the RTSP address of a monitoring camera, for example rtsp://192.168.1.100:554/live/Ch01_0;
S402: selecting an RTSP client library, for example Live555 or FFmpeg, which can decode RTSP packets quickly and parse them as completely as possible;
S403: using the selected RTSP client library in the code to acquire the video stream from the RTSP address of the monitoring camera;
Specifically, an RTSP session is first established and an RTSP PLAY command is sent to start the video stream of the monitoring camera; a video player is then created, and the video is presented while the stream is read from the camera in real time.
S404: adding the necessary code so that the video stream can be displayed on a Web interface;
Specifically, the video stream can be embedded using HTML5, JavaScript and similar technologies; the playback logic can be implemented in the web page, and the display can use elements such as canvas and video.
S405: confirming the network settings.
In particular, if the monitoring camera is far from the Web server, a streaming protocol such as RTMP or HLS should be used; this ensures proper transmission of the video stream and reduces peak load. A browser-side playback sketch follows.
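As a non-limiting sketch of steps S404 and S405, the following JavaScript code plays a monitoring stream in the browser with hls.js. Because browsers do not play RTSP directly, the sketch assumes the back end has already repackaged the RTSP feed as HLS (for example with FFmpeg); the stream URL is illustrative.

```javascript
// Sketch of S404/S405: show a monitoring stream in a <video> element.
// Assumes the RTSP feed has been repackaged as HLS on the server side.
import Hls from 'hls.js';

function attachStream(videoElementId, hlsUrl) {
  const video = document.getElementById(videoElementId);
  if (Hls.isSupported()) {
    const hls = new Hls();
    hls.loadSource(hlsUrl);          // e.g. '/streams/ch01/index.m3u8' (illustrative)
    hls.attachMedia(video);
    hls.on(Hls.Events.MANIFEST_PARSED, () => video.play());
  } else if (video.canPlayType('application/vnd.apple.mpegurl')) {
    video.src = hlsUrl;              // Safari plays HLS natively
    video.addEventListener('loadedmetadata', () => video.play());
  }
  return video;                      // the <video> element can later feed a THREE.VideoTexture
}
```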
S5: placing the video streams on the projection surfaces according to the monitoring camera parameters and the model projection surface parameters;
S501: acquiring the position parameters of the monitoring camera, including its position, orientation and settings in three-dimensional space, for example field of view and rotation angle;
S502: obtaining the model projection surface parameters, including the position, size and orientation of the projection surface. The model may be a physical or virtual model, such as a building, venue or stage;
S503: calculating and converting between the monitoring camera parameters and the model projection surface parameters so that the video stream is displayed in the correct position and orientation. This requires projecting the video stream from the monitoring camera into three-dimensional space and aligning it with the target position on the model projection surface;
S504: presenting the computed video stream on the model projection surface using a projector or another video player;
S505: adding special effects, animation or interactive functions when rendering the video, as needed. A projection sketch with three.js follows.
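The following JavaScript (three.js) sketch illustrates step S5 by placing a decoded video stream on one registered projection surface as a video texture. The shape of the surface description (center, width, height, normalTarget) is an assumption made for this sketch; the actual registration file may store the surface differently.

```javascript
// Sketch of S5: place a decoded video stream onto one registered projection
// surface. The surface fields used here are illustrative assumptions.
import * as THREE from 'three';

function addVideoSurface(scene, videoElement, surface) {
  const texture = new THREE.VideoTexture(videoElement); // refreshed every frame
  texture.colorSpace = THREE.SRGBColorSpace;

  const geometry = new THREE.PlaneGeometry(surface.width, surface.height);
  const material = new THREE.MeshBasicMaterial({
    map: texture,
    side: THREE.DoubleSide,   // stays visible when orbiting behind the surface
    toneMapped: false,
  });

  const mesh = new THREE.Mesh(geometry, material);
  mesh.position.fromArray(surface.center);              // S502: surface position
  mesh.lookAt(new THREE.Vector3().fromArray(surface.normalTarget)); // orient toward the camera's line of sight
  scene.add(mesh);
  return mesh;
}

// Usage (building on the earlier sketches):
//   const video = attachStream('cam01', '/streams/ch01/index.m3u8');
//   const videoSurfaceMesh = addVideoSurface(scene, video, registration.projectionSurfaces[0]);
```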
S6: fusing and displaying the live-action model and the video stream.
S601: registering the model and the video data;
Specifically, by establishing the correspondence between model reference points and video reference points and calibrating them in space, consistent manipulation and display become possible.
S602: generating a three-dimensional scene;
Specifically, a three-dimensional scene is created by loading the model data into three.js. This scene may include buildings, roads, vegetation, terrain and the like.
S603: loading video data;
Specifically, the video data is loaded through three.js or other three-dimensional scene software, and its attributes, such as position and size, are set.
S604: fusing the model and the video;
Specifically, the model data and the video data are fused through a blending mode in three.js or other three-dimensional GIS software, achieving seamless switching and transition between the model and the video. The picking interaction can be adjusted, and the position, angle, viewing angle and so on can be adjusted in the display.
S605: performing animation and interactive operations;
Specifically, by adding animation and interaction controls, the user can control and operate the elements in the scene. JavaScript programs can be written to control graphics and visual effects to achieve more complex animations; a minimal render-loop sketch follows.
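Continuing the earlier sketches, the following JavaScript code shows a minimal render loop with orbit controls for step S605. OrbitControls is one common choice of interaction control, not the only possibility; the variables scene, renderer and viewCamera are those created in the step-S3 sketch.

```javascript
// Sketch of S6/S605: render the fused scene and let the user orbit, pan and zoom.
import { OrbitControls } from 'three/addons/controls/OrbitControls.js';

const controls = new OrbitControls(viewCamera, renderer.domElement);
controls.enableDamping = true;        // smooth inertia while rotating or tilting

function animate() {
  requestAnimationFrame(animate);
  controls.update();                  // apply damping each frame
  renderer.render(scene, viewCamera); // model and video textures are drawn together
}
animate();
```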
Referring to fig. 3, in step S1 of the present embodiment, the specific process by which the back end generates the JSON registration file (i.e., back-end video registration) is as follows:
s101: acquiring a monitoring live-action range under a monitoring camera;
specifically, the monitoring range may be determined by the angle of view and the installation position of the monitoring camera. The size and shape of the monitoring range may be measured using a measuring tool or map software.
S102: delimiting the monitoring range of the live action within the live-action model in combination with the model, and creating a plurality of projection surfaces;
s1021: the monitoring range of the live-action in the model is defined by combining with the live-action model;
specifically, 3D modeling software or map software is used to create a live-action model, and boundaries of the monitoring range are marked in the live-action model.
S1022: according to the position and the orientation of the monitoring camera and the monitoring range in the real scene model, a projection surface is created and used for displaying the monitoring real scene;
in particular, the display of the projection surface may be achieved using virtual reality technology or other display technology.
S1023: according to the boundaries of the live-action model and the monitoring range, a plurality of projection surfaces can be created in the model, and each projection surface corresponds to one monitoring range;
in particular, 3D modeling software or GIS software may be used to create the plurality of projection surfaces.
S1024: and projecting the video of the monitoring camera to a corresponding projection surface.
In particular, it may be implemented using video processing software or monitoring system software. The monitoring video is projected onto the corresponding projection surface, so that the monitoring effect can be improved, and the blind area is reduced.
S103: adjusting parameters of the monitoring camera, and cutting out a plurality of projection surfaces;
s1031: adding a plurality of cameras in the model;
specifically, multiple cameras may be created by selecting one camera in a scene and then copy-pasting.
S1032: adjusting parameters of each camera;
specifically, parameters such as a position, a direction, a viewing angle and the like of the camera can be set in the attribute panel by selecting the camera. The direction of the camera can also be adjusted by rotating the camera up, down, left, right.
S1033: creating a projection surface for each camera;
Specifically, a camera can be selected in the scene and the option to create a projection surface chosen from the menu bar; the monitoring range is then determined by adjusting the size and position of the projection surface.
S1034: binding each camera to its corresponding projection surface by selecting the camera and the projection surface in the scene and combining them; in this way a plurality of projection surfaces can be created and the parameters of multiple cameras adjusted to monitor different areas;
S1035: for a multi-camera system, adjusting the parameters of each camera, including focal length, field of view, exposure time and sensitivity, according to actual needs, so that the cameras are evenly distributed, mutually independent and clearly imaged;
S1036: setting clipping planes according to the scene requirements and applying them to the projection surfaces in the up, down, left and right directions to realize a custom field of view, as sketched below.
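As a non-limiting sketch of step S1036, the following JavaScript (three.js) code applies four clipping planes to a video projection surface. The plane offsets are placeholder values; in practice they would come from the registration step, and videoSurfaceMesh is the mesh created in the step-S5 sketch.

```javascript
// Sketch of S1036: trim a projection surface with clipping planes so only the
// registered monitoring range is shown. Offsets below are placeholders.
import * as THREE from 'three';

renderer.localClippingEnabled = true;               // enable per-material clipping

const clipPlanes = [
  new THREE.Plane(new THREE.Vector3(0, -1, 0), 30), // top: clip everything above y = 30
  new THREE.Plane(new THREE.Vector3(0, 1, 0), 0),   // bottom: clip everything below y = 0
  new THREE.Plane(new THREE.Vector3(1, 0, 0), 40),  // left: clip x < -40
  new THREE.Plane(new THREE.Vector3(-1, 0, 0), 40), // right: clip x > 40
];

videoSurfaceMesh.material.clippingPlanes = clipPlanes;
videoSurfaceMesh.material.clipShadows = true;
```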
S104: adjusting the video effect and selecting the best effect in combination with the projection surface to obtain the registration result;
S1041: processing the input video and adjusting effects such as brightness, saturation and lightness to optimize the video quality;
S1042: selecting a suitable projection surface according to the actual needs of the scene so as to maximize the projection effect.
S105: saving the registration result as a JSON registration file.
S1051: saving the camera parameters and the model parameters into a JSON file for subsequent use and configuration;
S1052: storing the processed video and the registered parameters in a new JSON file for subsequent playback and operation. A sketch of the saved structure follows.
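As a non-limiting sketch of step S105, the following Node.js code writes a registration result to a JSON file. The field names are assumptions for illustration; the application only requires that the file carry the camera parameters, the projection surface parameters, the clipping polygon attributes and the polygon boundary coordinates.

```javascript
// Sketch of S105: write the registration result to a JSON file on the back end.
// All field names are illustrative assumptions, not a prescribed schema.
import { writeFileSync } from 'node:fs';

const registration = {
  cameras: [
    {
      id: 'cam01',
      rtspUrl: 'rtsp://192.168.1.100:554/live/Ch01_0',
      position: [120.5, 36.2, 15.0],   // scene or geographic coordinates
      target: [121.0, 36.2, 0.0],
      fov: 48,
      aspect: 16 / 9,
    },
  ],
  projectionSurfaces: [
    {
      cameraId: 'cam01',
      center: [120.7, 36.2, 5.0],
      width: 40,
      height: 22.5,
      clipPolygon: {                   // clipping polygon attributes
        boundary: [[0, 0], [1, 0], [1, 0.8], [0, 0.8]], // boundary coordinates in surface UV space
      },
    },
  ],
};

writeFileSync('registration.json', JSON.stringify(registration, null, 2));
```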
According to the method for fusing video with a live-action three-dimensional model provided by the application, two processes are required: back-end video registration and front-end video fusion display.
In the back-end video registration (preprocessing) stage, the video data is preprocessed: geometric correction, noise removal, color and brightness adjustment, registration, effective-region clipping and the like. In the front-end video fusion display stage, the video fusion projection operation is performed: projection is calculated with a perspective projection algorithm based on the spatial relationship between the video and the three-dimensional scene, the user's viewing angle and the camera's viewing angle, so that the video image is projected seamlessly onto the three-dimensional live-action model. When the user's viewing angle changes, the video rendering follows that change through a dedicated algorithm, so that picture distortion is avoided. A sketch of such a perspective projection is given below.
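As a non-limiting sketch of the perspective projection calculation described above, the following JavaScript (three.js) code computes, for a point on a projection surface, the corresponding UV coordinate in the monitoring camera's video frame; this is the basic operation behind projecting the video from the camera's viewpoint rather than the user's.

```javascript
// Sketch: find which video pixel (UV coordinate) should colour a given world
// point, by projecting the point through the monitoring camera.
import * as THREE from 'three';

function videoUvForPoint(worldPoint, monitoringCamera) {
  // project() maps world space to normalized device coordinates in [-1, 1]
  const ndc = worldPoint.clone().project(monitoringCamera);
  if (ndc.z < -1 || ndc.z > 1) return null;   // roughly: outside the camera frustum
  return new THREE.Vector2(
    (ndc.x + 1) / 2,                          // u in [0, 1] across the video frame
    (ndc.y + 1) / 2                           // v in [0, 1] up the video frame
  );
}

// Example: const uv = videoUvForPoint(new THREE.Vector3(120.7, 36.2, 5.0), monitoringCameras[0]);
```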
The method for fusing the video and the live-action three-dimensional model can be applied to the fields of artificial intelligence application and intelligent security, and relates to the fields of reservoir dam safety monitoring, electromechanical facility safety, water safety, flood season safety, inspection management and the like.
Example two
The embodiment provides a device for fusing video and a live-action three-dimensional model, which comprises:
the data acquisition module is used for acquiring the JSON registration file generated at the back end;
the JSON registration file analysis module is used for analyzing the JSON registration file and acquiring parameters of the monitoring camera and parameters of the model projection surface;
the loading module is used for rendering and loading the live-action model at the front end of the three-dimensional GIS;
the video stream analysis module is used for analyzing a plurality of monitoring videos and outputting video streams;
the video delivery module is used for delivering video on the projection surface according to the monitoring camera parameters and the model projection surface parameters;
and the fusion display module is used for fusing and displaying the live-action model and the video on the projection surface.
For specific limitations on the device for fusing video with a live-action three-dimensional model, reference may be made to the limitations on the method for fusing video with a live-action three-dimensional model described above, and details are not repeated here.
Example III
This embodiment provides a computer device comprising a memory and a processor, wherein the memory stores a computer program and the processor implements the steps of the method for fusing video with a live-action three-dimensional model when executing the computer program.
Example IV
This embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method for fusing video with a live-action three-dimensional model.
The technical features of the above embodiments may be combined in any way, provided the combinations are not contradictory; for brevity, not all possible combinations are described, but such unwritten combinations should also be considered within the scope of this description.
The foregoing describes the present application in general terms and through specific embodiments. It should be appreciated that numerous conventional modifications and further innovations may be made to these specific embodiments based on the technical concepts of the application; such conventional modifications and further innovations, provided they do not depart from the technical spirit of the application, are also intended to fall within the scope of the claims of the present application.

Claims (10)

1. A method for fusing video with a live-action three-dimensional model, comprising:
acquiring a JSON registration file generated at the back end;
analyzing the JSON registration file and acquiring parameters of a monitoring camera and parameters of a model projection surface;
rendering and loading a live-action model at the front end of the three-dimensional GIS;
analyzing a plurality of monitoring videos and outputting video streams;
placing the video stream on the projection surface according to the monitoring camera parameters and the model projection surface parameters;
and fusing and displaying the live-action model and the video stream.
2. The method for fusing video and live-action three-dimensional models according to claim 1, wherein the specific process of generating the JSON registration file at the back end is as follows:
acquiring a monitoring live-action range under a monitoring camera;
delimiting the monitoring range of the live action within the live-action model in combination with the model, and creating a plurality of projection surfaces;
adjusting parameters of the monitoring camera and cutting out a plurality of projection surfaces;
adjusting video effects, and selecting the best effect by matching with the projection surface to obtain a registration result;
and saving the registration result as a JSON registration file.
3. The method for fusing video with a live-action three-dimensional model of claim 2, wherein the monitored live-action range is determined by the viewing angle and installation position of the monitoring camera.
4. The method of fusing video and live-action three-dimensional model of claim 2, wherein adjusting the monitoring camera parameters comprises up, down, left, right adjustments.
5. The method of claim 2, wherein adjusting the video effect comprises adjusting brightness, saturation and lightness.
6. The method for fusing video with a live-action three-dimensional model according to claim 1, wherein the plurality of monitoring videos are analyzed and the video streams are output via RTSP.
7. The method of claim 1, wherein the JSON registration file includes a clipping polygon attribute parameter and a polygon region boundary coordinate parameter.
8. A device for fusing video with a live-action three-dimensional model, comprising:
the data acquisition module is used for acquiring the JSON registration file generated at the back end;
the JSON registration file analysis module is used for analyzing the JSON registration file and acquiring parameters of the monitoring camera and parameters of the model projection surface;
the loading module is used for rendering and loading the live-action model at the front end of the three-dimensional GIS;
the video stream analysis module is used for analyzing a plurality of monitoring videos and outputting video streams;
the video delivery module is used for delivering the video on the projection surface according to the monitoring camera parameters and the model projection surface parameters;
and the fusion display module is used for fusing and displaying the live-action model and the video on the projection surface.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202310421991.XA 2023-04-19 2023-04-19 Method and device for fusing video and live-action three-dimensional model Pending CN116405642A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310421991.XA CN116405642A (en) 2023-04-19 2023-04-19 Method and device for fusing video and live-action three-dimensional model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310421991.XA CN116405642A (en) 2023-04-19 2023-04-19 Method and device for fusing video and live-action three-dimensional model

Publications (1)

Publication Number Publication Date
CN116405642A true CN116405642A (en) 2023-07-07

Family

ID=87019694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310421991.XA Pending CN116405642A (en) 2023-04-19 2023-04-19 Method and device for fusing video and live-action three-dimensional model

Country Status (1)

Country Link
CN (1) CN116405642A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination