CN111402374A - Method, device, equipment and storage medium for fusing multi-channel video and three-dimensional model - Google Patents

Method, device, equipment and storage medium for fusing multi-channel video and three-dimensional model

Info

Publication number
CN111402374A
Authority
CN
China
Prior art keywords
monitoring
monitoring camera
dimensional
camera
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811634403.6A
Other languages
Chinese (zh)
Other versions
CN111402374B (en)
Inventor
吴旻烨
李家豪
莫晓烨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yaoke Intelligent Technology Shanghai Co ltd
Original Assignee
Yaoke Intelligent Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yaoke Intelligent Technology Shanghai Co ltd filed Critical Yaoke Intelligent Technology Shanghai Co ltd
Priority to CN201811634403.6A priority Critical patent/CN111402374B/en
Publication of CN111402374A publication Critical patent/CN111402374A/en
Application granted granted Critical
Publication of CN111402374B publication Critical patent/CN111402374B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)
  • Image Generation (AREA)

Abstract

According to the method, device, equipment and storage medium for fusing multi-channel video with a three-dimensional model provided by the application, the internal and external parameters and video data of a plurality of monitoring cameras are acquired to form a three-dimensional scene model; monitoring depth images corresponding to the monitoring cameras, together with a virtual depth image and a color image for a virtual camera, are obtained through rendering; the monitoring camera closest to the viewing angle of the virtual camera is selected according to the internal and external parameters, the virtual depth image and the monitoring depth images of the monitoring cameras; whether a target point under the viewing angle of the virtual camera is occluded under the viewing angle of the currently selected closest monitoring camera is judged; and finally a completely rendered three-dimensional scene model corresponding to the current frame is obtained. The method and device can greatly improve the effect of fusing video with the three-dimensional model and avoid overlapping or color distortion of scene objects.

Description

Method, device, equipment and storage medium for fusing multi-channel video and three-dimensional model
Technical Field
The application relates to the technical field of three-dimensional model rendering and image processing, and in particular to a method for fusing multi-channel video with a three-dimensional model, and a device, equipment and storage medium thereof.
Background
At present, cities deploy a large number of cameras for video monitoring, but essentially only their two-dimensional image information is used. In fact, the spatial position information of the monitoring cameras can also be exploited.
In the related art, there are techniques for fusing a three-dimensional model with surveillance video. Such a technique generally extracts video frames from a monitoring video stream and projects them into a three-dimensional scene, achieving full-space-time stereoscopic fusion of the video data with the three-dimensional model data, so that the three-dimensional model is combined in real time with the scene images captured by the surveillance cameras. This makes it possible to express the three-dimensional scene visually, obtain information about occluded objects, and so on.
However, owing to the complexity of monitoring scenes, the established three-dimensional model, or a view of it rendered from an arbitrarily selected angle, is often unsatisfactory in the related art. For example, in an area covered by two or more surveillance videos, the overlapping footage may appear duplicated or distorted and cannot be smoothly stitched. Likewise, in a picture presented from an arbitrary viewing angle, image pixel colors may be distorted or mismatched.
Summary of the application
In view of the above drawbacks of the prior art, an object of the present application is to provide a method for fusing multi-channel video with a three-dimensional model, and a device, equipment and storage medium thereof, which solve the problem of poor fusion quality between video and three-dimensional models in the prior art.
To achieve the above and other related objects, the present application provides a method for fusing multi-channel video with a three-dimensional model, the method comprising: acquiring internal and external parameters and video data of a plurality of monitoring cameras to form a three-dimensional scene model corresponding to the current frame and a scene model color map corresponding to the three-dimensional scene model; rendering the three-dimensional scene under the viewing angle of each monitoring camera in the three-dimensional scene model to obtain a monitoring depth image corresponding to each monitoring camera, and rendering the three-dimensional scene under the viewing angle of a corresponding virtual camera to obtain a virtual depth image and a color image; selecting the monitoring camera closest to the viewing angle of the virtual camera according to the internal and external parameters of the virtual camera, the virtual depth image and the monitoring depth image of each monitoring camera, and judging whether a target point under the viewing angle of the virtual camera is occluded under the viewing angle of the currently selected closest monitoring camera; if the target point is occluded, judging whether there is a next monitoring camera closest to the viewing angle of the virtual camera; if yes, repeating the previous step; if not, rendering according to the pixel color of the scene color map; if the target point is not occluded, rendering the target point under the viewing angle of the virtual camera according to the pixel color corresponding to the target point in the video analysis image of the currently selected closest monitoring camera; and finally obtaining a completely rendered three-dimensional scene model corresponding to the current frame.
In an embodiment of the present application, the method for selecting the monitoring camera closest to the viewing angle of the virtual camera according to the internal and external parameters of the virtual camera, the virtual depth image and the monitoring depth image of each monitoring camera includes: calculating the three-dimensional coordinates of the target point in the three-dimensional scene model under the viewing angle of the virtual camera according to the internal and external parameters of the virtual camera and the virtual depth image; calculating the direction vectors from the three-dimensional coordinates of the target point to the position of the virtual camera and to the position of each monitoring camera, and calculating the included angle between the direction vector corresponding to the virtual camera and the direction vector corresponding to each monitoring camera; and selecting the monitoring camera corresponding to the smallest of these included angles as the monitoring camera closest to the viewing angle of the virtual camera.
In an embodiment of the present application, the method for judging whether the target point under the viewing angle of the virtual camera is occluded under the viewing angle of the currently selected closest monitoring camera includes: calculating a first depth value of the target point under the viewing angle of the selected monitoring camera according to the internal and external parameters of the currently selected closest monitoring camera; acquiring a second depth value corresponding to the target point from the monitoring depth image corresponding to the currently selected closest monitoring camera; and judging whether the first depth value is larger than the second depth value; if yes, judging that the target point is occluded; if not, judging that it is not occluded.
In an embodiment of the present application, a formula for calculating a first depth value of the target point under the view angle of the selected monitoring camera according to the currently selected internal and external parameters of the monitoring camera is as follows:
$$ d\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = K\left( R\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + T \right) $$
wherein X, Y, Z is the three-dimensional coordinates of the target point in the three-dimensional model data; x and y are two-dimensional coordinates of the target point under the physical coordinate system of the selected monitoring camera; r, T, K are the inside-outside parameters of the monitoring camera and d is the depth value of the target point at the view angle of the selected monitoring camera.
In an embodiment of the present application, the method for judging whether the target point under the viewing angle of the virtual camera is occluded under the viewing angle of the currently selected closest monitoring camera further includes: presetting an offset and judging whether the first depth value is larger than the sum of the second depth value and the offset; if yes, judging that the target point is occluded; if not, judging that it is not occluded.
In an embodiment of the present application, the internal and external parameters and the video data of the plurality of monitoring cameras are obtained after being processed in a binary format.
In an embodiment of the present application, the scene model color map is a static scene color map obtained according to the three-dimensional scene model; the scene color map contains color data of different resolutions to match different pixel accuracies in the three-dimensional scene model.
To achieve the above and other related objects, the present application provides a multi-channel video and three-dimensional model fusion apparatus, comprising: an acquisition module, used for acquiring internal and external parameters and video data of the plurality of monitoring cameras so as to form a three-dimensional scene model corresponding to the current frame and a scene model color map corresponding to the three-dimensional scene model; a rendering module, used for rendering the three-dimensional scene under the viewing angle of each monitoring camera in the three-dimensional scene model to obtain a monitoring depth image corresponding to each monitoring camera, and rendering the three-dimensional scene under the viewing angle of the corresponding virtual camera to obtain a virtual depth image and a color image; and a processing module, used for selecting the monitoring camera closest to the viewing angle of the virtual camera according to the internal and external parameters of the virtual camera, the virtual depth image and the monitoring depth image of each monitoring camera, and judging whether a target point under the viewing angle of the virtual camera is occluded under the viewing angle of the currently selected closest monitoring camera; if the target point is occluded, judging whether there is a next monitoring camera closest to the viewing angle of the virtual camera; if yes, repeating the previous step; if not, rendering according to the pixel color of the scene color map; if the target point is not occluded, rendering the target point under the viewing angle of the virtual camera according to the pixel color corresponding to the target point in the video analysis image of the currently selected closest monitoring camera; and finally obtaining a completely rendered three-dimensional scene model corresponding to the current frame.
To achieve the above and other related objects, the present application provides a multi-channel video and three-dimensional model fusion apparatus, comprising: a memory, a processor, and a communicator; the memory is used for storing programs; the processor runs a program to realize the fusion method of the multi-channel video and the three-dimensional model; the communicator is used for being connected with an external device in a communication mode.
To achieve the above and other related objects, the present application provides a computer-readable storage medium having stored thereon a computer program, which when executed by a processor, implements a multi-channel video and three-dimensional model fusion method as described above.
In summary, the present application provides a method, an apparatus, a device and a storage medium for fusing multi-channel video with a three-dimensional model, which have the following beneficial effects:
the method can greatly improve the fusion effect of the video and the three-dimensional model, and avoid the problems of overlapping or color distortion of scene objects.
Drawings
Fig. 1 is a flowchart illustrating a multi-channel video and three-dimensional model fusion method according to an embodiment of the present disclosure.
Fig. 2 is a block diagram of a multi-channel video and three-dimensional model fusion apparatus according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a multi-channel video and three-dimensional model fusion apparatus according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and its several details are capable of modifications and/or changes in various respects, all without departing from the spirit of the present application. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present application, and the drawings only show the components related to the present application and are not drawn according to the number, shape and size of the components in actual implementation, and the type, number and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
Fig. 1 shows a flow chart of a multi-channel video and three-dimensional model fusion method according to an embodiment of the present invention. As shown, the method comprises:
step S101: the method comprises the steps of obtaining internal and external parameters and video data of a plurality of monitoring cameras, and forming a three-dimensional scene model corresponding to a current frame and a scene model color mapping corresponding to the three-dimensional scene model.
In this embodiment, the monitoring camera is a video camera or webcam capable of capturing video and images, such as a surveillance camera at an intersection.
In an embodiment of the present application, the internal and external parameters and the video data of the plurality of monitoring cameras are obtained after being processed in a binary format.
Specifically, the internal and external position parameters of the monitoring cameras and the video data may be read from binary files. The reason for using binary files rather than a text format such as OBJ is twofold: on one hand, OBJ files are stored as text that must be parsed when read, which is very slow, whereas a binary file can be read directly and sequentially, greatly improving speed; on the other hand, a file processed in binary format can be encrypted or obfuscated to ensure the security of the data.
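As a minimal sketch of such a binary reader (the actual record layout is not specified in this application; the per-camera field order and float64 encoding below are assumptions chosen for illustration only):

```python
import struct

# Hypothetical record layout (not specified in this application):
# per camera, 21 float64 values: 9 for the intrinsic matrix K,
# 9 for the rotation R and 3 for the translation T.
RECORD = struct.Struct("<21d")

def read_camera_parameters(path, num_cameras):
    """Read intrinsic/extrinsic parameters for all monitoring cameras."""
    cameras = []
    with open(path, "rb") as f:
        for _ in range(num_cameras):
            values = RECORD.unpack(f.read(RECORD.size))
            K = [values[0:3], values[3:6], values[6:9]]
            R = [values[9:12], values[12:15], values[15:18]]
            T = list(values[18:21])
            cameras.append({"K": K, "R": R, "T": T})
    return cameras
```

Reading fixed-size records sequentially like this avoids the per-line text parsing that an OBJ-style format would require.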
In an embodiment of the present application, the scene model color map is a static scene color map obtained according to the three-dimensional scene model; the scene color map contains color data of different resolutions to match different pixel accuracies in the three-dimensional scene model.
In this embodiment, the scene model color maps are obtained according to the three-dimensional scene model, so that the scene model color maps corresponding to the monitoring cameras can be formed, that is, the scene model color maps are equivalent to the model colors before video fusion.
In this embodiment, decoding starts once the video stream link of the video data has been acquired. The way the stream is obtained depends on the specific situation; generally, a low-delay rtsp video stream is used. The video stream is parsed into rgb images to obtain video analysis images, which can then be placed into a data buffer for subsequent use and processing.
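A minimal sketch of this decoding step using OpenCV is shown below (the choice of OpenCV, the function name and the buffering scheme are assumptions; the application does not prescribe a particular decoder):

```python
import cv2

def decode_rtsp_frames(rtsp_url, frame_buffer, max_frames=None):
    """Parse a low-delay RTSP stream into RGB images and append them to a buffer."""
    capture = cv2.VideoCapture(rtsp_url)
    count = 0
    while capture.isOpened():
        ok, frame_bgr = capture.read()
        if not ok:
            break
        # OpenCV decodes to BGR; convert to RGB before buffering.
        frame_buffer.append(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
        count += 1
        if max_frames is not None and count >= max_frames:
            break
    capture.release()
```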
It should be noted that the video analysis image is not completely the same as the static scene color map obtained from the three-dimensional scene model, and specifically, the video analysis image includes more optimal color data and more complete color data.
The purpose of static scene color mapping is to separate color from three-dimensional data so that high/low precision three-dimensional data can be selected from different scenes to match the corresponding high/low resolution color data.
In summary, this is also the reason why the color rendering is performed by determining whether to use the video analysis image or the static scene color map in the subsequent method of the present application.
In the embodiment, a three-dimensional scene model can be formed according to the internal and external parameters of the plurality of monitoring cameras and the video data.
For example, two cameras separated by a certain distance capture two images of the same scene at the same moment; corresponding pixel points in the two images are found by a stereo matching algorithm, and the parallax is then computed according to the triangulation principle and converted into depth information for the objects in the scene. Based on such a stereo matching algorithm, the depth image of a scene can be obtained by shooting a group of images of the same scene from different angles.
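A hedged sketch of this triangulation step for a rectified stereo pair is given below (block matching is only one possible correspondence algorithm, and the matcher parameters are illustrative assumptions, not values taken from this application):

```python
import cv2
import numpy as np

def stereo_depth(left_gray, right_gray, focal_length_px, baseline_m):
    """Estimate a depth map from a rectified grayscale stereo pair.

    Depth follows the triangulation relation depth = f * b / disparity.
    """
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    # OpenCV returns disparity in fixed point, scaled by 16.
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    depth = np.full_like(disparity, np.inf)
    valid = disparity > 0
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth
```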
In this embodiment, the view matrix corresponding to each monitoring camera is calculated from the three-dimensional coordinates of the camera position and its pose angles (i.e. from the internal and external parameters of the monitoring camera). This mainly performs a coordinate conversion, transforming the three-dimensional model from the world coordinate system into the camera coordinate system.
In this embodiment, the coordinate transformation used in the present application further includes transformation between two-dimensional coordinates and three-dimensional coordinates and transformation between three-dimensional coordinate systems.
The conversion between two-dimensional and three-dimensional coordinates is performed with the 3×3 camera intrinsic matrix. Converting from three-dimensional to two-dimensional coordinates normalizes the depth and therefore loses the depth information; conversely, converting from two-dimensional to three-dimensional coordinates requires depth information, and in the general case it is assumed that all points on the image have the same depth.
The transformation between three-dimensional coordinate systems is performed, for example, by means of a projection matrix. A three-dimensional coordinate transformation consists of rotations about the x, y and z axes and a translation along the xyz directions; the rotation is expressed by a 3×3 matrix and the translation by a 3×1 matrix, which combine into a 3×4 matrix. For ease of calculation a homogeneous coordinate system is used, and a row (0, 0, 0, 1) is appended below the 3×4 matrix to make it a square matrix.
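A minimal numerical sketch of these conversions (the helper names are illustrative; the symbols K, R and T follow the surrounding text):

```python
import numpy as np

def extrinsic_matrix(R, T):
    """Combine a 3x3 rotation and a 3x1 translation into a 4x4 homogeneous matrix."""
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = np.asarray(T, float).ravel()
    return M

def world_to_camera(p_world, R, T):
    """World -> camera coordinates via the view transform: p_cam = R * p_world + T."""
    p_h = np.append(np.asarray(p_world, float), 1.0)   # homogeneous coordinates
    return (extrinsic_matrix(R, T) @ p_h)[:3]

def pixel_to_camera(u, v, depth, K):
    """Back-project a pixel (u, v) with a known depth into camera coordinates
    using the inverse of the 3x3 intrinsic matrix K."""
    ray = np.linalg.inv(np.asarray(K, float)) @ np.array([u, v, 1.0])
    return ray * depth
```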
Step S102: rendering the three-dimensional scene under the visual angle of each monitoring camera in the three-dimensional scene model to obtain a monitoring depth image corresponding to each monitoring camera, and rendering the three-dimensional scene under the visual angle of the corresponding virtual camera to obtain a virtual depth image and a color image.
It should be noted that depth information for the scene can be obtained here by constructing the three-dimensional scene model, and rendering can then naturally be performed on the basis of that model, or the depth information can be obtained in other ways; the approaches differ only in how the depth information is obtained, and each has many use cases and applications.
In this embodiment, the depth information corresponding to each monitoring camera may, on one hand, be obtained through rendering on the basis of the depth information already obtained through the three-dimensional scene model, and, on the other hand, be obtained through rendering directly on the basis of the three-dimensional scene model.
In this embodiment, the monitoring depth image or the virtual depth image is an image including depth information in a current frame scene, and the color image includes an image of color information in the current frame scene.
In this embodiment, the virtual camera is not a "real" camera; it is the viewing angle from which an observer views the scene within the three-dimensional scene model, and it helps the observer understand the multiple video streams from any viewing angle of the scene. In this application, "virtual camera" therefore refers to an arbitrary viewing angle in the three-dimensional scene.
Step S103: selecting the monitoring camera closest to the viewing angle of the virtual camera according to the internal and external parameters of the virtual camera, the virtual depth image and the monitoring depth image of each monitoring camera, and judging whether a target point under the viewing angle of the virtual camera is occluded under the viewing angle of the currently selected closest monitoring camera.
In this embodiment, a ray cast toward a camera may intersect two objects in the three-dimensional world. If projection is performed by three-dimensional coordinate transformation alone, both points may end up projected onto the same pixel of the camera picture; therefore the actual depths of the two points under the corresponding camera viewing angle must be compared, and the video is pasted onto the object that is closer to the camera.
In an embodiment of the present application, the method for selecting the monitoring camera closest to the viewing angle of the virtual camera according to the internal and external parameters of the virtual camera, the virtual depth image and the monitoring depth image of each monitoring camera includes:
a. Calculating the three-dimensional coordinates of the target point in the three-dimensional scene model under the viewing angle of the virtual camera according to the internal and external parameters of the virtual camera and the virtual depth image.
In this embodiment, the conversion between two-dimensional and three-dimensional coordinates is performed with the 3×3 camera intrinsic matrix. Converting from three-dimensional to two-dimensional coordinates normalizes the depth and therefore loses the depth information; conversely, converting from two-dimensional to three-dimensional coordinates requires depth information, and in the general case it is assumed that all points on the image have the same depth.
b. Calculating the direction vectors from the three-dimensional coordinates of the target point to the position of the virtual camera and to the position of each monitoring camera, and calculating the included angle between the direction vector corresponding to the virtual camera and the direction vector corresponding to each monitoring camera.
The transformation between three-dimensional coordinate systems is performed, for example, by means of a projection matrix. A three-dimensional coordinate transformation consists of rotations about the x, y and z axes and a translation along the xyz directions; the rotation is expressed by a 3×3 matrix and the translation by a 3×1 matrix, which combine into a 3×4 matrix. For ease of calculation a homogeneous coordinate system is used, and a row (0, 0, 0, 1) is appended below the 3×4 matrix to make it a square matrix.
c. Selecting the monitoring camera corresponding to the smallest of these included angles as the monitoring camera closest to the viewing angle of the virtual camera.
In this embodiment, the included angle between the virtual camera and every monitoring camera is calculated, so that when the camera with the smallest angle is later found to be occluded, the camera with the next smallest angle can be selected, as sketched below.
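A hedged NumPy sketch of steps a–c above (it assumes camera positions are expressed in world coordinates; the function and variable names are illustrative, not taken from this application):

```python
import numpy as np

def rank_cameras_by_view_angle(target_point, virtual_cam_pos, monitor_cam_positions):
    """Order monitoring cameras by the angle between the direction from the target
    point to the virtual camera and the direction to each monitoring camera
    (smallest angle first); cameras beyond 90 degrees are dropped (see step S104)."""
    target_point = np.asarray(target_point, float)
    v_dir = np.asarray(virtual_cam_pos, float) - target_point
    v_dir /= np.linalg.norm(v_dir)
    ranked = []
    for idx, cam_pos in enumerate(monitor_cam_positions):
        m_dir = np.asarray(cam_pos, float) - target_point
        m_dir /= np.linalg.norm(m_dir)
        angle = np.degrees(np.arccos(np.clip(np.dot(v_dir, m_dir), -1.0, 1.0)))
        if angle <= 90.0:
            ranked.append((angle, idx))
    return sorted(ranked)
```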
In an embodiment of the present application, the method for judging whether the target point under the viewing angle of the virtual camera is occluded under the viewing angle of the currently selected closest monitoring camera includes:
a. Calculating a first depth value of the target point under the viewing angle of the selected monitoring camera according to the internal and external parameters of the currently selected closest monitoring camera.
In an embodiment of the present application, a formula for calculating a first depth value of the target point under the view angle of the selected monitoring camera according to the currently selected internal and external parameters of the monitoring camera is as follows:
$$ d\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = K\left( R\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + T \right) $$
wherein X, Y, Z is the three-dimensional coordinates of the target point in the three-dimensional model data; x and y are two-dimensional coordinates of the target point under the physical coordinate system of the selected monitoring camera; r, T, K are the inside-outside parameters of the monitoring camera and d is the depth value of the target point at the view angle of the selected monitoring camera.
In this embodiment, specifically, R and T are the external (extrinsic) parameters of the monitoring camera, namely its rotation and translation, and K is its internal (intrinsic) parameter matrix.
Through the formula, the depth value of the target point under the view angle of the selected monitoring camera can be obtained.
b. Acquiring a second depth value corresponding to the target point from the monitoring depth image corresponding to the currently selected closest monitoring camera.
In this embodiment, unlike the first depth value computed above, the second depth value is read from the monitoring depth image rendered in the previous step for the currently selected closest monitoring camera, at the position corresponding to the target point.
c. Judging whether the first depth value is larger than the second depth value; if yes, judging that the target point is occluded; if not, judging that it is not occluded.
In an embodiment of the present application, the method for judging whether the target point under the viewing angle of the virtual camera is occluded under the viewing angle of the currently selected closest monitoring camera further includes: presetting an offset and judging whether the first depth value is larger than the sum of the second depth value and the offset; if yes, judging that the target point is occluded; if not, judging that it is not occluded.
In this embodiment, presetting the offset allows the method to adapt more flexibly to different environments. For example, in some scenes the differences between depth values are small and a very precise, fine judgment is required, whereas in other scenes the depth differences are large and such fine judgment is unnecessary; in the latter case the offset can be increased, which reduces the processing load and processing time of the whole fusion or rendering process and greatly improves the processing speed.
In addition, the offset can be modified to achieve different effects, making the method suitable for monitoring cameras of different models or types.
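A minimal sketch of this occlusion test, including the preset offset (the default offset value, variable names and boundary handling are assumptions; K, R and T follow the projection formula above):

```python
import numpy as np

def is_occluded(target_world, K, R, T, monitor_depth_image, depth_offset=0.05):
    """Project the target point into the selected monitoring camera and compare its
    depth (the first depth value) against the rendered monitoring depth image
    (the second depth value), relaxed by a preset offset."""
    p_cam = np.asarray(R, float) @ np.asarray(target_world, float) + np.asarray(T, float).ravel()
    first_depth = p_cam[2]                 # d in the projection formula above
    if first_depth <= 0:
        return True                        # behind the monitoring camera
    uv = np.asarray(K, float) @ p_cam
    u, v = int(round(uv[0] / uv[2])), int(round(uv[1] / uv[2]))
    h, w = monitor_depth_image.shape[:2]
    if not (0 <= u < w and 0 <= v < h):
        return True                        # outside the camera image
    second_depth = float(monitor_depth_image[v, u])
    return first_depth > second_depth + depth_offset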
Step S104: if the target point is occluded, judging whether there is a next monitoring camera closest to the viewing angle of the virtual camera; if yes, repeating the previous step; if not, rendering according to the pixel color of the scene color map.
In this embodiment, after the judgment in step S103, if the target point is occluded, it needs to be determined whether there is a next monitoring camera closest to the viewing angle of the virtual camera.
In this embodiment, to decide whether a viewing angle counts as close, a range such as 0° to 90° may be preset for the included angle; that is, when the included angle between the virtual camera and a monitoring camera exceeds 90°, that camera is not regarded as having a close viewing angle. In other words, only monitoring cameras whose included angle with the virtual camera is less than or equal to 90° are considered.
In this embodiment, if there is a next monitoring camera closest to the viewing angle of the virtual camera, step S103 is repeated, and whether the target point is occluded under that camera's viewing angle is judged.
If no such camera exists, that is, if the target point is judged to be occluded under all monitoring cameras with close viewing angles, rendering is performed according to the pixel color of the scene color map acquired in step S101.
Step S105: if the target point is not occluded, rendering the target point under the viewing angle of the virtual camera according to the pixel color corresponding to the target point in the video analysis image of the currently selected closest monitoring camera.
In this embodiment, the video analysis image is obtained by parsing the video data from step S101 into an rgb image, which is placed in a data buffer to facilitate subsequent use and processing.
It should be noted that the video analysis image is not completely the same as the static scene color map obtained from the three-dimensional scene model, and specifically, the video analysis image includes more optimal color data and more complete color data.
The purpose of static scene color mapping is to separate color from three-dimensional data so that high/low precision three-dimensional data can be selected from different scenes to match the corresponding high/low resolution color data.
In summary, this is also the reason why the color rendering is performed by determining whether to use the video analysis image or the static scene color map in the subsequent method of the present application.
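Putting steps S103 to S105 together for a single target point, the following is a hedged sketch of the fallback logic; it reuses the illustrative rank_cameras_by_view_angle() and is_occluded() helpers sketched above, and the data layout and names are assumptions rather than the application's own interfaces:

```python
import numpy as np

def shade_target_point(target_world, cameras, depth_images, video_frames,
                       scene_color, virtual_cam_pos, depth_offset=0.05):
    """Pick the color of one target point under the virtual camera view, falling
    back through the monitoring cameras in order of increasing view-angle
    difference and finally to the static scene model color map.

    `cameras` is a list of dicts with "K", "R", "T" and "position" entries;
    `scene_color` is the color of this point in the static scene color map."""
    positions = [cam["position"] for cam in cameras]
    for _, idx in rank_cameras_by_view_angle(target_world, virtual_cam_pos, positions):
        cam = cameras[idx]
        if is_occluded(target_world, cam["K"], cam["R"], cam["T"],
                       depth_images[idx], depth_offset):
            continue  # occluded under this camera: try the next-closest one
        # Not occluded: sample the pixel color from this camera's parsed video frame
        # (assumes the depth image and the video frame share the same resolution).
        p_cam = np.asarray(cam["R"], float) @ np.asarray(target_world, float) \
                + np.asarray(cam["T"], float).ravel()
        uv = np.asarray(cam["K"], float) @ p_cam
        u, v = int(round(uv[0] / uv[2])), int(round(uv[1] / uv[2]))
        return video_frames[idx][v, u]
    # Every close camera is occluded (or none is within 90 degrees):
    # render from the static scene model color map instead.
    return scene_color
```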
Step S106: and finally, obtaining a completely rendered three-dimensional scene model corresponding to the current frame.
It should be noted that, for the overlapping areas that frequently occur during three-dimensional video fusion, simply averaging the videos produces ghosting whenever a vehicle or pedestrian passes through. For a better display effect, the video of the camera closest to the current virtual camera viewing angle must be used for mapping. The whole multi-angle fusion process of the method can therefore be regarded as pasting the videos onto the model layer by layer, stacked from bottom to top in order of decreasing angle difference between the monitoring viewing angle and the virtual camera viewing angle.
In this embodiment, through the process of the above steps, the multi-angle rendering is reasonably performed on the three-dimensional scene model of the current frame scene, so that the observation effect at any angle in the current frame scene is excellent.
Fig. 2 is a block diagram of a multi-channel video and three-dimensional model fusion apparatus according to an embodiment of the present invention. As shown, the multi-channel video and three-dimensional model fusion apparatus 200 includes:
an obtaining module 201, configured to obtain internal and external parameters and video data of a plurality of monitoring cameras, so as to form a three-dimensional scene model corresponding to a current frame and a scene model color map corresponding to the three-dimensional scene model;
a rendering module 202, configured to respectively render a three-dimensional scene under the viewing angle of each monitoring camera in the three-dimensional scene model to obtain a monitored depth image corresponding to each monitoring camera, and render a three-dimensional scene under the viewing angle of a corresponding virtual camera to obtain a virtual depth image and a color image;
the processing module 203 is configured to select a monitoring camera closest to the viewing angle of the virtual camera according to the internal and external parameters of the virtual camera, the virtual depth image, and the monitoring depth image of each monitoring camera, and determine whether a target point at the viewing angle of the virtual camera is blocked at the currently selected viewing angle of the closest monitoring camera; if the monitoring camera is blocked, judging whether a next monitoring camera closest to the visual angle of the virtual camera exists; if yes, repeating the previous step; if not, rendering is carried out according to the pixel color of the scene color map; if the target point is not shielded, rendering the target point under the virtual camera view angle according to the pixel color corresponding to the target point in the currently selected video analysis image of the nearest monitoring camera; and finally, obtaining a completely rendered three-dimensional scene model corresponding to the current frame.
In an embodiment of the present application, the multi-channel video and three-dimensional model fusion apparatus 200 may be a graphics processor, such as a programmable graphics rendering pipeline and a shader. Wherein, the graphics processor can be realized in the form of calling by a processing element through software; or may be implemented entirely in hardware.
The programmable graphics rendering pipeline is the modern GPU rendering architecture. Its main stages include data filling, coordinate transformation, vertex lighting, clipping and rasterization, pixel processing, the ALPHA test, the depth test, global blending and output; the programmable characteristic is mainly embodied in the coordinate transformation and pixel processing stages.
Modern graphics processing interfaces, represented by opengl, have programmable rendering modules, including vertex shaders and fragment shaders. The vertex shader corresponds to a coordinate transformation and vertex lighting part in the programmable graphics rendering pipeline, and the fragment shader corresponds to a pixel processing part in the programmable graphics rendering pipeline.
The vertex shader program fetches information such as vertex positions, normal vectors and texture coordinates from GPU registers, performs operations such as vertex coordinate space conversion, normal vector space conversion and lighting calculation, and finally writes the computed data to designated registers. The fragment shader program then obtains data such as texture coordinates and lighting information passed on from the vertex stage, performs the color calculation for each fragment according to this information and the information supplied by the application program, and finally sends the processed data to the raster operation module.
The shader technology greatly improves the controllability and the effect of the graphic rendering, and is the basis of the invention. The model rendering part of the invention uses a programmable renderer and acquires the required data by controlling the output of the renderer.
In an embodiment of the present application, the modules are used together to implement the steps of the multi-channel video and three-dimensional model fusion method as shown in fig. 1.
It should be noted that the division of the modules of the above apparatus is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And these modules can be realized in the form of software called by processing element; or may be implemented entirely in hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the processing module 203 may be a separate processing element, or may be integrated into a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code, and a processing element of the apparatus calls and executes the functions of the processing module 203. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each module above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. As another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. As yet another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
Fig. 3 is a schematic structural diagram of a multi-channel video and three-dimensional model fusion device according to an embodiment of the present invention. As shown, the multi-channel video and three-dimensional model fusion apparatus 300 includes: a memory 301, a processor 302, and a communicator 303; the memory 301 is used for storing programs; the processor 302 runs a program to implement the multi-channel video and three-dimensional model fusion method as described in fig. 1; the communicator 303 is used for communication connection with an external device.
The external device may include a monitoring camera, a camera, and the like, as well as a server, a terminal, and the like.
The Memory 301 may include a Random Access Memory (RAM), and may further include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory.
The processor 302 may also be a graphics processor, such as a programmable graphics rendering pipeline and a shader.
In addition, the Processor 302 may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.
The communicator 303 may include one or more sets of modules for different communication means, such as a CAN communication module communicatively coupled to a CAN bus. The communication connection may be one or more wired/wireless communication means and combinations thereof, including any one or more of the Internet, CAN, an intranet, a wide area network (WAN), a local area network (LAN), a wireless network, a digital subscriber line (DSL) network, a Frame Relay network, an Asynchronous Transfer Mode (ATM) network, a Virtual Private Network (VPN), and/or any other suitable communication network, such as any one or more of WIFI, Bluetooth, NFC, GPRS, GSM and Ethernet.
To achieve the above objects and other related objects, the present application provides a computer-readable storage medium having a computer program stored thereon, which when executed by a processor, implements a multi-channel video and three-dimensional model fusion method as described in fig. 1.
The computer-readable storage medium, as will be appreciated by one of ordinary skill in the art: all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
In summary, according to the method for fusing multi-channel video with a three-dimensional model, and the device, equipment and storage medium thereof provided by the present application, a three-dimensional scene model corresponding to the current frame and a scene model color map corresponding to the three-dimensional scene model are formed by acquiring the internal and external parameters and video data of a plurality of monitoring cameras; the three-dimensional scene is rendered under the viewing angle of each monitoring camera in the three-dimensional scene model to obtain a monitoring depth image corresponding to each monitoring camera, and is rendered under the viewing angle of the corresponding virtual camera to obtain a virtual depth image and a color image; the monitoring camera closest to the viewing angle of the virtual camera is selected according to the internal and external parameters of the virtual camera, the virtual depth image and the monitoring depth image of each monitoring camera, and whether a target point under the viewing angle of the virtual camera is occluded under the viewing angle of the currently selected closest monitoring camera is judged; if the target point is occluded, whether there is a next monitoring camera closest to the viewing angle of the virtual camera is judged; if yes, the previous step is repeated; if not, rendering is performed according to the pixel color of the scene color map; if the target point is not occluded, the target point under the viewing angle of the virtual camera is rendered according to the pixel color corresponding to the target point in the video analysis image of the currently selected closest monitoring camera; and finally a completely rendered three-dimensional scene model corresponding to the current frame is obtained.
The application effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims (10)

1. A method for fusing a multi-channel video and a three-dimensional model, which is characterized by comprising the following steps:
acquiring internal and external parameters and video data of a plurality of monitoring cameras to form a three-dimensional scene model corresponding to a current frame and a scene model color map corresponding to the three-dimensional scene model;
rendering the three-dimensional scene under the visual angle of each monitoring camera in the three-dimensional scene model to obtain a monitoring depth image corresponding to each monitoring camera, and rendering the three-dimensional scene under the visual angle of a corresponding virtual camera to obtain a virtual depth image and a color image;
selecting a monitoring camera closest to the visual angle of the virtual camera according to the internal and external parameters of the virtual camera, the virtual depth image and the monitoring depth image of each monitoring camera, and judging whether a target point under the visual angle of the virtual camera is occluded under the currently selected visual angle of the closest monitoring camera;
if the target point is occluded, judging whether a next monitoring camera closest to the visual angle of the virtual camera exists; if yes, repeating the previous step; if not, rendering is carried out according to the pixel color of the scene color map;
if the target point is not occluded, rendering the target point under the virtual camera view angle according to the pixel color corresponding to the target point in the currently selected video analysis image of the nearest monitoring camera;
and finally, obtaining a completely rendered three-dimensional scene model corresponding to the current frame.
2. The method for fusing multi-channel video and three-dimensional model according to claim 1, wherein the method for selecting the monitoring camera closest to the viewing angle of the virtual camera according to the inside and outside parameters of the virtual camera, the virtual depth image and the monitored depth image of each monitoring camera comprises:
calculating the three-dimensional coordinates of the target point in the three-dimensional scene model under the visual angle of the virtual camera according to the internal and external parameters of the virtual camera and the virtual depth image;
calculating direction vectors of the target three-dimensional coordinates to the positions of the virtual cameras and the positions of the monitoring cameras respectively, and calculating included angles between the direction vectors corresponding to the virtual cameras and the direction vectors corresponding to the monitoring cameras respectively;
and selecting the monitoring camera corresponding to the smallest included angle in the included angles as the monitoring camera closest to the visual angle of the virtual camera.
3. The method for fusing multi-channel videos and three-dimensional models according to claim 1, wherein the method for determining whether the target point at the viewing angle of the virtual camera is occluded at the currently selected closest viewing angle of the monitoring camera comprises:
calculating a first depth value of the target point under the view angle of the selected monitoring camera according to the currently selected internal and external parameters of the closest monitoring camera;
acquiring a second depth value corresponding to the target point on the monitoring depth image according to the currently selected monitoring depth image corresponding to the closest monitoring camera;
judging whether the first depth value is larger than the second depth value; if yes, judging that the target point is occluded; if not, judging that the target point is not occluded.
4. The method as claimed in claim 3, wherein the formula for calculating the first depth value of the target point under the viewing angle of the selected monitoring camera according to the internal and external parameters of the currently selected monitoring camera is as follows:
$$ d\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = K\left( R\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + T \right) $$
wherein X, Y, Z is the three-dimensional coordinates of the target point in the three-dimensional model data; x and y are two-dimensional coordinates of the target point under the physical coordinate system of the selected monitoring camera; r, T, K are the inside-outside parameters of the monitoring camera and d is the depth value of the target point at the view angle of the selected monitoring camera.
5. The method for fusing multi-channel videos and three-dimensional models according to claim 3, wherein the method for determining whether the target point at the viewing angle of the virtual camera is occluded at the currently selected viewing angle of the nearest monitoring camera further comprises:
presetting an offset to judge whether the first depth value is larger than the sum of the second depth value and the offset;
if yes, judging that the target point is occluded; if not, judging that the target point is not occluded.
6. The method according to claim 1, wherein the internal and external parameters of the plurality of surveillance cameras and the video data are obtained after being processed in binary format.
7. The method for fusing multi-channel video and three-dimensional model according to claim 1, wherein the scene model color map is a static scene color map obtained from the three-dimensional scene model; the scene color map contains color data of different resolutions to match different pixel accuracies in the three-dimensional scene model.
8. An apparatus for fusing a multi-channel video with a three-dimensional model, the apparatus comprising:
the acquisition module is used for acquiring internal and external parameters and video data of the multiple monitoring cameras so as to form a three-dimensional scene model corresponding to the current frame and a scene model color map corresponding to the three-dimensional scene model;
the rendering module is used for rendering the three-dimensional scene under the visual angle of each monitoring camera in the three-dimensional scene model to obtain a monitoring depth image corresponding to each monitoring camera, and rendering the three-dimensional scene under the visual angle of the virtual camera to obtain a virtual depth image and a color image;
the processing module is used for selecting the monitoring camera closest to the visual angle of the virtual camera according to the internal and external parameters of the virtual camera, the virtual depth image and the monitoring depth image of each monitoring camera, and judging whether a target point under the visual angle of the virtual camera is occluded under the currently selected visual angle of the closest monitoring camera; if the target point is occluded, judging whether a next monitoring camera closest to the visual angle of the virtual camera exists; if yes, repeating the previous step; if not, rendering is carried out according to the pixel color of the scene color map; if the target point is not occluded, rendering the target point under the virtual camera view angle according to the pixel color corresponding to the target point in the currently selected video analysis image of the nearest monitoring camera; and finally, obtaining a completely rendered three-dimensional scene model corresponding to the current frame.
9. A multi-channel video and three-dimensional model fusion apparatus, comprising: a memory, a processor, and a communicator;
the memory is used for storing programs; the processor runs a program to realize the fusion method of the multi-channel video and the three-dimensional model according to any one of claims 1 to 7; the communicator is used for being connected with an external device in a communication mode.
10. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the multi-channel video and three-dimensional model fusion method according to any one of claims 1 to 7.
CN201811634403.6A 2018-12-29 2018-12-29 Multi-path video and three-dimensional model fusion method, device, equipment and storage medium thereof Active CN111402374B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811634403.6A CN111402374B (en) 2018-12-29 2018-12-29 Multi-path video and three-dimensional model fusion method, device, equipment and storage medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811634403.6A CN111402374B (en) 2018-12-29 2018-12-29 Multi-path video and three-dimensional model fusion method, device, equipment and storage medium thereof

Publications (2)

Publication Number Publication Date
CN111402374A true CN111402374A (en) 2020-07-10
CN111402374B CN111402374B (en) 2023-05-23

Family

ID=71413084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811634403.6A Active CN111402374B (en) 2018-12-29 2018-12-29 Multi-path video and three-dimensional model fusion method, device, equipment and storage medium thereof

Country Status (1)

Country Link
CN (1) CN111402374B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111862278A (en) * 2020-07-22 2020-10-30 成都数字天空科技有限公司 Animation obtaining method and device, electronic equipment and storage medium
CN111968209A (en) * 2020-08-17 2020-11-20 北京像素软件科技股份有限公司 Model rendering method, device, equipment and storage medium
CN112261293A (en) * 2020-10-20 2021-01-22 华雁智能科技(集团)股份有限公司 Remote inspection method and device for transformer substation and electronic equipment
CN113223130A (en) * 2021-03-17 2021-08-06 浙江大华技术股份有限公司 Path roaming method, terminal equipment and computer storage medium
CN114442805A (en) * 2022-01-06 2022-05-06 上海安维尔信息科技股份有限公司 Monitoring scene display method and system, electronic equipment and storage medium
CN114463475A (en) * 2022-04-08 2022-05-10 山东捷瑞数字科技股份有限公司 Multi-camera rendering image fusion method based on edge correction
CN114520868A (en) * 2020-11-20 2022-05-20 华为技术有限公司 Video processing method, device and storage medium
CN114639040A (en) * 2022-03-14 2022-06-17 哈尔滨博敏科技开发有限公司 Monitoring video analysis system and method based on Internet of things
CN114928718A (en) * 2022-04-29 2022-08-19 厦门图扑软件科技有限公司 Video monitoring method and device, electronic equipment and storage medium
CN115018967A (en) * 2022-06-30 2022-09-06 联通智网科技股份有限公司 Image generation method, device, equipment and storage medium
WO2023071586A1 (en) * 2021-10-25 2023-05-04 腾讯科技(深圳)有限公司 Picture generation method and apparatus, device, and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015161728A1 (en) * 2014-04-22 2015-10-29 重庆海扶医疗科技股份有限公司 Three-dimensional model construction method and device, and image monitoring method and device
CN108154553A (en) * 2018-01-04 2018-06-12 中测新图(北京)遥感技术有限责任公司 The seamless integration method and device of a kind of threedimensional model and monitor video

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015161728A1 (en) * 2014-04-22 2015-10-29 重庆海扶医疗科技股份有限公司 Three-dimensional model construction method and device, and image monitoring method and device
CN108154553A (en) * 2018-01-04 2018-06-12 中测新图(北京)遥感技术有限责任公司 The seamless integration method and device of a kind of threedimensional model and monitor video

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
聂电开; 刘文明; 张靖男: "视频拼接在三维空间的融合实现" (Fusion implementation of video stitching in three-dimensional space) *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111862278B (en) * 2020-07-22 2024-02-27 成都数字天空科技有限公司 Animation obtaining method and device, electronic equipment and storage medium
CN111862278A (en) * 2020-07-22 2020-10-30 成都数字天空科技有限公司 Animation obtaining method and device, electronic equipment and storage medium
CN111968209A (en) * 2020-08-17 2020-11-20 北京像素软件科技股份有限公司 Model rendering method, device, equipment and storage medium
CN112261293A (en) * 2020-10-20 2021-01-22 华雁智能科技(集团)股份有限公司 Remote inspection method and device for transformer substation and electronic equipment
CN112261293B (en) * 2020-10-20 2022-05-10 华雁智能科技(集团)股份有限公司 Remote inspection method and device for transformer substation and electronic equipment
CN114520868A (en) * 2020-11-20 2022-05-20 华为技术有限公司 Video processing method, device and storage medium
CN114520868B (en) * 2020-11-20 2023-05-12 华为技术有限公司 Video processing method, device and storage medium
CN113223130A (en) * 2021-03-17 2021-08-06 浙江大华技术股份有限公司 Path roaming method, terminal equipment and computer storage medium
WO2023071586A1 (en) * 2021-10-25 2023-05-04 腾讯科技(深圳)有限公司 Picture generation method and apparatus, device, and medium
CN114442805A (en) * 2022-01-06 2022-05-06 上海安维尔信息科技股份有限公司 Monitoring scene display method and system, electronic equipment and storage medium
CN114639040A (en) * 2022-03-14 2022-06-17 哈尔滨博敏科技开发有限公司 Monitoring video analysis system and method based on Internet of things
CN114639040B (en) * 2022-03-14 2023-01-17 广东正艺技术有限公司 Monitoring video analysis system and method based on Internet of things
CN114463475B (en) * 2022-04-08 2022-07-19 山东捷瑞数字科技股份有限公司 Edge correction-based multi-camera rendering image fusion method
CN114463475A (en) * 2022-04-08 2022-05-10 山东捷瑞数字科技股份有限公司 Multi-camera rendering image fusion method based on edge correction
CN114928718A (en) * 2022-04-29 2022-08-19 厦门图扑软件科技有限公司 Video monitoring method and device, electronic equipment and storage medium
CN115018967A (en) * 2022-06-30 2022-09-06 联通智网科技股份有限公司 Image generation method, device, equipment and storage medium
CN115018967B (en) * 2022-06-30 2024-05-03 联通智网科技股份有限公司 Image generation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111402374B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
CN111402374B (en) Multi-path video and three-dimensional model fusion method, device, equipment and storage medium thereof
JP6425780B1 (en) Image processing system, image processing apparatus, image processing method and program
US10499033B2 (en) Apparatus, a method and a computer program for coding and rendering volumetric video
US7420559B2 (en) Video rendering apparatus and method and program
CN110675506B (en) System, method and equipment for realizing three-dimensional augmented reality of multi-channel video fusion
CN111243071A (en) Texture rendering method, system, chip, device and medium for real-time three-dimensional human body reconstruction
US20130021445A1 (en) Camera Projection Meshes
US10733786B2 (en) Rendering 360 depth content
JP7271099B2 (en) File generator and file-based video generator
KR20160033128A (en) Sparse gpu voxelization for 3d surface reconstruction
CN110648274B (en) Method and device for generating fisheye image
AU2019226134B2 (en) Environment map hole-filling
JPWO2017217296A1 (en) Image processing device
US20230230311A1 (en) Rendering Method and Apparatus, and Device
KR20110062083A (en) Video restoration apparatus and its method
GB2578510A (en) Environment map generation and hole filling
CN115035235A (en) Three-dimensional reconstruction method and device
CN113643414A (en) Three-dimensional image generation method and device, electronic equipment and storage medium
CN113379815A (en) Three-dimensional reconstruction method and device based on RGB camera and laser sensor and server
CN114002701A (en) Method, device, electronic equipment and system for rendering point cloud in real time
JP6521352B2 (en) Information presentation system and terminal
WO2018052100A1 (en) Image processing device, image processing method, and image processing program
US7907147B2 (en) Texture filtering apparatus, texture mapping apparatus, and method and program therefor
JP2006350852A (en) Image generation system
US10652514B2 (en) Rendering 360 depth content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant