CN117237512A - Three-dimensional scene mapping method and system for video image - Google Patents


Info

Publication number: CN117237512A
Application number: CN202311492934.7A
Authority: CN (China)
Prior art keywords: target; image; feature point; feature points; video image
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN117237512B
Inventors: 张斌, 吴上锦, 邵传华, 陈晓斌, 温景昌
Applicant and assignee: Shenzhen Etop Information Co., Ltd.
Application filed by Shenzhen Etop Information Co., Ltd., with priority to CN202311492934.7A
Publication of CN117237512A; application granted; publication of CN117237512B


Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to the technical field of computer vision, and in particular to a three-dimensional scene mapping method and system for video images. The method first loads a three-dimensional scene image corresponding to a target monitoring area and collects a monitoring video image collection of the target monitoring area. It then extracts a plurality of first feature points and a plurality of second feature points, matches first target feature points with second target feature points to obtain a feature point matching degree set, and, based on the feature point matching degree set, selects N first target feature points and N second target feature points from the plurality of first feature points and the plurality of second feature points. Camera coordinates of the target image capturing device and the rotation angle of the target image capturing device are calculated from the N first target feature points and the N second target feature points, and the monitoring video image collection is finally mapped into the three-dimensional scene image according to the camera coordinates and the rotation angle. This solves the problem of low mapping efficiency between video images and three-dimensional scenes in the prior art.

Description

Three-dimensional scene mapping method and system for video image
Technical Field
The application relates to the technical field of computer vision, in particular to a three-dimensional scene mapping method and system for video images.
Background
Surveillance video is widely used in many fields such as security, traffic and environmental monitoring. For example, traffic monitoring systems use surveillance video to monitor traffic scenes such as roads, intersections and parking lots to ensure traffic safety and traffic flow control. Surveillance video can be used for traffic management, accident investigation, violation detection, and traffic flow statistics and prediction, helping to improve traffic efficiency and reduce congestion. However, surveillance video images are characterized by a large data volume and a sparse distribution of high-value information, which leads to problems such as inefficient manual browsing and searching of video and redundant video image transmission.
In order to improve the efficiency of manually browsing surveillance video images and help the user understand the surrounding scene while watching the video, Internet-of-things data such as surveillance video is accessed into the three-dimensional scene, meeting the business requirements of different fields such as security and traffic. A three-dimensional scene can faithfully restore physical-world objects such as terrain, buildings and bridges, with high precision, true-to-scale proportions and high realism. Mapping the surveillance video into the three-dimensional scene yields high restoration fidelity, intuitive presentation, video positions that closely match the real positions, and results that are easy to understand.
At present, the mapping of video images to three-dimensional scenes is generally performed manually: the calibration between the video image and the three-dimensional scene must be carried out by hand, camera information is restored by adjusting multiple parameter values such as the position, orientation and depression angle of the camera, and the fit between the video image and the actual model must be compared manually to determine the relevant parameters. This approach is inefficient and imprecise, and needs to be improved.
Disclosure of Invention
In order to solve the problem of low mapping efficiency of video images and three-dimensional scenes in the prior art, the application provides a three-dimensional scene mapping method and system for video images, which adopts the following technical scheme:
in a first aspect, the present application provides a three-dimensional scene mapping method for video images, including:
acquiring position information of a target monitoring area, and loading a three-dimensional scene image corresponding to the target monitoring area according to the position information;
collecting a monitoring video image collection corresponding to each of at least two camera devices in the target monitoring area, wherein each camera device in the at least two camera devices is located at a different position;
extracting a plurality of first feature points in the three-dimensional scene image and a plurality of second feature points in a target monitoring video image, wherein the target monitoring video image is any monitoring video image in the monitoring video image set;
Matching a first target feature point with a second target feature point to obtain a feature point matching degree set, wherein the first target feature point is any one feature point in the plurality of first feature points, and the second target feature point is any one feature point in the plurality of second feature points;
determining N first target feature points and N second target feature points based on the feature point matching degree set, the first feature points and the second feature points, wherein N is an integer greater than or equal to 4;
calculating camera coordinates of a target image capturing device and a rotation angle of the target image capturing device according to the N first target feature points and the N second target feature points, wherein the target image capturing device is any one image capturing device of the at least two image capturing devices;
and mapping the monitoring video image collection to the three-dimensional scene image according to the camera coordinates and the rotation angle.
By adopting the technical scheme, the video image is mapped into the three-dimensional scene image as follows. First, image data are prepared: the three-dimensional scene image corresponding to the target monitoring area is loaded, and the monitoring video image collection corresponding to each image capturing device in the target monitoring area is collected. Second, feature points are matched and selected: a plurality of first feature points are extracted from the three-dimensional scene image and a plurality of second feature points from the target monitoring video image, the first target feature points are matched with the second target feature points to obtain a feature point matching degree set, and N first target feature points and N second target feature points are selected from the plurality of first feature points and the plurality of second feature points based on the feature point matching degree set. The camera coordinates and rotation angle of the target image capturing device are then calculated from the N first target feature points and the N second target feature points, and finally the monitoring video image collection is mapped into the three-dimensional scene image according to the camera coordinates and the rotation angle. In the prior art, multiple camera parameter values must be adjusted manually to restore the camera information and map the video image into the three-dimensional scene image, which is inefficient. In this scheme, the monitoring video image collection captured by each image capturing device in the target monitoring area is automatically matched into the three-dimensional scene image, and the parameter values of the image capturing devices are adjusted automatically to restore the device information, which solves the problem of low mapping efficiency between video images and three-dimensional scenes in the prior art.
Optionally, the calculating the camera coordinates of the target image capturing apparatus and the rotation angle of the target image capturing apparatus according to the N first target feature points and the N second target feature points includes:
reading the position information of the N first target feature points to obtain N image coordinates;
reading the position information of the N second target feature points to obtain N screen coordinates, and converting each screen coordinate in the N screen coordinates into a world coordinate to obtain N world coordinates;
and calling a solvepnp function to calculate the N image coordinates and the N world coordinates so as to obtain the camera coordinates and the rotation angle.
By adopting the technical scheme, the position information of the N first target feature points is first read to obtain N image coordinates, with each first target feature point corresponding to one image coordinate. The position information of the N second target feature points is then read to obtain N screen coordinates, and each of the N screen coordinates is converted into a world coordinate to obtain N world coordinates, with each second target feature point corresponding to one world coordinate. Finally, a solvepnp function is called on the N image coordinates and the N world coordinates to obtain the camera coordinates and the rotation angle. Because the image coordinates and world coordinates are detected automatically by the algorithm and used directly as input parameters of the solvepnp function, the efficiency of mapping video images onto the three-dimensional scene is improved.
Optionally, the calling a solvepnp function to calculate the N image coordinates and the N world coordinates to obtain the camera coordinates and the rotation angle includes:
judging whether a first image coordinate is matched with a first world coordinate, wherein the first image coordinate is any one of the N image coordinates, and the first world coordinate is any one of the N world coordinates;
if the first image coordinate is not matched with the first world coordinate, matching the first image coordinate with other world coordinates except the first world coordinate in the N world coordinates until the matching is successful, and obtaining a first target world coordinate corresponding to the first image coordinate;
and calling a solvepnp function to calculate the first image coordinate and the first target world coordinate so as to obtain the camera coordinate and the rotation angle.
By adopting the technical scheme, the input order of the first image coordinates must be consistent with that of their corresponding first world coordinates; otherwise the final calculation result is affected. In the process of calling the solvepnp function, it is first judged whether a first image coordinate matches a first world coordinate; if not, the first image coordinate is matched against the other world coordinates among the N world coordinates, excluding the first world coordinate, to obtain the first target world coordinate corresponding to the first image coordinate. After the input order of the first image coordinate and its corresponding world coordinate is guaranteed to be consistent, the solvepnp function is called on the first image coordinate and the first target world coordinate to obtain the camera coordinates and the rotation angle. Adding this judging step to the solvepnp calculation, and replacing the first world coordinate with the first target world coordinate corresponding to the first image coordinate whenever the two do not match, keeps the input order consistent and improves the accuracy of the calculation result.
Optionally, the obtaining the position information of the target monitoring area, and loading the three-dimensional scene image corresponding to the target monitoring area according to the position information includes:
loading three-dimensional model data based on the position information;
and performing layer rendering according to the three-dimensional model data to obtain the three-dimensional scene image.
By adopting the technical scheme, the three-dimensional model data are loaded based on the position information of the target monitoring area, and layer rendering is performed on the three-dimensional model data to obtain the three-dimensional scene image. Layer rendering adds bitmap or procedural textures, illumination, bump mapping and positions relative to other objects to the three-dimensional scene image, which helps the user observe a complete and clear image.
Optionally, the collecting the monitoring video image collection corresponding to each of at least two image capturing devices in the target monitoring area includes:
acquiring real-time video streams corresponding to each of the at least two image capturing devices;
extracting image frames from the real-time video stream according to a preset time period;
and performing layer rendering on the extracted image frames to obtain the monitoring video image collection.
By adopting the technical scheme, the real-time video stream corresponding to each image capturing device is acquired; within the target monitoring area the real-time video streams of at least two image capturing devices are acquired, the devices jointly covering the target monitoring area but shooting it from different angles. Image frames are then extracted from the real-time video streams according to a preset time period, and layer rendering is performed on the image frames to obtain the monitoring video image collection.
Optionally, the mapping the surveillance video image collection to the three-dimensional scene image according to the camera coordinates and the rotation angle includes:
determining time stamp information of each monitoring video image in the monitoring video image collection;
and mapping the monitoring video images to the three-dimensional scene images according to the time stamp information of each monitoring video image, the camera coordinates and the rotation angle.
By adopting the technical scheme, the timestamp information of each monitoring video image in the monitoring video image collection is determined, and the monitoring video images are mapped to the three-dimensional scene image according to the timestamp information, the camera coordinates of the target image capturing device and the rotation angle of the target image capturing device. Mapping each monitoring video image into the three-dimensional scene image in turn according to its timestamp information avoids disorder in the sequence of the plurality of monitoring video images.
Optionally, the mapping the surveillance video image to the three-dimensional scene image according to the timestamp information of each surveillance video image and the camera coordinates and the rotation angle further includes:
selecting an image frame set with the same time stamp from the monitoring video image set according to the time stamp information;
splicing each image frame in the image frame set according to the camera coordinates and the rotation angle to obtain a spliced image;
mapping the stitched image to the three-dimensional scene image.
By adopting the technical scheme, since the monitoring video image collection comprises image frames from at least two image capturing devices, the image frame set with the same timestamp is selected from the monitoring video image collection according to the timestamp information, each image frame in the image frame set is stitched to obtain a stitched image, and the stitched image is mapped to the three-dimensional scene image, improving the accuracy of mapping the monitoring video image collection onto the three-dimensional scene image.
Optionally, the extracting the plurality of second feature points in the target surveillance video image includes:
determining a feature point extraction algorithm corresponding to the target monitoring video image;
And extracting feature points of the target monitoring video image according to the feature point extraction algorithm to obtain a plurality of second feature points.
By adopting the technical scheme, in the process of extracting the plurality of second feature points in the target monitoring video image, the feature point extraction algorithm corresponding to the target monitoring video image is first determined, and feature points are extracted from the target monitoring video image according to that algorithm to obtain the plurality of second feature points. These second feature points can then be matched with the first feature points in the three-dimensional scene image, and redundant second feature points can be used for correction to improve the accuracy of mapping the target monitoring video image.
Optionally, the matching the first target feature point with the second target feature point to obtain the feature point matching degree set includes:
calculating the distance between the first target feature point and the second target feature point;
and determining the distance as the feature point matching degree between the first target feature point and the second target feature point to obtain the feature point matching degree set.
By adopting the technical scheme, in the process of matching the first target feature point with the second target feature point to obtain the feature point matching degree set, the distance between the first target feature point and the second target feature point is first calculated, and the matching degree of the two points is then judged from that distance. Since the first target feature point is any one of the plurality of first feature points and the second target feature point is any one of the plurality of second feature points, there are multiple first target feature points and second target feature points; matching, for example, each first target feature point with the second target feature points yields the feature point matching degree set.
In a second aspect, the present application provides a three-dimensional scene mapping system for video images, comprising:
the acquisition module is used for acquiring the position information of a target monitoring area and loading a three-dimensional scene image corresponding to the target monitoring area according to the position information;
the collection module is used for collecting a monitoring video image collection corresponding to each of at least two camera devices in the target monitoring area, wherein each camera device in the at least two camera devices is located at a different position;
the first extraction module is used for extracting a plurality of first characteristic points in the three-dimensional scene image and a plurality of second characteristic points in a target monitoring video image, wherein the target monitoring video image is any monitoring video image in the monitoring video image set;
the matching module is used for matching a first target feature point with a second target feature point to obtain a feature point matching degree set, wherein the first target feature point is any one feature point in the plurality of first feature points, and the second target feature point is any one feature point in the plurality of second feature points;
the second extraction module is used for determining N first target feature points and N second target feature points based on the feature point matching degree set, the first feature points and the second feature points, wherein N is an integer greater than or equal to 4;
A processing module, configured to calculate camera coordinates of a target image capturing device and a rotation angle of the target image capturing device according to the N first target feature points and the N second target feature points, where the target image capturing device is any one image capturing device of the at least two image capturing devices;
and the mapping module is used for mapping the monitoring video image collection to the three-dimensional scene image according to the camera coordinates and the rotation angle.
By adopting the technical scheme, the video image is mapped into the three-dimensional scene image as follows. First, the acquisition module and the collection module prepare the data, loading the three-dimensional scene image and collecting the monitoring video image collection. Second, the first extraction module, the matching module and the second extraction module perform feature extraction, feature point matching and feature selection: a plurality of first feature points and a plurality of second feature points are extracted, the first target feature points are matched with the second target feature points to obtain a feature point matching degree set, and N first target feature points and N second target feature points are selected based on the feature point matching degree set. The processing module then calculates the camera coordinates and rotation angle of the target image capturing device from the N first target feature points and the N second target feature points, and finally the mapping module maps the monitoring video image collection into the three-dimensional scene image according to the camera coordinates and the rotation angle. The information of the image capturing devices is restored by automatically adjusting their parameter values, and the monitoring video image collection captured by each image capturing device in the target monitoring area is automatically matched into the three-dimensional scene image, which solves the problem of low mapping efficiency between video images and three-dimensional scenes in the prior art.
In summary, it can be seen that, in the embodiment of the present application, compared with the prior art that multiple parameter values of the camera are required to be manually adjusted to restore the camera information, the video image is mapped into the three-dimensional scene image, which has the problem of low efficiency; according to the scheme, the parameter values of the image pickup devices are automatically adjusted to restore the information of the image pickup devices, and the monitoring video image collection shot by each image pickup device in the target monitoring area is automatically matched into the three-dimensional scene image, so that the problem of low mapping efficiency of the video image and the three-dimensional scene in the prior art is solved.
Drawings
FIG. 1 is a schematic view of a three-dimensional scene provided by an embodiment of the present application;
FIG. 2 is a schematic view of a video image mapped to a first target surveillance area according to an embodiment of the present application;
FIG. 3 is a schematic view of a video image mapped to a second target surveillance area according to an embodiment of the present application;
fig. 4 is a flow chart of a three-dimensional scene mapping method for video images according to an embodiment of the present application;
fig. 5 is a schematic diagram of a virtual structure of a three-dimensional scene mapping system for video images according to an embodiment of the present application.
Detailed Description
The terminology used in the following embodiments of the application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of one or more of the listed items.
The terms "first," "second," and the like, are used below for descriptive purposes only and are not to be construed as implying or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature, and in the description of embodiments of the application, unless otherwise indicated, the meaning of "a plurality" is two or more.
Fig. 1 is a schematic diagram of a three-dimensional scene provided by an embodiment of the present application; 101 in Fig. 1 is a first target monitoring area, for which the corresponding three-dimensional scene image is loaded. Fig. 2 is a schematic diagram of a scene in which a video image provided by an embodiment of the present application is mapped to the first target monitoring area, and 201 in Fig. 2 is the first target monitoring area with the video image mapped onto it. The first target monitoring area is an intersection; after the video image is mapped to the first target monitoring area, the traffic flow at the intersection can be seen, and a user can intuitively observe the traffic condition of the first target monitoring area. The actual position, attitude and driving direction of each vehicle are restored, so that if a traffic accident occurs at the intersection, the user can conveniently determine the responsible party.
102 in Fig. 1 is a second target monitoring area, for which the corresponding three-dimensional scene image is loaded. As shown in Fig. 3, Fig. 3 is a schematic diagram of a scene in which a video image provided by an embodiment of the present application is mapped to the second target monitoring area, and 301 in Fig. 3 is the second target monitoring area with the video image mapped onto it. The second target monitoring area is the entrance and parking area of an enterprise. After the video image is mapped to the second target monitoring area, the user can clearly observe the current traffic flow at the entrance and exit of the enterprise and, with the aid of the other buildings in the three-dimensional scene image, easily judge from which parking space a vehicle leaves the enterprise or into which parking space it drives. If a vehicle has an accident at the entrance or exit of the enterprise, disputes can be conveniently clarified.
The embodiment of the application discloses a three-dimensional scene mapping method of a video image.
Referring to fig. 4, fig. 4 is a flowchart of a three-dimensional scene mapping method for video images according to an embodiment of the present application, where the method includes the following steps:
s10, acquiring position information of a target monitoring area, and loading a three-dimensional scene image corresponding to the target monitoring area according to the position information;
Image data are prepared first: the position information of a target monitoring area is acquired according to the user's requirements, and the three-dimensional scene image corresponding to the target monitoring area is loaded according to the position information. Here, the three-dimensional scene image refers to a scene image with a sense of depth and realism created using three-dimensional graphics technology.
It is noted that the loaded three-dimensional scene image may be a scene image of only the range of the target monitoring area, or may be a scene image containing the target monitoring area, and the scene image may have scenes of other non-target monitoring areas.
In one embodiment, step S10 includes:
loading three-dimensional model data based on the position information;
and performing layer rendering according to the three-dimensional model data to obtain a three-dimensional scene image.
In this embodiment, in the process of loading the three-dimensional scene image, three-dimensional model data are first loaded based on the position information of the target monitoring area, and the three-dimensional scene image is then obtained by performing layer rendering on the three-dimensional model data. Layer rendering adds bitmap or procedural textures, illumination, bump mapping and positions relative to other objects to the three-dimensional scene image, making it more realistic.
S20, collecting a monitoring video image collection corresponding to each of at least two camera devices in a target monitoring area, wherein each camera device in the at least two camera devices is located at a different position;
A monitoring video image collection corresponding to each image capturing device in the target monitoring area is collected. So that the collected monitoring video images of the target monitoring area are clear and complete, monitoring video images covering as many angles of the target monitoring area as possible are collected: the monitoring video image collections corresponding to each of at least two image capturing devices are collected, with the image capturing devices located at different positions, which strengthens the fused presentation of the monitoring video images in the three-dimensional scene image.
In one embodiment, step S20 includes:
acquiring real-time video streams corresponding to each of at least two camera devices;
extracting image frames from the real-time video stream according to a preset time period;
and performing layer rendering on the extracted image frames to obtain a monitoring video image collection.
In this embodiment, the real-time video stream corresponding to each image capturing device is obtained, and image frames are extracted from the real-time video stream according to a preset time period; the plurality of image frames can be obtained by sampling the real-time video stream of the target monitoring area at a preset frequency, and layer rendering is performed on the extracted image frames to obtain the monitoring video image collection. For example, if the real-time video of the intersection is a video stream of 50 video frames, 5 image frames can be extracted from the 50 frames at a frequency of 5 frames per second. Specifically, each object contained in each image frame is identified, the position coordinates of each object in the corresponding image are obtained, and the object information matched with each object is acquired. In some examples, information about the geometry, structure, color, coordinates and the like of a target may be extracted to improve the accuracy of target identification in the video image.
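As an illustrative, non-limiting sketch of this sampling step (Python with OpenCV is assumed here; the stream address, frame rates and frame limit are placeholders rather than values fixed by the embodiment), frames may be taken from a real-time stream at a preset frequency as follows:

```python
import cv2

def sample_frames(stream_url: str, target_fps: float = 5.0, max_frames: int = 100):
    """Sample image frames from a real-time video stream at roughly target_fps frames per second."""
    cap = cv2.VideoCapture(stream_url)              # stream_url is a placeholder, e.g. an RTSP address
    source_fps = cap.get(cv2.CAP_PROP_FPS) or 50.0  # fall back if the stream does not report its rate
    step = max(int(round(source_fps / target_fps)), 1)
    frames, index = [], 0
    while len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:                       # keep every step-th frame only
            frames.append(frame)
        index += 1
    cap.release()
    return frames
```

The sampled frames would then be layer-rendered and, in some examples, passed to the object identification described above.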
S30, extracting a plurality of first characteristic points in the three-dimensional scene image and a plurality of second characteristic points in the target monitoring video image, wherein the target monitoring video image is any monitoring video image in the monitoring video image set;
The feature point extraction work includes extracting a plurality of first feature points in the three-dimensional scene image and extracting a plurality of second feature points in the target monitoring video image. Feature points generally refer to points with obvious changes in the image, such as corner points or boundary points, so that each first feature point can subsequently be matched with its corresponding second feature point.
The following describes how feature points are extracted from an image, taking the extraction of the second feature points as an example; the plurality of first feature points in the three-dimensional scene image are extracted in a similar manner. In one embodiment, step S30 includes:
determining a feature point extraction algorithm corresponding to the target monitoring video image;
and extracting feature points of the target monitoring video image according to a feature point extraction algorithm to obtain a plurality of second feature points.
In this embodiment, in the process of extracting the plurality of second feature points in the target monitoring video image, the feature point extraction algorithm corresponding to the target monitoring video image is first determined, and feature points are extracted from the target monitoring video image according to that algorithm to obtain the plurality of second feature points. These second feature points can then be matched with the first feature points in the three-dimensional scene image, and redundant second feature points can be used for correction to improve the accuracy of mapping the target monitoring video image. In some examples, feature points in the three-dimensional scene image and the target monitoring video image may each be extracted with the scale-invariant feature transform (SIFT) algorithm, obtaining information such as the position, scale and orientation of each feature point.
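A minimal sketch of such SIFT-based extraction, assuming OpenCV's implementation (the embodiment leaves the concrete feature point extraction algorithm open, so this is only one possible realisation):

```python
import cv2

def extract_feature_points(image):
    """Detect SIFT feature points in a grayscale image and return
    (position, scale, orientation) tuples together with their descriptors."""
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(image, None)
    points = [(kp.pt, kp.size, kp.angle) for kp in keypoints]   # kp.pt is the pixel position
    return points, descriptors
```

Applied once to the three-dimensional scene image and once to the target monitoring video image, this would yield the first and second feature points together with the descriptors used for the matching described below.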
S40, matching a first target feature point with a second target feature point to obtain a feature point matching degree set, wherein the first target feature point is any one of a plurality of first feature points, and the second target feature point is any one of a plurality of second feature points;
After the plurality of first feature points and the plurality of second feature points are obtained, a first target feature point is selected arbitrarily from the plurality of first feature points and matched with each of the plurality of second feature points to obtain the feature point matching degree set corresponding to that first target feature point. Traversing the plurality of first feature points in this way yields a feature point matching degree set for each first feature point. The feature point matching degree set comprises the first target feature points, the second target feature points and the matching degrees between them, which facilitates the subsequent selection of the best matches according to the matching degree.
In one embodiment, step S40 includes:
calculating the distance between the first target feature point and the second target feature point;
and determining the distance as the feature point matching degree between the first target feature point and the second target feature point to obtain a feature point matching degree set.
In this embodiment, the distance between the first target feature point and the second target feature point is first calculated; the distance may be any one of the Euclidean distance, Manhattan distance, Chebyshev distance and Mahalanobis distance. After the distance between the first target feature point and the second target feature point is calculated, it is taken as the matching degree of the two points. Each first target feature point is therefore matched with the plurality of second feature points to obtain the feature point matching degree set corresponding to that first target feature point, i.e. the distances between the first target feature point and each of the plurality of second feature points. In other examples, a feature point matching degree set may equally be obtained by matching each second target feature point with the plurality of first feature points.
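Assuming the Euclidean distance between SIFT descriptors is used as the matching degree (one of the distances listed above; other distances or spatial coordinates could equally be used), the feature point matching degree set can be sketched as a distance matrix:

```python
import numpy as np

def matching_degree_set(first_desc: np.ndarray, second_desc: np.ndarray) -> np.ndarray:
    """Return a matrix D in which D[i, j] is the Euclidean distance between the descriptor
    of first feature point i and that of second feature point j (smaller = better match)."""
    diff = first_desc[:, None, :] - second_desc[None, :, :]
    return np.linalg.norm(diff, axis=2)
```

For large descriptor sets, a k-d tree or OpenCV's BFMatcher would typically replace this brute-force matrix.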
S50, determining N first target feature points and N second target feature points based on a feature point matching degree set, a plurality of first feature points and a plurality of second feature points, wherein N is an integer greater than or equal to 4;
A feature point matching degree set corresponding to each of the plurality of first feature points is determined, and the second feature point with the highest matching degree can be selected from the feature point matching degree set of each first feature point as the second feature point corresponding to that first feature point. For example, suppose the plurality of first feature points consists of five first feature points A, B, C, D and E, each of which corresponds to a feature point matching degree set. Taking the first feature point A as an example, the second feature point A' with the highest matching degree is selected from the feature point matching degree set corresponding to A (i.e. A' is the second feature point closest to A in that set, and A is in turn the first feature point closest to A'). In this way a second feature point corresponding to each of the plurality of first feature points is obtained; N first feature points are then selected from the plurality of first feature points as the N first target feature points, which gives the N corresponding second target feature points.
In another embodiment, the feature point matching degree set corresponding to each of the plurality of first feature points is determined, at least two second feature points whose matching degree is higher than a preset threshold are selected from the feature point matching degree set of each first feature point, and the second target feature point is determined from these feature points. When there are exactly two such feature points, the point on the line connecting them that is closest to the first feature point is taken as the second target feature point; when more than two are selected, the point with the smallest distance to the first feature point is determined and taken as the second target feature point. A second target feature point corresponding to each of the plurality of first feature points is thus obtained, and N first feature points are selected from the plurality of first feature points as the first target feature points. For example, suppose the plurality of first feature points consists of five first feature points A, B, C, D and E, each corresponding to a feature point matching degree set. Taking the first feature point A as an example, at least two second feature points whose matching degree is higher than the preset threshold are selected from the feature point matching degree set corresponding to A, and the second target feature point is determined from them. Taking three second feature points as an example, the three second feature points form a triangular area; the distances between M points in the triangular area (M may be, for example, 10 or 1000 points selected randomly in the area, or chosen according to the actual situation) and the first feature point are determined, and the point with the shortest distance is taken as the second target feature point.
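A sketch of selecting the N first and second target feature points from such a distance matrix, assuming the mutual-nearest-neighbour rule of the first embodiment above and N greater than or equal to 4 (the function and parameter names are illustrative only):

```python
import numpy as np

def select_target_pairs(dist: np.ndarray, n: int = 4):
    """Return up to n (first, second) feature point index pairs that are mutual nearest
    neighbours in the distance matrix, ordered by ascending distance (best matches first)."""
    best_second = dist.argmin(axis=1)             # nearest second point for every first point
    best_first = dist.argmin(axis=0)              # nearest first point for every second point
    pairs = [(i, j) for i, j in enumerate(best_second) if best_first[j] == i]
    pairs.sort(key=lambda p: dist[p[0], p[1]])
    if len(pairs) < n:
        raise ValueError("fewer than N matched pairs; pose estimation needs N >= 4")
    return pairs[:n]
```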
S60, calculating camera coordinates of target image pickup equipment and rotation angles of the target image pickup equipment according to N first target feature points and N second target feature points, wherein the target image pickup equipment is any one image pickup equipment of at least two image pickup equipment;
The first target feature points are matched with the second target feature points, so the position information of the N first target feature points and the position information of the N second target feature points are obtained, and the camera coordinates and rotation angle of the target image capturing device are calculated from this position information, where the target image capturing device is any one of the at least two image capturing devices.
In one embodiment, step S60 includes:
step 1, reading position information of N first target feature points to obtain N image coordinates;
Each of the extracted N first target feature points contains information such as its position, color, scale and orientation. Reading the position information of the N first target feature points means reading the position information of each of them; the position information of a feature point is its pixel coordinate value, i.e. its image coordinate. The image coordinate of a first target feature point is therefore obtained by reading its position information.
Step 2, reading the position information of N second target feature points to obtain N screen coordinates, and converting each screen coordinate in the N screen coordinates into a world coordinate to obtain N world coordinates;
the screen coordinates of the second target feature points can be obtained by reading the position information of the second target feature points, and are converted into world coordinates, and the world coordinates are used for calculating the camera coordinates and the rotation angle of the target camera equipment.
Specifically, the ScreenToWorldPoint method is used to convert a screen coordinate into a world coordinate: the screen coordinate is first converted into a viewport coordinate, the viewport coordinate is then converted into a clipping coordinate, and the clipping coordinate is finally converted into a world coordinate.
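This conversion chain (screen coordinate, viewport coordinate, clipping coordinate, world coordinate) can be sketched as follows, assuming the view and projection matrices, the screen size and a depth value are available from the rendering engine; in Unity this chain is wrapped by Camera.ScreenToWorldPoint, and the column-vector matrix convention used here is an assumption:

```python
import numpy as np

def screen_to_world(screen_xy, depth_ndc, screen_size, view, proj):
    """Convert a screen coordinate to a world coordinate.
    screen_xy: (x, y) pixel position; depth_ndc: depth in [-1, 1];
    view, proj: 4x4 view and projection matrices of the virtual camera."""
    w, h = screen_size
    vx, vy = screen_xy[0] / w, screen_xy[1] / h          # screen -> viewport, range [0, 1]
    clip = np.array([2.0 * vx - 1.0, 2.0 * vy - 1.0,     # viewport -> clipping (NDC in [-1, 1])
                     depth_ndc, 1.0])
    world = np.linalg.inv(proj @ view) @ clip            # clipping -> world (inverse transform)
    return world[:3] / world[3]                          # perspective divide
```

Depending on the engine, the screen y axis may need to be flipped before the conversion.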
And step 3, calling a solvepnp function to calculate N image coordinates and N world coordinates so as to obtain camera coordinates and rotation angles.
The method comprises the steps of automatically detecting image coordinates and world coordinates through an algorithm to serve as input parameters, and directly calling a solvepnp function to calculate so as to obtain camera coordinates and rotation angles.
In one embodiment, step 3 comprises:
Judging whether the first image coordinate is matched with the first world coordinate, wherein the first image coordinate is any one of N image coordinates, and the first world coordinate is any one of N world coordinates;
if the first image coordinate is not matched with the first world coordinate, matching the first image coordinate with other world coordinates except the first world coordinate in the N world coordinates until the matching is successful, and obtaining a first target world coordinate corresponding to the first image coordinate;
and calling a solvepnp function to calculate the first image coordinate and the first target world coordinate so as to obtain a camera coordinate and a rotation angle.
In this embodiment, the input order of the corresponding first image coordinates and first world coordinates must be identical; otherwise the final calculation result is affected. Therefore, in the process of calling the solvepnp function, it is necessary to judge whether the first image coordinate and the first world coordinate match, and the solvepnp function is called only if they do.
If the first image coordinate is not matched with the first world coordinate, the final calculation result will be affected, so that the first image coordinate needs to be matched with other world coordinates except the first world coordinate in the N world coordinates, and the first target world coordinate corresponding to the first image coordinate is obtained. Therefore, after the input sequence of the first image coordinate and the first target world coordinate is consistent, the solvepnp function is called to calculate the first image coordinate and the first target world coordinate.
Specifically, in the process of calling a solvepnp function to calculate, a judging link is added, when the first image coordinate is not matched with the first world coordinate, the first world coordinate is timely adjusted to be the first target world coordinate corresponding to the first image coordinate, consistency of input sequences is ensured, and accuracy of calculation results is improved.
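A hedged sketch of step 3 using OpenCV's solvePnP (which is what the solvepnp function above refers to); the camera intrinsic matrix is assumed to be known, and the world coordinates are passed in the same order as their matched image coordinates, which is exactly the input-order consistency that the judging step enforces:

```python
import cv2
import numpy as np

def estimate_camera_pose(image_pts, world_pts, camera_matrix, dist_coeffs=None):
    """image_pts: Nx2 pixel coordinates of the first target feature points;
    world_pts: Nx3 world coordinates of the matched second target feature points,
    given in the SAME order so that row i of both arrays describes one matched pair."""
    image_pts = np.asarray(image_pts, dtype=np.float64).reshape(-1, 2)
    world_pts = np.asarray(world_pts, dtype=np.float64).reshape(-1, 3)
    if dist_coeffs is None:
        dist_coeffs = np.zeros(5)                 # assume an undistorted camera
    ok, rvec, tvec = cv2.solvePnP(world_pts, image_pts, camera_matrix, dist_coeffs)
    if not ok:
        raise RuntimeError("solvePnP failed")
    rotation, _ = cv2.Rodrigues(rvec)             # rotation vector -> 3x3 rotation matrix
    camera_coords = (-rotation.T @ tvec).ravel()  # camera position expressed in world coordinates
    return camera_coords, rvec, tvec              # rvec encodes the rotation angle of the device
```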
And S70, mapping the monitoring video image collection to a three-dimensional scene image according to the camera coordinates and the rotation angle.
After the camera coordinates and the rotation angle of the target camera equipment are calculated, the monitoring video image collection is mapped into the three-dimensional scene image according to the camera coordinates and the rotation angle, so that the monitoring video image collection is automatically matched into the three-dimensional scene image, and the problem that the mapping efficiency of the video image and the three-dimensional scene is low in the prior art is solved.
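One possible way to realise the mapping itself, sketched under the assumption that the three-dimensional scene is available as a set of model vertices: with the recovered pose, every scene point can be projected into the video frame, giving the pixel from which its colour (texture) would be taken. This uses OpenCV's projectPoints and is illustrative rather than the only realisation:

```python
import cv2
import numpy as np

def project_scene_points(scene_pts, rvec, tvec, camera_matrix, dist_coeffs=None):
    """scene_pts: Mx3 world coordinates of scene model vertices.
    Returns Mx2 pixel positions in the video frame at which each vertex is seen,
    which is what texturing the scene with the frame requires."""
    scene_pts = np.asarray(scene_pts, dtype=np.float64).reshape(-1, 1, 3)
    if dist_coeffs is None:
        dist_coeffs = np.zeros(5)
    pixels, _ = cv2.projectPoints(scene_pts, rvec, tvec, camera_matrix, dist_coeffs)
    return pixels.reshape(-1, 2)
```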
In some examples, after the monitoring video image collection is mapped to the three-dimensional scene image, the result needs to be evaluated to ensure that the fusion of the monitoring video image collection and the three-dimensional scene image meets the intended target and quality requirements. Common evaluation indicators include information entropy, the structural similarity index (SSIM) and the peak signal-to-noise ratio (PSNR); the effect and quality of the fusion can be evaluated quantitatively with these indicators.
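A sketch of that quantitative evaluation, assuming the fused rendering and a reference image are available as equally sized colour arrays; information entropy and PSNR are computed directly, while SSIM relies on scikit-image (an assumed dependency, available as structural_similarity in recent versions):

```python
import cv2
import numpy as np
from skimage.metrics import structural_similarity

def information_entropy(gray: np.ndarray) -> float:
    """Shannon entropy of an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def evaluate_fusion(fused: np.ndarray, reference: np.ndarray) -> dict:
    """Return common fusion-quality indicators for two colour images of the same size."""
    fused_gray = cv2.cvtColor(fused, cv2.COLOR_BGR2GRAY)
    return {
        "entropy": information_entropy(fused_gray),
        "psnr": cv2.PSNR(fused, reference),
        "ssim": structural_similarity(fused, reference, channel_axis=-1),
    }
```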
In one embodiment, step S70 includes:
step S71, determining the time stamp information of each monitoring video image in the monitoring video image collection;
in order to improve the accuracy of the monitoring video image collection projected on the three-dimensional model, time stamp information is added on each monitoring video image, and the monitoring video images are projected according to the time stamp information.
Step S72, mapping the surveillance video image to the three-dimensional scene image according to the time stamp information of each surveillance video image and the camera coordinates and rotation angle.
Each monitoring video image in the monitoring video image collection is mapped into the three-dimensional scene image in turn according to its timestamp information, which avoids disorder in the sequence of the plurality of monitoring video images; the monitoring video images are thus projected into the three-dimensional scene image according to the camera coordinates and rotation angle of the target image capturing device combined with the timestamp information of each monitoring video image.
In one embodiment, step S72 includes:
selecting an image frame set with the same time stamp from the monitoring video image set according to the time stamp information;
splicing each image frame in the image frame set according to the camera coordinates and the rotation angle to obtain a spliced image;
The stitched image is mapped to a three-dimensional scene image.
In one embodiment, since the monitoring video image collection includes image frames from at least two image capturing devices, the image frame set with the same timestamp is selected from the monitoring video image collection according to the timestamp information, each image frame in the image frame set is stitched to obtain a stitched image, and the stitched image is mapped to the three-dimensional scene image, improving the accuracy of mapping the monitoring video image collection onto the three-dimensional scene image.
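A sketch of grouping frames that share a timestamp and stitching each group; OpenCV's generic stitcher is used here as a stand-in, whereas the embodiment stitches the frames according to the recovered camera coordinates and rotation angles, so this is illustrative rather than the disclosed procedure:

```python
from collections import defaultdict
import cv2

def stitch_by_timestamp(frames):
    """frames: iterable of (timestamp, image) pairs coming from the different cameras.
    Returns {timestamp: stitched image}, processed in timestamp order."""
    groups = defaultdict(list)
    for ts, img in frames:
        groups[ts].append(img)
    stitcher = cv2.Stitcher_create()
    result = {}
    for ts in sorted(groups):                     # keep the timestamp order used for mapping
        images = groups[ts]
        if len(images) == 1:
            result[ts] = images[0]
            continue
        status, panorama = stitcher.stitch(images)
        if status == cv2.Stitcher_OK:
            result[ts] = panorama
    return result
```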
In summary, in the embodiments provided by the present application, the video image is mapped into the three-dimensional scene image by first preparing the image data, then extracting, matching and selecting feature points, then calculating the camera coordinates and rotation angle of the target image capturing device, and finally projecting the monitoring video images according to the camera coordinates and rotation angle. In this scheme, the monitoring video image collection captured by each image capturing device in the target monitoring area is automatically matched into the three-dimensional scene image, and the parameter values of the image capturing devices are adjusted automatically to restore the device information, which solves the problem of low mapping efficiency between video images and three-dimensional scenes in the prior art.
The embodiments of the present application are described above from the perspective of a video image mapping method, and the embodiments of the present application are described below from the perspective of a video image three-dimensional scene mapping system, please refer to fig. 5, fig. 5 is a schematic diagram of a virtual structure of the video image three-dimensional scene mapping system provided in the embodiments of the present application, and the video image three-dimensional scene mapping system 500 includes:
the acquisition module 501 is configured to acquire position information of a target monitoring area, and load a three-dimensional scene image corresponding to the target monitoring area according to the position information;
the collection module 502 is configured to collect a monitoring video image collection corresponding to each of at least two image capturing devices in the target monitoring area, where each of the at least two image capturing devices is located at a different position;
a first extraction module 503, configured to extract a plurality of first feature points in the three-dimensional scene image, and extract a plurality of second feature points in a target surveillance video image, where the target surveillance video image is any surveillance video image in the surveillance video image set;
the matching module 504 is configured to match a first target feature point with a second target feature point to obtain a feature point matching degree set, where the first target feature point is any one feature point of the plurality of first feature points, and the second target feature point is any one feature point of the plurality of second feature points;
A second extraction module 505, configured to determine N first target feature points and N second target feature points based on the feature point matching degree set, the plurality of first feature points and the plurality of second feature points, where N is an integer greater than or equal to 4;
a processing module 506, configured to calculate, according to the N first target feature points and the N second target feature points, camera coordinates of a target image capturing apparatus and a rotation angle of the target image capturing apparatus, where the target image capturing apparatus is any one of the at least two image capturing apparatuses;
the mapping module 507 is configured to map the surveillance video image aggregate to a three-dimensional scene image according to the camera coordinates and the rotation angle.
In a possible implementation manner, the processing module 506 is specifically configured to:
reading the position information of N first target feature points to obtain N image coordinates;
reading the position information of N second target feature points to obtain N screen coordinates, and converting each screen coordinate in the N screen coordinates into a world coordinate to obtain N world coordinates;
and calling a solvepnp function to calculate N image coordinates and N world coordinates so as to obtain camera coordinates and a rotation angle.
In a possible implementation manner, the processing module 506 is specifically further configured to:
Judging whether the first image coordinate is matched with the first world coordinate, wherein the first image coordinate is any one of N image coordinates, and the first world coordinate is any one of N world coordinates;
if the first image coordinate is not matched with the first world coordinate, matching the first image coordinate with other world coordinates except the first world coordinate in the N world coordinates until the matching is successful, and obtaining a first target world coordinate corresponding to the first image coordinate;
and calling a solvepnp function to calculate the first image coordinate and the first target world coordinate so as to obtain a camera coordinate and a rotation angle.
In a possible implementation manner, the acquisition module 501 is specifically configured to:
loading three-dimensional model data based on the position information;
and performing layer rendering according to the three-dimensional model data to obtain a three-dimensional scene image.
In a possible implementation manner, the collection module 502 is specifically configured to:
acquiring real-time video streams corresponding to each of at least two camera devices;
extracting image frames from the real-time video stream according to a preset time period;
and performing layer rendering on the extracted image frames to obtain a monitoring video image collection.
In a possible implementation manner, the mapping module 507 is specifically configured to:
determining time stamp information of each monitoring video image in the monitoring video image collection;
and mapping the monitoring video images to the three-dimensional scene images according to the time stamp information of each monitoring video image and the camera coordinates and the rotation angle.
In a possible implementation manner, the mapping module 507 is specifically further configured to:
selecting an image frame set with the same time stamp from the monitoring video image set according to the time stamp information;
splicing each image frame in the image frame set according to the camera coordinates and the rotation angle to obtain a spliced image;
and mapping the stitched image to the three-dimensional scene image.
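The hedged Python sketch below illustrates grouping frames that share a timestamp before stitching; stitch and map_to_scene stand in for the pose-based stitching and scene mapping steps, and the frame objects with a timestamp attribute are assumptions made for the example.

```python
# Hedged sketch: group frames by timestamp, stitch each group, then map it.
from collections import defaultdict

def map_by_timestamp(frames, stitch, map_to_scene):
    groups = defaultdict(list)  # timestamp -> frames captured by the different cameras
    for frame in frames:
        groups[frame.timestamp].append(frame)  # assumes each frame carries a timestamp

    for timestamp, group in sorted(groups.items()):
        stitched = stitch(group)           # uses each camera's coordinates and rotation angle
        map_to_scene(stitched, timestamp)  # place the stitched image into the 3D scene
```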
In a possible implementation manner, the first extracting module 503 is specifically configured to:
determining a feature point extraction algorithm corresponding to the target monitoring video image;
and extracting feature points of the target monitoring video image according to a feature point extraction algorithm to obtain a plurality of second feature points.
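As one possible choice of feature point extraction algorithm (the application does not prescribe a specific one), the sketch below uses OpenCV's ORB detector; the parameter values are illustrative only.

```python
# Hedged sketch: extract feature points and descriptors with ORB.
import cv2

def extract_feature_points(image_bgr, n_features=500):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=n_features)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    return keypoints, descriptors  # the second feature points and their descriptors
```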
In a possible implementation manner, the matching module 504 is specifically configured to:
calculating the distance between the first target feature point and the second target feature point;
and determining the distance as the feature point matching degree between the first target feature point and the second target feature point to obtain a feature point matching degree set.
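A hedged Python sketch of using descriptor distance as the matching degree is shown below; it assumes binary descriptors (such as ORB's) compared with the Hamming norm, which is one reasonable reading of the distance mentioned above rather than a metric mandated by the application.

```python
# Hedged sketch: brute-force matching where a smaller distance means a better match.
import cv2

def matching_degree_set(first_descriptors, second_descriptors):
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)
    matches = matcher.match(second_descriptors, first_descriptors)
    # keys are (index of the second feature point, index of the first feature point);
    # each DMatch.distance serves as the feature point matching degree for that pair
    return {(m.queryIdx, m.trainIdx): m.distance for m in matches}
```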
In the embodiment of the application, the video image is mapped into the three-dimensional scene image. Firstly, the obtaining module and the acquisition module are used for data preparation: the three-dimensional scene image is loaded and the monitoring video image collection is acquired. Secondly, feature extraction, feature point matching and feature selection are carried out through the first extraction module, the matching module and the second extraction module: a plurality of first feature points and a plurality of second feature points are extracted, the first target feature points are matched with the second target feature points to obtain a feature point matching degree set, and N first target feature points and N second target feature points are selected based on the feature point matching degree set. Next, the camera coordinates and the rotation angle of the target image capturing apparatus are calculated by the processing module from the N first target feature points and the N second target feature points. Finally, the monitoring video image collection is mapped into the three-dimensional scene image according to the camera coordinates and the rotation angle through the mapping module. The information of the image capturing apparatus is restored by automatically calculating its parameter values, and the monitoring video image collection captured by each image capturing apparatus in the target monitoring area is automatically matched into the three-dimensional scene image, which solves the problem of low mapping efficiency between video images and three-dimensional scenes in the prior art.
In one embodiment, there is also provided an electronic device including a memory and a processor, the memory storing a computer program that can be loaded and executed by the processor, the processor implementing the steps in the method embodiments described above when executing the computer program.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, which, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. The volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration, and not limitation, RAM is available in a variety of forms, such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), and the like.
In one embodiment, there is also provided a computer readable storage medium storing a computer program that can be loaded and executed by a processor, the computer program, when executed by the processor, implementing the steps of the three-dimensional scene mapping method for video images described above.
The readable storage medium may include: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The above embodiments are not intended to limit the scope of protection of the present application; therefore, all equivalent changes made according to the structure, shape and principle of the present application shall be covered by the scope of protection of the present application.

Claims (10)

1. A method for three-dimensional scene mapping of video images, the method comprising:
acquiring position information of a target monitoring area, and loading a three-dimensional scene image corresponding to the target monitoring area according to the position information;
collecting a monitoring video image collection corresponding to each of at least two image capturing devices in the target monitoring area, wherein each image capturing device of the at least two image capturing devices is located at a different position;
extracting a plurality of first feature points in the three-dimensional scene image and a plurality of second feature points in a target monitoring video image, wherein the target monitoring video image is any monitoring video image in the monitoring video image collection;
matching a first target feature point with a second target feature point to obtain a feature point matching degree set, wherein the first target feature point is any one feature point in the plurality of first feature points, and the second target feature point is any one feature point in the plurality of second feature points;
determining N first target feature points and N second target feature points based on the feature point matching degree set, the plurality of first feature points and the plurality of second feature points, wherein N is an integer greater than or equal to 4;
calculating camera coordinates of a target image capturing device and a rotation angle of the target image capturing device according to the N first target feature points and the N second target feature points, wherein the target image capturing device is any one image capturing device of the at least two image capturing devices;
and mapping the monitoring video image collection to the three-dimensional scene image according to the camera coordinates and the rotation angle.
2. The method of claim 1, wherein the calculating camera coordinates of a target image capturing apparatus and a rotation angle of the target image capturing apparatus from the N first target feature points and the N second target feature points comprises:
reading the position information of the N first target feature points to obtain N image coordinates;
reading the position information of the N second target feature points to obtain N screen coordinates, and converting each screen coordinate in the N screen coordinates into a world coordinate to obtain N world coordinates;
and calling a solvepnp function to calculate the N image coordinates and the N world coordinates so as to obtain the camera coordinates and the rotation angle.
3. The method of claim 2, wherein the invoking a solvepnp function to calculate the N image coordinates and the N world coordinates to obtain the camera coordinates and the rotation angle comprises:
judging whether a first image coordinate matches a first world coordinate, wherein the first image coordinate is any one of the N image coordinates, and the first world coordinate is any one of the N world coordinates;
if the first image coordinate does not match the first world coordinate, matching the first image coordinate with the other world coordinates except the first world coordinate in the N world coordinates until the matching is successful, so as to obtain a first target world coordinate corresponding to the first image coordinate;
and calling a solvepnp function to calculate the first image coordinate and the first target world coordinate so as to obtain the camera coordinate and the rotation angle.
4. The method of claim 1, wherein the acquiring the location information of the target monitoring area and loading the three-dimensional scene image corresponding to the target monitoring area according to the location information comprises:
loading three-dimensional model data based on the position information;
and performing layer rendering according to the three-dimensional model data to obtain the three-dimensional scene image.
5. The method of claim 1, wherein the collecting a monitoring video image collection corresponding to each of at least two image capturing devices in the target monitoring area comprises:
acquiring real-time video streams corresponding to each of the at least two image capturing devices;
extracting image frames from the real-time video stream according to a preset time period;
and performing layer rendering on the extracted image frames to obtain the monitoring video image collection.
6. The method of claim 1, wherein the mapping the monitoring video image collection to the three-dimensional scene image according to the camera coordinates and the rotation angle comprises:
determining time stamp information of each monitoring video image in the monitoring video image collection;
and mapping the monitoring video images to the three-dimensional scene image according to the time stamp information of each monitoring video image, the camera coordinates and the rotation angle.
7. The method of claim 6, wherein the mapping the monitoring video images to the three-dimensional scene image according to the time stamp information of each monitoring video image, the camera coordinates and the rotation angle further comprises:
selecting an image frame set with the same time stamp from the monitoring video image set according to the time stamp information;
splicing each image frame in the image frame set according to the camera coordinates and the rotation angle to obtain a spliced image;
and mapping the stitched image to the three-dimensional scene image.
8. The method of claim 1, wherein extracting the plurality of second feature points in the target surveillance video image comprises:
determining a feature point extraction algorithm corresponding to the target monitoring video image;
and extracting feature points of the target monitoring video image according to the feature point extraction algorithm to obtain a plurality of second feature points.
9. The method of claim 1, wherein matching the first target feature point with the second target feature point to obtain a feature point matching degree set comprises:
calculating the distance between the first target feature point and the second target feature point;
and determining the distance as the feature point matching degree between the first target feature point and the second target feature point to obtain the feature point matching degree set.
10. A three-dimensional scene mapping system for video images, comprising:
the obtaining module is used for acquiring the position information of a target monitoring area and loading a three-dimensional scene image corresponding to the target monitoring area according to the position information;
the acquisition module is used for collecting a monitoring video image collection corresponding to each of at least two image capturing devices in the target monitoring area, wherein each image capturing device of the at least two image capturing devices is located at a different position;
the first extraction module is used for extracting a plurality of first feature points in the three-dimensional scene image and a plurality of second feature points in a target monitoring video image, wherein the target monitoring video image is any monitoring video image in the monitoring video image collection;
the matching module is used for matching a first target feature point with a second target feature point to obtain a feature point matching degree set, wherein the first target feature point is any one feature point in the plurality of first feature points, and the second target feature point is any one feature point in the plurality of second feature points;
the second extraction module is used for determining N first target feature points and N second target feature points based on the feature point matching degree set, the plurality of first feature points and the plurality of second feature points, wherein N is an integer greater than or equal to 4;
the processing module is used for calculating camera coordinates of a target image capturing device and a rotation angle of the target image capturing device according to the N first target feature points and the N second target feature points, wherein the target image capturing device is any one image capturing device of the at least two image capturing devices;
and the mapping module is used for mapping the monitoring video image collection to the three-dimensional scene image according to the camera coordinates and the rotation angle.
CN202311492934.7A 2023-11-10 2023-11-10 Three-dimensional scene mapping method and system for video image Active CN117237512B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311492934.7A CN117237512B (en) 2023-11-10 2023-11-10 Three-dimensional scene mapping method and system for video image

Publications (2)

Publication Number Publication Date
CN117237512A true CN117237512A (en) 2023-12-15
CN117237512B CN117237512B (en) 2024-03-12

Family

ID=89088354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311492934.7A Active CN117237512B (en) 2023-11-10 2023-11-10 Three-dimensional scene mapping method and system for video image

Country Status (1)

Country Link
CN (1) CN117237512B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111836012A (en) * 2020-06-28 2020-10-27 航天图景(北京)科技有限公司 Video fusion and video linkage method based on three-dimensional scene and electronic equipment
CN114442805A (en) * 2022-01-06 2022-05-06 上海安维尔信息科技股份有限公司 Monitoring scene display method and system, electronic equipment and storage medium
CN114926552A (en) * 2022-06-17 2022-08-19 中国人民解放军陆军炮兵防空兵学院 Method and system for calculating Gaussian coordinates of pixel points based on unmanned aerial vehicle image
CN115641379A (en) * 2022-09-30 2023-01-24 中国人民解放军93114部队 Method and device for three-dimensional video fusion calibration and real-time rendering
CN115830142A (en) * 2022-12-13 2023-03-21 重庆长安汽车股份有限公司 Camera calibration method, camera target detection and positioning method, camera calibration device, camera target detection and positioning device and electronic equipment
CN116342831A (en) * 2023-02-17 2023-06-27 广东虚拟现实科技有限公司 Three-dimensional scene reconstruction method, three-dimensional scene reconstruction device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN117237512B (en) 2024-03-12

Similar Documents

Publication Publication Date Title
Fang et al. Light filed image quality assessment by local and global features of epipolar plane image
US10043097B2 (en) Image abstraction system
CN111783820A (en) Image annotation method and device
CN111738995B (en) RGBD image-based target detection method and device and computer equipment
CN110197185B (en) Method and system for monitoring space under bridge based on scale invariant feature transform algorithm
CN111837158A (en) Image processing method and device, shooting device and movable platform
CN113096003B (en) Labeling method, device, equipment and storage medium for multiple video frames
CN105608209A (en) Video labeling method and video labeling device
CN111683221B (en) Real-time video monitoring method and system for natural resources embedded with vector red line data
CN116883588A (en) Method and system for quickly reconstructing three-dimensional point cloud under large scene
CN116012432A (en) Stereoscopic panoramic image generation method and device and computer equipment
CN110120012B (en) Video stitching method for synchronous key frame extraction based on binocular camera
Rahmat et al. Android-based automatic detection and measurement system of highway billboard for tax calculation in Indonesia
CN114066999A (en) Target positioning system and method based on three-dimensional modeling
CN116778094B (en) Building deformation monitoring method and device based on optimal viewing angle shooting
WO2024055966A1 (en) Multi-camera target detection method and apparatus
CN117237512B (en) Three-dimensional scene mapping method and system for video image
CN112150508A (en) Target tracking method, device and related equipment
CN114913470B (en) Event detection method and device
CN112651351B (en) Data processing method and device
CN112818743B (en) Image recognition method and device, electronic equipment and computer storage medium
CN115565155A (en) Training method of neural network model, generation method of vehicle view and vehicle
CN114359147A (en) Crack detection method, crack detection device, server and storage medium
CN114219958A (en) Method, device, equipment and storage medium for classifying multi-view remote sensing images
CN113591720A (en) Lane departure detection method, apparatus and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant